期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Risk-sensitive average continuous-time Markov decision processes with unbounded rates

《Optimization》2012,61(4):773-800

Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under the suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then by a technique of constructing the appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach called the risk-sensitive average optimality inequality approach to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions and provide an example for which the risk-sensitive average optimality strict inequality occurs. 相似文献

2.

Unbounded cost Markov decision processes with limsup and liminf average criteria: new conditions

Quanxin Zhu Xianping Guo Yonglong Dai 《Mathematical Methods of Operations Research》2005,61(3):469-482

相似文献

3.

Credibilistic Markov decision processes: The average case

Masayuki Kageyama 《Journal of Computational and Applied Mathematics》2009

Using a concept of random fuzzy variables in credibility theory, we formulate a credibilistic model for unichain Markov decision processes under average criteria. And a credibilistically optimal policy is defined and obtained by solving the corresponding non-linear mathematical programming. Also we give a computational example to illustrate the effectiveness of our new model. 相似文献

4.

Weighted discounted Markov decision processes with perturbation

刘克《应用数学学报(英文版)》1999,15(2):183-189

1.IntrodnctionTheweightedMarkovdecisionprocesses(MDP's)havebeenextensivelystudiedsince1980's,seeforinstance,[1-6]andsoon.ThetheoryofweightedMDP'swithperturbedtransitionprobabilitiesappearstohavebeenmentionedonlyin[7].Thispaperwilldiscussthemodelsofwe... 相似文献

5.

Average cost Markov decision processes under the hypothesis of Doeblin

Masami Kurano 《Annals of Operations Research》1991,29(1):375-385

Average cost Markov decision processes (MDPs) with compact state and action spaces and bounded lower semicontinuous cost functions are considered. Kurano [7] has treated the general case in which several ergodic classes and a transient set are permitted for the Markov process induced by any randomized stationary policy under the hypothesis of Doeblin and showed the existence of a minimum pair of state and policy. This paper considers the same case as that discussed in Kurano [7] and proves some new results which give the existence theorem of an optimal stationary policy under some reasonable conditions. 相似文献

6.

Constrained continuous-time Markov decision processes with average criteria

Lanlan Zhang Xianping Guo 《Mathematical Methods of Operations Research》2008,67(2):323-340

In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP. 相似文献

7.

Optimal control in light traffic Markov decision processes

Ger Koole Olaf Passchier 《Mathematical Methods of Operations Research》1997,45(1):63-79

We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network. 相似文献

8.

连续时间Markov决策过程的均值-方差优化问题

下载免费PDF全文

叶柳儿 ;黄香香《中国科学:数学》2014,44(8):883-898

本文考虑连续时间Markov决策过程折扣模型的均值-方差优化问题.假设状态空间和行动空间均为Polish空间,转移率和报酬率函数均无界.本文的优化目标是在折扣最优平稳策略类里,选取相应方差最小的策略.本文致力于寻找Polish空间下Markov决策过程均值-方差最优策略存在的条件.利用首次进入分解方法,本文证明均值-方差优化问题可以转化为"等价"的期望折扣优化问题,进而得到关于均值-方差优化问题的"最优方程"和均值-方差最优策略的存在性以及它相应的特征.最后,本文给出若干例子说明折扣最优策略的不唯一性和均值-方差最优策略的存在性. 相似文献

9.

An improved algorithm for solving communicating average reward Markov decision processes 总被引：1，自引：0，他引：1

Moshe Haviv Martin L. Puterman 《Annals of Operations Research》1991,28(1):229-242

This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain we recommend use of this algorithm instead of unichain policy iteration.This research has been partially supported by NSERC Grant A-5527. 相似文献

10.

On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes

Emmanuel Fernández-Gaucherand Aristotle Arapostathis Steven I. Marcus 《Annals of Operations Research》1991,29(1):439-469

We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with anuncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044. 相似文献

11.

Denumerable state semi-Markov decision processes with unbounded costs,average cost criterion

A. Federgruen A. Hordijk H.C. Tijms 《Stochastic Processes and their Applications》1979,9(2):223-235

This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergotic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we shall verify the existence of a finite solution to the average costs optimality equation and the existence of an average cost optimal stationary policy. 相似文献

12.

An envelope theorem and some applications to discounted Markov decision processes

Hugo Cruz-Suárez Raúl Montes-de-Oca 《Mathematical Methods of Operations Research》2008,67(2):299-321

In this paper, an Envelope Theorem (ET) will be established for optimization problems on Euclidean spaces. In general, the Envelope Theorems permit analyzing an optimization problem and giving the solution by means of differentiability techniques. The ET will be presented in two versions. One of them uses concavity assumptions, whereas the other one does not require such kind of assumptions. Thereafter, the ET established will be applied to the Markov Decision Processes (MDPs) on Euclidean spaces, discounted and with infinite horizon. As the first application, several examples (including some economic models) of discounted MDPs for which the et allows to determine the value iteration functions will be presented. This will permit to obtain the corresponding optimal value functions and the optimal policies. As the second application of the ET, it will be proved that under differentiability conditions in the transition law, in the reward function, and the noise of the system, the value function and the optimal policy of the problem are differentiable with respect to the state of the system. Besides, various examples to illustrate these differentiability conditions will be provided. This work was partially supported by Benemérita Universidad Aut ónoma de Puebla (BUAP) under grant VIEP-BUAP 38/EXC/06-G, by Consejo Nacional de Ciencia y Tecnología (CONACYT), and by Evaluation-orientation de la COopération Scientifique (ECOS) under grant CONACyT-ECOS M06-M01. 相似文献

13.

Optimal switching among a finite number of Markov processes

B. Doshi 《Journal of Optimization Theory and Applications》1981,35(4):581-610

This paper investigates the problem of the optimal switching among a finite number of Markov processes, generalizing some of the author's earlier results for controlled one-dimensional diffusion. Under rather general conditions, it is shown that the optimal discounted cost function is the unique solution of a functional equation. Under more restrictive assumptions, this function is shown to be the unique solution of some quasi-variational inequalities. These assumptions are verified for a large class of control problems. For controlled Markov chains and controlled one-dimensional diffusion, the existence of a stationary optimal policy is established. Finally, a policy iteration method is developed to calculate an optimal stationary policy, if one exists.This research was sponsored by the Air Force Office of Scientific Research (AFSC), United States Air Force, under Contract No. F-49620-79-C-0165.The author would like to thank the referee for bringing Refs. 7, 8, and 9 to his attention. 相似文献

14.

Another set of conditions for Markov decision processes with average sample-path costs

Quanxin Zhu Xianping Guo 《Journal of Mathematical Analysis and Applications》2006,322(2):1199-1214

This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper has first been used to study the ASPC criterion. Also, the approach provided here is slightly different from the “optimality equation approach” widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results. 相似文献

15.

Stochastic approximations of constrained discounted Markov decision processes

François Dufour Tomás Prieto-Rumeau 《Journal of Mathematical Analysis and Applications》2014

We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ . Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure _μ_n

μ_{n}

of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model. 相似文献

16.

An average-value-at-risk criterion for Markov decision processes with unbounded costs

Qiuli LIU Wai-Ki CHING Junyu ZHANG Hongchu WANG 《Frontiers of Mathematics in China》2022,17(4):673

We study the Markov decision processes under the average-valueat-risk criterion. The state space and the action space are Borel spaces, the costs are admitted to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model. 相似文献

17.

Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Evgueni Gordienko Raúl Montes-De-Oca Adolfo Minjárez-Sosa 《Mathematical Methods of Operations Research》1997,45(2):245-263

The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost provide the approximation of an average cost optimal policy by solvingn-stage optimization problems (n = 1, 2, ...). The used approach ensures the exponential rate of convergence. The approximation of this type would be useful to find adaptive procedures of control and to estimate stability of an optimal control under disturbances of the transition probability.Research supported in part by Consejo Nacional de Ciencia y Tecnologia (CONACYT) under grant 0635P-E9506.Research supported by Fondo del Sistema de Investigatión del Mar de Cortés under Grant SIMAC/94/CT-005. 相似文献

18.

Finite-horizon piecewise deterministic Markov decision processes with unbounded transition rates

Yonghui Huang Xianping Guo 《Stochastics An International Journal of Probability and Stochastic Processes》2019,91(1):67-95

This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control is continuously acting in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed. 相似文献

19.

Weighted Markov decision processes with perturbation

Ke Liu Jerzy A. Filar 《Mathematical Methods of Operations Research》2001,53(3):465-480

相似文献

20.

On the optimality equation for average cost Markov control processes with Feller transition probabilities

Anna Ja?kiewicz Andrzej S. Nowak 《Journal of Mathematical Analysis and Applications》2006,316(2):495-509

We consider Markov control processes with Borel state space and Feller transition probabilities, satisfying some generalized geometric ergodicity conditions. We provide a new theorem on the existence of a solution to the average cost optimality equation. 相似文献