Similar Articles
 20 similar articles found
1.
This paper considers the mean-variance optimization problem for the discounted model of continuous-time Markov decision processes. Both the state space and the action space are assumed to be Polish spaces, and the transition rates and reward rates may be unbounded. The optimization goal is to select, within the class of discount-optimal stationary policies, a policy whose corresponding variance is minimal. We seek conditions under which a mean-variance optimal policy exists for Markov decision processes on Polish spaces. Using a first-passage decomposition, we show that the mean-variance optimization problem can be transformed into an "equivalent" expected discounted optimization problem, and thereby obtain an "optimality equation" for the mean-variance problem, the existence of a mean-variance optimal policy, and its characterization. Finally, several examples illustrate the non-uniqueness of discount-optimal policies and the existence of a mean-variance optimal policy.
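Under a fixed stationary policy, the mean and the variance of the discounted total reward (the quantities the mean-variance criterion compares across discount-optimal policies) satisfy a pair of linear equations. A minimal finite-state sketch; the two-state chain and all numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state Markov reward chain induced by a fixed stationary policy.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])        # transition matrix under the policy
r = np.array([1.0, 2.0])          # reward received in each state
alpha = 0.9                       # discount factor

I = np.eye(2)
# First moment m(s) = E[sum_t alpha^t r | s0 = s] solves m = r + alpha * P m.
m = np.linalg.solve(I - alpha * P, r)
# Second moment w(s) = E[(sum_t alpha^t r)^2 | s0 = s] solves
#   w = r^2 + 2 alpha r * (P m) + alpha^2 * P w.
w = np.linalg.solve(I - alpha**2 * P, r**2 + 2 * alpha * r * (P @ m))
variance = w - m**2               # variance of the discounted total reward
```

The mean-variance problem then amounts to minimizing `variance` over the policies whose `m` attains the discounted optimum.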

2.
1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's, see for instance, [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

3.
We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network.

4.
Using the concept of random fuzzy variables from credibility theory, we formulate a credibilistic model for unichain Markov decision processes under average criteria. A credibilistically optimal policy is defined and obtained by solving the corresponding non-linear mathematical programming problem. We also give a computational example to illustrate the effectiveness of the new model.

5.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
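For a finite model, an iteration algorithm of the kind the abstract refers to can be sketched as value iteration in which each state carries its own discount factor. The two-state, two-action MDP and all numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP with state-dependent discount factors.
P = np.array([[[0.9, 0.1],     # P[a, s, s']: next-state distributions
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
r = np.array([[1.0, 0.5],      # r[s, a]: one-step rewards
              [0.0, 2.0]])
alpha = np.array([0.8, 0.95])  # discount factor depends on the current state

V = np.zeros(2)
for _ in range(1000):
    # Q[s, a] = r(s, a) + alpha(s) * sum_j P(j | s, a) V(j)
    Q = r + alpha[:, None] * np.einsum('asj,j->sa', P, V)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-12:
        break
    V = V_new
policy = Q.argmax(axis=1)      # a stationary policy attaining the maximum
```

Since every `alpha[s]` is below 1, the update is still a contraction and the iteration converges geometrically.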

6.
7.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation for an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.

8.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend use of this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
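A minimal sketch of average-reward policy iteration with unichain evaluation, on a hypothetical two-state, two-action MDP in which every stationary deterministic policy happens to be unichain (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; both actions induce irreducible chains,
# so every stationary deterministic policy is unichain.
P = np.array([[[0.5, 0.5],     # P[a, s, s']
               [0.3, 0.7]],
              [[0.9, 0.1],
               [0.6, 0.4]]])
r = np.array([[0.0, 1.0],      # r[s, a]
              [2.0, 0.5]])
S = 2

def evaluate(d):
    """Solve g + h = r_d + P_d h with the normalization h[0] = 0."""
    Pd = P[d, np.arange(S)]             # transition rows chosen by policy d
    rd = r[np.arange(S), d]
    M = np.zeros((S + 1, S + 1))
    M[:S, 0] = 1.0                      # coefficient of the gain g
    M[:S, 1:] = np.eye(S) - Pd          # coefficients of the bias h
    M[S, 1] = 1.0                       # normalization h[0] = 0
    x = np.linalg.solve(M, np.append(rd, 0.0))
    return x[0], x[1:]                  # gain, bias

d = np.zeros(S, dtype=int)              # start from an arbitrary policy
for _ in range(50):
    g, h = evaluate(d)
    Q = r + np.einsum('asj,j->sa', P, h)
    # Keep the current action wherever it is already greedy (avoids cycling on ties).
    d_new = np.where(Q[np.arange(S), d] >= Q.max(axis=1) - 1e-12,
                     d, Q.argmax(axis=1))
    if np.array_equal(d_new, d):
        break
    d = d_new
```

At termination the pair (g, h) satisfies the average-reward optimality equation g + h(s) = max_a [r(s, a) + Σ_j P(j | s, a) h(j)].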

9.
We consider a finite-state Markov decision process in which two rewards are associated with each action in each state. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they depend on the starting state. Also, a finite algorithm for computing the solution is given.

10.
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.

11.
12.
We consider the minimizing risk problems in discounted Markov decision processes with countable state space and bounded general rewards. We characterize optimal values for the finite and infinite horizon cases and give two sufficient conditions for the existence of an optimal policy in the infinite horizon case. These conditions are closely connected with Lemma 3 in White (1993), which is not correct, as Wu and Lin (1999) point out. We obtain a condition under which the lemma is true, and under this condition we show that there is an optimal policy. Under another condition we show that the optimal value is the unique solution to some optimality equation and that there is an optimal policy on a transient set.

13.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.

14.
15.
This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control is continuously acting in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed.

16.
Recent results for parameter-adaptive Markov decision processes (MDP's) are extended to partially observed MDP's depending on unknown parameters. These results include approximations converging uniformly to the optimal reward function and asymptotically optimal adaptive policies. This research was supported in part by the Consejo del Sistema Nacional de Educación Tecnologica (COSNET) under Grant 178/84, in part by the Air Force Office of Scientific Research under Grant AFOSR-84-0089, in part by the National Science Foundation under Grant ECS-84-12100, and in part by the Joint Services Electronics Program under Contract F49602-82-C-0033.

17.
This note presents a technique that is useful for the study of piecewise deterministic Markov decision processes (PDMDPs) with general policies and unbounded transition intensities. This technique produces an auxiliary PDMDP from the original one. The auxiliary PDMDP possesses certain desired properties, which may not be possessed by the original PDMDP. We apply this technique to risk-sensitive PDMDPs with total cost criteria, and comment on its connection with the uniformization technique.
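When the transition intensities are bounded, the uniformization technique the note mentions replaces the continuous-time process by a discrete-time chain sampled at the events of a fast Poisson clock; the note's auxiliary construction targets precisely the case where no such uniform bound exists. A sketch of the bounded case, with invented generator rates:

```python
import numpy as np

# Hypothetical CTMDP generator rates Q[a, s, s']: off-diagonal entries are
# nonnegative jump rates and each row sums to zero.
Q = np.array([[[-1.0, 1.0],
               [2.0, -2.0]],
              [[-3.0, 3.0],
               [0.5, -0.5]]])
S = 2

# Uniformization constant: any Lam >= the largest exit rate works.
Lam = np.max(-Q[:, np.arange(S), np.arange(S)])
# Discrete-time kernels of the uniformized chain: P = I + Q / Lam.
P = np.eye(S) + Q / Lam
```

Each `P[a]` is then a proper stochastic matrix, and discrete-time MDP machinery applies to the uniformized model.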

18.
19.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μn of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.

20.
We prove the metastable behavior of reversible Markov processes on finite state spaces under minimal conditions on the jump rates. To illustrate the result we deduce the metastable behavior of the Ising model with a small magnetic field at very low temperature.
