期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Evgueni Gordienko Raúl Montes-De-Oca Adolfo Minjárez-Sosa 《Mathematical Methods of Operations Research》1997,45(2):245-263

The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost provide the approximation of an average cost optimal policy by solvingn-stage optimization problems (n = 1, 2, ...). The used approach ensures the exponential rate of convergence. The approximation of this type would be useful to find adaptive procedures of control and to estimate stability of an optimal control under disturbances of the transition probability.Research supported in part by Consejo Nacional de Ciencia y Tecnologia (CONACYT) under grant 0635P-E9506.Research supported by Fondo del Sistema de Investigatión del Mar de Cortés under Grant SIMAC/94/CT-005. 相似文献

2.

Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state

Rolando Cavazos-Cadena 《Applied Mathematics and Optimization》1992,26(2):171-194

We consider discrete-timeaverage reward Markov decision processes with denumerable state space andbounded reward function. Under structural restrictions on the model the existence of an optimal stationary policy is proved; both the lim inf and lim sup average criteria are considered. In contrast to the usual approach our results donot rely on the average regard optimality equation. Rather, the arguments are based on well-known facts fromRenewal Theory.This research was supported in part by the Consejo Nacional de Ciencia y Tecnologia (CONACYT) under Grants PCEXCNA 040640 and 050156, and by SEMAC under Grant 89-1/00ifn$. 相似文献

3.

On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes

Emmanuel Fernández-Gaucherand Aristotle Arapostathis Steven I. Marcus 《Annals of Operations Research》1991,29(1):439-469

We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with anuncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044. 相似文献

4.

Value iteration and approximately optimal stationary policies in finite-state average Markov decision chains

Rolando Cavazos-Cadena Rolando Cavazos-Cadena 《Mathematical Methods of Operations Research》2002,56(2):181-196

相似文献

5.

Isotone optimal policies for structured Markov decision processes

D.J. White 《European Journal of Operational Research》1981,7(4):396-402

This paper deals with a particular type of Markov decision process in which the state takes the form I = S × Z, where S is countable, and Z = {1, 2}, and the action space K = Z, independently of s?S. The state space I is ordered by a partial order ?, which is specified in terms of an integer valued function on S. The action space K has the natural order ≤. Under certain monotonicity and submodularity conditions it is shown that isotone optimal policies exist with respect to ? and ? on I and K respectively. The paper examines how the particular isotone structure may be used to simplify the usual policy space algorithm. A brief discussion of the usual successive approximation (value iteration) method, and also the extension of the ideas to semi-Markov decision processes, is given. 相似文献

6.

Recent results on conditions for the existence of average optimal stationary policies

Rolando Cavazos-Cadena 《Annals of Operations Research》1991,28(1):3-27

This paper concerns countable state space Markov decision processes endowed with a (long-run expected)average reward criterion. For these models we summarize and, in some cases,extend some recent results on sufficient conditions to establish the existence of optimal stationary policies. The topics considered are the following: (i) the new assumptions introduced by Sennott in [20–23], (ii)necessary and sufficient conditions for the existence of a bounded solution to the optimality equation, and (iii) equivalence of average optimality criteria. Some problems are posed.This research was partially supported by the Third World Academy of Sciences (TWAS) under Grant No. TWAS RG MP 898-152. 相似文献

7.

Conditions for the uniqueness of optimal policies of discounted Markov decision processes

Daniel?Cruz-Suárez Raúl?Montes-de-Oca Email author Francisco?Salem-Silva 《Mathematical Methods of Operations Research》2004,60(3):415-436

相似文献

8.

Continuity of the optimal average cost in Markov decision chains with small risk-sensitivity

Selene Chávez-Rodríguez Rolando Cavazos-Cadena Hugo Cruz-Suárez 《Mathematical Methods of Operations Research》2015,81(3):269-298

相似文献

9.

Unbounded cost Markov decision processes with limsup and liminf average criteria: new conditions

Quanxin Zhu Xianping Guo Yonglong Dai 《Mathematical Methods of Operations Research》2005,61(3):469-482

相似文献

10.

Notes on equivalent stationary policies in Markov decision processes with total rewards

Eugene A. Feinberg Isaac M. Sonin 《Mathematical Methods of Operations Research》1996,44(2):205-221

We construct examples of Markov Decision Processes for which, for a given initial state and for a given nonstationary transient policy, there is no equivalent (randomized) stationary policy, i.e. there is no stationary policy which occupation measure is equal to the occupation measure of a given policy. We also investigate the relation between the existence of equivalent stationary policies in special models and the existence of equivalent strategies in various classes of nonstationary policies in general models. 相似文献

11.

Non-randomized policies for constrained Markov decision processes

Richard C. Chen Eugene A. Feinberg 《Mathematical Methods of Operations Research》2007,66(1):165-179

This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented. 相似文献

12.

A policy improvement method for constrained average Markov decision processes

Hyeong Soo Chang 《Operations Research Letters》2007,35(4):434-438

This brief paper presents a policy improvement method for constrained Markov decision processes (MDPs) with average cost criterion under an ergodicity assumption, extending Howard's policy improvement for MDPs. The improvement method induces a policy iteration-type algorithm that converges to a local optimal policy. 相似文献

13.

Value iteration in countable state average cost Markov decision processes with unbounded costs

Linn I. Sennott 《Annals of Operations Research》1991,28(1):261-271

We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policyf, with expected average costg, when canf andg be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related tong?Ν _n (i), whereΝ _n (i) is the minimum expectedn-stage cost when the process starts in statei. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter. 相似文献

14.

Optimal policies for constrained average-cost Markov decision processes

Juan Gonz��lez-Hern��ndez C��sar E. Villarreal 《TOP》2011,19(1):107-120

We give mild conditions for the existence of optimal solutions for a Markov decision problem with average cost, under m constraints of the same kind, in Borel actions and states spaces. Moreover, there is an optimal policy that is a convex combination of at most m+1 deterministic policies. 相似文献

15.

Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities

《Stochastics An International Journal of Probability and Stochastic Processes》2013,85(2):273-307

相似文献

16.

On some algorithms for limiting average Markov decision processes

C. Daoui M. Abbad 《Operations Research Letters》2007,35(2):261-266

We consider limiting average Markov decision processes (MDP) with finite state and action spaces. We propose some algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and the construction of levels in some aggregated MDP. 相似文献

17.

Constrained continuous-time Markov decision processes with average criteria

Lanlan Zhang Xianping Guo 《Mathematical Methods of Operations Research》2008,67(2):323-340

In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP. 相似文献

18.

Credibilistic Markov decision processes: The average case

Masayuki Kageyama 《Journal of Computational and Applied Mathematics》2009

Using a concept of random fuzzy variables in credibility theory, we formulate a credibilistic model for unichain Markov decision processes under average criteria. And a credibilistically optimal policy is defined and obtained by solving the corresponding non-linear mathematical programming. Also we give a computational example to illustrate the effectiveness of our new model. 相似文献

19.

Adaptive aggregation for reinforcement learning in average reward Markov decision processes

Ronald Ortner 《Annals of Operations Research》2013,208(1):321-336

We present an algorithm which aggregates online when learning to behave optimally in an average reward Markov decision process. The algorithm is based on the reinforcement learning algorithm UCRL and uses confidence intervals for aggregating the state space. We derive bounds on the regret our algorithm suffers with respect to an optimal policy. These bounds are only slightly worse than the original bounds for UCRL. 相似文献

20.

New sufficient conditions for average optimality in continuous-time Markov decision processes

Liuer Ye Xianping Guo 《Mathematical Methods of Operations Research》2010,72(1):75-94

This paper is devoted to studying continuous-time Markov decision processes with general state and action spaces, under the long-run expected average reward criterion. The transition rates of the underlying continuous-time Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We provide new sufficient conditions for the existence of average optimal policies. Moreover, such sufficient conditions are imposed on the controlled process’ primitive data and thus they are directly verifiable. Finally, we apply our results to two new examples. 相似文献