Found 20 similar documents; search took 1 ms
1.
Evgueni Gordienko, Raúl Montes-De-Oca, Adolfo Minjárez-Sosa. Mathematical Methods of Operations Research, 1997, 45(2): 245-263
The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost provide the approximation of an average cost optimal policy by solving n-stage optimization problems (n = 1, 2, ...). The approach used ensures an exponential rate of convergence. Approximations of this type would be useful for finding adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability. Research supported in part by Consejo Nacional de Ciencia y Tecnologia (CONACYT) under Grant 0635P-E9506. Research supported by Fondo del Sistema de Investigación del Mar de Cortés under Grant SIMAC/94/CT-005.
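The n-stage approximation the abstract describes can be pictured on a finite model. Below is a minimal sketch (Python/NumPy and the array encodings of P and c are my assumptions; the paper's setting of Borel state spaces and unbounded costs is far more general): backward induction over n stages, returning the first-stage minimizers as the approximating stationary policy.

```python
import numpy as np

def n_stage_policy(P, c, n):
    """Solve the n-stage problem by backward induction and return the
    first-stage minimizers as a stationary policy.
    P[a] is the |S| x |S| transition matrix of action a; c is |S| x |A|."""
    assert n >= 1
    S, A = c.shape
    v = np.zeros(S)                                   # terminal values
    for _ in range(n):
        Q = c + np.stack([P[a] @ v for a in range(A)], axis=1)
        v = Q.min(axis=1)
    return Q.argmin(axis=1)   # approximates an average cost optimal policy
```

Under the paper's Lyapunov-like ergodicity conditions, this policy approaches average cost optimality at an exponential rate in n.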
2.
Rolando Cavazos-Cadena. Applied Mathematics and Optimization, 1992, 26(2): 171-194
We consider discrete-time average reward Markov decision processes with denumerable state space and bounded reward function. Under structural restrictions on the model the existence of an optimal stationary policy is proved; both the lim inf and lim sup average criteria are considered. In contrast to the usual approach, our results do not rely on the average reward optimality equation. Rather, the arguments are based on well-known facts from Renewal Theory. This research was supported in part by the Consejo Nacional de Ciencia y Tecnologia (CONACYT) under Grants PCEXCNA 040640 and 050156, and by SEMAC under Grant 89-1/00.
3.
Emmanuel Fernández-Gaucherand, Aristotle Arapostathis, Steven I. Marcus. Annals of Operations Research, 1991, 29(1): 439-469
We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two-state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available. This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044.
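The "equivalent completely observed problem" is the standard belief-state MDP: its uncountable state is the posterior distribution over core states. A minimal sketch of the Bayes update that generates those belief states (the array layouts of P and O are assumptions on my part):

```python
import numpy as np

def belief_update(b, a, o, P, O):
    """Posterior over core states after taking action a in belief b and
    observing o. P[a][s, s'] are core transitions; O[a][s', o] are
    observation probabilities."""
    b_next = O[a][:, o] * (P[a].T @ b)   # joint pr(o and s' | b, a)
    return b_next / b_next.sum()         # condition on the observation
```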
4.
5.
D.J. White. European Journal of Operational Research, 1981, 7(4): 396-402
This paper deals with a particular type of Markov decision process in which the state takes the form I = S × Z, where S is countable and Z = {1, 2}, and the action space is K = Z, independently of s ∈ S. The state space I is ordered by a partial order ≼, which is specified in terms of an integer-valued function on S. The action space K has the natural order ≤. Under certain monotonicity and submodularity conditions it is shown that isotone optimal policies exist with respect to ≼ and ≤ on I and K respectively. The paper examines how this isotone structure may be used to simplify the usual policy space algorithm. A brief discussion of the usual successive approximation (value iteration) method, and of the extension of the ideas to semi-Markov decision processes, is given.
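With two actions and an order on the states, an isotone policy is a threshold policy, so the search space collapses to |S|+1 candidates. A small illustrative sketch (a discounted, totally ordered stand-in for the paper's partially ordered, average-criterion setting; every name here is an assumption):

```python
import numpy as np

def best_threshold_policy(P, c, score, beta=0.99):
    """Enumerate threshold policies: take action 1 iff the state's rank
    under `score` is at least the threshold. Under monotonicity and
    submodularity conditions like the paper's, some threshold policy is
    optimal, so |S|+1 policy evaluations suffice."""
    S = len(score)
    ranks = np.argsort(np.argsort(score))        # rank of each state
    best_val, best_pi = np.inf, None
    for t in range(S + 1):
        pi = (ranks >= t).astype(int)            # isotone in the order
        P_pi = np.array([P[pi[s]][s] for s in range(S)])
        c_pi = np.array([c[s, pi[s]] for s in range(S)])
        v = np.linalg.solve(np.eye(S) - beta * P_pi, c_pi)
        if v.sum() < best_val:                   # illustrative aggregate
            best_val, best_pi = v.sum(), pi
    return best_pi
```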
6.
Rolando Cavazos-Cadena. Annals of Operations Research, 1991, 28(1): 3-27
This paper concerns countable state space Markov decision processes endowed with a (long-run expected) average reward criterion. For these models we summarize and, in some cases, extend some recent results on sufficient conditions to establish the existence of optimal stationary policies. The topics considered are the following: (i) the new assumptions introduced by Sennott in [20–23], (ii) necessary and sufficient conditions for the existence of a bounded solution to the optimality equation, and (iii) equivalence of average optimality criteria. Some problems are posed. This research was partially supported by the Third World Academy of Sciences (TWAS) under Grant No. TWAS RG MP 898-152.
7.
8.
9.
10.
We construct examples of Markov decision processes for which, for a given initial state and a given nonstationary transient policy, there is no equivalent (randomized) stationary policy, i.e. no stationary policy whose occupation measure equals the occupation measure of the given policy. We also investigate the relation between the existence of equivalent stationary policies in special models and the existence of equivalent strategies in various classes of nonstationary policies in general models.
11.
This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the sequence of finite-horizon value functions to the infinite-horizon value function is also shown. A simple example illustrating an application is presented.
12.
Hyeong Soo Chang. Operations Research Letters, 2007, 35(4): 434-438
This brief paper presents a policy improvement method for constrained Markov decision processes (MDPs) with the average cost criterion under an ergodicity assumption, extending Howard's policy improvement for MDPs. The improvement method induces a policy iteration-type algorithm that converges to a locally optimal policy.
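For reference, a compact sketch of the classical unconstrained Howard improvement that the paper extends, for a unichain finite MDP (the linear-system evaluation and the array layout are my assumptions, not the paper's constrained method):

```python
import numpy as np

def policy_iteration_avg(P, c, tol=1e-9):
    """Howard's policy iteration under the average cost criterion.
    Evaluation solves g + h(s) = c(s,pi(s)) + sum_s' P(s'|s,pi(s)) h(s')
    with the normalization h(0) = 0; improvement minimizes the RHS."""
    S, A = c.shape
    pi = np.zeros(S, dtype=int)
    while True:
        P_pi = np.array([P[pi[s]][s] for s in range(S)])
        c_pi = c[np.arange(S), pi]
        M = np.zeros((S + 1, S + 1))
        M[:S, 0] = 1.0                       # coefficient of the gain g
        M[:S, 1:] = np.eye(S) - P_pi         # coefficients of the bias h
        M[S, 1] = 1.0                        # normalization h(0) = 0
        sol = np.linalg.solve(M, np.append(c_pi, 0.0))
        g, h = sol[0], sol[1:]
        Q = c + np.stack([P[a] @ h for a in range(A)], axis=1)
        new_pi = np.where(Q.min(axis=1) < Q[np.arange(S), pi] - tol,
                          Q.argmin(axis=1), pi)    # strict improvement only
        if np.array_equal(new_pi, pi):
            return pi, g, h
        pi = new_pi
```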
13.
Linn I. Sennott. Annals of Operations Research, 1991, 28(1): 261-271
We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policy f, with expected average cost g, when can f and g be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related to ng − ν_n(i), where ν_n(i) is the minimum expected n-stage cost when the process starts in state i. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter.
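The convergence in the abstract suggests the increments ν_n(i) − ν_{n−1}(i) as a computable estimate of g. A finite-model sketch of undiscounted value iteration (array conventions are my assumptions; the paper's contribution is exactly when this works with countable states and unbounded costs):

```python
import numpy as np

def value_iteration_avg(P, c, iters=1000):
    """Undiscounted value iteration: nu[i] is the minimum expected
    n-stage cost from state i; the per-state increments estimate the
    optimal average cost g."""
    S, A = c.shape
    nu = np.zeros(S)
    for _ in range(iters):
        nu_new = (c + np.stack([P[a] @ nu for a in range(A)],
                               axis=1)).min(axis=1)
        g_est = nu_new - nu       # per-state estimate of g
        nu = nu_new
    return g_est, nu
```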
14.
We give mild conditions for the existence of optimal solutions for a Markov decision problem with average cost, under m constraints of the same kind, in Borel state and action spaces. Moreover, there is an optimal policy that is a convex combination of at most m+1 deterministic policies.
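On a finite model this result has a concrete linear-programming counterpart over occupation measures, where a basic optimal solution mixes at most m+1 deterministic policies. A sketch using SciPy's linprog (the encodings of P, c, and the m constraint costs d are assumptions; the paper itself works in Borel spaces):

```python
import numpy as np
from scipy.optimize import linprog

def constrained_avg_lp(P, c, d, b):
    """Minimize average cost subject to m average-cost constraints by
    optimizing over occupation measures x[s, a]."""
    S, A = c.shape
    A_eq = np.zeros((S + 1, S * A))
    for j in range(S):                 # stationarity of the measure
        for s in range(S):
            for a in range(A):
                A_eq[j, s * A + a] = (1.0 if s == j else 0.0) - P[a][s, j]
    A_eq[S, :] = 1.0                   # normalization: x is a distribution
    b_eq = np.append(np.zeros(S), 1.0)
    A_ub = np.array([dk.ravel() for dk in d])       # the m constraints
    res = linprog(c.ravel(), A_ub=A_ub, b_ub=b,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.x.reshape(S, A)         # randomize: pi(a|s) proportional to x[s, a]
```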
15.
16.
We consider limiting average Markov decision processes (MDPs) with finite state and action spaces. We propose algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and the construction of levels in an aggregated MDP.
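The paper's level-construction algorithms are not reproduced here, but the deterministic case has a classical graph-theoretic core: the best long-run average equals the optimal mean cycle, computable by Karp's algorithm. A sketch for the cost-minimizing case (the edge-weight-matrix encoding is an assumption):

```python
import numpy as np

def min_mean_cycle(w):
    """Karp's algorithm: minimum cycle mean of a digraph with weight
    matrix w (np.inf marks a missing edge). D[k, v] is the minimum
    weight of a walk of exactly k edges ending at v."""
    n = w.shape[0]
    D = np.full((n + 1, n), np.inf)
    D[0, :] = 0.0
    for k in range(1, n + 1):
        for v in range(n):
            D[k, v] = np.min(D[k - 1, :] + w[:, v])
    best = np.inf                      # stays inf if the graph is acyclic
    for v in range(n):
        if np.isfinite(D[n, v]):
            val = max((D[n, v] - D[k, v]) / (n - k) for k in range(n))
            best = min(best, val)
    return best
```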
17.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.
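The structure in the abstract suggests a simple computational recipe once the two stationary policies are known: tune the randomization probability in the single differing state until the constraint binds. A hedged sketch by bisection (cost_of, the mixture's long-run average cost as a function of p, is a user-supplied function I am assuming, along with its continuity and monotonicity):

```python
def mixing_probability(cost_of, bound, tol=1e-8):
    """Find p in [0, 1] so the mixed policy (use the first policy's
    action with probability p in the one state where the two policies
    differ) meets the average-cost constraint with equality.
    Assumes cost_of(0) <= bound <= cost_of(1) and monotonicity."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cost_of(mid) > bound:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```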
18.
Using the concept of random fuzzy variables from credibility theory, we formulate a credibilistic model for unichain Markov decision processes under average criteria. A credibilistically optimal policy is defined and obtained by solving the corresponding nonlinear mathematical program. A computational example illustrates the effectiveness of the new model.
19.
Ronald Ortner. Annals of Operations Research, 2013, 208(1): 321-336
We present an algorithm that aggregates states online while learning to behave optimally in an average reward Markov decision process. The algorithm is based on the reinforcement learning algorithm UCRL and uses confidence intervals for aggregating the state space. We derive bounds on the regret our algorithm suffers with respect to an optimal policy; these bounds are only slightly worse than the original bounds for UCRL.
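A rough picture of the aggregation step (not the paper's exact algorithm): merge states whose empirical transition rows are indistinguishable at the current confidence level. The Hoeffding-style width below is illustrative, as are all names:

```python
import numpy as np

def aggregate_states(p_hat, counts, t, delta=0.05):
    """Greedily group states whose estimated transition rows agree
    within the sum of their confidence widths. p_hat[s] is the
    empirical row for state s; counts[s] its visit count at time t."""
    S = p_hat.shape[0]
    width = np.sqrt(np.log(2 * S * t / delta) / (2 * np.maximum(counts, 1)))
    groups, assigned = [], np.full(S, -1)
    for s in range(S):
        if assigned[s] >= 0:
            continue
        assigned[s] = len(groups)
        group = [s]
        for u in range(s + 1, S):
            if assigned[u] < 0 and np.all(
                    np.abs(p_hat[s] - p_hat[u]) <= width[s] + width[u]):
                assigned[u] = len(groups)
                group.append(u)
        groups.append(group)
    return groups
```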
20.
This paper is devoted to studying continuous-time Markov decision processes with general state and action spaces, under the long-run expected average reward criterion. The transition rates of the underlying continuous-time Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We provide new sufficient conditions for the existence of average optimal policies. Moreover, these sufficient conditions are imposed on the primitive data of the controlled process and are thus directly verifiable. Finally, we apply our results to two new examples.
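For contrast with the unbounded-rate setting the abstract allows: when total transition rates are bounded by some Λ, the standard uniformization device reduces the continuous-time problem to a discrete-time one, and it is precisely this reduction that is unavailable here. A sketch (the encodings of the rate matrices Q and reward rates r are assumptions):

```python
import numpy as np

def uniformize(Q, r, Lam):
    """Uniformization for bounded rates: each P[a] = I + Q[a] / Lam is a
    stochastic matrix, and the resulting discrete-time MDP with per-step
    reward r / Lam has the same optimal stationary policies.
    Inapplicable when the rates in Q are unbounded, the case this
    paper treats."""
    S = Q[0].shape[0]
    P = [np.eye(S) + Qa / Lam for Qa in Q]   # requires -Q[a][s, s] <= Lam
    return P, r / Lam
```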