A total of 20 similar documents were found (search time: 0 ms).
1.
2.
Edilson F. Arruda 《Operations Research Letters》2011,39(3):193-197
This note addresses the time aggregation approach to ergodic finite-state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.
3.
4.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and characterizing the value function through the Bellman equation.
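For orientation, a discounted continuous-time MDP of the kind considered here has an optimality (Bellman) equation that, under suitable conditions, takes the following schematic form, where α > 0 is the discount rate, r the reward rate, and q(dy | x, a) the (possibly unbounded) transition rate kernel; this is a generic sketch rather than the paper's exact statement:

\[ \alpha V^{*}(x) \;=\; \sup_{a \in A(x)} \Big\{ r(x,a) + \int_{S} V^{*}(y)\, q(dy \mid x, a) \Big\}, \qquad x \in S . \]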
5.
MARKOV DECISION PROGRAMMING WITH CONSTRAINTS
LIU JIANYONG (刘建庸); LIU KE (刘克) (Institute of Applied Mathematics, the Chinese Academy of Sciences, ...
6.
V. Aggarwal, R. Chandrasekaran, K. P. K. Nair 《Journal of Optimization Theory and Applications》1977,21(1):27-37
A finite-state Markov decision process in which two rewards are associated with each action in each state is considered. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, both depend on the starting state. A finite algorithm for computing the solution is also given.
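Schematically, with discount factor β ∈ (0, 1) and the two reward streams written r₁ and r₂ (notation assumed here for illustration, not taken from the paper), the discounted ratio criterion is

\[ \max_{\pi}\; \frac{\mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{\infty} \beta^{t} r_{1}(x_t, a_t)\right]}{\mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{\infty} \beta^{t} r_{2}(x_t, a_t)\right]}, \]

where s is the starting state and the denominator reward is typically assumed positive so that the ratio is well defined; the dependence of the optimal value and strategy on s is visible directly in this formulation.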
7.
8.
K. D. Glazebrook 《Annals of Operations Research》1991,29(1):537-563
A class of discounted Markov decision processes (MDPs) is formed by bringing together individual MDPs sharing the same discount rate. These are in competition in the sense that at each decision epoch a single action is chosen from the union of the action sets of the individual MDPs. Such families of competing MDPs have been used to model a variety of problems in stochastic resource allocation and in the sequential design of experiments. Suppose that S is a stationary strategy for such a family, that S* is an optimal strategy and that R(S), R(S*) denote the respective rewards earned. The paper extends (and explains) existing theory based on the Gittins index to give bounds on R(S*) − R(S) for this important class of processes. The procedures are illustrated by examples taken from the fields of stochastic scheduling and research planning.
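The Gittins index underlying such bounds can be sketched, in its standard discounted form (a generic statement; the paper works with the more general setting of competing MDPs), as the maximal equivalent reward rate obtainable from an individual process started in state x up to a stopping time τ:

\[ \nu(x) \;=\; \sup_{\tau \ge 1} \frac{\mathbb{E}_{x}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} r(x_t, a_t)\right]}{\mathbb{E}_{x}\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\right]} . \]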
9.
This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the sequence of finite-horizon value functions to the infinite-horizon value function is also shown. A simple example illustrating an application is presented.
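A constrained discounted problem of this type can be written schematically (generic formulation; the paper's precise constraint structure may differ) as minimizing one expected discounted cost subject to bounds on the others, over the non-randomized policies considered here:

\[ \min_{\pi}\; \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{\infty} \beta^{t} c_{0}(x_t, a_t)\right] \quad \text{subject to} \quad \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{\infty} \beta^{t} c_{k}(x_t, a_t)\right] \le d_{k}, \quad k = 1, \dots, m . \]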
10.
Qingda Wei 《4OR: A Quarterly Journal of Operations Research》2017,15(1):67-84
In this paper we study continuous-time Markov decision processes with a denumerable state space, a Borel action space, and unbounded transition and cost rates. The optimality criterion considered is the finite-horizon expected total cost criterion. Under suitable conditions, we propose a finite approximation for the approximate computation of an optimal policy and the value function, and obtain the corresponding error estimates. Furthermore, our main results are illustrated with a controlled birth-and-death system.
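As a schematic statement of the criterion (the paper's notation will differ), the finite-horizon expected total cost of a policy π starting from state x over a horizon T is

\[ V(\pi, x) \;=\; \mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{T} c(\xi_t, a_t)\, dt\right], \]

and the proposed finite approximation is assessed by bounding the difference between the value function and policy computed from the approximating model and this quantity.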
11.
D.J. White 《European Journal of Operational Research》1981,7(4):396-402
This paper deals with a particular type of Markov decision process in which the state takes the form I = S × Z, where S is countable and Z = {1, 2}, and the action space K = Z, independently of s ∈ S. The state space I is ordered by a partial order ⪯, which is specified in terms of an integer-valued function on S. The action space K has the natural order ≤. Under certain monotonicity and submodularity conditions it is shown that isotone optimal policies exist with respect to ⪯ and ≤ on I and K respectively. The paper examines how the particular isotone structure may be used to simplify the usual policy space algorithm. A brief discussion of the usual successive approximation (value iteration) method, and also the extension of the ideas to semi-Markov decision processes, is given.
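For reference, submodularity here is the standard lattice notion (stated generically; the paper imposes its conditions on the specific orders ⪯ on I and ≤ on K): a function f on a lattice is submodular if

\[ f(x \wedge y) + f(x \vee y) \;\le\; f(x) + f(y) \quad \text{for all } x, y , \]

and conditions of this kind on the rewards and transition probabilities are what yield isotone (monotone) optimal policies.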
12.
We give mild conditions for the existence of optimal solutions to a Markov decision problem with average cost, under m constraints of the same kind, in Borel state and action spaces. Moreover, there is an optimal policy that is a convex combination of at most m + 1 deterministic policies.
13.
Policy iteration is a well-studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an “as is” execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost-to-go functions produced by this algorithm monotonically converges pointwise to the optimal cost-to-go function; the policies generated converge subsequentially to an optimal policy.
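For context, a minimal sketch of standard policy iteration for a finite, stationary, discounted MDP is given below (plain Python with NumPy; the names and the cost-minimizing convention are assumptions for illustration, and this is the textbook algorithm the robust nonstationary variant builds on, not the paper's finitely implementable approximation):

import numpy as np

def policy_iteration(P, c, beta, max_iter=1000):
    # P[a] is an (n x n) transition matrix for action a; c is an (n x m) cost array.
    n, m = c.shape
    policy = np.zeros(n, dtype=int)                       # arbitrary initial policy
    v = np.zeros(n)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_d) v = c_d exactly.
        P_d = np.array([P[policy[i]][i] for i in range(n)])
        c_d = c[np.arange(n), policy]
        v = np.linalg.solve(np.eye(n) - beta * P_d, c_d)
        # Policy improvement: act greedily with respect to v.
        Q = np.stack([c[:, a] + beta * P[a] @ v for a in range(m)], axis=1)
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):            # no change: policy is optimal
            break
        policy = new_policy
    return policy, v

The robust nonstationary setting of the paper replaces the exact evaluation and improvement steps above with finitely implementable approximate variants.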
14.
Q. X. Zhu 《Mathematical Methods of Operations Research》2007,65(3):519-538
This paper studies both the average sample-path reward (ASPR) criterion and the limiting average variance criterion for denumerable discrete-time Markov decision processes. The rewards may have neither upper nor lower bounds. We give sufficient conditions on the system’s primitive data under which we prove the existence of ASPR-optimal stationary policies and variance-optimal policies. Our conditions are weaker than those in the previous literature. Moreover, our results are illustrated by a controlled queueing system.
Research partially supported by the Natural Science Foundation of Guangdong Province (Grant No. 06025063) and the Natural Science Foundation of China (Grant No. 10626021).
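Schematically (a generic statement of the criterion, not the paper's exact notation), the average sample-path reward of a policy π from state x is the pathwise long-run average

\[ J(\pi, x) \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(x_t, a_t) \quad \mathbb{P}^{\pi}_{x}\text{-almost surely}, \]

as opposed to the usual expected average reward, in which the expectation is taken before the limit; the limiting average variance criterion then compares policies through the long-run variability of the rewards along the sample path.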
15.
This paper deals with the bias optimality of multichain models for finite continuous-time Markov decision processes. Based on new performance difference formulas developed here, we prove the convergence of a so-called bias-optimal policy iteration algorithm, which can be used to obtain bias-optimal policies in a finite number of iterations.
16.
M. Sun 《Journal of Optimization Theory and Applications》1993,79(2):405-413
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.
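For orientation, in the discounted finite case the dynamic programming equation is equivalent to a linear program of the following generic form (a standard connection, sketched here; the paper's revised-simplex construction and the diffusion-based discretizations are not reproduced):

\[ \min_{v} \sum_{i} v(i) \quad \text{subject to} \quad v(i) \;\ge\; r(i,a) + \beta \sum_{j} p(j \mid i, a)\, v(j) \quad \text{for all state-action pairs } (i,a), \]

whose optimal solution is the optimal value function and whose dual variables can be read as discounted state-action frequencies, one per pair (i, a).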
17.
18.
19.
20.
The Markov decision process is studied under the maximization of the probability that total discounted rewards exceed a target level. We focus on the dynamic programming equations of the model. We give various properties of the optimal return operator and, for the infinite planning-horizon model, we characterize the optimal value function as a maximal fixed point of this operator. Various turnpike results relating the finite- and infinite-horizon models are also given.
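One common form of the optimality equation for this target-level criterion augments the state with the outstanding target (a generic sketch for a discrete state space and discount factor β; the paper's operator and assumptions may differ): writing v(x, τ) for the maximal probability that the total discounted reward from state x exceeds τ,

\[ v(x, \tau) \;=\; \max_{a \in A(x)} \sum_{y} p(y \mid x, a)\, v\!\left(y, \frac{\tau - r(x,a)}{\beta}\right), \]

so the optimal return operator acts on functions of the augmented state (x, τ), and the infinite-horizon value function is characterized as a maximal fixed point of this operator, as stated in the abstract.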