期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Accelerated modified policy iteration algorithms for Markov decision processes

Oleksandr Shlakhter Chi-Guhn Lee 《Mathematical Methods of Operations Research》2013,78(1):61-76

We propose a new approach to accelerate the convergence of the modified policy iteration method for Markov decision processes with the total expected discounted reward. In the new policy iteration an additional operator is applied to the iterate generated by Markov operator, resulting in a bigger improvement in each iteration. 相似文献

2.

Monotone value iteration for discounted finite Markov decision processes

D.J White 《Journal of Mathematical Analysis and Applications》1985,109(2):311-324

相似文献

3.

A new strong optimality criterion for nonstationary Markov decision processes

Xianping Guo Peng Shi Weiping Zhu 《Mathematical Methods of Operations Research》2000,52(2):287-306

This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (totally so-called strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which is an extension of the canonical policies of Herna′ndez-Lerma and Lasserre [16] (pages: 77) for the stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Also, some properties of optimal policies and optimal average value convergence are discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and then a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given. 相似文献

4.

A value iteration method for undiscounted multichain Markov decision processes

K. Ohno 《Mathematical Methods of Operations Research》1988,32(2):71-93

This paper proposes a value iteration method which finds an-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is-optimal.

Zusammenfassung In dieser Arbeit wird eine Wertiterationsmethode vorgeschlagen, die eine-optimale Politik für einen undiskontierten nicht-irreduziblen Markovschen Entscheidungsprozeß (MEP) in endlichen vielen Schritten liefert. Der undiskontierte nicht-irreduzible MEP wird auf einen aggregierten MEP reduziert, der maximale Gewinn eines undiskontierten Sub-MEP verwendet und als optimales Stopp-Problem formuliert wird. Zu Beginn werden hinreichende Bedingungen für die-Optimalität einer Politik angegeben.

相似文献

5.

Computational comparison of value iteration algorithms for discounted Markov decision processes

L.C. Thomas R. Harley A.C. Lavercombe 《Operations Research Letters》1983,2(2):72-76

This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes. 相似文献

6.

Average optimality for continuous-time Markov decision processes with a policy iteration approach

Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704

This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data and under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ?-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our way is rather different from both the usually “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature. 相似文献

7.

Some notions of nonstationary multistep iteration processes

I. K. Argyros F. Szidarovszky 《Acta Mathematica Hungarica》1994,64(1):59-64

相似文献

8.

A K-step look-ahead analysis of value iteration algorithms for Markov decision processes

Meir Herzberg Uri Yechiali 《European Journal of Operational Research》1996

We introduce and analyze a general look-ahead approach for Value Iteration Algorithms used in solving both discounted and undiscounted Markov decision processes. This approach, based on the value-oriented concept interwoven with multiple adaptive relaxation factors, leads to accelerating procedures which perform better than the separate use of either the concept of value oriented or of relaxation. Evaluation and computational considerations of this method are discussed, practical guidelines for implementation are suggested and the suitability of enhancing the method by incorporating Phase 0, Action Elimination procedures and Parallel Processing is indicated. The method was successfully applied to several real problems. We present some numerical results which support the superiority of the developed approach, particularly for undiscounted cases, over other Value Iteration variants. 相似文献

9.

Value iteration in countable state average cost Markov decision processes with unbounded costs

Linn I. Sennott 《Annals of Operations Research》1991,28(1):261-271

We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policyf, with expected average costg, when canf andg be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related tong?Ν _n (i), whereΝ _n (i) is the minimum expectedn-stage cost when the process starts in statei. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter. 相似文献

10.

Forecast horizon in nonstationary Markov decision problems

《Optimization》2012,61(6):853-857

The paper extends the concept of decision and forecast horizons from classes of stationary to classes of nonstationary Markov decision problems. The horizons are explicitly obtained for a family of inventory models. The family is indexed by nonstationary Markov chains and deterministic sequences. For the proof only reference to simlier work on the stationary case is made. 相似文献

11.

Extreme point characterization of constrained nonstationary infinite-horizon Markov decision processes with finite state space

《Operations Research Letters》2014,42(3):238-245

相似文献

12.

Semi-infinite Markov decision processes

Ming Chen Jerzy A. Filar Ke Liu 《Mathematical Methods of Operations Research》2000,51(1):115-137

相似文献

13.

Markov ratio decision processes

V. Aggarwal R. Chandrasekaran K. P. K. Nair 《Journal of Optimization Theory and Applications》1977,21(1):27-37

A finite-state Markov decision process, in which, associated with each action in each state, there are two rewards, is considered. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they are dependent on the starting state. Also, a finite algorithm for computing the solution is given. 相似文献

14.

Competing Markov decision processes

K. D. Glazebrook 《Annals of Operations Research》1991,29(1):537-563

A class of discounted Markov decision processes (MDPs) is formed by bringing together individual MDPs sharing the same discount rate. These are in competition in the sense that at each decision epoch a single action is chosen from the union of the action sets of the individual MDPs. Such families of competing MDPs have been used to model a variety of problems in stochastic resource allocation and in the sequential design of experiments. Suppose thatS is a stationary strategy for such a family, thatS* is an optimal strategy and thatR(S),R(S*) denote the respective rewards earned. The paper extends (and explains) existing theory based on the Gittins index to give bounds onR(S*)-R(S) for this important class of processes. The procedures are illustrated by examples taken from the fields of stochastic scheduling and research planning. 相似文献

15.

Semi-infinite discounted Markov decision processes: Policy improvement and singular perturbations

Mohammed Abbad Khalid Rahhali 《Mathematical Methods of Operations Research》2001,54(2):279-290

相似文献

16.

Non-randomized policies for constrained Markov decision processes

Richard C. Chen Eugene A. Feinberg 《Mathematical Methods of Operations Research》2007,66(1):165-179

This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented. 相似文献

17.

Risk-averse dynamic programming for Markov decision processes

Andrzej Ruszczyński 《Mathematical Programming》2010,125(2):235-261

We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we develop a risk-averse policy iteration method and we prove its convergence. We also propose a version of the Newton method to solve a nonsmooth equation arising in the policy iteration method and we prove its global convergence. Finally, we discuss relations to min–max Markov decision models. 相似文献

18.

Finite approximation for finite-horizon continuous-time Markov decision processes

Qingda Wei 《4OR: A Quarterly Journal of Operations Research》2017,15(1):67-84

In this paper we study the continuous-time Markov decision processes with a denumerable state space, a Borel action space, and unbounded transition and cost rates. The optimality criterion to be considered is the finite-horizon expected total cost criterion. Under the suitable conditions, we propose a finite approximation for the approximate computations of an optimal policy and the value function, and obtain the corresponding error estimations. Furthermore, our main results are illustrated with a controlled birth and death system. 相似文献

19.

Isotone optimal policies for structured Markov decision processes

D.J. White 《European Journal of Operational Research》1981,7(4):396-402

This paper deals with a particular type of Markov decision process in which the state takes the form I = S × Z, where S is countable, and Z = {1, 2}, and the action space K = Z, independently of s?S. The state space I is ordered by a partial order ?, which is specified in terms of an integer valued function on S. The action space K has the natural order ≤. Under certain monotonicity and submodularity conditions it is shown that isotone optimal policies exist with respect to ? and ? on I and K respectively. The paper examines how the particular isotone structure may be used to simplify the usual policy space algorithm. A brief discussion of the usual successive approximation (value iteration) method, and also the extension of the ideas to semi-Markov decision processes, is given. 相似文献

20.

Optimal policies for constrained average-cost Markov decision processes

Juan Gonz��lez-Hern��ndez C��sar E. Villarreal 《TOP》2011,19(1):107-120

We give mild conditions for the existence of optimal solutions for a Markov decision problem with average cost, under m constraints of the same kind, in Borel actions and states spaces. Moreover, there is an optimal policy that is a convex combination of at most m+1 deterministic policies. 相似文献