期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Computational comparison of value iteration algorithms for discounted Markov decision processes

L.C. Thomas R. Harley A.C. Lavercombe 《Operations Research Letters》1983,2(2):72-76

This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes. 相似文献

2.

Average optimality for continuous-time Markov decision processes with a policy iteration approach

Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704

This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data and under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ?-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our way is rather different from both the usually “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature. 相似文献

3.

Policy iteration for robust nonstationary Markov decision processes

Saumya Sinha Archis Ghate 《Optimization Letters》2016,10(8):1613-1628

Policy iteration is a well-studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an “as is” execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost-to-go functions produced by this algorithm monotonically converges pointwise to the optimal cost-to-go function; the policies generated converge subsequentially to an optimal policy. 相似文献

4.

A K-step look-ahead analysis of value iteration algorithms for Markov decision processes

Meir Herzberg Uri Yechiali 《European Journal of Operational Research》1996

We introduce and analyze a general look-ahead approach for Value Iteration Algorithms used in solving both discounted and undiscounted Markov decision processes. This approach, based on the value-oriented concept interwoven with multiple adaptive relaxation factors, leads to accelerating procedures which perform better than the separate use of either the concept of value oriented or of relaxation. Evaluation and computational considerations of this method are discussed, practical guidelines for implementation are suggested and the suitability of enhancing the method by incorporating Phase 0, Action Elimination procedures and Parallel Processing is indicated. The method was successfully applied to several real problems. We present some numerical results which support the superiority of the developed approach, particularly for undiscounted cases, over other Value Iteration variants. 相似文献

5.

Monotone value iteration for discounted finite Markov decision processes

D.J White 《Journal of Mathematical Analysis and Applications》1985,109(2):311-324

相似文献

6.

A value iteration method for undiscounted multichain Markov decision processes

K. Ohno 《Mathematical Methods of Operations Research》1988,32(2):71-93

This paper proposes a value iteration method which finds an-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is-optimal.

Zusammenfassung In dieser Arbeit wird eine Wertiterationsmethode vorgeschlagen, die eine-optimale Politik für einen undiskontierten nicht-irreduziblen Markovschen Entscheidungsprozeß (MEP) in endlichen vielen Schritten liefert. Der undiskontierte nicht-irreduzible MEP wird auf einen aggregierten MEP reduziert, der maximale Gewinn eines undiskontierten Sub-MEP verwendet und als optimales Stopp-Problem formuliert wird. Zu Beginn werden hinreichende Bedingungen für die-Optimalität einer Politik angegeben.

相似文献

7.

On some algorithms for limiting average Markov decision processes

C. Daoui M. Abbad 《Operations Research Letters》2007,35(2):261-266

We consider limiting average Markov decision processes (MDP) with finite state and action spaces. We propose some algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and the construction of levels in some aggregated MDP. 相似文献

8.

Hierarchical algorithms for discounted and weighted Markov decision processes

Abbad M. Daoui C. 《Mathematical Methods of Operations Research》2003,58(2):237-245

We consider a discrete time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered some decomposition of limiting average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs. Then, we construct some hierarchical decomposition algorithms for both discounted and weighted MDPs. 相似文献

9.

Models and algorithms for skip-free Markov decision processes on trees

Edmund J Collins 《The Journal of the Operational Research Society》2015,66(10):1595-1604

We introduce a class of models for multidimensional control problems that we call skip-free Markov decision processes on trees. We describe and analyse an algorithm applicable to Markov decision processes of this type that are skip-free in the negative direction. Starting with the finite average cost case, we show that the algorithm combines the advantages of both value iteration and policy iteration—it is guaranteed to converge to an optimal policy and optimal value function after a finite number of iterations but the computational effort required for each iteration step is comparable with that for value iteration. We show that the algorithm can also be used to solve discounted cost models and continuous-time models, and that a suitably modified algorithm can be used to solve communicating models. 相似文献

10.

A policy improvement method for constrained average Markov decision processes

Hyeong Soo Chang 《Operations Research Letters》2007,35(4):434-438

This brief paper presents a policy improvement method for constrained Markov decision processes (MDPs) with average cost criterion under an ergodicity assumption, extending Howard's policy improvement for MDPs. The improvement method induces a policy iteration-type algorithm that converges to a local optimal policy. 相似文献

11.

Optimal policy for minimizing risk models in Markov decision processes

Y. Ohtsubo K. Toyonaga 《Journal of Mathematical Analysis and Applications》2002,271(1):66-81

We consider the minimizing risk problems in discounted Markov decisions processes with countable state space and bounded general rewards. We characterize optimal values for finite and infinite horizon cases and give two sufficient conditions for the existence of an optimal policy in an infinite horizon case. These conditions are closely connected with Lemma 3 in White (1993), which is not correct as Wu and Lin (1999) point out. We obtain a condition for the lemma to be true, under which we show that there is an optimal policy. Under another condition we show that an optimal value is a unique solution to some optimality equation and there is an optimal policy on a transient set. 相似文献

12.

Value iteration in countable state average cost Markov decision processes with unbounded costs

Linn I. Sennott 《Annals of Operations Research》1991,28(1):261-271

We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policyf, with expected average costg, when canf andg be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related tong?Ν _n (i), whereΝ _n (i) is the minimum expectedn-stage cost when the process starts in statei. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter. 相似文献

13.

Asymptotic linear programming and policy improvement for singularly perturbed Markov decision processes

Eitan Altman Konstantin E. Avrachenkov Jerzy A. Filar 《Mathematical Methods of Operations Research》1999,49(1):97-109

相似文献

14.

Markov ratio decision processes

V. Aggarwal R. Chandrasekaran K. P. K. Nair 《Journal of Optimization Theory and Applications》1977,21(1):27-37

A finite-state Markov decision process, in which, associated with each action in each state, there are two rewards, is considered. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they are dependent on the starting state. Also, a finite algorithm for computing the solution is given. 相似文献

15.

Semi-infinite Markov decision processes

Ming Chen Jerzy A. Filar Ke Liu 《Mathematical Methods of Operations Research》2000,51(1):115-137

相似文献

16.

Competing Markov decision processes

K. D. Glazebrook 《Annals of Operations Research》1991,29(1):537-563

A class of discounted Markov decision processes (MDPs) is formed by bringing together individual MDPs sharing the same discount rate. These are in competition in the sense that at each decision epoch a single action is chosen from the union of the action sets of the individual MDPs. Such families of competing MDPs have been used to model a variety of problems in stochastic resource allocation and in the sequential design of experiments. Suppose thatS is a stationary strategy for such a family, thatS* is an optimal strategy and thatR(S),R(S*) denote the respective rewards earned. The paper extends (and explains) existing theory based on the Gittins index to give bounds onR(S*)-R(S) for this important class of processes. The procedures are illustrated by examples taken from the fields of stochastic scheduling and research planning. 相似文献

17.

A policy iteration heuristic for constrained discounted controlled Markov Chains

Hyeong Soo Chang 《Optimization Letters》2012,6(7):1573-1577

This brief paper presents a policy-improvement method of generating a feasible stochastic policy ${\tilde{\pi}}$ from a given feasible stochastic base-policy π such that ${\tilde{\pi}}$ improves all of the feasible policies “induced” from π for infinite-horizon constrained discounted controlled Markov chains (CMCs). A policy-iteration heuristic for approximately solving constrained discounted CMCs is developed from this improvement method. 相似文献

18.

Non-randomized policies for constrained Markov decision processes

Richard C. Chen Eugene A. Feinberg 《Mathematical Methods of Operations Research》2007,66(1):165-179

This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented. 相似文献

19.

Risk-averse dynamic programming for Markov decision processes

Andrzej Ruszczyński 《Mathematical Programming》2010,125(2):235-261

We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we develop a risk-averse policy iteration method and we prove its convergence. We also propose a version of the Newton method to solve a nonsmooth equation arising in the policy iteration method and we prove its global convergence. Finally, we discuss relations to min–max Markov decision models. 相似文献

20.

Discounted Markov games: Generalized policy iteration method

J. Van der Wal 《Journal of Optimization Theory and Applications》1978,25(1):125-138

In this paper, we consider two-person zero-sum discounted Markov games with finite state and action spaces. We show that the Newton-Raphson or policy iteration method as presented by Pollats-chek and Avi-Itzhak does not necessarily converge, contradicting a proof of Rao, Chandrasekaran, and Nair. Moreover, a set of successive approximation algorithms is presented of which Shapley's method and a total-expected-rewards version of Hoffman and Karp's method are the extreme elements. 相似文献