
1.

Stopped Markov decision processes with multiple constraints

Masayuki Horiguchi 《Mathematical Methods of Operations Research》2001,54(3):455-469


2.

Average cost Markov decision processes under the hypothesis of Doeblin

Masami Kurano 《Annals of Operations Research》1991,29(1):375-385

Average cost Markov decision processes (MDPs) with compact state and action spaces and bounded lower semicontinuous cost functions are considered. Kurano [7] treated the general case, in which several ergodic classes and a transient set are permitted for the Markov process induced by any randomized stationary policy under the hypothesis of Doeblin, and showed the existence of a minimum pair of state and policy. This paper considers the same case as that discussed in Kurano [7] and proves some new results which give an existence theorem for an optimal stationary policy under some reasonable conditions.

3.

Stochastic approximations of constrained discounted Markov decision processes

François Dufour Tomás Prieto-Rumeau 《Journal of Mathematical Analysis and Applications》2014

We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.

4.

First passage Markov decision processes with constraints and varying discount factors

Xiao WU Xiaolong ZOU Xianping GUO 《Frontiers of Mathematics in China》2015,10(4):1005

This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.

5.

Time aggregated Markov decision processes via standard dynamic programming

Edilson F. Arruda 《Operations Research Letters》2011,39(3):193-197

This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space is comprised solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.

6.

Revised simplex algorithm for finite Markov decision processes

M. Sun 《Journal of Optimization Theory and Applications》1993,79(2):405-413

We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.

7.

Target-level criterion in Markov decision processes

M. Bouakiz Y. Kebir 《Journal of Optimization Theory and Applications》1995,86(1):1-15

The Markov decision process is studied under the maximization of the probability that total discounted rewards exceed a target level. We focus on and study the dynamic programming equations of the model. We give various properties of the optimal return operator and, for the infinite planning-horizon model, we characterize the optimal value function as a maximal fixed point of the previous operator. Various turnpike results relating the finite and infinite-horizon models are also given.
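
The target-level criterion can be illustrated with a small finite-horizon recursion. The sketch below is not taken from the paper: the two-state model, its rewards, the discount factor `BETA`, and the choice of a weak inequality (discounted reward at least the target) are all assumptions made for illustration. It tracks the pair (state, remaining target) and rescales the remaining target by the discount factor after each step.

```python
from functools import lru_cache

# Hypothetical two-state, two-action model; every number here is an assumption.
BETA = 0.5                      # discount factor
ACTIONS = (0, 1)
REWARD = {0: {0: 1.0, 1: 0.0},  # REWARD[state][action]
          1: {0: 2.0, 1: 0.0}}
TRANS = {                       # TRANS[state][action] = [(next_state, probability), ...]
    0: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}


@lru_cache(maxsize=None)
def success_prob(state, target, horizon):
    """Maximal probability that the discounted reward collected over
    `horizon` remaining steps is at least `target`."""
    if horizon == 0:
        # Nothing more can be earned: succeed only if the target is already met.
        return 1.0 if target <= 0 else 0.0
    best = 0.0
    for action in ACTIONS:
        # Target still to be met from the next step on, in next-step units.
        residual = (target - REWARD[state][action]) / BETA
        prob = sum(p * success_prob(nxt, residual, horizon - 1)
                   for nxt, p in TRANS[state][action])
        best = max(best, prob)
    return best


if __name__ == "__main__":
    print(success_prob(0, 2.0, 2))
```

The augmented state (state, remaining target) is what makes a dynamic programming equation possible here, since the probability of success depends on how much reward has already been accumulated.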

8.

Markov decision processes with a stopping time constraint

Masayuki Horiguchi 《Mathematical Methods of Operations Research》2001,53(2):279-295


9.

Constrained Markov decision processes with uncertain costs

《Operations Research Letters》2022,50(2):218-223

We consider a finite state-action discounted constrained Markov decision process with uncertain running costs and known transition probabilities. We propose equivalent linear programming, second-order cone programming and semidefinite programming problems for the robust constrained Markov decision processes when the uncertain running cost vectors belong to polytopic, ellipsoidal, and semidefinite cone uncertainty sets, respectively. As an application, we study a variant of a machine replacement problem and perform numerical experiments on randomly generated instances of various sizes.

10.

On an extremal property of Markov chains and sufficiency of Markov strategies in Markov decision processes with the Dubins-Savage criterion

I. M. Sonin 《Annals of Operations Research》1991,29(1):417-426

An inequality regarding the minimum of P(lim inf(X_n ∈ D_n)) is proved for a class of random sequences. This result is related to the problem of sufficiency of Markov strategies for Markov decision processes with the Dubins-Savage criterion, the asymptotic behaviour of nonhomogeneous Markov chains, and some other problems.

11.

A policy improvement method for constrained average Markov decision processes

Hyeong Soo Chang 《Operations Research Letters》2007,35(4):434-438

This brief paper presents a policy improvement method for constrained Markov decision processes (MDPs) with average cost criterion under an ergodicity assumption, extending Howard's policy improvement for MDPs. The improvement method induces a policy iteration-type algorithm that converges to a locally optimal policy.
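
For background, the classical step that this paper extends can be sketched as follows. This is Howard's policy iteration for an unconstrained, discounted-cost MDP, which is a deliberately simpler setting than the constrained average-cost one treated in the paper; the two-state model, its costs, and the discount factor `GAMMA` are hypothetical values chosen for illustration.

```python
# Hypothetical two-state, two-action model; every number here is an assumption.
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
COST = [[1.0, 2.0], [3.0, 0.5]]   # COST[state][action]
P = [                             # P[state][action][next_state]
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]


def q_value(state, action, v):
    """One-step cost plus discounted expected cost-to-go under v."""
    return COST[state][action] + GAMMA * sum(
        P[state][action][t] * v[t] for t in range(N_STATES))


def evaluate(policy, tol=1e-10):
    """Policy evaluation by successive approximation (a contraction for GAMMA < 1)."""
    v = [0.0] * N_STATES
    while True:
        new_v = [q_value(s, policy[s], v) for s in range(N_STATES)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def policy_iteration():
    """Alternate evaluation and improvement until no state can be improved."""
    policy = [0] * N_STATES
    while True:
        v = evaluate(policy)
        improved = list(policy)
        for s in range(N_STATES):
            best_a = min(range(N_ACTIONS), key=lambda a: q_value(s, a, v))
            # Switch only on a clear improvement, to avoid cycling on numerical ties.
            if q_value(s, best_a, v) < v[s] - 1e-8:
                improved[s] = best_a
        if improved == policy:
            return policy, v
        policy = improved


if __name__ == "__main__":
    print(policy_iteration())
```

In the unconstrained discounted case this loop stops at a globally optimal stationary policy; the abstract above claims only convergence to a locally optimal policy for the constrained average-cost extension.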

12.

Markov ratio decision processes

V. Aggarwal R. Chandrasekaran K. P. K. Nair 《Journal of Optimization Theory and Applications》1977,21(1):27-37

A finite-state Markov decision process, in which, associated with each action in each state, there are two rewards, is considered. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they are dependent on the starting state. Also, a finite algorithm for computing the solution is given.
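
The dependence on the starting state can be seen in a toy example. The sketch below is a brute-force enumeration over pure stationary policies, not the finite algorithm of the paper; the deterministic two-state model, the reward tables `R_NUM` and `R_DEN`, the discount factor `BETA`, and the choice to maximize the ratio are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical deterministic two-state model; every number here is an assumption.
BETA = 0.5
STATES = (0, 1)
ACTIONS = (0, 1)
P = {                                           # P[state][action] = {next_state: probability}
    0: {0: {0: 1.0}, 1: {1: 1.0}},
    1: {0: {1: 1.0}, 1: {0: 1.0}},
}
R_NUM = {0: {0: 1.0, 1: 1.0}, 1: {0: 3.0, 1: 1.0}}   # numerator reward
R_DEN = {0: {0: 1.0, 1: 1.0}, 1: {0: 2.0, 1: 1.0}}   # denominator reward (positive)


def discounted_value(policy, reward, sweeps=200):
    """Discounted total reward of a stationary policy, by repeated substitution."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {s: reward[s][policy[s]]
                + BETA * sum(p * v[t] for t, p in P[s][policy[s]].items())
             for s in STATES}
    return v


def best_ratio(start):
    """Best ratio of the two discounted rewards over all pure stationary policies."""
    best = None
    for choice in product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, choice))
        num = discounted_value(policy, R_NUM)
        den = discounted_value(policy, R_DEN)
        ratio = num[start] / den[start]
        if best is None or ratio > best[0]:
            best = (ratio, policy)
    return best


if __name__ == "__main__":
    for s in STATES:
        print(s, best_ratio(s))
```

In this toy model the best ratio differs between the two starting states, which is the kind of dependence the abstract describes. Enumeration grows exponentially with the number of states, so it is only a way to check small cases.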

13.

Convergence of Markov decision processes with constraints and state-action dependent discount factors

Wu Xiao Guo Xianping 《Science China Mathematics》2020,63(1):167-182

This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.

14.

Markov decision processes for infinite horizon problems solved with the cosine simplex method

《Optimization》2012,61(9):1133-1150

This article presents a new method of linear programming (LP) for solving Markov decision processes (MDPs) based on the simplex method (SM). SM has been shown to be the most efficient method in many practical problems; unfortunately, classical SM has an exponential complexity. Therefore, new SMs have emerged for obtaining optimal solutions in the most efficient way. The cosine simplex method (CSM) is one of them. CSM is based on the Karush-Kuhn-Tucker conditions, and is able to efficiently solve general LP problems. This work presents a new method named the Markov Cosine Simplex Method (MCSM) for solving MDP problems, which is an extension of CSM. In this article, the efficiency of MCSM is compared to the traditional revised simplex method (RSM); experimental results show that MCSM is far more efficient than RSM.

15.

Semi-infinite Markov decision processes

Ming Chen Jerzy A. Filar Ke Liu 《Mathematical Methods of Operations Research》2000,51(1):115-137


16.

Markov decision processes and strongly excessive functions

K.M. van Hee J. Wessels 《Stochastic Processes and their Applications》1978,8(1):59-76

Strongly excessive functions play an important role in the theory of Markov decision processes and Markov games. In this paper the following question is investigated: What are the properties of Markov decision processes which possess a strongly excessive function? A probabilistic characterization is presented in the form of a random drift through a partitioned state space. For strongly excessive functions which have a positive lower bound, a characterization is given in terms of the lifetime distribution of the process. Finally we give a characterization in terms of the spectral radius.

17.

Aggregation and disaggregation in Markov decision models for inventory control

L.M.M. Veugen J. van der Wal J. Wessels 《European Journal of Operational Research》1985,20(2):248-254

In this paper the possibility is investigated of using aggregation in the action space for some Markov decision processes of inventory control type. For the standard (s, S) inventory control model the policy improvement procedure can be executed in a very efficient way; therefore, aggregation in the action space is not of much use. However, in situations where the decisions have some aftereffect and, hence, the old decision has to be incorporated in the state, it might be rewarding to aggregate actions. Some variants for aggregation and disaggregation are formulated and analyzed. Numerical evidence is presented.

18.

A useful technique for piecewise deterministic Markov decision processes

《Operations Research Letters》2021,49(1):55-61

This note presents a technique that is useful for the study of piecewise deterministic Markov decision processes (PDMDPs) with general policies and unbounded transition intensities. This technique produces an auxiliary PDMDP from the original one. The auxiliary PDMDP possesses certain desired properties, which may not be possessed by the original PDMDP. We apply this technique to risk-sensitive PDMDPs with total cost criteria, and comment on its connection with the uniformization technique.

19.

Weighted discounted Markov decision processes with perturbation

Ke Liu 《Acta Mathematicae Applicatae Sinica (English Series)》1999,15(2):183-189

1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's; see, for instance, [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

20.

Markov decision processes with multidimensional action spaces

Dimitrios G. Pandelis 《European Journal of Operational Research》2010,200(2):207

We study controlled Markov processes where multiple decisions need to be made for each state. We present conditions on the cost structure and the state transition mechanism of the process under which optimal decisions are restricted to a subset of the decision space. As a result, the numerical computation of the optimal policy may be significantly expedited.