Found 20 similar documents (search time: 15 ms)
1.
Ke Liu 《Acta Mathematicae Applicatae Sinica (English Series)》1999,15(2):183-189
1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's; see for instance [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...
2.
3.
4.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
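For a finite-state, finite-action analogue, the linear programming formulation referred to above can be written in terms of occupation measures. The sketch below is a hedged illustration of that standard LP (it is not the paper's empirical-measure approximation scheme); all model data — P, c, d, kappa, beta, gamma — are invented for the example.

```python
# Minimal sketch: occupation-measure LP for a finite constrained discounted MDP.
# This is a standard finite-dimensional analogue, not the empirical-measure
# approximation studied in the paper; all model data below are illustrative.
import numpy as np
from scipy.optimize import linprog

S, A = 4, 2          # states and actions
beta = 0.95          # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition kernel
c = rng.uniform(0.0, 1.0, size=(S, A))       # cost to be minimized
d = rng.uniform(0.0, 1.0, size=(S, A))       # constraint cost
gamma = np.full(S, 1.0 / S)                  # initial distribution
# Constraint level, chosen so that at least one stationary policy is feasible.
kappa = d.min(axis=1).max() / (1.0 - beta)

# Decision variable: occupation measure mu(s, a) >= 0, flattened to length S*A.
# Balance equations: sum_a mu(s', a) - beta * sum_{s,a} mu(s, a) P(s'|s, a) = gamma(s').
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            col = s * A + a
            A_eq[sp, col] -= beta * P[s, a, sp]
            if s == sp:
                A_eq[sp, col] += 1.0
b_eq = gamma

# Constraint: sum_{s,a} mu(s, a) d(s, a) <= kappa.
A_ub = d.reshape(1, -1)
b_ub = np.array([kappa])

res = linprog(c.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu = res.x.reshape(S, A)
print("optimal constrained discounted cost:", res.fun)
# A (randomized) optimal stationary policy can be read off the occupation measure:
policy = mu / mu.sum(axis=1, keepdims=True)
print("optimal policy (rows = states):")
print(policy)
```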
5.
6.
Moshe Haviv 《Stochastic Processes and their Applications》1985,19(1):151-160
In this paper we suggest a new successive approximation method to compute the optimal discounted reward for finite state and action, discrete time, discounted Markov decision chains. The method is based on a block partitioning of the (stochastic) matrices corresponding to the stationary policies. The method is particularly attractive when the transition matrices are jointly nearly decomposable or nearly completely decomposable.
7.
Douglas John White 《Mathematical Methods of Operations Research》1996,43(3):353-372
In this paper we consider a homotopy deformation approach to solving Markov decision process problems by the continuous deformation of a simpler Markov decision process problem until it is identical with the original problem. Algorithms and performance bounds are given.
8.
Jerzy A. Filar 《Operations Research Letters》1983,2(1):13-15
In a Markovian Decision Process in which the rewards are ordinal rather than cardinal, the decision-maker may be interested in a policy attaining desirable outcomes ‘sufficiently’ often. Here we define such policies more precisely and point out that they can be computed by known techniques.
9.
10.
Juan González-Hernández, Raquiel R. López-Martínez, J. Rubén Pérez-Hernández 《Mathematical Methods of Operations Research》2007,65(1):27-44
In this paper we consider Markov Decision Processes with discounted cost and a random discount rate in Borel spaces. We establish the dynamic programming algorithm in the finite and infinite horizon cases. We provide conditions for the existence of measurable selectors, and we present an example of a consumption-investment problem.
This research was partially supported by the PROMEP grant 103.5/05/40.
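The abstract does not spell out the algorithm; the following is a hedged, drastically simplified sketch of the finite-horizon backward recursion with a random discount factor, assuming finite state and action sets and a discount factor that is independent of the state process and takes finitely many values. All assumptions and data below are illustrative, not from the paper.

```python
# Minimal sketch: finite-horizon dynamic programming with a random discount factor.
# Simplifying assumptions (not from the paper): finite state/action sets and a
# discount factor that is i.i.d., independent of the state, with finitely many values.
import numpy as np

S, A, N = 5, 3, 20                            # states, actions, horizon length
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, s']
c = rng.uniform(0.0, 1.0, size=(S, A))        # one-stage cost
alpha_vals = np.array([0.90, 0.95, 0.99])     # possible discount rates
alpha_prob = np.array([0.3, 0.5, 0.2])        # their probabilities
alpha_bar = float(alpha_vals @ alpha_prob)    # E[alpha]

V = np.zeros(S)                               # terminal cost V_0 = 0
for k in range(1, N + 1):
    # Since alpha is independent of the transition, the expectation factorizes:
    # V_k(s) = min_a { c(s,a) + E[alpha] * sum_{s'} P(s'|s,a) V_{k-1}(s') }.
    V = (c + alpha_bar * P @ V).min(axis=1)

print("optimal expected discounted cost over", N, "stages, per initial state:")
print(V)
```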
11.
This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes.
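As a hedged illustration of the kind of algorithms such a comparison involves (the paper's own variants and test problems are not reproduced), here are two standard value-iteration variants — pre-Jacobi and Gauss–Seidel — run on a small randomly generated discounted MDP.

```python
# Minimal sketch: two standard value-iteration variants for a discounted MDP
# (maximizing reward), of the kind such computational comparisons consider.
import numpy as np

S, A, beta, tol = 20, 4, 0.9, 1e-8
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(S, A))       # reward

def jacobi_vi(V):
    """Standard (pre-Jacobi) value iteration: update all states from the old V."""
    iters = 0
    while True:
        V_new = (r + beta * P @ V).max(axis=1)
        iters += 1
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, iters
        V = V_new

def gauss_seidel_vi(V):
    """Gauss-Seidel value iteration: use already-updated components within a sweep."""
    V = V.copy()
    iters = 0
    while True:
        delta = 0.0
        for s in range(S):
            v_old = V[s]
            V[s] = (r[s] + beta * P[s] @ V).max()
            delta = max(delta, abs(V[s] - v_old))
        iters += 1
        if delta < tol:
            return V, iters

V_j, n_j = jacobi_vi(np.zeros(S))
V_gs, n_gs = gauss_seidel_vi(np.zeros(S))
print("Jacobi sweeps:", n_j, " Gauss-Seidel sweeps:", n_gs)
print("max difference between the two fixed points:", np.max(np.abs(V_j - V_gs)))
```

In practice the Gauss–Seidel sweep often needs fewer iterations for the same accuracy, which is the kind of difference such comparisons measure.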
12.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
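In the spirit of the abstract (a standard-form optimality equation; the notation is illustrative and not the paper's own), the fixed point being computed is

```latex
% Optimality equation with a state-dependent discount factor alpha(x);
% standard form, notation illustrative rather than taken from the paper.
V^{*}(x) \;=\; \inf_{a \in A(x)} \Bigl\{ c(x,a) \;+\; \alpha(x) \int_{X} V^{*}(y)\, Q(dy \mid x,a) \Bigr\},
\qquad x \in X,
```

and the iteration algorithm mentioned computes $V^{*}$ as a limit of successive applications of the operator on the right-hand side.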
13.
In this paper, an Envelope Theorem (ET) will be established for optimization problems on Euclidean spaces. In general, Envelope Theorems permit analyzing an optimization problem and giving the solution by means of differentiability techniques. The ET will be presented in two versions: one of them uses concavity assumptions, whereas the other does not require such assumptions. Thereafter, the ET established will be applied to Markov Decision Processes (MDPs) on Euclidean spaces, discounted and with infinite horizon. As the first application, several examples (including some economic models) of discounted MDPs for which the ET allows determining the value iteration functions will be presented; this permits obtaining the corresponding optimal value functions and the optimal policies. As the second application of the ET, it will be proved that, under differentiability conditions on the transition law, the reward function, and the noise of the system, the value function and the optimal policy of the problem are differentiable with respect to the state of the system. Besides, various examples are provided to illustrate these differentiability conditions. This work was partially supported by Benemérita Universidad Autónoma de Puebla (BUAP) under grant VIEP-BUAP 38/EXC/06-G, by Consejo Nacional de Ciencia y Tecnología (CONACYT), and by Evaluation-orientation de la COopération Scientifique (ECOS) under grant CONACyT-ECOS M06-M01.
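For orientation, the textbook form of the result (hedged: the paper's exact hypotheses and its two versions are not reproduced) is: if $V(\theta)=\max_{x\in X} f(x,\theta)$ and $x^{*}(\theta)$ is a maximizer, then under suitable differentiability (and, in one version, concavity) assumptions

```latex
% Textbook form of the Envelope Theorem; hypotheses abbreviated, notation illustrative.
V(\theta) \;=\; \max_{x \in X} f(x,\theta)
\quad\Longrightarrow\quad
V'(\theta) \;=\; \frac{\partial f}{\partial \theta}\bigl(x^{*}(\theta),\,\theta\bigr).
```

In the MDP applications described above, $\theta$ plays the role of the state of the system, which is (roughly) how differentiability of the value function with respect to the state is obtained.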
14.
An improved algorithm for solving communicating average reward Markov decision processes
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend use of this algorithm instead of unichain policy iteration.
This research has been partially supported by NSERC Grant A-5527.
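For contrast with the modified improvement step described above, here is a hedged sketch of plain policy iteration for a finite average-reward MDP, assuming every policy encountered is unichain; the paper's communicating-MDP refinement (selecting only unichain policies) is not reproduced, and the model data are invented.

```python
# Minimal sketch: policy iteration for a finite average-reward MDP, assuming every
# stationary policy encountered is unichain.  The paper's modified improvement step
# (which selects unichain policies in a communicating MDP) is not reproduced.
import numpy as np

S, A = 6, 3
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(S, A))       # reward

def evaluate(policy):
    """Solve g + h(s) = r_pi(s) + sum_{s'} P_pi(s,s') h(s') with h(0) = 0."""
    P_pi = P[np.arange(S), policy]            # (S, S)
    r_pi = r[np.arange(S), policy]            # (S,)
    # Unknowns: (g, h(1), ..., h(S-1)); h(0) is pinned to 0.
    M = np.zeros((S, S))
    M[:, 0] = 1.0                             # coefficient of the gain g
    M[:, 1:] = np.eye(S)[:, 1:] - P_pi[:, 1:]
    sol = np.linalg.solve(M, r_pi)
    g, h = sol[0], np.concatenate(([0.0], sol[1:]))
    return g, h

policy = np.zeros(S, dtype=int)
while True:
    g, h = evaluate(policy)
    new_policy = (r + P @ h).argmax(axis=1)   # improvement step on the bias h
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal average reward (gain):", g)
print("optimal policy:", policy)
```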
15.
Eugene A. Feinberg 《Mathematical Methods of Operations Research》1994,39(3):257-288
This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n → ∞. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these criteria even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
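In symbols (a hedged transcription of the two verbal definitions, with $r_k$ and $\tau_k$ the reward and the sojourn time of the $k$-th decision epoch under a policy $\pi$; the paper's own notation is not reproduced):

```latex
% Ratio-average and time-average reward criteria for an SMDP (standard forms).
\text{ratio-average: } \liminf_{n \to \infty}
\frac{\mathbb{E}^{\pi}\!\left[\sum_{k=0}^{n-1} r_k\right]}
     {\mathbb{E}^{\pi}\!\left[\sum_{k=0}^{n-1} \tau_k\right]},
\qquad
\text{time-average: } \liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^{\pi}\!\left[R(T)\right],
```

where $R(T)$ denotes the total reward accumulated up to (continuous) time $T$.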
16.
17.
Eugene A. Feinberg, Aleksey B. Piunovskiy 《Journal of Mathematical Analysis and Applications》2002,273(1):93-111
We consider a Markov decision process with an uncountable state space for which the vector performance functional has the form of expected total rewards. Under the single condition that initial distribution and transition probabilities are nonatomic, we prove that the performance space coincides with that generated by nonrandomized Markov policies. We also provide conditions for the existence of optimal policies when the goal is to maximize one component of the performance vector subject to inequality constraints on other components. We illustrate our results with examples of production and financial problems.
18.
Tetsuichiro Iki, Masayuki Horiguchi, Masami Kurano 《Mathematical Methods of Operations Research》2007,66(3):545-555
In this paper, we are concerned with a new algorithm for multichain finite state Markov decision processes which finds an average optimal policy through the decomposition of the state space into some communicating classes and a transient class. For each communicating class, a relatively optimal policy is found, which is used to find an optimal policy by applying the value iteration algorithm. Using a pattern matrix determining the behaviour pattern of the decision process, the decomposition of the state space is done effectively, so that the proposed algorithm simplifies the structured one given in Leizarowitz's paper (Math Oper Res 28:553–586, 2003). A numerical example is also given to illustrate the algorithm.
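The decomposition step can be illustrated with a hedged sketch: communicating classes are exactly the strongly connected components of the directed graph with an edge s → s' whenever some action gives P(s'|s,a) > 0. Only this decomposition is shown; the pattern-matrix construction and the per-class value iteration of the paper are not reproduced, and the random kernel below is invented.

```python
# Minimal sketch: decompose the state space of a finite MDP into communicating
# classes, i.e. strongly connected components of the graph with an edge s -> s'
# whenever P(s'|s,a) > 0 for some action a.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

S, A = 8, 2
rng = np.random.default_rng(4)
# A sparse random support so that a nontrivial class structure can appear
# (the structure depends on the draw).
mask = rng.random((S, A, S)) < 0.15
mask[np.arange(S), :, np.arange(S)] = True          # keep a self-loop so rows are nonzero
P = rng.dirichlet(np.ones(S), size=(S, A)) * mask
P = P / P.sum(axis=2, keepdims=True)                 # renormalize each row

reach = (P.max(axis=1) > 0)                          # edge s -> s' under some action
n_classes, labels = connected_components(csr_matrix(reach.astype(np.int8)),
                                          directed=True, connection='strong')
for cls in range(n_classes):
    print("class", cls, ":", np.where(labels == cls)[0])
```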
19.
《Optimization》2012,61(3):431-455
The aim of this paper is to give a survey of recent developments in the area of successive approximations for Markov decision processes and Markov games. We will emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense, and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence. With respect to the second aspect we will present four ideas, which can be applied in conjunction, that may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds (MacQueen bounds); 2. the use of alternative policy improvement procedures (based on stopping times); 3. a better evaluation of the values of actual policies in each iteration step by a value-oriented approach; 4. the elimination of suboptimal actions not only permanently, but also temporarily. The general presentation is given for Markov decision processes, with a final section devoted to the possibilities of extension to Markov games.
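As a hedged illustration of idea 1 (a standard textbook rendering of the MacQueen bounds, not the survey's own presentation), discounted value iteration can be wrapped with the extrapolated lower and upper bounds and stopped once they agree to the desired accuracy; ideas 2-4 are not shown, and the model data are invented.

```python
# Minimal sketch: discounted value iteration with MacQueen-style extrapolation
# bounds (idea 1 of the survey).  Model data are illustrative; the other three
# ideas (stopping-time improvements, value-oriented evaluation, temporary action
# elimination) are not shown.
import numpy as np

S, A, beta, eps = 15, 3, 0.95, 1e-6
rng = np.random.default_rng(5)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(S, A))       # reward (maximization)

V = np.zeros(S)
while True:
    V_new = (r + beta * P @ V).max(axis=1)
    diff = V_new - V
    lower = V_new + beta / (1.0 - beta) * diff.min()   # MacQueen lower bound on V*
    upper = V_new + beta / (1.0 - beta) * diff.max()   # MacQueen upper bound on V*
    V = V_new
    if (upper - lower).max() < eps:
        break

print("value function bracketed within", eps, ":")
print((lower + upper) / 2.0)
```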