Found 20 similar documents (search time: 15 ms)
1.
Ke Liu 《Acta Mathematicae Applicatae Sinica (English Series)》1999,15(2):183-189
1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's; see for instance [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...
2.
3.
4.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
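For a finite-state, finite-action analogue, the linear programming formulation referred to above can be written in terms of occupation measures. The sketch below is a hedged illustration of that standard LP (it is not the paper's empirical-measure approximation scheme); all model data — P, c, d, kappa, beta, gamma — are invented for the example.

```python
# Minimal sketch: occupation-measure LP for a finite constrained discounted MDP.
# This is a standard finite-dimensional analogue, not the empirical-measure
# approximation studied in the paper; all model data below are illustrative.
import numpy as np
from scipy.optimize import linprog

S, A = 4, 2          # states and actions
beta = 0.95          # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition kernel
c = rng.uniform(0.0, 1.0, size=(S, A))       # cost to be minimized
d = rng.uniform(0.0, 1.0, size=(S, A))       # constraint cost
gamma = np.full(S, 1.0 / S)                  # initial distribution
# Constraint level, chosen so that at least one stationary policy is feasible.
kappa = d.min(axis=1).max() / (1.0 - beta)

# Decision variable: occupation measure mu(s, a) >= 0, flattened to length S*A.
# Balance equations: sum_a mu(s', a) - beta * sum_{s,a} mu(s, a) P(s'|s, a) = gamma(s').
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            col = s * A + a
            A_eq[sp, col] -= beta * P[s, a, sp]
            if s == sp:
                A_eq[sp, col] += 1.0
b_eq = gamma

# Constraint: sum_{s,a} mu(s, a) d(s, a) <= kappa.
A_ub = d.reshape(1, -1)
b_ub = np.array([kappa])

res = linprog(c.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu = res.x.reshape(S, A)
print("optimal constrained discounted cost:", res.fun)
# A (randomized) optimal stationary policy can be read off the occupation measure:
policy = mu / mu.sum(axis=1, keepdims=True)
print("optimal policy (rows = states):")
print(policy)
```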
5.
6.
Moshe Haviv 《Stochastic Processes and their Applications》1985,19(1):151-160
In this paper we suggest a new successive approximation method to compute the optimal discounted reward for finite state and action, discrete time, discounted Markov decision chains. The method is based on a block partitioning of the (stochastic) matrices corresponding to the stationary policies. The method is particularly attractive when the transition matrices are jointly nearly decomposable or nearly completely decomposable.
7.
Douglas John White 《Mathematical Methods of Operations Research》1996,43(3):353-372
In this paper we consider a homotopy deformation approach to solving Markov decision process problems by the continuous deformation of a simpler Markov decision process problem until it is identical with the original problem. Algorithms and performance bounds are given.
8.
Jerzy A. Filar 《Operations Research Letters》1983,2(1):13-15
In a Markovian Decision Process in which the rewards are ordinal rather than cardinal, the decision-maker may be interested in a policy attaining desirable outcomes ‘sufficiently’ often. Here we define such policies more precisely and point out that they can be computed by known techniques.
9.
10.
Juan González-Hernández, Raquiel R. López-Martínez, J. Rubén Pérez-Hernández 《Mathematical Methods of Operations Research》2007,65(1):27-44
In this paper we consider Markov Decision Processes with discounted cost and a random discount rate in Borel spaces. We establish the dynamic programming algorithm in the finite and infinite horizon cases. We provide conditions for the existence of measurable selectors, and we present an example of a consumption-investment problem.
This research was partially supported by the PROMEP grant 103.5/05/40.
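The abstract does not spell out the algorithm; the following is a hedged, drastically simplified sketch of the finite-horizon backward recursion with a random discount factor, assuming finite state and action sets and a discount factor that is independent of the state process and takes finitely many values. All assumptions and data below are illustrative, not from the paper.

```python
# Minimal sketch: finite-horizon dynamic programming with a random discount factor.
# Simplifying assumptions (not from the paper): finite state/action sets and a
# discount factor that is i.i.d., independent of the state, with finitely many values.
import numpy as np

S, A, N = 5, 3, 20                            # states, actions, horizon length
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, s']
c = rng.uniform(0.0, 1.0, size=(S, A))        # one-stage cost
alpha_vals = np.array([0.90, 0.95, 0.99])     # possible discount rates
alpha_prob = np.array([0.3, 0.5, 0.2])        # their probabilities
alpha_bar = float(alpha_vals @ alpha_prob)    # E[alpha]

V = np.zeros(S)                               # terminal cost V_0 = 0
for k in range(1, N + 1):
    # Since alpha is independent of the transition, the expectation factorizes:
    # V_k(s) = min_a { c(s,a) + E[alpha] * sum_{s'} P(s'|s,a) V_{k-1}(s') }.
    V = (c + alpha_bar * P @ V).min(axis=1)

print("optimal expected discounted cost over", N, "stages, per initial state:")
print(V)
```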
11.
This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes.
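As a hedged illustration of the kind of algorithms such a comparison involves (the paper's own variants and test problems are not reproduced), here are two standard value-iteration variants — pre-Jacobi and Gauss–Seidel — run on a small randomly generated discounted MDP.

```python
# Minimal sketch: two standard value-iteration variants for a discounted MDP
# (maximizing reward), of the kind such computational comparisons consider.
import numpy as np

S, A, beta, tol = 20, 4, 0.9, 1e-8
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(S, A))       # reward

def jacobi_vi(V):
    """Standard (pre-Jacobi) value iteration: update all states from the old V."""
    iters = 0
    while True:
        V_new = (r + beta * P @ V).max(axis=1)
        iters += 1
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, iters
        V = V_new

def gauss_seidel_vi(V):
    """Gauss-Seidel value iteration: use already-updated components within a sweep."""
    V = V.copy()
    iters = 0
    while True:
        delta = 0.0
        for s in range(S):
            v_old = V[s]
            V[s] = (r[s] + beta * P[s] @ V).max()
            delta = max(delta, abs(V[s] - v_old))
        iters += 1
        if delta < tol:
            return V, iters

V_j, n_j = jacobi_vi(np.zeros(S))
V_gs, n_gs = gauss_seidel_vi(np.zeros(S))
print("Jacobi sweeps:", n_j, " Gauss-Seidel sweeps:", n_gs)
print("max difference between the two fixed points:", np.max(np.abs(V_j - V_gs)))
```

In practice the Gauss–Seidel sweep often needs fewer iterations for the same accuracy, which is the kind of difference such comparisons measure.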
12.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
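In the spirit of the abstract (a standard-form optimality equation; the notation is illustrative and not the paper's own), the fixed point being computed is

```latex
% Optimality equation with a state-dependent discount factor alpha(x);
% standard form, notation illustrative rather than taken from the paper.
V^{*}(x) \;=\; \inf_{a \in A(x)} \Bigl\{ c(x,a) \;+\; \alpha(x) \int_{X} V^{*}(y)\, Q(dy \mid x,a) \Bigr\},
\qquad x \in X,
```

and the iteration algorithm mentioned computes $V^{*}$ as a limit of successive applications of the operator on the right-hand side.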
13.
In this paper, an Envelope Theorem (ET) will be established for optimization problems on Euclidean spaces. In general, Envelope Theorems permit analyzing an optimization problem and giving the solution by means of differentiability techniques. The ET will be presented in two versions: one of them uses concavity assumptions, whereas the other does not require such assumptions. Thereafter, the ET established will be applied to Markov Decision Processes (MDPs) on Euclidean spaces, discounted and with infinite horizon. As the first application, several examples (including some economic models) of discounted MDPs for which the ET allows determining the value iteration functions will be presented; this permits obtaining the corresponding optimal value functions and the optimal policies. As the second application of the ET, it will be proved that, under differentiability conditions on the transition law, the reward function, and the noise of the system, the value function and the optimal policy of the problem are differentiable with respect to the state of the system. Besides, various examples are provided to illustrate these differentiability conditions. This work was partially supported by Benemérita Universidad Autónoma de Puebla (BUAP) under grant VIEP-BUAP 38/EXC/06-G, by Consejo Nacional de Ciencia y Tecnología (CONACYT), and by Evaluation-orientation de la COopération Scientifique (ECOS) under grant CONACyT-ECOS M06-M01.
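For orientation, the textbook form of the result (hedged: the paper's exact hypotheses and its two versions are not reproduced) is: if $V(\theta)=\max_{x\in X} f(x,\theta)$ and $x^{*}(\theta)$ is a maximizer, then under suitable differentiability (and, in one version, concavity) assumptions

```latex
% Textbook form of the Envelope Theorem; hypotheses abbreviated, notation illustrative.
V(\theta) \;=\; \max_{x \in X} f(x,\theta)
\quad\Longrightarrow\quad
V'(\theta) \;=\; \frac{\partial f}{\partial \theta}\bigl(x^{*}(\theta),\,\theta\bigr).
```

In the MDP applications described above, $\theta$ plays the role of the state of the system, which is (roughly) how differentiability of the value function with respect to the state is obtained.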
14.
An improved algorithm for solving communicating average reward Markov decision processes
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend use of this algorithm instead of unichain policy iteration.
This research has been partially supported by NSERC Grant A-5527.
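For contrast with the modified improvement step described above, here is a hedged sketch of plain policy iteration for a finite average-reward MDP, assuming every policy encountered is unichain; the paper's communicating-MDP refinement (selecting only unichain policies) is not reproduced, and the model data are invented.

```python
# Minimal sketch: policy iteration for a finite average-reward MDP, assuming every
# stationary policy encountered is unichain.  The paper's modified improvement step
# (which selects unichain policies in a communicating MDP) is not reproduced.
import numpy as np

S, A = 6, 3
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(S, A))       # reward

def evaluate(policy):
    """Solve g + h(s) = r_pi(s) + sum_{s'} P_pi(s,s') h(s') with h(0) = 0."""
    P_pi = P[np.arange(S), policy]            # (S, S)
    r_pi = r[np.arange(S), policy]            # (S,)
    # Unknowns: (g, h(1), ..., h(S-1)); h(0) is pinned to 0.
    M = np.zeros((S, S))
    M[:, 0] = 1.0                             # coefficient of the gain g
    M[:, 1:] = np.eye(S)[:, 1:] - P_pi[:, 1:]
    sol = np.linalg.solve(M, r_pi)
    g, h = sol[0], np.concatenate(([0.0], sol[1:]))
    return g, h

policy = np.zeros(S, dtype=int)
while True:
    g, h = evaluate(policy)
    new_policy = (r + P @ h).argmax(axis=1)   # improvement step on the bias h
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal average reward (gain):", g)
print("optimal policy:", policy)
```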
15.
Eugene A. Feinberg 《Mathematical Methods of Operations Research》1994,39(3):257-288
This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n → ∞. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these criteria even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
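In symbols (a hedged transcription of the two verbal definitions, with $r_k$ and $\tau_k$ the reward and the sojourn time of the $k$-th decision epoch under a policy $\pi$; the paper's own notation is not reproduced):

```latex
% Ratio-average and time-average reward criteria for an SMDP (standard forms).
\text{ratio-average: } \liminf_{n \to \infty}
\frac{\mathbb{E}^{\pi}\!\left[\sum_{k=0}^{n-1} r_k\right]}
     {\mathbb{E}^{\pi}\!\left[\sum_{k=0}^{n-1} \tau_k\right]},
\qquad
\text{time-average: } \liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^{\pi}\!\left[R(T)\right],
```

where $R(T)$ denotes the total reward accumulated up to (continuous) time $T$.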
16.
17.
Eugene A. Feinberg, Aleksey B. Piunovskiy 《Journal of Mathematical Analysis and Applications》2002,273(1):93-111
We consider a Markov decision process with an uncountable state space for which the vector performance functional has the form of expected total rewards. Under the single condition that initial distribution and transition probabilities are nonatomic, we prove that the performance space coincides with that generated by nonrandomized Markov policies. We also provide conditions for the existence of optimal policies when the goal is to maximize one component of the performance vector subject to inequality constraints on other components. We illustrate our results with examples of production and financial problems.
18.
Tetsuichiro Iki, Masayuki Horiguchi, Masami Kurano 《Mathematical Methods of Operations Research》2007,66(3):545-555
In this paper, we are concerned with a new algorithm for multichain finite state Markov decision processes which finds an average optimal policy through the decomposition of the state space into some communicating classes and a transient class. For each communicating class, a relatively optimal policy is found, which is used to find an optimal policy by applying the value iteration algorithm. Using a pattern matrix determining the behaviour pattern of the decision process, the decomposition of the state space is done effectively, so that the proposed algorithm simplifies the structured one given in Leizarowitz's paper (Math Oper Res 28:553–586, 2003). A numerical example is also given to illustrate the algorithm.
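The decomposition step can be illustrated with a hedged sketch: communicating classes are exactly the strongly connected components of the directed graph with an edge s → s' whenever some action gives P(s'|s,a) > 0. Only this decomposition is shown; the pattern-matrix construction and the per-class value iteration of the paper are not reproduced, and the random kernel below is invented.

```python
# Minimal sketch: decompose the state space of a finite MDP into communicating
# classes, i.e. strongly connected components of the graph with an edge s -> s'
# whenever P(s'|s,a) > 0 for some action a.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

S, A = 8, 2
rng = np.random.default_rng(4)
# A sparse random support so that a nontrivial class structure can appear
# (the structure depends on the draw).
mask = rng.random((S, A, S)) < 0.15
mask[np.arange(S), :, np.arange(S)] = True          # keep a self-loop so rows are nonzero
P = rng.dirichlet(np.ones(S), size=(S, A)) * mask
P = P / P.sum(axis=2, keepdims=True)                 # renormalize each row

reach = (P.max(axis=1) > 0)                          # edge s -> s' under some action
n_classes, labels = connected_components(csr_matrix(reach.astype(np.int8)),
                                          directed=True, connection='strong')
for cls in range(n_classes):
    print("class", cls, ":", np.where(labels == cls)[0])
```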
19.
《Optimization》2012,61(3):431-455
The aim of this paper is to give a survey of recent developments in the area of successive approximations for Markov decision processes and Markov games. We will emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense, and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence. With respect to the second aspect we will present four ideas, which can be applied in conjunction, that may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds (MacQueen bounds); 2. the use of alternative policy improvement procedures (based on stopping times); 3. a better evaluation of the values of actual policies in each iteration step by a value-oriented approach; 4. the elimination of suboptimal actions not only permanently, but also temporarily. The general presentation is given for Markov decision processes, with a final section devoted to the possibilities of extension to Markov games.
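As a hedged illustration of idea 1 (a standard textbook rendering of the MacQueen bounds, not the survey's own presentation), discounted value iteration can be wrapped with the extrapolated lower and upper bounds and stopped once they agree to the desired accuracy; ideas 2-4 are not shown, and the model data are invented.

```python
# Minimal sketch: discounted value iteration with MacQueen-style extrapolation
# bounds (idea 1 of the survey).  Model data are illustrative; the other three
# ideas (stopping-time improvements, value-oriented evaluation, temporary action
# elimination) are not shown.
import numpy as np

S, A, beta, eps = 15, 3, 0.95, 1e-6
rng = np.random.default_rng(5)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(S, A))       # reward (maximization)

V = np.zeros(S)
while True:
    V_new = (r + beta * P @ V).max(axis=1)
    diff = V_new - V
    lower = V_new + beta / (1.0 - beta) * diff.min()   # MacQueen lower bound on V*
    upper = V_new + beta / (1.0 - beta) * diff.max()   # MacQueen upper bound on V*
    V = V_new
    if (upper - lower).max() < eps:
        break

print("value function bracketed within", eps, ":")
print((lower + upper) / 2.0)
```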