Similar Articles
Found 20 similar articles.
1.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in numerically approximating the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μn of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which bounds the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
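As a toy illustration of item 1, the stochastic Beverton–Holt dynamics and a Monte-Carlo estimate of a discounted cost under a fixed harvesting rule can be sketched as follows (the growth parameters, noise model, harvesting control, and cost are all invented for illustration and are not taken from the paper):

```python
import random

def beverton_holt_step(x, r=2.0, k=10.0, noise=0.1, rng=random):
    # Beverton-Holt growth with multiplicative noise; r, k and the
    # noise level are illustrative choices, not from the paper.
    shock = 1.0 + noise * (2.0 * rng.random() - 1.0)
    return (r * x / (1.0 + x / k)) * shock

def discounted_cost(x0, policy_harvest, alpha=0.95, horizon=200, seed=0):
    # Monte-Carlo estimate of the discounted cost of a fixed harvesting
    # rule (negative reward convention: larger harvest = lower cost).
    rng = random.Random(seed)
    x, total, disc = x0, 0.0, 1.0
    for _ in range(horizon):
        h = policy_harvest(x)          # amount harvested in this state
        total += disc * (-h)           # cost = minus the harvested amount
        disc *= alpha
        x = beverton_holt_step(max(x - h, 0.0), rng=rng)
    return total
```

For example, `discounted_cost(5.0, lambda x: 0.1 * x)` estimates the cost of harvesting a constant 10% fraction of the population each period.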

2.
1. Introduction. The weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1-6] and so on. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

3.
4.
We generalize the geometric discount of finite discounted cost Markov Decision Processes to “exponentially representable” discount functions, prove existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal “N-stationary” policies in general do not exist.
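The N-stationary structure above can be illustrated with a small backward-induction sketch in which the discount applied at stage t is non-geometric before some stage N and geometric afterwards (the 2-state chain, costs, and discount function are made-up examples, not the paper's algorithm):

```python
# Backward induction for a finite-horizon MDP with a stage-dependent
# discount function d(t). Model data are illustrative.
P = {  # P[action][state] = transition distribution over states
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.6, 0.4]],
}
C = {0: [1.0, 2.0], 1: [1.5, 0.5]}  # C[action][state] = one-step cost

def d(t):
    # Discount applied at stage t: irregular early on, geometric
    # (the "exponentially representable" tail) from stage N = 3 onward.
    return 0.5 if t < 3 else 0.9

def backward_induction(T):
    V = [0.0, 0.0]
    policy = []
    for t in reversed(range(T)):
        Q = {a: [C[a][s] + d(t) * sum(p * v for p, v in zip(P[a][s], V))
                 for s in range(2)] for a in P}
        policy.append([min(P, key=lambda a: Q[a][s]) for s in range(2)])
        V = [min(Q[a][s] for a in P) for s in range(2)]
    policy.reverse()
    return V, policy
```

The returned `policy[t]` gives, for each state, the minimizing action at stage t.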

5.
We consider Markov control processes with Borel state space and Feller transition probabilities, satisfying some generalized geometric ergodicity conditions. We provide a new theorem on the existence of a solution to the average cost optimality equation.

6.
We consider constrained discounted-cost Markov control processes in Borel spaces, with unbounded costs. Conditions are given for the constrained problem to be solvable, and also equivalent to an equality-constrained (EC) linear program. In addition, it is shown that there is no duality gap between EC and its dual program EC*, and that, under additional assumptions, EC* is also solvable, so that in fact the strong duality condition holds. Finally, a Farkas-like theorem is included, which gives necessary and sufficient conditions for the primal program EC to be consistent.

7.
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space is comprised solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.
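For a chain with a single uncontrollable state, collapsing sojourns in the uncontrollable set into direct transitions between controllable states can be sketched as follows (the transition matrix and the choice of uncontrollable state are illustrative, not from the note):

```python
# Reducing a chain to its controllable states (a sketch): transitions
# that pass through the uncontrollable state u are folded into one-step
# transitions between controllable states. With a single uncontrollable
# state, (I - P_UU)^{-1} is just the scalar 1 / (1 - P[u][u]).
P = [
    [0.5, 0.2, 0.3],   # states 0, 1 are controllable
    [0.1, 0.6, 0.3],
    [0.4, 0.3, 0.3],   # state 2 is uncontrollable: its row is not chosen
]

def censor_to_controllable(P, u=2):
    stay = 1.0 / (1.0 - P[u][u])        # geometric sojourn factor at u
    reduced = []
    for i in range(len(P)):
        if i == u:
            continue
        row = [P[i][j] + P[i][u] * stay * P[u][j]
               for j in range(len(P)) if j != u]
        reduced.append(row)
    return reduced
```

Each reduced row still sums to 1, so the result is a proper transition matrix on the controllable states alone.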

8.
In this paper, we consider discounted-reward finite-state Markov decision processes which depend on unknown parameters. An adaptive policy inspired by the nonstationary value iteration scheme of Federgruen and Schweitzer (Ref. 1) is proposed. This policy is briefly compared with the principle of estimation and control recently obtained by Schäl (Ref. 4).This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grant No. PCCBBNA-005008, in part by a grant from the IBM Corporation, in part by the Air Force Office of Scientific Research under Grant No. AFOSR-79-0025, in part by the National Science Foundation under Grant No. ECS-0822033, and in part by the Joint Services Electronics Program under Contract No. F49620-77-C-0101.
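The estimation half of such an adaptive scheme can be sketched by counting observed transitions and normalizing, then acting as if the estimates were exact (a bare certainty-equivalence sketch with invented data; the schemes in the paper are more refined):

```python
import random

# True transition matrix, unknown to the controller; used here only
# to generate synthetic observations.
TRUE_P = [[0.7, 0.3], [0.4, 0.6]]

def estimate_transitions(n_samples, seed=0):
    # Count observed transitions out of each state and normalize.
    rng = random.Random(seed)
    counts = [[1, 1], [1, 1]]       # add-one smoothing avoids zero rows
    for s in range(2):
        for _ in range(n_samples):
            s2 = 0 if rng.random() < TRUE_P[s][0] else 1
            counts[s][s2] += 1
    return [[c / sum(row) for c in row] for row in counts]
```

The estimated matrix can then be plugged into any standard discounted-MDP solver in place of the unknown one.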

9.
In this paper we consider a homotopy deformation approach to solving Markov decision process problems by the continuous deformation of a simpler Markov decision process problem until it is identical with the original problem. Algorithms and performance bounds are given.
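A minimal homotopy sketch, assuming linear interpolation between a trivially solvable transition law and the target one, with value iteration warm-started along the path (all model data are invented; the paper's algorithms and bounds are not reproduced here):

```python
# Deform an easy MDP (absorbing dynamics) into the target MDP in steps
# t = 0, 1/10, ..., 1, reusing the previous value function each time.
P_easy = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[1.0, 0.0], [0.0, 1.0]]}
P_hard = {0: [[0.7, 0.3], [0.4, 0.6]], 1: [[0.2, 0.8], [0.9, 0.1]]}
C = {0: [1.0, 2.0], 1: [1.5, 0.5]}   # C[action][state] = one-step cost
ALPHA = 0.9

def vi_step(V, P):
    # One Bellman (minimization) step under transition law P.
    return [min(C[a][s] + ALPHA * sum(p * v for p, v in zip(P[a][s], V))
                for a in P) for s in range(2)]

def homotopy_solve(steps=10, inner=200):
    V = [0.0, 0.0]
    for k in range(steps + 1):
        t = k / steps
        P_t = {a: [[(1 - t) * pe + t * ph for pe, ph in zip(re, rh)]
                   for re, rh in zip(P_easy[a], P_hard[a])] for a in P_easy}
        for _ in range(inner):          # warm start from the previous V
            V = vi_step(V, P_t)
    return V
```

At t = 1 the deformed problem coincides with the target, so the returned V approximates its optimal value function.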

10.
We study controlled Markov processes where multiple decisions need to be made for each state. We present conditions on the cost structure and the state transition mechanism of the process under which optimal decisions are restricted to a subset of the decision space. As a result, the numerical computation of the optimal policy may be significantly expedited.

11.
12.
13.
We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network.

14.
15.
In this paper, an Envelope Theorem (ET) is established for optimization problems on Euclidean spaces. In general, Envelope Theorems permit analyzing an optimization problem and giving its solution by means of differentiability techniques. The ET is presented in two versions: one uses concavity assumptions, whereas the other does not require such assumptions. The ET is then applied to discounted, infinite-horizon Markov Decision Processes (MDPs) on Euclidean spaces. As a first application, several examples (including some economic models) of discounted MDPs for which the ET allows determining the value iteration functions are presented; this yields the corresponding optimal value functions and optimal policies. As a second application of the ET, it is proved that, under differentiability conditions on the transition law, the reward function, and the noise of the system, the value function and the optimal policy of the problem are differentiable with respect to the state of the system. Various examples illustrating these differentiability conditions are also provided. This work was partially supported by Benemérita Universidad Autónoma de Puebla (BUAP) under grant VIEP-BUAP 38/EXC/06-G, by Consejo Nacional de Ciencia y Tecnología (CONACYT), and by Evaluation-orientation de la COopération Scientifique (ECOS) under grant CONACyT-ECOS M06-M01.

16.
17.
This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.
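The convergence of finite-horizon value functions to the infinite-horizon value function can be illustrated (for the unconstrained discounted case) on a made-up 2-state, 2-action example:

```python
# Finite-horizon values V_n converge geometrically to the discounted
# infinite-horizon value. Model data are invented for illustration.
P = {0: [[0.8, 0.2], [0.3, 0.7]], 1: [[0.1, 0.9], [0.5, 0.5]]}
C = {0: [2.0, 1.0], 1: [0.5, 3.0]}   # C[action][state] = one-step cost
ALPHA = 0.9

def bellman(V):
    # One dynamic-programming step: V_{n+1}(s) = min_a C(s,a) + alpha E[V_n].
    return [min(C[a][s] + ALPHA * sum(p * v for p, v in zip(P[a][s], V))
                for a in P) for s in range(2)]

def finite_horizon_values(n):
    V = [0.0, 0.0]
    for _ in range(n):
        V = bellman(V)
    return V
```

Since the one-step costs are positive, V_n increases in n and stabilizes once alpha^n becomes negligible.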

18.
This note describes sufficient conditions under which total-cost and average-cost Markov decision processes (MDPs) with general state and action spaces, and with weakly continuous transition probabilities, can be reduced to discounted MDPs. For undiscounted problems, these reductions imply the validity of optimality equations and the existence of stationary optimal policies. The reductions also provide methods for computing optimal policies. The results are applied to a capacitated inventory control problem with fixed costs and lost sales.

19.
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.

20.
Average cost Markov decision processes (MDPs) with compact state and action spaces and bounded lower semicontinuous cost functions are considered. Kurano [7] treated the general case, in which several ergodic classes and a transient set are permitted for the Markov process induced by any randomized stationary policy under Doeblin's hypothesis, and showed the existence of a minimum pair of state and policy. This paper considers the same case as that discussed in Kurano [7] and proves some new results which give an existence theorem for an optimal stationary policy under some reasonable conditions.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号