Similar Documents (20 results found)
1.
This paper deals with expected average cost (EAC) and discount-sensitive criteria for discrete-time Markov control processes on Borel spaces, with possibly unbounded costs. Conditions are given under which (a) EAC optimality and strong (-1)-discount optimality are equivalent; (b) strong 0-discount optimality implies bias optimality; and, conversely, under an additional hypothesis, (c) bias optimality implies strong 0-discount optimality. Thus, in particular, as the class of bias optimal policies is nonempty, (c) gives the existence of a strong 0-discount optimal policy, whereas from (b) and (c) we get conditions for bias optimality and strong 0-discount optimality to be equivalent. A detailed example illustrates our results.
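For reference, the criteria being compared can be written in the usual discrete-time notation (a sketch of the standard definitions; the paper's exact hypotheses may differ):

```latex
% Discounted cost and expected average cost (EAC) under policy \pi from state x
V_\alpha(\pi,x) = E_x^{\pi}\Bigl[\sum_{t=0}^{\infty} \alpha^{t}\, c(x_t,a_t)\Bigr], \quad 0<\alpha<1,
\qquad
J(\pi,x) = \limsup_{n\to\infty} \frac{1}{n}\, E_x^{\pi}\Bigl[\sum_{t=0}^{n-1} c(x_t,a_t)\Bigr].
% A policy \pi^* is strong (-1)-discount optimal if, for every policy \pi and state x,
\limsup_{\alpha\uparrow 1}\,(1-\alpha)\bigl[V_\alpha(\pi^*,x) - V_\alpha(\pi,x)\bigr] \le 0.
```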

2.
3.
This paper concerns nonstationary continuous-time Markov control processes on Polish spaces, with the infinite-horizon discounted cost criterion. Necessary and sufficient conditions are given for a control policy to be optimal and asymptotically optimal. In addition, under suitable hypotheses, it is shown that the successive approximation procedure converges in the sense that the sequence of finite-horizon optimal cost functions and the corresponding optimal control policies both converge.

4.
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm is potentially attractive when the number of actions is very large or even infinite.
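The revised simplex details are specific to the paper, but the linear-programming formulation of a discounted dynamic programming equation that it builds on is standard. A minimal Python sketch (all model data invented for illustration; this uses a generic LP solver, not the paper's algorithm):

```python
# Solve the DP equation of a small discounted MDP via its standard LP formulation.
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[x, a, y]: transition probabilities
r = rng.random((nS, nA))                        # r[x, a]: one-step reward

# LP: minimize sum_x v(x)  subject to  v(x) >= r(x,a) + gamma * sum_y P(y|x,a) v(y)
A_ub, b_ub = [], []
for x in range(nS):
    for a in range(nA):
        row = gamma * P[x, a].copy()
        row[x] -= 1.0                  # encodes -(v(x) - gamma * P v) <= -r(x, a)
        A_ub.append(row)
        b_ub.append(-r[x, a])

res = linprog(c=np.ones(nS), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * nS, method="highs")
v = res.x                                        # optimal value function
policy = np.argmax(r + gamma * P @ v, axis=1)    # greedy policy w.r.t. v
print("values:", v, "policy:", policy)
```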

5.
For average-criterion Markov decision processes with a countable state space, nonempty action sets, and unbounded rewards, this paper proposes a new set of conditions under which an (ε-)optimal stationary policy exists, and under which the optimality inequality holds whenever the sum appearing in it is well defined.

6.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
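For a feel of the application, a stochastic Beverton–Holt recursion under one common parameterization (the paper's controlled version may differ) can be simulated as follows:

```python
# Sketch of a stochastic Beverton-Holt population model with multiplicative noise.
import numpy as np

rng = np.random.default_rng(1)
theta, T, x = 2.5, 50, 1.0                    # growth parameter, horizon, initial stock
path = [x]
for _ in range(T):
    xi = rng.lognormal(mean=0.0, sigma=0.1)   # i.i.d. multiplicative shock
    x = theta * x / (1.0 + x) * xi            # x_{t+1} = theta * x_t / (1 + x_t) * xi_t
    path.append(x)
print(path[-5:])   # hovers near the deterministic equilibrium theta - 1
```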

7.
1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's, see for instance, [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

8.
Discounted semi-Markov decision processes with nonnegative costs
Huang Yonghui, Guo Xianping. Acta Mathematica Sinica, 2010, 53(3): 503-514
This paper considers discounted semi-Markov decision processes with countable states and nonnegative costs. A continuous-time semi-Markov decision process is first constructed from a given semi-Markov decision kernel and policy; the minimal nonnegative solution approach is then used to show that the value function satisfies the optimality equation and that ε-optimal stationary policies exist. Conditions for the existence of optimal policies, together with some of their properties, are further given. Finally, a value iteration algorithm and a numerical example are presented.
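For orientation, the discounted optimality equation for a semi-Markov decision process typically takes the following form (our notation; the paper's assumptions may differ):

```latex
% Q(dt, dy | x, a) is the semi-Markov kernel (sojourn time t, next state y);
% c(x, a) is the expected discounted cost accumulated until the next jump.
V(x) = \min_{a \in A(x)} \left\{ c(x,a)
     + \int_{S} \int_{0}^{\infty} e^{-\alpha t}\, V(y)\, Q(dt, dy \mid x, a) \right\}.
```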

9.
We develop a general framework to analyze the convergence of linear-programming approximations for Markov control processes in metric spaces. The approximations are based on aggregation and relaxation of constraints, as well as inner approximations of the decision variables. In particular, conditions are given under which the control problem's optimal value can be approximated by a sequence of finite-dimensional linear programs.

10.
Value iteration and optimization of multiclass queueing networks
Chen Rong-Rong, Meyn Sean. Queueing Systems, 1999, 32(1-3): 65-97
This paper considers in parallel the scheduling problem for multiclass queueing networks and optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when it is not initialized properly. The most typical case, where the initial value function is taken to be zero, may be a particularly bad choice. In contrast, if the value iteration algorithm is initialized with a stochastic Lyapunov function, then the following hold: (i) a stochastic Lyapunov function exists for each intermediate policy, and hence each policy is regular (a strong stability condition); (ii) intermediate costs converge to the optimal cost; and (iii) any limiting policy is average cost optimal. It is argued that a natural choice for the initial value function is the value function for the associated deterministic control problem based upon a fluid model, or the approximate solution to Poisson's equation obtained from the LP of Kumar and Meyn. Numerical studies show that either choice may lead to fast convergence to an optimal policy.
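The effect of initialization is easy to reproduce on a toy model. A sketch of relative value iteration for a single-server admission-control queue (illustrative only, not the multiclass networks of the paper), comparing zero initialization with a quadratic fluid-style initialization:

```python
# Relative value iteration for an average-cost admission-control queue.
import numpy as np

N, lam, mu = 40, 0.3, 0.5            # queue truncation, uniformized event rates
idle = 1.0 - lam - mu
x = np.arange(N)
up, down = np.minimum(x + 1, N - 1), np.maximum(x - 1, 0)

def T(V):
    """One Bellman step; action: admit or reject an arriving customer."""
    admit  = lam * V[up] + mu * V[down] + idle * V
    reject = lam * (V + 5.0) + mu * V[down] + idle * V   # rejection penalty 5
    return x + np.minimum(admit, reject)                 # holding cost c(x) = x

inits = [(np.zeros(N), "zero init"),
         (x**2 / (2.0 * (mu - lam)), "fluid init")]      # fluid value ~ x^2 / (2(mu - lam))
for V, name in inits:
    for n in range(1, 20001):
        W = T(V)
        W -= W[0]                     # relative value iteration: normalize at state 0
        if np.max(np.abs(W - V)) < 1e-8:
            break
        V = W
    print(name, "converged after", n, "iterations")
```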

11.
This paper deals with the asymptotic optimality of a stochastic dynamic system driven by a singularly perturbed Markov chain with finite state space. The states of the Markov chain belong to several groups such that transitions among the states within each group occur much more frequently than transitions among the states in different groups. Aggregating the states of the Markov chain leads to a limit control problem, which is obtained by replacing the states in each group by the corresponding average distribution. The limit control problem is simpler to solve than the original one. A nearly-optimal solution for the original problem is constructed by using the optimal solution to the limit problem. To demonstrate the suggested approach, the asymptotically optimal controls are applied to production-planning examples from manufacturing systems.
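In the usual formulation of such two-time-scale chains (our notation, which may differ from the paper's), the generator has the singularly perturbed form:

```latex
% Fast within-group transitions (block-diagonal \widetilde{Q}) occur at rate
% 1/\varepsilon; slow between-group transitions are collected in \widehat{Q}.
Q^{\varepsilon} = \frac{1}{\varepsilon}\,\widetilde{Q} + \widehat{Q},
\qquad
\widetilde{Q} = \operatorname{diag}\bigl(\widetilde{Q}^{1},\dots,\widetilde{Q}^{l}\bigr).
% As \varepsilon \to 0, group k collapses to one aggregate state, with the states
% inside it weighted by the stationary distribution of \widetilde{Q}^{k}.
```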

12.
This paper considers the expected total reward criterion for nonhomogeneous Markov decision processes with a countable state space and finite action sets. Unlike previous work, we transform the nonhomogeneous Markov decision process into a homogeneous one by enlarging the state space, and thereby obtain, very concisely, the main results reached by the traditional approach.

13.
In this article, we study a risk-sensitive control problem whose state dynamics are given by a controlled continuous-time Markov chain. Using a multiplicative dynamic programming principle along with the atomic structure of the state dynamics, we prove the existence of an optimal risk-sensitive control and give a characterization of it, under geometric ergodicity of the state dynamics and a smallness condition on the running cost.
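The ergodic risk-sensitive criterion usually studied in this setting (standard notation; details may differ from the paper's) is:

```latex
% Long-run growth rate of the exponential of accumulated running cost
% c(X_t, u_t) under policy u, starting from state x.
J(u, x) = \limsup_{T\to\infty} \frac{1}{T}\,
          \log E_x^{u}\!\left[ \exp\!\left( \int_0^{T} c(X_t, u_t)\, dt \right) \right].
```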

14.
15.
In this paper we consider Markov decision processes on Borel spaces with discounted cost and a random discount rate. We establish the dynamic programming algorithm in the finite- and infinite-horizon cases, provide conditions for the existence of measurable selectors, and present an example involving a consumption-investment problem. This research was partially supported by the PROMEP grant 103.5/05/40.
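If the random rate is i.i.d. across stages and independent of the transitions (one natural reading of the model; the paper's formulation may differ), the dynamic programming equation takes the form:

```latex
% Bellman equation with a random discount factor \alpha drawn afresh each stage;
% independence lets its expectation factor out of the transition integral.
V(x) = \inf_{a \in A(x)} \left\{ c(x,a)
     + E[\alpha] \int_{X} V(y)\, Q(dy \mid x, a) \right\}.
```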

16.
The Markov decision process is studied under the maximization of the probability that total discounted rewards exceed a target level. We focus on the dynamic programming equations of the model. We give various properties of the optimal return operator and, for the infinite planning-horizon model, we characterize the optimal value function as a maximal fixed point of this operator. Various turnpike results relating the finite- and infinite-horizon models are also given.
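One standard form of the optimal return operator for this target-level criterion (our notation; the paper's definition may differ in details) is:

```latex
% U(x, s): maximal probability that total discounted reward from state x
% exceeds the target level s, with discount factor \beta in (0, 1).
(TU)(x,s) = \max_{a \in A(x)} \sum_{y} p(y \mid x, a)\,
            U\!\left(y,\ \tfrac{s - r(x,a)}{\beta}\right).
% After earning r(x,a) today, the residual target for tomorrow is (s - r(x,a)) / \beta.
```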

17.
A Markov decision process approach to static stability of discrete event systems
This paper uses the Markov decision process approach to study the static stability problem of discrete event systems (DES), including the computation of strong and weak attraction domains, and also discusses the computation of stabilizing controllers for the weak attraction domain. The method requires neither the Σ- (or Σu-) invariance of the given predicates nor the detection of loops.

18.
This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control acts continuously in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all of the proposed assumptions.

19.
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.

20.
The literature on maximum entropy for Markov processes deals mainly with discrete-time Markov chains. Very few papers deal with continuous-time jump Markov processes, and none deal with semi-Markov processes. It is the aim of this paper to help fill this gap. We recall the basics concerning entropy for Markov and semi-Markov processes and we study several problems to give an overview of the possible directions of use of maximum entropy in connection with these processes. Numerical illustrations are presented, in particular in application to reliability.
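As a concrete starting point, the entropy rate of an ergodic discrete-time Markov chain (the quantity the paper generalizes to semi-Markov processes) can be computed as follows; the transition matrix here is invented for illustration:

```python
# Entropy rate of an ergodic Markov chain: H = -sum_i pi_i sum_j P_ij log P_ij,
# where pi is the stationary distribution (left eigenvector of P for eigenvalue 1).
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

# mask zero entries so 0 * log 0 contributes 0
logP = np.log(P, where=P > 0, out=np.zeros_like(P))
H = -np.sum(pi[:, None] * P * logP)
print("stationary:", pi, "entropy rate (nats/step):", H)
```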
