Similar Documents
1.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program.
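For orientation, the occupation-measure formulation that such papers rely on can be sketched as the following linear program; the notation c, d_k, P, β is generic and ours, not necessarily the paper's:

\begin{align*}
\min_{\rho \ge 0}\quad & \sum_{x,a} c(x,a)\,\rho(x,a) \\
\text{s.t.}\quad & \sum_{a} \rho(y,a) - \sum_{x,a} P(y \mid x,a)\,\rho(x,a) = \beta(y) \quad \forall y, \\
& \sum_{x,a} d_k(x,a)\,\rho(x,a) \le V_k, \quad k = 1,\dots,K,
\end{align*}

where ρ(x,a) is the occupation measure (expected total number of visits to the pair (x,a)), c the cost, d_k the constraint costs, β the initial distribution, and V_k the constraint bounds. With a countable state space the program has infinitely many variables and constraints, which is the "infinite Linear Program" of the abstract.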

2.
This paper considers the application of a variable neighborhood search (VNS) algorithm to finite-horizon (H-stage) Markov Decision Processes (MDPs), with the aim of alleviating the "curse of dimensionality" in the search for the global optimum. The main idea behind the VNSMDP algorithm is that, based on the result of the stage just considered, the search for the optimal solution (action) of state x in stage t is conducted systematically over variable neighborhood sets of the current action. The VNSMDP algorithm thus searches for the optimum within subsets of the action space rather than over the whole action set. The complexity and convergence properties of the VNSMDP algorithm are analyzed in the paper, and it is shown by theoretical and computational analysis that the algorithm succeeds in finding the global optimum efficiently.
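A minimal Python sketch of the neighborhood-search idea described above; q_value, neighborhood, and k_max are illustrative names of our own, not the paper's API:

def q_value(x, a, reward, trans, V):
    # one-step lookahead: immediate reward plus expected value-to-go
    return reward(x, a) + sum(p * V[y] for y, p in trans(x, a))

def vns_backward_dp(H, states, actions, reward, trans, neighborhood, k_max):
    # Backward induction in which each (stage, state) pair searches only
    # variable neighborhoods of the incumbent action, not the full action set.
    V = {x: 0.0 for x in states}                      # terminal values
    policy = [{x: actions[0] for x in states} for _ in range(H)]
    for t in reversed(range(H)):
        incumbent = policy[t + 1] if t + 1 < H else policy[t]
        newV = {}
        for x in states:
            best_a = incumbent[x]                     # start from the stage just considered
            best_q = q_value(x, best_a, reward, trans, V)
            k = 1
            while k <= k_max:
                improved = False
                for a in neighborhood(best_a, k):     # evaluate the k-th neighborhood only
                    q = q_value(x, a, reward, trans, V)
                    if q > best_q:
                        best_a, best_q, improved = a, q, True
                k = 1 if improved else k + 1          # VNS pattern: restart small after success, widen after failure
            policy[t][x] = best_a
            newV[x] = best_q
        V = newV
    return policy, V

Termination is guaranteed because each improving move strictly increases the Q-value over a finite action set; only the listed neighborhoods are ever evaluated.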

3.
This paper investigates Factored Markov Decision Processes with Imprecise Probabilities (MDPIPs); that is, Factored Markov Decision Processes (MDPs) whose transition probabilities are imprecisely specified. We derive efficient approximate solutions for Factored MDPIPs based on mathematical programming. To do this, we extend previous linear programming approaches for linear approximations in Factored MDPs, resulting in a multilinear formulation for robust "maximin" linear approximations in Factored MDPIPs. By exploiting the factored structure of MDPIPs we demonstrate orders-of-magnitude reductions in solution time over standard exact non-factored approaches, in exchange for relatively low approximation errors, on a difficult class of benchmark problems with millions of states.

4.
In this paper, we introduce a Markov decision model with absorbing states and a constraint on the asymptotic failure rate. The objective is to find a stationary policy that minimizes the infinite-horizon expected average cost, given that the system never fails. Using the Perron-Frobenius theory of non-negative matrices and spectral analysis, we show that the problem can be reduced to a linear programming problem. Finally, we apply the method to a real problem arising in an aeronautical system.
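As a hedged illustration of how Perron-Frobenius theory typically enters such problems (our notation and formalization, not necessarily the paper's): under a stationary policy π, let Q_π be the transition matrix restricted to the non-failed states. Survival probabilities then decay geometrically at a rate given by the spectral radius σ(Q_π), its Perron-Frobenius eigenvalue, so an asymptotic failure-rate constraint can be read as a bound on that eigenvalue:

\begin{equation*}
\mathbb{P}_\pi(\text{no failure up to time } t) \sim C\,\sigma(Q_\pi)^{\,t},
\qquad \text{constraint: } -\log \sigma(Q_\pi) \le \lambda .
\end{equation*}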

5.
The unichain classification problem asks whether a finite state and action MDP is unichain under all deterministic policies, i.e., whether the Markov chain induced by every deterministic policy has a single recurrent class. This problem is NP-hard. This paper provides polynomial algorithms for the problem when there is a state that is either recurrent under all deterministic policies or absorbing under some action.

6.
This paper addresses constrained Markov decision processes, under the expected discounted total cost criterion, controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the sequence of finite-horizon value functions to the infinite-horizon value function is also shown. A simple example illustrating an application is presented.
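To illustrate just the convergence statement, here is a small value-iteration sketch in Python (constraints omitted for brevity; the data layout is our own assumption):

import numpy as np

def finite_horizon_values(P, c, gamma, n):
    # n-stage optimal discounted costs:
    # V_{k+1}(x) = min_a [ c[a][x] + gamma * sum_y P[a][x, y] * V_k(y) ].
    # P: dict action -> (S, S) transition matrix; c: dict action -> (S,) cost vector.
    S = next(iter(c.values())).shape[0]
    V = np.zeros(S)                       # V_0 = 0 (terminal cost)
    for _ in range(n):
        V = np.min([c[a] + gamma * P[a] @ V for a in P], axis=0)
    return V

For bounded costs the Bellman operator is a gamma-contraction in the sup norm, so the finite-horizon values converge geometrically to the infinite-horizon value function as n grows, which is the kind of convergence asserted in the abstract (there, additionally, in the constrained setting).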

7.
Linear Programming is known to be an important and useful tool for solving Markov Decision Processes (MDPs). Its derivation relies on the Dynamic Programming approach, which also serves to solve MDPs. However, for Markov Decision Processes with several constraints the only available methods are based on Linear Programs. The aim of this paper is to investigate some aspects of such Linear Programs related to multi-chain MDPs. We first present a stochastic interpretation of the decision variables that appear in the Linear Programs available in the literature. We then show, for the multi-constrained Markov Decision Process, that the Linear Program suggested in [9] can be obtained from an equivalent unconstrained Lagrange formulation of the control problem. This establishes the connection between the Linear Program approach and the Lagrange approach, which was previously used only for the case of a single constraint [3, 14, 15].
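Schematically, the Lagrange formulation referred to is the saddle-point problem (our notation: C(π) the objective cost, D(π) the constraint cost, V the bound, with λ a vector of multipliers when there are several constraints):

\begin{equation*}
\sup_{\lambda \ge 0}\ \inf_{\pi}\ \big[\, C(\pi) + \lambda\,\big(D(\pi) - V\big) \,\big],
\end{equation*}

and the paper's point is that the constrained Linear Program of [9] can be recovered from this unconstrained Lagrangian form.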

8.
9.
The use of Markov Decision Processes for inspection, maintenance and rehabilitation of civil engineering structures relies on several transition matrices related to the stochastic degradation process, maintenance actions and imperfect inspections. Point estimators for these matrices are usually used, evaluated by statistical inference methods and/or expert judgment, so considerable epistemic uncertainty often veils their true values. The contribution of this paper is threefold. First, we present a methodology for incorporating epistemic uncertainties in the dynamic programming algorithms used to solve finite-horizon Markov Decision Processes (which may be partially observable). Second, we propose a methodology based on Dirichlet distributions which answers, in our view, much of the controversy found in the literature about estimating Markov transition matrices. Third, we show how the complexity resulting from the use of Monte-Carlo simulations over the transition matrices can be greatly reduced in the dynamic programming framework. The proposed model is applied to a concrete bridge under degradation, in order to provide the optimal strategy for inspection and maintenance. The influence of the epistemic uncertainties on the optimal solution is underlined through a sensitivity analysis of the input data.
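For intuition, here is the naive Monte-Carlo baseline that such a framework must tame: resample the uncertain transition matrices row-wise from Dirichlet distributions and re-run backward induction per sample. This is our illustration of the setting, not the paper's (more efficient) algorithm; all names and the data layout are hypothetical:

import numpy as np

rng = np.random.default_rng(0)

def sample_transition(alpha):
    # Draw one transition matrix, each row from a Dirichlet distribution whose
    # parameters can be read as transition counts plus prior pseudo-counts.
    return np.array([rng.dirichlet(row) for row in alpha])

def mc_optimal_costs(alpha_by_action, cost, H, n_samples=1000):
    # Monte-Carlo distribution of the optimal H-stage cost under
    # epistemic (Dirichlet) uncertainty on the transition matrices.
    S = cost[next(iter(cost))].shape[0]
    values = []
    for _ in range(n_samples):
        P = {a: sample_transition(al) for a, al in alpha_by_action.items()}
        V = np.zeros(S)
        for _ in range(H):                # backward induction for this sampled model
            V = np.min([cost[a] + P[a] @ V for a in P], axis=0)
        values.append(V)
    return np.asarray(values)             # summarize with means / quantiles per state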

10.
Zhou Jianwei, 《应用数学》 (Applied Mathematics), 2000, 13(2): 86-89.
Let (X_t) be a Markov process with transition functions, where X_t takes values in a state space (E_t, ℰ_t), t ≥ 0, and let f_t be a measurable transformation from (E_t, ℰ_t) to a state space (E'_t, ℰ'_t). This paper gives sufficient conditions under which (f_t(X_t)) is again a Markov process with transition functions; the analogous question is also discussed for families of Markov processes with transition functions.
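The classical time-homogeneous special case is strong lumpability (Kemeny and Snell): for a single map f collapsing the state space into blocks, (f(X_t)) is Markov for every initial distribution if and only if

\begin{equation*}
\sum_{y \in B} p(x, y) \;=\; \sum_{y \in B} p(x', y)
\qquad \text{for every block } B \text{ and all } x, x' \text{ in the same block,}
\end{equation*}

a useful reference point for the time-dependent transformations f_t considered here.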

11.
We consider several applications of two-state, finite-action, infinite-horizon, discrete-time Markov decision processes with partial observations, for two special cases of observation quality, and show that in each of these cases the optimal cost function is piecewise linear. This in turn allows us to obtain either explicit formulas or simplified algorithms to compute the optimal cost function and the associated optimal control policy. Several examples are presented.
Research supported in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, in part by the Advanced Technology Program of the State of Texas, and in part by the DoD Joint Services Electronics Program through the Air Force Office of Scientific Research (AFSC) Contract F49620-86-C-0045.
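With only two states, a belief is summarized by one number p = P(state 2), and piecewise linearity means the optimal cost is the lower envelope of finitely many lines; schematically (generic POMDP notation, not the paper's):

\begin{equation*}
V(p) \;=\; \min_{\gamma \in \Gamma}\ \big[\, (1-p)\,\gamma(1) + p\,\gamma(2) \,\big],
\qquad \Gamma \text{ a finite set of vectors,}
\end{equation*}

which is what makes explicit formulas and simplified algorithms possible in these special cases.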

12.
This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations posed solely in terms of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, formulated instead in terms of the primitive state set I, formulations involving a redefinition of the states are examined.
The author would like to thank two referees for a very thorough and helpful refereeing of the original article and for the extra references (Refs. 47-52) now added to the original reference list.

13.
It is known that the value function in an unconstrained Markov decision process with finitely many states and actions is a piecewise rational function of the discount factor α, and that the value function can be expressed as a Laurent series expansion about α = 1 for α close enough to 1. We show in this paper that this property also holds for the value function of Markov decision processes with additional constraints. More precisely, we show by a constructive proof that there are numbers 0 = α_0 < α_1 < ... < α_{m-1} < α_m = 1 such that for every j = 1, 2, ..., m - 1 either the problem is not feasible for all discount factors in the open interval (α_{j-1}, α_j) or the value function is a rational function of α on the closed interval [α_{j-1}, α_j]. As a consequence, if the constrained problem is feasible in a neighborhood of α = 1, then the value function has a Laurent series expansion about α = 1. Our proof technique for the constrained case also provides a new proof for the unconstrained case.
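For reference, in the unconstrained case the expansion takes the classical form (see, e.g., Puterman's treatment of the Laurent expansion):

\begin{equation*}
v_\alpha \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, y_n,
\qquad \rho = \frac{1-\alpha}{\alpha},
\end{equation*}

where the leading coefficient y_{-1} is the long-run average reward (gain) and y_0 the bias; the paper extends the existence of such an expansion near α = 1 to the constrained value function.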

14.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant; this bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton-Holt population model.
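A minimal sketch of the kind of model meant by "a stochastic version of the Beverton-Holt population model"; the multiplicative lognormal noise and the parameters r, K, sigma are illustrative assumptions of ours, not the paper's specification:

import numpy as np

rng = np.random.default_rng(1)

def beverton_holt_step(x, a, r=2.0, K=10.0, sigma=0.1):
    # One step of a stochastic Beverton-Holt recursion with harvest a:
    # growth r*x / (1 + x/K), multiplicative lognormal noise, truncation at 0.
    eps = rng.lognormal(mean=0.0, sigma=sigma)
    return max(r * x / (1.0 + x / K) * eps - a, 0.0)

# n i.i.d. draws of the randomness define the empirical measure mu_n that
# replaces mu in the linear-programming approximation.
noise_samples = rng.lognormal(0.0, 0.1, size=500)

The concentration inequality then bounds the probability that the LP value computed from μ_n deviates from the true optimal constrained cost, a bound decreasing exponentially in n.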

15.
Gao Xiaoyan, 《大学数学》 (College Mathematics), 2013, 29(1): 38-42.
This paper studies the strong law of large numbers for functionals of a class of nonhomogeneous Markov chains, namely asymptotically cyclic Markov chains. The concept of an asymptotically cyclic Markov chain is first introduced and several lemmas are given. Using the strong law of large numbers for the frequencies of ordered pairs of states of an asymptotically cyclic Markov chain, a strong law of large numbers for functionals of asymptotically cyclic Markov chains is stated and proved; existing results follow as corollaries of the theorem.

16.
This paper discusses the asymptotic behavior at infinity of the integrand of a convergent improper integral of the first kind. It is shown that when the improper integral converges, the integrand need not tend to zero at infinity: it may instead behave in many other ways, for example as a violently oscillating continuous function, as a discontinuous function, or even as a nonnegative continuous function of a special form. Finally, several conditions are given for deciding whether the integrand tends to zero at infinity when the improper integral converges.
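A standard witness for the oscillating case (a classical fact, independent of the paper): the Fresnel-type integral

\begin{equation*}
\int_{1}^{\infty} \sin\!\big(x^{2}\big)\,dx
\;=\; \int_{1}^{\infty} \frac{\sin t}{2\sqrt{t}}\,dt \qquad (t = x^{2})
\end{equation*}

converges by Dirichlet's test, yet sin(x²) keeps oscillating between -1 and 1 and has no limit as x tends to infinity.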

17.
This paper considers the expected total reward criterion for nonhomogeneous Markov decision processes with countable state space and finite action space. In contrast with earlier work, we enlarge the state space so as to transform the nonhomogeneous Markov decision process into a homogeneous one, and in this way obtain, very concisely, the main results derived by the traditional approach.
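The enlargement in question is the standard one of absorbing time into the state (sketched here in our notation):

\begin{equation*}
\tilde X_t = (t, X_t),
\qquad
\tilde P\big((t,x),\,a,\,(t+1,y)\big) = P_t(x,a,y),
\end{equation*}

so that the enlarged process has a single, time-independent transition law and classical results for homogeneous MDPs apply directly.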

18.
In this article, we study continuous-time Markov decision processes in Polish spaces, where the criterion to be maximized is the expected discounted reward. The transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. We provide conditions on the controlled system's primitive data under which we prove, using Feller's construction of transition functions, that the transition functions of possibly non-homogeneous continuous-time Markov processes are regular. Then, under continuity and compactness conditions, we prove the existence of optimal stationary policies by means of the extended infinitesimal operators associated with these transition functions, and we provide a recursive way to compute (or at least approximate) the optimal reward values. The conditions given in this paper differ from those used in the previous literature, and they are illustrated with an example.

19.
The concept of an asymptotically cyclic Markov chain is introduced as a generalization of the cyclic Markov chain. We first study the convergence rate of countable cyclic Markov chains under the strong ergodicity condition. As the main results, the C-strong ergodicity, the uniform C-strong ergodicity, and the convergence rate under uniform C-strong ergodicity of countable asymptotically cyclic Markov chains are established under various sets of conditions.

20.