Similar Documents (20 results)
1.
2.
3.
4.
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model is an extension of the first passage models for both discrete-time and continuous-time Markov decision processes.
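To make the value-iteration idea concrete, here is a minimal sketch on a discrete-time stand-in: a small finite model with per-step discount β and one absorbing target state. All transition data, costs, and the discount factor are made up for illustration; the paper's algorithm operates on semi-Markov models with denumerable states.

```python
import numpy as np

# Value iteration for a first-passage criterion: minimize the expected
# discounted cost accumulated until the chain first enters the target set B.
# Costs stop accruing once B is reached, so V is pinned to 0 on B.
n_states, n_actions, beta = 5, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
c = rng.uniform(1.0, 2.0, size=(n_states, n_actions))             # nonnegative costs
target = [4]                                                      # target set B

V = np.zeros(n_states)
for _ in range(10_000):
    V_new = (c + beta * P @ V).min(axis=1)   # Bellman operator
    V_new[target] = 0.0                      # first passage: stop on B
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = (c + beta * P @ V).argmin(axis=1)   # greedy stationary policy
print("first-passage values:", np.round(V, 4))
print("stationary policy:", policy)
```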

5.
Optimization, 2012, 61(2-3): 271-283
This paper presents a new class of Markov decision processes: continuous-time shock Markov decision processes, which model Markovian controlled systems sequentially shocked by their environment. Between two adjacent shocks, the system can be modeled as a continuous-time Markov decision process. At each shock, however, the system's parameters change and an instantaneous state transition occurs. After presenting the model, we prove that the optimality equation, which consists of countably many equations, has a unique solution in some function space Ω.

6.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in numerically approximating the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μₙ of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which bounds the probability that the estimation error is larger than some given constant; this bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
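A minimal sketch of the linear programming formulation on a tiny finite model, with discounted occupation measures as the LP variables. The model data, the constraint bound k, and the initial distribution ν are invented for illustration; the paper works on Borel spaces with the empirical measure μₙ in place of μ.

```python
import numpy as np
from scipy.optimize import linprog

nS, nA, beta = 4, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[x, a, y]
c = rng.uniform(0, 1, (nS, nA))                 # cost to be minimized
d = rng.uniform(0, 1, (nS, nA))                 # constraint cost
k = 8.0                                         # bound on discounted d-cost (assumed feasible)
nu = np.full(nS, 1.0 / nS)                      # initial distribution

# Variables: discounted occupation measure rho[x, a] >= 0, flattened.
# Balance: sum_a rho(y, a) - beta * sum_{x, a} rho(x, a) P(y | x, a) = nu(y).
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = float(y == x) - beta * P[x, a, y]

res = linprog(c.ravel(),                           # expected discounted cost
              A_ub=d.ravel()[None, :], b_ub=[k],   # discounted d-cost <= k
              A_eq=A_eq, b_eq=nu, bounds=(0, None))
assert res.success, res.message
print("optimal constrained discounted cost:", round(res.fun, 4))
```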

7.
The computational problem of transient solutions for denumerable-state Markov processes (MPs) has been solved by Hsu and Yuan [12], who derived an efficient algorithm with uniform error. However, when the state space of an MP has two or more dimensions, even for computational methods dealing with stationary solutions, only the case where one of the dimensions is infinite and all the others are finite has been studied. In this paper, we study transient solutions for multidimensional denumerable-state MPs and give an algorithm with uniform error. Some numerical results are presented. Supported by the National Natural Science Foundation of China.
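For a finite state space, the standard way to compute transient solutions with a uniform error bound is uniformization (Jensen's method): expand the transient distribution as a Poisson mixture and truncate once the remaining tail mass falls below the tolerance. A minimal sketch on a made-up 3-state generator; the paper's contribution is handling multidimensional denumerable spaces, which this sketch does not attempt.

```python
import numpy as np

# Transient distribution p(t) of a CTMC with generator Q via uniformization:
# p(t) = sum_k Poisson(k; Lam*t) * p0 @ P^k, where P = I + Q/Lam.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.0,  0.5, -0.5]])
p0 = np.array([1.0, 0.0, 0.0])   # initial distribution
t, eps = 1.0, 1e-10              # horizon and uniform error tolerance

Lam = max(-Q.diagonal())         # uniformization rate
P = np.eye(len(Q)) + Q / Lam     # embedded DTMC kernel, entries in [0, 1]

w = np.exp(-Lam * t)             # Poisson weight for k = 0
term = p0.copy()
p_t = w * term
acc = w
k = 0
while 1.0 - acc > eps:           # remaining Poisson tail bounds the error
    k += 1
    term = term @ P
    w *= Lam * t / k
    p_t += w * term
    acc += w

print("transient distribution p(t):", np.round(p_t, 6))
print("mass (should be ~1):", p_t.sum())
```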

8.
9.
We consider a discrete-time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered a decomposition of limiting-average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs, and we construct hierarchical decomposition algorithms for both.
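One natural reading of such a decomposition, sketched below under our own modeling choices: split the states into strongly connected components of the one-step reachability graph, then solve the resulting hierarchy of sub-MDPs in topological order, so each sub-MDP sees already-solved downstream values as constants. The random sparse model and the SCC-based splitting are an illustrative stand-in, not the algorithm of the paper.

```python
import numpy as np
from graphlib import TopologicalSorter
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

nS, nA, beta = 6, 2, 0.95
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
P[P < 0.25] = 0.0                     # sparsify so nontrivial components appear
for x in range(nS):                   # guard: keep every row a distribution
    for a in range(nA):
        if P[x, a].sum() == 0.0:
            P[x, a, x] = 1.0
P /= P.sum(axis=2, keepdims=True)
r = rng.uniform(0, 1, (nS, nA))

# x -> y if some action moves x to y with positive probability.
reach = (P.sum(axis=1) > 0).astype(int)
n_comp, comp = connected_components(csr_matrix(reach), connection='strong')

# Condensation DAG: component u depends on v if u can jump into v in one
# step, so v must be solved first.
deps = {u: set() for u in range(n_comp)}
for x in range(nS):
    for y in range(nS):
        if reach[x, y] and comp[x] != comp[y]:
            deps[comp[x]].add(comp[y])

V = np.zeros(nS)
for label in TopologicalSorter(deps).static_order():
    idx = np.where(comp == label)[0]
    for _ in range(100_000):          # value iteration on one sub-MDP
        V_new = (r[idx] + beta * P[idx] @ V).max(axis=1)
        if np.max(np.abs(V_new - V[idx])) < 1e-12:
            V[idx] = V_new
            break
        V[idx] = V_new

print("component of each state:", comp)
print("optimal values:", np.round(V, 4))
```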

10.
The paper is concerned with a discounted Markov decision process with an unknown parameter which is estimated anew at each stage. Algorithms are proposed which are intermediate between, and include, the classical (but time-consuming) principle of estimation and control and the simpler nonstationary value iteration, which converges more slowly. These algorithms perform one single policy-improvement step after each estimation, and then the policy thus obtained is evaluated completely (policy iteration) or incompletely (policy-value iteration). The results show that both methods lead to asymptotically discount-optimal policies. In addition, these results are generalized to cases where systematic errors do not vanish as the number of stages increases. Dedicated to Prof. Dr. K. Hinderer on the occasion of his 60th birthday.
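A minimal sketch of the "one improvement step after each estimation" scheme in its policy-iteration variant: re-estimate the unknown transition law from observed transitions, take a single improvement step, and evaluate the resulting policy completely. The true model, the Laplace-smoothed estimator, and the simulation loop are our own illustration; exploration issues are ignored here.

```python
import numpy as np

nS, nA, beta = 4, 2, 0.9
rng = np.random.default_rng(3)
P_true = rng.dirichlet(np.ones(nS), size=(nS, nA))   # unknown to the controller
r = rng.uniform(0, 1, (nS, nA))

counts = np.ones((nS, nA, nS))        # Laplace-smoothed transition counts
policy = np.zeros(nS, dtype=int)

def evaluate(pi, P):
    """Complete policy evaluation: solve (I - beta * P_pi) V = r_pi."""
    P_pi = P[np.arange(nS), pi]
    r_pi = r[np.arange(nS), pi]
    return np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)

x = 0
for stage in range(2000):
    a = policy[x]
    y = rng.choice(nS, p=P_true[x, a])                   # observe a transition
    counts[x, a, y] += 1
    P_hat = counts / counts.sum(axis=2, keepdims=True)   # estimate anew
    V = evaluate(policy, P_hat)                          # complete evaluation
    policy = (r + beta * P_hat @ V).argmax(axis=1)       # one improvement step
    x = y

print("policy found:", policy)
print("its value under the true model:", np.round(evaluate(policy, P_true), 3))
```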

11.
12.
1. Introduction. The weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1-6]. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we…

13.
We study Markov jump decision processes with both continuously and instantaneously acting decisions and with deterministic drift between jumps. Such decision processes were recently introduced and studied, from the point of view of discrete-time approximations, by Van der Duyn Schouten. We obtain necessary and sufficient optimality conditions for these decision processes in terms of equations and inequalities of quasi-variational type. By means of the latter we find simple necessary and sufficient conditions for the existence of stationary optimal policies in such processes with finite state and action spaces, in both the discounted and the average per-unit-time reward cases.

14.
In this paper we consider a homotopy deformation approach to solving Markov decision process problems: a simpler Markov decision process is continuously deformed until it is identical with the original problem. Algorithms and performance bounds are given.
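A sketch of the homotopy idea under our own choices: start from an easy problem with a closed-form solution (stay-put transitions), blend its kernel toward the target kernel along a parameter grid, and warm-start value iteration at each step. The grid and the easy problem are illustrative, not the paper's construction.

```python
import numpy as np

nS, nA, beta = 5, 2, 0.9
rng = np.random.default_rng(4)
P1 = rng.dirichlet(np.ones(nS), size=(nS, nA))   # target transitions
r = rng.uniform(0, 1, (nS, nA))
P0 = np.zeros_like(P1)
P0[:, :, :] = np.eye(nS)[:, None, :]             # easy problem: every action stays put

# The easy problem solves in closed form: V(x) = max_a r(x, a) / (1 - beta).
V = r.max(axis=1) / (1 - beta)
iters_used = []
for lam in np.linspace(0.0, 1.0, 11)[1:]:
    P = (1 - lam) * P0 + lam * P1                # deformed kernel
    for it in range(10_000):                     # warm-started value iteration
        V_new = (r + beta * P @ V).max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new
    iters_used.append(it)

print("optimal values of the target problem:", np.round(V, 4))
print("iterations per homotopy step:", iters_used)
```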

15.
Optimization, 2012, 61(2): 255-269
Constrained Markov decision processes with compact state and action spaces are studied under long-run average reward or cost criteria. By introducing a corresponding Lagrange function, a saddle-point theorem is given, by which the existence of a constrained optimal pair of initial state distribution and policy is shown. Also, under the Doeblin hypothesis, a functional characterization of a constrained optimal policy is obtained.
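A sketch of the Lagrangian idea: fold the constraint into the reward with a multiplier λ and search λ for the saddle point. For simplicity this sketch uses a discounted criterion, a finite model, and scalar bisection; the paper itself treats long-run averages on compact spaces, and randomized mixing at the breakpoint multiplier is ignored here.

```python
import numpy as np

nS, nA, beta, k = 4, 3, 0.9, 5.0     # k: bound on the discounted d-cost (assumed)
rng = np.random.default_rng(8)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
r = rng.uniform(0, 1, (nS, nA))      # reward to maximize
d = rng.uniform(0, 1, (nS, nA))      # constraint cost
nu = np.full(nS, 1.0 / nS)           # initial distribution

def solve(lam):
    """Optimal policy for the scalarized reward r - lam * d, and its d-cost."""
    V = np.zeros(nS)
    for _ in range(5000):
        V_new = ((r - lam * d) + beta * P @ V).max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new
    pi = ((r - lam * d) + beta * P @ V).argmax(axis=1)
    P_pi = P[np.arange(nS), pi]
    cost = nu @ np.linalg.solve(np.eye(nS) - beta * P_pi, d[np.arange(nS), pi])
    return pi, cost

lo, hi = 0.0, 50.0                   # the d-cost of the lam-optimal policy is
for _ in range(60):                  # nonincreasing in lam, so bisect
    mid = 0.5 * (lo + hi)
    _, cost = solve(mid)
    lo, hi = (mid, hi) if cost > k else (lo, mid)

pi, cost = solve(hi)
print("multiplier:", round(hi, 4), " policy:", pi, " d-cost:", round(cost, 4))
```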

16.
We show how to find a sequence of policies for essentially finite-state dynamic programs such that the corresponding vector of optimal returns converges pointwise to that of a denumerable-state dynamic program. The corresponding result for stochastic games is also given.
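The flavor of such finite-state approximations can be seen on a simple denumerable model: solve a sequence of N-state truncations and watch the optimal value at a fixed state converge as N grows. The queue-like controlled walk below, and the reflecting truncation, are our own illustration.

```python
import numpy as np

beta = 0.9

def solve_truncated(N, tol=1e-10):
    """Value iteration on the N-state truncation of a controlled random walk:
    action 0 ("serve") moves down w.p. 0.7 / up w.p. 0.3 at extra cost 0.5;
    action 1 ("idle") moves down w.p. 0.4 / up w.p. 0.6 at no extra cost.
    Holding cost equals the state index; mass leaving the truncation reflects."""
    idx = np.arange(N)
    up = np.minimum(idx + 1, N - 1)
    down = np.maximum(idx - 1, 0)
    c = np.stack([idx + 0.5, idx + 0.0], axis=1)
    V = np.zeros(N)
    while True:
        Q0 = c[:, 0] + beta * (0.7 * V[down] + 0.3 * V[up])
        Q1 = c[:, 1] + beta * (0.4 * V[down] + 0.6 * V[up])
        V_new = np.minimum(Q0, Q1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

for N in (10, 20, 40, 80, 160):
    print(N, round(solve_truncated(N)[0], 6))   # value at state 0 converges in N
```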

17.
We consider a class of discrete-time Markov control processes with Borel state and action spaces, and i.i.d. disturbances taking values in ℝᵈ with unknown distribution μ. Under mild semi-continuity and compactness conditions, and assuming that μ is absolutely continuous with respect to Lebesgue measure, we establish the existence of adaptive control policies which are (1) optimal for the average-reward criterion, and (2) asymptotically optimal in the discounted case. Our results are obtained by taking advantage of some well-known facts in the theory of density estimation. This approach allows us to avoid the restrictive conditions on the state space and/or on the system's transition law imposed in recent works, and, on the other hand, it clearly shows the way to other applications of nonparametric (density) estimation to adaptive control. Research partially supported by the Third World Academy of Sciences under Research Grant No. MP 898-152.
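A sketch of the density-estimation step behind such adaptive policies: estimate the unknown disturbance density nonparametrically from observed disturbances, then compute expectations in the Bellman operator against the estimate. The clipped random-walk dynamics, the quadrature grid, and the stand-in value function are all our own illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
xi_obs = rng.normal(0.3, 1.0, size=500)   # observed i.i.d. disturbances
f_hat = gaussian_kde(xi_obs)              # nonparametric density estimate

grid = np.linspace(-5, 5, 2001)           # quadrature grid for plug-in E[.]
w = f_hat(grid)
w /= w.sum()                              # normalized quadrature weights

def expected_next_value(x, a, V):
    """Plug-in estimate of E[V(x + a + xi)] under the estimated density."""
    x_next = np.clip(x + a + grid, -5, 5)  # system x' = clip(x + a + xi)
    return np.sum(w * V(x_next))

V = lambda s: -np.abs(s)                  # stand-in value function
for a in (-1.0, 0.0, 1.0):
    print(a, round(expected_next_value(0.0, a, V), 4))
```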

18.
This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes.
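A small stand-in for such a comparison: standard (Jacobi) value iteration versus Gauss-Seidel sweeps on the same random discounted MDP, counting iterations to a common accuracy. The test model and the pair of variants compared are our own choices, not necessarily those of the paper.

```python
import numpy as np

nS, nA, beta, tol = 20, 3, 0.95, 1e-8
rng = np.random.default_rng(6)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
r = rng.uniform(0, 1, (nS, nA))

def jacobi():
    """Standard value iteration: update all states from the old iterate."""
    V, n = np.zeros(nS), 0
    while True:
        V_new = (r + beta * P @ V).max(axis=1)
        n += 1
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, n
        V = V_new

def gauss_seidel():
    """Sweep the states in order, using freshly updated components."""
    V, n = np.zeros(nS), 0
    while True:
        delta = 0.0
        for x in range(nS):
            v = (r[x] + beta * P[x] @ V).max()
            delta = max(delta, abs(v - V[x]))
            V[x] = v
        n += 1
        if delta < tol:
            return V, n

V1, n1 = jacobi()
V2, n2 = gauss_seidel()
print("Jacobi iterations:", n1, " Gauss-Seidel sweeps:", n2)
print("max value difference:", np.max(np.abs(V1 - V2)))
```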

19.
Fitting the value function in a Markovian decision process by a linear superposition of M basis functions reduces the problem dimensionality from the number of states down to M, with good accuracy retained if the value function is a smooth function of its argument, the state vector. This paper provides, for both the discounted and undiscounted cases, three algorithms for computing the coefficients in the linear superposition: linear programming, policy iteration, and least squares.
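A least-squares sketch of the basis-function idea for a fixed policy: approximate V ≈ Φw with M ≪ nS basis functions and fit w so the Bellman residual is orthogonal to the basis (an LSTD-style projected equation). The chain, the smooth reward, and the polynomial basis are made up for illustration; the paper's three algorithms and its control setting are not reproduced here.

```python
import numpy as np

nS, M, beta = 200, 5, 0.95
rng = np.random.default_rng(7)
P = rng.dirichlet(np.ones(nS), size=nS)            # fixed-policy transition matrix
s = np.linspace(0, 1, nS)                          # embed states in [0, 1]
r = np.sin(2 * np.pi * s)                          # smooth reward in the state
Phi = np.stack([s**k for k in range(M)], axis=1)   # polynomial basis, nS x M

# Projected Bellman equation: Phi^T (Phi - beta * P @ Phi) w = Phi^T r.
A = Phi.T @ (Phi - beta * P @ Phi)
b = Phi.T @ r
w = np.linalg.solve(A, b)                          # M x M solve instead of nS x nS

V_exact = np.linalg.solve(np.eye(nS) - beta * P, r)
print("M =", M, "coefficients:", np.round(w, 3))
print("max |V_fit - V_exact|:", np.max(np.abs(Phi @ w - V_exact)))
```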

20.