Similar Documents (20 results)
1.
《Optimization》2012,61(12):1427-1447
This article is concerned with the limiting average variance for discrete-time Markov control processes in Borel spaces, subject to pathwise constraints. Under suitable hypotheses we show that within the class of deterministic stationary optimal policies for the pathwise constrained problem, there exists one with a minimal variance.

2.
3.
4.
In this paper we consider Markov Decision Processes with discounted cost and a random discount rate in Borel spaces. We establish the dynamic programming algorithm for both the finite- and infinite-horizon cases. We provide conditions for the existence of measurable selectors, and we illustrate the results with a consumption-investment example. This research was partially supported by the PROMEP grant 103.5/05/40.
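The finite-horizon dynamic programming algorithm mentioned above can be sketched for the finite state-action case (the paper works in Borel spaces; the finite case only illustrates the backward recursion). All transition and cost data below are hypothetical:

```python
import numpy as np

def finite_horizon_dp(P, c, beta, T):
    """Backward induction for a finite-horizon discounted MDP.

    P[a][s, s'] : transition probabilities under action a
    c[s, a]     : one-stage cost
    beta        : discount factor in (0, 1)
    T           : horizon length
    Returns the stage-0 value function and a policy for each stage.
    """
    n_states, n_actions = c.shape
    V = np.zeros(n_states)                       # terminal value V_T = 0
    policy = np.zeros((T, n_states), dtype=int)
    for t in reversed(range(T)):
        # Q[s, a] = c(s, a) + beta * E[V(s') | s, a]
        Q = c + beta * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        policy[t] = Q.argmin(axis=1)             # minimizing selector at stage t
        V = Q.min(axis=1)
    return V, policy

# Toy two-state, two-action example (hypothetical data)
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
c = np.array([[1.0, 2.0], [3.0, 0.5]])
V, policy = finite_horizon_dp(P, c, beta=0.95, T=50)
```

The infinite-horizon value is obtained in the paper as the limit of such finite-horizon iterates.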

5.
This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
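For the finite state-action case mentioned at the end of the abstract, the occupation measure linear program can be sketched as follows. For simplicity this sketch uses a discounted occupation measure with a single constant discount factor rather than the paper's first passage criterion with state-dependent discounts, and all numerical data are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical finite constrained MDP: 2 states, 2 actions
n_s, n_a = 2, 2
P = np.array([[[0.9, 0.1], [0.5, 0.5]],     # P[s, a, s']
              [[0.2, 0.8], [0.6, 0.4]]])
c = np.array([[1.0, 2.0], [3.0, 0.5]])      # main cost
d = np.array([[0.0, 1.0], [1.0, 0.0]])      # constraint cost
beta, kappa = 0.9, 2.0                      # discount factor, constraint bound
alpha = np.array([0.5, 0.5])                # initial distribution

# Decision variables: occupation measure mu(s, a), flattened to length n_s*n_a.
# Balance equations: sum_a mu(s',a) - beta * sum_{s,a} P(s'|s,a) mu(s,a) = alpha(s')
A_eq = np.zeros((n_s, n_s * n_a))
for sp in range(n_s):
    for s in range(n_s):
        for a in range(n_a):
            A_eq[sp, s * n_a + a] = float(s == sp) - beta * P[s, a, sp]

res = linprog(c.ravel(),
              A_ub=d.ravel()[None, :], b_ub=[kappa],   # d-cost constraint
              A_eq=A_eq, b_eq=alpha, bounds=(0, None))
mu = res.x.reshape(n_s, n_a)
# An optimal (possibly randomized) stationary policy: pi(a|s) proportional to mu(s, a)
pi = mu / mu.sum(axis=1, keepdims=True)
```

The equality constraints force the total mass of mu to be 1/(1-beta), and any feasible mu is the discounted occupation measure of the stationary policy pi recovered in the last line.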

6.
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
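The truncation technique can be illustrated on a simple discounted queueing model (the rates and costs below are hypothetical, and the paper's constraints are omitted): the MDP on the countable state space {0, 1, 2, …} is replaced by its restriction to {0, …, N}, with transitions beyond N reflected back into the truncated space, and value iteration is run on the finite model.

```python
import numpy as np

def truncated_queue_mdp(N, lam=0.3, mu_rates=(0.2, 0.6), costs=(0.0, 1.5),
                        beta=0.9, hold=1.0, tol=1e-10):
    """Value iteration on a finite truncation {0, ..., N} of a countable-state
    queueing MDP (hypothetical data): action a selects a service rate
    mu_rates[a], paying costs[a] per stage plus a holding cost per waiting job.
    Arrivals at the boundary state N are reflected back to N."""
    V = np.zeros(N + 1)
    while True:
        V_new = np.empty_like(V)
        for s in range(N + 1):
            q_vals = []
            for srv, cst in zip(mu_rates, costs):
                up = min(s + 1, N)               # arrival, truncated at N
                down = max(s - 1, 0)             # service completion
                srv_eff = srv if s > 0 else 0.0  # no service in empty queue
                stay = 1.0 - lam - srv_eff
                exp_next = lam * V[up] + srv_eff * V[down] + stay * V[s]
                q_vals.append(hold * s + cst + beta * exp_next)
            V_new[s] = min(q_vals)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Values at small states stabilize as the truncation level N grows
v10 = truncated_queue_mdp(10)
v20 = truncated_queue_mdp(20)
```

Comparing `v10` and `v20` on their common states gives a numerical sense of the truncation error that the paper bounds analytically.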

7.
Sufficient conditions are given for the optimal control of Markov processes when the control policy is stationary and the process possesses a stationary distribution. The costs are unbounded and additive, and may or may not be discounted. Applications to Semi-Markov processes are included, and the results for random walks are related to the author's previous papers on diffusion processes.

8.
MARKOV DECISION PROGRAMMING WITH CONSTRAINTS
Liu Jianyong (刘建庸); Liu Ke (刘克) (Institute of Applied Mathematics, the Chinese Academy of Sciences, ...)

9.
In the theory and applications of Markov decision processes introduced by Howard and subsequently developed by many authors, it is assumed that actions can be chosen independently at each state. A policy constrained Markov decision process is one where selecting a given action in one state restricts the choice of actions in another. This note describes a method for determining a maximal gain policy in the policy constrained case. The method involves the use of bounds on the gain of the feasible policies to produce a policy ranking list. This list then forms a basis for a bounded enumeration procedure which yields the optimal policy.

10.
In this paper, we study the optimal ergodic control problem with minimum variance for a general class of controlled Markov diffusion processes. To this end, we follow a lexicographical approach. Namely, we first identify the class of average optimal control policies, and then within this class, we search policies that minimize the limiting average variance. To do this, a key intermediate step is to show that the limiting average variance is a constant independent of the initial state. Our proof of this latter fact gives a result stronger than the central limit theorem for diffusions. An application to manufacturing systems illustrates our results.

11.
12.
In this paper we consider the problem of optimal stopping and continuous control on some local parameters of piecewise-deterministic Markov processes (PDPs). Optimality equations are obtained in terms of a set of variational inequalities as well as on the first jump time operator of the PDP. It is shown that if the final cost function is absolutely continuous along trajectories, then so is the value function of the optimal stopping problem with continuous control. These results unify and generalize previous ones in the current literature.

13.
14.
In this paper, we study the infinite-horizon expected discounted continuous-time optimal control problem for Piecewise Deterministic Markov Processes with both impulsive and gradual (also called continuous) controls. The set of admissible control strategies is supposed to be formed by policies possibly randomized and depending on the past history of the process. We assume that the gradual control acts on the jump intensity and on the transition measure, but not on the flow. The so-called Hamilton–Jacobi–Bellman (HJB) equation associated to this optimization problem is analyzed. We provide sufficient conditions for the existence of a solution to the HJB equation and show that the solution is in fact unique and coincides with the value function of the control problem. Moreover, an optimal control strategy is proven to exist that is stationary and non-randomized.

15.
We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network.

16.
We study a time-non-homogeneous Markov process which arose from free probability, and which also appeared in the study of stochastic processes with linear regressions and quadratic conditional variances. Our main result is the explicit expression for the generator of the (non-homogeneous) transition operator acting on functions that extend analytically to complex domains.

17.
We consider the extinction events of Galton–Watson processes with countably infinitely many types. In particular, we construct truncated and augmented Galton–Watson processes with finite but increasing sets of types. A pathwise approach is then used to show that, under some sufficient conditions, the corresponding sequence of extinction probability vectors converges to the global extinction probability vector of the Galton–Watson process with countably infinitely many types. Besides giving rise to a family of new iterative methods for computing the global extinction probability vector, our approach paves the way to new global extinction criteria for branching processes with countably infinitely many types.
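A minimal sketch of the truncation idea, for a hypothetical offspring law (not one from the paper): each type-i individual produces, independently, a Poisson(0.6) number of type-i children and a Poisson(0.5) number of type-(i+1) children. Types above the truncation level N are discarded, which makes extinction easier, so the monotone fixed-point iteration q = f(q) below converges to an upper bound on (and, as N grows, an approximation of) the extinction probabilities.

```python
import numpy as np

def extinction_probs(N, n_iter=2000):
    """Fixed-point iteration q = f(q) for the extinction probability vector of
    a truncated multitype Galton-Watson process with types {0, ..., N}.
    Hypothetical offspring law: a type-i individual has a Poisson(0.6) number
    of type-i children and a Poisson(0.5) number of type-min(i+1, N) children
    (children of types above N are absorbed into type N here)."""
    q = np.zeros(N + 1)  # start from 0; iterates increase to the minimal fixed point
    for _ in range(n_iter):
        q_new = np.empty_like(q)
        for i in range(N + 1):
            j = min(i + 1, N)
            # pgf of independent Poisson variables: exp(lam * (s - 1))
            q_new[i] = np.exp(0.6 * (q[i] - 1.0)) * np.exp(0.5 * (q[j] - 1.0))
        q = q_new
    return q

q50 = extinction_probs(50)
```

Since each type has mean total offspring 1.1 > 1, the process is supercritical and every entry of the extinction probability vector lies strictly below 1.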

18.
Recent results for parameter-adaptive Markov decision processes (MDP's) are extended to partially observed MDP's depending on unknown parameters. These results include approximations converging uniformly to the optimal reward function and asymptotically optimal adaptive policies. This research was supported in part by the Consejo del Sistema Nacional de Educación Tecnologica (COSNET) under Grant 178/84, in part by the Air Force Office of Scientific Research under Grant AFOSR-84-0089, in part by the National Science Foundation under Grant ECS-84-12100, and in part by the Joint Services Electronics Program under Contract F49602-82-C-0033.

19.
We are concerned with Markov decision processes with Borel state and action spaces; the transition law and the reward function depend on an unknown parameter. In this framework, we study the recursive adaptive nonstationary value iteration policy, which is proved to be optimal under the same conditions usually imposed to obtain the optimality of other well-known nonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grants PCEXCNA-050156 and A128CCOEO550, and in part by the Third World Academy of Sciences under Grant TWAS RG MP 898-152.

20.
The paper deals with a class of discrete-time Markov control processes with Borel state and action spaces, and possibly unbounded one-stage costs. The processes are given by the recurrent equations x_{t+1} = F(x_t, a_t, ξ_t), t = 1, 2, …, with i.i.d. ℝ^k-valued random vectors ξ_t whose density ρ is unknown. Assuming observability of ξ_t, and taking advantage of the procedure of statistical estimation of ρ used in a previous work by the authors, we construct an average cost optimal adaptive policy. Received March / Revised version October 1997
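A rough sketch of the plug-in idea behind such adaptive policies: estimate the unknown noise density from the observed disturbances, then solve the control problem for the estimated model. The sketch below uses a histogram density estimate on a state grid and, for simplicity, a discounted criterion instead of the paper's average cost criterion; the system x_{t+1} = x_t + a_t + ξ_t, the quadratic cost, and all numerical data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Disturbances the controller has observed; their density rho is "unknown"
observed_xi = rng.normal(0.0, 1.0, size=5000)

# Step 1: estimate rho with a histogram (standing in for the statistical
# estimation procedure used in the paper) and discretize the noise law.
edges = np.linspace(-4, 4, 41)
hist, _ = np.histogram(observed_xi, bins=edges, density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
weights = hist * np.diff(edges)
weights /= weights.sum()

# Step 2: value iteration for the plug-in model on a truncated state grid,
# with one-stage cost x^2 + a^2 and next state clipped to the grid range.
states = np.linspace(-5, 5, 51)
actions = np.array([-1.0, 0.0, 1.0])
beta = 0.9
V = np.zeros_like(states)
for _ in range(300):
    Q = np.empty((states.size, actions.size))
    for ai, a in enumerate(actions):
        nxt = np.clip(states[:, None] + a + mids[None, :], -5, 5)
        # index of the nearest grid point for each possible next state
        idx = np.abs(nxt[:, :, None] - states[None, None, :]).argmin(axis=2)
        Q[:, ai] = states**2 + a**2 + beta * (V[idx] * weights).sum(axis=1)
    V = Q.min(axis=1)
policy = Q.argmin(axis=1)  # greedy policy for the estimated model
```

As more disturbances are observed, the density estimate improves and the plug-in policy approaches optimality, which is the mechanism the paper makes precise for the average cost criterion.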


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号