Similar Articles
 20 similar articles found
1.
This paper considers the mean-variance optimization problem for the discounted model of continuous-time Markov decision processes. Both the state space and the action space are assumed to be Polish spaces, and the transition rates and reward rates may be unbounded. The optimization goal is to select, within the class of discount-optimal stationary policies, a policy whose corresponding variance is minimal. We seek conditions under which a mean-variance optimal policy exists for Markov decision processes on Polish spaces. Using a first-passage decomposition, we show that the mean-variance optimization problem can be transformed into an "equivalent" expected discounted optimization problem, and thereby obtain an "optimality equation" for the mean-variance problem, the existence of a mean-variance optimal policy, and its characterization. Finally, several examples illustrate the non-uniqueness of discount-optimal policies and the existence of a mean-variance optimal policy.
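Under a fixed stationary policy, the mean and the variance of the discounted total reward (the quantities the mean-variance criterion compares across discount-optimal policies) satisfy a pair of linear equations. A minimal finite-state sketch; the two-state chain and all numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state Markov reward chain induced by a fixed stationary policy.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])        # transition matrix under the policy
r = np.array([1.0, 2.0])          # reward received in each state
alpha = 0.9                       # discount factor

I = np.eye(2)
# First moment m(s) = E[sum_t alpha^t r | s0 = s] solves m = r + alpha * P m.
m = np.linalg.solve(I - alpha * P, r)
# Second moment w(s) = E[(sum_t alpha^t r)^2 | s0 = s] solves
#   w = r^2 + 2 alpha r * (P m) + alpha^2 * P w.
w = np.linalg.solve(I - alpha**2 * P, r**2 + 2 * alpha * r * (P @ m))
variance = w - m**2               # variance of the discounted total reward
```

The mean-variance problem then amounts to minimizing `variance` over the policies whose `m` attains the discounted optimum.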

2.
1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's, see for instance, [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

3.
We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network.

4.
Using the concept of random fuzzy variables from credibility theory, we formulate a credibilistic model for unichain Markov decision processes under average criteria. A credibilistically optimal policy is defined and obtained by solving the corresponding non-linear mathematical programming problem. We also give a computational example to illustrate the effectiveness of the new model.

5.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
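For a finite model, an iteration algorithm of the kind the abstract refers to can be sketched as value iteration in which each state carries its own discount factor. The two-state, two-action MDP and all numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP with state-dependent discount factors.
P = np.array([[[0.9, 0.1],     # P[a, s, s']: next-state distributions
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
r = np.array([[1.0, 0.5],      # r[s, a]: one-step rewards
              [0.0, 2.0]])
alpha = np.array([0.8, 0.95])  # discount factor depends on the current state

V = np.zeros(2)
for _ in range(1000):
    # Q[s, a] = r(s, a) + alpha(s) * sum_j P(j | s, a) V(j)
    Q = r + alpha[:, None] * np.einsum('asj,j->sa', P, V)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-12:
        break
    V = V_new
policy = Q.argmax(axis=1)      # a stationary policy attaining the maximum
```

Since every `alpha[s]` is below 1, the update is still a contraction and the iteration converges geometrically.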

6.
7.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation for an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.

8.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend use of this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
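A minimal sketch of average-reward policy iteration with unichain evaluation, on a hypothetical two-state, two-action MDP in which every stationary deterministic policy happens to be unichain (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; both actions induce irreducible chains,
# so every stationary deterministic policy is unichain.
P = np.array([[[0.5, 0.5],     # P[a, s, s']
               [0.3, 0.7]],
              [[0.9, 0.1],
               [0.6, 0.4]]])
r = np.array([[0.0, 1.0],      # r[s, a]
              [2.0, 0.5]])
S = 2

def evaluate(d):
    """Solve g + h = r_d + P_d h with the normalization h[0] = 0."""
    Pd = P[d, np.arange(S)]             # transition rows chosen by policy d
    rd = r[np.arange(S), d]
    M = np.zeros((S + 1, S + 1))
    M[:S, 0] = 1.0                      # coefficient of the gain g
    M[:S, 1:] = np.eye(S) - Pd          # coefficients of the bias h
    M[S, 1] = 1.0                       # normalization h[0] = 0
    x = np.linalg.solve(M, np.append(rd, 0.0))
    return x[0], x[1:]                  # gain, bias

d = np.zeros(S, dtype=int)              # start from an arbitrary policy
for _ in range(50):
    g, h = evaluate(d)
    Q = r + np.einsum('asj,j->sa', P, h)
    # Keep the current action wherever it is already greedy (avoids cycling on ties).
    d_new = np.where(Q[np.arange(S), d] >= Q.max(axis=1) - 1e-12,
                     d, Q.argmax(axis=1))
    if np.array_equal(d_new, d):
        break
    d = d_new
```

At termination the pair (g, h) satisfies the average-reward optimality equation g + h(s) = max_a [r(s, a) + Σ_j P(j | s, a) h(j)].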

9.
We consider a finite-state Markov decision process in which two rewards are associated with each action in each state. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they depend on the starting state. Also, a finite algorithm for computing the solution is given.

10.
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.

11.
12.
We consider the minimizing risk problems in discounted Markov decision processes with countable state space and bounded general rewards. We characterize optimal values for the finite and infinite horizon cases and give two sufficient conditions for the existence of an optimal policy in the infinite horizon case. These conditions are closely connected with Lemma 3 in White (1993), which is not correct, as Wu and Lin (1999) point out. We obtain a condition under which the lemma is true, and under this condition we show that there is an optimal policy. Under another condition we show that the optimal value is the unique solution to some optimality equation and that there is an optimal policy on a transient set.

13.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.

14.
15.
This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control is continuously acting in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed.

16.
Recent results for parameter-adaptive Markov decision processes (MDP's) are extended to partially observed MDP's depending on unknown parameters. These results include approximations converging uniformly to the optimal reward function and asymptotically optimal adaptive policies. This research was supported in part by the Consejo del Sistema Nacional de Educación Tecnologica (COSNET) under Grant 178/84, in part by the Air Force Office of Scientific Research under Grant AFOSR-84-0089, in part by the National Science Foundation under Grant ECS-84-12100, and in part by the Joint Services Electronics Program under Contract F49602-82-C-0033.

17.
This note presents a technique that is useful for the study of piecewise deterministic Markov decision processes (PDMDPs) with general policies and unbounded transition intensities. This technique produces an auxiliary PDMDP from the original one. The auxiliary PDMDP possesses certain desired properties, which may not be possessed by the original PDMDP. We apply this technique to risk-sensitive PDMDPs with total cost criteria, and comment on its connection with the uniformization technique.
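When the transition intensities are bounded, the uniformization technique the note mentions replaces the continuous-time process by a discrete-time chain sampled at the events of a fast Poisson clock; the note's auxiliary construction targets precisely the case where no such uniform bound exists. A sketch of the bounded case, with invented generator rates:

```python
import numpy as np

# Hypothetical CTMDP generator rates Q[a, s, s']: off-diagonal entries are
# nonnegative jump rates and each row sums to zero.
Q = np.array([[[-1.0, 1.0],
               [2.0, -2.0]],
              [[-3.0, 3.0],
               [0.5, -0.5]]])
S = 2

# Uniformization constant: any Lam >= the largest exit rate works.
Lam = np.max(-Q[:, np.arange(S), np.arange(S)])
# Discrete-time kernels of the uniformized chain: P = I + Q / Lam.
P = np.eye(S) + Q / Lam
```

Each `P[a]` is then a proper stochastic matrix, and discrete-time MDP machinery applies to the uniformized model.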

18.
19.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μn of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.

20.
We prove the metastable behavior of reversible Markov processes on finite state spaces under minimal conditions on the jump rates. To illustrate the result we deduce the metastable behavior of the Ising model with a small magnetic field at very low temperature.
