Similar Documents
20 similar documents found.
1.
This paper studies the policy iteration algorithm (PIA) for average-cost Markov control processes (MCPs) on Borel spaces. Two classes of MCPs are considered: one allows restricted-growth unbounded cost functions and compact control-constraint sets; the other requires strictly unbounded costs but allows non-compact control-constraint sets. For each class, the PIA yields, under suitable assumptions, the optimal (minimum) cost, an optimal stationary control policy, and a solution to the average-cost optimality equation.
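For orientation, here is a minimal sketch of the PIA in the simplest finite, unichain setting; the arrays P and c and the normalization h(0) = 0 are illustrative assumptions of this sketch, and the paper's Borel-space, unbounded-cost setting is far more general.

```python
import numpy as np

def evaluate(P, c, policy):
    """Policy evaluation: solve g + h(x) = c(x, pi(x)) + sum_y P(y|x, pi(x)) h(y),
    normalized by h(0) = 0 (assumes the chain under the policy is unichain)."""
    n = P.shape[0]
    Pp = P[np.arange(n), policy]              # n x n transition matrix under pi
    cp = c[np.arange(n), policy]              # one-step costs under pi
    A = np.zeros((n, n))
    A[:, 0] = 1.0                             # coefficient of the gain g
    A[:, 1:] = (np.eye(n) - Pp)[:, 1:]        # coefficients of h(1), ..., h(n-1)
    sol = np.linalg.solve(A, cp)
    return sol[0], np.concatenate(([0.0], sol[1:]))   # gain g, bias h

def policy_iteration(P, c, max_iter=1000):
    """Average-cost policy iteration. P[x, a, y]: transitions; c[x, a]: costs."""
    n, m = c.shape
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate(P, c, policy)
        q = c + np.einsum('xay,y->xa', P, h)  # q(x,a) = c(x,a) + sum_y P(y|x,a) h(y)
        new_policy = policy.copy()
        for x in range(n):                    # improve only on strict gain, to avoid cycling
            if q[x].min() < q[x, policy[x]] - 1e-12:
                new_policy[x] = q[x].argmin()
        if np.array_equal(new_policy, policy):
            return g, h, policy               # converged: optimality equation holds
        policy = new_policy
    return g, h, policy
```

Each iteration solves the Poisson equation of the current policy and then improves greedily; the paper proves that a suitable generalization of this scheme converges in each of its two classes of models.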

2.
This paper deals with semi-Markov decision processes under the average expected criterion. The state and action spaces are Borel spaces, and the cost/reward function may be unbounded both from above and from below. We give a new set of conditions under which the existence of an optimal (deterministic) stationary policy is proved by a new technique based on two average optimality inequalities. Our conditions are slightly weaker than those in the existing literature, and new sufficient conditions for verifying our assumptions are given in terms of the primitive data of the model. Finally, we illustrate our results with three examples.
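For orientation, in a semi-Markov model with one-step cost c(x,a), mean holding time τ(x,a), and transition kernel Q, the two average optimality inequalities underlying this technique are typically of the form (our notation; the paper's exact statement may differ):

\[ h_1(x) \ge \inf_{a \in A(x)} \Big\{ c(x,a) - g\,\tau(x,a) + \int_X h_1(y)\, Q(dy \mid x,a) \Big\}, \]
\[ h_2(x) \le \inf_{a \in A(x)} \Big\{ c(x,a) - g\,\tau(x,a) + \int_X h_2(y)\, Q(dy \mid x,a) \Big\}. \]

The first inequality shows that a stationary policy attaining the infimum has average cost at most g; the second shows that no policy can do better than g, so together they identify g as the optimal average cost.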

3.
This paper deals with Markov decision processes (MDPs) on Borel spaces with possibly unbounded costs. The criterion to be optimized is the expected total cost over a random horizon with infinite support. It is observed that this performance criterion is equivalent to the expected total discounted cost over an infinite horizon with a time-varying discount factor. The optimal value function and the optimal policy are then characterized through suitable versions of the dynamic programming equation. Moreover, it is proved that the optimal value function of the random-horizon control problem is bounded from above by the optimal value function of a discounted control problem with a fixed discount factor, where the fixed factor is defined in terms of the parameters introduced for the random-horizon problem. To illustrate the theory, a random-horizon version of the linear-quadratic model and a logarithmic consumption-investment model are presented.
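The equivalence invoked here is a one-line computation. Assuming the horizon T ≥ 0 is independent of the controlled process (an assumption of this sketch, with P(T ≥ 0) = 1):

\[ \mathbb{E}\Big[\sum_{t=0}^{T} c_t\Big] = \sum_{t=0}^{\infty} P(T \ge t)\,\mathbb{E}[c_t] = \sum_{t=0}^{\infty} \Big(\prod_{s=1}^{t} \alpha_s\Big)\,\mathbb{E}[c_t], \qquad \alpha_s = \frac{P(T \ge s)}{P(T \ge s-1)}, \]

so the random-horizon total cost is an infinite-horizon cost with time-varying discount factors α_s. If moreover α_s ≤ ᾱ < 1 for all s, the whole expression is dominated by a fixed-discount problem with factor ᾱ, which is the upper bound described above.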

4.
In this paper, we show that a discounted continuous-time Markov decision process on Borel spaces with randomized history-dependent policies, arbitrarily unbounded transition rates, and a non-negative reward rate is equivalent to a discrete-time Markov decision process. Based on a completely new proof, which does not involve Kolmogorov's forward equation, it is shown that the value function of both models is given by the minimal non-negative solution of the same Bellman equation. A verifiable necessary and sufficient condition for the finiteness of this value function is given, which induces a new condition for the non-explosion of the underlying controlled process.
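For readers unfamiliar with the reduction, the shared Bellman equation is typically of the following form for a CTMDP with discount rate α > 0, reward rate r ≥ 0, and transition rates q(dy|x,a), writing q_x(a) = −q({x}|x,a) (a sketch in standard notation; the paper's precise statement may differ):

\[ V(x) = \sup_{a \in A(x)} \frac{r(x,a) + \int_{X \setminus \{x\}} V(y)\, q(dy \mid x,a)}{\alpha + q_x(a)}, \]

which is the optimality equation of a discrete-time model with state-action-dependent discount factor q_x(a)/(α + q_x(a)); since r ≥ 0, V is then the minimal non-negative solution.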

5.
For average-criterion Markov decision processes with a countable state set, nonempty action sets, and unbounded rewards, this paper proposes a new set of conditions under which an (ε-)optimal stationary policy exists; moreover, the optimality inequality holds whenever the sum appearing in it is well defined.

6.
This paper provides a new algorithm for obtaining approximately optimal policies for infinite-horizon discounted Markov decision processes, and establishes some of its properties. The algorithm is based on the fact that the optimal value function is the unique vector-minimum element of the superharmonic set.
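The superharmonic characterization is what underlies the linear-programming view of discounted MDPs: V* is the componentwise-smallest v satisfying v(x) ≥ r(x,a) + β Σ_y P(y|x,a) v(y) for all (x,a). Here is a minimal sketch of that characterization for a finite model (using scipy; this illustrates the underlying fact, not the paper's specific algorithm).

```python
import numpy as np
from scipy.optimize import linprog

def superharmonic_lp(P, r, beta):
    """Return V* as the vector minimum of the superharmonic set.

    P[x, a, y]: transition probabilities; r[x, a]: one-step rewards; 0 < beta < 1.
    Superharmonic set: { v : v(x) >= r(x,a) + beta * sum_y P[x,a,y] v(y) for all x,a }.
    """
    n, m, _ = P.shape
    A_ub = np.zeros((n * m, n))
    b_ub = np.zeros(n * m)
    for x in range(n):
        for a in range(m):
            row = beta * P[x, a].copy()    # beta * P(. | x, a) dotted with v ...
            row[x] -= 1.0                  # ... minus v(x): beta*Pv - v <= -r
            A_ub[x * m + a] = row
            b_ub[x * m + a] = -r[x, a]
    # Minimizing the sum of v over the superharmonic polyhedron picks out its
    # unique vector minimum, which is the optimal value function V*.
    res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
    return res.x
```

An approximately optimal policy can then be read off by choosing, at each state, an action that comes within ε of attaining equality in the corresponding constraint.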

7.
8.
9.
A discrete-time discounted multi-objective Markov decision model is defined. Under the weighted criterion, the existence of an (n, ∞)-optimal Markov policy is proved; under the lexicographic criterion, structural properties of optimal policies are used to reduce the optimization problem to a sequence of single-objective optimization problems.

10.
郭先平, 戴永隆. 《数学学报》, 2002, 45(1): 171-182
This paper considers the discounted model of continuous-time Markov decision processes with an arbitrary family of transition rates and possibly unbounded cost rate functions. It abandons the traditional requirement that the Q-process corresponding to each policy be unique, and is the first to treat the case where the Q-process under a policy need not be unique, the family of transition rates need not be conservative, the cost rate function may be unbounded, and the action space is an arbitrary nonempty set. The traditional α-discounted cost optimality equation is replaced, for the first time, by an "α-discounted cost optimality inequality". Using this inequality and new methods, the paper not only proves the traditional main result, the existence of an optimal stationary policy, but also establishes the existence of (ε > 0)-optimal stationary policies, of optimal stationary policies with monotonicity properties, and of (ε ≥ 0)-optimal decision processes, obtaining several meaningful new results. Finally, an example of a birth-and-death system with controlled migration rates is provided; it satisfies all the conditions of this paper, while the traditional assumptions (see [1-14]) all fail.
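In the notation usual for this literature, the α-discounted cost optimality equation being relaxed reads αV(x) = inf over a in A(x) of { c(x,a) + ∫ V(y) q(dy|x,a) }; the optimality inequality keeps only one direction, typically (our sketch, not necessarily the paper's exact statement)

\[ \alpha V(x) \ge \inf_{a \in A(x)} \Big\{ c(x,a) + \int_X V(y)\, q(dy \mid x,a) \Big\}, \]

which still suffices to prove that a stationary policy attaining the infimum is α-discount optimal, even when the Q-process under a policy is not unique.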

11.
Impulsive control of continuous-time Markov processes with risk-sensitive long-run average cost is considered. The most general impulsive control problem is studied under the restriction that impulses occur only at dyadic moments. In the particular case of additive cost for impulses, the impulsive control problem is solved without restrictions on the moments of the impulses.

12.
In this paper we are concerned with the existence of optimal stationary policies for infinite-horizon risk-sensitive Markov control processes with denumerable state space, unbounded cost function, and long-run average cost. Introducing a discounted-cost dynamic game, we prove that its value function satisfies an Isaacs equation, and we study its relationship with the risk-sensitive control problem. Using the vanishing discount approach, we prove that the risk-sensitive dynamic programming inequality holds, and we derive an optimal stationary policy.
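In the denumerable-state setting, the risk-sensitive average-cost dynamic programming relation referred to is typically written multiplicatively, for risk factor γ > 0, as (a sketch in our notation; the paper's exact form may differ)

\[ e^{\gamma(\lambda + h(x))} = \inf_{a \in A(x)} \Big\{ e^{\gamma c(x,a)} \sum_{y} p(y \mid x,a)\, e^{\gamma h(y)} \Big\}, \]

with λ the optimal risk-sensitive average cost; the inequality established in the paper relaxes the equality to one direction, which is still enough to extract an optimal stationary policy.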

13.
This paper considers the expected total reward criterion for nonhomogeneous Markov decision processes with a countable state space and finite action sets. Unlike previous work, we transform the nonhomogeneous Markov decision process into a homogeneous one by enlarging the state space, and thereby obtain, in a very concise way, the main results previously derived by traditional methods.
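The enlargement is simple to state: adjoin the decision epoch to the state. In our notation, if P_t(y|x,a) and r_t(x,a) are the nonhomogeneous transition laws and rewards, define a homogeneous model by

\[ \tilde{X} = \mathbb{N} \times X, \qquad \tilde{P}\big((t+1, y) \mid (t, x), a\big) = P_t(y \mid x, a), \qquad \tilde{r}\big((t, x), a\big) = r_t(x, a), \]

so that every time-dependent quantity of the original model becomes a stationary one, and the homogeneous theory applies verbatim to recover the expected-total-reward results.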

14.
Journal of Optimization Theory and Applications - This work concerns semi-Markov decision chains evolving on a finite state space. The controller has a positive and constant risk sensitivity...

15.
In this article, we study a risk-sensitive control problem whose state dynamics are given by a controlled continuous-time Markov chain. Using a multiplicative dynamic programming principle together with the atomic structure of the state dynamics, we prove the existence of an optimal risk-sensitive control and give a characterization of it, under geometric ergodicity of the state dynamics and a smallness condition on the running cost.

16.
This paper concerns nonstationary continuous-time Markov control processes on Polish spaces under the infinite-horizon discounted cost criterion. Necessary and sufficient conditions are given for a control policy to be optimal and asymptotically optimal. In addition, under suitable hypotheses, it is shown that the successive approximation procedure converges, in the sense that the sequence of finite-horizon optimal cost functions and the corresponding optimal control policies both converge.

17.
This paper deals with the expected discounted continuous control of piecewise deterministic Markov processes (PDMPs), using a singular perturbation approach to deal with rapidly oscillating parameters. The state space of the PDMP is written as the product of a finite set and a subset of the Euclidean space ℝⁿ. The discrete part of the state, called the regime, characterizes the mode of operation of the physical system under consideration, and is supposed to have a fast behavior (associated with a small parameter ε > 0) and a slow one. Using an approach similar to that developed in Yin and Zhang (Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach, Applications of Mathematics, vol. 37, Springer, New York, 1998, Chaps. 1 and 3), the idea of this paper is to reduce the number of regimes by considering an averaged model in which the regimes within the same class are aggregated through the quasi-stationary distribution, so that the different states in each class are replaced by a single one. The main goal is to show that the value function of the control problem for the system driven by the perturbed Markov chain converges, as ε goes to zero, to the value function of this limit control problem. This convergence is obtained by, roughly speaking, showing that the infimum and supremum limits of the value functions satisfy two optimality inequalities as ε goes to zero, which allows the result to be established via a uniqueness argument, without any kind of Lipschitz continuity condition.

18.
The paper describes a bound on the optimal value of a Markov decision process based on the iterates of the value iteration algorithm. Previous bounds have depended on the last two iterates; here we consider a bound which depends on the last three. We show that, in a maximisation problem, this leads to a better lower bound on the optimal value.
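The classical two-iterate (MacQueen-type) bounds that the paper refines are easy to compute alongside value iteration; here is a minimal sketch for a finite discounted maximisation problem (the three-iterate bound itself is the paper's contribution and is not reproduced here).

```python
import numpy as np

def value_iteration_with_bounds(P, r, beta, tol=1e-8):
    """Value iteration with the classical two-iterate bounds on v*.

    P[x, a, y]: transition probabilities; r[x, a]: rewards; 0 < beta < 1.
    After each sweep, with d = v_new - v_old:
        v_new + beta/(1-beta) * min(d)  <=  v*  <=  v_new + beta/(1-beta) * max(d).
    """
    v = np.zeros(P.shape[0])
    while True:
        q = r + beta * np.einsum('xay,y->xa', P, v)    # one-step lookahead values
        v_new = q.max(axis=1)                          # maximisation problem
        d = v_new - v
        lower = v_new + beta / (1.0 - beta) * d.min()  # lower bound on v*
        upper = v_new + beta / (1.0 - beta) * d.max()  # upper bound on v*
        if (upper - lower).max() < tol:                # gap is constant across states
            return v_new, lower, upper, q.argmax(axis=1)
        v = v_new
```

The paper's three-iterate bound additionally retains the previous difference vector and is shown to improve the lower bound above.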

19.
20.
A model for switching between gilt-edged securities is developed using a modified version of Howard's Markov decision algorithm. The model makes use of empirical observations of the behaviour of relative price movements. It produced some interesting results in the theory of Markov decision processes, and empirical tests of implementation methods, which allow for constraints not included in the formal model, showed very promising results.
