Similar Literature
20 similar documents found.
1.
We consider a Markov decision process with a Borel state space, a countable action space, finite action sets, bounded rewards, and a bounded transition density satisfying a simultaneous Doeblin condition. The existence of stationary strong 0-discount optimal policies is proved. Supported by NSF grant DMS-9404177.
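For orientation, one common minorization form of a simultaneous Doeblin-type condition (a sketch in our notation; the paper's exact formulation may differ) asks for a probability measure ν, a constant ε > 0, and an integer m ≥ 1 such that, uniformly over all stationary policies f,

$$ P_f^m(x, B) \ge \varepsilon\, \nu(B) \qquad \text{for all states } x \text{ and measurable sets } B, $$

and strong 0-discount optimality of a policy π* is usually taken to mean

$$ \liminf_{\beta \uparrow 1} \bigl[ V_\beta(\pi^*, x) - V_\beta(\pi, x) \bigr] \ge 0 \qquad \text{for every policy } \pi \text{ and state } x, $$

where V_β denotes the expected β-discounted reward.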

2.
For average-reward Markov decision processes with a countable state set, nonempty decision sets, and unbounded rewards, this paper proposes a new set of conditions under which (ε-)optimal stationary policies exist and, whenever the sum appearing in the optimality inequality is well defined, the optimality inequality also holds.

3.
The control of piecewise-deterministic processes is studied where only local boundedness of the data is assumed; moreover, the discount rate may be zero. The value function is shown to be a solution of the Bellman equation in a weak sense, yet the solution concept is strong enough to generate optimal policies. Continuity and compactness conditions are given for the existence of nonrelaxed optimal feedback controls.

4.
We study strong average optimality for Markov decision processes (MDPs) with a countable state space, arbitrary action spaces, and non-uniformly bounded costs. Conditions are given under which every commonly used average-optimal policy is also strongly average optimal, substantially extending the main results of Cavazos-Cadena and Fernandez-Gaucherand (Math. Meth. Oper. Res., 1996, 43: 281-300).

5.
This paper concerns nonstationary continuous-time Markov control processes on Polish spaces, with the infinite-horizon discounted cost criterion. Necessary and sufficient conditions are given for a control policy to be optimal and asymptotically optimal. In addition, under suitable hypotheses, it is shown that the successive approximation procedure converges in the sense that the sequence of finite-horizon optimal cost functions and the corresponding optimal control policies both converge.
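Schematically (our notation, not necessarily the paper's exact statement), the convergence of the successive approximation procedure means that the T-horizon optimal cost functions approach the infinite-horizon value,

$$ V_T^*(x) \longrightarrow V^*(x) \quad (T \to \infty), $$

with the corresponding finite-horizon optimal policies converging to an optimal policy for the infinite-horizon discounted problem.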

6.
《Optimization》2012,61(5):767-781
This paper considers Markov decision processes with countable state space, compact action spaces, and a bounded reward function. Under recurrence and connectedness conditions, including the simultaneous Doeblin condition, we prove the existence of bounded solutions of the optimality equations which arise in the multichain case in connection with the average reward criterion and sensitive optimality criteria, and we give a characterization of the sets of n-average optimal decision rules.
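For reference, in one standard discrete-time form (our notation, a sketch rather than the paper's exact system), the multichain average-reward optimality equations are

$$ g(x) = \max_{a \in A(x)} \sum_y p(y \mid x, a)\, g(y), \qquad g(x) + h(x) = \max_{a \in A^*(x)} \Bigl[ r(x, a) + \sum_y p(y \mid x, a)\, h(y) \Bigr], $$

where g is the optimal average reward (gain), h is a bias function, and A^*(x) is the set of actions attaining the maximum in the first equation.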

7.
In this paper we study discrete-time Markov decision processes with the average expected cost (AEC) criterion and discount-sensitive criteria in Borel state and action spaces. The costs may have neither upper nor lower bounds. We propose a new set of conditions on the system's primitive data under which we prove (1) that AEC optimality and strong −1-discount optimality are equivalent; (2) a condition equivalent to the existence of strong 0-discount optimal stationary policies; and (3) the existence of strong n-discount optimal stationary policies for n = −1, 0. Our conditions are weaker than those in the previous literature. In particular, the "stochastic monotonicity condition" of this paper is used here for the first time to study strong n (n = −1, 0)-discount optimality. Moreover, we provide a new approach to prove the existence of strong 0-discount optimal stationary policies, slightly different from those in the previous literature. Finally, we apply our results to an inventory system and a controlled queueing system.
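As a reminder (a standard formulation in our notation, not necessarily the paper's), a stationary policy π* is strong n-discount optimal (n = −1, 0) if

$$ \liminf_{\alpha \uparrow 1}\, (1 - \alpha)^{-n} \bigl[ V_\alpha(\pi, x) - V_\alpha(\pi^*, x) \bigr] \ge 0 $$

for every policy π and state x, where V_α is the expected α-discounted cost; n = −1 essentially recovers average-cost optimality, while n = 0 is a more selective criterion.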

8.
In recent years, sufficient optimality criteria and solution stability in optimal control have been investigated widely and used in the analysis of discrete numerical methods. These results were concerned mainly with weak local optima, whereas strong optimality has often been regarded as a purely theoretical matter. In this paper, we show via an example problem how weak a weak local optimum can be, and we derive new strong optimality conditions. The criteria are suitable for practical verification and can be applied to the case of discontinuous controls with changes in the set of active constraints.

9.
This paper studies the expected total cost (ETC) criterion for discrete-time Markov control processes on Borel spaces, with possibly unbounded cost-per-stage functions. It presents optimality results which include conditions for a control policy to be ETC-optimal and for the ETC value function to be a solution of the dynamic programming equation. Conditions are also given for the ETC value function to be the limit of the α-discounted cost value function as α ↑ 1, and for the Markov control process to be "stable" in the sense of Lagrange and almost surely. In addition, transient control models are fully analyzed. The paper thus provides a fairly complete, updated, survey-like presentation of the ETC criterion for Markov control processes on Borel spaces.
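In symbols (our notation), the two criteria involved are

$$ V(\pi, x) = \mathbb{E}_x^\pi \Bigl[ \sum_{t=0}^{\infty} c(x_t, a_t) \Bigr], \qquad V_\alpha(\pi, x) = \mathbb{E}_x^\pi \Bigl[ \sum_{t=0}^{\infty} \alpha^t c(x_t, a_t) \Bigr], $$

and the limit statement in the abstract reads V^*(x) = lim_{α↑1} V_α^*(x) under the paper's conditions.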

10.
11.
We consider Markov control processes with Borel state space and Feller transition probabilities, satisfying some generalized geometric ergodicity conditions. We provide a new theorem on the existence of a solution to the average cost optimality equation.
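The average cost optimality equation in question has, in one standard form (our notation),

$$ \rho + h(x) = \min_{a \in A(x)} \Bigl[ c(x, a) + \int_X h(y)\, Q(dy \mid x, a) \Bigr], $$

where ρ is the optimal average cost, h is a relative value (bias) function, and Q is the transition kernel; a measurable minimizing selector then yields an average-cost optimal stationary policy.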

12.
周健伟 (Zhou Jianwei), 《应用数学》, 2000, 13(2): 86-89
Let (X_t) be a Markov process with transition functions, where X_t takes values in the state space (E_t, ξ_t), t ≥ 0, and let f_t be a measurable transformation from (E_t, ξ_t) to a state space (E'_t, ξ'_t). This paper gives sufficient conditions under which (f_t(X_t)) is again a Markov process with transition functions; the analogous question is also discussed for families of Markov processes with transition functions.

13.
Semi-Markov control processes with Borel state space and Feller transition probabilities are considered. The objective of the paper is to prove coincidence of two expected average costs: the time-average and the ratio-average for stationary policies. Moreover, the optimal stationary policy is the same for both criteria.
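Concretely (a sketch in our notation), if c_k denotes the cost incurred at the k-th decision epoch and τ_k the corresponding sojourn time, the two criteria compared are

$$ J_{\text{time}}(\pi, x) = \limsup_{t \to \infty} \frac{1}{t}\, \mathbb{E}_x^\pi [C(t)], \qquad J_{\text{ratio}}(\pi, x) = \limsup_{n \to \infty} \frac{\mathbb{E}_x^\pi \bigl[ \sum_{k=0}^{n} c_k \bigr]}{\mathbb{E}_x^\pi \bigl[ \sum_{k=0}^{n} \tau_k \bigr]}, $$

where C(t) is the cost accumulated up to time t; the paper's point is that the two coincide for stationary policies.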

14.
This paper studies the policy iteration algorithm (PIA) for average cost Markov control processes on Borel spaces. Two classes of MCPs are considered. One of them allows some restricted-growth unbounded cost functions and compact control constraint sets; the other one requires strictly unbounded costs and the control constraint sets may be non-compact. For each of these classes, the PIA yields, under suitable assumptions, the optimal (minimum) cost, an optimal stationary control policy, and a solution to the average cost optimality equation.
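To make the algorithm concrete, here is a minimal policy iteration sketch for the finite-state, finite-action, unichain average-cost special case; it is an illustration under simplifying assumptions only, since the paper itself treats Borel state spaces and unbounded costs. All identifiers (policy_iteration_avg, costs, trans) are ours, not the paper's.

import numpy as np

def policy_iteration_avg(costs, trans, max_iter=1000):
    """Average-cost policy iteration for a finite unichain MDP (a toy
    stand-in for the Borel-space setting discussed above).

    costs: (S, A) array, costs[x, a] = one-stage cost.
    trans: (A, S, S) array, trans[a, x, y] = P(y | x, a).
    Returns (gain, bias, policy).
    """
    S, A = costs.shape
    policy = np.zeros(S, dtype=int)            # start from an arbitrary policy
    for _ in range(max_iter):
        # Policy evaluation: solve g + h(x) = c(x, f(x)) + sum_y P_f(x, y) h(y)
        # together with the normalization h(0) = 0 (S + 1 linear equations).
        P_f = trans[policy, np.arange(S), :]   # (S, S) transition matrix under f
        c_f = costs[np.arange(S), policy]      # (S,) one-stage costs under f
        M = np.zeros((S + 1, S + 1))
        M[:S, 0] = 1.0                         # coefficient of the gain g
        M[:S, 1:] = np.eye(S) - P_f            # coefficients of the bias h
        M[S, 1] = 1.0                          # normalization row: h(0) = 0
        rhs = np.append(c_f, 0.0)
        sol = np.linalg.solve(M, rhs)
        gain, bias = sol[0], sol[1:]
        # Policy improvement: act greedily with respect to the bias h.
        q = costs + np.einsum('axy,y->xa', trans, bias)
        new_policy = np.argmin(q, axis=1)
        if np.all(new_policy == policy):       # stable policy: optimal
            return gain, bias, policy
        policy = new_policy
    return gain, bias, policy

On a small example (say two states, two actions, and a row-stochastic trans), the loop typically stabilizes after a handful of iterations, and gain is then the optimal long-run average cost, mirroring what the PIA is proved to deliver in the general setting of the abstract.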

15.
《Optimization》2012,61(4):773-800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first-passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by a technique of constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth-and-death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.
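For context, one standard formulation (our notation) of the risk-sensitive average cost of a policy π with risk-sensitivity parameter λ > 0 is

$$ J(\pi, x) = \limsup_{T \to \infty} \frac{1}{\lambda T} \log \mathbb{E}_x^\pi \Bigl[ \exp\Bigl( \lambda \int_0^T c(x_t, a_t)\, dt \Bigr) \Bigr], $$

so that, unlike the risk-neutral average cost, variability of the accumulated cost is penalized through the exponential.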

16.
This paper deals with the asymptotic optimality of a stochastic dynamic system driven by a singularly perturbed Markov chain with finite state space. The states of the Markov chain belong to several groups, such that transitions among the states within each group occur much more frequently than transitions among the states in different groups. Aggregating the states of the Markov chain leads to a limit control problem, obtained by replacing the states in each group with the corresponding average distribution. The limit control problem is simpler to solve than the original one. A nearly optimal solution for the original problem is constructed from the optimal solution to the limit problem. To demonstrate, the suggested approach to asymptotically optimal control is applied to examples of manufacturing systems for production planning.
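Schematically (our notation; the paper's setup may differ in details), the chain's generator has the two-time-scale form

$$ Q^{\varepsilon} = \frac{1}{\varepsilon}\, \widetilde{Q} + \widehat{Q}, $$

where $\widetilde{Q}$ drives the fast transitions within groups and $\widehat{Q}$ the slow transitions among groups; as ε → 0, each group collapses to its stationary (average) distribution, which produces the limit control problem described above.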

17.
After an introduction to sensitive criteria in Markov decision processes and a discussion of definitions, we prove the existence of stationary Blackwell optimal policies under the following main assumptions: (i) the state space is a Borel one; (ii) the action space is countable and the action sets are finite; (iii) the transition function is given by a transition density; (iv) a simultaneous Doeblin-type recurrence condition holds. The proof is based on an aggregation of randomized stationary policies into measures. The topology on the space of those measures is simultaneously a weak and a strong one, and this fact yields compactness of the space and continuity of the Laurent coefficients of the expected discounted reward. Another important tool is a lexicographical policy improvement. The exposition is mostly self-contained. Supported by the National Science Foundation.
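As background (a standard formulation in our notation), writing the discount factor as β = 1/(1 + ρ), the expected discounted reward of a stationary policy admits a Laurent expansion

$$ V_\beta(\pi, x) = \sum_{n=-1}^{\infty} \rho^{\,n}\, y_n(\pi, x), $$

and a policy is Blackwell optimal if it is β-discount optimal for all β sufficiently close to 1, which corresponds to lexicographic maximization of the coefficient sequence (y_{−1}, y_0, y_1, ...); this is what the lexicographical policy improvement mentioned above operates on.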

18.
We consider a Markov decision process with a Borel state space, bounded rewards, and a bounded transition density satisfying a simultaneous Doeblin-Doob condition. An asymptotic result for the discounted value function, related to the existence of stationary strong 0-discount optimal policies, is extended from the case of finite action sets to that of compact action sets with rewards and transition densities continuous in the action. Supported by NSF grant DMS-9404177.

19.
Impulsive control of continuous-time Markov processes with risk-sensitive long-run average cost is considered. The most general impulsive control problem is studied under the restriction that impulses occur at dyadic moments only. In the particular case of additive cost for impulses, the impulsive control problem is solved without restrictions on the moments of impulses.

20.
In this article, we study continuous-time Markov decision processes in Polish spaces. The optimality criterion to be maximized is the expected discounted reward. The transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. We provide conditions on the controlled system's primitive data under which we prove, using Feller's construction approach to transition functions, that the transition functions of possibly non-homogeneous continuous-time Markov processes are regular. Then, under continuity and compactness conditions, we prove the existence of optimal stationary policies by means of the extended infinitesimal operators associated with these transition functions, and we also provide a recursive way to compute (or at least approximate) the optimal reward values. The conditions provided in this paper differ from those used in the previous literature, and they are illustrated with an example.
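In symbols (our notation), the criterion maximized is the expected discounted reward

$$ V(\pi, x) = \mathbb{E}_x^\pi \Bigl[ \int_0^\infty e^{-\alpha t}\, r(x_t, a_t)\, dt \Bigr], \qquad \alpha > 0, $$

with a reward rate r that, as the abstract states, may be unbounded above and below.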
