Similar Articles
20 similar articles found.
1.
This note describes sufficient conditions under which total-cost and average-cost Markov decision processes (MDPs) with general state and action spaces, and with weakly continuous transition probabilities, can be reduced to discounted MDPs. For undiscounted problems, these reductions imply the validity of optimality equations and the existence of stationary optimal policies. The reductions also provide methods for computing optimal policies. The results are applied to a capacitated inventory control problem with fixed costs and lost sales.
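For orientation, the discounted optimality equation that such reductions lead back to has the standard form below (generic notation, not taken from the paper: β ∈ (0,1) is the discount factor, c the one-stage cost, q the transition kernel):

```latex
V(x) \;=\; \inf_{a \in A(x)} \Big\{ c(x,a) + \beta \int_X V(y)\, q(dy \mid x,a) \Big\},
\qquad x \in X .
```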

2.
The unichain condition requires that every policy in an MDP result in a single ergodic class, and guarantees that the optimal average cost is independent of the initial state. We show that checking whether the unichain condition fails to hold is an NP-complete problem. We conclude with a brief discussion of the merits of the more general weak accessibility condition.
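For a single stationary policy the property is easy to verify: the policy's transition matrix induces a directed graph, and the chain is unichain exactly when that graph has a single closed (recurrent) communicating class, which strongly connected components reveal in polynomial time. The NP-completeness arises from quantifying over the exponentially many deterministic policies. A minimal single-policy check in Python (a sketch, not the paper's construction):

```python
def sccs(adj):
    """Strongly connected components of a digraph via Kosaraju's algorithm."""
    n = len(adj)
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(adj[w])))
                    break
            else:
                order.append(v)
                stack.pop()
    radj = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)
    comp, c = [-1] * n, 0
    for u in reversed(order):
        if comp[u] != -1:
            continue
        comp[u], work = c, [u]
        while work:
            v = work.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = c
                    work.append(w)
        c += 1
    return comp, c

def is_unichain_policy(P, tol=1e-12):
    """True iff the finite Markov chain with transition matrix P has exactly
    one closed communicating class, i.e. a single recurrent class."""
    n = len(P)
    adj = [[j for j in range(n) if P[i][j] > tol] for i in range(n)]
    comp, c = sccs(adj)
    closed = set(range(c))
    for i in range(n):
        for j in adj[i]:
            if comp[j] != comp[i]:
                closed.discard(comp[i])   # an edge leaves this class, so it is not closed
    return len(closed) == 1

# States {0, 1} form the single recurrent class; state 2 is transient.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.8, 0.0],
     [0.3, 0.3, 0.4]]
print(is_unichain_policy(P))   # True
```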

3.
For an infinite-horizon discounted Markov decision process with a finite number of states and actions, this note provides upper bounds on the number of operations required to compute an approximately optimal policy by value iteration, in terms of the discount factor, the spread of the reward function, and the desired closeness to optimality. One of the provided bounds on the number of iterations is a non-decreasing function of the discount factor.
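A minimal finite-MDP sketch of the idea, using the coarse textbook bound gamma**N * ||r|| / (1 - gamma) <= eps on the error after N iterations rather than the note's sharper spread-based bounds; all names and array shapes here are assumptions of this sketch:

```python
import math
import numpy as np

def value_iteration(P, r, gamma, eps):
    """Value iteration for a finite MDP with an a priori iteration bound.

    P has shape (S, A, S), r has shape (S, A), gamma in (0, 1). Starting
    from V_0 = 0, the bound ||V_N - V*|| <= gamma**N * ||r|| / (1 - gamma)
    says N >= log(||r|| / (eps * (1 - gamma))) / log(1 / gamma) iterations
    suffice for an eps-accurate value function (assuming r is not all zero)."""
    S, A = r.shape
    r_max = float(np.abs(r).max())
    n_iter = max(1, math.ceil(math.log(r_max / (eps * (1 - gamma))) / math.log(1 / gamma)))
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = r + gamma * (P @ V)   # Q[s, a] = r[s, a] + gamma * sum_y P[s, a, y] * V[y]
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1), n_iter   # value estimate, greedy policy, iterations used
```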

4.
This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC-optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used for the first time to study the ASPC criterion. The approach provided here also differs slightly from the “optimality equation approach” widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.
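For reference, the ASPC criterion evaluates a policy through the pathwise (almost-sure) average of the incurred costs rather than through the expected average; in generic discrete-time notation (mine, not the paper's):

```latex
J(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} c(x_t, a_t)
\qquad P_x^{\pi}\text{-a.s.}
```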

5.
We consider Markov control processes with Borel state space and Feller transition probabilities, satisfying some generalized geometric ergodicity conditions. We provide a new theorem on the existence of a solution to the average cost optimality equation.
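In generic notation, the average cost optimality equation whose solvability is established reads as follows (standard form; ρ is the optimal average cost and h a relative value function):

```latex
\rho + h(x) \;=\; \inf_{a \in A(x)} \Big\{ c(x,a) + \int_X h(y)\, q(dy \mid x,a) \Big\},
\qquad x \in X .
```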

6.
We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network.

7.
In this paper, we study the infinite-horizon expected discounted continuous-time optimal control problem for Piecewise Deterministic Markov Processes with both impulsive and gradual (also called continuous) controls. The set of admissible control strategies consists of policies that are possibly randomized and may depend on the past history of the process. We assume that the gradual control acts on the jump intensity and on the transition measure, but not on the flow. The so-called Hamilton–Jacobi–Bellman (HJB) equation associated with this optimization problem is analyzed. We provide sufficient conditions for the existence of a solution to the HJB equation and show that the solution is in fact unique and coincides with the value function of the control problem. Moreover, an optimal control strategy is shown to exist that is stationary and non-randomized.

8.
We derive inequalities to estimate the stability (robustness) of a discounted cost optimization problem for discrete-time Markov control processes on a Borel state space. The one-stage cost is allowed to be unbounded. Unlike known results in this area, we consider perturbations of the transition probabilities measured by the Kantorovich metric, which is closely related to weak convergence. The results obtained make it possible to estimate the rate at which the stability index vanishes when the approximation is made through empirical measures.
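On the real line the Kantorovich metric has a closed form as the integral of the gap between the two cumulative distribution functions, which makes the empirical-measure experiment mentioned above easy to reproduce. A small sketch (the grid and the laws are illustrative choices, not the paper's):

```python
import numpy as np

def kantorovich_1d(x, p, q):
    """Kantorovich (Wasserstein-1) distance between probability vectors p and q
    supported on the sorted grid x: on the real line it equals the integral of
    |F_p - F_q|, which is piecewise constant between consecutive grid points."""
    cdf_gap = np.abs(np.cumsum(np.asarray(p) - np.asarray(q)))[:-1]
    return float(np.sum(cdf_gap * np.diff(np.asarray(x))))

# The distance to an empirical measure shrinks as the sample grows,
# which is what drives the vanishing stability index.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)
uniform = np.full(201, 1 / 201)
for n in (100, 10_000):
    counts = np.bincount(rng.integers(0, 201, size=n), minlength=201)
    print(n, kantorovich_1d(grid, uniform, counts / n))
```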

9.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known “vanishing discount factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), we use the average optimality inequality, the Dynkin formula, and the Tauberian theorem to prove the existence of an average optimal stationary policy in the class of all randomized history-dependent policies. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in the class of all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).
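The vanishing discount argument mentioned above proceeds, in generic discrete-time shorthand (the paper's continuous-time setting needs extra conditions), by normalizing the discounted value functions and passing to the limit along a subsequence α_k ↑ 1:

```latex
h_\alpha(x) = V_\alpha(x) - V_\alpha(x_0), \qquad
\rho = \lim_{k \to \infty} (1-\alpha_k)\, V_{\alpha_k}(x_0), \qquad
\rho + h(x) \;\ge\; \inf_{a \in A(x)} \Big\{ c(x,a) + \int h(y)\, p(dy \mid x,a) \Big\},
```

where the inequality is obtained by substituting the limits into the discounted optimality equation.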

10.
In this paper we study discrete-time Markov decision processes with average expected costs (AEC) and discount-sensitive criteria in Borel state and action spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions on the system's primitive data, under which we prove (1) AEC optimality and strong −1-discount optimality are equivalent; (2) an equivalent characterization of strong 0-discount optimal stationary policies; and (3) the existence of strong n (n = −1, 0)-discount optimal stationary policies. Our conditions are weaker than those in the previous literature. In particular, the “stochastic monotonicity condition” in this paper is used for the first time to study strong n (n = −1, 0)-discount optimality. Moreover, we provide a new approach, slightly different from those in the previous literature, to prove the existence of strong 0-discount optimal stationary policies. Finally, we apply our results to an inventory system and a controlled queueing system.

11.
12.
The problem of controlling ruin probabilities by investment in a financial market is studied. The insurance business is described by the usual Cramér-Lundberg-type model, and the risk driver of the financial market is a compound Poisson process. Conditions for investments to be profitable are derived by means of discrete-time dynamic programming, and Lundberg bounds are established for the controlled model.
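For reference, the uncontrolled Cramér-Lundberg surplus process, the ruin probability, and the classical Lundberg bound read (standard notation, not the paper's controlled version, which adds investment):

```latex
U_t = u + c\,t - \sum_{i=1}^{N_t} Z_i, \qquad
\psi(u) = P\Big( \inf_{t \ge 0} U_t < 0 \Big) \;\le\; e^{-R u},
```

where u is the initial capital, c the premium rate, N_t a Poisson process counting the claims Z_i, and R the adjustment coefficient.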

13.
The author considers the average sample-path criterion for nonstationary MDPs with arbitrary state and action spaces. Under a weak ergodicity condition, the existence of an optimal Markov policy is proved using martingale limit theory, generalizing the main results of A. Arapostathis, V. Borkar, E. Fernández-Gaucherand, M. Ghosh, and S. Marcus (1993).

14.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and of an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies: we not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm, and is rather different from both the usual “vanishing discount factor approach” and the “optimality inequality approach” widely used in the previous literature.
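A finite discrete-time analogue of the policy iteration scheme, for a unichain average-reward MDP (the paper itself works in continuous time on general spaces with unbounded rates; names and shapes here are my own):

```python
import numpy as np

def policy_iteration_avg(P, r, max_iter=1000):
    """Howard policy iteration for a finite unichain average-reward MDP.

    P has shape (S, A, S), r has shape (S, A). Evaluation solves
    g + h(s) = r(s, pi(s)) + sum_y P(y | s, pi(s)) h(y) with h(0) = 0;
    improvement is greedy with respect to r + P h."""
    S, A = r.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        P_pi = P[np.arange(S), policy]             # (S, S) chain under the policy
        r_pi = r[np.arange(S), policy]
        M = np.eye(S) - P_pi
        M[:, 0] = 1.0                              # column 0 now carries the gain g
        sol = np.linalg.solve(M, r_pi)
        g, h = sol[0], sol.copy()
        h[0] = 0.0
        Q = r + P @ h
        # Keep the current action on (near-)ties to avoid cycling.
        best = Q.max(axis=1)
        stay = Q[np.arange(S), policy] >= best - 1e-10
        new_policy = np.where(stay, policy, Q.argmax(axis=1))
        if np.array_equal(new_policy, policy):
            return g, policy                       # optimal gain and stationary policy
        policy = new_policy
    return g, policy
```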

15.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.
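For a finite analogue of this constrained problem, the standard route is a linear program over occupation measures; an optimal measure then induces a policy π(a|s) ∝ μ(s, a), and with a single cost constraint it randomizes in at most one state, matching the structure described above. A sketch under these finite-state assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def constrained_avg_mdp(P, r, cost, cost_cap):
    """LP over occupation measures for a finite constrained average-reward MDP.

    Maximize the average reward sum mu[s, a] * r[s, a] over mu >= 0 subject to
    the stationarity constraints, normalization, and an average-cost cap."""
    S, A = r.shape
    n = S * A
    A_eq = np.zeros((S + 1, n))
    for y in range(S):                    # flow balance at each state y
        for s in range(S):
            for a in range(A):
                A_eq[y, s * A + a] += P[s, a, y]
        for a in range(A):
            A_eq[y, y * A + a] -= 1.0
    A_eq[S, :] = 1.0                      # mu is a probability measure
    b_eq = np.zeros(S + 1)
    b_eq[S] = 1.0
    res = linprog(-r.reshape(n),          # linprog minimizes, so negate the reward
                  A_ub=cost.reshape(1, n), b_ub=[cost_cap],
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return -res.fun, res.x.reshape(S, A)  # optimal reward, occupation measure
```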

16.
This paper studies nonstationary Markov decision processes with average costs on a general state space. The result that, in the stationary case, the average-cost optimality equation can be established from the optimality equation of an auxiliary discounted model is extended to the nonstationary case, and this extension is used to prove the existence of optimal policies.

17.
In this paper we consider three discrete-time discounted Bayesian search problems with an unknown number of objects and uncertainty about the distribution of the objects among the boxes. Moreover, we admit uncertainty about the detection probabilities. The goal is to determine a policy which, depending on the search problem, finds at least one object or all objects with minimal expected total cost. We give sufficient conditions for the optimality of the greedy policy introduced in Liebig/Rieder (1996). For some examples in which the greedy policy is not optimal, we derive a bound for the error.
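The greedy policy of Liebig/Rieder generalizes the classical index rule for searching a single stationary object with a known prior, where greedy is in fact optimal. A sketch of that classical rule only (box data, names, and the Bayes update are standard, but the paper's setting with unknown object counts and uncertain detection probabilities is more general):

```python
import random

def greedy_search(priors, detect, costs, target, rng, max_steps=100_000):
    """Index rule for one stationary object: look in the box maximizing
    p_i * q_i / c_i, then apply the Bayes update after an unsuccessful look.

    priors[i]: prior that the object is in box i; detect[i]: probability that
    a look in box i finds the object when it is there; costs[i]: cost per look."""
    p = list(priors)
    total = 0.0
    for _ in range(max_steps):
        i = max(range(len(p)), key=lambda j: p[j] * detect[j] / costs[j])
        total += costs[i]
        if i == target and rng.random() < detect[i]:
            return total                      # object found
        p[i] *= 1.0 - detect[i]               # unsuccessful look: posterior update
        z = sum(p)
        p = [pj / z for pj in p]
    return total

rng = random.Random(1)
print(greedy_search([0.5, 0.3, 0.2], [0.8, 0.5, 0.9], [1.0, 1.0, 2.0],
                    target=1, rng=rng))
```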

18.
This paper is the first to study continuous-time discounted Markov decision programming with a countable state space and countable action sets under the condition that neither the reward function nor the family of transition rates is uniformly bounded. A new class of unbounded reward functions is introduced, and the existence and structure of optimal policies are discussed within a new class of Markov policies. Besides showing that the main results known to hold for bounded rewards and uniformly bounded transition rates remain valid, some further important conclusions are obtained.

19.
This paper presents formulas for the expected long-run cost per unit time of a cold-standby system composed of two identical components with perfect switching. When a component fails, a repairman is called in to bring the component back to a certain working state. The time to repair consists of two periods: the waiting time, which runs from the failure of a component to the start of repair, and the real repair time, which runs from the start of repair to its completion. We assume that the time to repair includes only the real repair time with probability p, and both the waiting and real repair times with probability 1 − p. Special cases are discussed in which the working times and real repair times form geometric processes and the waiting times form a renewal process. The expected long-run cost per unit time is derived, and a numerical example demonstrates the usefulness of the derived expression.
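By the renewal-reward theorem the long-run cost per unit time equals the expected cost of one replacement cycle divided by its expected length; the sketch below evaluates that ratio for a simplified single-component variant with geometric-process means (all modeling choices and parameters are illustrative assumptions, not the paper's two-component system):

```python
def long_run_cost_rate(lam, a, mu, b, w, p, c_r, c_w, c_rep, N):
    """Renewal-reward sketch for a simplified single-component variant.

    Illustrative model, not the paper's two-component system: the k-th working
    time has mean lam / a**(k-1) (a >= 1, a geometric process), the k-th real
    repair time has mean mu / b**(k-1) (0 < b <= 1), and each repair incurs a
    mean waiting time w with probability 1 - p. The unit is replaced after its
    N-th failure at cost c_rep; repair and waiting cost c_r and c_w per unit
    time. The long-run cost rate is E[cycle cost] / E[cycle length]."""
    work = sum(lam / a ** (k - 1) for k in range(1, N + 1))
    repair = sum(mu / b ** (k - 1) for k in range(1, N))   # N - 1 repairs per cycle
    wait = (1 - p) * w * (N - 1)
    cycle_cost = c_r * repair + c_w * wait + c_rep
    return cycle_cost / (work + repair + wait)

print(long_run_cost_rate(lam=100.0, a=1.05, mu=5.0, b=0.95, w=2.0,
                         p=0.7, c_r=10.0, c_w=4.0, c_rep=500.0, N=8))
```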

20.