20 similar documents found (search time: 15 ms)
1.
This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC-optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used for the first time to study the ASPC criterion. The approach provided here also differs slightly from the "optimality equation approach" widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.
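The sample-path average cost of a fixed stationary policy in such a controlled queueing system is easy to approximate by simulation. Below is a minimal sketch on a toy discrete-time queue; the arrival probability, cost structure, and the two policies are all hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_path_average_cost(policy, horizon=200_000, capacity=50):
    """Estimate the ASPC of a stationary policy on a toy controlled queue.

    State x: queue length. Action a = policy(x) is a service probability.
    Each slot: one arrival w.p. 0.4 (if below capacity), then one
    departure w.p. a (if nonempty). Cost per slot: holding + service.
    """
    x, total = 0, 0.0
    for _ in range(horizon):
        a = policy(x)
        total += x + 2.0 * a            # holding cost x, service cost 2a
        if rng.random() < 0.4 and x < capacity:
            x += 1
        if x > 0 and rng.random() < a:
            x -= 1
    return total / horizon

fast = sample_path_average_cost(lambda x: 0.8)  # serve aggressively
slow = sample_path_average_cost(lambda x: 0.5)  # serve lazily
```

Here the aggressive policy pays more per slot for service but keeps the queue, and hence the holding cost, much shorter.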
2.
Q. X. Zhu 《Mathematical Methods of Operations Research》2007,65(3):519-538
This paper studies both the average sample-path reward (ASPR) criterion and the limiting average variance criterion for denumerable discrete-time Markov decision processes. The rewards may have neither upper nor lower bounds. We give sufficient conditions on the system's primitive data under which we prove the existence of ASPR-optimal stationary policies and variance-optimal policies. Our conditions are weaker than those in the previous literature. Moreover, our results are illustrated by a controlled queueing system.
Research partially supported by the Natural Science Foundation of Guangdong Province (Grant No. 06025063) and the Natural Science Foundation of China (Grant No. 10626021).
3.
Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and of an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.
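The policy iteration algorithm underlying the paper's approach is easiest to see in the much simpler finite-state, discrete-time discounted setting. A minimal sketch, assuming a finite model given as per-action transition matrices `P[a]` and a reward table `r[s, a]` (all hypothetical):

```python
import numpy as np

def policy_iteration(P, r, beta=0.9):
    """Policy iteration for a finite discounted MDP.

    P[a][s, s'] : transition probability s -> s' under action a
    r[s, a]     : one-stage reward
    beta        : discount factor in (0, 1)
    """
    n_states, n_actions = r.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = r[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v
        q = np.array([[r[s, a] + beta * P[a][s] @ v for a in range(n_actions)]
                      for s in range(n_states)])
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```

Evaluation solves a linear system exactly and improvement is greedy; in a finite model the loop terminates after finitely many improvements, a discrete analogue of the convergence the paper establishes in its continuous-time setting.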
4.
5.
For average-criterion Markov decision processes with a countable state set, nonempty action sets, and unbounded rewards, this paper proposes a new set of conditions under which (ε-)optimal stationary policies exist; moreover, the optimality inequality holds whenever the sum appearing in it is well defined.
6.
This paper considers the expected total reward criterion for non-homogeneous Markov decision processes with a countable state space and finite action spaces. Unlike previous work, we enlarge the state space to transform the non-homogeneous Markov decision process into a homogeneous one, thereby obtaining, in a very concise way, the main results previously derived by traditional methods.
7.
In this article, we study continuous-time Markov decision processes in Polish spaces. The optimality criterion to be maximized is the expected discounted criterion. The transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. We provide conditions on the controlled system's primitive data under which we prove, using Feller's construction approach, that the transition functions of possibly non-homogeneous continuous-time Markov processes are regular. Then, under continuity and compactness conditions, we prove the existence of optimal stationary policies by using the technique of extended infinitesimal operators associated with these transition functions, and also provide a recursive way to compute (or at least approximate) the optimal reward values. The conditions provided in this paper are different from those used in the previous literature, and they are illustrated with an example.
8.
Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313
In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known "vanishing discount factor approach". Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in the class of all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in the class of all (deterministic) stationary policies by constructing a "new" cost (or reward) rate.
Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).
9.
In this paper we discuss MDPs with a distribution-function criterion of the first-passage time. We establish some properties of several kinds of optimal policies, and present existence results and algorithms for them.
Accepted 24 July 2000. Online publication 12 April 2001.
10.
This paper discusses discrete shock discounted semi-Markov decision processes. After building the model, we reduce it to an equivalent discrete-time Markov decision process.
11.
Eilon Solan 《Journal of Theoretical Probability》2003,16(4):831-845
We provide a bound for the variation of the function that assigns to every competitive Markov decision process and every discount factor its discounted value. This bound implies that the undiscounted value of a competitive Markov decision process is continuous in the relative interior of the space of transition rules.
12.
M. Boussemart, T. Bickard, N. Limnios 《Methodology and Computing in Applied Probability》2001,3(2):199-214
In this paper, we introduce a Markov decision model with absorbing states and a constraint on the asymptotic failure rate. The objective is to find a stationary policy which minimizes the infinite-horizon expected average cost, given that the system never fails. Using the Perron-Frobenius theory of non-negative matrices and spectral analysis, we show that the problem can be reduced to a linear programming problem. Finally, we apply this method to a real problem arising in an aeronautical system.
13.
Quan-xin Zhu 《高校应用数学学报(英文版)》2010,25(4):400-410
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, it proves not only the existence of solutions to the variance minimization optimality equation and of a variance-minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and of a variance-minimal policy that need not be canonical. An example is given to illustrate all of the conditions.
14.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
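The flavor of such an iteration algorithm can be sketched in a finite model with a discount factor α(s) depending on the current state; this is a simplification of the paper's general setting, and all model data are assumptions:

```python
import numpy as np

def value_iteration_sd(P, r, alpha, tol=1e-10, max_iter=10_000):
    """Value iteration with a state-dependent discount factor alpha[s].

    Iterates v(s) <- max_a [ r(s, a) + alpha(s) * sum_{s'} P(s'|s,a) v(s') ].
    This is a contraction as long as max_s alpha[s] < 1.
    """
    n_states, n_actions = r.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        q = np.array([[r[s, a] + alpha[s] * P[a][s] @ v for a in range(n_actions)]
                      for s in range(n_states)])
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new
    return v, q.argmax(axis=1)  # fallback: best found within max_iter
```

The fixed point of this iteration is the optimal value function, and the greedy policy attached to it is stationary optimal, mirroring the existence result in the abstract.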
15.
Quanxin Zhu 《随机分析与应用》2013,31(5):953-974
In this paper we study discrete-time Markov decision processes with average expected costs (AEC) and discount-sensitive criteria in Borel state and action spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions on the system's primitive data, under which we prove (1) that AEC optimality and strong −1-discount optimality are equivalent; (2) a condition equivalent to strong 0-discount optimality of stationary policies; and (3) the existence of strong n (n = −1, 0)-discount optimal stationary policies. Our conditions are weaker than those in the previous literature. In particular, the "stochastic monotonicity condition" in this paper is used for the first time to study strong n (n = −1, 0)-discount optimality. Moreover, we provide a new approach to prove the existence of strong 0-discount optimal stationary policies; it differs slightly from those in the previous literature. Finally, we apply our results to an inventory system and a controlled queueing system.
16.
A Markov decision process approach to the static stability of discrete event systems
This paper uses a Markov decision process approach to study the static stability of discrete event systems (DES), including the computation of strong and weak attraction domains, as well as the computation of stabilizing controllers for the weak attraction domain. The method requires neither Σ- (or Σu-) invariance of the given predicate nor the detection of loops.
17.
Evgueni Gordienko Enrique Lemus-Rodríguez Raúl Montes-de-Oca 《Mathematical Methods of Operations Research》2009,70(1):13-33
We study perturbations of a discrete-time Markov control process on a general state space. The amount of perturbation is measured by means of the Kantorovich distance. We assume that an average (per unit of time on the infinite horizon) optimal control policy can be found for the perturbed (supposedly known) process, and that it is used to control the original (unperturbed) process. The one-stage cost is not assumed to be bounded. Under Lyapunov-like conditions we find upper bounds for the average cost excess incurred when such an approximation is used in place of the optimal (unknown) control policy. As an application of these inequalities we consider the approximation by relevant empirical distributions. We illustrate our results by estimating the stability of a simple autoregressive control process. Examples of unstable processes are also provided.
18.
19.
The use of Markov decision processes for inspection, maintenance, and rehabilitation of civil engineering structures relies on several transition matrices related to the stochastic degradation process, maintenance actions, and imperfect inspections. Point estimators for these matrices are usually used, evaluated via statistical inference and/or expert judgment; considerable epistemic uncertainty therefore often veils their true values. Our contribution through this paper is threefold. First, we present a methodology for incorporating epistemic uncertainties in the dynamic programming algorithms used to solve finite-horizon Markov decision processes (which may be partially observable). Second, we propose a methodology based on Dirichlet distributions which, in our view, resolves much of the controversy found in the literature about estimating Markov transition matrices. Third, we show how the complexity resulting from the use of Monte-Carlo simulations for the transition matrices can be greatly reduced in the framework of dynamic programming. The proposed model is applied to a concrete bridge under degradation, in order to provide the optimal strategy for inspection and maintenance. The influence of epistemic uncertainties on the optimal solution is highlighted through sensitivity analysis of the input data.
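The Dirichlet device is natural because the Dirichlet distribution is the conjugate prior for each row of a Markov transition matrix. A minimal sketch of posterior sampling for Monte-Carlo propagation of the epistemic uncertainty; the 3-state degradation chain and its inspection counts below are hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed transition counts for a 3-state degradation chain
# (rows: from-state; columns: to-state). Degradation is one-directional.
counts = np.array([[40, 8, 2],
                   [0, 30, 10],
                   [0, 0, 25]])

def sample_transition_matrix(counts, prior=1.0, rng=rng):
    """Draw one transition matrix: row s ~ Dirichlet(counts[s] + prior).

    Conjugacy makes this an exact posterior sample under a symmetric
    Dirichlet prior, capturing the epistemic uncertainty left by
    finite inspection data.
    """
    return np.vstack([rng.dirichlet(row + prior) for row in counts])

# Monte-Carlo over the epistemic uncertainty: distribution of the
# 2-step probability of reaching the worst state from the best one.
samples = [sample_transition_matrix(counts) for _ in range(1000)]
p2 = [np.linalg.matrix_power(P, 2)[0, 2] for P in samples]
```

Each sampled matrix can then be pushed through the dynamic programming recursion, which is the Monte-Carlo loop whose cost the paper shows how to reduce.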
20.
This paper focuses on the constrained optimality problem (COP) of first-passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multiple constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of the so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program over the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
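For finitely many states and actions the occupation-measure linear program can be written down and solved directly. A minimal sketch for a discounted model with a constant discount factor and a single extra cost constraint, which simplifies the paper's first-passage, state-dependent-discount setting; all data are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def constrained_mdp_lp(P, c, d, kappa, mu, beta=0.9):
    """Occupation-measure LP for a finite discounted constrained MDP.

    Variables x[s, a] are discounted occupation measures. Minimize the
    main cost c subject to the balance (flow) constraints and one extra
    cost constraint sum d * x <= kappa; mu is the initial distribution.
    """
    S, A = c.shape
    # Balance: sum_a x(s',a) - beta * sum_{s,a} P[a][s,s'] x(s,a) = mu(s')
    A_eq = np.zeros((S, S * A))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = float(s == sp) - beta * P[a][s, sp]
    res = linprog(c.reshape(-1),
                  A_ub=d.reshape(1, -1), b_ub=[kappa],
                  A_eq=A_eq, b_eq=mu,
                  bounds=(0, None), method="highs")
    x = res.x.reshape(S, A)
    # Recover a randomized stationary policy: pi(a|s) proportional to x(s,a)
    policy = x / x.sum(axis=1, keepdims=True)
    return x, policy, res
```

Normalizing the rows of the optimal occupation measure yields a randomized stationary policy, which is the finite-state "exact form" of an optimal policy the abstract refers to.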