期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Nonstationary denumerable state Markov decision processes – with average variance criterion

Xianping Guo 《Mathematical Methods of Operations Research》1999,49(1):87-96

In this paper, we consider the nonstationary Markov decision processes (MDP, for short) with average variance criterion on a countable state space, finite action spaces and bounded one-step rewards. From the optimality equations which are provided in this paper, we translate the average variance criterion into a new average expected cost criterion. Then we prove that there exists a Markov policy, which is optimal in an original average expected reward criterion, that minimizies the average variance in the class of optimal policies for the original average expected reward criterion. 相似文献

2.

A new strong optimality criterion for nonstationary Markov decision processes

Xianping Guo Peng Shi Weiping Zhu 《Mathematical Methods of Operations Research》2000,52(2):287-306

This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (totally so-called strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which is an extension of the canonical policies of Herna′ndez-Lerma and Lasserre [16] (pages: 77) for the stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Also, some properties of optimal policies and optimal average value convergence are discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and then a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given. 相似文献

3.

连续时间Markov决策过程的均值-方差优化问题

下载免费PDF全文

叶柳儿 ;黄香香《中国科学:数学》2014,44(8):883-898

本文考虑连续时间Markov决策过程折扣模型的均值-方差优化问题.假设状态空间和行动空间均为Polish空间,转移率和报酬率函数均无界.本文的优化目标是在折扣最优平稳策略类里,选取相应方差最小的策略.本文致力于寻找Polish空间下Markov决策过程均值-方差最优策略存在的条件.利用首次进入分解方法,本文证明均值-方差优化问题可以转化为"等价"的期望折扣优化问题,进而得到关于均值-方差优化问题的"最优方程"和均值-方差最优策略的存在性以及它相应的特征.最后,本文给出若干例子说明折扣最优策略的不唯一性和均值-方差最优策略的存在性. 相似文献

4.

Borel状态空间非平稳MDP的平均方差准则

郭先平《数学学报》2001,44(2):333-342

本文考虑具有Ｂｏｒｅｌ状态空间和行动空间非平稳ＭＤＰ的平均方差准则．首先,在遍历条件下,利用最优方程,证明了关于平均期望目标最优马氏策略的存在性．然后,通过构造新的模型,利用马氏过程的理论,进一步证明了在关于平均期望目标是最优的一类马氏策略中,存在一个马氏策略使得平均方差达到最小．作为本文的特例还得到了ＤｙｎｋｉｎＥ．Ｂ．和ＹｕｓｈｋｅｖｉｃｈＡ．Ａ．及ＫｕｒａｎｏＭ．等中的主要结果．相似文献

5.

Optimality equations and sensitive optimality in bounded Markov decision processes 1

《Optimization》2012,61(5):767-781

This paper consider Markov decision processes with countable state space, compact action spaces and a bounded reward function. Under some recurrence and connectedness condition, including the simultaneous Döblin condition, we prove the existence of bounded solutions of the optimality equations which arise for the multichain case in connection with the average reward criterion and sensitive optimality criteria, and we give a characterization of the sets of n-average optimal decision rules. 相似文献

6.

Average optimality for continuous-time Markov decision processes with a policy iteration approach

Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704

This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data and under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ?-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our way is rather different from both the usually “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature. 相似文献

7.

Recent results on conditions for the existence of average optimal stationary policies

Rolando Cavazos-Cadena 《Annals of Operations Research》1991,28(1):3-27

This paper concerns countable state space Markov decision processes endowed with a (long-run expected)average reward criterion. For these models we summarize and, in some cases,extend some recent results on sufficient conditions to establish the existence of optimal stationary policies. The topics considered are the following: (i) the new assumptions introduced by Sennott in [20–23], (ii)necessary and sufficient conditions for the existence of a bounded solution to the optimality equation, and (iii) equivalence of average optimality criteria. Some problems are posed.This research was partially supported by the Third World Academy of Sciences (TWAS) under Grant No. TWAS RG MP 898-152. 相似文献

8.

Average optimality inequality for continuous-time Markov decision processes in Polish spaces

Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313

In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected costs. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish one average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in all randomized history dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957). 相似文献

9.

Markov control processes with pathwise constraints

Armando F. Mendoza-Pérez Onésimo Hernández-Lerma 《Mathematical Methods of Operations Research》2010,71(3):477-502

This paper deals with discrete-time Markov control processes in Borel spaces, with unbounded rewards. The criterion to be optimized is a long-run sample-path (or pathwise) average reward subject to constraints on a long-run pathwise average cost. To study this pathwise problem, we give conditions for the existence of optimal policies for the problem with “expected” constraints. Moreover, we show that the expected case can be solved by means of a parametric family of optimality equations. These results are then extended to the problem with pathwise constraints. 相似文献

10.

Quantifying the risk in age and block replacement policies

B C Giri T Dohi 《The Journal of the Operational Research Society》2010,61(7):1151-1158

Preventive maintenance policies have been studied in the literature without considering the risk due to the cost variability. In this paper, we consider the two most popular preventive replacement policies, namely, age and block replacement policies under long-run average cost and expected unit time cost criteria. To quantify the risk in the preventive maintenance policies, we use the long-run variance of the accumulated cost over a time interval. We numerically derive the Risk-sensitive preventive replacement policies and study the impact of the Risk-sensitive optimality criterion on the managerial decisions. We also examine the performance of the expected unit time cost criterion as an alternative to the traditional long-run average cost criterion. 相似文献

11.

Constrained Semi-Markov decision processes with average rewards

Eugene A. Feinberg 《Mathematical Methods of Operations Research》1994,39(3):257-288

This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time, as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the firstn steps to the expected total duration of thesen steps asn . For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these critria even for unichain problems. We provide linear programming algorithms for the computation of optimal policies. 相似文献

12.

An improved algorithm for solving communicating average reward Markov decision processes 总被引：1，自引：0，他引：1

Moshe Haviv Martin L. Puterman 《Annals of Operations Research》1991,28(1):229-242

This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain we recommend use of this algorithm instead of unichain policy iteration.This research has been partially supported by NSERC Grant A-5527. 相似文献

13.

Multi-Armed bandit problem revisited 总被引：1，自引：0，他引：1

T. Ishikida P. Varaiya 《Journal of Optimization Theory and Applications》1994,83(1):113-154

In this paper, we revisit aspects of the multi-armed bandit problem in the earlier work (Ref. 1). An alternative proof of the optimality of the Gittins index rule is derived under the discounted reward criterion. The proof does not involve an explicit use of the interchange argument. The ideas of the proof are extended to derive the asymptotic optimality of the index rule under the average reward criterion. Problems involving superprocesses and arm-acquiring bandits are also reexamined. The properties of an optimal policy for an arm-acquiring bandit are discussed.This research was supported by NSF Grant IRI-91-20074. 相似文献

14.

Markov Decision Processes with Variance Minimization: A New Condition and Approach

Quanxin Zhu Xianping Guo 《随机分析与应用》2013,31(3):577-592

Abstract

This article deals with the limiting average variance criterion for discrete-time Markov decision processes in Borel spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions under which we prove the existence of a variance minimal policy in the class of average expected cost optimal stationary policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of a variance minimal policy are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper has been first used to study the limiting average variance criterion. Also, the optimality inequality approach provided here is different from the “optimality equation approach” widely used in the previous literature. Finally, we use a controlled queueing system to illustrate our results. 相似文献

15.

Optimal models with maximizing probability of first achieving target value in the preceding stages

林元烈伍从斌康波大《中国科学A辑(英文版)》2003,46(3):396-414

Decision makers often face the need of performance guarantee with some sufficiently high probability. Such problems can be modelled using a discrete time Markov decision process (MDP) with a probability criterion for the first achieving target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of e-optimal policy for finite state space. We give an example for the reliability of the satellite sy 相似文献

16.

Continuous time Markov decision programming with average reward criterion and unbounded reward rate

Shaohui Zheng 《应用数学学报(英文版)》1991,7(1):6-16

This paper deals with the continuous time Markov decision programming (briefly CTMDP) with unbounded reward rate. The economic criterion is the long-run average reward. To the models with countable state space and compact metric action sets, we present a set of sufficient conditions to ensure the existence of the stationary optimal policies.This paper was prepared with the support of the National Youth Science Foundation. 相似文献

17.

Optimal Pricing and Admission Control in a Queueing System with Periodically Varying Parameters 总被引：2，自引：1，他引：1

Yoon Seunghwan Lewis Mark E. 《Queueing Systems》2004,47(3):177-199

We consider congestion control in a nonstationary queueing system. Assuming that the arrival and service rates are bounded, periodic functions of time, a Markov decision process (MDP) formulation is developed. We show under the infinite horizon discounted and average reward optimality criteria, for each fixed time, optimal pricing and admission control strategies are nondecreasing in the number of customers in the system. This extends stationary results to the nonstationary setting. Despite this result, the problem still seems intractable. We propose an easily implementable pointwise stationary approximation (PSA) to approximate the optimal policies, suggest a heuristic to improve the implementation of the PSA and verify its usefulness via a numerical study. 相似文献

18.

A mean–variance optimization problem for discounted Markov decision processes

Xianping Guo Liuer Ye George Yin 《European Journal of Operational Research》2012

In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Different from the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over a set of all policies with a given expected reward. For continuous-time MDPs with the discounted criterion and finite-state and action spaces, we prove that the mean–variance optimization problem can be transformed to an equivalent discounted optimization problem using the conditional expectation and Markov properties. Then, we show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods with a finite number of iterations. We also address related issues such as a mutual fund theorem and illustrate our results with an example. 相似文献

19.

Denumerable controlled Markov chains with strong average optimality criterion: Bounded & unbounded costs

Rolando Cavazos-Cadena Emmanuel Fernández-Gaucherand 《Mathematical Methods of Operations Research》1996,43(3):281-300

This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effectvanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality andstrong average cost (SAC) optimality. The latter criterion is introduced to asses the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee thatevery AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality. 相似文献

20.

Weighted Markov decision processes with perturbation

Ke Liu Jerzy A. Filar 《Mathematical Methods of Operations Research》2001,53(3):465-480

相似文献