期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Finite horizon semi-Markov decision processes with application to maintenance systems

Yonghui Huang Xianping Guo 《European Journal of Operational Research》2011,212(1):131-140

This paper investigates finite horizon semi-Markov decision processes with denumerable states. The optimality is over the class of all randomized history-dependent policies which include states and also planning horizons, and the cost rate function is assumed to be bounded below. Under suitable conditions, we show that the value function is a minimum nonnegative solution to the optimality equation and there exists an optimal policy. Moreover, we develop an effective algorithm for computing optimal policies, derive some properties of optimal policies, and in addition, illustrate our main results with a maintenance system. 相似文献

2.

A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates

《中国科学数学(英文版)》2015,(9)

This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions, under which we prove the existence of optimal policies and develop an algorithm for computing ?-optimal policies. Finally, we apply our main results to a business system. 相似文献

3.

Blackwell optimality in the class of stationary policies in Markov decision chains with a Borel state space and unbounded rewards

Arie Hordijk Alexander A. Yushkevich 《Mathematical Methods of Operations Research》1999,49(1):1-39

This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove here the existence of deterministic stationary policies which are Blackwell optimal in the class of all, in general randomized, stationary policies. We establish also a lexicographical policy improvement algorithm leading to Blackwell optimal policies and the relation between such policies and the Blackwell optimality equation. Our technique is a combination of the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards and of the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards. 相似文献

4.

Discrete-time zero-sum Markov games with first passage criteria

Qiuli Liu Xiangxiang Huang 《Optimization》2017,66(4):571-587

In this paper, we deal with two-person zero-sum stochastic games for discrete-time Markov processes. The optimality criterion to be studied is the discounted payoff criterion during a first passage time to some target set, where the discount factor is state-dependent. The state and action spaces are all Borel spaces, and the payoff functions are allowed to be unbounded. Under the suitable conditions, we first establish the optimality equation. Then, using dynamic programming techniques, we obtain the existence of the value of the game and a pair of optimal stationary policies. Moreover, we present the exponential convergence of the value iteration and a ‘martingale characterization’ of a pair of optimal policies. Finally, we illustrate the applications of our main results with an inventory system. 相似文献

5.

离散时间马氏决策过程的首达目标准则

刘秋丽《应用数学学报》2011,34(6)

本文考虑可数状态离散时间马氏决策过程的首达目标模型的风险概率准则.优化的准则是最小化系统首次到达目标状态集的时间不超过某阈值的风险概率.首先建立最优方程并且证明最优值函数和最优方程的解对应,然后讨论了最优策略的一些性质,并进一步给出了最优平稳策略存在的条件,最后用一个例子说明我们的结果. 相似文献

6.

Optimization of a special case of continuous-time Markov decision processes with compact action set

Tang Hao Zhou Lei Arai Tamio 《European Journal of Operational Research》2008

Performance optimization is considered for average-cost multichain Markov decision processes (MDPs) with compact action set. Since, for a general compact multichain model, the optimality equation system may have no solution, and also a policy iteration algorithm may yield a suboptimal policy rather than an optimal one, we concentrate only on a special case of multichain models in this paper, where we assume that the classifications of states are fixed identically rather than varying with policies. By using the concept of performance potentials, the existence of solutions to the optimality equation system is established, and then a potential-based policy iteration algorithm is supposed to solve this system. In addition, the optimality convergence, for recurrent classes, of the algorithm has been proved. Finally, a numerical example is provided. 相似文献

7.

Optimal risk probability for first passage models in semi-Markov decision processes

Yonghui Huang 《Journal of Mathematical Analysis and Applications》2009,359(1):404-140

This paper studies the risk minimization problem in semi-Markov decision processes with denumerable states. The criterion to be optimized is the risk probability (or risk function) that a first passage time to some target set doesn't exceed a threshold value. We first characterize such risk functions and the corresponding optimal value function, and prove that the optimal value function satisfies the optimality equation by using a successive approximation technique. Then, we present some properties of optimal policies, and further give conditions for the existence of optimal policies. In addition, a value iteration algorithm and a policy improvement method for obtaining respectively the optimal value function and optimal policies are developed. Finally, two examples are given to illustrate the value iteration procedure and essential characterization of the risk function. 相似文献

8.

Controlled Markov set-chains under average criteria

Masanori Hosaka Masayuki Horiguchi Masami Kurano 《Applied mathematics and computation》2001,120(1-3):195-209

In this paper, applying an interval arithmetic analysis, we consider the average case of controlled Markov set-chains, whose process allows for fluctuating transition matrices at each step in time. We introduce a v-step contractive property for the average case, under which a Pareto optimal periodic policy is characterized as a maximal solution of optimality equation. Also, in the class of stationary policies, the behavior of the expected reward over T-horizon as T approaches ∞ is investigated and the left- and right-hand side optimality equations are given, by which a Pareto optimal stationary policy is found. As a numerical example, the Taxicab problem is considered. 相似文献

9.

Average optimality for continuous-time Markov decision processes with a policy iteration approach

Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704

This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data and under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ?-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our way is rather different from both the usually “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature. 相似文献

10.

非负费用折扣半马氏决策过程 总被引：1，自引：0，他引：1

黄永辉郭先平《数学学报》2010,53(3):503-514

本文考虑可数状态非负费用的折扣半马氏决策过程.首先在给定半马氏决策核和策略下构造一个连续时间半马氏决策过程,然后用最小非负解方法证明值函数满足最优方程和存在ε-最优平稳策略,并进一步给出最优策略的存在性条件及其一些性质.最后,给出了值迭代算法和一个数值算例. 相似文献

11.

连续时间Markov决策过程的均值-方差优化问题

下载免费PDF全文

叶柳儿 ;黄香香《中国科学:数学》2014,44(8):883-898

本文考虑连续时间Markov决策过程折扣模型的均值-方差优化问题.假设状态空间和行动空间均为Polish空间,转移率和报酬率函数均无界.本文的优化目标是在折扣最优平稳策略类里,选取相应方差最小的策略.本文致力于寻找Polish空间下Markov决策过程均值-方差最优策略存在的条件.利用首次进入分解方法,本文证明均值-方差优化问题可以转化为"等价"的期望折扣优化问题,进而得到关于均值-方差优化问题的"最优方程"和均值-方差最优策略的存在性以及它相应的特征.最后,本文给出若干例子说明折扣最优策略的不唯一性和均值-方差最优策略的存在性. 相似文献

12.

Risk-sensitive average continuous-time Markov decision processes with unbounded rates

《Optimization》2012,61(4):773-800

Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under the suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then by a technique of constructing the appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach called the risk-sensitive average optimality inequality approach to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions and provide an example for which the risk-sensitive average optimality strict inequality occurs. 相似文献

13.

Discounted continuous-time Markov decision processes with unbounded rates and randomized history-dependent policies: the dynamic programming approach

Alexey Piunovskiy Yi Zhang 《4OR: A Quarterly Journal of Operations Research》2014,12(1):49-75

This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays important roles in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and characterizing the value function through the Bellman equation. 相似文献

14.

A new strong optimality criterion for nonstationary Markov decision processes

Xianping Guo Peng Shi Weiping Zhu 《Mathematical Methods of Operations Research》2000,52(2):287-306

This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (totally so-called strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which is an extension of the canonical policies of Herna′ndez-Lerma and Lasserre [16] (pages: 77) for the stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Also, some properties of optimal policies and optimal average value convergence are discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and then a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given. 相似文献

15.

Denumerable state semi-Markov decision processes with unbounded costs,average cost criterion

A. Federgruen A. Hordijk H.C. Tijms 《Stochastic Processes and their Applications》1979,9(2):223-235

This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergotic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we shall verify the existence of a finite solution to the average costs optimality equation and the existence of an average cost optimal stationary policy. 相似文献

16.

Finite-horizon piecewise deterministic Markov decision processes with unbounded transition rates

Yonghui Huang Xianping Guo 《Stochastics An International Journal of Probability and Stochastic Processes》2019,91(1):67-95

This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control is continuously acting in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed. 相似文献

17.

Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains

Tomás Prieto-Rumeau Onésimo Hernández-Lerma 《Mathematical Methods of Operations Research》2009,70(3):527-540

This paper deals with denumerable-state continuous-time controlled Markov chains with possibly unbounded transition and reward rates. It concerns optimality criteria that improve the usual expected average reward criterion. First, we show the existence of average reward optimal policies with minimal average variance. Then we compare the variance minimization criterion with overtaking optimality. We present an example showing that they are opposite criteria, and therefore we cannot optimize them simultaneously. This leads to a multiobjective problem for which we identify the set of Pareto optimal policies (also known as nondominated policies). 相似文献

18.

连续时间马尔可夫决策过程的折扣模型

郭先平戴永隆《数学学报》2002,45(1):171-182

本文考虑的是转移速率族任意且费用率函数可能无界的连续时间马尔可夫决策过程的折扣模型．放弃了传统的要求相应于每个策略的　Ｑ　－过程唯一等条件,而首次考虑相应每个策略的　Ｑ　－过程不一定唯一,　转移速率族也不一定保守,　费用率函数可能无界,　且允许行动空间非空任意的情形．　本文首次用"α－折扣费用最优不等式"更新了传统的α－折扣费用最优方程,并用"最优不等式"和新的方法,不仅证明了传统的主要结果即最优平稳策略的存在性,　而且还进一步探讨了（　∈＞０　　）－最优平稳策略,具有单调性质的最优平稳策略,　以及（∈≥０）　－最优决策过程的存在性,　得到了一些有意义的新结果．　最后,　提供了一个迁移率受控的生灭系统例子,　它满足本文的所有条件,　而传统的假设（见文献［１－１４］）均不成立．相似文献

19.

Bias Optimality versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs

Nadine Hilgert Onésimo Hernández-Lerma 《Acta Appl Math》2003,77(3):215-235

This paper deals with expected average cost (EAC) and discount-sensitive criteria for discrete-time Markov control processes on Borel spaces, with possibly unbounded costs. Conditions are given under which (a) EAC optimality and strong –1-discount optimality are equivalent; (b) strong 0-discount optimality implies bias optimality; and, conversely, under an additional hypothesis, (c) bias optimality implies strong 0-discount optimality. Thus, in particular, as the class of bias optimal policies is nonempty, (c) gives the existence of a strong 0-discount optimal policy, whereas from (b) and (c) we get conditions for bias optimality and strong 0-discount optimality to be equivalent. A detailed example illustrates our results. 相似文献

20.

Solving optimal control problems for the unsteady Burgers equation in COMSOL Multiphysics

F. Y?lmaz 《Journal of Computational and Applied Mathematics》2011,235(16):4839-4850

The optimal control of unsteady Burgers equation without constraints and with control constraints are solved using the high-level modelling and simulation package COMSOL Multiphysics. Using the first-order optimality conditions, projection and semi-smooth Newton methods are applied for solving the optimality system. The optimality system is solved numerically using the classical iterative approach by integrating the state equation forward in time and the adjoint equation backward in time using the gradient method and considering the optimality system in the space-time cylinder as an elliptic equation and solving it adaptively. The equivalence of the optimality system to the elliptic partial differential equation (PDE) is shown by transforming the Burgers equation by the Cole-Hopf transformation to a linear diffusion type equation. Numerical results obtained with adaptive and nonadaptive elliptic solvers of COMSOL Multiphysics are presented both for the unconstrained and the control constrained case. 相似文献