期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Optimal risk probability for first passage models in semi-Markov decision processes

Yonghui Huang 《Journal of Mathematical Analysis and Applications》2009,359(1):404-140

This paper studies the risk minimization problem in semi-Markov decision processes with denumerable states. The criterion to be optimized is the risk probability (or risk function) that a first passage time to some target set doesn't exceed a threshold value. We first characterize such risk functions and the corresponding optimal value function, and prove that the optimal value function satisfies the optimality equation by using a successive approximation technique. Then, we present some properties of optimal policies, and further give conditions for the existence of optimal policies. In addition, a value iteration algorithm and a policy improvement method for obtaining respectively the optimal value function and optimal policies are developed. Finally, two examples are given to illustrate the value iteration procedure and essential characterization of the risk function. 相似文献

2.

Denumerable state semi-Markov decision processes with unbounded costs,average cost criterion

A. Federgruen A. Hordijk H.C. Tijms 《Stochastic Processes and their Applications》1979,9(2):223-235

This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergotic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we shall verify the existence of a finite solution to the average costs optimality equation and the existence of an average cost optimal stationary policy. 相似文献

3.

Weighted discounted Markov decision processes with perturbation

刘克《应用数学学报(英文版)》1999,15(2):183-189

1.IntrodnctionTheweightedMarkovdecisionprocesses(MDP's)havebeenextensivelystudiedsince1980's,seeforinstance,[1-6]andsoon.ThetheoryofweightedMDP'swithperturbedtransitionprobabilitiesappearstohavebeenmentionedonlyin[7].Thispaperwilldiscussthemodelsofwe... 相似文献

4.

非负费用折扣半马氏决策过程 总被引：1，自引：0，他引：1

黄永辉郭先平《数学学报》2010,53(3):503-514

本文考虑可数状态非负费用的折扣半马氏决策过程.首先在给定半马氏决策核和策略下构造一个连续时间半马氏决策过程,然后用最小非负解方法证明值函数满足最优方程和存在ε-最优平稳策略,并进一步给出最优策略的存在性条件及其一些性质.最后,给出了值迭代算法和一个数值算例. 相似文献

5.

Optimal stopping time on discounted semi-Markov processes

Fang CHEN Xianping GUO Zhong-Wei LIAO 《Frontiers of Mathematics in China》2021,16(2):303

This paper attempts to study the optimal stopping time for semi- Markov processes (SMPs) under the discount optimization criteria with unbounded cost rates. In our work, we introduce an explicit construction of the equivalent semi-Markov decision processes (SMDPs). The equivalence is embodied in the expected discounted cost functions of SMPs and SMDPs, that is, every stopping time of SMPs can induce a policy of SMDPs such that the value functions are equal, and vice versa. The existence of the optimal stopping time of SMPs is proved by this equivalence relation. Next, we give the optimality equation of the value function and develop an effective iterative algorithm for computing it. Moreover, we show that the optimal and ε-optimal stopping time can be characterized by the hitting time of the special sets. Finally, to illustrate the validity of our results, an example of a maintenance system is presented in the end. 相似文献

6.

First passage time for some stationary processes

George Haiman 《Stochastic Processes and their Applications》1999,80(2):231-248

For a 1-dependent stationary sequence {X_n} we first show that if u satisfies p₁=p₁(u)=P(X₁>u)0.025 and n>3 is such that 88np₁³1, then

P{max(X₁,…,X_n)u}=ν·μⁿ+O{p₁³(88n(1+124np₁³)+561)}, n>3,

where

ν=1−p₂+2p₃−3p₄+p₁²+6p₂²−6p₁p₂,μ=(1+p₁−p₂+p₃−p₄+2p₁²+3p₂²−5p₁p₂)⁻¹

with

p_k=p_k(u)=P{min(X₁,…,X_k)>u}, k1

and

|O(x)||x|.

From this result we deduce, for a stationary T-dependent process with a.s. continuous path {Y_s}, a similar, in terms of P{max_0skTY_s<u}, k=1,2 formula for P{max_0stY_su}, t>3T and apply this formula to the process Y_s=W(s+1)−W(s), s0, where {W(s)} is the Wiener process. We then obtain numerical estimations of the above probabilities. 相似文献

7.

A basic formula for performance gradient estimation of semi-Markov decision processes

Yanjie Li Fang Cao 《European Journal of Operational Research》2013

This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under average-reward criterion. This formula directly follows from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms by using a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous time semi-Markov models. In particular, they require less storage than the algorithm in the literature. 相似文献

8.

Constrained optimality for finite horizon semi-Markov decision processes in Polish spaces

Yonghui Huang Zhongfei Li Xianping Guo 《Operations Research Letters》2014

This paper focuses on solving a finite horizon semi-Markov decision process with multiple constraints. We convert the problem to a constrained absorbing discrete-time Markov decision process and then to an equivalent linear program over a class of occupancy measures. The existence, characterization and computation of constrained-optimal policies are established under suitable conditions. An example is given to demonstrate our results. 相似文献

9.

连续时间Markov决策过程的均值-方差优化问题

下载免费PDF全文

叶柳儿 ;黄香香《中国科学:数学》2014,44(8):883-898

本文考虑连续时间Markov决策过程折扣模型的均值-方差优化问题.假设状态空间和行动空间均为Polish空间,转移率和报酬率函数均无界.本文的优化目标是在折扣最优平稳策略类里,选取相应方差最小的策略.本文致力于寻找Polish空间下Markov决策过程均值-方差最优策略存在的条件.利用首次进入分解方法,本文证明均值-方差优化问题可以转化为"等价"的期望折扣优化问题,进而得到关于均值-方差优化问题的"最优方程"和均值-方差最优策略的存在性以及它相应的特征.最后,本文给出若干例子说明折扣最优策略的不唯一性和均值-方差最优策略的存在性. 相似文献

10.

Optimal threshold probability and expectation in semi-Markov decision processes

Masahiko Sakaguchi 《Applied mathematics and computation》2010,216(10):2947-2958

We consider undiscounted semi-Markov decision process with a target set and our main concern is a problem minimizing threshold probability. We formulate the problem as an infinite horizon case with a recurrent class. We show that an optimal value function is a unique solution to an optimality equation and there exists a stationary optimal policy. Also several value iteration methods and a policy improvement method are given in our model. Furthermore, we investigate a relationship between threshold probabilities and expectations for total rewards. 相似文献

11.

An average-value-at-risk criterion for Markov decision processes with unbounded costs

Qiuli LIU Wai-Ki CHING Junyu ZHANG Hongchu WANG 《Frontiers of Mathematics in China》2022,17(4):673

We study the Markov decision processes under the average-valueat-risk criterion. The state space and the action space are Borel spaces, the costs are admitted to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model. 相似文献

12.

A policy gradient method for semi-Markov decision processes with application to call admission control

Sumeetpal S. Singh Vladislav B. Tadić Arnaud Doucet 《European Journal of Operational Research》2007

Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach where one optimizes the SMDP performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation. We apply our algorithm to call admission control. Our proposed policy gradient SMDP algorithm and its application to admission control is novel. 相似文献

13.

离散时间马氏决策过程的首达目标准则

刘秋丽《应用数学学报》2011,34(6)

本文考虑可数状态离散时间马氏决策过程的首达目标模型的风险概率准则.优化的准则是最小化系统首次到达目标状态集的时间不超过某阈值的风险概率.首先建立最优方程并且证明最优值函数和最优方程的解对应,然后讨论了最优策略的一些性质,并进一步给出了最优平稳策略存在的条件,最后用一个例子说明我们的结果. 相似文献

14.

Finite horizon semi-Markov decision processes with application to maintenance systems

Yonghui Huang Xianping Guo 《European Journal of Operational Research》2011,212(1):131-140

This paper investigates finite horizon semi-Markov decision processes with denumerable states. The optimality is over the class of all randomized history-dependent policies which include states and also planning horizons, and the cost rate function is assumed to be bounded below. Under suitable conditions, we show that the value function is a minimum nonnegative solution to the optimality equation and there exists an optimal policy. Moreover, we develop an effective algorithm for computing optimal policies, derive some properties of optimal policies, and in addition, illustrate our main results with a maintenance system. 相似文献

15.

Semi-infinite discounted Markov decision processes: Policy improvement and singular perturbations

Mohammed Abbad Khalid Rahhali 《Mathematical Methods of Operations Research》2001,54(2):279-290

相似文献

16.

Conditions for the uniqueness of optimal policies of discounted Markov decision processes

Daniel?Cruz-Suárez Raúl?Montes-de-Oca Email author Francisco?Salem-Silva 《Mathematical Methods of Operations Research》2004,60(3):415-436

相似文献

17.

Optimal control in light traffic Markov decision processes

Ger Koole Olaf Passchier 《Mathematical Methods of Operations Research》1997,45(1):63-79

We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network. 相似文献

18.

A homotopy approach for infinite horizon discounted Markov decision processes

Douglas John White 《Mathematical Methods of Operations Research》1996,43(3):353-372

In this paper we consider a homotopy deformation approach to solving Markov decision process problems by the continuous deformation of a simpler Markov decision process problem until it is identical with the original problem. Algorithms and performance bounds are given. 相似文献

19.

Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints

Xianping Guo Wenzhao Zhang 《European Journal of Operational Research》2014

相似文献

20.

Semi-Markov control processes with unknown holding times distribution under a discounted criterion

Fernando Luque-Vásquez J. Adolfo Minjárez-Sosa 《Mathematical Methods of Operations Research》2005,61(3):455-468

相似文献