Found 20 similar documents.
1.
《Operations Research Letters》2020,48(1):96-103
This paper deals with risk-sensitive piecewise deterministic Markov decision processes, where the expected exponential utility of a finite-horizon reward is to be maximized. Both the transition rates and reward functions are allowed to be unbounded. A Feynman–Kac formula is developed in our setup; using it, together with an approximation technique, we establish the associated Hamilton–Jacobi–Bellman equation and the existence of risk-sensitive optimal policies under suitable conditions.
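For orientation, here is a hedged sketch of the finite-horizon risk-sensitive criterion that work of this type maximizes; the notation (risk-sensitivity parameter $\lambda>0$, reward rate $r$, horizon $T$, controlled process $(X_s,A_s)$) is ours and may differ from the paper's:
$$ V(t,x)\;=\;\sup_{\pi}\,\mathbb{E}^{\pi}_{(t,x)}\!\left[\exp\!\left(\lambda\int_t^{T} r(X_s,A_s)\,ds\right)\right],\qquad 0\le t\le T, $$
with the Hamilton–Jacobi–Bellman equation serving as the dynamic-programming characterization of $V$.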
2.
Yonghui Huang, Xianping Guo 《Stochastics: An International Journal of Probability and Stochastic Processes》2019,91(1):67-95
This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. Optimality is taken over general history-dependent policies, in which the control acts continuously in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed.
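A hedged sketch of the expected finite-horizon cost criterion being minimized (our notation: cost rate $c$, optional terminal cost $g$, horizon $T$; the paper's exact formulation may differ):
$$ J(t,x,\pi)\;=\;\mathbb{E}^{\pi}_{(t,x)}\!\left[\int_t^{T} c(X_s,A_s)\,ds + g(X_T)\right],\qquad V(t,x)\;=\;\inf_{\pi} J(t,x,\pi). $$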
3.
4.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.
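In a common formulation of such a constrained problem (our notation; the reward rate $r$, cost rate $c$, and constraint level $\rho$ are placeholders), one solves
$$ \sup_{\pi}\ \liminf_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{x}\!\left[\int_0^{T} r(X_t,A_t)\,dt\right] \quad\text{s.t.}\quad \limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{x}\!\left[\int_0^{T} c(X_t,A_t)\,dt\right]\le\rho. $$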
6.
We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.
7.
《Stochastic Processes and their Applications》2020,130(3):1515-1544
The purpose of this paper is to study a Markovian metapopulation model on a directed graph with edge-supported transfers and deterministic intra-nodal population dynamics. We first state tractable stability conditions for two typical frameworks motivated by applications: constant jump rates with multiplicative transfer amplitudes, and coercive jump rates with unitary transfers. More general criteria for boundedness, petiteness and ergodicity are then given.
8.
Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in the class of all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in the class of all (deterministic) stationary policies by constructing a “new” cost (or reward) rate.
Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957).
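For context, a hedged sketch of the long-run average cost criterion and of the kind of average optimality inequality that the vanishing-discount argument delivers (our notation; signs and constants may differ from the paper):
$$ J(x,\pi)=\limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{x}\!\left[\int_0^{T} c(X_t,A_t)\,dt\right],\qquad g^{*}\;\ge\;\inf_{a\in A(x)}\Bigl\{c(x,a)+\sum_{y}q(y\mid x,a)\,h(y)\Bigr\}\ \ \text{for all }x, $$
where $g^{*}$ is a candidate optimal average cost and $h$ a relative-value function; the Dynkin formula then shows that a policy attaining the infimum has average cost at most $g^{*}$.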
9.
《Optimization》2012,61(4):773-800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth-and-death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.
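As a point of reference, the risk-sensitive average cost criterion is commonly written as follows (our notation; risk-sensitivity parameter $\lambda>0$):
$$ J_\lambda(x,\pi)\;=\;\limsup_{T\to\infty}\frac{1}{\lambda T}\,\log\mathbb{E}^{\pi}_{x}\!\left[\exp\!\left(\lambda\int_0^{T} c(X_t,A_t)\,dt\right)\right]. $$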
10.
In this paper, we study the smoothness of certain functions in two kinds of risk models with a barrier dividend strategy. Mainly using techniques from the theory of piecewise deterministic Markov processes, we prove that the function is continuously differentiable in the first risk model. Using the weak infinitesimal generator method of Markov processes, we prove that the function is twice continuously differentiable in the second risk model. Integro-differential equations satisfied by these functions are derived.
11.
This paper presents some conditions for the minimal Q-function to be a Feller transition function, for a given q-matrix Q. We derive a sufficient condition that is stated explicitly in terms of the transition rates. Furthermore, some necessary and sufficient conditions are derived of a more implicit nature, namely in terms of properties of a system of equations (or inequalities) and in terms of the operator induced by the q-matrix. The criteria lead to some perturbation results. These results are applied to birth-death processes with killing, yielding some sufficient and some necessary conditions for the Feller property directly in terms of the rates. An essential step in the analysis is the idea of associating the Feller property with individual states.
12.
13.
Xianping Guo, Onésimo Hernández-Lerma, Tomás Prieto-Rumeau, Xi-Ren Cao, Junyu Zhang, Qiying Hu, Mark E. Lewis, Ricardo Vélez 《TOP》2006,14(2):177-261
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
Research partially supported by grants NSFC, DRFP and NCET.
Research partially supported by CONACyT (Mexico) Grant 45693-F.
14.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
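As an illustration of the kind of iteration algorithm involved, here is a minimal sketch, assuming a finite state and action space, of value iteration with a state-dependent discount factor; the names (`P`, `r`, `alpha`) and the toy data are ours, not the paper's:

```python
# Hedged sketch (not the paper's algorithm verbatim): value iteration for a
# finite-state, finite-action discrete-time MDP in which the discount factor
# alpha(x) depends on the current state.
import numpy as np

def value_iteration(P, r, alpha, tol=1e-8, max_iter=10_000):
    """P[a][x][y]: transition probabilities, r[a][x]: one-step rewards,
    alpha[x]: state-dependent discount factor in [0, 1)."""
    n_actions, n_states = r.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman update: Q(x, a) = r(x, a) + alpha(x) * sum_y P(y | x, a) v(y)
        q = r + alpha[None, :] * np.einsum('axy,y->ax', P, v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    policy = q.argmax(axis=0)  # a greedy stationary, deterministic policy
    return v, policy

# Toy two-state, two-action example (illustrative only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])   # P[a, x, y]
r = np.array([[1.0, 0.0],
              [0.5, 0.8]])                 # r[a, x]
alpha = np.array([0.90, 0.95])             # discount depends on the state
print(value_iteration(P, r, alpha))
```

Because the discount factor is bounded by max_x alpha(x) < 1, the Bellman operator is still a sup-norm contraction, which is what makes an iteration of this kind converge.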
15.
This paper studies the Markov property of the surplus process in a broad class of continuous-time risk models and obtains necessary and sufficient conditions for the surplus process to be a Markov process. A continuous-time risk model whose claim inter-arrival times follow a discrete distribution is established for the first time, and exact expressions for the ruin probability are derived for two basic special cases.
16.
Marta Tyran-Kamińska 《Journal of Mathematical Analysis and Applications》2009,357(2):385-402
Necessary and sufficient conditions are given for a substochastic semigroup on L1 obtained through the Kato-Voigt perturbation theorem to be either stochastic or strongly stable. We show how such semigroups are related to piecewise deterministic Markov processes, provide a probabilistic interpretation of our results, and apply them to fragmentation equations.
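For the terminology, as these terms are commonly used (the paper's precise definitions may differ): a substochastic semigroup $\{P(t)\}_{t\ge0}$ on $L^1$ satisfies $P(t)f\ge0$ and $\|P(t)f\|_1\le\|f\|_1$ for $f\ge0$, and, roughly,
$$ \text{stochastic:}\quad \|P(t)f\|_1=\|f\|_1\ \ \text{for all } f\in L^1_+,\ t\ge0;\qquad \text{strongly stable:}\quad \lim_{t\to\infty}\|P(t)f\|_1=0\ \ \text{for all } f\in L^1. $$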
17.
Lothar Breuer 《Queueing Systems》2008,58(4):321-331
Consider an M/G/c queue with homogeneous servers and service time distribution F. It is shown that an approximation of the service time distribution F by stochastically smaller distributions, say F_n, leads to an approximation of the stationary distribution π of the original M/G/c queue by the stationary distributions π_n of the M/G/c queues with service time distributions F_n. Here all approximations are in weak convergence. The argument is based on a representation of M/G/c queues in terms of piecewise deterministic Markov processes as well as some coupling methods.
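Read as a continuity statement (our symbols: $\le_{\mathrm{st}}$ for stochastic order, $\Rightarrow$ for weak convergence), the result above says, roughly,
$$ F_n \le_{\mathrm{st}} F \quad\text{and}\quad F_n \Rightarrow F \ \ (n\to\infty) \qquad\Longrightarrow\qquad \pi_n \Rightarrow \pi, $$
where $\pi_n$ and $\pi$ are the stationary distributions of the M/G/c queues with service time distributions $F_n$ and $F$.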
18.
Sébastien Gadat, Sofiane Saadane 《Stochastics: An International Journal of Probability and Stochastic Processes》2018,90(6):886-926
Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s. NS algorithms have been studied in depth in the infinite-horizon setting, but few non-asymptotic results exist for this type of bandit algorithm. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain uniform explicit bounds for the regret of (over-)penalized NS algorithms. We also extend to the multi-armed case some convergence properties of penalized NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.
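For readers unfamiliar with the scheme, below is a minimal sketch, assuming a two-armed bandit with Bernoulli rewards, of a classical linear reward-inaction Narendra-Shapiro-type recursion; the penalized and over-penalized variants analysed in the paper modify the no-reward step, and all names and step sizes here are our own illustrative choices:

```python
# Hedged illustration only: a classical two-armed Narendra-Shapiro-type
# (linear reward-inaction) recursion, not the penalized/over-penalized
# variants analysed in the paper.
import random

def ns_two_armed(p1, p2, n_steps=50_000, seed=0):
    rng = random.Random(seed)
    x = 0.5                       # probability of pulling arm 1
    for n in range(1, n_steps + 1):
        gamma = 1.0 / (n + 1)     # decreasing step size (our choice)
        arm1 = rng.random() < x
        reward = rng.random() < (p1 if arm1 else p2)
        if reward:                # reward-inaction: move towards the rewarded arm
            x += gamma * (1.0 - x) if arm1 else -gamma * x
        # on no reward the plain reward-inaction scheme leaves x unchanged;
        # the penalized variants studied in the paper also update x here
    return x

print(ns_two_armed(p1=0.7, p2=0.4))   # x should drift towards 1 (arm 1 pays more)
```

The mean drift of this recursion is gamma_n x(1-x)(p1-p2), so with p1 > p2 the update is pushed towards always playing arm 1; one motivation for the penalized variants is to keep the scheme from getting stuck on the suboptimal arm.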
19.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure for determining an optimal policy relative to a procedure for determining a suboptimal policy based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise. This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.