Found 20 similar documents.
1.
《Operations Research Letters》2020,48(1):96-103
This paper deals with risk-sensitive piecewise deterministic Markov decision processes, where the expected exponential utility of a finite-horizon reward is to be maximized. Both the transition rates and reward functions are allowed to be unbounded. A Feynman–Kac formula is developed in our setup; using it, together with an approximation technique, we establish the associated Hamilton–Jacobi–Bellman equation and the existence of risk-sensitive optimal policies under suitable conditions.
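For orientation, here is a hedged sketch of the finite-horizon risk-sensitive criterion that work of this type maximizes; the notation (risk-sensitivity parameter $\lambda>0$, reward rate $r$, horizon $T$, controlled process $(X_s,A_s)$) is ours and may differ from the paper's:
$$ V(t,x)\;=\;\sup_{\pi}\,\mathbb{E}^{\pi}_{(t,x)}\!\left[\exp\!\left(\lambda\int_t^{T} r(X_s,A_s)\,ds\right)\right],\qquad 0\le t\le T, $$
with the Hamilton–Jacobi–Bellman equation serving as the dynamic-programming characterization of $V$.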
2.
Yonghui Huang, Xianping Guo 《Stochastics: An International Journal of Probability and Stochastic Processes》2019,91(1):67-95
This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. Optimality is taken over general history-dependent policies, in which the control acts continuously in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed.
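A hedged sketch of the expected finite-horizon cost criterion being minimized (our notation: cost rate $c$, optional terminal cost $g$, horizon $T$; the paper's exact formulation may differ):
$$ J(t,x,\pi)\;=\;\mathbb{E}^{\pi}_{(t,x)}\!\left[\int_t^{T} c(X_s,A_s)\,ds + g(X_T)\right],\qquad V(t,x)\;=\;\inf_{\pi} J(t,x,\pi). $$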
3.
4.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.
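In a common formulation of such a constrained problem (our notation; the reward rate $r$, cost rate $c$, and constraint level $\rho$ are placeholders), one solves
$$ \sup_{\pi}\ \liminf_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{x}\!\left[\int_0^{T} r(X_t,A_t)\,dt\right] \quad\text{s.t.}\quad \limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{x}\!\left[\int_0^{T} c(X_t,A_t)\,dt\right]\le\rho. $$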
6.
We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.
7.
《Stochastic Processes and their Applications》2020,130(3):1515-1544
The purpose of this paper is to study a Markovian metapopulation model on a directed graph with edge-supported transfers and deterministic intra-nodal population dynamics. We first state tractable stability conditions for two typical frameworks motivated by applications: constant jump rates with multiplicative transfer amplitudes, and coercive jump rates with unitary transfers. More general criteria for boundedness, petiteness and ergodicity are then given.
8.
Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in the class of all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in the class of all (deterministic) stationary policies by constructing a “new” cost (or reward) rate.
Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957).
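For context, a hedged sketch of the long-run average cost criterion and of the kind of average optimality inequality that the vanishing-discount argument delivers (our notation; signs and constants may differ from the paper):
$$ J(x,\pi)=\limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{x}\!\left[\int_0^{T} c(X_t,A_t)\,dt\right],\qquad g^{*}\;\ge\;\inf_{a\in A(x)}\Bigl\{c(x,a)+\sum_{y}q(y\mid x,a)\,h(y)\Bigr\}\ \ \text{for all }x, $$
where $g^{*}$ is a candidate optimal average cost and $h$ a relative-value function; the Dynkin formula then shows that a policy attaining the infimum has average cost at most $g^{*}$.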
9.
《Optimization》2012,61(4):773-800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth-and-death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.
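As a point of reference, the risk-sensitive average cost criterion is commonly written as follows (our notation; risk-sensitivity parameter $\lambda>0$):
$$ J_\lambda(x,\pi)\;=\;\limsup_{T\to\infty}\frac{1}{\lambda T}\,\log\mathbb{E}^{\pi}_{x}\!\left[\exp\!\left(\lambda\int_0^{T} c(X_t,A_t)\,dt\right)\right]. $$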
10.
In this paper, we study the smoothness of certain functions in two kinds of risk models with a barrier dividend strategy. Mainly using techniques from the theory of piecewise deterministic Markov processes, we prove that the function is continuously differentiable in the first risk model. Using the weak infinitesimal generator method of Markov processes, we prove that the function is twice continuously differentiable in the second risk model. Integro-differential equations satisfied by these functions are derived.
11.
This paper presents some conditions for the minimal Q-function to be a Feller transition function, for a given q-matrix Q. We derive a sufficient condition that is stated explicitly in terms of the transition rates. Furthermore, some necessary and sufficient conditions are derived of a more implicit nature, namely in terms of properties of a system of equations (or inequalities) and in terms of the operator induced by the q-matrix. The criteria lead to some perturbation results. These results are applied to birth-death processes with killing, yielding some sufficient and some necessary conditions for the Feller property directly in terms of the rates. An essential step in the analysis is the idea of associating the Feller property with individual states.
12.
13.
Xianping Guo, Onésimo Hernández-Lerma, Tomás Prieto-Rumeau, Xi-Ren Cao, Junyu Zhang, Qiying Hu, Mark E. Lewis, Ricardo Vélez 《TOP》2006,14(2):177-261
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
Research partially supported by grants NSFC, DRFP and NCET.
Research partially supported by CONACyT (Mexico) Grant 45693-F.
14.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
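As an illustration of the kind of iteration algorithm involved, here is a minimal sketch, assuming a finite state and action space, of value iteration with a state-dependent discount factor; the names (`P`, `r`, `alpha`) and the toy data are ours, not the paper's:

```python
# Hedged sketch (not the paper's algorithm verbatim): value iteration for a
# finite-state, finite-action discrete-time MDP in which the discount factor
# alpha(x) depends on the current state.
import numpy as np

def value_iteration(P, r, alpha, tol=1e-8, max_iter=10_000):
    """P[a][x][y]: transition probabilities, r[a][x]: one-step rewards,
    alpha[x]: state-dependent discount factor in [0, 1)."""
    n_actions, n_states = r.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman update: Q(x, a) = r(x, a) + alpha(x) * sum_y P(y | x, a) v(y)
        q = r + alpha[None, :] * np.einsum('axy,y->ax', P, v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    policy = q.argmax(axis=0)  # a greedy stationary, deterministic policy
    return v, policy

# Toy two-state, two-action example (illustrative only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])   # P[a, x, y]
r = np.array([[1.0, 0.0],
              [0.5, 0.8]])                 # r[a, x]
alpha = np.array([0.90, 0.95])             # discount depends on the state
print(value_iteration(P, r, alpha))
```

Because the discount factor is bounded by max_x alpha(x) < 1, the Bellman operator is still a sup-norm contraction, which is what makes an iteration of this kind converge.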
15.
This paper studies the Markov property of the surplus process in a broad class of continuous-time risk models and obtains necessary and sufficient conditions for the surplus process to be a Markov process. A continuous-time risk model whose claim inter-arrival times follow a discrete distribution is established for the first time, and exact expressions for the ruin probability are derived for two basic special cases.
16.
Marta Tyran-Kamińska 《Journal of Mathematical Analysis and Applications》2009,357(2):385-402
Necessary and sufficient conditions are given for a substochastic semigroup on L1 obtained through the Kato-Voigt perturbation theorem to be either stochastic or strongly stable. We show how such semigroups are related to piecewise deterministic Markov processes, provide a probabilistic interpretation of our results, and apply them to fragmentation equations.
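For the terminology, as these terms are commonly used (the paper's precise definitions may differ): a substochastic semigroup $\{P(t)\}_{t\ge0}$ on $L^1$ satisfies $P(t)f\ge0$ and $\|P(t)f\|_1\le\|f\|_1$ for $f\ge0$, and, roughly,
$$ \text{stochastic:}\quad \|P(t)f\|_1=\|f\|_1\ \ \text{for all } f\in L^1_+,\ t\ge0;\qquad \text{strongly stable:}\quad \lim_{t\to\infty}\|P(t)f\|_1=0\ \ \text{for all } f\in L^1. $$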
17.
Lothar Breuer 《Queueing Systems》2008,58(4):321-331
Consider an M/G/c queue with homogeneous servers and service time distribution F. It is shown that an approximation of the service time distribution F by stochastically smaller distributions, say F_n, leads to an approximation of the stationary distribution π of the original M/G/c queue by the stationary distributions π_n of the M/G/c queues with service time distributions F_n. Here all approximations are in weak convergence. The argument is based on a representation of M/G/c queues in terms of piecewise deterministic Markov processes as well as some coupling methods.
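Read as a continuity statement (our symbols: $\le_{\mathrm{st}}$ for stochastic order, $\Rightarrow$ for weak convergence), the result above says, roughly,
$$ F_n \le_{\mathrm{st}} F \quad\text{and}\quad F_n \Rightarrow F \ \ (n\to\infty) \qquad\Longrightarrow\qquad \pi_n \Rightarrow \pi, $$
where $\pi_n$ and $\pi$ are the stationary distributions of the M/G/c queues with service time distributions $F_n$ and $F$.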
18.
Sébastien Gadat, Sofiane Saadane 《Stochastics: An International Journal of Probability and Stochastic Processes》2018,90(6):886-926
Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s. NS algorithms have been studied in depth in the infinite-horizon setting, but few non-asymptotic results exist for this type of bandit algorithm. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain uniform explicit bounds for the regret of (over-)penalized NS algorithms. We also extend to the multi-armed case some convergence properties of penalized NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.
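For readers unfamiliar with the scheme, below is a minimal sketch, assuming a two-armed bandit with Bernoulli rewards, of a classical linear reward-inaction Narendra-Shapiro-type recursion; the penalized and over-penalized variants analysed in the paper modify the no-reward step, and all names and step sizes here are our own illustrative choices:

```python
# Hedged illustration only: a classical two-armed Narendra-Shapiro-type
# (linear reward-inaction) recursion, not the penalized/over-penalized
# variants analysed in the paper.
import random

def ns_two_armed(p1, p2, n_steps=50_000, seed=0):
    rng = random.Random(seed)
    x = 0.5                       # probability of pulling arm 1
    for n in range(1, n_steps + 1):
        gamma = 1.0 / (n + 1)     # decreasing step size (our choice)
        arm1 = rng.random() < x
        reward = rng.random() < (p1 if arm1 else p2)
        if reward:                # reward-inaction: move towards the rewarded arm
            x += gamma * (1.0 - x) if arm1 else -gamma * x
        # on no reward the plain reward-inaction scheme leaves x unchanged;
        # the penalized variants studied in the paper also update x here
    return x

print(ns_two_armed(p1=0.7, p2=0.4))   # x should drift towards 1 (arm 1 pays more)
```

The mean drift of this recursion is gamma_n x(1-x)(p1-p2), so with p1 > p2 the update is pushed towards always playing arm 1; one motivation for the penalized variants is to keep the scheme from getting stuck on the suboptimal arm.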
19.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure for determining an optimal policy relative to a procedure for determining a suboptimal policy based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise. This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.