Similar Articles (20 results)
1.
Optimization, 2012, 61(4): 773-800
Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.
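For readers unfamiliar with the criterion, the risk-sensitive average cost studied above can be stated compactly. The following formulation is our paraphrase of the standard definition, with risk-sensitivity parameter λ > 0, cost rate c, state process ξ_t, and actions a_t under policy π:

```latex
J(x,\pi) \;=\; \limsup_{T \to \infty} \frac{1}{\lambda T}
\log \mathbb{E}_x^{\pi}\!\left[\exp\!\left(\lambda \int_0^T c(\xi_t, a_t)\,dt\right)\right]
```

Letting λ → 0 formally recovers the risk-neutral expected average cost, while λ > 0 penalizes variability of the accumulated cost, not just its mean.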

2.
Nonzero-sum ergodic semi-Markov games with Borel state spaces are studied. An equilibrium theorem is proved in the class of correlated stationary strategies using public randomization. Under an additivity assumption on the transition probabilities, stationary Nash equilibria are also shown to exist. Received: October 2004 / Revised: January 2005

3.
We study a zero-sum partially observed semi-Markov game with average payoff on a countable state space. Under certain ergodicity conditions we show that a saddle point equilibrium exists. We achieve this by solving the corresponding average cost optimality equation using a span contraction method. The average value is shown to be the unique zero of a Lipschitz continuous function. A value iteration scheme is developed to compute the value.
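As a concrete illustration of the span-contraction value iteration mentioned above, here is a minimal sketch on a toy 2-state, 2-action fully observed MDP of our own devising (the paper's partially observed game setting is substantially more involved):

```python
import numpy as np

# Toy instance (our own, for illustration only): P[a] is the transition
# matrix and c[a] the cost vector under action a, over states {0, 1}.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
c = [np.array([1.0, 3.0]), np.array([2.0, 0.5])]

def span(v):
    # Span seminorm sp(v) = max(v) - min(v); contraction is measured in it.
    return v.max() - v.min()

def relative_value_iteration(tol=1e-10, max_iter=10_000):
    v = np.zeros(2)
    for _ in range(max_iter):
        # Bellman update: minimize one-step cost plus cost-to-go.
        q = np.stack([c[a] + P[a] @ v for a in range(2)])
        v_new = q.min(axis=0)
        if span(v_new - v) < tol:
            # At convergence the per-step increment is the constant g,
            # the optimal long-run average cost.
            g = (v_new - v).mean()
            return g, v_new - v_new.min()
        v = v_new - v_new.min()   # renormalize to keep iterates bounded
    raise RuntimeError("no convergence")

g, h = relative_value_iteration()
print(g)  # optimal long-run average cost
```

The stopping rule in the span seminorm is what makes this a contraction argument rather than a plain sup-norm one: adding a constant to v does not change the policy, so only the span matters.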

4.
This paper concerns two-person zero-sum games for a class of average-payoff continuous-time Markov processes in Polish spaces. The underlying processes are determined by transition rates that are allowed to be unbounded, and the payoff function may have neither upper nor lower bounds. We use two optimality inequalities to replace the so-called optimality equation in the previous literature. Under more general conditions, these optimality inequalities yield the existence of the value of the game and of a pair of ...

5.
This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control is continuously acting in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed.

6.
In this paper we study the zero-sum games for continuous-time Markov jump processes under the risk-sensitive finite-horizon cost criterion. The state space is a Borel space and the transition rates are allowed to be unbounded. Under suitable conditions, we use a new value iteration approach to establish the existence of a solution to the risk-sensitive finite-horizon optimality equations of the players, obtain the existence of the value of the game and show the existence of saddle-point equilibria.

7.
We treat non-cooperative stochastic games with countable state space and with finitely many players each having finitely many moves available in a given state. As a function of the current state and move vector, each player incurs a nonnegative cost. Assumptions are given for the expected discounted cost game to have a Nash equilibrium randomized stationary strategy. These conditions hold for bounded costs, thereby generalizing Parthasarathy (1973) and Federgruen (1978). Assumptions are given for the long-run average expected cost game to have a Nash equilibrium randomized stationary strategy, under which each player has constant average cost. A flow control example illustrates the results. This paper complements the treatment of the zero-sum case in Sennott (1993a).

8.
We consider nonzero-sum games for continuous-time jump processes with unbounded transition rates under expected average payoff criterion. The state and action spaces are Borel spaces and reward rates are unbounded. We introduce an approximating sequence of stochastic game models with extended state space, for which the uniform exponential ergodicity is obtained. Moreover, we prove the existence of a stationary almost Markov Nash equilibrium by introducing auxiliary static game models. Finally, a cash flow model is employed to illustrate the results.

9.
Existence and uniqueness of a Nash equilibrium feedback is established for a simple class of nonzero-sum differential games on the line.

10.
In this paper we deal with the problem of existence of a smooth solution of the Hamilton–Jacobi–Bellman–Isaacs (HJBI for short) system of equations associated with nonzero-sum stochastic differential games. We consider the problem in unbounded domains either in the case of continuous generators or for discontinuous ones. In each case we show the existence of a smooth solution of the system. As a consequence, we show that the game has smooth Nash payoffs which are given by means of the solution of the HJBI system and the stochastic process which governs the dynamics of the controlled system.

11.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.

12.
13.
Optimization, 2012, 61(7): 1593-1623
This paper deals with the ratio and time expected average criteria for constrained semi-Markov decision processes (SMDPs). The state and action spaces are Polish spaces, the rewards and costs are unbounded from above and from below, and the mean holding times are allowed to be unbounded from above. First, under general conditions we prove the existence of constrained-optimal policies for the ratio expected average criterion by developing a technique of occupation measures including the mean holding times for SMDPs, which are the generalizations of those for the standard discrete-time and continuous-time MDPs. Then, we give suitable conditions under which we establish the equivalence of the two average criteria by the optional sampling theorem, and thus we show the existence of constrained-optimal policies for the time expected average criterion. Finally, we illustrate the application of our main results with a controlled linear system, for which an exact optimal policy is obtained.
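The occupation-measure technique mentioned above has a well-known finite-dimensional analogue: for a discrete-time average-reward MDP, optimal policies solve a linear program over occupation measures. The following is a minimal sketch on a toy 2-state, 2-action instance of our own (the paper's semi-Markov, Polish-space setting is far more general):

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (our own): rho[x, a] is the long-run fraction of time that
# state x is visited and action a is chosen; P[a] is the transition law
# under action a, and r[x, a] the reward rate.
P = np.array([[[0.8, 0.2],      # P[a=0]: rows indexed by current state x
               [0.3, 0.7]],
              [[0.5, 0.5],      # P[a=1]
               [0.9, 0.1]]])
r = np.array([[1.0, 2.0],       # r[x=0, a=0], r[x=0, a=1]
              [4.0, 0.0]])      # r[x=1, a=0], r[x=1, a=1]

idx = lambda x, a: 2 * x + a    # flatten (x, a) into one LP variable index

# Maximize sum_{x,a} rho(x,a) r(x,a)  <=>  minimize the negated rewards.
obj = np.array([-r[x, a] for x in range(2) for a in range(2)])

# Stationarity: for each y, sum_a rho(y,a) = sum_{x,a} rho(x,a) P(y|x,a);
# plus normalization sum rho = 1 (one flow row is redundant but harmless).
A_eq = np.zeros((3, 4))
for y in range(2):
    for x in range(2):
        for a in range(2):
            A_eq[y, idx(x, a)] = (x == y) - P[a, x, y]
A_eq[2, :] = 1.0
b_eq = np.array([0.0, 0.0, 1.0])

res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
rho = res.x
gain = -res.fun                 # optimal long-run average reward
print(gain)
```

An optimal stationary policy is read off by normalizing rho over actions in each visited state; vertices of this polytope correspond to deterministic policies, which is the finite-dimensional shadow of the constrained-optimality results above.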

14.
15.
If P is a stochastic matrix corresponding to a stationary, irreducible, positive persistent Markov chain of period d > 1, the powers P^n will not converge as n → ∞. However, the subsequences P^{nd+k} for k = 0, 1, ..., d-1, and hence the Cesàro averages (1/n) Σ_{k=1}^{n} P^k, will converge. In this paper we determine classes of nonstationary Markov chains for which the analogous subsequences and/or Cesàro averages converge and consider the rates of convergence. The results obtained are then applied to the analysis of expected average cost.
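The stationary, periodic case described above is easy to see numerically; here is a small sketch with the 2-state cycle of period d = 2 (our own toy example):

```python
import numpy as np

# 2-state deterministic cycle: irreducible, positive persistent, period d = 2.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# The powers P^n oscillate between I and P, so P^n does not converge,
# but the subsequences P^(nd+k) for k = 0, 1 do.
even = np.linalg.matrix_power(P, 100)   # subsequence limit for k = 0: I
odd = np.linalg.matrix_power(P, 101)    # subsequence limit for k = 1: P

# The Cesaro average (1/n) * sum_{k=1}^n P^k converges to the matrix whose
# rows are all the stationary distribution (1/2, 1/2).
n = 1000
cesaro = sum(np.linalg.matrix_power(P, k) for k in range(1, n + 1)) / n
print(cesaro)
```

This is why expected average cost is analyzed through Cesàro limits rather than plain limits of the n-step transition matrices.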

16.
A class of stochastic games with additive reward and transition structure is studied. For zero-sum games, under some ergodicity assumptions, 1-equilibria are shown to exist. They correspond to so-called sensitive optimal policies in dynamic programming. For a class of nonzero-sum stochastic games with nonatomic transitions, nonrandomized Nash equilibrium points with respect to the average payoff criterion are also obtained. The included examples show that the results of this paper cannot be extended to more general payoff or transition structures.

17.
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games. Research partially supported by grants NSFC, DRFP and NCET. Research partially supported by CONACyT (Mexico) Grant 45693-F.

18.
In this paper we study zero-sum stochastic games. The optimality criterion is the long-run expected average criterion, and the payoff function may have neither upper nor lower bounds. We give a new set of conditions for the existence of a value and a pair of optimal stationary strategies. Our conditions are slightly weaker than those in the previous literature, and some new sufficient conditions for the existence of a pair of optimal stationary strategies are imposed on the primitive data of the model. Our results are illustrated with a queueing system, for which our conditions are satisfied but some of the conditions in the previous literature fail to hold.

19.
An approximate version of the standard uniformization technique is introduced for application to continuous-time Markov chains with unbounded jump rates. This technique is shown to be asymptotically exact and an error bound for the order of its accuracy is provided. An illustrative queueing application is included.
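For context, standard uniformization (which the approximate version above extends to unbounded rates) represents the transient distribution as a Poisson mixture of powers of the uniformized kernel P_u = I + Q/Λ, with Λ at least the largest total jump rate. The sketch below is our own simplification on a finite truncation of a birth-death chain; the truncation is not the paper's technique, merely a way to make the generator finite:

```python
import numpy as np
from math import exp

# Truncated birth-death generator (our own toy): birth rate lam, death
# rate mu * i, truncated at level N so that rates stay bounded.
N = 50
lam, mu = 1.0, 2.0
Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = lam
    if i > 0:
        Q[i, i - 1] = mu * i       # state-dependent (growing) death rate
    Q[i, i] = -Q[i].sum()

Lam = max(-Q[i, i] for i in range(N + 1))  # uniformization constant
P_u = np.eye(N + 1) + Q / Lam              # uniformized DTMC kernel

def transient_dist(p0, t, terms=400):
    # p(t) = sum_k [ e^{-Lam t} (Lam t)^k / k! ] * (p0 @ P_u^k)
    out = np.zeros_like(p0)
    weight = exp(-Lam * t)         # Poisson(Lam * t) weight for k = 0
    pk = p0.copy()
    for k in range(terms):
        out += weight * pk
        pk = pk @ P_u              # advance the uniformized chain one step
        weight *= Lam * t / (k + 1)
    return out

p0 = np.zeros(N + 1); p0[0] = 1.0
p_t = transient_dist(p0, t=0.5)
print(p_t.sum())
```

Because the Poisson series is truncated at a fixed number of terms, this too is an approximation, with an error controlled by the Poisson tail; the paper's contribution is an analogous, asymptotically exact scheme when no finite Λ bounds the rates.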

20.
Zero-sum stochastic games with countable state space and with finitely many moves available to each player in a given state are treated. As a function of the current state and the moves chosen, player I incurs a nonnegative cost and player II receives this as a reward. For both the discounted and average cost cases, assumptions are given for the game to have a finite value and for the existence of an optimal randomized stationary strategy pair. In the average cost case, the assumptions generalize those given in Sennott (1993) for the case of a Markov decision chain. Theorems of Hoffman and Karp (1966) and Nowak (1992) are obtained as corollaries. Sufficient conditions are given for the assumptions to hold. A flow control example illustrates the results.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号