Similar articles
20 similar articles found.
1.
We consider nonzero-sum games for continuous-time jump processes with unbounded transition rates under expected average payoff criterion. The state and action spaces are Borel spaces and reward rates are unbounded. We introduce an approximating sequence of stochastic game models with extended state space, for which the uniform exponential ergodicity is obtained. Moreover, we prove the existence of a stationary almost Markov Nash equilibrium by introducing auxiliary static game models. Finally, a cash flow model is employed to illustrate the results.

2.
In this paper, we consider continuous-time nonzero-sum stochastic games under the constrained average criteria. The state space is denumerable and the action space of each player is a general Polish space. The transition rates, reward and cost functions are allowed to be unbounded. The main hypotheses in this paper include the standard drift conditions, a continuity-compactness condition and some ergodicity assumptions. By applying the vanishing discount method, we obtain the existence of stationary constrained average Nash equilibria.

3.
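For the first time, this paper studies continuous-time discounted Markov decision programming with a countable state space and countable action sets under the condition that neither the reward function nor the family of transition rates is uniformly bounded. A new class of unbounded reward functions is introduced, and the existence and structure of optimal policies are discussed within a new class of Markov policies. In addition to proving, in this setting, the main results that hold under bounded rewards and a uniformly bounded family of transition rates, the paper also obtains several important conclusions.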

4.
Optimization, 2012, 61(4): 773-800
Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions, and provide an example in which the risk-sensitive average optimality inequality holds strictly.
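As a rough illustration of the criterion only (not the paper's method, which handles denumerable states and unbounded rates), consider a finite, irreducible chain under a fixed stationary policy with generator Q, cost vector c and risk parameter θ > 0. By the Feynman-Kac formula, E_x exp(θ∫_0^T c(X_t) dt) = (exp(T(Q + θ·diag(c)))·1)(x), so the risk-sensitive average cost equals λ/θ, where λ is the principal eigenvalue of Q + θ·diag(c). The sketch below assumes a finite state space and computes λ by power iteration on a shifted nonnegative matrix; the function name is hypothetical.

```python
def risk_sensitive_average_cost(Q, c, theta, tol=1e-12, max_iter=100_000):
    """Return (1/theta) * principal eigenvalue of Q + theta*diag(c).

    Assumptions (illustrative only): Q is a finite, irreducible generator
    (rows sum to zero) under a fixed stationary policy, c is a cost vector,
    and theta > 0 is the risk parameter.
    """
    n = len(Q)
    # M = Q + theta*diag(c); its off-diagonal entries are nonnegative.
    M = [[Q[i][j] + (theta * c[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Shift by s so every entry is >= 0 and the diagonal is > 0; the shifted
    # matrix is primitive, so power iteration converges to its spectral radius.
    s = 1.0 + max(abs(M[i][i]) for i in range(n))
    A = [[M[i][j] + (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    rho, new_rho = 0.0, 0.0
    for _ in range(max_iter):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_rho = max(w)             # v is scaled so that max(v) == 1
        v = [x / new_rho for x in w]
        if abs(new_rho - rho) < tol:
            break
        rho = new_rho
    # Undo the shift to recover lambda, then divide by theta.
    return (new_rho - s) / theta


J = risk_sensitive_average_cost([[-1.0, 1.0], [2.0, -2.0]], [1.0, 3.0], 0.5)
print(J)  # approximately 1.8284 (exact value 2*sqrt(2) - 1)
```

For comparison, the risk-neutral average cost of the same chain is π·c = 5/3, and the risk-sensitive value with θ > 0 lies above it, as expected for a risk-averse criterion.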

5.
There are several approaches to sharing resources among users. In the noncooperative approach, each user strives to maximize its own utility, and the most common optimality notion is then the Nash equilibrium. Nash equilibria are generally Pareto inefficient. On the other hand, we consider a Nash equilibrium to be fair, as it is defined in a context of fair competition without coalitions (such as cartels and syndicates). We present a general framework of systems in which there exists a Pareto optimal allocation that is Pareto superior to an inefficient Nash equilibrium. We consider this Pareto optimum to be "Nash equilibrium based fair." We further define a "Nash proportionately fair" Pareto optimum. We then provide conditions for the existence of a Pareto-optimal allocation that is, truly or most closely, proportional to a Nash equilibrium. As examples that fit in the above framework, we consider noncooperative flow-control problems in communication networks, for which we show the conditions for the existence of Nash-proportionately fair Pareto optimal allocations.

6.
Optimization, 2012, 61(7): 1593-1623
This paper deals with the ratio and time expected average criteria for constrained semi-Markov decision processes (SMDPs). The state and action spaces are Polish spaces, the rewards and costs are unbounded from above and from below, and the mean holding times are allowed to be unbounded from above. First, under general conditions we prove the existence of constrained-optimal policies for the ratio expected average criterion by developing a technique of occupation measures including the mean holding times for SMDPs, which are the generalizations of those for the standard discrete-time and continuous-time MDPs. Then, we give suitable conditions under which we establish the equivalence of the two average criteria by the optional sampling theorem, and thus we show the existence of constrained-optimal policies for the time expected average criterion. Finally, we illustrate the application of our main results with a controlled linear system, for which an exact optimal policy is obtained.
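For intuition about the ratio criterion in the simplest case (a finite SMDP under a fixed stationary policy, which is far narrower than the Polish-space setting above), the renewal-reward theorem gives the ratio expected average reward as Σ π(i) r(i) / Σ π(i) τ(i), where π is the stationary distribution of the embedded jump chain P, r(i) the expected reward earned per visit to state i, and τ(i) the mean holding time. The sketch below assumes an irreducible finite P; the function names are hypothetical.

```python
def stationary_distribution(P, tol=1e-13, max_iter=1_000_000):
    """Stationary distribution of an irreducible finite stochastic matrix P."""
    n = len(P)
    # The lazy chain (P + I)/2 has the same stationary distribution as P
    # but is aperiodic, so plain power iteration converges even if P is periodic.
    L = [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * L[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def ratio_average_reward(P, r, tau):
    """Long-run reward per unit time: sum(pi*r) / sum(pi*tau).

    P:   embedded-chain transition matrix under a fixed stationary policy
    r:   expected reward earned per visit to each state (assumed)
    tau: mean holding time in each state (assumed finite and positive)
    """
    pi = stationary_distribution(P)
    num = sum(p * x for p, x in zip(pi, r))
    den = sum(p * t for p, t in zip(pi, tau))
    return num / den


print(ratio_average_reward([[0.5, 0.5], [1.0, 0.0]], [2.0, 4.0], [1.0, 2.0]))  # about 2.0
```

Here the embedded chain has π = (2/3, 1/3), so the ratio is (8/3)/(4/3) = 2. The paper's contribution is the conditions under which this ratio coincides with the time expected average, which this finite sketch simply takes for granted.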

7.
A class of stochastic games with additive reward and transition structure is studied. For zero-sum games under some ergodicity assumptions, 1-equilibria are shown to exist. They correspond to so-called sensitive optimal policies in dynamic programming. For a class of nonzero-sum stochastic games with nonatomic transitions, nonrandomized Nash equilibrium points with respect to the average payoff criterion are also obtained. Included examples show that the results of this paper cannot be extended to more general payoff or transition structures.

8.
This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control is continuously acting in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed.

9.
We consider stochastic games with countable state spaces and unbounded immediate payoff functions. Our assumptions on the transition structure of the game are based on a recent work by Meyn and Tweedie [19] on computable bounds for geometric convergence rates of Markov chains. The main results in this paper concern the existence of sensitive optimal strategies in some classes of zero-sum stochastic games. By sensitive optimality we mean overtaking or 1-optimality. We also provide a new Nash equilibrium theorem for a class of ergodic nonzero-sum stochastic games with denumerable state spaces.

10.
This paper deals with denumerable-state continuous-time controlled Markov chains with possibly unbounded transition and reward rates. It concerns optimality criteria that improve the usual expected average reward criterion. First, we show the existence of average reward optimal policies with minimal average variance. Then we compare the variance minimization criterion with overtaking optimality. We present an example showing that they are opposite criteria, and therefore we cannot optimize them simultaneously. This leads to a multiobjective problem for which we identify the set of Pareto optimal policies (also known as nondominated policies).

11.
We use a game-theoretic approach to study pricing and advertising decisions in a manufacturer–retailer supply chain when price discounts are offered by both the manufacturer and the retailer. When the manufacturer is the leader of the game, we obtain the Stackelberg equilibrium with the manufacturer's local allowance, national brand name investment, manufacturer's preferred price discount, retailer's price discount, and local advertising expense. For the special case of a two-stage equilibrium in which the manufacturer's price discount is exogenous, we find that the retailer is willing to increase local advertising expense if the manufacturer increases the local advertising allowance and provides a deeper price discount, or if the manufacturer decreases its brand name investment. When both the manufacturer and the retailer have power, the Nash equilibrium of a competition game is obtained. The comparison between the Nash and Stackelberg equilibria shows that the manufacturer always prefers the Stackelberg equilibrium, but there is no definitive conclusion for the retailer. Bargaining power can be used to determine the profit sharing between the manufacturer and the retailer. Once the profit sharing is determined, we suggest a simple contract to help the manufacturer and retailer obtain their desired profit sharing.
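The paper's model (allowances, brand investment, price discounts, advertising) is much richer than can be reproduced here. As a toy illustration of the Nash versus Stackelberg comparison only, the sketch below assumes a hypothetical linear demand q = a - b(w + m), where the manufacturer chooses a wholesale price w at unit cost c and the retailer chooses a margin m. The simultaneous-move Nash equilibrium is found by best-response iteration, and the manufacturer-led Stackelberg equilibrium by substituting the retailer's best response. In this toy model the manufacturer earns (a - bc)²/(8b) as leader versus (a - bc)²/(9b) under Nash, matching the direction of the abstract's finding; all names and parameters are illustrative assumptions.

```python
def profits(w, m, a, b, c):
    """Manufacturer and retailer profits under hypothetical demand q = a - b*(w + m)."""
    q = max(a - b * (w + m), 0.0)
    return (w - c) * q, m * q


def retailer_best_response(w, a, b):
    # argmax_m m*(a - b*(w + m))  =>  m = (a - b*w) / (2b)
    return (a - b * w) / (2 * b)


def manufacturer_best_response(m, a, b, c):
    # argmax_w (w - c)*(a - b*(w + m))  =>  w = (a - b*m + b*c) / (2b)
    return (a - b * m + b * c) / (2 * b)


def nash(a, b, c, iters=200):
    """Simultaneous-move equilibrium via best-response iteration.

    Each best response has slope -1/2 in the rival's choice, so the joint
    update is a contraction and the iteration converges.
    """
    w, m = c, 0.0
    for _ in range(iters):
        w, m = manufacturer_best_response(m, a, b, c), retailer_best_response(w, a, b)
    return w, m


def stackelberg(a, b, c):
    """Manufacturer leads, anticipating the retailer's response m(w).

    Substituting m(w) gives manufacturer profit (w - c)*(a - b*w)/2,
    which is maximised at w = (a + b*c)/(2b).
    """
    w = (a + b * c) / (2 * b)
    return w, retailer_best_response(w, a, b)


a, b, c = 10.0, 1.0, 2.0
print("Nash:", profits(*nash(a, b, c), a, b, c))                # about (7.11, 7.11)
print("Stackelberg:", profits(*stackelberg(a, b, c), a, b, c))  # (8.0, 4.0)
```

In this particular toy model the retailer is worse off as follower, whereas the abstract reports no definitive conclusion for the retailer in the richer model; the example should not be read as reproducing that result.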

12.
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games. Research partially supported by grants NSFC, DRFP and NCET. Research partially supported by CONACyT (Mexico) Grant 45693-F.
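For a finite model (so the rates are automatically bounded), the α-discounted optimality equation αV(i) = max_a [r(i,a) + Σ_j q(j|i,a) V(j)] can be solved by uniformization: choosing Λ at least the largest exit rate and adding ΛV(i) to both sides gives V(i) = max_a [r(i,a) + Σ_j (q(j|i,a) + Λ·1[j = i]) V(j)] / (α + Λ), a contraction with modulus Λ/(α + Λ). The survey's results concern unbounded rates, where such a finite computation could at best be applied to truncations; the sketch and its data layout are hypothetical.

```python
def discounted_value(states, actions, r, q, alpha, tol=1e-12):
    """Value iteration for a finite continuous-time MDP via uniformization.

    states:  list of state labels
    actions: dict state -> list of admissible actions
    r:       dict (state, action) -> reward rate
    q:       dict (state, action) -> dict next_state -> transition rate,
             with q[(i, a)][i] = -(total exit rate); missing keys mean rate 0
    alpha:   discount rate > 0
    """
    # Uniformization constant: the largest exit rate over all state-action pairs.
    lam = max(-q[(i, a)].get(i, 0.0) for i in states for a in actions[i])
    V = {i: 0.0 for i in states}
    while True:
        new = {}
        for i in states:
            best = float("-inf")
            for a in actions[i]:
                # r(i,a) + sum_j (q(j|i,a) + lam*1[j == i]) V(j), then / (alpha + lam)
                total = r[(i, a)] + lam * V[i]
                for j, rate in q[(i, a)].items():
                    total += rate * V[j]
                best = max(best, total / (alpha + lam))
            new[i] = best
        if max(abs(new[i] - V[i]) for i in states) < tol:
            return new
        V = new


states = [0, 1]
actions = {0: ["a", "b"], 1: ["c"]}
r = {(0, "a"): 1.0, (0, "b"): 0.5, (1, "c"): 0.0}
q = {(0, "a"): {0: -1.0, 1: 1.0}, (0, "b"): {}, (1, "c"): {0: 1.0, 1: -1.0}}
print(discounted_value(states, actions, r, q, alpha=1.0))  # approximately {0: 0.6667, 1: 0.3333}
```

In this example action "a" is optimal in state 0: it yields V = (2/3, 1/3), whereas always using "b" would give V(0) = 0.5.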

13.
We study deterministic discounted optimal control problems associated with discrete-time systems. It is shown that, for small discount rates, the controllability properties of the underlying system can guarantee the convergence of the discounted value function to the value function of the average yield. An application in the theory of exponential growth rates of discrete inclusions is presented. This application motivates the analysis of infinite-horizon optimal control problems with running yields that are unbounded from below.
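A toy check of the vanishing-discount statement, under assumptions that are not in the paper: a finite deterministic system on states 0..n-1 where the control moves one step left, stays, or moves one step right (so every state is reachable), with running yield y(x) to be maximized. Because of this controllability, the optimal average yield from every state is max y, and the normalized discounted value (1 - β)V_β should approach it as the discount factor β → 1 (a small discount rate corresponds to β close to 1). The dynamics and function name are hypothetical.

```python
def deterministic_discounted_value(yields, beta, tol=1e-12):
    """Value iteration for V(x) = max_u [y(x) + beta * V(f(x, u))].

    Hypothetical dynamics: from x the control can move to x-1, x, or x+1,
    clipped to the state range 0..n-1.
    """
    n = len(yields)
    V = [0.0] * n
    while True:
        new = []
        for x in range(n):
            successors = [max(x - 1, 0), x, min(x + 1, n - 1)]
            new.append(yields[x] + beta * max(V[s] for s in successors))
        if max(abs(a - b) for a, b in zip(new, V)) < tol:
            return new
        V = new


ys = [0.0, 1.0, 3.0, 2.0, 0.5]
for beta in (0.9, 0.99, 0.999):
    V = deterministic_discounted_value(ys, beta)
    # Normalized values approach max(ys) = 3.0 from every initial state.
    print(beta, [round((1 - beta) * v, 4) for v in V])
```

From state 0 the optimal path moves to state 2 and stays there, so (1 - β)V_β(0) = (1 - β)β + 3β², which tends to 3 as β → 1.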

14.
Gong Chao (公超), Lin Yong (林勇). Acta Mathematica Sinica (Chinese Series) (数学学报), 2018, 61(3): 503-510
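This paper studies functional inequalities on graphs. In the setting of unbounded Laplacian operators, using the completeness of the graph and the hypercontractivity property on graphs, it proves that the logarithmic Sobolev inequality holds on graphs and establishes the equivalence between hypercontractivity and the Nash inequality on graphs.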

15.
Nonzero-sum non-stationary discounted Markov game model
The goal of this paper is to provide a theory of K-person non-stationary Markov games with unbounded rewards for countable state and action spaces. We investigate both the finite- and infinite-horizon problems. We define the concept of strong Nash equilibrium, present conditions for both problems under which strong Nash or Nash equilibrium strategies exist for all players within the class of Markov strategies, and show that the equilibrium rewards satisfy the optimality equations.

16.
The equilibrium problem with equilibrium constraints (EPEC) can be regarded as a generalization of the Nash equilibrium problem (NEP) and of the mathematical program with equilibrium constraints (MPEC), whose constraints contain a parametric variational inequality or complementarity system. In this paper, we consider a special class of EPECs in which a common parametric P-matrix linear complementarity system is contained in all players' strategy sets. After reformulating the EPEC as an equivalent nonsmooth NEP, we use a smoothing method to construct a sequence of smoothed NEPs that approximate the original problem. We consider two solution concepts, global Nash equilibrium and stationary Nash equilibrium, and establish some results about the convergence of approximate Nash equilibria. Moreover, we present some illustrative numerical examples.
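The abstract does not say which smoothing function is used. As one common choice, assumed here, the smoothed Fischer-Burmeister function φ_μ(a, b) = a + b - sqrt(a² + b² + 2μ²) satisfies φ_μ(a, b) = 0 exactly when a > 0, b > 0 and ab = μ², so for μ > 0 it turns the complementarity conditions z ≥ 0, w = Mz + q ≥ 0, zᵀw = 0 into smooth equations, and driving μ → 0 recovers the linear complementarity problem (LCP). The sketch below solves a single P-matrix LCP this way with damped Newton steps; it is not the paper's EPEC algorithm, and all names are hypothetical.

```python
import math


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for an n x n system."""
    n = len(b)
    T = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def smoothed_fb(z, M, q, mu):
    """Return w = Mz + q and residual F_i = z_i + w_i - sqrt(z_i^2 + w_i^2 + 2 mu^2)."""
    n = len(z)
    w = [sum(M[i][j] * z[j] for j in range(n)) + q[i] for i in range(n)]
    F = [z[i] + w[i] - math.sqrt(z[i] ** 2 + w[i] ** 2 + 2 * mu ** 2) for i in range(n)]
    return w, F


def solve_lcp_smoothing(M, q, mu0=1.0, shrink=0.1, mu_min=1e-12, newton_iters=30):
    """Find z >= 0 with Mz + q >= 0 and z.(Mz + q) = 0 (M assumed a P-matrix)."""
    n = len(q)
    z = [0.0] * n
    mu = mu0
    while mu >= mu_min:
        for _ in range(newton_iters):
            w, F = smoothed_fb(z, M, q, mu)
            norm = math.sqrt(sum(f * f for f in F))
            if norm < 1e-14:
                break
            # dF_i/dz_j = (1 - z_i/r_i)[i == j] + (1 - w_i/r_i) M_ij
            J = []
            for i in range(n):
                ri = math.sqrt(z[i] ** 2 + w[i] ** 2 + 2 * mu ** 2)
                J.append([(1 - z[i] / ri) * (1.0 if i == j else 0.0)
                          + (1 - w[i] / ri) * M[i][j] for j in range(n)])
            d = solve_linear(J, [-f for f in F])
            # Backtracking on the residual norm keeps the Newton steps stable.
            t = 1.0
            while t > 1e-8:
                trial = [z[i] + t * d[i] for i in range(n)]
                _, Ft = smoothed_fb(trial, M, q, mu)
                if math.sqrt(sum(f * f for f in Ft)) < (1 - 1e-4 * t) * norm:
                    break
                t /= 2
            z = trial
        mu *= shrink  # tighten the smoothing, warm-starting from the current z
    return z


print(solve_lcp_smoothing([[2.0, 1.0], [1.0, 2.0]], [-1.0, -1.0]))  # close to [1/3, 1/3]
```

The P-matrix property is what keeps the Newton Jacobian nonsingular for every μ > 0, which is one reason the abstract restricts attention to P-matrix complementarity systems.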

17.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.

18.
19.
This paper deals with Markov Decision Processes (MDPs) on Borel spaces with possibly unbounded costs. The criterion to be optimized is the expected total cost with a random horizon of infinite support. In this paper, it is observed that this performance criterion is equivalent to the expected total discounted cost with an infinite horizon and a varying-time discount factor. Then, the optimal value function and the optimal policy are characterized through suitable versions of the Dynamic Programming Equation. Moreover, it is proved that the optimal value function of the optimal control problem with a random horizon can be bounded from above by the optimal value function of a discounted optimal control problem with a fixed discount factor. In this case, the discount factor is defined in an adequate way by the parameters introduced for the study of the optimal control problem with a random horizon. To illustrate the theory developed, a version of the Linear-Quadratic model with a random horizon and a logarithmic consumption-investment model are presented.
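The equivalence can be checked on a fixed cost sequence: if the horizon τ is independent of the process, then E[Σ_{t=0}^{τ} c_t] = Σ_t P(τ ≥ t) c_t, so the time-varying discount factor at step t is the survival ratio α_t = P(τ ≥ t)/P(τ ≥ t-1), and α_1···α_t = P(τ ≥ t). The sketch below assumes deterministic costs c_t (for a fixed policy and initial state they would be expected costs) and a horizon distribution truncated to finite support for computation; the function names are hypothetical. With a geometric horizon P(τ ≥ t) = ρ^t the factors are constant and the ordinary discounted criterion is recovered.

```python
def survival(pmf):
    """S[t] = P(tau >= t) for a horizon supported on 0..len(pmf)-1."""
    return [sum(pmf[t:]) for t in range(len(pmf))]


def random_horizon_cost(costs, pmf):
    """E[sum_{t=0}^{tau} c_t] computed directly from the distribution of tau."""
    return sum(p * sum(costs[: n + 1]) for n, p in enumerate(pmf))


def varying_discount_cost(costs, pmf):
    """sum_t (alpha_1 * ... * alpha_t) * c_t with alpha_t = S[t] / S[t-1]."""
    S = survival(pmf)
    total, discount = 0.0, 1.0
    for t in range(len(pmf)):
        if S[t] == 0.0:
            break  # tau < t almost surely, so no further cost accrues
        if t > 0:
            discount *= S[t] / S[t - 1]  # time-varying discount factor alpha_t
        total += discount * costs[t]
    return total


# Geometric horizon with parameter rho, truncated to N steps and renormalized.
rho, N = 0.9, 200
pmf = [(1 - rho) * rho ** n for n in range(N)]
mass = sum(pmf)
pmf = [p / mass for p in pmf]
costs = [float(t % 3) for t in range(N)]
print(random_horizon_cost(costs, pmf), varying_discount_cost(costs, pmf))  # equal up to rounding
```

After truncation the survival ratios are close to, but not exactly, ρ; without truncation they would equal ρ at every step.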

20.
This paper concerns two-person zero-sum games for a class of average-payoff continuous-time Markov processes in Polish spaces. The underlying processes are determined by transition rates that are allowed to be unbounded, and the payoff function may have neither upper nor lower bounds. We use two optimality inequalities to replace the so-called optimality equation in the previous literature. Under more general conditions, these optimality inequalities yield the existence of the value of the game and of a pair of ...

