1.
Rolando Cavazos-Cadena Emmanuel Fernández-Gaucherand 《Mathematical Methods of Operations Research》1995,41(1):89-108
We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains with denumerable state space and compact action space. The corresponding stochastic control problem of maximizing average rewards in the long run is studied. Departing from the most common position, which uses expected values of rewards, we focus on a sample path analysis of the stream of states/rewards. Under a Lyapunov function condition, we show that stationary policies obtained from the average reward optimality equation are not only average reward optimal, but indeed sample path average reward optimal, for almost all sample paths. Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under grant NSF-INT 9201430, and by CONACyT-MEXICO. Partially supported by the MAXTOR Foundation for applied Probability and Statistics, under grant No. 01-01-56/04-93. Research partially supported by the Engineering Foundation under grant RI-A-93-10, and by a grant from the AT&T Foundation.
2.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.
3.
Partially observable Markov decision chains with finite state, action and signal spaces are considered. The performance index is the risk-sensitive average criterion and, under conditions concerning reachability between the unobservable states and observability of the signals, it is shown that the value iteration algorithm can be implemented to approximate the optimal average cost, to determine a stationary policy whose performance index is arbitrarily close to the optimal one, and to establish the existence of solutions to the optimality equation. The results rely on an appropriate extension of the well-known Schweitzer's transformation.
4.
5.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.
6.
Evgueni I. Gordienko J. Adolfo Minjárez-Sosa 《Mathematical Methods of Operations Research》1998,48(1):37-55
The paper deals with a class of discrete-time Markov control processes with Borel state and action spaces, and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1} = F(x_t, a_t, \xi_t)$, $t = 1, 2, \ldots$, with i.i.d. $\Re^k$-valued random vectors $\xi_t$ whose density $\rho$ is unknown. Assuming observability of $\xi_t$, and taking advantage of the procedure of statistical estimation of $\rho$ used in a previous work by the authors, we construct an average cost optimal adaptive policy.
Received March/Revised version October 1997
7.
This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (together called the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16] (p. 77) for stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Also, some properties of optimal policies and the convergence of optimal average values are discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and then a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given.
8.
9.
Shaohui Zheng 《应用数学学报(英文版)》1991,7(1):6-16
This paper deals with continuous-time Markov decision programming (briefly, CTMDP) with unbounded reward rate. The economic criterion is the long-run average reward. For models with countable state space and compact metric action sets, we present a set of sufficient conditions that ensure the existence of stationary optimal policies. This paper was prepared with the support of the National Youth Science Foundation.
10.
This paper studies two-person nonzero-sum games for denumerable continuous-time Markov chains determined by transition rates, with an expected average criterion. The transition rates are allowed to be unbounded, and the payoff functions may be unbounded from above and from below. We give suitable conditions under which the existence of a Nash equilibrium is ensured. More precisely, using the so-called "vanishing discount" approach, a Nash equilibrium for the average criterion is obtained as a limit point of a sequence of equilibrium strategies for the discounted criterion as the discount factors tend to zero. Our results are illustrated with a birth-and-death game.
11.
Linn I. Sennott 《Annals of Operations Research》1991,28(1):261-271
We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policy $f$, with expected average cost $g$, when can $f$ and $g$ be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related to $ng - Ν_n(i)$, where $Ν_n(i)$ is the minimum expected $n$-stage cost when the process starts in state $i$. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter.
12.
13.
Evgueni Gordienko Raúl Montes-De-Oca Adolfo Minjárez-Sosa 《Mathematical Methods of Operations Research》1997,45(2):245-263
The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost provide the approximation of an average cost optimal policy by solving n-stage optimization problems (n = 1, 2, ...). The approach used ensures an exponential rate of convergence. An approximation of this type would be useful for finding adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability. Research supported in part by Consejo Nacional de Ciencia y Tecnologia (CONACYT) under grant 0635P-E9506. Research supported by Fondo del Sistema de Investigación del Mar de Cortés under Grant SIMAC/94/CT-005.
14.
Tomás Prieto-Rumeau Onésimo Hernández-Lerma 《Mathematical Methods of Operations Research》2009,70(3):527-540
This paper deals with denumerable-state continuous-time controlled Markov chains with possibly unbounded transition and reward rates. It concerns optimality criteria that improve the usual expected average reward criterion. First, we show the existence of average reward optimal policies with minimal average variance. Then we compare the variance minimization criterion with overtaking optimality. We present an example showing that they are opposite criteria, and therefore we cannot optimize them simultaneously. This leads to a multiobjective problem for which we identify the set of Pareto optimal policies (also known as nondominated policies).
15.
《Optimization》2012,61(4):773-800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.
16.
Rolando Cavazos-Cadena 《Mathematical Methods of Operations Research》2009,70(3):541-566
This work is concerned with controlled Markov chains with finite state and action spaces. It is assumed that the decision maker has an arbitrary but constant risk sensitivity coefficient, and that the performance of a control policy is measured by the long-run average cost criterion. Within this framework, the existence of solutions of the corresponding risk-sensitive optimality equation for arbitrary cost function is characterized in terms of communication properties of the transition law.
17.
18.
We consider the Poisson equations for denumerable Markov chains with unbounded cost functions. Solutions to the Poisson equations exist in the Banach space of bounded real-valued functions with respect to a weighted supremum norm such that the Markov chain is geometrically ergodic. Under minor additional assumptions the solution is also unique. We give a novel probabilistic proof of this fact using relations between ergodicity and recurrence. The expressions involved in the Poisson equations have many solutions in general. However, the solution that has a finite norm with respect to the weighted supremum norm is the unique solution to the Poisson equations. We illustrate how to determine this solution by considering three queueing examples: a multi-server queue, two independent single server queues, and a priority queue with dependence between the queues.
19.
Linn I. Sennott 《Mathematical Methods of Operations Research》1994,40(2):145-162
We treat non-cooperative stochastic games with countable state space and with finitely many players each having finitely many moves available in a given state. As a function of the current state and move vector, each player incurs a nonnegative cost. Assumptions are given for the expected discounted cost game to have a Nash equilibrium randomized stationary strategy. These conditions hold for bounded costs, thereby generalizing Parthasarathy (1973) and Federgruen (1978). Assumptions are given for the long-run average expected cost game to have a Nash equilibrium randomized stationary strategy, under which each player has constant average cost. A flow control example illustrates the results. This paper complements the treatment of the zero-sum case in Sennott (1993a).
20.
Linn I. Sennott 《Mathematical Methods of Operations Research》1994,39(2):209-225
Zero-sum stochastic games with countable state space and with finitely many moves available to each player in a given state are treated. As a function of the current state and the moves chosen, player I incurs a nonnegative cost and player II receives this as a reward. For both the discounted and average cost cases, assumptions are given for the game to have a finite value and for the existence of an optimal randomized stationary strategy pair. In the average cost case, the assumptions generalize those given in Sennott (1993) for the case of a Markov decision chain. Theorems of Hoffman and Karp (1966) and Nowak (1992) are obtained as corollaries. Sufficient conditions are given for the assumptions to hold. A flow control example illustrates the results.
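
Several abstracts in this list (for example entries 2, 3 and 11) refer to the average cost optimality equation and to undiscounted value iteration. As a rough illustration only, the sketch below runs relative value iteration on a toy model. It assumes finite state and action sets and transition matrices with strictly positive entries, which is far more restrictive than the countable or Borel settings with unbounded costs treated in the listed papers, and it is not the method of any of them. The function name, the array layout `P[a, s, y]` and `c[s, a]`, and the parameters `ref_state`, `tol` and `max_iter` are all hypothetical choices made for this example.

```python
import numpy as np


def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
    """Approximate (g, h) solving g + h(s) = min_a [c(s, a) + sum_y P(y|s, a) h(y)].

    P: array of shape (A, S, S); P[a, s, y] is the probability of moving s -> y under a.
    c: array of shape (S, A); c[s, a] is the one-step cost.
    Assumes (illustrative assumption) that every P[a] has strictly positive entries,
    so that the iteration converges.
    """
    n_states = P.shape[1]
    h = np.zeros(n_states)
    g = 0.0
    q = np.zeros_like(c, dtype=float)
    for _ in range(max_iter):
        # q[s, a] = c(s, a) + expected relative value of the next state
        q = c + np.einsum("asy,y->sa", P, h)
        v = q.min(axis=1)
        g = v[ref_state]          # current estimate of the optimal average cost
        h_new = v - g             # normalize so that h[ref_state] == 0
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = q.argmin(axis=1)     # greedy stationary policy for the last iterate
    return g, h, policy
```

Calling `relative_value_iteration(P, c)` returns the estimated average cost `g`, a relative value function `h` normalized to vanish at `ref_state`, and a greedy stationary policy. Subtracting the value at a reference state keeps the iterates bounded, whereas the unnormalized n-stage costs grow roughly like ng.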