期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Denumerable controlled Markov chains with average reward criterion: Sample path optimality

Rolando Cavazos-Cadena Emmanuel Fernández-Gaucherand 《Mathematical Methods of Operations Research》1995,41(1):89-108

We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Makov chains with denumerable state space and compact action space. The corresponding stochastic control problem of maximizing average rewards in the long-run is studied. Departing from the most common position which usesexpected values of rewards, we focus on a sample path analysis of the stream of states/rewards. Under a Lyapunov function condition, we show that stationary policies obtained from the average reward optimality equation are not only average reward optimal, but indeed sample path average reward optimal, for almost all sample paths.Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under grant NSF-INT 9201430, and by CONACyT-MEXICO.Partially supported by the MAXTOR Foundation for applied Probability and Statistics, under grant No. 01-01-56/04-93.Research partially supported by the Engineering Foundation under grant RI-A-93-10, and by a grant from the AT&T Foundation. 相似文献

2.

Denumerable state semi-Markov decision processes with unbounded costs,average cost criterion

A. Federgruen A. Hordijk H.C. Tijms 《Stochastic Processes and their Applications》1979,9(2):223-235

This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergotic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we shall verify the existence of a finite solution to the average costs optimality equation and the existence of an average cost optimal stationary policy. 相似文献

3.

Successive approximations in partially observable controlled Markov chains with risk-sensitive average criterion

《Stochastics An International Journal of Probability and Stochastic Processes》2013,85(6):537-568

Partially observable Markov decision chains with finite state, action and signal spaces are considered. The performance index is the risk-sensitive average criterion and, under conditions concerning reachability between the unobservable states and observability of the signals, it is shown that the value iteration algorithm can be implemented to approximate the optimal average cost, to determine a stationary policy whose performance index is arbitrarily close to the optimal one, and to establish the existence of solutions to the optimality equation. The results rely on an appropriate extension of the well-known Schweitzer's transformation. 相似文献

4.

A note on strong 1-optimal policies in Markov decision chains with unbounded costs

Andrzej S. Nowak 《Mathematical Methods of Operations Research》1999,49(3):475-482

相似文献

5.

An average-value-at-risk criterion for Markov decision processes with unbounded costs

Qiuli LIU Wai-Ki CHING Junyu ZHANG Hongchu WANG 《Frontiers of Mathematics in China》2022,17(4):673

We study the Markov decision processes under the average-valueat-risk criterion. The state space and the action space are Borel spaces, the costs are admitted to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model. 相似文献

6.

Adaptive control for discrete-time Markov processes with unbounded costs: Average criterion

Evgueni I. Gordienko J. Adolfo Minjárez-Sosa 《Mathematical Methods of Operations Research》1998,48(1):37-55

The paper deals with a class of discrete-time Markov control processes with Borel state and action spaces, and possibly unbounded one-stage costs. The processes are given by recurrent equations x _t ₊₁=F(x _t,a _t,ξ_t), t=1,2,… with i.i.d. ℜ^k– valued random vectors ξ_t whose density ρ is unknown. Assuming observability of ξ_t, and taking advantage of the procedure of statistical estimation of ρ used in a previous work by authors, we construct an average cost optimal adaptive policy. Received March/Revised version October 1997 相似文献

7.

A new strong optimality criterion for nonstationary Markov decision processes

Xianping Guo Peng Shi Weiping Zhu 《Mathematical Methods of Operations Research》2000,52(2):287-306

This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (totally so-called strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which is an extension of the canonical policies of Herna′ndez-Lerma and Lasserre [16] (pages: 77) for the stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Also, some properties of optimal policies and optimal average value convergence are discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and then a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given. 相似文献

8.

Optimal switching problem for countable Markov chains: average reward criterion

Alexander Yushkevich 《Mathematical Methods of Operations Research》2001,53(1):1-24

相似文献

9.

Continuous time Markov decision programming with average reward criterion and unbounded reward rate

Shaohui Zheng 《应用数学学报(英文版)》1991,7(1):6-16

This paper deals with the continuous time Markov decision programming (briefly CTMDP) with unbounded reward rate. The economic criterion is the long-run average reward. To the models with countable state space and compact metric action sets, we present a set of sufficient conditions to ensure the existence of the stationary optimal policies.This paper was prepared with the support of the National Youth Science Foundation. 相似文献

10.

Nonzero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates

WenZhao Zhang XianPing Guo 《中国科学数学(英文版)》2012,55(11):2405-2416

This paper attempts to study two-person nonzero-sum games for denumerable continuous-time Markov chains determined by transition rates,with an expected average criterion.The transition rates are allowed to be unbounded,and the payoff functions may be unbounded from above and from below.We give suitable conditions under which the existence of a Nash equilibrium is ensured.More precisely,using the socalled "vanishing discount" approach,a Nash equilibrium for the average criterion is obtained as a limit point of a sequence of equilibrium strategies for the discounted criterion as the discount factors tend to zero.Our results are illustrated with a birth-and-death game. 相似文献

11.

Value iteration in countable state average cost Markov decision processes with unbounded costs

Linn I. Sennott 《Annals of Operations Research》1991,28(1):261-271

We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policyf, with expected average costg, when canf andg be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related tong?Ν _n (i), whereΝ _n (i) is the minimum expectedn-stage cost when the process starts in statei. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter. 相似文献

12.

The optimality equation and ε-optimal strategies in Markov games with average reward criterion 总被引：1，自引：0，他引：1

Heinz-Uwe Küenle Ronald Schurath 《Mathematical Methods of Operations Research》2003,56(3):451-471

相似文献

13.

Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Evgueni Gordienko Raúl Montes-De-Oca Adolfo Minjárez-Sosa 《Mathematical Methods of Operations Research》1997,45(2):245-263

The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost provide the approximation of an average cost optimal policy by solvingn-stage optimization problems (n = 1, 2, ...). The used approach ensures the exponential rate of convergence. The approximation of this type would be useful to find adaptive procedures of control and to estimate stability of an optimal control under disturbances of the transition probability.Research supported in part by Consejo Nacional de Ciencia y Tecnologia (CONACYT) under grant 0635P-E9506.Research supported by Fondo del Sistema de Investigatión del Mar de Cortés under Grant SIMAC/94/CT-005. 相似文献

14.

Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains

Tomás Prieto-Rumeau Onésimo Hernández-Lerma 《Mathematical Methods of Operations Research》2009,70(3):527-540

This paper deals with denumerable-state continuous-time controlled Markov chains with possibly unbounded transition and reward rates. It concerns optimality criteria that improve the usual expected average reward criterion. First, we show the existence of average reward optimal policies with minimal average variance. Then we compare the variance minimization criterion with overtaking optimality. We present an example showing that they are opposite criteria, and therefore we cannot optimize them simultaneously. This leads to a multiobjective problem for which we identify the set of Pareto optimal policies (also known as nondominated policies). 相似文献

15.

Risk-sensitive average continuous-time Markov decision processes with unbounded rates

《Optimization》2012,61(4):773-800

Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under the suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then by a technique of constructing the appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach called the risk-sensitive average optimality inequality approach to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions and provide an example for which the risk-sensitive average optimality strict inequality occurs. 相似文献

16.

Solutions of the average cost optimality equation for finite Markov decision chains: risk-sensitive and risk-neutral criteria

Rolando Cavazos-Cadena 《Mathematical Methods of Operations Research》2009,70(3):541-566

This work is concerned with controlled Markov chains with finite state and action spaces. It is assumed that the decision maker has an arbitrary but constant risk sensitivity coefficient, and that the performance of a control policy is measured by the long-run average cost criterion. Within this framework, the existence of solutions of the corresponding risk-sensitive optimality equation for arbitrary cost function is characterized in terms of communication properties of the transition law. 相似文献

17.

Solution to the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach

Rolando Cavazos-Cadena Daniel Hernández-Hernández 《Mathematical Methods of Operations Research》2003,56(3):473-479

相似文献

18.

On the uniqueness of solutions to the Poisson equations for average cost Markov chains with unbounded cost functions

Sandjai?Bhulai Email author Flora M.?Spieksma 《Mathematical Methods of Operations Research》2003,58(2):221-236

We consider the Poisson equations for denumerable Markov chains with unbounded cost functions. Solutions to the Poisson equations exist in the Banach space of bounded real-valued functions with respect to a weighted supremum norm such that the Markov chain is geometrically ergodic. Under minor additional assumptions the solution is also unique. We give a novel probabilistic proof of this fact using relations between ergodicity and recurrence. The expressions involved in the Poisson equations have many solutions in general. However, the solution that has a finite norm with respect to the weighted supremum norm is the unique solution to the Poisson equations. We illustrate how to determine this solution by considering three queueing examples: a multi-server queue, two independent single server queues, and a priority queue with dependence between the queues. 相似文献

19.

Nonzero-sum stochastic games with unbounded costs: Discounted and average cost cases

Linn I. Sennott 《Mathematical Methods of Operations Research》1994,40(2):145-162

We treat non-cooperative stochastic games with countable state space and with finitely many players each having finitely many moves available in a given state. As a function of the current state and move vector, each player incurs a nonnegative cost. Assumptions are given for the expected discounted cost game to have a Nash equilibrium randomized stationary strategy. These conditions hold for bounded costs, thereby generalizing Parthasarathy (1973) and Federgruen (1978). Assumptions are given for the long-run average expected cost game to have a Nash equilibrium randomized stationary strategy, under which each player has constant average cost. A flow control example illustrates the results. This paper complements the treatment of the zero-sum case in Sennott (1993a). 相似文献

20.

Zero-sum stochastic games with unbounded costs: Discounted and average cost cases

Linn I. Sennott 《Mathematical Methods of Operations Research》1994,39(2):209-225

Zero-sum stochastic games with countable state space and with finitely many moves available to each player in a given state are treated. As a function of the current state and the moves chosen, player I incurs a nonnegative cost and player II receives this as a reward. For both the discounted and average cost cases, assumptions are given for the game to have a finite value and for the existence of an optimal randomized stationary strategy pair. In the average cost case, the assumptions generalize those given in Sennott (1993) for the case of a Markov decision chain. Theorems of Hoffman and Karp (1966) and Nowak (1992) are obtained as corollaries. Sufficient conditions are given for the assumptions to hold. A flow control example illustrates the results. 相似文献