期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes

Emmanuel Fernández-Gaucherand Aristotle Arapostathis Steven I. Marcus 《Annals of Operations Research》1991,29(1):439-469

We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with anuncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044. 相似文献

2.

Markov control processes with randomized discounted cost

Juan González-Hernández Raquiel R. López-Martínez J. Rubén Pérez-Hernández 《Mathematical Methods of Operations Research》2007,65(1):27-44

In this paper we consider Markov Decision Processes with discounted cost and a random rate in Borel spaces. We establish the dynamic programming algorithm in finite and infinity horizon cases. We provide conditions for the existence of measurable selectors. And we show an example of consumption-investment problem. This research was partially supported by the PROMEP grant 103.5/05/40. 相似文献

3.

Another set of conditions for Markov decision processes with average sample-path costs

Quanxin Zhu Xianping Guo 《Journal of Mathematical Analysis and Applications》2006,322(2):1199-1214

This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper has first been used to study the ASPC criterion. Also, the approach provided here is slightly different from the “optimality equation approach” widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results. 相似文献

4.

On the optimality equation for zero-sum ergodic stochastic games

Anna Jaśkiewicz Andrzej S. Nowak 《Mathematical Methods of Operations Research》2001,54(2):291-301

相似文献

5.

Average cost Markov control processes: stability with respect to the Kantorovich metric

Evgueni Gordienko Enrique Lemus-Rodríguez Raúl Montes-de-Oca 《Mathematical Methods of Operations Research》2009,70(1):13-33

We study perturbations of a discrete-time Markov control process on a general state space. The amount of perturbation is measured by means of the Kantorovich distance. We assume that an average (per unit of time on the infinite horizon) optimal control policy can be found for the perturbed (supposedly known) process, and that it is used to control the original (unperturbed) process. The one-stage cost is not assumed to be bounded. Under Lyapunov-like conditions we find upper bounds for the average cost excess when such an approximation is used in place of the optimal (unknown) control policy. As an application of the found inequalities we consider the approximation by relevant empirical distributions. We illustrate our results by estimating the stability of a simple autoregressive control process. Also examples of unstable processes are provided. 相似文献

6.

Unbounded cost Markov decision processes with limsup and liminf average criteria: new conditions

Quanxin Zhu Xianping Guo Yonglong Dai 《Mathematical Methods of Operations Research》2005,61(3):469-482

相似文献

7.

A new strong optimality criterion for nonstationary Markov decision processes

Xianping Guo Peng Shi Weiping Zhu 《Mathematical Methods of Operations Research》2000,52(2):287-306

This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (totally so-called strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which is an extension of the canonical policies of Herna′ndez-Lerma and Lasserre [16] (pages: 77) for the stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Also, some properties of optimal policies and optimal average value convergence are discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and then a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given. 相似文献

8.

Non-randomized policies for constrained Markov decision processes

Richard C. Chen Eugene A. Feinberg 《Mathematical Methods of Operations Research》2007,66(1):165-179

This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented. 相似文献

9.

Average optimality for continuous-time Markov decision processes with a policy iteration approach

Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704

This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data and under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ?-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our way is rather different from both the usually “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature. 相似文献

10.

Average optimality inequality for continuous-time Markov decision processes in Polish spaces

Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313

In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected costs. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish one average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in all randomized history dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957). 相似文献

11.

On computing average cost optimal policies with application to routing to parallel queues

Linn I. Sennott 《Mathematical Methods of Operations Research》1997,45(1):45-62

The Approximating Sequence Method for computation of average cost optimal stationary policies in denumerahle state Markov decision chains, introduced in Sennott (1994), is reviewed. New methods for verifying the assumptions are given. These are useful for models with multidimensional state spaces that satisfy certain mild structural properties. The results are applied to four problems in the optimal routing of packets to parallel queues. Numerical results are given for one of the models. 相似文献

12.

Hamilton-Jacobi-Bellman inequality for the average control of piecewise deterministic Markov processes

《Stochastics An International Journal of Probability and Stochastic Processes》2013,85(6):817-835

ABSTRACT

The main goal of this paper is to study the infinite-horizon long run average continuous-time optimal control problem of piecewise deterministic Markov processes (PDMPs) with the control acting continuously on the jump intensity λ and on the transition measure Q of the process. We provide conditions for the existence of a solution to an integro-differential optimality inequality, the so called Hamilton-Jacobi-Bellman (HJB) equation, and for the existence of a deterministic stationary optimal policy. These results are obtained by using the so-called vanishing discount approach, under some continuity and compactness assumptions on the parameters of the problem, as well as some non-explosive conditions for the process. 相似文献

13.

Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state

Rolando Cavazos-Cadena 《Applied Mathematics and Optimization》1992,26(2):171-194

We consider discrete-timeaverage reward Markov decision processes with denumerable state space andbounded reward function. Under structural restrictions on the model the existence of an optimal stationary policy is proved; both the lim inf and lim sup average criteria are considered. In contrast to the usual approach our results donot rely on the average regard optimality equation. Rather, the arguments are based on well-known facts fromRenewal Theory.This research was supported in part by the Consejo Nacional de Ciencia y Tecnologia (CONACYT) under Grants PCEXCNA 040640 and 050156, and by SEMAC under Grant 89-1/00ifn$. 相似文献

14.

Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities

《Stochastics An International Journal of Probability and Stochastic Processes》2013,85(2):273-307

相似文献

15.

Risk-sensitive average continuous-time Markov decision processes with unbounded rates

《Optimization》2012,61(4):773-800

Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under the suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then by a technique of constructing the appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach called the risk-sensitive average optimality inequality approach to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions and provide an example for which the risk-sensitive average optimality strict inequality occurs. 相似文献

16.

An approximation approach to ergodic semi-Markov control processes

Anna Jaśkiewicz 《Mathematical Methods of Operations Research》2001,54(1):1-19

相似文献

17.

A note on negative dynamic programming for risk-sensitive control

Anna Jakiewicz 《Operations Research Letters》2008,36(5):531-534

Negative dynamic programming for risk-sensitive control is studied. Under some compactness and semicontinuity assumptions the following results are proved: the convergence of the value iteration algorithm to the optimal expected total reward, the Borel measurability or upper semicontinuity of the optimal value functions, and the existence of an optimal stationary policy. 相似文献

18.

Bias Optimality versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs

Nadine Hilgert Onésimo Hernández-Lerma 《Acta Appl Math》2003,77(3):215-235

This paper deals with expected average cost (EAC) and discount-sensitive criteria for discrete-time Markov control processes on Borel spaces, with possibly unbounded costs. Conditions are given under which (a) EAC optimality and strong –1-discount optimality are equivalent; (b) strong 0-discount optimality implies bias optimality; and, conversely, under an additional hypothesis, (c) bias optimality implies strong 0-discount optimality. Thus, in particular, as the class of bias optimal policies is nonempty, (c) gives the existence of a strong 0-discount optimal policy, whereas from (b) and (c) we get conditions for bias optimality and strong 0-discount optimality to be equivalent. A detailed example illustrates our results. 相似文献

19.

Denumerable state semi-Markov decision processes with unbounded costs,average cost criterion

A. Federgruen A. Hordijk H.C. Tijms 《Stochastic Processes and their Applications》1979,9(2):223-235

This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergotic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we shall verify the existence of a finite solution to the average costs optimality equation and the existence of an average cost optimal stationary policy. 相似文献

20.

Finite-horizon piecewise deterministic Markov decision processes with unbounded transition rates

Yonghui Huang Xianping Guo 《Stochastics An International Journal of Probability and Stochastic Processes》2019,91(1):67-95

This paper is concerned with the problem of minimizing the expected finite-horizon cost for piecewise deterministic Markov decision processes. The transition rates may be unbounded, and the cost functions are allowed to be unbounded from above and from below. The optimality is over the general history-dependent policies, where the control is continuously acting in time. The infinitesimal approach is employed to establish the associated Hamilton-Jacobi-Bellman equation, via which the existence of optimal policies is proved. An example is provided to verify all the assumptions proposed. 相似文献