Similar Documents
Found 20 similar documents (search time: 15 ms).
1.
2.
《Optimization》2012,61(5):767-781
This paper considers Markov decision processes with a countable state space, compact action spaces, and a bounded reward function. Under some recurrence and connectedness conditions, including the simultaneous Döblin condition, we prove the existence of bounded solutions of the optimality equations which arise in the multichain case in connection with the average reward criterion and sensitive optimality criteria, and we give a characterization of the sets of n-average optimal decision rules.
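The average-reward optimality equation this entry refers to can be illustrated on a toy finite MDP via relative value iteration; the paper's countable-state, multichain setting is far more general, and all numbers below are made up for illustration:

```python
import numpy as np

# Relative value iteration for the average-reward optimality equation
#     g + h(s) = max_a [ r(s,a) + sum_s' P(s'|s,a) h(s') ]
# on a hypothetical 2-state, 2-action MDP.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s][a] = transition distribution
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.5],                  # r[s][a] = one-step reward
              [0.0, 2.0]])

h = np.zeros(2)
for _ in range(10_000):
    Q = r + np.einsum('saj,j->sa', P, h)   # Q(s,a) = reward + expected bias
    h_new = Q.max(axis=1)
    g = h_new[0]                           # gain estimate (reference state 0)
    h_new = h_new - g                      # normalize so h(0) = 0
    if np.abs(h_new - h).max() < 1e-12:
        break
    h = h_new

policy = Q.argmax(axis=1)                  # an average-optimal decision rule
print(g, h, policy)
```

At convergence, g and h satisfy the optimality equation up to the stopping tolerance, and `policy` is a maximizing decision rule.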

3.
4.
This paper deals with a new optimality criterion combining the usual three average criteria and the canonical triplet (together called the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, extending the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary controlled Markov processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given.
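The rolling horizon idea in this entry can be sketched on a toy finite MDP: at each step, solve an N-stage problem by backward induction and apply only its first action. The paper's setting (denumerable states, non-uniformly bounded rewards, additional terminal functions) is far more general; the data below is invented:

```python
import numpy as np

def rolling_horizon_action(s, P, r, N):
    """Solve an N-stage problem by backward induction; return its first action."""
    V = np.zeros(r.shape[0])                  # zero terminal reward
    for _ in range(N):
        Q = r + np.einsum('saj,j->sa', P, V)  # one-stage reward + expected value
        V = Q.max(axis=1)
    return int(Q[s].argmax())                 # act on the current stage only

# Made-up 2-state, 2-action data just to exercise the function.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.5],
              [0.0, 2.0]])
a = rolling_horizon_action(1, P, r, N=10)
```

Repeating this at every decision epoch yields the rolling horizon policy whose average-reward error the paper bounds.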

5.
The present paper gives a systematic presentation of different definitions of optimality in the infinite-time optimal control problem. Some of these definitions are new, while others have been used throughout the literature, sometimes under different names. The logical implications between them are clearly stated, corresponding comparison criteria for solutions are defined, and other relations, as well as two types of equivalence relations, are established. This work was supported in part by Simmons College Fund for Research, Grant No. 201.

6.
《Optimization》2012,61(12):1405-1426
This article concerns controlled Markov-modulated diffusions (also known as piecewise diffusions, switching diffusions, or diffusions with Markovian switching). Our main objective is to give conditions for the existence and characterization of overtaking optimal policies. To this end, we first use the fact that the average-reward Hamilton–Jacobi–Bellman equation guarantees that the family of so-called canonical control policies is nonempty. Then, within this family, we search for policies with some special feature, for instance canonical policies that in addition maximize the bias, which turn out to be overtaking optimal.

7.
A study is made of the conditions sufficient for the existence of overtaking trajectories for a class of infinite time-horizon, time-variant optimal control systems. Nonautonomy is restricted to disturbances with limits at infinity. The convergence property of the overtaking trajectories to the optimal steady-state limit is shown.

8.
9.
We consider a Markov decision process with a Borel state space, bounded rewards, and a bounded transition density satisfying a simultaneous Doeblin-Doob condition. An asymptotics for the discounted value function related to the existence of stationary strong 0-discount optimal policies is extended from the case of finite action sets to the case of compact action sets with rewards and transition densities continuous in the action. Supported by NSF grant DMS-9404177.

10.
In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion for continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of ε (ε ≥ 0)-ASPC optimal stationary policies based on two different …

11.
12.
Sufficient optimality criteria of the Kuhn-Tucker and Fritz John type in nonlinear programming are established in the presence of equality-inequality constraints. The constraint functions are assumed to be quasiconvex, and the objective function is taken to be pseudoconvex (or convex).
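The flavor of such a sufficient criterion can be checked numerically on a toy problem (a hypothetical example, not one from the paper): minimize f(x) = (x−2)² subject to g(x) = x − 1 ≤ 0, with candidate point x* = 1 and multiplier u = 2. Here f is convex (hence pseudoconvex) and g is convex (hence quasiconvex), so the Kuhn-Tucker conditions verified below are sufficient for global optimality:

```python
# Toy Kuhn-Tucker check for: min (x - 2)**2  s.t.  x - 1 <= 0.
x_star, u = 1.0, 2.0

grad_f = 2 * (x_star - 2)            # f'(x*) = -2
grad_g = 1.0                         # g'(x*) = 1
stationarity = grad_f + u * grad_g   # must vanish at a KKT point
feasible = (x_star - 1) <= 0         # primal feasibility
comp_slack = u * (x_star - 1)        # complementary slackness, must be 0

print(stationarity, feasible, comp_slack, u >= 0)
```

All four conditions hold, so under the stated convexity assumptions x* = 1 is a global minimizer.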

13.
In this paper, we investigate the relationship between two classes of optimality which have arisen in the study of dynamic optimization problems defined on an infinite-time domain. We utilize an optimal control framework to discuss our results. In particular, we establish relationships between limiting objective functional type optimality concepts, commonly known as overtaking optimality and weakly overtaking optimality, and the finite-horizon solution concepts of decision-horizon optimality and agreeable plans. Our results show that both classes of optimality are implied by corresponding uniform limiting objective functional type optimality concepts, referred to here as uniformly overtaking optimality and uniformly weakly overtaking optimality. This observation permits us to extract sufficient conditions for optimality from known sufficient conditions for overtaking and weakly overtaking optimality by strengthening their hypotheses. These results take the form of a strengthened maximum principle. Examples are given to show that the hypotheses of these results can be realized. This research was supported by the National Science Foundation, Grant No. DMS-87-00706, and by the Southern Illinois University at Carbondale, Summer Research Fellowship Program.

14.
This paper deals with the bias optimality of multichain models for finite continuous-time Markov decision processes. Based on new performance difference formulas developed here, we prove the convergence of a so-called bias-optimal policy iteration algorithm, which can be used to obtain bias-optimal policies in a finite number of iterations.
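The policy evaluation step underlying any such policy iteration can be sketched in discrete time: for a fixed unichain policy, the gain g and bias h solve the linear system g·1 + h = r_π + P_π h with the normalization h(0) = 0. This is a generic illustration, not the paper's algorithm; the paper's continuous-time setting replaces the transition matrix by transition rates, and its models are multichain:

```python
import numpy as np

def evaluate_policy(P_pi, r_pi):
    """Gain and bias of a fixed unichain policy from g*1 + (I - P_pi) h = r_pi, h(0)=0."""
    n = len(r_pi)
    A = np.zeros((n + 1, n + 1))      # unknowns: (g, h(0), ..., h(n-1))
    b = np.zeros(n + 1)
    A[:n, 0] = 1.0                    # coefficient of g
    A[:n, 1:] = np.eye(n) - P_pi      # (I - P_pi) h
    b[:n] = r_pi
    A[n, 1] = 1.0                     # normalization h(0) = 0
    x = np.linalg.solve(A, b)
    return x[0], x[1:]                # gain, bias

# Made-up 2-state chain induced by some policy.
P_pi = np.array([[0.5, 0.5], [0.3, 0.7]])
r_pi = np.array([1.0, 3.0])
g, h = evaluate_policy(P_pi, r_pi)
```

A bias-optimal policy iteration would alternate such evaluations with improvement steps that break gain ties in favor of larger bias.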

15.
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games. Research partially supported by grants NSFC, DRFP and NCET. Research partially supported by CONACyT (Mexico) Grant 45693-F.

16.
17.
A second-order sufficient optimality criterion is presented for a multiobjective problem subject to a constraint given just as a set. To this end, we first refine known necessary conditions in such a way that the sufficient ones differ only by the replacement of inequalities by strict inequalities. Furthermore, we show that no relationship holds between this criterion and a sufficient multipliers rule when the constraint is described by inequalities and equalities. Finally, improvements of this criterion for the unconstrained case are presented, stressing the differences with single-objective optimization.

18.
Statistically motivated algorithms for the solution of stochastic programming problems typically suffer from their inability to recognize optimality of a given solution algorithmically. Thus, the quality of solutions provided by such methods is difficult to ascertain. In this paper, we develop methods for verification of optimality conditions within the framework of Stochastic Decomposition (SD) algorithms for two-stage linear programs with recourse. Consistent with the stochastic nature of an SD algorithm, we provide termination criteria that are based on statistical verification of traditional (deterministic) optimality conditions. We propose the use of bootstrap methods to confirm the satisfaction of generalized Kuhn-Tucker conditions and conditions based on Lagrange duality. These methods are illustrated in the context of a power generation planning model, and the results are encouraging. This work was supported in part by Grant No. AFOSR-88-0076 from the Air Force Office of Scientific Research and Grant No. DDM-89-10046 from the National Science Foundation.
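The general idea of a bootstrap optimality test can be sketched as follows: at a candidate solution, an optimality condition says some population mean (e.g. a subgradient) is zero, but only per-scenario samples are observed, so one bootstraps the sample mean and checks whether zero is plausible. This is a generic illustration, not the SD algorithm's actual termination test; the function name and interface are invented:

```python
import numpy as np

def zero_in_bootstrap_ci(samples, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of `samples` (n, d); is 0 inside it?"""
    rng = np.random.default_rng(seed)
    n, d = samples.shape
    means = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample with replacement
        means[b] = samples[idx].mean(axis=0)
    lo = np.quantile(means, alpha / 2, axis=0)
    hi = np.quantile(means, 1 - alpha / 2, axis=0)
    return bool(np.all((lo <= 0) & (0 <= hi))) # accept "optimal" if 0 is covered
```

A termination rule of this statistical kind accepts a candidate only at a chosen confidence level, rather than via an exact deterministic check.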

19.
We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available. This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044.
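The heart of the standard reduction this entry uses is the Bayes update of the belief state: after taking action a and observing y, the distribution over core states is pushed through the dynamics and reweighted by the observation likelihood. A minimal sketch on a made-up two-state machine-replacement-style model:

```python
import numpy as np

def belief_update(b, a, y, P, O):
    """Bayes update of belief b after action a and observation y.
    P[a][s, s'] : core-state transition matrix; O[s', y] : observation likelihood."""
    predicted = b @ P[a]              # push the belief through the dynamics
    unnorm = predicted * O[:, y]      # reweight by the observation likelihood
    return unnorm / unnorm.sum()      # normalize (assumes y has positive prob.)

# Two core states (0 = machine working, 1 = failed), one "keep" action,
# and a noisy inspection signal. All numbers are made up.
P = np.array([[[0.9, 0.1],
               [0.0, 1.0]]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
b1 = belief_update(np.array([1.0, 0.0]), a=0, y=0, P=P, O=O)
```

Iterating this map generates the trajectory of the equivalent completely observed problem on the (uncountable) space of beliefs.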

20.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known "vanishing discount factor approach". Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy among all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy among all (deterministic) stationary policies by constructing a "new" cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).
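The "vanishing discount factor approach" can be illustrated in discrete time: for a fixed policy with transition matrix P and cost vector c, the discounted value is V_a = (I − aP)⁻¹c, and (1 − a)·V_a(s) converges to the average cost g as the discount factor a → 1. The paper carries this limit out for continuous-time jump processes with unbounded rates; this made-up two-state chain only shows the mechanism:

```python
import numpy as np

# Fixed-policy toy chain (all numbers invented for illustration).
P = np.array([[0.5, 0.5], [0.3, 0.7]])
c = np.array([1.0, 3.0])

for a in (0.9, 0.99, 0.999, 0.9999):
    V = np.linalg.solve(np.eye(2) - a * P, c)  # discounted value V_a
    print(a, (1 - a) * V)                      # both entries approach the same g
```

As a → 1, the dependence on the starting state washes out, which is exactly what lets the average optimality inequality be extracted from the discounted problems.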


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号