Similar Documents
Found 20 similar documents (search time: 15 ms).
1.
2.
《Optimization》2012,61(5):767-781
This paper considers Markov decision processes with a countable state space, compact action spaces, and a bounded reward function. Under some recurrence and connectedness conditions, including the simultaneous Döblin condition, we prove the existence of bounded solutions of the optimality equations which arise in the multichain case in connection with the average reward criterion and sensitive optimality criteria, and we give a characterization of the sets of n-average optimal decision rules.
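The average-reward optimality equation this entry refers to can be illustrated on a toy finite MDP via relative value iteration; the paper's countable-state, multichain setting is far more general, and all numbers below are made up for illustration:

```python
import numpy as np

# Relative value iteration for the average-reward optimality equation
#     g + h(s) = max_a [ r(s,a) + sum_s' P(s'|s,a) h(s') ]
# on a hypothetical 2-state, 2-action MDP.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s][a] = transition distribution
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.5],                  # r[s][a] = one-step reward
              [0.0, 2.0]])

h = np.zeros(2)
for _ in range(10_000):
    Q = r + np.einsum('saj,j->sa', P, h)   # Q(s,a) = reward + expected bias
    h_new = Q.max(axis=1)
    g = h_new[0]                           # gain estimate (reference state 0)
    h_new = h_new - g                      # normalize so h(0) = 0
    if np.abs(h_new - h).max() < 1e-12:
        break
    h = h_new

policy = Q.argmax(axis=1)                  # an average-optimal decision rule
print(g, h, policy)
```

At convergence, g and h satisfy the optimality equation up to the stopping tolerance, and `policy` is a maximizing decision rule.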

3.
4.
This paper deals with a new optimality criterion combining the usual three average criteria and the canonical triplet (together called the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, extending the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary controlled Markov processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given.
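The rolling horizon idea in this entry can be sketched on a toy finite MDP: at each step, solve an N-stage problem by backward induction and apply only its first action. The paper's setting (denumerable states, non-uniformly bounded rewards, additional terminal functions) is far more general; the data below is invented:

```python
import numpy as np

def rolling_horizon_action(s, P, r, N):
    """Solve an N-stage problem by backward induction; return its first action."""
    V = np.zeros(r.shape[0])                  # zero terminal reward
    for _ in range(N):
        Q = r + np.einsum('saj,j->sa', P, V)  # one-stage reward + expected value
        V = Q.max(axis=1)
    return int(Q[s].argmax())                 # act on the current stage only

# Made-up 2-state, 2-action data just to exercise the function.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.5],
              [0.0, 2.0]])
a = rolling_horizon_action(1, P, r, N=10)
```

Repeating this at every decision epoch yields the rolling horizon policy whose average-reward error the paper bounds.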

5.
The present paper gives a systematic presentation of different definitions of optimality in the infinite-time optimal control problem. Some of these definitions are new, while others have been used throughout the literature, sometimes under different names. The logical implications between them are clearly stated, corresponding comparison criteria for solutions are defined, and other relations, as well as two types of equivalence relations, are established. This work was supported in part by Simmons College Fund for Research, Grant No. 201.

6.
《Optimization》2012,61(12):1405-1426
This article concerns controlled Markov-modulated diffusions (also known as piecewise diffusions, switching diffusions, or diffusions with Markovian switching). Our main objective is to give conditions for the existence and characterization of overtaking optimal policies. To this end, we first use the fact that the average-reward Hamilton–Jacobi–Bellman equation guarantees that the family of so-called canonical control policies is nonempty. Then, within this family, we search for policies with some special feature, for instance canonical policies that in addition maximize the bias, which turn out to be overtaking optimal.

7.
A study is made of the conditions sufficient for the existence of overtaking trajectories for a class of infinite time-horizon, time-variant optimal control systems. Nonautonomy is restricted to disturbances with limits at infinity. The convergence property of the overtaking trajectories to the optimal steady-state limit is shown.

8.
9.
We consider a Markov decision process with a Borel state space, bounded rewards, and a bounded transition density satisfying a simultaneous Doeblin-Doob condition. An asymptotics for the discounted value function related to the existence of stationary strong 0-discount optimal policies is extended from the case of finite action sets to the case of compact action sets with rewards and transition densities continuous in the action. Supported by NSF grant DMS-9404177.

10.
In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion for continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of ε (ε ≥ 0)-ASPC optimal stationary policies based on two different …

11.
12.
Sufficient optimality criteria of the Kuhn-Tucker and Fritz John type in nonlinear programming are established in the presence of equality-inequality constraints. The constraint functions are assumed to be quasiconvex, and the objective function is taken to be pseudoconvex (or convex).
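The flavor of such a sufficient criterion can be checked numerically on a toy problem (a hypothetical example, not one from the paper): minimize f(x) = (x−2)² subject to g(x) = x − 1 ≤ 0, with candidate point x* = 1 and multiplier u = 2. Here f is convex (hence pseudoconvex) and g is convex (hence quasiconvex), so the Kuhn-Tucker conditions verified below are sufficient for global optimality:

```python
# Toy Kuhn-Tucker check for: min (x - 2)**2  s.t.  x - 1 <= 0.
x_star, u = 1.0, 2.0

grad_f = 2 * (x_star - 2)            # f'(x*) = -2
grad_g = 1.0                         # g'(x*) = 1
stationarity = grad_f + u * grad_g   # must vanish at a KKT point
feasible = (x_star - 1) <= 0         # primal feasibility
comp_slack = u * (x_star - 1)        # complementary slackness, must be 0

print(stationarity, feasible, comp_slack, u >= 0)
```

All four conditions hold, so under the stated convexity assumptions x* = 1 is a global minimizer.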

13.
In this paper, we investigate the relationship between two classes of optimality which have arisen in the study of dynamic optimization problems defined on an infinite-time domain. We utilize an optimal control framework to discuss our results. In particular, we establish relationships between limiting objective functional type optimality concepts, commonly known as overtaking optimality and weakly overtaking optimality, and the finite-horizon solution concepts of decision-horizon optimality and agreeable plans. Our results show that both classes of optimality are implied by corresponding uniform limiting objective functional type optimality concepts, referred to here as uniformly overtaking optimality and uniformly weakly overtaking optimality. This observation permits us to extract sufficient conditions for optimality from known sufficient conditions for overtaking and weakly overtaking optimality by strengthening their hypotheses. These results take the form of a strengthened maximum principle. Examples are given to show that the hypotheses of these results can be realized. This research was supported by the National Science Foundation, Grant No. DMS-87-00706, and by the Southern Illinois University at Carbondale, Summer Research Fellowship Program.

14.
This paper deals with the bias optimality of multichain models for finite continuous-time Markov decision processes. Based on new performance difference formulas developed here, we prove the convergence of a so-called bias-optimal policy iteration algorithm, which can be used to obtain bias-optimal policies in a finite number of iterations.
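The policy evaluation step underlying any such policy iteration can be sketched in discrete time: for a fixed unichain policy, the gain g and bias h solve the linear system g·1 + h = r_π + P_π h with the normalization h(0) = 0. This is a generic illustration, not the paper's algorithm; the paper's continuous-time setting replaces the transition matrix by transition rates, and its models are multichain:

```python
import numpy as np

def evaluate_policy(P_pi, r_pi):
    """Gain and bias of a fixed unichain policy from g*1 + (I - P_pi) h = r_pi, h(0)=0."""
    n = len(r_pi)
    A = np.zeros((n + 1, n + 1))      # unknowns: (g, h(0), ..., h(n-1))
    b = np.zeros(n + 1)
    A[:n, 0] = 1.0                    # coefficient of g
    A[:n, 1:] = np.eye(n) - P_pi      # (I - P_pi) h
    b[:n] = r_pi
    A[n, 1] = 1.0                     # normalization h(0) = 0
    x = np.linalg.solve(A, b)
    return x[0], x[1:]                # gain, bias

# Made-up 2-state chain induced by some policy.
P_pi = np.array([[0.5, 0.5], [0.3, 0.7]])
r_pi = np.array([1.0, 3.0])
g, h = evaluate_policy(P_pi, r_pi)
```

A bias-optimal policy iteration would alternate such evaluations with improvement steps that break gain ties in favor of larger bias.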

15.
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games. Research partially supported by grants NSFC, DRFP and NCET. Research partially supported by CONACyT (Mexico) Grant 45693-F.

16.
17.
A second-order sufficient optimality criterion is presented for a multiobjective problem subject to a constraint given just as a set. To this end, we first refine known necessary conditions in such a way that the sufficient ones differ only by the replacement of inequalities by strict inequalities. Furthermore, we show that no relationship holds between this criterion and a sufficient multipliers rule when the constraint is described by inequalities and equalities. Finally, improvements of this criterion for the unconstrained case are presented, stressing the differences with single-objective optimization.

18.
Statistically motivated algorithms for the solution of stochastic programming problems typically suffer from their inability to recognize optimality of a given solution algorithmically. Thus, the quality of solutions provided by such methods is difficult to ascertain. In this paper, we develop methods for verification of optimality conditions within the framework of Stochastic Decomposition (SD) algorithms for two-stage linear programs with recourse. Consistent with the stochastic nature of an SD algorithm, we provide termination criteria that are based on statistical verification of traditional (deterministic) optimality conditions. We propose the use of bootstrap methods to confirm the satisfaction of generalized Kuhn-Tucker conditions and conditions based on Lagrange duality. These methods are illustrated in the context of a power generation planning model, and the results are encouraging. This work was supported in part by Grant No. AFOSR-88-0076 from the Air Force Office of Scientific Research and Grant No. DDM-89-10046 from the National Science Foundation.
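The general idea of a bootstrap optimality test can be sketched as follows: at a candidate solution, an optimality condition says some population mean (e.g. a subgradient) is zero, but only per-scenario samples are observed, so one bootstraps the sample mean and checks whether zero is plausible. This is a generic illustration, not the SD algorithm's actual termination test; the function name and interface are invented:

```python
import numpy as np

def zero_in_bootstrap_ci(samples, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of `samples` (n, d); is 0 inside it?"""
    rng = np.random.default_rng(seed)
    n, d = samples.shape
    means = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample with replacement
        means[b] = samples[idx].mean(axis=0)
    lo = np.quantile(means, alpha / 2, axis=0)
    hi = np.quantile(means, 1 - alpha / 2, axis=0)
    return bool(np.all((lo <= 0) & (0 <= hi))) # accept "optimal" if 0 is covered
```

A termination rule of this statistical kind accepts a candidate only at a chosen confidence level, rather than via an exact deterministic check.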

19.
We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available. This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044.
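The heart of the standard reduction this entry uses is the Bayes update of the belief state: after taking action a and observing y, the distribution over core states is pushed through the dynamics and reweighted by the observation likelihood. A minimal sketch on a made-up two-state machine-replacement-style model:

```python
import numpy as np

def belief_update(b, a, y, P, O):
    """Bayes update of belief b after action a and observation y.
    P[a][s, s'] : core-state transition matrix; O[s', y] : observation likelihood."""
    predicted = b @ P[a]              # push the belief through the dynamics
    unnorm = predicted * O[:, y]      # reweight by the observation likelihood
    return unnorm / unnorm.sum()      # normalize (assumes y has positive prob.)

# Two core states (0 = machine working, 1 = failed), one "keep" action,
# and a noisy inspection signal. All numbers are made up.
P = np.array([[[0.9, 0.1],
               [0.0, 1.0]]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
b1 = belief_update(np.array([1.0, 0.0]), a=0, y=0, P=P, O=O)
```

Iterating this map generates the trajectory of the equivalent completely observed problem on the (uncountable) space of beliefs.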

20.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known "vanishing discount factor approach". Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy among all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy among all (deterministic) stationary policies by constructing a "new" cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).
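The "vanishing discount factor approach" can be illustrated in discrete time: for a fixed policy with transition matrix P and cost vector c, the discounted value is V_a = (I − aP)⁻¹c, and (1 − a)·V_a(s) converges to the average cost g as the discount factor a → 1. The paper carries this limit out for continuous-time jump processes with unbounded rates; this made-up two-state chain only shows the mechanism:

```python
import numpy as np

# Fixed-policy toy chain (all numbers invented for illustration).
P = np.array([[0.5, 0.5], [0.3, 0.7]])
c = np.array([1.0, 3.0])

for a in (0.9, 0.99, 0.999, 0.9999):
    V = np.linalg.solve(np.eye(2) - a * P, c)  # discounted value V_a
    print(a, (1 - a) * V)                      # both entries approach the same g
```

As a → 1, the dependence on the starting state washes out, which is exactly what lets the average optimality inequality be extracted from the discounted problems.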


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号