20 similar documents found; search took 15 ms
1.
A. A. Yushkevich 《Mathematical Methods of Operations Research》1995,42(1):93-108
We consider a Markov decision process with a Borel state space, a countable action space, finite action sets, bounded rewards and a bounded transition density satisfying a simultaneous Doeblin condition. The existence of stationary strong 0-discount optimal policies is proved. Supported by NSF grant DMS-9404177.
2.
3.
Masami Kurano 《Annals of Operations Research》1991,29(1):375-385
Average cost Markov decision processes (MDPs) with compact state and action spaces and bounded lower semicontinuous cost functions are considered. Kurano [7] has treated the general case in which several ergodic classes and a transient set are permitted for the Markov process induced by any randomized stationary policy under the hypothesis of Doeblin and showed the existence of a minimum pair of state and policy. This paper considers the same case as that discussed in Kurano [7] and proves some new results which give the existence theorem of an optimal stationary policy under some reasonable conditions.
4.
5.
Rolando Cavazos-Cadena 《Annals of Operations Research》1991,28(1):3-27
This paper concerns countable state space Markov decision processes endowed with a (long-run expected) average reward criterion. For these models we summarize and, in some cases, extend some recent results on sufficient conditions to establish the existence of optimal stationary policies. The topics considered are the following: (i) the new assumptions introduced by Sennott in [20–23], (ii) necessary and sufficient conditions for the existence of a bounded solution to the optimality equation, and (iii) equivalence of average optimality criteria. Some problems are posed. This research was partially supported by the Third World Academy of Sciences (TWAS) under Grant No. TWAS RG MP 898-152.
6.
This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multiple constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of the so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program over the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
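The finite states-and-actions case mentioned at the end of this abstract can be sketched numerically. The snippet below is a made-up two-state, two-action discounted example (the kernel, costs, constraint level `kappa`, and initial distribution are all invented for illustration), and it uses the standard discounted occupation-measure LP rather than the paper's first-passage formulation; the stationary policy is then read off the optimal occupation measure.

```python
# Hypothetical constrained MDP solved as an LP over occupation measures.
# All numbers are illustrative, not from the paper.
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 2, 2, 0.9
P = np.zeros((nS, nA, nS))               # P[s, a, s'] transition kernel
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.3, 0.7]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.1, 0.9]
c = np.array([[1.0, 2.0], [0.5, 3.0]])   # cost c(s, a) to minimize
d = np.array([[0.0, 1.0], [1.0, 0.0]])   # constraint cost d(s, a)
beta = np.array([0.5, 0.5])              # initial distribution
kappa = 2.0                              # constraint level

# Variables x(s, a) >= 0 with the balance equations
#   sum_a x(s', a) - gamma * sum_{s, a} P[s, a, s'] x(s, a) = beta(s').
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (s == sp) - gamma * P[s, a, sp]

res = linprog(c.ravel(), A_ub=[d.ravel()], b_ub=[kappa],
              A_eq=A_eq, b_eq=beta, bounds=(0, None))
x = res.x.reshape(nS, nA)
policy = x / x.sum(axis=1, keepdims=True)  # stationary policy from x
print(policy)
```

Any stationary policy induces a feasible `x` here, so the LP is feasible; the total mass of the occupation measure is 1/(1 - gamma).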
7.
This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove here the existence of deterministic stationary policies which are Blackwell optimal in the class of all, in general randomized, stationary policies. We establish also a lexicographical policy improvement algorithm leading to Blackwell optimal policies and the relation between such policies and the Blackwell optimality equation. Our technique is a combination of the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards and of the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards.
8.
Quan-xin Zhu 《Acta Mathematicae Applicatae Sinica, English Series》2011,27(4):613-624
In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is the first attempt to study the ASPC criterion for continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of ε (ε ≥ 0)-ASPC optimal stationary policies based on two differe...
9.
Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313
In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known "vanishing discount factor approach". Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in the class of all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in the class of all (deterministic) stationary policies by constructing a "new" cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).
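The "vanishing discount factor" idea behind this abstract can be illustrated on a toy finite chain under one fixed policy (all numbers below are made up): as the discount factor alpha approaches 1, the normalized discounted value (1 - alpha) V_alpha converges to the long-run average cost g.

```python
# Vanishing-discount limit on a made-up 3-state chain under a fixed policy.
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])          # transition matrix of the fixed policy
c = np.array([1.0, 4.0, 2.0])            # per-stage costs

def discounted_value(alpha):
    # V_alpha solves (I - alpha P) V = c
    return np.linalg.solve(np.eye(3) - alpha * P, c)

# Average cost g = pi @ c, with pi the stationary distribution (pi P = pi).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
g = pi @ c

for alpha in (0.9, 0.99, 0.999):
    print(alpha, (1 - alpha) * discounted_value(alpha), g)
```

The printed vectors approach the constant vector g as alpha increases, which is exactly the limit exploited when passing from discounted to average optimality.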
10.
Quan-xin Zhu 《Applied Mathematics - A Journal of Chinese Universities, Series B》2010,25(4):400-410
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
11.
Quanxin Zhu 《Stochastic Analysis and Applications》2013,31(5):953-974
In this paper we study discrete-time Markov decision processes with average expected costs (AEC) and discount-sensitive criteria in Borel state and action spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions on the system's primitive data, under which we prove (1) AEC optimality and strong −1-discount optimality are equivalent; (2) an equivalent condition for the existence of strong 0-discount optimal stationary policies; and (3) the existence of strong n (n = −1, 0)-discount optimal stationary policies. Our conditions are weaker than those in the previous literature. In particular, the "stochastic monotonicity condition" in this paper is used for the first time to study strong n (n = −1, 0)-discount optimality. Moreover, we provide a new approach to prove the existence of strong 0-discount optimal stationary policies; it is slightly different from those in the previous literature. Finally, we apply our results to an inventory system and a controlled queueing system.
12.
1. Introduction. In reliability theory, in order to calculate the failure frequency of a repairable system, Shi first introduced and studied the transition frequency between two disjoint state sets for a finite Markov chain and a vector Markov process with a finite discrete state space, and obtained a general formula for the transition frequency. Then, on the condition that the generator matrix of the Markov chain is uniformly bounded, Shi [8, 9] again proved the transition frequency formula and obtained three other useful formulas. Obviously, the point (or counting) process of sta...
13.
《Optimization》2012,61(5):767-781
This paper considers Markov decision processes with a countable state space, compact action spaces and a bounded reward function. Under some recurrence and connectedness conditions, including the simultaneous Doeblin condition, we prove the existence of bounded solutions of the optimality equations which arise for the multichain case in connection with the average reward criterion and sensitive optimality criteria, and we give a characterization of the sets of n-average optimal decision rules.
14.
A. A. Yushkevich 《Mathematical Methods of Operations Research》1996,44(2):223-231
We consider a Markov decision process with a Borel state space, bounded rewards, and a bounded transition density satisfying a simultaneous Doeblin-Doob condition. An asymptotic result for the discounted value function, related to the existence of stationary strong 0-discount optimal policies, is extended from the case of finite action sets to the case of compact action sets with rewards and transition densities that are continuous in the action. Supported by NSF grant DMS-9404177.
15.
Bias Optimality versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs
This paper deals with expected average cost (EAC) and discount-sensitive criteria for discrete-time Markov control processes on Borel spaces, with possibly unbounded costs. Conditions are given under which (a) EAC optimality and strong −1-discount optimality are equivalent; (b) strong 0-discount optimality implies bias optimality; and, conversely, under an additional hypothesis, (c) bias optimality implies strong 0-discount optimality. Thus, in particular, as the class of bias optimal policies is nonempty, (c) gives the existence of a strong 0-discount optimal policy, whereas from (b) and (c) we get conditions for bias optimality and strong 0-discount optimality to be equivalent. A detailed example illustrates our results.
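A minimal numerical companion to the gain/bias distinction in this abstract, for a fixed policy on a made-up two-state chain: the gain g and bias h solve the Poisson equation g + h = c + P h with the normalization pi @ h = 0, and bias optimality refines average optimality by comparing the h vectors of gain-optimal policies.

```python
# Gain and bias of a fixed policy on an invented 2-state chain.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.6, 0.4]])               # transition matrix of the fixed policy
c = np.array([0.0, 5.0])                 # per-stage costs

# Stationary distribution pi (pi P = pi) and gain g = pi @ c.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
g = pi @ c

# Solve (I - P + 1 pi) h = c - g 1; this yields the bias with pi @ h = 0.
h = np.linalg.solve(np.eye(2) - P + np.outer(np.ones(2), pi), c - g)
print(g, h)
```

Multiplying the linear system by pi shows pi @ h = 0, so this h is exactly the normalized solution of the Poisson equation used to define the bias.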
16.
17.
18.
Rolando Cavazos-Cadena 《Applied Mathematics and Optimization》1992,26(2):171-194
We consider discrete-time average reward Markov decision processes with denumerable state space and bounded reward function. Under structural restrictions on the model the existence of an optimal stationary policy is proved; both the lim inf and lim sup average criteria are considered. In contrast to the usual approach, our results do not rely on the average reward optimality equation. Rather, the arguments are based on well-known facts from renewal theory. This research was supported in part by the Consejo Nacional de Ciencia y Tecnologia (CONACYT) under Grants PCEXCNA 040640 and 050156, and by SEMAC under Grant 89-1/00ifn$.
19.
This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used for the first time to study the ASPC criterion. Also, the approach provided here is slightly different from the "optimality equation approach" widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.