Similar Documents
20 similar documents found.
1.
This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove here the existence of deterministic stationary policies which are Blackwell optimal in the class of all (in general, randomized) stationary policies. We also establish a lexicographical policy improvement algorithm leading to Blackwell optimal policies, as well as the relation between such policies and the Blackwell optimality equation. Our technique is a combination of the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards and of the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards.
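For orientation (this is the standard textbook definition, not a quotation from the paper): a deterministic stationary policy f* is Blackwell optimal if it is discount optimal simultaneously for all discount factors sufficiently close to 1, i.e.

    \[
      \exists\,\alpha_0 \in (0,1):\qquad
      V_\alpha(f^*, x) \;\ge\; V_\alpha(\pi, x)
      \quad \text{for all } \alpha \in (\alpha_0, 1),\ \text{all policies } \pi,\ \text{all states } x,
    \]

where \(V_\alpha\) denotes the expected total \(\alpha\)-discounted reward.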

2.
This paper deals with Blackwell optimality for continuous-time controlled Markov chains with compact Borel action space, and possibly unbounded reward (or cost) rates and unbounded transition rates. We prove the existence of a deterministic stationary policy which is Blackwell optimal in the class of all admissible (nonstationary) Markov policies, thus extending previous results that analyzed Blackwell optimality in the class of stationary policies. We compare our assumptions to the corresponding ones for discrete-time Markov controlled processes.

3.
4.
After an introduction to sensitive criteria in Markov decision processes and a discussion of definitions, we prove the existence of stationary Blackwell optimal policies under the following main assumptions: (i) the state space is a Borel one; (ii) the action space is countable and the action sets are finite; (iii) the transition function is given by a transition density; (iv) a simultaneous Doeblin-type recurrence condition holds. The proof is based on an aggregation of randomized stationary policies into measures. The topology on the space of these measures is simultaneously a weak and a strong one, and this fact yields compactness of the space and continuity of the Laurent coefficients of the expected discounted reward. Another important tool is a lexicographical policy improvement. The exposition is mostly self-contained. Supported by the National Science Foundation.
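The Laurent coefficients referred to here come from the standard expansion of the expected discounted reward near discount factor 1 (our notation, not the paper's): for a stationary policy f, under suitable recurrence conditions,

    \[
      V_\beta(f, x) \;=\; \frac{g(f, x)}{1-\beta} \;+\; h(f, x) \;+\; o(1)
      \qquad \text{as } \beta \uparrow 1,
    \]

where g is the long-run average reward (gain) and h the bias; sensitive criteria such as Blackwell optimality compare policies through successively higher-order terms of this expansion.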

5.
This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.
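As a point of reference (one common formalization, not quoted from the paper): the AC criterion evaluates a policy π by

    \[
      J(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n}\,
      E_x^{\pi}\!\left[\sum_{t=0}^{n-1} c(x_t, a_t)\right],
    \]

while SAC optimality additionally asks that the n-horizon expected cost of the policy stay within o(n) of the best achievable n-horizon expected cost, so that good long-run behavior is not bought at the price of arbitrarily poor finite-horizon performance.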

6.
In this paper, applying an interval arithmetic analysis, we consider the average case of controlled Markov set-chains, whose process allows for fluctuating transition matrices at each step in time. We introduce a v-step contractive property for the average case, under which a Pareto optimal periodic policy is characterized as a maximal solution of the optimality equation. Also, in the class of stationary policies, the behavior of the expected reward over a T-horizon as T approaches ∞ is investigated, and the left- and right-hand side optimality equations are given, by which a Pareto optimal stationary policy is found. As a numerical example, the Taxicab problem is considered.

7.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program.
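Schematically (our paraphrase of the standard occupation-measure construction, not a quotation from the paper), the equivalent linear program optimizes over measures f(x, a) representing expected state–action visit counts:

    \[
      \min_{f \ge 0} \ \sum_{x,a} f(x,a)\, c(x,a)
      \quad \text{s.t.} \quad
      \sum_{a} f(y,a) \;=\; \beta(y) \;+\; \sum_{x,a} f(x,a)\, p(y \mid x,a)\ \ \forall y,
      \qquad
      \sum_{x,a} f(x,a)\, d_k(x,a) \;\le\; V_k,
    \]

where β is the initial distribution, c is the total cost being minimized, and the pairs (d_k, V_k) encode the constraints; with a countable state space this becomes an infinite linear program.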

8.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.
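In one standard form (written here for a countable-state model and in our notation, not the paper's), the average reward optimality equation for a continuous-time MDP with transition rates q reads

    \[
      \sup_{a \in A(x)} \Big[\, r(x,a) \;-\; g \;+\; \sum_{y} q(y \mid x, a)\, h(y) \,\Big] \;=\; 0
      \qquad \text{for all states } x,
    \]

where g is the optimal average reward and h a bias function; policy iteration alternates between solving this relation for a fixed stationary policy and improving the policy action by action.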

9.
We generalize the geometric discount of finite discounted cost Markov Decision Processes to "exponentially representable" discount functions, prove existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal "N-stationary" policies in general do not exist.

10.
《Optimization》2012,61(1):191-202
This paper presents a recurrent condition on Markov decision processes with a countable state space and bounded rewards. The condition is sufficient for the existence of a Blackwell optimal stationary policy, having the Laurent series expansion with continuous coefficients. It is so relaxed that the Markov chain corresponding to a stationary policy may have countably many periodic recurrent classes. Our method finds the deviation matrix in an explicit form.
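For reference (standard definition, not taken from the paper): for a Markov chain with transition matrix P and stationary (Cesàro-limit) matrix P*, the deviation matrix is

    \[
      D \;=\; (I - P + P^{*})^{-1} (I - P^{*})
      \;=\; \sum_{t=0}^{\infty} \big(P^{t} - P^{*}\big),
    \]

with the series understood in the Cesàro sense for periodic chains; D supplies the bias term (h = D r for reward vector r) in the Laurent expansion of the expected discounted reward.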

11.
In this paper, we consider positive stochastic games, when the state and action spaces are all infinite. We prove that, under certain conditions, the positive stochastic game has a value, that the maximizing player has an ε-optimal stationary strategy, and that the minimizing player has an optimal stationary strategy. The authors are grateful to Professor David Blackwell and the referee for some useful comments.

12.
In this paper, we discuss Markovian decision programming with recursive vector-reward and give an algorithm to find optimal policies. We prove that: (1) there is a Markovian optimal policy for the nonstationary case; (2) there is a stationary optimal policy for the stationary case.

13.
We consider a single-item, continuous-review, (s, S) inventory system, under complete backlogging and a constant procurement lead time. Demands occur in independent, identically distributed batches, separated by independent, identically distributed intervals. The model also includes a class of periodic review systems as a special case. Of interest are the optimal control policies with respect to a stationary cost rate function, constructed to include ordering, holding and shortage costs. We study the structural properties of the cost rate function and report some new bounds and optimality conditions. An application of these to the computation and approximation of optimal policies is also discussed.
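For readers unfamiliar with the notation, an (s, S) policy is the simple reorder rule (standard definition, not specific to this paper): whenever the inventory position x drops to or below the reorder point s, order up to the order-up-to level S; otherwise order nothing,

    \[
      Q(x) \;=\;
      \begin{cases}
        S - x, & x \le s,\\
        0, & x > s.
      \end{cases}
    \]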

14.
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of Asymptotically Stationary (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithm of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion. This work was supported in part through United States-Israel Binational Science Foundation Grant BSF 85-00306.
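Schematically (a generic statement of such a constrained problem for an average-cost-type criterion, in our notation rather than the paper's), one seeks

    \[
      \min_{\pi} \ \limsup_{n\to\infty} \frac{1}{n}\, E_x^{\pi}\!\left[\sum_{t=0}^{n-1} c_0(x_t,a_t)\right]
      \quad \text{subject to} \quad
      \limsup_{n\to\infty} \frac{1}{n}\, E_x^{\pi}\!\left[\sum_{t=0}^{n-1} c_k(x_t,a_t)\right] \le V_k,
      \quad k = 1,\dots,K,
    \]

with the added difficulty here that the transition probabilities entering these expectations are unknown and must be estimated while the chain is being controlled.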

15.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and characterizing the value function through the Bellman equation.
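In its basic Markov form (stated here for orientation; the paper's contribution is a version of it for the non-Markov processes induced by history-dependent policies), the Dynkin formula says that for suitable functions f in the domain of the extended generator A,

    \[
      E_x\big[f(X_t)\big] \;-\; f(x) \;=\; E_x\!\left[\int_0^t (A f)(X_s)\, ds\right],
    \]

which is the continuous-time counterpart of the one-step smoothing identities used in discrete-time dynamic programming.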

16.
This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time, as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n → ∞. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these criteria, even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
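In symbols (our transcription of the verbal definitions above, with r_k and τ_k the reward and sojourn time of the k-th decision epoch), the two criteria are

    \[
      \liminf_{T \to \infty} \frac{1}{T}\,
      E_x^{\pi}\big[\text{total reward earned in } [0, T]\big]
      \qquad \text{and} \qquad
      \liminf_{n \to \infty}
      \frac{E_x^{\pi}\!\left[\sum_{k=0}^{n-1} r_k\right]}
           {E_x^{\pi}\!\left[\sum_{k=0}^{n-1} \tau_k\right]},
    \]

and the abstract's point is that these two criteria can require different optimal mixed stationary policies even in the unichain case.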

17.
In this paper we study discrete-time Markov decision processes with average expected costs (AEC) and discount-sensitive criteria in Borel state and action spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions on the system's primitive data, under which we prove (1) that AEC optimality and strong −1-discount optimality are equivalent; (2) a condition equivalent to the strong 0-discount optimality of stationary policies; and (3) the existence of strong n (n = −1, 0)-discount optimal stationary policies. Our conditions are weaker than those in the previous literature. In particular, the "stochastic monotonicity condition" in this paper is used here for the first time to study strong n (n = −1, 0)-discount optimality. Moreover, we provide a new approach to prove the existence of strong 0-discount optimal stationary policies. It should be noted that our approach is slightly different from those in the previous literature. Finally, we apply our results to an inventory system and a controlled queueing system.

18.
《Stochastic Analysis and Applications》2013,31(3):589-625
We consider a periodic-review stochastic inventory problem in which demands for a single product in each of a finite number of periods are independent and identically distributed random variables. We analyze the case where shortages (stockouts) are penalized via fixed and proportional costs simultaneously. For this problem, due to the finiteness of the planning horizon and non-linearity of the shortage costs, computing the optimal inventory policy requires a substantial effort as noted in the previous literature. Hence, our paper is aimed at reducing this computational burden. As a resolution, we propose to compute "the best stationary policy." To this end, we restrict our attention to the class of stationary base-stock policies, and show that the multi-period, stochastic, dynamic problem at hand can be reduced to a deterministic, static equivalent. Using this important result, we introduce a model for computing an optimal stationary base-stock policy for the finite horizon problem under consideration. Fundamental analytic conclusions, some numerical examples, and related research findings are also discussed.
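As a rough numerical illustration of the idea of searching over stationary base-stock levels (a toy Python sketch under simplified assumptions — i.i.d. period demand, linear holding cost, fixed plus proportional shortage cost — not the paper's actual model or data):

    import numpy as np

    # Toy sketch: choose the stationary base-stock level S that minimizes the
    # expected per-period cost when period demand is i.i.d. and shortages incur
    # both a fixed and a proportional penalty.  All numbers are illustrative.
    rng = np.random.default_rng(0)
    demand = rng.poisson(lam=20, size=100_000)   # assumed demand sample

    h, p, K = 1.0, 9.0, 25.0                     # holding, unit shortage, fixed shortage cost

    def expected_cost(S):
        leftover = np.maximum(S - demand, 0)     # on-hand inventory after demand
        shortage = np.maximum(demand - S, 0)     # backlogged demand
        return np.mean(h * leftover + p * shortage + K * (shortage > 0))

    best_S = min(range(0, 61), key=expected_cost)
    print(best_S, expected_cost(best_S))

The paper's contribution is analytical rather than computational: it shows that the search for the best stationary base-stock policy in the finite-horizon problem reduces to a deterministic, static equivalent, so a brute-force search of this sort is not actually needed.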

19.
This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used here for the first time to study the ASPC criterion. Also, the approach provided here is slightly different from the "optimality equation approach" widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.
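For orientation (one common way to write the sample-path criterion, in our notation rather than the paper's): the average sample-path cost of a policy π is the pathwise random quantity

    \[
      J_{\mathrm{sp}}(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n}
      \sum_{t=0}^{n-1} c(x_t, a_t),
    \]

and ε-ASPC-optimality asks that this limit superior be within ε of the optimal value almost surely, rather than merely in expectation.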

20.
We study the ergodic control problem for a class of controlled jump diffusions driven by a compound Poisson process. This extends the results of Arapostathis et al. (2019) to running costs that are not near-monotone. This generality is needed in applications such as optimal scheduling of large-scale parallel server networks. We provide a full characterization of optimality via the Hamilton–Jacobi–Bellman (HJB) equation, for which we additionally exhibit regularity of solutions under mild hypotheses. In addition, we show that optimal stationary Markov controls are a.s. pathwise optimal. Lastly, we show that one can fix a stable control outside a compact set and obtain near-optimal solutions by solving the HJB on a sufficiently large bounded domain. This is useful for constructing asymptotically optimal scheduling policies for multiclass parallel server networks.
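Schematically (the generic form of an ergodic HJB equation, written in our notation rather than the paper's), the characterization of optimality is of the type

    \[
      \min_{u \in \mathbb{U}} \big[\, \mathcal{L}^{u} V(x) \;+\; c(x, u) \,\big] \;=\; \varrho,
    \]

where \(\mathcal{L}^{u}\) is the controlled generator (here including the jump part driven by the compound Poisson process), c is the running cost, ϱ is the optimal ergodic value, and V plays the role of a relative value (potential) function.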
