Similar Documents
20 similar documents found.
1.
This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove here the existence of deterministic stationary policies which are Blackwell optimal in the class of all (in general, randomized) stationary policies. We also establish a lexicographical policy improvement algorithm leading to Blackwell optimal policies, as well as the relation between such policies and the Blackwell optimality equation. Our technique is a combination of the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards and of the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards.
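For orientation (this is the standard textbook definition, not a quotation from the paper): a deterministic stationary policy f* is Blackwell optimal if it is discount optimal simultaneously for all discount factors sufficiently close to 1, i.e.

    \[
      \exists\,\alpha_0 \in (0,1):\qquad
      V_\alpha(f^*, x) \;\ge\; V_\alpha(\pi, x)
      \quad \text{for all } \alpha \in (\alpha_0, 1),\ \text{all policies } \pi,\ \text{all states } x,
    \]

where \(V_\alpha\) denotes the expected total \(\alpha\)-discounted reward.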

2.
This paper deals with Blackwell optimality for continuous-time controlled Markov chains with compact Borel action space, and possibly unbounded reward (or cost) rates and unbounded transition rates. We prove the existence of a deterministic stationary policy which is Blackwell optimal in the class of all admissible (nonstationary) Markov policies, thus extending previous results that analyzed Blackwell optimality in the class of stationary policies. We compare our assumptions to the corresponding ones for discrete-time Markov controlled processes.

3.
4.
After an introduction to sensitive criteria in Markov decision processes and a discussion of definitions, we prove the existence of stationary Blackwell optimal policies under the following main assumptions: (i) the state space is a Borel one; (ii) the action space is countable and the action sets are finite; (iii) the transition function is given by a transition density; (iv) a simultaneous Doeblin-type recurrence condition holds. The proof is based on an aggregation of randomized stationary policies into measures. The topology on the space of these measures is simultaneously a weak and a strong one, and this fact yields compactness of the space and continuity of the Laurent coefficients of the expected discounted reward. Another important tool is a lexicographical policy improvement. The exposition is mostly self-contained. Supported by the National Science Foundation.
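The Laurent coefficients referred to here come from the standard expansion of the expected discounted reward near discount factor 1 (our notation, not the paper's): for a stationary policy f, under suitable recurrence conditions,

    \[
      V_\beta(f, x) \;=\; \frac{g(f, x)}{1-\beta} \;+\; h(f, x) \;+\; o(1)
      \qquad \text{as } \beta \uparrow 1,
    \]

where g is the long-run average reward (gain) and h the bias; sensitive criteria such as Blackwell optimality compare policies through successively higher-order terms of this expansion.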

5.
This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.
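As a point of reference (one common formalization, not quoted from the paper): the AC criterion evaluates a policy π by

    \[
      J(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n}\,
      E_x^{\pi}\!\left[\sum_{t=0}^{n-1} c(x_t, a_t)\right],
    \]

while SAC optimality additionally asks that the n-horizon expected cost of the policy stay within o(n) of the best achievable n-horizon expected cost, so that good long-run behavior is not bought at the price of arbitrarily poor finite-horizon performance.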

6.
In this paper, applying an interval arithmetic analysis, we consider the average case of controlled Markov set-chains, whose process allows for fluctuating transition matrices at each step in time. We introduce a v-step contractive property for the average case, under which a Pareto optimal periodic policy is characterized as a maximal solution of the optimality equation. Also, in the class of stationary policies, the behavior of the expected reward over a T-horizon as T approaches ∞ is investigated, and the left- and right-hand side optimality equations are given, by which a Pareto optimal stationary policy is found. As a numerical example, the Taxicab problem is considered.

7.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program.
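Schematically (our paraphrase of the standard occupation-measure construction, not a quotation from the paper), the equivalent linear program optimizes over measures f(x, a) representing expected state–action visit counts:

    \[
      \min_{f \ge 0} \ \sum_{x,a} f(x,a)\, c(x,a)
      \quad \text{s.t.} \quad
      \sum_{a} f(y,a) \;=\; \beta(y) \;+\; \sum_{x,a} f(x,a)\, p(y \mid x,a)\ \ \forall y,
      \qquad
      \sum_{x,a} f(x,a)\, d_k(x,a) \;\le\; V_k,
    \]

where β is the initial distribution, c is the total cost being minimized, and the pairs (d_k, V_k) encode the constraints; with a countable state space this becomes an infinite linear program.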

8.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.
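In one standard form (written here for a countable-state model and in our notation, not the paper's), the average reward optimality equation for a continuous-time MDP with transition rates q reads

    \[
      \sup_{a \in A(x)} \Big[\, r(x,a) \;-\; g \;+\; \sum_{y} q(y \mid x, a)\, h(y) \,\Big] \;=\; 0
      \qquad \text{for all states } x,
    \]

where g is the optimal average reward and h a bias function; policy iteration alternates between solving this relation for a fixed stationary policy and improving the policy action by action.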

9.
We generalize the geometric discount of finite discounted cost Markov Decision Processes to "exponentially representable" discount functions, prove existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal "N-stationary" policies in general do not exist.

10.
《Optimization》2012,61(1):191-202
This paper presents a recurrent condition on Markov decision processes with a countable state space and bounded rewards. The condition is sufficient for the existence of a Blackwell optimal stationary policy, having the Laurent series expansion with continuous coefficients. It is so relaxed that the Markov chain corresponding to a stationary policy may have countably many periodic recurrent classes. Our method finds the deviation matrix in an explicit form.
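For reference (standard definition, not taken from the paper): for a Markov chain with transition matrix P and stationary (Cesàro-limit) matrix P*, the deviation matrix is

    \[
      D \;=\; (I - P + P^{*})^{-1} (I - P^{*})
      \;=\; \sum_{t=0}^{\infty} \big(P^{t} - P^{*}\big),
    \]

with the series understood in the Cesàro sense for periodic chains; D supplies the bias term (h = D r for reward vector r) in the Laurent expansion of the expected discounted reward.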

11.
In this paper, we consider positive stochastic games, when the state and action spaces are all infinite. We prove that, under certain conditions, the positive stochastic game has a value, that the maximizing player has an ε-optimal stationary strategy, and that the minimizing player has an optimal stationary strategy. The authors are grateful to Professor David Blackwell and the referee for some useful comments.

12.
In this paper, we discuss Markovian decision programming with recursive vector-reward and give an algorithm to find optimal policies. We prove that: (1) there is a Markovian optimal policy for the nonstationary case; (2) there is a stationary optimal policy for the stationary case.

13.
We consider a single-item, continuous-review, (s, S) inventory system, under complete backlogging and a constant procurement lead time. Demands occur in independent, identically distributed batches, separated by independent, identically distributed intervals. The model also includes a class of periodic review systems as a special case. Of interest are the optimal control policies with respect to a stationary cost rate function, constructed to include ordering, holding and shortage costs. We study the structural properties of the cost rate function and report some new bounds and optimality conditions. An application of these to the computation and approximation of optimal policies is also discussed.
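For readers unfamiliar with the notation, an (s, S) policy is the simple reorder rule (standard definition, not specific to this paper): whenever the inventory position x drops to or below the reorder point s, order up to the order-up-to level S; otherwise order nothing,

    \[
      Q(x) \;=\;
      \begin{cases}
        S - x, & x \le s,\\
        0, & x > s.
      \end{cases}
    \]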

14.
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of Asymptotically Stationary (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithm of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion. This work was supported in part through United States-Israel Binational Science Foundation Grant BSF 85-00306.
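Schematically (a generic statement of such a constrained problem for an average-cost-type criterion, in our notation rather than the paper's), one seeks

    \[
      \min_{\pi} \ \limsup_{n\to\infty} \frac{1}{n}\, E_x^{\pi}\!\left[\sum_{t=0}^{n-1} c_0(x_t,a_t)\right]
      \quad \text{subject to} \quad
      \limsup_{n\to\infty} \frac{1}{n}\, E_x^{\pi}\!\left[\sum_{t=0}^{n-1} c_k(x_t,a_t)\right] \le V_k,
      \quad k = 1,\dots,K,
    \]

with the added difficulty here that the transition probabilities entering these expectations are unknown and must be estimated while the chain is being controlled.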

15.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and characterizing the value function through the Bellman equation.
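In its basic Markov form (stated here for orientation; the paper's contribution is a version of it for the non-Markov processes induced by history-dependent policies), the Dynkin formula says that for suitable functions f in the domain of the extended generator A,

    \[
      E_x\big[f(X_t)\big] \;-\; f(x) \;=\; E_x\!\left[\int_0^t (A f)(X_s)\, ds\right],
    \]

which is the continuous-time counterpart of the one-step smoothing identities used in discrete-time dynamic programming.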

16.
This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time, as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n → ∞. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these criteria, even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
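In symbols (our transcription of the verbal definitions above, with r_k and τ_k the reward and sojourn time of the k-th decision epoch), the two criteria are

    \[
      \liminf_{T \to \infty} \frac{1}{T}\,
      E_x^{\pi}\big[\text{total reward earned in } [0, T]\big]
      \qquad \text{and} \qquad
      \liminf_{n \to \infty}
      \frac{E_x^{\pi}\!\left[\sum_{k=0}^{n-1} r_k\right]}
           {E_x^{\pi}\!\left[\sum_{k=0}^{n-1} \tau_k\right]},
    \]

and the abstract's point is that these two criteria can require different optimal mixed stationary policies even in the unichain case.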

17.
In this paper we study discrete-time Markov decision processes with average expected costs (AEC) and discount-sensitive criteria in Borel state and action spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions on the system's primitive data, under which we prove (1) that AEC optimality and strong −1-discount optimality are equivalent; (2) a condition equivalent to the strong 0-discount optimality of stationary policies; and (3) the existence of strong n (n = −1, 0)-discount optimal stationary policies. Our conditions are weaker than those in the previous literature. In particular, the "stochastic monotonicity condition" in this paper is used here for the first time to study strong n (n = −1, 0)-discount optimality. Moreover, we provide a new approach to prove the existence of strong 0-discount optimal stationary policies. It should be noted that our approach is slightly different from those in the previous literature. Finally, we apply our results to an inventory system and a controlled queueing system.

18.
《Stochastic Analysis and Applications》2013,31(3):589-625
We consider a periodic-review stochastic inventory problem in which demands for a single product in each of a finite number of periods are independent and identically distributed random variables. We analyze the case where shortages (stockouts) are penalized via fixed and proportional costs simultaneously. For this problem, due to the finiteness of the planning horizon and non-linearity of the shortage costs, computing the optimal inventory policy requires a substantial effort as noted in the previous literature. Hence, our paper is aimed at reducing this computational burden. As a resolution, we propose to compute "the best stationary policy." To this end, we restrict our attention to the class of stationary base-stock policies, and show that the multi-period, stochastic, dynamic problem at hand can be reduced to a deterministic, static equivalent. Using this important result, we introduce a model for computing an optimal stationary base-stock policy for the finite horizon problem under consideration. Fundamental analytic conclusions, some numerical examples, and related research findings are also discussed.
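As a rough numerical illustration of the idea of searching over stationary base-stock levels (a toy Python sketch under simplified assumptions — i.i.d. period demand, linear holding cost, fixed plus proportional shortage cost — not the paper's actual model or data):

    import numpy as np

    # Toy sketch: choose the stationary base-stock level S that minimizes the
    # expected per-period cost when period demand is i.i.d. and shortages incur
    # both a fixed and a proportional penalty.  All numbers are illustrative.
    rng = np.random.default_rng(0)
    demand = rng.poisson(lam=20, size=100_000)   # assumed demand sample

    h, p, K = 1.0, 9.0, 25.0                     # holding, unit shortage, fixed shortage cost

    def expected_cost(S):
        leftover = np.maximum(S - demand, 0)     # on-hand inventory after demand
        shortage = np.maximum(demand - S, 0)     # backlogged demand
        return np.mean(h * leftover + p * shortage + K * (shortage > 0))

    best_S = min(range(0, 61), key=expected_cost)
    print(best_S, expected_cost(best_S))

The paper's contribution is analytical rather than computational: it shows that the search for the best stationary base-stock policy in the finite-horizon problem reduces to a deterministic, static equivalent, so a brute-force search of this sort is not actually needed.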

19.
This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used here for the first time to study the ASPC criterion. Also, the approach provided here is slightly different from the "optimality equation approach" widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.
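For orientation (one common way to write the sample-path criterion, in our notation rather than the paper's): the average sample-path cost of a policy π is the pathwise random quantity

    \[
      J_{\mathrm{sp}}(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n}
      \sum_{t=0}^{n-1} c(x_t, a_t),
    \]

and ε-ASPC-optimality asks that this limit superior be within ε of the optimal value almost surely, rather than merely in expectation.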

20.
We study the ergodic control problem for a class of controlled jump diffusions driven by a compound Poisson process. This extends the results of Arapostathis et al. (2019) to running costs that are not near-monotone. This generality is needed in applications such as optimal scheduling of large-scale parallel server networks. We provide a full characterization of optimality via the Hamilton–Jacobi–Bellman (HJB) equation, for which we additionally exhibit regularity of solutions under mild hypotheses. In addition, we show that optimal stationary Markov controls are a.s. pathwise optimal. Lastly, we show that one can fix a stable control outside a compact set and obtain near-optimal solutions by solving the HJB on a sufficiently large bounded domain. This is useful for constructing asymptotically optimal scheduling policies for multiclass parallel server networks.
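Schematically (the generic form of an ergodic HJB equation, written in our notation rather than the paper's), the characterization of optimality is of the type

    \[
      \min_{u \in \mathbb{U}} \big[\, \mathcal{L}^{u} V(x) \;+\; c(x, u) \,\big] \;=\; \varrho,
    \]

where \(\mathcal{L}^{u}\) is the controlled generator (here including the jump part driven by the compound Poisson process), c is the running cost, ϱ is the optimal ergodic value, and V plays the role of a relative value (potential) function.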
