Similar documents (20 results)
1.
This paper deals with Blackwell optimality for continuous-time controlled Markov chains with compact Borel action space, and possibly unbounded reward (or cost) rates and unbounded transition rates. We prove the existence of a deterministic stationary policy which is Blackwell optimal in the class of all admissible (nonstationary) Markov policies, thus extending previous results that analyzed Blackwell optimality in the class of stationary policies. We compare our assumptions to the corresponding ones for discrete-time Markov controlled processes.
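For orientation, a standard definition paraphrased from the literature (not quoted from this paper): a policy f is Blackwell optimal if it is discount-optimal simultaneously for all discount factors sufficiently close to 1, i.e. there exists α₀ ∈ (0,1) with V_α(f, x) = sup_π V_α(π, x) for all states x and all α ∈ (α₀, 1); in the continuous-time setting of this entry the analogous requirement is discount optimality for all discount rates sufficiently close to 0.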

2.
We consider the Hamiltonian cycle problem embedded in singularly perturbed (controlled) Markov chains. We also consider a functional on the space of stationary policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by the same policies. In particular, we focus on the subset of these policies that induce doubly stochastic probability transition matrices, which we refer to as the "doubly stochastic policies." We show that when the perturbation parameter ε is sufficiently small, the minimum of this functional over the space of the doubly stochastic policies is attained very close to a Hamiltonian cycle, provided that the graph is Hamiltonian. We also derive precise analytical expressions for the elements of the fundamental matrix that lend themselves to probabilistic interpretation, as well as asymptotic expressions for the first diagonal element, for a variety of deterministic policies that are of special interest, including those that correspond to Hamiltonian cycles. © 2004 Wiley Periodicals, Inc. Random Struct. Alg., 2004

3.
We consider the Hamiltonian cycle problem embedded in singularly perturbed (controlled) Markov chains. We also consider a functional on the space of stationary policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by these policies. We focus on the subset of these policies that induce doubly stochastic probability transition matrices which we refer to as the "doubly stochastic policies." We show that when the perturbation parameter, ε, is sufficiently small, the minimum of this functional over the space of the doubly stochastic policies is attained at a Hamiltonian cycle, provided that the graph is Hamiltonian. We also show that when the graph is non-Hamiltonian, the above minimum is strictly greater than that in a Hamiltonian case. We call the size of this difference the "Hamiltonicity Gap" and derive a conservative lower bound for this gap. Our results imply that the Hamiltonian cycle problem is equivalent to the problem of minimizing the variance of the first hitting time of the home node, over doubly stochastic policies. © 2008 Wiley Periodicals, Inc. Random Struct. Alg., 2009
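To make the functional in entries 2 and 3 concrete, here is a minimal numerical sketch (illustrative only; the graph, the perturbation scheme, and the parameter values are assumptions rather than the papers' construction). It builds the transition matrices induced by a Hamiltonian-cycle policy and by the uniform doubly stochastic policy on the complete graph K4, applies a small perturbation ε, and compares the (1,1)-entries of the resulting fundamental matrices; in line with the results above, the Hamiltonian-cycle policy yields the smaller value.

```python
import numpy as np

def fundamental_matrix(P):
    """Fundamental matrix G = (I - P + 1 pi^T)^{-1} of an irreducible chain P."""
    n = P.shape[0]
    w, v = np.linalg.eig(P.T)                       # stationary distribution = left eigenvector for eigenvalue 1
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    return np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))

def perturb(P, eps):
    """Illustrative perturbation: mix P with the uniform chain (not the papers' scheme)."""
    n = P.shape[0]
    return (1.0 - eps) * P + eps * np.ones((n, n)) / n

n = 4
P_hc = np.zeros((n, n))                             # policy tracing the Hamiltonian cycle 0->1->2->3->0 in K4
for i in range(n):
    P_hc[i, (i + 1) % n] = 1.0
P_ds = (np.ones((n, n)) - np.eye(n)) / (n - 1)      # doubly stochastic policy: uniform walk on K4

eps = 1e-3
print("(1,1)-entry, Hamiltonian cycle policy:   ",
      fundamental_matrix(perturb(P_hc, eps))[0, 0])
print("(1,1)-entry, doubly stochastic (uniform):",
      fundamental_matrix(perturb(P_ds, eps))[0, 0])
```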

4.
郭先平 (Guo Xianping), 《数学学报》 (Acta Mathematica Sinica), 2000, 43(2): 269-274
This paper considers average-reward models of nonstationary MDPs with a countable state space and arbitrary action spaces. Drawing on the ideas of Feinberg E. A. (1994), we introduce the notion of (G,B)-generated policies, a broader class than Markov policies and Feinberg's (f,B)-generated policies. Under a weak ergodicity condition, we use probabilistic analysis to prove the existence of a uniformly optimal (G,B)-generated policy, thereby extending the main results of Feinberg E. A. (1994) to the nonstationary countable-state-space case.

5.
6.
We consider an accessibility index for the states of a discrete-time, ergodic, homogeneous Markov chain on a finite state space; this index is naturally associated with the random walk centrality introduced by Noh and Rieger (2004) for a random walk on a connected graph. We observe that the vector of accessibility indices provides a partition of Kemeny's constant for the Markov chain. We provide three characterizations of this accessibility index: one in terms of the first return time to the state in question, and two in terms of the transition matrix associated with the Markov chain. Several bounds are provided on the accessibility index in terms of the eigenvalues of the transition matrix and the stationary vector, and the bounds are shown to be tight. The behaviour of the accessibility index under perturbation of the transition matrix is investigated, and examples exhibiting some counter-intuitive behaviour are presented. Finally, we characterize the situation in which the accessibility indices for all states coincide.
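The exact definition of the accessibility index is given in the paper; the sketch below only illustrates the quantity it partitions. Assuming the standard fundamental-matrix route, it computes Kemeny's constant K = trace(Z) - 1 for a small ergodic chain (the transition matrix is an arbitrary illustrative example) and checks that the sum of pi_j * m_ij is the same for every starting state i.

```python
import numpy as np

def kemeny_constant(P):
    """Kemeny's constant of an ergodic chain via the fundamental matrix Z = (I - P + 1 pi^T)^{-1}."""
    n = P.shape[0]
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return np.trace(Z) - 1.0, Z, pi

P = np.array([[0.5, 0.3, 0.2],    # illustrative ergodic chain
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
K, Z, pi = kemeny_constant(P)

# Sanity check: sum_j pi_j * m_ij equals K for every start state i,
# where m_ij = (Z_jj - Z_ij) / pi_j is the mean first passage time (and m_ii = 0).
n = P.shape[0]
for i in range(n):
    mfpt = [(Z[j, j] - Z[i, j]) / pi[j] if j != i else 0.0 for j in range(n)]
    print(f"start state {i}: sum_j pi_j * m_ij = {np.dot(pi, mfpt):.6f}   (K = {K:.6f})")
```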

7.
Decision makers often need a performance guarantee that holds with sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability that the total discounted reward exceeds a target value in the preceding stages. We show that our formulation cannot be described by earlier models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state spaces. We give an example concerning the reliability of a satellite system.
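A minimal sketch of the finite-horizon recursion behind this probability criterion, under illustrative assumptions (random rewards and transitions, a coarse grid for the target, not the paper's algorithm): the state is augmented with the remaining target, which after earning r(x, a) is rescaled to (s - r(x, a)) / beta, and the terminal value is the indicator that the remaining target is nonpositive.

```python
import numpy as np

# Illustrative data for a tiny MDP; the reward, transition, and horizon values are assumptions.
n_states, n_actions, horizon, beta = 2, 2, 5, 0.9
rng = np.random.default_rng(0)
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards r(x, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions p(y | x, a)

targets = np.linspace(-1.0, 5.0, 241)                             # discretized target values

def grid_index(s):
    """Index of the first grid point >= s (clipped to the grid)."""
    return int(np.clip(np.searchsorted(targets, s), 0, len(targets) - 1))

# v[x, k] = max probability of accumulating a discounted reward of at least targets[k]
# in the remaining stages, starting from state x.  Terminal condition: 1 if target <= 0.
v = np.tile((targets <= 0.0).astype(float), (n_states, 1))

for _ in range(horizon):
    v_new = np.zeros_like(v)
    for x in range(n_states):
        for k, s in enumerate(targets):
            # After taking action a and earning r(x, a), the remaining (rescaled) target
            # for the tail of the horizon is (s - r(x, a)) / beta.
            v_new[x, k] = max(P[x, a] @ v[:, grid_index((s - r[x, a]) / beta)]
                              for a in range(n_actions))
    v = v_new

print("P(total discounted reward >= 1.0 | x0 = 0) is approximately", v[0, grid_index(1.0)])
```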

8.
This paper investigates an optimal sequencing and dynamic pricing problem for a two-class queueing system. Using a Markov Decision Process based model, we obtain structural characterizations of optimal policies. In particular, it is shown that the optimal pricing policy depends on the entire queue length vector but some monotonicity results prevail as the composition of this vector changes. A numerical study finds that static pricing policies may have significant suboptimality but simple dynamic pricing policies perform well in most situations.

9.
This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove here the existence of deterministic stationary policies which are Blackwell optimal in the class of all, in general randomized, stationary policies. We establish also a lexicographical policy improvement algorithm leading to Blackwell optimal policies and the relation between such policies and the Blackwell optimality equation. Our technique is a combination of the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards and of the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards.

10.
郭先平 (Guo Xianping), 《数学学报》 (Acta Mathematica Sinica), 2001, 44(2): 333-342
This paper considers the average-variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition, we use the optimality equation to prove the existence of a Markov policy that is optimal for the average expected reward criterion. Then, by constructing a new model and using the theory of Markov processes, we further prove that, among the Markov policies that are optimal for the average expected reward criterion, there exists one that minimizes the average variance. As special cases, we recover the main results of Dynkin E. B. and Yushkevich A. A., Kurano M., and others.

11.
We consider Markov semigroups on the cone of positive finite measures on a complete separable metric space. Such a semigroup extends to a semigroup of linear operators on the vector space of measures that typically fails to be strongly continuous for the total variation norm. First we characterise when the restriction of a Markov semigroup to an invariant L1-space is strongly continuous. Aided by this result we provide several characterisations of the subspace of strong continuity for the total variation norm. We prove that this subspace is a projection band in the Banach lattice of finite measures, and consequently obtain a direct sum decomposition.

12.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program.
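For orientation, here is the finite-state analogue of the occupation-measure linear program for the total-cost criterion (a standard formulation; the paper's program is infinite, over a countable state space, and comes with additional conditions). With cost c, transition kernel p, and initial distribution β, it reads:

\[
\min_{\rho \ge 0} \; \sum_{x,a} c(x,a)\,\rho(x,a)
\quad \text{subject to} \quad
\sum_{a} \rho(y,a) \;-\; \sum_{x,a} p(y \mid x,a)\,\rho(x,a) \;=\; \beta(y) \quad \text{for every state } y .
\]

Here ρ(x,a) plays the role of the expected total number of visits to the state-action pair (x,a).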

13.
Gianfranco Ciardo, PAMM, 2007, 7(1): 1080705-1080706
Decision diagrams of various types can be used to encode the exact state space and transition rate matrix of large Markov models. However, the exact solution of such models still requires storing at least one real vector with one entry per reachable state, a formidable limitation to the practical use of these encodings. Thus, we discuss automatic techniques for the approximate computation of performance measures when the Markov model can be compactly encoded but not exactly solved. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

14.
We introduce upper and lower envelopes for sets of measures on an arbitrary topological space, which are then used to give a tightness criterion. These concepts are applied to show the existence of optimal policies for a class of Markov control processes. Accepted 22 May 1998

15.
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space, (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.

16.
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudopolynomial exact and approximation algorithms.

17.
Finite and infinite planning horizon Markov decision problems are formulated for a class of jump processes with general state and action spaces and controls which are measurable functions on the time axis taking values in an appropriate metrizable vector space. For the finite horizon problem, the maximum expected reward is the unique solution (shown to exist) of a certain differential equation and is a strongly continuous function in the space of upper semi-continuous functions. A necessary and sufficient condition is provided for an admissible control to be optimal, and a sufficient condition is provided for the existence of a measurable optimal policy. For the infinite horizon problem, the maximum expected total reward is the fixed point of a certain operator on the space of upper semi-continuous functions. A stationary policy is optimal over all measurable policies in the transient and discounted cases as well as, with certain added conditions, in the positive and negative cases.

18.
We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of the approximate receding horizon control for infinite-horizon average reward under an ergodicity assumption, which also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of the approximate receding horizon control via lower bounds to the exact solution to the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second policy is based on a generalization of the single-policy improvement to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state spaces. We finally discuss practical implementations of these schemes via simulation.
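A minimal sketch of the receding-horizon construction described above, with illustrative data (the rewards, transitions, and horizon H are assumptions): solve the H-horizon sub-MDP by backward induction with a zero terminal value and use the first-stage maximizing action in each state as a stationary policy for the infinite-horizon problem.

```python
import numpy as np

# Illustrative data; the reward and transition arrays are assumptions, not taken from the paper.
n_states, n_actions, H = 4, 3, 20
rng = np.random.default_rng(1)
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards r(x, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions p(y | x, a)

def receding_horizon_policy(P, r, H):
    """Solve the H-horizon sub-MDP by backward induction and return the
    first-stage maximizing action in each state, used as a stationary policy."""
    v = np.zeros(r.shape[0])          # terminal value function (zero, the simplest choice)
    for _ in range(H - 1):
        v = (r + P @ v).max(axis=1)   # (P @ v)[x, a] = sum_y p(y | x, a) v(y)
    q = r + P @ v                     # H-horizon Q-values at the first stage
    return q.argmax(axis=1)

policy = receding_horizon_policy(P, r, H)
print("approximate receding horizon (stationary) policy:", policy)
```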

19.
20.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.
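For reference, a standard definition (the paper's conventions for the risk level and for costs versus rewards may differ): for a cost X and level λ ∈ (0,1),

\[
\mathrm{AVaR}_{\lambda}(X) \;=\; \frac{1}{1-\lambda}\int_{\lambda}^{1} \mathrm{VaR}_{\gamma}(X)\, d\gamma,
\qquad
\mathrm{VaR}_{\gamma}(X) \;=\; \inf\{\, x \in \mathbb{R} : \mathbb{P}(X \le x) \ge \gamma \,\}.
\]

Equivalently, AVaR admits the representation \( \mathrm{AVaR}_{\lambda}(X) = \min_{z \in \mathbb{R}} \{\, z + \tfrac{1}{1-\lambda}\,\mathbb{E}[(X - z)^{+}] \,\} \), which is what typically allows AVaR objectives to be rewritten as ordinary expectations over an augmented state.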
