Similar documents (20 results)
1.
This paper deals with Blackwell optimality for continuous-time controlled Markov chains with compact Borel action space, and possibly unbounded reward (or cost) rates and unbounded transition rates. We prove the existence of a deterministic stationary policy which is Blackwell optimal in the class of all admissible (nonstationary) Markov policies, thus extending previous results that analyzed Blackwell optimality in the class of stationary policies. We compare our assumptions to the corresponding ones for discrete-time Markov controlled processes.
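For orientation, a standard definition paraphrased from the literature (not quoted from this paper): a policy f is Blackwell optimal if it is discount-optimal simultaneously for all discount factors sufficiently close to 1, i.e. there exists α₀ ∈ (0,1) with V_α(f, x) = sup_π V_α(π, x) for all states x and all α ∈ (α₀, 1); in the continuous-time setting of this entry the analogous requirement is discount optimality for all discount rates sufficiently close to 0.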

2.
We consider the Hamiltonian cycle problem embedded in singularly perturbed (controlled) Markov chains. We also consider a functional on the space of stationary policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by the same policies. In particular, we focus on the subset of these policies that induce doubly stochastic probability transition matrices, which we refer to as the "doubly stochastic policies." We show that when the perturbation parameter ε is sufficiently small, the minimum of this functional over the space of the doubly stochastic policies is attained very close to a Hamiltonian cycle, provided that the graph is Hamiltonian. We also derive precise analytical expressions for the elements of the fundamental matrix that lend themselves to probabilistic interpretation, as well as asymptotic expressions for the first diagonal element, for a variety of deterministic policies that are of special interest, including those that correspond to Hamiltonian cycles. © 2004 Wiley Periodicals, Inc. Random Struct. Alg., 2004

3.
We consider the Hamiltonian cycle problem embedded in singularly perturbed (controlled) Markov chains. We also consider a functional on the space of stationary policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by these policies. We focus on the subset of these policies that induce doubly stochastic probability transition matrices which we refer to as the "doubly stochastic policies." We show that when the perturbation parameter, ε, is sufficiently small, the minimum of this functional over the space of the doubly stochastic policies is attained at a Hamiltonian cycle, provided that the graph is Hamiltonian. We also show that when the graph is non-Hamiltonian, the above minimum is strictly greater than that in a Hamiltonian case. We call the size of this difference the "Hamiltonicity Gap" and derive a conservative lower bound for this gap. Our results imply that the Hamiltonian cycle problem is equivalent to the problem of minimizing the variance of the first hitting time of the home node, over doubly stochastic policies. © 2008 Wiley Periodicals, Inc. Random Struct. Alg., 2009
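To make the functional in entries 2 and 3 concrete, here is a minimal numerical sketch (illustrative only; the graph, the perturbation scheme, and the parameter values are assumptions rather than the papers' construction). It builds the transition matrices induced by a Hamiltonian-cycle policy and by the uniform doubly stochastic policy on the complete graph K4, applies a small perturbation ε, and compares the (1,1)-entries of the resulting fundamental matrices; in line with the results above, the Hamiltonian-cycle policy yields the smaller value.

```python
import numpy as np

def fundamental_matrix(P):
    """Fundamental matrix G = (I - P + 1 pi^T)^{-1} of an irreducible chain P."""
    n = P.shape[0]
    w, v = np.linalg.eig(P.T)                       # stationary distribution = left eigenvector for eigenvalue 1
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    return np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))

def perturb(P, eps):
    """Illustrative perturbation: mix P with the uniform chain (not the papers' scheme)."""
    n = P.shape[0]
    return (1.0 - eps) * P + eps * np.ones((n, n)) / n

n = 4
P_hc = np.zeros((n, n))                             # policy tracing the Hamiltonian cycle 0->1->2->3->0 in K4
for i in range(n):
    P_hc[i, (i + 1) % n] = 1.0
P_ds = (np.ones((n, n)) - np.eye(n)) / (n - 1)      # doubly stochastic policy: uniform walk on K4

eps = 1e-3
print("(1,1)-entry, Hamiltonian cycle policy:   ",
      fundamental_matrix(perturb(P_hc, eps))[0, 0])
print("(1,1)-entry, doubly stochastic (uniform):",
      fundamental_matrix(perturb(P_ds, eps))[0, 0])
```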

4.
郭先平 (Guo Xianping), 《数学学报》 (Acta Mathematica Sinica), 2000, 43(2): 269-274
This paper considers average-reward models of nonstationary MDPs with a countable state space and arbitrary action spaces. Drawing on the ideas of Feinberg E. A. (1994), we introduce the notion of (G,B)-generated policies, a broader class than Markov policies and Feinberg's (f,B)-generated policies. Under a weak ergodicity condition, we use probabilistic analysis to prove the existence of a uniformly optimal (G,B)-generated policy, thereby extending the main results of Feinberg E. A. (1994) to the nonstationary countable-state-space case.

5.
6.
We consider an accessibility index for the states of a discrete-time, ergodic, homogeneous Markov chain on a finite state space; this index is naturally associated with the random walk centrality introduced by Noh and Rieger (2004) for a random walk on a connected graph. We observe that the vector of accessibility indices provides a partition of Kemeny's constant for the Markov chain. We provide three characterizations of this accessibility index: one in terms of the first return time to the state in question, and two in terms of the transition matrix associated with the Markov chain. Several bounds are provided on the accessibility index in terms of the eigenvalues of the transition matrix and the stationary vector, and the bounds are shown to be tight. The behaviour of the accessibility index under perturbation of the transition matrix is investigated, and examples exhibiting some counter-intuitive behaviour are presented. Finally, we characterize the situation in which the accessibility indices for all states coincide.
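The exact definition of the accessibility index is given in the paper; the sketch below only illustrates the quantity it partitions. Assuming the standard fundamental-matrix route, it computes Kemeny's constant K = trace(Z) - 1 for a small ergodic chain (the transition matrix is an arbitrary illustrative example) and checks that the sum of pi_j * m_ij is the same for every starting state i.

```python
import numpy as np

def kemeny_constant(P):
    """Kemeny's constant of an ergodic chain via the fundamental matrix Z = (I - P + 1 pi^T)^{-1}."""
    n = P.shape[0]
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return np.trace(Z) - 1.0, Z, pi

P = np.array([[0.5, 0.3, 0.2],    # illustrative ergodic chain
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
K, Z, pi = kemeny_constant(P)

# Sanity check: sum_j pi_j * m_ij equals K for every start state i,
# where m_ij = (Z_jj - Z_ij) / pi_j is the mean first passage time (and m_ii = 0).
n = P.shape[0]
for i in range(n):
    mfpt = [(Z[j, j] - Z[i, j]) / pi[j] if j != i else 0.0 for j in range(n)]
    print(f"start state {i}: sum_j pi_j * m_ij = {np.dot(pi, mfpt):.6f}   (K = {K:.6f})")
```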

7.
Decision makers often need a performance guarantee that holds with sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability that the total discounted reward exceeds a target value in the preceding stages. We show that our formulation cannot be described by earlier models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state spaces. We give an example concerning the reliability of a satellite system.
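A minimal sketch of the finite-horizon recursion behind this probability criterion, under illustrative assumptions (random rewards and transitions, a coarse grid for the target, not the paper's algorithm): the state is augmented with the remaining target, which after earning r(x, a) is rescaled to (s - r(x, a)) / beta, and the terminal value is the indicator that the remaining target is nonpositive.

```python
import numpy as np

# Illustrative data for a tiny MDP; the reward, transition, and horizon values are assumptions.
n_states, n_actions, horizon, beta = 2, 2, 5, 0.9
rng = np.random.default_rng(0)
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards r(x, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions p(y | x, a)

targets = np.linspace(-1.0, 5.0, 241)                             # discretized target values

def grid_index(s):
    """Index of the first grid point >= s (clipped to the grid)."""
    return int(np.clip(np.searchsorted(targets, s), 0, len(targets) - 1))

# v[x, k] = max probability of accumulating a discounted reward of at least targets[k]
# in the remaining stages, starting from state x.  Terminal condition: 1 if target <= 0.
v = np.tile((targets <= 0.0).astype(float), (n_states, 1))

for _ in range(horizon):
    v_new = np.zeros_like(v)
    for x in range(n_states):
        for k, s in enumerate(targets):
            # After taking action a and earning r(x, a), the remaining (rescaled) target
            # for the tail of the horizon is (s - r(x, a)) / beta.
            v_new[x, k] = max(P[x, a] @ v[:, grid_index((s - r[x, a]) / beta)]
                              for a in range(n_actions))
    v = v_new

print("P(total discounted reward >= 1.0 | x0 = 0) is approximately", v[0, grid_index(1.0)])
```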

8.
This paper investigates an optimal sequencing and dynamic pricing problem for a two-class queueing system. Using a Markov Decision Process based model, we obtain structural characterizations of optimal policies. In particular, it is shown that the optimal pricing policy depends on the entire queue length vector but some monotonicity results prevail as the composition of this vector changes. A numerical study finds that static pricing policies may have significant suboptimality but simple dynamic pricing policies perform well in most situations.

9.
This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove here the existence of deterministic stationary policies which are Blackwell optimal in the class of all, in general randomized, stationary policies. We establish also a lexicographical policy improvement algorithm leading to Blackwell optimal policies and the relation between such policies and the Blackwell optimality equation. Our technique is a combination of the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards and of the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards.

10.
郭先平 (Guo Xianping), 《数学学报》 (Acta Mathematica Sinica), 2001, 44(2): 333-342
This paper considers the average-variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition, we use the optimality equation to prove the existence of a Markov policy that is optimal for the average expected reward criterion. Then, by constructing a new model and using the theory of Markov processes, we further prove that, among the Markov policies that are optimal for the average expected reward criterion, there exists one that minimizes the average variance. As special cases, we recover the main results of Dynkin E. B. and Yushkevich A. A., Kurano M., and others.

11.
We consider Markov semigroups on the cone of positive finite measures on a complete separable metric space. Such a semigroup extends to a semigroup of linear operators on the vector space of measures that typically fails to be strongly continuous for the total variation norm. First we characterise when the restriction of a Markov semigroup to an invariant L1-space is strongly continuous. Aided by this result we provide several characterisations of the subspace of strong continuity for the total variation norm. We prove that this subspace is a projection band in the Banach lattice of finite measures, and consequently obtain a direct sum decomposition.

12.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program.
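For orientation, here is the finite-state analogue of the occupation-measure linear program for the total-cost criterion (a standard formulation; the paper's program is infinite, over a countable state space, and comes with additional conditions). With cost c, transition kernel p, and initial distribution β, it reads:

\[
\min_{\rho \ge 0} \; \sum_{x,a} c(x,a)\,\rho(x,a)
\quad \text{subject to} \quad
\sum_{a} \rho(y,a) \;-\; \sum_{x,a} p(y \mid x,a)\,\rho(x,a) \;=\; \beta(y) \quad \text{for every state } y .
\]

Here ρ(x,a) plays the role of the expected total number of visits to the state-action pair (x,a).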

13.
Gianfranco Ciardo, PAMM, 2007, 7(1): 1080705-1080706
Decision diagrams of various types can be used to encode the exact state space and transition rate matrix of large Markov models. However, the exact solution of such models still requires storing at least one real vector with one entry per reachable state, a formidable limitation to the practical use of these encodings. Thus, we discuss automatic techniques for the approximate computation of performance measures when the Markov model can be compactly encoded but not exactly solved. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

14.
We introduce upper and lower envelopes for sets of measures on an arbitrary topological space, which are then used to give a tightness criterion. These concepts are applied to show the existence of optimal policies for a class of Markov control processes. Accepted 22 May 1998

15.
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space, (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.

16.
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudopolynomial exact and approximation algorithms.

17.
Finite and infinite planning horizon Markov decision problems are formulated for a class of jump processes with general state and action spaces and controls which are measurable functions on the time axis taking values in an appropriate metrizable vector space. For the finite horizon problem, the maximum expected reward is the unique solution (shown to exist) of a certain differential equation and is a strongly continuous function in the space of upper semi-continuous functions. A necessary and sufficient condition is provided for an admissible control to be optimal, and a sufficient condition is provided for the existence of a measurable optimal policy. For the infinite horizon problem, the maximum expected total reward is the fixed point of a certain operator on the space of upper semi-continuous functions. A stationary policy is optimal over all measurable policies in the transient and discounted cases as well as, with certain added conditions, in the positive and negative cases.

18.
We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of the approximate receding horizon control for infinite-horizon average reward under an ergodicity assumption, which also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of the approximate receding horizon control via lower bounds to the exact solution to the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second policy is based on a generalization of the single-policy improvement to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state spaces. We finally discuss practical implementations of these schemes via simulation.
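A minimal sketch of the receding-horizon construction described above, with illustrative data (the rewards, transitions, and horizon H are assumptions): solve the H-horizon sub-MDP by backward induction with a zero terminal value and use the first-stage maximizing action in each state as a stationary policy for the infinite-horizon problem.

```python
import numpy as np

# Illustrative data; the reward and transition arrays are assumptions, not taken from the paper.
n_states, n_actions, H = 4, 3, 20
rng = np.random.default_rng(1)
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards r(x, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions p(y | x, a)

def receding_horizon_policy(P, r, H):
    """Solve the H-horizon sub-MDP by backward induction and return the
    first-stage maximizing action in each state, used as a stationary policy."""
    v = np.zeros(r.shape[0])          # terminal value function (zero, the simplest choice)
    for _ in range(H - 1):
        v = (r + P @ v).max(axis=1)   # (P @ v)[x, a] = sum_y p(y | x, a) v(y)
    q = r + P @ v                     # H-horizon Q-values at the first stage
    return q.argmax(axis=1)

policy = receding_horizon_policy(P, r, H)
print("approximate receding horizon (stationary) policy:", policy)
```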

19.
20.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.
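For reference, a standard definition (the paper's conventions for the risk level and for costs versus rewards may differ): for a cost X and level λ ∈ (0,1),

\[
\mathrm{AVaR}_{\lambda}(X) \;=\; \frac{1}{1-\lambda}\int_{\lambda}^{1} \mathrm{VaR}_{\gamma}(X)\, d\gamma,
\qquad
\mathrm{VaR}_{\gamma}(X) \;=\; \inf\{\, x \in \mathbb{R} : \mathbb{P}(X \le x) \ge \gamma \,\}.
\]

Equivalently, AVaR admits the representation \( \mathrm{AVaR}_{\lambda}(X) = \min_{z \in \mathbb{R}} \{\, z + \tfrac{1}{1-\lambda}\,\mathbb{E}[(X - z)^{+}] \,\} \), which is what typically allows AVaR objectives to be rewritten as ordinary expectations over an augmented state.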
