Similar Documents
20 similar documents found (search time: 656 ms)
1.
Discrete time countable state Markov decision processes with finite decision sets and bounded costs are considered. Conditions are given under which an unbounded solution to the average cost optimality equation exists and yields an optimal stationary policy. A new form of the optimality equation is derived for the case in which every stationary policy gives rise to an ergodic Markov chain.

2.
Average cost Markov decision processes (MDPs) with compact state and action spaces and bounded lower semicontinuous cost functions are considered. Kurano [7] treated the general case in which several ergodic classes and a transient set are permitted for the Markov process induced by any randomized stationary policy under the hypothesis of Doeblin, and showed the existence of a minimum pair of state and policy. This paper considers the same case as that discussed in Kurano [7] and proves new results that establish the existence of an optimal stationary policy under some reasonable conditions.

3.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and of an average optimal stationary policy. Under our conditions we also ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation satisfied by an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.

4.
We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policy f, with expected average cost g, when can f and g be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related to ng − ν_n(i), where ν_n(i) is the minimum expected n-stage cost when the process starts in state i. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter.
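The undiscounted value-iteration question above can be illustrated numerically. The sketch below uses a hypothetical two-state MDP (not the queueing systems analyzed in the paper) and shows the differences ν_n(i) − ν_{n−1}(i) settling down to the optimal average cost g:

```python
# Toy illustration of undiscounted value iteration (invented two-state MDP):
# nu_n(i) is the minimum expected n-stage cost, and the differences
# nu_n(i) - nu_{n-1}(i) converge to the optimal average cost g.

# transitions[state] = list of actions, each (cost, {next_state: prob})
transitions = {
    0: [(1.0, {0: 0.5, 1: 0.5}),   # cheap action, may stay in state 0
        (3.0, {1: 1.0})],          # expensive action, jumps to state 1
    1: [(0.5, {0: 1.0})],          # single action available in state 1
}

def value_iteration(n_steps):
    """Return nu_{n_steps} as a dict: state -> minimum expected n-stage cost."""
    v = {0: 0.0, 1: 0.0}           # nu_0 = 0
    for _ in range(n_steps):
        v = {s: min(c + sum(p * v[t] for t, p in pr.items())
                    for c, pr in acts)
             for s, acts in transitions.items()}
    return v

v_prev, v = value_iteration(499), value_iteration(500)
g_est = {s: v[s] - v_prev[s] for s in v}   # per-state estimate of g
```

For this toy model the optimal stationary policy uses the cheap action in state 0; its induced chain has stationary distribution (2/3, 1/3) and average cost 5/6, and both entries of `g_est` agree with that value.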

5.
The move-to-front rule is a heuristic for updating a list of n items according to requests. Items are requested with unknown probabilities (or popularities). The induced Markov chain is known to be ergodic. A main problem is the study of the distribution of the search cost, defined as the position of the requested item. Here we first establish the link between two recent papers, by Barrera and Paroissin and by Lijoi and Pruenster, that both extend the results proved by Kingman on the expected stationary search cost. By combining the results contained in these papers, we obtain the limiting behavior of any moment of the stationary search cost as n tends to infinity.
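For a fixed list length n, the expected stationary search cost that Kingman's result concerns can be computed from the classical pairwise identity P(item j precedes item i in stationarity) = p_j/(p_i + p_j). A small sketch, using 0-indexed positions (that convention, and the probabilities in the comments, are ours rather than the cited papers'):

```python
# Expected stationary search cost under move-to-front, from the classical
# pairwise identity P(j is in front of i) = p_j / (p_i + p_j).
# Positions are 0-indexed: the cost is the number of items in front of
# the requested one.

def mtf_expected_search_cost(p):
    """p: request probabilities (summing to 1); returns the expected cost."""
    return sum(p_i * sum(p_j / (p_i + p_j)
                         for j, p_j in enumerate(p) if j != i)
               for i, p_i in enumerate(p))
```

With uniform popularities every pair is equally likely to be in either order, so the expected position of the requested item is (n − 1)/2.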

6.
We consider the optimization of the variance of the sum of costs, as well as that of the average expected cost, in Markov decision processes with unbounded cost. In the case of general state and action spaces, we find the stationary policy that makes the average variance as small as possible within the class of policies that are ε-optimal for the average expected cost.

7.
Guo Xianping, Acta Mathematica Sinica, 2001, 44(2): 333-342
This paper considers the average-variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition, the optimality equation is used to prove the existence of a Markov policy that is optimal for the average expected criterion. Then, by constructing a new model and applying the theory of Markov processes, it is further shown that within the class of Markov policies optimal for the average expected criterion there exists one that minimizes the average variance. As special cases, the main results of Dynkin E. B. and Yushkevich A. A., Kurano M., and others are also recovered.

8.
This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMCs) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMCs with a long-run, expected average cost (AC) optimality criterion. A classical approach to average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years with the introduction, by Sennott and others, of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.

9.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.

10.
Continuous time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given that converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.
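The evaluate/improve loop behind such a policy iteration method can be sketched in discrete time. The machine-maintenance MDP below is an invented example (the paper works with continuous-time models on a countable state space); it shows one improvement step moving from a conservative policy to the optimal one:

```python
# Policy iteration for the long-run average return on a toy two-state
# "machine maintenance" MDP (invented example; the paper's setting is
# continuous time).  Evaluation solves g + h(i) = r_i + sum_j P_ij h(j)
# with the normalization h(0) = 0; improvement is greedy with respect to h.

# mdp[state] = list of actions, each (reward, transition row over states)
mdp = {
    0: [(2.0, [0.8, 0.2]),    # "run": high reward, machine may break
        (1.0, [1.0, 0.0])],   # "maintain": safe but lower reward
    1: [(-3.0, [1.0, 0.0])],  # "repair": costly, machine works again
}
states = sorted(mdp)

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def evaluate(policy):
    """Return (g, h) with h(0) = 0 for a fixed stationary policy."""
    A, b = [], []
    for i in states:
        r, row = mdp[i][policy[i]]
        # unknowns: [g, h(1), ..., h(n-1)]
        A.append([1.0] + [(1.0 if j == i else 0.0) - row[j]
                          for j in states[1:]])
        b.append(r)
    x = gauss_solve(A, b)
    return x[0], {j: (0.0 if j == 0 else x[states.index(j)]) for j in states}

def policy_iteration(policy):
    while True:
        g, h = evaluate(policy)
        improved = {
            i: max(range(len(mdp[i])),
                   key=lambda a: mdp[i][a][0]
                   + sum(p * h[j] for j, p in enumerate(mdp[i][a][1])))
            for i in states}
        if improved == policy:
            return policy, g
        policy = improved

best, g = policy_iteration({0: 1, 1: 0})   # start from "maintain"
```

Starting from "maintain" (average reward 1), one improvement step switches state 0 to "run", whose induced chain has stationary distribution (5/6, 1/6) and average reward 7/6; the loop then stops, mirroring the convergence statement in the abstract.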

11.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets, and unbounded one-step costs, for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and of the total expected absolute cost incurred until the first return to this set is finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.

12.
For an ergodic continuous-time birth and death process on the nonnegative integers, a well-known theorem states that the hitting time T_{0,n} from state 0 to state n has the same distribution as the sum of n independent exponential random variables. Firstly, we generalize this theorem to an absorbing birth and death process (say, with state −1 absorbing) to derive the distribution of T_{0,n}. We then give explicit formulas for the Laplace transforms of hitting times between any two states of an ergodic or absorbing birth and death process. Secondly, these results are all extended to birth and death processes on the nonnegative integers with ∞ an exit, entrance, or regular boundary. Finally, we apply these formulas to fastest strong stationary times for strongly ergodic birth and death processes.
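While the paper derives full distributions and Laplace transforms, the mean of T_{0,n} already follows from an elementary first-step recursion. A sketch with our own rate conventions (birth[k] and death[k] are the rates out of state k; death[0] is unused):

```python
# Mean hitting time E[T_{0,n}] for a birth and death process on the
# nonnegative integers, via the one-step recursion
#     E[T_{k,k+1}] = 1/birth[k] + (death[k]/birth[k]) * E[T_{k-1,k}],
# summed over k = 0, ..., n-1.  This gives only the mean; the cited paper
# identifies the full distribution of T_{0,n} as a sum of independent
# exponential random variables.

def expected_hitting_time(birth, death, n):
    """E[T_{0,n}]; birth[k]/death[k] are the rates in state k (death[0] unused)."""
    step, total = 0.0, 0.0
    for k in range(n):
        step = 1.0 / birth[k] + (death[k] / birth[k] * step if k > 0 else 0.0)
        total += step
    return total
```

For a pure birth process with unit rates the answer is n, since each upward step is a mean-one exponential.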

13.
We study infinite-horizon asymptotic average optimality for parallel server networks with multiple classes of jobs and multiple server pools in the Halfin–Whitt regime. Three control formulations are considered: (1) minimizing the queueing and idleness cost, (2) minimizing the queueing cost under constraints on idleness at each server pool, and (3) fairly allocating the idle servers among different server pools. For the third problem, we consider a class of bounded-queue, bounded-state (BQBS) stable networks, in which any moment of the state is bounded by that of the queue only (for both the limiting diffusion and diffusion-scaled state processes). We show that the optimal values for the diffusion-scaled state processes converge to the corresponding values of the ergodic control problems for the limiting diffusion. We present a family of state-dependent Markov balanced saturation policies (BSPs) that stabilize the controlled diffusion-scaled state processes. It is shown that under these policies, the diffusion-scaled state process is exponentially ergodic, provided that at least one class of jobs has a positive abandonment rate. We also establish useful moment bounds, and study the ergodic properties of the diffusion-scaled state processes, which play a crucial role in proving the asymptotic optimality.

14.
We analyze an infinite horizon, single product, continuous review model in which pricing and inventory decisions are made simultaneously and ordering cost includes a fixed cost. We show that there exists a stationary (s,S) inventory policy maximizing the expected discounted or expected average profit under general conditions.
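The band structure of a stationary (s,S) policy is easy to sketch. The discrete-time simulation below (fixed price, made-up demand distribution, and periodic review rather than the paper's continuous review with joint pricing) only illustrates the order-up-to mechanics, not the optimization:

```python
# Sketch of a stationary (s, S) inventory policy: whenever the inventory
# level is at or below s, pay a fixed ordering cost K plus a per-unit cost
# and order up to S.  Toy discrete-time model with an arbitrary fixed price
# and a stand-in demand distribution; the paper's model is continuous-review
# with the price decided jointly with ordering.

import random

def simulate_sS(s, S, periods, K=10.0, unit_cost=1.0, price=3.0, seed=0):
    rng = random.Random(seed)
    level, profit = S, 0.0
    for _ in range(periods):
        if level <= s:                     # reorder point reached
            profit -= K + unit_cost * (S - level)
            level = S                      # order up to S
        demand = rng.randint(0, 3)         # stand-in demand distribution
        sold = min(demand, level)          # lost sales beyond stock on hand
        profit += price * sold
        level -= sold
    return level, profit
```

The level never leaves [0, S]: it is raised to S whenever it hits the band's lower edge, and unmet demand is lost rather than backlogged.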

15.
We construct a stationary Markov process with trivial tail σ-field and a nondegenerate observation process such that the corresponding nonlinear filtering process is not uniquely ergodic. This settles in the negative a conjecture of the author in the ergodic theory of nonlinear filters arising from an erroneous proof in the classic paper of H. Kunita (1971), wherein an exchange of intersection and supremum of σ-fields is taken for granted.

16.
The paper deals with continuous time Markov decision processes on a fairly general state space. The economic criterion is the long-run average return. A set of conditions is shown to be sufficient for a constant g to be the optimal average return and a stationary policy π1 to be optimal. These conditions are shown to be satisfied under appropriate assumptions on the optimal discounted return function. A policy improvement algorithm is proposed and its convergence to an optimal policy is proved.

17.
We consider a multi-queue multi-server system with n servers (processors) and m queues. A stationary and ergodic stream of m different types of requests with service requirements arrives at the system and is served according to the following k-limited head-of-the-line processor-sharing discipline: the first k requests at the head of the m queues are served in processor sharing by the n processors, where each request may receive at most the capacity of one processor. By means of sample path analysis and Loynes' monotonicity method, a stationary and ergodic state process is constructed, and necessary as well as sufficient conditions for the stability of the m separate queues are given, which are tight within the class of all stationary ergodic inputs. These conditions lead to tight necessary and sufficient conditions for the whole system, also in the case of permanent customers, generalizing an earlier result by the authors for the case n=k=1. This work was supported by a grant from the Siemens AG.

18.
Using matrix algebra we obtain a general equation for the sum, normalized with suitable constants, of all the expected hitting times in an ergodic Markov chain. This equation yields as corollaries, among others, Broder and Karlin's formula, Foster's nth formula and an expression of the Kirchhoff index in terms of the eigenvalues of the Laplacian.
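One identity in this family, the start-independence of the π-weighted sum of expected hitting times (the quantity that Broder and Karlin's formula relates to the chain's eigenvalues), can be checked numerically. The 3-state chain below is an arbitrary example of ours, and the fixed-point iterations are a pedestrian substitute for the paper's matrix algebra:

```python
# Numerical check of the "random target" identity on an arbitrary ergodic
# 3-state chain: sum_j pi_j * E_i[T_j] takes the same value for every
# starting state i (with E_i[T_i] = 0).  The stationary distribution and
# the expected hitting times are computed by plain fixed-point iteration.

P = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
n = len(P)

def stationary(P, iters=5000):
    """Stationary distribution via power iteration (P is ergodic)."""
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def hitting_times(P, j, iters=5000):
    """E_i[T_j] for all i, via the fixed point t_i = 1 + sum_k P_ik t_k, t_j = 0."""
    t = [0.0] * n
    for _ in range(iters):
        t = [0.0 if i == j else
             1.0 + sum(P[i][k] * t[k] for k in range(n))
             for i in range(n)]
    return t

pi = stationary(P)
T = [hitting_times(P, j) for j in range(n)]            # T[j][i] = E_i[T_j]
random_target = [sum(pi[j] * T[j][i] for j in range(n)) for i in range(n)]
```

All entries of `random_target` agree to numerical precision; expressing that common value through the eigenvalues of the chain is what Broder and Karlin's formula accomplishes.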

19.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known "vanishing discount factor approach". Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy among all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy among all (deterministic) stationary policies by constructing a "new" cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).

20.
This paper studies nonstationary Markov decision processes with the average cost criterion on a general state space. The result that, in the stationary case, the average cost optimality equation can be established via the optimality equation of an auxiliary discounted model is extended to the nonstationary setting. Using this result, the existence of an optimal policy is proved.
