Similar Articles
20 similar articles found.
1.
In this paper we are concerned with the existence of optimal stationary policies for infinite-horizon risk-sensitive Markov control processes with denumerable state space, unbounded cost function, and long-run average cost. Introducing a discounted cost dynamic game, we prove that its value function satisfies an Isaacs equation, and we study its relationship with the risk-sensitive control problem. Using the vanishing discount approach, we prove that the risk-sensitive dynamic programming inequality holds, and derive an optimal stationary policy. Accepted 1 October 1997
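The vanishing discount approach mentioned above can be illustrated numerically: for a fixed policy on a finite chain, the normalized discounted value (1 − β)V_β(s) approaches the long-run average cost as β → 1. Below is a minimal risk-neutral sketch of that limit; the two-state chain, the function name, and all parameters are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def discounted_value(P, c, beta, tol=1e-10):
    """Fixed-point iteration for the discounted value V_beta of one policy:
    V = c + beta * P @ V, a contraction with modulus beta."""
    v = np.zeros(len(c))
    while True:
        Tv = c + beta * P @ v
        if np.max(np.abs(Tv - v)) < tol:
            return Tv
        v = Tv

# Two-state chain with costs 0 and 2 and uniform transitions: the long-run
# average cost is 1, and (1 - beta) * V_beta approaches 1 as beta -> 1.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
c = np.array([0.0, 2.0])
for beta in (0.9, 0.99, 0.999):
    print(beta, (1 - beta) * discounted_value(P, c, beta)[0])
```

For this chain the exact limit is easy to check by hand, since the expected next-step value is the same from both states.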

2.
Discrete time countable state Markov decision processes with finite decision sets and bounded costs are considered. Conditions are given under which an unbounded solution to the average cost optimality equation exists and yields an optimal stationary policy. A new form of the optimality equation is derived for the case in which every stationary policy gives rise to an ergodic Markov chain.

3.
For average-cost Markov decision processes with a countable state space, nonempty action sets, and unbounded rewards, this paper proposes a new set of conditions under which an (ε-)optimal stationary policy exists, and under which the optimality inequality holds whenever the sum appearing in it is well defined.

4.
This paper deals with discrete-time Markov control processes with Borel state and control spaces, with possibly unbounded costs and noncompact control constraint sets, and the average cost criterion. Conditions are given for the convergence of the value iteration algorithm to the optimal average cost, and for a sequence of finite-horizon optimal policies to have an accumulation point which is average cost optimal. This research was partially supported by the Consejo Nacional de Ciencia y Tecnología (CONACyT) under grant 1332-E9206.
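As a concrete finite-state illustration of value iteration for the average cost criterion, the relative value iteration below subtracts the value at a reference state at every step so the iterates stay bounded; at convergence the subtracted quantity approximates the optimal average cost. The function name and the toy two-state, two-action model are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-9, max_iter=100_000):
    """Relative value iteration for a finite average-cost MDP.
    P[a]: transition matrix under action a; c[a]: one-stage cost vector.
    Returns (g, v): approximate optimal average cost and relative values."""
    n = c[0].shape[0]
    v = np.zeros(n)
    g = 0.0
    for _ in range(max_iter):
        # Bellman operator: minimal one-stage cost plus expected future value.
        Tv = np.min([c[a] + P[a] @ v for a in range(len(P))], axis=0)
        g, w = Tv[0], Tv - Tv[0]   # value at reference state 0; recentred iterate
        if np.max(np.abs(w - v)) < tol:
            break
        v = w
    return g, v

# Action 0 drives both states to state 0 (costs 0 and 1); action 1 drives both
# to state 1 (cost 2). The optimal average cost is 0 (stay at state 0 forever).
P = [np.array([[1.0, 0.0], [1.0, 0.0]]), np.array([[0.0, 1.0], [0.0, 1.0]])]
c = [np.array([0.0, 1.0]), np.array([2.0, 2.0])]
g, v = relative_value_iteration(P, c)
```

The recentring trick is what keeps the scheme usable in the average-cost case, where the unnormalized value iterates grow without bound.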

5.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets, and unbounded one-step costs for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.

6.
The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long-run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with a compact action space depending on the state variable. To do so, we first derive some important properties of a pseudo-Poisson equation associated with the problem. It is then shown that the convergence of the PIA to a solution satisfying the optimality equation holds under some classical hypotheses, and that this optimal solution yields an optimal control strategy for the average control problem of the continuous-time PDMP in feedback form.
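The paper's setting (continuous-time PDMPs, average cost, a pseudo-Poisson equation) is far more general, but the two alternating steps of the PIA, exact policy evaluation followed by greedy policy improvement, can be sketched in a finite-state discounted toy model. Every name, the discount factor, and the test MDP are illustrative assumptions:

```python
import numpy as np

def policy_iteration(P, c, beta=0.9):
    """Policy iteration for a finite discounted MDP.
    P[a]: transition matrix under action a; c[a]: one-stage cost vector."""
    n = c[0].shape[0]
    policy = np.zeros(n, dtype=int)
    while True:
        # Evaluation: solve the linear system (I - beta * P_f) v = c_f exactly.
        Pf = np.array([P[policy[s]][s] for s in range(n)])
        cf = np.array([c[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - beta * Pf, cf)
        # Improvement: pick the greedy action in every state.
        Q = np.array([c[a] + beta * P[a] @ v for a in range(len(P))])
        new_policy = np.argmin(Q, axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v   # no change: the policy is optimal
        policy = new_policy

# Action 0 stays put (costs 0 and 1); action 1 swaps the two states (cost 2).
# From state 1 it is cheaper to pay 2 once and then sit at state 0 forever.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
c = [np.array([0.0, 1.0]), np.array([2.0, 2.0])]
policy, v = policy_iteration(P, c, beta=0.9)
```

Starting from the all-zeros policy, one improvement step already reaches the optimal policy here, which is typical of PI's fast termination on small models.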

7.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.

8.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program.

9.
This paper studies the optimal stopping time for semi-Markov processes (SMPs) under the discounted optimization criterion with unbounded cost rates. In our work, we introduce an explicit construction of the equivalent semi-Markov decision processes (SMDPs). The equivalence is embodied in the expected discounted cost functions of SMPs and SMDPs: every stopping time of an SMP induces a policy of the SMDP such that the value functions are equal, and vice versa. The existence of the optimal stopping time of SMPs is proved by this equivalence relation. Next, we give the optimality equation of the value function and develop an effective iterative algorithm for computing it. Moreover, we show that the optimal and ε-optimal stopping times can be characterized by the hitting times of special sets. Finally, to illustrate the validity of our results, we conclude with an example of a maintenance system.
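The iterative algorithm for the value function of a stopping problem can be sketched in discrete time: the value is the fixed point of v = min(stopping cost, running cost + discounted expectation), and the optimal stopping region is where the stopping cost attains the minimum. The sketch below is a simplified discrete-time discounted analogue; the SMP/SMDP structure of the paper and all names here are illustrative assumptions:

```python
import numpy as np

def stopping_value(P, c_run, c_stop, beta=0.9, tol=1e-10):
    """Value iteration for a discrete-time discounted optimal stopping problem.
    In each state: stop and pay c_stop, or pay c_run, discount, and continue."""
    v = np.zeros(len(c_stop))
    while True:
        Tv = np.minimum(c_stop, c_run + beta * P @ v)
        if np.max(np.abs(Tv - v)) < tol:
            # Stop exactly where stopping is no worse than continuing.
            stop_set = c_stop <= c_run + beta * P @ Tv
            return Tv, stop_set
        v = Tv

# State 0 is cheap to stop in (cost 0), state 1 is expensive (cost 10),
# so the optimal rule continues from state 1 and stops on hitting state 0.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
c_run = np.array([1.0, 1.0])
c_stop = np.array([0.0, 10.0])
v, stop = stopping_value(P, c_run, c_stop, beta=0.9)
```

The returned `stop_set` is the "hitting the special set" characterization in miniature: the optimal stopping time is the first entry into that set.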

10.
For dynamic scheduling of multi-class systems where backorder cost is incurred per unit backordered, regardless of the time needed to satisfy backordered demand, the following models are considered: a cost model that minimizes the sum of expected average inventory holding and backorder costs, and a service model that minimizes expected average inventory holding cost under an aggregate fill rate constraint. Use of an aggregate fill rate constraint in the service model, instead of an individual fill rate constraint for each class, is justified by deriving equivalence relations between the considered cost and service models. Based on the numerical finding that the optimal policy for the cost model is a base-stock policy with switching curves and fixed base-stock levels, an alternative service model is considered over the class of base-stock controlled dynamic scheduling policies to minimize the total inventory (base-stock) investment under an aggregate fill rate constraint. The policy that solves this alternative model is proposed as an approximation of the optimal policy of the original cost and the equivalent service models. Very accurate heuristics are devised to approximate the proposed policy for given base-stock levels. Comparison with base-stock controlled First Come First Served (FCFS) and Longest Queue (LQ) policies, and with an extension of the LQ policy (Δ policy), shows that the proposed policy performs much better in solving the service models under consideration, especially when the traffic intensity is high.

11.
The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost allow an average cost optimal policy to be approximated by solving n-stage optimization problems (n = 1, 2, ...). The approach used ensures an exponential rate of convergence. Approximations of this type are useful for finding adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability. Research supported in part by Consejo Nacional de Ciencia y Tecnología (CONACYT) under grant 0635P-E9506. Research supported by Fondo del Sistema de Investigación del Mar de Cortés under Grant SIMAC/94/CT-005.

12.
This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.

13.
This paper deals with expected average cost (EAC) and discount-sensitive criteria for discrete-time Markov control processes on Borel spaces, with possibly unbounded costs. Conditions are given under which (a) EAC optimality and strong −1-discount optimality are equivalent; (b) strong 0-discount optimality implies bias optimality; and, conversely, under an additional hypothesis, (c) bias optimality implies strong 0-discount optimality. Thus, in particular, as the class of bias optimal policies is nonempty, (c) gives the existence of a strong 0-discount optimal policy, whereas from (b) and (c) we get conditions for bias optimality and strong 0-discount optimality to be equivalent. A detailed example illustrates our results.

14.
This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward obtained during a first passage time to some target set, and a constraint is imposed on the associated expected cost over this first passage time. The state space is denumerable, and the rewards/costs are possibly unbounded. In addition, the discount factor is state-action dependent and is allowed to be equal to one. We develop suitable conditions for the existence of a constrained optimal policy, which are generalizations of those for constrained MDPs with the standard discount criteria. Moreover, it is revealed that the constrained optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our results, which exhibits some advantages of our optimality conditions.

15.
In this paper we use policy-iteration to explore the behaviour of optimal control policies for lost sales inventory models with the constraint that not more than one replenishment order may be outstanding at any time. Continuous and periodic review, fixed and variable lead times, replenishment order sizes which are constrained to be an integral multiple of some fixed unit of transfer and service level constraint models are all considered. Demand is discrete and, for continuous review, assumed to derive from a compound Poisson process. It is demonstrated that, in general, neither the best (s, S) nor the best (r, Q) policy is optimal but that the best policy from within those classes will have a cost which is generally close to that of the optimal policy obtained by policy iteration. Finally, near-optimal computationally-efficient control procedures for finding (s, S) and (r, Q) policies are proposed and their performance illustrated.
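For a quick feel of how the cost of a given (s, S) policy can be evaluated, the Monte-Carlo sketch below simulates a periodic-review lost sales system with zero lead time. The cost parameters, the zero-lead-time simplification, and all names are illustrative assumptions rather than the models of the paper (which include positive lead times and compound Poisson demand):

```python
import random

def average_cost_sS(s, S, demand, h=1.0, p=5.0, K=10.0,
                    periods=50_000, seed=0):
    """Monte-Carlo average cost per period of an (s, S) policy in a
    periodic-review lost-sales model with zero lead time.
    h: holding cost/unit, p: lost-sale penalty/unit, K: fixed order cost.
    demand: callable taking an rng and returning one period's demand."""
    rng = random.Random(seed)
    inv, total = S, 0.0
    for _ in range(periods):
        if inv <= s:                 # review: order up to S, arrives at once
            total += K
            inv = S
        d = demand(rng)
        lost = max(d - inv, 0)       # unmet demand is lost, not backordered
        inv = max(inv - d, 0)
        total += h * inv + p * lost
    return total / periods

# Deterministic unit demand: the policy cycles every two periods
# (order + hold 1 unit, then hold 0), so the average cost is (K + h) / 2.
avg = average_cost_sS(s=0, S=2, demand=lambda rng: 1)
```

Plugging such an evaluator into a search over (s, S) pairs gives a crude version of the "find the best policy within a class" comparison the paper carries out with exact policy-iteration costs.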

16.
This paper deals with a class of semi-Markov control models with Borel state and control spaces, possibly unbounded costs, and unknown holding times distribution F. Assuming that F does not depend on state-action pairs, we combine suitable methods of statistical estimation of the mean holding time with control procedures to construct an average cost optimal Markovian policy $\hat{\pi}=\{f_{t}\}$.

17.
《Optimization》2012,61(2):191-210
We consider in this paper optimal control problems in which some of the constraint sets are unbounded. Firstly we deal with problems in which the control set is unbounded, so that 'impulses' are allowed as admissible controls and discontinuous functions as admissible trajectories. The second type of problem treated is that of infinite horizons, the time set being unbounded. Both classes of problems are treated in a similar way. Firstly, a problem is transformed into a semi-infinite linear programming problem by embedding the spaces of admissible trajectory-control pairs into spaces of measures. Then this is mapped into an appropriate nonstandard structure, where a near-minimizer is found for the nonstandard optimization; this entity is mapped back as a minimizer for the original problem. An appendix is included introducing the basic concepts of nonstandard analysis.

Numerical methods are presented for the estimation of the minimizing measure and the construction of nearly optimal trajectory-control pairs. Examples are given involving multiplicative controls.

18.
Guo Xianping, Dai Yonglong. Acta Mathematica Sinica, 2002, 45(1): 171-182
This paper considers the discounted model of continuous-time Markov decision processes with an arbitrary family of transition rates and a possibly unbounded cost rate function. Abandoning the traditional requirement that the Q-process corresponding to each policy be unique, it is the first to allow the Q-process under each policy to be non-unique, the family of transition rates to be non-conservative, the cost rate function to be unbounded, and the action space to be an arbitrary nonempty set. For the first time, the traditional α-discounted cost optimality equation is replaced by an "α-discounted cost optimality inequality". Using this optimality inequality and new methods, we not only prove the traditional main result, namely the existence of an optimal stationary policy, but also further investigate the existence of (ε > 0)-optimal stationary policies, of optimal stationary policies with monotonicity properties, and of (ε ≥ 0)-optimal decision processes, obtaining several meaningful new results. Finally, an example of a birth-and-death system with controlled migration rates is provided; it satisfies all the conditions of this paper, while none of the traditional assumptions (see [1-14]) hold.

19.
This paper deals with a one-dimensional controlled diffusion process on a compact interval with reflecting boundaries. The set of available actions is finite and the action can be changed only at countably many stopping times. The cost structure includes both a continuous movement cost rate depending on the state and the action, and a switching cost when the action is changed. The policies are evaluated with respect to the average cost criterion. The problem is solved by looking at, for each stationary policy, an embedded stochastic process corresponding to the state intervals visited in the sequence of switching times. The communicating classes of this process are classified into closed and transient groups and a method of calculating the average cost for the closed and transient classes is given. Also given are conditions to guarantee the optimality of a stationary policy. A Brownian motion control problem with quadratic cost is worked out in detail and the form of an optimal policy is established.

20.
In this paper, we discuss the 2-stage output procedure of a finite dam under the condition that water must be released by a fixed time. From this standpoint, the reservoir model we consider is subject to a sample path constraint and has a more general cost function than the earlier contributions. We analytically derive explicit formulas for the long-run average and the expected total discounted costs over an infinite time span and numerically calculate the optimal control policy. Finally, the optimal policy is compared with that of Zuckerman [1] and the effect of the fixed release time is discussed further.
