Similar Documents
20 similar documents found (search time: 280 ms).
1.
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space, (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.
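For reference, the constrained discounted problem treated in this entry can be written in a standard form (the notation below is ours, not the paper's; the expected average cost version replaces the discounted sums by Cesàro limits):

$$
\inf_{\pi}\ \mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{\infty}\beta^{t}\,c(X_t,A_t)\Big]
\quad\text{subject to}\quad
\mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{\infty}\beta^{t}\,d_k(X_t,A_t)\Big]\le V_k,\qquad k=1,\dots,K.
$$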

2.
This paper deals with Markov Decision Processes (MDPs) on Borel spaces with possibly unbounded costs. The criterion to be optimized is the expected total cost with a random horizon of infinite support. It is observed that this performance criterion is equivalent to the expected total discounted cost with an infinite horizon and a time-varying discount factor. The optimal value function and the optimal policy are then characterized through suitable versions of the Dynamic Programming Equation. Moreover, it is proved that the optimal value function of the optimal control problem with a random horizon can be bounded from above by the optimal value function of a discounted optimal control problem with a fixed discount factor. In this case, the discount factor is defined in an adequate way by the parameters introduced for the study of the random-horizon problem. To illustrate the theory developed, a version of the Linear-Quadratic model with a random horizon and a logarithmic Consumption-Investment model are presented.
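The equivalence between the random-horizon total cost and a varying-discount infinite-horizon cost can be sketched as follows (our notation; we assume, as is standard, that the random horizon $\tau$ is independent of the controlled process):

$$
\mathbb{E}\Big[\sum_{t=0}^{\tau} c(x_t,a_t)\Big]
=\sum_{t=0}^{\infty}\Pr(\tau\ge t)\,\mathbb{E}\big[c(x_t,a_t)\big]
=\mathbb{E}\Big[\sum_{t=0}^{\infty}\Big(\prod_{s=1}^{t}\alpha_s\Big)c(x_t,a_t)\Big],
\qquad
\alpha_t:=\frac{\Pr(\tau\ge t)}{\Pr(\tau\ge t-1)},
$$

so the survival probabilities of $\tau$ play the role of a time-varying discount factor, with $\Pr(\tau\ge t)=\prod_{s=1}^{t}\alpha_s$ since $\Pr(\tau\ge 0)=1$.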

3.
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.

4.
This paper addresses constrained Markov decision processes with expected discounted total cost criteria which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the sequence of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.
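As a concrete (unconstrained) illustration of the finite-to-infinite-horizon convergence mentioned above, here is a minimal value-iteration sketch in Python; the data are random toy inputs of our own and have nothing to do with the paper's model:

import numpy as np

# Toy illustration (our own unconstrained example, not the paper's constrained
# construction): finite-horizon dynamic programming for a discounted MDP,
# showing the n-horizon value functions V_n converging to the infinite-horizon
# value function V.
n_states, n_actions, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
c = rng.uniform(size=(n_states, n_actions))                       # one-stage cost c(s, a)

V = np.zeros(n_states)                       # V_0 = 0
for n in range(1, 500):
    Q = c + beta * (P @ V)                   # Q[s, a] = c(s, a) + beta * E[V(next state)]
    V_new = Q.min(axis=1)                    # V_n = minimum over actions
    if np.max(np.abs(V_new - V)) < 1e-10:    # contraction gives geometric convergence
        V = V_new
        break
    V = V_new

print("infinite-horizon value function (approx.):", V)
print("greedy deterministic stationary policy:  ", Q.argmin(axis=1))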

5.
In this paper we are concerned with the existence of optimal stationary policies for infinite-horizon risk-sensitive Markov control processes with denumerable state space, unbounded cost function, and long-run average cost. Introducing a discounted cost dynamic game, we prove that its value function satisfies an Isaacs equation, and we study its relationship with the risk-sensitive control problem. Using the vanishing discount approach, we prove that the risk-sensitive dynamic programming inequality holds, and we derive an optimal stationary policy.

6.
We study risk-sensitive control of continuous time Markov chains taking values in a discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem we characterize the value function via the Hamilton–Jacobi–Bellman equation and obtain an optimal Markov control. We do the same for the infinite horizon discounted cost case. In the infinite horizon average cost case we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.
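For the familiar risk-neutral discounted discrete-time case, policy iteration is the following loop (a minimal sketch with made-up data; the paper's algorithm targets risk-sensitive continuous-time chains, so its evaluation step differs):

import numpy as np

# Minimal policy-iteration sketch for a discounted discrete-time MDP (our own
# illustrative skeleton, not the paper's risk-sensitive continuous-time variant).
n_states, n_actions, beta = 4, 3, 0.95
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
c = rng.uniform(size=(n_states, n_actions))                       # one-stage cost

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - beta * P_pi) V = c_pi for the current policy.
    P_pi = P[np.arange(n_states), policy]          # shape (n_states, n_states)
    c_pi = c[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - beta * P_pi, c_pi)
    # Policy improvement: act greedily with respect to V.
    new_policy = (c + beta * (P @ V)).argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break                                       # stable policy => optimal
    policy = new_policy

print("optimal stationary policy:", policy, "value:", V)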

7.
This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given which shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.
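The vanishing-discount connection between the DC and AC criteria referred to above is, in a standard notation of ours,

$$
V_\beta(x)\;=\;\inf_{\pi}\,\mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{\infty}\beta^{t}c(X_t,A_t)\Big],
\qquad
\rho^{*}\;=\;\lim_{\beta\uparrow 1}\,(1-\beta)\,V_{\beta}(x),
$$

with Sennott-type conditions ensuring that the limit exists, is independent of the initial state $x$, and equals the optimal long-run average cost.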

8.
We consider a discrete time Markov Decision Process (MDP) under the discounted payoff criterion in the presence of additional discounted cost constraints. We study the sensitivity of optimal Stationary Randomized (SR) policies in this setting with respect to the upper bound on the discounted cost constraint functionals. We show that such sensitivity analysis leads to an improved version of the Feinberg–Shwartz algorithm (Math Oper Res 21(4):922–945, 1996) for finding optimal policies that are ultimately stationary and deterministic.

9.
We study an infinite horizon optimal stopping Markov problem which is either undiscounted (total reward) or with a general Markovian discount rate. Using ergodic properties of the underlying Markov process, we establish the feasibility of the stopping problem and prove the existence of optimal and ε-optimal stopping times. We show the continuity of the value function and its variational characterisation (in the viscosity sense) under different sets of assumptions satisfied by large classes of diffusion and jump–diffusion processes. In the case of a general discounted problem we relax a classical assumption that the discount rate is uniformly separated from zero.
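For diffusion-type models, the variational characterisation mentioned above is usually written (in our schematic notation, with $\mathcal{L}$ the generator of the underlying Markov process, $r(\cdot)\ge 0$ the Markovian discount rate, $f$ a running reward, and $g$ the stopping payoff) as

$$
\min\bigl\{\,r(x)\,V(x)-\mathcal{L}V(x)-f(x),\;\;V(x)-g(x)\bigr\}=0,
$$

understood in the viscosity sense; the undiscounted total-reward case corresponds to $r\equiv 0$.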

10.
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of Asymptotically Stationary (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithm of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion. This work was supported in part through United States-Israel Binational Science Foundation Grant BSF 85-00306.

11.
We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are determined by the given transition rates, which are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. By using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy which is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or at least to approximate) the discounted reward optimal value function as well as a discounted optimal stationary policy. Finally, we use an example to illustrate our results. Specifically, we first derive an explicit and exact solution to the DROE and an explicit expression of a discounted optimal stationary policy for this example.
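When the transition rates are uniformly bounded and the discount rate is a constant, the textbook computational route is uniformization followed by value iteration; the sketch below uses our own toy data and a constant discount rate alpha, so it is only a special case of the paper's setting with state-dependent discount factors, not the paper's algorithm itself:

import numpy as np

# Uniformization + value iteration for a continuous-time MDP with uniformly
# bounded transition rates and a constant discount rate alpha (a textbook
# special case; the paper allows state-dependent discounting).
n_states, n_actions, alpha = 3, 2, 0.1
rng = np.random.default_rng(2)
q = rng.uniform(0.0, 1.0, size=(n_states, n_actions, n_states))   # rates q[s, a, s'], s' != s
for s in range(n_states):
    q[s, :, s] = 0.0
    q[s, :, s] = -q[s, :].sum(axis=1)          # diagonal = minus total exit rate
r = rng.uniform(size=(n_states, n_actions))    # reward rates r(s, a)

exit_rates = -q[np.arange(n_states), :, np.arange(n_states)]
Lam = exit_rates.max() + 1.0                   # uniformization constant > all exit rates
P = np.eye(n_states)[:, None, :] + q / Lam     # P[s, a, s'] = delta(s, s') + q(s, a, s')/Lam
beta = Lam / (alpha + Lam)                     # equivalent discrete-time discount factor
R = r / (alpha + Lam)                          # equivalent one-step reward

V = np.zeros(n_states)
for _ in range(5000):
    V_new = (R + beta * (P @ V)).max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

print("discounted optimal value function (approx.):", V)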

12.
This paper considers a two-facility supply chain for a single product in which facility 1 orders the product from facility 2 and facility 2 orders the product from a supplier in each period. Orders placed by each facility are delivered after one of two possible nonnegative integer lead times, which differ by one period. Random demands in each period arise only at facility 1. There are physical storage constraints at both facilities in each period. The objective of the supply chain is to find an ordering policy that minimizes the expected cost over a finite horizon and the discounted stationary expected cost over an infinite horizon. We characterize the structure of the minimum expected cost and of the optimal ordering policy for both the finite and the discounted stationary infinite horizon problems.

13.
This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward obtained during a first passage time to some target set, while a constraint is imposed on the associated expected cost over this first passage time. The state space is denumerable, and the rewards/costs are possibly unbounded. In addition, the discount factor is state-action dependent and is allowed to be equal to one. We develop suitable conditions for the existence of a constrained optimal policy, which are generalizations of those for constrained MDPs with the standard discount criteria. Moreover, it is revealed that the constrained optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our results, which exhibits some advantages of our optimality conditions.

14.
The main purpose of this paper is to investigate the asymptotic behavior of the discounted risk-sensitive control problem for periodic diffusion processes when the discount factor $\alpha$ goes to zero. If $u_\alpha(\theta,x)$ denotes the optimal cost function, $\theta$ being the risk factor, then it is shown that $\lim_{\alpha\to 0}\alpha u_\alpha(\theta,x)=\xi(\theta)$ where $\xi(\theta)$ is the average on $]0,\theta[$ of the optimal cost of the (usual) infinite horizon risk-sensitive control problem.

15.
We study a classical stochastic optimal control problem with constraints and discounted payoff in an infinite horizon setting. The main result of the present paper is that this optimal control problem has the same value as a linear optimization problem stated on an appropriate space of probability measures. This enables one to derive a dual formulation that appears to be strongly connected to the notion of (viscosity sub)solution to a suitable Hamilton-Jacobi-Bellman equation. We also discuss the relation to long-time average problems.

16.
Yi Zhang, TOP (2013) 21(2): 378–408
In this paper we develop the convex analytic approach to a discounted discrete-time Markov decision process (DTMDP) in Borel state and action spaces with N constraints. Unlike the classic discounted models, we allow a non-constant discount factor. After defining and characterizing the corresponding occupation measures, the original constrained DTMDP is written as a convex program in the space of occupation measures, whose compactness and convexity we show. In particular, we prove that every extreme point of the space of occupation measures can be generated by a deterministic stationary policy for the DTMDP. For the resulting convex program, we prove that it admits a solution that can be expressed as a convex combination of N+1 extreme points of the space of occupation measures. One of its consequences is the existence of a randomized stationary optimal policy for the original constrained DTMDP.
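In the finite state/action special case, the convex program above reduces to an ordinary linear program over occupation measures. The sketch below uses our own random toy data with a single constraint (N = 1) and a constant discount factor; it is only meant to make the formulation concrete, not to reproduce the paper's Borel-space construction:

import numpy as np
from scipy.optimize import linprog

# Finite-space toy version of the occupation-measure linear program for a
# constrained discounted MDP (our own illustrative instance).
nS, nA, beta = 3, 2, 0.9
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))      # P[s, a, s']
c = rng.uniform(size=(nS, nA))                     # objective cost
d = rng.uniform(size=(nS, nA))                     # constraint cost
nu = np.full(nS, 1.0 / nS)                         # initial distribution
bound = d.min(axis=1).max() / (1 - beta)           # feasible by construction (greedy min-d policy meets it)

# Occupation-measure variables mu(s, a), flattened to a vector of length nS*nA.
# Balance equations: sum_a mu(s', a) - beta * sum_{s, a} P(s'|s, a) mu(s, a) = nu(s').
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        j = s * nA + a
        A_eq[s, j] += 1.0
        A_eq[:, j] -= beta * P[s, a]

res = linprog(c=c.ravel(),
              A_ub=d.ravel()[None, :], b_ub=[bound],
              A_eq=A_eq, b_eq=nu, bounds=(0, None), method="highs")
mu = res.x.reshape(nS, nA)
# A stationary (possibly randomized) policy is recovered by normalizing mu state-wise.
policy = mu / mu.sum(axis=1, keepdims=True)
print("optimal occupation measure:\n", mu)
print("induced stationary randomized policy:\n", policy)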

17.
In this paper we provide some sufficient conditions for the differentiability of the value function in a class of infinite-horizon continuous-time models of convex optimization arising in economics. We dispense with the assumption of interior optimal paths. This assumption is quite unnatural in constrained optimization, and is usually hard to check in applications. The differentiability of the value function is used to prove Bellman’s equation as well as the existence and continuity of the optimal feedback policy. We also establish the uniqueness of the vector of dual variables. These results become useful for the characterization and computation of optimal solutions.

18.
An optimal control problem is formulated for a simple epidemic model in which the epidemic is controlled by varying the scale of the quarantine program so as to minimize a discounted linear cost over an infinite horizon. An important function of the problem parameters is identified. It is shown that if this function has a value less than or equal to one, then the optimal policy is not to quarantine at all, while if this function assumes a value in excess of one, then the optimal policy is not to quarantine at all if the initial fraction of infectives is sufficiently high; otherwise, it is optimal to have a full-scale quarantine program. Slight modifications of these policies are required for the finite horizon version of the problem.

19.
We generalize the geometric discount of finite discounted cost Markov Decision Processes to “exponentially representable” discount functions, prove the existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal “N-stationary” policies in general do not exist.

20.
This paper deals with minimization of the variances of the total discounted costs for constrained Continuous-Time Markov Decision Processes (CTMDPs). The costs consist of cumulative costs incurred between jumps and instant costs incurred at jump epochs. We interpret discounting as an exponentially distributed stopping time. According to existing theory, for the expected total discounted costs optimal policies exist in the forms of randomized stationary and switching stationary policies. While the former is typically unique, the latter forms a finite set whose number of elements grows exponentially with the number of constraints. This paper investigates the problem when the process stops immediately after the first jump. For costs up to the first jump we provide an index for selection of actions by switching stationary policies and show that the indexed switching policy achieves a smaller variance than the randomized stationary policy. For problems without instant costs, the indexed switching policy achieves the minimum variance of costs up to the first jump among all the equivalent switching policies.

