Similar literature
 20 similar documents found
1.
This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.
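For reference, the two criteria discussed in this entry are commonly written, in standard notation (not quoted from the paper), as

V_\alpha(\pi,x) = E_x^\pi\big[\sum_{t=0}^{\infty} \alpha^t c(x_t,a_t)\big]   (discounted cost, 0 < \alpha < 1),
J(\pi,x) = \limsup_{n\to\infty} \frac{1}{n}\, E_x^\pi\big[\sum_{t=0}^{n-1} c(x_t,a_t)\big]   (long-run expected average cost),

and the vanishing-discount approach rests on comparing J(\pi,x) with \lim_{\alpha\uparrow 1}(1-\alpha)V_\alpha(\pi,x).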

2.
《Optimization》2012,61(4):773-800
Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by a technique of constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.
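In standard notation (illustrative only, not quoted from the paper), the risk-sensitive average cost of a policy \pi with risk-sensitivity coefficient \lambda > 0 and cost rate c is

J_\lambda(\pi,x) = \limsup_{T\to\infty} \frac{1}{\lambda T} \ln E_x^\pi\big[ \exp\big( \lambda \int_0^T c(x_t,a_t)\,dt \big) \big],

so that letting \lambda tend to 0 recovers the ordinary (risk-neutral) average cost.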

3.
This note concerns discrete-time controlled Markov chains with Borel state and action spaces. Given a nonnegative cost function, the performance of a control policy is measured by the superior limit risk-sensitive average criterion associated with a constant and positive risk sensitivity coefficient. Within such a framework, the discounted approach is used (a) to establish the existence of solutions for the corresponding optimality inequality, and (b) to show that, under mild conditions on the cost function, the optimal value functions corresponding to the superior and inferior limit average criteria coincide on a certain subset of the state space. The approach of the paper relies on standard dynamic programming ideas and on a simple analytical derivation of a Tauberian relation.
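The Tauberian relation invoked here is presumably of the classical Hardy–Littlewood type: for a nonnegative sequence (c_t),

\liminf_{n\to\infty} \frac{1}{n}\sum_{t=0}^{n-1} c_t \le \liminf_{\alpha\uparrow 1} (1-\alpha)\sum_{t=0}^{\infty} \alpha^t c_t \le \limsup_{\alpha\uparrow 1} (1-\alpha)\sum_{t=0}^{\infty} \alpha^t c_t \le \limsup_{n\to\infty} \frac{1}{n}\sum_{t=0}^{n-1} c_t,

which is what links the discounted approach to the superior and inferior limit average criteria.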

4.
In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).

5.
In this paper, we consider nonstationary Markov decision processes (MDP, for short) with an average variance criterion on a countable state space, finite action spaces and bounded one-step rewards. From the optimality equations provided in this paper, we translate the average variance criterion into a new average expected cost criterion. Then we prove that there exists a Markov policy, optimal for the original average expected reward criterion, that minimizes the average variance in the class of optimal policies for the original average expected reward criterion.

6.
For average-criterion Markov decision processes with a countable state set, nonempty action sets and unbounded rewards, this paper proposes a new set of conditions under which an (ε-)optimal stationary policy exists, and the optimality inequality also holds whenever the sum appearing in the optimality inequality is well defined.

7.
A minimax control problem for a coupled system of a semilinear elliptic equation and an obstacle variational inequality is considered. The major novelty of such a problem lies in the simultaneous presence of a nonsmooth state equation (variational inequality) and a nonsmooth cost function (sup norm). In this paper, the existence of optimal controls and the optimality conditions are established.

8.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we shall verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.
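The average cost optimality equation referred to in this entry has, in standard notation for a denumerable state space S, the form

g + h(x) = \min_{a\in A(x)} \big\{ c(x,a) + \sum_{y\in S} p(y\mid x,a)\, h(y) \big\},   x \in S,

where g is the optimal average cost and h a relative value function; under conditions such as those stated above, a stationary policy choosing minimizing actions in every state is average cost optimal.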

9.
We study perturbations of a discrete-time Markov control process on a general state space. The amount of perturbation is measured by means of the Kantorovich distance. We assume that an average (per unit of time on the infinite horizon) optimal control policy can be found for the perturbed (supposedly known) process, and that it is used to control the original (unperturbed) process. The one-stage cost is not assumed to be bounded. Under Lyapunov-like conditions we find upper bounds for the average cost excess when such an approximation is used in place of the optimal (unknown) control policy. As an application of these inequalities we consider the approximation by relevant empirical distributions. We illustrate our results by estimating the stability of a simple autoregressive control process. Examples of unstable processes are also provided.
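The Kantorovich distance used here to measure the perturbation is the Wasserstein-1 metric; in its dual (Kantorovich–Rubinstein) form,

W_1(\mu,\nu) = \sup\big\{ \int f\,d\mu - \int f\,d\nu : f \text{ is 1-Lipschitz} \big\},

or, equivalently, the minimal expected distance E\,d(X,Y) over all couplings of X \sim \mu and Y \sim \nu.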

10.
This paper studies nonstationary Markov decision processes with average cost on a general state space. The result that, in the stationary case, the average cost optimality equation can be established via the optimality equation of an auxiliary discounted model is extended to the nonstationary case. This result is then used to prove the existence of optimal policies.

11.
This work is concerned with controlled Markov chains with finite state and action spaces. It is assumed that the decision maker has an arbitrary but constant risk sensitivity coefficient, and that the performance of a control policy is measured by the long-run average cost criterion. Within this framework, the existence of solutions of the corresponding risk-sensitive optimality equation for an arbitrary cost function is characterized in terms of communication properties of the transition law.
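In one common formulation (not quoted from the paper), the risk-sensitive average cost optimality equation for a finite model with risk-sensitivity coefficient \lambda reads

e^{\lambda (g + h(x))} = \min_{a\in A(x)} \big\{ e^{\lambda c(x,a)} \sum_{y} p(y\mid x,a)\, e^{\lambda h(y)} \big\},

a multiplicative analogue of the ordinary average cost optimality equation; the result described above characterizes, via the communication properties of the transition law, when such a pair (g, h) exists for every cost function.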

12.
This paper deals with a new optimality criterion consisting of the usual three average criteria together with the canonical triplet (called the strong average-canonical optimality criterion), and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary controlled Markov processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε-optimal (ε > 0) Markov policies is given.

13.
14.
This paper investigates finite horizon semi-Markov decision processes with denumerable states. Optimality is over the class of all randomized history-dependent policies, where the histories include the states and also the planning horizons, and the cost rate function is assumed to be bounded below. Under suitable conditions, we show that the value function is a minimum nonnegative solution to the optimality equation and that there exists an optimal policy. Moreover, we develop an effective algorithm for computing optimal policies, derive some properties of optimal policies, and illustrate our main results with a maintenance system.

15.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and of an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature.
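For orientation only (the paper works with continuous-time processes on general spaces; the familiar finite, discrete-time version is shown here), policy iteration for the average reward criterion alternates two steps:

policy evaluation: solve  g_\pi + h_\pi(x) = r(x,\pi(x)) + \sum_y p(y\mid x,\pi(x))\, h_\pi(y)  for (g_\pi, h_\pi);
policy improvement: \pi'(x) \in \arg\max_{a\in A(x)} \big\{ r(x,a) + \sum_y p(y\mid x,a)\, h_\pi(y) \big\},

stopping when the improvement step no longer changes the policy.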

16.
In an M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. On the other hand, rejecting new arrivals increases the percentage of time servers are idle, which also may not be desirable. We address these trade-offs by considering an admission control problem for an M/M/N+M queue when there are costs associated with customer abandonment, server idleness, and turning away customers. First, we formulate the relevant Markov decision process (MDP), show that the optimal policy is of threshold form, and provide a simple and efficient iterative algorithm that does not presuppose a bounded state space to compute the minimum infinite horizon expected average cost and the associated threshold level. Under certain conditions we can guarantee that the algorithm provides an exact optimal solution when it stops; otherwise, the algorithm stops when a provided bound on the optimality gap is reached. Next, we solve the approximating diffusion control problem (DCP) that arises in the Halfin–Whitt many-server limit regime. This allows us to establish that the parameter space has a sharp division. Specifically, there is an optimal solution with a finite threshold level when the cost of an abandonment exceeds the cost of rejecting a customer; otherwise, there is an optimal solution that exercises no control. This analysis also yields a convenient analytic expression for the infinite horizon expected average cost as a function of the threshold level. Finally, we propose a policy for the original system that is based on the DCP solution, and show that this policy is asymptotically optimal. Our extensive numerical study shows that the control that arises from solving the DCP achieves a very similar cost to the control that arises from solving the MDP, even when the number of servers is small.
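To illustrate the kind of computation involved, the following is a minimal Python sketch of relative value iteration for a truncated, uniformized version of the admission-control MDP. It is not the authors' algorithm (which does not presuppose a bounded state space), and all parameter names and values are hypothetical.

import numpy as np

# Hypothetical M/M/N+M admission-control data: arrival, service and abandonment rates, number of servers.
lam, mu, theta, N = 1.9, 1.0, 0.5, 2
c_aban, c_idle, c_rej = 5.0, 1.0, 3.0   # cost per abandonment, per idle server per unit time, per rejection
X_MAX = 200                             # truncation level; the paper's algorithm avoids this assumption

x = np.arange(X_MAX + 1)                # state = number of customers in the system
busy = np.minimum(x, N)
waiting = np.maximum(x - N, 0)
cost_rate = c_idle * (N - busy) + c_aban * theta * waiting            # idleness + abandonment cost rate
Lambda = lam + N * mu + (X_MAX - N) * theta                           # uniformization rate
p_arr, p_dep = lam / Lambda, (mu * busy + theta * waiting) / Lambda   # event probabilities per period

def bellman(h):
    # One step of the uniformized average-cost Bellman operator, plus the induced accept/reject decision.
    accept = np.append(h[1:], np.inf)            # accept an arrival: move to x+1 (impossible at the truncation)
    reject_val = c_rej + h                       # reject an arrival: pay a lump cost, stay at x
    arrival = np.minimum(accept, reject_val)
    departure = np.concatenate(([h[0]], h[:-1]))  # service completion or abandonment: move to x-1
    Th = cost_rate / Lambda + p_arr * arrival + p_dep * departure + (1 - p_arr - p_dep) * h
    return Th, reject_val < accept

h = np.zeros(X_MAX + 1)
for _ in range(100000):
    Th, reject = bellman(h)
    g = Th[0] - h[0]                    # average cost per uniformized period
    if np.ptp(Th - h) < 1e-9:           # span seminorm stopping rule
        break
    h = Th - Th[0]                      # relative value iteration: keep h(0) = 0

threshold = int(np.argmax(reject)) if reject.any() else None
print("average cost per unit time ~", g * Lambda, "; reject arrivals when x >=", threshold)

Consistent with the diffusion analysis described above, these (hypothetical) numbers have the abandonment cost exceeding the rejection cost, so the computed policy exhibits a finite threshold.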

17.
We study infinite-horizon asymptotic average optimality for parallel server networks with multiple classes of jobs and multiple server pools in the Halfin–Whitt regime. Three control formulations are considered: (1) minimizing the queueing and idleness cost, (2) minimizing the queueing cost under constraints on idleness at each server pool, and (3) fairly allocating the idle servers among different server pools. For the third problem, we consider a class of bounded-queue, bounded-state (BQBS) stable networks, in which any moment of the state is bounded by that of the queue only (for both the limiting diffusion and diffusion-scaled state processes). We show that the optimal values for the diffusion-scaled state processes converge to the corresponding values of the ergodic control problems for the limiting diffusion. We present a family of state-dependent Markov balanced saturation policies (BSPs) that stabilize the controlled diffusion-scaled state processes. It is shown that under these policies, the diffusion-scaled state process is exponentially ergodic, provided that at least one class of jobs has a positive abandonment rate. We also establish useful moment bounds, and study the ergodic properties of the diffusion-scaled state processes, which play a crucial role in proving the asymptotic optimality.

18.
In this paper we are concerned with the existence of optimal stationary policies for infinite-horizon risk-sensitive Markov control processes with denumerable state space, unbounded cost function, and long-run average cost. Introducing a discounted cost dynamic game, we prove that its value function satisfies an Isaacs equation, and its relationship with the risk-sensitive control problem is studied. Using the vanishing discount approach, we prove that the risk-sensitive dynamic programming inequality holds, and derive an optimal stationary policy. Accepted 1 October 1997

19.
This paper deals with a one-dimensional controlled diffusion process on a compact interval with reflecting boundaries. The set of available actions is finite and the action can be changed only at countably many stopping times. The cost structure includes both a continuous movement cost rate depending on the state and the action, and a switching cost when the action is changed. The policies are evaluated with respect to the average cost criterion. The problem is solved by looking at, for each stationary policy, an embedded stochastic process corresponding to the state intervals visited in the sequence of switching times. The communicating classes of this process are classified into closed and transient groups and a method of calculating the average cost for the closed and transient classes is given. Also given are conditions to guarantee the optimality of a stationary policy. A Brownian motion control problem with quadratic cost is worked out in detail and the form of an optimal policy is established.

20.
An average-reward Markov decision process (MDP) with discrete-time parameter, denumerable state space, and bounded reward function is considered. With such a model, we associate a family of MDPs. Then, we determine necessary conditions for the existence of a bounded solution to the optimality equation for each one of the models in the family. Moreover, necessary and sufficient conditions are given so that the optimality equations have a bounded solution with an additional property. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant No. PCEXCNA-040640. Dedicated to Professor Eutimio Alberto Cuéllar-Goríbar on the occasion of his eightieth birthday.
