Similar Articles
A total of 20 similar articles were retrieved (search time: 15 ms).
1.
We consider a multi-server Markovian queueing system with abandonments in which admitted customers pay a reward either at the time of arrival or at service completion. There is a cost associated with abandonments and a holding cost associated with customers in the system. We prove that the policy that maximizes the long-run average reward is of threshold type and completely characterize the optimal thresholds. We conclude with a comparison of various characteristics of the two variants of the model.
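A minimal numerical sketch of how such a threshold policy can be evaluated (not the authors' model or proof): if arrivals are admitted only while fewer than T customers are present, an M/M/c queue with exponential abandonment becomes a finite birth-death chain, and its stationary distribution gives the long-run average reward of the threshold-T policy. The function name and all rates, costs, and reward values below are hypothetical.

```python
import numpy as np

def avg_reward(T, lam=5.0, mu=1.0, c=3, theta=0.5, reward=10.0, hold=1.0, aband=4.0):
    """Long-run average reward of a threshold-T admission policy in an
    M/M/c queue with abandonment rate theta per waiting customer.
    States are 0..T (arrivals are rejected once T customers are present)."""
    # Birth rates: admit (rate lam) only while fewer than T customers are present.
    birth = np.array([lam if n < T else 0.0 for n in range(T + 1)])
    # Death rates: service completions plus abandonments of waiting customers.
    death = np.array([min(n, c) * mu + max(n - c, 0) * theta for n in range(T + 1)])
    # Stationary distribution of the birth-death chain (detailed balance).
    pi = np.ones(T + 1)
    for n in range(T):
        pi[n + 1] = pi[n] * birth[n] / death[n + 1]
    pi /= pi.sum()
    states = np.arange(T + 1)
    accepted_rate = (pi * birth).sum()                          # admitted customers per unit time
    abandon_rate = (pi * np.maximum(states - c, 0) * theta).sum()
    holding = (pi * states).sum()
    return reward * accepted_rate - aband * abandon_rate - hold * holding

best_T = max(range(1, 40), key=avg_reward)
print(best_T, avg_reward(best_T))
```

Sweeping T and keeping the maximizer is a brute-force stand-in for the explicit characterization of the optimal threshold obtained in the paper.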

2.
This paper addresses constrained Markov decision processes with expected discounted total cost criteria that are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite-horizon value functions to the infinite-horizon value function is also shown. A simple example illustrating an application is presented.
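As a hedged illustration of the convergence result (for an unconstrained discounted MDP only, not the constrained, non-randomized setting of the paper), plain value iteration computes the n-step finite-horizon values and shows their geometric convergence to the infinite-horizon value; the cost and transition data below are made up.

```python
import numpy as np

np.random.seed(0)
S, A, gamma = 4, 2, 0.9
# Hypothetical cost and transition data.
cost = np.random.rand(S, A)
P = np.random.rand(A, S, S)
P /= P.sum(axis=2, keepdims=True)

# v is the optimal expected discounted cost over an n-step horizon
# (zero terminal cost); it converges geometrically to the infinite-horizon value.
v = np.zeros(S)
for n in range(1, 201):
    q = cost + gamma * np.einsum('ast,t->sa', P, v)
    v_next = q.min(axis=1)
    if np.max(np.abs(v_next - v)) < 1e-10:
        break
    v = v_next
print(n, v)
```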

3.
4.
Kuri, Joy; Kumar, Anurag. Queueing Systems, 1997, 27(1-2): 1-16
We consider a problem of admission control to a single queue in discrete time. The controller has access only to k-step-old queue lengths, where k can be arbitrary. The problem is motivated, in particular, by recent advances in high-speed networking where information delays have become prominent. We formulate the problem in the framework of Completely Observable Controlled Markov Chains, in terms of a multi-dimensional state variable. Exploiting the structure of the problem, we show that under appropriate conditions, the multi-dimensional Dynamic Programming Equation (DPE) can be reduced to a unidimensional one. We then provide simple computable upper and lower bounds on the optimal value function corresponding to the reduced unidimensional DPE. These upper and lower bounds, along with a certain relationship among the parameters of the problem, enable us to partially deduce the structural features of the optimal policy. Our approach enables us to recover simply, in part, the recent results of Altman and Stidham, who have shown that a multiple-threshold-type policy is optimal for this problem. Further, under the same relationship among the parameters of the problem, we provide easily computable upper bounds on the multiple thresholds and show the existence of simple relationships among these upper bounds. These relationships allow us to gain very useful insights into the nature of the optimal policy. In particular, the insights obtained are of great importance for the problem of actually computing an optimal policy, because they reduce the search space enormously.
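A hedged simulation sketch of the setting (not the authors' DPE reduction or multiple-threshold construction): a discrete-time single queue in which admission is decided from the queue length observed k slots ago, evaluated under a single-threshold rule. The function name and all parameter values are invented for illustration.

```python
import random

def avg_net_reward(threshold, k=3, p=0.45, q=0.5, reward=5.0, hold=1.0,
                   slots=200_000, seed=1):
    """Discrete-time single queue: one potential arrival per slot (prob. p),
    one potential departure per slot (prob. q).  Admission is decided from
    the queue length observed k slots in the past."""
    random.seed(seed)
    n = 0
    history = [0] * (k + 1)          # history[0] is the k-step-old queue length
    total = 0.0
    for _ in range(slots):
        observed = history[0]
        if random.random() < p:       # a customer arrives this slot
            if observed < threshold:  # admit based on the stale observation
                n += 1
                total += reward
        if n > 0 and random.random() < q:
            n -= 1                    # service completion
        total -= hold * n             # holding cost for customers in the system
        history = history[1:] + [n]
    return total / slots

for t in (2, 4, 6, 8):
    print(t, round(avg_net_reward(t), 3))
```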

5.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend use of this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
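A minimal sketch of average-reward policy iteration under a unichain assumption (it does not implement the paper's communicating-MDP improvement rule): each evaluation step solves the linear system g + h(s) = r(s, a(s)) + sum_t P(t | s, a(s)) h(t) with h(0) = 0, and the improvement step is the usual greedy update. The random model below is hypothetical.

```python
import numpy as np

def unichain_policy_iteration(P, r, max_iter=100):
    """Average-reward policy iteration.  P has shape (A, S, S), r shape (S, A).
    Assumes every policy encountered is unichain, so the evaluation equations
    have a unique solution under the normalization h(0) = 0."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        P_pi = P[policy, np.arange(S)]           # (S, S) transition matrix of the policy
        r_pi = r[np.arange(S), policy]
        # Solve g * 1 + (I - P_pi) h = r_pi together with h(0) = 0.
        M = np.zeros((S + 1, S + 1))
        M[:S, 0] = 1.0
        M[:S, 1:] = np.eye(S) - P_pi
        M[S, 1] = 1.0                            # normalization h(0) = 0
        sol = np.linalg.solve(M, np.append(r_pi, 0.0))
        g, h = sol[0], sol[1:]
        q = r + np.einsum('ast,t->sa', P, h)     # improvement step
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, g
        policy = new_policy
    return policy, g

np.random.seed(2)
A, S = 3, 5
P = np.random.dirichlet(np.ones(S), size=(A, S))  # hypothetical transition kernels
r = np.random.rand(S, A)                          # hypothetical rewards
print(unichain_policy_iteration(P, r))
```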

6.
This paper deals with a new optimality criterion consisting of the usual three average criteria together with the canonical triplet (the so-called strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16] for stationary controlled Markov processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e., strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε-optimal (ε > 0) Markov policies is given.
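A hedged sketch of the rolling horizon idea only (not the paper's error bound or canonical terminal functions): at each decision epoch, solve an H-step finite-horizon DP with a terminal reward vector and apply the first action. The model data and the terminal vector below are hypothetical.

```python
import numpy as np

def rolling_horizon_action(s, P, r, H, terminal):
    """First action of an H-step finite-horizon optimal policy started in s,
    with `terminal` as the terminal reward vector (a stand-in for the paper's
    additional terminal functions)."""
    v = terminal.copy()
    first_action = None
    for _ in range(H):
        q = r + np.einsum('ast,t->sa', P, v)   # backward induction, one stage
        v = q.max(axis=1)
        first_action = q.argmax(axis=1)        # after the last pass this is the stage-0 decision
    return first_action[s]

np.random.seed(3)
A, S, H = 2, 4, 15
P = np.random.dirichlet(np.ones(S), size=(A, S))  # hypothetical model
r = np.random.rand(S, A)
print([rolling_horizon_action(s, P, r, H, terminal=np.zeros(S)) for s in range(S)])
```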

7.
The Approximating Sequence Method for the computation of average cost optimal stationary policies in denumerable state Markov decision chains, introduced in Sennott (1994), is reviewed. New methods for verifying the assumptions are given. These are useful for models with multidimensional state spaces that satisfy certain mild structural properties. The results are applied to four problems in the optimal routing of packets to parallel queues. Numerical results are given for one of the models.
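A hedged illustration of the truncation idea behind the Approximating Sequence Method (it does not verify Sennott's assumptions): relative value iteration is run on successively larger truncations {0, ..., N} of a uniformized M/M/1-type admission-control chain, and the resulting optimal average costs stabilize as N grows. All rates and costs are hypothetical.

```python
import numpy as np

def truncated_avg_cost(N, lam=0.4, mu=0.5, hold=1.0, reject=5.0, iters=5000):
    """Relative value iteration on the truncation {0,...,N} of a discrete-time
    (uniformized, lam + mu <= 1) M/M/1 admission-control model."""
    h = np.zeros(N + 1)
    g = 0.0
    for _ in range(iters):
        new = np.empty_like(h)
        for n in range(N + 1):
            nxt = min(n + 1, N)
            prv = max(n - 1, 0)
            # Action "admit": an arriving customer joins with probability lam.
            admit = hold * n + lam * h[nxt] + mu * h[prv] + (1 - lam - mu) * h[n]
            # Action "reject": pay the rejection cost, queue length unchanged by arrivals.
            rejct = hold * n + reject * lam + lam * h[n] + mu * h[prv] + (1 - lam - mu) * h[n]
            new[n] = min(admit, rejct)
        g = new[0] - h[0]            # average-cost estimate (relative value iteration)
        h = new - new[0]
    return g

for N in (10, 20, 40, 80):
    print(N, round(truncated_avg_cost(N), 4))
```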

8.
We consider a two-class preemptive priority queue in which there are two essential on-line decisions that have to be taken. The first is the decision to either accept or reject new type-1 or type-2 jobs. The second is the decision to abort jobs, i.e., to remove any type-1 or type-2 jobs from the system. We show that there exist optimal threshold policies for these two types of decisions.

9.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure for determining an optimal policy relative to a procedure for determining a suboptimal policy based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise. This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.

10.
This paper is concerned with the adaptive control problem, over the infinite horizon, for partially observable Markov decision processes whose transition functions are parameterized by an unknown vector. We treat finite models and impose relatively mild assumptions on the transition function. Provided that a sequence of parameter estimates converging in probability to the true parameter value is available, we show that the certainty equivalence adaptive policy is optimal in the long-run average sense.

11.
A survey of Markov decision models for control of networks of queues
We review models for the optimal control of networks of queues. Our main emphasis is on models based on Markov decision theory and the characterization of the structure of optimal control policies. This research was partially supported by the National Science Foundation under Grant No. DDM-8719825. The Government has certain rights in this material. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The research was also partially supported by the C.I.E.S. (France), while the author was on leave at INRIA, Sophia-Antipolis, 1991-92.

12.
Optimization, 2012, 61(9): 1133-1150
This article presents a new linear programming (LP) method for solving Markov decision processes (MDPs) based on the simplex method (SM). The SM has been shown to be the most efficient method in many practical problems; unfortunately, the classical SM has exponential complexity. Therefore, new simplex methods have emerged for obtaining optimal solutions in the most efficient way. The cosine simplex method (CSM) is one of them. CSM is based on the Karush-Kuhn-Tucker conditions and is able to efficiently solve general LP problems. This work presents a new method, named the Markov Cosine Simplex Method (MCSM), for solving MDP problems, which is an extension of CSM. In this article, the efficiency of MCSM is compared to that of the traditional revised simplex method (RSM); experimental results show that MCSM is far more efficient than RSM.
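For comparison, the standard primal LP formulation of a discounted MDP can be solved directly with an off-the-shelf LP solver (this hedged sketch uses SciPy's HiGHS backend, not the cosine simplex method of the article): minimize the sum of v(s) subject to v(s) >= r(s,a) + gamma * sum_{s'} P(s'|s,a) v(s') for all (s,a). The model data below are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

np.random.seed(4)
S, A, gamma = 5, 3, 0.9
r = np.random.rand(S, A)                             # hypothetical rewards
P = np.random.dirichlet(np.ones(S), size=(S, A))     # P[s, a, s'], hypothetical

# LP:  min sum_s v(s)   s.t.   v(s) - gamma * sum_s' P(s'|s,a) v(s') >= r(s, a)
c = np.ones(S)
A_ub, b_ub = [], []
for s in range(S):
    for a in range(A):
        row = -np.eye(S)[s] + gamma * P[s, a]        # linprog expects A_ub @ v <= b_ub
        A_ub.append(row)
        b_ub.append(-r[s, a])
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(None, None)] * S)
v = res.x
# Greedy (optimal) policy recovered from the LP value function.
policy = np.argmax(r + gamma * np.einsum('sat,t->sa', P, v), axis=1)
print(np.round(v, 3), policy)
```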

13.
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. This formula follows directly from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms that use a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature.

14.
We consider an M|E_N|1 queue in which there are two essential on-line decisions that have to be taken. The first one is the decision to either accept or reject new jobs. The second one is the decision to either continue or abort the service of a job. We show that under certain regularity conditions, there exist optimal threshold policies for these two types of decisions.

15.
In this paper we construct an algorithm of successive approximation type that computes a locally optimal periodic policy for a Markov decision chain with partial state information. The algorithm is applied to a queueing network with server control. The research of this author has been supported by the Netherlands Organization for Scientific Research (N.W.O.).

16.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.

17.
We consider discrete-time average reward Markov decision processes with denumerable state space and bounded reward function. Under structural restrictions on the model the existence of an optimal stationary policy is proved; both the lim inf and lim sup average criteria are considered. In contrast to the usual approach, our results do not rely on the average reward optimality equation. Rather, the arguments are based on well-known facts from Renewal Theory. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grants PCEXCNA 040640 and 050156, and by SEMAC under Grant 89-1/00ifn$.

18.
In this paper we extend standard dynamic programming results for the risk sensitive optimal control of discrete time Markov chains to a new class of models. The state space is only finite, but now the assumptions about the Markov transition matrix are much less restrictive. Our results are then applied to the financial problem of managing a portfolio of assets which are affected by Markovian microeconomic and macroeconomic factors and where the investor seeks to maximize the portfolio's risk adjusted growth rate.
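A hedged sketch of the risk-sensitive evaluation step for a tiny model (not the paper's dynamic programming results): under exponential utility with risk parameter theta and an irreducible chain, a stationary policy's risk-adjusted growth rate is (1/theta) log of the Perron root of the matrix with entries exp(theta * r(s)) P(s, s'), and a small MDP can be optimized by enumerating deterministic policies. Irreducibility is assumed here, which is stronger than the paper's conditions; all data are hypothetical.

```python
import itertools
import numpy as np

np.random.seed(5)
S, A, theta = 4, 2, 0.5
P = np.random.dirichlet(np.ones(S), size=(A, S))   # P[a, s, s'], hypothetical
r = np.random.rand(S, A)                           # per-step log-growth, hypothetical

def risk_adjusted_growth(policy):
    """(1/theta) * log of the Perron root of diag(exp(theta * r_pi)) @ P_pi,
    i.e. lim (1/(n*theta)) log E[exp(theta * sum of rewards)] for an
    irreducible chain under the given stationary policy."""
    P_pi = P[policy, np.arange(S)]
    r_pi = r[np.arange(S), policy]
    M = np.exp(theta * r_pi)[:, None] * P_pi
    rho = max(abs(np.linalg.eigvals(M)))            # Perron root (spectral radius)
    return np.log(rho) / theta

best = max(itertools.product(range(A), repeat=S),
           key=lambda pol: risk_adjusted_growth(np.array(pol)))
print(best, round(risk_adjusted_growth(np.array(best)), 4))
```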

19.
Recent results for parameter-adaptive Markov decision processes (MDP's) are extended to partially observed MDP's depending on unknown parameters. These results include approximations converging uniformly to the optimal reward function and asymptotically optimal adaptive policies. This research was supported in part by the Consejo del Sistema Nacional de Educación Tecnologica (COSNET) under Grant 178/84, in part by the Air Force Office of Scientific Research under Grant AFOSR-84-0089, in part by the National Science Foundation under Grant ECS-84-12100, and in part by the Joint Services Electronics Program under Contract F49602-82-C-0033.

20.