Similar Articles
20 similar articles found.
1.
Ambulance diversion (AD) is used by emergency departments (EDs) to relieve congestion by requesting ambulances to bypass the ED and transport patients to another facility. We study optimal AD control policies using a Markov Decision Process (MDP) formulation that minimizes the average time that patients wait beyond their recommended safety time threshold. The model assumes that patients can be treated in one of two treatment areas and that the distribution of the time to start treatment at the neighboring facility is known. Assuming Poisson arrivals and exponential times for the length of stay in the ED, we show that the optimal AD policy follows a threshold structure, and explore the behavior of optimal policies under different scenarios. We analyze the value of information on the time to start treatment in the neighboring hospital, and show that optimal policies depend strongly on the congestion experienced by the other facility. Simulation is used to compare the performance of the proposed MDP model to that of simple heuristics under more realistic assumptions. Results indicate that the MDP model performs significantly better than the tested heuristics in most cases. Finally, we discuss practical issues related to the implementation of the policies prescribed by the MDP.
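To make the threshold structure concrete, here is a minimal value-iteration sketch of an accept/divert queueing MDP. The single-queue state space, uniformized rates, and costs are hypothetical simplifications, not the paper's two-treatment-area model or its safety-time objective.

```python
import numpy as np

# Hypothetical uniformized arrival/service probabilities, a linear
# congestion cost, and a per-patient diversion penalty.
LAM, MU, N = 0.45, 0.5, 30          # arrival prob, service prob, state cap
C_WAIT, C_DIVERT, GAMMA = 1.0, 8.0, 0.999

def q_values(s, V):
    down = V[max(s - 1, 0)]         # state after a service completion
    stay = C_WAIT * s               # congestion cost while s patients wait
    q_accept = stay + GAMMA * (LAM * V[min(s + 1, N)] + MU * down
                               + (1 - LAM - MU) * V[s])
    # under diversion, arrivals pay a penalty and go elsewhere
    q_divert = stay + LAM * C_DIVERT + GAMMA * (MU * down + (1 - MU) * V[s])
    return q_accept, q_divert

V = np.zeros(N + 1)
for _ in range(5000):               # value iteration
    V = np.array([min(q_values(s, V)) for s in range(N + 1)])

policy = ["divert" if q_values(s, V)[1] < q_values(s, V)[0] else "accept"
          for s in range(N + 1)]
print(policy)  # typically accept below some queue length, divert above it
```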

2.
Motivated by an application to school funding, we introduce the notion of a robust decomposable Markov decision process (MDP). A robust decomposable MDP model applies to situations where several MDPs, each with transition probabilities known only through an uncertainty set, are coupled together by joint resource constraints. Robust decomposable MDPs differ from both decomposable MDPs and robust MDPs, and cannot be solved by a direct application of the solution methods from either of those areas. In fact, to the best of our knowledge, there is no known method to tractably compute optimal policies in robust decomposable MDPs. We show how to tractably compute good policies for this model, and apply the derived method to a stylized school funding example.
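For intuition, the robust (max-min) Bellman update for a single uncoupled MDP with a finite uncertainty set of transition kernels can be sketched as below; the joint resource constraints that make the paper's model hard are omitted, and all data are hypothetical.

```python
import numpy as np

nS, nA, GAMMA = 3, 2, 0.9
R = np.array([[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]])     # reward R[s, a]

rng = np.random.default_rng(0)
def random_kernel():
    P = rng.random((nS, nA, nS))
    return P / P.sum(axis=2, keepdims=True)            # rows sum to 1
UNCERTAINTY_SET = [random_kernel() for _ in range(4)]  # candidate kernels

V = np.zeros(nS)
for _ in range(500):
    # worst case over the uncertainty set, then the best action (max-min)
    Q = np.min([R + GAMMA * (P @ V) for P in UNCERTAINTY_SET], axis=0)
    V = Q.max(axis=1)
print("robust values:", np.round(V, 3), "policy:", Q.argmax(axis=1))
```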

3.
We consider a Markov decision process (MDP) under the average reward criterion. We investigate the decomposition of such an MDP into smaller MDPs by using the strongly connected classes in the associated graph. Then, by introducing the associated levels, we construct an aggregation-disaggregation algorithm for computing an optimal strategy for the original MDP.
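The graph step can be sketched as follows on a hypothetical transition structure: build the directed graph of positive-probability transitions, condense it into its strongly connected classes, and order the classes into levels (shown here with networkx; only the processing order is computed, not the aggregation-disaggregation solve itself).

```python
import networkx as nx

# state -> states reachable under some action (hypothetical MDP)
transitions = {0: [1], 1: [0, 2], 2: [3], 3: [2, 4], 4: [4]}

G = nx.DiGraph()
G.add_edges_from((s, t) for s, succ in transitions.items() for t in succ)

C = nx.condensation(G)  # one node per strongly connected class, acyclic
for level, classes in enumerate(nx.topological_generations(C)):
    print(f"level {level}:",
          [sorted(C.nodes[c]["members"]) for c in classes])
```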

4.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure for determining an optimal policy relative to a procedure for determining a suboptimal policy based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise. This research has been supported by NSF Grant Nos. ECS-80-18266 and ECS-83-19355.

5.
In this paper we consider healthcare policy issues in trading off resources among testing, prevention, and cure of two-stage contagious diseases. An individual who has contracted a two-stage contagious disease initially shows no symptoms but is capable of spreading it. If the initial stage goes undetected, symptoms appear in the later stage, possibly with complications, at which point expensive treatment becomes necessary. Under a constrained budget, policymakers must decide how to allocate funds across prevention (via vaccination), subsidized treatment, and examinations that detect the initial stage of the disease. These decisions must be made in each period of a given time horizon. To aid this decision making, we formulate a stochastic dynamic optimal control problem with feedback, which can be modeled as a Markov decision process (MDP). However, solving the MDP is computationally intractable due to the large state space, as the embedded stochastic network cannot be decomposed. We therefore propose an asymptotically optimal solution based on a fluid model of the dynamics in the stochastic network. We heuristically fine-tune the asymptotically optimal solution for the non-asymptotic case and test it extensively on several numerical cases. In particular, we investigate the effect of budget, length of planning horizon, type of disease, population size, and ratio of costs on the budget-allocation policy.
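A fluid model in this spirit can be illustrated with a toy deterministic approximation; the compartments, rate constants, and budget-effect terms below are invented for illustration and are not the paper's model.

```python
import numpy as np

# Susceptibles S, undetected early-stage E, symptomatic late-stage I.
# A per-period budget B is split into vaccination (v), testing (x),
# and treatment (c) fractions; all rates are hypothetical.
def simulate(split, B=100.0, T=50, dt=1.0):
    v, x, c = np.asarray(split, dtype=float) / sum(split)
    S, E, I, burden = 990.0, 10.0, 0.0, 0.0
    beta, prog, cure_eff, vac_eff, test_eff = 3e-4, 0.1, 2e-3, 1e-2, 2e-3
    for _ in range(T):
        new_inf = beta * S * E                    # early stage spreads
        detected = min(E, test_eff * x * B * E)   # testing removes E
        cured = min(I, cure_eff * c * B)          # treatment removes I
        S += dt * (-new_inf - vac_eff * v * B)    # vaccination removes S
        E += dt * (new_inf - prog * E - detected)
        I += dt * (prog * E - cured)
        burden += dt * I                          # late-stage disease burden
    return burden

for split in [(1, 1, 1), (3, 1, 1), (1, 3, 1), (1, 1, 3)]:
    print(split, round(simulate(split), 1))
```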

6.
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework, in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
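A stripped-down tabular sketch of the Lagrangian idea (no function approximation, unlike the paper): run an average-reward actor-critic on the penalized reward r - λc while nudging λ on a slower timescale toward the constraint boundary. The toy two-state MDP and all step sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, C0 = 2, 2, 0.3                              # constraint: avg c <= C0
P = rng.dirichlet(np.ones(nS), size=(nS, nA))       # P[s, a] -> next state
R = rng.random((nS, nA))                            # reward r(s, a)
C = rng.random((nS, nA))                            # constraint cost c(s, a)

theta = np.zeros((nS, nA))                          # softmax actor
V, rho, lam = np.zeros(nS), 0.0, 0.0                # critic, avg reward, mult
s = 0
for _ in range(200_000):
    pi = np.exp(theta[s] - theta[s].max()); pi /= pi.sum()
    a = rng.choice(nA, p=pi)
    s2 = rng.choice(nS, p=P[s, a])
    g = R[s, a] - lam * C[s, a]                     # Lagrangian reward
    delta = g - rho + V[s2] - V[s]                  # average-reward TD error
    V[s] += 0.01 * delta                            # critic (fast timescale)
    rho += 0.001 * (g - rho)
    grad = -pi; grad[a] += 1.0                      # d log pi / d theta[s]
    theta[s] += 0.001 * delta * grad                # actor (slower)
    lam = max(0.0, lam + 0.0001 * (C[s, a] - C0))   # multiplier (slowest)
    s = s2
print("final multiplier lambda:", round(lam, 3))
```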

7.
Decision makers often need a performance guarantee that holds with some sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state space. We give an example for the reliability of a satellite system.

8.
We consider limiting average Markov decision processes (MDPs) with finite state and action spaces. We propose some algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and the construction of levels in some aggregated MDP.
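The paper's graph-and-levels construction does not reduce to a few lines; for reference, the standard relative value iteration baseline for a limiting average MDP (assuming a unichain model, with hypothetical random data) looks like this.

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA = 4, 3
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel P[s, a]
R = rng.random((nS, nA))                        # reward r(s, a)

h = np.zeros(nS)                                # relative value function
for _ in range(2000):
    Q = R + P @ h
    h_new = Q.max(axis=1)
    gain = h_new[0]                             # anchor at a reference state
    h = h_new - gain                            # keep h bounded
print("average reward ~", round(gain, 4), "policy:", Q.argmax(axis=1))
```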

9.
We consider a general adversarial stochastic optimization model. Our model involves the design of a system that an adversary may subsequently attempt to destroy or degrade. We introduce SPAR, which utilizes mixed-integer programming for the design decision and a Markov decision process (MDP) for modeling the adversarial phase.

10.
In this paper, we present a numerical approach to an inverse problem of a population dynamics model. We propose a numerical approximation of the optimal control for obtaining the desired observation state using the augmented Lagrangian method. Moreover, the existence and uniqueness of the numerical solutions are mathematically investigated in this work. Finally, we present some numerical experiments to illustrate our theoretical results.

11.
We give a policy-improvement type algorithm to locate an optimal pure stationary strategy for discounted stochastic games with perfect information. A graph-theoretic motivation for our algorithm is presented as well. Key words: stochastic games, MDP, perfect information, policy iteration. Partially funded by NSF Grants DMS 930-1052 and DMS 970-4951.

12.
This paper presents extensive computational experiments comparing 10 heuristics and 20 metaheuristics for the maximum diversity problem (MDP). This problem consists of selecting a subset of maximum diversity from a given set of elements. It arises in a wide range of real-world settings, and a large number of studies propose heuristic and metaheuristic methods for it. However, probably because the problem has been referenced under different names, only limited comparisons among a few methods on some sets of instances have appeared. This paper reviews the heuristics and metaheuristics for finding near-optimal solutions to the MDP. We present the new benchmark library MDPLIB, which includes most instances previously used for this problem as well as new ones, 315 in total. We also present an exhaustive computational comparison of the 30 methods on the MDPLIB. Non-parametric statistical tests are reported to draw significant conclusions.
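Note that MDP here abbreviates the maximum diversity problem, not a Markov decision process. For concreteness, one of the simplest constructive heuristics can be sketched as follows, on random data rather than MDPLIB instances; the paper's 30 methods are far more elaborate.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 50, 10                                   # choose m of n elements
X = rng.random((n, 5))                          # hypothetical feature vectors
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances

selected = [int(D.sum(axis=1).argmax())]        # start from the most remote
while len(selected) < m:
    rest = [i for i in range(n) if i not in selected]
    # greedily add the element farthest (in total) from the selection
    best = max(rest, key=lambda i: D[i, selected].sum())
    selected.append(best)

diversity = D[np.ix_(selected, selected)].sum() / 2
print(sorted(selected), "diversity:", round(diversity, 3))
```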

13.
Directed hypergraphs are a general modelling and algorithmic tool that has been used successfully in many different research areas such as artificial intelligence, database systems, fuzzy systems, propositional logic and transportation networks. However, modelling Markov decision processes using directed hypergraphs had not previously been considered. In this paper we consider finite-horizon Markov decision processes (MDPs) with finite state and action spaces and present an algorithm for finding the K best deterministic Markov policies; that is, we rank the first K deterministic Markov policies in non-decreasing order using an additive criterion of optimality. The algorithm uses a directed hypergraph to model the finite-horizon MDP. It is shown that the problem of finding the optimal policy can be formulated as a minimum weight hyperpath problem and solved in linear time, with respect to the input data representing the MDP, under different additive optimality criteria.
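For the K = 1 case, the hyperpath view coincides with ordinary backward induction: each (stage, state, action) triple corresponds to a hyperarc, and the recursion below computes a minimum-weight hyperpath. The stage-dependent data are hypothetical; ranking the K best policies requires the hypergraph machinery and is not shown.

```python
import numpy as np

rng = np.random.default_rng(4)
T, nS, nA = 5, 4, 2
P = rng.dirichlet(np.ones(nS), size=(T, nS, nA))   # stage-dependent kernel
C = rng.random((T, nS, nA))                        # stage cost c_t(s, a)

V = np.zeros(nS)                                   # terminal values
policy = np.zeros((T, nS), dtype=int)
for t in reversed(range(T)):                       # backward induction
    Q = C[t] + P[t] @ V                            # shape (nS, nA)
    policy[t] = Q.argmin(axis=1)
    V = Q.min(axis=1)
print("optimal expected cost per start state:", np.round(V, 3))
```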

14.
李猜, 耿娜, 王春鸣. 《运筹与管理》2019, 28(7): 108-117
Blood is a special and scarce resource: clinically, transfusion can treat certain blood diseases and even save lives. Because blood cannot yet be manufactured artificially and is obtained only through donation, sound inventory management is needed to use the limited supply effectively. Recent research shows that platelets have a short shelf life and that their transfusion effectiveness decreases with storage time. This paper therefore studies the inventory control of platelets, balancing shortage, wastage, and transfusion freshness. We build a finite-horizon Markov decision process model with utility maximization as the objective, and investigate the optimal platelet ordering and issuing policies. Based on the main structural features of the numerically computed optimal policy, we propose a heuristic policy; numerical experiments show that the heuristic performs close to the optimal control policy and outperforms conventional policies.

15.
16.
We consider a discrete-time Markov Decision Process (MDP) under the discounted payoff criterion in the presence of additional discounted cost constraints. We study the sensitivity of optimal Stationary Randomized (SR) policies in this setting with respect to the upper bound on the discounted cost constraint functionals. We show that such sensitivity analysis leads to an improved version of the Feinberg–Shwartz algorithm (Math Oper Res 21(4):922–945, 1996) for finding optimal policies that are ultimately stationary and deterministic.
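As background, a constrained discounted MDP of this kind can be written as the standard linear program over discounted occupation measures; the sketch below solves a hypothetical instance with one cost constraint, where the bound UB plays the role of the upper bound whose sensitivity the paper studies.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
nS, nA, GAMMA = 3, 2, 0.9
UB = 6.0                                      # chosen loosely so the toy
                                              # instance stays feasible
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R, C = rng.random((nS, nA)), rng.random((nS, nA))
alpha = np.full(nS, 1.0 / nS)                 # initial distribution

# variables x[s, a] >= 0: discounted state-action occupation measure;
# flow balance: sum_a x(s',a) - GAMMA * sum_{s,a} P(s'|s,a) x(s,a) = alpha(s')
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        A_eq[:, s * nA + a] -= GAMMA * P[s, a]
        A_eq[s, s * nA + a] += 1.0

res = linprog(c=-R.ravel(),                   # maximize discounted reward
              A_eq=A_eq, b_eq=alpha,
              A_ub=[C.ravel()], b_ub=[UB])    # discounted cost <= UB
x = res.x.reshape(nS, nA)
print("stationary randomized policy:")
print(np.round(x / x.sum(axis=1, keepdims=True), 3))
```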

17.
刘阳, 耿娜. 《运筹与管理》2017, 26(9): 78-87
Timely examinations are important for diagnosing and treating patients. However, differing patient urgency levels, the variety of examination types, and behavioral factors such as no-shows make the outpatient scheduling problem hard to solve. To address it, this paper considers patient demand for two examination types and two urgency classes, together with patient no-shows and physician overtime, and builds a finite-horizon Markov decision process (MDP) model whose objective is to maximize the expected revenue from examinations while minimizing the expected overtime penalty cost. Because the MDP model is too complex for the optimal control policy to be analyzed in closed form, we run numerical experiments on the MDP model, observe structural features of the optimal solution, construct two parameterized heuristic scheduling policies, and optimize their parameters with a genetic algorithm. Numerical experiments compare the optimal control policy, the two heuristic policies, and the first-come-first-served rule; the results show that the proposed policies deviate from the optimal solution by at most 10%, and that under very heavy workloads the heuristic policies far outperform the first-come-first-served rule.

18.
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space is comprised solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.

19.
The admission decision is one of the fundamental categories of demand-management decisions. In the dynamic model of the single-resource capacity control problem, the distribution of demand does not explicitly depend on external conditions. In reality, however, demand may depend on the current external environment, which represents the prevailing economic, financial, social or other factors that affect customer behavior. We formulate a Markov Decision Process (MDP) that maximizes expected revenues over a finite horizon and explicitly models the current environment. We derive structural results for the optimal admission policy, including the existence of environment-dependent thresholds and a comparison of threshold levels across environments. We also present computational results that illustrate these structural properties. Finally, we extend some of the results to a related dynamic pricing formulation.
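A toy backward-induction sketch with a two-state Markov-modulated environment is below; the arrival probabilities, fares, and environment chain are hypothetical, but the output exhibits the environment-dependent acceptance thresholds described above.

```python
import numpy as np

T, CAP = 20, 8
p_arr = {0: 0.4, 1: 0.8}                 # arrival prob in env 0 / env 1
fare = {0: 1.0, 1: 1.5}                  # fare offered in each environment
Q = np.array([[0.9, 0.1], [0.2, 0.8]])   # environment transition matrix

V = np.zeros((T + 1, CAP + 1, 2))        # V[t, capacity, env]
accept = np.zeros((T, CAP + 1, 2), dtype=bool)
for t in reversed(range(T)):
    for x in range(CAP + 1):
        for e in (0, 1):
            cont = Q[e] @ V[t + 1, x]    # reject (or no arrival)
            take = (fare[e] + Q[e] @ V[t + 1, x - 1]) if x > 0 else -np.inf
            accept[t, x, e] = take >= cont
            V[t, x, e] = (1 - p_arr[e]) * cont + p_arr[e] * max(cont, take)

for e in (0, 1):
    thr = [int(np.argmax(accept[t, :, e])) for t in range(T)]
    print(f"env {e}: smallest capacity at which we accept, per period:", thr)
```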

20.
Displaying night-vision thermal images in daytime colors is important for scene interpretation and target tracking. In this paper, we employ object recognition methods for colorization, which amounts to segmenting thermal images into plants, buildings, sky, water, roads and other classes, then assigning colors to each class. The main thrust of our work is the introduction of Markov decision processes (MDPs) to deal with the computational complexity of the colorization problem. MDPs provide the approaches of neighborhood analysis and probabilistic classification, which we exploit to solve chromatic estimation efficiently. We initially label the segments with a classifier, paving the way for the neighborhood analysis. We then update the classification confidence of each class by MDP, taking neighborhood consistency and scene layout into account. Finally, we calculate the colors for every segment by probabilistically blending the characteristic colors of the classes it belongs to. Experimental results show that the colorized output of our algorithm is satisfactory and harmonious, and the computational speed is quite fast.
