Similar Documents
20 similar documents found.
1.
Continuous-time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion. It is shown that the expected average return can be expressed as the expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.
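The abstract does not reproduce the policy iteration method itself. As a point of reference, here is a minimal sketch of the standard policy iteration loop for a finite, discrete-time discounted MDP, the discrete analogue of the discounted reformulation mentioned above; the array shapes and the starting policy are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9):
    """Standard policy iteration for a finite discounted MDP.

    P: array of shape (A, S, S), P[a, s, t] = transition probability.
    r: array of shape (S, A), immediate rewards.
    Returns a deterministic stationary policy and its value function.
    """
    S, A = r.shape
    policy = np.zeros(S, dtype=int)           # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(S), :]     # rows selected by current policy
        r_pi = r[np.arange(S), policy]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the one-step lookahead.
        q = r + gamma * np.einsum('ast,t->sa', P, v)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v                  # no change: policy is optimal
        policy = new_policy
```

Finite convergence follows because each improvement step produces a strictly better policy and there are only finitely many deterministic policies.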

2.
3.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and by characterizing the value function through the Bellman equation.
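The unbounded-rate, history-dependent setting is precisely where elementary reductions fail, which is why the paper needs the Dynkin formula. For contrast, under the much stronger assumption of uniformly bounded rates, the discounted continuous-time problem reduces to a discrete-time one by classical uniformization; the sketch below shows that reduction only, not the paper's method.

```python
import numpy as np

def uniformize(Q, r, alpha, Lambda):
    """Classical uniformization of a *bounded-rate* CTMDP (not the
    unbounded-rate setting treated in the paper).

    Q: (A, S, S) generator matrices (rows sum to 0, exit rates <= Lambda),
    r: (S, A) reward rates, alpha: discount rate.
    Returns a discrete-time MDP (P, r_d, gamma) whose Bellman equation
    has the same solution and the same optimal stationary policies.
    """
    P = np.eye(Q.shape[1])[None, :, :] + Q / Lambda  # DTMDP transition kernel
    gamma = Lambda / (Lambda + alpha)                # effective discount factor
    r_d = r / (Lambda + alpha)                       # rescaled one-step reward
    return P, r_d, gamma
```

The resulting triple can be fed to any discrete-time solver, e.g. the policy iteration sketch under item 1.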

4.
We computationally assess policies for the elevator control problem by a new column-generation approach for the linear programming method for discounted infinite-horizon Markov decision problems. By analyzing the optimality of given actions in given states, we were able to provably improve the well-known nearest-neighbor policy. Moreover, the method allowed us to identify an optimal parking policy. This approach can be used to detect and resolve weaknesses in particular policies for Markov decision problems.
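The column-generation refinement is the paper's contribution and is not reproduced here; the sketch below shows only the classical LP that underlies the method, for a small finite MDP, using SciPy. The data shapes and the uniform objective weights are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def lp_solve_mdp(P, r, gamma=0.9):
    """Classical primal LP for a discounted MDP: minimize sum_s v(s)
    subject to v(s) >= r(s,a) + gamma * sum_t P[a,s,t] v(t) for all (s,a).
    The optimal v is the value function; greedy actions recover a policy."""
    A, S, _ = P.shape
    # Each (s, a) pair contributes one row: gamma*P[a,s,:] v - v(s) <= -r(s,a).
    rows, rhs = [], []
    for a in range(A):
        for s in range(S):
            row = gamma * P[a, s, :].copy()
            row[s] -= 1.0
            rows.append(row)
            rhs.append(-r[s, a])
    res = linprog(c=np.ones(S), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * S, method="highs")
    v = res.x
    q = r + gamma * np.einsum('ast,t->sa', P, v)
    return v, q.argmax(axis=1)
```

The column-generation idea corresponds to generating these (state, action) rows (columns of the dual) on demand instead of enumerating them all up front.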

5.
In this paper we consider a Markov decision model introduced by Economou (2003), in which it was proved that the optimal policy in the problem of controlling a compound immigration process through total catastrophes is of control-limit type. We show that the average cost of a control-limit policy is unimodal as a function of the critical point. This result enables us to design very efficient algorithms for the computation of the optimal policy, such as a bisection procedure and a special-purpose policy iteration algorithm that operates on the class of control-limit policies. AMS 2000 Subject Classification: Primary 90C40; Secondary 60J25.
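Unimodality is exactly what licenses a bisection-type search over the critical point. A minimal sketch of one such search (a ternary-search variant over integers) follows; the cost function g stands in for the average cost of the control-limit policy with a given critical point, which the paper computes from the model.

```python
def minimize_unimodal(g, lo, hi):
    """Ternary-style search for the minimizer of a unimodal function g
    over the integer interval [lo, hi], using O(log(hi - lo)) evaluations.
    This mirrors the bisection procedure that unimodality justifies."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if g(m1) < g(m2):
            hi = m2 - 1      # minimizer cannot lie at or beyond m2
        else:
            lo = m1 + 1      # minimizer cannot lie at or before m1
    return min(range(lo, hi + 1), key=g)
```

Each iteration discards a constant fraction of the interval, so the optimal critical point is found after logarithmically many evaluations of g.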

6.
We consider a discrete-time Markov decision process (MDP) with a finite state space, a finite action space, and two kinds of immediate rewards. The problem is to maximize the time-average reward generated by one reward stream, subject to the other average reward not being smaller than a prescribed value. An MDP with a reward constraint can be solved by linear programming over the class of mixed policies. On the other hand, when we restrict ourselves to pure policies, the problem is combinatorial, and no exact solution method is known. In this paper, we propose a Genetic Algorithm (GA) approach in order to obtain an effective search process and a near-optimal, possibly optimal, pure stationary policy. A numerical example is given to examine the efficiency of the proposed approach.
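The paper's encoding and operators are not reproduced here. A hedged sketch of one natural setup is given below: a pure stationary policy encoded as an integer chromosome, the reward constraint handled by a linear penalty, and average rewards evaluated through the stationary distribution (assuming a unichain model). Population size, mutation rate, and penalty weight are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def average_rewards(P, r1, r2, policy):
    """Long-run averages of both reward streams under a pure stationary
    policy, assuming the induced chain has a single recurrent class."""
    S = len(policy)
    P_pi = P[policy, np.arange(S), :]
    # Stationary distribution: solve pi (P - I) = 0 with sum(pi) = 1.
    A_mat = np.vstack([P_pi.T - np.eye(S), np.ones(S)])
    b = np.append(np.zeros(S), 1.0)
    pi = np.linalg.lstsq(A_mat, b, rcond=None)[0]
    g1 = pi @ r1[np.arange(S), policy]
    g2 = pi @ r2[np.arange(S), policy]
    return g1, g2

def ga_search(P, r1, r2, c, pop=60, gens=200, penalty=1e3):
    """Toy GA over pure policies: maximize average r1 subject to
    average r2 >= c, the constraint enforced by a linear penalty."""
    A, S, _ = P.shape
    popn = rng.integers(0, A, size=(pop, S))
    def fitness(pol):
        g1, g2 = average_rewards(P, r1, r2, pol)
        return g1 - penalty * max(0.0, c - g2)
    for _ in range(gens):
        scores = np.array([fitness(p) for p in popn])
        parents = popn[np.argsort(scores)[-pop // 2:]]       # truncation selection
        cut = rng.integers(1, S, size=pop // 2)
        kids = np.array([np.concatenate([a[:k], b[k:]])      # one-point crossover
                         for a, b, k in zip(parents, parents[::-1], cut)])
        mut = rng.random(kids.shape) < 0.02                  # pointwise mutation
        kids[mut] = rng.integers(0, A, size=mut.sum())
        popn = np.vstack([parents, kids])
    return max(popn, key=fitness)
```

Truncation selection and one-point crossover are the simplest possible choices; the paper's operators may well differ.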

7.
In the first section a modification of Howard's policy improvement routine for Markov decision problems is described. The modified routine normally converges more rapidly to the optimal policy. In the second section a particular form of recurrence relation, which leads to the rapid determination of improved policies, is developed for a certain type of dynamic programming problem. The relation is used to show that the repair-limit method is the optimal strategy for a basic equipment replacement problem.

8.
Directed hypergraphs are a general modelling and algorithmic tool that has been used successfully in many different research areas, such as artificial intelligence, database systems, fuzzy systems, propositional logic, and transportation networks. However, modelling Markov decision processes using directed hypergraphs has not yet been considered. In this paper we consider finite-horizon Markov decision processes (MDPs) with finite state and action spaces and present an algorithm for finding the K best deterministic Markov policies, that is, for ranking the first K deterministic Markov policies in non-decreasing order under an additive criterion of optimality. The algorithm uses a directed hypergraph to model the finite-horizon MDP. It is shown that the problem of finding the optimal policy can be formulated as a minimum-weight hyperpath problem and solved in linear time, with respect to the input data representing the MDP, under different additive optimality criteria.
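For K = 1 the hyperpath formulation reduces, in MDP terms, to ordinary backward induction, which is indeed linear in the size of the time-expanded model. A minimal sketch under that reading (array shapes are illustrative assumptions):

```python
import numpy as np

def backward_induction(P, r, T, terminal=None):
    """Finite-horizon optimal deterministic Markov policy by backward
    induction: linear in the size of the time-expanded MDP, the MDP
    analogue of the minimum-weight hyperpath computation.

    P: (A, S, S) transitions, r: (S, A) rewards, T: horizon length.
    Returns the value-to-go at time 0 and policy[t][s] for t = 0..T-1."""
    A, S, _ = P.shape
    v = np.zeros(S) if terminal is None else terminal.copy()
    policy = np.empty((T, S), dtype=int)
    for t in reversed(range(T)):
        q = r + np.einsum('ast,t->sa', P, v)   # one-stage lookahead
        policy[t] = q.argmax(axis=1)
        v = q.max(axis=1)
    return v, policy
```

Ranking the K best policies for K > 1 requires the hypergraph machinery of the paper and is not sketched here.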

9.
In the theory and applications of Markov decision processes, introduced by Howard and subsequently developed by many authors, it is assumed that actions can be chosen independently at each state. A policy-constrained Markov decision process is one where selecting a given action in one state restricts the choice of actions in another. This note describes a method for determining a maximal-gain policy in the policy-constrained case. The method uses bounds on the gain of the feasible policies to produce a policy ranking list. This list then forms the basis of a bounded enumeration procedure which yields the optimal policy.

10.
In this paper we consider a model consisting of a deteriorating installation that transfers a raw material to a production unit and a buffer which has been built between the installation and the production unit. The deterioration process of the installation is considered to be nonstationary, i.e. the transition probabilities may depend not only on the working conditions of the installation but on its age as well. The problem of the optimal preventive maintenance of the installation is considered. Under a suitable cost structure it is shown that, for fixed age of the installation and fixed buffer level, the optimal policy is of control-limit type. When the deterioration process is stationary, an efficient Markov decision algorithm operating on the class of control-limit policies is developed. There is strong numerical evidence that the algorithm converges to the optimal policy. Two generalizations of this model are also discussed.

11.
We treat an inventory control problem in a facility that provides a single type of service for customers. Items used in service are supplied by an outside supplier. To incorporate lost sales due to service delay into the inventory control, we model a queueing system with finite waiting room and a non-instantaneous replenishment process and examine the impact of the finite buffer on replenishment policies. Employing Markov decision process theory, we characterize the optimal replenishment policy as a monotonic threshold function of the reorder point under the discounted cost criterion. We also present a simple procedure that jointly finds the optimal buffer size and order quantity.

12.
We consider a problem where different classes of customers can book different types of services in advance and the service company has to respond to the booking request immediately, confirming or rejecting it. Due to the possibility of cancellations before the day of service, or no-shows on the day of service, overbooking the given capacity is a viable decision. The objective of the service company is to maximize profit, composed of class- and type-specific revenues, refunds for cancellations or no-shows, and the cost of overtime. Calculating the latter requires information about the underlying appointment schedule. Throughout the paper we relate the problem to capacity allocation in radiology services. Drawing on ideas from revenue management, overbooking, and appointment scheduling, we model the problem as a discrete-time Markov decision process which, owing to proper aggregation, can be solved optimally with an iterative stochastic dynamic programming approach. In an experimental study we successfully apply the approach to a real-world problem with data from the radiology department of a hospital. Furthermore, we compare the optimal policy to four heuristic policies, one of which is currently in use. We show that the optimal policy significantly improves on the currently used policy and that a nested booking-limit-type policy closely approximates the optimal policy and is thus recommended for use in practice.

13.
We give mild conditions for the existence of optimal solutions for a Markov decision problem with average cost, under m constraints of the same kind, in Borel state and action spaces. Moreover, there is an optimal policy that is a convex combination of at most m + 1 deterministic policies.

14.
Military medical planners must develop dispatching policies that dictate how aerial medical evacuation (MEDEVAC) units are utilized during major combat operations. The objective of this research is to determine how to optimally dispatch MEDEVAC units in response to 9-line MEDEVAC requests to maximize MEDEVAC system performance. A discounted, infinite horizon Markov decision process (MDP) model is developed to examine the MEDEVAC dispatching problem. The MDP model allows the dispatching authority to accept, reject, or queue incoming requests based on a request’s classification (i.e., zone and precedence level) and the state of the MEDEVAC system. A representative planning scenario based on contingency operations in southern Afghanistan is utilized to investigate the differences between the optimal dispatching policy and three practitioner-friendly myopic policies. Two computational experiments are conducted to examine the impact of selected MEDEVAC problem features on the optimal policy and the system performance measure. Several excursions are examined to identify how the 9-line MEDEVAC request arrival rate and the MEDEVAC flight speeds impact the optimal dispatching policy. Results indicate that dispatching MEDEVAC units considering the precedence level of requests and the locations of busy MEDEVAC units increases the performance of the MEDEVAC system. These results inform the development and implementation of MEDEVAC tactics, techniques, and procedures by military medical planners. Moreover, an analysis of solution approaches for the MEDEVAC dispatching problem reveals that the policy iteration algorithm substantially outperforms the linear programming algorithms executed by CPLEX 12.6 with regard to computational effort. This result supports the claim that policy iteration remains the superlative solution algorithm for exactly solving computationally tractable Markov decision problems.

15.
In multi-location inventory systems, transshipments are often used to improve customer service and reduce cost. Determining optimal transshipment policies for such systems involves a complex optimisation problem that is tractable only for systems with few locations. Consequently, simple heuristic transshipment policies are often applied in practice. This paper develops an approximate solution method which applies decomposition to reduce a Markov decision process model of a multi-location inventory system to a number of models involving only two locations. The value functions from the subproblems are used to estimate the fair charge for the inventory provided in a transshipment. This estimate of the fair charge is used as the decision criterion in a heuristic transshipment policy for the multi-location system. A numerical study shows that the proposed heuristic can deliver considerable cost savings compared with the simple heuristics often used in practice.

16.
1. Introduction and Model. The earlier literature about constrained Markov decision processes (MDPs, for short) can be found in Derman's book [1]. Later, there have been many achievements in this area. For example, average-reward MDPs with a constraint have been discussed by Beutler and Ross [2], Hordijk and Kallenberg [3], Altman and Schwartz [4], et al. In the case of finite state space, discounted-reward-criterion MDPs with a constraint have been treated by Kallenberg [5] and Tanaka [6], et al. When the state space is denumerable, such problems were discussed by Sennott [7] and Alt…

17.
Guo Xianping, Acta Mathematica Sinica, 2001, 44(2): 333-342
This paper considers the average-variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition, the existence of a Markov policy that is optimal for the average expected reward criterion is proved by means of the optimality equation. Then, by constructing a new model and using the theory of Markov processes, it is further shown that, within the class of Markov policies that are optimal for the average expected reward criterion, there exists one that minimizes the average variance. The main results of Dynkin, E. B. and Yushkevich, A. A., of Kurano, M., and of others are obtained as special cases.

18.
In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Unlike the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over the set of all policies attaining a given expected reward. For continuous-time MDPs with the discounted criterion and finite state and action spaces, we prove that the mean–variance optimization problem can be transformed into an equivalent discounted optimization problem using the conditional expectation and the Markov property. We then show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods in a finite number of iterations. We also address related issues, such as a mutual fund theorem, and illustrate our results with an example.

19.
We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone, and the one-stage reward is submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. Finally, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
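A quick numerical illustration of the threshold claim for the single-facility case with control of arrivals (a uniformized M/M/1-type admission problem; all parameter values below are made up): value iteration produces an optimal policy that admits up to some queue length and rejects beyond it.

```python
import numpy as np

# Uniformized admission control for a single-server queue (illustrative
# parameters). State s = queue length in {0,...,N}; at each arrival the
# controller admits (reward R) or rejects; holding cost h per customer
# per period; gamma = discount factor. Requires lam + mu <= 1.
N, lam, mu, R, h, gamma = 20, 0.4, 0.5, 5.0, 1.0, 0.95

v = np.zeros(N + 1)
for _ in range(1000):                        # value iteration to (numerical) convergence
    arr = np.where(np.arange(N + 1) < N,
                   np.maximum(R + np.append(v[1:], v[-1]), v),  # admit vs. reject
                   v)                        # full buffer: forced rejection
    dep = np.append(v[0], v[:-1])            # service completion moves s -> s-1
    v = -h * np.arange(N + 1) + gamma * (lam * arr + mu * dep
                                         + (1 - lam - mu) * v)

admit = [int(s < N and R + v[s + 1] >= v[s]) for s in range(N + 1)]
print(admit)   # expected: a block of 1s followed by 0s, i.e. threshold form
```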

20.
This paper deals with a new optimality criterion that combines the usual three average criteria with the canonical triplet (together called the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, extending the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary controlled Markov processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are optimal not only for the three average criteria but also for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e., strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, an error bound on the average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε-optimal (ε > 0) Markov policies is given.
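The error-bound analysis is the paper's contribution; only the generic rolling horizon control loop is sketched below, reusing the backward_induction solver sketched under item 8. For a time-homogeneous model the re-solve returns the same policy every step; in the paper's nonstationary setting the H-stage data, and hence the policy, would change as the horizon rolls.

```python
import numpy as np

def rolling_horizon_control(P, r, H, s0, steps, rng=None):
    """Generic rolling horizon loop: at each step solve an H-stage problem
    from the current state, apply only its first action, observe the
    transition, and re-solve. Relies on the backward_induction sketch
    under item 8 for the finite-horizon solve."""
    rng = rng or np.random.default_rng()
    s, actions = s0, []
    for _ in range(steps):
        _, policy = backward_induction(P, r, H)     # H-stage lookahead
        a = policy[0, s]                            # keep the first action only
        actions.append(a)
        s = rng.choice(P.shape[1], p=P[a, s])       # sample the next state
    return actions
```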
