Similar Documents
20 similar documents found.
1.
2.
Iwamoto recently established a formal transformation, via an invariant imbedding, to construct a controlled Markov chain that can be solved in a backward manner, as in backward induction for finite-horizon Markov decision processes (MDPs), from a given controlled Markov chain with a non-additive, forward recursive objective function criterion. Chang et al. presented formal methods, called "parallel rollout" and "policy switching," for combining given multiple policies in MDPs and showed that the policies generated by both methods improve upon all of the policies that the methods combine. This brief paper extends the methods of parallel rollout and policy switching to forward recursive objective function criteria and shows that a similar improvement property holds as in MDPs. We further discuss how to implement these methods via simulation.
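As a rough illustration of the policy-switching idea (not the paper's forward recursive setting), the sketch below combines two fixed base policies on a small, made-up finite MDP with an ordinary discounted criterion: in each state, both base policies are evaluated by Monte Carlo rollouts and the action of the better one is applied. All model data, the horizon, and the rollout count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite MDP (not from the paper): 4 states, 2 actions.
n_states, n_actions, gamma, horizon = 4, 2, 0.95, 30
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # immediate rewards

# Two fixed base policies (state -> action), assumed given.
base_policies = [np.zeros(n_states, dtype=int), np.ones(n_states, dtype=int)]

def rollout_value(policy, s, n_rollouts=200):
    """Monte Carlo estimate of the finite-horizon discounted return of `policy` from state s."""
    total = 0.0
    for _ in range(n_rollouts):
        state, discount = s, 1.0
        for _ in range(horizon):
            a = policy[state]
            total += discount * R[state, a]
            state = rng.choice(n_states, p=P[state, a])
            discount *= gamma
    return total / n_rollouts

def policy_switching_action(s):
    """Policy switching: in state s, act according to the base policy with the best rollout estimate."""
    best = max(base_policies, key=lambda pi: rollout_value(pi, s))
    return best[s]

print("action chosen in state 0:", policy_switching_action(0))
```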

3.
We computationally assess policies for the elevator control problem by a new column-generation approach for the linear programming method for discounted infinite-horizon Markov decision problems. By analyzing the optimality of given actions in given states, we were able to provably improve the well-known nearest-neighbor policy. Moreover, with the method we could identify an optimal parking policy. This approach can be used to detect and resolve weaknesses in particular policies for Markov decision problems.
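For context, this is the standard linear-programming formulation of a discounted MDP that such a column-generation approach builds on; the sketch below solves it directly on a tiny random MDP. The elevator model itself, and the column-generation machinery needed for its huge state space, are not reproduced, and all problem data are made up.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Illustrative small MDP (a stand-in for the elevator model): 5 states, 3 actions.
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition probabilities
R = rng.uniform(size=(nS, nA))                  # R[s, a] immediate rewards

# LP for discounted MDPs: minimize sum_s v(s) subject to
#   v(s) >= R(s, a) + gamma * sum_{s'} P(s'|s, a) v(s')   for all (s, a).
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0                    # -(v(s) - gamma * P v) <= -R(s, a)
        A_ub.append(row)
        b_ub.append(-R[s, a])
res = linprog(c=np.ones(nS), A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(None, None)] * nS)
v = res.x

# An action is optimal in a state exactly when its constraint is tight, which is what
# "analyzing the optimality of given actions in given states" exploits.
greedy = np.argmax(R + gamma * np.einsum("sat,t->sa", P, v), axis=1)
print("optimal values:", np.round(v, 3), "greedy policy:", greedy)
```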

4.
System dynamics has been seen primarily as a strategic tool, most effectively used at the highest level of strategy to identify robust policy interventions under a wide range of scenarios. However, an alternative, complementary and powerful role is emerging. This is at an ‘intermediate level’ in organisations to coordinate and integrate policies across the value chain. It is at this level where business value, as defined by the discounted value of future free cash flow, is both created and destroyed. This paper introduces the need for ‘intermediate-level’ and ‘value-based’ modelling and emphasises the natural role of system dynamics in supporting a methodology to fulfil the need. It describes the development of an approach and its application in the oil industry to coordinate the response of people and tools within operational, financial and commercial functions across the value chain to address a variety of problems and issues.
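The paper's notion of business value is the discounted value of future free cash flow; a minimal numeric sketch of that definition follows, with entirely hypothetical cash flows and discount rate.

```python
# Hypothetical cash-flow projection (illustrative numbers, not from the paper).
free_cash_flows = [12.0, 15.0, 18.0, 20.0, 22.0]   # forecast free cash flow per year
discount_rate = 0.10                                # assumed cost of capital

# Business value as the discounted value of future free cash flow.
value = sum(cf / (1.0 + discount_rate) ** (t + 1) for t, cf in enumerate(free_cash_flows))
print(round(value, 2))
```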

5.
We establish posterior consistency for parametric, partially observed, fully dominated Markov models. The prior is assumed to assign positive probability to all neighborhoods of the true parameter, for a distance induced by the expected Kullback–Leibler divergence between the Markov transition densities of the parametric family members. This assumption is easily checked in general. In addition, we show that posterior consistency is implied by the consistency of the maximum likelihood estimator. The result is extended to possibly improper priors and non-stationary observations. Finally, we check our assumptions on a linear Gaussian model and a well-known stochastic volatility model.

6.
This paper investigates an optimal inspection and replacement problem for a discrete-time Markovian deterioration system. It is assumed that the system is monitored incompletely by a certain mechanism which gives the decision maker some information about the exact state of the system. The problem is to obtain an optimal inspection and replacement policy minimizing the expected total discounted cost over an infinite horizon, and it is formulated as a partially observable Markov decision process. Furthermore, under some reasonable conditions reflecting the practical meaning of the deterioration, it is shown that there exists an optimal inspection and replacement policy in the class of monotonic four-region policies.
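In such partially observable formulations, the incomplete monitoring is folded into a belief state via Bayes' rule. A minimal sketch of that update step is shown below, with a made-up three-state deterioration matrix and observation model; the paper's actual data and its four-region policy structure are not reproduced.

```python
import numpy as np

# Illustrative deterioration model (assumed data): 3 condition states and an
# imperfect monitor that reports a noisy signal about the true state.
P_deteriorate = np.array([[0.7, 0.2, 0.1],     # row s -> distribution of next state (no replacement)
                          [0.0, 0.6, 0.4],
                          [0.0, 0.0, 1.0]])
O = np.array([[0.8, 0.15, 0.05],               # O[s, o]: probability of observing signal o in state s
              [0.1, 0.7, 0.2],
              [0.05, 0.15, 0.8]])

def belief_update(belief, observation):
    """One step of the Bayes filter that turns the POMDP into an MDP over beliefs."""
    predicted = belief @ P_deteriorate           # push the belief through the deterioration dynamics
    posterior = predicted * O[:, observation]    # weight by the likelihood of the observed signal
    return posterior / posterior.sum()

b = np.array([1.0, 0.0, 0.0])                    # start in the best condition with certainty
print(belief_update(b, observation=1))
```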

7.
This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (together referred to as the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary controlled Markov processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e., strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε-optimal (ε > 0) Markov policies is given.
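A rolling-horizon policy of the kind analysed here solves a finite-horizon problem at each decision epoch and applies only its first action. The sketch below shows that mechanic on a small, truncated finite MDP with made-up data; the paper's denumerable-state, unbounded-reward setting, its terminal-reward sequences, and its error bound are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative truncated finite MDP (assumed data), undiscounted H-step criterion.
nS, nA, H = 6, 2, 10
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
R = rng.uniform(size=(nS, nA))

def rolling_horizon_action(s, terminal_reward=np.zeros(nS)):
    """Solve an H-step problem by backward induction and apply only its first-stage action."""
    v = terminal_reward.copy()
    policy_first = None
    for _ in range(H):
        q = R + np.einsum("sat,t->sa", P, v)     # one-step lookahead on the current value-to-go
        policy_first = np.argmax(q, axis=1)
        v = q.max(axis=1)
    return policy_first[s]

print("rolling-horizon action in state 0:", rolling_horizon_action(0))
```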

8.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend use of this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
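For reference, generic Howard-style policy iteration for a unichain average-reward MDP looks like the sketch below, on random illustrative data. The paper's specific modification, which restricts the improvement step to unichain policies in a merely communicating model, is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative MDP with made-up data: 4 states, 2 actions.
nS, nA = 4, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
R = rng.uniform(size=(nS, nA))

def evaluate(policy):
    """Gain g and bias h of a unichain policy: solve g + h(s) = r(s) + sum_s' P(s,s') h(s'), with h(0) = 0."""
    P_pi = P[np.arange(nS), policy]              # nS x nS transition matrix of the policy
    r_pi = R[np.arange(nS), policy]
    A = np.zeros((nS, nS))
    A[:, 0] = 1.0                                # column 0 carries the gain g (h(0) is pinned to 0)
    A[:, 1:] = np.eye(nS)[:, 1:] - P_pi[:, 1:]
    x = np.linalg.solve(A, r_pi)
    return x[0], np.concatenate(([0.0], x[1:]))

policy = np.zeros(nS, dtype=int)
while True:
    g, h = evaluate(policy)
    q = R + np.einsum("sat,t->sa", P, h)         # improvement step using the bias h
    incumbent = q[np.arange(nS), policy]
    new_policy = np.where(np.isclose(incumbent, q.max(axis=1)), policy, np.argmax(q, axis=1))
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("optimal average reward:", round(g, 4), "policy:", policy)
```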

9.
In the steady state of a discrete-time Markov decision process, we consider the problem of finding an optimal randomized policy that minimizes the variance of the one-transition reward among the policies whose mean reward is not less than a specified value. The problem is solved by introducing a parametric Markov decision process with the average cost criterion. It is shown that there exists an optimal policy which is a mixture of at most two pure policies. As an application, the toymaker's problem is discussed.

10.
We study a unichain Markov decision process, i.e. a controlled Markov process whose state process under a stationary policy is an ergodic Markov chain. Here the state and action spaces are assumed to be either finite or countable. When the state process is uniformly ergodic and the immediate cost is bounded, a policy that minimizes the long-term expected average cost also has an nth-stage sample-path cost that, with probability one, is asymptotically less than the nth-stage sample-path cost under any other non-optimal stationary policy with a larger expected average cost. This is a strengthening, in the Markov model case, of the a.s. asymptotically optimal property frequently discussed in the literature.

11.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and by characterizing the value function through the Bellman equation.
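For orientation, the classical Dynkin formula for a Markov process X with extended generator A, which the paper extends to the possibly non-Markov process induced by a history-dependent policy, reads (for suitable f in the domain of the generator):

```latex
\mathbb{E}_x\!\left[ f(X_t) \right] \;=\; f(x) \;+\; \mathbb{E}_x\!\left[ \int_0^t (\mathcal{A} f)(X_s)\,\mathrm{d}s \right].
```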

12.
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of Asymptotically Stationary (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithm of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion. This work was supported in part through United States-Israel Binational Science Foundation Grant BSF 85-00306.
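A strongly consistent, count-based estimator of the unknown transition probabilities, the basic certainty-equivalence ingredient of such adaptive schemes, can be sketched as below. The exploration mechanism needed to make the resulting policies optimal, as in the cited algorithm [3], is deliberately omitted.

```python
import numpy as np
from collections import defaultdict

# Count-based empirical estimator of P(s'|s,a); consistent on every (s,a) pair
# that is visited infinitely often along the controlled trajectory.
counts = defaultdict(lambda: defaultdict(int))

def record(s, a, s_next):
    """Log one observed transition."""
    counts[(s, a)][s_next] += 1

def estimate(s, a, n_states):
    """Empirical distribution over next states; uniform before any visit to (s, a)."""
    total = sum(counts[(s, a)].values())
    if total == 0:
        return np.full(n_states, 1.0 / n_states)
    return np.array([counts[(s, a)][j] / total for j in range(n_states)])
```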

13.
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies using an approach based on renewal theory. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.

14.
In the theory and applications of Markov decision processes introduced by Howard and subsequently developed by many authors, it is assumed that actions can be chosen independently at each state. A policy constrained Markov decision process is one where selecting a given action in one state restricts the choice of actions in another. This note describes a method for determining a maximal gain policy in the policy constrained case. The method involves the use of bounds on the gain of the feasible policies to produce a policy ranking list. This list then forms a basis for a bounded enumeration procedure which yields the optimal policy.

15.
This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations stated solely in terms of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to the randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, but only in terms of the primitive state set I, formulations involving a redefinition of the states are examined. The author would like to thank two referees for a very thorough and helpful refereeing of the original article and for the extra references (Refs. 47–52) now added to the original reference list.

16.
Discrete time countable state Markov decision processes with finite decision sets and bounded costs are considered. Conditions are given under which an unbounded solution to the average cost optimality equation exists and yields an optimal stationary policy. A new form of the optimality equation is derived for the case in which every stationary policy gives rise to an ergodic Markov chain.
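A standard computational counterpart of the average cost optimality equation is relative value iteration, sketched below on a small random (hence finite) instance with bounded costs; the countable-state conditions that the paper is actually about are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative finite truncation of a countable-state model (assumed data).
nS, nA = 8, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
C = rng.uniform(size=(nS, nA))                   # bounded costs

# Relative value iteration: iterate h <- T h - (T h)(s0); the subtracted constant
# converges to the optimal average cost g in the optimality equation
#   g + h(s) = min_a [ c(s, a) + sum_s' P(s'|s, a) h(s') ].
h = np.zeros(nS)
for _ in range(2000):
    Th = (C + np.einsum("sat,t->sa", P, h)).min(axis=1)
    g, h = Th[0], Th - Th[0]
policy = np.argmin(C + np.einsum("sat,t->sa", P, h), axis=1)
print("average cost g approx.:", round(g, 4), "stationary policy:", policy)
```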

17.
We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone and the one-stage reward submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. In addition, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
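To see the threshold form concretely, the sketch below runs finite-horizon value iteration for a simple discrete-time admission-control queue with made-up parameters (a loose stand-in for the arrival-control application, without the paper's coupling and submodularity machinery) and prints the resulting accept/reject decision by queue length.

```python
import numpy as np

# Illustrative single-queue admission control (all parameters are assumptions for the sketch):
N, p, q = 10, 0.5, 0.6                        # buffer size, arrival prob., service-completion prob. per slot
r, c, gamma, horizon = 5.0, 1.0, 0.95, 200    # admission reward, holding cost, discount, horizon

def after_decision(s1, V):
    """Expected holding cost plus discounted continuation once the queue holds s1 customers this slot."""
    if s1 == 0:
        return gamma * V[0]
    served = -c * (s1 - 1) + gamma * V[s1 - 1]
    not_served = -c * s1 + gamma * V[s1]
    return q * served + (1 - q) * not_served

V = np.zeros(N + 1)
for _ in range(horizon):
    V_new = np.empty_like(V)
    for s in range(N + 1):
        reject = after_decision(s, V)
        if s < N:
            admit = r + after_decision(s + 1, V)
            V_new[s] = p * max(admit, reject) + (1 - p) * reject
        else:
            V_new[s] = reject                 # buffer full: an arrival must be rejected
    V = V_new

# The monotonicity theory predicts a threshold: admit while the queue is short, reject once it is long.
policy = [int(s < N and r + after_decision(s + 1, V) >= after_decision(s, V)) for s in range(N + 1)]
print("admit decision by queue length 0..N:", policy)
```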

18.
Military medical planners must develop dispatching policies that dictate how aerial medical evacuation (MEDEVAC) units are utilized during major combat operations. The objective of this research is to determine how to optimally dispatch MEDEVAC units in response to 9-line MEDEVAC requests to maximize MEDEVAC system performance. A discounted, infinite horizon Markov decision process (MDP) model is developed to examine the MEDEVAC dispatching problem. The MDP model allows the dispatching authority to accept, reject, or queue incoming requests based on a request’s classification (i.e., zone and precedence level) and the state of the MEDEVAC system. A representative planning scenario based on contingency operations in southern Afghanistan is utilized to investigate the differences between the optimal dispatching policy and three practitioner-friendly myopic policies. Two computational experiments are conducted to examine the impact of selected MEDEVAC problem features on the optimal policy and the system performance measure. Several excursions are examined to identify how the 9-line MEDEVAC request arrival rate and the MEDEVAC flight speeds impact the optimal dispatching policy. Results indicate that dispatching MEDEVAC units considering the precedence level of requests and the locations of busy MEDEVAC units increases the performance of the MEDEVAC system. These results inform the development and implementation of MEDEVAC tactics, techniques, and procedures by military medical planners. Moreover, an analysis of solution approaches for the MEDEVAC dispatching problem reveals that the policy iteration algorithm substantially outperforms the linear programming algorithms executed by CPLEX 12.6 with regard to computational effort. This result supports the claim that policy iteration remains the superlative solution algorithm for exactly solving computationally tractable Markov decision problems.
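The comparison between policy iteration and linear programming refers to the standard exact algorithms for discounted MDPs; a generic discounted policy-iteration sketch on a small random MDP follows. The MEDEVAC state space of zones, precedence levels, and unit availability is not modelled here, and all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Generic discounted policy iteration on a small random MDP (stand-in data only).
nS, nA, gamma = 20, 3, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
R = rng.uniform(size=(nS, nA))

policy = np.zeros(nS, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = P[np.arange(nS), policy]
    r_pi = R[np.arange(nS), policy]
    v = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    # Policy improvement: act greedily with respect to v.
    q = R + gamma * np.einsum("sat,t->sa", P, v)
    new_policy = np.argmax(q, axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("discounted optimal policy:", policy)
```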

19.
Continuous-time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.

20.
In this paper we combine the ideas of the ‘power steady model’, the ‘discount factor’ and the ‘power prior’ for a general class of filter models, more specifically within the class of dynamic generalized linear models (DGLMs). We show an optimality property of our proposed method and present a particle filter algorithm for DGLMs as an alternative to Markov chain Monte Carlo methods. We also present two applications: one on dynamic Poisson models for hurricane count data in the Atlantic Ocean, and the other on a dynamic Poisson regression model for longitudinal count data.
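As a rough illustration of particle filtering for a dynamic Poisson model, the sketch below runs a bootstrap filter on simulated counts, using a plain random-walk evolution for the log-intensity rather than the paper's power steady / discount-factor evolution; every parameter and the simulated data are assumptions.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(6)

# Dynamic Poisson model (simplified): latent log-intensity theta_t follows a random walk,
# observed counts y_t are Poisson(exp(theta_t)).
T, n_particles, sigma = 50, 2000, 0.15
true_theta = np.cumsum(rng.normal(0, sigma, T)) + np.log(5.0)   # simulated truth
y = rng.poisson(np.exp(true_theta))                             # simulated hurricane-like counts

particles = rng.normal(np.log(5.0), 0.5, n_particles)
filtered_mean = np.empty(T)
for t in range(T):
    particles = particles + rng.normal(0, sigma, n_particles)   # propagate through the state equation
    weights = poisson.pmf(y[t], np.exp(particles))               # weight by the Poisson likelihood
    weights /= weights.sum()
    filtered_mean[t] = np.sum(weights * np.exp(particles))       # filtered estimate of the intensity
    idx = rng.choice(n_particles, size=n_particles, p=weights)   # multinomial resampling
    particles = particles[idx]

print("last observed count:", y[-1], "filtered intensity:", round(filtered_mean[-1], 2))
```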
