Similar Documents
20 similar documents found.
1.
2.
Iwamoto recently established a formal transformation, via an invariant imbedding, to construct a controlled Markov chain that can be solved in a backward manner, as in backward induction for finite-horizon Markov decision processes (MDPs), from a given controlled Markov chain with a non-additive, forward recursive objective function criterion. Chang et al. presented formal methods, called "parallel rollout" and "policy switching," for combining given multiple policies in MDPs and showed that the policies generated by both methods improve upon all of the policies that the methods combine. This brief paper extends the methods of parallel rollout and policy switching to forward recursive objective function criteria and shows that a similar improvement property holds as in MDPs. We further discuss how to implement these methods via simulation.
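As a rough illustration of the policy-switching idea (not the paper's forward recursive setting), the sketch below combines two fixed base policies on a small, made-up finite MDP with an ordinary discounted criterion: in each state, both base policies are evaluated by Monte Carlo rollouts and the action of the better one is applied. All model data, the horizon, and the rollout count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite MDP (not from the paper): 4 states, 2 actions.
n_states, n_actions, gamma, horizon = 4, 2, 0.95, 30
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # immediate rewards

# Two fixed base policies (state -> action), assumed given.
base_policies = [np.zeros(n_states, dtype=int), np.ones(n_states, dtype=int)]

def rollout_value(policy, s, n_rollouts=200):
    """Monte Carlo estimate of the finite-horizon discounted return of `policy` from state s."""
    total = 0.0
    for _ in range(n_rollouts):
        state, discount = s, 1.0
        for _ in range(horizon):
            a = policy[state]
            total += discount * R[state, a]
            state = rng.choice(n_states, p=P[state, a])
            discount *= gamma
    return total / n_rollouts

def policy_switching_action(s):
    """Policy switching: in state s, act according to the base policy with the best rollout estimate."""
    best = max(base_policies, key=lambda pi: rollout_value(pi, s))
    return best[s]

print("action chosen in state 0:", policy_switching_action(0))
```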

3.
We computationally assess policies for the elevator control problem by a new column-generation approach for the linear programming method for discounted infinite-horizon Markov decision problems. By analyzing the optimality of given actions in given states, we were able to provably improve the well-known nearest-neighbor policy. Moreover, with the method we could identify an optimal parking policy. This approach can be used to detect and resolve weaknesses in particular policies for Markov decision problems.
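For context, this is the standard linear-programming formulation of a discounted MDP that such a column-generation approach builds on; the sketch below solves it directly on a tiny random MDP. The elevator model itself, and the column-generation machinery needed for its huge state space, are not reproduced, and all problem data are made up.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Illustrative small MDP (a stand-in for the elevator model): 5 states, 3 actions.
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition probabilities
R = rng.uniform(size=(nS, nA))                  # R[s, a] immediate rewards

# LP for discounted MDPs: minimize sum_s v(s) subject to
#   v(s) >= R(s, a) + gamma * sum_{s'} P(s'|s, a) v(s')   for all (s, a).
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0                    # -(v(s) - gamma * P v) <= -R(s, a)
        A_ub.append(row)
        b_ub.append(-R[s, a])
res = linprog(c=np.ones(nS), A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(None, None)] * nS)
v = res.x

# An action is optimal in a state exactly when its constraint is tight, which is what
# "analyzing the optimality of given actions in given states" exploits.
greedy = np.argmax(R + gamma * np.einsum("sat,t->sa", P, v), axis=1)
print("optimal values:", np.round(v, 3), "greedy policy:", greedy)
```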

4.
System dynamics has been seen primarily as a strategic tool, most effectively used at the highest level of strategy to identify robust policy interventions under a wide range of scenarios. However, an alternative, complementary and powerful role is emerging. This is at an ‘intermediate level’ in organisations to coordinate and integrate policies across the value chain. It is at this level where business value, as defined by the discounted value of future free cash flow, is both created and destroyed. This paper introduces the need for ‘intermediate-level’ and ‘value-based’ modelling and emphasises the natural role of system dynamics in supporting a methodology to fulfil the need. It describes the development of an approach and its application in the oil industry to coordinate the response of people and tools within operational, financial and commercial functions across the value chain to address a variety of problems and issues.
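The paper's notion of business value is the discounted value of future free cash flow; a minimal numeric sketch of that definition follows, with entirely hypothetical cash flows and discount rate.

```python
# Hypothetical cash-flow projection (illustrative numbers, not from the paper).
free_cash_flows = [12.0, 15.0, 18.0, 20.0, 22.0]   # forecast free cash flow per year
discount_rate = 0.10                                # assumed cost of capital

# Business value as the discounted value of future free cash flow.
value = sum(cf / (1.0 + discount_rate) ** (t + 1) for t, cf in enumerate(free_cash_flows))
print(round(value, 2))
```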

5.
We establish posterior consistency for parametric, partially observed, fully dominated Markov models. The prior is assumed to assign positive probability to all neighborhoods of the true parameter, for a distance induced by the expected Kullback–Leibler divergence between the Markov transition densities of the parametric family members. This assumption is easily checked in general. In addition, we show that posterior consistency is implied by the consistency of the maximum likelihood estimator. The result is extended to possibly improper priors and non-stationary observations. Finally, we check our assumptions on a linear Gaussian model and a well-known stochastic volatility model.

6.
This paper investigates an optimal inspection and replacement problem for a discrete-time Markovian deterioration system. It is assumed that the system is monitored incompletely by a certain mechanism which gives the decision maker some information about the exact state of the system. The problem is to obtain an optimal inspection and replacement policy minimizing the expected total discounted cost over an infinite horizon, and it is formulated as a partially observable Markov decision process. Furthermore, under some reasonable conditions reflecting the practical meaning of the deterioration, it is shown that there exists an optimal inspection and replacement policy in the class of monotonic four-region policies.
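In such partially observable formulations, the incomplete monitoring is folded into a belief state via Bayes' rule. A minimal sketch of that update step is shown below, with a made-up three-state deterioration matrix and observation model; the paper's actual data and its four-region policy structure are not reproduced.

```python
import numpy as np

# Illustrative deterioration model (assumed data): 3 condition states and an
# imperfect monitor that reports a noisy signal about the true state.
P_deteriorate = np.array([[0.7, 0.2, 0.1],     # row s -> distribution of next state (no replacement)
                          [0.0, 0.6, 0.4],
                          [0.0, 0.0, 1.0]])
O = np.array([[0.8, 0.15, 0.05],               # O[s, o]: probability of observing signal o in state s
              [0.1, 0.7, 0.2],
              [0.05, 0.15, 0.8]])

def belief_update(belief, observation):
    """One step of the Bayes filter that turns the POMDP into an MDP over beliefs."""
    predicted = belief @ P_deteriorate           # push the belief through the deterioration dynamics
    posterior = predicted * O[:, observation]    # weight by the likelihood of the observed signal
    return posterior / posterior.sum()

b = np.array([1.0, 0.0, 0.0])                    # start in the best condition with certainty
print(belief_update(b, observation=1))
```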

7.
This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (together referred to as the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary controlled Markov processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e., strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε-optimal (ε > 0) Markov policies is given.
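A rolling-horizon policy of the kind analysed here solves a finite-horizon problem at each decision epoch and applies only its first action. The sketch below shows that mechanic on a small, truncated finite MDP with made-up data; the paper's denumerable-state, unbounded-reward setting, its terminal-reward sequences, and its error bound are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative truncated finite MDP (assumed data), undiscounted H-step criterion.
nS, nA, H = 6, 2, 10
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
R = rng.uniform(size=(nS, nA))

def rolling_horizon_action(s, terminal_reward=np.zeros(nS)):
    """Solve an H-step problem by backward induction and apply only its first-stage action."""
    v = terminal_reward.copy()
    policy_first = None
    for _ in range(H):
        q = R + np.einsum("sat,t->sa", P, v)     # one-step lookahead on the current value-to-go
        policy_first = np.argmax(q, axis=1)
        v = q.max(axis=1)
    return policy_first[s]

print("rolling-horizon action in state 0:", rolling_horizon_action(0))
```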

8.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend use of this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
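For reference, generic Howard-style policy iteration for a unichain average-reward MDP looks like the sketch below, on random illustrative data. The paper's specific modification, which restricts the improvement step to unichain policies in a merely communicating model, is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative MDP with made-up data: 4 states, 2 actions.
nS, nA = 4, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
R = rng.uniform(size=(nS, nA))

def evaluate(policy):
    """Gain g and bias h of a unichain policy: solve g + h(s) = r(s) + sum_s' P(s,s') h(s'), with h(0) = 0."""
    P_pi = P[np.arange(nS), policy]              # nS x nS transition matrix of the policy
    r_pi = R[np.arange(nS), policy]
    A = np.zeros((nS, nS))
    A[:, 0] = 1.0                                # column 0 carries the gain g (h(0) is pinned to 0)
    A[:, 1:] = np.eye(nS)[:, 1:] - P_pi[:, 1:]
    x = np.linalg.solve(A, r_pi)
    return x[0], np.concatenate(([0.0], x[1:]))

policy = np.zeros(nS, dtype=int)
while True:
    g, h = evaluate(policy)
    q = R + np.einsum("sat,t->sa", P, h)         # improvement step using the bias h
    incumbent = q[np.arange(nS), policy]
    new_policy = np.where(np.isclose(incumbent, q.max(axis=1)), policy, np.argmax(q, axis=1))
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("optimal average reward:", round(g, 4), "policy:", policy)
```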

9.
In the steady state of a discrete-time Markov decision process, we consider the problem of finding an optimal randomized policy that minimizes the variance of the one-transition reward among the policies whose mean reward is not less than a specified value. The problem is solved by introducing a parametric Markov decision process with the average cost criterion. It is shown that there exists an optimal policy which is a mixture of at most two pure policies. As an application, the toymaker's problem is discussed.

10.
We study a unichain Markov decision process, i.e. a controlled Markov process whose state process under a stationary policy is an ergodic Markov chain. Here the state and action spaces are assumed to be either finite or countable. When the state process is uniformly ergodic and the immediate cost is bounded, a policy that minimizes the long-term expected average cost also has an nth-stage sample-path cost that, with probability one, is asymptotically less than the nth-stage sample-path cost under any other non-optimal stationary policy with a larger expected average cost. This is a strengthening, in the Markov model case, of the a.s. asymptotically optimal property frequently discussed in the literature.

11.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and by characterizing the value function through the Bellman equation.
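For orientation, the classical Dynkin formula for a Markov process X with extended generator A, which the paper extends to the possibly non-Markov process induced by a history-dependent policy, reads (for suitable f in the domain of the generator):

```latex
\mathbb{E}_x\!\left[ f(X_t) \right] \;=\; f(x) \;+\; \mathbb{E}_x\!\left[ \int_0^t (\mathcal{A} f)(X_s)\,\mathrm{d}s \right].
```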

12.
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of Asymptotically Stationary (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithm of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion. This work was supported in part through United States-Israel Binational Science Foundation Grant BSF 85-00306.
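A strongly consistent, count-based estimator of the unknown transition probabilities, the basic certainty-equivalence ingredient of such adaptive schemes, can be sketched as below. The exploration mechanism needed to make the resulting policies optimal, as in the cited algorithm [3], is deliberately omitted.

```python
import numpy as np
from collections import defaultdict

# Count-based empirical estimator of P(s'|s,a); consistent on every (s,a) pair
# that is visited infinitely often along the controlled trajectory.
counts = defaultdict(lambda: defaultdict(int))

def record(s, a, s_next):
    """Log one observed transition."""
    counts[(s, a)][s_next] += 1

def estimate(s, a, n_states):
    """Empirical distribution over next states; uniform before any visit to (s, a)."""
    total = sum(counts[(s, a)].values())
    if total == 0:
        return np.full(n_states, 1.0 / n_states)
    return np.array([counts[(s, a)][j] / total for j in range(n_states)])
```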

13.
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies using an approach based on renewal theory. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.

14.
In the theory and applications of Markov decision processes introduced by Howard and subsequently developed by many authors, it is assumed that actions can be chosen independently at each state. A policy constrained Markov decision process is one where selecting a given action in one state restricts the choice of actions in another. This note describes a method for determining a maximal gain policy in the policy constrained case. The method involves the use of bounds on the gain of the feasible policies to produce a policy ranking list. This list then forms a basis for a bounded enumeration procedure which yields the optimal policy.

15.
This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations stated solely in terms of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to the randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, but only in terms of the primitive state set I, formulations involving a redefinition of the states are examined. The author would like to thank two referees for a very thorough and helpful refereeing of the original article and for the extra references (Refs. 47–52) now added to the original reference list.

16.
Discrete time countable state Markov decision processes with finite decision sets and bounded costs are considered. Conditions are given under which an unbounded solution to the average cost optimality equation exists and yields an optimal stationary policy. A new form of the optimality equation is derived for the case in which every stationary policy gives rise to an ergodic Markov chain.
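A standard computational counterpart of the average cost optimality equation is relative value iteration, sketched below on a small random (hence finite) instance with bounded costs; the countable-state conditions that the paper is actually about are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative finite truncation of a countable-state model (assumed data).
nS, nA = 8, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
C = rng.uniform(size=(nS, nA))                   # bounded costs

# Relative value iteration: iterate h <- T h - (T h)(s0); the subtracted constant
# converges to the optimal average cost g in the optimality equation
#   g + h(s) = min_a [ c(s, a) + sum_s' P(s'|s, a) h(s') ].
h = np.zeros(nS)
for _ in range(2000):
    Th = (C + np.einsum("sat,t->sa", P, h)).min(axis=1)
    g, h = Th[0], Th - Th[0]
policy = np.argmin(C + np.einsum("sat,t->sa", P, h), axis=1)
print("average cost g approx.:", round(g, 4), "stationary policy:", policy)
```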

17.
We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone and the one-stage reward submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. In addition, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
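To see the threshold form concretely, the sketch below runs finite-horizon value iteration for a simple discrete-time admission-control queue with made-up parameters (a loose stand-in for the arrival-control application, without the paper's coupling and submodularity machinery) and prints the resulting accept/reject decision by queue length.

```python
import numpy as np

# Illustrative single-queue admission control (all parameters are assumptions for the sketch):
N, p, q = 10, 0.5, 0.6                        # buffer size, arrival prob., service-completion prob. per slot
r, c, gamma, horizon = 5.0, 1.0, 0.95, 200    # admission reward, holding cost, discount, horizon

def after_decision(s1, V):
    """Expected holding cost plus discounted continuation once the queue holds s1 customers this slot."""
    if s1 == 0:
        return gamma * V[0]
    served = -c * (s1 - 1) + gamma * V[s1 - 1]
    not_served = -c * s1 + gamma * V[s1]
    return q * served + (1 - q) * not_served

V = np.zeros(N + 1)
for _ in range(horizon):
    V_new = np.empty_like(V)
    for s in range(N + 1):
        reject = after_decision(s, V)
        if s < N:
            admit = r + after_decision(s + 1, V)
            V_new[s] = p * max(admit, reject) + (1 - p) * reject
        else:
            V_new[s] = reject                 # buffer full: an arrival must be rejected
    V = V_new

# The monotonicity theory predicts a threshold: admit while the queue is short, reject once it is long.
policy = [int(s < N and r + after_decision(s + 1, V) >= after_decision(s, V)) for s in range(N + 1)]
print("admit decision by queue length 0..N:", policy)
```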

18.
Military medical planners must develop dispatching policies that dictate how aerial medical evacuation (MEDEVAC) units are utilized during major combat operations. The objective of this research is to determine how to optimally dispatch MEDEVAC units in response to 9-line MEDEVAC requests to maximize MEDEVAC system performance. A discounted, infinite horizon Markov decision process (MDP) model is developed to examine the MEDEVAC dispatching problem. The MDP model allows the dispatching authority to accept, reject, or queue incoming requests based on a request’s classification (i.e., zone and precedence level) and the state of the MEDEVAC system. A representative planning scenario based on contingency operations in southern Afghanistan is utilized to investigate the differences between the optimal dispatching policy and three practitioner-friendly myopic policies. Two computational experiments are conducted to examine the impact of selected MEDEVAC problem features on the optimal policy and the system performance measure. Several excursions are examined to identify how the 9-line MEDEVAC request arrival rate and the MEDEVAC flight speeds impact the optimal dispatching policy. Results indicate that dispatching MEDEVAC units considering the precedence level of requests and the locations of busy MEDEVAC units increases the performance of the MEDEVAC system. These results inform the development and implementation of MEDEVAC tactics, techniques, and procedures by military medical planners. Moreover, an analysis of solution approaches for the MEDEVAC dispatching problem reveals that the policy iteration algorithm substantially outperforms the linear programming algorithms executed by CPLEX 12.6 with regard to computational effort. This result supports the claim that policy iteration remains the superlative solution algorithm for exactly solving computationally tractable Markov decision problems.
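The comparison between policy iteration and linear programming refers to the standard exact algorithms for discounted MDPs; a generic discounted policy-iteration sketch on a small random MDP follows. The MEDEVAC state space of zones, precedence levels, and unit availability is not modelled here, and all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Generic discounted policy iteration on a small random MDP (stand-in data only).
nS, nA, gamma = 20, 3, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']
R = rng.uniform(size=(nS, nA))

policy = np.zeros(nS, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = P[np.arange(nS), policy]
    r_pi = R[np.arange(nS), policy]
    v = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    # Policy improvement: act greedily with respect to v.
    q = R + gamma * np.einsum("sat,t->sa", P, v)
    new_policy = np.argmax(q, axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("discounted optimal policy:", policy)
```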

19.
Continuous-time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.

20.
In this paper we combine the ideas of the ‘power steady model’, the ‘discount factor’ and the ‘power prior’ for a general class of filter models, more specifically within the class of dynamic generalized linear models (DGLMs). We show an optimality property of our proposed method and present a particle filter algorithm for DGLMs as an alternative to Markov chain Monte Carlo methods. We also present two applications: one on dynamic Poisson models for hurricane count data in the Atlantic Ocean, and the other on a dynamic Poisson regression model for longitudinal count data.
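As a rough illustration of particle filtering for a dynamic Poisson model, the sketch below runs a bootstrap filter on simulated counts, using a plain random-walk evolution for the log-intensity rather than the paper's power steady / discount-factor evolution; every parameter and the simulated data are assumptions.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(6)

# Dynamic Poisson model (simplified): latent log-intensity theta_t follows a random walk,
# observed counts y_t are Poisson(exp(theta_t)).
T, n_particles, sigma = 50, 2000, 0.15
true_theta = np.cumsum(rng.normal(0, sigma, T)) + np.log(5.0)   # simulated truth
y = rng.poisson(np.exp(true_theta))                             # simulated hurricane-like counts

particles = rng.normal(np.log(5.0), 0.5, n_particles)
filtered_mean = np.empty(T)
for t in range(T):
    particles = particles + rng.normal(0, sigma, n_particles)   # propagate through the state equation
    weights = poisson.pmf(y[t], np.exp(particles))               # weight by the Poisson likelihood
    weights /= weights.sum()
    filtered_mean[t] = np.sum(weights * np.exp(particles))       # filtered estimate of the intensity
    idx = rng.choice(n_particles, size=n_particles, p=weights)   # multinomial resampling
    particles = particles[idx]

print("last observed count:", y[-1], "filtered intensity:", round(filtered_mean[-1], 2))
```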
