1.
Edilson F. Arruda 《Operations Research Letters》2011,39(3):193-197
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.
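The reduction behind this kind of construction can be sketched with a standard censoring identity (notation ours, for illustration only; the paper's actual time aggregation also accumulates rewards and sojourn times along the excursions): if, under a fixed decision rule, the states are partitioned into controllable states C and uncontrollable states U, then

\[
P = \begin{pmatrix} P_{CC} & P_{CU} \\ P_{UC} & P_{UU} \end{pmatrix},
\qquad
\tilde{P} = P_{CC} + P_{CU}\,(I - P_{UU})^{-1} P_{UC},
\]

where \tilde{P} is the transition matrix of the chain watched only on C, and (I - P_{UU})^{-1} exists whenever U is left with probability one. The transformed MDP described in the abstract lives on C alone.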
2.
The Markov decision process is studied under the maximization of the probability that total discounted rewards exceed a target level. We focus on and study the dynamic programming equations of the model. We give various properties of the optimal return operator and, for the infinite planning-horizon model, we characterize the optimal value function as a maximal fixed point of this operator. Various turnpike results relating the finite and infinite-horizon models are also given.
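For orientation, one common way such target-level problems are formalized (a sketch in our notation, not necessarily the paper's): augment the state x with the remaining target s, and for discount factor β ∈ (0,1), reward r(x,a) and transition law p(y|x,a) define the optimal return operator

\[
(Tv)(x,s) \;=\; \max_{a \in A(x)} \sum_{y} p(y \mid x,a)\, v\!\Big(y,\ \frac{s - r(x,a)}{\beta}\Big),
\]

so that the quantity of interest, \sup_{\pi} P_{\pi}\big(\sum_{t \ge 0} \beta^{t} r(x_t,a_t) \ge s \mid x_0 = x\big), is a fixed point of T; the abstract's characterization of the infinite-horizon value as the maximal fixed point of the operator is of exactly this type.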
3.
Yonghui Huang Xianping Guo 《European Journal of Operational Research》2011,212(1):131-140
This paper investigates finite horizon semi-Markov decision processes with denumerable states. Optimality is over the class of all randomized history-dependent policies, which may depend on the states as well as the planning horizon, and the cost rate function is assumed to be bounded below. Under suitable conditions, we show that the value function is a minimum nonnegative solution to the optimality equation and that there exists an optimal policy. Moreover, we develop an effective algorithm for computing optimal policies, derive some properties of optimal policies, and illustrate our main results with a maintenance system.
4.
Mean, variance, and probabilistic criteria in finite Markov decision processes: A review
D. J. White 《Journal of Optimization Theory and Applications》1988,56(1):1-29
This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations in terms solely of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to the randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, in terms of the primitive state set I, formulations involving a redefinition of the states are examined. The author would like to thank two referees for a very thorough and helpful refereeing of the original article and for the extra references (Refs. 47–52) now added to the original reference list.
5.
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. This formula follows directly from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms that use a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature.
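The discrete-time analogue that these algorithms extend is easy to sketch in code (an illustrative sketch only, based on the standard potential/realization-factor approach rather than the paper's SMDP formula; the chain, reward vector and parameterization are placeholders): for an ergodic chain with transition matrix P(θ), average reward η = πf and potentials g, the gradient is dη/dθ = π (dP/dθ) g, and every ingredient can be estimated from one sample path.

import numpy as np

def single_path_gradient(P, dP, f, path_len=100_000, trunc=50, seed=0):
    """Estimate d(eta)/d(theta) = pi' * dP * g from one sample path.

    P, dP : (n, n) transition matrix and its derivative with respect to theta.
    f     : (n,) reward vector.
    trunc : truncation horizon used to estimate the potentials g.
    """
    rng = np.random.default_rng(seed)
    n = len(f)
    x = np.empty(path_len, dtype=int)
    x[0] = 0
    for t in range(path_len - 1):                   # simulate one sample path
        x[t + 1] = rng.choice(n, p=P[x[t]])

    eta = f[x].mean()                               # average-reward estimate

    # Potentials g(i) ~ E[ sum_{k<trunc} (f(X_k) - eta) | X_0 = i ], truncated.
    centered = f[x] - eta
    csum = np.concatenate(([0.0], np.cumsum(centered)))
    g_sum, g_cnt = np.zeros(n), np.zeros(n)
    for t in range(path_len - trunc):
        g_sum[x[t]] += csum[t + trunc] - csum[t]
        g_cnt[x[t]] += 1
    g = np.divide(g_sum, g_cnt, out=np.zeros(n), where=g_cnt > 0)

    # Gradient d(eta)/d(theta): path average of (dP @ g) evaluated at X_t.
    return (dP @ g)[x[:path_len - trunc]].mean()

The truncation horizon trades bias against variance; the algorithms summarized in the abstract obtain the analogous estimates for semi-Markov models directly from a single continuous-time sample path.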
6.
This paper focuses on solving a finite horizon semi-Markov decision process with multiple constraints. We convert the problem to a constrained absorbing discrete-time Markov decision process and then to an equivalent linear program over a class of occupancy measures. The existence, characterization and computation of constrained-optimal policies are established under suitable conditions. An example is given to demonstrate our results.
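For readers unfamiliar with the occupancy-measure reformulation, here is a finite state-action sketch of the kind of linear program involved (our notation; the paper works with the absorbing transformation of the finite-horizon problem, but the shape of the LP is the same): with initial distribution ν, substochastic kernel p, cost c and constraint costs d_k with bounds κ_k,

\[
\begin{aligned}
\min_{\mu \ge 0}\ & \sum_{x,a} c(x,a)\,\mu(x,a) \\
\text{s.t.}\ & \sum_{a} \mu(y,a) = \nu(y) + \sum_{x,a} p(y \mid x,a)\,\mu(x,a) \quad \forall y, \\
& \sum_{x,a} d_k(x,a)\,\mu(x,a) \le \kappa_k, \quad k = 1,\dots,K,
\end{aligned}
\]

where μ(x,a) is the expected number of visits to the pair (x,a) before absorption; a constrained-optimal (possibly randomized) policy is then recovered by conditioning, π(a|x) ∝ μ(x,a).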
7.
《Optimization》2012,61(9):1133-1150
This article presents a new method of linear programming (LP) for solving Markov decision processes (MDPs) based on the simplex method (SM). SM has been shown to be the most efficient method in many practical problems; unfortunately, the classical SM has exponential complexity. Therefore, new SMs have emerged for obtaining optimal solutions in the most efficient way. The cosine simplex method (CSM) is one of them. CSM is based on the Karush-Kuhn-Tucker conditions and is able to efficiently solve general LP problems. This work presents a new method, named the Markov Cosine Simplex Method (MCSM), for solving MDP problems, which is an extension of CSM. In this article, the efficiency of MCSM is compared to that of the traditional revised simplex method (RSM); experimental results show that MCSM is far more efficient than RSM.
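The linear program that such simplex variants operate on is easy to state and to test numerically. Below is a minimal sketch for the discounted case using SciPy's generic LP solver; it illustrates the LP being solved, not the cosine pivoting rule itself, and all problem data are made up.

import numpy as np
from scipy.optimize import linprog

# Toy discounted MDP with made-up data: n states, m actions, discount gamma.
n, m, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n), size=(n, m))   # P[s, a, :] = transition probabilities
R = rng.uniform(size=(n, m))                 # R[s, a]    = one-step rewards

# Primal LP:  minimize sum_s v(s)  subject to  v(s) >= R[s,a] + gamma * P[s,a,:] @ v
# for all (s, a).  Rearranged for linprog's A_ub @ v <= b_ub form:
#   gamma * P[s,a,:] @ v - v(s) <= -R[s,a].
A_ub, b_ub = [], []
for s in range(n):
    for a in range(m):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
v_star = res.x                               # optimal value function of the toy MDP
policy = np.argmax(R + gamma * np.einsum("sat,t->sa", P, v_star), axis=1)
print(v_star, policy)                        # greedy policy w.r.t. v_star is optimal

Replacing the call to linprog by a specialized pivoting scheme such as CSM or MCSM is where the efficiency gains reported in the article would enter.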
8.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μn of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
9.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.
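The optimality equation referred to here has, in the semi-Markov average-cost setting, the familiar form (a sketch in our notation, with one-step cost c(x,a), expected sojourn time τ(x,a) and embedded transition law p(y|x,a)):

\[
h(x) \;=\; \min_{a \in A(x)} \Big\{ c(x,a) - g\,\tau(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \Big\}, \qquad x \in S,
\]

where g is the optimal average cost per unit time and h is a relative value function; a stationary policy attaining the minimum in every state is average-cost optimal. The paper's contribution is to verify that a finite solution (g, h) exists under the recurrence and cost condition described above.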
10.
《Optimization》2012,61(2-3):271-283
This paper presents a new class of Markov decision processes: continuous time shock Markov decision processes, which model Markovian controlled systems sequentially shocked by their environment. Between two adjacent shocks, the system can be modeled by a continuous time Markov decision process. At each shock, however, the system's parameters change and an instantaneous state transition occurs. After presenting the model, we prove that the optimality equation, which consists of countably many equations, has a unique solution in some function space Ω.
11.
K.D. Glazebrook 《Stochastic Processes and their Applications》1979,9(1):19-33
A large class of continuous parameter jump decision processes is considered. Pontryagin's Maximum Principle is used to derive a necessary condition for optimality. An optimal strategy may frequently be obtained explicitly.
12.
This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (referred to collectively as the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary Markov control processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given.
13.
M. Sun 《Journal of Optimization Theory and Applications》1993,79(2):405-413
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.
14.
D. J. White 《Mathematical Methods of Operations Research》1995,41(1):71-88
In this paper we use an approach based on a superharmonic property of a sequence of functions generated by an algorithm to show that these functions converge in a non-increasing manner to the optimal value function for our problem, and bounds are given for the loss of optimality if the computational process is terminated at any iteration. The basic procedure is to add an additional linear term at each iteration, selected by solving a particular optimisation problem, for which primal and dual linear programming formulations are given.
15.
Norihiro Mizuno 《Operations Research Letters》1985,3(6):307-311
This paper develops an efficient LP algorithm for solving single chain undiscounted Markov decision problems. The algorithm imposes, in the framework of the simplex method, the multiple-choice constraints that exactly one basic variable be chosen from each Markov state. It is proved that the algorithm converges to an optimal solution in a finite number of steps.
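For reference, the unichain average-reward LP that such per-state constraints refer to can be sketched as follows (our notation, not necessarily the paper's exact formulation); keeping exactly one basic variable y(x, a) per state corresponds to a deterministic decision rule:

\[
\begin{aligned}
\max_{y \ge 0}\ & \sum_{x,a} r(x,a)\, y(x,a) \\
\text{s.t.}\ & \sum_{a} y(x,a) = \sum_{x',a} p(x \mid x',a)\, y(x',a) \quad \forall x, \\
& \sum_{x,a} y(x,a) = 1,
\end{aligned}
\]

where y(x,a) is the long-run frequency of the state-action pair (x,a).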
16.
Norihiro Mizuno 《Operations Research Letters》1983,2(3):127-133
We show that the LP formulation for an undiscounted multi-chain Markov decision problem can be put in a block upper-triangular form by a polynomial time procedure. Each minimal block (after an appropriate dynamic revision) gives rise to a single-chain Markov decision problem which can be treated independently. An optimal solution to each single-chain problem can be connected by auxiliary dual programs to obtain an optimal solution to a multi-chain problem.
17.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.
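As background, the average-value-at-risk (conditional value-at-risk) of a cost X at level α ∈ (0,1) admits the standard Rockafellar-Uryasev representation (a definitional sketch; how the paper embeds it into the dynamic setting with state-action dependent discount factors is not captured by this one line):

\[
\mathrm{AVaR}_{\alpha}(X) \;=\; \min_{z \in \mathbb{R}} \Big\{ z + \tfrac{1}{1-\alpha}\, \mathbb{E}\big[(X - z)^{+}\big] \Big\},
\]

and dynamic formulations commonly augment the state with the running threshold z, which is one route in the literature to deterministic stationary optimal policies on an enlarged state space.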
18.
《Operations Research Letters》2021,49(1):55-61
This note presents a technique that is useful for the study of piecewise deterministic Markov decision processes (PDMDPs) with general policies and unbounded transition intensities. This technique produces an auxiliary PDMDP from the original one. The auxiliary PDMDP possesses certain desired properties, which may not be possessed by the original PDMDP. We apply this technique to risk-sensitive PDMDPs with total cost criteria, and comment on its connection with the uniformization technique.
19.
20.
We present a linear programming (LP) formulation of MDPs with countable state and action spaces and no unichain assumption. This extends the Hordijk and Kallenberg (1979) formulation for finite state and action spaces. We provide sufficient conditions for both the existence of optimal solutions to the primal LP and the absence of a duality gap. Then, the existence of a (possibly randomized) average optimal policy is also guaranteed. The existence of a stationary average optimal deterministic policy is also investigated.
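For reference, the finite state-and-action version of the Hordijk and Kallenberg (1979) LP being extended reads as follows (a sketch in our notation; the countable-space extension replaces the finite sums by the appropriate summable sequences): with initial weights β_j > 0 summing to one,

\[
\begin{aligned}
\max_{x,\,y \ge 0}\ & \sum_{i,a} r(i,a)\, x(i,a) \\
\text{s.t.}\ & \sum_{a} x(j,a) - \sum_{i,a} p(j \mid i,a)\, x(i,a) = 0 \quad \forall j, \\
& \sum_{a} x(j,a) + \sum_{a} y(j,a) - \sum_{i,a} p(j \mid i,a)\, y(i,a) = \beta_j \quad \forall j,
\end{aligned}
\]

with no unichain assumption; the x-variables play the role of stationary state-action frequencies while the y-variables account for the transient behaviour across the different recurrent classes.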