Similar Documents
20 similar documents found.
1.
In this paper we consider a homotopy deformation approach to solving Markov decision process problems by the continuous deformation of a simpler Markov decision process problem until it is identical with the original problem. Algorithms and performance bounds are given.
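The abstract gives no implementation details; the sketch below is one plausible reading of such a continuation scheme, linearly interpolating the transition and reward data of a simple MDP toward the target MDP and warm-starting value iteration at each step. All function names and the linear deformation path are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, v0=None, tol=1e-8):
    """Solve v(s) = max_a [ r(s,a) + gamma * sum_t P[a,s,t] v(t) ]."""
    v = np.zeros(r.shape[0]) if v0 is None else v0.copy()
    while True:
        q = r + gamma * np.einsum("ast,t->sa", P, v)   # Q-values, shape (S, A)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new

def homotopy_solve(P0, r0, P1, r1, steps=10, gamma=0.9):
    """Deform the simple MDP (P0, r0) into the target (P1, r1) along a
    convex path, reusing each solution as the next warm start."""
    v = None
    for k in range(steps + 1):
        t = k / steps
        P = (1 - t) * P0 + t * P1      # interpolated transition kernel
        r = (1 - t) * r0 + t * r1      # interpolated rewards
        v, policy = value_iteration(P, r, gamma=gamma, v0=v)
    return v, policy
```

Warm-starting along the path is what one would hope buys a speedup over solving the target problem cold; the paper's performance bounds would quantify this.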

2.
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
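The truncation technique itself is standard; below is a minimal finite-state sketch, assuming arrays indexed by action and state and a particular choice of folding leaked probability mass onto the boundary state, which the paper does not prescribe.

```python
import numpy as np

def truncate_mdp(P, r, N):
    """Restrict a large state space to {0, ..., N}. Transition mass that
    leaks past state N is folded back onto the boundary state N; how to
    redistribute that mass is a modeling choice, not the paper's recipe."""
    Pt = P[:, : N + 1, : N + 1].copy()      # P has shape (A, S, S)
    leak = 1.0 - Pt.sum(axis=2)             # escaped probability, shape (A, N+1)
    Pt[:, :, N] += leak
    return Pt, r[: N + 1, :]                # r has shape (S, A)
```

Solving the truncated models for increasing N and watching the optimal values stabilize gives an empirical counterpart to the paper's error bound.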

3.
Optimization, 2012, 61(4):773-800
Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth-and-death system to illustrate our conditions, and provide an example in which the risk-sensitive average optimality strict inequality occurs.
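For orientation, the risk-sensitive average cost criterion being optimized has the standard form below (the exact assumptions on the cost and transition rates are the paper's; the display is a common textbook formulation, not a quote):

```latex
J(x,\pi) \;=\; \limsup_{T\to\infty} \frac{1}{T}\,
  \log \mathbb{E}^{\pi}_{x}\!\left[\exp\!\Big(\int_{0}^{T} c(x_t, a_t)\,dt\Big)\right],
\qquad
J^{*}(x) \;=\; \inf_{\pi} J(x,\pi).
```

The exponential inside the expectation is what makes the criterion risk-sensitive: large excursions of the accumulated cost are penalized more heavily than under the ordinary average.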

4.
This paper studies the synthesis of controllers for discrete-time, continuous-state stochastic systems subject to omega-regular specifications using finite-state abstractions. Omega-regular properties allow specifying complex behaviors and encompass, for example, linear temporal logic. First, we present a synthesis algorithm for minimizing or maximizing the probability that a discrete-time switched stochastic system with a finite number of modes satisfies an omega-regular property. Our approach relies on a finite-state abstraction of the underlying dynamics in the form of a Bounded-parameter Markov Decision Process arising from a finite partition of the system's domain. Such Markovian abstractions allow for a range of transition probabilities between states for each selected action representing a mode of the original system. Our method is built upon an analysis of the Cartesian product between the abstraction and a Deterministic Rabin Automaton encoding the specification of interest or its complement. Specifically, we show that synthesis can be decomposed into a qualitative problem, where the so-called greatest permanent winning components of the product automaton are computed, and a quantitative problem, which requires maximizing the probability of reaching this component in the worst-case instantiation of the transition intervals. Additionally, we propose a quantitative metric for measuring the quality of the designed controller with respect to the continuous abstracted states and devise a specification-guided domain partition refinement heuristic with the objective of reaching a user-defined optimality target. Next, we present a method for computing control policies for stochastic systems with a continuous set of available inputs. In this case, the system is assumed to be affine in input and disturbance, and we derive a technique for solving the qualitative and quantitative problems in the resulting finite-state abstractions of such systems. For this, we introduce a new type of abstraction called Controlled Interval-valued Markov Chains. Specifically, we show that the greatest permanent winning components of such abstractions are found by appropriately partitioning the continuous input space in order to generate a bounded-parameter Markov decision process that accounts for all possible qualitative transitions between the finite set of states. Then, the problem of maximizing the probability of reaching these components is cast as a (possibly non-convex) optimization problem over the continuous set of available inputs. A metric of quality for the synthesized controller and a partition refinement scheme are described for this framework as well. Finally, we present a detailed case study.
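The quantitative step, maximizing worst-case reachability over interval-valued transitions, is implementable with a standard order-based minimization. Below is a minimal sketch for a finite bounded-parameter MDP; the array conventions are assumptions, and the paper's full pipeline additionally involves the Rabin product and partition refinement.

```python
import numpy as np

def worst_case_expectation(lo, up, v):
    """min of p . v over distributions with lo <= p <= up and sum(p) = 1:
    pile as much mass as the intervals allow onto low-value successors."""
    p, slack = lo.copy(), 1.0 - lo.sum()
    for j in np.argsort(v):
        add = min(up[j] - lo[j], slack)
        p[j] += add
        slack -= add
    return p @ v

def imdp_max_reach(lo, up, target, iters=500):
    """Maximize over actions the pessimistic probability of reaching `target`.
    lo, up: shape (A, S, S) lower/upper transition bounds; target: bool (S,)."""
    A, S, _ = lo.shape
    v = target.astype(float)
    for _ in range(iters):
        q = np.array([[worst_case_expectation(lo[a, s], up[a, s], v)
                       for a in range(A)] for s in range(S)])
        v = np.where(target, 1.0, q.max(axis=1))
    return v
```

In the paper's terms, `target` would be the greatest permanent winning component of the product automaton rather than a raw state set.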

5.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure relative to a procedure for determining a suboptimal policy based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise.

This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.

6.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
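The LP over discounted occupation measures that underlies this approach is classical for finite models; here is a sketch of that finite-state analogue (the paper works on Borel spaces with an empirical measure, which this does not capture).

```python
import numpy as np
from scipy.optimize import linprog

def constrained_mdp_lp(P, c, d, kappa, beta, gamma=0.95):
    """min sum x(s,a) c(s,a)  over occupation measures x >= 0, subject to
    sum_a x(s',a) - gamma * sum_{s,a} P[a,s,s'] x(s,a) = beta(s')   (flow)
    sum_{s,a} x(s,a) d(s,a) <= kappa                                (constraint)."""
    A, S, _ = P.shape
    idx = lambda s, a: s * A + a
    A_eq = np.zeros((S, S * A))
    for sp in range(S):
        for a in range(A):
            A_eq[sp, idx(sp, a)] += 1.0
            for s in range(S):
                A_eq[sp, idx(s, a)] -= gamma * P[a, s, sp]
    res = linprog(c.reshape(-1), A_ub=d.reshape(1, -1), b_ub=[kappa],
                  A_eq=A_eq, b_eq=beta, bounds=[(0, None)] * (S * A),
                  method="highs")
    x = res.x.reshape(S, A)
    policy = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)  # randomized stationary
    return res.fun, policy
```

Replacing the true kernel by one built from n samples of μ and re-solving this LP is the estimation step whose error the paper's concentration inequality controls.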

7.
This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations in terms solely of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to the randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, in terms of the primitive state set I, formulations involving a redefinition of the states are examined.

The author would like to thank two referees for a very thorough and helpful refereeing of the original article and for the extra references (Refs. 47–52) now added to the original reference list.

8.
Optimization, 2012, 61(5):651-670
Optimality problems in infinite-horizon, discrete-time, vector-criterion Markov and semi-Markov decision processes are expressed as standard problems of multiobjective linear programming. Processes with discounting, absorbing processes, and completely ergodic processes without discounting are investigated. The common properties and special structure of the derived multiobjective linear programming problems are reviewed. Computational simplifications associated with these problems, in comparison with general multiobjective linear programming problems, are discussed. Methods for solving these problems are reviewed and simple numerical examples are given.
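For the discounted case, the kind of multiobjective linear program that results is, in standard occupation-measure notation, the following generic sketch (the paper's exact formulations for the semi-Markov and ergodic cases differ in detail):

```latex
\begin{aligned}
\operatorname{vmax}\quad & \Big(\textstyle\sum_{s,a} x(s,a)\,r_1(s,a),\ \dots,\
                           \textstyle\sum_{s,a} x(s,a)\,r_k(s,a)\Big) \\
\text{s.t.}\quad & \textstyle\sum_{a} x(s',a)
  \;-\; \alpha \sum_{s,a} p(s' \mid s,a)\,x(s,a) \;=\; \beta(s') \quad \forall s', \\
                 & x(s,a) \;\ge\; 0,
\end{aligned}
```

where vmax denotes vector maximization (finding efficient points). The computational simplifications mentioned in the abstract plausibly stem from this flow-like constraint structure.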

9.
It is known that the value function in an unconstrained Markov decision process with finitely many states and actions is a piecewise rational function in the discount factor α, and that the value function can be expressed as a Laurent series expansion about α = 1 for α close enough to 1. We show in this paper that this property also holds for the value function of Markov decision processes with additional constraints. More precisely, we show by a constructive proof that there are numbers 0 = α_0 < α_1 < ... < α_{m−1} < α_m = 1 such that for every j = 1, 2, ..., m − 1 either the problem is not feasible for all discount factors in the open interval (α_{j−1}, α_j) or the value function is a rational function in α in the closed interval [α_{j−1}, α_j]. As a consequence, if the constrained problem is feasible in the neighborhood of α = 1, then the value function has a Laurent series expansion about α = 1. Our proof technique for the constrained case also provides a new proof for the unconstrained case.
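For reference, the classical unconstrained expansion the abstract alludes to reads (Puterman-style notation, assumed here):

```latex
v_{\alpha} \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, y_{n},
\qquad \rho \;=\; \frac{1-\alpha}{\alpha},
```

with y_{−1} the gain (average-reward) term and y_0 the bias; the paper's contribution is that the constrained value function admits the same kind of expansion whenever the problem is feasible near α = 1.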

10.
Mobile communication technologies enable truck drivers to keep abreast of changing traffic conditions in real-time. We assume that such communication capability exists for a single vehicle traveling from a known origin to a known destination where certain arcs en route are congested, perhaps as the result of an accident. Further, we know the likelihood, as a function of congestion duration, that congested arcs will become uncongested and thus less costly to traverse. Using a Markov decision process, we then model and analyze the problem of constructing a minimum expected total cost route from an origin to a destination that anticipates and then responds to changes in congestion, if they occur, while the vehicle is en route. We provide structural results and illustrate the behavior of an optimal policy with several numerical examples and demonstrate the superiority of an optimal anticipatory policy, relative to a route design approach that reflects the reactive nature of current routing procedures.
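A toy version of such a model is easy to write down: state = (current node, congestion flag on one monitored arc), with the flag clearing each period with probability q. Everything below — the graph encoding, the single monitored arc, the clearing dynamics — is a hypothetical illustration, not the paper's model.

```python
import numpy as np

def solve_route(adj, base_cost, cong_arc, cong_extra, q, dest, iters=500):
    """V[v, c] = minimum expected cost-to-go from node v when the monitored
    arc is uncongested (c=0) or congested (c=1); congestion clears w.p. q."""
    n = len(adj)
    V = np.zeros((n, 2))
    for _ in range(iters):
        Vn = V.copy()
        for v in range(n):
            if v == dest:
                continue
            for c in (0, 1):
                best = np.inf
                for u in adj[v]:
                    step = base_cost[v, u]
                    if c and (v, u) == cong_arc:
                        step += cong_extra          # traverse while congested
                    # congestion persists w.p. (1 - q), clears w.p. q
                    cont = (1 - q) * V[u, 1] + q * V[u, 0] if c else V[u, 0]
                    best = min(best, step + cont)
                Vn[v, c] = best
        V = Vn
    return V
```

An anticipatory policy can act differently between c = 0 and c = 1, e.g. detouring or lingering on a cheap segment until the congestion clears — the kind of behavior a purely reactive re-routing scheme cannot plan for.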

11.
This paper is concerned with the adaptive control problem, over the infinite horizon, for partially observable Markov decision processes whose transition functions are parameterized by an unknown vector. We treat finite models and impose relatively mild assumptions on the transition function. Provided that a sequence of parameter estimates converging in probability to the true parameter value is available, we show that the certainty equivalence adaptive policy is optimal in the long-run average sense.
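The certainty equivalence principle itself is easy to state in code; the loop below is a generic sketch — the environment API and estimator are placeholders, and the paper's partially observable, finite-model setting imposes conditions this sketch ignores.

```python
def run_certainty_equivalence(env, estimate_theta, solve_model, horizon):
    """Re-estimate the unknown parameter from the history, re-solve the model
    as if the estimate were exact, and act on the resulting policy."""
    history, s = [], env.reset()                 # hypothetical environment API
    for _ in range(horizon):
        theta_hat = estimate_theta(history)      # any consistent estimator
        policy = solve_model(theta_hat)          # optimal policy for the estimate
        a = policy(s)
        s, obs, cost = env.step(a)
        history.append((a, obs, cost))
    return history
```

The paper's result is that, once theta_hat converges in probability to the true parameter, this plug-in policy is optimal for the long-run average criterion.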

12.
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space is comprised solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.
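For a fixed action selection, collapsing the uncontrollable states is a standard censoring computation; the sketch below shows one way to build such a transformed model (the per-action application and the iterative search around it are where the note's contribution lies, and this sketch tracks costs but not holding times).

```python
import numpy as np

def collapse_uncontrollable(P, c, controllable):
    """Given transition matrix P and per-state cost c under one fixed action
    choice, return the induced transition matrix and accumulated expected cost
    between visits to the controllable set (boolean mask `controllable`)."""
    U = ~controllable
    P_CC = P[np.ix_(controllable, controllable)]
    P_CU = P[np.ix_(controllable, U)]
    P_UU = P[np.ix_(U, U)]
    P_UC = P[np.ix_(U, controllable)]
    N = np.linalg.inv(np.eye(U.sum()) - P_UU)   # expected visits among U states
    P_eff = P_CC + P_CU @ N @ P_UC              # first re-entry distribution into C
    c_eff = c[controllable] + P_CU @ N @ c[U]   # cost picked up while in U
    return P_eff, c_eff
```

Applying this per action would yield a transformed MDP on the controllable states of the kind the note describes.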

13.
14.
We consider a service system with a single server, a finite waiting room, and two classes of customers with deterministic service times. Primary jobs arrive at random and are admitted whenever there is room in the system. At the beginning of each period, secondary jobs can be admitted from an infinite pool. A revenue is earned upon admission of each job, with primary jobs contributing more than secondary jobs; the objective is to maximize the total discounted revenue over an infinite horizon. We model the system as a discrete-time Markov decision process and show that a monotone admission policy is optimal when the number of primary arrivals has a fixed distribution. Moreover, when the primary arrival distribution varies with time according to a finite-state Markov chain, we show that the optimal policy is conditionally monotone and that the numerical computation of an optimal policy is, in this case, substantially more difficult than in the case of stationary arrivals.

This research was supported in part by the National Science Foundation, under grant ECS-8803061, while the author was at the University of Arizona.
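A stylized finite version of the stationary-arrivals case can be checked numerically for the monotone (threshold) structure; every modeling detail below beyond the abstract — one service completion per period, the admission order — is an assumption.

```python
import numpy as np

def admission_vi(R_p, R_s, cap, arr_dist, beta=0.95, iters=300):
    """State n = jobs in system; action k = secondary jobs admitted at the
    start of the period; primary arrivals (distribution arr_dist) are then
    admitted while room remains; one deterministic service completes."""
    V = np.zeros(cap + 1)
    for _ in range(iters):
        Vn = np.empty_like(V)
        for n in range(cap + 1):
            best = -np.inf
            for k in range(cap - n + 1):
                val = R_s * k
                for j, pj in enumerate(arr_dist):
                    adm = min(j, cap - n - k)          # primaries that fit
                    nxt = max(n + k + adm - 1, 0)      # after one service
                    val += pj * (R_p * adm + beta * V[nxt])
                best = max(best, val)
            Vn[n] = best
        V = Vn
    return V
```

Monotonicity of the optimal k in n — fewer secondary admissions when the system is fuller — is the structure the paper proves; here it can only be observed empirically.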

15.
This work proposes an algorithm that makes use of partial information to improve the convergence properties of the value iteration algorithm in terms of overall computational complexity. The algorithm iterates on a series of increasingly refined approximate models that converges to the true model at an optimal linear rate, which coincides with the convergence rate of the original value iteration algorithm. The paper investigates the properties of the proposed algorithm and features a series of switchover queue examples that illustrate the efficacy of the method.

16.
This paper presents a novel harmony generation method based on decision-theoretic planning. We are the first to model music generation using Markov decision processes (MDPs). We give a proof of concept for this approach by using MDP planning to generate four-part harmony, given the melody or soprano line. Our initial results show feasibility, and show the variance possible, depending on the choice of reward functions.

17.
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. This formula directly follows from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms that use a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature.

18.
Structural properties of stochastic dynamic programs are essential to understanding the nature of the solutions and in deriving appropriate approximation techniques. We concentrate on a class of multidimensional Markov decision processes and derive sufficient conditions for the monotonicity of the value functions. We illustrate our result in the case of the multiproduct batch dispatch (MBD) problem.
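The flavor of such sufficient conditions is easiest to see in one dimension (a classical result; the paper's multidimensional conditions generalize this pattern): if, for every action a,

```latex
r(s,a)\ \text{is nondecreasing in } s,
\qquad\text{and}\qquad
\sum_{j \ge k} p(j \mid s,a)\ \text{is nondecreasing in } s \ \text{ for every } k,
```

then each value-iteration iterate, and hence the value function, is nondecreasing in s.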

19.
We give two simple axioms that characterize a simple functional form for aggregation of column stochastic matrices (i.e., Markov processes). Several additional observations are made about such aggregation, including the special case in which the aggregated process is Markovian relative to the original one.
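A simple linear aggregation form of this kind — given a partition of the states and fixed conditional weights inside each group — looks as follows (a sketch; the paper's axioms characterize when such a form is forced):

```python
import numpy as np

def aggregate(P, groups, w):
    """P is column stochastic (P[i, j] = prob of moving j -> i). `groups`
    partitions the states; w[j] weights state j within its group (weights in
    each group sum to 1). Returns the aggregated column-stochastic matrix."""
    K = len(groups)
    Q = np.zeros((K, K))
    for J, src in enumerate(groups):
        for I, dst in enumerate(groups):
            Q[I, J] = sum(w[j] * P[np.ix_(dst, [j])].sum() for j in src)
    return Q
```

The special case in which the aggregated process is Markovian relative to the original one corresponds to the classical notion of lumpability of the chain with respect to the partition.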

20.
By adopting a suitable auxiliary transform of Markov operators, we derive new bounds for the first eigenvalue of the generator corresponding to symmetric Markov processes. Our results not only extend related work in the literature, but are also efficiently applied to study the first eigenvalue of birth-death processes with killing and that of elliptic operators with killing on the half line. In particular, we obtain two approximation procedures for the first eigenvalue of birth-death processes with killing, and present qualitatively sharp upper and lower bounds for the first eigenvalue of elliptic operators with killing on the half line.
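The starting point for such bounds is the variational characterization of the first eigenvalue via the Dirichlet form (standard, and stated here only for orientation; killing enters through the form and the boundary condition):

```latex
\lambda_0 \;=\; \inf\Big\{\, \mathcal{E}(f,f) \;:\; f \in \mathcal{D}(\mathcal{E}),\
\|f\|_{L^{2}(\mu)} = 1 \,\Big\}.
```

The paper's auxiliary transform of the Markov operator refines the bounds obtainable from test functions in this formula.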
