Similar Literature
1.
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces. But the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework which exploits the factorization of the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, where nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm exploiting the structure of a GMDP, whose complexity only grows quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action space we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases provides a better approximation of the value function of approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems that can arise in many different domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.
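The factored structure described here is easy to state in code. Below is a minimal sketch of a GMDP — not the MF-API algorithm itself — in which each node's transition and reward depend only on its neighbours' states and its own local action, together with Monte Carlo evaluation of a local policy. All function names and the kernel/reward/policy signatures are illustrative assumptions.

```python
def sample_next_state(state, action, neighbours, local_kernel):
    # GMDP factored transition: node i's next value depends only on the
    # current values of its neighbours (which include i) and on action[i]
    return [local_kernel(i, tuple(state[j] for j in neighbours[i]), action[i])
            for i in range(len(state))]

def evaluate_local_policy(policy, local_reward, neighbours, local_kernel,
                          s0, gamma=0.95, horizon=200, runs=100):
    # Monte Carlo policy evaluation; the global reward is a sum of local
    # terms, so the model description grows linearly with the graph size
    n, total = len(s0), 0.0
    for _ in range(runs):
        s, disc, ret = list(s0), 1.0, 0.0
        for _ in range(horizon):
            a = [policy(i, tuple(s[j] for j in neighbours[i])) for i in range(n)]
            ret += disc * sum(local_reward(i, tuple(s[j] for j in neighbours[i]), a[i])
                              for i in range(n))
            disc *= gamma
            s = sample_next_state(s, a, neighbours, local_kernel)
        total += ret
    return total / runs
```

A local policy here maps a node's neighbourhood state to that node's action, matching the restricted class of local policies over which MF-API searches.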

2.
Decision-theoretic troubleshooting is one of the areas to which Bayesian networks can be applied. Given a probabilistic model of a malfunctioning man-made device, the task is to construct a repair strategy with minimal expected cost. The problem has received considerable attention over the past two decades. Efficient solution algorithms have been found for simple cases, whereas other variants have been proven NP-complete. We study several variants of the problem found in the literature, and prove that computing approximate troubleshooting strategies is NP-hard. In the proofs, we exploit a close connection to set-covering problems.
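For the simplest variant alluded to above — a single fault, with independent repair actions of known fault probabilities p_i and costs c_i — the classical efficient solution is to repair in decreasing order of p_i/c_i. A sketch under that single-fault assumption (parameter names are illustrative; the NP-hardness results concern richer variants):

```python
def expected_cost(order, p, c):
    # single-fault model: exactly one component is faulty;
    # p[i] = prob. that component i is the fault, c[i] = repair cost of i
    remaining, total = 1.0, 0.0
    for i in order:
        total += c[i] * remaining   # we pay c[i] only if still unfixed
        remaining -= p[i]
    return total

def greedy_order(p, c):
    # classical result: sorting by p_i / c_i descending is optimal here
    return sorted(range(len(p)), key=lambda i: p[i] / c[i], reverse=True)
```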

3.
In this paper we consider a homotopy deformation approach to solving Markov decision process problems by the continuous deformation of a simpler Markov decision process problem until it is identical with the original problem. Algorithms and performance bounds are given.
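The deformation idea can be illustrated with a straight-line homotopy between the two MDPs, warm-starting value iteration along the path. This is an illustrative sketch only, not the authors' specific algorithm or bounds:

```python
import numpy as np

def value_iteration(P, R, gamma, v0, tol=1e-8):
    # P: (A, S, S) transition kernels, R: (A, S) rewards
    v = v0.copy()
    while True:
        v_new = (R + gamma * np.einsum('asx,x->as', P, v)).max(axis=0)
        if np.abs(v_new - v).max() < tol:
            return v_new
        v = v_new

def homotopy_solve(P0, R0, P1, R1, gamma, steps=10):
    # deform the simple MDP (P0, R0) into the target (P1, R1); a convex
    # combination of stochastic matrices is stochastic, so every
    # intermediate problem is a valid MDP, solved from a warm start
    v = np.zeros(P0.shape[1])
    for t in np.linspace(0.0, 1.0, steps + 1):
        P = (1 - t) * P0 + t * P1
        R = (1 - t) * R0 + t * R1
        v = value_iteration(P, R, gamma, v)
    return v
```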

4.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure for determining an optimal policy relative to a procedure for determining a suboptimal policy based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise. This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.

5.
This study presents a learning automata-based harmony search (LAHS) for unconstrained optimization of continuous problems. The performance of the harmony search (HS) algorithm strongly depends on the fine tuning of its parameters, including the harmony consideration rate (HMCR), pitch adjustment rate (PAR) and bandwidth (bw). Inspired by the spur-in-time responses in the musical improvisation process, learning capabilities are employed in the HS to select these parameters based on spontaneous reactions. An extensive numerical investigation is conducted on several well-known test functions, and the results are compared with the HS algorithm and its prominent variants, including the improved harmony search (IHS), global-best harmony search (GHS) and self-adaptive global-best harmony search (SGHS). The numerical results indicate that the LAHS is more efficient in finding optimum solutions and outperforms the existing HS algorithm variants.
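For reference, the core HS improvisation step that LAHS tunes is sketched below; HMCR, PAR and bw are fixed here, whereas LAHS selects them adaptively via learning automata. A minimal sketch with assumed parameter names:

```python
import random

def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05, iters=2000):
    # minimise f over box bounds; hm is the harmony memory of hms vectors
    dim = len(bounds)
    hm = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    costs = [f(x) for x in hm]
    for _ in range(iters):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if random.random() < hmcr:            # memory consideration
                xj = random.choice(hm)[j]
                if random.random() < par:         # pitch adjustment
                    xj += random.uniform(-bw, bw) * (hi - lo)
            else:                                 # random selection
                xj = random.uniform(lo, hi)
            new.append(min(hi, max(lo, xj)))
        worst = max(range(hms), key=costs.__getitem__)
        c = f(new)
        if c < costs[worst]:                      # replace the worst harmony
            hm[worst], costs[worst] = new, c
    best = min(range(hms), key=costs.__getitem__)
    return hm[best], costs[best]
```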

6.
This note presents a technique that is useful for the study of piecewise deterministic Markov decision processes (PDMDPs) with general policies and unbounded transition intensities. This technique produces an auxiliary PDMDP from the original one. The auxiliary PDMDP possesses certain desired properties, which may not be possessed by the original PDMDP. We apply this technique to risk-sensitive PDMDPs with total cost criteria, and comment on its connection with the uniformization technique.
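Since the note comments on the connection with uniformization, here is the standard uniformization construction for a finite intensity matrix — a sketch only; the auxiliary-PDMDP construction itself is more delicate and handles unbounded intensities, which plain uniformization does not:

```python
import numpy as np

def uniformize(Q, rate=None):
    # standard uniformization: for a conservative intensity matrix Q
    # (rows summing to 0) and rate >= max exit rate, P = I + Q/rate is a
    # stochastic matrix whose jump chain mimics the continuous-time process
    if rate is None:
        rate = np.max(-np.diag(Q))
    return np.eye(Q.shape[0]) + Q / rate, rate
```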

7.
In this paper we exploit a superharmonic property of a sequence of functions generated by an algorithm to show that these functions converge in a non-increasing manner to the optimal value function for our problem; bounds are given for the loss of optimality if the computational process is terminated at any iteration. The basic procedure is to add an additional linear term at each iteration, selected by solving a particular optimisation problem, for which primal and dual linear programming formulations are given.
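The paper's LP-based construction is not reproduced here. As a generic point of comparison only, the sketch below uses the standard value-iteration termination bound, which likewise bounds the loss of optimality when the iteration is stopped early: if successive iterates differ by less than ε(1−γ)/(2γ) in sup norm, the greedy policy is ε-optimal.

```python
import numpy as np

def value_iteration_with_bound(P, R, gamma, eps):
    # P: (A, S, S) kernels, R: (A, S) rewards; stops once the greedy
    # policy is guaranteed to lose at most eps (standard VI bound)
    v = np.zeros(P.shape[1])
    while True:
        v_new = (R + gamma * np.einsum('asx,x->as', P, v)).max(axis=0)
        if np.abs(v_new - v).max() < eps * (1 - gamma) / (2 * gamma):
            return v_new
        v = v_new
```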

8.
We introduce a journey planning problem in multi-modal transportation networks under uncertainty. The goal is to find a journey, possibly involving transfers between different transport services, from a given origin to a given destination within a specified time horizon. Due to uncertainty in travel times, the arrival times of transport services at public transport stops are modeled as random variables. If a transfer between two services is rendered unsuccessful, the commuter has to reconsider the remaining path to the destination. The problem is modeled as a Markov decision process in which states are defined as paths in the transport network. The main contribution is a backward induction method that generates an optimal policy for traversing the public transport network in terms of maximizing the probability of reaching the destination in time. By assuming history independence and independence of successful transfers between services we obtain approximate methods for the same problem. Analysis and numerical experiments suggest that while solving the path-dependent model requires the enumeration of all paths from the origin to the destination, the proposed approximations may be useful for practical purposes due to their computational simplicity. In addition to on-time arrival probability, we show how travel and overdue costs can be taken into account, making the model applicable to freight transportation problems.
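Under a history-independence approximation like the one mentioned above, maximizing the on-time arrival probability reduces to finite-horizon backward induction with terminal value 1 at the destination. A simplified sketch over an explicit state space (the paper's exact method works on paths instead; the state/kernel encoding is assumed):

```python
import numpy as np

def max_ontime_probability(P, dest, T):
    # P: (A, S, S) one-step kernels with the destination state absorbing;
    # returns the max probability of being at dest within T steps, plus
    # the time-dependent policy that achieves it
    A, S, _ = P.shape
    v = np.zeros(S)
    v[dest] = 1.0                       # terminal reward: already there
    policy = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):
        q = np.einsum('asx,x->as', P, v)  # q[a, s] = success prob of a at s
        policy[t] = q.argmax(axis=0)
        v = q.max(axis=0)
    return v, policy
```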

9.
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. This formula directly follows from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms by using a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature.
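The paper's estimators come from a perturbation-analysis sensitivity equation, which is not reproduced here. As a simple single-sample-path baseline to compare against, here is a finite-difference estimate with common random numbers on a toy two-state SMDP; the model details are invented for illustration:

```python
import random

def smdp_avg_reward(theta, horizon, seed):
    # toy two-state SMDP: exponential sojourn times whose rates depend on
    # theta, reward accrued at a state-dependent rate; returns avg reward
    rng = random.Random(seed)
    rates = [1.0 + theta, 2.0 - theta]   # valid for |theta| < 1
    rew = [1.0, 3.0]
    s, t, total = 0, 0.0, 0.0
    while t < horizon:
        dt = min(rng.expovariate(rates[s]), horizon - t)
        total += rew[s] * dt
        t += dt
        s = 1 - s                        # alternate between the two states
    return total / horizon

def fd_gradient(theta, horizon=1e5, h=1e-3, seed=42):
    # central finite difference with common random numbers: a crude
    # single-path estimator, not the paper's perturbation-analysis formula
    return (smdp_avg_reward(theta + h, horizon, seed)
            - smdp_avg_reward(theta - h, horizon, seed)) / (2 * h)
```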

10.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
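The LP behind the approximation is the standard occupation-measure formulation of a discounted constrained MDP (in the paper, μ is replaced by the empirical measure μ_n). A finite state/action sketch using scipy; array shapes and names are assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def constrained_mdp_lp(P, c, d, kappa, gamma, mu0):
    # occupation-measure LP: minimise sum_{s,a} c(s,a) x(s,a) subject to
    #   sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu0(s')
    #   sum_{s,a} d(s,a) x(s,a) <= kappa,   x >= 0
    # P: (A, S, S), c and d: (A, S) stage/constraint costs, mu0: (S,)
    A, S, _ = P.shape
    col = lambda s, a: s * A + a
    A_eq = np.zeros((S, S * A))
    for s2 in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[s2, col(s, a)] = float(s == s2) - gamma * P[a, s, s2]
    obj = np.array([c[a, s] for s in range(S) for a in range(A)])
    A_ub = np.array([[d[a, s] for s in range(S) for a in range(A)]])
    res = linprog(obj, A_ub=A_ub, b_ub=[kappa], A_eq=A_eq, b_eq=mu0)
    x = res.x.reshape(S, A)   # optimal occupation measure
    return res.fun, x         # optimal policy: pi(a|s) proportional to x[s, a]
```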

11.
Mobile communication technologies enable truck drivers to keep abreast of changing traffic conditions in real-time. We assume that such communication capability exists for a single vehicle traveling from a known origin to a known destination where certain arcs en route are congested, perhaps as the result of an accident. Further, we know the likelihood, as a function of congestion duration, that congested arcs will become uncongested and thus less costly to traverse. Using a Markov decision process, we then model and analyze the problem of constructing a minimum expected total cost route from an origin to a destination that anticipates and then responds to changes in congestion, if they occur, while the vehicle is en route. We provide structural results and illustrate the behavior of an optimal policy with several numerical examples and demonstrate the superiority of an optimal anticipatory policy, relative to a route design approach that reflects the reactive nature of current routing procedures.
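The anticipatory policy can be computed by value iteration on a product state (current node, congestion status), with the congestion status evolving according to the known clearing probabilities. A dictionary-based sketch, assuming the goal is reachable under every relevant policy (state encoding and input format are invented for illustration):

```python
def route_value_iteration(actions, is_goal, tol=1e-9):
    # actions[s] = list of (expected_cost, {s2: prob}) pairs, one per
    # available arc/wait decision in state s = (node, congestion_status);
    # goal states are terminal with value 0
    v = {s: 0.0 for s in actions}
    while True:
        delta = 0.0
        for s, acts in actions.items():
            if is_goal(s):
                continue
            best = min(cost + sum(p * v[s2] for s2, p in nxt.items())
                       for cost, nxt in acts)
            delta = max(delta, abs(best - v[s]))
            v[s] = best          # Gauss-Seidel update
        if delta < tol:
            return v
```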

12.
Optimization, 2012, 61(5): 651–670
Optimality problems in infinite horizon, discrete time, vector criterion Markov and semi-Markov decision processes are expressed as standard problems of multiobjective linear programming. Processes with discounting, absorbing processes and completely ergodic processes without discounting are investigated. The common properties and special structure of the derived multiobjective linear programming problems are overviewed. Computational simplicities associated with these problems in comparison with general multiobjective linear programming problems are discussed. Methods for solving these problems are reviewed and simple numerical examples are given.

13.
Decision makers often need a performance guarantee that holds with some sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state space. We give an example for the reliability of the satellite system.
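A finite-horizon, undiscounted special case shows the structure of the probability criterion: augment the state with the reward still needed and run backward induction. The paper treats the discounted case; integer rewards are assumed here so the memoized recursion stays finite, and all names are illustrative.

```python
from functools import lru_cache

def max_threshold_prob(P, r, T, s0, target):
    # P[s][a] = {s2: prob}, r[s][a] = integer reward; returns the maximal
    # probability that total (undiscounted) reward over T steps >= target
    @lru_cache(maxsize=None)
    def v(t, s, need):
        if need <= 0:
            return 1.0        # target already met
        if t == T:
            return 0.0        # out of stages without reaching the target
        return max(sum(p * v(t + 1, s2, need - r[s][a])
                       for s2, p in P[s][a].items())
                   for a in P[s])
    return v(0, s0, target)
```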

14.
We consider a variation of the hypercube model in which there are N distinguishable servers and R types of customers. Customers that find all servers busy (blocked customers) are lost. When service times are exponentially distributed and customers arrive according to independent Poisson streams, we show that the policy which always assigns customers to the fastest available server minimizes the long-run average number of lost customers. Furthermore, we derive an upper bound for the blocking probability and the long-run average number of customers lost.
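The optimality of the fastest-available-server rule is easy to probe by simulation under the abstract's assumptions (Poisson arrivals, exponential services, blocked customers lost). Because service is memoryless, it suffices to track busy servers' completion times at arrival epochs. A sketch with assumed parameter names:

```python
import random

def loss_fraction(rates, lam, horizon, fastest=True, seed=0):
    # N distinguishable exponential servers with given service rates;
    # Poisson(lam) arrivals; customers finding all servers busy are lost
    rng = random.Random(seed)
    t, lost, total = 0.0, 0, 0
    busy = {}                        # server index -> completion time
    while t < horizon:
        t += rng.expovariate(lam)    # jump to the next arrival epoch
        for i, done in list(busy.items()):
            if done <= t:            # release servers that have finished
                del busy[i]
        total += 1
        idle = [i for i in range(len(rates)) if i not in busy]
        if not idle:
            lost += 1
            continue
        srv = max(idle, key=lambda i: rates[i]) if fastest else rng.choice(idle)
        busy[srv] = t + rng.expovariate(rates[srv])
    return lost / total
```

Comparing `fastest=True` against `fastest=False` over long horizons illustrates the claimed superiority of the fastest-available-server policy.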

15.
16.
Structural properties of stochastic dynamic programs are essential to understanding the nature of the solutions and to deriving appropriate approximation techniques. We concentrate on a class of multidimensional Markov decision processes and derive sufficient conditions for the monotonicity of the value functions. We illustrate our result in the case of the multiproduct batch dispatch (MBD) problem.
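Monotonicity claims of this kind can be checked numerically on a toy instance. Below is a single-product analogue of the batch dispatch problem, with all costs and dynamics invented for illustration: the computed value function is nondecreasing in the queue length, in line with the kind of structural result derived in the paper for the multidimensional case.

```python
import numpy as np

def batch_dispatch_values(Q=20, K=5, hold=1.0, disp=10.0, p=0.6,
                          gamma=0.95, iters=2000):
    # state s = number of waiting items (capped at Q); each period either
    # wait (holding cost per item) or dispatch a batch of up to K items at
    # fixed cost disp; one new item arrives with probability p
    v = np.zeros(Q + 1)
    for _ in range(iters):
        nv = np.empty_like(v)
        for s in range(Q + 1):
            ev = lambda s2: p * v[min(s2 + 1, Q)] + (1 - p) * v[s2]
            wait = hold * s + gamma * ev(s)
            batch = disp + hold * max(s - K, 0) + gamma * ev(max(s - K, 0))
            nv[s] = min(wait, batch)
        v = nv
    return v

v = batch_dispatch_values()
assert all(v[s] <= v[s + 1] for s in range(len(v) - 1))  # monotone in s
```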

17.
18.
This paper presents a self-adaptive global best harmony search (SGHS) algorithm for solving continuous optimization problems. In the proposed SGHS algorithm, a new improvisation scheme is developed so that the good information captured in the current global best solution can be well utilized to generate new harmonies. The harmony memory consideration rate (HMCR) and pitch adjustment rate (PAR) are dynamically adapted by the proposed learning mechanisms. The distance bandwidth (BW) is dynamically adjusted to favor exploration in the early stages and exploitation during the final stages of the search process. Extensive computational simulations and comparisons are carried out by employing a set of 16 benchmark problems from the literature. The computational results show that the proposed SGHS algorithm is more effective in finding better solutions than the state-of-the-art harmony search (HS) variants.
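The bandwidth schedule described — exploration early, exploitation late — is commonly implemented as a linear decay over the first half of the run followed by a constant minimum. A sketch of that rule (whether SGHS uses exactly this form is an assumption here):

```python
def bandwidth(t, iters, bw_max, bw_min):
    # linearly shrink the pitch-adjustment bandwidth during the first half
    # of the search, then hold it at bw_min to refine around good harmonies
    if t < iters / 2:
        return bw_max - (bw_max - bw_min) * (2.0 * t / iters)
    return bw_min
```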

19.
1. Introduction. Weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1-6] and so on. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we…

20.
In this paper we study the problem of personnel planning in care-at-home facilities. We model the system as a Markov decision process, which leads to a high-dimensional control problem. We study monotonicity properties of the system and derive structural results for the optimal policy. Based on these insights, we propose a trunk reservation heuristic to control the system. We provide numerical evidence that the heuristic yields close to optimal performance, and scales well for large problem instances.
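Trunk reservation admits low-priority demand only while enough capacity remains free, protecting slots for high-priority demand. A minimal sketch of such an admission rule; the threshold would in practice be tuned, e.g. against the MDP solution:

```python
def admit(free_slots, high_priority, reserve=2):
    # trunk reservation: high-priority requests are admitted whenever any
    # capacity is free; low-priority requests only while more than
    # `reserve` slots remain free
    if free_slots <= 0:
        return False
    return high_priority or free_slots > reserve
```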
