Similar Literature
20 similar documents found.
1.
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces. But the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework which exploits the factorization of the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, where nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm exploiting the structure of a GMDP and whose complexity only grows quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action space we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases provides a better approximation of the value function of approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems that can arise in many different domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.
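
A toy sketch of the factorization this abstract describes (all names and parameters are hypothetical, not the paper's code): each node carries one binary state variable and one binary action variable, and its transition and reward depend only on its graph neighbours, so one simulation step costs time linear in the graph size rather than exponential in the number of variables.

```python
import random

class ToyGMDP:
    def __init__(self, neighbours, p_local, r_local):
        self.neighbours = neighbours   # node -> list of neighbouring nodes (incl. itself)
        self.p_local = p_local         # (i, s_neigh, a_i) -> P(next s_i = 1)
        self.r_local = r_local         # (i, s_neigh, a_i) -> local reward

    def step(self, state, action):
        """Sample the next global state node by node: the factored transition
        P(s'|s,a) = prod_i P_i(s'_i | s_N(i), a_i) makes this linear in graph size."""
        nxt, reward = {}, 0.0
        for i, neigh in self.neighbours.items():
            s_neigh = tuple(state[j] for j in neigh)
            nxt[i] = int(random.random() < self.p_local(i, s_neigh, action[i]))
            reward += self.r_local(i, s_neigh, action[i])
        return nxt, reward

# Example: infection spreading along a 3-node line graph; treating a node
# (a_i = 1) halves its infection probability but costs 0.1.
g = ToyGMDP({0: [0, 1], 1: [0, 1, 2], 2: [1, 2]},
            lambda i, sn, a: (0.2 + 0.6 * sum(sn) / len(sn)) * (0.5 if a else 1.0),
            lambda i, sn, a: -sum(sn) / len(sn) - 0.1 * a)
print(g.step({0: 1, 1: 0, 2: 0}, {0: 1, 1: 0, 2: 0}))
```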

2.
Operations Research techniques are usually presented as distinct models. Difficult as it may often be, achieving linkage between these models could reveal their interdependency and make them easier for the user to understand. In this article three different models, namely Markov Chain, Dynamic Programming, and Markov Sequential Decision Processes, are used to solve an inventory problem based on the periodic review system. We show how the three models converge to the same (s, S) policy and we provide a numerical example to illustrate such a convergence.
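
A small numerical sketch of the periodic-review model the three approaches share (all parameter values below are assumptions, not the article's example): value iteration over truncated inventory levels, with a fixed ordering cost K, from which an (s, S) policy can be read off.

```python
K, c, h, p = 5.0, 1.0, 1.0, 4.0       # fixed order cost, unit cost, holding, shortage
gamma = 0.95
demand = {0: 0.3, 1: 0.4, 2: 0.3}     # one-period demand distribution
LO, HI = -5, 10                       # truncated range of inventory positions

def expected_cost(x, y, V):
    """Order up from x to y, pay expected holding/shortage on y - demand, continue."""
    order = (K if y > x else 0.0) + c * (y - x)
    stage = sum(pr * (h * max(y - d, 0) + p * max(d - y, 0)) for d, pr in demand.items())
    future = gamma * sum(pr * V[max(y - d, LO)] for d, pr in demand.items())
    return order + stage + future

V = {x: 0.0 for x in range(LO, HI + 1)}
for _ in range(300):                  # value iteration to (near) convergence
    V = {x: min(expected_cost(x, y, V) for y in range(x, HI + 1))
         for x in range(LO, HI + 1)}

orders = {x: min(range(x, HI + 1), key=lambda y: expected_cost(x, y, V))
          for x in range(LO, HI + 1)}
s = max(x for x in orders if orders[x] > x)   # reorder point: last level that orders
S = orders[s]                                 # order-up-to level
print(f"(s, S) = ({s}, {S})")
```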

3.
1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's, see for instance [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

4.
5.
This paper investigates Factored Markov Decision Processes with Imprecise Probabilities (MDPIPs); that is, Factored Markov Decision Processes (MDPs) where transition probabilities are imprecisely specified. We derive efficient approximate solutions for Factored MDPIPs based on mathematical programming. To do this, we extend previous linear programming approaches for linear approximations in Factored MDPs, resulting in a multilinear formulation for robust “maximin” linear approximations in Factored MDPIPs. By exploiting the factored structure in MDPIPs we are able to demonstrate orders of magnitude reduction in solution time over standard exact non-factored approaches, in exchange for relatively low approximation errors, on a difficult class of benchmark problems with millions of states.
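
A minimal flat (non-factored) illustration of the "maximin" backup behind this abstract, as a sketch: transition probabilities are only known to lie in intervals, and nature picks the worst-case distribution, an inner minimisation solvable as a small LP. The paper's contribution, the factored multilinear formulation, is not reproduced here; names and numbers are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def robust_backup(V, lo, hi, reward, gamma=0.9):
    """One robust ("maximin") Bellman update at a fixed state: maximise over
    actions while nature picks the worst transition law in the credal set
    {P : lo[a] <= P <= hi[a], sum(P) = 1}. Intervals must admit a distribution."""
    best = -np.inf
    for a in range(len(reward)):
        res = linprog(c=gamma * np.asarray(V),              # nature minimises E[gamma * V]
                      A_eq=[np.ones(len(V))], b_eq=[1.0],   # P is a distribution
                      bounds=list(zip(lo[a], hi[a])))
        best = max(best, reward[a] + res.fun)
    return best

# Tiny usage: 2 states, 2 actions, transition probabilities known only to intervals.
V = [1.0, 0.0]
lo = [[0.3, 0.2], [0.5, 0.1]]
hi = [[0.8, 0.7], [0.9, 0.5]]
print(robust_backup(V, lo, hi, reward=[0.0, 0.2]))
```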

6.
Factored Markov Decision Processes (MDPs) provide a compact representation for modeling sequential decision making problems with many variables. Approximate linear programming (LP) is a prominent method for solving factored MDPs. However, it cannot be applied to models with large treewidth due to the exponential number of constraints. This paper proposes a novel and efficient approximate method to represent the exponentially many constraints. We construct an augmented junction graph from the factored MDP, and represent the constraints using a set of cluster constraints and separator constraints, where the cluster constraints play the role of reducing the number of constraints, and the separator constraints enforce the consistency of neighboring clusters so as to improve the accuracy. In the case where the junction graph is tree-structured, our method provides an equivalent representation to the original constraints. In other cases, our method provides a good trade-off between computation and accuracy. Experimental results on different models show that our algorithm performs better than other approximate linear programming algorithms in computational cost or expected reward.
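
For orientation, here is the standard flat-MDP approximate LP that this paper generalises, as a sketch with assumed names: V(s) is approximated by a basis expansion phi(s) @ w, with one Bellman constraint per state-action pair. In a factored MDP that constraint set is exponential in the number of variables, which is exactly what the cluster/separator representation above replaces.

```python
import numpy as np
from scipy.optimize import linprog

def solve_alp(P, R, phi, alpha, gamma=0.9):
    """P[a]: |S|x|S| transitions, R[a]: |S| rewards, phi: |S|x|K| basis matrix,
    alpha: state-relevance weights. Returns basis weights w with V ~= phi @ w."""
    K = phi.shape[1]
    A_ub, b_ub = [], []
    for a in range(len(P)):
        # Bellman constraints phi @ w >= R[a] + gamma * P[a] @ phi @ w, one per (s, a)
        A_ub.append(-(phi - gamma * P[a] @ phi))
        b_ub.append(-np.asarray(R[a]))
    res = linprog(c=np.asarray(alpha) @ phi,        # minimise weighted sum of values
                  A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=[(None, None)] * K)
    return res.x
```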

7.
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space, (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.
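
A quick numerical sketch of property (1) in the unconstrained, finite-state case (a random toy MDP with assumed parameters): finite-horizon discounted value functions converge to the infinite-horizon value as the horizon grows, with sup-norm error contracting by the discount factor each stage.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))   # P[a, s] is a distribution over s'
R = rng.random((A, S))                       # bounded rewards

V = np.zeros(S)
for n in range(1, 61):
    # After this update, V is the optimal n-stage finite-horizon discounted value
    V = np.max(R + gamma * np.einsum('asq,q->as', P, V), axis=0)
    # ||V_n - V_inf|| <= gamma**n * ||V_0 - V_inf||, so the gap vanishes
print(V)                                     # approximates the infinite-horizon value
```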

8.
One of the foremost difficulties in solving Mixed-Integer Nonlinear Programs, either with exact or heuristic methods, is to find a feasible point. We address this issue with a new feasibility pump algorithm tailored for nonconvex Mixed-Integer Nonlinear Programs. Feasibility pumps are algorithms that iterate between solving a continuous relaxation and a mixed-integer relaxation of the original problem. Such approaches currently exist in the literature for Mixed-Integer Linear Programs and convex Mixed-Integer Nonlinear Programs: both cases exhibit the distinctive property that the continuous relaxation can be solved in polynomial time. In nonconvex Mixed-Integer Nonlinear Programming such a property does not hold, and therefore special care has to be exercised in order to allow feasibility pump algorithms to rely only on local optima of the continuous relaxation. Based on a new, high level view of feasibility pump algorithms as a special case of the well-known successive projection method, we show that many possible different variants of the approach can be developed, depending on how several different (orthogonal) implementation choices are taken. A remarkable twist of feasibility pump algorithms is that, unlike most previous successive projection methods from the literature, projection is "naturally" taken in two different norms in the two different subproblems. To cope with this issue while retaining the local convergence properties of standard successive projection methods we propose the introduction of appropriate norm constraints in the subproblems; these actually seem to significantly improve the practical performance of the approach. We present extensive computational results on the MINLPLib, showing the effectiveness and efficiency of our algorithm.
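
A compressed sketch of the basic feasibility-pump loop for a problem {g(x) <= 0, x_j integer for j in int_idx} (hypothetical interface, not the paper's algorithm or its norm constraints). Note that the continuous step below only reaches a local minimiser, which is exactly the complication the abstract raises for the nonconvex case.

```python
import numpy as np
from scipy.optimize import minimize

def feasibility_pump(g, x0, int_idx, bounds, max_iter=50, tol=1e-6):
    x_round = np.asarray(x0, dtype=float)
    x_round[int_idx] = np.round(x_round[int_idx])
    for _ in range(max_iter):
        # Continuous step: project the rounded point onto {g(x) <= 0} in the
        # 2-norm (the two subproblems naturally use different distance notions).
        res = minimize(lambda x: np.sum((x - x_round) ** 2), x_round,
                       constraints=[{'type': 'ineq', 'fun': lambda x: -g(x)}],
                       bounds=bounds)
        x_cont = res.x
        # Integer step: snap the integer coordinates to the nearest lattice point.
        x_new = x_cont.copy()
        x_new[int_idx] = np.round(x_cont[int_idx])
        if np.linalg.norm(x_new - x_cont) < tol:
            return x_new                        # integer feasible with g(x) <= 0
        if np.allclose(x_new, x_round):         # cycling: random flip, a crude
            j = np.random.choice(int_idx)       # stand-in for restart strategies
            x_new[j] += np.random.choice([-1.0, 1.0])
        x_round = x_new
    return None                                 # no feasible point found
```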

9.
The concept of redundancy is accepted in Operations Research and Information Theory. In Linear Programming, a constraint is said to be redundant if the feasible decision space is identical with or without the constraint. In Information Theory, redundancy is used as a measure of the stability against noise in transmission. Analogies with Multi Criteria Decision Making (MCDM) are indicated and it is argued that the redundancy concept should be used as a regular feature in conditioning and analysis of Multi Criteria Programs. Properties of a proposed conflict-based characterisation are stated and some existence results are derived. Redundancy is here intended for interactive methods, when the efficient set is progressively explored. A new redundancy test for the linear case is formulated from the framework. A probabilistic method based on correlation is proposed and tested for the non-linear case. Finally, some general guidelines are given concerning the redundancy problem.
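
For reference, the textbook redundancy test in the linear case (not the paper's conflict-based test) is a single LP per constraint: a_k @ x <= b_k is redundant exactly when maximising a_k @ x over the remaining constraints cannot exceed b_k. A minimal sketch:

```python
import numpy as np
from scipy.optimize import linprog

def is_redundant(A, b, k, tol=1e-9):
    """True iff max a_k @ x subject to the other rows of A x <= b stays <= b_k.
    An unbounded maximum (status != 0) means the constraint is not redundant."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    mask = np.arange(len(b)) != k
    res = linprog(c=-A[k],                      # linprog minimises, so negate
                  A_ub=A[mask], b_ub=b[mask],
                  bounds=[(None, None)] * A.shape[1])
    return res.status == 0 and -res.fun <= b[k] + tol

# Example: x <= 1, y <= 1, x + y <= 3 -- only the third constraint is redundant.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 3.0]
print([is_redundant(A, b, k) for k in range(3)])   # [False, False, True]
```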

10.
We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criteria. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network.

11.
12.
Branching Constraint Satisfaction Problems (BCSPs) model a class of uncertain dynamic resource allocation problems. We describe the features of BCSPs, and show that the associated decision problem is NP-complete. Markov Decision Problems could be used in place of BCSPs, but we show analytically and empirically that, for the class of problems in question, the BCSP algorithms are more efficient than the related MDP algorithms.

13.
This paper deals with Markov Decision Processes (MDPs) on Borel spaces with possibly unbounded costs. The criterion to be optimized is the expected total cost with a random horizon of infinite support. In this paper, it is observed that this performance criterion is equivalent to the expected total discounted cost with an infinite horizon and a varying-time discount factor. Then, the optimal value function and the optimal policy are characterized through some suitable versions of the Dynamic Programming Equation. Moreover, it is proved that the optimal value function of the optimal control problem with a random horizon can be bounded from above by the optimal value function of a discounted optimal control problem with a fixed discount factor. In this case, the discount factor is defined in an adequate way by the parameters introduced for the study of the optimal control problem with a random horizon. To illustrate the theory developed, a version of the Linear-Quadratic model with a random horizon and a Logarithm Consumption-Investment model are presented.
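
A quick numerical check of the equivalence observed above, in the simplest setting (a geometric horizon independent of the costs, an assumption made here purely for illustration): the expected cost up to a random horizon T equals a total discounted cost where the "discount" applied to stage t is the survival probability P(T >= t).

```python
import numpy as np

rng = np.random.default_rng(1)
q = 0.8                                   # survival rate: P(T >= t+1 | T >= t)
costs = rng.random(50)                    # arbitrary bounded costs c_1, ..., c_50

# Monte Carlo under the random horizon T (geometric, so P(T >= t) = q**(t-1))
T = rng.geometric(1 - q, size=500_000)
cum = np.concatenate([[0.0], np.cumsum(costs)])
mc = cum[np.minimum(T, 50)].mean()        # estimates E[c_1 + ... + c_T]

# Varying-discount evaluation: sum_t P(T >= t) * c_t
exact = np.sum(q ** np.arange(50) * costs)
print(mc, exact)                          # agree up to Monte Carlo error
```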

14.
We consider a discrete time Markov Decision Process (MDP) under the discounted payoff criterion in the presence of additional discounted cost constraints. We study the sensitivity of optimal Stationary Randomized (SR) policies in this setting with respect to the upper bound on the discounted cost constraint functionals. We show that such sensitivity analysis leads to an improved version of the Feinberg–Shwartz algorithm (Math Oper Res 21(4):922–945, 1996) for finding optimal policies that are ultimately stationary and deterministic.

15.
Several interactive methods exist to identify nondominated solutions in a Multiple Objective Mixed Integer Linear Program. But what if the Decision Maker is also interested in sorting those solutions (assigning them to pre-established ordinal categories)? We propose an interactive “branch-and-bound like” technique to progressively build the nondominated set, combined with the ELECTRE TRI method (pessimistic procedure) to sort identified nondominated solutions. A disaggregation approach is considered in order to avoid direct definition of all ELECTRE TRI preference parameters. Weight-importance coefficients are inferred and category reference profiles are determined based on assignment examples provided by the Decision Maker. A computational tool was developed with a twofold purpose: support the Decision Maker involved in a decision process and provide a test bed for research purposes.
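
A stripped-down sketch of the ELECTRE TRI pessimistic assignment used above, with crisp concordance only (no discrimination thresholds or vetoes, which the full method includes); the weights, reference profiles, and cutting level lambda below stand in for the parameters the paper infers from assignment examples.

```python
import numpy as np

def outranks(a, b, w, lam):
    """Crisp concordance: a outranks b when the weighted share of criteria
    on which a is at least as good as b reaches the cutting level lam."""
    return np.sum(w * (a >= b)) / np.sum(w) >= lam

def electre_tri_pessimistic(a, profiles, w, lam=0.75):
    """profiles[h] is the lower reference profile of category h+1 (criteria to
    maximise, profiles sorted worst to best). Scan from the best profile down
    and assign to the first category whose lower profile a outranks."""
    for h in range(len(profiles) - 1, -1, -1):
        if outranks(a, np.asarray(profiles[h]), w, lam):
            return h + 1            # categories 0 .. len(profiles), higher is better
    return 0                        # worst category

# Example: 3 criteria, 2 profiles -> 3 ordered categories 0 < 1 < 2.
w = np.array([0.5, 0.3, 0.2])
profiles = [np.array([4.0, 4.0, 4.0]), np.array([7.0, 7.0, 7.0])]
print(electre_tri_pessimistic(np.array([8.0, 6.0, 9.0]), profiles, w))   # -> 1
```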

16.
In this paper we consider Markov Decision Processes with discounted cost and a random rate in Borel spaces. We establish the dynamic programming algorithm in the finite and infinite horizon cases, provide conditions for the existence of measurable selectors, and present an example of a consumption-investment problem. This research was partially supported by the PROMEP grant 103.5/05/40.
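
A minimal finite-state sketch loosely inspired by this setting (all parameters assumed): backward induction where the random stage discount rate is replaced by its expectation, which is valid by linearity only under the strong assumption that the rates are i.i.d. and independent of the process; the paper's Borel-space treatment is far more general.

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, N = 4, 2, 20
P = rng.dirichlet(np.ones(S), size=(A, S))   # P[a, s] is a distribution over s'
C = rng.random((A, S))                       # per-stage cost
mean_alpha = 0.9                             # E[alpha_t], the random stage discount

V = np.zeros(S)                              # terminal value V_N = 0
for _ in range(N):                           # backward induction V_{N-1}, ..., V_0
    Q = C + mean_alpha * np.einsum('asq,q->as', P, V)
    V = Q.min(axis=0)                        # minimise expected total cost
print(V)                                     # optimal expected cost over N stages
```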

17.
It is shown how the dual of Fourier–Motzkin elimination can be applied to eliminating the constraints of an Integer Linear Program. The result will, in general, be to reduce the Integer Program to a single Diophantine equation together with a series of Linear homogeneous congruences. Extreme continuous solutions to the Diophantine equation give extreme solutions to the Linear Programming relaxation. Integral solutions to the Diophantine equation which also satisfy the congruences give all the solutions to the Integer Program.

18.
We prove in this Note the moderate deviation principle (MDP) for the averaging principle of a stochastic differential equation (SDE) in a fast random environment, modeled by an exponentially ergodic Markov process independent of the Wiener process driving the SDE. The main tools are the method of Puhalskii for exponential tightness and an MDP for inhomogeneous functionals of Markov processes established in [5].

19.
The usual approach to finding optimal repair limits on failure of a component is to use a finite-state approximation Markov Decision Process (MDP). In this paper an alternative approach is introduced. Assuming a stochastically increasing repair cost, the optimal solution is shown to satisfy a certain first-order differential equation with a two-point boundary condition. An asymptotic formula for the optimal repair limit function is derived. Numerical solutions are obtained for some Weibull and special Erlang time-to-failure distributions. The structural form of the repair limit function results in a solution procedure that is several orders of magnitude faster than is achievable using previous methods.

20.
This paper presents an application of the integration between Geographical Information Systems (GIS) and Multi-Criteria Decision Analysis (MCDA) to aid spatial decisions. We present a hypothetical case study to illustrate the GIS–MCDA integration: the selection of the best municipal district of Rio de Janeiro State, Brazil, in relation to the quality of urban life. The best municipal district is the one that presents the closest characteristics to those considered ideal by the decision-maker. The approach adopted is the Multi-Objective Linear Programming (MOLP) and the chosen method is the Pareto Race.
