Similar Literature
20 similar documents found.
1.
We consider limiting average Markov decision processes (MDPs) with finite state and action spaces. We propose algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and on the construction of levels in an aggregated MDP.
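The abstract does not spell the algorithms out, but for the deterministic case there is a classical graph-theoretic route: the optimal limiting average reward of a deterministic MDP equals the maximum mean cycle of its reward graph. The sketch below computes that quantity with Karp's algorithm; the edge-list encoding of the MDP and the example data are assumptions for illustration, not the authors' construction.

```python
def max_mean_cycle(n, edges):
    """Karp's algorithm. edges: list of (u, v, w). Returns the maximum mean
    cycle weight, assuming every node lies on a path from some cycle."""
    NEG = float("-inf")
    # d[k][v] = max weight of an edge-walk of length exactly k ending at v
    # (zero initialization plays the role of a virtual source).
    d = [[NEG] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0.0
    for k in range(1, n + 1):
        for (u, v, w) in edges:
            if d[k - 1][u] > NEG and d[k - 1][u] + w > d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = NEG
    for v in range(n):
        if d[n][v] == NEG:
            continue
        # Karp's characterization: min over k of (d_n(v) - d_k(v)) / (n - k).
        val = min((d[n][v] - d[k][v]) / (n - k)
                  for k in range(n) if d[k][v] > NEG)
        best = max(best, val)
    return best

# Tiny deterministic MDP with two cycles of mean reward 1.0 and 2.5;
# the optimal average reward is 2.5.
edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 3.0), (2, 1, 2.0)]
print(max_mean_cycle(3, edges))  # -> 2.5
```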

2.
We consider a Markov decision process (MDP) under the average reward criterion. We investigate the decomposition of such an MDP into smaller MDPs using the strongly connected classes of the associated graph. Then, by introducing the associated levels, we construct an aggregation-disaggregation algorithm that computes an optimal strategy for the original MDP.
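As a hedged illustration of the first step described here (not the authors' full algorithm), the sketch below builds the associated graph of a finite MDP, extracts its strongly connected classes, and orders them into levels via the condensation graph. The transitions encoding and the use of networkx are assumptions for illustration.

```python
import networkx as nx

# Assumed encoding: transitions[state][action] = {next_state: probability}.
transitions = {
    0: {"a": {0: 0.5, 1: 0.5}},
    1: {"a": {0: 1.0}, "b": {2: 1.0}},
    2: {"a": {2: 1.0}},
}

# Associated graph: an arc x -> y whenever some action moves x to y
# with positive probability.
G = nx.DiGraph()
for x, acts in transitions.items():
    for probs in acts.values():
        for y, p in probs.items():
            if p > 0:
                G.add_edge(x, y)

sccs = list(nx.strongly_connected_components(G))   # the classes
C = nx.condensation(G, scc=sccs)                   # acyclic graph of classes
# One natural notion of "level": position in a topological layering of the
# condensation; classes in the same layer can be processed together.
levels = {node: lvl for lvl, gen in
          enumerate(nx.topological_generations(C)) for node in gen}
for node, lvl in levels.items():
    print(f"class {sorted(C.nodes[node]['members'])} -> level {lvl}")
```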

3.
We consider a discrete time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered a decomposition of limiting average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs, and construct hierarchical decomposition algorithms for both criteria.

4.
5.
6.
We present in this paper several asymptotic properties of constrained Markov decision processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) the convergence of MDPs with a truncated state space to the problem with infinite state space, and (3) the convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies that are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that there exists an optimal stationary policy whose number of randomisations is at most the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European grant BRA-QMIPS of CEC DG XIII.
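In the finite case, the randomisation bound quoted above is the flavor of result one obtains from basic optimal solutions of the occupation-measure linear program. Below is a minimal sketch of that LP for a discounted constrained MDP; the formulation is the standard one, not necessarily the paper's, and all model data are invented.

```python
import numpy as np
from scipy.optimize import linprog

nS, nA, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[x, a, y]
r = rng.random((nS, nA))                         # reward to maximize
cost = rng.random((nS, nA))                      # cost to keep bounded
alpha = np.full(nS, 1.0 / nS)                    # initial distribution
budget = 5.0                                     # one cost constraint

# Variables: rho[x, a] >= 0, the discounted occupation measure, flattened.
# Balance: sum_a rho(y, a) - beta * sum_{x, a} P[x, a, y] rho(x, a) = alpha(y).
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = (x == y) - beta * P[x, a, y]

res = linprog(-r.ravel(),                        # maximize expected reward
              A_ub=cost.ravel()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=alpha, bounds=(0, None), method="highs")
rho = res.x.reshape(nS, nA)
policy = rho / rho.sum(axis=1, keepdims=True)    # stationary, possibly randomized
print(policy.round(3))
```

A basic optimal solution of this LP has few positive entries beyond one per state, which is where bounds of the form "number of constraints plus one" on randomisation come from.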

7.
This paper considers the nonstationary average-reward MDP model proposed by Hinderer, in which both the state space and the action space are general sets. Using the method of enlarging the state space, we establish the optimality equation for this model and give conditions under which the optimality equation has a solution and ε-optimal strategies exist. Starting from the optimality equation, we prove the existence of optimal strategies by probabilistic methods. Finally, we provide a value iteration algorithm for this model together with a proof of its convergence, thereby generalizing the main results of Smith L., Lassere B. [3], and Larma [6].

8.
Motivated by an application to school funding, we introduce the notion of a robust decomposable Markov decision process (MDP). A robust decomposable MDP model applies to situations where several MDPs, with the transition probabilities in each only known through an uncertainty set, are coupled together by joint resource constraints. Robust decomposable MDPs differ from both decomposable MDPs and robust MDPs, and cannot be solved by a direct application of the solution methods from either of those areas. In fact, to the best of our knowledge, there is no known method to tractably compute optimal policies in robust decomposable MDPs. We show how to tractably compute good policies for this model, and apply the derived method to a stylized school funding example.

9.
In this paper, partially observable Markov decision processes (POMDPs) with discrete state and action spaces under the average reward criterion are considered from a recently developed sensitivity point of view. By analyzing the average-reward performance difference formula, we propose a policy iteration algorithm with step sizes to obtain an optimal or locally optimal memoryless policy. This algorithm improves the policy along the same direction as policy iteration does, and suitable step sizes guarantee the convergence of the algorithm. Moreover, the algorithm can be used in Markov decision processes (MDPs) with correlated actions. Two numerical examples are provided to illustrate the applicability of the algorithm.
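A hedged sketch of the step-size idea, illustrated on a fully observed finite MDP for brevity (the paper's evaluation step for memoryless POMDP policies differs): instead of jumping to the greedy policy, each iteration moves the randomized policy a fraction tau toward it.

```python
import numpy as np

def evaluate(P, r, pi, beta=0.95):
    """Discounted value of a randomized stationary policy pi[x, a]."""
    nS = r.shape[0]
    P_pi = np.einsum("xa,xay->xy", pi, P)        # policy-averaged kernel
    r_pi = np.einsum("xa,xa->x", pi, r)          # policy-averaged reward
    return np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)

def smoothed_policy_iteration(P, r, tau=0.3, beta=0.95, iters=200):
    nS, nA = r.shape
    pi = np.full((nS, nA), 1.0 / nA)             # start uniform
    for _ in range(iters):
        v = evaluate(P, r, pi, beta)
        q = r + beta * np.einsum("xay,y->xa", P, v)
        greedy = np.eye(nA)[q.argmax(axis=1)]    # deterministic improvement
        pi = pi + tau * (greedy - pi)            # step toward the improvement
    return pi

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(4), size=(4, 3))       # P[x, a, y], made-up model
r = rng.random((4, 3))
print(smoothed_policy_iteration(P, r).round(3))
```

With tau = 1 this collapses to ordinary policy iteration; smaller steps are what allow convergence guarantees in settings where the full greedy jump can cycle.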

10.
11.
Decision makers often face the need for a performance guarantee holding with some sufficiently high probability. Such problems can be modelled using a discrete time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies in the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of ε-optimal policies for finite state spaces. We give an example concerning the reliability of a satellite system.
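A minimal sketch of the finite horizon recursion behind such an algorithm, assuming undiscounted integer rewards so the remaining target can be tabulated (the paper treats the discounted case, where the target is rescaled by the discount factor at each stage); the toy model is invented.

```python
from functools import lru_cache

# Assumed encoding: P[x][a] = tuple of (next_state, probability);
# r[(x, a)] = integer reward.
P = {0: {"a": ((0, 0.5), (1, 0.5)), "b": ((1, 1.0),)},
     1: {"a": ((0, 1.0),), "b": ((1, 0.6), (0, 0.4))}}
r = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 1}
T = 5  # horizon

@lru_cache(maxsize=None)
def v(n, x, target):
    """Max probability of collecting >= `target` reward in stages n..T-1."""
    if target <= 0:
        return 1.0                      # target already achieved: stop
    if n == T:
        return 0.0                      # horizon reached, target missed
    return max(sum(p * v(n + 1, y, target - r[(x, a)])
                   for y, p in P[x][a])
               for a in P[x])

print(v(0, 0, 4))   # probability of reaching total reward 4 from state 0
```

The augmented state (x, remaining target) is what makes the probability criterion dynamic-programming tractable even though it is not a standard expected-reward criterion.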

12.
The simplicial algorithm is a kind of branch-and-bound method for computing a globally optimal solution of a convex maximization problem. Its convergence under the ω-subdivision strategy was an open question for some decades until Locatelli and Raber proved it (J Optim Theory Appl 107:69–79, 2000). In this paper, we modify their linear programming relaxation and give a different and simpler proof of the convergence. We also develop a new convergent subdivision strategy, and report numerical results of comparing it with existing strategies.
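For orientation, the subdivision primitive at issue is radial subdivision of a simplex at a point ω; in the ω-subdivision strategy, ω comes from the relaxation's optimal solution, whereas here it is simply passed in. A sketch under that reading, not the authors' branch-and-bound code:

```python
import numpy as np

def radial_subdivision(vertices, omega, tol=1e-9):
    """vertices: (n+1, n) array of simplex vertices, omega a point inside.
    Returns the subsimplices obtained by replacing each vertex that carries
    positive barycentric weight of omega by omega itself."""
    V = np.asarray(vertices, dtype=float)
    n1, n = V.shape
    # Barycentric coordinates: solve sum_i lam_i v_i = omega, sum_i lam_i = 1.
    A = np.vstack([V.T, np.ones(n1)])
    b = np.append(np.asarray(omega, float), 1.0)
    lam = np.linalg.lstsq(A, b, rcond=None)[0]
    children = []
    for i in range(n1):
        if lam[i] > tol:               # skip faces omega already lies on
            child = V.copy()
            child[i] = omega
            children.append(child)
    return children

# 2-D example: splitting the standard triangle at its barycenter
# yields three subtriangles.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
for c in radial_subdivision(tri, tri.mean(axis=0)):
    print(c.tolist())
```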

13.
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces. But the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework which exploits the factorization of the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, where nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm exploiting the structure of a GMDP and whose complexity only grows quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action space we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases provides a better approximation of the value function of approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems, that can arise in many different domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.
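A hedged sketch of the GMDP model itself (not of MF-API): each graph node carries one state variable and one decision variable, and its transition kernel reads only its own variables and its neighbours' states, which is why the representation grows linearly with the graph. The epidemic-style local kernel below is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(5)
neighbours = {0: [1], 1: [0, 2], 2: [1]}      # the interaction graph
state = np.array([0, 1, 0])                   # 1 = infected, say
action = np.array([0, 0, 1])                  # 1 = treat

def p_healthy_next(x_i, a_i, x_neigh):
    """P(node i is healthy next step | own state/action, neighbour states).
    Purely illustrative numbers."""
    if x_i == 1:                               # infected: treatment helps recovery
        return 0.8 if a_i == 1 else 0.2
    pressure = sum(x_neigh) / len(x_neigh)     # healthy: contagion pressure
    return 1.0 - 0.5 * pressure

def step(state, action):
    """Sample the factored transition: one local draw per node."""
    nxt = np.empty_like(state)
    for i, nb in neighbours.items():
        p0 = p_healthy_next(state[i], action[i], state[nb])
        nxt[i] = int(rng.random() >= p0)       # infected with prob 1 - p0
    return nxt

print(step(state, action))
```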

14.
The aim of this paper is to describe an efficient adaptive strategy for discretizing ill-posed linear operator equations of the first kind: we consider Tikhonov-Phillips regularization in which the operator A is replaced by a finite dimensional approximation. We propose a sparse matrix structure which still leads to optimal convergence rates but requires substantially fewer scalar products to compute than standard methods.
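A hedged sketch of the setting only, not of the paper's adaptive strategy: Tikhonov-Phillips regularization computed with a sparse approximation of the operator. Which entries may be dropped, and how the regularization parameter is chosen, is precisely what the paper's analysis governs; the thresholding rule and all data below are placeholder assumptions.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n = 200
# Made-up smoothing operator with rapidly decaying off-diagonal entries.
A = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 3.0)
A_s = sparse.csr_matrix(np.where(np.abs(A) > 1e-3, A, 0.0))  # placeholder sparsification

x_true = np.sin(np.linspace(0, 4, n))
y = A @ x_true + 1e-3 * rng.standard_normal(n)   # noisy data
alpha = 1e-2                                     # regularization parameter

# Tikhonov-Phillips via damped least squares on the sparse approximation:
# min ||A_s x - y||^2 + alpha ||x||^2  ==  lsqr with damp = sqrt(alpha).
x_reg = lsqr(A_s, y, damp=np.sqrt(alpha))[0]
print(np.linalg.norm(x_reg - x_true) / np.linalg.norm(x_true))
```

The point of the sparse structure is that each lsqr iteration costs only as many scalar products as there are retained entries, instead of a full dense matrix-vector product.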

15.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently, the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend using this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
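For context, here is a sketch of the generic unichain average-reward policy iteration loop that such algorithms build on; the paper's contribution, the modified improvement step that keeps iterates unichain for communicating MDPs, is not reproduced, and a dense random example like the one below does not need it.

```python
import numpy as np

def unichain_policy_iteration(P, r, max_iter=100):
    """P[x, a, y], r[x, a]; returns (gain, deterministic policy).
    Assumes every stationary policy is unichain."""
    nS, nA = r.shape
    pi = np.zeros(nS, dtype=int)
    for _ in range(max_iter):
        P_pi = P[np.arange(nS), pi]                  # (nS, nS)
        r_pi = r[np.arange(nS), pi]
        # Evaluation: g * 1 + (I - P_pi) h = r_pi, normalized by h[0] = 0.
        M = np.column_stack([np.ones(nS), (np.eye(nS) - P_pi)[:, 1:]])
        sol = np.linalg.solve(M, r_pi)
        g, h = sol[0], np.concatenate([[0.0], sol[1:]])
        # Improvement: greedy in the bias, keeping the old action on ties.
        q = r + np.einsum("xay,y->xa", P, h)
        new_pi = np.where(q.max(axis=1) > q[np.arange(nS), pi] + 1e-10,
                          q.argmax(axis=1), pi)
        if np.array_equal(new_pi, pi):
            return g, pi
        pi = new_pi
    return g, pi

rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(4), size=(4, 3))           # made-up model
r = rng.random((4, 3))
print(unichain_policy_iteration(P, r))
```

Because the gain g is a single scalar here, one linear solve per iteration suffices; in the multichain case the gain is state-dependent and Howard's nested optimality equations appear, which is exactly what restricting to unichain policies avoids.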

16.
Guo Xianping, Acta Mathematica Sinica, 2001, 44(2): 333-342.
This paper considers the average-variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition and using the optimality equation, we prove the existence of a Markov policy that is optimal for the average expected reward criterion. Then, by constructing a new model and using the theory of Markov processes, we further prove that within the class of Markov policies optimal for the average expected reward, there exists one that minimizes the average variance. As special cases, we recover the main results of Dynkin E. B. and Yushkevich A. A., and of Kurano M.

17.
In this paper we consider a two-person zero-sum discounted stochastic game with ARAT structure and formulate the problem of computing a pair of pure optimal stationary strategies and the corresponding value vector of such a game as a vertical linear complementarity problem. We show that Cottle-Dantzig's algorithm (a generalization of Lemke's algorithm) can solve this problem under a mild assumption.

18.
This note describes sufficient conditions under which total-cost and average-cost Markov decision processes (MDPs) with general state and action spaces, and with weakly continuous transition probabilities, can be reduced to discounted MDPs. For undiscounted problems, these reductions imply the validity of optimality equations and the existence of stationary optimal policies. The reductions also provide methods for computing optimal policies. The results are applied to a capacitated inventory control problem with fixed costs and lost sales.
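One classical reduction of this flavor (not necessarily the paper's, whose conditions are more general): if every state-action pair reaches a distinguished state z with probability at least δ, removing that common mass yields a discounted MDP with factor 1 − δ, which can then be solved by value iteration. A sketch under that assumption, with invented data:

```python
import numpy as np

def reduce_and_solve(P, c, z, delta, tol=1e-10):
    """Reduce an average-cost MDP with minorization P[:, :, z] >= delta
    to a discounted MDP and solve it by value iteration."""
    nS, nA = c.shape
    Q = P.copy()
    Q[:, :, z] -= delta          # strip the common minorization mass
    Q /= (1.0 - delta)           # renormalized kernel of the discounted MDP
    beta = 1.0 - delta
    v = np.zeros(nS)
    while True:                  # value iteration on the discounted MDP
        v_new = (c + beta * np.einsum("xay,y->xa", Q, v)).min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    policy = (c + beta * np.einsum("xay,y->xa", Q, v)).argmin(axis=1)
    return v, policy

rng = np.random.default_rng(3)
nS, nA, z, delta = 4, 2, 0, 0.2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
P = (1 - delta) * P              # enforce p(z | x, a) >= delta by construction
P[:, :, z] += delta
c = rng.random((nS, nA))
print(reduce_and_solve(P, c, z, delta))
```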

19.
Optimization, 2012, 61(3-4): 385-392.
In the steady state of an undiscounted Markov decision process, we consider the problem of finding an optimal stationary probability distribution that maximizes the mean to standard deviation ratio among all stationary probability distributions. The problem injects considerations into MDPs from a relative point of view.
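A hedged sketch of the criterion, not of the paper's method: maximizing the mean to standard deviation ratio over the polytope of stationary state-action frequencies of a finite MDP with a generic nonlinear solver; all model data are invented.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
nS, nA = 3, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[x, a, y]
r = rng.random((nS, nA)).ravel()

# Stationarity of the state-action frequencies rho(x, a):
# sum_a rho(y, a) = sum_{x, a} P[x, a, y] rho(x, a), and sum rho = 1.
A = np.zeros((nS + 1, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A[y, x * nA + a] = (x == y) - P[x, a, y]
A[nS] = 1.0
b = np.append(np.zeros(nS), 1.0)

def neg_ratio(rho):
    mean = rho @ r
    var = rho @ (r - mean) ** 2                 # variance of r under rho
    return -mean / np.sqrt(var + 1e-12)         # negate: minimize = maximize ratio

res = minimize(neg_ratio, np.full(nS * nA, 1.0 / (nS * nA)),
               constraints=[{"type": "eq", "fun": lambda rho: A @ rho - b}],
               bounds=[(0, None)] * (nS * nA), method="SLSQP")
print(res.x.reshape(nS, nA), -res.fun)
```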

20.