Similar Literature
20 similar documents found.
1.
We consider limiting average Markov decision processes (MDPs) with finite state and action spaces. We propose algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and on the construction of levels in an aggregated MDP.
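The abstract does not spell the algorithms out, but for the deterministic case there is a classical graph-theoretic route: the optimal limiting average reward of a deterministic MDP equals the maximum mean cycle of its reward graph. The sketch below computes that quantity with Karp's algorithm; the edge-list encoding of the MDP and the example data are assumptions for illustration, not the authors' construction.

```python
def max_mean_cycle(n, edges):
    """Karp's algorithm. edges: list of (u, v, w). Returns the maximum mean
    cycle weight, assuming every node lies on a path from some cycle."""
    NEG = float("-inf")
    # d[k][v] = max weight of an edge-walk of length exactly k ending at v
    # (zero initialization plays the role of a virtual source).
    d = [[NEG] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0.0
    for k in range(1, n + 1):
        for (u, v, w) in edges:
            if d[k - 1][u] > NEG and d[k - 1][u] + w > d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = NEG
    for v in range(n):
        if d[n][v] == NEG:
            continue
        # Karp's characterization: min over k of (d_n(v) - d_k(v)) / (n - k).
        val = min((d[n][v] - d[k][v]) / (n - k)
                  for k in range(n) if d[k][v] > NEG)
        best = max(best, val)
    return best

# Tiny deterministic MDP with two cycles of mean reward 1.0 and 2.5;
# the optimal average reward is 2.5.
edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 3.0), (2, 1, 2.0)]
print(max_mean_cycle(3, edges))  # -> 2.5
```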

2.
We consider a Markov decision process (MDP) under the average reward criterion. We investigate the decomposition of such an MDP into smaller MDPs using the strongly connected classes of the associated graph. Then, by introducing the associated levels, we construct an aggregation-disaggregation algorithm that computes an optimal strategy for the original MDP.
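As a hedged illustration of the first step described here (not the authors' full algorithm), the sketch below builds the associated graph of a finite MDP, extracts its strongly connected classes, and orders them into levels via the condensation graph. The transitions encoding and the use of networkx are assumptions for illustration.

```python
import networkx as nx

# Assumed encoding: transitions[state][action] = {next_state: probability}.
transitions = {
    0: {"a": {0: 0.5, 1: 0.5}},
    1: {"a": {0: 1.0}, "b": {2: 1.0}},
    2: {"a": {2: 1.0}},
}

# Associated graph: an arc x -> y whenever some action moves x to y
# with positive probability.
G = nx.DiGraph()
for x, acts in transitions.items():
    for probs in acts.values():
        for y, p in probs.items():
            if p > 0:
                G.add_edge(x, y)

sccs = list(nx.strongly_connected_components(G))   # the classes
C = nx.condensation(G, scc=sccs)                   # acyclic graph of classes
# One natural notion of "level": position in a topological layering of the
# condensation; classes in the same layer can be processed together.
levels = {node: lvl for lvl, gen in
          enumerate(nx.topological_generations(C)) for node in gen}
for node, lvl in levels.items():
    print(f"class {sorted(C.nodes[node]['members'])} -> level {lvl}")
```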

3.
We consider a discrete time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered a decomposition of limiting average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs, and construct hierarchical decomposition algorithms for both criteria.

4.
5.
6.
We present in this paper several asymptotic properties of constrained Markov decision processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) the convergence of MDPs with a truncated state space to the problem with infinite state space, and (3) the convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies that are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that there exists an optimal stationary policy whose number of randomisations is at most the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European grant BRA-QMIPS of CEC DG XIII.
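In the finite case, the randomisation bound quoted above is the flavor of result one obtains from basic optimal solutions of the occupation-measure linear program. Below is a minimal sketch of that LP for a discounted constrained MDP; the formulation is the standard one, not necessarily the paper's, and all model data are invented.

```python
import numpy as np
from scipy.optimize import linprog

nS, nA, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[x, a, y]
r = rng.random((nS, nA))                         # reward to maximize
cost = rng.random((nS, nA))                      # cost to keep bounded
alpha = np.full(nS, 1.0 / nS)                    # initial distribution
budget = 5.0                                     # one cost constraint

# Variables: rho[x, a] >= 0, the discounted occupation measure, flattened.
# Balance: sum_a rho(y, a) - beta * sum_{x, a} P[x, a, y] rho(x, a) = alpha(y).
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = (x == y) - beta * P[x, a, y]

res = linprog(-r.ravel(),                        # maximize expected reward
              A_ub=cost.ravel()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=alpha, bounds=(0, None), method="highs")
rho = res.x.reshape(nS, nA)
policy = rho / rho.sum(axis=1, keepdims=True)    # stationary, possibly randomized
print(policy.round(3))
```

A basic optimal solution of this LP has few positive entries beyond one per state, which is where bounds of the form "number of constraints plus one" on randomisation come from.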

7.
This paper considers the nonstationary average-reward MDP model proposed by Hinderer, in which both the state space and the action space are general sets. Using the method of enlarging the state space, we establish the optimality equation for this model and give conditions under which the optimality equation has a solution and ε-optimal strategies exist. Starting from the optimality equation, we prove the existence of optimal strategies by probabilistic methods. Finally, we provide a value iteration algorithm for this model together with a proof of its convergence, thereby generalizing the main results of Smith L., Lassere B. [3], and Larma [6].

8.
Motivated by an application to school funding, we introduce the notion of a robust decomposable Markov decision process (MDP). A robust decomposable MDP model applies to situations where several MDPs, with the transition probabilities in each only known through an uncertainty set, are coupled together by joint resource constraints. Robust decomposable MDPs differ from both decomposable MDPs and robust MDPs, and cannot be solved by a direct application of the solution methods from either of those areas. In fact, to the best of our knowledge, there is no known method to tractably compute optimal policies in robust decomposable MDPs. We show how to tractably compute good policies for this model, and apply the derived method to a stylized school funding example.

9.
In this paper, partially observable Markov decision processes (POMDPs) with discrete state and action spaces under the average reward criterion are considered from a recently developed sensitivity point of view. By analyzing the average-reward performance difference formula, we propose a policy iteration algorithm with step sizes to obtain an optimal or locally optimal memoryless policy. This algorithm improves the policy along the same direction as policy iteration does, and suitable step sizes guarantee the convergence of the algorithm. Moreover, the algorithm can be used in Markov decision processes (MDPs) with correlated actions. Two numerical examples are provided to illustrate the applicability of the algorithm.
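A hedged sketch of the step-size idea, illustrated on a fully observed finite MDP for brevity (the paper's evaluation step for memoryless POMDP policies differs): instead of jumping to the greedy policy, each iteration moves the randomized policy a fraction tau toward it.

```python
import numpy as np

def evaluate(P, r, pi, beta=0.95):
    """Discounted value of a randomized stationary policy pi[x, a]."""
    nS = r.shape[0]
    P_pi = np.einsum("xa,xay->xy", pi, P)        # policy-averaged kernel
    r_pi = np.einsum("xa,xa->x", pi, r)          # policy-averaged reward
    return np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)

def smoothed_policy_iteration(P, r, tau=0.3, beta=0.95, iters=200):
    nS, nA = r.shape
    pi = np.full((nS, nA), 1.0 / nA)             # start uniform
    for _ in range(iters):
        v = evaluate(P, r, pi, beta)
        q = r + beta * np.einsum("xay,y->xa", P, v)
        greedy = np.eye(nA)[q.argmax(axis=1)]    # deterministic improvement
        pi = pi + tau * (greedy - pi)            # step toward the improvement
    return pi

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(4), size=(4, 3))       # P[x, a, y], made-up model
r = rng.random((4, 3))
print(smoothed_policy_iteration(P, r).round(3))
```

With tau = 1 this collapses to ordinary policy iteration; smaller steps are what allow convergence guarantees in settings where the full greedy jump can cycle.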

10.
11.
Decision makers often face the need for a performance guarantee holding with some sufficiently high probability. Such problems can be modelled using a discrete time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies in the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of ε-optimal policies for finite state spaces. We give an example concerning the reliability of a satellite system.
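A minimal sketch of the finite horizon recursion behind such an algorithm, assuming undiscounted integer rewards so the remaining target can be tabulated (the paper treats the discounted case, where the target is rescaled by the discount factor at each stage); the toy model is invented.

```python
from functools import lru_cache

# Assumed encoding: P[x][a] = tuple of (next_state, probability);
# r[(x, a)] = integer reward.
P = {0: {"a": ((0, 0.5), (1, 0.5)), "b": ((1, 1.0),)},
     1: {"a": ((0, 1.0),), "b": ((1, 0.6), (0, 0.4))}}
r = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 1}
T = 5  # horizon

@lru_cache(maxsize=None)
def v(n, x, target):
    """Max probability of collecting >= `target` reward in stages n..T-1."""
    if target <= 0:
        return 1.0                      # target already achieved: stop
    if n == T:
        return 0.0                      # horizon reached, target missed
    return max(sum(p * v(n + 1, y, target - r[(x, a)])
                   for y, p in P[x][a])
               for a in P[x])

print(v(0, 0, 4))   # probability of reaching total reward 4 from state 0
```

The augmented state (x, remaining target) is what makes the probability criterion dynamic-programming tractable even though it is not a standard expected-reward criterion.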

12.
The simplicial algorithm is a kind of branch-and-bound method for computing a globally optimal solution of a convex maximization problem. Its convergence under the ω-subdivision strategy was an open question for some decades until Locatelli and Raber proved it (J Optim Theory Appl 107:69–79, 2000). In this paper, we modify their linear programming relaxation and give a different and simpler proof of the convergence. We also develop a new convergent subdivision strategy, and report numerical results of comparing it with existing strategies.
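For orientation, the subdivision primitive at issue is radial subdivision of a simplex at a point ω; in the ω-subdivision strategy, ω comes from the relaxation's optimal solution, whereas here it is simply passed in. A sketch under that reading, not the authors' branch-and-bound code:

```python
import numpy as np

def radial_subdivision(vertices, omega, tol=1e-9):
    """vertices: (n+1, n) array of simplex vertices, omega a point inside.
    Returns the subsimplices obtained by replacing each vertex that carries
    positive barycentric weight of omega by omega itself."""
    V = np.asarray(vertices, dtype=float)
    n1, n = V.shape
    # Barycentric coordinates: solve sum_i lam_i v_i = omega, sum_i lam_i = 1.
    A = np.vstack([V.T, np.ones(n1)])
    b = np.append(np.asarray(omega, float), 1.0)
    lam = np.linalg.lstsq(A, b, rcond=None)[0]
    children = []
    for i in range(n1):
        if lam[i] > tol:               # skip faces omega already lies on
            child = V.copy()
            child[i] = omega
            children.append(child)
    return children

# 2-D example: splitting the standard triangle at its barycenter
# yields three subtriangles.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
for c in radial_subdivision(tri, tri.mean(axis=0)):
    print(c.tolist())
```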

13.
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces. But the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework which exploits the factorization of the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, where nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm exploiting the structure of a GMDP and whose complexity only grows quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action space we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases provides a better approximation of the value function of approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems, that can arise in many different domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.
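A hedged sketch of the GMDP model itself (not of MF-API): each graph node carries one state variable and one decision variable, and its transition kernel reads only its own variables and its neighbours' states, which is why the representation grows linearly with the graph. The epidemic-style local kernel below is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(5)
neighbours = {0: [1], 1: [0, 2], 2: [1]}      # the interaction graph
state = np.array([0, 1, 0])                   # 1 = infected, say
action = np.array([0, 0, 1])                  # 1 = treat

def p_healthy_next(x_i, a_i, x_neigh):
    """P(node i is healthy next step | own state/action, neighbour states).
    Purely illustrative numbers."""
    if x_i == 1:                               # infected: treatment helps recovery
        return 0.8 if a_i == 1 else 0.2
    pressure = sum(x_neigh) / len(x_neigh)     # healthy: contagion pressure
    return 1.0 - 0.5 * pressure

def step(state, action):
    """Sample the factored transition: one local draw per node."""
    nxt = np.empty_like(state)
    for i, nb in neighbours.items():
        p0 = p_healthy_next(state[i], action[i], state[nb])
        nxt[i] = int(rng.random() >= p0)       # infected with prob 1 - p0
    return nxt

print(step(state, action))
```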

14.
The aim of this paper is to describe an efficient adaptive strategy for discretizing ill-posed linear operator equations of the first kind: we consider Tikhonov-Phillips regularization in which the operator A is replaced by a finite dimensional approximation. We propose a sparse matrix structure which still leads to optimal convergence rates but requires substantially fewer scalar products to compute than standard methods.
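A hedged sketch of the setting only, not of the paper's adaptive strategy: Tikhonov-Phillips regularization computed with a sparse approximation of the operator. Which entries may be dropped, and how the regularization parameter is chosen, is precisely what the paper's analysis governs; the thresholding rule and all data below are placeholder assumptions.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n = 200
# Made-up smoothing operator with rapidly decaying off-diagonal entries.
A = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 3.0)
A_s = sparse.csr_matrix(np.where(np.abs(A) > 1e-3, A, 0.0))  # placeholder sparsification

x_true = np.sin(np.linspace(0, 4, n))
y = A @ x_true + 1e-3 * rng.standard_normal(n)   # noisy data
alpha = 1e-2                                     # regularization parameter

# Tikhonov-Phillips via damped least squares on the sparse approximation:
# min ||A_s x - y||^2 + alpha ||x||^2  ==  lsqr with damp = sqrt(alpha).
x_reg = lsqr(A_s, y, damp=np.sqrt(alpha))[0]
print(np.linalg.norm(x_reg - x_true) / np.linalg.norm(x_true))
```

The point of the sparse structure is that each lsqr iteration costs only as many scalar products as there are retained entries, instead of a full dense matrix-vector product.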

15.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently, the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend using this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
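For context, here is a sketch of the generic unichain average-reward policy iteration loop that such algorithms build on; the paper's contribution, the modified improvement step that keeps iterates unichain for communicating MDPs, is not reproduced, and a dense random example like the one below does not need it.

```python
import numpy as np

def unichain_policy_iteration(P, r, max_iter=100):
    """P[x, a, y], r[x, a]; returns (gain, deterministic policy).
    Assumes every stationary policy is unichain."""
    nS, nA = r.shape
    pi = np.zeros(nS, dtype=int)
    for _ in range(max_iter):
        P_pi = P[np.arange(nS), pi]                  # (nS, nS)
        r_pi = r[np.arange(nS), pi]
        # Evaluation: g * 1 + (I - P_pi) h = r_pi, normalized by h[0] = 0.
        M = np.column_stack([np.ones(nS), (np.eye(nS) - P_pi)[:, 1:]])
        sol = np.linalg.solve(M, r_pi)
        g, h = sol[0], np.concatenate([[0.0], sol[1:]])
        # Improvement: greedy in the bias, keeping the old action on ties.
        q = r + np.einsum("xay,y->xa", P, h)
        new_pi = np.where(q.max(axis=1) > q[np.arange(nS), pi] + 1e-10,
                          q.argmax(axis=1), pi)
        if np.array_equal(new_pi, pi):
            return g, pi
        pi = new_pi
    return g, pi

rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(4), size=(4, 3))           # made-up model
r = rng.random((4, 3))
print(unichain_policy_iteration(P, r))
```

Because the gain g is a single scalar here, one linear solve per iteration suffices; in the multichain case the gain is state-dependent and Howard's nested optimality equations appear, which is exactly what restricting to unichain policies avoids.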

16.
Guo Xianping, Acta Mathematica Sinica, 2001, 44(2): 333-342.
This paper considers the average-variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition and using the optimality equation, we prove the existence of a Markov policy that is optimal for the average expected reward criterion. Then, by constructing a new model and using the theory of Markov processes, we further prove that within the class of Markov policies optimal for the average expected reward, there exists one that minimizes the average variance. As special cases, we recover the main results of Dynkin E. B. and Yushkevich A. A., and of Kurano M.

17.
In this paper we consider a two-person zero-sum discounted stochastic game with ARAT structure and formulate the problem of computing a pair of pure optimal stationary strategies and the corresponding value vector of such a game as a vertical linear complementarity problem. We show that Cottle-Dantzig's algorithm (a generalization of Lemke's algorithm) can solve this problem under a mild assumption.

18.
This note describes sufficient conditions under which total-cost and average-cost Markov decision processes (MDPs) with general state and action spaces, and with weakly continuous transition probabilities, can be reduced to discounted MDPs. For undiscounted problems, these reductions imply the validity of optimality equations and the existence of stationary optimal policies. The reductions also provide methods for computing optimal policies. The results are applied to a capacitated inventory control problem with fixed costs and lost sales.
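One classical reduction of this flavor (not necessarily the paper's, whose conditions are more general): if every state-action pair reaches a distinguished state z with probability at least δ, removing that common mass yields a discounted MDP with factor 1 − δ, which can then be solved by value iteration. A sketch under that assumption, with invented data:

```python
import numpy as np

def reduce_and_solve(P, c, z, delta, tol=1e-10):
    """Reduce an average-cost MDP with minorization P[:, :, z] >= delta
    to a discounted MDP and solve it by value iteration."""
    nS, nA = c.shape
    Q = P.copy()
    Q[:, :, z] -= delta          # strip the common minorization mass
    Q /= (1.0 - delta)           # renormalized kernel of the discounted MDP
    beta = 1.0 - delta
    v = np.zeros(nS)
    while True:                  # value iteration on the discounted MDP
        v_new = (c + beta * np.einsum("xay,y->xa", Q, v)).min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    policy = (c + beta * np.einsum("xay,y->xa", Q, v)).argmin(axis=1)
    return v, policy

rng = np.random.default_rng(3)
nS, nA, z, delta = 4, 2, 0, 0.2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
P = (1 - delta) * P              # enforce p(z | x, a) >= delta by construction
P[:, :, z] += delta
c = rng.random((nS, nA))
print(reduce_and_solve(P, c, z, delta))
```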

19.
Optimization, 2012, 61(3-4): 385-392.
In the steady state of an undiscounted Markov decision process, we consider the problem of finding an optimal stationary probability distribution that maximizes the mean to standard deviation ratio among all stationary probability distributions. The problem injects considerations into MDPs from a relative point of view.
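A hedged sketch of the criterion, not of the paper's method: maximizing the mean to standard deviation ratio over the polytope of stationary state-action frequencies of a finite MDP with a generic nonlinear solver; all model data are invented.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
nS, nA = 3, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[x, a, y]
r = rng.random((nS, nA)).ravel()

# Stationarity of the state-action frequencies rho(x, a):
# sum_a rho(y, a) = sum_{x, a} P[x, a, y] rho(x, a), and sum rho = 1.
A = np.zeros((nS + 1, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A[y, x * nA + a] = (x == y) - P[x, a, y]
A[nS] = 1.0
b = np.append(np.zeros(nS), 1.0)

def neg_ratio(rho):
    mean = rho @ r
    var = rho @ (r - mean) ** 2                 # variance of r under rho
    return -mean / np.sqrt(var + 1e-12)         # negate: minimize = maximize ratio

res = minimize(neg_ratio, np.full(nS * nA, 1.0 / (nS * nA)),
               constraints=[{"type": "eq", "fun": lambda rho: A @ rho - b}],
               bounds=[(0, None)] * (nS * nA), method="SLSQP")
print(res.x.reshape(nS, nA), -res.fun)
```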

20.