Similar Documents
20 similar documents found (search time: 609 ms).
1.
We consider a discrete-time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered a decomposition of limiting average MDPs. In this paper, we apply an analogous approach to discounted and weighted MDPs, and then construct hierarchical decomposition algorithms for both classes.
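The decomposition algorithms themselves are not given in the abstract; as a point of reference, the standard building block such methods refine, value iteration on a discounted MDP, can be sketched as follows (all array shapes and numbers are illustrative, not taken from the paper):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve a finite discounted MDP by value iteration.

    P: (A, S, S) array, P[a, s, t] = transition probability s -> t under a.
    R: (A, S) array, immediate reward for taking action a in state s.
    """
    V = np.zeros(P.shape[1])
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum('ast,t->as', P, V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # value function, greedy policy
        V = V_new

# Tiny 2-state, 2-action example (all numbers hypothetical).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, pi = value_iteration(P, R)
```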

2.
Current Gibbs sampling schemes in mixture of Dirichlet process (MDP) models are restricted to using “conjugate” base measures that allow analytic evaluation of the transition probabilities when resampling configurations, or alternatively need to rely on approximate numeric evaluations of some transition probabilities. Implementation of Gibbs sampling in more general MDP models is an open and important problem because most applications call for the use of nonconjugate base measures. In this article we propose a conceptual framework for computational strategies. This framework provides a perspective on current methods, facilitates comparisons between them, and leads to several new methods that expand the scope of MDP models to nonconjugate situations. We discuss one in detail. The basic strategy is based on expanding the parameter vector, and is applicable for MDP models with arbitrary base measure and likelihood. Strategies are also presented for the important class of normal-normal MDP models and for problems with fixed or few hyperparameters. The proposed algorithms are easily implemented and illustrated with an application.
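For contrast with the nonconjugate case the article targets, here is a minimal sketch of the standard conjugate Pólya-urn Gibbs sweep (normal likelihood, normal base measure, both variances known). This is the scheme the article generalizes, not its proposed parameter-expansion algorithm, and all numerical settings are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

def gibbs_sweep(y, c, theta, alpha, sigma2, mu0, tau2):
    """One Polya-urn Gibbs sweep: y[i] ~ N(theta[c[i]], sigma2),
    cluster means theta[k] ~ N(mu0, tau2), DP concentration alpha."""
    for i in range(len(y)):
        old = c[i]; c[i] = -1                 # remove observation i
        if not np.any(c == old):
            theta.pop(old)                    # its cluster became empty
        labels = list(theta)
        # Predictive weights: occupied clusters vs. a fresh base-measure draw.
        w = [np.sum(c == k) * norm_pdf(y[i], theta[k], sigma2) for k in labels]
        w.append(alpha * norm_pdf(y[i], mu0, sigma2 + tau2))
        w = np.asarray(w)
        j = rng.choice(len(w), p=w / w.sum())
        if j < len(labels):
            c[i] = labels[j]
        else:                                 # open a new cluster
            new = max(theta, default=-1) + 1
            prec = 1.0 / tau2 + 1.0 / sigma2
            theta[new] = rng.normal((mu0 / tau2 + y[i] / sigma2) / prec,
                                    np.sqrt(1.0 / prec))
            c[i] = new
    # Refresh every cluster mean from its conjugate posterior.
    for k in list(theta):
        ys = y[c == k]
        prec = 1.0 / tau2 + len(ys) / sigma2
        theta[k] = rng.normal((mu0 / tau2 + ys.sum() / sigma2) / prec,
                              np.sqrt(1.0 / prec))
    return c, theta

y = np.concatenate([rng.normal(0, 1, 30), rng.normal(5, 1, 30)])
c, theta = np.zeros(len(y), dtype=int), {0: 0.0}
for _ in range(200):
    c, theta = gibbs_sweep(y, c, theta, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=10.0)
```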

3.
4.
5.
In controlled ovarian hyperstimulation (COH) treatment, clinicians monitor patients’ physiological responses to gonadotropin administration to trade off pregnancy probability against the risk of ovarian hyperstimulation syndrome (OHSS). We formulate the dosage control problem in the COH treatment as a stochastic dynamic program and design approximate dynamic programming (ADP) algorithms to overcome the well-known curses of dimensionality in Markov decision processes (MDP). Our numerical experiments indicate that the piecewise linear (PWL) approximation ADP algorithms obtain policies very close to the one obtained by the MDP benchmark with significantly less solution time.
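Not the authors' algorithm, but the general flavor of a piecewise linear (PWL) value-function approximation in backward ADP can be sketched on a hypothetical one-dimensional state (response level), with entirely made-up dynamics and rewards:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D problem: state = response level, actions = dose choices.
doses = np.array([0.0, 0.5, 1.0])
knots = np.linspace(0.0, 10.0, 21)       # breakpoints of the PWL value function
T, n_samples = 5, 200

def step(s, d):
    """Made-up stochastic response to one dose."""
    return np.clip(s + 2.0 * d + rng.normal(0.0, 0.5), 0.0, 10.0)

def terminal_reward(s):
    """Made-up tradeoff: reward peaks at a target level, penalizing overshoot."""
    return -(s - 6.0) ** 2

V = terminal_reward(knots)               # V_T evaluated on the knots
for t in reversed(range(T)):
    V_new = np.empty_like(V)
    for j, s in enumerate(knots):
        q = []
        for d in doses:
            nxt = np.array([step(s, d) for _ in range(n_samples)])
            # PWL value function: linear interpolation between the knots.
            q.append(np.interp(nxt, knots, V).mean())
        V_new[j] = max(q)                # Bellman backup at this knot
    V = V_new
```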

6.
7.
This note addresses the time aggregation approach to ergodic finite-state Markov decision processes with uncontrollable states. We propose using time aggregation as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.
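A sketch of the aggregation idea, under the assumptions that the uncontrollable block has fixed (action-free) dynamics and is left with probability one: the excursion through it can be folded into one-step transitions and rewards between controllable states via the fundamental matrix. The average-reward case additionally tracks expected excursion times, omitted here; this is an illustration, not the note's exact construction.

```python
import numpy as np

def eliminate_uncontrollable(P_a, r_a, P_UU, P_UC, r_U, C, U):
    """Fold the uncontrollable block into C -> C transitions for one action a.

    P_a  : (|C|, S) transition rows of the controllable states under a
    r_a  : (|C|,)  one-step rewards of the controllable states under a
    P_UU : (|U|, |U|) internal dynamics of the uncontrollable block
    P_UC : (|U|, |C|) exit probabilities from the block
    r_U  : (|U|,)  rewards collected while inside the block
    C, U : index lists of controllable / uncontrollable states
    """
    # Fundamental matrix of the uncontrollable excursion: expected visit counts.
    N = np.linalg.inv(np.eye(len(U)) - P_UU)
    P_tilde = P_a[:, C] + P_a[:, U] @ N @ P_UC    # effective C -> C kernel
    r_tilde = r_a + P_a[:, U] @ N @ r_U           # reward accrued en route
    return P_tilde, r_tilde
```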

8.
Branching Constraint Satisfaction Problems (BCSPs) model a class of uncertain dynamic resource allocation problems. We describe the features of BCSPs, and show that the associated decision problem is NP-complete. Markov Decision Problems could be used in place of BCSPs, but we show analytically and empirically that, for the class of problems in question, the BCSP algorithms are more efficient than the related MDP algorithms.

9.
The unichain classification problem asks whether a finite state-and-action MDP is unichain under all deterministic policies. This problem is NP-hard. This paper provides polynomial algorithms for the problem when there is a state that is either recurrent under all deterministic policies or absorbing under some action.
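One structural test of this kind, checking whether some deterministic policy can avoid a candidate state forever, is a simple greatest-fixed-point computation. The sketch below is not the paper's algorithm; it uses a toy dict-of-dicts MDP where P[s][a] lists the states reachable with positive probability:

```python
def avoid_set(P, target):
    """States from which SOME deterministic policy avoids `target` forever.

    P[s][a] is the set of states reachable with positive probability from s
    under a. If the result is empty, `target` is reached from everywhere
    under every policy. Greatest fixed point of a safety condition.
    """
    X = set(P) - {target}
    while True:
        # Keep s only if some action keeps the whole support inside X.
        X_new = {s for s in X if any(set(P[s][a]) <= X for a in P[s])}
        if X_new == X:
            return X
        X = X_new

P = {0: {'a': [0], 'b': [1]}, 1: {'a': [2]}, 2: {'a': [2]}}
print(avoid_set(P, target=2))   # {0}: looping with action 'a' avoids state 2
```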

10.
Single Sample Path-Based Optimization of Markov Chains
Motivated by the needs of on-line optimization of real-world engineering systems, we study single-sample-path-based algorithms for Markov decision problems (MDP). The sample path used in the algorithms can be obtained by observing the operation of a real system. We give a simple example to explain the advantages of the sample-path-based approach over the traditional computation-based approach: matrix inversion is not required; some transition probabilities do not have to be known; it may save storage space; and it offers the flexibility of iterating the actions for a subset of the state space in each iteration. The effect of estimation errors and the convergence property of the sample-path-based approach are studied. Finally, we propose a fast algorithm, which updates the policy whenever the system reaches a particular set of states, and prove that the algorithm converges to the true optimal policy with probability one under some conditions. The sample-path-based approach may have important applications to the design and management of engineering systems, such as high-speed communication networks.
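A rough sketch of the single-sample-path idea, not the paper's algorithm: from one observed trajectory under a fixed policy, estimate the average reward and the potentials g by accumulating centered rewards between visits to a reference state; a policy-improvement step would then pick, in each state, an action maximizing the immediate reward plus the expected potential of the next state. All names here are illustrative.

```python
import numpy as np

def estimate_potentials(path, rewards, ref=0):
    """Estimate the average reward eta and potentials g from one sample path.

    path    : observed state sequence, rewards[s] = reward earned in state s.
    g(i) is estimated as the mean of sum(r - eta) from a visit to state i
    until the next visit to the reference state `ref`.
    """
    r = np.array([rewards[s] for s in path])
    eta = r.mean()                        # time-average reward estimate
    sums, counts, open_seg = {}, {}, {}
    for s, rew in zip(path, r):
        if s == ref:                      # close all segments, excluding ref's reward
            for i, acc in open_seg.items():
                sums[i] = sums.get(i, 0.0) + acc
                counts[i] = counts.get(i, 0) + 1
            open_seg = {}
        for i in open_seg:                # running segments absorb this step
            open_seg[i] += rew - eta
        if s not in open_seg and s != ref:
            open_seg[s] = rew - eta       # a new segment starts at this visit
    g = {i: sums[i] / counts[i] for i in sums}
    g[ref] = 0.0                          # potentials are relative; pin the reference
    return eta, g
```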

11.
The maximum diversity problem (MDP) is a challenging NP-hard problem with a wide range of real applications. Several researchers have pointed out the close relationship between the MDP and the unconstrained binary quadratic program (UBQP). In this paper, we provide procedures for solving the MDP using ideas from the UBQP formulation of the problem. We first give some local optimality results for r-flip improvement procedures on the MDP. We then present a set of highly effective diversification approaches based on sequential improvement steps. Four versions of the approaches are used within a simple tabu search and applied to 140 benchmark MDP instances available on the Internet. The procedures instantly solve all 80 small- to medium-sized problems to the best known solutions. For 22 of the 60 large problems, the procedures significantly improved the best known solutions in reasonably short CPU time.
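As a baseline for the kind of flip moves discussed, a plain steepest-ascent swap search for the MDP (select m of n elements maximizing the sum of pairwise distances) can be written with incremental gain updates. This is a generic local search, not the paper's tabu/r-flip procedures, and the instance is random:

```python
import numpy as np

rng = np.random.default_rng(3)

def swap_local_search(D, m, iters=10_000):
    """Steepest-ascent swap search: pick m of n elements maximizing the sum
    of pairwise distances D[i, j] (D symmetric, zero diagonal)."""
    n = len(D)
    in_set = np.zeros(n, dtype=bool)
    in_set[rng.choice(n, size=m, replace=False)] = True
    contrib = D[:, in_set].sum(axis=1)    # distance of each i to the selection
    for _ in range(iters):
        outs, ins = np.where(in_set)[0], np.where(~in_set)[0]
        # Gain of swapping j in for i out: contrib[j] - contrib[i] - D[i, j].
        gains = contrib[ins][None, :] - contrib[outs][:, None] - D[np.ix_(outs, ins)]
        k = np.unravel_index(np.argmax(gains), gains.shape)
        if gains[k] <= 1e-12:
            break                          # local optimum reached
        i, j = outs[k[0]], ins[k[1]]
        in_set[i], in_set[j] = False, True
        contrib += D[:, j] - D[:, i]       # incremental update of contributions
    return np.where(in_set)[0], contrib[in_set].sum() / 2.0

D = rng.random((50, 50)); D = (D + D.T) / 2.0; np.fill_diagonal(D, 0.0)
selection, value = swap_local_search(D, m=10)
```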

12.
We study the multichannel deconvolution problem (MDP) in a discrete setting by adapting the theory and method developed for the continuous setting in [36]. We give a method for solving the MDP when the convolvers are characteristic functions, derive the explicit form of the linear system, and obtain an upper bound on the condition number of the system in a particular case. We compare the Schiske reconstruction [28] with our solution in the discrete setting, and give an explicit formula for the corresponding error. We then give an algorithm for solving the general MDP and discuss in detail the local reconstruction aspects of the problem. Finally, we describe a method for improving the reconstruction by regularization and give explicit estimates of the error bounds in the presence of noise.
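The explicit linear system and error bounds are the paper's contribution; a generic discrete stand-in is a stacked, Tikhonov-regularized least-squares solve, sketched here with circular (periodic) convolution and boxcar kernels standing in for characteristic-function convolvers (all sizes hypothetical):

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(5)

def multichannel_deconvolve(kernels, observations, lam=1e-3):
    """Stack the circulant convolution matrices of all channels and solve
    the regularized normal equations (A^T A + lam*I) f = A^T y."""
    A = np.vstack([circulant(k) for k in kernels])
    y = np.concatenate(observations)
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

n = 64
f_true = rng.random(n)                                   # hypothetical signal
kernels = [np.r_[np.ones(p) / p, np.zeros(n - p)] for p in (3, 5)]  # boxcars
observations = [circulant(k) @ f_true for k in kernels]
f_hat = multichannel_deconvolve(kernels, observations)
```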

13.
Generalized linear mixed models with semiparametric random effects are useful in a wide variety of Bayesian applications. When the random effects arise from a mixture of Dirichlet process (MDP) model with a normal base measure, Gibbs sampling algorithms based on the Pólya urn scheme are often used to simulate posterior draws in conjugate models (essentially, linear regression models and models for binary outcomes). In the nonconjugate case, common problems with existing simulation algorithms include convergence and mixing difficulties.

This article proposes an algorithm for MDP models with exponential family likelihoods and normal base measures. The algorithm proceeds by making a Laplace approximation to the likelihood function, thereby matching the proposal with that of the Gibbs sampler. The proposal is accepted or rejected via a Metropolis-Hastings step. For conjugate MDP models, the algorithm is identical to the Gibbs sampler. The performance of the technique is investigated using a Poisson regression model with semiparametric random effects. The algorithm performs efficiently and reliably, even in problems where large-sample results do not guarantee the success of the Laplace approximation. This is demonstrated by a simulation study in which most of the count data consist of small numbers. The technique offers substantial benefits relative to existing methods, both in convergence properties and in computational cost.
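A minimal sketch of one such step, assuming a single Poisson count y with one normal random effect: Newton iterations locate the mode of the log full conditional, the curvature supplies the proposal variance, and an independence Metropolis-Hastings test corrects the approximation. Names and settings are illustrative, not the article's full sampler.

```python
import numpy as np

rng = np.random.default_rng(4)

def laplace_mh_step(theta, y, mu, tau2, n_newton=20):
    """One independence MH step targeting
    p(theta | y) propto Poisson(y | exp(theta)) * N(theta | mu, tau2)."""
    def logpost(t):
        return y * t - np.exp(t) - 0.5 * (t - mu) ** 2 / tau2

    t = mu                                     # Newton search for the mode
    for _ in range(n_newton):
        grad = y - np.exp(t) - (t - mu) / tau2
        hess = -np.exp(t) - 1.0 / tau2
        t -= grad / hess
    sd = np.sqrt(-1.0 / hess)                  # proposal sd from the curvature
    prop = rng.normal(t, sd)

    def logq(x):                               # proposal log density (unnormalized)
        return -0.5 * ((x - t) / sd) ** 2

    log_alpha = logpost(prop) - logpost(theta) + logq(theta) - logq(prop)
    return prop if np.log(rng.uniform()) < log_alpha else theta

theta, draws = 0.0, []
for _ in range(2000):
    theta = laplace_mh_step(theta, y=3, mu=0.0, tau2=1.0)
    draws.append(theta)
```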

14.
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces, but the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework that exploits the factorization of both the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, whose nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm that exploits the structure of a GMDP and whose complexity grows only quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action spaces we want to tackle, whereas ALP can be adapted to do so. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases it provides a better approximation of the value function of approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems that can arise in many different domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.

15.
We consider a Markov decision process (MDP) under the average reward criterion. We investigate the decomposition of such an MDP into smaller MDPs using the strongly connected classes of the associated graph. Then, by introducing the associated levels, we construct an aggregation-disaggregation algorithm for computing an optimal strategy for the original MDP.
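The graph step can be illustrated with networkx: build the one-step reachability graph (union over actions), condense it into its strongly connected classes, and read levels off the resulting DAG. This covers only the decomposition, not the paper's aggregation-disaggregation iterations:

```python
import networkx as nx

def mdp_levels(transitions):
    """Strongly connected classes of the MDP state graph and their levels.

    transitions: iterable of (s, a, t) triples with positive probability.
    """
    G = nx.DiGraph((s, t) for s, _, t in transitions)   # union over actions
    C = nx.condensation(G)                 # DAG whose nodes are the SCCs
    levels = {}
    for node in nx.topological_sort(C):
        preds = list(C.predecessors(node))
        levels[node] = 1 + max((levels[p] for p in preds), default=0)
    return {node: (C.nodes[node]['members'], lvl) for node, lvl in levels.items()}
```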

16.
In this paper we discuss MDPs with a distribution-function criterion for the first-passage time. We give properties of several kinds of optimal policies, together with existence results and algorithms for computing them.
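For a fixed deterministic policy, the distribution function in question satisfies a simple recursion in the horizon n; a sketch of its evaluation (the paper goes further and optimizes over policies):

```python
import numpy as np

def first_passage_cdf(P, policy, goal, n_max):
    """F[n, s] = Prob(first passage from s to `goal` takes <= n steps)
    under a fixed deterministic policy; P is (A, S, S)."""
    A, S, _ = P.shape
    F = np.zeros((n_max + 1, S))
    F[:, goal] = 1.0                      # starting at the goal: time 0
    for n in range(1, n_max + 1):
        for s in range(S):
            if s != goal:
                # One step under policy[s], then arrive within n - 1 steps.
                F[n, s] = P[policy[s], s] @ F[n - 1]
    return F
```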

17.
This article begins with a review of previously proposed integer formulations for the maximum diversity problem (MDP). This problem consists of selecting a subset of elements from a larger set in such a way that the sum of the distances between the chosen elements is maximized. We propose a branch-and-bound algorithm and develop several upper bounds on the objective function values of partial solutions to the MDP. Empirical results with a collection of previously reported instances indicate that the proposed algorithm is able to solve all the medium-sized instances (with 50 elements) as well as some large-sized instances (with 100 elements). We compare our method with the best previous linear integer formulation solved with the well-known software CPLEX. The comparison favors the proposed procedure.

18.
We establish moderate deviation principles (MDP) for some stable autoregressive models of order p: first for continuous unbounded additive functionals of such processes, and second for the estimation error of the regression function (the least-squares estimator in the stable linear case, and the kernel estimator in the Lipschitz nonlinear case).

19.
20.
Linear Programming is known to be an important and useful tool for solving Markov Decision Processes (MDP). Its derivation relies on the Dynamic Programming approach, which also serves to solve MDPs. However, for Markov Decision Processes with several constraints, the only available methods are based on Linear Programs. The aim of this paper is to investigate some aspects of such Linear Programs related to multi-chain MDPs. We first present a stochastic interpretation of the decision variables that appear in the Linear Programs available in the literature. We then show, for the multi-constrained Markov Decision Process, that the Linear Program suggested in [9] can be obtained from an equivalent unconstrained Lagrange formulation of the control problem. This shows the connection between the Linear Program approach and the Lagrange approach, which was previously used only for the case of a single constraint [3, 14, 15].
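For reference, the classical primal LP (shown here for the discounted case, whereas the paper treats multi-chain average-reward models) minimizes the sum of the values subject to v(s) >= r(s,a) + gamma * P(s,a,.) @ v for every state-action pair; constrained MDPs add one row per constraint. A minimal sketch with scipy, shapes and data hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, R, gamma=0.9):
    """Discounted MDP via its primal LP:
    minimize sum_s v(s)  subject to  v(s) >= R[a, s] + gamma * P[a, s] @ v."""
    A, S, _ = P.shape
    rows, rhs = [], []
    for a in range(A):
        for s in range(S):
            row = gamma * P[a, s].copy()
            row[s] -= 1.0                 # gamma * P @ v - v <= -R
            rows.append(row)
            rhs.append(-R[a, s])
    res = linprog(c=np.ones(S), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * S, method='highs')
    return res.x                          # optimal value function
```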
