Similar Documents
20 similar documents found.
1.
In the controlled ovarian hyperstimulation (COH) cycle of in vitro fertilization-embryo transfer (IVF-ET) therapy, clinicians observe patients’ responses to gonadotropin dosages by closely monitoring their physiological states, in order to balance the trade-off between pregnancy rate and the risk of ovarian hyperstimulation syndrome (OHSS). In this paper, we model clinical practice in the COH treatment cycle as a stochastic dynamic program, to capture the dynamic decision process and to account for each individual patient’s stochastic responses to gonadotropin administration. We discretize the problem into a Markov decision process and solve it using a slightly modified backward dynamic programming algorithm. We then evaluate the policies using simulation and explore the impact of patient misclassification. More specifically, we focus on patients with polycystic ovary syndrome (PCOS) or potential PCOS, that is, patients who tend to be more sensitive to gonadotropin administration.
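The solution method described is standard finite-horizon backward induction. A minimal sketch, with made-up state, action, and horizon sizes standing in for the paper's discretized physiological states, dose levels, and monitoring days:

```python
import numpy as np

# Hypothetical discretization: S physiological states, A gonadotropin dose
# levels, T monitoring days. P[t][a] is an S x S transition matrix and R[t] an
# S x A reward encoding the pregnancy-rate / OHSS-risk trade-off; all values
# here are illustrative placeholders, not clinical data.
rng = np.random.default_rng(0)
S, A, T = 10, 4, 8

def random_stochastic(s):
    m = rng.random((s, s))
    return m / m.sum(axis=1, keepdims=True)

P = [[random_stochastic(S) for _ in range(A)] for _ in range(T)]
R = [rng.random((S, A)) for _ in range(T)]
terminal = rng.random(S)  # value attached to each terminal physiological state

def backward_dp(P, R, terminal):
    """V_t(s) = max_a R_t(s,a) + sum_{s'} P_t(s'|s,a) V_{t+1}(s')."""
    V = terminal.copy()
    policy = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):
        Q = np.stack([R[t][:, a] + P[t][a] @ V for a in range(A)], axis=1)
        policy[t] = Q.argmax(axis=1)  # best dose level per state and day
        V = Q.max(axis=1)
    return V, policy

V0, policy = backward_dp(P, R, terminal)
print("expected value from each initial state:", np.round(V0, 3))
```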

2.
Branching Constraint Satisfaction Problems (BCSPs) model a class of uncertain dynamic resource allocation problems. We describe the features of BCSPs, and show that the associated decision problem is NP-complete. Markov Decision Problems could be used in place of BCSPs, but we show analytically and empirically that, for the class of problems in question, the BCSP algorithms are more efficient than the related MDP algorithms.

3.
4.
We consider limiting average Markov decision processes (MDP) with finite state and action spaces. We propose some algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and the construction of levels in some aggregated MDP.
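The abstract does not spell out the graph/level construction, but the objective is the limiting average reward. As a point of reference only, a textbook relative value iteration baseline for that criterion might look like this (illustrative data; assumes a unichain, aperiodic model):

```python
import numpy as np

# Relative value iteration for the limiting-average criterion. This is a
# standard baseline, not the paper's graph-based algorithm; the MDP below
# is randomly generated for illustration.
rng = np.random.default_rng(1)
S, A = 6, 3

def row_stochastic(m):
    return m / m.sum(axis=1, keepdims=True)

P = [row_stochastic(rng.random((S, S))) for _ in range(A)]
R = rng.random((S, A))

def relative_value_iteration(iters=2000, ref=0):
    h = np.zeros(S)
    for _ in range(iters):
        Q = np.stack([R[:, a] + P[a] @ h for a in range(A)], axis=1)
        h_new = Q.max(axis=1)
        g = h_new[ref]        # running estimate of the optimal gain
        h = h_new - g         # re-center so the bias vector stays bounded
    return g, Q.argmax(axis=1)

gain, policy = relative_value_iteration()
print("estimated optimal average reward:", round(gain, 4), "policy:", policy)
```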

5.
Current Gibbs sampling schemes in mixture of Dirichlet process (MDP) models are restricted to using “conjugate” base measures that allow analytic evaluation of the transition probabilities when resampling configurations, or alternatively need to rely on approximate numeric evaluations of some transition probabilities. Implementation of Gibbs sampling in more general MDP models is an open and important problem because most applications call for the use of nonconjugate base measures. In this article we propose a conceptual framework for computational strategies. This framework provides a perspective on current methods, facilitates comparisons between them, and leads to several new methods that expand the scope of MDP models to nonconjugate situations. We discuss one in detail. The basic strategy is based on expanding the parameter vector, and is applicable for MDP models with arbitrary base measure and likelihood. Strategies are also presented for the important class of normal-normal MDP models and for problems with fixed or few hyperparameters. The proposed algorithms are easily implemented and illustrated with an application.
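For intuition about the conjugate setting that these strategies generalize, a minimal Pólya-urn Gibbs sweep for a normal-normal Dirichlet process mixture could look like the sketch below (all hyperparameters and data are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Pólya-urn Gibbs sweep for a conjugate normal-normal DP mixture: known
# observation variance sigma2, base measure N(mu0, tau2), DP mass alpha.
rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(-3, 1, 30), rng.normal(3, 1, 30)])
alpha, sigma2, mu0, tau2 = 1.0, 1.0, 0.0, 10.0
z = np.zeros(len(y), dtype=int)          # cluster labels
theta = {0: y.mean()}                    # cluster means

def gibbs_sweep(y, z, theta):
    for i in range(len(y)):
        old = z[i]; z[i] = -1            # remove point i from its cluster
        if not np.any(z == old):
            theta.pop(old)               # drop the cluster if now empty
        labels = list(theta)
        counts = np.array([np.sum(z == k) for k in labels])
        # CRP weights: existing clusters use N(theta_k, sigma2); a new
        # cluster uses the marginal predictive N(mu0, sigma2 + tau2)
        w = np.append(
            counts * norm.pdf(y[i], [theta[k] for k in labels], np.sqrt(sigma2)),
            alpha * norm.pdf(y[i], mu0, np.sqrt(sigma2 + tau2)))
        choice = rng.choice(len(w), p=w / w.sum())
        if choice == len(labels):        # open a new cluster
            new = max(theta) + 1
            pv = 1 / (1 / tau2 + 1 / sigma2)
            theta[new] = rng.normal(pv * (mu0 / tau2 + y[i] / sigma2), np.sqrt(pv))
            z[i] = new
        else:
            z[i] = labels[choice]
    for k in list(theta):                # conjugate update of each cluster mean
        yk = y[z == k]
        pv = 1 / (1 / tau2 + len(yk) / sigma2)
        theta[k] = rng.normal(pv * (mu0 / tau2 + yk.sum() / sigma2), np.sqrt(pv))
    return z, theta

for _ in range(50):
    z, theta = gibbs_sweep(y, z, theta)
print("cluster means found:", sorted(np.round(list(theta.values()), 2)))
```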

6.
7.
Generalized linear mixed models with semiparametric random effects are useful in a wide variety of Bayesian applications. When the random effects arise from a mixture of Dirichlet process (MDP) model with normal base measure, Gibbs sampling algorithms based on the Pólya urn scheme are often used to simulate posterior draws in conjugate models (essentially, linear regression models and models for binary outcomes). In the nonconjugate case, some common problems associated with existing simulation algorithms include convergence and mixing difficulties.

This article proposes an algorithm for MDP models with exponential family likelihoods and normal base measures. The algorithm proceeds by making a Laplace approximation to the likelihood function, thereby matching the proposal with that of the Gibbs sampler. The proposal is accepted or rejected via a Metropolis-Hastings step. For conjugate MDP models, the algorithm is identical to the Gibbs sampler. The performance of the technique is investigated using a Poisson regression model with semiparametric random effects. The algorithm performs efficiently and reliably, even in problems where large-sample results do not guarantee the success of the Laplace approximation. This is demonstrated by a simulation study where most of the count data consist of small numbers. The technique is associated with substantial benefits relative to existing methods, both in terms of convergence properties and computational cost.
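A minimal sketch of the Laplace-then-Metropolis-Hastings idea for one nonconjugate full conditional, using a toy Poisson model with small counts (names and values are illustrative, not the paper's implementation):

```python
import numpy as np

# One random effect b with prior N(m, v) and Poisson(exp(b)) likelihood for
# counts y. Fit a Gaussian (Laplace) approximation to the full conditional,
# then use it as an independence proposal in a Metropolis-Hastings step.
rng = np.random.default_rng(3)
y = np.array([0, 1, 0, 2, 1])            # small counts, as in the simulation study
m, v = 0.0, 4.0

def log_post(b):
    return np.sum(y * b - np.exp(b)) - (b - m) ** 2 / (2 * v)

def laplace_fit(b0=0.0, steps=25):
    """Newton iterations for the mode and curvature of the full conditional."""
    b = b0
    for _ in range(steps):
        grad = np.sum(y) - len(y) * np.exp(b) - (b - m) / v
        hess = -len(y) * np.exp(b) - 1 / v
        b -= grad / hess
    return b, 1 / -hess                   # mode and proposal variance

def mh_step(b, mode, var):
    prop = rng.normal(mode, np.sqrt(var))
    def logq(x):                          # independence proposal N(mode, var)
        return -(x - mode) ** 2 / (2 * var)
    if np.log(rng.random()) < log_post(prop) - log_post(b) + logq(b) - logq(prop):
        return prop                       # accept
    return b                              # reject, keep current value

mode, var = laplace_fit()
b = mode
draws = [b := mh_step(b, mode, var) for _ in range(5000)]
print("posterior mean of exp(b):", round(np.mean(np.exp(draws[1000:])), 3))
```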

8.
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.

9.
Ambulance diversion (AD) is used by emergency departments (EDs) to relieve congestion by requesting ambulances to bypass the ED and transport patients to another facility. We study optimal AD control policies using a Markov Decision Process (MDP) formulation that minimizes the average time that patients wait beyond their recommended safety time threshold. The model assumes that patients can be treated in one of two treatment areas and that the distribution of the time to start treatment at the neighboring facility is known. Assuming Poisson arrivals and exponential times for the length of stay in the ED, we show that the optimal AD policy follows a threshold structure, and explore the behavior of optimal policies under different scenarios. We analyze the value of information on the time to start treatment in the neighboring hospital, and show that optimal policies depend strongly on the congestion experienced by the other facility. Simulation is used to compare the performance of the proposed MDP model to that of simple heuristics under more realistic assumptions. Results indicate that the MDP model performs significantly better than the tested heuristics under most cases. Finally, we discuss practical issues related to the implementation of the policies prescribed by the MDP.
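To make the threshold structure concrete, a toy simulation of a census-threshold diversion rule (rates, capacity, and threshold invented for illustration) might be:

```python
import numpy as np

# Divert ambulance arrivals whenever the ED census is at or above K.
# Exponential interevent times, as in the abstract's modeling assumptions.
rng = np.random.default_rng(4)
lam, mu, beds = 4.0, 0.5, 10             # arrivals/hr, service rate, servers

def simulate(K, horizon=10_000.0):
    t, census, diverted, arrivals = 0.0, 0, 0, 0
    while t < horizon:
        rate = lam + min(census, beds) * mu
        t += rng.exponential(1 / rate)
        if rng.random() < lam / rate:    # next event is an arrival
            arrivals += 1
            if census >= K:
                diverted += 1            # ambulance bypasses the ED
            else:
                census += 1
        elif census > 0:                 # next event is a departure
            census -= 1
    return diverted / arrivals

for K in (10, 12, 14, 16):
    print(f"threshold {K}: fraction diverted ~ {simulate(K):.3f}")
```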

10.
The maximum diversity problem (MDP) is a challenging NP-hard problem with a wide range of real applications. Several researchers have pointed out the close relationship between the MDP and the unconstrained binary quadratic program (UBQP). In this paper, we provide procedures to solve the MDP using ideas from the UBQP formulation of the problem. We first give some local optimality results for r-flip improvement procedures on the MDP. Then, a set of highly effective diversification approaches based on sequential improvement steps for the MDP are presented. Four versions of the approaches are used within a simple tabu search and applied to 140 benchmark MDP problems available on the Internet. The procedures instantly solve all 80 small- to medium-sized problems to the best known solutions. For 22 of the 60 large problems, the procedures significantly improved the best known solutions in reasonably short CPU time.
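As a rough illustration of flip-style local search with a tabu list on this problem (not the authors' tuned procedure; distances, tenure, and iteration counts are all invented):

```python
import numpy as np

# Maximum diversity: pick m of n elements maximizing the sum of pairwise
# distances. A swap (drop i, add j) plays the role of a 2-flip move.
rng = np.random.default_rng(5)
n, m = 30, 10
d = rng.random((n, n)); d = (d + d.T) / 2; np.fill_diagonal(d, 0)

def diversity(sel):
    idx = np.flatnonzero(sel)
    return d[np.ix_(idx, idx)].sum() / 2

sel = np.zeros(n, dtype=bool)
sel[rng.choice(n, m, replace=False)] = True
best_val = diversity(sel)
tabu = np.zeros(n, dtype=int)            # iteration until which an element is tabu

for it in range(300):
    cur = diversity(sel)
    move, move_gain = None, -np.inf
    for i in np.flatnonzero(sel):        # candidate to drop
        for j in np.flatnonzero(~sel):   # candidate to add
            gain = d[j, sel].sum() - d[j, i] - d[i, sel].sum()
            allowed = tabu[i] <= it and tabu[j] <= it
            # aspiration: accept a tabu move if it beats the best known value
            if (allowed or cur + gain > best_val) and gain > move_gain:
                move, move_gain = (i, j), gain
    if move is None:
        break
    i, j = move
    sel[i], sel[j] = False, True
    tabu[i] = tabu[j] = it + 7           # tabu tenure of 7 iterations
    best_val = max(best_val, diversity(sel))
print("best diversity found:", round(best_val, 3))
```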

11.
This paper investigates a dynamic event-triggered optimal control problem of discrete-time (DT) nonlinear Markov jump systems (MJSs) via exploring policy iteration (PI) adaptive dynamic programming (ADP) algorithms. The performance index function (PIF) defined in each subsystem is updated by utilizing an online PI algorithm, and the corresponding control policy is derived via solving the optimal PIF. Then, we adopt neural network (NN) techniques, including an actor network and a critic network, to estimate the iterative PIF and control policy. Moreover, the designed dynamic event-triggered mechanism (DETM) is employed to avoid wasting additional resources when the estimated iterative control policy is updated. Finally, based on the Lyapunov difference method, it is proved that the system stability and the convergence of all signals can be guaranteed under the developed control scheme. A simulation example for DT nonlinear MJSs with two system modes is presented to demonstrate the feasibility of the control design scheme.

12.
This paper is the first of two papers that present and evaluate an approach for determining suboptimal policies for large-scale Markov decision processes (MDP). Part 1 is devoted to the determination of bounds that motivate the development and indicate the quality of the suboptimal design approach; Part 2 is concerned with the implementation and evaluation of the suboptimal design approach. The specific MDP considered is the infinite-horizon, expected total discounted cost MDP with finite state and action spaces. The approach can be described as follows. First, the original MDP is approximated by a specially structured MDP. The special structure suggests how to construct associated smaller, more computationally tractable MDP's. The suboptimal policy for the original MDP is then constructed from the solutions of these smaller MDP's. The key feature of this approach is that the state and action space cardinalities of the smaller MDP's are exponential reductions of the state and action space cardinalities of the original MDP.

13.
Multi-homing is used by Internet Service Providers (ISPs) to connect to the Internet via different network providers. This study develops a routing strategy under multi-homing in the case where network providers charge ISPs according to top-percentile pricing (i.e. based on the θth highest volume of traffic shipped). We call this problem the Top-percentile Traffic Routing Problem (TpTRP).

Solution approaches based on Stochastic Dynamic Programming require discretization in state space, which introduces a large number of state variables. This is known as the curse of dimensionality in state space. To overcome this, in previous work we suggested using approximate dynamic programming (ADP) to construct value function approximations, which allow us to work in continuous state space. The resulting ADP model provides well-performing routing policies for medium-sized instances of the TpTRP. In this work we extend the ADP model by using Bézier curves/surfaces to obtain continuous-time approximations of the time-dependent ADP parameters. This modification reduces the number of regression parameters to estimate, and thus speeds up parameter training in the solution of the ADP model, making realistically sized TpTRP instances tractable. We argue that our routing strategy is near optimal by giving bounds.
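The Bézier ingredient itself is simple: a whole time profile of a parameter is driven by a handful of trainable control points. A sketch of evaluating such a curve via de Casteljau's algorithm (control values are arbitrary illustrations):

```python
import numpy as np

def de_casteljau(ctrl, t):
    """Evaluate a Bézier curve with control points `ctrl` at t in [0, 1]."""
    pts = np.asarray(ctrl, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]   # repeated linear interpolation
    return pts[0]

# e.g. a time-dependent ADP parameter over 100 decision epochs, driven by
# 4 control points instead of 100 separately estimated values
ctrl = [0.2, 1.5, 0.9, 0.4]
theta = [de_casteljau(ctrl, k / 99) for k in range(100)]
print("parameter at epochs 0, 50, 99:", [round(theta[k], 3) for k in (0, 50, 99)])
```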

14.
The Unichain classification problem detects whether a finite state and action MDP is unichain under all deterministic policies. This problem is NP-hard. This paper provides polynomial algorithms for this problem when there is a state that is either recurrent under all deterministic policies or absorbing under some action.
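The second structural condition is easy to test directly; a small sketch with toy transition matrices:

```python
import numpy as np

# A state s is absorbing under action a when P(s | s, a) = 1. P is a list of
# S x S matrices, one per action; the matrices below are illustrative.
def absorbing_under_some_action(P, s, tol=1e-12):
    return any(abs(Pa[s, s] - 1.0) < tol for Pa in P)

P = [np.array([[1.0, 0.0], [0.5, 0.5]]),   # under action 0, state 0 absorbs
     np.array([[0.2, 0.8], [0.9, 0.1]])]
print([s for s in range(2) if absorbing_under_some_action(P, s)])
```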

15.
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces. But the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework which exploits the factorization of the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, where nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm exploiting the structure of a GMDP and whose complexity only grows quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action space we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases provides a better approximation of the value function of approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems, that can arise in many different domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.

16.
We propose a family of retrospective optimization (RO) algorithms for optimizing stochastic systems with both integer and continuous decision variables. The algorithms are continuous search procedures embedded in a RO framework using dynamic simplex interpolation (RODSI). By decreasing dimensions (corresponding to the continuous variables) of simplex, the retrospective solutions become closer to an optimizer of the objective function. We present convergence results of RODSI algorithms for stochastic “convex” systems. Numerical results show that a simple implementation of RODSI algorithms significantly outperforms some random search algorithms such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).

17.
We present a Markov decision process (MDP) model to determine the optimal timing of blood pressure and cholesterol medications. We study the use of our model for a high-risk population of patients with type 2 diabetes; however, the model and methods we present are applicable to the general population. We compare the optimal policies based on our MDP to published guidelines for initiation of blood pressure and cholesterol medications over the course of a patient’s lifetime. We also present a bicriteria analysis that illustrates the trade-off between quality-adjusted life years and costs of treatment.

18.
In this paper we consider healthcare policy issues for trading off resources in testing, prevention, and cure of two-stage contagious diseases. An individual who has contracted a two-stage contagious disease initially shows no symptoms but is capable of spreading it. If the initial stage goes undetected, which can eventually lead to complications, symptoms appear in the latter stage, when expensive treatment becomes necessary. Under a constrained budget, policymakers face the decision of how to allocate funds for prevention (via vaccinations), subsidizing treatment, and examination to detect the initial stage of the contagious disease. These decisions must be made in each period of a given time horizon. To aid this decision-making exercise, we formulate a stochastic dynamic optimal control problem with feedback, which can be modeled as a Markov decision process (MDP). However, solving the MDP is computationally intractable due to the large state space, as the embedded stochastic network cannot be decomposed. Hence we propose an asymptotically optimal solution based on a fluid model of the dynamics in the stochastic network. We heuristically fine-tune the asymptotically optimal solution for the non-asymptotic case, and test it extensively on several numerical cases. In particular we investigate the effect of budget, length of planning horizon, type of disease, population size, and ratio of costs on the policy for budget allocation.

19.
20.
Single Sample Path-Based Optimization of Markov Chains
Motivated by the needs of on-line optimization of real-world engineering systems, we studied single sample path-based algorithms for Markov decision problems (MDP). The sample path used in the algorithms can be obtained by observing the operation of a real system. We give a simple example to explain the advantages of the sample path-based approach over the traditional computation-based approach: matrix inversion is not required; some transition probabilities do not have to be known; it may save storage space; and it offers the flexibility of iterating the actions for a subset of the state space in each iteration. The effect of estimation errors and the convergence property of the sample path-based approach are studied. Finally, we propose a fast algorithm, which updates the policy whenever the system reaches a particular set of states, and prove that the algorithm converges to the true optimal policy with probability one under some conditions. The sample path-based approach may have important applications to the design and management of engineering systems, such as high-speed communication networks.
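The paper's own algorithm is not reproduced here, but a minimal illustration of the single-sample-path spirit, estimating a good policy from one trajectory without inverting or even knowing the transition matrices, is tabular Q-learning on a made-up MDP:

```python
import numpy as np

# Tabular Q-learning along one continuing sample path. This is a related
# sample-path method, not the potential-based algorithm of the paper; the
# MDP, discount factor, and learning rates are illustrative.
rng = np.random.default_rng(7)
S, A = 5, 2

def rand_dist():
    p = rng.random(S)
    return p / p.sum()

P = np.array([[rand_dist() for _ in range(A)] for _ in range(S)])  # (S, A, S)
R = rng.random((S, A))

Q = np.zeros((S, A))
s, eps, alpha = 0, 0.1, 0.05
for step in range(100_000):              # one long observed trajectory
    a = rng.integers(A) if rng.random() < eps else Q[s].argmax()
    s2 = rng.choice(S, p=P[s, a])        # only sampled transitions are used;
    r = R[s, a]                          # P is never inverted or stored by the learner
    Q[s, a] += alpha * (r + 0.95 * Q[s2].max() - Q[s, a])
    s = s2
print("greedy policy learned from one path:", Q.argmax(axis=1))
```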
