Similar Documents
20 similar documents found (search time: 31 ms)
1.
Dynamic programming is the essential tool in dynamic economic analysis. Problems such as portfolio allocation for individuals and optimal growth of national economies are typical examples. Numerical methods typically approximate the value function and use value function iteration to compute the value function for the optimal policy. Polynomial approximations are natural choices for approximating value functions when we know that the true value function is smooth. However, numerical value function iteration with polynomial approximations is unstable because standard methods such as interpolation and least squares fitting do not preserve shape. We introduce shape-preserving approximation methods that stabilize value function iteration, and are generally faster than previous stable methods such as piecewise linear interpolation.
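A minimal sketch of the idea in Python, with SciPy's monotone PCHIP interpolant standing in for the paper's shape-preserving scheme; the log-utility growth model, grid, and parameter values below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator   # monotone, shape-preserving fit
from scipy.optimize import minimize_scalar

# Toy optimal-growth problem (illustrative): V(k) = max_c log(c) + beta * V(k^alpha - c)
alpha, beta = 0.3, 0.9
grid = np.linspace(0.05, 0.5, 25)                 # capital grid
V = np.zeros_like(grid)                           # initial guess

for _ in range(200):
    V_hat = PchipInterpolator(grid, V)            # shape-preserving fit stabilizes VFI
    V_new = np.empty_like(V)
    for i, k in enumerate(grid):
        y = k ** alpha
        obj = lambda c: -(np.log(c)
                          + beta * V_hat(np.clip(y - c, grid[0], grid[-1])))
        res = minimize_scalar(obj, bounds=(1e-6, y - 1e-6), method="bounded")
        V_new[i] = -res.fun
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
```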

2.
This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iteration, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.
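A hedged sketch of how the phenomenon can be measured: run value iteration on a discounted MDP and count sweeps until the greedy policy first matches a known optimal policy. The interface and termination test are illustrative; the note's point is that this count can grow exponentially in the number of actions.

```python
import numpy as np

def vi_sweeps_to_optimal(P, r, gamma, pi_star, max_iter=10**6):
    """Count value-iteration sweeps until the greedy policy equals pi_star
    (assumed known, e.g., from an exact solver). P: (S, A, S), r: (S, A)."""
    S, A = r.shape
    V = np.zeros(S)
    for k in range(1, max_iter + 1):
        Q = r + gamma * P @ V          # (S, A); P @ V sums over next states
        if np.array_equal(Q.argmax(axis=1), pi_star):
            return k
        V = Q.max(axis=1)
    return max_iter
```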

3.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend using this algorithm instead of unichain policy iteration. This research has been partially supported by NSERC Grant A-5527.
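A sketch of the average-reward machinery involved, assuming the evaluated policy is unichain so that the evaluation equations g + h(s) = r(s) + sum over s' of P(s,s')h(s'), with h fixed to 0 at a reference state, have a unique solution. The paper's actual improvement step is further restricted to select unichain policies; the plain greedy step below does not enforce that and is only illustrative.

```python
import numpy as np

def evaluate_unichain(P_pi, r_pi):
    """Solve g + h(s) = r_pi(s) + sum_s' P_pi(s,s') h(s') with h(0) = 0.
    Valid when the policy's chain is unichain. Returns gain g and bias h."""
    S = len(r_pi)
    M = np.zeros((S, S))
    M[:, 0] = 1.0                                  # coefficient of the gain g
    M[:, 1:] = np.eye(S)[:, 1:] - P_pi[:, 1:]      # h(0) is pinned to 0
    sol = np.linalg.solve(M, r_pi)
    return sol[0], np.concatenate(([0.0], sol[1:]))

def policy_iteration_avg(P, r, max_iter=1000):
    """P: (S, A, S), r: (S, A). Plain greedy improvement; the paper instead
    restricts improvement to unichain policies."""
    S, A = r.shape
    pi = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        idx = np.arange(S)
        g, h = evaluate_unichain(P[idx, pi], r[idx, pi])
        pi_new = (r + P @ h).argmax(axis=1)        # bias-based greedy step
        if np.array_equal(pi_new, pi):
            return pi, g
        pi = pi_new
    return pi, g
```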

4.
Given an n×n symmetric positive definite matrix A and a vector b, two numerical methods for approximating A^{1/2}b are developed, analyzed, and computationally tested. The first method applies a Newton iteration to a specific nonlinear system to approximate A^{1/2}b, while the second method applies a step-control method to numerically solve a specific initial-value problem to approximate A^{1/2}b. Assuming that A is first reduced to tridiagonal form, the first method requires O(n^2) operations per iteration while the second method requires O(n) operations per iteration. In contrast, numerical methods that first approximate A^{1/2} and then compute A^{1/2}b generally require O(n^3) operations per iteration.
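For orientation, a sketch of the O(n^3)-per-iteration baseline route the abstract contrasts: form A^{1/2} by the Denman-Beavers variant of the Newton iteration for the matrix square root, then multiply by b. The paper's two methods avoid forming A^{1/2} explicitly; this baseline is not either of them.

```python
import numpy as np

def sqrtm_times_b(A, b, tol=1e-12, max_iter=100):
    """Denman-Beavers iteration: Y -> A^{1/2}, Z -> A^{-1/2} for SPD A,
    then one matvec A^{1/2} @ b. Each sweep costs O(n^3)."""
    Y, Z = A.copy(), np.eye(A.shape[0])
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_next - Y, "fro") < tol * np.linalg.norm(Y, "fro"):
            Y = Y_next
            break
        Y, Z = Y_next, Z_next
    return Y @ b

# usage sketch on random SPD data
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)        # symmetric positive definite
b = rng.standard_normal(5)
x = sqrtm_times_b(A, b)            # approx. A^{1/2} b
```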

5.
In actual practice, iteration methods applied to the solution of finite systems of equations yield inconclusive results as to the existence or nonexistence of solutions and the accuracy of any approximate solutions obtained. On the other hand, construction of interval extensions of ordinary iteration operators permits one to carry out interval iteration computationally, with results which can give rigorous guarantees of existence or nonexistence of solutions, and error bounds for approximate solutions. Examples are given of the solution of a nonlinear system of equations and the calculation of eigenvalues and eigenvectors of a matrix by interval iteration. Several ways to obtain lower and upper bounds for eigenvalues are given. Sponsored by the United States Army under Contract No. DAAG29-80-C-0041.
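A minimal one-dimensional sketch of the idea, using plain floating point instead of directed rounding (so the enclosure guarantee is only indicative here): an interval Newton step X intersected with m - f(m)/F'(X), applied to f(x) = x^2 - 2 on a positive interval.

```python
# Intervals are plain tuples (lo, hi); real interval arithmetic would round outward.

def interval_newton_sqrt2(X, n_steps=20):
    """Enclose the root of f(x) = x^2 - 2 in X = (lo, hi) with lo > 0.
    If N(X) = m - f(m)/F'(X) lands inside X, a root provably exists in X."""
    lo, hi = X
    for _ in range(n_steps):
        m = 0.5 * (lo + hi)
        fm = m * m - 2.0
        dlo, dhi = 2.0 * lo, 2.0 * hi            # interval extension of f'(x) = 2x
        cand = sorted([m - fm / dlo, m - fm / dhi])   # N(X), division by [dlo, dhi]
        lo, hi = max(lo, cand[0]), min(hi, cand[1])   # intersect with X
        if lo > hi:
            raise ValueError("empty intersection: no root in X")
    return lo, hi

print(interval_newton_sqrt2((1.0, 2.0)))         # tight enclosure of sqrt(2)
```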

6.
We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEPs). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR), and invariant co-ordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace, when it is sparse. Also proposed is a computationally more efficient alternative, which works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive, and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA, and SDR. Supplementary materials for this article are available online.
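A hedged structural sketch: standard generalized orthogonal iteration for Av = lambda Bv, with a simple soft-thresholding shrinkage standing in for the paper's penalized convex equation-solving subproblem. The penalty level and thresholding rule below are simplifications, not the proposed method.

```python
import numpy as np

def penalized_orthogonal_iteration(A, B, k, lam=0.1, n_iter=200, seed=0):
    """Sketch of orthogonal iteration for the GEP A v = lambda B v with a
    sparsity-inducing shrinkage inserted after the equation-solving step."""
    p = A.shape[0]
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    for _ in range(n_iter):
        Y = np.linalg.solve(B, A @ Q)                         # solve B Y = A Q
        Y = np.sign(Y) * np.maximum(np.abs(Y) - lam, 0.0)     # soft threshold
        Q, _ = np.linalg.qr(Y)                                # re-orthogonalize
    return Q
```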

7.
This paper provides an easily computable one parameter inspection policy for the detection of the failure of a system when the time to failure of the system follows a gamma distribution. This one parameter inspection policy, as suggested by Munford and Shahani, compares fairly well with a computationally difficult optimal policy suggested by Barlow, Hunter and Proschan. Necessary tables are provided for the "optimal" parameter for a wide range of the parameters of the problem, from which the successive inspection times can be computed easily.
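A sketch under the usual one-parameter convention for Munford-Shahani-type schedules (an assumption about the policy's form): each inter-inspection interval carries conditional failure probability p, so F(x_k) = 1 - (1-p)^k and the k-th inspection time is x_k = F^{-1}(1 - (1-p)^k) for a gamma lifetime distribution F.

```python
import numpy as np
from scipy.stats import gamma

def inspection_times(p, shape, scale, n_checks=10):
    """Successive inspection epochs x_k = F^{-1}(1 - (1-p)^k) for a
    gamma(shape, scale) time to failure; p is the single policy parameter."""
    k = np.arange(1, n_checks + 1)
    return gamma.ppf(1.0 - (1.0 - p) ** k, a=shape, scale=scale)

print(inspection_times(p=0.2, shape=2.0, scale=1.0))
```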

8.
For solving large sparse systems of linear equations, we construct a paradigm of two-step matrix splitting iteration methods and analyze its convergence properties for the nonsingular and the positive-definite matrix classes. This two-step matrix splitting iteration paradigm adopts only one single splitting of the coefficient matrix, together with several arbitrary iteration parameters. Hence, it can be constructed easily in actual applications, and can also recover a number of representatives of the existing two-step matrix splitting iteration methods. This result provides a systematic treatment of two-step matrix splitting iteration methods, establishes a rigorous theory for their asymptotic convergence, and enriches the algorithmic family of linear iteration solvers for the iterative solution of large sparse linear systems.
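One simple instance of the structure described, assuming a single splitting matrix M and two free iteration parameters alpha and beta; the choice of M (for instance, the diagonal of A) and the parameter values are placeholders, while the paper's theory covers a general class of such schemes.

```python
import numpy as np

def two_step_splitting(A, b, M, alpha, beta, x0=None, tol=1e-10, max_iter=5000):
    """Two half-steps per sweep, both built from the single splitting matrix M."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        x_half = x + alpha * np.linalg.solve(M, b - A @ x)           # first half-step
        x_new = x_half + beta * np.linalg.solve(M, b - A @ x_half)   # second half-step
        if np.linalg.norm(x_new - x) < tol * (1 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x
```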

9.
In this paper, we introduce a multigrid method for solving the nonlinear Urysohn integral equation. The algorithm is derived from a discrete resolvent equation which approximates the continuous resolvent equation of the nonlinear Urysohn integral equation. The algorithm is mathematically equivalent to Atkinson's adaptive two-grid iteration, but the two are different computationally. We show the convergence of the algorithm and its equivalence to Atkinson's adaptive two-grid iteration. In our numerical example, we compare our algorithm to other multigrid methods for solving the nonlinear Urysohn integral equation, including the nonlinear multigrid method introduced by Hackbusch.
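For orientation, a plain Nystrom-plus-fixed-point baseline on a toy Urysohn equation u(t) = f(t) + integral over [0,1] of k(t, s, u(s)) ds; the kernel and data below are invented for illustration. The paper's multigrid/two-grid algorithms accelerate exactly this kind of fine-grid iteration with coarse-grid corrections.

```python
import numpy as np

n = 64
t = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / (n - 1))            # trapezoid quadrature weights
w[0] = w[-1] = 0.5 / (n - 1)

def K(t_, s_, u_):
    # Toy nonlinear kernel (assumption): contractive, so fixed point converges.
    return 0.2 * np.exp(-np.abs(t_[:, None] - s_[None, :])) * np.sin(u_)[None, :]

f = 1.0 + 0.5 * t
u = f.copy()
for _ in range(200):
    u_new = f + K(t, t, u) @ w           # Nystrom discretization of the integral
    if np.max(np.abs(u_new - u)) < 1e-12:
        break
    u = u_new
```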

10.
Path-following algorithms take at each iteration a Newton step for approaching a point on the central path, in such a way that all the iterates remain in a given neighborhood of that path. This paper studies the case in which each iteration uses a pure Newton step with the largest possible reduction in complementarity measure (duality gap). This algorithm is known to converge superlinearly in objective values. We show that with the addition of a computationally trivial safeguard it achieves Q-quadratic convergence, and that this behaviour cannot be proved by the usual techniques for the original method. Research done while visiting Delft University of Technology, and supported in part by CAPES-Brazil.
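A textbook sketch of the pure Newton step toward mu = 0 on the LP central-path equations, taking the largest positivity-preserving step with the customary safeguard fraction. This shows the mechanism studied but is not the paper's specific safeguarded algorithm; it also assumes the current iterate is primal and dual feasible.

```python
import numpy as np

def pure_newton_step(A, x, y, s, frac=0.9995):
    """One pure Newton step for A x = b, A^T y + s = c, X S e = 0, assuming
    (x, y, s) is already feasible, so only complementarity is corrected.
    Returns the iterate after the largest safeguarded positivity-preserving step."""
    D = x / s
    dy = np.linalg.solve(A @ (D[:, None] * A.T), A @ x)   # normal equations
    ds = -A.T @ dy
    dx = -x - D * ds

    def max_step(v, dv):
        neg = dv < 0
        return np.min(-v[neg] / dv[neg]) if neg.any() else np.inf

    alpha = min(1.0, frac * min(max_step(x, dx), max_step(s, ds)))
    return x + alpha * dx, y + alpha * dy, s + alpha * ds
```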

11.
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1):66-94, 2012). The main difference from the standard policy iteration approach is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm solves an optimal stopping problem inexactly with a finite number of value iterations. The main advantage over the standard Q-learning approach is lower overhead: most iterations do not require a minimization over all controls, in the spirit of modified policy iteration. We prove the convergence of asynchronous deterministic and stochastic lookup table implementations of our method for undiscounted, total cost stochastic shortest path problems. These implementations overcome some of the traditional convergence difficulties of asynchronous modified policy iteration, and provide policy iteration-like alternative Q-learning schemes with as reliable convergence as classical Q-learning. We also discuss methods that use basis function approximations of Q-factors and we give an associated error bound.
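For context, a sketch of the classical tabular Q-learning baseline for stochastic shortest path problems that the paper's hybrid improves on. It assumes episodes terminate (behavior eventually reaches the termination state); the exploration rate, learning-rate schedule, and sampling interface are illustrative.

```python
import numpy as np

def q_learning_ssp(sample_step, S, A, term, n_episodes=5000, alpha0=0.5, seed=0):
    """Classical Q-learning for an undiscounted total-cost SSP.
    sample_step(s, a) -> (cost, next_state); `term` is assumed to be the
    last state index, with Q fixed at 0 there."""
    Q = np.zeros((S, A))
    rng = np.random.default_rng(seed)
    for ep in range(n_episodes):
        s = rng.integers(S - 1)                        # start in a non-terminal state
        while s != term:
            a = rng.integers(A) if rng.random() < 0.1 else Q[s].argmin()
            cost, s2 = sample_step(s, a)
            target = cost + (0.0 if s2 == term else Q[s2].min())
            lr = alpha0 / (1 + 0.001 * ep)             # diminishing step sizes
            Q[s, a] += lr * (target - Q[s, a])
            s = s2
    return Q
```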

12.
In the inference of contingency tables, when the cell counts are not large enough for asymptotic approximation, the conditional exact method is used, and it is often computationally impractical for large tables. Instead, various sampling methods can be used. Permutation-based Monte Carlo sampling may again become impractical for large tables. The existing Markov chain method samples only a few elements of the table at each iteration and is inefficient. Here we consider a Markov chain in which a sub-table of user-specified size is updated at each iteration, achieving high sampling efficiency. Some theoretical properties of the chain and its applications to some commonly used tables are discussed. As an illustration, the method is applied to the exact test of Hardy-Weinberg equilibrium in the population genetics context.
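A sketch of the baseline chain being generalized, assuming it is the familiar 2x2 swap move that preserves row and column margins; the paper's chain instead updates a larger, user-specified sub-table per iteration for better mixing.

```python
import numpy as np

def swap_chain(T, n_steps, seed=0):
    """Margins-preserving Markov chain on contingency tables: pick two rows and
    two columns, apply a +1/-1 swap on that 2x2 sub-table when all four cells
    stay nonnegative; otherwise stay put."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    r, c = T.shape
    for _ in range(n_steps):
        i, j = rng.choice(r, size=2, replace=False)
        k, l = rng.choice(c, size=2, replace=False)
        eps = rng.choice([-1, 1])
        if (T[i, k] + eps >= 0 and T[j, l] + eps >= 0 and
                T[i, l] - eps >= 0 and T[j, k] - eps >= 0):
            T[i, k] += eps; T[j, l] += eps
            T[i, l] -= eps; T[j, k] -= eps
    return T
```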

13.
To computationally solve an adaptive optimal control problem by means of conventional dynamic programming, a backward recursion must be used, with an optimal value and optimal control determined for all conceivable prior information patterns and prior control histories. Consequently, almost all problems are beyond the capability of even large computers. As an alternative, we develop in this paper a computational successive improvement scheme which involves choosing a nominal control policy and then improving it at each iteration. Each improvement involves considerable computation, but much less than the straightforward dynamic programming algorithm. As in any local-improvement procedure, the scheme may converge to something which is inferior to the absolutely optimal control. This paper has been supported by the National Science Foundation under Grant No. GP-25081.

14.
Addition of a correction term every other Newton iteration provides a fifth-order method for finding simple zeros of nonlinear functions. A two-parameter family of such methods is developed. Each family member requires the given function and its derivative to be evaluated at two points per step. Work supported by the British Science Research Council at the University of Dundee, and by the U.S. Atomic Energy Commission.

15.
Military medical planners must develop dispatching policies that dictate how aerial medical evacuation (MEDEVAC) units are utilized during major combat operations. The objective of this research is to determine how to optimally dispatch MEDEVAC units in response to 9-line MEDEVAC requests to maximize MEDEVAC system performance. A discounted, infinite horizon Markov decision process (MDP) model is developed to examine the MEDEVAC dispatching problem. The MDP model allows the dispatching authority to accept, reject, or queue incoming requests based on a request's classification (i.e., zone and precedence level) and the state of the MEDEVAC system. A representative planning scenario based on contingency operations in southern Afghanistan is utilized to investigate the differences between the optimal dispatching policy and three practitioner-friendly myopic policies. Two computational experiments are conducted to examine the impact of selected MEDEVAC problem features on the optimal policy and the system performance measure. Several excursions are examined to identify how the 9-line MEDEVAC request arrival rate and the MEDEVAC flight speeds impact the optimal dispatching policy. Results indicate that dispatching MEDEVAC units considering the precedence level of requests and the locations of busy MEDEVAC units increases the performance of the MEDEVAC system. These results inform the development and implementation of MEDEVAC tactics, techniques, and procedures by military medical planners. Moreover, an analysis of solution approaches for the MEDEVAC dispatching problem reveals that the policy iteration algorithm substantially outperforms the linear programming algorithms executed by CPLEX 12.6 with regard to computational effort. This result supports the claim that policy iteration remains the superlative solution algorithm for exactly solving computationally tractable Markov decision problems.
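Since the computational finding concerns policy iteration as an exact solution method, here is a generic discounted policy iteration sketch; this is the standard algorithm, not the paper's MEDEVAC-specific model or implementation.

```python
import numpy as np

def policy_iteration(P, r, gamma):
    """Exact policy iteration for a discounted MDP. P: (S, A, S), r: (S, A).
    Evaluation solves (I - gamma P_pi) V = r_pi directly each round."""
    S, A = r.shape
    pi = np.zeros(S, dtype=int)
    while True:
        idx = np.arange(S)
        P_pi, r_pi = P[idx, pi], r[idx, pi]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)   # exact evaluation
        pi_new = (r + gamma * P @ V).argmax(axis=1)           # greedy improvement
        if np.array_equal(pi_new, pi):
            return pi, V
        pi = pi_new
```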

16.
Nonproportional hazards models often arise in biomedical studies, as evidenced by a recent national kidney transplant study. During the follow-up, the effects of baseline risk factors, such as patients' comorbidity conditions collected at transplantation, may vary over time. To model such dynamic changes of covariate effects, time-varying survival models have emerged as powerful tools. However, traditional methods of fitting time-varying effects survival models rely on an expansion of the original dataset in a repeated measurement format, which, even with a moderate sample size, leads to an extremely large working dataset. Consequently, the computational burden increases quickly as the sample size grows, and analyses of a large dataset such as our motivating example defy any existing statistical methods and software. We propose a novel application of the quasi-Newton iteration method to model time-varying effects in survival analysis. We show that the algorithm converges superlinearly and is computationally efficient for large-scale datasets. We apply the proposed methods, via a stratified procedure, to analyze the national kidney transplant data and study the impact of potential risk factors on post-transplant survival. Supplementary materials for this article are available online.
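A minimal quasi-Newton sketch using SciPy's BFGS on a smooth toy objective standing in for the negative log partial likelihood; the data, objective, and dimensions are invented, and the point is only the superlinearly convergent quasi-Newton fitting loop that avoids forming any expanded working dataset.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))      # toy covariates
y = rng.standard_normal(200)           # toy responses

def neg_loglik(beta):
    # Smooth stand-in objective; a real fit would use the partial likelihood.
    resid = y - X @ beta
    return 0.5 * np.sum(resid ** 2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")   # quasi-Newton
print(res.x)
```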

17.
Operations Research Letters, 2014, 42(6-7): 429-431
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.
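One member of the class analyzed, sketched generically: modified (optimistic) policy iteration alternates a greedy improvement step with m partial-evaluation sweeps under the fixed greedy policy; m and the stopping rule below are illustrative.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-10, max_iter=10000):
    """Optimistic policy iteration for a discounted MDP. P: (S, A, S), r: (S, A).
    m = 1 recovers value iteration; m = infinity recovers policy iteration."""
    S, A = r.shape
    V = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        pi = (r + gamma * P @ V).argmax(axis=1)    # greedy improvement step
        idx = np.arange(S)
        P_pi, r_pi = P[idx, pi], r[idx, pi]
        V_old = V.copy()
        for _ in range(m):                         # m optimistic evaluation sweeps
            V = r_pi + gamma * P_pi @ V
        if np.max(np.abs(V - V_old)) < tol:
            break
    return pi, V
```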

18.
In this paper, a family of fourth-order P-stable methods for solving second-order initial value problems is considered. When applied to a nonlinear differential system, all the methods in the family give rise to a nonlinear system which may be solved using a modified Newton method. The classical methods of this type involve at least three (new) function evaluations per iteration (that is, they are 3-stage methods) and most involve using complex arithmetic in factorising their iteration matrix. We derive methods which require only two (new) function evaluations per iteration and for which the iteration matrix is a true real perfect square. This implies that real arithmetic will be used and that at most one real matrix must be factorised at each step. We also consider various computational aspects such as local error estimation and a strategy for changing the step size.

19.
In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Different from the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over a set of all policies with a given expected reward. For continuous-time MDPs with the discounted criterion and finite-state and action spaces, we prove that the mean–variance optimization problem can be transformed to an equivalent discounted optimization problem using the conditional expectation and Markov properties. Then, we show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods with a finite number of iterations. We also address related issues such as a mutual fund theorem and illustrate our results with an example.

20.
The discretization of eigenvalue problems for partial differential operators is a major source of matrix eigenvalue problems having very large dimensions, where only some of the smallest eigenvalues, together with the eigenvectors, are to be determined. Preconditioned inverse iteration (a "matrix-free" method) derives from the well-known inverse iteration procedure in such a way that the associated system of linear equations is solved approximately by using a (multigrid) preconditioner. A new convergence analysis for preconditioned inverse iteration is presented. The preconditioner is assumed to satisfy some bound for the spectral radius of the error propagation matrix, resulting in a simple geometric setup. In this first part, the case of poorest convergence depending on the choice of the preconditioner is analyzed. In the second part, the dependence on all initial vectors having a fixed Rayleigh quotient is considered. The given theory provides sharp convergence estimates for the eigenvalue approximations, showing that multigrid eigenvalue/vector computations can be done with efficiency comparable to that of multigrid methods for boundary value problems.
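A sketch of the basic preconditioned inverse iteration update, with a Jacobi (diagonal) preconditioner standing in for the multigrid preconditioner assumed in the paper; note the convergence theory requires a bound on the preconditioner's error-propagation matrix, which a mere diagonal scaling need not satisfy well.

```python
import numpy as np

def pinvit(A, apply_prec, x0, n_iter=100):
    """Preconditioned inverse iteration for a symmetric A:
    x <- x - B^{-1}(A x - lam(x) x), with lam(x) the Rayleigh quotient and
    apply_prec applying B^{-1} (matrix-free in realistic settings)."""
    x = x0 / np.linalg.norm(x0)
    lam = x @ (A @ x)
    for _ in range(n_iter):
        lam = x @ (A @ x)                        # Rayleigh quotient (||x|| = 1)
        x = x - apply_prec(A @ x - lam * x)      # preconditioned residual correction
        x = x / np.linalg.norm(x)
    return lam, x

# usage sketch: a near-diagonal test matrix and a Jacobi preconditioner
A = np.diag(np.arange(1.0, 101.0))
A[0, 1] = A[1, 0] = 0.1
lam, x = pinvit(A, lambda res: res / np.diag(A), np.ones(100))
```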

