Similar Documents
20 similar documents found.
1.
This paper considers the risk probability criterion for first passage models of countable-state discrete-time Markov decision processes. The optimization criterion is to minimize the risk probability that the time at which the system first reaches the target state set does not exceed a given threshold. We first establish the optimality equation and prove that the optimal value function corresponds to a solution of the optimality equation; we then discuss some properties of optimal policies and give conditions for the existence of an optimal stationary policy; finally, an example is given to illustrate our results.
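A minimal illustrative sketch of the corresponding dynamic-programming recursion on a finite truncation (the states, actions, transition kernel P, target set B, and threshold T below are all assumed toy data, not the paper's model): V[t, i] approximates the minimal probability of reaching B from state i within t steps.

```python
import numpy as np

# Illustrative toy instance (assumed data): 3 states, 2 actions, target set B = {2}.
n_states, n_actions = 3, 2
B = {2}
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, i, :] is a distribution

T = 10                                    # threshold on the first passage time
V = np.zeros((T + 1, n_states))           # V[t, i]: minimal risk probability with budget t
policy = np.zeros((T + 1, n_states), dtype=int)
for i in B:
    V[0, i] = 1.0                         # already in the target set at time 0

for t in range(1, T + 1):
    for i in range(n_states):
        if i in B:
            V[t, i] = 1.0
            continue
        # one-step backup: either the next state is in B, or the budget shrinks to t-1
        q = [sum(P[a, i, j] * (1.0 if j in B else V[t - 1, j]) for j in range(n_states))
             for a in range(n_actions)]
        policy[t, i] = int(np.argmin(q))
        V[t, i] = float(min(q))

print(V[T], policy[T])
```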

2.
This paper is concerned with the performance analysis of controlled semi-Markov systems in Borel state and action spaces. The performability of the system is defined as the probability that the system reaches a prescribed reward level during a first passage time to some target set. Under mild conditions, we develop a value iteration algorithm for computing the optimal value, and establish the existence of optimal policies with the maximal performability. Our main results are applied to a maintenance problem.

3.
We analyze the computation of optimal and approximately optimal policies for a discrete-time model of a single reservoir whose discharges generate hydroelectric power. Inflows in successive periods are random variables. Revenue from hydroelectric production is represented by a piecewise linear function. We use the special structure of optimal policies, together with piecewise affine approximations of the optimal return functions at each stage of dynamic programming, to decrease the computational effort by an order of magnitude compared with ordinary value iteration. The method is then used to obtain easily computable lower and upper bounds on the value function of an optimal policy, and a policy whose value function is between the bounds.

4.
This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions, under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.

5.
In this paper, we deal with two-person zero-sum stochastic games for discrete-time Markov processes. The optimality criterion to be studied is the discounted payoff criterion during a first passage time to some target set, where the discount factor is state-dependent. The state and action spaces are all Borel spaces, and the payoff functions are allowed to be unbounded. Under suitable conditions, we first establish the optimality equation. Then, using dynamic programming techniques, we obtain the existence of the value of the game and a pair of optimal stationary policies. Moreover, we present the exponential convergence of the value iteration and a ‘martingale characterization’ of a pair of optimal policies. Finally, we illustrate the applications of our main results with an inventory system.
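One common way to compute the value of such a game numerically is Shapley-style value iteration, solving a zero-sum matrix game at each state by linear programming. The sketch below is a finite-state simplification under assumed toy data (reward r, kernel P, state-dependent discount beta, and a target set B at which the payoff stops); it is not the paper's Borel-space construction.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A for the maximizing row player."""
    m, n = A.shape
    # variables: (x_1..x_m, v); maximize v  <=>  minimize -v
    c = np.zeros(m + 1); c[-1] = -1.0
    # constraints: v - sum_i x_i A[i, j] <= 0 for every column j
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[-1]

# Assumed toy data: S states, target set B, action sets of sizes (m, n),
# reward r[s, a, b], kernel P[s, a, b, s'], state-dependent discount beta[s].
S, m, n = 4, 2, 3
B = {3}
rng = np.random.default_rng(1)
r = rng.uniform(-1, 1, size=(S, m, n))
P = rng.dirichlet(np.ones(S), size=(S, m, n))
beta = rng.uniform(0.5, 0.9, size=S)

V = np.zeros(S)
for _ in range(200):                               # Shapley-style iteration
    V_new = np.zeros(S)
    for s in range(S):
        if s in B:
            continue                               # payoff stops at the target set
        cont = np.where([j in B for j in range(S)], 0.0, V)
        G = r[s] + beta[s] * np.einsum("abj,j->ab", P[s], cont)
        V_new[s] = matrix_game_value(G)
    V = V_new
print(V)
```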

6.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
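A minimal sketch of value iteration with a state-dependent discount factor on an assumed finite toy instance (the paper's setting allows unbounded rewards; everything below is illustrative):

```python
import numpy as np

# Assumed toy data: finite states/actions, reward r[i, a], kernel P[a, i, j],
# and a discount factor alpha(i) that depends on the current state.
n_states, n_actions = 4, 2
rng = np.random.default_rng(2)
r = rng.uniform(0, 1, size=(n_states, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
alpha = rng.uniform(0.7, 0.95, size=n_states)      # state-dependent discount

V = np.zeros(n_states)
for _ in range(1000):
    # T V(i) = max_a [ r(i, a) + alpha(i) * sum_j P(j | i, a) V(j) ]
    Q = r + alpha[:, None] * np.einsum("aij,j->ia", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
policy = Q.argmax(axis=1)
print(V, policy)
```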

7.
We present a specialized policy iteration method for the computation of optimal and approximately optimal policies for a discrete-time model of a single reservoir whose discharges generate hydroelectric power. The model is described in (Lamond et al., 1995) and (Drouin et al., 1996), where the special structure of optimal policies is given and an approximate value iteration method is presented, using piecewise affine approximations of the optimal return functions. Here, we present a finite method for computing an optimal policy in O(n^3) arithmetic operations, where n is the number of states in the associated Markov decision process, and a finite method for computing a lower bound on the optimal value function in O(m^2 n), where m is the number of nodes of the piecewise affine approximation.
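For reference, a hedged sketch of generic policy iteration on an assumed finite discounted MDP, where each policy-evaluation step is an n-by-n linear solve (the O(n^3) part); this is the textbook scheme, not the reservoir-specific method of the paper:

```python
import numpy as np

# Standard policy iteration on an assumed finite discounted MDP (illustrative only).
n, n_actions, gamma = 5, 3, 0.95
rng = np.random.default_rng(3)
r = rng.uniform(0, 1, size=(n, n_actions))
P = rng.dirichlet(np.ones(n), size=(n_actions, n))   # P[a, i, :]

policy = np.zeros(n, dtype=int)
while True:
    # policy evaluation: solve (I - gamma * P_pi) V = r_pi, an O(n^3) linear solve
    P_pi = P[policy, np.arange(n), :]
    r_pi = r[np.arange(n), policy]
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
    # policy improvement: greedy with respect to V
    Q = r + gamma * np.einsum("aij,j->ia", P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy, V)
```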

8.
Dynamic programming is the essential tool in dynamic economic analysis. Problems such as portfolio allocation for individuals and optimal growth of national economies are typical examples. Numerical methods typically approximate the value function and use value function iteration to compute the value function for the optimal policy. Polynomial approximations are natural choices for approximating value functions when we know that the true value function is smooth. However, numerical value function iteration with polynomial approximations is unstable because standard methods such as interpolation and least squares fitting do not preserve shape. We introduce shape-preserving approximation methods that stabilize value function iteration, and are generally faster than previous stable methods such as piecewise linear interpolation.
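A minimal sketch of shape-preserving fitted value iteration for a toy deterministic growth model, using SciPy's monotone PCHIP interpolant as the shape-preserving approximation (log utility, Cobb-Douglas production, and all parameters below are assumptions; this is not the authors' specific approximation scheme):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Assumed toy growth model: V(k) = max_{0 < c < k**alpha} log(c) + beta * V(k**alpha - c)
alpha, beta = 0.36, 0.95
grid = np.linspace(0.05, 10.0, 30)           # capital grid (interpolation nodes)
V = np.zeros_like(grid)

for _ in range(300):
    Vhat = PchipInterpolator(grid, V)        # monotone, shape-preserving fit
    V_new = np.empty_like(V)
    for i, k in enumerate(grid):
        y = k ** alpha                       # output available today
        c = np.linspace(1e-6, y - 1e-6, 200) # candidate consumption levels
        kp = np.clip(y - c, grid[0], grid[-1])
        V_new[i] = np.max(np.log(c) + beta * Vhat(kp))
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print(V)
```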

9.
This paper considers the nonstationary average-reward MDP model proposed by Hinderer, in which the state and action spaces are both general sets. Using the method of enlarging the state space, we establish the optimality equation for this model and give conditions under which the optimality equation has a solution and optimal policies exist. Starting from the optimality equation, the existence of an optimal policy is proved by probabilistic methods. Finally, a value iteration algorithm for this model and a proof of its convergence are provided, thereby extending the main results of Smith, Lasserre [3], Lerma [6], and others.

10.
In an optimization problem with equality constraints the optimal value function divides the state space into two parts. At a point where the objective function is less than the optimal value, a good iteration must increase the value of the objective function. Thus, a good iteration must be a balance between increasing or decreasing the objective function and decreasing a constraint violation function. This implies that at a point where the constraint violation function is large, we should construct noninferior solutions relative to points in a local search region. By definition, an accessory function is a linear combination of the objective function and a constraint violation function. We show that a way to construct an acceptable iteration, at a point where the constraint violation function is large, is to minimize an accessory function. We develop a two-phase method. In Phase I some constraints may not be approximately satisfied or the current point is not close to the solution. Iterations are generated by minimizing an accessory function. Once all the constraints are approximately satisfied, the initial values of the Lagrange multipliers are defined. A test with a merit function is used to determine whether or not the current point and the Lagrange multipliers are both close to the optimal solution. If not, Phase I is continued. Otherwise, Phase II is activated and the Newton method is used to compute the optimal solution and fast convergence is achieved.
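A hedged toy sketch of the two-phase idea on a tiny equality-constrained problem (the test problem, the accessory-function weights, and the tolerances are all assumptions; Phase II applies Newton's method to the KKT system):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem (assumed, for illustration): min f(x) subject to c(x) = 0
f  = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2
gf = lambda x: np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])
c  = lambda x: x[0] + x[1] - 1.0
gc = np.array([1.0, 1.0])                     # gradient of the (linear) constraint

# Phase I: minimize an accessory function, a weighted combination of the
# objective and a constraint-violation function.
w_obj, w_viol = 1.0, 10.0
accessory = lambda x: w_obj * f(x) + w_viol * abs(c(x))
x = minimize(accessory, np.zeros(2), method="Nelder-Mead").x

if abs(c(x)) < 1e-2:                          # constraints approximately satisfied
    # initialize the multiplier from the stationarity condition grad f + lam * grad c = 0
    lam = -gf(x) @ gc / (gc @ gc)
    # Phase II: Newton's method on the KKT system F(x, lam) = 0
    for _ in range(20):
        F = np.concatenate([gf(x) + lam * gc, [c(x)]])
        J = np.block([[2.0 * np.eye(2), gc.reshape(2, 1)],   # Hessian of the Lagrangian is 2I here
                      [gc.reshape(1, 2), np.zeros((1, 1))]])
        step = np.linalg.solve(J, -F)
        x, lam = x + step[:2], lam + step[2]
        if np.linalg.norm(F) < 1e-12:
            break
print(x, lam)   # expected to approach x* = (1, 0), lam* = 2
```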

11.
In this paper we consider a general optimal consumption-portfolio selection problem of an infinitely-lived agent whose consumption rate process is subject to subsistence constraints before retirement. That is, her consumption rate should be greater than or equal to some positive constant before retirement. We integrate three optimal decisions into one model: the optimal consumption, the optimal investment choice, and the optimal stopping problem in which the agent chooses her retirement time. We obtain the explicit forms of optimal policies using a martingale method and a variational inequality arising from the dual function of the optimal stopping problem. We treat the optimal retirement time as the first hitting time when her wealth exceeds a certain wealth level which will be determined by a free boundary value problem and duality approaches. We also derive closed forms of the optimal wealth processes before and after retirement. Some numerical examples are presented for the case of the constant relative risk aversion (CRRA) utility class.

12.
Policy iteration is a well-studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an “as is” execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost-to-go functions produced by this algorithm monotonically converges pointwise to the optimal cost-to-go function; the policies generated converge subsequentially to an optimal policy.

13.
In this paper, we use the variational iteration method (VIM) for optimal control problems. First, optimal control problems are transformed into the Hamilton–Jacobi–Bellman (HJB) equation, a nonlinear first-order hyperbolic partial differential equation. Then, the basic VIM is applied to construct a nonlinear optimal feedback control law. By this method, the control and state variables can be approximated as functions of time. Also, the numerical value of the performance index is obtained readily. In view of the convergence of the method, some illustrative examples are presented to show the efficiency and reliability of the method.
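For reference, the generic VIM correction functional (standard form, not specific to the HJB application of the paper): for an equation written as \mathcal{L}u + \mathcal{N}u = g(t), the successive approximations are

u_{n+1}(t) = u_n(t) + \int_0^t \lambda(\tau)\,\bigl(\mathcal{L}u_n(\tau) + \mathcal{N}\tilde{u}_n(\tau) - g(\tau)\bigr)\,d\tau,

where \lambda is a general Lagrange multiplier identified via variational theory and \tilde{u}_n denotes a restricted variation.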

14.
In this paper, an Envelope Theorem (ET) will be established for optimization problems on Euclidean spaces. In general, Envelope Theorems permit analyzing an optimization problem and giving the solution by means of differentiability techniques. The ET will be presented in two versions. One of them uses concavity assumptions, whereas the other one does not require such assumptions. Thereafter, the ET established will be applied to Markov Decision Processes (MDPs) on Euclidean spaces, discounted and with infinite horizon. As the first application, several examples (including some economic models) of discounted MDPs for which the ET allows one to determine the value iteration functions will be presented. This will permit obtaining the corresponding optimal value functions and the optimal policies. As the second application of the ET, it will be proved that, under differentiability conditions on the transition law, the reward function, and the noise of the system, the value function and the optimal policy of the problem are differentiable with respect to the state of the system. Besides, various examples to illustrate these differentiability conditions will be provided. This work was partially supported by Benemérita Universidad Autónoma de Puebla (BUAP) under grant VIEP-BUAP 38/EXC/06-G, by Consejo Nacional de Ciencia y Tecnología (CONACYT), and by Evaluation-orientation de la COopération Scientifique (ECOS) under grant CONACyT-ECOS M06-M01.
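For concreteness, the classical one-parameter form of the Envelope Theorem (a standard statement under smoothness and interiority assumptions; the paper's Euclidean-space versions are more general): if

V(\theta) = \max_{x \in X} f(x, \theta)

and x^{*}(\theta) is an interior, differentiable maximizer, then

V'(\theta) = \frac{\partial f}{\partial \theta}\bigl(x^{*}(\theta), \theta\bigr).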

15.
For the problem of finding a simple root of a nonlinear equation, a new Newton predictor-corrector scheme is proposed. By computing one additional function value and one additional first-derivative value per iteration, each iteration requires evaluating two function values and two first derivatives. Compared with the second-order convergence of the standard Newton method, the new algorithm achieves a higher convergence order of 2+\sqrt{6}. The new algorithm is tested on benchmark functions and compared with related algorithms, showing clear advantages in the number of iterations, computation time, and the optimal values obtained. Finally, the new scheme is extended to multidimensional vector-valued functions; its convergence is proved using Taylor's formula, and two two-dimensional numerical examples are given to verify the effectiveness of its convergence.
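The paper's 2+\sqrt{6}-order scheme is not reproduced here; as a generic illustration of a predictor-corrector Newton iteration that likewise uses two function values and two first derivatives per step, the classical two-step (fourth-order) double-Newton iteration can be sketched as follows.

```python
# Generic two-step Newton predictor-corrector sketch (illustrative only;
# this is the classical double-Newton step, not the paper's 2+sqrt(6)-order scheme).
def two_step_newton(f, df, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        y = x - f(x) / df(x)          # predictor: one Newton step
        x_new = y - f(y) / df(y)      # corrector: Newton step from the predicted point
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: simple root of f(x) = x**3 - 2*x - 5 near x0 = 2
root = two_step_newton(lambda x: x**3 - 2*x - 5, lambda x: 3*x**2 - 2, 2.0)
print(root)   # approximately 2.0945514815
```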

16.
This paper discusses optimal dividend payment policies in a discrete-time model with the objective of maximizing the expected cumulative dividends. Using Bellman's optimality principle, we derive the dynamic programming equations satisfied by the optimal value function and, with an example, give an algorithm for solving these equations.
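A minimal illustrative sketch of such a dividend dynamic-programming equation in an assumed discrete toy model (integer surplus, unit premium per period, a small claim distribution, discount factor beta, ruin absorbing with value zero); this is not the paper's exact model:

```python
import numpy as np

# Assumed toy dividend model: integer surplus, premium 1 per period,
# claim Z in {0, 1, 2} with probabilities p, discount beta, ruin is absorbing.
beta = 0.95
p = np.array([0.6, 0.3, 0.1])
X_max = 20

V = np.zeros(X_max + 1)
for _ in range(2000):
    V_new = np.zeros_like(V)
    pol = np.zeros(X_max + 1, dtype=int)
    for x in range(X_max + 1):
        best = -np.inf
        for d in range(x + 1):                         # dividend paid out of current surplus
            cont = sum(pz * V[min(x - d + 1 - z, X_max)]
                       for z, pz in enumerate(p) if x - d + 1 - z >= 0)
            val = d + beta * cont                      # dividend now plus discounted continuation
            if val > best:
                best, pol[x] = val, d
        V_new[x] = best
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print(V, pol)
```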

17.
Discounted semi-Markov decision processes with nonnegative costs
黄永辉  郭先平 《数学学报》2010,53(3):503-514
This paper considers discounted semi-Markov decision processes with a countable state space and nonnegative costs. First, a continuous-time semi-Markov decision process is constructed under a given semi-Markov decision kernel and policy. Then, using the minimal nonnegative solution approach, it is shown that the value function satisfies the optimality equation and that ε-optimal stationary policies exist; conditions for the existence of optimal policies and some of their properties are further given. Finally, a value iteration algorithm and a numerical example are provided.

18.
Continuous time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion function. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.

19.
We are concerned with Markov decision processes with Borel state and action spaces; the transition law and the reward function depend on an unknown parameter. In this framework, we study the recursive adaptive nonstationary value iteration policy, which is proved to be optimal under the same conditions usually imposed to obtain the optimality of other well-known nonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grants PCEXCNA-050156 and A128CCOEO550, and in part by the Third World Academy of Sciences under Grant TWAS RG MP 898-152.

20.
In this paper a new algorithm is provided for obtaining approximately optimal policies for infinite-horizon discounted Markov decision processes. In addition, some of the properties of the algorithm are established. The algorithm is based upon the fact that the optimal value function is the unique vector minimum function within the superharmonic set.
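That characterization (the optimal value function is the minimal element of the superharmonic set) underlies the classical linear-programming formulation of discounted MDPs; a small illustrative sketch with assumed finite toy data:

```python
import numpy as np
from scipy.optimize import linprog

# Assumed toy discounted MDP; the optimal value function is the (componentwise)
# minimal element of the superharmonic set, obtained here by linear programming.
n, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(4)
r = rng.uniform(0, 1, size=(n, n_actions))
P = rng.dirichlet(np.ones(n), size=(n_actions, n))        # P[a, i, :]

# superharmonic constraints: V(i) >= r(i,a) + gamma * sum_j P(j|i,a) V(j) for all (i, a)
A_ub, b_ub = [], []
for a in range(n_actions):
    for i in range(n):
        row = gamma * P[a, i].copy()
        row[i] -= 1.0                                      # gamma * P V - V(i) <= -r(i, a)
        A_ub.append(row)
        b_ub.append(-r[i, a])

res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n, method="highs")
V_star = res.x                                             # minimal superharmonic vector
print(V_star)
```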
