Similar literature
1.
Decision makers often need a performance guarantee that holds with sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state space. We give an example for the reliability of the satellite sy

2.
This paper studies the convergence of optimal values and optimal policies of continuous-time Markov decision processes (CTMDPs) under the constrained average criteria. For a given original CTMDP model M_∞ with denumerable states and a sequence {M_n} of CTMDPs with finite states, we give a new convergence condition ensuring that the optimal values and optimal policies of {M_n} converge to the optimal value and optimal policy of M_∞, respectively, as the state space S_n of M_n converges to the state space S_∞ of M_∞. The transition rates and cost/reward functions of M_∞ are allowed to be unbounded. Our approach can be viewed as a combination of linear programming and Lagrange multipliers.

3.
This paper considers the dividend optimization problem for an insurance company, taking into account internal competition between different units inside the company. The objective is to find a reinsurance policy and a dividend payment scheme that maximize the expected discounted value of the dividend payments and the expected present value of the amount the insurer earns until the time of ruin. By solving the corresponding constrained Hamilton-Jacobi-Bellman (HJB) equation, we obtain the value function and the optimal reinsurance policy and dividend payment.

4.
This work develops asymptotically optimal dividend policies to maximize the expected present value of dividends until ruin. Compound Poisson processes with regime switching are used to model the surplus, and the switching (a continuous-time controlled Markov chain) represents random environment and other economic conditions. Assuming the switching to be fast varying together with suitable conditions, it is shown that the system has a limit that is an average with respect to the invariant measure of a related Markov chain. Under simple conditions, the optimal policy of the limit dividend strategy is a threshold policy. Using the optimal policy of the limit system as a guide, feedback control for the original surplus is then developed. It is demonstrated that the constructed dividend policy is asymptotically optimal.

5.
Consider the optimal dividend problem for an insurance company whose uncontrolled surplus process evolves as a spectrally negative Lévy process. We assume that dividends are paid to the shareholders according to admissible strategies whose dividend rate is bounded by a constant. The objective is to find a dividend policy that maximizes the expected discounted value of dividends paid to the shareholders until the company is ruined. In this paper, we show that a threshold strategy (also called refraction strategy) forms an optimal strategy under the condition that the Lévy measure has a completely monotone density.

6.
In this paper, we discuss Markovian decision programming with recursive vector-reward and give an algorithm to find optimal policies. We prove that: (1) there is a Markovian optimal policy for the nonstationary case; (2) there is a stationary optimal policy for the stationary case.

7.
We consider the compound binomial model, and assume that dividends are paid to the shareholders according to an admissible strategy with dividend rates bounded by a constant. The company controls the amount of dividends in order to maximize the cumulative expected discounted dividends prior to ruin. We show that the optimal value function is the unique solution of a discrete HJB equation. Moreover, we obtain some properties of the optimal payment strategy, and offer a simple algorithm for obtaining the optimal strategy. The key to our method is to transform the value function. Numerical examples are presented to illustrate the transformation method.

8.
We consider spectrally negative Lévy processes and determine the joint laws of quantities such as the first and last passage times over a fixed level, the overshoots and undershoots at first passage, the minimum, the maximum, and the duration of negative values. We apply our results to insurance risk theory to find an explicit expression for the generalized expected discounted penalty function in terms of scale functions. Furthermore, a new expression for the generalized Dickson's formula is provided.

9.
This paper considers a model of an insurance company which is allowed to invest in a risky asset and to purchase proportional reinsurance. The objective is to find the policy which maximizes the expected total discounted dividend pay-out until the time of bankruptcy and the terminal value of the company under a liquidity constraint. We find the solution of this problem by solving the problem with zero terminal value. We also analyze the influence of the terminal value on the optimal policy.

10.
We determine replenishment and sales decisions jointly for an inventory system with random demand, lost sales and random yield. Demands in consecutive periods are independent random variables and their distributions are known. We incorporate discretionary sales, in which inventory may be set aside to satisfy future demand even if some present demand is lost. Our objective is to minimize the total discounted cost over the problem horizon by choosing an optimal replenishment and discretionary sales policy. We obtain the structure of the optimal replenishment and discretionary sales policy and show that the optimal policy for the finite-horizon problem converges to that of the infinite-horizon problem. Moreover, we compare the optimal policy under random yield with that under certain yield, and show that the optimal order quantity (sales quantity) under random yield is more (less) than that under certain yield.

11.
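Discounted semi-Markov decision processes with nonnegative costs   Cited by: 1 (self-citations: 0, citations by others: 1)
黄永辉 (Huang Yonghui), 郭先平 (Guo Xianping). 《数学学报》 (Acta Mathematica Sinica), 2010, 53(3): 503-514
This paper considers discounted semi-Markov decision processes with countable states and nonnegative costs. First, given a semi-Markov decision kernel and a policy, we construct a continuous-time semi-Markov decision process. Then, using the minimal nonnegative solution method, we prove that the value function satisfies the optimality equation and that ε-optimal stationary policies exist, and we further give conditions for the existence of optimal policies together with some of their properties. Finally, a value iteration algorithm and a numerical example are given.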

12.
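This paper studies the constrained discounted semi-Markov decision programming problem, that is, the constrained optimization problem of maximizing the expected discounted reward subject to a constraint on an expected discounted cost. Assuming that the state set is countable and that the action sets are nonempty compact Borel sets, we give a necessary and sufficient condition for a policy to be p-constrained optimal, and prove that a p-constrained optimal policy must exist under suitable assumptions.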

13.
We provide weak sufficient conditions for a full-service policy to be optimal in a queueing control problem in which the service rate is a dynamic decision variable. In our model there are service costs and holding costs and the objective is to minimize the expected total discounted cost over an infinite horizon. We begin with a semi-Markov decision model for a single-server queue with exponentially distributed inter-arrival and service times. Then we present a general model with weak probabilistic assumptions and demonstrate that the full-service policy minimizes both finite-horizon and infinite-horizon total discounted cost on each sample path.

14.
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.

15.
This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.

16.
This paper studies the risk minimization problem in semi-Markov decision processes with denumerable states. The criterion to be optimized is the risk probability (or risk function) that a first passage time to some target set does not exceed a threshold value. We first characterize such risk functions and the corresponding optimal value function, and prove that the optimal value function satisfies the optimality equation by using a successive approximation technique. Then, we present some properties of optimal policies, and further give conditions for the existence of optimal policies. In addition, a value iteration algorithm and a policy improvement method are developed for obtaining the optimal value function and optimal policies, respectively. Finally, two examples are given to illustrate the value iteration procedure and the essential characterization of the risk function.

17.
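This paper discusses discounted semi-Markov decision processes with discrete-type shocks. After setting up the model, we transform it into an equivalent discrete-time Markov decision process.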

18.
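This paper considers the mean-variance optimization problem for the discounted model of continuous-time Markov decision processes. The state space and the action space are both assumed to be Polish spaces, and the transition rates and reward rate functions are both unbounded. The optimization goal is to select, within the class of discount-optimal stationary policies, a policy with minimal variance. We seek conditions for the existence of mean-variance optimal policies for Markov decision processes in Polish spaces. Using a first-entrance decomposition method, we prove that the mean-variance optimization problem can be transformed into an "equivalent" expected discounted optimization problem, and thereby obtain the "optimality equation" for the mean-variance optimization problem, the existence of mean-variance optimal policies, and their characterization. Finally, we give several examples illustrating the non-uniqueness of discount-optimal policies and the existence of mean-variance optimal policies.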

19.
《Optimization》2012,61(4):339-353
In this article we consider the approximate solution for semi-Markov decision problems with infinite horizon, countable state space, discounted cost function and finite action space. We present converging sequences of lower and upper bounds for the value function and, moreover, we derive a method for exclusion of suboptimal actions.

20.
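章云 (Zhang Yun) et al. studied a class of time-inhomogeneous vector-valued Markov decision models whose reward functions are relatively bounded in absolute mean, obtained a sufficient condition for the existence of an optimal policy, and discussed the relation between strong optimality and optimality. 张升 (Zhang Sheng) et al. derived several properties of this model.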
