Similar literature
 19 similar documents found; search took 109 ms
1.
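This paper considers the risk probability criterion for a first passage model of discrete-time Markov decision processes with denumerable states. The criterion to be optimized is to minimize the risk probability that the system's first passage time to a target set of states does not exceed a given threshold. We first establish the optimality equation and prove that the optimal value function corresponds to a solution of the optimality equation. We then discuss some properties of optimal policies and give conditions for the existence of an optimal stationary policy. Finally, an example illustrates our results.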
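To make the criterion concrete, the following is a minimal sketch of the backward recursion for the minimal probability of reaching the target set within a given number of steps. It is not the paper's construction: it assumes a finite state space (the paper treats denumerable states), and the function name `min_first_passage_risk` and the array layout `P[a][i][j]` are hypothetical choices made for illustration.

```python
import numpy as np

def min_first_passage_risk(P, target, horizon):
    """Minimal probability of hitting `target` within `horizon` steps.

    P       -- array of shape (n_actions, n_states, n_states); P[a][i][j] is the
               probability of moving from i to j under action a (assumed layout)
    target  -- iterable of target state indices
    horizon -- threshold on the first passage time (number of steps)
    """
    P = np.asarray(P, dtype=float)
    n_states = P.shape[1]
    in_target = np.zeros(n_states, dtype=bool)
    in_target[list(target)] = True

    # With zero steps left, the risk is 1 inside the target set and 0 elsewhere.
    F = in_target.astype(float)
    for _ in range(horizon):
        # Q[a, i] = sum_j P[a, i, j] * F[j]: risk after taking action a in state i.
        Q = P @ F
        # Outside the target, pick the action with the smallest risk.
        F = np.where(in_target, 1.0, Q.min(axis=0))
    return F

# Toy model: states 0 and 1 are ordinary, state 2 is the (absorbing) target.
P_example = [
    [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 0
    [[0.7, 0.0, 0.3], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],  # action 1
]
risk = min_first_passage_risk(P_example, target={2}, horizon=2)
```

Each pass of the loop extends the threshold by one step, so the returned vector is indexed by the initial state.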

2.
董泽清 (Dong Zeqing), 《数学学报》 (Acta Mathematica Sinica), 1978, 21(2): 135-150
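We deal with discounted Markov decision programming (called Markov decision processes by some authors) in which the state space and the set of decisions available in each state are both countably infinite, with a family of substochastic transition laws and a bounded reward function. We give a successive approximation algorithm with accelerated convergence for finding an (ε-)optimal stationary policy. It converges to the (ε-)optimal solution faster than White's successive approximation algorithm, and it is combined with a test criterion for non-optimal policies, which makes the algorithm more efficient still. Let β be the discount factor. In general, β-optimal (or (ε, β)-optimal) stationary policies are often not unique, and there may even be as many of them as there are policies in the class of stationary policies. It is therefore natural to look, among the β-optimal (or (ε, β)-optimal) stationary policies, for one whose variance is (ε-)minimal uniformly with respect to the initial state. We prove that such a policy exists and give an algorithm for obtaining it.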
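For orientation, here is a minimal sketch of plain successive approximation (value iteration) for a discounted model with a standard ε-optimality stopping rule. It is the textbook baseline, not the accelerated algorithm of the paper, and it assumes finite state and action sets. The name `value_iteration` and the array layouts are hypothetical.

```python
import numpy as np

def value_iteration(P, r, beta, eps=1e-8, max_iter=10_000):
    """Plain successive approximation for a discounted decision model.

    P    -- shape (n_actions, n_states, n_states); rows may sum to less than 1
            (substochastic transition law)
    r    -- shape (n_actions, n_states); bounded one-step rewards
    beta -- discount factor, 0 < beta < 1
    Returns the approximate value vector and a greedy stationary policy.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    v = np.zeros(P.shape[1])
    # Standard rule: once successive iterates differ by at most this amount,
    # the greedy policy is eps-optimal.
    threshold = eps * (1.0 - beta) / (2.0 * beta)
    for _ in range(max_iter):
        q = r + beta * (P @ v)      # q[a, i]: value of action a in state i
        v_new = q.max(axis=0)
        converged = np.max(np.abs(v_new - v)) <= threshold
        v = v_new
        if converged:
            break
    policy = (r + beta * (P @ v)).argmax(axis=0)
    return v, policy

# One state with a self-loop and two actions paying 1 and 2 per step.
v_example, policy_example = value_iteration(
    P=[[[1.0]], [[1.0]]], r=[[1.0], [2.0]], beta=0.5
)
```

In the one-state example the better action pays 2 per step, so the value is 2 / (1 - 0.5) = 4.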

3.
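This paper studies constrained discounted semi-Markov decision programming, that is, the constrained optimization problem of maximizing the expected discounted reward subject to a constraint on an expected discounted cost. Assuming that the state set is countable and the action sets are nonempty compact Borel sets, we give a necessary and sufficient condition for a policy to be p-constrained optimal and prove that a p-constrained optimal policy exists under suitable assumptions.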

4.
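A linear weighting method is given for solving vector-valued semi-Markov decision programming. With this method it is also easy to prove that vector-valued semi-Markov decision programming admits a stationary optimal policy, and a further criterion is given for deciding whether a strongly optimal policy exists.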

5.
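This paper considers the average criterion for nonstationary Markov decision processes (MDPs) with a countable state space. First, we point out and correct errors in Park et al. [1] and Alden et al. [2] and, under conditions weaker than those of Park et al. [1], use a newly established optimality equation to prove the convergence of the optimal average value and the existence of an average optimal Markov policy. Second, we give a rolling horizon algorithm for ε(>0)-average optimal Markov policies.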

6.
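We study strong average optimality for Markov decision processes (MDPs) with a countable state space, an arbitrary action space, and costs that are not uniformly bounded. We give conditions under which every policy that is average optimal in the usual sense is also strong average optimal, and we substantially generalize the main results of Cavazos-Cadena and Fernandez-Gaucherand (Math. Meth. Oper. Res., 1996, 43: 281-300).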

7.
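For average criterion Markov decision processes with a countable state set, nonempty decision sets, and unbounded rewards, this paper proposes a new set of conditions under which an (ε-)optimal stationary policy exists and the optimality inequality also holds whenever the sum in it is well defined.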

8.
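This paper discusses the discounted model of Markov decision programming (MDP) with a countable state space, a countable decision space, a family of substochastic transition laws, and a bounded reward function, and gives a test criterion for non-ε-optimal policies.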

9.
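This paper uses a linear programming approach to study average Markov decision processes. It generalizes the results obtained by K. W. Ross in [4] and establishes the existence of optimal randomized stationary policies for Markov decision processes with multiple constraints, countably many states, and countably many actions in a sequentially compact space.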

10.
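Equipment repair and replacement model and optimal policy   Total citations: 6, self-citations: 0, citations by others: 6
Using a semi-Markov decision process (SMDP) with finitely many states and infinitely many available actions, this paper builds an equipment repair and replacement model that is reasonably close to practice. For an infinite time horizon with continuous discounting, we prove the existence of an optimal repair and replacement policy that maximizes the expected discounted net income of the equipment.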

11.
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, using a minimum nonnegative solution approach, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed and an example is given. Finally, it is shown that our model is an extension of the first passage models for both discrete-time and continuous-time Markov decision processes.
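As an illustration of value iteration started from zero, which increases monotonically toward the minimum nonnegative solution when costs are nonnegative, here is a minimal sketch on a finite model. It is an assumption-laden simplification rather than the paper's algorithm: the semi-Markov kernel is summarised by embedded transition probabilities `P`, an expected discount factor per transition `disc`, and an expected discounted sojourn cost `cost`. All of these names, and the function name, are hypothetical.

```python
import numpy as np

def first_passage_discounted_cost(P, cost, disc, target, tol=1e-10, max_iter=100_000):
    """Value iteration from V = 0 for the discounted cost up to first passage.

    P      -- shape (n_actions, n_states, n_states): embedded transition probabilities
    cost   -- shape (n_actions, n_states): nonnegative expected discounted cost
              accumulated during one sojourn (assumed to be precomputed)
    disc   -- shape (n_actions, n_states, n_states): expected discount factor
              applied across one transition (assumed to be precomputed)
    target -- iterable of target state indices; no cost accrues after arrival
    """
    P = np.asarray(P, dtype=float)
    cost = np.asarray(cost, dtype=float)
    disc = np.asarray(disc, dtype=float)
    n_states = P.shape[1]
    in_target = np.zeros(n_states, dtype=bool)
    in_target[list(target)] = True

    V = np.zeros(n_states)  # start from zero so the iterates increase monotonically
    for _ in range(max_iter):
        # Q[a, i] = cost[a, i] + sum_j P[a, i, j] * disc[a, i, j] * V[j]
        Q = cost + (P * disc) @ V
        V_new = np.where(in_target, 0.0, Q.min(axis=0))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Two states, one action: from state 0 stay or jump to the target with equal odds.
V_example = first_passage_discounted_cost(
    P=[[[0.5, 0.5], [0.0, 1.0]]],
    cost=[[1.0, 0.0]],
    disc=[[[0.8, 0.8], [0.8, 0.8]]],
    target={1},
)
```

In the toy model the fixed point solves V(0) = 1 + 0.8 · 0.5 · V(0), so V(0) = 1 / 0.6.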

12.
This paper investigates finite horizon semi-Markov decision processes with denumerable states. The optimality is over the class of all randomized history-dependent policies which include states and also planning horizons, and the cost rate function is assumed to be bounded below. Under suitable conditions, we show that the value function is a minimum nonnegative solution to the optimality equation and there exists an optimal policy. Moreover, we develop an effective algorithm for computing optimal policies, derive some properties of optimal policies, and in addition, illustrate our main results with a maintenance system.

13.
We consider an undiscounted semi-Markov decision process with a target set, and our main concern is the problem of minimizing a threshold probability. We formulate the problem as an infinite horizon case with a recurrent class. We show that the optimal value function is the unique solution to an optimality equation and that there exists a stationary optimal policy. Several value iteration methods and a policy improvement method are also given for our model. Furthermore, we investigate the relationship between threshold probabilities and expectations for total rewards.

14.
We provide weak sufficient conditions for a full-service policy to be optimal in a queueing control problem in which the service rate is a dynamic decision variable. In our model there are service costs and holding costs and the objective is to minimize the expected total discounted cost over an infinite horizon. We begin with a semi-Markov decision model for a single-server queue with exponentially distributed inter-arrival and service times. Then we present a general model with weak probabilistic assumptions and demonstrate that the full-service policy minimizes both finite-horizon and infinite-horizon total discounted cost on each sample path.

15.
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.

16.
In this paper, we analyse an optimal production, repair and replacement problem for a manufacturing system subject to random machine breakdowns. The system produces parts, and upon machine breakdown, either an imperfect repair is undertaken or the machine is replaced with a new identical one. The decision variables of the system are the production rate and the repair/replacement policy. The objective of the control problem is to find decision variables that minimize total incurred costs over an infinite planning horizon. Firstly, a hierarchical decision making approach, based on a semi-Markov decision model (SMDM), is used to determine the optimal repair and replacement policy. Secondly, the production rate is determined, given the obtained repair and replacement policy. Optimality conditions are given and numerical methods are used to solve them and to determine the control policy. We show that the number of parts to hold in inventory in order to hedge against breakdowns must be readjusted to a higher level as the number of breakdowns increases or as the machine ages. We go from the traditional policy with only one high threshold level to a policy with several threshold levels, which depend on the number of breakdowns. Numerical examples and sensitivity analyses are presented to illustrate the usefulness of the proposed approach.

17.
A system deteriorates stochastically over time. When the deterioration reaches a critical level, the system has to be either repaired or replaced by a new one. Repairs are generally both less effective and less costly than a replacement. By modelling the problem as a semi-Markov decision process, an explicit expression is found for the optimal maintenance policy for minimizing the long run average cost per unit time. Some numerical results are given for a multi-state system.

18.
This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions, under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.

19.
This paper studies the risk minimization problem in semi-Markov decision processes with denumerable states. The criterion to be optimized is the risk probability (or risk function) that a first passage time to some target set does not exceed a threshold value. We first characterize such risk functions and the corresponding optimal value function, and prove that the optimal value function satisfies the optimality equation by using a successive approximation technique. Then, we present some properties of optimal policies, and further give conditions for the existence of optimal policies. In addition, a value iteration algorithm and a policy improvement method are developed for obtaining the optimal value function and optimal policies, respectively. Finally, two examples are given to illustrate the value iteration procedure and the essential characterization of the risk function.
