Similar Articles
20 similar articles retrieved.
1.
2.
3.
This paper discusses a class of non-time-homogeneous partially observable Markov decision models. Without changing the countability of the state space, the model is transformed into the generalized discounted model of [5], which settles its optimal policy problem; a finite-horizon approximation algorithm for the model is also obtained, and the states involved in this algorithm remain countable.

4.
We investigate the problem of minimizing the Average Value-at-Risk (AVaRτ) of the discounted cost generated by a Markov Decision Process (MDP) over both a finite and an infinite horizon. We show that this problem can be reduced to an ordinary MDP with an extended state space and give conditions under which an optimal policy exists. We also give a time-consistent interpretation of the AVaRτ. At the end we consider a numerical example, a simple repeated casino game, which is used to discuss the influence of the risk aversion parameter τ of the AVaRτ-criterion.
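A minimal numerical sketch of the AVaRτ criterion, using the Rockafellar–Uryasev representation and a hypothetical repeated betting game as a stand-in for the paper's casino example (function names, parameters, and the game itself are illustrative, not the paper's model):

```python
import numpy as np

def avar(costs, tau):
    """Average Value-at-Risk at level tau of sampled discounted costs,
    via the Rockafellar-Uryasev representation:
        AVaR_tau(X) = min_s { s + E[(X - s)^+] / (1 - tau) },
    whose minimizer s* is the tau-quantile (Value-at-Risk)."""
    s = np.quantile(costs, tau)  # empirical VaR at level tau
    return s + np.mean(np.maximum(costs - s, 0.0)) / (1.0 - tau)

# Hypothetical simulator: one discounted-cost sample of a simple
# repeated betting game (illustrative stand-in, not the paper's data).
def simulate_discounted_cost(rng, horizon=50, beta=0.95, p_win=18/37):
    discount, total = 1.0, 0.0
    for _ in range(horizon):
        total += discount * (-1.0 if rng.random() < p_win else 1.0)
        discount *= beta
    return total

rng = np.random.default_rng(0)
samples = np.array([simulate_discounted_cost(rng) for _ in range(100_000)])
for tau in (0.5, 0.9, 0.99):
    print(f"tau={tau}: AVaR={avar(samples, tau):.3f}")
```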

5.
A model for switching between gilt-edged securities is developed using a modified version of Howard's Markov decision algorithm. The model makes use of empirical observations of the behaviour of relative price movements. It produced some interesting results in the theory of Markov decision processes, and empirical tests of methods of implementation, which allow for constraints not included in the formal model, showed very promising results.
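Howard's Markov decision algorithm is classical policy iteration; a minimal discounted-reward sketch follows (a generic version, not the paper's modified gilt-switching variant):

```python
import numpy as np

def policy_iteration(P, r, beta=0.95):
    """Howard-style policy iteration for a finite discounted MDP.
    P[a] is the S x S transition matrix under action a; r[a] its reward vector."""
    nA, nS = len(P), P[0].shape[0]
    policy = np.zeros(nS, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(nS)])
        r_pi = np.array([r[policy[s]][s] for s in range(nS)])
        v = np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = np.array([r[a] + beta * P[a] @ v for a in range(nA)])  # nA x nS
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```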

6.
Comparisons of the performance of solution algorithms for Markov decision processes rely heavily on problem generators to provide sizeable sets of test problems. Existing generation techniques allow little control over the properties of the test problems and often result in problems which are not typical of real-world examples. This paper identifies the properties of Markov decision processes which affect the performance of solution algorithms, and also describes a new problem generation technique which allows all of these properties to be controlled.
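For illustration, a generic random-MDP test-problem generator in the spirit the abstract describes, with branching factor and reward sparsity as controllable properties (a "Garnet"-style sketch; the paper's own technique and property set are not reproduced here):

```python
import numpy as np

def generate_mdp(n_states, n_actions, branching, reward_sparsity=0.2, seed=None):
    """Random MDP generator with controllable structure (illustrative sketch).
    'branching' bounds how many successor states each (s, a) pair can reach."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_actions, n_states, n_states))
    for a in range(n_actions):
        for s in range(n_states):
            successors = rng.choice(n_states, size=branching, replace=False)
            P[a, s, successors] = rng.dirichlet(np.ones(branching))
    # Sparse rewards: only a fraction of (s, a) pairs pay anything.
    R = rng.normal(size=(n_actions, n_states))
    R *= rng.random((n_actions, n_states)) < reward_sparsity
    return P, R
```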

7.
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space, (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem.
This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.
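The bound on randomizations is naturally read through the classical occupation-measure linear program for constrained discounted MDPs, sketched below for a finite model (this is the standard LP formulation underlying such results, not the paper's countable-space construction; all names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def constrained_mdp_lp(P, c, d, budgets, mu, beta=0.95):
    """Occupation-measure LP for a finite discounted constrained MDP.
    P[a]: S x S transitions; c[a]: cost vector; d[k][a]: k-th constraint cost.
    Minimize expected discounted c subject to expected discounted d_k <= budgets[k]."""
    nA, nS = len(P), P[0].shape[0]
    # Variable rho[s, a], flattened with index s * nA + a.
    c_vec = np.array([[c[a][s] for a in range(nA)] for s in range(nS)]).ravel()
    # Flow constraints: sum_a rho(j,a) - beta * sum_{s,a} P[a][s,j] rho(s,a) = mu[j].
    A_eq = np.zeros((nS, nS * nA))
    for j in range(nS):
        for s in range(nS):
            for a in range(nA):
                A_eq[j, s * nA + a] = (j == s) - beta * P[a][s, j]
    A_ub = np.array([[d[k][a][s] for s in range(nS) for a in range(nA)]
                     for k in range(len(d))])
    res = linprog(c_vec, A_ub=A_ub, b_ub=budgets, A_eq=A_eq, b_eq=mu,
                  bounds=(0, None))
    if not res.success:
        raise ValueError("constrained MDP is infeasible")
    rho = res.x.reshape(nS, nA)
    # Stationary (possibly randomized) policy induced by the occupation measure.
    policy = rho / np.maximum(rho.sum(axis=1, keepdims=True), 1e-12)
    return policy, res.fun
```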

8.
This paper considers the expected total reward criterion for non-homogeneous Markov decision processes with a countable state space and finite action spaces. Unlike previous work, we transform the non-homogeneous Markov decision process into a homogeneous one by enlarging the state space, thereby obtaining, in a very concise way, the main results previously derived by traditional methods.
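A minimal sketch of the state-enlargement idea: augmenting the state with the time index makes time-dependent dynamics homogeneous (illustrative code, not from the paper):

```python
# A non-homogeneous MDP with time-indexed transitions P_t and rewards r_t
# becomes homogeneous on the enlarged state (t, x); time advances
# deterministically with each transition. Names here are illustrative.
def make_homogeneous(P_t, r_t, horizon):
    """P_t(t, x, a, y) and r_t(t, x, a) are time-dependent; the returned
    kernel and reward depend only on the enlarged state (t, x)."""
    def P(state, a, next_state):
        t, x = state
        t2, y = next_state
        return P_t(t, x, a, y) if t2 == t + 1 and t < horizon else 0.0
    def r(state, a):
        t, x = state
        return r_t(t, x, a)
    return P, r
```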

9.
Xu Chen. 数学研究 (Journal of Mathematical Study), 1998, 31(3): 312-318
This paper discusses the average criterion for continuous-time Markov decision processes in a semi-Markov environment. We first discuss an approximation problem for semi-Markov reward processes, and then turn to the approximation of the average objective function.

10.
11.
The paper describes a bound on the optimal value of a Markov Decision Process using the iterates of the value iteration algorithm. Previous bounds have depended upon the values of the last two iterations, but here we consider a bound which depends on the last three iterates. We show that in a maximisation problem, this leads to a better lower bound on the optimal value.
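For context, value iteration with the classical two-iterate span bounds is sketched below; the paper's contribution is a sharper lower bound using the last three iterates, which is not reproduced here:

```python
import numpy as np

def value_iteration_with_bounds(P, r, beta=0.95, iters=100):
    """Value iteration with the classical two-iterate bounds
        V_n + (beta/(1-beta)) * min(V_n - V_{n-1})  <=  V*
        V* <= V_n + (beta/(1-beta)) * max(V_n - V_{n-1})."""
    nA, nS = len(P), P[0].shape[0]
    v = np.zeros(nS)
    for _ in range(iters):
        v_prev = v
        v = np.max([r[a] + beta * P[a] @ v_prev for a in range(nA)], axis=0)
        diff = v - v_prev
        lo = v + beta / (1 - beta) * diff.min()  # lower bound on V*
        hi = v + beta / (1 - beta) * diff.max()  # upper bound on V*
    return v, lo, hi
```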

12.
This paper considers the application of a variable neighborhood search (VNS) algorithm for finite-horizon (H stages) Markov Decision Processes (MDPs), for the purpose of alleviating the “curse of dimensionality” phenomenon in searching for the global optimum. The main idea behind the VNSMDP algorithm is that, based on the result of the stage just considered, the search for the optimal solution (action) of state x in stage t is conducted systematically in variable neighborhood sets of the current action. Thus, the VNSMDP algorithm is capable of searching for the optimum within some subsets of the action space, rather than over the whole action set. Analyses of the complexity and convergence properties of the VNSMDP algorithm are conducted in the paper. It is shown by theoretical and computational analysis that the VNSMDP algorithm succeeds in finding the global optimum in an efficient way.
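A generic variable neighborhood search skeleton over a discrete action set, illustrating the idea of searching nested neighborhoods of the incumbent action rather than the whole action space (the neighborhood structure and acceptance rule below are assumptions, not the paper's specification):

```python
import random

def vns_action_search(q_value, a0, neighborhoods, max_iter=50):
    """Generic VNS over a large discrete action set.
    'neighborhoods[k](a)' returns the k-th (progressively larger)
    neighborhood of action a; 'q_value(a)' scores an action."""
    best = a0
    for _ in range(max_iter):
        k, improved = 0, False
        while k < len(neighborhoods):
            candidate = random.choice(list(neighborhoods[k](best)))
            if q_value(candidate) > q_value(best):
                best, k, improved = candidate, 0, True  # restart from k = 0
            else:
                k += 1  # widen the neighborhood
        if not improved:
            break
    return best
```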

13.
This paper considers the risk probability criterion for first-passage target models of discrete-time Markov decision processes with a countable state space. The optimization criterion is to minimize the risk probability that the time for the system to first reach the target state set does not exceed a given threshold. We first establish the optimality equation and prove that the optimal value function corresponds to the solutions of the optimality equation; we then discuss some properties of optimal policies, further give conditions for the existence of an optimal stationary policy, and finally illustrate our results with an example.
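A finite-state sketch of the underlying recursion: backward induction for the optimal probability that the first passage time to the target set is within a given threshold (illustrative; the paper treats a countable state space and proves existence results rather than stating this algorithm):

```python
import numpy as np

def first_passage_prob(P, target, horizon):
    """Optimal probability of reaching 'target' within 'horizon' steps,
    by backward induction over actions.
    P[a]: S x S transition matrices; target: boolean mask of target states."""
    nA = len(P)
    u = target.astype(float)                          # u_0(x) = 1{x in target}
    for _ in range(horizon):
        q = np.array([P[a] @ u for a in range(nA)])   # nA x nS
        u_new = q.max(axis=0)
        u_new[target] = 1.0                           # absorbed once target is hit
        u = u_new
    return u
```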

14.
In this paper, we show that a discounted continuous-time Markov decision process in Borel spaces with randomized history-dependent policies, arbitrarily unbounded transition rates and a non-negative reward rate is equivalent to a discrete-time Markov decision process. Based on a completely new proof, which does not involve Kolmogorov's forward equation, it is shown that the value function for both models is given by the minimal non-negative solution to the same Bellman equation. A verifiable necessary and sufficient condition for the finiteness of this value function is given, which induces a new condition for the non-explosion of the underlying controlled process.

15.
We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are determined by the given transition rates which are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. By using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy which is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or at least to approximate) the discounted reward optimal value function as well as a discounted optimal stationary policy. Finally, we use an example to illustrate our results. In particular, we first derive an explicit and exact solution to the DROE and an explicit expression of a discounted optimal stationary policy for such an example.
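For orientation, a form of the DROE commonly stated for this setting, with a state-dependent discount factor α(x) and transition rates q (the paper's exact notation and conditions may differ):

```latex
% A commonly stated form of the discounted reward optimality equation (DROE)
% for continuous-time MDPs with a state-dependent discount factor alpha(x);
% this is an orienting sketch, not the paper's verbatim statement.
\[
  \sup_{a \in A(x)} \Big\{ r(x,a) - \alpha(x)\,V^{*}(x)
      + \int_{S} V^{*}(y)\, q(\mathrm{d}y \mid x, a) \Big\} = 0,
  \qquad x \in S.
\]
```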

16.
The use of Markov Decision Processes for Inspection, Maintenance and Rehabilitation of civil engineering structures relies on the use of several transition matrices related to the stochastic degradation process, maintenance actions and imperfect inspections. Point estimators for these matrices are usually used, evaluated using statistical inference methods and/or expert evaluation methods. Thus, considerable epistemic uncertainty often veils the true values of these matrices. Our contribution through this paper is threefold. First, we present a methodology for incorporating epistemic uncertainties in dynamic programming algorithms used to solve finite horizon Markov Decision Processes (which may be partially observable). Second, we propose a methodology based on the use of Dirichlet distributions which, in our view, resolves much of the controversy found in the literature about estimating Markov transition matrices. Third, we show how the complexity resulting from the use of Monte-Carlo simulations for the transition matrices can be greatly overcome in the framework of dynamic programming. The proposed model is applied to a concrete bridge under degradation, in order to provide the optimal strategy for inspection and maintenance. The influence of epistemic uncertainties on the optimal solution is underlined through a sensitivity analysis of the input data.
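A minimal Monte-Carlo sketch of the Dirichlet treatment: each row of a transition matrix gets a Dirichlet distribution (here with prior-plus-count parameters, one natural reading), matrices are sampled, and a finite-horizon cost recursion is evaluated per draw. The paper's full model with maintenance actions and imperfect inspections is not reproduced:

```python
import numpy as np

def sample_transition_matrix(counts, rng):
    """Draw one transition matrix by sampling each row from a Dirichlet
    distribution whose parameters are (prior + observed) transition counts."""
    return np.array([rng.dirichlet(row) for row in counts])

def mc_value_distribution(counts, cost, horizon, n_draws=1000, seed=0):
    """Monte-Carlo backward induction: propagate epistemic uncertainty in a
    (single-action, for brevity) transition matrix through a finite-horizon
    expected-cost recursion, returning one value sample per matrix draw."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_draws):
        P = sample_transition_matrix(counts, rng)
        v = np.zeros(len(cost))
        for _ in range(horizon):
            v = cost + P @ v                 # expected cumulative cost
        values.append(v)
    return np.array(values)                  # n_draws x S value samples
```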

17.
18.
We provide a bound for the variation of the function that assigns to every competitive Markov decision process and every discount factor its discounted value. This bound implies that the undiscounted value of a competitive Markov decision process is continuous in the relative interior of the space of transition rules.

19.
A Markov Decision Process Approach to the Static Stability of Discrete Event Systems
This paper uses a Markov decision process approach to study the static stability of discrete event systems (DES), including the computation of strong and weak attraction domains, and also discusses the computation of stabilizing controllers for the weak attraction domain. The method does not require Σ- (or Σu-) invariance of the given predicates or the detection of loops.

20.

This article deals with the limiting average variance criterion for discrete-time Markov decision processes in Borel spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions under which we prove the existence of a variance-minimal policy in the class of average expected cost optimal stationary policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of a variance-minimal policy are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used for the first time to study the limiting average variance criterion. Also, the optimality inequality approach provided here is different from the “optimality equation approach” widely used in the previous literature. Finally, we use a controlled queueing system to illustrate our results.
