Similar Documents
20 similar documents found (search time: 656 ms)
1.
A Markov-type hybrid process with discrete parameters is constructed from credibilistic kernels and stochastic kernels. To evaluate a hybrid reward process, a discounted total expected value is defined, which is characterized as a fixed point of the corresponding operator. Illustrative examples are also given.

2.
In this paper we suggest a new successive approximation method to compute the optimal discounted reward for finite-state, finite-action, discrete-time discounted Markov decision chains. The method is based on a block partitioning of the (stochastic) matrices corresponding to the stationary policies, and it is particularly attractive when the transition matrices are jointly nearly decomposable or nearly completely decomposable.
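The successive approximation idea underlying this method can be illustrated with a plain value-iteration sketch for a finite discounted MDP; note this is a minimal baseline, not the block-partitioned acceleration the abstract describes, and the transition and reward arrays below are made-up illustrative data.

```python
import numpy as np

def successive_approximation(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Approximate the optimal discounted value of a finite MDP by
    repeated application of the Bellman operator (value iteration).

    P: (A, S, S) transition matrices, one stochastic matrix per action.
    R: (A, S) immediate rewards, gamma: discount factor in [0, 1)."""
    v = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # One-step lookahead value of each action in each state: (A, S)
        q = R + gamma * (P @ v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, q.argmax(axis=0)

# Tiny two-state, two-action example (hypothetical data)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.5, 0.5]])
v, policy = successive_approximation(P, R, gamma=0.9)
```

Because the Bellman operator is a gamma-contraction, the iterates converge geometrically to the unique fixed point regardless of the starting vector.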

3.
We consider a discrete-time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered a decomposition of limiting-average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs, and we construct hierarchical decomposition algorithms for both.

4.
5.
We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are determined by the given transition rates, which are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. By using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy which is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or at least to approximate) the discounted reward optimal value function as well as a discounted optimal stationary policy. Finally, we use an example to illustrate our results. Specifically, we first derive an explicit and exact solution to the DROE and an explicit expression of a discounted optimal stationary policy for this example.

6.
In this paper, we show that a discounted continuous-time Markov decision process in Borel spaces with randomized history-dependent policies, arbitrarily unbounded transition rates and a non-negative reward rate is equivalent to a discrete-time Markov decision process. Based on a completely new proof, which does not involve Kolmogorov's forward equation, it is shown that the value function for both models is given by the minimal non-negative solution to the same Bellman equation. A verifiable necessary and sufficient condition for the finiteness of this value function is given, which induces a new condition for the non-explosion of the underlying controlled process.

7.
In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), there is always available a Markov strategy which uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property. Examples are given to show that approximations by discounted or finite-horizon payoffs are not useful for the general average reward problem.

8.
We propose a new approach to accelerate the convergence of the modified policy iteration method for Markov decision processes with the total expected discounted reward. In the new policy iteration, an additional operator is applied to the iterate generated by the Markov operator, resulting in a larger improvement in each iteration.
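For context, the baseline scheme being accelerated can be sketched as standard modified policy iteration in the Puterman–Shin style: each iteration performs one improvement step followed by a few partial-evaluation sweeps of the fixed-policy operator. The additional accelerating operator from the abstract is not reproduced here, and the data below is hypothetical.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m=5, tol=1e-8, max_iter=1000):
    """Baseline modified policy iteration for a finite discounted MDP.

    P: (A, S, S) transition matrices, R: (A, S) rewards, gamma in [0, 1).
    m: number of partial policy-evaluation sweeps per iteration."""
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Improvement step: one application of the Bellman operator
        q = R + gamma * (P @ v)
        policy = q.argmax(axis=0)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, policy
        v = v_new
        # Partial evaluation: m sweeps of the fixed-policy operator T_pi
        P_pi = P[policy, np.arange(S)]   # (S, S) rows chosen by the policy
        R_pi = R[policy, np.arange(S)]   # (S,)
        for _ in range(m):
            v = R_pi + gamma * (P_pi @ v)
    return v, policy

# Tiny two-state, two-action example (hypothetical data)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.2, 0.8]])
v, policy = modified_policy_iteration(P, R, gamma=0.9)
```

With m = 0 this reduces to value iteration, and with m → ∞ it approaches classical policy iteration; intermediate m often trades off cheap sweeps against faster policy improvement.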

9.
We study an optimal maintenance policy for the server in a queueing system. Customers arrive at the server in a Poisson stream and are served by an exponential server, which is subject to multiple states indicating levels of popularity. The server state transitions are governed by a Markov process. The arrival rate depends on the server state and decreases as the server loses popularity. Upon maintenance the server state recovers completely, though the customers in the system are lost at the beginning of maintenance; customers who arrive during maintenance are also lost. Two kinds of such systems are considered. The first system receives a unit reward when a customer arrives and pays a unit cost for each customer lost at the start of maintenance. The second system receives a unit reward at departure and pays nothing for customers lost at the beginning of maintenance. Our objective is to maximize the total expected discounted profit over an infinite time horizon. We use a semi-Markov decision process to formulate the problem and establish some properties of the optimal maintenance policy under certain conditions.

10.
This paper considers the mean-variance optimization problem for the discounted model of continuous-time Markov decision processes. Both the state space and the action space are assumed to be Polish spaces, and the transition rates and reward rates may be unbounded. The optimization goal is to select, within the class of discounted optimal stationary policies, a policy with minimal variance. The paper seeks conditions under which a mean-variance optimal policy exists for Markov decision processes in Polish spaces. Using a first-passage decomposition, it is shown that the mean-variance optimization problem can be transformed into an "equivalent" expected discounted optimization problem, from which an "optimality equation" for the mean-variance problem, the existence of a mean-variance optimal policy, and its characterization are obtained. Finally, several examples illustrate the non-uniqueness of discounted optimal policies and the existence of a mean-variance optimal policy.

11.
This paper studies constrained discounted semi-Markov decision processes, i.e., the constrained optimization problem of maximizing the discounted expected reward subject to a constraint on the discounted expected cost. Assuming a countable state space and compact nonempty Borel action sets, we give a necessary and sufficient condition for p-constrained optimal policies and prove that, under appropriate assumptions, a p-constrained optimal policy always exists.

12.

In this article, we study continuous-time Markov decision processes in Polish spaces. The optimality criterion to be maximized is the expected discounted criterion. The transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. We provide conditions on the controlled system's primitive data under which we prove that the transition functions of possibly non-homogeneous continuous-time Markov processes are regular by using Feller's construction approach to such transition functions. Then, under continuity and compactness conditions we prove the existence of optimal stationary policies by using the technique of extended infinitesimal operators associated with the transition functions of possibly non-homogeneous continuous-time Markov processes, and also provide a recursive way to compute (or at least to approximate) the optimal reward values. The conditions provided in this paper are different from those used in the previous literature, and they are illustrated with an example.

13.
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to the discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.

14.
After an introduction to sensitive criteria in Markov decision processes and a discussion of definitions, we prove the existence of stationary Blackwell optimal policies under the following main assumptions: (i) the state space is a Borel space; (ii) the action space is countable and the action sets are finite; (iii) the transition function is given by a transition density; (iv) a simultaneous Doeblin-type recurrence condition holds. The proof is based on an aggregation of randomized stationary policies into measures. The topology on the space of those measures is simultaneously a weak and a strong one, and this fact yields compactness of the space and continuity of the Laurent coefficients of the expected discounted reward. Another important tool is a lexicographical policy improvement. The exposition is mostly self-contained.

15.
16.
Zhang Yun et al. studied a class of non-homogeneous vector-valued Markov decision models whose reward functions are absolutely-mean relatively bounded, obtained sufficient conditions for the existence of an optimal policy, and discussed the relationship between strong optimality and optimality; Zhang Sheng et al. derived several properties of this model.

17.
Several concepts of fuzzy random variables and their expected values have appeared in the literature. In one of them, defined by Liu and Liu (2003a), a fuzzy random variable is a measurable function from a probability space to a collection of fuzzy variables, and its expected value is described as a scalar number. Based on these concepts, this paper addresses two processes: the fuzzy random renewal process and the fuzzy random renewal reward process. In the fuzzy random renewal process, the interarrival times are characterized as fuzzy random variables, and a fuzzy random elementary renewal theorem on the limit value of the expected renewal rate of the process is presented. In the fuzzy random renewal reward process, both the interarrival times and the rewards are depicted as fuzzy random variables, and a fuzzy random renewal reward theorem on the limit value of the long-run expected reward per unit time is provided. The results obtained in this paper coincide with those of the stochastic case or the fuzzy case when the fuzzy random variables degenerate to random variables or to fuzzy variables.

18.
For an infinite-horizon discounted Markov decision process with a finite number of states and actions, this note provides upper bounds on the number of operations required to compute an approximately optimal policy by value iteration, in terms of the discount factor, the spread of the reward function, and the desired closeness to optimality. One of the provided upper bounds on the number of iterations has the property that it is a non-decreasing function of the discount factor.
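A bound of this flavor can be sketched from the classical contraction estimate: after n sweeps of value iteration the error is at most gamma^n times span/(1 - gamma), so solving gamma^n * span / (1 - gamma) <= eps for n gives an iteration count that grows as the discount factor approaches 1. The exact constants in the note may differ; this is an illustrative calculation only.

```python
import math

def vi_iteration_bound(gamma, span, eps):
    """Illustrative upper bound on the number of value-iteration sweeps
    needed for eps-accuracy, from gamma^n * span / (1 - gamma) <= eps.

    gamma: discount factor in (0, 1); span: spread of the reward function;
    eps: desired closeness to optimality. (Constants are a sketch, not
    the exact bounds of the note above.)"""
    if span <= 0:
        return 0
    n = math.log(eps * (1 - gamma) / span) / math.log(gamma)
    return max(0, math.ceil(n))

# The bound is non-decreasing in the discount factor:
print(vi_iteration_bound(0.90, span=1.0, eps=1e-3))  # → 88
print(vi_iteration_bound(0.99, span=1.0, eps=1e-3))  # → 1146
```

The monotonicity in gamma visible here matches the property singled out in the abstract: a larger discount factor never decreases the bound.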

19.
We provide weak sufficient conditions for a full-service policy to be optimal in a queueing control problem in which the service rate is a dynamic decision variable. In our model there are service costs and holding costs and the objective is to minimize the expected total discounted cost over an infinite horizon. We begin with a semi-Markov decision model for a single-server queue with exponentially distributed inter-arrival and service times. Then we present a general model with weak probabilistic assumptions and demonstrate that the full-service policy minimizes both finite-horizon and infinite-horizon total discounted cost on each sample path.

20.
This paper analyzes the continuity and differentiability of several classes of ruin functions under Markov-modulated insurance risk models with a barrier and a threshold dividend strategy, respectively. Many ruin-related functions in the literature, such as the expectation and the Laplace transform of the Gerber–Shiu discounted penalty function at ruin, of the total discounted dividends until ruin, and of the time-integrated discounted penalty and/or reward function of the risk process, etc., are special cases of the functions considered in this paper. Continuity and differentiability of these functions in the corresponding dual models are also studied.
