Similar Documents
20 similar documents found.
1.
The limiting conditional age distribution of a continuous-time Markov process whose state space is the set of non-negative integers, and for which {0} is absorbing, is defined as the weak limit as t→∞ of the last time before t at which an associated “return” Markov process exited from {0}, conditional on the state j of this process at time t. It is shown that this limit exists and is non-defective if the return process is ρ-recurrent and satisfies the strong ratio limit property. As a preliminary to the proof of the main results, some general results are established on the representation of the ρ-invariant measure and ρ-invariant function of a Markov process. The conditions of the main results are shown to be satisfied by the return process constructed from a Markov branching process, and by birth and death processes. Finally, a number of limit theorems for the limiting age as j→∞ are given.

2.
This paper is concerned with the properties of the value-iteration operator that arises in undiscounted Markov decision problems. We give both necessary and sufficient conditions for this operator to reduce to a contraction operator, in which case it is easy to show that the value-iteration method exhibits a uniform geometric convergence rate. As necessary conditions we obtain a number of important characterizations of the chain and periodicity structure of the problem, and as sufficient conditions we give a general “scrambling-type” recurrence condition, which encompasses a number of important special cases. Next, we show that a data transformation turns every unichained undiscounted Markov renewal program into an equivalent undiscounted Markov decision problem in which the value-iteration operator is contracting, because it satisfies this “scrambling-type” condition. We exploit this contraction property to obtain lower and upper bounds, as well as variational characterizations, for the fixed point of the optimality equation, and a test for eliminating suboptimal actions.
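For orientation, the lower and upper bounds mentioned above have, in the unichain average-reward setting, the classical form min_s (Tv − v)(s) ≤ g* ≤ max_s (Tv − v)(s). Below is a minimal NumPy sketch of relative value iteration with this bound-based stopping test; the data layout (P of shape actions × states × states, r of shape states × actions) is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-8, max_iter=100_000):
    """Average-reward value iteration with the classical gain bounds
    min_s (Tv - v)(s) <= g* <= max_s (Tv - v)(s).  Under a scrambling-type
    condition the span of Tv - v contracts geometrically, so the stopping
    test below is reached at a uniform geometric rate."""
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        Q = r + np.einsum('asj,j->sa', P, v)   # one-step lookahead values
        Tv = Q.max(axis=1)
        diff = Tv - v
        lo, hi = diff.min(), diff.max()        # lower/upper bounds on the gain g*
        if hi - lo < tol:
            return 0.5 * (lo + hi), Q.argmax(axis=1)  # gain estimate, greedy policy
        v = Tv - Tv[0]                         # relative VI keeps iterates bounded
    raise RuntimeError("no convergence: the model may be multichain or periodic")
```

Actions whose one-step value Q(s, a) falls below the incumbent maximum by more than the current bound gap can additionally be discarded, in the spirit of the suboptimal-action test mentioned in the abstract.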

3.
The semi-Markov process studied here is a generalized random walk on the non-negative integers with zero as a reflecting barrier, in which the time interval between two consecutive jumps has an arbitrary distribution H(t). Our process is identical with the Markov chain studied by Miller [6] in the special case H(t) = U1(t), the Heaviside function with unit jump at t = 1. By means of a Spitzer–Baxter type identity, we establish criteria for transience, positive and null recurrence, as well as conditions for exponential ergodicity. The results obtained here generalize those of [6] and some classical results in random walk theory [10].

4.
We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are determined by the given transition rates, which are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy which is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or at least approximate) the discounted reward optimal value function as well as a discounted optimal stationary policy. Finally, we use an example to illustrate our results; in particular, we derive an explicit and exact solution to the DROE and an explicit expression of a discounted optimal stationary policy for this example.
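For orientation, discounted reward optimality equations of this type, with a state-dependent discount factor α(x) and a conservative transition rate kernel q(dy | x, a), are usually written in the following form (a standard formulation supplied here as an assumption, not quoted from the paper):

$$ \alpha(x)\,V^*(x) \;=\; \sup_{a \in A(x)} \Big\{ r(x,a) + \int_S V^*(y)\, q(\mathrm{d}y \mid x, a) \Big\}, \qquad x \in S. $$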

5.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and by characterizing the value function through the Bellman equation.
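The Dynkin formula in question, stated here in its standard form for orientation (X a process with extended generator 𝒜 and f a suitable function in its domain; the paper's point is that this identity remains valid for the possibly non-Markov processes induced by history-dependent policies):

$$ \mathbb{E}_x\big[f(X_t)\big] \;=\; f(x) + \mathbb{E}_x\Big[\int_0^t (\mathcal{A} f)(X_s)\,\mathrm{d}s\Big]. $$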

6.
7.
This paper studies three ways to construct a nonhomogeneous jump Markov process: (i) via a compensator of the random measure of a multivariate point process, (ii) as a minimal solution of the backward Kolmogorov equation, and (iii) as a minimal solution of the forward Kolmogorov equation. The main conclusion of this paper is that, for a given measurable transition intensity, commonly called a Q-function, all these constructions define the same transition function. If this transition function is regular, that is, the probability of accumulation of jumps is zero, then this transition function is the unique solution of the backward and forward Kolmogorov equations. For continuous Q-functions, Kolmogorov equations were studied in Feller's seminal paper. In particular, this paper extends Feller's results for continuous Q-functions to measurable Q-functions and provides additional results.
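In the conservative signed-kernel convention, with q(t, x, Γ) ≥ 0 for x ∉ Γ and q(t, x, X) = 0, the two Kolmogorov equations for the transition function P(s, x; t, B) take the standard forms below (the notation is supplied here for orientation, not quoted from the paper):

$$ \frac{\partial}{\partial s} P(s,x;t,B) \;=\; -\int_X q(s,x,\mathrm{d}y)\, P(s,y;t,B), \qquad \frac{\partial}{\partial t} P(s,x;t,B) \;=\; \int_X P(s,x;t,\mathrm{d}y)\, q(t,y,B). $$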

8.
This paper concerns the construction and regularity of a transition (probability) function of a non-homogeneous continuous-time Markov process with given transition rates and a general state space. Motivated by the restrictions that arise in applications when the conservative transition rates q(t, x, Λ) are required to be continuous in t ≥ 0, we consider the case in which q(t, x, Λ) is only required to satisfy a mild measurability (in t ≥ 0) condition, which generalizes the continuity condition. Under this measurability condition we construct a transition function with the given transition rates, provide a necessary and sufficient condition for it to be regular, and obtain some further results of interest.

9.
This paper deals with a particular type of Markov decision process in which the state space takes the form I = S × Z, where S is countable and Z = {1, 2}, and the action space is K = Z, independently of s ∈ S. The state space I is ordered by a partial order ≼, specified in terms of an integer-valued function on S. The action space K carries the natural order ≤. Under certain monotonicity and submodularity conditions it is shown that isotone optimal policies exist with respect to ≼ on I and ≤ on K. The paper examines how this particular isotone structure may be used to simplify the usual policy-space algorithm. A brief discussion of the usual successive approximation (value iteration) method, and of the extension of these ideas to semi-Markov decision processes, is given.
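The mechanism behind such results is the standard one from the theory of monotone comparative statics (Topkis), stated here for the maximization case as background rather than as the paper's exact hypothesis: if the one-stage objective g(s, a) has increasing differences,

$$ g(s',a') - g(s',a) \;\ge\; g(s,a') - g(s,a) \qquad \text{whenever } s \preccurlyeq s' \text{ and } a \le a', $$

then the maximizers of g(s, ·) form an isotone correspondence in s, which is what delivers isotone optimal policies (the submodular, cost-minimizing version is symmetric).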

10.
We present a specialized policy iteration method for the computation of optimal and approximately optimal policies for a discrete-time model of a single reservoir whose discharges generate hydroelectric power. The model is described in (Lamond et al., 1995) and (Drouin et al., 1996), where the special structure of optimal policies is given and an approximate value iteration method is presented, using piecewise affine approximations of the optimal return functions. Here, we present a finite method for computing an optimal policy in O(n³) arithmetic operations, where n is the number of states in the associated Markov decision process, and a finite method for computing a lower bound on the optimal value function in O(m²n), where m is the number of nodes of the piecewise affine approximation.
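For comparison, even generic policy iteration on a finite MDP has the O(n³) flavor quoted above, since each policy evaluation solves an n × n linear system. The NumPy sketch below is a generic discounted version with an illustrative data layout; it does not reproduce the paper's specialized method, which exploits the structure of optimal reservoir policies and the piecewise affine approximations.

```python
import numpy as np

def policy_iteration(P, r, beta=0.95, max_iter=1_000):
    """Generic discounted policy iteration.  Each evaluation step is an
    O(n^3) linear solve; improvement is a greedy one-step lookahead.
    P: (A, S, S) transition matrices, r: (S, A) rewards (illustrative)."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    I = np.eye(S)
    for _ in range(max_iter):
        # policy evaluation: v = (I - beta * P_pi)^{-1} r_pi
        P_pi = P[policy, np.arange(S), :]
        r_pi = r[np.arange(S), policy]
        v = np.linalg.solve(I - beta * P_pi, r_pi)
        # policy improvement: greedy with respect to v
        Q = r + beta * np.einsum('asj,j->sa', P, v)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v               # a stable policy is optimal
        policy = new_policy
    return policy, v
```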

11.
The equation F(x, σ) = 0, x ∈ K, in which σ is a parameter and x is an unknown taking values in a given convex cone K in a Banach space X, is considered. This equation is examined in a neighborhood of a given solution (x*, σ*) at which the Robinson regularity condition may be violated. Under the assumption that the 2-regularity condition (defined in the paper), which is much weaker than the Robinson regularity condition, is satisfied, an implicit function theorem is obtained for this equation. This result generalizes the known implicit function theorems even in the case when the cone K coincides with the entire space X.

12.
13.
In the theory and applications of Markov decision processes introduced by Howard and subsequently developed by many authors, it is assumed that actions can be chosen independently at each state. A policy-constrained Markov decision process is one in which selecting a given action in one state restricts the choice of actions in another. This note describes a method for determining a maximal-gain policy in the policy-constrained case. The method uses bounds on the gain of the feasible policies to produce a policy ranking list; this list then forms the basis of a bounded enumeration procedure which yields the optimal policy.
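The bound-then-enumerate scheme can be sketched generically as below; gain_upper_bound and exact_gain are hypothetical callbacks standing in for the paper's gain bounds and exact policy evaluation.

```python
def bounded_enumeration(feasible_policies, gain_upper_bound, exact_gain):
    """Rank feasible policies by an upper bound on their gain, evaluate
    them exactly in that order, and stop once the incumbent's exact gain
    dominates the bound of every remaining policy."""
    ranked = sorted(feasible_policies, key=gain_upper_bound, reverse=True)
    best_policy, best_gain = None, float('-inf')
    for policy in ranked:
        if gain_upper_bound(policy) <= best_gain:
            break                  # nothing left can beat the incumbent
        g = exact_gain(policy)
        if g > best_gain:
            best_policy, best_gain = policy, g
    return best_policy, best_gain
```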

14.
We consider a one-dimensional reaction–diffusion equation with nonlinear boundary conditions of logistic type with delay. We deal with non-negative solutions and analyze the stability of the unique positive equilibrium solution, the constant function u ≡ 1. We show that if the delay is small, this equilibrium is asymptotically stable, just as in the case without delay. We also show that, as the delay goes to infinity, the equilibrium becomes unstable and undergoes a cascade of Hopf bifurcations, whose structure depends on the parameters appearing in the equation. This equation exhibits dynamical behavior that differs from the case in which the delayed nonlinearity acts in the interior of the domain.

15.
Extending the multi-timescale model proposed by the author and co-workers in the context of Markov decision processes, this paper proposes a simple analytical model, called M-timescale two-person zero-sum Markov games (MMGs), for hierarchically structured sequential decision-making in competitive situations between two players, where one player (the minimizer) wishes to minimize the cost paid to the adversary (the maximizer). In this hierarchical model, each player's decisions at the M levels of the hierarchy are made on M different discrete timescales, and the state space and control space of each level do not overlap with those of the other levels. The hierarchy is structured in a “pyramid” sense: a decision made at level m (slower timescale), and/or the state at that level, affects the evolution of the decision-making process at the lower level m+1 (faster timescale) until a new decision is made at the higher level, but lower-level decisions do not themselves affect the transition dynamics of the higher levels. The performance produced by the lower-level decisions does, however, affect the higher-level decisions of each player. A hierarchical objective function is defined for the minimizer and the maximizer, from which we define a “multi-level equilibrium value function” and derive a “multi-level equilibrium equation”. We also discuss how to solve such hierarchical games exactly.

16.
This paper considers the risk probability criterion for the first passage model of discrete-time Markov decision processes with a countable state space. The optimization criterion is to minimize the risk probability that the time at which the system first reaches the target state set does not exceed a given threshold. We first establish the optimality equation and prove that the optimal value function corresponds to its solution; we then discuss some properties of optimal policies, give conditions for the existence of an optimal stationary policy, and finally illustrate our results with an example.
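One natural form of the optimality equation for such a first passage model (an assumption-level sketch, not the paper's exact statement): with target set B, transition law p(j | i, a), and remaining threshold n, the minimal probability of reaching B within n steps satisfies

$$ V_n(i) \;=\; \min_{a \in A(i)} \Big[ \sum_{j \in B} p(j \mid i, a) \;+\; \sum_{j \notin B} p(j \mid i, a)\, V_{n-1}(j) \Big], \qquad V_n(i) = 1 \ (i \in B), \quad V_0(i) = 0 \ (i \notin B). $$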

17.
Optimization, 2012, 61(4): 773–800
Abstract

In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for verifying the simultaneous Doeblin condition, use a controlled birth-and-death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.

18.
We consider the stationary distribution of the M/GI/1 type queue when background states are countable. We are interested in its tail behavior. To this end, we derive a Markov renewal equation for characterizing the stationary distribution using a Markov additive process that describes the number of customers in system when the system is not empty. Variants of this Markov renewal equation are also derived. It is shown that the transition kernels of these renewal equations can be expressed by the ladder height and the associated background state of a dual Markov additive process. Usually, matrix analysis is extensively used for studying the M/G/1 type queue; however, this may not be convenient when the background states are countable. We here rely on stochastic arguments, which not only make computations possible but also reveal new features. These results are applied to study the tail decay rates of the stationary distributions, including refinements and extensions of existing results.
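Markov renewal equations of the kind used here have the generic shape (notation assumed for orientation, not quoted from the paper): with background states i, j, a semi-Markov kernel F_{ij}, and a known inhomogeneous term z_i,

$$ Z_i(x) \;=\; z_i(x) + \sum_j \int_0^x Z_j(x-y)\, F_{ij}(\mathrm{d}y), $$

and it is this kernel that the paper expresses through the ladder height and associated background state of the dual Markov additive process.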

19.
Abstract

The members of a set of conditional probability density functions are called compatible if there exists a joint probability density function that generates them. We generalize this concept by calling the conditionals functionally compatible if there exists a non-negative function that behaves like a joint density insofar as it generates the conditionals according to the probability calculus, but whose integral over the whole space is not necessarily finite. A necessary and sufficient condition for functional compatibility is given, which provides a method of calculating this function if it exists. A Markov transition function is then constructed using a set of functionally compatible conditional densities, and it is shown, using the compatibility results, that the associated Markov chain is positive recurrent if and only if the conditionals are compatible. A Gibbs Markov chain, constructed via “Gibbs conditionals” from a hierarchical model with an improper posterior, is a special case. Therefore, the results of this article can be used to evaluate the consequences of applying the Gibbs sampler when the posterior's impropriety is unknown to the user. Our results cannot, however, be used to detect improper posteriors. Monte Carlo approximations based on Gibbs chains are shown to have undesirable limiting behavior when the posterior is improper. The results are applied to a Bayesian hierarchical one-way random-effects model with an improper posterior distribution. The model is simple, yet quite similar to some models with improper posteriors that have been used in conjunction with the Gibbs sampler in the literature.
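A textbook instance of functionally compatible yet incompatible conditionals (often used to explain this phenomenon, e.g. in Casella and George's exposition of the Gibbs sampler; it is not claimed to be the example analyzed in the article) is X | Y=y ~ Exp(y) and Y | X=x ~ Exp(x), both generated by h(x, y) = exp(−xy) on (0, ∞)², which is not integrable. The short simulation sketch below shows the resulting Gibbs chain failing to stabilize.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_exp_pair(n_steps, x0=1.0):
    """Gibbs sampler for X|Y=y ~ Exp(rate y), Y|X=x ~ Exp(rate x).
    These conditionals are generated by h(x, y) = exp(-x*y), whose
    integral over (0, inf)^2 diverges, so no proper joint density exists
    and the chain is not positive recurrent."""
    x, path = x0, []
    for _ in range(n_steps):
        y = rng.exponential(scale=1.0 / x)   # Y | X=x ~ Exp(rate x)
        x = rng.exponential(scale=1.0 / y)   # X | Y=y ~ Exp(rate y)
        path.append(x)
    return np.array(path)

# Running means of the output wander instead of settling, illustrating the
# undesirable limiting behaviour of Monte Carlo approximations noted above.
draws = gibbs_exp_pair(10_000)
print(draws[-5:], draws.mean())
```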

20.
In this paper, we consider a Markov additive insurance risk process under a randomized dividend strategy in the spirit of Albrecher et al. (2011). Decisions on whether to pay dividends are made only at a sequence of dividend decision time points whose intervals are Erlang(n) distributed. At a dividend decision time, if the surplus level is larger than a predetermined dividend barrier, then the excess is paid as a dividend, provided that ruin has not yet occurred. In contrast to Albrecher et al. (2011), it is assumed that the event of ruin is monitored continuously (as in Avanzi et al. (2013) and Zhang (2014)), i.e. the surplus process is stopped immediately once it drops below zero. The quantities of interest include the Gerber–Shiu expected discounted penalty function and the expected present value of dividends paid until ruin. Solutions are derived with the use of Markov renewal equations. Numerical examples are given, and the optimal dividend barrier is identified in some cases.
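The decision mechanism can be made concrete with a small Monte Carlo sketch for a compound Poisson special case of the Markov additive model; all parameter values are illustrative assumptions, not values from the paper. Since premiums accrue continuously and claims are the only downward jumps, the surplus can first drop below zero only at a claim instant, which is how the continuous ruin monitoring is implemented below.

```python
import numpy as np

rng = np.random.default_rng(1)

def discounted_dividends(u=10.0, b=15.0, c=2.0, lam=1.0, mu=1.0,
                         n=3, gamma=3.0, delta=0.03, horizon=500.0):
    """One path: premiums at rate c, Poisson(lam) claims with Exp(mu) sizes,
    dividend decisions at Erlang(n, gamma) intervals, barrier b, continuous
    ruin monitoring, discounting at rate delta.  Returns the present value
    of dividends paid until ruin (or until the truncation horizon)."""
    t, x, pv = 0.0, u, 0.0
    tc = rng.exponential(1.0 / lam)            # next claim epoch
    td = rng.gamma(n, 1.0 / gamma)             # next dividend decision epoch
    while t < horizon:
        if tc < td:                            # a claim arrives first
            x += c * (tc - t) - rng.exponential(1.0 / mu)
            t, tc = tc, tc + rng.exponential(1.0 / lam)
            if x < 0:                          # ruin can only occur at claims
                return pv
        else:                                  # a dividend decision epoch
            x += c * (td - t)
            t, td = td, td + rng.gamma(n, 1.0 / gamma)
            if x > b:                          # pay out the excess over b
                pv += np.exp(-delta * t) * (x - b)
                x = b
    return pv

# Crude estimate of the expected present value of dividends until ruin:
print(np.mean([discounted_dividends() for _ in range(2_000)]))
```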
