Similar Documents
20 similar documents retrieved.
1.
This paper deals with discrete-time Markov control processes in Borel spaces, with unbounded rewards. The criterion to be optimized is a long-run sample-path (or pathwise) average reward subject to constraints on a long-run pathwise average cost. To study this pathwise problem, we give conditions for the existence of optimal policies for the problem with “expected” constraints. Moreover, we show that the expected case can be solved by means of a parametric family of optimality equations. These results are then extended to the problem with pathwise constraints.
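The abstract does not reproduce the parametric family of optimality equations; a plausible Lagrangian form, under the common device of adjoining the constraint with a multiplier λ ≥ 0, is sketched below. The names g(λ), h_λ, and the kernel Q are generic illustrative notation, not necessarily the paper's.

```latex
% Hypothetical Lagrangian parametrization of the constrained problem:
% for each multiplier \lambda \ge 0, an unconstrained average-reward
% optimality equation for the combined reward r - \lambda c.
g(\lambda) + h_\lambda(x) = \sup_{a \in A(x)}
  \Big\{ r(x,a) - \lambda\, c(x,a) + \int_X h_\lambda(y)\, Q(dy \mid x,a) \Big\},
  \qquad x \in X .
```

Sweeping λ and selecting the value at which the induced average cost meets the constraint is the standard way such a parametric family is exploited.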

2.
This paper deals with a new optimality criterion consisting of the usual three average criteria together with the canonical triplet (referred to as the strong average-canonical optimality criterion), and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, extending the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary Markov control processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε-optimal (ε > 0) Markov policies is given.
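As a concrete illustration of the rolling horizon idea, here is a minimal Python sketch: at each step a finite-horizon problem is solved by backward induction and only its first action is applied. The finite matrices P and r and the zero terminal reward are simplifying assumptions; the paper itself works with a denumerable state space and a sequence of terminal reward functions.

```python
import numpy as np

def rolling_horizon_action(P, r, state, horizon):
    """Return the first action of the horizon-N optimal policy from `state`.

    P[a] is the |S| x |S| transition matrix of action a and r[a] its reward
    vector -- finite stand-ins for the paper's denumerable-state model.
    The zero terminal reward replaces the paper's terminal reward sequence."""
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)                      # terminal reward
    best = np.zeros(n_states, dtype=int)
    for _ in range(horizon):                    # backward induction
        q = np.array([r[a] + P[a] @ v for a in range(n_actions)])
        best, v = q.argmax(axis=0), q.max(axis=0)
    return best[state]

# Example usage with a random 2-action, 3-state model (purely illustrative):
rng = np.random.default_rng(1)
P = [m / m.sum(axis=1, keepdims=True) for m in rng.random((2, 3, 3))]
r = rng.random((2, 3))
print(rolling_horizon_action(P, r, state=0, horizon=50))
```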

3.
Optimization, 2012, 61(4): 773–800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by a technique of constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.

4.
This work is concerned with controlled Markov chains with finite state and action spaces. It is assumed that the decision maker has an arbitrary but constant risk sensitivity coefficient, and that the performance of a control policy is measured by the long-run average cost criterion. Within this framework, the existence of solutions of the corresponding risk-sensitive optimality equation for an arbitrary cost function is characterized in terms of communication properties of the transition law.

5.
The paper deals with continuous-time Markov decision processes on a fairly general state space. The economic criterion is the long-run average return. A set of conditions is shown to be sufficient for a constant g to be the optimal average return and a stationary policy π1 to be optimal. These conditions are shown to hold under appropriate assumptions on the optimal discounted return function. A policy improvement algorithm is proposed and its convergence to an optimal policy is proved.
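For intuition, here is a hedged finite-state, discrete-time analogue of such a policy improvement scheme (the paper itself treats continuous-time processes on a general state space): each iteration solves the Poisson equation g + h = r_π + P_π h for the current policy and then improves greedily.

```python
import numpy as np

def evaluate(P_pi, r_pi):
    """Solve the Poisson equation g + h(i) = r_pi(i) + sum_j P_pi[i,j] h(j)
    for a unichain policy, with the normalization h(0) = 0."""
    n = P_pi.shape[0]
    A = np.zeros((n, n))
    A[:, 0] = 1.0                                  # coefficient of the gain g
    A[:, 1:] = np.eye(n)[:, 1:] - P_pi[:, 1:]      # coefficients of h(1..n-1)
    sol = np.linalg.solve(A, r_pi)
    return sol[0], np.concatenate(([0.0], sol[1:]))  # (g, h)

def policy_improvement(P, r):
    """P[a] is the transition matrix of action a, r[a] its reward vector."""
    n_actions, n = len(P), P[0].shape[0]
    pol = np.zeros(n, dtype=int)
    while True:
        P_pi = np.array([P[pol[i]][i] for i in range(n)])
        r_pi = np.array([r[pol[i]][i] for i in range(n)])
        g, h = evaluate(P_pi, r_pi)
        q = np.array([r[a] + P[a] @ h for a in range(n_actions)])
        best = q.argmax(axis=0)
        # improve only on strict gains, to avoid cycling between ties
        new = np.where(q[best, np.arange(n)] > q[pol, np.arange(n)] + 1e-10,
                       best, pol)
        if np.array_equal(new, pol):
            return pol, g
        pol = new
```

In the unichain finite case this loop terminates in finitely many iterations, which mirrors the convergence statement of the abstract.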

6.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and of an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies: we not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm, and is rather different from both the usual “vanishing discount factor approach” and the “optimality inequality approach” widely used in the previous literature.

7.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known “vanishing discount factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), we prove from the average optimality inequality the existence of an average optimal stationary policy among all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy among all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).

8.
This article deals with the limiting average variance criterion for discrete-time Markov decision processes in Borel spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions under which we prove the existence of a variance-minimal policy in the class of average expected cost optimal stationary policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of a variance-minimal policy are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used for the first time to study the limiting average variance criterion. Also, the optimality inequality approach provided here differs from the “optimality equation approach” widely used in the previous literature. Finally, we use a controlled queueing system to illustrate our results.

9.
This article deals with discrete-time two-person zero-sum stochastic games with Borel state and action spaces. The optimality criterion to be studied is the long-run expected average payoff criterion, and the (immediate) payoff function may have neither upper nor lower bounds. We first replace the optimality equation widely used in the previous literature with two so-called optimality inequalities, and give a new set of conditions for the existence of solutions to these inequalities. Then, from the optimality inequalities we ensure the existence of a pair of average optimal stationary strategies. Our new condition is slightly weaker than those in the previous literature, and as a byproduct some interesting results, such as the convergence of a value iteration scheme to the value of the discounted payoff game, are obtained. Finally, we apply the main results of this article to generalized inventory systems, and then provide an example of controlled population processes for which all of our conditions are satisfied, while some of the conditions in the previous literature fail to hold.
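In generic notation (not necessarily the article's), such optimality inequalities typically form a sandwich around the Shapley operator of the one-shot game. The sketch below is a hedged guess at that structure, with T, h₁, h₂, and g as illustrative names only.

```latex
% Shapley operator of the one-shot game (generic notation):
T h(x) = \sup_{\mu} \inf_{\nu}
  \Big\{ r(x,\mu,\nu) + \int_X h(y)\, Q(dy \mid x,\mu,\nu) \Big\} .
% The optimality-equation requirement g + h = T h is relaxed to a pair
% of inequalities sharing one constant g:
g + h_1(x) \;\ge\; T h_1(x), \qquad g + h_2(x) \;\le\; T h_2(x), \qquad x \in X,
% which, under suitable conditions, still pins g down as the value of
% the average-payoff game.
```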

10.
An optimal replacement policy for a multistate degenerative simple system
In this paper, a degenerative simple system (i.e. a degenerative one-component system with one repairman) with k + 1 states, including k failure states and one working state, is studied. Assume that the system after repair is not “as good as new”, and that the degeneration of the system is stochastic. Under these assumptions, we consider a new replacement policy T based on the system age. Our problem is to determine an optimal replacement policy T such that the average cost rate (i.e. the long-run average cost per unit time) of the system is minimized. The explicit expression of the average cost rate is derived, from which the corresponding optimal replacement policy and the minimum average cost rate can be determined; under some mild conditions, the existence and uniqueness of the optimal policy T can also be proved. Further, we show that the repair model for the multistate system in this paper forms a general monotone process repair model, which includes the geometric process repair model as a special case. We also show that the repair model in this paper is equivalent to a geometric process repair model for a two-state degenerative simple system, in the sense that they have the same average cost rate and the same optimal policy. Finally, a numerical example is given to illustrate the theoretical results of this model.
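By the renewal reward theorem, the long-run average cost rate under a replacement age T is E[cycle cost]/E[cycle length], which can be estimated and minimized numerically. The following Python sketch assumes a concrete geometric-process model (exponential first working time, ratios a ≥ 1 and b ≤ 1, and the cost parameters shown); all values are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def cycle(T, a=1.2, b=0.95, mean_up=10.0, mean_rep=1.0,
          c_rep=5.0, c_replace=200.0):
    """One renewal cycle under 'replace when cumulative working age hits T'.
    Geometric process: n-th up time X1/a**n (deterioration, a >= 1),
    n-th repair time Y1/b**n (lengthening repairs, b <= 1)."""
    age = length = cost = 0.0
    n = 0
    while True:
        x = rng.exponential(mean_up) / a ** n
        if age + x >= T:                     # planned replacement at age T
            return cost + c_replace, length + (T - age)
        age += x
        length += x
        y = rng.exponential(mean_rep) / b ** n
        length += y                          # repair duration
        cost += c_rep * y                    # repair cost at rate c_rep
        n += 1

def avg_cost_rate(T, n_cycles=4000):
    """Renewal-reward estimate of the long-run average cost per unit time."""
    costs, lengths = zip(*(cycle(T) for _ in range(n_cycles)))
    return sum(costs) / sum(lengths)

grid = np.linspace(5.0, 100.0, 20)           # crude search for the optimal T
print(min(grid, key=avg_cost_rate))
```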

11.
Preventive maintenance policies have been studied in the literature without considering the risk due to cost variability. In this paper, we consider the two most popular preventive replacement policies, namely age and block replacement, under the long-run average cost and expected unit-time cost criteria. To quantify the risk in preventive maintenance policies, we use the long-run variance of the accumulated cost over a time interval. We numerically derive the risk-sensitive preventive replacement policies and study the impact of the risk-sensitive optimality criterion on managerial decisions. We also examine the performance of the expected unit-time cost criterion as an alternative to the traditional long-run average cost criterion.
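For age replacement specifically, both the long-run mean cost rate and the asymptotic variance rate of the accumulated cost admit closed renewal-reward expressions, so a risk-sensitive policy can be obtained by scalarizing them. The sketch below assumes Weibull lifetimes and a mean-plus-θ·variance objective; the distribution, cost values, and scalarization are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy import integrate, stats

def age_policy_stats(T, cp=1.0, cf=5.0, shape=2.0, scale=10.0):
    """Mean cost rate g = E[R]/E[L] and asymptotic variance rate
    Var(R - g*L)/E[L] for age replacement at T: replace on failure
    (cost cf) or at age T (cost cp), Weibull(shape, scale) lifetimes."""
    life = stats.weibull_min(shape, scale=scale)
    ER = cf * life.cdf(T) + cp * life.sf(T)             # expected cycle cost
    EL = integrate.quad(lambda t: life.sf(t), 0, T)[0]  # expected cycle length
    g = ER / EL
    # E[(R - g L)^2]: failures before T, plus the atom of survival to T;
    # since E[R - g L] = 0, this is the cycle variance in the renewal CLT.
    m2 = integrate.quad(lambda x: (cf - g * x) ** 2 * life.pdf(x), 0, T)[0]
    m2 += (cp - g * T) ** 2 * life.sf(T)
    return g, m2 / EL

def best_T(theta, grid=np.linspace(1.0, 30.0, 120)):
    """T minimizing mean + theta * variance-rate; theta = 0 is risk-neutral."""
    def obj(T):
        m, v = age_policy_stats(T)
        return m + theta * v
    return min(grid, key=obj)

print(best_T(0.0), best_T(0.5))   # compare risk-neutral vs risk-averse choices
```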

12.
This note concerns controlled Markov chains on a denumerable state space. The performance of a control policy is measured by the risk-sensitive average criterion, and it is assumed that (a) the simultaneous Doeblin condition holds, and (b) the system is communicating under the action of each stationary policy. If the cost function is bounded below, it is established that the optimal average cost is characterized by an optimality inequality, and it is shown that, even for bounded costs, such an inequality may be strict at every state. Also, for a nonnegative cost function with compact support, the existence and uniqueness of bounded solutions of the optimality equation are proved, and an example is provided to show that such a conclusion generally fails when the cost is negative at some state.

13.
This paper provides a characterization of the optimal average cost function when the long-run (risk-sensitive) average cost criterion is used. The Markov control model has a denumerable state space with a finite set of actions, and the characterization presented is given in terms of a system of local Poisson equations, which yields as a by-product the existence of an optimal stationary policy.
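For orientation, in the communicating case the (single) risk-sensitive Poisson equation takes the well-known multiplicative form below; the paper's characterization uses a system of such local equations, one per communicating class, each with its own local gain. Here λ > 0 is the risk sensitivity coefficient and the notation is generic.

```latex
e^{\lambda (g + h(x))}
  = \min_{a \in A(x)} \Big\{ e^{\lambda\, c(x,a)}
      \sum_{y \in S} p(y \mid x, a)\, e^{\lambda\, h(y)} \Big\},
  \qquad x \in S .
```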

14.
This paper deals with semi-Markov decision processes under the average expected criterion. The state and action spaces are Borel spaces, and the cost/reward function is allowed to be unbounded from above and from below. We give another set of conditions under which the existence of an optimal (deterministic) stationary policy is proven by a new technique of two average optimality inequalities. Our conditions are slightly weaker than those in the existing literature, and some new sufficient conditions for the verification of our assumptions are imposed on the primitive data of the model. Finally, we illustrate our results with three examples.

15.
The self-tuning scheme for the adaptive control of a diffusion process is studied with the long-run average cost criterion and maximum likelihood estimation of parameters. Asymptotic optimality under a suitable identifiability condition is established under two alternative sets of hypotheses: a Lyapunov-type stability criterion, and a condition on the cost which penalizes instability.

16.
This paper investigates a queueing system in which the controller can perform admission and service rate control. In particular, we examine a single-server queueing system with Poisson arrivals and exponentially distributed services with adjustable rates. At each decision epoch the controller may adjust the service rate; the controller can also reject incoming customers as they arrive. The objective is to minimize long-run average costs, which include: a holding cost, which is a non-decreasing function of the number of jobs in the system; a service rate cost c(x), representing the cost per unit time for servicing jobs at rate x; and a rejection cost κ for rejecting a single job. From basic principles, we derive a simple, efficient algorithm for computing the optimal policy. Our algorithm also provides an easily computable bound on the optimality gap at every step. Finally, we demonstrate that, in the class of stationary policies, deterministic stationary policies are optimal for this problem.
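A hedged numerical sketch of such a computation: uniformize the controlled M/M/1 queue, truncate the state space, and run generic relative value iteration over joint admit/service-rate decisions. All parameter values, the truncation, and the use of relative value iteration are illustrative assumptions; the paper derives its own algorithm from basic principles.

```python
import numpy as np

# Illustrative parameters -- assumptions, not values from the paper
lam, kappa, N = 1.0, 8.0, 60          # arrival rate, rejection cost, truncation
rates = [0.0, 0.5, 1.0, 1.5, 2.0]     # selectable service rates
c = lambda x: 0.6 * x ** 2            # service-rate cost c(x) per unit time
h = lambda n: float(n)                # nondecreasing holding cost
Lam = lam + max(rates)                # uniformization constant

def bellman(v):
    """One sweep of the uniformized average-cost Bellman operator over
    the truncated state space {0, ..., N}."""
    tv, pol = np.empty_like(v), []
    for n in range(N + 1):
        best = (np.inf, None, None)
        for admit in ((0, 1) if n < N else (0,)):  # must reject at truncation
            for mu in rates:
                nxt = (lam * v[n + admit] + mu * v[max(n - 1, 0)]
                       + (Lam - lam - mu) * v[n])
                # stage cost rate / Lam, plus lump rejection cost at arrivals
                val = (h(n) + c(mu) + lam * kappa * (1 - admit) + nxt) / Lam
                if val < best[0]:
                    best = (val, admit, mu)
        tv[n] = best[0]
        pol.append(best[1:])
    return tv, pol

v = np.zeros(N + 1)
for _ in range(3000):                 # relative value iteration
    tv, pol = bellman(v)
    g, v = tv[0], tv - tv[0]          # normalize at state 0; tv[0] -> gain
print("average cost ~", g, "policy near empty queue:", pol[:4])
```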

17.
This note concerns discrete-time controlled Markov chains with Borel state and action spaces. Given a nonnegative cost function, the performance of a control policy is measured by the superior-limit risk-sensitive average criterion associated with a constant and positive risk sensitivity coefficient. Within such a framework, the discounted approach is used (a) to establish the existence of solutions to the corresponding optimality inequality, and (b) to show that, under mild conditions on the cost function, the optimal value functions corresponding to the superior- and inferior-limit average criteria coincide on a certain subset of the state space. The approach of the paper relies on standard dynamic programming ideas and on a simple analytical derivation of a Tauberian relation.
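The risk-neutral prototype of such a Tauberian relation is the classical sandwich between Cesàro and Abel (discounted) averages: for any bounded real sequence (c_t),

```latex
\liminf_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} c_t
  \;\le\; \liminf_{\alpha \uparrow 1}\, (1-\alpha) \sum_{t=0}^{\infty} \alpha^t c_t
  \;\le\; \limsup_{\alpha \uparrow 1}\, (1-\alpha) \sum_{t=0}^{\infty} \alpha^t c_t
  \;\le\; \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} c_t .
```

The risk-sensitive setting of the note requires its own version of this relation, which the note derives analytically.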

18.
This paper deals with a one-dimensional controlled diffusion process on a compact interval with reflecting boundaries. The set of available actions is finite, and the action can be changed only at countably many stopping times. The cost structure includes both a continuous movement cost rate depending on the state and the action, and a switching cost incurred when the action is changed. The policies are evaluated with respect to the average cost criterion. The problem is solved by looking at, for each stationary policy, an embedded stochastic process corresponding to the state intervals visited at the successive switching times. The communicating classes of this process are classified into closed and transient groups, and a method of calculating the average cost for the closed and transient classes is given. Conditions guaranteeing the optimality of a stationary policy are also given. A Brownian motion control problem with quadratic cost is worked out in detail and the form of an optimal policy is established.

19.
20.
Partially observable Markov decision chains with finite state, action, and signal spaces are considered. The performance index is the risk-sensitive average criterion and, under conditions concerning reachability between the unobservable states and observability of the signals, it is shown that the value iteration algorithm can be implemented to approximate the optimal average cost, to determine a stationary policy whose performance index is arbitrarily close to the optimal one, and to establish the existence of solutions to the optimality equation. The results rely on an appropriate extension of the well-known Schweitzer transformation.
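For reference, Schweitzer's transformation in its classical (fully observable, risk-neutral) form mixes every transition kernel with the identity; the article relies on an appropriate extension of this device. With τ ∈ (0,1):

```latex
\tilde{p}(y \mid x, a) = (1-\tau)\,\delta_x(y) + \tau\, p(y \mid x, a) .
```

Rewards are left unchanged; if (g, h) solves the original average-cost Poisson equation for a stationary policy, then (g, h/τ) solves the transformed one, so optimal average costs and optimal policies are preserved while every state acquires a self-loop, which is what makes value iteration converge.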
