Similar Documents
20 similar documents found.
1.
In this paper, the infinite horizon Markovian decision programming with recursive reward functions is discussed. We show that Bellman's principle of optimality applies to our model. Then, a necessary and sufficient condition for a policy to be optimal is given. For the stationary case, an iteration algorithm for finding a stationary optimal policy is designed. The algorithm is a generalization of Howard's [7] and Iwamoto's [3] algorithms. This research was supported by the National Natural Science Foundation of China.
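For context, the sketch below implements Howard's classical policy iteration for a finite, stationary, discounted MDP; it illustrates only the baseline algorithm that the paper generalizes, and the two-state transition data are invented for illustration.

```python
import numpy as np

# Toy discounted MDP: 2 states, 2 actions (illustrative data, not from the paper).
# P[a][s, s'] = transition probability, r[a][s] = one-step reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
r = [np.array([1.0, 0.0]), np.array([2.0, -0.5])]
beta = 0.9  # discount factor

def policy_iteration(P, r, beta):
    n = P[0].shape[0]
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi for the current policy.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([r[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - beta * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead.
        q = np.array([r[a] + beta * P[a] @ v for a in range(len(P))])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

pi_star, v_star = policy_iteration(P, r, beta)
print("optimal policy:", pi_star, "value:", v_star)
```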

2.
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) the convergence of MDPs with a truncated state space to the problem with an infinite state space, and (3) the convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with an infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.
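Property (1) is easy to visualize numerically. The following sketch runs finite horizon value iteration on a toy unconstrained finite-state MDP (the paper's setting is countable-state and constrained) and shows the N-step optimal costs stabilizing at the infinite horizon discounted value; all numbers are invented.

```python
import numpy as np

# Illustrative finite MDP; costs are minimized under discount factor beta.
P = [np.array([[0.7, 0.3], [0.4, 0.6]]),
     np.array([[0.2, 0.8], [0.9, 0.1]])]
c = [np.array([1.0, 2.0]), np.array([1.5, 0.5])]  # one-step costs
beta = 0.8

# Finite-horizon optimal cost v_N by backward induction; by the contraction
# property, v_N converges to the infinite-horizon value as N grows.
v = np.zeros(2)
for N in range(1, 101):
    v = np.min([c[a] + beta * P[a] @ v for a in range(2)], axis=0)
    if N in (1, 5, 10, 50, 100):
        print(f"N={N:3d}  v_N={v}")
```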

3.
Optimization, 2012, 61(3): 469-472
A certain monotonicity property is proved for the optimal expected cumulative discounted reward associated with a dynamic programming model with finite horizon, describing the Bernoulli two-armed bandit problem. This property leads to a positive answer to the conjecture formulated in [3, p. 473]. For two special cases a stay-on-a-winner rule is obtained as a by-product.
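As an illustration of the finite horizon dynamic program behind such results, here is a minimal Bayesian Bernoulli two-armed bandit recursion with independent Beta(1,1) priors and no discounting; this is a generic textbook formulation, not the specific model of the paper.

```python
from functools import lru_cache

# Finite-horizon Bayesian Bernoulli two-armed bandit with independent
# Beta(1 + s, 1 + f) posteriors on each arm (illustrative; the paper
# treats a discounted criterion, here the discount factor is 1).

@lru_cache(maxsize=None)
def V(n, s1, f1, s2, f2):
    """Max expected number of successes in n remaining pulls,
    given successes/failures observed so far on each arm."""
    if n == 0:
        return 0.0
    p1 = (1 + s1) / (2 + s1 + f1)  # posterior mean of arm 1
    p2 = (1 + s2) / (2 + s2 + f2)  # posterior mean of arm 2
    pull1 = p1 * (1 + V(n - 1, s1 + 1, f1, s2, f2)) + (1 - p1) * V(n - 1, s1, f1 + 1, s2, f2)
    pull2 = p2 * (1 + V(n - 1, s1, f1, s2 + 1, f2)) + (1 - p2) * V(n - 1, s1, f1, s2, f2 + 1)
    return max(pull1, pull2)

# With symmetric priors, a success on an arm makes pulling it again optimal,
# which is the stay-on-a-winner flavour of the result above.
print("optimal expected successes in 10 pulls:", V(10, 0, 0, 0, 0))
```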

4.
We consider minimizing-risk problems in discounted Markov decision processes with a countable state space and bounded general rewards. We characterize optimal values for the finite and infinite horizon cases and give two sufficient conditions for the existence of an optimal policy in the infinite horizon case. These conditions are closely connected with Lemma 3 in White (1993), which is not correct, as Wu and Lin (1999) point out. We obtain a condition for the lemma to be true, under which we show that there is an optimal policy. Under another condition we show that the optimal value is the unique solution to some optimality equation and there is an optimal policy on a transient set.

5.
We consider a periodic review inventory system and present its optimal policy in the infinite horizon setting. The optimal inventory policy that maximizes the infinite horizon expected discounted profit for the model is obtained analytically by relating it to the finite horizon setting, using results from variational analysis. Results are provided that elucidate the operations of the inventory system in the long run.
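As a numerical companion, the sketch below runs backward induction for a discounted periodic review model with lost sales on a small inventory grid and reads off the order-up-to targets; the demand distribution and cost parameters are invented, and a long finite horizon is used as a stand-in for the infinite horizon policy derived analytically in the paper.

```python
import numpy as np

# Illustrative periodic-review model: price p, unit cost c, holding cost h.
p, c, h, beta = 5.0, 2.0, 0.5, 0.95
demand_vals = np.arange(0, 9)
demand_pmf = np.full(9, 1 / 9)   # uniform discrete demand, invented
S_MAX = 20                       # inventory grid 0..S_MAX

v = np.zeros(S_MAX + 1)          # terminal value
for t in range(30):              # long horizon approximates the infinite horizon
    new_v = np.empty_like(v)
    best_up_to = np.empty(S_MAX + 1, dtype=int)
    for x in range(S_MAX + 1):
        best = -np.inf
        for y in range(x, S_MAX + 1):      # order up to y >= x
            profit = -c * (y - x)          # ordering cost
            for d, q in zip(demand_vals, demand_pmf):
                sold = min(y, d)
                left = y - sold
                profit += q * (p * sold - h * left + beta * v[left])
            if profit > best:
                best, best_up_to[x] = profit, y
        new_v[x] = best
    v = new_v
# Below the base-stock level the target y* is constant: order-up-to structure.
print("order-up-to targets for x = 0..9:", best_up_to[:10])
```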

6.
Stochastic scheduling problems are considered using discounted dynamic programming. Both maximizing pure rewards and minimizing linear holding costs are treated in one common Markov decision problem. A sufficient condition for the optimality of the myopic policy for finite and infinite horizons is given. For the infinite horizon case we show the optimality of an index policy and give a sufficient condition for the index policy to be myopic. Moreover, the relation between the two sufficient conditions is discussed.

7.
We study an optimal maintenance policy for the server in a queueing system. Customers arrive at the server in a Poisson stream and are served by an exponential server, which can be in one of several states indicating its level of popularity. The server state transitions are governed by a Markov process. The arrival rate depends on the server state and decreases as the server loses popularity. Maintenance restores the server state completely, though the customers in the system at the start of maintenance are lost. The customers who arrive during maintenance are also lost. In this paper, two kinds of such systems are considered. The first system receives a unit reward when a customer arrives at the system and pays a unit cost for each customer lost at the start of maintenance. The second system receives a unit reward at departure and pays nothing for customers lost at the beginning of maintenance. Our objective is to maximize the total expected discounted profit over an infinite time horizon. We use a semi-Markov decision process to formulate the problem and establish some properties of the optimal maintenance policy under certain conditions.

8.
This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (together called the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16, p. 77] for stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given.
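The rolling horizon idea itself is simple to state in code: at every step, solve a finite horizon problem of length H and apply only its first action. The sketch below does this for a toy finite MDP with undiscounted H-step lookahead and estimates the resulting average reward by simulation; it is a generic illustration, not the paper's nonstationary, denumerable-state algorithm.

```python
import numpy as np

# Generic rolling-horizon control for a small finite MDP (invented data).
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.1, 0.9], [0.5, 0.5]])]
r = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]

def rolling_horizon_action(state, H):
    """Solve an H-step undiscounted problem and return the first action."""
    v = np.zeros(2)
    for _ in range(H - 1):                 # backward induction
        v = np.max([r[a] + P[a] @ v for a in range(2)], axis=0)
    q = [r[a][state] + P[a][state] @ v for a in range(2)]
    return int(np.argmax(q))

rng = np.random.default_rng(0)
s, total, T = 0, 0.0, 10_000
for _ in range(T):
    a = rolling_horizon_action(s, H=15)
    total += r[a][s]
    s = rng.choice(2, p=P[a][s])
print("average reward of rolling-horizon policy:", total / T)
```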

9.
We consider the optimization of finite-state, finite-action Markov decision processes under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic linear program, through which we investigate sensitivity issues. We establish conditions for the continuity of the optimal value in the discount factor. In particular, the optimal value and optimal policy for the expected average cost are obtained as limits of the discounted case, as the discount factor goes to one. This generalizes a well-known result for the unconstrained case. We also establish the continuity in the discount factor for certain non-stationary policies. We then discuss the sensitivity of optimal policies and optimal values to small changes in the transition matrix and in the instantaneous cost functions. The importance of the last two results is related to the performance of adaptive policies for constrained MDPs under various cost criteria [3,5]. Finally, we establish the convergence of the optimal value for the discounted constrained finite horizon problem to the optimal value of the corresponding infinite horizon problem.
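A standard form of the generic linear program referred to above is the occupation-measure LP for a discounted constrained MDP. The sketch below builds it for a two-state, two-action toy instance and solves it with scipy; all data are invented, and the randomized policy is recovered by normalizing the occupation measure.

```python
import numpy as np
from scipy.optimize import linprog

# Occupation-measure LP for a discounted constrained MDP (toy data).
# Variables x[s, a] >= 0; minimize sum c*x subject to flow balance and sum d*x <= D.
nS, nA, beta = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.4, 0.6]],        # P[s, a, s']
              [[0.2, 0.8], [0.7, 0.3]]])
c = np.array([[1.0, 3.0], [2.0, 0.5]])         # cost to minimize
d = np.array([[0.0, 1.0], [1.0, 0.0]])         # constrained cost
D = 2.0                                        # constraint bound
alpha = np.array([1.0, 0.0])                   # initial distribution

# Flow balance: sum_a x[j,a] - beta * sum_{s,a} P[s,a,j] * x[s,a] = alpha[j].
A_eq = np.zeros((nS, nS * nA))
for j in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[j, s * nA + a] = (1.0 if s == j else 0.0) - beta * P[s, a, j]

res = linprog(c=c.ravel(), A_ub=[d.ravel()], b_ub=[D],
              A_eq=A_eq, b_eq=alpha, bounds=(0, None))
x = res.x.reshape(nS, nA)
# A stationary optimal policy randomises in at most (number of constraints) states.
print("occupation measure:\n", x)
print("policy:\n", x / x.sum(axis=1, keepdims=True))
```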

10.
This paper deals with Markov Decision Processes (MDPs) on Borel spaces with possibly unbounded costs. The criterion to be optimized is the expected total cost with a random horizon of infinite support. It is observed that this performance criterion is equivalent to the expected total discounted cost with an infinite horizon and a varying-time discount factor. The optimal value function and the optimal policy are then characterized through suitable versions of the Dynamic Programming Equation. Moreover, it is proved that the optimal value function of the optimal control problem with a random horizon can be bounded from above by the optimal value function of a discounted optimal control problem with a fixed discount factor; in this case, the discount factor is defined in an adequate way by the parameters introduced for the study of the optimal control problem with a random horizon. To illustrate the theory developed, a version of the linear-quadratic model with a random horizon and a logarithmic consumption-investment model are presented.
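The random-horizon/discounting equivalence is especially transparent when the horizon is geometric: surviving each period with probability beta is indistinguishable in expectation from discounting at the fixed factor beta. The Monte Carlo check below uses a constant one-step cost; the numbers are illustrative, and the paper's setting is far more general.

```python
import numpy as np

# With a geometric horizon (continuation probability beta per period),
# the expected total cost equals the infinite-horizon cost discounted at beta.
rng = np.random.default_rng(1)
beta, cost = 0.9, 2.0            # constant one-step cost, for simplicity

sims = []
for _ in range(100_000):
    total = 0.0
    while rng.random() < beta:   # survive one more period w.p. beta
        total += cost
    sims.append(total)
print("random-horizon estimate :", np.mean(sims))
print("discounted closed form  :", cost * beta / (1 - beta))
```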

11.
In an M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. On the other hand, rejecting new arrivals increases the percentage of time servers are idle, which also may not be desirable. We address these trade-offs by considering an admission control problem for an M/M/N+M queue when there are costs associated with customer abandonment, server idleness, and turning away customers. First, we formulate the relevant Markov decision process (MDP), show that the optimal policy is of threshold form, and provide a simple and efficient iterative algorithm that does not presuppose a bounded state space to compute the minimum infinite horizon expected average cost and associated threshold level. Under certain conditions we can guarantee that the algorithm provides an exact optimal solution when it stops; otherwise, the algorithm stops when a provided bound on the optimality gap is reached. Next, we solve the approximating diffusion control problem (DCP) that arises in the Halfin–Whitt many-server limit regime. This allows us to establish that the parameter space has a sharp division. Specifically, there is an optimal solution with a finite threshold level when the cost of an abandonment exceeds the cost of rejecting a customer; otherwise, there is an optimal solution that exercises no control. This analysis also yields a convenient analytic expression for the infinite horizon expected average cost as a function of the threshold level. Finally, we propose a policy for the original system that is based on the DCP solution, and show that this policy is asymptotically optimal. Our extensive numerical study shows that the control that arises from solving the DCP achieves a very similar cost to the control that arises from solving the MDP, even when the number of servers is small.
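As a rough companion to the MDP part of the analysis, the sketch below runs relative value iteration for an admission-control birth-death chain under uniformization, on a truncated state space (unlike the paper's algorithm, which avoids presupposing a bound), and reads off a rejection threshold. All rates and costs are invented.

```python
import numpy as np

lam, mu, theta, N = 5.0, 1.0, 0.5, 4       # arrival, service, abandonment rates; servers
c_aban, c_idle, c_rej = 3.0, 1.0, 1.0      # costs per abandonment, idle server, rejection
X = 60                                      # truncation level for the state space
Lam = lam + N * mu + X * theta              # uniformization constant

h = np.zeros(X + 1)                         # relative value function
for _ in range(20_000):
    new_h = np.empty_like(h)
    for x in range(X + 1):
        idle = max(N - x, 0)
        aban_rate = theta * max(x - N, 0)
        cost = c_idle * idle + c_aban * aban_rate   # cost rate in state x
        down = mu * min(x, N) + aban_rate           # departure + abandonment rate
        stay = Lam - lam - down                     # fictitious self-loop rate
        accept = h[min(x + 1, X)]
        reject = c_rej + h[x]
        new_h[x] = (cost + lam * min(accept, reject)
                    + down * h[max(x - 1, 0)] + stay * h[x]) / Lam
    new_h -= new_h[0]                               # keep values relative
    if np.max(np.abs(new_h - h)) < 1e-10:
        h = new_h
        break
    h = new_h

threshold = next((x for x in range(X + 1) if c_rej + h[x] < h[min(x + 1, X)]), X)
print("reject new arrivals once x >=", threshold)
```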

12.
In this paper we study the exploitation of a single-species forest plantation when the timber price is governed by a stochastic process. The work focuses on providing closed-form expressions for the optimal harvesting policy in terms of the parameters of the price process and the discount factor, with finite and infinite time horizons. We assume that harvest is restricted to mature trees older than a certain age and that growth and natural mortality after maturity are neglected. We use stochastic dynamic programming techniques to characterize the optimal policy, and we model the price using a geometric Brownian motion and an Ornstein–Uhlenbeck process. In the first case we completely characterize the optimal policy for all possible choices of the parameters. In the second case we provide sufficient conditions, based on explicit expressions for reservation prices, ensuring that harvesting everything available is optimal. In addition, for the Ornstein–Uhlenbeck case we propose a policy based on a reservation price that performs well in numerical simulations. In both cases we solve the problem for every initial condition, and the best policy is obtained endogenously, that is, without imposing any ad hoc restrictions such as maximum sustained yield or convergence to a predefined final state.
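The geometric Brownian motion case comes down to a drift-versus-discount comparison, which a one-period Monte Carlo check makes concrete. In the sketch below the parameters are invented, and the paper's closed-form characterization is not reproduced; the simulation only illustrates why mu < r favours immediate harvest.

```python
import numpy as np

# Harvest-now vs wait trade-off under a geometric Brownian motion price.
mu, sigma, r, dt = 0.03, 0.2, 0.05, 1.0   # price drift, volatility, discount rate
p0, volume = 100.0, 1.0                    # current price, mature timber volume
rng = np.random.default_rng(42)

harvest_now = p0 * volume
z = rng.standard_normal(1_000_000)
p1 = p0 * np.exp((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)
wait_one = np.exp(-r * dt) * np.mean(p1) * volume  # ~ p0 * exp((mu - r) * dt)
print(f"harvest now: {harvest_now:.2f}   wait one period: {wait_one:.2f}")
# With mu < r the discounted expected price falls, so harvesting now is better;
# with mu >= r waiting dominates, consistent with the drift-vs-discount intuition.
```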

13.
In this paper we consider a nonstationary periodic review dynamic production–inventory model with uncertain production capacity and uncertain demand. The maximum production capacity varies stochastically. It is known that order-up-to (or base-stock, critical number) policies are optimal for both finite horizon and infinite horizon problems. We obtain upper and lower bounds on the optimal order-up-to levels, and show that for an infinite horizon problem the upper and lower bounds of the optimal order-up-to levels for the finite horizon counterparts converge as the planning horizons considered get longer. Furthermore, under mild conditions the differences between the upper and lower bounds converge exponentially to zero.

14.
We study risk-sensitive control of continuous time Markov chains taking values in a discrete state space, treating both finite and infinite horizon problems. In the finite horizon problem we characterize the value function via the Hamilton-Jacobi-Bellman (HJB) equation and obtain an optimal Markov control. We do the same for the infinite horizon discounted cost case. In the infinite horizon average cost case we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.

15.
In this article, we consider a portfolio optimization problem of Merton's type with complete memory over a finite time horizon. The problem is formulated as a stochastic control problem on a finite time horizon in which the state evolves according to a stochastic process with memory. The goal is to choose investment and consumption controls such that the total expected discounted utility is maximized. Under certain conditions, we derive explicit solutions for the associated Hamilton–Jacobi–Bellman (HJB) equations in a finite-dimensional space for exponential, logarithmic, and power utility functions. For those utility functions, verification results are established to ensure that the solutions are equal to the value functions, and the optimal controls are also derived.
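For orientation, the classical memoryless Merton problem with power utility has the well-known explicit solution pi* = (mu - r) / (gamma * sigma^2), a constant fraction of wealth in the risky asset. The sketch below states it and sanity-checks it by simulating constant-fraction strategies with common random numbers; all parameters are invented, and the memory-dependent solutions of the paper are not reproduced.

```python
import numpy as np

# Classical memoryless Merton fraction for power utility U(x) = x**(1-gamma)/(1-gamma).
mu, r, sigma, gamma = 0.08, 0.02, 0.2, 2.0
pi_star = (mu - r) / (gamma * sigma**2)    # constant fraction in the risky asset
print("Merton fraction:", pi_star)

# Sanity check: among constant-fraction strategies, expected terminal utility
# should peak near pi_star (Euler-discretized wealth dynamics, same shocks).
rng = np.random.default_rng(0)
T, n, paths = 1.0, 100, 20_000
dt = T / n
z = rng.standard_normal((paths, n))
for pi in (0.5 * pi_star, pi_star, 1.5 * pi_star):
    x = np.ones(paths)
    for k in range(n):
        x *= 1 + (r + pi * (mu - r)) * dt + pi * sigma * np.sqrt(dt) * z[:, k]
    u = np.mean(x**(1 - gamma) / (1 - gamma))
    print(f"pi={pi:.3f}  E[U(X_T)]={u:.5f}")
```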

16.
We consider a continuous time dynamic pricing problem for selling a given number of items over a finite or infinite time horizon. The demand is price sensitive and follows a non-homogeneous Poisson process. We formulate this problem as maximizing the expected discounted revenue and obtain structural properties of the optimal revenue function and optimal price policy via the Hamilton-Jacobi-Bellman (HJB) equation. Moreover, we study the impact of the discount rate on the optimal revenue function and the optimal price. Further, we extend the problem to the infinite time horizon case with discounting and time-varying demand. Numerical examples are used to illustrate our analytical results.
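A discrete-time approximation of the HJB equation makes the structure easy to compute. The sketch below assumes an exponential demand intensity lam(p) = lam0 * exp(-alpha * p), which is a convenient illustrative choice rather than the paper's general non-homogeneous process, and maximizes over a price grid at each small time step.

```python
import numpy as np

# Discrete-time approximation of the pricing HJB: V[n] is the optimal
# expected discounted revenue with n items left (all parameters invented).
lam0, alpha, rho = 10.0, 0.1, 0.05    # demand scale, price sensitivity, discount rate
T, steps, n_max = 1.0, 1000, 5
dt = T / steps
prices = np.linspace(1.0, 60.0, 600)  # price grid

V = np.zeros(n_max + 1)               # value at the horizon is zero
for _ in range(steps):
    newV = np.zeros_like(V)
    for n in range(1, n_max + 1):
        sell_prob = lam0 * np.exp(-alpha * prices) * dt   # P(sale in dt) at each price
        val = sell_prob * (prices + V[n - 1]) + (1 - sell_prob) * V[n]
        newV[n] = np.exp(-rho * dt) * val.max()
    V = newV
print("optimal expected discounted revenue by inventory:", np.round(V, 2))
# V is increasing and concave in n, the structural property the HJB analysis yields.
```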

17.
We provide weak sufficient conditions for a full-service policy to be optimal in a queueing control problem in which the service rate is a dynamic decision variable. In our model there are service costs and holding costs, and the objective is to minimize the expected total discounted cost over an infinite horizon. We begin with a semi-Markov decision model for a single-server queue with exponentially distributed inter-arrival and service times. Then we present a general model with weak probabilistic assumptions and demonstrate that the full-service policy minimizes both finite-horizon and infinite-horizon total discounted cost on each sample path.

18.
19.
We determine replenishment and sales decisions jointly for an inventory system with random demand, lost sales and random yield. Demands in consecutive periods are independent random variables with known distributions. We incorporate discretionary sales, whereby inventory may be set aside to satisfy future demand even if some present demand may be lost. Our objective is to minimize the total discounted cost over the problem horizon by choosing an optimal replenishment and discretionary sales policy. We obtain the structure of the optimal replenishment and discretionary sales policy and show that the optimal policy for the finite horizon problem converges to that of the infinite horizon problem. Moreover, we compare the optimal policy under random yield with that under certain yield, and show that the optimal order quantity (sales quantity) under random yield is larger (smaller) than that under certain yield.

20.

The purpose of this paper is to study stochastic control problems for systems driven by mean-field stochastic differential equations with elephant memory, in the sense that the system (like an elephant) never forgets its history. We study both the finite horizon case and the infinite time horizon case.
  • In the finite horizon case, results about existence and uniqueness of solutions of such a system are given. Moreover, we prove sufficient as well as necessary stochastic maximum principles for the optimal control of such systems. We apply our results to solve a mean-field linear quadratic control problem.

  • For the infinite horizon case, we derive sufficient and necessary maximum principles.

    As an illustration, we solve an optimal consumption problem from a cash flow modelled by an elephant memory mean-field system.

