Similar Documents
 (20 matching records found)
1.
This paper is the first of two papers that present and evaluate an approach for determining suboptimal policies for large-scale Markov decision processes (MDP). Part 1 is devoted to the determination of bounds that motivate the development and indicate the quality of the suboptimal design approach; Part 2 is concerned with the implementation and evaluation of the suboptimal design approach. The specific MDP considered is the infinite-horizon, expected total discounted cost MDP with finite state and action spaces. The approach can be described as follows. First, the original MDP is approximated by a specially structured MDP. The special structure suggests how to construct associated smaller, more computationally tractable MDP's. The suboptimal policy for the original MDP is then constructed from the solutions of these smaller MDP's. The key feature of this approach is that the state and action space cardinalities of the smaller MDP's are exponential reductions of the state and action space cardinalities of the original MDP.
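The aggregation idea can be illustrated with a small numerical sketch. The code below is illustrative only: the uniform partition, the random toy data, and the policy-lifting rule are assumptions, not the paper's specially structured construction. It solves a coarse MDP whose states are groups of original states, lifts its policy back to the original problem, and measures the suboptimality gap.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_iteration(P, c, gamma, tol=1e-10):
    """Discounted-cost value iteration. P has shape (A, S, S); c has shape (S, A)."""
    v = np.zeros(c.shape[0])
    while True:
        q = c + gamma * np.einsum('asx,x->sa', P, v)   # Q-values
        v_new = q.min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=1)
        v = v_new

# Toy 8-state, 2-action MDP with random data (purely illustrative).
S, A, gamma = 8, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))             # P[a, s, :] is a distribution
c = rng.uniform(0.0, 1.0, size=(S, A))

# Crude fixed partition into 4 groups of 2 states (the paper's specially structured
# approximation is far more refined than this uniform aggregation).
groups = np.repeat(np.arange(4), 2)
Pg = np.zeros((A, 4, 4))
cg = np.zeros((4, A))
for a in range(A):
    for g in range(4):
        members = np.where(groups == g)[0]
        row = P[a, members, :].mean(axis=0)             # average dynamics of the group
        for h in range(4):
            Pg[a, g, h] = row[groups == h].sum()
        cg[g, a] = c[members, a].mean()

_, pol_small = value_iteration(Pg, cg, gamma)            # solve the small MDP
lifted = pol_small[groups]                               # suboptimal policy on the original MDP
v_opt, _ = value_iteration(P, c, gamma)

# Exact evaluation of the lifted policy, then the suboptimality gap.
Ppi = P[lifted, np.arange(S), :]
v_lifted = np.linalg.solve(np.eye(S) - gamma * Ppi, c[np.arange(S), lifted])
print("max suboptimality gap of the lifted policy:", np.max(v_lifted - v_opt))
```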

2.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program.
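For a finite-state, discounted analogue, the occupation-measure linear program can be written down directly. The sketch below uses invented data and the discounted criterion rather than the paper's countable-state total-cost setting; it solves a constrained MDP by LP and reads off a randomized stationary policy from the optimal occupation measure.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))     # P[a, s, :] transition law (toy data)
c = rng.uniform(0.0, 1.0, size=(S, A))         # cost to be minimized
d = rng.uniform(0.0, 1.0, size=(S, A))         # auxiliary cost appearing in the constraint
alpha = np.full(S, 1.0 / S)                    # initial distribution

# Variables x[s, a] >= 0 (discounted occupation measure), flattened row-major.
# Flow constraints: sum_a x[s, a] - gamma * sum_{s', a} P(s | s', a) x[s', a] = alpha[s].
A_eq = np.zeros((S, S * A))
for s in range(S):
    for sp in range(S):
        for a in range(A):
            A_eq[s, sp * A + a] = (1.0 if sp == s else 0.0) - gamma * P[a, sp, s]

# Pick a feasible constraint level: 10% above the smallest achievable d-cost.
base = linprog(c=d.reshape(-1), A_eq=A_eq, b_eq=alpha, bounds=(0, None), method="highs")
budget = 1.10 * base.fun

res = linprog(c=c.reshape(-1),
              A_ub=[d.reshape(-1)], b_ub=[budget],
              A_eq=A_eq, b_eq=alpha,
              bounds=(0, None), method="highs")
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)      # randomized stationary policy from x
print("optimal constrained discounted cost:", res.fun)
print("randomized policy:\n", np.round(policy, 3))
```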

3.
This paper studies stochastic inventory problems with unbounded Markovian demands, ordering costs that are lower semicontinuous, and inventory/backlog (or surplus) costs that are lower semicontinuous with polynomial growth. Finite-horizon problems, stationary and nonstationary discounted-cost infinite-horizon problems, and stationary long-run average-cost problems are addressed. Existence of optimal Markov or feedback policies is established. Furthermore, optimality of (s, S)-type policies is proved when, in addition, the ordering cost consists of fixed and proportional cost components and the surplus cost is convex.
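The (s, S) structure is easy to observe numerically in a simplified special case. The sketch below assumes i.i.d. truncated demand and invented cost numbers, not the paper's Markov-modulated, unbounded-demand model: it runs finite-horizon dynamic programming with a fixed plus proportional ordering cost and reports the reorder point and the order-up-to level of the first-period policy.

```python
import numpy as np

# Finite-horizon inventory DP with fixed + proportional ordering cost (illustrative data).
T, K, cvar = 8, 4.0, 1.0          # horizon, fixed and per-unit ordering cost
h, b = 1.0, 3.0                   # holding and backlog cost rates
demand_vals = np.arange(0, 9)     # truncated demand support 0..8
demand_pmf = np.full(9, 1 / 9)
x_grid = np.arange(-20, 41)       # inventory levels (negative = backlog)

def surplus_cost(y):
    return h * np.maximum(y, 0) + b * np.maximum(-y, 0)

V = np.zeros(len(x_grid))         # terminal value
for t in range(T):
    V_new = np.empty_like(V)
    order_up_to = np.empty(len(x_grid))
    for i, x in enumerate(x_grid):
        best = np.inf
        for y in range(x, x_grid[-1] + 1):                  # order up to level y >= x
            nxt = np.clip(y - demand_vals, x_grid[0], x_grid[-1]) - x_grid[0]
            cost = (K * (y > x) + cvar * (y - x)
                    + demand_pmf @ (surplus_cost(y - demand_vals) + V[nxt]))
            if cost < best:
                best, order_up_to[i] = cost, y
        V_new[i] = best
    V = V_new

# (s, S) structure: below some level s we order up to a common level S; above s we do not order.
orders = order_up_to > x_grid
print("highest level at which we still order (reorder point s ~):",
      x_grid[orders][-1] if orders.any() else None)
print("order-up-to levels used:", np.unique(order_up_to[orders]))
```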

4.
We computationally assess policies for the elevator control problem by a new column-generation approach for the linear programming method for discounted infinite-horizon Markov decision problems. By analyzing the optimality of given actions in given states, we were able to provably improve the well-known nearest-neighbor policy. Moreover, with the method we could identify an optimal parking policy. This approach can be used to detect and resolve weaknesses in particular policies for Markov decision problems.
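The underlying diagnostic, checking whether a heuristic's action in a given state attains the Bellman optimum, can be sketched on a small MDP. The toy data and the brute-force value iteration below are assumptions for illustration; the paper's elevator MDP is far too large for this, which is why it relies on column generation for the LP method.

```python
import numpy as np

def bellman_q(P, c, gamma, v):
    """Q-values under value function v; P has shape (A, S, S), c has shape (S, A)."""
    return c + gamma * np.einsum('asx,x->sa', P, v)

def analyze_policy(P, c, gamma, policy, tol=1e-8):
    """Report states where the given policy's action is provably suboptimal."""
    S = c.shape[0]
    v = np.zeros(S)
    while True:                                  # plain value iteration for v*
        v_new = bellman_q(P, c, gamma, v).min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    q = bellman_q(P, c, gamma, v)
    gap = q[np.arange(S), policy] - q.min(axis=1)
    return np.where(gap > 10 * tol)[0], gap

# Toy instance with random data (purely illustrative).
rng = np.random.default_rng(2)
S, A, gamma = 6, 3, 0.95
P = rng.dirichlet(np.ones(S), size=(A, S))
c = rng.uniform(0.0, 1.0, size=(S, A))
heuristic = rng.integers(0, A, size=S)           # stand-in for a nearest-neighbor-style rule
bad_states, gap = analyze_policy(P, c, gamma, heuristic)
print("states where the heuristic is provably improvable:", bad_states)
print("largest optimality gap:", gap.max())
```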

5.
This paper addresses the simultaneous determination of pricing and inventory replenishment strategies under a fluctuating environment. Specifically, we analyze the single-item, periodic-review model. The demand consists of two parts: the deterministic component, which is influenced by the price, and the stochastic component (perturbation). The distribution of the stochastic component is determined by the current state of an exogenous Markov chain. The price that is charged in any given period can be specified dynamically. A replenishment order may be placed at the beginning of some or all of the periods, and stockouts are fully backlogged. Ordering costs are lower semicontinuous, and inventory/backlog (or surplus) costs are continuous with polynomial growth. Finite-horizon and infinite-horizon problems are addressed, and existence of optimal policies is established. Furthermore, optimality of (s, S, p)-type policies is proved when the ordering cost consists of fixed and proportional cost components and the surplus cost is convex (these costs are all state-dependent).

6.
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite-horizon MDPs to the infinite-horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space, and (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.
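The first of these convergence statements is easy to visualize numerically. The sketch below uses a small, unconstrained, finite-state toy problem (the paper's results concern constrained, countable-state MDPs): the optimal finite-horizon values approach the infinite-horizon discounted value geometrically as the horizon grows.

```python
import numpy as np

rng = np.random.default_rng(3)
S, A, gamma = 6, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))     # toy transition law
c = rng.uniform(0.0, 1.0, size=(S, A))         # toy cost

def bellman(v):
    return (c + gamma * np.einsum('asx,x->sa', P, v)).min(axis=1)

v_inf = np.zeros(S)
for _ in range(2000):                 # long run as a proxy for the infinite-horizon value
    v_inf = bellman(v_inf)

v = np.zeros(S)
for n in range(1, 31):
    v = bellman(v)                    # optimal n-horizon value (zero terminal cost)
    if n % 5 == 0:
        print(f"horizon {n:2d}: max gap to infinite-horizon value = "
              f"{np.max(np.abs(v - v_inf)):.2e}")
```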

7.
We study infinite-horizon asymptotic average optimality for parallel server networks with multiple classes of jobs and multiple server pools in the Halfin–Whitt regime. Three control formulations are considered: (1) minimizing the queueing and idleness cost, (2) minimizing the queueing cost under constraints on idleness at each server pool, and (3) fairly allocating the idle servers among different server pools. For the third problem, we consider a class of bounded-queue, bounded-state (BQBS) stable networks, in which any moment of the state is bounded by that of the queue only (for both the limiting diffusion and diffusion-scaled state processes). We show that the optimal values for the diffusion-scaled state processes converge to the corresponding values of the ergodic control problems for the limiting diffusion. We present a family of state-dependent Markov balanced saturation policies (BSPs) that stabilize the controlled diffusion-scaled state processes. It is shown that under these policies, the diffusion-scaled state process is exponentially ergodic, provided that at least one class of jobs has a positive abandonment rate. We also establish useful moment bounds, and study the ergodic properties of the diffusion-scaled state processes, which play a crucial role in proving the asymptotic optimality.

8.
In this paper we are concerned with the existence of optimal stationary policies for infinite-horizon risk-sensitive Markov control processes with denumerable state space, unbounded cost function, and long-run average cost. Introducing a discounted cost dynamic game, we prove that its value function satisfies an Isaacs equation, and its relationship with the risk-sensitive control problem is studied. Using the vanishing discount approach, we prove that the risk-sensitive dynamic programming inequality holds, and derive an optimal stationary policy.

9.
An iterative decomposition method is presented for computing the values in an infinite-horizon discounted Markov renewal program (DMRP). The states are partitioned into M groups, with each iteration involving disaggregation of one group at a time, with the other M–1 groups being collapsed into M–1 singletons using the replacement process method. Each disaggregation also looks like a DMRP and can be performed by policy-iteration, value-iteration or linear programming. Anticipated benefits from the method include reduced computer time and memory requirements, scale-invariance and greater robustness to the starting point.

10.
This paper concerns nonstationary continuous-time Markov control processes on Polish spaces, with the infinite-horizon discounted cost criterion. Necessary and sufficient conditions are given for a control policy to be optimal and asymptotically optimal. In addition, under suitable hypotheses, it is shown that the successive approximation procedure converges in the sense that the sequence of finite-horizon optimal cost functions and the corresponding optimal control policies both converge.

11.
In this paper a new algorithm is provided for obtaining approximately optimal policies for infinite-horizon discounted Markov decision processes. In addition, some of the properties of the algorithm are established. The algorithm is based upon the fact that the optimal value function is the unique vector minimum function within the superharmonic set.
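That characterization, namely that for a reward-maximizing discounted MDP the optimal value function is the componentwise-minimal element of the superharmonic set {v : v(s) >= r(s,a) + gamma * sum_x p(x|s,a) v(x) for all (s,a)}, is what the classical linear-programming formulation exploits. The sketch below solves that LP exactly on toy data; it is not the paper's approximation algorithm.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))     # toy transition law
r = rng.uniform(0.0, 1.0, size=(S, A))         # toy rewards

# One inequality per (s, a):  -v(s) + gamma * sum_x P[a, s, x] v(x) <= -r(s, a)
A_ub = np.zeros((S * A, S))
b_ub = np.zeros(S * A)
for s in range(S):
    for a in range(A):
        row = gamma * P[a, s, :].copy()
        row[s] -= 1.0
        A_ub[s * A + a] = row
        b_ub[s * A + a] = -r[s, a]

# Minimizing the sum over the superharmonic set recovers v* componentwise.
res = linprog(c=np.ones(S), A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
v_lp = res.x

# Cross-check against plain value iteration.
v = np.zeros(S)
for _ in range(2000):
    v = (r + gamma * np.einsum('asx,x->sa', P, v)).max(axis=1)
print("max |v_LP - v_VI| =", np.max(np.abs(v_lp - v)))
```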

12.
We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone and the one-stage reward submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. Finally, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
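The threshold form in the queueing application can be checked numerically on a concrete instance. The sketch below uses a uniformized single-queue admission-control model with invented rates and costs, not the paper's general coupled-transition setup: value iteration is run to convergence and the optimal accept/reject decision is verified to be monotone in the queue length.

```python
import numpy as np

# Uniformized admission control: accept (reward R) or reject an arriving job;
# linear holding cost. All numbers are illustrative (lam + mu <= 1, so the rates
# act as per-period probabilities after uniformization).
lam, mu, gamma = 0.4, 0.5, 0.98
N = 30                                  # buffer size (truncated state space)
R, h = 5.0, 1.0                         # reward per accepted job, holding cost per job

v = np.zeros(N + 1)
for _ in range(5000):                   # value iteration on the uniformized chain
    v_new = np.empty_like(v)
    for x in range(N + 1):
        depart = v[max(x - 1, 0)]
        stay = v[x]
        accept = R + v[min(x + 1, N)]
        reject = v[x]
        arrive = max(accept, reject) if x < N else reject
        v_new[x] = -h * x + gamma * (lam * arrive + mu * depart + (1 - lam - mu) * stay)
    v = v_new

policy = np.array([1 if (x < N and R + v[min(x + 1, N)] >= v[x]) else 0
                   for x in range(N + 1)])
print("accept decisions by queue length:", policy)
print("threshold form:", np.all(np.diff(policy) <= 0))   # once we reject, we keep rejecting
```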

13.
A portfolio optimization problem on an infinite-time horizon is considered. Risky asset prices obey a logarithmic Brownian motion and interest rates vary according to an ergodic Markov diffusion process. The goal is to choose optimal investment and consumption policies to maximize the infinite-horizon expected discounted hyperbolic absolute risk aversion (HARA) utility of consumption. The problem is then reduced to a one-dimensional stochastic control problem by virtue of the Girsanov transformation. A dynamic programming principle is used to derive the dynamic programming equation (DPE). The subsolution/supersolution method is used to obtain existence of solutions of the DPE. The solutions are then used to derive the optimal investment and consumption policies. In addition, for a special case, we obtain the results using the viscosity solution method.

14.
For an infinite-horizon discounted Markov decision process with a finite number of states and actions, this note provides upper bounds on the number of operations required to compute an approximately optimal policy by value iteration, in terms of the discount factor, the spread of the reward function, and the desired closeness to optimality. One of the provided upper bounds on the number of iterations has the property that it is a non-decreasing function of the discount factor.
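For orientation, the classical contraction argument already exhibits the dependence on the discount factor and the magnitude of the rewards: with v_0 = 0 one has ||v_n − v*|| <= gamma^n * max|r| / (1 − gamma), so the iteration count computed below suffices for an epsilon-accurate value estimate. This is only the textbook bound, not the sharper, spread-based bounds derived in the note.

```python
import math

def vi_iteration_bound(gamma, r_max, eps):
    """Classical contraction bound: with v_0 = 0, ||v_n - v*|| <= gamma**n * r_max / (1 - gamma),
    so n >= log(r_max / (eps * (1 - gamma))) / log(1 / gamma) iterations suffice for an
    eps-accurate value estimate."""
    if r_max == 0.0:
        return 0
    return max(0, math.ceil(math.log(r_max / (eps * (1.0 - gamma))) / math.log(1.0 / gamma)))

for gamma in (0.5, 0.9, 0.99):
    print(f"gamma = {gamma}:", vi_iteration_bound(gamma, r_max=1.0, eps=1e-3), "iterations")
```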

15.
We consider a noncooperative N-person discounted Markov game with a metric state space, and define the total expected discounted gain. Under some conditions imposed on the objects of the game system, we prove that the game has an equilibrium point and that each player has an equilibrium strategy. Moreover, in the case of a nondiscounted game, the total expected gain up to a finite time can be obtained, and we define the long-run expected average gain. Thus, if we impose a further assumption on the objects beyond the conditions used in the discounted case, it is proved that an equilibrium point exists in the nondiscounted Markov game. The technique for the nondiscounted case is essentially to modify the objects of the game so that they become the objects of a modified Markov game with a discount factor, which has an equilibrium point in addition to the equilibrium point of the discounted game.

16.
We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of approximate receding horizon control for the infinite-horizon average reward under an ergodicity assumption, which also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of approximate receding horizon control via lower bounds to the exact solution of the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second is based on a generalization of the single-policy improvement to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state spaces. We finally discuss practical implementations of these schemes via simulation.
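The basic receding-horizon construction is simple to sketch on a finite toy problem (the paper's setting is a countable state space with approximate sub-MDP solutions; here the H-horizon sub-MDP is solved exactly on random data): do backward induction for H stages from every state, keep only the first-stage maximizer, and use it as a stationary policy.

```python
import numpy as np

rng = np.random.default_rng(5)
S, A, H = 6, 2, 5
P = rng.dirichlet(np.ones(S), size=(A, S))     # toy transition law
r = rng.uniform(0.0, 1.0, size=(S, A))         # toy rewards

# Backward induction over the H-horizon sub-MDP; the last computed Q-values belong to
# the first stage, and their maximizer becomes the stationary "receding horizon" policy.
v = np.zeros(S)
for t in range(H):
    q = r + np.einsum('asx,x->sa', P, v)
    v = q.max(axis=1)
rh_policy = q.argmax(axis=1)

# Evaluate the stationary policy's long-run average reward via its stationary distribution.
Ppi = P[rh_policy, np.arange(S), :]
evals, evecs = np.linalg.eig(Ppi.T)
stat = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
stat = stat / stat.sum()
print("receding horizon policy:", rh_policy)
print("its long-run average reward:", stat @ r[np.arange(S), rh_policy])
```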

17.
Continuous-time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.

18.
We prove the existence of Markov perfect equilibria (MPE) for nonstationary undiscounted infinite-horizon dynamic games with alternating moves. A suitable finite-horizon equilibrium relaxation, the ending state constrained MPE, captures the relevant features of an infinite-horizon MPE for a long enough horizon, under a uniformly bounded reachability assumption.

19.
We consider the problem of optimally maintaining a periodically inspected system that deteriorates according to a discrete-time Markov process and has a limit on the number of repairs that can be performed before it must be replaced. After each inspection, a decision maker must decide whether to repair the system, replace it with a new one, or leave it operating until the next inspection, where each repair makes the system more susceptible to future deterioration. If the system is found to be failed at an inspection, then it must be either repaired or replaced with a new one at an additional penalty cost. The objective is to minimize the total expected discounted cost due to operation, inspection, maintenance, replacement and failure. We formulate an infinite-horizon Markov decision process model and derive key structural properties of the resulting optimal cost function that are sufficient to establish the existence of an optimal threshold-type policy with respect to the system's deterioration level and cumulative number of repairs. We also explore the sensitivity of the optimal policy to inspection, repair and replacement costs. Numerical examples are presented to illustrate the structure and the sensitivity of the optimal policy.

20.
Finite and infinite planning horizon Markov decision problems are formulated for a class of jump processes with general state and action spaces and controls which are measurable functions on the time axis taking values in an appropriate metrizable vector space. For the finite horizon problem, the maximum expected reward is the unique solution, which exists, of a certain differential equation and is a strongly continuous function in the space of upper semi-continuous functions. A necessary and sufficient condition is provided for an admissible control to be optimal, and a sufficient condition is provided for the existence of a measurable optimal policy. For the infinite horizon problem, the maximum expected total reward is the fixed point of a certain operator on the space of upper semi-continuous functions. A stationary policy is optimal over all measurable policies in the transient and discounted cases as well as, with certain added conditions, in the positive and negative cases.
