Similar Documents (20 results)
1.
Decision makers often face the need for a performance guarantee with some sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by earlier models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for a finite state space. We give an example for the reliability of the satellite system.
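As a rough illustration of the kind of finite-horizon algorithm this abstract describes, the sketch below runs backward induction on a state augmented with the remaining target. It assumes undiscounted, nonnegative integer rewards so the target grid stays exact; the paper's discounted criterion would additionally rescale the remaining target by the discount factor at each stage.

```python
import numpy as np

# Sketch: maximize P(total reward over T stages >= target) by
# backward induction over an augmented state (s, remaining target).

def target_prob_dp(P, R, T, target):
    """P: (nA, nS, nS) transition probabilities; R: (nA, nS) rewards.
    Returns V with V[s, w] = max probability that the total reward
    over T stages, starting from s, reaches at least w."""
    nA, nS, _ = P.shape
    V = np.zeros((nS, target + 1))
    V[:, 0] = 1.0                       # remaining target 0: already met
    policies = []
    for _ in range(T):                  # backward induction over stages
        Q = np.zeros((nA, nS, target + 1))
        for a in range(nA):
            for s in range(nS):
                for w in range(target + 1):
                    w_left = max(0, w - int(R[a, s]))   # target still to earn
                    Q[a, s, w] = P[a, s] @ V[:, w_left]
        policies.insert(0, Q.argmax(axis=0))
        V = Q.max(axis=0)
    return V, policies                  # policies[t][s, w]: stage-t action
```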

2.
This paper addresses the problem faced by a large electricity consumer in determining the optimal procurement plan over a short-term time horizon. The inherent complexity of the problem, due to its dynamic and stochastic nature, is dealt with by means of the stochastic programming modeling framework. In particular, a two-stage problem is formulated with the aim of establishing the optimal amount of electricity to be purchased through bilateral contracts and in the Day-Ahead Electricity Market. Recourse actions are used to hedge against uncertainty related to future electricity prices and the consumer's needs. The optimal plan is defined so as to minimize the overall cost and to control risk, which is measured in the form of violation of budget constraints. The stochastic model is dynamically solved in a rolling-horizon fashion by iteratively considering more recent information and a planning horizon of decreasing length. Extensive numerical experiments have been carried out to assess the performance of the proposed dynamic decision approach. The results collected on a real test case are very encouraging and provide evidence of the superiority of the approach in comparison with alternative procurement strategies.
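A minimal sketch of this kind of two-stage recourse model, with made-up numbers: x is the bilateral-contract volume fixed in the first stage, and y_k is the day-ahead purchase made after scenario k reveals the spot price and demand. The paper's actual model additionally includes risk control and rolling-horizon resolving.

```python
import numpy as np
from scipy.optimize import linprog

# Two-stage recourse sketch (illustrative data, not the paper's):
# decision vector z = [x, y_1, y_2, y_3].
c = 40.0                             # bilateral contract price
p = np.array([0.3, 0.5, 0.2])        # scenario probabilities
q = np.array([30.0, 45.0, 70.0])     # day-ahead price scenarios
d = np.array([90.0, 100.0, 120.0])   # demand scenarios

cost = np.concatenate(([c], p * q))  # E[cost] = c*x + sum_k p_k q_k y_k
# demand coverage x + y_k >= d_k, rewritten as -x - y_k <= -d_k
A_ub = np.hstack([-np.ones((3, 1)), -np.eye(3)])
res = linprog(cost, A_ub=A_ub, b_ub=-d, bounds=[(0, None)] * 4)
print(res.x)   # optimal contract volume and per-scenario recourse
```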

3.
In a M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. On the other hand, rejecting new arrivals increases the percentage of time servers are idle, which also may not be desirable. We address these trade-offs by considering an admission control problem for a M/M/N+M queue when there are costs associated with customer abandonment, server idleness, and turning away customers. First, we formulate the relevant Markov decision process (MDP), show that the optimal policy is of threshold form, and provide a simple and efficient iterative algorithm that does not presuppose a bounded state space to compute the minimum infinite horizon expected average cost and associated threshold level. Under certain conditions we can guarantee that the algorithm provides an exact optimal solution when it stops; otherwise, the algorithm stops when a provided bound on the optimality gap is reached. Next, we solve the approximating diffusion control problem (DCP) that arises in the Halfin–Whitt many-server limit regime. This allows us to establish that the parameter space has a sharp division. Specifically, there is an optimal solution with a finite threshold level when the cost of an abandonment exceeds the cost of rejecting a customer; otherwise, there is an optimal solution that exercises no control. This analysis also yields a convenient analytic expression for the infinite horizon expected average cost as a function of the threshold level. Finally, we propose a policy for the original system that is based on the DCP solution, and show that this policy is asymptotically optimal. Our extensive numerical study shows that the control that arises from solving the DCP achieves a very similar cost to the control that arises from solving the MDP, even when the number of servers is small.  相似文献   
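The threshold structure makes policy evaluation easy: under "admit while fewer than K customers are in the system," the queue is a birth-death chain whose stationary distribution is computable in closed form. The sketch below (with illustrative cost symbols, not the paper's notation or its iterative algorithm) scans thresholds by evaluating the resulting average cost.

```python
import numpy as np

# Average cost of a threshold admission policy for an M/M/N+M queue:
# c_ab per abandonment, c_idle per idle server per unit time,
# c_rej per rejected arrival (illustrative symbols).

def avg_cost(K, lam, mu, theta, N, c_ab, c_idle, c_rej):
    pi = np.ones(K + 1)                 # birth-death stationary dist.
    for n in range(K):
        death = min(n + 1, N) * mu + max(n + 1 - N, 0) * theta
        pi[n + 1] = pi[n] * lam / death
    pi /= pi.sum()
    n = np.arange(K + 1)
    abandon_rate = (np.maximum(n - N, 0) * theta * pi).sum()
    idle_servers = (np.maximum(N - n, 0) * pi).sum()
    reject_rate = lam * pi[K]           # arrivals seeing a full system
    return c_ab * abandon_rate + c_idle * idle_servers + c_rej * reject_rate

costs = {K: avg_cost(K, lam=9.0, mu=1.0, theta=0.5, N=10,
                     c_ab=5.0, c_idle=1.0, c_rej=3.0)
         for K in range(10, 60)}
print(min(costs, key=costs.get))        # best threshold in the scan
```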

4.
We consider a collection of heterogeneous assets which exhibit independent stochastic behavior, but are economically interdependent due to resource constraints on management decisions. We represent the collection of assets as a network of nonhomogeneous Markov decision processes linked by side constraints. To facilitate a procedure for obtaining nonrandomized decision policies, we express the resource limitations as integrated chance constraints and relax them into the objective function within a Lagrangian penalty term. We utilize subgradient optimization to establish upper bounds and a greedy randomized Lagrangian repair heuristic to obtain feasible solutions. We empirically validate the tightness of these a posteriori upper and lower bounds with computational experiments on a pair of applications. For pavement maintenance, we examine the effect that the reward structure of each constituent MDP has on the maintenance policy in the presence of budget constraints. For equipment replacement, we consider constraints on two resource types.
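A generic subgradient loop for this kind of Lagrangian relaxation might look as follows; solve_subproblems is a hypothetical stand-in for the per-asset MDP solves under a given multiplier, and the dual bound it tracks corresponds to the paper's a posteriori upper bound.

```python
import numpy as np

# Subgradient optimization of the Lagrangian dual of a shared budget
# constraint sum_i usage_i <= B.  solve_subproblems(lam) returns each
# asset's optimal value of reward_i - lam * usage_i and its usage.

def subgradient_dual(solve_subproblems, B, steps=200, lam0=0.0):
    lam, best_bound = lam0, np.inf
    for k in range(1, steps + 1):
        values, usages = solve_subproblems(lam)
        bound = sum(values) + lam * B   # Lagrangian (upper) bound
        best_bound = min(best_bound, bound)
        g = B - sum(usages)             # subgradient of the dual
        lam = max(0.0, lam - g / k)     # diminishing step, keep lam >= 0
    return best_bound, lam
```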

5.
Horizon and stages in applications of stochastic programming in finance
To solve a decision problem under uncertainty via stochastic programming means to choose or to build a suitable stochastic programming model, taking into account the nature of the real-life problem, the character of the input data, and the availability of software and computer technology. In applications of multistage stochastic programs, additional rather complicated modeling issues come to the fore. They concern the choice of the horizon, the stages, methods for generating scenario trees, etc. We briefly discuss ways of selecting the horizon and stages in financial applications. In our numerical studies, we focus on alternative choices of stages and their impact on optimal first-stage solutions of bond portfolio optimization problems. AMS Subject Classification: 90C15, 92B28

6.
We consider the optimization of finite-state, finite-action Markov decision processes under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic linear program, through which we investigate sensitivity issues. We establish conditions for the continuity of the optimal value in the discount factor. In particular, the optimal value and optimal policy for the expected average cost are obtained as limits of the discounted case, as the discount factor goes to one. This generalizes a well-known result for the unconstrained case. We also establish continuity in the discount factor for certain non-stationary policies. We then discuss the sensitivity of optimal policies and optimal values to small changes in the transition matrix and in the instantaneous cost functions. The importance of the last two results is related to the performance of adaptive policies for constrained MDPs under various cost criteria [3,5]. Finally, we establish the convergence of the optimal value for the discounted constrained finite-horizon problem to the optimal value of the corresponding infinite-horizon problem.
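For the discounted case, the generic linear program the abstract refers to is commonly written over occupation measures. A sketch under that standard formulation (not necessarily the paper's exact LP):

```python
import numpy as np
from scipy.optimize import linprog

# Occupation-measure LP for a discounted constrained MDP: variables
# rho(s, a) >= 0 satisfy the discounted flow-balance equations, the
# objective is the expected discounted cost, and a side constraint
# caps the expected discounted constraint-cost at Dmax.

def constrained_mdp_lp(P, c, d, Dmax, alpha, beta):
    """P: (nA, nS, nS); c, d: (nS, nA) costs; alpha: initial dist."""
    nA, nS, _ = P.shape
    A_eq = np.zeros((nS, nS * nA))      # flow balance per state s'
    for sp in range(nS):
        for s in range(nS):
            for a in range(nA):
                A_eq[sp, s * nA + a] = (sp == s) - beta * P[a, s, sp]
    res = linprog(c.reshape(-1), A_ub=d.reshape(1, -1), b_ub=[Dmax],
                  A_eq=A_eq, b_eq=alpha, bounds=[(0, None)] * (nS * nA))
    rho = res.x.reshape(nS, nA)
    # the induced (possibly randomized) stationary policy
    policy = rho / np.maximum(rho.sum(axis=1, keepdims=True), 1e-12)
    return policy, res.fun
```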

7.
Analyses of global climate policy as a sequential decision under uncertainty have been severely restricted by dimensionality and computational burdens, and have therefore limited the number of decision stages, discrete actions, or the number and type of uncertainties considered. In particular, two common simplifications are the use of two-stage models to approximate a multi-stage problem and exogenous formulations for inherently endogenous or decision-dependent uncertainties (in which the shock at time t+1 depends on the decision made at time t). In this paper, we present a stochastic dynamic programming formulation of the Dynamic Integrated Model of Climate and the Economy (DICE), and the application of approximate dynamic programming techniques to numerically solve for the optimal policy under uncertain and decision-dependent technological change in a multi-stage setting. We compare numerical results using two alternative value function approximation approaches, one parametric and one non-parametric. We show that increasing the variance of a symmetric mean-preserving uncertainty in abatement costs leads to higher optimal first-stage emission controls, but the effect is negligible when the uncertainty is exogenous. In contrast, the impact of decision-dependent cost uncertainty, a crude approximation of technology R&D, on optimal control is much larger, leading to higher control rates (lower emissions). Further, we demonstrate that the magnitude of this effect grows with the number of decision stages represented, suggesting that for decision-dependent phenomena, the conventional two-stage approximation will lead to an underestimate of the effect of uncertainty.
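A parametric value-function approximation in this spirit can be sketched as fitted backward dynamic programming; step and reward below are hypothetical stand-ins for the model's transition and per-stage welfare functions, and the polynomial fit stands in for the paper's parametric approximator (the real DICE state is multi-dimensional).

```python
import numpy as np

# Fitted backward DP sketch with a polynomial value-function
# approximation over a one-dimensional sampled state.

def fitted_backward_dp(sample_states, actions, step, reward, T,
                       n_scen=50, degree=3, beta=0.97, seed=0):
    rng = np.random.default_rng(seed)
    coef = np.zeros(degree + 1)          # terminal value function = 0
    for _ in range(T):                   # walk stages backward
        targets = []
        for s in sample_states:
            best = -np.inf
            for a in actions:
                shocks = rng.standard_normal(n_scen)
                s_next = step(s, a, shocks)          # sampled next states
                v_next = np.polyval(coef, s_next).mean()
                best = max(best, reward(s, a) + beta * v_next)
            targets.append(best)
        coef = np.polyfit(sample_states, targets, degree)
    return coef                          # stage-0 value approximation
```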

8.
The planning horizon is a key issue in production planning. Departing from previous approaches based on Markov decision processes, we study the planning horizon of capacity planning problems within the framework of stochastic programming. We first consider an infinite-horizon stochastic capacity planning model involving a single resource, a linear cost structure, and discrete distributions for general stochastic cost and demand data (non-Markovian and non-stationary). We give sufficient conditions for the existence of an optimal solution. Furthermore, we study the monotonicity property of the finite-horizon approximation of the original problem. We show that the optimal objective value and solution of the finite-horizon approximation problem converge to those of the infinite-horizon problem as the time horizon goes to infinity. These convergence results, together with the integrality of the decision variables, imply the existence of a planning horizon. We also develop a useful formula to calculate an upper bound on the planning horizon. Then, by decomposition, we show the existence of a planning horizon for a class of very general stochastic capacity planning problems with a complicated decision structure.
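The existence of a planning horizon suggests a simple empirical check, sketched below: solve finite-horizon truncations of increasing length until the first-stage decision stabilizes. Here solve_truncated is a hypothetical stand-in for the capacity-planning solve; the paper instead derives an analytical upper bound.

```python
# Empirical planning-horizon check: look for a run of truncation
# lengths whose optimal first-stage decision is unchanged.

def detect_planning_horizon(solve_truncated, T_max=200, streak=3):
    prev, run = None, 0
    for T in range(1, T_max + 1):
        first = solve_truncated(T)      # optimal first-stage decision
        run = run + 1 if first == prev else 1
        prev = first
        if run >= streak:               # stage-1 decision stabilized
            return T - streak + 1       # first length of the stable run
    return None                         # no stabilization within T_max
```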

9.
We propose a method for the control of multi-class queueing networks over a finite time horizon. We approximate the multi-class queueing network by a fluid network and formulate a fluid optimization problem, which we solve as a separated continuous linear program. The optimal fluid solution partitions the time horizon into intervals in which constant fluid flow rates are maintained. We then use a policy by which the queueing network tracks the fluid solution. To that end we model the deviations between the queueing and the fluid network in each of the intervals by a multi-class queueing network with some infinite virtual queues. We then keep these deviations stable by an adaptation of a maximum pressure policy. We show that this method is asymptotically optimal as the number of items that is processed and the processing speed increase. We illustrate these results through a simple example of a three-stage re-entrant line. Research supported in part by Israel Science Foundation Grants 249/02 and 454/05 and by the European Network of Excellence Euro-NGI.
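A maximum pressure policy of the kind adapted here chooses, at each station, the class whose service most reduces weighted queue imbalance. A minimal sketch on raw queue lengths (the paper applies the idea to deviations from the fluid solution, including infinite virtual queues):

```python
import numpy as np

# Maximum pressure dispatch: serve the class k at this station with
# the largest pressure mu_k * (Q_k - Q_downstream).

def max_pressure_action(Q, mu, next_class, station_of, station):
    """Q: queue length per class; next_class[k]: downstream class of
    k or None; station_of[k]: station serving class k."""
    best_k, best_p = None, 0.0
    for k in range(len(Q)):
        if station_of[k] != station:
            continue
        downstream = Q[next_class[k]] if next_class[k] is not None else 0.0
        pressure = mu[k] * (Q[k] - downstream)
        if pressure > best_p:
            best_k, best_p = k, pressure
    return best_k          # None means the station should idle
```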

10.
This paper studies a single-product, dynamic, non-stationary, stochastic inventory problem with capacity commitment, in which a buyer purchases a fixed capacity from a supplier at the beginning of a planning horizon and the buyer's total cumulative order quantity over the planning horizon is constrained by that capacity. The objective of the buyer is to choose the capacity at the beginning of the planning horizon and the order quantity in each period so as to minimize the expected total cost over the planning horizon. We characterize the structure of the minimum sum of the expected ordering, storage, and shortage costs in a period and thereafter, as well as the optimal ordering policy for a given capacity. Based on this structure, we identify conditions under which a myopic ordering policy is optimal and derive an equation for the optimal capacity commitment. We then use the optimal capacity and the myopic ordering policy to evaluate the effect of the various parameters on the minimum expected total cost over the planning horizon.
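When demand is, say, normal in each period, a myopic policy of the type characterized here reduces to ordering up to a newsvendor fractile, truncated by the remaining committed capacity. A sketch with illustrative cost parameters (h holding, b shortage, cp purchase), not the paper's exact optimality conditions:

```python
from scipy.stats import norm

# Myopic order-up-to rule under a cumulative capacity commitment.

def myopic_order(x, cap_left, mu, sigma, h, b, cp):
    fractile = (b - cp) / (b + h)           # critical ratio
    base_stock = norm.ppf(fractile, loc=mu, scale=sigma)
    return max(0.0, min(base_stock - x, cap_left))

# on-hand 20 units, 150 units of committed capacity remaining
print(myopic_order(x=20, cap_left=150, mu=100, sigma=30,
                   h=1.0, b=9.0, cp=4.0))
```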

11.
In [10] a system of stochastic programming models was introduced for the optimal control of a storage level. Each model in this system serves to determine the optimal policy for only one period ahead, though the time horizon consists of many future periods. The optimal control thus obtained can be considered an open-loop control methodology. The main purpose of this paper is to present an application, giving an optimal control method for the regulation of the water level of Lake Balaton in Hungary. By solving almost 600 stochastic programming problems, we analyze what would have happened if we had controlled the water level using our method between 1922 and 1970, with one decision period being one month. The numerical results show that the proposed control methodology works quite well in this case.

12.
A wireless sensor network is a network consisting of distributed autonomous electronic devices called sensors. Sensors have limited energy and capability for sensing, data processing, and communicating, but they can collectively behave to provide an effective network that monitors an area and transmits information to gateway nodes or sinks, either directly or through other sensor nodes. In most applications the network must operate for long periods of time, so the available energy resources of the sensors must be managed efficiently. In this paper, we first develop a mixed integer linear programming model to maximize network lifetime by optimally determining locations of sensors and sinks, activity schedules of deployed sensors, and data flow routes from sensors to sinks over a finite planning horizon subject to coverage, flow conservation, energy consumption, and budget constraints. Unfortunately, it is difficult to solve this model exactly even for small instances. Therefore, we propose two approximate solution methods: a Lagrangean heuristic and a two-stage heuristic in which sensors are deployed and an activity schedule is found in the first stage, whereas sinks are located and sensor-to-sink data flow routes are determined in the second stage. Computational experiments performed on various test instances indicate that the Lagrangean heuristic is both efficient and accurate and also outperforms the two-stage heuristic.

13.
In this paper we present rules concerning the optimal policy and stability regions for the single product periodic review inventory problem with stationary demands, over a finite horizon. The key parameter to the whole study is the Lot-Sizing Index (LSI) introduced by Blackburn and Millen. Two algorithms are presented. The first one constructs stability regions which are expressed as intervals of the LSI parameter, covering the whole range of its values. The proposed algorithm is very simple to understand and implement, and most importantly, it provides a solution table which can be used by the decision maker to easily determine the optimal policy for any problem with a given horizon and any possible combination of its cost parameters, namely any LSI value. The second proposed algorithm determines the optimal policy for any given LSI value; it constitutes a completely different approach to that of the Wagner–Whitin algorithm and requires very little computational effort.

14.
We develop a multi-stage stochastic programming approach to optimize the bidding strategy of a virtual power plant (VPP) operating on the Spanish spot market for electricity. The VPP markets electricity produced in the wind parks it manages on the day-ahead market and on six staggered auction-based intraday markets. Uncertainty enters the problem via stochastic electricity prices as well as uncertain wind energy production. We set up the problem of bidding for one day of operation as a Markov decision process (MDP) that is solved using a variant of the stochastic dual dynamic programming algorithm. We conduct an extensive out-of-sample comparison demonstrating that the optimal policy obtained by the stochastic program clearly outperforms deterministic planning, a pure day-ahead strategy, a benchmark that only uses the day-ahead market and the first intraday market, as well as a proprietary stochastic programming approach developed in the industry. Furthermore, we study the effect of risk aversion as modeled by the nested Conditional Value-at-Risk as well as the impact of changes in various problem parameters.
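The single-stage building block of the nested Conditional Value-at-Risk used here is the sample CVaR of a loss distribution, sketched below.

```python
import numpy as np

# Sample Conditional Value-at-Risk: the mean of the worst
# (1 - alpha) tail of the loss distribution.

def cvar(losses, alpha=0.95):
    losses = np.sort(np.asarray(losses))
    tail = losses[int(np.ceil(alpha * len(losses))):]
    return tail.mean()

rng = np.random.default_rng(0)
print(cvar(rng.normal(100.0, 20.0, 10_000), alpha=0.9))
```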

15.
Managing capacity flexibility in make-to-order production environments
This paper addresses the problem of managing flexible production capacity in a make-to-order (MTO) manufacturing environment. We present a multi-period capacity management model where we distinguish between process flexibility (the ability to produce multiple products on multiple production lines) and operational flexibility (the ability to dynamically change capacity allocations among different product families over time). For operational flexibility, we consider two policies: a fixed allocation policy, where the capacity allocations are fixed throughout the planning horizon, and a dynamic allocation policy, where the capacity allocations change from period to period. The former approach is modeled as a single-stage stochastic program and solved using a cutting-plane method. The latter approach is modeled as a multi-stage stochastic program, and a sampling-based decomposition method is presented to identify a feasible policy and assess the quality of that policy. A computational experiment quantifies the benefits of operational flexibility and demonstrates that it is most beneficial when demand and capacity are well balanced and demand variability is high. Additionally, our results reveal that myopic operating policies may lead a firm to adopt more process flexibility and form denser flexibility configuration chains. That is, process flexibility may be over-valued in the literature, since it is assumed that a firm will operate optimally after the process flexibility decision. We also show that the value of process flexibility increases with the number of periods in the planning horizon if an optimal operating policy is employed. This result is reversed if a myopic allocation policy is adopted instead.

16.
We present in this paper several asymptotic properties of constrained Markov decision processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite-horizon MDPs to the infinite-horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with an infinite state space, and (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with an infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less than or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem. This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII.

17.
We consider an approximation scheme for solving Markov decision processes (MDPs) with a countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call “approximate receding horizon control.” We first analyze the performance of the approximate receding horizon control for the infinite-horizon average reward under an ergodicity assumption, which also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of the approximate receding horizon control via lower bounds to the exact solution to the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second policy is based on a generalization of the single-policy improvement for multiple policies. Along the way, we also provide a simple alternative proof of the policy improvement for a countable state space. We finally discuss practical implementations of these schemes via simulation.
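The basic receding-horizon construction the abstract analyzes can be sketched as: solve an H-horizon sub-MDP by backward induction and freeze the resulting first-stage greedy decision rule as a stationary policy (a minimal finite-state sketch; the paper treats countable state spaces and approximate sub-MDP solutions).

```python
import numpy as np

# Receding-horizon sketch: H-step backward induction, then freeze
# the first-stage greedy action as a stationary policy (H >= 1).

def receding_horizon_policy(P, R, H, beta=0.95):
    """P: (nA, nS, nS) transitions; R: (nA, nS) rewards."""
    nA, nS, _ = P.shape
    V = np.zeros(nS)
    for _ in range(H):                   # H-step backward induction
        Q = R + beta * np.einsum('asj,j->as', P, V)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)              # frozen first-stage action
```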

18.
Military medical planners must develop dispatching policies that dictate how aerial medical evacuation (MEDEVAC) units are utilized during major combat operations. The objective of this research is to determine how to optimally dispatch MEDEVAC units in response to 9-line MEDEVAC requests to maximize MEDEVAC system performance. A discounted, infinite horizon Markov decision process (MDP) model is developed to examine the MEDEVAC dispatching problem. The MDP model allows the dispatching authority to accept, reject, or queue incoming requests based on a request’s classification (i.e., zone and precedence level) and the state of the MEDEVAC system. A representative planning scenario based on contingency operations in southern Afghanistan is utilized to investigate the differences between the optimal dispatching policy and three practitioner-friendly myopic policies. Two computational experiments are conducted to examine the impact of selected MEDEVAC problem features on the optimal policy and the system performance measure. Several excursions are examined to identify how the 9-line MEDEVAC request arrival rate and the MEDEVAC flight speeds impact the optimal dispatching policy. Results indicate that dispatching MEDEVAC units considering the precedence level of requests and the locations of busy MEDEVAC units increases the performance of the MEDEVAC system. These results inform the development and implementation of MEDEVAC tactics, techniques, and procedures by military medical planners. Moreover, an analysis of solution approaches for the MEDEVAC dispatching problem reveals that the policy iteration algorithm substantially outperforms the linear programming algorithms executed by CPLEX 12.6 with regard to computational effort. This result supports the claim that policy iteration remains the superlative solution algorithm for exactly solving computationally tractable Markov decision problems.
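For reference, the policy iteration algorithm the abstract benchmarks is, for a generic discounted finite MDP (the paper's model additionally handles accept/reject/queue actions and request classes):

```python
import numpy as np

# Policy iteration: exact policy evaluation by a linear solve,
# followed by greedy improvement, until the policy is stable.

def policy_iteration(P, R, beta=0.99):
    """P: (nA, nS, nS) transitions; R: (nA, nS) rewards."""
    nA, nS, _ = P.shape
    policy = np.zeros(nS, dtype=int)
    while True:
        # evaluation: solve (I - beta * P_pi) V = R_pi exactly
        P_pi = P[policy, np.arange(nS)]
        R_pi = R[policy, np.arange(nS)]
        V = np.linalg.solve(np.eye(nS) - beta * P_pi, R_pi)
        # improvement: greedy one-step lookahead
        Q = R + beta * np.einsum('asj,j->as', P, V)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```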

19.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure for determining an optimal policy relative to a procedure for determining a suboptimal policy based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise. This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.

20.
This paper considers the mobile facility routing and scheduling problem with stochastic demand (MFRSPSD). The MFRSPSD simultaneously determines the route and schedule of a fleet of mobile facilities (MFs) which serve customers with uncertain demand, so as to minimize the total cost generated during the planning horizon. The problem is formulated as a two-stage stochastic programming model, in which the first-stage decision deals with the temporal and spatial movement of MFs and the second stage handles how MFs serve customer demands. An algorithm based on the multicut version of the L-shaped method is proposed, in which several lower-bound inequalities are developed and incorporated into the master program. The computational results show that the algorithm yields a tighter lower bound and converges faster to the optimal solution. The result of a sensitivity analysis further indicates that, in dealing with stochastic demand, the two-stage stochastic programming approach has a distinct advantage, in terms of cost reduction, over a model that considers only the average demand.
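The workhorse of a multicut L-shaped method is the optimality cut extracted from each scenario subproblem's dual solution. A sketch using SciPy's HiGHS interface (illustrative only; the duals' sign convention should be verified against the master problem's cut orientation):

```python
import numpy as np
from scipy.optimize import linprog

# One optimality cut for a scenario subproblem of the form
#   Q_k(x) = min { q @ y : W y = h_k - T_mat x, y >= 0 }.
# The duals pi yield the cut  theta_k >= pi @ h_k - (pi @ T_mat) @ x.

def optimality_cut(q, W, T_mat, h_k, x):
    res = linprog(q, A_eq=W, b_eq=h_k - T_mat @ x,
                  bounds=[(0, None)] * len(q), method='highs')
    pi = res.eqlin.marginals             # equality-row duals (HiGHS)
    return pi @ h_k, pi @ T_mat, res.fun # cut intercept, slope, Q_k(x)
```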
