Similar Articles
20 similar articles found (search time: 683 ms)
1.
This paper deals with the optimal production planning for a single product over a finite horizon. The holding and production costs are assumed quadratic, as in the Holt, Modigliani, Muth and Simon (HMMS) [7] model. The cumulative demand is compound Poisson, and a chance constraint is included to guarantee that the inventory level is positive with probability at least α at each time point. The resulting stochastic optimization problem is transformed into a deterministic optimal control problem with control-variable and state-variable inequality constraints, and a discussion of the optimal solution is presented. The form of the optimal control (production rate) is obtained as follows: if there exists a time t1 ∈ [0, T], where T is the end of the planning period, then (i) produce nothing until t1 and (ii) produce at a rate equal to the expected demand plus a 'correction factor' between t1 and T. If t1 is found to be greater than T, then the optimal decision is to produce nothing and always meet the demand from inventory.

2.
One-armed bandit models with continuous and delayed responses
One-armed bandit processes with continuous delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that under some regularity conditions, a Gittins-like index exists which is the limit of a monotonic sequence of break-even values characterizing optimal initial selections of arms for finite horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. Results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index, and the Gittins index strategy is optimal. Acknowledgement. We thank an anonymous referee for constructive and insightful comments, especially those related to the notion of the Gittins index. Both authors are funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada.
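The monotone break-even construction described in this abstract can be sketched numerically. The minimal example below is an assumption-laden stand-in, not the paper's model: it uses a Bernoulli unknown arm with a Beta(1, 1) prior and a known arm paying a constant rate `lam` per pull, with discount `BETA`. The break-even `lam` for each finite horizon is found by bisection, and the sequence increases with the horizon toward a Gittins-like index.

```python
from functools import lru_cache

BETA = 0.9  # discount factor (an illustrative assumption)

@lru_cache(maxsize=None)
def value(n, a, b, lam):
    """Optimal n-step value of a one-armed bandit: an unknown Bernoulli arm
    with Beta(a, b) posterior versus a known arm paying lam per pull."""
    if n == 0:
        return 0.0
    safe = lam * (1 - BETA ** n) / (1 - BETA)   # retire to the known arm
    p = a / (a + b)                              # posterior mean of unknown arm
    risky = p * (1 + BETA * value(n - 1, a + 1, b, lam)) \
        + (1 - p) * BETA * value(n - 1, a, b + 1, lam)
    return max(safe, risky)

def break_even(n, a=1, b=1, lo=0.0, hi=1.0, tol=1e-6):
    """Bisect for the lam at which first pulling the unknown arm
    stops being optimal for the n-step problem."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        safe = mid * (1 - BETA ** n) / (1 - BETA)
        if value(n, a, b, mid) > safe + 1e-12:
            lo = mid     # unknown arm still strictly better: raise lam
        else:
            hi = mid
    return (lo + hi) / 2
```

For horizon 1 the break-even rate equals the prior mean 0.5; longer horizons raise it, reflecting the option value of experimentation.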

3.
The system investigated consists of a stochastic periodic stream of raw material, a continuous processing operation with controllable deterministic service rates, and a storage facility. The arrival stream is periodically interrupted and divided into alternating on-off intervals of fixed length. The processing facility is allowed to operate during the off-interval. Superimposed on this system is a cost structure composed of processing and holding costs. Such operations may be found in manufacturing as well as service systems (for example, dry cleaners, machine shops, repair and maintenance shops, printers, information processing centers, etc.). A service rate control rule that minimizes the infinite-horizon discounted expected total cost is found. Existence and uniqueness of long-term optimal cost and policy functions are shown. Since the optimal policy cannot be expressed explicitly, an approximate solution was obtained. An error bound on the optimal cost associated with this solution is exhibited. The approximate solution is characterized by a service rate control rule that is a linear function of the level of inventory at the start of each on-interval and a piecewise linear function of inventory at the start of each off-interval. The optimal discounted expected total cost is quadratic in the inventory level at the start of each interval. Computational results indicate relative cost errors on the order of 2–3 percent. This research was performed at the Sanitary Engineering Research Laboratory and Operations Research Center of the University of California, Berkeley. It was made possible by US Public Health Research Grant UI-00547 from the Environmental Control Administration-Bureau of Solid Waste Management and by National Science Foundation Grant GK-1684. The author thanks Professor C. R. Glassey for not only suggesting this research, but for his constant encouragement and suggestions throughout its duration. He also thanks Professors W. S. Jewell and P. H. McGauhey, whose comments on the draft were very helpful.

4.
An incomplete financial market is considered with a risky asset and a bond. The risky asset price is a pure jump process whose dynamics depends on a jump-diffusion stochastic factor describing the activity of other markets, macroeconomic factors or microstructure rules that drive the market. With a stochastic control approach, maximization of the expected utility of terminal wealth is discussed for utility functions of constant relative risk aversion type. Under suitable assumptions, closed form solutions for the value functions and for the optimal strategy are provided and verification results are discussed. Moreover, the solution to the dual problems associated with the utility maximization problems is derived.

5.
Finite and infinite planning horizon Markov decision problems are formulated for a class of jump processes with general state and action spaces and controls which are measurable functions on the time axis taking values in an appropriate metrizable vector space. For the finite horizon problem, the maximum expected reward exists, is the unique solution of a certain differential equation, and is a strongly continuous function in the space of upper semi-continuous functions. A necessary and sufficient condition is provided for an admissible control to be optimal, and a sufficient condition is provided for the existence of a measurable optimal policy. For the infinite horizon problem, the maximum expected total reward is the fixed point of a certain operator on the space of upper semi-continuous functions. A stationary policy is optimal over all measurable policies in the transient and discounted cases as well as, with certain added conditions, in the positive and negative cases.

6.
This paper investigates finite horizon semi-Markov decision processes with denumerable states. Optimality is considered over the class of all randomized history-dependent policies, which may depend on the states as well as on the planning horizons, and the cost rate function is assumed to be bounded below. Under suitable conditions, we show that the value function is a minimum nonnegative solution to the optimality equation and that there exists an optimal policy. Moreover, we develop an effective algorithm for computing optimal policies, derive some properties of optimal policies, and illustrate our main results with a maintenance system.

7.
This paper investigates the problem of the optimal switching among a finite number of Markov processes, generalizing some of the author's earlier results for controlled one-dimensional diffusion. Under rather general conditions, it is shown that the optimal discounted cost function is the unique solution of a functional equation. Under more restrictive assumptions, this function is shown to be the unique solution of some quasi-variational inequalities. These assumptions are verified for a large class of control problems. For controlled Markov chains and controlled one-dimensional diffusion, the existence of a stationary optimal policy is established. Finally, a policy iteration method is developed to calculate an optimal stationary policy, if one exists. This research was sponsored by the Air Force Office of Scientific Research (AFSC), United States Air Force, under Contract No. F-49620-79-C-0165. The author would like to thank the referee for bringing Refs. 7, 8, and 9 to his attention.

8.
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space is comprised solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.
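The construction of a transformed MDP on the controllable states alone can be sketched with standard absorbing-chain algebra. The hypothetical four-state example below (two controllable states 0–1 with actions, two uncontrollable states with a fixed kernel) is an illustrative assumption: an excursion through the uncontrollable block is collapsed into a single aggregated transition.

```python
import numpy as np

# Controllable-to-controllable and controllable-to-uncontrollable kernels,
# one matrix per action (all numbers are illustrative assumptions).
Pcc = {'a': np.array([[0.5, 0.2], [0.1, 0.3]]),
       'b': np.array([[0.1, 0.1], [0.2, 0.2]])}
Pcu = {'a': np.array([[0.2, 0.1], [0.3, 0.3]]),
       'b': np.array([[0.4, 0.4], [0.3, 0.3]])}
Q = np.array([[0.2, 0.3], [0.1, 0.4]])    # uncontrollable -> uncontrollable
R = np.array([[0.3, 0.2], [0.25, 0.25]])  # uncontrollable -> controllable

# H[u, c] = probability that an excursion starting in uncontrollable state u
# first re-enters the controllable set at state c: H = (I - Q)^{-1} R.
H = np.linalg.solve(np.eye(2) - Q, R)

# Aggregated kernel on the controllable states only.
P_agg = {a: Pcc[a] + Pcu[a] @ H for a in ('a', 'b')}
```

Each `P_agg[a]` is again a proper stochastic matrix, so the reduced problem can be handed to any standard MDP solver, which is the point of the transformation.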

9.
A comprehensible and unified system control approach is presented to solve a class of production/inventory smoothing problems. A nonstationary, non-Gaussian, finite-time linear optimal solution with an attractive computation scheme is obtained for a general quadratic and linear cost structure. A complete solution to a classical production/inventory control problem is given as an example. A general solution to the discrete-time optimal regulator with arbitrary but known disturbance is provided and discussed in detail. A computationally attractive closed-loop suboptimal scheme is presented for problems with constraints or nonquadratic costs. Implementation and interpretation of the results are discussed.

10.
Consider the optimal stopping problem of a one-dimensional diffusion with positive discount. Based on Dynkin's characterization of the value as the minimal excessive majorant of the reward and considering its Riesz representation, we give an explicit equation to find the optimal stopping threshold for problems with one-sided stopping regions, and an explicit formula for the value function of the problem. This representation also sheds light on the validity of the smooth-fit (SF) principle. The results are illustrated by solving some classical problems, and also through the solution of optimal stopping of the skew Brownian motion and of the sticky Brownian motion, including cases in which the SF principle fails.
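A discrete analogue of the one-sided threshold can be computed by value iteration. The sketch below replaces the diffusion with a symmetric random walk on a finite grid (the reward, discount, and grid size are illustrative assumptions, not the paper's setting); for a convex increasing reward the stopping region comes out as an upper interval, consistent with the one-sided case treated in the abstract.

```python
import numpy as np

def solve_stopping(reward, beta=0.99, n=101, iters=5000):
    """Discounted optimal stopping of a symmetric random walk on
    {0, ..., n-1} with reflecting ends, solved by value iteration."""
    x = np.arange(n)
    g = reward(x).astype(float)
    v = g.copy()
    for _ in range(iters):
        # continuation value: discount times average of the two neighbors
        cont = beta * 0.5 * (v[np.minimum(x + 1, n - 1)]
                             + v[np.maximum(x - 1, 0)])
        v = np.maximum(g, cont)   # stop (take g) or continue
    stop = v <= g + 1e-9          # states where stopping is optimal
    return v, stop
```

For a call-style reward `max(x - K, 0)` the stopping set is an upper interval with threshold above K; the threshold is where the value function meets the reward, mirroring the explicit threshold equation of the abstract.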

11.
Stationarity and Markov Properties of the Optimal-Solution Process for Stochastic Programming Problems with Stochastic Processes
It is proved that, for a stochastic programming problem involving a stochastic process, the optimal solution set contains at least one sequence of optimal solutions each of which is a measurable stochastic process; moreover, if the stochastic process in the problem is stationary and Markovian, then the optimal-solution process of the problem possesses the corresponding properties as well.

12.
Decision makers often face the need of a performance guarantee with some sufficiently high probability. Such problems can be modelled using a discrete time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability that the total discounted reward reaches a target value within the planning horizon. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for a finite state space. We give an example concerning the reliability of a satellite system.
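The finite-horizon computation can be sketched as a dynamic program over an augmented state (current state, remaining target, remaining stages). The two-state chain, integer rewards, and undiscounted setting below are illustrative assumptions, simpler than the paper's discounted model:

```python
# (state, action) -> list of (next_state, probability); toy kernel.
P = {(0, 'a'): [(0, 0.5), (1, 0.5)],
     (0, 'b'): [(0, 0.9), (1, 0.1)],
     (1, 'a'): [(1, 1.0)],
     (1, 'b'): [(0, 1.0)]}
R = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 2, (1, 'b'): 0}  # integer rewards

def prob_hit(state, target, horizon, cache=None):
    """Max probability of accumulating total reward >= target
    within `horizon` stages, starting from `state`."""
    if cache is None:
        cache = {}
    if target <= 0:
        return 1.0          # target already met
    if horizon == 0:
        return 0.0          # out of stages, target not met
    key = (state, target, horizon)
    if key not in cache:
        cache[key] = max(
            sum(pr * prob_hit(s2, target - R[(state, a)], horizon - 1, cache)
                for s2, pr in P[(state, a)])
            for a in ('a', 'b'))
    return cache[key]
```

The recursion carries the remaining target as part of the state, which is exactly why this criterion cannot be expressed in a standard expected-reward MDP over the original state space alone.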

13.
Value iteration and optimization of multiclass queueing networks
Chen, Rong-Rong; Meyn, Sean. Queueing Systems, 1999, 32(1–3): 65–97.
This paper considers in parallel the scheduling problem for multiclass queueing networks, and optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when the algorithm is not initialized properly. The most typical case where the initial value function is taken to be zero may be a particularly bad choice. In contrast, if the value iteration algorithm is initialized with a stochastic Lyapunov function, then the following hold: (i) a stochastic Lyapunov function exists for each intermediate policy, and hence each policy is regular (a strong stability condition), (ii) intermediate costs converge to the optimal cost, and (iii) any limiting policy is average cost optimal. It is argued that a natural choice for the initial value function is the value function for the associated deterministic control problem based upon a fluid model, or the approximate solution to Poisson’s equation obtained from the LP of Kumar and Meyn. Numerical studies show that either choice may lead to fast convergence to an optimal policy. This revised version was published online in June 2006 with corrections to the Cover Date.
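The role of initialization can be sketched on a small example. The toy model below is a truncated single-server queue with two selectable service rates, a stand-in for a multiclass network (all rates, costs, and the truncation are illustrative assumptions). Relative value iteration from a zero start and from a quadratic, fluid-value-like start converge to the same average cost.

```python
import numpy as np

N = 50                   # queue truncation level
p_arr = 0.4              # arrival probability per slot
mus = [0.45, 0.8]        # slow / fast service completion probabilities
serv_cost = [0.0, 2.0]   # extra per-slot cost of the fast rate
x = np.arange(N + 1)

def bellman(v):
    """One step of value iteration for the controlled queue."""
    vals = []
    for mu, c in zip(mus, serv_cost):
        m = np.where(x > 0, mu, 0.0)   # no service in an empty queue
        up = p_arr * (1 - m)           # arrival, no completion
        down = m * (1 - p_arr)         # completion, no arrival
        stay = 1.0 - up - down         # neither, or both
        cont = (up * v[np.minimum(x + 1, N)]
                + down * v[np.maximum(x - 1, 0)]
                + stay * v)
        vals.append(x + c + cont)      # holding cost x plus service cost
    return np.minimum(vals[0], vals[1])

def avg_cost(v0, iters=5000):
    """Relative value iteration from v0; returns the average-cost estimate."""
    v = v0.astype(float)
    for _ in range(iters):
        w = bellman(v)
        g = w - v
        v = w - w[0]   # renormalize so the iterates stay bounded
    return 0.5 * (g.max() + g.min())
```

Calling `avg_cost(np.zeros(N + 1))` and `avg_cost(x.astype(float) ** 2 / 10)` gives the same average cost; the abstract's point is that in larger networks the fluid-value-like start converges much faster than the zero start.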

14.
We consider the problem of control for continuous time stochastic hybrid systems in finite time horizon. The systems considered are nonlinear: the state evolution is a nonlinear function of both the control and the state. The control parameters change at discrete times according to an underlying controlled Markov chain which has finite state and action spaces. The objective is to design a controller which would minimize an expected nonlinear cost of the state trajectory. We show using an averaging procedure, that the above minimization problem can be approximated by the solution of some deterministic optimal control problem. This paper generalizes our previous results obtained for systems whose state evolution is linear in the control. This work is supported by the Australian Research Council. All correspondence should be directed to the first author.

15.
The problem of when, if ever, a stand of old-growth forest should be harvested is formulated as an optimal stopping problem, and a decision rule to maximize the expected present value of amenity services plus timber benefits is found analytically. This solution can be thought of as providing the “correct” way in which cost-benefit analysis should be carried out. Future values of amenity services provided by the standing forest and/or timber are considered to be uncertain and are modeled by Geometric Poisson Jump (GPJ) processes. This specification avoids the ambiguity which arises with Geometric Brownian Motion (GBM) models as to which form of stochastic integral (Itô or Stratonovich) should be employed, but more importantly allows for monotonic (yet stochastic) processes. It is shown that monotonicity (or lack of it) in the value of amenity services relative to timber values plays an important part in the solution. If amenity values never go down (or never go up) relative to timber values, then the certainty-equivalence cost-benefit procedure provides the optimal solution, and there is no option value. It is only to the extent that the relative valuations can change direction that the certainty-equivalence procedure becomes sub-optimal and option value arises.

16.
A finite collection of piecewise-deterministic processes are controlled in order to minimize the expected value of a performance functional with continuous operating cost and discrete switching control costs. The solution of the associated dynamic programming equation is obtained by an iterative approximation using optimal stopping time problems. This research was supported in part by NSF Grant No. DMS-8508651 and by University of Tennessee Science Alliance Research Incentive Award.

17.
A finite-state Markov decision process, in which, associated with each action in each state, there are two rewards, is considered. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they are dependent on the starting state. Also, a finite algorithm for computing the solution is given.

18.
We find the closed form formula for the price of the perpetual American lookback spread option, whose payoff is the difference of the running maximum and minimum prices of a single asset. We solve an optimal stopping problem related to both maximum and minimum. We show that the spread option is equivalent to some fixed strike options on some domains, find the exact form of the optimal stopping region, and obtain the solution of the resulting partial differential equations. The value function is not differentiable. However, we prove the verification theorem due to the monotonicity of the maximum and minimum processes.

19.
The paper presents a comparison between two different flavors of nonlinear models to be used for the approximate solution of T-stage stochastic optimization (TSO) problems, a typical paradigm of Markovian decision processes. Specifically, the well-known class of neural networks is compared with a semi-local approach based on kernel functions, characterized by less demanding computational requirements. To this purpose, two alternative methods for the numerical solution of TSO are considered, one corresponding to the classic approximate dynamic programming (ADP) and the other based on a direct optimization of the optimal control functions, introduced here for the first time. Advantages and drawbacks in the TSO context of the two classes of approximators are analyzed, in terms of computational burden and approximation capabilities. Then, their performances are evaluated through simulations in two important high-dimensional TSO test cases, namely inventory forecasting and water reservoirs management.

20.
Discounted Semi-Markov Decision Processes with Nonnegative Costs
黄永辉 (Huang Yonghui); 郭先平 (Guo Xianping). 数学学报 (Acta Mathematica Sinica), 2010, 53(3): 503–514.
This paper considers discounted semi-Markov decision processes with countable states and nonnegative costs. A continuous-time semi-Markov decision process is first constructed from a given semi-Markov decision kernel and a policy; the minimal nonnegative solution method is then used to show that the value function satisfies the optimality equation and that an ε-optimal stationary policy exists. Conditions for the existence of an optimal policy, together with some of its properties, are further given. Finally, a value iteration algorithm and a numerical example are presented.
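The minimal-nonnegative-solution idea behind the value iteration algorithm can be sketched in a discrete-time simplification of the semi-Markov setting (the two-state model, costs, and discount below are illustrative assumptions). Starting from the zero function, the iterates increase monotonically to the value function, the minimal nonnegative solution of the optimality equation.

```python
BETA = 0.8  # discount factor (assumed)

# (state, action) -> list of (next_state, probability), and one-step costs.
P = {(0, 'stay'): [(0, 1.0)], (0, 'go'): [(1, 1.0)],
     (1, 'stay'): [(1, 1.0)], (1, 'go'): [(0, 1.0)]}
C = {(0, 'stay'): 2.0, (0, 'go'): 1.0,
     (1, 'stay'): 0.5, (1, 'go'): 1.0}

def iterate(v):
    """One application of the optimality operator T to the value function v."""
    return {s: min(C[(s, a)] + BETA * sum(pr * v[t] for t, pr in P[(s, a)])
                   for a in ('stay', 'go'))
            for s in (0, 1)}

# Value iteration from v = 0; history records the monotone iterates.
history = [{0: 0.0, 1: 0.0}]
for _ in range(60):
    history.append(iterate(history[-1]))
```

In this toy model the fixed point is v(0) = 3 and v(1) = 2.5 (go from state 0, then stay in state 1 forever), and every iterate lies below it.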

