Similar Articles
Retrieved 20 similar articles in 31 ms.
1.
2.
An optimal replacement policy for a multistate degenerative simple system
In this paper, a degenerative simple system (i.e. a degenerative one-component system with one repairman) with k + 1 states, comprising k failure states and one working state, is studied. We assume that the system after repair is not "as good as new" and that its degeneration is stochastic. Under these assumptions, we consider a new replacement policy T based on the system age. The problem is to determine an optimal replacement policy T that minimizes the average cost rate (i.e. the long-run average cost per unit time) of the system. We derive an explicit expression for the average cost rate, determine the corresponding optimal replacement policy, obtain an explicit expression for the minimum average cost rate, and prove, under mild conditions, the existence and uniqueness of the optimal policy T. Further, we show that the repair model for the multistate system forms a general monotone process repair model that includes the geometric process repair model as a special case. We also show that this repair model is equivalent to a geometric process repair model for a two-state degenerative simple system, in the sense that the two have the same average cost rate and the same optimal policy. Finally, a numerical example illustrates the theoretical results.
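For orientation, the age-based policy-T idea can be illustrated with the classical renewal-reward age-replacement model, a much simpler stand-in for the paper's multistate degenerative system. The sketch below assumes an invented Weibull lifetime and invented planned/failure replacement costs c_p and c_f, and minimizes the average cost rate by grid search.

```python
import numpy as np

# Hedged sketch: classical age-replacement average cost rate, a simpler
# stand-in for the paper's multistate model. The Weibull lifetime and the
# costs c_p (planned) and c_f (failure replacement) are assumptions.

def average_cost_rate(T, shape=2.0, scale=10.0, c_p=1.0, c_f=5.0, n=2000):
    t = np.linspace(0.0, T, n)
    survival = np.exp(-((t / scale) ** shape))         # Weibull survival S(t)
    failure_prob = 1.0 - survival[-1]                  # P(failure before age T)
    expected_cycle = np.sum(survival) * (t[1] - t[0])  # E[min(lifetime, T)]
    return (c_p * (1.0 - failure_prob) + c_f * failure_prob) / expected_cycle

# Grid search for the replacement age T* minimizing the cost rate.
grid = np.linspace(0.5, 30.0, 300)
T_star = min(grid, key=average_cost_rate)
print(f"T* ~ {T_star:.2f}, cost rate ~ {average_cost_rate(T_star):.4f}")
```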

3.
The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost allow an average cost optimal policy to be approximated by solving n-stage optimization problems (n = 1, 2, ...). The approach used ensures an exponential rate of convergence. An approximation of this type is useful for designing adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability. Research supported in part by Consejo Nacional de Ciencia y Tecnología (CONACYT) under grant 0635P-E9506 and by Fondo del Sistema de Investigación del Mar de Cortés under grant SIMAC/94/CT-005.
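The flavor of the n-stage approximation can be seen in a finite toy model: relative value iteration converges geometrically in span seminorm to the average-cost solution. The 2-state, 2-action MDP below is invented for illustration; the paper itself works on Borel spaces with unbounded costs.

```python
import numpy as np

# Toy sketch: relative value iteration on an invented 2-state, 2-action MDP.
# The span of successive differences contracts geometrically, mirroring the
# exponential convergence rate discussed in the paper.

P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # P[a, s, s']: transition kernels
              [[0.2, 0.8], [0.7, 0.3]]])
c = np.array([[1.0, 3.0],                  # c[a, s]: one-stage costs
              [2.0, 0.5]])

v = np.zeros(2)
for n in range(1000):
    Tv = (c + P @ v).min(axis=0)          # one dynamic-programming stage
    gain = Tv[0] - v[0]                   # running estimate of the average cost
    span = np.ptp(Tv - v)
    v = Tv - Tv[0]                        # renormalize to keep iterates bounded
    if span < 1e-10:
        break

policy = (c + P @ v).argmin(axis=0)
print(f"average cost ~ {gain:.4f}, policy = {policy}")
```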

4.
Using the decomposition of the solution of an SDE, we consider the stochastic optimal control problem with anticipative controls as a family of deterministic control problems parametrized by the paths of the driving Wiener process and of a newly introduced Lagrange multiplier stochastic process (nonanticipativity equality constraint). It is shown that the value function of these problems is the unique global solution of a robust equation (a random partial differential equation) associated with a linear backward Hamilton-Jacobi-Bellman stochastic partial differential equation (HJB SPDE). This appears as the limiting SPDE for a sequence of random HJB PDEs when a linear interpolation approximation of the Wiener process is used. Our approach extends Wong-Zakai type results [20] from SDEs to the stochastic dynamic programming equation by showing how the latter arises as the average of the limit of a sequence of deterministic dynamic programming equations. The stochastic characteristics method of Kunita [13] is used to represent the value function. By choosing the Lagrange multiplier equal to its nonanticipative constraint value, the usual stochastic (nonanticipative) optimal control and optimal cost are recovered. This suggests a method for solving anticipative control problems by almost sure deterministic optimal control. We obtain a PDE for the "cost of perfect information", the difference between the cost function of the nonanticipative control problem and the cost of the anticipative problem, which satisfies a nonlinear backward HJB SPDE. Poisson bracket conditions are found ensuring that this equation has a global solution. The cost of perfect information is shown to be zero when a Lagrangian submanifold is invariant for the stochastic characteristics. The LQG problem and a nonlinear anticipative control problem are considered as examples in this framework.

5.
We deal with a discrete-time finite horizon Markov decision process with locally compact Borel state and action spaces, and possibly unbounded cost function. Based on Lipschitz continuity of the elements of the control model, we propose a state and action discretization procedure for approximating the optimal value function and an optimal policy of the original control model. We provide explicit bounds on the approximation errors. Our results are illustrated by a numerical application to a fisheries management problem.
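A minimal sketch of the discretization idea follows (with deterministic dynamics for brevity, whereas the paper treats stochastic transition kernels): discretize state and action on grids, project successor states onto the state grid, and run backward induction over the finite horizon. The logistic-growth dynamics and harvest-style cost, loosely echoing the fisheries application, are assumptions.

```python
import numpy as np

# Hedged sketch: state/action grid discretization with backward induction.
# Dynamics, cost, grids, and horizon are all invented for illustration.

S = np.linspace(0.0, 1.0, 101)          # state grid
A = np.linspace(0.0, 0.5, 26)           # action grid
horizon = 10

def dynamics(x, a):                     # assumed Lipschitz dynamics
    return np.clip(x + 0.3 * x * (1.0 - x) - a, 0.0, 1.0)

def cost(x, a):                         # negative harvest reward (assumed)
    return -np.sqrt(a) + 0.1 * (0.5 - x) ** 2

V = np.zeros(len(S))                    # terminal value
policy = np.zeros((horizon, len(S)), dtype=int)
for t in reversed(range(horizon)):
    Q = np.empty((len(S), len(A)))
    for j, a in enumerate(A):
        nxt = dynamics(S, a)
        idx = np.abs(S[None, :] - nxt[:, None]).argmin(axis=1)  # nearest grid point
        Q[:, j] = cost(S, a) + V[idx]
    policy[t] = Q.argmin(axis=1)
    V = Q.min(axis=1)

print(V[:5])                            # approximate optimal values at low stock levels
```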

6.
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.

7.
This paper considers an optimal maintenance policy for a practical, repairable deteriorating system subject to random shocks. Modeling the repair time by a geometric process and the failure mechanism by a generalized δ-shock process, we develop an explicit expression for the long-run average cost per unit time of the system under a threshold-type replacement policy. Based on this average cost function, we propose a finite search algorithm to locate the optimal replacement policy N that minimizes the average cost rate. We further prove that the optimal policy N is unique and present some numerical examples. Many practical systems fit the model developed in this paper.
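To make the finite-search step concrete, the sketch below evaluates the renewal-reward average cost of a plain geometric process repair model (without the paper's generalized δ-shock mechanism) for each candidate replacement policy N and picks the minimizer. Every parameter is invented.

```python
# Hedged sketch: finite search over replacement policies N for a plain
# geometric process repair model. All parameters below are assumptions.

def avg_cost(N, lam=10.0, a=1.05, mu=1.0, b=0.95,
             c_repair=5.0, reward=20.0, c_replace=400.0):
    # E[X_k] = lam / a^(k-1): successive working times shrink (a > 1).
    work = sum(lam / a ** (k - 1) for k in range(1, N + 1))
    # E[Y_k] = mu / b^(k-1): successive repair times grow (0 < b < 1).
    repair = sum(mu / b ** (k - 1) for k in range(1, N))
    # Renewal-reward: expected cycle cost over expected cycle length.
    return (c_repair * repair + c_replace - reward * work) / (work + repair)

N_star = min(range(1, 100), key=avg_cost)
print(N_star, round(avg_cost(N_star), 4))
```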

8.
In this paper, we consider a continuous review inventory system of a slow-moving item for which the demand rate drops to a lower level at a known future time. The inventory system is controlled according to a one-for-one replenishment policy with a fixed lead time. Adaptation to the lower demand is achieved by changing the control policy in advance and letting demand take away the excess stock. We show that the timing of the control policy change primarily determines the tradeoff between backordering penalties and obsolescence costs. We propose an approximate solution for the optimal time to shift to the new control policy, minimizing the expected total cost during the transient period. We find that the advance policy change results in significant cost savings and that the approximation yields near-optimal expected total costs.

9.
We introduce a class of models for multidimensional control problems that we call skip-free Markov decision processes on trees. We describe and analyse an algorithm applicable to Markov decision processes of this type that are skip-free in the negative direction. Starting with the finite average cost case, we show that the algorithm combines the advantages of both value iteration and policy iteration: it is guaranteed to converge to an optimal policy and optimal value function after a finite number of iterations, but the computational effort required for each iteration step is comparable with that for value iteration. We show that the algorithm can also be used to solve discounted cost models and continuous-time models, and that a suitably modified algorithm can be used to solve communicating models.

10.
In an M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. On the other hand, rejecting new arrivals increases the percentage of time servers are idle, which also may not be desirable. We address these trade-offs by considering an admission control problem for an M/M/N+M queue when there are costs associated with customer abandonment, server idleness, and turning away customers. First, we formulate the relevant Markov decision process (MDP), show that the optimal policy is of threshold form, and provide a simple and efficient iterative algorithm that does not presuppose a bounded state space to compute the minimum infinite horizon expected average cost and associated threshold level. Under certain conditions we can guarantee that the algorithm provides an exact optimal solution when it stops; otherwise, the algorithm stops when a provided bound on the optimality gap is reached. Next, we solve the approximating diffusion control problem (DCP) that arises in the Halfin–Whitt many-server limit regime. This allows us to establish that the parameter space has a sharp division. Specifically, there is an optimal solution with a finite threshold level when the cost of an abandonment exceeds the cost of rejecting a customer; otherwise, there is an optimal solution that exercises no control. This analysis also yields a convenient analytic expression for the infinite horizon expected average cost as a function of the threshold level. Finally, we propose a policy for the original system that is based on the DCP solution, and show that this policy is asymptotically optimal. Our extensive numerical study shows that the control that arises from solving the DCP achieves a very similar cost to the control that arises from solving the MDP, even when the number of servers is small.
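The threshold structure can be evaluated directly in the pre-limit system: under an admission threshold K the queue is a birth-death chain, so its stationary distribution yields the long-run abandonment, idleness, and rejection costs in closed form. The brute-force search below is a stand-in for the paper's iterative algorithm and diffusion analysis; all rates and costs are invented.

```python
import numpy as np

# Hedged sketch: evaluate threshold-K admission policies for an M/M/N+M
# queue via the birth-death stationary distribution. Parameters assumed.

lam, mu, theta, N = 9.0, 1.0, 0.5, 10      # arrival, service, abandonment, servers
c_ab, c_idle, c_rej = 5.0, 1.0, 2.0        # abandonment, idleness, rejection costs

def avg_cost(K):
    pi = [1.0]                              # unnormalized stationary probabilities
    for n in range(K):
        death = min(n + 1, N) * mu + max(n + 1 - N, 0) * theta
        pi.append(pi[-1] * lam / death)
    pi = np.array(pi) / sum(pi)
    n = np.arange(K + 1)
    abandon_rate = np.sum(pi * np.maximum(n - N, 0) * theta)
    idle_servers = np.sum(pi * np.maximum(N - n, 0))
    reject_rate = lam * pi[K]               # arrivals blocked at the threshold
    return c_ab * abandon_rate + c_idle * idle_servers + c_rej * reject_rate

K_star = min(range(N, 200), key=avg_cost)
print(K_star, round(avg_cost(K_star), 4))
```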

11.
We address a rate control problem associated with a single server Markovian queueing system with customer abandonment in heavy traffic. The controller can choose a buffer size for the queueing system and also can dynamically control the service rate (equivalently the arrival rate) depending on the current state of the system. An infinite horizon cost minimization problem is considered here. The cost function includes a penalty for each rejected customer, a control cost related to the adjustment of the service rate and a penalty for each abandoning customer. We obtain an explicit optimal strategy for the limiting diffusion control problem (the Brownian control problem or BCP) which consists of a threshold-type optimal rejection process and a feedback-type optimal drift control. This solution is then used to construct an asymptotically optimal control policy, i.e. an optimal buffer size and an optimal service rate for the queueing system in heavy traffic. The properties of generalized regulator maps and weak convergence techniques are employed to prove the asymptotic optimality of this policy. In addition, we identify the parameter regimes where the infinite buffer size is optimal.

12.
In this paper we discuss the problem of optimally parking single and multiple idle elevators under light-traffic conditions. The problem is analyzed from the point of view of the elevator owner whose objective is to minimize the expected total cost of parking and dispatching the elevator (which includes the cost incurred for waiting passengers). We first consider the case of a single elevator and analyze a (commonly used but suboptimal) state-independent myopic policy that always positions the idle elevator at the same floor. Building on the results obtained for the myopic policy, we then show that the optimal non-myopic (state-dependent) policy calls for dispatching the idle elevator to the state-dependent median of a weight distribution. Next, we consider the more difficult case of two elevators and develop an expression for the expected dispatching distance function. We show that the objective function for the myopic policy is non-convex. The non-myopic policy is found to be dependent on the state of the two idle elevators. We compute the optimal state-dependent policy for two elevators using the results developed for the myopic policy. Next, we examine the case of multiple elevators and provide a general recursive formula to find the expected dispatching distance functions. Finally, we generalize the previous models by incorporating a fixed cost for parking the idle elevators that results in a two-sided optimal policy with different regions. Every policy that we introduce and analyze is illustrated by an example. The paper concludes with a short summary and suggestions for future research.
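As a quick illustration of the single-elevator median result: parking at the weighted median of the distribution of the next call's floor minimizes the expected one-way dispatch distance. The floor weights below are invented.

```python
# Hedged sketch: the myopic one-elevator rule parks at the weighted median
# of the next-call floor distribution. Weights are assumptions.

weights = [0.30, 0.05, 0.10, 0.10, 0.45]   # assumed P(next call at floor i)

def weighted_median(w):
    half, acc = sum(w) / 2.0, 0.0
    for floor, p in enumerate(w):
        acc += p
        if acc >= half:                    # first floor with cumulative mass >= 1/2
            return floor

print(weighted_median(weights))            # parking floor for the idle elevator
```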

13.
We consider a class of stochastic nonlinear programs for which an approximation to a locally optimal solution is specified in terms of a fractional reduction of the initial cost error. We show that such an approximate solution can be found by approximately solving a sequence of sample average approximations. The key issue in this approach is the determination of the required sequence of sample average approximations, as well as the number of iterations to be carried out on each sample average approximation in this sequence. We show that this requirement can be expressed as an idealized optimization problem whose cost function is the computing work required to obtain the desired error reduction. The specification of this idealized problem requires exact knowledge of a few problem and algorithm parameters; since these exact values are not known, we use estimates, which can be updated as the computation progresses. We illustrate our approach using two numerical examples from structural engineering design.
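A stripped-down version of the scheme: solve a sequence of sample average approximations (SAA) with growing sample sizes, warm-starting each stage at the previous iterate. The quadratic stochastic objective E[(x - ξ)²] is an invented stand-in for the structural design problems; its minimizer is E[ξ].

```python
import numpy as np

# Hedged SAA sketch: growing sample sizes with warm starts. The objective
# E[(x - xi)^2] with xi ~ N(3, 1) is an invented toy; minimizer is 3.

rng = np.random.default_rng(0)

def solve_saa(x, xi, steps=200, lr=0.1):
    for _ in range(steps):                  # gradient descent on the SAA
        x -= lr * np.mean(2.0 * (x - xi))
    return x

x = 0.0
for n in [10, 100, 1000, 10000]:            # growing sample sizes per stage
    xi = rng.normal(3.0, 1.0, size=n)
    x = solve_saa(x, xi)                     # warm start from previous stage
    print(n, round(x, 4))                    # drifts toward E[xi] = 3
```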

14.
We develop an online actor–critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
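The Lagrangian mechanism can be seen in a stripped-down two-action toy (no function approximation, no queueing network): a softmax "actor" descends the Lagrangian from noisy samples while the multiplier ascends on constraint violation on a slower timescale. All numbers below are invented.

```python
import numpy as np

# Hedged toy sketch of the primal-dual (Lagrange multiplier) idea behind
# constrained average-cost control. Two actions, noisy cost/constraint
# samples, softmax actor plus projected dual ascent. All data are invented.

rng = np.random.default_rng(1)
costs = np.array([1.0, 3.0])    # expected per-step cost of each action
loads = np.array([2.0, 0.5])    # expected per-step constraint value
bound = 1.0                     # require long-run average load <= bound

theta = np.zeros(2)             # actor: softmax action preferences
lam = 0.0                       # Lagrange multiplier (dual variable)
for t in range(1, 50001):
    p = np.exp(theta - theta.max()); p /= p.sum()
    a = rng.choice(2, p=p)
    c = costs[a] + rng.normal(0.0, 0.1)            # noisy cost sample
    g = loads[a] + rng.normal(0.0, 0.1) - bound    # noisy constraint slack
    lagrangian = c + lam * g
    theta -= 0.005 * lagrangian * (np.eye(2)[a] - p)  # policy-gradient descent
    lam = max(0.0, lam + (1.0 / t) * g)               # slower dual ascent

print(p, lam)  # roughly p ~ [1/3, 2/3] and lam ~ 4/3 at the saddle point
```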

15.
In this paper, a unified policy iteration approach is presented for the optimal control problem of a stochastic system with discounted average cost and continuous state space. The approach consists of temporal difference learning-based potential function approximation algorithms and performance difference formula-based policy improvement. The approximation algorithms are derived by solving the Poisson-equation-based fixed-point equation, and can be viewed as continuous versions of the least squares policy evaluation and least squares temporal difference algorithms. Simulations are provided to illustrate the effectiveness of the approach.
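For reference, here is a minimal discrete analogue of least-squares temporal difference (LSTD) policy evaluation on a small Markov chain; the paper develops continuous-state versions of this idea. The chain, features, and discount factor are invented.

```python
import numpy as np

# Hedged sketch: LSTD(0) policy evaluation on an invented 2-state chain
# with tabular features, so the estimate approaches (I - gamma*P)^-1 c.

rng = np.random.default_rng(2)
P = np.array([[0.7, 0.3], [0.4, 0.6]])     # fixed-policy transition matrix
cost = np.array([1.0, 2.0])
gamma = 0.95
phi = np.eye(2)                            # tabular features (exact case)

A = np.zeros((2, 2)); b = np.zeros(2)
s = 0
for _ in range(50000):
    s2 = rng.choice(2, p=P[s])
    A += np.outer(phi[s], phi[s] - gamma * phi[s2])
    b += phi[s] * cost[s]
    s = s2
w = np.linalg.solve(A, b)                  # value estimates: V ~ phi @ w
print(w)
```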

16.
In this paper we propose an adaptive model for multi-mode project scheduling under uncertainty. We assume that there is a due date for concluding the project and a tardiness penalty for failing to meet this due date, and that several distinct modes may be used to undertake each activity. We define scheduling policies based on a set of thresholds: the starting time of an activity is compared with those thresholds in order to determine its execution mode. We propose a procedure, based on the electromagnetism heuristic, for choosing a scheduling policy. In computational tests, we conclude that the adaptive scheduling policy found using the model and the heuristic solution procedure is consistently better than the optimal non-adaptive policy. When the different modes have very different characteristics and there is a reasonable difference between the average duration of the project and the due date, the cost advantage of the adaptive policy becomes very significant.

17.
In this paper, applying the technique of diffusion approximation to an M/G/1 queuing system with removable server, we provide a robust approximation model for determining an optimal operating policy of the system. The following costs are incurred by the system: costs per hour for keeping the server on or off, fixed costs for turning the server on or off, and a holding cost per customer per hour. The expected discounted cost is used as the criterion for optimality. Using a couple of independent diffusion processes approximating the number of customers in the system, we derive approximation formulae for the expected discounted cost that depend on the service time distribution only through its first two moments. Some new results on the characterization of the optimal operating policy are provided from these results. Moreover, in order to examine the accuracy of the approximation, the approximate results are numerically compared with the exact ones.

18.
We treat an inventory control problem in a facility that provides a single type of service for customers. Items used in service are supplied by an outside supplier. To incorporate lost sales due to service delay into the inventory control, we model a queueing system with finite waiting room and a non-instantaneous replenishment process, and examine the impact of the finite buffer on replenishment policies. Employing Markov decision process theory, we characterize the optimal replenishment policy as a monotonic threshold function of the reorder point under the discounted cost criterion. We present a simple procedure that jointly finds the optimal buffer size and order quantity.

19.
This paper studies the policy iteration algorithm (PIA) for average cost Markov control processes on Borel spaces. Two classes of MCPs are considered. One of them allows some restricted-growth unbounded cost functions and compact control constraint sets; the other one requires strictly unbounded costs and the control constraint sets may be non-compact. For each of these classes, the PIA yields, under suitable assumptions, the optimal (minimum) cost, an optimal stationary control policy, and a solution to the average cost optimality equation.
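The finite-state analogue of the PIA is Howard's policy iteration: evaluate the current policy by solving the Poisson equation g + h(s) = c(s) + Σ P(s'|s) h(s') with h(0) = 0, then improve greedily. The 2-state, 2-action model below is invented; the paper works on Borel spaces.

```python
import numpy as np

# Hedged sketch: Howard's policy iteration for an average-cost finite MDP,
# the finite-state analogue of the paper's PIA. Model data are invented.

P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])   # P[a, s, s']
c = np.array([[2.0, 1.0], [0.5, 3.0]])     # c[a, s]
nS = 2

policy = np.zeros(nS, dtype=int)
for _ in range(100):                       # PIA stops in finitely many steps
    Pp = P[policy, np.arange(nS)]          # transitions under current policy
    cp = c[policy, np.arange(nS)]
    # Solve the Poisson equation: unknowns are [g, h(1), ...], with h(0) = 0.
    M = np.eye(nS) - Pp
    M[:, 0] = 1.0                          # column for the gain g
    sol = np.linalg.solve(M, cp)
    g, h = sol[0], np.concatenate(([0.0], sol[1:]))
    new_policy = (c + P @ h).argmin(axis=0)  # improvement step
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(f"average cost g ~ {g:.4f}, policy = {policy}")
```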

20.
In the classical Economic Manufacturing Quantity (EMQ) model, it is assumed that all items produced are of perfect quality and that the production facility never breaks down. However, in real production, product quality is usually a function of the state of the production process, which may deteriorate over time, and the production facility may fail randomly. In this paper, we study the effect of machine failures on the optimal lot size and on the optimal number of inspections in a production cycle. The formula for the long-run expected average cost per unit time is obtained for a generally distributed time to failure. An optimal production/inspection policy is found by minimising the expected average cost.
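For orientation, the sketch below shows the classical EMQ setup-versus-holding tradeoff with a crude, assumed failure penalty bolted on (each failure scraps, on average, half a lot of work-in-process), minimized by grid search over the lot size Q. The paper instead derives the exact expected average cost for a generally distributed time to failure and also optimizes the number of inspections; every number here is an assumption.

```python
import numpy as np

# Hedged sketch: classical EMQ average cost per unit time plus an assumed
# machine-failure scrap penalty. All parameters are invented.

D, Pr = 100.0, 150.0       # demand rate, production rate (units/time)
K, h = 50.0, 0.4           # setup cost, holding cost per unit per unit time
rate, c_scrap = 0.02, 3.0  # failure rate while producing, scrap cost per unit

def avg_cost(Q):
    setup = K * D / Q                              # setups per unit time
    holding = 0.5 * h * Q * (1.0 - D / Pr)         # average on-hand inventory
    failure = rate * (D / Pr) * c_scrap * 0.5 * Q  # assumed scrap penalty
    return setup + holding + failure

grid = np.arange(10.0, 500.0)
Q_star = min(grid, key=avg_cost)
print(Q_star, round(avg_cost(Q_star), 4))
```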
