Similar Literature
20 similar documents were retrieved.
1.
Most industrial products and processes are characterized by several, typically correlated, measurable variables, which jointly describe the product or process quality. Various control charts such as Hotelling’s T2, EWMA and CUSUM charts have been developed for multivariate quality control, where the values of the chart parameters, namely the sample size, sampling interval and control limits, are determined to satisfy given economic and/or statistical requirements. It is well known that this traditional non-Bayesian approach to control chart design is not optimal, but very few results regarding the form of the optimal Bayesian control policy have appeared in the literature, all limited to univariate chart design. In this paper, we consider a multivariate Bayesian process mean control problem for a finite production run under the assumption that the observations are values of independent, normally distributed vectors of random variables. The problem is formulated in the POMDP (partially observable Markov decision process) framework, and the objective is to determine a control policy minimizing the total expected cost. It is proved that under standard operating and cost assumptions the control-limit policy is optimal. Cost comparisons with the benchmark chi-squared chart and the MEWMA chart show that the Bayesian chart is highly cost effective; the savings are larger for smaller values of the critical Mahalanobis distance between the in-control and out-of-control process means.
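The kind of Bayesian updating such a chart relies on can be sketched very simply. The snippet below is an illustrative toy, not the paper's algorithm: it assumes a two-state model (in control vs. out of control with one known shifted mean), a known per-period shift probability, and multivariate normal observations, and it signals once the posterior probability of being out of control crosses a control limit. All parameter values, and the use of SciPy, are assumptions for the sketch.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy two-state Bayesian monitoring of a bivariate process mean.
# State 0 = in control (mean mu0); state 1 = out of control (mean mu1).
mu0 = np.zeros(2)
mu1 = np.array([1.0, 0.5])           # assumed out-of-control mean
cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])         # known observation covariance
q = 0.02                             # assumed per-period probability of a shift
control_limit = 0.5                  # belief threshold that triggers a signal

def update_belief(p_out, x):
    """One Bayesian update of P(out of control) after observing vector x."""
    prior = p_out + (1.0 - p_out) * q            # the shift may occur this period
    f0 = multivariate_normal.pdf(x, mean=mu0, cov=cov)
    f1 = multivariate_normal.pdf(x, mean=mu1, cov=cov)
    return prior * f1 / (prior * f1 + (1.0 - prior) * f0)

# Simulate a run whose mean shifts at period 30 and watch the chart react.
rng = np.random.default_rng(1)
p_out = 0.0
for t in range(60):
    x = rng.multivariate_normal(mu0 if t < 30 else mu1, cov)
    p_out = update_belief(p_out, x)
    if p_out > control_limit:
        print(f"signal at period {t}, P(out of control) = {p_out:.3f}")
        break
```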

2.
This paper proposes a set of methods for solving stochastic decision problems modeled as partially observable Markov decision processes (POMDPs). The approach (Real Time Heuristic Decision System, RT-HDS) is based on the use of prediction methods combined with several existing heuristic decision algorithms. The prediction process builds a tree, and the value function at the last step is computed using classic heuristic decision methods. To illustrate how this approach works, comparative results of different algorithms on a variety of simple and complex benchmark problems are reported. The algorithm has also been tested in a mobile robot supervision architecture.

3.
Order Acceptance (OA) is one of the main functions in business control. Accepting an order when capacity is available could prevent the system from accepting more profitable orders in the future, with opportunity losses as a consequence. Uncertain information is also an important issue here. We use Markov decision models and learning methods from Artificial Intelligence to find decision policies under uncertainty. Reinforcement Learning (RL) is a relatively new approach in OA. It is shown here that RL works well compared with heuristics, and it is demonstrated that employing an RL-trained agent is a robust, flexible approach that, in addition, can be used to support the detection of good heuristics.
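As a hedged, toy illustration of the RL idea (not the authors' model or parameters): take the state to be the remaining capacity plus the class of the order that just arrived, let the actions be accept or reject, and train a tabular Q-learning agent; with scarce capacity it can learn that turning down a low-margin order preserves room for more profitable ones.

```python
import random

# Toy order-acceptance problem: state = (remaining capacity, class of arriving order).
CAPACITY, ARRIVALS_PER_PERIOD = 10, 15
ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.95, 0.1, 20000
ORDER_CLASSES = [(2, 10.0), (2, 3.0)]        # (capacity consumed, profit) per class

Q = {(c, k, a): 0.0
     for c in range(CAPACITY + 1)
     for k in range(len(ORDER_CLASSES))
     for a in (0, 1)}                        # action 0 = reject, 1 = accept

def step(cap, k, a):
    """Reward and next capacity when an order of class k is accepted (a=1) or rejected."""
    size, profit = ORDER_CLASSES[k]
    if a == 1 and cap >= size:
        return profit, cap - size
    return 0.0, cap

for _ in range(EPISODES):
    cap = CAPACITY
    for _ in range(ARRIVALS_PER_PERIOD):
        k = random.randrange(len(ORDER_CLASSES))
        if random.random() < EPS:
            a = random.choice((0, 1))
        else:
            a = max((0, 1), key=lambda act: Q[(cap, k, act)])
        r, nxt = step(cap, k, a)
        k_next = random.randrange(len(ORDER_CLASSES))   # sample the next arrival's class
        target = r + GAMMA * max(Q[(nxt, k_next, 0)], Q[(nxt, k_next, 1)])
        Q[(cap, k, a)] += ALPHA * (target - Q[(cap, k, a)])
        cap = nxt

# Inspect the learned accept/reject decisions; with little capacity left the agent
# may learn to reject the low-margin class and wait for a high-margin order.
for cap in (2, 6, 10):
    print(cap, ["accept" if Q[(cap, k, 1)] > Q[(cap, k, 0)] else "reject"
                for k in range(len(ORDER_CLASSES))])
```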

4.
The generalization of policies in reinforcement learning is a central issue, both from the point of view of the theoretical model and for practical applicability. However, generalizing from a set of examples or searching for regularities is a problem which has already been intensively studied in machine learning, and existing domains such as Inductive Logic Programming have already been linked with reinforcement learning. Our work uses techniques in which generalizations are constrained by a language bias in order to regroup similar states. Such generalizations are principally based on the properties of concept lattices. To guide the possible groupings of similar states of the environment, we propose a general algebraic framework, considering the generalization of policies through a partition of the set of states and using a language bias as a priori knowledge. We give a practical application as an example of our theoretical approach by proposing and experimenting with a bottom-up algorithm.

5.
On the evaluation of strategies for branching bandit processes
Glazebrook [1] has given an account of improved procedures for strategy evaluation for resource allocation in a stochastic environment. These methods are extended in this paper so that they can be applied to problems which, for example, have precedence constraints and/or an arrival process of new jobs. Theoretical results, backed up by numerical studies, show that quasi-myopic heuristics often perform well.

6.
An inequality regarding the minimum of P(lim inf{X_n ∈ D_n}) is proved for a class of random sequences. This result is related to the problem of the sufficiency of Markov strategies for Markov decision processes with the Dubins-Savage criterion, the asymptotic behaviour of nonhomogeneous Markov chains, and some other problems.

7.
We prove that if one or more players in a locally finite positional game have winning strategies, then they can find them by themselves, losing no more than a bounded number of plays and using no more than a linear-size memory, independently of the strategies applied by the other players. We design two algorithms for learning how to win. One of them can also be modified to determine a strategy that achieves a draw, provided that no winning strategy exists for the player in question but a draw can be ensured from the starting position with properly chosen moves. If a drawing or winning strategy exists, then it is learnt after losing no more than a linear number of plays (linear in the number of edges of the game graph). Z. Tuza’s research has been supported in part by the grant OTKA T-049613.

8.
We consider the optimal replacement of a periodically inspected system under Markov deterioration that operates in a controlled environment. Provided are sufficient conditions that characterize an optimal control-limit replacement policy with respect to the system’s condition and its environment. The structure of the optimal policy is illustrated by a numerical example.
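A hedged numerical sketch of how such a control-limit structure can be seen (the deterioration matrix, costs, and discount factor below are invented, and the environment dimension is dropped for brevity): value iteration on a small keep-or-replace problem whose optimal policy replaces exactly once the condition passes a threshold.

```python
import numpy as np

# Illustrative discounted replacement model: condition states 0 (new) .. 4 (failed).
# Action "keep" lets the system deteriorate via P; "replace" resets it to state 0.
P = np.array([[0.7, 0.2, 0.1, 0.0, 0.0],
              [0.0, 0.6, 0.3, 0.1, 0.0],
              [0.0, 0.0, 0.5, 0.4, 0.1],
              [0.0, 0.0, 0.0, 0.6, 0.4],
              [0.0, 0.0, 0.0, 0.0, 1.0]])
operating_cost = np.array([0.0, 1.0, 3.0, 8.0, 25.0])   # per-period cost by condition
REPLACE_COST, BETA = 10.0, 0.95

V = np.zeros(5)
for _ in range(500):                                     # value iteration
    keep = operating_cost + BETA * P @ V
    replace = REPLACE_COST + operating_cost[0] + BETA * P[0] @ V
    V = np.minimum(keep, replace)

replace_value = REPLACE_COST + operating_cost[0] + BETA * P[0] @ V
policy = ["replace" if replace_value < operating_cost[s] + BETA * P[s] @ V else "keep"
          for s in range(5)]
print(policy)   # keep in good states, replace beyond some threshold condition
```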

9.
We establish the optimality of structured replacement policies for a periodically inspected system that fails silently whenever the cumulative number of shocks, or the magnitude of a single shock it has received, exceeds a corresponding threshold. Shocks arrive according to a Markov-modulated Poisson process which represents the (controllable or uncontrollable) environment.

10.
Bike-sharing systems are becoming increasingly popular in large cities. The natural imbalance and the stochasticity of bike arrivals and departures lead operators to develop redistribution strategies in order to ensure a sufficiently high quality of service for users. Using a Markov decision process approach, we develop an implementable decision-support tool which may help the operator decide, at any point in time, (i) which station should be prioritized, and (ii) how many bikes should be added to or removed from each station. Our objective is to minimize the rate of arrival of unsatisfied users who find their station empty or full. The existence of an optimal inventory level at each station is proven; it may vary over time but does not depend on the capacity of the truck which performs the repositioning. Next, we compute the relative value function of the system, together with the average cost and the optimal state. These results are used to derive a policy for station prioritization using a one-step policy improvement method. We evaluate our policy against the optimal one and against other intuitive policies in an extended version of our model. Our numerical experiments show that even a modest intervention by the operator can significantly enhance the quality of service, and that the rule of thumb for bike repositioning is to prioritize stations that are closer, more active, closer to full or empty, and more imbalanced, provided no reversal of the imbalance is anticipated.
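As a toy, hedged calculation of why a well-defined optimal inventory level exists at a station (the rates, capacity, and horizon below are invented and the model is far simpler than the paper's): treat a single station as a birth-death process with Poisson bike returns and rentals, and compute the expected number of users who find it full or empty before the next repositioning visit, as a function of the starting inventory.

```python
import numpy as np

# Toy single-station model: capacity K, Poisson returns (rate lam_in) and
# rentals (rate lam_out) per minute, repositioning horizon T minutes.
K, lam_in, lam_out, T = 20, 1.0, 1.0, 120.0
dt = 0.01                                  # small step, discrete-time approximation
steps = int(T / dt)

# unsat[s] = expected number of unsatisfied users over the remaining horizon
# when the station currently holds s bikes (computed by backward induction).
unsat = np.zeros(K + 1)
for _ in range(steps):
    new = np.zeros(K + 1)
    ret, rent = lam_in * dt, lam_out * dt
    for s in range(K + 1):
        lost = (ret if s == K else 0.0) + (rent if s == 0 else 0.0)
        up = unsat[s + 1] if s < K else unsat[s]      # a return is blocked at s == K
        down = unsat[s - 1] if s > 0 else unsat[s]    # a rental is blocked at s == 0
        new[s] = lost + ret * up + rent * down + (1 - ret - rent) * unsat[s]
    unsat = new

print("expected unsatisfied users by starting inventory:", np.round(unsat, 2))
print("best starting inventory:", int(np.argmin(unsat)))
```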

11.
12.
13.
《Operations Research Letters》2014,42(6-7):429-431
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions can grow arbitrarily large. Therefore no such algorithm is strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.
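For readers unfamiliar with the algorithm class in question, here is a generic sketch of optimistic (modified) policy iteration: each iteration performs a fixed number m of policy-evaluation backups followed by a greedy improvement step. The small deterministic instance is made up for illustration and is not the three-state, four-action example constructed in the note.

```python
import numpy as np

# Deterministic discounted MDP given by next_state[s, a] and reward[s, a];
# this instance is illustrative only.
next_state = np.array([[1, 2],
                       [0, 2],
                       [2, 1]])
reward = np.array([[1.0, 0.0],
                   [0.5, 2.0],
                   [0.0, 1.0]])
GAMMA, M_SWEEPS = 0.9, 3                 # m = number of partial evaluation sweeps

n_states, n_actions = reward.shape
V = np.zeros(n_states)
policy = np.zeros(n_states, dtype=int)

for _ in range(100):
    # Partial policy evaluation: only m backups under the current policy.
    for _ in range(M_SWEEPS):
        V = np.array([reward[s, policy[s]] + GAMMA * V[next_state[s, policy[s]]]
                      for s in range(n_states)])
    # Greedy policy improvement.
    q = reward + GAMMA * V[next_state]
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):   # simplified stopping rule for the sketch
        break
    policy = new_policy

print("policy:", policy, "values:", np.round(V, 3))
```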

14.
This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC-optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition in this paper is used for the first time to study the ASPC criterion. Also, the approach provided here is slightly different from the “optimality equation approach” widely used in the previous literature. On the other hand, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.
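To give a concrete, hedged feel for the average-cost criterion and the kind of controlled queue used as an illustration in this line of work (the instance below is mine, not the paper's): relative value iteration on a small queue with a switchable service rate approximates the long-run average cost g together with a relative value function.

```python
import numpy as np

# Toy discrete-time controlled queue: state = number of customers (0..N);
# each period an arrival occurs w.p. p, and action a picks a service rate so
# that a departure occurs w.p. mu[a]; cost = holding cost * state + service cost.
N, p = 20, 0.4
mu, service_cost, holding = np.array([0.3, 0.7]), np.array([0.0, 2.0]), 1.0

def transitions(s, a):
    """(probability, next state) pairs for one period."""
    out = []
    for arr, pa in ((1, p), (0, 1 - p)):
        for dep, pd in ((1, mu[a]), (0, 1 - mu[a])):
            out.append((pa * pd, min(max(s + arr - dep, 0), N)))
    return out

V = np.zeros(N + 1)
for _ in range(3000):                        # relative value iteration
    Q = np.zeros((N + 1, 2))
    for s in range(N + 1):
        for a in (0, 1):
            Q[s, a] = holding * s + service_cost[a] + \
                      sum(pr * V[ns] for pr, ns in transitions(s, a))
    V_new = Q.min(axis=1)
    g = V_new[0]                             # average-cost estimate (V is kept with V[0] = 0)
    V = V_new - V_new[0]

print("estimated average cost:", round(float(g), 3))
print("greedy policy (0 = slow server, 1 = fast server):", Q.argmin(axis=1))
```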

15.
We consider minimizing-risk problems in discounted Markov decision processes with a countable state space and bounded general rewards. We characterize optimal values for the finite and infinite horizon cases and give two sufficient conditions for the existence of an optimal policy in the infinite horizon case. These conditions are closely connected with Lemma 3 in White (1993), which, as Wu and Lin (1999) point out, is not correct. We obtain a condition under which the lemma is true, and under that condition we show that there is an optimal policy. Under another condition we show that the optimal value is the unique solution to an optimality equation and that there is an optimal policy on a transient set.

16.
This paper studies both the average sample-path reward (ASPR) criterion and the limiting average variance criterion for denumerable discrete-time Markov decision processes. The rewards may have neither upper nor lower bounds. We give sufficient conditions on the system’s primitive data under which we prove the existence of ASPR-optimal stationary policies and variance-optimal policies. Our conditions are weaker than those in the previous literature. Moreover, our results are illustrated by a controlled queueing system. Research partially supported by the Natural Science Foundation of Guangdong Province (Grant No. 06025063) and the Natural Science Foundation of China (Grant No. 10626021).

17.
We consider a problem where different classes of customers can book different types of service in advance and the service company has to respond to the booking request immediately, confirming or rejecting it. The objective of the service company is to maximize the profit composed of class-type-specific revenues, refunds for cancellations or no-shows, and the cost of overtime. For the calculation of the latter, information on the underlying appointment schedule is required. In contrast to most models in the literature, we assume that the service time of clients is stochastic and that clients may be unpunctual. Throughout the paper we relate the problem to capacity allocation in radiology services. The problem is modeled as a continuous-time Markov decision process and solved using simulation-based approximate dynamic programming (ADP) combined with a discrete-event simulation of the service period. We employ an adapted heuristic ADP algorithm from the literature and investigate the benefits of applying ADP to this type of problem. First, we study a simplified problem with deterministic service times and punctual arrival of clients and compare the solution from the ADP algorithm to the optimal solution. We find that the heuristic ADP algorithm performs very well in terms of objective function value, solution time, and memory requirements. Second, we study the problem with stochastic service times and unpunctuality. It is then shown that the resulting policy constitutes a large improvement over an “optimal” policy that is deduced using restrictive, simplifying assumptions.

18.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and of an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual “vanishing discount factor approach” and the “optimality inequality approach” widely used in the previous literature.

19.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality using the well-known “vanishing discount factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), we use the average optimality inequality, the Dynkin formula, and the Tauberian theorem to prove the existence of an average optimal stationary policy within the class of all randomized history-dependent policies. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy within the class of (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).

20.
In this paper, we use reinforcement learning (RL) techniques to determine dynamic prices in an electronic monopolistic retail market. The market that we consider consists of two natural segments of customers: captives and shoppers. Captives are mature, loyal buyers, whereas shoppers are more price sensitive and are attracted by sales promotions and volume discounts. The seller is the learning agent in the system and uses RL to learn from the environment. Under (reasonable) assumptions about the arrival process of customers, the inventory replenishment policy, and the replenishment lead time distribution, the system becomes a Markov decision process, thus enabling the use of a wide spectrum of learning algorithms. In this paper, we use the Q-learning algorithm for RL to arrive at dynamic prices that optimize the seller’s performance metric (either long-term discounted profit or long-run average profit per unit time). Our model and methodology can also be used to compute the optimal reorder quantity and optimal reorder point for the inventory policy followed by the seller, and to compute the optimal volume discounts to be offered to the shoppers.
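A toy, hedged sketch in the spirit of this setup (the demand model, the reorder rule, and every parameter below are invented for illustration): the state is the current inventory level, the action is the posted price, demand is the sum of price-insensitive captives and price-sensitive shoppers, and a discounted Q-learning agent, used here as a simple stand-in for the long-run average objective, learns which price to post at each inventory level.

```python
import numpy as np

# Illustrative two-segment dynamic-pricing problem (not the paper's exact model).
PRICES = [6.0, 8.0, 10.0]
UNIT_COST, HOLDING_COST = 4.0, 0.1
MAX_INV = 20                                 # replenish up to MAX_INV when stock runs out
ALPHA, EPS, GAMMA, STEPS = 0.05, 0.1, 0.95, 200_000

rng = np.random.default_rng(0)
Q = np.zeros((MAX_INV + 1, len(PRICES)))

def demand(price):
    captives = rng.poisson(1.0)                            # loyal, price-insensitive buyers
    shoppers = rng.poisson(max(0.0, 3.0 - 0.25 * price))   # drop off as the price rises
    return captives + shoppers

inv = MAX_INV
for _ in range(STEPS):
    a = rng.integers(len(PRICES)) if rng.random() < EPS else int(Q[inv].argmax())
    sold = min(inv, demand(PRICES[a]))
    profit = sold * (PRICES[a] - UNIT_COST) - HOLDING_COST * inv
    nxt = inv - sold
    if nxt == 0:                                           # simple reorder-to-full rule
        nxt = MAX_INV
    Q[inv, a] += ALPHA * (profit + GAMMA * Q[nxt].max() - Q[inv, a])
    inv = nxt

print("price posted at each inventory level:",
      [PRICES[int(Q[s].argmax())] for s in range(1, MAX_INV + 1)])
```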

