Similar Documents
 20 similar documents found.
1.
We present an implementation of the procedure for determining a suboptimal policy for a large-scale Markov decision process (MDP) presented in Part 1. An operation count analysis illuminates the significant computational benefits of this procedure relative to a procedure for determining an optimal policy and to a suboptimal design approach based on state and action space aggregation. Results of a preliminary numerical study indicate that the quality of the suboptimal policy produced by the 3MDP approach shows promise. This research has been supported by NSF Grants Nos. ECS-80-18266 and ECS-83-19355.

2.
This paper is the first of two papers that present and evaluate an approach for determining suboptimal policies for large-scale Markov decision processes (MDP). Part 1 is devoted to the determination of bounds that motivate the development and indicate the quality of the suboptimal design approach; Part 2 is concerned with the implementation and evaluation of the suboptimal design approach. The specific MDP considered is the infinite-horizon, expected total discounted cost MDP with finite state and action spaces. The approach can be described as follows. First, the original MDP is approximated by a specially structured MDP. The special structure suggests how to construct associated smaller, more computationally tractable MDP's. The suboptimal policy for the original MDP is then constructed from the solutions of these smaller MDP's. The key feature of this approach is that the state and action space cardinalities of the smaller MDP's are exponential reductions of the state and action space cardinalities of the original MDP.
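For context, the problem class here is the infinite-horizon expected total discounted cost MDP with finite state and action spaces. Below is a minimal Python value-iteration sketch for that class; the toy transition and cost data are invented for illustration and are not from the paper.

```python
import numpy as np

def value_iteration(P, c, gamma=0.9, tol=1e-8):
    """Solve a finite discounted-cost MDP by value iteration.
    P[a][s, s'] : transition probabilities, c[s, a] : immediate costs."""
    n_actions, n_states = len(P), c.shape[0]
    v = np.zeros(n_states)
    while True:
        # Bellman backup: immediate cost plus discounted expected value.
        q = np.stack([c[:, a] + gamma * P[a] @ v for a in range(n_actions)],
                     axis=1)
        v_new = q.min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=1)   # value function, greedy policy
        v = v_new

# Toy 2-state, 2-action instance (illustrative data only).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.7, 0.3]])]
c = np.array([[1.0, 2.0], [0.5, 3.0]])
v, policy = value_iteration(P, c)
print("values:", v, "policy:", policy)
```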

3.
We consider a discrete time Markov decision process (MDP) with a finite state space, a finite action space, and two kinds of immediate rewards. The problem is to maximize the time average reward generated by one reward stream, subject to the other reward not being smaller than a prescribed value. An MDP with a reward constraint can be solved by linear programming in the range of mixed policies. On the other hand, when we restrict ourselves to pure policies, the problem is a combinatorial one for which no general solution method is known. In this paper, we propose an approach based on Genetic Algorithms (GAs) to obtain an effective search process that yields a near-optimal, possibly optimal, pure stationary policy. A numerical example is given to examine the efficiency of the proposed approach.
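A minimal sketch of the kind of GA search described: chromosomes encode pure stationary policies, and fitness is the average reward of the first stream with a penalty when the second stream's average falls below the prescribed value. All model data, the penalty weight, and the GA operators here are illustrative assumptions, not the paper's exact design; the stationary-distribution solve assumes a unichain structure.

```python
import numpy as np
rng = np.random.default_rng(0)

# Illustrative data: 4 states, 2 actions, two reward streams r1, r2.
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(A, S))      # P[a, s] is a row distribution
r1 = rng.uniform(0, 1, (S, A)); r2 = rng.uniform(0, 1, (S, A))
R2_MIN = 0.4                                     # constraint level (assumed)

def avg_rewards(policy):
    """Long-run averages of both reward streams under a pure stationary policy."""
    Pp = np.array([P[policy[s], s] for s in range(S)])
    # Stationary distribution: solve pi (Pp - I) = 0 with sum(pi) = 1.
    Aeq = np.vstack([Pp.T - np.eye(S), np.ones(S)])
    pi = np.linalg.lstsq(Aeq, np.append(np.zeros(S), 1.0), rcond=None)[0]
    g1 = sum(pi[s] * r1[s, policy[s]] for s in range(S))
    g2 = sum(pi[s] * r2[s, policy[s]] for s in range(S))
    return g1, g2

def fitness(policy):
    g1, g2 = avg_rewards(policy)
    return g1 - 10.0 * max(0.0, R2_MIN - g2)     # penalize constraint violation

pop = [rng.integers(0, A, S) for _ in range(20)]
for _ in range(100):                             # generations
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]
    children = []
    for _ in range(10):                          # one-point crossover + mutation
        p1, p2 = rng.choice(10, 2, replace=False)
        cut = rng.integers(1, S)
        child = np.concatenate([elite[p1][:cut], elite[p2][cut:]])
        if rng.random() < 0.3:
            child[rng.integers(S)] = rng.integers(A)
        children.append(child)
    pop = elite + children
best = max(pop, key=fitness)
print("best pure policy:", best, "avg rewards:", avg_rewards(best))
```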

4.
5.
The usual approach to finding optimal repair limits on failure of a component is to use a finite-state approximation Markov decision process (MDP). In this paper an alternative approach is introduced. Assuming a stochastically increasing repair cost, the optimal solution is shown to satisfy a first-order differential equation with a two-point boundary condition. An asymptotic formula for the optimal repair limit function is derived. Numerical solutions are obtained for some Weibull and special Erlang time-to-failure distributions. The structural form of the repair limit function results in a solution procedure that is several orders of magnitude faster than is achievable using previous methods.
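The abstract does not reproduce the differential equation itself, so the following only illustrates the generic shooting technique one could apply to a first-order ODE with a two-point boundary condition. The right-hand side f and both boundary values below are placeholders, not the paper's repair-limit equation.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

T = 10.0                                   # planning horizon (placeholder)

def f(t, g):
    # Placeholder right-hand side standing in for the paper's repair-limit ODE.
    return -0.1 * g + 0.05 * t

def terminal_mismatch(g0):
    """Integrate from an initial guess g(0)=g0 and measure the error in the
    terminal boundary condition g(T)=0 (both conditions are illustrative)."""
    sol = solve_ivp(f, (0.0, T), [g0], rtol=1e-8)
    return sol.y[0, -1] - 0.0

# Shooting: find the initial value g(0) for which the terminal condition holds.
g0_star = brentq(terminal_mismatch, -10.0, 10.0)
sol = solve_ivp(f, (0.0, T), [g0_star], dense_output=True)
print("repair limit at t=0:", g0_star, "at t=T/2:", sol.sol(T / 2)[0])
```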

6.
We consider the dispatching problem in a size- and state-aware multi-queue system with Poisson arrivals and queue-specific job sizes. By size- and state-awareness, we mean that the dispatcher knows the size of an arriving job and the remaining service times of the jobs in each queue. By queue-specific job sizes, we mean that the time to process a job may depend on the chosen server. We focus on minimizing the mean sojourn time (i.e., response time) by an MDP approach. First we derive the so-called size-aware relative values of states with respect to the sojourn time in an M/G/1 queue operating under FIFO, LIFO, SPT or SRPT disciplines. For FIFO and LIFO, the size-aware relative values turn out to be insensitive to the form of the job size distribution. The relative values are then exploited in developing efficient dispatching rules in the spirit of the first policy iteration.
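As a simplified illustration of size- and state-aware dispatching under FIFO: an arriving job delays no job already queued, so routing it to queue i adds a sojourn time of (backlog of i) + (its queue-specific service time at i). The greedy rule below ignores future arrivals; the paper's first-policy-iteration rules refine such heuristics using the derived relative values. All rates and speeds here are invented.

```python
import random
random.seed(1)

# Size- and state-aware greedy dispatcher for FIFO queues. Under FIFO, an
# arriving job delays nobody behind it, so sending it to queue i adds a
# sojourn time of backlog[i] + size_i. (This myopic rule is a simplification
# of the first-policy-iteration rules developed in the paper.)
SPEEDS = [1.0, 0.5]                 # queue-specific service rates (invented)
backlog = [0.0, 0.0]                # remaining unfinished work per queue
total_sojourn, n_jobs = 0.0, 2000

for _ in range(n_jobs):
    dt = random.expovariate(1.2)    # Poisson arrivals, rate 1.2 (invented)
    backlog = [max(0.0, u - dt) for u in backlog]     # work drains over time
    size = random.expovariate(1.0)                    # nominal job size
    x = [size / s for s in SPEEDS]  # queue-specific service time of this job
    i = min(range(len(backlog)), key=lambda k: backlog[k] + x[k])
    total_sojourn += backlog[i] + x[i]
    backlog[i] += x[i]
print("mean sojourn time:", total_sojourn / n_jobs)
```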

7.
Single Sample Path-Based Optimization of Markov Chains
Motivated by the needs of on-line optimization of real-world engineering systems, we studied single sample path-based algorithms for Markov decision problems (MDP). The sample path used in the algorithms can be obtained by observing the operation of a real system. We give a simple example to explain the advantages of the sample path-based approach over the traditional computation-based approach: matrix inversion is not required; some transition probabilities do not have to be known; it may save storage space; and it gives the flexibility of iterating the actions for a subset of the state space in each iteration. The effect of the estimation errors and the convergence property of the sample path-based approach are studied. Finally, we propose a fast algorithm, which updates the policy whenever the system reaches a particular set of states, and prove that the algorithm converges to the true optimal policy with probability one under some conditions. The sample path-based approach may have important applications to the design and management of engineering systems, such as high speed communication networks.
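A sketch in the single-sample-path spirit: the average reward and the relative values (potentials) are estimated from one observed trajectory by a regenerative, cycle-based estimator with no matrix inversion, and a policy-improvement step follows. The model data are invented, and the improvement step below assumes the candidate actions' transition probabilities are known, a requirement the paper partially relaxes.

```python
import numpy as np
rng = np.random.default_rng(2)

S, A = 3, 2                               # illustrative sizes (data invented)
P = rng.dirichlet(np.ones(S), size=(A, S))
r = rng.uniform(0, 1, (S, A))

def sample_path(policy, steps):
    s, states, rewards = 0, [], []
    for _ in range(steps):
        a = policy[s]
        states.append(s); rewards.append(r[s, a])
        s = rng.choice(S, p=P[a, s])
    return states, rewards

def potentials_from_path(states, rewards, ref=0):
    """Regenerative estimate of the average reward eta and relative values
    g(s) from a single observed path -- no matrix inversion required."""
    eta = np.mean(rewards)
    g_sum, g_cnt, acc = np.zeros(S), np.zeros(S), 0.0
    for s, rew in zip(reversed(states), reversed(rewards)):
        if s == ref:
            acc = 0.0                     # cycles regenerate at the ref state
        acc += rew - eta
        g_sum[s] += acc; g_cnt[s] += 1
    return eta, g_sum / np.maximum(g_cnt, 1)

policy = np.zeros(S, dtype=int)
for it in range(5):                       # sample-path-based policy iteration
    states, rewards = sample_path(policy, 100_000)
    eta, g = potentials_from_path(states, rewards)
    policy = np.array([np.argmax([r[s, a] + P[a, s] @ g for a in range(A)])
                       for s in range(S)])
    print(f"iter {it}: eta = {eta:.4f}, policy = {policy}")
```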

8.
We have developed an approach to implement a real-time admissible heuristic search algorithm to solve project scheduling problems. This algorithm is characterised by a complete heuristic learning process: state selection, heuristic learning, and search path review. The implementation is based on the dynamic nature of the activity status and the resource availability of a project. It consists of states, a state transition operator, a heuristic estimate, and the cost of transition between states. The performance analysis shows that the accumulation of heuristic learning during the search process leads to the re-scheduling of resource-dominating activities, which is a major factor in controlling the overall project completion time.
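A minimal sketch of a real-time heuristic-learning loop in the style of LRTA*: the agent moves to the best-looking successor and "learns" by raising the heuristic of the state it leaves. The toy graph below stands in for the project states (activity status plus resource availability) and is not the paper's representation.

```python
# Real-time heuristic-learning search in the spirit of LRTA*: repeatedly move
# to the best successor and update the heuristic of the state just left.
GRAPH = {                      # successors: state -> {next_state: edge_cost}
    "s": {"a": 2, "b": 1},
    "a": {"g": 3},
    "b": {"a": 1, "g": 5},
    "g": {},
}
H0 = {"s": 2, "a": 2, "b": 3, "g": 0}     # admissible initial heuristic

def lrta_star(start, goal, h, max_trials=10):
    for trial in range(max_trials):
        s, cost = start, 0.0
        while s != goal:
            # Evaluate successors: one-step cost plus (learned) heuristic.
            f = {t: c + h[t] for t, c in GRAPH[s].items()}
            t_best = min(f, key=f.get)
            h[s] = max(h[s], f[t_best])   # heuristic learning step
            cost += GRAPH[s][t_best]
            s = t_best
        print(f"trial {trial}: path cost {cost}")
        # With an admissible h, repeated trials converge to an optimal path.
    return h

lrta_star("s", "g", dict(H0))
```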

9.
In a hidden Markov model, the underlying Markov chain is usually unobserved. Often, the state path with maximum posterior probability (the Viterbi path) is used as its estimate. Although it has the largest posterior probability, the Viterbi path can behave very atypically by passing through states of low marginal posterior probability. To avoid such situations, the Viterbi path can be modified to bypass such states. In this article, an iterative procedure for improving the Viterbi path in this way is proposed and studied. The iterative approach is compared with a simple batch approach in which a number of low-probability states are all replaced at the same time. The iterative way of adjusting the Viterbi state path proves more efficient and has several advantages over the batch approach. The same iterative algorithm can be used when it is possible to reveal some hidden states, so that estimating the unobserved state sequence becomes an active learning task. Both the batch and the iterative approach are based on classification probabilities of the Viterbi path; these probabilities play an important role in determining a suitable value for the threshold parameter used in both algorithms. Therefore, properties of classification probabilities under different conditions on the model parameters are studied.
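A simplified sketch of the idea: compute the Viterbi path and the marginal posterior (classification) probabilities by forward-backward, then iteratively replace the path state with the lowest classification probability while it falls below a threshold. All HMM parameters and the threshold are invented, and, unlike the paper's procedure, this simplification does not re-optimize the path or guard its positive probability after each replacement.

```python
import numpy as np

# Tiny HMM (all parameters invented). States 0/1, observations 0/1.
A = np.array([[0.9, 0.1], [0.2, 0.8]])       # transitions
B = np.array([[0.8, 0.2], [0.3, 0.7]])       # emissions
pi = np.array([0.6, 0.4])
obs = [0, 0, 1, 0, 1, 1]

def viterbi(obs):
    d = np.log(pi) + np.log(B[:, obs[0]]); back = []
    for o in obs[1:]:
        m = d[:, None] + np.log(A)
        back.append(m.argmax(0)); d = m.max(0) + np.log(B[:, o])
    path = [int(d.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return path[::-1]

def posteriors(obs):
    """Forward-backward marginal (classification) probabilities."""
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S)); beta = np.ones((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    g = alpha * beta
    return g / g.sum(1, keepdims=True)

path, gamma = viterbi(obs), posteriors(obs)
THETA = 0.5                                   # threshold (illustrative)
# Iteratively bypass the worst low-posterior state on the current path.
for _ in range(len(obs)):
    probs = [gamma[t, s] for t, s in enumerate(path)]
    t_bad = int(np.argmin(probs))
    if probs[t_bad] >= THETA:
        break
    path[t_bad] = int(gamma[t_bad].argmax())  # replace by the posterior mode
print("adjusted path:", path, "Viterbi path:", viterbi(obs))
```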

10.
11.
We consider a discrete time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered some decomposition of limiting average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs. Then, we construct some hierarchical decomposition algorithms for both discounted and weighted MDPs.

12.
This paper presents a model for improving utilization in IEEE 802.11e wireless LAN via a Markov decision process (MDP) approach. A Markov chain tracking the utilized transmission window for two separate access mechanisms is devised. Subsequently, the action space and the rewards of the MDP are judiciously selected with the aim of improving overall utilization without explicit blocking. The proposed MDP model for 802.11e reveals that proportional allocation of access opportunities improves overall utilization compared to completely randomized access. Simulation results further show that a policy that limits HCCA access as a function of channel load improves utilization by an average of 8%. The optimization framework proposed in this paper is promising as a practical decision support tool for resource planning in 802.11e.

13.
Directed hypergraphs are a general modelling and algorithmic tool that has been successfully used in many different research areas such as artificial intelligence, database systems, fuzzy systems, propositional logic and transportation networks. However, modelling Markov decision processes using directed hypergraphs has not yet been considered. In this paper we consider finite-horizon Markov decision processes (MDPs) with finite state and action space and present an algorithm for finding the K best deterministic Markov policies; that is, we rank the first K deterministic Markov policies in non-decreasing order using an additive criterion of optimality. The algorithm uses a directed hypergraph to model the finite-horizon MDP. It is shown that the problem of finding the optimal policy can be formulated as a minimum weight hyperpath problem and solved in time linear in the size of the input data representing the MDP, under different additive optimality criteria.
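For a tiny instance, the ranking itself can be shown by brute force: evaluate every deterministic Markov policy by backward induction and sort by expected total cost. The point of the paper is to obtain this ranking efficiently via K-shortest hyperpaths rather than enumeration; the data below are invented.

```python
import numpy as np
from itertools import product

# Tiny finite-horizon MDP (invented): horizon N, 2 states, 2 actions.
N, S, A = 3, 2, 2
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(S), size=(A, S))    # stationary dynamics for brevity
c = rng.uniform(0, 1, (S, A))                 # stage costs
c_T = np.zeros(S)                             # terminal costs

def expected_cost(policy):
    """policy[t][s] = action; expected total cost from state 0 at time 0."""
    v = c_T.copy()
    for t in range(N - 1, -1, -1):
        v = np.array([c[s, policy[t][s]] + P[policy[t][s], s] @ v
                      for s in range(S)])
    return v[0]

# Brute-force ranking of all A**(S*N) deterministic Markov policies; the
# paper achieves the same ranking efficiently via K-shortest hyperpaths.
policies = [tuple(tuple(p[t*S:(t+1)*S]) for t in range(N))
            for p in product(range(A), repeat=S * N)]
ranked = sorted(policies, key=expected_cost)
K = 3
for k, pol in enumerate(ranked[:K]):
    print(f"rank {k+1}: cost {expected_cost(pol):.4f}, policy {pol}")
```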

14.
15.
In many decision-making situations, decision makers (DMs) have difficulty specifying their perceived state probability values, or even probability value ranges. However, they may find it easier to say how much more likely the occurrence of a given state is compared with other states. An approach is proposed to identify the efficient strategies of a decision-making situation in which the DMs involved declare their perceived relative likelihood of the occurrence of the states by pair-wise comparisons. The pair-wise comparisons of all the states are used to construct a judgment matrix, which is transformed into a probability matrix. The columns of the transformed matrix delineate a convex cone of the state probabilities. Next, an efficiency linear program (ELP) is formulated for each available strategy, whose optimal solution determines whether or not that strategy is efficient within the probability region defined by the cone. Only an efficient strategy can be optimal for a given set of state probability values; inefficient strategies are never used, irrespective of state probability values. The application of the approach is demonstrated using examples where DMs offer differing views on the occurrence of the states.
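A sketch of one plausible reading of the construction (the paper's exact ELP may differ): normalize the columns of the judgment matrix to obtain the cone generators, then, for each strategy, solve an LP that asks whether some probability vector in the cone makes that strategy's worst advantage over the others non-negative. All judgments and payoffs are invented.

```python
import numpy as np
from scipy.optimize import linprog

# Pairwise likelihood judgments (invented): J[i, j] = how many times more
# likely state i is than state j, in the DM's view.
J = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 3.0],
              [0.25, 1/3, 1.0]])
# Transform to a probability matrix: normalize each column. The columns
# generate the convex cone of admissible state probability vectors.
Pm = J / J.sum(axis=0, keepdims=True)

# Payoff of each strategy in each state (invented).
V = np.array([[10.0, 2.0, 1.0],
              [4.0, 4.0, 4.0],
              [1.0, 3.0, 9.0]])

def is_efficient(m):
    """One plausible efficiency LP: does some probability vector p = Pm @ lam
    in the cone give strategy m a non-negative worst advantage eps?"""
    n_strat = V.shape[0]
    n_gen = Pm.shape[1]
    # Variables: lam (n_gen) and eps (1). Objective: maximize eps.
    cvec = np.zeros(n_gen + 1); cvec[-1] = -1.0
    A_ub, b_ub = [], []
    for j in range(n_strat):
        if j == m:
            continue
        row = np.zeros(n_gen + 1)
        row[:n_gen] = -(V[m] - V[j]) @ Pm   # -(advantage over j) + eps <= 0
        row[-1] = 1.0
        A_ub.append(row); b_ub.append(0.0)
    A_eq = [np.append(np.ones(n_gen), 0.0)]  # lam lies on the simplex
    res = linprog(cvec, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_gen + [(None, 1.0)])
    return res.success and -res.fun >= -1e-9

for m in range(V.shape[0]):
    print(f"strategy {m}: {'efficient' if is_efficient(m) else 'inefficient'}")
```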

16.
Linear Programming is known to be an important and useful tool for solving Markov Decision Processes (MDP). Its derivation relies on the Dynamic Programming approach, which also serves to solve MDP. However, for Markov Decision Processes with several constraints the only available methods are based on Linear Programs. The aim of this paper is to investigate some aspects of such Linear Programs, related to multi-chain MDPs. We first present a stochastic interpretation of the decision variables that appear in the Linear Programs available in the literature. We then show for the multi-constrained Markov Decision Process that the Linear Program suggested in [9] can be obtained from an equivalent unconstrained Lagrange formulation of the control problem. This shows the connection between the Linear Program approach and the Lagrange approach, that was previously used only for the case of a single constraint [3, 14, 15].
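As background, here is a minimal sketch of the standard occupation-measure LP for a discounted constrained MDP, the kind of program whose decision variables admit the stochastic interpretation discussed above (they are discounted state-action occupation measures). All numeric data are invented; the paper itself treats multi-chain average-reward settings, which are more delicate.

```python
import numpy as np
from scipy.optimize import linprog

# Discounted constrained MDP via the occupation-measure LP (data invented).
S, NA, gamma = 3, 2, 0.9
rng = np.random.default_rng(4)
P = rng.dirichlet(np.ones(S), size=(NA, S))    # P[a, s, s']
c = rng.uniform(0, 1, (S, NA))                 # cost to be minimized
d = rng.uniform(0, 1, (S, NA))                 # constrained cost
D = 6.0                                        # constraint level (invented)
alpha = np.full(S, 1.0 / S)                    # initial state distribution

# Variables x[s, a] >= 0: discounted occupation measures, with flow balance
# sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = alpha(s')
# and the extra cost constraint sum_{s,a} d(s,a) x(s,a) <= D.
nv = S * NA                                    # x[s, a] lives at index s*NA + a
A_eq = np.zeros((S, nv))
for sp in range(S):
    for s in range(S):
        for a in range(NA):
            A_eq[sp, s * NA + a] = (s == sp) - gamma * P[a, s, sp]
res = linprog(c.flatten(), A_ub=[d.flatten()], b_ub=[D],
              A_eq=A_eq, b_eq=alpha, bounds=[(0, None)] * nv)
assert res.success, res.message
x = res.x.reshape(S, NA)
# Optimal (possibly randomized) stationary policy: pi(a|s) proportional to x.
pi = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
print("discounted constrained cost:", float(d.flatten() @ res.x))
print("policy:\n", np.round(pi, 3))
```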

17.
We consider limiting average Markov decision processes (MDP) with finite state and action spaces. We propose some algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and the construction of levels in some aggregated MDP.

18.
The maximum diversity problem (MDP) is a challenging NP-hard problem with a wide range of real applications. Several researchers have pointed out a close relationship between the MDP and the unconstrained binary quadratic program (UBQP). In this paper, we provide procedures that solve the MDP using ideas from its UBQP formulation. We first give some local optimality results for r-flip improvement procedures on the MDP. Then, a set of highly effective diversification approaches based on sequential improvement steps is presented. Four versions of the approaches are used within a simple tabu search and applied to 140 benchmark MDP instances available on the Internet. The procedures solve all 80 small- to medium-sized problems instantly to the best known solutions. For 22 of the 60 large problems, the procedures improved the best known solutions by significant amounts in reasonably short CPU time.
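Note the acronym collision: here MDP means the maximum diversity problem, not a Markov decision process. Below is a minimal sketch of a swap-move tabu search that evaluates moves incrementally through UBQP-style contribution sums; the instance, tenure, and iteration budget are invented, and this is far simpler than the paper's r-flip procedures.

```python
import random
random.seed(5)

# Maximum diversity problem: choose m of n elements maximizing the total
# pairwise distance among the chosen ones. Data are random for illustration.
n, m, ITERS, TENURE = 30, 10, 500, 5
D = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        D[i][j] = D[j][i] = random.random()

def diversity(sel):
    return sum(D[i][j] for i in sel for j in sel if i < j)

sel = set(random.sample(range(n), m))
best_val = cur = diversity(sel)
tabu = {}                                   # element -> iteration until tabu
for it in range(ITERS):
    # Contribution of each element to the current selection (UBQP-style sum).
    contrib = {i: sum(D[i][k] for k in sel) for i in range(n)}
    best_move = None
    for i in sel:                           # swap move: drop i, add j
        for j in range(n):
            if j in sel:
                continue
            gain = contrib[j] - contrib[i] - D[i][j]
            aspiration = cur + gain > best_val
            if tabu.get(j, -1) > it and not aspiration:
                continue                    # tabu unless it beats the best
            if best_move is None or gain > best_move[0]:
                best_move = (gain, i, j)
    gain, i, j = best_move
    sel.remove(i); sel.add(j); cur += gain
    tabu[i] = it + TENURE                   # forbid re-adding i for a while
    best_val = max(best_val, cur)
print("best diversity found:", round(best_val, 4))
```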

19.
We discuss the problem of guaranteed guidance of a linear control system by a fixed time under the assumption that the system is subject to an unknown disturbance. We consider the case when only a part of the state coordinates is measured and the set of unknown initial states is finite. We specify a solution algorithm based on a combination of the package approach, the theory of dynamic inversion, and the extremal shift method.

20.
Ambulance diversion (AD) is used by emergency departments (EDs) to relieve congestion by requesting ambulances to bypass the ED and transport patients to another facility. We study optimal AD control policies using a Markov Decision Process (MDP) formulation that minimizes the average time that patients wait beyond their recommended safety time threshold. The model assumes that patients can be treated in one of two treatment areas and that the distribution of the time to start treatment at the neighboring facility is known. Assuming Poisson arrivals and exponential times for the length of stay in the ED, we show that the optimal AD policy follows a threshold structure, and explore the behavior of optimal policies under different scenarios. We analyze the value of information on the time to start treatment in the neighboring hospital, and show that optimal policies depend strongly on the congestion experienced by the other facility. Simulation is used to compare the performance of the proposed MDP model to that of simple heuristics under more realistic assumptions. Results indicate that the MDP model performs significantly better than the tested heuristics under most cases. Finally, we discuss practical issues related to the implementation of the policies prescribed by the MDP.
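A heavily simplified single-area version of such a model, as a sketch: state = ED census, action = accept or divert ambulances, solved by uniformized relative value iteration under the average-cost criterion. All rates, the safety threshold, and the diversion cost are invented; the printed policy illustrates the threshold structure the paper proves.

```python
import numpy as np

# Simplified average-cost MDP for ambulance diversion (single treatment area,
# all parameters invented). State: number of patients n in the ED.
# Action: 0 = accept ambulance arrivals, 1 = divert them to the neighbor.
C, LAM_W, LAM_A, MU = 20, 0.4, 0.3, 0.05     # capacity, Poisson/exp rates
N_SAFE, DIVERT_COST = 12, 6.0                # safety threshold, diversion cost
U = LAM_W + LAM_A + C * MU                   # uniformization constant

def q_value(v, n, divert):
    """One-step cost plus expected relative value in the uniformized chain."""
    lam_in = LAM_W + (0.0 if divert else LAM_A)
    p_up = (lam_in / U) if n < C else 0.0    # arrivals blocked at capacity
    p_down = n * MU / U
    ev = (p_up * v[min(n + 1, C)] + p_down * v[max(n - 1, 0)]
          + (1 - p_up - p_down) * v[n])
    cost_rate = max(0, n - N_SAFE) + (DIVERT_COST * LAM_A if divert else 0.0)
    return cost_rate / U + ev

v = np.zeros(C + 1)
for _ in range(20_000):                      # relative value iteration
    q = np.array([[q_value(v, n, a) for a in (0, 1)] for n in range(C + 1)])
    v_new = q.min(axis=1)
    v_new -= v_new[0]                        # normalize at reference state 0
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
policy = q.argmin(axis=1)
print("divert when n in:", [n for n in range(C + 1) if policy[n] == 1])
```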
