Similar Documents
20 similar documents found (search time: 15 ms)
1.
This paper investigates an optimal inspection and replacement problem for a discrete-time Markovian deterioration system. It is assumed that the system is monitored incompletely by a mechanism that gives the decision maker partial information about the exact state of the system. The problem of finding an inspection and replacement policy that minimizes the expected total discounted cost over an infinite horizon is formulated as a partially observable Markov decision process. Furthermore, under some reasonable conditions reflecting the practical meaning of deterioration, it is shown that there exists an optimal inspection and replacement policy in the class of monotonic four-region policies.
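In such partially observable models the decision maker acts on a belief over the hidden deterioration states, updated by Bayes' rule after each (possibly imperfect) inspection. Below is a minimal sketch of that update; the two-state matrices and all numbers are invented for illustration and are not taken from the paper.

```python
import numpy as np

def belief_update(b, a, o, P, O):
    """Bayes update of the belief b after taking action a and observing o.

    P[a][s, s'] -- transition probabilities under action a
    O[a][s', o] -- probability of observing o in the new state s'
    """
    predicted = b @ P[a]                    # predict: sum_s b(s) P(s'|s,a)
    unnormalized = predicted * O[a][:, o]   # correct by observation likelihood
    return unnormalized / unnormalized.sum()

# Two deterioration states (good, worn), one "inspect" action, and a
# noisy inspection signal -- all numbers are illustrative.
P = {0: np.array([[0.9, 0.1],
                  [0.0, 1.0]])}
O = {0: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, P=P, O=O))
```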

2.
We investigate stationary hidden Markov processes for which mutual information between the past and the future is infinite. It is assumed that the number of observable states is finite and the number of hidden states is countably infinite. Under this assumption, we show that the block mutual information of a hidden Markov process is upper bounded by a power law determined by the tail index of the hidden state distribution. Moreover, we exhibit three examples of processes. The first example, considered previously, is nonergodic and the mutual information between the blocks is bounded by the logarithm of the block length. The second example is also nonergodic but the mutual information between the blocks obeys a power law. The third example obeys the power law and is ergodic.

3.
We consider a finite-state Markov decision process in which two rewards are associated with each action in each state. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary, although both depend on the starting state. A finite algorithm for computing the solution is also given.

4.
Algorithms are described for determining optimal policies for finite state, finite action, infinite discrete time horizon Markov decision processes. Both value-improvement and policy-improvement techniques are used in the algorithms. Computing procedures are also described. The algorithms are appropriate for processes that are either finite or infinite, deterministic or stochastic, discounted or undiscounted, in any meaningful combination of these features. Computing procedures are described in terms of initial data processing, bound improvements, process reduction, and testing and solution. Application of the methodology is illustrated with an example involving natural resource management. Management implications of certain hypothesized relationships between mallard survival and harvest rates are addressed by applying the optimality procedures to mallard population models.
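As an illustration of the value-improvement technique the abstract refers to, here is a minimal value-iteration sketch for a finite, discounted, stochastic MDP; the transition and reward arrays are invented toy numbers, not the mallard population model.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value improvement for a finite discounted MDP.

    P[a, s, s'] -- transition probabilities, R[a, s] -- expected one-step reward.
    Returns the optimal value function and a greedy (optimal) policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * P @ V     # Q[a, s] = R[a,s] + gamma * sum_s' P[a,s,s'] V[s']
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy two-state, two-action example (illustrative numbers only).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, policy = value_iteration(P, R)
print(V, policy)
```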

5.
Unlike in finite dimensions, a basic feasible solution characterization of extreme points does not hold in countably infinite linear programs. We develop regularity conditions under which such a characterization is possible. Applications to infinite network flow problems and non-stationary Markov decision processes are presented.

6.
We develop a Markov decision process formulation of a dynamic pricing problem for multiple substitutable flights between the same origin and destination, taking into account customer choice among the flights. The model is rendered computationally intractable for exact solution by its multi-dimensional state and action spaces, so we develop and analyze various bounds and heuristics. We first describe three related models, each based on some form of pooling, and introduce heuristics suggested by these models. We also develop separable bounds for the value function which are used to construct value- and policy-approximation heuristics. Extensive numerical experiments show the value- and policy-approximation approaches to work well across a wide range of problem parameters, and to outperform the pooling-based heuristics in most cases. The methods are applicable even for large problems, and are potentially useful for practical applications.

7.
We present necessary and sufficient conditions for discrete infinite horizon optimization problems with unique solutions to be solvable. These problems can be equivalently viewed as the task of finding a shortest path in an infinite directed network. We provide general forward algorithms with stopping rules for their solution. The key condition required is weak reachability, which roughly requires that, for any sequence of nodes or states, it must be possible from optimal states to reach states close in cost to states along this sequence. Moreover, the costs to reach these states must converge to zero. Applications are considered in optimal search, undiscounted Markov decision processes, and deterministic infinite horizon optimization. This work was supported in part by NSF Grant ECS-8700836 to The University of Michigan.

8.
Decision-making under uncertainty and imprecision in real-world problems is a complex task. This paper introduces general finite-state fuzzy Markov chains, which converge in finitely many steps to a stationary (possibly periodic) solution. The Cesàro average and the -potential for fuzzy Markov chains are defined, and it is shown that the relationship between them corresponds to the Blackwell formula in the classical theory of Markov decision processes. Furthermore, it is pointed out that recurrence does not necessarily imply ergodicity. However, if a fuzzy Markov chain is ergodic, then the rows of its ergodic projection equal the greatest eigen fuzzy set of the transition matrix. The fuzzy Markov chain is then shown to be a robust system with respect to small perturbations of the transition matrix, which is not the case for classical probabilistic Markov chains. Fuzzy Markov decision processes are finally introduced and discussed.
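A fuzzy Markov chain replaces ordinary matrix multiplication with the max-min composition, and its powers settle into a repeating (possibly periodic) pattern after finitely many steps. Below is a minimal sketch of that finite convergence; the 2×2 fuzzy transition matrix is illustrative only.

```python
import numpy as np

def maxmin(A, B):
    """Max-min composition: (A o B)[i, j] = max_k min(A[i, k], B[k, j])."""
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

def fuzzy_powers(P, max_steps=100):
    """Iterate max-min powers of a fuzzy transition matrix until they repeat.

    Because every power's entries come from the finite set of entries of P,
    the sequence of powers must eventually repeat, i.e. converge finitely
    to a stationary (possibly periodic) solution.
    """
    seen, Q = [P.copy()], P.copy()
    for _ in range(max_steps):
        Q = maxmin(Q, P)
        for t, R in enumerate(seen):
            if np.array_equal(Q, R):
                return seen, t          # powers repeat from index t onward
        seen.append(Q.copy())
    raise RuntimeError("no repetition within max_steps")

# Illustrative fuzzy transition matrix (rows need not sum to 1).
P = np.array([[0.7, 0.4],
              [0.5, 0.9]])
powers, period_start = fuzzy_powers(P)
print(len(powers), period_start)
```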

9.
This paper considers the solution of Markov decision problems whose parameters can be obtained only via approximating schemes, or where it is computationally preferable to approximate the parameters rather than employing exact algorithms for their computation. Various models are presented in which this situation occurs. Furthermore, it is shown that a modified value-iteration method may be employed, both for the discounted version and for the undiscounted version of the model, in order to solve the optimality equation and to find optimal policies. In both cases, the convergence rate is determined. As a side result, we characterize the asymptotic behavior of backward products of a geometrically convergent sequence of Markov matrices.
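A minimal sketch of the idea behind such a modified value iteration: each sweep uses the current, progressively more accurate approximations of the transition and reward parameters instead of their exact values. The functions `P_approx` and `R_approx` and the geometric error decay below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def approx_value_iteration(P_approx, R_approx, gamma=0.9, iters=200):
    """Value iteration driven by stage-k parameter approximations.

    P_approx(k) and R_approx(k) return approximations of the transition
    matrix P[a, s, s'] and reward R[a, s]; if the approximation error
    decays geometrically, the iterates still converge to the true values.
    """
    V = np.zeros(R_approx(0).shape[1])
    for k in range(iters):
        P, R = P_approx(k), R_approx(k)       # refreshed approximations
        V = (R + gamma * P @ V).max(axis=0)   # Bellman update on them
    return V

rng = np.random.default_rng(0)
P_true = np.array([[[0.8, 0.2], [0.1, 0.9]],
                   [[0.5, 0.5], [0.6, 0.4]]])
R_true = np.array([[1.0, 0.0], [0.5, 2.0]])

def P_approx(k):
    # Geometrically shrinking perturbation of the true parameters.
    noisy = P_true + 0.5 ** k * rng.uniform(-0.05, 0.05, P_true.shape)
    return noisy / noisy.sum(axis=2, keepdims=True)  # renormalize rows

print(approx_value_iteration(P_approx, lambda k: R_true))
```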

10.
In this paper we study a Markov decision model with quasi-hyperbolic discounting and transition probability function depending on an unknown parameter. Assuming that the set of parameters is finite, the sets of states and actions are Borel and the transition probabilities satisfy some additivity conditions and are atomless, we prove the existence of a non-randomised robust Markov perfect equilibrium.

11.
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.
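The algorithm exploits the classical linear-programming form of the dynamic programming equation. For orientation, here is that LP for a small discounted MDP, solved with scipy's generic solver rather than the paper's revised simplex variant; all data are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# LP form of the dynamic programming equation: minimize sum_s v(s) subject to
#   v(s) >= R[a, s] + gamma * sum_{s'} P[a, s, s'] v(s')   for all (s, a).
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
n_actions, n_states, _ = P.shape

# Rewrite each constraint as (gamma * P[a, s] - e_s) @ v <= -R[a, s].
A_ub, b_ub = [], []
for a in range(n_actions):
    for s in range(n_states):
        row = gamma * P[a, s].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[a, s])

res = linprog(c=np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
print(res.x)   # optimal value function v*
```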

12.
1. Introduction. The motivation for writing this paper came from calculating the blocking probability for an overloaded finite system. Our numerical experiments suggested that this probability can be approximated efficiently by rotating the transition matrix by 180°. Some preliminary results were obtained and can be found in [1] and [2]. Rotating the transition matrix defines a new Markov chain, which is often called the dual process in the literature, for example, [3–7]. For a finite Markov chain, …
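Rotating a transition matrix by 180° amounts to reversing the state order along both axes; a minimal sketch with an illustrative 3×3 chain:

```python
import numpy as np

# Rotating the transition matrix by 180 degrees reverses the state order
# in both rows and columns; the result is again a stochastic matrix and
# defines the dual Markov chain discussed in the abstract.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])

P_dual = P[::-1, ::-1]            # equivalent to np.rot90(P, 2)
assert np.allclose(P_dual.sum(axis=1), 1.0)
print(P_dual)
```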

13.
Summary. This paper develops a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process. This set is shown to be compact and convex. One then associates with each of the usual cost criteria (infinite horizon discounted cost, finite horizon, control up to an exit time) a naturally defined occupation measure such that the cost is an integral of some function with respect to this measure. These measures are shown to form a compact convex set whose extreme points are characterized. Classical results about the existence of optimal strategies are recovered from this, and several applications to multicriteria and constrained optimization problems are briefly indicated. Research supported by NSF Grant CDR-85-00108.

14.
Military medical planners must develop dispatching policies that dictate how aerial medical evacuation (MEDEVAC) units are utilized during major combat operations. The objective of this research is to determine how to optimally dispatch MEDEVAC units in response to 9-line MEDEVAC requests to maximize MEDEVAC system performance. A discounted, infinite horizon Markov decision process (MDP) model is developed to examine the MEDEVAC dispatching problem. The MDP model allows the dispatching authority to accept, reject, or queue incoming requests based on a request’s classification (i.e., zone and precedence level) and the state of the MEDEVAC system. A representative planning scenario based on contingency operations in southern Afghanistan is utilized to investigate the differences between the optimal dispatching policy and three practitioner-friendly myopic policies. Two computational experiments are conducted to examine the impact of selected MEDEVAC problem features on the optimal policy and the system performance measure. Several excursions are examined to identify how the 9-line MEDEVAC request arrival rate and the MEDEVAC flight speeds impact the optimal dispatching policy. Results indicate that dispatching MEDEVAC units considering the precedence level of requests and the locations of busy MEDEVAC units increases the performance of the MEDEVAC system. These results inform the development and implementation of MEDEVAC tactics, techniques, and procedures by military medical planners. Moreover, an analysis of solution approaches for the MEDEVAC dispatching problem reveals that the policy iteration algorithm substantially outperforms the linear programming algorithms executed by CPLEX 12.6 with regard to computational effort. This result supports the claim that policy iteration remains the superlative solution algorithm for exactly solving computationally tractable Markov decision problems.
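For reference, here is a minimal policy-iteration sketch for a generic finite discounted MDP, the algorithm the study found to outperform the LP solvers; the MEDEVAC model's state and action encodings are not reproduced here, and the toy arrays are invented.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Exact policy iteration for a finite discounted MDP.

    P[a, s, s'] -- transition probabilities, R[a, s] -- expected reward.
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n_states)]
        R_pi = R[policy, np.arange(n_states)]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * P @ V
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
print(policy_iteration(P, R))
```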

15.
This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
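To convey the occupation-measure idea in the finite case, here is a sketch of the standard discounted occupation-measure LP with one extra cost constraint; the paper's first-passage criterion and state-dependent discount factors add structure that this toy version deliberately omits, and all data are invented.

```python
import numpy as np
from scipy.optimize import linprog

# Occupation-measure LP for a finite constrained MDP (standard discounted
# form): variables x[s, a] >= 0, flow-balance equalities, one cost to
# minimize and one constrained cost kept under a budget.
gamma, n_states, n_actions = 0.9, 2, 2
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])   # P[a, s, s']
c = np.array([[1.0, 0.2], [0.4, 1.5]])     # main cost c[a, s]
d = np.array([[0.0, 1.0], [1.0, 0.0]])     # constrained cost d[a, s]
mu = np.array([0.5, 0.5])                  # initial distribution
budget = 3.0

# Flow balance: sum_a x(s2,a) - gamma * sum_{s,a} P(s2|s,a) x(s,a) = mu(s2).
idx = lambda s, a: s * n_actions + a
A_eq = np.zeros((n_states, n_states * n_actions))
for s2 in range(n_states):
    for s in range(n_states):
        for a in range(n_actions):
            A_eq[s2, idx(s, a)] = (s == s2) - gamma * P[a, s, s2]

cost = np.array([c[a, s] for s in range(n_states) for a in range(n_actions)])
dcon = np.array([[d[a, s] for s in range(n_states) for a in range(n_actions)]])

res = linprog(c=cost, A_ub=dcon, b_ub=[budget], A_eq=A_eq, b_eq=mu)
x = res.x.reshape(n_states, n_actions)
policy = x / x.sum(axis=1, keepdims=True)  # possibly randomized optimal policy
print(policy)
```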

16.
In an M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. On the other hand, rejecting new arrivals increases the percentage of time servers are idle, which also may not be desirable. We address these trade-offs by considering an admission control problem for an M/M/N+M queue when there are costs associated with customer abandonment, server idleness, and turning away customers. First, we formulate the relevant Markov decision process (MDP), show that the optimal policy is of threshold form, and provide a simple and efficient iterative algorithm that does not presuppose a bounded state space to compute the minimum infinite horizon expected average cost and associated threshold level. Under certain conditions we can guarantee that the algorithm provides an exact optimal solution when it stops; otherwise, the algorithm stops when a provided bound on the optimality gap is reached. Next, we solve the approximating diffusion control problem (DCP) that arises in the Halfin–Whitt many-server limit regime. This allows us to establish that the parameter space has a sharp division. Specifically, there is an optimal solution with a finite threshold level when the cost of an abandonment exceeds the cost of rejecting a customer; otherwise, there is an optimal solution that exercises no control. This analysis also yields a convenient analytic expression for the infinite horizon expected average cost as a function of the threshold level. Finally, we propose a policy for the original system that is based on the DCP solution, and show that this policy is asymptotically optimal. Our extensive numerical study shows that the control that arises from solving the DCP achieves a very similar cost to the control that arises from solving the MDP, even when the number of servers is small.
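A minimal sketch of how a candidate threshold policy for such a queue can be evaluated: each threshold induces a finite birth-death chain whose stationary distribution yields the long-run average cost, so scanning thresholds locates the best one. The parameters and cost rates below are illustrative, and this brute-force scan merely stands in for the paper's iterative algorithm.

```python
import numpy as np

def threshold_cost(lam, mu, theta, N, K, c_aban, c_idle, c_rej):
    """Long-run average cost of the policy that rejects arrivals when
    N + K customers are present, in an M/M/N+M queue.

    Evaluates the induced birth-death chain on states 0..N+K.
    """
    states = N + K + 1
    pi = np.ones(states)
    for n in range(1, states):
        death = min(n, N) * mu + max(n - N, 0) * theta
        pi[n] = pi[n - 1] * lam / death     # detailed balance
    pi /= pi.sum()
    n = np.arange(states)
    aban_rate = np.sum(pi * np.maximum(n - N, 0) * theta)  # abandonments/time
    idle = np.sum(pi * np.maximum(N - n, 0))               # mean idle servers
    rej_rate = lam * pi[-1]                                # rejections/time
    return c_aban * aban_rate + c_idle * idle + c_rej * rej_rate

# Scan thresholds to locate the best one (illustrative parameters).
costs = [threshold_cost(lam=9.0, mu=1.0, theta=0.5, N=10, K=K,
                        c_aban=5.0, c_idle=1.0, c_rej=3.0) for K in range(30)]
print(int(np.argmin(costs)), min(costs))
```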

17.
We consider a service system with a single server, a finite waiting room and two classes of customers with deterministic service time. Primary jobs arrive at random and are admitted whenever there is room in the system. At the beginning of each period, secondary jobs can be admitted from an infinite pool. A revenue is earned upon admission of each job, with the primary jobs bringing a higher contribution than the secondary jobs, the objective being to maximize the total discounted revenue over an infinite horizon. We model the system as a discrete time Markov decision process and show that a monotone admission policy is optimal when the number of primary arrivals has a fixed distribution. Moreover, when the primary arrival distribution varies with time according to a finite state Markov chain, we show that the optimal policy is conditionally monotone and that the numerical computation of an optimal policy, in this case, is substantially more difficult than in the case of stationary arrivals. This research was supported in part by the National Science Foundation, under grant ECS-8803061, while the author was at the University of Arizona.

18.
19.
Dynamic principal-agent models are formulated as constrained Markov decision processes (CMDPs) in which the state space of the system is countable and the agent chooses actions from a countable action set. If the principal has finitely many alternative contracts to select from, it is shown that the optimal contract and the corresponding optimal policy can be obtained by linear programming under both the discounted criterion and the average criterion. The paper is supported by the Zhejiang Provincial Natural Science Foundation of China (701017) and by the Scientific Research Fund of the Zhejiang Provincial Education Department.

20.
In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), a Markov strategy is always available which uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property. Examples are given to show that approximations by discounted or finite horizon payoffs are not useful for the general average reward problem.
