Similar Documents
20 similar documents found (search time: 421 ms).
1.
Linear Programming is known to be an important and useful tool for solving Markov Decision Processes (MDPs). Its derivation relies on the Dynamic Programming approach, which also serves to solve MDPs. However, for Markov Decision Processes with several constraints, the only available methods are based on Linear Programs. The aim of this paper is to investigate some aspects of such Linear Programs related to multi-chain MDPs. We first present a stochastic interpretation of the decision variables that appear in the Linear Programs available in the literature. We then show, for the multi-constrained Markov Decision Process, that the Linear Program suggested in [9] can be obtained from an equivalent unconstrained Lagrange formulation of the control problem. This shows the connection between the Linear Program approach and the Lagrange approach, which was previously used only for the case of a single constraint [3, 14, 15].
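To make the LP idea concrete, here is a minimal sketch, not the multi-chain LP of [9]: the occupation-measure LP for a small discounted constrained MDP with invented data, solved with scipy.optimize.linprog. It illustrates the stochastic interpretation of the LP decision variables as discounted state-action frequencies; normalizing them by state yields a stationary (possibly randomized) policy.

```python
# A hedged sketch: occupation-measure LP for a discounted constrained MDP.
# All model data below are made up for illustration only.
import numpy as np
from scipy.optimize import linprog

nS, nA, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition kernel
r = rng.uniform(0, 1, size=(nS, nA))            # reward r(s, a)
cost = rng.uniform(0, 1, size=(nS, nA))         # constrained cost c(s, a)
budget = 4.0                                    # expected-cost budget
alpha = np.full(nS, 1.0 / nS)                   # initial state distribution

# Variables rho(s, a) >= 0, flattened to length nS * nA.
# Balance: sum_a rho(s', a) - beta * sum_{s,a} rho(s, a) P(s'|s, a) = alpha(s')
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (1.0 if s == sp else 0.0) - beta * P[s, a, sp]

res = linprog(c=-r.ravel(),                     # maximize expected discounted reward
              A_ub=cost.ravel()[None, :], b_ub=[budget],   # one expected-cost constraint
              A_eq=A_eq, b_eq=alpha, bounds=(0, None))
rho = res.x.reshape(nS, nA)
policy = rho / rho.sum(axis=1, keepdims=True)   # stochastic interpretation of rho
print("optimal constrained value:", -res.fun)
print("stationary (randomized) policy:\n", np.round(policy, 3))
```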

2.
This note addresses the time aggregation approach to ergodic finite-state Markov decision processes with uncontrollable states. We propose the use of time aggregation as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.

3.
This paper develops an efficient LP algorithm for solving single-chain undiscounted Markov decision problems. The algorithm imposes, within the framework of the simplex method, the multiple-choice constraints that exactly one basic variable be chosen from each Markov state. It is proved that the algorithm converges to an optimal solution in a finite number of steps.
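For reference, the following is a hedged sketch (plain simplex via scipy with made-up data, not the paper's specialized multiple-choice pivoting) of the standard state-action frequency LP for a unichain average-reward MDP; basic feasible solutions of this LP roughly correspond to choosing one action per recurrent state, which is the structure the paper's constraints enforce inside the simplex method.

```python
# A hedged sketch: state-action frequency LP for a unichain average-reward MDP.
import numpy as np
from scipy.optimize import linprog

nS, nA = 4, 2
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s']
r = rng.uniform(0, 1, size=(nS, nA))

# Variables x(s, a) >= 0: long-run state-action frequencies.
# Balance rows plus one normalization row (the balance rows are rank-deficient
# by one; the HiGHS presolve in scipy handles the redundancy).
A_eq = np.zeros((nS + 1, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (1.0 if s == sp else 0.0) - P[s, a, sp]
A_eq[nS, :] = 1.0                               # frequencies sum to one
b_eq = np.append(np.zeros(nS), 1.0)

res = linprog(c=-r.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(nS, nA)
print("optimal average reward:", -res.fun)
print("positive frequencies per state:", (x > 1e-9).sum(axis=1))
```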

4.
The planning horizon is a key issue in production planning. Unlike previous approaches based on Markov Decision Processes, we study the planning horizon of capacity planning problems within the framework of stochastic programming. We first consider an infinite-horizon stochastic capacity planning model involving a single resource, a linear cost structure, and discrete distributions for general stochastic cost and demand data (non-Markovian and non-stationary). We give sufficient conditions for the existence of an optimal solution. Furthermore, we study the monotonicity property of the finite-horizon approximation of the original problem. We show that the optimal objective value and solution of the finite-horizon approximation problem converge to those of the infinite-horizon problem as the time horizon goes to infinity. These convergence results, together with the integrality of the decision variables, imply the existence of a planning horizon. We also develop a useful formula to calculate an upper bound on the planning horizon. Then, by decomposition, we show the existence of a planning horizon for a class of very general stochastic capacity planning problems with a complicated decision structure.

5.
6.
Finite and infinite planning horizon Markov decision problems are formulated for a class of jump processes with general state and action spaces and controls that are measurable functions on the time axis taking values in an appropriate metrizable vector space. For the finite-horizon problem, the maximum expected reward is shown to exist as the unique solution of a certain differential equation and is a strongly continuous function in the space of upper semi-continuous functions. A necessary and sufficient condition is provided for an admissible control to be optimal, and a sufficient condition is provided for the existence of a measurable optimal policy. For the infinite-horizon problem, the maximum expected total reward is the fixed point of a certain operator on the space of upper semi-continuous functions. A stationary policy is optimal over all measurable policies in the transient and discounted cases as well as, with certain added conditions, in the positive and negative cases.

7.
Sensitivity analysis of bilevel decision-making problems (2)
A bilevel decision system contains two optimization problems, where the objective value of the upper-level problem is implicitly determined by the solution of the lower-level problem. This paper studies another aspect of sensitivity analysis for bilevel decision problems: the situation in which the value coefficients of the upper-level decision maker change while the optimal solution of the bilevel problem remains unchanged. To determine the range over which these value coefficients may vary, we first present the basic method of the sensitivity analysis, and then, combining it with the k-th-best algorithm, we give the operational steps of the analysis. Within the determined range, changes in the value coefficients do not change the global optimal solution of the bilevel problem, which provides the decision maker with a relatively stable decision scheme. Finally, a numerical example is given, showing that the sensitivity analysis method presented in this paper is correct.

8.
Asset allocation among diverse financial markets is essential for investors, especially in situations such as the financial crisis of 2008. Portfolio optimization is the most developed method for examining the optimal decision for asset allocation. We employ a hidden Markov model to identify regimes in varied financial markets; a regime-switching model gives multiple distributions, and this information converts the static mean–variance model into an optimization problem under uncertainty, as is the case with unobservable market regimes. We construct a stochastic program to optimize portfolios under the regime-switching framework and use scenario generation to mathematically formulate the optimization problem. In addition, we build a simple example for a pension fund and examine the behavior of the optimal solution over time by using a rolling-horizon simulation. We conclude that the regime information helps portfolios avoid risk during left-tail events.
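As a rough, self-contained illustration of the idea (all numbers invented, and a one-period mean-variance step rather than the paper's multi-stage stochastic program): regime probabilities and regime-conditional moments, which in the paper would come from a fitted hidden Markov model, are mixed into single mean/covariance inputs, and an unconstrained fully invested mean-variance portfolio is computed.

```python
# A hedged sketch of regime-aware mean-variance inputs; illustration data only.
import numpy as np

p = np.array([0.8, 0.2])                        # prob. of "calm" vs "crisis" regime
mu = np.array([[0.08, 0.05, 0.02],              # regime-conditional expected returns
               [-0.15, -0.05, 0.03]])
sig = np.array([np.diag([0.02, 0.01, 0.001]),   # regime-conditional covariances
                np.diag([0.10, 0.05, 0.001])])

# Mixture moments: mean, and covariance via the law of total variance.
m = p @ mu
cov = sum(p[k] * (sig[k] + np.outer(mu[k], mu[k])) for k in range(2)) - np.outer(m, m)

w = np.linalg.solve(cov, m)                     # mean-variance direction Sigma^{-1} mu
w = w / w.sum()                                 # normalise to a fully invested portfolio
print("weights:", np.round(w, 3))               # no short-sale or budget constraints here
```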

9.
This paper presents an application of Lemke's method to a class of Markov decision problems appearing in optimal stopping and other well-known optimization problems. We consider a special case of Markov decision problems with finitely many states, where the agent can choose one of the alternatives: getting a fixed reward immediately, or paying the penalty for one term. We show that the problem can be reduced to a linear complementarity problem that can be solved by Lemke's method in a number of iterations less than the number of states. The reduced linear complementarity problem does not necessarily satisfy the copositive-plus condition. Nevertheless, we show that Lemke's method succeeds in solving the problem by proving that the problem satisfies a necessary and sufficient condition for the extended Lemke method to compute a solution of the piecewise-linear complementarity problem.
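A hedged sketch of the kind of reduction described above, with invented data: a discounted optimal stopping problem (stop and collect g(s), or receive r(s) and continue) can be written as the linear complementarity problem w = Mz + q, z >= 0, w >= 0, z'w = 0 with M = I - beta*P and q = (I - beta*P)g - r. Instead of Lemke's pivoting, the sketch solves the problem by plain value iteration and then checks the complementarity conditions numerically.

```python
# A hedged sketch: LCP structure of a discounted optimal stopping problem.
import numpy as np

rng = np.random.default_rng(1)
n, beta = 5, 0.95
P = rng.dirichlet(np.ones(n), size=n)   # transition matrix of the uncontrolled chain
g = rng.uniform(0, 10, n)               # reward collected when stopping
r = rng.uniform(0, 1, n)                # one-period reward when continuing

# Value iteration on v = max(g, r + beta * P v)
v = np.zeros(n)
for _ in range(10_000):
    v_new = np.maximum(g, r + beta * P @ v)
    if np.max(np.abs(v_new - v)) < 1e-12:
        break
    v = v_new

# LCP variables: z = v - g, w = M z + q with M = I - beta*P, q = M g - r
M = np.eye(n) - beta * P
q = M @ g - r
z = v - g
w = M @ z + q
print("z >= 0:", np.all(z >= -1e-9),
      " w >= 0:", np.all(w >= -1e-9),
      " z'w ~ 0:", abs(z @ w) < 1e-6)
```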

10.
In multi-location inventory systems, transshipments are often used to improve customer service and reduce cost. Determining optimal transshipment policies for such systems involves a complex optimisation problem that is tractable only for systems with few locations. Consequently, simple heuristic transshipment policies are often applied in practice. This paper develops an approximate solution method that applies decomposition to reduce a Markov decision process model of a multi-location inventory system to a number of models involving only two locations. The value functions from the subproblems are used to estimate the fair charge for the inventory provided in a transshipment. This estimate of the fair charge is used as the decision criterion in a heuristic transshipment policy for the multi-location system. A numerical study shows that the proposed heuristic can deliver considerable cost savings compared to the simple heuristics often used in practice.

11.
We evaluate the benefits of coordinating capacity and inventory decisions in a make-to-stock production environment. We consider a firm that faces multi-class demand and has additional capacity options that are temporary and randomly available. We formulate the model as a Markov decision process (MDP) and prove that a solution to the optimal joint control problem exists. For several special cases we characterize the structure of the optimal policy. For the general case, however, we show that the optimal policy is state-dependent, and in many instances non-monotone and difficult to implement. Therefore, we consider three pragmatic heuristic policies and assess their performance. We show that the majority of the savings originate from the ability to dynamically adjust capacity, and that a simple heuristic that can adjust production capacity (based on workload fluctuation) but uses a static production/rationing policy can result in significant savings.

12.
We present necessary and sufficient conditions for discrete infinite-horizon optimization problems with unique solutions to be solvable. These problems can equivalently be viewed as the task of finding a shortest path in an infinite directed network. We provide general forward algorithms with stopping rules for their solution. The key condition required is that of weak reachability, which roughly requires that, for any sequence of nodes or states, it must be possible from optimal states to reach states close in cost to the states along this sequence. Moreover, the costs to reach these states must converge to zero. Applications are considered in optimal search, undiscounted Markov decision processes, and deterministic infinite-horizon optimization. This work was supported in part by NSF Grant ECS-8700836 to The University of Michigan.

13.
In this paper, we study discounted Markov decision processes on an uncountable state space. We allow the utility (reward) function to be unbounded both from above and from below. A new feature of our approach is an easily verifiable rate-of-growth condition introduced for the positive part of the utility function. This assumption, in turn, enables us to prove the convergence of a value iteration algorithm to a solution of the Bellman equation. Moreover, by virtue of the optimality equation, we show the existence of an optimal stationary policy.
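A finite-state sketch of the algorithm analysed in the paper (random toy data; the paper's uncountable state space and unbounded utilities are of course not captured here): value iteration is run until the Bellman residual is small, and a greedy stationary policy is read off from the resulting Q-values.

```python
# A hedged sketch: value iteration and greedy policy extraction for a toy MDP.
import numpy as np

rng = np.random.default_rng(4)
nS, nA, beta = 6, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s']
r = rng.normal(0, 1, size=(nS, nA))             # rewards may be negative

v = np.zeros(nS)
for k in range(100_000):
    Q = r + beta * P @ v                        # Bellman operator applied to v
    v_new = Q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:       # sup-norm contraction => convergence
        break
    v = v_new

policy = Q.argmax(axis=1)                       # stationary policy greedy in v
print("iterations:", k)
print("value:", np.round(v, 3))
print("policy:", policy)
```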

14.
In this paper, a continuous-time discounted dynamic programming problem in a Markov decision model is investigated. In many cases it is difficult to search directly for an optimal solution of such a programming problem. We introduce a Lagrangian-type programming problem associated with the original one and show that, under some assumptions, a weak optimal solution exists for the Lagrangian problem. Moreover, we embed the original programming problem in a perturbed programming problem and develop the Lagrangian duality.

15.
We consider the optimal asset allocation problem in a continuous-time regime-switching market. The problem is to maximize the expected utility of the terminal wealth of a portfolio that contains an option, an underlying stock, and a risk-free bond. The difficulty that arises in our setting is finding a way to represent the return of the option through the returns of the stock and the risk-free bond in an incomplete regime-switching market. To overcome this difficulty, we introduce a functional operator to generate a sequence of value functions, and then show that the optimal value function is the limit of this sequence. The explicit form of each function in the sequence can be obtained by solving an auxiliary portfolio optimization problem in a single-regime market, and the original optimal value function can then be approximated by taking the limit. Additionally, we show that the optimal value function is a solution to a dynamic programming equation, which leads to explicit forms for the optimal value function and the optimal portfolio process. Furthermore, we demonstrate that, as long as the current state of the Markov chain is given, it is still optimal for an investor in a multiple-regime market to simply allocate his/her wealth in the same way as in a single-regime market.

16.
We computationally assess policies for the elevator control problem by a new column-generation approach for the linear programming method for discounted infinite-horizon Markov decision problems. By analyzing the optimality of given actions in given states, we were able to provably improve the well-known nearest-neighbor policy. Moreover, with the method we could identify an optimal parking policy. This approach can be used to detect and resolve weaknesses in particular policies for Markov decision problems.
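A minimal illustration of the per-state optimality check that drives such a column-generation approach (hypothetical random data, not the elevator model): given a fixed policy for a discounted finite MDP, its value function is obtained from a linear system, and state-action pairs whose Q-value strictly improves on the policy's action are flagged as candidates for improvement.

```python
# A hedged sketch: checking the optimality of given actions in given states.
import numpy as np

rng = np.random.default_rng(2)
nS, nA, beta = 4, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s']
r = rng.uniform(0, 1, size=(nS, nA))

policy = np.zeros(nS, dtype=int)               # stand-in for a given heuristic policy
P_pi = P[np.arange(nS), policy]                # transition matrix under the policy
r_pi = r[np.arange(nS), policy]
v_pi = np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)   # v = (I - beta P_pi)^{-1} r_pi

Q = r + beta * P @ v_pi                        # Q(s, a) evaluated with the policy's value
for s in range(nS):
    better = np.flatnonzero(Q[s] > Q[s, policy[s]] + 1e-9)
    if better.size:
        print(f"state {s}: actions {better.tolist()} improve on action {policy[s]}")
```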

17.
Value iteration and optimization of multiclass queueing networks
Chen, Rong-Rong; Meyn, Sean. Queueing Systems, 1999, 32(1-3): 65-97.
This paper considers in parallel the scheduling problem for multiclass queueing networks and the optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when it is not initialized properly. The most typical choice, taking the initial value function to be zero, may be a particularly bad one. In contrast, if the value iteration algorithm is initialized with a stochastic Lyapunov function, then the following hold: (i) a stochastic Lyapunov function exists for each intermediate policy, and hence each policy is regular (a strong stability condition); (ii) intermediate costs converge to the optimal cost; and (iii) any limiting policy is average-cost optimal. It is argued that a natural choice for the initial value function is the value function of the associated deterministic control problem based on a fluid model, or the approximate solution to Poisson's equation obtained from the LP of Kumar and Meyn. Numerical studies show that either choice may lead to fast convergence to an optimal policy.

18.
In just-in-time (JIT) production systems, there is both input stock in the form of parts and output stock in the form of product at each stage. These activities are controlled by production-ordering and withdrawal kanbans. This paper discusses a discrete-time optimal control problem in a multistage JIT-based production and distribution system with stochastic demand and capacity, developed to minimize the expected total cost per unit of time. The problem can be formulated as an undiscounted Markov decision process (UMDP); however, the curse of dimensionality makes it very difficult to find an exact solution. The author proposes a new neuro-dynamic programming (NDP) algorithm, the simulation-based modified policy iteration method (SBMPIM), to solve the optimal control problem. The existing NDP algorithms and the SBMPIM are numerically compared with a traditional UMDP algorithm for a single-stage JIT production system. It is shown that all NDP algorithms except the SBMPIM fail to converge to an optimal control. Additionally, a new algorithm for finding the optimal parameters of pull systems is proposed. Numerical comparisons between near-optimal controls computed using the SBMPIM and optimized pull systems are conducted for three-stage JIT-based production and distribution systems. UMDPs with 42 million states are solved using the SBMPIM. The pull systems discussed are the kanban, base stock, CONWIP, hybrid, and extended kanban systems.

19.
A new decomposition method for multistage stochastic linear programming problems is proposed. A multistage stochastic problem is represented in a tree-like form, and with each node of the decision tree a certain linear or quadratic subproblem is associated. The subproblems generate proposals for their successors and backward information for their predecessors. The subproblems can be solved in parallel and exchange information in an asynchronous way through special buffers. After a finite time the method either finds an optimal solution to the problem or discovers its inconsistency. An analytical illustrative example shows that parallelization can speed up computation over every sequential method. Computational experiments indicate that for large problems we can obtain substantial gains in efficiency with moderate numbers of processors. This work was partly supported by the International Institute for Applied Systems Analysis, Laxenburg, Austria.

20.
In this paper, by considering the experts' vague or fuzzy understanding of the nature of the parameters in the problem-formulation process, multiobjective linear fractional programming problems with block angular structure involving fuzzy numbers are formulated. Using the α-level sets of fuzzy numbers, the corresponding nonfuzzy α-multiobjective linear fractional programming problem is introduced. The fuzzy goals of the decision maker for the objective functions are quantified by eliciting the corresponding membership functions, including nonlinear ones. Through the introduction of extended Pareto optimality concepts, if the decision maker specifies the degree α and the reference membership values, the corresponding extended Pareto optimal solution can be obtained by solving the minimax problems, for which the Dantzig-Wolfe decomposition method and Ritter's partitioning procedure are applicable. Then a linear programming-based interactive fuzzy satisficing method with decomposition procedures for efficiently deriving a satisficing solution for the decision maker from an extended Pareto optimal solution set is presented. An illustrative numerical example is provided to demonstrate the feasibility of the proposed method.
