Similar Articles
20 similar articles found.
1.
This paper develops a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process. This set is shown to be compact convex. One then associates with each of the usual cost criteria (infinite-horizon discounted cost, finite horizon, control up to an exit time) a naturally defined occupation measure such that the cost is an integral of some function with respect to this measure. These measures are shown to form a compact convex set whose extreme points are characterized. Classical results about the existence of optimal strategies are recovered from this, and several applications to multicriteria and constrained optimization problems are briefly indicated. Research supported by NSF Grant CDR-85-00108.

2.
This paper discusses a class of non-homogeneous partially observable Markov decision models. Without changing the countability of the state space, the model is transformed into the generalized discounted model of [5], which settles its optimal-policy problem; a finite-horizon approximation algorithm for the model, involving only countably many states, is also obtained.

3.
This paper derives an inventory model for deteriorating items with stock-dependent consumption rate and shortages under inflation and time discounting over a finite planning horizon. We show that the total cost function is convex. With the convexity, a simple solution algorithm is presented to determine the optimal order quantity and the optimal interval of the total cost function. The results are discussed with a numerical example and particular cases of the model are discussed in brief. A sensitivity analysis of the optimal solution with respect to the parameters of the system is carried out.
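The solution algorithm above rests on convexity of the total cost. A minimal illustration of exploiting that property, using the classic EOQ average-cost function rather than the paper's inflation/deterioration model (the demand, ordering, and holding parameters below are hypothetical):

```python
import math

def eoq_cost(q, demand=1000.0, order_cost=50.0, holding=2.0):
    """Average total cost per unit time for order quantity q > 0."""
    return order_cost * demand / q + holding * q / 2.0

def minimize_convex(f, lo, hi, tol=1e-9):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

q_star = minimize_convex(eoq_cost, 1.0, 10000.0)
# Closed-form EOQ for comparison: sqrt(2 * order_cost * demand / holding)
q_exact = math.sqrt(2 * 50.0 * 1000.0 / 2.0)
print(q_star, q_exact)
```

For a convex cost the search is guaranteed to find the global minimizer, which is the essential point the paper's convexity proof establishes for its more elaborate cost function.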

4.
Continuous time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion function. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy, the policy so obtained is shown optimal over all Markov policies.
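The policy iteration scheme can be illustrated on a tiny discrete-time discounted model (the paper works in continuous time and shows the average return reduces to a discounted return of a related process). The 2-state, 2-action MDP, rewards, and discount factor below are all hypothetical:

```python
import numpy as np

# Hypothetical MDP: P[a, s] is the transition row out of state s under
# action a; r[s, a] is the immediate reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],    # action 0
              [[0.5, 0.5],
               [0.6, 0.4]]])   # action 1
r = np.array([[1.0, 0.0],      # rewards in state 0 for actions 0, 1
              [0.0, 2.0]])     # rewards in state 1 for actions 0, 1
beta = 0.9                     # discount factor
n_states, n_actions = 2, 2

policy = np.zeros(n_states, dtype=int)
while True:
    # Evaluation: solve (I - beta * P_pi) v = r_pi for the current policy.
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    r_pi = np.array([r[s, policy[s]] for s in range(n_states)])
    v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
    # Improvement: act greedily with respect to v.
    q = np.array([[r[s, a] + beta * P[a, s] @ v for a in range(n_actions)]
                  for s in range(n_states)])
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break  # greedy policy unchanged: it is optimal
    policy = new_policy
print(policy, v)
```

On a finite model the iteration terminates in finitely many steps because each improvement strictly increases the value and there are finitely many deterministic policies.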

5.
An EOQ model with inventory deterioration and allowable shortages
An EOQ model is proposed in which the inventory deterioration loss varies with both time and stock level and shortages are allowed. The average total cost function of the model is shown to be convex under given conditions, and the optimal policy and an approximate solution of the model are discussed.

6.
This paper provides a characterization of the optimal average cost function, when the long-run (risk-sensitive) average cost criterion is used. The Markov control model has a denumerable state space with finite set of actions, and the characterization presented is given in terms of a system of local Poisson equations, which gives as a by-product the existence of an optimal stationary policy.

7.
《Optimization》2012,61(2):255-269
Constrained Markov decision processes with compact state and action spaces are studied under long-run average reward or cost criteria. By introducing a corresponding Lagrange function, a saddle-point theorem is given, by which the existence of a constrained optimal pair of initial state distribution and policy is shown. Also, under the hypothesis of Doeblin, the functional characterization of a constrained optimal policy is obtained.

8.
We establish a flexible capacity strategy model with multiple market periods under demand uncertainty and investment constraints. In the model, a firm makes its capacity decision under a financial budget constraint at the beginning of the planning horizon, which embraces n market periods. In each market period, the firm goes through three decision-making stages: the safety production stage, the additional production stage and the optimal sales stage. We formulate the problem and obtain the optimal capacity, the optimal safety production, the optimal additional production and the optimal sales of each market period under different situations. We find that there are two thresholds for the unit capacity cost. When the capacity cost is very low, the optimal capacity is determined by the financial budget; when the capacity cost is very high, the firm keeps its optimal capacity at its safety production level; and when the cost is in between the two thresholds, the optimal capacity is determined by the capacity cost, the number of market periods and the unit cost of additional production. Further, we explore the endogenous safety production level and verify the conditions under which the firm has different optimal safety production levels. Finally, we prove that the firm can benefit from the investment only when the designed planning horizon is longer than a threshold, and we derive the formulae for the above three thresholds.

9.
We consider the problem of optimally maintaining a periodically inspected system that deteriorates according to a discrete-time Markov process and has a limit on the number of repairs that can be performed before it must be replaced. After each inspection, a decision maker must decide whether to repair the system, replace it with a new one, or leave it operating until the next inspection, where each repair makes the system more susceptible to future deterioration. If the system is found to be failed at an inspection, then it must be either repaired or replaced with a new one at an additional penalty cost. The objective is to minimize the total expected discounted cost due to operation, inspection, maintenance, replacement and failure. We formulate an infinite-horizon Markov decision process model and derive key structural properties of the resulting optimal cost function that are sufficient to establish the existence of an optimal threshold-type policy with respect to the system’s deterioration level and cumulative number of repairs. We also explore the sensitivity of the optimal policy to inspection, repair and replacement costs. Numerical examples are presented to illustrate the structure and the sensitivity of the optimal policy.
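The threshold structure in the deterioration level can be seen on a stripped-down replace-or-operate model. This sketch deliberately omits the paper's repair limit and penalty costs; the deterioration chain, operating costs, and replacement cost K below are hypothetical:

```python
import numpy as np

# Deterioration levels 0..4; the level rises by one w.p. 0.5 each period
# (capped at 4). Action "operate" pays a level-dependent cost; action
# "replace" pays a fixed cost K and resets the level to 0.
N, K, beta = 5, 10.0, 0.95
op_cost = np.array([0.0, 1.0, 3.0, 6.0, 12.0])  # increasing in level

def backup(v):
    """One value-iteration step; returns (continue costs, replace cost)."""
    cont = np.array([op_cost[s]
                     + beta * (0.5 * v[s] + 0.5 * v[min(s + 1, N - 1)])
                     for s in range(N)])
    repl = K + op_cost[0] + beta * (0.5 * v[0] + 0.5 * v[1])
    return cont, repl

v = np.zeros(N)
for _ in range(2000):          # value iteration to near-convergence
    cont, repl = backup(v)
    v = np.minimum(cont, repl)

cont, repl = backup(v)
policy = (repl < cont).astype(int)  # 1 = replace at this level
print(policy)
```

Because the continuation cost is increasing in the deterioration level while the replacement cost is constant, the computed policy is monotone: operate below some level, replace at and above it.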

10.
A generalized EOQ model for deteriorating items is considered here in which the demand rate, deterioration rate, holding cost and ordering cost are all assumed to be continuous functions of time. Shortages are also allowed and are completely backlogged. The planning horizon is finite and the replenishment periods are assumed to be constant. The optimal replenishment policy and the decision rule which minimizes the total system cost are derived. A numerical example is given to illustrate the developed model. Sensitivity analysis is also presented for the given model.

11.
This paper studies constrained discounted semi-Markov decision programming: the problem of maximizing the discounted expected reward subject to a constraint on a discounted expected cost. Assuming a countable state set and compact nonempty Borel action sets, we give a necessary and sufficient condition for a p-constrained optimal policy and prove that, under suitable assumptions, a p-constrained optimal policy must exist.

12.
Most existing studies on software release policies use models based on the non-homogeneous Poisson process. In this paper, we discuss a software release policy based on a state space model. The state space model has a Gamma-Gamma type invariant conditional distribution. A cost model that accounts for the cost of removing errors in software systems and the risk cost due to software failure is used. The optimal release time that minimizes the expected cost in every test-debugging stage is discussed.

13.
We study a unichain Markov decision process, i.e. a controlled Markov process whose state process under a stationary policy is an ergodic Markov chain. Here the state and action spaces are assumed to be either finite or countable. When the state process is uniformly ergodic and the immediate cost is bounded, a policy that minimizes the long-term expected average cost also has an nth-stage sample-path cost that, with probability one, is asymptotically smaller than the nth-stage sample-path cost under any other non-optimal stationary policy with a larger expected average cost. This strengthens, in the Markov model case, the a.s. asymptotic optimality property frequently discussed in the literature.

14.
This note deals with Markov decision chains evolving on a denumerable state space. Under standard continuity-compactness requirements, an explicit example is provided to show that, with respect to a strong sample-path average reward criterion, the Lyapunov function condition does not ensure the existence of an optimal stationary policy.

15.
Testing is an important activity in product development. Past studies aimed at determining the optimal scheduling of tests often focused on single-stage testing in a sequential design process. This paper presents an analytical model for the scheduling of tests in an overlapped design process, where a downstream stage starts before the completion of upstream testing. We derive optimal stopping rules for upstream and downstream testing, together with the optimal time elapsed between beginning the upstream tests and beginning the downstream development. We find that the cost function is first convex and then concave increasing with respect to the upstream testing duration. A one-dimensional search algorithm is then proposed for finding the unique optimum that minimizes the overall cost. Moreover, the impact of different model parameters, such as the problem-solving capacity and the opportunity cost, on the optimal solution is discussed. Finally, we compare the testing strategies in the overlapped process with those in the sequential process and obtain some additional results. The methodology is illustrated with a case study at a handset design company.

16.
This paper studies nonstationary Markov decision processes with the average-cost criterion on a general state space. The result that, in the stationary case, the average-cost optimality equation can be established via the optimality equation of a supplementary discounted model is extended to the nonstationary case, and this result is used to prove the existence of an optimal policy.

17.
Dynamic principal-agent models are formulated based on constrained Markov decision processes (CMDPs), in which the state space of the system is countable and the agent chooses his actions from a countable action set. If the principal has finitely many alternative contracts to select from, it is shown that the optimal contract and the corresponding optimal policy can be obtained by linear programming under both the discounted criterion and the average criterion. The paper is supported by the Zhejiang Provincial Natural Science Foundation of China (701017) and by the Scientific Research Fund of the Zhejiang Provincial Education Department.

18.
In this paper, we are concerned with a new algorithm for multichain finite-state Markov decision processes which finds an average optimal policy through the decomposition of the state space into communicating classes and a transient class. For each communicating class, a relatively optimal policy is found, and these are used to construct an optimal policy by applying the value iteration algorithm. Using a pattern matrix that determines the behaviour pattern of the decision process, the decomposition of the state space is carried out effectively, so that the proposed algorithm simplifies the structured algorithm given in Leizarowitz's paper (Math Oper Res 28:553–586, 2003). A numerical example is given to illustrate the algorithm.
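The decomposition step can be sketched with an ordinary strongly connected components computation: states i and j communicate when each is reachable from the other under some action, i.e. when they share an SCC of the graph with an edge i -> j whenever P(j | i, a) > 0 for some action a. This is not the paper's pattern-matrix method, and the 5-state reachability graph below is hypothetical (state 4 can only leave its class, so it is transient):

```python
from collections import defaultdict

n = 5
edges = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: {0, 2}}  # i -> j reachable

def sccs(adj, n):
    """Kosaraju's algorithm: list of strongly connected components."""
    visited, order = set(), []
    def dfs1(u):
        visited.add(u)
        for w in adj.get(u, ()):
            if w not in visited:
                dfs1(w)
        order.append(u)  # record postorder finishing time
    for u in range(n):
        if u not in visited:
            dfs1(u)
    radj = defaultdict(set)  # reversed graph
    for u in adj:
        for w in adj[u]:
            radj[w].add(u)
    comps, assigned = [], set()
    def dfs2(u, comp):
        assigned.add(u)
        comp.append(u)
        for w in radj[u]:
            if w not in assigned:
                dfs2(w, comp)
    for u in reversed(order):  # sweep reversed graph in reverse postorder
        if u not in assigned:
            comp = []
            dfs2(u, comp)
            comps.append(sorted(comp))
    return comps

comps = sccs(edges, n)
# A class with an edge leaving it is not closed: its states are transient.
transient = [c for c in comps
             if any(w not in c for u in c for w in edges.get(u, ()))]
communicating = [c for c in comps if c not in transient]
print(communicating, transient)
```

Once the closed communicating classes are isolated, each can be solved as a self-contained average-cost problem, which is the structure the paper's algorithm exploits.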

19.
In this paper we study continuous-time Markov decision processes with a denumerable state space, a Borel action space, and unbounded transition and cost rates. The optimality criterion considered is the finite-horizon expected total cost. Under suitable conditions, we propose a finite approximation for the approximate computation of an optimal policy and the value function, and obtain the corresponding error estimates. Furthermore, our main results are illustrated with a controlled birth-and-death system.

20.
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model extends the first passage models for both discrete-time and continuous-time Markov decision processes.
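A discrete-time sketch of the minimum-nonnegative-solution idea (the paper treats the semi-Markov case): minimize the expected discounted cost accumulated until the state first enters a target set, by iterating the optimality operator upward from zero. The 3-state, 2-action model below is hypothetical:

```python
import numpy as np

# P[a, s] is the transition row out of state s under action a;
# state 2 is the absorbing target, with zero cost once reached.
P = np.array([[[0.7, 0.2, 0.1],
               [0.1, 0.6, 0.3],
               [0.0, 0.0, 1.0]],    # action 0
              [[0.3, 0.3, 0.4],
               [0.2, 0.2, 0.6],
               [0.0, 0.0, 1.0]]])   # action 1
c = np.array([[1.0, 3.0],   # c[s, a]: one-stage cost
              [2.0, 4.0],
              [0.0, 0.0]])  # no further cost in the target
beta, target = 0.9, 2

# Starting from v = 0, the iterates increase monotonically to the
# minimum nonnegative solution of the optimality equation.
v = np.zeros(3)
for _ in range(500):
    new_v = np.array([min(c[s, a] + beta * P[a, s] @ v for a in (0, 1))
                      for s in range(3)])
    new_v[target] = 0.0  # cost accumulation stops at the target
    v = new_v

policy = [min((0, 1), key=lambda a: c[s, a] + beta * P[a, s] @ v)
          for s in range(2)]
print(v.round(3), policy)
```

The resulting v is a fixed point of the Bellman operator for the first-passage problem, and the greedy policy with respect to it is stationary, mirroring the existence result in the abstract.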
