Similar Articles
20 similar articles found.
1.
This paper develops a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process. This set is shown to be compact convex. One then associates with each of the usual cost criteria (infinite-horizon discounted cost, finite horizon, control up to an exit time) a naturally defined occupation measure such that the cost is an integral of some function with respect to this measure. These measures are shown to form a compact convex set whose extreme points are characterized. Classical results about the existence of optimal strategies are recovered from this, and several applications to multicriteria and constrained optimization problems are briefly indicated. Research supported by NSF Grant CDR-85-00108.

2.
This paper discusses a class of non-homogeneous partially observable Markov decision models. Without changing the countability of the state space, the model is transformed into the generalized discounted model of [5], which settles its optimal-policy problem; a finite-horizon approximation algorithm for the model, involving only countably many states, is also obtained.

3.
This paper derives an inventory model for deteriorating items with stock-dependent consumption rate and shortages under inflation and time discounting over a finite planning horizon. We show that the total cost function is convex. With the convexity, a simple solution algorithm is presented to determine the optimal order quantity and the optimal interval of the total cost function. The results are discussed with a numerical example and particular cases of the model are discussed in brief. A sensitivity analysis of the optimal solution with respect to the parameters of the system is carried out.
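The solution algorithm above rests on convexity of the total cost. A minimal illustration of exploiting that property, using the classic EOQ average-cost function rather than the paper's inflation/deterioration model (the demand, ordering, and holding parameters below are hypothetical):

```python
import math

def eoq_cost(q, demand=1000.0, order_cost=50.0, holding=2.0):
    """Average total cost per unit time for order quantity q > 0."""
    return order_cost * demand / q + holding * q / 2.0

def minimize_convex(f, lo, hi, tol=1e-9):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

q_star = minimize_convex(eoq_cost, 1.0, 10000.0)
# Closed-form EOQ for comparison: sqrt(2 * order_cost * demand / holding)
q_exact = math.sqrt(2 * 50.0 * 1000.0 / 2.0)
print(q_star, q_exact)
```

For a convex cost the search is guaranteed to find the global minimizer, which is the essential point the paper's convexity proof establishes for its more elaborate cost function.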

4.
Continuous time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion function. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy, the policy so obtained is shown optimal over all Markov policies.
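The policy iteration scheme can be illustrated on a tiny discrete-time discounted model (the paper works in continuous time and shows the average return reduces to a discounted return of a related process). The 2-state, 2-action MDP, rewards, and discount factor below are all hypothetical:

```python
import numpy as np

# Hypothetical MDP: P[a, s] is the transition row out of state s under
# action a; r[s, a] is the immediate reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],    # action 0
              [[0.5, 0.5],
               [0.6, 0.4]]])   # action 1
r = np.array([[1.0, 0.0],      # rewards in state 0 for actions 0, 1
              [0.0, 2.0]])     # rewards in state 1 for actions 0, 1
beta = 0.9                     # discount factor
n_states, n_actions = 2, 2

policy = np.zeros(n_states, dtype=int)
while True:
    # Evaluation: solve (I - beta * P_pi) v = r_pi for the current policy.
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    r_pi = np.array([r[s, policy[s]] for s in range(n_states)])
    v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
    # Improvement: act greedily with respect to v.
    q = np.array([[r[s, a] + beta * P[a, s] @ v for a in range(n_actions)]
                  for s in range(n_states)])
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break  # greedy policy unchanged: it is optimal
    policy = new_policy
print(policy, v)
```

On a finite model the iteration terminates in finitely many steps because each improvement strictly increases the value and there are finitely many deterministic policies.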

5.
An EOQ model with inventory deterioration and allowable shortages
An EOQ model is proposed in which the inventory deterioration loss varies with both time and stock level and shortages are allowed. The average total cost function of the model is shown to be convex under given conditions, and the optimal policy and an approximate solution of the model are discussed.

6.
This paper provides a characterization of the optimal average cost function, when the long-run (risk-sensitive) average cost criterion is used. The Markov control model has a denumerable state space with finite set of actions, and the characterization presented is given in terms of a system of local Poisson equations, which gives as a by-product the existence of an optimal stationary policy.

7.
《Optimization》2012,61(2):255-269
Constrained Markov decision processes with compact state and action spaces are studied under long-run average reward or cost criteria. By introducing a corresponding Lagrange function, a saddle-point theorem is given, by which the existence of a constrained optimal pair of initial state distribution and policy is shown. Also, under the hypothesis of Doeblin, the functional characterization of a constrained optimal policy is obtained.

8.
We establish a flexible capacity strategy model with multiple market periods under demand uncertainty and investment constraints. In the model, a firm makes its capacity decision under a financial budget constraint at the beginning of the planning horizon, which embraces n market periods. In each market period, the firm goes through three decision-making stages: the safety production stage, the additional production stage and the optimal sales stage. We formulate the problem and obtain the optimal capacity, the optimal safety production, the optimal additional production and the optimal sales of each market period under different situations. We find that there are two thresholds for the unit capacity cost. When the capacity cost is very low, the optimal capacity is determined by the financial budget; when the capacity cost is very high, the firm keeps its optimal capacity at its safety production level; and when the cost is in between the two thresholds, the optimal capacity is determined by the capacity cost, the number of market periods and the unit cost of additional production. Further, we explore the endogenous safety production level and verify the conditions under which the firm has different optimal safety production levels. Finally, we prove that the firm can benefit from the investment only when the designed planning horizon is longer than a threshold, and we derive the formulae for the above three thresholds.

9.
We consider the problem of optimally maintaining a periodically inspected system that deteriorates according to a discrete-time Markov process and has a limit on the number of repairs that can be performed before it must be replaced. After each inspection, a decision maker must decide whether to repair the system, replace it with a new one, or leave it operating until the next inspection, where each repair makes the system more susceptible to future deterioration. If the system is found to be failed at an inspection, then it must be either repaired or replaced with a new one at an additional penalty cost. The objective is to minimize the total expected discounted cost due to operation, inspection, maintenance, replacement and failure. We formulate an infinite-horizon Markov decision process model and derive key structural properties of the resulting optimal cost function that are sufficient to establish the existence of an optimal threshold-type policy with respect to the system’s deterioration level and cumulative number of repairs. We also explore the sensitivity of the optimal policy to inspection, repair and replacement costs. Numerical examples are presented to illustrate the structure and the sensitivity of the optimal policy.
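The threshold structure in the deterioration level can be seen on a stripped-down replace-or-operate model. This sketch deliberately omits the paper's repair limit and penalty costs; the deterioration chain, operating costs, and replacement cost K below are hypothetical:

```python
import numpy as np

# Deterioration levels 0..4; the level rises by one w.p. 0.5 each period
# (capped at 4). Action "operate" pays a level-dependent cost; action
# "replace" pays a fixed cost K and resets the level to 0.
N, K, beta = 5, 10.0, 0.95
op_cost = np.array([0.0, 1.0, 3.0, 6.0, 12.0])  # increasing in level

def backup(v):
    """One value-iteration step; returns (continue costs, replace cost)."""
    cont = np.array([op_cost[s]
                     + beta * (0.5 * v[s] + 0.5 * v[min(s + 1, N - 1)])
                     for s in range(N)])
    repl = K + op_cost[0] + beta * (0.5 * v[0] + 0.5 * v[1])
    return cont, repl

v = np.zeros(N)
for _ in range(2000):          # value iteration to near-convergence
    cont, repl = backup(v)
    v = np.minimum(cont, repl)

cont, repl = backup(v)
policy = (repl < cont).astype(int)  # 1 = replace at this level
print(policy)
```

Because the continuation cost is increasing in the deterioration level while the replacement cost is constant, the computed policy is monotone: operate below some level, replace at and above it.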

10.
A generalized EOQ model for deteriorating items is considered here in which the demand rate, deterioration rate, holding cost and ordering cost are all assumed to be continuous functions of time. Shortages are also allowed and are completely backlogged. The planning horizon is finite and the replenishment periods are assumed to be constant. The optimal replenishment policy and the decision rule which minimizes the total system cost are derived. A numerical example is given to illustrate the developed model. Sensitivity analysis is also presented for the given model.

11.
This paper studies constrained discounted semi-Markov decision programming: the problem of maximizing the discounted expected reward subject to a constraint on a discounted expected cost. Assuming a countable state set and compact nonempty Borel action sets, we give a necessary and sufficient condition for a p-constrained optimal policy and prove that, under suitable assumptions, a p-constrained optimal policy must exist.

12.
Most existing studies on software release policies use models based on the non-homogeneous Poisson process. In this paper, we discuss a software release policy based on a state space model. The state space model has a Gamma-Gamma type invariant conditional distribution. A cost model that accounts for the cost of removing errors in software systems and the risk cost due to software failure is used. The optimal release time that minimizes the expected cost in every test-debugging stage is discussed.

13.
We study a unichain Markov decision process, i.e. a controlled Markov process whose state process under a stationary policy is an ergodic Markov chain. Here the state and action spaces are assumed to be either finite or countable. When the state process is uniformly ergodic and the immediate cost is bounded, a policy that minimizes the long-term expected average cost also has an nth-stage sample-path cost that, with probability one, is asymptotically smaller than the nth-stage sample-path cost under any other non-optimal stationary policy with a larger expected average cost. This strengthens, in the Markov model case, the a.s. asymptotic optimality property frequently discussed in the literature.

14.
This note deals with Markov decision chains evolving on a denumerable state space. Under standard continuity-compactness requirements, an explicit example is provided to show that, with respect to a strong sample-path average reward criterion, the Lyapunov function condition does not ensure the existence of an optimal stationary policy.

15.
Testing is an important activity in product development. Past studies aimed at determining the optimal scheduling of tests often focused on single-stage testing in a sequential design process. This paper presents an analytical model for the scheduling of tests in an overlapped design process, where a downstream stage starts before the completion of upstream testing. We derive optimal stopping rules for upstream and downstream testing, together with the optimal time elapsed between beginning the upstream tests and beginning the downstream development. We find that the cost function is first convex and then concave increasing with respect to the upstream testing duration. A one-dimensional search algorithm is then proposed for finding the unique optimum that minimizes the overall cost. Moreover, the impact of different model parameters, such as the problem-solving capacity and the opportunity cost, on the optimal solution is discussed. Finally, we compare the testing strategies in the overlapped process with those in the sequential process and obtain some additional results. The methodology is illustrated with a case study at a handset design company.

16.
This paper studies nonstationary Markov decision processes with the average-cost criterion on a general state space. The result that, in the stationary case, the average-cost optimality equation can be established via the optimality equation of a supplementary discounted model is extended to the nonstationary case, and this result is used to prove the existence of an optimal policy.

17.
Dynamic principal-agent models are formulated based on constrained Markov decision processes (CMDPs), in which the state space of the system is countable and the agent chooses his actions from a countable action set. If the principal has finitely many alternative contracts to select from, it is shown that the optimal contract and the corresponding optimal policy can be obtained by linear programming under both the discounted criterion and the average criterion. The paper is supported by the Zhejiang Provincial Natural Science Foundation of China (701017) and by the Scientific Research Fund of the Zhejiang Provincial Education Department.

18.
In this paper, we are concerned with a new algorithm for multichain finite-state Markov decision processes which finds an average optimal policy through the decomposition of the state space into communicating classes and a transient class. For each communicating class, a relatively optimal policy is found, and these are used to construct an optimal policy by applying the value iteration algorithm. Using a pattern matrix that determines the behaviour pattern of the decision process, the decomposition of the state space is carried out effectively, so that the proposed algorithm simplifies the structured algorithm given in Leizarowitz's paper (Math Oper Res 28:553–586, 2003). A numerical example is given to illustrate the algorithm.
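The decomposition step can be sketched with an ordinary strongly connected components computation: states i and j communicate when each is reachable from the other under some action, i.e. when they share an SCC of the graph with an edge i -> j whenever P(j | i, a) > 0 for some action a. This is not the paper's pattern-matrix method, and the 5-state reachability graph below is hypothetical (state 4 can only leave its class, so it is transient):

```python
from collections import defaultdict

n = 5
edges = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: {0, 2}}  # i -> j reachable

def sccs(adj, n):
    """Kosaraju's algorithm: list of strongly connected components."""
    visited, order = set(), []
    def dfs1(u):
        visited.add(u)
        for w in adj.get(u, ()):
            if w not in visited:
                dfs1(w)
        order.append(u)  # record postorder finishing time
    for u in range(n):
        if u not in visited:
            dfs1(u)
    radj = defaultdict(set)  # reversed graph
    for u in adj:
        for w in adj[u]:
            radj[w].add(u)
    comps, assigned = [], set()
    def dfs2(u, comp):
        assigned.add(u)
        comp.append(u)
        for w in radj[u]:
            if w not in assigned:
                dfs2(w, comp)
    for u in reversed(order):  # sweep reversed graph in reverse postorder
        if u not in assigned:
            comp = []
            dfs2(u, comp)
            comps.append(sorted(comp))
    return comps

comps = sccs(edges, n)
# A class with an edge leaving it is not closed: its states are transient.
transient = [c for c in comps
             if any(w not in c for u in c for w in edges.get(u, ()))]
communicating = [c for c in comps if c not in transient]
print(communicating, transient)
```

Once the closed communicating classes are isolated, each can be solved as a self-contained average-cost problem, which is the structure the paper's algorithm exploits.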

19.
In this paper we study continuous-time Markov decision processes with a denumerable state space, a Borel action space, and unbounded transition and cost rates. The optimality criterion considered is the finite-horizon expected total cost. Under suitable conditions, we propose a finite approximation for the approximate computation of an optimal policy and the value function, and obtain the corresponding error estimates. Furthermore, our main results are illustrated with a controlled birth-and-death system.

20.
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model extends the first passage models for both discrete-time and continuous-time Markov decision processes.
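A discrete-time sketch of the minimum-nonnegative-solution idea (the paper treats the semi-Markov case): minimize the expected discounted cost accumulated until the state first enters a target set, by iterating the optimality operator upward from zero. The 3-state, 2-action model below is hypothetical:

```python
import numpy as np

# P[a, s] is the transition row out of state s under action a;
# state 2 is the absorbing target, with zero cost once reached.
P = np.array([[[0.7, 0.2, 0.1],
               [0.1, 0.6, 0.3],
               [0.0, 0.0, 1.0]],    # action 0
              [[0.3, 0.3, 0.4],
               [0.2, 0.2, 0.6],
               [0.0, 0.0, 1.0]]])   # action 1
c = np.array([[1.0, 3.0],   # c[s, a]: one-stage cost
              [2.0, 4.0],
              [0.0, 0.0]])  # no further cost in the target
beta, target = 0.9, 2

# Starting from v = 0, the iterates increase monotonically to the
# minimum nonnegative solution of the optimality equation.
v = np.zeros(3)
for _ in range(500):
    new_v = np.array([min(c[s, a] + beta * P[a, s] @ v for a in (0, 1))
                      for s in range(3)])
    new_v[target] = 0.0  # cost accumulation stops at the target
    v = new_v

policy = [min((0, 1), key=lambda a: c[s, a] + beta * P[a, s] @ v)
          for s in range(2)]
print(v.round(3), policy)
```

The resulting v is a fixed point of the Bellman operator for the first-passage problem, and the greedy policy with respect to it is stationary, mirroring the existence result in the abstract.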
