Similar Literature
20 similar documents found.
1.
This paper investigates an optimal inspection and replacement problem for a discrete-time Markovian deterioration system. It is assumed that the system is monitored incompletely by a mechanism that gives the decision maker partial information about the exact state of the system. The problem, formulated as a partially observable Markov decision process, is to obtain an optimal inspection and replacement policy minimizing the expected total discounted cost over an infinite horizon. Furthermore, under some reasonable conditions reflecting the practical meaning of deterioration, it is shown that there exists an optimal inspection and replacement policy in the class of monotonic four-region policies.
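The incomplete-monitoring formulation rests on tracking a belief over the unobserved state. Below is a minimal sketch of the Bayes update such a POMDP uses; the transition matrix P, observation matrix Q, and the state and signal labels are illustrative assumptions, not the paper's model.

import numpy as np

# Hypothetical 3-state deterioration chain: 0 = good, 1 = worn, 2 = failed.
P = np.array([[0.80, 0.15, 0.05],
              [0.00, 0.70, 0.30],
              [0.00, 0.00, 1.00]])
# Incomplete monitoring: Q[s, o] = probability of observing signal o in state s.
Q = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])

def belief_update(b, o):
    """One Bayes step: predict through P, then condition on the signal o."""
    b_pred = b @ P                 # predicted next-state distribution
    b_post = b_pred * Q[:, o]      # reweight by the observation likelihood
    return b_post / b_post.sum()

b = np.array([1.0, 0.0, 0.0])      # a known-new system
print(belief_update(b, o=1))       # belief after one period and a "bad" signal

A monotonic four-region policy then partitions the resulting belief space into four monotonically ordered action regions.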

2.
The paper deals with continuous-time Markov decision processes on a fairly general state space. The economic criterion is the long-run average return. A set of conditions is shown to be sufficient for a constant g to be the optimal average return and for a stationary policy π1 to be optimal. These conditions are shown to be satisfied under appropriate assumptions on the optimal discounted return function. A policy improvement algorithm is proposed and its convergence to an optimal policy is proved.
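A minimal sketch of the policy improvement loop for the long-run average return, written for a finite uniformized (discrete-time) stand-in for the continuous-time model; all transition and reward data are hypothetical.

import numpy as np

n_s, n_a = 3, 2
r = np.array([[1.0, 0.5],            # hypothetical reward r[s, a]
              [0.2, 0.8],
              [0.0, 0.3]])
P = [np.array([[0.5, 0.5, 0.0],      # P[a][s, s'] for action 0
               [0.9, 0.1, 0.0],
               [0.2, 0.3, 0.5]]),
     np.array([[0.3, 0.3, 0.4],      # ... and for action 1
               [0.1, 0.8, 0.1],
               [0.5, 0.0, 0.5]])]

def evaluate(pol):
    """Solve g*1 + (I - P_pol) h = r_pol with the normalization h[0] = 0."""
    Ppol = np.array([P[pol[s]][s] for s in range(n_s)])
    rpol = np.array([r[s, pol[s]] for s in range(n_s)])
    A = np.ones((n_s, n_s))                    # column 0 carries the gain g
    A[:, 1:] = (np.eye(n_s) - Ppol)[:, 1:]
    sol = np.linalg.solve(A, rpol)             # sol = (g, h[1], ..., h[n_s - 1])
    return sol[0], np.concatenate(([0.0], sol[1:]))

pol = np.zeros(n_s, dtype=int)
while True:                                    # policy iteration, average return
    g, h = evaluate(pol)
    new = np.array([np.argmax([r[s, a] + P[a][s] @ h for a in range(n_a)])
                    for s in range(n_s)])
    if np.array_equal(new, pol):
        break
    pol = new
print("optimal average return:", g, "policy:", pol)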

3.
Zhang Yun et al. studied a class of nonhomogeneous vector-valued Markov decision models whose reward functions are absolutely-mean relatively bounded, obtained a sufficient condition for the existence of an optimal policy, and discussed the relation between strong optimality and optimality; Zhang Sheng et al. derived several properties of this model.

4.
In this paper, we discuss Markovian decision programming with recursive vector rewards and give an algorithm for finding optimal policies. We prove that: (1) there is a Markovian optimal policy for the nonstationary case; (2) there is a stationary optimal policy for the stationary case.

5.
This paper considers Markovian decision processes in discrete time with transition probabilities depending on an unknown parameter which may change step by step. In the case where such a parameter sequence converges, a policy maximizing the average expected reward over an infinite future is sought. Under continuity conditions, the uniform optimality of a policy based on estimation and control is shown for some multichain models.

6.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets, and unbounded one-step costs, for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and of the total expected absolute cost incurred until the first return to this set is finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.
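For reference, the average cost optimality equation verified here has, in standard semi-Markov notation (one-step cost c(i,a), expected sojourn time \tau(i,a), transition law p_{ij}(a), minimal average cost g, and relative value function h), the form

h(i) \;=\; \min_{a \in A(i)} \Big\{ c(i,a) \;-\; g\,\tau(i,a) \;+\; \sum_{j} p_{ij}(a)\, h(j) \Big\}, \qquad i \in S.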

7.
A motorist involved in an accident must decide whether or not to claim from his insurance company when he is at fault. An optimal decision rule can only be determined in the light of future developments and future decisions, since the consequences of claiming or not claiming are felt in the subsequent years' premiums. In this paper, optimal no-claim limits are determined for a common Dutch type of insurance policy with a bonus-malus structure, using generalized Markovian programming. Computational results are given for various values of the expected number of accidents per year.
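A minimal value-iteration sketch of how such no-claim limits fall out of a Markovian formulation; the four-class ladder, premiums, and damage distribution below are hypothetical, not the Dutch bonus-malus system of the paper.

import numpy as np

premium = np.array([1000.0, 800.0, 600.0, 500.0])  # hypothetical ladder; class 0 worst
n = len(premium)
beta, p_acc = 0.95, 0.1                    # discount factor, P(one accident per year)
damage = np.array([200.0, 1000.0, 5000.0]) # hypothetical damage sizes ...
dprob = np.array([0.5, 0.3, 0.2])          # ... and their probabilities

def up(c):                                 # a claim-free year moves one class up
    return min(c + 1, n - 1)

V = np.zeros(n)
for _ in range(500):                       # value iteration on the discounted cost
    Vn = np.empty(n)
    for c in range(n):
        free = beta * V[up(c)]             # continuation value of a claim-free record
        claim = beta * V[0]                # claiming drops the motorist to class 0
        # After an accident: claim, or pay the damage and keep the clean record.
        acc = np.dot(dprob, np.minimum(claim, damage + free))
        Vn[c] = premium[c] + (1 - p_acc) * free + p_acc * acc
    V = Vn

# No-claim limit per class: claim only when the damage exceeds this threshold.
print(beta * (V[0] - np.array([V[up(c)] for c in range(n)])))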

8.
In this paper we discuss discrete-time non-homogeneous discounted Markovian decision programming, where the state space and all action sets are countable. Supposing that the optimum value function is finite, we give necessary and sufficient conditions for the existence of an optimal policy. Supposing that the absolute mean of the rewards is relatively bounded, we again give necessary and sufficient conditions for the existence of an optimal policy.

9.
In this paper, we consider nonstationary Markov decision processes (MDPs, for short) with an average variance criterion on a countable state space, finite action spaces, and bounded one-step rewards. From the optimality equations provided in this paper, we translate the average variance criterion into a new average expected cost criterion. We then prove that there exists a Markov policy, optimal under the original average expected reward criterion, that minimizes the average variance within the class of policies optimal for that criterion.

10.
In this paper, infinite-horizon Markovian decision programming with recursive reward functions is discussed. We show that Bellman's optimality principle is applicable to our model. Then, a necessary and sufficient condition for a policy to be optimal is given. For the stationary case, an iteration algorithm for finding a stationary optimal policy is designed. The algorithm is a generalization of Howard's [7] and Iwamoto's [3] algorithms. This research was supported by the National Natural Science Foundation of China.
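For orientation, here is a minimal sketch of the classical discounted policy iteration that the algorithm generalizes (two hypothetical states and actions, scalar rewards rather than the recursive rewards of the paper).

import numpy as np

beta = 0.9                                 # discount factor
r = np.array([[5.0, 10.0],                 # hypothetical reward r[s, a]
              [-1.0, 2.0]])
P = [np.array([[0.7, 0.3], [0.4, 0.6]]),   # P[a][s, s'] for action 0
     np.array([[0.9, 0.1], [0.2, 0.8]])]   # ... and for action 1

pol = np.zeros(2, dtype=int)
while True:
    # Evaluation: V_pol = (I - beta * P_pol)^(-1) r_pol.
    Ppol = np.array([P[pol[s]][s] for s in range(2)])
    rpol = np.array([r[s, pol[s]] for s in range(2)])
    V = np.linalg.solve(np.eye(2) - beta * Ppol, rpol)
    # Improvement: act greedily against V_pol.
    new = np.array([np.argmax([r[s, a] + beta * P[a][s] @ V for a in range(2)])
                    for s in range(2)])
    if np.array_equal(new, pol):
        break
    pol = new
print("policy:", pol, "value:", V)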

11.
Most motorists involved in an accident will claim from their insurance company only if the cost of repair exceeds a certain amount. In this paper, optimal no-claim limits are determined for a common type of insurance policy, and a simple decision rule which might be used by a motorist is shown to have an expected cost very close to that of the optimal decision rule.

12.
Guo Xianping, 《数学学报》 (Acta Mathematica Sinica), 2001, 44(2): 333-342
This paper considers the average variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition and using the optimality equation, the existence of a Markov policy optimal for the average expected reward criterion is proved. Then, by constructing a new model and applying the theory of Markov processes, it is further shown that within the class of Markov policies optimal for the average expected reward criterion there exists one that minimizes the average variance. As special cases, the main results of Dynkin E. B. and Yushkevich A. A., and of Kurano M., are recovered.

13.
In the steady state of a discrete-time Markov decision process, we consider the problem of finding an optimal randomized policy that minimizes the variance of the reward in a transition, among the policies whose mean is not less than a specified value. The problem is solved by introducing a parametric Markov decision process with an average cost criterion. It is shown that there exists an optimal policy which is a mixture of at most two pure policies. As an application, the toymaker's problem is discussed.
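A brute-force sketch of the structure asserted: scan mixtures of pairs of pure policies of a tiny chain (in the spirit of Howard's toymaker) and keep the one minimizing the per-transition reward variance subject to a mean constraint. All numbers are hypothetical, and the grid scan is an illustration, not the paper's parametric algorithm.

import numpy as np
from itertools import product

r = np.array([[6.0, 4.0],                      # hypothetical reward r[s, a]
              [-3.0, -5.0]])
P = [np.array([[0.5, 0.5], [0.4, 0.6]]),       # P[a][s, s'] for action 0
     np.array([[0.8, 0.2], [0.7, 0.3]])]       # ... and for action 1
m_min = 1.0                                    # required long-run mean reward

def stationary(Ppi):
    """Stationary distribution of an ergodic two-state chain."""
    A = np.vstack([(np.eye(2) - Ppi).T, np.ones((1, 2))])
    return np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)[0]

def mean_var(pi):
    """Long-run mean and per-transition variance under a randomized policy pi[s, a]."""
    Ppi = np.array([sum(pi[s, a] * P[a][s] for a in range(2)) for s in range(2)])
    mu = stationary(Ppi)
    freq = mu[:, None] * pi                    # stationary state-action frequencies
    m = (freq * r).sum()
    return m, (freq * (r - m) ** 2).sum()

pure = list(product(range(2), repeat=2))       # the four pure (deterministic) policies
best = None
for f1, f2 in product(pure, repeat=2):         # mix every pair of pure policies
    for lam in np.linspace(0.0, 1.0, 101):
        pi = np.zeros((2, 2))
        for s in range(2):
            pi[s, f1[s]] += lam
            pi[s, f2[s]] += 1.0 - lam
        m, v = mean_var(pi)
        if m >= m_min and (best is None or v < best[0]):
            best = (v, m)
print("minimal variance %.4f at mean %.4f" % best)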

14.
After an introductory discussion of the usefulness of dynamic programming in solving practical multi-stage decision problems, the paper describes its application to inventory problems. In particular, the effect of allowing the number of decision stages to increase indefinitely is investigated, and it is shown that under certain realistic conditions this situation can be dealt with. It appears to be generally true that the average cost per period converges, for an optimal policy, as the number of periods considered increases indefinitely, and that it is feasible to search for the policy which minimizes this long-term average cost. The paper concludes with a specific example, in which it is shown that only eight iterations were necessary to find a reasonable approximation to the optimal re-order policy.
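A minimal sketch of the convergence effect described: extend the horizon of a small periodic-review inventory problem one period at a time and watch the per-period cost increment settle toward the long-run average. The ordering, holding, and shortage data are hypothetical, not the paper's example.

import numpy as np

cap = 10                                   # maximum stock level
K, hold, short = 5.0, 1.0, 4.0             # fixed order, holding, shortage costs
demand = [(0, 0.1), (1, 0.3), (2, 0.4), (3, 0.2)]   # hypothetical demand law

V = np.zeros(cap + 1)                      # optimal n-period costs, n = 0 initially
for n in range(1, 31):
    Vn = np.empty(cap + 1)
    for x in range(cap + 1):               # current stock
        best = np.inf
        for q in range(cap - x + 1):       # re-order quantity
            y = x + q                      # stock after delivery
            cost = K * (q > 0) + sum(
                p * (hold * max(y - d, 0) + short * max(d - y, 0)
                     + V[max(y - d, 0)])
                for d, p in demand)
            best = min(best, cost)
        Vn[x] = best
    print(n, Vn[0] - V[0])                 # increment tends to the long-run average
    V = Vn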

15.
16.
We study a unichain Markov decision process, i.e. a controlled Markov process whose state process under any stationary policy is an ergodic Markov chain. Here the state and action spaces are assumed to be either finite or countable. When the state process is uniformly ergodic and the immediate cost is bounded, a policy that minimizes the long-term expected average cost also has an nth-stage sample-path cost that, with probability one, is asymptotically less than the nth-stage sample-path cost under any other non-optimal stationary policy with a larger expected average cost. This is a strengthening, in the Markov model case, of the a.s. asymptotically optimal property frequently discussed in the literature.
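A simulation sketch of the sample-path property for two hypothetical stationary policies on a two-state chain: the running average cost of the policy with the smaller expected average cost is, with probability one, eventually below that of the other.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical transition matrices and costs under two stationary policies.
P_opt, c_opt = np.array([[0.9, 0.1], [0.5, 0.5]]), np.array([1.0, 3.0])
P_bad, c_bad = np.array([[0.6, 0.4], [0.3, 0.7]]), np.array([1.5, 3.5])

def running_average(P, c, n=50_000):
    """n-stage sample-path average cost of one simulated trajectory."""
    s, total = 0, 0.0
    for _ in range(n):
        total += c[s]
        s = rng.choice(2, p=P[s])
    return total / n

print(running_average(P_opt, c_opt))   # close to its expected average cost ...
print(running_average(P_bad, c_bad))   # ... and smaller, on almost every path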

17.
This paper proposes a stochastic dynamic programming model for short-term capacity planning of air cargo space. Long-term cargo space is usually acquired by freight forwarders or shippers many months ahead on a contract basis, and the forecasted demand is usually unreliable. Re-planning of cargo space is needed as the flight departure date draws nearer. Hence, for a given amount of long-term contract space, the decision at each stage is the quantity of additional space required for the next stage, and the planning model evaluates the optimal cost policy based on the economic trade-off between the cost of backlogged shipment and the cost of acquiring additional cargo space. Under certain conditions, we show that the return function is convex with respect to the additional space acquired for a given state, and that the optimal expected cost for the remaining stages is an increasing convex function of the state variables. These two properties can be carried backward recursively, and therefore the optimal cost policy can be determined efficiently.
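A minimal backward-recursion sketch of the economic trade-off described, with the system state reduced to the current backlog; the number of stages, the acquisition costs rising toward departure, and the demand law are illustrative assumptions, not the paper's model.

import numpy as np

T, B = 4, 12                               # stages before departure, max backlog
acq = [3.0, 4.0, 6.0, 9.0]                 # space gets dearer nearer departure
backlog_cost, terminal_pen = 1.0, 20.0     # per-stage backlog, unmet-at-departure
demand = [(0, 0.3), (1, 0.4), (2, 0.3)]    # new bookings arriving each stage

V = terminal_pen * np.arange(B + 1.0)      # cost of backlog left at departure
for t in reversed(range(T)):               # backward recursion over stages
    Vt = np.empty(B + 1)
    for b in range(B + 1):                 # current backlogged shipments
        best = np.inf
        for u in range(B + 1):             # additional space acquired this stage
            cost = acq[t] * u
            for d, p in demand:
                nb = min(max(b + d - u, 0), B)
                cost += p * (backlog_cost * nb + V[nb])
            best = min(best, cost)
        Vt[b] = best
    V = Vt
print(V)   # optimal expected cost for the remaining stages, per current backlog

The convexity properties stated in the abstract are what allow such a recursion to be carried backward with a structured optimal acquisition at each stage.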

18.
This paper considers a periodic-review shuttle service system with random customer demands and finite repositioning capacity. The objective is to find the optimal stationary policy for empty container repositioning that minimizes the sum of container leasing, inventory, and repositioning costs. Using a Markov decision process approach, the structures of the optimal stationary policies for both the expected discounted cost and the long-run average cost are completely characterized. Monotonic and asymptotic behaviours of the optimal policy are established. By taking advantage of the special structure of the optimal policy, the stationary distribution of the system states is obtained, which is then used to compute steady-state performance measures of interest and to implement the optimal policy. Numerical examples are given to demonstrate the results.

19.
The following optimality principle is established for finite undiscounted or discounted Markov decision processes: If a policy is (gain, bias, or discounted) optimal in one state, it is also optimal for all states reachable from this state using this policy. The optimality principle is used constructively to demonstrate the existence of a policy that is optimal in every state, and then to derive the coupled functional equations satisfied by the optimal return vectors. This reverses the usual sequence, where one first establishes (via policy iteration or linear programming) the solvability of the coupled functional equations, and then shows that the solution is indeed the optimal return vector and that the maximizing policy for the functional equations is optimal for every state.

20.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of a solution to the average reward optimality equation and of an average optimal stationary policy. Under our conditions we also ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies: we not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm, and it should be noted that it is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.
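In the standard notation (reward rate r(x,a), transition rates q(y|x,a) with \sum_y q(y \mid x,a) = 0, optimal gain g, and bias function h), the average reward optimality equation referred to reads

g \;=\; \sup_{a \in A(x)} \Big\{ r(x,a) \;+\; \sum_{y} q(y \mid x, a)\, h(y) \Big\}, \qquad x \in S,

written here for a countable state space; in the general spaces of the paper the sum becomes an integral.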
