Similar Literature
20 similar documents found.
1.
This paper investigates an optimal inspection and replacement problem for a discrete-time Markovian deterioration system. It is assumed that the system is monitored incompletely by a mechanism that gives the decision maker partial information about the exact state of the system. The problem, formulated as a partially observable Markov decision process, is to obtain an optimal inspection and replacement policy minimizing the expected total discounted cost over an infinite horizon. Furthermore, under some reasonable conditions reflecting the practical meaning of deterioration, it is shown that there exists an optimal inspection and replacement policy in the class of monotonic four-region policies.
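The incomplete-monitoring formulation rests on tracking a belief over the unobserved state. Below is a minimal sketch of the Bayes update such a POMDP uses; the transition matrix P, observation matrix Q, and the state and signal labels are illustrative assumptions, not the paper's model.

import numpy as np

# Hypothetical 3-state deterioration chain: 0 = good, 1 = worn, 2 = failed.
P = np.array([[0.80, 0.15, 0.05],
              [0.00, 0.70, 0.30],
              [0.00, 0.00, 1.00]])
# Incomplete monitoring: Q[s, o] = probability of observing signal o in state s.
Q = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])

def belief_update(b, o):
    """One Bayes step: predict through P, then condition on the signal o."""
    b_pred = b @ P                 # predicted next-state distribution
    b_post = b_pred * Q[:, o]      # reweight by the observation likelihood
    return b_post / b_post.sum()

b = np.array([1.0, 0.0, 0.0])      # a known-new system
print(belief_update(b, o=1))       # belief after one period and a "bad" signal

A monotonic four-region policy then partitions the resulting belief space into four monotonically ordered action regions.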

2.
The paper deals with continuous-time Markov decision processes on a fairly general state space. The economic criterion is the long-run average return. A set of conditions is shown to be sufficient for a constant g to be the optimal average return and for a stationary policy π1 to be optimal. These conditions are shown to be satisfied under appropriate assumptions on the optimal discounted return function. A policy improvement algorithm is proposed and its convergence to an optimal policy is proved.
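A minimal sketch of the policy improvement loop for the long-run average return, written for a finite uniformized (discrete-time) stand-in for the continuous-time model; all transition and reward data are hypothetical.

import numpy as np

n_s, n_a = 3, 2
r = np.array([[1.0, 0.5],            # hypothetical reward r[s, a]
              [0.2, 0.8],
              [0.0, 0.3]])
P = [np.array([[0.5, 0.5, 0.0],      # P[a][s, s'] for action 0
               [0.9, 0.1, 0.0],
               [0.2, 0.3, 0.5]]),
     np.array([[0.3, 0.3, 0.4],      # ... and for action 1
               [0.1, 0.8, 0.1],
               [0.5, 0.0, 0.5]])]

def evaluate(pol):
    """Solve g*1 + (I - P_pol) h = r_pol with the normalization h[0] = 0."""
    Ppol = np.array([P[pol[s]][s] for s in range(n_s)])
    rpol = np.array([r[s, pol[s]] for s in range(n_s)])
    A = np.ones((n_s, n_s))                    # column 0 carries the gain g
    A[:, 1:] = (np.eye(n_s) - Ppol)[:, 1:]
    sol = np.linalg.solve(A, rpol)             # sol = (g, h[1], ..., h[n_s - 1])
    return sol[0], np.concatenate(([0.0], sol[1:]))

pol = np.zeros(n_s, dtype=int)
while True:                                    # policy iteration, average return
    g, h = evaluate(pol)
    new = np.array([np.argmax([r[s, a] + P[a][s] @ h for a in range(n_a)])
                    for s in range(n_s)])
    if np.array_equal(new, pol):
        break
    pol = new
print("optimal average return:", g, "policy:", pol)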

3.
Zhang Yun et al. studied a class of nonhomogeneous vector-valued Markov decision models whose reward functions are absolutely-mean relatively bounded, obtained a sufficient condition for the existence of an optimal policy, and discussed the relation between strong optimality and optimality; Zhang Sheng et al. derived several properties of this model.

4.
In this paper, we discuss Markovian decision programming with recursive vector rewards and give an algorithm for finding optimal policies. We prove that: (1) there is a Markovian optimal policy for the nonstationary case; (2) there is a stationary optimal policy for the stationary case.

5.
This paper considers Markovian decision processes in discrete time with transition probabilities depending on an unknown parameter which may change step by step. In the case where such a parameter sequence converges, a policy maximizing the average expected reward over an infinite future is sought. Under continuity conditions, the uniform optimality of a policy based on estimation and control is shown for some multichain models.

6.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets, and unbounded one-step costs, for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and of the total expected absolute cost incurred until the first return to this set is finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.
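For reference, the average cost optimality equation verified here has, in standard semi-Markov notation (one-step cost c(i,a), expected sojourn time \tau(i,a), transition law p_{ij}(a), minimal average cost g, and relative value function h), the form

h(i) \;=\; \min_{a \in A(i)} \Big\{ c(i,a) \;-\; g\,\tau(i,a) \;+\; \sum_{j} p_{ij}(a)\, h(j) \Big\}, \qquad i \in S.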

7.
A motorist involved in an accident must decide whether or not to claim from his insurance company when he is at fault. An optimal decision rule can only be determined in the light of future developments and future decisions, since the consequences of claiming or not claiming are felt in the subsequent years' premiums. In this paper, optimal no-claim limits are determined for a common Dutch type of insurance policy with a bonus-malus structure, using generalized Markovian programming. Computational results are given for various values of the expected number of accidents per year.
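A minimal value-iteration sketch of how such no-claim limits fall out of a Markovian formulation; the four-class ladder, premiums, and damage distribution below are hypothetical, not the Dutch bonus-malus system of the paper.

import numpy as np

premium = np.array([1000.0, 800.0, 600.0, 500.0])  # hypothetical ladder; class 0 worst
n = len(premium)
beta, p_acc = 0.95, 0.1                    # discount factor, P(one accident per year)
damage = np.array([200.0, 1000.0, 5000.0]) # hypothetical damage sizes ...
dprob = np.array([0.5, 0.3, 0.2])          # ... and their probabilities

def up(c):                                 # a claim-free year moves one class up
    return min(c + 1, n - 1)

V = np.zeros(n)
for _ in range(500):                       # value iteration on the discounted cost
    Vn = np.empty(n)
    for c in range(n):
        free = beta * V[up(c)]             # continuation value of a claim-free record
        claim = beta * V[0]                # claiming drops the motorist to class 0
        # After an accident: claim, or pay the damage and keep the clean record.
        acc = np.dot(dprob, np.minimum(claim, damage + free))
        Vn[c] = premium[c] + (1 - p_acc) * free + p_acc * acc
    V = Vn

# No-claim limit per class: claim only when the damage exceeds this threshold.
print(beta * (V[0] - np.array([V[up(c)] for c in range(n)])))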

8.
In this paper we discuss discrete-time non-homogeneous discounted Markovian decision programming, where the state space and all action sets are countable. Supposing that the optimum value function is finite, we give necessary and sufficient conditions for the existence of an optimal policy. Supposing that the absolute mean of the rewards is relatively bounded, we again give necessary and sufficient conditions for the existence of an optimal policy.

9.
In this paper, we consider nonstationary Markov decision processes (MDPs, for short) with an average variance criterion on a countable state space, finite action spaces, and bounded one-step rewards. From the optimality equations provided in this paper, we translate the average variance criterion into a new average expected cost criterion. We then prove that there exists a Markov policy, optimal under the original average expected reward criterion, that minimizes the average variance within the class of policies optimal for that criterion.

10.
In this paper, infinite-horizon Markovian decision programming with recursive reward functions is discussed. We show that Bellman's optimality principle is applicable to our model. Then, a necessary and sufficient condition for a policy to be optimal is given. For the stationary case, an iteration algorithm for finding a stationary optimal policy is designed. The algorithm is a generalization of Howard's [7] and Iwamoto's [3] algorithms. This research was supported by the National Natural Science Foundation of China.
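For orientation, here is a minimal sketch of the classical discounted policy iteration that the algorithm generalizes (two hypothetical states and actions, scalar rewards rather than the recursive rewards of the paper).

import numpy as np

beta = 0.9                                 # discount factor
r = np.array([[5.0, 10.0],                 # hypothetical reward r[s, a]
              [-1.0, 2.0]])
P = [np.array([[0.7, 0.3], [0.4, 0.6]]),   # P[a][s, s'] for action 0
     np.array([[0.9, 0.1], [0.2, 0.8]])]   # ... and for action 1

pol = np.zeros(2, dtype=int)
while True:
    # Evaluation: V_pol = (I - beta * P_pol)^(-1) r_pol.
    Ppol = np.array([P[pol[s]][s] for s in range(2)])
    rpol = np.array([r[s, pol[s]] for s in range(2)])
    V = np.linalg.solve(np.eye(2) - beta * Ppol, rpol)
    # Improvement: act greedily against V_pol.
    new = np.array([np.argmax([r[s, a] + beta * P[a][s] @ V for a in range(2)])
                    for s in range(2)])
    if np.array_equal(new, pol):
        break
    pol = new
print("policy:", pol, "value:", V)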

11.
Most motorists involved in an accident will claim from their insurance company only if the cost of repair exceeds a certain amount. In this paper, optimal no-claim limits are determined for a common type of insurance policy, and a simple decision rule which might be used by a motorist is shown to have an expected cost very close to that of the optimal decision rule.

12.
Guo Xianping, 《数学学报》 (Acta Mathematica Sinica), 2001, 44(2): 333-342
This paper considers the average variance criterion for nonstationary MDPs with Borel state and action spaces. First, under an ergodicity condition and using the optimality equation, the existence of a Markov policy optimal for the average expected reward criterion is proved. Then, by constructing a new model and applying the theory of Markov processes, it is further shown that within the class of Markov policies optimal for the average expected reward criterion there exists one that minimizes the average variance. As special cases, the main results of Dynkin E. B. and Yushkevich A. A., and of Kurano M., are recovered.

13.
In the steady state of a discrete-time Markov decision process, we consider the problem of finding an optimal randomized policy that minimizes the variance of the reward in a transition, among the policies whose mean is not less than a specified value. The problem is solved by introducing a parametric Markov decision process with an average cost criterion. It is shown that there exists an optimal policy which is a mixture of at most two pure policies. As an application, the toymaker's problem is discussed.
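A brute-force sketch of the structure asserted: scan mixtures of pairs of pure policies of a tiny chain (in the spirit of Howard's toymaker) and keep the one minimizing the per-transition reward variance subject to a mean constraint. All numbers are hypothetical, and the grid scan is an illustration, not the paper's parametric algorithm.

import numpy as np
from itertools import product

r = np.array([[6.0, 4.0],                      # hypothetical reward r[s, a]
              [-3.0, -5.0]])
P = [np.array([[0.5, 0.5], [0.4, 0.6]]),       # P[a][s, s'] for action 0
     np.array([[0.8, 0.2], [0.7, 0.3]])]       # ... and for action 1
m_min = 1.0                                    # required long-run mean reward

def stationary(Ppi):
    """Stationary distribution of an ergodic two-state chain."""
    A = np.vstack([(np.eye(2) - Ppi).T, np.ones((1, 2))])
    return np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)[0]

def mean_var(pi):
    """Long-run mean and per-transition variance under a randomized policy pi[s, a]."""
    Ppi = np.array([sum(pi[s, a] * P[a][s] for a in range(2)) for s in range(2)])
    mu = stationary(Ppi)
    freq = mu[:, None] * pi                    # stationary state-action frequencies
    m = (freq * r).sum()
    return m, (freq * (r - m) ** 2).sum()

pure = list(product(range(2), repeat=2))       # the four pure (deterministic) policies
best = None
for f1, f2 in product(pure, repeat=2):         # mix every pair of pure policies
    for lam in np.linspace(0.0, 1.0, 101):
        pi = np.zeros((2, 2))
        for s in range(2):
            pi[s, f1[s]] += lam
            pi[s, f2[s]] += 1.0 - lam
        m, v = mean_var(pi)
        if m >= m_min and (best is None or v < best[0]):
            best = (v, m)
print("minimal variance %.4f at mean %.4f" % best)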

14.
After an introductory discussion of the usefulness of dynamic programming in solving practical multi-stage decision problems, the paper describes its application to inventory problems. In particular, the effect of allowing the number of decision stages to increase indefinitely is investigated, and it is shown that under certain realistic conditions this situation can be dealt with. It appears to be generally true that the average cost per period converges, for an optimal policy, as the number of periods considered increases indefinitely, and that it is feasible to search for the policy which minimizes this long-term average cost. The paper concludes with a specific example, in which it is shown that only eight iterations were necessary to find a reasonable approximation to the optimal re-order policy.
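A minimal sketch of the convergence effect described: extend the horizon of a small periodic-review inventory problem one period at a time and watch the per-period cost increment settle toward the long-run average. The ordering, holding, and shortage data are hypothetical, not the paper's example.

import numpy as np

cap = 10                                   # maximum stock level
K, hold, short = 5.0, 1.0, 4.0             # fixed order, holding, shortage costs
demand = [(0, 0.1), (1, 0.3), (2, 0.4), (3, 0.2)]   # hypothetical demand law

V = np.zeros(cap + 1)                      # optimal n-period costs, n = 0 initially
for n in range(1, 31):
    Vn = np.empty(cap + 1)
    for x in range(cap + 1):               # current stock
        best = np.inf
        for q in range(cap - x + 1):       # re-order quantity
            y = x + q                      # stock after delivery
            cost = K * (q > 0) + sum(
                p * (hold * max(y - d, 0) + short * max(d - y, 0)
                     + V[max(y - d, 0)])
                for d, p in demand)
            best = min(best, cost)
        Vn[x] = best
    print(n, Vn[0] - V[0])                 # increment tends to the long-run average
    V = Vn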

15.
16.
We study a unichain Markov decision process, i.e. a controlled Markov process whose state process under any stationary policy is an ergodic Markov chain. Here the state and action spaces are assumed to be either finite or countable. When the state process is uniformly ergodic and the immediate cost is bounded, a policy that minimizes the long-term expected average cost also has an nth-stage sample-path cost that, with probability one, is asymptotically less than the nth-stage sample-path cost under any other non-optimal stationary policy with a larger expected average cost. This is a strengthening, in the Markov model case, of the a.s. asymptotically optimal property frequently discussed in the literature.
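A simulation sketch of the sample-path property for two hypothetical stationary policies on a two-state chain: the running average cost of the policy with the smaller expected average cost is, with probability one, eventually below that of the other.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical transition matrices and costs under two stationary policies.
P_opt, c_opt = np.array([[0.9, 0.1], [0.5, 0.5]]), np.array([1.0, 3.0])
P_bad, c_bad = np.array([[0.6, 0.4], [0.3, 0.7]]), np.array([1.5, 3.5])

def running_average(P, c, n=50_000):
    """n-stage sample-path average cost of one simulated trajectory."""
    s, total = 0, 0.0
    for _ in range(n):
        total += c[s]
        s = rng.choice(2, p=P[s])
    return total / n

print(running_average(P_opt, c_opt))   # close to its expected average cost ...
print(running_average(P_bad, c_bad))   # ... and smaller, on almost every path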

17.
This paper proposes a stochastic dynamic programming model for short-term capacity planning of air cargo space. Long-term cargo space is usually acquired by freight forwarders or shippers many months ahead on a contract basis, and the forecasted demand is usually unreliable. Re-planning of cargo space is needed as the flight departure date draws nearer. Hence, for a given amount of long-term contract space, the decision at each stage is the quantity of additional space required for the next stage, and the planning model evaluates the optimal cost policy based on the economic trade-off between the cost of backlogged shipment and the cost of acquiring additional cargo space. Under certain conditions, we show that the return function is convex with respect to the additional space acquired for a given state, and that the optimal expected cost for the remaining stages is an increasing convex function of the state variables. These two properties can be carried backward recursively, and therefore the optimal cost policy can be determined efficiently.
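A minimal backward-recursion sketch of the economic trade-off described, with the system state reduced to the current backlog; the number of stages, the acquisition costs rising toward departure, and the demand law are illustrative assumptions, not the paper's model.

import numpy as np

T, B = 4, 12                               # stages before departure, max backlog
acq = [3.0, 4.0, 6.0, 9.0]                 # space gets dearer nearer departure
backlog_cost, terminal_pen = 1.0, 20.0     # per-stage backlog, unmet-at-departure
demand = [(0, 0.3), (1, 0.4), (2, 0.3)]    # new bookings arriving each stage

V = terminal_pen * np.arange(B + 1.0)      # cost of backlog left at departure
for t in reversed(range(T)):               # backward recursion over stages
    Vt = np.empty(B + 1)
    for b in range(B + 1):                 # current backlogged shipments
        best = np.inf
        for u in range(B + 1):             # additional space acquired this stage
            cost = acq[t] * u
            for d, p in demand:
                nb = min(max(b + d - u, 0), B)
                cost += p * (backlog_cost * nb + V[nb])
            best = min(best, cost)
        Vt[b] = best
    V = Vt
print(V)   # optimal expected cost for the remaining stages, per current backlog

The convexity properties stated in the abstract are what allow such a recursion to be carried backward with a structured optimal acquisition at each stage.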

18.
This paper considers a periodic-review shuttle service system with random customer demands and finite repositioning capacity. The objective is to find the optimal stationary policy for empty container repositioning that minimizes the sum of container leasing, inventory, and repositioning costs. Using a Markov decision process approach, the structures of the optimal stationary policies for both the expected discounted cost and the long-run average cost are completely characterized. Monotonic and asymptotic behaviours of the optimal policy are established. By taking advantage of the special structure of the optimal policy, the stationary distribution of the system states is obtained, which is then used to compute steady-state performance measures of interest and to implement the optimal policy. Numerical examples are given to demonstrate the results.

19.
The following optimality principle is established for finite undiscounted or discounted Markov decision processes: If a policy is (gain, bias, or discounted) optimal in one state, it is also optimal for all states reachable from this state using this policy. The optimality principle is used constructively to demonstrate the existence of a policy that is optimal in every state, and then to derive the coupled functional equations satisfied by the optimal return vectors. This reverses the usual sequence, where one first establishes (via policy iteration or linear programming) the solvability of the coupled functional equations, and then shows that the solution is indeed the optimal return vector and that the maximizing policy for the functional equations is optimal for every state.

20.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of a solution to the average reward optimality equation and of an average optimal stationary policy. Under our conditions we also ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies: we not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting "martingale characterization" of such a policy. The approach provided in this paper is based on the policy iteration algorithm, and it should be noted that it is rather different from both the usual "vanishing discount factor approach" and the "optimality inequality approach" widely used in the previous literature.
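In the standard notation (reward rate r(x,a), transition rates q(y|x,a) with \sum_y q(y \mid x,a) = 0, optimal gain g, and bias function h), the average reward optimality equation referred to reads

g \;=\; \sup_{a \in A(x)} \Big\{ r(x,a) \;+\; \sum_{y} q(y \mid x, a)\, h(y) \Big\}, \qquad x \in S,

written here for a countable state space; in the general spaces of the paper the sum becomes an integral.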
