Similar Documents
Retrieved 20 similar documents.
1.
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, using a minimum nonnegative solution approach, we prove that the value function satisfies the optimality equation and that an optimal (or ε-optimal) stationary policy exists under suitable conditions. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model extends the first passage models for both discrete-time and continuous-time Markov decision processes.
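The value iteration scheme described in this abstract can be illustrated on a toy finite instance. Everything below (states, costs, transition probabilities, the target set) is invented for illustration; the paper treats denumerable state spaces and semi-Markov dynamics, while this sketch uses a plain discrete-time chain with a discount factor.

```python
states = [0, 1, 2]
target = {2}          # absorbing target set; no cost accrues after absorption
actions = [0, 1]
beta = 0.9            # discount factor

# c(s, a) for non-target states, and P[s][a] = distribution over next states.
cost = {0: {0: 2.0, 1: 5.0},
        1: {0: 1.0, 1: 4.0}}
P = {0: {0: [0.7, 0.2, 0.1], 1: [0.1, 0.3, 0.6]},
     1: {0: [0.3, 0.5, 0.2], 1: [0.0, 0.2, 0.8]}}

def q_value(s, a, V):
    # c(s, a) + beta * sum_j p(j | s, a) * V(j)
    return cost[s][a] + beta * sum(p * V[j] for j, p in zip(states, P[s][a]))

def value_iteration(tol=1e-10, max_iter=100_000):
    # Successive approximation from V = 0; starting at zero converges to the
    # minimum nonnegative solution of the optimality equation, matching the
    # minimum-nonnegative-solution approach described in the abstract.
    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        V_new = {s: 0.0 if s in target else min(q_value(s, a, V) for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
    return V

V = value_iteration()
policy = {s: min(actions, key=lambda a: q_value(s, a, V))
          for s in states if s not in target}
```

The returned stationary policy attains the minimum in the optimality equation at every transient state, which is the standard way an optimal stationary policy is read off from the value function.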

2.
In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Different from the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over a set of all policies with a given expected reward. For continuous-time MDPs with the discounted criterion and finite-state and action spaces, we prove that the mean–variance optimization problem can be transformed to an equivalent discounted optimization problem using the conditional expectation and Markov properties. Then, we show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods with a finite number of iterations. We also address related issues such as a mutual fund theorem and illustrate our results with an example.
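For intuition, the mean–variance trade-off over deterministic stationary policies can be enumerated exactly in a tiny discrete-time instance (the paper works in continuous time; the MDP below and its target-mean threshold are invented for illustration). Under a fixed policy with reward vector r and transition matrix P, the mean J solves J = r + βPJ, and the second moment S of the total discounted reward solves S = r² + 2βr·(PJ) + β²PS, so the variance is S − J².

```python
import itertools
import numpy as np

beta = 0.9
r = np.array([[1.0, 2.0],                  # r[s, a], invented rewards
              [0.5, 1.5]])
P = np.array([[[0.8, 0.2], [0.3, 0.7]],    # P[s, a, j], invented transitions
              [[0.6, 0.4], [0.1, 0.9]]])
n_states, n_actions = r.shape

def mean_and_variance(policy):
    # Restrict to the Markov chain induced by the deterministic policy.
    Pp = np.array([P[s, policy[s]] for s in range(n_states)])
    rp = np.array([r[s, policy[s]] for s in range(n_states)])
    I = np.eye(n_states)
    J = np.linalg.solve(I - beta * Pp, rp)             # mean: J = r + beta*P*J
    # Second moment: S = r^2 + 2*beta*r*(P J) + beta^2 * P S.
    S = np.linalg.solve(I - beta**2 * Pp,
                        rp**2 + 2.0 * beta * rp * (Pp @ J))
    return J, S - J**2                                 # (mean, variance) per start state

results = {pi: mean_and_variance(pi)
           for pi in itertools.product(range(n_actions), repeat=n_states)}

# Among policies whose mean from state 0 meets a target, pick minimum variance.
TARGET = 10.0   # invented target expected reward
feasible = {pi: (J, v) for pi, (J, v) in results.items() if J[0] >= TARGET}
best = min(feasible, key=lambda pi: feasible[pi][1][0])
```

Sweeping TARGET over a grid and recording the minimal variance at each level traces out the efficient frontier mentioned in the abstract, here by brute force rather than by the paper's policy iteration method.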

3.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.

4.
Discounted Semi-Markov Decision Processes with Nonnegative Costs
黄永辉, 郭先平. 《数学学报》 (Acta Mathematica Sinica), 2010, 53(3): 503–514
This paper considers discounted semi-Markov decision processes with countable states and nonnegative costs. We first construct a continuous-time semi-Markov decision process under a given semi-Markov decision kernel and a policy, and then use a minimum nonnegative solution approach to prove that the value function satisfies the optimality equation and that an ε-optimal stationary policy exists; we further give conditions for the existence of optimal policies together with some of their properties. Finally, a value iteration algorithm and a numerical example are presented.

5.
We consider a simple problem in the optimal control of Brownian Motion. There are two modes of control available, each with its own drift and diffusion coefficients, and switching costs are incurred whenever the control mode is changed. Finally, holding costs are incurred according to a quadratic function of the state of the system, and all costs are continuously discounted. It is shown that there exists an optimal policy involving just two critical numbers, and formulas are given for computation of the critical numbers.

6.
郭先平, 戴永隆. 《数学学报》 (Acta Mathematica Sinica), 2002, 45(1): 171–182
This paper considers the discounted model of continuous-time Markov decision processes with arbitrary families of transition rates and possibly unbounded cost-rate functions. It drops the traditional requirement that the Q-process corresponding to each policy be unique: for the first time, the Q-process associated with a policy need not be unique, the family of transition rates need not be conservative, the cost-rate function may be unbounded, and the action sets may be arbitrary nonempty sets. The paper replaces the traditional α-discounted cost optimality equation with an "α-discounted cost optimality inequality" and, using this inequality together with new methods, proves not only the classical main result, the existence of an optimal stationary policy, but also the existence of (ε>0)-optimal stationary policies, of optimal stationary policies with monotonicity properties, and of (ε≥0)-optimal decision processes, obtaining several meaningful new results. Finally, an example of a birth-death system with controlled migration rates is provided that satisfies all the conditions of this paper while the traditional assumptions (see Refs. [1-14]) all fail.

7.
Finite and infinite planning horizon Markov decision problems are formulated for a class of jump processes with general state and action spaces and controls which are measurable functions on the time axis taking values in an appropriate metrizable vector space. For the finite horizon problem, the maximum expected reward exists, is the unique solution of a certain differential equation, and is a strongly continuous function in the space of upper semi-continuous functions. A necessary and sufficient condition is provided for an admissible control to be optimal, and a sufficient condition is provided for the existence of a measurable optimal policy. For the infinite horizon problem, the maximum expected total reward is the fixed point of a certain operator on the space of upper semi-continuous functions. A stationary policy is optimal over all measurable policies in the transient and discounted cases as well as, with certain added conditions, in the positive and negative cases.

8.
In this paper, applying the technique of diffusion approximation to an M/G/1 queueing system with a removable server, we provide a robust approximation model for determining an optimal operating policy of the system. The system incurs the following costs: costs per hour for keeping the server on or off, fixed costs for turning the server on or off, and a holding cost per customer per hour. The expected discounted cost is used as the criterion for optimality. Using a pair of independent diffusion processes to approximate the number of customers in the system, we derive approximation formulae for the expected discounted cost that depend on the service time distribution only through its first two moments. Some new results on the characterization of the optimal operating policy follow from these formulae. Moreover, to examine the accuracy of the approximation, the formulae are numerically compared with the exact results.

9.
We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are determined by the given transition rates which are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. By using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy which is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or at least approximate) the discounted reward optimal value function as well as a discounted optimal stationary policy. Finally, we use an example to illustrate our results. In particular, we derive an explicit and exact solution to the DROE and an explicit expression of a discounted optimal stationary policy for this example.
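When the transition rates are uniformly bounded, the standard computational route is uniformization: pick a constant C at least as large as every total exit rate, turn the rate matrix into a stochastic kernel, and run discrete value iteration with contraction modulus C/(α+C). The two-state instance below is invented, and this is a generic sketch of the technique rather than the paper's algorithm verbatim (the paper also allows state-dependent discount factors, which this sketch does not).

```python
import numpy as np

alpha = 0.1                                # discount rate
q = np.array([[[-3.0, 3.0], [-1.0, 1.0]],  # q[s, a, j]: transition rates,
              [[2.0, -2.0], [4.0, -4.0]]]) # each row sums to zero
r = np.array([[1.0, 0.5],
              [2.0, 0.0]])                 # reward rates r[s, a]
n_states, n_actions = r.shape

# Uniformization: with C >= every total exit rate, Ptil is a stochastic kernel
# and the optimality equation alpha*V(s) = max_a [r(s,a) + sum_j q(j|s,a) V(j)]
# becomes the fixed point of the discrete contraction
#   V(s) = max_a [r(s,a) + C * (Ptil V)(s,a)] / (alpha + C).
C = max(-q[s, a, s] for s in range(n_states) for a in range(n_actions)) + 1.0
Ptil = q / C
for s in range(n_states):
    Ptil[s, :, s] += 1.0

V = np.zeros(n_states)
for _ in range(500_000):
    V_new = np.max((r + C * (Ptil @ V)) / (alpha + C), axis=1)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new
policy = np.argmax(r + C * (Ptil @ V), axis=1)
```

Adding a margin to C (here +1.0) keeps all diagonal entries of the uniformized kernel strictly positive, which does not change the fixed point but is a common safeguard.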

10.
We provide weak sufficient conditions for a full-service policy to be optimal in a queueing control problem in which the service rate is a dynamic decision variable. In our model there are service costs and holding costs and the objective is to minimize the expected total discounted cost over an infinite horizon. We begin with a semi-Markov decision model for a single-server queue with exponentially distributed inter-arrival and service times. Then we present a general model with weak probabilistic assumptions and demonstrate that the full-service policy minimizes both finite-horizon and infinite-horizon total discounted cost on each sample path.

11.
This paper addresses the problem of determining stock replenishment policies to meet the demand for spare parts for items of equipment which are no longer manufactured. The assumptions that the number of items still in use is decreasing and that parts fail randomly lend credence to a Poisson demand process with an underlying mean which is decreasing exponentially. We use a dynamic programming formulation in continuous time to determine that replenishment policy which minimises the mean total discounted cost of set-up/order, unit production/purchase, unsatisfied demand and stock left over at the end of the time horizon.
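A discrete-period sketch of the replenishment DP described above: Poisson demand with an exponentially decaying mean, a fixed set-up cost, a unit purchase cost, a shortage penalty for unsatisfied demand, and a charge on stock left over at the end of the horizon. All numbers and the discretization itself are invented for illustration, since the paper formulates the problem in continuous time.

```python
import math

T = 6                  # planning periods (a discretization of the horizon)
S_MAX = 15             # cap on the stock level
beta = 0.95            # per-period discount factor
K, c, p, h = 10.0, 1.0, 8.0, 0.5   # setup, unit, shortage, terminal-leftover cost
lam0, decay = 3.0, 0.3             # demand mean lam0 * exp(-decay * t)

def poisson_pmf(lam, kmax):
    # Truncated Poisson pmf with the tail mass lumped into the last entry.
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    pmf[-1] += max(0.0, 1.0 - sum(pmf))
    return pmf

# Backward induction: V[t][s] = minimal expected discounted cost from period t
# with s units on hand; order[t][s] = the minimizing order quantity.
V = [[0.0] * (S_MAX + 1) for _ in range(T + 1)]
order = [[0] * (S_MAX + 1) for _ in range(T)]
for s in range(S_MAX + 1):
    V[T][s] = h * s                       # charge for stock left over at the end
for t in range(T - 1, -1, -1):
    pmf = poisson_pmf(lam0 * math.exp(-decay * t), S_MAX + 5)
    for s in range(S_MAX + 1):
        best_cost, best_q = float("inf"), 0
        for q in range(S_MAX - s + 1):    # order q, stock becomes y = s + q
            y = s + q
            future = sum(pr * (p * max(0, d - y) + beta * V[t + 1][max(0, y - d)])
                         for d, pr in enumerate(pmf))
            total = (K if q > 0 else 0.0) + c * q + future
            if total < best_cost:
                best_cost, best_q = total, q
        V[t][s], order[t][s] = best_cost, best_q
```

Because the demand mean decays over time, the computed order quantities typically shrink in later periods, reflecting the declining population of equipment still in use.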

12.
We consider M transmitting stations sending packets to a single receiver over a slotted time-multiplexed link. For each phase consisting of T consecutive slots, the receiver dynamically allocates these slots among the M transmitters. Our objective is to characterize policies that minimize the long-term average of the total number of messages awaiting service at the M transmitters. We establish necessary and sufficient conditions on the arrival processes at the transmitters for the existence of finite-cost time-average policies; it is not enough that the average arrival rate be strictly less than the slot capacity. We construct a pure strategy that attains a finite average cost under these conditions. This in turn leads to the existence of an optimal time-average pure policy for each phase length T, and to upper and lower bounds on the cost this policy achieves. Furthermore, we show that such an optimal time-average policy has the same properties as the optimal discounted policies investigated by the authors in a previous paper. Finally, we prove that in the absence of costs accrued by messages within the phase, there exists a policy such that the time-average cost tends toward zero as the phase length T grows.

13.
This paper investigates the problem of optimal switching among a finite number of Markov processes, generalizing some of the author's earlier results for controlled one-dimensional diffusions. Under rather general conditions, it is shown that the optimal discounted cost function is the unique solution of a functional equation. Under more restrictive assumptions, this function is shown to be the unique solution of some quasi-variational inequalities. These assumptions are verified for a large class of control problems. For controlled Markov chains and controlled one-dimensional diffusions, the existence of a stationary optimal policy is established. Finally, a policy iteration method is developed to calculate an optimal stationary policy, if one exists.

This research was sponsored by the Air Force Office of Scientific Research (AFSC), United States Air Force, under Contract No. F-49620-79-C-0165. The author would like to thank the referee for bringing Refs. 7, 8, and 9 to his attention.

14.
In the classical Cramér-Lundberg model in risk theory, the problem of finding the optimal dividend strategy and the optimal dividend return function is a widely discussed topic. In the present paper, we discuss the problem of maximizing the expected discounted net dividend payments minus the expected discounted costs of injecting new capital, in the Cramér-Lundberg model with proportional taxes and fixed transaction costs imposed each time a dividend is paid out, and with both fixed and proportional transaction costs incurred each time a capital injection is made. Negative surplus, i.e. ruin, is not allowed. By solving the corresponding quasi-variational inequality, we obtain the analytical solution of the optimal return function and the optimal joint dividend and capital injection strategy when claims are exponentially distributed.

15.
董泽清. 《数学学报》 (Acta Mathematica Sinica), 1978, 21(2): 135–150
The discounted Markov decision programming (some authors say Markov decision processes) considered here has a countably infinite state space, countably infinite decision sets available in each state, a substochastic family of transition laws, and bounded reward functions. We give an accelerated successive approximation algorithm for finding (ε-)optimal stationary policies that converges to an (ε-)optimal solution faster than White's successive approximation algorithm, combined with a test criterion for non-optimal policies that makes the algorithm still more efficient. Let β be the discount factor. In general, a β- (or (ε,β)-) optimal stationary policy is not unique; there may even be as many as the class of stationary policies contains. It is therefore natural to seek, among the β- (or (ε,β)-) optimal stationary policies, one whose variance is uniformly (in the initial state) (ε-)minimal. We prove that such a policy indeed exists and give an algorithm for obtaining it.
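The paper's specific accelerated algorithm is not reproduced here, but a standard acceleration in the same spirit combines successive approximation with MacQueen-style error bounds and a test that eliminates provably suboptimal actions; the toy maximization instance below is invented for illustration.

```python
import numpy as np

beta = 0.9
r = np.array([[1.0, 0.0, 2.0],             # r[s, a], invented rewards
              [0.5, 1.5, 1.0],
              [0.0, 2.0, 0.5]])
P = np.array([[[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]],
              [[0.4, 0.4, 0.2], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]],
              [[0.1, 0.8, 0.1], [0.5, 0.1, 0.4], [0.2, 0.5, 0.3]]])
n_states, n_actions = r.shape
active = np.ones((n_states, n_actions), dtype=bool)   # actions not yet eliminated

V = np.zeros(n_states)
for _ in range(100_000):
    Q = np.where(active, r + beta * (P @ V), -np.inf)
    V_new = Q.max(axis=1)
    delta = V_new - V
    lo = beta / (1 - beta) * delta.min()   # V* >= V_new + lo (componentwise)
    hi = beta / (1 - beta) * delta.max()   # V* <= V_new + hi
    # Eliminate any action whose optimistic upper bound on Q*(s, a) falls
    # below the pessimistic lower bound on V*(s): it can never be optimal.
    Q_next = r + beta * (P @ V_new)
    active &= (Q_next + beta * hi) >= (V_new + lo)[:, None]
    if hi - lo < 1e-8:
        V = V_new + (lo + hi) / 2.0        # midpoint estimate of V*
        break
    V = V_new
policy = np.argmax(np.where(active, r + beta * (P @ V), -np.inf), axis=1)
```

The span-based stopping test typically fires far earlier than a sup-norm test, and the elimination shrinks the maximization as the bounds tighten; a truly optimal action is never eliminated, because its upper bound always dominates the lower bound on V*.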

16.
We study a dual risk model in which the jumps follow an Erlang(n) distribution and the random observation times follow an exponential distribution. Assuming that, under a barrier strategy, dividends are paid only at observation times, we establish a system of integro-differential equations for the expected discounted dividend function V(u;b), derive an analytical solution for V(u;b) when the gains follow a PH(m) distribution, and discuss a concrete method for solving V(u;b) when the gains are exponentially distributed.

17.
This paper studies the constrained discounted semi-Markov decision programming problem: maximizing the expected discounted reward subject to a constraint on an expected discounted cost. Assuming that the state space is countable and the action sets are compact nonempty Borel sets, we give necessary and sufficient conditions for a p-constrained optimal policy and prove that, under suitable assumptions, a p-constrained optimal policy must exist.

18.
This note describes sufficient conditions under which total-cost and average-cost Markov decision processes (MDPs) with general state and action spaces, and with weakly continuous transition probabilities, can be reduced to discounted MDPs. For undiscounted problems, these reductions imply the validity of optimality equations and the existence of stationary optimal policies. The reductions also provide methods for computing optimal policies. The results are applied to a capacitated inventory control problem with fixed costs and lost sales.

19.
Yi Zhang. TOP, 2013, 21(2): 378–408
In this paper we develop the convex analytic approach to a discounted discrete-time Markov decision process (DTMDP) in Borel state and action spaces with N constraints. Unlike the classic discounted models, we allow a non-constant discount factor. After defining and characterizing the corresponding occupation measures, the original constrained DTMDP is written as a convex program in the space of occupation measures, whose compactness and convexity we show. In particular, we prove that every extreme point of the space of occupation measures can be generated by a deterministic stationary policy for the DTMDP. For the resulting convex program, we prove that it admits a solution that can be expressed as a convex combination of N+1 extreme points of the space of occupation measures. One of its consequences is the existence of a randomized stationary optimal policy for the original constrained DTMDP.
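The occupation measures at the heart of the convex analytic approach can be made concrete in a small finite example. The sketch below builds the discounted occupation measure of one deterministic stationary policy in an invented two-state instance with a constant discount factor, and verifies the linear properties that characterize such measures (total mass 1/(1−β), the flow-balance constraint, and the identity that the expected discounted reward equals ⟨r, μ⟩); it does not solve the constrained program itself.

```python
import numpy as np

beta = 0.9
nu = np.array([0.5, 0.5])                  # initial distribution
r = np.array([[1.0, 0.0],                  # r[s, a], invented rewards
              [0.0, 2.0]])
P = np.array([[[0.7, 0.3], [0.2, 0.8]],    # P[s, a, j], invented transitions
              [[0.5, 0.5], [0.9, 0.1]]])
pi = [0, 1]                                # a deterministic stationary policy

P_pi = np.array([P[s, pi[s]] for s in range(2)])
r_pi = np.array([r[s, pi[s]] for s in range(2)])

# Occupation measure of pi: mu(s, a) = sum_t beta^t P_nu(X_t = s, A_t = a).
m = np.linalg.solve(np.eye(2) - beta * P_pi.T, nu)   # state marginals
mu = np.zeros((2, 2))
for s in range(2):
    mu[s, pi[s]] = m[s]

# Flow-balance constraint characterizing occupation measures:
# sum_a mu(j, a) = nu(j) + beta * sum_{s,a} mu(s, a) P(j | s, a).
flow = mu.sum(axis=1) - (nu + beta * np.einsum("sa,saj->j", mu, P))

# Expected discounted reward of pi, for comparison with <r, mu>.
J = np.linalg.solve(np.eye(2) - beta * P_pi, r_pi)
```

Writing the constrained problem as a linear program over all μ ≥ 0 satisfying this flow-balance constraint is exactly the convex program the abstract refers to; a constraint Σ c·μ ≤ d then encodes each of the N cost constraints.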

20.
Indagationes Mathematicae, 2023, 34(5): 1181–1205
We consider the impulse control of Lévy processes under the infinite horizon, discounted cost criterion. Our motivating example is the cash management problem in which a controller is charged a fixed plus proportional cost for adding to or withdrawing from his/her reserve, plus an opportunity cost for keeping any cash on hand. Our main result is to provide a verification theorem for the optimality of control band policies in this scenario. We also analyze the transient and steady-state behavior of the controlled process under control band policies and explicitly solve for an optimal policy in the case in which the Lévy process to be controlled is the sum of a Brownian motion with drift and a compound Poisson process with exponentially distributed jump sizes.
