Similar Documents
20 similar documents found (search time: 31 ms)
1.
The unichain classification problem determines whether a finite state and action MDP is unichain under all deterministic policies. This problem is NP-hard. This paper provides polynomial algorithms for the problem when there is a state that is either recurrent under all deterministic policies or absorbing under some action.

2.
One of the most fundamental results in inventory theory is the optimality of the (s, S) policy for inventory systems with setup cost. This result is established based on a key assumption of infinite production/ordering capacity. Several studies have shown that, when there is a finite production/ordering capacity, the optimal policy for the inventory system is very complicated and, indeed, only a partial characterization of the optimal policy is possible. In this paper, we consider a continuous review inventory system with finite production/ordering capacity and setup cost, and show that the optimal control policy for this system has a very simple structure. We also develop efficient algorithms to compute the optimal control parameters.

3.
The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost allow an average cost optimal policy to be approximated by solving n-stage optimization problems (n = 1, 2, ...). The approach used ensures an exponential rate of convergence. An approximation of this type is useful for finding adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability. Research supported in part by Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant 0635P-E9506, and by Fondo del Sistema de Investigación del Mar de Cortés under Grant SIMAC/94/CT-005.

4.
Critical resources are often shared among different classes of customers. Capacity reservation allows each class of customers to better manage the priorities of its customers but might lead to unused capacity. Unused capacity can be avoided or reduced by advance cancelation. This paper addresses service capacity reservation for a given class of customers. The reservation process is characterized by contracted time slots (CTS) reserved for the class of customers, requests for lengthy regular time slots (RTS), and two advance cancelation modes that cancel a CTS one or two periods in advance. The optimal control under a given contract is formulated as an average-cost Markov decision process (MDP) that minimizes customer waiting times, unused CTS, and CTS cancelations. Structural properties of optimal control policies are established via the corresponding discounted-cost MDP problem. Numerical results show that two-period advance CTS cancelation can significantly improve the contract-based solution.

5.
We study a multi-period stochastic inventory model with controllable warehouse capacity under the discounted-cost criterion. Using the Markov decision process (MDP) approach, we establish the optimality equation satisfied by the minimal discounted cost and, on this basis, obtain an optimal policy with a (Ut*, yt*(b)) structure: when the warehouse capacity is below Ut*, expand the capacity to Ut* and order up to Ut*; otherwise keep the capacity unchanged, and order up to yt*(b) when the inventory level is below yt*(b), but do not order otherwise.
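The (Ut*, yt*(b)) structure above translates into a simple one-period decision rule. A minimal sketch follows; the threshold U_star and the function y_star are hypothetical inputs here (in the paper they are obtained from the optimality equation):

```python
def inventory_action(capacity, stock, U_star, y_star):
    """One-period action under a (U*, y*(b))-structured policy.

    capacity : current warehouse capacity b
    stock    : current inventory level
    U_star   : capacity-expansion / order-up-to threshold
    y_star   : function mapping capacity b to the order-up-to level y*(b)
    Returns (new_capacity, order_quantity).
    """
    if capacity < U_star:
        # Expand capacity to U* and order up to U*.
        return U_star, max(U_star - stock, 0)
    # Keep capacity unchanged; order up to y*(b) only if stock is below it.
    target = y_star(capacity)
    order = target - stock if stock < target else 0
    return capacity, order
```

For example, with U_star = 10 and stock 3 at capacity 5, the rule expands capacity to 10 and orders 7 units.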

6.
In this paper, we consider a supply chain with one manufacturer, one retailer, and some online customers. In addition to supplying the retailer, the manufacturer may selectively take orders from individuals online. Using a Markov decision process, we explore the optimal production and availability policy for the manufacturer: whether to produce one more unit of product, and whether to indicate "in stock" or "out of stock" on the website. We measure the benefits and influences of adding online customers with and without the retailer's inventory information sharing. We also simulate the production and availability policy via a myopic method, which can be implemented easily in the real world, and propose simple switching functions that predict the production and availability decisions. We find that information sharing, production capacity, and unit profit from online orders are the primary factors influencing manufacturer profits and the optimal policy. The manufacturer might reserve 50% of production capacity for contractual orders from the retailer and devote the remaining capacity to selective orders from spontaneous online customers.

7.
《Optimization》2012,61(2):121-131
This paper discusses a general bulk service queue which falls into the Markov renewal class. Applying an analysis similar to the one by Hunter (1983) for M/M/1/N-type feedback queues, certain properties of the discrete- and continuous-time queue length processes are studied. The results and formulas are then applied to a numerical illustration.

8.
We consider the problem of a firm ("the buyer") that must acquire a fixed number (L) of items. The buyer can acquire these items either at a fixed buy-it-now price in the open market or by participating in a sequence of N > L auctions. The objective of the buyer is to minimize his expected total cost for acquiring all L items. We model this problem as a Markov Decision Process and establish monotonicity properties for the optimal value function and the optimal bidding strategies.
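A backward-induction sketch of this kind of model, with V[n][l] the minimal expected cost with n auctions left and l items still needed. The win-probability function win_prob(b) is an illustrative assumption, not the paper's model; the buy-it-now option caps every value at l times the buy-now price:

```python
def min_expected_cost(N, L, buy_now, bids, win_prob):
    """Backward induction: V[n][l] = minimal expected cost with n auctions
    remaining and l items still needed. win_prob(b) is a hypothetical
    probability of winning a single auction with bid b (pay b if you win)."""
    V = [[0.0] * (L + 1) for _ in range(N + 1)]
    for l in range(L + 1):
        V[0][l] = l * buy_now           # no auctions left: buy the remainder now
    for n in range(1, N + 1):
        for l in range(1, L + 1):
            best = l * buy_now          # option to skip the auctions entirely
            for b in bids:
                p = win_prob(b)
                # Win with prob p (pay b, need one fewer item), else move on.
                cost = p * (b + V[n - 1][l - 1]) + (1 - p) * V[n - 1][l]
                best = min(best, cost)
            V[n][l] = best
    return V
```

With a linear win probability and buy-now price 10, the values are monotone in the way the abstract describes: decreasing in the number of remaining auctions and increasing in the number of items still needed.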

9.
This paper provides an optimal sequential decision procedure for deciding between two composite hypotheses about the unknown failure rate of an exponential distribution, using censored data. The procedure has two components, a stopping time and a decision function. The optimal stopping time minimizes the expected total loss due to a wrong decision plus the cost of observing the process. The optimal decision function is easily characterized once a stopping time has been specified. The main result determines the continuation region for the optimal decision procedure.

10.
A biological population with N individuals assumed to be healthy is monitored over time. A portion of these individuals die from natural causes, while others may get infected with a disease and become sick. A number of the sick individuals will then die from natural causes, and others may die from the disease. A stochastic model for the various transitions 'healthy, sick, death' is studied, where it is assumed that the only observed states are deaths. Based on this information, optimal filters for the number of individuals in each state are derived. This revised version was published online in August 2006 with corrections to the Cover Date.

11.
It is common to subsample Markov chain output to reduce the storage burden. Geyer shows that discarding k − 1 out of every k observations will not improve statistical efficiency, as quantified through variance in a given computational budget. That observation is often taken to mean that thinning Markov chain Monte Carlo (MCMC) output cannot improve statistical efficiency. Here, we suppose that it costs one unit of time to advance a Markov chain and then θ > 0 units of time to compute a sampled quantity of interest. For a thinned process, that cost θ is incurred less often, so the chain can be advanced through more stages. We provide examples to show that thinning will improve statistical efficiency if θ is large and the sample autocorrelations decay slowly enough. If the lag-ℓ autocorrelations of a scalar measurement satisfy ρℓ > ρℓ+1 > 0, then there is always a θ < ∞ at which thinning becomes more efficient for averages of that scalar. Many sample autocorrelation functions resemble those of first-order AR(1) processes with ρℓ = ρ^|ℓ| for some −1 < ρ < 1. For an AR(1) process, it is possible to compute the most efficient subsampling frequency k. The optimal k grows rapidly as ρ increases toward 1. The resulting efficiency gain depends primarily on θ, not ρ. Taking k = 1 (no thinning) is optimal when ρ ≤ 0. For ρ > 0, it is optimal if and only if θ ≤ (1 − ρ)²/(2ρ). The efficiency gain never exceeds 1 + θ. This article also gives efficiency bounds for autocorrelations bounded between those of two AR(1) processes. Supplementary materials for this article are available online.
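The AR(1) case can be checked numerically. A sketch, assuming the standard AR(1) asymptotic-variance factor (1 + ρ^k)/(1 − ρ^k) for a chain thinned by factor k, and a cost of k + θ per retained draw (one unit per chain step plus θ for the computed quantity):

```python
def efficiency(k, rho, theta):
    """Relative statistical efficiency of keeping every k-th draw of an AR(1)
    chain: 1 / (cost per kept draw * asymptotic variance factor).
    Assumes unit cost per chain step and theta per computed quantity."""
    r = rho ** k
    return (1 - r) / ((k + theta) * (1 + r))

def best_k(rho, theta, k_max=1000):
    """Most efficient thinning factor, found by direct search."""
    return max(range(1, k_max + 1), key=lambda k: efficiency(k, rho, theta))
```

For ρ = 0.5, the threshold (1 − ρ)²/(2ρ) equals 0.25: with θ = 0.1 no thinning is best, while with θ = 1 the search already prefers k = 2, and the resulting efficiency gain stays below 1 + θ.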

12.
《Optimization》2012,61(3):259-281
In this paper we are concerned with several random processes that occur in the M/G2/1 queue with instantaneous feedback, in which the feedback decision process is a pair of independent Bernoulli processes. The stationary distribution of the output process is obtained, and results for particular queues with and without feedback follow. Some operating characteristics are derived for this queue, and some interesting results are obtained for the departure processes. The optimum service rate is obtained. Numerical examples are provided to test the feasibility of the queueing model.

13.
This paper addresses constrained Markov decision processes, with the expected discounted total cost criterion, controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the sequence of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.
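The convergence of finite horizon values to the infinite horizon value is easy to see on a toy example. A sketch with a hypothetical two-state, two-action discounted MDP (all numbers illustrative, and without the paper's constraints): value iteration is a contraction for discount β < 1, so the finite-horizon values Vn approach the infinite-horizon fixed point.

```python
# Toy 2-state, 2-action discounted MDP (all numbers illustrative).
P = {  # P[a][s][s'] : transition probabilities under action a
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.6, 0.4]],
}
C = {  # C[a][s] : one-step cost of action a in state s
    0: [1.0, 2.0],
    1: [0.5, 3.0],
}
beta = 0.9  # discount factor

def bellman(V):
    """One value-iteration step: V'(s) = min_a C[a][s] + beta * E[V(s')]."""
    return [min(C[a][s] + beta * sum(P[a][s][t] * V[t] for t in range(2))
                for a in (0, 1))
            for s in range(2)]

V = [0.0, 0.0]                  # V_0: zero terminal values
for n in range(500):            # V_n are the finite-horizon value functions
    V_next = bellman(V)
    if max(abs(V_next[s] - V[s]) for s in range(2)) < 1e-12:
        break                   # fixed point reached to numerical tolerance
    V = V_next
# V now approximates the infinite-horizon value function.
```

At convergence, applying the Bellman operator leaves V essentially unchanged, which is exactly the fixed-point characterization of the infinite horizon value function.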

14.
Optimal stopping, exponential utility, and linear programming
This paper uses linear programming to compute an optimal policy for a stopping problem whose utility function is exponential. This is done by transforming the problem into an equivalent one having additive utility and nonnegative (not necessarily substochastic) transition matrices. Research was supported by NSF Grant ENG 76-15599.

15.
Kuri, Joy; Kumar, Anurag. Queueing Systems (1997), 27(1-2):1-16
We consider a problem of admission control to a single queue in discrete time. The controller has access to k-step-old queue lengths only, where k can be arbitrary. The problem is motivated, in particular, by recent advances in high-speed networking where information delays have become prominent. We formulate the problem in the framework of Completely Observable Controlled Markov Chains, in terms of a multi-dimensional state variable. Exploiting the structure of the problem, we show that under appropriate conditions, the multi-dimensional Dynamic Programming Equation (DPE) can be reduced to a unidimensional one. We then provide simple computable upper and lower bounds to the optimal value function corresponding to the reduced unidimensional DPE. These upper and lower bounds, along with a certain relationship among the parameters of the problem, enable us to deduce partially the structural features of the optimal policy. Our approach enables us to recover simply, in part, the recent results of Altman and Stidham, who have shown that a multiple-threshold-type policy is optimal for this problem. Further, under the same relationship among the parameters of the problem, we provide easily computable upper bounds to the multiple thresholds and show the existence of simple relationships among these upper bounds. These relationships allow us to gain very useful insights into the nature of the optimal policy. In particular, the insights obtained are of great importance for the problem of actually computing an optimal policy because they reduce the search space enormously. This revised version was published online in June 2006 with corrections to the Cover Date.

16.
17.
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy, the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy, the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies using an approach based on renewal theory. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.

18.
19.

The main goal of this paper is to study the infinite-horizon long-run average continuous-time optimal control problem for piecewise deterministic Markov processes (PDMPs), with the control acting continuously on the jump intensity λ and on the transition measure Q of the process. We provide conditions for the existence of a solution to an integro-differential optimality inequality, the so-called Hamilton-Jacobi-Bellman (HJB) equation, and for the existence of a deterministic stationary optimal policy. These results are obtained via the so-called vanishing discount approach, under continuity and compactness assumptions on the parameters of the problem, as well as non-explosiveness conditions for the process.

20.
Optimal models for the distribution functions of the first arrival time (H) and the first-arrival-target total return (W_H) for MDPs in continuous time are presented. Asymptotic expansions of H and W_H are derived in simple, explicit forms, and some of their properties are discussed. Two methods to find an optimal policy for the distribution function of H are given. Several necessary and sufficient conditions for the existence of the optimal policy are obtained; as a result, the scope of the search for the optimal policy is greatly reduced. A special case is also discussed and some deeper results are given.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号