Similar literature
Found 20 similar documents (search time: 0 ms)
1.
We consider a simple Markovian queue with Poisson arrivals and exponential service times for jobs. The controller chooses state-dependent service rates from an action space. The queue has a finite buffer, and when it is full, new jobs are rejected. The controller’s objective is to choose optimal service rates that meet a quality-of-service constraint. We solve this problem analytically and compute the solution numerically in two cases: when the action space is unbounded and when it is bounded.
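
The queue described above is a finite birth-death chain, so its stationary distribution has a product form. The following is a minimal sketch, not the authors' method: the function names are hypothetical, and it assumes the quality-of-service measure is the blocking probability (the abstract does not say which measure is used). By the PASTA property, the fraction of rejected jobs equals the stationary probability of a full buffer.

```python
def stationary_distribution(arrival_rate, service_rates):
    """Stationary distribution of a finite-buffer Markovian queue.

    service_rates[n - 1] is the service rate used when n jobs are present
    (n = 1..K), so the buffer size K is len(service_rates).
    """
    # Unnormalised weights from detailed balance: w_n = w_{n-1} * lambda / mu_n.
    weights = [1.0]
    for mu in service_rates:
        weights.append(weights[-1] * arrival_rate / mu)
    total = sum(weights)
    return [w / total for w in weights]


def blocking_probability(arrival_rate, service_rates):
    """Probability that an arriving job finds the buffer full (assumed QoS measure)."""
    return stationary_distribution(arrival_rate, service_rates)[-1]
```

A controller could evaluate candidate service-rate vectors with `blocking_probability` and keep only those that satisfy the chosen constraint.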

2.
The ordinary American put option assumes that investors can exercise their right at any time epoch. However, due to limitations in actual trades, they are not entirely free to choose when to exercise. In this paper, motivated by this practical situation, we consider American put options with a finite set of exercisable time epochs. Assuming that the underlying stock price process follows a discrete-time Markov process, the put option premium is derived. It is shown that, as for the ordinary American put, the option premium decomposes into the corresponding European put premium plus the early exercise premium under the stationary independent increments assumption. Moreover, under regularity conditions, the option premium converges to the ordinary American put premium from below as the number of exercisable time epochs increases. A lower bound on the option premium is also obtained.
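
The ordering of premiums described above can be illustrated numerically. The following is a minimal sketch under an assumption not made in the abstract: the stock price follows a Cox-Ross-Rubinstein binomial tree, which is one particular discrete-time Markov process. The function name `put_price` and all parameter values are hypothetical. Early exercise is allowed only at the steps listed in `exercise_steps`.

```python
import math


def put_price(s0, strike, r, sigma, maturity, steps, exercise_steps):
    """Price a put on a binomial tree with early exercise only at given steps."""
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    disc = math.exp(-r * dt)              # one-step discount factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability

    # Payoff at maturity; node j has j up moves and (steps - j) down moves.
    values = [max(strike - s0 * u**j * d**(steps - j), 0.0)
              for j in range(steps + 1)]

    # Backward induction through the tree.
    for n in range(steps - 1, -1, -1):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(n + 1)]
        if n in exercise_steps:
            # Compare continuation value with immediate exercise.
            values = [max(v, strike - s0 * u**j * d**(n - j))
                      for j, v in enumerate(values)]
    return values[0]


european = put_price(100.0, 100.0, 0.05, 0.2, 1.0, 50, set())
restricted = put_price(100.0, 100.0, 0.05, 0.2, 1.0, 50, {10, 20, 30, 40})
american = put_price(100.0, 100.0, 0.05, 0.2, 1.0, 50, set(range(50)))
```

Because enlarging the exercise set can only raise the value at each node, the restricted premium lies between the European premium and the premium with exercise allowed at every step.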

3.
A new method for predicting failures of a partially observable system is presented. System deterioration is modeled as a hidden, 3-state continuous-time homogeneous Markov process. States 0 and 1, which are not observable, represent good and warning conditions, respectively. Only the failure state 2 is assumed to be observable. The system is subject to condition monitoring at equidistant, discrete time epochs. The vector observation process is stochastically related to the system state. The objective is to develop a method for optimally predicting impending system failures. Model parameters are estimated using the EM algorithm, and a cost-optimal Bayesian fault prediction scheme is proposed. The method is illustrated using real data obtained from spectrometric analysis of oil samples collected at regular time epochs from transmission units of heavy hauler trucks used in the mining industry. A comparison with other methods is given, which illustrates the effectiveness of our approach.

4.
We present intensional dynamic programming (IDP), a generic framework for structured dynamic programming over atomic, propositional and relational representations of states and actions. We first develop set-based dynamic programming and show its equivalence with classical dynamic programming. We then show how to describe state sets intensionally using any form of structured knowledge representation and obtain a generic algorithm that can optimally solve large, even infinite, MDPs without explicit state space enumeration. We derive two new Bellman backup operators and algorithms. In order to support the view of IDP as a Rosetta stone for structured dynamic programming, we review many existing techniques that employ either propositional or relational knowledge representation frameworks.

5.
This paper proposes a set of methods for solving stochastic decision problems modeled as partially observable Markov decision processes (POMDPs). This approach (Real Time Heuristic Decision System, RT-HDS) is based on the use of prediction methods combined with several existing heuristic decision algorithms. The prediction process is one of tree creation. The value function for the last step uses some of the classic heuristic decision methods. To illustrate how this approach works, comparative results of different algorithms with a variety of simple and complex benchmark problems are reported. The algorithm has also been tested in a mobile robot supervision architecture.

6.
Because of their convincing performance, there is a growing interest in using evolutionary algorithms for reinforcement learning. We propose learning of neural network policies by the covariance matrix adaptation evolution strategy (CMA-ES), a randomized variable-metric search algorithm for continuous optimization. We argue that this approach, which we refer to as CMA Neuroevolution Strategy (CMA-NeuroES), is ideally suited for reinforcement learning, in particular because it is based on ranking policies (and therefore robust against noise), efficiently detects correlations between parameters, and infers a search direction from scalar reinforcement signals. We evaluate the CMA-NeuroES on five different (Markovian and non-Markovian) variants of the common pole balancing problem. The results are compared to those described in a recent study covering several RL algorithms, and the CMA-NeuroES shows the overall best performance.

7.
Inspection models applicable to a finite planning horizon are developed for the following lifetime distributions: uniform, exponential, and Weibull. For a given lifetime distribution, maximization of profit is used as the sole optimization criterion for determining an optimal planning horizon over which a system may be operated as well as ideal inspection times. Illustrative examples (focusing on the uniform and Weibull distributions and using Mathematica programs) are given. For some situations, evenly spreading inspections over the entire planning horizon is seen to result in the attainment of desirable profit levels over a shorter planning horizon. Scope for further research is given as well. Copyright © 2016 John Wiley & Sons, Ltd.

8.
We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.

9.
10.
This article proposes a probability model for k-dimensional ordinal outcomes, that is, it considers inference for data recorded in k-dimensional contingency tables with ordinal factors. The proposed approach is based on full posterior inference, assuming a flexible underlying prior probability model for the contingency table cell probabilities. We use a variation of the traditional multivariate probit model, with latent scores that determine the observed data. In our model, a mixture of normals prior replaces the usual single multivariate normal model for the latent variables. By augmenting the prior model to a mixture of normals we generalize inference in two important ways. First, we allow for varying local dependence structure across the contingency table. Second, inference in ordinal multivariate probit models is plagued by problems related to the choice and resampling of cutoffs defined for these latent variables. We show how the proposed mixture model approach entirely removes these problems. We illustrate the methodology with two examples, one simulated dataset and one dataset of interrater agreement.

11.
Many individuals suffering from food insecurity obtain assistance from governmental programs and nonprofit agencies such as food banks. Much of the food distributed by food banks comes from donations, which are received from various sources in uncertain quantities at random points in time. This paper presents a model that can assist food banks in distributing these uncertain supplies equitably and in measuring the performance of their distribution efforts. We formulate this decision problem as a discrete-time, discrete-state Markov decision process that considers stochastic supply, deterministic demand and an equity-based objective. We investigate three different allocation rules and describe the optimal policy as a function of available inventory. We also provide county-level estimates of unmet need and determine the probability distribution associated with the number of underserved counties. A numerical study is performed to show how the allocation policy and unmet need are impacted by uncertain supply and deterministic, time-varying demand. We also compare different allocation rules in terms of equity and effectiveness.

12.
This paper investigates the possibility of using aggregation in the action space for some Markov decision processes of inventory control type. For the standard (s, S) inventory control model, the policy improvement procedure can be executed very efficiently, so aggregation in the action space is not of much use. However, in situations where the decisions have some aftereffect and, hence, the old decision has to be incorporated in the state, it might be rewarding to aggregate actions. Some variants of aggregation and disaggregation are formulated and analyzed. Numerical evidence is presented.

13.
We study the coordination of production and quality control in a tandem-queue system. There are two stages, with a single server at stage one that can engage in processing an item, or inspecting the produced item, or staying idle; whereas the second stage represents the aggregate of the rest of the production facility. We focus on the optimal control of the first stage, where both the production and inspection times follow general distributions. We formulate a semi-Markov decision program with a long-run average objective, and derive the stationary optimal policy to control and coordinate the production, inspection, and idling processes. We show that there exists a threshold value i, such that under the optimal policy, once the threshold is reached, production should be suspended at the first stage; and this leads naturally to i + 1 being the required buffer capacity between the two stages. Supported in part by NSF Grant MDI-9523029. Supported in part by HKUST Grant DAG95/96.BM52.

14.
Univariate or multivariate ordinal responses are often assumed to arise from a latent continuous parametric distribution, with covariate effects that enter linearly. We introduce a Bayesian nonparametric modeling approach for univariate and multivariate ordinal regression, which is based on mixture modeling for the joint distribution of latent responses and covariates. The modeling framework enables highly flexible inference for ordinal regression relationships, avoiding assumptions of linearity or additivity in the covariate effects. In standard parametric ordinal regression models, computational challenges arise from identifiability constraints and estimation of parameters requiring nonstandard inferential techniques. A key feature of the nonparametric model is that it achieves inferential flexibility, while avoiding these difficulties. In particular, we establish full support of the nonparametric mixture model under fixed cut-off points that relate through discretization the latent continuous responses with the ordinal responses. The practical utility of the modeling approach is illustrated through application to two datasets from econometrics, an example involving regression relationships for ozone concentration, and a multirater agreement problem. Supplementary materials with technical details on theoretical results and on computation are available online.

15.
In this paper we consider Markov decision processes with discounted cost and a random discount rate in Borel spaces. We establish the dynamic programming algorithm in the finite- and infinite-horizon cases, and we provide conditions for the existence of measurable selectors. We also present an example of a consumption-investment problem. This research was partially supported by the PROMEP grant 103.5/05/40.
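
For orientation, the discounted-cost dynamic programming recursion can be sketched in a much simpler setting than the one treated in the paper. The following is a minimal illustration only: it assumes finite state and action sets and a fixed (non-random) discount factor, and the function name `value_iteration` and its arguments are hypothetical. It does not implement the random-rate, Borel-space algorithm of the abstract.

```python
def value_iteration(costs, transitions, discount, tol=1e-10, max_iter=10_000):
    """Approximate the optimal discounted cost of a finite MDP.

    costs[x][a] is the one-step cost of action a in state x;
    transitions[x][a] is the list of next-state probabilities;
    discount is a fixed factor in [0, 1) (a simplifying assumption).
    """
    values = [0.0] * len(costs)
    for _ in range(max_iter):
        new_values = []
        for x in range(len(costs)):
            # Bellman backup: minimise one-step cost plus discounted future cost.
            q = [costs[x][a]
                 + discount * sum(p * v for p, v in zip(transitions[x][a], values))
                 for a in range(len(costs[x]))]
            new_values.append(min(q))
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(n - o) for n, o in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values
```

For a single state with one action of cost 1 and discount 0.5, the recursion converges to 1 / (1 - 0.5) = 2, which gives a quick sanity check.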

16.
We study the optimal control of an assembly system that produces one assembled-to-order final product with multiple made-to-stock components and sells it at a variable price. It is shown that a threshold control on component production, product price, and product orders maximizes total discounted profit over an infinite horizon.

17.
A two-dimensional partially observable diffusion-type process is considered. Under the assumption that the coefficients of the system of equations satisfy the functional Lipschitz condition with respect to only one of the space variables, and that the linear growth condition at infinity is also satisfied, the existence of the innovation process is proved for the so-called “observable” component of the process. As a lemma we obtain a non-traditional stochastic analogue of the well-known Gronwall-Bellman lemma.

18.
The main goal of this paper is to study the infinite-horizon long-run average continuous-time optimal control problem of piecewise deterministic Markov processes (PDMPs) with the control acting continuously on the jump intensity λ and on the transition measure Q of the process. We provide conditions for the existence of a solution to an integro-differential optimality inequality, the so-called Hamilton-Jacobi-Bellman (HJB) equation, and for the existence of a deterministic stationary optimal policy. These results are obtained by using the so-called vanishing discount approach, under some continuity and compactness assumptions on the parameters of the problem, as well as some non-explosive conditions for the process.

19.
We consider a service system with a single server, a finite waiting room and two classes of customers with deterministic service time. Primary jobs arrive at random and are admitted whenever there is room in the system. At the beginning of each period, secondary jobs can be admitted from an infinite pool. Revenue is earned upon admission of each job, with the primary jobs bringing a higher contribution than the secondary jobs, the objective being to maximize the total discounted revenue over an infinite horizon. We model the system as a discrete-time Markov decision process and show that a monotone admission policy is optimal when the number of primary arrivals has a fixed distribution. Moreover, when the primary arrival distribution varies with time according to a finite-state Markov chain, we show that the optimal policy is conditionally monotone and that the numerical computation of an optimal policy, in this case, is substantially more difficult than in the case of stationary arrivals. This research was supported in part by the National Science Foundation, under grant ECS-8803061, while the author was at the University of Arizona.

20.
Recently, it has been recognized that revenue management of cruise ships is different from that of airlines or hotels. Among the main differences is the presence of multiple capacity constraints in cruise ships, i.e., the number of cabins in different categories and the number of lifeboat seats, versus a single constraint in airlines and hotels (i.e., number of seats or rooms). We develop a discrete-time dynamic capacity control model for a cruise ship characterized by multiple constraints on cabin and lifeboat capacities. Customers (families) arrive sequentially according to a stochastic process and request one cabin of a certain category and one or more lifeboat seats. The cruise ship revenue manager decides which requests to accept based on the remaining cabin and lifeboat capacities at the time of an arrival as well as the type of the arrival. We show that the opportunity cost of accepting a customer is not always monotone in the reservation levels or time. This non-monotone behavior implies that “conventional” booking limits or critical time periods capacity control policies are not optimal. We provide analysis and insights justifying the non-monotone behavior in our cruise ship context. In the absence of monotonicity, and with the optimal solution requiring heavy storage for “large” (industry-size) problems, we develop several heuristics and thoroughly test their performance, via simulation, against the optimal solution, well-crafted upper bounds, and a first-come first-served lower bound. Our heuristics are based on rolling up the multi-dimensional state space into one or two dimensions and solving the resulting dynamic program (DP). This is a strength of our approach since our DP-based heuristics are easy to understand, solve and analyze. We find that single-dimensional heuristics based on decoupling the cabins and lifeboat problems perform quite well in most cases.
