首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Gittins has shown that for a class of Markov decision processes called alternative bandit processes, optimal policies can easily be determined once the dynamic allocation indices (DAIs) for the constituent bandit processes are computed. Improved algorithms are presented for calculating DAIs both for general bandit processes and for the well-known special case of the multi-armed bandit problem.  相似文献   

2.
We use the statistical model of bandit processes to formulate and solve two kinds of optimal investment and consumption problems. The payoffs from the investment are dividend payments with fixed return rates, but the payment frequency is stochastic following a Poisson distribution. The financial market consists of assets which follow Poisson distributions with known or unknown intensity rates. Two kinds of consumption patterns are defined and the optimality of the myopic strategy, the Gittins index strategy, and the play‐the‐winner strategy are discussed. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

3.
We give a new and comparably short proof of Gittins’ index theorem for dynamic allocation problems of the multi-armed bandit type in continuous time under minimal assumptions. This proof gives a complete characterization of optimal allocation strategies as those policies which follow the current leader among the Gittins indices while ensuring that a Gittins index is at an all-time low whenever the associated project is not worked on exclusively. The main tool is a representation property of Gittins index processes which allows us to show that these processes can be chosen to be pathwise lower semi-continuous from the right and quasi-lower semi-continuous from the left. Both regularity properties turn out to be crucial for our characterization and the construction of optimal allocation policies.  相似文献   

4.
5.
A two‐armed bandit model using a Bayesian approach is formulated and investigated in this paper with the goal of maximizing the value of a certain criterion of optimality. The bandit model illustrates the trade‐off between exploration and exploitation, where exploration means acquiring scientific acknowledge for better‐informed decisions at later stages (ie, maximizing long‐term benefit), and exploitation means applying the current knowledge for the best possible outcome at the current stage (ie, maximizing the immediate expected payoff). When one arm has known characteristics, stochastic dynamic programming is applied to characterize the optimal strategy and provide the foundation for its calculation. The results show that the celebrated Gittins index can be approximated by a monotonic sequence of break‐even values. When both arms are unknown, we derive a special case of optimality of the myopic strategy.  相似文献   

6.
Traditional approaches to stochastic resource allocation problems (including the classical multi-armed bandit problems) have usually made use of dynamic programming (DP) methodology, perhaps buttressed by further ad hoc arguments. While such approaches seem ‘natural’ they have usually proved technically very difficult. Bertsimas and Niño-Mora have recently given a radically new account of many important results in this area which relate to Gittins indices. The key to their approach is in the characterisation of the region of achievable performance. The optimisation problems of interest are then solved as linear programs over this region. Here we exploit elements within the Bertsimas and Niño-Mora framework (in particular, its capacity to give formulae for the total return of a given policy in closed form) to obtain (i) a simple dynamic programming proof of the optimality of Gittins index policies and (ii) a range of index-based suboptimality bounds for general policies for a variety of stochastic models for resource allocation.  相似文献   

7.
We study four proofs that the Gittins index priority rule is optimal for alternative bandit processes. These include Gittins’ original exchange argument, Weber’s prevailing charge argument, Whittle’s Lagrangian dual approach, and Bertsimas and Niño-Mora’s proof based on the achievable region approach and generalized conservation laws. We extend the achievable region proof to infinite countable state spaces, by using infinite dimensional linear programming theory.  相似文献   

8.
We consider single-server fluid networks with feedback and arbitrary input processes. The server has to be scheduled in order to minimize a linear holding cost. This model is the fluid analogue of the so-called Klimov problem. Using the achievable-region approach, we show that the Gittins index rule is optimal in a strong sense: it minimizes the linear holding cost for arbitrary input processes and for all time points t0.  相似文献   

9.
Threshold autoregressive (AR) and autoregressive moving average (ARMA) processes with continuous time parameter have been discussed in several recent papers by Brockwellet al. (1991,Statist. Sinica,1, 401–410), Tong and Yeung (1991,Statist. Sinica,1, 411–430), Brockwell and Hyndman (1992,International Journal Forecasting,8, 157–173) and Brockwell (1994,J. Statist. Plann. Inference,39, 291–304). A threshold ARMA process with boundary width 2>0 is easy to define in terms of the unique strong solution of a stochastic differential equation whose coefficients are piecewise linear and Lipschitz. The positive boundary-width is a convenient mathematical device to smooth out the coefficient changes at the boundary and hence to ensure the existence and uniqueness of the strong solution of the stochastic differential equation from which the process is derived. In this paper we give a direct definition of a threshold ARMA processes with =0 in the important case when only the autoregressive coefficients change with the level of the process. (This of course includes all threshold AR processes with constant scale parameter.) The idea is to express the distributions of the process in terms of the weak solution of a certain stochastic differential equation. It is shown that the joint distributions of this solution with =0 are the weak limits as 0 of the distributions of the solution with >0. The sense in which the approximating sequence of processes used by Brockwell and Hyndman (1992,International Journal Forecasting,8, 157–173) converges to this weak solution is also investigated. Some numerical examples illustrate the value of the latter approximation in comparison with the more direct representation of the process obtained from the Cameron-Martin-Girsanov formula. It is used in particular to fit continuous-time threshold models to the sunspot and Canadian lynx series.Research partially supported by National Science Foundation Research Grants DMS 9105745 and 9243648.  相似文献   

10.
Multi-Armed bandit problem revisited   总被引:1,自引:0,他引:1  
In this paper, we revisit aspects of the multi-armed bandit problem in the earlier work (Ref. 1). An alternative proof of the optimality of the Gittins index rule is derived under the discounted reward criterion. The proof does not involve an explicit use of the interchange argument. The ideas of the proof are extended to derive the asymptotic optimality of the index rule under the average reward criterion. Problems involving superprocesses and arm-acquiring bandits are also reexamined. The properties of an optimal policy for an arm-acquiring bandit are discussed.This research was supported by NSF Grant IRI-91-20074.  相似文献   

11.
I develop a notion of nonlinear stochastic integrals for hyperfinite Lévy processes and use it to find exact formulas for expressions which are intuitively of the form and , where l is a Lévy process. These formulas are then applied to geometric Lévy processes, infinitesimal transformations of hyperfinite Lévy processes, and to minimal martingale measures. Some of the central concepts and results are closely related to those found in S. Cohen’s work on stochastic calculus for processes with jumps on manifolds, and the paper may be regarded as a reworking of his ideas in a different setting and with totally different techniques.  相似文献   

12.
13.
For an M/G/1 queue with the objective of minimizing the mean number of jobs in the system, the Gittins index rule is known to be optimal among the set of non-anticipating policies. We develop properties of the Gittins index. For a single-class queue it is known that when the service time distribution is of type Decreasing Hazard Rate (New Better than Used in Expectation), the Foreground–Background (First-Come-First-Served) discipline is optimal. By utilizing the Gittins index approach, we show that in fact, Foreground–Background and First-Come-First-Served are optimal if and only if the service time distribution is of type Decreasing Hazard Rate and New Better than Used in Expectation, respectively. For the multi-class case, where jobs of different classes have different service distributions, we obtain new results that characterize the optimal policy under various assumptions on the service time distributions. We also investigate distributions whose hazard rate and mean residual lifetime are not monotonic.  相似文献   

14.
Summary It often happens that a stochastic process may be approximated by a sum of a large number of independent components no one of which contributes a significant proportion of the whole. For example the depth of water in a lake with many small rivers flowing into it from distant sources, or the point process of calls entering a telephone exchange (considered as the sum of a number of point processes of calls made by individual subscribers) may approximately fulfil these hypotheses. In the present work we formulate and solve the problem of characterizing stochastic processes all of whose finite-dimensional distributions are infinitely divisible. Although some of our results could be derived from known theorems on probabilities on general algebraic structures, many could not and it seems preferable to take the vector-valued infinitely divisible laws as our starting point. We emphasize that an infinitely divisible process (in our sense) on the real line is not necessarily a decomposable process in the sense of Lévy (cf. § 4) though decomposable processes are particular cases.In § 1 a representation theorem for non-negative (and possibly infinite) stochastic processes is derived, while an analogous theorem in the real-valued case is to be found in § 2. § 3 is devoted to the statement of a central limit theorem and the investigation of some of the properties of infinitely divisible processes. The investigation is continued in § 4 by an examination of processes on the real line giving, for example, a representation theorem under weak conditions for infinitely divisible processes which are a.s. sample continuous. Finally in § 5 a study is made of infinitely divisible point processes and random measures.The author is indebted to Professor J. F. C. Kingman for advice and encouragement.  相似文献   

15.
Bandits are a finite collection of random variables. Bandit problems are Markov decision problems in which, at each decision time, the decision maker selects a random variable (referred to as a bandit arm) and observes an outcome. The selection is based on the observation history. The objective is to sequentially choose arms so as to minimize growth (with decision time) rate of the number of suboptimal selections.The appellation bandit refers to mechanical gambling machines, and the tradition stems from the question of allocating competing treatments to a sequence of patients having the same disease. Our motivation is machine learning in which a game-playing or assembly-line adjusting computer is faced with a sequence of statistically-similar decision problems and, as resource, has access to an expanding data base relevant to these problems.The setting for the present study is nonparametric and infinite horizon. The central aim is to relate a methodology which postulates finite moments or, alternatively, bounded bandit arms. Under these circumstances, strategies proposed are shown to be asymptotically optimal and converge at guaranteed rates. In the bounded-arm case, the rate is optimal.We extend the theory to the case in which the bandit population is infinite, and share some computational experience.  相似文献   

16.
Abstract

We study the problem of optimal control of a jump diffusion, that is, a process which is the solution of a stochastic differential equation driven by Lévy processes. It is required that the control process is adapted to a given subfiltration of the filtration generated by the underlying Lévy processes. We prove two maximum principles (one sufficient and one necessary) for this type of partial information control. The results are applied to a partial information mean-variance portfolio selection problem in finance.  相似文献   

17.
Abstract

We develop two parsimonious models for pricing multi-name credit derivatives. We derive closed form expression for the loss distribution, which then can be used in determining the prices of tranche and index swaps and more exotic derivatives on these contracts. Our starting point is the model of Ding et al., 2008, which takes the loss process as a time-changed birth process. We introduce stochastic parameter variations into the intensity of the loss process and use the multi-time scale approach of Fouque et al., 2003 Fouque, J.-P., Papanicolaou, G., Sircar, R. and Solna, K. 2003. Multiscale stochastic volatility asymptotics. SIAM Journal of Multiscale Modeling and Simulation, 2(1): 2242.  [Google Scholar] and obtain explicit perturbation approximations to the loss distribution. We demonstrate the competence of our approach by calibrating it to the CDX index data.  相似文献   

18.
On Gaussian Processes Equivalent in Law to Fractional Brownian Motion   总被引:1,自引:1,他引:0  
We consider Gaussian processes that are equivalent in law to the fractional Brownian motion and their canonical representations. We prove a Hitsuda type representation theorem for the fractional Brownian motion with Hurst index H1/2. For the case H>1/2 we show that such a representation cannot hold. We also consider briefly the connection between Hitsuda and Girsanov representations. Using the Hitsuda representation we consider a certain special kind of Gaussian stochastic equation with fractional Brownian motion as noise.  相似文献   

19.
We study a queueing system withm exponential servers with distinct service rates. Jobs arrive at the system following an arbitrary point process. Arrived jobs receive service at the first unoccupied server (if any) according to an entry order , which is a permutation of the integers 1, 2,...,m. The system has a finite buffer capacity. When the buffer limit is reached, arrivals will be blocked. Blocked jobs will either be lost or come back as New arrivals after a random travel time. We are concerned with the dynamic stochastic behavior of the system under different entry orders. A partial ordering is established among entry orders, and is shown to result in some quite strong orderings among the associated stochastic processes that reflect the congestion and the service characteristics of the system. The results developed here complement existing comparison results for queues with homogeneous servers, and can be applied to aid the design of conveyor and communication systems.  相似文献   

20.
We introduce and study fractional generalizations of the well-known Gamma process, in the following sense: the corresponding densities are proved to satisfy the same differential equation as the usual Gamma process, but with the shift operator replaced by its fractional version of order ν > 0. In the case ν > 1, the solution corresponds to the density of a Gamma process time-changed by an independent stable subordinator of index 1/ν. For ν less than one an analogous result holds, with the subordinator replaced by the inverse. In this case the fractional Gamma process is proved to be a non-stationary version of the standard one, with power law behavior of the expected value. Hence it can be considered a useful tool in modelling stochastic deterioration in the non-linear cases, a situation which often occurs in real data (see i.e., [42 Van Noortwijk, J.M., 2009. A survey of the application of Gamma processes in maintenance. Reliability Engineering System Safety 94: 221.[Crossref], [Web of Science ®] [Google Scholar]] and the references therein).

As a consequence of the previous results, the fractional generalizations of some Gamma subordinated processes (i.e. the Variance Gamma, the Geometric Stable and the Negative Binomial) are introduced and the corresponding fractional differential equations are obtained. These processes are particularly relevant for a wide range of financial and technological applications.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号