Similar literature
20 similar documents found (search time: 0 ms)
1.
A two-armed bandit model using a Bayesian approach is formulated and investigated in this paper with the goal of maximizing the value of a certain criterion of optimality. The bandit model illustrates the trade-off between exploration and exploitation, where exploration means acquiring scientific knowledge for better-informed decisions at later stages (i.e., maximizing long-term benefit), and exploitation means applying the current knowledge for the best possible outcome at the current stage (i.e., maximizing the immediate expected payoff). When one arm has known characteristics, stochastic dynamic programming is applied to characterize the optimal strategy and provide the foundation for its calculation. The results show that the celebrated Gittins index can be approximated by a monotonic sequence of break-even values. When both arms are unknown, we derive a special case of optimality of the myopic strategy.
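The one-known-arm case can be illustrated with a small backward-induction sketch. It is not the paper's algorithm: it assumes a Bernoulli unknown arm with a Beta(a, b) prior, a known arm paying a fixed `lam` per pull, a finite undiscounted horizon `n`, and bisection to locate the `lam` at which both arms are equally attractive at the first stage. The names `stage_values` and `break_even` are hypothetical.

```python
from functools import lru_cache


def stage_values(a, b, n, lam):
    """Return (value if the unknown arm is pulled first, value if the known arm is pulled first).

    Assumptions (not from the abstract): Bernoulli unknown arm with a Beta(a, b)
    prior, known arm paying `lam` per pull, n >= 1 remaining stages, no discounting.
    """

    @lru_cache(maxsize=None)
    def optimal_value(a, b, n):
        # Bellman recursion: best achievable expected payoff with n stages left.
        if n == 0:
            return 0.0
        return max(q_values(a, b, n))

    def q_values(a, b, n):
        p = a / (a + b)  # posterior mean success probability of the unknown arm
        unknown = p * (1.0 + optimal_value(a + 1, b, n - 1)) \
            + (1.0 - p) * optimal_value(a, b + 1, n - 1)
        known = lam + optimal_value(a, b, n - 1)  # known arm gives no new information
        return unknown, known

    return q_values(a, b, n)


def break_even(a, b, n, tol=1e-9):
    """Bisection for the known-arm payoff at which both first-stage choices tie."""
    lo, hi = 0.0, 1.0  # unknown arm is preferred at lam = 0, known arm at lam = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        unknown, known = stage_values(a, b, n, mid)
        if unknown > known:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a single stage the break-even value reduces to the prior mean a/(a+b), which is the myopic rule. Under these assumptions `break_even(1, 1, 2)` evaluates to 5/9, slightly above the prior mean of 1/2, reflecting the value of information from the extra stage.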

2.
We use the statistical model of bandit processes to formulate and solve two kinds of optimal investment and consumption problems. The payoffs from the investment are dividend payments with fixed return rates, but the payment frequency is stochastic, following a Poisson distribution. The financial market consists of assets which follow Poisson distributions with known or unknown intensity rates. Two kinds of consumption patterns are defined, and the optimality of the myopic strategy, the Gittins index strategy, and the play-the-winner strategy is discussed. Copyright © 2009 John Wiley & Sons, Ltd.

3.
In this paper, optimal investment and consumption decisions are considered for an optimal choice problem over an infinite horizon, for an investor who has available a bank account and a stock whose price is a log-normal diffusion. The bank pays interest at rate $r$ on any deposit and charges a larger rate $r'$ on any loan. As in the paper of Xu Wensheng and Chen Shuping in JAMS(B), where an analogous problem over a finite horizon is studied, optimal strategies are obtained via the Hamilton-Jacobi-Bellman (HJB) equation, which is derived from the dynamic programming principle. For the specific HARA case, i.e. $U(t,c)=e^{-\beta t}\,c^{1-R}/(1-R)$, this paper obtains the optimal consumption and optimal investment in the form $c^*_t=\frac{\beta-\bar g}{R}\,w_t$ and $\pi^*_t=\frac{b-\hat r}{R\sigma^2}\,w_t$, with $\hat r=\max\{r,\min\{r',\,b-R\sigma^2\}\}$ and $\bar g=(1-R)\left[\hat r+\frac{(b-\hat r)^2}{2R\sigma^2}\right]$. This result coincides with the classical one under the condition $r'\equiv r$.
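The closed-form HARA policy quoted in this abstract can be evaluated numerically. The sketch below assumes the formulas read as follows: an effective rate equal to max{r, min{r', b - Rσ²}}, a consumption rate (β - ḡ)/R per unit of wealth, and a stock position (b - effective rate)/(Rσ²) per unit of wealth, with `b` taken to be the stock's drift. The function name `hara_policy` and its parameter names are hypothetical.

```python
def hara_policy(beta, R, b, sigma, r, r_loan):
    """Evaluate the HARA-case policy per unit of wealth.

    Assumed reading of the abstract's formulas (labelled as an assumption):
      r_eff = max(r, min(r_loan, b - R * sigma**2))
      g_bar = (1 - R) * (r_eff + (b - r_eff)**2 / (2 * R * sigma**2))
      consumption rate = (beta - g_bar) / R
      stock fraction   = (b - r_eff) / (R * sigma**2)
    """
    if r_loan < r:
        raise ValueError("the loan rate is assumed to be at least the deposit rate")
    if R <= 0 or R == 1 or sigma <= 0:
        raise ValueError("requires R > 0, R != 1 and sigma > 0")

    risk = R * sigma ** 2
    # Effective interest rate: deposit rate if the investor does not borrow,
    # loan rate if the investor borrows, and b - R*sigma^2 in between
    # (all wealth held in the stock).
    r_eff = max(r, min(r_loan, b - risk))
    g_bar = (1 - R) * (r_eff + (b - r_eff) ** 2 / (2 * risk))
    consumption_rate = (beta - g_bar) / R
    stock_fraction = (b - r_eff) / risk
    return r_eff, consumption_rate, stock_fraction
```

When `r_loan == r` the effective rate is always `r`, which matches the abstract's remark that the result coincides with the classical one in that case. When `b - R * sigma**2` lies between the two rates, the stock fraction equals 1, i.e. the investor neither lends nor borrows.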

4.
We deal with zero-sum two-player stochastic games with perfect information. We propose two algorithms to find the uniform optimal strategies and one method to compute the optimality range of discount factors. We prove the convergence in finite time for one algorithm. The uniform optimal strategies are also optimal for the long run average criterion and, in transient games, for the undiscounted criterion as well.
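For orientation, the discounted value of a perfect-information zero-sum stochastic game can be approximated by plain value iteration. This is a standard textbook technique swapped in for illustration, not either of the algorithms the abstract proposes, and it handles one fixed discount factor rather than uniform optimality. The game encoding (each state owned by `"max"` or `"min"`, each action a pair of reward and next-state distribution) and the name `value_iteration` are assumptions.

```python
def value_iteration(game, discount, tol=1e-10, max_iter=100_000):
    """Approximate discounted values of a perfect-information zero-sum game.

    `game` maps state -> (owner, actions); owner is "max" or "min" and each
    action is (reward, {next_state: probability}). Rewards are paid to the
    maximizer. This encoding is hypothetical, chosen for the sketch.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must lie in [0, 1)")
    values = {s: 0.0 for s in game}
    for _ in range(max_iter):
        new_values = {}
        for state, (owner, actions) in game.items():
            q = [
                reward + discount * sum(p * values[t] for t, p in nxt.items())
                for reward, nxt in actions
            ]
            # Only the owner of the state moves, so the one-step operator is a
            # plain max or min instead of a matrix-game value.
            new_values[state] = max(q) if owner == "max" else min(q)
        delta = max(abs(new_values[s] - values[s]) for s in game)
        values = new_values
        if delta < tol:
            break
    return values


# Tiny two-state example (made up for illustration).
example_game = {
    "A": ("max", [(1.0, {"A": 1.0}), (0.0, {"B": 1.0})]),
    "B": ("min", [(2.0, {"B": 1.0}), (0.0, {"A": 1.0})]),
}
example_values = value_iteration(example_game, discount=0.5)
```

In the example the maximizer stays in `A` (value 2) and the minimizer moves from `B` to `A` (value 1), which can be checked by substituting into the two Bellman equations.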

5.
We consider a portfolio optimization problem under stochastic volatility as well as stochastic interest rate on an infinite time horizon. It is assumed that risky asset prices follow geometric Brownian motion and both volatility and interest rate vary according to ergodic Markov diffusion processes and are correlated with risky asset price. We use an asymptotic method to obtain an optimal consumption and investment policy and find some characteristics of the policy depending upon the correlation between the underlying risky asset price and the stochastic interest rate.

6.
In this paper, we study the upper bounds for ruin probabilities of an insurance company which invests its wealth in a stock and a bond. We assume that the interest rate of the bond is stochastic and it is described by a Cox-Ingersoll-Ross (CIR) model. For the stock price process, we consider both the case of constant volatility (driven by an O-U process) and the case of stochastic volatility (driven by a CIR model). In each case, under certain conditions, we obtain the minimal upper bound for ruin probability as well as the corresponding optimal investment strategy by a pure probabilistic method.

7.
In this paper, we consider the consumption and investment problem with random horizon in a Batch Markov Arrival Process (BMAP) model. The investor invests her wealth in a financial market consisting of a risk-free asset and a risky asset. The price processes of the riskless asset and the risky asset are modulated by a continuous-time Markov chain, which is the phase process of a BMAP. Consumption and investment are restricted to a sequence of random discrete time points which are determined by the same BMAP. The investor has only consumption opportunities at some of these random time points, has both consumption and investment opportunities at some other random time points, and can do nothing at the remaining random time points. The objective of the investor is to select the consumption–investment strategy that maximizes the expected total discounted utility. The purpose of this paper is to analyze the impact of the consumption–investment opportunity and the economic state on the value functions and consumption–investment strategies. The general solution is obtained, together with the exact solution under the assumption that the consumption and the terminal wealth are evaluated by the power utility. Finally, a numerical example is presented.

8.
9.
1. Introduction. The weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1-6] and so on. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

10.
In this paper, we consider a stochastic control problem on a finite time horizon. The unit price of capital obeys a logarithmic Brownian motion, and the income from production is also subject to random Brownian fluctuations. The goal is to choose optimal investment and consumption policies to maximize the finite-horizon expected discounted hyperbolic absolute risk aversion utility of consumption. A dynamic programming principle is used to derive a time-dependent Hamilton–Jacobi–Bellman equation. The Leray–Schauder fixed point theorem is used to obtain existence of a solution of the HJB equation. Finally, we derive the optimal investment and consumption policies by the verification theorem. The main contribution of this paper is the use of PDE techniques on the finite-time problem to obtain optimal policies. Copyright © 2014 John Wiley & Sons, Ltd.

11.
A stochastic population model with a mixed harvesting strategy is formulated and studied in this paper. Sufficient and necessary conditions for survival of the species are derived first. Then, based on the ergodic stationary distribution, the optimal strategy is identified. Results show that the linear harvesting effort threatens the survival of the species; the quadratic harvesting strategy holds an absolute advantage in harvesting and excludes the linear part from the optimal harvesting strategy. It is interesting to see that all of these occur only in random environments. Computer simulations are carried out to support the obtained results.

12.
13.
We consider a general adversarial stochastic optimization model. Our model involves the design of a system that an adversary may subsequently attempt to destroy or degrade. We introduce SPAR, which utilizes mixed-integer programming for the design decision and a Markov decision process (MDP) for the modeling of our adversarial phase.

14.
In this paper we investigate an optimal job, consumption, and investment policy of an economic agent in a continuous and infinite time horizon. The agent’s preference is characterized by the Cobb–Douglas utility function whose arguments are consumption and leisure. We use the martingale method to obtain the closed-form solution for the optimal job, consumption, and portfolio policy. We compare the optimal consumption and investment policy with that in the absence of job choice opportunities.

15.
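王熙逵 (Wang Xikui), 《经济数学》, 2001, 18(4): 39-48
This paper has two purposes. The first is to give a systematic introduction to the main concepts and results of the field of bandit processes. The second is to survey recent developments in the models, computation, and applications of bandit processes. The paper characterizes the relationship between bandit processes and Markov decision programming. By considering theoretical or methodological limitations, practical or computational difficulties, and restrictions in applications, we discuss some important controversies and open problems.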

16.
This paper studies the synthesis of controllers for discrete-time, continuous state stochastic systems subject to omega-regular specifications using finite-state abstractions. Omega-regular properties allow specifying complex behaviors and encompass, for example, linear temporal logic. First, we present a synthesis algorithm for minimizing or maximizing the probability that a discrete-time switched stochastic system with a finite number of modes satisfies an omega-regular property. Our approach relies on a finite-state abstraction of the underlying dynamics in the form of a Bounded-parameter Markov Decision Process arising from a finite partition of the system’s domain. Such Markovian abstractions allow for a range of probabilities of transition between states for each selected action representing a mode of the original system. Our method is built upon an analysis of the Cartesian product between the abstraction and a Deterministic Rabin Automaton encoding the specification of interest or its complement. Specifically, we show that synthesis can be decomposed into a qualitative problem, where the so-called greatest permanent winning components of the product automaton are created, and a quantitative problem, which requires maximizing the probability of reaching this component in the worst-case instantiation of the transition intervals. Additionally, we propose a quantitative metric for measuring the quality of the designed controller with respect to the continuous abstracted states and devise a specification-guided domain partition refinement heuristic with the objective of reaching a user-defined optimality target. Next, we present a method for computing control policies for stochastic systems with a continuous set of available inputs. In this case, the system is assumed to be affine in input and disturbance, and we derive a technique for solving the qualitative and quantitative problems in the resulting finite-state abstractions of such systems. 
For this, we introduce a new type of abstraction called Controlled Interval-valued Markov Chains. Specifically, we show that the greatest permanent winning components of such abstractions are found by appropriately partitioning the continuous input space in order to generate a bounded-parameter Markov decision process that accounts for all possible qualitative transitions between the finite set of states. Then, the problem of maximizing the probability of reaching these components is cast as a (possibly non-convex) optimization problem over the continuous set of available inputs. A metric of quality for the synthesized controller and a partition refinement scheme are described for this framework as well. Finally, we present a detailed case study.

17.
In this paper, we study an inverse optimal problem in discrete-time stochastic control. We give necessary and sufficient conditions for a solution to a system of stochastic difference equations to be the solution of a certain optimal control problem. Our results extend to the stochastic case the work of Dechert. In particular, we present a stochastic version of an important principle in welfare economics.

18.
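This paper considers a renewal risk model in which the claim-size process and the claim-arrival-time process are dependent. The insurance company is assumed to invest its surplus in a financial market, where the price process of the investment follows a geometric Lévy process. When the claim-size distribution belongs to the class L∩D, the paper obtains a uniform asymptotic estimate for the tail probability of the present value of aggregate claims over a finite time horizon, as well as a uniform asymptotic estimate for the finite-time ruin probability.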

19.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.
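As a reminder of what the criterion measures, the sketch below computes the average value-at-risk of a finite sample of costs through the minimization representation AVaR_α(X) = min_s { s + E[(X - s)^+] / (1 - α) }. It assumes equally likely samples, costs for which larger is worse, and a level α in [0, 1); the paper's own convention for the level is not stated in the abstract and may differ. The name `avar` is hypothetical.

```python
def avar(costs, alpha):
    """Average value-at-risk of equally likely cost samples at level alpha.

    Uses AVaR_alpha(X) = min_s { s + E[(X - s)^+] / (1 - alpha) }. The objective
    is convex and piecewise linear in s with kinks only at the sample values,
    so it is enough to evaluate it at those values.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    if not costs:
        raise ValueError("costs must be non-empty")
    n = len(costs)

    def objective(s):
        excess = sum(max(x - s, 0.0) for x in costs) / n  # E[(X - s)^+]
        return s + excess / (1 - alpha)

    return min(objective(s) for s in set(costs))
```

For the sample `[1, 2, 3, 4]`, level 0 recovers the mean 2.5, level 0.5 gives the average of the two largest costs, 3.5, and level 0.75 gives the largest cost, 4.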

20.
This paper solves a general continuous-time consumption and portfolio decision problem for a single agent for whom there exists, upon bankruptcy, a possibility of recovery from his bankruptcy. The main contribution of the paper is in the modeling of the recovery process. Moreover, it is shown that the model with recovery has a one-to-one correspondence with the model with terminal bankruptcy treated in the literature. This research was supported by Grants SSHRC-410-83-9888 and NSERC-A4619 to the first author and by Grants NSF-DMS-86-01510 and AFOSR-88-0183 to the second author. Comments from E. Presman are gratefully acknowledged.
