期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

吴晓孔荫莹郭圳滨《数学物理学报(A辑)》2022,(2):594-604

该文研究了Polish空间上、带折扣因子的连续时间马尔可夫决策过程(CTMDPs)的量子化平稳策略的渐近最优性问题.首先,建立了折扣最优方程(DOE)及其解的存在性和唯一性.其次,在适当的条件下证明了最优确定性平稳策略的存在性.此外,为了对行动空间进行离散化,构造了一列量子化策略,利用有限行动空间的策略来逼近一般(Polish)空间上的折扣CTMDPs最优平稳策略.最后,通过一个例子来说明该文的渐近逼近结果. 相似文献

2.

Markov调节中基于时滞和相依风险模型的最优再保险与投资

张彩斌梁志彬袁锦泉《中国科学:数学》2021,(5):773-796

本文研究保险公司在Markov调节下基于时滞及相依风险模型的最优再保险与最优投资问题,其中市场被划分为有限个状态,一些重要的参数随着市场状态的转换而变化.假设保险公司的盈余过程由复合Poisson过程描述,而风险资产的价格过程由几何跳扩散模型刻画,并且假设这两个跳过程是相依的.以最大化终端财富值的均值-方差效用为目标,在博弈论框架下,利用随机控制理论和相应的广义Hamilton-Jacobi-Bellman(HJB)方程,本文得到最优策略和值函数的显式表达,并证明解的存在性和唯一性.最后,通过一些数值实例,验证所得结论的正确性,并探讨一些重要参数对最优策略的影响. 相似文献

3.

具有马氏调制的风险模型的均值-方差组合选择问题

下载免费PDF全文

杨鹏《数学杂志》2015,35(6):1541-1550

本文研究了保险市场上的均值-方差组合选择问题.本文利用线性二次控制理论,得到了最优策略和有效的均值-方差边界的显示解. 相似文献

4.

连续时间马尔可夫决策过程的折扣模型

郭先平戴永隆《数学学报》2002,45(1):171-182

本文考虑的是转移速率族任意且费用率函数可能无界的连续时间马尔可夫决策过程的折扣模型．放弃了传统的要求相应于每个策略的　Ｑ　－过程唯一等条件,而首次考虑相应每个策略的　Ｑ　－过程不一定唯一,　转移速率族也不一定保守,　费用率函数可能无界,　且允许行动空间非空任意的情形．　本文首次用"α－折扣费用最优不等式"更新了传统的α－折扣费用最优方程,并用"最优不等式"和新的方法,不仅证明了传统的主要结果即最优平稳策略的存在性,　而且还进一步探讨了（　∈＞０　　）－最优平稳策略,具有单调性质的最优平稳策略,　以及（∈≥０）　－最优决策过程的存在性,　得到了一些有意义的新结果．　最后,　提供了一个迁移率受控的生灭系统例子,　它满足本文的所有条件,　而传统的假设（见文献［１－１４］）均不成立．相似文献

5.

半Markov决策过程的研究进展

《中国科学:数学》2015,(5)

本文综述半Markov决策过程(SMDP)理论的发展现状,主要介绍SMDP无限阶段期望折扣报酬准则、长期平均准则、有限阶段期望报酬准则、首达目标期望报酬准则、概率准则、受约束问题和均值-方差准则的研究工作,着重阐述这些优化准则的背景、意义、主要研究进展及有待研究的问题.最后,展望SMDP未来的一些潜在研究方向和相关问题. 相似文献

6.

基于基准过程的动态均值-方差最优投资组合选择

刘利敏肖庆宪《数学的实践与认识》2013,43(1)

利用动态规划方法研究了基于基准过程的动态均值-方差最优投资组合问题,证明了识别定理,得到了剩余过程的均方最优投资策略和有效前沿. 相似文献

7.

基于均值-方差准则的保险公司最优投资策略

谷爱玲曾艳珊《数学的实践与认识》2012,42(22):24-30

研究了保险公司在均值-方差准则下的最优投资问题,其中保险公司的盈余过程由带随机扰动的Cramer-Lundberg模型刻画,而且保险公司可将其盈余投资于无风险资产和一种风险资产.利用随机动态规划方法,通过求解相应的HJB方程,得到了均值方差模型的最优投资策略和有效前沿.最后,给出了数值算例说明扰动项对有效前沿的影响. 相似文献

8.

风险资产价格服从CEV模型的投资组合随机微分博弈

刘庆平陈丽航李静《数学理论与应用》2014,(3)

本文在风险资产价格服从CEV模型时,讨论两个投资者的时间一致均值-方差最优投资组合选择的随机微分博弈问题.运用动态规划原理,求得了最优投资策略及相应的值函数. 相似文献

9.

基于索赔相依的时滞均衡投资与再保险策略

阎方刘伟刘国欣《应用数学》2023,(2):550-561

本文研究保险公司的最优投资与再保险问题.假设再保险种类是比例再保险,未来索赔与历史索赔是相关的.此外,风险资产的价格过程由常方差弹性模型来描述,并且在财富过程中考虑了财富的时滞效应.在均值-方差优化准则下,本文给出了最优均衡投资和比例再保险策略及值函数的显式解.最后,通过数值分析,讨论了模型主要参数对最优策略的影响.本文所提模型及所获结果是对文献中已有研究成果的推广. 相似文献

10.

再保险-投资的M-V及M-VaR最优策略 总被引：1，自引：0，他引：1

王海燕彭大衡《经济数学》2011,(3):71-76

考虑保险公司再保险-投资问题在均值-方差(M-V)模型和均值-在险价值(M-VaR)模型下的最优常数再调整策略.在保险公司盈余过程服从扩散过程的假设及多风险资产的Black-Scholes市场条件下,分别得到均值-方差模型和均值-在险价值模型下保险公司再保险-投资问题的最优常数再调整策略及共有效前沿,并就两种模型下的结... 相似文献

11.

Variance minimization for continuous-time Markov decision processes: two approaches

Quan-xin Zhu 《高校应用数学学报(英文版)》2010,25(4):400-410

This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions. 相似文献

12.

Nonstationary denumerable state Markov decision processes – with average variance criterion

Xianping Guo 《Mathematical Methods of Operations Research》1999,49(1):87-96

In this paper, we consider the nonstationary Markov decision processes (MDP, for short) with average variance criterion on a countable state space, finite action spaces and bounded one-step rewards. From the optimality equations which are provided in this paper, we translate the average variance criterion into a new average expected cost criterion. Then we prove that there exists a Markov policy, which is optimal in an original average expected reward criterion, that minimizies the average variance in the class of optimal policies for the original average expected reward criterion. 相似文献

13.

Average optimality for continuous-time Markov decision processes with a policy iteration approach

Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704

This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data and under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ?-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our way is rather different from both the usually “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature. 相似文献

14.

Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains

Tomás Prieto-Rumeau Onésimo Hernández-Lerma 《Mathematical Methods of Operations Research》2009,70(3):527-540

This paper deals with denumerable-state continuous-time controlled Markov chains with possibly unbounded transition and reward rates. It concerns optimality criteria that improve the usual expected average reward criterion. First, we show the existence of average reward optimal policies with minimal average variance. Then we compare the variance minimization criterion with overtaking optimality. We present an example showing that they are opposite criteria, and therefore we cannot optimize them simultaneously. This leads to a multiobjective problem for which we identify the set of Pareto optimal policies (also known as nondominated policies). 相似文献

15.

Continuous-Time Markov Decision Processes with Unbounded Transition and Discounted-Reward Rates

Hao Yan Junyu Zhang 《随机分析与应用》2013,31(2):209-231

Abstract

In this article, we study continuous-time Markov decision processes in Polish spaces. The optimality criterion to be maximized is the expected discounted criterion. The transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. We provide conditions on the controlled system's primitive data under which we prove that the transition functions of possibly non-homogeneous continuous-time Markov processes are regular by using Feller's construction approach to such transition functions. Then, under continuity and compactness conditions we prove the existence of optimal stationary policies by using the technique of extended infinitesimal operators associated with the transition functions of possibly non-homogeneous continuous-time Markov processes, and also provide a recursive way to compute (or at least to approximate) the optimal reward values. The conditions provided in this paper are different from those used in the previous literature, and they are illustrated with an example. 相似文献

16.

Continuous-Time Markov Decision Processes with State-Dependent Discount Factors

Liuer Ye Xianping Guo 《Acta Appl Math》2012,121(1):5-27

We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are determined by the given transition rates which are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. By using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy which is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or?at least to approximate) the discounted reward optimal value function as well as a discounted optimal stationary policy. Finally, we use an example to illustrate our results. Specially, we first derive an explicit and exact solution to the DROE and an explicit expression of a discounted optimal stationary policy for such an example. 相似文献

17.

First passage Markov decision processes with constraints and varying discount factors

Xiao WU Xiaolong ZOU Xianping GUO 《Frontiers of Mathematics in China》2015,10(4):1005

This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results. 相似文献

18.

A mean–variance optimization problem for discounted Markov decision processes

Xianping Guo Liuer Ye George Yin 《European Journal of Operational Research》2012

In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Different from the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over a set of all policies with a given expected reward. For continuous-time MDPs with the discounted criterion and finite-state and action spaces, we prove that the mean–variance optimization problem can be transformed to an equivalent discounted optimization problem using the conditional expectation and Markov properties. Then, we show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods with a finite number of iterations. We also address related issues such as a mutual fund theorem and illustrate our results with an example. 相似文献

19.

New Average Optimality Conditions for Semi-Markov Decision Processes in Borel Spaces

Qingda Wei Xianping Guo 《Journal of Optimization Theory and Applications》2012,153(3):709-732

This paper deals with semi-Markov decision processes under the average expected criterion. The state and action spaces are Borel spaces, and the cost/reward function is allowed to be unbounded from above and from below. We give another set of conditions, under which the existence of an optimal (deterministic) stationary policy is proven by a new technique of two average optimality inequalities. Our conditions are slightly weaker than those in the existing literature, and some new sufficient conditions for the verifications of our assumptions are imposed on the primitive data of the model. Finally, we illustrate our results with three examples. 相似文献

20.

Average optimality inequality for continuous-time Markov decision processes in Polish spaces

Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313

In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected costs. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish one average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in all randomized history dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957). 相似文献