A total of 20 similar documents were found.
1.
In a dynamic multi-stage setting, an investor faces not only an investment environment but also a consumption environment, and the decisions on investment and consumption are hierarchical. Because consumption concerns basic survival needs and must be considered first, and because the ultimate purpose of investment is consumption, maximizing consumption should be the higher-level objective while maximizing investment should be the subordinate one. Accordingly, a bilevel dynamic programming model for optimizing consumption and investment decisions is constructed to better reflect real-world situations, and the model's dynamic decision process and the properties of its optimal solution are discussed.
2.
Hyeong Soo Chang 《Operations Research Letters》2007,35(4):434-438
This brief paper presents a policy improvement method for constrained Markov decision processes (MDPs) with average cost criterion under an ergodicity assumption, extending Howard's policy improvement for MDPs. The improvement method induces a policy iteration-type algorithm that converges to a local optimal policy.
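For reference, here is a minimal sketch of Howard's policy iteration on an ordinary finite discounted MDP, not the constrained average-cost variant treated in the paper; the transition and reward arrays are placeholders supplied by the caller.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Howard's policy iteration for a finite discounted MDP.

    P: transitions, shape (A, S, S); R: rewards, shape (S, A).
    Returns a deterministic optimal policy and its value function.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(S), :]          # (S, S)
        r_pi = R[np.arange(S), policy]             # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead.
        q = R.T + gamma * P @ v                    # (A, S)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```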
3.
A class of multi-stage dynamic programming problems is studied and a method for finding the optimal solution is given. Applying this multi-stage dynamic programming to Turbo decoding not only reduces the computational load but also avoids the exponential operations required by conventional Turbo decoding algorithms and the data overflow that tends to occur as the number of iterations grows. It is therefore a highly effective method and broadens the application field of systems engineering theory.
4.
Research on an optimization method for medical examination appointment scheduling based on MDP and dynamic programming
Medical examinations play an important role in physicians' diagnoses. For the appointment scheduling of examination resources, the case of two machines and three patient classes, each class requiring a different examination time, is considered. With the goal of maximizing the hospital's revenue from the examination equipment, a finite-horizon Markov decision process (MDP) model is built and, combined with dynamic programming theory, the system-optimal appointment scheduling policy is derived. The hospital's examination appointments are simulated in MATLAB and, together with survey data, a case study verifies the superiority of this appointment policy over the traditional one. Finally, a sensitivity analysis of the maximum available machine time and of the arrival rate of inpatients' appointment requests is carried out to study the applicability of the policy.
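The abstract does not spell out the state space or data, so the following is only a generic finite-horizon backward-induction loop of the kind such an MDP model would be solved with; the states, actions, reward, and transition functions are hypothetical placeholders, not the paper's model.

```python
import numpy as np

# Generic finite-horizon backward induction. The state encoding, actions,
# rewards, and transition probabilities are placeholders; in the paper they
# would encode slot occupancy on two machines and requests from three
# patient classes.
T = 5                      # planning horizon (stages)
states = range(10)         # illustrative discrete state space
actions = range(3)         # illustrative actions (e.g. which request to accept)

def reward(t, s, a):       # hypothetical stage reward
    return float(a) - 0.1 * s

def transition(t, s, a):   # hypothetical next-state distribution: [(prob, s'), ...]
    s_next = min(s + a, 9)
    return [(0.7, s_next), (0.3, s)]

V = {(T, s): 0.0 for s in states}           # terminal values
policy = {}
for t in reversed(range(T)):                # backward induction over stages
    for s in states:
        q = {a: reward(t, s, a) + sum(p * V[(t + 1, sp)]
                                      for p, sp in transition(t, s, a))
             for a in actions}
        policy[(t, s)] = max(q, key=q.get)  # optimal action at (t, s)
        V[(t, s)] = q[policy[(t, s)]]
```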
5.
This paper investigates a dynamic event-triggered optimal control problem of discrete-time (DT) nonlinear Markov jump systems (MJSs) via exploring policy iteration (PI) adaptive dynamic programming (ADP) algorithms. The performance index function (PIF) defined in each subsystem is updated by utilizing an online PI algorithm, and the corresponding control policy is derived via solving the optimal PIF. Then, we adopt neural network (NN) techniques, including an actor network and a critic network, to estimate the iterative PIF and control policy. Moreover, the designed dynamic event-triggered mechanism (DETM) is employed to avoid wasting additional resources when the estimated iterative control policy is updated. Finally, based on the Lyapunov difference method, it is proved that the system stability and the convergence of all signals can be guaranteed under the developed control scheme. A simulation example for DT nonlinear MJSs with two system modes is presented to demonstrate the feasibility of the control design scheme.
6.
卢方元 《数学的实践与认识》2006,36(4):121-125
A dynamic input-output goal programming model is established with the discrete dynamic input-output model as the main body of the constraints and the various targets desired by the decision-making department as the additional constraints. Solving this goal programming model yields a solution of the discrete dynamic input-output model; compared with other solution methods, this approach has greater practical value.
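As a rough illustration of the goal-programming construction described above, with hard balance constraints plus soft targets expressed through deviation variables, here is a toy static sketch; the coefficient matrix, targets, and capacity bound are invented and do not represent a real dynamic input-output table.

```python
import numpy as np
from scipy.optimize import linprog

# Goal-programming sketch: (I - A) x + d- - d+ = target final demands,
# total gross output capped, minimize the total deviation d- + d+.
A_io = np.array([[0.2, 0.1], [0.3, 0.4]])     # toy direct-consumption coefficients
leontief = np.eye(2) - A_io                   # (I - A)
goals = np.array([100.0, 80.0])               # target final demands (soft goals)

# Decision vector: [x_1, x_2, dminus_1, dminus_2, dplus_1, dplus_2]
c = np.concatenate([np.zeros(2), np.ones(4)])            # minimize total deviation
A_eq = np.hstack([leontief, np.eye(2), -np.eye(2)])      # (I - A) x + d- - d+ = goals
b_eq = goals
A_ub = np.array([[1.0, 1.0, 0, 0, 0, 0]])                # illustrative capacity: x_1 + x_2 <= 150
b_ub = [150.0]
bounds = [(0, None)] * 6

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_opt, deviations = res.x[:2], res.x[2:]      # assumes res.success
```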
7.
8.
9.
Construction and application of a dynamic fuzzy programming model
Conventional programming models usually suffer from two defects. First, their objective coefficients and constraints are fixed values under hard restrictions, which makes the modeling rigid and inflexible. Second, their objective coefficients are independent of time, so coefficients that change from moment to moment cannot be captured effectively. A dynamic fuzzy programming model can remedy both defects. First, a fuzzy dynamic AHP is applied to determine the objective coefficients; then, based on the strong ordering criterion for L-R fuzzy numbers, the dynamic fuzzy programming model is decomposed into a best-case and a worst-case fuzzy programming model; next, using a solution method based on α-level cuts, the two models are transformed accordingly to build a dynamic fuzzy programming model with risk-analysis capability. Finally, the model is applied to a practical example with good results.
10.
The conflict between bed supply and demand in large public hospitals is increasingly acute, and a hospital, as a service system, needs to account for the strategic behavior of patients caused by slow responses to bed requests. Given the randomness of patient arrival times and the uncertainty of lengths of stay, this paper proposes a dynamic admission decision problem that considers patient balking and develops an admission decision method suited to chronic patients who can wait, aiming to improve patient satisfaction, balance the numbers of admitted patients across classes, and reduce the balking frequency caused by slow departmental responses. The problem is first described mathematically and its notation defined; the factors influencing balking are then analyzed and a balking probability function is constructed; on that basis, a Markov decision process (MDP) model of dynamic admission with balking is built and a value iteration algorithm tailored to the model's structure is designed; finally, numerical examples verify the feasibility and effectiveness of the proposed method.
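A minimal sketch of the value-iteration idea for an admission MDP with balking is given below; the capacity, rates, reward, penalty, and balking-probability curve are illustrative assumptions, not the calibrated model of the paper.

```python
import numpy as np

# State = number of occupied beds; action = admit (1) or defer (0) an
# arriving request. All parameters below are illustrative assumptions.
C, mu, gamma = 20, 0.4, 0.95             # beds, per-period discharge prob, discount
admit_reward, balk_penalty = 1.0, 2.0

def balk_prob(s):                        # assumed: balking grows with occupancy
    return 1.0 - np.exp(-0.2 * s)

def step(s, a):
    """Expected reward and next-state distribution [(prob, s'), ...]."""
    if a == 1 and s < C:                 # admit the arrival
        r, s_after = admit_reward, s + 1
    else:                                # defer: the patient may balk
        r, s_after = -balk_penalty * balk_prob(s), s
    dis = mu if s_after > 0 else 0.0     # at most one discharge per period
    dist = [(1.0 - dis, s_after)]
    if dis > 0:
        dist.append((dis, s_after - 1))
    return r, dist

V = np.zeros(C + 1)
for _ in range(500):                     # value iteration until (roughly) converged
    V_new = np.empty_like(V)
    for s in range(C + 1):
        q = []
        for a in (0, 1):
            r, dist = step(s, a)
            q.append(r + gamma * sum(p * V[sp] for p, sp in dist))
        V_new[s] = max(q)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
```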
11.
范臻 《应用数学与计算数学学报》2006,20(1):56-62
This paper gives a robust model for the credit portfolio optimization problem. The model involves the conditional value-at-risk (CVaR) risk measure and a stochastic linear programming framework with recourse constraints; the idea is to trade off CVaR against the restructuring cost of the credit portfolio and to reduce the sensitivity of the solution to realizations of the random parameters. To solve the corresponding nonlinear program, the basic model is transformed into a sequence of linear programming problems.
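The transformation of CVaR minimization into linear programs is commonly done via the Rockafellar-Uryasev scenario formulation; the sketch below shows that reformulation on synthetic loss scenarios and omits the recourse/restructuring terms of the paper's full model.

```python
import numpy as np
from scipy.optimize import linprog

# Rockafellar-Uryasev CVaR minimization over sampled scenarios.
# Loss data are synthetic placeholders.
rng = np.random.default_rng(0)
n_assets, n_scen, beta = 5, 200, 0.95
L = rng.normal(0.0, 0.02, size=(n_scen, n_assets))    # scenario losses per asset

# Decision vector: [w_1..w_n, alpha, u_1..u_S]
# minimize  alpha + (1 / ((1 - beta) S)) * sum_k u_k
# s.t.      u_k >= w^T L_k - alpha,  u_k >= 0,  sum w = 1,  w >= 0
c = np.concatenate([np.zeros(n_assets), [1.0],
                    np.full(n_scen, 1.0 / ((1.0 - beta) * n_scen))])
A_ub = np.hstack([L, -np.ones((n_scen, 1)), -np.eye(n_scen)])   # w^T L_k - alpha - u_k <= 0
b_ub = np.zeros(n_scen)
A_eq = np.concatenate([np.ones(n_assets), [0.0], np.zeros(n_scen)]).reshape(1, -1)
b_eq = [1.0]
bounds = [(0, None)] * n_assets + [(None, None)] + [(0, None)] * n_scen

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
weights, cvar = res.x[:n_assets], res.fun               # assumes res.success
```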
12.
This paper studies an assemble-to-order (ATO) system with n component types, a single product, and finite inventory. A Markov decision process (MDP) model is built and an optimization algorithm constructed to study the optimal control policy for component production and inventory. The optimal policy can be expressed as state-dependent inventory thresholds, and the control policy for any component is affected by the inventory states of the other components. Dynamic programming from optimal control theory and numerical methods are used to study the existence of the optimal policy and the numerical computation of the optimal value, to build an ATO decision model closer to actual production, to carry out the corresponding theoretical and experimental validation, and to examine how system parameters affect the optimal policy.
13.
This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iterations, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.
14.
Using Markov decision programming, the sales and profit situation of an enterprise's products is analyzed, and a pre-decision model for implementing the enterprise's production and operation projects is established, providing valuable theory and methods for reducing the risk of project implementation and for driving the long-run benefits of the decisions toward the optimum.
15.
16.
In this paper, we study a few challenging theoretical and numerical issues on the well known trust region policy optimization for deep reinforcement learning. The goal is to find a policy that maximizes the total expected reward when the agent acts according to the policy. The trust region subproblem is constructed with a surrogate function coherent to the total expected reward and a general distance constraint around the latest policy. We solve the subproblem using a preconditioned stochastic gradient method with a line search scheme to ensure that each step promotes the model function and stays in the trust region. To overcome the bias that sampling introduces into the function estimates in the random setting, we add the empirical standard deviation of the total expected reward to the predicted increase in a ratio in order to update the trust region radius and decide whether the trial point is accepted. Moreover, for a Gaussian policy which is commonly used for continuous action space, the maximization with respect to the mean and covariance is performed separately to control the entropy loss. Our theoretical analysis shows that the deterministic version of the proposed algorithm tends to generate a monotonic improvement of the total expected reward and the global convergence is guaranteed under moderate assumptions. Comparisons with the state-of-the-art methods demonstrate the effectiveness and robustness of our method on robotic control and game playing tasks from OpenAI Gym.
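As a toy numerical illustration of two ingredients mentioned above, the importance-sampled surrogate objective and a sample-based KL trust-region test for a Gaussian policy, consider the following sketch; the data, the linear-Gaussian policy form, and the acceptance rule are assumptions for illustration only, not the paper's algorithm.

```python
import numpy as np

# Synthetic rollout data: states, actions drawn from the old policy, and
# estimated advantages. Everything here is made up for illustration.
rng = np.random.default_rng(1)
states = rng.normal(size=(256, 3))
actions = rng.normal(size=(256, 2))
advantages = rng.normal(size=256)

def gaussian_logpdf(a, mean, log_std):
    var = np.exp(2 * log_std)
    return -0.5 * np.sum((a - mean) ** 2 / var + 2 * log_std + np.log(2 * np.pi), axis=-1)

def policy(theta, s):                                 # assumed linear-Gaussian policy
    W, log_std = theta
    return s @ W, log_std

theta_old = (rng.normal(size=(3, 2)) * 0.1, np.zeros(2))
theta_try = (theta_old[0] + 0.01, theta_old[1] - 0.05)   # hypothetical trial point

mean_old, ls_old = policy(theta_old, states)
mean_try, ls_try = policy(theta_try, states)
logp_old = gaussian_logpdf(actions, mean_old, ls_old)
logp_try = gaussian_logpdf(actions, mean_try, ls_try)

ratio = np.exp(logp_try - logp_old)                   # importance weights
surrogate = np.mean(ratio * advantages)               # surrogate of the predicted increase
kl_est = np.mean(logp_old - logp_try)                 # crude sample estimate of KL(old || try)

radius = 0.01                                         # illustrative trust-region radius
std_err = advantages.std() / np.sqrt(len(advantages)) # rough stand-in for the std adjustment
accepted = (kl_est <= radius) and (surrogate - std_err > 0)
```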
17.
This paper gives a rigorous proof of the principle of optimality, the basic theory underlying dynamic programming for multi-stage decision problems, and at the same time uses examples to illustrate the concrete application of backward recursion, the basic method of dynamic programming.
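A small stagewise shortest-path example makes the backward-recursion method concrete; the stage graph and edge costs below are made up for illustration.

```python
# Backward recursion on a stagewise shortest-path problem: by the principle
# of optimality, the tail problems are solved first and reused at earlier stages.
stages = [
    {'A': {'B1': 2, 'B2': 4}},                            # stage 1
    {'B1': {'C1': 7, 'C2': 3}, 'B2': {'C1': 2, 'C2': 5}}, # stage 2
    {'C1': {'D': 1}, 'C2': {'D': 6}},                     # stage 3 into the terminal node
]

value, best = {'D': 0}, {}
for stage in reversed(stages):
    for node, edges in stage.items():
        nxt = min(edges, key=lambda m: edges[m] + value[m])
        best[node] = nxt
        value[node] = edges[nxt] + value[nxt]

# value['A'] == 7, attained along the path A -> B2 -> C1 -> D (4 + 2 + 1).
```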
18.
Rüdiger Schultz 《Annals of Operations Research》2000,100(1-4):55-84
Some developments in structure and stability of stochastic programs during the last decade together with interrelations to optimization theory and stochastics are reviewed. With weak convergence of probability measures as a backbone we discuss qualitative and quantitative stability of recourse models that possibly involve integer variables. We sketch stability in chance constrained stochastic programming and provide some applications in statistical estimation. Finally, an outlook is devoted to issues that were not discussed in detail and to some open problems.
19.
《Operations Research Letters》2014,42(6-7):429-431
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.
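For context, a minimal sketch of modified policy iteration on a finite discounted MDP is shown below, with m partial evaluation sweeps per improvement step; the MDP arrays are placeholders, and nothing here reproduces the paper's worst-case example.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, iters=200, tol=1e-8):
    """Modified policy iteration: greedy improvement plus m partial
    evaluation sweeps per iteration (m = 0 gives value iteration,
    large m approaches exact policy iteration).
    P: (A, S, S) transitions, R: (S, A) rewards."""
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R.T + gamma * P @ v          # (A, S) one-step lookahead
        policy = q.argmax(axis=0)
        v_new = q[policy, np.arange(S)]  # greedy backup
        for _ in range(m):               # m extra sweeps under the fixed greedy policy
            P_pi = P[policy, np.arange(S), :]
            r_pi = R[np.arange(S), policy]
            v_new = r_pi + gamma * P_pi @ v_new
        if np.max(np.abs(v_new - v)) < tol:
            return policy, v_new
        v = v_new
    return policy, v
```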
20.
JINDE WANG 《运筹学学报》1998,(1)
1. Introduction. Distribution problems are of great importance in stochastic optimization and statistics. Usually this kind of problems can be described in the following form: [display formula (1) not captured in the extract] where f(x, w) is a function defined on R^n × Ω and S is a subset in R^n. Because of complexity of the problems, in general, one can get only their approximate solutions. The following type of approximation is often used: [display formula (2) not captured in the extract] Let us call (2) the first type of approximation. Denote by Z(w), A(w) the optimal value and optimal solution set of problem (1) respectively, and by Z_k(w), A_k(w) the corresponding ones of problem (…