期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

马少辉李保珍黄树成《数理统计与管理》2012,31(2):307-315

从多阶段、延迟回报的角度来看待CRM中的决策优化问题。以KDD98数据集为例,将邮寄序贯决策定义为一个部分可观察马尔可夫决策模型(POMDP)。提出了模型参数估计的EM算法并用MATLAB实现;用模型对数似然值、BIC统计量选择最佳模型;用向前一步预测对模型进行检验;用Incremental prune算法对模型求解。实证结果表明,POMDP模型可以很好的捕捉客户购买行为的动态变化,对客户的购买有很好的预测效果。在此基础上,说明了如何使用该模型以客户终生价值最大化为目标优化直邮策略。相似文献

2.

马尔可夫过程在生产运营项目实施中的应用研究

赖芨宇郑建国朱佳俊蒋靖《数学的实践与认识》2007,37(19):41-45

运用马氏决策规划方法,对企业产品的销售和利润状况进行分析和研究,建立了实施企业生产运营项目的预决策模型,为降低企业项目实施的风险,实现决策的长期效益趋于最优提供了有价值的理论与方法. 相似文献

3.

马尔可夫决策过程在类风湿关节炎治疗中的应用研究

徐伟锋曹平《运筹与管理》2023,(9):150-156

类风湿关节炎(RA)不仅给人们带来了巨大的身心痛苦，同时也带来了巨大的成本。针对RA的治疗过程，本文提出将马尔可夫决策过程(MDP)应用于该过程中。对于建立MDP所需的各个参数，本文给出定义方式并利用临床数据进行推断。首先本文利用患者的实验室指标来衡量健康状态，然后将患者使用的中药视为行动的基础，接着分别将患者指标的改善程度之和与患者两次实验室指标检查之间已住院的时长视为治疗报酬与治疗成本，最后利用相对值迭代算法求解并得到了相应的治疗策略以及治疗报酬与治疗成本。实验结果表明，本文所得到的治疗报酬要高于医院的报酬且治疗成本要低于医院的成本，将MDP模型用于RA的中医治疗中具有一定的临床应用价值。相似文献

4.

基于马尔可夫决策过程的医疗检查预约优化模型

罗利秦春蓉罗永《运筹与管理》2014,(6)

医技部门的医疗检查如电子计算机断层扫描（CT）、核磁共振成像设备（MRI）、X射线（X－rays）常常有如下三种病人类型：门诊病人、住院病人和急诊病人。针对不同病人类型的需求特点，运用马尔可夫决策过程原理和动态规划方法，建立了医疗检查设备的预约优化模型，并证明了模型的最优性质，得出了不同病人类型的最优预约策略。数值算例的结果表明：本文的预约策略不仅易于实施，而且，通过该模型获得的最大收益比按传统先来先预约的模式所获得的收益要大。相似文献

5.

技术柔性、柔性生产与柔性技术的价值 总被引：1，自引：0，他引：1

倪得兵戴春爱唐小我《运筹与管理》2011,20(1)

在产品市场和要素市场存在不确定性的条件下,将企业的生产行为和技术柔性水平的选择纳入一个统一框架进行分析,给出了柔性技术的价值函数及其特征.结果表明,给定技术柔性水平,柔性生产行为价值不会低于非柔性生产行为的价值;给定生产行为,柔性技术的价值不会随着其柔性增加而降低.进一步,在一定条件下,柔性生产行为比非柔性生产行为更具价值,柔性技术的价值随着其柔性水平的增加而增加. 相似文献

6.

深度学习与高中数学命题学习的融合实践路径

郭卫华《上海中学数学》2023,(Z1):46-48

高中阶段的数学学科知识的广度与深度都有明显提升,其中数学命题教学是基础性的教学内容,这一内容模块对学生的数学综合素养提升有明显帮助.深度学习对数学命题教学的高效开展有促进和引导价值,在教学中,教师以深度学习理论为视角,可以优化数学命题教学的设计与组织实施,培养学生的高阶思维.笔者通过分析数学命题学习实践现状、数学高阶思维培养与高中数学命题学习的融合可行性,提出深度学习与数学命题学习相融合的实践路径与建议,以提高数学命题教学的效果,让学生的数学命题学习向思维更深处“漫溯”. 相似文献

7.

基于随机动态规划的有限库存ATO系统优化控制

李稚谭德庆《运筹与管理》2017,26(7):21-28

本文研究n维组件单一产品,有限库存的ATO系统。通过建立马尔可夫决策过程模型(MDP),构造优化算法,研究组件生产与库存的最优控制策略。最优策路可以表示为状态依赖型库存阈值,系统内任一组件的控制策略受其它组件库存状态的影响。利用最优控制理论动态规划方法和数值计算方法对最优控制策略的存在性、最优值的数值计算进行研究,建立更符合实际生产的ATO系统决策模型,进行相应的理论和实验验证,研究系统参数对最优策略的影响。相似文献

8.

基于深度学习的初中数学课堂教学问题链设计与实践——以浙教版八下“方差与标准差”为例

傅兰英《数学之友》2022,(15):51-53

科学教育的目的是培养知情的决策者,使学生具有进行正确决策的知识基础和能力.本文结合浙教版八年级下册第三章“方差与标准差”内容,在教学中以问题链为载体设计学习活动,体现一个统计量的构造过程,在问题链的设计中体现思辨、分析、决策、审视,促进深度学习,提高问题解决能力,培育数学素养. 相似文献

9.

对初中数学深度学习的理解与探究

张圆《中学数学》2021,(4):88-89

初中生在数学学习过程中,往往更多地将注意力集中在数学知识的习得,以及数学习题的解答上.这样的认识实际上限制了学生学习主动性的发挥,再加上初中生受身心发展局限性的影响,他们的学习行为有时停留在浅层学习(Surface Learning)的层面,存在碎片化、浅表化、浮躁化的现象,学生很难深度加工知识信息、深度理解复杂概念、深度掌握内在含义,进而建构个人化和情境化的知识体系以解决复杂问题.要化解这些难题,关键之一就是要优化学生的学习方式,要将学生从浅层学习中解放出来,要让学生真正经历深度学习的过程. 相似文献

10.

泛系识别理论与大系统泛系运筹学的研究与应用(Ⅱ)

吴学谋郭定和《应用数学和力学》1991,12(1):63-67

本文承前启后,介绍了泛系方法论新的框架以及与识别及大系统运筹有关的一些理法. 相似文献

11.

Application of a Near-Optimal Reinforcement Learning Controller to a Robotics Problem in Manufacturing: A Hybrid Approach

Warren E. Hearnes II Augustine O. Esogbue 《Fuzzy Optimization and Decision Making》2003,2(3):183-213

Optimization theory provides a framework for determining the best decisions or actions with respect to some mathematical model of a process. This paper focuses on learning to act in a near-optimal manner through reinforcement learning for problems that either have no model or the model is too complex. One approach to solving this class of problems is via approximate dynamic programming. The application of these methods are established primarily for the case of discrete state and action spaces. In this paper we develop efficient methods of learning which act in complex systems with continuous state and action spaces. Monte-Carlo approaches are employed to estimate function values in an iterative, incremental procedure. Derivative-free line search methods are used to obtain a near-optimal action in the continuous action space for a discrete subset of the state space. This near-optimal control policy is then extended to the entire continuous state space via a fuzzy additive model. To compensate for approximation errors, a modified procedure for perturbing the generated control policy is developed. Convergence results under moderate assumptions and stopping criteria are established. 相似文献

12.

Supporting Learning in Evolving Dynamic Environments

Faison P. Gibson 《Computational & Mathematical Organization Theory》2003,9(4):305-326

In dynamic decision environments such as direct sales, customer support, and electronically mediated bargaining, decision makers execute sequences of interdependent decisions under time pressure. Past decision support systems have focused on substituting for decision makers' cognitive deficits by relieving them of the need to explicitly account for sequential dependencies. However, these systems themselves are fragile to change and, further, do not enhance decision makers' own adaptive capacities. This study presents an alternative strategy that defines information systems requirements in terms of enhancing decision makers' adaptation. In so doing, the study introduces a simulation model of how decision makers learn patterns of sequential dependency. When a system was used to manage workflows in a way predicted by the model to enhance learning, decision makers in a bargaining experiment learned underlying patterns of sequential dependencythat helped them adapt to new situations. This result is rare if not unique in the study of dynamic decision environments. It indicates that a shift, away from substituting for short-term deficits and toward enhancing pattern learning, can substantially improve the effectiveness of decision support in dynamic environments. Based on the specific findings in this study, this shift has important implications for designing information system workflows and potential future applications in interface design. 相似文献

13.

Reinforcement learning versus heuristics for order acceptance on a single resource

M. Mainegra Hing A. van Harten P. C. Schuur 《Journal of Heuristics》2007,13(2):167-187

Order Acceptance (OA) is one of the main functions in business control. Accepting an order when capacity is available could disable the system to accept more profitable orders in the future with opportunity losses as a consequence. Uncertain information is also an important issue here. We use Markov decision models and learning methods from Artificial Intelligence to find decision policies under uncertainty. Reinforcement Learning (RL) is quite a new approach in OA. It is shown here that RL works well compared with heuristics. It is demonstrated that employing an RL trained agent is a robust, flexible approach that in addition can be used to support the detection of good heuristics. 相似文献

14.

Basis Function Adaptation in Temporal Difference Reinforcement Learning 总被引：1，自引：0，他引：1

Ishai?Menache Shie?Mannor Email author Nahum?Shimkin 《Annals of Operations Research》2005,134(1):215-238

Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis function is important for the success of the algorithm. In the present paper we examine methods for adapting the basis function during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis function while simultaneously adapting the (non-linear) basis function parameters. We present two algorithms for this problem. The first uses a gradient-based approach and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations. This research was partially supported by the Fund for Promotion of Research at the Technion. The work of S.M. was partially supported by the National Science Foundation under grant ECS-0312921. 相似文献

15.

Deep Reinforcement Learning for Market Making in Corporate Bonds: Beating the Curse of Dimensionality

Olivier Guéant Iuliia Manziuk 《Applied Mathematical Finance》2019,26(5):387-452

ABSTRACT

In corporate bond markets, which are mainly OTC markets, market makers play a central role by providing bid and ask prices for bonds to asset managers. Determining the optimal bid and ask quotes that a market maker should set for a given universe of bonds is a complex task. The existing models, mostly inspired by the Avellaneda-Stoikov model, describe the complex optimization problem faced by market makers: proposing bid and ask prices for making money out of the difference between them while mitigating the market risk associated with holding inventory. While most of the models only tackle one-asset market making, they can often be generalized to a multi-asset framework. However, the problem of solving the equations characterizing the optimal bid and ask quotes numerically is seldom tackled in the literature, especially in high dimension. In this paper, we propose a numerical method for approximating the optimal bid and ask quotes over a large universe of bonds in a model à la Avellaneda–Stoikov. As classical finite difference methods cannot be used in high dimension, we present a discrete-time method inspired by reinforcement learning techniques, namely, a model-based deep actor-critic algorithm. 相似文献

16.

Distributionally robust optimization for sequential decision-making

《Optimization》2012,61(12):2397-2426

相似文献

17.

Converging Marriage in Honey-Bees Optimization and Application to Stochastic Dynamic Programming 总被引：1，自引：0，他引：1

Hyeong Soo Chang 《Journal of Global Optimization》2006,35(3):423-441

In this paper, we first refine a recently proposed metaheuristic called “Marriage in Honey-Bees Optimization” (MBO) for solving combinatorial optimization problems with some modifications to formally show that MBO converges to the global optimum value. We then adapt MBO into an algorithm called “Honey-Bees Policy Iteration” (HBPI) for solving infinite horizon-discounted cost stochastic dynamic programming problems and show that HBPI also converges to the optimal value. 相似文献

18.

Analysis of optimal and nearly optimal sequencing policies for a closed queueing network

Wendell G. Gilland 《Operations Research Letters》2005,33(1):9-16

We analyze sequencing policies designed to most effectively utilize the resources of a closed queueing network representation of a manufacturing system. A continuous time Markov decision process formulation is used to compare the performance of optimal sequencing policies and a heuristic developed by analyzing a heavy traffic approximation of the system. 相似文献