Similar Literature
 19 similar documents found (search time: 647 ms)
1.
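This paper takes workshop transport robots as its subject and, taking time windows into account, solves the path optimization problem for robots that deliver materials and collect finished products. A reinforcement learning genetic ant colony algorithm is proposed. First, the sweep method is used to determine the initial number of transport robots, and the geometric center of each sub-path's nodes is set as a virtual node. An ant colony algorithm with embedded genetic operators then finds the optimal path connecting the virtual nodes, and a reinforcement learning algorithm solves for the optimal result within each sub-path. Finally, the weighted sum of basic cost, transportation cost, and time penalty cost is taken as the objective, and the optimal solution satisfying the constraints is obtained. Comparison with results on benchmark problems verifies the superiority of the reinforcement learning genetic ant colony algorithm.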

2.
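The traveling salesman problem (TSP) is a classic problem in combinatorial optimization, and solving it has great practical significance. With the wide adoption of deep reinforcement learning (DRL) in industry, using DRL models to design learning algorithms automatically has become a recent research focus. To improve the generalization of DRL models on large-scale TSP instances, this article proposes a hybrid model that combines dynamic graph convolutional network encoding with spatial attention decoding. The dynamic graph convolution module encodes node information dynamically and thereby updates each node's hidden state effectively; spatial attention helps capture global relationships among nodes, computing and extracting key features by weighting all local features. Experiments show that, when the policy trained on TSP50 is generalized to TSP250/500/750/1000, the model's optimization performance exceeds that of previous DRL models, and tests on the TSPlib benchmark set also show improved optimization performance.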

3.
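Medical imaging plays a vital role in clinical practice and is used for early detection, monitoring, diagnosis, and treatment evaluation. Deep learning methods, with their strong feature-learning ability, have rapidly become the method of choice for studying and analyzing medical images. This paper introduces the deep learning models commonly used in medical imaging research and surveys their applications, including conventional medical image detection, segmentation, and registration, as well as brain image analysis and low-dose CT image reconstruction. It closes with a discussion of the prospects for deep learning in future medical imaging research.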

4.
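To address the job sequencing optimization problem in automotive paint shops, an optimization algorithm based on heuristic Q-learning is proposed. First, a multi-objective integer programming model is built whose objectives are to satisfy the production sequence of the final assembly shop and to minimize the number of spray-gun color changes. The paint job sequencing problem is then abstracted as a Markov process, and a solution method based on the heuristic Q algorithm is established. A case study compares heuristic Q-learning, Q-learning, and a genetic algorithm. The results show that, on large-scale problem instances, heuristic Q-learning searches more efficiently and finds better solutions. The study offers a new approach to applying machine learning algorithms to paint job sequencing in automotive paint shops.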
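The abstract above does not give its model or its heuristic, so the following is only a minimal sketch of the plain tabular Q-learning update that such a method would build on, together with a count of color changes in a paint sequence. The state and action encoding (state = last color sprayed, action = color of the next job), the function names, and the default values of `alpha` and `gamma` are all assumptions made for illustration, not details taken from the paper.

```python
def color_changes(sequence):
    """Count how often adjacent jobs in a paint sequence differ in color."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value.

    q is a dict mapping (state, action) pairs to values; missing entries
    are treated as 0.0. The heuristic guidance described in the abstract
    is NOT modeled here.
    """
    # Greedy value of the next state (0.0 if no actions remain).
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the one-step bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical usage: penalize a switch from red to blue with reward -1.
q_table = {}
new_value = q_update(q_table, "red", "blue", -1.0, "blue", ["red", "blue"])
```

A negative reward for each color change is one simple way to tie the learning signal to the objective of minimizing spray-gun switches; the assembly-sequence objective would need an additional reward term.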

5.
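The symbolic-numeric hybrid elimination method proposed by Reid and Zhi is applied to polynomial optimization problems: the polynomial optimization problem is converted into the problem of computing the smallest eigenvalue of a matrix, and the algorithm is implemented in Maple.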

6.
马斌, 吴泽忠. 《运筹与管理》 (Operations Research and Management Science), 2020, 29(2): 122-136
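The traditional method for solving supply chain problems is the projection method, whose projection computations are very complicated. To overcome this drawback, an improved particle swarm algorithm is proposed for the supply chain equilibrium problem; it adjusts the learning factors dynamically and asynchronously, which effectively improves the algorithm's search ability and accuracy. The paper describes how the supply chain network equilibrium problem is converted into an unconstrained optimization problem, which is then solved with the improved particle swarm optimization algorithm. On four numerical examples, the results are compared with those of standard particle swarm optimization, the bee colony algorithm, and particle swarm optimization with synchronously varying learning factors. The comparison verifies the effectiveness and superiority of the improved algorithm for supply chain network equilibrium problems and provides a new method for solving supply chain networks.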
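The abstract above does not state how the learning factors are varied, so the sketch below shows only one common scheme, assumed here for illustration: the cognitive factor `c1` decreases linearly while the social factor `c2` increases linearly over the iterations, so the two change at different rates and in opposite directions. The start and end values, the inertia weight `w`, and the function names are hypothetical choices, not the authors' settings.

```python
import random


def learning_factors(t, t_max, c1_start=2.5, c1_end=0.5,
                     c2_start=0.5, c2_end=2.5):
    """Return (c1, c2) at iteration t under linear asynchronous variation."""
    frac = t / t_max
    c1 = c1_start + (c1_end - c1_start) * frac  # cognitive factor shrinks
    c2 = c2_start + (c2_end - c2_start) * frac  # social factor grows
    return c1, c2


def update_particle(x, v, pbest, gbest, t, t_max, w=0.7, rng=random):
    """One PSO velocity and position update using the time-varying factors."""
    c1, c2 = learning_factors(t, t_max)
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Inertia term plus pulls toward the personal and global bests.
        vi_new = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vi_new)
        new_x.append(xi + vi_new)
    return new_x, new_v


# Hypothetical usage on a two-dimensional particle at iteration 10 of 100.
pos, vel = update_particle([0.0, 1.0], [0.1, -0.1],
                           [0.5, 0.5], [1.0, 0.0], t=10, t_max=100)
```

Early iterations weight the particle's own best position more heavily (exploration), and later iterations weight the swarm's best more heavily (convergence), which is the usual motivation for this kind of schedule.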

7.
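For the permutation flow shop scheduling problem (PFSP) with the objective of minimizing the makespan, existing studies rarely consider the influence of the learning effect on production scheduling. A mathematical model of the PFSP with learning effects is therefore built. Using ROV encoding, the cuckoo search algorithm is applied to solve this discrete optimization problem. Extensive simulation tests on the Car benchmark instances demonstrate the feasibility and effectiveness of cuckoo search for this class of problems. The tests also show that the learning effect reduces the makespan and thus improves production efficiency.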
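As a rough illustration of how a continuous search method such as cuckoo search can be applied to a permutation problem, the sketch below decodes a real-valued position vector with the ranked-order value (ROV) rule as it is commonly described: the smallest component receives rank 1, the next smallest rank 2, and so on, and the rank vector is read as the job permutation. It also evaluates the makespan of a permutation. The function names and the small processing-time matrix are hypothetical, and the learning effect from the abstract is not modeled.

```python
def rov_decode(position):
    """Map a continuous position vector to a job permutation via ROV.

    The i-th entry of the result is the rank (1-based) of position[i],
    and the rank vector is interpreted as the sequence of job ids.
    """
    order = sorted(range(len(position)), key=lambda i: position[i])
    ranks = [0] * len(position)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def makespan(perm, proc):
    """Completion time of the last job on the last machine.

    perm holds 1-based job ids; proc[j][m] is the processing time of
    job j+1 on machine m. Every job visits the machines in the same order.
    """
    n_machines = len(proc[0])
    completion = [0] * n_machines  # completion time per machine so far
    for job in perm:
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the
            # job has finished on machine m-1.
            ready = completion[m - 1] if m > 0 else 0
            completion[m] = max(completion[m], ready) + proc[job - 1][m]
    return completion[-1]


# Hypothetical usage: decode a position, then score a two-job instance.
perm = rov_decode([0.06, 2.99, 1.86, 3.73, 2.13, 0.67])
times = [[3, 2], [1, 4]]
best = min(makespan([1, 2], times), makespan([2, 1], times))
```

Because the decoding depends only on the ordering of the components, any real-valued update rule can be used on the position vector without producing an invalid schedule.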

8.
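Because the African wild dog algorithm shows weak global convergence when solving optimization problems, an improved African wild dog algorithm is proposed. Combined with binary encoding, a binary-coded African wild dog algorithm is designed for discrete optimization problems, applied to the TSP, and compared with other algorithms. The results show that, on the TSP, the binary-coded African wild dog algorithm achieves higher solution accuracy and faster convergence.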

9.
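A class of semi-supervised distance metric learning problems is considered. Because the size and complexity of sample sets (databases) have grown sharply, distance metric learning must take into account that the learned distance metric matrix should be sparse. A sparsity constraint on the learned matrix is therefore added to existing distance metric learning models. To make the model easier to solve, the sparsity constraint is imposed as a Frobenius-norm constraint. The Frobenius-norm constraint is then moved into the objective function by a penalty function method, which turns the sparsity-constrained model into an unconstrained optimization problem. To solve it, an accelerated projected gradient algorithm on the group of positive definite matrices is proposed, which overcomes the difficulty that linear combinations cannot be formed directly on the matrix group, and the convergence of the algorithm is analyzed. Finally, numerical experiments on classification problems from the UCI database illustrate the sparsity of the learned matrix and the effectiveness of the accelerated projected gradient algorithm.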

10.
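Multi-stage multi-attribute decision making with preferences on attributes   Cited by: 5 (self-citations: 0, citations by others: 5)
Building on an analysis of preferences on the attributes, this paper defines an ideal alternative and a closeness degree, converts the multi-stage multi-attribute decision problem into a multi-objective optimization model, and applies it to securities investment.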

11.
Basis Function Adaptation in Temporal Difference Reinforcement Learning   Cited by: 1 (self-citations: 0, citations by others: 1)
Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis function is important for the success of the algorithm. In the present paper we examine methods for adapting the basis function during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis function while simultaneously adapting the (non-linear) basis function parameters. We present two algorithms for this problem. The first uses a gradient-based approach and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations. This research was partially supported by the Fund for Promotion of Research at the Technion. The work of S.M. was partially supported by the National Science Foundation under grant ECS-0312921.

12.
An important technical component of natural resource management, particularly in an adaptive management context, is optimization. This is used to select the most appropriate management strategy, given a model of the system and all relevant available information. For dynamic resource systems, dynamic programming has been the de facto standard for deriving optimal state-specific management strategies. Though effective for small-dimension problems, dynamic programming is incapable of providing solutions to larger problems, even with modern microcomputing technology. Reinforcement learning is an alternative, related procedure for deriving optimal management strategies, based on stochastic approximation. It is an iterative process that improves estimates of the value of state-specific actions based on interactions with a system, or model thereof. Applications of reinforcement learning in the field of artificial intelligence have illustrated its ability to yield near-optimal strategies for very complex model systems, highlighting the potential utility of this method for ecological and natural resource management problems, which tend to be of high dimension. I describe the concept of reinforcement learning and its approach of estimating optimal strategies by temporal difference learning. I then illustrate the application of this method using a simple, well-known case study of Anderson [1975], and compare the reinforcement learning results with those of dynamic programming. Though a globally-optimal strategy is not discovered, it performs very well relative to the dynamic programming strategy, based on simulated cumulative objective return. I suggest that reinforcement learning be applied to relatively complex problems where an approximate solution to a realistic model is preferable to an exact answer to an oversimplified model.

13.
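Nonconvex minimax problems are an important research frontier and hot topic at the intersection of optimization with machine learning, signal processing, and related fields. Key scientific problems in frontier areas such as adversarial learning, reinforcement learning, and distributed nonconvex optimization can all be reduced to this class of problems. Research on convex-concave minimax problems has produced strong results internationally, but nonconvex minimax problems differ from convex-concave ones: they are nonconvex, nonsmooth optimization problems with their own structure, are more challenging both to analyze and to solve, and are in general NP-hard. This paper focuses on the latest progress in optimization algorithms and complexity analysis for nonconvex minimax problems.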

14.
Optimization theory provides a framework for determining the best decisions or actions with respect to some mathematical model of a process. This paper focuses on learning to act in a near-optimal manner through reinforcement learning for problems that either have no model or have a model that is too complex. One approach to solving this class of problems is via approximate dynamic programming. The application of these methods is established primarily for the case of discrete state and action spaces. In this paper we develop efficient methods of learning which act in complex systems with continuous state and action spaces. Monte-Carlo approaches are employed to estimate function values in an iterative, incremental procedure. Derivative-free line search methods are used to obtain a near-optimal action in the continuous action space for a discrete subset of the state space. This near-optimal control policy is then extended to the entire continuous state space via a fuzzy additive model. To compensate for approximation errors, a modified procedure for perturbing the generated control policy is developed. Convergence results under moderate assumptions and stopping criteria are established.

15.
陈峰. 《运筹学学报》 (Operations Research Transactions), 2021, 25(3): 37-73
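Drawing on the successful development, deployment, and operation of an intelligent scheduling decision support system for finished-vehicle logistics, this paper discusses how operations research can be applied to intelligent systems and how practice can drive academic research. The system is among the earliest intelligent scheduling systems put into production at a Chinese automotive logistics company. The ideas, theory, methods, and techniques it produced reveal the core value of operations research in intelligent applications and the academic value of practice-driven work, and they offer an exemplary approach to "chokepoint" problems. The paper proposes an overall "three loops, seven steps" framework for operations research in intelligent-system development. First, it analyzes the operations research characteristics of intelligent-system requirements and describes in detail the trends, bottlenecks, and intelligent scheduling needs of finished-vehicle logistics. Second, it discusses the role of operations research system models and how to build them, analyzes the decision elements, objectives, and constraints of the finished-vehicle logistics system model, and formulates the operations research application problem of intelligent scheduling for finished-vehicle logistics. It then poses a new bin packing theory problem, "pattern bin packing", and clarifies the problem's computational intractability, solvability, and core scientific characteristics. Next, it builds mixed-integer linear programming models for both the application problem and the scientific problem of finished-vehicle logistics scheduling; proposes a branch-and-bound algorithm for the scheduling problem, together with spatio-temporal decomposition and rolling-horizon methods for large-scale instances; and proposes production testing and stress testing methods for operations research applications, giving the test procedure and results for finished-vehicle logistics scheduling. In addition, it presents an intelligent scheduling decision support system that deeply integrates the vehicle transportation management system with the warehouse management system and is distributed, multi-view, multi-system, and driven by an optimization algorithm engine. Finally, it describes the system's rollout and operation during deployment and offers a summary and outlook on operations research applications and practice-driven scientific research.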

16.
Because of their convincing performance, there is a growing interest in using evolutionary algorithms for reinforcement learning. We propose learning of neural network policies by the covariance matrix adaptation evolution strategy (CMA-ES), a randomized variable-metric search algorithm for continuous optimization. We argue that this approach, which we refer to as CMA Neuroevolution Strategy (CMA-NeuroES), is ideally suited for reinforcement learning, in particular because it is based on ranking policies (and therefore robust against noise), efficiently detects correlations between parameters, and infers a search direction from scalar reinforcement signals. We evaluate the CMA-NeuroES on five different (Markovian and non-Markovian) variants of the common pole balancing problem. The results are compared to those described in a recent study covering several RL algorithms, and the CMA-NeuroES shows the overall best performance.

17.
Over the past 10 years, a considerable amount of research has been devoted to the development of models to support decision making in the particular yet important context of Emergency Medical Services (EMS). More specifically, the need for advanced strategies to take into account the uncertainty and dynamism inherent to EMS, as well as the pertinence of socially oriented objectives, such as equity, and patient medical outcomes, have brought new and exciting challenges to the field. In this context, this paper summarizes and discusses modern modeling approaches to address problems related to ambulance fleet management, particularly those related to vehicle location and relocation, as well as dispatching decisions. Although it reviews early works on static ambulance location problems, this review concentrates on recent approaches to address tactical and operational decisions, and the interaction between these two types of decisions. Finally, it concludes on the current state of the art and identifies promising research avenues in the field.

18.
We propose a hybrid heuristic procedure based on scatter search and tabu search for the problem of clustering objects to optimize multiple criteria. Our goal is to search for good approximations of the efficient frontier for this class of problems and provide a means for improving decision making in multiple application areas. Our procedure can be viewed as an extension of SSPMO (a scatter search application to nonlinear multiobjective optimization) to which we add new elements and strategies specially suited for combinatorial optimization problems. Clustering problems have been the subject of numerous studies; however, most of the work has focused on single-objective problems. Clustering using multiple criteria and/or multiple data sources has received limited attention in the operational research literature. Our scatter tabu search implementation is general and tackles several problem classes within this area of combinatorial data analysis. We conduct extensive experimentation to show that our method is capable of delivering good approximations of the efficient frontier for improved analysis and decision making.

19.
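Numerous experimental economics studies have confirmed that fairness concerns and learning effects influence decision makers' behavior. This paper studies a three-member supply chain system. By designing separate experimental environments for individual self-learning and social learning, it compares the degree of the backup supplier's fairness concern and the characteristics of the learning curves of the manufacturer and the backup supplier. The experimental results support the hypothesis that a learning effect exists: as the number of experimental periods increases, the decision time per period gradually decreases, the backup supplier's overall rejection rate gradually falls, and the manufacturer's strategies gradually concentrate. A reinforcement learning model incorporating fairness concerns is then constructed. Parameter estimation shows that the backup supplier's horizontal fairness concern is significant in both the individual self-learning and social learning environments, and that information sharing has no clear effect on the backup supplier's horizontal fairness preference.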
