Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol


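Cite this article:	QI Yu-qing, ZHAO Xing-lei, ZHAO Tian-dong-jie. Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol[J]. Operations Research and Management Science, 2022, 31(10): 105-112.

Authors:	QI Yu-qing, ZHAO Xing-lei, ZHAO Tian-dong-jie

Affiliation:	School of Economics and Management, Nanjing Tech University, Nanjing, Jiangsu 211816, China

Funding:	National Natural Science Foundation of China, Youth Program (71701092); National Social Science Fund of China (20BGL025)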

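Abstract:	To stabilize their supply of goods and their supplier relationships, enterprises often sign framework agreements with suppliers for a fixed period. To address how a retailer should purchase newsboy products under such a framework agreement, this paper uses reinforcement learning to build an inventory decision model and applies the Q-learning algorithm to find a near-optimal ordering strategy. Demand is simulated with randomly generated samples, and ordering by Q-learning is compared with ordering by traditional methods. Across repeated numerical experiments, ordering with reinforcement learning raises average profit by about 7%~22% relative to the traditional ordering methods (fixed-quantity ordering, moving average forecasting, and exponential smoothing), and its average profit differs from that of the ideal state by about 8%. These findings support the effectiveness and feasibility of reinforcement learning for inventory problems. The paper also studies how changes in the relevant parameters affect total profit, and finds that profit decreases as the greedy rate (ε) increases and increases as the learning rate (α) increases. These conclusions offer a new approach to related inventory problems.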
Keywords:	inventory model; framework agreement; Q-learning algorithm
Received:	2020-10-20
Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol

QI Yu-qing, ZHAO Xing-lei, ZHAO Tian-dong-jie. Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol[J]. Operations Research and Management Science, 2022, 31(10): 105-112.

Authors:	QI Yu-qing, ZHAO Xing-lei, ZHAO Tian-dong-jie

Institution:	School of Economics and Management, Nanjing Tech University, Nanjing 211816, China

Abstract:	To stabilize their supply of goods and their supplier relationships, enterprises often sign framework agreements with suppliers for a fixed period. To address how retailers should purchase newsboy products under such a framework agreement, an inventory decision model is established using reinforcement learning, and a near-optimal ordering strategy is obtained with the Q-learning algorithm. Demand is simulated with randomly generated samples, and ordering by Q-learning is compared with ordering by traditional methods. Across a number of numerical experiments, the average profit of ordering with the reinforcement learning method is about 7%~22% higher than that of the traditional ordering methods (fixed-quantity ordering, moving average forecasting, and exponential smoothing), and it differs from the average profit of the ideal state by about 8%. These findings support the effectiveness and feasibility of reinforcement learning for solving inventory problems. This paper also studies how changes in several parameters affect total profit, and finds that profit decreases as the greedy rate ε increases and increases as the learning rate α increases. These conclusions offer a new approach to related inventory problems.

Keywords:	inventory model; framework agreement; Q-learning algorithm
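
The approach summarized in the abstract (Q-learning with an ε-greedy rule and learning rate α, trained on simulated demand) can be illustrated with a minimal sketch. The abstract does not give the paper's state definition, reward function, demand distribution, or the way the framework agreement enters the model, so everything below is an assumption made for illustration only: the agreement is modelled as a minimum total purchase over a short horizon with a per-unit penalty for any shortfall, unsold units are salvaged each period, demand is a uniform integer, and all prices, costs, and function names are hypothetical.

```python
import random

# All numbers below are hypothetical; none come from the paper.
PRICE, COST, SALVAGE = 10.0, 6.0, 2.0  # per-unit selling price, purchase cost, salvage value
PENALTY = 3.0                          # assumed per-unit penalty for unmet commitment
HORIZON = 5                            # ordering periods in one agreement
COMMITMENT = 60                        # assumed minimum total purchase over the horizon
ACTIONS = [0, 5, 10, 15, 20]           # allowed order quantities per period


def sample_demand(rng):
    """Assumed demand model: uniform integer on [5, 15]."""
    return rng.randint(5, 15)


def period_profit(order, demand):
    """Single-period newsboy profit: sales revenue plus salvage minus purchase cost."""
    sold = min(order, demand)
    return PRICE * sold + SALVAGE * (order - sold) - COST * order


def step(state, order, rng):
    """Advance one period; state is (period index, units still owed to the agreement)."""
    t, remaining = state
    reward = period_profit(order, sample_demand(rng))
    remaining = max(0, remaining - order)
    t += 1
    done = t == HORIZON
    if done:
        reward -= PENALTY * remaining  # shortfall penalty charged at the end of the agreement
    return (t, remaining), reward, done


def best_action(values):
    """Index of the highest-valued action (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def train(episodes=20000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {}  # state -> list of action values
    for _ in range(episodes):
        state, done = (0, COMMITMENT), False
        while not done:
            values = q.setdefault(state, [0.0] * len(ACTIONS))
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = best_action(values)          # exploit
            next_state, reward, done = step(state, ACTIONS[a], rng)
            if done:
                target = reward
            else:
                next_values = q.setdefault(next_state, [0.0] * len(ACTIONS))
                target = reward + gamma * max(next_values)
            values[a] += alpha * (target - values[a])  # Q-learning update
            state = next_state
    return q


def greedy_orders(q):
    """Order quantities chosen by the learned greedy policy over one agreement."""
    state, orders = (0, COMMITMENT), []
    for _ in range(HORIZON):
        values = q.get(state, [0.0] * len(ACTIONS))
        order = ACTIONS[best_action(values)]
        orders.append(order)
        state = (state[0] + 1, max(0, state[1] - order))
    return orders
```

Calling `greedy_orders(train())` returns one order quantity per period under the learned policy. Rerunning `train` with different `epsilon` and `alpha` values is one way to explore the kind of parameter sensitivity the abstract describes, though results from this toy setup should not be read as reproducing the paper's figures.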
