Research on Dosing Coagulation Models in Waterworks under the Background of Big Data
Cite this article: DAI Hong, ZHU Enwen, LI Jinping, CAO Jun, YU Bojun. Research on Dosing Coagulation Models in Waterworks under the Background of Big Data[J]. Mathematics in Economics (经济数学), 2020, 37(4): 182-189
Authors: DAI Hong  ZHU Enwen  LI Jinping  CAO Jun  YU Bojun
Affiliations: 1. Changsha Water Group Co., Ltd., Changsha, Hunan 410015, China; 2. School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha, Hunan 410114, China; 3. School of Science, Hainan University, Haikou, Hainan 570228, China
Funding: Research project funded by the National Bureau of Statistics of China
Abstract: Based on historical alum consumption data from the No. 2 Water Plant of a municipal water supply company, dynamic alum consumption models are established that relate the alum dosing flow to raw water turbidity, temperature, and other variables. Data processing yields 10,900 qualified records with efficient purification performance, which are split into a training set and a test set. In the regression fitting, raw water turbidity is divided into three intervals ("low turbidity", "medium turbidity", and "high turbidity") according to the fitted R² values, and nonlinear variable substitution based on the Taylor expansion is used to build a separate polynomial regression model for each interval. This approach gives a prediction accuracy of about 72% and reduces the total alum dosing flow by about 9.6%. For the random forest model, the 10,900 qualified records are used, and a forest of 2,000 decision trees is built on the training set with raw water turbidity, pH, raw water flow, and water temperature as input variables. It gives a prediction accuracy of about 44.21% and increases the total alum dosing flow by 0.04%. In terms of goodness of fit to the qualified data, the random forest model outperforms the nonlinear regression model, and it is also better on evaluation metrics such as mean absolute error and mean absolute percentage deviation. However, in terms of validation against historical data, interpretability, ease of operation, and ease of wider deployment, the piecewise bivariate nonlinear regression model has the clearer advantage.

Keywords: dynamic alum consumption model  random forest model  nonlinear regression model
Indexed in: Wanfang Data and other databases
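
The piecewise regression and the error metrics named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the band edges (20 and 100 NTU), the quadratic-plus-temperature form, the feature scaling, and all function names. The abstract does not state the actual cut points or model terms. Substituting u = turbidity² turns the quadratic into a model that is linear in its coefficients, so ordinary least squares applies within each band.

```python
from collections import defaultdict

# Hypothetical band edges in NTU; the abstract does not give the real cut points.
BAND_EDGES = (20.0, 100.0)


def band_of(turbidity):
    """Return 'low', 'medium', or 'high' for a raw water turbidity value."""
    if turbidity < BAND_EDGES[0]:
        return "low"
    if turbidity < BAND_EDGES[1]:
        return "medium"
    return "high"


def features(turbidity, temperature):
    """Assumed model terms: constant, x, x^2 (the substituted variable), temperature.

    Inputs are rescaled (illustrative constants) to keep the normal equations
    well conditioned.
    """
    x = turbidity / 100.0
    return [1.0, x, x * x, temperature / 10.0]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit_piecewise(samples):
    """Fit one model per turbidity band.

    samples: iterable of (turbidity, temperature, dose) tuples.
    Returns a dict mapping band name to its weight vector.
    """
    grouped = defaultdict(list)
    for turbidity, temperature, dose in samples:
        grouped[band_of(turbidity)].append((features(turbidity, temperature), dose))
    return {
        band: least_squares([f for f, _ in pairs], [d for _, d in pairs])
        for band, pairs in grouped.items()
    }


def predict(models, turbidity, temperature):
    """Predict the dose with the model of the band the turbidity falls in."""
    weights = models[band_of(turbidity)]
    return sum(w * f for w, f in zip(weights, features(turbidity, temperature)))


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_deviation(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

Fitting a separate weight vector per band keeps each model small and easy to read off, which matches the interpretability argument the abstract makes for the piecewise approach. A production fit would more likely use a numerical library than hand-written elimination.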