Similar literature
18 matching records found (search time: 78 ms)
1.
For three newer machine-learning classification algorithms suited to the big-data setting — support vector machines, boosted decision trees, and random forests — and three traditional classification algorithms — logistic regression, k-nearest neighbors, and linear discriminant analysis — we selected real data sets from seven different industries and ran all six algorithms on them, computing each algorithm's overall misclassification probability on the test set as well as the rates of the two error types. The results show that, from a predictive standpoint, the newer machine-learning classifiers, especially random forests and boosted decision trees, clearly outperform the traditional algorithms under big data.
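A minimal sketch of this kind of comparison, using scikit-learn and one synthetic data set in place of the paper's seven industry data sets (the models' default settings and the data here are illustrative assumptions, not the paper's setup):

```python
# Compare three "new" ML classifiers (SVM, boosted trees, random forest)
# against three traditional ones (logistic regression, kNN, LDA) on one data set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM": SVC(),
    "Boosted trees": GradientBoostingClassifier(),
    "Random forest": RandomForestClassifier(random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "kNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
}
# overall test-set misclassification probability for each classifier
errors = {name: 1 - m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, err in errors.items():
    print(f"{name}: test error {err:.3f}")
```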

2.
We propose an extreme learning machine (ELM) ensemble algorithm based on data-set splitting, DS-E-ELM. The algorithm has three main steps. First, the data set is split into k disjoint subsets; combining every k−1 of them into a training set yields k different data sets. Next, an ELM is trained on each of the k data sets, giving k classifiers. Finally, the k classifiers' predictions are combined by majority vote. Experiments on six tumor data sets show that DS-E-ELM achieves higher classification accuracy and better stability than a single ELM, Bagging, and Boosting.
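The three steps above can be sketched as follows; since scikit-learn has no ELM, a decision tree stands in for the base learner (an assumption of this sketch), and KFold's "train" indices conveniently give exactly the union of k−1 subsets:

```python
# DS-E-ELM idea: split the data into k disjoint subsets, train one model on each
# leave-one-subset-out combination, combine predictions by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

k = 5
members = []
for train_idx, _ in KFold(n_splits=k, shuffle=True, random_state=1).split(X_tr):
    # each member sees k-1 of the k disjoint subsets
    members.append(DecisionTreeClassifier(random_state=1).fit(X_tr[train_idx], y_tr[train_idx]))

votes = np.stack([m.predict(X_te) for m in members])   # shape (k, n_test), binary labels
majority = (votes.sum(axis=0) > k / 2).astype(int)     # majority vote
accuracy = (majority == y_te).mean()
print(f"ensemble accuracy: {accuracy:.3f}")
```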

3.
We build a gasoline octane-number prediction model based on a two-stage heterogeneous random forest. In the first stage, a sample–locus information-table knowledge-reduction model screens out the locus data with the greatest influence on the octane number. In the second stage, following the ensemble-learning idea, support vector regression and dynamic time-series neural networks are combined to build a heterogeneous random-forest prediction model. Ten-fold cross-validation is used to verify the model's accuracy; the results show that the ensemble algorithm is effective and highly accurate.

4.
陶朝杰, 杨进. 《经济数学》, 2020, 37(3): 214-220
Fake reviews are an unavoidable problem in the growth of e-commerce. To handle the class imbalance in online review data, we propose a fake-review detection method based on the BalanceCascade-GBDT algorithm. BalanceCascade progressively shrinks the majority-class sample space by setting the base classifiers' false-positive rate, then integrates all base classifiers into a final classifier. GBDT is widely used in classification problems for its high accuracy and interpretability, and, as an algorithm unstable under sample perturbation, it is a well-suited base classifier. The model is built on the Yelp review data set, evaluated by AUC, and compared with logistic regression, random forests, and neural networks; the experiments confirm the method's effectiveness.
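A rough, simplified sketch of the cascade idea (not the paper's exact BalanceCascade, which thresholds on false-positive rate): each round trains a GBDT on all minority samples plus an equal-size random draw from the majority class, then drops the majority samples the model already classifies correctly, so the majority pool shrinks round by round. The round count and stopping rule here are illustrative assumptions:

```python
# BalanceCascade-style undersampling cascade with GBDT base learners.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=2)
rng = np.random.default_rng(2)

minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
learners = []
for _ in range(3):  # number of cascade rounds (a free parameter in this sketch)
    sampled = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, sampled])
    clf = GradientBoostingClassifier(random_state=2).fit(X[idx], y[idx])
    learners.append(clf)
    # keep only the majority samples the current model still misclassifies
    majority = majority[clf.predict(X[majority]) != 0]
    if len(majority) < len(minority):
        break

def predict(X_new):
    # final classifier: average the base learners' scores
    p = np.mean([c.predict_proba(X_new)[:, 1] for c in learners], axis=0)
    return (p >= 0.5).astype(int)
```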

5.
We present a semi-supervised support vector machine classification algorithm based on overall risk minimization (ORM). By incorporating a working set during training, the algorithm improves the generalization ability of the standard SVM on data sets where the training set provides insufficient information, and it can effectively handle large amounts of unlabeled data. The concave semi-supervised SVM algorithm is then applied to evaluating the overall sustainable-development strength of counties. An empirical study of 15 counties in Handan demonstrates the algorithm's feasibility and effectiveness.

6.
Graph-based semi-supervised classification has attracted wide attention in pattern recognition and machine learning in recent years. However, many traditional methods use a fixed neighborhood size when building the neighborhood graph and treat all samples equally during training, ignoring differences between samples, which hurts performance. This paper proposes a semi-supervised classification method based on self-paced learning and sparse self-expression. It extracts and preserves the discriminative sparse self-expression structure of the data, and introduces a new self-paced learning term that combines soft and hard weights on sample importance, so that samples are learned from easy to hard. The method adaptively builds relationships among the data, automatically assigns each sample's importance, and provides an explicit nonlinear multi-class classification function. Experiments on several standard data sets show that the proposed algorithm achieves good semi-supervised classification performance.

7.
Using a random-forest feature-selection algorithm, we select features from the pool of available credit-scoring indicators and, on that basis, build a credit-scoring model that combines a random forest with naive Bayes. An empirical study on the German credit data set from the UCI repository shows that the random-forest-plus-naive-Bayes model with random-forest feature selection achieves higher predictive accuracy.
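The two-stage idea can be sketched as: rank features by a random forest's impurity importances, keep the top ones, then fit naive Bayes on the reduced feature set. Synthetic data stands in for the UCI German credit data, and the cutoff of 8 features is an illustrative assumption:

```python
# Random-forest feature selection followed by a naive Bayes classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=24, n_informative=6, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

rf = RandomForestClassifier(n_estimators=200, random_state=3).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[::-1][:8]   # keep the 8 most important features
nb = GaussianNB().fit(X_tr[:, top], y_tr)
acc = nb.score(X_te[:, top], y_te)
print(f"naive Bayes on selected features: {acc:.3f}")
```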

8.
To make better use of crystalline silicon wafers and classify them accurately and efficiently, an improved ResNet34 convolutional neural network is proposed for classifying high-resolution wafer images. A proprietary data set was built by photographing wafers and effectively enlarged through offline augmentation. A classification model was built on ResNet34, using the AdamW optimizer (adaptive moment estimation with weight decay) to improve the network's generalization and incorporating an attention mechanism into ResNet34 to strengthen the model's feature extraction. Trained on the wafer data set, the proposed W-ResNet34+SC-SEAM classification model reaches 99.91% accuracy, 2.68 percentage points higher than the plain ResNet34 model, achieving precise wafer classification and demonstrating the feasibility of the proposed method.

9.
This paper analyzes mercury (Hg), lead (Pb), and cadmium (Cd) content in femurs from 15 skeletonized cadaver specimens, sampled three times over three years for a total of 45 observations. Treating the data as longitudinal, we first fit a linear mixed-effects model and a Cox mixed-effects model. The results show that fitting a linear model for each specimen can predict the time of death accurately, and the lead measurements are not needed; the mixed-effects models also predict well, with a maximum error under one month. Second, making no assumptions about the data, we apply the random-forest method from machine learning and use 5-fold cross-validation to assess reliability; the NMSE is 0.1205944 on the training set and 0.5604286 on the test set, so the trained model can be used to predict time of death.
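The random-forest step with cross-validated NMSE (normalized MSE, i.e. MSE divided by the target variance, so a value below 1 beats predicting the mean) can be sketched as follows; synthetic data stands in for the 45 femur element-content measurements:

```python
# Random-forest regression with 5-fold cross-validation, reporting NMSE.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=45, n_features=3, noise=5.0, random_state=6)
rf = RandomForestRegressor(n_estimators=500, random_state=6)
y_hat = cross_val_predict(rf, X, y, cv=5)        # out-of-fold predictions
nmse = np.mean((y - y_hat) ** 2) / np.var(y)     # NMSE = MSE / Var(y)
print(f"5-fold CV NMSE: {nmse:.3f}")
```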

10.
Based on a cause analysis of Chinese credit-bond default cases from 2014-2019 and a review of the related literature, an indicator system for bond default is built along four dimensions: bond quality, the debt issuer, financial data, and macro factors. Optimizing with the random-forest algorithm, we find that with 18 or 37 influencing factors the in-sample and out-of-sample predictions reach a balance. Comparing seven algorithms from different angles, three are chosen as base learners (random forest, gradient-boosted decision trees, and a Bayesian algorithm) and combined with logistic regression as the second-level learner to build a bond-default prediction model via Stacking-based ensembling. The empirical results show: first, the double ensembling of Stacking improves overall accuracy by 1% to 8% over the single-level base ensembles; second, evaluating Stacking ensembles with different numbers of indicators shows that the constructed indicator system improves prediction; third, selecting base learners by balancing in-sample and out-of-sample prediction is effective, while adding relatively weaker base learners gradually degrades model stability. The results can provide technical support and a reference for risk management in China's bond market.
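The stacked model described above can be sketched with scikit-learn's StackingClassifier: random forest, GBDT, and naive Bayes as level-0 learners and logistic regression as the level-1 meta-learner. Synthetic data replaces the bond-default indicator data, and the naive Bayes choice stands in for the abstract's unspecified "Bayesian algorithm":

```python
# Stacking ensemble: three base learners, logistic regression as meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1500, n_features=18, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=4)),
                ("gbdt", GradientBoostingClassifier(random_state=4)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
acc = stack.fit(X_tr, y_tr).score(X_te, y_te)
print(f"stacked-model accuracy: {acc:.3f}")
```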

11.
Combining multiple classifiers, known as ensemble methods, can give substantial improvements in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for the classification task, built in two steps. First, we choose classifiers based on their individual performance, measured by out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets, with their original features and with some added non-informative features, to evaluate our method. The results are compared with the usual kNN, bagged kNN, random kNN, the multiple-feature-subset method, random forests, and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forests and support vector machines.
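The two steps can be sketched as follows, under illustrative assumptions (random 8-feature subsets for diversity, 30 candidates, a simple "keep if collective accuracy does not drop" rule, and a held-out validation split rather than the paper's exact protocol):

```python
# ESkNN-style selection: rank kNN models individually, then grow the ensemble
# greedily from the best model, keeping members that help collective accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=900, n_features=20, n_informative=5, random_state=5)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=5)
rng = np.random.default_rng(5)

# step 1: train candidate kNN models on random feature subsets, rank individually
candidates = []
for _ in range(30):
    feats = rng.choice(X.shape[1], size=8, replace=False)
    knn = KNeighborsClassifier().fit(X_tr[:, feats], y_tr)
    candidates.append((knn.score(X_val[:, feats], y_val), feats, knn))
candidates.sort(key=lambda t: -t[0])

# step 2: combine sequentially, best first, assessing collective performance
def vote_acc(members):
    votes = np.stack([m.predict(X_val[:, f]) for _, f, m in members])
    return ((votes.sum(axis=0) > len(members) / 2).astype(int) == y_val).mean()

ensemble = [candidates[0]]
for cand in candidates[1:]:
    if vote_acc(ensemble + [cand]) >= vote_acc(ensemble):
        ensemble.append(cand)
print(f"{len(ensemble)} members, validation accuracy {vote_acc(ensemble):.3f}")
```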

12.
This article is devoted to providing a theoretical underpinning for ensemble forecasting with rapid fluctuations in body forcing and in boundary conditions. Ensemble averaging principles are proved under suitable “mixing” conditions on random boundary conditions and on random body forcing. The ensemble averaged model is a nonlinear stochastic partial differential equation, with the deviation process (i.e., the approximation error process) quantified as the solution of a linear stochastic partial differential equation.

13.
Using historical alum-consumption data from the No. 2 plant of a municipal water company, we build a dynamic model of alum dosage as a function of raw-water turbidity, temperature, and other variables. Data cleaning yields 10,900 qualified records with high purification performance, which are split into a training set and a test set. For the regression fit, raw-water turbidity is divided into "low", "medium", and "high" bands according to the fitted R², and separate polynomial regression models, obtained by nonlinear variable substitution via Taylor expansion, are built for the three bands; this achieves a prediction accuracy of about 72% and reduces the total alum dosage by about 9.6%. For the random-forest model, the 10,900 qualified records and the training set are used with "raw-water turbidity", "pH", "raw-water flow", and "water temperature" as input variables to build a forest of 2,000 decision trees; this achieves a prediction accuracy of about 44.21% and increases the total alum dosage by 0.04%. In goodness of fit on the qualified data, the random-forest model outperforms the nonlinear regression model, and it is also better on mean absolute error, mean absolute percentage deviation, and similar metrics; but in terms of validation against historical data, interpretability, ease of operation, and generalizability, the piecewise bivariate nonlinear regression model has the clearer advantage.
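The piecewise-regression idea can be sketched as: split samples into low/medium/high turbidity bands and fit a separate polynomial model in each band. The band edges, the made-up dose curve, and the single-predictor setup are all illustrative assumptions in place of the plant's real records:

```python
# Piecewise polynomial regression over turbidity bands.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)
turbidity = rng.uniform(0, 300, 1000)
dose = 2.0 * np.sqrt(turbidity) + rng.normal(0, 1, 1000)  # hypothetical dose curve

edges = [(0, 50), (50, 150), (150, 300)]                  # low / medium / high bands
models = []
for lo, hi in edges:
    mask = (turbidity >= lo) & (turbidity < hi)
    poly = PolynomialFeatures(degree=2)
    Xb = poly.fit_transform(turbidity[mask, None])
    models.append((lo, hi, poly, LinearRegression().fit(Xb, dose[mask])))

def predict(t):
    # dispatch to the band containing turbidity t
    for lo, hi, poly, m in models:
        if lo <= t < hi:
            return float(m.predict(poly.transform([[t]]))[0])
    raise ValueError("turbidity out of range")
```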

14.
We prove uniform consistency of Random Survival Forests (RSF), a newly introduced forest ensemble learner for analysis of right-censored survival data. Consistency is proven under general splitting rules, bootstrapping, and random selection of variables, that is, under true implementation of the methodology. Under this setting we show that the forest ensemble survival function converges uniformly to the true population survival function. To prove this result we make one key assumption regarding the feature space: we assume that all variables are factors. Doing so ensures that the feature space has finite cardinality and enables us to exploit counting process theory and the uniform consistency of the Kaplan–Meier survival function.

15.
This paper investigates the use of neural network combining methods to improve time series forecasting performance of the traditional single keep-the-best (KTB) model. The ensemble methods are applied to the difficult problem of exchange rate forecasting. Two general approaches to combining neural networks are proposed and examined in predicting the exchange rate between the British pound and US dollar. Specifically, we propose to use systematic and serial partitioning methods to build neural network ensembles for time series forecasting. It is found that the basic ensemble approach created with non-varying network architectures trained using different initial random weights is not effective in improving the accuracy of prediction, while ensemble models consisting of different neural network structures can consistently outperform predictions of the single ‘best’ network. Results also show that neural ensembles based on different partitions of the data are more effective than those developed with the full training data in out-of-sample forecasting. Moreover, reducing correlation among forecasts made by the ensemble members by utilizing data partitioning techniques is the key to success for the neural ensemble models. Although our ensemble methods show considerable advantages over the traditional KTB approach, they do not have significant improvement compared to the widely used random walk model in exchange rate forecasting.

16.
In this paper, we propose a new random forest (RF) algorithm to deal with high dimensional data for classification using subspace feature sampling method and feature value searching. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with a lower prediction error. A greedy technique is used to handle cardinal categorical features for efficient node splitting when building decision trees in the forest. This allows trees to handle very high cardinality meanwhile reducing computational time in building the RF model. Extensive experiments on high dimensional real data sets including standard machine learning data sets and image data sets have been conducted. The results demonstrated that the proposed approach for learning RFs significantly reduced prediction errors and outperformed most existing RFs when dealing with high-dimensional data.

17.
Distributions of selections of a random set are characterized in terms of inequalities, similar to the marriage problem. A consequence is that the ensemble of such distributions is convex compact and depends continuously on the distribution of the random set.

18.
A method for parallel construction of a classifier ensemble for solving the problem of localizing neuron sources within the brain from the analysis of electroencephalography signals is described. The idea of the proposed parallel numerical method is to treat the source parameters as attributes of decision trees constructed in parallel. The method is based on forming a training data set from an experimental signal and constructing a classifier from the value of the potential error, that is, the difference between the measured and model values of the potential. The efficiency of parallelizing the localization problem, namely, the distribution of data between processors and the distributed training of the ensembles of decision trees, is considered. An analysis of the scalability of the classifier-ensemble construction problem with an increase in the number of processors, in the course of solving the neuron-source localization problem on multiprocessor computational complexes, is presented. The parallel source localization algorithm is developed for architectures with either shared or distributed memory. The algorithm is implemented using the MPI technology; a hybrid model of parallel computation using MPI and OpenMP is also discussed.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号