1.
Feature selection is one of the core topics in rough set theory and its applications. Since the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are not ideal, this paper proposes a feature selection algorithm for neighborhood decision systems that combines the information-theoretic view and the algebraic view. First, the neighborhood relation in the neighborhood rough set model is used to retain the classification information of continuous data, and several uncertainty measures based on neighborhood information entropy are studied. Second, to fully reflect the decision ability and classification performance of the neighborhood system, neighborhood credibility and neighborhood coverage are defined and incorporated into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed, overcoming the drawback that most feature selection algorithms consider only the information-theoretic definition or only the algebraic definition. Finally, experiments and statistical analyses on nine data sets show that the algorithm can effectively select an optimal feature subset, and that the selected subset maintains or improves the classification performance on the data set.
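A minimal sketch of the general idea behind neighborhood-based forward feature selection, under simplifying assumptions: the neighborhood is a fixed-radius ball in normalized feature space, and a feature subset is scored by how often a sample's neighbors share its decision label. The score is only a crude stand-in for the paper's neighborhood joint entropy, and all names and parameters (e.g. `delta`) are illustrative.

```python
import numpy as np

def neighborhood_score(X, y, features, delta=0.15):
    """Average fraction of each sample's delta-neighbors (in the selected
    feature subspace) that share its decision label -- a crude stand-in for
    the paper's neighborhood joint-entropy measure."""
    Xs = X[:, features]
    total = 0.0
    for i in range(len(Xs)):
        dist = np.linalg.norm(Xs - Xs[i], axis=1)
        nbrs = dist <= delta                      # neighborhood of sample i
        total += np.mean(y[nbrs] == y[i])         # label agreement inside it
    return total / len(Xs)

def greedy_select(X, y, delta=0.15, tol=1e-3):
    """Forward selection: repeatedly add the feature with the largest gain."""
    remaining = list(range(X.shape[1]))
    selected, best = [], -np.inf
    while remaining:
        score, feat = max((neighborhood_score(X, y, selected + [f], delta), f)
                          for f in remaining)
        if score - best <= tol:                   # no meaningful improvement
            break
        best, selected = score, selected + [feat]
        remaining.remove(feat)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 6))                      # 6 continuous features in [0, 1]
    y = (X[:, 0] + 0.3 * X[:, 1] > 0.8).astype(int)
    print("selected features:", greedy_select(X, y))
```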
2.
This paper deals with the problem of predicting a new target variable corresponding to a new explanatory variable, given a training dataset. To predict the target variable, we consider a model tree, which represents the conditional probabilistic structure of the target variable given the explanatory variable, and discuss statistical optimality of prediction based on Bayes decision theory. The Bayes-optimal prediction is obtained by weighting all the model trees in a candidate set that is assumed to contain the true model tree. Because the number of model trees in the candidate set grows exponentially with the maximum tree depth, so does the computational cost of weighting them. To solve this issue, we introduce the notion of a meta-tree and propose an algorithm called MTRF (Meta-Tree Random Forest) that uses multiple meta-trees. Theoretical and experimental analyses show the superiority of MTRF over previous decision-tree-based algorithms.
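To make the "weight every candidate tree" idea concrete, here is a small sketch of Bayes-optimal prediction by model averaging over a toy candidate set (the empty tree and all depth-1 stumps on binary features), with Beta-Bernoulli leaves. This is not the meta-tree construction or MTRF itself; it only illustrates the decision-theoretic weighting that MTRF makes tractable, and all data here are synthetic.

```python
import numpy as np
from math import lgamma

def log_marginal(y):
    """Log marginal likelihood of a Bernoulli leaf with a uniform Beta prior."""
    n1 = int(y.sum()); n0 = len(y) - n1
    return lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n1 + n0 + 2)

def leaf_prob(y):
    """Posterior-mean Bernoulli parameter (Laplace smoothing)."""
    return (y.sum() + 1) / (len(y) + 2)

def tree_predict(tree, X, y, x_new):
    """tree is None (no split) or the index of a binary feature to split on.
    Returns (predictive probability for x_new, log marginal likelihood)."""
    if tree is None:
        return leaf_prob(y), log_marginal(y)
    left, right = X[:, tree] == 0, X[:, tree] == 1
    p = leaf_prob(y[left]) if x_new[tree] == 0 else leaf_prob(y[right])
    return p, log_marginal(y[left]) + log_marginal(y[right])

def bayes_predict(X, y, x_new):
    """Weight every candidate tree by its posterior (uniform prior) and
    return the mixture predictive probability that y_new = 1."""
    candidates = [None] + list(range(X.shape[1]))
    preds, logws = zip(*(tree_predict(t, X, y, x_new) for t in candidates))
    w = np.exp(np.array(logws) - max(logws))
    w /= w.sum()
    return float(np.dot(w, preds))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.integers(0, 2, size=(50, 3))
    y = (X[:, 0] ^ (rng.random(50) < 0.1)).astype(int)   # feature 0 drives y, with noise
    print("P(y=1 | x=[1,0,0]):", round(bayes_predict(X, y, np.array([1, 0, 0])), 3))
```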
3.
To effectively predict slope stability and prevent slope failure accidents, a hybrid model, WOA-RF, combining the whale optimization algorithm (WOA) and random forest (RF) is proposed. Based on the collected slope cases, the classification and generalization performance of the hybrid WOA-RF model is evaluated using confusion-matrix classification metrics and the receiver operating characteristic curve together with the area under it. WOA is also used to optimize four widely applied machine learning models, and the optimized models are compared with the WOA-RF model. The results show that WOA can effectively optimize hyperparameters and improve model performance. The optimal WOA-RF model achieves accuracies of 0.99 and 0.94 on the training and test sets, respectively; after optimization, the accuracy, precision, recall, and the weighted average of precision and recall improve by 11.9%, 19.0%, 4.8%, and 11.9%, respectively. A comparison of the predictive performance of all models shows that the WOA-RF model outperforms the others on every metric. A feature importance ranking is also determined, and unit weight is found to be the most sensitive feature affecting slope stability. The WOA-RF model can therefore effectively predict slope stability, and its predictions can inform the design of protective measures.
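A compact sketch of the WOA-plus-RF idea under stated assumptions: two RF hyperparameters (n_estimators, max_depth) are encoded as continuous whale positions, fitness is 3-fold cross-validated accuracy, and the standard encircling/exploration/spiral updates of WOA are applied. The search ranges, WOA constants, and data below are illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Illustrative search space: n_estimators in [10, 200], max_depth in [2, 20].
LOW, HIGH = np.array([10.0, 2.0]), np.array([200.0, 20.0])

def fitness(pos, X, y):
    """3-fold cross-validated accuracy of an RF with the encoded hyperparameters."""
    clf = RandomForestClassifier(n_estimators=int(round(pos[0])),
                                 max_depth=int(round(pos[1])), random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

def woa_rf(X, y, n_whales=6, n_iter=10, b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(LOW, HIGH, size=(n_whales, 2))
    fit = np.array([fitness(p, X, y) for p in pos])
    best, best_fit = pos[fit.argmax()].copy(), fit.max()
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                         # linearly decreasing coefficient
        for i in range(n_whales):
            r1, r2, p = rng.random(3)
            l = rng.uniform(-1, 1)
            A, C = 2 * a * r1 - a, 2 * r2
            if p < 0.5:
                if abs(A) < 1:                         # encircle the current best
                    pos[i] = best - A * np.abs(C * best - pos[i])
                else:                                  # explore around a random whale
                    rand = pos[rng.integers(n_whales)]
                    pos[i] = rand - A * np.abs(C * rand - pos[i])
            else:                                      # logarithmic spiral toward the best
                d = np.abs(best - pos[i])
                pos[i] = d * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            pos[i] = np.clip(pos[i], LOW, HIGH)
        fit = np.array([fitness(p, X, y) for p in pos])
        if fit.max() > best_fit:
            best, best_fit = pos[fit.argmax()].copy(), fit.max()
    return int(round(best[0])), int(round(best[1])), best_fit

if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    n_est, depth, acc = woa_rf(X, y)
    print(f"best RF: n_estimators={n_est}, max_depth={depth}, CV accuracy={acc:.3f}")
```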
4.
Existing methods based on probability and statistics theory and the fuzzy neural network approach must be built on a large amount of statistical data, while the fuzzy information diffusion estimation method may overestimate device failure thresholds. To address these problems, fuzzy information processing is applied to the raw experimental data to obtain training samples, and support vector regression is then used to predict the damage probability of electronic devices irradiated by high-power microwaves of a given power. Simulation results show that both this method and the fuzzy neural network method give good predictions, but the proposed method achieves higher accuracy (root-mean-square error of 7.406×10⁻⁵) and avoids the outliers that the fuzzy neural network method may produce in the small-sample case where the sample data are halved.
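A minimal sketch of the regression step, assuming the fuzzified experimental data have already been turned into (power, damage probability) training pairs; the sample values below are invented for illustration and the SVR hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Invented training pairs: irradiation power (arbitrary units) vs. damage
# probability obtained after fuzzy processing of raw experimental data.
power = np.array([[10.0], [20.0], [30.0], [40.0], [50.0], [60.0], [70.0], [80.0]])
damage_prob = np.array([0.02, 0.05, 0.12, 0.25, 0.45, 0.68, 0.85, 0.95])

# Support vector regression with an RBF kernel (hyperparameters are arbitrary).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(power, damage_prob)

# Predict the damage probability at an unseen power level, clipped to [0, 1].
p = float(np.clip(model.predict([[55.0]])[0], 0.0, 1.0))
print(f"predicted damage probability at power 55: {p:.3f}")
```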
5.
Chunsheng Yan, Zhongyi Cheng, Si Luo, Chen Huang, Songtao Han, Xiuli Han, Yuandong Du, Chaonan Ying. Journal of Raman Spectroscopy, 2022, 53(2): 260-271
Handmade paper is a major carrier and restoration material for traditional Chinese ancient books, calligraphy, and paintings. In this study, we carried out a Raman spectroscopy analysis of 18 types of handmade paper samples. According to the wavenumbers and Raman vibration assignments, the main components of the handmade paper were cellulose and lignin. We divided the Raman spectrum into eight subbands. Five machine learning models were employed: principal component analysis (PCA), partial least squares (PLS), support vector machine (SVM), k-nearest neighbors (KNN), and random forest (RF). The Raman spectral data were normalized, and the fluorescence envelope was subtracted using the airPLS algorithm, yielding four types of data: raw, normalized, defluorescence, and fluorescence data. An RF variable-importance analysis showed that data normalization eliminated the intensity differences of the fluorescence signals caused by lignin, which carry important information about the raw materials and papermaking technology, and defluorescence removed even more of this information. The data processing also reduced the importance of the average variables in almost all spectral bands. Nevertheless, the processing is worthwhile because it significantly improves the accuracy of machine learning, and the information loss does not affect the prediction. Using PCA, PLS, and SVM combined with linear regression (LR), as well as KNN and RF, the classification and prediction of handmade paper samples were realized. For almost all processed data, including the fluorescence data, PCA-LR had the highest classification and prediction accuracy (R2 = 1) in almost all spectral bands. PLS-LR and SVM-LR had the second-highest accuracies (R2 = 0.4-0.9), whereas KNN and RF had the lowest accuracies (R2 = 0.1-0.4) for full-band spectral data. Our results suggest that the abundant information contained in Raman spectra, combined with powerful machine learning models, could inspire further studies on handmade paper and related cultural relics.
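As a rough illustration of the PCA-LR pipeline on normalized spectra, the sketch below builds synthetic Raman-like spectra for three paper classes, area-normalizes them, projects them with PCA, and fits a linear regression scored by R2 (the metric used in the abstract). The synthetic bands, class count, and component number are invented, and no airPLS defluorescence step is included.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the Raman data: three paper classes, 30 spectra each,
# 800 wavenumber bins; each class has its own Gaussian band plus noise.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(200, 1800, 800)

def band(center, width):
    return np.exp(-((wavenumbers - center) / width) ** 2)

class_templates = [band(1095, 30), band(1600, 40), band(380, 25)]
spectra, labels = [], []
for k, template in enumerate(class_templates):
    for _ in range(30):
        spectra.append(template + 0.05 * rng.standard_normal(wavenumbers.size))
        labels.append(k)
X = np.array(spectra)
y = np.array(labels, dtype=float)

# Area normalization, then PCA scores fed to a linear regression; the fit is
# scored with R^2, the metric reported in the abstract.
X = X / X.sum(axis=1, keepdims=True)
model = make_pipeline(PCA(n_components=5), LinearRegression())
model.fit(X, y)
print("training R2:", round(model.score(X, y), 3))
```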
6.
This paper presents new approaches to fitting regression models for symbolic interval-valued variables, which are shown to improve and extend the center method suggested by Billard and Diday and the center-and-range method proposed by Lima-Neto, E.A. and De Carvalho, F.A.T. Like those methods, the proposed regression models use the midpoints and half of the lengths of the intervals as additional variables. We considered various methods to fit the regression models, including tree-based models, k-nearest neighbors, support vector machines, and neural networks. The proposed approaches were applied to a real dataset and to synthetic datasets generated with linear and nonlinear relations. The root-mean-squared error and the correlation coefficient were used to evaluate the methods. The methods presented herein are available in the RSDA package written in the R language, which can be installed from CRAN.
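The RSDA implementation is in R; purely to illustrate the center-and-range idea in a self-contained way, the sketch below regresses the response midpoint and half-range on the explanatory midpoint and half-range with one of the tree-based learners mentioned in the abstract, using invented interval data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Invented interval-valued data: each observation is an interval for one
# explanatory variable and an interval for the response, summarized by
# midpoint and half-range.
rng = np.random.default_rng(0)
x_mid = rng.uniform(0, 10, 100)
x_half = rng.uniform(0.2, 1.0, 100)
y_mid = 2.0 * x_mid + 1.0 + rng.normal(0, 0.5, 100)
y_half = 0.5 * x_half + 0.1

# Center-and-range idea: regress the response midpoint and half-range on the
# explanatory midpoint and half-range (here with a tree-based learner, one of
# the model families mentioned in the abstract).
features = np.column_stack([x_mid, x_half])
mid_model = RandomForestRegressor(random_state=0).fit(features, y_mid)
half_model = RandomForestRegressor(random_state=0).fit(features, y_half)

# Reconstruct predicted intervals and report the midpoint RMSE, one of the
# evaluation measures used in the paper.
pred_mid, pred_half = mid_model.predict(features), half_model.predict(features)
lower, upper = pred_mid - pred_half, pred_mid + pred_half
rmse = float(np.sqrt(np.mean((pred_mid - y_mid) ** 2)))
print("midpoint RMSE:", round(rmse, 3))
print("first predicted interval:", (round(lower[0], 2), round(upper[0], 2)))
```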