Similar Documents
1.
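Visible/near-infrared (Vis-NIR) transmission spectroscopy combined with the CARS (competitive adaptive reweighted sampling) variable selection method was used to optimize a model for near-infrared quantitative analysis of the iodine value of palm oil. By comparing the modeling performance obtained with different preprocessing methods, a suitable preprocessing method was identified. CARS variable selection retained 60 effective wavelength points related to the iodine value of palm oil, and these 60 points were used to build an optimized iodine-value model. The optimized model gave a calibration-set correlation coefficient R_c = 0.9814 and a prediction-set correlation coefficient R_p = 0.9806, with a root mean square error of calibration (RMSEC) of 0.0398 and a root mean square error of prediction (RMSEP) of 0.0406, all better than those of the model built on the full spectral range. Combining Vis-NIR transmission spectroscopy with CARS variable selection thus simplified the palm-oil iodine-value model while preserving the accuracy of iodine-value prediction.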
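CARS shrinks the candidate variable set over a series of Monte Carlo sampling runs whose size follows an exponentially decreasing function (EDF). A minimal pure-Python sketch of that schedule alone, assuming the commonly cited parameterization r_i = a·exp(-k·i) with r_1 = 1 (all p variables kept) and r_N = 2/p. The PLS sub-models and the adaptive reweighted sampling that decide which variables survive are omitted, and the function names are hypothetical.

```python
import math


def cars_retention_ratios(p, n_runs):
    """Fraction of the p variables kept in each of n_runs CARS runs (EDF schedule).

    r_i = a * exp(-k * i), chosen so that r_1 = 1 and r_N = 2 / p.
    Assumes p > 2 and n_runs > 1.
    """
    a = (p / 2) ** (1 / (n_runs - 1))
    k = math.log(p / 2) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


def cars_kept_counts(p, n_runs):
    """Number of variables retained per run (never fewer than 2)."""
    return [max(2, round(p * r)) for r in cars_retention_ratios(p, n_runs)]
```

With p = 1557 and N = 50, the kept count falls from 1557 in the first run to 2 in the last; in CARS the run whose subset gives the lowest RMSECV is taken as the selected variable set.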

2.
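High-resolution extractive electrospray ionization mass spectrometry (EESI-MS) was used for rapid detection of exhaled-breath samples from patients with liver failure and from healthy volunteers. Exhaled-breath metabolic data acquired in multiple batches were modeled statistically with multiblock partial least squares (MB-PLS), and the results were compared with those of the conventional PLS method. The results show that MB-PLS effectively eliminates the influence of batch differences on statistical modeling. In addition, screening variables by their VIP values in the MB-PLS model reduces data redundancy and removes the influence of irrelevant variables, thereby effectively improving model performance.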
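The VIP-based screening step can be illustrated with the standard single-block PLS formula VIP_j = sqrt(p · Σ_a SS_a (w_ja / ||w_a||)² / Σ_a SS_a), where w_a is the weight vector and SS_a the y-variance explained by component a. This sketch assumes the weights and SS values come from an already fitted model; the abstract does not say how VIP is computed at the MB-PLS block or super level, so this is not presented as the authors' procedure.

```python
import math


def vip_scores(weights, ss_y):
    """VIP score of each of the p variables.

    weights: A weight vectors w_a (one per PLS component), each of length p
    ss_y:    A values, the y sum of squares explained by each component
    """
    p = len(weights[0])
    total = sum(ss_y)
    vip = []
    for j in range(p):
        s = 0.0
        for w, ss in zip(weights, ss_y):
            norm2 = sum(v * v for v in w)  # ||w_a||^2
            s += ss * (w[j] * w[j]) / norm2
        vip.append(math.sqrt(p * s / total))
    return vip
```

Because the squared VIP scores always sum to p, their average is 1, which is why VIP > 1 is a common rule-of-thumb cut-off for keeping variables.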

3.
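To study how different spectral inputs affect prediction models of diesel kinematic viscosity, three inputs were derived from the preprocessed diesel spectra: the first 6 principal components (PCs) from principal component analysis (PCA), effective wavelengths (EWs) selected from the partial least squares (PLS) regression coefficient curve, and 14 latent variables (LVs) obtained by partial least squares regression (PLSR) modeling. Each was fed into a least squares support vector machine (LS-SVM). Comparison of the results showed that the LVs-LS-SVM model performed best for diesel kinematic viscosity (R²_Pre = 0.839, RMSEP = 0.317, RPD = 1.54), outperforming the EWs-LS-SVM and PCs-LS-SVM models. This lays a foundation for developing portable instruments for measuring diesel parameters.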
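LS-SVM regression replaces the quadratic program of a standard SVM with one linear system, [0, 1ᵀ; 1, K + I/γ]·[b; α] = [0; y], and predicts with f(x) = b + Σ_i α_i K(x, x_i). A minimal pure-Python sketch, assuming an RBF kernel and hypothetical hyperparameters gamma (regularization) and sigma (kernel width). Each row of X would hold one sample's PC scores, effective-wavelength intensities, or PLS latent variables, and the small Gaussian-elimination solver stands in for a numerical library.

```python
import math


def rbf(x, z, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Train an LS-SVM regressor and return a prediction function."""
    n = len(X)
    # Linear system [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]

    def predict(x):
        return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))

    return predict
```

At a training point the fitted value equals y_j - α_j/γ, so a large gamma makes the model nearly interpolate the calibration data; in practice gamma and sigma are tuned by cross-validation.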

4.
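CARSiPLS, a variable-selection modeling method that combines competitive adaptive reweighted sampling (CARS) with interval partial least squares regression (iPLS), was applied to the near-infrared (NIR) spectroscopic determination of moisture and volatile matter in bituminous coal. CARS was used to progressively select, within each interval, the variables related to the target property, and partial least squares regression models were then built for NIR determination of moisture and volatile matter in bituminous coal. The results show that, compared with PLS and iPLS, CARSiPLS markedly reduces the number of variables while improving predictive performance: the number of modeling variables fell from 1557 to 15 for volatile matter and from 1557 to 317 for moisture; the mean absolute percentage error of prediction fell from 0.0315 to 0.0184 for volatile matter and from 0.1884 to 0.0946 for moisture; and the mean squared error of prediction fell from 0.0108 to 0.0067 for volatile matter and from 0.0050 to 0.0028 for moisture.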
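The figures of merit quoted above can be computed from reference and predicted values with a few helpers. This sketch assumes the usual definitions (the abstract does not define them) and returns MAPE as a fraction; relative_reduction is a hypothetical helper that expresses an error drop as a fraction of the starting value.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error (RMSEC or RMSEP, depending on the sample set)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.0315 would mean 3.15%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(before, after):
    """Fractional drop of an error metric, e.g. baseline model vs. selected-variable model."""
    return (before - after) / before
```

For example, relative_reduction(0.0315, 0.0184) is about 0.416, i.e. the volatile-matter MAPE reported above fell by roughly 42%.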

5.
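A stability competitive adaptive reweighted sampling feature-variable selection algorithm was applied to support vector machine qualitative analysis (support vector machine-stability competitive adaptive reweighted sampling, SVM-SCARS). The algorithm repeatedly resamples the data and builds models to compute a stability value for each variable; because the stability value evaluates a variable's role in modeling more objectively and accurately, it can serve as the criterion of variable importance. In an iterative loop, adaptive reweighted sampling progressively screens the variables, an SVM model is built on the variable subset obtained in each loop, and each subset is evaluated by its correct classification rate of cross validation (CCRCV) to determine the optimal feature-variable subset. Combined with diffuse-reflectance near-infrared spectroscopy, the algorithm was used to build a species-identification model for woods commonly used in pulping and papermaking, achieving rapid identification and classification of 4 eucalyptus species and 2 acacia species. In the end, 15 feature variables were selected to build the classification model, which classified each species with an accuracy of 97.9%, a good classification result. Compared with the full-spectrum model and the support vector machine recursive feature elimination model, SVM-SCARS selected fewer feature variables and gave better predictive performance and stability. The results show that SVM-SCARS can effectively optimize spectral feature variables and improve the robustness and applicability of online NIR analysis models for wood property analysis.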
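The stability value can be sketched as the ratio |mean(b_j)| / sd(b_j) of variable j's model coefficients over many Monte Carlo sub-models, which is the definition commonly given for stability CARS. The sketch assumes the coefficient vectors have already been collected (one list per sub-model) and leaves out the SVM models and the CCRCV evaluation; the function names are hypothetical. A variable whose coefficient changes sign from run to run gets a stability near zero.

```python
import statistics


def stability(coef_runs):
    """Stability |mean(b_j)| / sd(b_j) of each variable across sub-models."""
    p = len(coef_runs[0])
    out = []
    for j in range(p):
        col = [run[j] for run in coef_runs]
        m = statistics.mean(col)
        sd = statistics.stdev(col)  # sample SD; needs at least two sub-models
        if sd > 0:
            out.append(abs(m) / sd)
        else:
            out.append(float("inf") if m != 0 else 0.0)  # constant coefficient
    return out


def rank_by_stability(coef_runs):
    """Variable indices sorted from most to least stable."""
    s = stability(coef_runs)
    return sorted(range(len(s)), key=lambda j: -s[j])
```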

7.
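A topological-index study of the correlation between methylalkane structure and chromatographic retention index   (cited 14 times: 0 self-citations, 14 by others)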
Xiang Zheng, Liang Yizeng, Hu Qiannan. Chinese Journal of Chromatography, 2005, 23(2): 117-122
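A total of 127 topological-index variables were calculated for 207 methylalkanes. The variable selection method GAPLS was introduced into quantitative structure-gas chromatographic retention research to select among the 127 topological-index variables, yielding a 7-variable quantitative structure-retention relationship (QSRR) model. Its squared multiple correlation coefficient was 0.99998 and its standard deviation 2.88; the cross-validated squared correlation coefficient was 0.99997 and the cross-validated standard deviation of prediction 2.95, indicating that the model is stable and reliable. A reasonable structural interpretation of the 7 selected variables was given, showing that the chromatographic retention indices of methylalkanes can be characterized accurately by topological indices alone.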
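GAPLS searches the candidate descriptors with a genetic algorithm whose fitness is a cross-validated PLS statistic. The generic search loop (binary chromosomes, tournament selection, one-point crossover, bit-flip mutation, elitism) can be sketched as below. The fitness function is a hypothetical caller-supplied stand-in, and the population size, generation count, and mutation rate are illustrative, not the settings used in the study.

```python
import random


def ga_select(p, fitness, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search binary masks of length p (True = variable used); return (best, history)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(p)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]

    def tournament():
        a, b = rng.sample(pop, 2)  # binary tournament on the current population
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            mother, father = tournament(), tournament()
            cut = rng.randrange(1, p)  # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit flips
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history
```

In a GAPLS-style use, fitness(mask) would fit a PLS model on the columns where mask is True and return, for example, the negative RMSECV; elitism guarantees the best fitness never decreases between generations.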

8.
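To improve the prediction accuracy and modeling efficiency of near-infrared quantitative analysis, a wavelength selection method based on interactive self-modeling mixture analysis is proposed. Wavelength variables carrying useful information are selected according to the purity and standard deviation of each spectral wavelength variable, and a correlation weighting function is introduced to deal with collinearity between variables. Quantitative calibration models are built with the iteratively selected variables, and the optimal number of wavelength variables is determined by the root mean square error of cross-validation (RMSECV). The method was applied to two groups of NIR spectral data with different glucose contents (a four-component aqueous glucose solution experiment and a human plasma experiment). In each data set only 0.3% of all variables were selected to build the calibration model, and the root mean square error of prediction (RMSEP) for glucose concentration in the validation set decreased to 669 and 15 mg/L, respectively. Compared with calibration models built on the full spectral range and on optimized bands, the method minimizes redundant information through wavelength selection and improves prediction accuracy and modeling efficiency.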
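The purity value that drives the wavelength choice can be sketched with the SIMPLISMA-style definition purity_j = σ_j / (μ_j + α), where μ_j and σ_j are the mean and standard deviation of wavelength j across samples and α is an offset, assumed here to be a percentage of the largest mean, that keeps low-intensity noisy wavelengths from looking pure. The correlation weighting function and the iterative calibration loop are not shown; offset_pct and the use of the population standard deviation are assumptions.

```python
import statistics


def simplisma_purity(X, offset_pct=5.0):
    """Purity of each column of X (rows = samples, columns = wavelengths)."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    alpha = offset_pct / 100.0 * max(means)  # offset relative to the largest mean
    return [s / (m + alpha) for s, m in zip(sds, means)]
```

A wavelength whose intensity does not vary across samples gets purity 0, while one that varies strongly relative to its mean level ranks high.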

9.
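The amino-acid descriptor vector VHSE, derived from the physicochemical properties of amino acids, was used to characterize the structures of 21 oxytocin analogues. Using a variable screening technique that combines stepwise regression with partial least squares, an optimal combination of 9 variables was selected on the basis of the models' external prediction results. A partial least squares model of the uterotonic activity of the 21 oxytocin analogues was then built with this variable combination; its squared multiple correlation coefficient R² was 92.6%, and the leave-one-out and leave-group-out cross-validated Q² values were 78.3% and 79.4%, respectively. The results indicate that the uterotonic activity of oxytocin is closely related mainly to the hydrophobicity, steric structure, and electronic properties of the amino-acid residue at position 3 and to the electronic characteristics of the residue at position 8.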

10.
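Variable selection is an important step in spectral modeling. This study proposes a new variable selection method, self-weighted variable combination population analysis (AWVCPA). First, the variable space is sampled by binary matrix sampling (BMS). Second, two information vectors (IVs), the variable occurrence frequency (Fre) and the partial least squares regression coefficients (Reg), are weighted to obtain the contribution of each spectral variable, so that the influence of both types of IV on spectral modeling is taken into account. Finally, an exponentially decreasing function (EDF) removes wavelength points with small contributions, accomplishing feature-variable selection. Using two NIR data sets, beer and corn, partial least squares (PLS) models were built to predict yeast concentration in beer and oil concentration in corn and were compared with other variable selection methods. The results show that, under the same conditions, the prediction models built with AWVCPA achieved the best prediction accuracy in both cases. For yeast concentration in beer, the RMSEP fell from 0.5348 for the full-spectrum PLS model to 0.1457, an improvement in prediction accuracy of 72.7%; for the oil content of corn, the root mean square error of prediction (RMSEP) fell from 0.0702 for the full-spectrum PLS model to 0.0248, an improvement of 64.7%.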
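Binary matrix sampling, the first step of the method, can be sketched as a K × p 0/1 matrix in which each row defines one sub-model and every column contains exactly K/2 ones, so each wavelength appears in the same number of sub-models. The sketch assumes K is even. variable_frequency computes a Fre-style information vector over a set of best sub-models that, in the real method, would be chosen by PLS cross-validation error (not shown). Both function names are hypothetical.

```python
import random


def binary_matrix_sampling(n_models, p, seed=0):
    """Return an n_models x p 0/1 matrix; row i marks the variables in sub-model i."""
    if n_models % 2:
        raise ValueError("n_models must be even")
    rng = random.Random(seed)
    cols = []
    for _ in range(p):
        col = [1] * (n_models // 2) + [0] * (n_models // 2)
        rng.shuffle(col)  # each variable lands in exactly half of the sub-models
        cols.append(col)
    return [list(row) for row in zip(*cols)]


def variable_frequency(M, best_rows):
    """Share of the chosen best sub-models (row indices) that contain each variable."""
    return [sum(M[r][j] for r in best_rows) / len(best_rows) for j in range(len(M[0]))]
```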

11.
A subspace-projection method is developed to construct orthogonal block variables from several series of topological indices or quantum chemical parameters. With the help of canonical correlation analysis, the orthogonal block variables were used to establish a structure-retention index correlation model. Regressing the retention index on only a few new orthogonal variables obtained by canonical correlation analysis significantly improves both the fitting and the predictive ability of the correlation model. Moreover, the quantitative intercorrelation between different blocks of topological-index variables can also be evaluated with the subspace-projection technique proposed in this work.

12.
The calibration performance of partial least squares for one response variable (PLS1) can be improved by eliminating uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters and which may change during the variable reduction process. In these methods, variables are ranked in descending order of a given variable property and reduction proceeds from that ranking. The methods start with full-spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated, a new PLS model is calculated, and the variables are ranked again. These Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed in which the PLS model complexity can decrease during the variable reduction process. We therefore denote our methods PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested using the absolute PLS regression coefficients as the predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or interval PLS (UVE-iPLS). The performance of the methods is investigated on two near-infrared (NIR) data sets and one simulated data set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test.
The three newly developed PPRVR-CAM methods retained significantly smaller numbers of informative variables than the existing SVR-PPRV, UVE-GA-PLS and UVE-iPLS methods without loss of prediction ability. Unlike UVE-GA-PLS and UVE-iPLS, each PPRV(R) method shows no variability in the number of retained variables. Renewed variable ranking after the deletion of a variable, followed by remodelling and combined with the possibility of decreasing the PLS model complexity, is beneficial. A preferred PPRVR-CAM method is proposed.

13.
In this study, a new variable selection method called bootstrapping soft shrinkage (BOSS) is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of the variables are determined from the absolute values of the regression coefficients. WBS is applied according to these weights to generate sub-models, and MPA is used to analyze the sub-models and update the variable weights. The optimization follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected as optimal. The method was tested on three near-infrared (NIR) spectroscopic datasets: corn, diesel fuel and soy. Three high-performing variable selection methods, Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising, with improved prediction performance. The Matlab code implementing BOSS is freely available at http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
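Two ingredients of this scheme can be sketched without a PLS engine: weighted bootstrap sampling, which draws p variable indices with replacement in proportion to the current weights and keeps the unique ones as a sub-model, and a soft-shrinkage weight update from absolute regression coefficients. This is a simplified sketch, not the published Matlab code: it assumes the coefficients of the best sub-models are supplied as dictionaries (choosing those sub-models by RMSECV is omitted), and the function names are hypothetical.

```python
import random


def weighted_bootstrap(weights, rng):
    """Draw len(weights) indices with replacement by weight; return the unique ones."""
    p = len(weights)
    drawn = rng.choices(range(p), weights=weights, k=p)
    return sorted(set(drawn))


def update_weights(best_coefs, p):
    """New weights = normalised sum of |coefficient| over the best sub-models.

    best_coefs: list of {variable_index: regression_coefficient} dicts.
    Variables absent from every best sub-model get weight 0 and drop out;
    the others are shrunk softly in proportion to their coefficients.
    """
    w = [0.0] * p
    for coef in best_coefs:
        for j, b in coef.items():
            w[j] += abs(b)
    total = sum(w)
    return [x / total for x in w]
```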

14.
Shen Hanxi, Cai Shuowei. Chinese Journal of Analytical Chemistry, 1994, 22(7): 716-719
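In this work, stepwise regression was used to resolve ultraviolet spectral vector data of multicomponent systems whose species are not known in advance, allowing simultaneous qualitative and quantitative analysis of the species and their contents. Resolution of the UV spectral data of two compound drug systems (compound paracetamol and compound vitamin B) shows that the method can effectively achieve simultaneous qualitative and quantitative determination of the species and their contents in mixtures.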
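A forward stepwise search for the species present in a mixture spectrum can be sketched as follows: at each step the library spectrum that most reduces the residual sum of squares is added, and the search stops once no candidate gives a meaningful improvement. This assumes a library of pure-component spectra on the same wavelength grid and uses a simple RSS-improvement threshold (tol, a hypothetical parameter) in place of the F-tests usually used in stepwise regression; the fitted coefficients play the role of the species contents.

```python
def lstsq(cols, y):
    """Least-squares coefficients of y on the given columns (normal equations)."""
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
                rhs[r] -= f * rhs[c]
    return [rhs[i] / A[i][i] for i in range(k)]


def rss(cols, coef, y):
    """Residual sum of squares of y after subtracting sum(coef[k] * cols[k])."""
    return sum((yi - sum(c * col[i] for c, col in zip(coef, cols))) ** 2
               for i, yi in enumerate(y))


def forward_stepwise(library, y, tol=1e-6):
    """Greedily add library spectra that best explain y; return (indices, contents)."""
    selected, current = [], rss([], [], y)
    while len(selected) < len(library):
        best_j, best_r = None, None
        for j in range(len(library)):
            if j in selected:
                continue
            cols = [library[k] for k in selected + [j]]
            r = rss(cols, lstsq(cols, y), y)
            if best_r is None or r < best_r:
                best_j, best_r = j, r
        # Stop when the best remaining candidate barely lowers the residual
        if current - best_r <= tol * max(current, 1.0):
            break
        selected.append(best_j)
        current = best_r
    return selected, lstsq([library[k] for k in selected], y)
```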

17.
Total order ranking (TOR) strategies, which are mathematically based on elementary methods of discrete mathematics, seem to be attractive and simple tools for performing data analysis. Moreover, order-ranking strategies seem to be very useful not only for data exploration but also for developing order-ranking models, a possible alternative to conventional quantitative structure–activity relationship (QSAR) methods. In fact, when the data are affected by uncertainties, order methods can be used as an alternative to statistical methods such as multilinear regression (MLR), because they do not require specific functional relationships between the independent and dependent variables (responses). A ranking model is a relationship between a set of experimentally investigated dependent attributes and a set of independent, calculated attributes (the model attributes). As in regression and classification models, variable selection is one of the main steps in finding predictive models. In this work the genetic algorithm–variable subset selection (GA–VSS) approach is proposed as the variable selection method for searching for the best ranking models within a wide set of variables. The models based on the selected subsets of variables are compared with the experimental ranking and evaluated by Spearman's rank index. A case study is presented on a TOR model developed for polychlorinated biphenyl (PCB) compounds, which have been analysed according to several of their physicochemical properties that play an important role in their environmental impact.
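Spearman's rank index, used above to compare model rankings with the experimental ranking, is the Pearson correlation of the two rank vectors, with tied values given their average rank. A minimal stdlib sketch (function names hypothetical); it is undefined when either input is constant.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```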

18.
Comparative molecular field analysis (CoMFA) with partial least squares (PLS) is one of the most frequently used tools in three-dimensional quantitative structure-activity relationship (3D-QSAR) studies. Although many successful CoMFA applications have proved the value of this approach, its proper application raises some problems. In particular, the inability of PLS to handle a low signal-to-noise ratio (sample-to-variable ratio) has attracted much attention from QSAR researchers as a research target, and several variable selection methods have been proposed. More recently, we developed a novel variable selection method for CoMFA modeling (GARGS: genetic algorithm-based region selection), and its utility was demonstrated in a previous paper (Kimura, T., et al. J. Chem. Inf. Comput. Sci. 1998, 38, 276-282). The purpose of this study is to evaluate whether GARGS can pinpoint known molecular interactions in 3D space. A published set of acetylcholinesterase (AChE) inhibitors was used as a test example. Applying GARGS to this data set yielded several improved models with high internal predictivity and a low number of field variables. External validation was performed to select a final model among them. The coefficient contour maps of the final GARGS model were compared with the properties of the AChE active site, and the consistency between them was evaluated.

19.
General simulated annealing for robust multivariate calibration   (cited 4 times: 0 self-citations, 4 by others)
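Simulated annealing is a global optimization algorithm with a mechanism for escaping local optima. Least absolute deviation is an optimization criterion that is more robust than the commonly used least squares criterion and is better suited to real data sets that may deviate from the normal distribution. This paper explores the feasibility of simultaneously determining the components of multicomponent systems by simulated annealing with the least absolute deviation criterion. Applied to the analysis of two- and three-component drug systems, the method gave satisfactory results. The paper also discusses varying the step size to improve the optimization accuracy of the simulated annealing algorithm.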
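A minimal sketch of simulated annealing under the least-absolute-deviation criterion for a multicomponent mixture: the loss is Σ_i |a_i - Σ_k S_ik c_k| for a measured spectrum a, pure-component spectra S (rows = wavelengths), and concentrations c. The cooling rate, the step size that shrinks with temperature, and the non-negativity clip are illustrative assumptions, not the step-size scheme examined in the paper; the function names are hypothetical.

```python
import math
import random


def l1_loss(S, c, a):
    """Sum of absolute residuals between spectrum a and the model S @ c."""
    return sum(abs(ai - sum(s * ck for s, ck in zip(row, c))) for row, ai in zip(S, a))


def anneal_l1(S, a, n_iter=20000, t0=1.0, cooling=0.9995, step=0.5, seed=0):
    """Minimise the L1 loss over non-negative concentrations; return (best_c, best_loss)."""
    rng = random.Random(seed)
    k = len(S[0])
    c = [0.0] * k
    f = l1_loss(S, c, a)
    best_c, best_f = c[:], f
    t = t0
    for _ in range(n_iter):
        cur_step = step * max(t / t0, 0.01)  # shrink moves as the system cools
        cand = c[:]
        j = rng.randrange(k)
        cand[j] = max(0.0, cand[j] + rng.uniform(-cur_step, cur_step))
        fc = l1_loss(S, cand, a)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(-df/t)
        if fc <= f or rng.random() < math.exp(-(fc - f) / t):
            c, f = cand, fc
            if f < best_f:
                best_c, best_f = c[:], f
        t *= cooling
    return best_c, best_f
```

Accepting some uphill moves while the temperature is high is what lets the search leave local optima; the absolute-value loss keeps a single outlying wavelength from dominating the fit.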

20.
Biomarker discovery is an important goal in metabolomics; it is typically modeled as selecting the most discriminating metabolites for classification and is often referred to as variable importance analysis or variable selection. To date, a number of variable importance analysis methods for discovering biomarkers in metabolomics studies have been proposed. However, because their principles differ, different methods are most likely to produce different variable rankings. Each method generates a variable ranking list, just as an expert presents an opinion, yet the inconsistency between different variable ranking methods is often ignored. A simple and ideal solution to this problem is to take every ranking into account. In this study, a strategy called rank aggregation was employed. It is an indispensable tool for merging individual ranking lists into a single "super"-list that reflects the overall preference or importance within the population. This "super"-list is regarded as the final ranking for biomarker discovery and was used both to discover biomarkers and to select the variable subset with the highest predictive classification accuracy. Nine methods were used, comprising three univariate filtering methods and six multivariate methods. When applied to two metabolic datasets (a childhood overweight dataset and a tubulointerstitial lesions dataset), rank aggregation gave considerably higher prediction accuracy than using all variables. Moreover, it also outperformed the penalized method LASSO (least absolute shrinkage and selection operator), giving higher prediction accuracy or fewer, more interpretable selected variables.
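The abstract does not name the aggregation algorithm, so, as a plainly substituted stand-in, the Borda count below merges several variable rankings: each list awards n points to its top item, n - 1 to the next, and so on, and items are reordered by total points. Items missing from a list score nothing for that list, and ties are broken alphabetically for reproducibility. The function name is hypothetical.

```python
def borda_aggregate(rankings):
    """Merge ranked lists (best first) into one 'super'-list by summed Borda points."""
    items = set().union(*rankings)
    n = len(items)
    score = {item: 0 for item in items}
    for ranking in rankings:
        for pos, item in enumerate(ranking):
            score[item] += n - pos  # top of a list earns the most points
    return sorted(items, key=lambda it: (-score[it], str(it)))
```

Each input list would be one method's variable ranking (e.g. a univariate filter or a multivariate importance measure); variables could then be added in super-list order until cross-validated classification accuracy stops improving.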


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417