Similar literature
 20 similar articles found; search time 31 ms
1.
2.
3.
The possibility of improving the predictive ability of comparative molecular field analysis (CoMFA) through settings optimization has been evaluated. Ten different CoMFA settings are evaluated, producing a total of 6120 models. The method has been applied to nine data sets: the widely used benchmark steroid data set and eight other data sets proposed as QSAR benchmarks by Sutherland et al. (J. Med. Chem. 2004, 47, 5541-5554). All data sets have been studied using training and test sets to allow assessment of both internal (q²) and external (r²pred) predictive ability. CoMFA settings optimization succeeded in developing models with improved q² and r²pred compared with default CoMFA modeling. Optimized CoMFA is compared with comparative molecular similarity indices analysis (CoMSIA) and holographic quantitative structure-activity relationship (HQSAR) models and found to consistently produce models with improved or equivalent q² and r²pred. The ability of settings optimization to improve model predictive ability has been validated using both internal and external predictions, and the risk of chance correlation has been evaluated using response variable randomization tests.
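The internal validation and response randomization test mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' CoMFA workflow: it assumes a single descriptor fitted by ordinary least squares in place of PLS on field variables, and it computes q² as 1 - PRESS/SS with SS taken about the overall mean. The function names and the toy data are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def loo_q2(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS (SS about the overall mean)."""
    n = len(y)
    mean_y = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict sample i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss

def y_randomization(x, y, n_perm=50, seed=0):
    """q2 values obtained after shuffling the response (chance-correlation check)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        scores.append(loo_q2(x, y_perm))
    return scores

# Toy data (assumption): a nearly linear structure-activity trend.
x = [float(i) for i in range(12)]
y = [2.0 * xi + 1.0 + 0.3 * (-1) ** i for i, xi in enumerate(x)]
real_q2 = loo_q2(x, y)
scrambled = y_randomization(x, y)
print(real_q2, max(scrambled))
```

A model whose real q² is clearly higher than every scrambled q² is unlikely to owe its performance to chance correlation.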

4.
5.
A novel strategy is proposed for optimizing wavelet transforms with respect to the statistics of the data set in multivariate calibration problems. The optimization follows a linear semi-infinite programming formulation, which does not suffer from local maxima and can be solved reproducibly with modest computational effort. After the optimization, a variable selection algorithm is employed to choose a subset of wavelet coefficients with minimal collinearity. The selection allows a calibration model to be built by direct multiple linear regression on the wavelet coefficients. In an illustrative application involving the simultaneous determination of Mn, Mo, Cr, Ni, and Fe in steel samples by ICP-AES, the proposed strategy yielded more accurate predictions than PCR, PLS, and nonoptimized wavelet regression.

6.
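Near-infrared (NIR) spectroscopy was used for rapid quantitative determination of trans fatty acids (TFA) in edible vegetable oils, and the TFA prediction model was optimized through band selection, preprocessing, variable selection, and choice of modeling method. NIR transmission spectra of 98 edible vegetable oil samples were collected on an Antaris II Fourier transform NIR spectrometer over the range 4000-10000 cm−1, and the reference TFA content was then determined by gas chromatography. First, the spectral band and preprocessing method were optimized on the raw spectra. On this basis, competitive adaptive reweighted sampling (CARS) was used to select the important TFA-related variables. Finally, principal component regression, partial least squares, and least squares support vector machine models were built to predict TFA content in edible vegetable oils. The results show that NIR determination of TFA in edible vegetable oils is feasible. For the best optimized model, R2 was 0.992 for the calibration set and 0.989 for the prediction set, with RMSEC and RMSEP of 0.071% and 0.075%, respectively. The best model used only 26 variables, 0.854% of the full-band variables. Compared with the full-band partial least squares model, the prediction-set R2 rose from 0.904 to 0.989 and RMSEP fell from 0.230% to 0.075%. Model optimization is therefore necessary: CARS effectively selects the important TFA-related variables and greatly reduces the number of modeling variables, which simplifies the prediction model and substantially improves its accuracy and stability.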

7.
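Generalized simulated annealing for robust multivariate analytical calibration   Total citations: 4, self-citations: 0, citations by others: 4
Simulated annealing is a global optimization algorithm with a mechanism for escaping local optima. Least absolute deviation is an optimization criterion that is more robust than the commonly used least squares and is better suited to real data sets that may deviate from a normal distribution. This paper explores the possibility of determining multicomponent systems simultaneously using simulated annealing with the least absolute deviation criterion. Applied to the analysis of two- and three-component pharmaceutical systems, the approach gave satisfactory results. The paper also discusses improving the optimization precision of simulated annealing by varying the step size.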
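The idea in this abstract can be sketched as follows. This is an illustrative toy, not the paper's generalized simulated annealing: it assumes a plain Metropolis acceptance rule, geometric cooling, and a geometrically shrinking Gaussian step (standing in for the variable step size the abstract mentions). The absorptivity matrix `A`, the mixture signal `b`, and all parameter values are made up for illustration.

```python
import math
import random

def lad_cost(A, b, c):
    """Sum of absolute residuals |b - A c| (least absolute deviation criterion)."""
    return sum(abs(bi - sum(a * cj for a, cj in zip(row, c)))
               for row, bi in zip(A, b))

def anneal_lad(A, b, n_iter=5000, step=0.5, temp=1.0,
               step_decay=0.999, cooling=0.998, seed=0):
    """Minimize lad_cost by simulated annealing with a shrinking step size."""
    rng = random.Random(seed)
    current = [0.0] * len(A[0])
    cur_cost = lad_cost(A, b, current)
    best, best_cost = current[:], cur_cost
    for _ in range(n_iter):
        cand = [cj + rng.gauss(0.0, step) for cj in current]
        cand_cost = lad_cost(A, b, cand)
        delta = cand_cost - cur_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        step *= step_decay   # smaller moves late in the run refine the estimate
        temp *= cooling      # lower temperature makes uphill moves rarer
    return best, best_cost

# Hypothetical two-component system measured at six wavelengths.
A = [[1.0, 0.2], [0.8, 0.4], [0.6, 0.6], [0.4, 0.8], [0.2, 1.0], [0.5, 0.5]]
true_c = [1.0, 0.5]
b = [sum(a * c for a, c in zip(row, true_c)) for row in A]
b[5] += 0.3  # one gross outlier; the L1 criterion should largely ignore it
est, cost = anneal_lad(A, b)
print(est, cost)
```

Because five of the six measurements are exact, the least absolute deviation minimum sits at the true concentrations despite the outlier, whereas a least squares fit would be pulled toward it.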

8.
Comparative molecular field analysis (CoMFA) with partial least squares (PLS) is one of the most frequently used tools in three-dimensional quantitative structure-activity relationship (3D-QSAR) studies. Although many successful CoMFA applications have proved the value of this approach, there are some problems in its proper application. In particular, the inability of PLS to handle the low signal-to-noise ratio (sample-to-variable ratio) has attracted much attention from QSAR researchers, and several variable selection methods have been proposed. More recently, we developed a novel variable selection method for CoMFA modeling (GARGS: genetic algorithm-based region selection), and its utility was demonstrated in a previous paper (Kimura, T., et al. J. Chem. Inf. Comput. Sci. 1998, 38, 276-282). The purpose of this study is to evaluate whether GARGS can pinpoint known molecular interactions in 3D space. We used a published set of acetylcholinesterase (AChE) inhibitors as a test example. Applying GARGS to this data set gave several improved models with high internal predictive ability and a low number of field variables. External validation was performed to select a final model among them. The coefficient contour maps of the final GARGS model were compared with the properties of the active site in AChE, and the consistency between them was evaluated.

9.
Total order ranking (TOR) strategies, which are mathematically based on elementary methods of discrete mathematics, are attractive and simple tools for data analysis. Order-ranking strategies are useful not only for data exploration but also for developing order ranking models, a possible alternative to conventional quantitative structure–activity relationship (QSAR) methods. When the data are characterised by uncertainties, order methods can be used as an alternative to statistical methods such as multilinear regression (MLR), because they do not require specific functional relationships between the independent and dependent variables (responses). A ranking model is a relationship between a set of dependent attributes, experimentally investigated, and a set of independent attributes, i.e. model attributes, which are calculated. As in regression and classification models, variable selection is one of the main steps in finding predictive models. In this work the genetic algorithm–variable subset selection (GA–VSS) approach is proposed as the variable selection method for searching for the best ranking models within a wide set of variables. The models based on the selected subsets of variables are compared with the experimental ranking and evaluated by Spearman's rank index. A case study is presented on a TOR model developed for polychlorinated biphenyl (PCB) compounds, which have been analysed according to some of the physicochemical properties that play an important role in their environmental impact.
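As a small illustration of the evaluation step, the sketch below compares a model ranking with an experimental ranking using Spearman's rank correlation. It assumes the textbook formula for untied data, rho = 1 - 6 Σd² / (n(n² - 1)); whether the paper's "Spearman's rank index" handles ties differently is not stated in the abstract. The function names and example values are hypothetical.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def spearman(a, b):
    """Spearman rank correlation for two equally long sequences without ties."""
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical experimental scores versus scores from a ranking model.
experimental = [0.1, 0.4, 0.9, 1.3, 2.0]
model = [0.5, 0.2, 1.4, 1.1, 3.0]
print(spearman(experimental, model))
```

A value of 1 means the two rankings agree completely and -1 means they are exactly reversed, so candidate variable subsets can be compared by how close their model ranking gets to 1.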

10.
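Zhu Eryi  Lin Yan  Zhuang Zanyong 《分析化学》 (Chinese Journal of Analytical Chemistry) 2007,35(7):973-977
A new variable selection method for partial least squares is proposed. The method uses information generated during PLS regression modeling to delete variables that are redundant or have little influence on the model, thereby simplifying and optimizing the prediction model. When this method was combined with variable dimension expansion to process ICP-MS data for 244 heroin samples seized in three source regions of Yunnan (Kunming, Simao, and Xishuangbanna), the discrimination accuracy of the model improved greatly over the conventional algorithm, exceeding 95%. The resulting model contains few variables, which makes it easy to analyze or interpret the effect of each variable on the model. The method can therefore be used for effective identification of drug origin.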

11.
Recently we proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). Here the same concept is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea is that the instrument channels are grouped into clusters by a clustering algorithm, and the spectral data of each cluster are then subjected to PLS regression. Several real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of variable clustering on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over the other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part of the data from the redundant part, so that a stable model can be built on the informative clusters.

12.
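The feasibility of using near-infrared (NIR) spectroscopy for rapid determination of two process parameters, moisture content and pH, in the solid-state fermentation of Monascus was studied. Because conventional interval-based wavelength selection methods ignore nonlinear factors, a wavelength selection algorithm based on the nonlinear least squares support vector machine (LS-SVM) model was adopted: synergy interval least squares support vector machines (siLS-SVM). The new algorithm was compared with the correlation coefficient method, iPLS, and siPLS. The results show that siLS-SVM combined with an LS-SVM model gave the best predictions: the prediction-set correlation coefficients (Rp) for moisture content and pH were 0.9621 and 0.9761, and the root-mean-square errors of prediction (RMSEP) were 0.0129 and 0.1452, respectively, indicating good fit and predictive performance. Rapid NIR determination of moisture content and pH in Monascus solid-state fermentation is feasible, and the method provides a technical basis for online monitoring of process parameters and optimization of fermentation conditions.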

13.
Near infrared (NIR) spectroscopy based on effective wavelengths (EWs) and chemometrics was proposed to discriminate varieties of fruit vinegar, including aloe, apple, lemon and peach vinegars. One hundred eighty samples (45 for each variety) were selected randomly for the calibration set, 60 samples (15 for each variety) for the validation set, and 24 samples (6 for each variety) for the independent set. Partial least squares discriminant analysis (PLS-DA) and least squares-support vector machine (LS-SVM) were implemented for calibration models. Different input data matrices for LS-SVM were determined by latent variables (LVs) selected by explained variance, and by EWs selected by x-loading weights, regression coefficients, modeling power and independent component analysis (ICA). The LS-SVM models were then developed with a grid search technique and an RBF kernel function. All LS-SVM models outperformed the PLS-DA model, and the optimal LS-SVM model was achieved with EWs (4021, 4058, 4264, 4400, 4853, 5070 and 5273 cm−1) selected by regression coefficients. The determination coefficient (R2), RMSEP and total recognition ratio with cutoff value ±0.1 in the validation set were 1.000, 0.025 and 100%, respectively. The overall results indicated that regression coefficients were an effective way to select effective wavelengths. NIR spectroscopy combined with LS-SVM models was able to discriminate the varieties of fruit vinegars with high accuracy.

14.
15.
F Gan  Q Xu  L Zhang  Y Liang 《Analytical sciences》2001,17(7):869-873
In this paper, a new optimization strategy is put forward which locates as many potential unimodal regions as possible in the search space. The potential optima can be further explored by a global optimization method for searching in the identified unimodal regions. The proposed strategy was evaluated by the optimization of test functions. The results obtained by this approach are comparable with those achieved by variable step size generalized simulated annealing (VSGSA) and a genetic algorithm (GA). Finally, we used this strategy in a clustering analysis of a tobacco data set.

16.
A modified version of the ant colony optimization (ACO) algorithm is proposed to select variables in QSAR modeling and to predict the inhibiting action of some diarylimidazole derivatives on the cyclooxygenase (COX) enzyme. For comparison, an evolution algorithm (EA) was also tested. Experimental results demonstrate that the modified ACO is a useful tool for variable selection that needs few parameters to be adjusted and converges quickly toward the optimal position.

17.
In this paper, we propose a wavelength selection method based on random decision particle swarm optimization with attractor for quantitative analysis of near-infrared (NIR) spectra. The proposed method is combined with partial least squares (PLS) to construct a prediction model. The method chooses either the particle's own current optimum or the current global optimum to calculate the attractor. The particle then updates its flight velocity using the attractor, and the particle state is updated by a random decision based on the new velocity. The root-mean-square error of cross-validation is adopted as the fitness function. To demonstrate the usefulness of the proposed method, PLS with all wavelengths, uninformative variable elimination by PLS, elastic net, genetic algorithm combined with PLS, discrete particle swarm optimization combined with PLS, modified particle swarm optimization combined with PLS, neighboring particle swarm optimization combined with PLS, and the proposed method are used to build quantitative component analysis models for NIR spectral data sets, and the effectiveness of these models is compared. Two application studies are presented, involving NIR data obtained from a meat content determination experiment and from a combustion procedure. The results verify that the proposed method has higher predictive ability for NIR spectral data while selecting fewer wavelengths. The proposed method converges faster and can overcome the premature convergence problem. Furthermore, although improving prediction precision may come at some cost in model complexity, the proposed method overfits only slightly. Copyright © 2015 John Wiley & Sons, Ltd.

18.
19.
Derivation of quantitative structure-activity relationships (QSAR) usually involves computational models that relate a set of input variables describing the structural properties of the molecules for which the activity has been measured to the output variable representing activity. Many of the input variables may be correlated, and it is therefore often desirable to select an optimal subset of the input variables that results in the most predictive model. In this paper we describe an optimization technique for variable selection based on artificial ant colony systems. The algorithm is inspired by the behavior of real ants, which are able to find the shortest path between a food source and their nest using deposits of pheromone as a communication agent. The underlying basic self-organizing principle is exploited for the construction of parsimonious QSAR models based on neural networks for several classical QSAR data sets.
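The pheromone mechanism described here can be sketched in a few lines. This is a simplified illustration, not the published algorithm: it assumes fixed-size subsets of two descriptors, replaces the neural network with the R² of a two-descriptor linear model as the fitness, and uses an evaporation rate `rho` and pheromone floor `tau_min` chosen arbitrarily. All names and the synthetic data are hypothetical.

```python
import random

def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def pair_r2(X, y, subset):
    """R^2 of a two-descriptor linear model, computed from pairwise correlations."""
    i, j = subset
    r1, r2, r12 = pearson(X[i], y), pearson(X[j], y), pearson(X[i], X[j])
    return (r1 * r1 + r2 * r2 - 2 * r1 * r2 * r12) / (1 - r12 * r12)

def sample_subset(tau, k, rng):
    """Draw k distinct variables with probability proportional to pheromone."""
    pool = list(range(len(tau)))
    chosen = []
    for _ in range(k):
        pick = rng.choices(pool, weights=[tau[v] for v in pool])[0]
        chosen.append(pick)
        pool.remove(pick)
    return tuple(sorted(chosen))

def aco_select(n_vars, fitness, k=2, n_ants=30, n_iter=40,
               rho=0.1, tau_min=0.2, seed=0):
    """Return the best subset found and its fitness."""
    rng = random.Random(seed)
    tau = [1.0] * n_vars
    best, best_fit = None, float("-inf")
    for _ in range(n_iter):
        scored = [(fitness(s), s)
                  for s in (sample_subset(tau, k, rng) for _ in range(n_ants))]
        it_fit, it_best = max(scored)
        if it_fit > best_fit:
            best, best_fit = it_best, it_fit
        # Evaporate everywhere (with a floor), then deposit on the iteration-best subset.
        tau = [max(tau_min, (1 - rho) * t) for t in tau]
        for v in it_best:
            tau[v] += it_fit
    return best, best_fit

# Synthetic data (assumption): six descriptors stored as rows; only 0 and 3 drive y.
data_rng = random.Random(1)
n = 200
X = [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(6)]
y = [X[0][t] + X[3][t] + 0.1 * data_rng.gauss(0, 1) for t in range(n)]
best, best_fit = aco_select(6, lambda s: pair_r2(X, y, s))
print(best, best_fit)
```

Passing the fitness as a callable keeps the search loop independent of the model, so a cross-validated score from any regression or neural network routine could be substituted for `pair_r2`.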

20.
