Similar literature
1.
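A new peptide sequence descriptor, SZOTT, was derived by principal component analysis of 1369 property parameters of the 20 natural amino acids. It was used to characterize 71 peptide sequences of different lengths, and quantitative structure-retention models (QSRM) were built with partial least squares (PLS) and support vector machines (SVM). The study shows that SZOTT characterizes the 71 peptide sequences well, carries a large amount of information, and is easy to use. Compared with PLS, SVM showed stronger fitting ability and good external predictive ability when modeling lg k. The SZOTT descriptor together with SVM modeling can be applied further to studies of the HPLC retention behavior of peptides.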

2.
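In view of the inherently nonlinear character of gasoline octane number prediction, a principal component regression residual artificial neural network (PCRRANN) calibration algorithm is proposed for calibrating models that predict gasoline octane number from near-infrared measurements. The method incorporates the principal component regression (PCR) algorithm. Compared with classical linear calibration algorithms (partial least squares (PLS) and PCR) and with nonlinear PLS (NPLS), prediction is clearly improved. The possible influence of the number of PCR principal components and of the training parameters on the prediction model is also discussed.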

3.
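An improved quantitative structure-activity relationship (^mmQSAR) approach is proposed: a more reliable QSAR model is built on the premise that the interaction mode between ligand and receptor is known, which overcomes some drawbacks of the conventional practice of building predictive models only from information on the sample-set molecules themselves. The idea is applied to the affinity of functional peptides/proteins. A genetic algorithm (GA) screens virtual receptor binding sites and interaction modes and is combined with partial least squares (PLS) latent-variable multivariate modeling. The cross-validated root-mean-square error (RMScv) serves as the fitness function for selecting variable positions and receptor residues, the population is evolved, and the best individual is taken as the final model. The result is a novel QSAR method (^mmQSAR): genetic virtual screening based on functional peptide/protein interaction modes (provisionally named GVSPPC). The method resolves a difficulty common to many QSAR studies, namely that in most cases the receptor structure is unknown, so the way the ligand binds to it is hard to determine. GVSPPC was tested on biologically functional oligopeptide and polypeptide systems. The results show that GVSPPC gave better QSAR results than conventional methods (Rcu^2 > 0.75~0.91, Qcv^2 > 0.71~0.86, ERMs of 0.19~0.95), clarified the mechanism of ligand-receptor interaction, and has a fairly clear physical meaning. In addition, the interaction energies of the selected modes are computed and used to build PLS models against sample activity, with the cross-validated root-mean-square error (ERMSCV) as the fitness function for population evolution. Iteration continues on a large scale until a specified termination condition is met or the maximum number of generations is reached, and the best individual from this process is taken as the final QSAR model.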

4.
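The structures of a series of 59 aromatic molecules were optimized at the HF/6-31G* level, and molecular electrostatic potentials and their derived parameters were computed on the optimized structures. Multiple linear regression was used to establish a quantitative relationship between molecular structure and the equilibrium constants for adsorption of organic pollutants on carbon nanotubes. The results show that the molecular surface electrostatic potential parameters (Vmin, σ+^2 and ΣVind^+), combined with the molecular surface area (S) and the lowest unoccupied molecular orbital energy (εLUMO), are well suited to building quantitative structure-property relationship (QSPR) models of carbon nanotube adsorption. All parameters in the model have a clear physical meaning, and their relevance can be explained in terms of the interactions between the pollutant and the carbon nanotube or water molecules. The stability and predictive ability of the model were confirmed by leave-one-out and Monte Carlo cross-validation. Nonlinear models relating these parameters to carbon nanotube adsorption were also built with three methods: support vector machines (SVM), least-squares support vector machines (LSSVM) and Gaussian processes (GP). The SVM and LSSVM models showed strong fitting ability, but their predictive ability was clearly worse than that of the other models. The GP model was best in both fitting and prediction but was not clearly better than the linear model, which indicates that for the system studied here the relationship between molecular structure and property is mainly linear.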

5.
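A support vector machine method based on unsupervised discriminative projection feature selection (UDPFS-SVM) is proposed for biomarker screening. UDPFS-SVM first uses the unsupervised discriminative projection algorithm (UDPFS) to adaptively obtain a sparse discriminative projection matrix by introducing prior classification information and adding constraints such as regularization and penalty functions. It then derives the corresponding low-dimensional metabolic matrix from that matrix, and finally builds a support vector machine (SVM) classification model to find biomarkers. The proposed method performs fuzzy learning and sparse learning at the same time and makes reasonable use of the dependencies among variables. UDPFS-SVM and partial least squares discriminant analysis (PLS-DA) were used for variable selection on plasma metabolomics data from hyperlipidemic rats, and the selected biomarkers were evaluated by analysis of variance, ROC curves and linear discriminant analysis (LDA). Both methods found 8 biomarkers. Analysis of variance showed that the biomarkers obtained by UDPFS-SVM all differed significantly, with significance values all larger than those for PLS-DA. The ROC result for UDPFS-SVM was 1.00, which is 0.05 higher than for PLS-DA. LDA showed that the biomarkers obtained by UDPFS-SVM better eliminate within-group metabolic differences and distinguish between-group metabolic differences in hyperlipidemia samples. These results indicate that UDPFS-SVM outperforms PLS-DA for discovering hyperlipidemia biomarkers and offers a new approach to biomarker discovery.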

6.
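Shao Xueguang, Chen Da, Xu Heng, Liu Zhichao, Cai Wensheng. Chinese Journal of Chemistry (《中国化学》), 2009, 27(7): 1328-1332
Partial least squares (PLS) plays an important role in quantitative near-infrared (NIR) spectroscopic analysis, but its predictions are easily affected by factors such as the partitioning of samples and the presence of outliers, so its robustness is limited. Multi-model PLS (EPLS) improves model robustness, but it cannot identify outliers among the samples. To improve prediction accuracy and robustness at the same time, this paper proposes a multi-model PLS method that resamples according to sampling probabilities, called robust consensus PLS (RE-PLS). The method computes sample regression residuals by iteratively reweighted partial least squares (IRPLS) to obtain a sampling probability for each calibration sample, selects training subsets according to these probabilities to build multiple PLS models, and averages the predictions of all PLS models to give the final prediction. The method was applied to NIR modeling of two different plant samples and compared with conventional PLS and EPLS. The results show that it effectively avoids the influence of outliers in the calibration set while improving prediction accuracy and robustness. For real tobacco samples with complex NIR spectra and many outliers, modeling with simple PLS or EPLS does not predict well, whereas RE-PLS is expected to find wide use in quantitative analysis of such complex spectra.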

7.
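The application of support vector machine (SVM) regression, which is based on statistical learning theory, to quantitative X-ray fluorescence spectrometry was studied. Thirty-nine farmland soil samples were used as experimental material, 32 of them as the calibration set, and three SVM kernel functions (Linear, Poly and RBF) were used to build regression models between As content and the fluorescence spectral data. The three models were used to predict the As content of the 7 soil samples in the prediction set. The correlation coefficients R2 between the model-predicted As content and the As content determined by inductively coupled plasma optical emission spectrometry were all greater than 0.99, and the ratio of performance to deviation (RPD) was greater than 3 in every case, indicating that the SVM models have good practical value. To examine the predictive performance of the SVM regression models further, they were compared with the more established PLS regression model. The SVM predictions were better, indicating that SVM regression can also be used for quantitative prediction in portable X-ray fluorescence spectrometry.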
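The two figures of merit quoted in this abstract can be computed from reference and predicted values as in the minimal sketch below. It assumes the common definitions: R2 as the coefficient of determination and RPD as the standard deviation of the reference values divided by the root-mean-square error of prediction. The abstract does not say which exact formulas the authors used, and the numbers in the usage lines are made up for illustration.

```python
import math


def rmsep(reference, predicted):
    """Root-mean-square error of prediction."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def r_squared(reference, predicted):
    """Coefficient of determination (assumed meaning of R2 here)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def rpd(reference, predicted):
    """Ratio of performance to deviation: sample SD of reference / RMSEP."""
    n = len(reference)
    mean_ref = sum(reference) / n
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in reference) / (n - 1))
    return sd / rmsep(reference, predicted)


# Hypothetical reference and predicted contents (not data from the paper).
reference_values = [10.0, 20.0, 30.0, 40.0]
predicted_values = [11.0, 19.0, 31.0, 39.0]
print(r_squared(reference_values, predicted_values))
print(rpd(reference_values, predicted_values))
```

An RPD above 3, as reported in the abstract, means the prediction error is less than a third of the spread of the reference values.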

9.
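Spectroscopic analysis is simple, fast and non-destructive, and is widely used for qualitative and quantitative analysis of complex systems. However, spectra often contain hundreds or thousands of wavelength points, some of which are unrelated to the target property; these increase the computational load and lower the prediction accuracy of the model. Variable selection is therefore needed before a model is built. The least absolute shrinkage and selection operator (LASSO) can shrink regression coefficients to 0 and thereby achieve variable selection. This study applied LASSO to variable selection for near-infrared spectra of ternary blended oil samples and Raman spectra of biological samples. Based on partial least squares (PLS) and multiple linear regression (MLR) models, the contents of sesame oil and sarcosine, respectively, were determined quantitatively, and the results were compared with three other variable selection methods: uninformative variable elimination-PLS (UVE-PLS), Monte Carlo combined with UVE-PLS (MCUVE-PLS) and randomization test-PLS (RT-PLS). The results show that LASSO-based variable selection retained the fewest variables and ran fastest. For the ternary blended oil samples, LASSO-PLS gave the most accurate predictions; for the biological samples, LASSO-MLR gave the most accurate predictions. LASSO-based variable selection is therefore promising for spectroscopic analysis.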
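The way LASSO drives coefficients to exactly 0 can be illustrated with a small coordinate-descent sketch. This is a generic textbook formulation (minimizing (1/2n)·||y − Xw||² + λ·||w||₁ on centered data without an intercept), not the implementation used in the study. The toy "spectra", the function names and the value of λ are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the LASSO objective (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Hypothetical data: 4 samples, 3 "wavelengths"; y depends strongly on the
# first variable, very weakly on the second, and not at all on the third.
X = [[1.0, 1.0, 1.0],
     [-1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [3.05, -2.95, 2.95, -3.05]

coefficients = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, c in enumerate(coefficients) if c != 0.0]
print(coefficients, selected)
```

Variables whose coefficient ends at 0.0 are dropped; the retained ones would then be passed to a PLS or MLR model as the abstract describes.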

10.
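A new method for determining proteins was established. In Britton-Robinson (B-R) buffer at pH 3.6, protein interacts with tetracarboxy nickel phthalocyanine, NiPc(COOH)4, which enhances the resonance light scattering (RLS) of the system at λ = 388 nm, and the enhanced scattering intensity (IRLS) is proportional to the protein content. On this basis, total protein in human serum was determined by resonance light scattering with NiPc(COOH)4 as the spectral probe, and the experimental parameters for light-scattering detection were optimized. Under optimal conditions, the linear ranges for bovine serum albumin (BSA), human serum albumin (HSA) and human serum total protein (TP) were 0.00~1.20 mg/L, 0.00~1.00 mg/L and 0.00~1.00 mg/L, with detection limits of 5.97×10^-4 mg/L, 2.90×10^-4 mg/L and 4.76×10^-4 mg/L, respectively. The method was applied to the determination of total protein in real human serum samples, and the results agreed satisfactorily with those of the Coomassie Brilliant Blue method.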

11.
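Support vector machine classification and regression for QSAR studies of peptides   Cited by: 4 (self-citations: 0, citations by others: 4)
Zhou Peng, Zeng Hui, Li Bo, Zhou Yuan, Li Zhiliang. Huaxue Tongbao (《化学通报》), 2006, 69(5): 342-346
Support vector machines were used for classification and regression studies of two peptide systems and were compared systematically with K-nearest neighbors, multiple linear regression, partial least squares and artificial neural networks. The results show that for small-sample, nonlinear problems, support vector machines are stable and generalize well, and in most cases give better models than the traditional methods. For the classification problem, the support vector machine reached 100% classification accuracy on both the training set and the test set. For the regression problem, the support vector machine fitted the training samples slightly less well than the artificial neural network but showed stronger predictive ability on the external test set.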

12.
This paper introduces a technique to visualise the information content of the kernel matrix and a way to interpret the ingredients of the Support Vector Regression (SVR) model. Recently, the use of Support Vector Machines (SVM) for solving classification (SVC) and regression (SVR) problems has increased substantially in the field of chemistry and chemometrics. This is mainly due to its high generalisation performance and its ability to model non-linear relationships in a unique and global manner. Modeling of non-linear relationships is enabled by applying a kernel function. The kernel function transforms the input data, usually non-linearly related to the associated output property, into a high dimensional feature space where the non-linear relationship can be represented in a linear form. Usually, SVMs are applied as a black box technique. Hence, the model cannot be interpreted in the way that, e.g., Partial Least Squares (PLS) can. For example, the PLS scores and loadings make it possible to visualise and understand the driving force behind the optimal PLS machinery. In this study, we have investigated the possibilities to visualise and interpret the SVM model. Here, we have focused exclusively on Support Vector Regression to demonstrate these visualisation and interpretation techniques. Our observations show that we are now able to turn an SVR black box model into a transparent and interpretable regression modeling technique.
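As a rough illustration of what a kernel matrix is, the sketch below builds one for a few samples and prints it row by row. It assumes a Gaussian (RBF) kernel, K[i][j] = exp(-gamma * ||x_i - x_j||^2), which is one common choice; the abstract does not state which kernel or which visualisation the authors used. The sample values and gamma are hypothetical.

```python
import math


def rbf_kernel_matrix(samples, gamma):
    """Return the n x n Gaussian (RBF) kernel matrix of a list of feature vectors."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(samples[i], samples[j])
            )
            kernel[i][j] = math.exp(-gamma * squared_distance)
    return kernel


# Hypothetical two-variable samples (not data from the paper).
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = rbf_kernel_matrix(samples, gamma=0.5)

# Crude text "visualisation": entries near 1 mean similar samples, near 0 dissimilar.
for row in K:
    print(" ".join(f"{value:.3f}" for value in row))
```

Each entry measures the similarity of a pair of samples in the implicit feature space, so plotting the matrix (for example as a heat map) shows which samples the model treats as alike.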

13.
The optimized geometries and vibrational frequencies of H2CO, HCONH2 and the three resulting H2CO···HCONH2 complexes have been calculated by the ab initio method at the MP2/6-31G(d) and MP2(FC)/6-311++G(d,p) levels. The non-minimum structures with negative vibrational frequencies are excluded. The lowest energy conformer of these complexes is a cyclic structure with N-H···O and C-H···O hydrogen bonds in a common plane. No significant changes are observed in the geometries of the monomers in their complexed state. The most characteristic geometrical properties of the complex are the lengthening of the contacting N-H bonds by 0.4-1.1 pm and the general shortening of the contacting C-H bonds by 0.3-0.4 pm with respect to the monomers. The interaction energies of the complexes have been corrected for the basis set superposition error (BSSE) using the full Boys-Bernardi counterpoise correction scheme. The corrected interaction energies of the three structures at the MP2/6-311++G(2df,3p)//MP2(FC)/6-311++G(d,p) level are -29.94, -16.10 and -18.45 kJ/mol, respectively. The interaction energies indicate that C-H···O is a weak hydrogen bond. The results of natural bond orbital population analysis reveal that there is only a small charge transfer in the process of forming the complexes. The results of natural bond orbital analysis and the atoms-in-molecules scheme appear quite significant in view of their importance for understanding the mechanisms of intermolecular interaction leading to hydrogen bonding. The results of molecular interaction energy decomposition analysis show that the electrostatic interaction plays an essential role in stabilizing the H2CO···HCONH2 complexes.
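For reference, the Boys-Bernardi counterpoise-corrected interaction energy mentioned in this abstract is usually written as below. This is the standard textbook form, given here as an aid to reading the abstract rather than as the authors' exact working equation. The symbols are assumptions of this note: A and B are the two monomers, and the superscript marks that every energy is evaluated in the full basis set of the complex AB, with the monomers held at the geometries they have in the complex.

```latex
\Delta E_{\mathrm{int}}^{\mathrm{CP}}
  = E_{AB}^{AB}(AB) - E_{A}^{AB}(A) - E_{B}^{AB}(B)
```

Because each monomer energy is computed with the partner's basis functions present as ghost orbitals, the artificial stabilization that the complex gains from its larger basis set cancels in the difference.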

14.
Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR). Advantages of PLS compared to MLR are illustrated with typical applications. The genetic algorithm (GA) is a novel optimization technique which can be used as a search engine in variable selection. A novel hybrid approach comprising GA and PLS for variable selection developed in our group (GAPLS) is described. The more advanced method for comparative molecular field analysis (CoMFA) modeling called GA-based region selection (GARGS) is described as well. Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with some representative examples. GA can be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools in chemometrics and QSAR.

16.
17.
Fourier transform-near infrared (FT-NIR) and FT-Raman spectrometries have been used to design partial least squares (PLS) calibration models for the determination of the ethanol content of ethanol fuel and alcoholic beverages. In the FT-NIR measurements the spectra were obtained using air as reference, and the spectral region for PLS modeling was selected based on the spectral distribution of the relative standard deviation in concentration. In the FT-Raman measurements hexachloro-1,3-butadiene (HCBD) was used as an external standard. In the PLS/FT-NIR modeling for ethanol fuel analysis 50 ethanol fuel standards (84.9-100% (w/w)) were used (25 in the calibration, 25 in the validation). In the PLS/FT-Raman modeling 25 standards were used (13 in the calibration, 12 in the validation). The PLS/FT-NIR and FT-Raman models for beverage analysis made use of 24 standards (0-100% (v/v)). Twelve of them contained sugars (1-5% (w/w)); one-half was used in the calibration and the other half in the validation. Different spectral pre-processing methods were used in the PLS modeling, depending on the type of sample investigated. In the ethanol fuel analysis the FT-NIR pre-processing was a 17-point smoothed first derivative, and for beverages no spectral pre-processing was used. The FT-Raman spectra were pre-processed by vector normalization in the ethanol fuel analysis and by a second derivative (17-point smoothing) in the beverage analysis. The PLS models were used in the analysis of real ethanol fuel and beverage samples. A t-test has shown that the FT-NIR model has an accuracy equivalent to that of the reference method (ASTM D4052) in the analysis of ethanol fuel, while in the analysis of beverages the FT-Raman model presents an accuracy equivalent to the reference method. The limits of detection for the NIR and Raman calibration models were 0.05 and 0.2% (w/w), respectively. It has also been shown that both techniques give better results than gas chromatography (GC) in evaluating the ethanol content of beverages.

18.
19.
Advances in technology make it possible to collect massive amounts of information in the form of multiple variables per object. The use of multivariate approaches for modeling real-life phenomena is natural in such situations. There are numerous multivariate approaches in the literature, and it is a challenge to stay updated on the possibilities. Partial least squares (PLS) is one of the many modeling approaches for high-throughput data, and its use in different fields to address a variety of problems has increased in recent years. We therefore present an overview of PLS applications. The objective of this paper is to give a comprehensive overview of the advances in the PLS algorithm together with its applications for regression, classification, variable selection, and survival analysis problems covering genomics, chemometrics, neuroinformatics, process control, computer vision, econometrics, environmental studies, and so on. We have mainly presented different PLS approaches and their applications, so that readers can easily see how PLS might be used in their own field. For further reading, literature references together with software availability are provided. Copyright © 2015 John Wiley & Sons, Ltd.

20.
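The ligand 2-phenyl-4-selenazolecarboxylic acid (HL) and six corresponding transition-metal complexes, [ML2(H2O)2] (M = Co, Ni, Cu, Cd) 1~4, [ZnL2] (5) and [CuL2(py)2] (6), were synthesized. The composition of the complexes was determined by elemental analysis, infrared spectroscopy and thermogravimetric analysis. The structures of complexes 3 and 6 were determined by single-crystal X-ray diffraction. The strength and mode of their interaction with DNA were studied in a preliminary way with an ethidium bromide fluorescent probe. The antibacterial activity of the ligand and the complexes was examined against six bacteria, Escherichia coli (E. coli JM109), Escherichia coli (E. coli DH5α), Staphylococcus epidermidis (S. epidermidis), Staphylococcus aureus (S. aureus), Acinetobacter baumannii (A. baumannii) and viridans streptococci (S. viridans), together with their in vitro inhibition of the proliferation of the normal cell line 293T and the tumor cell line RAW264.72.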
