17 similar documents found
1.
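Combined Prediction Based on SVR and k-Nearest-Neighbor Groups and Its Application in QSAR Total citations: 1, self-citations: 0, other citations: 1
To improve the prediction accuracy of quantitative structure-activity relationship (QSAR) studies, a new nonlinear combined-prediction method was developed. It screens molecular structure descriptors nonlinearly with support vector regression (SVR) and combines predictions from k-nearest-neighbor groups. First, following the principle of minimum mean squared error (MSE), the descriptors are screened nonlinearly with SVR by leave-one-out (LOO) cross-validation and multiple rounds of elimination of the worst-ranked descriptor, which yields the optimal kernel function and the corresponding retained descriptors. Second, based on the Euclidean distances between the retained-descriptor vectors of each test sample and those of the training samples, the heterogeneity of the sample set is represented by the double leave-one-out predictions of submodels built on different k-nearest-neighbor groups. Third, again by minimum MSE, the neighbor-group submodels are screened nonlinearly with SVR through LOO and multiple rounds of elimination of the worst-ranked submodel, yielding the optimal kernel function and the retained submodels. Finally, the combined prediction is made from the retained submodels by double leave-one-out. A QSAR case study of the toxicity of substituted anilines and phenols to Daphnia magna showed that the new method had the highest prediction accuracy among all reference models and captured the nonlinear relationship between descriptors and compound toxicity more finely. The method minimizes structural risk, is nonlinear, and suits small samples. It also effectively overcomes overfitting, the curse of dimensionality, and local minima. It screens descriptors and submodels nonlinearly, performs nonlinear combined prediction, and automatically selects the optimal kernel function and its parameters, with excellent generalization and high prediction accuracy. It therefore has broad application prospects in QSAR research.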
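The descriptor-screening step (minimum LOO MSE with multiple rounds of eliminating the worst-ranked descriptor) can be sketched as below. To stay dependency-free, this sketch swaps in a plain k-nearest-neighbor regressor on Euclidean distance for the SVR used in the paper and does not search over kernel functions. The helpers `knn_predict`, `loo_mse`, and `backward_eliminate` are hypothetical names, not the authors' code.

```python
import math

def knn_predict(X_train, y_train, x, k, cols):
    """Mean target of the k training samples nearest to x (Euclidean over `cols` only)."""
    dists = []
    for xi, yi in zip(X_train, y_train):
        d = math.sqrt(sum((xi[c] - x[c]) ** 2 for c in cols))
        dists.append((d, yi))
    dists.sort(key=lambda t: t[0])  # stable sort: ties keep training-set order
    nearest = dists[:k]
    return sum(yv for _, yv in nearest) / len(nearest)

def loo_mse(X, y, cols, k=3):
    """Leave-one-out MSE of the k-NN regressor restricted to descriptor columns `cols`."""
    errors = []
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        pred = knn_predict(X_tr, y_tr, X[i], k, cols)
        errors.append((pred - y[i]) ** 2)
    return sum(errors) / len(errors)

def backward_eliminate(X, y, k=3):
    """Drop one descriptor per round as long as doing so does not raise the LOO MSE."""
    cols = list(range(len(X[0])))
    best = loo_mse(X, y, cols, k)
    while len(cols) > 1:
        # Try removing each remaining descriptor and keep the removal with the lowest MSE.
        trials = [(loo_mse(X, y, [c for c in cols if c != drop], k), drop) for drop in cols]
        mse, drop = min(trials)
        if mse > best:
            break  # every possible removal makes LOO MSE worse, so stop
        best = mse
        cols.remove(drop)
    return cols, best
```

For example, if the target depends only on descriptor 0 and descriptor 1 is large-scale noise, the first round drops descriptor 1 because it dominates the distances.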
2.
3.
4.
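QSAR Modeling Based on Geostatistics and Support Vector Regression Total citations: 4, self-citations: 0, other citations: 4
Based on principal component analysis (PCA), geostatistics (GS), and support vector regression (SVR), a new individualized prediction method for quantitative structure-activity relationships (QSAR), Weight-PCA-GS-SVR, is proposed. Its basic idea is as follows. PCA first reduces dimensionality and removes information redundancy among the independent variables. SVR then performs nonlinear principal-component screening to discard components unrelated to the dependent variable. The retained principal components are used to compute weighted distances between samples, and high-dimensional GS determines a common range. Each test sample, taken as the center, then selects from the training set its own private k nearest neighbors, namely those whose weighted distance is smaller than the common range, and an SVR model trained on them completes the individualized prediction. Weight-PCA-GS-SVR optimizes the model along both rows and columns, provides a new weighting method for the independent variables, offers a new approach to the difficult problem of choosing the optimal k nearest neighbors, and retains the original advantages of SVR. Validation on three compound-activity datasets showed that the new method achieved the highest prediction accuracy among all reference models and clearly outperformed results reported in the literature. Weight-PCA-GS-SVR has fairly broad application prospects in QSAR and other regression-prediction fields.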
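The individualized-prediction step can be sketched as follows, assuming the retained PCA scores and per-component weights are already computed. `semivariogram` is the classic empirical estimator, gamma(h) = sum((y_i - y_j)^2) / (2 N(h)), over binned weighted distances. Reading the common range off it (for example by fitting a variogram model) is not shown, so `common_range` is passed in as a hypothetical parameter. The neighbor mean stands in for the SVR that the paper trains on each private neighborhood, and all function names are hypothetical.

```python
import math

def weighted_distance(a, b, w):
    """Weighted Euclidean distance between two score vectors (weights w assumed given)."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def semivariogram(X, y, w, bin_width, n_bins):
    """Empirical semivariogram per distance bin; None where a bin has no sample pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            b = int(weighted_distance(X[i], X[j], w) // bin_width)
            if b < n_bins:
                sums[b] += (y[i] - y[j]) ** 2
                counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

def private_neighbors(x, X_train, w, common_range):
    """Indices of training samples whose weighted distance to x is below the common range."""
    return [i for i, xi in enumerate(X_train) if weighted_distance(x, xi, w) < common_range]

def local_predict(x, X_train, y_train, w, common_range):
    """Individualized prediction from x's private neighbors (mean replaces SVR here)."""
    idx = private_neighbors(x, X_train, w, common_range)
    if not idx:
        return None  # no training sample falls inside the range
    return sum(y_train[i] for i in idx) / len(idx)
```

Because each test sample gets its own neighbor set, the effective k varies from sample to sample, which is the point of replacing a fixed k with a distance threshold.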
5.
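High-Dimensional Feature Selection and Peptide QSAR Modeling Based on Ridge Regression and SVM Total citations: 1, self-citations: 0, other citations: 1
The absolute values of ridge-regression weight estimates reflect, to some extent, the importance of the corresponding features. On this basis, a high-dimensional feature-selection algorithm combining ridge regression (RR) and support vector machines (SVM) was developed. For two peptide systems, bitter-tasting dipeptides (BTT) and cytotoxic T lymphocyte (CTL) epitope nonapeptides, peptide structures were characterized directly by 531 physicochemical property parameters of amino acids, giving 1062 and 4779 initial features, respectively. On the training set, the initial features were ranked by ridge regression and introduced sequentially, stopping when the mean squared error (MSE) of SVM leave-one-out cross-validation (LOOCV) rose markedly. Multiple rounds of elimination of the worst-ranked feature then refined the selection, yielding 7 and 18 retained features, respectively, each with a clear physicochemical meaning. Quantitative structure-activity relationship (QSAR) models were built on the training sets from the retained features with support vector regression (SVR) and used to predict independent test sets. Their fitting accuracy, LOOCV accuracy, and independent prediction accuracy all exceeded results reported in the literature. The new method runs fast, and the selected features have clear physicochemical meaning and strong interpretability. It has fairly broad application prospects in QSAR modeling of peptides and proteins and in other high-dimensional regression-prediction problems.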
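The ranking-then-sequential-introduction step can be sketched in pure Python as below. Assumptions, all labeled: features are already standardized and y is centered (so no intercept is fitted), ridge regression also stands in for the SVM inside the LOOCV loop, and the stopping rule "MSE exceeds `rise` times the best so far" with `rise=1.2` is a hypothetical reading of "rose markedly". The final multi-round elimination is not shown, and every function name is hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge(X, y, lam):
    """Ridge weights w = (X^T X + lam I)^-1 X^T y, without an intercept."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)

def loocv_mse(X, y, cols, lam):
    """LOOCV MSE of a ridge model on feature columns `cols`."""
    err = 0.0
    for i in range(len(X)):
        Xtr = [[r[c] for c in cols] for k, r in enumerate(X) if k != i]
        ytr = [v for k, v in enumerate(y) if k != i]
        w = ridge(Xtr, ytr, lam)
        pred = sum(wc * X[i][c] for wc, c in zip(w, cols))
        err += (pred - y[i]) ** 2
    return err / len(X)

def rank_and_introduce(X, y, lam=1.0, rise=1.2):
    """Rank features by |ridge weight|, add them in that order, and stop once the
    LOOCV MSE exceeds `rise` times the best MSE seen so far."""
    w = ridge(X, y, lam)
    order = sorted(range(len(w)), key=lambda j: -abs(w[j]))
    chosen, best = [], float("inf")
    for j in order:
        mse = loocv_mse(X, y, chosen + [j], lam)
        if mse > rise * best:
            break
        chosen.append(j)
        best = min(best, mse)
    return chosen, best
```

With λ > 0 the matrix X^T X + λI is positive definite, so `solve` never hits a zero pivot even when there are more features than samples, which is what makes ridge usable for ranking thousands of initial features.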
6.
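Column efficiency in gas chromatography is generally described quantitatively with the theoretical plate concept introduced by Martin and Synge, and sometimes with the effective plate concept introduced by Purnell. To unify theoretical and effective plates and to obtain a column-efficiency measure independent of the capacity ratio (K), this paper introduces a new method for measuring gas chromatographic column efficiency based on the work of Jan Ceulemans.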
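As background for why the two plate concepts differ, the sketch below computes the standard textbook definitions from the peak width at half height, with the factor 8 ln 2 ≈ 5.54. It is not the Ceulemans-based measure itself, which the abstract does not specify. Since tR = tM(1 + k), the two counts are related by N_eff = n (k / (1 + k))^2, so only the effective plate number depends on the capacity ratio. The function names are hypothetical.

```python
import math

HALF_HEIGHT_FACTOR = 8 * math.log(2)  # about 5.54 for a Gaussian peak

def capacity_ratio(t_r, t_m):
    """Capacity ratio k = (tR - tM) / tM."""
    return (t_r - t_m) / t_m

def theoretical_plates(t_r, w_half):
    """Martin-Synge plate number n = 8 ln2 * (tR / W_half)^2."""
    return HALF_HEIGHT_FACTOR * (t_r / w_half) ** 2

def effective_plates(t_r, t_m, w_half):
    """Purnell effective plate number N_eff = 8 ln2 * ((tR - tM) / W_half)^2."""
    return HALF_HEIGHT_FACTOR * ((t_r - t_m) / w_half) ** 2
```

For a peak with tR = 10 min, tM = 2 min, and a half-height width of 0.5 min, k = 4 and N_eff is 64% of n. As k grows, the ratio approaches 1, so the two measures agree only for strongly retained solutes.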
7.
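A New Method for Quantitative Analysis of Multicomponent Alkane Gas Mixtures by FTIR Spectroscopy Total citations: 1, self-citations: 0, other citations: 1
A new method coupling kernel partial least squares (KPLS) feature extraction with support vector regression (SVR) is proposed for quantitative analysis by Fourier transform infrared (FTIR) spectroscopy in gas logging. It targets gas mixtures of seven components: methane, ethane, propane, isobutane, n-butane, isopentane, and n-pentane. KPLS was used to extract features from the spectral data, and the resulting feature components served as SVR inputs to build a quantitative model for the seven-component gas. In quantitative analysis of the same gas mixtures, the root mean square errors of prediction (RMSEP) of the SVR model after KPLS feature extraction were 0.116, 0.079, 0.104, 0.092, 0.108, 0.029, and 0.016 for the seven components, respectively. All were lower than the RMSEP of the linear partial least squares (LPLS), LPLS-SVR, KPLSR, and SVR models. The results show that KPLS-SVR better extracts the nonlinear features linking the FTIR spectra of the gas mixtures to their component concentrations, effectively removes spectral noise, and greatly reduces data dimensionality. It is therefore an effective method for the quantitative analysis of alkane gas mixtures in gas logging.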
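Kernel PLS implementations typically work on a centered Gram matrix, and the comparison above is reported as RMSEP. The sketch below shows those two ingredients: an RBF Gram matrix, feature-space centering Kc = (I - 1/n) K (I - 1/n), and RMSEP. The RBF kernel and its `gamma` are assumptions, because the abstract does not name the kernel. The PLS iterations and the downstream SVR are not shown, and the function names are hypothetical.

```python
import math

def rbf_kernel_matrix(X, gamma):
    """Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
            for xi in X]

def center_kernel(K):
    """Center K in feature space: subtract row and column means, add back the grand mean."""
    n = len(K)
    row_means = [sum(r) / n for r in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

After centering, every row and column of the matrix sums to zero. This is the kernel-space counterpart of mean-centering spectra before ordinary PLS.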
8.
9.
10.
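Optimizing Small-Sample QSAR Modeling by Support Vector Regression with Local and Global Kernel Functions Total citations: 1, self-citations: 0, other citations: 1
To improve the prediction accuracy of small-sample quantitative structure-activity relationship (QSAR) models, a new modeling method based on the global and local kernel functions of support vector machines was proposed. Descriptors are first screened under each kernel function, and support vector regression (SVR) submodels are then built on the retained descriptors. The activity values predicted by the submodels, together with the experimental values, form a mixed sample. Following the principle of minimum mean squared error (MSE), SVR is applied again to the mixed sample for kernel-function optimization and submodel screening, and the final prediction is made by leave-one-out using the optimal kernel function and the retained submodels. QSAR studies of two small-sample systems show that the method combines the advantages of local and global kernel functions: it has both strong learning ability and good generalization, with high prediction accuracy and good stability.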
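The local/global distinction and the "mixed sample" can be illustrated as below. The RBF kernel (local) and the polynomial kernel (global) are the examples usually cited for the two families. The abstract does not say which kernels or parameters were used, so `gamma`, `degree`, and `c` are hypothetical, as are the function names. The second-level SVR fitted on the mixed sample is not shown.

```python
import math

def rbf(x, z, gamma=1.0):
    """Local kernel: its value decays toward zero as ||x - z|| grows."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def poly(x, z, degree=2, c=1.0):
    """Global kernel: (x.z + c)^degree, so distant points can still dominate."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree

def mixed_sample(submodel_preds, y_exp):
    """One row per compound: each submodel's predicted activity, then the experimental value.

    submodel_preds[m][i] is submodel m's prediction for compound i.
    """
    return [list(preds) + [y] for preds, y in zip(zip(*submodel_preds), y_exp)]
```

An RBF model fits local structure well (strong learning) but says little far from the training data, while a polynomial model extrapolates more smoothly (better generalization). Stacking both kinds of submodel in the mixed sample lets the second-level fit weigh them per dataset.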
11.
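The effects of radial diffusion of solutes and of mobile-phase heat generation on column efficiency during chromatographic separation were studied. Starting from the heat conduction equation and applying the kinetic theory of the chromatographic process, a plate-height equation for liquid chromatography was derived that accounts for radial diffusion in the mobile phase and for column heating.
This equation describes how plate height depends on the various contributing factors in high-performance liquid chromatography (HPLC), ultra-performance liquid chromatography (UPLC), capillary electrochromatography (CEC), and eliminated-stagnant-film liquid chromatography (ESFLC). The last term of the equation represents the contribution of radial diffusion and column heating to plate height. When the mobile-phase linear velocity is low and the column internal diameter is small, the contributions of frictional heating and radial diffusion to plate height approach zero, and the equation reduces to that of Horváth and Lin. When the linear velocity is high, frictional heating of the mobile phase increases the temperature difference between the column axis and the wall. This produces a radial distribution of mobile-phase linear velocity that lowers column efficiency. The axis-to-wall temperature difference is proportional to the square of the mobile-phase linear velocity. The paper points out that at high mobile-phase linear velocities the efficiency of liquid chromatography depends strongly on column internal diameter, so narrow-bore columns help achieve both high speed and high efficiency, whereas an excessively high linear velocity causes column efficiency to collapse.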
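The stated proportionality between the axis-to-wall temperature difference ΔT and the square of the linear velocity u has a simple consequence, worked out below. Here κ is a hypothetical lumped constant, because the abstract does not reproduce the full equation or its coefficients.

```latex
\Delta T = \kappa\,u^{2}
\quad\Longrightarrow\quad
\frac{\Delta T(2u)}{\Delta T(u)} = \frac{\kappa\,(2u)^{2}}{\kappa\,u^{2}} = 4
```

Doubling the linear velocity therefore quadruples the radial temperature difference, which is consistent with the heating term becoming dominant at high velocities.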
12.
13.
A method was developed to determine the minimum allowable operating temperature (MiAOT) of wall-coated open tubular capillary columns. Two polyethylene glycol phases and fourteen polysiloxane phases with different side groups (methyl, phenyl, cyanopropyl, trifluoropropyl, n-octyl) and backbone stiffening units (tetramethyl-p-silphenylene, tetramethyl-p,p'-sildiphenylene ether, carborane) were investigated by inverse gas chromatography. Almost all phases showed a sigmoidal profile of column efficiency versus temperature, with efficiency increasing as temperature rose. The MiAOT was defined as the temperature at which the column efficiency is half of its value at elevated temperatures. The MiAOT of a stationary phase was found to be approximately 60 K higher than its glass transition temperature.
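The MiAOT definition above can be turned into a small calculation. This sketch assumes that the "value at elevated temperatures" is the efficiency measured at the highest temperature and interpolates linearly between measured points. `miaot` is a hypothetical helper, and the numbers in the usage note are made up for illustration.

```python
def miaot(temps, efficiencies):
    """Temperature at which efficiency first rises to half its high-temperature plateau.

    The plateau is taken as the efficiency at the highest measured temperature
    (an assumption), and the crossing is located by linear interpolation.
    """
    pts = sorted(zip(temps, efficiencies))
    half = pts[-1][1] / 2.0
    for (t0, e0), (t1, e1) in zip(pts, pts[1:]):
        if e0 < half <= e1:
            return t0 + (half - e0) * (t1 - t0) / (e1 - e0)
    return None  # no upward crossing of the half-plateau level in the measured range
```

With temperatures of 300, 320, 340, and 360 K and plate numbers of 100, 500, 1800, and 2000, the half-plateau level is 1000 plates, and the crossing falls at about 327.7 K, between the 320 K and 340 K points.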
14.
Summary: In gas chromatography, a number of parameters must be chosen according to the analysis being performed. The stationary phase material is chosen according to the solutes to be separated, while the film thickness depends on the concentration and volatility of the components to be analyzed. This study performs a coupled-column phase ratio optimization by connecting a short piece of a particular column ahead of a normal-length analytical column. Columns of different dimensions (phase ratios) but with the same stationary phase material (methyl silicone) are coupled with a deactivated glass press-fit connector, and their efficiency and capacity are measured. Efficiency of coupled fused silica open tubular columns is optimized by matching or decreasing the phase ratio of successive columns, whereas capacity is optimized by increasing the phase ratio of consecutive columns. Because capacity optimization and efficiency optimization oppose each other, a substantial increase in capacity is possible if some efficiency can be sacrificed.
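The phase ratio of an open tubular column is commonly approximated as β ≈ r / (2 d_f), where r is the column radius and d_f the film thickness (valid for thin films). The sketch below uses that approximation and restates the abstract's coupling rule as a classifier. The function names and the "mixed" label for sequences that follow neither rule are hypothetical.

```python
def phase_ratio(inner_diameter_um, film_um):
    """Approximate open-tubular phase ratio beta = r / (2 d_f), with r = ID / 2 (thin film)."""
    return (inner_diameter_um / 2.0) / (2.0 * film_um)

def coupling_goal(columns):
    """Classify a sequence of (inner_diameter_um, film_um) columns, inlet first."""
    betas = [phase_ratio(d, f) for d, f in columns]
    steps = list(zip(betas, betas[1:]))
    if all(b1 <= b0 for b0, b1 in steps):
        return "efficiency"  # matching or decreasing beta along the flow path
    if all(b1 > b0 for b0, b1 in steps):
        return "capacity"    # increasing beta along the flow path
    return "mixed"
```

For example, a 250 µm column with a 0.25 µm film has β = 250. Placing a thick-film 320 µm × 1.0 µm precolumn (β = 80) ahead of it increases β along the flow path, which the abstract associates with capacity optimization.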
15.
Some problems concerning the efficiency of temperature-gradient chromatography compared with isothermal chromatography have been investigated. An expression has been derived for temperature-gradient chromatography in which β is the separation coefficient, α₁ and α₂ are the partition coefficients of the two substances to be separated, R_f(1) is the retardation factor of the substance with α₁, v is the volume of eluent in the column, Δv is the supplementary volume of eluent passing through the column owing to evaporation caused by the temperature gradient, A_M is the cross-sectional area of the mobile phase, and d is the distance between the midpoints of the spots on the chromatogram. It has been shown that, under the influence of the temperature gradient, Δv can be large enough to separate two substances whose α₁/α₂ ratio is very close to 1. For this reason, temperature-gradient chromatography with an open column is the most efficient means of separation so far known.
16.
To investigate a diol phase (Inertsil Diol column) in hydrophilic interaction chromatography, urea, sucrose, and glycine were used as test compounds. The chromatographic conditions were investigated for optimal column efficiency. The column temperatures used in conventional reversed-phase liquid chromatography (RPLC) could also be used for the separation, and the flow rate should be adjusted to 0.3-0.5 mL/min to optimize column efficiency. The results suggest that hydrophilic interaction is slower than hydrophobic interaction in RPLC. Adding trifluoroacetic acid is effective for retaining glycine but ineffective for urea and sucrose. The diol phase exhibited sufficient chemical stability even when exposed to a high percentage of water, and could be used with isocratic elution for the separation and analysis of amino acids and glucose.
17.
The traditional gas chromatography experiment separating low-toxicity n-hexane and cyclohexane was improved with an orthogonal experimental design. The effects of sample size, vaporization chamber temperature, detector temperature, column temperature, linear velocity, and split ratio on separation were examined. Through this experiment, students learn how to use and maintain gas chromatographs and come to understand separation and column efficiency in chromatography.
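Orthogonal experiments of this kind are commonly analyzed by range analysis. For each factor, the mean response is computed at each level, and the range R = max(mean) - min(mean) ranks how strongly that factor affects the response. The abstract does not say which orthogonal array was used, and six factors would need a larger array than the L9(3^4) shown here, so the array and the response values are purely illustrative. `range_analysis` is a hypothetical helper.

```python
# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels each.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

def range_analysis(design, response):
    """Return (level means, range R) for each factor column; larger R means a stronger effect."""
    results = []
    for col in range(len(design[0])):
        levels = sorted({run[col] for run in design})
        means = []
        for lv in levels:
            ys = [y for run, y in zip(design, response) if run[col] == lv]
            means.append(sum(ys) / len(ys))
        results.append((means, max(means) - min(means)))
    return results
```

Because every pair of columns contains each level combination exactly once, a factor with no real effect averages over the other factors' levels equally and shows R = 0. For example, if the response depends only on the first factor, only the first column shows a nonzero range.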