Similar literature
 20 similar documents found (search time: 93 ms)
1.
万金玉  刘怡飞 《化学通报》2019,82(10):926-936
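With the widespread use of organophosphorus compounds (OPs), they are being detected in a growing number of environmental media. Most OPs are toxic, yet fast and effective means of predicting their toxicity are lacking. In this work, molecular descriptors computed with the E-Dragon software are combined with several QSAR models to predict the toxicity of 36 OPs. Backward elimination was used for descriptor selection, with the root-mean-square error (RMSE) as the evaluation criterion, and 14 descriptors that contribute most to a linear-kernel support vector machine (SVM) model were identified. In cross-validation of the final SVM model, the correlation coefficient between calculated and experimental values was 0.913 and the RMSE was 0.388; in external test validation, the mean relative error was 9.10%. Multiple linear regression (MLR), artificial neural network (ANN), and partial least squares regression (PLS) models were also used to predict OP toxicity; in cross-validation, their correlation coefficients between calculated and experimental values were 0.878, 0.686, and 0.620, respectively, all lower than that of the SVM model. The linear-kernel SVM model is therefore an effective method for predicting the toxicity of OPs.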
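
The backward elimination step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `score` callback stands in for "cross-validated RMSE of the model on this descriptor subset", and `toy_score` together with the descriptor names (`logP`, `dipole`, `noise1`, `noise2`) are hypothetical placeholders.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def backward_elimination(features, score, min_features=1):
    """Greedy backward elimination.

    Starting from all features, repeatedly drop the single feature whose
    removal gives the lowest score (lower is better, e.g. an RMSE).
    Stop as soon as every possible removal makes the score worse.
    Ties are resolved in favour of the smaller subset.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > min_features:
        trials = [(score([f for f in selected if f != drop]), drop)
                  for drop in selected]
        trial_score, drop = min(trials)
        if trial_score > best:
            break
        best = trial_score
        selected.remove(drop)
    return selected, best


def toy_score(subset):
    """Hypothetical stand-in for a cross-validated RMSE (assumption)."""
    useful = {"logP", "dipole"}
    return 10 - 3 * len(useful & set(subset)) + len(set(subset) - useful)


selected, best = backward_elimination(
    ["logP", "dipole", "noise1", "noise2"], toy_score)
print(selected, best)
```

In a real study the callback would retrain the model (here, a linear-kernel SVM) on each candidate subset and return its cross-validated RMSE computed with `rmse`.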

2.
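Classification models of acetylcholinesterase inhibitors built with machine learning methods   Cited by: 1 (self-citations: 0, citations by others: 1)
We constructed 1559 descriptors characterizing the molecular composition, charge, topology, geometry, and physicochemical properties of acetylcholinesterase inhibitors. Variable selection combining Fischer Score ranking and filtering with Monte Carlo simulated annealing yielded 37 descriptors, and classification models for acetylcholinesterase inhibitors were then built with support vector machines (SVM), artificial neural networks (ANN), and k-nearest neighbors (k-NN). For the 515 training samples, five-fold cross-validation gave average prediction accuracies across the machine learning methods of 87.3%-92.7% for positive samples, 67.0%-81.0% for negative samples, and 79.4%-88.2% overall. The y-scrambling method was used to check the SVM model for chance correlation: the resulting average accuracies for positive, negative, and all samples were 72.7%-82.5%, 41.0%-53.0%, and 62.1%-69.1%, clearly lower than those of the actual model, indicating that the model is not a chance correlation. For 172 external independent test samples not used in model building, the accuracies of the machine learning methods for positive, negative, and all samples were 93.3%-100.0%, 74.6%-89.6%, and 86.1%-95.9%. Among the models, SVM gave the best prediction accuracy, clearly higher than results reported in other literature.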
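
The y-scrambling check mentioned above can be illustrated with a small sketch. It is an assumption-laden toy: a leave-one-out 1-nearest-neighbour classifier is swapped in for the SVM so the example needs only the standard library, and the data set is synthetic. The idea is the same: if the model still scores well after the labels are shuffled, the original score may be a chance correlation.

```python
import random


def loo_accuracy_1nn(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    hits = 0
    for i, xi in enumerate(X):
        # Nearest *other* sample by squared Euclidean distance.
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(xi, X[k])))
        hits += y[j] == y[i]
    return hits / len(X)


def y_scrambling(X, y, n_rounds=20, seed=0):
    """Return (accuracy with true labels, mean accuracy with shuffled labels)."""
    rng = random.Random(seed)
    true_score = loo_accuracy_1nn(X, y)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between descriptors and labels
        scrambled.append(loo_accuracy_1nn(X, y_perm))
    return true_score, sum(scrambled) / n_rounds


# Synthetic data (assumption): two well-separated clusters, one per class.
X = [[float(i), 0.0] for i in range(10)] + [[100.0 + i, 0.0] for i in range(10)]
y = [0] * 10 + [1] * 10
true_acc, scrambled_acc = y_scrambling(X, y)
print(true_acc, scrambled_acc)
```

A large gap between the two numbers, as reported in the abstract, is the evidence that the model has learned a real structure-activity relationship.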

3.
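To predict the anticancer activity of epothilone derivatives, a set of descriptors characterizing molecular shape, namely K-order shape parameters, was defined, and 67 molecular descriptors characterizing the electronic, topological, and geometric structure of the molecules were calculated. After selection by a genetic algorithm, the descriptors were used to build a support vector machine (SVM) classification model of anticancer activity; the SVM model parameters were optimized by leave-one-out and 5-fold cross-validation. The results show that the model is highly predictive and that the two validation methods give similar results, with a cross-validated prediction accuracy of 80.6%. Thirty descriptors remained after selection, including five K-order shape parameters; these descriptors play a relatively important role in modeling the anticancer activity of epothilone derivatives.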

4.
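Classification models of HERG potassium channel inhibitors based on the support vector machine method   Cited by: 1 (self-citations: 0, citations by others: 1)
For inhibitors of the human ether-a-gō-gō related gene (HERG) potassium channel, 1559 molecular descriptors characterizing molecular composition, charge distribution, topology, geometry, and physicochemical properties were calculated. Fischer Score (F-Score) ranking and filtering combined with Monte Carlo simulated annealing was used to select the descriptors relevant to classifying HERG potassium channel inhibitors. Using the support vector machine (SVM) method, with IC50 = 1.0 and 10.0 μmol·L⁻¹ as classification thresholds, three classification models were built. For the 367 training-set molecules, five-fold cross-validation gave average prediction accuracies of 84.8%-96.6% for positive samples and 80.7%-97.7% for negative samples, with an overall average accuracy of 87.1%-97.2%, better than results reported in other literature. For 97 external test-set molecules, the overall accuracy of the three models was between 67.0% and 90.1%, close to or better than other reported results.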
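
As a rough illustration of F-Score ranking, the sketch below uses one widely used definition of the F-score for two-class feature ranking: the squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances. The abstract does not give the exact formula the authors used, so this definition is an assumption, and the data are invented.

```python
def f_score(pos, neg):
    """F-score of one descriptor from its values in the two classes."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    # Sample variances (n - 1 in the denominator) within each class.
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (n_pos - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (n_neg - 1)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = var_pos + var_neg
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_descriptors(X, y):
    """Return descriptor indices sorted by decreasing F-score, plus the scores."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (assumption): column 0 separates the classes, column 1 does not.
X = [[1, 1], [2, 5], [3, 9], [7, 1], [8, 5], [9, 9]]
y = [1, 1, 1, 0, 0, 0]
order, scores = rank_descriptors(X, y)
print(order, scores)
```

Descriptors with low scores would be filtered out before the more expensive Monte Carlo simulated annealing search.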

5.
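To build a quantitative structure-property relationship (QSPR) model between the molecular structures of 310 organic compounds and their viscosities, and to explore the structural factors that affect the viscosity of organic liquids, the sample set was first preliminarily classified with the iterative self-organizing data analysis technique (ISODATA) and divided into a training set and a test set. Molecular structure descriptors of the 310 organic molecules were then calculated with the DRAGON 2.1 software, and ant colony optimization (ACO) was used to select descriptors, yielding 5 parameters. Multiple linear regression (MLR) and support vector machine (SVM) methods were then used to build an ACO-MLR model and an ACO-SVM model. The results show that the nonlinear ACO-SVM model (correlation coefficients R²train = 0.9013, R²test = 0.9026) outperforms the linear ACO-MLR model (R²train = 0.7680, R²test = 0.8725). The correlation coefficients between predicted and experimental values on the test set were 0.934 for the ACO-MLR model and 0.950 for the ACO-SVM model, a satisfactory result. The applicability domain of the models was examined with a Williams plot. The models provide an effective engineering method for predicting the viscosity of organic compounds from molecular structure.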

6.
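Application of Multi-KNN-SVR combinatorial prediction to QSAR studies of fluorine-containing compounds   Cited by: 1 (self-citations: 0, citations by others: 1)
To better understand the relationship between the biological activity of fluorine-containing pesticides and their structures and to build a sound QSAR model, seven molecular structure descriptors, including the oil-water partition coefficient, were taken as the starting point. Based on support vector regression (SVR) and the minimum-MSE principle, with automatic search for the optimal kernel function and nonlinear descriptor screening, multiple K-nearest neighbor (KNN) prediction sub-models were constructed. A further nonlinear screening step retained a subset of the sub-models, which were then used for combinatorial prediction (Multi-KNN-SVR). Leave-one-out combinatorial predictions of the biological activity of 33 fluorine-containing compounds against 5 different diseases show that nonlinear descriptor screening and KNN sub-models effectively improve prediction accuracy, and that nonlinear combination of multiple KNN sub-models improves prediction performance further. Multi-KNN-SVR combinatorial prediction has broad application prospects in QSAR and other related prediction research.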

7.
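Molecular map (MOLMAP) indices are derived from the chemical bond descriptors of a molecule through a Kohonen self-organizing map according to a defined algorithm. The bond descriptors comprise physicochemical and topological properties of the bond, such as the charge difference between the two atoms of the bond and the number of attached heteroatoms. In this work, MOLMAP indices were applied to mutagenicity prediction for 4075 organic compounds (Ames test results: 2305 mutagenic structures, 1770 non-mutagenic structures). Using random forests, models were built with three types of indices: (1) MOLMAP indices of different dimensions; (2) global molecular descriptors; (3) MOLMAP indices combined with global molecular descriptors. The out-of-bag correct prediction rate for the whole data set reached 85.4%. To test the robustness of the model, it was used to predict 472 compounds from another database, giving a correct prediction rate of 86.7%. Both results improve on previous studies.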

8.
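Application of combinatorial prediction based on SVR and k-nearest-neighbor groups to QSAR   Cited by: 1 (self-citations: 0, citations by others: 1)
To improve prediction accuracy in quantitative structure-activity relationship (QSAR) studies, a new method was developed that screens molecular structure descriptors nonlinearly with support vector regression (SVR) and makes nonlinear combinatorial predictions based on k-nearest-neighbor groups. First, under the minimum mean squared error (MSE) principle, descriptors are screened nonlinearly by SVR through multiple rounds of last-place elimination under leave-one-out, yielding the optimal kernel function and the retained descriptors. Second, based on the Euclidean distance between the retained-descriptor vectors of the sample to be predicted and the training samples, double leave-one-out predictions from sub-models built on different k-nearest-neighbor groups are used to reflect the heterogeneity of the sample set. Third, again under minimum MSE, the neighbor-group sub-models are screened nonlinearly by SVR through multiple rounds of last-place elimination under leave-one-out, yielding the optimal kernel function and the retained sub-models. Finally, combinatorial prediction is carried out with the retained sub-models by double leave-one-out. Validation on a QSAR example of substituted anilines and phenols against Daphnia magna shows that the new method has the highest prediction accuracy among all reference models and reflects the nonlinear relationship between descriptors and compound toxicity in finer detail. Its advantages include structural risk minimization, nonlinearity, suitability for small samples, effective avoidance of overfitting, the curse of dimensionality, and local minima, nonlinear screening of descriptors and sub-models, nonlinear combinatorial prediction, automatic selection of the optimal kernel function and its parameters, excellent generalization, and high prediction accuracy. The method has broad application prospects in QSAR research.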

9.
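For hormone-sensitive lipase, we constructed 1559 descriptors characterizing molecular composition, charge, topology, geometry, and physicochemical properties. Variable selection combining Fischer Score ranking and filtering with Monte Carlo simulated annealing yielded 35 descriptors, and we then applied support vector machines (SVM), artificial neural networks (ANN), k-nearest neighbors (k-NN), continuous kernel density estimation (CKD), and logi...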

10.
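Compared with traditional nonsteroidal anti-inflammatory drugs, selective cyclooxygenase-2 inhibitors are free of serious side effects such as gastrointestinal mucosal damage, ulcers, and renal dysfunction, so designing selective cyclooxygenase-2 inhibitors is of great importance. In this work, two machine learning methods, support vector machines and neural networks, were used to build activity prediction models for selective cyclooxygenase-2 inhibitors, with the aim of providing lead compounds for the synthesis of such drugs. The 467 cyclooxygenase-2 inhibitors were divided into training, validation, and independent test sets with the Kennard-Stone method. For each inhibitor, 463 molecular descriptors, comprising constitutional and topological descriptors, were calculated to characterize its molecular structure, and the most important descriptors were selected with the F-Score method for building the classification models. The results show that the SVM method has good predictive ability after variable selection, with a prediction accuracy of 93.30%.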
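
The Kennard-Stone split named above selects a training set that covers descriptor space evenly. A minimal sketch, assuming Euclidean distance and a tiny invented data set (the abstract does not state the distance metric or split sizes used):

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule.

    1. Start with the two samples that are farthest apart.
    2. Repeatedly add the remaining sample whose distance to its nearest
       already-selected sample is largest.
    """
    if not 2 <= n_select <= len(X):
        raise ValueError("n_select must be between 2 and len(X)")
    first, second = max(combinations(range(len(X)), 2),
                        key=lambda p: euclidean(X[p[0]], X[p[1]]))
    selected = [first, second]
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_select:
        nxt = max(remaining,
                  key=lambda i: min(euclidean(X[i], X[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy one-descriptor data set (assumption).
X = [[0.0], [1.0], [2.0], [10.0]]
train_idx = kennard_stone(X, 3)
test_idx = [i for i in range(len(X)) if i not in train_idx]
print(train_idx, test_idx)
```

The samples left unselected form the held-out set; a three-way split like the one in the abstract could be obtained by applying the same rule again to those leftovers.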

11.
Advances in information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In drug discovery and development, machine learning techniques have been used to develop novel drug candidates. Methods for designing drug targets and discovering novel drugs now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of the developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of virtual screening and comprehensive online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms used in drug discovery and associated techniques are discussed, and the applications and methods that produce promising results are reviewed.

12.
Artificial intelligence empowers contemporary chemical research   Cited by: 1 (self-citations: 0, citations by others: 1)
朱博阳  吴睿龙  于曦 《化学学报》2020,78(12):1366-1382
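Artificial intelligence, represented by machine learning, is playing an increasingly important role in contemporary scientific research. Unlike traditional computer programs, machine-learning AI can, by repeatedly analyzing large amounts of data and optimizing its own model (the "learning" process), find relationships between objective things in large data sets, form new models with better predictive and decision-making ability, and make reasonable judgments. The characteristics of chemical research play exactly to the strengths of machine-learning AI. Chemical research often faces very complex material systems and experimental processes, which makes precise analysis and judgment from chemical and physical principles difficult. AI can mine correlations in the massive experimental data generated in chemical experiments, help chemists make reasonable analyses and predictions, and greatly accelerate chemical R&D. This article introduces contemporary AI methods and the basic principles of using them to solve chemical problems, and uses concrete cases to show how AI helps solve different chemical R&D problems and which machine learning algorithms are involved. Attempts to apply AI to chemical science are in a period of vigorous growth, and AI has already shown its strong potential to support chemical research. We hope this article helps more chemists in China understand and use this powerful tool.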

13.
The discovery of materials is increasingly guided by quantum-mechanical crystal-structure prediction, but the structural complexity in bulk and nanoscale materials remains a bottleneck. Here we demonstrate how data-driven approaches can vastly accelerate the search for complex structures, combining a machine-learning (ML) model for the potential-energy surface with efficient, fragment-based searching. We use the characteristic building units observed in Hittorf's and fibrous phosphorus to seed stochastic (“random”) structure searches over hundreds of thousands of runs. Our study identifies a family of hierarchically structured allotropes based on a P8 cage as principal building unit, including one-dimensional (1D) single and double helix structures, nanowires, and two-dimensional (2D) phosphorene allotropes with square-lattice and kagome topologies. These findings yield new insight into the intriguingly diverse structural chemistry of phosphorus, and they provide an example for how ML methods may, in the long run, be expected to accelerate the discovery of hierarchical nanostructures.

14.
The discovery of materials is increasingly guided by quantum-mechanical crystal-structure prediction, but the structural complexity in bulk and nanoscale materials remains a bottleneck. Here we demonstrate how data-driven approaches can vastly accelerate the search for complex structures, combining a machine-learning (ML) model for the potential-energy surface with efficient, fragment-based searching. We use the characteristic building units observed in Hittorf's and fibrous phosphorus to seed stochastic (“random”) structure searches over hundreds of thousands of runs. Our study identifies a family of hierarchically structured allotropes based on a P8 cage as principal building unit, including one-dimensional (1D) single and double helix structures, nanowires, and two-dimensional (2D) phosphorene allotropes with square-lattice and kagome topologies. These findings yield new insight into the intriguingly diverse structural chemistry of phosphorus, and they provide an example for how ML methods may, in the long run, be expected to accelerate the discovery of hierarchical nanostructures.

15.
Hydrophobicity is an important physicochemical property of peptides and proteins. It is responsible for their conformational changes and stability, as well as for various intramolecular and intermolecular chemical interactions. Enormous effort has been invested in studying the extent of hydrophobicity and how it influences various biological processes, in addition to its crucial role in separation and purification. Here, we review studies carried out to determine hydrophobicity, starting from (i) the solubility behavior of simple amino acids, continuing with (ii) experimental approaches in the reversed-phase liquid chromatography mode, and ending with (iii) some examples of more advanced computational and machine learning models.

16.
Crosslinked polyethylene (PEX-a) pipes are emerging as promising replacements for traditional metal or concrete pipes used for water, gas, and sewage transport. Understanding the relationship between pipe formulation and performance is critical to their proper design and implementation. We have developed a methodology using principal component analysis (PCA) and the machine learning techniques of k-means clustering and support vector machines (SVM) to compare and classify different PEX-a pipe formulations based on characteristic infrared (IR) spectroscopy absorbance peaks. The application of PCA revealed that a large percentage (89%) of the total variance could be explained by the first three principal components (PC1-PC3), with distinct clustering of the data for each formulation. By examining the contribution of the individual IR bands to the PCs, we determined that PC1 could be attributed to different peroxide crosslinkers, whereas PC2 and PC3 could be attributed to differences in the additives. Using the PCA results as input to k-means clustering and SVM resulted in very high accuracy of classifying the different pipe formulations. Our approach highlights the advantages of using PCA and machine learning techniques to characterize different formulations of PEX-a pipes, which is important to achieve a detailed understanding of the pipe formulation and manufacturing process. © 2019 Wiley Periodicals, Inc. J. Polym. Sci., Part B: Polym. Phys. 2019, 57, 1255-1262

17.
In this era of artificial intelligence, there is an urgent need to optimize current material design methods into a more efficient and more accurate closed-loop system. The approach requires at least three parts: high-throughput screening, an automated synthesis platform, and machine learning algorithms. Fortunately, these techniques have been substantially developed. We introduce the common machine learning algorithms and then discuss several machine learning-based designs of carbon-based electrocatalysts. We try to illustrate the research norms involving machine learning. In addition, other paper structures and details are also discussed.

18.
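With the continued use of the various compounds contained in commercial products, concern is growing about the safety hazards they pose to humans and the environment. Over the past few years, computational prediction of compound toxicity has shown great potential. Here, the advantages and disadvantages of commonly used machine learning and deep learning algorithms for building toxicity prediction models are summarized, and freely accessible toxicity prediction web servers published in the last three years are systematically reviewed. The opportunities and challenges facing toxicity prediction in the era of artificial intelligence and the internet are also discussed. The aim is to guide the rational choice of algorithms and web servers for modeling and for assessing compound toxicity.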

19.
Reactive molecular dynamics (MD) simulation is performed using a reactive force field (ReaxFF). To this end, we developed a new method to optimize the ReaxFF parameters based on a machine learning approach. This approach combines the k-nearest neighbor and random forest regressor algorithm to efficiently locate several possible ReaxFF parameter sets. As a pilot test of the developed approach, the optimized ReaxFF parameter set was applied to perform chemical vapor deposition (CVD) of an α-Al2O3 crystal. The crystal structure of α-Al2O3 was reasonably reproduced even at a relatively high temperature (2000 K). The reactive MD simulation suggests that the (110) surface grows faster than the (0001) surface, indicating that the developed parameter optimization technique could be used for understanding the chemical reaction in the CVD process. © 2019 Wiley Periodicals, Inc.

20.
Abstract

The paper describes the program CLEW, which uses learning and geometrical fitting to discover pharmacophores from a set of active and inactive compounds. The program first divides the compounds into classes of similar compounds. It then uses machine learning to derive a set of rules that relate structure to activity for each class, and finds the features common to all classes. These common features are passed to a geometrical fitting program that tries to find a 3D fit of these features across the minimized conformations of every active molecule in every class. Such a fit is used to infer a pharmacophore.
