Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules
Authors: Timon Sebastian Schroeter, Anton Schwaighofer, Sebastian Mika, Antonius Ter Laak, Detlev Suelzle, Ursula Ganzer, Nikolaus Heinrich, Klaus-Robert Müller
Affiliation: Fraunhofer FIRST, Berlin, Germany. timon.schroeter@first.fraunhofer.de
Abstract:
We investigate the use of different machine learning methods to construct models for aqueous solubility. The models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules from Bayer Schering Pharma. For each method, we also consider an appropriate way to obtain error bars, in order to estimate the domain of applicability (DOA) of each model. Specifically, we investigate error bars from a Bayesian model (Gaussian Process, GP), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to the training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation and on an external validation set of 536 molecules) and how faithfully the individual error bars represent the actual prediction error.
Keywords: Solubility; Aqueous; Machine learning; Drug discovery; Domain of applicability; Error bar; Error estimation; Gaussian Process; Bayesian modeling; Random forest; Ensemble; Decision tree; Support vector machine; Ridge regression; Distance
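Illustration: the abstract mentions distance-based error estimates built on the Mahalanobis distance to the training data. The following is a minimal sketch of one common variant, the distance from a query point to the training-set mean under the training covariance. It is not taken from the paper, which may define the distance differently (for example, relative to nearest training compounds). The function name `mahalanobis_to_training`, the `ridge` regularizer, and the synthetic descriptors are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_to_training(X_train, x_query, ridge=1e-6):
    """Mahalanobis distance from x_query to the mean of X_train.

    Hypothetical helper: `ridge` adds a small value to the diagonal of the
    covariance matrix so that the linear solve stays numerically stable.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    mean = X_train.mean(axis=0)
    # Covariance of the descriptors (columns are features).
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))
    cov = cov + ridge * np.eye(X_train.shape[1])
    diff = x_query - mean
    # Solve cov * z = diff rather than forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Synthetic stand-in for molecular descriptors (not real solubility data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))

d_near = mahalanobis_to_training(X_train, X_train.mean(axis=0))
d_far = mahalanobis_to_training(X_train, np.array([10.0, 10.0, 10.0]))
print(d_near, d_far)
```

A larger distance suggests the query lies further from the region covered by the training data, so a prediction for it would be treated as less reliable. Mapping the distance to an actual error bar requires a calibration step that this sketch does not attempt.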
This article is indexed in PubMed, SpringerLink, and other databases.