k‐Nearest neighbors optimization‐based outlier removal
Authors: Abraham Yosipof, Hanoch Senderowitz
Affiliation: Department of Chemistry, Bar Ilan University, Ramat‐Gan, Israel
Abstract: Datasets of molecular compounds often contain outliers, that is, compounds that differ from the rest of the dataset. Outliers, while often interesting, may affect data interpretation, model generation, and decision making, and should therefore be removed from the dataset prior to modeling efforts. Here, we describe a new method for the iterative identification and removal of outliers based on a k‐nearest neighbors optimization algorithm. We demonstrate for three different datasets that the removal of outliers using the new algorithm provides filtered datasets which are better than those provided by four alternative outlier removal procedures, as well as by random compound removal, in two important aspects: (1) they better maintain the diversity of the parent datasets; (2) they give rise to quantitative structure activity relationship (QSAR) models with much better prediction statistics. The new algorithm is, therefore, suitable for the pretreatment of datasets prior to QSAR modeling. © 2014 Wiley Periodicals, Inc.
Keywords: outlier removal; outlier detection; k‐nearest neighbors; quantitative structure activity relationship; optimization; distance‐based method
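
Illustrative sketch: the abstract does not specify the optimization objective or the stopping rule of the published algorithm, so the code below is not the authors' method. It shows only the general idea of iterative, distance-based outlier removal using a common heuristic: at each step, score every compound by its mean distance to its k nearest neighbors in descriptor space and drop the highest-scoring one. The function names (`knn_score`, `remove_outliers`), the Euclidean metric, and the fixed removal count `n_remove` are all assumptions made for illustration.

```python
import math


def knn_score(i, points, k):
    """Mean Euclidean distance from points[i] to its k nearest neighbors."""
    dists = sorted(
        math.dist(points[i], points[j]) for j in range(len(points)) if j != i
    )
    return sum(dists[:k]) / k


def remove_outliers(points, k=2, n_remove=1):
    """Iteratively drop the point with the largest k-NN score.

    Generic heuristic for illustration only; NOT the optimization
    procedure of the cited paper. Returns the indices (into the
    original list) of the points that were kept.
    """
    kept = list(range(len(points)))
    for _ in range(n_remove):
        # Stop when too few points remain to define k neighbors sensibly.
        if len(kept) <= k + 1:
            break
        current = [points[i] for i in kept]
        scores = [knn_score(p, current, k) for p in range(len(current))]
        worst = max(range(len(current)), key=scores.__getitem__)
        del kept[worst]  # scores are recomputed on the next iteration
    return kept


# Hypothetical 2-D descriptors: a tight cluster plus one distant compound.
descriptors = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
kept_indices = remove_outliers(descriptors, k=2, n_remove=1)
```

Recomputing the scores after every removal is what makes the procedure iterative: once one outlier is gone, the neighborhoods of the remaining compounds change, so a compound that was masked by it can surface in a later pass.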