Comparison of different data set screening methods for use in QSAR/QSPR generation studies |
| |
Authors: | B. T. Luke |
| |
Affiliation: | Advanced Biomedical Computing Center, SAIC Frederick, NCI-FCRDC, 430 Miller Drive, Frederick, MD 21702, USA |
| |
Abstract: | Investigations aimed at generating accurate Quantitative Structure/Activity Relationships (QSAR) or Quantitative Structure/Property Relationships (QSPR) often use data sets that contain a large number of descriptors for each compound. For example, one of the data sets generated by Breneman and Rhem [J. Comput. Chem. 18 (1997) 182–197] contains 118 descriptors and the HPLC capacity factors [log(k′)] in an ODS column for 22 compounds. One way to improve the search for good relationships is to prescreen the data set and, ideally, remove redundant descriptors. This paper examines six different methods of prescreening a data set. Each method is used to generate multiple subsets of the data, either by using different screening thresholds or by producing sets with a given number of descriptors. Each subset is then examined in two different ways. The first uses a previously described Evolutionary Programming method to generate multiple relationships. The second determines which relationships could be obtained from the reduced set of descriptors and ranks them relative to the top 250 relationships obtained from an All Possible Sets search of the full data set. |
| |
Keywords: | Data set screening; Quantitative structure/property relationship; Evolutionary programming |
|
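The abstract mentions prescreening with a threshold to remove redundant descriptors but does not name the six methods. As an illustration only, the sketch below shows one common form of threshold screening: a descriptor is dropped when its absolute Pearson correlation with an already retained descriptor reaches a chosen cutoff. The function names (`pearson`, `screen_descriptors`), the 0.95 cutoff, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant descriptor has no defined correlation; treat it as uncorrelated.
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_descriptors(columns, threshold=0.95):
    """Greedy screen: keep a descriptor only if |r| with every kept one is below threshold.

    `columns` maps a descriptor name to its values over all compounds.
    Descriptors are visited in insertion order, so the first member of a
    highly correlated group is the one that survives.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: three descriptors measured on five compounds.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of d1, hence redundant
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly correlated with d1
}
selected = screen_descriptors(toy, threshold=0.95)
print(selected)
```

Lowering the threshold yields smaller descriptor subsets, which mirrors the idea of generating multiple subsets with different screening thresholds. The result depends on the order in which descriptors are visited, so a real study would need a deliberate rule for which member of a correlated group to keep.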