Dimension Concepts and Reduced Dimensions in Toxicological QShAR Databases as Tools for Data Quality Assessment |
Authors: | Paul G. Mezey, Peter Warburton, E. Jako, Zsolt Szekeres |
Institution: | (1) Mathematical Chemistry Research Unit, Department of Chemistry and Department of Mathematics and Statistics, University of Saskatchewan, 110 Science Place, Saskatoon, SK, Canada, S7N 5C9; (2) Institute for Advanced Study, Collegium Budapest, Szentháromság u. 2, 1014 Budapest, Hungary; (3) Committee for Data in Science and Technology, CODATA Secretariat, CODATA (ICSU/UNESCO), 51 Bd de Montmorency, 75016 Paris, France; (4) Institute of Chemistry, University of Budapest, Pázmány Péter Sétány 2, Budapest, Hungary |
Abstract: | The dimensions of databases can be defined using a variety of concepts, ranging from the standard tools of principal component analysis to context-biased approaches. The effective dimensions of databases, in particular those involving continua such as electron density data, provide important tools for comparing databases and for evaluating some aspects of database quality. The problems of database comparison and database merger, such as those arising during database unification in an actual merger of two pharmaceutical companies, pose challenging tasks and offer opportunities for data science. Some of the tools for effective dimension reduction and dimension expansion are reviewed in the context of database quality control, and conditions for database compatibility are presented. A common misconception affecting data sampling techniques for data quality evaluation is discussed, and methods for circumventing the associated sampling errors are described. |
Keywords: | QShAR (Quantitative Shape–Activity Relations) databases; database dimension; reduced dimensions; database quality assessment; sampling errors in high dimensions |
Indexed in SpringerLink and other databases.
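The abstract names principal component analysis as the standard route to a database's dimension. The following is a minimal sketch of that standard PCA notion only. It assumes that each database record is a row of a numeric descriptor matrix and that "effective dimension" means the smallest number of principal components whose cumulative explained variance reaches a chosen threshold. The function name `pca_effective_dimension` and the 0.95 default are hypothetical illustrations, not the authors' definitions, and the electron-density-based dimension concepts of the paper are not reproduced here.

```python
import numpy as np


def pca_effective_dimension(data, variance_threshold=0.95):
    """Hypothetical PCA criterion: the smallest number of principal components
    whose cumulative explained-variance ratio reaches variance_threshold."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)  # center each descriptor column
    # Squared singular values of the centered matrix are proportional to the
    # variances along the principal axes, sorted in decreasing order.
    var = np.linalg.svd(x, compute_uv=False) ** 2
    total = var.sum()
    if total == 0.0:
        return 0  # all records identical: no spread in any direction
    ratio = np.cumsum(var) / total
    # First index where the cumulative ratio reaches the threshold, as a count;
    # clamped in case rounding keeps the last ratio just below 1.0.
    k = int(np.searchsorted(ratio, variance_threshold)) + 1
    return min(k, len(var))


# Four records in a 3-descriptor space that only vary within a plane.
records = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
print(pca_effective_dimension(records))
```

Comparing such an effective dimension before and after merging two descriptor tables is one simple way to see whether a merger adds genuinely new directions of variation. The threshold choice is arbitrary, which is one reason context-dependent definitions such as those discussed in the paper may be preferred.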