Dimension Concepts and Reduced Dimensions in Toxicological QShAR Databases as Tools for Data Quality Assessment
Authors: Paul G Mezey, Peter Warburton, E Jako, Zsolt Szekeres
Institution: (1) Mathematical Chemistry Research Unit, Department of Chemistry and Department of Mathematics and Statistics, University of Saskatchewan, 110 Science Place, Saskatoon, SK, Canada, S7N 5C9; (2) Institute for Advanced Study, Collegium Budapest, Szentháromság u. 2, 1014 Budapest, Hungary; (3) Committee for Data in Science and Technology, CODATA Secretariat, CODATA (ICSU/UNESCO), 51 Bd de Montmorency, 75016 Paris, France; (4) Institute of Chemistry, University of Budapest, Pázmány Péter Sétány 2, Budapest, Hungary
Abstract: The dimensions of databases can be defined based on a variety of concepts, ranging from the standard tools of principal component analysis to context-biased approaches. The effective dimensions of databases, in particular the effective dimensions involving continua such as electron density data, provide a set of important tools for database comparisons and for the evaluation of some aspects of database quality. The problems associated with database comparisons and database mergers, such as those occurring in the process of database unification in the actual merger of two pharmaceutical companies, provide challenging tasks and opportunities for data science. Some of the tools for effective dimension reduction and dimension expansion are reviewed in the context of database quality control, and conditions for database compatibility are presented. A common misconception affecting data sampling techniques for data quality evaluation is discussed, and methods for circumventing the associated sampling errors are described.
Keywords: QShAR (Quantitative Shape–Activity Relations) databases; database dimension; reduced dimensions; database quality assessment; sampling errors in high dimensions
This article is indexed in SpringerLink and other databases.