Automated QSPR modeling and data curation of physicochemical properties using KNIME platform: Prediction of partition coefficients |
| |
Authors: | Bouhedjar Khalid Hamida Ghorab Abdelhamid Benkhemissa |
| |
Affiliation: | 1. Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), BP 73 UV 03, Ali Mendjeli Nouvelle Ville, Constantine, Algeria; 2. University of Constantine 1, Constantine, Algeria; 3. Centre de Recherche en Biotechnologie (CRBt), BP 73 UV 03, Ali Mendjeli Nouvelle Ville, Constantine, Algeria |
| |
Abstract: | Over the last two decades, the volume of chemical and biological data has increased constantly. Converting data sets into knowledge is both expensive and time-consuming. As a result, workflow technology, with platforms such as KNIME, was developed to facilitate searching multiple heterogeneous data sources, filtering by specific criteria, and extracting hidden information from these large data sets. Before any QSAR modeling, manual data curation is strongly recommended. This is practical for small datasets, but for the extensive data recently accumulated in public databases, manual processing is hardly feasible. In this work, we suggest using KNIME as an automated workflow solution for data curation and for the development and validation of predictive QSAR models from a huge dataset. In this study, we used 250250 structures from the NCI database; only 3520 compounds passed through our workflow with their corresponding experimental log P. This property was investigated as a case study, to improve some existing log P calculation algorithms. |
| |
Keywords: | QSAR; QSPR; Workflow; KNIME; NCI database; Data curation |
This article is indexed in ScienceDirect and other databases. |
|