A Method for Clustering and Screening of Long-dimensional Chemical Data Based on Fingerprints and Similarity Measurements |
Authors: | Manuel Urbano Cuadrado, Gonzalo Cerruela García, Irene Luque Ruiz, Miguel Ángel Gómez-Nieto |
Affiliation: | (1) Department of Computing and Numerical Analysis, University of Córdoba, Campus Universitario de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain |
Abstract: | This work presents a method for treating long-dimensional chemical data arrays, with the aim of maximising the performance of classification models. The method is based on the construction of fingerprints and the subsequent generation of a similarity matrix. The similarity calculation was modified through a scaling process that takes into account the different significance of the variables. The method was applied to spectral measurements of wines, and several aspects were studied, including the threshold used in the construction of fingerprints and patterns, the weighting factor used for scaling, and the normalisation method. Applying both Principal Components Analysis and Soft Independent Modelling of Class Analogy (SIMCA) to the similarity matrices gave better classifications than those obtained with the original data. |
Keywords: | data preparation; similarity calculation; fingerprints; clustering; screening |
This article is indexed in SpringerLink and other databases.
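The abstract describes the pipeline only at a high level: threshold-based fingerprints, a scaled similarity matrix, and then PCA or SIMCA applied to that matrix. The Python sketch below shows one way such a pipeline could look. It is not the authors' implementation. The min-max normalisation, the variance-based weights with exponent `factor`, the choice of a weighted Tanimoto coefficient, and all function names (`build_fingerprints`, `variance_weights`, `weighted_tanimoto`, `pca_scores`) are illustrative assumptions. SIMCA is not shown.

```python
import numpy as np


def build_fingerprints(spectra: np.ndarray, threshold: float) -> np.ndarray:
    """Turn each spectrum (one row per sample) into a binary fingerprint.

    Assumption: each spectrum is min-max normalised to [0, 1], and a bit is
    set wherever the normalised intensity exceeds `threshold`.
    """
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero for flat spectra
    normalised = (spectra - lo) / span
    return (normalised > threshold).astype(float)


def variance_weights(fingerprints: np.ndarray, factor: float = 1.0) -> np.ndarray:
    """Hypothetical scaling: weight each bit by its variance across samples, raised to `factor`.

    factor = 0 gives equal weights. With factor > 0, bits that never change
    across samples get weight 0 because they cannot discriminate.
    """
    return np.power(fingerprints.var(axis=0), factor)


def weighted_tanimoto(fingerprints: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Pairwise weighted Tanimoto similarity between binary fingerprints."""
    common = (fingerprints * weights) @ fingerprints.T  # sum_i w_i * a_i * b_i
    own = fingerprints @ weights                         # sum_i w_i * a_i
    denom = own[:, None] + own[None, :] - common
    safe = np.where(denom > 0, denom, 1.0)               # avoid 0/0 before selecting
    # Two fingerprints with no weighted bits set are treated as identical (similarity 1).
    return np.where(denom > 0, common / safe, 1.0)


def pca_scores(matrix: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Principal component scores of the rows of `matrix`, via SVD of the column-centred data."""
    centred = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    # Synthetic data: two groups of 4-channel "spectra" with peaks in different channels.
    spectra = np.array([
        [0.1, 0.9, 0.8, 0.1],
        [0.2, 1.0, 0.7, 0.0],
        [0.9, 0.1, 0.2, 0.8],
        [1.0, 0.0, 0.1, 0.9],
    ])
    fp = build_fingerprints(spectra, threshold=0.5)
    sim = weighted_tanimoto(fp, variance_weights(fp, factor=1.0))
    print(np.round(sim, 2))
    print(np.round(pca_scores(sim), 2))
```

PCA runs on the n × n similarity matrix rather than on the raw n × p spectra. As a result, each sample is described by how similar it is to every other sample. The threshold, the weighting exponent, and the normalisation step are the knobs that correspond to the aspects the abstract lists as having been studied.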