

Sesquiterpene lactones-based classification of the family Asteraceae using neural networks and k-nearest neighbors
Authors: Dimitar Hristozov, Fernando B. Da Costa, Johann Gasteiger
Institution: Computer-Chemie-Centrum, Universität Erlangen-Nürnberg, Nägelsbachstrasse 25, D-91052 Erlangen, Germany.
Abstract: In a recent publication we described the application of an unsupervised learning method, self-organizing maps, to the separation of three tribes and seven subtribes of the plant family Asteraceae based on a set of sesquiterpene lactones (STLs) isolated from individual species. In the present work, we used two structure representations, namely atom counts (2D) and the radial distribution function (RDF) code (3D), and two supervised classification methods, namely counterpropagation neural networks and k-nearest neighbors (k-NN), to predict the tribe in which a given STL occurs. The data set was extended from 144 to 921 STLs, and the number of Asteraceae tribes was increased from three to seven. The k-NN classifier with k = 1 performed best, and the RDF code outperformed the atom counts. The quality of the resulting model was assessed with two test sets, which illustrate two possible applications: (1) finding a plant source for a desired compound, and (2) using the chemical profile (STLs) of a plant species to (a) study the relationship between the current taxonomic classification and the plant's chemistry and (b) assign the species to a tribe by majority vote. In addition, the problem of defining the applicability domain of the models was addressed with two different approaches: principal component analysis combined with the Hotelling T² statistic, and a rule based on a posteriori probabilities.
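The tribe prediction and the species-level majority vote described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each STL has already been encoded as a fixed-length numeric descriptor vector (for example, atom counts or sampled RDF code values) and uses plain Euclidean distance. The function names `knn_predict` and `assign_species` and the tribe labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[Hashable],
                query: Vector, k: int = 1) -> Hashable:
    """Predict the tribe of one compound from its k nearest training compounds.

    Neighbours are visited from nearest to farthest, so when two tribes
    receive the same number of votes, Counter.most_common returns the one
    whose first vote came from the closer neighbour.
    """
    ranked = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def assign_species(train_x: Sequence[Vector], train_y: Sequence[Hashable],
                   species_compounds: List[Vector], k: int = 1) -> Hashable:
    """Assign a species to a tribe by majority vote over its compounds.

    Ties between tribes go to the tribe predicted first (a simplifying assumption).
    """
    predictions = [knn_predict(train_x, train_y, q, k) for q in species_compounds]
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D descriptors with hypothetical tribe labels (not data from the study).
    train_x = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
    train_y = ["tribe_A", "tribe_A", "tribe_B", "tribe_B"]
    print(knn_predict(train_x, train_y, (0.2, 0.1)))  # tribe_A
    print(assign_species(train_x, train_y, [(0.3, 0.1), (4.8, 5.1), (0.0, 0.4)]))  # tribe_A
```

With k = 1, the setting the abstract reports as best, each compound simply inherits the tribe of its most similar training compound, and the species-level vote then aggregates those per-compound predictions.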
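The 3D structure representation mentioned in the abstract can likewise be sketched. The function below follows the commonly used RDF code formulation, g(r) = f Σ_{i<j} A_i A_j exp(−B (r − r_ij)²), where r_ij is the distance between atoms i and j, A_i is an atomic property, B is a smoothing parameter and f is a scaling factor. The abstract does not give the settings used in the study, so the choice of atomic property, the default B and f, and the sampling radii are assumptions for illustration, and the function name `rdf_code` is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rdf_code(coords: Sequence[Point], props: Sequence[float],
             radii: Sequence[float], b: float = 100.0, f: float = 1.0) -> List[float]:
    """Sample the RDF code g(r) of one 3D structure at the given radii.

    g(r) = f * sum over atom pairs i<j of A_i * A_j * exp(-b * (r - r_ij)**2)
    coords: Cartesian coordinates of the atoms (e.g. in angstroms).
    props:  one atomic property value A_i per atom (assumption: caller chooses it).
    b:      smoothing parameter; larger values give sharper peaks.
    f:      overall scaling factor.
    """
    # Precompute the property product and interatomic distance for every pair.
    pairs = [(props[i] * props[j], math.dist(coords[i], coords[j]))
             for i, j in combinations(range(len(coords)), 2)]
    return [f * sum(a * math.exp(-b * (r - r_ij) ** 2) for a, r_ij in pairs)
            for r in radii]


if __name__ == "__main__":
    # Hypothetical three-atom fragment with unit atomic properties.
    coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
    radii = [0.5 + 0.1 * i for i in range(30)]  # assumed sampling grid
    print([round(g, 3) for g in rdf_code(coords, [1.0, 1.0, 1.0], radii)])
```

Sampling g(r) on one fixed set of radii turns molecules with different numbers of atoms into vectors of the same length, which is what lets a distance-based classifier such as k-NN compare them directly.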
This article is indexed in PubMed and other databases.