首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Consequences of sample size, variable selection, and model validation and optimisation, for predicting classification ability from analytical data
Authors:Richard G Brereton  
Institution:

aCentre for Chemometrics, School of Chemistry, University of Bristol, Cantocks Close, Bristol BS2 8DF, UK

Abstract:This article discusses problems of validating classification models especially in datasets where sample sizes are small and the number of variables is large. It describes the use of percentage correctly classified (%CC) as an indicator for success of a classification model. For small datasets, %CC should not be used uncritically and its interpretation depends on sample size. It illustrates the use of a common classification method, discriminant partial least squares (D-PLS) on a randomly generated dataset of 200 samples and 200 variables.

An aim of the classifier is to determine whether the null hypothesis (there is no distinction between two classes) can be rejected. Autoprediction gives an 84.5% CC. It is shown that, if there is variable selection, it must be performed independently on the training set to obtain a CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups.

Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in a model) and independent validation; to overcome this, the data should be split into three groups.

There are often difficulties with model building if validation and optimisation have been done on different groups of samples, especially using iterative methods, each group being modelled using properties, such as a different number of components or different variables.

Keywords:Classification model  Discriminant partial least squares  D-PLS  Optimisation  Percentage correctly classified  Validation  %CC
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号