101.
Near infrared (NIR) spectroscopy is an efficient, low-cost analytical technique widely applied to identify the origin of food and pharmaceutical products. NIR spectra-based classification strategies typically use thousands of equally spaced wavelengths as input, some of which may carry no relevant information for product classification. In that case, such noisy wavelengths may undermine the performance of predictive and exploratory multivariate techniques. In this paper, we propose an iterative framework for selecting subsets of NIR wavelengths aimed at classifying samples into categories. To that end, we integrate Principal Components Analysis (PCA) with three classification techniques: k-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN) and Linear Discriminant Analysis (LDA). PCA is first applied to the NIR data, and a wavelength importance index is derived from the PCA loadings. Samples are then classified using only the wavelength with the highest index, and the classification accuracy is calculated; next, the wavelength with the second highest index is added to the dataset and a new classification is performed. This forward iterative procedure continues until all original wavelengths have been added to the dataset used for classification. The subset of wavelengths yielding the maximum accuracy is chosen as the recommended subset. The proposed framework performed remarkably well when applied to four datasets related to food and pharmaceutical products. Copyright © 2016 John Wiley & Sons, Ltd.
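The forward procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the importance index, so the code below assumes a variance-weighted sum of absolute PCA loadings over the first few components. It uses a plain k-NN classifier scored by leave-one-out accuracy in place of the paper's KNN, PNN and LDA comparison, and runs on synthetic data rather than NIR spectra. All function names and parameters (`wavelength_importance`, `knn_loo_accuracy`, `forward_select`, `n_components`, `k`) are hypothetical.

```python
import numpy as np

def wavelength_importance(X, n_components=2):
    """ASSUMED index: variance-weighted sum of absolute PCA loadings."""
    Xc = X - X.mean(axis=0)                      # center each wavelength
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s[:n_components] ** 2                  # proportional to explained variance
    weights = var / var.sum()
    # Rows of Vt are the loading vectors of the principal components.
    return (weights[:, None] * np.abs(Vt[:n_components])).sum(axis=0)

def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-NN classifier (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a sample cannot vote for itself
    neighbours = np.argsort(d, axis=1)[:, :k]
    correct = 0
    for i in range(len(y)):
        labels, counts = np.unique(y[neighbours[i]], return_counts=True)
        correct += int(labels[np.argmax(counts)] == y[i])
    return correct / len(y)

def forward_select(X, y, k=3, n_components=2):
    """Add wavelengths in decreasing index order; keep the best-scoring subset."""
    order = np.argsort(wavelength_importance(X, n_components))[::-1]
    accs = [knn_loo_accuracy(X[:, order[:m]], y, k)
            for m in range(1, len(order) + 1)]
    best = int(np.argmax(accs))                  # first maximum = smallest such subset
    return order[:best + 1], accs[best], accs

# Synthetic stand-in for spectra: 40 samples, 20 "wavelengths", 2 classes.
# Only columns 3 and 7 separate the classes (an assumption for the demo).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(0.0, 1.0, size=(40, 20))
X[y == 1, 3] += 6.0
X[y == 1, 7] += 6.0

selected, best_acc, accs = forward_select(X, y)
print("selected wavelength indices:", selected.tolist())
print("best leave-one-out accuracy:", best_acc)
```

Taking the first maximum of the accuracy curve favors the smallest subset that reaches the best accuracy. This is a design choice of the sketch; the abstract only says the subset with maximum accuracy is recommended.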