Similar literature
Found 20 similar documents; search took 46 ms
1.
Two data fusion strategies (variable level and decision level) combined with a multivariate classification approach (Partial Least Squares-Discriminant Analysis, PLS-DA) have been applied to benefit from the synergistic effect of the information obtained from two spectroscopic techniques: UV-visible and 1H NMR. Variable level data fusion consists of merging the spectra obtained from each spectroscopic technique into a so-called “meta-spectrum” and then applying the classification technique. Decision level data fusion combines the results of applying the classification technique to each spectroscopic technique individually. Among the possible ways of combining them, we have used fuzzy aggregation connective operators. This procedure has been applied to determine banned dyes (Sudan III and IV) in culinary spices. The results show that data fusion is an effective strategy, since the classification results are better than the individual ones: between 80 and 100% for the individual techniques and between 97 and 100% with the two fusion strategies.
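The two fusion levels described in this abstract can be illustrated with a minimal sketch. The abstract does not say which fuzzy connectives were used, so the four operators below (minimum, maximum, product, mean) are common textbook choices assumed for illustration; the scores, the 0.5 threshold and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical membership scores in [0, 1] for the "adulterated" class,
# one per sample, as two separately trained classifiers might report them.
uv_vis_scores = [0.90, 0.40, 0.20, 0.70]
nmr_scores = [0.80, 0.70, 0.10, 0.20]

# Assumed fuzzy aggregation connectives; the source does not list its own.
AGGREGATORS = {
    "minimum": min,                     # conjunctive: both techniques must agree
    "maximum": max,                     # disjunctive: one technique suffices
    "product": lambda a, b: a * b,      # conjunctive, penalizes weak scores
    "mean": lambda a, b: mean([a, b]),  # compromise between the two
}


def fuse_decisions(scores_a, scores_b, operator, threshold=0.5):
    """Decision-level fusion: aggregate per-sample scores, then threshold."""
    aggregate = AGGREGATORS[operator]
    fused = [aggregate(a, b) for a, b in zip(scores_a, scores_b)]
    labels = [score >= threshold for score in fused]
    return fused, labels


def fuse_variables(spectrum_a, spectrum_b):
    """Variable-level fusion: concatenate two spectra into one meta-spectrum."""
    return list(spectrum_a) + list(spectrum_b)


if __name__ == "__main__":
    for name in AGGREGATORS:
        print(name, fuse_decisions(uv_vis_scores, nmr_scores, name))
```

In practice the blocks of a meta-spectrum are usually scaled before concatenation so that neither technique dominates; that step is omitted here.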

2.
The freshness of virgin olive oils (VOO) from typical cultivars of Garda regions was evaluated by attenuated total reflectance (ATR) Fourier transform infrared (FTIR) spectroscopy, in combination with multivariate analysis. Olive oil freshness decreases during storage, mainly because of oxidation processes. In this research, 91 virgin olive oils were packaged in glass bottles and stored either in the light or in the dark at room temperature for different periods. The oils were analysed, before and after storage, using both chemical methods and the spectroscopic technique. The classification strategies investigated were partial least squares discriminant analysis (PLS-DA), linear discriminant analysis (LDA), and soft independent modelling of class analogy (SIMCA). The results show that ATR-MIR spectroscopy is an interesting technique compared with traditional chemical indices for classifying olive oil samples stored in different conditions. In fact, PCA of the FTIR data allowed a better discrimination between fresh and oxidized oils than the sample separation obtained by PCA applied to the chemical data. Moreover, the results obtained by the different classification techniques (PLS-DA, LDA, SIMCA) evidenced the ability of FTIR spectra to evaluate olive oil freshness. The FTIR spectroscopy results are in agreement with classical methods. The spectroscopic technique could be applied to predict VOO freshness, giving information related to chemical modifications. The great advantages of this technique, compared to chemical analysis, are its rapidity, non-destructive character and low cost per sample. In conclusion, ATR-MIR represents a reliable, cheap and fast classification tool able to assess the freshness of virgin olive oils.

3.
The potential of a vanguard technique, ion mobility spectrometry with ultraviolet ionization (UV-IMS) coupled to a continuous flow system (CFS), has been demonstrated in this work using a gas phase separator (GPS). This vanguard system (CFS-GPS-UV-IMS) has been used to analyse different types of white wines in order to obtain a characteristic profile for each type of wine and to classify them subsequently using different chemometric tools. The precision of the method was 3.1%, expressed as relative standard deviation. A thorough chemometric study was carried out for the classification of the four types of wines selected. The best classification performance was obtained by first reducing the data dimensionality by principal component analysis (PCA), followed by linear discriminant analysis (LDA), and finally using a k-nearest neighbour (kNN) classifier. The classification rate on an independent validation set was 92.0%, with a confidence interval of [89.0%, 95.0%] at the 95% confidence level. The same white wines analyzed using CFS-GPS-UV-IMS were also analyzed using gas chromatography with a flame ionization detector (GC-FID) as the conventional technique. The chromatographic method for the determination of higher alcohols in wine samples given in Regulation CEE 1238/1992 was selected to analyse the same sample set, which was then classified using appropriate chemometric tools. In this case, the PCA-LDA strategy and the kNN classifier were also used to classify the wine samples. This combination showed results similar to those obtained with the proposed method.

4.
The ever-increasing consumer interest in the safety, authenticity and quality of food commodities has drawn attention to the analytical techniques used for analyzing these commodities. In recent years, rapid and reliable sensor, spectroscopic and chromatographic techniques have emerged that, together with multivariate and multiway chemometrics, have improved the whole control process by reducing the time of analysis and providing more informative results. In this progression towards more and better information, the combination (fusion) of the outputs of different instrumental techniques has emerged as a means of increasing the reliability of classification or prediction of foodstuff specifications compared to using a single analytical technique. Although promising results have been obtained in food and beverage authentication and quality assessment, the combination of data from several techniques is not straightforward and represents an important challenge for chemometricians. This review provides a general overview of data fusion strategies that have been used in the field of food and beverage authentication and quality assessment.

5.
The combination of different data sources for classification purposes, also called data fusion, can be done at different levels: low-level, i.e. concatenating data matrices; medium-level, i.e. concatenating data matrices after feature selection; and high-level, i.e. combining model outputs. In this paper the predictive performance of high-level data fusion is investigated. Partial least squares is used on each of the data sets, and dummy variables representing the classes are used as response variables. Based on the estimated responses ŷ(j) for data set j and class k, a Gaussian distribution p(g(k)|ŷ(j)) is fitted. A simulation study is performed that shows the theoretical performance of high-level data fusion for two classes and two data sets. Within-group correlations of the predicted responses of the two models, and differences between the predictive ability of each of the separate models and the fused models, are studied. Results show that the error rate is always less than or equal to that of the best performing subset and can theoretically approach zero. Negative within-group correlations always improve the predictive performance. However, if the data sets have a joint basis, as with metabolomics data, this is not likely to happen. For equally performing individual classifiers, the best results are expected for small within-group correlations. Fusion of a non-predictive classifier with a classifier that exhibits discriminative ability leads to increased predictive performance if the within-group correlations are strong. An example with real-life data shows the applicability of the simulation results.
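The high-level fusion step in this abstract can be sketched roughly as follows: a Gaussian is fitted to each model's predicted responses per class, and the class densities are combined into a fused posterior. This is only a simplified illustration. It assumes the two data sets are conditionally independent given the class (a product rule), whereas the paper explicitly studies within-group correlations; the function names and the toy numbers are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def fit_fusion_model(train_responses, train_labels):
    """Fit one Gaussian per (data set, class) to the predicted responses.

    train_responses: one list of predicted responses per data set j.
    Returns model[j][k] = (mean, standard deviation).
    """
    classes = sorted(set(train_labels))
    model = []
    for responses in train_responses:
        per_class = {}
        for k in classes:
            values = [r for r, g in zip(responses, train_labels) if g == k]
            per_class[k] = (mean(values), stdev(values))
        model.append(per_class)
    return model


def fused_posterior(model, new_responses, priors=None):
    """Combine per-data-set class densities with a product rule.

    Assumption: responses from different data sets are independent
    given the class, i.e. zero within-group correlation.
    """
    classes = sorted(model[0])
    priors = priors or {k: 1.0 / len(classes) for k in classes}
    joint = {}
    for k in classes:
        likelihood = priors[k]
        for per_class, y in zip(model, new_responses):
            mu, sigma = per_class[k]
            likelihood *= gaussian_pdf(y, mu, sigma)
        joint[k] = likelihood
    total = sum(joint.values())
    return {k: value / total for k, value in joint.items()}


# Toy predicted responses from two hypothetical PLS models (dummy coding 0/1).
labels = [0, 0, 0, 1, 1, 1]
responses = [
    [0.1, 0.0, 0.2, 0.9, 1.0, 0.8],  # data set 1
    [0.3, 0.1, 0.2, 0.7, 0.9, 0.8],  # data set 2
]
fusion_model = fit_fusion_model(responses, labels)
```

Calling `fused_posterior(fusion_model, [0.15, 0.2])` returns a dictionary of class probabilities that sums to one. Modelling correlated responses would require a bivariate Gaussian per class instead of the product used here.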

6.
Prediction of drug–disease associations is one of the current fields in drug repositioning that has turned into a challenging topic in pharmaceutical science. Several available computational methods use network-based and machine learning approaches to reposition old drugs for new indications. However, they often ignore features of drugs and diseases, as well as the priority and importance of each feature, the relations or interactions between features, and the degree of uncertainty. When predicting unknown drug–disease interactions, there are diverse data sources and multiple features available that can provide more accurate and reliable results. This information can be collectively mined using data fusion methods and aggregation operators. Therefore, we can use the feature fusion method to make high-level features. We have proposed a computational method named scored mean kernel fusion (SMKF), which uses a new method to score the average aggregation operator, called scored mean. To predict novel drug indications, this method systematically combines multiple features related to drugs or diseases at two levels: the drug–drug level and the drug–disease level. The purpose of this study was to investigate the effect of drug and disease features, as well as data fusion, on the prediction of drug–disease interactions. The method was validated against a well-established drug–disease gold-standard dataset. Compared with the available methods, our proposed method performed better, with an area under the curve (AUC) of 0.91, an F-measure of 84.9% and a Matthews correlation coefficient of 70.31%.
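For readers unfamiliar with the three metrics quoted above, the sketch below computes them from their standard definitions. It is not the authors' evaluation code, and the confusion-matrix counts and scores in the example are made up.

```python
from math import sqrt


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal count is zero."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def auc(positive_scores, negative_scores):
    """Area under the ROC curve as the fraction of correctly ranked
    (positive, negative) pairs; ties count as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Made-up counts: 40 true positives, 45 true negatives, 5 FP, 10 FN.
    print(round(f_measure(40, 5, 10), 4))
    print(round(matthews_corrcoef(40, 45, 5, 10), 4))
    print(auc([0.9, 0.8], [0.7, 0.85]))
```

The pairwise AUC is quadratic in the number of samples; it is used here only because it makes the definition explicit.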

7.
Dual-domain classification analysis is proposed to identify pigments used in works of art studied by Raman spectroscopy and X-ray fluorescence spectrometry. By means of this methodology, Raman and X-ray fluorescence data are jointly processed by a high-level fusion approach. The proposed system aims to avoid the pre-processing stage and to process raw data obtained from the instrument directly. The system is tested with spectra contaminated with background components of different shapes and intensities, and with spectra whose background has been removed by line segment correction. The benefits of the approach were demonstrated in a study of ochre pigment classification. The approach is based on the main advantage of the wavelet transform, which is multiresolution. Each spectrum is split into blocks, according to a specific frequency, to form a wavelet prism. Partial least squares-discriminant analysis (PLS-DA) is then applied to those blocks which contain the deterministic part of the signal and are not influenced by noise and background signal components. Finally, to obtain the final classification assignment, high-level data fusion of the classification results (decision level) obtained from the PLS-DA analysis is done by means of fuzzy aggregation connective operators. Our study showed that fuzzy aggregation may be suitable for performing high-level data fusion on dual-domain data. This method can be automated so that classification can be rapid. It can handle classifications with different levels of difficulty and requires no prior knowledge of sample composition.

8.
The possibility provided by chemometrics of extracting and combining (fusing) the information contained in NIR and MIR spectra in order to discriminate monovarietal extra virgin olive oils according to olive cultivar (Casaliva, Leccino, Frantoio) has been investigated. Linear discriminant analysis (LDA) was applied as a classification technique to these multivariate and non-specific spectral data, both separately and jointly (NIR and MIR data together). In order to ensure a more appropriate ratio between the number of objects (samples) and the number of variables (absorbance at different wavenumbers), LDA was preceded either by feature selection or by variable compression. For feature selection, the SELECT algorithm was used, while a wavelet transform was applied for data compression. Correct classification rates obtained by cross-validation varied between 60% and 90%, depending on the procedure followed. The most accurate results were obtained using the fused NIR and MIR data, with either feature selection or data compression. Chemometric strategies applied to fused NIR and MIR spectra represent an effective method for classifying extra virgin olive oils on the basis of olive cultivar.

9.
ECS: an automatic enzyme classifier based on functional domain composition   Total citations: 2, self-citations: 1, citations by others: 1
Classification of enzymes is a prerequisite for understanding their function. Here, an automatic enzyme identifier based on a support vector machine (SVM) with feature vectors derived from protein functional domain composition was built to identify enzymes, followed by a classifier that assigns enzymes to six different classes: oxidoreductase, transferase, hydrolase, lyase, isomerase and ligase. A jackknife cross-validation test was adopted to evaluate the performance of our classifier. A success rate of 86.03% was achieved for enzyme/non-enzyme identification and 91.32% for enzyme classification, which is much better than that of the BLAST- and PSI-BLAST-based methods and also outperforms several existing works. The results indicate that protein functional domain composition is able to capture the major features that facilitate the identification and classification of proteins, thus demonstrating that our predictor could be a more effective and promising high-throughput method in enzyme research. Moreover, a web-based software tool, Enzyme Classification System (ECS), for the identification and classification of enzymes can be accessed at: http://pcal.biosino.org/.

10.
Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that need to be classified into specific taxa. Most composition-based classification methods use all features instead of a subset of features that may maximize classifier accuracy. We show that feature selection methods can boost the performance of taxonomic classifiers. This work proposes three different filter-based feature selection methods that stem from information theory: (1) a technique that combines Kullback-Leibler, Mutual Information, and distance information, (2) a text mining technique, TF-IDF, and (3) minimum redundancy-maximum relevance (mRMR). The feature selection methods are compared by how well they improve support vector machine classification of genomic reads. Overall, the 6mer mRMR method performs well, especially at the phylum level. If the total number of features is very large, feature selection becomes difficult because a small subset of features that captures a majority of the data variance is less likely to exist. Therefore, we conclude that there is a trade-off between feature set size and feature selection method when optimizing classification performance. For larger feature set sizes, TF-IDF works better at finer resolutions, while mRMR performs the best of any method for N=6 at all taxonomic levels.
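To make the TF-IDF option concrete, here is a minimal sketch of how it could rank k-mer features. The abstract does not describe how TF-IDF was adapted to genomic reads, so treating each fragment as a "document" and each k-mer as a "term", and keeping the k-mers with the highest score in any fragment, are assumptions made for illustration; all function names are hypothetical.

```python
from collections import Counter
from math import log


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def tfidf_scores(sequences, k):
    """TF-IDF per fragment: term frequency times log inverse document frequency."""
    counts = [kmer_counts(s, k) for s in sequences]
    n_docs = len(sequences)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each fragment counts once per k-mer
    scores = []
    for c in counts:
        total = sum(c.values())
        scores.append({
            kmer: (n / total) * log(n_docs / doc_freq[kmer])
            for kmer, n in c.items()
        })
    return scores


def select_features(sequences, k, n_features):
    """Keep the k-mers with the highest TF-IDF score in any fragment."""
    best = Counter()
    for fragment_scores in tfidf_scores(sequences, k):
        for kmer, value in fragment_scores.items():
            best[kmer] = max(best[kmer], value)
    return [kmer for kmer, _ in best.most_common(n_features)]


if __name__ == "__main__":
    reads = ["AAAA", "AAAC"]
    print(select_features(reads, k=2, n_features=1))
```

A k-mer that occurs in every fragment gets a score of zero, which is why TF-IDF favours k-mers that discriminate between fragments.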

11.
Five different instrumental techniques (thermogravimetry, and mid-infrared, near-infrared, ultraviolet and visible spectroscopies) have been used to characterize a high quality beer (Reale) from an Italian craft brewery (Birra del Borgo) and to differentiate it from other competing and lower quality products. Chemometric classification models were built on the separate blocks using soft independent modeling of class analogies (SIMCA) and partial least squares-discriminant analysis (PLS-DA), obtaining good predictive ability on an external test set (75% or higher depending on the technique). The use of data fusion strategies, in particular the mid-level one, to integrate the data from the different platforms allowed the correct classification of all the training and validation samples.

12.
In recent years, there have been a number of reported studies on the use of non-destructive techniques to evaluate and determine mango maturity and ripeness levels. However, most of these reported works were conducted using single-modality sensing systems, either an electronic nose (e-nose), acoustics, CCD, IR sensor or other non-destructive measurements. This paper presents work on the classification of the maturity and ripeness levels of mangoes (Mangifera indica cv. Harumanis) using data fusion of an electronic nose (e-nose) and an acoustic sensor, combined with CCD and IR sensors. A Fourier-based shape separation method was developed from CCD camera images to grade mangoes by shape, and it classified 100% of them correctly. Colour intensity from infrared images was used to distinguish and classify the level of maturity and ripeness of the fruits. The findings show 92% correct classification of maturity levels using infrared vision. Three groups of samples, each from two different harvesting times (week 7 and week 8), were evaluated by the e-nose and then by the acoustic sensor. By applying a low-level data fusion technique to the e-nose and acoustic data, the classification of maturity and ripeness levels using LDA was improved.

13.
14.
Despite the tremendous progress in pan-cancer molecular analysis, little is known regarding the molecular classification of cervical squamous cell carcinoma. In this study, we adopted a multi-omics approach to identify potential key classification features of cervical squamous cell carcinoma. Specifically, we analyzed mRNA and microRNA (miRNA) expression data, as well as DNA methylation and copy number variation, in cervical squamous cell carcinoma cases, using datasets obtained from The Cancer Genome Atlas (TCGA). Moreover, we identified molecules in each dimension, integrated and clustered the filtered classification features, and used them to distinguish different subtypes. The resulting key classification features were used to establish a classification model for cervical squamous cell carcinoma. Our results revealed two cervical squamous cell carcinoma subtypes, with significant differences in clinical survival, as well as 8 key classification features of cervical squamous cell carcinoma. These findings are expected to provide important references for the early classification of cervical squamous cell carcinoma and the identification of classification markers.

15.
The objective of the study was to check the authenticity of Hungarian honey using physicochemical analysis, near infrared spectroscopy, and melissopalynology. In the study, 87 samples from different botanical origins such as acacia, bastard indigo, rape, sunflower, linden, honeydew, milkweed, and sweet chestnut were collected. The samples were analyzed by physicochemical methods (pH, electrical conductivity, and moisture), melissopalynology (300 pollen grains counted), and near infrared spectroscopy (NIRS: 740–1700 nm). During the evaluation of the data, PCA-LDA models were built for the classification of different botanical and geographical origins, using the methods separately and in combination (low-level data fusion). PC number optimization and external validation were applied for all the models. Botanical origin classification models were >90% and >55% accurate in the case of the pollen and NIR methods, respectively. Improved results were obtained with the combination of the physicochemical, melissopalynology, and NIRS techniques, which provided >99% and >81% accuracy for the botanical and geographical origin classification models, respectively. The combination of these methods could be a promising tool for origin identification of honey.

16.
The voltammetric responses of selected white wines of different vintages and origins have been systematically collected by three different modified electrodes, in order to check their effectiveness in performing blind analysis of similar matrices. The electrode modifiers consist of a conducting polymer, namely poly(3,4-ethylenedioxythiophene) (PEDOT), and of composite materials of Au and Pt nanoparticles embedded in a PEDOT layer. Wine samples have been tested, without any prior treatment, using differential pulse voltammetry. The subsequent chemometric analysis has been carried out both separately on the signals of each sensor, and on the signals of two or even three sensors as a single set of data, in order to check the possible complementarity of the information brought by the different electrodes. After a preliminary inspection by principal component analysis, classification models have been built and validated by partial least squares-discriminant analysis. The discriminant capability has been evaluated in terms of the sensitivity and specificity of classification; in all cases quite good results have been obtained.

17.
The effectiveness of a regression method strongly depends on the characteristics of the regression problem considered. As a consequence, it is difficult to choose a priori the most appropriate algorithm for a given dataset. This issue is addressed in this work through a novel regression approach based on the fusion of an ensemble of different regressors. In order to implement the proposed robust multiple system (RMS), four different fusion strategies are explored. In this context, we propose a novel fusion strategy named selection-based strategy (SBS) that provides as output the estimate obtained by the regression algorithm (included in the ensemble) characterized by the highest expected accuracy in the region of the feature space associated with the considered model. The SBS is based not on a direct combination of the estimates yielded by all the regressors but on a selection mechanism that identifies the expected best available estimate. For this purpose, it exploits the accuracies of the regressors included in the ensemble in different portions of the input feature space. The experimental assessment of the RMS was carried out on three different datasets: a wine, an orange juice, and an apple dataset. The experimental results obtained suggest that, in general, the fusion of an ensemble of different regression algorithms leads to a regression process that is more robust and sometimes also more accurate than traditional regression methods. In particular, the proposed SBS method represents an effective solution for carrying out the fusion process. Copyright © 2012 John Wiley & Sons, Ltd.
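The selection mechanism described above can be illustrated with a small sketch. The abstract does not say how the local accuracy of each regressor is estimated, so the approach below (mean absolute error on the k validation samples nearest to the query, with a single input feature) is an assumption made for illustration, and the regressors, data and function names are hypothetical.

```python
def local_error(regressor, query, validation, k):
    """Mean absolute error of a regressor on the k validation samples
    closest to the query point (one-dimensional feature for simplicity)."""
    nearest = sorted(validation, key=lambda xy: abs(xy[0] - query))[:k]
    return sum(abs(regressor(x) - y) for x, y in nearest) / len(nearest)


def select_and_predict(regressors, query, validation, k=3):
    """Selection-based fusion: output the estimate of the regressor that
    is expected to be most accurate near the query, not an average."""
    best = min(
        regressors,
        key=lambda name: local_error(regressors[name], query, validation, k),
    )
    return best, regressors[best](query)


# Two toy regressors, each accurate in a different region of the input space.
ensemble = {
    "linear": lambda x: float(x),
    "constant": lambda x: 5.0,
}

# Validation pairs (x, y): y follows x for small x and saturates at 5.
validation_set = [(0, 0), (1, 1), (2, 2), (3, 3), (6, 5), (7, 5), (8, 5), (9, 5)]

if __name__ == "__main__":
    print(select_and_predict(ensemble, 1.5, validation_set))
    print(select_and_predict(ensemble, 7.5, validation_set))
```

The point of the design is that the output always comes from one member of the ensemble, so a regressor that is poor in one region cannot degrade the estimate there.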

18.
We describe a new coating method, the Laminar Flow Coating (LFC) technique, developed to obtain highly reflective (HR), laser-damage-resistant sol-gel multidielectric coatings. Such coatings are used in high-power lasers for inertial confinement fusion (ICF) experiments. This technique uses substrates in an upside-down position, and a travelling wave of coating solution is transported with a laminar motion under the substrate surface by a tubular dispense unit. This creates a thin-film coating by solvent evaporation. Satisfactory results have been obtained on 20-cm square glass substrates regarding optical performance, thickness uniformity, edge effects and laser damage resistance. This deposition technique combines the advantages of both classical techniques: the non-exclusive substrate geometry of dip-coating and the small solution consumption of spin-coating. The association of sol-gel colloidal suspensions with the LFC coating process has been demonstrated to be a promising way to produce inexpensive specific optical coatings [1].

19.
This Concepts article describes enabling techniques (solid-phase assisted synthesis, new reactor design, microwave irradiation and new solvents) in organic chemistry and emphasizes the combination of several of them for creating new synthetic technology platforms. Particular focus is put on the combination of immobilized catalysts, as well as biocatalysts, with continuous flow processes. In this context, the PASSflow continuous flow technique fulfils both chemical and chemical engineering requirements. It combines reactor design with optimized, monolithic solid phases and reversible immobilization techniques for performing small as well as large scale synthesis with heterogenized catalysts under continuous flow conditions.

20.
Recently we proposed a new variable selection algorithm for classification problems, based on the concept of clustering of variables (CLoVA). Here the same concept has been applied to a regression problem, and the results obtained have been compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into different clusters via clustering algorithms. Then, the spectral data of each cluster are subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) have been used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses a combination of clusters, has been proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful parts from the redundant ones, so that a stable model can be built on the informative clusters.

