Similar literature
Found 20 similar articles; search took 15 ms
1.
Gene expression data sets hold the promise of providing cancer diagnosis at the molecular level. However, using all the gene profiles for diagnosis may be suboptimal. Detecting molecular signatures not only reduces the number of genes needed for discrimination, but may also elucidate the roles they play in biological processes. Therefore, a central part of diagnosis is to detect a small set of tumor biomarkers that can be used for accurate multiclass cancer classification. This task calls for effective multiclass classifiers with a built-in biomarker selection mechanism. We propose the sparse optimal scoring (SOS) method for multiclass cancer characterization. SOS is a simple prototype classifier based on linear discriminant analysis, in which predictive biomarkers are determined automatically together with accurate classification. SOS thus differs from many other commonly used classifiers, in which gene preselection must be applied before classification. We obtain satisfactory performance when applying SOS to several public data sets.

2.
High-throughput DNA microarrays provide an effective approach to monitoring the expression levels of thousands of genes in a sample simultaneously. One promising application of this technology is the molecular diagnostics of cancer, e.g. to distinguish normal tissue from tumor or to classify tumors into different types or subtypes. One problem arising from the use of microarray data is how to analyze high-dimensional gene expression data, typically with thousands of variables (genes) and far fewer observations (samples). There is a need to develop reliable classification methods that make full use of microarray data and to evaluate accurately the predictive ability and reliability of the derived models. In this paper, discriminant partial least squares was used to classify different types of human tumors using four microarray datasets and showed good prediction performance. Four different cross-validation procedures (leave-one-out versus leave-half-out; incomplete versus full) were used to evaluate the classification model. Our results indicate that discriminant partial least squares using leave-half-out cross-validation provides a more realistic estimate of the predictive ability of a classification model, which may be overestimated by some of the cross-validation procedures, and that the information obtained from different cross-validation procedures can be used to evaluate the reliability of the classification model.
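
The two splitting schemes named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the number of repeats, and the use of random halves are assumptions, and the classifier itself (discriminant PLS) is left out.

```python
import random

def leave_one_out(n_samples):
    """Yield (train, test) index lists with exactly one held-out sample per fold."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]

def leave_half_out(n_samples, n_repeats, seed=0):
    """Yield random splits that hold out half of the samples (rounded down).

    Assumption: halves are drawn at random and repeated n_repeats times;
    the abstract does not say how the halves were chosen.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First half is held out for testing, the rest is used for training.
        yield sorted(shuffled[half:]), sorted(shuffled[:half])
```

Leave-half-out trains on much less data per split than leave-one-out, which is one plausible reason it gives the more conservative estimate the abstract reports.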

3.
Class prediction based on DNA microarray data has emerged as one of the most important applications of bioinformatics for diagnostics and prognostics. Robust classifiers are needed that use the most biologically relevant genes embedded in the data. A consensus approach that combines multiple classifiers has attributes that mitigate this difficulty compared with a single classifier. A new classification method, consensus analysis of multiple classifiers using non-repetitive variables (CAMCUN), was proposed for the analysis of hyper-dimensional gene expression data. The CAMCUN method combined multiple classifiers, each built from distinct, non-repeated genes that were selected for their effectiveness in class differentiation. Thus, CAMCUN utilized the most biologically relevant genes in the final classifier. The CAMCUN algorithm was demonstrated to give consistently more accurate predictions for two well-known datasets, for prostate cancer and leukemia. Importantly, the CAMCUN algorithm employed an integrated 10-fold cross-validation and randomization test to assess the degree of confidence of the predictions for unknown samples.

4.
This study describes the effective discrimination of extra virgin olive oils using HPLC-MS combined with chemometric evaluation. The presented method is simple, since the diluted oil sample is injected directly into the system without any preliminary chemical derivatization or purification step. Separation of diacylglycerols, triacylglycerols and sterols occurs within 20 min and is achieved using an octadecyl-silica column. Detection is performed by positive APCI mass spectrometry, which provided sufficient sensitivity to detect over 50 compounds in the sample. After data extraction, stepwise discriminant function analysis is used to select the variables with the highest discriminative power. These variables are used to perform linear discriminant analysis and to classify/predict the samples. One hundred per cent classification and a 99% prediction rate were achieved for olive oils obtained from the Nocellara, Biancolilla and Cerausola cultivars. The reliability of prediction was tested by cross-validation.

5.
Four cultivars of olives harvested in the Moroccan region of Beni Mellal were subjected to a characterization and classification study. Analytical data were collected by Fourier transform infrared spectroscopy (FTIR), applied to the mesocarp of the fresh olives without any preliminary treatment. The spectral data were pre-treated by derivative processing based on the Savitzky-Golay algorithm to reduce noise and enhance the analytical information. Partial least squares discriminant analysis (PLS-DA) was performed to process the measurement data and assess the discriminant features of the four cultivars. The PLS model was optimized by applying Martens' uncertainty test, which made it possible to select the vibrational frequencies giving the most useful information. The optimized model was able to separate the four classes and to classify new objects into the appropriate classes with a prediction rate of 97%. The proposed method is a novel way to classify olives of different varieties by means of a rapid, inexpensive and reliable procedure.
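
The Savitzky-Golay derivative pre-treatment mentioned here can be illustrated with the classic 5-point, quadratic-fit first-derivative filter. The window width and polynomial order are assumptions made for the sketch; the abstract does not state which settings were used.

```python
def savgol_first_derivative(y, spacing=1.0):
    """First derivative of an evenly sampled signal by a 5-point Savitzky-Golay filter.

    Uses the standard convolution weights (-2, -1, 0, 1, 2) / 10 for a
    quadratic fit. The two points at each end are dropped, so the result
    has len(y) - 4 values.
    """
    weights = (-2, -1, 0, 1, 2)
    derivative = []
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        smoothed_slope = sum(w * v for w, v in zip(weights, window))
        derivative.append(smoothed_slope / (10.0 * spacing))
    return derivative
```

Because the filter fits a local polynomial before differentiating, it removes a constant baseline offset from a spectrum while amplifying noise less than a simple point-to-point difference.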

6.
GC–MS and chemometric analysis of subcutaneous fat were used to classify three different feeding regimes of Iberian pigs. Nineteen fatty acids present in 57 fat samples were identified and quantified. Principal component analysis was employed for the preliminary study of the data structure. Discriminant analysis was used to classify samples into the three categories on the basis of their fatty acid profiles. Using a leave-one-out cross-validation procedure, only one fat sample, from a pig fed with commercial feed that simulated the fatty acid profile of free-range animals, was incorrectly classified as having been fed on acorns and pasture. Using external validation, all of the samples were correctly classified. The most decisive fatty acids for distinguishing between groups by discriminant analysis were C16:1, C18:1, C17:0 and C18:0 for the first discriminant function, and C18:3, C14:0, C15:0, C22:0, C16:1 and C22:1 for the second discriminant function, ordered from the highest to the lowest coefficient. Some of the fatty acids important in distinguishing between groups are not the ones used by Spanish legislation to classify pigs into the different feeding categories. The results in this paper demonstrate the potential of statistical data treatment in the classification of animal feeding regimes.

8.
Shen H, Carter JF, Brereton RG, Eckers C. The Analyst, 2003, 128(3): 287-292
Wet granulation and direct compression are two processes employed in tablet preparation. In this paper, pyrolysis-gas chromatography-mass spectrometry (Py-GC-MS) is used to discriminate these processes with the help of chemometric techniques. The data analysis procedure is as follows. First, deconvolute the Py-GC-MS data of each sample into concentration profiles and spectra, and then construct a matrix with each compound corresponding to one column; those contained only in a small number of samples are then removed. Second, the main principal components are kept after excluding three variables and one sample, and further processed by Fisher discriminant analysis. Third, the resultant data are assigned to classes using unsupervised and supervised classification methods. Results from cross-validation show that only 3 of 20 samples are misclassified by the Mahalanobis distance measure.
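
The Mahalanobis distance measure used in the final classification step can be sketched as below. This is an illustration restricted to two features so the matrix inverse can be written out by hand; the class names and the per-class (mean, covariance) layout are hypothetical, and the abstract does not say whether a pooled or per-class covariance was used.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from a class with the given mean and 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Quadratic form dx^T * inv * dx.
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

def classify(x, classes):
    """Assign x to the class (name -> (mean, cov)) with the smallest Mahalanobis distance."""
    return min(classes, key=lambda name: mahalanobis_2d(x, *classes[name]))
```

With an identity covariance the measure reduces to the ordinary Euclidean distance; a larger variance along one axis shrinks distances measured along that axis.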

10.
Headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography (GC) and multivariate data analysis was applied to classify different vinegar types (white and red wine, balsamic, sherry and cider vinegars) on the basis of their volatile composition. The collected chromatographic signals were analysed using stepwise linear discriminant analysis (SLDA), which performs feature selection and classification simultaneously. Several options, more or less restrictive according to the final number of categories considered, were explored in order to identify the one that afforded the highest discrimination ability. The simplicity and effectiveness of the proposed classification methodology (all the samples were correctly classified and predicted by cross-validation) are promising and support the feasibility of using a similar strategy to evaluate the quality and origin of vinegar samples in a reliable, fast, reproducible and cost-efficient way in routine applications. The high-quality results obtained were even more remarkable considering the small number of discriminant variables finally selected by the stepwise procedure. Only 14 peaks were needed to differentiate between cider, balsamic, sherry and wine vinegars, whereas only 3 variables were selected to discriminate between red wine (RW) and white wine (WW) vinegars. The subsequent identification by gas chromatography-mass spectrometry (GC-MS) of the volatile compounds associated with the discriminant peaks selected in the classification process served to interpret their chemical significance.

11.
Classical multivariate analysis techniques, such as factor analysis and stepwise linear discriminant analysis (SLDA), and artificial neural networks (ANN) have been applied to the classification of Spanish denomination of origin (DO) rosé wines according to their geographical origin. Seventy commercial rosé wines from four different Spanish DOs (Ribera del Duero, Rioja, Valdepeñas and La Mancha) and two successive vintages were studied. Nineteen different variables were measured in these wines. The SLDA model selected 10 variables, obtaining a global percentage of correct classification of 98.8% and a global prediction rate of 97.3%. The ANN model selected seven variables, five of which were also selected by the SLDA model, and gave 100% correct classification for both training and prediction. Both models can therefore be considered satisfactory, and the selected variables are useful for classifying and differentiating these wines by origin. Furthermore, the causal index analysis gave information that can be easily explained from an enological point of view.

12.
The origin of medieval glass artefacts is studied by using a supervised learning technique, which is shown to be helpful when samples cannot be identified by typical design and appearance. A set of seventy pieces of glass was analyzed for ten trace elements by optical emission spectrography. The data matrix of 33 known objects from five origins was evaluated by multivariate variance and discriminant analysis in a training step. The extracted non-elementary discriminant functions were used to classify the 37 unidentified samples. The classification result is discussed in terms of its cultural/historical information content.

13.
The use of chiral amino acid content and stepwise discriminant analysis to classify three types of commercial orange juices (i.e., nectars, orange juices reconstituted from concentrates, and pasteurized orange juices not from concentrates) is presented. Micellar electrokinetic chromatography with laser-induced fluorescence (MEKC-LIF) and beta-cyclodextrins are used to determine L- and D-amino acids previously derivatized with fluorescein isothiocyanate (FITC). This chiral MEKC-LIF procedure is easy to implement and provides information about the main amino acid content of orange juices (i.e., L-proline, L-aspartic acid, D-Asp, L-serine, L-asparagine, L-glutamic acid, D-Glu, L-alanine, L-arginine, D-Arg, and the non-chiral gamma-amino-n-butyric acid, GABA). These results clearly demonstrate that some D-amino acids occur naturally in orange juices. Application of stepwise discriminant analysis to 26 standard samples showed that the amino acids L-Arg, L-Asp and GABA were the most important variables for differentiating the three groups of samples. With these three selected amino acids, 100% correct classification of the samples was obtained by both standard and leave-one-out cross-validation procedures. The classification functions based on the content of L-Arg, L-Asp and GABA were also applied to nine test samples and provided an adequate classification and/or interesting information on these samples. It is concluded that chiral MEKC-LIF analysis of amino acids and stepwise discriminant analysis can be used as a consistent procedure to classify commercial orange juices, providing useful information about their quality and processing. To our knowledge, this is the first report on the combined use of chiral capillary electrophoresis and discriminant techniques to classify foods.

14.

The objective of this work was to assess the potential of capillary isotachophoretic profiling of organic acids, combined with multivariate statistical methods, to classify brandy and wine distillate samples. The leading electrolyte was 10 mmol L−1 hydrochloric acid containing 0.1% methylhydroxyethylcellulose, adjusted with β-alanine to pH 2.9. The terminating electrolyte was 5 mmol L−1 acetic acid. Principal component analysis, cluster analysis, and linear discriminant analysis were used for the classification of beverages. The results show that for the 12 acids analysed, 98.57% of the total variance is extracted by six principal components (PC). After performing backward linear discriminant analysis, a classification function was obtained containing four variables: formic (PC2-loadings: 0.989), lactic (PC1-loadings: 0.886), malic (PC1-loadings: 0.989) and oxalic (PC2-loadings: 0.777) acids, which provide 100.0% correct classification of brandies and wine distillates.

16.
The study presents the application of selected chemometric techniques (cluster analysis, principal component analysis, factor analysis and discriminant analysis) to classify river water quality and evaluate pollution data. Seventeen stations in the Bagmati river basin in Kathmandu Valley, Nepal, monitored for 16 physical and chemical parameters in 4 seasons during the period 1999-2003, were selected for this study. The results made it possible to determine natural clusters of monitoring stations with similar pollution characteristics and to identify the main discriminant variables that are important for regional water quality variation, as well as possible pollution sources affecting the river water quality. The analysis grouped the 17 monitoring sites into 3 regions with 5 major discriminating variables: EC, DO, CL, NO2N and BOD. The results revealed that some locations were strongly influenced by municipal contamination and others by minerals. This study demonstrated that chemometric methods are effective for river water classification and for rapid assessment of water quality using representative sites; they could serve to optimize cost and time without losing any significance of the outcome.

17.
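A PLS method for discriminant analysis of two-class samples   Total citations: 4, self-citations: 0, citations by others: 4
A new PLS method for two-class discriminant analysis is proposed. The method applies to the response function y a transformation similar to the sigmoid function used in neural network algorithms, and uses a new optimization criterion to extract a set of mutually orthogonal PLS latent variables t1, t2, ... These variables can be used to construct a discriminant classification plot and yield a fairly satisfactory discriminant direction vector.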

18.
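Application of structural descriptor orthogonalization and canonical correlation analysis to the mass spectral classification of saturated alcohols and ethers; saturated alcohols and ethers; pattern recognition; mass spectral classification variables; block variables; canonical correlation analysis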

19.
Researchers have demonstrated that Raman spectroscopy can be used to characterize tumor cells with excellent spatial resolution. However, a performance evaluation of different algorithms for multiclass classification of Raman spectra has not yet been reported. In this work, we present Raman spectra of nasopharyngeal carcinoma and normal nasopharyngeal cell lines. Using Student's t-test together with several multivariate approaches, including decision tree, support vector classification, and linear discriminant analysis, our work shows that the relative content of two bands sensitive to histological abnormality, at 1449 and 1658 cm−1, differs significantly between tumor cells and normal cells (p = 0.0132) and can serve as a biomarker to classify these cells. This difference is confirmed by importance analyses in the decision tree model. Furthermore, the performances of the statistical methods are compared with one another to explore their classification ability. The results show that the decision tree is more capable of distinguishing tumorous from normal cell lines, with a sensitivity and specificity of 99.0% and 96.9%, respectively. The findings further support our previous work and indicate that the decision tree performs more robustly in cell classification. Our work should prove helpful for the early diagnosis of nasopharyngeal carcinoma and suggests the decision tree as the primary algorithm for tumor-cell classification.
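
Sensitivity and specificity, the two figures reported for the decision tree, follow directly from the confusion counts. The sketch below uses the standard definitions; the label strings are hypothetical and no data from the study are reproduced.

```python
def sensitivity_specificity(y_true, y_pred, positive="tumor"):
    """Return (sensitivity, specificity) for a two-class problem.

    sensitivity = TP / (TP + FN): fraction of positive samples detected.
    specificity = TN / (TN + FP): fraction of negative samples correctly rejected.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)
```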

20.
Analytical Letters, 2012, 45(17): 2727-2738
The K-means algorithm has some limitations including dead-unit properties, heavy dependence on the initial choice of cluster centers, convergence to local optima, and sensitivity to the number of clusters. This paper presents an efficient algorithm that optimizes K-means clustering by a hybrid particle swarm algorithm. The modified discrete algorithm is used to select variables and is continuously applied to update cluster centers simultaneously. The nearest center classification is then employed to classify the test samples. The proposed algorithm was applied to discriminate various edible oil varieties by employing Fourier transform infrared spectroscopy. As a comparison, the common K-means clustering, principal component analysis, and partial least squares techniques were also applied to classify these edible oil samples. Results demonstrated that the proposed method is an accurate and rapid strategy for identifying edible oils.
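
The nearest-center classification step, together with the plain K-means center update it relies on, can be sketched as follows. This covers only the ordinary K-means pieces; the hybrid particle swarm optimization and variable selection described in the abstract are not reproduced, and the labels in the usage note are hypothetical.

```python
import math

def nearest_center(sample, centers):
    """Return the label of the cluster center closest to sample (Euclidean distance)."""
    return min(centers, key=lambda label: math.dist(sample, centers[label]))

def update_center(samples):
    """Plain K-means update: the new center is the coordinate-wise mean of its members."""
    n = len(samples)
    return tuple(sum(column) / n for column in zip(*samples))
```

For example, with `centers = {"olive": (0.0, 0.0), "sunflower": (10.0, 10.0)}`, a test spectrum reduced to the point `(1.0, 2.0)` would be assigned to `"olive"`.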
