Similar articles
 Found 20 similar articles (search time: 9 ms)
1.
The two-dimensional linear discriminant analysis (2D-LDA) algorithm was originally proposed in the context of face image processing for the extraction of features with maximal discriminant power. However, despite its promising performance in image processing tasks, the 2D-LDA algorithm has not yet been used in applications involving chemical data. The present paper bridges this gap by investigating the use of 2D-LDA in classification problems involving three-way spectral data. The investigation was concerned with simulated data, as well as real-life data sets involving the classification of dry-cured Parma ham according to ageing by surface autofluorescence spectrometry and the classification of edible vegetable oils according to feedstock using total synchronous fluorescence spectrometry. The results were compared with those obtained by using the spectral data with no feature extraction, U-PLS-DA (Partial Least Squares Discriminant Analysis applied to the unfolded data), and LDA employing TUCKER3 or PARAFAC scores. In the simulated data set, all methods yielded a correct classification rate of 100%. However, in the Parma ham and vegetable oil data sets, better classification rates were obtained by using 2D-LDA (86% and 100%), compared with no feature extraction (76% and 77%), U-PLS-DA (81% and 92%), PARAFAC-LDA (76% and 86%) and TUCKER3-LDA (86% and 93%).

2.
Li-Juan Tang  Hai-Long Wu 《Talanta》2009,79(2):260-1694
One problem with discriminant analysis of microarray data is representation of each sample by a large number of genes that are possibly irrelevant, insignificant or redundant. Methods of variable selection are, therefore, of great significance in microarray data analysis. To circumvent the problem, a new gene mining approach is proposed based on the similarity between probability density functions on each gene for the class of interest with respect to the others. This method allows the ascertainment of significant genes that are informative for discriminating each individual class rather than maximizing the separability of all classes. Then one can select genes containing important information about the particular subtypes of diseases. Based on the mined significant genes for individual classes, a support vector machine with local kernel transform is constructed for the classification of different diseases. The combination of the gene mining approach with support vector machine is demonstrated for cancer classification using two public data sets. The results reveal that significant genes are identified for each cancer, and the classification model shows satisfactory performance in training and prediction for both data sets.

3.
Du W  Gu T  Tang LJ  Jiang JH  Wu HL  Shen GL  Yu RQ 《Talanta》2011,85(3):1689-1694
As a greedy search algorithm, classification and regression tree (CART) easily lapses into overfitting when modeling microarray gene expression data. A straightforward solution is to filter out irrelevant genes by identifying significant ones. Because some significant genes with multi-modal expression patterns, which exhibit systematic differences among within-class samples, are difficult to identify with existing methods, a strategy is proposed for CART modeling: unimodal transform of variables selected by interval segmentation purity (UTISP). First, significant genes exhibiting varied expression patterns are identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is applied to provide unimodal feature variables for CART modeling via feature extraction. Because significant genes with complex expression patterns can be identified and unimodal features extracted in advance, the strategy potentially improves the performance of CART in combating overfitting or underfitting when modeling microarray data. The strategy is demonstrated using two microarray data sets. The results reveal that UTISP-based CART outperforms k-nearest neighbors and CARTs coupled with other gene identification strategies, indicating that UTISP-based CART holds great promise for microarray data analysis.

4.
Aruga R 《Talanta》1998,47(4):1053-1061
When it is not possible to analyze an exactly reproducible amount of sample (or whenever samples contain indefinite amounts of extraneous materials), it is customary to normalize the data by making, for example, the sum of the concentrations obtained for each sample equal to 100. Although data normalized (or 'closed') in this manner have been criticized, it is shown empirically that closure is appropriate for comparing and classifying samples of the type indicated above.
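The closure step described in this abstract can be illustrated with a minimal sketch. The function name `close_to_100` and the sample values are hypothetical and chosen only for illustration; the abstract does not specify an implementation.

```python
def close_to_100(row):
    """Rescale one sample's concentrations so that they sum to 100."""
    total = sum(row)
    if total <= 0:
        raise ValueError("a sample must have a positive concentration sum")
    return [100.0 * value / total for value in row]


# Two hypothetical samples with identical composition but different
# (non-reproducible) amounts of material analyzed.
sample_a = [2.0, 6.0, 12.0]
sample_b = [1.0, 3.0, 6.0]

closed_a = close_to_100(sample_a)
closed_b = close_to_100(sample_b)
```

After closure the two samples become directly comparable, because only their relative composition is retained.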

5.
This paper develops a multi-parturition genetic algorithm (MPGA) to be used in geometrical bounding of the overlapped clusters in a data set for the classification of chemical data. Two new operators have been introduced to modify the conventional genetic algorithm, namely, multi-parturition and decimation and orientated creation to improve the linear classification results and diminish the computational time. To circumvent the difficulty commonly encountered in the treatment of linearly inseparable chemical data sets, the optimized linear classifier is further modified to provide a complementary nonlinear classifier. For this reason the space regions of the overlapped clusters have been bounded by erection of half-hyperellipsoids over the linearly misclassified patterns. The proposed MPGA was applied to classify a number of chemical and other data sets with a dimension from 4 to 14. Experimental results have indicated that the proposed MPGA could classify seriously overlapped data sets with an acceptable error rate.

6.
A simple process for the deposition of up to six different polymers in selected areas to be used as sensitive layers in chemical sensor arrays is presented. The process is based on photolithographic processes and takes advantage of the balance between UV exposure dose, material tone and developers used. The sensing properties of the deposited films in the array were characterized by in situ monitoring of volume expansion upon exposure to analytes using white light reflectance spectroscopy. The swelling properties of processed films are compared to those of unprocessed ones in order to examine the variation induced by the processing steps (exposure and development cycles). Additionally, the repeatability of the processes as well as the effect of analyte sequence is examined. This process offers good control of the lateral dimensions and the thickness of the polymeric films and allows for the parallel fabrication of sensors based on different transduction mechanisms, including mass-sensitive and stress-induced bending chemical sensors.

7.
The results of an attempt to construct images from multivariate analytical data are described. The method is based on addition of the features of a pattern in a Fourier polynomial and plotting the resulting summation in polar coordinates. The images formed in this way resemble flies, dragonflies, moths, etc. and can be constructed on a personal computer with graphics and printed on a simple matrix printer.

8.
9.
This paper presents a new approach to interpreting sediment quality assessment. The approach studies the relationship between ecotoxicity parameters (acute and chronic toxicity) and chemical components (polluting species such as polychlorinated biphenyls (PCBs), pesticides, polycyclic aromatic hydrocarbons (PAHs) and heavy metals) of lake sediment samples from Turawa Lake, Poland, by applying self-organising maps (SOMs) to the monitoring dataset (59 samples × 44 parameters) in order to obtain visual images of the components distributed at each sampling site when all components are included in the classification and data projection procedure. From the SOMs obtained, it is possible to select groups of similar ecotoxicity (either acute or chronic) and to analyse within each of them the relationship of the other chemicals to the toxicity-determining parameters (EC50 and mortality). The study shows convincingly that different regions of the Turawa Lake bottom exhibit different patterns of ecotoxicity related to various chemical pollutants, such as the “heptachlor-B” pattern, “pesticide and PAH” pattern, “structural” pattern or “PCB congeners” pattern. Thus, an easy way of performing multivariate analysis of small datasets involving ecotoxicity parameters becomes possible. Additionally, a distinction between the effects of pollution on acute and chronic toxicity seems reasonable.

10.
11.
When polyelectrolyte-neutral block copolymers are mixed in aqueous solutions with oppositely charged species, stable complexes are found to form spontaneously. The mechanism is based on electrostatics and on the compensation between the opposite charges. Electrostatic complexes exhibit a core-shell microstructure. In the core, the polyelectrolyte blocks and the oppositely charged species are tightly bound and form a dense coacervate microphase. The shell is made of the neutral chains and surrounds the core. In this paper, we report on the structural and magnetic properties of such complexes made from 6.3 nm diameter superparamagnetic nanoparticles (maghemite γ-Fe₂O₃) and cationic-neutral copolymers. The copolymers investigated are poly(trimethylammonium ethylacrylate methyl sulfate)-b-poly(acrylamide), with molecular weights 5000-b-30000 g mol⁻¹ and 110000-b-30000 g mol⁻¹. The mixed copolymer-nanoparticle aggregates were characterized by a combination of light scattering and cryo-transmission electron microscopy. Their hydrodynamic diameters were found in the range 70-150 nm, and their aggregation numbers (number of nanoparticles per aggregate) from tens to hundreds. In addition, Magnetic Resonance Spin-Echo measurements show that the complexes have a better contrast in Magnetic Resonance Imaging than single nanoparticles and that these complexes could be used for biomedical applications.

12.
Artificial neural networks have proven to be a powerful tool for solving classification problems. Some difficulties still need to be overcome for their successful application to chemical data. The use of supervised neural networks implies the initial distribution of patterns between the pre-determined classes, while attribution of objects to the classes may be uncertain. Unsupervised neural networks are free from this problem, but do not always reveal the real structure of data. Classification algorithms which do not require a priori information about the distribution of patterns between the pre-determined classes and provide meaningful results are of special interest. This paper presents an approach based on the combination of Kohonen and probabilistic networks which enables the determination of the number of classes and the reliable classification of objects. This is illustrated for a set of 76 solvents based on nine characteristics. The resulting classification is chemically interpretable. The approach also proved applicable in a different field, namely in examining the solubility of C60 fullerene. The solvents belonging to the same group demonstrate similar abilities to dissolve C60. This makes it possible to estimate the solubility of fullerenes in solvents for which there are no experimental data.

13.
Autocatalysis is fundamental to many biological processes, and kinetic models of autocatalytic reactions have mathematical forms similar to activation functions used in artificial neural networks. Inspired by these similarities, we use an autocatalytic reaction, the copper-catalyzed azide–alkyne cycloaddition, to perform digital image recognition tasks. Images are encoded in the concentration of a catalyst across an array of liquid samples, and the classification is performed with a sequence of automated fluid transfers. The outputs of the operations are monitored using UV-vis spectroscopy. The growing interest in molecular information storage suggests that methods for computing in chemistry will become increasingly important for querying and manipulating molecular memory.

Kinetic models of autocatalytic reactions have mathematical forms similar to activation functions used in artificial neural networks. Inspired by these similarities, we use a copper-catalyzed reaction to perform digital image recognition tasks.
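The similarity between autocatalytic kinetics and neural-network activation functions can be seen in the simplest textbook case. The sketch below assumes an idealized reaction A + B → 2B with rate k[A][B], whose closed-form solution is a logistic curve; this is an illustrative assumption and not the kinetic model of the copper-catalyzed azide–alkyne cycloaddition used by the authors. All names and parameter values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation function commonly used in neural networks."""
    return 1.0 / (1.0 + math.exp(-x))


def autocatalytic_product(t, a0, b0, k):
    """Concentration of B at time t for A + B -> 2B with rate k[A][B].

    The total a0 + b0 is conserved, so d[B]/dt = k[B](total - [B]),
    which integrates to a logistic curve.
    """
    total = a0 + b0
    return total / (1.0 + (a0 / b0) * math.exp(-k * total * t))


# Hypothetical initial concentrations and rate constant.
a0, b0, k, t = 0.9, 0.1, 2.0, 1.5
total = a0 + b0

# Fraction converted, and the same quantity written as a shifted sigmoid.
fraction = autocatalytic_product(t, a0, b0, k) / total
activation = sigmoid(k * total * t - math.log(a0 / b0))
```

Under these assumptions the converted fraction is exactly a sigmoid of time, with the initial ratio of A to B setting the offset.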

14.
Aruga R 《Talanta》2003,60(5):937-944
On the basis of the results of previous studies, the problem of multivariate classification in the presence of the so-called radial or V-shaped data has been briefly re-examined. Taking into account that the radial data, in the absence of preliminary transformations, usually lead to classifications of samples meaningless from a chemical point of view, five different data transformations have been evaluated and compared in the case of both hypothetical and real samples (real samples, in particular, consisted of archaeological ceramic shards to be classified on the basis of provenance). The following transformations have been used: closure to 100, log row centering, log double centering, row centering, and double centering. The transformed data were then classified by means of hierarchical clustering and principal component analysis (PCA). It has been demonstrated that only the first three transformations lead to correct classifications of radial data, and the causes of this fact have been explained.
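A minimal sketch of the centering transformations named in this abstract, under the assumption that they follow the usual definitions: row centering subtracts each sample's mean, double centering subtracts row and column means and adds back the grand mean, and the log variants apply the natural logarithm first. The function names and the two-sample data set are hypothetical.

```python
import math


def log_transform(data):
    """Take the natural logarithm of every (strictly positive) entry."""
    return [[math.log(v) for v in row] for row in data]


def row_center(data):
    """Subtract each row's own mean from its entries."""
    return [[v - sum(row) / len(row) for v in row] for row in data]


def double_center(data):
    """Subtract row and column means, then add back the grand mean."""
    n_rows, n_cols = len(data), len(data[0])
    row_means = [sum(row) / n_cols for row in data]
    col_means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [
        [data[i][j] - row_means[i] - col_means[j] + grand for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical radial data: same proportions, tenfold difference in magnitude.
radial = [[1.0, 2.0, 4.0], [10.0, 20.0, 40.0]]

plain_rows = row_center(radial)
log_rows = row_center(log_transform(radial))
log_double = double_center(log_transform(radial))
```

In this toy case the two samples coincide after log row centering but remain different after plain row centering, which is consistent with the abstract's finding that the log-based transformations handle radial data correctly.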

15.
Warmr: a data mining tool for chemical data   Cited by: 5 (self-citations: 0, citations by others: 5)

16.
Single emitter blinking with a power-law distribution for the on and off times has been observed in a variety of systems including semiconductor nanocrystals, conjugated polymers, fluorescent proteins, and organic fluorophores. The origin of this behavior is still under debate. Reliable estimation of power exponents from experimental data is crucial in validating the various models under consideration. We derive a maximum likelihood estimator for power-law distributed data and analyze its accuracy as a function of data set size and power exponent both analytically and numerically. Results are compared to least-squares fitting of the double logarithmically transformed probability density. We demonstrate that least-squares fitting introduces a severe bias in the estimation result and that the maximum likelihood procedure is superior in retrieving the correct exponent and reducing the statistical error. For a data set as small as 50 data points, the error margins of the maximum likelihood estimator are already below 7%, giving the possibility to quantify blinking behavior when data set size is limited, e.g., due to photobleaching.
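As an illustration of maximum likelihood estimation for power-law data, the sketch below uses the standard estimator for a continuous power law p(t) ∝ t^(−α) with t ≥ t_min, namely α̂ = 1 + n / Σ ln(t_i / t_min). This is the textbook continuous case and may differ from the estimator derived in the paper (for example, if binned or discrete times are treated). The function names, seed, and parameter values are hypothetical.

```python
import math
import random


def sample_power_law(alpha, t_min, n, rng):
    """Draw n samples from p(t) ~ t**(-alpha), t >= t_min, by inverse transform."""
    return [
        t_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)
    ]


def mle_exponent(times, t_min):
    """Return the maximum likelihood exponent and its approximate standard error."""
    n = len(times)
    log_sum = sum(math.log(t / t_min) for t in times)
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err


# Synthetic on/off times with a known exponent, used to check the estimator.
rng = random.Random(0)
times = sample_power_law(alpha=1.8, t_min=1.0, n=5000, rng=rng)
alpha_hat, std_err = mle_exponent(times, t_min=1.0)
```

The estimator needs no histogram binning, which is one reason it avoids the bias that least-squares fitting on a log-log plot can introduce.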

17.
Serial analysis of gene expression (SAGE) is a powerful tool to obtain gene expression profiles. Clustering analysis is a valuable technique for analyzing SAGE data. In this paper, we propose an adaptive clustering method for SAGE data analysis, namely, PoissonAPS. The method incorporates a novel clustering algorithm, Affinity Propagation (AP). While the AP algorithm has demonstrated good performance on many different data sets, it also faces several limitations. PoissonAPS overcomes the limitations of AP by using the clustering validation measure as a cost function for merging and splitting, and as a result, it can automatically cluster SAGE data without user-specified parameters. We evaluated PoissonAPS and compared its performance with other methods on several real-life SAGE datasets. The experimental results show that PoissonAPS can produce meaningful and interpretable clusters for SAGE data.

18.
Classification problems have received considerable attention in biological and medical applications. In particular, classification methods combined with microarray technology play an important role in diagnosing and predicting diseases, such as cancer, in medical research. The primary objective in classification is to build an optimal classifier based on the training sample in order to predict the unknown class in the test sample. In this paper, we propose a unified approach for optimal gene classification in conjunction with functional principal component analysis (FPCA) in a functional data analysis (FNDA) framework to classify time-course gene expression profiles based on information from the patterns. To derive an optimal classifier in FNDA, we also propose to find the optimal number of bases in the smoothing step and of functional principal components in FPCA using a cross-validation technique, and compare the performance of some popular classification techniques in the proposed setting. We illustrate the proposed method with a simulation study and a real-world data analysis.

19.
ChemCam is a remote laser-induced breakdown spectroscopy (LIBS) instrument that will arrive on Mars in 2012, on-board the Mars Science Laboratory Rover. The LIBS technique is crucial to accurately identify samples and quantify elemental abundances at various distances from the rover. In this study, we compare different linear and nonlinear multivariate techniques to visualize and discriminate clusters in two dimensions (2D) from the data obtained with ChemCam. We have used principal components analysis (PCA) and independent components analysis (ICA) for the linear tools and compared them with the nonlinear Sammon’s map projection technique. We demonstrate that the Sammon’s map gives the best 2D representation of the data set, with optimization values from 2.8% to 4.3% (0% is a perfect representation), together with an entropy value of 0.81 for the purity of the clustering analysis. The linear 2D projections result in three (ICA) and five (PCA) times more stress, and their clustering purity is markedly worse, with entropy values of about 1.8, more than twice that of the Sammon’s map. We show that the Sammon’s map algorithm is faster and gives a slightly better representation of the data set if the initial conditions are taken from the ICA projection rather than the PCA projection. We conclude that the nonlinear Sammon’s map projection is the best technique for combining data visualization and clustering assessment of the ChemCam LIBS data in 2D. PCA and ICA projections on more dimensions would improve on these numbers at the cost of the intuitive interpretation of the 2D projection by a human operator.
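The quality measure behind a Sammon's map can be sketched as follows, assuming that the "optimization values" quoted in this abstract correspond to Sammon's stress, E = (1 / Σ D_ij) Σ (D_ij − d_ij)² / D_ij, where D_ij and d_ij are the pairwise distances in the original and projected spaces. The abstract does not state the formula, so this reading is an assumption; the function names and toy points are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def sammon_stress(high_dim, low_dim):
    """Sammon's stress of a projection; 0.0 means all distances are preserved."""
    total = 0.0
    error = 0.0
    for i, j in combinations(range(len(high_dim)), 2):
        d_high = euclidean(high_dim[i], high_dim[j])
        d_low = euclidean(low_dim[i], low_dim[j])
        if d_high == 0.0:
            continue  # coincident points carry no distance information
        total += d_high
        error += (d_high - d_low) ** 2 / d_high
    return error / total


# Toy data: three points that already lie in a plane of 3D space.
points_3d = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 0.0, 0.0)]
perfect_2d = [(x, y) for x, y, _ in points_3d]  # drops a constant coordinate
stress_perfect = sammon_stress(points_3d, perfect_2d)

# A lossy projection of two points: original distance 5, projected distance 3.
stress_lossy = sammon_stress([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)], [(0.0, 0.0), (3.0, 0.0)])
```

Because each squared error is divided by the original distance, the measure weights small distances more heavily, which favors preserving local cluster structure.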

20.