Similar articles
20 similar articles found; search took 15 ms
1.
ASTM clustering for improving coal analysis by near-infrared spectroscopy   Cited by: 1 (self-citations: 0, citations by others: 1)
Andrés JM  Bona MT 《Talanta》2006,70(4):711-719
Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification, prior to the application of calibration methods to each coal set. The results showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, that sample must first be assigned to its respective group. Thus, the ability of Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range to discriminate and classify coal samples was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA). Modelling of the groups by SIMCA led to overlapping models that cannot assign a sample to a unique class. The application of Linear Discriminant Analysis, on the other hand, improved the classification of the samples, but not enough to be satisfactory for every group considered.

2.
Airborne particulate matter is an important component of atmospheric pollution, affecting human health, climate, and visibility. Modern instruments allow single particles to be analyzed one-by-one in real time, and offer the promise of determining the sources of individual particles based on their mass spectral signatures. The large number of particles to be apportioned makes clustering a necessary step. The goal of this study is to use mass spectral data to compare the accuracy and speed of several clustering algorithms: ART-2a, several variants of hierarchical clustering, and K-means. Repeated simulations with various algorithms and different levels of data preprocessing suggest that hierarchical clustering methods using derivatives of Ward's algorithm discriminate sources with fewer errors than ART-2a, which itself discriminates much better than point-wise hierarchical clustering methods. In most cases, K-means algorithms do almost as well as the best hierarchical clustering. These efficient algorithms (clustering derived from Ward's algorithm, ART-2a and K-means) are most accurate when the relative peak areas have been pre-scaled by taking the square root. Analysis times vary within a factor of 30, and when accuracy above 95% is required, run times scale up as the square of the number of particles. Algorithms derived from Ward's remain the most accurate under a wide range of conditions and conversely, for an equal accuracy, can deliver a shorter list of clusters, allowing faster and maybe on-the-fly classification.
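The square-root pre-scaling mentioned in this abstract can be illustrated with a minimal sketch. The function name `sqrt_prescale` and the two-peak spectrum are hypothetical and are not taken from the paper; the sketch only assumes that "relative peak areas" means raw areas divided by their sum.

```python
import math

def sqrt_prescale(peak_areas):
    """Convert raw peak areas to relative areas, then take the square root.

    Because the relative areas sum to 1, the square-rooted vector has
    unit Euclidean length, so no further normalization is needed.
    """
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("spectrum contains no signal")
    return [math.sqrt(area / total) for area in peak_areas]

# Hypothetical two-peak spectrum: one dominant peak, one minor peak.
raw = [90.0, 10.0]
scaled = sqrt_prescale(raw)
print(raw[0] / raw[1])        # dominant/minor ratio before scaling
print(scaled[0] / scaled[1])  # the ratio is compressed after scaling
```

The transform compresses the dynamic range, so minor peaks contribute more to the distances that a clustering algorithm computes. Whether this matches the exact preprocessing used in the study is an assumption.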

3.
《Analytica chimica acta》2004,515(1):87-100
The goal of the present work is to analyse the effect of non-informative variables (NIV) in a data set on cluster analysis, and to propose a computationally feasible method for detecting and removing these variables. The proposed method uses a genetic algorithm to select the variables that make the presence of groups in the data clear. The procedure has been implemented for use with k-means, with the cluster silhouettes as the fitness function for the genetic algorithm. The main problem that can appear when applying the method to real data is that, in general, we do not know a priori what the real cluster structure is (number and composition of the groups). The work explores the evolution of the silhouette values computed from the clusters built by k-means when non-informative variables are added to the original data set, both for a data set from the literature and for simulated data in higher dimension. The procedure has also been applied to real data sets.
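The effect described in this abstract, silhouette values dropping when a non-informative variable is added, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the cluster labels are kept fixed instead of being recomputed by k-means, the data points are invented, and `mean_silhouette` is a hypothetical helper.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette width for fixed cluster labels (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

labels = [0, 0, 1, 1]
clean = [(0.0,), (1.0,), (10.0,), (11.0,)]  # one informative variable
# Same points with an invented second variable unrelated to the groups.
noisy = [(0.0, 5.0), (1.0, -5.0), (10.0, -5.0), (11.0, 5.0)]

print(mean_silhouette(clean, labels))
print(mean_silhouette(noisy, labels))  # lower: the extra variable blurs the groups
```

A variable selection scheme in the spirit of the abstract would score candidate variable subsets with a function like this and keep the subsets with the highest value.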

4.
Four cultivars of olives harvested in the Moroccan region of Beni Mellal were subjected to a characterization and classification study. Analytical data were collected by Fourier transform infrared spectroscopy (FTIR), applied to the mesocarp of the fresh olives without any preliminary treatment. The spectral data were pre-treated by taking derivatives with the Savitzky-Golay algorithm to reduce noise and increase analytical information. Partial least squares discriminant analysis (PLS-DA) was performed to process the measurement data and assess the discriminant features of the four cultivars. The PLS model was optimized by applying Martens' uncertainty test, which made it possible to select the vibrational frequencies giving the most useful information. The optimized model was able to separate the four classes and to classify new objects into the appropriate classes with a correct prediction rate of 97%. The proposed method is a novel way to classify olives of different varieties by means of a rapid, inexpensive and reliable procedure.

5.
Clustering of gene expression data collected across time is receiving growing attention in the biological literature, since time-course experiments allow one to understand dynamic biological processes and identify genes governed by the same processes. It is believed that genes demonstrating similar expression profiles over time might give informative insight into how the underlying biological mechanisms work. In this paper, we propose a method based on functional data analysis (FNDA) to cluster time-dependent gene expression profiles. Treating clustering problems in the FNDA setting makes it possible to take time dependency into account by using a basis function expansion to describe the partially observed curves. We also discuss how to choose the number of bases in the basis function expansion in FNDA. A synthetic cycle data set and a real data set are used to demonstrate the proposed method, and the proposed and existing approaches are compared using adjusted Rand indices.
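For readers unfamiliar with the adjusted Rand index used for the comparisons in this abstract, the following sketch computes it from two label vectors using the standard contingency-table formula. The function name and the example labels are illustrative only.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

# The index ignores label names: these two partitions are identical.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Partitions that disagree on every pair score below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

A value of 1 means perfect agreement with the reference partition, and values near 0 mean agreement no better than chance.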

6.
Signal transduction governs virtually every cellular function of multicellular organisms, and its deregulation leads to a variety of diseases. This intricate network of molecular interactions is mediated by proteins that are assembled into complexes within individual signaling pathways, and their composition and function are often regulated by different post-translational modifications. Proteomic approaches are commonly used to analyze biological complexes and networks, but often lack the specificity to address the dynamic and hence transient nature of the interactions and the influence of the multiple post-translational modifications that govern these processes. Here we review recent developments in proteomic research to address these limitations, and discuss several technologies that have been developed for this purpose. The synergy between these proteomic and computational tools, when applied together with global methods to the analysis of individual proteins, complexes and pathways, may allow researchers to unravel the underlying mechanisms of signaling networks in greater detail than previously possible.

7.
Classification and influence matrix analysis (CAIMAN) is a recently proposed classification method based on the influence matrix (also called the leverage matrix). Depending on the purpose of the classification analysis, CAIMAN can be used in three ways: (1) D-CAIMAN is a discriminant classification method, (2) M-CAIMAN is a class modelling method that allows a sample to be classified, not classified at all, or assigned to more than one class (confused), and (3) A-CAIMAN deals with the asymmetric case, where only a reference class needs to be modelled.

In this work, the geographic classification of samples of wine and olive oil has been carried out by means of CAIMAN and its results compared with discriminant analysis, with particular attention to the predictive capabilities of the models. The geographic characterization has been carried out on three different datasets: extra virgin olive oils produced in a small area, with a “protected denomination of origin” label; wines with different denominations of origin, but produced in enclosed geographical areas; and olive oils belonging to different production areas.

Final results seem to indicate that applying CAIMAN to the identification of geographical origin offers several advantages: first, it performs well on average; second, it can deal in a simple way with classification problems related to the characterization of typicality, authenticity, and uniqueness, which are of increasing interest in food quality issues.


8.
Osteoporosis is an emerging health issue worldwide. Through decreased bone mineral density and deterioration of skeletal microarchitecture, osteoporosis can lead to increased bone fragility and higher fracture risk. Because the disease lacks specific symptoms, novel serum proteomic indicators are urgently needed for its evaluation. Microvesicles (MVs) are important messengers widely present in body fluids and have emerged as novel targets for the diagnosis of multiple diseases. In this study, MVs were successfully isolated from human serum and comprehensively characterized. Comparative proteomics analysis revealed differential MV protein profiles in normal subjects, osteopenia patients, and osteoporosis patients. In total, about 200 proteins were identified and quantified from serum MVs, among which 19 proteins were upregulated (fold change >2) and five proteins were downregulated (fold change <0.5) in the osteopenia and osteoporosis groups compared with the normal group. Three protein candidates were selected for initial verification: Vinculin, Filamin A, and Profilin 1. Profilin 1 was further pre‐validated in an independent sample set, in which it differentiated the osteoporosis group from the osteopenia and normal groups (p < 0.05). Our data collectively demonstrate that the serum MV proteome can provide valuable indicators for the evaluation and diagnosis of bone loss disease.

9.
The effects of procedural variables on dolomite decomposition in carbon dioxide were investigated. The partial pressure of carbon dioxide causes dolomite decomposition to split into a two-stage process. The first stage of dolomite decomposition is progressively displaced to higher temperatures as the heating rate increases, whereas the second stage is not affected significantly by changes in the heating rate. These studies also indicate that decrepitation analysis of dolomite in CO2 provides better information than experiments in other atmospheres. The flow rate of the purge gas does not influence the thermal behavior of dolomite. This revised version was published online in November 2005 with corrections to the Cover Date.

10.
In this study, chemometric techniques such as cluster analysis (CA), discriminant analysis (DA), principal component analysis (PCA) and partial least squares (PLS) were used to analyse a wastewater dataset in order to identify the factors that affect the composition of sewage of domestic origin, spatial and temporal variations, similarity/dissimilarity among the wastewater characteristics of cis- and trans-drains, and discriminating variables. Samples collected from 24 wastewater drains in Lucknow city and from three sites on the Gomti river in January/February, May, August and November over a period of 5 years (1994-1999) were characterized for 32 parameters. The multivariate techniques successfully described the similarities/dissimilarities among the sewage drains on the basis of their wastewater characteristics and sources, reflecting the effect of routine domestic/commercial activities in the respective drainage areas. Spatial and seasonal variations in wastewater composition were also determined successfully. CA generated six groups of drains on the basis of similar wastewater characteristics. PCA provided information on seasonal influence and on compositional differences between sewage from drains dominated by domestic waste and by industrial waste, and showed that drains influenced by mixed industrial effluents have a high organic pollution load. DA yielded six variables (TDS, alkalinity, F, TKN, Cd and Cr) discriminating between cis- and trans-drains. PLS-DA showed a dominance of Cd, Cr, NO3, PO4 and F in cis-drain wastewater. The results suggest that sewage treatment plants (STPs) based on biological processes could treat wastewater from both the cis- and the trans-drains; however, prior removal of toxic metals will be required for the cis-drain sewage. Further, seasonal variations in wastewater composition and pollution load could be the guiding factor for determining the STP design parameters.
The information generated would be useful in selecting the process type and in designing the proposed STPs for safe disposal of wastewater.

11.
Discarding or downweighting high-noise variables in factor analytic models   Cited by: 1 (self-citations: 0, citations by others: 1)
This work examines the factor analysis of matrices where the proportion of signal and noise is very different in different columns (variables). Such matrices often occur when measuring elemental concentrations in environmental samples. In the strongest variables, the error level may be a few percent. For the weakest variables, the data may consist almost entirely of noise. This paper demonstrates that the proper scaling of weak variables is critical. It is found that if a few weak variables are scaled to too high a weight in the analysis, the errors in computed factors would grow, possibly obscuring the weakest factor(s) by the increased noise level. The mathematical explanation of this phenomenon is explored by means of Givens rotations. It is shown that the customary form of principal component analysis (PCA), based on autoscaling the original data, is generally very ineffective because the scaling of weak variables becomes much too high. Practical advice is given for dealing with noisy data in both PCA and positive matrix factorization (PMF).
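The point about autoscaling in this abstract can be made concrete with a small sketch. The two columns below are invented for illustration: `strong` stands for a variable with a clear signal and `weak` for one that is essentially noise. The helper `autoscale` is hypothetical and uses the usual definition (subtract the mean, divide by the sample standard deviation).

```python
import statistics

def autoscale(column):
    """Center a column and scale it to unit sample standard deviation."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)
    return [(x - mean) / std for x in column]

strong = [10.0, 20.0, 30.0, 40.0, 50.0]  # invented high-signal variable
weak = [0.10, -0.20, 0.05, -0.10, 0.15]  # invented variable that is mostly noise

# Before scaling, the weak variable carries a negligible share of the variance.
print(statistics.variance(strong), statistics.variance(weak))

# After autoscaling, both variables have variance 1, so the noise in the
# weak column enters the analysis with the same weight as the real signal.
print(statistics.variance(autoscale(strong)), statistics.variance(autoscale(weak)))
```

This only shows why autoscaling gives weak variables too high a weight; the specific downweighting rules recommended in the paper are not reproduced here.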

12.
Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance/correlation matrix of the analyzed data. However, to work properly with high-dimensional data sets, PCA places severe mathematical constraints on the minimum number of different replicates, or samples, that must be included in the analysis. Generally, improper sampling is due to a small number of data points relative to the number of degrees of freedom that characterize the ensemble. In the life sciences it is often important to have an algorithm that can accept poorly dimensioned data sets, including degenerate ones. Here a new random projection algorithm is proposed, in which a random symmetric matrix takes the place of the covariance/correlation matrix of PCA while maintaining the data clustering capacity. We demonstrate that what is important for the clustering efficiency of PCA is not the exact form of the covariance/correlation matrix, but simply its symmetry.

13.
Seven complexes of the [Co(DH)2(amine)2]I type (DH2 stands for dimethylglyoxime) have been studied by means of thermogravimetry and differential scanning calorimetry in a nitrogen atmosphere, using heating rates of 2.5, 5 and 10 K min–1. In all cases an endothermic deamination reaction occurs, leading to the relatively stable [Co(DH)2I(amine)] intermediate. Apparent kinetic parameters have been derived for this reaction. The influence of heating rate is discussed. The validity of a linear and a non-linear kinetic compensation law was verified. This revised version was published online in November 2005 with corrections to the Cover Date.

14.
This study investigates whether dry-cured hams from two European countries can be distinguished using SDS-PAGE. Thirty-seven commercial hams (19 Spanish, 18 French) were used in the study. Four protein fractions were extracted from each sample, with sufficient material prepared to allow each fraction to be analysed in triplicate lanes. The complete extraction process was carried out in duplicate. The 24 specimens originating from each ham sample (4 fractions × 3 lanes × 2 extractions) were randomly allocated to different lane positions and gels, as were at least two reference lanes (for reference proteins). In total, 118 gels were prepared. Mathematical routines were developed using a matrix language to process the gel image files. Procedures were written to carry out 'within-gel' image correction, lane extraction and normalization, 'between-gel' data registration and linear discriminant analysis (LDA) of each fraction's data to establish whether the provenance could be systematically distinguished. The between-gel registration was carried out using a genetic algorithm (GA). Feature selection was also performed using a GA, to pass subsets of features to the LDA routine. Cross-validated classification success rates were 84, 91, 81 and 85%, respectively, for the four fractions. We conclude that SDS-PAGE can be conducted in a sufficiently quantitative manner and can potentially verify the provenance of regional speciality dry-cured hams.

15.
Prefractionation of complex protein mixtures is an efficient method for increasing the separation power of 2-DE. RP-HPLC has been successfully utilized as a prefractionation method prior to 2-DE. Here we describe the optimization of an efficient RP-HPLC method for prefractionation of solubilized proteins from baby hamster kidney cells. A step gradient elution of acetonitrile was optimized, and the collected fractions were further examined by SDS-PAGE and 2-DE. This method effectively increases the separation power of 2-DE. Moreover, we describe the application of this method to expressional proteome analysis of a virally infected cell model.

16.
Recently we proposed a new variable selection algorithm for classification problems, based on the concept of clustering of variables (CLoVA). Here the same concept is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea behind clustering of variables is that the instrument channels are grouped into clusters by a clustering algorithm, and the spectral data of each cluster are then subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part of the data from the redundant parts, so that a stable model can be built on the informative clusters.

17.
Kim J  Kim SH  Lee SU  Ha GH  Kang DG  Ha NY  Ahn JS  Cho HY  Kang SJ  Lee YJ  Hong SC  Ha WS  Bae JM  Lee CW  Kim JW 《Electrophoresis》2002,23(24):4142-4156
Hepatocellular carcinoma (HCC) is a common malignancy worldwide and is a leading cause of death. To contribute to the development and improvement of molecular markers for diagnostics and prognostics and of therapeutic targets for the disease, we have substantially expanded the currently available human liver tissue maps and studied the differential expression of proteins in normal and cancer tissues. Reference two-dimensional electrophoresis (2-DE) maps of human liver tumor tissue include labeled 2-DE images for total homogenate and soluble fraction separated on pH 3-10 gels, and also images for soluble fraction separated on pH 4-7 and pH 6-9 gels for a more detailed map. Proteins were separated in the first dimension by isoelectric focusing on immobilized pH gradient (IPG) strips, and in the second dimension on 7.5-17.5% gradient sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gels. Protein identification was done by peptide mass fingerprinting with delayed extraction-matrix assisted laser desorption/ionization-time of flight-mass spectrometry (DE-MALDI-TOF-MS). In total, 212 protein spots (117 spots in the pH 4-7 map and 95 spots in the pH 6-9 map) corresponding to 127 different polypeptide chains were identified. In the next step, we analyzed the differential protein expression of liver tumor samples to find candidates for liver cancer-associated proteins. Matched pairs of tissues from 11 liver cancer patients were analyzed for their 2-DE profiles. Protein expression was compared using image analysis software. Proteins whose expression levels differed by more than three-fold in at least 30% (four) of the patients were further analyzed. The numbers of protein spots overexpressed and underexpressed in tumor tissues compared with nontumorous regions were 9 and 28, respectively. Among these 37 spots, 1 overexpressed and 15 underexpressed spots, corresponding to 11 proteins, were identified.
The physiological significance of the differential expressions is discussed.
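The selection rule in this abstract (more than three-fold difference in at least 30% of the 11 patients, i.e. four patients) can be written as a short filter. This is a sketch under assumptions: the abstract does not say whether the four patients must all change in the same direction, so the code below counts each direction separately; `classify_spot` and the ratio values are hypothetical.

```python
import math

def classify_spot(ratios, fold=3.0, min_fraction=0.3):
    """Classify one spot from per-patient tumor/nontumor intensity ratios."""
    needed = math.ceil(min_fraction * len(ratios))  # 4 when there are 11 patients
    up = sum(1 for r in ratios if r > fold)
    down = sum(1 for r in ratios if r < 1.0 / fold)
    if up >= needed and down >= needed:
        return "mixed"  # assumption: flag spots that qualify in both directions
    if up >= needed:
        return "overexpressed"
    if down >= needed:
        return "underexpressed"
    return "unchanged"

# Invented ratios for 11 matched tissue pairs.
spot_a = [3.5, 4.0, 3.2, 5.1, 1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3]  # four above 3x
spot_b = [3.5, 4.0, 3.2, 1.0, 1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3]  # only three
spot_c = [0.2, 0.3, 0.1, 0.25, 0.3, 1.0, 1.1, 0.9, 1.0, 1.2, 0.8]  # five below 1/3

print(classify_spot(spot_a), classify_spot(spot_b), classify_spot(spot_c))
```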

18.
A probabilistic neural network, a recently developed and powerful classification tool, was used to distinguish cancer patients from healthy persons according to the levels of nucleosides in human urine. Two datasets (containing 32 and 50 patterns, respectively) were investigated, and the total consistency rate obtained was 100% for dataset 1 and 94% for dataset 2. To evaluate the performance of the probabilistic neural network, linear discriminant analysis and a learning vector quantization network were also applied to the classification problem. The results showed that the predictive ability of the probabilistic neural network is stronger than that of the other methods in this study. Moreover, the recognition rate for dataset 2 can reach 100% when the three methods are combined, which indicates the promise of combining different methods for clinical diagnosis.
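The abstract does not state how the three classifiers were combined. One common and simple option is a majority vote, sketched below purely as an assumption; the function name and the class labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most classifiers for one sample.

    With three classifiers and two classes there is always a strict
    majority; with other setups, ties are broken by first occurrence.
    """
    if not predictions:
        raise ValueError("no predictions given")
    return Counter(predictions).most_common(1)[0][0]

# Invented outputs of three classifiers (e.g. PNN, LDA, LVQ) for two samples.
sample_1 = ["cancer", "healthy", "cancer"]
sample_2 = ["healthy", "healthy", "cancer"]

print(majority_vote(sample_1))
print(majority_vote(sample_2))
```

A vote like this can correct a sample that one method misclassifies, provided the other two methods get it right.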

19.
A water quality data set from the alluvial region of the Gangetic plain in northern India, which is known for high fluoride levels in soil and groundwater, has been analysed by chemometric techniques such as principal component analysis (PCA), discriminant analysis (DA) and partial least squares (PLS) in order to investigate the compositional differences between surface and groundwater samples, spatial variations in groundwater composition, and the influence of natural and anthropogenic factors. Trilinear plots of major ions showed that the groundwater in this region is mainly of the Na/K-bicarbonate type. PCA performed on the complete data matrix yielded six significant PCs explaining 65% of the data variance. Although PCA achieved considerable data reduction, it could not clearly group and distinguish the sample types (dug well, hand pump and surface water). However, a visible differentiation between the water samples from the two watersheds (Khar and Loni) was obtained. DA identified six variables discriminating between surface and groundwater and also between the different sample types (dug well, hand pump and surface water). Distinct grouping of the surface and groundwater samples was achieved using the PLS technique. It further showed that the groundwater samples are dominated by variables originating from both natural and anthropogenic sources in the region, whereas variables of industrial origin dominate the surface water samples. It also suggested that the groundwater sources in the region are contaminated with various industrial contaminants.

20.
Pharmacophore modeling of large, drug-like molecules, such as the dopamine reuptake inhibitor GBR 12909, is complicated by their flexibility. A comprehensive hierarchical clustering study of two GBR 12909 analogs was performed to identify representative conformers for input to three-dimensional quantitative structure–activity relationship studies of closely-related analogs. Two data sets of more than 700 conformers each produced by random search conformational analysis of a piperazine and a piperidine GBR 12909 analog were studied. Several clustering studies were carried out based on different feature sets that include the important pharmacophore elements. The distance maps, the plot of the effective number of clusters versus actual number of clusters, and the novel derived clustering statistic, percentage change in the effective number of clusters, were shown to be useful in determining the appropriate clustering level. Six clusters were chosen for each analog, each representing a different region of the torsional angle space that determines the relative orientation of the pharmacophore elements. Conformers of each cluster that are representative of these regions were identified and compared for each analog. This study illustrates the utility of using hierarchical clustering for the classification of conformers of highly flexible molecules in terms of the three-dimensional spatial orientation of key pharmacophore elements.
