Similar literature
1.
Pulsed laser‐induced autofluorescence spectra of pathologically certified normal and malignant colonic mucosal tissues were recorded at 325 nm excitation. The spectra were analysed using three different methods for discrimination purposes. First, all the spectra were subjected to principal component analysis (PCA), and discrimination between normal and malignant cases was achieved using parameters such as spectral residuals, Mahalanobis distance and scores of factors. Second, to understand the changes in tissue composition between the two classes (normal and malignant), a difference spectrum was constructed by subtracting the mean spectrum of the calibration set samples from the simulated mean of all spectra of any one class (normal/malignant). Third, artificial neural network (ANN) analysis was carried out on the same set of spectral data by training the network with spectral features such as mean, median, spectral residual, energy, standard deviation and number of peaks for different thresholds (100, 250 and 500) after 1st‐order differentiation of the training set samples, and discrimination between normal and malignant conditions was achieved. The specificity and sensitivity were determined in the PCA and ANN analyses and were found to be 100 and 91.3% in PCA, and 100 and 93.47% in ANN, respectively. Copyright © 2008 John Wiley & Sons, Ltd.

2.
Sârbu C, Pop HF. Talanta, 2005, 65(5): 1215-1220
Principal component analysis (PCA) is a favorite tool in environmetrics for data compression and information extraction. PCA finds linear combinations of the original measurement variables that describe the significant variations in the data. However, it is well known that PCA, as with any other multivariate statistical method, is sensitive to outliers, missing data, and poor linear correlation between variables due to poorly distributed variables. As a result, data transformations have a large impact upon PCA. In this regard, one of the most powerful approaches to improving PCA appears to be the fuzzification of the matrix data, thus diminishing the influence of the outliers. In this paper we discuss and apply a robust fuzzy PCA algorithm (FPCA). The efficiency of the new algorithm is illustrated on a data set concerning the water quality of the Danube River for a period of 11 consecutive years. Considering, for example, a two-component model, FPCA accounts for 91.7% of the total variance and PCA accounts for only 39.8%. Moreover, PCA showed only a partial separation of the variables and no separation of scores (samples) on the plane described by the first two principal components, whereas a much sharper differentiation of the variables and scores is observed when FPCA is applied.

3.
Multivariate chemical data often contain elements that are missing completely at random and the so-called left-censored elements whose values are only known to be below a definite threshold value (reporting limit). In the last several years, attention has been paid to developing methods for dealing with data containing missing elements and those that can handle data with missing elements and outliers. However, processing data with both missing and left-censored elements is still an ongoing problem.

4.
The classification of cancer is a major research topic in bioinformatics. The nature of high dimensionality and small size associated with gene expression data, however, makes the classification quite challenging. Although principal component analysis (PCA) is of particular interest for the high-dimensional data, it may overemphasize some aspects and ignore some other important information contained in the richly complex data, because it displays only the difference in the first two- or three-dimensional PC subsp...

5.
Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance/correlation matrix of the analyzed data. However, to work properly with high-dimensional data sets, PCA imposes severe mathematical constraints on the minimum number of different replicates, or samples, that must be included in the analysis. Generally, improper sampling is due to a small number of data points relative to the number of degrees of freedom that characterize the ensemble. In the field of life sciences it is often important to have an algorithm that can accept poorly dimensioned data sets, including degenerate ones. Here a new random projection algorithm is proposed, in which a random symmetric matrix stands in for the covariance/correlation matrix of PCA while maintaining the data clustering capacity. We demonstrate that what is important for the clustering efficiency of PCA is not the exact form of the covariance/correlation matrix, but simply its symmetry.
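The idea in this abstract can be illustrated with a minimal sketch. It is one plausible reading only, not the authors' algorithm: the function name `random_symmetric_projection`, the Gaussian entries of the random matrix, and the choice to keep the eigenvectors with the largest eigenvalues are all assumptions made for illustration.

```python
import numpy as np


def random_symmetric_projection(X, n_components=2, seed=0):
    """Project centred data onto eigenvectors of a random symmetric matrix.

    Hypothetical helper: the random matrix replaces the covariance matrix
    that ordinary PCA would diagonalise.
    """
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    A = rng.normal(size=(n_vars, n_vars))
    S = (A + A.T) / 2.0  # random symmetric stand-in for the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]  # largest eigenvalues first
    basis = eigvecs[:, order[:n_components]]
    Xc = X - X.mean(axis=0)  # centre each variable
    return Xc @ basis, basis


# Hypothetical poorly dimensioned data set: 5 samples, 20 variables.
X_demo = np.random.default_rng(1).normal(size=(5, 20))
scores_demo, basis_demo = random_symmetric_projection(X_demo)
```

Because the matrix is symmetric, its eigenvectors form an orthonormal basis regardless of how many samples are available, which is why the projection remains well defined even with fewer samples than variables.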

6.
All X‐ray photoelectron spectroscopy (XPS) and time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) instruments have optical cameras to image the specimen under analysis, and often to image the sample holder as it enters the system too. These cameras help the user find the appropriate points for analysis of specimens. However, they seldom give as good images as stand‐alone bench optical microscopes, because of the limited geometry, source/analyser solid angle and ultra‐high‐vacuum (UHV) design compromises. This often means that the images displayed to the user necessarily have low contrast, low resolution and poor depth‐of‐field. To help identify the different regions of the samples present we have found it useful to perform multispectral imaging by illuminating the sample with narrow‐wavelength‐range light emitting diodes (LEDs). By taking an image under the illumination of these LEDs in turn, each at a successively longer wavelength, one can build up a set of registered images that contain more information than a simple Red–Green–Blue image under white‐light illumination. We show that this type of multispectral imaging is easy and inexpensive to fit to common XPS and ToF‐SIMS instruments, using LEDs that are widely available. In our system we typically use 14 LEDs including one emitting in the ultraviolet (so as to allow fluorescent imaging) and three in the near infra‐red. The design considerations of this system are discussed in detail, including the design of the drive and control electronics, and three practical examples are presented where this multispectral imaging was extremely useful. Copyright © 2016 The Authors Surface and Interface Analysis Published by John Wiley & Sons Ltd.

7.
Transformation of the electronic absorption spectra of the zirconocene catalytic systems Ph2CCpFluZrCl2-polymethylalumoxane (MAO) and rac-Me2Si(2-Me,4-PhInd)2ZrCl2-MAO (Flu is fluorenyl, Ind is indenyl) in toluene was studied upon a change in the ratio of reactants AlMAO/Zr from 0 to 3000 mol mol−1. Analysis of the spectroscopic data using statistical methods determined the number of reaction products in each system. A reaction model including three equilibria and common to both systems was proposed. Effective equilibrium constants and absorption spectra of individual reaction products were determined by parametric self-modeling of the experimental spectra. Published in Russian in Izvestiya Akademii Nauk. Seriya Khimicheskaya, No. 10, pp. 2257–2264, October, 2005.

8.
We have demonstrated an informatics methodology for finding correlations between the full profile Fourier transform infrared spectra of polycrystalline 3C‐silicon carbide (poly‐SiC) films and their growth conditions, thereby developing high‐throughput structure‐process relationships. Because SiC films are a structural element in photonic sensors, this paper focuses on the interpretation of their optical response, the multivariate tracking of critical processing pathways, and the identification of controlling processing mechanisms. Using principal component analysis, we have developed a data analysis tool to aid in the assessment of the relative contributions of experimental parameters in low‐pressure chemical vapor deposition processes to optical responses on the basis of the size of eigenvalues of the spectral data set. The applied methodology for identifying spectral relationships of stoichiometry, dopant chemistry, and microstructure of poly‐SiC provides more effective guidelines to manipulate optical responses by controlling multiple experimental parameters. Copyright © 2011 John Wiley & Sons, Ltd.

9.
In this paper, an improved approach to interpret results of principal component analysis (PCA) of time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) spectra is presented. Signals are typically observed in different intensity ranges in a single ToF‐SIMS spectrum due to different sensitivity factors and surface concentrations. This can complicate the PCA interpretation, because loadings are reported to be strongly affected by these intensity changes. In contrast, it is shown here that correlation loadings are unaffected by these differences. In particular, correlation loadings were successfully used to identify signals with relatively low intensity but high significance. These signals may be overlooked when only loadings are used. This is particularly true in failure analysis, where ToF‐SIMS is used to screen for initially unknown signals that may be relevant for the characteristics/failure of a product. As a model study, the concept was applied to investigate ageing of Li‐ion batteries by ToF‐SIMS. In this data set, the significance of impurities that affect the quality of Li‐ion batteries was identified only by correlation loadings, whereas the loadings were found to overestimate the influence of other matrix signals. In addition, correlation loadings aid in the chemical identification and helped to successfully assign unknown peaks.
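The contrast between loadings and correlation loadings described in this abstract can be sketched numerically. The sketch below assumes the usual definition of a correlation loading as the Pearson correlation between a mean-centred variable and a PC score; the helper name `correlation_loadings` and the synthetic three-peak data set are hypothetical and are not taken from the paper.

```python
import numpy as np


def correlation_loadings(X, n_components=2):
    """Return PCA loadings and variable-to-score correlations (hypothetical helper)."""
    Xc = X - X.mean(axis=0)  # mean-centre, no scaling
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    # Pearson correlation of each centred variable with each score vector.
    x_norm = np.linalg.norm(Xc, axis=0)
    t_norm = np.linalg.norm(scores, axis=0)
    corr = (Xc.T @ scores) / np.outer(x_norm, t_norm)
    return loadings, corr


# Synthetic example: a strong and a weak peak driven by the same trend,
# plus an unrelated noise peak.
rng_cl = np.random.default_rng(0)
trend = rng_cl.normal(size=50)
X_cl = np.column_stack([
    1000.0 * trend + rng_cl.normal(size=50),  # high-intensity signal
    trend + 0.05 * rng_cl.normal(size=50),    # low-intensity signal
    rng_cl.normal(size=50),                   # unrelated signal
])
loadings_cl, corr_cl = correlation_loadings(X_cl)
```

In this toy data set the weak peak has a PC1 loading close to zero, because the strong peak dominates the variance, yet its correlation loading on PC1 is close to one in magnitude, which is the kind of low-intensity but significant signal the abstract refers to.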

10.
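Because of the importance of bases in nucleic acids, theoretical calculations on the bases have been reported for many years [1-7]. This paper applies principal component analysis [8], a chemometric method, to the calculated geometric parameters of five bases, adenine (A), guanine (G), cytosine (C), uracil (U) and thymine (T), in the hope of obtaining useful structural information. 1 Methods. The initial geometries of the bases were obtained by three-dimensional optimization in ACD ChemSketch 3.5 [9] (molecular mechanics with the CHARMM force field); the atom numbering is shown in Figure 1. All calculations were carried out with the Gaussian 94 program [10] on an IBM PC-compatible computer. First, the five bases were treated with six semi-empirical methods (AM1, PM3, MNDO, …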

11.
This study focuses on acquiring information on the degradation process of proteinaceous binders due to ultraviolet (UV) radiation and possible interactions owing to the presence of historical mineral pigments. With this aim, three different paint model samples were prepared according to medieval recipes, using rabbit glue as the proteinaceous binder. One of these model samples contained only the binder, and the other two were prepared by mixing each of the pigments (cinnabar or azurite) with the binder (glue tempera model samples). The model samples were studied by applying Principal Component Analysis (PCA) to their mass spectra obtained with Matrix‐Assisted Laser Desorption/Ionization‐Time of Flight Mass Spectrometry (MALDI‐TOF‐MS). The complementary use of Fourier Transform Infrared Spectroscopy to study conformational changes of the secondary structure of the proteinaceous binder is also proposed. Ageing effects on the model samples after up to 3000 h of UV irradiation were periodically analyzed by the proposed approach. PCA on MS data proved capable of identifying significant changes in the model samples, and the results suggested different ageing behavior based on the pigment present. This research represents the first attempt to use this approach (PCA on MALDI‐TOF‐MS data) in the field of Cultural Heritage and demonstrates the potential benefits in the study of proteinaceous artistic materials for purposes of conservation and restoration. Copyright © 2012 John Wiley & Sons, Ltd.

12.
A comparison between different conformations of a given protein, relating both structure and dynamics, can be performed in terms of combined principal component analysis (combined‐PCA). To that end, a trajectory is obtained by concatenating molecular dynamics trajectories of the individual conformations under comparison. Then, the principal components are calculated by diagonalizing the correlation matrix of the concatenated trajectory. Since the introduction of this approach in 1995, it has had a large number of applications. However, the interpretation of the eigenvectors and eigenvalues so obtained is based on intuitive foundations, because analytical expressions relating the concatenated correlation matrix with those of the individual trajectories under consideration have not been provided yet. In this article, we present such expressions for the cases of two, three, and an arbitrary number of concatenated trajectories. The formulas are simple and show what is to be expected and what is not to be expected from a combined‐PCA. Their correctness and usefulness are demonstrated by discussing some representative examples. The results can be summarized in a simple sentence: the correlation matrix of a concatenated trajectory is given by the average of the individual correlation matrices plus the correlation matrix of the individual averages. From this it follows that the combined‐PCA of trajectories belonging to different free energy basins provides information that could also be obtained by alternative and more straightforward means. © 2014 Wiley Periodicals, Inc.
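The summary sentence of this abstract can be checked numerically. The sketch below rests on two assumptions that the abstract does not state: the "correlation matrix" is taken to be the covariance matrix of coordinate fluctuations normalised by the number of frames (population covariance), and all trajectories have the same length. The synthetic trajectories and the helper names are hypothetical.

```python
import numpy as np


def pop_cov(X):
    """Population covariance (normalised by the number of rows) of the columns of X."""
    return np.cov(X, rowvar=False, bias=True)


def combined_pca_identity(trajs):
    """Return both sides of the identity for equal-length trajectories."""
    concatenated = np.vstack(trajs)
    lhs = pop_cov(concatenated)
    # Average of the individual covariance matrices.
    mean_of_covs = np.mean([pop_cov(t) for t in trajs], axis=0)
    # Covariance matrix of the individual average structures.
    means = np.array([t.mean(axis=0) for t in trajs])
    cov_of_means = pop_cov(means)
    return lhs, mean_of_covs + cov_of_means


# Three synthetic "trajectories" (200 frames, 6 coordinates) centred at
# different positions, mimicking different conformations.
rng_cp = np.random.default_rng(0)
trajs_cp = [rng_cp.normal(loc=shift, size=(200, 6)) for shift in (0.0, 1.5, -2.0)]
lhs_cp, rhs_cp = combined_pca_identity(trajs_cp)
```

Under these assumptions the two sides agree to floating-point precision. When the conformations are well separated, the second term (the spread of the average structures) dominates, so the leading eigenvectors of the combined matrix largely describe the differences between the averages.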

13.
The application of a new method to the multivariate analysis of incomplete data sets is described. The new method, called maximum likelihood principal component analysis (MLPCA), is analogous to conventional principal component analysis (PCA), but incorporates measurement error variance information in the decomposition of multivariate data. Missing measurements can be handled in a reliable and simple manner by assigning large measurement uncertainties to them. The problem of missing data is pervasive in chemistry, and MLPCA is applied to three sets of experimental data to illustrate its utility. For exploratory data analysis, a data set from the analysis of archeological artifacts is used to show that the principal components extracted by MLPCA retain much of the original information even when a significant number of measurements are missing. Maximum likelihood projections of censored data can often preserve original clusters among the samples and can, through the propagation of error, indicate which samples are likely to be projected erroneously. To demonstrate its utility in modeling applications, MLPCA is also applied in the development of a model for chromatographic retention based on a data set which is only 80% complete. MLPCA can predict missing values and assign error estimates to these points. Finally, the problem of calibration transfer between instruments can be regarded as a missing data problem in which entire spectra are missing on the ‘slave’ instrument. Using NIR spectra obtained from two instruments, it is shown that spectra on the slave instrument can be predicted from a small subset of calibration transfer samples even if a different wavelength range is employed. Concentration prediction errors obtained by this approach were comparable to cross-validation errors obtained for the slave instrument when all spectra were available.

14.
The effects of oxygen plasma treatment and the subsequent air exposure on the surface composition and properties of bisphenol A polycarbonate (BPA‐PC) were analysed by X‐ray photoelectron spectroscopy (XPS), ellipsometry, static time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) with principal component analysis (PCA) and nanoindentation using an atomic force microscope (AFM). PCA showed systematic changes in the film chemistry after short treatment times (0.1 s), with the main sites of attack being the carbonate and aromatic ring structure. On the basis of this multitechnique analysis, it was unambiguously determined that extended oxygen plasma treatment times resulted in the formation of low‐molecular‐weight material (LMWM) within the first 50 nm on the surface, and not in a cross‐linked skin as has been proposed by other researchers. The study shows that controlled surface modification of BPA‐PC polymers is possible, allowing surface oxygen incorporation without degradation of the polymer structure. This result is relevant for improved adhesion of coatings applied to BPA‐PC polymers. Copyright © 2006 John Wiley & Sons, Ltd.

15.
Principal Component Analysis (PCA) was used for the mapping of geochemical data. A testing data matrix was prepared from the chemical and physical analyses of coals altered by thermal and oxidation effects. PCA based on Singular Value Decomposition (SVD) of the standardized (centered and scaled by the standard deviation) data matrix revealed three principal components explaining 85.2% of the variance. By combining the scatter and component weights plots with knowledge of the composition of the tested samples, the coal samples were divided into seven groups depending on the degree of their oxidation and thermal alteration. The PCA findings were verified by other multivariate methods. The relationships among geochemical variables were successfully confirmed by Factor Analysis (FA). The data structure was also described by the Average Group dendrogram using Euclidean distance. The sample clusters found in this way were not as clearly defined as in the case of PCA. This can be explained by PCA filtering out the noise in the data.
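The procedure named in this abstract, PCA by SVD of a standardized data matrix, can be sketched as follows. This is a generic illustration, not the authors' code: the function name `pca_by_svd` and the random stand-in data are hypothetical, and the 85.2% figure from the abstract is not reproduced here.

```python
import numpy as np


def pca_by_svd(X):
    """PCA via SVD of the column-standardized data matrix (hypothetical helper)."""
    # Centre each variable and scale it by its sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                         # sample coordinates on the PCs
    weights = Vt.T                         # component weights (loadings)
    explained = s**2 / np.sum(s**2)        # fraction of variance per component
    return Z, scores, weights, explained


# Random stand-in for a samples-by-variables table of analyses.
X_geo = np.random.default_rng(0).normal(size=(30, 8))
Z_geo, scores_geo, weights_geo, explained_geo = pca_by_svd(X_geo)
cumulative_geo = np.cumsum(explained_geo)  # e.g. cumulative_geo[2] for three PCs
```

The cumulative explained variance after three components, `cumulative_geo[2]`, is the quantity that corresponds to the percentage quoted in the abstract; plotting the first columns of `scores_geo` and `weights_geo` gives the scatter and component weights plots.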

16.
Nonlinear underdetermined blind separation of nonnegative dependent sources consists in decomposing a set of observed nonlinearly mixed signals into a greater number of original nonnegative and dependent component (source) signals. This hard problem is practically relevant for contemporary metabolic profiling of biological samples, where sources (a.k.a. pure components or analytes) are aimed to be extracted from mass spectra of nonlinear multicomponent mixtures. This paper presents a method for nonlinear underdetermined blind separation of nonnegative dependent sources that comply with a sparse probabilistic model, that is, sources are constrained to be sparse in support and amplitude. This model is validated on experimental pure component mass spectra. Under a sparse prior, a nonlinear problem is converted into an equivalent linear one comprised of original sources and their higher‐order, mostly second‐order, monomials. The influence of these monomials, which stand for error terms, is reduced by preprocessing a matrix of mixtures by means of robust principal component analysis and hard, soft and trimmed thresholding. Preprocessed data matrices are mapped into a high‐dimensional reproducing kernel Hilbert space (RKHS) of functions by means of an empirical kernel map. Sparseness‐constrained nonnegative matrix factorizations in RKHS yield sets of separated components. They are assigned to pure components from the library using a maximal correlation criterion. The methodology is exemplified on demanding numerical and experimental examples related respectively to extraction of eight dependent components from three nonlinear mixtures and to extraction of 25 dependent analytes from nine nonlinear mixture mass spectra recorded in a nonlinear chemical reaction of peptide synthesis. Copyright © 2014 John Wiley & Sons, Ltd.

17.
Time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) provides detailed molecular insight into the surface chemistry of a diverse range of material types. Extracting useful and specific information from the mass spectra and reducing the dimensionality of very large datasets remain challenges that have not been fully resolved. Multivariate analysis has been widely deployed to assist in the interpretation of ToF‐SIMS data. Principal component analysis is a popular approach that requires the generation of peak lists for every spectrum. Peak list sizes and the resulting data matrices are growing, complicating manual peak selection and analysis. Here we report the generation of very large ToF‐SIMS peak lists using up‐binning, the mass segmentation of spectral data in the range 0 to 300 m/z in 0.01 m/z intervals. Time‐of‐flight secondary ion mass spectrometry data acquired from a set of 4 standard polymers (polyethylene terephthalate, polytetrafluoroethylene, poly(methyl methacrylate), and low‐density polyethylene) are used to demonstrate the efficacy of this approach. The polymer types are discriminated to a moderate extent by principal component analysis, but the results are easily skewed by saturated species or contaminants present in ToF‐SIMS data. Artificial neural networks, in the form of self‐organising maps, are introduced and provide a non‐linear approach to classifying data and focussing on similarities between samples. The classification outcome achieved is excellent for different polymer types and for spectra from a single polymer type generated by using different primary ions. This method offers great promise for the investigation of more complex systems including polymer classes and blends and mixtures of biological materials.
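The up‐binning step described in this abstract, segmenting 0 to 300 m/z into 0.01 m/z intervals, can be sketched with a weighted histogram. The function name `up_bin`, its parameters and the four-point example spectrum are hypothetical; the paper's own software and any preprocessing it applies are not known from the abstract.

```python
import numpy as np


def up_bin(mz, intensity, mz_max=300.0, width=0.01):
    """Sum intensities into fixed-width m/z bins covering [0, mz_max]."""
    n_bins = int(round(mz_max / width))  # 30000 bins for the values above
    binned, edges = np.histogram(
        mz, bins=n_bins, range=(0.0, mz_max), weights=intensity
    )
    return binned, edges


# Hypothetical spectrum: two points in the same bin, one elsewhere,
# and one beyond the 300 m/z limit (dropped by the histogram range).
mz_demo = np.array([12.004, 12.006, 149.025, 350.0])
intensity_demo = np.array([1.0, 2.0, 5.0, 7.0])
binned_demo, edges_demo = up_bin(mz_demo, intensity_demo)
```

Each spectrum then becomes a fixed-length vector of 30000 values, so spectra can be stacked into a data matrix for PCA or a self‐organising map without manual peak selection. The price is a very wide and mostly empty matrix.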

18.
Compared with routinely recorded process variables, which can be easily obtained through the distributed control system, key quality variables are much more difficult to acquire. As a result, for soft sensor development, we may have only a small number of output data samples but many more input data samples. In this case, it is important to incorporate more input data samples to improve the modeling performance of the soft sensor. On the basis of the semisupervised modeling method, this paper aims to extend the linear semisupervised soft sensor to the nonlinear one, with incorporation of the kernel learning algorithm. Under the probabilistic modeling framework, a mixture form of the nonlinear semisupervised soft sensor is developed in the present work. To evaluate the performance of the developed nonlinear semisupervised soft sensor, an industrial case study is provided. Copyright © 2014 John Wiley & Sons, Ltd.

19.
Bromodomain-containing protein 4 (BRD4) is regarded as a promising target for treating various human diseases, such as inflammatory disorders, malignant tumours, acute myelogenous leukaemia (AML) and bone diseases. In this study, molecular dynamics (MD) simulations, binding free energy calculations, and principal component analysis (PCA) were integrated to uncover the binding modes of inhibitors 8P9, 8PU, and 8PX to BRD4(1). The results obtained from the binding free energy calculations show that van der Waals interactions act as the main regulator in the binding of inhibitors to BRD4(1). The information stemming from PCA reveals that inhibitor association strongly affects the conformational changes, internal dynamics, and movement patterns of BRD4(1). A residue-based free energy decomposition method was used to determine the contributions of individual residues to inhibitor binding, and the data signify that hydrogen bonding interactions and hydrophobic interactions are decisive factors affecting the binding of inhibitors to BRD4(1). Meanwhile, eight residues, Trp81, Pro82, Val87, Leu92, Leu94, Cys136, Asn140, and Ile146, are recognized as the common hot interaction spots of the three inhibitors with BRD4(1). The results from this work are expected to provide meaningful theoretical guidance for the design and development of effective inhibitors of BRD4 activity.

20.
Matrix‐assisted laser desorption/ionisation–time of flight (MALDI‐TOF) mass spectrometry is commonly used for the identification of proteinaceous binders and their mixtures in artworks. The determination of protein binders is based on a comparison between the m/z values of tryptic peptides in the unknown sample and a reference one (egg, casein, animal glues etc.), but this method has greater potential to study changes due to ageing and the influence of organic/inorganic components on protein identification. However, it is then necessary to carry out statistical evaluation of the obtained data. Until now, it has been complicated to routinely import the mass spectrometric data into a statistical programme and to extract and match the appropriate peaks. Only a few ‘homemade’ computer programmes without user‐friendly interfaces are available for these purposes. In this paper, we present our completely new, publicly available, non‐commercial software, ms‐alone and multiMS‐toolbox, for principal component analyses of MALDI‐TOF MS data for R software, and their application to the study of the influence of heterogeneous matrices (organic lakes) on protein identification. Using this new software, we determined the main factors that influence the protein analyses of artificially aged model mixtures of organic lakes and fish glue, prepared according to historical recipes that were used for book illumination, using MALDI‐TOF peptide mass mapping. Copyright © 2015 John Wiley & Sons, Ltd.
