1.
To date, few efforts have been made to exploit the local nature of spectral data in both the time and frequency domains simultaneously within a single regression model. We describe here a novel chemometrics algorithm based on the wavelet transform. We call it dual-domain regression, because the regression step defines a weighted model in the time domain from the contributions of parallel frequency-domain models, each built from wavelet coefficients at a different scale. In principle, any regression method can be used; implementations based on partial least squares (PLS) regression and principal component regression (PCR) are reported here. Models produced by the algorithm generally outperform regular PLS or PCR models applied to data restricted to a single domain. Dual-domain PLS and PCR are applied to near-infrared (NIR) spectra of Cargill corn samples and to sets of spectra collected on batch chemical reactions run in different reactors, to illustrate the improved robustness of the modeling.
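The parallel per-scale models described above start from wavelet coefficients grouped by scale. The abstract does not name the wavelet family, so the sketch below assumes the orthonormal Haar wavelet and shows only the decomposition step that produces one coefficient block per scale; the per-scale PLS/PCR models and their weighting are not shown. `haar_decompose` and `scale_blocks` are hypothetical names.

```python
import math

def haar_decompose(signal, levels):
    """Split a signal into Haar detail coefficients per scale plus a final approximation.

    Assumes the orthonormal Haar wavelet; len(signal) must be divisible by 2**levels.
    Returns (details, approximation); details[0] is the finest scale.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    s = 1 / math.sqrt(2)
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * s for a, b in pairs])  # high-pass (detail)
        approx = [(a + b) * s for a, b in pairs]         # low-pass (approximation)
    return details, approx

def scale_blocks(spectra, levels):
    """Rearrange spectra into one coefficient matrix per scale (rows = samples).

    Each block is what a separate, per-scale regression model would be fitted on.
    """
    blocks = [[] for _ in range(levels + 1)]
    for spectrum in spectra:
        details, approx = haar_decompose(spectrum, levels)
        for k, d in enumerate(details):
            blocks[k].append(d)
        blocks[levels].append(approx)
    return blocks
```

Because the Haar transform is orthonormal, the total energy of each spectrum is preserved across the blocks, so no scale is artificially inflated before modeling.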
2.
In a study of the impact of air pollution on forests, needles from a healthy and a severely damaged Norway spruce tree were analysed by temperature-programmed pyrolysis/field ionization (FI) mass spectrometry. Dried and pulverized spruce needles were heated at 0.6 °C s⁻¹ to 450 °C in the high vacuum of an FI ion source. Over 100 mass spectra were recorded electrically during each analysis. From each mass spectrum, average molecular weights of the pyrolysis products were calculated, and their variation with pyrolysis temperature is discussed. The mass spectra in the range m/z 100–600 are used to calculate partial weight-loss curves. The FI mass spectra are evaluated by principal component analysis and factor rotation. The three factor spectra, based on the loadings of the rotated principal components, show typical FI signals produced during pyrolysis at low, medium and high temperatures. These signal patterns are interpreted as molecular ions of thermally stable, relatively volatile plant constituents and of thermal degradation products from the thermolysis of carbohydrates, lignin and other biopolymers present in conifer needles. Medium- and high-temperature products of lignin can be distinguished. Principal component scores can be used to simulate the appearance of single FI signals, i.e., of individual pyrolysis products. Evaluating time-resolved pyrolysis and soft-ionization mass spectrometric data from a single sample by principal component analysis and factor rotation appears suitable for characterizing the major chemical components, and their thermal behaviour, in complex biological samples.
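The abstract does not say which average molecular weight was computed from each spectrum. A minimal sketch, assuming intensity-weighted number and weight averages and assuming that FI signals are singly charged molecular ions (so m/z is read directly as molecular weight); `average_molecular_weights` is a hypothetical name.

```python
def average_molecular_weights(spectrum):
    """Intensity-weighted average molecular weights of one FI mass spectrum.

    spectrum: iterable of (m_z, intensity) pairs. Singly charged molecular ions
    are assumed, so m/z is taken as molecular weight.
    Returns (number_average, weight_average):
      Mn = sum(I*m) / sum(I),  Mw = sum(I*m^2) / sum(I*m)
    """
    spectrum = list(spectrum)  # allow generators; we iterate several times
    total_i = sum(i for _, i in spectrum)
    if total_i <= 0:
        raise ValueError("spectrum has no positive intensity")
    sum_im = sum(m * i for m, i in spectrum)
    sum_im2 = sum(m * m * i for m, i in spectrum)
    return sum_im / total_i, sum_im2 / sum_im
```

Applied to each of the 100+ spectra in acquisition order, this yields the average molecular weight as a function of pyrolysis temperature.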
3.
Time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) provides detailed molecular insight into the surface chemistry of a diverse range of material types. Extracting useful and specific information from the mass spectra, and reducing the dimensionality of very large datasets, remain challenges that have not been fully resolved. Multivariate analysis has been widely deployed to assist in the interpretation of ToF‐SIMS data. Principal component analysis is a popular approach, but it requires a peak list to be generated for every spectrum. Peak list sizes and the resulting data matrices are growing, complicating manual peak selection and analysis. Here we report the generation of very large ToF‐SIMS peak lists by up‐binning: segmenting the spectral data in the range 0 to 300 m/z into 0.01 m/z intervals. ToF‐SIMS data acquired from a set of four standard polymers (polyethylene terephthalate, polytetrafluoroethylene, poly(methyl methacrylate) and low‐density polyethylene) are used to demonstrate the efficacy of this approach. Principal component analysis discriminates the polymer types only moderately, and the discrimination is easily skewed by saturated species or contaminants present in the ToF‐SIMS data. Artificial neural networks, in the form of self‐organising maps, are introduced; they provide a non‐linear approach to classifying data, focussing on similarities between samples. The classification achieved is excellent both for different polymer types and for spectra of a single polymer type acquired with different primary ions. This method shows great promise for investigating more complex systems, including polymer classes and blends, and mixtures of biological materials.
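Up-binning as described (0 to 300 m/z in 0.01 m/z intervals) turns every spectrum into a fixed-length vector of 30,000 bins, so spectra can be stacked into one data matrix without manual peak picking. A minimal sketch, assuming peaks arrive as (m/z, intensity) pairs and that intensities falling in the same bin are summed; `up_bin` is a hypothetical name.

```python
def up_bin(peaks, lo=0.0, hi=300.0, width=0.01):
    """Sum (m/z, intensity) pairs into fixed-width bins covering [lo, hi).

    Bin k covers [lo + k*width, lo + (k+1)*width). Returns a dense list of
    bin intensities; peaks outside [lo, hi) are ignored.
    """
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if lo <= mz < hi:
            # Small epsilon guards against float results such as 0.29 / 0.01 = 28.999...
            k = int((mz - lo) / width + 1e-9)
            bins[min(k, n_bins - 1)] += intensity
    return bins
```

Each returned list becomes one row of the matrix passed to PCA or a self-organising map.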
4.
Mathematical techniques are described for identifying the components of mixtures from the mass spectra of a series of related mixtures. The approach is analogous to library search methods in that spectra from a reference collection are compared with a multidimensional unknown. Searches are conducted with a library file containing approximately 17,000 mass spectra. Results for the analyses of several mixtures are reported to illustrate the effectiveness of the method.
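One way to compare a library spectrum with a "multidimensional unknown" is to ask how well the reference lies in the space spanned by the mixture spectra. The sketch below uses that target-testing style idea as an assumption; the abstract does not state the exact technique. It builds an orthonormal basis of the mixture spectra by Gram-Schmidt and scores each library entry by the cosine between the reference and its projection. All names are hypothetical.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt basis for the span of the mixture spectra."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = _dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol:  # skip spectra already (numerically) in the span
            basis.append([wi / norm for wi in w])
    return basis

def span_fit(reference, basis):
    """Cosine between a reference spectrum and its projection onto the span (1.0 = fully in span)."""
    ref_norm = math.sqrt(_dot(reference, reference))
    if ref_norm == 0:
        return 0.0
    proj_sq = sum(_dot(reference, b) ** 2 for b in basis)
    return math.sqrt(proj_sq) / ref_norm

def rank_library(mixture_spectra, library):
    """Rank library entries (name -> spectrum) by how well they lie in the mixtures' span."""
    basis = orthonormal_basis(mixture_spectra)
    scores = {name: span_fit(spec, basis) for name, spec in library.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A pure component present in every mixture scores close to 1, while an absent component scores close to 0, regardless of its relative concentration in each mixture.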
5.
Genomics-based technologies in systems biology have become increasingly popular in recent years. These technologies generate large amounts of data, and extracting information from these data requires multivariate data analysis methods. Many of the datasets generated in genomics are multilevel datasets, in which variation occurs on different levels simultaneously (e.g. variation between organisms and variation in time). We introduce multilevel component analysis (MCA) into the field of metabolic fingerprinting to separate these different types of variation. This contrasts with the commonly used principal component analysis (PCA), which cannot do this: in a PCA model, the different types of variation in a multilevel dataset are confounded. MCA generates separate submodels for the different types of variation. These submodels are lower-dimensional component models that approximate the variation, and they are easier to interpret than the original data. Multilevel simultaneous component analysis (MSCA) is a method within the class of MCA models with increased interpretability, because the time-resolved variation of all individuals is expressed in the same subspace. MSCA is applied to a time-resolved metabolomics dataset containing ¹H NMR spectra of urine collected from 10 monkeys at 29 time points over 2 months. The MSCA model contains one submodel describing the biorhythms in urine composition and another describing the variation between the animals. Using MSCA, the largest biorhythms in urine composition and the largest variation between the animals are identified. Comparing the MSCA model with a PCA model of the same data shows that the MSCA model is more interpretable: it gives a clearer view of the different types of variation in the data, because they are not confounded.
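The separation of variation types in MCA rests on splitting each measurement into an overall offset, a between-individual part (each animal's mean minus the grand mean) and a within-individual part (deviations from that animal's mean). A minimal sketch of that split only, with hypothetical names; the component models that MSCA then fits to each part (with one shared loading subspace for the within part) are not shown.

```python
def multilevel_split(data):
    """Split multilevel data into offset, between-individual and within-individual parts.

    data[i][t] is the spectrum (list of variables) of individual i at time point t.
    Returns (grand_mean, between, within) such that
      data[i][t][j] == grand_mean[j] + between[i][j] + within[i][t][j]
    """
    n_vars = len(data[0][0])
    all_rows = [row for individual in data for row in individual]
    grand = [sum(r[j] for r in all_rows) / len(all_rows) for j in range(n_vars)]
    between, within = [], []
    for individual in data:
        mean_i = [sum(r[j] for r in individual) / len(individual) for j in range(n_vars)]
        # Between-individual variation: how this animal's mean differs from everyone's
        between.append([mean_i[j] - grand[j] for j in range(n_vars)])
        # Within-individual (e.g. time) variation: deviations around this animal's mean
        within.append([[r[j] - mean_i[j] for j in range(n_vars)] for r in individual])
    return grand, between, within
```

In a plain PCA of the raw rows, both parts are mixed in the same components; modeling `between` and `within` separately is what keeps them from being confounded.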
6.
Multivariate statistical analysis of sediment data (information matrix 123 × 16) from the Gulf of Mexico, USA, shows that
the data structure is defined by four latent factors, tentatively named “inorganic natural”, “inorganic anthropogenic”,
“bioorganic” and “organic anthropogenic”, which explain 39.24%, 23.17%, 10.77% and 10.67% of the total variance of the data system,
respectively. The receptor model obtained by applying the PCR approach makes it possible to apportion the contribution
of each chemical component to the formation of each latent factor. The contribution of each chemical parameter is thereby separated
into a “natural” and an “anthropogenic” origin of the respective heavy metal or organic matter in the
sediment formation process. This is a new approach compared with the traditional “one-dimensional” search using a limited
number of preselected tracer components. The proposed model separates natural from anthropogenic influences and thus allows
each participant in the sediment formation process to be used as a marker of either natural or anthropogenic effects.
Received: 20 March 1999 / Revised: 1 June 1999 / Accepted: 3 June 1999
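Adding the variance shares reported for the four latent factors shows how much of the data system they jointly describe, and how much remains for other factors or noise:

```latex
\[
39.24\% + 23.17\% + 10.77\% + 10.67\% = 83.85\%,
\qquad
100\% - 83.85\% = 16.15\%
\]
```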
8.
A multivariate data analysis procedure using singular value decomposition and the Ho-Kashyap algorithm is proposed for obtaining calibration constants in X-ray fluorescence spectrometry. These calibration constants can be applied to experimental data by means of a simple dot product calculation. The method was tested on experimental data from the literature. Comparison of results showed that the method performs at least as well as, or better than, the Rasberry-Heinrich method and its modifications. The method can also be used to express calibration results obtained with a theoretically based program in a form that is convenient for routine applications.
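Applying the calibration constants is a plain dot product between the constants and a vector of measured values. The sketch below shows that step, and, for completeness, obtains constants by ordinary least squares via the normal equations. That fitting step is a swapped-in technique, not the SVD/Ho-Kashyap procedure of the abstract; `predict` and `fit_constants` are hypothetical names, and a leading 1.0 in each measurement vector is assumed to carry an intercept.

```python
def predict(constants, intensities):
    """Apply calibration constants to one measurement: a plain dot product."""
    if len(constants) != len(intensities):
        raise ValueError("length mismatch")
    return sum(c * x for c, x in zip(constants, intensities))

def fit_constants(X, y):
    """Least-squares constants from the normal equations (X^T X) c = X^T y.

    Swapped in for illustration; not the SVD/Ho-Kashyap method.
    """
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    M = [A[i] + [b[i]] for i in range(n)]  # augmented matrix
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    # Back substitution
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (M[i][n] - sum(M[i][k] * c[k] for k in range(i + 1, n))) / M[i][i]
    return c
```

The normal equations square the condition number of the problem; an SVD-based solution, as in the abstract, is numerically safer when intensities are strongly correlated.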
9.
Sizeable databases are now routinely generated in a variety of contexts in the chemical industry. Statistical investigations of such databases aim both to uncover structure initially and eventually to propose models, in particular for predicting product quality from the mix or characteristics of the chemical compounds. Online Multivariate Exploratory Graphical Analysis (OMEGA) denotes a structured exploratory study of the relationships in a multivariate data set in which, rather than testing for one specific property, as many clues as possible to interesting structures are sought through different dimension reductions followed by interactive graphical analyses. The stability of the projections obtained by the different dimension reduction methods is assessed by simulation, producing graphical displays that particularly support the identification of influential points. The variation of the predictions obtained by the different dimension reduction methods is assessed by cross-validation, yielding misclassification rates or cross-validated R² values. Interpretation of the new coordinates produced by dimension reduction is supported by loading simplifications and by graphical displays for judging their adequacy. The OMEGA strategy has been found to be an effective tool for routine structure searching.
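A cross-validated R² compares prediction errors on held-out samples (PRESS) with the total variance of the response. A minimal leave-one-out sketch, assuming a single predictor (for example, one score from a dimension reduction) and an ordinary least-squares line as the model; both are illustrative choices, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def loo_q2(xs, ys):
    """Leave-one-out cross-validated R^2: Q^2 = 1 - PRESS / total sum of squares."""
    xs, ys = list(xs), list(ys)
    n = len(ys)
    my = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict it
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    tss = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / tss
```

For a linear least-squares model, the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so Q² never exceeds the ordinary R²; a large gap between the two flags an unstable projection.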
10.
The following parameters were analyzed 2 to 4 times a year at 37 sampling sites: T, O₂, O₂%, Turbidity, Suspended solids, Conductivity, Alkalinity, pH, Color, CODMn, Total nitrogen, Total phosphorus, Cl, Fe, Mn, Total sulphur, K, Na, Ca, Mg, SiO₂, Total organic chlorine and Total organic bromine. Samples were taken from waters loaded by chemical pulp mills, other industries, municipal wastewaters and agriculture; waters under natural conditions were also included. The water samples were collected and analyzed in co-operation with the National Board of Waters and the Environment. The data set was analyzed by principal component analysis (PCA) to determine correlations between variables, especially between Total organic chlorine and Total organic bromine and the other variables. Typically, Total organic chlorine and Total organic bromine correlated with Na, Cl and Total sulphur. Interestingly, Total organic chlorine and Total organic bromine did not follow each other in all components. Total organic chlorine was predicted from the other variables by the partial least squares (PLS) method, and a very satisfactory correlation was obtained between the analyzed and predicted lg TOCl values. Fuzzy clustering analysis found three object classes to be optimal for the whole data set: one class represents waters in a natural condition, one waters loaded mainly by agriculture, and one the rest of the waters.
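The abstract does not name the fuzzy clustering algorithm; the sketch below assumes standard fuzzy c-means, in which every sampling site receives a degree of membership in each class rather than a single hard label. Standardizing the 23 variables before clustering is assumed and not shown; `fuzzy_c_means` is a hypothetical name.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means clustering (assumed algorithm).

    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, and each row sums to 1. Centers are those
    computed in the final iteration.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, normalized per point
        row = [rng.random() + 1e-3 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Centers: means of the points weighted by membership**m
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * points[k][j] for k in range(n)) / sw for j in range(d)])
        # Memberships: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))
        new_u = []
        for k in range(n):
            dist = [math.dist(points[k], ctr) for ctr in centers]
            if min(dist) < 1e-12:
                # Point coincides with a center: full membership there
                row = [1.0 if di < 1e-12 else 0.0 for di in dist]
                s = sum(row)
                row = [v / s for v in row]
            else:
                row = [1.0 / sum((dist[i] / dist[j]) ** (2 / (m - 1)) for j in range(c))
                       for i in range(c)]
            new_u.append(row)
        shift = max(abs(a - b) for ru, rn in zip(u, new_u) for a, b in zip(ru, rn))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Choosing the "optimal" number of classes, as the abstract reports, would mean running this for several values of `c` and comparing a cluster-validity measure, which is not shown here.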
11.
Infrared attenuated total reflection spectra of 133 whole EDTA blood samples from patients of a general hospital population, in the range 1500 to 750 cm⁻¹, were used for the calibration of glucose. Reference concentrations were provided by the enzymatic glucose dehydrogenase method. The partial least squares (PLS) algorithm was used to solve the inverse regression problem. Prediction results from calculations using spectral and Fourier-transformed data were compared; in the latter case, the data reduction yielded no advantage. Spectral range optimization for calibration can be carried out more flexibly in the spectral domain, which is also more readily interpreted by the spectroscopist.
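Spectral range optimization in the spectral domain amounts to trying candidate wavenumber windows and keeping the one with the lowest validation error. A minimal sketch, assuming the caller supplies a cross-validation error function (the hypothetical `cv_error`, for example the RMSECV of a PLS model); the PLS fit itself is not shown, and `select_range` and `best_window` are hypothetical names.

```python
def select_range(wavenumbers, spectrum, lo, hi):
    """Keep the points of one spectrum whose wavenumber (cm^-1) lies in [lo, hi]."""
    return [a for w, a in zip(wavenumbers, spectrum) if lo <= w <= hi]

def best_window(wavenumbers, spectra, y, windows, cv_error):
    """Return (error, (lo, hi)) for the candidate window with the lowest error.

    cv_error is a caller-supplied function (X, y) -> error; it stands in for
    whatever validation scheme is actually used.
    """
    results = []
    for lo, hi in windows:
        X = [select_range(wavenumbers, s, lo, hi) for s in spectra]
        if X and X[0]:  # skip windows that contain no data points
            results.append((cv_error(X, y), (lo, hi)))
    if not results:
        raise ValueError("no candidate window contains any data points")
    return min(results)
```

Because each window maps directly to absorption bands, the selected range stays interpretable, which is the advantage the abstract attributes to working in the spectral rather than the Fourier domain.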
12.
The performance of neural networks in classifying mass spectral data is evaluated and compared with methods of multivariate data analysis and pattern recognition. Back-propagation networks are compared with linear discriminant analysis, and Kohonen feature maps with the k-nearest neighbour clustering algorithm. Eight classifiers were trained to discriminate mass spectra of steroids from eight distinct classes of chemical compounds. The results show slightly better performance of Kohonen networks than of k-nearest neighbour clustering, and equal performance of multi-layer perceptrons and discriminant analysis.
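The k-nearest neighbour baseline assigns an unknown spectrum to the majority class among its k closest training spectra. A minimal sketch, assuming spectra are intensity vectors on a common m/z grid and Euclidean distance; the tie-breaking rule (nearest class wins) is an illustrative choice, and `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Majority class among the k training spectra nearest to query.

    train: list of (vector, label) pairs; Euclidean distance is assumed.
    Vote ties are broken in favour of the class whose member is nearest.
    """
    if not train:
        raise ValueError("empty training set")
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in ranked)
    best = max(votes.values())
    for _, label in ranked:  # ranked is ordered by increasing distance
        if votes[label] == best:
            return label
```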
13.
The quality of water destined for human consumption has been treated as a multivariate property. Since most of the quality parameters are obtained by analytical methods, the routine analytical laboratory (responsible for the accuracy of the analytical data) has been treated as a process system for estimating water quality. Multivariate tools based on principal component analysis (PCA) and partial least squares (PLS) regression are used in the present paper to (i) study the main factors of the latent data structure and (ii) characterize the water samples and the analytical methods in terms of multivariate quality control (MQC). Such tools could warn both of possible health risks related to anomalous sample composition and of failures in the analytical methods.
14.
Algorithms are described for correlating a proposed molecular structure with a mass spectrum. All molecular substructures of the proposed structure that have the same masses as the fragment ions are determined. The most likely fragment ion structures are those substructures formed with the fewest bond cleavages in the proposed structure. The algorithms, which incorporate methods for handling rearrangement and adduct ions, use either nominal or exact mass data from any ionization method. They are demonstrated using the mass spectra of a substituted azetidinyl ketone and the macrolide antibiotic avermectin A1a.
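The core idea (find connected substructures whose mass equals a fragment ion mass, then prefer those requiring the fewest bond cleavages) can be sketched by brute force on a small molecular graph. This is an illustrative sketch, not the published algorithm: it uses nominal masses with hydrogens folded into each group, enumerates all subsets (so it only suits small molecules), and ignores the rearrangement and adduct handling the abstract mentions. Names are hypothetical.

```python
def _is_connected(subset, adjacency):
    """Depth-first search restricted to the subset."""
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for b in adjacency[a]:
            if b in subset and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen == subset

def match_fragment(masses, bonds, ion_mass):
    """Connected substructures of nominal mass ion_mass, fewest cleavages first.

    masses[i]: nominal mass of group i (hydrogens folded in); bonds: (i, j) pairs.
    Returns sorted (cleavages, atom_indices) tuples.
    """
    n = len(masses)
    adjacency = {i: set() for i in range(n)}
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    hits = []
    for mask in range(1, 2 ** n):  # every non-empty subset of groups
        subset = {i for i in range(n) if (mask >> i) & 1}
        if sum(masses[i] for i in subset) != ion_mass:
            continue
        if not _is_connected(subset, adjacency):
            continue
        # A bond is cleaved when exactly one of its ends is in the substructure
        cleavages = sum(1 for i, j in bonds if (i in subset) != (j in subset))
        hits.append((cleavages, sorted(subset)))
    return sorted(hits)
```

For example, butan-2-one (CH3-CO-CH2-CH3) can be encoded as groups CH3 = 15, C = 12, O = 16, CH2 = 14, CH3 = 15 with bonds (0, 1), (1, 2), (1, 3), (3, 4); an ion at m/z 43 then matches only the acetyl unit {0, 1, 2}, formed by a single cleavage.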
17.
A graphically oriented database of the spectral interferences that occur in inductively coupled plasma mass spectrometry and glow discharge mass spectrometry has been developed. The program, called “MS Interview”, runs on a Macintosh computer. It allows the user to specify the technique (ICP or GD) for which the interferences are presented and, for ICP, the acid matrix background. Based on these parameters, the program lists interferences in the following categories: Isobarics, Oxides, Doubly Charged Species, Background Dependent, and Matrix Dependent. For the glow discharge case there are two additional categories: Argides and Dimers. Interference information is provided for all masses of all elements and is easily accessed with the mouse from a periodic table window and element mass spectral windows. The program can be extended to other ion sources, and the user can add or delete interferences as required. Finally, the program also includes a small library of typical background spectra that can be displayed and manipulated. This article is an electronic publication in Spectrochimica Acta Electronica (SAE), the electronic section of Spectrochimica Acta B (SAB). The hardcopy text is accompanied by a disk containing the program MS Interview, a manual, a reference list, and a bar-graph-format mass spectral library of the elements.
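The lookup the program performs (filter interferences by mass, technique and, for ICP, acid background) can be modeled with a small in-memory structure. The abstract does not describe MS Interview's internal data model; the class, function and field names below are hypothetical, and the four entries are illustrative examples rather than data taken from the program.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Interference:
    mass: int                     # nominal m/z affected
    species: str                  # interfering ion
    category: str                 # e.g. "Isobarics", "Doubly Charged Species"
    techniques: FrozenSet[str]    # {"ICP"}, {"GD"} or both
    acid: Optional[str] = None    # ICP acid background required, if any

# Illustrative entries only; a real table would be curated and far larger.
DB = [
    Interference(58, "58Fe+", "Isobarics", frozenset({"ICP", "GD"})),
    Interference(69, "138Ba2+", "Doubly Charged Species", frozenset({"ICP", "GD"})),
    Interference(75, "40Ar35Cl+", "Background Dependent", frozenset({"ICP"}), acid="HCl"),
    Interference(96, "40Ar56Fe+", "Argides", frozenset({"GD"})),
]

def lookup(db: List[Interference], mass: int, technique: str,
           acid: Optional[str] = None) -> List[Interference]:
    """Interferences at a given mass for the chosen technique and acid background."""
    return [e for e in db
            if e.mass == mass
            and technique in e.techniques
            and (e.acid is None or e.acid == acid)]
```

Keeping technique and acid as data on each entry, rather than in separate tables, makes it straightforward for the user to add or delete interferences, as the abstract describes.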
18.
A new approach to interpreting spectra with fuzzy sets is described, using the computer program CIF (Compound Identification with Fuzzy sets). The program can find the components of a mixture by comparing the sample spectrum with reference spectra in a library. The applications discussed involve the interpretation of infrared spectra. The problems of spectral library search are discussed, an elementary introduction to fuzzy set theory is given, and applications to spectral library search are demonstrated.
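Fuzzy sets let a peak that is slightly shifted from its reference position still count as a partial match. A minimal sketch of one such scheme, assuming triangular membership functions centred on each reference band and a mean-of-best-memberships score; CIF's actual scoring is not described in the abstract, and all names, the half-width and the threshold are hypothetical.

```python
def triangular(x, center, half_width):
    """Membership of x in a triangular fuzzy number centred on center."""
    return max(0.0, 1.0 - abs(x - center) / half_width)

def fuzzy_match(sample_peaks, reference_peaks, half_width=10.0):
    """Mean, over reference bands, of the best membership any sample peak achieves."""
    if not reference_peaks:
        return 0.0
    total = 0.0
    for ref in reference_peaks:
        total += max((triangular(p, ref, half_width) for p in sample_peaks), default=0.0)
    return total / len(reference_peaks)

def search(sample_peaks, library, threshold=0.7, half_width=10.0):
    """Library entries (name -> band positions in cm^-1) scoring at least threshold, best first."""
    scores = {name: fuzzy_match(sample_peaks, peaks, half_width)
              for name, peaks in library.items()}
    return sorted(((s, n) for n, s in scores.items() if s >= threshold), reverse=True)
```

Because each reference compound is scored independently, several components can pass the threshold at once, which is what mixture analysis requires.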
20.
Lipophilicity is one of the most important physicochemical parameters of a molecule determining its bioactivity. Cluster analysis (CA), principal component analysis (PCA) and sum of ranking differences (SRD) were used to compare lipophilicity parameters of twenty phenylacetamide derivatives, obtained experimentally as chromatographic retention data with different solvents and calculated by different mathematical methods. All the multivariate methods applied gave similar groupings of the lipophilicity parameters studied. When the compounds themselves were grouped with respect to lipophilicity, the results depended on the chemometric method applied. CA and PCA grouped the compounds according to the nature of the substituents R1 and R2, indicating that these substituents largely determine the lipophilicity of the molecules investigated. In contrast, SRD could not be used to group the compounds by lipophilic character.
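Sum of ranking differences compares each method (column) with a reference by ranking the compounds under both and summing the absolute rank differences; smaller sums mean closer agreement. A minimal sketch, assuming the row-wise mean of all columns as the default consensus reference and averaged ranks for ties; the validation step against random rankings is not shown, and the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..n; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def srd(columns, reference=None):
    """Sum of ranking differences of each column against a reference ranking.

    columns: dict name -> list of values (one per compound). If no reference is
    given, the row-wise mean of all columns serves as the consensus reference.
    """
    names = list(columns)
    n = len(columns[names[0]])
    if reference is None:
        reference = [sum(columns[c][i] for c in names) / len(names) for i in range(n)]
    ref_ranks = average_ranks(reference)
    return {c: sum(abs(a - b) for a, b in zip(average_ranks(columns[c]), ref_ranks))
            for c in names}
```

Here SRD ranks the lipophilicity parameters (columns) against a consensus, which suits comparing methods; it does not by itself partition the compounds into groups, consistent with the finding reported above.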