Similar literature
1.
A method is described for the selection of features from infrared spectra, aimed at computer-aided interpretation by retrieval of coded spectra. The coding procedure is similar to that of the ASTM Infrared Spectral Index, involving 140 binary-coded wavelength intervals (peak positions) of 0.1 μm for each spectrum (Wyandotte code). In addition to this procedure, windows of 0.3 μm and 0.5 μm are used. For a given set of spectra, the peak positions are grouped by means of numerical taxonomy; the correlation coefficient is used as a criterion. The information content of every peak position is calculated with Shannon's formula. One peak position is selected from each group, viz. the position having the highest information content. The selection obtained in this way is composed of peak positions that are weakly correlated yet yield much information. The spectra are then coded again, taking account of the selected peak positions only. In order to evaluate the selection, the number of spectra still differing from all other spectra in the set is determined by comparing all reduced spectra with each other. For a file of 395 spectra (hydrocarbons, alcohols, ethers, carbonyl compounds) 99.0% of the spectra are unique when 27 selected features are used. For a file of 5100 spectra (of a wide variety of compounds, taken from the ASTM Infrared Spectral Index) 97.7% of these spectra are unique when only 40 out of all 140 peak positions are used.
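
The two quantities this abstract relies on, the Shannon information of a binary peak position and the fraction of spectra that stay unique after reduction, can be sketched as follows. This is a minimal illustration on made-up toy data, not the authors' program: the function names are hypothetical, the correlation-based grouping step is omitted, and the binary entropy form of Shannon's formula is an assumption about how it is applied.

```python
import math

def shannon_information(column):
    """Shannon information (bits) of one binary peak position over a set of spectra."""
    p = sum(column) / len(column)          # fraction of spectra with a peak here
    if p in (0.0, 1.0):                    # a position that never varies carries no information
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def fraction_unique(spectra, selected):
    """Fraction of spectra whose reduced code differs from every other reduced code."""
    reduced = [tuple(s[i] for i in selected) for s in spectra]
    counts = {}
    for code in reduced:
        counts[code] = counts.get(code, 0) + 1
    return sum(1 for code in reduced if counts[code] == 1) / len(reduced)

# Toy file: 4 spectra coded on 4 peak positions (invented values).
spectra = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
info = [shannon_information([s[i] for s in spectra]) for i in range(4)]
```

On this toy file the last position is never occupied, so it would never be selected, while positions 0 and 1 together already separate all four spectra.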

2.
A procedure has been developed for estimating the information content of retrieval systems with binary-coded mass spectra, as well as mass spectra coded by other methods, from the statistical properties of a reference file. For a reference file, binary-coded with a threshold of 1% of the intensity of the base peak, this results typically in an estimated information content of about 50 bits for 200 selected m/z values. It is shown that, because of errors occurring in the binary-coded spectra, the actual information content is only about 12 bits. This explains the poor performance observed for retrieval systems with binary-coded mass spectra.

3.
A retrieval system for binary-coded mass spectra is described. The data base used contains 9628 low-resolution mass spectra from the Aldermaston Mass Spectra Data Collection. These spectra are reduced to 106 preselected binary-coded m/z values each. Storage of the compound names and formulae is minimized by using a special set of characters and file organization. The search strategy permits fast generation of the N nearest neighbours. Depending on the number of best matches generated, an average search requires access to only 24-33% of the spectra contained in the data base. Because of its limited storage requirements, this search system can be used even on microcomputers.
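
A nearest-neighbour search over binary-coded spectra can be sketched as below. The abstract does not name the distance measure, so the Hamming distance used here is an assumption, as are the function names and the toy library. The sketch also scans every entry, whereas the system described reaches its matches after accessing only part of the data base.

```python
import heapq

def hamming(a, b):
    """Number of bit positions in which two binary codes (stored as ints) differ."""
    return bin(a ^ b).count("1")

def nearest_neighbours(query, library, n):
    """Return the n library entries closest to the query as (distance, name) pairs."""
    scored = [(hamming(query, code), name) for name, code in library.items()]
    return heapq.nsmallest(n, scored)

# Hypothetical 6-bit codes; the system in the abstract uses 106 m/z positions.
library = {"A": 0b101100, "B": 0b101000, "C": 0b010011, "D": 0b111100}
best = nearest_neighbours(0b101100, library, 2)
```

Storing each spectrum as one integer keeps the comparison to a single XOR and a bit count, which suits the limited storage the abstract emphasizes.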

4.
Kernel independent component analysis (KICA), a kernel-based independent component analysis (ICA) algorithm, was preliminarily investigated for blind source separation (BSS) of source spectra profiles from troches. The robustness of different ICA algorithms (KICA, FastICA and Infomax) was first checked by using them in the retrieval of source infrared (IR), ultraviolet (UV) and mass spectra (MS) from synthetic mixtures. It was found that KICA is the most robust method for retrieval of source spectra profiles. The KICA algorithm was subsequently applied to the analysis of diffuse reflection IR spectra of acetylspiramycin (ASPM) troches. It is observed that KICA is able to isolate the theoretically predicted spectral features corresponding to the ASPM active components, excipients and other minor components as different independent (spectral) components. A troche can be authenticated and semi-quantified using the estimated ICs. KICA is a useful method for estimation of source spectral features of molecules with different geometry and stoichiometry, while features belonging to very similar molecules remain grouped.

5.
A similarity-search system is described for proton-NMR spectroscopy. In order to achieve fast retrieval of reference compounds, the 1H-NMR spectra of the data base and of the unknown are encoded in a bit string. The individual bits of the binary signature describe different features of the spectra. Part of the coupling information is coded in such a way that effects of magnetic field strength are taken into account. The encoding thus permits a fast search for identical and structurally similar reference compounds in the data base even when the spectra were recorded at different magnetic field strengths. Because the search consists of weighted comparison of bits, each of them describing different spectral features, a choice of different kinds of searches is possible with the same signature by selecting appropriate weight vectors. Thus specific spectroscopic features can be selected for the search. Such a context-sensitive similarity-search system allows, for example, a search for compounds having similar multiplicities or similar subspectra in a given (e.g., aromatic) region of the spectrum. Furthermore, by adjusting two “software knobs” which influence the normalization of the search results, the user can choose between the two extremes of forward and reverse search, and between an identity search, similarity search or classification search. The results were tested on a small library containing 550 spectra including some mixtures and duplicates recorded under different experimental conditions at 250 and 400 MHz.
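
The idea of comparing bits under a weight vector can be sketched as follows. This is an illustrative assumption about the scoring, not the published scheme: the function name, the six-bit signatures and the weight vectors are invented, and the two normalization "knobs" for forward and reverse search are not modelled.

```python
def weighted_similarity(sig_a, sig_b, weights):
    """Weighted fraction of bit positions on which two signatures agree."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    agree = sum(w for a, b, w in zip(sig_a, sig_b, weights) if a == b)
    return agree / total

# Hypothetical signatures of an unknown and a reference spectrum.
unknown = [1, 0, 1, 1, 0, 0]
reference = [1, 0, 0, 1, 1, 0]

overall = weighted_similarity(unknown, reference, [1, 1, 1, 1, 1, 1])
# Weighting only the first two bits restricts the search to the features they encode.
focused = weighted_similarity(unknown, reference, [1, 1, 0, 0, 0, 0])
```

The same stored signature gives a different ranking under each weight vector, which is how one encoding can support several kinds of search.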

6.
The introduction of a variance‐filter to both direct standardization (DS) and piece‐wise direct standardization (PDS) instrumental transfer methods for the analysis of NMR spectral data is described. The variance‐filter modification allows for the identification of regions in the NMR spectra that are not adequately represented by the limited number of transfer calibration samples used during the calculation of the instrument‐to‐instrument transfer matrix. For these spectral frequencies, the corresponding portion of the transfer matrix is replaced by identity (or scaled identity) prior to the secondary instrumental data sets being transferred to the target instrument response. The spectral matching performance of the variance‐filtered instrumental transfer method as applied to high‐resolution 1H NMR spectra is presented along with possible uses and limitations. Copyright © 2010 John Wiley & Sons, Ltd.

7.
The problems of preprocessing of 13C-n.m.r. spectra for hierarchical clustering are discussed. Encoding of the spectra in nonequidistant intervals is proposed. To establish the optimal intervals, a Simplex method with variable-sized movements is used. The optimized parameter is the amount of information contained in the first two coordinates of the transforms, obtained by the application of principal component analysis to the 13C-n.m.r. spectra. The spectra encoded in optimized intervals are used for automated structure elucidation, based on a hierarchical organization of a collection of more than 2000 assigned 13C-n.m.r. spectra. The hierarchical trees needed for the library search and prediction of some structural features were generated by a 3-distances clustering method. The retrieval and predictive abilities of the system are discussed.

8.
The possibility provided by chemometrics to extract and combine (fuse) information contained in NIR and MIR spectra in order to discriminate monovarietal extra virgin olive oils according to olive cultivar (Casaliva, Leccino, Frantoio) has been investigated. Linear discriminant analysis (LDA) was applied as a classification technique on these multivariate and non-specific spectral data both separately and jointly (NIR and MIR data together). In order to ensure a more appropriate ratio between the number of objects (samples) and number of variables (absorbance at different wavenumbers), LDA was preceded either by feature selection or variable compression. For feature selection, the SELECT algorithm was used, while a wavelet transform was applied for data compression. Correct classification rates obtained by cross-validation varied between 60% and 90% depending on the procedure followed. The most accurate results were obtained using the fused NIR and MIR data, with either feature selection or data compression. Chemometric strategies applied to fused NIR and MIR spectra represent an effective method for classification of extra virgin olive oils on the basis of the olive cultivar.

9.
10.
Proteins possess strong absorption features in the combination range (5000-4000 cm−1) of the near infrared (NIR) spectrum. These features can be used for quantitative analysis. Partial least squares (PLS) regression was used to analyze NIR spectra of lysozyme with the leave-one-out, full cross-validation method. A strategy for spectral range optimization with cross-validation PLS calibration was presented. A five-factor PLS model based on the spectral range between 4720 and 4540 cm−1 provided the best calibration model for lysozyme in aqueous solutions. For 47 samples ranging from 0.01 to 10 mg/mL, the root mean square error of prediction was 0.076 mg/mL. This result was compared with values reported in the literature for protein measurements by NIR absorption spectroscopy in human serum and animal cell culture supernatants.

11.
A combined forward-reverse library search routine for low-resolution mass spectra is described. The routine requires binary-coded spectra. Masses and peak intensities are used for spectral comparison. On the basis of three possible search strategies, this routine is adaptable to analytical problems. The program was tested for 25 000 spectra from the ISAS, MSDC and EPA mass spectra libraries. The program is written completely in FORTRAN IV.

12.
Copy toner samples were analyzed using reflection-absorption infrared microscopy (R-A IR). The grouping of copy toners into distinguishable classes achieved by visual comparison and computer-assisted spectral matching was compared to that achieved by multivariate discriminant analysis. For a data set containing spectra of 430 copy toners, 90% (388/430) of the spectra were initially correctly grouped into the classifications previously established by spectral matching. Three groups of samples that did not classify well contained too few samples to allow reliable classification. Samples from two other pairs of groups were similar and often misclassified. Closer examination of spectra from these groups revealed discriminating features that could be used in separate discriminant analyses to improve classification. For one pair of groups, the classification accuracy improved to 91% (81/89) and 97% (28/29), for the two groups, respectively. The other pair of groups were completely distinguishable from one another. With these additional tests, multivariate discriminant analysis correctly classified 96% of the 430 R-A IR toner spectra into the toner groups found previously by spectral matching. This is publication number 03–03 of the Laboratory Division of the Federal Bureau of Investigation. Names of commercial manufacturers are provided for identification only, and inclusion does not imply endorsement by the Federal Bureau of Investigation.

13.
Tandem mass spectra (MS/MS) produced using electron transfer dissociation (ETD) differ from those derived from collision-activated dissociation (CAD) in several important ways. Foremost, the predominant fragment ion series are different: c- and z•-type ions are favored in ETD spectra while b- and y-type ions comprise the bulk of the fragments in CAD spectra. Additionally, ETD spectra possess charge-reduced precursors and unique neutral losses. Most database search algorithms were designed to analyze CAD spectra, and have only recently been adapted to accommodate c- and z•-type ions; therefore, inclusion of these additional spectral features can hinder identification, leading to lower confidence scores and decreased sensitivity. Because of this, it is important to pre-process spectral data before submission to a database search to remove those features that cause complications. Here, we demonstrate the effects of removing these features on the number of unique peptide identifications at a 1% false discovery rate (FDR) using the open mass spectrometry search algorithm (OMSSA). When analyzing two biological replicates of a yeast protein extract in three total analyses, the number of unique identifications with a ∼1% FDR increased from 4611 to 5931 upon spectral pre-processing, an increase of ∼28.6%. We outline the most effective pre-processing methods, and provide free software containing these algorithms.

14.
Principal component analysis is applied to the interpretation of 13C-n.m.r. spectra and to the resolution of mass spectral data. A procedure is given for determining the relative amounts of pure components, with and without the use of pure mass lines, in mass spectra of mixtures. The use of the Fisher discriminant method in combination with the principal components technique is demonstrated in the treatment of trace element data on hair for environmental purposes. The importance of feature generation and selection is emphasized.

15.
The interpretation of carbon-NMR spectra is mainly based on comparison with suitable reference data taken from the literature. The whole information content of 13C-NMR spectra cannot be utilized by manual interpretation. Therefore a network of interactive computer programs has been developed, which simulates the strategy of the spectroscopist in generating structural fragments from the spectral data. The most important knowledge source for this process is a carbon-NMR data base containing some 17,500 spectra. Structural fragments are generated automatically from this data file and assembled by a model builder to complete chemical structures using constraints derived from the spectral data. A comparison of the experimental carbon-NMR spectrum with the estimated ones allows the generation of a sorted hit list. For part II see: H. Kalchhauser, W. Robien, J. Chem. Inf. Comput. Sci. 1985, 25, 103.

16.
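Near-infrared spectroscopy combined with principal component analysis for the diagnosis of endometrial cancer   Total citations: 3, self-citations: 0, citations by others: 3
The feasibility of extracting near-infrared spectral features of endometrial cancer tissue, and of using them for early diagnosis, was studied with near-infrared spectroscopy combined with chemometric methods. Near-infrared spectra of 154 endometrial tissue sections were measured, and principal component analysis was carried out after selecting suitable wavebands and spectral preprocessing methods. Cancerous, hyperplastic and normal endometrial tissue sections were clearly distinguished, and tissue sections at different stages of differentiation were also resolved, providing a reliable basis for the early diagnosis of endometrial cancer. The method is fast and simple, and is expected to develop into a new noninvasive method for tumor diagnosis.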

17.
This article describes the classification of biodiesel samples using NIR spectroscopy and chemometric techniques. A total of 108 spectra of biodiesel samples were taken (three samples each of four types of oil: cottonseed, sunflower, soybean and canola) from nine manufacturers. The measurements for each of the three samples were in the spectral region between 12,500 and 4000 cm−1. The data were preprocessed by selecting a spectral range of 5000-4500 cm−1, and then a Savitzky-Golay second-order polynomial was used with 21 data points to obtain second derivative spectra. Characterization of the biodiesel was done using chemometric models based on hierarchical cluster analysis (HCA), principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA) elaborated for each group of biodiesel samples (cotton, sunflower, soybean and canola). For the HCA and PCA, the formation of clusters for each group of biodiesel was observed, and SIMCA models were built using 18 spectral measurements for each type of biodiesel (training set), and nine spectral measurements to construct a classification set (except for the canola oil, which used eight spectra). The SIMCA classification identified 100% of the samples correctly. Using this strategy, it was feasible to classify biodiesel quickly and nondestructively without the need for various analytical determinations.
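
The preprocessing step named in this abstract, a Savitzky-Golay second derivative from a second-order polynomial over 21 points, can be sketched in plain Python as below. The function name and the test signal are hypothetical, and edge points without a full window are simply dropped, which the abstract does not specify. Where SciPy is available, `scipy.signal.savgol_filter` with `deriv=2` covers the same operation.

```python
def savgol_second_derivative(y, window=21, spacing=1.0):
    """Second derivative from a least-squares quadratic fitted in a sliding window.

    Returns values for interior points only (len(y) - window + 1 of them).
    """
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be an odd number of at least 3 points")
    m = window // 2
    offsets = range(-m, m + 1)
    s0 = window                              # sum of k**0 over the window
    s2 = sum(k ** 2 for k in offsets)
    s4 = sum(k ** 4 for k in offsets)
    denom = s0 * s4 - s2 ** 2
    # For y = a0 + a1*k + a2*k**2 the least-squares a2 is a fixed linear
    # combination of the window values; the second derivative is 2*a2.
    coeffs = [2.0 * (s0 * k ** 2 - s2) / denom for k in offsets]
    out = []
    for i in range(m, len(y) - m):
        acc = sum(c * y[i + k] for c, k in zip(coeffs, offsets))
        out.append(acc / spacing ** 2)
    return out

# Check on an exact quadratic, whose second derivative is 6 everywhere.
signal = [3 * i ** 2 + 2 * i + 1 for i in range(41)]
d2 = savgol_second_derivative(signal)
```

A second derivative removes constant and linear baseline contributions, which is a common reason for applying it to NIR spectra before clustering or classification.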

18.
To gain perspective on building full transferable libraries of MSn spectra from their diverse/numerous collections, a new library was built from 1723 MS>1 spectra (mainly MS2 spectra) of 490 pesticides and related compounds. Spectra acquired on different types of tandem instruments in various experimental conditions were extracted from 168 literature articles and Internet sites. Testing of the library was based on searches where 'unknown' and reference spectra originating from different sources (mainly from different laboratories) were cross‐compared. The NIST 05 MS2 library was added to the reference spectra. The library searches were performed with all the test spectra or were divided into different subsamples containing (a) various numbers of replicate spectra of test compounds or (b) spectra acquired from different instrument types. Thus, the dependence of true/false search (identification) result rates on different factors was explored. The percentage of 1st rank correct identifications (true positives) for a single 'unknown' mass spectrum and two or more reference spectra and matching precursor ion m/z values was 89%. For qualified matches, above the cut‐off match factor, that rate decreased to 80%. The corresponding rates based on the best match for two or more 'unknown' and reference spectral replicates were 89–94%. For quadrupole instruments, the rates were even higher: 91–95% (one 'unknown' spectrum) and 90–100% (two or more such spectra). This study shows that MS2 spectral libraries generated from the numerous literature/Internet sources are no less efficient for identifying unknown compounds, including pesticides, than the very common EI‐MS1 libraries and are almost as efficient as the most productive of the current MS2 spectral databases. Such libraries may be used as individual reference databases or supplements to large experimental spectral collections covering many groups of abundant compounds and different types of tandem mass spectrometers. Copyright © 2011 John Wiley & Sons, Ltd.

19.
The reactions of 4-chloro-7-nitrobenzofurazan (NBD-Cl) with glyphosate (GLY) and with its main metabolite, aminomethylphosphonic acid (AMPA), have been studied. The resolution of binary mixtures of glyphosate and aminomethylphosphonic acid has been accomplished by partial least squares (PLS) multivariate calibration. The method of determination is based on the fluorescence emission of the derivatives formed in the presence of NBD-Cl at 90 °C, in methanol and in basic medium. The dynamic ranges of the methods were between 10 and 150 μg l−1 for GLY and between 10 and 200 μg l−1 for AMPA, with detection limits of 2 and 5.4 μg l−1 for GLY and AMPA, respectively. The total luminescence information of the derivatives has been used to optimize the spectral data set to perform the calibration, by analysis of the three-dimensional excitation-emission matrices. A comparison between the predictive ability of the multivariate calibration method, partial least squares type 1 (PLS-1), on two spectral data sets, emission and synchronous spectra, has been performed. The PLS-1 method, applied to the emission spectra, has been selected as optimum. The proposed method has been applied to the simultaneous determination of GLY and AMPA in river water. For concentrations ranging from 100 to 600 μg l−1 of each compound in the samples, analytical recoveries range from 83 to 94% for GLY and from 104 to 120% for AMPA.

20.