Similar literature
 20 similar documents found.
1.
Direct chemometric interpretation of raw chromatographic data (as opposed to integrated peak tables) has been shown to be advantageous in many circumstances. However, this approach presents two significant challenges: data alignment and feature selection. In order to interpret the data, the time axes must be precisely aligned so that the signal from each analyte is recorded at the same coordinates in the data matrix for every analyzed sample. Several alignment approaches exist in the literature and they work well when the samples being aligned are reasonably similar. In cases where the background matrix for a series of samples to be modeled is highly variable, the performance of these approaches suffers. As for feature selection, when the raw data are used, each signal at each time is treated as an individual, independent variable; with the data rates of modern chromatographic systems, this generates hundreds of thousands of candidate variables, or tens of millions if multivariate detectors such as mass spectrometers are used. Consequently, an automated approach to identify and select appropriate variables for inclusion in a model is desirable. In this research we present an alignment approach that relies on a series of deuterated alkanes which act as retention anchors for an alignment signal, and couple this with an automated feature selection routine based on our novel cluster resolution metric for the construction of a chemometric model. The model system that we use to demonstrate these approaches is a series of simulated arson debris samples analyzed by passive headspace extraction and GC-MS, and interpreted using partial least squares discriminant analysis (PLS-DA).

2.
In the construction of activity prediction models, feature ranking methods are a useful mechanism for ordering features by their significance to the predictive model. This paper studies the influence of feature rankers on the construction of molecular activity prediction models; for this purpose, a comparative study of fourteen ranking methods for feature selection was conducted. The activity prediction models were constructed using four well-known classifiers and a wide collection of datasets. The ranking algorithms were compared in terms of the performance of these classifiers under different metrics and the consistency of the ranked features.

3.
A fast and objective chemometric classification method is developed and applied to the analysis of gas chromatography (GC) data from five commercial gasoline samples. The gasoline samples serve as model mixtures, whereas the focus is on the development and demonstration of the classification method. The method is based on objective retention time alignment (referred to as piecewise alignment) coupled with analysis of variance (ANOVA) feature selection prior to classification by principal component analysis (PCA) using optimal parameters. The degree-of-class-separation is used as a metric to objectively optimize the alignment and feature selection parameters using a suitable training set thereby reducing user subjectivity, as well as to indicate the success of the PCA clustering and classification. The degree-of-class-separation is calculated using Euclidean distances between the PCA scores of a subset of the replicate runs from two of the five fuel types, i.e., the training set. The unaligned training set that was directly submitted to PCA had a low degree-of-class-separation (0.4), and the PCA scores plot for the raw training set combined with the raw test set failed to correctly cluster the five sample types. After submitting the training set to piecewise alignment, the degree-of-class-separation increased (1.2), but when the same alignment parameters were applied to the training set combined with the test set, the scores plot clustering still did not yield five distinct groups. Applying feature selection to the unaligned training set increased the degree-of-class-separation (4.8), but chemical variations were still obscured by retention time variation and when the same feature selection conditions were used for the training set combined with the test set, only one of the five fuels was clustered correctly. 
However, piecewise alignment coupled with feature selection yielded a reasonably optimal degree-of-class-separation for the training set (9.2), and when the same alignment and ANOVA parameters were applied to the training set combined with the test set, the PCA scores plot correctly classified the gasoline fingerprints into five distinct clusters.
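The ANOVA feature selection step mentioned in this abstract can be illustrated with a minimal sketch that computes a one-way ANOVA F-ratio for each variable and keeps the variables above a threshold. The function names, the data layout (one list of variables per sample) and the threshold value are assumptions made for illustration; the abstract does not specify the authors' actual implementation or parameters.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio for a single variable.

    groups: list of lists, one list of measured values per class.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # No within-class scatter: infinite ratio only if the classes differ.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def select_features(data, labels, threshold):
    """Return indices of variables whose F-ratio is at least `threshold`.

    data: list of samples, each a list of variables (hypothetical layout).
    labels: class label of each sample.
    """
    classes = sorted(set(labels))
    selected = []
    for j in range(len(data[0])):
        groups = [[row[j] for row, lab in zip(data, labels) if lab == c]
                  for c in classes]
        if f_ratio(groups) >= threshold:
            selected.append(j)
    return selected
```

In this sketch a variable that separates the classes well receives a large F-ratio, while a variable with equal class means receives an F-ratio of zero and is discarded.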

4.
This article describes the classification of biodiesel samples using NIR spectroscopy and chemometric techniques. A total of 108 spectra of biodiesel samples were taken (three samples each of four types of oil: cottonseed, sunflower, soybean and canola) from nine manufacturers. The measurements for each of the three samples were in the spectral region between 12,500 and 4000 cm⁻¹. The data were preprocessed by selecting a spectral range of 5000-4500 cm⁻¹, and then a Savitzky-Golay second-order polynomial was used with 21 data points to obtain second derivative spectra. Characterization of the biodiesel was done using chemometric models based on hierarchical cluster analysis (HCA), principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA) built for each group of biodiesel samples (cotton, sunflower, soybean and canola). For the HCA and PCA, the formation of clusters for each group of biodiesel was observed, and SIMCA models were built using 18 spectral measurements for each type of biodiesel (training set), and nine spectral measurements to construct a classification set (except for the canola oil, which used eight spectra). The SIMCA classifications obtained 100% accurate identifications. Using this strategy, it was feasible to classify biodiesel quickly and nondestructively without the need for various analytical determinations.
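The Savitzky-Golay preprocessing step described above (second-order polynomial, 21-point window, second derivative) can be sketched in plain Python as follows. The function name, the `spacing` parameter and the choice to return only the interior points are assumptions for illustration; the abstract does not state how the window edges were handled or what the wavenumber spacing was.

```python
def savgol_second_derivative(y, window=21, spacing=1.0):
    """Second derivative from a quadratic least-squares fit in a moving window.

    y: equally spaced signal values.
    window: odd number of points in the moving window.
    spacing: distance between adjacent points on the x axis (assumed constant).
    Returns one value per window position, i.e. len(y) - window + 1 values.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if len(y) < window:
        raise ValueError("signal is shorter than the window")
    half = window // 2
    offsets = range(-half, half + 1)
    mean_sq = sum(i * i for i in offsets) / window
    denom = sum((i * i - mean_sq) ** 2 for i in offsets)
    # On a symmetric grid the quadratic coefficient of the fit is
    # sum((i^2 - mean_sq) * y_i) / denom; the second derivative is twice that.
    weights = [2.0 * (i * i - mean_sq) / (denom * spacing ** 2) for i in offsets]
    return [sum(w * y[start + j] for j, w in enumerate(weights))
            for start in range(len(y) - window + 1)]
```

As a sanity check, applying the function to samples of y = x² with unit spacing gives a constant second derivative of 2 at every window position, up to floating-point rounding.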

5.
In this paper, a time-based multi-syringe flow injection (MSFI) approach is proposed for automated disk-based sorbent extraction of three nitro-substituted phenol isomers (2-, 3-, and 4-nitrophenol) followed by on-line simultaneous determination of individual species by diode-array spectrophotometry. The method involves the on-line enrichment of the targeted analytes from an acidic medium containing 0.1 mol L⁻¹ HCl onto a co-polymeric sorbent material, and the concurrent removal of potentially interfering matrix components. The nitrophenol isomers are subsequently eluted with an alkaline solution (0.7 mol L⁻¹ NaOH), whereupon the eluate is delivered to a diode-array spectrophotometer for recording of the spectral data in the UV-vis region. Deconvolution of strongly overlapped spectra was conducted with multivariate regression models based on multiple linear regression calibration. The analytical performance of the chemometric algorithm was characterized by relative prediction errors and recoveries. The MSFI manifold was coupled to a multiposition selection valve to set up a rugged analyzer that ensures minimum operational maintenance via exploitation of membrane switching protocols. As compared with earlier methods for isolation/pre-concentration of nitro-substituted phenols based on liquid-liquid extraction, the proposed flow-through disk-based system should be regarded as an environmentally friendly approach because the use of harmful organic solvents is circumvented. Under the optimized chemical and physical variables, the 3σ(blank) detection limits for 2-, 3-, and 4-nitrophenol were 1.2, 3.2 and 0.3 μmol L⁻¹ for a sample loading volume of 1.5 mL, and the relative standard deviations were ≤5.0%. The flow system, which is able to handle up to 135 samples automatically, was proven suitable for monitoring trace levels of the target isomers in mineral, tap, and seawater.

6.
A sequential injection analysis (SIA) method for the assay of promethazine hydrochloride, based on its oxidation by acidified cerium(IV), was optimized. Three chemometric approaches were applied: (i) factorial design (3³ applied to surface plot and 2³ applied to effect factor) for screening the potentially interacting variables, (ii) univariate optimization for insignificantly interacting variables and (iii) simplex optimization for potentially interacting variables. The optimum experimental conditions were 30 μl of 0.38 mol/l sulphuric acid, 30 μl of 3.99 × 10⁻³ mol/l cerium(IV), 20 μl of promethazine hydrochloride and a flow rate of 20 μl/s. The detection limit was 7.032 × 10⁻⁵ mol/l and the calibration curve was linear up to 1.563 mol/l with a correlation coefficient of 0.9998, an accuracy range of 89.0-101.5%, a relative standard deviation of 1.1% (n = 10) and a sample frequency of at least 20 samples/h. The method was applied to tablet form and validated with the British Pharmacopoeia method. The developed SIA method is fully automated, reproducible, sensitive, rapid and reagent-saving, and therefore suitable for routine control of tablet forms.
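To make the size of the 3³ and 2³ factorial designs concrete, the following minimal sketch enumerates every run of a full factorial design. The factor names and the coded levels (-1, 0, +1) are hypothetical placeholders; the abstract does not say which variables or levels were assigned to each design.

```python
from itertools import product


def full_factorial(levels):
    """Enumerate all runs of a full factorial design.

    levels: dict mapping factor name -> list of levels for that factor.
    Returns a list of runs, each a dict mapping factor name -> chosen level.
    """
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))]


# Hypothetical factors with coded levels, for illustration only.
three_level = full_factorial({
    "acid_concentration": [-1, 0, 1],
    "cerium_concentration": [-1, 0, 1],
    "flow_rate": [-1, 0, 1],
})
two_level = full_factorial({
    "acid_concentration": [-1, 1],
    "cerium_concentration": [-1, 1],
    "flow_rate": [-1, 1],
})
```

Three factors at three levels give 27 runs and three factors at two levels give 8 runs, which is why the smaller design is the more economical choice for estimating effects.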

7.
A broad chemometric study was made on structural data from non-fused and non-pi-complexed pentafulvenes obtained both from the Cambridge Structural Database (CSD) and from several studies aimed at synthesising new fulvene compounds. Three main differentiated pentafulvene groups can be established considering bond distances extracted from the CSD. Structural data for the new 1-mono and 1,4-disubstituted 2,3,6-trioxypentafulvenes and 1,4-disubstituted-6-amino-2,3-dioxypentafulvenes reveal different structural behaviours due to their high functionality. The chemometric techniques employed comprise principal component analysis, cluster analysis, selection of essential variables (Procrustes rotation) and isoprobability curves, all of them giving essentially the same general chemical conclusions.

8.
The essential oil components of geranium cultivated in the center of Iran were identified and determined using gas chromatography-mass spectrometry data combined with chemometric resolution techniques. A total of 61 components accounting for 91.51% were identified using similarity searches between the mass spectra and an MS database. This number was extended to 85 components using chemometric techniques. Various chemometric methods such as morphological scores, the simplified Borgen method (SBM) and fixed size moving window evolving factor analysis (FSMWEFA) were used for determining the number of components, pure variables, zero concentration and selective regions. Then the overlapping peak clusters were resolved into pure chromatograms and pure mass spectra using the heuristic evolving latent projections (HELP) method. A characteristic feature of the Iranian geranium oil is the absence of 10-epi-gamma-eudesmol among its constituents compared with the oil from northern and southern parts of India. The results of this work show that the combination of hyphenated chromatographic methods and resolution techniques provides a complementary method for accurate analysis of essential oils.

9.
Oilseeds with modified fatty acid profiles have been the genetic alternative for high quality vegetable oil for food and biodiesel applications. They can provide stable, functional oils for the food industry, without the hydrogenation process that produces trans-fatty acids, which have been linked to cardiovascular disease. High yield and high quality oilseeds are also necessary for the success of biodiesel programs, as polyunsaturated or saturated fatty acid oil produces biofuel with undesirable properties. In this paper, a rapid and automated low resolution NMR method to select intact oilseeds with a modified fatty acid profile is introduced, based on ¹H transverse relaxation time (T₂). The T₂-weighted NMR signal, obtained by a CPMG pulse sequence and processed by chemometric methods, was able to determine the oil quality in intact seeds by its fatty acid composition, cetane number, iodine value and kinematic viscosity with a correlation coefficient r > 0.9. The automated system has the potential to analyze more than 1000 samples per hour and is a powerful tool to speed up the selection of high quality oilseeds for food and biodiesel applications.

10.
In this paper, guidelines are presented for interpreting the results of quantitative structure-retention relationship (QSRR) modeling, for comparing and assessing the established models, and for selecting the best and most consistent QSRR model. Various linear and non-linear chemometric regression techniques were used to build QSRR models for chromatographic lipophilicity prediction of a series of triazole, tetrazole, toluenesulfonylhydrazide, nitrile, dinitrile and dione steroid derivatives. Linear regression (LR) and multiple linear regression (MLR) were used as linear techniques, while artificial neural networks (ANNs) were applied as non-linear modeling techniques. The generated models were statistically evaluated using different approaches for model comparison and ranking. Two non-parametric methods (the generalized pair correlation method, GPCM, and the sum of ranking differences, SRD) were used for model ranking and for assessing the best model for chromatographic lipophilicity prediction, using experimentally obtained log k values and the row average as reference rankings. Both GPCM and SRD yielded highly similar model choices despite their different backgrounds. These results are in agreement with the classical approach.
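The sum of ranking differences used here can be sketched in a few lines: each model's predictions are ranked, the reference column is ranked, and the absolute rank differences are summed, so that a smaller SRD means a model orders the compounds more like the reference. The function names and the dictionary layout are assumptions for illustration, and the sketch omits the validation steps (such as comparison with random rankings) that usually accompany SRD.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def sum_of_ranking_differences(models, reference=None):
    """SRD value for each model.

    models: dict mapping model name -> list of predicted values per compound.
    reference: reference values per compound; if None, the row average
               across all models is used.
    """
    names = list(models)
    n = len(models[names[0]])
    if reference is None:
        reference = [sum(models[name][i] for name in names) / len(names)
                     for i in range(n)]
    ref_ranks = average_ranks(reference)
    return {name: sum(abs(r - q)
                      for r, q in zip(average_ranks(models[name]), ref_ranks))
            for name in names}
```

A model that orders the compounds exactly as the reference does gets an SRD of 0, and a model that reverses the order gets the maximum possible value for that number of compounds.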

11.
A photochemically induced fluorescence system combined with second-order chemometric analysis for the determination of the anticonvulsant carbamazepine (CBZ) is presented. CBZ is a widely used drug for the treatment of epilepsy and is included in the group of emerging contaminants present in the aquatic environment. CBZ is not fluorescent in solution but can be converted into a fluorescent compound through a photochemical reaction in a strong acid medium. The determination is carried out by measuring excitation–emission photoinduced fluorescence matrices of the products formed upon ultraviolet light irradiation in a laboratory-constructed reactor consisting of two simple 4 W germicidal tubes. Working conditions related to both the reaction medium and the photoreactor geometry are optimized by an experimental design. The developed approach enabled the determination of CBZ at trace levels without the need for separation steps, and in the presence of uncalibrated interferences which also display photoinduced fluorescence and may be present in the investigated samples. Different second-order algorithms were tested and successful resolution was achieved using multivariate curve resolution-alternating least-squares (MCR-ALS). The study is used to discuss the scope and yield of each of the applied second-order chemometric tools. The quality of the proposed method is probed through the determination of the studied emerging pollutant in both environmental and drinking water samples. After a pre-concentration step on a C18 membrane using 50.0 mL of real water samples, a relative prediction error of 2% and limits of detection and quantification of 0.2 and 0.6 ng mL⁻¹, respectively, were obtained.

12.
A chemometric approach was applied for the optimization of the extraction and separation of the antihypertensive drug eprosartan from human plasma samples. The MultiSimplex program was used to optimize the HPLC-UV method due to the number of experimental and response variables to be studied. The measured responses were the corrected area, the separation of the eprosartan chromatographic peak from plasma interference peaks and the retention time of the analyte. The use of an Atlantis dC18, 100 mm × 3.9 mm i.d. chromatographic column with 0.026% trifluoroacetic acid (TFA) in the organic phase and 0.031% TFA in the aqueous phase, an initial composition of 80% aqueous phase in the mobile phase, an acetonitrile gradient steepness of 3% during the gradient elution mode with a flow rate of 1.25 mL/min and a column temperature of 35 ± 0.2 °C allowed the separation of eprosartan and irbesartan (used as internal standard) from plasma endogenous compounds. In the solid phase extraction procedure, experimental design was used in order to achieve a maximum recovery percentage. Firstly, the significant variables were chosen by way of a fractional factorial design; then, a central composite design was run to obtain the most adequate values of the significant variables. Thus, the extraction procedure for spiked human plasma samples was carried out using C8 cartridges, phosphate buffer pH 2 as conditioning agent, a drying step of 10 min, a washing step with methanol-phosphate buffer (20:80, v/v) and methanol as eluent. The developed SPE-HPLC-UV method allowed the separation and quantitation of eprosartan from human plasma samples with adequate resolution and a total analysis time of 1 h.

13.
The aim of this study was to develop a methodology using Raman hyperspectral imaging and chemometric methods for identification of pre- and post-blast explosive residues on banknote surfaces. The explosives studied were military, commercial and propellant types. After acquisition of the hyperspectral images, independent component analysis (ICA) was applied to extract the pure spectra and the distribution of the corresponding image constituents. The performance of the methodology was evaluated by the explained variance and the lack of fit of the models, by comparing the ICA recovered spectra with the reference spectra using correlation coefficients and by the presence of rotational ambiguity in the ICA solutions. The methodology was applied to forensic samples to solve an automated teller machine explosion case. Independent component analysis proved to be a suitable method of resolving curves, achieving performance equivalent to that of the multivariate curve resolution with alternating least squares (MCR-ALS) method. At low concentrations, MCR-ALS presents some limitations, as it did not provide the correct solution. The detection limit of the methodology presented in this study was 50 μg cm⁻².

14.
In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade-offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user-defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross-validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid-infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0–30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
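The idea of trading a bounded increase in RMSECV for a simpler model can be illustrated with a deliberately simplified rule: pick the smallest number of latent variables whose RMSECV stays within an allowed fractional increase over the minimum. This is not the penalty metric proposed in the abstract, whose exact form is not given there; the function name, the `allowed_increase` parameter and the example RMSECV values are assumptions made for illustration.

```python
def parsimonious_latent_variables(rmsecv, allowed_increase=0.1):
    """Smallest number of latent variables within a tolerated RMSECV increase.

    rmsecv: RMSECV values, where rmsecv[0] belongs to the 1-LV model,
            rmsecv[1] to the 2-LV model, and so on.
    allowed_increase: tolerated fractional increase over the minimum RMSECV
                      (0.1 means up to 10% worse than the best model).
    """
    if not rmsecv:
        raise ValueError("rmsecv must not be empty")
    limit = min(rmsecv) * (1.0 + allowed_increase)
    for n_latent, error in enumerate(rmsecv, start=1):
        if error <= limit:
            return n_latent


# Hypothetical cross-validation curve, for illustration only.
curve = [1.0, 0.6, 0.52, 0.5, 0.51]
```

With this hypothetical curve, an allowance of 0% reproduces the conventional minimum-RMSECV choice, and larger allowances select progressively simpler models.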

15.
This work evaluates the use of near-infrared (NIR) overtone regions to determine biodiesel content, as well as potential adulteration with vegetable oil, in diesel/biodiesel blends. For this purpose, NIR spectra (12,000–6300 cm⁻¹) were obtained using three different optical path lengths: 10 mm, 20 mm and 50 mm. Two strategies of regression with variable selection were evaluated: partial least squares (PLS) with significant regression coefficients selected by the Jack-Knife algorithm (PLS/JK) and multiple linear regression (MLR) with wavenumber selection by the successive projections algorithm (MLR/SPA). For comparison, the results obtained by using PLS full-spectrum models are also presented. In addition, the performance of models using NIR (1.0 mm optical path length, 9000–4000 cm⁻¹) and MIR (UATR, universal attenuated total reflectance; 4000–650 cm⁻¹) spectral regions was also investigated. The results demonstrated the potential of overtone regions with the MLR/SPA regression strategy to determine biodiesel content in diesel/biodiesel blends, considering the possible presence of raw oil as a contaminant. This strategy is simple, fast and uses fewer spectral variables. Considering this, the overtone regions can be useful for developing low cost instruments for quality control of diesel/biodiesel blends, given the lower cost of optical components for this spectral region.

16.
The principle of sequential injection (SI) was exploited to develop a fully automated pre-column derivatization procedure coupled on-line to liquid chromatography (LC). Using SI-LC derivatization, 14 amino acids were determined fluorimetrically in pharmaceuticals with o-phthaldialdehyde (OPA) as the derivatization reagent. The SI system was used for the handling of samples and reagents, on-line mixing and introduction to the LC injection system. Chemical variables (pH and reagent concentrations) and instrumental variables (sample and reagent volumes, reaction time and flow rate) were optimized to attain the highest reaction yield and detector signal. Reversed phase chromatographic resolution of 14 amino acids was achieved within 35 min using gradient elution. The automated operation of the coupled SI-LC system resulted in very satisfactory performance. The method was applied for the simultaneous determination of amino acids in pharmaceutical formulations.

17.
Total order ranking (TOR) strategies, which are mathematically based on elementary methods of discrete mathematics, seem to be attractive and simple tools for performing data analysis. Moreover, order-ranking strategies seem to be a very useful tool not only to perform data exploration but also to develop order ranking models, a possible alternative to conventional quantitative structure–activity relationship (QSAR) methods. In fact, when data material is characterised by uncertainties, order methods can be used as an alternative to statistical methods such as multilinear regression (MLR), because they do not require specific functional relationships between the independent and dependent variables (responses). A ranking model is a relationship between a set of dependent attributes, experimentally investigated, and a set of independent attributes, i.e. model attributes, which are calculated attributes. As in regression and classification models, variable selection is one of the main steps in finding predictive models. In this work the genetic algorithm–variable subset selection (GA–VSS) approach is proposed as the variable selection method for searching for the best ranking models within a wide set of variables. The models based on the selected subsets of variables are compared with the experimental ranking and evaluated by Spearman's rank index. A case study application is presented on a TOR model developed for polychlorinated biphenyl (PCB) compounds, which have been analysed according to some of their physicochemical properties which play an important role in their environmental impact.
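The comparison between a model ranking and the experimental ranking can be illustrated with Spearman's rank correlation coefficient. The sketch below uses the textbook formula for data without ties; the function name is an assumption, and the abstract does not say whether the authors' "Spearman's rank index" was computed exactly this way or how ties were treated.

```python
def spearman_rho(x, y):
    """Spearman rank correlation for two sequences without tied values.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference
    between the ranks of paired observations.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("tied values are not handled by this formula")

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))
```

A value of 1 means the model reproduces the experimental order exactly, and a value of -1 means it reverses it.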

18.
Magnetic beads have served as a conventional bioassay platform in biotechnology. In this study, a fully automated immunoassay was performed using novel nano- and microbead-composites constructed by assembling nano-magnetic beads onto polystyrene microbeads, designated ‘Beads on Beads’. Nano-sized bacterial magnetic particles (BacMPs) displaying the immunoglobulin G (IgG)-binding domain of protein A (ZZ domain) were used for the construction of ‘Beads on Beads’ via the interaction of biotin-streptavidin. The efficient assembly of ‘Beads on Beads’ was performed by gradual addition of biotin-labeled BacMPs onto streptavidin-coated polystyrene microbeads. Approximately 2000 BacMPs were uniformly assembled on a single microbead without aggregation. The constructed ‘Beads on Beads’ were magnetized and separated from the suspension by using an automated magnetic separation system with a higher efficiency than BacMPs alone. Furthermore, fully automated detection of prostate-specific antigens was performed with a detection limit of 1.48 ng mL⁻¹. From this preliminary assay, it can be seen that ‘Beads on Beads’ could be a powerful tool in the development of high-throughput, fully automated multiplexed bioassays.

19.
A new variable selection algorithm is described, based on ant colony optimization (ACO). The algorithm's aim is to choose, from a large number of available spectral wavelengths, those relevant to the estimation of analyte concentrations or sample properties when spectroscopic analysis is combined with multivariate calibration techniques such as partial least-squares (PLS) regression. The new algorithm employs the concept of cooperative pheromone accumulation, which is typical of ACO selection methods, and optimizes PLS models using a pre-defined number of variables, employing a Monte Carlo approach to discard irrelevant sensors. The performance has been tested on a simulated system, where it shows a significant superiority over other commonly employed selection methods, such as genetic algorithms. Several near infrared spectroscopic experimental data sets have been subjected to the present ACO algorithm, with PLS leading to improved analytical figures of merit upon wavelength selection. The method could be helpful in other chemometric activities such as classification or quantitative structure-activity relationship (QSAR) problems.

20.
The essential oils extracted from Coriandrum sativum L. were analyzed by GC-MS coupled with chemometric resolution methods. Through the chemometric resolution methods, peak clusters were uniquely resolved into the pure chromatographic profiles and mass spectra of each component. Qualitative analysis was performed by comparing the pure mass spectra with those in the NIST 05 mass spectral library. Quantitative analysis was performed using the total volume integration method. A total of 118 constituents were detected, of which 104 were identified, accounting for 97.27% of the total content. The results indicate that GC-MS combined with chemometric resolution methods can greatly enhance the capability of separation and the reliability of qualitative and quantitative results. The combined method is an economical and accurate approach for the rapid analysis of the complex essential oil samples in Coriandrum sativum L.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号