Similar literature
1.
Computational models to predict the developmental toxicity of compounds are built on imbalanced datasets in which toxicants outnumber non-toxicants. Consequently, the results are biased towards the majority class (toxicants). To overcome this problem and to obtain sensitive but also accurate classifiers, we followed an integrated approach in which (i) the Synthetic Minority Over-sampling Technique (SMOTE) is used for re-sampling, (ii) a genetic algorithm (GA) is used for variable selection and (iii) support vector machines (SVM) are used for model development. The best model, M3, has (i) sensitivity (SE) = 85.54% and specificity (SP) = 85.62% in leave-one-out validation, (ii) a training-set classification accuracy of 99.67%, (iii) a test-set classification accuracy of 92.59%, and (iv) sensitivity = 92.68% and specificity = 92.31% on the test set. Consensus prediction based on models M3–M5 improved these percentages by 5% over M3. From these results we infer that data imbalance in toxicity studies can be effectively addressed by applying re-sampling techniques.
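The SMOTE re-sampling step can be illustrated with a minimal pure-Python sketch, assuming numeric descriptor vectors and Euclidean distance. The function name `smote` and its parameters `n_new`, `k` and `seed` are hypothetical choices for illustration; the GA variable selection and SVM modelling stages are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class (x itself excluded)
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


# Toy example: three non-toxicants (minority class) in a 2-D descriptor space
non_toxic = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(non_toxic, n_new=4, k=2)
print(len(new_points))  # 4
```

Because new samples are interpolated rather than duplicated, the minority class is enlarged without simply repeating existing compounds, which is what lets the classifier trade some majority-class bias for sensitivity.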

2.
The variable predictive model based class discrimination (VPMCD) algorithm is proposed as an effective protein secondary structure classification tool. The algorithm mathematically represents the characteristic amino acid interactions specific to each protein structure and exploits them to distinguish between structures. The new concept and the VPMCD classifier are established using well-studied benchmark datasets containing four protein classes. The protein samples, selected from the SCOP and PDB databases with varying homology (25-100%) and a non-uniform distribution of class samples, pose a challenging classification problem. The performance of the new method is compared with advanced classification algorithms such as the component-coupled algorithm, SVM and neural networks. VPMCD provides superior performance for high-homology datasets: 100% classification is achieved in the self-consistency test, and prediction accuracy improves by 5% in the Jackknife test. The sensitivity of the new algorithm is investigated by varying the model structures/types and the sequence homology. The VPMCD algorithm, which is simpler to implement, is observed to be a robust classification technique and shows potential for effective extension to other clinical diagnosis and data mining applications in biological systems.
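The VPMCD idea (one set of inter-variable models per class, with a sample assigned to the class whose models predict its own variables best) can be sketched as below. This is a simplified toy, not the published algorithm: it assumes each variable is predicted from a single neighbouring variable by a linear least-squares model, whereas the original work varies model structures/types. The class name `SimpleVPMCD` and helper `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


class SimpleVPMCD:
    """Toy VPMCD-style classifier.

    Assumption: variable i is modelled only from variable (i + 1) mod p
    with a linear model, separately for every class.
    """

    def fit(self, X, y):
        self.models = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            p = len(rows[0])
            self.models[label] = [
                fit_line([r[(i + 1) % p] for r in rows], [r[i] for r in rows])
                for i in range(p)
            ]
        return self

    def predict_one(self, x):
        p = len(x)

        def sse(models):
            # Sum of squared errors between measured and model-predicted variables
            return sum((x[i] - (a + b * x[(i + 1) % p])) ** 2
                       for i, (a, b) in enumerate(models))

        # The class whose variable interactions best explain x wins
        return min(self.models, key=lambda label: sse(self.models[label]))


X = [[1, 1], [2, 2], [3, 3], [1, -1], [2, -2], [3, -3]]
y = ["A", "A", "A", "B", "B", "B"]
clf = SimpleVPMCD().fit(X, y)
print(clf.predict_one([4, 4]))   # A
print(clf.predict_one([4, -4]))  # B
```

Note that the new samples lie outside the training range; classification still works because it depends on the relationship between variables, not on their magnitudes.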

3.
Sankaran S, Ehsani R, Etxeberria E. Talanta, 2010, 83(2): 574-581
In recent years, Huanglongbing (HLB), also known as citrus greening, has severely affected citrus orchards in Florida. The disease has caused significant economic and production losses, with HLB management costing about $750/acre. Early and accurate detection of HLB is a critical management step in controlling its spread. This work focuses on the application of mid-infrared spectroscopy to the detection of HLB in citrus leaves. Leaf samples from healthy, nutrient-deficient, and HLB-infected trees were processed in two ways (process-1 and process-2) and analyzed using a rugged, portable mid-infrared spectrometer. Spectral absorbance data in the range 5.15-10.72 μm (1942-933 cm−1) were preprocessed (baseline correction, negative offset correction, and removal of the water absorbance band) and used for data analysis. First and second derivatives were calculated using the Savitzky-Golay method. The preprocessed raw dataset, the first-derivative dataset, and the second-derivative dataset were first analyzed by principal component analysis. The selected principal component scores were then classified using two algorithms, quadratic discriminant analysis (QDA) and k-nearest neighbor (kNN). When spectral data from leaf samples prepared by process-1 were used, the kNN-based algorithm yielded higher classification accuracies (especially for the nutrient-deficient class) than with the process-2 spectral data. The kNN-based algorithm (accuracy higher than 95%) performed better than the QDA-based algorithm. Moreover, among the dataset types, the preprocessed raw dataset gave higher classification accuracies than the first- and second-derivative datasets. The spectral peak in the region 9.0-10.5 μm (1112-952 cm−1) was found to differ distinctly between healthy and HLB-infected leaf samples. This carbohydrate peak could be attributed to starch accumulation in HLB-infected citrus leaves. Thus, this study demonstrates the applicability of mid-infrared spectroscopy for HLB detection in citrus.
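Two of the processing steps above, the Savitzky-Golay derivative and kNN classification, can be sketched in pure Python. This assumes evenly spaced spectral points, a 5-point window with a quadratic fit, Euclidean distance and k = 3; these settings are illustrative assumptions, not the study's reported parameters, and the intermediate PCA step is omitted (the kNN inputs would normally be the selected PC scores).

```python
import math
from collections import Counter

# 5-point Savitzky-Golay first-derivative coefficients (quadratic fit),
# with normalisation factor 10; divide by the spectral step h for physical units.
SG5_D1 = (-2, -1, 0, 1, 2)


def sg_first_derivative(spectrum, h=1.0):
    """First derivative of an evenly spaced spectrum; two points are lost at each edge."""
    return [
        sum(c * spectrum[i + j - 2] for j, c in enumerate(SG5_D1)) / (10.0 * h)
        for i in range(2, len(spectrum) - 2)
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# The derivative of t^2 sampled at t = 0..5 is 2t; SG is exact for quadratics
print(sg_first_derivative([0, 1, 4, 9, 16, 25]))  # [4.0, 6.0]

train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["healthy", "healthy", "HLB", "HLB"]
print(knn_predict(train_X, train_y, [5.0, 6.0], k=3))  # HLB
```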

4.
Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm, which combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm in order to integrate the advantages of both. The proposed algorithm is applied to microarray gene expression profiles to select the most predictive and informative genes for cancer classification. To test its accuracy, extensive experiments were conducted on three binary microarray datasets (colon, leukemia, and lung) and three multi-class microarray datasets (SRBCT, lymphoma, and leukemia). The results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared it with the combinations of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This shows that the GBC algorithm is a promising approach to the gene selection problem in both binary and multi-class cancer classification.
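The abstract does not describe how GBC interleaves the two algorithms, so the sketch below only shows generic building blocks such a hybrid could use on binary gene masks (1 = gene selected): textbook GA one-point crossover and bit-flip mutation, plus an ABC-style employed-bee step that flips one bit and keeps the neighbour only if it is no worse. The fitness function `toy_fitness` is purely hypothetical; in practice it would be a classifier's accuracy, possibly penalised by subset size.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """GA step: swap the tails of two binary gene masks at a random cut point."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def bit_flip_mutation(mask, rate, rng):
    """GA step: flip each gene-inclusion bit with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in mask]


def employed_bee_step(mask, fitness, rng):
    """ABC-style step: flip one random bit; keep the neighbour only if it is no worse."""
    neighbour = list(mask)
    i = rng.randrange(len(mask))
    neighbour[i] = 1 - neighbour[i]
    return neighbour if fitness(neighbour) >= fitness(mask) else mask


def toy_fitness(mask):
    """Hypothetical fitness: genes 0 and 3 are informative; larger subsets are penalised."""
    return 2 * (mask[0] + mask[3]) - 0.1 * sum(mask)


rng = random.Random(42)
a, b = [1, 0, 0, 0, 1, 0], [0, 1, 1, 1, 0, 0]
child1, child2 = one_point_crossover(a, b, rng)
child1 = bit_flip_mutation(child1, rate=0.1, rng=rng)
best = employed_bee_step(child1, toy_fitness, rng)
print(best, toy_fitness(best))
```

The greedy acceptance rule in `employed_bee_step` never lowers fitness, which is the exploitation side of ABC; the GA operators supply exploration by recombining whole subsets.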

5.
This paper proposes a new hybrid search technique for feature (gene) selection (FS) that combines Independent Component Analysis (ICA) and Artificial Bee Colony (ABC), called ICA + ABC, to select informative genes for a Naïve Bayes (NB) classifier. An important trait of this technique is that ABC is used to optimize the ICA feature vector. ICA + ABC is a hybrid search algorithm that combines the benefits of an extraction approach, which reduces the size of the data, and a wrapper approach, which optimizes the reduced feature vectors. The performance of ICA + ABC is evaluated on six standard gene expression classification datasets. Extensive experiments were conducted to compare ICA + ABC with the recently published Minimum Redundancy Maximum Relevance (mRMR) + ABC algorithm for the NB classifier. To further check how well ICA + ABC works as a feature selector with the NB classifier, we also compared ICA combined with popular filter techniques and with other similar bio-inspired algorithms such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The results show that ICA + ABC can generate small subsets of genes from the ICA feature vector that significantly improve the classification accuracy of the NB classifier compared with other previously suggested methods.
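The NB classifier that scores candidate gene subsets can be sketched as a Gaussian Naïve Bayes model. The Gaussian class-conditional likelihood and the small variance floor `eps` are assumptions (the abstract does not state which NB variant was used), and the ICA extraction and ABC search are not shown.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate class priors and per-feature Gaussian means/variances."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest log-posterior (features treated as independent)."""
    def log_posterior(prior, means, variances):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=lambda label: log_posterior(*model[label]))


# Toy expression values for two selected genes
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
y = ["normal"] * 3 + ["tumour"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # normal
print(predict_gaussian_nb(model, [4.1, 4.9]))  # tumour
```

Working in log space avoids underflow when many genes are multiplied together, which matters for microarray data.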

6.
Locally linear embedding (LLE) is introduced here as a nonlinear compression method for near-infrared reflectance spectra of endometrial tissue sections. LLE was evaluated using support vector machine (SVM) classifiers and the projected difference resolution (PDR) method. Synthetic data sets designed to resemble near-infrared spectra of tissue samples were used to characterize the performance of LLE. LLE was compared with the principal component compression (PCC) method to contrast nonlinear and linear compression. For a set of real tissue samples, if the compressed data were not range-scaled prior to SVM classification, the principal-component-compressed data gave an average prediction rate of 39 ± 2%, whereas LLE gave 94 ± 2%; if the data were range-scaled after compression, LLE and PCC performed comparably, with maximum average prediction rates of 94 ± 2% and 93 ± 2%, respectively. SVM without compression yielded a classification rate of 92 ± 2%. The prediction accuracies were consistent with the PDR results. Without second-derivative preprocessing, the classification rates were 90 ± 3%, 89 ± 2%, and 78 ± 2% for LLE-compressed, PCC-compressed, and uncompressed SVM classification, respectively.
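The large effect of range scaling on the PCC results is plausible because principal component scores have very different magnitudes, so the first components dominate the SVM kernel distances. A minimal min-max range-scaling sketch is shown below; mapping each column to [0, 1] using training-set limits is an assumption, since the abstract does not give the scaling range.

```python
def fit_range_scaler(rows):
    """Per-column minimum and maximum of the training (compressed) data."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def range_scale(rows, lows, highs):
    """Map each column linearly so that the training range becomes [0, 1]."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


# e.g. compressed scores on very different scales
scores = [[120.0, 0.01], [80.0, 0.03], [100.0, 0.02]]
lows, highs = fit_range_scaler(scores)
scaled = range_scale(scores, lows, highs)
print(scaled)
```

The limits are fitted on the training set only and then reused for test spectra, so no information from the test set leaks into the classifier.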

7.
A novel application of a second-order calibration method based on the alternating penalty trilinear decomposition (APTLD) algorithm is presented for treating data from high performance liquid chromatography with diode array detection (HPLC-DAD). The method allows accurate and reliable determination of atrazine (ATR), ametryn (AME) and prometryne (PRO) in soil, river sediment and wastewater samples. Satisfactory results are obtained even though the elution and spectral profiles of the analytes overlap heavily with the background of the environmental samples. The average recoveries for ATR, AME and PRO are 99.7 ± 1.5, 98.4 ± 4.7 and 97.0 ± 4.4% in soil samples; 100.1 ± 3.2, 100.7 ± 3.4 and 96.4 ± 3.8% in river sediment samples; and 100.1 ± 3.5, 101.8 ± 4.2 and 101.4 ± 3.6% in wastewater samples, respectively. Furthermore, the accuracy and precision of the proposed method are evaluated with the elliptical joint confidence region (EJCR) test. The method opens a new avenue for the quantitative determination of herbicides in environmental samples with a simple pretreatment procedure, and provides a scientific basis for improved environmental management through a better understanding of the wastewater-soil-river sediment system as a whole.
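Recoveries such as those above are conventionally reported as the mean ± standard deviation of 100 × found/added over spiked samples. The sketch below assumes that convention and the sample (n − 1) standard deviation; the spiking data are hypothetical, not values from the study.

```python
import statistics


def recoveries(added, found):
    """Percent recovery for each spiked sample: 100 * found / added."""
    return [100.0 * f / a for a, f in zip(added, found)]


# Hypothetical spiked-sample data (same concentration units for both lists)
added = [0.50, 1.00, 2.00]
found = [0.49, 1.01, 2.02]
r = recoveries(added, found)
print(f"{statistics.mean(r):.1f} ± {statistics.stdev(r):.1f} %")  # 100.0 ± 1.7 %
```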

8.
Dispersive liquid-liquid microextraction (DLLME) combined with high performance liquid chromatography-inductively coupled plasma mass spectrometry is described for the speciation of mercury in water samples. First, methylmercury (MeHg+) and mercury (Hg2+) were complexed with sodium diethyldithiocarbamate, and the complexes were then extracted into carbon tetrachloride by DLLME. Under the optimized conditions, enrichment factors of 138 and 350 were obtained for MeHg+ and Hg2+, respectively, from only 5.00 mL of sample solution. The detection limits (as Hg) were 0.0076 ng mL−1 for MeHg+ and 0.0014 ng mL−1 for Hg2+. The relative standard deviations for ten replicate measurements of 0.5 ng mL−1 MeHg+ and Hg2+ were 6.9% and 4.4%, respectively. A seawater standard reference material (GBW(E)080042) was analyzed to verify the accuracy of the method, and the results were in good agreement with the certified values. Finally, the developed method was successfully applied to the speciation of mercury in three environmental water samples.
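The two figures of merit quoted above can be computed as follows. This sketch assumes two common conventions that the abstract does not spell out: the enrichment factor as the ratio of calibration slopes with and without DLLME, and the detection limit as 3 × (standard deviation of blank signals) / slope. The numbers are hypothetical.

```python
import statistics


def enrichment_factor(slope_with_dllme, slope_without):
    """EF as the ratio of calibration slopes with and without preconcentration."""
    return slope_with_dllme / slope_without


def detection_limit(blank_signals, slope, k=3.0):
    """LOD = k * s_blank / slope (3-sigma criterion)."""
    return k * statistics.stdev(blank_signals) / slope


# Hypothetical calibration slopes (signal per ng/mL) and blank replicates
print(enrichment_factor(1380.0, 10.0))  # 138.0
blanks = [10, 12, 11, 13, 9, 11, 10, 12, 11, 11]
lod = detection_limit(blanks, slope=1380.0)
print(f"{lod:.5f} ng/mL")
```

Using the slope obtained after DLLME in the LOD formula is what links the two: a higher enrichment factor lowers the detection limit for the same blank noise.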

9.
Li YN, Wu HL, Qing XD, Nie CC, Li SF, Yu YJ, Zhang SR, Yu RQ. Talanta, 2011, 85(1): 325-332
A rapid non-separative spectrofluorometric method based on second-order calibration of excitation-emission matrix (EEM) fluorescence was proposed for the determination of napropamide (NAP) in soil, river sediment, wastewater and river water samples. In 0.10 mol L−1 sodium citrate-hydrochloric acid (HCl) buffer solution at pH 2.2, the fluorescence intensity of NAP increases greatly. To handle the intrinsic fluorescence interferences of environmental samples, the alternating penalty trilinear decomposition (APTLD) algorithm was employed as an efficient second-order calibration method. Satisfactory results were achieved for NAP in complex environmental samples. The limits of detection for NAP in soil, river sediment, wastewater and river water samples were 0.80, 0.24, 0.12 and 0.071 ng mL−1, respectively. Furthermore, to fully investigate the performance of the second-order calibration method, we tested it with different calibration approaches based on the APTLD algorithm, including the single matrix model, the intra-day various matrices model and the global model, using natural environmental datasets. The results showed that second-order calibration also enables one or more analytes of interest to be determined simultaneously in samples with various types of matrices, and that the second-order advantage is maintained in simultaneous determinations of the analytes of interest in environmental samples of various matrices.
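APTLD belongs to the family of trilinear (PARAFAC-type) decompositions, which model a stack of EEMs as X[i][j][k] = Σₙ A[i][n]·B[j][n]·C[k][n], with excitation profiles A, emission profiles B and per-sample relative concentrations C. The sketch below only builds such a cube from known profiles to show the structure; the APTLD fitting itself is not shown, and all profiles are toy values. Because an uncalibrated interferent simply becomes an extra component n, the analyte can still be quantified in its presence, which is the "second-order advantage".

```python
def trilinear_cube(A, B, C):
    """Build X[i][j][k] = sum_n A[i][n] * B[j][n] * C[k][n].

    A: excitation profiles (I x N), B: emission profiles (J x N),
    C: relative concentrations per sample (K x N).
    """
    N = len(A[0])
    return [[[sum(A[i][n] * B[j][n] * C[k][n] for n in range(N))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Two components: the analyte and one background interferent (toy profiles)
A = [[1.0, 0.2], [0.5, 1.0]]              # 2 excitation wavelengths
B = [[0.8, 0.1], [0.3, 0.9], [0.1, 0.4]]  # 3 emission wavelengths
C = [[1.0, 0.0], [2.0, 0.5]]              # sample 1: analyte only; sample 2: both
X = trilinear_cube(A, B, C)
print(X[0][0][1])  # analyte and interferent contributions add up
```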

10.
Narcise CI, Coo LD, Del Mundo FR. Talanta, 2005, 68(2): 298-304
A flow injection-column preconcentration-hydride generation atomic absorption spectrophotometric (FI-column-HGAAS) method was developed for determining μg/l levels of As(III) and As(V) in water samples, with simultaneous preconcentration and speciation. The speciation scheme involved determining As(V) at neutral pH and As(III + V) at pH 12, with As(III) obtained by difference. The enrichment factor (EF) increased with sample loading volume from 2.5 to 10 ml. For preconcentration on the chloride-form anion exchange column, EFs ranged from 5 to 48 for As(V) and from 4 to 24 for As(III + V), with corresponding detection limits of 0.03-0.3 and 0.07-0.3 μg/l. The linear concentration range (LCR) also varied with sample loading volume; for a 5-ml sample it was 0.3-5 and 0.2-8 μg/l for As(V) and As(III + V), respectively. Sample throughput, which decreased with increasing sample volume, was 8-17 samples/h. For the hydroxide-form column, the EFs for 2.5-10 ml samples were 3-23 for As(V) and 2-15 for As(III + V), with corresponding detection limits of 0.07-0.4 and 0.1-0.5 μg/l. The LCR for a 5-ml sample was 0.3-10 μg/l for As(V) and 0.2-20 μg/l for As(III + V). Sample throughput was 10-20 samples/h. The developed method has been effectively applied to tap water and mineral water samples, with recoveries ranging from 90 to 102% for 5-ml samples passed through the two columns.
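The speciation-by-difference scheme reduces to one subtraction. In this sketch, `arsenic_species` is a hypothetical helper, and clamping a slightly negative difference (possible when the two measurements agree within their errors) to zero is an assumed reporting choice, not something stated in the abstract.

```python
def arsenic_species(total_as_ph12, as5_neutral):
    """As(III) by difference: [As(III)] = [As(III + V)] - [As(V)].

    total_as_ph12: As(III + V) measured at pH 12 (μg/l)
    as5_neutral:   As(V) measured at neutral pH (μg/l)
    """
    as3 = total_as_ph12 - as5_neutral
    if as3 < 0:
        # Assumption: a negative difference within measurement error is reported as zero
        as3 = 0.0
    return {"As(III)": as3, "As(V)": as5_neutral}


print(arsenic_species(3.5, 1.25))  # {'As(III)': 2.25, 'As(V)': 1.25}
```

Since As(III) is derived from two measurements, its uncertainty combines the uncertainties of both, so it is usually larger than that of either directly measured quantity.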

11.
Sensitive detection of tetrabromobisphenol A (TBBPA) and its derivatives, a group of emerging toxic contaminants, is highly needed in environmental investigations. Herein, a novel analytical strategy based on reactive extractive electrospray ionization (EESI) tandem mass spectrometry was developed for the detection of tetrabromobisphenol A bis(2-hydroxyethyl ether) (TBBPA-BHEE), tetrabromobisphenol A bis(glycidyl ether) (TBBPA-BGE), tetrabromobisphenol A bis(allylether) (TBBPA-BAE), and tetrabromobisphenol S bis(allylether) (TBBPS-BAE) in industrial waste water samples. Active silver cations (Ag+), generated by electrospraying a silver nitrate methanol solution (10 mg L−1), collide with the neutral TBBPA derivative molecules in the EESI source to form [M + Ag]+ complexes of the analytes under ambient conditions. Upon collision-induced dissociation (CID), characteristic fragments of the [M + Ag]+ complexes were identified for confident and sensitive detection of the four TBBPA derivatives. Under the optimized experimental conditions, the instrumental limits of detection (LODs) of TBBPA-BHEE, TBBPA-BGE, TBBPA-BAE and TBBPS-BAE were 0.37, 0.050, 0.76, and 4.6 μg L−1, respectively. The linear ranges extended to 1000 μg L−1 (R2 ≥ 0.9919), and the relative standard deviations (RSDs), inter-day variation and intra-day variation were less than 7.8% (n = 9), 10.0% (n = 5), and 14.8% (n = 1 per day for 5 days), respectively, for all derivatives. Waste water from TBBPA derivative manufacturing, river water and tap water samples were rapidly analyzed with the proposed method. The contents of TBBPA derivatives varied among the collected samples, the highest being 19.9 ± 0.3 μg L−1 of TBBPA-BAE in the waste water samples.
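The linearity (R²) and precision (RSD) figures above follow standard definitions, sketched here with an ordinary least-squares calibration line and the sample standard deviation. The concentration/signal pairs and replicates are hypothetical.

```python
import statistics


def linear_calibration(conc, signal):
    """Least-squares line signal = slope*conc + intercept, with R^2."""
    n = len(conc)
    mx, my = sum(conc) / n, sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, signal))
    ss_tot = sum((y - my) ** 2 for y in signal)
    return slope, intercept, 1.0 - ss_res / ss_tot


def rsd_percent(replicates):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(replicates) / statistics.fmean(replicates)


# Hypothetical calibration up to 1000 μg/L and three replicate injections
slope, intercept, r2 = linear_calibration([1.0, 10.0, 100.0, 1000.0],
                                          [2.1, 20.5, 199.0, 2003.0])
print(f"slope={slope:.4f}, intercept={intercept:.3f}, R2={r2:.5f}")
print(rsd_percent([100.0, 104.0, 96.0]))  # 4.0
```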

12.
This paper presents two methodologies for monitoring the service condition of diesel-engine lubricating oils on the basis of infrared spectra. In the first approach, oil samples are discriminated into three groups, each associated with a given wear stage. An algorithm is proposed to select spectral variables with good discriminant power and small collinearity for discriminant analysis classification. As a result, a classification accuracy of 93% was obtained in both the mid-infrared (MIR) and near-infrared (NIR) ranges. The second approach employs multivariate calibration methods to predict the viscosity of the lubricant. In this case, absorbance measurements in the NIR range were not successful because of experimental difficulties associated with the presence of particulate matter. This problem was circumvented by using attenuated total reflectance (ATR) measurements in the MIR range, for which an RMSEP of 3.8 cSt and a relative average error of 3.2% were attained.
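The abstract does not specify the variable-selection algorithm, so the following is only a hypothetical illustration of the stated goal (good discriminant power, small collinearity). It ranks spectral variables by a Fisher ratio (between-class over within-class variance) and greedily keeps a variable only if its absolute Pearson correlation with every already-chosen variable is below a threshold. The threshold of 0.9 and all function names are assumptions.

```python
import statistics


def fisher_ratio(values, labels):
    """Between-class over within-class variance for one spectral variable."""
    overall = statistics.fmean(values)
    groups = {}
    for v, t in zip(values, labels):
        groups.setdefault(t, []).append(v)
    between = sum(len(g) * (statistics.fmean(g) - overall) ** 2 for g in groups.values())
    within = sum((v - statistics.fmean(g)) ** 2 for g in groups.values() for v in g)
    return between / within if within else float("inf")


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant column."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5 if saa and sbb else 0.0


def select_variables(X, labels, n_vars, max_abs_corr=0.9):
    """Greedy pick by Fisher ratio, skipping variables collinear with those already chosen."""
    columns = list(zip(*X))
    ranked = sorted(range(len(columns)),
                    key=lambda j: fisher_ratio(columns[j], labels), reverse=True)
    chosen = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[k])) < max_abs_corr for k in chosen):
            chosen.append(j)
        if len(chosen) == n_vars:
            break
    return chosen


# Variable 1 is an exact linear function of variable 0, so only one of them is kept
stages = ["low", "low", "mid", "mid", "high", "high"]
X = [[1.0, 3.0, 0.0], [1.1, 3.2, 1.0], [2.0, 5.0, 0.5],
     [2.1, 5.2, 0.2], [3.0, 7.0, 1.0], [3.1, 7.2, 0.3]]
chosen = select_variables(X, stages, n_vars=2)
print(chosen)
```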

13.
14.
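To address the poor generality of calibration models in near-infrared (NIR) spectroscopic analysis, a new model transfer method is proposed: least angle regression combined with simple linear regression direct standardization (LAR-SLRDS). The method first preprocesses the sample spectra by wavelet transform, then uses LAR to select characteristic wavelength points over the full spectral range, and finally uses SLRDS to correct the selected variables. The performance of LAR-SLRDS was validated on NIR spectra of gasoline and pharmaceutical samples. For the C7, C8, C9 and C10 components of the gasoline dataset, the spectral differences were 0.0028, 0.0027, 0.0026 and 0.0027, and the standard deviations of prediction were 0.4106, 0.8492, 1.0349 and 1.2158. For the active ingredient, hardness and weight of the pharmaceutical dataset, the spectral differences were 0.0300, 0.0318 and 0.0336, and the standard deviations of prediction were 1.9338, 0.4402 and 2.1309. The results show that the LAR-SLRDS algorithm not only eliminates the spectral differences between the master and slave instruments, thereby enabling model transfer, but also improves the accuracy and stability of the PLS quantitative model, indicating broad application potential.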
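Based on the method's name, the SLRDS step can be read as fitting, for each LAR-selected wavelength w, a simple linear regression master[w] ≈ a_w + b_w · slave[w] on transfer samples measured on both instruments, and then applying it to new slave spectra. The sketch below follows that reading. The wavelet preprocessing and LAR selection are not shown (the `selected` indices are simply assumed), and the function names are hypothetical.

```python
def fit_slrds(master, slave, selected):
    """Per selected wavelength w, fit master[:, w] = a_w + b_w * slave[:, w] by least squares."""
    coeffs = {}
    for w in selected:
        xs = [row[w] for row in slave]
        ys = [row[w] for row in master]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        coeffs[w] = (my - b * mx, b)
    return coeffs


def apply_slrds(spectrum, coeffs):
    """Correct a slave spectrum at the selected wavelengths (in selection order)."""
    return [a + b * spectrum[w] for w, (a, b) in coeffs.items()]


# Transfer samples measured on both instruments (toy absorbances)
master = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0]]
slave = [[0.5, 1.0, 1.0], [1.0, 2.0, 2.0]]
coeffs = fit_slrds(master, slave, selected=[0, 2])  # indices assumed to come from LAR
print(apply_slrds([0.75, 9.9, 1.5], coeffs))  # [1.5, 4.0]
```

Restricting the correction to the selected wavelengths keeps the number of fitted coefficients small, which matters because transfer sets usually contain few samples.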

15.
A systematic investigation of the performance of the Cubic-Plus-Association (CPA) model for solid–liquid equilibria (SLE) in binary mixtures (methane + ethane, methane + heptane, methane + benzene, methane + CO2, ethane + heptane, ethane + CO2, 1-propanol + 1,4-dioxane, ethanol + water, 2-propanol + water) is presented. The results for the binary mixtures are used to predict SLE behaviour in ternary mixtures (methane + ethane + heptane, methane + ethane + CO2). Our results are compared with experimental data from the literature.
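For reference, SLE calculations with an equation of state such as CPA are commonly based on the following textbook equilibrium condition for a pure solid component i. It is written here under the usual simplifications (the solid phase is pure i; heat-capacity and pressure corrections are neglected), so it is not necessarily the exact formulation used in this study. Here x_i is the liquid mole fraction, φ_i^L the fugacity coefficient of i in the liquid mixture and φ_i^{0,L} that of pure (subcooled) liquid i, both from the equation of state, ΔH_fus,i the enthalpy of fusion and T_m,i the melting temperature:

```latex
x_i \, \varphi_i^{L}(T, P, \mathbf{x}) =
  \varphi_i^{0,L}(T, P)\,
  \exp\!\left[-\frac{\Delta H_{\mathrm{fus},i}}{R}
  \left(\frac{1}{T} - \frac{1}{T_{m,i}}\right)\right]
```

Below the melting point the exponential is smaller than one, so in an ideal liquid (φ_i^L = φ_i^{0,L}) the solubility x_i is below unity; the association term of CPA enters through the fugacity coefficients, which is where the model's treatment of water and alcohols matters.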

16.
Mariet C, Belhadj O, Leroy S, Carrot F, Métrich N. Talanta, 2008, 77(1): 445-450
To implement a simpler, less expensive and safer sample dissolution procedure, we replaced the HF-HClO4 mixture with NH4F. Tests on three certified reference materials (lichen 336, basalt BE-N and soil 7) showed that a three-reagent digestion without HF and HClO4 (HNO3 + H2O2 + NH4F) was very effective as a pretreatment for ICP-MS measurement. The comparison was based on the measurement results and their uncertainties. All three are reference materials for the contents of various trace elements. The accuracy and precision of the developed method were tested by replicate analyses of reference samples with established element contents. The accuracy of the data and the detection limits (LODs) vary among elements but are usually very good (accuracy better than 8%, LODs usually below 1 μg/g in solids). ICP-MS capabilities enable us to routinely determine 13 and 16 minor and trace elements in basalt and soil.

17.
Rapid pyrolysis of six biomass/coal blends (1:4, wt), namely rice straw + bituminous coal (RS + B), rice straw + anthracite (RS + A), chinar leaves + bituminous coal (CL + B), chinar leaves + anthracite (CL + A), pine sawdust + bituminous coal (PS + B), and pine sawdust + anthracite (PS + A), was carried out at 600-1200 °C in a furnace based on a high-frequency magnetic field. The reactor not only achieves high heating rates of the fuel samples but also ensures good contact between biomass and coal particles, and secondary reactions of primary products during rapid pyrolysis are efficiently reduced. Nitrogen conversion characteristics under rapid pyrolysis of biomass/coal blends were investigated by comparing the nitrogen distributions in the products of the blends (experimental values) with the sums for the individual biomass and coal (weighted values). The results show that, during rapid pyrolysis of biomass/anthracite blends, the biomass particles lead to experimental char-N yields higher than the weighted values. The reason may be the decreased heating rates of both biomass and coal particles caused by the low packing density of biomass. For CL + B, in which the packing density of chinar leaves is high, and for PS + B, in which the pine sawdust melts and shrinks during pyrolysis, both biomass and coal particles can attain high heating rates, and synergies that promote nitrogen release from the fuel samples and decrease char-N yields are found under all conditions. However, the low fluidity and hard-to-collapse carbon skeletons of rice straw make the heating rates of the rice straw and bituminous particles in RS + B lower than those in CL + B and PS + B, so weaker synergies are found in the char-N yields of RS + B. Synergies clearly decrease the (NH3 + HCN)-N yields and convert more nitrogen to N2, except under several low-temperature conditions (600-700 °C). Under these low-temperature (600-700 °C) conditions, synergies make the molar ratios of HCN-N/NH3-N higher than the weighted values.
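The comparison between experimental and weighted values amounts to an additive (no-interaction) prediction for the blend; a synergy shows up as a non-zero difference. This sketch weights by mass fraction (0.2/0.8 for a 1:4 blend) by default. The optional weighting by each fuel's share of the blend's nitrogen is an assumption offered because yields expressed as a percentage of fuel-N would call for it; the abstract does not say which weighting was used. All numbers are hypothetical.

```python
def weighted_value(biomass_value, coal_value, biomass_mass_frac=0.2,
                   n_biomass=None, n_coal=None):
    """Additive prediction for a biomass/coal blend (no interaction between fuels).

    Default: weight by mass fraction (1:4 blend -> 0.2 / 0.8).
    If nitrogen contents (wt%) are given, weight by each fuel's share of blend fuel-N.
    """
    wb, wc = biomass_mass_frac, 1.0 - biomass_mass_frac
    if n_biomass is not None and n_coal is not None:
        nb, nc = wb * n_biomass, wc * n_coal
        wb, wc = nb / (nb + nc), nc / (nb + nc)
    return wb * biomass_value + wc * coal_value


def synergy(experimental, biomass_value, coal_value, **kwargs):
    """Experimental minus weighted value; negative means the blend yield is lowered."""
    return experimental - weighted_value(biomass_value, coal_value, **kwargs)


# Hypothetical char-N yields (% of fuel-N): biomass alone 30, coal alone 60, blend 48
print(weighted_value(30.0, 60.0))   # mass-weighted prediction
print(synergy(48.0, 30.0, 60.0))    # negative: synergy decreases char-N
```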

18.
A simple and sensitive method with a fast sample preparation procedure is proposed for the determination of mercury species in plasma/serum. The method combines online high-performance liquid chromatography separation, Hg cold-vapor generation and inductively coupled plasma mass spectrometry detection. Prior to analysis, plasma (250 μL) was accurately pipetted into 15 mL conical tubes. An extractant solution containing mercaptoethanol, L-cysteine and HCl was then added to the samples, followed by sonication for 10 min. Quantitative mercury extraction was achieved with this procedure. Separation of the mercury species was accomplished in less than 8 min on a C8 reversed-phase column with a mobile phase of 3% v/v methanol + 97% v/v (0.5% v/v 2-mercaptoethanol + 0.05% v/v formic acid). The method detection limits were 12 ng L−1, 5 ng L−1 and 4 ng L−1 for inorganic mercury, ethylmercury and methylmercury, respectively. Method accuracy is traceable to Standard Reference Material (SRM) 966 Toxic Metals in Bovine Blood from NIST. Additional validation was provided by the analysis of a secondary reference serum sample from the INSQ-Canada. Finally, the method was successfully applied to the speciation of mercury in plasma samples collected from volunteers exposed to methylmercury through fish consumption. To our knowledge, this is the first time that levels of different Hg species have been determined in plasma samples from riverside populations exposed to MeHg.

19.
In this work, the reaction O(1D) + H2 → OH + H has been studied theoretically using the quasiclassical trajectory (QCT) method developed by Han and co-workers. All QCT calculations are performed on the DK (Dobbyn and Knowles) potential energy surface (PES). Vector correlation information for the reaction O(1D) + H2 → OH + H has been obtained. The product alignment is shown to be sensitive to the reactant vibrational quantum number (v) at a collision energy of 19 kcal/mol. Moreover, as v increases, backward scattering becomes weaker and forward scattering becomes stronger.
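A standard scalar measure of product rotational alignment in QCT studies is the average second Legendre polynomial ⟨P₂(j′·k)⟩ = ⟨(3cos²θ − 1)/2⟩, where θ is the angle between the product rotational angular momentum j′ and the reagent relative velocity k. The sketch below averages it over per-trajectory values of cos θ; the input lists are illustrative, not trajectory results.

```python
def alignment_parameter(cos_thetas):
    """<P2(j'.k)>: average of (3 cos^2(theta) - 1) / 2 over trajectories.

    Limits: -0.5 when every j' is perpendicular to k, 1.0 when every j'
    is parallel or antiparallel to k.
    """
    return sum(1.5 * c * c - 0.5 for c in cos_thetas) / len(cos_thetas)


print(alignment_parameter([0.0, 0.0, 0.0]))  # -0.5 (perpendicular alignment)
print(alignment_parameter([1.0, -1.0]))      # 1.0 (parallel/antiparallel)
```

Because P₂ is even in cos θ, this parameter measures alignment only; distinguishing parallel from antiparallel j′ (orientation) needs odd-order moments.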

20.
Femtosecond laser-induced breakdown spectroscopy (fs-LIBS) has been used for the first time for the quantitative determination of nutrients in plant materials from different crops. A highly heterogeneous population of 31 samples covering a wide range of matrices, previously analyzed by inductively coupled plasma optical emission spectroscopy, was interrogated. To this end, laser-induced plasmas of pellets prepared from sieved, cryogenically ground leaves were studied under an argon atmosphere. Predictive functions based on univariate and multivariate modeling of optical emissions associated with macronutrients (Ca, Mg, and P) and micronutrients (Cu, Fe, Mn and Zn) were designed. Hierarchical cluster analysis was performed to select representative calibration (ncal = 17) and validation (nval = 14) datasets. The predictive performance of calibration functions on fs-LIBS data was compared with that attained on spectral information from nanosecond LIBS (ns-LIBS) operating at different wavelengths (1064 nm, 532 nm, and 266 nm). fs-LIBS provided higher accuracy and lower uncertainty in the quantification of mass fractions, whatever the modeling approach. Quality coefficients below 20% for the accuracy error of mass fraction predictions in unknown samples, and residual predictive deviations generally above 5, were obtained. In contrast, only multivariate modeling satisfactorily handled the non-linear variations of emissions in ns-LIBS, leading to a 2-fold decrease in the root mean square error of prediction (RMSEP) for Ca, Mg, P, Cu, Fe, Mn and Zn compared with the univariate approach. Even so, an average quality coefficient of about 35% and residual predictive deviations below 3 were found. Similar predictive capabilities were observed when the laser wavelength was changed. Although values predicted by ns-LIBS multivariate modeling agree better with reference mass fractions than those from univariate functions, fs-LIBS provides better quantification of nutrients in plant materials because it is less dependent on the chemical composition of the matrices.
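The RMSEP and residual predictive deviation (RPD) figures can be computed as below, assuming the common definition RPD = standard deviation of the reference values / RMSEP (some authors divide by the bias-corrected SEP instead). The "quality coefficient" is not implemented because the abstract does not define it. The data are hypothetical.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference))


def rpd(reference, predicted):
    """Residual predictive deviation: sample SD of reference values divided by RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)


# Hypothetical reference vs. predicted mass fractions for a validation set
ref = [10.0, 20.0, 30.0, 40.0]
pred = [11.0, 19.0, 31.0, 39.0]
print(rmsep(ref, pred))           # 1.0
print(round(rpd(ref, pred), 2))   # 12.91
```

An RPD well above the thresholds quoted in the text (5 for fs-LIBS, below 3 for ns-LIBS) means the prediction error is small compared with the natural spread of the samples.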
