Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
He J  Fang G  Deng Q  Wang S 《Analytica chimica acta》2011,704(1-2):57-62
Classification and regression trees (CART) can handle large data sets and yield readily interpretable models. The conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), a meta-heuristic algorithm derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling the melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., crossover and mutation operators) were combined with the ACS algorithm to select the best solution model. In addition, at each terminal node of the resulting tree, variable selection was done by the ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulting tree, a set of approximately 4173 structures and their melting points was used (3000 compounds as the training set and 1173 as the validation set). Further, an external test set of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by the ACS-GA algorithm performs better than that produced by the recursive partitioning procedure.
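The greedy character of recursive partitioning described in this abstract can be illustrated with a minimal sketch. This is not the authors' CART or ACS-GA implementation; the helper names `sse` and `best_split` and the toy data are assumptions made for illustration. At each node the procedure picks the single threshold that minimizes the summed squared error of the two child nodes, which is locally optimal but gives no guarantee of the best overall tree.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Greedy search for the threshold on one descriptor x that minimizes
    the total squared error of the two child nodes (hypothetical helper)."""
    pairs = sorted(zip(x, y))
    best_threshold, best_error = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical descriptor values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Toy data: one descriptor and a melting-point-like response (made-up numbers).
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [50.0, 52.0, 51.0, 120.0, 118.0, 122.0]
threshold, error = best_split(x, y)
```

Applying `best_split` recursively to each child node would grow a full tree; a global search such as the ACS-GA approach in the abstract explores alternatives that this greedy step never considers.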

2.
¹H nuclear magnetic resonance (NMR) spectroscopy was used to detect metabolic profiles of wheat flour samples of different geographical and botanical origin. The NMR profiles were analyzed by multivariate statistical techniques in order to establish the origin of the samples. A linear model able to discriminate among three different locations was built, achieving a prediction level of about 80% of correctly assigned samples. The principal classes of compounds responsible for the geographic origin discrimination were identified as aromatic compounds and amino acids. The statistical modeling also indicated that the NMR profiles of the analyzed wheat samples carry very little botanical origin information.

3.
The discovery of materials is increasingly guided by quantum‐mechanical crystal‐structure prediction, but the structural complexity in bulk and nanoscale materials remains a bottleneck. Here we demonstrate how data‐driven approaches can vastly accelerate the search for complex structures, combining a machine‐learning (ML) model for the potential‐energy surface with efficient, fragment‐based searching. We use the characteristic building units observed in Hittorf's and fibrous phosphorus to seed stochastic (“random”) structure searches over hundreds of thousands of runs. Our study identifies a family of hierarchically structured allotropes based on a P8 cage as principal building unit, including one‐dimensional (1D) single and double helix structures, nanowires, and two‐dimensional (2D) phosphorene allotropes with square‐lattice and kagome topologies. These findings yield new insight into the intriguingly diverse structural chemistry of phosphorus, and they provide an example for how ML methods may, in the long run, be expected to accelerate the discovery of hierarchical nanostructures.

4.
5.
The linear solvent strength model was used to predict coverage in online comprehensive two‐dimensional reversed‐phase liquid chromatography. The prediction model uses a parallelogram to describe the separation space covered with peaks in a system with limited orthogonality. The corners of the parallelogram are assumed to behave like chromatographic peaks, and the positions of these pseudo‐compounds were predicted. A mix of 25 polycyclic aromatic compounds was used as a test. The precision of the prediction (span 0–25) was tested by varying input parameters and was found to be acceptable, with root mean square errors of 3. The accuracy of the prediction was assessed by comparing with the experimental coverages. Less than half of the experimental coverages were outside prediction ± 1 × root mean square error and none were outside prediction ± 2 × root mean square error. Accuracy was lower when retention factors were low, or when gradient conditions affected parameters not included in the model, e.g. the second dimension gradient time affects the second dimension equilibration time. The concept shows promise as a tool for gradient optimization in online comprehensive two‐dimensional liquid chromatography, as it avoids the tedious registration and modeling of all sample constituents, which is particularly appealing when dealing with complex samples.

6.
7.
The feasibility of partial least squares (PLS) regression modeling of X-ray fluorescence (XRF) spectra of estuarine sediments has been evaluated as a tool for rapid monitoring of trace element content. Multivariate PLS calibration models were developed to predict the concentration of Al, As, Cd, Co, Cr, Cu, Fe, Mg, Mn, Ni, Pb, Sn, V and Zn in sediments collected from different locations across the estuary of the Nerbioi-Ibaizabal River (Metropolitan Bilbao, Bay of Biscay, Basque Country). The study was carried out on a set of 116 sediment samples, previously lyophilized and sieved to a particle size below 63 μm. Sample reference data were obtained by inductively coupled plasma mass spectrometry. Through a hierarchical cluster analysis, 34 samples were selected for building the PLS models. The remaining 82 samples were used as a test set to validate the models. The relative root mean square errors of prediction obtained in the present study varied from 21%, for the determination of Pb at the hundreds of μg g−1 level, up to 87%, for the determination of Ni at the low tens of μg g−1 level. An average prediction error of ±37% was obtained for the 14 elements under study; in all cases, the mean differences between predicted and reference results were of the same order as the standard deviation of three replicates of the same sample. The residual predictive deviation values obtained ranged from 1.1 to 3.9.
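The figures of merit quoted in this abstract can be computed as in the following sketch. The abstract does not give its exact formulas, so the definitions here are assumptions: the relative error is taken as RMSEP divided by the mean reference value, and the residual predictive deviation (RPD) as the sample standard deviation of the reference values divided by RMSEP. The function names and the concentrations are made up for illustration.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction over a test set."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def relative_rmsep(reference, predicted):
    """RMSEP as a percentage of the mean reference value (assumed definition)."""
    return 100.0 * rmsep(reference, predicted) / statistics.mean(reference)


def rpd(reference, predicted):
    """Residual predictive deviation: sample SD of the reference values / RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)


# Made-up concentrations in ug/g, for illustration only.
reference = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

error_abs = rmsep(reference, predicted)
error_rel = relative_rmsep(reference, predicted)
deviation_ratio = rpd(reference, predicted)
```

Under these definitions an RPD close to 1 means the model predicts little better than the mean of the reference values, which helps in reading the 1.1 to 3.9 range reported above.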

8.
Sârbu C  Moţ AC 《Talanta》2011,85(2):1112-1117
The fingerprinting capacity of thin layer chromatography (TLC) combined with image analysis has been investigated for propolis samples collected in different areas of Romania. A fuzzy divisive hierarchical clustering approach was used as a powerful tool for discriminating and fingerprinting samples according to geographical origin and local flora. The fuzzy partition and the patterns obtained from the membership degree plots were in very good agreement with the floral origin and geographic location of the Romanian propolis samples, and clearly illustrate the fuzziness of their similarities and differences. The results strongly support that TLC with image analysis can be successfully employed in fingerprinting methodologies when combined with an appropriate fuzzy clustering method. The method developed in this paper might also be extended to the authenticity and origin control of fruits, herbs or derived products.

9.
As a form of liquid‐liquid partition chromatography, counter‐current chromatography offers a large sample loading capacity without irreversible adsorption and has been widely applied in separation and purification. The main factors, including the partition coefficient, two‐phase solvent system, apparatus, and operating parameters, greatly affect the separation process. To promote applications of counter‐current chromatography, theoretical research is needed to understand the principles of counter‐current chromatographic separations so that outcomes can be predicted before laborious trials. In this article, recent progress in separation prediction methods is reviewed from the standpoint of the steady and unsteady states of the mass transfer process of counter‐current chromatography and its mass transfer characteristics. The review is divided into three aspects: prediction of the partition coefficient, modeling the thermodynamic process of counter‐current chromatography, and modeling the dynamic process of counter‐current chromatography.

10.
Radical C–H bond functionalization provides a versatile approach for elaborating heterocyclic compounds. The synthetic design of this transformation relies heavily on the knowledge of regioselectivity, while a quantified and efficient regioselectivity prediction approach is still elusive. Herein, we report the feasibility of using a machine learning model to predict the transition state barrier from the computed properties of isolated reactants. This enables rapid and reliable regioselectivity prediction for radical C–H bond functionalization of heterocycles. The Random Forest model with physical organic features achieved 94.2 % site accuracy and 89.9 % selectivity accuracy in the out‐of‐sample test set. The prediction performance was further validated by comparing the machine learning results with additional substituents, heteroarene scaffolds and experimental observations. This work revealed that the combination of mechanism‐based computational statistics and machine learning model can serve as a useful strategy for selectivity prediction of organic transformations.

11.
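Waveband selection for near-infrared spectroscopic analysis of soil total nitrogen   Total citations: 1, self-citations: 0, citations by others: 1
Pan Tao  Wu Zhentao  Chen Huazhou 《分析化学》 (Chinese Journal of Analytical Chemistry) 2012,40(6):920-924
Moving window partial least squares (MWPLS) and Savitzky-Golay (SG) smoothing were used to optimize the near-infrared (NIR) spectroscopic analysis model for soil total nitrogen. From all 97 soil samples, 35 were randomly selected as the validation set. Based on the partial least squares cross-validation predictive bias (PLSPB), the remaining 62 samples were divided into a modeling calibration set (37 samples) and a modeling prediction set (25 samples) with similar characteristics. The optimal waveband was 1692~2138 nm; the SG derivative order (OD), polynomial degree (DP) and number of smoothing points (NSP) were 0, 6 and 69, respectively; and the number of PLS factors was 11. The modeling root mean square error of prediction (M-RMSEP) and modeling correlation coefficient of prediction (M-Rp) were 0.015% and 0.931, respectively, and the validation root mean square error of prediction (V-RMSEP) and validation correlation coefficient of prediction (V-RP) were 0.018% and 0.882, respectively. The results can provide a valuable reference for designing dedicated NIR instruments.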

12.
A new quantitative structure‐retention relationship (QSRR) method is proposed for estimating and predicting gas chromatographic retention indices of alkanes by using a novel molecular distance‐edge vector, called the μ vector, containing 10 elements. The QSRR model (M1) between the μ vector and the chromatographic retention indices of 64 alkanes was developed by multiple linear regression (MLR), with a correlation coefficient of R = 0.9992 and a root mean square (RMS) error between the estimated and measured retention indices of RMS = 5.938. To assess the stability and predictive ability of the M1 model, it is essential to perform a cross‐validation (CV) procedure. Satisfactory CV results were obtained by leaving out one sample at a time as the external prediction sample, with an average correlation coefficient of R = 0.9988 and an average RMS = 7.128. If 21 compounds, about one third of all 64 alkanes, form an external prediction set and the remaining 43 form an internal calibration set, a second QSRR model (M2) can be built from the calibration set data, with R = 0.9993 and RMS = 5.796. The chromatographic retention indices of the 21 compounds in the external test set can be predicted by the M2 model, and good prediction results are obtained with R = 0.9988 and RMS = 6.508.
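The cross‐validation procedure in this abstract, in which one sample at a time is held out and predicted, can be sketched as follows. This is a simplified illustration and not the authors' model: it fits a straight line to a single descriptor instead of a multiple linear regression on the 10‐element μ vector, and the function names and data are assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out(x, y):
    """Predict each sample from a model fitted to all the other samples."""
    predictions = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(intercept + slope * x[i])
    return predictions


def rms_error(measured, predicted):
    """Root mean square difference between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)


def correlation(u, v):
    """Pearson correlation coefficient R between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Made-up descriptor values and retention indices, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [102.0, 198.0, 301.0, 399.0, 502.0]
cv_predictions = leave_one_out(x, y)
cv_rms = rms_error(y, cv_predictions)
cv_r = correlation(y, cv_predictions)
```

Because every prediction comes from a model that never saw the held-out sample, the cross-validated RMS is normally somewhat larger than the fitting RMS, as with the 7.128 versus 5.938 reported above.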

13.
Acquiring the three‐dimensional structure of a protein from its amino acid sequence alone, despite a great deal of work and significant progress on the subject, is still an unsolved problem. SSThread, a new template‐free algorithm, is described here. It makes several predictions of contacting pairs of α‐helices and β‐strands derived from a database of experimental structures using a knowledge‐based potential, secondary structure prediction, and contact map prediction, and then assembles overlapping pair predictions to create an ensemble of core structure predictions whose loops are then predicted. In a set of seven CASP10 targets SSThread outperformed the two leading methods for two targets each. The targets were all β‐strand containing structures and most of them have a high relative contact order, which demonstrates the advantages of SSThread. The primary bottlenecks based on sets of 74 and 21 test cases are the pair prediction and loop prediction stages. © 2014 Wiley Periodicals, Inc.

14.
A novel molecular structure prediction method, the Z Method, is described. It provides a versatile platform for the development and use of systematic, grid‐based conformational search protocols, in which statistical information (i.e., rotamers) can also be included. The Z Method generates trial structures by applying many changes of the same type to a single starting structure, thereby sampling the conformation space in an unbiased way. The method, implemented in the CHARMM program as the Z Module, is applied here to an illustrative model problem in which rigid, systematic searches are performed in a 36‐dimensional conformational space that describes the relative positions of the 10 secondary structural elements of the protein CheY. A polar hydrogen representation with an implicit solvation term (EEF1) is used to evaluate successively larger fragments of the protein generated in a hierarchical build‐up procedure. After a final refinement stage, and a total computational time of about two‐and‐a‐half CPU days on AMD Opteron processors, the prediction is within 1.56 Å of the native structure. The errors in the predicted backbone dihedral angles are found to approximately cancel. Monte Carlo and simulated annealing trials on the same or smaller versions of the problem, using the same atomic model and energy terms, are shown to result in less accurate predictions. Although the problem solved here is a limited one, the findings illustrate the utility of systematic searches with atom‐based models for macromolecular structure prediction and the importance of unbiased sampling in structure prediction methods. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011

15.
Direct‐injection mass spectrometry (DIMS) techniques have evolved into powerful methods to analyse volatile organic compounds (VOCs) without the need for chromatographic separation. Combined with chemometrics, they have been used in many domains to solve sample categorization issues based on volatilome determination. In this paper, different DIMS methods that have largely outperformed conventional electronic noses (e‐noses) in classification tasks are briefly reviewed, with an emphasis on food‐related applications. Particular attention is paid to proton transfer reaction mass spectrometry (PTR‐MS), and many results obtained using the powerful PTR‐time of flight‐MS (PTR‐ToF‐MS) instrument are reviewed. Data analysis and feature selection issues are also summarized and discussed. As a case study, a challenging problem is presented: the classification of dark chocolates previously assigned by sensory evaluation to four distinct categories. The VOC profiles of a set of 206 chocolate samples classified in the four sensory categories were analysed by PTR‐ToF‐MS. A supervised multivariate data analysis based on partial least squares regression‐discriminant analysis allowed the construction of a classification model that showed excellent prediction capability: 97% of a test set of 62 samples were correctly assigned to the sensory categories. Tentative identification of ions aided characterisation of the chocolate classes. Variable selection using dedicated methods pinpointed some volatile compounds important for the discrimination of the chocolates. Among these methods, CovSel was used for the first time on PTR‐MS data, resulting in a selection of 10 features that allowed a good prediction to be achieved. Finally, challenges and future needs in the field are discussed.

16.
A hierarchical classification of chemical scaffolds (molecular frameworks, which are obtained by pruning all terminal side chains) has been introduced. The molecular frameworks form the leaf nodes in the hierarchy trees. By an iterative removal of rings, scaffolds forming the higher levels in the hierarchy tree are obtained. Prioritization rules ensure that less characteristic, peripheral rings are removed first. All scaffolds in the hierarchy tree are well-defined chemical entities, making the classification chemically intuitive. The classification is deterministic, data-set-independent, and scales linearly with the number of compounds included in the data set. The application of the classification is demonstrated on two data sets extracted from the PubChem database, namely, pyruvate kinase binders and a collection of pesticides. The examples shown demonstrate that the classification procedure robustly handles synthetic structures and natural products.

17.
Preferred conformations of amino acid side chains have been well established through statistically obtained rotamer libraries. Typically, these provide bond torsion angles allowing a side chain to be traced atom by atom. In cases where it is desirable to reduce the complexity of a protein representation or prediction, fixing all side-chain atoms may prove unwieldy. Therefore, we introduce a general parametrization to allow positions of representative atoms (in the present study, these are terminal atoms) to be predicted directly given backbone atom coordinates. Using a large, culled data set of amino acid residues from high-resolution protein crystal structures, anywhere from 1 to 7 preferred conformations were observed for each terminal atom of the non-glycine residues. Side-chain length from the backbone Cα is one of the parameters determined for each conformation, which should itself be useful. Prediction of terminal atoms was then carried out for a second, nonredundant set of protein structures to validate the data set. Using four simple probabilistic approaches, the Monte Carlo style prediction of terminal atom locations given only backbone coordinates produced an average root mean-square deviation (RMSD) of approximately 3 Å from the experimentally determined terminal atom positions. With prediction using conditional probabilities based on the side-chain χ1 rotamer, this average RMSD was improved to 1.74 Å. The observed terminal atom conformations therefore provide reasonable and potentially highly accurate representations of side-chain conformation, offering a viable alternative to existing all-atom rotamers for any case where reduction in protein model complexity, or in the amount of data to be handled, is desired. One application of this representation with strong potential is the prediction of charge density in proteins. This would likely be especially valuable on protein surfaces, where side chains are much less likely to be fixed in single rotamers. Prediction of ensembles of structures provides a method to determine the probability density of charge and atom location; such a prediction is demonstrated graphically.
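The RMSD used in this abstract to compare predicted and experimental terminal atom positions can be computed as in the sketch below. It assumes the two coordinate lists are already matched atom by atom and expressed in the same reference frame, so no superposition step is included; the function name `rmsd` and the coordinates are made up for illustration.

```python
import math


def rmsd(predicted, observed):
    """Root mean-square deviation between matched 3D positions,
    in the units of the input coordinates (e.g. angstroms)."""
    squared_distances = [
        sum((p - o) ** 2 for p, o in zip(atom_pred, atom_obs))
        for atom_pred, atom_obs in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


# Made-up terminal atom coordinates for two residues.
predicted = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
observed = [(0.0, 0.0, 3.0), (1.0, 5.0, 1.0)]
deviation = rmsd(predicted, observed)
```

Here the two atoms are displaced by 3 and 4 length units, so the RMSD is the square root of the mean of 9 and 16.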

18.
19.
Activity predictor software was previously developed to predict the activities, exposure rates and gamma spectra of activated samples for neutron activation analysis (NAA) measurements at the Radiation Science and Engineering Center (RSEC) Penn State Breazeale Reactor (PSBR). However, the spectra predicted by Activity Predictor proved less than satisfactory. In order to obtain better predicted spectra, a new detailed model of the RSEC NAA spectroscopy system with a High Purity Germanium (HPGe) detector was developed using Geant-4. The model was validated with a National Bureau of Standards certified 60Co source and three activated high purity samples at PSBR. The predicted spectra agreed well with the measured spectra. Errors in the net photopeak area values were 8.6–33.6%. Along with the previously developed activity predictor software, this new Geant-4 model provided realistic spectra prediction for NAA experiments at RSEC PSBR.

20.
Curcumae longae rhizome is a widely used traditional herb in many countries. The varied geographical origins of this herb might lead to diversity or instability in herbal quality. The objective of this work was to establish chemical fingerprints for quality control and to find chemical markers for discriminating herbs from different origins. First, chemical fingerprints of the essential oil of 24 C. longae rhizome samples from four different geographical origins in China were determined by GC–MS. Then, pattern recognition techniques were introduced to analyze these abundant chemical data in depth; hierarchical cluster analysis was used to sort samples into groups by measuring their similarities, and principal component analysis and partial least‐squares discriminant analysis were applied to find the main chemical markers for discriminating these samples. Curcumae longae rhizome from Guangxi province had the highest essential oil yield (4.32 ± 1.45%). A total of 46 volatile compounds were identified. The results consistently showed that C. longae rhizome samples could be successfully grouped according to their origins, and that turmerone, ar‐turmerone, and zingiberene were the characteristic components for discriminating samples of various geographical origins and for quality control. This finding revealed that fingerprinting analysis based on GC–MS coupled with chemometric techniques could provide a reliable platform to discriminate herbs from different origins, which benefits quality control.
