Similar Literature
 Found 20 similar documents; search time: 15 ms
1.
Variable responses are fundamental for all experiments, and they can consist of information-rich, redundant, and low signal intensities. A dataset can consist of a collection of variable responses over multiple classes or groups. Usually, variables that contain very little information are removed from the dataset. Sometimes all the variables are used in the data analysis phase. It is common practice to discriminate between two distributions of data; however, there is no formal algorithm to arrive at a degree of separation (DS) between two distributions of data. The DS is defined herein as the average of the sum of the areas from the probability density functions (PDFs) of A and B that contain a ≥ percentage of A and/or B. Thus, DS90 is the average of the sum of the PDF areas of A and B that contain ≥90% of A and/or B. To arrive at a DS value, two synthesized PDFs or very large experimental datasets are required. Experimentally, it is common practice to generate relatively small datasets. Therefore, the challenge was to find a statistical parameter that can be used on small datasets to estimate, and correlate highly with, the DS90 parameter. Established statistical methods include the overlap area of the two data distribution profiles, Welch’s t-test, the Kolmogorov–Smirnov (K–S) test, the Mann–Whitney–Wilcoxon test, and the area under the receiver operating characteristic (ROC) curve (AUC). The area between the ROC curve and diagonal (ACD) and the length of the ROC curve (LROC) are introduced. The established, ACD, and LROC methods were correlated to the DS90 when applied to many pairs of synthesized PDFs. The LROC method provided the best linear correlation with, and estimation of, the DS90. As an example, the DS90 estimated from the LROC (DS90–LROC) is applied to a database of three Italian wines with thirteen variable responses in order to rank the variables.
An important highlight of the DS90–LROC method is that it uses the LROC methodology to test all variables one at a time with all pairs of classes in a dataset.
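The three ROC-based quantities named in this abstract (AUC, ACD, and LROC) can be illustrated with a minimal sketch. The abstract gives no formulas, so the following is an assumption rather than the authors' implementation: the empirical ROC curve is built by sweeping a threshold over two small samples, AUC is obtained by the trapezoidal rule, ACD is taken as |AUC − 0.5| (valid only when the curve stays on one side of the diagonal), and LROC is the summed length of the curve segments. The function names `roc_points` and `roc_summary` are hypothetical.

```python
import math

def roc_points(a, b):
    """Empirical ROC points (FPR, TPR), treating b as the 'positive' class
    and calling a sample positive when its value is >= the threshold."""
    thresholds = sorted(set(a) | set(b), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(x >= t for x in a) / len(a)
        tpr = sum(x >= t for x in b) / len(b)
        points.append((fpr, tpr))
    return points  # the lowest threshold yields the final point (1, 1)

def roc_summary(a, b):
    """Return AUC, ACD and LROC for two samples (illustrative definitions)."""
    pts = roc_points(a, b)
    auc = 0.0
    length = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0       # trapezoidal rule
        length += math.hypot(x1 - x0, y1 - y0)   # segment length
    # Assumption: the curve does not cross the diagonal, so the area between
    # the curve and the diagonal reduces to |AUC - 0.5|.
    return {"auc": auc, "acd": abs(auc - 0.5), "lroc": length}

# Hypothetical data: two fully separated samples and two identical samples.
separated = roc_summary([1, 2, 3], [4, 5, 6])
identical = roc_summary([1, 2, 3], [1, 2, 3])
```

Under these definitions, LROC ranges from √2 (the ROC curve coincides with the diagonal) to 2 (perfect separation), which is one plausible reason a curve-length measure can track the degree of separation.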

2.
3.
The paper describes different aspects of classification models based on molecular data sets, with a focus on feature selection methods. In particular, model quality and the avoidance of high variance on unseen data (overfitting) are discussed with respect to the feature selection problem. We present several standard approaches and modifications of our Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC) algorithm and the extension for classification problems using boosting.

4.
5.
A concept termed Emerging Chemical Patterns (ECPs) is introduced as a novel approach to molecular classification. The methodology makes it possible to extract key molecular features from very few known active compounds and classify molecules according to different potency levels. The approach was developed in light of the situation often faced during the early stages of lead optimization efforts: too few active reference molecules are available to build computational models for the prediction of potent compounds. The ECP method generates high-resolution signatures of active compounds. Predictive ECP models can be built based on the information provided by sets of only three molecules with potency in the nanomolar and micromolar range. In addition to individual compound predictions, an iterative ECP scheme has been designed. When applied to different sets of active molecules, iterative ECP classification produced compound selection sets with increases in average potency of up to 3 orders of magnitude.

6.
7.
Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented and the influence of the variables on these results is discussed. We found that using the whole spectra as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectra are used, indicating this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.

8.
Real-time PCR (qPCR) is the principal technique for the quantification of pathogen biomass in host tissue, yet no generic methods exist for the determination of the limit of quantification (LOQ) and the limit of detection (LOD) in qPCR. We suggest using the Youden index in the context of the receiver operating characteristic (ROC) curve analysis for this purpose. The LOQ was defined as the amount of target DNA that maximizes the sum of sensitivity and specificity. The LOD was defined as the lowest amount of target DNA that was amplified with a false-negative rate below a given threshold. We applied this concept to qPCR assays for Fusarium verticillioides and Fusarium proliferatum DNA in maize kernels. Spiked matrix and field samples characterized by melting curve analysis of PCR products were used as the source of true positives and true negatives. On the basis of the analysis of sensitivity and specificity of the assays, we estimated the LOQ values as 0.11 pg of DNA for spiked matrix and 0.62 pg of DNA for field samples for F. verticillioides. The LOQ values for F. proliferatum were 0.03 pg for spiked matrix and 0.24 pg for field samples. The mean LOQ values correspond to approximately eight genomes for F. verticillioides and three genomes for F. proliferatum. We demonstrated that the ROC analysis concept, developed for qualitative diagnostics, can be used for the determination of performance parameters of quantitative PCR.
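The Youden index used in this abstract is J = sensitivity + specificity − 1, so maximizing J is equivalent to maximizing the sum of sensitivity and specificity. A minimal sketch of selecting the cut-off that maximizes J follows. The function name `youden_threshold` and the sample values are hypothetical and are not the data or the procedure of the study; the sketch also assumes that a sample is called positive when its measured value is at or above the cut-off.

```python
def youden_threshold(negatives, positives):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its value is >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(negatives) | set(positives)):
        sensitivity = sum(x >= t for x in positives) / len(positives)
        specificity = sum(x < t for x in negatives) / len(negatives)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical measured amounts (arbitrary units) for true negatives and
# true positives; these numbers are illustrative only.
threshold, j_max = youden_threshold([0.01, 0.02, 0.05], [0.1, 0.2, 0.5])
```

With these fully separated illustrative samples, the selected cut-off is the smallest positive value and J reaches its maximum of 1.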

9.
A pharmacophore is a model which represents the key physico-chemical interactions that mediate biological activity. There is a long history of using pharmacophore modeling methods to select subsets of compounds, focused towards a specific target of interest. This paper will review existing computational methods for deriving and comparing pharmacophore models. We outline a new classification of pharmacophore methods based on the abstraction of the underlying chemical interactions which embody a pharmacophore, and the methods available to quantitatively compare them. Within the context of this classification, example studies, using specific pharmacophore modeling methods for focused library selection, will be discussed.

10.
Summary: An appropriate procedure based on statistical criteria is suggested for the determination of the optimum set of model parameters for a given chromatographic system. The criteria employed are the t-ratio test, the rate of change in the sum of squares of residuals, the standard error of the fit, the F-test, and the CP-test. The suggested procedure has been evaluated using two different models, one based on partition and the other on adsorption mechanisms, which describe the combined effect of pH and organic modifier content on the retention of ionogenic solutes in reversed-phase liquid chromatography. It is shown that all the criteria give nearly convergent results, and therefore we may simply use the F-test, which seems to be the most sensitive and reliable criterion and excludes any personal judgement. It is also found that the retention models tested behave differently when simplified. In particular, the use of a reduced equation of the partition model, selected on the basis of the suggested procedure, is necessary for the prediction of meaningful retention surfaces, whereas decreasing the number of adjustable parameters in the adsorption model offers only noise reduction and fitting simplicity, because no version of this model predicts abnormal retention surfaces.
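As an illustration of how an F-test can decide whether extra adjustable parameters are justified, the sketch below uses the standard extra-sum-of-squares form for nested models, F = [(RSS_reduced − RSS_full) / (p_full − p_reduced)] / [RSS_full / (n − p_full)]. The abstract does not state which form of the F-test was used, so this choice, the function name `nested_f_statistic`, and the numbers are all assumptions.

```python
def nested_f_statistic(rss_reduced, p_reduced, rss_full, p_full, n):
    """Extra-sum-of-squares F statistic for two nested least-squares fits.

    rss_*: residual sum of squares of each fit
    p_*:   number of adjustable parameters of each fit
    n:     number of data points
    """
    if not (p_reduced < p_full < n):
        raise ValueError("require p_reduced < p_full < n")
    numerator = (rss_reduced - rss_full) / (p_full - p_reduced)
    denominator = rss_full / (n - p_full)
    return numerator / denominator

# Hypothetical fits: a 2-parameter reduced model and a 4-parameter full
# model, both fitted to the same 14 retention measurements.
f_value = nested_f_statistic(rss_reduced=10.0, p_reduced=2,
                             rss_full=5.0, p_full=4, n=14)
```

The resulting statistic would then be compared with the critical value of the F distribution with (p_full − p_reduced, n − p_full) degrees of freedom; a value below the critical one suggests the reduced model is adequate.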

11.
A purine-containing multifunctional ligand, 2-(6-oxo-6H-purin-1(9H)-yl)acetic acid (HL), and two new 2-D coordination polymers, [Co(L)2(H2O)2]n·2nH2O (1) and [Ni(L)2(H2O)2]n·2nH2O (2), were synthesized and characterized. Polymers 1 and 2 have isomorphous structures with (4,4)-connected topologies composed of left- and right-handed metal–organic helices sharing common metal centers. Two helical conformations in the same net are stabilized by strong π–π stacking interactions between purine groups. Through direct and water-mediated interlayer hydrogen-bond interactions those layers are assembled into stable 3-D supermolecules where slight differences in the strength of hydrogen bonds and coordination bonds result in their decomposition behaviors.

12.
The objective of this work was to apply artificial neural networks (ANNs) to the classification of a group of 43 derivatives of phenylcarbamic acid. To find the appropriate clusters, Kohonen topological maps were employed. As input data, thermal parameters obtained during DSC and TG analysis were used. Input feature selection (IFS) algorithms were used to estimate the relative importance of the various input variables. Additionally, sensitivity analysis was carried out to eliminate less important thermal variables. As a result, one classification model was obtained, which can assign our compounds to an appropriate class. Because the classes contain groups of structurally related molecules, it is possible to predict the structure of the compounds (for example, the position of the alkoxy substituent in the phenyl ring) on the basis of the obtained parameters.

13.
The rivality index (RI) is a normalized distance measurement between a molecule and its first nearest neighbours, providing a robust prediction of the activity of a molecule based on the known activity of its nearest neighbours. Negative values of the RI describe molecules that would be correctly classified by a statistical algorithm and, vice versa, positive values of this index describe those molecules detected as outliers by the classification algorithms. In this paper, we describe a classification algorithm based on the RI and propose four weighting schemes (kernels) for its calculation, based on measuring different characteristics of the neighbourhood of each molecule of the dataset at established values of the threshold of neighbours. The results obtained demonstrate that the proposed classification algorithm, based on the RI, generates more reliable and robust classification models than many of the most widely used and well-known machine learning algorithms. These results have been validated and corroborated by using 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The classification models generated provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.

14.
The Taft-Kamlet-Abboud hydrogen-bond acidity, hydrogen-bond basicity and polarity-polarizability are widely used as empirical characteristics of solvent-solute interactions. These solvatochromic parameters are determined from the absorption band positions of solvatochromic probes in the standard medium and in the medium under study. The practice of solvatochromic probing is growing rapidly, and the values of solvatochromic parameters are refined from time to time. As these values are rather close for many media, the classification of media based on these values can be tedious. This increases the choice of algorithms that can be employed in order to decrease the ambiguity of classification. Classification algorithms that are stable to small variations of solvatochromic parameters are of special interest. Artificial neural networks (ANNs) have proved to be a powerful tool for supervised classification. The paper focuses on the search for optimal parameters of probabilistic, dynamic, Elman, feed-forward, and cascade ANNs for the classification of solvents on the basis of their solvatochromic characteristics. The influence of data variation on the stability of classification is also examined. The dynamic and probabilistic neural networks have been found to be error-free and stable; they have significantly become such a common tool for supervised classification as linear discriminant analysis.

15.
Quantitative criteria to ascertain the quality of calorimetric models based on physical parameters are presented. These include not only a comparison between model and experimental pulse responses, especially for the larger time constants, but also an analysis of their spectra up to the frequency limit imposed by the experimental noise. A calorimetric model based on the physical parameters of a Unipan 600 calorimeter is used to reconstruct a given power dissipation. The results are then compared to those given by other methods, i.e. dynamic optimization, inverse filtering and harmonic analysis.

16.
17.
The design and fabrication of an ammonia sensor operating at room temperature based on pigment-sensitized TiO2 films are described. TiO2 was prepared by the sol–gel method and deposited on glass slides containing gold electrodes. The film was then immersed in a 2.5 × 10⁻⁴ M ethanol solution of cyanidin to absorb the pigment. The hybrid organic–inorganic film formed in this way can detect ammonia reversibly at room temperature. The relative change in resistance of the films at a potential difference of 1.5 V is determined when the films are exposed to atmospheres containing ammonia vapors with concentrations over the range 10–50 ppm. The relative change in resistance, S, of the films increased almost linearly with increasing concentrations of ammonia (r = 0.92). The response time to increasing concentrations of ammonia is about 180–220 s, and the corresponding values for decreasing concentrations are 240–270 s. At low humidity, ammonia could be ionized by the cyanidin on the TiO2 film, thereby decreasing the proton concentration at the surface. Consequently, more positively charged holes at the surface of the TiO2 have to be extracted to neutralize the adsorbed cyanidin and water film. The resistance response of the sensors to ammonia was nearly independent of temperature from 10 to 50 °C. These results are not as good as those reported in the literature, but this preliminary work proposes simpler and cheaper processes to realize NH3 sensors for room temperature applications.

18.
In this work, a comparative study of two novel algorithms to perform sample selection in local regression based on Partial Least Squares Regression (PLS) is presented. These methodologies were applied to Near Infrared Spectroscopy (NIRS) quantification of five major constituents in corn seeds and are compared and contrasted with global PLS calibrations. Validation results show a significant improvement in the prediction quality when local models implemented by the proposed algorithms are applied to large databases.

19.
The reproducibility of two migration parameters (retention time and mobility) of a seven-component test mixture was examined under various operating conditions using laboratory-built capillary electrophoresis systems. It was found that the frequency of rinsing the capillary and the solutions used for rinsing had the greatest effect on migration reproducibility. In addition, it was found that the migration behavior of solutes that interact with micelles is not repeatable unless the proper rinse protocol is applied. Inconsistent migration behavior is linked to inconsistent total current of the system. Preliminary investigations indicate that the fluctuations in total current were associated with non-equilibrium conditions between the buffer and the capillary wall.

20.
Journal of Thermal Analysis and Calorimetry - In this paper, a performance analysis based on the entropy generation and thermal efficiency has been carried out for the irreversible dual cycle. The...
