Similar documents
20 similar documents found.
We have previously reported that a Laplacian-modified naive Bayesian (NB) classifier can improve the ranking of known inhibitors within a random database of compounds after high-throughput docking (HTD). The method relies on the frequency of substructural features, derived from 2D fingerprints, among the active and inactive compounds. Here we investigate the role of extended-connectivity fingerprints in training the NB classifier on HTD studies of HIV-1 protease using three docking programs: Glide, FlexX, and GOLD. The results show that the performance of the NB classifier stems from a large number of features shared by the set of known active compounds rather than from a single structural or substructural scaffold. We demonstrate that a Laplacian-modified naive Bayesian classifier trained on high-throughput docking data is superior at identifying active compounds in a target database compared with conventional two-dimensional substructure search methods alone.
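A Laplacian-corrected naive Bayesian score of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one commonly used formulation in which each fingerprint feature F receives the weight log[(A + 1) / ((T + 1/P) * P)], where A is the number of actives containing F, T the number of training compounds containing F, and P the overall fraction of actives. A compound's score is the sum of its feature weights. Fingerprints are modelled here as Python sets of hypothetical feature identifiers.

```python
import math
from collections import Counter

def laplacian_weights(fingerprints, labels):
    """Laplacian-corrected log weight for every feature seen in training.

    fingerprints: list of sets of feature ids (hypothetical representation)
    labels: list of bools, True = active; at least one active is required.
    """
    p_base = sum(labels) / len(labels)   # P: baseline fraction of actives
    k = 1.0 / p_base                     # correction strength K = 1/P
    total, active = Counter(), Counter()
    for feats, is_active in zip(fingerprints, labels):
        for f in feats:
            total[f] += 1                # T: compounds containing f
            if is_active:
                active[f] += 1           # A: actives containing f
    weights = {}
    for f, t in total.items():
        p_corr = (active[f] + 1.0) / (t + k)   # (A + P*K) / (T + K), with P*K = 1
        weights[f] = math.log(p_corr / p_base)
    return weights

def score(features, weights):
    """Sum of feature weights; features unseen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)
```

The correction pulls the weights of rarely observed features toward zero, so a feature seen in a single active compound cannot dominate the score on its own.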

3.
An electronic tongue (ET) based on potentiometric chemical sensors was assessed as a rapid tool for quantifying bitterness in red wines. A set of 39 single-cultivar Pinotage wines, including 13 samples with medium to high bitterness, was obtained from producers in the Western Cape, South Africa. The samples were analysed for a set of routine wine parameters by Fourier transform infrared-multiple internal reflection spectroscopy (WineScan) and for major phenolic compounds by high-performance liquid chromatography. A trained sensory panel assessed the bitterness intensity of 15 wines, 13 of which had a bitter taste of medium to high intensity. Thirty-one wine samples, including seven bitter-tasting ones, were measured with the ET. The influence of the chemical composition of wine on the occurrence of bitter taste was evaluated by one-way analysis of variance. Bitter-tasting wines had higher concentrations of phenolic compounds (catechin, epicatechin, gallic and caffeic acids, and quercetin) than non-bitter wines. The sensitivity of the array sensors to the bitterness-related phenolic compounds was studied at different pH levels: the sensors responded to all studied compounds at pH 7, but only to quercetin at pH 3.5. Based on these findings, the wine pH was adjusted to 7 before measurement. Calibration models for classifying wine samples by the presence of bitter taste and for quantifying bitterness intensity were calculated by partial least squares-discriminant analysis (PLS-DA) regression. The statistical significance of the classification results was confirmed by a permutation test. Both ET and chemical analysis data discriminated between bitter and control wines, with correct classification rates of 94% and 91%, respectively. Prediction of bitterness intensity with good accuracy (root mean square error of 2 and mean relative error of 6% in validation) was possible only with ET data.
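The two validation figures quoted above can be computed as in the sketch below. The abstract does not define them, so the standard definitions are assumed here: root mean square error over the validation samples, and mean relative error as the average of |predicted - reference| / |reference| (reference values must be non-zero). The numbers in the usage lines are made up for illustration.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def mean_relative_error(reference, predicted):
    """Mean of |predicted - reference| / |reference|, as a fraction (x100 for %)."""
    n = len(reference)
    return sum(abs(p - r) / abs(r) for r, p in zip(reference, predicted)) / n

if __name__ == "__main__":
    ref = [10.0, 20.0]    # made-up reference bitterness scores
    pred = [12.0, 18.0]   # made-up model predictions
    print(rmse(ref, pred))                 # 2.0
    print(mean_relative_error(ref, pred))  # about 0.15, i.e. 15%
```

RMSE is expressed in the units of the bitterness scale, whereas the relative error is scale-free, which is why the two figures are reported together.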

4.
High-dimensional datasets contain up to thousands of features and can incur immense computational costs in classification tasks, so they need a feature selection step before classification. The main idea behind feature selection is to choose a useful subset of features that significantly improves the comprehensibility of a classifier and maximizes the performance of the classification algorithm. In this paper, we propose a one-per-class model for high-dimensional datasets. The proposed method extracts a different feature subset for each class in a dataset, applies the classification process to each of these feature subsets, and finally merges the predictions from the subsets to determine the class label of an unknown instance. The novelty of the proposed model lies in using an appropriate feature subset for each class. To show the usefulness of the approach, we developed a method that implements the proposed model. Our results confirm that this method achieves higher classification accuracy than previously proposed feature selection and classification methods.
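The one-per-class idea can be sketched as below. The abstract does not say which selection criterion, classifier, or merging rule the authors used, so all three are assumptions chosen for brevity: features are ranked per class by the absolute difference between the class mean and the mean of all other classes, each class gets a one-vs-rest nearest-centroid score on its own top-k subset, and the merged prediction is the class with the largest margin. The names `fit_one_per_class` and `predict` are hypothetical.

```python
import math
from statistics import mean

def fit_one_per_class(X, y, k):
    """Return {class: (feature_indices, class_centroid, rest_centroid)}."""
    model = {}
    n_feat = len(X[0])
    for c in sorted(set(y)):
        in_c = [x for x, lab in zip(X, y) if lab == c]
        rest = [x for x, lab in zip(X, y) if lab != c]
        mu_c = [mean(col) for col in zip(*in_c)]
        mu_r = [mean(col) for col in zip(*rest)]
        # Rank features by how strongly they separate class c from the rest.
        order = sorted(range(n_feat), key=lambda j: -abs(mu_c[j] - mu_r[j]))
        idx = order[:k]
        model[c] = (idx, [mu_c[j] for j in idx], [mu_r[j] for j in idx])
    return model

def predict(model, x):
    """Merge the per-class scores: pick the class with the largest margin."""
    def margin(entry):
        idx, mu_c, mu_r = entry
        sub = [x[j] for j in idx]  # project x onto this class's own subset
        # Positive when x is closer to class c than to the other classes.
        return math.dist(sub, mu_r) - math.dist(sub, mu_c)
    return max(model, key=lambda c: margin(model[c]))
```

Because each class keeps its own feature subset, a feature that matters for only one class does not have to be kept for all of them.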

6.
Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that must be classified into specific taxa. Most composition-based classification methods use all features instead of a subset that may maximize classifier accuracy. We show that feature selection can boost the performance of taxonomic classifiers. This work proposes three filter-based feature selection methods rooted in information theory: (1) a technique that combines Kullback-Leibler divergence, mutual information, and distance information; (2) TF-IDF, a text mining technique; and (3) minimum redundancy-maximum relevance (mRMR). The methods are compared by how much they improve support vector machine classification of genomic reads. Overall, the 6-mer mRMR method performs well, especially at the phylum level. If the total number of features is very large, feature selection becomes difficult because a small subset of features that captures most of the data variance is less likely to exist. We therefore conclude that there is a trade-off between feature set size and feature selection method in optimizing classification performance. For larger feature set sizes, TF-IDF works better at finer taxonomic resolutions, while mRMR performs best of all methods for N = 6 at all taxonomic levels.
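A greedy mRMR selection can be sketched as follows, assuming the features have already been discretized (for example, binned k-mer counts) and using the difference form of the criterion: at each step, pick the feature f that maximizes I(f; c) minus the mean of I(f; s) over the already selected features s. This illustrates the criterion only; it is not the pipeline used in the study, and breaking ties by feature name is an arbitrary choice.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b))).
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    features: dict mapping a feature name to its list of discrete values.
    Returns up to k feature names in selection order.
    """
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected, candidates = [], set(features)
    while candidates and len(selected) < k:
        def criterion(f):
            if not selected:
                return relevance[f]
            redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
            return relevance[f] - redundancy / len(selected)
        best = max(sorted(candidates), key=criterion)  # sorted: ties break by name
        selected.append(best)
        candidates.remove(best)
    return selected
```

The redundancy term is what separates mRMR from ranking by relevance alone: a feature that duplicates one already chosen scores near zero even when it is highly relevant to the taxon labels.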

9.
Feature selection is frequently used as a preprocessing step for machine learning, because removing irrelevant and redundant information often improves the performance of learning algorithms. This paper is a comparative study of feature selection in drug discovery, focusing on aggressive dimensionality reduction. Five methods were evaluated: information gain, mutual information, the chi-squared (χ²) test, odds ratio, and the GSS coefficient. Two well-known classification algorithms, Naïve Bayesian and Support Vector Machine (SVM), were used to classify the chemical compounds. Naïve Bayesian benefited significantly from feature selection, whereas SVM performed better when all features were used. In this experiment, information gain and the χ² test were the most effective feature selection methods. Using information gain with a Naïve Bayesian classifier, removing up to 96% of the features improved classification accuracy as measured by sensitivity. When information gain was used to select features, SVM was much less sensitive to the reduction of the feature space: the feature set was reduced by 99% while losing only a few percentage points of sensitivity (from 58.7% to 52.5%) and specificity (from 98.4% to 97.2%). In contrast to information gain and the χ² test, mutual information performed relatively poorly because of its bias toward rare features and its sensitivity to probability estimation errors.
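Two of the filter scores compared above, information gain and the χ² statistic, can be computed for binary fingerprint bits as in this sketch. It assumes 0/1 features and 0/1 class labels. In a filter setting each feature is scored independently and only the top-ranked fraction is kept; the 96% and 99% reductions above correspond to keeping 4% and 1% of the features. The helper `select_top` is a hypothetical name.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(feature, labels):
    """IG = H(C) - H(C | F) for a binary feature and binary labels."""
    n = len(labels)
    h_c = entropy([labels.count(0), labels.count(1)])
    h_cond = 0.0
    for v in (0, 1):
        sub = [c for f, c in zip(feature, labels) if f == v]
        if sub:
            h_cond += len(sub) / n * entropy([sub.count(0), sub.count(1)])
    return h_c - h_cond

def chi2(feature, labels):
    """Pearson chi-squared statistic of the 2x2 feature/class contingency table."""
    n = len(labels)
    stat = 0.0
    for v in (0, 1):
        for c in (0, 1):
            observed = sum(1 for f, lab in zip(feature, labels) if f == v and lab == c)
            expected = feature.count(v) * labels.count(c) / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def select_top(columns, labels, keep_fraction, score=information_gain):
    """Indices of the highest-scoring feature columns (at least one is kept)."""
    n_keep = max(1, round(len(columns) * keep_fraction))
    ranked = sorted(range(len(columns)), key=lambda j: score(columns[j], labels), reverse=True)
    return ranked[:n_keep]
```

Passing `score=chi2` instead of the default ranks the same columns by the χ² statistic.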

11.
Direct-injection mass spectrometry (DIMS) techniques have evolved into powerful methods for analysing volatile organic compounds (VOCs) without the need for chromatographic separation. Combined with chemometrics, they have been used in many domains to solve sample categorization problems based on volatilome determination. In this paper, DIMS methods that have largely outperformed conventional electronic noses (e-noses) in classification tasks are briefly reviewed, with an emphasis on food-related applications. Particular attention is paid to proton transfer reaction mass spectrometry (PTR-MS), and many results obtained with the powerful PTR-time-of-flight-MS (PTR-ToF-MS) instrument are reviewed. Data analysis and feature selection issues are also summarized and discussed. As a case study, we present the challenging problem of classifying dark chocolates that had previously been assigned to four distinct categories by sensory evaluation. The VOC profiles of 206 chocolate samples classified into the four sensory categories were analysed by PTR-ToF-MS. A supervised multivariate analysis based on partial least squares regression-discriminant analysis yielded a classification model with excellent predictive capability: 97% of a test set of 62 samples were assigned to the correct sensory category. Tentative identification of ions aided characterisation of the chocolate classes. Variable selection using dedicated methods pinpointed several volatile compounds important for discriminating the chocolates. Among these methods, CovSel was applied to PTR-MS data for the first time, selecting 10 features that still allowed good prediction. Finally, challenges and future needs in the field are discussed.

12.
The objective of this work was to apply artificial neural networks (ANNs) to classify a group of 43 phenylcarbamic acid derivatives. Kohonen topological maps were employed to find the appropriate clusters, using as input data the thermal parameters obtained from DSC and TG analysis. Input feature selection (IFS) algorithms were used to estimate the relative importance of the input variables, and sensitivity analysis was carried out to eliminate less important thermal variables. The result was a single classification model that assigns the compounds to an appropriate class. Because each class contains structurally related molecules, the structure of a compound (for example, the position of the alkoxy substituent on the phenyl ring) can be predicted from the obtained parameters.
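The clustering step can be illustrated with a minimal one-dimensional Kohonen self-organizing map. This is an assumed, simplified setup (Gaussian neighbourhood, linearly decaying learning rate and radius, nodes initialised from evenly spaced training samples), not the network configuration used in the study; two-dimensional toy points stand in for the thermal parameters. In practice, inputs with different units would normally be scaled to comparable ranges first.

```python
import math
import random

def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))

def train_som(data, n_nodes, epochs=200, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D Kohonen map; returns the list of node weight vectors."""
    rng = random.Random(seed)
    # Initialise nodes from evenly spaced training samples (one common choice).
    nodes = [list(data[i * len(data) // n_nodes]) for i in range(n_nodes)]
    if radius0 is None:
        radius0 = max(n_nodes / 2, 1.0)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.3)   # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):     # visit samples in random order
            bmu = best_matching_unit(nodes, x)
            for i in range(n_nodes):
                # Nodes near the winner on the map are pulled toward x as well.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                nodes[i] = [w + lr * h * (xi - w) for w, xi in zip(nodes[i], x)]
    return nodes
```

After training, each compound is assigned to the cluster of its best-matching node.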

14.
M. R. Karim, F. Hashinaga. Chinese Journal of Catalysis (催化学报), 2010, 31(12): 1445-1451
Limonoid bitterness is a serious problem in the citrus industry worldwide. Limonoid glucosyltransferase is an enzyme that catalyzes the conversion of bitter limonoids into non-bitter limonoid glucosides while retaining the health benefits of limonoids in the juice. Immobilizing this enzyme in a column can solve the juice bitterness problem, but this immobilization process requires more information about the enzyme's catalytic residues. Glutamate/aspartate, histidine, lysine, tryptophan, serine, and cysteine residues were chemically modified to investigate their roles in the catalytic function of limonoid glucosyltransferase. Inactivation of the enzyme after modification of carboxyl and imidazole moieties resulted from a loss of substrate binding and catalysis in the glucosyl transfer reaction. Modification of a single histidine residue completely destroyed the ability of limonoid glucosyltransferase to transfer the D-glucopyranosyl unit. Tryptophan appeared to play some role in maintaining the active conformation of the catalytic site, and lysine also appeared to have some direct or indirect role in catalysis, whereas modification of serine and cysteine had no effect on catalysis. We therefore conclude that amino acids containing carboxyl and imidazole groups are responsible for the catalytic action of the enzyme.

15.
High-throughput microsomal stability assays have been widely implemented in drug discovery, and many companies have accumulated experimental measurements for thousands of compounds. Such datasets have been used to develop in silico models that predict metabolic stability and guide the selection of promising candidates for synthesis. This approach has proven most effective when selecting compounds from proposed virtual libraries before synthesis. However, these models are not easily interpretable at the structural level and thus provide little insight to guide traditional synthetic efforts. We developed global classification models of rat, mouse, and human liver microsomal stability using in-house data. The models were built with FCFP_6 fingerprints and a Naïve Bayesian classifier in Pipeline Pilot. The test sets were correctly classified as stable or unstable with satisfactory accuracies of 78%, 77%, and 75% for the rat, human, and mouse models, respectively. Prediction confidence was assigned from the Bayesian score to assess the applicability of the models. Using the resulting models, we developed a novel data mining strategy to identify structural features associated with good and poor microsomal stability, and we used the same approach to identify structural features that are good for one species but poor for another. These findings should allow structure-metabolism relationships to be understood faster and earlier in drug discovery.

16.
Headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography (GC) and multivariate data analysis was applied to classify different vinegar types (white and red wine, balsamic, sherry, and cider vinegars) on the basis of their volatile composition. The chromatographic signals were analysed with stepwise linear discriminant analysis (SLDA), which performs feature selection and classification simultaneously. Several options, more or less restrictive according to the final number of categories considered, were explored to identify the one with the highest discriminating ability. The simplicity and effectiveness of the proposed classification methodology (all samples were correctly classified and predicted by cross-validation) are promising and support the feasibility of using a similar strategy to evaluate the quality and origin of vinegar samples reliably, quickly, reproducibly, and cost-efficiently in routine applications. These high-quality results are all the more remarkable given the small number of discriminant variables finally selected by the stepwise procedure: only 14 peaks were needed to differentiate cider, balsamic, sherry, and wine vinegars, and only 3 variables were selected to discriminate between red wine (RW) and white wine (WW) vinegars. Subsequent identification by gas chromatography-mass spectrometry (GC-MS) of the volatile compounds associated with the selected discriminant peaks helped interpret their chemical significance.

17.
Small-molecule aggregators non-specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high-throughput screening campaigns. Computational methods for identifying aggregators have been explored, but they have not been tested in screening large compound libraries. We used 1319 aggregators and 128,325 non-aggregators to develop a support vector machine (SVM) aggregator identification model, which was evaluated in four ways. The first is fivefold cross-validation, which showed aggregator identification rates comparable to, and non-aggregator identification rates significantly better than, those of earlier studies. The second is an independent test of 17 aggregators discovered independently of the training aggregators, 71% of which were correctly identified. The third is retrospective screening of 13M PUBCHEM and 168K MDDR compounds, which predicted 97.9% of the PUBCHEM compounds and 98.7% of the MDDR compounds as non-aggregators. The fourth is retrospective screening of 5527 MDDR compounds similar to the known aggregators, 1.14% of which were predicted as aggregators. In fivefold cross-validation studies with the same settings, SVM showed slightly better overall performance than two other machine learning methods. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM showed a substantial capability to identify aggregators in large libraries at low false-hit rates. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010
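With 1319 aggregators against 128,325 non-aggregators, a plain random split could leave folds with noticeably different class ratios. The abstract does not say how its folds were built; the sketch below assumes stratified fivefold splitting, which deals each class's shuffled indices round-robin across the folds so that every fold keeps roughly the overall class ratio. The function names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Return k lists of test indices; each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)                  # randomise order within the class
        for j, i in enumerate(idx):
            folds[j % k].append(i)        # deal indices round-robin
    return folds

def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_kfold(labels, k, seed)
    all_idx = set(range(len(labels)))
    for test in folds:
        yield sorted(all_idx - set(test)), sorted(test)
```

Each compound appears in exactly one test fold, so every prediction used to compute the cross-validated rates comes from a model that never saw that compound during training.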

18.
Racemomycin-B (RM-B), the main component of the antibiotics produced by Streptomyces lavendulae OP-2 (accounting for 50% of them), is a streptothricin antibiotic that contains three β-lysine moieties in its molecule. RM-B showed antimicrobial activity against plant-pathogenic microorganisms and growth-inhibitory activity against the roots of Brassica rapa L. at a concentration of 50 ppm. It strongly inhibited the growth of Pseudomonas syringae pv. tabaci IFO-3508 (minimum inhibitory concentration (MIC): 0.4 µg/mL) and also showed antifungal activity against six kinds of Fusarium oxysporum (MIC: 0.1-2.0 µg/mL). The antimicrobial activity of RM-B was much stronger than that of RM-A and RM-C, which contain one and two β-lysine moieties, respectively. The activities therefore ranked RM-B > RM-C > RM-A; that is, the biological activity of racemomycin compounds tended to increase with the number of β-lysine moieties in the molecule.

19.
The application of a potentiometric multisensor system (electronic tongue, ET) to quantify the bitter taste of structurally diverse active pharmaceutical ingredients (APIs) is reported. Measurements were performed on a set of bitter substances that had been assessed by a professional human sensory panel and by the in vivo rat brief access taste aversion (BATA) model to produce bitterness intensity scores for each substance at different concentrations. The set consisted of eight inorganic and organic substances: azelastine, caffeine, chlorhexidine, potassium nitrate, naratriptan, paracetamol, quinine, and sumatriptan. To enhance the response of the sensors to the studied APIs, measurements were carried out at pH levels ranging from 2 to 10, thus promoting ionization of the compounds. This experiment yielded a three-way data array (samples × sensors × pH levels), from which 3wayPLS regression models were constructed using both the human panel and the rat model reference data. These models showed that bitter taste in the chosen set of APIs can be assessed artificially with the ET, with average relative errors of 16% in terms of the human panel bitterness score and 25% in terms of inhibition values from the in vivo rat model data. The 3wayPLS models were then applied to predict the bitterness of blind test samples from a further set of APIs, and the predictions were compared with the inhibition values obtained from the in vivo rat model.

20.
Many Gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, called type IV secreted effectors (T4SEs), manipulate host cell processes during infection, often resulting in severe disease or even death of the host. Identifying putative T4SEs has therefore become a very active research topic in bioinformatics because of their vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been shown experimentally to provide important, discriminative evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed to identify T4SEs by extracting evolutionary features from position-specific scoring matrix and position-specific frequency matrix profiles. First, four encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Next, a feature selection technique based on the random forest algorithm was used to remove redundant or irrelevant features without much loss of information. Finally, the optimal features were fed into a support vector machine classifier to predict T4SEs. Our experimental results on an independent test dataset demonstrated that iT4SE-EP outperformed most existing methods.
