Similar Documents
 20 similar documents found (search time: 46 ms)
1.
For over a decade, cheminformatics has contributed to a wide array of scientific tasks from analytical chemistry and biochemistry to pharmacology and drug discovery; and although its contributions to decision making are recognized, the challenge is how it will contribute to faster development of novel, better products. Here we address the future of cheminformatics with a primary focus on innovation. Cheminformatics developers often need to choose between “mainstream” (i.e., accepted, expected) and novel, leading-edge tools, amid an increasing trend toward open science. Possible futures for cheminformatics include the worst-case scenario (lack of funding, no creative usage) as well as the best-case scenario (complete integration, from systems biology to virtual physiology). As “-omics” technologies advance and computer hardware improves, compounds will no longer be profiled only at the molecular level but also in terms of genetic and clinical effects. Among potentially novel tools, we anticipate machine learning models based on free-text processing, improved performance in environmental cheminformatics, significant decision-making support, as well as the emergence of robot scientists conducting automated drug discovery research. Furthermore, cheminformatics is anticipated to expand the frontiers of knowledge and evolve in an open-ended, extensible manner, allowing us to explore multiple research scenarios in order to avoid an epistemological “local information minimum trap”.

2.
Cheminformatics is used to validate the capabilities of widely used quantum chemistry and molecular mechanics methods. Among the quantum methods examined are the semiempirical MNDO, AM1, and PM3 methods, Hartree-Fock (ab initio) at a range of basis set levels, density functional theory (DFT) at a range of basis set levels, and a post-Hartree-Fock method, local Møller-Plesset second-order perturbation theory (LMP2). Among the force fields compared are AMBER, MMFF94, MMFF94s, OPLS/A, OPLS-AA, Sybyl, and Tripos. Programs used are Spartan, MacroModel, SYBYL, and Jaguar. The test molecule is (2-amino-5-thiazolyl)-alpha-(methoxyimino)-N-methylacetamide, a model of the aminothiazole methoxime (ATMO) side chain of third-generation cephalosporin antibacterial agents. The Ward hierarchical clustering technique yields an insightful comparison of experimental (X-ray) and calculated (energy-optimized) bond lengths and bond angles. The computational chemistry methods are also compared in terms of the potential energy curves they predict for internal rotation. Clustering analysis and regression analysis are compared. The MMFF94 force field, as implemented in MacroModel, is the best overall computational chemistry method at reproducing crystallographic data and conformational properties of the ATMO moiety. This work demonstrates that going to a higher level of quantum theory does not necessarily give better results and that quantum mechanical results are not necessarily better than molecular mechanics results.
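A comparison of this kind can be sketched with SciPy's Ward linkage; the geometry values below are invented placeholders standing in for the experimental and optimized bond lengths and angles, not the published data.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

methods = ["X-ray", "AM1", "PM3", "HF/6-31G*", "B3LYP/6-31G*", "MMFF94"]
# One row per method; columns are selected bond lengths (angstrom) and bond angles (degrees).
geometries = np.array([
    [1.33, 1.47, 1.22, 120.5, 111.2],   # experimental reference
    [1.35, 1.45, 1.24, 121.0, 112.0],
    [1.36, 1.46, 1.23, 119.8, 110.5],
    [1.32, 1.48, 1.21, 120.9, 111.8],
    [1.34, 1.47, 1.22, 120.6, 111.4],
    [1.33, 1.47, 1.22, 120.4, 111.1],
])

# Standardize columns so bond lengths and angles contribute on comparable scales.
z = (geometries - geometries.mean(axis=0)) / geometries.std(axis=0)

# Ward linkage groups methods whose geometries resemble each other; methods that
# cluster with the X-ray row reproduce the crystallographic geometry most closely.
tree = linkage(z, method="ward")
print(dict(zip(methods, fcluster(tree, t=2, criterion="maxclust"))))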

3.

Background  

Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit.
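A minimal sketch of the kind of task Pybel streamlines, assuming the OpenBabel 3.x packaging (where the module is imported as openbabel.pybel); the SDF filename is a placeholder.

from openbabel import pybel  # OpenBabel 2.x users would write "import pybel" instead

# Parse a SMILES string into a molecule and inspect simple properties.
mol = pybel.readstring("smi", "CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(mol.molwt, mol.formula)

# readfile returns a generator, so large multi-molecule files are read lazily;
# "library.sdf" is a placeholder filename.
# for m in pybel.readfile("sdf", "library.sdf"):
#     print(m.title, m.calcfp().bits[:5])   # fingerprint bits for similarity work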

4.
5.
This tutorial provides a concise overview of support vector machines and closely related techniques for pattern classification. The tutorial starts with the formulation of support vector machines for classification. The method of least squares support vector machines is explained. Approaches for obtaining a probabilistic interpretation are covered, and it is explained how the binary classification techniques can be extended to multi-class methods. Kernel logistic regression, which is closely related to iteratively weighted least squares support vector machines, is discussed. Several practical aspects of these methods are addressed: feature selection, parameter tuning, unbalanced data sets, model evaluation, and statistical comparison. The different concepts are illustrated on three real-life applications in the fields of metabolomics, genetics, and proteomics.
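The core workflow the tutorial covers (SVM classification, parameter tuning by cross-validation, probabilistic outputs) might look as follows in scikit-learn; the data set and parameter grid are illustrative, not taken from the tutorial.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Tune the regularization constant and kernel width by cross-validation,
# one of the practical aspects the tutorial discusses.
grid = GridSearchCV(SVC(kernel="rbf", probability=True),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
                    cv=5)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))
print(grid.predict_proba(X_te[:3]))  # probabilistic interpretation via Platt scaling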

6.
7.
Computers have changed the way we do science. Surrounded by a sea of data and with phenomenal computing capacity, the methodology and approach to scientific problems are evolving into a partnership between experiment, theory and data analysis. Given the pace of change of the last twenty-five years, it seems folly to speculate on the future, but along with unpredictable leaps of progress there will be a continuous evolution of capability, which points to opportunities and improvements that will certainly appear as our discipline matures.

8.
Vibrational Spectroscopy, 2010, 52(2): 276-282
The combination of NIR spectroscopy with three classification algorithms, i.e., multi-class support vector machine (BSVM), k-nearest neighbor (KNN), and soft independent modeling of class analogies (SIMCA), was explored for discriminating different brands of cigarettes. The influence of the training set size on the relative performance of each algorithm was also investigated. A NIR spectral dataset involving the classification of cigarettes of three brands was used for illustration. Three performance criteria based on the “correctly classified rate (CCR)”, i.e., “average CCR”, “95th percentile of CCR”, and “S.D. of CCR”, were defined to compare the different algorithms. It was revealed that BSVM is significantly better than KNN or SIMCA in the statistical sense, especially in cases where the training set is relatively small. The results suggest that NIR spectroscopy together with BSVM could be an alternative to traditional methods for discriminating different brands of cigarettes.
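A sketch of the repeated random-split comparison based on the correctly classified rate; synthetic data stand in for the NIR spectra, scikit-learn's SVC and KNeighborsClassifier stand in for the paper's BSVM and KNN implementations, and SIMCA is omitted because scikit-learn ships no implementation of it.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=240, n_features=50, n_classes=3,
                           n_informative=10, random_state=1)

def ccr_stats(model, n_repeats=50, train_size=30):
    """Average CCR, 95th percentile of CCR and S.D. of CCR over repeated random splits."""
    ccrs = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, stratify=y, random_state=seed)
        ccrs.append(model.fit(X_tr, y_tr).score(X_te, y_te))
    ccrs = np.array(ccrs)
    return ccrs.mean(), np.percentile(ccrs, 95), ccrs.std()

print("SVM:", ccr_stats(SVC(kernel="rbf", gamma="scale")))
print("KNN:", ccr_stats(KNeighborsClassifier(n_neighbors=3)))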

9.

10.
Summary: Microprocessor-controlled analytical methods produce large amounts of data in a short time. This wealth of information is difficult to handle with conventional methods. Numerical classification is a valuable tool for reducing large arrays of data and revealing structure therein. Non-hierarchical methods have been used with good results for the classification of spectral analytical data of ancient glass samples. With the newly developed program APART, a classification was obtained according to technological aspects of glass production. The narrow clusters found were caused by local traditions and the use of similar raw materials. The colouring materials and decolourants contribute most to the trace element content, and are therefore dominant in the classification. (A non-hierarchical clustering sketch follows this entry.)
Non-hierarchical classification in analytical chemistry
Summary: New microprocessor-controlled analytical methods yield a wealth of data in a short time. Such volumes of data are difficult to survey with conventional methods. Numerical classification has made it possible to process and interpret even large data sets. For the classification of most analytical data, the use of non-hierarchical algorithms is recommended. Spectroscopic data from ancient glass finds were subjected to a classification analysis. A newly developed program named APART was used, which yielded a classification according to production-technology criteria. Local manufacturing traditions and the use of similar raw materials produced a structure of well-defined, narrow clusters in the sample material. The colouring and decolouring agents used in glass production account for particularly high trace element contents and therefore largely determine the classification.


Presented at the 8th International Microchemical Symposium, Graz, August 25–30, 1980.
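APART itself is not publicly available; as a generic illustration of non-hierarchical (partitioning) classification of trace-element data, the sketch below uses k-means from scikit-learn on invented concentrations.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Rows: glass samples; columns: trace-element concentrations (four hypothetical elements).
workshop_a = rng.normal([200, 30, 50, 400], 20, size=(15, 4))
workshop_b = rng.normal([800, 120, 40, 60], 40, size=(15, 4))
samples = np.vstack([workshop_a, workshop_b])

# Column scaling is a common preprocessing choice; the scaling used in the
# original study is not described in the abstract.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(samples))
print(labels)   # non-hierarchical partition of the samples into two groups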

11.
There are compelling needs from a variety of camps for more chemistry data to be available. While there are funder and government mandates for depositing research data in the United States and Europe, this does not mean it will be done well or expediently. Chemists themselves do not appear overly engaged at this stage, and chemistry librarians, who work directly with chemists and their local information environments, are interested in helping with this challenge. Our unique understanding of how to organize data and information enables us to contribute to building the necessary infrastructure and to establishing standards and best practices across the full research data cycle. As few support structures focused on chemistry currently exist, we are initiating explorations through a few case studies and focused pilot projects presented here, with the aim of identifying opportunities for increased collaboration among chemists, chemistry librarians, cheminformaticians and other chemistry professionals.

12.
The wide application of next-generation sequencing has presented a new hurdle to bioinformatics for managing the fast-growing sequence data. The management of biomacromolecules at the chemistry level imposes an even greater challenge in cheminformatics because of the lack of a good chemical representation of biopolymers. Here we introduce the self-contained sequence representation (SCSR). SCSR combines the best features of bioinformatics and cheminformatics notations. SCSR is the first general, extensible, and comprehensive representation of biopolymers in a compressed format that retains chemistry detail. The SCSR-based high-performance exact structure and substructure searching methods (NEMA key and SSS) offer new ways to search biopolymers that complement bioinformatics approaches. The widely used chemical structure file format (molfile) has been enhanced to support SCSR. SCSR offers a solid framework for future development of new methods and systems for managing and handling sequences at the chemistry level. SCSR lays the foundation for the integration of bioinformatics and cheminformatics.

13.
The main tests developed over the last 20 years to investigate chromatographic behaviour and stationary phase properties are described in this paper. These properties are the hydrophobicity, which depends on the surface area and the bonding density; the number of accessible residual silanol groups, sometimes of differing acidity, which can interact with neutral solutes by hydrogen bonds or with the ionic form of basic compounds; and the shape or steric selectivity, which depends on both the functionality of the silanising agent and the bonding density. Two types of tests are performed, based either on key solutes having well-defined properties, such as phenol, caffeine, amitriptyline, benzylamine, acenaphthene, o-terphenyl, triphenylene, p-ethylaniline and carotenoid pigments, or on retention models (solvation parameter, hydrophobic subtraction) obtained from the analyses of numerous and varied compounds. Thus, the chromatographic properties are either related to selectivities or retention factors calculated from key solutes, or they are described by interaction coefficients obtained by multilinear regression from retention models. Three types of comparison methods are used based on these data. First, simple plots allow the study of differences between the columns with respect to one or two properties; columns located in the same area of the plot display similar properties. Second, chemometric methods such as principal component analysis (PCA) or hierarchical cluster analysis (HCA) can be used to compare columns. In this case, all the studied properties are included in the comparison, done either by data projection to reduce the space in which the information is located (PCA) or by distance calculation and comparison to draw a classification (HCA); neighbouring columns are expected to provide identical chromatographic performances. These two chemometric methods can be used together, PCA before HCA. The third way is to calculate a discrimination factor from a reference column, through calculation methods based on the Pythagorean theorem: the lower this factor, the closer the column properties. Following the presentation of the analytical conditions, the compounds and the data treatments used by the teams working in this field, the pertinence of the different selectivities, i.e. of the different probe solute pairs or of the different interaction coefficients, will be discussed with regard to their discrimination capacity. The accuracy of chemometric treatments in discriminating stationary phases having different functionalities (octadecylsiloxane (ODS), cyano, fluorinated, phenyl, polar embedded group or "aqua" type) will be discussed, as well as their performance in the finer discrimination among ODS phases. New two-dimensional plots, built from data gathered in different studies, will be suggested to improve the classification of stationary phases bearing bonded chains of different natures.
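The three comparison routes (plots of one or two properties, PCA/HCA on all properties, and a distance-based discrimination factor) can be sketched as follows; the column names and descriptor values are invented placeholders.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

columns = ["ODS-1", "ODS-2", "Phenyl", "CN", "Aqua"]
# Rows: columns; cols: e.g. hydrophobicity, silanol activity, steric selectivity.
descriptors = np.array([
    [1.00, 0.20, 1.45],
    [0.96, 0.25, 1.40],
    [0.55, 0.35, 1.90],
    [0.30, 0.50, 1.10],
    [0.80, 0.10, 1.55],
])
z = StandardScaler().fit_transform(descriptors)

# PCA projects the columns onto a low-dimensional map for visual comparison ...
scores = PCA(n_components=2).fit_transform(z)
# ... and HCA groups columns expected to give similar chromatographic behaviour.
groups = fcluster(linkage(z, method="ward"), t=2, criterion="maxclust")

# Discrimination factor relative to a reference column (Euclidean distance):
# the smaller the value, the closer the column is to the reference.
ref = z[0]
distance = np.linalg.norm(z - ref, axis=1)
for name, s, g, d in zip(columns, scores, groups, distance):
    print(f"{name:8s} PC1={s[0]:+.2f} PC2={s[1]:+.2f} cluster={g} F={d:.2f}")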

14.
Accreditation and Quality Assurance - To analyze a drinking water dataset, various statistical methods have been applied, including discriminant analysis, logistic regression and cluster analysis, to...

15.
16.
Near-infrared spectroscopy has gained great acceptance in industry due to its multiple applications and versatility. Sometimes, however, the construction of accurate and robust calibration models involves the collection of a large number of samples with associated reference analyses, which can complicate and prolong the calibration stage. In this paper, ensemble methods and data augmentation by noise simulation have been applied to spectroscopic data in combination with PLSR to obtain robust models able to handle the different types of perturbations likely to affect NIR data. Several types of noise have been investigated, as well as different ensemble methods, with the aim of obtaining robust PLS models able to predict both the original and the perturbed test data. The suitability of ensemble methods for building robust calibration models has been investigated and compared to extended multiplicative signal correction (EMSC) and other calibration approaches in a real case of temperature compensation. EMSC and ensemble methods appear to be the most appropriate methods, yielding the best results in terms of accuracy and prediction ability with a reduced calibration data set.
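A rough sketch of the approach, assuming additive random noise as the simulated perturbation and simple prediction averaging as the ensemble rule; synthetic data stand in for the NIR spectra and reference values.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))                    # 60 calibration "spectra"
y = X[:, :10].sum(axis=1) + rng.normal(0, 0.1, 60)

def augment(X, y, n_copies=5, noise=0.05, rng=rng):
    """Replicate each spectrum with additive random noise (one possible perturbation)."""
    Xa = [X] + [X + rng.normal(0, noise, X.shape) for _ in range(n_copies)]
    return np.vstack(Xa), np.tile(y, n_copies + 1)

# Ensemble: each member is fitted on a differently perturbed replicate of the data.
ensemble = []
for seed in range(10):
    member_rng = np.random.default_rng(seed)
    Xa, ya = augment(X, y, rng=member_rng)
    ensemble.append(PLSRegression(n_components=5).fit(Xa, ya))

X_test = X[:5] + rng.normal(0, 0.05, (5, 200))    # perturbed test spectra
y_pred = np.mean([m.predict(X_test).ravel() for m in ensemble], axis=0)
print(np.round(y_pred, 2), np.round(y[:5], 2))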

17.
18.
We propose a new classification method for the prediction of drug properties, called random feature subset boosting for linear discriminant analysis (LDA). The main novelty of this method is the ability to overcome the problems with constructing ensembles of linear discriminant models based on generalized eigenvectors of covariance matrices. Such linear models are popular in building classification-based structure-activity relationships. The introduction of ensembles of LDA models allows for an analysis of more complex problems than by using single LDA, for example, those involving multiple mechanisms of action. Using four data sets, we show experimentally that the method is competitive with other recently studied chemoinformatic methods, including support vector machines and models based on decision trees. We present an easy scheme for interpreting the model despite its apparent sophistication. We also outline theoretical evidence as to why, contrary to the conventional AdaBoost ensemble algorithm, this method is able to increase the accuracy of LDA models.
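The paper's exact algorithm is not reproduced here; the sketch below shows only the random-feature-subset part of the idea (an ensemble of LDA models fitted on random descriptor subsets and combined by vote), without the boosting step.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=60, n_informative=15,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
members = []
for _ in range(25):
    feats = rng.choice(X.shape[1], size=20, replace=False)   # random feature subset
    members.append((feats, LinearDiscriminantAnalysis().fit(X_tr[:, feats], y_tr)))

# Majority vote across the ensemble (binary labels 0/1, odd number of members).
votes = np.array([m.predict(X_te[:, f]) for f, m in members])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print("single LDA:", LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te))
print("ensemble  :", (y_hat == y_te).mean())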

19.
Web-based tools offer many advantages for processing chemical information, most notably ease of use and high interactivity. Therefore, more and more pharmaceutical companies are using web technology to deliver sophisticated molecular processing tools directly to the desks of their chemists, to assist them in the process of designing and developing new drugs. In this paper, the web-based cheminformatics system developed at Novartis and currently used by more than a thousand users is described. The system supports various molecular modeling and molecular processing tasks, including the calculation of molecular and substituent properties, property-based virtual screening, visualization of molecules, bioisosteric design, diversity analysis, and support of combinatorial chemistry. The methodology used to calculate various molecular properties relevant to drug design is described, including the prediction of intestinal absorption, blood-brain barrier penetration, efflux, and water solubility. Information about the web technology used is also provided.
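The Novartis system itself is proprietary; as an open-source stand-in for the kind of property calculation such tools expose, the sketch below uses RDKit (an assumption, not mentioned in the paper) to compute descriptors commonly consulted in absorption and permeability rules of thumb.

from rdkit import Chem
from rdkit.Chem import Descriptors

for smiles in ["CC(=O)Oc1ccccc1C(=O)O", "CN1CCC[C@H]1c1cccnc1"]:  # aspirin, nicotine
    mol = Chem.MolFromSmiles(smiles)
    props = {
        "MW":   Descriptors.MolWt(mol),        # molecular weight
        "logP": Descriptors.MolLogP(mol),      # lipophilicity estimate
        "HBD":  Descriptors.NumHDonors(mol),   # hydrogen-bond donors
        "HBA":  Descriptors.NumHAcceptors(mol),# hydrogen-bond acceptors
        "TPSA": Descriptors.TPSA(mol),         # topological polar surface area
    }
    print(smiles, {k: round(v, 1) for k, v in props.items()})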

20.
One of the drawbacks of using linear discriminant analysis (LDA) is the presence of outliers. Several methods of detecting outliers are compared and applied to a particular database. When multivariate methods (a multinormal distribution procedure and Hawkins' procedure) were applied, the two subsets produced did not differ greatly. The assumptions needed for the application of LDA were evaluated for each subset. Classification ability, feature selection and prediction ability were considered for each subset. Results for each subset were quite different. Hawkins' procedure seems to be the better method for detecting outliers.
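One common multivariate outlier screen, a Mahalanobis distance test against a chi-squared cutoff followed by LDA on the retained samples, is sketched below; it illustrates the general workflow rather than the specific procedures compared in the paper.

import numpy as np
from scipy.stats import chi2
from sklearn.covariance import EmpiricalCovariance
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)
X[:3] += 8.0                                  # plant a few gross outliers

# Squared Mahalanobis distances follow approximately a chi-squared distribution
# with p degrees of freedom for multinormal data; flag the 1% tail as outliers.
d2 = EmpiricalCovariance().fit(X).mahalanobis(X)
keep = d2 < chi2.ppf(0.99, df=X.shape[1])
print("flagged outliers:", np.where(~keep)[0])

# Fit LDA on the cleaned subset and check its classification ability.
lda = LinearDiscriminantAnalysis().fit(X[keep], y[keep])
print("classification accuracy on retained samples:", lda.score(X[keep], y[keep]))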
