Similar literature
Found 20 similar documents (search time: 93 ms)
1.
This study unites six popular machine learning approaches to enhance the prediction of molecular binding affinity between receptors (large protein molecules) and ligands (small organic molecules). Here we examine a scheme where the affinity of ligands is predicted against a single receptor, human thrombin; the models therefore consider ligand features only. However, the suggested approach can be repurposed for other receptors. The methods include Support Vector Machine, Random Forest, CatBoost, feed-forward neural network, graph neural network, and Bidirectional Encoder Representations from Transformers. The first five methods use input features based on physico-chemical properties of molecules, while the last one is based on textual molecular representations. None of the approaches relies on atomic spatial coordinates, which avoids a potential bias from known structures and allows generalization to compounds with unknown conformations. Within each of the methods, we have trained two models that solve classification and regression tasks. Then, all models are grouped into a pipeline of two consecutive ensembles. The first ensemble aggregates six classification models which vote whether a ligand binds to a receptor or not. If a ligand is classified as active (i.e., binds), the second ensemble predicts its binding affinity in terms of the inhibition constant Ki.
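
The two-ensemble pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the strict-majority voting rule, the averaging of regressor outputs, the function names, and the toy stand-in models and feature names (`logp`, `mass`, `hbd`) are all assumptions made for the example.

```python
from statistics import mean

def majority_vote(votes):
    """Return True when more than half of the classifier votes are 'active'."""
    return sum(votes) > len(votes) / 2

def two_stage_predict(ligand, classifiers, regressors):
    """Stage 1: classifiers vote on activity. Stage 2: regressors estimate affinity.

    Returns None for ligands voted inactive, otherwise the mean regressor output.
    Averaging is an assumption; the abstract does not say how outputs are combined.
    """
    votes = [clf(ligand) for clf in classifiers]
    if not majority_vote(votes):
        return None
    return mean(reg(ligand) for reg in regressors)

# Toy stand-ins for trained models (hypothetical rules, not real SVM/CatBoost models).
classifiers = [
    lambda x: x["logp"] > 1.0,
    lambda x: x["mass"] < 500,
    lambda x: x["hbd"] <= 5,
]
regressors = [
    lambda x: 6.0 + 0.5 * x["logp"],
    lambda x: 7.0 - 0.001 * x["mass"],
]

active = {"logp": 2.0, "mass": 400.0, "hbd": 3}
inactive = {"logp": 0.5, "mass": 650.0, "hbd": 2}

affinity_active = two_stage_predict(active, classifiers, regressors)
affinity_inactive = two_stage_predict(inactive, classifiers, regressors)
```

Gating the regressors behind the vote means an affinity is only reported for ligands the first ensemble considers binders, which mirrors the order of the two ensembles in the abstract.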

2.
Recent computational methods have made strides in discovering well-structured cyclic peptides that preferentially populate a single conformation. However, many successful cyclic-peptide therapeutics adopt multiple conformations in solution. In fact, the chameleonic properties of some cyclic peptides are likely responsible for their high cell membrane permeability. Thus, we require the ability to predict complete structural ensembles for cyclic peptides, including the majority of cyclic peptides that have broad structural ensembles, to significantly improve our ability to rationally design cyclic-peptide therapeutics. Here, we introduce the idea of using molecular dynamics simulation results to train machine learning models to enable efficient structure prediction for cyclic peptides. Using molecular dynamics simulation results for several hundred cyclic pentapeptides as the training datasets, we developed machine-learning models that can provide molecular dynamics simulation-quality predictions of structural ensembles for all the hundreds of thousands of sequences in the entire sequence space. The prediction for each individual cyclic peptide can be made using less than 1 second of computation time. Even for the most challenging classes of poorly structured cyclic peptides with broad conformational ensembles, our predictions were similar to those one would normally obtain only after running multiple days of explicit-solvent molecular dynamics simulations. The resulting method, termed StrEAMM (Structural Ensembles Achieved by Molecular Dynamics and Machine Learning), is the first technique capable of efficiently predicting complete structural ensembles of cyclic peptides without relying on additional molecular dynamics simulations, constituting a seven-order-of-magnitude improvement in speed while retaining the same accuracy as explicit-solvent simulations.

The StrEAMM method enables predicting the structural ensembles of cyclic peptides that adopt multiple conformations in solution.

3.
We have generated an open-source dataset of over 30 000 organic chemistry gas phase partition functions. With this data, a machine learning deep neural network estimator was trained to predict partition functions of unknown organic chemistry gas phase transition states. This estimator only relies on reactant and product geometries and partition functions. A second machine learning deep neural network was trained to predict partition functions of chemical species from their geometry. Our models accurately predict the logarithm of test set partition functions with a maximum mean absolute error of 2.7%. Thus, this approach provides a means to reduce the cost of computing reaction rate constants ab initio. The models were also used to compute transition state theory reaction rate constant prefactors and the results were in quantitative agreement with the corresponding ab initio calculations with an accuracy of 98.3% on the log scale.

Deep neural networks accurately predict transition state partition functions at the low cost of reactant and product input features for organic chemistry gas phase reactions.
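
To show how predicted partition functions feed into a transition state theory prefactor, here is a minimal sketch working on the log scale. It assumes the standard TST prefactor (k_B T / h) · Q_TS / ∏ Q_reactants with dimensionless partition functions; the function name and the example values are hypothetical and are not taken from the paper.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)
H = 6.62607015e-34   # Planck constant, J*s (exact SI value)

def log_tst_prefactor(log_q_ts, log_q_reactants, temperature):
    """Natural log of (k_B*T/h) * Q_TS / prod(Q_reactants).

    log_q_ts: ln of the transition state partition function
    log_q_reactants: list of ln partition functions, one per reactant
    temperature: temperature in kelvin
    """
    return math.log(K_B * temperature / H) + log_q_ts - sum(log_q_reactants)

# Hypothetical log partition functions for a unimolecular reaction at 300 K.
log_prefactor = log_tst_prefactor(log_q_ts=10.0, log_q_reactants=[10.0], temperature=300.0)
prefactor = math.exp(log_prefactor)  # in s^-1 for this unimolecular case
```

Working with logarithms keeps the arithmetic stable when partition functions span many orders of magnitude, and matches the log-scale quantities the abstract reports.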

4.
Machine learning (ML) methods have been present in the field of NMR for decades, but their use has grown tremendously in the last few years, especially thanks to the emergence of deep learning (DL) techniques that take advantage of the increased amounts of data and available computing power. These algorithms are successfully employed for classification, regression, clustering, or dimensionality reduction tasks on large data sets and have been intensively applied in different areas of NMR, including metabonomics, clinical diagnosis, and relaxometry. In this article, we concentrate on the various applications of ML/DL in the areas of NMR signal processing and analysis of small molecules, including automatic structure verification and prediction of NMR observables in solution.

5.
The results of the application of a density functional theory method incorporating dispersive corrections in the 2010 crystal structure prediction blind test are reported. The method correctly predicted four out of the six experimental structures. Three of the four correct predictions were found to have the lowest lattice energy of any crystal structure for that molecule. The experimental crystal structures for all six compounds were found during the structure generation phase of the simulations, indicating that the tailor-made force fields used for screening structures were valid and that the structure generation engine, which combines a Monte Carlo parallel tempering algorithm with an efficient lattice energy minimiser, was working effectively. For the three compounds for which the experimental crystal structures did not correspond to the lowest energy structures found, the method for calculating the lattice energy needs to be further refined or there may be other polymorphs that have not yet been found experimentally.

6.
We explore automation of protein structural classification using supervised machine learning methods on a set of 11,360 pairs of protein domains (up to 35% sequence identity) consisting of three secondary structure elements. Fifteen algorithms from five categories of supervised algorithms are evaluated for their ability to learn, for a pair of protein domains, the deepest common structural level within the SCOP hierarchy, given a one-dimensional representation of the domain structures. This representation encapsulates evolutionary information in terms of sequence identity and structural information characterising the secondary structure elements and lengths of the respective domains. The evaluation is performed in two steps, first selecting the best performing base learners and subsequently evaluating boosted and bagged meta learners. The boosted random forest, a collection of decision trees, is found to be the most accurate, with a cross-validated accuracy of 97.0% and F-measures of 0.97, 0.85, 0.93 and 0.98 for classification of proteins to the Class, Fold, Super-Family and Family levels in the SCOP hierarchy. The meta learning regime, especially boosting, improved performance by more accurately classifying the instances from less populated classes.

7.
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.

This article provides a lifeline for those lost in the sea of the molecular machine learning potentials by providing a balanced overview and evaluation of popular potentials.

8.
As more techniques are used to generate and characterise the organic solid state, many organic molecules are proving to have multiple crystalline forms, including polymorphs and solvates. The fundamental scientific and industrial interest in controlling crystallisation is inspiring the development of computational methods of predicting which crystal structures are thermodynamically feasible. Sometimes, computing this crystal energy landscape will reveal that a molecule has one way of packing with itself that is sufficiently more favourable than any other so only this crystal structure will be observed. More frequently, there will be many energy minima that are energetically feasible, showing approximately equi-energetic compromises between the various intermolecular interactions allowed by the conformational flexibility. Such cases generally lead to multiple solid forms. At the moment, we usually calculate the lattice energy landscape, an approximation to the real crystal energy landscape at 0 K. Despite its limitations, many studies show that this is a valuable complement to solid form screening, which helps in discovering new structures as well as rationalising the solid forms that are found in experimental searches. The range of factors that can determine which of the thermodynamically feasible crystal structures are observed as polymorphs shows the many further challenges in developing crystal energy landscapes as a tool for control of the organic solid state.

9.
The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, the location of disulfide bonds can strongly reduce the conformational search space by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposed a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, a sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine-pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. A comparison of the features employed in this study with those utilized in the existing nearest neighbor algorithm (NNA) method shows that the features used in this study improve jackknife-validation balanced accuracy by about 7.39 %. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29 % and 94.20 %, respectively.
Altogether, our predictor achieves an improvement of 43.25 % based on balanced accuracy compared to the existing NNA based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as improve the accuracy of the aiPSP method, which can further aid in experimental studies of the disulfide bond and structure determination.
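
The balanced accuracy metric quoted in this abstract, and the idea of passing stage-one cysteine predictions into the stage-two pair model, can be illustrated with a small sketch. Balanced accuracy is taken here in its usual sense, the mean of sensitivity and specificity; the `pair_features` helper and its three-element feature vector are hypothetical and far simpler than the feature set the abstract lists.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives).

    Labels are 1 (bonding) or 0 (non-bonding); both classes must be present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

def pair_features(p_i, p_j, seq_distance):
    """Hypothetical stage-2 input: stage-1 bonding probabilities of both
    cysteines plus the absolute sequence separation between them."""
    return [p_i, p_j, abs(seq_distance)]

# Toy labels: 2 bonding and 4 non-bonding cysteines.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # sensitivity 1/2, specificity 3/4
features = pair_features(0.9, 0.8, -12)
```

Balanced accuracy is a natural choice when bonding and non-bonding cysteines are unevenly represented, because each class contributes equally to the score regardless of its size.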

10.
11.
12.
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.

13.
14.
Summary: Preliminary results of a machine learning application concerning computer-aided molecular design applied to drug discovery are presented. The artificial intelligence techniques of machine learning use a sample of active and inactive compounds, which is viewed as a set of positive and negative examples, to allow the induction of a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm proceeds in two phases. In the first, the specialization step, the program identifies a number of active/inactive pairs of compounds that appear to be the most useful for making the learning process as effective as possible, and generates a dictionary of molecular fragments deemed to be responsible for the activity of the compounds. In the second phase, the generalization step, the fragments thus generated are combined and generalized in order to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base concerning physical and chemical properties is utilized during the inductive process.

15.
Following an initial Communication [Buch et al., J. Chem. Phys. 123, 051108 (2005)], a new molecular-dynamics-based approach is explored to search for candidate crystal structures of molecular solids corresponding to minima of the enthalpy. The approach is based on the observation of phase transitions in an artificial periodic system with a small unit cell and relies on the existence of an optimal energy range for observing freezing to low-lying minima in the course of classical trajectories. Tests are carried out for O structures of nine H2O-ice polymorphs. NVE trajectories for a range of preimposed box shapes display freezing to the different crystal polymorphs whenever the box dimensions approximate roughly the appropriate unit cell; the exception is ice II for which freezing requires unit cell dimensions close to the correct ones. In an alternate version of the algorithm, an initial box shape is picked at random and subsequently readjusted at short trajectory intervals by enthalpy minimization. Tests reveal the existence of ice forms which are "difficult" and "easy" to locate in this way. The former include ice IV, which is also difficult to crystallize experimentally from the liquid, and ice II, which does not interface with the liquid in the phase diagram. On the other hand, the latter crystal search procedure located successfully the remaining seven ice polymorphs, including ice V, which corresponds to the most complicated structure of all ice phases, with a monoclinic cell of 28 molecules.

16.
17.
18.
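Study of the crystal structure and molecular structure of Rabdophyllin G   Total citations: 2, self-citations: 0, citations by others: 2
Rabdophyllin G is an effective anticancer drug extracted from a Chinese medicinal herb. Its crystals are orthorhombic, space group P2₁2₁2₁, with unit-cell parameters a = 6.820(5), b = 9.076(1), c = 33.525(7) Å; Z = 4. The crystal structure and molecular structure of the compound were determined by X-ray diffraction, and the structure was refined to a final R factor of 0.0584. This work makes some corrections to the molecular structural formula that was recently derived by other experimental means. Certain structural features of the molecule and questions of its chemical activity are also discussed.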

19.
20.
A new molecular-dynamics based approach is proposed to search for candidate crystal structures of molecular solids. The procedure is based on the observation of spontaneous transitions between ordered and disordered states in molecular-dynamics simulations of an artificial periodic system with a small unit cell. In such a way only the most stable structures are automatically selected. The method can be applied to the solution of crystal structures from low-quality or very complex diffraction data. Tests are presented for H2O-ice polymorphs.
