Similar literature
 Found 20 similar articles; search took 15 ms
1.
2.
Inductive bias is the set of assumptions that a person or procedure makes in making a prediction based on data. Different methods for ligand-based predictive modeling have different inductive biases, with a particularly sharp contrast between 2D and 3D similarity methods. A unique aspect of ligand design is that the data that exist to test methodology have been largely man-made, and that this process of design involves prediction. By analyzing the molecular similarities of known drugs, we show that the historic drug discovery process has a very strong 2D inductive bias. In studying the performance of ligand-based modeling methods, it is critical to account for this issue in dataset preparation, use of computational controls, and in the interpretation of results. We propose specific strategies to explicitly address the problems posed by inductive bias considerations.

3.
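The complexity of the chemical sciences and their massive volumes of data create an opportunity for applications of artificial intelligence. By identifying new compounds, building new models, and proposing new theories from massive data, artificial intelligence, machine learning, and deep learning are changing the research paradigm for discovering, transforming, and functionally characterizing chemical substances, and are helping to solve major problems. This paper reviews important recent international progress in the use of artificial intelligence in chemical research and analyzes the main development trends in AI chemistry. Artificial intelligence is driving leapfrog development in chemistry by supporting the mining of massive chemical data, making chemical laboratories intelligent and automated, and strengthening the ability of computational chemistry to solve practical problems.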

4.
Building predictive models for iterative drug design in the absence of a known target protein structure is an important challenge. We present a novel technique, Compass, that removes a major obstacle to accurate prediction by automatically selecting conformations and alignments of molecules without the benefit of a characterized active site. The technique combines explicit representation of molecular shape with neural network learning methods to produce highly predictive models, even across chemically distinct classes of molecules. We apply the method to predicting human perception of musk odor and show how the resulting models can provide graphical guidance for chemical modifications.

5.
6.
The interaction between the equol enantiomers and β-cyclodextrin is studied by molecular mechanics and molecular dynamics calculations. The chromatographic retention order is determined by these theoretical methods and compared with experimental findings. In the molecular mechanics calculations, the simultaneous relaxation of the host and the guest molecules is allowed, both in a vacuum and in aqueous solution. In the molecular dynamics calculations, the interaction energy between each enantiomer and the cavity is determined by carrying out a simulation of 12 trajectories with different initial conditions at constant temperature (293 K), and minimising the energy of the structures extracted along the trajectories. To determine the preferential binding site and orientation of each guest molecule, the numerical density of presence in a volume element is calculated and compared with regions of maximum enantioselectivity. The more stable complex predicted in both cases is formed with R-equol, in agreement with experimental results.

7.
8.
Nowadays, cancer is considered a global pandemic and millions of people die every year because this disease remains a challenge for the world scientific community. Even with the efforts made to combat it, there is a growing need to discover and design new drugs and vaccines. Among these alternatives, antitumor peptides are a promising therapeutic solution to reduce the incidence of deaths caused by cancer. In the present study, we developed TTAgP, an accurate bioinformatic tool that uses the random forest algorithm for antitumor peptide predictions, which are presented in the context of MHC class I. The predictive model of TTAgP was trained and validated based on several features of 922 peptides. During model validation, we achieved a sensitivity of 0.89, a specificity of 0.92, an accuracy of 0.90 and a Matthews correlation coefficient of 0.79, which are indicative of a robust model. TTAgP is a fast, accurate and intuitive software tool focused on the prediction of tumor T cell antigens.
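The four validation measures quoted in this abstract all derive from a binary confusion matrix. The following is a minimal sketch of how they are computed; the counts used are hypothetical (the abstract does not report the confusion matrix or the class balance), so the result is not expected to reproduce the published values exactly.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only: 100 positives and 100 negatives.
metrics = binary_metrics(tp=89, fn=11, tn=92, fp=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside sensitivity and specificity for datasets that may be imbalanced.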

9.
10.
Dengue virus (DENV) has emerged as a rapidly spreading epidemic throughout the tropical and subtropical regions around the globe. No suitable drug has yet been designed to fight DENV; therefore, the need for a safe and effective antiviral drug has become imperative. The envelope protein of DENV is responsible for mediating the fusion process between viral and host membranes. This work reports an in silico approach to target B and T cell epitopes for dengue envelope protein inhibition. A conserved region “QHGTI” in B and T cell epitopes of dengue envelope glycoprotein was confirmed to be valid for targeting by visualizing its interactions with the host cell membrane TIM-1 protein, which acts as a receptor for serotypes 2 and 3. A reverse pharmacophore mapping approach was used to generate a seven-feature pharmacophore model on the basis of the predicted epitope. This pharmacophore model was used as a 3D query to virtually screen the chemical compound dataset “Chembridge”. A total of 1010 compounds mapped onto the developed pharmacophore model. These retrieved hits were filtered using Lipinski’s rule of five; as a result, 442 molecules were shortlisted for further assessment using molecular docking. Finally, 14 hits with different structural properties that interact with the active site residues of dengue envelope glycoprotein were selected as lead candidates. These structurally diverse lead candidates have a strong likelihood of serving as starting structures in the development of novel and potential drugs for the treatment of dengue fever.
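The Lipinski filtering step mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the descriptor values and compound names are hypothetical, the descriptors are assumed to be precomputed by some cheminformatics tool, and the tolerance of one violation is a common convention that the abstract does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (values here are hypothetical)."""
    name: str
    mol_weight: float        # molecular weight in Da
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria a compound breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumption: at most one violation is tolerated (a common convention)."""
    return lipinski_violations(d) <= max_violations


# Hypothetical pharmacophore hits, for illustration only.
hits = [
    Descriptors("hit_A", 342.4, 2.1, 2, 5),   # no violations
    Descriptors("hit_B", 612.7, 6.3, 4, 9),   # weight and logP violations
    Descriptors("hit_C", 498.0, 5.4, 1, 7),   # logP violation only
]
shortlisted = [d.name for d in hits if passes_rule_of_five(d)]
print(shortlisted)
```

Setting `max_violations=0` gives the strict form of the filter, which would also reject `hit_C` in this example.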

11.
12.
Here, we propose an in silico fragment-mapping method as a potential tool for fragment-based/structure-based drug discovery (FBDD/SBDD). For this method, we created a database named Canonical Subsite–Fragment DataBase (CSFDB) and developed a knowledge-based fragment-mapping program, Fsubsite. CSFDB consists of various pairs of subsite–fragments derived from X-ray crystal structures of known protein–ligand complexes. Using three-dimensional similarity-matching between subsites on one protein and another, Fsubsite compares the surface of a target protein with all subsites in CSFDB. When a local topography similar to the subsite is found on the surface, Fsubsite places a fragment combined with the subsite in CSFDB on the target protein. For validation purposes, we applied the method to the apo-structure of cyclin-dependent kinase 2 (CDK2) and identified four compounds containing three mapped fragments that existed in the list of known inhibitors of CDK2. Next, the utility of our fragment-mapping method for fragment-growing was examined on the complex structure of tRNA-guanine transglycosylase with a small ligand. Fsubsite mapped appropriate fragments on the same position as the binding ligand or in the vicinity of the ligand. Finally, a 3D-pharmacophore model was constructed from the fragments mapped on the apo-structure of heat shock protein 90-α (HSP90α). Then, 3D pharmacophore-based virtual screening was carried out using a commercially available compound database. The resultant hit compounds were very similar to a known ligand of HSP90α. These findings suggest that this in silico fragment-mapping method is a useful tool for computational FBDD and SBDD.

13.
The molecular imprinting approach provides a unique opportunity for the creation of three-dimensional cavities with tailored recognition properties. Over the last decade this field has expanded considerably, across a variety of disciplines, leading to novel approaches and many potential applications. Progress in the field of materials science has led to significant breakthroughs and the application of the imprinting approach to novel polymeric formats offers new insights and attractive methods for the preparation of synthetic receptors. In particular, nanomaterials have received considerable attention in the developing field of nanotechnology. With a large number of recent developments in the field of molecular imprinting available, this article is focused on a selection of new systems, in particular the different formats of nanomaterials, such as nanogels, nanofibres, nanowires and nanotubes.

14.
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in terms of how faithfully the individual error bars represent the actual prediction error.

15.
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in terms of how faithfully the individual error bars represent the actual prediction error.

16.
Nucleophilicity and electrophilicity dictate the reactivity of polar organic reactions. In the past decades, Mayr et al. established a quantitative scale for nucleophilicity (N) and electrophilicity (E), which proved to be a useful tool for the rationalization of chemical reactivity. In this study, a holistic prediction model was developed through a machine-learning approach. rSPOC, an ensemble molecular representation with structural, physicochemical and solvent features, was developed for this purpose. With 1115 nucleophiles, 285 electrophiles, and 22 solvents, the dataset is currently the largest one for reactivity prediction. The rSPOC model trained with the Extra Trees algorithm showed high accuracy in predicting Mayr's N and E parameters, with R² of 0.92 and 0.93 and MAE of 1.45 and 1.45, respectively. Furthermore, practical applications of the model, for instance nucleophilicity prediction for NADH, NADPH and a series of enamines, showed its potential for predicting the reactivity of previously uncharacterized molecules within seconds. An online prediction platform (http://isyn.luoszgroup.com/) was constructed based on the current model and is freely available to the scientific community.

17.
The stability of a series of hydrogen-bonded duplexes was studied using a molecular mechanics method with a modified AMBER GAFF force field, in which the original atomic charges were replaced by ones that are more appropriate for non-polar solvents. The free energy change of dimerization was calculated in vacuo and good agreement with experimental data was found. It is also shown that the stability of these duplexes increases linearly with the number of hydrogen bonds, in agreement with experimental data.

18.
Generation of huge “omics” data necessitates the development and application of computational methods to annotate the data in terms of biological features. In the context of DNA sequence, it is important to unravel the hidden physicochemical signatures. For this purpose, we have considered various sequence elements such as promoter, ACS, LTRs, telomere, and retrotransposon of the model organism Saccharomyces cerevisiae. Contributions due to di-nucleotides play a major role in studying the DNA conformation profile. The physicochemical parameters used are hydrogen bonding energy, stacking energy and solvation energy per base pair. Our computational study shows that all sequence elements in this study have distinctive physicochemical signatures and the same can be exploited for prediction experiments. The order that we see in a DNA sequence is dictated by biological regions and hence there is a dependency in the sequence makeup. Keeping this in mind, we propose two computational schemes: (a) using a windowing block size procedure and (b) using di-nucleotide transitions. We obtained a better discriminating profile when we analyzed the sequence data in a windowing manner. In the second novel approach, we introduced the di-nucleotide transition probability matrix (DTPM) to study the hidden layer of information embedded in the sequences. DTPM has been used as weights for scanning and predictions. This proposed computational scheme incorporates the memory property, which is more realistic for studying the physicochemical properties embedded in DNA sequences. Our analysis shows that the DTPM scheme performs better than the existing method in this applied region. Characterization of these elements will be a key to genome editing applications, and advanced machine learning approaches may also require such distinctive profiles as useful input features.
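The abstract does not define how the DTPM is constructed, so the sketch below is one plausible reading rather than the authors' method. It assumes that a "transition" is the step from one overlapping di-nucleotide to the next along the sequence, and that each row is normalized to sum to 1; the function name and the example sequence are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def dinucleotide_transition_matrix(seq: str) -> dict:
    """Estimate P(next di-nucleotide | current di-nucleotide) from one sequence.

    Assumption: di-nucleotides overlap by one base, so "ACG" contributes the
    single transition AC -> CG. Windows containing non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    counts = {d: defaultdict(int) for d in DINUCLEOTIDES}
    for i in range(len(seq) - 2):
        current, following = seq[i:i + 2], seq[i + 1:i + 3]
        if current in counts and following in counts:
            counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Rows with no observations are left empty rather than guessed.
        matrix[current] = {nxt: c / total for nxt, c in row.items()} if total else {}
    return matrix


# Hypothetical toy sequence, for illustration only.
dtpm = dinucleotide_transition_matrix("ACGACGT")
print(dtpm["AC"])
print(dtpm["CG"])
```

Because each row conditions on the preceding di-nucleotide, a matrix of this kind captures the short-range "memory" that the abstract contrasts with position-independent composition profiles.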

19.
20.
Multiple machine learning models were developed in this study to optimize biodiesel production from waste cooking oil in a heterogeneous catalytic reaction mode. Several input parameters were considered for the models, including reaction temperature, reaction time, catalyst loading, and methanol/oil molar ratio, whereas the percent biodiesel production yield was the only output. Three ensemble models were utilized in this study for optimization of the yield: Boosted Linear Regression, Boosted Multi-layer Perceptron, and Forest of Randomized Tree. We then found the optimized configuration (hyper-parameters) of each model. This critical task was done by running more than 1000 combinations of hyper-parameters. Finally, the R² scores for Boosted Linear Regression, Boosted Multi-layer Perceptron, and Forest of Randomized Tree were 0.926, 0.998, and 0.992, respectively. The MAPE criterion revealed that the error rates for Boosted Linear Regression, Boosted Multi-layer Perceptron, and Forest of Randomized Tree were 5.68 × 10⁻², 5.20 × 10⁻², and 9.83 × 10⁻², respectively. Furthermore, utilizing the input vector (X1 = 165, X2 = 5.72, X3 = 5.55, X4 = 13.0), the proposed technique produces an ideal output value of 96.7 % as the optimum yield in catalytic production of biodiesel from waste cooking oil.
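The two scores reported in this abstract, R² and MAPE, can be computed as in the minimal sketch below. The yield values are hypothetical and chosen only to make the arithmetic easy to follow; MAPE is expressed as a fraction rather than a percentage, matching the order of magnitude quoted in the abstract.

```python
def r2_score(y_true: list, y_pred: list) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: list, y_pred: list) -> float:
    """Mean absolute percentage error as a fraction (true values must be nonzero)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured and predicted biodiesel yields (%), for illustration only.
measured = [80.0, 90.0, 100.0]
predicted = [82.0, 90.0, 95.0]

r2 = r2_score(measured, predicted)
error = mape(measured, predicted)
print(f"R2 = {r2:.3f}, MAPE = {error:.3f}")
```

The two measures answer different questions: R² describes the share of variance explained, while MAPE describes the typical relative size of an error, so a model can rank differently on each, as the three models in the abstract do.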
