Similar articles
20 similar articles found.
1.
A variety of fields would benefit from accurate \(pK_a\) predictions, especially drug design, because a change in ionization state can alter a molecule's physicochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions for microscopic and macroscopic \(pK_a\)s of 24 drug-like small molecules. We recently built a general model for predicting \(pK_a\)s using a Gaussian process regression trained on physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a Scikit-learn Gaussian process to predict microscopic \(pK_a\)s, which are then used to analytically determine macroscopic \(pK_a\)s. Our Gaussian process is trained on a set of 2700 macroscopic \(pK_a\)s from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering that the challenge molecules are chemically diverse and often polyprotic while our training set is predominantly monoprotic. A particular priority when building this model was to include an uncertainty estimate, based on the chemistry of the molecule, that reflects the likely accuracy of each prediction. Our model reports large uncertainties for the molecules that appear to have chemistry outside our domain of applicability, along with good agreement in quantile–quantile plots, indicating it can predict its own accuracy. The challenge highlighted a variety of means to improve our model, including adding more polyprotic molecules to our training set and more carefully considering what functional groups we do or do not identify as ionizable.
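The abstract does not spell out how macroscopic \(pK_a\)s follow from microscopic ones. A minimal sketch for the simplest case, a molecule with two ionizable sites, is given below. It uses the standard microstate relations (the first macroscopic constant is the sum of the two first-step microscopic constants; the second is fixed by the thermodynamic cycle). The function name and arguments are hypothetical and this is not the authors' pipeline.

```python
import math


def macroscopic_pkas(pk_a: float, pk_b: float, pk_b_after_a: float) -> tuple[float, float]:
    """Macroscopic (pKa1, pKa2) of a two-site acid from microscopic pKas.

    pk_a, pk_b   : microscopic pKas for losing the proton at site a or site b
                   from the fully protonated form
    pk_b_after_a : microscopic pKa of site b once site a is already deprotonated
    """
    k_a = 10.0 ** -pk_a
    k_b = 10.0 ** -pk_b
    k_b_after_a = 10.0 ** -pk_b_after_a
    # Thermodynamic cycle: both routes to the doubly deprotonated form
    # must give the same overall equilibrium constant.
    k_a_after_b = k_a * k_b_after_a / k_b
    # First macroscopic step: either site may lose its proton.
    ka1 = k_a + k_b
    # Second macroscopic step: both singly deprotonated microstates feed it.
    ka2 = 1.0 / (1.0 / k_b_after_a + 1.0 / k_a_after_b)
    return -math.log10(ka1), -math.log10(ka2)
```

For two equivalent, non-interacting sites the macroscopic values split symmetrically around the microscopic one by \(\log_{10} 2\), which is a convenient sanity check.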

2.
Journal of Computer-Aided Molecular Design - We applied the COSMO-RS method to predict the partition coefficient logP between water and 1-octanol for 22 small drug-like molecules within the...

3.
In the context of the SAMPL5 challenge, water-cyclohexane distribution coefficients for 53 drug-like molecules were predicted. Four different models based on molecular dynamics free energy calculations were tested. All models initially assumed only one chemical state present in aqueous or organic phases. Model A is based on results from an alchemical annihilation scheme; model B adds a long range correction for the Lennard-Jones potentials to model A; model C adds charging free energy corrections; model D applies the charging correction from model C to ionizable species only. Models A and B perform better in terms of mean unsigned error (\(\hbox {MUE}=6.79<6.87<6.95\) \(\log D\) units, 95 % confidence interval) and determination coefficient \((\hbox {R}^2 = 0.26< 0.27< 0.28)\), while charging corrections lead to poorer results with model D (\(\hbox {MUE}=12.8<12.63<12.98\) and \(\hbox {R}^2 = 0.16<0.17<0.18\)). Because overall errors were large, a retrospective analysis that allowed co-existence of ionisable and neutral species of a molecule in the aqueous phase was investigated. This considerably reduced systematic errors (\(\hbox {MUE}=1.87<1.97<2.07\) and \(\hbox {R}^2 = 0.35<0.40<0.45\)). Overall, accurate \(\log D\) predictions for drug-like molecules that may adopt multiple tautomers and charge states proved difficult, indicating a need for methodological advances to enable satisfactory treatment by explicit-solvent molecular simulations.
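To illustrate why allowing ionized and neutral species to co-exist in water changes \(\log D\), here is a minimal sketch using the textbook relation for a monoprotic compound. It assumes that only the neutral form partitions into the organic phase. The function name and parameters are hypothetical, and the authors' retrospective analysis may differ in detail.

```python
import math


def log_d(log_p: float, pka: float, ph: float, acid: bool = True) -> float:
    """Distribution coefficient of a monoprotic acid or base.

    Assumes only the neutral species enters the organic phase, so the
    ionized fraction in water lowers log D relative to log P.
    """
    # Acids ionize above their pKa, bases ionize below it.
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)
```

When the pH equals the \(pK_a\), half of the aqueous solute is ionized and \(\log D\) sits \(\log_{10} 2\) below \(\log P\).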

4.
5.
Journal of Computer-Aided Molecular Design - The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focus the computational modeling community on areas in need of...

6.
In the context of the SAMPL5 blinded challenge, standard free energies of binding were predicted for a dataset of 22 small guest molecules and three different host molecules: two octa-acids (OAH and OAMe) and a cucurbituril (CBC). Three sets of predictions were submitted, each based on different variations of classical molecular dynamics alchemical free energy calculation protocols based on the double annihilation method. The first model (model A) yields a free energy of binding based on computed free energy changes in solvated and host-guest complex phases; the second (model B) adds long range dispersion corrections to the previous result; the third (model C) uses an additional standard state correction term to account for the use of distance restraints during the molecular dynamics simulations. Model C performs the best in terms of mean unsigned error for the whole data set (MUE \(3.2\,<\,3.4\,<\,3.6\,\text{kcal}\,\text{mol}^{-1}\), 95 % confidence interval) and in particular for the octa-acid systems (MUE \(1.7\,<\,1.9\,<\,2.1\,\text{kcal}\,\text{mol}^{-1}\)). The overall correlation with experimental data for all models is encouraging (\(R^2\, 0.65\,<\,0.70<0.75\)). The correlation between experimental and computational free energy of binding ranks as one of the highest with respect to other entries in the challenge. Nonetheless, the large MUE for the best performing model highlights systematic errors, and submissions from other groups fared better with respect to this metric.
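The "lower < estimate < upper" notation used in these abstracts reports a statistic together with its 95 % confidence interval. The abstract does not say how the interval was obtained. As one common option, the sketch below computes a mean unsigned error with a percentile bootstrap over molecules. The resampling scheme, function names, and default settings are assumptions for illustration only.

```python
import random


def mue(pred, expt):
    """Mean unsigned error between predictions and experiment."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def bootstrap_mue_ci(pred, expt, n_boot=2000, level=0.95, seed=0):
    """Return (lower, estimate, upper) for the MUE via a percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(pred)
    samples = []
    for _ in range(n_boot):
        # Resample molecules with replacement and recompute the statistic.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(mue([pred[i] for i in idx], [expt[i] for i in idx]))
    samples.sort()
    lower = samples[int((1.0 - level) / 2.0 * n_boot)]
    upper = samples[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, mue(pred, expt), upper
```

A bootstrap over molecules captures only the sampling variability of the test set; it does not include experimental or simulation uncertainty, which some protocols propagate as well.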

7.
The COSMO-RS method, a combination of the quantum chemical dielectric continuum solvation model COSMO with a COSMO-based statistical thermodynamics of surface interactions, has been used in its COSMOtherm implementation for the direct, blind prediction of tautomeric equilibria within the SAMPL2 challenge. Since the quantum chemical level underlying COSMOtherm, i.e. BP/TZVP DFT calculations, is known to be of limited accuracy with respect to reaction energies, we additionally tested MP2 reaction energy corrections. As expected, the straight application of the latest version of COSMOtherm yielded a poor predictive accuracy of ~4 kcal/mol (RMSE) for the eight compounds of the blind prediction data set, and the MP2-corrected predictions reduced the average error considerably to ~1.2 kcal/mol. However, a more detailed analysis shows that this improvement is not systematic and is mostly a lucky coincidence on the small data set. The systematic results of COSMOtherm allow for an efficient empirical correction with an RMSE of 0.61 kcal/mol. This allows for systematic predictions for the most important case of generalized keto-enol tautomerism.
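To put errors of a few kcal/mol in context, the sketch below converts a tautomerization free energy into an equilibrium ratio using \(K = e^{-\Delta G/RT}\), and converts an energy error into the multiplicative error it causes in that ratio. The function names are hypothetical; the default temperature of 298.15 K is an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def tautomer_ratio(delta_g: float, temperature: float = 298.15) -> float:
    """[B]/[A] for A <-> B, with delta_g = G(B) - G(A) in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temperature))


def ratio_error_factor(energy_error: float, temperature: float = 298.15) -> float:
    """Factor by which the predicted ratio is off for a given energy error."""
    return math.exp(abs(energy_error) / (R_KCAL * temperature))
```

Because the ratio depends exponentially on the free energy, reducing the RMSE from ~4 to 0.61 kcal/mol shrinks the typical error in the predicted tautomer ratio by orders of magnitude.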

8.
In the context of the SAMPL6 challenges, a series of blinded predictions of standard binding free energies were made with the SOMD software for a dataset of 27 host–guest systems featuring two octa-acid hosts (OA and TEMOA) and a cucurbituril ring (CB8) host. Three different models were used: ModelA computes the free energy of binding based on a double annihilation technique; ModelB additionally takes into account long-range dispersion and standard state corrections; ModelC additionally introduces an empirical correction term derived from a regression analysis of SAMPL5 predictions previously made with SOMD. The performance of each model was evaluated with two different setups: buffer explicitly matches the ionic strength from the binding assays, whereas no-buffer merely neutralizes the host–guest net charge with counter-ions. ModelC/no-buffer shows the lowest mean unsigned error for the overall dataset (MUE \(1.29 < 1.39 < 1.50\,\text{kcal}\,\text{mol}^{-1}\), 95% CI), while explicit modelling of the buffer significantly improves results for the CB8 host only. Correlation with experimental data ranges from excellent for the host TEMOA (\(R^2\) \(0.91 < 0.94 < 0.96\)) to poor for CB8 (\(R^2\) \(0.04 < 0.12 < 0.23\)). Further investigations indicate a pronounced dependence of the binding free energies on the modelled ionic strength, and variable reproducibility of the binding free energies between different simulation packages.
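The abstract says only that ModelC's empirical correction comes from "a regression analysis" of earlier predictions. One plausible reading is sketched below: fit a straight line of experimental against computed free energies on a previous data set, then apply that line to new raw predictions. The linear form and all names here are assumptions and do not describe the SOMD protocol itself.

```python
def fit_line(computed, experimental):
    """Ordinary least-squares fit: experimental ~ slope * computed + intercept."""
    n = len(computed)
    mean_x = sum(computed) / n
    mean_y = sum(experimental) / n
    sxx = sum((x - mean_x) ** 2 for x in computed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(computed, experimental))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_correction(raw_predictions, slope, intercept):
    """Map new raw binding free energies through the fitted line."""
    return [slope * value + intercept for value in raw_predictions]
```

A correction of this kind can remove systematic over- or under-binding, but it inherits any bias in the data set it was fitted on, which matters when the new hosts differ from the old ones.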

9.
Journal of Computer-Aided Molecular Design - We systematically tested the Autodock4 docking program for absolute binding free energy predictions using the host-guest systems from the recent SAMPL6,...

10.
Theoretical approaches for predicting physicochemical properties are valuable tools for accelerating the drug discovery process. In this work, quantum chemical methods are used to predict water–octanol partition coefficients as a part of the SAMPL6 blind challenge. The SMD continuum solvent model was employed with MP2 and eight DFT functionals in conjunction with correlation consistent basis sets to determine the water–octanol transfer free energy. Several tactics towards improving the predictions of the partition coefficient were examined, including increasing the quality of basis sets, considering tautomerization, and accounting for inhomogeneities in the water and n-octanol phases. Evaluation of these various schemes highlights the impact of modeling approaches across different methods. With the inclusion of tautomers and adjustments to the permittivity constants, the best predictions were obtained with smaller basis sets and the O3LYP functional, which yielded an RMSE of 0.79 logP units. The results presented correspond to the SAMPL6 logP submission IDs: DYXBT, O7DJK, and AHMTF.
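The link between a computed transfer free energy and logP is the standard relation \(\log P = -\Delta G_{\text{transfer}} / (RT \ln 10)\), where the transfer free energy is the octanol solvation free energy minus the water solvation free energy. A minimal sketch follows. The function name is hypothetical, the default temperature of 298.15 K is an assumption, and tautomer or phase-composition corrections mentioned in the abstract are not included.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def log_p_from_solvation(dg_water: float, dg_octanol: float,
                         temperature: float = 298.15) -> float:
    """logP (octanol/water) from solvation free energies in kcal/mol.

    A more negative solvation free energy in octanol than in water
    means the solute prefers octanol, giving a positive logP.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10.0)
    return (dg_water - dg_octanol) / rt_ln10
```

At room temperature one logP unit corresponds to roughly 1.36 kcal/mol of transfer free energy, so an RMSE of 0.79 logP units reflects an error of about 1 kcal/mol in the free energy difference.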

11.
12.
Journal of Computer-Aided Molecular Design - Accurate prediction of lipophilicity (logP) based on molecular structures is a well-established field. Predictions of logP are often used to...

13.

In the context of the recent SAMPL6 SAMPLing challenge (Rizzi et al. 2020 in J Comput Aided Mol Des 34:601–633) aimed at assessing convergence properties and reproducibility of molecular dynamics binding free energy methodologies, we propose a simple explanation of the severe errors observed in the nonequilibrium switch double-system-single-box (NS-DSSB) approach when using unidirectional estimates. At the same time, we suggest a straightforward and minimal modification of the NS-DSSB protocol for obtaining reliable unidirectional estimates for the process where the ligand is decoupled in the bound state and recoupled in the bulk.


14.
Part of the latest SAMPL challenge was to predict how a small fragment library of 500 commercially available compounds would bind to a protein target. In order to assess the modellers' work, a reasonably comprehensive set of data was collected using a number of techniques. These included surface plasmon resonance, isothermal titration calorimetry, protein crystallization and protein crystallography. Using these techniques we could determine the kinetics of fragment binding, the energy of binding, how this affects the ability of the target to crystallize, and when the fragment did bind, the pose or orientation of binding. Both the final data set and all of the raw images have been made available to the community for scrutiny and further work. This overview sets out to give the parameters of the experiments done and what might be done differently for future studies.

15.
Here, we test a method, called semi-explicit assembly (SEA), that computes the solvation free energies of molecules in water in the SAMPL4 blind test challenge. SEA was developed with the intention of being as accurate as explicit-solvent models, but much faster to compute. It is accurate because it uses pre-simulations of simple spheres in explicit solvent to obtain structural and thermodynamic quantities, and it is fast because it parses solute free energies into regionally additive quantities. SAMPL4 provided us the opportunity to make new tests of SEA. Our tests here lead us to the following conclusions: (1) The newest version, called Field-SEA, which gives improved predictions for highly charged ions, is shown here to perform as well as the earlier versions (dipolar and quadrupolar SEA) on this broad blind SAMPL4 test set. (2) We find that both the past and present SEA models give solvation free energies that are as accurate as TIP3P. (3) Using a new approach for force field parameter optimization, we developed improved hydroxyl parameters that ensure consistency with neat-solvent dielectric constants, and found that they led to improved solvation free energies for hydroxyl-containing compounds in SAMPL4. We also learned that these hydroxyl parameters are not just fixing solvent exposed oxygens in a general sense, and therefore do not improve predictions for carbonyl or carboxylic-acid groups. Other such functional groups will need their own independent optimizations for potential improvements. Overall, these tests in SAMPL4 indicate that SEA is an accurate, general and fast new approach to computing solvation free energies.

16.
For the fifth time I have provided a set of solvation energies (1 M gas to 1 M aqueous) for a SAMPL challenge. In this set there are 23 blind compounds and 30 supplementary compounds of related structure to one of the blind sets, but for which the solvation energy is readily available. The best current values of each compound are presented along with complete documentation of the experimental origins of the solvation energies. The calculations needed to go from reported data to solvation energies are presented, with particular attention to aspects which are new to this set. For some compounds the vapor pressures (VP) were reported for the liquid compound, which is solid at room temperature. Correcting from \(\mathrm{VP}_{\text{subcooled liquid}}\) to \(\mathrm{VP}_{\text{sublimation}}\) requires \(\Delta S_{\text{fusion}}\), which is only known for mannitol. Estimated values were used for the others, all but one of which were benzene derivatives and expected to have very similar values. The final compound for which \(\Delta S_{\text{fusion}}\) was estimated was menthol, which melts at 42 °C so that modest errors in \(\Delta S_{\text{fusion}}\) will have little effect. It was also necessary to look into the effects of including estimated values of \(\Delta C_p\) on this correction. The approximate sizes of the effects of inclusion of \(\Delta C_p\) in the correction from \(\mathrm{VP}_{\text{subcooled liquid}}\) to \(\mathrm{VP}_{\text{sublimation}}\) were estimated and it was noted that inclusion of \(\Delta C_p\) invariably makes \(\Delta G_s\) more positive. To extend the set of compounds for which the solvation energy could be calculated we explored the use of boiling point (b.p.) data from Reaxys/Beilstein as a substitute for studies of the VP as a function of temperature. B.p. data are not always reliable so it was necessary to develop a criterion for rejecting outliers. For two compounds (chlorinated guaiacols) it became clear that inclusion represented overreach; for each there were only two independent pressure, temperature points, which is too little for a trustworthy extrapolation.
For a number of compounds the extrapolation from the lowest temperature at which the VP was reported to 25 °C was long (sometimes over 100°) so that it was necessary to consider whether \(\Delta C_p\) might have significant effects. The problem is that there are no experimental values and possible intramolecular hydrogen bonds make estimation uncertain in some cases. The approximate sizes of the effects of \(\Delta C_p\) were estimated, and it was noted that inclusion of \(\Delta C_p\) in the extrapolation of VP down to room temperature invariably makes \(\Delta G_s\) more negative.

17.
The SAMPL2 hydration free energy blind prediction challenge consisted of a data set of 41 molecules divided into three subsets: explanatory, obscure and investigatory, where experimental hydration free energies were given for the explanatory, withheld for the obscure, and not known for the investigatory molecules. We employed two solvation models for this challenge, a linear interaction energy (LIE) model based on explicit-water molecular dynamics simulations, and the first-shell hydration (FiSH) continuum model previously calibrated to mimic LIE data. On the 23 compounds from the obscure (blind) dataset, the prospectively submitted LIE and FiSH models provided predictions highly correlated with experimental hydration free energy data, with mean-unsigned-errors of 1.69 and 1.71 kcal/mol, respectively. We investigated several parameters that may affect the performance of these models, namely, the solute flexibility for the LIE explicit-solvent model, the solute partial charging method, and the incorporation of the difference in intramolecular energy between gas and solution phases for both models. We extended this analysis to the various chemical classes that can be formed within the SAMPL2 dataset. Our results strengthen previous findings on the excellent accuracy and transferability of the LIE explicit-solvent approach to predict transfer free energies across a wide spectrum of functional classes. Further, the current results on the SAMPL2 test dataset provide additional support for the FiSH continuum model as a fast yet accurate alternative to the LIE explicit-solvent model. Overall, both the LIE explicit-solvent model and the FiSH continuum solvation model show considerable improvement on the SAMPL2 data set over our previous continuum electrostatics-dispersion solvation model used in the SAMPL1 blind challenge.

18.
The water–octanol partition coefficient serves as a measure of the lipophilicity of a molecule and is important in the field of drug discovery. A novel method for computational prediction of the logarithm of the partition coefficient (logP) has been developed using molecular fingerprints and a deep neural network. The machine learning model was trained on a dataset of 12,000 molecules and tested on 2000 molecules. In this article, we present our results for the blind prediction of logP for the SAMPL6 challenge. While the best submission achieved an RMSE of 0.41 logP units, our submission had an RMSE of 0.61 logP units. Overall, we ranked in the top quarter out of the 92 submissions that were made. Our results show that the deep learning model can be used as a fast, accurate and robust method for high throughput prediction of logP of small molecules.

19.

Within the scope of the SAMPL7 challenge for predicting physical properties, the Integral Equation Formalism of the Miertus-Scrocco-Tomasi (IEFPCM/MST) continuum solvation model has been used for the blind prediction of n-octanol/water partition coefficients and acidity constants of a set of 22 and 20 sulfonamide-containing compounds, respectively. The log P and pKa were computed using the B3LYP/6-31G(d) parametrized version of the IEFPCM/MST model. For partition coefficients, our method yielded a root-mean-square error of 1.03 log P units, placing it among the most accurate theoretical approaches, both among all methods (rank 8th) and among physical methods (rank 2nd). On the other hand, the deviation between predicted and experimental pKa values was 1.32 log units, making it the second best-ranked submission. Though this highlights the reliability of the IEFPCM/MST model for predicting the partitioning and the acid dissociation constant of drug-like compounds, the results are discussed to identify potential weaknesses and improve the performance of the method.


20.
The performance of the extended solvent-contact model has been addressed in the SAMPL5 blind prediction challenge for the distribution coefficient (LogD) of drug-like molecules with respect to the cyclohexane/water partitioning system. All the atomic parameters defined for 41 atom types in the solvation free energy function were optimized by operating a standard genetic algorithm with respect to water and cyclohexane solvents. In the parameterizations for cyclohexane, the experimental solvation free energy (\(\Delta G_{\text{sol}}\)) data of 15 molecules for 1-octanol were combined with those of 77 molecules for cyclohexane to construct a training set because \(\Delta G_{\text{sol}}\) values of the former were unavailable for cyclohexane in publicly accessible databases. Using this hybrid training set, we established the LogD prediction model with the correlation coefficient (R), average error (AE), and root mean square error (RMSE) of 0.55, 1.53, and 3.03, respectively, for the comparison of experimental and computational results for 53 SAMPL5 molecules. The modest accuracy in LogD prediction could be attributed to the incomplete optimization of atomic solvation parameters for cyclohexane. With respect to 31 SAMPL5 molecules containing the atom types for which experimental reference data for \(\Delta G_{\text{sol}}\) were available for both water and cyclohexane, the accuracy in LogD prediction increased remarkably with the R, AE, and RMSE values of 0.82, 0.89, and 1.60, respectively. This significant enhancement in performance stemmed from the better optimization of atomic solvation parameters by limiting the training set to the molecules with experimental \(\Delta G_{\text{sol}}\) data for cyclohexane. Due to the simplicity of model building and the low computational cost of parameterization, the extended solvent-contact model is anticipated to serve as a valuable computational tool for LogD prediction upon the enrichment of experimental \(\Delta G_{\text{sol}}\) data for organic solvents.
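The three statistics quoted above (R, AE, RMSE) can be computed as in the minimal sketch below. The abstract does not define "average error"; it is taken here to be the mean absolute deviation, which is an assumption, and all function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_error(pred, expt):
    """Mean absolute deviation (assumed meaning of AE)."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def rmse(pred, expt):
    """Root mean square error."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))
```

Since RMSE weights large deviations more heavily than AE, an RMSE about twice the AE (3.03 against 1.53 for the full set) suggests that a few molecules with large errors dominate the result.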
