Similar literature
Found 20 similar articles; search took 46 ms.
1.
We developed a novel approach called SHAFTS (SHApe-FeaTure Similarity) for 3D molecular similarity calculation and ligand-based virtual screening. SHAFTS adopts a hybrid similarity metric that combines molecular shape with colored (labeled) chemistry groups annotated by pharmacophore features for 3D similarity calculation and ranking, and is designed to integrate the strengths of pharmacophore matching and volumetric overlay approaches. A feature triplet hashing method is used for fast enumeration of molecular alignment poses, and the optimal superposition between the target and the query molecules can be prioritized by calculating the corresponding "hybrid similarities". SHAFTS is suitable for large-scale virtual screening with single or multiple bioactive compounds as the query "templates", regardless of whether the corresponding experimentally determined conformations are available. Two public test sets (DUD and Jain's sets), including active and decoy molecules from a panel of useful drug targets, were adopted to evaluate the virtual screening performance. SHAFTS outperformed several other widely used virtual screening methods in terms of enrichment of known active compounds as well as novel chemotypes, indicating its robustness in hit compound identification and its potential for scaffold hopping in virtual screening.

2.
The direct molecular structure implementations of the gauge-including atomic orbital (GIAO), individual gauges for atoms in molecules (IGAIM), and continuous set of gauge transformations (CSGT) methods for calculating nuclear magnetic shielding tensors at both the Hartree-Fock (HF) and density functional (B3LYP) levels of theory with the 6-31G(d), 6-311G(d), 6-31++G(d,p), 6-311++G(d,p), and 6-311++G(df,pd) basis sets are presented. The dependence of the 1H and 13C NMR chemical shifts on the choice of method and basis set has been investigated. The chemical shifts of the 2-aryl-1,3,4-oxadiazoles 5a–g have also been calculated in relation to the dihedral angles (C4–C3–C2–O) of two conformers. The optimized molecular geometries and the 1H and 13C chemical shift values of the 2-aryl-1,3,4-oxadiazoles 5a–g in the ground state have been obtained. The linear correlation coefficients of the 13C NMR chemical shifts for these molecules are given. New nuclear magnetic shielding tensors of tetramethylsilane (TMS) were also calculated. The data for the 2-aryl-1,3,4-oxadiazole derivatives provide useful molecular structure and NMR information, and a basis for the future design of efficient materials having the 1,3,4-oxadiazole core.

3.
4.
A new method, based on generalized Fourier analysis, is described that utilizes the concept of "molecular basis sets" to represent chemical space within an abstract vector space. The basis vectors in this space are abstract molecular vectors. Inner products among the basis vectors are determined using an ansatz that associates molecular similarities between pairs of molecules with their corresponding inner products. Moreover, the fact that similarities between pairs of molecules are, in essentially all cases, nonzero implies that the abstract molecular basis vectors are nonorthogonal, but since the similarity of a molecule with itself is unity, the molecular vectors are normalized to unity. A symmetric orthogonalization procedure, which optimally preserves the character of the original set of molecular basis vectors, is used to construct appropriate orthonormal basis sets. Molecules can then be represented, in general, by sets of orthonormal "molecule-like" basis vectors within a proper Euclidean vector space. However, the dimension of the space can become quite large. Thus, the work presented here assesses the effect of basis set size on a number of properties, including the average squared error and average norm of molecular vectors represented in the space; the results clearly show the expected reduction in average squared error and increase in average norm as the basis set size is increased. Several distance-based statistics are also considered. These include the distribution of distances and their differences with respect to basis sets of differing size, and several comparative distance measures such as Spearman rank correlation and Kruskal stress. All of the measures show that, even though the dimension can be high, the chemical spaces they represent nonetheless behave in a well-controlled and reasonable manner. Other abstract vector spaces analogous to that described here can also be constructed, provided that the appropriate inner products can be directly evaluated, as is the case in this work, a problem that is well known in kernel-based machine learning.
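The symmetric orthogonalization step mentioned in this abstract can be written out as a short worked equation. This is a sketch of the standard (Löwdin) construction, not necessarily the authors' exact formulation; the symbols $|m_i\rangle$, $|\tilde m_i\rangle$, $s_{ij}$, and $\mathbf{S}$ are notation introduced here, and $\mathbf{S}$ is assumed to be positive definite so that $\mathbf{S}^{-1/2}$ exists.

```latex
% Gram (similarity) matrix of the nonorthogonal, unit-normalized molecular vectors
S_{ij} = \langle m_i \mid m_j \rangle = s_{ij}, \qquad S_{ii} = 1 .

% Symmetric orthogonalization: mix the original vectors with S^{-1/2}
\lvert \tilde m_i \rangle = \sum_j \lvert m_j \rangle \,\bigl(\mathbf{S}^{-1/2}\bigr)_{ji} .

% The new vectors are orthonormal
\langle \tilde m_i \mid \tilde m_j \rangle
  = \bigl(\mathbf{S}^{-1/2}\,\mathbf{S}\,\mathbf{S}^{-1/2}\bigr)_{ij}
  = \delta_{ij} .
```

Among all orthonormal sets spanning the same space, this choice minimizes $\sum_i \lVert \tilde m_i - m_i \rVert^2$, which is the sense in which the character of the original molecular basis vectors is optimally preserved.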

5.
Correlation consistent basis sets have been optimized for use with explicitly correlated F12 methods. The new sets, denoted cc-pVnZ-F12 (n=D,T,Q), are similar in size and construction to the standard aug-cc-pVnZ and aug-cc-pV(n+d)Z basis sets, but the new sets are shown in the present work to yield much improved convergence toward the complete basis set limit in MP2-F12/3C calculations on several small molecules involving elements of both the first and second row. For molecules containing only first row atoms, the smallest cc-pVDZ-F12 basis set consistently recovers nearly 99% of the MP2 valence correlation energy when combined with the MP2-F12/3C method. The convergence with basis set for molecules containing second row atoms is slower, but the new DZ basis set still recovers 97%-99% of the frozen core MP2 correlation energy. The accuracy of the new basis sets for relative energetics is demonstrated in benchmark calculations on a set of 15 chemical reactions.

6.
7.
Recognition of small molecules by proteins depends on three-dimensional molecular surface complementarity. However, the dominant techniques for analyzing the similarity of small molecules are based on two-dimensional chemical structure, with such techniques often outperforming three-dimensional techniques in side-by-side comparisons of correlation to biological activity. This paper introduces a new molecular similarity method, termed morphological similarity (MS), that addresses the apparent paradox. Two sets of molecule pairs are identified from a set of ligands whose protein-bound states are known crystallographically. Pairs that bind the same protein sites form the first set, and pairs that bind different sites form the second. MS is shown to separate the two sets significantly better than a benchmark 2D similarity technique. Further, MS agrees with crystallographic observation of bound ligand states, independent of information about bound states. MS is efficient to compute and can be practically applied to large libraries of compounds.

8.
In this paper we study different representational spaces of molecule data sets, based on 2D representation models, for building QSAR models that predict the activity of 37 benzylamino enaminone derivatives. Approximations based on classical similarity calculated from fingerprint representations of molecules, and on isomorphism obtained using sub-graph matching algorithms, are compared to fragmentation-based approximations using partial least squares and genetic algorithms. The influence of the anchored position of a non-common moiety and of the kind of substituents in the common core structure of the data set is analysed, demonstrating the anomalous behaviour of some molecules and therefore the difficulty in building prediction models. These problems are solved by considering approximate similarity models. These models tune the prediction equations based on the size of the substituent and the anchored position, by adjusting the contribution of each substituent in similarity measurements calculated between the molecule data sets.

9.
Lin CH, Imasaka T. Talanta, 1995, 42(8): 1111-1119
A technique based on pattern recognition of data obtained by supersonic jet spectrometry is employed for the prediction of chemical structure. The degree of similarity is evaluated quantitatively by calculating a cross correlation factor between sample and reference molecules. A probability density function is determined by fitting the data to a specified equation. The functional group and its position are also predicted by a similar technique. The pattern recognition provides a method for prediction of the chemical structure and is applicable to samples that have not been examined by supersonic jet spectrometry.
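The idea of scoring similarity with a cross correlation factor can be illustrated with a minimal sketch. The abstract does not give the authors' exact definition, so the zero-lag normalized form below is an assumption, as are the function name `cross_correlation_factor` and the representation of each spectrum as a list of intensities sampled on a common grid.

```python
import math


def cross_correlation_factor(sample, reference):
    """Zero-lag normalized cross-correlation of two spectra.

    Both inputs are intensity lists sampled on the same grid (an assumption
    of this sketch). The result is 1.0 for spectra that differ only by a
    positive scale factor and 0.0 for spectra with no overlapping signal.
    """
    if len(sample) != len(reference):
        raise ValueError("spectra must be sampled on the same grid")
    overlap = sum(s * r for s, r in zip(sample, reference))
    norm = math.sqrt(sum(s * s for s in sample) * sum(r * r for r in reference))
    # An all-zero spectrum carries no information; report zero similarity.
    return overlap / norm if norm else 0.0
```

Normalizing by the two self-overlaps makes the score independent of absolute signal intensity, which matters when sample and reference were recorded under different conditions.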

10.
By representing molecules as vectors whose components are their nuclear charges, a theorem is stated that allows the Born-Oppenheimer energies of sets of isoprotonic-isoelectronic molecules to be ordered. Upper and lower bounds for these sets are derived, along with other general energy inequalities involving homonuclear systems and molecules with common molecular fragments. These inequalities imply that the sets of molecules under consideration are endowed with the structure of a partially ordered set (POSET). Some properties related to this structure are discussed.

11.
12.
Compact, contracted Gaussian basis sets for halogen atoms are generated and tested in ab initio molecular calculations. These basis sets have a structure similar to that of the sets of Huzinaga and co-workers (HTS); however, they give both better atomic total energies and better properties of atomic valence orbitals. These sets, after splitting of valence orbitals and augmenting with polarization functions, provide molecular results that agree well with those given by extended calculations. Basis set superposition error (BSSE) is calculated using the counterpoise method. BSSE has only a slight influence on the calculated equilibrium geometry, shape of the potential curve, and electric properties (dipole and quadrupole moments) of molecules. However, atomization energies may be significantly changed by the BSSE.
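For reference, the counterpoise correction named in this abstract can be written out for a system AB built from two fragments A and B. This is the standard Boys-Bernardi form, given as a sketch; the notation $E_X^{\beta}$ (energy of species $X$ computed in basis $\beta$, all at the geometry of AB) is introduced here and is not taken from the abstract.

```latex
% Counterpoise-corrected interaction energy: every term uses the full AB basis
\Delta E^{\mathrm{CP}} = E_{AB}^{AB} - E_{A}^{AB} - E_{B}^{AB}

% Uncorrected interaction energy: each fragment uses only its own basis
\Delta E = E_{AB}^{AB} - E_{A}^{A} - E_{B}^{B}

% The BSSE estimate is the difference between the two
\mathrm{BSSE} = \Delta E^{\mathrm{CP}} - \Delta E
  = \bigl(E_{A}^{A} - E_{A}^{AB}\bigr) + \bigl(E_{B}^{B} - E_{B}^{AB}\bigr)
```

For variational methods each bracket is non-negative, because enlarging a fragment's basis with its partner's functions can only lower its energy. The uncorrected binding is therefore overestimated, consistent with the sensitivity of atomization energies noted above.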

13.
Summary: Three-dimensional (3D)-database searches are now being widely applied to determine potential new active molecules. Many structural data sets obtained as a result of these searches are still large in size. In this paper we apply molecular similarity calculations as a rapid method to screen two such data sets. In the first investigation, synthetic candidates, produced as a result of a tendamistat β-turn mimic search, were tested for their ability to imitate the β-turn backbone. In the second study, structures extracted through a histamine pharmacophore query search were examined on the basis of their electronic similarity to histamine. Molecular similarity is shown to provide a rapid means of gaining insight into the composition of molecular data sets, with possible implications for future full 3D-database searches.

14.
15.
A group-contribution method was developed for calculating the binding constants or the free energies of complexation between native alpha- or beta-cyclodextrin (CD) and organic guest molecules. The nonlinear models for binary (1:1 stoichiometry) complexes of alpha- and beta-CDs were derived with squared correlation coefficients (r²) of 0.868 and 0.917, based on a database consisting of 102 and 218 diverse guest molecules, respectively. The parameters used in the models are the first-order molecular connectivity index, as a measure of molecular bulk, and atom/group counts in the guest molecules. The models allow accurate estimations for a wide range of guests containing C, H, N, O, S, and/or any of the halogens by summing the contribution values of each defined group present in the chemical structure of the guest, along with the guest's molecular size factors (linear and square terms), and then adding the sum to a constant coefficient. The predictive performance of the models was tested on an additional set of 27 compounds that were not included in the original data set. The values predicted by the models are in good agreement with the experimentally determined data.

16.
17.
A computational method to rapidly assess and visualize the diversity in molecular shape associated with a given compound set has been developed. Normalized ratios of principal moments of inertia are plotted into two-dimensional triangular graphs and then used to compare the shape space covered by different compound sets, such as combinatorial libraries of varying size and composition. We have further developed a computational method to analyze interset similarity in terms of shape space coverage, which allows the shape redundancy between the different subsets of a given compound collection to be analyzed in a quantitative way. The shape space coverage has been found to originate mainly from the nature and the 3D-geometry (but not the size) of the central scaffold, while the number and nature of the peripheral substituents and conformational aspects were shown to be of minor importance. Substantial shape space coverage has been correlated with broad biological activity by applying the same shape analysis to collections of known bioactive compounds, such as MDDR and the GOLD-set. The aggregate of our results corroborates the intuitive notion that molecular shape is intimately linked to biological activity and that a high degree of shape (hence scaffold) diversity in screening collections will increase the odds of addressing a broad range of biological targets.

18.
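Prediction of human intestinal absorption of drugs based on the support vector machine method   Total citations: 2, self-citations: 0, citations by others: 2
To predict the absorption of molecules in the human small intestine, 102 molecular descriptors characterizing the electronic, topological, geometric, and molecular shape features of the molecules were calculated, and a genetic-algorithm variable selection method was used to reduce the number of descriptors to 47. The data set comprised 230 compounds, of which 69 are not absorbed (HIA-) and 161 are absorbed (HIA+). The resulting SVM model was validated by 5-fold cross-validation and with an independent test set, giving prediction accuracies of 79.1% and 77.1%, respectively, which are in good agreement. During model validation, the training and test sets were assembled using cluster analysis, which ensured the stability of the model and improved modeling efficiency.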

19.
The coupled cluster approximation with single, double, and quasiperturbative triple excitations [CCSD(T)] was used in combination with the Douglas-Kroll contracted correlation consistent basis sets [cc-pVnZ-DK, where n = D(2), T(3), Q(4), and 5] and small-core relativistic pseudopotentials (PP) with correlation consistent polarized valence basis sets (cc-pVnZ-PP and aug-cc-pVnZ-PP) to investigate the impact of scalar relativistic corrections on energetic and structural properties of small molecules containing third-row (Ga-Kr) atoms. These molecules were taken from the Gaussian-2 extended test set for third-row atoms. Atomization energies, ionization energies, electron affinities, and proton affinities for molecules in the test set were determined and compared with nonrelativistic results which were obtained in a recent study in which the standard and augmented correlation consistent basis sets were used in combination with CCSD(T). Several schemes were used to extrapolate the energies to the complete basis set limit.
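The abstract does not say which extrapolation schemes were used. As an illustration of the general idea only, the sketch below implements one widely used two-point form, which assumes the energy converges as E(n) = E_CBS + A/n³ in the cardinal number n of the basis set. The function name `cbs_two_point` is hypothetical, and the choice of this scheme is an assumption of the example, not a statement about the study.

```python
def cbs_two_point(e_smaller, e_larger, n_larger):
    """Two-point inverse-cubic extrapolation to the complete basis set limit.

    Assumes E(n) = E_CBS + A / n**3, where n is the basis set cardinal
    number (D=2, T=3, Q=4, 5). `e_smaller` is the energy with cardinal
    number n_larger - 1 and `e_larger` the energy with n_larger.
    """
    n_smaller = n_larger - 1
    w_larger, w_smaller = n_larger ** 3, n_smaller ** 3
    # Eliminating A from the two equations E(n) = E_CBS + A / n**3 gives:
    return (w_larger * e_larger - w_smaller * e_smaller) / (w_larger - w_smaller)
```

This form is usually applied to the correlation energy rather than to the total energy, because the Hartree-Fock component converges much faster with basis set size.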

20.
The fuzzy set structure is analysed from the point of view of a new definition: Boolean tagged sets, which are constructed as a straightforward generalisation of the fuzzy set concept, but fully adapted to computational purposes. Boolean tagged sets are simply structured as sets whose members are tagged by a binary string, attached in turn to the vertices of a unit hypercube. Boolean tagged sets are designed to be useful in any field of applied mathematics, but they can also be seen as naturally suited to chemical applications associated with molecular information gathering and manipulation. This revised version was published online in July 2006 with corrections to the Cover Date.
