Similar literature
1.
A new method, based on generalized Fourier analysis, is described that utilizes the concept of "molecular basis sets" to represent chemical space within an abstract vector space. The basis vectors in this space are abstract molecular vectors. Inner products among the basis vectors are determined using an ansatz that associates molecular similarities between pairs of molecules with their corresponding inner products. Moreover, the fact that similarities between pairs of molecules are, in essentially all cases, nonzero implies that the abstract molecular basis vectors are nonorthogonal, but since the similarity of a molecule with itself is unity, the molecular vectors are normalized to unity. A symmetric orthogonalization procedure, which optimally preserves the character of the original set of molecular basis vectors, is used to construct appropriate orthonormal basis sets. Molecules can then be represented, in general, by sets of orthonormal "molecule-like" basis vectors within a proper Euclidean vector space. However, the dimension of the space can become quite large. Thus, the work presented here assesses the effect of basis set size on a number of properties, including the average squared error and average norm of molecular vectors represented in the space. The results clearly show the expected reduction in average squared error and increase in average norm as the basis set size is increased. Several distance-based statistics are also considered. These include the distribution of distances and their differences with respect to basis sets of differing size and several comparative distance measures such as Spearman rank correlation and Kruskal stress. All of the measures show that, even though the dimension can be high, the chemical spaces they represent, nonetheless, behave in a well-controlled and reasonable manner.
Other abstract vector spaces analogous to that described here can also be constructed, provided that the appropriate inner products can be directly evaluated, as is the case in this work, a problem that is well known in kernel-based machine learning.
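
The symmetric orthogonalization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the similarity matrix S is symmetric and positive definite and uses the standard construction X = S^(-1/2); the function name and the three-molecule matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def symmetric_orthogonalization(S):
    """Return X = S^(-1/2) for a symmetric positive-definite Gram matrix S."""
    S = np.asarray(S, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigendecomposition of a symmetric matrix
    if eigvals.min() <= 0:
        raise ValueError("S must be positive definite")
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Hypothetical similarities among three molecules (unit self-similarity).
S = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

X = symmetric_orthogonalization(S)

# Inner products among the transformed basis vectors: X^T S X should be the identity.
gram_of_new_basis = X.T @ S @ X
```

Because X is itself symmetric, each new basis vector is a mixture of the original molecular vectors with weights that treat all molecules on an equal footing, which is why this choice is often described as staying close to the original basis.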

2.
Attention mechanisms have led to many breakthroughs in sequential data modeling but have yet to be incorporated into any generative algorithms for molecular design. Here we explore the impact of adding self-attention layers to generative β-VAE models and show that those with attention are able to learn a complex “molecular grammar” while improving performance on downstream tasks such as accurately sampling from the latent space (“model memory”) or exploring novel chemistries not present in the training data. There is a notable relationship between a model's architecture, the structure of its latent memory and its performance during inference. We demonstrate that there is an unavoidable tradeoff between model exploration and validity that is a function of the complexity of the latent memory. However, novel sampling schemes may be used that optimize this tradeoff. We anticipate that attention will play an important role in future molecular design algorithms that can make efficient use of the detailed molecular substructures learned by the transformer.

An implementation of attention within the variational autoencoder framework for continuous representation of molecules. The addition of attention significantly increases model performance for complex tasks such as exploration of novel chemistries.

3.
《Comptes Rendus Chimie》2015,18(5):516-524
Density functional theory (DFT) is applied to obtain absorption spectra at THz frequencies for molecular clusters of H2O. The vibrational modes of the clusters are calculated. Coupling among molecular vibrational modes explains their spectral features associated with THz excitation. THz excitation is associated with vibrational frequencies, which are calculated here within the DFT approximation of electronic states. This is done for both isolated molecules and collections of molecules in a cluster. The principal result of the paper is that a crystal-like cluster of 38 water molecules together with a continuum solvent background is sufficient to reproduce the experimental vibrational frequencies well.

4.
5.
6.
In drug discovery applications, high throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as “hits”. In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties such as the docking score against a target receptor. In real-life drug discovery experiments, drug libraries are extremely large yet still represent only a small fraction of the essentially infinite chemical space, and evaluating physical properties for every molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of size about 100 million, while calculating the docking score only for about 6% of the complete library. We believe that such a framework would tremendously help to reduce the computational effort not only in drug discovery but also in other areas that require such high-throughput experiments.

A novel machine learning framework based on Bayesian optimization for efficient sampling of chemical space. The framework is able to identify 90% of top-1000 hits by only sampling 6% of the complete dataset containing ∼100 million compounds.
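
The abstract does not describe the internals of MEMES, so the following is only a generic sketch of a Bayesian-optimization screening loop, not the authors' implementation. It assumes a Gaussian-process surrogate with an RBF kernel and a lower-confidence-bound acquisition rule (lower docking score taken as better); every function name, parameter, and the synthetic one-dimensional "library" are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    """Squared-exponential kernel between two sets of feature vectors."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(X_train, y_train, X_query, noise=1e-4):
    """Posterior mean and standard deviation of a GP with unit prior variance."""
    offset = y_train.mean()
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_query)
    mean = K_s.T @ np.linalg.solve(K, y_train - offset) + offset
    var = 1.0 - (K_s * np.linalg.solve(K, K_s)).sum(axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def screen(features, score_fn, n_init=5, n_iter=10, batch=5, kappa=2.0, seed=0):
    """Score only a subset of the library, chosen by the surrogate model."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=n_init, replace=False)
    scores = {int(i): score_fn(int(i)) for i in start}
    for _ in range(n_iter):
        idx = list(scores)
        y = np.array([scores[i] for i in idx])
        mean, std = gp_posterior(features[idx], y, features)
        acq = mean - kappa * std   # lower confidence bound: favor low or uncertain scores
        acq[idx] = np.inf          # never re-select molecules that were already scored
        for i in np.argsort(acq)[:batch]:
            scores[int(i)] = score_fn(int(i))
    return scores

# Synthetic stand-in for a library: 200 "molecules" with a 1-D descriptor.
library = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
true_scores = (library[:, 0] - 0.7) ** 2   # pretend docking score, best near 0.7
found = screen(library, lambda i: float(true_scores[i]))
```

Here only 55 of the 200 library members are ever scored. In a real screening campaign the expensive call would be the docking computation, and the descriptor would be a learned or fingerprint-based molecular representation.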

7.
A spectral clustering method is presented and applied to two-dimensional molecular structures, where it has been found particularly useful in the analysis of screening data. The method provides a means to quantify (1) the degree of intermolecular similarity within a cluster and (2) the contribution that the features of a molecule make to a cluster. In an application of the spectral clustering method to an example data set of 125 COX-2 inhibitors, these two criteria were used to place the molecules into clusters of chemically related two-dimensional structures.
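
As a rough illustration of the general idea behind spectral clustering, the sketch below splits a small set of molecules in two using the sign pattern of the second-smallest eigenvector (the Fiedler vector) of the graph Laplacian built from a similarity matrix. This is a textbook variant chosen for brevity, not necessarily the formulation used in the paper; the function name and the four-molecule similarity matrix are hypothetical.

```python
import numpy as np

def fiedler_bipartition(W):
    """Split items into two groups using the Fiedler vector of L = D - W."""
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W           # diagonal of W cancels out here
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]                    # eigenvector of the 2nd-smallest eigenvalue
    return fiedler >= 0                        # boolean cluster membership

# Hypothetical similarities: molecules 0 and 1 are alike, as are 2 and 3.
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.1, 0.8, 1.0]])

labels = fiedler_bipartition(W)
```

The overall sign of an eigenvector is arbitrary, so which group is labeled `True` may differ between runs or library versions; only the grouping itself is meaningful.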

8.
We report here: (a) formulas/procedures for calculating the similarity of molecules, considering their chemical structure, size, shape and hydrophilicity; (b) a procedure for clusterization of the sets of molecules, according to similarity; (c) formulas/procedures for calculating the diversity of molecules in clusterized sets as well as the similarity of clusterized sets, based on the Shannon entropy formalism. The paper analyses the influence of the diversity of molecules and the similarity of calibration/prediction sets on the quality of prediction for prediction set molecules. The calculated influence of a given molecular feature (chemical structure, size, shape and hydrophilicity) on toxicity depends on the structure of the database, specifically the number of molecules and the diversity of molecules having the analyzed molecular feature. A QSAR analysis of 49 phenol derivatives revealed the effect of the diversity of molecules in sets and of the similarity of sets on the quality of prediction for prediction set molecules: (a) a direct correlation with the similarity of sets, regardless of the analyzed molecular feature; (b) an inverse correlation with the diversity of molecules in the calibration set, from the point of view of chemical structure, size and shape; (c) a direct correlation with the diversity of molecules in the calibration set, from the point of view of hydrophilicity; (d) a direct correlation with the diversity of molecules in the prediction set, regardless of the analyzed feature.
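
The abstract does not give the paper's diversity formulas. As a generic illustration of a Shannon-entropy diversity measure over a clustered set, the sketch below computes H = -Σ p_k log2 p_k, where p_k is the fraction of molecules in cluster k. The function name and the cluster labels are hypothetical.

```python
import math
from collections import Counter

def shannon_diversity(cluster_labels):
    """Shannon entropy (in bits) of the distribution of molecules over clusters."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical cluster assignments for small sets of molecules.
evenly_split = shannon_diversity(["A", "A", "B", "B"])
single_cluster = shannon_diversity(["A", "A", "A", "A"])
four_clusters = shannon_diversity(["A", "B", "C", "D"])
```

Under this measure a set concentrated in one cluster has zero diversity, and diversity grows as molecules spread evenly over more clusters.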

9.
An alternative to experimental high-throughput screening is the virtual screening of compound libraries on the computer. In the absence of a detailed structure of the receptor protein, candidate molecules are compared with a known reference by mutually superimposing their skeletons and scoring their similarity. Since molecular shape depends strongly on the adopted conformation, an efficient conformational screening is performed using a knowledge-based approach. A comprehensive torsion library has been compiled from crystal data stored in the Cambridge Structural Database. For molecular comparison, a strategy is followed that considers shape-associated physicochemical properties in space such as steric occupancy, electrostatics, lipophilicity and potential hydrogen-bonding. Molecular shape is approximated by a set of Gaussian functions not necessarily located at the atomic positions. The superposition is performed in two steps: first by a global alignment search operating on multiple rigid conformations and then by conformationally relaxing the best-scored hits of the global search. A normalized similarity scoring is used to allow for a comparison of molecules with rather different shape and size. The approach has been implemented on a cluster of parallel processors. As a case study, the search for ligands binding to the dopamine receptor is given.

10.
11.
12.
We seek to recover rigorous atom types from amino acid wave functions. The atom types emerge from a cluster analysis operating on a set of seven atomic properties, including kinetic energy, volume, population, and dipole, quadrupole, octupole, and hexadecapole moments. These properties are acquired by partitioning the molecular electron density into quantum topological atoms. Wave functions are generated at the B3LYP/6-311+G(2d,p)//HF/6-31G(d) level for a sensible conformation of each of the 20 naturally occurring amino acids and smaller derived molecules, which together constitute a data set of 57 molecules. From this set 213 unique quantum topological carbons are obtained, which are linked according to the similarity of their properties. After introducing a statistical separation criterion, our cluster analysis proposes two representations: a cruder one with 5 atom types and a finer one with 21 atom types. The immediate coordination of the central carbon plays a major role in labeling the atom types.

13.
14.
A simple, new technique for the evaluation of the similarity of molecular shapes is presented. The concept of semisimilarity (asymmetric similarity) is applied within the topological–geometrical framework of scaling–nesting similarity measures of molecules. The practical application of these similarity measures is illustrated by the examples of the three-dimensional formal bodies of electronic charge densities of a set of simple molecules. © 1994 John Wiley & Sons, Inc.

15.
In drug design, often enough, no structural information on a particular receptor protein is available. However, frequently a considerable number of different ligands is known together with their measured binding affinities towards a receptor under consideration. In such a situation, a set of plausible relative superpositions of different ligands, hopefully approximating their putative binding geometry, is usually the method of choice for preparing data for the subsequent application of 3D methods that analyze the similarity or diversity of the ligands. Examples are 3D-QSAR studies, pharmacophore elucidation, and receptor modeling. An aggravating fact is that ligands are usually quite flexible and a rigorous analysis has to incorporate molecular flexibility. We review the past six years of scientific publishing on molecular superposition. Our focus lies on automatic procedures to be performed on arbitrary molecular structures. Methodical aspects are our main concern here. Accordingly, plain application studies with few methodical elements are omitted in this presentation. While this review cannot mention every contribution to this actively developing field, we intend to provide pointers to the recent literature providing important contributions to computational methods for the structural alignment of molecules. Finally we provide a perspective on how superposition methods can effectively be used for the purpose of virtual database screening. In our opinion it is the ultimate goal to detect analogues in structure databases of nontrivial size in order to narrow down the search space for subsequent experiments.

16.
A non-linear neural network model to perform cluster analysis is presented. It provides an efficient parallel algorithm for solving this pattern recognition task, consisting, from the mathematical point of view, of a combinatorial optimization problem. A new classification technique is discussed in order to visualize clustering patterns within a molecular set, by means of numerical analysis of the similarity matrix. As an example of the application of the reported neural network model, a quantum molecular similarity study in the field of structure-activity relationships is reported. A molecular set made of eighteen quinolones is used as an example. The resultant cluster distribution showed a good qualitative correlation between similarity data and biological activity.

17.
Bond orders and hybrid populations have been calculated from the density matrix localized in molecular space using a similarity transformation for some fluorobenzenes with a minimal basis set using the Gaussian series of programs. The ab initio bond orders and hybrid populations have been compared with the semiempirical calculations on this set of molecules. Also, these bond orders have been used in Coulson's bond order-bond length relationship to estimate bond lengths. The present calculations suggest that qualitative predictions of molecular geometries are possible from these bond orders.

18.
A deterministic method (frequency distribution method) for selecting compounds from a partitioned virtual combinatorial library for efficient synthesis is presented here. The method is based on reagent frequency analysis and can be applied to any library of molecules distributed in any given partitioned chemical space (cluster, cell-based, etc.). Compound selection by reagent frequency distribution can produce a unique, diverse set of molecules that adequately represents the library while requiring the fewest compounds to be synthesized and minimizing the number of different reagents that must be used. This method also provides a practical solution to the configuration of plate layout. Because the method essentially identifies "expensive" regions in the chemical space to synthesize for a desired diversity or similarity coverage, decisions concerning the necessity to synthesize these compounds can be addressed. Minimum compound generation and efficient plate layout result in savings both in time of synthesis and cost of materials. This method always results in a discrete solution, which can be used for any given library size as well as any combination of reagents and is also readily adaptable to robotic automation.

19.
Modern functional materials consist of large molecular building blocks with significant chemical complexity which limits spectroscopic property prediction with accurate first-principles methods. Consequently, a targeted design of materials with tailored optoelectronic properties by high-throughput screening is bound to fail without efficient methods to predict molecular excited-state properties across chemical space. In this work, we present a deep neural network that predicts charged quasiparticle excitations for large and complex organic molecules with a rich elemental diversity and a size well out of reach of accurate many body perturbation theory calculations. The model exploits the fundamental underlying physics of molecular resonances as eigenvalues of a latent Hamiltonian matrix and is thus able to accurately describe multiple resonances simultaneously. The performance of this model is demonstrated for a range of organic molecules across chemical composition space and configuration space. We further showcase the model capabilities by predicting photoemission spectra at the level of the GW approximation for previously unseen conjugated molecules.

A physically-inspired machine learning model for orbital energies is developed that can be augmented with delta learning to obtain photoemission spectra, ionization potentials, and electron affinities with experimental accuracy.

20.