Deep machine learning is expanding the conceptual framework and capacity of computational compound design, enabling new applications through generative modeling. We have explored the systematic design of covalent protein kinase inhibitors by first learning from kinome-relevant chemical space and then focusing on an exemplary kinase of interest. Covalent inhibitors are experiencing a renaissance in drug discovery, especially for targeting protein kinases. However, the computational design of this class of inhibitors has so far received little attention. To address this, we have devised a computational approach that combines fragment-based design and deep generative modeling, augmented by three-dimensional pharmacophore screening. This approach is expected to be particularly relevant for medicinal chemistry applications because it combines knowledge-based elements with deep learning and is chemically intuitive. As an exemplary application, we report for Bruton’s tyrosine kinase (BTK), a major drug target for the treatment of inflammatory diseases and leukemia, the generation of novel candidate inhibitors carrying a specific chemically reactive group for covalent modification; only limited target-specific compound information was needed to guide the design. The newly generated compounds include known inhibitors, characteristic substructures, and many novel candidates, lending credence to the computational approach, which is readily applicable to other targets.
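The abstract does not say how the three-dimensional pharmacophore screening was implemented. As a rough illustration only, the Python sketch below checks whether one conformer's feature points can be matched to a distance-based pharmacophore query. The feature labels (for example `"electrophile"` standing in for the reactive group), the tolerance, and the function names are hypothetical; real pharmacophore tools additionally handle conformer ensembles, alignment, and excluded volumes.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = Tuple[str, Point]  # (hypothetical feature type, 3D position in Å)


def matches(query_types: Sequence[str], query_dists: Sequence[Sequence[float]],
            features: Sequence[Feature], tol: float = 1.0) -> bool:
    """True if some assignment of molecule features to query points has the
    right feature types and reproduces every query distance within tol (Å)."""
    n = len(query_types)
    for combo in permutations(range(len(features)), n):
        # Skip assignments whose feature types do not match the query.
        if any(features[c][0] != t for c, t in zip(combo, query_types)):
            continue
        # Check every pairwise distance against the query distance matrix.
        if all(abs(math.dist(features[combo[a]][1], features[combo[b]][1])
                   - query_dists[a][b]) <= tol
               for a in range(n) for b in range(a + 1, n)):
            return True
    return False


def screen(library: Dict[str, Sequence[Feature]], query_types: Sequence[str],
           query_dists: Sequence[Sequence[float]], tol: float = 1.0) -> List[str]:
    """Names of the library entries whose features satisfy the query."""
    return [name for name, feats in library.items()
            if matches(query_types, query_dists, feats, tol)]
```

The brute-force permutation search grows exponentially with the number of query points, which is acceptable for small queries in a sketch but would be replaced by a smarter search in practice.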
Metalloproteins are a family of proteins characterized by metal ion binding, and the bound ions confer key catalytic and ligand-binding properties. Because metalloproteins are ubiquitous in biological systems, researchers have made immense efforts to predict their structural and functional roles. A comprehensive understanding of metalloproteins will ultimately lead to tangible applications, such as the design of potent inhibitors in drug discovery. The number of studies applying machine learning to predict metalloprotein properties has recently grown rapidly, driven primarily by the advent of more sophisticated machine learning algorithms. This review covers how machine learning tools have consolidated and expanded our understanding of various aspects of metalloproteins (structure, function, stability, ligand-binding interactions, and inhibitors). Future avenues of exploration are also discussed.
Predicting protein function and structure from sequence remains an unsolved problem in bioinformatics. The best-performing methods rely heavily on evolutionary information from multiple sequence alignments, so their accuracy deteriorates for sequences with few homologs, and, given ever-growing sequence databases, they require long computation times. Here, a single-sequence-based prediction method called ProteinUnet is presented, which leverages a U-Net convolutional network architecture. It is compared with the SPIDER3-Single model, which is based on a bidirectional long short-term memory (LSTM) recurrent neural network architecture. Both methods achieve similar results for the prediction of secondary structure (both three- and eight-state), half-sphere exposure, and contact number, but ProteinUnet has half as many parameters, runs inference 17 times faster, and can be trained 11 times faster. Moreover, ProteinUnet tends to perform better for short sequences and for residues with a low number of local contacts. Additionally, loss weighting is shown to be an effective way to increase accuracy for rare secondary structures.
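The abstract does not specify the loss-weighting scheme. A common choice, assumed here purely for illustration, is inverse-frequency class weighting of the cross-entropy loss, so that errors on rare secondary-structure classes cost more. The function names and the `H`/`E`/`C` labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by N / (K * count_c), so rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs: Sequence[Dict[Hashable, float]],
                           labels: Sequence[Hashable],
                           weights: Dict[Hashable, float]) -> float:
    """Weighted mean of -log p(true class) over residues (assumed scheme)."""
    total = norm = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        norm += w
    return total / norm
```

Dividing by the summed weights keeps the loss on the same scale as an unweighted cross-entropy: a prediction of 0.5 for each of two classes still scores log 2 whatever the weights are.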
A designer monomeric protein with a βαβ fold (two parallel β strands connected by an α helix; see structure) was constructed solely from coded amino acids. The high thermal stability of the structure is due largely to tryptophan–tryptophan interactions between the two β strands.
Many applications require enzymes with increased tolerance towards thermal and chemical stress, which can be achieved by macrocyclizing the enzyme to stabilize its tertiary structure. So far, macrocyclization approaches have drawn on very limited structural diversity, which complicates the design process. Herein, we report an approach that enables cyclization by installing modular crosslinks into native proteins composed entirely of proteinogenic amino acids. Our stabilization procedure introduces three surface-exposed cysteine residues, which are reacted with a triselectrophile, resulting in the in situ cyclization of the protein (INCYPRO). We designed a bicyclic version of sortase A that exhibits increased tolerance towards both thermal and chemical denaturation and proved efficient in protein labeling under denaturing conditions. In addition, we applied INCYPRO to the KIX domain, increasing its thermal stability by up to 24 °C.
Deep learning methods for RNA secondary structure prediction have shown higher performance than traditional methods, but there is still much room for improvement. RNAs vary widely in length, and so do their secondary structures. However, current deep learning methods all use length-independent models, which makes it difficult for them to learn such diverse secondary structures. Here, we propose a length-dependent model obtained by further training the length-independent model on separate length ranges of RNAs through transfer learning. 2dRNA, a coupled deep learning neural network for RNA secondary structure prediction, is used for this purpose. Benchmarking shows that the length-dependent model performs better than the usual length-independent model.
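A minimal sketch of how a length-dependent model might be used at inference time, assuming one fine-tuned copy of the length-independent model per length range. The range boundaries, the `LengthDependentPredictor` class, and the fallback to the base model are hypothetical: the abstract does not give the ranges used with 2dRNA, and the transfer-learning step itself (fine-tuning the base model on each range) is not shown.

```python
from bisect import bisect_right
from typing import Callable, Optional, Sequence

Predictor = Callable[[str], str]  # sequence -> predicted structure (placeholder)


class LengthDependentPredictor:
    """Route each RNA sequence to the model fine-tuned for its length range."""

    def __init__(self, boundaries: Sequence[int],
                 range_models: Sequence[Optional[Predictor]],
                 base_model: Predictor) -> None:
        # boundaries [100, 200] define the ranges [0, 100), [100, 200), [200, inf).
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        if len(range_models) != len(boundaries) + 1:
            raise ValueError("need one model (or None) per length range")
        self.boundaries = list(boundaries)
        self.range_models = list(range_models)
        self.base_model = base_model

    def predict(self, seq: str) -> str:
        idx = bisect_right(self.boundaries, len(seq))
        # Fall back to the length-independent model if a range has no fine-tuned copy.
        model = self.range_models[idx] or self.base_model
        return model(seq)
```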
Transition metal (TM) complexes exhibit diverse structural and electronic properties. The properties of a TM complex can be tuned by modulating the ligand field strength (LFS) exerted by its ligands. Current quantification of ligand LFS is mainly derived from experimental measurements on a subset of highly symmetrical TM complexes and is limited in ligand scope. Herein, we report a data-driven method to quantify the LFS of ligands based on spin states assigned from experimental crystal structures of TM complexes. We first show that the experimental metal–ligand bond lengths of over 4,000 mononuclear Fe, Co, and Mn complexes form bimodal distributions. Using Gaussian fits to these bimodal distributions, each TM complex is assigned a spin-state (SS) label. These SS labels are then used to calculate the LFS of the ligands of the complexes. Using the resulting data-driven LFS metric, we establish that a semi-supervised deep generative model, the junction tree variational autoencoder (JTVAE), can be employed to predict LFS values. Our model exhibits a mean absolute error (MAE) of 0.047 and a root-mean-squared error (RMSE) of 0.072 on the training set. The model also enables the generation of novel ligands with desirable LFS values.
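The spin-state labeling step can be sketched as a two-component Gaussian mixture fitted by expectation-maximization, with each complex assigned to the more probable component. This is a minimal sketch under stated assumptions: one representative (for example, mean) metal–ligand bond length per complex, the shorter-bond component taken as low spin (LS) following the usual observation that low-spin complexes have shorter metal–ligand bonds, and hypothetical function names. The authors' actual fitting procedure may differ.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(xs: Sequence[float],
                      iters: int = 200) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1D two-component Gaussian mixture by expectation-maximization."""
    xs = sorted(xs)
    if len(xs) < 2:
        raise ValueError("need at least two bond lengths")
    n, half = len(xs), len(xs) // 2
    # Initialize the two means from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    spread = (xs[-1] - xs[0]) / 4 or 1e-3
    sigma, pi = [spread, spread], [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            w = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
            s = sum(w) or 1e-300
            resp.append([wk / s for wk in w])
        # M-step: re-estimate mixing weights, means, and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) or 1e-12
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-4)
    return pi, mu, sigma


def assign_spin_state(bond_length: float, pi: List[float], mu: List[float],
                      sigma: List[float]) -> str:
    """LS or HS by the component with the larger weighted density.
    Assumes the shorter-bond component is low spin."""
    ls = 0 if mu[0] < mu[1] else 1
    hs = 1 - ls
    p_ls = pi[ls] * normal_pdf(bond_length, mu[ls], sigma[ls])
    p_hs = pi[hs] * normal_pdf(bond_length, mu[hs], sigma[hs])
    return "LS" if p_ls >= p_hs else "HS"
```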
The question of molecular similarity is central to cheminformatics and is usually addressed via pairwise comparison of vectors of properties or molecular fingerprints. We recently used variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distances within the resulting latent space could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to shape the latent space so as to cluster molecules by any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning, namely transformers, contrastive learning, and an embedded autoencoder, to create a novel and disentangled latent space. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we did the same for various antibiotics related to flucloxacillin. Transformers, especially when coupled with contrastive learning as here, effectively provide one-shot learning and yield a successful, disentangled representation of molecular latent spaces that uses the entire training set in its construction while allowing “similar” molecules to cluster together in an effective and interpretable way.
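Contrastive learning is typically driven by a loss that pulls paired embeddings together and pushes other batch members apart. The sketch below shows the widely used InfoNCE form of such a loss, together with a Euclidean nearest-neighbor query of the kind used to inspect a molecule's neighbors. The abstract does not state which contrastive loss was used or how positive pairs were formed (two different SMILES strings of the same molecule would be one option), so both are assumptions here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchor i must identify positives[i] among all positives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / n


def nearest_neighbors(query: Vector, latents: Sequence[Vector], k: int = 3) -> List[int]:
    """Indices of the k latent vectors closest to query by Euclidean distance."""
    return sorted(range(len(latents)), key=lambda j: math.dist(query, latents[j]))[:k]
```

Correctly paired batches give a loss near zero, while mismatched pairs are penalized heavily, which is what drives similar molecules together in the latent space.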
RNA molecules participate in many important biological processes, and they must fold into well-defined secondary and tertiary structures to function. As with the well-known protein folding problem, there is an RNA folding problem. It has two aspects: structure prediction and folding mechanism. Although the former has been widely studied, the latter is still not well understood. Here we present 2dRNA-Fold, a deep reinforcement learning algorithm for studying the fastest folding paths of RNA secondary structure. Given an RNA sequence, 2dRNA-Fold uses a neural network combined with Monte Carlo tree search to select residue pairings step by step until the final secondary structure is formed. We apply 2dRNA-Fold to several short RNA molecules and to one longer RNA, 1Y26, and find that their fastest folding paths show some interesting features. 2dRNA-Fold is further trained on a set of RNA molecules from the bpRNA dataset and used to predict RNA secondary structure. Because 2dRNA-Fold scores the next step on the basis of possible base pairings, the learned or predicted fastest folding path may not agree with the actual folding path determined by free energy according to physical laws.
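The step-by-step construction of a folding path can be illustrated with a simplified, hypothetical stand-in: at each step, enumerate the base pairs that can still be added, score them, and add one. In 2dRNA-Fold this choice is made by a neural network combined with Monte Carlo tree search; the sketch below replaces both with a greedy policy and a hypothetical `toy_score`. The allowed pair types (Watson–Crick plus G–U wobble), the minimum hairpin loop of three nucleotides, and the exclusion of pseudoknots are also assumptions.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]
# Assumed pairing rules: Watson-Crick pairs plus G-U wobble, hairpin loop >= 3 nt.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def legal_moves(seq: str, pairs: List[Pair]) -> List[Pair]:
    """All base pairs that can be added without clashing or forming a pseudoknot."""
    paired = {i for p in pairs for i in p}
    moves = []
    for i in range(len(seq)):
        if i in paired:
            continue
        for j in range(i + MIN_LOOP + 1, len(seq)):
            if j in paired or (seq[i], seq[j]) not in ALLOWED:
                continue
            if any(i < k < j < l or k < i < l < j for k, l in pairs):
                continue  # would cross an existing pair
            moves.append((i, j))
    return moves


def fold_greedy(seq: str, score: Callable[[str, List[Pair], Pair], float]) -> List[Pair]:
    """Add the best-scoring pair at each step; stop when no move scores above 0."""
    path: List[Pair] = []
    while True:
        moves = legal_moves(seq, path)
        if not moves:
            return path
        best = max(moves, key=lambda m: score(seq, path, m))
        if score(seq, path, best) <= 0:
            return path
        path.append(best)


def dot_bracket(n: int, pairs: List[Pair]) -> str:
    s = ["."] * n
    for i, j in pairs:
        s[i], s[j] = "(", ")"
    return "".join(s)


def toy_score(seq: str, pairs: List[Pair], move: Pair) -> float:
    """Hypothetical score: favor G-C pairs, stacking, and long-range pairs."""
    i, j = move
    gc = 1.0 if {seq[i], seq[j]} == {"G", "C"} else 0.0
    stack = 1.0 if (i + 1, j - 1) in pairs or (i - 1, j + 1) in pairs else 0.0
    return gc + 2.0 * stack + 0.01 * (j - i) - 0.5
```

For `GGGAAAACCC`, this greedy policy produces the path [(0, 9), (1, 8), (2, 7)], i.e. `(((....)))`, because once the outermost pair forms, the stacking bonus dominates the choice of each subsequent pair.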
Recently developed reduced models of proteins with knowledge-based force fields have been applied to a specific case of comparative modeling. From twenty high-resolution protein structures of various structural classes, significant fragments of their chains were removed and treated as unknown. The remaining portions of the structures were held fixed, i.e., treated as templates with an exact alignment. The missing fragments were then reconstructed using several modeling tools. These included three reduced protein models: the lattice SICHO (Side Chain Only) model, the lattice CABS (Cα + Cβ + Side group) model, and an off-lattice model similar to CABS, called REFINER. The reduced models were compared with more standard comparative modeling tools such as MODELLER and the SWISS-MODEL server. The results of the reduced models are qualitatively better for the higher-resolution lattice models, clearly suggesting that these are now mature, competitive, and complementary (in the range of sparse alignments) to the classical tools of comparative modeling. Comparison among the reduced models strongly suggests that the essential ingredient for successful and accurate modeling of protein structures is not the representation of conformational space (lattice, off-lattice, all-atom) but rather the specificity of the force field used and, perhaps, the sampling techniques employed. These conclusions are encouraging for the future application of fast reduced models in comparative modeling on a genomic scale.
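Because the remaining portions of each structure were held fixed in their native positions, a reconstructed fragment shares the native coordinate frame, so its accuracy can be measured as a coordinate RMSD over the fragment's atoms without any superposition. The abstract does not state whether the study used exactly this measure; the sketch below assumes Cα coordinates and a hypothetical `fragment_rmsd` helper.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def fragment_rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Superposition-free RMSD between matched atoms of a rebuilt fragment
    and the native fragment (valid because both share the fixed template frame)."""
    if len(model) != len(native) or not model:
        raise ValueError("fragments must be non-empty and of equal length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(sq / len(model))
```

Superimposing the fragment alone before computing RMSD would hide errors in how the rebuilt loop is positioned relative to the fixed template, which is exactly what this test setup probes.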
In this contribution, we present an algorithm for protein backbone reconstruction that combines very high computational efficiency with high accuracy. Reconstructing main-chain atomic coordinates from the α-carbon trace is a common task in protein modeling, including de novo structure prediction, comparative modeling, and processing of experimental data. The method employed in this work follows the main idea of some earlier approaches to the problem. The details and careful design of the present approach are new and lead to an algorithm that outperforms all commonly used earlier programs. The BBQ (Backbone Building from Quadrilaterals) program has been extensively tested on both native structures and near-native decoy models and compared with other available methods. The results provide a comprehensive benchmark of existing tools and evaluate their applicability to large-scale modeling using a reduced representation of protein conformational space. The BBQ package can be downloaded from our website at http://biocomp.chem.uw.edu.pl/services/BBQ/, which also provides a user manual describing the BBQ functions in detail.
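One way to read the quadrilateral idea in BBQ's name: four consecutive Cα atoms are summarized by a few inter-atomic distances, binned into a lookup key, and a local frame built from the Cα atoms is used to place backbone atoms taken from a fragment library. This is a hypothetical sketch, not BBQ's implementation. The choice of the three distances (with d14 signed by handedness), the 0.2 Å bin width, and all function names are assumptions, and the fragment library itself is not shown.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def quad_key(ca: Sequence[Vec], i: int, bin_width: float = 0.2) -> Tuple[int, int, int]:
    """Binned (d13, d24, signed d14) for the Cα quadrilateral i-1, i, i+1, i+2."""
    p1, p2, p3, p4 = ca[i - 1], ca[i], ca[i + 1], ca[i + 2]
    d13, d24, d14 = math.dist(p1, p3), math.dist(p2, p4), math.dist(p1, p4)
    # Give d14 the sign of the quadrilateral's handedness so mirror images differ.
    if dot(cross(sub(p2, p1), sub(p3, p2)), sub(p4, p3)) < 0:
        d14 = -d14
    return (math.floor(d13 / bin_width), math.floor(d24 / bin_width),
            math.floor(d14 / bin_width))


def local_frame(p1: Vec, p2: Vec, p3: Vec) -> Tuple[Vec, Vec, Vec]:
    """Right-handed orthonormal axes built from three consecutive Cα positions."""
    x = normalize(sub(p3, p1))
    z = normalize(cross(sub(p2, p1), sub(p3, p1)))
    y = cross(z, x)
    return x, y, z


def place(offset: Vec, origin: Vec, frame: Tuple[Vec, Vec, Vec]) -> Vec:
    """Map a library offset (local-frame coordinates) to global coordinates."""
    x, y, z = frame
    return tuple(origin[a] + offset[0] * x[a] + offset[1] * y[a] + offset[2] * z[a]
                 for a in range(3))
```

Storing backbone atom positions as offsets in a Cα-derived local frame makes the library entries independent of where the segment sits in space, so a single lookup by `quad_key` can serve every occurrence of a similar quadrilateral.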
A Fourier transform infrared (FTIR) spectroscopy assay to measure hydrogen–deuterium exchange (HDX) in surface-adsorbed protein monolayers was developed to provide information on protein tertiary structure, because conventional secondary-structure analysis showed our surface and solution protein samples to be very similar. HDX of adsorbed protein is quantified by exposing the protein to a 50% deuterated NaPO4 buffer solution and then measuring the normalized intensity change of the amide II band in the FTIR reflection spectrum. When collected as a function of exchange time, this intensity follows the kinetics of the exposure of the protein amides to solvent. HDX kinetics were obtained for bovine serum albumin (BSA) in solution and adsorbed to gold surfaces. In experiments designed to allow comparison between protein in solution and on surfaces, the extent of HDX was greater for adsorbed BSA than for BSA in solution, consistent with increased exposure of albumin amide groups and protein unfolding upon adsorption. We also show that BSA adsorbs to gold surfaces in multilayers and that the increase in amide exposure is present only in the first adsorbed monolayer. Published in 2009 by John Wiley & Sons, Ltd.
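The conversion from amide II intensity to extent of exchange can be sketched as follows. This assumes that intensities are normalized to the value at the start of exposure and that, in a buffer with deuterium fraction f (0.5 here), a fully exchanged amide population loses a fraction f of its amide II (N–H) intensity at equilibrium, ignoring isotope fractionation and back-exchange. These assumptions and the function names are illustrative, not the authors' stated analysis.

```python
from typing import List, Sequence


def normalized_intensity(amide_ii: Sequence[float]) -> List[float]:
    """Amide II intensity at each exchange time divided by its t = 0 value."""
    i0 = amide_ii[0]
    return [i / i0 for i in amide_ii]


def fraction_exchanged(amide_ii: Sequence[float],
                       solvent_d_fraction: float = 0.5) -> List[float]:
    """Estimated fraction of amides exchanged at each time point.

    Assumes an exchanged amide in a buffer of deuterium fraction f loses a
    fraction f of its amide II intensity at equilibrium (no isotope
    fractionation, no back-exchange)."""
    return [(1.0 - r) / solvent_d_fraction for r in normalized_intensity(amide_ii)]
```

Plotting these values against exchange time gives the kinetic curves described above; under these assumptions, a higher curve means more amides exposed to solvent.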