Similar literature
 Found 20 similar documents; search took 31 ms
1.
Selecting folded proteins from a library of secondary structural elements   Total citations: 1, self-citations: 0, citations by others: 1
A protein evolution strategy is described by which double-stranded DNA fragments encoding defined Escherichia coli protein secondary structural elements (alpha-helices, beta-strands, and loops) are assembled semirandomly into sequences of as many as 800 amino acid residues. A library of novel polypeptides generated from this system was inserted into an enhanced green fluorescent protein (EGFP) fusion vector. Library members were screened by fluorescence activated cell sorting (FACS) to identify polypeptides that fold into soluble, stable structures in vivo; these comprised a subset of shorter sequences (approximately 60 to 100 residues) from the semirandom sequence library. Approximately 10^8 clones were screened by FACS, a set of 1149 high fluorescence colonies was characterized by dPCR, and four soluble clones with varying amounts of secondary structure were identified. One of these is highly homologous to a domain of aspartate racemase from a marine bacterium (Polaromonas sp.) but is not homologous to any E. coli protein sequence. Several other selected polypeptides have no global sequence homology to any known protein but show significant alpha-helical content, limited dispersion in 1D nuclear magnetic resonance spectra, pH-sensitive ANS binding, and reversible folding into soluble structures. These results demonstrate that this strategy can generate novel polypeptide sequences containing secondary structure.

2.
A Carr-Purcell-Meiboom-Gill relaxation dispersion experiment is presented for quantifying millisecond time-scale chemical exchange at side-chain ¹H positions in proteins. Such experiments are not possible in a fully protonated molecule because of magnetization evolution from homonuclear scalar couplings that interferes with the extraction of accurate transverse relaxation rates. It is shown, however, that by using a labeling strategy whereby proteins are produced using {¹³C,¹H}-glucose and D₂O, a significant number of 'isolated' side-chain ¹H spins are generated, eliminating such effects. It thus becomes possible to record ¹H dispersion profiles at the β positions of Asx, Cys, Ser, His, Phe, Tyr, and Trp as well as the γ positions of Glx, in addition to the methyl side-chain moieties. This brings the total of amino acid side-chain positions that can be simultaneously probed using a single ¹H dispersion experiment to 16. The utility of the approach is demonstrated with an application to the four-helix bundle colicin E7 immunity protein, Im7, which folds via a partially structured, low-populated intermediate that interconverts with the folded ground state on the millisecond time-scale. The extracted ¹H chemical shift differences at side-chain positions provide valuable restraints in structural studies of invisible, excited states, complementing backbone chemical shifts that are available from existing relaxation dispersion experiments.

3.
The generation of proteins, especially enzymes, with predefined, novel properties is a major challenge in the field of protein engineering. Over the years, this aim has been greatly facilitated by newly emerging combinatorial and evolutionary techniques, such as combinatorial gene synthesis followed by functional screening of many structural variants generated in parallel (a library). Libraries can be generated by a large number of available methods. Among these, the use of mixtures of pre-formed trinucleotide blocks representing codons for the 20 canonical amino acids in oligonucleotide synthesis stands out, because it allows fully controlled partial (or total) randomization individually at any number of arbitrarily chosen codon positions of a given gene. This has created substantial demand for fully protected trinucleotide synthons that react well in standard oligonucleotide synthesis. We here review methods for the preparation of oligonucleotide mixtures with a strong focus on codon-specific trinucleotide blocks.

4.
Methods of artificial evolution such as SELEX and in vitro selection have made it possible to isolate RNA and DNA motifs with a wide range of functions from large random sequence libraries. Once the primary sequence of a functional motif is known, the sequence space around it can be comprehensively explored using a combination of random mutagenesis and selection. However, methods to explore the sequence space of a secondary structure are not as well characterized. Here we address this question by describing a method to construct libraries in a single synthesis which are enriched for sequences with the potential to form a specific secondary structure, such as that of an aptamer, ribozyme, or deoxyribozyme. Although interactions such as base pairs cannot be encoded in a library using conventional DNA synthesizers, it is possible to modulate the probability that two positions will have the potential to pair by biasing the nucleotide composition at these positions. Here we show how to maximize this probability for each of the possible ways to encode a pair (in this study defined as A-U or U-A or C-G or G-C or G.U or U.G). We then use these optimized coding schemes to calculate the number of different variants of model stems and secondary structures expected to occur in a library for a series of structures in which the number of pairs and the extent of conservation of unpaired positions is systematically varied. Our calculations reveal a tradeoff between maximizing the probability of forming a pair and maximizing the number of possible variants of a desired secondary structure that can occur in the library. They also indicate that the optimal coding strategy for a library depends on the complexity of the motif being characterized. 
Because this approach provides a simple way to generate libraries enriched for sequences with the potential to form a specific secondary structure, we anticipate that it should be useful for the optimization and structural characterization of functional nucleic acid motifs.
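The pairing-probability idea in this abstract can be illustrated with a short calculation. This is a minimal sketch and not the authors' code: the function name `pair_probability` and the example mixtures are assumptions made for illustration, and the two positions are assumed to be synthesized independently. The set of allowed pairs follows the definition given in the abstract (Watson-Crick pairs plus G.U wobble pairs).

```python
# Allowed pairs as defined in the abstract: Watson-Crick plus G.U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_probability(pos1, pos2):
    """Probability that two independently synthesized positions can pair.

    pos1 and pos2 map nucleotide -> fraction at that position (summing to 1).
    """
    return sum(
        f1 * f2
        for n1, f1 in pos1.items()
        for n2, f2 in pos2.items()
        if (n1, n2) in PAIRS
    )


# Hypothetical mixtures, chosen only for illustration.
uniform = {n: 0.25 for n in "ACGU"}   # fully random position
fixed_g = {"G": 1.0}                  # position fixed to G
c_or_u = {"C": 0.5, "U": 0.5}         # equal mix of C and U

print(pair_probability(uniform, uniform))  # 6 of the 16 combinations can pair
print(pair_probability(fixed_g, c_or_u))   # always pairs, but only 2 variants
```

The second mixture shows the tradeoff the abstract mentions: biasing the composition raises the chance of pairing while reducing the number of distinct variants at those positions.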

5.
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability: some folds are more designable, meaning that many sequences can adopt them. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. The unique conformation giving the minimum energy is identified for each sequence, and the number of sequences folding to each conformation (its designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that highly designable structures can be distinguished from non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
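The threading step described in this abstract can be sketched on a very small lattice. This is only an illustration under stated assumptions, not the study's implementation: it uses a 3 x 3 square lattice, an energy of -1 per H-H contact (a common HP-model choice that the abstract does not specify), and it identifies conformations by their contact maps, so conformations sharing a contact map are merged. The function names are hypothetical, and the graph features and machine learning steps are not covered.

```python
from itertools import product


def compact_paths(n):
    """Enumerate all directed self-avoiding walks that fill an n x n lattice."""
    paths = []

    def extend(path, visited):
        if len(path) == n * n:
            paths.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in product(range(n), repeat=2):
        extend([start], {start})
    return paths


def contact_map(path):
    """Residue pairs that are lattice neighbours but not chain neighbours."""
    index = {site: i for i, site in enumerate(path)}
    contacts = set()
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def designability(n):
    """For each contact map, count HP sequences with it as unique ground state."""
    maps = {contact_map(p) for p in compact_paths(n)}
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=n * n):
        # Assumed energy: -1 for every H-H contact, 0 otherwise.
        energies = {m: -sum(seq[i] == "H" == seq[j] for i, j in m) for m in maps}
        best = min(energies.values())
        ground = [m for m, e in energies.items() if e == best]
        if len(ground) == 1:  # degenerate ground states are not counted
            counts[ground[0]] += 1
    return counts


counts = designability(3)
for cmap, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(count, sorted(cmap))
```

Sequences whose minimum energy is shared by several contact maps are left uncounted, which mirrors the requirement for a unique minimum-energy conformation.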

6.
Evolutionary computing is a general optimization mechanism successfully implemented for a variety of numeric problems in a variety of fields, including structural biology. We here present an evolutionary approach to optimize helix stability in peptides and proteins, employing the AGADIR energy function for helix stability as the scoring function. With the ability to apply masks specifying positions that are to remain constant or restricted to a certain class of amino acids, our algorithm is capable of developing stable helical scaffolds containing a wide variety of structural and functional amino acid patterns. The algorithm showed good convergence behaviour in all tested cases and can be parameterized in a wide variety of ways. We have applied our algorithm to optimize the stability of prion protein helix 1, a structural element of the prion protein which is thought to play a crucial role in the conformational transition from the cellular to the pathogenic form of the prion protein, and which therefore is an interesting target for pharmacological as well as genetic engineering approaches to counter the as yet incurable prion diseases. NMR spectroscopic investigations of selected stabilizing and destabilizing mutations found by our algorithm demonstrated its ability to create stabilized variants of secondary structure elements.

7.
On the basis of information on the evolution of the 20 amino acids and their physicochemical characteristics, we propose a new two-dimensional (2D) graphical representation of protein sequences in this article. With this representation method, we use 2D data to represent three-dimensional information constructed from the amino acids' evolution index, the class information of each amino acid based on physicochemical characteristics, and the order in which the amino acids appear in the protein sequences. Then, using the discrete Fourier transform, sequence signals of different lengths can be transformed to the frequency domain, in which all sequences have the same length. A new method is used to analyze protein sequence similarity and to predict the protein structural class. The experiments indicate that our method is effective and useful.
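The frequency-domain comparison mentioned in this abstract can be sketched as follows. This is a rough illustration under explicit assumptions, not the published method: the numeric code used here is a placeholder (alphabetical rank of the one-letter code) rather than the paper's evolution index, and sequences are brought to a common length by zero-padding before the transform, which is one plausible way to equalize lengths that the abstract does not confirm. The function names are hypothetical.

```python
import cmath

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Placeholder numeric code (alphabetical rank), NOT the paper's evolution index.
CODE = {aa: rank + 1 for rank, aa in enumerate(AMINO_ACIDS)}


def power_spectrum(sequence, length):
    """Zero-pad the numeric signal to `length` and return its DFT power spectrum."""
    signal = [float(CODE[aa]) for aa in sequence]
    if len(signal) > length:
        raise ValueError("length must be at least the sequence length")
    signal += [0.0] * (length - len(signal))
    spectrum = []
    for k in range(length):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / length)
            for n, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between power spectra padded to a common length."""
    length = max(len(seq_a), len(seq_b))
    spec_a = power_spectrum(seq_a, length)
    spec_b = power_spectrum(seq_b, length)
    return sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)) ** 0.5


# Toy sequences of different lengths, invented for illustration.
print(spectral_distance("ACDEFG", "ACDEFGHIK"))
```

The direct summation is quadratic in the sequence length; for real data a fast Fourier transform routine would be the usual choice.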

8.
Glycoengineering is a recently adopted approach to extend the serum half-life of valuable protein therapeutics. One aspect of glycoengineering is to introduce new N-glycosylation sites (Asn-X-Thr/Ser, where X ≠ Pro) into desirable positions in the peptide backbone, resulting in a hyperglycosylated protein. In this study, human luteinizing hormone (LH) was examined to identify suitable positions for the addition of new N-linked glycosylation sites. A rational in silico approach was applied to predict the structural and functional alterations caused by changes in the amino acid sequence. As the first step, we explored the amino acid sequence of LH to find desirable positions for introducing Asn and/or Thr to create new N-glycosylation sites. This exploration led to the identification of 38 potential N-glycan sites, from which four acceptable ones were selected for further analysis. Three-dimensional (3D) structures of the selected analogs were generated and examined by model evaluation methods. Finally, two analogs with one additional glycosylation site each were suggested as qualified analogs for hyperglycosylation of LH, which can be considered for further experimental investigations. Our computational strategy can reduce laborious and time-consuming experimental analyses of the analogs.

9.
The positions of a given fold always occupied by strong hydrophobic amino acids (V, I, L, F, M, Y, W), which we call “topohydrophobic positions”, were detected and their properties demonstrated within 153 non-redundant families of homologous domains, through 3D structural alignments. Sets of divergent sequences possessing at least four to five members appear to be as informative as larger sets, provided that their mean pairwise sequence identity is low. Amino acids in topohydrophobic positions exhibit several interesting features: they are much more buried than their equivalents in non-topohydrophobic positions, their side chains are far less dispersed, and they often constitute a lattice of close contacts in the inner core of globular domains. In most cases, each regular secondary structure possesses one to three topohydrophobic positions, which cluster in the domain core. Moreover, using sensitive alignment processes such as hydrophobic cluster analysis (HCA), it is possible to identify topohydrophobic positions from only a small set of divergent sequences. Amino acids in topohydrophobic positions, which can be identified directly from sequences, constitute key markers of protein folds and define long-range structural constraints which, together with secondary structure predictions, limit the number of possible conformations for a given fold. Received: 24 April 1998 / Accepted: 4 August 1998 / Published online: 16 November 1998

10.
The binding properties of a peptidoglycan recognition protein are translated via combinatorial chemistry into short peptides. Non-adjacent histidine, tyrosine, and arginine residues in the protein’s binding cleft that associate specifically with the glycan moiety of a peptidoglycan substrate are incorporated into linear sequences, creating a library of 27 candidate tripeptide reagents (three possible residues permuted across three positions). Upon electrospraying the peptide library and carbohydrate mixtures, some noncovalent complexes are observed. The binding efficiencies of the peptides vary according to their amino acid composition as well as the disaccharide linkage and carbohydrate ring type. In addition to providing a charge carrier for the carbohydrate, peptide reagents can also be used to differentiate carbohydrate isomers by ion mobility spectrometry. The utility of these peptide reagents as a means of enhancing ion mobility analysis of carbohydrates is illustrated by examining four glucose-containing disaccharide isomers, including a pair that is not resolved by ion mobility alone. The specificity and stoichiometry of the peptide–carbohydrate complexes are also investigated. Trihistidine demonstrates both suitable binding efficiency and successful resolution of disaccharide isomers, suggesting it may be a useful reagent in IMS analyses of carbohydrates.
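The size of the tripeptide library follows directly from the combinatorics stated in this abstract: three residues at each of three positions gives 3 × 3 × 3 = 27 sequences. A small sketch of the enumeration, using one-letter codes (the variable names are illustrative only):

```python
from itertools import product

RESIDUES = ("H", "Y", "R")  # histidine, tyrosine, arginine

# Every ordered choice of one residue per position.
library = ["".join(p) for p in product(RESIDUES, repeat=3)]

print(len(library))
print(library)
```

Trihistidine, the peptide highlighted at the end of the abstract, appears in this list as "HHH".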

11.
Protein structural class prediction for low-similarity sequences is a significant challenge and a deeply explored subject. It plays an important role in drug design, protein fold recognition, functional analysis, and several other biological applications. In this paper, we worked with two benchmark databases from the literature, (1) 25PDB and (2) 1189, to apply our proposed method for predicting protein structural class. Initially, we transformed protein sequences into DNA sequences and then into binary sequences. We then applied symmetrical recurrence quantification analysis (the new approach), obtaining 8 features from each symmetry plot computation. The machine learning algorithms Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine (SVM) were used, and they were compared to find the best classifier for protein structural class prediction. Results show that symmetrical recurrence quantification as a feature extraction method with the RF classifier outperformed existing methods, with an overall accuracy of 100% without overfitting.

12.
Protein structural class prediction solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. Using novel combined sequence information coupled with predicted secondary structural features (PSSF), we proposed a novel scheme to improve prediction of protein structural classes. Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features to obtain the combined sequence information. Then we added the PSSF to the combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. The comparison with existing methods shows overall improvements ranging from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.

13.
Several machine learning algorithms have recently been applied to modeling the specificity of HIV-1 protease. The problem is challenging because of three issues: (1) datasets with high dimensionality and a small number of samples could misguide classification modeling and its interpretation; (2) symbolic interpretation is desirable because it provides insight into the specificity in the form of human-understandable rules, and thus helps us to design effective HIV inhibitors; (3) the interpretation should take into account complexity or dependency between positions in sequences. Therefore, it is necessary to investigate multivariate and feature-selective methods to model the specificity and to extract rules from the model. We have extensively tested various machine learning methods, and we have found that the combination of neural networks and a decompositional approach can generate a set of effective rules. When validated against experimental results for the HIV-1 protease, the specificity rules outperform those generated by frequency-based, univariate, or black-box methods.

14.
The new FLYA automated resonance assignment algorithm determines NMR chemical shift assignments on the basis of peak lists from any combination of multidimensional through-bond or through-space NMR experiments for proteins. Backbone and side-chain assignments can be determined. All experimental data are used simultaneously, thereby exploiting optimally the redundancy present in the input peak lists and circumventing potential pitfalls of assignment strategies in which results obtained in a given step remain fixed input data for subsequent steps. Instead of prescribing a specific assignment strategy, the FLYA resonance assignment algorithm requires only experimental peak lists and the primary structure of the protein, from which the peaks expected in a given spectrum can be generated by applying a set of rules, defined in a straightforward way by specifying through-bond or through-space magnetization transfer pathways. The algorithm determines the resonance assignment by finding an optimal mapping between the set of expected peaks that are assigned by definition but have unknown positions and the set of measured peaks in the input peak lists that are initially unassigned but have a known position in the spectrum. Using peak lists obtained by purely automated peak picking from the experimental spectra of three proteins, FLYA assigned correctly 96-99% of the backbone and 90-91% of all resonances that could be assigned manually. Systematic studies quantified the impact of various factors on the assignment accuracy, namely the extent of missing real peaks and the amount of additional artifact peaks in the input peak lists, as well as the accuracy of the peak positions. Comparing the resonance assignments from FLYA with those obtained from two other existing algorithms showed that using identical experimental input data these other algorithms yielded significantly (40-142%) more erroneous assignments than FLYA. 
The FLYA resonance assignment algorithm thus has the reliability and flexibility to replace most manual and semi-automatic assignment procedures for NMR studies of proteins.

15.
The efficient synthesis of small molecules having many molecular skeletons is an unsolved problem in diversity-oriented synthesis (DOS). We describe the development and application of a synthesis strategy that uses common reaction conditions to transform a collection of similar substrates into a collection of products having distinct molecular skeletons. The substrates have different appendages that pre-encode skeletal information, called sigma-elements. This approach is analogous to the natural process of protein folding in which different primary sequences of amino acids are transformed into macromolecules having distinct three-dimensional structures under common folding conditions. Like sigma-elements, the amino acid sequences pre-encode structural information. An advantage of using folding processes to generate skeletal diversity in DOS is that skeletal information can be pre-encoded into substrates in a combinatorial fashion, similar to the way protein structural information is pre-encoded combinatorially in polypeptide sequences, thus making it possible to generate skeletal diversity in an efficient manner. This efficiency was realized in the context of a fully encoded, split-pool synthesis of approximately 1260 compounds potentially representing all possible combinations of building block, stereochemical, and skeletal diversity elements.

16.
Advancements in sequencing techniques place personalized genomic medicine upon the horizon, bringing along the responsibility of clinicians to understand the likelihood for a mutation to cause disease, and of scientists to separate etiology from nonpathologic variability. Pathogenicity is discernible from patterns of interactions between a missense mutation, the surrounding protein structure, and intermolecular interactions. Physicochemical stability calculations are not accessible without structures, which are lacking for the vast majority of human proteins, so diagnostic accuracy remains in its infancy. To model the effects of missense mutations on functional stability without structure, we combine novel protein sequence analysis algorithms to discern spatial distributions of sequence, evolutionary, and physicochemical conservation, through a new approach to optimize component selection. Novel components include a combinatory substitution matrix and two heuristic algorithms that detect positions which confer structural support to interaction interfaces. The method reaches 0.91 AUC in ten-fold cross-validation to predict alteration of function for 6,392 in vitro mutations. For clinical utility we trained the method on 7,022 disease-associated missense mutations within Online Mendelian Inheritance in Man amongst a larger randomized set. In a blinded prospective test to delineate mutations unique to 186 patients with craniosynostosis from those in the 95 highly variant Coriell controls and 1000 age-matched controls, we achieved roughly 1/3 sensitivity and perfect specificity. The component algorithms retained during machine learning constitute novel protein sequence analysis techniques to describe environments supporting neutrality or pathology of mutations. This approach to pathogenetics enables new insight into the mechanistic relationship of missense mutations to disease phenotypes in our patients.

17.
Accurately predicting phosphorylation sites in proteins is an important issue in postgenomics, and efficiently extracting the most predictive features from amino acid sequences for modeling remains challenging. Although both the distributed encoding method and the bio-basis function method work well, they have some limitations. The distributed encoding method is unable to code the biological content in sequences efficiently, whereas the bio-basis function method is a nonparametric method, which is often computationally expensive. As hidden Markov models (HMMs) can be used to generate one model for one cluster of aligned protein sequences, the aim of this study is to use HMMs to extract features from amino acid sequences, where sequence clusters are determined using available biological knowledge. In this novel method, HMMs are first constructed using functional sequences only. Both functional and nonfunctional training sequences are then input into the trained HMMs to generate functional and nonfunctional feature vectors. A machine learning algorithm is then used to construct a classifier based on these feature vectors. It is found in this work that (1) this method provides much better prediction accuracy than the use of HMMs alone for prediction, and (2) the support vector machine (SVM) algorithm outperforms decision tree and neural network algorithms when they are constructed on the features extracted using the trained HMMs.

18.
Optimisation problems pervade structural bioinformatics. In this review, we describe recent work addressing a selection of bioinformatics challenges. We begin with a discussion of research into protein structure comparison, and highlight the utility of Kolmogorov complexity as a measure of structural similarity. We then turn to research into de novo protein structure prediction, in which structures are generated from first principles. In this endeavour, there is a compromise between the detail of the model and the extent to which the conformational space of the protein can be sampled. We discuss some developments in this area, including off-lattice structure prediction using the great deluge algorithm. One strategy to reduce the size of the search space is to restrict the protein chain to sites on a regular lattice. In this context, we highlight the use of memetic algorithms, which combine genetic algorithms with local optimisation, in the study of simple protein models on the two-dimensional square lattice and the face-centred cubic lattice.

19.
Understanding pathogen–host interactions is important for elucidating the infection process at the molecular level, where pathogen proteins physically bind to human proteins to manipulate critical biological processes in the host cell. Data scarcity and data unavailability are two major problems for computational approaches to the prediction of pathogen–host interactions. Developing a computational method to predict pathogen–host interactions with high accuracy, based on protein sequences alone, is of great importance because it can eliminate these problems. In this study, we propose a novel and robust sequence-based feature extraction method, named Location Based Encoding, to predict pathogen–host interactions with machine learning algorithms. In this context, we use Bacillus anthracis and Yersinia pestis data sets as the pathogen organisms and human proteins as the host model to compare our method with sequence-based protein encoding methods that are widely used in the literature, namely amino acid composition, amino acid pair, and conjoint triad. We use these encoding methods with decision tree (Random Forest, J48), statistical (Bayesian Networks, Naive Bayes), and instance-based (kNN) classifiers to predict pathogen–host interactions. We conduct different experiments to evaluate the effectiveness of our method. We obtain the best results among all the experiments with the RF classifier in terms of F1, accuracy, MCC, and AUC.

20.
Point Accepted Mutation (PAM) is the Markov model of amino acid replacements in proteins introduced by Dayhoff and her co-workers (Dayhoff et al., 1978). The PAM matrices and other matrices based on the PAM model have been widely accepted as the standard scoring system of protein sequence similarity in protein sequence alignment tools. Here, we present Contact Accepted mutatiOn (CAO), a Markov model of protein residue contact mutations. The CAO model simulates the interchanging of structurally defined side-chain contacts, and introduces additional structural information into protein sequence alignments. Therefore, similarities between structurally conserved sequences can be detected even without apparent sequence similarity. CAO has been benchmarked on the HOMSTRAD database and a subset of the CATH database, by comparing sequence alignments with reference alignments derived from structural superposition. CAO yields scores that reflect coherently the structural quality of sequence alignments, which has implications particularly for homology modelling and threading techniques.
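The Markov property behind PAM-style models can be illustrated with a toy example: if one matrix gives replacement probabilities for a single evolutionary step, the probabilities after n steps are given by the n-th power of that matrix. This is a sketch under stated assumptions, not the PAM or CAO data: the three-state matrix `STEP` is invented for illustration, rows are taken as "from" states and columns as "to" states, and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(m, n):
    """n-th power of a square matrix by repeated multiplication (n >= 0)."""
    size = len(m)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step replacement probabilities for a 3-letter alphabet.
# Row i holds the probabilities that state i is replaced by each state j,
# so every row sums to 1.
STEP = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.01, 0.02, 0.97],
]

after_50_steps = mat_pow(STEP, 50)
for row in after_50_steps:
    print([round(p, 4) for p in row])
```

Scoring matrices used in alignment are typically derived from such multi-step probabilities as log-odds against background frequencies; that step is omitted here because it needs frequency data not given in the abstract.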


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号