Similar literature
Found 20 similar articles; search took 46 ms
1.
We illustrate solving the protein alignment problem exactly using the algorithm VESPA (very efficient search for protein alignment). We have compared our result with the approximate solution obtained with BLAST (basic local alignment search tool), which is currently the most widely used software for protein alignment searches. We have selected human and mouse proteins having around 170 amino acids for comparison. The exact solution has found 78 pairs of amino acids, to which one should add 17 individual amino acid alignments, giving a total of 95 aligned amino acids. BLAST has identified 64 aligned amino acids which involve pairs of more than two adjacent amino acids. However, the difference between the two outputs is not as large as it may appear, because a number of amino acids that are adjacent have been reported by BLAST as single amino acids. So if one counts all amino acids, whether isolated (single) or in a group of two or more amino acids, then the count for BLAST is 89 and for VESPA is 95, a difference of only six. © 2015 Wiley Periodicals, Inc.

2.
A novel characterization of proteins is presented based on selected properties of the recently introduced 20 × 20 amino acid adjacency matrix of proteins, in which matrix elements count the occurrences of all 400 possible pairwise adjacencies obtained by reading the protein primary sequence from left to right. In particular we consider the characterization based on the sum and the difference of the rows and the corresponding columns, which characterize proteins by a pair of 20-component vectors. The approach is illustrated on a set of ND6 proteins of eight species.
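The adjacency-matrix characterization described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alphabetical ordering of the 20 one-letter codes, the function names, and the reading of "sum and difference of the rows and the corresponding columns" as row total plus or minus column total are all choices made here, not details confirmed by the abstract.

```python
# Assumed ordering of the 20 standard amino acids (alphabetical one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def adjacency_matrix(sequence):
    """Count every ordered pair of adjacent residues, reading left to right."""
    n = len(AMINO_ACIDS)
    matrix = [[0] * n for _ in range(n)]
    for left, right in zip(sequence, sequence[1:]):
        matrix[INDEX[left]][INDEX[right]] += 1
    return matrix


def row_column_vectors(matrix):
    """Return two 20-component vectors: row total + column total, and
    row total - column total, for each amino acid (an assumed interpretation)."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    sums = [r + c for r, c in zip(row_sums, col_sums)]
    diffs = [r - c for r, c in zip(row_sums, col_sums)]
    return sums, diffs
```

For a sequence of length N the matrix entries add up to N - 1, since each adjacent pair is counted exactly once.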

3.
We describe a very efficient search for nucleotide alignments, which is analogous to the novel very efficient search for protein alignment. Just as in the alignment of proteins, which is based on 20 × 20 adjacency matrices for amino acids obtained from a superposition of labeled amino acid adjacency matrices for the proteins considered, one can construct labeled matrices of size 4 × 4, listing adjacencies of nucleotides in a DNA sequence. The matrix elements correspond to 16 pairs of adjacent nucleotides. To obtain DNA alignments, one combines information in the corresponding matrices for a pair of DNA nucleotides. Matrices are obtained by insertion of the sequential labels for pairs of nucleotides in the corresponding cells of the 4 × 4 tables. When two such matrices are superimposed, one can identify all segments in two DNA sequences, which are shifted relative to one another by the same amount in either direction, without using trial‐and‐error displacements of the two sequences one relative to the other to find local nucleotide alignments. © 2012 Wiley Periodicals, Inc.
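The idea of superimposing labeled 4 × 4 tables can be illustrated with a rough Python sketch. It is not the authors' procedure: the dictionary-based table, the function names, and the grouping of matches by positional difference are assumptions made here to show how equal shifts can be read off without sliding the sequences against each other.

```python
from collections import defaultdict


def label_table(sequence):
    """Map each adjacent nucleotide pair (one of 16 cells) to the list of
    positions (sequential labels) at which it occurs in the sequence."""
    table = defaultdict(list)
    for position, pair in enumerate(zip(sequence, sequence[1:])):
        table[pair].append(position)
    return table


def matches_by_shift(seq_a, seq_b):
    """Superimpose the two tables: for every cell present in both, pair up
    the labels and group the matching positions of seq_a by shift (j - i)."""
    table_a = label_table(seq_a)
    table_b = label_table(seq_b)
    shifts = defaultdict(list)
    for pair, positions_a in table_a.items():
        for i in positions_a:
            for j in table_b.get(pair, []):
                shifts[j - i].append(i)
    return {shift: sorted(starts) for shift, starts in shifts.items()}
```

Runs of consecutive positions under the same shift correspond to segments that align when one sequence is displaced by that amount.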

4.
Proteins carry out the most important and difficult tasks in all living organisms. To do so, they must often interact specifically with other small and large molecules. This requires that they fold to a globular conformation with a unique active site that is used for the specific interaction. Consequently, protein folding can be regarded as the “secret of life”. Biochemists and chemists have a great interest in elucidating the mechanism by which proteins fold and in predicting the folded conformation and its stability given just the amino acid sequence. This challenge is sometimes called the “protein folding problem”. The ability to construct proteins differing in sequence by one or more amino acids and to analyze their three-dimensional structures by X-ray crystallography and NMR spectroscopy is a powerful tool for investigating the conformational stability and folding of proteins. Several proteins are now under intensive study by this approach. One of these is ribonuclease T1.

5.
We consider a novel numerical representation of proteins obtained by assigning to individual amino acids a polar coordinate on a unit circle. As a result, one can represent a protein sequence as a one-dimensional numerical sequence, the entries of which, when subtracted, facilitate the search for alignment between pairs of proteins of interest. The alignment is sought by shifting one sequence relative to another by several sequence units to the left or to the right. The novel approach is illustrated on two yeast proteins having 174 and 171 amino acids. Visiting Emeritus from the Department of Mathematics & Computer Science, Drake University, Des Moines, Iowa.
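A minimal Python sketch of the shift-and-subtract search described in this abstract follows. The particular assignment of angles (20 equally spaced points on the unit circle, in alphabetical order of one-letter codes) and all function names are assumptions made for illustration; the abstract does not state which assignment the authors use.

```python
import math

# Assumed assignment: 20 equally spaced angles, alphabetical one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ANGLE = {aa: 2 * math.pi * k / len(AMINO_ACIDS)
         for k, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Represent a protein as a one-dimensional sequence of angles."""
    return [ANGLE[aa] for aa in sequence]


def matches_at_shift(seq_a, seq_b, shift):
    """Subtract the shifted numerical sequences; a zero difference marks a
    position of seq_a whose residue equals the one at i + shift in seq_b."""
    a, b = encode(seq_a), encode(seq_b)
    matched = []
    for i, value in enumerate(a):
        j = i + shift
        if 0 <= j < len(b) and value - b[j] == 0:
            matched.append(i)
    return matched


def best_shift(seq_a, seq_b, max_shift):
    """Try shifts of up to max_shift units left or right and return the one
    producing the most zero differences."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: len(matches_at_shift(seq_a, seq_b, s)))
```

Because identical residues map to the identical floating-point angle, the exact comparison with zero is safe in this sketch.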

6.
We report the discovery of a simple system through which variant pyrrolysyl-tRNA synthetase/tRNA(CUA Pyl) pairs created in Escherichia coli can be used to expand the genetic code of Saccharomyces cerevisiae. In the process we have solved the key challenges of producing a functional tRNA(CUA Pyl) in yeast and discovered a pyrrolysyl-tRNA synthetase/tRNA(CUA Pyl) pair that is orthogonal in yeast. Using our approach we have incorporated an alkyne-containing amino acid for click chemistry, an important post-translationally modified amino acid and one of its analogs, a photocaged amino acid and a photo-cross-linking amino acid into proteins in yeast. Extensions of our approach will allow the growing list of useful amino acids that have been incorporated in E. coli with variant pyrrolysyl-tRNA synthetase/tRNA(CUA Pyl) pairs to be site-specifically incorporated into proteins in yeast.

7.
We consider a spectrum-like two-dimensional graphical representation of proteins based on a reduced protein model in which 20 amino acids are grouped into five classes. This particular grouping of amino acids was suggested by Riddle and co-workers in 1997. The graphical representation is based on depicting sequentially the amino acids on five horizontal lines at equal separations. One-letter codes, B, O, U, X and Y, to which numerical values 1 to 5 have been assigned, are suggested as labels for the fictional amino acids that represent all the amino acids within each group. The approach is illustrated on ND6 proteins of eight species having from 168 to 175 amino acids. While visual inspection of the novel spectral graphical representations of proteins may reveal local similarities and dissimilarities of protein sequences, arithmetic manipulations of spectra offer an elegant route to graphic visualization of the degree of similarity for selected pairs of proteins.

8.
Protein design is a useful method to create novel artificial proteins. A rational approach to design a heterodimeric protein using domain swapping for horse myoglobin (Mb) was developed. As confirmed by X‐ray crystallographic analysis, a heterodimeric Mb with two different active sites was produced efficiently from two surface mutants of Mb, in which the charges of two amino acids involved in the dimer salt bridges were reversed in each mutant individually, with the active site of one mutant modified. This study shows that the method of constructing heterodimeric Mb with domain swapping is useful for designing artificial multiheme proteins.

9.
We have outlined various aspects and limitations of the collective analysis of protein species of a cell (lymphocyte). We have indicated research directions that, in our opinion, deserve more attention. We have evaluated mainly the approach used in our laboratory and we recognize that a bulk of important research on the interface of proteomics and genomics remains to be dealt with. It is of great value that we can proceed in our quest by trial and error. But as much as the human genome initiative was not implemented by trial and error, but by formulating new technological approaches, we hope that our approach can be incorporated in the mainstream of proteomics. We need several integrating research directions, some of which are outlined in this communication, namely the use of ordered cDNA libraries, cell-free expression systems, high density filter hybridization, identification of two-dimensional (2-D) gel spots in terms of their amino acid composition through biosynthetic labeling and identification of restriction sites in the corresponding coding sequences. In the accompanying paper the cDNA ordered library approach will be described in some detail.

10.
Development of protein 3-D structural comparison methods is essential for understanding protein functions. Some amino acids share structural similarities while others vary considerably. These structures determine the chemical and physical properties of amino acids. Grouping amino acids with similar structures potentially improves the ability to identify structurally conserved regions and increases the global structural similarity between proteins. We systematically studied the effects of amino acid grouping on the numbers of Specific/specific, Common/common, and statistically different keys to achieve a better understanding of protein structure relations. Common keys represent substructures found in all types of proteins and Specific keys represent substructures exclusively belonging to a certain type of proteins in a data set. Our results show that applying amino acid grouping to the Triangular Spatial Relationship (TSR)-based method, while computing structural similarity among proteins, improves the accuracy of protein clustering in certain cases. In addition, applying amino acid grouping facilitates the process of identification or discovery of conserved structural motifs. The results from the principal component analysis (PCA) demonstrate that applying amino acid grouping captures slightly more structural variation than when amino acid grouping is not used, indicating that amino acid grouping reduces structure diversity as predicted. The TSR-based method uniquely identifies and discovers binding sites for drugs or interacting proteins. The binding sites of nsp16 of SARS-CoV-2, SARS-CoV and MERS-CoV that we have defined will aid future antiviral drug design for improving therapeutic outcome. This approach for incorporating the amino acid grouping feature into our structural comparison method is promising and provides a deeper insight into understanding of structural relations of proteins.

11.
A coarse-grained model for molecular dynamics simulations is extended from lipids to proteins. In the framework of such models pioneered by Klein, atoms are described group-wise by beads, with the interactions between beads governed by effective potentials. The extension developed here is based on a coarse-grained lipid model developed previously by Marrink et al., although future versions will reconcile the approach taken with the systematic approach of Klein and other authors. Each amino acid of the protein is represented by two coarse-grained beads, one for the backbone (identical for all residues) and one for the side-chain (which differs depending on the residue type). The coarse-graining reduces the system size about 10-fold and allows integration time steps of 25-50 fs. The model is applied to simulations of discoidal high-density lipoprotein particles involving water, lipids, and two primarily helical proteins. These particles are an ideal test system for the extension of coarse-grained models. Our model proved to be reliable in maintaining the shape of preassembled particles and in reproducing the overall structural features of high-density lipoproteins accurately. Microsecond simulations of lipoprotein assembly revealed the formation of a protein-lipid complex in which two proteins are attached to either side of a discoidal lipid bilayer.

12.
13.
Parameterization and test calculations of a reduced protein model with new energy terms are presented. The new energy terms retain the steric properties and the most significant degrees of freedom of protein side chains in an efficient way using only one to three virtual atoms per amino acid residue. The energy terms are implemented in a force field containing predefined secondary structure elements as constraints, electrostatic interaction terms, and a solvent‐accessible surface area term to include the effect of solvation. In the force field the main‐chain peptide units are modeled as electric dipoles, which have constant directions in α‐helices and β‐sheets and variable conformation‐dependent directions in loops. Protein secondary structures can be readily modeled using these dipole terms. Parameters of the force field were derived using a large set of experimental protein structures and refined by minimizing RMS errors between the experimental structures and structures generated using molecular dynamics simulations. The final average RMS error was 3.7 Å for the main‐chain virtual atoms (Cα atoms) and 4.2 Å for all virtual atoms for a test set of 10 proteins with 58–294 amino acid residues. The force field was further tested with a substantially larger test set of 608 proteins yielding somewhat lower accuracy. The fold recognition capabilities of the force field were also evaluated using a set of 27,814 misfolded decoy structures. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 1229–1242, 2001

14.
SILAC is a widely accepted approach for quantitative proteomics in which proteins are labeled with stable isotopes during cell culture. A major drawback of this technique is the metabolic conversion of labeled amino acids that may hamper accurate quantification. A paradigmatic example of this phenomenon is the generation of labeled proline from arginine, known to occur in a good number of biological models. We propose a novel methodology to identify and quantitate metabolic conversions as well as to evaluate labeling efficiency in SILAC experiments. In this approach, labeled proteins are reduced to amino acids by acid hydrolysis before LC-MS/MS analysis. Since it is carried out at the amino acid level, tracking the fate of the isotope label is straightforward and can be performed for each amino acid independently. After applying this method to mammalian cells grown in the presence of heavy arginine and lysine, labeling efficiency and amino acid conversions could be accurately evaluated. Only undesirable labeling of proline was found to occur to a significant extent, varying greatly among cell lines. Finally, increasing proline concentration in the growth medium was shown to be effective at preventing arginine conversion without any noticeable side effect.

15.
The etching induced by water on hydrophobic (001) surfaces of enantiomeric L-, D- and racemic DL-valine crystals has been characterized by means of atomic force microscopy (AFM) at ambient conditions. Well-defined chiral parallelepipedic shallow patterns, one bilayer deep, are observed for the enantiomeric crystals with sides (steps) oriented along low index crystallographic directions. Hence, chirality can be readily identified by visual inspection of an AFM image after etching. The formation of such regular patterns can be rationalized using basic concepts of electrical dipolar interactions. The key factor that determines the relative etching rate for each step and thus defines the shape of the etching patterns is the orientation of the molecular dipoles with respect to the step edge. The simplicity of the approach allows the prediction of the effect of water etching on other amino acid crystals as well as the effect of the interaction of water with amino acid molecules forming part of more complex structures.

16.
The adsorption of mixtures of charged proteins on charged surfaces is studied using a molecular theory. The theory explicitly treats each of the molecular species in the system. The mixtures treated in this work are composed of two types of proteins, dissociated monovalent salt and solvent. The intermolecular and surface interactions include electrostatic, van der Waals and excluded volume. The theory is more general than the Poisson-Boltzmann approach since the size and shape of all the molecular components are explicitly treated. The studies presented in this work concentrate on the differences in competitive adsorption when the proteins in the mixtures differ in their total charge or in the spatial distribution of the charges within the proteins. In the cases of mixtures that differ in the number of charges it is found, as expected, that the particles with the larger charge adsorb in excess. The ratio of adsorbed proteins can vary by 3-5 orders of magnitude by varying the bulk salt concentration from 1 to 100 mM. This is the result of an increase in the adsorption of the proteins with larger charge and an even stronger decrease in the adsorption of the less charged particles. The simple model systems studied provide guidelines on how to separate charge ladder proteins and proteins with different charge distributions. In the case of proteins with the same total charge but different charge distribution, it is found that the partition of the proteins depends upon the bulk composition. However, in general the particles with the highest localized charge tend to adsorb more on the surfaces. The proteins are adsorbed in one or more layers. The structure of the second adsorbed layer is determined mostly by the bulk properties of the solution. In all cases it is found that in the range of salt concentrations studied the number of adsorbed ions from the salt is very large. This is due to competitive adsorption with the proteins and their very low bulk concentration compared to the salt. The limitations of the theory and directions for improvement of the approach as well as the model for the proteins are discussed.

17.
Proteins destined for regions other than the cytoplasm in cells have to cross at least one membrane barrier before reaching their proper destination. Almost all such proteins are initially biosynthesized as precursors with signal sequences at the amino terminus. Signal sequences are essential and also sufficient for proteins to be targeted to membranes and also for translocation across membranes. One striking feature that is clearly evident amongst signal sequences of secretory proteins is a positively charged amino terminus followed by a region comprising 10–12 very hydrophobic amino acids. The structural and physico-chemical properties of signal sequences have been analysed. On the basis of the analyses it is proposed that the structural feature of a positively charged amino terminal region followed by a hydrophobic stretch of amino acids, rather than a conformational one, is recognised by components of the cell's export machinery. It is also postulated that signal sequences insert in the lipid bilayer of the translocation competent membrane after targeting. The presence of the signal sequence results in the formation of local ‘defects’ in the bilayer which have a role in translocation of proteins across membranes.

18.
Because protein identifications rely on matches with sequence databases, high-throughput proteomics is currently largely restricted to those species for which comprehensive sequence databases are available. The identification of proteins derived from organisms with unsequenced genomes mainly depends on homology searching. Here, we report the use of a simplified, gel-based, chemical derivatization strategy for de novo sequence analysis using a MALDI-TOF/TOF mass spectrometer. This approach allows the determination of de novo peptide sequences of up to 20 amino acid residues in length. The protocol was applied to a proteomic study of 2-D PAGE-separated proteins from Halorhodospira halophila, an extremophilic eubacterium with an as yet unsequenced genome. Using three different homology-based search algorithms, we were able to identify more than 30 proteins from this organism using subpicomole quantities of protein.

19.
A novel characterization of proteins is presented based on selected properties of the recently introduced 20 × 20 amino acid adjacency matrix of proteins, in which matrix elements count the occurrences of all 400 possible pairwise adjacencies obtained by reading the protein primary sequence from left to right. In particular we consider the characterization based on the sum and the difference of the rows and the corresponding columns, which characterize proteins by a pair of 20-component vectors. The approach is illustrated on a set of ND6 proteins of eight species.

20.
In this paper, we propose a method to create a 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the contents of amino acids, the total distance of each amino acid from the first amino acid in the protein sequence and the distribution of 20 amino acids. The obtained cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour joining method. In order to show the applicability of our approach, we tested it on three protein data sets: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results are in agreement with known history and the output from the widely used multiple sequence alignment program ClustalW. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, the time complexity is linear and the space required is less than that of other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.
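The cosine-distance step mentioned in this abstract can be sketched as follows. The sketch covers only the pairwise distance computation on already-built feature vectors; how the 60 components are constructed is not specified in enough detail in the abstract to reproduce, so the inputs here are arbitrary numeric vectors and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v).
    Assumes both vectors are non-zero and of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Pairwise cosine distances, in a form that could be handed to a
    tree-building method such as neighbour joining."""
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]
```

Building the full matrix for n sequences takes n² distance evaluations, each linear in the vector length.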
