Similar literature
1.
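We constructed a library of short peptide fragments nine residues long and grouped the fragments into clusters according to their structures. Each cluster was analyzed with the chi-square test to identify correlations between different positions. We also constructed a scoring function for sequence alignment that includes not only single-position information but also information on correlations between positions. The results show that inter-position correlations provide novel information, generally related to interactions between different positions of the fragment, and that including correlation information clearly improves the accuracy of structure prediction for short peptide fragments.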
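The chi-square step in this abstract can be illustrated with a minimal sketch. It assumes a cluster is simply a list of equal-length peptide strings and computes the Pearson chi-square statistic for independence between two positions. The function name `chi_square_positions` and the toy peptides are hypothetical, not taken from the paper.

```python
from collections import Counter

def chi_square_positions(peptides, i, j):
    """Pearson chi-square statistic for independence of the residues
    observed at positions i and j across a set of aligned peptides.
    Returns (statistic, degrees_of_freedom)."""
    n = len(peptides)
    pair_counts = Counter((p[i], p[j]) for p in peptides)
    row_counts = Counter(p[i] for p in peptides)
    col_counts = Counter(p[j] for p in peptides)
    chi2 = 0.0
    for a, row_total in row_counts.items():
        for b, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = pair_counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_counts) - 1) * (len(col_counts) - 1)
    return chi2, dof

# Toy cluster (hypothetical): positions 0 and 1 always carry the same residue.
cluster = ["AA" + "G" * 7] * 5 + ["LL" + "G" * 7] * 5
stat, dof = chi_square_positions(cluster, 0, 1)
```

A large statistic relative to the degrees of freedom suggests the two positions are correlated. Real clusters would need enough members for the expected counts to be meaningful.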

2.
Prediction of protein domain structural classes is an important topic in protein science. In this paper, we propose a new concept, the structural class tendency of polypeptides (SCTP), which is based on the fact that a given amino acid fragment tends to occur in certain types of proteins. The SCTP is obtained from an available training data set, PDB40-B. Using the SCTP to predict protein structural classes with the Intimate Sorting predictive method, we obtained jackknife-test accuracies of 93.7%, 96.5%, and 78.6% for the testing data sets PDB40-j, Chou&Maggiora, and CHOU, respectively. These results indicate that the SCTP approach is promising and provides an effective tool for extracting valuable information from protein sequences.

3.
The low complexity domain (LCD) sequence has been defined in terms of entropy using a 12 amino acid sliding window along a protein sequence in the study of disease-related genes. The amyotrophic lateral sclerosis (ALS)-related TDP-43 protein sequence with intra-LCD structural information based on cryo-EM data was published recently. An application of entropy and Higuchi fractal dimension calculations was described using the Znf521 and HAR1 sequences. A computational analysis of the intra-LCD sequence entropy and Higuchi fractal dimension values at the amino acid level and at the ATCG nucleotide level was conducted without the sliding-window requirement. The computational results were consistent in predicting the intermediate entropy/fractal dimension value produced when two subsequences with two different entropy/fractal dimension values were combined. The computational method without a sliding window was extended to an analysis of the recently reported virulent genes (Orf6, Nsp6, and Orf7a) in SARS-CoV-2. The relationship between the virulence functionality and entropy values was found to have correlation coefficients between 0.84 and 0.99, using a 5% uncertainty on the cell viability data. The analysis found that the most virulent Orf6 gene sequence had the lowest nucleotide entropy and the highest protein fractal dimension, in line with extreme value theory. The Orf6 codon usage bias in relation to vaccine design was discussed.
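As a rough illustration of the entropy calculations mentioned here, the sketch below computes Shannon entropy (in bits) for a whole sequence and for a 12-residue sliding window. It assumes plain symbol-frequency entropy; the paper's exact definition and its Higuchi fractal dimension calculation are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy in bits of the symbol frequencies in seq."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def sliding_entropy(seq, window=12):
    """Entropy of every window of the given length along seq."""
    return [shannon_entropy(seq[k:k + window])
            for k in range(len(seq) - window + 1)]

# Combining a low-entropy and a high-entropy subsequence (toy nucleotide
# strings) gives an intermediate value, as the abstract describes.
low = shannon_entropy("AAAA")
high = shannon_entropy("ACGT")
combined = shannon_entropy("AAAA" + "ACGT")
```

The whole-sequence call corresponds to the analysis "without the sliding-window requirement". The windowed call corresponds to the 12-residue LCD definition.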

4.
In this work, we introduce a set of pulse sequences that provide amino acid type identification of the NH correlation signals of proteins. The first pulse sequence is a modification of the CBCA(CO)NH experiment that exploits spin-coupling topologies to differentiate between amino acid types. A set of eight 2D ¹H–¹⁵N correlation spectra is recorded in which the sign of the cross-peaks changes from one spectrum to another according to the amino acid type of the preceding residue in the protein sequence. Linear combination of these eight data sets produces four subspectra. Also taking into account the sign of the correlation signals, this method allows the classification of the NH signals into six different groups, depending on the character of the preceding residue. This sequence is complemented with a (CGCBCACO)NH experiment that allows the subdivision of the largest of these groups into two smaller ones. Finally, a modification of the CBCANH experiment leads to a similar classification of NH signals into six different groups, but now depending on the amino acid type of the residue itself. The set of pulse sequences is demonstrated with two proteins of small to moderate size.

5.
We present a maximum entropy approach for inferring amino acid interactions in proteins subject to constraints pertaining to the mean numbers of various types of equilibrium contacts for a given sequence or a set of sequences. We have carried out several kinds of tests for a two-dimensional lattice model with just two types of amino acids with very promising results. We also show that the method works very well even when the mean numbers of contacts are not known and therefore can be applied to real proteins.

6.
Jiang Fan (江凡), Li Nan (李南). Chinese Physics (《中国物理》), 2007, 16(2): 392-404
One of the long-standing controversies in protein folding is Levinthal's paradox. We have recently proposed a new nucleation hypothesis and shown that the nucleation residues are the most conserved sequences in proteins. To avoid the complicated effect of tertiary interactions, we limit our search for structural codes to the nucleation residues. Starting with the hypotheses of secondary structure nucleation and conservation of residues important for folding, we have analysed 762 folds classified as unique by SCOP. Segments of 17 residues around the top 20% conserved amino acids are analysed, resulting in approximately 100 clusters each for the main secondary structure classes of helix, sheet and coil. Helical clusters have the longest correlation range, coils the shortest (four residues). Strong specific sequence-structure correlation is observed for coil but not for helix and sheet, suggesting a mapping relationship between the sequence and the structure for coil. We propose that the central sequences in these clusters form 'structural codes', a useful basis set for identifying nucleation sites, protein fragments stable in isolation, and secondary structural patterns in proteins (particularly turns and loops).

7.
Yang Wei (杨为), Lai Luhua (来鲁华). Chinese Physics B (《中国物理 B》), 2016, 25(1): 018702
Computational design of proteins is a relatively new field in which scientists search the enormous sequence space for sequences that can fold into a desired structure and perform desired functions. With the computational approach, proteins can be designed, for example, as regulators of biological processes, novel enzymes, or biotherapeutics. These approaches not only provide valuable information for understanding sequence–structure–function relations in proteins, but also hold promise for applications in protein engineering and biomedical research. In this review, we briefly introduce the rationale for computational protein design, then summarize recent progress in this field, including de novo protein design, enzyme design, and design of protein–protein interactions. Challenges and future prospects of this field are also discussed.

8.
Li W, Lin K, Feng K, Cai Y. Molecular Diversity, 2008, 12(3-4): 171-179
In this paper, amino acid compositions are combined with some protein sequence properties (physicochemical properties) to predict protein structural classes. We predict protein structural classes using a mathematical model that combines the nearest neighbor algorithm (NNA), mRMR (minimum redundancy, maximum relevance), and a feature forward searching strategy. Jackknife cross-validation is used to evaluate the prediction accuracy. As a result, the prediction success rate improves to 68.8%, which is better than the 62.2% obtained when using only amino acid compositions. We therefore conclude that the physicochemical properties are factors that contribute to the protein folding phenomenon, and that the most contributing features are the amino acid compositions. We expect that prediction accuracy will improve further as more sequence information comes to light. A web server for predicting the protein structural classes is available at http://app3.biosino.org:8080/liwenjin/index.jsp.
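The composition-only baseline described here can be sketched as a nearest-neighbor classifier evaluated by jackknife (leave-one-out) testing. This is a minimal sketch under stated assumptions: it uses Euclidean distance on 20-dimensional composition vectors, omits the mRMR feature selection and the physicochemical features, and the sequences, labels and function names are hypothetical toy values rather than the authors' data.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def nearest_label(query, examples):
    """Label of the example whose feature vector is closest to query.
    examples is a list of (vector, label) pairs."""
    return min(examples, key=lambda ex: math.dist(query, ex[0]))[1]

def jackknife_accuracy(dataset):
    """Leave-one-out accuracy for a list of (sequence, label) pairs."""
    feats = [(composition(s), y) for s, y in dataset]
    hits = 0
    for k, (x, y) in enumerate(feats):
        rest = feats[:k] + feats[k + 1:]  # hold out sample k
        hits += nearest_label(x, rest) == y
    return hits / len(feats)

# Hypothetical toy data; real use would need a curated set of labeled domains.
toy = [("AAAAL", "alpha"), ("AAALL", "alpha"),
       ("VVVVT", "beta"), ("VVVTT", "beta")]
accuracy = jackknife_accuracy(toy)
```

Each sample is classified using all remaining samples, which is what makes the jackknife estimate independent of any particular train/test split.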

9.
Sequence comparison is one of the major tasks in bioinformatics, which can be used to study structural and functional conservation, as well as evolutionary relations among the sequences. In this paper, we introduce the concept of distance frequency of amino acid pairs and propose a new numerical characterization of protein sequences, which converts any protein sequence into a distance frequency matrix. Using this distance frequency matrix, we can compare the similarity of protein sequences. In order to confirm the validity of our method, we test it with two experiments. The results show that our method is effective.

10.
The propensity to form long-range contacts strongly affects the three-dimensional structure of globular proteins. Because this propensity differs among the 20 types of amino acids and the 4 classes of globular proteins, its statistical properties are discussed in detail in this paper. Two parameters, NC and ND, are defined to delimit the valid residues. The relationship between hydrophobicity scales and the valid residue percentage of each amino acid is given, and our statistical results show linear relationships. We conclude that the hydrophobicity scale defined by chemical derivatives of the amino acids and the nonpolar phase of large unilamellar vesicle membranes is the most effective for characterising the hydrophobic behavior of amino acid residues. In addition, the residue percentage Pi and the sequential residue length Li of a given protein i are calculated under different conditions. The statistical results show that the average values of Pi and Li are lowest for all-α proteins among the 4 classes of globular proteins, indicating that all-α proteins are hardly capable of forming long-range contacts one by one along their linear amino acid sequences. All-β proteins have a higher tendency to form long-range contacts along their primary sequences, which is related to their secondary configurations, i.e. parallel and anti-parallel β sheets. Investigating these interior properties of globular proteins reveals the connection between the three-dimensional structure and the primary sequence or secondary configurations, and helps us to understand protein structure and the folding process.

11.
Objective: Early mobilization and rehabilitation have become common, and expectations for physical therapists working in intensive care units have increased in Japan. The objective of this study was to establish consensus-based minimum clinical practice standards for physical therapists working in intensive care units in Japan. It also aimed to make an international comparison of minimum clinical practice standards in this area. Methods: In total, 54 experienced physical therapists gave informed consent and participated in this study. A modified Delphi method with questionnaires was used over three rounds. Participants rated 272 items as “essential/unknown/non-essential”. Consensus was considered to be reached on items that over 70% of physical therapists rated as “essential” to clinical practice in the intensive care unit. Results: Of the 272 items in the first round, 188 were deemed essential. In round 2, 11 of the 62 items that failed to reach consensus in round 1 were additionally deemed essential. No item was added to the “essential” consensus in round 3. In total, 199 items were therefore deemed essential as a minimum standard of clinical practice. Participants agreed that 42 items were not essential and failed to reach agreement on 31 others. The 199 identified items differed from those in the UK and Australia because of national laws and cultural and historical backgrounds. Conclusions: This is the first study to develop a consensus-based minimum clinical practice standard for physical therapists working in intensive care units in Japan.
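The consensus rule and the item tallies reported above can be checked with a few lines of arithmetic. The sketch assumes "over 70%" means a strict inequality on the fraction of the 54 raters. The function name and the example vote counts are hypothetical, since per-item votes are not given in the abstract.

```python
def reaches_consensus(essential_votes, n_raters=54, threshold=0.70):
    """True if strictly more than `threshold` of raters chose 'essential'."""
    return essential_votes / n_raters > threshold

# Item counts as reported in the abstract.
essential = 188 + 11 + 0   # rounds 1, 2 and 3
not_essential = 42
no_agreement = 31
total = essential + not_essential + no_agreement

# With 54 raters, 38 votes (about 70.4%) pass the rule and 37 (about 68.5%) do not.
passes_with_38 = reaches_consensus(38)
passes_with_37 = reaches_consensus(37)
```

The three final categories add up to the 272 items rated in round 1, which confirms that the reported figures are internally consistent.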

12.
Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, triggering emergent challenges of increasing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, in the literature, the number of specific protein sequence compressors is relatively low. Moreover, these specialized compressors only marginally improve the compression ratio over the best general-purpose compressors. In this paper, we present AC2, a new lossless data compressor for protein (or amino acid) sequences. AC2 uses a neural network to mix experts with a stacked generalization approach and individual cache-hash memory models for the highest context orders. Compared to the previous compressor (AC), we show gains of 2–9% and 6–7% in reference-free and reference-based modes, respectively. These gains come at the cost of computations that are three times slower. AC2 also improves memory usage relative to AC, with requirements about seven times lower, and is not affected by the input size of the sequences. As an analysis application, we use AC2 to measure the similarity between each SARS-CoV-2 protein sequence and each viral protein sequence from the whole UniProt database. The results consistently show the highest similarity to the pangolin coronavirus, followed by the bat and human coronaviruses, contributing critical results to a currently controversial subject. AC2 is available for free download under the GPLv3 license.
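The idea of measuring sequence similarity with a compressor can be illustrated with the normalized compression distance (NCD). The sketch below is not AC2 and does not reproduce the paper's similarity measure: it substitutes Python's general-purpose `zlib` compressor purely as a stand-in, and the function names are hypothetical.

```python
import zlib

def compressed_size(seq):
    """Size in bytes of seq after zlib compression at the highest level."""
    return len(zlib.compress(seq.encode("ascii"), 9))

def ncd(x, y):
    """Normalized compression distance between two sequences.
    Values near 0 suggest high similarity; values near 1 suggest little
    shared structure. zlib is a stand-in for a specialized compressor."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A specialized protein compressor should give sharper distances than `zlib`, whose fixed overhead is noticeable on short sequences, so this sketch is only meaningful for comparing values computed the same way.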

13.
To estimate the amount of evapotranspiration in a river basin, the “short period water balance method” was formulated. Then, by introducing the “complementary relationship method,” the amount of evapotranspiration was estimated seasonally, and with reasonable accuracy, for both small and large areas. Moreover, to accurately estimate river discharge in the low water season, the “weighted statistical unit hydrograph method” was proposed and a procedure for the calculation of the unit hydrograph was developed. Also, a new model, based on the “equivalent roughness method,” was successfully developed for the estimation of flood runoff from newly reclaimed farmlands. Based on the results of this research, a “composite reservoir model” was formulated to analyze the repeated use of irrigation water in large spatial areas. The application of this model to a number of watershed areas provided useful information with regard to the realities of water demand-supply systems in watersheds predominately dedicated to paddy fields, in Japan.

14.
The majority of proteins perform their cellular function after folding into a specific and stable native structure. Additionally, for many proteins less compact ‘molten globule’ states have been observed. Current experimental observations show that the molten globule state can show varying degrees of compactness and solvent accessibility; the underlying molecular cause for this variation is not well understood. While the specificity of protein folding can be studied using protein lattice models, current design procedures for these models tend to generate sequences without molten globule-like behaviour. Here we alter the design process so the distance between the molten globule ensemble and the native structure can be steered; this allows us to design protein sequences with a wide range of folding pathways, and sequences with well-defined heat-induced molten globules. Simulating these sequences we find that (1) molten globule states are compact, but have less specific configurations compared to the folded state, (2) the nature of the molten globule state is highly sequence dependent, (3) both two-state and multi-state folding proteins may show heat-induced molten globule states, as observed in heat capacity curves. The varying nature of the molten globules and typical heat capacity curves associated with the transitions closely resemble experimental observations.

15.
Under physiological conditions, globular protein molecules assume a specific native conformation uniquely determined by their amino acid sequence. Upon environmental changes, the protein molecules undergo reversible unfolding (order-losing) and folding (order-gaining) transitions, which are similar to a first-order phase transition. Pathways of folding have been intensively studied in the hope of deciphering the code that amino acid sequences carry as to the three-dimensional structure of proteins. A strongly simplified lattice model of proteins has been found to be a powerful theoretical tool for simulating the dynamic process of the folding and unfolding transitions. The results of the simulation indicate the existence of stochastic pathways of folding.

16.
Here we present an approximate analytical theory for the relationship between a protein structure's contact matrix and the shape of its energy spectrum in amino acid sequence space. We demonstrate a dependence of the number of sequences of low energy in a structure on the eigenvalues of the structure's contact matrix, and then use a Monte Carlo simulation to test the applicability of this analytical result to cubic lattice proteins. We find that the lattice structures with the most low-energy sequences are the same as those predicted by the theory. We argue that, given sufficiently strict requirements for foldability, these structures are the most designable, and we propose a simple means to test whether the results in this paper hold true for real proteins.

17.
How do living systems process information? The search for an answer to this question is ongoing. We have developed an intelligent video analytics system. The process of the formation of detectors for content-based image retrieval aimed at detecting objects of various types simulates the operation of the structural and functional modules for image processing in living systems. The process of detector construction is, in fact, a model of the formation (or activation) of connections in the cortical column (structural and functional unit of information processing in the human and animal brain). The process of content-based image retrieval, that is, the detection of various types of images in the developed system, reproduces the process of “triggering” a model biomorphic column, i.e., a detector in which connections are formed during the learning process. The recognition process is a reaction of the receptive field of the column to the activation by a given signal. Since the learning process of the detector can be visualized, it is possible to see how a column (a detector of specific stimuli) is formed: a face, a digit, a number, etc. The created artificial cognitive system is a biomorphic model of the recognition column of living systems.

18.
Complex modeling has received significant attention in recent years and is increasingly used to explain statistical phenomena with increasing and decreasing fluctuations, such as the similarity or difference of spike protein charge patterns of coronaviruses. Unlike the existing covariance or correlation coefficient methods in traditional integer dimension construction, this study proposes a simplified novel fractional dimension derivation with the exact Excel tool algorithm. It involves the fractional center moment extension to covariance, which results in a complex covariance coefficient that is better than the Pearson correlation coefficient, in the sense that the nonlinear relationship can be further depicted. The spike protein sequences of coronaviruses were obtained from the GenBank and GISAID databases, including the coronaviruses from pangolin, bat, canine, swine (three variants), feline, tiger, SARS-CoV-1, MERS, and SARS-CoV-2 (including the strains from Wuhan, Beijing, New York, Germany, and the UK variant B.1.1.7), which were used as representative examples in this study. By examining the values above and below the mean based on the positive and negative charge patterns of the amino acid residues of the spike proteins from coronaviruses, the proposed algorithm provides insights into the nonlinear evolving trends of spike proteins for understanding viral evolution and identifying the protein characteristics associated with viral fatality. The calculation results demonstrate that the complex covariance coefficient analyzed by this algorithm is capable of distinguishing the subtle nonlinear differences in the spike protein charge patterns with reference to the Wuhan strain of SARS-CoV-2, which the Pearson correlation coefficient may overlook. Our analysis reveals the unique convergent (positively correlated) to divergent (negatively correlated) domain center positions of each virus. The convergent or conserved region may be critical to viral stability or viability, while the divergent region is highly variable between coronaviruses, suggesting a high frequency of mutations in this region. The analyses show that the conserved center region of the SARS-CoV-1 spike protein is located at amino acid residue 900, but shifted to amino acid residue 700 in the MERS spike protein, and then to amino acid residue 600 in the SARS-CoV-2 spike protein, indicating the evolution of the coronaviruses. Interestingly, the conserved center region of the spike protein in SARS-CoV-2 variant B.1.1.7 shifted back to amino acid residue 700, suggesting this variant is more virulent than the original SARS-CoV-2 strain. Another important characteristic our study reveals is that the distance between the divergent mean and the maximal divergent point in each of the viruses (MERS > SARS-CoV-1 > SARS-CoV-2) is proportional to the viral fatality rate. This algorithm may help to understand and analyze the evolving trends and critical characteristics of SARS-CoV-2 variants, other coronaviral proteins and viruses.

19.
Molecular modelling is a powerful methodology for analysing the three-dimensional structure of biological macromolecules. There are many ways in which molecular modelling methods have been used to address problems in structural biology. It is not widely appreciated that modelling methods are often an integral component of structure determination by NMR spectroscopy and X-ray crystallography. In this review we consider some of the numerous ways in which modelling can be used to interpret and rationalise experimental data and to construct hypotheses that can be tested by experiment. Genome sequencing projects are producing a vast wealth of data describing the protein coding regions of the genome under study. However, only a minority of the protein sequences thus identified will have a clear sequence homology to a known protein. In such cases valuable three-dimensional models of the protein coding sequence can be constructed by homology modelling methods. Threading methods, which use specialised schemes to relate protein sequences to a library of known structures, have been shown to be able to identify the likely protein fold even in cases where there is no clear sequence homology. The number of protein sequences that cannot be assigned to a structural class by homology or threading methods, simply because they belong to a previously unidentified protein folding class, will decrease in the future as collaborative efforts in systematic structure determination begin to develop. For this reason, modelling methods are likely to become increasingly useful in the near future. The role of blind prediction contests, such as the Critical Assessment of techniques for protein Structure Prediction (CASP), will be briefly discussed. Methods for modelling protein-ligand and protein-protein complexes are also described and examples of their applications given.

20.
The vault is the largest nonicosahedral cytosolic nucleoprotein particle ever described. The widespread presence and evolutionary conservation of vaults suggest important biological roles, although their functions have not been fully elucidated. The X-ray structure of the vault from rat liver was determined at 3.5 Å resolution. It exhibits an ovoid shape with a size of 40 × 40 × 67 nm³. The cage structure of the vault consists of a dimer of half-vaults, with each half-vault comprising 39 identical major vault protein (MVP) chains. Each MVP monomer folds into 12 domains: nine structural repeat domains, a shoulder domain, a cap-helix domain and a cap-ring domain. Interactions between the 42-turn-long cap-helix domains are key to stabilizing the particle. The other components of vaults (telomerase-associated proteins, poly(ADP-ribose) polymerases and small RNAs) have been located in the vault particle by electron microscopy.
