Similar literature
 Found 20 similar documents (search time: 187 ms)
1.
The web tool "AnDom" assigns to a given protein sequence all experimentally determined structural domains contained within it, including multidomain and large proteins. The server uses profile-specific matrices from custom-generated multiple sequence alignments of all known SCOP domains (SCOP version 1.50). Prediction time is short, allowing numerous applications in structural genomics, including the investigation of complex eukaryotic protein families. The WWW server is at http://www.bork.embl-heidelberg.de/AnDom, and profiles can be downloaded at ftp.bork.embl-heidelberg.de/pub/users/schmidt/AnDom.

2.
In silico methods play an essential role in modern drug discovery. Virtual screening, an in silico method, is used to narrow the chemical space on which actual wet-lab experiments need to be conducted. Ligand-based virtual screening is a computational strategy in which a model of the target protein is built from knowledge of the ligands that bind successfully to the target. This model is then used to predict whether a new molecule is likely to bind to the target. The support vector machine (SVM), a supervised learning algorithm used for classification, can be applied to virtual screening of ligand data, where it can produce interesting results. However, because the ligand data set is huge, training the SVM model takes considerably longer than training other learning algorithms. Parallelizing these algorithms on multi-core processors can expedite these discoveries. In this paper, a GPU-based, ligand-based virtual screening tool (GpuSVMScreen) that uses SVM is proposed and benchmarked. This data-parallel virtual screening tool provides high throughput by running in a short time. GpuSVMScreen can also successfully screen large numbers of molecules (billions). The source code of this tool is available at http://ccc.nitc.ac.in/project/GPUSVMSCREEN.

3.
For the classification of protein secondary structures, we propose a 2D representation of protein secondary structure sequences. The representation is used to display, analyze, and compare the secondary structure sequences. Based on this representation, we assign a structural class to each protein and assess the advantages and disadvantages of methods for predicting protein secondary structure.

4.
Protein domains that can fold in isolation are significant targets in diverse areas of proteomics research, as they are often readily analyzed by high-throughput methods. Here, we report IS-Dom, a dataset of Independent Structural Domains (ISDs) that are most likely to fold in isolation. IS-Dom was constructed by filtering domains from SCOP, CATH, and DomainParser using quantitative structural measures, which were calculated by estimating inter-domain hydrophobic clusters and hydrogen bonds from the full-length protein's atomic coordinates. The ISD detection protocol is fully automated, and all of the computed interactions are stored on the server, which enables rapid updating of IS-Dom. We also prepared a standard IS-Dom using parameters optimized by maximizing Youden's index. The standard IS-Dom contained 54,860 ISDs, of which 25.5% had high sequence identity and termini overlap with a sequence cataloged in the Protein Data Bank (PDB) and are thus experimentally shown to fold in isolation [termed autonomously folded domains (AFDs)]. Furthermore, our ISD detection protocol missed less than 10% of the AFDs, which corroborates the protocol's ability to define structural domains that are able to fold independently. IS-Dom is available through the web server (http://domserv.lab.tuat.ac.jp/IS-Dom.html), where users can download the standard IS-Dom dataset, construct their own IS-Dom by interactively varying the parameters, or assess the structural independence of newly defined putative domains.
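
The parameter optimization mentioned in the abstract above relies on Youden's index, J = sensitivity + specificity - 1. The following is a minimal sketch of how a cutoff could be chosen by maximizing J. The score values, labels, and candidate thresholds are hypothetical illustrations and are not taken from IS-Dom; the abstract does not state which structural measures were thresholded or how.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def best_threshold(scores, labels, thresholds):
    """Return (threshold, J) for the candidate threshold with the largest J.

    scores: hypothetical per-domain scores (higher = more likely positive).
    labels: True for positives, False for negatives; both classes must occur.
    """
    best = None
    for t in thresholds:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if y and s >= t)
        fn = sum(1 for s, y in pairs if y and s < t)
        tn = sum(1 for s, y in pairs if not y and s < t)
        fp = sum(1 for s, y in pairs if not y and s >= t)
        j = youden_index(tp, fn, tn, fp)
        if best is None or j > best[1]:
            best = (t, j)
    return best


# Hypothetical toy data: two negatives, two positives.
toy_scores = [0.1, 0.2, 0.7, 0.9]
toy_labels = [False, False, True, True]
print(best_threshold(toy_scores, toy_labels, [0.05, 0.5, 0.95]))
```

With real data, ties between thresholds are resolved here in favor of the first candidate listed, which is an arbitrary choice of this sketch.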

5.
The variable predictive model-based class discrimination (VPMCD) algorithm is proposed as an effective protein secondary structure classification tool. The algorithm mathematically represents the characteristic amino acid interactions specific to each protein structure and exploits them to distinguish different structures. The new concept and the VPMCD classifier are established using well-studied benchmark datasets containing four protein classes. The protein samples, selected from the SCOP and PDB databases with varying homology (25-100%) and a non-uniform distribution of class samples, provide a challenging classification problem. The performance of the new method is compared with advanced classification algorithms such as component-coupled, SVM, and neural network methods. VPMCD provides superior performance for high-homology datasets. A classification rate of 100% is achieved in the self-consistency test, and an improvement of 5% in prediction accuracy is obtained in the jackknife test. The sensitivity of the new algorithm is investigated by varying model structures/types and sequence homology. The VPMCD algorithm, which is simpler to implement, is observed to be a robust classification technique and shows potential for effective extension to other clinical diagnosis and data mining applications in biological systems.

6.
The native structures of proteins are governed by a large number of non-covalent interactions yielding a high specificity for the native packing of structural elements. This allows for the reconstitution of proteins from disconnected polypeptide fragments. The specificity for the native arrangement also enables interchange of structural elements with another identical protein chain resulting in dimers with swapped segments. Proteins are not static structures, but open up repetitively on a timescale of minutes to years depending on the identity of the protein and solution conditions. The open protein may self-close and return to the native state, or it may close with another polypeptide chain leading to 3D domain swapping. The term describes two or more protein molecules swapping identical domains or smaller secondary structure elements. The non-covalent intra-molecular interactions between domains in the monomer are thus broken and restored in the oligomer by identical inter-molecular contacts. This review will discuss 3D domain swapping in relation to protein reconstitution and fibril formation. Examples of reconstituted and domain-swapped proteins will be given. The physiological benefits of 3D domain swapping will be discussed, as well as its role in the evolution of proteins and pathology.

7.
A sparse representation classification method for tobacco leaves based on near-infrared spectroscopy and a deep learning algorithm is reported in this paper. All training samples were used to make up a data dictionary for the sparse representation, and the test samples were represented by the sparsest linear combinations of the dictionary obtained by sparse coding. The regression residual of the test sample with respect to each class was computed, and the sample was assigned to the class with the minimum residual. The effectiveness of the sparse representation classification method was compared with that of the K-nearest neighbor and particle swarm optimization–support vector machine algorithms. The results show that the proposed method achieves higher classification accuracy and is more efficient. The results suggest that near-infrared spectroscopy combined with the sparse representation classification algorithm may be an alternative to traditional methods for discriminating classes of tobacco leaves.
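
The final decision step described in the abstract above (compute a residual per class, assign the minimum) can be sketched as follows. This sketch assumes the sparse coefficients have already been produced by some sparse coder, which is not implemented here; the dictionary atoms, labels, and coefficient values are hypothetical toy numbers, not spectra from the paper.

```python
import math


def class_residuals(sample, atoms, atom_labels, coeffs):
    """Residual of `sample` when reconstructed from each class's atoms only.

    atoms: list of training vectors (the dictionary), one per training sample.
    atom_labels: class label of each atom.
    coeffs: sparse coefficients for `sample` (assumed to come from a sparse coder).
    """
    residuals = {}
    for label in set(atom_labels):
        recon = [0.0] * len(sample)
        for atom, atom_label, weight in zip(atoms, atom_labels, coeffs):
            if atom_label == label:  # keep only this class's coefficients
                for i, value in enumerate(atom):
                    recon[i] += weight * value
        residuals[label] = math.sqrt(
            sum((s - r) ** 2 for s, r in zip(sample, recon))
        )
    return residuals


def classify(sample, atoms, atom_labels, coeffs):
    """Assign the class whose atoms give the smallest reconstruction residual."""
    residuals = class_residuals(sample, atoms, atom_labels, coeffs)
    return min(residuals, key=residuals.get)


# Hypothetical two-atom dictionary with one atom per class.
toy_atoms = [[1.0, 0.0], [0.0, 1.0]]
toy_labels = ["A", "B"]
print(classify([0.9, 0.1], toy_atoms, toy_labels, [0.9, 0.1]))
```

Separating the residual rule from the sparse coder keeps the sketch independent of whichever coding algorithm is used upstream.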

8.
Choanoflagellates are single-celled eukaryotes with complex signaling pathways. They are considered the closest non-metazoan ancestors to mammals and other metazoans and form multicellular-like states called rosettes. The choanoflagellate Monosiga brevicollis contains over 150 PDZ domains, an important peptide-binding domain in all three domains of life (Archaea, Bacteria, and Eukarya). Therefore, an understanding of PDZ domain signaling pathways in choanoflagellates may provide insight into the origins of multicellularity. PDZ domains recognize the C-terminus of target proteins and regulate signaling and trafficking pathways, as well as cellular adhesion. Here, we developed a computational software suite, Domain Analysis and Motif Matcher (DAMM), that analyzes peptide-binding cleft sequence identity as compared with human PDZ domains and that can be used in combination with literature searches of known human PDZ-interacting sequences to predict target specificity in choanoflagellate PDZ domains. We used this program, protein biochemistry, fluorescence polarization, and structural analyses to characterize the specificity of A9UPE9_MONBE, a M. brevicollis PDZ domain-containing protein with no homology to any metazoan protein, finding that its PDZ domain is most similar to those of the DLG family. We then identified two endogenous sequences that bind A9UPE9 PDZ with <100 μM affinity, a value commonly considered the threshold for cellular PDZ–peptide interactions. Taken together, this approach can be used to predict cellular targets of previously uncharacterized PDZ domains in choanoflagellates and other organisms. Our data contribute to investigations into choanoflagellate signaling and how it informs metazoan evolution.
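
The idea of comparing peptide-binding cleft sequence identity against reference domains can be illustrated with a short sketch. This is not DAMM's implementation, which the abstract does not describe; it assumes pre-aligned sequences of equal length and a hypothetical list of cleft positions, and all sequences and names below are invented for illustration.

```python
def cleft_identity(query, reference, cleft_positions):
    """Percent identity between two aligned sequences over selected positions.

    query, reference: aligned sequences (same length, assumed pre-aligned).
    cleft_positions: 0-based alignment columns treated as binding-cleft residues.
    """
    matches = sum(1 for p in cleft_positions if query[p] == reference[p])
    return 100.0 * matches / len(cleft_positions)


def rank_references(query, references, cleft_positions):
    """Sort reference names by cleft identity to the query, highest first."""
    scored = [
        (name, cleft_identity(query, seq, cleft_positions))
        for name, seq in references.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical aligned fragments and cleft columns, for illustration only.
toy_query = "GLGFSIAG"
toy_refs = {"ref1": "GLGFNIVG", "ref2": "GYGFSLAG"}
print(rank_references(toy_query, toy_refs, [0, 1, 2, 3]))
```

Restricting the comparison to cleft columns rather than the full domain reflects the abstract's focus on the binding cleft; how those columns are chosen is left open here.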

9.
Fluorescent proteins have been applied in a wide variety of fields ranging from basic science to industrial applications. Apart from the naturally occurring fluorescent proteins, there is growing interest in genetically modified variants that emit light at a specific wavelength. Genetically modifying a protein is not an easy task, especially because the exchange of one residue for another has to achieve the desired property while maintaining protein stability. To help in the choice of residue exchange, computational methods are applied to predict the function and stability of proteins. In this work, we prepared a dataset composed of 109 fluorescent proteins and tested four classical supervised classification algorithms: artificial neural networks (ANNs), decision trees (DTs), support vector machines (SVMs), and random forests (RFs). This is the first time that these algorithms have been compared on this task. A comparison of the algorithms' performance shows that DTs, SVMs, and RFs were significantly better than ANNs, and that RF was the best method in all scenarios. However, the interpretability of DTs is highly relevant and can provide important clues about the mechanisms involved in protein color emission. The results are promising and indicate that the use of in silico methods can greatly reduce the time and cost of in vitro experiments.

10.
The peroxiredoxin protein expressed in Pseudomonas aeruginosa PAO1 (PaPrx) is a typical 2-cysteine peroxiredoxin that has dual functions as both a thioredoxin-dependent peroxidase and molecular chaperone. As the function of PaPrx is regulated by its structural status, in the present study, we examined the effects of electron beam radiation on the structural modifications of PaPrx, as well as changes to PaPrx peroxidase and chaperone functions. It was found that the chaperone activity of PaPrx was increased approximately 3- to 4-fold at 2 kGy when compared to non-irradiated PaPrx, while its peroxidase activity decreased. This corresponded to a shift from the low molecular weight PaPrx species that acts as a peroxidase to the high molecular weight complex that functions as a chaperone, as detected using polyacrylamide gel electrophoresis. We also investigated the influence of the electron beam on physical protein properties such as hydrophobicity and secondary structure. The exposure of the PaPrx hydrophobic domains in response to irradiation reached a peak at 2 kGy and then decreased in a dose-dependent manner at higher doses. In addition, the exposure of β-sheet and random coil elements on the surface of PaPrx was significantly increased following irradiation with an electron beam, whereas exposure of α-helix and turn elements was decreased. These results suggest that irradiated PaPrx may be a potential candidate for use in bio-engineering systems and various industrial applications, due to its enhanced chaperone activity.

11.
The mean size of the most compact native states of globular proteins, independent of folding type, follows the scaling law of collapsed polymers, R_g ~ n^(1/3), relating the radius of gyration R_g to the number of protein residues, n. Until now, this behaviour has only been observed within a small subset of unrelated single-domain proteins with n < 300. Here, we employ the SCOP database of protein folds to study systematically the scaling behaviour of well-defined families of domains that share structural and functional characteristics. In the particular case of helical proteins, we identify the folding types that can be associated with scaling laws corresponding to compact behaviour (e.g., the cytochrome-C monodomains) and noncompact behaviour (e.g., the immunoglobulin/albumin-binding and spectrin-repeat domains). Our results quantify the size variations within some folding families, as well as reveal that some distinct folds represent structures with equivalent compactness.
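
The scaling relation between the radius of gyration and the number of residues can be checked numerically with a log-log least-squares fit. The sketch below assumes an unweighted radius of gyration (all points treated with equal mass) and uses synthetic data generated from an exact cube-root law; the prefactor 2.0 and the chain lengths are hypothetical and are not values from the study.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def scaling_exponent(sizes, radii):
    """Slope of log(R_g) versus log(n), fitted by ordinary least squares."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(r) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic radii that follow R_g = 2.0 * n^(1/3) exactly (hypothetical prefactor).
toy_sizes = [50, 100, 200, 400]
toy_radii = [2.0 * n ** (1.0 / 3.0) for n in toy_sizes]
print(scaling_exponent(toy_sizes, toy_radii))
```

For a family of real domains, a fitted slope near 1/3 would correspond to the compact behaviour described above, while a clearly larger slope would indicate noncompact scaling.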

12.
13.
Scattering data and radial distributions in amorphous matter can be represented accurately in terms of models based on a small set of structural elements which specify local atomic configurations, and on certain spatial random processes specifying the fraction of such elements and the decay of their correlations with distance. The local structural elements, expressed in terms of globally ordered structures, particularly in terms of lattices (L), are subject to radially evolving Markoffian-like processes of relative displacements and of transitions between lattices at different points in space. Including an empty lattice L_0 in the set (L) leads to a definition of random voids and their spatial correlations, depending on the size of the local domains considered. The models provide a representation of a continuous range of amorphous structures from liquids and glasses via nanomaterials to crystalline powders.

14.
This study aimed at the in silico screening of ssDNA aptamers against Escherichia coli O157:H7 by combining machine learning with the PseKNC approach. First, a total of 47 validated ssDNA aptamers and 498 random DNA sequences were used as positive and negative training data, respectively. The sequences were then converted to numerical vectors using the PseKNC method through the Pse-in-one 2.0 web server. The numerical vectors were subsequently classified by the SVM, ANN, and RF algorithms available in the Orange 3.2.0 software. The performance of the tested models was evaluated using cross-validation, random sampling, and ROC curve analyses. The initial results demonstrated that the ANN and RF algorithms performed adequately for the data classification. To improve the performance of these classifiers, the positive training data were triplicated and the models were retrained. The results confirmed that increasing the data size had a significant effect on classification accuracy, especially for the RF model. Subsequently, the RF algorithm, with an accuracy of 98%, was selected for aptamer screening. The thermodynamic details of the folding process and the secondary structures of the screened aptamers were also considered as final evaluations. The results confirmed that the aptamers selected by the proposed method had appropriate structural properties and that there was no thermodynamic limitation on aptamer folding.

15.
Inductively coupled plasma-mass spectrometry (ICP-MS) in combination with different supervised chemometric approaches has been used to classify cultivated mussels in Galicia (Northwest of Spain) under the European Protected Designation of Origin (PDO). A total of 158 mussel samples, collected in the five rías on the basis of production, together with their minor and trace elements, including high field strength elements (HFSEs) and rare earth elements (REEs), were used for this purpose. The samples were classified according to their origin: Galician vs. other regions (Tarragona, Spain, and Étang de Thau, France) and between the Galician rías. The ability of linear discriminant analysis (LDA), soft independent modelling of class analogy (SIMCA), and artificial neural networks (ANNs) to classify the samples was investigated. Galician and non-Galician samples were correctly assigned when LDA and SIMCA were used. ANNs were more effective when classifying according to the ría of origin.

16.
This review describes structural comparisons of protein classes whose task is to identify and interact with biological solids (minerals and ice). To date, the following trends have been noted: (1) biomineral-interaction proteins typically adopt unfolded, open conformations, and, where mineral binding motifs have been identified, these sequences exhibit structural trends towards extended, random coil, or other unstable secondary structures; (2) ice-interaction proteins typically adopt folded structures, featuring stable secondary structure preferences (α-helix, β-sheet, β-helix, etc.) and stable, planar ice binding motifs that exploit hydrophobicity and van der Waals interactions for ice binding.

17.
18.
The behavior of proteins is closely related to the protonation states of their residues. Therefore, prediction and measurement of pKa are essential for understanding the basic functions of proteins. In this work, we develop a new empirical scheme for protein pKa prediction that is based on deep representation learning. It combines machine learning with atomic environment vectors (AEVs) and a learned quantum mechanical representation from the ANI-2x neural network potential (J. Chem. Theory Comput. 2020, 16, 4192). The scheme requires only the coordinate information of a protein as input and separately estimates the pKa for all five titratable amino acid types. The accuracy of the approach was analyzed with both cross-validation and an external test set of proteins. The results were compared with the widely used empirical approach PROPKA. The new empirical model achieves MAEs below 0.5 for all amino acid types. It surpasses the accuracy of PROPKA and performs significantly better than the null model. Our model is also sensitive to local conformational changes and molecular interactions.

We developed a new empirical ML model for protein pKa prediction with MAEs below 0.5 for all amino acid types.
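
The accuracy figures quoted above are mean absolute errors (MAEs), and the comparison against a null model can be sketched in a few lines. The sketch assumes the common convention that a null model predicts one constant reference pKa for every residue of a given type; the abstract does not define its null model, and every pKa value below is a hypothetical number, not data from the study.

```python
def mean_absolute_error(predicted, reference):
    """Mean absolute error between predicted and reference pKa values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def null_model_mae(reference, constant_pka):
    """MAE of a null model that predicts `constant_pka` for every residue."""
    return mean_absolute_error([constant_pka] * len(reference), reference)


# Hypothetical experimental and predicted pKa values for one residue type.
toy_experimental = [3.0, 5.0]
toy_predicted = [3.5, 4.5]
print(mean_absolute_error(toy_predicted, toy_experimental))
print(null_model_mae(toy_experimental, 4.0))
```

Computing both numbers per amino acid type, rather than pooled, matches the abstract's statement that pKa is estimated separately for each titratable type.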

19.
20.