Similar articles
1.
2.
The identification of protein complexes in protein–protein interaction (PPI) networks has greatly advanced our understanding of biological organisms. Existing computational methods for detecting protein complexes are usually based on specific topological properties of PPI networks. However, owing to the inherent complexity of the network structures, the identification of protein complexes may not be fully addressed by using a single network topological property. In this study, we propose a novel MultiObjective Evolutionary Programming Genetic Algorithm (MOEPGA), which integrates multiple network topological features to detect biologically meaningful protein complexes. Our approach first systematically analyzes the multiobjective problem of identifying protein complexes from PPI networks, and then constructs the objective function of the iterative algorithm based on three common topological properties of protein complexes from the benchmark dataset. Finally, we describe our algorithm, which consists mainly of three steps: population initialization, subgraph mutation, and subgraph selection. To show the utility of our method, we compared MOEPGA with several state-of-the-art algorithms on two yeast PPI datasets. The experimental results demonstrate that the proposed method not only finds more protein complexes but also achieves higher accuracy in terms of F-score. Moreover, our approach can cover a certain number of proteins in the input PPI network in terms of the normalized clustering score. Taken together, our method can serve as a powerful framework for detecting protein complexes in yeast PPI networks, thereby facilitating the identification of the underlying biological functions.

3.
4.
Protein complex detection from protein–protein interaction (PPI) networks has received a lot of attention in recent years. A number of methods identify protein complexes as dense sub-graphs using network information, while several other methods detect protein complexes based on topological information. Although the methods based on identifying dense sub-graphs are more effective in identifying protein complexes, not all protein complexes have high density. Moreover, existing methods focus more on static PPI networks and usually overlook the dynamic nature of protein complexes. Here, we propose a new method, Weighted Edge based Clustering (WEC), to identify protein complexes based on the weight of the edge between two interacting proteins, where the weight is defined by the edge clustering coefficient and the gene expression correlation between the interacting proteins. Our WEC method is capable of detecting highly inter-connected and co-expressed protein complexes. The experimental results of WEC on three real-life datasets show that our method detects protein complexes effectively in comparison with other highly cited existing methods. Availability: The WEC tool is available at http://agnigarh.tezu.ernet.in/~rosy8/shared.html.
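The edge weighting described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the edge clustering coefficient and the expression correlation are combined, so the product used here, the clipping of negative correlations to zero, the triangle-based form of the coefficient, and all function names are assumptions made for illustration only, not the WEC implementation.

```python
from math import sqrt


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by the most triangles it could join.

    Assumed form: common neighbours / min(deg(u) - 1, deg(v) - 1).
    """
    common = len(adj[u] & adj[v])
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / possible if possible > 0 else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def edge_weight(adj, expr, u, v):
    """Hypothetical combination: coefficient times non-negative correlation."""
    return edge_clustering_coefficient(adj, u, v) * max(0.0, pearson(expr[u], expr[v]))


# Toy network (undirected adjacency sets) and toy expression profiles.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1], "D": [1, 1, 2]}
weights = {(u, v): edge_weight(adj, expr, u, v) for u in adj for v in adj[u] if u < v}
```

In this toy network the edge A-B lies in a triangle and its endpoints are perfectly co-expressed, so it receives a weight close to 1, while B-D lies in no triangle and A-C is anti-correlated, so both receive a weight of 0.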

5.
Detection of protein complexes is very important for understanding the principles of cellular organization and function. Recently, large protein–protein interaction (PPI) networks have become available through high-throughput experimental techniques. These networks make it possible to develop computational methods for protein complex detection. Most current methods rely on the assumption that a protein complex, as a module, has a dense structure. However, complexes have a core-attachment structure, and proteins in a complex core share a high degree of functional similarity, so a core is expected to have high weighted density. In this paper we present a Core-Attachment based method for protein complex detection from Weighted PPI Interactions using clustering coefficient and weighted density. Experimental results show that the proposed method, CAMWI, improves the accuracy of protein complex detection.
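As a rough illustration of the weighted density mentioned in this abstract, the sketch below uses one common definition: the sum of edge weights inside a candidate core divided by the number of possible protein pairs. The abstract does not give its formula, so this definition, the function name, and the toy weights are assumptions.

```python
from itertools import combinations


def weighted_density(nodes, weights):
    """Sum of internal edge weights divided by the number of possible pairs.

    `weights` maps frozenset({u, v}) to an edge weight; missing pairs count as 0.
    """
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)


# Toy weighted interactions (illustrative values only).
weights = {
    frozenset("AB"): 0.9,
    frozenset("BC"): 0.6,
    frozenset("AC"): 0.3,
    frozenset("CD"): 0.2,
}
core_density = weighted_density("ABC", weights)       # about 0.6
extended_density = weighted_density("ABCD", weights)  # about 0.33
```

Adding the weakly attached protein D lowers the weighted density, which is the kind of signal a core-attachment method could use to separate a core from its attachments.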

6.
The identification of protein–protein interactions (PPIs) and their networks is vitally important for systematically defining and understanding the roles of proteins in biological systems. Despite the development of numerous experimental systems to detect PPIs and diverse research on assessing the quality of the obtained data, a consensus (highly reliable, almost complete) interactome of Saccharomyces cerevisiae has not yet been presented. In this work, we propose an unsupervised statistical approach to create a high-confidence yeast PPI network. For this, we assembled databases of interacting protein pairs for yeast and obtained an extremely large PPI dataset comprising 135 154 non-redundant interactions between 6191 yeast proteins. A scoring scheme considering eight heterogeneous biological features resulted in a broad score distribution and a highly reliable network consisting of 29 046 physical interactions with scores higher than the threshold value of 0.85, for which sensitivity, specificity and coverage were 86%, 68%, and 72%, respectively. We evaluated our method by comparing it with other scoring schemes and showed that reducing the noise inherent in experimental PPIs via our scoring scheme further increased the accuracy. The current study is expected to increase the efficiency of methodologies in biological research that make use of protein interaction networks.
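The thresholding step in this abstract (keeping interactions that score above 0.85 and reporting sensitivity and specificity) can be sketched as follows. The scores and reference labels are toy values invented for illustration and are not taken from the study; the function name is likewise hypothetical.

```python
def threshold_metrics(scores, labels, threshold):
    """Sensitivity and specificity when pairs scoring above `threshold` are kept."""
    tp = fp = tn = fn = 0
    for pair, score in scores.items():
        predicted = score > threshold
        actual = labels[pair]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy scores and reference labels (True = known physical interaction).
scores = {("P1", "P2"): 0.95, ("P1", "P3"): 0.90, ("P2", "P3"): 0.40,
          ("P2", "P4"): 0.80, ("P3", "P4"): 0.88}
labels = {("P1", "P2"): True, ("P1", "P3"): False, ("P2", "P3"): False,
          ("P2", "P4"): True, ("P3", "P4"): True}
sensitivity, specificity = threshold_metrics(scores, labels, threshold=0.85)
```

With these toy values, two of the three true interactions pass the threshold and one of the two false pairs is rejected.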

7.
Large-scale experiments and data integration have provided the opportunity to systematically analyze and comprehensively understand the topology of biological networks and biochemical processes in cells. Modular architecture, which encompasses groups of genes/proteins involved in elementary biological functional units, is a basic form of the organization of interacting proteins. Here we apply a graph clustering algorithm based on clique percolation to detect overlapping network modules of a protein–protein interaction (PPI) network. Our analysis of the yeast Saccharomyces cerevisiae suggests that most of the detected modules correspond to one or more experimentally functional modules and half of these annotated modules match well with experimentally determined protein complexes. Our method of analysis can be applied to protein–protein interaction data for any species and even to other biological networks.
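Clique percolation, the clustering idea named in this abstract, can be sketched in a few lines of Python. This is a brute-force illustration for small graphs, not the authors' program: it enumerates all k-cliques, merges cliques that share k - 1 nodes, and reports each merged group as a module. The function names and the toy edge list are assumptions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clique_percolation(adj, k=3):
    """Overlapping modules: unions of k-cliques chained through k-1 shared nodes."""
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    modules = {}
    for i, clique in enumerate(cliques):
        modules.setdefault(find(i), set()).update(clique)
    return sorted(modules.values(), key=sorted)


# Toy network: triangles ABC and BCD share an edge; DEF touches them only at D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
modules = clique_percolation(build_adjacency(edges), k=3)
```

On the toy network this yields two modules, {A, B, C, D} and {D, E, F}, which overlap at protein D. That overlap is the property that distinguishes clique percolation from partition-based clustering.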

8.
In order to understand the molecular mechanism underlying any disease, knowledge about the interacting proteins in the disease pathway is essential. The number of known protein–protein interactions (PPIs) is still very limited compared with the available protein sequences of different organisms. Although experiment-based high-throughput technologies provide some data about these interactions, those data are often fairly noisy. Computational techniques for predicting protein–protein interactions are therefore important. A total of 1296 binary fingerprints that encode a combination of structural and geometric properties were developed using the crystallographic data of 15,000 protein complexes in the PDB server. In a case study, these fingerprints were created for proteins implicated in Type 2 diabetes mellitus. The fingerprints were input into an SVM-based model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC value of 0.78) on an external data set composed of proteins retrieved via text mining of diabetes-related literature. A PPI network was constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease-related proteins, functional annotation and other proteomics studies.

9.
10.
Tripeptidyl peptidase II (TPPII) is primarily considered a house-keeping exopeptidase, which contributes to the functions of the ubiquitin-proteasome system by maintaining cellular amino acid homeostasis. Although it is functionally well characterised in vitro and in mammalian cell models, less is known about the molecular mechanisms of its involvement in the signalling and metabolic pathways that mediate its cellular functions. The present protein-protein interaction network analysis identified such mechanisms involved in adaptive and innate immunity, glucose metabolism, cancer cell growth, apoptosis, cell cycle and DNA damage responses. The interaction network, constructed from publicly available protein-protein interaction data, was extended with the application GeneMania, which was further used for the pathway enrichment, protein function prediction and protein node prioritisation analyses. The analysis suggested that the molecular mechanisms linked to adaptive and innate immunity (ID, Kit receptor, BCR, IL-2 and G-CSF signalling; the regulation of NFκB), aerobic glycolysis (ID and IL-2 signalling), tumorigenesis (TGF-β and p53 signalling; the top priority nodes MAPKs, mTOR regulation), diabetes (Kit receptor signalling; the top priority node GSK3β) and neurodegeneration (the control of mTOR and Aβ peptide degradation) control the resulting TPPII interaction network. The uncharacterized interactions with two lung cancer suppressors (DOK3, DENND2D), a protein involved in the increased risk of lung cancer in smokers (CYP1A1) and a protein implicated in asthmatic reactions (CHIA) suggest potential roles of TPPII in lung cancer pathology. The interactions with the methyltransferase CARNMT1, which modifies di- and tripeptides, and with the xenobiotic-processing enzyme CYP1A1 are additional candidates for the discovery of new TPPII functions.

11.
Drug–target interaction (DTI) prediction is a challenging step in drug repositioning, drug discovery and drug design. The advent of high-throughput technologies has facilitated the development of DTI prediction methods. With the generation of a large number of data sets, many mathematical models and computational algorithms have been developed to identify potential drug–target pairs. However, most existing methods are based on single-view data. By integrating drug and target data from different views, we aim to obtain more stable and accurate prediction results. In this paper, a multiview DTI prediction method based on clustering is proposed. We first introduce a model for single-view drug–target data. The model is formulated as an optimization problem, which aims to identify the clusters in both the drug similarity network and the target protein similarity network, and at the same time to connect the clusters that share more known DTIs. The model is then extended to multiview network data by maximizing the consistency of the clusters in each view. An approximation method is proposed to solve the optimization problem. We apply the proposed algorithms to two views of data. Comparisons with some existing algorithms show that the multiview DTI prediction algorithm can produce more accurate predictions. For the considered data set, we finally predict 54 possible DTIs. The similarity analysis of the drugs/targets and the enrichment analysis of the DTIs and genes in each cluster show that the predicted DTIs have a high probability of being true.

12.
Abstract

Understanding the self-assembly of nanoscale metal–ligand clusters is an important research area in supramolecular chemistry, especially if one wishes to develop a truly predictive design strategy for synthesizing these nanoscale clusters. As the building blocks for forming these clusters have become larger and more complex, spacious clusters have been synthesized which often contain large cavities. These assemblies can house guest molecules which play a previously uncharacterized role in the self-assembly processes. We seek to analyze this role: do these guest molecules act as templates? Are the guest molecules necessary for cluster formation? Does the guest drive cluster assembly by forming a stable host–guest complex with the cluster? Must a truly rational design strategy for forming metal–ligand clusters incorporate the use of templates? The role of guest molecules in the self-assembly of nanoscale coordination clusters is reviewed in this article.

13.
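A Co(II) complex based on 1,3-bis[3,5-(dicarboxy)phenoxy]-2-hydroxypropane (H4L) and 4,4′-bipyridine (Bipy), [Co2(L)(Bipy)2]n (1), was synthesized by the hydrothermal method, and its structure was characterized by infrared spectroscopy (IR), ultraviolet-visible spectroscopy (UV-Vis), thermogravimetric analysis (TGA), single-crystal X-ray diffraction, powder X-ray diffraction (XRD) and elemental analysis. Complex 1 crystallizes in the monoclinic system, space group C2/c. In complex 1, the dinuclear cluster formed by two six-coordinate cobalt atoms can be simplified as an octahedral 6-connected node, which links four carboxylate ligands L and two other dinuclear clusters. The carboxylate ligand L can be simplified as a tetrahedral 4-connected node, which links four dinuclear clusters. Two bipyridine molecules arranged in parallel connect two adjacent dinuclear clusters, acting as a "double bridge", and are simplified as an edge of the topological net. The framework of complex 1 is therefore described as an sqc422 topological network.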

14.
Protein fold recognition   Cited by: 4 (self-citations: 0; citations by others: 4)
Summary An important, yet seemingly unattainable, goal in structural molecular biology is to be able to predict the native three-dimensional structure of a protein entirely from its amino acid sequence. Prediction methods based on rigorous energy calculations have not yet been successful, and best results have been obtained from homology modelling and statistical secondary structure prediction. Homology modelling is limited to cases where significant sequence similarity is shared between a protein of known structure and the unknown. Secondary structure prediction methods are not only unreliable, but also do not offer any obvious route to the full tertiary structure. Recently, methods have been developed whereby entire protein folds are recognized from sequence, even where little or no sequence similarity is shared between the proteins under consideration. In this paper we review the current methods, including our own, and in particular offer a historical background to their development. In addition, we discuss the future of these methods and outline the developments under investigation in our laboratory.

15.
Analytica Chimica Acta, 2004, 515(1): 87-100
The goal of the present work is to analyse the effect of having non-informative variables (NIV) in a data set when applying cluster analysis, and to propose a computational method capable of detecting and removing these variables. The proposed method uses a genetic algorithm to select the variables that are important for making the presence of groups in the data clear. The procedure has been implemented for use with k-means, with the cluster silhouettes as the fitness function for the genetic algorithm. The main problem that can appear when applying the method to real data is that, in general, we do not know a priori what the real cluster structure is (number and composition of the groups). The work explores the evolution of the silhouette values computed from the clusters built by k-means when non-informative variables are added to the original data set, using both a literature data set and some simulated data in higher dimension. The procedure has also been applied to real data sets.
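The silhouette-based fitness described in this abstract can be sketched as below. This is a simplified illustration under stated assumptions: the cluster labels are passed in directly, whereas the abstract says the clusters are built with k-means, and the genetic algorithm itself is omitted. The function names, the boolean variable mask, and the toy data are hypothetical.

```python
from math import sqrt


def mean_silhouette(points, labels):
    """Average silhouette width over all points (Euclidean distance)."""
    def dist(p, q):
        return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        return 0.0

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def fitness(data, labels, mask):
    """Fitness of a variable subset: silhouette on the selected columns only."""
    selected = [tuple(x for x, keep in zip(row, mask) if keep) for row in data]
    return mean_silhouette(selected, labels)


# Toy data: the first variable separates the groups, the second is noise.
data = [(0.0, 5.0), (0.2, 1.0), (5.0, 4.8), (5.2, 0.9)]
labels = [0, 0, 1, 1]
with_noise = fitness(data, labels, mask=(True, True))
without_noise = fitness(data, labels, mask=(True, False))
```

Dropping the noisy second variable raises the mean silhouette in this toy case. A genetic algorithm using this fitness would exploit that effect to favour masks that exclude non-informative variables.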

16.
A new metal-organic framework (MOF) based on metal clusters as secondary building units (SBUs) has been synthesized and structurally characterized. The reported MOF presents an interesting 8-connected self-penetrating coordination network based on dinuclear cadmium clusters with a $4^{24}\cdot 5\cdot 6^{3}$ topology. Moreover, the thermal stability and luminescence properties of this compound have been investigated.

17.
18.
We have developed a computer program, named PDBETA, that performs normal mode analysis (NMA) based on an elastic network model that uses dihedral angles as independent variables. Taking advantage of the relatively small number of degrees of freedom required to describe a molecular structure in dihedral angle space and a simple potential-energy function independent of atom types, we aimed to develop a program applicable to a full-atom system of any molecule in the Protein Data Bank (PDB). The algorithm for NMA used in PDBETA is the same as the computer program FEDER/2, developed previously. Therefore, the main challenge in developing PDBETA was to find a method that can automatically convert PDB data into molecular structure information in dihedral angle space. Here, we illustrate the performance of PDBETA with a protein–DNA complex, a protein–tRNA complex, and some non-protein small molecules, and show that the atomic fluctuations calculated by PDBETA reproduce the temperature factor data of these molecules in the PDB. A comparison was also made with elastic-network-model based NMA in a Cartesian-coordinate system.

19.
Up to four carbonyl groups of Co2Ir2(CO)12 have been replaced by trimethylphosphite to form tetranuclear clusters of formula Co2Ir2(CO)12−n[P(OMe)3]n. The clusters do not exhibit the redistribution of the metal core which is observed in the case of mixed cobalt–rhodium clusters. Attachment of three or four trimethylphosphites to the metal skeleton of the cluster inhibits the scrambling of the carbonyl groups.

20.
In metabolomics research, it is often important to focus the data analysis on specific areas of interest within the metabolome. In this paper, we describe the application of consensus principal component analysis (CPCA) and canonical correlation analysis (CCA) as a means to explore the relation between metabolome data and (i) biochemically related metabolites and (ii) an amino acid biosynthesis pathway. CPCA searches for major trends in the behavior of metabolite concentrations that are common to the metabolites of interest and the remainder of the metabolome. CCA identifies the strongest correlations between the metabolites of interest and the remainder of the metabolome. CPCA and CCA were applied to two different microbial metabolomics data sets. The first data set, derived from Pseudomonas putida S12, was relatively simple as it contained metabolomes obtained under only four environmental conditions. The second data set, obtained from Escherichia coli, was much more complex as it consisted of metabolomes obtained under 28 different environmental conditions. In the case of the simple and coherent P. putida S12 data set, CCA and CPCA gave similar results because the variation in the subset of selected metabolites and in the remainder of the metabolome was similar. In contrast, CCA and CPCA yielded different results in the case of the E. coli data set. With CPCA, the trends in the selected subset (the phenylalanine biosynthesis pathway) dominated the results. The main trends were related to high and low phenylalanine productivity, and the metabolites showing similar behavior in concentration were metabolites regulating the phenylalanine biosynthesis route in the subset and metabolites related to general amino acid metabolism in the remainder of the metabolome. With CCA, neither subset truly dominated the data analysis. CCA described the differences between the wild type and the overproducing strain and the differences between the succinate-grown and glucose-grown cells. For the difference between the wild type and the overproducing strain, metabolites from the beginning and the end of the aromatic amino acid pathways, such as erythrose-4-phosphate, tryptophan, and phenylalanine, were important among the selected metabolites. CCA and CPCA proved to be complementary data analysis tools that enable the data analysis to be focused on groups of metabolites that are of specific interest in relation to the remainder of the metabolome. Compared with an ordinary PCA, focusing the data analysis on biologically relevant metabolites led, especially for the complex E. coli data, to a better biological interpretation of the data.
