Similar literature
1.
We present an algorithm for automatically predicting the topological family of any RNA three-way junction, given only the information from the secondary structure: the sequence and the Watson–Crick pairings. The parameters of the algorithm were determined on a data set of 33 three-way junctions whose 3D conformation is known. We applied the algorithm to 53 other junctions and compared the predictions with the real shape of those junctions. We show that the correct answer is selected out of nine possible configurations 64% of the time. Additionally, these results are noticeably improved if homology information is used. The resulting software, Cartaj, is available online and downloadable (with source) at: http://cartaj.lri.fr.

2.
As a pivotal domain of the envelope protein, the fusion peptide (FP) plays a crucial role in pathogenicity and therapeutic intervention. Given the limited FP annotations in the NCBI database and the absence of FP prediction software, a bioinformatics tool for predicting new putative FPs (np-FPs) in retroviruses is urgently needed. In this work, a sequence-based FP model is proposed that combines the Hidden Markov Method with similarity comparison. The classification accuracies are 91.97% and 92.31% under 10-fold and leave-one-out cross-validation, respectively. After scanning sequences without FP annotations, the model discovered 53,946 np-FPs. Statistics on the FPs and np-FPs reveal that the FP is a conserved, hydrophobic domain. The FP software, written for the Windows environment, is available at https://sourceforge.net/projects/fptool/files/?source=navbar.

3.
The P-type ATPases (P-ATPases) are present in all living cells, where they mediate ion transport across membranes at the expense of ATP hydrolysis. The ions transported by these pumps include protons, calcium, sodium, potassium, and heavy metals such as manganese, iron, copper, and zinc. Because they maintain the proper gradients of essential ions across cellular membranes, P-ATPases are crucial for cell survival. In this study, two families of P-ATPases, P-ATPase 13A1 and P-ATPase 13A3, were characterized and compared in two different insect species from different orders. According to the conserved motifs found with MEME, nine motifs were shared by insects of the 13A1 family but only eight by those of the 13A3 family. Seven insect species from the 13A1 family and five from the 13A3 family were selected as representative samples for functional and structural analyses. These analyses were performed with the ProtParam, SOPMA, SignalP 4.1, TMHMM 2.0, ProtScale and ProDom tools in the ExPASy database. The tertiary structures of the Bombus terrestris proteins, taken as a sample of each family, were predicted by the Phyre2 and TM-score servers, and their similarities were verified by the SuperPose server. The tertiary structures were predicted via the “c3b9bA” model (PDB Accession Code: 3B9B) in the P-ATPase 13A1 family and the “c2zxeA” model (PDB Accession Code: 2ZXE) in the P-ATPase 13A3 family. A phylogenetic tree was constructed with MEGA 6.06 software using the Neighbor-joining method. The results showed high identity between the P-ATPase families, suggesting that they derive from a common ancestor, although they belong to separate groups. In protein–protein interaction analysis by STRING 10.0, six common enriched KEGG pathways were identified in B. terrestris in both families. The obtained data provide a background for bioinformatic studies of the function and evolution of other insects and organisms.

4.
We present a new form of merit function which measures agreement between a large number of data points and the model function with a particular choice of parameters. We demonstrate the efficiency of the proposed merit function on the common problem of finding the base line of a spectrum. When the base line is expected to be a horizontal straight line, the use of minimization algorithms is not necessary, i.e. the solution is achieved in a small number of steps. We discuss the advantages of the proposed merit function in general, when explicit use of a minimization algorithm is necessary. The hardcopy text is accompanied by an electronic archive, stored on the SAE homepage at http://www1.elsevier.com/homepage/saa/sab/content/lower.htm. The archive contains a fully functional demo program with a tutorial, examples and the Visual Basic source code of the key subroutine.

5.
Amidation plays an important role in a variety of pathological processes and serious diseases such as neural dysfunction and hypertension. However, identifying protein amidation sites through traditional experimental methods is time-consuming and expensive. In this paper, we propose a novel predictor, Prediction of Amidation Sites (PrAS), which is the first software package of its kind for academic users. The method incorporates four representative feature types: position-based features, physicochemical and biochemical property features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection, is proposed to optimize the features. PrAS achieved an AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg.

6.
In proteins, the number of interacting pairs is usually much smaller than the number of non-interacting ones, so an imbalanced data problem arises in the prediction of protein–protein interactions (PPIs). In this article, we introduce two ensemble methods to address this problem. These ensemble methods combine a cluster-based under-sampling technique with fusion classifiers. We evaluate the ensemble methods on a dataset from the Database of Interacting Proteins (DIP) with 10-fold cross-validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs; we also draw some valuable conclusions on the performance of ensemble methods for PPIs on imbalanced data. The prediction software and all datasets used in the work can be obtained for free at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html.
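The abstract does not spell out how its cluster-based under-sampling works. The following is only a minimal sketch of one common variant, under stated assumptions: the majority class is clustered with a plain k-means (k equal to the number of samples to keep), and the real sample closest to each centroid is retained. The function names `kmeans` and `cluster_undersample` are hypothetical and are not taken from the published software.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of coordinate tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def cluster_undersample(majority, n_keep, seed=0):
    """Keep n_keep majority samples: the real sample nearest each centroid."""
    centroids = kmeans(majority, n_keep, seed=seed)
    remaining = list(majority)
    chosen = []
    for c in centroids:
        best = min(remaining, key=lambda p: dist(p, c))
        chosen.append(best)
        remaining.remove(best)  # never pick the same sample twice
    return chosen


# Toy data (illustrative only): 12 non-interacting pairs, reduced to 3.
majority = [(float(i), float(i % 3)) for i in range(12)]
kept = cluster_undersample(majority, n_keep=3)
print(kept)
```

Selecting real samples rather than the centroids themselves keeps every retained point an actual observation, which is one reasonable design choice among several.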

7.
Defining the amino acid composition of protein cores is fundamental for understanding protein folding, as different architectures might achieve structural stability only in the presence of specific amino acid networks. Quantitative characterization of protein cores in relation to the corresponding structures and dynamics is needed to increase the reliability of protein engineering procedures. Unambiguous criteria based on atom depth considerations were established to assign amino acid residues to protein cores and, hence, to classify inner and outer molecular moieties. These criteria were implemented in a new tool named ProCoCoA, Protein Core Composition Analyzer. A user-friendly web interface was developed, available at the URL: http://www.sbl.unisi.it/prococoa. An accurate estimate of protein core composition for six protein architectures selected from the CATH database of solved structures has been carried out, and the results indicate the presence of specific patterns of amino acid core composition in different protein folds.

8.
9.
Quantitative analysis of the behaviors of multiple interacting animals can provide a key to revealing high-order functions of their nervous systems. To resolve these complex behaviors, a video tracking system that preserves individual identity even under severe overlap in positions, i.e., occlusion, is needed. We developed GroupTracker, a multiple animal tracking system that accurately tracks individuals even under severe occlusion. Because maximum likelihood estimation of a Gaussian mixture model whose components can severely overlap is theoretically an ill-posed problem, we devised an expectation–maximization scheme with additional constraints on the eigenvalues of the covariance matrix of the mixture components. Our system was shown to accurately track multiple medaka (Oryzias latipes) that swim freely in three dimensions and frequently overlap each other. As an accurate multiple animal tracking system, GroupTracker will contribute to revealing unexplored structures and patterns behind animal interactions. The Java source code of GroupTracker is available at https://sites.google.com/site/fukunagatsu/software/group-tracker.
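The abstract says only that the eigenvalues of each component's covariance matrix are constrained; it does not state the form of the constraint. As a hedged illustration, the sketch below assumes the simplest possible version: after an M-step, the eigenvalues of a 2×2 covariance matrix are clamped to an interval `[lo, hi]` and the matrix is rebuilt from the unchanged eigenvectors. The function name `clamp_covariance` and the bounds are hypothetical, and this is Python rather than the Java of the published code.

```python
from math import hypot, sqrt


def clamp_covariance(cov, lo, hi):
    """Clamp the eigenvalues of a symmetric 2x2 matrix to [lo, hi].

    The eigenvectors are kept, so the orientation of the Gaussian is
    preserved while its extent along each axis is bounded.
    """
    (a, b), (_, c) = cov
    mean = (a + c) / 2.0
    half_gap = sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam1, lam2 = mean + half_gap, mean - half_gap  # lam1 >= lam2

    # Unit eigenvector (x, y) for lam1; the second one is (-y, x).
    if b != 0.0:
        x, y = lam1 - c, b
    elif a >= c:
        x, y = 1.0, 0.0
    else:
        x, y = 0.0, 1.0
    norm = hypot(x, y)
    x, y = x / norm, y / norm

    lam1 = min(max(lam1, lo), hi)
    lam2 = min(max(lam2, lo), hi)

    # Rebuild lam1 * v1 v1^T + lam2 * v2 v2^T.
    s00 = lam1 * x * x + lam2 * y * y
    s01 = (lam1 - lam2) * x * y
    s11 = lam1 * y * y + lam2 * x * x
    return [[s00, s01], [s01, s11]]


# A nearly degenerate component (one axis almost collapsed) is widened.
print(clamp_covariance([[4.0, 0.0], [0.0, 0.0001]], lo=0.01, hi=9.0))
```

A lower bound of this kind prevents a component from collapsing onto a single point, which is one of the ways the unconstrained likelihood becomes unbounded.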

10.
Several methods have been proposed for protein–sugar binding site prediction using machine learning algorithms. However, they do not effectively learn the varied properties of binding site residues that arise from the varied interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. Using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acidic sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor which combines the results of the two predictors. We showed that when a sugar is known to be acidic, the acidic sugar binding predictor achieves the best performance, and that when a sugar is known to be nonacidic or its class is unknown, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. A support vector machine was used as the machine learning algorithm, and the position-specific scoring matrix created by the position-specific iterative basic local alignment search tool was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have released our system as an open-source freeware tool in a GitHub repository (https://doi.org/10.5281/zenodo.61513).

11.
Background and objective: Differential DNA methylation is known to affect the regulatory mechanism of biological pathways. A pathway encompasses a set of interacting genes or gene products that together perform a given biological function. Pathways often encode strong methylation signatures that are capable of distinguishing biologically distinct subtypes. Even though Next Generation Sequencing techniques such as MeDIP-seq and MBD-isolated genome sequencing (MiGS) allow for genome-wide identification of clinical and biological subtypes, there is a pressing need for computational methods to compare epigenetic signatures across pathways. Methods: A novel alignment method, called DEEPAligner (Deep Encoded Epigenetic Pathway Aligner), is proposed in this paper that finds functionally consistent and topologically sound alignments of epigenetic signatures from pathway networks. A deep embedding framework is used to obtain epigenetic signatures from pathways, which are then aligned for functional consistency and local topological similarity. Results: Experiments on four benchmark cancer datasets reveal epigenetic signatures that are conserved in cancer-specific and across-cancer subtypes. Conclusion: The proposed deep embedding framework obtains highly coherent signatures that are aligned for biological as well as structural orthology. Comparison with state-of-the-art network alignment methods clearly suggests that the proposed method obtains topologically and functionally more consistent alignments. Availability: http://bdbl.nitc.ac.in/DEEPAligner

12.
Protein inference is an important issue in proteomics research. Its main objective is to select a proper subset of candidate proteins that best explains the observed peptides. Although many methods have been proposed for solving this problem, several issues such as peptide degeneracy and one-hit wonders remain unsolved. Therefore, the accurate identification of proteins that are truly present in the sample continues to be a challenging task. Based on the concept of peptide detectability, we formulate the protein inference problem as a constrained Lasso regression problem, which can be solved very efficiently through a coordinate descent procedure. The new inference algorithm, named ProteinLasso, employs an ensemble learning strategy to address the sparsity parameter selection problem in the Lasso model. We test the performance of ProteinLasso on three datasets. As shown in the experimental results, ProteinLasso outperforms state-of-the-art protein inference algorithms in terms of both identification accuracy and running efficiency. In addition, we show that ProteinLasso is stable under different parameter specifications. The source code of our algorithm is available at: http://sourceforge.net/projects/proteinlasso.
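The abstract does not give the exact constraint or design matrix used by ProteinLasso. As a rough sketch only, the code below assumes a non-negativity constraint on the coefficients and minimizes 0.5·‖y − Xw‖² + λ·Σw by cyclic coordinate descent; rows of `X` stand for peptides, columns for candidate proteins, and `y` for observed peptide scores. The name `nonneg_lasso` and the toy data are hypothetical.

```python
def nonneg_lasso(X, y, lam, sweeps=200):
    """Cyclic coordinate descent for Lasso with w >= 0 (assumed constraint)."""
    n_rows, n_cols = len(X), len(X[0])
    w = [0.0] * n_cols
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(sweeps):
        for j in range(n_cols):
            col = [X[i][j] for i in range(n_rows)]
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue  # protein j explains no peptide
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(c * r for c, r in zip(col, residual)) + norm_sq * w[j]
            # Soft-threshold, then project onto w_j >= 0.
            new_w = max(0.0, (rho - lam) / norm_sq)
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new_w
    return w


# Toy example (illustrative only): three peptides, each unique to one protein.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
y = [1.0, 0.05, 0.5]
weights = nonneg_lasso(X, y, lam=0.1)
print(weights)
```

In this toy case the weakly supported second protein is driven exactly to zero, which is the sparsity effect that makes Lasso attractive for selecting a small explanatory protein set.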

13.
Domains are the structural basis of the physiological functions of proteins, and predicting them benefits the study of protein structure and function. This article proposes a new, fully automatic prediction method, PPM-Dom (Domain Position Prediction Method), for predicting the positions of domains in a target protein from its atomic coordinates. The method integrates complex networks, community division, and a fuzzy mean operator (FMO). The whole sequence is divided into potential domain regions by the complex network and community division, and the FMO makes the final determination of the domain positions. The method can predict which regions will form a domain structure and which are unstructured, based on new atomic coordinate information of the query sequence, and can separate different domains in the same query sequence from each other. On an independent testing dataset, PPM-Dom reached 91.41% prediction accuracy, 96.12% sensitivity and 92.86% specificity. The PPM-Dom tool package is freely available at http://cic.scu.edu.cn/bioinformatics/PPMDom.zip.

14.
G-protein-coupled receptors (GPCRs) are important targets of modern medicinal drugs. The accurate identification of interactions between GPCRs and drugs is of significant importance for both protein function annotation and drug discovery. In this paper, a new sequence-based predictor called TargetGDrug is designed and implemented for predicting GPCR–drug interactions. In TargetGDrug, the evolutionary feature of the GPCR sequence and the wavelet-based molecular fingerprint feature of the drug are integrated to form the combined feature of a GPCR–drug pair; then, the combined feature is fed to a trained random forest (RF) classifier to perform an initial prediction; finally, a novel drug-association-matrix-based post-processing procedure is applied to reduce potential false positives and false negatives in the initial prediction. Experimental results on benchmark datasets demonstrate the efficacy of the proposed method, and an improvement of 15% in the Matthews correlation coefficient (MCC) was observed in independent validation tests compared with the most recently released sequence-based GPCR–drug interaction predictor. The implemented webserver, together with the datasets used in this study, is freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetGDrug.

15.
Protein complex detection from protein–protein interaction (PPI) networks has received considerable attention in recent years. A number of methods identify protein complexes as dense sub-graphs using network information, while several other methods detect protein complexes based on topological information. While the methods based on identifying dense sub-graphs are more effective in identifying protein complexes, not all protein complexes have high density. Moreover, existing methods focus more on static PPI networks and usually overlook the dynamic nature of protein complexes. Here, we propose a new method, Weighted Edge based Clustering (WEC), to identify protein complexes based on the weight of the edge between two interacting proteins, where the weight is defined by the edge clustering coefficient and the gene expression correlation between the interacting proteins. Our WEC method is capable of detecting highly inter-connected and co-expressed protein complexes. The experimental results of WEC on three real-life datasets show that our method detects protein complexes effectively in comparison with other highly cited existing methods. Availability: The WEC tool is available at http://agnigarh.tezu.ernet.in/~rosy8/shared.html.
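The abstract names the two ingredients of the WEC edge weight but not how they are combined. The sketch below is therefore illustrative only: it computes a commonly used form of the edge clustering coefficient (shared neighbours divided by the smaller of the two degrees minus one) and the Pearson correlation of two expression profiles, and simply adds them. The sum, the function names and the toy network are assumptions, not the published definition.

```python
from math import sqrt
from statistics import mean


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) over the maximum possible number."""
    common = len(adj[u] & adj[v])
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / possible if possible > 0 else 0.0


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_a * var_b)


def edge_weight(adj, expr, u, v):
    """Assumed combination: topology term plus co-expression term."""
    return edge_clustering_coefficient(adj, u, v) + pearson(expr[u], expr[v])


# Toy PPI network: a triangle A-B-C with a pendant protein D on C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1], "D": [1, 1, 2]}
for u, v in [("A", "B"), ("A", "C"), ("C", "D")]:
    print(u, v, round(edge_weight(adj, expr, u, v), 3))
```

Edges inside the triangle with correlated expression receive the highest weight, matching the abstract's stated aim of finding complexes that are both highly inter-connected and co-expressed.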

16.
Circular RNAs (circRNAs) were found more than 30 years ago, but were long treated as molecular flukes. By combining deep sequencing studies with bioinformatics techniques, thousands of endogenous circRNAs have been found in mammalian cells, and some researchers have shown that several circRNAs act as competing endogenous RNAs (ceRNAs) to regulate gene expression. However, the mechanism by which a precursor mRNA is transformed into a circular RNA or a linear mRNA is largely unknown. In this paper, we attempted to bioinformatically identify shared genomic features that might further elucidate the mechanism of formation, and proposed an SVM-based model to distinguish circRNAs from non-circularized, expressed exons. First, conformational and thermodynamic dinucleotide properties in the flanking introns were extracted as potential features. Second, two feature selection methods were applied to obtain the optimal feature subset. Our 10-fold cross-validation results showed that the model can distinguish circRNAs from non-circularized, expressed exons with an Sn of 0.884, Sp of 0.900, ACC of 0.892 and MCC of 0.784. The identification results suggest that conformational and thermodynamic properties in the flanking introns are closely related to the formation of circRNAs. Datasets and the tool involved in this paper are all available at https://sourceforge.net/projects/predicircrnatool/files/.
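For readers unfamiliar with the four reported measures, the sketch below computes sensitivity (Sn), specificity (Sp), accuracy (ACC) and the Matthews correlation coefficient (MCC) from confusion-matrix counts. The counts are hypothetical: the abstract does not report them, and a balanced set of 1000 positives and 1000 negatives is assumed purely for illustration.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Return (Sn, Sp, ACC, MCC) from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity, true positive rate
    sp = tn / (tn + fp)                      # specificity, true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


# Hypothetical counts, not from the paper: 1000 positives, 1000 negatives.
sn, sp, acc, mcc = binary_metrics(tp=884, fn=116, tn=900, fp=100)
print(f"Sn={sn:.3f} Sp={sp:.3f} ACC={acc:.3f} MCC={mcc:.3f}")
```

Under this balanced-class assumption the script prints Sn=0.884 Sp=0.900 ACC=0.892 MCC=0.784, so the four reported values are mutually consistent with a dataset containing equal numbers of circRNAs and non-circularized exons.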

17.
A novel series of N-substituted-benzimidazolyl linked para substituted benzylidene based molecules containing three pharmacologically potent hydrogen bonding parts, namely 2,4-thiazolidinedione (TZD: a 2,4-dicarbonyl), diethyl malonate (DEM: a 1,3-diester and an isooxazolidinedione analog) and methyl acetoacetate (MAA: a β-ketoester) (6a–11b), were synthesized and evaluated for in vitro α-glucosidase inhibition. The structures of the newly synthesized compounds were confirmed by spectral studies (LC–MS, 1H NMR, 13C NMR, FT-IR). Comparative evaluation of these compounds revealed that compound 9b showed the maximum inhibitory potential against α-amylase and α-glucosidase, giving an IC50 value of 0.54 ± 0.01 μM. Furthermore, the binding affinities (in terms of G score values) and hydrogen bond interactions between all the synthesized compounds and the amino acid residues in the active site of the protein (PDB code: 3TOP) were explored relative to those of acarbose (standard drug) with the help of molecular docking studies. Compound 9b was considered a promising candidate of this series.

18.
De novo assembly of bacterial genomes from next-generation sequencing (NGS) data allows reference-free discovery of single nucleotide polymorphisms (SNPs). However, substantial error rates in genomes assembled by this approach remain a major barrier to the reference-free analysis of genome variation in medically important bacteria. The aim of this report was to improve the quality of SNP identification in bacterial genomes without closely related references. We developed a bioinformatics pipeline (SnpFilt) that constructs an assembly using SPAdes and then removes unreliable regions based on the quality and coverage of re-aligned reads at neighbouring regions. The performance of the pipeline was compared against reference-based SNP calling for Illumina HiSeq, MiSeq and NextSeq reads from a range of bacterial pathogens including Salmonella, which is one of the most common causes of food-borne disease. The SnpFilt pipeline removed all false SNPs in all test NGS datasets consisting of paired-end Illumina reads. We also showed that reliable and complete SNP calls require at least 40-fold coverage. Analysis of bacterial isolates associated with epidemiologically confirmed outbreaks using the SnpFilt pipeline produced results consistent with previously published findings. The SnpFilt pipeline improves the quality of de novo assembly and the precision of SNP calling in bacterial genomes by removing regions of the assembly that may contain assembly errors. SnpFilt is available from https://github.com/LanLab/SnpFilt.

19.
Regulatory single nucleotide polymorphisms (rSNPs) in human genomes are thought to be responsible for phenotypic differences, including susceptibility to diseases and treatment outcomes, even though they do not change any gene product. However, a genome-wide search for rSNPs has not been properly addressed so far. In this work, a computational method for rSNP identification is proposed. As background SNPs far outnumber rSNPs, an ensemble method is applied to handle the imbalanced data: it first converts the unbalanced dataset into several balanced ones and then builds a model for each balanced dataset. Two major types of features are extracted, namely sequence-based features and allele-specific features. A random forest is then applied to build the recognition model for each balanced dataset. Finally, ensemble strategies are adopted to combine the results of the individual models. We tested our method on a set of experimentally verified rSNPs, and leave-one-out cross-validation showed that it achieves a sensitivity of 73.8%, a specificity of 71.8% and an area under the ROC curve (AUC) of 0.756. In addition, our method is threshold-free and does not rely on data on regulatory elements, so it should adapt better to different data scenarios. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnpdect/.
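The ensemble scheme described here (split the majority class into balanced subsets, fit one model per subset, combine the outputs) can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB code: a nearest-centroid rule stands in for the random forest, majority voting is assumed as the combination strategy, and all function names and data are hypothetical.

```python
import random
from math import dist


def balanced_subsets(majority, minority, seed=0):
    """Pair every minority sample with equal-sized chunks of the majority."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    size = len(minority)
    # Leftover majority samples that do not fill a whole chunk are dropped.
    return [(shuffled[i:i + size], minority)
            for i in range(0, len(shuffled) - size + 1, size)]


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def train(negatives, positives):
    """Stand-in classifier: remember one centroid per class."""
    return centroid(negatives), centroid(positives)


def predict(model, x):
    neg_c, pos_c = model
    return 1 if dist(x, pos_c) < dist(x, neg_c) else 0


def ensemble_predict(models, x):
    """Majority vote over the per-subset models (assumed combination rule)."""
    votes = sum(predict(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Toy data (illustrative only): 9 background SNPs, 3 regulatory SNPs.
majority = [(0.1 * i, 0.0) for i in range(9)]
minority = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
models = [train(neg, pos) for neg, pos in balanced_subsets(majority, minority)]
print(len(models), ensemble_predict(models, (5.0, 5.0)), ensemble_predict(models, (0.0, 0.0)))
```

Because every model sees all minority samples but only a slice of the majority class, no single model is dominated by background examples, while the ensemble as a whole still uses most of the majority data.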

20.
Since Ambros’ discovery of small non-protein coding RNAs in the early 1990s, the past two decades have seen an upsurge in the number of reports of predicted microRNAs (miRs), which have been implicated in various functions. The correlation of miRs with cancer has spurred the use of this class of non-coding RNAs in various cancer therapies, although most of them are at trial stages. However, the experimental identification of a miR associated with cancer is still an elaborate, time-consuming process. To aid this process of miR association, we undertook an in-silico study to identify global signatures in experimentally validated microRNAs associated with cancer. Subsequently, a support vector machine based two-step binary classifier system was trained and modeled on the features extracted in that study. A total of 60 distinguishing features were selected and ranked to form the feature set for classification: 26 of these were extracted from the miR sequence itself, and the remainder from the thermodynamics of folding and the hybridized miRNA–mRNA structure. The two-step classifier model, miRSEQ and miRINT, performed reasonably well, with Matthews correlation coefficient (MCC) values ranging from 0.72 to 0.82 (availability: https://sites.google.com/site/sumitslab/tools).
