Similar literature
 20 similar articles found (search time: 250 ms)
1.
Cytosine methylation is one of the most important RNA epigenetic modifications. With the development of experimental technology, scientists have attached more importance to RNA cytosine methylation and have found bisulfite sequencing to be an effective experimental method for studying it. However, only a few tools can directly and efficiently handle RNA bisulfite sequencing data. Herein, we developed a specialized tool, BS-RNA, which analyzes cytosine methylation of RNA based on bisulfite sequencing data and supports both paired-end and single-end sequencing reads from directional bisulfite libraries. For paired-end reads, simply removing the biased positions from the 5′ end may result in “dovetailing” reads, where one or both reads seem to extend past the start of the mate read. BS-RNA can map “dovetailing” reads successfully. The annotation result of BS-RNA is exported in BED (.bed) format, including locations, sequence context types (CG/CHG/CHH, H = A, T, or C), reference sequencing depths, cytosine sequencing depths, and methylation levels of covered cytosine sites on both Watson and Crick strands. BS-RNA is an efficient, specialized and highly automated mapping and annotation tool for RNA bisulfite sequencing data. It performs better than the existing program in terms of accuracy and efficiency. BS-RNA is written in Perl, and its source code is freely available at http://bs-rna.big.ac.cn.
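The annotation fields listed in this abstract (sequence context, reference depth, cytosine depth, methylation level) can be illustrated with a small sketch. It is not BS-RNA's code, which is written in Perl. The function names are hypothetical, and the methylation level is assumed here to be the cytosine depth divided by the reference depth, which the abstract does not state explicitly.

```python
def sequence_context(seq: str, pos: int) -> str:
    """Classify the cytosine at seq[pos] as CG, CHG or CHH (H = A, T or C)."""
    if seq[pos].upper() != "C":
        raise ValueError("position does not hold a cytosine")
    nxt = seq[pos + 1 : pos + 3].upper()  # up to two downstream bases
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) == 2 and nxt[0] in "ATC" and nxt[1] == "G":
        return "CHG"
    if len(nxt) == 2 and nxt[0] in "ATC" and nxt[1] in "ATC":
        return "CHH"
    return "unknown"  # too close to the sequence end, or an ambiguous base


def methylation_level(cytosine_depth: int, reference_depth: int) -> float:
    """Assumed definition: reads still showing C divided by all covering reads."""
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    return cytosine_depth / reference_depth


if __name__ == "__main__":
    print(sequence_context("ACAGT", 1), methylation_level(3, 12))
```

The context is determined from the reference sequence only, so it does not depend on how many reads cover the site.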

2.
《Electrophoresis》2017,38(8):1163-1174
Next generation sequencing (NGS) is the emerging technology in forensic genomics laboratories. It offers higher resolution to address most problems of human identification, greater efficiency and potential ability to interrogate very challenging forensic casework samples. In this study, a trial set of DNA samples was artificially degraded by progressive aqueous hydrolysis, and analyzed together with the corresponding unmodified DNA sample and control sample 2800 M, to test the performance and reliability of the ForenSeqTM DNA Signature Prep kit using the MiSeq Sequencer (Illumina). The results of replicate tests performed on the unmodified sample (1.0 ng) and on scalar dilutions (1.0, 0.5 and 0.1 ng) of the reference sample 2800 M showed the robustness and the reliability of the NGS approach even from sub‐optimal amounts of high quality DNA. The degraded samples showed a very limited number of reads/sample, from 2.9–10.2 folds lower than the ones reported for the less concentrated 2800 M DNA dilution (0.1 ng). In addition, it was impossible to assign up to 78.2% of the genotypes in the degraded samples as the software identified the corresponding loci as “low coverage” (< 50x). Amplification artifacts such as allelic imbalances, allele drop outs and a single allele drop in were also scored in the degraded samples. However, the ForenSeqTM DNA Sequencing kit, on the Illumina MiSeq, was able to generate data which led to the correct typing of 5.1–44.8% and 10.9–58.7% of 58 of the STRs and 92 SNPs, respectively. In all trial samples, the SNP markers showed higher chances to be typed correctly compared to the STRs. This NGS approach showed very promising results in terms of ability to recover genetic information from heavily degraded DNA samples for which the conventional PCR/CE approach gave no results. The frequency of genetic mistyping was very low, reaching the value of 1.4% for only one of the degraded samples. 
However, these results suggest that further validation studies and a definition of interpretation criteria for NGS data are needed before implementation of this technique in forensic genetics.

3.
Motivation: Cheap and fast next-generation sequencing (NGS) technologies greatly facilitate research on de novo assembly. The reliability of contigs is critical to constructing reliable scaffolds. However, contigs generated by most assemblers contain errors because of limitations in assembly strategy and computational complexity. Among these errors, misassembly is one of the most harmful types. Results: In this paper, we propose a new method named “PECC” to identify and correct misassembly errors in contigs based on the paired-end read distribution. PECC extracts sequence regions with lower paired-end read support and verifies them based on the distribution of paired-end support. To validate the effectiveness of PECC, we applied it to the contigs produced by five popular assemblers on four real datasets, and we also carried out experiments to analyze the influence of PECC on scaffolding. The results show that PECC can reduce misassembly errors and improve scaffolding results, which demonstrates the promising applications of PECC in de novo assembly.
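The idea of extracting regions with low paired-end support can be sketched as follows. This is only an illustration under stated assumptions, not PECC itself: fragments are assumed to be given as half-open (start, end) intervals spanned by properly mapped read pairs, the threshold is a fixed number chosen by the caller, and the distribution-based verification step described in the abstract is not reproduced. Both function names are hypothetical.

```python
from typing import List, Tuple


def paired_end_support(contig_len: int, fragments: List[Tuple[int, int]]) -> List[int]:
    """Count, for every contig position, how many fragments span it."""
    diff = [0] * (contig_len + 1)  # difference array over positions
    for start, end in fragments:   # half-open interval [start, end)
        diff[start] += 1
        diff[end] -= 1
    support, running = [], 0
    for i in range(contig_len):
        running += diff[i]
        support.append(running)
    return support


def low_support_regions(support: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return half-open regions whose support stays below the threshold."""
    regions, region_start = [], None
    for i, value in enumerate(support):
        if value < threshold and region_start is None:
            region_start = i
        elif value >= threshold and region_start is not None:
            regions.append((region_start, i))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(support)))
    return regions


if __name__ == "__main__":
    frags = [(0, 4), (0, 4), (6, 10), (6, 10), (3, 7)]
    print(low_support_regions(paired_end_support(10, frags), threshold=2))
```

Regions returned this way are only candidates; a real pipeline would still need to decide whether each one reflects a misassembly or simply low coverage.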

4.
Structural and computational biologists often need to measure the similarity of ligand binding conformations. The commonly used root-mean-square deviation (RMSD) is not only ligand-size dependent, but also may fail to capture biologically meaningful binding features. To address these issues, we developed the Contact Mode Score (CMS), a new metric to assess the conformational similarity based on intermolecular protein-ligand contacts. The CMS is less dependent on the ligand size and has the ability to include flexible receptors. In order to effectively compare binding poses of non-identical ligands bound to different proteins, we further developed the eXtended Contact Mode Score (XCMS). We believe that CMS and XCMS provide a meaningful assessment of the similarity of ligand binding conformations. CMS and XCMS are freely available at http://brylinski.cct.lsu.edu/content/contact-mode-score and http://geaux-computational-bio.github.io/contact-mode-score/.

5.
Domains are the structural basis of the physiological functions of proteins, and their prediction is valuable for the study of protein structure and function. This article proposes a new, fully automatic prediction method, PPM-Dom (Domain Position Prediction Method), for predicting the positions of domains in a target protein from its atomic coordinates. The method integrates complex networks, community division, and a fuzzy mean operator (FMO). The whole sequence is divided into potential domain regions by the complex network and community division, and the FMO makes the final determination of the domain positions. The method can predict which regions will form a domain structure and which are unstructured, based on completely new atomic coordinate information of the query sequence, and can separate different domains in the same query sequence from each other. When evaluated on an independent testing dataset, PPM-Dom reached 91.41% prediction accuracy, 96.12% sensitivity and 92.86% specificity. The PPM-Dom toolkit is freely available at http://cic.scu.edu.cn/bioinformatics/PPMDom.zip.

6.
G-protein-coupled receptors (GPCRs) are important targets of modern medicinal drugs. The accurate identification of interactions between GPCRs and drugs is of significant importance for both protein function annotations and drug discovery. In this paper, a new sequence-based predictor called TargetGDrug is designed and implemented for predicting GPCR–drug interactions. In TargetGDrug, the evolutionary feature of the GPCR sequence and the wavelet-based molecular fingerprint feature of the drug are integrated to form the combined feature of a GPCR–drug pair; then, the combined feature is fed to a trained random forest (RF) classifier to perform initial prediction; finally, a novel drug-association-matrix-based post-processing procedure is applied to reduce potential false positives or false negatives in the initial prediction. Experimental results on benchmark datasets demonstrate the efficacy of the proposed method, and an improvement of 15% in the Matthews correlation coefficient (MCC) was observed over independent validation tests when compared with the most recently released sequence-based GPCR–drug interactions predictor. The implemented webserver, together with the datasets used in this study, is freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetGDrug.

7.
We present an algorithm for automatically predicting the topological family of any RNA three-way junction, given only the information from the secondary structure: the sequence and the Watson–Crick pairings. The parameters of the algorithm have been determined on a data set of 33 three-way junctions whose 3D conformation is known. We applied the algorithm on 53 other junctions and compared the predictions to the real shape of those junctions. We show that the correct answer is selected out of nine possible configurations 64% of the time. Additionally, these results are noticeably improved if homology information is used. The resulting software, Cartaj, is available online and downloadable (with source) at: http://cartaj.lri.fr.

8.
Classical sequencing by hybridization takes into account only binary information about sequence composition: a given element from an oligonucleotide library either is or is not a part of the target sequence. However, DNA chip technology has developed and now makes it possible to obtain partial information about the multiplicity of each oligonucleotide that the analyzed sequence consists of. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are considered in this paper. The first one, called “one and many”, assumes that it is possible to determine whether a given oligonucleotide occurs in a reconstructed sequence once or more than once. According to the second model, called “one, two and many”, a biochemical experiment can reveal whether a given oligonucleotide is present in an analyzed sequence once, twice or at least three times. An ant colony optimization algorithm has been implemented to verify the above models and to compare with existing algorithms for sequencing by hybridization which utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization errors. Computational experiment results confirm that using even partial information about multiplicity leads to increased quality of reconstructed sequences. Moreover, they also show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip.
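The difference between the binary model and the two multiplicity models can be made concrete with a short sketch that derives an error-free spectrum from a known sequence. The function name and the model labels are hypothetical, and the sketch covers only the input information, not the ant colony optimization algorithm itself. A capped count of 2 under “one and many” means “more than once”, and a capped count of 3 under “one, two and many” means “at least three times”.

```python
from collections import Counter
from typing import Dict

# Highest count each model can distinguish; larger counts are capped.
MODEL_CAPS = {"binary": 1, "one_and_many": 2, "one_two_and_many": 3}


def spectrum_with_multiplicity(sequence: str, k: int, model: str) -> Dict[str, int]:
    """Map every k-mer of the sequence to its count, capped by the model."""
    if model not in MODEL_CAPS:
        raise ValueError(f"unknown model: {model}")
    counts = Counter(sequence[i : i + k] for i in range(len(sequence) - k + 1))
    cap = MODEL_CAPS[model]
    return {oligo: min(count, cap) for oligo, count in counts.items()}


if __name__ == "__main__":
    for model in MODEL_CAPS:
        print(model, spectrum_with_multiplicity("ACACACGT", 2, model))
```

For the sequence ACACACGT with k = 2, the oligonucleotide AC occurs three times and CA twice, so the three models report AC as 1, 2 and 3 respectively.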

9.
Regulatory single nucleotide polymorphisms (rSNPs) in human genomes are thought to be responsible for phenotypic differences, including susceptibility to diseases and treatment outcomes, even though they do not change any gene product. However, a genome-wide search for rSNPs has not been properly addressed so far. In this work, a computational method for rSNP identification is proposed. As background SNPs far outnumber rSNPs, an ensemble method is applied to handle the imbalanced data: it first converts the unbalanced dataset into several balanced ones and then builds a model for every balanced dataset. Two major types of features are extracted, namely sequence-based features and allele-specific features. A random forest is then applied to build the recognition model for each balanced dataset. Finally, ensemble strategies are adopted to combine the results of the individual models. We have tested our method on a set of experimentally verified rSNPs, and leave-one-out cross-validation results showed that our method achieves a sensitivity of 73.8%, a specificity of 71.8% and an area under the ROC curve (AUC) of 0.756. In addition, our method is threshold free and does not rely on data about regulatory elements, so it should adapt better to different data scenarios. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnpdect/.
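The step of converting one unbalanced dataset into several balanced ones can be sketched as below. This is one plausible reading, not the authors' MATLAB code: the majority class is assumed to be shuffled and cut into disjoint chunks the size of the minority class, leftover majority samples are dropped, and majority voting stands in for the unspecified “ensemble strategies”. The function names are hypothetical, and the random forest training itself is omitted.

```python
import random
from typing import List, Sequence, Tuple


def balanced_subsets(
    positives: Sequence[str], negatives: Sequence[str], seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Pair all minority samples with disjoint, equally sized majority chunks."""
    n = len(positives)
    if n == 0:
        raise ValueError("at least one minority-class sample is required")
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    subsets = []
    for start in range(0, len(shuffled) - n + 1, n):
        subsets.append((list(positives), shuffled[start : start + n]))
    return subsets


def majority_vote(predictions: Sequence[int]) -> int:
    """Combine 0/1 predictions from the per-subset models; ties go to 0."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


if __name__ == "__main__":
    subsets = balanced_subsets(["p1", "p2"], [f"n{i}" for i in range(7)])
    print(len(subsets), majority_vote([1, 1, 0]))
```

One model would then be trained per subset, and a new SNP would be classified by passing the individual predictions to `majority_vote`.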

10.
As a pivotal domain within the envelope protein, the fusion peptide (FP) plays a crucial role in pathogenicity and therapeutic intervention. Given the limited FP annotations in the NCBI database and the absence of FP prediction software, a bioinformatics tool for predicting new putative FPs (np-FPs) in retroviruses is urgently needed. In this work, a sequence-based FP model was proposed by combining Hidden Markov Method with similarity comparison. The classification accuracies are 91.97% and 92.31% for 10-fold and leave-one-out cross-validation, respectively. After scanning sequences without FP annotations, this model discovered 53,946 np-FPs. The statistical results on FPs or np-FPs reveal that the FP is a conserved and hydrophobic domain. The FP software, written for the Windows environment, is available at https://sourceforge.net/projects/fptool/files/?source=navbar.

11.
We present a new form of merit function which measures agreement between a large number of data and the model function with a particular choice of parameters. We demonstrate the efficiency of the proposed merit function on the common problem of finding the base line of a spectrum. When the base line is expected to be a horizontal straight line, the use of minimization algorithms is not necessary, i.e. the solution is achieved in a small number of steps. We discuss the advantages of the proposed merit function in general, when explicit use of a minimization algorithm is necessary. The hardcopy text is accompanied by an electronic archive, stored on the SAE homepage at http://www1.elsevier.com/homepage/saa/sab/content/lower.htm. The archive contains a fully functional demo program with a tutorial, examples and the Visual Basic source code of the key subroutine.

12.
Amidation plays an important role in a variety of pathological processes and serious diseases like neural dysfunction and hypertension. However, identification of protein amidation sites through traditional experimental methods is time consuming and expensive. In this paper, we proposed a novel predictor for Prediction of Amidation Sites (PrAS), which is the first software package for academic users. The method incorporated four representative feature types: position-based features, physicochemical and biochemical property features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection, was proposed to optimize the features. PrAS achieved an AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg.
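Several abstracts in this list report accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). The sketch below shows how these four figures follow from a binary confusion matrix, using the standard definitions. The function name is hypothetical and the counts in the usage line are made up for illustration; they are not taken from any of the papers.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and MCC from raw counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the test set")
    total = tp + fp + tn + fn
    # MCC denominator is zero when one predicted class is empty; report 0.0 then.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    print(binary_metrics(tp=8, fp=2, tn=18, fn=2))
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported for imbalanced test sets.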

13.
Rothe J  Nagy M 《Electrophoresis》2012,33(9-10):1488-1491
Current human genome databases for public single nucleotide polymorphisms (SNPs) still contain a substantial fraction of false entries. The main reasons for errors include sequencing or assembly errors, paralogous sequence variants, and private variants. In the course of our studies on the Y chromosome, we established a set of internal laboratory guidelines for reliably identifying false SNP entries in databases.

14.
15.
Defining the amino acid composition of protein cores is fundamental for understanding protein folding, as different architectures might achieve structural stability only in the presence of specific amino acid networks. Quantitative characterization of protein cores in relation to the corresponding structures and dynamics is needed to increase the reliability of protein engineering procedures. Unambiguous criteria based on atom depth considerations were established to assign amino acid residues to protein cores and, hence, for classifying inner and outer molecular moieties. These criteria were implemented in a new tool named ProCoCoA, Protein Core Composition Analyzer. A user-friendly web interface was developed, available at the URL: http://www.sbl.unisi.it/prococoa. An accurate estimate of protein core composition for six protein architectures selected from the CATH database of solved structures has been carried out, and the obtained results indicate the presence of specific patterns of amino acid core composition in different protein folds.

16.
In proteins, the number of interacting pairs is usually much smaller than the number of non-interacting ones, so an imbalanced data problem arises in protein–protein interaction (PPI) prediction. In this article, we introduce two ensemble methods to solve the imbalanced data problem. These ensemble methods combine a cluster-based under-sampling technique with fusion classifiers. We then evaluate the ensemble methods using a dataset from the Database of Interacting Proteins (DIP) with 10-fold cross-validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs; we also draw some valuable conclusions on the performance of ensemble methods for PPIs in imbalanced data. The prediction software and all datasets employed in the work can be obtained for free at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html.

17.
Quantitative analysis of behaviors shown by multiple interacting animals can provide a key for revealing high-order functions of their nervous systems. To resolve these complex behaviors, a video tracking system that preserves individual identity even under severe overlap in positions, i.e., occlusion, is needed. We developed GroupTracker, a multiple animal tracking system that accurately tracks individuals even under severe occlusion. As maximum likelihood estimation of a Gaussian mixture model whose components can severely overlap is theoretically an ill-posed problem, we devised an expectation–maximization scheme with additional constraints on the eigenvalues of the covariance matrix of the mixture components. Our system was shown to accurately track multiple medaka (Oryzias latipes) which swim freely in three dimensions and frequently overlap each other. As an accurate multiple animal tracking system, GroupTracker will contribute to revealing unexplored structures and patterns behind animal interactions. The Java source code of GroupTracker is available at https://sites.google.com/site/fukunagatsu/software/group-tracker.

18.
The phenomena accompanying the temperature-induced structural changes in five As4SexTe6–x glasses, with x=1 to x=5, were examined and are discussed. Differential thermal analysis traces of each glass composition at different heating rates from 2 to 50 deg/min were obtained and interpreted. The effect of the Se/Te ratio on the crystallization behaviour is discussed. It is interesting to note that the compositional dependence of the overall behaviour of the crystallization activation energy (E) seems to be similar to that of both the melting point (Tm) and the thermal conductivity () for the investigated glasses. Created structural defects due to gamma-irradiation have some effects on the crystallization process.
Zusammenfassung Die die temperaturbedingten strukturellen Veränderungen von 5 Glasern der allgemeinen Zusammensetzung As4SexTe6–x (x=1–5) begleitenden Phänomene wurden untersucht und diskutiert. Die von jedem Glas bei unterschiedlichen Aufheizgeschwindigkeiten zwischen 2 und 50 Grad pro Minute erhaltenen DTA-Kurven werden interpretiert. Der Effekt des Se/Te-Verhältnisses auf das Kristallisationsverhalten wird diskutiert. Von Interesse ist, daß die Abhängigkeit der Aktivierungsenergie (E) der Kristallisation von der Zusammensetzung der des Schmelzpunktes (Tm) und der Wärmeleitfähigkeit (*) der untersuchten Gläser ähnelt. Durch y-Bestrahlung hervorgerufene strukturelle Defekte haben einen gewissen Einfluß auf den Kristallisationsprozeß.


19.
A unique example of macromolecular self‐assembly, where a mono‐component homopolyimide bearing carboxy end‐groups spontaneously forms nanoparticles with a novel dimple‐like morphology in a single good solvent, is presented. The self‐assembly process is dramatically affected by the solution concentration and the temperature. It is proposed that such unexpected self‐assembly behavior is a synergistic result of the self‐complementary hydrogen bonding between carboxy end‐groups and the propensity for parallel packing of polyimide chains through aromatic interactions.



20.
Gene networks (GNs) have become one of the most important approaches for modeling biological processes. They are very useful for understanding the different complex biological processes that may occur in living organisms. Currently, one of the biggest challenges in any study related to GNs is to assure the quality of these GNs. To this end, recent works use artificial data sets or a direct comparison with prior biological knowledge. However, these approaches are not entirely accurate, as they only take into account direct gene–gene interactions for validation, leaving aside the weak (indirect) relationships. We propose a new measure, named gene network coherence (GNC), to rate the coherence of an input network according to different biological databases. The measure considers not only the direct gene–gene relationships but also the indirect ones to perform a complete and fairer evaluation of the input network. Hence, our approach is able to use all the information stored in the networks. A Java-based implementation of GNC is available at: http://fgomezvela.github.io/GNC/. The results of three different experiments using different biological databases and input networks show that GNC outperforms the classical approaches for assessing GNs. According to the results, we can conclude that the proposed measure, which considers the inherent information stored in the direct and indirect gene–gene relationships, offers a new robust solution to the problem of biological validation of GNs.
