
A non-coding RNA (ncRNA) is an RNA that is not translated into protein; nevertheless, ncRNAs are involved in many biological processes, diseases, and cancers. Numerous ncRNAs have been identified and classified with high-throughput sequencing technology. Hence, accurate prediction of ncRNA classes is important and necessary for further study of their functions. Several computational techniques have been employed to predict the class of ncRNAs. Recent classification methods use the secondary structure as their primary input. However, computational tools for predicting RNA secondary structure are not accurate enough, which affects the final performance of ncRNA predictors. In this paper, we propose a simple yet efficient method, called ncRDeep, for ncRNA prediction. It uses a simple convolutional neural network and RNA sequence information only. ncRDeep was evaluated on benchmark datasets, and the comparison results showed that it significantly outperforms the state-of-the-art methods. More specifically, the average accuracy was improved by 8.32%. Finally, we built a freely accessible web server for ncRDeep at http://home.jbnu.ac.kr/NSCL/ncRDeep.htm
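The abstract does not specify ncRDeep's input encoding or layer sizes. As a minimal sketch only, assuming one-hot encoding over the alphabet A, C, G, U, the following shows how a sequence-only CNN can read an RNA: one convolutional filter slides along the encoded sequence, ReLU is applied, and global max pooling reports the strongest match. The filter is hand-set to the motif GGA for illustration; the names `one_hot`, `conv1d_relu_maxpool` and `GGA_FILTER` are hypothetical, and a trained network would learn many such filters.

```python
from typing import List

ALPHABET = "ACGU"  # column order of the one-hot matrix (an assumption)


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA sequence as an L x 4 matrix. T is read as U; any other
    symbol (e.g. N) becomes an all-zero row."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET]
            for base in seq.upper().replace("T", "U")]


def conv1d_relu_maxpool(x: List[List[float]], kernel: List[List[float]],
                        bias: float = 0.0) -> float:
    """Slide one K x 4 filter along the sequence (no padding), apply ReLU to
    each window score, and return the maximum over all windows."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("sequence is shorter than the filter")
    best = 0.0
    for start in range(len(x) - k + 1):
        score = bias + sum(x[start + i][j] * kernel[i][j]
                           for i in range(k) for j in range(4))
        best = max(best, max(0.0, score))  # ReLU, then running maximum
    return best


# Hypothetical hand-set filter that responds to the motif GGA.
GGA_FILTER = one_hot("GGA")
```

For example, `conv1d_relu_maxpool(one_hot("AUGGAC"), GGA_FILTER)` evaluates to 3.0, because the window GGA matches all three filter positions.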

2.

Cnngeno: A high-precision deep learning based strategy for the calling of structural variation genotype

Genotype plays a significant role in determining the characteristics of an organism, and genotype calling has been greatly accelerated by sequencing technologies. However, most parametric statistical models are unable to call genotypes effectively, because calling is influenced by the size of structural variations and by coverage fluctuations in sequencing data. In this study, we propose a new method, called Cnngeno, for calling deletion genotypes from next-generation sequencing data. Cnngeno converts sequencing data into images and classifies the genotypes from these images using a convolutional neural network (CNN). Moreover, Cnngeno adopts a convolutional bootstrapping strategy to improve its robustness to noisy labels. The results show that Cnngeno achieves higher precision in genotype calling than other existing methods. Cnngeno is open source and available at https://github.com/BRF123/Cnngeno.
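The abstract does not describe how Cnngeno's bootstrapping works. One common way to make a classifier tolerant of noisy labels is the "soft bootstrapping" loss of Reed et al., in which the training target mixes the given label with the model's own prediction. The sketch below assumes three genotype classes (for example 0/0, 0/1, 1/1) and a mixing weight `beta`; the function name and the default `beta = 0.95` are illustrative assumptions, not Cnngeno's settings.

```python
import math
from typing import Sequence


def soft_bootstrap_loss(probs: Sequence[float], noisy_label: int,
                        beta: float = 0.95) -> float:
    """Cross-entropy against a target that mixes the (possibly noisy) one-hot
    label with the model's own predicted probabilities:
        target = beta * one_hot(label) + (1 - beta) * probs
    With beta = 1 this is ordinary cross-entropy; smaller beta lets a confident
    prediction partly override a label that may be wrong."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    eps = 1e-12  # guard against log(0)
    loss = 0.0
    for k, p in enumerate(probs):
        target = beta * (1.0 if k == noisy_label else 0.0) + (1.0 - beta) * p
        loss -= target * math.log(p + eps)
    return loss
```

For predicted probabilities [0.1, 0.8, 0.1] and a given label 0, the loss is about 2.30 at `beta = 1.0` and about 1.97 at `beta = 0.8`, so a label that disagrees with a confident prediction is penalized less.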

3.

SubFeat: Feature subspacing ensemble classifier for function prediction of DNA, RNA and protein sequences


4.

iR5hmcSC: Identifying RNA 5-hydroxymethylcytosine with multiple features based on stacking learning


5.

Ligand based virtual screening using SVM on GPU

In silico methods play an essential role in modern drug discovery. Virtual screening, an in silico method, is used to narrow down the chemical space on which actual wet-lab experiments need to be conducted. Ligand-based virtual screening is a computational strategy in which a model of the target protein is built from knowledge of the ligands that bind successfully to the target. This model is then used to predict whether a new molecule is likely to bind to the target. The support vector machine (SVM), a supervised learning algorithm for classification, can be used for virtual screening of ligand data and can produce interesting results for this purpose. However, because ligand datasets are huge, training an SVM model takes much longer than training other learning algorithms. Parallelizing these algorithms on multi-core processors can considerably accelerate such discoveries. In this paper, a GPU-based, ligand-based virtual screening tool that uses SVM (GpuSVMScreen) is proposed and benchmarked. This data-parallel virtual screening tool provides high throughput with short running times, and it can successfully screen large numbers (billions) of molecules. The source code of this tool is available at http://ccc.nitc.ac.in/project/GPUSVMSCREEN.

6.

Fast predictions of liquid-phase acid-catalyzed reaction rates using molecular dynamics simulations and convolutional neural networks

Alex K. Chew, Shengli Jiang, Weiqi Zhang, Victor M. Zavala, Reid C. Van Lehn. Chemical Science, 2020, 11(46): 12464

The rates of liquid-phase, acid-catalyzed reactions relevant to the upgrading of biomass into high-value chemicals are highly sensitive to solvent composition, and identifying suitable solvent mixtures is theoretically and experimentally challenging. We show that the complex atomistic configurations of reactant–solvent environments generated by classical molecular dynamics simulations can be exploited by 3D convolutional neural networks to enable accurate predictions of Brønsted acid-catalyzed reaction rates for model biomass compounds. We develop a 3D convolutional neural network, which we call SolventNet, and train it to predict acid-catalyzed reaction rates using experimental reaction data and corresponding molecular dynamics simulation data for seven biomass-derived oxygenates in water–cosolvent mixtures. We show that SolventNet can predict reaction rates for additional reactants and solvent systems an order of magnitude faster than prior simulation methods. This combination of machine learning with molecular dynamics enables the rapid, high-throughput screening of solvent systems and identification of improved biomass conversion conditions.

Solvent-mediated, acid-catalyzed reaction rates relevant to the upgrading of biomass into high-value chemicals are accurately predicted using a combination of molecular dynamics simulations and 3D convolutional neural networks.
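The abstract does not state how SolventNet converts simulation snapshots into network input. A common approach for 3D CNNs, shown here only as an assumption, is to bin atom coordinates into a cubic grid with one channel per atom group (for example reactant, water, cosolvent), so that each voxel holds an atom count. The box size, grid resolution, channel names and the function name `voxelize` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (group label, x, y, z), e.g. in nm
Grid = List[List[List[int]]]


def voxelize(atoms: Sequence[Atom], box: float, n: int,
             channels: Sequence[str]) -> Dict[str, Grid]:
    """Count atoms of each group in an n x n x n grid covering the cubic box
    [0, box)^3. Coordinates are wrapped with periodic boundaries."""
    grids = {c: [[[0] * n for _ in range(n)] for _ in range(n)] for c in channels}
    width = box / n
    for label, x, y, z in atoms:
        if label not in grids:
            continue  # atoms outside the chosen channels are ignored
        i, j, k = (int((c % box) // width) for c in (x, y, z))
        # Floating-point rounding can make (c % box) // width equal n; clamp it.
        i, j, k = min(i, n - 1), min(j, n - 1), min(k, n - 1)
        grids[label][i][j][k] += 1
    return grids
```

Each channel becomes one input plane of a 3D convolution, analogous to the color channels of an image; a finer grid preserves more solvation detail at the cost of larger inputs.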

7.

TTRMDB: A database for structural and functional analysis on the impact of SNPs over transthyretin (TTR) using bioinformatic tools

Hereditary transthyretin-associated amyloidosis (ATTR) is an adult-onset, autosomal dominant protein-folding disorder caused by mutations in transthyretin (TTR). ATTR is characterized by extracellular deposition of amyloid, leading to loss of autonomy and, finally, death. More than 100 distinct mutations in the TTR gene have been reported, with variable age of onset, clinical expression and penetrance. Moreover, a cure for the disease remains elusive. Furthermore, how to prioritize mutations according to the features governing the stability and pathogenicity of TTR mutant proteins remains unanswered to date, making it a complex problem for researchers. Herein, we provide a comprehensive report on the effects of every reported mutant model of the TTR protein on stability, functionality and pathogenicity, using various computational tools. In addition, the results of our study were used to create TTRMDB (Transthyretin Mutant Database), which is easily accessible to researchers at http://vit.ac.in/ttrmdb.

8.

CWLy-SVM: A support vector machine-based tool for identifying cell wall lytic enzymes

Cell wall lytic enzymes, an important biotechnological tool in drug development, agriculture and the food industry, have attracted increasing research attention. In this research area, the accurate identification of cell wall lytic enzymes is one of the key and fundamental tasks. In this study, to overcome the inefficiency of in vitro experiments, a support vector machine-based model for identifying cell wall lytic enzymes was constructed using bioinformatics methods. The machine learning process includes feature extraction, feature selection, and model training and optimization. In the jackknife cross-validation test, the model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and has a strong ability to identify cell wall lytic enzymes. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to build a user-friendly web server, CWLy-SVM, for identifying cell wall lytic enzymes, which is available at http://server.malab.cn/CWLy-SVM/index.jsp.
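Sensitivity, specificity and MCC are standard confusion-matrix statistics, and the jackknife test is leave-one-out cross-validation. The sketch below computes them; the 1-nearest-neighbour classifier `nearest_neighbour` is a hypothetical stand-in used only to keep the example self-contained, since CWLy-SVM itself uses a support vector machine. AUC is omitted because it needs ranked decision scores rather than hard labels.

```python
import math
from typing import Callable, List, Sequence

Classifier = Callable[[List[Sequence[float]], List[int], Sequence[float]], int]


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient
    from the four cells of a binary confusion matrix."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def jackknife(xs: List[Sequence[float]], ys: List[int], classify: Classifier) -> dict:
    """Leave-one-out: each sample is predicted by a model trained on all the others."""
    tp = fn = tn = fp = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        pred = classify(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], x)
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return binary_metrics(tp, fn, tn, fp)


def nearest_neighbour(train_x: List[Sequence[float]], train_y: List[int],
                      x: Sequence[float]) -> int:
    """Hypothetical stand-in classifier: 1-NN with squared Euclidean distance."""
    j = min(range(len(train_x)),
            key=lambda t: sum((a - b) ** 2 for a, b in zip(train_x[t], x)))
    return train_y[j]
```

For instance, `binary_metrics(tp=8, fn=2, tn=9, fp=1)` gives Sn = 0.8, Sp = 0.9 and MCC = 70 / sqrt(9900), about 0.704.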

9.

HIVCoR: A sequence-based tool for predicting HIV-1 CRF01_AE coreceptor usage

Determination of HIV-1 coreceptor usage is strongly recommended before starting coreceptor-specific inhibitors for HIV treatment. Currently, genotypic assays are the most attractive tools because they are more feasible than phenotypic assays. However, most prediction models were developed and validated on datasets of HIV-1 subtypes B and C. The present study aims to develop a powerful and reliable model, called HIVCoR, to accurately predict HIV-1 coreceptor usage for the CRF01_AE subtype. HIVCoR uses random forest and support vector machine as prediction models, with amino acid composition, pseudo amino acid composition and relative synonymous codon usage frequencies as input features. An overall success rate of 93.79% was achieved in the external validation test on the objective benchmark dataset. Comparison results indicated that HIVCoR was superior to other bioinformatics tools and genotypic predictors. For the convenience of experimental scientists, a user-friendly web server has been established at http://codes.bio/hivcor/.
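Of the three feature types HIVCoR combines, amino acid composition (AAC) is the simplest: the fraction of each of the 20 standard residues in a sequence. A minimal sketch follows; pseudo amino acid composition adds sequence-order correlation terms and RSCU needs the underlying codons, so neither is shown. The abstract does not say which envelope region HIVCoR encodes, so the input here is simply any protein string, and the function name is hypothetical.

```python
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in `seq`; non-standard letters
    (e.g. X for unknown) are skipped and do not count toward the total."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for aa in seq.upper():
        if aa in counts:
            counts[aa] += 1
            total += 1
    return {aa: (c / total if total else 0.0) for aa, c in counts.items()}
```

The result always has 20 entries, so every sequence maps to a fixed-length vector regardless of its length, which is what a random forest or SVM requires.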

10.

Discovery of perturbation gene targets via free text metadata mining in Gene Expression Omnibus

There are over 2.5 million publicly available gene expression samples across 101,000 data series in NCBI's Gene Expression Omnibus (GEO) database. Because GEO's free-text metadata do not use standardised ontology terms to annotate the experiment type and sample type, the database remains difficult to harness computationally without significant manual intervention. In this work, we present an interactive R/Shiny tool called GEOracle that uses text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples, and perform differential expression analysis. We present applications of GEOracle to discover conserved signalling pathway target genes and to identify an organ-specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free-text metadata. Its effectiveness and applicability have been demonstrated by cross-validation and two real-life case studies. It opens up new avenues for unlocking the gene regulatory information embedded in large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle.

11.

Disorder Atlas: Web-based software for the proteome-based interpretation of intrinsic disorder predictions


12.

FWAVina: A novel optimization algorithm for protein-ligand docking based on the fireworks algorithm

Protein-ligand docking is an essential process that has accelerated drug discovery. Accurately and efficiently optimizing the predominant position and orientation of ligands in the binding pocket of a target protein is a major challenge. This paper proposes a novel ligand binding pose search method, called FWAVina, which combines the fireworks algorithm (FWA) with the efficient Broyden-Fletcher-Goldfarb-Shanno (BFGS) local search method adopted in AutoDock Vina to address the pose search problem in docking. The FWA is used as a global optimizer to rapidly search for promising poses, and the BFGS method is incorporated into FWAVina to perform an exact local search. FWAVina was developed and tested on the PDBbind and DUD-E datasets, and its docking performance was compared with that of the original Vina program. The results showed that FWAVina reduces execution time by more than 50% compared with Vina without compromising prediction accuracy in the docking and virtual screening experiments. In addition, increasing the number of ligand rotatable bonds has almost no effect on the efficiency of FWAVina. Its higher accuracy, faster convergence and improved stability make FWAVina a better choice of docking tool for computer-aided drug design. The source code is available at https://github.com/eddyblue/FWAVina/.
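The following is a simplified fireworks algorithm, run on a toy objective (the sphere function) instead of a docking score; it is not FWAVina's implementation. It keeps the two core ideas: fireworks with better (lower) values produce more sparks within a smaller amplitude, while poor ones explore widely, and the best point always survives selection. Assumptions: the parameter defaults, the minimum amplitude, coordinate-wise displacement, and random (rather than distance-based) selection of the remaining fireworks; Gaussian sparks and the BFGS local refinement that FWAVina borrows from Vina are omitted. The names `fireworks_minimize` and `sphere` are hypothetical.

```python
import random
from typing import Callable, List

Vec = List[float]


def fireworks_minimize(f: Callable[[Vec], float], dim: int, lo: float, hi: float,
                       n_fireworks: int = 5, total_sparks: int = 40,
                       max_amp: float = 2.0, min_amp: float = 1e-3,
                       iters: int = 200, seed: int = 0) -> Vec:
    """Simplified fireworks algorithm: explosion sparks plus elitist selection."""
    rng = random.Random(seed)
    eps = 1e-12  # avoids division by zero when all fireworks score the same

    def clip(v: Vec) -> Vec:
        return [min(hi, max(lo, c)) for c in v]

    fireworks = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireworks)]
    for _ in range(iters):
        fit = [f(x) for x in fireworks]
        best, worst = min(fit), max(fit)
        spark_share = sum(worst - g for g in fit) + eps
        amp_share = sum(g - best for g in fit) + eps
        sparks: List[Vec] = []
        for x, fx in zip(fireworks, fit):
            # Better fireworks (lower f) get more sparks ...
            n_sparks = max(1, round(total_sparks * (worst - fx + eps) / spark_share))
            # ... and a smaller amplitude, so they refine locally while poor ones explore.
            amp = max(min_amp, max_amp * (fx - best + eps) / amp_share)
            for _ in range(n_sparks):
                # Displace a random subset of coordinates (each with probability 0.5).
                spark = [c + rng.uniform(-amp, amp) if rng.random() < 0.5 else c
                         for c in x]
                sparks.append(clip(spark))
        # Keep the best candidate; fill the remaining slots at random
        # (a simplification of the distance-based selection in the original FWA).
        pool = sorted(fireworks + sparks, key=f)
        fireworks = [pool[0]] + rng.sample(pool[1:], n_fireworks - 1)
    return min(fireworks, key=f)


def sphere(x: Vec) -> float:
    """Toy objective with minimum 0 at the origin (stands in for a docking score)."""
    return sum(c * c for c in x)
```

Calling `fireworks_minimize(sphere, dim=2, lo=-5.0, hi=5.0)` should return a point near the origin. In a docking setting, `f` would instead score a ligand pose encoded as translation, orientation and torsion angles.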

13.

Efficient utilization on PSSM combining with recurrent neural network for membrane protein types prediction

The Position-Specific Scoring Matrix (PSSM) is an excellent feature extraction method that was adopted early in protein classification prediction, but because of the restrictive shape of the PSSM feature, researchers have made many attempts to process it so that it can be fed to traditional machine learning algorithms. This processing discards part of the information provided by the PSSM, which limits the feature representation. Moreover, the high-dimensional feature representation of PSSM makes it incompatible with other feature extraction methods. We use the PSSM as the input of a recurrent neural network (RNN) without any post-processing, treating the amino acids in protein sequences as the time steps of the RNN. This approach takes full advantage of the information that the PSSM provides. By feeding the PSSM directly to the model and fully utilizing its internal information, we propose an end-to-end solution that achieves state-of-the-art performance. Finally, we explore how to combine PSSM with traditional feature extraction methods and achieve slightly improved performance. Our network architecture is implemented in Python and is available at https://github.com/YellowcardD/RNN-for-membrane-protein-types-prediction.

14.

DistAA: Database of amino acid distances in proteins and web application for statistical review of distances

15.

SETE: Sequence-based Ensemble learning approach for TCR Epitope binding prediction

Predicting the binding of T cell receptors (TCRs) to epitopes plays a vital role in immunotherapy because it guides the development of therapeutic vaccines and cancer treatments. Many prediction methods have attempted to explain the relationship between TCR repertoires from different aspects, such as the V(D)J gene locus and the biophysical features of amino acids, but extracting these features is time-consuming and the performance of these models is limited. Few studies have investigated how k-mers formed by adjacent amino acids in TCR sequences direct epitope recognition, and the specific mechanism of TCR-epitope binding is still unclear. Motivated by this, we present SETE (Sequence-based Ensemble learning approach for TCR Epitope binding prediction), a novel model that predicts TCR-epitope binding accurately. The model decomposes the CDR3β sequence into short amino acid chains used as features and learns their patterns across different TCR repertoires with a gradient boosting decision tree algorithm. Experiments demonstrated that SETE helps predict the epitopes corresponding to TCRs and that it outperforms other state-of-the-art methods in predicting the epitope specificity of TCRs on the VDJdb dataset. The source code has been uploaded to https://github.com/wonanut/SETE for academic use only.
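SETE's central feature is the set of k-mers (runs of k adjacent amino acids) in each CDR3β sequence. A minimal sketch of that featurization, assuming k = 3 and raw counts, is shown below; the function names are hypothetical. The resulting matrix could then be passed to a gradient boosting classifier such as scikit-learn's `GradientBoostingClassifier`, which is an assumption about tooling and not taken from the source.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def kmer_counts(cdr3: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (runs of k adjacent residues) in one CDR3β sequence."""
    return Counter(cdr3[i:i + k] for i in range(len(cdr3) - k + 1))


def kmer_matrix(seqs: Iterable[str], k: int = 3) -> Tuple[List[str], List[List[int]]]:
    """Build a fixed-length count matrix over every k-mer seen in `seqs`.
    Returns the sorted k-mer vocabulary (column names) and one row per sequence."""
    counts = [kmer_counts(s, k) for s in seqs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[m] for m in vocab] for c in counts]
```

For example, `kmer_matrix(["CASS", "CASR"])` returns the vocabulary `["ASR", "ASS", "CAS"]` with rows `[0, 1, 1]` and `[1, 0, 1]`. At prediction time, new sequences should be counted against the training vocabulary so that columns line up.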

16.

ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning

Hayley Weir, Keiran Thompson, Amelia Woodward, Benjamin Choi, Augustin Braun, Todd J. Martínez. Chemical Science, 2021, 12(31): 10622

Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of ∼600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks in which each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement in molecule recognition accuracy and were able to assign a confidence value to each prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.

Offline recognition of hand-drawn hydrocarbon structures is learned using an image-to-SMILES neural network through the application of synthetic data generation and ensemble learning.
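The committee step described above can be sketched as a majority vote in which the confidence is the fraction of networks agreeing with the winning SMILES. This assumes each network's output has already been canonicalized (for example with RDKit), because two different SMILES strings can denote the same molecule; the function names are hypothetical. Ties are broken by first occurrence, since `Counter.most_common` orders equal counts in insertion order.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def committee_vote(predictions: Sequence[str]) -> Tuple[str, float]:
    """Majority vote over the SMILES predicted by each network. The confidence
    is the fraction of networks that agree with the winning molecule."""
    if not predictions:
        raise ValueError("need at least one prediction")
    winner, votes = Counter(predictions).most_common(1)[0]
    return winner, votes / len(predictions)


def top_k(predictions: Sequence[str], k: int = 3) -> List[str]:
    """The k most-voted molecules, as used for a top-k accuracy check."""
    return [smiles for smiles, _ in Counter(predictions).most_common(k)]
```

With four networks voting `["CCO", "CCO", "CCC", "CCO"]`, `committee_vote` returns `("CCO", 0.75)`; a low confidence flags drawings whose prediction deserves a manual check.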

17.

dbHDPLS: A database of human disease-related protein-ligand structures

Protein-ligand complexes perform specific functions, most of which are related to human diseases. The database, called human disease-related protein-ligand structures (dbHDPLS), collects 8833 structures extracted from the Protein Data Bank (PDB) and other related databases. The database is annotated with comprehensive information on ligands and drugs, related human diseases and protein-ligand interactions, together with protein structure information. It may serve as a reliable resource for structure-based drug target discovery, druggability prediction of protein-ligand binding sites, and drug-disease relationships based on protein-ligand complex structures. It can be publicly accessed at http://DeepLearner.ahu.edu.cn/web/dbDPLS/.

18.

A merged molecular docking, ADME-T and dynamics approaches towards the genus of Arisaema as herpes simplex virus type 1 and type 2 inhibitors

An attempt to screen phytoconstituents of the Arisaema genus against herpes viruses (HSV-1 and HSV-2) was carried out using in silico approaches. Human HSV-1 and HSV-2 are responsible for cold sores and genital herpes, respectively. Two drug targets, namely thymidine kinase (TK; PDB: 2ki5) and serine protease (PDB: 1at3), were selected for HSV-1 and HSV-2. Initially, a molecular docking tool was employed to screen the top-hit phytoconstituents against herpes infections. ADME-T studies of the top-ranked compounds were then performed to assess their effectiveness. Next, molecular dynamics studies were carried out to further examine the stability of the ligands. The Glide scores and binding interactions of the phytoconstituents were compared with those of acyclovir, the main drug used in the treatment of HSV; the screened top hits exhibited better Glide scores and better binding for both HSV-1 and HSV-2 receptors. Additionally, ADME-T showed an ideal range for the top hits, while the molecular dynamics results illustrated the stability of the models. Ultimately, the study revealed the three most promising hits for the HSV-1 (39, 21, 19) and HSV-2 (20, 51, 19) receptors, which can be explored further in wet-lab experiments as promising agents against HSV infections.

19.

FPDock: Protein–protein docking using flower pollination algorithm

Proteins play vital roles in biological systems through interactions and complex formation with other biological molecules. Indeed, abnormal interaction patterns affect protein structure and have detrimental effects on living organisms. Research on structure prediction is important because the functions of proteins depend on their structures. Protein–protein docking is one of the computational methods devised to understand interactions between proteins. Metaheuristic algorithms are promising for this task owing to the hardness of the structure prediction problem. In this paper, a variant of the Flower Pollination Algorithm (FPA) is applied to obtain accurate protein–protein complex structures. The algorithm starts from a randomly generated initial population, which evolves on different isolated islands, each trying to find its local optimum. The abiotic and biotic pollination applied in different generations brings diversity and intensity to the solutions. Each round of pollination applies an energy-based scoring function whose value influences whether a new solution is accepted. Analysis of the final predictions based on CAPRI quality criteria shows that the proposed method has a success rate of 58% in the top 10 ranks, which is better than other methods such as SwarmDock, pyDock and ZDOCK. The source code is available at https://github.com/Sharon1989Sunny/_FPDock_.
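Below is a sketch of the basic flower pollination algorithm on a toy objective; it is not FPDock's island-based variant or its energy function. With switch probability `p`, a flower performs global (biotic) pollination, a Lévy-flight step relative to the current best solution, with step lengths drawn by Mantegna's algorithm; otherwise it performs local (abiotic) pollination along the difference of two random flowers. A candidate replaces its flower only if it scores better, mirroring the score-based acceptance described above. The defaults (`p = 0.8`, `levy_scale = 0.1`, population and iteration counts) and all function names are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """Draw one Lévy-distributed step length with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def flower_pollination(f: Callable[[Vec], float], dim: int, lo: float, hi: float,
                       n: int = 20, p: float = 0.8, levy_scale: float = 0.1,
                       iters: int = 300, seed: int = 0) -> Tuple[Vec, float]:
    """Basic flower pollination with greedy acceptance; returns (best, best score)."""
    rng = random.Random(seed)

    def clip(v: Vec) -> Vec:
        return [min(hi, max(lo, c)) for c in v]

    flowers = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    scores = [f(x) for x in flowers]
    g = min(range(n), key=lambda i: scores[i])
    best, best_score = flowers[g][:], scores[g]
    for _ in range(iters):
        for i in range(n):
            x = flowers[i]
            if rng.random() < p:
                # Global (biotic) pollination: Lévy flight relative to the best flower.
                cand = [xi + levy_scale * levy_step(rng) * (bi - xi)
                        for xi, bi in zip(x, best)]
            else:
                # Local (abiotic) pollination: step along the difference of two random flowers.
                j, k = rng.sample(range(n), 2)
                eps = rng.random()
                cand = [xi + eps * (a - b) for xi, a, b in zip(x, flowers[j], flowers[k])]
            cand = clip(cand)
            score = f(cand)
            if score < scores[i]:  # greedy acceptance: keep only improvements
                flowers[i], scores[i] = cand, score
                if score < best_score:
                    best, best_score = cand[:], score
    return best, best_score


def sphere(x: Vec) -> float:
    """Toy objective with minimum 0 at the origin (stands in for a docking energy)."""
    return sum(c * c for c in x)
```

The heavy-tailed Lévy steps occasionally produce long jumps, which is what lets the global phase escape local optima; FPDock additionally runs separate island populations, which this sketch does not model.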

20.

Effective single-cell clustering through ensemble feature selection and similarity measurements