Similar Literature
20 similar documents found (search time: 218 ms)
1.
Proteins are among the most important molecules governing cellular processes in most living organisms, and understanding their various functions is of paramount importance to understanding the basics of life. Several supervised learning approaches have been applied in this field to predict protein function. In this paper, we propose ProtConv, a convolutional neural network based approach that predicts protein function by converting amino-acid sequences into two-dimensional images. We use a protein embedding technique based on transfer learning to generate the feature vector, which is then converted into a square, single-channel image and fed into a convolutional network. The architecture combines convolutional filters and average pooling layers, followed by dense fully connected layers that predict a binary function. We performed experiments on standard benchmark datasets from two important protein function prediction tasks: proinflammatory cytokines and anticancer peptides. Our experiments show that ProtConv achieves state-of-the-art performance on both datasets. All necessary implementation details, together with the source code and datasets, are available at: https://github.com/swakkhar/ProtConv.
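The vector-to-image step can be sketched as follows. This is a minimal illustration assuming the embedding is zero-padded up to the next perfect square and reshaped row by row into a channels-last array; ProtConv's actual padding and reshaping rule may differ.

```python
import math
import numpy as np

def vector_to_square_image(features):
    """Reshape a 1-D embedding into a (side, side, 1) single-channel image.

    Assumption: the vector is zero-padded up to the next perfect square.
    """
    flat = np.asarray(features, dtype=np.float32).ravel()
    if flat.size == 0:
        raise ValueError("empty feature vector")
    side = math.isqrt(flat.size)
    if side * side < flat.size:
        side += 1  # round up to the next perfect square
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: flat.size] = flat
    return padded.reshape(side, side, 1)  # channels-last, one channel
```

For example, a 1024-dimensional embedding becomes a 32 × 32 × 1 image, while a 1900-dimensional one is padded to 44 × 44 × 1.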

2.
A non-coding RNA (ncRNA) is an RNA that is not translated into protein; however, ncRNAs are involved in many biological processes, diseases and cancers. Numerous ncRNAs have been identified and classified with high-throughput sequencing technology, so accurate prediction of ncRNA classes is important and necessary for further study of their functions. Several computational techniques have been employed to predict ncRNA classes. Recent classification methods use secondary structure as their primary input. However, computational tools for RNA secondary structure prediction are not accurate enough, which limits the final performance of ncRNA predictors. In this paper, we propose ncRDeep, a simple yet efficient method for ncRNA class prediction that uses a simple convolutional neural network and RNA sequence information only. ncRDeep was evaluated on benchmark datasets, and the comparison showed that it significantly outperforms state-of-the-art methods; specifically, the average accuracy improved by 8.32%. Finally, we built a freely accessible web server for ncRDeep at http://home.jbnu.ac.kr/NSCL/ncRDeep.htm
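A CNN that uses "RNA sequence information only" needs a numeric encoding of the sequence. The sketch below assumes one-hot encoding over the alphabet ACGU with fixed-length truncation or zero-padding; the abstract does not state ncRDeep's actual input representation.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed nucleotide order for the one-hot columns

def one_hot_rna(seq: str, max_len: int) -> np.ndarray:
    """Encode an RNA sequence as a (max_len, 4) one-hot matrix.

    Longer sequences are truncated; shorter ones are zero-padded.
    Unknown symbols (e.g. 'N') become all-zero rows; 'T' is treated as 'U'.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    mat = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq.upper().replace("T", "U")[:max_len]):
        col = index.get(base)
        if col is not None:
            mat[pos, col] = 1.0
    return mat
```

A batch of such matrices has the shape (batch, max_len, 4) and could be fed to a 1-D convolutional layer.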

3.
Single-cell RNA sequencing technologies have revolutionized biomedical research by providing an effective means to profile gene expression in individual cells. One of the first fundamental steps in the in-depth analysis of single-cell sequencing data is cell type classification and identification. Computational methods such as clustering algorithms have been widely used and are gaining popularity because they can save considerable resources and time in experimental validation. Although selecting optimal features (i.e., genes) is essential for obtaining accurate and reliable single-cell clustering results, computational complexity and dropout events, which introduce zero-inflated noise, make this process very challenging. In this paper, we propose an effective single-cell clustering algorithm based on ensemble feature selection and similarity measurement. We first identify a set of potential features, then measure cell-to-cell similarity on subsets of these potential features drawn by multiple feature sampling approaches. We construct an ensemble network from the cell-to-cell similarities and, finally, apply a network-based clustering algorithm to obtain single-cell clusters. We evaluate the proposed algorithm through multiple assessments on real-world single-cell RNA sequencing datasets with known cell types. The results show that the algorithm produces accurate and consistent single-cell clusters. Moreover, because it takes relative expression as input, it can easily be adopted into existing analysis pipelines. The source code is publicly available at https://github.com/jeonglab/scCLUE.
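The sampling-and-averaging idea can be sketched as below. The assumptions are hypothetical and not taken from scCLUE: candidate genes are the most variable ones, gene subsets are drawn uniformly without replacement, Pearson correlation is the similarity measure, and the ensemble network is a symmetric k-nearest-neighbour graph. The final network-based clustering step is omitted.

```python
import numpy as np

def ensemble_similarity(X, n_candidates=500, subset_size=100, n_rounds=20, seed=0):
    """Average cell-to-cell Pearson correlation over random gene subsets.

    X: cells x genes matrix of relative (e.g. log-normalised) expression.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_candidates = min(n_candidates, X.shape[1])
    candidates = np.argsort(X.var(axis=0))[::-1][:n_candidates]  # most variable genes
    subset_size = min(subset_size, n_candidates)
    sim = np.zeros((X.shape[0], X.shape[0]))
    for _ in range(n_rounds):
        genes = rng.choice(candidates, size=subset_size, replace=False)
        sim += np.corrcoef(X[:, genes])  # rows are cells
    return sim / n_rounds

def knn_network(sim, k=10):
    """Symmetric k-nearest-neighbour adjacency built from a similarity matrix."""
    n = sim.shape[0]
    s = sim.copy()
    np.fill_diagonal(s, -np.inf)  # exclude self-edges
    nbrs = np.argsort(s, axis=1)[:, ::-1][:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return adj | adj.T
```

Averaging over many random subsets is what makes the similarity an ensemble: a single noisy gene subset (for example, one dominated by dropout zeros) has limited influence on the final network.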

4.
Evolution builds new genetic material from existing material, not randomly but in highly ordered and eloquent patterns. Most of these sequence repeats reveal valuable information relevant to areas such as disease research and the function of macromolecules. In the era of next-generation genome sequencing, rapid and efficient extraction of all unbiased sequence repeats from macromolecules is urgently needed. To meet this need, an online web-based computing server, RepEx, has been developed to extract and display all possible repeats in DNA and protein sequences. Besides exact (identical) repeats, the server is designed to identify and extract degenerate, inverted, everted and mirror repeats from both DNA and protein sequences. The server provides rich output displays, featuring interactive graphs and comprehensive output files. In addition, RepEx is equipped with an easy-to-use interface and search filters for user-defined queries, and is freely available via the World Wide Web at http://bioserver2.physics.iisc.ac.in/RepEx/.
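To make the repeat types concrete, the following is a naive k-mer index sketch for DNA, not RepEx's algorithm. It assumes common definitions: a direct repeat is an identical later copy, an inverted repeat is a later copy of the reverse complement, and a mirror repeat is a later copy of the reversed word. Degenerate and everted repeats are not handled.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_repeats(seq: str, k: int):
    """Return position pairs (i, j), i < j, of length-k repeats in a DNA sequence."""
    seq = seq.upper()
    positions = defaultdict(list)  # k-mer -> start positions (ascending)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    found = {"direct": [], "inverted": [], "mirror": []}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        targets = {
            "direct": word,
            "inverted": word.translate(COMPLEMENT)[::-1],  # reverse complement
            "mirror": word[::-1],                          # same strand, reversed
        }
        for kind, target in targets.items():
            found[kind].extend((i, j) for j in positions.get(target, []) if j > i)
    return found
```

For example, in `ATGCATGC` with k = 4, `ATGC` at position 0 recurs directly at position 4, and its reverse complement `GCAT` occurs at position 2, giving an inverted pair (0, 2).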

5.
Proteins perform their vital roles in biological systems through interactions and complex formation with other biological molecules. Abnormal interaction patterns affect protein structure and have detrimental effects on living organisms. Because protein function depends on structure, structure prediction is an important research area. Protein–protein docking is one of the computational methods devised to understand the interactions between proteins, and metaheuristic algorithms are promising for it owing to the hardness of the structure prediction problem. In this paper, a variant of the Flower Pollination Algorithm (FPA) is applied to obtain accurate protein–protein complex structures. The algorithm starts from a randomly generated initial population that evolves on separate, isolated islands, each trying to find its local optimum. The abiotic and biotic pollination applied in different generations bring diversity and intensification to the solutions. Each round of pollination evaluates an energy-based scoring function whose value determines whether a new solution is accepted. Analysis of the final predictions against the CAPRI quality criteria shows that the proposed method achieves a success rate of 58% in the top 10 ranks, which is better than other methods such as SwarmDock, pyDock and ZDOCK. The source code is available at: https://github.com/Sharon1989Sunny/_FPDock_.
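The core pollination moves can be sketched with a basic single-population FPA. In this sketch, global (biotic) pollination is a Lévy flight toward the current best flower, local (abiotic) pollination mixes two random flowers, and a candidate is kept only if its score does not get worse. A toy sphere function stands in for the energy-based docking score. The island model, the 0.01 step scale and the switch probability p = 0.8 are assumptions for illustration, not the paper's settings.

```python
import math
import numpy as np

def levy_step(rng, dim, beta=1.5):
    """Levy-distributed step via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def flower_pollination(f, lower, upper, n_flowers=20, n_iter=300, p=0.8, seed=0):
    """Minimise f within box bounds using basic FPA moves with greedy acceptance."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pop = rng.uniform(lower, upper, (n_flowers, dim))
    fit = np.array([f(x) for x in pop])
    best, best_f = pop[fit.argmin()].copy(), fit.min()
    for _ in range(n_iter):
        for i in range(n_flowers):
            if rng.random() < p:
                # Global (biotic) pollination: Levy flight relative to the best flower.
                cand = pop[i] + 0.01 * levy_step(rng, dim) * (best - pop[i])
            else:
                # Local (abiotic) pollination between two other random flowers.
                j, k = rng.choice(n_flowers, size=2, replace=False)
                cand = pop[i] + rng.random() * (pop[j] - pop[k])
            cand = np.clip(cand, lower, upper)
            cand_f = f(cand)
            if cand_f <= fit[i]:  # accept only non-worsening moves
                pop[i], fit[i] = cand, cand_f
                if cand_f <= best_f:
                    best, best_f = cand.copy(), cand_f
    return best, best_f
```

In a docking setting, each vector would encode a rigid-body pose of the ligand protein, and `f` would be the energy-based scoring function.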

6.
7.
Protein-ligand complexes perform specific functions, most of which are related to human diseases. The database of human disease-related protein-ligand structures (dbHDPLS) contains 8833 structures extracted from the Protein Data Bank (PDB) and other related databases. It is annotated with comprehensive information on ligands and drugs, related human diseases and protein-ligand interactions, together with protein structure information. The database may serve as a reliable resource for structure-based drug target discovery, for predicting the druggability of protein-ligand binding sites, and for studying drug-disease relationships based on protein-ligand complex structures. It is publicly accessible at: http://DeepLearner.ahu.edu.cn/web/dbDPLS/.

8.
In silico methods play an essential role in modern drug discovery. Virtual screening, an in silico method, is used to narrow down the chemical space on which actual wet-lab experiments need to be conducted. Ligand-based virtual screening is a computational strategy in which a model of the target protein is built from knowledge of the ligands that bind successfully to it; the model is then used to predict whether a new molecule is likely to bind to the target. The support vector machine (SVM), a supervised learning algorithm for classification, can be used for virtual screening of ligand data and can produce interesting results. However, because ligand datasets are huge, training an SVM model takes considerably longer than training other learning algorithms. Parallelizing these algorithms on multi-core processors can readily speed up such discoveries. In this paper, a GPU-based, ligand-based virtual screening tool using SVM (GpuSVMScreen) is proposed and benchmarked. This data-parallel tool achieves high throughput with short run times and can also screen very large numbers of molecules (billions). The source code is available at http://ccc.nitc.ac.in/project/GPUSVMSCREEN.
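Screening is data-parallel because each molecule's SVM decision value is independent of every other molecule's. The NumPy sketch below assumes an RBF kernel and an already trained model (support vectors, signed dual coefficients and intercept, all hypothetical here). It scores a whole batch with matrix operations, the kind of workload that maps naturally onto GPU array libraries. It is not GpuSVMScreen's implementation.

```python
import numpy as np

def rbf_decision(X, support_vectors, dual_coef, intercept, gamma):
    """SVM decision values f(x) = sum_i a_i * exp(-gamma * ||s_i - x||^2) + b.

    X: (n_molecules, n_features) descriptors or fingerprints.
    dual_coef: (n_support,) signed coefficients alpha_i * y_i.
    """
    # ||x - s||^2 = ||x||^2 + ||s||^2 - 2 x.s, for all molecule/support pairs at once.
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(support_vectors ** 2, axis=1)[None, :]
          - 2.0 * X @ support_vectors.T)
    K = np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off
    return K @ dual_coef + intercept

def screen_in_batches(X, batch_size, **model):
    """Score a large library chunk by chunk; chunks are independent units of work."""
    return np.concatenate([rbf_decision(X[i:i + batch_size], **model)
                           for i in range(0, len(X), batch_size)])
```

Batching keeps the kernel matrix (batch_size × n_support) within memory limits, which matters when the library contains billions of molecules.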

9.
An in silico screening of phytoconstituents from the genus Arisaema against herpes simplex viruses (HSV-1 and HSV-2) was carried out. HSV-1 and HSV-2 are responsible for cold sores and genital herpes, respectively. Two drug targets, thymidine kinase (TK; PDB: 2ki5) and serine protease (PDB: 1at3), were selected for HSV-1 and HSV-2. First, molecular docking was used to screen for top-hit phytoconstituents against herpes infections. The ADME-T properties of the top-ranked compounds were then examined to assess their effectiveness, and molecular dynamics studies were performed to further examine the stability of the ligands. The Glide scores and binding interactions of the phytoconstituents were compared with those of Acyclovir, the main drug used to treat HSV; the top hits exhibited better Glide scores and binding for both the HSV-1 and HSV-2 receptors. Additionally, the ADME-T properties of the top hits fell within the ideal range, and the molecular dynamics results showed the stability of the models. Overall, the study identified the three most promising hits for the HSV-1 (39, 21, 19) and HSV-2 (20, 51, 19) receptors, which can be explored further in wet-lab experiments as promising agents against HSV infections.

10.
Predicting the binding of T cell receptors (TCRs) to epitopes plays a vital role in immunotherapy because it guides the development of therapeutic vaccines and cancer treatments. Many prediction methods have tried to explain the relationship between TCR repertoires from different aspects, such as the V(D)J gene locus and the biophysical features of amino acids, but extracting these features is time-consuming and the performance of these models is limited. Few studies have investigated how k-mers formed by adjacent amino acids in TCR sequences direct epitope recognition, and the specific mechanism of TCR-epitope binding is still unclear. Motivated by this, we present SETE (Sequence-based Ensemble learning approach for TCR Epitope binding prediction), a novel model for accurately predicting TCR-epitope binding. The model decomposes the CDR3β sequence into short amino acid chains used as features, and learns their patterns across different TCR repertoires with a gradient boosting decision tree algorithm. Experiments demonstrate that SETE can help predict the epitopes corresponding to TCRs and that it outperforms other state-of-the-art methods in predicting the epitope specificity of TCRs on the VDJdb data set. The source code has been uploaded to https://github.com/wonanut/SETE for academic use only.
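Decomposing a CDR3β sequence into short amino acid chains amounts to counting overlapping k-mers. A minimal sketch follows. The choice of k = 3 as default and raw counts as feature values are assumptions; SETE's exact featurization may differ.

```python
from collections import Counter

def kmer_counts(cdr3: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (adjacent amino-acid chains) in a CDR3beta sequence."""
    return Counter(cdr3[i:i + k] for i in range(len(cdr3) - k + 1))

def kmer_matrix(sequences, k=3):
    """Dense count matrix: rows are sequences, columns the sorted k-mer vocabulary."""
    counts = [kmer_counts(s, k) for s in sequences]
    vocab = sorted(set().union(*counts))
    return vocab, [[c.get(kmer, 0) for kmer in vocab] for c in counts]
```

The resulting matrix could then be passed to any gradient boosting decision tree implementation, for example scikit-learn's `GradientBoostingClassifier`, with epitope identities as class labels.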

11.
There are over 2.5 million publicly available gene expression samples across 101,000 data series in NCBI's Gene Expression Omnibus (GEO) database. Because GEO's free-text metadata do not use standardised ontology terms to annotate experiment type and sample type, the database remains difficult to harness computationally without significant manual intervention. In this work, we present GEOracle, an interactive R/Shiny tool that uses text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples, and perform differential expression analysis. We present applications of GEOracle to discover conserved signalling pathway target genes and to identify an organ-specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free-text metadata; its effectiveness and applicability have been demonstrated by cross-validation and two real-life case studies. It opens up new avenues for unlocking the gene regulatory information embedded in large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle.
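The grouping and differential-expression steps can be illustrated with a deliberately simple heuristic. This is not GEOracle's method, which uses text mining and machine learning. The keyword list is hypothetical, and the log2 fold change with a pseudocount stands in for a proper statistical test such as those in limma.

```python
import math
import re

# Hypothetical control keywords; a real tool would learn these from metadata.
CONTROL_PATTERN = re.compile(
    r"\b(control|ctrl|vehicle|untreated|mock|wild[- ]?type)\b", re.IGNORECASE)

def group_samples(titles):
    """Split GEO sample titles into (control, treatment) index lists by keyword."""
    control = [i for i, t in enumerate(titles) if CONTROL_PATTERN.search(t)]
    control_set = set(control)
    treatment = [i for i in range(len(titles)) if i not in control_set]
    return control, treatment

def log2_fold_change(expr, control, treatment, pseudocount=1.0):
    """Per-gene log2((mean treatment + c) / (mean control + c)) on linear-scale data.

    expr: one list of per-sample values for each gene.
    """
    changes = []
    for values in expr:
        mean_c = sum(values[i] for i in control) / len(control)
        mean_t = sum(values[i] for i in treatment) / len(treatment)
        changes.append(math.log2((mean_t + pseudocount) / (mean_c + pseudocount)))
    return changes
```

Keyword matching like this breaks easily on free-text metadata (for example, "WT_rep1" does not match `wild-type`), which is the motivation for the learned approach described above.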

12.
Protein-ligand docking is an essential process that has accelerated drug discovery. A major challenge is how to accurately and efficiently optimize the predominant position and orientation of ligands in the binding pocket of a target protein. This paper proposes FWAVina, a novel ligand binding pose search method that combines the fireworks algorithm (FWA) with the efficient Broyden-Fletcher-Goldfarb-Shanno (BFGS) local search method adopted in AutoDock Vina to address the pose search problem in docking. The FWA serves as a global optimizer that rapidly searches for promising poses, and the BFGS method is incorporated into FWAVina to perform an exact local search. FWAVina was developed and tested on the PDBbind and DUD-E datasets, and its docking performance was compared with that of the original Vina program. The results show that FWAVina reduces execution time by more than 50% compared with Vina without compromising prediction accuracy in the docking and virtual screening experiments. In addition, increasing the number of ligand rotatable bonds has almost no effect on the efficiency of FWAVina. Its higher accuracy, faster convergence and improved stability make FWAVina a better choice of docking tool for computer-aided drug design. The source code is available at https://github.com/eddyblue/FWAVina/.
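The global-then-local structure can be sketched as a simplified fireworks algorithm followed by a BFGS refinement using SciPy's `minimize(method="BFGS")`. The sketch follows the basic FWA rule: better fireworks receive more sparks and smaller explosion amplitudes. However, it uses plain elitist selection rather than the distance-based selection of the original FWA formulation, omits Gaussian sparks, and uses a Rastrigin function as a stand-in for the Vina scoring function. All parameter values are hypothetical, and this is not FWAVina's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def fireworks_then_bfgs(f, lower, upper, n_fireworks=5, n_sparks=30,
                        max_amp=2.0, n_iter=50, seed=0):
    """Simplified FWA global search, then BFGS local refinement of the best point."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim, eps = lower.size, 1e-12
    fw = rng.uniform(lower, upper, (n_fireworks, dim))
    for _ in range(n_iter):
        fit = np.array([f(x) for x in fw])
        quality = fit.max() - fit + eps  # large for good fireworks
        badness = fit - fit.min() + eps  # large for poor fireworks
        counts = n_sparks * quality / quality.sum()  # more sparks for better fireworks
        amps = max_amp * badness / badness.sum()     # smaller explosions for better ones
        pool = [fw]
        for x, c, a in zip(fw, counts, amps):
            m = max(1, int(round(c)))
            pool.append(np.clip(x + rng.uniform(-a, a, (m, dim)), lower, upper))
        pool = np.vstack(pool)
        pool_fit = np.array([f(x) for x in pool])
        fw = pool[np.argsort(pool_fit)[:n_fireworks]]  # elitist selection
    x0 = fw[0]  # best point found by the global phase
    res = minimize(f, x0, method="BFGS")  # gradient estimated by finite differences
    return x0, res.x, float(res.fun)
```

The division of labour mirrors the description above: the population-based phase locates a promising basin cheaply, and BFGS converges quickly once it starts inside that basin.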

13.
Hereditary transthyretin-associated amyloidosis (ATTR) is an adult-onset, autosomal dominant protein-folding disorder caused by mutations in transthyretin (TTR). ATTR is characterized by extracellular deposition of amyloid, leading to loss of autonomy and, finally, death. More than 100 distinct mutations in the TTR gene have been reported, with variable age of onset, clinical expression and penetrance. Moreover, a cure for the disease remains elusive. Furthermore, how to prioritize mutations according to the characteristic features governing the stability and pathogenicity of TTR mutant proteins remains an open question and a complex area of study. Herein, we provide a comprehensive report on the effects of every reported TTR mutant on stability, functionality and pathogenicity, using various computational tools. The results of our study were used to create TTRMDB (Transthyretin Mutant Database), which is easily accessible to researchers at http://vit.ac.in/ttrmdb.

14.
Single-cell technology is a powerful tool for revealing intercellular heterogeneity and discovering cellular developmental processes. To analyze the complexity of cellular dynamics and variability, it is important to construct, from single-cell expression data, a pseudo-time trajectory that reflects the process of cellular development. Although a number of computational and statistical methods have recently been developed for single-cell analysis, more effective and efficient methods are still strongly needed. In this work we propose SCOUT, a new method for inferring single-cell pseudo-time ordering with bifurcating trajectories. We first use a fixed-radius near-neighbors algorithm based on cell density to find landmarks representing cell states, and employ a minimum spanning tree (MST) to determine the developmental branches. We then use either Apollonian circle projection or a weighted distance to determine the pseudo-time trajectories of single cells. The proposed algorithm was applied to one synthetic and two real single-cell datasets (including single-branching and multi-branching trajectories) and successfully recovered the cellular developmental dynamics. Compared with other popular methods, the numerical results show that our method generates more robust and accurate pseudo-time trajectories. The method is implemented in Python and available at https://github.com/statway/SCOUT.
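The landmark and branching steps can be sketched as below. Density is the number of cells within a fixed radius. Landmarks are picked greedily, densest first, with each landmark absorbing every cell within the radius; this greedy rule is an assumption, not SCOUT's exact procedure. The landmarks are then connected by a Euclidean minimum spanning tree built with Prim's algorithm. The pseudo-time projection step is not shown.

```python
import numpy as np

def density_landmarks(X, radius):
    """Greedy landmark selection from fixed-radius neighbour counts."""
    X = np.asarray(X, float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = (dist <= radius).sum(axis=1)
    unassigned = np.ones(len(X), dtype=bool)
    landmarks = []
    for i in np.argsort(-density, kind="stable"):  # densest cells first
        if unassigned[i]:
            landmarks.append(i)
            unassigned &= dist[i] > radius  # cells near this landmark are now covered
    return np.array(landmarks)

def prim_mst(points):
    """Edges (parent, child) of a Euclidean minimum spanning tree via Prim's algorithm."""
    P = np.asarray(points, float)
    n = len(P)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known connection to the tree
    parent = np.zeros(n, dtype=int)  # tree node providing that connection
    edges = []
    for _ in range(n - 1):
        j = int(np.where(in_tree, np.inf, best).argmin())
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best
        best[closer] = dist[j][closer]
        parent[closer] = j
    return edges
```

With three well-separated clusters along a line, this yields one landmark per cluster and an MST linking neighbouring clusters, i.e. a single unbranched trajectory; a bifurcation would appear as a landmark of degree three.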

15.
A major drawback of current anticancer drugs is their lack of satisfactory specificity toward tumor cells. Although several cancer therapies exist, tumor homing peptides are gaining importance as therapeutic agents. The large number of therapeutic peptides generated in recent years calls for an effective and interpretable computational model that can rapidly and automatically predict tumor homing peptides. Therefore, a sequence-based approach, referred to here as THPep, has been developed to predict and analyze tumor homing peptides using an interpretable random forest classifier in conjunction with amino acid composition, dipeptide composition and pseudo amino acid composition. On the independent test set of an objective benchmark dataset, THPep achieved an overall accuracy of 90.13% and a Matthews correlation coefficient of 0.76. THPep was found to be superior to the existing method and holds high potential as a useful tool for predicting tumor homing peptides. For the convenience of experimental scientists, a web server for the proposed method is publicly available at http://codes.bio/thpep/.
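Two of the descriptors named above are simple frequency vectors. Amino acid composition (AAC) gives 20 relative residue frequencies; dipeptide composition (DPC) gives 400 relative frequencies of adjacent residue pairs. A minimal sketch, assuming the standard 20-letter alphabet and that non-standard residues are simply skipped; pseudo amino acid composition is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def aac(seq):
    """Amino acid composition: 20 relative residue frequencies."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dpc(seq):
    """Dipeptide composition: 400 relative frequencies of adjacent pairs (len(seq) >= 2)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        if p in counts:  # skip pairs containing non-standard residues
            counts[p] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]
```

Concatenating `aac(seq) + dpc(seq)` gives a fixed 420-dimensional vector for any peptide length, which is what makes these features convenient inputs for a random forest.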

16.
A growing number of people suffer from colorectal cancer, one of the most common cancers, and it is essential to diagnose and treat it as early as possible. Because the disease may change the microbial communities in the gut, gut microorganisms could provide an efficient means of predicting colorectal cancer. In this study, we selected operational taxonomic units (OTUs) covering several kinds of microorganisms to predict colorectal cancer. To find the most important microorganisms and obtain the best prediction performance, we explored effective feature selection methods in three main steps. First, we used a single method to reduce the features. Next, to reduce the number of features further, we integrated the dimension reduction methods correlation-based feature selection and maximum relevance–maximum distance (MRMD 1.0 and MRMD 2.0). Finally, we selected the important features according to the taxonomy files. We created separate training and test sets to obtain a more objective evaluation, and evaluated random forest, naïve Bayes and decision tree classifiers. The results show that the proposed methods are better than hierarchical feature engineering, and that combining correlation-based feature selection with MRMD 2.0 performed best on the CRC2 dataset. The dataset and methods are available at http://lab.malab.cn/data/microdata/data.html.
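Correlation-based feature selection scores a subset S of k features by its merit, k·r̄cf / sqrt(k + k(k−1)·r̄ff), where r̄cf is the mean feature-class correlation and r̄ff the mean feature-feature correlation. High relevance raises the merit, and redundancy among the chosen features lowers it. The sketch assumes absolute Pearson correlations; other CFS variants use different correlation measures such as symmetrical uncertainty. It illustrates the scoring criterion only, not the study's full pipeline.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*mean|r_cf| / sqrt(k + k(k-1)*mean|r_ff|)."""
    Xs = np.asarray(X, float)[:, list(subset)]
    y = np.asarray(y, float)
    k = Xs.shape[1]
    r_cf = np.mean([abs(np.corrcoef(Xs[:, j], y)[0, 1]) for j in range(k)])
    if k == 1:
        r_ff = 0.0
    else:
        C = np.abs(np.corrcoef(Xs, rowvar=False))
        r_ff = (C.sum() - k) / (k * (k - 1))  # mean off-diagonal correlation
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

A redundant near-copy of an informative OTU barely increases the merit, because its extra relevance is cancelled by its high correlation with the feature already selected.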

17.
Journal of the Indian Chemical Society, 2023, 100(4): 100951
This work deals with the design, synthesis and characterization of a series of 6-substituted-4-hydroxy-1-(2-substitutedthiazol-4-yl)quinolin-2(1H)-one derivatives [III(a-d)(1–3)] and with the evaluation of their in vitro anticancer activity against the MDA-MB (breast cancer) and A549 (lung cancer) cell lines using the MTT assay, and of their in vitro antibacterial activity by measuring the zone of inhibition and determining the minimum inhibitory concentration (MIC). All synthesized compounds were characterized by UV, IR, ¹H NMR and ¹³C NMR spectral data. Molecular docking studies of the title compounds were carried out using Molegro Virtual Docker (MVD-2013, 6.0). The synthesized compounds exhibited well-conserved hydrogen bond interactions with one or more amino acid residues in the active pocket of the EGFRK tyrosine kinase domain (PDB ID: 1m17), used for docking related to anticancer activity, and of the S. aureus DNA gyrase domain complexed with a ciprofloxacin inhibitor (PDB ID: 2XCT), used for antibacterial docking. All synthesized derivatives were more potent against the A549 (lung cancer) cell line than against the MDA-MB (breast cancer) cell line. Compound 2-(4-(4-hydroxy-6-methyl-2-oxoquinolin-1(2H)-yl)thiazol-2-yl)hydrazin-1-ium iodide (IIId-2) was the most cytotoxic of the synthesized derivatives, with an IC50 value of 346.12 μg/mL against the A549 cell line; however, all synthesized derivatives were poor antibacterial agents compared with the standard ciprofloxacin. Thus, the synthesized derivatives have the potential to bind some residues of the active site and can be further developed into potential pharmacological agents.

18.
Cell wall lytic enzymes, an important biotechnological tool in drug development, agriculture and the food industry, have attracted increasing research attention, and their accurate identification is one of the key, fundamental tasks in this research. In this study, to overcome the inefficiency of in vitro experiments, a support vector machine (SVM) based cell wall lytic enzyme identification model was constructed using bioinformatics. The machine learning process includes feature extraction, feature selection, model training and optimization. In the jackknife cross-validation test, the model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and has a powerful ability to identify cell wall lytic enzymes. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to build a user-friendly web server, CWLy-SVM, for identifying cell wall lytic enzymes, available at http://server.malab.cn/CWLy-SVM/index.jsp.
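The reported evaluation has two parts: jackknife (leave-one-out) testing, where each sample is predicted by a model trained on all the others, and the standard confusion-matrix metrics. A minimal sketch follows. `fit_predict` is a hypothetical callable standing in for training and applying the SVM.

```python
import math

def jackknife_predictions(X, y, fit_predict):
    """Leave-one-out: predict each sample with a model trained on all other samples."""
    preds = []
    for i in range(len(y)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(fit_predict(train_X, train_y, X[i]))
    return preds

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and Matthews correlation coefficient (MCC)."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc
```

For example, with TP = 8, FN = 2, TN = 9 and FP = 1, sensitivity is 0.8, specificity is 0.9 and MCC = 70 / sqrt(9900) ≈ 0.704.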

19.
Journal of the Indian Chemical Society, 2023, 100(5): 100981
In this study, to obtain biologically active compounds, a series of anti-glyoximehydrazone ligands bearing vic-dioxime, hydrazone and pyrazole moieties, together with their (O•••H–O) bridged nickel(II), cobalt(II) and copper(II) complexes, were prepared. Molecular docking studies were then carried out on these ligands and their metal complexes to analyze their interactions with the EGFR kinase domain complexed with TAK-285 (PDB ID: 3POZ) and with the human androgen receptor T877A mutant (PDB ID: 2OZ7). In addition, the compounds were optimized at the B3LYP/6-311+G(d,p) level of density functional theory (DFT) to evaluate HOMO–LUMO contours and quantum chemical parameters, and bioactivity analyses were performed. The metal complexes had higher binding affinities toward 3POZ and 2OZ7. The most promising compounds for 3POZ were the nickel(II) and copper(II) complexes, whereas for the 2OZ7 target receptor the cobalt(II) and copper(II) complexes were the possible hit compounds. Furthermore, the cobalt(II) complex of ligand 2 was the most reactive compound, with the highest electrophilicity index (ω), indicating a stronger electrophilic character. All compounds were found to have moderate bioactivity. In conclusion, the nickel(II), cobalt(II) and copper(II) complexes could be powerful hit compounds for anticancer drug discovery studies.
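The quantum chemical parameters mentioned above, including the electrophilicity index ω, are commonly derived from the frontier orbital energies. The sketch assumes the Koopmans-type approximations I = −E_HOMO and A = −E_LUMO, with chemical potential μ = −(I + A)/2, hardness η = (I − A)/2 and ω = μ²/(2η). Some papers define η without the factor 1/2, which changes ω, so the study's exact convention may differ.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Global reactivity descriptors from frontier orbital energies (same units, e.g. eV)."""
    ionization = -e_homo                    # I ~ -E_HOMO
    affinity = -e_lumo                      # A ~ -E_LUMO
    mu = -(ionization + affinity) / 2       # chemical potential
    eta = (ionization - affinity) / 2       # chemical hardness
    omega = mu ** 2 / (2 * eta)             # electrophilicity index
    return {"gap": e_lumo - e_homo, "mu": mu, "eta": eta, "omega": omega}
```

For example, E_HOMO = −6 eV and E_LUMO = −2 eV give μ = −4 eV, η = 2 eV and ω = 4 eV. A larger ω corresponds to a stronger electrophile.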

20.
The position-specific scoring matrix (PSSM) is an excellent feature extraction method that was adopted early in protein classification prediction. However, because of the restrictions imposed by the shape of the PSSM, researchers have made many attempts to post-process it so that it can be fed into traditional machine learning algorithms. Such processing discards part of the information the PSSM provides and thus limits the feature representation. Moreover, the high dimensionality of the PSSM feature representation makes it incompatible with other feature extraction methods. We use the PSSM as the input of a recurrent neural network (RNN) without any post-processing, treating each amino acid in the protein sequence as one RNN time step. This takes full advantage of the information the PSSM provides: by feeding the PSSM directly into the model and fully using its internal information, we obtain an end-to-end solution that achieves state-of-the-art performance. Finally, we explore how to combine the PSSM with traditional feature extraction methods, which yields slightly improved performance. Our network architecture is implemented in Python and is available at https://github.com/YellowcardD/RNN-for-membrane-protein-types-prediction.
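Feeding a PSSM directly into an RNN works because a PSSM has one row of 20 scores per residue, so each row can be consumed as one time step, and sequences of any length produce a fixed-size final state. The NumPy forward pass below assumes a vanilla (Elman) RNN with a softmax output layer. The weights, hidden size and number of classes are placeholders, and the published model's cell type and architecture may differ.

```python
import numpy as np

def rnn_classify(pssm, Wx, Wh, b, Wo, bo):
    """Run a vanilla RNN over PSSM rows and return class probabilities.

    pssm: (L, 20) matrix, one row per residue, consumed one row per time step.
    """
    h = np.zeros(Wh.shape[0])
    for row in np.asarray(pssm, float):      # works for any sequence length L
        h = np.tanh(Wx @ row + Wh @ h + b)
    logits = Wo @ h + bo
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()
```

Unlike fixed-length summaries of the PSSM, no rows are averaged away: every residue's profile influences the final hidden state in order.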
