Similar literature
 Found 20 similar documents; search time: 15 ms
1.
A major drawback of current anti-cancer drugs is their lack of satisfactory specificity towards tumor cells. Although several cancer therapies exist, tumor homing peptides are gaining importance as therapeutic agents. The large number of therapeutic peptides generated in recent years calls for an effective and interpretable computational model that predicts tumor homing peptides rapidly and automatically. Therefore, a sequence-based approach, referred to herein as THPep, has been developed to predict and analyze tumor homing peptides using an interpretable random forest classifier in conjunction with amino acid composition, dipeptide composition and pseudo amino acid composition. An overall accuracy of 90.13% and a Matthews correlation coefficient of 0.76 were achieved on the independent test set of an objective benchmark dataset. In comparison, THPep was superior to the existing method and holds high potential as a useful tool for predicting tumor homing peptides. For the convenience of experimental scientists, a web server for the proposed method is publicly available at http://codes.bio/thpep/.
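
As a rough illustration of two of the descriptors named in the abstract above, the sketch below computes amino acid composition (20 values) and dipeptide composition (400 values) for one peptide. It is not the THPep implementation; the function names and the example sequence are assumptions made for illustration, and pseudo amino acid composition is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues in seq (20 values)."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among all adjacent pairs (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # pairs with non-standard residues are ignored
            counts[pair] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


peptide = "ACDCA"  # hypothetical sequence, not taken from any dataset
features = amino_acid_composition(peptide) + dipeptide_composition(peptide)
```

A 420-dimensional vector of this kind could then be passed to any classifier, such as a random forest.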

2.
Conotoxins are small, disulfide-rich peptide toxins with highly diverse sequences. Correctly identifying the types of ion channel-targeted conotoxins is important because they are considered optimal pharmacological candidates in drug design, owing to their ability to bind specifically to ion channels and interfere with neural transmission. Compared with other feature extraction methods, the reduced amino acid cluster (RAAC) performs better at simplifying protein complexity and identifying functionally conserved regions. Thus, in our study, 673 RAACs generated from 74 types of reduced amino acid alphabet were comprehensively assessed to establish a state-of-the-art predictor for ion channel-targeted conotoxins. The results showed that Type 20, Cluster 9 (T = 20, C = 9) with tripeptide composition (N = 3) achieved the best accuracy, 89.3%; this reduction is based on the variance maximization algorithm for amino acid reduction. Further, ANOVA with incremental feature selection (IFS) was used for feature selection to improve prediction performance. Finally, cross-validation showed that the best overall accuracy was 96.4%, which is 1.8% higher than the best accuracy of previous studies. Based on the proposed predictor, a user-friendly webserver was established and can be accessed at http://bioinfor.imu.edu.cn/ictcraac.
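
The idea of a reduced amino acid alphabet followed by N-peptide composition can be sketched as follows. The five-cluster grouping below is a hypothetical example chosen for illustration; it is not the Type 20, Cluster 9 alphabet reported in the abstract, and the function names are assumptions.

```python
from collections import Counter
from itertools import product

# Hypothetical 5-cluster reduction of the 20 residues (illustration only)
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]
SYMBOLS = "".join(str(i) for i in range(len(CLUSTERS)))
RESIDUE_TO_SYMBOL = {aa: str(i) for i, group in enumerate(CLUSTERS) for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet (one symbol per cluster)."""
    return "".join(RESIDUE_TO_SYMBOL[aa] for aa in seq.upper())


def n_peptide_composition(seq, n=3):
    """Frequency of every length-n word over the reduced alphabet."""
    reduced = reduce_sequence(seq)
    windows = [reduced[i:i + n] for i in range(len(reduced) - n + 1)]
    counts = Counter(windows)
    words = ["".join(w) for w in product(SYMBOLS, repeat=n)]
    return {w: counts[w] / len(windows) for w in words}


composition = n_peptide_composition("AKDFAK", n=3)  # 5**3 = 125 features
```

With C clusters and word length N, the feature vector has C**N entries, which is why reducing the alphabet keeps tripeptide features tractable.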

3.
Determination of HIV-1 coreceptor usage is strongly recommended before starting coreceptor-specific inhibitors for HIV treatment. Currently, genotypic assays are the most attractive tools because they are more feasible than phenotypic assays. However, most prediction models were developed and validated on data sets of HIV-1 subtypes B and C. The present study aims to develop a powerful and reliable model, called HIVCoR, to accurately predict HIV-1 coreceptor usage for the CRF01_AE subtype. HIVCoR uses random forest and support vector machine as prediction models, together with amino acid compositions, pseudo amino acid compositions and relative synonymous codon usage frequencies as input features. An overall success rate of 93.79% was achieved in the external validation test on the objective benchmark dataset. Comparison results indicated that HIVCoR was superior to other bioinformatics tools and genotypic predictors. For the convenience of experimental scientists, a user-friendly webserver has been established at http://codes.bio/hivcor/.
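
Relative synonymous codon usage (RSCU), one of the feature types named above, is the observed count of a codon divided by the mean count of all codons encoding the same amino acid. The sketch below is a minimal illustration under stated assumptions: only two codon families are listed to keep it short, the input sequence is invented, and the function name is hypothetical. It does not reproduce the HIVCoR feature pipeline.

```python
from collections import Counter

# Two synonymous codon families from the standard genetic code (RNA alphabet).
# A full implementation would list every amino acid; this subset is for brevity.
SYNONYMOUS_CODONS = {
    "Lys": ["AAA", "AAG"],
    "Asp": ["GAU", "GAC"],
}


def rscu(coding_seq):
    """RSCU per codon: count * family_size / total count of the family."""
    usable = len(coding_seq) - len(coding_seq) % 3  # drop a trailing partial codon
    codons = [coding_seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMOUS_CODONS.values():
        total = sum(counts[c] for c in family)
        for codon in family:
            values[codon] = counts[codon] * len(family) / total if total else 0.0
    return values


usage = rscu("AAAAAAAAGGAU")  # codons: AAA, AAA, AAG, GAU (invented example)
```

A value above 1 means the codon is used more often than expected under uniform synonymous usage, and a value below 1 means it is used less often.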

4.
DNA replication plays a crucial part in biological inheritance, ensuring an even flow of genetic information from parent to offspring. The site where DNA replication begins, called the origin of replication (ORI), plays a significant role in understanding the molecular mechanisms and genomic analysis of DNA. Hence, accurately identifying the origin of replication is essential for a better understanding of the biochemical and genomic properties of DNA. In this paper, we propose a new approach named OriC-ENS that uses sequence-based feature extraction techniques (K-mer, K-gapped Mono-Di, and K-gapped Di-Mono) and an ensemble classification technique based on majority voting to identify origins of replication. We use three SVM classifiers, one for the K-mer features and one each for the K-gapped Mono-Di and K-gapped Di-Mono features. Finally, we use majority voting to combine the predictions of the three classifiers. Experimental results on the S. cerevisiae dataset show that our method achieves an accuracy of 91.62%, which outperforms other state-of-the-art methods by a significant margin. We also evaluated our method with other metrics, namely Matthews correlation coefficient (MCC), area under the curve (AUC), sensitivity, and specificity, for which it achieved scores of 0.83, 0.98, 0.90, and 0.92, respectively. We further evaluated our model on an independent test set collected from OriDB, consisting of sequences of Schizosaccharomyces pombe, where our model predicted origins of replication efficiently and with high precision. Our Python-based source code is available at https://github.com/MehediAzim/OriC-ENS.
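
The majority-voting step described above can be sketched in a few lines. This is a generic illustration, not code from the OriC-ENS repository; the three prediction lists are invented stand-ins for the outputs of the three SVM classifiers.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length label lists, one per classifier.
    With an odd number of binary classifiers a tie cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three classifiers on three sequences (1 = ORI, 0 = non-ORI)
kmer_svm = [1, 0, 1]
mono_di_svm = [1, 1, 0]
di_mono_svm = [0, 0, 1]

ensemble = majority_vote([kmer_svm, mono_di_svm, di_mono_svm])
```

Each sample receives the label chosen by at least two of the three classifiers.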

5.
6.
Cell wall lytic enzymes, an important biotechnical tool in drug development, agriculture and the food industry, have attracted increasing research attention. Accurate identification of cell wall lytic enzymes is one of the key and fundamental tasks in this field. In this study, to avoid the inefficiency of in vitro experiments, a support vector machine-based model for identifying cell wall lytic enzymes was constructed using bioinformatics. The machine learning process includes feature extraction, feature selection, model training and optimization. In the jackknife cross-validation test, the model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and identifies cell wall lytic enzymes effectively. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to construct a user-friendly web server called CWLy-SVM to identify cell wall lytic enzymes, which is available at http://server.malab.cn/CWLy-SVM/index.jsp.
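
For reference, sensitivity, specificity and MCC follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not the counts behind the figures reported above.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts: 8 true positives, 1 false positive, 9 true negatives, 2 false negatives
sens, spec, mcc = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

MCC uses all four cells, so it stays informative on imbalanced datasets where accuracy alone can mislead.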

7.
A non-coding RNA (ncRNA) is a kind of RNA that is not translated into protein; however, it is involved in many biological processes, diseases, and cancers. Numerous ncRNAs have been identified and classified with high-throughput sequencing technology. Hence, accurate prediction of ncRNA classes is important and necessary for further study of their functions. Several computational techniques have been employed to predict the class of ncRNAs. Recent classification methods use the secondary structure as their primary input. However, the computational tools for predicting RNA secondary structure are not accurate enough, which affects the final performance of ncRNA predictors. In this paper, we propose a simple yet efficient method, called ncRDeep, for ncRNA prediction. It uses a simple convolutional neural network and RNA sequence information only. ncRDeep was evaluated on benchmark datasets, and the comparison results showed that it significantly outperforms the state-of-the-art methods. More specifically, the average accuracy was improved by 8.32%. Finally, we built a freely accessible web server for ncRDeep at http://home.jbnu.ac.kr/NSCL/ncRDeep.htm.
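
A convolutional network that consumes raw sequence needs a numeric encoding of each nucleotide. The abstract does not state which encoding ncRDeep uses, so the one-hot scheme, the fixed padding length and the function name below are assumptions shown only to illustrate the general approach.

```python
NUCLEOTIDES = "ACGU"


def one_hot_encode(seq, length):
    """Encode an RNA sequence as a length x 4 matrix of 0/1 values.

    Sequences are truncated or zero-padded to `length`; symbols outside
    A, C, G, U (for example N) become all-zero rows. DNA-style T is read as U.
    """
    seq = seq.upper().replace("T", "U")[:length]
    rows = [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]
    rows += [[0] * len(NUCLEOTIDES) for _ in range(length - len(seq))]
    return rows


matrix = one_hot_encode("ACGUN", length=8)  # invented sequence, padded to 8 positions
```

The resulting fixed-size matrix can be fed to a 1D convolution layer, with the four nucleotide columns acting as input channels.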

8.
Protein-ligand docking is an essential process that has accelerated drug discovery. Accurately and efficiently optimizing the position and orientation of ligands in the binding pocket of a target protein is a major challenge. This paper proposes a novel ligand binding pose search method called FWAVina, which combines the fireworks algorithm (FWA) with the efficient Broyden-Fletcher-Goldfarb-Shanno local search method adopted in AutoDock Vina to address the pose search problem in docking. The FWA is used as a global optimizer to rapidly search for promising poses, and the Broyden-Fletcher-Goldfarb-Shanno method is incorporated into FWAVina to perform an exact local search. FWAVina was developed and tested on the PDBbind and DUD-E datasets, and its docking performance was compared with that of the original Vina program. The results showed that FWAVina reduces execution time by more than 50% relative to Vina without compromising prediction accuracy in the docking and virtual screening experiments. In addition, an increase in the number of ligand rotatable bonds has almost no effect on the efficiency of FWAVina. Its higher accuracy, faster convergence and improved stability make FWAVina a better choice of docking tool for computer-aided drug design. The source code is available at https://github.com/eddyblue/FWAVina/.

9.
MicroRNAs (miRNAs) have been shown to play an indispensable role in many fundamental biological processes, and the dysregulation of miRNAs is closely correlated with complex human diseases. Many studies have focused on the prediction of potential miRNA-disease associations. Considering the insufficient number of known miRNA-disease associations and the poor performance of many existing prediction methods, a novel model combining gradient boosting decision tree with logistic regression (GBDT-LR) is proposed to prioritize miRNA candidates for diseases. To balance positive and negative samples, GBDT-LR first uses k-means clustering to screen negative samples from unknown miRNA-disease associations. Then, the gradient boosting decision tree (GBDT) model, which has an intrinsic advantage in finding distinguishing features and feature combinations, is applied to extract features. Finally, the new features extracted by the GBDT model are input into a logistic regression (LR) model to predict the final miRNA-disease association score. The experimental results show that GBDT-LR achieves an average AUC of 0.9274 in 5-fold cross-validation (CV). In addition, in the case studies, 90%, 94% and 88% of the top 50 miRNAs potentially associated with colon cancer, gastric cancer, and pancreatic cancer, respectively, were confirmed by databases. Compared with three other state-of-the-art methods, GBDT-LR achieves the best prediction performance. The source code and dataset of GBDT-LR are freely available at https://github.com/Pualalala/GBDT-LR.

10.
11.
12.
A growing number of people suffer from colorectal cancer, which is one of the most common cancers. It is essential to diagnose and treat the cancer as early as possible. The disease may change the microbial communities in the gut, so gut microorganisms could be an efficient means of predicting colorectal cancer. In this study, we selected operational taxonomic units that include several kinds of microorganisms to predict colorectal cancer. To find the most important microorganisms and obtain the best prediction performance, we explored effective feature selection methods in three main steps. First, we used a single method to reduce features. Next, to reduce the number of features further, we integrated the dimension reduction methods correlation-based feature selection and maximum relevance–maximum distance (MRMD 1.0 and MRMD 2.0). Then, we selected the important features according to the taxonomy files. We created training and test sets to obtain a more objective evaluation. Random forest, naïve Bayes, and decision tree classifiers were evaluated. The results show that the methods proposed in this study are better than hierarchical feature engineering. The proposed method, which combines correlation-based feature selection with MRMD 2.0, performed best on the CRC2 dataset. The dataset and methods can be found at http://lab.malab.cn/data/microdata/data.html.

13.
Phytoconstituents of the Arisaema genus were screened against herpes viruses (HSV-1 and HSV-2) using in silico approaches. Human HSV-1 and HSV-2 are responsible for cold sores and genital herpes, respectively. Two drug targets, namely thymidine kinase (TK; PDB: 2ki5) and serine protease (PDB: 1at3), were selected for HSV-1 and HSV-2. Initially, a molecular docking tool was employed to screen for top-hit phytoconstituents against herpes infections. ADME-T studies of the top-ranked hits were then carried out to assess their effectiveness. Next, molecular dynamics studies were performed to examine the stability of the ligands. Glide scores and binding interactions of the phytoconstituents were compared with those of Acyclovir, the main drug used in the treatment of HSV; the screened top hits exhibited higher glide scores and better binding for both HSV-1 and HSV-2 receptors. Additionally, ADME-T properties were within an ideal range for the top hits, and the molecular dynamics results indicated stability of the models. Overall, the work reveals the three most promising hits for the HSV-1 (39, 21, 19) and HSV-2 (20, 51, 19) receptors, which can be explored further in wet lab experiments as promising agents against HSV infections.

14.
Single-cell technology is a powerful tool for revealing intercellular heterogeneity and discovering cellular developmental processes. When analyzing the complexity of cellular dynamics and variability, it is important to construct a pseudo-time trajectory from single-cell expression data to reflect the process of cellular development. Although a number of computational and statistical methods have recently been developed for single-cell analysis, more effective and efficient methods are still strongly needed. In this work we propose a new method named SCOUT for inferring single-cell pseudo-time ordering with bifurcating trajectories. We first use a fixed-radius near neighbors algorithm based on cell densities to find landmarks that represent the cell states, and employ the minimum spanning tree (MST) to determine the developmental branches. We then use the projection of the Apollonian circle or a weighted distance to determine the pseudo-time trajectories of single cells. The proposed algorithm is applied to one synthetic and two real single-cell datasets (including single-branching and multi-branching trajectories), and the cellular developmental dynamics are recovered successfully. Numerical results show that, compared with other popular methods, our proposed method generates more robust and accurate pseudo-time trajectories. The method is implemented in Python and the code is available at https://github.com/statway/SCOUT.
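
The role of the minimum spanning tree in finding branches can be illustrated with Prim's algorithm on a handful of landmark points. This is a generic sketch and not the SCOUT code; the two-dimensional landmark coordinates are invented, and landmark detection itself is not shown.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns the tree as a list of (i, j) index pairs, grown from point 0.
    """
    in_tree = {0}
    edges = []
    while len(in_tree) < len(points):
        # Shortest edge that connects the current tree to a new point
        _, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(len(points))
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Hypothetical landmarks (cell states) in a 2D embedding
landmarks = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.1), (1.0, 1.2)]
tree = minimum_spanning_tree(landmarks)

# A landmark touching three or more tree edges is a candidate branch point
degree = {k: sum(k in edge for edge in tree) for k in range(len(landmarks))}
```

Here landmark 1 has degree 3, so the tree bifurcates there into two branches.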

15.
In silico methods play an essential role in modern drug discovery. Virtual screening, an in silico method, is used to narrow down the chemical space on which actual wet lab experiments need to be conducted. Ligand-based virtual screening is a computational strategy in which a model of the target protein is built from knowledge of the ligands that bind successfully to the target. This model is then used to predict whether a new molecule is likely to bind to the target. Support vector machine (SVM), a supervised learning algorithm used for classification, can be applied to virtual screening of ligand data. When used for virtual screening, SVM can produce interesting results. However, because ligand data sets are huge, the time taken to train the SVM model is quite high compared with other learning algorithms. Parallelizing these algorithms on multi-core processors can expedite these discoveries. In this paper, a GPU-based, ligand-based virtual screening tool (GpuSVMScreen) that uses SVM is proposed and benchmarked. This data-parallel virtual screening tool provides high throughput by running in a short time. The proposed GpuSVMScreen can also successfully screen large numbers of molecules (billions). The source code of this tool is available at http://ccc.nitc.ac.in/project/GPUSVMSCREEN.

16.
There are over 2.5 million publicly available gene expression samples across 101,000 data series in NCBI's Gene Expression Omnibus (GEO) database. Because GEO's free-text metadata does not use standardised ontology terms to annotate the experimental type and sample type, the database remains difficult to harness computationally without significant manual intervention. In this work, we present an interactive R/Shiny tool called GEOracle that uses text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples, and perform differential expression analysis. We present applications of GEOracle to discover conserved signalling pathway target genes and to identify an organ-specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free-text metadata. Its effectiveness and applicability have been demonstrated by cross-validation and two real-life case studies. It opens up new avenues to unlock the gene regulatory information embedded inside large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle.

17.
Protein-ligand complexes perform specific functions, most of which are related to human diseases. The database of human disease-related protein-ligand structures (dbHDPLS) contains 8833 structures extracted from the Protein Data Bank (PDB) and other related databases. The database is annotated with comprehensive information on ligands and drugs, related human diseases and protein-ligand interactions, together with information on the protein structures. The database may be a reliable resource for structure-based drug target discovery, druggability prediction of protein-ligand binding sites, and the study of drug-disease relationships based on protein-ligand complex structures. It can be publicly accessed at http://DeepLearner.ahu.edu.cn/web/dbDPLS/.

18.
Predicting the binding of T cell receptors (TCRs) to epitopes plays a vital role in immunotherapy because it guides the development of therapeutic vaccines and cancer treatments. Many prediction methods have attempted to explain the relationship between TCR repertoires from different aspects, such as the V(D)J gene locus and the biophysical features of amino acid molecules, but extracting these features is time-consuming and the performance of these models is limited. Few studies have investigated how k-mers formed by adjacent amino acids in TCR sequences direct epitope recognition, and the specific mechanism of TCR-epitope binding is still unclear. Motivated by this, we present SETE (Sequence-based Ensemble learning approach for TCR Epitope binding prediction), a novel model for accurately predicting TCR-epitope binding. The model decomposes the CDR3β sequence into short amino acid chains as features and learns their patterns across different TCR repertoires with a gradient boosting decision tree algorithm. Experiments demonstrate that SETE is helpful in predicting the epitopes corresponding to TCRs and that it outperforms other state-of-the-art methods in predicting the epitope specificity of TCRs on the VDJdb data set. The source code has been uploaded to https://github.com/wonanut/SETE for academic use only.
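
Breaking a CDR3β sequence into short overlapping amino acid chains amounts to counting k-mers. The sketch below is a generic illustration rather than the SETE implementation; the choice k = 3 and the example sequence are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping window of k adjacent residues in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


cdr3 = "CASSLCASS"  # invented sequence used only to show the counting
counts = kmer_counts(cdr3, k=3)
```

Stacking such count vectors over a shared k-mer vocabulary gives a feature matrix that a gradient boosting classifier can consume.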

19.
20.
Genotype plays a significant role in determining the characteristics of an organism, and genotype calling has been greatly accelerated by sequencing technologies. However, most parametric statistical models are unable to call genotypes effectively, because genotype calling is influenced by the size of structural variations and the coverage fluctuations of sequencing data. In this study, we propose a new method, called Cnngeno, for calling the genotypes of deletions from next-generation sequencing data. Cnngeno converts sequencing data into images and classifies the genotypes from these images using a convolutional neural network (CNN). Moreover, Cnngeno adopts a convolutional bootstrapping strategy to improve robustness to noisy labels. The results show that Cnngeno achieves better precision in genotype calling than other existing methods. Cnngeno is an open-source method, available at https://github.com/BRF123/Cnngeno.
