Found 20 similar articles (search time: 15 ms)
1.
Background: Discovering possible drug-target interactions (DTIs) is a decisive step in detecting the effects of drugs and in drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, prediction accuracy is still low, and protein sequences and structured data are rarely used together in the prediction process. Methods: This paper presents a DTI prediction model that takes advantage of the structured form of proteins and drugs. The model obtains features from protein amino-acid sequences using physical and chemical properties, and from drug SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. Compared with existing methods under K-fold cross-validation, the proposed model, which is based on ensemble learning algorithms, provides more accurate results on both structure and feature data. Results: The proposed model is applied to two datasets: benchmark (feature-only) datasets and DrugBank (structure data) datasets. Light-Boost and ExtraTree, using structure and feature data, achieve 98 % accuracy and a 0.97 F-score, compared with 94 % and 0.92 achieved by existing methods. Moreover, the model can predict further, as yet undiscovered interactions, and hence can be used as a practical tool for drug repositioning. In a case study, the prediction model is applied to proteins known to be affected by coronaviruses in order to predict possible interactions between these proteins and existing drugs. The model is also applied to Covid-19-related drugs announced on DrugBank.
The results show that some drugs, such as DB00691 and DB05203, are predicted with 100 % accuracy to interact with the ACE2 protein. This protein is a self-membrane protein that enables Covid-19 infection. Hence, the model can be used as an effective tool in drug repositioning to predict possible drug treatments for Covid-19.
2.
The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, knowing the location of disulfide bonds can strongly reduce the conformational search space by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposes a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence- and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, the sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. A comparison of the features employed in this study with those used in the existing nearest neighbor algorithm (NNA) method shows that the features used in this study yield an improvement of about 7.39 % in jackknife validation balanced accuracy. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29 % and 94.20 %, respectively.
Altogether, our predictor achieves an improvement of 43.25 % based on balanced accuracy compared to the existing NNA-based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as to improve the accuracy of the aiPSP method, which can further aid experimental studies of the disulfide bond and structure determination.
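The balanced accuracy figures quoted in the abstract above can be illustrated with a minimal sketch. It assumes the common definition (the mean of sensitivity and specificity for a binary task); the abstract does not state which definition diSBPred uses, and the function name and the label encoding (1 = bonding, 0 = non-bonding) are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumed encoding (hypothetical): 1 = bonding, 0 = non-bonding.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of bonding cases recovered
    specificity = tn / (tn + fp)  # fraction of non-bonding cases recovered
    return (sensitivity + specificity) / 2


# Made-up labels: 3 bonding and 7 non-bonding examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

Unlike plain accuracy, this measure weights both classes equally, which matters when non-bonding cysteines far outnumber bonding ones.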
3.
Gene selection from microarray data for cancer classification--a machine learning approach (total citations: 1; self-citations: 0; citations by others: 1)
Wang Y Tetko IV Hall MA Frank E Facius A Mayer KF Mewes HW 《Computational Biology and Chemistry》2005,29(1):1384-46
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contain a small number of samples, each with a large number of gene expression levels as features. Selecting the relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
4.
Protein inference from the identified peptides is of primary importance in shotgun proteomics. The goal of protein inference is to determine whether each candidate protein is truly present in the sample. To date, many computational methods have been proposed to solve this problem. However, there is still no method that can fully utilize the information hidden in the input data. In this article, we propose a learning-based method named BagReg for protein inference. The method first manually extracts five features from the input data, and then chooses each feature in turn as the class feature to build separate models that predict the presence probabilities of proteins. Finally, the weak results from the five prediction models are aggregated to obtain the final result. We test our method on six publicly available data sets. The experimental results show that our method is superior to the state-of-the-art protein inference algorithms.
5.
This study aimed at in silico screening of ssDNA aptamers against Escherichia coli O157:H7 by combining machine learning with the PseKNC approach. First, a total of 47 validated ssDNA aptamers and 498 random DNA sequences were used as positive and negative training data, respectively. The sequences were then converted to numerical vectors with the PseKNC method through the Pse-in-one 2.0 web server. The numerical vectors were subsequently classified by the SVM, ANN and RF algorithms available in Orange 3.2.0 software. The performance of the tested models was evaluated using cross-validation, random sampling and ROC curve analyses. The initial results demonstrated that the ANN and RF algorithms perform adequately for the data classification. To improve the performance of these classifiers, the positive training data were triplicated and the models were re-trained. The results confirmed that increasing the data size had a significant effect on classification accuracy, especially for the RF model. Subsequently, the RF algorithm, with an accuracy of 98%, was selected for aptamer screening. The thermodynamic details of the folding process and the secondary structures of the screened aptamers were also considered as final evaluations. The results confirmed that the aptamers selected by the proposed method had appropriate structural properties and that there is no thermodynamic limit on aptamer folding.
6.
A major drawback of current anti-cancer drugs is the lack of satisfactory specificity towards tumor cells. Although several cancer therapies exist, tumor homing peptides are gaining importance as therapeutic agents. In this regard, the huge number of therapeutic peptides generated in recent years calls for an effective and interpretable computational model for rapidly and automatically predicting tumor homing peptides. Therefore, a sequence-based approach, referred to herein as THPep, has been developed to predict and analyze tumor homing peptides by using an interpretable random forest classifier in conjunction with amino acid composition, dipeptide composition and pseudo amino acid composition. An overall accuracy and Matthews correlation coefficient of 90.13% and 0.76, respectively, were achieved on the independent test set of an objective benchmark dataset. Upon comparison, THPep was found to be superior to the existing method and holds high potential as a useful tool for predicting tumor homing peptides. For the convenience of experimental scientists, a web server for the proposed method is provided publicly at http://codes.bio/thpep/.
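Two of the sequence descriptors named in the abstract above, amino acid composition and dipeptide composition, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the THPep implementation: the function names, the alphabetical residue ordering and the example peptide are hypothetical, and pseudo amino acid composition is omitted.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 fractions: occurrences of each standard residue / sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Non-standard letters are not counted, so the fractions may sum to less than 1.
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions: occurrences of each adjacent residue pair / number of pairs."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]  # overlapping window of two residues
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(a + b, 0) / total for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical toy peptide, used only to show the shape of the output.
aac = amino_acid_composition("ACDCA")
dpc = dipeptide_composition("ACDCA")
```

The two lists can be concatenated into a single 420-dimensional feature vector and passed to any classifier, such as a random forest.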
7.
8.
In order to understand the molecular mechanism underlying any disease, knowledge about the interacting proteins in the disease pathway is essential. The number of known protein-protein interactions (PPIs) is still very limited compared to the available protein sequences of different organisms. Although experiment-based high-throughput technologies provide some data about these interactions, the data are often fairly noisy. Computational techniques for predicting protein–protein interactions are therefore of considerable importance. A total of 1296 binary fingerprints that encode a combination of structural and geometric properties were developed using the crystallographic data of 15,000 protein complexes in the PDB server. In a case study, these fingerprints were created for proteins implicated in type 2 diabetes mellitus. The fingerprints were input into an SVM-based model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC value of 0.78) on an external data set composed of proteins retrieved via text mining of diabetes-related literature. A PPI network was constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease-related proteins, functional annotation and other proteomics studies.
9.
The literature contains over fifty years of accumulated methods proposed by researchers for predicting the secondary structures of proteins in silico. A large part of this collection consists of artificial neural network-based approaches, a field of artificial intelligence and machine learning that is gaining increasing popularity in various application areas. The primary objective of this paper is to summarize works that are important but scattered over time, to help new researchers gain a clear view of the domain in a single place. An informative introduction to protein secondary structure and artificial neural networks is also included for context. This review will be valuable in designing future methods to improve protein secondary structure prediction accuracy. The various neural network methods found in this problem domain employ varying architectures and feature spaces, and a handful stand out due to significant improvements in prediction. Neural networks with larger feature scope and higher architectural complexity have been found to produce better protein secondary structure prediction. The current prediction accuracy lies around the 84% mark, leaving much room for further improvement in the prediction of secondary structures in silico. The estimated limit of 88% prediction accuracy has not yet been reached; hence, further research is timely.
10.
11.
Xiaoye Jin Yuluo Liu Yuanyuan Zhang Yongle Li Dr. Chuanliang Chen Dr. Hongdan Wang 《Electrophoresis》2021,42(14-15):1473-1479
Extensive population data for the 30 deletion/insertion polymorphisms (DIPs) of the Investigator DIPplex kit in different continental populations have been reported. Here, we assessed the genetic distributions of these 30 DIPs in different continental populations to pinpoint candidate ancestry-informative DIPs. In addition, the effectiveness of machine learning methods for ancestry analysis was explored. Pairwise informativeness (In) values of the 30 DIPs revealed that six loci displayed relatively high In values (>0.1) among different continental populations. Moreover, more loci showed high population-specific divergence (PSD) values in the African population. Based on the pairwise In and PSD values of the 30 DIPs, 17 DIPs in the Investigator DIPplex kit were selected for ancestry analyses of African, European, and East Asian populations. Even though the 30 DIPs provided better ancestry resolution of these continental populations based on the results of PCA and population genetic structure, we found that the 17 DIPs could also distinguish these continental populations. More importantly, these 17 DIPs possessed more balanced cumulative PSD distributions in these populations. Six machine learning methods were used to perform ancestry analyses of these continental populations based on the 17 DIPs. The results revealed that naïve Bayes showed the best performance, whereas k-nearest neighbor showed relatively low performance. To sum up, these machine learning methods, especially naïve Bayes, could serve as valuable tools for ancestry analysis.
12.
Structural Chemistry - Repurposing of ‘old’ drugs to treat both common and rare diseases has garnered huge attention of the researchers because of the high attrition rates and...
13.
Schroeter TS Schwaighofer A Mika S Ter Laak A Suelzle D Ganzer U Heinrich N Müller KR 《Journal of computer-aided molecular design》2007,21(9):485-498
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and of how faithfully the individual error bars represent the actual prediction error.
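The ensemble-based error bars mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that the spread of the individual ensemble members' predictions is used as the error bar; the abstract does not say how the authors derive their Random Forest error bars, and the function name and numbers are hypothetical.

```python
from statistics import mean, stdev


def ensemble_prediction(member_predictions):
    """Return (mean, sample standard deviation) of per-member predictions.

    The mean serves as the ensemble prediction and the standard deviation
    as a simple error bar (an assumption, not the paper's stated recipe).
    """
    if len(member_predictions) < 2:
        raise ValueError("at least two ensemble members are required")
    return mean(member_predictions), stdev(member_predictions)


# Made-up solubility predictions from three ensemble members for one compound.
prediction, error_bar = ensemble_prediction([1.0, 2.0, 3.0])
```

A large error bar signals that the ensemble members disagree, which suggests the compound may lie outside the model's domain of applicability.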
14.
Schroeter TS Schwaighofer A Mika S Ter Laak A Suelzle D Ganzer U Heinrich N Müller KR 《Journal of computer-aided molecular design》2007,21(12):651-664
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and of how faithfully the individual error bars represent the actual prediction error.
15.
Maha Abdallah Alnuwaiser M. Faisal Javed M. Ijaz Khan M. Waqar Ahmed Ahmed M. Galal 《Journal of the Indian Chemical Society》2022,99(7):100538
The current study investigates the potential of the well-known artificial neural network (ANN), support vector regression (SVR), multilinear and multi-nonlinear regression techniques to predict total dissolved solids (TSO) and electrical conductivity (ECO), which are essential water quality indicators. To develop the models, seven effective parameters (Ca²⁺, Mg²⁺, Na⁺, Cl⁻, SO₄²⁻, HCO₃⁻ and pH) were used as input variables. External validation criteria were employed to guard against model overfitting. The outcome of the study demonstrated a strong association between experimental and model-predicted data. The coefficient of determination was 0.97, 0.96, 0.92, and 0.94 for the SVR, ANN, MLR, and MNLR models, respectively. The lowest error values of 5.37 and 7.92 were attained by the SVR model for training and testing data, respectively. The performance of the proposed techniques showed the relative dominance of SVR over ANN, MLR and MNLR. Sensitivity analysis demonstrated that HCO₃⁻ is the most sensitive parameter for both TSO and ECO, followed by Cl⁻ and SO₄²⁻. Assessment of the models against external criteria ensured generalized results. In conclusion, the present research indicates that machine learning models for the prediction of water quality parameters are cost-effective and helpful in river water quality assessment, management and policy making.
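The coefficient of determination reported in the abstract above can be sketched as follows, assuming the usual definition R² = 1 − SS_res / SS_tot. The abstract does not state which definition the study used, and the function name and data values are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1 - ss_res / ss_tot


# Made-up measured and model-predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
```

A value close to 1 means the model explains most of the variance in the measured data; computing it on held-out data rather than training data is what guards against overfitting.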
16.
The movement of ions across the cell membrane is essential for many biological processes. This study focuses on ion channels and ion transporters (pumps), which act as border guards that control the incessant traffic of ions across cell membranes. Ion channels and ion transporters regulate membrane potential and electrical signaling and play important roles in cell proliferation, migration, apoptosis, and differentiation. Ion channels are found to differ significantly from ion transporters in their behavior. Therefore, a method for automatically classifying ion transporters and ion channels among membrane proteins is proposed, which trains deep neural networks using the position-specific scoring matrix profile as input. The key novelty is the three-stage approach: first, five techniques for data normalization are used; next, three imbalanced-data techniques are applied to the minority classes; and then, six classifiers are compared with the proposed method. © 2019 Wiley Periodicals, Inc.
17.
The calculation of contact-dependent secondary structure propensity (CSSP) has been reported to sensitively detect non-native β-strand propensities in the core sequences of amyloidogenic proteins. Here we describe a novel energy-based CSSP method implemented on dual artificial neural networks that rapidly and accurately estimates the potential for non-native secondary structure formation in local regions of protein sequences. In this method, we attempted to quantify long-range interaction patterns in diverse secondary structures by potential energy calculations and decomposition on a pairwise per-residue basis. The calculated energy parameters and seven-residue sequence information were used as inputs for artificial neural networks (ANNs) to predict sequence potential for secondary structure conversion. The trained single ANN using the >(i, i ± 4) interaction energy parameter exhibited 74% accuracy in predicting the secondary structure of test sequences in their native energy state, while the dual ANN-based predictor using (i, i ± 4) and >(i, i ± 4) interaction energies showed 83% prediction accuracy. The present method provides a simple and accurate tool for predicting sequence potential for secondary structure conversions without using 3D structural information.
18.
Engineering optimization is a topical goal in manufacturing and service industries. In this tutorial we present the concept of traditional parametric estimation models (Factorial Design (FD) and Central Composite Design (CCD)) for finding optimal parameter settings of technological processes. The 2D mapping method based on Auto Associative Neural Networks (ANN) (in particular, the Feed Forward Bottle Neck Neural Network (FFBN NN)) is then described in comparison with the traditional methods.
19.
C. Lucarelli P. Betto G. Ricciarello M. Giambenedetti F. Sciarra C. Tosti Croce P. L. Mottironi 《Chromatographia》1987,24(1):423-426
Summary: A dual-step procedure for the rapid, quantitative isolation of free catecholamines (norepinephrine, epinephrine and dopamine) from plasma, using a small column of CM-Sephadex and alumina adsorption, is described. Sensitive high performance liquid chromatography is also discussed, employing an amperometric detector for the quantitative determination. The recovery of the three catecholamines, and of N-methyldopamine used as the internal standard, was about 70–80%; the detection limits were 2 pg for norepinephrine, 3 pg for epinephrine and 3 pg for dopamine. The combination of the rather specific and easy-to-handle two-step sample clean-up procedure, the high resolving power of the chromatography and the high sensitivity of electrochemical detection provided a simple method for the determination of free catecholamines in plasma samples of normal and essential hypertensive subjects under different conditions (supine position for 45 min, standing for 5 and 10 min). It was found that a significant increase in epinephrine levels (P<0.01) occurred in hypertensive patients under the three conditions studied.
20.
Prostate cancer (PCa) is the second most frequently diagnosed cancer in men and is viewed as the fifth leading cause of death worldwide. The body mass index (BMI) is taken as a vital criterion to elucidate the association between obesity and PCa. In this study, systematic methods are employed to investigate how obesity influences the noncutaneous malignancies of PCa. By comparing the core signaling pathways of lean and obese patients with PCa, we are able to investigate the relationships between obesity and pathogenic mechanisms and identify significant biomarkers as drug targets for drug discovery. Regarding drug design specifications, we take drug–target interaction, drug regulation ability, and drug toxicity into account. A deep neural network (DNN)-based drug–target interaction (DTI) model is trained in advance for predicting drug candidates based on the identified biomarkers. By applying the DNN-based DTI model and considering the drug design specifications, we suggest two potential multiple-molecule drugs to prevent PCa (covering lean and obese PCa) and obesity-specific PCa, respectively. The proposed multiple-molecule drugs (apigenin, digoxin, and orlistat) not only help to prevent PCa by suppressing malignant metastasis, but also result in lower production of fatty acids and cholesterol, especially for obesity-specific PCa.