Similar literature
 Found 20 similar documents (search time: 62 ms)
1.
Proteins are among the most important molecules governing cellular processes in most living organisms. Understanding their various functions is of paramount importance to understanding the basics of life. Several supervised learning approaches have been applied in this field to predict protein function. In this paper, we propose ProtConv, a convolutional neural network based approach that predicts protein function by converting amino-acid sequences into two-dimensional images. We use a protein embedding technique based on transfer learning to generate the feature vector. The feature vector is then converted into a square, single-channel image that is fed into a convolutional network. The architecture combines convolutional filters and average pooling layers, followed by dense fully connected layers, to predict a binary function. We performed experiments on standard benchmark datasets from two important protein function prediction tasks: proinflammatory cytokines and anticancer peptides. Our experiments show that ProtConv achieves state-of-the-art performance on both datasets. Implementation details, source code, and datasets are available at: https://github.com/swakkhar/ProtConv.
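
The step of turning a feature vector into a square single-channel image can be sketched as follows. This is only an illustration, not the ProtConv implementation: the function name `vector_to_square_image` and the choice of zero-padding up to the next perfect square are assumptions, since the abstract does not say how vectors whose length is not a perfect square are handled.

```python
import math

import numpy as np


def vector_to_square_image(features):
    """Zero-pad a 1-D feature vector and reshape it to (side, side, 1).

    Hypothetical helper: zero-padding is an assumed way of handling
    lengths that are not perfect squares.
    """
    vec = np.asarray(features, dtype=np.float32).ravel()
    if vec.size == 0:
        raise ValueError("feature vector must not be empty")
    side = math.isqrt(vec.size)
    if side * side < vec.size:
        side += 1  # smallest side such that side * side >= vec.size
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: vec.size] = vec
    # Trailing axis of size 1 is the single image channel.
    return padded.reshape(side, side, 1)
```

A 10-element vector, for example, becomes a 4 x 4 x 1 array whose last six entries are zero padding; such an array could then be passed to any convolutional layer that expects channel-last input.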

2.
To design molecular electronic devices with high performance and stability, it is crucial to understand their structure-to-property relationships. Single-molecule break junction measurements yield a large number of conductance-distance traces, which are inherently highly stochastic. Here we propose a weakly supervised deep learning algorithm to classify and segment these conductance traces, a method based mainly on transfer learning with the pretrain-finetune technique. By exploiting the powerful feature extraction capabilities of the pretrained VGG-16 network, our convolutional neural network model not only achieves high accuracy in classifying the conductance traces, but also precisely segments the conductance plateau from an entire trace using very few manually labeled traces. Thus, we can produce a more reliable estimate of the junction conductance and quantify the junction stability. These findings show that our model achieves a better balance between accuracy and manual effort, opening up the possibility of using weakly supervised deep learning approaches in studies of single-molecule junctions.

3.
Feature extraction is essential for estimating the chemical properties of molecules with machine learning. Recently, graph neural networks have attracted attention for feature extraction from molecules. However, existing methods focus only on specific structural information, such as node relationships. In this paper, we propose a novel graph convolutional neural network that performs feature extraction while simultaneously considering multiple structures. Specifically, we propose feature extraction paths specialized in node, edge, and three-dimensional structures. Moreover, we propose an attention mechanism to aggregate the features extracted by the paths. The attention aggregation enables us to select useful features dynamically. The experimental results showed that the proposed method outperformed previous methods.
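
The idea of aggregating features from several extraction paths with attention can be illustrated by a minimal sketch. The abstract does not specify the form of the attention mechanism, so the dot-product scoring against a single `query` vector, and the function name `attention_aggregate`, are assumptions made purely for illustration.

```python
import numpy as np


def attention_aggregate(path_features, query):
    """Combine per-path feature vectors with softmax attention weights.

    path_features: array of shape (num_paths, dim), e.g. one row each for
                   the node, edge, and 3D-structure paths.
    query:         array of shape (dim,); a hypothetical learnable vector.
    Returns (aggregated_feature, weights).
    """
    feats = np.asarray(path_features, dtype=np.float64)
    q = np.asarray(query, dtype=np.float64)
    scores = feats @ q                 # one relevance score per path
    scores = scores - scores.max()     # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()  # softmax over paths
    return weights @ feats, weights
```

Because the weights depend on the features themselves, a path whose features align more strongly with the query receives a larger share of the aggregate, which is one way to read "selecting useful features dynamically".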

4.
5.
6.
Neural networks are rapidly gaining popularity in chemical modeling and Quantitative Structure–Activity Relationship (QSAR) studies thanks to their ability to handle multitask problems. However, the outcomes of neural networks depend on the tuning of several hyperparameters, small variations of which can strongly affect performance. Hence, optimization is a fundamental step in training neural networks, although in many cases it can be computationally very expensive. In this study, we compared four of the most widely used approaches for tuning hyperparameters, namely grid search, random search, tree-structured Parzen estimator, and genetic algorithms, on three multitask QSAR datasets. We focused mainly on parsimonious optimization, taking into account not only the performance of the neural networks but also the computational time. Furthermore, since the optimization approaches do not directly provide information about the influence of hyperparameters, we applied experimental design strategies to determine their effects on neural network performance. We found that genetic algorithms, tree-structured Parzen estimator, and random search require on average 0.08% of the hours required by grid search; in addition, tree-structured Parzen estimator and genetic algorithms provide better results than random search.

7.
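Metal oxide semiconductor (MOS) gas sensors, which offer small size, low power consumption, high sensitivity, and good compatibility with silicon processes, are now widely used in military applications, scientific research, and many sectors of the national economy. However, the low selectivity of MOS sensors hinders their application prospects in the Internet of Things (IoT) era. This paper therefore reviews research progress on improving the selectivity of MOS sensors. It mainly introduces three technical approaches, namely enhancing the performance of sensing materials, electronic noses, and thermal modulation, and describes the current problems of the three approaches and their future development trends. The paper also compares the mainstream pattern recognition/machine learning algorithms in machine olfaction: principal component analysis (PCA), linear discriminant analysis (LDA), and neural networks (NN). Finally, the review looks ahead to the application prospects in gas identification of convolutional neural network (CNN) deep learning algorithms, which offer data dimensionality reduction, feature extraction, and robust recognition and classification. Improved sensing materials, the combination of multiple modulation techniques with array technology, and the latest advances in deep learning algorithms in artificial intelligence (AI) will greatly enhance the ability of non-selective MOS sensors to identify volatile organic compound (VOC) molecules.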

8.
9.
Mass spectrometry imaging (MSI) is widely used for the label-free molecular mapping of biological samples. The identification of co-localized molecules in MSI data is crucial to the understanding of biochemical pathways. One of the key challenges in molecular colocalization is that complex MSI data are too large for manual annotation but too small for training deep neural networks. Herein, we introduce a self-supervised clustering approach based on contrastive learning, which shows excellent performance in clustering MSI data. We train a deep convolutional neural network (CNN) using MSI data from a single experiment, without manual annotations, to effectively learn high-level spatial features from ion images and classify them based on molecular colocalizations. We demonstrate that contrastive learning generates ion image representations that form well-resolved clusters. Subsequent self-labeling is used to fine-tune both the CNN encoder and the linear classifier based on confidently classified ion images. This new approach enables autonomous and high-throughput identification of co-localized species in MSI data, which will dramatically expand the application of spatial lipidomics, metabolomics, and proteomics in biological research.

Contrastive learning is used to train a deep convolutional neural network to identify high-level features in mass spectrometry imaging data. These features enable self-supervised clustering of ion images without manual annotation.

10.
11.
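杜卓锟, 邵伟, 秦伟捷. 《色谱》 (Chinese Journal of Chromatography), 2021, 39(3): 211-218
In proteomics research based on liquid chromatography-mass spectrometry, the retention time of a peptide serves as a characteristic parameter that effectively distinguishes different peptides, and it can be predicted from information such as the peptide's own sequence. Using predicted retention times to assist the identification of peptide sequences from mass spectrometry data can improve identification accuracy, so retention time prediction has long received wide attention in the field. Traditional retention time prediction methods typically calculate the physicochemical properties of a peptide from its amino acid sequence and then calculate the peptide's retention time under specific chromatographic conditions. In recent years, deep learning methods have made great progress and play an increasingly important role in proteomics research. A variety of deep-learning-based retention time prediction methods have been developed; compared with traditional methods, they are more accurate, are easy to use across platforms, and can predict the retention times of modified peptides. However, predictions for certain complex modifications, such as glycosylation, are still not accurate enough. Further improving prediction accuracy for modified peptides is an important research direction for deep-learning-based retention time prediction. The predicted retention times are applied to quality control and method evaluation in peptide identification, and are combined with predicted tandem mass spectra to build simulated spectral libraries. This article reviews the latest research progress and applications of deep learning methods in retention time prediction and discusses their development trends and future applications, with the aim of providing a reference for retention time prediction research and proteome identification.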

12.
Drug-likeness prediction is important for the virtual screening of drug candidates. It is challenging because drug-likeness is presumably associated with the whole set of properties necessary to pass through clinical trials, and thus no definite data for regression are available. Recently, binary classification models based on graph neural networks have been proposed, but their performance depends strongly on the choice of the negative set used for training. Here we propose a novel unsupervised learning model that requires only known drugs for training. We adopted a language model based on a recurrent neural network for unsupervised learning. Unlike such classification models, it showed relatively consistent performance across different datasets. In addition, the unsupervised learning model provides drug-likeness scores whose distributions are well separated, with mean values increasing in the order of datasets composed of molecules at later stages of the drug development process, whereas the classification model predicted a polarized distribution with two extreme values for all datasets, presumably due to overconfident prediction on unseen data. Thus, this new concept offers a pragmatic tool for drug-likeness scoring and can further be applied to other biochemical applications.

A new quantification method of drug-likeness based on unsupervised learning. The method uses only drug molecules as the training set, without any non-drug-like molecules.

13.
A variety of issues determine the efficiency of 3D QSAR methods, and their practical importance for drug design is still controversial. This concerns both the predictive ability and the possibility of indicating those areas within 3D molecular representations that are responsible for biological or chemical effects. Technically, the latter comes down to the selection or elimination of reliable variables during 3D QSAR modeling using the Partial Least-Squares (PLS) method. In this paper we used a series of benzoic acids to test the dependence between the predictive ability and the variable selection performance of PLS with Iterative Variable Elimination (IVE-PLS) in the Comparative Molecular Surface Analysis (CoMSA) modeling of the Hammett constant, which correlates with the pKa values. Modeling this chemical effect allowed us to select the IVE-PLS variant that plots contour maps indicating the carboxylic function, i.e., the region including the dissociation reaction center that determines the respective pKa values. In fact, it appeared that a novel robust IVE version is capable of indicating the proper contour plots independently of the method used for the calculation of partial atomic charges (AM1 or Gasteiger-Marsili).

14.
15.
16.
17.
The aim of this study was to propose a QSAR modelling approach based on the combination of simple competitive learning (SCL) networks with radial basis function (RBF) neural networks for predicting the biological activity of chemical compounds. The proposed QSAR method consisted of two phases. In the first phase, an SCL network was applied to determine the centres of an RBF neural network. In the second phase, the RBF neural network was used to predict the biological activity of various phenols and Rho kinase (ROCK) inhibitors. The predictive ability of the proposed QSAR models was evaluated and compared with that of other QSAR models using external validation. The results of this study showed that the proposed QSAR modelling approach leads to better performance than other models in predicting the biological activity of chemical compounds. This indicated the efficiency of simple competitive learning networks in determining the centres of RBF neural networks.
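
The two-phase scheme described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the winner-take-all update rule, the Gaussian basis with a single shared `width`, the least-squares fit of the output weights, and all function names and default values are illustrative choices that the abstract does not specify.

```python
import numpy as np


def scl_centres(X, n_centres, lr=0.1, epochs=50, seed=0):
    """Phase 1: simple competitive learning (winner-take-all) for centres."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise centres on randomly chosen training points (assumed choice).
    centres = X[rng.choice(len(X), size=n_centres, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(centres - x, axis=1))
            # Only the closest centre moves toward the sample.
            centres[winner] += lr * (x - centres[winner])
    return centres


def rbf_design(X, centres, width):
    """Gaussian RBF activations plus a constant bias column."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])


def fit_rbf(X, y, centres, width):
    """Phase 2: linear output weights by ordinary least squares."""
    w, *_ = np.linalg.lstsq(rbf_design(X, centres, width), y, rcond=None)
    return w


def predict_rbf(X, centres, width, w):
    return rbf_design(X, centres, width) @ w
```

In a QSAR setting, `X` would hold molecular descriptors and `y` the measured activities; as in the abstract, predictive ability would be assessed on an external validation set rather than on the training data.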

18.
19.
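Application of the adaptive fuzzy partial least squares method to drug structure-activity relationship modeling   Cited by: 2 (self-citations: 0, citations by others: 2)
As a local approximation method, the adaptive neuro-fuzzy inference system (ANFIS) is well suited to modeling quantitative structure-activity relationships (QSAR) of drugs. Drug molecular structures are described by many parameters, which are often coupled; this increases the difficulty of modeling and degrades the predictive performance of the model. To address this, ANFIS is combined with partial least squares (PLS): PLS first extracts components from the sample data, ANFIS then realizes the nonlinear mapping between each pair of components, and the extracted components are further corrected based on the output error so that they have optimal explanatory power for the dependent variable. The resulting method is termed EB-AFPLS. It has been successfully applied to QSAR modeling of HIV-1 protease inhibitors with good results, showing strong learning ability, and the predictive performance of the resulting model is better than that of other methods.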

20.
Precise information about protein locations in a cell facilitates understanding of the function of a protein and its interactions in the cellular environment. This information further helps in the study of specific metabolic pathways and other biological processes. We propose an ensemble approach called "CE-PLoc" for predicting subcellular locations based on the fusion of individual classifiers. The proposed approach utilizes features obtained from both dipeptide composition (DC) and amphiphilic pseudo amino acid composition (PseAAC) based feature extraction strategies. Different feature spaces are obtained by varying the dimensionality of PseAAC for a selected base learner. The performance of individual learning mechanisms, such as support vector machine, nearest neighbor, probabilistic neural network, and covariant discriminant, trained using PseAAC based features is analyzed first. Classifiers are then developed using the same learning mechanism but trained on PseAAC based feature spaces of varying dimensions. These classifiers are combined through a voting strategy, and an improvement in prediction performance is achieved. Prediction performance is further enhanced by developing CE-PLoc through the combination of different learning mechanisms trained on both the DC based feature space and PseAAC based feature spaces of varying dimensions. The predictive performance of the proposed CE-PLoc is evaluated on two benchmark datasets of protein subcellular locations using accuracy, MCC, and Q-statistics. Using the jackknife test, prediction accuracies of 81.47% and 83.99% are obtained for the 12 and 14 subcellular location datasets, respectively. In the independent dataset test, prediction accuracies are 87.04% and 87.33% for the 12 and 14 class datasets, respectively.
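
Two of the building blocks mentioned above, dipeptide composition features and classifier fusion by voting, can be illustrated with a short sketch. It is not the CE-PLoc code: the abstract says only that classifiers are "combined through voting strategy", so plain majority voting is an assumption, as are the function names and the decision to ignore dipeptides containing non-standard residues.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide composition of a protein sequence.

    Each entry is the fraction of overlapping residue pairs equal to one
    particular dipeptide; pairs with non-standard letters are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def majority_vote(predictions):
    """Fuse the labels predicted by several classifiers for one protein.

    Assumed fusion rule: the most frequent label wins; ties go to the
    label that appears first in the input.
    """
    return Counter(predictions).most_common(1)[0][0]
```

For the sequence "ACAC", for instance, the overlapping pairs are AC, CA, AC, so the AC entry is 2/3 and the CA entry is 1/3. In an ensemble, each base classifier would produce one label per protein and `majority_vote` would return the fused location.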
