Convolutional neural networks with image representation of amino acid sequences for protein function prediction
Affiliation:
1. Data Science Institute, NUI Galway, Galway, Ireland; 2. Insight Centre for Data Analytics, NUI Galway, Galway, Ireland; 1. DAIS, Università Ca’ Foscari di Venezia, Via Torino 155, Venezia Mestre 30172, Italy; 2. AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Comelico 39, Milano 20135, Italy; 3. ECLT, Università Ca’ Foscari di Venezia, San Marco 2940, Venezia 30124, Italy; 4. Dipartimento di Management, Università Ca’ Foscari di Venezia, San Giobbe, Cannaregio 873, 30121 Venezia
Abstract:
Proteins are among the most important molecules governing cellular processes in most living organisms, and understanding their functions is essential to understanding the basics of life. Several supervised learning approaches have been applied to predict protein function. In this paper, we propose ProtConv, a convolutional neural network based approach that predicts protein function by converting amino acid sequences into two-dimensional images. We use a protein embedding technique based on transfer learning to generate a feature vector, which is then converted into a square, single-channel image and fed into a convolutional network. The network architecture combines convolutional filters and average pooling layers, followed by fully connected dense layers, to predict a binary function. We performed experiments on standard benchmark datasets from two important protein function prediction tasks: proinflammatory cytokines and anticancer peptides. Our experiments show that the proposed method, ProtConv, achieves state-of-the-art performance on both datasets. Implementation details, source code, and datasets are available at: https://github.com/swakkhar/ProtConv.
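
The step of turning a feature vector into a square, single-channel image, and the average pooling applied to it, can be illustrated with a minimal sketch. This is not the ProtConv implementation: the function names `vector_to_square_image` and `average_pool_2x2`, the zero-padding used when the vector length is not a perfect square, and the 2x2 pooling window are all assumptions made for illustration. The abstract does not state how the published model handles these details.

```python
import math


def vector_to_square_image(features, pad_value=0.0):
    """Reshape a 1-D feature vector into a square, single-channel grid.

    Assumption: if the length is not a perfect square, the vector is
    padded at the end with pad_value up to the next perfect square.
    """
    n = len(features)
    side = math.isqrt(n)
    if side * side < n:
        side += 1  # round up to the next side length that fits all values
    padded = list(features) + [pad_value] * (side * side - n)
    # Fill the grid row by row.
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def average_pool_2x2(image):
    """Non-overlapping 2x2 average pooling over a 2-D grid.

    A trailing odd row or column is dropped (hypothetical choice).
    """
    if not image:
        return []
    rows = len(image) // 2
    cols = len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy "embedding" of length 10; a real protein embedding would be much longer.
embedding = [float(i) for i in range(10)]
image = vector_to_square_image(embedding)   # 4 x 4 grid, last 6 cells padded
pooled = average_pool_2x2(image)            # 2 x 2 grid
```

In a full model, the square grid would be passed to learned convolutional filters before pooling, and the pooled feature maps would be flattened into dense layers ending in a single binary output.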