Enzyme classification using multiclass support vector machine and feature subset selection
Institution: 1. Department of Computer Science and Engineering, Silicon Institute of Technology, Silicon Hills, Patia, Bhubaneswar, 751024, India;2. School of Computer Engineering, KIIT University, Bhubaneswar, 751024, India;1. Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA;1. SiSaf Ltd, Innovation Centre, Northern Ireland Science Park, Queen's Island, Belfast, BT3 9DT, UK;2. Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università di Palermo, Via Archirafi 32, 90123, Palermo, Italy;1. Dept. of Computing Science, ATH 3-55, University of Alberta, Edmonton, AB, Canada T6G 2E8;2. Robotics Institute, Smith Hall 211, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA;3. Dept. of Electrical & Computer Engineering and Centre of Intelligent Machines, McGill University, McConnell Engineering Building, Room 441, 3480 University Street, Montreal, QC, Canada H3A 2E9
Abstract: Proteins are the macromolecules responsible for almost all biological processes in a cell. With a large number of protein sequences now available from different sequencing projects, the challenge for scientists is to characterize their functions. Because wet-lab methods are time-consuming and expensive, many computational methods have been proposed, such as FASTA, PSI-BLAST, DNA microarray clustering, and nearest-neighbor classification on protein–protein interaction networks. The support vector machine (SVM) is one such method and has been used successfully for problems such as protein fold recognition and protein structure prediction. Cai et al. (2003) used SVMs to classify proteins into functional classes and thereby predict their function; they represented the protein sequences by the physico-chemical properties of the proteins. In this paper, a model comprising feature subset selection followed by a multiclass SVM is proposed to determine the functional class of a newly generated protein sequence. To train and test the model, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine which features contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used. Of the 32 properties considered initially, only 20 features are found to be sufficient to classify the proteins into their functional classes, with an accuracy ranging from 91% to 94%. A comparison shows that OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the features that most significantly affect protein function.
Keywords: Enzyme classification; Multiclass SVM; Sequential Forward Floating Selection (SFFS); Orthogonal Forward Selection (OFS); SVM Recursive Feature Elimination (SVM-RFE); Random Forest (RF)
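To illustrate one of the feature-selection methods named in the abstract, the following is a minimal Python sketch of Sequential Forward Floating Selection. It is not the authors' implementation. The subset criterion `score` is supplied by the caller; in a wrapper setting it would typically be a classifier's cross-validated accuracy (for example, that of the multiclass SVM), but that choice is an assumption here. `toy_score` is a hypothetical criterion, constructed so that the floating (backtracking) step matters.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# A subset criterion maps a set of feature indices to a score (higher is better).
Criterion = Callable[[FrozenSet[int]], float]


def sffs(n_features: int, target_size: int, score: Criterion) -> Tuple[FrozenSet[int], float]:
    """Sequential Forward Floating Selection (after Pudil et al., 1994).

    Returns the best subset of exactly `target_size` features found, with its score.
    """
    if not 1 <= target_size <= n_features:
        raise ValueError("target_size must be between 1 and n_features")

    current: FrozenSet[int] = frozenset()
    best_score: Dict[int, float] = {}         # best score seen for each subset size
    best_set: Dict[int, FrozenSet[int]] = {}  # subset achieving that score

    while len(current) < target_size:
        # Inclusion: add the feature whose addition gives the highest score.
        candidates = [f for f in range(n_features) if f not in current]
        added = max(candidates, key=lambda f: score(current | {f}))
        current = current | {added}
        s = score(current)
        k = len(current)
        if k not in best_score or s > best_score[k]:
            best_score[k], best_set[k] = s, current

        # Conditional exclusion: drop the least useful feature as long as the
        # reduced subset beats the best one previously recorded at that size.
        while len(current) > 2:
            dropped = max(current, key=lambda f: score(current - {f}))
            reduced = current - {dropped}
            r = score(reduced)
            if r > best_score[len(reduced)]:
                current = reduced
                best_score[len(reduced)], best_set[len(reduced)] = r, reduced
            else:
                break

    return best_set[target_size], best_score[target_size]


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical criterion: feature 0 alone is informative, but features 1 and 2
    together are more informative and make 0 redundant; feature 3 adds a little."""
    n12 = len(subset & {1, 2})
    a = 0.6 + 0.05 * n12 if 0 in subset else 0.0
    b = 1.0 if n12 == 2 else 0.3 * n12
    return max(a, b) + (0.01 if 3 in subset else 0.0)


selected, value = sffs(n_features=4, target_size=3, score=toy_score)
print(sorted(selected), round(value, 2))  # prints: [1, 2, 3] 1.01
```

Plain sequential forward selection with the same criterion would stop at {0, 1, 2} with a score of 1.0. The conditional exclusion step discards the redundant feature 0 and lets the search reach {1, 2, 3}.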
This article is indexed in ScienceDirect and other databases.