Predicting structural class for protein sequences of 40% identity based on features of primary and secondary structure using Random Forest algorithm
Affiliation: 1. Department of Computer Engineering, Faculty of Technology, Dharmsinh Desai University (D D University), Nadiad 387001, India; 2. Research and Development Center, Faculty of Technology, Dharmsinh Desai University, Nadiad 387001, India
Abstract: At present, the rate at which protein tertiary structures are determined lags far behind the rate at which new primary structures (sequences) are discovered. Predicting protein structural class with machine learning techniques can help narrow this gap. The Structural Classification of Proteins – extended (SCOPe 2.07) is the latest and largest such dataset currently available. Protein sequences sharing less than 40% identity with each other are used to predict the α, β, α/β and α + β SCOPe classes. Sensitive features are extracted from the primary and secondary structure representations of proteins. Secondary structure features are derived experimentally from the frequency, pitch and spatial arrangement of secondary structure elements. Primary structure based features include species information for each protein sequence, and these species parameters are further validated against the UniRef100 dataset using TaxId. Protein tertiary structure is known to be a manifestation of function, and functional differences are observed across species. Species are therefore expected to correlate strongly with structural class, and this correlation is demonstrated in the current work: including species information improves prediction accuracy by 7%–10%. A Random Forest classifier is trained on a subset of SCOPe 2.07 using a 65-dimensional feature vector, and testing on the remainder of the set gives a consistent accuracy above 95%. The accuracies achieved on the benchmark datasets ASTRAL 1.73, 25PDB and FC699 exceed 86%, 91% and 97%, respectively, which are the best reported to our knowledge.
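The abstract does not define its secondary structure features exactly. Purely as an illustration, the sketch below computes a few simple composition and segment statistics from a 3-state secondary structure string (H = helix, E = strand, C = coil). The 3-state alphabet, the function name `ss_features`, and every feature it returns are hypothetical assumptions, not the authors' 65-dimensional feature set; the "pitch" features mentioned in the abstract are not modelled at all.

```python
from collections import Counter
from itertools import groupby


def ss_features(ss: str) -> dict:
    """Hypothetical illustrative features from a 3-state string (H/E/C).

    Not the paper's feature definitions; assumes only H, E and C occur.
    """
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss)
    # Run-length encode into segments, e.g. "HHHCCEE" -> [("H", 3), ("C", 2), ("E", 2)]
    segments = [(state, len(list(run))) for state, run in groupby(ss)]

    feats = {}
    for state in "HEC":
        # Frequency (composition) of each state over the whole chain
        feats[f"frac_{state}"] = counts.get(state, 0) / n
        lengths = [length for s, length in segments if s == state]
        # Number of segments of this state and their mean length
        feats[f"n_seg_{state}"] = float(len(lengths))
        feats[f"mean_len_{state}"] = sum(lengths) / len(lengths) if lengths else 0.0

    # Crude arrangement proxy: order of helix/strand elements with coil removed,
    # counting how often a helix is followed by a strand or vice versa.
    elements = [s for s, _ in segments if s != "C"]
    feats["helix_strand_switches"] = float(
        sum(1 for a, b in zip(elements, elements[1:]) if a != b)
    )
    return feats


if __name__ == "__main__":
    # Helix, coil, strand, coil, helix: segments H4 C2 E4 C2 H4
    print(ss_features("HHHHCCEEEECCHHHH"))
```

For the example string, the helix fraction is 0.5 and there are two helix/strand switches (H → E → H). A per-sequence vector of values like these could then be combined with other features and passed to a Random Forest classifier; the switch count is included only because alternating helix/strand order is one intuitive way α/β and α + β folds might differ, not because the paper uses it.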
Keywords: Evolutionary; Low identity; Protein secondary structure; Protein structural class; Random Forest; Structural classification of proteins (SCOP)
This article is indexed in ScienceDirect and other databases.