Improving the accuracy of predicting disulfide connectivity by feature selection
Authors: Lin Zhu, Jie Yang, Jiang‐Ning Song, Kuo‐Chen Chou, Hong‐Bin Shen
Institution: 1. Department of Bioinformatics, Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China; 2. Department of Bioinformatics, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611‐0011, Japan; 3. Department of Bioinformatics, Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, California 92130
Abstract: Disulfide bonds are primary covalent cross‐links formed between two cysteine residues in the same or different polypeptide chains, and they play important roles in the folding and stability of proteins. However, computational prediction of disulfide connectivity directly from protein primary sequences is challenging, both because disulfide bonds are nonlocal in the context of the sequence and because the number of possible disulfide patterns grows exponentially as the number of cysteine residues increases. In previous studies, disulfide connectivity prediction was usually performed in a high‐dimensional feature space, which can cause a variety of problems in statistical learning, such as the curse of dimensionality, overfitting, and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. On the basis of this approach, we selected the most important features for predicting the connectivity pattern of intra‐chain disulfide bonds. Our results show that the high‐dimensional features contain redundant information, and that prediction performance can be further improved when these features are reduced to a lower‐dimensional, more compact space. Our results also indicate that global protein features contribute little to the formation and prediction of disulfide bonds, whereas local sequential and structural information plays an important role. All these findings provide important insights for structural studies of disulfide‐rich proteins. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
Keywords: protein structure prediction; disulfide connectivity prediction; support vector machine; feature selection; Fisher score
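
The keywords name the Fisher score, and the abstract describes ranking features by importance before prediction. The following is a minimal sketch of generic Fisher-score ranking, not necessarily the authors' exact procedure: for each feature it takes the ratio of between-class scatter to within-class scatter and keeps the top-scoring columns. The function names `fisher_scores` and `top_k_features` and the toy data are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Return one Fisher score per feature column of X, given class labels y.

    Score for feature j = sum_c n_c * (mean_cj - mean_j)^2
                          / sum_c n_c * var_cj
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # between-class scatter
        within = 0.0   # within-class scatter
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        # Convention assumed here: a feature with zero within-class
        # scatter gets score 0.0 rather than raising a division error.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy data (hypothetical): feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
selected = top_k_features(X, y, k=1)
```

In this toy case the first feature scores far higher than the second, so it is the one retained. A reduced feature set chosen this way would then be passed to a classifier such as the support vector machine listed in the keywords.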