The combinatorics and extreme value statistics of protein threading

Authors: John L. Spouge, Aron Marchler-Bauer, Stephen Bryant

Affiliation: (1) National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA

Abstract: In protein threading, one is given a protein sequence, together with a database of protein core structures that may contain the natural structure of the sequence. The object of protein threading is to correctly identify the structure(s) corresponding to the sequence. Since the core structures are already associated with specific biological functions, threading has the potential to provide biologists with useful insights about the function of a newly discovered protein sequence. Statistical tests for threading results based on the theory of extreme values suggest several combinatorial problems. For example, what is the number of ways $m' = \#_t\{L_i > x_i\}_{i=0}^{n}$ of choosing a sequence $\{X_i\}_{i=1}^{n}$ from the set $\{1, 2, \ldots, t\}$, subject to the difference constraints $\{L_i = X_{i+1} - X_i > x_i\}_{i=0}^{n}$, where $X_0 = 0$, $X_{n+1} = t + 1$, and $\{x_i\}_{i=0}^{n}$ is an arbitrary sequence of integers? The quantity $m'$ has many attractive combinatorial interpretations and, in special continuous limits, reduces to a probabilistic formula discovered by de Finetti. Just as many important probabilities can be derived from de Finetti's formula, many interesting combinatorial quantities can be derived from $m'$. Empirical results presented here show that the combinatorial approach to threading statistics appears promising, but that structural periodicities in proteins and energetically unimportant structural elements probably introduce statistical correlations that must be better understood.

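The definition of $m'$ can be checked directly on small cases. The Python sketch below is an illustration, not the authors' method, and its function names are hypothetical. It counts $m'$ by exhaustive enumeration straight from the definition (valid for any integers $x_i$), compares that with a stars-and-bars closed form $\binom{t - \sum_i x_i}{n}$ that holds only under the added assumption that every $x_i \ge 0$ (the general case with negative $x_i$ is not covered by it), and shows the count normalized by $\binom{t}{n}$ approaching $(1 - \sum_i a_i)_+^n$ as $t$ grows with $x_i \approx a_i t$. It assumes that the de Finetti formula in question is the classical result for the spacings of $n$ i.i.d. uniform points on $[0, 1]$.

```python
from itertools import product
from math import comb


def count_brute_force(t, xs):
    """Count m' by direct enumeration, straight from the definition.

    Counts tuples (X_1, ..., X_n) with every X_i in {1, ..., t} such that
    L_i = X_{i+1} - X_i > x_i for i = 0, ..., n, where X_0 = 0 and
    X_{n+1} = t + 1. Here n = len(xs) - 1. Works for any integers x_i,
    but the cost is t**n, so it is only usable for tiny inputs.
    """
    n = len(xs) - 1
    total = 0
    for middle in product(range(1, t + 1), repeat=n):
        X = (0, *middle, t + 1)
        if all(X[i + 1] - X[i] > xs[i] for i in range(n + 1)):
            total += 1
    return total


def count_closed_form(t, xs):
    """Stars-and-bars count, valid only when every x_i >= 0 (an assumption).

    With x_i >= 0 each gap satisfies L_i >= x_i + 1 >= 1, so the X_i are
    strictly increasing and the n + 1 gaps form a composition of t + 1.
    After reserving x_i + 1 for each gap, the slack t - n - sum(xs) is
    split freely among the n + 1 gaps, giving C(t - sum(xs), n) sequences,
    or 0 when the slack is negative.
    """
    if any(x < 0 for x in xs):
        raise ValueError("closed form assumes every x_i >= 0")
    n = len(xs) - 1
    top = t - sum(xs)
    if top < n:
        return 0
    return comb(top, n)


def de_finetti(a):
    """Probability that all n + 1 spacings of n i.i.d. Uniform(0, 1) points
    exceed a_0, ..., a_n (all a_i >= 0): (1 - sum(a))_+ ** n."""
    n = len(a) - 1
    return max(0.0, 1.0 - sum(a)) ** n


if __name__ == "__main__":
    # Enumeration and closed form agree when every x_i >= 0.
    print(count_brute_force(8, (0, 1, 0, 2)), count_closed_form(8, (0, 1, 0, 2)))  # 10 10

    # A negative x_i relaxes the ordering: x_1 = -1 allows X_2 == X_1,
    # so this counts nondecreasing pairs drawn from {1, 2, 3}.
    print(count_brute_force(3, (0, -1, 0)))  # 6

    # Continuous limit: scale x_i = a_i * t and normalize by C(t, n),
    # which is the count when every x_i = 0.
    a = (0.1, 0.2, 0.3)
    for t in (10, 100, 1000):
        xs = tuple(round(ai * t) for ai in a)
        ratio = count_closed_form(t, xs) / comb(t, len(a) - 1)
        print(t, ratio, de_finetti(a))
```

As $t$ grows, the printed ratio approaches $(1 - 0.6)^2 = 0.16$. Enumeration is kept only as an independent check of the closed form; it is exponential in $n$ and does not scale.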
Keywords: 62E20, 62E25, 92C40
This document is indexed in SpringerLink and other databases.