Significance analysis and statistical mechanics: an application to clustering |
| |
Authors: | Łuksza Marta, Lässig Michael, Berg Johannes |
| |
Affiliation: | Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany. |
| |
Abstract: | This Letter addresses the statistical significance of structures in random data: given a set of vectors and a measure of mutual similarity, how likely is it that a subset of these vectors forms a cluster with enhanced similarity among its elements? The computation of this cluster p value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple-testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes. |
| |
This article is indexed in PubMed and other databases. |
|