首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Large deviation properties for patterns
Institution:1. LINA, CNRS UMR 6241, Université de Nantes, France;2. DYLISS-Inria team, Inria Rennes-Bretagne-Atlantique, France;3. AMIB-Inria team, LIX-Ecole Polytechnique, 91128 Palaiseau, France
Abstract:Deciding whether a given pattern is over- or under-represented according to a given background model is a key question in computational biology. Such a decision is usually made by computing some p-values reflecting the “exceptionality” of a pattern in a given sequence or set of sequences. In the simplest cases (short and simple patterns, simple background model, small number of sequences), an exact p-value can be computed with a tractable complexity. The realistic cases are in general too complicated to get such an exact p-value. Approximations are thus proposed (Gaussian, Poisson, Large deviation approximations). These approximations are applicable under some conditions: Gaussian approximations are valid in the central domain while Poisson and Large deviation approximations are valid for rare events. In the present paper, we prove a large deviation approximation to the double strands counting problem that refers to a counting of a given pattern in a set of sequences that arise from both strands of the genome. In that case, dependencies between a sequence and its reverse complement cannot be neglected. They are captured here for a Bernoulli model from general combinatorial properties of the pattern. A large deviation result is also provided for a set of small sequences.
Keywords:Pattern matching  Statistics  Large deviation
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号