Similar Literature
 20 similar documents found.
1.
In 2005, Chen et al. introduced a sequential importance sampling (SIS) procedure to analyze zero-one two-way tables with given fixed marginal sums (row and column sums) via the conditional Poisson (CP) distribution. They showed that, compared with Markov chain Monte Carlo (MCMC)-based approaches, their importance sampling method is more efficient in terms of running time and also provides an easy and accurate estimate of the total number of contingency tables with fixed marginal sums. In this paper, we extend their result to zero-one multi-way ($d$-way, $d \ge 2$) contingency tables under the no $d$-way interaction model, i.e., with fixed $(d-1)$-way marginal sums. We also show by simulations that the SIS procedure with the CP distribution for estimating the number of zero-one three-way tables under the no three-way interaction model with given marginal sums works very well even with some rejections. We also apply our method to Sampson's monks data set.
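As a rough illustration of the kind of SIS estimator described above, the sketch below samples a zero-one table column by column, choosing which rows receive a 1 via a brute-force conditional-Poisson-style draw, and estimates the number of tables as the average of the inverse proposal probabilities. It is a minimal sketch, not the authors' implementation: the function names and the simple choice of weights (remaining row sums) are illustrative assumptions, and the brute-force subset enumeration only scales to small tables.

```python
import itertools
import random
from math import prod

def sample_zero_one_table(row_sums, col_sums, rng=random.Random(0)):
    """One SIS draw of a 0-1 table with the given margins.
    Returns (table, q) where q is the proposal probability, or (None, None) on rejection."""
    m = len(row_sums)
    remaining = list(row_sums)
    table = [[0] * len(col_sums) for _ in range(m)]
    q = 1.0
    for j, c in enumerate(col_sums):
        # Conditional-Poisson-style proposal: pick which c rows get a 1 in column j,
        # with probability proportional to the product of weights w_i = remaining row sum.
        candidates = [i for i in range(m) if remaining[i] > 0]
        if len(candidates) < c:
            return None, None                      # dead end -> rejection
        subsets = list(itertools.combinations(candidates, c))
        weights = [prod(remaining[i] for i in s) for s in subsets]
        total = sum(weights)
        s = rng.choices(subsets, weights=weights, k=1)[0]
        q *= weights[subsets.index(s)] / total
        for i in s:
            table[i][j] = 1
            remaining[i] -= 1
    if any(remaining):                             # row sums not exactly met
        return None, None
    return table, q

def estimate_table_count(row_sums, col_sums, n_samples=5000):
    """SIS estimate of the number of 0-1 tables with the given margins:
    average of 1/q over accepted draws (rejected draws contribute 0)."""
    est = 0.0
    for _ in range(n_samples):
        table, q = sample_zero_one_table(row_sums, col_sums)
        if table is not None:
            est += 1.0 / q
    return est / n_samples

# Example: 3x3 zero-one tables with all margins equal to 1 (permutation matrices) -> 6.
print(estimate_table_count([1, 1, 1], [1, 1, 1]))
```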

2.
This article presents a computational approach for generating Markov bases for multiway contingency tables whose cell counts might be constrained by fixed marginals and by lower and upper bounds. Our framework includes tables with structural zeros as a particular case. Instead of computing an entire Markov basis in an initial step, our framework finds sets of local moves that connect each table in the reference set with a set of neighbor tables. We construct a Markov chain on the reference set of tables that requires only a set of local moves at each iteration. The union of these sets of local moves forms a dynamic Markov basis. We illustrate the practicality of our algorithms in the estimation of exact p-values for a three-way table with structural zeros and a sparse eight-way table. This article has online supplementary materials.

3.
In this paper we study the computation of Markov bases for contingency tables whose cell entries have an upper bound. It is known that in this case one has to compute universal Gröbner bases, which is often infeasible even for small- and medium-sized problems. Here we focus on bounded two-way contingency tables under the independence model. We show that when these bounds on the cells are positive, the set of basic moves of all 2 × 2 minors connects all tables with given margins. We also give some results about bounded incomplete tables, and we conclude with an open problem on the necessary and sufficient condition on the set of structural zeros so that the set of basic moves of all 2 × 2 minors connects all incomplete contingency tables with given positive margins.
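The basic moves on 2 × 2 minors mentioned above are easy to write down. The following is a minimal sketch, not the paper's software, of a symmetric random walk on bounded two-way tables with fixed margins: each step adds ±1 to the corners of a randomly chosen 2 × 2 minor and is rejected if a cell would leave its bounds. The example table and bound values are invented for illustration.

```python
import random

def basic_move_step(table, upper=None, rng=random):
    """One step of a random walk using a basic move on a 2x2 minor.
    The move +1/-1/-1/+1 on rows (i1,i2) and columns (j1,j2) preserves all
    row and column sums; it is rejected if any cell would leave [0, upper]."""
    m, n = len(table), len(table[0])
    i1, i2 = rng.sample(range(m), 2)
    j1, j2 = rng.sample(range(n), 2)
    eps = rng.choice([+1, -1])
    delta = {(i1, j1): eps, (i1, j2): -eps, (i2, j1): -eps, (i2, j2): eps}
    for (i, j), d in delta.items():
        new = table[i][j] + d
        if new < 0 or (upper is not None and new > upper[i][j]):
            return table                      # reject: stay at the current table
    for (i, j), d in delta.items():
        table[i][j] += d
    return table

# Example: walk over 3x3 tables with margins fixed by the starting table
# and all cells bounded above by 4.
t = [[2, 1, 0],
     [0, 2, 1],
     [1, 0, 2]]
bounds = [[4] * 3 for _ in range(3)]
for _ in range(1000):
    t = basic_move_step(t, upper=bounds)
print(t, [sum(r) for r in t], [sum(c) for c in zip(*t)])
```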

4.
陈拥君, 张尧庭 (Chen Yongjun, Zhang Yaoting), 《应用数学》 (Applied Mathematics), 1996, 9(4): 480-484
This paper discusses, for high-dimensional contingency tables under a multinomial distribution, the form of the Bayes estimator when a mixture of Dirichlet distributions is used as the prior, as well as the formulation of the independence condition, extending the results of references [4] and [5] to high-dimensional contingency tables.

5.
Finite groups with at most two zeros in each column of the character table
Continuing the study of the influence of the zeros of characters on the structure of finite groups, we classify the finite groups whose character table has at most two zeros in each column, thereby completing the classification of finite groups whose character table has at most $p$ zeros in each column, where $p$ is the smallest prime divisor of the order of the group.

6.
We give polynomial time algorithms for random sampling from a set of contingency tables, which is the set of m×n matrices with given row and column sums, provided the row and column sums are sufficiently large with respect to m, n. We use this to approximately count the number of such matrices. These problems are of interest in Statistics and Combinatorics. © 1997 John Wiley & Sons, Inc. Random Struct. Alg., 10, 487–506, 1997

7.
Exact conditional goodness-of-fit tests for discrete exponential family models can be conducted via Monte Carlo estimation of p values by sampling from the conditional distribution of multiway contingency tables. The two most popular methods for such sampling are Markov chain Monte Carlo (MCMC) and sequential importance sampling (SIS). In this work we consider various ways to hybridize the two schemes and propose one standout strategy as a good general purpose method for conducting inference. The proposed method runs many parallel chains initialized at SIS samples across the fiber. When a Markov basis is unavailable, the proposed scheme uses a lattice basis with intermittent SIS proposals to guarantee irreducibility and asymptotic unbiasedness. The scheme alleviates many of the challenges faced by the MCMC and SIS schemes individually while largely retaining their strengths. It also provides diagnostics that guide and lend credibility to the procedure. Simulations demonstrate the viability of the approach.
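To make the lattice-basis component of such a hybrid scheme concrete, here is a minimal sketch of a Metropolis-type walk that proposes adding a random integer multiple of a random lattice-basis move and rejects proposals with negative cells. It is an illustration under simplifying assumptions, not the proposed method itself: the example uses a two-way table, where the moves anchored at the last row and column happen to form a lattice basis, and the SIS restarts across parallel chains are only indicated in a comment.

```python
import random

def lattice_basis_2way(m, n):
    """Basic moves B_ij (i < m-1, j < n-1) anchored at the last row/column.
    For two-way tables these (m-1)(n-1) moves form a lattice basis of the
    kernel of the margin map."""
    basis = []
    for i in range(m - 1):
        for j in range(n - 1):
            B = [[0] * n for _ in range(m)]
            B[i][j] = B[m - 1][n - 1] = 1
            B[i][n - 1] = B[m - 1][j] = -1
            basis.append(B)
    return basis

def lattice_step(x, basis, max_mult=3, rng=random):
    """Metropolis-type proposal x -> x + k*B for a random basis move B and a
    random nonzero multiplier k; reject if any cell would go negative.
    (The hybrid scheme above interleaves such steps with SIS restarts across
    parallel chains; only the lattice-basis step is sketched here.)"""
    B = rng.choice(basis)
    k = rng.choice([c for c in range(-max_mult, max_mult + 1) if c != 0])
    y = [[x[i][j] + k * B[i][j] for j in range(len(x[0]))] for i in range(len(x))]
    if any(v < 0 for row in y for v in row):
        return x
    return y

x = [[3, 1, 2],
     [0, 2, 2]]
moves = lattice_basis_2way(2, 3)
for _ in range(2000):
    x = lattice_step(x, moves)
print(x, [sum(r) for r in x], [sum(c) for c in zip(*x)])
```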

8.
Summary  In contingency table inference, when the cell counts are not large enough for asymptotic approximation, exact conditional methods are used, but these are often computationally impractical for large tables. Various sampling methods can be used instead. Permutation-based Monte Carlo sampling may again become impractical for large tables, and existing Markov chain methods, which sample only a few cells of the table at each iteration, are inefficient. Here we consider a Markov chain in which a sub-table of user-specified size is updated at each iteration, achieving high sampling efficiency. Some theoretical properties of the chain and its applications to some commonly used tables are discussed. As an illustration, the method is applied to the exact test of Hardy-Weinberg equilibrium in the population genetics context.
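A minimal sketch of a sub-table updating chain in the spirit of the summary above, not the paper's algorithm: at each iteration a random k × l sub-table is selected and resampled uniformly among all nonnegative integer sub-tables with the same sub-margins, here by brute-force enumeration, which is only viable for small sub-tables.

```python
import itertools
import random

def resample_subtable(table, rows, cols, rng=random):
    """Gibbs-type update: resample the sub-table on the chosen rows x cols
    uniformly among all nonnegative integer sub-tables with the same
    sub-row and sub-column sums (brute force; fine for small sub-tables)."""
    sub = [[table[i][j] for j in cols] for i in rows]
    rsums = [sum(r) for r in sub]
    csums = [sum(c) for c in zip(*sub)]
    k, l = len(rows), len(cols)
    candidates = []
    # Enumerate the first k-1 rows cell by cell; the last row is then forced.
    ranges = [range(min(rsums[i], csums[j]) + 1) for i in range(k - 1) for j in range(l)]
    for flat in itertools.product(*ranges):
        cand = [list(flat[i * l:(i + 1) * l]) for i in range(k - 1)]
        if any(sum(cand[i]) != rsums[i] for i in range(k - 1)):
            continue
        last = [csums[j] - sum(cand[i][j] for i in range(k - 1)) for j in range(l)]
        if all(v >= 0 for v in last):
            candidates.append(cand + [last])
    new = rng.choice(candidates)           # current sub-table is always a candidate
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            table[i][j] = new[a][b]
    return table

def subtable_chain_step(table, k=2, l=2, rng=random):
    """One iteration of the chain: pick a random k x l sub-table and resample it."""
    rows = rng.sample(range(len(table)), k)
    cols = rng.sample(range(len(table[0])), l)
    return resample_subtable(table, rows, cols, rng)

t = [[4, 1, 3],
     [2, 2, 0],
     [0, 3, 5]]
for _ in range(500):
    t = subtable_chain_step(t, k=2, l=3)
print(t, [sum(r) for r in t], [sum(c) for c in zip(*t)])
```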

9.
In multivariate categorical data, models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies. However, Bayesian versions of latent structure models for categorical data typically do not appropriately handle impossible combinations of variables, also known as structural zeros. Allowing nonzero probability for impossible combinations results in inaccurate estimates of joint and conditional probabilities, even for feasible combinations. We present an approach for estimating posterior distributions in Bayesian latent structure models with potentially many structural zeros. The basic idea is to treat the observed data as a truncated sample from an augmented dataset, thereby allowing us to exploit the conditional independence assumptions for computational expediency. As part of the approach, we develop an algorithm for collapsing a large set of structural zero combinations into a much smaller set of disjoint marginal conditions, which speeds up computation. We apply the approach to sample from a semiparametric version of the latent class model with structural zeros in the context of a key issue faced by national statistical agencies seeking to disseminate confidential data to the public: estimating the number of records in a sample that are unique in the population on a set of publicly available categorical variables. The latent class model offers remarkably accurate estimates of population uniqueness, even in the presence of a large number of structural zeros.

10.
Much work has focused on developing exact tests for the analysis of discrete data using log linear or logistic regression models. A parametric model is tested for a dataset by conditioning on the value of a sufficient statistic and determining the probability of obtaining another dataset as extreme or more extreme relative to the general model, where extremeness is determined by the value of a test statistic such as the chi-square or the log-likelihood ratio. Exact determination of these probabilities can be infeasible for high dimensional problems, and asymptotic approximations to them are often inaccurate when there are small data entries and/or there are many nuisance parameters. In these cases Monte Carlo methods can be used to estimate exact probabilities by randomly generating datasets (tables) that match the sufficient statistic of the original table. However, naive Monte Carlo methods produce tables that are usually far from matching the sufficient statistic. The Markov chain Monte Carlo method used in this work (the regression/attraction approach) uses attraction to concentrate the distribution around the set of tables that match the sufficient statistic, and uses regression to take advantage of information in tables that “almost” match. It is also more general than others in that it does not require the sufficient statistic to be linear, and it can be adapted to problems involving continuous variables. The method is applied to several high dimensional settings including four-way tables with a model of no four-way interaction, and a table of continuous data based on beta distributions. It is powerful enough to deal with the difficult problem of four-way tables and flexible enough to handle continuous data with a nonlinear sufficient statistic.

11.
It is well known that for two-way contingency tables with fixed row sums and column sums the set of square-free moves of degree two forms a Markov basis. However, when we impose an additional constraint that the sum of cell counts in a subtable is also fixed, these moves do not necessarily form a Markov basis. Thus, in this paper, we show a necessary and sufficient condition on a subtable so that the set of square-free moves of degree two forms a Markov basis.

12.
In this paper, the set of all bivariate positive quadrant dependent distributions with fixed marginals is shown to be compact and convex. Extreme points of this convex set are enumerated in some specific examples. Applications are given in testing the hypothesis of independence against strict positive quadrant dependence in the context of ordinal contingency tables. The performance of two tests, one of which is based on eigenvalues of a random matrix, is compared. Various procedures based upon certain functions of the eigenvalues of a random matrix are also proposed for testing for independence in a two-way contingency table when the marginals are random.

13.
To take sample biases and skewness in the observations into account, practitioners frequently weight their observations according to some marginal distribution. The present paper demonstrates that such weighting can indeed improve the estimation. Studying contingency tables, estimators for marginal distributions are proposed under the assumption that another marginal is known. It is shown that the weighted estimators have a strictly smaller asymptotic variance whenever the two marginals are correlated. The finite sample performance is illustrated in a simulation study. As an application to traffic accident data, the method allows for correcting a well-known bias in the observed injury severity distribution.
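One simple weighting scheme of the kind described above is post-stratification on the known marginal. The sketch below illustrates that idea; it is not necessarily the exact estimator studied in the paper, and the traffic-accident numbers and category names are invented for the example.

```python
from collections import Counter

def weighted_marginal(pairs, known_y_marginal):
    """Estimate P(X = x) from (x, y) samples when the Y-marginal is known:
    weight each y-stratum so that its share matches the known marginal,
    i.e. P_hat(X = x) = sum_y P_known(y) * P_hat(X = x | Y = y)."""
    by_y = Counter(y for _, y in pairs)
    joint = Counter(pairs)
    est = Counter()
    for (x, y), n_xy in joint.items():
        est[x] += known_y_marginal[y] * n_xy / by_y[y]
    return dict(est)

# Toy example: injury severity X observed in a sample whose distribution over
# road type Y is biased; the true distribution of Y is known from a register.
sample = [("severe", "urban")] * 10 + [("slight", "urban")] * 40 \
       + [("severe", "rural")] * 30 + [("slight", "rural")] * 20
true_y = {"urban": 0.7, "rural": 0.3}   # known marginal (assumed)
print(weighted_marginal(sample, true_y))
```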

14.
We propose new sequential importance sampling methods for sampling contingency tables with given margins. The proposal for each method is based on asymptotic approximations to the number of tables with fixed margins. These methods generate tables that are very close to the uniform distribution. The tables, along with their importance weights, can be used to approximate the null distribution of test statistics and calculate the total number of tables. We apply the methods to a number of examples and demonstrate an improvement over other methods in a variety of real problems. Supplementary materials are available online.

15.
Stochastic ordering between probability distributions is an important concept in applied probability and statistical inference. Trend tests based on cross-classified data have been studied extensively, and stratified contingency tables arise widely in practice. The likelihood ratio test is commonly used for problems involving stochastic-order constraints. For stratified contingency tables with order constraints, this paper introduces a likelihood ratio test that does not rely on model assumptions and derives the limiting distribution of the test statistic.

16.
In this paper a new general approach for the so-called “zero-one principle” for sorting algorithms is described. A theorem from propositional logic that states the connection between two-valued logic and many-valued logic is used to prove this zero-one principle. As a consequence a zero-one principle for a more general class of sorting algorithms is derived.
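For background, the classical zero-one principle states that a comparison network sorts all inputs if and only if it sorts all 0-1 inputs, so correctness can be checked on the 2^n binary inputs. The sketch below illustrates that classical statement only; the generalized principle derived in the paper is not reproduced here.

```python
import itertools

def apply_network(network, values):
    """Apply a comparison network (list of (i, j) comparators with i < j)."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def sorts_all_zero_one(network, n):
    """Classical zero-one principle: a comparison network on n wires sorts
    every input iff it sorts every 0-1 input, so 2^n checks suffice."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in itertools.product((0, 1), repeat=n))

# A 4-wire sorting network (odd-even merge style) checked on all 0-1 inputs.
net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
print(sorts_all_zero_one(net4, 4))   # True
```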

17.
Many of the datasets encountered in statistics are two-dimensional in nature and can be represented by a matrix. Classical clustering procedures seek to construct separately an optimal partition of rows or, sometimes, of columns. In contrast, co-clustering methods cluster the rows and the columns simultaneously and organize the data into homogeneous blocks (after suitable permutations). Methods of this kind have practical importance in a wide variety of applications such as document clustering, where data are typically organized in two-way contingency tables. Our goal is to offer coherent frameworks for understanding some existing criteria and algorithms for co-clustering contingency tables, and to propose new ones. We look at two different frameworks for the problem of co-clustering. The first involves minimizing an objective function based on measures of association and in particular on phi-squared and mutual information. The second uses a model-based co-clustering approach, and we consider two models: the block model and the latent block model. We establish connections between different approaches, criteria and algorithms, and we highlight a number of implicit assumptions in some commonly used algorithms. Our contribution is illustrated by numerical experiments on simulated and real-case datasets that show the relevance of the presented methods in the document clustering field.
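As a concrete instance of the first framework, the sketch below performs greedy alternating co-clustering of a contingency table by maximizing the mutual information retained by the block-collapsed table (equivalently, minimizing the loss of association). It is a minimal illustration under simplifying assumptions (random initialization, exhaustive reassignment, no model-based or latent block estimation), not the authors' algorithms; the toy table is invented.

```python
import math
import random

def mutual_information(P):
    """Mutual information of a joint probability matrix P (rows x cols)."""
    r = [sum(row) for row in P]
    c = [sum(col) for col in zip(*P)]
    return sum(p * math.log(p / (r[i] * c[j]))
               for i, row in enumerate(P) for j, p in enumerate(row) if p > 0)

def collapse(P, zr, zc, K, L):
    """Block-collapsed K x L joint distribution given row/column labels."""
    Q = [[0.0] * L for _ in range(K)]
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            Q[zr[i]][zc[j]] += p
    return Q

def coclust(table, K, L, n_iter=50, seed=0):
    """Greedy alternating co-clustering: reassign each row (then each column)
    to the block label that maximizes the mutual information of the collapsed
    table, until no single move improves it."""
    rng = random.Random(seed)
    total = sum(sum(row) for row in table)
    P = [[v / total for v in row] for row in table]
    zr = [rng.randrange(K) for _ in P]
    zc = [rng.randrange(L) for _ in P[0]]
    for _ in range(n_iter):
        improved = False
        for labels, size in ((zr, K), (zc, L)):
            for idx in range(len(labels)):
                best = labels[idx]
                best_mi = mutual_information(collapse(P, zr, zc, K, L))
                for g in range(size):
                    labels[idx] = g
                    mi = mutual_information(collapse(P, zr, zc, K, L))
                    if mi > best_mi + 1e-12:
                        best, best_mi, improved = g, mi, True
                labels[idx] = best
        if not improved:
            break
    return zr, zc

# Toy 4x4 table with a planted 2x2 block structure.
T = [[20, 18, 2, 1],
     [22, 19, 1, 2],
     [1, 2, 25, 24],
     [2, 1, 23, 26]]
print(coclust(T, K=2, L=2))
```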

18.
An alternative approach to classical correspondence analysis was developed in [3] and involves decomposing the matrix of Pearson contingencies of a contingency table using orthogonal polynomials rather than via singular value decomposition. It is especially useful in analysing contingency tables which are of an ordinal nature. This short paper demonstrates that the confidence circles of Lebart, Morineau and Warwick (1984) for the classical approach can be applied to ordinal correspondence analysis. The advantage of the circles in analysing a contingency table is that the researcher can graphically identify the row and column categories that contribute or not to the hypothesis of independence.

19.
In this article, likelihood ratio tests (LRTs) are developed for detecting whether stochastic trends of binary responses are ordered between 2×k contingency tables. We provide a simple iterative algorithm for the maximum likelihood estimators under the order restriction and construct the LRTs using those estimators. All the distributional results for these tests are based on large-sample theory. The finite-sample behavior of these tests is investigated through a simulation study. As an illustration of these tests, we analyze a set of data on wheeziness of smoking coalminers.

20.
We study the problem of sampling contingency tables (nonnegative integer matrices with specified row and column sums) uniformly at random. We give an algorithm which runs in polynomial time provided that the row sums $r_i$ and the column sums $c_j$ satisfy $r_i = \Omega(n^{3/2} m \log m)$ and $c_j = \Omega(m^{3/2} n \log n)$. This algorithm is based on a reduction to continuous sampling from a convex set. The same approach was taken by Dyer, Kannan, and Mount in previous work. However, the algorithm we present is simpler and has weaker requirements on the row and column sums. © 2002 Wiley Periodicals, Inc. Random Struct. Alg., 21: 135–146, 2002
