1.
David J. Nott, Zeming Yu, Chris Cotsapas, Jeremy Pulvers, Peter Little 《Journal of multivariate analysis》2007,98(4):852-872
Hierarchical and empirical Bayes approaches to inference are attractive for data arising from microarray gene expression studies because of their ability to borrow strength across genes in making inferences. Here we focus on the simplest case where we have data from replicated two colour arrays which compare two samples and where we wish to decide which genes are differentially expressed and obtain estimates of operating characteristics such as false discovery rates. The purpose of this paper is to examine the frequentist performance of Bayesian variable selection approaches to this problem for different prior specifications and to examine the effect on inference of commonly used empirical Bayes approximations to hierarchical Bayes procedures. The paper makes three main contributions. First, we describe how the log odds of differential expression can usually be computed analytically in the case where a double tailed exponential prior is used for gene effects rather than a normal prior, which gives an alternative to the commonly used B-statistic for ranking genes in simple comparative experiments. The second contribution of the paper is to compare empirical Bayes procedures for detecting differential expression with hierarchical Bayes methods which account for uncertainty in prior hyperparameters to examine how much is lost in using the commonly employed empirical Bayes approximations. Third, we describe an efficient MCMC scheme for carrying out the computations required for the hierarchical Bayes procedures. Comparisons are made via simulation studies where the simulated data are obtained by fitting models to some real microarray data sets. 
The results have implications for analysis of microarray data using parametric hierarchical and empirical Bayes methods for more complex experimental designs: generally we find that the empirical Bayes methods work well, which supports their use in the analysis of more complex experiments when a full hierarchical Bayes analysis would impose heavy computational demands.
2.
Harro Walk 《Journal of multivariate analysis》2008,99(6):1035-1050
For k_n-nearest neighbor estimates of the regression of Y on X (d-dimensional random vector X, integrable real random variable Y) based on observed independent copies of (X,Y), strong universal pointwise consistency is shown, i.e., strong consistency P_X-almost everywhere for general distribution of (X,Y). With tie-breaking by indices, this means validity of a universal strong law of large numbers for conditional expectations E(Y|X=x).
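The estimate discussed in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and a fixed k supplied by the caller; the function name `knn_regress` is hypothetical and not taken from the paper. Ties in distance are broken by sample index, as the abstract describes.

```python
import math


def knn_regress(xs, ys, x0, k):
    """Estimate E(Y | X = x0) by averaging the k nearest responses.

    xs: list of d-dimensional points (tuples), ys: list of real responses.
    Euclidean distance and the fixed k are assumptions of this sketch.
    """
    # Sort indices by (distance, index): equal distances go to the smaller index.
    order = sorted(range(len(xs)), key=lambda i: (math.dist(xs[i], x0), i))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / k


xs = [(0.0,), (1.0,), (2.0,)]
ys = [0.0, 1.0, 2.0]
estimate = knn_regress(xs, ys, (0.9,), 2)  # averages the responses at x = 1 and x = 0
```

Sorting on the pair (distance, index) keeps the choice of neighbors deterministic when several points are equally far from the query.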
3.
M.S. Srivastava 《Journal of multivariate analysis》2005,96(1):55-72
This paper considers the estimation of the mean vector θ of a p-variate normal distribution with unknown covariance matrix Σ when it is suspected that for a p×r known matrix B the hypothesis θ=Bη, η∈R^r may hold. We consider empirical Bayes estimators which include (i) the unrestricted unbiased estimator (UE), namely, the sample mean vector, (ii) the restricted estimator (RE), which is obtained when the hypothesis θ=Bη holds, (iii) the preliminary test estimator (PTE), (iv) the James-Stein estimator (JSE), and (v) the positive-rule Stein estimator (PRSE). The biases and the risks under the squared loss function are evaluated for all five estimators and compared. The numerical computations show that PRSE is the best among all five estimators even when the hypothesis θ=Bη is true.
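For orientation only, the positive-rule idea can be sketched in its simplest textbook form. This is not the estimator of the paper: the sketch assumes a single observation x from N_p(θ, I) with known identity covariance and shrinks toward the origin rather than toward the subspace θ=Bη. The function name is hypothetical.

```python
def positive_part_james_stein(x):
    """Positive-part James-Stein estimate of a normal mean from one observation.

    Assumes x ~ N_p(theta, I) with p >= 3 and shrinkage toward the origin;
    both are simplifying assumptions of this sketch.
    """
    p = len(x)
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        return [0.0] * p
    # The plain James-Stein factor 1 - (p - 2)/||x||^2 can turn negative;
    # the positive rule truncates it at zero.
    shrink = max(0.0, 1.0 - (p - 2) / norm_sq)
    return [shrink * v for v in x]


estimate = positive_part_james_stein([3.0, 4.0, 0.0])  # factor is 1 - 1/25
```

Truncating the factor at zero prevents the estimate from flipping sign when the observation lies close to the shrinkage target.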
4.
5.
Hirokazu Yanagihara 《Journal of multivariate analysis》2006,97(5):1070-1089
This paper deals with the bias reduction of the Akaike information criterion (AIC) for selecting variables in multivariate normal linear regression models when the true distribution of the observations is an unknown nonnormal distribution. We propose a corrected version of AIC which is partially constructed by the jackknife method and is adjusted to the exact unbiased estimator of the risk when the candidate model includes the true model. It is pointed out that the influence of nonnormality on the bias of our criterion is smaller than its influence on the biases of AIC and TIC. We verify that our criterion is better than the AIC, TIC and EIC by conducting numerical experiments.
6.
Tao Wang, Lixing Zhu 《Journal of multivariate analysis》2011,102(7):1141-1151
An exhaustive search as required for traditional variable selection methods is impractical in high dimensional statistical modeling. Thus, to conduct variable selection, various forms of penalized estimators with good statistical and computational properties have been proposed during the past two decades. The attractive properties of these shrinkage and selection estimators, however, depend critically on the size of regularization which controls model complexity. In this paper, we consider the problem of consistent tuning parameter selection in high dimensional sparse linear regression where the dimension of the predictor vector is larger than the size of the sample. First, we propose a family of high dimensional Bayesian Information Criteria (HBIC), and then investigate the selection consistency, extending the results for the extended Bayesian Information Criterion (EBIC) of Chen and Chen (2008) to ultra-high dimensional situations. Second, we develop a two-step procedure, the SIS+AENET, to conduct variable selection in p>n situations. The consistency of tuning parameter selection is established under fairly mild technical conditions. Simulation studies are presented to confirm theoretical findings, and an empirical example is given to illustrate its use on internet advertising data.
7.
Inference on the largest mean of a multivariate normal distribution is a surprisingly difficult and unexplored topic. Difficulties arise when two or more of the means are simultaneously the largest mean. Our proposed solution is based on an extension of R.A. Fisher’s fiducial inference methods termed generalized fiducial inference. We use a model selection technique along with the generalized fiducial distribution to allow for equal largest means and alleviate the overestimation that commonly occurs. Our proposed confidence intervals for the largest mean have asymptotically correct frequentist coverage and simulation results suggest that they possess promising small sample empirical properties. In addition to the theoretical calculations and simulations we also applied this approach to the air quality index of the four largest cities in the northeastern United States (Baltimore, Boston, New York, and Philadelphia).
8.
In the model of sequential order statistics, prior distributions are considered for the model parameters, which, for example, describe increasing load put on remaining components. Gamma priors are examined as well as priors out of a class of extended truncated Erlang distributions (ETED), which is introduced along with some properties. The choice of independent priors in both set-ups leads to respective independent, conjugate posterior distributions for the model parameters of sequential order statistics. Since, in practical applications, the model parameters will often be increasingly ordered, a multivariate prior is applied being the joint distribution of common ETED-order statistics. Whatever baseline distribution of the sequential order statistics is chosen, the joint posterior distribution turns out to be a Weinman multivariate exponential distribution. Posterior moments are given explicitly, and HPD credible sets for the model parameters are stated.
9.
For two multivariate normal populations with unequal covariance matrices, a procedure is developed for testing the equality of the mean vectors based on the concept of generalized p-values. The generalized p-values we have developed are functions of the sufficient statistics. The computation of the generalized p-values is discussed and illustrated with an example. Numerical results show that one of our generalized p-value tests has a type I error probability not exceeding the nominal level. A formula involving only a finite number of chi-square random variables is provided for computing this generalized p-value. The formula is useful in a Bayesian solution as well. The problem of constructing a confidence region for the difference between the mean vectors is also addressed using the concept of generalized confidence regions. Finally, using the generalized p-value approach, a solution is developed for the heteroscedastic MANOVA problem.
10.
Y. Wu 《Journal of multivariate analysis》2008,99(9):2154-2171
In this paper, an information-based criterion is proposed for carrying out change point analysis and variable selection simultaneously in linear models with a possible change point. Under some weak conditions, this criterion is shown to be strongly consistent in the sense that with probability one, it chooses the smallest true model for large n. Its byproducts include strongly consistent estimates of the regression coefficients regardless of whether there is a change point. If there is a change point, its byproducts also include a strongly consistent estimate of the change point parameter. In addition, an algorithm is given which significantly reduces the computation time needed by the proposed criterion for the same precision. Results from a simulation study are also presented.
11.
Guy Martial Nkiet 《Journal of multivariate analysis》2012,105(1):151-163
We propose a criterion for variable selection in discriminant analysis. This criterion makes it possible to arrange the variables in decreasing order of adequacy for discrimination, so that the variable selection problem reduces to that of the estimation of a suitable permutation and dimensionality. Then, estimators for these parameters are proposed and the resulting method for selecting variables is shown to be consistent. In a simulation study, we compute proportions of correct classification after variable selection in order to gain understanding of the performance of our proposal and to compare it to existing methods.
12.
Takayuki Yamada 《Statistics & probability letters》2012,82(3):692-698
This paper is concerned with the testing problem of the generalized multivariate linear hypothesis for the mean in the growth curve model (GMANOVA). Our interest is the case in which the number of observed points p is relatively large compared to the sample size N. Asymptotic expansions of the non-null distributions of the likelihood ratio criterion, Lawley-Hotelling’s trace criterion and Bartlett-Nanda-Pillai’s trace criterion are derived under the asymptotic framework that N and p go to infinity together, while p/N→c∈(0,1). It can also be confirmed, theoretically and numerically, that Rothenberg’s condition on the magnitude of the asymptotic powers of the three tests is valid when p is relatively large.
13.
A method for constructing priors is proposed that allows the off-diagonal elements of the concentration matrix of Gaussian data to be zero. The priors have the property that the marginal prior distribution of the number of nonzero off-diagonal elements of the concentration matrix (referred to below as model size) can be specified flexibly. The priors have normalizing constants for each model size, rather than for each model, giving a tractable number of normalizing constants that need to be estimated. The article shows how to estimate the normalizing constants using Markov chain Monte Carlo simulation and supersedes the method of Wong et al. (2003) [24] because it is more accurate and more general. The method is applied to two examples. The first is a mixture of constrained Wisharts. The second is from Wong et al. (2003) [24] and decomposes the concentration matrix into a function of partial correlations and conditional variances using a mixture distribution on the matrix of partial correlations. The approach detects structural zeros in the concentration matrix and estimates the covariance matrix parsimoniously if the concentration matrix is sparse.
14.
One of the most powerful algorithms for maximum likelihood estimation for many incomplete-data problems is the EM algorithm. The restricted EM algorithm for maximum likelihood estimation under linear restrictions on the parameters has been handled by Kim and Taylor (J. Amer. Statist. Assoc. 430 (1995) 708-716). This paper proposes an EM algorithm for maximum likelihood estimation under inequality restrictions A0β?0, where β is the parameter vector in a linear model W=Xβ+ε and ε is an error variable distributed normally with mean zero and a known or unknown variance matrix Σ>0. Some convergence properties of the EM sequence are discussed. Furthermore, we consider the consistency of the restricted EM estimator and a related testing problem.
15.
In this paper, we consider the general growth curve model with multivariate random effects covariance structure and provide a new simple estimator for the parameters of interest. This estimator is not only convenient for testing hypotheses on the corresponding parameters, but also has higher efficiency than the least-squares estimator and the improved two-stage estimator obtained by Rao under certain conditions. Moreover, we obtain the necessary and sufficient condition for the new estimator to be identical to the best linear unbiased estimator. Examples of its application are given.
16.
Jesse Frey 《Journal of multivariate analysis》2012,103(1):48-57
Goodness-of-fit tests allow one to conclude that k possible outcomes are not equally likely. In this paper, we develop an exact equivalence test that allows one to conclude that k possible outcomes are approximately equally likely. We show that the power properties of the test compare favorably to those of possible alternative tests, and we develop an associated simultaneous confidence interval procedure. We apply the test to data sets on the digits of π, winning roulette numbers, and winning numbers from the Pennsylvania Lottery.
17.
Reduced-rank restrictions can add useful parsimony to coefficient matrices of multivariate models, but their use is limited by the daunting complexity of the methods and their theory. The present work takes the easy road, focusing on unifying themes and simplified methods. For Gaussian and non-Gaussian (GLM, GAM, mixed normal, etc.) multivariate models, the present work gives a unified, explicit theory for the general asymptotic (normal) distribution of maximum likelihood estimators (MLE). MLE can be complex and computationally hard, but we show a strong asymptotic equivalence between MLE and a relatively simple minimum (Mahalanobis) distance estimator. The latter method yields particularly simple tests of rank, and we describe its asymptotic behavior in detail. We also examine the method's performance in simulation and via analytical and empirical examples.
18.
On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding
We study the distributions of the LASSO, SCAD, and thresholding estimators, in finite samples and in the large-sample limit. The asymptotic distributions are derived for both the case where the estimators are tuned to perform consistent model selection and for the case where the estimators are tuned to perform conservative model selection. Our findings complement those of Knight and Fu [K. Knight, W. Fu, Asymptotics for lasso-type estimators, Annals of Statistics 28 (2000) 1356–1378] and Fan and Li [J. Fan, R. Li, Variable selection via non-concave penalized likelihood and its oracle properties, Journal of the American Statistical Association 96 (2001) 1348–1360]. We show that the distributions are typically highly non-normal regardless of how the estimator is tuned, and that this property persists in large samples. The uniform convergence rate of these estimators is also obtained, and is shown to be slower than n^{-1/2} when the estimator is tuned to perform consistent model selection. An impossibility result regarding estimation of the estimators’ distribution function is also provided.
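Two of the estimators named in this abstract have simple closed forms in a one-dimensional setting, sketched below under stated assumptions. With an orthonormal design, the LASSO reduces to soft thresholding of the least-squares coefficient, up to the scaling convention chosen for the penalty; hard thresholding keeps or kills the coefficient outright. The function names and the threshold parameter `lam` are hypothetical, and SCAD is left out of this sketch.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward zero by lam and set it to zero once |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def hard_threshold(z, lam):
    """Keep z unchanged when |z| > lam, otherwise set it to zero."""
    return z if abs(z) > lam else 0.0


# z stands for a least-squares coefficient, lam for the tuning parameter.
shrunk = soft_threshold(3.0, 1.0)
kept = hard_threshold(3.0, 1.0)
```

Both maps send a whole interval of inputs to exactly zero, so each estimator places a point mass at zero, which gives some intuition for the non-normal distributions the abstract reports.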
19.
María Teresa Gallegos 《Journal of multivariate analysis》2006,97(5):1221-1250
Recently, we proposed variants as a statistical model for treating ambiguity. If data are extracted from an object with a machine then it might not be able to give a unique safe answer due to ambiguity about the correct interpretation of the object. On the other hand, the machine is often able to produce a finite number of alternative feature sets (of the same object) that contain the desired one. We call these feature sets variants of the object. Data sets that contain variants may be analyzed by means of statistical methods and all chapters of multivariate analysis can be seen in the light of variants. In this communication, we focus on point estimation in the presence of variants and outliers. Besides robust parameter estimation, this task also requires selecting the regular objects and their valid feature sets (regular variants). We determine the mixed MAP-ML estimator for a model with spurious variants and outliers as well as estimators based on the integrated likelihood. We also prove asymptotic results which show that the estimators are nearly consistent. The problem of variant selection turns out to be computationally hard; therefore, we also design algorithms for efficient approximation. We finally demonstrate their efficacy with a simulated data set and a real data set from genetics.
20.
The asymptotic distribution of the quasi-maximum likelihood (QML) estimator is established for generalized autoregressive conditional heteroskedastic (GARCH) processes, when the true parameter may have zero coefficients. This asymptotic distribution is the projection of a normal vector distribution onto a convex cone. The results are derived under mild conditions. For an important subclass of models, no moment condition is imposed on the GARCH process. The main practical implication of these results concerns the estimation of overidentified GARCH models.