Similar Documents
20 similar documents found
1.
This paper deals with bias correction of the cross-validation (CV) criterion used to estimate the predictive Kullback-Leibler information. A bias-corrected CV criterion is proposed by replacing the ordinary maximum likelihood estimator with the maximizer of an adjusted log-likelihood function. The adjustment is slight and simple, yet the improvement in bias is substantial: the bias of the ordinary CV criterion is O(n⁻¹), while that of the bias-corrected CV criterion is O(n⁻²). Numerical experiments verify that the proposed criterion has smaller bias than the AIC, TIC, EIC, and the ordinary CV criterion.
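A minimal sketch of how an ordinary CV criterion of this kind can be computed, assuming an i.i.d. normal model; the paper's correction would additionally replace each leave-one-out MLE with an adjusted-likelihood maximizer, which is not reproduced here. All names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def loo_cv_criterion(x):
    """Ordinary leave-one-out CV criterion for an i.i.d. normal model:
    -2 * sum_i log f(x_i; theta_hat_{(-i)}), where theta_hat_{(-i)} is
    the MLE computed with the i-th observation deleted."""
    ll = 0.0
    for i in range(len(x)):
        xi = np.delete(x, i)
        mu, sigma = xi.mean(), xi.std()          # MLE on the reduced sample
        ll += norm.logpdf(x[i], loc=mu, scale=sigma)
    return -2.0 * ll

rng = np.random.default_rng(0)
x = rng.normal(size=50)
k = 2                                            # parameters: mu and sigma
mu, sigma = x.mean(), x.std()
aic = -2.0 * norm.logpdf(x, mu, sigma).sum() + 2 * k
print("LOO-CV criterion:", loo_cv_criterion(x), " AIC:", aic)
```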

2.
The generalized information criterion (GIC) proposed by Rao and Wu [A strongly consistent procedure for model selection in a regression problem, Biometrika 76 (1989) 369-374] generalizes both Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). In this paper, we extend the GIC to select linear mixed-effects models, which are widely applied in the analysis of longitudinal data. A procedure for selecting fixed effects and random effects based on the extended GIC is provided, and the asymptotic behavior of the method for selecting fixed effects is studied. We prove that, under mild conditions, the selection procedure is asymptotically loss efficient regardless of whether a true model exists, and consistent if a true model exists. A simulation study empirically evaluates the performance of the extended GIC procedure: when the signal-to-noise ratio is moderate or high, the percentage of correctly chosen fixed effects is close to one in finite samples, whereas the procedure performs relatively poorly when used to select random effects.
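For intuition, a sketch of the GIC family's generic form, illustrated on ordinary least squares rather than the paper's mixed-effects setting; the `gic` function and the subset search below are hypothetical illustrations, not the authors' code.

```python
import numpy as np
from itertools import combinations

def gic(y, X, lam):
    """Generalized information criterion for a Gaussian linear model:
    n*log(RSS/n) + lam*k (up to constants).  lam = 2 recovers AIC and
    lam = log(n) recovers BIC; other growth rates give Rao-Wu criteria."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    return n * np.log(rss / n) + lam * k

rng = np.random.default_rng(1)
n = 200
X_full = rng.normal(size=(n, 5))
y = X_full[:, :2] @ np.array([1.5, -1.0]) + rng.normal(size=n)

# Score every non-empty subset of the 5 candidate columns.
subsets = (s for r in range(1, 6) for s in combinations(range(5), r))
best = min(subsets, key=lambda s: gic(y, X_full[:, list(s)], np.log(n)))
print("selected columns:", best)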

3.
In a structural measurement error model the structural quasi-score (SQS) estimator is based on the distribution of the latent regressor variable. If this distribution is misspecified, the SQS estimator is (asymptotically) biased. Two types of misspecification are considered, both assuming that the statistician erroneously adopts a normal distribution as the model for the regressor distribution. In the first type, the true model is a mixture of normal distributions clustering around a single normal distribution; in the second type, the true distribution is a normal distribution admixed with a second normal distribution of low weight. In both cases the bias, of course, tends to zero as the size of the misspecification tends to zero. However, in the first case the bias goes to zero in a flat way, so that small deviations from the true model lead to negligible bias, whereas in the second case the bias is noticeable even for small deviations from the true model.

4.
In this paper, an information-based criterion is proposed for carrying out change point analysis and variable selection simultaneously in linear models with a possible change point. Under weak conditions, this criterion is shown to be strongly consistent in the sense that, with probability one, it chooses the smallest true model for large n. Its byproducts include strongly consistent estimates of the regression coefficients regardless of whether a change point is present; when a change point exists, they also include a strongly consistent estimate of the change point parameter. In addition, an algorithm is given that significantly reduces the computation time needed by the proposed criterion at the same precision. Results from a simulation study are also presented.
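A toy version of a criterion of this kind, assuming a Gaussian linear model and a BIC-style penalty; the paper's criterion also performs variable selection jointly, which this sketch omits, and the penalty constant here is an assumption.

```python
import numpy as np

def criterion(y, X, tau, lam):
    """Information-type criterion for a linear model with a change point
    at tau: segmentwise RSS plus a penalty on the parameter count.
    tau = None encodes the no-change-point model."""
    n, k = X.shape
    if tau is None:
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss, n_par = ((y - X @ beta) ** 2).sum(), k
    else:
        rss, n_par = 0.0, 2 * k + 1
        for seg in (slice(0, tau), slice(tau, n)):
            b, *_ = np.linalg.lstsq(X[seg], y[seg], rcond=None)
            rss += ((y[seg] - X[seg] @ b) ** 2).sum()
    return n * np.log(rss / n) + lam * n_par

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.0, 1.0]) + rng.normal(size=n)
y[150:] += 2.0                                   # mean shift after t = 150

cands = [None] + list(range(20, n - 20))
best = min(cands, key=lambda t: criterion(y, X, t, np.log(n)))
print("estimated change point:", best)
```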

5.
We propose a parametric model for a bivariate stable Lévy process based on a Lévy copula as the dependence model, and estimate the parameters of the full bivariate model by maximum likelihood. As the observation scheme we assume that all jumps larger than some ε>0 are observed, and we base the statistical analysis on the resulting compound Poisson process. We derive the Fisher information matrix and prove asymptotic normality of all estimates as the truncation point ε→0. A simulation study investigates the loss of efficiency caused by the truncation.

6.
In this paper we address the problem of estimating θ1 when two observations Y1 and Y2, with means θ1 and θ2 respectively, are available and |θ1−θ2|≤c for a known constant c. Clearly Y2 contains information about θ1. We show how the so-called weighted likelihood function may be used to generate a class of estimators that exploit that information, and discuss how the weights in the weighted likelihood may be selected to trade bias for precision and thus use the information effectively. In particular, we consider adaptively weighted likelihood estimators, where the weights are selected using the data; one approach selects the weights in accord with Akaike's entropy maximization criterion. We describe several estimators obtained in this way. The maximum likelihood estimator is investigated as a competitor, along with a Bayes estimator, a class of robust Bayes estimators and, when c is sufficiently small, a minimax estimator, and we assess their properties both numerically and theoretically. Finally, we show how all of these estimators may be viewed as adaptively weighted likelihood estimators. Indeed, an overriding theme of the paper is that the adaptively weighted likelihood method provides a powerful extension of its classical counterpart.
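A minimal illustration, assuming Y1 and Y2 are independent unit-variance normal observations: the weighted likelihood estimator then reduces to a weighted mean, and the simulation shows the bias-for-precision trade as the weight on Y2 grows. Weight values are illustrative, not the paper's adaptive choice.

```python
import numpy as np

def wle(y1, y2, lam):
    """Weighted likelihood estimator of theta1 for two normal means:
    maximizing log f(y1; t) + lam * log f(y2; t) gives a weighted mean."""
    return (y1 + lam * y2) / (1.0 + lam)

rng = np.random.default_rng(3)
theta1, theta2, reps = 0.0, 0.3, 100_000         # |theta1 - theta2| <= c = 0.3
y1 = theta1 + rng.normal(size=reps)
y2 = theta2 + rng.normal(size=reps)

for lam in (0.0, 0.5, 1.0):                      # lam = 0 is the plain MLE y1
    est = wle(y1, y2, lam)
    print(f"lam={lam}: bias={est.mean() - theta1:+.3f}, "
          f"MSE={((est - theta1) ** 2).mean():.3f}")
```

With lam=0.5 the estimator accepts a small bias (at most c/3 here) in exchange for a variance reduction of almost one half, which is exactly the trade the weights are meant to negotiate.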

7.
This paper is concerned with testing a generalized multivariate linear hypothesis for the mean in the growth curve model (GMANOVA). Our interest is the case in which the number of observed points p is relatively large compared to the sample size N. Asymptotic expansions of the non-null distributions of the likelihood ratio criterion, Lawley-Hotelling's trace criterion and Bartlett-Nanda-Pillai's trace criterion are derived under the asymptotic framework in which N and p go to infinity together with p/N→c∈(0,1). We also confirm, both theoretically and numerically, that Rothenberg's condition on the magnitude of the asymptotic powers of the three tests remains valid when p is relatively large.
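For reference, the three classical criteria can be computed from the hypothesis and error sums-of-squares-and-products matrices; this generic MANOVA sketch ignores the GMANOVA-specific structure and the high-dimensional corrections studied in the paper.

```python
import numpy as np

def manova_criteria(Sh, Se):
    """Wilks' likelihood ratio, Lawley-Hotelling trace and
    Bartlett-Nanda-Pillai trace, from the hypothesis SSCP matrix Sh
    and the error SSCP matrix Se."""
    wilks = np.linalg.det(Se) / np.linalg.det(Se + Sh)
    lawley_hotelling = np.trace(Sh @ np.linalg.inv(Se))
    pillai = np.trace(Sh @ np.linalg.inv(Sh + Se))
    return wilks, lawley_hotelling, pillai

rng = np.random.default_rng(4)
p, q, m = 5, 3, 40                               # p variables; q, m deg. of freedom
B = rng.normal(size=(q, p)); Sh = B.T @ B        # rank-q hypothesis matrix
E = rng.normal(size=(m, p)); Se = E.T @ E        # full-rank error matrix
print(manova_criteria(Sh, Se))
```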

8.
This paper considers estimation of the mean vector θ of a p-variate normal distribution with unknown covariance matrix Σ when it is suspected that, for a p×r known matrix B, the hypothesis θ=Bη with η∈R^r may hold. We consider empirical Bayes estimators that include (i) the unrestricted unbiased estimator (UE), namely the sample mean vector, (ii) the restricted estimator (RE), obtained when the hypothesis θ=Bη holds, (iii) the preliminary test estimator (PTE), (iv) the James-Stein estimator (JSE), and (v) the positive-rule Stein estimator (PRSE). The biases and the risks under squared loss are evaluated for all five estimators and compared. The numerical computations show that the PRSE is the best of the five even when the hypothesis θ=Bη is true.
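A sketch of the RE, JSE and PRSE under the simplifying assumption of a known identity covariance (the paper treats unknown Σ); the shrinkage toward the subspace {θ=Bη} follows the standard James-Stein form.

```python
import numpy as np

def stein_estimators(y, B):
    """Shrinkage toward the subspace {theta = B @ eta}, identity covariance.
    Returns the restricted, James-Stein and positive-part Stein estimates."""
    p, r = B.shape
    P = B @ np.linalg.solve(B.T @ B, B.T)        # projection onto col(B)
    re = P @ y                                   # restricted estimator
    resid = y - re
    shrink = 1.0 - (p - r - 2) / (resid @ resid)
    js = re + shrink * resid                     # James-Stein
    prs = re + max(shrink, 0.0) * resid          # positive-rule Stein
    return re, js, prs

rng = np.random.default_rng(5)
p, r = 10, 2
B = rng.normal(size=(p, r))
theta = B @ np.array([1.0, -2.0])                # the hypothesis holds here
y = theta + rng.normal(size=p)
print(stein_estimators(y, B))
```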

9.
Model identification and discrimination are two major statistical challenges. In this paper we consider a set of models Mk for factorial experiments with parameters representing the general mean, the main effects, and only k of all the two-factor interactions. We consider the class D of all fractional factorial plans with the same number of runs that can identify all models in Mk, i.e., that have full estimation capacity. The fractional factorial plans in D with full estimation capacity for k≥2 are able to discriminate between models in Mu for u≤k*, where k*=k/2 when k is even and k*=(k-1)/2 when k is odd. We obtain fractional factorial plans in D satisfying the six optimality criterion functions AD, AT, AMCR, GD, GT, and GMCR for 2^m factorial experiments with m=4 and 5. Both single-stage and multi-stage (hierarchical) designs are given, and some results on the estimation capacity of a fractional factorial plan for identifying models in Mk are presented. Our designs D4.1 and D10 stand out relative to the designs given in Li and Nachtsheim [Model-robust factorial designs, Technometrics 42(4) (2000) 345-352] for m=4 and 5 with respect to the criterion functions AD, AT, AMCR, GD, GT, and GMCR. Our design D4.2 stands out relative to the Li-Nachtsheim design for m=4 with respect to the four criterion functions AT, AMCR, GT, and GMCR, while the Li-Nachtsheim design for m=4 stands out relative to D4.2 with respect to AD and GD. Our design D14 has full estimation capacity for k=5, whereas the twelve-run Li-Nachtsheim design does not.

10.
Recent advances in the transformation model have made it possible to use this model for analyzing a variety of censored survival data. For inference on the regression parameters, semiparametric procedures based on the normal approximation are available, but their accuracy can be quite low under heavy censoring. In this paper, we apply an empirical likelihood ratio method and derive its limiting distribution via U-statistics. We obtain confidence regions for the regression parameters and compare the proposed method with the normal approximation based method in terms of coverage probability. The simulation results demonstrate that the proposed empirical likelihood method substantially overcomes the under-coverage problem and outperforms the normal approximation based method. The proposed method is illustrated with a real data example. Finally, our method applies to general U-statistic type estimating equations.
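The empirical likelihood machinery, shown here in its simplest form (a confidence statement for a mean, after Owen) rather than for the paper's U-statistic estimating equations; function names are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def el_logratio(x, mu):
    """-2 log empirical likelihood ratio for the mean mu.  The Lagrange
    multiplier lam solves sum((x_i-mu)/(1+lam*(x_i-mu))) = 0; the implied
    weights are p_i = 1/(n*(1+lam*(x_i-mu)))."""
    z = x - mu                                   # mu must lie inside the data range
    lo = (-1.0 + 1e-10) / z.max()                # keep 1 + lam*z_i > 0 for all i
    hi = (-1.0 + 1e-10) / z.min()
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return 2.0 * np.sum(np.log(1.0 + lam * z))

rng = np.random.default_rng(6)
x = rng.exponential(size=80)
# A 95% EL confidence region is {mu : el_logratio(x, mu) <= chi2_{1, 0.95}}.
print("-2 log ELR at the true mean 1.0:", el_logratio(x, 1.0))
```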

11.
As a useful tool in functional data analysis, the functional linear regression model has become increasingly common and has been studied extensively in recent years. In this paper, we consider a sparse functional linear regression model in which the coefficient function is generated by a finite number of basis functions in an expansion. The model does not specify how many or which basis functions enter, so it is unlike a typical parametric model in which the predictor variables are pre-specified. We study a general framework yielding procedures that successfully identify the basis functions entering the model and estimate the resulting regression coefficients in one step. We adopt the idea of variable selection in linear regression, adding a weighted L1 penalty to the traditional least squares criterion. We show that the procedures in our general framework are consistent in the sense of selecting the model correctly, and that they enjoy the oracle property: the resulting estimators of the coefficient function asymptotically share the properties of the oracle estimator, which uses knowledge of the underlying model. We investigate and compare several methods within the general framework via a simulation study, and apply them to the Canadian weather data.
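A rough sketch of the one-step selection-and-estimation idea, assuming the functional predictor is reduced to basis scores and using a plain (rather than weighted) L1 penalty via scikit-learn; the basis choice, grid and tuning value are all illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, m, K = 150, 100, 12                           # curves, grid points, basis size
t = np.linspace(0.0, 1.0, m)

# Sine basis; the true coefficient function uses only 2 of the K terms.
basis = np.array([np.sin((k + 1) * np.pi * t) for k in range(K)])   # K x m
beta_true = np.zeros(K); beta_true[[0, 3]] = [2.0, -1.0]

X_curves = rng.normal(size=(n, m))               # discretized functional predictors
# Z[i, k] approximates the integral of X_i(t) * basis_k(t) dt on the grid.
Z = X_curves @ basis.T / m
y = Z @ beta_true + 0.05 * rng.normal(size=n)

fit = Lasso(alpha=0.001).fit(Z, y)               # L1-penalized least squares
print("selected basis functions:", np.nonzero(fit.coef_)[0])
```

The paper's weighted penalty could be emulated by rescaling the columns of Z with data-driven weights before the fit.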

12.
In this paper, we propose a new methodology for PCA in high-dimension, low-sample-size (HDLSS) data situations. The idea is to estimate eigenvalues via the singular values of a cross data matrix. We establish consistency of the eigenvalue estimates, as well as their limiting distribution, when the dimension d and the sample size n both grow to infinity in such a way that n is much smaller than d. We apply the methodology to estimating PC directions and PC scores in HDLSS settings, and use the results in a mixture model to classify a dataset into two clusters. We demonstrate the performance of the new methodology using HDLSS data from a microarray study of prostate cancer.
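A sketch in the spirit of the cross-data-matrix idea: split the sample in two, form the cross data matrix, and read eigenvalue estimates off its singular values. The exact centering and scaling conventions below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 2000, 40                                  # HDLSS: d >> n
spikes = np.array([25.0, 10.0])                  # two spiked eigenvalues
U = np.linalg.qr(rng.normal(size=(d, 2)))[0]     # orthonormal spike directions
X = U * np.sqrt(spikes) @ rng.normal(size=(2, n)) + rng.normal(size=(d, n))

# Split the sample and form the (centered) cross data matrix.
X1, X2 = X[:, : n // 2], X[:, n // 2 :]
X1 = X1 - X1.mean(axis=1, keepdims=True)
X2 = X2 - X2.mean(axis=1, keepdims=True)
n1, n2 = X1.shape[1], X2.shape[1]
Sd = X1.T @ X2 / np.sqrt((n1 - 1) * (n2 - 1))    # n1 x n2 cross data matrix

sv = np.linalg.svd(Sd, compute_uv=False)
print("leading singular values (spiked-eigenvalue estimates):", sv[:3])
```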

13.
We find the asymptotic distribution of the OLS estimator of the parameters β and ρ in the mixed spatial model with exogenous regressors Yn=Xnβ+ρWnYn+Vn. The exogenous regressors may be bounded or growing, like polynomial trends. The assumption on the spatial matrix Wn is appropriate when each economic agent is influenced by many others. The error term is a short-memory linear process. The key finding is that in general the asymptotic distribution contains both linear and quadratic forms in standard normal variables and is not normal.
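A small simulation of the model and the OLS estimator in question, assuming a dense row-normalized Wn (every agent influenced equally by all others) and a trend regressor; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
n, beta, rho = 400, 1.0, 0.4

# Dense row-normalized spatial weights: each agent influenced by all others.
W = np.full((n, n), 1.0 / (n - 1))
np.fill_diagonal(W, 0.0)

X = np.column_stack([np.ones(n), np.arange(n) / n])   # growing (trend) regressor
V = rng.normal(size=n)                                # i.i.d. stands in for a
Y = np.linalg.solve(np.eye(n) - rho * W, X @ [0.5, beta] + V)  # linear process

# OLS of Y on (X, W @ Y): the estimator whose asymptotics the paper derives.
Z = np.column_stack([X, W @ Y])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print("OLS estimates (intercept, beta, rho):", coef)
```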

14.
Robust Bayesian analysis is concerned with the problem of making decisions about some future observation or an unknown parameter when the prior distribution belongs to a class Γ instead of being specified exactly. In this paper, the problem of robust Bayesian prediction and estimation under a squared log error loss function is considered. We find the posterior regret Γ-minimax predictor and estimator in a general class of distributions. Furthermore, we construct the conditional Γ-minimax, most stable and least sensitive prediction and estimation in a gamma model. A prequential analysis is carried out using a simulation study to compare these predictors.
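Under squared log error loss L(θ,d)=(log d − log θ)², the Bayes estimate is exp(E[log θ | data]); a sketch for a gamma posterior, where this expectation is available through the digamma function. The prior and data below are illustrative.

```python
import numpy as np
from scipy.special import digamma

def bayes_sle_gamma(a, b):
    """Bayes estimate of theta under squared log error loss when the
    posterior is Gamma(shape=a, rate=b): the loss is minimized at
    d = exp(E[log theta | data]) = exp(psi(a)) / b."""
    return np.exp(digamma(a)) / b

# Exponential likelihood with a Gamma(a0, b0) prior: after observing
# x_1..x_n the posterior is Gamma(a0 + n, b0 + sum(x)).
a0, b0 = 2.0, 1.0
x = np.array([0.8, 1.3, 0.5, 2.1])
a_post, b_post = a0 + len(x), b0 + x.sum()
print("posterior mean:", a_post / b_post,
      " squared-log-error Bayes estimate:", bayes_sle_gamma(a_post, b_post))
```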

15.
Let F be a distribution function in the maximal domain of attraction of the Gumbel distribution such that −log(1−F(x))=x^{1/θ}L(x) for a positive real number θ, called the Weibull tail index, and a slowly varying function L. It is well known that estimators of θ have a very slow rate of convergence. We establish a sharp optimality result in the minimax sense, treating L as an infinite-dimensional nuisance parameter belonging to some functional class. We also establish the rate-optimal asymptotic property of a data-driven choice of the sample fraction used for estimation.
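One Hill-type estimator of the Weibull tail index, in the spirit of Girard (2004); the exact normalization below is quoted from memory and should be treated as an assumption, not as the paper's estimator.

```python
import numpy as np

def weibull_tail_index(x, k):
    """Hill-type estimate of the Weibull tail index from the k largest
    order statistics: mean log-spacings of the data divided by mean
    log-log spacings of the corresponding ranks (assumed form)."""
    xs = np.sort(x)
    n = len(xs)
    top = np.log(xs[n - k:]) - np.log(xs[n - k - 1])
    ranks = np.log(np.log(n / np.arange(1, k + 1))) - np.log(np.log(n / k))
    return top.sum() / ranks.sum()

rng = np.random.default_rng(10)
x = rng.exponential(size=2000)                   # exponential tail: theta = 1
print("theta-hat:", weibull_tail_index(x, k=100))
```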

16.
In this paper we aim to construct adaptive confidence regions for the direction of ξ in semiparametric models of the form Y=G(ξ^TX,ε), where G(⋅) is an unknown link function, ε is an independent error, and ξ is a p_n×1 vector. To recover the direction of ξ, we first propose an inverse regression approach that works regardless of the link function G(⋅); to construct a data-driven confidence region for the direction, we employ the empirical likelihood method. Unlike much of the existing literature, we need not estimate the link function G(⋅) or its derivative. When p_n remains fixed, the empirical likelihood ratio without bias correction is asymptotically standard chi-square. Moreover, the asymptotic normality of the empirical likelihood ratio holds even when the dimension p_n grows at the rate p_n=o(n^{1/4}), where n is the sample size. Simulation studies assess the performance of our proposal, and a real data set is analyzed for further illustration.
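A generic inverse-regression step of the kind the paper builds on, sketched here as sliced inverse regression (SIR); the paper's own proposal and its empirical likelihood calibration are not reproduced.

```python
import numpy as np

def sir_direction(X, y, n_slices=10):
    """Sliced inverse regression estimate of the direction of xi in
    Y = G(xi'X, eps): top eigenvector of the between-slice covariance
    of standardized X, mapped back to the original scale."""
    n, p = X.shape
    mu, S = X.mean(axis=0), np.cov(X, rowvar=False)
    R = np.linalg.cholesky(np.linalg.inv(S))     # R @ R.T = S^{-1}
    Z = (X - mu) @ R                             # standardized predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)                  # slice mean of Z
        M += len(idx) / n * np.outer(m, m)
    w = np.linalg.eigh(M)[1][:, -1]              # leading eigenvector
    xi = R @ w
    return xi / np.linalg.norm(xi)               # direction only (sign-free)

rng = np.random.default_rng(11)
n, p = 500, 6
X = rng.normal(size=(n, p))
xi = np.zeros(p); xi[0] = 1.0
y = (X @ xi) ** 3 + 0.5 * rng.normal(size=n)     # unknown monotone link G
print("estimated direction:", np.round(sir_direction(X, y), 2))
```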

17.
Parallel to Cox's [JRSS B 34 (1972) 187-230] proportional hazards model, generalized logistic models have been discussed by Anderson [Bull. Int. Statist. Inst. 48 (1979) 35-53] and others. The essential assumption is that the ratio of the two densities has a known parametric form. A nice property of this model is that it relates naturally to the logistic regression model for categorical data. In astronomical, demographic, epidemiological, and other studies the variable of interest is often truncated by an associated variable. This paper studies generalized logistic models for the two-sample truncated data problem, where the ratio of the two lifetime densities is assumed to have the form exp{α+φ(x;β)}. Here φ is a known function of x and β, and the baseline density is unspecified. We develop a semiparametric maximum likelihood method for the case where the two samples have a common truncation distribution, and show that inferences for β do not depend on the nonparametric components. We also derive an iterative algorithm that maximizes the semiparametric likelihood in the general case where different truncation distributions are allowed, and discuss how to check the goodness of fit of the generalized logistic model. The methods are illustrated and evaluated using both simulated and real data.
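When φ(x;β)=βx and there is no truncation, the density ratio model can be fitted by ordinary logistic regression of the sample label on x: the slope estimates β while the intercept absorbs α and the sampling fractions. A minimal sketch of that simpler untruncated case, not the paper's semiparametric likelihood.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(12)
# Two samples whose density ratio is exp{alpha + beta*x}:
# N(1,1) over N(0,1) gives beta = 1 (and alpha = -1/2).
x0 = rng.normal(0.0, 1.0, size=1000)
x1 = rng.normal(1.0, 1.0, size=1000)

x = np.concatenate([x0, x1]).reshape(-1, 1)
label = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])

fit = LogisticRegression(C=1e6).fit(x, label)    # near-unpenalized fit
print("beta-hat:", fit.coef_[0][0])              # should be close to 1
```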

18.
Model selection by means of the predictive least squares (PLS) principle has been thoroughly studied in the context of regression model selection and autoregressive (AR) model order estimation. We introduce a new criterion based on sequentially minimized squared deviations, which are smaller than both the usual least squares residuals and the squared prediction errors used in PLS. We also prove that our criterion has a probabilistic interpretation as a model that is asymptotically optimal within the given class of distributions, attaining the lower bound on the logarithmic prediction errors given by the so-called stochastic complexity and approximated by BIC. This holds when the regressor (design) matrix is non-random or determined by the observed data, as in AR models. The advantages of the criterion include that it can be evaluated efficiently and exactly, without asymptotic approximations, and, importantly, that it has no adjustable hyper-parameters, which makes it applicable to both small and large amounts of data.
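A direct (non-recursive) sketch contrasting the PLS criterion with sequentially minimized squared deviations; the paper evaluates the latter efficiently, whereas this illustration simply refits by least squares at every step. Names are illustrative.

```python
import numpy as np

def pls_and_smsd(y, X, m0):
    """Predictive least squares (PLS) and sequentially minimized squared
    deviations (SMSD) for a linear model, accumulated from time m0 on.
    PLS uses the fit from data up to t-1; SMSD refits including time t,
    so each SMSD term is no larger than the corresponding PLS term."""
    pls = smsd = 0.0
    for t in range(m0, len(y)):
        b_prev, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        pls += (y[t] - X[t] @ b_prev) ** 2        # honest one-step prediction
        b_curr, *_ = np.linalg.lstsq(X[: t + 1], y[: t + 1], rcond=None)
        smsd += (y[t] - X[t] @ b_curr) ** 2       # deviation after refitting
    return pls, smsd

rng = np.random.default_rng(13)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
print(pls_and_smsd(y, X, m0=10))                  # SMSD <= PLS, term by term
```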

19.
In this paper we develop an econometric method for consistent variable selection in the context of a linear factor model with observable factors for panels of large dimensions. The subset of factors that best fits the data is determined sequentially. First, a partial R² rule is used to show the existence of an optimal ordering of the candidate variables. Second, we show that for a given ordering of the regressors, the number of factors can be consistently estimated using the Bayes information criterion, whereas the Akaike information criterion asymptotically overfits the model. The theory is established under an approximate factor structure that allows limited cross-section and serial dependence in the idiosyncratic term. Simulations show that the proposed two-step selection technique has good finite sample properties; the likelihood of selecting the correct specification increases with the number of cross-sections both asymptotically and in small samples. Moreover, the proposed variable selection method is computationally attractive: for K potential candidate factors, the search requires only 2K regressions, compared to 2^K for an exhaustive search.
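A toy two-step selection, assuming a marginal R² ranking in place of the paper's partial R² rule; this keeps the total work at 2K regressions (K for the ordering, K along the BIC path). All function names are illustrative.

```python
import numpy as np

def order_by_r2(y, F):
    """Rank candidate factors by marginal R^2: one simple regression
    (equivalently, one squared correlation) per candidate."""
    scores = [np.corrcoef(F[:, j], y)[0, 1] ** 2 for j in range(F.shape[1])]
    return list(np.argsort(scores)[::-1])

def select_num_factors(y, F, order):
    """BIC along the nested path implied by the ordering; its argmin
    estimates how many of the ordered factors to keep."""
    n = len(y)
    bics = []
    for k in range(1, len(order) + 1):
        Z = np.column_stack([np.ones(n), F[:, order[:k]]])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = ((y - Z @ b) ** 2).sum()
        bics.append(n * np.log(rss / n) + (k + 1) * np.log(n))
    return int(np.argmin(bics)) + 1

rng = np.random.default_rng(14)
n, K = 300, 6
F = rng.normal(size=(n, K))
y = F[:, 0] - 0.8 * F[:, 2] + rng.normal(size=n)  # two true factors

order = order_by_r2(y, F)                         # K regressions
print("ordering:", order, "| kept:", select_num_factors(y, F, order))
```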

20.
In some applications of kernel density estimation the data may have a highly non-uniform distribution and be confined to a compact region. Standard fixed-bandwidth density estimates can struggle to cope with the spatially variable smoothing requirements and suffer excessive bias at the boundary of the region. While adaptive kernel estimators can address the first of these issues, the study of boundary kernel methods has been restricted to the fixed-bandwidth context. We propose a new linear boundary kernel that reduces the asymptotic order of the bias of an adaptive density estimator at the boundary and is simple to implement even on an irregular boundary. The properties of this adaptive boundary kernel are examined theoretically. In particular, we demonstrate that the asymptotic performance of the density estimator is maintained when the adaptive bandwidth is defined in terms of a pilot estimate rather than the true underlying density. We examine the finite-sample performance numerically through analysis of simulated and real data sets.
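A sketch of the adaptive-bandwidth half of the construction, using Abramson-style bandwidths driven by a pilot estimate; the paper's linear boundary kernel, which corrects the remaining bias at the boundary, is not reproduced here.

```python
import numpy as np

def gauss_kde(x_eval, data, h):
    """Fixed-bandwidth Gaussian KDE (used here as the pilot estimate)."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def adaptive_kde(x_eval, data, h0):
    """Abramson-type adaptive KDE: the bandwidth at data point i is
    h0 * (pilot(x_i)/g)^(-1/2), with g the geometric mean of the pilot,
    so sparse regions get wider kernels and dense regions narrower ones."""
    pilot = gauss_kde(data, data, h0)
    g = np.exp(np.log(pilot).mean())
    h_i = h0 * np.sqrt(g / pilot)                 # per-point bandwidths
    u = (x_eval[:, None] - data[None, :]) / h_i[None, :]
    return (np.exp(-0.5 * u**2) / h_i[None, :]).sum(axis=1) / (
        len(data) * np.sqrt(2 * np.pi))

rng = np.random.default_rng(15)
data = rng.exponential(size=500)                  # highly non-uniform, x >= 0
grid = np.linspace(0.01, 5.0, 200)
print(adaptive_kde(grid, data, h0=0.3)[:5])
```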
