Similar literature
 20 similar documents found (search time: 31 ms)
1.
Sufficient Dimension Reduction (SDR) in regression comprises the estimation of the dimension of the smallest (central) dimension reduction subspace and its basis elements. For SDR methods based on a kernel matrix, such as SIR and SAVE, the dimension estimation is equivalent to the estimation of the rank of a random matrix which is the sample-based estimate of the kernel. A test for the rank of a random matrix amounts to testing how many of its eigenvalues or singular values are equal to zero. We propose two tests based on the smallest eigenvalues or singular values of the estimated matrix: an asymptotic weighted chi-square test and a Wald-type asymptotic chi-square test. We also provide an asymptotic chi-square test for assessing whether elements of the left singular vectors of the random matrix are zero. These methods together constitute a unified approach for all SDR methods based on a kernel matrix that covers estimation of the central subspace and its dimension, as well as assessment of variable contribution to the lower-dimensional predictor projections, with variable selection as a special case. A small power simulation study shows that the proposed and existing tests, specific to each SDR method, perform similarly with respect to power and achievement of the nominal level. Also, the importance of the choice of the number of slices as a tuning parameter is further exhibited.

2.
Risk management technology applied to high-dimensional portfolios needs simple and fast methods for calculation of value at risk (VaR). The multivariate normal framework provides a simple off-the-shelf methodology but lacks the heavy-tailed distributional properties that are observed in data. A principal component-based method (tied closely to the elliptical structure of the distribution) is therefore expected to be unsatisfactory. Here, we propose and analyze a technology that is based on independent component analysis (ICA). We study the proposed ICVaR methodology in an extensive simulation study and apply it to a high-dimensional portfolio situation. Our analysis yields very accurate VaRs.

3.
Inference about the difference between two normal mean vectors when the covariance matrices are unknown and arbitrary is considered. Assuming that the incomplete data are of monotone pattern, a pivotal quantity, similar to the Hotelling T² statistic, is proposed. A satisfactory moment approximation to the distribution of the pivotal quantity is derived. Hypothesis testing and confidence estimation based on the approximate distribution are outlined. The accuracy of the approximation is investigated using Monte Carlo simulation. Monte Carlo studies indicate that the approximate method is very satisfactory even for moderately small samples. The proposed methods are illustrated using an example.

4.
We propose a criterion for variable selection in discriminant analysis. This criterion makes it possible to arrange the variables in decreasing order of adequacy for discrimination, so that the variable selection problem reduces to that of estimating a suitable permutation and dimensionality. Estimators for these parameters are then proposed, and the resulting method for selecting variables is shown to be consistent. In a simulation study, we compute proportions of correct classification after variable selection in order to gain understanding of the performance of our proposal and to compare it to existing methods.

5.
The sample-based rule obtained from the Bayes classification rule by replacing the unknown parameters by ML estimates from a stratified training sample is used for the classification of a random observation X into one of L populations. The asymptotic expansions in terms of the inverses of the training sample sizes for cross-validation, apparent and plug-in error rates are found. These are used to compare estimation methods of the error rate for a wide range of regular distributions as probability models for the considered populations. The optimal training sample allocation minimizing the asymptotic expected error regret is found in the cases of widely applicable, positively skewed distributions (Rayleigh and Maxwell distributions). These probability models for populations are often met in ecology and biology. The results indicate that equal training sample sizes for each population are sometimes not optimal, even when prior probabilities of populations are equal.

6.
It is natural to assume that a missing-data mechanism depends on latent variables in the analysis of incomplete data in latent variate modeling because latent variables are error-free and represent key notions investigated by applied researchers. Unfortunately, the missing-data mechanism is then not missing at random (NMAR). In this article, a new estimation method is proposed, which leads to consistent and asymptotically normal estimators for all parameters in a linear latent variate model, where the missing mechanism depends on the latent variables and no concrete functional form for the missing-data mechanism is used in estimation. The proposed method is a type of multi-sample analysis with or without mean structures, and hence it is easy to implement. Complete-case analysis is shown to produce consistent estimators for some important parameters in the model.

7.
Clustering and classification are important tasks for the analysis of microarray gene expression data. Classification of tissue samples can be a valuable diagnostic tool for diseases such as cancer. Clustering samples or experiments may lead to the discovery of subclasses of diseases. Clustering genes can help identify groups of genes that respond similarly to a set of experimental conditions. We also need validation tools for clustering and classification. Here, we focus on the identification of outliers, that is, units that may have been misallocated or mislabeled, or that are not representative of the classes or clusters. We present two new methods, DDclust and DDclass, for clustering and classification. These non-parametric methods are based on the intuitively simple concept of data depth. We apply the methods to several gene expression and simulated data sets. We also discuss a convenient visualization and validation tool: the relative data depth plot.

8.
For qualitative data models, the Gini-Simpson index and Shannon entropy are commonly used for statistical analysis. In the context of high-dimensional low-sample size (HDLSS) categorical models, abundant in genomics and bioinformatics, the Gini-Simpson index, as extended to Hamming distance in a pseudo-marginal setup, facilitates drawing suitable statistical conclusions. Under Lorenz ordering it is shown that Shannon entropy and its multivariate analogues proposed here appear to be more informative than the Gini-Simpson index. The nested subset monotonicity prospect along with subgroup decomposability of some proposed measures are exploited. The usual jackknifing (or bootstrapping) methods may not work well for HDLSS constrained models. Hence, we consider a permutation method incorporating the union-intersection (UI) principle and the Chen-Stein theorem to formulate suitable statistical hypothesis testing procedures for gene classification. Some applications are included as illustration.
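
The two univariate measures named in this abstract can be illustrated with a minimal sketch. It assumes the standard definitions, Gini-Simpson index 1 - sum(p_k^2) and Shannon entropy -sum(p_k log p_k), computed from observed category proportions; the function names and the nucleotide sample are hypothetical, and the multivariate analogues and permutation tests of the paper are not reproduced.

```python
import math
from collections import Counter


def category_proportions(labels):
    """Return the relative frequency of each category observed in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def gini_simpson(labels):
    """Gini-Simpson index: 1 - sum(p_k^2) over the observed categories."""
    return 1.0 - sum(p * p for p in category_proportions(labels))


def shannon_entropy(labels):
    """Shannon entropy in nats: -sum(p_k * log(p_k))."""
    return -sum(p * math.log(p) for p in category_proportions(labels))


# Hypothetical sample of nucleotide labels at a single position.
sample = ["A", "A", "C", "G"]
print(gini_simpson(sample))
print(shannon_entropy(sample))
```

The Gini-Simpson index equals the probability that two independent draws from the estimated distribution fall in different categories, which is the single-position case of a Hamming distance.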

9.
In this paper, we propose auto-associative (AA) models to generalize principal component analysis (PCA). AA models have been introduced in data analysis from a geometrical point of view. They are based on the approximation of the scatter plot of the observations by a differentiable manifold. In this paper, they are interpreted as projection pursuit models adapted to the auto-associative case. Their theoretical properties are established and are shown to extend those of PCA. An iterative construction algorithm is proposed and its principle is illustrated on both simulated data and real data from image analysis.

10.
In this article, the Stein-Haff identity is established for a singular Wishart distribution with a positive definite mean matrix but with the dimension larger than the degrees of freedom. This identity is then used to obtain estimators of the precision matrix improving on the estimator based on the Moore-Penrose inverse of the Wishart matrix under the Efron-Morris loss function and its variants. Ridge-type empirical Bayes estimators of the precision matrix are also given and their dominance properties over the usual one are shown using this identity. Finally, these precision estimators are used in a quadratic discriminant rule, and it is shown through simulation that discriminant methods based on the ridge-type empirical Bayes estimators provide higher correct classification rates.

11.
We present a new parametric model for the angular measure of a multivariate extreme value distribution. Unlike many parametric models that are limited to the bivariate case, the flexible model can describe the extremes of random vectors of dimension greater than two. The novel construction method relies on a geometric interpretation of the requirements of a valid angular measure. An advantage of this model is that its parameters directly affect the level of dependence between each pair of components of the random vector, and as such the parameters of the model are more interpretable than those of earlier parametric models for multivariate extremes. The model is applied to air quality data and simulated spatial data.

12.
Spearman’s rank-correlation coefficient (also called Spearman’s rho) represents one of the best-known measures to quantify the degree of dependence between two random variables. As a copula-based dependence measure, it is invariant with respect to the distribution’s univariate marginal distribution functions. In this paper, we consider statistical tests for the hypothesis that all pairwise Spearman’s rank correlation coefficients in a multivariate random vector are equal. The tests are nonparametric and their asymptotic distributions are derived based on the asymptotic behavior of the empirical copula process. Only weak assumptions on the distribution function, such as continuity of the marginal distributions and continuous partial differentiability of the copula, are required for obtaining the results. A nonparametric bootstrap method is suggested for either estimating unknown parameters of the test statistics or for determining the associated critical values. We present a simulation study in order to investigate the power of the proposed tests. The results are compared to a classical parametric test for equal pairwise Pearson’s correlation coefficients in a multivariate random vector. The general setting also allows the derivation of a test for stochastic independence based on Spearman’s rho.
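
The quantities compared by such tests are the pairwise sample values of Spearman's rho. The sketch below computes them as the Pearson correlation of average ranks, which is the usual sample definition; it is only an illustration with hypothetical helper names and data, and it does not implement the tests, their asymptotic distributions, or the bootstrap described in the abstract.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Sample Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(columns):
    """Map each pair of component names to its sample Spearman's rho."""
    return {
        (a, b): spearman_rho(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical three-dimensional sample, one list per component.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 9.0, 16.5, 24.0],  # increasing in x1
    "x3": [5.0, 4.0, 3.0, 2.0, 1.0],    # decreasing in x1
}
rhos = pairwise_spearman(data)
print(rhos)
```

Because only ranks enter the computation, applying a strictly increasing transformation to any component leaves the result unchanged, which reflects the invariance to the marginal distributions noted in the abstract.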

13.
A new nonparametric approach to the problem of testing the joint independence of two or more random vectors in arbitrary dimension is developed based on a measure of association determined by interpoint distances. The population independence coefficient takes values between 0 and 1, and equals zero if and only if the vectors are independent. We show that the corresponding statistic has a finite limit distribution if and only if the two random vectors are independent; thus we have a consistent test for independence. The coefficient is an increasing function of the absolute value of product moment correlation in the bivariate normal case, and coincides with the absolute value of correlation in the Bernoulli case. A simple modification of the statistic is affine invariant. The independence coefficient and the proposed statistic both have a natural extension to testing the independence of several random vectors. Empirical performance of the test is illustrated via a comparative Monte Carlo study.

14.
A multivariate dispersion ordering based on random simplices is proposed in this paper. Given an R^d-valued random vector, we consider two random simplices determined by the convex hulls of two independent random samples of size d+1 of the vector. By means of the stochastic comparison of the Hausdorff distances between such simplices, a multivariate dispersion ordering is introduced. The main properties of the new ordering are studied. Relationships with other dispersion orderings are considered, placing emphasis on the univariate version. Some statistical tests for the new order are proposed. An application of such ordering to the clinical evaluation of human corneal endothelia is provided. Different analyses are included using an image database of human corneal endothelia.

15.
A new multivariate dispersion ordering based on the Hausdorff distance between nonempty convex compact sets is proposed. This dispersion ordering depends on an index, whose purpose is to blur for each random vector the ball centered at its expected value, and with a radius equal to the index. So, on the basis of such an index, we consider a random set associated with each random vector and dispersion comparisons are established by means of the Hausdorff distance associated with the random sets. Different properties of the new dispersion ordering are stated as well as some characterization theorems. Possible relationships with other dispersion orderings are also studied. Finally, several examples are developed.

16.
Our aim is to construct a factor analysis method that can resist the effect of outliers. For this we start with a highly robust initial covariance estimator, after which the factors can be obtained from maximum likelihood or from principal factor analysis (PFA). We find that PFA based on the minimum covariance determinant scatter matrix works well. We also derive the influence function of the PFA method based on either the classical scatter matrix or a robust matrix. These results are applied to the construction of a new type of empirical influence function (EIF), which is very effective for detecting influential data. To facilitate the interpretation, we compute a cutoff value for this EIF. Our findings are illustrated with several real data examples.

17.
A general depth measure, based on the use of one-dimensional linear continuous projections, is proposed. The applicability of this idea in different statistical setups (including inference in functional data analysis, image analysis and classification) is discussed. Special emphasis is placed on the possible usefulness of this method in some statistical problems where the data are elements of a Banach space. The asymptotic properties of the empirical approximation of the proposed depth measure are investigated. In particular, its asymptotic distribution is obtained through U-statistics techniques. The practical aspects of these ideas are discussed through a small simulation study and a real-data example.

18.
A recently proposed method for the pairwise comparison of arbitrary independent random variables results in a probabilistic relation. When restricted to discrete random variables uniformly distributed on finite multisets of numbers, this probabilistic relation expresses the winning probabilities between pairs of hypothetical dice that carry these numbers and exhibits a particular type of transitivity called dice-transitivity. When these multisets have equal cardinality, two alternative methods for statistically comparing the ordered lists of the numbers on the faces of the dice have been studied recently: the comonotonic method, based upon the comparison of the numbers of the same rank when the lists are in increasing order, and the countermonotonic method, also based upon the comparison of only numbers of the same rank but with the lists in opposite order. In terms of the discrete random variables associated with these lists, these methods each turn out to be related to a particular copula that joins the marginal cumulative distribution functions into a bivariate cumulative distribution function. The transitivity of the generated probabilistic relation has been completely characterized. In this paper, the list comparison methods are generalized for the purpose of comparing arbitrary random variables. The transitivity properties derived in the case of discrete uniform random variables are shown to be generic. Additionally, it is shown that for a collection of normal random variables, both comparison methods lead to a probabilistic relation that is at least moderately stochastic transitive.
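
The three ways of comparing two dice described in this abstract can be sketched as follows. The sketch assumes that a tie counts as half a win, a convention the abstract does not state; the function names and the three example dice are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def _score(pairs):
    """Share of pairs won by the first die, counting a tie as half a win (assumed)."""
    pairs = list(pairs)
    wins = sum(1 for a, b in pairs if a > b)
    ties = sum(1 for a, b in pairs if a == b)
    return Fraction(2 * wins + ties, 2 * len(pairs))


def _check_equal_size(a, b):
    """The rank-based methods are defined for lists of equal cardinality."""
    if len(a) != len(b):
        raise ValueError("both dice must have the same number of faces")


def independent_win(a, b):
    """Compare every face of `a` with every face of `b` (independent dice)."""
    return _score(product(a, b))


def comonotonic_win(a, b):
    """Compare faces of equal rank with both lists in increasing order."""
    _check_equal_size(a, b)
    return _score(zip(sorted(a), sorted(b)))


def countermonotonic_win(a, b):
    """Compare faces of equal rank with the second list in decreasing order."""
    _check_equal_size(a, b)
    return _score(zip(sorted(a), sorted(b, reverse=True)))


# Hypothetical dice forming a cycle under the independent comparison.
die_a = [2, 2, 4, 4, 9, 9]
die_b = [1, 1, 6, 6, 8, 8]
die_c = [3, 3, 5, 5, 7, 7]

print(independent_win(die_a, die_b))       # a against b
print(independent_win(die_b, die_c))       # b against c
print(independent_win(die_c, die_a))       # c against a
print(comonotonic_win(die_a, die_b))
print(countermonotonic_win(die_a, die_b))
```

Exact fractions are used so that winning probabilities can be compared without rounding. With these dice each independent comparison in the cycle favors the first die, which shows why weaker notions of transitivity are needed for such relations.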

19.
The paper presents a unified approach to local likelihood estimation for a broad class of nonparametric models, including, e.g., regression, density, Poisson and binary response models. The method extends the adaptive weights smoothing (AWS) procedure introduced in Polzehl and Spokoiny (2000) in the context of image denoising. The main idea of the method is to describe the largest possible local neighborhood of every design point X_i in which the local parametric assumption is justified by the data. The method is especially powerful for model functions having large homogeneous regions and sharp discontinuities. The performance of the proposed procedure is illustrated by numerical examples for density estimation and classification. We also establish some remarkable theoretical nonasymptotic results on properties of the new algorithm. This includes the "propagation" property, which in particular yields the root-n consistency of the resulting estimate in the homogeneous case. We also state an "oracle" result, which implies rate optimality of the estimate under usual smoothness conditions, and a "separation" result, which explains the sensitivity of the method to structural changes.

20.
This paper explains the differences between the densities and the Jacobians of the transforms of the same singular random matrices treated by several authors. Some comments on the results proposed by Srivastava [Singular Wishart and multivariate beta distributions, Ann. Statist. 31 (2003) 1537-1560] are presented. Definitions about a measure with respect to which a singular random matrix possesses a density are proposed. Finally two Jacobians of certain transforms under any of those measures are found.

