Similar literature
 20 similar documents found (search time: 15 ms)
1.
For a sequence of independent and identically distributed random vectors $X_i = (X_{i1}, \ldots, X_{ip})$, i=1,2,…,n, we consider the conditional ordering of these random vectors with respect to the magnitudes of $N(X_i)$, where N is a p-variate continuous function defined on the support set of $X_1$ and satisfying certain regularity conditions. We also consider Progressive Type II right censoring for multivariate observations using conditional ordering. The need for conditional ordering of random vectors arises, for example, in reliability analysis when a system has n independent components, each consisting of p arbitrarily dependent, parallel-connected elements. Let the vector of life lengths for the ith component of the system be $X_i = (X_{i1}, \ldots, X_{ip})$, where $X_{ij}$ denotes the life length of the jth element of the ith component. Then the first failure in the system occurs at time $\min_{1 \le i \le n} \max(X_{i1}, \ldots, X_{ip})$, and for this case $N(x_1, \ldots, x_p) = \max(x_1, \ldots, x_p)$. In this paper we introduce the conditionally ordered and Progressive Type II right-censored conditionally ordered statistics for multivariate observations and study their distributional properties.
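
The reliability example above can be sketched in a few lines. This is only an illustration, assuming that N takes the largest coordinate of a vector (which matches parallel-connected elements); the function names `conditionally_order` and `first_failure_time` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def conditionally_order(
    sample: Sequence[Vector],
    n_func: Callable[[Vector], float] = max,
) -> List[Vector]:
    """Sort the vectors by increasing value of n_func(vector).

    The default n_func=max is an assumption: a component made of
    parallel-connected elements fails when its last element fails.
    """
    return sorted(sample, key=n_func)


def first_failure_time(sample: Sequence[Vector]) -> float:
    """Time of the first component failure: the minimum over components
    of the maximum element life length."""
    return min(max(vector) for vector in sample)


# Three components, each with two elements (made-up life lengths).
lifetimes = [(3.0, 5.0), (2.0, 1.5), (4.0, 0.5)]
ordered = conditionally_order(lifetimes)
earliest = first_failure_time(lifetimes)
```

Here the vectors are ranked as whole units, so each ordered item keeps all of its coordinates rather than being sorted coordinate by coordinate.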

2.
3.
Clustering and classification are important tasks for the analysis of microarray gene expression data. Classification of tissue samples can be a valuable diagnostic tool for diseases such as cancer. Clustering samples or experiments may lead to the discovery of subclasses of diseases. Clustering genes can help identify groups of genes that respond similarly to a set of experimental conditions. We also need validation tools for clustering and classification. Here, we focus on the identification of outliers: units that may have been misallocated or mislabeled, or that are not representative of the classes or clusters. We present two new methods, DDclust and DDclass, for clustering and classification. These non-parametric methods are based on the intuitively simple concept of data depth. We apply the methods to several gene expression and simulated data sets. We also discuss a convenient visualization and validation tool, the relative data depth plot.

4.
In this paper, we propose a new methodology for PCA in high-dimension, low-sample-size (HDLSS) data situations. We introduce the idea of estimating eigenvalues via the singular values of a cross data matrix. We establish consistency properties of the eigenvalue estimator as well as its limiting distribution when the dimension d and the sample size n both grow to infinity in such a way that n is much lower than d. We apply the new methodology to estimating PC directions and PC scores in HDLSS data situations. We apply these findings to a mixture model to classify a dataset into two clusters. We demonstrate how the new methodology performs by using HDLSS data from a microarray study of prostate cancer.

5.
In this article, we consider the problem of testing a linear hypothesis in a multivariate linear regression model, which includes the case of testing the equality of mean vectors of several multivariate normal populations with common covariance matrix Σ, the so-called multivariate analysis of variance or MANOVA problem. We treat the case in which there are fewer observations than the dimension of the random vectors. Two tests are proposed, and their asymptotic distributions under the hypothesis as well as under the alternatives are given under some mild conditions. A theoretical comparison of the powers of the two tests is made.

6.
In this article, we propose a new estimation methodology for PCA with high-dimension, low-sample-size (HDLSS) data. We first show that HDLSS datasets have different geometric representations depending on whether a ρ-mixing-type dependency appears in the variables or not. When the ρ-mixing-type dependency appears in the variables, the HDLSS data converge to an n-dimensional surface of the unit sphere with increasing dimension. We pay special attention to this phenomenon. We propose a method called the noise-reduction methodology to estimate the eigenvalues of an HDLSS dataset. We show that the eigenvalue estimator has consistency properties, and we derive its limiting distribution in the HDLSS context. We consider consistency properties of PC directions. We apply the noise-reduction methodology to estimating PC scores. We also give an application to discriminant analysis for HDLSS datasets by using the inverse covariance matrix estimator induced by the noise-reduction methodology.

7.
A weighted multivariate signed-rank test is introduced for the analysis of multivariate clustered data. Observations in different clusters may then get different weights. The test provides a robust and efficient alternative to methods based on normal theory. Asymptotic theory is developed to find the approximate p-value as well as to calculate the limiting Pitman efficiency of the test. A conditionally distribution-free version of the test is also discussed. The finite-sample behavior of different versions of the test statistic is explored by simulations, and the new test is compared to the unweighted and weighted versions of Hotelling's $T^2$ test and the multivariate spatial sign test introduced in [D. Larocque, J. Nevalainen, H. Oja, A weighted multivariate sign test for cluster-correlated data, Biometrika 94 (2007) 267-283]. Finally, a real data example is used to illustrate the theory.

8.
We consider the problem of deriving the asymptotic distribution of the three commonly used multivariate test statistics, namely the likelihood ratio, Lawley-Hotelling and Bartlett-Nanda-Pillai statistics, for testing hypotheses on the various effects (main, nested or interaction) in multivariate mixed models. We derive the distributions of these statistics, in both the null and non-null cases, as the number of levels of one of the main effects (random or fixed) goes to infinity. The robustness of these statistics against departure from normality is assessed. Essentially, in the asymptotic spirit of this paper, both the hypothesis and error degrees of freedom tend to infinity at a fixed rate. It is intuitively appealing to consider asymptotics of this type because, for example, in random or mixed effects models, the levels of the main random factors are assumed to be a random sample from a large population of levels. For the asymptotic results of this paper to hold, we do not require any distributional assumption on the errors. That means the results can be used in real-life applications where the normality assumption is not tenable. As it happens, the asymptotic distributions of the three statistics are normal. The statistics have been found to be asymptotically null robust against departure from normality in balanced designs. The expressions for the asymptotic means and variances are fairly simple. That makes the results an attractive alternative to the standard asymptotic results. These statements are supported by the numerical results.

9.
In this paper, the problem of variable selection in classification is considered. On the basis of recent developments in model selection theory, we provide a criterion based on penalized empirical risk, where the penalization explicitly takes into account the number of variables of the considered models. Moreover, we give an oracle-type inequality that non-asymptotically guarantees the performance of the resulting classification rule. We discuss the optimality of the proposed criterion and present an application of the main result to backward and forward selection procedures.

10.
For high-dimensional data sets, the sample covariance matrix is usually unbiased but noisy if the sample is not large enough. Shrinking the sample covariance towards a constrained, low-dimensional estimator can be used to mitigate the sample variability. By doing so, we introduce bias but reduce variance. In this paper, we give details on feasible optimal shrinkage allowing for time-series-dependent observations.
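
The bias-variance trade-off described above can be illustrated with a linear shrinkage rule. The sketch below is a simplified stand-in, not the paper's procedure: it assumes a scaled-identity target (the average sample variance on the diagonal) and a shrinkage weight chosen by hand, and it does not compute the optimal weight or account for time series dependence. The function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sample_covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def shrink_towards_identity(cov: Matrix, weight: float) -> Matrix:
    """Return (1 - weight) * cov + weight * mu * I, where mu is the mean
    of the diagonal of cov. The target mu * I is an assumed choice."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [
        [
            (1.0 - weight) * cov[i][j] + (weight * mu if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


data = [(1.0, 2.0), (3.0, 6.0), (5.0, 10.0)]
cov = sample_covariance(data)
shrunk = shrink_towards_identity(cov, weight=0.5)
```

A weight of 0 returns the sample covariance unchanged, and a weight of 1 returns the target; values in between trade variance for bias.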

11.
The aim of this paper is to propose a simple method for evaluating the (approximate) distribution of matrix quadratic forms when Wishartness conditions do not hold. The method is based upon a factorization of a general Gaussian stochastic matrix as a special linear combination of nonstochastic matrices with the standard Gaussian matrix. An application of this result is proposed for matrix quadratic forms arising in MANOVA for a multivariate split-plot design with circular dependence structure.

12.
Two robustness criteria are presented that are applicable to general clustering methods. Robustness and stability in cluster analysis are not only data dependent, but even cluster dependent. In the present paper, robustness is defined as a property not only of the clustering method, but also of every individual cluster in a data set. The main principles are: (a) dissimilarity measurement of an original cluster with the most similar cluster in the induced clustering obtained by adding data points, (b) the dissolution point, which is an adaptation of the breakdown point concept to single clusters, (c) isolation robustness: given a clustering method, is it possible to join, by addition of g points, arbitrarily well separated clusters? Results are derived for k-means, k-medoids (k estimated by average silhouette width), trimmed k-means, mixture models (with and without noise component, with and without estimation of the number of clusters by BIC), single and complete linkage.
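
Principle (a) above compares an original cluster with its most similar cluster after data points are added. A minimal sketch follows, assuming the Jaccard coefficient as the similarity measure (the abstract does not name one) and a hypothetical dissolution threshold of 0.5; the function names are illustrative only.

```python
from typing import Iterable, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity |a & b| / |a | b| between two clusters,
    each given as a set of point indices."""
    if not a and not b:
        raise ValueError("at least one cluster must be non-empty")
    return len(a & b) / len(a | b)


def best_match_similarity(cluster: Set[int], new_clustering: Iterable[Set[int]]) -> float:
    """Similarity between an original cluster and its most similar
    cluster in the clustering obtained after adding points."""
    return max(jaccard(cluster, candidate) for candidate in new_clustering)


def is_dissolved(cluster: Set[int], new_clustering: Iterable[Set[int]],
                 threshold: float = 0.5) -> bool:
    """Treat the cluster as dissolved when no new cluster is more similar
    to it than the threshold (0.5 is an assumed cut-off)."""
    return best_match_similarity(cluster, new_clustering) <= threshold


original = {1, 2, 3}
after_adding_points = [{1, 2}, {3, 4, 5}]
similarity = best_match_similarity(original, after_adding_points)
```

In this toy case the original cluster keeps two of its three points together, so it is not counted as dissolved under the assumed threshold.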

13.
In this paper we propose a new test procedure for sphericity of the covariance matrix when the dimensionality, p, exceeds the sample size, N=n+1. Under the assumptions that (A) $0 < \operatorname{tr}\Sigma^i/p < \infty$ as $p \to \infty$ for i=1,…,16 and (B) $p/n \to c < \infty$, known as the concentration, a new statistic is developed utilizing the ratio of the fourth and second arithmetic means of the eigenvalues of the sample covariance matrix. The newly defined test has many desirable general asymptotic properties, such as normality and consistency when $(n,p) \to \infty$. Our simulation results show that the new test is comparable to, and in some cases more powerful than, the tests for sphericity in the current literature.
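
The ratio mentioned above can be illustrated with a naive plug-in computation. The sketch assumes the ratio takes the form a4 / a2^2, where ak is the arithmetic mean of the kth powers of the eigenvalues, computed as tr(S^k)/p so that no eigendecomposition is needed. It is not the paper's test statistic, which would require bias-corrected estimators and a normalization that the abstract does not give; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices of equal size."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def eigen_power_mean(cov: Matrix, k: int) -> float:
    """Arithmetic mean of the kth powers of the eigenvalues of cov,
    computed as trace(cov^k) / p."""
    power = cov
    for _ in range(k - 1):
        power = mat_mul(power, cov)
    p = len(cov)
    return sum(power[i][i] for i in range(p)) / p


def naive_sphericity_ratio(cov: Matrix) -> float:
    """Plug-in ratio a4 / a2^2; it equals 1 for a multiple of the
    identity and exceeds 1 when the eigenvalues differ."""
    return eigen_power_mean(cov, 4) / eigen_power_mean(cov, 2) ** 2


spherical = [[3.0, 0.0], [0.0, 3.0]]
non_spherical = [[1.0, 0.0], [0.0, 3.0]]
ratio_spherical = naive_sphericity_ratio(spherical)
ratio_non_spherical = naive_sphericity_ratio(non_spherical)
```

Values of the ratio well above 1 point away from sphericity, which is the intuition a formal test would calibrate.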

14.
We establish the Stein phenomenon in the context of two-step, monotone incomplete data drawn from $N_{p+q}(\mu, \Sigma)$, a (p+q)-dimensional multivariate normal population with mean $\mu$ and covariance matrix $\Sigma$. On the basis of data consisting of n observations on all p+q characteristics and an additional $N-n$ observations on the last q characteristics, where all observations are mutually independent, denote by $\hat{\mu}$ the maximum likelihood estimator of $\mu$. We establish criteria which imply that shrinkage estimators of James-Stein type have lower risk than $\hat{\mu}$ under Euclidean quadratic loss. Further, we show that the corresponding positive-part estimators have lower risk than their unrestricted counterparts, thereby rendering the latter estimators inadmissible. We derive results for the case in which $\Sigma$ is block-diagonal, the loss function is quadratic and non-spherical, and the shrinkage estimator is constructed by means of a nondecreasing, differentiable function of a quadratic form in $\hat{\mu}$. For the problem of shrinking to a vector whose components have a common value constructed from the data, we derive improved shrinkage estimators and again determine conditions under which the positive-part analogs have lower risk than their unrestricted counterparts.
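
For orientation, the difference between a James-Stein estimator and its positive-part version can be sketched in the classical complete-data setting. The code assumes a single observation x from a p-variate normal distribution with identity covariance, shrinkage towards the origin, and the textbook constant p - 2; it does not implement the monotone incomplete data estimators of the paper, and the function name is hypothetical.

```python
from typing import List, Sequence


def james_stein(x: Sequence[float], positive_part: bool = True) -> List[float]:
    """Classical James-Stein estimate (1 - (p - 2) / ||x||^2) * x.

    With positive_part=True the shrinkage factor is truncated at zero,
    so the estimate never flips the sign of the observation.
    """
    p = len(x)
    if p < 3:
        raise ValueError("the classical estimator needs dimension p >= 3")
    norm_sq = sum(value * value for value in x)
    if norm_sq == 0.0:
        # The observation is already at the shrinkage target.
        return [0.0] * p
    factor = 1.0 - (p - 2) / norm_sq
    if positive_part:
        factor = max(0.0, factor)
    return [factor * value for value in x]


observation = [1.0, 0.0, 0.0, 0.0]
unrestricted = james_stein(observation, positive_part=False)
truncated = james_stein(observation, positive_part=True)
```

For an observation this close to the origin, the unrestricted factor is negative and reverses the sign of the estimate, while the positive-part version shrinks all the way to zero instead.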

15.
Finding predictive gene groups from microarray data   Total citations: 1, self-citations: 0, citations by others: 1
Microarray experiments generate large datasets with expression values for thousands of genes, but no more than a few dozen samples. A challenging task with these data is to reveal groups of genes which act together and whose collective expression is strongly associated with an outcome variable of interest. To find these groups, we suggest the use of supervised algorithms: these are procedures which use external information about the response variable for grouping the genes. We present Pelora, an algorithm based on penalized logistic regression analysis that combines gene selection, gene grouping and sample classification in a supervised, simultaneous way. With an empirical study on six different microarray datasets, we show that Pelora identifies gene groups whose expression centroids have very good predictive potential and yield results that can keep up with state-of-the-art classification methods based on single genes. Thus, our gene groups can be beneficial in medical diagnostics and prognostics, but they may also provide more biological insights into gene function and regulation.

16.
We propose a new class of rotation-invariant and consistent goodness-of-fit tests for multivariate distributions based on the Euclidean distance between sample elements. The proposed test applies to any multivariate distribution with finite second moments. In this article we apply the new method to testing multivariate normality when parameters are estimated. The resulting test is affine invariant and consistent against all fixed alternatives. A comparative Monte Carlo study suggests that our test is a powerful competitor to existing tests and is very sensitive against heavy-tailed alternatives.

17.
A minimum volume (MV) set, at level α, is a set having minimum volume among all those sets containing at least α probability mass. MV sets provide a natural notion of the 'central mass' of a distribution and, as such, have recently become popular as a tool for the detection of anomalies in multivariate data. Motivated by the fact that anomaly detection problems frequently arise in settings with temporally indexed measurements, we propose here a new method for the estimation of MV sets from dependent data. Our method is based on the concept of complexity-penalized estimation, extending recent work of Scott and Nowak for the case of independent and identically distributed measurements, and has both desirable theoretical properties and a practical implementation. Of particular note is the fact that, for a large class of stochastic processes, the choice of an appropriate complexity penalty reduces to the selection of a single tuning parameter, which represents the data dependency of the underlying stochastic process. Because the dependence structure is unknown in practice, we offer a data-dependent method for selecting this parameter, based on subsampling principles. Our work is motivated by and illustrated through an application to the detection of anomalous traffic levels in Internet traffic time series.

18.
Let $\Lambda = |S_e|/|S_e+S_h|$, where $S_h$ and $S_e$ are independently distributed as Wishart distributions $W_p(q,\Sigma)$ and $W_p(n,\Sigma)$, respectively. Then $\Lambda$ has Wilks' lambda distribution $\Lambda_{p,q,n}$, which appears as the distribution of various multivariate likelihood ratio test statistics. This paper is concerned with the theoretical accuracy of asymptotic expansions of the distribution of $T = -n\log\Lambda$. We derive error bounds for the approximations. Notably, our error bounds are given in explicit and computable forms.
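
The two quantities defined above are easy to compute directly. The sketch below evaluates the determinant ratio and T from given matrices Se and Sh, using a small Gaussian-elimination determinant so that no external library is needed. It says nothing about the asymptotic expansions or error bounds studied in the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def determinant(matrix: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a: List[List[float]] = [[float(v) for v in row] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return det


def wilks_lambda(s_e: Matrix, s_h: Matrix) -> float:
    """Lambda = |Se| / |Se + Sh|."""
    total = [[e + h for e, h in zip(row_e, row_h)] for row_e, row_h in zip(s_e, s_h)]
    return determinant(s_e) / determinant(total)


def t_statistic(s_e: Matrix, s_h: Matrix, n: int) -> float:
    """T = -n * log(Lambda)."""
    return -n * math.log(wilks_lambda(s_e, s_h))


s_e = [[2.0, 0.0], [0.0, 2.0]]
s_h = [[2.0, 0.0], [0.0, 2.0]]
lam = wilks_lambda(s_e, s_h)
t_value = t_statistic(s_e, s_h, n=10)
```

Small values of Lambda, and hence large values of T, correspond to a hypothesis matrix Sh that is large relative to the error matrix Se.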

19.
Suppose that $Y=(Y_i)$ is a normal random vector with mean $Xb$ and covariance $\sigma^2 I_n$, where $b=(b_j)$ is a p-dimensional vector and $X=(X_{ij})$ is an n×p matrix. A-optimal designs X are chosen from the traditional set D of A-optimal designs for ρ=0 such that X is still A-optimal in D when the components $Y_i$ are dependent, i.e., for $i \ne i'$, the covariance of $Y_i$ and $Y_{i'}$ is ρ with ρ≠0. Such designs depend on the sign of ρ. The general results are applied to $X=(X_{ij})$, where $X_{ij} \in \{-1,1\}$; this corresponds to a factorial design with -1 and 1 representing low level and high level respectively, or to a weighing design with -1 and 1 representing an object j with weight $b_j$ being weighed on the left and right of a chemical balance respectively.

20.