Similar Documents
Found 20 similar documents (search time: 995 ms)
1.
It is shown that the accuracy of chromosome classification constrained by class size can be improved over previously reported results by a combination of straightforward modifications to previously used methods. These are (i) the use of the logarithm of the Mahalanobis distance of an unknown chromosome's feature vector to estimated class mean vectors as the basis of the transportation method objective function, rather than the estimated likelihood; (ii) the use of all available features and full estimated covariance to compute the Mahalanobis distance, rather than a subset of features and the diagonal (variance) terms only; (iii) a modification to the way the transportation model deals with the constraint on the number of sex chromosomes in a metaphase cell; and (iv) the use of a newly discovered heuristic to weight off-diagonal elements of the covariance; this proved to be particularly valuable in cases where relatively few training examples were available to estimate covariance. The methods have been verified using 5 different sets of chromosome data.
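The core quantity in modifications (i) and (ii) — the log of the Mahalanobis distance computed with the full estimated covariance rather than its diagonal — can be sketched as follows. This is a minimal illustration with synthetic class statistics, not the authors' transportation-model pipeline:

```python
import numpy as np

def log_mahalanobis(x, mean, cov):
    """Log of the Mahalanobis distance from x to a class mean,
    using the full covariance matrix (not just its diagonal)."""
    diff = x - mean
    d2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return 0.5 * np.log(d2)

# Synthetic 2-feature class: with identity covariance the distance
# reduces to the Euclidean distance.
mean = np.zeros(2)
cov = np.eye(2)
x = np.array([3.0, 4.0])              # Euclidean distance 5 from the mean
print(log_mahalanobis(x, mean, cov))  # log(5) ≈ 1.609
```

With a full (non-diagonal) covariance, `np.linalg.solve` accounts for the correlation between features, which is exactly what modification (ii) adds over variance-only scaling.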

2.
Improved estimation of the covariance matrix in Mahalanobis-distance cluster analysis   Cited by: 1 (self-citations: 0, others: 1)
Taking the effects of variable weights and sample classes into account, this paper develops an iterative method for estimating the covariance matrix in Mahalanobis-distance clustering. Using Fisher's iris data as a sample, an empirical comparison is made among ordinary clustering with the Euclidean distance, principal-component clustering, and Mahalanobis-distance clustering before and after the improvement; the results show that the proposed method improves accuracy by at least 6.63%. Finally, the method is applied to cluster indicator data for 35 countries, ranking each country's level of health care.

3.
The estimation of the covariance matrix is a key concern in the analysis of longitudinal data. When data consist of multiple groups, it is often assumed the covariance matrices are either equal across groups or are completely distinct. We seek methodology to allow borrowing of strength across potentially similar groups to improve estimation. To that end, we introduce a covariance partition prior that proposes a partition of the groups at each measurement time. Groups in the same set of the partition share dependence parameters for the distribution of the current measurement given the preceding ones, and the sequence of partitions is modeled as a Markov chain to encourage similar structure at nearby measurement times. This approach additionally encourages a lower-dimensional structure of the covariance matrices by shrinking the parameters of the Cholesky decomposition toward zero. We demonstrate the performance of our model through two simulation studies and the analysis of data from a depression study. This article includes Supplementary Materials available online.
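The Cholesky-based parameterization referred to above can be made concrete: a covariance matrix Sigma factors as T Sigma T' = D, where the unit lower-triangular T holds the regression coefficients of each measurement on its predecessors and D holds the innovation variances. Shrinking entries of T toward zero gives the lower-dimensional structure the prior encourages. A minimal sketch (illustrative, not the paper's Bayesian model):

```python
import numpy as np

def cholesky_params(sigma):
    """Return (T, D) with T unit lower-triangular and T @ sigma @ T.T = D.
    Rows of T contain the (negated) autoregressive coefficients of each
    measurement on its predecessors; D holds innovation variances."""
    L = np.linalg.cholesky(sigma)       # sigma = L @ L.T
    D = np.diag(np.diag(L) ** 2)        # innovation variances
    T = np.linalg.inv(L / np.diag(L))   # normalize columns, then invert
    return T, D

sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
T, D = cholesky_params(sigma)
print(np.allclose(T @ sigma @ T.T, D))  # True
```

Because the entries of T are unconstrained regression coefficients, they can be shrunk or shared across groups without endangering positive definiteness — the key convenience this decomposition offers.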

4.
A local geometrical properties application to fuzzy clustering   Cited by: 1 (self-citations: 0, others: 1)
Possibilistic clustering is increasingly seen as a suitable means of overcoming the limitations that result from the constraints imposed in the fuzzy C-means algorithm. Studying the metric derived from the covariance matrix, we obtain a membership function and an objective function for both the Mahalanobis distance and the Euclidean distance. Applying the theoretical results with the Euclidean distance, we obtain a new algorithm, called fuzzy-minimals, which detects the possible prototypes of the groups in a sample. We illustrate the new algorithm with several examples.
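For context, the fuzzy C-means membership function whose sum-to-one constraint motivates the possibilistic relaxation discussed above can be sketched as follows (an illustration of the standard FCM update, not the fuzzy-minimals algorithm itself):

```python
import numpy as np

def fcm_memberships(x, prototypes, m=2.0):
    """Fuzzy C-means membership of point x in each cluster prototype,
    using the Euclidean distance; memberships sum to 1 by construction."""
    d = np.linalg.norm(x - prototypes, axis=1)            # distance to each prototype
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)

# A point 1 unit from the first prototype and 3 units from the second:
u = fcm_memberships(np.array([0.0, 0.0]),
                    np.array([[1.0, 0.0], [3.0, 0.0]]))
print(u)  # higher membership in the nearer cluster
```

Possibilistic clustering drops the constraint that these memberships sum to one across clusters, which is what makes it more robust to outliers and overlapping groups.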

5.

We consider hypothesis testing for high-dimensional covariance structures in which the covariance matrix is a (i) scaled identity matrix, (ii) diagonal matrix, or (iii) intraclass covariance matrix. Our purpose is to systematically establish a nonparametric approach for testing the high-dimensional covariance structures (i)–(iii). We produce a new common test statistic for each covariance structure and show that the test statistic is an unbiased estimator of its corresponding test parameter. We prove that the test statistic establishes the asymptotic normality. We propose a new test procedure for (i)–(iii) and evaluate its asymptotic size and power theoretically when both the dimension and sample size increase. We investigate the performance of the proposed test procedure in simulations. As an application of testing the covariance structures, we give a test procedure to identify an eigenvector. Finally, we demonstrate the proposed test procedure by using a microarray data set.


6.
In this paper we extend the definition of the influence function to functionals of more than one distribution, that is, for estimators depending on more than one sample, such as the pooled variance, the pooled covariance matrix, and the linear discriminant analysis coefficients. In this case the appropriate designation should be “partial influence functions,” following the analogy with derivatives and partial derivatives. Some useful results are derived, such as an asymptotic variance formula. These results are then applied to several estimators of the Mahalanobis distance between two populations and the linear discriminant function coefficients.

7.
AN ANALYSIS OF A MULTIVARIATE TWO-WAY MODEL WITH INTERACTION AND NO REPLICATION. GUO DAWEI (郭大伟), Department of Mathematics, Anhui Normal Universi...

8.
In many real world classification problems, class-conditional classification noise (CCC-Noise) frequently deteriorates the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations having a common covariance matrix. We compare the asymptotic distribution of the misclassification error rate of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR is less deteriorated by CCC-Noise compared to NDA. Under CCC-Noise contexts, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of these two procedures. In particular, when this distance is small, LR tends to be more tolerable to CCC-Noise compared to NDA.

9.
The canonical correlation (CANCOR) method for dimension reduction in a regression setting is based on the classical estimates of the first and second moments of the data, and therefore sensitive to outliers. In this paper, we study a weighted canonical correlation (WCANCOR) method, which captures a subspace of the central dimension reduction subspace, as well as its asymptotic properties. In the proposed WCANCOR method, each observation is weighted based on its Mahalanobis distance to the location of the predictor distribution. Robust estimates of the location and scatter, such as the minimum covariance determinant (MCD) estimator of Rousseeuw [P.J. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications B (1985) 283-297], can be used to compute the Mahalanobis distance. To determine the number of significant dimensions in WCANCOR, a weighted permutation test is considered. A comparison of SIR, CANCOR and WCANCOR is also made through simulation studies to show the robustness of WCANCOR to outlying observations. As an example, the Boston housing data is analyzed using the proposed WCANCOR method.

10.
Our aim is to construct a factor analysis method that can resist the effect of outliers. For this we start with a highly robust initial covariance estimator, after which the factors can be obtained from maximum likelihood or from principal factor analysis (PFA). We find that PFA based on the minimum covariance determinant scatter matrix works well. We also derive the influence function of the PFA method based on either the classical scatter matrix or a robust matrix. These results are applied to the construction of a new type of empirical influence function (EIF), which is very effective for detecting influential data. To facilitate the interpretation, we compute a cutoff value for this EIF. Our findings are illustrated with several real data examples.

11.
The kernel-based regression (KBR) method, such as support vector machine for regression (SVR), is a well-established methodology for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations, which in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide-scale investigation with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods only consider Y-space outliers and, consequently, are sensitive to X-space outliers. As a result, even a single anomalous outlying observation in X-space may greatly affect the estimator. In order to resolve this issue, we propose a new RKBR method that gives reliable results even if a training data set is contaminated with both Y-space and X-space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M-estimator (GM-estimator) of conventional robust linear analysis. The diagonal elements of the hat matrix in the kernel-induced feature space are used as leverage measures to downweight the effects of potential X-space outliers. We show that the kernelized hat diagonal elements can be obtained via eigendecomposition of the kernel matrix. A regularized version of the kernelized hat diagonal elements is also proposed to deal with the case where the kernel matrix has full rank and the plain kernelized hat diagonal elements are not suitable as leverage measures. We show that the two kernelized leverage measures, namely the kernel hat diagonal element and its regularized version, are related to statistical distance measures in the feature space. We also develop an efficient kernelized training algorithm for parameter estimation based on the iteratively reweighted least squares (IRLS) method. The experimental results from simulated examples and real data sets demonstrate the robustness of our proposed method compared with conventional approaches.
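The regularized kernelized leverage described above can be sketched directly: with kernel matrix K, the smoother H = K (K + λI)⁻¹ plays the role of the hat matrix, and its diagonal entries serve as leverage measures for flagging X-space outliers. This is an illustrative sketch with an assumed Gaussian kernel, not the authors' exact IRLS implementation:

```python
import numpy as np

def kernel_leverage(X, lam=1.0, gamma=0.5):
    """Diagonal of the regularized kernel hat matrix H = K (K + lam*I)^-1,
    using a Gaussian (RBF) kernel; larger values indicate X-space leverage."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq)                          # RBF kernel matrix
    H = K @ np.linalg.inv(K + lam * np.eye(len(X)))  # regularized hat matrix
    return np.diag(H)

X = np.array([[0.0], [0.1], [0.2], [10.0]])  # last point is an X-space outlier
lev = kernel_leverage(X)
print(lev)  # the isolated outlier gets the largest leverage
```

Because K and (K + λI)⁻¹ share eigenvectors, H is symmetric with eigenvalues in [0, 1), so each diagonal entry is a well-behaved leverage score that a GM-type weighting scheme can invert to downweight suspect points.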

12.
In portfolio selection, there is often the need for procedures to generate “realistic” covariance matrices for security returns, for example to test and benchmark optimization algorithms. For application in portfolio optimization, such a procedure should allow the entries in the matrices to have distributional characteristics which we would consider “realistic” for security returns. Deriving motivation from the fact that a covariance matrix can be viewed as stemming from a matrix of factor loadings, a procedure is developed for the random generation of covariance matrices (a) whose off-diagonal (covariance) entries possess a pre-specified expected value and standard deviation and (b) whose main diagonal (variance) entries possess a likely different pre-specified expected value and standard deviation. The paper concludes with a discussion about the futility one would likely encounter if one simply tried to invent a valid covariance matrix in the absence of a procedure such as in this paper.
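The factor-loading view mentioned above guarantees validity by construction: any loading matrix B plus a positive diagonal D yields a positive definite covariance matrix B Bᵀ + D. The sketch below generates random covariance matrices this way; it does not calibrate the entry-wise means and standard deviations of the paper's full procedure:

```python
import numpy as np

def random_covariance(n, k, rng):
    """Random n x n covariance matrix built from k factor loadings.
    B @ B.T is positive semidefinite and adding a positive diagonal D
    makes the result strictly positive definite."""
    B = rng.standard_normal((n, k))          # n securities, k factors
    D = np.diag(rng.uniform(0.1, 1.0, n))    # idiosyncratic variances
    return B @ B.T + D

rng = np.random.default_rng(0)
C = random_covariance(5, 2, rng)
print(np.all(np.linalg.eigvalsh(C) > 0))  # True: positive definite
```

Inventing entries directly, by contrast, almost always violates positive definiteness — the "futility" the paper's closing discussion refers to.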

13.
In the last few decades, longitudinal data have been studied intensively in statistics and widely used in many fields, such as finance, medicine, and agriculture. The characteristic of longitudinal data is that values from different samples are independent while values within one sample are correlated. With the development of computing technology, many nonparametric estimation methods have been applied to longitudinal data models. Using the Cholesky decomposition and profile least squares estimation, we propose an effective spline estimation method for the nonparametric longitudinal data model with an unknown covariance matrix. Finally, by comparing simulation results for an example, we show that the proposed method is superior to naive spline estimation when the covariance matrix is unknown.

14.
The correlation matrix (denoted by R) plays an important role in many statistical models. Unfortunately, sampling the correlation matrix in Markov chain Monte Carlo (MCMC) algorithms can be problematic. In addition to the positive definite constraint of covariance matrices, correlation matrices have diagonal elements fixed at one. In this article, we propose an efficient two-stage parameter expanded reparameterization and Metropolis-Hastings (PX-RPMH) algorithm for simulating R. Using this algorithm, we draw all elements of R simultaneously by first drawing a covariance matrix from an inverse Wishart distribution, and then translating it back to a correlation matrix through a reduction function and accepting it based on a Metropolis-Hastings acceptance probability. This algorithm is illustrated using multivariate probit (MVP) models and multivariate regression (MVR) models with a common correlation matrix across groups. Via both a simulation study and a real data example, the performance of the PX-RPMH algorithm is compared with those of other common algorithms. The results show that the PX-RPMH algorithm is more efficient than other methods for sampling a correlation matrix.
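The "reduction function" step of the sampler described above is simple to state: a drawn covariance matrix is translated back to a correlation matrix by rescaling with its own standard deviations. A sketch of the reduction only — the inverse-Wishart draw and the Metropolis-Hastings acceptance step are omitted:

```python
import numpy as np

def to_correlation(sigma):
    """Reduce a covariance matrix to a correlation matrix:
    R[i, j] = sigma[i, j] / (s_i * s_j), giving a unit diagonal."""
    s = np.sqrt(np.diag(sigma))
    return sigma / np.outer(s, s)

sigma = np.array([[4.0, 2.0], [2.0, 9.0]])
R = to_correlation(sigma)
print(R)  # unit diagonal; off-diagonal 2 / (2 * 3) = 1/3
```

Because the reduction is many-to-one (all covariance matrices with the same correlation structure map to one R), the expanded parameter space is what lets the algorithm draw all elements of R in a single move.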

15.
We consider a new method for sparse covariance matrix estimation which is motivated by previous results for the so-called Stein-type estimators. Stein proposed a method for regularizing the sample covariance matrix by shrinking together the eigenvalues; the amount of shrinkage is chosen to minimize an unbiased estimate of the risk (UBEOR) under the entropy loss function. The resulting estimator has been shown in simulations to yield significant risk reductions over the maximum likelihood estimator. Our method extends the UBEOR minimization problem by adding an ℓ1 penalty on the entries of the estimated covariance matrix, which encourages a sparse estimate. For a multivariate Gaussian distribution, zeros in the covariance matrix correspond to marginal independences between variables. Unlike the ℓ1-penalized Gaussian likelihood function, our penalized UBEOR objective is convex and can be minimized via a simple block coordinate descent procedure. We demonstrate via numerical simulations and an analysis of microarray data from breast cancer patients that our proposed method generally outperforms other methods for sparse covariance matrix estimation and can be computed efficiently even in high dimensions.
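An ℓ1 penalty on covariance entries leads, in the simplest separable case, to soft-thresholding. The sketch below shrinks small off-diagonal entries of a sample covariance exactly to zero — the basic sparsifying effect the penalized UBEOR objective is designed to achieve (the actual method also shrinks eigenvalues and uses block coordinate descent):

```python
import numpy as np

def soft_threshold_cov(S, lam):
    """Soft-threshold the off-diagonal entries of a covariance matrix:
    entries with |S[i, j]| <= lam become exactly zero; variances are kept."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))   # leave variances unpenalized
    return T

S = np.array([[2.0, 0.3, -0.05],
              [0.3, 1.0, 0.02],
              [-0.05, 0.02, 1.5]])
print(soft_threshold_cov(S, 0.1))  # small off-diagonal entries become 0
```

Under a Gaussian model, each zero produced this way corresponds to an asserted marginal independence between the two variables, which is what makes the sparse estimate interpretable.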

16.
We show that a semigroup of positive matrices (all entries greater than or equal to zero) with binary diagonals (diagonal entries either 0 or 1) is either decomposable (all matrices in the semigroup have a common zero entry) or is similar, via a positive diagonal matrix, to a binary semigroup (all entries 0 or 1). In the case where the idempotents of minimal rank in the semigroup satisfy a “diagonal disjointness” condition, we obtain additional structural information. In the case where the semigroup is not necessarily positive but has binary diagonals we show that either the semigroup is reducible or the minimal rank ideal is a binary semigroup. We also give generalizations of these results to operators acting on the Hilbert space of square-summable sequences.

17.
Circulant matrix embedding is one of the most popular and efficient methods for the exact generation of Gaussian stationary univariate series. Although the idea of circulant matrix embedding has also been used for the generation of Gaussian stationary random fields, there are many practical covariance structures of random fields where classical embedding methods break down. In this work, we propose a novel methodology that adaptively constructs feasible circulant embeddings based on convex optimization with an objective function measuring the distance of the covariance embedding to the targeted covariance structure over the domain of interest. The optimal value of the objective function will be zero if and only if there exists a feasible embedding for the a priori chosen embedding size.
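The classical circulant embedding this passage builds on works by wrapping the target covariance sequence of a stationary series into the first row of a circulant matrix, whose eigenvalues then come from a single FFT. The embedding is feasible exactly when all those eigenvalues are nonnegative — the condition the convex program above aims to restore when classical embedding fails. A minimal univariate sketch:

```python
import numpy as np

def circulant_eigenvalues(cov_seq):
    """Eigenvalues of the minimal circulant embedding of a stationary
    covariance sequence r_0, ..., r_{n-1}: wrap the sequence into the
    first circulant row and take the (real) FFT."""
    c = np.concatenate([cov_seq, cov_seq[-2:0:-1]])  # [r0..r_{n-1}, r_{n-2}..r1]
    return np.fft.fft(c).real

r = 0.5 ** np.arange(8)       # exponentially decaying (AR(1)-type) covariances
lam = circulant_eigenvalues(r)
print(np.all(lam > -1e-10))   # feasible: all embedding eigenvalues nonnegative
```

When feasibility holds, a Gaussian draw with exactly the target covariance is obtained by scaling complex white noise with the square roots of these eigenvalues and applying an inverse FFT; when some eigenvalue is negative, the exactness breaks down, which is the situation the paper's optimization addresses.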

18.
An influence measure for investigating the influence of deleting an observation in linear regression is proposed, based on geometric considerations about the sampling distribution of the distance between two estimators of the regression coefficients computed with and without a single specific observation. The covariance matrix of this sampling distribution plays a key role in deriving the influence measure. It turns out that, geometrically, this distance is distributed entirely along the axis associated with the nonnull eigenvalue of the covariance matrix. The deviation of the regression coefficients computed without an observation from those computed with the full data is reflected in this eigenvalue, which can therefore be used for investigating influence. The distance is normalized using the associated covariance matrix, and this normalized distance turns out to be the square of the internally studentized residual. Illustrative examples showing the effectiveness of the proposed influence measure are given. In judging the influence of observations on the least squares estimates of regression coefficients, Cook’s distance does not work well for one example, so we should be cautious about blind use of Cook’s distance.
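The quantity the normalized deletion distance reduces to — the internally studentized residual r_i = e_i / sqrt(s² (1 − h_ii)) — can be computed directly from the hat matrix. An illustrative sketch with a hypothetical toy data set:

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals of an OLS fit:
    r_i = e_i / sqrt(s^2 * (1 - h_ii)), with h_ii from the hat matrix."""
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    e = y - H @ y                           # ordinary residuals
    n, p = X.shape
    s2 = e @ e / (n - p)                    # residual variance estimate
    return e / np.sqrt(s2 * (1.0 - np.diag(H)))

X = np.column_stack([np.ones(5), np.arange(5.0)])   # intercept + one predictor
y = np.array([0.1, 1.0, 2.2, 2.9, 4.1])
print(studentized_residuals(X, y))
```

The division by sqrt(1 − h_ii) is what makes the residuals comparable across observations with different leverage, which is exactly the normalization role the covariance matrix plays in the abstract's derivation.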

19.
Time series linear regression models with stationary residuals are a well studied topic, and have been widely applied in a number of fields. However, the stationarity assumption on the residuals seems to be restrictive. The analysis of relatively long stretches of time series data that may contain changes in the spectrum is of interest in many areas. Locally stationary processes have time-varying spectral densities, the structure of which smoothly changes in time. Therefore, we extend the model to the case of locally stationary residuals. The best linear unbiased estimator (BLUE) of vector of regression coefficients involves the residual covariance matrix which is usually unknown. Hence, we often use the least squares estimator (LSE), which is always feasible, but in general is not efficient. We evaluate the asymptotic covariance matrices of the BLUE and the LSE. We also study the efficiency of the LSE relative to the BLUE. Numerical examples illustrate the situation under locally stationary disturbances.
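The two estimators compared above are easy to state for a known residual covariance V: the BLUE (generalized least squares) weights observations by V⁻¹, while the always-feasible LSE ignores V. A sketch with a hypothetical heteroscedastic toy design:

```python
import numpy as np

def blue(X, y, V):
    """Best linear unbiased estimator (GLS) given residual covariance V."""
    Vi = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)

def lse(X, y):
    """Ordinary least squares estimator, feasible without knowing V."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([0.0, 1.1, 1.9, 3.2])
V = np.diag([1.0, 1.0, 4.0, 9.0])   # residual variance growing over time
print(blue(X, y, V), lse(X, y))     # the two estimates differ
```

When V is unknown, as in the locally stationary setting of the paper, only the LSE is available; the paper's contribution is quantifying how much efficiency that sacrifice costs asymptotically.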

20.
We establish a sufficient condition for strict total positivity of a matrix. In particular, we show that if the (positive) elements of a square matrix grow sufficiently fast as their distance from the diagonal of the matrix increases, then the matrix is strictly totally positive.
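The definition behind this result can be checked directly, if brutally, for small matrices: a matrix is strictly totally positive when every minor, of every order, is strictly positive. A sketch with a small hypothetical example (not one from the paper):

```python
import numpy as np
from itertools import combinations

def is_strictly_totally_positive(A):
    """Brute-force check that every minor of every order is > 0.
    Exponential in the matrix size; practical only for small matrices."""
    n = A.shape[0]
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                if np.linalg.det(A[np.ix_(rows, cols)]) <= 0:
                    return False
    return True

A = np.array([[1.0, 0.5], [0.5, 1.0]])
print(is_strictly_totally_positive(A))  # True: all entries and the det are positive
```

Sufficient conditions like the one in the abstract matter precisely because this direct check requires examining exponentially many minors.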


Copyright©北京勤云科技发展有限公司  京ICP备09084417号