Similar Documents (19 records found)
1.
Factor analysis is one of the most commonly used multivariate statistical methods. Its idea is to derive a small number of common factors from the correlations among the variables and to describe the original variables in terms of these factors. Classical factor analysis is not robust: if the data contain outliers, it can produce unreasonable results. Robust factor analysis based on the MCD estimator resists such contamination well, but the accuracy of the MCD estimator deteriorates as the dimension increases, and the method even loses validity when the dimension exceeds the sample size. To perform effective factor analysis on high-dimensional data, this paper proposes a high-dimensional robust factor analysis method based on the MRCD estimator. Simulation results show that, for high-dimensional data, MRCD-based robust factor analysis withstands the influence of outliers and reaches more reasonable conclusions than both classical factor analysis and MCD-based robust factor analysis. The empirical part studies 11 coastal provinces; the results show that the MRCD-based high-dimensional robust factor model effectively yields factor analysis results for high-dimensional data, that the quality of economic growth develops unevenly across the coastal provinces, and that Shanghai and Guangdong perform relatively well.
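A minimal sketch of the robust factor analysis pipeline described above, under substitutions: scikit-learn's MinCovDet (MCD) stands in for the MRCD estimator, which scikit-learn does not provide, and the loadings are read off the eigendecomposition of the robust correlation matrix. The data matrix X and the number of factors are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def robust_factor_loadings(X, n_factors=2, random_state=0):
    """Robust factor loadings via a robust covariance estimate.

    MinCovDet (MCD) stands in for MRCD here; a real high-dimensional
    application would replace it with an MRCD implementation.
    """
    cov = MinCovDet(random_state=random_state).fit(X).covariance_
    d = np.sqrt(np.diag(cov))
    corr = cov / np.outer(d, d)                  # robust correlation matrix
    eigval, eigvec = np.linalg.eigh(corr)        # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:n_factors]
    loadings = eigvec[:, idx] * np.sqrt(eigval[idx])
    return loadings

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:5] += 10                                      # a few outlying rows
print(robust_factor_loadings(X, n_factors=2))
```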

2.
Principal component analysis is a multivariate statistical method frequently used in economics and management. It plays an important role in reducing the dimension of the variables and is a powerful tool for multi-variable comprehensive evaluation. Classical principal component analysis, however, is very sensitive to outliers, and its results are easily distorted by them, even though real data often contain abnormal observations that routine analyses rarely take into account. This paper proposes a robust principal component analysis method based on the MCD estimator. Simulation and empirical results show that the method resists outliers effectively.
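A short sketch of MCD-based robust principal component analysis along the lines described above: the classical covariance matrix is replaced by the MCD estimate before the eigendecomposition. The data and the number of components are illustrative.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def robust_pca_scores(X, n_components=2):
    mcd = MinCovDet(random_state=0).fit(X)
    eigval, eigvec = np.linalg.eigh(mcd.covariance_)
    order = np.argsort(eigval)[::-1][:n_components]
    components = eigvec[:, order]                       # robust loadings
    scores = (X - mcd.location_) @ components           # robust PC scores
    explained = eigval[order] / eigval.sum()
    return scores, components, explained

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(95, 4)), rng.normal(8, 1, size=(5, 4))])
scores, comps, ratio = robust_pca_scores(X)
print(ratio)
```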

3.
《数理统计与管理》2019,(5):849-857
Classical principal component clustering is sensitive to outliers, so its clustering results often disagree with reality. To address this, the paper uses robust statistics to modify the classical method and constructs a robust principal component clustering algorithm that overcomes the influence of outliers on the computed results. The simulation and empirical analyses show that when the data contain no outliers, robust principal component clustering gives the same results as the classical method, whereas when outliers are present it effectively resists their influence, exhibiting good resistance to contamination and gross errors.
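A compact sketch of the robust principal component clustering idea, under assumptions: the principal components come from an MCD covariance estimate and the resulting scores are clustered with k-means; the paper's exact robustification may differ.

```python
import numpy as np
from sklearn.covariance import MinCovDet
from sklearn.cluster import KMeans

def robust_pc_cluster(X, n_components=2, n_clusters=3):
    mcd = MinCovDet(random_state=0).fit(X)
    eigval, eigvec = np.linalg.eigh(mcd.covariance_)
    comps = eigvec[:, np.argsort(eigval)[::-1][:n_components]]
    scores = (X - mcd.location_) @ comps          # robust PC scores
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(scores)
    return labels

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(i * 4, 1, size=(40, 5)) for i in range(3)])
X[:3] += 30                                       # inject a few outliers
print(np.bincount(robust_pc_cluster(X)))
```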

4.
Classical clustering based on finite-dimensional discrete data cannot be applied directly to functional data. Focusing on the sparsity and the infinite-dimensional nature of functional data, and building on an analysis of the strengths and weaknesses of existing functional clustering methods, this paper reconstructs a weighted principal component distance, with weights determined by the amount of information carried by each clustering index, as the similarity measure between functions, and proposes an adaptive-weight clustering method for functional data. Compared with similar functional clustering algorithms, the new method has two core advantages: (1) the adaptively weighted distance function reflects the differing classification efficiency of the clustering indices and has a solid theoretical basis guaranteeing its necessity and objective reasonableness; (2) clustering of infinite-dimensional continuous functions is achieved through finite-dimensional discrete data, which markedly reduces the computational cost. Empirical tests show that the new method clearly improves classification accuracy, effectively resolves the failure of classical clustering algorithms in extreme cases, and has the flexibility and general applicability required for complex functional data classification problems.
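A rough sketch of the weighting idea, under simplifying assumptions: the curves are observed on a common grid, functional PCA is approximated by ordinary PCA of the discretized curves, and each score dimension is weighted by its share of explained variance before k-means clustering. This illustrates the weighting principle rather than the paper's exact construction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def weighted_fpca_cluster(curves, n_components=3, n_clusters=2):
    """curves: (n_curves, n_gridpoints) discretized functional data."""
    pca = PCA(n_components=n_components).fit(curves)
    scores = pca.transform(curves)
    w = pca.explained_variance_ratio_          # information-based weights
    weighted_scores = scores * np.sqrt(w)      # Euclidean dist becomes weighted PC dist
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(weighted_scores)

t = np.linspace(0, 1, 50)
rng = np.random.default_rng(3)
group1 = np.sin(2 * np.pi * t) + rng.normal(0, .2, (30, 50))
group2 = np.cos(2 * np.pi * t) + rng.normal(0, .2, (30, 50))
print(weighted_fpca_cluster(np.vstack([group1, group2])))
```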

5.
俞燕  徐勤丰  孙鹏飞 《应用数学》2006,19(3):600-605
Based on a finite mixture of Dirichlet distributions, this paper proposes a Bayes clustering method for compositional data. The model parameters are estimated with the EM algorithm, the number of classes is determined by the BIC criterion, and each observation is classified by a procedure similar to Bayes discrimination. The computational formulas are derived and a program is written. Simulation results show that the proposed method clusters well.
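A condensed sketch of an EM algorithm for a finite Dirichlet mixture on compositional data. To keep it short, the M-step updates the Dirichlet parameters by weighted moment matching instead of full maximum likelihood, and BIC-based selection of the number of classes is omitted; both are simplifications of the procedure the abstract describes.

```python
import numpy as np
from scipy.stats import dirichlet

def dirichlet_mixture_em(X, k=2, n_iter=50, seed=0):
    """X: (n, d) compositional data, rows lying on the simplex."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)
    alpha = rng.uniform(1, 5, size=(k, d))
    for _ in range(n_iter):
        # E-step: posterior membership probabilities
        logp = np.column_stack([np.log(pi[j]) + dirichlet.logpdf(X.T, alpha[j])
                                for j in range(k)])
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixing weights and moment-matched Dirichlet parameters
        pi = resp.mean(axis=0)
        for j in range(k):
            w = resp[:, j] / resp[:, j].sum()
            m = w @ X                                  # weighted mean composition
            v1 = w @ (X[:, 0] - m[0]) ** 2             # weighted variance of coordinate 1
            a0 = max(m[0] * (1 - m[0]) / max(v1, 1e-12) - 1, 1e-3)
            alpha[j] = np.maximum(a0 * m, 1e-3)
    return pi, alpha, resp.argmax(axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.dirichlet([8, 2, 2], 60), rng.dirichlet([2, 2, 8], 60)])
pi, alpha, labels = dirichlet_mixture_em(X, k=2)
print(pi, alpha, sep="\n")
```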

6.
Because information is incomplete, the data describing the characteristics of objects are often interval values. Fuzzy clustering based on interval values can retain more of this information. This paper first constructs similarity coefficients for interval values by the maximum-minimum method, gives the computational steps for clustering multi-index interval-number information, presents the interval-valued clustering method, and finally illustrates the practical application of the method with a numerical example.
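A sketch of one possible implementation: a max-min similarity coefficient computed from the interval endpoints, followed by the usual fuzzy-equivalence route (max-min transitive closure, then a lambda-cut). Applying the classical max-min coefficient to the stacked lower and upper endpoints is an assumption for illustration, not necessarily the formula used in the paper.

```python
import numpy as np

def maxmin_similarity(L, U):
    """L, U: (n, p) lower/upper interval endpoints (assumed nonnegative)."""
    Z = np.hstack([L, U])                       # treat both endpoints as features
    n = Z.shape[0]
    R = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            R[i, j] = R[j, i] = np.minimum(Z[i], Z[j]).sum() / np.maximum(Z[i], Z[j]).sum()
    return R

def transitive_closure(R):
    while True:
        # max-min composition: R2[i, k] = max_j min(R[i, j], R[j, k])
        R2 = np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)
        if np.allclose(R2, R):
            return R
        R = R2

def lambda_cut_clusters(R, lam):
    labels = -np.ones(len(R), dtype=int)
    c = 0
    for i in range(len(R)):
        if labels[i] < 0:
            labels[R[i] >= lam] = c
            c += 1
    return labels

L = np.array([[1, 2], [1.2, 2.1], [5, 6], [5.3, 6.4]])
U = L + 0.5
R = transitive_closure(maxmin_similarity(L, U))
print(lambda_cut_clusters(R, lam=0.8))
```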

7.
Generalized estimating equations (GEE) are a common method for analyzing longitudinal data. When the response variable is one-dimensional, XIE and YANG (2003) and WANG (2011) studied the asymptotic properties of the GEE estimator with a fixed covariate dimension and with a covariate dimension tending to infinity, respectively. This paper studies GEE modelling for longitudinal multicategorical data and the asymptotic properties of the GEE estimator. When the number of categories exceeds two, the response variable has dimension greater than one, so the related results in the literature are extended.

8.
The principal correlation estimator of regression coefficients and its optimality
Building on the principal component estimator of regression coefficients, this paper proposes a new dimension-reducing estimator, the principal correlation estimator, discusses its optimality properties, and uses a real example to illustrate its improvement over the principal component estimator.

9.
Optimality of the principal component estimator
1. Introduction. Since Hotelling introduced the concept of principal components from a probabilistic structure, principal component estimation has received wide attention from statisticians and has become an effective tool for reducing data dimensionality in multivariate analysis. Many authors have studied the optimality properties of principal components from different angles, but these properties all concern the covariance matrix. This paper attempts, from a perspective that better reflects an esti…

10.
To keep pace with the continual adjustment of the scale and structure of the economy since reform and opening-up, China's statistical system has changed considerably. Some economic indicators aggregate differently at different levels, leading some scholars and organizations to question the quality of China's published economic growth data. Testing the reliability of China's economic growth data has therefore become a topic of sustained academic attention. Much past research relied on traditional regression methods, which are easily affected by outliers and thus produce results of low reliability. This paper proposes a robust regression method based on the MRCD estimator and MM estimation, and assesses the quality of China's economic data using the 2019 GDP growth rates and the growth rates of 14 related economic indicators for the 31 provincial-level regions of mainland China. The results show that the model improves the identification of outliers and reduces their influence on the regression estimates, thereby improving both the reliability of the results and their practical applicability. The empirical results indicate that China's economic growth data are of assured quality.
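A rough sketch of the two ingredients named above, with substitutions: scikit-learn's MinCovDet (MCD) stands in for MRCD when flagging multivariate outliers, and statsmodels' M-estimation with a Tukey biweight stands in for the MM estimator in the regression step. All variable names and data are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # indicator growth rates
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 0.2, 100)
y[:4] += 8                                         # contaminated responses

# Step 1: flag multivariate outliers among the predictors (MCD as an MRCD stand-in)
mcd = MinCovDet(random_state=0).fit(X)
flag = mcd.mahalanobis(X) > chi2.ppf(0.975, df=X.shape[1])

# Step 2: robust regression (Tukey-biweight M-estimation as an MM stand-in)
rlm = sm.RLM(y, sm.add_constant(X), M=sm.robust.norms.TukeyBiweight()).fit()
print("flagged predictor outliers:", flag.sum())
print(rlm.params)
```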

11.
Functional principal component analysis is the preliminary step to represent the data in a lower-dimensional space and to capture the main modes of variability of the data by means of a small number of components which are linear combinations of the original variables. The sensitivity of the variance and covariance functions to irregular observations makes this method vulnerable to outliers, so it may not capture the variation of the regular observations. In this study, we propose a robust functional principal component analysis that finds the linear combinations of the original variables containing most of the information, even when outliers are present, and that flags functional outliers. We demonstrate the performance of the proposed method in an extensive simulation study and on two datasets from chemometrics and environmental science.
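A compact sketch of the outlier-flagging step under simplifying assumptions: the curves are discretized on a common grid, the dimension is first reduced with a few classical principal components, and curves whose MCD-based robust distances in the score space exceed a chi-square cutoff are flagged. This is one plausible reading of such a procedure, not the authors' exact algorithm.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA
from sklearn.covariance import MinCovDet

def flag_functional_outliers(curves, n_components=3, quantile=0.975):
    """curves: (n_curves, n_gridpoints). Returns a boolean outlier flag per curve."""
    scores = PCA(n_components=n_components).fit_transform(curves)
    d2 = MinCovDet(random_state=0).fit(scores).mahalanobis(scores)  # squared robust distances
    return d2 > chi2.ppf(quantile, df=n_components)

t = np.linspace(0, 1, 40)
rng = np.random.default_rng(4)
curves = np.sin(2 * np.pi * t) + rng.normal(0, .1, (50, 40))
curves[:3] += 3 * t                      # three contaminated curves
print(np.where(flag_functional_outliers(curves))[0])
```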

12.
To address the insufficient representativeness of samples in the survey of industrial enterprises below the designated size, this paper proposes a calibration estimation method based on balanced samples and derives the corresponding estimator and its variance. The method adopts a balanced sampling design at the design stage and calibration estimation at the estimation stage, making maximal use of auxiliary information; data analysis shows that calibration estimation based on balanced samples outperforms the Horvitz-Thompson (HT) estimator based on balanced sampling. To satisfy the assumption that the balancing variables are linearly independent, three approaches for pre-processing correlated balancing variables are proposed: principal component analysis, sliced inverse regression, and sliced average variance estimation. The method has both theoretical and practical significance for improving China's survey of industrial enterprises below the designated size and can reasonably be extended to other official statistical surveys.
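A small numerical sketch of the estimation-stage idea: starting from the design weights, linear (chi-square distance) calibration adjusts the weights so that the weighted sample totals of the auxiliary variables match their known population totals, which yields a GREG-type estimator. The data, design weights and population totals below are made up for illustration.

```python
import numpy as np

def linear_calibration_weights(d, X, tx):
    """d: design weights (n,), X: auxiliary variables (n, p), tx: known totals (p,).
    Returns calibrated weights w = d * (1 + x'lambda) satisfying X'w = tx."""
    Xd = X * d[:, None]
    lam = np.linalg.solve(X.T @ Xd, tx - Xd.sum(axis=0))
    return d * (1 + X @ lam)

rng = np.random.default_rng(5)
X = rng.normal(10, 2, size=(200, 2))      # auxiliary variables on the sample
y = 3 * X[:, 0] + rng.normal(0, 1, 200)   # study variable
d = np.full(200, 5.0)                     # design weights (e.g. SRS with n/N = 1/5)
tx = np.array([10.0, 10.0]) * 1000        # "known" population totals of X
w = linear_calibration_weights(d, X, tx)
print("calibrated X totals:", X.T @ w)    # matches tx
print("calibration estimate of the total of y:", y @ w)
```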

13.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.
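A toy sketch of the incremental strategy using scikit-learn's GaussianMixture as the model-based clusterer: fit on a random subsample, refine on the full data, locate poorly fit observations through their log-likelihood, and refit with one extra component initialized near them. The thresholds and component counts are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
big = rng.normal(0, 1, size=(5000, 2))
small = rng.normal(6, 0.3, size=(50, 2))          # a small, easily missed cluster
X = np.vstack([big, small])

# 1) fit an initial model to a random sample, then refine it on the full data
sample = X[rng.choice(len(X), size=500, replace=False)]
gm = GaussianMixture(n_components=1, random_state=0).fit(sample)
gm = GaussianMixture(n_components=1, random_state=0,
                     means_init=gm.means_, weights_init=gm.weights_).fit(X)

# 2) add a component initialized at the poorly fit observations
loglik = gm.score_samples(X)
poor = X[loglik < np.quantile(loglik, 0.01)]
means_init = np.vstack([gm.means_, poor.mean(axis=0)])
gm2 = GaussianMixture(n_components=2, means_init=means_init,
                      random_state=0).fit(X)
print(np.bincount(gm2.predict(X)))
```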

14.
Outlier mining is an important aspect of data mining, and outlier mining based on the Cook distance is the most commonly used approach. However, when the data suffer from multicollinearity, the traditional Cook method is no longer effective. Given the good properties of principal component estimation, we substitute it for least squares estimation and derive a Cook distance measure based on the principal component estimator, which can be used for outlier mining. We also study some related theoretical and applied problems.
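A sketch of one natural way to form such a diagnostic, assuming the classical Cook-distance formula with the hat matrix and fitted values replaced by their principal component regression counterparts; the paper's exact definition may differ.

```python
import numpy as np

def pc_cook_distance(X, y, k=2):
    """Cook-type distances based on a rank-k principal component regression."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                      # leading k principal component scores
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # hat matrix of the PC regression
    resid = yc - H @ yc
    h = np.diag(H)
    s2 = resid @ resid / (len(y) - k)
    return resid**2 * h / (k * s2 * (1 - h) ** 2)

rng = np.random.default_rng(7)
z = rng.normal(size=100)
X = np.column_stack([z, z + rng.normal(0, .01, 100), rng.normal(size=100)])  # collinear design
y = z + rng.normal(0, .3, 100)
y[0] += 6                                  # an outlier
d = pc_cook_distance(X, y, k=2)
print(d.argmax(), d.max())
```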

15.
Robust techniques for multivariate statistical methods, such as principal component analysis, canonical correlation analysis, and factor analysis, have recently been constructed. In contrast to the classical approach, these robust techniques are able to resist the effect of outliers. However, there does not yet exist a graphical tool to identify in a comprehensive way the data points that do not obey the model assumptions. Our goal is to construct such graphics based on empirical influence functions. These graphics not only detect the influential points but also classify the observations according to their robust distances. In this way the observations are divided into four different classes: regular points, nonoutlying influential points, influential outliers, and noninfluential outliers. We thus gain additional insight into the data by detecting different types of deviating observations. Some real data examples are given to show how these plots can be used in practice.
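A bare-bones sketch of such a diagnostic plot, under simple assumptions: the robust distance comes from an MCD fit, the empirical influence of each point is approximated by how much the classical mean shifts when that point is deleted, and two cutoffs split the plane into the four classes named above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(size=(95, 2)), rng.normal(5, .5, size=(5, 2))])
n, p = X.shape

rd = np.sqrt(MinCovDet(random_state=0).fit(X).mahalanobis(X))   # robust distances
xbar = X.mean(axis=0)
infl = np.array([np.linalg.norm(xbar - np.delete(X, i, 0).mean(axis=0))
                 for i in range(n)])                            # deletion influence on the mean

rd_cut = np.sqrt(chi2.ppf(0.975, df=p))
infl_cut = np.quantile(infl, 0.95)                              # ad-hoc influence cutoff
plt.scatter(rd, infl, c=(rd > rd_cut) + 2 * (infl > infl_cut), cmap="viridis")
plt.axvline(rd_cut); plt.axhline(infl_cut)
plt.xlabel("robust distance"); plt.ylabel("empirical influence (mean shift)")
plt.show()
```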

16.
The canonical correlation (CANCOR) method for dimension reduction in a regression setting is based on the classical estimates of the first and second moments of the data, and is therefore sensitive to outliers. In this paper, we study a weighted canonical correlation (WCANCOR) method, which captures a subspace of the central dimension reduction subspace, as well as its asymptotic properties. In the proposed WCANCOR method, each observation is weighted based on its Mahalanobis distance to the location of the predictor distribution. Robust estimates of the location and scatter, such as the minimum covariance determinant (MCD) estimator of Rousseeuw [P.J. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications B (1985) 283-297], can be used to compute the Mahalanobis distance. To determine the number of significant dimensions in WCANCOR, a weighted permutation test is considered. A comparison of SIR, CANCOR and WCANCOR is also made through simulation studies to show the robustness of WCANCOR to outlying observations. As an example, the Boston housing data is analyzed using the proposed WCANCOR method.
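A sketch of the weighting idea under assumptions: weights are derived from MCD-based Mahalanobis distances of the predictors (observations beyond a chi-square cutoff are downweighted), weighted covariance blocks are formed, and the leading canonical direction comes from the usual generalized eigenproblem. The specific weight function is an illustrative choice, and the weighted permutation test is omitted.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def wcancor_first_direction(X, Y):
    n, p = X.shape
    d2 = MinCovDet(random_state=0).fit(X).mahalanobis(X)   # squared robust distances
    cut = chi2.ppf(0.95, df=p)
    w = np.minimum(1.0, cut / d2)                           # downweight outlying predictors
    w /= w.sum()
    mx, my = w @ X, w @ Y
    Xc, Yc = X - mx, Y - my
    Sxx = (Xc * w[:, None]).T @ Xc
    Syy = (Yc * w[:, None]).T @ Yc
    Sxy = (Xc * w[:, None]).T @ Yc
    M = Sxy @ np.linalg.solve(Syy, Sxy.T)
    rho2, A = eigh(M, Sxx)                                  # generalized eigenproblem
    return np.sqrt(rho2[-1]), A[:, -1]                      # leading canonical corr, direction

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] + rng.normal(0, .5, 200), rng.normal(size=200)])
X[:5] += 10                                                 # outlying predictor rows
print(wcancor_first_direction(X, Y))
```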

17.
In this paper, the problem of fitting the exploratory factor analysis (EFA) model to data matrices with more variables than observations is reconsidered. A new algorithm named ‘zig-zag EFA’ is introduced for the simultaneous least squares estimation of all EFA model unknowns. As in principal component analysis, zig-zag EFA is based on the singular value decomposition of data matrices. Another advantage of the proposed computational routine is that it facilitates the estimation of both common and unique factor scores. Applications to both real and artificial data illustrate the algorithm and the EFA solutions.
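Not the zig-zag algorithm itself, but a short sketch of the generic alternating least-squares idea it builds on: given the uniquenesses, the loadings that minimize the least-squares criterion come from a truncated eigendecomposition; given the loadings, the uniquenesses are the residual diagonal. It runs in the p > n setting the abstract targets, though the stopping rule and factor-score estimation are omitted.

```python
import numpy as np

def ls_efa(X, k=2, n_iter=100):
    """Least-squares exploratory factor analysis by alternating updates.
    Also runs when there are more variables than observations (p > n)."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(X) - 1)                # sample covariance (rank <= n-1)
    psi = np.diag(S).copy() * 0.5               # initial uniquenesses
    for _ in range(n_iter):
        eigval, eigvec = np.linalg.eigh(S - np.diag(psi))
        idx = np.argsort(eigval)[::-1][:k]
        L = eigvec[:, idx] * np.sqrt(np.maximum(eigval[idx], 0))  # loadings update
        psi = np.maximum(np.diag(S - L @ L.T), 1e-6)              # uniqueness update
    return L, psi

rng = np.random.default_rng(10)
F = rng.normal(size=(30, 2))                     # n = 30 observations
Lam = rng.normal(size=(50, 2))                   # p = 50 variables > n
X = F @ Lam.T + rng.normal(0, .3, size=(30, 50))
L, psi = ls_efa(X, k=2)
print(L.shape, psi.shape)
```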

18.
Spatio-temporal data often contain outliers or come from heavy-tailed distributions, in which case least-squares-based estimation performs poorly and more robust estimation methods are needed. This paper proposes a local linear estimation method for spatio-temporal models based on the local mode (local modal, LM). Both the theory and the data analysis show that when the data contain outliers or come from heavy-tailed distributions, the local modal local linear method is more efficient than the least-squares local linear method, while the two are asymptotically equally efficient when the data contain no outliers and follow a normal distribution. The modal expectation-maximization (MEM) algorithm is adopted, and asymptotic normality of the estimator is established for dependent data.
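A condensed sketch of the MEM iteration for local modal local linear regression at a single point x0, in a one-covariate setting and ignoring spatio-temporal dependence: the E-step computes weights from a kernel in the covariate times a Gaussian kernel in the current residuals, and the M-step solves a weighted least-squares problem. The bandwidths and data are illustrative.

```python
import numpy as np

def lm_local_linear(x, y, x0, h1=0.3, h2=0.5, n_iter=50):
    """Local modal local linear fit at x0 via an MEM-type iteration."""
    kx = np.exp(-0.5 * ((x - x0) / h1) ** 2)          # kernel weights in the covariate
    Z = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.solve(Z.T @ (kx[:, None] * Z), Z.T @ (kx * y))   # weighted LS start
    for _ in range(n_iter):
        resid = y - Z @ beta
        w = kx * np.exp(-0.5 * (resid / h2) ** 2)     # E-step: modal weights
        beta = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * y))  # M-step: weighted LS
    return beta[0]                                    # estimate of m(x0)

rng = np.random.default_rng(11)
x = rng.uniform(-2, 2, 300)
y = np.sin(x) + rng.standard_t(df=1.5, size=300) * 0.3   # heavy-tailed noise
print(lm_local_linear(x, y, x0=0.5), np.sin(0.5))
```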

19.
The paper is devoted to statistical nonparametric estimation of multivariate distribution density. The influence of data pre-clustering on the estimation accuracy of multimodal density is analyzed by means of the Monte Carlo method. It is shown that soft clustering is more advantageous than hard clustering. While a moderate increase in the number of clusters also increases the calculation time, it considerably reduces the estimation error.
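A small sketch of the soft pre-clustering estimator the abstract favours, under assumptions: a Gaussian mixture supplies soft responsibilities, each cluster gets its own responsibility-weighted kernel density estimate, and the final density is the mixture of these per-cluster estimates. SciPy's gaussian_kde with its weights argument does the per-cluster smoothing.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(12)
X = np.vstack([rng.normal(-2, 1, size=(300, 2)), rng.normal(3, 0.7, size=(300, 2))])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gm.predict_proba(X)                      # soft cluster memberships
pi = resp.mean(axis=0)                          # estimated mixing proportions

def soft_cluster_density(points):
    """Mixture of per-cluster weighted KDEs (soft pre-clustering)."""
    dens = np.zeros(len(points))
    for k in range(resp.shape[1]):
        kde = gaussian_kde(X.T, weights=resp[:, k])
        dens += pi[k] * kde(points.T)
    return dens

grid = rng.normal(0, 2, size=(5, 2))
print(soft_cluster_density(grid))
```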
