Similar literature
19 similar articles found.
1.
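Outlier diagnosis is a classic problem in statistics. Detecting outliers and reducing their influence on the analysis of tax assessment data is a worthwhile line of research. However, conventional outlier diagnostics generally rely on global identification methods suited to unimodal distributions. Drawing on local correlation integral theory, this paper proposes an identification method based on nonparametric density estimation. The method is applicable to multimodal distributions, can identify outliers of a local nature, and remains effective for samples containing a relatively high proportion of outliers. Using a sample of 10,920 enterprises from one city, an empirical analysis compares the tax assessment method currently used by the tax bureau with the proposed one. The results indicate that the method used by the tax bureau carries a considerable tax assessment risk (risk of misjudgment).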

2.
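Requirements on model accuracy and robustness have made outlier detection and robust estimation increasingly important in model building. This paper first applies a high-dimensional influence measure (HIM), constructed from marginal correlation coefficients, and a high-dimensional outlier discrimination method (HDC), constructed from distance correlation coefficients, to perform a preliminary detection of outliers, dividing the points in the data set into normal points and outliers. Starting from the initial set of normal points, it then uses robust parameter estimation and the concept of hyper-ellipsoidal contours in the residual space to construct a correction for misjudged points in that set, and it recomputes the outlier probability of each point in the initial outlier set to recover normal points that were misclassified as outliers, which further improves the accuracy of outlier detection. Simulations of three different types of anomalous data under two data structures demonstrate the effectiveness of the proposed method, which is also verified and analyzed on a real example.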

3.
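Nie Bin, Wang Xi, Hu Xue. 《运筹与管理》 (Operations Research and Management Science), 2019, 28(1): 101-107
In quality control, identifying outliers among nonlinear profiles is one of the key research problems. This paper combines wavelet analysis, data depth, and cluster analysis to propose a new method for identifying outliers under non-normal variation. In a simulation study, the new method is compared with the χ² control chart method; the results confirm that the new method identifies outliers with higher accuracy and stability. Finally, the new method is applied to a wood board vertical density profile example, and the analysis shows that it can effectively identify anomalous profile data.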

4.
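To address the problem that model identification and parameter estimation in ARMA modeling are easily affected by outliers in the observations, an ARMA model that simultaneously accounts for additive outliers and innovation outliers is constructed. A Bayesian Markov Chain Monte Carlo method based on Gibbs sampling is used to estimate the parameters of the robust ARMA model while simultaneously locating the outliers in the observations and distinguishing their types. A simulation analysis using data on China's natural population growth shows that the Bayesian method can effectively identify outliers in ARMA series.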

5.
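Considering a series of parameters generated during ATM transactions, such as transaction volume, transaction success rate, and response time, this paper analyzes the characteristics of the transaction state and builds an anomaly detection model. For the success rate and response time, a clustering algorithm divides the data points into three classes: normal points, suspected outliers, and outliers. For each suspected outlier, the distribution of neighboring points in the time series is then used to decide whether it is truly an outlier. For transaction volume, outlying points are first identified with the local outlier factor (LOF) and then screened further using the moving average and standard deviation of transaction volume over time, yielding preliminary suspected outliers; these are finally confirmed or rejected by comparison with data from the same time of day on different days. Based on this model, anomalies are divided into three warning levels, and major faults are predicted.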
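
The moving-average screening step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `moving_band_flags`, the trailing window of 5 points, the multiplier `k = 3.0`, and the sample volumes are all assumptions made for illustration, and the LOF step and the same-time-of-day comparison are not reproduced.

```python
from statistics import mean, pstdev


def moving_band_flags(values, window=5, k=3.0):
    """Flag points lying outside mean ± k·std of the preceding `window` points.

    `window` and `k` are illustrative assumptions, not values from the abstract.
    """
    flags = []
    for t, x in enumerate(values):
        history = values[max(0, t - window):t]
        if len(history) < window:
            flags.append(False)  # not enough history to judge this point
            continue
        mu = mean(history)
        sigma = pstdev(history)
        if sigma > 0:
            flags.append(abs(x - mu) > k * sigma)
        else:
            flags.append(x != mu)  # flat history: any change is suspicious
    return flags


# Hypothetical transaction volumes with one spike at index 6.
volumes = [100, 102, 98, 101, 99, 100, 250, 101, 100, 99]
suspects = [t for t, flagged in enumerate(moving_band_flags(volumes)) if flagged]
print(suspects)
```

Because the band is computed from preceding points only, a spike inflates the band for the next few observations; a robust location and scale (median and MAD) would reduce that effect.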

6.
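Gene recognition is a branch of bioinformatics research. Discriminant analysis from multivariate statistics offers simple, easily interpreted models and performs well in identifying splice sites, but it is highly susceptible to outliers. Optimizing traditional discriminant analysis with robust statistics yields good results, and a weighting scheme further improves its robustness and achieves better recognition performance. The weighted robust discriminant analysis method is highly robust and little affected by outliers, and it has practical significance and reference value for other classification and discrimination problems.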

7.
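Statistical diagnostics and influence analysis for exponential family semiparametric nonlinear models   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper studies statistical diagnostics and influence analysis methods for exponential family semiparametric nonlinear models and obtains a series of diagnostic statistics for identifying outliers and influential points. A numerical example verifies the effectiveness of the proposed diagnostic methods.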

8.
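The presence of outliers has an important effect on the identification and parameter estimation of time series volatility models. The Tukey biweight weight function is used to transform the residuals of the fitted series model, and the transformed residual series is then used for robust identification and modeling of the volatility model. Simulation and empirical analysis show that the robust identification and estimation method is highly resistant to outliers and better captures the volatility of asset returns.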
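
As a rough illustration of the residual transformation, the sketch below applies the standard Tukey biweight weight function, w(u) = (1 - (u/c)²)² for |u| < c and 0 otherwise, to standardized residuals. The abstract does not say how the residuals are scaled or which tuning constant is used, so the MAD-based scale, the constant c = 4.685, the helper name `downweight_residuals`, and the sample residuals are all assumptions.

```python
from statistics import median


def tukey_biweight(u, c=4.685):
    """Tukey biweight weight: (1 - (u/c)^2)^2 inside [-c, c], zero outside."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def downweight_residuals(residuals, c=4.685):
    """Multiply each residual by its biweight weight (hypothetical helper).

    Residuals are centered at the median and scaled by 1.4826 * MAD,
    which is an assumption; the abstract does not specify the scaling.
    """
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        return list(residuals)  # degenerate case: leave residuals unchanged
    return [r * tukey_biweight((r - med) / scale, c) for r in residuals]


# Hypothetical residuals with one gross outlier.
residuals = [0.1, -0.2, 0.05, -0.1, 8.0]
transformed = downweight_residuals(residuals)
print(transformed)
```

Small residuals pass through almost unchanged, while the gross outlier receives weight zero, which is what makes the subsequent model identification resistant.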

9.
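A basic statistical analysis of 60,000 high-frequency monitoring records from 10 devices at a thermal power plant shows that anomalous data in high-frequency series take two forms: anomalous points and anomalous segments. A method based on the frequency distribution and the first-order forward difference is proposed for detecting both. The threshold for anomalous data is determined from the frequency distribution of the absolute first-order forward differences together with a risk coefficient; the maximum number of anomalous points that an anomalous segment may contain is determined from the performance of the device and the sampling frequency; and decision rules for anomalous points and anomalous segments are given in terms of the threshold and this maximum count. The method is used to diagnose anomalous data in 6,000 records from the plant's booster pump motor, and the results agree with the actual anomalous data.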
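
The distinction between anomalous points and anomalous segments can be sketched as follows. This is only one plausible reading of the abstract: the pairing rule (a jump away from the normal level followed by a jump back), the function name `classify_anomalies`, the fixed `threshold`, and the sample readings are assumptions. In the paper the threshold comes from the frequency distribution of the absolute differences and a risk coefficient, which this sketch does not reproduce.

```python
def classify_anomalies(x, threshold, max_len):
    """Split large forward-difference jumps into anomalous points and segments.

    A jump at index t means |x[t+1] - x[t]| > threshold. Jumps are paired in
    order (leave, return); the readings between a pair form one anomaly.
    This pairing rule is an illustrative assumption.
    """
    jumps = [t for t in range(len(x) - 1) if abs(x[t + 1] - x[t]) > threshold]
    points, segments = [], []
    for start, end in zip(jumps[0::2], jumps[1::2]):
        length = end - start  # number of readings between the two jumps
        if length == 1:
            points.append(start + 1)
        elif length <= max_len:
            segments.append((start + 1, end))  # inclusive index range
    return points, segments


# Hypothetical sensor readings: one isolated spike and one three-sample dip.
readings = [50.0, 50.1, 49.9, 50.0, 80.0, 50.1, 50.0,
            49.9, 20.0, 20.2, 19.9, 50.0, 50.1]
points, segments = classify_anomalies(readings, threshold=5.0, max_len=4)
print(points, segments)
```

Runs longer than `max_len` are left unclassified here, on the reasoning that a long-lasting change of level is more likely a genuine change in operating state than an anomaly.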

10.
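Identifying suspicious transactions is an important task in combating money laundering. To help anti-money-laundering analysts pick out abnormal customer transactions from massive volumes of financial transaction information, this paper proposes a new nonlinear stochastic method, based on nonlinear Markov stochastic processes, phase space reconstruction, and hidden Markov chains, to model and fit financial transaction time series. Robust control charts are then applied to the estimation errors to detect anomalies. Analyses of real transaction data and simulated data verify the effectiveness and feasibility of the proposed method, which can be used to monitor abnormal transactions.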

11.
Summary: The problem of detection of multidimensional outliers is a fundamental and important problem in applied statistics. The unreliability of multivariate outlier detection techniques such as Mahalanobis distance and hat matrix leverage has led to the development of techniques which have been known in the statistical community for well over a decade. The literature on this subject is vast and growing. In this paper, we propose to use the artificial intelligence technique of self-organizing map (SOM) for detecting multiple outliers in multidimensional datasets. SOM, which produces a topology-preserving mapping of the multidimensional data cloud onto a lower dimensional visualizable plane, provides an easy way of detecting multidimensional outliers in the data, at respective levels of leverage. The proposed SOM based method for outlier detection not only identifies the multidimensional outliers, it also provides information about the entire outlier neighbourhood. Being an artificial intelligence technique, the SOM based outlier detection technique is non-parametric and can be used to detect outliers in very large multidimensional datasets. The method is applied to detect outliers in varied types of simulated multivariate datasets, a benchmark dataset, and a real-life cheque processing dataset. The results show that SOM can be used effectively for multidimensional outlier detection.

12.
We propose new tools for visualizing large amounts of functional data in the form of smooth curves. The proposed tools include functional versions of the bagplot and boxplot, which make use of the first two robust principal component scores, Tukey’s data depth and highest density regions.

By-products of our graphical displays are outlier detection methods for functional data. We compare these new outlier detection methods with existing methods for detecting outliers in functional data, and show that our methods are better able to identify outliers.

An R-package containing computer code and datasets is available in the online supplements.

13.
This article proposes a new technique for detecting outliers in autoregressive models and identifying the type as either innovation or additive. This technique can be used without knowledge of the true model order, outlier location, or outlier type. Specifically, we perturb an observation to obtain the perturbation size that minimizes the resulting residual sum of squares (SSE). The reduction in the SSE yields outlier detection and identification measures. In addition, the perturbation size can be used to gauge the magnitude of the outlier. Monte Carlo studies and empirical examples are presented to illustrate the performance of the proposed method as well as the impact of outliers on model selection and parameter estimation. We also obtain robust estimators and model selection criteria, which are shown in simulation studies to perform well when large outliers occur.

14.
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results, particularly for two-stage analyses. In the DEA-related literature, prior work on this issue has focused on the efficient frontier as a basis for detecting outliers. An iterative approach for dealing with the potential for one outlier to mask the presence of another has been proposed but not demonstrated. This paper proposes using both the efficient frontier and the inefficient frontier to identify outliers and thereby improve the accuracy of second stage results in two-stage nonparametric analysis. The iterative outlier detection approach is implemented in a leave-one-out method using both the efficient frontier and the inefficient frontier and demonstrated in a two-stage semi-parametric bootstrapping analysis of a classic data set. The results show that the conclusions drawn can be different when outlier identification includes consideration of the inefficient frontier.

15.
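High-quality decisions increasingly depend on high-quality data mining and analysis, and high-quality data mining in turn requires high-quality data. In surveys of large-scale instrument utilization, subjective and objective factors invariably cause some data to be anomalous, which degrades data quality, so suitable methods are needed to detect and handle anomalous data. Different types of data often call for different outlier detection methods. This paper analyzes the overall characteristics of, and general methods for, survey data on large-scale instrument utilization. Taking as its main thread the instrument usage machine-hours and shared machine-hours data from the "Survey of the Current Status of Large-Scale Instrument Resources in China" (2009), led by the platform center of the Ministry of Science and Technology, it compares the applicability of regression methods, depth-based methods, and box plot methods to outlier detection for different types of data. Examining the data from different angles, and testing and adopting suitable methods to find suspected outliers, supports the subsequent analysis and handling of anomalous data on large-scale instrument utilization, improves data quality, lays a foundation for the comprehensive evaluation of large-scale instrument utilization, and offers a useful reference for outlier detection in the preprocessing of science and technology resource survey data.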
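
Of the three families of methods compared, the box plot method is the simplest to show. The sketch below uses the conventional Tukey fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR); the multiplier 1.5, the function name `boxplot_outliers`, and the machine-hours figures are illustrative assumptions and are not taken from the survey.

```python
from statistics import quantiles


def boxplot_outliers(data, k=1.5):
    """Return the values outside the box plot fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional choice, assumed here for illustration.
    """
    q1, _, q3 = quantiles(data, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


# Hypothetical annual machine-hours reported for eight instruments.
machine_hours = [120, 135, 128, 140, 131, 125, 138, 900]
outliers = boxplot_outliers(machine_hours)
print(outliers)
```

The box plot rule needs no distributional model, which suits skewed survey data, but it treats each variable separately and cannot by itself detect a value that is unusual only in combination with another variable.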

16.
This paper suggests an outlier detection procedure which applies a nonparametric model accounting for undesired outputs and exogenous influences in the sample. Although efficiency is estimated in a deterministic frontier approach, each potential outlier is initially given the benefit of the doubt of not being an outlier. We survey several outlier detection procedures and select five complementary methodologies which, taken together, are able to detect all influential observations. To exploit the singularity of the leverage and the peer count, the super-efficiency and the order-m method and the peer index, it is proposed to select as outliers those observations which are simultaneously revealed as atypical by at least two of the procedures. A simulated example demonstrates the usefulness of this approach. The model is applied to the Portuguese drinking water sector, for which we have an unusually rich data set.

17.
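Unmasking test for multiple outliers in exponential samples   Cited by: 2 (self-citations: 0, citations by others: 2)
Discordancy tests for multiple outliers in exponential samples are made very difficult and complicated by masking and swamping effects. The key to the problem is determining the value of k, for which traditional methods are powerless. Based on the idea of the AIC criterion for variable selection, this paper proposes a new outlier testing method. Its advantages are that k need not be specified in advance, that it is simple to compute, and that maximizing the MAIC both determines k and eliminates masking or swamping in the test. Statistics and formulas with easily computed significance levels are also given. Finally, an example demonstrates the effectiveness of the proposed method.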

18.
The outlier detection problem and the robust covariance estimation problem are often interchangeable. Without outliers, the classical method of maximum likelihood estimation (MLE) can be used to estimate parameters of a known distribution from observational data. When outliers are present, they dominate the log likelihood function, causing the MLE estimators to be pulled toward them. Many robust statistical methods have been developed to detect outliers and to produce estimators that are robust against deviation from model assumptions. However, the existing methods suffer either from computational complexity when problem size increases or from giving up desirable properties, such as affine equivariance. An alternative approach is to design a special mathematical programming model to find the optimal weights for all the observations, such that at the optimal solution, outliers are given smaller weights and can be detected. This method produces a covariance estimator that has the following properties: First, it is affine equivariant. Second, it is computationally efficient even for large problem sizes. Third, it is easy to incorporate prior beliefs into the estimator by using semi-definite programming. The accuracy of this method is tested for different contamination models, including recently proposed ones. The method is not only faster than the Fast-MCD method for high dimensional data but also has reasonable accuracy for the tested cases.

19.
This article considers the problem of detecting outliers in time series data and proposes a general detection method based on wavelets. Unlike other detection procedures found in the literature, our method does not require that a model be specified for the data. Also, use of our method is not restricted to data generated from ARIMA processes. The effectiveness of the proposed method is compared with existing outlier detection procedures. Comparisons based on various models, sample sizes, and parameter values illustrate the effectiveness of the proposed method.
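
The abstract gives no algorithmic detail, so the following is only a generic illustration of model-free, wavelet-style screening and not the article's procedure. It computes finest-level Haar detail coefficients and flags those exceeding a universal threshold with a MAD-based noise estimate; the function name `haar_detail_outliers`, the choice of the Haar wavelet, the threshold rule, and the sample series are all assumptions.

```python
import math
from statistics import median


def haar_detail_outliers(x):
    """Flag sample pairs whose finest-level Haar detail coefficient is large.

    Detail coefficient for pair k: (x[2k] - x[2k+1]) / sqrt(2).
    Threshold: sigma * sqrt(2 * ln(n)), with sigma = median(|d|) / 0.6745.
    Both the wavelet and the threshold are illustrative assumptions.
    """
    n = len(x) - len(x) % 2  # use an even number of samples
    details = [(x[i] - x[i + 1]) / math.sqrt(2.0) for i in range(0, n, 2)]
    sigma = median([abs(d) for d in details]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(n))
    # Return the index pairs (2k, 2k+1) whose coefficient exceeds the threshold.
    return [(2 * k, 2 * k + 1) for k, d in enumerate(details) if abs(d) > threshold]


# Hypothetical series with one spike at index 5.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 6.0, 1.1, 0.9,
          1.2, 1.0, 0.8, 1.1, 1.0, 1.2, 0.9, 1.0]
flagged = haar_detail_outliers(series)
print(flagged)
```

No time series model is fitted at any point, which matches the model-free spirit described in the abstract. A non-decimated transform would be needed to localize a spike to a single sample rather than to a pair.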
