1.
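In the analysis of time series regression models, testing for correlation and homogeneity of variance is a fundamental problem. This paper discusses tests for correlation and homogeneity of variance in nonlinear regression models with bilinear BL(1,1,1,1) errors. Using the score test method, it gives test statistics for the bilinear term, for correlation, for homogeneity of variance, and for correlation and homogeneity of variance jointly. These results extend and develop those for regression models with linear time series errors. A numerical example illustrates the practical value of the tests.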
2.
《Journal of Computational and Applied Mathematics》2002,149(1):119-129
In this paper we introduce COV, a novel information retrieval (IR) algorithm for massive databases based on vector space modeling and spectral analysis of the covariance matrix of the document vectors, which reduces the scale of the problem. Since the dimension of the covariance matrix depends on the attribute space and is independent of the number of documents, COV can be applied to databases that are too massive for methods based on the singular value decomposition of the document-attribute matrix, such as latent semantic indexing (LSI). In addition to improved scalability, theoretical considerations indicate that results from our algorithm tend to be more accurate than those from LSI, particularly in detecting subtle differences in document vectors. We demonstrate the power and accuracy of COV through an important topic in data mining, known as outlier cluster detection. We propose two new algorithms for detecting major and outlier clusters in databases: the first is based on LSI, and the second on COV. Our implementation studies indicate that our cluster detection algorithms outperform the basic LSI and COV algorithms in detecting outlier clusters.
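The scalability argument in this abstract can be illustrated with a minimal sketch. It is not the authors' COV implementation: the function names, the 1/n normalization of the covariance matrix, and the use of power iteration for the spectral step are all assumptions made for illustration. The point it shows is that the covariance matrix has one row and column per attribute, however many documents there are.

```python
import math


def covariance_matrix(docs):
    """Return (mean, cov) for a list of equal-length document vectors.

    cov is m x m, where m is the number of attributes; its size does not
    depend on the number of documents. The 1/n normalization is an assumption.
    """
    n = len(docs)
    m = len(docs[0])
    mean = [sum(d[j] for d in docs) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for d in docs:
        centered = [d[j] - mean[j] for j in range(m)]
        for a in range(m):
            for b in range(m):
                cov[a][b] += centered[a] * centered[b] / n
    return mean, cov


def leading_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration.

    Hypothetical helper: a real system would use a proper eigensolver. The
    uniform start vector fails if it is orthogonal to the dominant eigenvector.
    """
    m = len(matrix)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Three documents over three attributes; only the first attribute varies.
docs = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0], [5.0, 2.0, 0.0]]
mean, cov = covariance_matrix(docs)
direction = leading_eigenvector(cov)
```

Projecting documents and queries onto a few such leading directions is one way to reduce the dimension of the retrieval problem; how COV selects and uses them is described in the paper itself.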
3.
The detection of multidimensional outliers is a fundamental and important problem in applied statistics. The unreliability
of multivariate outlier detection techniques such as Mahalanobis distance and hat matrix leverage has led to the development of
techniques which have been known in the statistical community for well over a decade. The literature on this subject is vast
and growing. In this paper, we propose to use the artificial intelligence technique of the self-organizing map (SOM) for detecting multiple outliers in multidimensional datasets. SOM, which produces a topology-preserving mapping of
the multidimensional data cloud onto a lower-dimensional, visualizable plane, provides an easy way of detecting multidimensional
outliers in the data, at their respective levels of leverage. The proposed SOM-based method for outlier detection not only identifies
the multidimensional outliers but also provides information about the entire outlier neighbourhood. Being an artificial
intelligence technique, SOM-based outlier detection is non-parametric and can be used to detect outliers in very
large multidimensional datasets. The method is applied to detect outliers in varied types of simulated multivariate datasets,
a benchmark dataset, and a real-life cheque processing dataset. The results show that SOM can be used effectively
for multidimensional outlier detection.
4.
《Journal of computational and graphical statistics》2013,22(2):450-471
This article proposes a new technique for detecting outliers in autoregressive models and identifying the type as either innovation or additive. This technique can be used without knowledge of the true model order, outlier location, or outlier type. Specifically, we perturb an observation to obtain the perturbation size that minimizes the resulting residual sum of squares (SSE). The reduction in the SSE yields outlier detection and identification measures. In addition, the perturbation size can be used to gauge the magnitude of the outlier. Monte Carlo studies and empirical examples are presented to illustrate the performance of the proposed method as well as the impact of outliers on model selection and parameter estimation. We also obtain robust estimators and model selection criteria, which are shown in simulation studies to perform well when large outliers occur.
5.
Influential observations in frontier models, a robust non-oriented approach to the water sector
This paper suggests an outlier detection procedure which applies a nonparametric model accounting for undesired outputs and
exogenous influences in the sample. Although efficiency is estimated in a deterministic frontier approach, each potential
outlier is initially given the benefit of the doubt of not being an outlier. We survey several outlier detection procedures and select
five complementary methodologies which, taken together, are able to detect all influential observations. To exploit the singularity
of the leverage and the peer count, the super-efficiency and the order-m method, and the peer index, it is proposed to select as outliers those observations which are simultaneously revealed as atypical
by at least two of the procedures. A simulated example demonstrates the usefulness of this approach. The model is applied
to the Portuguese drinking water sector, for which we have an unusually rich data set.
6.
We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated
as a Quadratic Mixed Integer Programming (QMIP) problem, in which penalty costs for discarding outliers are included in the objective
function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Due to the high
computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally suitable for small
sample data. The computational performance and the effectiveness of the new procedure are improved significantly by using
the idea of the ε-insensitive loss function from support vector machine regression. Small errors are ignored, and the mathematical formula
gains the sparseness property. The good performance of the ε-insensitive PTS (IPTS) estimator allows identification of multiple outliers while avoiding masking or swamping effects. The computational
effectiveness and successful outlier detection of the proposed method are demonstrated via simulated experiments.
This research has been partially funded by the Greek Ministry of Education under the program Pythagoras II.
7.
We consider a square random matrix of size N of the form A + Y where A is deterministic and Y has i.i.d. entries with variance 1/N. Under mild assumptions, as N grows the empirical distribution of the eigenvalues of A + Y converges weakly to a limit probability measure β on the complex plane. This work is devoted to the study of the outlier eigenvalues, i.e., eigenvalues in the complement of the support of β. Even in the simplest cases, a variety of interesting phenomena can occur. As in earlier works, we give a sufficient condition to guarantee that outliers are stable and provide examples where their fluctuations vary with the particular distribution of the entries of Y or the Jordan decomposition of A. We also exhibit concrete examples where the outlier eigenvalues converge in distribution to the zeros of a Gaussian analytic function. © 2016 Wiley Periodicals, Inc.
8.
9.
This paper explains some drawbacks of previous approaches to detecting influential observations in deterministic nonparametric data envelopment analysis models as developed by Yang et al. (Annals of Operations Research 173:89–103, 2010). For example, efficiency scores and relative entropies obtained in this model are unimportant to outlier detection, and the empirical distribution of all estimated relative entropies is not a Monte-Carlo approximation. In this paper we develop a new method to detect whether a specific DMU is truly influential, and a statistical test is applied to determine the significance level. An application to measuring the efficiency of hospitals shows the superiority of this method, which leads to significant advancements in outlier detection.
10.
Robert Kosara 《Journal of computational and graphical statistics》2013,22(1):29-32
We propose new tools for visualizing large amounts of functional data in the form of smooth curves. The proposed tools include functional versions of the bagplot and boxplot, which make use of the first two robust principal component scores, Tukey’s data depth and highest density regions. By-products of our graphical displays are outlier detection methods for functional data. We compare these new outlier detection methods with existing methods for detecting outliers in functional data, and show that our methods are better able to identify outliers. An R-package containing computer code and datasets is available in the online supplements.
11.
Cluster-based outlier detection
Outlier detection has important applications in the field of data mining, such as fraud detection, customer behavior analysis,
and intrusion detection. Outlier detection is the process of detecting the data objects which are grossly different from or
inconsistent with the remaining set of data. Outliers are traditionally considered as single points; however, there is a key
observation that many abnormal events have both temporal and spatial locality, which might form small clusters that also need
to be deemed as outliers. In other words, not only a single point but also a small cluster can probably be an outlier. In
this paper, we present a new definition of outliers, the cluster-based outlier, which is meaningful and gives importance to
the local data behavior, and show how to detect such outliers with the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978–986,
2007), which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIGMOD International
Conference on Management of Data, ACM Press, pp. 93–104, 2000) to single points.
12.
Srabashi Basu, Ayanendranath Basu, M. C. Jones 《Annals of the Institute of Statistical Mathematics》2006,58(2):341-355
We fit parametric models to survival data in the case of censoring and (outlier) contamination. To do so, we adapt the robust
density power divergence methodology of Basu, Harris, Hjort, and Jones (Biometrika, 85, 549–559, 1998) to the case of censored survival data. Asymptotic properties, simulation performance and application to data
are provided.
13.
We address the statistical problem of detecting change points in the stress‐strength reliability R=P(X<Y) in a sequence of paired variables (X,Y). Without specifying their underlying distributions, we embed this nonparametric problem into a parametric framework and apply the maximum likelihood method via a dynamic programming approach to determine the locations of the change points in R. Under some mild conditions, we show the consistency and asymptotic properties of the procedure to locate the change points. Simulation experiments reveal that, in comparison with existing parametric and nonparametric change‐point detection methods, our proposed method performs well in detecting both single and multiple change points in R in terms of the accuracy of the location estimation and the computation time. Applications to real data demonstrate the usefulness of our proposed methodology for detecting the change points in the stress‐strength reliability R. Supplementary materials are available online.
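One way to picture the parametric embedding mentioned here is to replace each pair (X, Y) by the indicator of the event X < Y, which is a Bernoulli variable with success probability R. The sketch below rests on that assumption and handles only a single change point by exhaustive search over split positions; it is not the authors' dynamic programming procedure, and the function names are hypothetical.

```python
import math


def bernoulli_loglik(successes, n):
    """Maximized Bernoulli log-likelihood for `successes` out of `n` trials."""
    if n == 0:
        return 0.0
    p = successes / n
    loglik = 0.0
    if successes > 0:
        loglik += successes * math.log(p)
    if successes < n:
        loglik += (n - successes) * math.log(1.0 - p)
    return loglik


def single_change_point(pairs):
    """Return (k, r_before, r_after) for the best single split after position k.

    The first k pairs form one segment and the remaining pairs the other;
    r_before and r_after are the empirical estimates of R = P(X < Y) in each.
    """
    indicators = [1 if x < y else 0 for x, y in pairs]
    n = len(indicators)
    if n < 2:
        raise ValueError("need at least two pairs to place a change point")
    total = sum(indicators)
    best_k, best_loglik = 0, -math.inf
    left = 0
    for k in range(1, n):
        left += indicators[k - 1]
        loglik = bernoulli_loglik(left, k) + bernoulli_loglik(total - left, n - k)
        if loglik > best_loglik:
            best_k, best_loglik = k, loglik
    left = sum(indicators[:best_k])
    return best_k, left / best_k, (total - left) / (n - best_k)


# Five pairs with X < Y followed by five pairs with X > Y.
pairs = [(0.0, 1.0)] * 5 + [(1.0, 0.0)] * 5
k, r_before, r_after = single_change_point(pairs)
```

Extending this to several change points is where a dynamic program over segment boundaries would come in, since exhaustive search over all combinations of splits grows quickly with the number of change points.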
14.
In [10] it is claimed that the set of predicate tautologies of all complete BL‐chains and the set of all standard tautologies (i.e., the set of predicate formulas valid in all standard BL‐algebras) coincide. As noticed in [11], this claim is wrong. In this paper we show that a complete BL‐chain B satisfies all standard BL‐tautologies iff for any transfinite sequence $(a_i : i \in I)$ of elements of B, the condition $\bigwedge_{i \in I} a_i^2 = \left(\bigwedge_{i \in I} a_i\right)^2$ holds in B. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
15.
Likelihood ratio tests for detecting a single outlier in multivariate linear models are considered, where an observation is called an outlier if there has been a shift in the mean. The test statistics are the maximum of n nonindependent statistics, where n is the number of observations. The distributions needed to apply the upper and lower Bonferroni inequalities are given.
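For reference, the standard Bonferroni inequalities for the maximum of n dependent statistics can be written as follows. The symbols $T_1, \dots, T_n$ for the individual statistics and $c$ for a critical value are notation assumed here, not taken from the paper.

```latex
\[
\sum_{i=1}^{n} P(T_i > c) \;-\; \sum_{1 \le i < j \le n} P(T_i > c,\; T_j > c)
\;\le\; P\Bigl(\max_{1 \le i \le n} T_i > c\Bigr)
\;\le\; \sum_{i=1}^{n} P(T_i > c).
\]
```

The upper bound requires only the marginal distribution of each $T_i$, while the lower bound also requires the joint distribution of each pair $(T_i, T_j)$.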
16.
Ping-Hung Hsieh 《Journal of computational and graphical statistics》2013,22(2):318-332
The implementation of the Hill estimator, which estimates the heaviness of the tail of a distribution, requires a choice of the number of extreme observations in the tails, r, from a sample of size n, where 2 ≤ r + 1 ≤ n. This article is concerned with a robust procedure for choosing an optimal r. Thus, an estimation procedure, $\delta_s$, based on the idea of spacing statistics, H(r), is developed. The proposed decision rule for choosing r under the squared error loss is found to be a simple function of the sample size. The proposed rule is then illustrated across a wide range of data, including insurance claims, currency exchange rate returns, and city size.
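To make the role of r concrete, here is a minimal sketch of the textbook Hill estimator for a positive sample, with r supplied by hand. It does not implement the article's rule for choosing r, and the function name and error handling are assumptions made for illustration.

```python
import math


def hill_estimator(sample, r):
    """Textbook Hill estimate from the r largest observations.

    With order statistics X(1) >= X(2) >= ... >= X(n), the estimate is the
    mean of log X(i) - log X(r+1) over i = 1..r. Requires 2 <= r + 1 <= n
    and a positive threshold X(r+1).
    """
    n = len(sample)
    if not 2 <= r + 1 <= n:
        raise ValueError("r must satisfy 2 <= r + 1 <= n")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[r]  # X(r+1) with zero-based indexing
    if threshold <= 0:
        raise ValueError("the (r+1)-th largest observation must be positive")
    log_threshold = math.log(threshold)
    return sum(math.log(x) - log_threshold for x in ordered[:r]) / r


# Toy sample chosen so the log-spacings are easy to check by hand.
sample = [math.exp(3), math.exp(2), math.exp(1), 1.0]
estimate = hill_estimator(sample, r=2)
```

In this toy sample the two largest log-values are 3 and 2 and the threshold log-value is 1, so the estimate is (2 + 1) / 2 = 1.5. Different values of r give different estimates on the same data, which is why a principled choice of r matters.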
17.
Pinyuen Chen 《Annals of the Institute of Statistical Mathematics》1987,39(1):325-330
A k-in-a-row procedure is proposed to select the most demanded element in a set of n elements. We show that the least favorable configuration of the proposed procedure, which always selects the element when
the same element has been demanded (or observed) k times in a row, has a simple form similar to those of classical selection procedures. Moreover, numerical evidence is provided
to illustrate the fact that the k-in-a-row procedure is better than the usual inverse sampling procedure and fixed sample size procedure when the distance
between the most demanded element and the other elements is large and when the number of elements is small.
18.
Esko Turunen 《Mathematical Logic Quarterly》2007,53(2):170-175
Generalizations of Boolean elements of a BL‐algebra L are studied. By utilizing the MV‐center MV(L) of L, it is reproved that an element x ∈ L is Boolean iff $x \vee x^{*} = 1$. L is called semi‐Boolean if for all x ∈ L, $x^{*}$ is Boolean. An MV‐algebra L is semi‐Boolean iff L is a Boolean algebra. A BL‐algebra L is semi‐Boolean iff L is an SBL‐algebra. A BL‐algebra L is called hyper‐Archimedean if for all x ∈ L, $x^n$ is Boolean for some finite n ≥ 1. It is proved that hyper‐Archimedean BL‐algebras are MV‐algebras. The study has application in mathematical fuzzy logics whose Lindenbaum algebras are MV‐algebras or BL‐algebras. (© 2007 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
19.
20.
For the several-sample problem, a vector of estimable parameters is considered. For a fixed total sample size, a multistage (sequential) procedure based on generalized U-statistics is developed for choosing a partition of this sample size into individual sample sizes for which the generalized variance of the estimator of the parameter vector is asymptotically minimized.