Similar Articles
20 similar articles found.
1.
Support vector machines (SVMs) have attracted much attention in theoretical and applied statistics. The main topics of recent interest are consistency, learning rates and robustness. We address the open problem of whether SVMs are qualitatively robust. Our results show that SVMs are qualitatively robust for any fixed regularization parameter λ. However, under extremely mild conditions on the SVM, it turns out that SVMs are no longer qualitatively robust for any null sequence λn, i.e., for the classical sequences needed to obtain universal consistency. This lack of qualitative robustness is of a rather theoretical nature because we show that, in any case, SVMs fulfill a finite sample qualitative robustness property. For a fixed regularization parameter, SVMs can be represented by a functional on the set of all probability measures. Qualitative robustness is proven by showing that this functional is continuous with respect to the topology generated by weak convergence of probability measures. Combined with the existence and uniqueness of SVMs, our results show that SVMs are the solutions of a well-posed mathematical problem in Hadamard’s sense.
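For concreteness, a minimal statement of the SVM functional referred to above, in standard regularized-risk notation (not taken verbatim from the paper): for a reproducing kernel Hilbert space H and a convex loss L,

```latex
f_{\lambda,P} \;=\; \operatorname*{arg\,min}_{f \in H}\;
  \mathbb{E}_{(X,Y)\sim P}\bigl[\,L\bigl(Y, f(X)\bigr)\bigr] \;+\; \lambda\,\lVert f \rVert_H^2 .
```

Qualitative robustness then amounts to continuity of the map P ↦ f_{λ,P} with respect to weak convergence of probability measures, which is exactly the continuity argument the abstract describes.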

2.
A minimum volume (MV) set, at level α, is a set having minimum volume among all those sets containing at least α probability mass. MV sets provide a natural notion of the ‘central mass’ of a distribution and, as such, have recently become popular as a tool for the detection of anomalies in multivariate data. Motivated by the fact that anomaly detection problems frequently arise in settings with temporally indexed measurements, we propose here a new method for the estimation of MV sets from dependent data. Our method is based on the concept of complexity-penalized estimation, extending recent work of Scott and Nowak for the case of independent and identically distributed measurements, and has both desirable theoretical properties and a practical implementation. Of particular note is the fact that, for a large class of stochastic processes, choice of an appropriate complexity penalty reduces to the selection of a single tuning parameter, which represents the data dependency of the underlying stochastic process. While in reality the dependence structure is unknown, we offer a data-dependent method for selecting this parameter, based on subsampling principles. Our work is motivated by and illustrated through an application to the detection of anomalous traffic levels in Internet traffic time series.
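As a simple baseline illustrating the MV-set idea (this is a plug-in density level-set sketch, not the paper's complexity-penalized estimator; function names and defaults are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

def mv_set_plugin(X, alpha=0.9):
    """Plug-in MV-set estimate: the density upper level set
    {x : f_hat(x) >= t_alpha}, with t_alpha chosen so that roughly an
    alpha fraction of the sample lies inside."""
    kde = gaussian_kde(X.T)                   # scipy expects (d, n) data
    dens = kde(X.T)                           # estimated density at each sample point
    t_alpha = np.quantile(dens, 1 - alpha)    # threshold capturing ~alpha mass
    inside = dens >= t_alpha
    return kde, t_alpha, inside

# anomalies are the points falling outside the level set
X = np.random.randn(500, 2)
_, _, inside = mv_set_plugin(X, alpha=0.95)
anomalies = X[~inside]
```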

3.
We consider the estimation of the regression operator r in the functional model Y=r(x)+ε, where the explanatory variable x is of functional fixed-design type, the response Y is a real random variable, and the error process ε is a second-order stationary process. We construct a kernel-type estimate of r from functional data curves and correlated errors, and study its performance in terms of mean square convergence and convergence in probability. In particular, we consider the cases of short- and long-range error processes. When the errors are negatively correlated or come from a short-memory process, the asymptotic normality of this estimate is derived. Finally, some simulation studies are conducted for fractional autoregressive integrated moving average and Ornstein-Uhlenbeck error processes.
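A minimal Nadaraya-Watson-type sketch of a kernel estimate with functional covariates (illustrative only; the paper's fixed-design estimator differs in its weighting details, and the distance and kernel choices here are assumptions):

```python
import numpy as np

def functional_kernel_regression(X_curves, Y, x_new, h):
    """Kernel estimate r_hat(x_new) for functional covariates.
    X_curves: (n, T) array of discretized curves; Y: (n,) responses;
    x_new: (T,) new curve; h: bandwidth."""
    # L2-type distance between each sample curve and the new curve
    d = np.sqrt(np.mean((X_curves - x_new) ** 2, axis=1))
    w = np.exp(-0.5 * (d / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * Y) / np.sum(w)
```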

4.
Consider the model Y=m(X)+ε, where m(⋅)=med(Y|⋅) is unknown but smooth. It is often assumed that ε and X are independent; in practice, however, this assumption is violated in many cases. In this paper we propose modeling the dependence between ε and X by means of a copula model, i.e. (ε,X)∼Cθ(Fε(⋅),FX(⋅)), where Cθ is a copula function depending on an unknown parameter θ, and Fε and FX are the marginals of ε and X. Since many parametric copula families contain the independence copula as a special case, the resulting regression model is more flexible than the ‘classical’ regression model. We estimate the parameter θ via a pseudo-likelihood method and prove the asymptotic normality of the estimator, based on delicate empirical process theory. We also study the estimation of the conditional distribution of Y given X. The procedure is illustrated by means of a simulation study, and the method is applied to data on food expenditures in households.
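A sketch of the pseudo-likelihood step for one illustrative family (a Gaussian copula); the paper's method estimates θ jointly with the median regression residuals, which this omits, and all function names here are hypothetical:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def pseudo_loglik_gaussian(theta, u, v):
    """Log pseudo-likelihood of a Gaussian copula with correlation theta,
    evaluated at pseudo-observations (u, v) in (0, 1)."""
    a, b = norm.ppf(u), norm.ppf(v)
    return np.sum(-0.5 * np.log(1 - theta**2)
                  - (theta**2 * (a**2 + b**2) - 2 * theta * a * b)
                  / (2 * (1 - theta**2)))

def fit_copula(eps_hat, x):
    """Rank-transform residuals and covariate, then maximize the pseudo-likelihood."""
    n = len(x)
    u = (np.argsort(np.argsort(eps_hat)) + 1) / (n + 1)   # pseudo-obs of eps
    v = (np.argsort(np.argsort(x)) + 1) / (n + 1)         # pseudo-obs of X
    res = minimize_scalar(lambda t: -pseudo_loglik_gaussian(t, u, v),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x
```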

5.
Model identification and discrimination are two major statistical challenges. In this paper we consider a set of models Mk for factorial experiments with the parameters representing the general mean, main effects, and only k out of all two-factor interactions. We consider the class D of all fractional factorial plans with the same number of runs having the ability to identify all the models in Mk, i.e., full estimation capacity. The fractional factorial plans in D with full estimation capacity for k≥2 are able to discriminate between models in Mu for u≤k*, where k*=k/2 when k is even and k*=(k-1)/2 when k is odd. We obtain fractional factorial plans in D satisfying the six optimality criterion functions AD, AT, AMCR, GD, GT, and GMCR for 2^m factorial experiments when m=4 and 5. Both single-stage and multi-stage (hierarchical) designs are given. Some results on the estimation capacity of a fractional factorial plan for identifying models in Mk are also given. Our designs D4.1 and D10 stand out relative to the designs given in Li and Nachtsheim [Model-robust factorial designs, Technometrics 42(4) (2000) 345-352] for m=4 and 5 with respect to all six criterion functions. Our design D4.2 stands out relative to the Li-Nachtsheim design for m=4 with respect to the four criterion functions AT, AMCR, GT, and GMCR, whereas the Li-Nachtsheim design for m=4 stands out relative to our design D4.2 with respect to AD and GD. Our design D14 has full estimation capacity for k=5, but the twelve-run Li-Nachtsheim design does not.

6.
In this paper, we discuss the construction of confidence intervals for the regression vector β in a linear model under negatively associated errors. It is shown that the blockwise empirical likelihood (EL) ratio statistic for β is asymptotically chi-squared distributed. The result is used to obtain an EL-based confidence region for β.

7.
We analyze a sequence of single-server queueing systems with impatient customers in heavy traffic. Our state process is the offered waiting time, and the customer arrival process has a state-dependent intensity. Service times and customer patience times are independent and i.i.d., with general distributions subject to mild constraints. We establish the heavy traffic approximation for the scaled offered waiting time process and obtain a diffusion process as the heavy traffic limit. The drift coefficient of this limiting diffusion is influenced by the sequence of patience-time distributions in a non-linear fashion. We also establish an asymptotic relationship between the scaled versions of the offered waiting time and the queue length, and as a consequence obtain the heavy traffic limit of the scaled queue length. We introduce an infinite-horizon discounted cost functional whose running cost depends on the offered waiting time and server idle time processes. Under mild assumptions, we show that the expected value of this cost functional for the n-th system converges to that of the limiting diffusion process as n tends to infinity.

8.
We consider block thresholding wavelet-based density estimators with randomly right-censored data and investigate their asymptotic convergence rates. Unlike in the complete data case, the empirical wavelet coefficients are constructed through the Kaplan-Meier estimators of the distribution functions in the censored data case. On the basis of a result of Stute [W. Stute, The central limit theorem under random censorship, Ann. Statist. 23 (1995) 422-439] that approximates the Kaplan-Meier integrals as averages of i.i.d. random variables with a certain rate in probability, we show that these empirical wavelet coefficients can be approximated by averages of i.i.d. random variables with a certain error rate in L2. It follows that these estimators, based on block thresholding of empirical wavelet coefficients, achieve optimal convergence rates over a large range of Besov function classes B^s_{p,q}, p≥2, q≥1, and nearly optimal convergence rates when 1≤p<2. We also show that these estimators achieve optimal convergence rates over a large class of functions that involve many irregularities of a wide variety of types, including chirp and Doppler functions and jump discontinuities. Therefore, in the presence of random censoring, wavelet estimators still provide extensive adaptivity to many irregularities of large function classes. The performance of the estimators is tested via a modest simulation study.
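For the complete-data case, a minimal sketch of blockwise (James-Stein-type) thresholding using PyWavelets; the censored-data version would build the coefficients from Kaplan-Meier integrals, which this omits, and the block length and threshold constant are illustrative:

```python
import numpy as np
import pywt

def block_threshold(signal, wavelet="db4", level=4, block_len=8, lam=1.0):
    """Blockwise shrinkage of empirical wavelet coefficients: within each
    block, coefficients are shrunk together according to the block energy."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # noise scale from finest level
    thresh = lam * sigma**2 * block_len
    new_coeffs = [coeffs[0]]                         # keep approximation coefficients
    for d in coeffs[1:]:
        d = d.copy()
        for start in range(0, len(d), block_len):
            blk = d[start:start + block_len]
            energy = np.sum(blk**2)
            shrink = max(0.0, 1.0 - thresh / energy) if energy > 0 else 0.0
            d[start:start + block_len] = shrink * blk   # kill or keep the whole block
        new_coeffs.append(d)
    return pywt.waverec(new_coeffs, wavelet)
```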

9.
Many applications aim to learn a high-dimensional parameter of a data-generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, bootstrap aggregating (bagging) has been introduced as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying an estimator to multiple bootstrap samples and averaging the results across bootstrap samples. In order to address the curse of dimensionality, a common practice has been to apply bagging to estimators which themselves use cross-validation, thereby using cross-validation within a bootstrap sample to select fine-tuning parameters trading off bias and variance of the bootstrap-sample-specific candidate estimators. In this article we point out that in order to achieve the correct bias-variance trade-off for the parameter of interest, one should instead apply the cross-validation selector externally, to candidate bagged estimators indexed by these fine-tuning parameters. We use three simulations to compare the new cross-validated bagging method with bagging of cross-validated estimators and bagging of non-cross-validated estimators.
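A minimal scikit-learn sketch of the proposed ordering, with cross-validation applied externally to bagged estimators indexed by a tuning parameter (the base learner, grid, and depth parameter are illustrative; a recent scikit-learn is assumed for the `estimator` parameter name):

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# External cross-validation over *bagged* estimators: the grid search
# selects tree depth by CV applied to the whole bagged predictor,
# rather than cross-validating inside each bootstrap sample.
bagged = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=50)
grid = GridSearchCV(bagged,
                    param_grid={"estimator__max_depth": [2, 4, 8, None]},
                    cv=5, scoring="neg_mean_squared_error")
# grid.fit(X, y)  # X, y: training data
```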

10.
Homogeneity tests based on several progressively Type-II censored samples
In this paper, we discuss the problem of testing the homogeneity of several populations when the available data are progressively Type-II censored. Defining a univariate counting process for each sample, we can adapt to this problem all the methods developed over the last two decades (see e.g. [P.K. Andersen, Ø. Borgan, R. Gill, N. Keiding, Statistical Models Based on Counting Processes, Springer, New York, 1993]). An important aspect of these tests is that they are based on either linear or non-linear functionals of a discrepancy process (DP) that compares the cumulative hazard rate (chr) estimated from each sample with the chr estimated from the whole sample (viz., the aggregation of all the samples), leading to either linear or non-linear tests. Both kinds of tests suffer from serious drawbacks; for example, it is difficult to extend non-linear tests to the K-sample situation when K≥3. For this reason, we propose here a new class of non-linear tests, based on a chi-square-type functional of the DP, that can be applied to the K-sample problem for any K≥2.

11.
In this paper, we use an empirical likelihood method to construct confidence regions for stationary ARMA(p,q) models with infinite variance. An empirical log-likelihood ratio is derived from the estimating equation of the self-weighted LAD estimator. It is proved that the proposed statistic has an asymptotic standard chi-squared distribution. Simulation studies show that in small samples the empirical likelihood method outperforms the normal approximation of the LAD estimator in terms of coverage accuracy.

12.
Testing for independence between two categorical variables R and S forming a contingency table is a well-known problem, classically handled by the chi-square and likelihood ratio tests. Suppose now that for each individual a set of p characteristics is also observed. These explanatory variables, likely to be associated with R and S, can play a major role in their possible association, and it can therefore be interesting to test the independence between R and S conditionally on them. In this paper, we propose two nonparametric tests which generalise the chi-square and likelihood ratio ideas to this case. The procedure is based on a kernel estimator of the conditional probabilities. The asymptotic law of the proposed test statistics under the conditional independence hypothesis is derived; the finite-sample behaviour of the procedure is analysed through Monte Carlo experiments, and the approach is illustrated with a real data example.
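An illustrative sketch of the kernel-smoothed ingredient: a chi-square-type statistic for the independence of R and S computed from Nadaraya-Watson weights at a single covariate value z0 (one scalar covariate for simplicity; the paper's actual statistic and its asymptotic calibration differ in detail):

```python
import numpy as np

def local_chi2(r, s, z, z0, h, levels_r, levels_s):
    """Kernel-weighted chi-square statistic for independence of R and S
    'locally' at covariate value z0."""
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)   # Gaussian kernel weights
    w = w / w.sum()
    p_rs = np.array([[np.sum(w[(r == a) & (s == b)]) for b in levels_s]
                     for a in levels_r])     # kernel-weighted joint probabilities
    p_r, p_s = p_rs.sum(axis=1), p_rs.sum(axis=0)
    expected = np.outer(p_r, p_s)            # product of conditional marginals
    mask = expected > 0
    return np.sum((p_rs[mask] - expected[mask]) ** 2 / expected[mask])
```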

13.
Efficiency of a Liu-type estimator in semiparametric regression models
In this paper we consider the semiparametric regression model y=Xβ+f+ε. Recently, Hu [11] proposed a ridge regression estimator in a semiparametric regression model. We introduce a Liu-type (combined ridge-Stein) estimator (LTE) in a semiparametric regression model. First, Liu-type estimators of both β and f are obtained without restricting the design matrix. Second, the LTE of β is compared with the two-step estimator in terms of mean squared error. We then describe the almost unbiased Liu-type estimator in semiparametric regression models and compare it with the Liu-type estimator in terms of the mean squared error matrix. A numerical example is provided to show the performance of the estimators.
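For reference, a sketch of one common form of the Liu estimator in the pure linear model, which the Liu-type family generalizes; the paper's semiparametric LTE additionally handles the nonparametric component f via a two-step scheme not shown here, so treat this as background rather than the paper's estimator:

```python
import numpy as np

def liu_estimator(X, y, d):
    """Classical Liu estimator beta_d = (X'X + I)^{-1} (X'X + d I) beta_OLS,
    which shrinks the OLS estimator toward zero as d decreases from 1."""
    p = X.shape[1]
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), (XtX + d * np.eye(p)) @ beta_ols)
```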

14.
De Haan and Pereira (2006) [6] provided models for spatial extremes in the stationary case, which depend on just one parameter β>0 measuring tail dependence, and they proposed different estimators for this parameter. We supplement this framework by establishing local asymptotic normality (LAN) of a corresponding point process of exceedances above a high multivariate threshold. Standard arguments from LAN theory then provide the asymptotic minimum variance within the class of regular estimators of β. It turns out that the relative frequency of exceedances is a regular estimator sequence with asymptotic minimum variance if the underlying observations follow a multivariate extreme value distribution or a multivariate generalized Pareto distribution.

15.
The so-called independent component (IC) model states that the observed p-vector X is generated via X=ΛZ+μ, where μ is a p-vector, Λ is a full-rank matrix, and the centered random vector Z has independent marginals. We consider the problem of testing the null hypothesis H0:μ=0 on the basis of i.i.d. observations X1,…,Xn generated by the symmetric version of the IC model above (for which all ICs have a distribution symmetric about the origin). In the spirit of [M. Hallin, D. Paindaveine, Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks, Annals of Statistics 30 (2002) 1103-1133], we develop nonparametric (signed-rank) tests, which are valid without any moment assumption and are, for adequately chosen scores, locally and asymptotically optimal (in the Le Cam sense) at given densities. Our tests are measurable with respect to the marginal signed ranks computed in the collection of null residuals Λ̂−1X1,…,Λ̂−1Xn, where Λ̂ is a suitable estimate of Λ. Provided that Λ̂ is affine-equivariant, the proposed tests, unlike the standard marginal signed-rank tests developed in [M.L. Puri, P.K. Sen, Nonparametric Methods in Multivariate Analysis, Wiley & Sons, New York, 1971] or any of their obvious generalizations, are affine-invariant. Local powers and asymptotic relative efficiencies (AREs) with respect to Hotelling’s T2 test are derived. Quite remarkably, when Gaussian scores are used, these AREs are always greater than or equal to one, with equality holding only in the multinormal model. Finite-sample efficiencies and robustness properties are investigated through a Monte Carlo study.

16.
A set of n-principal points of a distribution is defined as a set of n points that optimally represents the distribution in terms of mean squared distance; it provides an optimal n-point approximation of the distribution. However, it is in general difficult to find a set of principal points of a multivariate distribution. Tarpey et al. [T. Tarpey, L. Li, B. Flury, Principal points and self-consistent points of elliptical distributions, Ann. Statist. 23 (1995) 103-112] established a theorem which states that any set of n-principal points of an elliptically symmetric distribution lies in the linear subspace spanned by some principal eigenvectors of the covariance matrix. This theorem, called a “principal subspace theorem”, is a strong tool for the calculation of principal points. In practice, we often come across distributions consisting of several subgroups, so it is of interest to know whether the principal subspace theorem remains valid under such complex distributions. In this paper, we define a multivariate location mixture model and establish a theorem that clarifies a linear subspace in which n-principal points exist.
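Since n-principal points minimize the same mean squared distance criterion that k-means minimizes empirically, they can be approximated by k-means on a large Monte Carlo sample; a minimal sketch (sample sizes and the two-subgroup mixture below are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def principal_points(sample, n_points):
    """Approximate n principal points by the k-means centroids of a
    large sample drawn from the distribution."""
    km = KMeans(n_clusters=n_points, n_init=10).fit(sample)
    return km.cluster_centers_

# e.g. 4 principal points of a bivariate normal location mixture (two subgroups)
rng = np.random.default_rng(0)
sample = np.vstack([rng.normal(-2, 1, (5000, 2)), rng.normal(2, 1, (5000, 2))])
pts = principal_points(sample, 4)
```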

17.
In this article, we propose a new estimation methodology for PCA with high-dimension, low-sample-size (HDLSS) data. We first show that HDLSS datasets have different geometric representations depending on whether a ρ-mixing-type dependency appears among the variables. When such a dependency appears, the HDLSS data converge to an n-dimensional surface of the unit sphere as the dimension increases; we pay special attention to this phenomenon. We propose a noise-reduction methodology to estimate the eigenvalues of a HDLSS dataset, and show that the eigenvalue estimator enjoys consistency properties along with its limiting distribution in the HDLSS context. We consider consistency properties of the PC directions and apply the noise-reduction methodology to estimating PC scores. We also give an application to discriminant analysis for HDLSS datasets using the inverse covariance matrix estimator induced by the noise-reduction methodology.
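A rough sketch of a noise-reduction-type eigenvalue correction computed on the n×n dual covariance matrix; the specific form of the correction (subtracting the average of the remaining "noise" eigenvalues) is assumed from the noise-reduction literature and may differ from the paper's exact estimator:

```python
import numpy as np

def noise_reduced_eigenvalues(X):
    """Eigenvalues of the n x n dual covariance matrix (n << p), each
    corrected by the average of the trailing eigenvalues, which in the
    HDLSS regime are dominated by accumulated noise."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_dual = Xc @ Xc.T / (n - 1)          # n x n dual of the sample covariance
    lam = np.sort(np.linalg.eigvalsh(S_dual))[::-1]
    lam_tilde = lam.copy()
    for j in range(n - 2):                # subtract the mean of the tail eigenvalues
        lam_tilde[j] = lam[j] - lam[j + 1:].sum() / (n - 1 - (j + 1))
    return lam_tilde
```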

18.
Let X={X(s)}s∈S be an almost surely continuous stochastic process (S a compact subset of Rd) in the domain of attraction of some max-stable process, with index function constant over S. We study the tail distribution of ∫SX(s)ds, which turns out to be of generalized Pareto type with an extra ‘spatial’ parameter (the areal coefficient from Coles and Tawn (1996) [3]). Moreover, we discuss how to estimate the tail probability P(∫SX(s)ds>x) for some high value x, based on independent and identically distributed copies of X; along the way we also give an estimator for the areal coefficient. We prove consistency of the proposed estimators. Our methods are applied to the total rainfall in the North Holland area; i.e., X represents the rainfall over the region for which we have observations, and its integral amounts to total rainfall. The paper has two main purposes: first, to formalize and justify the results of Coles and Tawn (1996) [3]; second, to treat the problem in a non-parametric way, as opposed to their fully parametric methods.
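A generic peaks-over-threshold sketch for the tail probability of the integrated process given i.i.d. copies of the integral (the paper's estimator also involves the areal coefficient, which this omits; the threshold quantile is illustrative):

```python
import numpy as np
from scipy.stats import genpareto

def tail_prob(integrals, x, u_quantile=0.95):
    """POT estimate of P(int_S X(s) ds > x) for x above the threshold u:
    fit a GPD to excesses over u and multiply by the empirical
    exceedance rate."""
    u = np.quantile(integrals, u_quantile)
    exc = integrals[integrals > u] - u
    xi, _, scale = genpareto.fit(exc, floc=0.0)   # fix GPD location at 0
    p_exceed_u = np.mean(integrals > u)
    return p_exceed_u * genpareto.sf(x - u, xi, loc=0.0, scale=scale)
```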

19.
In this article we develop a nonparametric methodology for estimating the mean change for matched samples on a Lie group. We then note that for k≥5, a manifold of projective shapes of k-ads in 3D has the structure of a (3k−15)-dimensional Lie group that is equivariantly embedded in a Euclidean space, so testing for mean change amounts to a one-sample test for extrinsic means on this Lie group. The Lie group technique leads to a large sample and a nonparametric bootstrap test for one population extrinsic mean on a projective shape space, as recently developed by Patrangenaru, Liu and Sughatadasa. On the other hand, in the absence of occlusions, the 3D projective shape of a spatial k-ad can be recovered from a stereo pair of images, thus allowing one to test for mean glaucomatous 3D projective shape change from standard stereo pair eye images.

20.
We show that under different moment bounds on the underlying variables, the bootstrap approximation to the large deviation probabilities of the standardized sample sum, based on independent random variables, is valid for a wider zone of n, the sample size, than the classical normal tail probability approximation. As an application, different notions of efficiency for statistical tests are considered from a Bayesian point of view. In particular, the efficiencies due to Pitman (1938) [11] and Chernoff (1952) [1], and the Bayes risk efficiency due to Rubin and Sethuraman (1965) [12], turn out to be special cases corresponding to particular choices of the weight function, i.e., prior density times loss.
