Similar Articles
20 similar articles found.
1.
Many applications aim to learn a high dimensional parameter of a data generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, bootstrap aggregating (bagging) has been introduced as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying an estimator to multiple bootstrap samples and averaging the result across bootstrap samples. In order to address the curse of dimensionality, a common practice has been to apply bagging to estimators which themselves use cross-validation, thereby using cross-validation within a bootstrap sample to select fine-tuning parameters trading off bias and variance of the bootstrap sample-specific candidate estimators. In this article we point out that in order to achieve the correct bias-variance trade-off for the parameter of interest, one should apply the cross-validation selector externally to candidate bagged estimators indexed by these fine-tuning parameters. We use three simulations to compare the new cross-validated bagging method with bagging of cross-validated estimators and bagging of non-cross-validated estimators.
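The recommended ordering — cross-validation applied externally to bagged candidate estimators — can be sketched on a toy problem. This is an illustrative sketch, not the authors' implementation: the candidate estimator (shrinkage of a sample mean indexed by `lam`), the parameter grid, and the fold scheme are all assumptions made for the example.

```python
import random, statistics

random.seed(0)

def estimate(sample, lam):
    # candidate estimator indexed by the fine-tuning parameter lam
    # (here: shrinkage of the sample mean toward zero)
    return lam * statistics.fmean(sample)

def bag(sample, lam, B=50):
    # bagging: average the candidate estimator over B bootstrap samples
    ests = [estimate(random.choices(sample, k=len(sample)), lam) for _ in range(B)]
    return statistics.fmean(ests)

def cv_risk(sample, fit, lams, folds=5):
    # cross-validated risk of each candidate fit(train, lam),
    # scored by squared prediction error on the held-out points
    risks = {}
    for lam in lams:
        sse = 0.0
        for f in range(folds):
            train = [x for i, x in enumerate(sample) if i % folds != f]
            test = [x for i, x in enumerate(sample) if i % folds == f]
            pred = fit(train, lam)
            sse += sum((x - pred) ** 2 for x in test)
        risks[lam] = sse / len(sample)
    return risks

data = [random.gauss(1.0, 1.0) for _ in range(100)]
lams = [0.5, 0.8, 1.0]

# the point of the abstract: select lam by CV over *bagged* candidates,
# rather than bagging estimators that each ran CV internally
risks = cv_risk(data, bag, lams)
best = min(risks, key=risks.get)
theta_hat = bag(data, best)
```

Bagging of cross-validated estimators would instead call `cv_risk` inside each bootstrap replicate; the abstract argues the external ordering above yields the right bias-variance trade-off for the target parameter.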

2.
We develop importance sampling estimators for Monte Carlo pricing of European and path-dependent options in models driven by Lévy processes. Using results from the theory of large deviations for processes with independent increments, we compute an explicit asymptotic approximation for the variance of the pay-off under a time-dependent Esscher-style change of measure. Minimizing this asymptotic variance using convex duality, we then obtain an importance sampling estimator of the option price. We show that our estimator is logarithmically optimal among all importance sampling estimators. Numerical tests in the variance gamma model show consistent variance reduction with a small computational overhead.
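The core mechanism — an exponential (Esscher-style) change of measure chosen to reduce variance for a tail event — can be illustrated in the simplest Gaussian setting rather than a Lévy model. The level `a`, sample size, and mean-shift proposal below are illustrative assumptions, not the paper's estimator.

```python
import random, math

random.seed(1)

def tilted_is(a, n):
    # sample X from the exponentially tilted law N(a, 1) and reweight by
    # the likelihood ratio dP/dQ(x) = exp(-a*x + a*a/2) to estimate P(Z > a)
    total = 0.0
    for _ in range(n):
        x = random.gauss(a, 1)
        if x > a:
            total += math.exp(-a * x + a * a / 2)
    return total / n

a, n = 3.0, 20000
p_true = 0.5 * math.erfc(a / math.sqrt(2))  # exact Gaussian tail probability
p_is = tilted_is(a, n)
# crude Monte Carlo would need on the order of 1/p_true ~ 740 draws per
# expected hit; the tilted estimator hits on about half of its draws
```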

3.
This paper investigates the use of stratified sampling as a variance reduction technique for approximating integrals over large dimensional spaces. The accuracy of this method critically depends on the choice of the space partition, the strata, which should ideally be fitted to the subsets where the function to integrate is nearly constant, and on the allocation of the number of samples within each stratum. When the dimension is large and the function to integrate is complex, finding such partitions and allocating the samples is a highly non-trivial problem. In this work, we investigate a novel method to improve the efficiency of the estimator “on the fly”, by jointly sampling and adapting the strata, which are hyperrectangles, and the allocation within them. The accuracy of the resulting estimators is examined in detail in the so-called asymptotic regime (i.e. when both the number of samples and the number of strata are large). It turns out that the limiting variance depends on the directions defining the hyperrectangles but not on the precise abscissae of their boundaries along these directions, which gives a mathematical justification to the common choice of equiprobable strata. So, only the directions are adaptively modified by our algorithm. We illustrate the use of the method for the computation of the price of path-dependent options in models with both constant and stochastic volatility. The use of this adaptive technique yields variance reduction by factors sometimes larger than 1000 compared to classical Monte Carlo estimators.
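A minimal sketch of stratification with equiprobable strata on [0,1] (fixed strata, not the adaptively rotated hyperrectangles of the paper); the integrand, number of strata, and allocation are illustrative assumptions.

```python
import random, math, statistics

random.seed(2)

f = math.exp                       # integrand on [0, 1]; true integral is e - 1
true_val = math.e - 1
n_strata, per_stratum = 50, 4
n = n_strata * per_stratum

def crude(n):
    # plain Monte Carlo estimate of the integral
    return statistics.fmean(f(random.random()) for _ in range(n))

def stratified(n_strata, per_stratum):
    # equiprobable strata [i/K, (i+1)/K) with equal allocation in each stratum
    total = 0.0
    for i in range(n_strata):
        for _ in range(per_stratum):
            total += f((i + random.random()) / n_strata)
    return total / (n_strata * per_stratum)

# replicate both estimators to compare their empirical variances
crude_reps = [crude(n) for _ in range(200)]
strat_reps = [stratified(n_strata, per_stratum) for _ in range(200)]
```

With a smooth integrand the within-stratum variance shrinks like 1/K², so the stratified estimator's variance falls far below the crude one at equal budget.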

4.

This paper presents reduced-order nonlinear filtering schemes based on a theoretical framework that combines stochastic dimensional reduction and nonlinear filtering. Here, dimensional reduction is achieved for estimating the slow-scale process in a multiscale environment by constructing a filter using stochastic averaging results. The nonlinear filter is approximated numerically using the ensemble Kalman filter and particle filter. The particle filter is further adapted to the complexities of inherently chaotic signals. In particle filters, an ensemble of particles is used to represent the distribution of the state of the hidden signal. The ensemble is updated using observation data to obtain the best representation of the conditional density of the true state variables given observations. Particle methods suffer from the “curse of dimensionality”: the number of particles needed to avoid degeneracy within a sample increases exponentially with the system dimension. Hence, particle filtering in high dimensions can benefit from some form of dimensional reduction. A control is superimposed on the particle dynamics to drive particles to locations most representative of the observations, in other words, to construct a better prior density. The control is determined by solving a classical stochastic optimization problem and implemented in the particle filter using importance sampling techniques.

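A bootstrap particle filter on a toy one-dimensional linear-Gaussian model (without the multiscale structure or the superimposed control that are the paper's contributions); all model constants and particle counts below are illustrative assumptions.

```python
import random, math

random.seed(3)

a, sig_x, sig_y = 0.9, 0.5, 0.5    # toy state-space model, not the paper's
T, N = 50, 500                     # time steps, particles

# simulate a hidden trajectory and noisy observations
x, xs, ys = 0.0, [], []
for _ in range(T):
    x = a * x + random.gauss(0, sig_x)
    xs.append(x)
    ys.append(x + random.gauss(0, sig_y))

def bootstrap_pf(ys):
    parts = [0.0] * N
    means = []
    for y in ys:
        # propagate particles through the state dynamics (the prior)
        parts = [a * p + random.gauss(0, sig_x) for p in parts]
        # weight by the observation likelihood
        w = [math.exp(-0.5 * ((y - p) / sig_y) ** 2) for p in parts]
        s = sum(w)
        w = [wi / s for wi in w]
        means.append(sum(wi * p for wi, p in zip(w, parts)))
        # multinomial resampling fights the weight degeneracy discussed above
        parts = random.choices(parts, weights=w, k=N)
    return means

means = bootstrap_pf(ys)
rmse_filter = math.sqrt(sum((m - t) ** 2 for m, t in zip(means, xs)) / T)
```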

5.
The ordinary least squares estimation is based on minimization of the squared distance of the response variable to its conditional mean given the predictor variable. We extend this method by including in the criterion function the distance of the squared response variable to its second conditional moment. It is shown that this “second-order” least squares estimator is asymptotically more efficient than the ordinary least squares estimator if the third moment of the random error is nonzero, and the two estimators have the same asymptotic covariance matrix if the error distribution is symmetric. Simulation studies show that the variance reduction of the new estimator can be as high as 50% for sample sizes smaller than 100. As a by-product, the joint asymptotic covariance matrix of the ordinary least squares estimators of the regression parameter and of the random error variance is also derived; this matrix is available in the literature only for very special cases, e.g. when the random error has a normal distribution. The results apply to both linear and nonlinear regression models, where the random error distributions are not necessarily known.
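A sketch of the second-order criterion on a toy linear model with skewed (centered exponential) errors, so the third moment is nonzero. The crude grid search and all constants are illustrative assumptions, not the paper's estimator implementation.

```python
import random

random.seed(4)

n, beta_true, var_true = 400, 2.0, 1.0
xs = [random.uniform(0, 2) for _ in range(n)]
# centered exponential errors: variance 1, third moment 2 (nonzero)
ys = [beta_true * x + (random.expovariate(1.0) - 1.0) for x in xs]

def sls_loss(beta, s2):
    # distance of y to its first conditional moment PLUS distance of
    # y^2 to its second conditional moment, as the abstract describes
    loss = 0.0
    for x, y in zip(xs, ys):
        m = beta * x
        loss += (y - m) ** 2 + (y * y - (m * m + s2)) ** 2
    return loss

# crude grid search over (beta, sigma^2), for illustration only
grid_b = [1.5 + 0.025 * i for i in range(41)]   # 1.5 .. 2.5
grid_s = [0.5 + 0.025 * i for i in range(41)]   # 0.5 .. 1.5
beta_hat, s2_hat = min(((b, s) for b in grid_b for s in grid_s),
                       key=lambda p: sls_loss(*p))
```

Both moment conditions hold at the true parameters, so the joint minimizer lands near (2.0, 1.0); the efficiency comparison with OLS is exactly what the abstract's simulations quantify.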

6.
Several techniques for resampling dependent data have already been proposed. In this paper we use missing values techniques to modify the moving blocks jackknife and bootstrap. More specifically, we consider the blocks of deleted observations in the blockwise jackknife as missing data which are recovered by missing values estimates incorporating the observation dependence structure. Thus, we estimate the variance of a statistic as a weighted sample variance of the statistic evaluated in a “complete” series. Consistency of the variance and the distribution estimators of the sample mean are established. Also, we apply the missing values approach to the blockwise bootstrap by including some missing observations among two consecutive blocks and we demonstrate the consistency of the variance and the distribution estimators of the sample mean. Finally, we present the results of an extensive Monte Carlo study to evaluate the performance of these methods for finite sample sizes, showing that our proposal provides variance estimates for several time series statistics with smaller mean squared error than previous procedures.
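For reference, the classical moving blocks bootstrap (without the missing-values modification proposed here), estimating the variance of a sample mean under AR(1) dependence; series length, block length, and the AR coefficient are illustrative assumptions.

```python
import random, statistics

random.seed(5)

# AR(1) series with positive dependence
n, phi = 400, 0.5
x, series = 0.0, []
for _ in range(n):
    x = phi * x + random.gauss(0, 1)
    series.append(x)

def mbb_var_of_mean(series, block_len=20, B=500):
    # resample overlapping blocks, glue them together, recompute the statistic
    n = len(series)
    blocks = [series[i:i + block_len] for i in range(n - block_len + 1)]
    k = n // block_len
    means = []
    for _ in range(B):
        resampled = []
        for blk in random.choices(blocks, k=k):
            resampled.extend(blk)
        means.append(statistics.fmean(resampled))
    return statistics.variance(means)

v_mbb = mbb_var_of_mean(series)
v_iid = statistics.variance(series) / n  # naive formula that ignores dependence
```

With φ = 0.5 the true variance of the mean is roughly three times the i.i.d. formula, which the block bootstrap picks up; the paper's missing-values variant targets the same quantity with smaller mean squared error.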

7.
In additive regression models, the single-index model is commonly used for high dimensional regression analysis. It is more flexible than a fully parametric model and avoids the curse of dimensionality, because the single index reduces a standard covariate vector (x in multiple regression) to a univariate index (β^T x in the single-index model). In this paper, we develop a single-index regression model with a functional error term that serves to check for heteroscedasticity. Since efficient inference in a regression model requires that heteroscedasticity be taken into account when it exists, this paper presents the assumptions for testing variance constancy in single-index models. The test statistic assesses variance homogeneity by combining Levene's test with ANOVA theory for infinitely many factor levels. In simulation studies the test statistic performs appropriately in all situations compared with a well-known method, and it is applied to a real dataset.

8.

This work introduces and compares approaches for estimating rare-event probabilities related to the number of edges in the random geometric graph on a Poisson point process. In the one-dimensional setting, we derive closed-form expressions for a variety of conditional probabilities related to the number of edges in the random geometric graph and develop conditional Monte Carlo algorithms for estimating rare-event probabilities on this basis. We prove rigorously a reduction in variance when compared to the crude Monte Carlo estimators and illustrate the magnitude of the improvements in a simulation study. In higher dimensions, we use conditional Monte Carlo to remove the fluctuations in the estimator coming from the randomness in the Poisson number of nodes. Finally, building on conceptual insights from large-deviations theory, we show that importance sampling using a Gibbsian point process can further substantially reduce the estimation variance.

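The one-dimensional conditional Monte Carlo idea can be sketched by integrating out the point locations analytically, so only the Poisson number of points is simulated; intensity, radius, and replication counts are illustrative assumptions, and the target here is a plain expectation rather than the paper's rare-event probabilities.

```python
import random, math, statistics

random.seed(6)

lam, r, reps = 20.0, 0.05, 4000   # Poisson intensity on [0,1], connection radius

def poisson(lam):
    # Knuth's method, adequate for moderate intensities
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def crude_edges():
    # simulate the whole point pattern and count edges directly
    n = poisson(lam)
    pts = [random.random() for _ in range(n)]
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if abs(pts[i] - pts[j]) <= r)

def cond_edges():
    # conditional MC: for i.i.d. uniforms, P(|U - V| <= r) = 2r - r^2,
    # so E[edges | N] = C(N,2) * (2r - r^2) is available in closed form
    n = poisson(lam)
    return n * (n - 1) / 2 * (2 * r - r * r)

crude_reps = [crude_edges() for _ in range(reps)]
cond_reps = [cond_edges() for _ in range(reps)]
# both target E[edges] = (lam^2 / 2) * (2r - r^2) = 19.5
```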

9.
In additive regression models, the single-index model is commonly used for high dimensional regression analysis. It is more flexible than a fully parametric model and avoids the curse of dimensionality, because the single index reduces a standard covariate vector (x in multiple regression) to a univariate index (β^T x in the single-index model). In this paper, we develop a single-index regression model with a functional error term that serves to check for heteroscedasticity. Since efficient inference in a regression model requires that heteroscedasticity be taken into account when it exists, this paper presents the assumptions for testing variance constancy in single-index models. The test statistic assesses variance homogeneity by combining Levene's test with ANOVA theory for infinitely many factor levels. In simulation studies the test statistic performs appropriately in all situations compared with a well-known method, and it is applied to a real dataset.

10.
This paper considers the problem of estimating the finite-population distribution function and quantiles with the use of auxiliary information at the estimation stage of a survey. We propose families of estimators of the distribution function of the study variate y using knowledge of the distribution function of the auxiliary variate x. In addition to ratio, product and difference type estimators, many other estimators are identified as members of the proposed families. For these families the approximate variances are derived, and the optimum estimator is identified along with its approximate variance. Estimators based on the estimated optimum values of the unknown parameters used to minimize the variance are also given, together with their properties. Further, a family of estimators of the finite-population distribution function under two-phase sampling is given, and its properties are investigated.
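A ratio-type estimator of the distribution function with a known auxiliary distribution function, sketched on a synthetic finite population under simple random sampling; the population model, sample size, and evaluation point are illustrative assumptions, not the paper's families.

```python
import random, statistics

random.seed(7)

# finite population with a correlated auxiliary variate x
N, n, reps = 1000, 50, 1000
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [x + random.gauss(0, 0.5) for x in xs]

t_y = sorted(ys)[N // 2]                  # evaluate F_y near its median
t_x = sorted(xs)[N // 2]
F_x = sum(1 for x in xs if x <= t_x) / N  # known from the sampling frame
F_y = sum(1 for y in ys if y <= t_y) / N  # target

naive, ratio = [], []
for _ in range(reps):
    idx = random.sample(range(N), n)
    py = sum(1 for i in idx if ys[i] <= t_y) / n
    px = sum(1 for i in idx if xs[i] <= t_x) / n
    naive.append(py)
    # ratio estimator: adjust by the known auxiliary distribution function
    ratio.append(py * F_x / px if px > 0 else py)

mse = lambda est: statistics.fmean((e - F_y) ** 2 for e in est)
```

Because the indicators 1{y ≤ t_y} and 1{x ≤ t_x} are strongly correlated here, the ratio adjustment cancels much of the sampling error, which is the mechanism behind the variance gains studied in the paper.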

11.
Bayes estimation of the mean of a variance mixture of multivariate normal distributions is considered under sum of squared errors loss. We find a broad class of priors (also in the variance mixture of normal class) which result in proper and generalized Bayes minimax estimators. This paper extends the results of Strawderman [Minimax estimation of location parameters for certain spherically symmetric distribution, J. Multivariate Anal. 4 (1974) 255-264] in a manner similar to that of Maruyama [Admissible minimax estimators of a mean vector of scale mixtures of multivariate normal distribution, J. Multivariate Anal. 21 (2003) 69-78], but somewhat more in the spirit of Fourdrinier et al. [On the construction of Bayes minimax estimators, Ann. Statist. 26 (1998) 660-671] for the normal case, in the sense that we construct classes of priors giving rise to minimaxity. A feature of this paper is that in certain cases we are able to construct proper Bayes minimax estimators satisfying the properties and bounds in Strawderman (1974). We also give some insight into why Strawderman's results do or do not seem to apply in certain cases. In cases where they do not apply, we give minimax estimators based on Berger's [Minimax estimation of location vectors for a wide class of densities, Ann. Statist. 3 (1975) 1318-1328] results. A main condition for minimaxity is that the mixing distributions of the sampling distribution and the prior distribution satisfy a monotone likelihood ratio property with respect to a scale parameter.

12.
We establish an ordering criterion for the asymptotic variances of two consistent Markov chain Monte Carlo (MCMC) estimators: an importance sampling (IS) estimator, based on an approximate reversible chain and subsequent IS weighting, and a standard MCMC estimator, based on an exact reversible chain. Essentially, we relax the criterion of the Peskun type covariance ordering by considering two different invariant probabilities, and obtain, in place of a strict ordering of asymptotic variances, a bound of the asymptotic variance of IS by that of the direct MCMC. Simple examples show that IS can have arbitrarily better or worse asymptotic variance than Metropolis–Hastings and delayed-acceptance (DA) MCMC. Our ordering implies that IS is guaranteed to be competitive up to a factor depending on the supremum of the (marginal) IS weight. We elaborate upon the criterion in case of unbiased estimators as part of an auxiliary variable framework. We show how the criterion implies asymptotic variance guarantees for IS in terms of pseudo-marginal (PM) and DA corrections, essentially if the ratio of exact and approximate likelihoods is bounded. We also show that convergence of the IS chain can be less affected by unbounded high-variance unbiased estimators than PM and DA chains.
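The IS-versus-MCMC setup can be sketched generically: run a Metropolis–Hastings chain on an approximate target, then correct by importance weighting. Here the exact and approximate targets are Gaussians chosen so the marginal IS weight is bounded (the key quantity in the paper's ordering); all constants and the function names are illustrative assumptions.

```python
import random, math

random.seed(9)

log_pi = lambda x: -0.5 * x * x            # exact target: N(0, 1)
log_pi_apx = lambda x: -0.5 * x * x / 1.5  # approximate target: N(0, 1.5)

def mh_chain(log_target, n, step=1.0):
    # random-walk Metropolis-Hastings
    x, out = 0.0, []
    for _ in range(n):
        y = x + random.uniform(-step, step)
        if math.log(random.random()) < log_target(y) - log_target(x):
            x = y
        out.append(x)
    return out

n = 50000
chain = mh_chain(log_pi_apx, n)            # exact chain for the WRONG target
w = [math.exp(log_pi(x) - log_pi_apx(x)) for x in chain]      # IS correction
est_is = sum(wi * x * x for wi, x in zip(w, chain)) / sum(w)  # E_pi[X^2] = 1
```

In this example w(x) = exp(-x²/6) ≤ 1, i.e. the supremum of the IS weight is finite, which is the kind of boundedness under which the abstract's asymptotic variance guarantee applies.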

13.
Time series data with periodic trends, like daily temperatures or sales of seasonal products, fluctuate between highs and lows throughout the year. Generalized least squares estimators are often computed for such time series data, as these estimators have minimum variance among all linear unbiased estimators. However, the generalized least squares solution can require extremely demanding computation when the data is large. This paper studies an efficient algorithm for generalized least squares estimation in periodic trended regression with autoregressive errors. We develop an algorithm that can substantially simplify generalized least squares computation by manipulating large sets of data into smaller sets. This is accomplished by constructing a structured matrix for dimension reduction. Simulations show that the new computation methods using our algorithm can drastically reduce computing time. Our algorithm can be easily adapted to big data that show periodic trends, as often encountered in economics, environmental studies, and engineering practices.
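For a known AR(1) error coefficient, GLS reduces to OLS after quasi-differencing (a Prais–Winsten-style whitening), which is one standard way such computations are simplified. The periodic regressor, the assumption that φ is known, and all constants below are illustrative; this is not the paper's structured-matrix algorithm.

```python
import random

random.seed(10)

# y_t = b0 + b1 * x_t + AR(1) errors, with a simple periodic regressor x_t
n, phi, b0, b1 = 200, 0.6, 1.0, 2.0
xs = [(t % 12) / 12.0 for t in range(n)]
e, es = 0.0, []
for _ in range(n):
    e = phi * e + random.gauss(0, 1)
    es.append(e)
ys = [b0 + b1 * x + e for x, e in zip(xs, es)]

def gls_ar1(xs, ys, phi):
    # quasi-differencing: for t >= 1, regress y_t - phi*y_{t-1}
    # on (1 - phi, x_t - phi*x_{t-1}); the errors become white noise
    X, Y = [], []
    for t in range(1, len(ys)):
        X.append((1.0 - phi, xs[t] - phi * xs[t - 1]))
        Y.append(ys[t] - phi * ys[t - 1])
    # solve the 2x2 normal equations by hand
    s11 = sum(a * a for a, _ in X)
    s12 = sum(a * b for a, b in X)
    s22 = sum(b * b for _, b in X)
    r1 = sum(a * y for (a, _), y in zip(X, Y))
    r2 = sum(b * y for (_, b), y in zip(X, Y))
    det = s11 * s22 - s12 * s12
    return ((s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det)

b0_hat, b1_hat = gls_ar1(xs, ys, phi)
```

In practice φ is estimated and the transform iterated; the point of the sketch is that whitening turns an O(n²)-looking GLS problem into a plain OLS fit on transformed data.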

14.
Resampling methods are often invoked in risk modelling when the stability of estimators of model parameters has to be assessed. The accuracy of variance estimates is crucial since the operational risk management affects strategies, decisions and policies. However, auxiliary variables and the complexity of the sampling design are seldom taken into proper account in variance estimation. In this paper bootstrap algorithms for finite population sampling are proposed in presence of an auxiliary variable and of complex samples. Results from a simulation study exploring the empirical performance of some bootstrap algorithms are presented.

15.
Monte Carlo variance reduction techniques within the supertrack approach are justified as applied to estimating non-Boltzmann tallies equal to the mean of a random variable defined on the set of all branching trajectories. For this purpose, a probability space is constructed on the set of all branching trajectories, and the unbiasedness of this method is proved by averaging over all trajectories. Variance reduction techniques, such as importance sampling, splitting, and Russian roulette, are discussed. A method is described for extending available codes based on the von Neumann-Ulam scheme in order to cover the supertrack approach.
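Particle splitting with compensating weights, sketched on a toy layered-shield penetration tally (Russian roulette would be the symmetric step for killing low-weight particles); the survival probability, layer count, and splitting factor are illustrative assumptions, unrelated to any particular transport code.

```python
import random

random.seed(8)

LAYERS, P_SURV = 5, 0.3
true_p = P_SURV ** LAYERS          # probability of penetrating all layers

def analog(n):
    # analog Monte Carlo: follow each particle until absorption
    hits = sum(1 for _ in range(n)
               if all(random.random() < P_SURV for _ in range(LAYERS)))
    return hits / n

def with_splitting(n, factor=2):
    # each particle surviving a layer splits into `factor` copies carrying
    # weight / factor, so the weighted tally remains unbiased
    total = 0.0
    for _ in range(n):
        particles = [1.0]          # list of particle weights
        for _ in range(LAYERS):
            nxt = []
            for w in particles:
                if random.random() < P_SURV:
                    nxt.extend([w / factor] * factor)
            particles = nxt
        total += sum(particles)
    return total / n

n = 5000
est_analog = analog(n)
est_split = with_splitting(n)      # same mean, markedly lower variance
```

Averaging the weighted tally over all branching histories reproduces the analog expectation, which is the unbiasedness argument the abstract formalizes on the space of branching trajectories.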

16.
Data from most complex surveys are subject to selection bias and clustering due to the sampling design. Results developed for a random sample from a super-population model may not apply. Ignoring the survey sampling weights may cause biased estimators and erroneous confidence intervals. In this paper, we use the design approach for fitting the proportional hazards (PH) model and prove formally the asymptotic normality of the sample maximum partial likelihood (SMPL) estimators under the PH model for both stochastically independent and clustered failure times. In the first case, we use the central limit theorem for martingales in the joint design-model space, and this enables us to obtain results for a general multistage sampling design under mild and easily verifiable conditions. In the case of clustered failure times, we require asymptotic normality in the sampling design space directly, and this holds for fewer sampling designs than in the first case. We also propose a variance estimator of the SMPL estimator. A key property of this variance estimator is that we do not have to specify the second-stage correlation model.

17.
We propose a criterion for variable selection in discriminant analysis. This criterion makes it possible to arrange the variables in decreasing order of adequacy for discrimination, so that the variable selection problem reduces to estimating a suitable permutation and dimensionality. Estimators for these parameters are then proposed, and the resulting method for selecting variables is shown to be consistent. In a simulation study, we compute proportions of correct classification after variable selection in order to assess the performance of our proposal and to compare it with existing methods.

18.
In this paper, we investigate the estimation of semi-varying coefficient models when the nonlinear covariates are prone to measurement error. With the help of validation sampling, we propose two estimators of the parameter and the coefficient functions by combining dimension reduction and profile likelihood methods, without specifying any error structure equation or assuming an error distribution. We establish the asymptotic normality of the proposed estimators for both the parametric and nonparametric parts and show that the proposed estimators achieve the best convergence rate. Data-driven bandwidth selection methods are also discussed. Simulations are conducted to evaluate the finite-sample properties of the proposed estimation methods.

19.
An exhaustive search, as required by traditional variable selection methods, is impractical in high dimensional statistical modeling. Thus, to conduct variable selection, various forms of penalized estimators with good statistical and computational properties have been proposed during the past two decades. The attractive properties of these shrinkage and selection estimators, however, depend critically on the amount of regularization, which controls model complexity. In this paper, we consider the problem of consistent tuning parameter selection in high dimensional sparse linear regression, where the dimension of the predictor vector is larger than the sample size. First, we propose a family of high dimensional Bayesian Information Criteria (HBIC) and investigate their selection consistency, extending the results on the extended Bayesian Information Criterion (EBIC) of Chen and Chen (2008) to ultra-high dimensional situations. Second, we develop a two-step procedure, SIS+AENET, to conduct variable selection in p>n situations. The consistency of tuning parameter selection is established under fairly mild technical conditions. Simulation studies are presented to confirm the theoretical findings, and an empirical example with internet advertising data illustrates the use of the method.

20.
This paper deals with the estimation, under sampling on two successive occasions, of a finite population quantile. For this sampling design a class of estimators is proposed, of which the ratio and difference estimators are particular cases. Asymptotic variance formulae are derived for the proposed estimators, and the optimal matching fraction is discussed. Comparisons are made with existing estimators in a simulation study using a natural population.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) | 京ICP备09084417号