Similar Articles
1.
A simultaneous confidence band provides useful information on the plausible range of the unknown regression model, and different confidence bands can often be constructed for the same regression model. For a simple regression line, Liu and Hayter [W. Liu, A.J. Hayter, Minimum area confidence set optimality for confidence bands in simple linear regression, J. Amer. Statist. Assoc. 102 (477) (2007) pp. 181–190] proposed the use of the area of the confidence set corresponding to a confidence band as an optimality criterion for comparing confidence bands; the smaller the area of the confidence set, the better the corresponding confidence band. This minimum area confidence set (MACS) criterion can be generalized to a minimum volume confidence set (MVCS) criterion in the study of confidence bands for a multiple linear regression model. In this paper, hyperbolic and constant width confidence bands for a multiple linear regression model over a particular ellipsoidal region of the predictor variables are compared under the MVCS criterion. It is observed that whether one band is better than the other depends on the magnitude of one particular angle that determines the size of the predictor variable region. When the angle, and hence the size of the predictor variable region, is small, the constant width band is better than the hyperbolic band, but only marginally. When the angle, and hence the size of the predictor variable region, is large, the hyperbolic band can be substantially better than the constant width band.
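For the simple-regression special case mentioned above, a hyperbolic band is the classical Working–Hotelling construction. The following is a minimal numpy/scipy sketch of that band over a grid of predictor values; the paper's multiple-regression, ellipsoidal-region comparison and the MVCS computation are not reproduced, and the function name is my own.

```python
import numpy as np
from scipy import stats

def hyperbolic_band(x, y, x_grid, conf=0.95):
    """Two-sided Working-Hotelling-type hyperbolic confidence band for a
    simple regression line, evaluated on x_grid.  This conservative band
    is valid over the whole real line, hence also over any finite region."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)                 # error variance estimate
    XtX_inv = np.linalg.inv(X.T @ X)
    c = np.sqrt(2 * stats.f.ppf(conf, 2, n - 2)) # critical constant sqrt(2 F)
    G = np.column_stack([np.ones(len(x_grid)), x_grid])
    se = np.sqrt(s2 * np.einsum("ij,jk,ik->i", G, XtX_inv, G))
    fit = G @ beta
    return fit - c * se, fit + c * se
```

The hyperbolic shape is visible in the output: the band is narrowest at the mean of the observed predictor values and widens away from it, which is why its comparison with a constant width band depends on the size of the predictor region.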

2.
We consider the problem of constructing simultaneous confidence intervals for the mean values of multiple responses in a linear multivariate normal regression model with the predictor variables restricted to intervals. To solve it, the critical value that determines a simultaneous confidence interval of a given level is computed numerically. Simultaneous confidence intervals for the regression function, the mean value of multiple responses, and an individual observation are modelled numerically and compared.

3.
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three possible roles are considered for each candidate predictor: a variable can be a relevant classification predictor or not, and the irrelevant classification variables can be linearly dependent on a subset of the relevant predictors or independent of them. This variable selection model was inspired by previous work on variable selection in model-based clustering. A BIC-like model selection criterion is proposed and optimized through two embedded forward stepwise variable selection algorithms, one for classification and one for linear regression. The identifiability of the model and the consistency of the variable selection criterion are proved. Numerical experiments on simulated and real data sets illustrate the interest of this variable selection methodology. In particular, it is shown that this well-grounded variable selection model can substantially improve the classification performance of quadratic discriminant analysis in a high-dimensional context.

4.
Simultaneous confidence bands for the integrated hazard function
The construction of simultaneous confidence bands for the integrated hazard function is considered, using the Nelson–Aalen estimator. Four methods of constructing bootstrap-based simultaneous confidence bands are proposed, with the weird and conditional bootstrap methods used for resampling. Simulations compare the actual coverage probability of the bootstrap and asymptotic simultaneous confidence bands, and show that the equal-tailed bootstrap confidence band has coverage probability closest to the nominal one. We also present an application of our confidence bands to data on survival after heart transplantation. This research was partly supported by AGH grant No. 10.420.03.
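The point estimator underlying these bands is standard: the Nelson–Aalen estimate of the cumulative hazard jumps by d_i/n_i at each event time, where n_i is the risk-set size. A minimal numpy sketch (assuming no tied observation times; the bootstrap band construction itself is not reproduced):

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen estimator of the cumulative (integrated) hazard.
    times: observed times; events: 1 = event, 0 = right-censored."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    at_risk = n - np.arange(n)   # risk-set size just before each ordered time
    jumps = events / at_risk     # d_i / n_i at event times, 0 at censorings
    return times, np.cumsum(jumps)
```

A simultaneous band would then be placed around the returned step function, with its width calibrated by resampling.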

5.
Computation of one-sided simultaneous confidence bands is detailed for a simple linear regression under interval restrictions on the predictor variable, using a method due to Uusipaikka (1983, J. Amer. Statist. Assoc., 78, 638–644). The case of a single interval restriction is emphasized. A WWW-based applet for computing the bands is described.

6.
In this paper we introduce an appealing nonparametric method for estimating variance and conditional variance functions in generalized linear models (GLMs) when the design points are fixed or random, respectively. Bias-corrected confidence bands for the (conditional) variance are proposed based on local linear smoothers, and nonparametric techniques are developed for deriving the bias-corrected confidence intervals. The asymptotic distribution of the proposed estimator is established, and the bias-corrected confidence bands are shown to have asymptotically correct coverage. A small simulation is performed in which the unknown regression parameter is estimated by nonparametric quasi-likelihood. The results are also applicable to nonparametric autoregressive time series models with heteroscedastic conditional variance.

7.
Multiblock component methods are applied to data sets for which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. In the following, multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they will provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression in order to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters with their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates the interpretation. In addition, the minimization of a well-defined criterion, by means of a sequential algorithm, ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.

8.
This paper reviews the duality between confidence bands and (convex) set estimators in a simple linear regression. Applications of this duality are explored. These include the nature of polygonal sets and the development of an algorithm that approximates the coverage probability of smooth confidence band functions.

9.
We consider median regression with a LASSO-type penalty for variable selection. With a fixed number of variables in the regression model, a two-stage method is proposed for simultaneous estimation and variable selection in which the degree of penalization is chosen adaptively. A Bayesian-information-criterion-type approach is proposed to obtain a data-driven procedure that is proved to select asymptotically optimal tuning parameters automatically. The resulting estimator is shown to achieve the so-called oracle property. The combination of median regression and the LASSO penalty is computationally easy to implement via standard linear programming, and a random perturbation scheme can be used to obtain a simple estimator of the standard error. Simulation studies are conducted to assess the finite-sample performance of the proposed method, and we illustrate the methodology with a real example.
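The linear-programming formulation mentioned in the abstract is straightforward: absolute residuals and absolute coefficients are each split into positive and negative parts. A sketch using `scipy.optimize.linprog` (the function name and the choice to leave the intercept unpenalized are mine; the paper's adaptive tuning of the penalty is not reproduced):

```python
import numpy as np
from scipy.optimize import linprog

def lasso_median_regression(X, y, lam):
    """L1-penalized median (LAD) regression as a linear program:
    minimize sum|y - b0 - X b| + lam * sum|b_j|, intercept unpenalized.
    Decision variables: b+, b- (p+1 each, first entry = intercept),
    then u+, u- (n each) for the positive/negative residual parts."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    c = np.concatenate([
        np.r_[0.0, np.full(p, lam)],     # cost of b+
        np.r_[0.0, np.full(p, lam)],     # cost of b-
        np.ones(n), np.ones(n),          # cost of u+, u-
    ])
    # equality constraint: Xd(b+ - b-) - (u+ - u-) = y
    A_eq = np.hstack([Xd, -Xd, -np.eye(n), np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, method="highs")
    return res.x[: p + 1] - res.x[p + 1 : 2 * (p + 1)]  # [intercept, slopes]
```

All variables are nonnegative (linprog's default bounds), so at the optimum u+ + u- equals the absolute residual and b+ + b- equals the absolute coefficient.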

10.
With advanced capability in data collection, applications of linear regression analysis now often involve a large number of predictors. Variable selection thus has become an increasingly important issue in building a linear regression model. For a given selection criterion, variable selection is essentially an optimization problem that seeks the optimal solution over 2^m possible linear regression models, where m is the total number of candidate predictors. When m is large, exhaustive search becomes practically impossible. Simple suboptimal procedures such as forward addition, backward elimination, and the backward-forward stepwise procedure are fast but can easily be trapped in a local solution. In this article we propose a relatively simple algorithm for selecting explanatory variables in a linear regression for a given variable selection criterion. Although the algorithm is still suboptimal, it has been shown to perform well in an extensive empirical study. The main idea of the procedure is to partition the candidate predictors into a small number of groups. By working with various combinations of the groups and iterating the search through random regrouping, the search space is substantially reduced, increasing the probability of finding the global optimum. By identifying and collecting "important" variables throughout the iterations, the algorithm finds increasingly better models until convergence. The proposed algorithm performs well in simulation studies with 60 to 300 predictors. As a by-product of the proposed procedure, we are able to study the behavior of variable selection criteria when the number of predictors is large. Such a study has not been possible with traditional search algorithms.

This article has supplementary material online.

11.
As a useful tool in functional data analysis, the functional linear regression model has become increasingly common and has been studied extensively in recent years. In this paper, we consider a sparse functional linear regression model which is generated by a finite number of basis functions in an expansion of the coefficient function. In this model, we do not specify how many or which basis functions enter the model, so it is not a typical parametric model with pre-specified predictor variables. We study a general framework that gives various procedures which successfully identify the basis functions that enter the model and estimate the resulting regression coefficients in one step. We adopt the idea of variable selection in the linear regression setting, where one adds a weighted L1 penalty to the traditional least squares criterion. We show that the procedures in our general framework are consistent in the sense of selecting the model correctly, and that they enjoy the oracle property, meaning that the resulting estimators of the coefficient function have asymptotically the same properties as the oracle estimator which uses knowledge of the underlying model. We investigate and compare several methods within our general framework via a simulation study. We also apply the methods to the Canadian weather data.

12.
This paper considers a general family of Stein rule estimators for the coefficient vector of a linear regression model with nonspherical disturbances, and derives estimators for the Mean Squared Error (MSE) matrix, and risk under quadratic loss for this family of estimators. The confidence ellipsoids for the coefficient vector based on this family of estimators are proposed, and the performance of the confidence ellipsoids under the criterion of coverage probability and expected volumes is investigated. The results of a numerical simulation are presented to illustrate the theoretical findings, which could be applicable in the area of economic growth modeling.

13.
We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors, and the impact of predictor selection on parameter inference, for practical sample sizes. We studied effect estimates obtained directly from penalized methods (Algorithm 1) or by refitting selected predictors with standard regression (Algorithm 2). For linear models, penalized linear regression, elastic net, smoothly clipped absolute deviation (SCAD), least angle regression, and LASSO had low false-negative (FN) predictor selection rates but false-positive (FP) rates above 20% for all sample and effect sizes. Partial least squares regression had few FPs but many FNs. Only relaxo had low FP and FN rates. For logistic models, LASSO and penalized logistic regression had many FPs and few FNs for all sample and effect sizes. SCAD and adaptive logistic regression had low or moderate FP rates but many FNs. 95% confidence interval coverage of predictors with null effects was approximately 100% for Algorithm 1 for all methods, and 95% for Algorithm 2 for large sample and effect sizes. Coverage was low only for penalized partial least squares (linear regression). For outcome-associated predictors, coverage was close to 95% for Algorithm 2 for large sample and effect sizes for all methods except penalized partial least squares and penalized logistic regression. Coverage was sub-nominal for Algorithm 1. In conclusion, many methods performed comparably, and while Algorithm 2 is preferred to Algorithm 1 for estimation, it yields valid inference only for large effect and sample sizes.

14.
The elastic net (supervised enet henceforth) is a popular and computationally efficient approach for performing the simultaneous tasks of selecting variables, decorrelation, and shrinking the coefficient vector in the linear regression setting. Semisupervised regression, currently unrelated to the supervised enet, uses data with missing response values (unlabeled) along with labeled data to train the estimator. In this article, we propose the joint trained elastic net (jt-enet), which elegantly incorporates the benefits of semisupervised regression with the supervised enet. The supervised enet and other approaches like it rely on shrinking the linear estimator in a way that simultaneously performs variable selection and decorrelates the data. Both the variable selection and decorrelation components of the supervised enet inherently rely on the pairwise correlation structure in the feature data. In circumstances in which the number of variables is high, the feature data are relatively easy to obtain, and the response is expensive to generate, it seems reasonable that one would want to be able to use any existing unlabeled observations to more accurately define these correlations. However, the supervised enet is not able to incorporate this information and focuses only on the information within the labeled data. In this article, we propose the jt-enet, which allows the unlabeled data to influence the variable selection, decorrelation, and shrinkage capabilities of the linear estimator. In addition, we investigate the impact of unlabeled data on the risk and bias of the proposed estimator. The jt-enet is demonstrated on two applications with encouraging results. Online supplementary material is available for this article.
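The supervised enet that the jt-enet builds on minimizes a least squares loss plus a mixed L1/L2 penalty, and is usually solved by coordinate descent with a soft-threshold update. A minimal sketch of that baseline (not the proposed jt-enet; it assumes centered data with no intercept, and the parameterization of `lam`/`alpha` is one common convention):

```python
import numpy as np

def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the elastic net objective
    (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)."""
    n, p = X.shape
    b = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n            # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]  # partial residual excluding j
            rho = X[:, j] @ r / n
            # soft-threshold for L1, ridge-style shrink in the denominator
            b[j] = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0) \
                   / (z[j] + lam * (1 - alpha))
    return b
```

The jt-enet's contribution, per the abstract, is to let unlabeled rows of X sharpen the correlation structure that drives these updates; that extension is not shown here.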

15.
This article deals with regression function estimation when the regression function is smooth at all but a finite number of points. An important question is how to produce discontinuous output without knowing the locations of the discontinuity points. Unlike most commonly used smoothers, which tend to blur discontinuities in the data, we need a smoother that can detect them. In this article, linear splines are used to estimate discontinuous regression functions, and a knot-merging procedure is introduced for estimating the regression function near discontinuity points. The basic idea is to use multiple knots for the spline estimates. An automatic procedure involving the least squares method, stepwise knot addition, stepwise basis deletion, knot-merging, and the Bayes information criterion is used to select the final model. The proposed method can produce discontinuous outputs. Numerical examples using both simulated and real data illustrate the performance of the proposed method.
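The effect of merging knots at one location is that the spline basis gains enough freedom to change both slope and level there. A sketch of the resulting least squares fit at a single known break point (the full procedure searches for knots automatically; the function name and basis layout here are my own):

```python
import numpy as np

def fit_spline_with_jump(x, y, knot):
    """Least squares fit of a linear spline that allows both a slope
    change and a level jump at `knot`, i.e. the behavior obtained by
    merging multiple knots at one location."""
    B = np.column_stack([
        np.ones_like(x),                # intercept
        x,                              # global slope
        np.maximum(x - knot, 0.0),      # slope change after the knot
        (x > knot).astype(float),       # level jump at the knot
    ])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef, B @ coef
```

Dropping the last basis column recovers an ordinary continuous linear spline, which is exactly what blurs a genuine discontinuity.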

16.
Empirical likelihood confidence intervals for the mean of the response in a linear model are constructed when both the covariates and the response may be missing. Simulations show that the adjusted empirical likelihood confidence interval has good coverage probability and precision, further extending the study of linear models with missing data.

17.

We consider a weighted local linear estimator based on the inverse selection probability for nonparametric regression with missing covariates at random. The asymptotic distribution of the maximal deviation between the estimator and the true regression function is derived and an asymptotically accurate simultaneous confidence band is constructed. The estimator for the regression function is shown to be oracally efficient in the sense that it is uniformly indistinguishable from that when the selection probabilities are known. Finite sample performance is examined via simulation studies which support our asymptotic theory. The proposed method is demonstrated via an analysis of a data set from the Canada 2010/2011 Youth Student Survey.
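The estimator described above reduces, at a single point, to a weighted local linear fit in which each complete case is weighted by its kernel weight divided by its selection probability. A minimal sketch (Gaussian kernel; the selection probabilities are taken as given, whereas the paper also covers their estimation, and the simultaneous band construction is not reproduced):

```python
import numpy as np

def ipw_local_linear(x, y, observed, pi, x0, h):
    """Inverse-selection-probability-weighted local linear estimate of the
    regression function at x0.  `observed` flags complete cases; `pi` are
    their (assumed known or pre-estimated) selection probabilities."""
    keep = np.asarray(observed, bool)
    x, y, pi = x[keep], y[keep], pi[keep]
    w = np.exp(-0.5 * ((x - x0) / h) ** 2) / pi   # kernel weight / pi_i
    X = np.column_stack([np.ones_like(x), x - x0])
    A = X.T @ (X * w[:, None])                    # weighted normal equations
    beta = np.linalg.solve(A, X.T @ (y * w))
    return beta[0]                                # intercept = m_hat(x0)
```

Evaluating this over a grid of x0 values gives the curve around which the paper's simultaneous confidence band is built.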


18.
When both variables are subject to error in a regression model, the least squares estimators are biased and inconsistent, and a measurement error model is more appropriate for the data. This study focuses on constructing interval estimates for the slope when fitting a straight line in a linear measurement error model with one of the error variances known. Because no exact pivot is available in this case, we use the concept of a generalized pivotal quantity to construct a confidence interval for the slope. We compare the existing confidence intervals in terms of coverage probability and expected length via simulation studies. A real data example is also analyzed.

19.
In this paper, we consider the ultra-high dimensional partially linear model, where the dimensionality p of the linear component is much larger than the sample size n and can be as large as an exponential of n. First, we transform the ultra-high dimensional partially linear model into an ultra-high dimensional linear model based on the profile technique used in semiparametric regression. Second, to perform variable screening for the high-dimensional linear component, we propose a screening method called profile greedy forward regression (PGFR), which combines the greedy algorithm with forward regression (FR). The proposed PGFR method not only accounts for the correlation between the covariates, but also identifies all relevant predictors consistently and possesses the screening consistency property under some regularity conditions. We further propose a BIC criterion to determine whether the selected model contains the true model with probability tending to one. Finally, simulation studies and a real application are conducted to examine the finite-sample performance of the proposed PGFR procedure.

20.
《Optimization》2012,61(12):1467-1490
Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares (LTS) criterion, in which half of the data is treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods are well established in linear regression but have only recently been applied to nonlinear regression. In this work, we examine the problem of fitting artificial neural networks (ANNs) to contaminated data using the LTS criterion. We introduce a penalized LTS criterion which prevents unnecessary removal of valid data. Training the ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
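The trimming idea behind the LTS criterion is easiest to see in the linear case: minimize the sum of the h smallest squared residuals, typically via concentration (C-) steps. A linear sketch only, since the article's contribution concerns ANNs and a penalized variant not shown here:

```python
import numpy as np

def lts_fit(X, y, h, n_csteps=50):
    """Least trimmed squares for a linear model via concentration steps:
    repeatedly refit OLS on the h cases with the smallest squared
    residuals under the current fit.  (A single deterministic start from
    the full OLS fit; production LTS code uses many random starts.)"""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    subset = np.arange(n)                  # start from the full-data fit
    for _ in range(n_csteps):
        beta, *_ = np.linalg.lstsq(Xd[subset], y[subset], rcond=None)
        r2 = (y - Xd @ beta) ** 2
        new = np.argsort(r2)[:h]           # C-step: keep h best-fitting cases
        if set(new) == set(subset):
            break                          # converged to a stable subset
        subset = new
    return beta
```

With h about n/2 the criterion tolerates nearly half the data being contaminated, which is the high-breakdown property the abstract refers to.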
