Similar Articles
1.
Regression models with interaction effects have been widely used in multivariate analysis to improve model flexibility and prediction accuracy. In functional data analysis, however, interaction effects have not been considered for function-on-function linear regression, owing to the challenge of estimating three-dimensional coefficient functions. In this article, we propose function-on-function regression models with interaction and quadratic effects. For a model with specified main and interaction effects, we propose an efficient estimation method that enjoys a minimum prediction error property and has good predictive performance in practice. Moreover, by converting the estimation of the three-dimensional coefficient functions of the interaction effects into separate estimations of two- and one-dimensional functions, our method is computationally efficient. We also propose adaptive penalties to account for the varying magnitudes and roughness levels of the coefficient functions. In practice, the forms of the models are usually unspecified, so we propose a stepwise model selection procedure based on a predictive criterion. The method is implemented in our R package FRegSigComp. Supplemental materials are available online.
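As a rough sketch of the dimension-reduction idea (an illustration with toy bases and simulated curves, not the FRegSigComp implementation): once each predictor curve is reduced to basis scores, the three-dimensional interaction coefficient acts only through products of those scores, and the whole model can be fit by bilinear least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ns, nt, K, L = 80, 50, 40, 4, 4
s = np.linspace(0, 1, ns); t = np.linspace(0, 1, nt); ds = s[1] - s[0]

# simulated smooth functional predictors X_i(s)
X = np.array([np.sin(2 * np.pi * f * s + p)
              for f, p in zip(rng.uniform(0.5, 2, n), rng.uniform(0, np.pi, n))])

phi = np.vander(s, K, increasing=True)    # polynomial basis on the predictor side
psi = np.vander(t, L, increasing=True)    # polynomial basis on the response side

Z = X @ phi * ds                          # main-effect scores: int X_i(s) phi_k(s) ds
# the 3-D interaction coefficient gamma(s, u, t) enters only via Z_ik * Z_im,
# so its estimation collapses to products of one-dimensional scores
W = np.einsum('ik,im->ikm', Z, Z).reshape(n, K * K)

# toy ground truth and noisy functional responses Y_i(t)
B0, C0 = rng.standard_normal((K, L)), 0.5 * rng.standard_normal((K * K, L))
Y = Z @ B0 @ psi.T + W @ C0 @ psi.T + 0.1 * rng.standard_normal((n, nt))

# bilinear least squares: Theta = argmin || Y - M Theta psi^T ||_F^2
M = np.hstack([Z, W])
Theta = np.linalg.lstsq(M, Y @ psi @ np.linalg.inv(psi.T @ psi), rcond=None)[0]
beta_hat = phi @ Theta[:K] @ psi.T        # recovered main-effect surface beta(s, t)
print("in-sample RMSE:", np.sqrt(np.mean((Y - M @ Theta @ psi.T) ** 2)))
```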

2.
We consider the problem of nonparametric estimation of unknown smooth functions in the presence of restrictions on the shape of the estimator and on its support using polynomial splines. We provide a general computational framework that treats these estimation problems in a unified manner, without the limitations of the existing methods. Applications of our approach include computing optimal spline estimators for regression, density estimation, and arrival rate estimation problems in the presence of various shape constraints. Our approach can also handle multiple simultaneous shape constraints. The approach is based on a characterization of nonnegative polynomials that leads to semidefinite programming (SDP) and second-order cone programming (SOCP) formulations of the problems. These formulations extend and generalize a number of previous approaches in the literature, including those with piecewise linear and B-spline estimators. We also consider a simpler approach in which nonnegative splines are approximated by splines whose pieces are polynomials with nonnegative coefficients in a nonnegative basis. A condition is presented to test whether a given nonnegative basis gives rise to a spline cone that is dense in the space of nonnegative continuous functions. The optimization models formulated in the article are solvable with minimal running time using off-the-shelf software. We provide numerical illustrations for density estimation and regression problems. These examples show that the proposed approach requires minimal computational time, and that the estimators obtained using our approach often match and frequently outperform kernel methods and spline smoothing without shape constraints. Supplementary materials for this article are provided online.
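The "simpler approach" mentioned above fits in a few lines: restrict the spline coefficients to be nonnegative in a nonnegative (B-spline) basis, which is sufficient (though not necessary) for a nonnegative fit. A sketch with simulated data, assuming SciPy >= 1.8 for BSpline.design_matrix:

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import nnls

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.exp(-3 * x) + 0.05 * rng.standard_normal(x.size)   # nonnegative target

k = 3                                                     # cubic spline
interior = np.linspace(0, 1, 9)[1:-1]
t = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]     # clamped knot vector
B = BSpline.design_matrix(x, t, k).toarray()              # basis evaluated at x

# nonnegative coefficients => nonnegative spline, since every B-spline is >= 0
coef, _ = nnls(B, y)
spline = BSpline(t, coef, k)
print("minimum of the fitted spline:", spline(np.linspace(0, 1, 500)).min())
```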

3.
On the Efficiency of Parameter Estimation in Regression Models
This paper discusses the efficiency of parameter estimation in regression models. We show that the existing lower bound on the parameter-estimation efficiency of linear regression models often differs substantially from the true efficiency, and that this bound is difficult to evaluate precisely for measured data. We present a simulation method for estimating the parameter-estimation efficiency. Theoretical analysis shows that the efficiency estimate produced by this method is more reasonable than the existing lower-bound estimate; simulations and computations on real data show that, for a large class of linear and nonlinear regression models, the method yields efficiency estimates closer to the true values.
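A toy Monte Carlo sketch of the simulation idea — estimating an estimator's efficiency from replicated fits instead of an analytic lower bound. Here we estimate the relative efficiency of the sample median versus the sample mean under Gaussian errors, where the known asymptotic value 2/pi serves only as a check (a stand-in illustration, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5000
est_mean, est_med = np.empty(reps), np.empty(reps)
for r in range(reps):
    sample = rng.standard_normal(n)       # true location parameter is 0
    est_mean[r] = sample.mean()
    est_med[r] = np.median(sample)

# simulated efficiency of the median relative to the (efficient) mean
eff = est_mean.var() / est_med.var()
print(f"simulated efficiency: {eff:.3f}   (asymptotic value 2/pi = {2 / np.pi:.3f})")
```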

4.
In this paper we analyze the impact of initial conditions in exponential smoothing models on forecast errors and prediction intervals. We work with two exponential smoothing models, namely Holt's additive linear trend and Gardner's damped trend. We study some probability properties of these models, showing the influence of the initial conditions on the forecast, which highlights the importance of obtaining accurate estimates of the initial conditions. Using the linear heteroscedastic modeling approach, we show how to obtain joint estimates of the initial conditions and smoothing parameters through maximum likelihood via box-constrained nonlinear optimization. Pointwise forecasts of future values and prediction intervals are computed under normality assumptions on the stochastic component. We also propose an alternative formulation of prediction intervals that achieves empirical coverage closer to the nominal values; this formulation adds an extra term to the standard formulas for estimating the error variance. We illustrate the proposed approach using the yearly time series from the M3-Competition.
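A minimal sketch of the joint estimation step for Holt's additive linear method: under Gaussian errors with constant variance, maximizing the likelihood amounts to minimizing the one-step-ahead squared errors over both the smoothing parameters and the initial conditions, with box constraints on the former. The series and starting values below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def holt_sse(params, y):
    """One-step-ahead sum of squared errors for Holt's additive linear method."""
    alpha, beta, l, b = params
    sse = 0.0
    for yt in y:
        sse += (yt - (l + b)) ** 2                 # one-step-ahead forecast error
        l_new = alpha * yt + (1 - alpha) * (l + b)
        b = beta * (l_new - l) + (1 - beta) * b
        l = l_new
    return sse

rng = np.random.default_rng(3)
y = 10 + 0.5 * np.arange(60) + rng.standard_normal(60)

x0 = [0.3, 0.1, y[0], y[1] - y[0]]                 # heuristic starting point
res = minimize(holt_sse, x0, args=(y,), method="L-BFGS-B",
               bounds=[(1e-4, 1), (1e-4, 1), (None, None), (None, None)])
alpha, beta, l0, b0 = res.x
sigma2 = res.fun / len(y)                          # innovation-variance estimate
print(f"alpha={alpha:.3f} beta={beta:.3f} l0={l0:.2f} b0={b0:.2f} var={sigma2:.2f}")
```

Prediction intervals then follow from the normality assumption, with the article's extra variance term added to the standard formulas.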

5.
We consider a class of distribution-free regression models, defined only in terms of moments, which can be used to model reported-but-not-settled (RBNS) and incurred-but-not-reported (IBNR) reserves separately. These regression models can be estimated using standard least squares and method-of-moments techniques, similar to those used in the distribution-free chain-ladder model. Further, these regression models are closely related to double chain-ladder type models, and the suggested estimation techniques could serve as alternative estimation procedures for those models. Due to the simple structure of the models, it is possible to obtain Mack-type mean squared error of prediction estimators. Moreover, the analysed regression models can be used on different levels of detailed data, and using the least squares estimation techniques it is possible to show that the precision of the reserve predictor improves with more detailed data. These regression models can be seen as a sequence of linear models and are therefore also easy to bootstrap non-parametrically.

6.
Second-order cone programs are a class of convex optimization problems. We refer to them as deterministic second-order cone programs (DSOCPs) since the data defining them are deterministic. In DSOCPs we minimize a linear objective function over the intersection of an affine set and a product of second-order (Lorentz) cones. Stochastic programs have been studied since the 1950s as a tool for handling uncertainty in the data defining classes of optimization problems such as linear and quadratic programs. Stochastic second-order cone programs (SSOCPs) with recourse are a class of optimization problems defined to handle uncertainty in the data defining DSOCPs. In this paper we describe four application models leading to SSOCPs.
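For concreteness, a minimal deterministic SOCP in exactly this form — a linear objective over the intersection of an affine set and a Lorentz cone — sketched with the cvxpy modeling package (an assumed dependency, not used by the paper):

```python
import cvxpy as cp
import numpy as np

# minimize a linear objective over {x : x1 + x2 = 1} intersected with the
# Lorentz cone {x : ||(x1, x2)|| <= x3}
x = cp.Variable(3)
prob = cp.Problem(cp.Minimize(x[2]),
                  [x[0] + x[1] == 1.0,       # affine set
                   cp.SOC(x[2], x[:2])])     # second-order (Lorentz) cone
prob.solve()
print(prob.status, round(prob.value, 4), np.round(x.value, 4))
# optimum: x = (0.5, 0.5, sqrt(0.5)); a stochastic version with recourse would
# make the data defining the constraints random and add second-stage variables
```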

7.
The theoretical relationship between the prediction variance of a Gaussian process model (GPM) and its mean square prediction error is well known. This relationship has been studied for the case when deterministic simulations are used in the GPM, with applications to the design of computer experiments and metamodeling optimization. This article analyzes the error estimation of Gaussian process models when the simulated data observations contain measurement noise. In particular, this work focuses on the correlation between the GPM prediction variance and the distribution of prediction errors over multiple experimental designs, as a function of location in the input space. The results show that the error estimation properties of a Gaussian process model using stochastic simulations are preserved when the signal-to-noise ratio in the data is larger than 10, regardless of the number of training points used in the metamodel. This article also concludes that the distribution of prediction errors approaches a normal distribution with variance equal to the GPM prediction variance, even in the presence of significant bias in the GPM predictions.
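A small sketch of the setup, assuming scikit-learn: fit a GP to noisy simulation output with a signal-to-noise ratio of about 10 and compare the reported prediction standard deviation with the actual errors (a crude calibration check, not the article's multi-design experiment):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
f = lambda x: np.sin(3 * x).ravel()              # latent simulator response
X = rng.uniform(0, 3, 40).reshape(-1, 1)
noise_sd = np.sqrt(f(X).var() / 10)              # signal-to-noise ratio ~ 10
y = f(X) + noise_sd * rng.standard_normal(len(X))

gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(noise_sd ** 2),
                              normalize_y=True).fit(X, y)
Xt = np.linspace(0, 3, 200).reshape(-1, 1)
mu, sd = gp.predict(Xt, return_std=True)

z = (f(Xt) - mu) / sd      # roughly standard normal if the variance is calibrated
print("mean |z|:", np.abs(z).mean())
```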

8.
In this study, composite earnings per share models are estimated for 35 chemical, food, and utility firms over the 1981–1982 period. Although it is generally held that financial analysts produce superior earnings forecasts compared with time series model forecasts, the results of this study indicate that analysts fared very poorly in 1982, and the average mean square forecasting error of analyst forecasts may be reduced by 74.2 percent by combining analyst and univariate time series model forecasts. This reduction is particularly interesting given that the univariate time series model forecasts do not deviate substantially from those produced by random walk with drift models, the ARIMA(0, 1, 1) process. Moreover, despite the high degree of correlation among analyst and time series forecasts, the ordinary least squares estimate of the composite earnings model is a better forecasting model than the composite earnings models estimated with ridge regression and latent root regression techniques.
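The composite model is simply a regression of actuals on the competing forecasts; a sketch with simulated stand-in data (the study's firm-level data are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 70
actual = 2 + rng.standard_normal(n)                     # EPS outcomes (simulated)
analyst = actual + 0.8 * rng.standard_normal(n)         # analyst forecasts
timeseries = actual + 0.6 * rng.standard_normal(n)      # e.g. ARIMA(0,1,1) forecasts

# composite earnings model: OLS of actuals on both candidate forecasts
D = np.column_stack([np.ones(n), analyst, timeseries])
w, *_ = np.linalg.lstsq(D, actual, rcond=None)
combined = D @ w

# note: in-sample, the OLS combination cannot do worse than either input
mse = lambda F: np.mean((actual - F) ** 2)
print({"analyst": mse(analyst), "time series": mse(timeseries),
       "combined": mse(combined)})
```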

9.
When modeling the optimal product mix under emission restrictions produces a solution with an unacceptable level of profit, the analyst is moved to investigate the cause(s). Interior analysis (IA) is proposed for this purpose. With IA, the analyst can investigate the impact of accommodating emission controls in a step-by-step, one-at-a-time manner and, in doing so, track how profit and other important features of the product mix degrade and to which emission-control enforcements the diminution may be attributed. In this way, the analyst can assist the manager in identifying implementation strategies. Although IA is presented within the context of a linear programming formulation of the green product mix problem, its methodology may be applied to other modeling frameworks. Quantity-dependent penalty rates and transformations of emissions to forms with or without economic value are included in the modeling and in the illustrations of IA.
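A hedged sketch of the step-by-step idea with a linear program, using scipy.optimize.linprog and hypothetical profit/emission data: controls are appended one at a time, and the profit trajectory shows which enforcement drives the degradation.

```python
import numpy as np
from scipy.optimize import linprog

profit = np.array([40.0, 30.0, 50.0])            # per-unit profit, three products
A_ub, b_ub = [[2.0, 1.0, 3.0]], [100.0]          # shared capacity constraint

emission_controls = [([1.5, 0.8, 2.0], 60.0),    # hypothetical emission caps,
                     ([0.4, 1.2, 0.6], 25.0),    # introduced one at a time
                     ([0.9, 0.5, 1.1], 35.0)]

res = linprog(-profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(f"profit before emission controls: {-res.fun:.1f}")

for i, (row, cap) in enumerate(emission_controls, 1):
    A_ub.append(row); b_ub.append(cap)           # accommodate the next control
    res = linprog(-profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
    print(f"after control {i}: profit {-res.fun:.1f}, mix {np.round(res.x, 2)}")
```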

10.
This paper studies how to identify influential observations in the functional linear model in which the predictor is functional and the response is scalar. We measure the effects of a single observation on estimation and prediction when the model is estimated by the principal components method. To that end, three statistics are introduced for measuring the influence of each observation on estimation and prediction in the functional linear model with scalar response; they generalize the measures proposed for the standard regression model by [D.R. Cook, Detection of influential observations in linear regression, Technometrics 19 (1977) 15-18] and [D. Peña, A new statistic for influence in linear regression, Technometrics 47 (2005) 1-12], respectively. A smoothed bootstrap method is proposed to estimate the quantiles of the influence measures, which allows us to point out which observations have the largest influence on estimation and prediction. The behavior of the three statistics and the bootstrap-based quantile estimation method is analyzed via a simulation study. Finally, the practical use of the proposed statistics is illustrated by the analysis of a real data example, which shows that the proposed measures are useful for detecting heterogeneity in the functional linear model with scalar response.
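A sketch of the first step under simplifying assumptions: project the curves on a few principal components and apply the classical Cook statistic to the resulting score regression (the article's functional generalizations and the smoothed-bootstrap quantiles are beyond this snippet):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 100, 60                                    # n curves on an m-point grid
Xf = rng.standard_normal((n, m)).cumsum(axis=1)   # rough functional predictors
beta = np.sin(np.linspace(0, np.pi, m))
y = Xf @ beta / m + 0.1 * rng.standard_normal(n)
y[0] += 2.0                                       # plant one influential point

Xc = Xf - Xf.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = np.column_stack([np.ones(n), Xc @ Vt[:4].T])  # intercept + 4 PC scores

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)             # hat matrix of the score model
e, h, p = y - H @ y, np.diag(H), Z.shape[1]
s2 = e @ e / (n - p)
cook = e ** 2 * h / (p * s2 * (1 - h) ** 2)       # Cook's distance on the scores
print("most influential observations:", np.argsort(cook)[::-1][:3])
```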

11.
Partially linear regression models with fixed effects are useful tools for econometric analysis and for normalizing microarray data. Baltagi and Li (2002) [7] proposed a computationally friendly difference-based series estimation (DSE) for them. We show that the DSE is not asymptotically efficient in most cases and propose a weighted difference-based series estimation (WDSE). The weights in it do not involve any unknown parameters. The asymptotic properties of the resulting estimators are established for both balanced and unbalanced cases, and it is shown that they achieve the semiparametric efficiency bound. Additionally, we propose a variable selection procedure for identifying significant covariates in the parametric part of the semiparametric fixed-effects regression model. The method is based on a combination of the nonconcave penalization (Fan and Li, 2001 [13]) and weighted difference-based series estimation techniques. The resulting estimators have the oracle property; that is, they can correctly identify the true model as if the true model (the subset of variables with nonvanishing coefficients) were known in advance. Simulation studies are conducted and an application is given to demonstrate the finite-sample performance of the proposed procedures.

12.
An essential feature of longitudinal data is the autocorrelation among observations from the same unit or subject. Two-stage random-effects linear models are commonly used to analyze longitudinal data. These models are not flexible enough, however, for exploring the underlying data structures and, especially, for describing time trends. Semi-parametric models have been proposed to accommodate general time trends, but they do not provide a convenient way to explore interactions between time and other covariates, although such interactions exist in many applications. Moreover, semi-parametric models require specifying the design matrix of the covariates (time excluded). We propose nonparametric models to resolve these issues. To fit the nonparametric models, we use multivariate adaptive regression splines to estimate the mean curve and then apply an EM-like iterative procedure for covariance estimation. After giving a general model-building algorithm, we show how to design a fast algorithm. We use both simulated and published data to illustrate the use of the proposed method.

13.
Finite mixture distributions arise in sampling from a heterogeneous population. Data drawn from such a population exhibit extra variability relative to any single subpopulation. Statistical models based on finite mixtures can assist in the analysis of categorical and count outcomes when standard generalized linear models (GLMs) cannot adequately express the variability observed in the data. We propose an extension of GLMs in which the response follows a finite mixture distribution and the regression of interest is linked to the mixture's mean. This approach may be preferred over a finite mixture of regressions when the population mean is of interest; here, only one regression must be specified and interpreted in the analysis. A technical challenge is that the mixture's mean is a composite parameter that does not appear explicitly in the density. The proposed model maintains its link to the regression through a certain random-effects structure and is completely likelihood-based. We consider typical GLM cases where means are real-valued, constrained to be positive, or constrained to be on the unit interval. The resulting model is applied to two example datasets through Bayesian analysis. Accounting for the extra variation is seen to improve residual plots and to produce wider prediction intervals that reflect the uncertainty. Supplementary materials for this article are available online.
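As background for the extra-variability point only, here is a plain two-component Poisson mixture fit by EM on simulated counts (this is not the article's mean-linked regression model, and the Bayesian machinery is omitted):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(13)
y = np.r_[rng.poisson(2.0, 300), rng.poisson(9.0, 200)]  # heterogeneous counts
print("overdispersion: var/mean =", round(y.var() / y.mean(), 2))  # >> 1

pi, lam = 0.5, np.array([1.0, 5.0])              # crude starting values
for _ in range(200):                             # EM iterations
    r = np.vstack([pi * poisson.pmf(y, lam[0]),
                   (1 - pi) * poisson.pmf(y, lam[1])])
    r /= r.sum(axis=0)                           # E-step: posterior memberships
    pi = r[0].mean()                             # M-step: weight and rates
    lam = r @ y / r.sum(axis=1)

# the mixture mean is the composite parameter a mean-linked GLM would model
print(f"pi={pi:.3f}, rates={np.round(lam, 2)}, "
      f"mixture mean={pi * lam[0] + (1 - pi) * lam[1]:.2f}, "
      f"sample mean={y.mean():.2f}")
```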

14.
Testing for nonindependence among the residuals from a regression or time series model is a common approach to evaluating the adequacy of a fitted model. This idea underlies the familiar Durbin–Watson statistic, and previous works illustrate how the spatial autocorrelation among residuals can be used to test a candidate linear model. We propose here that a version of Moran's I statistic for spatial autocorrelation, applied to residuals from a fitted model, is a practical general tool for selecting model complexity under the assumption of iid additive errors. The "space" is defined by the independent variables, and the presence of significant spatial autocorrelation in residuals is evidence that a more complex model is needed to capture all of the structure in the data. An advantage of this approach is its generality, which results from the fact that no properties of the fitted model are used other than consistency. The problem of smoothing parameter selection in nonparametric regression is used to illustrate the performance of model selection based on residual spatial autocorrelation (RSA). In simulation trials comparing RSA with established selection criteria based on minimizing mean square prediction error, smooths selected by RSA exhibit fewer spurious features such as minima and maxima. In some cases, at higher noise levels, RSA smooths achieved a lower average mean square error than smooths selected by GCV. We also briefly describe a possible modification of the method for non-iid errors having short-range correlations, for example, time-series errors or spatial data. Some other potential applications are suggested, including variable selection in regression models.
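A compact sketch of the RSA idea, with "space" defined by the single covariate and a k-nearest-neighbour weight matrix (toy data; the article's smoothing-parameter study is not reproduced):

```python
import numpy as np

def morans_i(e, W):
    """Moran's I of the values e under spatial weight matrix W."""
    z = e - e.mean()
    return len(e) / W.sum() * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(7)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(4 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

# weights: k nearest neighbours in the space of the independent variable
k, n = 5, x.size
d = np.abs(x[:, None] - x[None, :])
W = np.zeros((n, n))
np.put_along_axis(W, np.argsort(d, axis=1)[:, 1:k + 1], 1.0, axis=1)

for deg in (1, 3, 9, 15):                        # candidate model complexities
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    print(f"degree {deg:2d}: I = {morans_i(resid, W):+.3f}")
# a large positive I flags leftover structure: the model is still too simple
```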

15.
Surveys show that the mean absolute percentage error (MAPE) is the most widely used measure of prediction accuracy in businesses and organizations. It is, however, biased: when used to select among competing prediction methods it systematically selects those whose predictions are too low. This has not been widely discussed and so is not generally known among practitioners. We explain why this happens. We investigate an alternative relative accuracy measure which avoids this bias: the log of the accuracy ratio, that is, log(prediction/actual). Relative accuracy is particularly relevant if the scatter in the data grows as the value of the variable grows (heteroscedasticity). We demonstrate using simulations that for heteroscedastic data (modelled by a multiplicative error factor) the proposed metric is far superior to MAPE for model selection. Another use for accuracy measures is in fitting parameters to prediction models. Minimum MAPE models do not predict a simple statistic and so theoretical analysis is limited. We prove that when the proposed metric is used instead, the resulting least squares regression model predicts the geometric mean. This important property allows its theoretical properties to be understood.
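The bias is easy to exhibit numerically. The sketch below scores an unbiased forecast against one that is systematically 25% too low under multiplicative (heteroscedastic) errors: MAPE rewards the low forecast, while the squared log accuracy ratio does not (toy simulation, not the paper's study):

```python
import numpy as np

rng = np.random.default_rng(8)
actual = rng.uniform(10, 1000, 100_000)
noise = np.exp(0.4 * rng.standard_normal(actual.size))   # multiplicative errors

unbiased = actual * noise            # centred in log space
too_low = 0.75 * actual * noise      # systematically 25% under

mape = lambda p: np.mean(np.abs(p - actual) / actual)
log_acc = lambda p: np.mean(np.log(p / actual) ** 2)     # squared log accuracy ratio

for name, p in [("unbiased", unbiased), ("25% low", too_low)]:
    print(f"{name:9s}  MAPE={mape(p):.3f}   mean log^2(pred/actual)={log_acc(p):.4f}")
# MAPE comes out smaller for the too-low forecasts; the log measure correctly
# prefers the unbiased ones
```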

16.
This article deals with regression function estimation when the regression function is smooth at all but a finite number of points. An important question is: how can one produce discontinuous output without knowing the locations of the discontinuity points? Most commonly used smoothers tend to blur discontinuities in the data; we instead need a smoother that can detect them. In this article, linear splines are used to estimate discontinuous regression functions, and a knot-merging procedure is introduced for estimating the regression function near discontinuity points. The basic idea is to use multiple knots for the spline estimates. We use an automatic procedure involving the least squares method, stepwise knot addition, stepwise basis deletion, knot-merging, and the Bayes information criterion to select the final model. The proposed method can produce discontinuous outputs. Numerical examples using both simulated and real data illustrate the performance of the proposed method.
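A minimal sketch of the multiple-knot idea with SciPy B-splines (assuming SciPy >= 1.8 for design_matrix): raising the multiplicity of a merged knot to k + 1 lets a degree-k spline jump there, which is exactly what a discontinuity needs. The stepwise addition/deletion and BIC selection of the full procedure are omitted:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * x) + 1.5 * (x >= 0.5) + 0.1 * rng.standard_normal(x.size)  # jump at 0.5

k = 1                                                   # linear splines
interior = np.linspace(0, 1, 11)[1:-1]                  # includes a knot at 0.5
t_single = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]
t_merged = np.sort(np.r_[t_single, 0.5])                # multiplicity k+1 at 0.5

for name, t in [("single knots     ", t_single), ("double knot @ 0.5", t_merged)]:
    B = BSpline.design_matrix(x, t, k).toarray()
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    print(f"{name}: RSS = {np.sum((y - B @ coef) ** 2):.2f}")
# the doubled knot lets the least squares spline reproduce the jump, so its
# residual sum of squares collapses relative to the single-knot fit
```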

17.
Due to the small sample sizes of data available in medical research and the levels of uncertainty and ambiguity associated with medical data, some researchers have employed fuzzy regression models to find the relationship between outcomes and explanatory variables in medical decision-making. The advantage of such regression models is their ability to handle small sample sizes, while fuzzy logic can model vagueness, making fuzzy regression a popular model among researchers. In addition, the high levels of uncertainty in medical data encourage the use of type-2 fuzzy logic, which is capable of handling such uncertainty. The current paper proposes an interval type-2 fuzzy regression model for predicting retinopathy in diabetic patients. The results of the present work should help prevent unnecessary testing of diabetic patients. This study also aims to assist patients and the healthcare community in reducing the cost of diabetes control and treatment by optimizing the number of check-ups.

18.
We introduce a method for learning pairwise interactions in a linear regression or logistic regression model in a manner that satisfies strong hierarchy: whenever an interaction is estimated to be nonzero, both its associated main effects are also included in the model. We motivate our approach by modeling pairwise interactions for categorical variables with arbitrary numbers of levels, and then show how we can accommodate continuous variables as well. Our approach allows us to dispense with explicitly applying constraints on the main effects and interactions for identifiability, which results in interpretable interaction models. We compare our method with existing approaches on both simulated and real data, including a genome-wide association study, all using our R package glinternet.
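glinternet itself fits a group lasso with overlap; purely to illustrate what strong hierarchy means operationally, here is a naive two-stage Python screen that, by construction, only admits an interaction when both parent main effects are retained (this is not the glinternet algorithm):

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(10)
n, p = 300, 6
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - X[:, 1] + 1.5 * X[:, 0] * X[:, 1] + rng.standard_normal(n)

# stage 1: lasso screen of the main effects
kept = np.flatnonzero(np.abs(LassoCV(cv=5).fit(X, y).coef_) > 1e-8)

# stage 2: strong hierarchy by construction — interactions only among retained
# main effects, and those main effects always stay in the final model
pairs = list(combinations(kept, 2))
Z = np.column_stack([X[:, kept]] + [X[:, i] * X[:, j] for i, j in pairs])
final = LinearRegression().fit(Z, y)
print("main effects kept:", kept.tolist(), " interaction pairs:", pairs)
```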

19.
This paper studies parameter estimation for partially linear regression models under inequality constraints. Using optimization methods and Bayesian methods, we derive the least squares kernel estimator and the best Bayesian estimator of the partially linear regression model under inequality constraints, and we prove that, under certain conditions, the constrained least squares kernel estimator outperforms the unconstrained least squares kernel estimator in the mean squared error sense.
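A small Monte Carlo sketch of the concluding claim under simplifying assumptions (a fully parametric linear model with a nonnegativity constraint, fit with scipy.optimize.lsq_linear): when the true coefficient lies on the boundary of the constraint set, the constrained least squares estimator has smaller mean squared error than the unconstrained one.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(11)
n, reps = 40, 500
beta_true = np.array([0.0, 0.5, 2.0])          # lies on the boundary of beta >= 0

se_ols = se_con = 0.0
for _ in range(reps):
    X = rng.standard_normal((n, 3))
    y = X @ beta_true + rng.standard_normal(n)
    ols = np.linalg.lstsq(X, y, rcond=None)[0]             # unconstrained
    con = lsq_linear(X, y, bounds=(0.0, np.inf)).x         # beta >= 0 imposed
    se_ols += np.sum((ols - beta_true) ** 2)
    se_con += np.sum((con - beta_true) ** 2)

print(f"MSE unconstrained: {se_ols / reps:.4f}   constrained: {se_con / reps:.4f}")
```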

20.
The unknown parameters in multiple linear regression models may be estimated using any one of a number of criteria, such as the minimization of the sum of squared errors (MSSE), the minimization of the sum of absolute errors (MSAE), and the minimization of the maximum absolute error (MMAE). At present, the MSSE, or least squares, criterion continues to be the most popular. However, at times the choice of a criterion is not clear from statistical, practical, or other considerations. Under such circumstances, it may be more appropriate to use multiple criteria rather than a single criterion to estimate the unknown parameters in a multiple linear regression model. We motivate the use of multiple-criteria estimation in linear regression models with an example, propose a few models, and outline a solution procedure.
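Each criterion is directly computable: MSSE by ordinary least squares, and MSAE and MMAE by standard linear programming reformulations. A sketch on simulated data with SciPy (the paper's multiple-criteria models themselves are not reproduced):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(12)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

b_msse = np.linalg.lstsq(X, y, rcond=None)[0]   # MSSE: ordinary least squares

# MSAE: min sum(u + v)  s.t.  X b + u - v = y,  u, v >= 0
c = np.r_[np.zeros(p), np.ones(2 * n)]
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
bnds = [(None, None)] * p + [(0, None)] * (2 * n)
b_msae = linprog(c, A_eq=A_eq, b_eq=y, bounds=bnds).x[:p]

# MMAE (Chebyshev): min t  s.t.  -t <= y - X b <= t
c = np.r_[np.zeros(p), 1.0]
A_ub = np.vstack([np.hstack([X, -np.ones((n, 1))]),
                  np.hstack([-X, -np.ones((n, 1))])])
b_mmae = linprog(c, A_ub=A_ub, b_ub=np.r_[y, -y],
                 bounds=[(None, None)] * p + [(0, None)]).x[:p]

for name, b in [("MSSE", b_msse), ("MSAE", b_msae), ("MMAE", b_mmae)]:
    print(name, np.round(b, 3))
```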
