1.

Functional fuzzy clusterwise regression analysis

Tianyu Tan, Hye Won Suk, Heungsun Hwang, Jooseop Lim 《Advances in Data Analysis and Classification》2013,7(1):57-82

We propose a functional extension of fuzzy clusterwise regression, which simultaneously estimates fuzzy cluster memberships and a regression coefficient function for each cluster. The proposed method permits dependent and/or predictor variables to be functional, varying over time, space, or other continua. The fuzzy memberships and clusterwise regression coefficient functions are estimated by minimizing an objective function that uses a basis function expansion to approximate the functional data. An alternating least squares algorithm is developed to minimize the objective function. We conduct simulation studies to demonstrate the superior performance of the proposed method compared to its non-functional counterpart and to examine the performance of various cluster validity measures for selecting the optimal number of clusters. We also apply the method to real datasets to illustrate its empirical usefulness.

2.

A new study of parameter estimation in regression models with linear constraints

吴平, 王中艳, 陈兰花 《数学理论与应用》 (Mathematical Theory and Applications) 2010,(4):103-106

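To address the shortcomings of the ordinary restricted least squares estimator (ORLSE) in handling multicollinearity in parameter estimation, stochastic linear restrictions are introduced and a restricted k-d estimator is proposed. Its properties are discussed under the mean squared error (MSE) criterion, and four main results are obtained. Comparisons with the ORLSE, the restricted ridge estimator (RRE), and the restricted Liu estimator lead to the conclusion that the proposed estimator performs better.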

3.

Least-squares estimates in fuzzy regression analysis

《European Journal of Operational Research》2003,148(2):426-435

Regression is a powerful forecasting methodology and is considered an essential component of successful OR applications. In this paper an idea stemming from classical least squares is proposed to handle fuzzy observations in regression analysis. Based on the extension principle, the membership function of the sum of squared errors is constructed. The fuzzy sum of squared errors is a function of the regression coefficients to be determined, and it can be minimized via a nonlinear program formulated under the structure of the Chen–Klein method for ranking fuzzy numbers. To illustrate how the proposed method is applied, three cases are worked through: one with crisp input and fuzzy output, one with fuzzy input and fuzzy output, and one with non-triangular fuzzy observations. The results show that the least-squares method of this paper is able to determine the regression coefficients with better explanatory power. Most importantly, it works for all types of fuzzy observations and is not restricted to triangular ones.

4.

A mathematical programming approach to clusterwise regression model and its extensions

《European Journal of Operational Research》1999,116(3):640-652

The clusterwise regression model is used to perform cluster analysis within a regression framework. While the traditional regression model assumes the regression coefficient (β) to be identical for all subjects in the sample, the clusterwise regression model allows β to vary with subjects of different clusters. Since the cluster membership is unknown, the estimation of the clusterwise regression is a tough combinatorial optimization problem. In this research, we propose a “Generalized Clusterwise Regression Model” which is formulated as a mathematical programming (MP) problem. A nonlinear programming procedure (with linear constraints) is proposed to solve the combinatorial problem and to estimate the cluster membership and β simultaneously. Moreover, by integrating the cluster analysis with the discriminant analysis, a clusterwise discriminant model is developed to incorporate parameter heterogeneity into the traditional discriminant analysis. The cluster membership and discriminant parameters are estimated simultaneously by another nonlinear programming model.
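To make the idea of cluster-specific coefficients concrete, the following is a minimal sketch of a simple alternating heuristic: fit each cluster by least squares, then reassign each observation to the cluster whose fit gives it the smallest squared residual. This is an assumption made for illustration and is not the nonlinear programming formulation described in the abstract; the function name `clusterwise_regression` and its parameters are hypothetical.

```python
import numpy as np

def clusterwise_regression(X, y, n_clusters=2, n_iter=50, seed=0):
    """Alternating heuristic (illustrative only, not the paper's MP method)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])          # add an intercept column
    labels = rng.integers(0, n_clusters, size=n)   # random initial partition
    betas = np.zeros((n_clusters, Xd.shape[1]))
    history = []                                   # objective after each pass
    for _ in range(n_iter):
        # Step 1: refit every non-empty cluster by ordinary least squares.
        for c in range(n_clusters):
            mask = labels == c
            if mask.any():
                betas[c], *_ = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)
        # Step 2: move each point to the cluster whose fit suits it best.
        sq_resid = (y[:, None] - Xd @ betas.T) ** 2
        new_labels = sq_resid.argmin(axis=1)
        history.append(float(sq_resid[np.arange(n), new_labels].sum()))
        if np.array_equal(new_labels, labels):     # partition is stable
            break
        labels = new_labels
    return labels, betas, history

# Usage on synthetic data drawn from two different lines.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(60, 1))
y = np.where(np.arange(60) < 30, 2 * x[:, 0] + 1, -3 * x[:, 0] - 2)
labels, betas, history = clusterwise_regression(x, y, n_clusters=2)
```

Each pass can only lower or keep the total squared error, but the heuristic may stop at a local optimum that depends on the initial partition, which is one reason exact formulations such as those in these entries are of interest.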

5.

Clusterwise analysis for multiblock component methods

Stéphanie Bougeard, Hervé Abdi, Gilbert Saporta, Ndèye Niang 《Advances in Data Analysis and Classification》2018,12(2):285-313

Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. Multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters that have their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.

6.

Globally optimal clusterwise regression by mixed logical-quadratic programming

Réal A. Carbonneau, Gilles Caporossi, Pierre Hansen 《European Journal of Operational Research》2011,212(1):213-222

Exact global optimization of the clusterwise regression problem is challenging, and there are currently no published feasible methods for performing this clustering optimally, even though it has been over thirty years since its original proposal. This work explores global optimization of the clusterwise regression problem using mathematical programming, along with related issues. A mixed logical-quadratic programming formulation with implication of constraints is presented and contrasted against a quadratic formulation based on the traditional big-M, which cannot guarantee optimality because the regression line coefficients, and thus errors, may be arbitrarily large. Clusterwise regression optimization times and solution optimality for two clusters are empirically tested on twenty real datasets and three series of synthetic datasets ranging from twenty to one hundred observations and from two to ten independent variables. Additionally, a few small real datasets are clustered into three lines.

7.

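A Liu-type estimation method for the parameters of regression models with linear constraints (total citations: 1; self-citations: 0; citations by others: 1)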

黄文焕, 戚佳金, 黄南天 《系统科学与数学》 (Journal of Systems Science and Mathematical Sciences) 2009,29(7):937-946

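To address the shortcomings of the ordinary restricted least squares estimator (ORLSE) in handling multicollinearity in parameter estimation, additional stochastic linear restrictions are introduced and a restricted Liu-type estimator is proposed. It is proved theoretically that, under the mean squared error criterion, this estimator performs better than the previously proposed ORLSE and restricted ridge estimator (RRE), and that it can be regarded as a generalization of both. Principles for determining k and d are discussed, and the method is extended to estimation formulas in generalized form and in canonical form.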
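As background for the estimators named in this entry, here is a minimal sketch of the classical unrestricted ridge estimator, (X'X + kI)^{-1} X'y, and the classical Liu estimator, (X'X + I)^{-1}(X'y + d·b_OLS). This is an assumption made for illustration: the restricted, two-parameter Liu-type estimator of the paper is not reproduced, and the function names and synthetic data below are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    """Classical ridge estimator with biasing parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def liu(X, y, d):
    """Classical Liu estimator with biasing parameter 0 <= d <= 1."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + np.eye(p), X.T @ y + d * ols(X, y))

# Synthetic, nearly collinear design (illustrative data only).
rng = np.random.default_rng(0)
z = rng.normal(size=100)
X = np.column_stack([z, z + 0.01 * rng.normal(size=100)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=100)

b_ols = ols(X, y)
b_ridge = ridge(X, y, k=1.0)
b_liu = liu(X, y, d=0.5)
print("OLS  :", b_ols)
print("ridge:", b_ridge)
print("Liu  :", b_liu)
```

Setting k = 0 in the ridge estimator or d = 1 in the Liu estimator recovers ordinary least squares, so both parameters control how far the estimate is shrunk away from the least squares solution.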

8.

Comparisons among some estimators in misspecified linear models with multicollinearity

Nityananda Sarkar 《Annals of the Institute of Statistical Mathematics》1989,41(4):717-724

In this paper we deal with comparisons among several estimators available in situations of multicollinearity (e.g., the r-k class estimator proposed by Baye and Parker, the ordinary ridge regression (ORR) estimator, the principal components regression (PCR) estimator and also the ordinary least squares (OLS) estimator) for a misspecified linear model where misspecification is due to omission of some relevant explanatory variables. These comparisons are made in terms of the mean square error (mse) of the estimators of regression coefficients as well as of the predictor of the conditional mean of the dependent variable. It is found that under the same conditions as in the true model, the superiority of the r-k class estimator over the ORR, PCR and OLS estimators and those of the ORR and PCR estimators over the OLS estimator remain unchanged in the misspecified model. Only in the case of comparison between the ORR and PCR estimators, no definite conclusion regarding the mse dominance of one over the other in the misspecified model can be drawn.

9.

Stochastic restricted ridge estimation in partially linear models

魏传华, 郭双, 王肖南 《数学的实践与认识》 (Mathematics in Practice and Theory) 2014,(13)

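This paper studies estimation in partially linear regression models subject to additional stochastic restrictions. Based on the profile least squares method and the mixed estimation method, a profile mixed estimator of the parametric component under stochastic restrictions is proposed and its properties are studied. To overcome multicollinearity, a profile mixed ridge estimator of the parametric component is constructed, and the bias and variance of the estimator are given.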

10.

Fuzzy clusterwise quasi-likelihood generalized linear models

Heungsun Hwang, Marc A. Tomiuk 《Advances in Data Analysis and Classification》2010,4(4):255-270

The quasi-likelihood method has emerged as a useful approach to the parameter estimation of generalized linear models (GLM) in circumstances where there is insufficient distributional information to construct a likelihood function. Despite its flexibility, the quasi-likelihood approach to GLM is currently designed for an aggregate-sample analysis based on the assumption that the entire sample of observations is taken from a single homogeneous population. Thus, this approach may not be suitable when heterogeneous subgroups exist in the population, which involve qualitatively distinct effects of covariates on the response variable. In this paper, the quasi-likelihood GLM approach is generalized to a fuzzy clustering framework which explicitly accounts for such cluster-level heterogeneity. A simple iterative estimation algorithm is presented to optimize the regularized fuzzy clustering criterion of the proposed method. The performance of the proposed method in recovering parameters is investigated based on a Monte Carlo analysis involving synthetic data. Finally, the empirical usefulness of the proposed method is illustrated through an application to actual data on the coupon usage behaviour of a sample of consumers.

11.

An improvement, based on the universal ridge estimator, for the over-shrinkage of ridge estimation

刘文卿 《数理统计与管理》 (Journal of Applied Statistics and Management) 2011,30(4):614-619

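Ridge estimation is an effective remedy for multicollinearity in multiple linear regression and is a biased shrinkage estimator. Compared with ordinary least squares, ridge estimation can reduce the mean squared error of the parameter estimates, but it increases the residual sum of squares and so worsens the fit. This paper proposes an improvement, based on the universal ridge estimator, for the over-shrinkage of ridge estimation; it improves the fit of the ridge estimator and reduces the increase in its residual sum of squares.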
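The trade-off described in this abstract, where a larger ridge parameter shrinks the coefficients but raises the residual sum of squares, can be sketched numerically. The example below uses the ordinary ridge estimator only, on synthetic data; it does not implement the universal ridge improvement proposed in the paper, and the names `ridge_fit` and `rss` are hypothetical.

```python
import numpy as np

# Synthetic design with two nearly collinear columns (illustrative data only).
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.05 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def ridge_fit(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y; k = 0 is ordinary least squares."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

def rss(X, y, beta):
    """Residual sum of squares of the fit X @ beta."""
    r = y - X @ beta
    return float(r @ r)

# Residual sum of squares for an increasing sequence of ridge parameters.
rss_by_k = {k: rss(X, y, ridge_fit(X, y, k)) for k in (0.0, 0.1, 1.0, 10.0)}
for k, value in rss_by_k.items():
    print(f"k={k:<5} RSS={value:.3f}")
```

Because least squares minimizes the residual sum of squares by definition, any k > 0 can only match or exceed the least squares value, which is the loss of fit that the paper sets out to reduce.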

12.

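An improved projection pursuit model and its application to forecasting urban water resources carrying capacity (total citations: 5; self-citations: 0; citations by others: 5)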

赵小勇, 付强 《数学的实践与认识》 (Mathematics in Practice and Theory) 2007,37(7):76-81

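In studies of urban water resources carrying capacity, partial least squares regression handles multicollinearity among the independent variables effectively but copes poorly with complex nonlinear relationships between the dependent and independent variables. The coupled projection pursuit and neural network model is a powerful tool for nonlinear problems and is highly robust, but it copes poorly with multicollinearity among the independent variables. This paper combines the two methods into a coupled neural network and projection pursuit model based on partial least squares regression, uses it to forecast urban water resources carrying capacity, and obtains satisfactory results.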

13.

Influence measures in ridge regression when the error terms follow an AR(1) process

Tuğba Söküt Açar, M. Revan Özkale 《Computational Statistics》2016,31(3):879-898

Influence concepts have an important place in linear regression models, and case deletion is a useful method for assessing the influence of a single case. Influence measures in the presence of multicollinearity have been discussed for linear regression models in which the error structure is uncorrelated and homoscedastic. In contrast to other articles on this subject, we consider influence measures in ridge regression with autocorrelated errors. Theoretical results are illustrated with a numerical example, and a Monte Carlo simulation is conducted to examine the effect of the autocorrelation coefficient, the strength of multicollinearity, and the sample size on leverage points and influential observations.

14.

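Fuzzy linear least-squares regression with LR-type fuzzy-number coefficients (total citations: 2; self-citations: 2; citations by others: 0)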

梁艳, 魏立力 《模糊系统与数学》 (Fuzzy Systems and Mathematics) 2007,21(3):111-117

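For the case in which the inputs, outputs, and coefficients are all LR-type fuzzy numbers, a fuzzy linear regression model is built, and a least squares estimator for the model is proposed together with a method for evaluating model performance. When the inputs, outputs, and coefficients all degenerate to crisp values, the estimator reduces to the classical least squares estimator. The method applies not only to triangular fuzzy numbers but also to other LR-type fuzzy numbers (such as exponential fuzzy numbers). Numerical simulations show that the method fits well.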

15.

Supervised learning algorithms for multi-class classification problems with partial class memberships (total citations: 1; self-citations: 0; citations by others: 1)

Willem Waegeman, Jan Verwaeren, Bram Slabbinck, Bernard De Baets 《Fuzzy Sets and Systems》2011,184(1):106-125

In several application domains such as biology, computer vision, social network analysis and information retrieval, multi-class classification problems arise in which data instances do not simply belong to one particular class, but exhibit a partial membership to several classes. Existing machine learning or fuzzy set approaches for representing this type of fuzzy information mainly focus on unsupervised methods. In contrast, we present in this article supervised learning algorithms for classification problems with partial class memberships, where class memberships instead of crisp class labels serve as input for fitting a model to the data. Using kernel logistic regression (KLR) as a baseline method, a basic one-versus-all approach is first proposed by replacing the binary-coded label vectors with [0,1]-valued class memberships in the likelihood. Subsequently, we use this KLR extension as the base classifier to construct one-versus-one decompositions, in which partial class memberships are transformed and estimated in a pairwise manner. Empirical results on synthetic data and a real-world application in bioinformatics confirm that our approach delivers promising results. The one-versus-all method yields the best computational efficiency, while the one-versus-one methods are preferred in terms of predictive performance, especially when the observed class memberships are heavily unbalanced.

16.

Minimax multivariate empirical Bayes estimators under multicollinearity

M. S. Srivastava, T. Kubokawa 《Journal of Multivariate Analysis》2005,93(2):394-416

In this paper we consider the problem of estimating the matrix of regression coefficients in a multivariate linear regression model in which the design matrix is nearly singular. Under the assumption of normality, we propose empirical Bayes ridge regression estimators with three types of shrinkage functions, that is, scalar, componentwise and matricial shrinkage. These proposed estimators are proved to be uniformly better than the least squares estimator, that is, minimax in terms of risk under Strawderman's loss function. Through simulation and empirical studies, they are also shown to be useful in cases of multicollinearity.

17.

Clustering of functional data in a low-dimensional subspace

Michio Yamamoto 《Advances in Data Analysis and Classification》2012,6(3):219-247

To find optimal clusters of functional objects in a lower-dimensional subspace of data, a sequential method called tandem analysis is often used, though such a method is problematic. A new procedure is developed to find optimal clusters of functional objects and, simultaneously, an optimal subspace for clustering. The method is based on the k-means criterion for functional data and seeks the subspace that is maximally informative about the clustering structure in the data. An efficient alternating least-squares algorithm is described, and the proposed method is extended to a regularized method. Analyses of artificial and real data examples demonstrate that the proposed method gives correct and interpretable results.

18.

Multivariate generalized ridge estimation and the Q(c) criterion for selecting K

陈世基, 曾志斌 《应用数学和力学》 (Applied Mathematics and Mechanics) 1993,(1)

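When the explanatory variables are multicollinear, the least squares estimator becomes unstable and may lead to erroneous results. This paper uses the generalized ridge estimator β(K) to estimate the regression coefficients β = vec(B) of the multivariate linear model; with a suitable choice of the ridge parameter K, the mean squared error (MSE) of the generalized ridge estimator can be made smaller than that of the least squares estimator. The main shortcomings of choosing K by the MSE criterion in generalized ridge estimation are pointed out, and a new criterion Q(c) for choosing K is adopted, which includes the MSE criterion and the least squares (LS) criterion as special cases. The good properties of the Q(c) criterion are proved and discussed theoretically, the statistical meaning of c is explained, and a method for determining c is given.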

19.

Robust kernel ridge regression based on M-estimation

Antoni Wibowo 《Computational Mathematics and Modeling》2009,20(4):438-446

Ridge regression (RR) and kernel ridge regression (KRR) are important tools to avoid the effects of multicollinearity. However, the predictions of RR and KRR become inappropriate for use in regression models when data are contaminated by outliers. In this paper, we propose an algorithm to obtain a nonlinear robust prediction without specifying a nonlinear model in advance. We combine M-estimation and kernel ridge regression to obtain the nonlinear prediction. Then, we compare the proposed method with some other methods.

20.

Fuzzy linear regression analysis based on multi-objective programming

黄华, 宋艳萍, 苗新艳, 肉孜阿吉 《模糊系统与数学》 (Fuzzy Systems and Mathematics) 2012,26(3):114-119

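This paper discusses fuzzy linear regression analysis in which the inputs and outputs are fuzzy numbers and the regression coefficients are real numbers. Fuzzy least squares linear regression is easily affected by outliers, whereas the least absolute deviations method can effectively reduce the error of the regression model. Therefore, based on least absolute deviations, a multi-objective programming model is built and converted into a nonlinear programming problem for solution, which yields the parameter estimates of the fuzzy linear regression model. Finally, a numerical example is used to verify the soundness of the method and to compare its performance with that of other approaches.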