Similar Documents
20 similar documents found.
1.
To find optimal clusters of functional objects in a lower-dimensional subspace of the data, a sequential procedure called tandem analysis is often used, even though it is known to be problematic. A new procedure is developed that finds optimal clusters of functional objects and an optimal subspace for clustering simultaneously. The method is based on the k-means criterion for functional data and seeks the subspace that is maximally informative about the clustering structure in the data. An efficient alternating least-squares algorithm is described, and the proposed method is extended to a regularized version. Analyses of artificial and real data examples demonstrate that the proposed method gives correct and interpretable results.
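The simultaneous subspace-and-clustering idea can be illustrated with a minimal reduced-k-means-style alternating loop. This is a hedged sketch in plain multivariate terms, not the authors' functional-data algorithm: `reduced_kmeans`, the farthest-point seeding, and all parameter choices are illustrative assumptions, and the basis expansion and regularization of the paper are omitted.

```python
import numpy as np

def _farthest_point_seeds(Z, k):
    # Crude deterministic seeding: start from point 0, then repeatedly add
    # the point farthest from the current seed set.
    seeds = [0]
    for _ in range(k - 1):
        d = np.min(((Z[:, None] - Z[seeds][None]) ** 2).sum(-1), axis=1)
        seeds.append(int(d.argmax()))
    return seeds

def reduced_kmeans(X, k, q, n_iter=50, seed=0):
    # Alternate between (a) k-means in the current q-dimensional subspace and
    # (b) updating the subspace to be maximally informative about the clusters.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.linalg.qr(rng.standard_normal((p, q)))[0]  # random orthonormal loadings
    Z = X @ A
    cent = Z[_farthest_point_seeds(Z, k)]
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # (a) assign each projected point to its nearest centroid
        labels = np.argmin(((Z[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        # (b) subspace update: top-q eigenvectors of X' H X, where H is the
        # projector onto the cluster-indicator space
        H = np.zeros((n, n))
        for g in range(k):
            idx = np.where(labels == g)[0]
            if len(idx):
                H[np.ix_(idx, idx)] = 1.0 / len(idx)
        A = np.linalg.eigh(X.T @ H @ X)[1][:, -q:]
        Z = X @ A
        cent = np.array([Z[labels == g].mean(0) if np.any(labels == g) else Z[0]
                         for g in range(k)])
    return labels, A
```

The alternating structure (cluster in the subspace, then refit the subspace to the clusters) is what distinguishes this family of methods from tandem analysis, where the dimension reduction is done once and never revisited.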

2.
Functional data clustering: a survey
Clustering techniques for functional data are reviewed and organized into four groups of algorithms. The first group consists of methods working directly on the evaluation points of the curves. The second group is made up of filtering methods, which first approximate the curves in a finite basis of functions and then perform clustering on the basis expansion coefficients. The third group is composed of methods that perform dimensionality reduction of the curves and clustering simultaneously, leading to functional representations of the data that depend on the clusters. The last group consists of distance-based methods, which use clustering algorithms built on distances specific to functional data. A software review and an illustration of these algorithms on real data are also presented.
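The second (filtering) strategy is easy to sketch: project each curve onto a finite basis, then run an ordinary clustering algorithm on the coefficients. A minimal illustration under stated assumptions, using a polynomial basis via `np.polyfit` as a stand-in for the spline or Fourier bases typically used, and a naive k-means with deterministic seeds:

```python
import numpy as np

def filter_then_cluster(curves, t, degree=3, k=2, n_iter=30):
    # Step 1 (filtering): represent each curve by its basis coefficients.
    coefs = np.array([np.polyfit(t, y, degree) for y in curves])
    # Step 2: plain k-means on the coefficient vectors,
    # seeded deterministically with spread-out curves.
    cent = coefs[np.linspace(0, len(coefs) - 1, k).astype(int)].copy()
    labels = np.zeros(len(coefs), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((coefs[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        for g in range(k):
            if np.any(labels == g):
                cent[g] = coefs[labels == g].mean(0)
    return labels
```

Because the clustering operates on a fixed-length coefficient vector, any multivariate clustering algorithm can be substituted in step 2; that modularity is the appeal of the filtering approach, at the cost of choosing the basis independently of the clustering.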

3.
Yao  Xin  Cheng  Yuanyuan  Zhou  Li  Song  Malin 《Annals of Operations Research》2022,308(1-2):727-752

This paper analyzes the green efficiency performance of the logistics industry in China's 30 provinces from 2008 to 2017. We first evaluate the green efficiency of the logistics industry with the non-directional distance function method. Then we use the functional clustering method funHDDC, a popular machine learning method, to divide the 30 provinces into 4 clusters and analyze the similarities and differences in green efficiency performance patterns among the groups. Further, we explore the driving factors behind dynamic changes in green efficiency through a decomposition method. The main conclusions are as follows: (1) in general, the level of green efficiency is closely related to geographical location. The clustering results show that most of the eastern regions belong to the cluster with higher green efficiency, while most of the western regions belong to the cluster with lower green efficiency; however, the green efficiency performance of several regions with high economic levels, such as Beijing and Shanghai, is not satisfactory. (2) The decomposition results show that the innovation effect in China's logistics industry is the most pronounced, but efficiency change still needs to be improved and technical leadership should be strengthened. Based on these conclusions, we propose policy recommendations for the green development of the logistics industry in China.


4.
The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Different strategies have been proposed to solve this problem; however, rigorous analysis justifying their performance is missing. This paper suggests the Theoretical Spectral Curvature Clustering (TSCC) algorithm for solving the HLM problem and provides careful analysis to justify it. The TSCC algorithm is essentially a combination of Govindu's multi-way spectral clustering framework (CVPR 2005) and Ng et al.'s spectral clustering algorithm (NIPS 2001). The main result of this paper states that if the given data are sampled from a mixture of distributions concentrated around affine subspaces, then with high sampling probability the TSCC algorithm accurately segments the underlying clusters. The quality of the clustering depends on the within-cluster errors, the between-cluster interactions, and a tuning parameter applied by TSCC. The proof also provides new insights for the analysis of Ng et al. (NIPS 2001). This work was supported by NSF grant #0612608.
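For reference, the Ng et al. (NIPS 2001) step that TSCC builds on can be sketched as follows. This is a bare-bones version with a fixed Gaussian affinity scale and a naive deterministically seeded k-means; the multi-way spectral curvature affinities that TSCC constructs for affine subspaces are not reproduced here.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0):
    # Gaussian affinities with the diagonal zeroed out
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized affinity matrix D^{-1/2} W D^{-1/2}
    dinv = 1.0 / np.sqrt(W.sum(1))
    L = dinv[:, None] * W * dinv[None, :]
    # Embed points as the row-normalized top-k eigenvectors
    U = np.linalg.eigh(L)[1][:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Naive k-means on the embedded rows, deterministic seeds
    cent = U[np.linspace(0, len(U) - 1, k).astype(int)].copy()
    labels = np.zeros(len(U), dtype=int)
    for _ in range(50):
        labels = np.argmin(((U[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        for g in range(k):
            if np.any(labels == g):
                cent[g] = U[labels == g].mean(0)
    return labels
```

When the affinity matrix is nearly block diagonal, the row-normalized eigenvector embedding maps each cluster to a tight bundle of directions, which is exactly the property the paper's perturbation analysis quantifies.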

5.
Cluster analysis is an unsupervised learning technique for partitioning objects into several clusters. Assuming that noisy objects are included, we propose a soft clustering method that assigns objects significantly different from noise to one of a specified number of clusters, controlling decision errors through multiple testing. The parameters of the Gaussian mixture model are estimated with the EM algorithm. Using the estimated probability density function, we formulate the clustering problem as a multiple hypothesis test, with the positive false discovery rate (pFDR) as the decision error. The proposed procedure classifies objects as significant data or noise according to a specified target pFDR level. When applied to real and artificial data sets, it controlled the target pFDR reasonably well and offered satisfactory clustering performance.
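A stripped-down version of the pipeline can be sketched in one dimension: fit a Gaussian mixture by EM, then assign each object to a cluster only when its posterior is decisive, flagging the rest as noise. The fixed posterior cutoff below is an assumption standing in for the paper's pFDR-calibrated threshold, which this sketch does not reproduce.

```python
import numpy as np

def _resp(x, pi, mu, sig):
    # E-step: responsibilities r[i, g] = P(cluster g | x_i)
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    return dens / dens.sum(1, keepdims=True)

def em_gmm_1d(x, k=2, n_iter=100):
    # Plain EM for a 1-D Gaussian mixture with simple grid initialization.
    mu = np.linspace(x.min(), x.max(), k)
    sig = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        r = _resp(x, pi, mu, sig)
        # M-step: update weights, means, standard deviations
        nk = r.sum(0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(0) / nk
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(0) / nk)
    # final E-step so the returned responsibilities match the fitted parameters
    return pi, mu, sig, _resp(x, pi, mu, sig)

def classify_with_rejection(r, alpha=0.1):
    # Assign each object to its most probable cluster only when the posterior
    # is decisive; otherwise flag it as noise (-1). This fixed cutoff is a
    # stand-in for the pFDR-controlled threshold of the paper.
    labels = r.argmax(1)
    labels[r.max(1) < 1.0 - alpha] = -1
    return labels
```

Objects near the overlap of the mixture components receive ambiguous posteriors and are rejected as noise, which is the behaviour the multiple-testing formulation makes precise.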

6.
This paper presents an approach for online learning of Takagi–Sugeno (T–S) fuzzy models. A novel learning algorithm based on Hierarchical Particle Swarm Optimization (HPSO) is introduced to automatically extract all fuzzy logic system (FLS) parameters of a T–S fuzzy model. During online operation, both the consequent parameters of the T–S fuzzy model and the PSO inertia weight are continually updated as new data become available. Applying this concept to the learning algorithm yields a new type of T–S fuzzy modeling approach in which the proposed HPSO algorithm includes an adaptive procedure, becoming a self-adaptive HPSO (S-AHPSO) algorithm usable in real-time processes. To improve the computational time of the proposed HPSO, particle positions are initialized using an efficient unsupervised fuzzy clustering algorithm (UFCA). The UFCA combines the K-nearest neighbour and fuzzy C-means methods into a fuzzy modeling method for partitioning the input–output data and identifying the antecedent parameters of the fuzzy system, enhancing the HPSO's tuning. The approach is applied to identify the dynamical behavior of the dissolved oxygen concentration in an activated sludge reactor within a wastewater treatment plant. The results show that the proposed approach identifies nonlinear systems satisfactorily and reveals superior performance compared with other state-of-the-art methods. Moreover, the methodologies proposed in this paper admit wider application in fields such as model predictive control, direct controller design, unsupervised clustering, motion detection, and robotics.

7.
Unsupervised classification is a highly important task in machine learning. Although it has achieved great success in supervised classification, the support vector machine (SVM) is much less used to classify unlabeled data points, and SVM-based clustering suffers from several drawbacks, including sensitivity to nonlinear kernels and random initializations, high computational cost, and unsuitability for imbalanced datasets. In this paper, to exploit the advantages of SVM while overcoming these drawbacks, we propose a completely new two-stage unsupervised classification method with no initialization: a new unsupervised kernel-free quadratic surface SVM (QSSVM) model is proposed to avoid selecting kernels and related kernel parameters, and a golden-section algorithm is designed to generate an appropriate classifier for balanced and imbalanced data. By studying certain properties of the proposed model, a convergent decomposition algorithm is developed to implement this non-convex QSSVM model effectively and efficiently in terms of computational cost. Numerical tests on artificial and public benchmark data indicate that the proposed unsupervised QSSVM method outperforms well-known clustering methods (including SVM-based and other state-of-the-art methods), particularly in classification accuracy. Moreover, we extend and apply the proposed method to credit risk assessment by incorporating T-test-based feature weights. The promising numerical results on benchmark personal credit data and real-world corporate credit data demonstrate the effectiveness, efficiency, and interpretability of the proposed method, and indicate its significant potential in real-world applications.

8.
The method of cyclic projections finds nearest points in the intersection of finitely many affine subspaces. To accelerate convergence, Gearhart & Koshy proposed a modification which, in each iteration, performs an exact line search based on minimising the distance to the solution. When the subspaces are linear, the procedure can be made explicit using feasibility of the zero vector. This work studies an alternative approach which does not rely on this fact, thus providing an efficient implementation in the affine setting.
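For concreteness, cyclic projections themselves (without the line-search acceleration studied in the paper) take only a few lines. The helper `proj_affine` and the two-line example are illustrative assumptions: each affine set is written as {y : Ay = b} with A of full row rank, for which the nearest-point projection has a closed form.

```python
import numpy as np

def proj_affine(x, A, b):
    # Orthogonal projection of x onto the affine set {y : A y = b},
    # assuming A has full row rank.
    AAT_inv = np.linalg.inv(A @ A.T)
    return x - A.T @ (AAT_inv @ (A @ x - b))

def cyclic_projections(x0, constraints, n_iter=200):
    # Repeatedly project onto each affine set in turn; the iterates converge
    # to a point in the intersection (when it is nonempty).
    x = x0.astype(float)
    for _ in range(n_iter):
        for A, b in constraints:
            x = proj_affine(x, A, b)
    return x
```

When the subspaces meet at a shallow angle the plain cycle converges slowly, which is precisely what line-search accelerations of the Gearhart–Koshy type are designed to remedy.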

9.
A model-based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation maximisation (EM) algorithm is used to estimate clustMD; in the presence of nominal data a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering simulated mixed type data and prostate cancer patients, on whom mixed data have been recorded.

10.
Data description, also known as one-class classification, characterizes the distribution of existing data so that one can test whether new data conform to that distribution. We first outline the principle of kernel-based data description and point out that, with an appropriate kernel function and corresponding parameters, data description can be applied to pattern clustering; this clustering approach has the advantages of tight cluster boundaries and easy rejection of noise. To address the difficulties that data-description-based clustering has in determining the number of clusters and the cluster membership of individual samples, a search-based solution is proposed; both theoretical analysis and numerical examples verify its feasibility. Finally, the clustering algorithm is applied to the evaluation of enterprise relationships, where it produces reasonable results.

11.
俞燕  徐勤丰  孙鹏飞 《应用数学》2006,19(3):600-605
Based on a finite mixture of Dirichlet distributions, this paper proposes a Bayesian clustering method for compositional data. The model parameters are estimated with the EM algorithm, the number of clusters is determined by the BIC criterion, and each observation is classified by a procedure similar to Bayesian discriminant analysis. The computational formulas are derived and a program is implemented. Simulation results show that the proposed method achieves good clustering performance.

12.
The quasi-likelihood method has emerged as a useful approach to parameter estimation for generalized linear models (GLMs) when there is insufficient distributional information to construct a likelihood function. Despite its flexibility, the quasi-likelihood approach to GLMs is currently designed for aggregate-sample analysis, under the assumption that the entire sample of observations is drawn from a single homogeneous population. The approach may therefore be unsuitable when the population contains heterogeneous subgroups with qualitatively distinct effects of covariates on the response variable. In this paper, the quasi-likelihood GLM approach is generalized to a fuzzy clustering framework that explicitly accounts for such cluster-level heterogeneity. A simple iterative estimation algorithm is presented to optimize the regularized fuzzy clustering criterion of the proposed method. The performance of the method in recovering parameters is investigated in a Monte Carlo analysis with synthetic data. Finally, its empirical usefulness is illustrated through an application to actual data on the coupon usage behaviour of a sample of consumers.
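The fuzzy clustering machinery underneath such a framework (without the per-cluster quasi-likelihood GLM regressions, which are the paper's contribution) is standard fuzzy c-means. A minimal sketch with deterministic seeding; the function name and parameter choices are illustrative assumptions:

```python
import numpy as np

def fuzzy_cmeans(X, k, m=2.0, n_iter=100):
    # Fuzzy c-means: soft memberships U and centroids C minimizing
    # sum_{i,g} U[i,g]^m * ||x_i - c_g||^2, with fuzzifier m > 1.
    n = len(X)
    C = X[np.linspace(0, n - 1, k).astype(int)].astype(float)  # spread-out seeds
    U = np.full((n, k), 1.0 / k)
    for _ in range(n_iter):
        d2 = ((X[:, None] - C[None]) ** 2).sum(-1) + 1e-12
        U = (1.0 / d2) ** (1.0 / (m - 1))
        U /= U.sum(1, keepdims=True)        # membership update
        W = U ** m
        C = (W.T @ X) / W.sum(0)[:, None]   # weighted centroid update
    return U, C
```

In a clusterwise-regression setting the centroid update would be replaced by a membership-weighted quasi-likelihood fit per cluster, while the membership update keeps the same inverse-distance form.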

13.
Supervised clustering of variables
In predictive modelling, highly correlated predictors lead to unstable models that are often difficult to interpret. The selection of features, or the use of latent components that reduce the complexity among correlated observed variables, are common strategies. Our objective with the new procedure that we advocate here is to achieve both purposes: to highlight the group structure among the variables and to identify the most relevant groups of variables for prediction. The proposed procedure is an iterative adaptation of a method developed for the clustering of variables around latent variables (CLV). Modification of the standard CLV algorithm leads to a supervised procedure, in the sense that the variable to be predicted plays an active role in the clustering. The latent variables associated with the groups of variables, selected for their “proximity” to the variable to be predicted and their “internal homogeneity”, are progressively added in a predictive model. The features of the methodology are illustrated based on a simulation study and a real-world application.

14.
15.
We consider the problem of variable selection for the single-index varying-coefficient model and present a regularized variable selection procedure that combines basis function approximation with the SCAD penalty. The proposed procedure simultaneously selects significant covariates with functional coefficients and locally significant variables with parametric coefficients. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. The proposed method applies naturally to the pure single-index model and the varying-coefficient model. Finite-sample performance of the proposed method is illustrated by a simulation study and a real data analysis.

16.
We consider the problem of estimating the local smoothness of a spatially inhomogeneous function from noisy data in the framework of smoothing splines. Most existing studies on this problem use a single smoothing parameter or partially local smoothing parameters, which may not efficiently characterize the varying degrees of smoothness of the underlying function. In this paper, we propose a new nonparametric method to estimate the local smoothness of the function based on moving local risk minimization coupled with spatially adaptive smoothing splines. The proposed method provides full information about the local smoothness at every location in the data domain, making it possible to assess the degree of spatial inhomogeneity of the function. A successful estimate of the local smoothness is useful for identifying abrupt changes in the smoothness of the data, performing functional clustering, and improving the coverage uniformity of smoothing-spline confidence intervals. We further consider a nontrivial extension to the local smoothness of inhomogeneous two-dimensional functions, or spatial fields. The empirical performance of the proposed method is evaluated through numerical examples, which demonstrate promising results.

17.
This article presents a Bayesian kernel-based clustering method. The associated model arises as an embedding of the Potts density for class membership probabilities into an extended Bayesian model for joint data and class membership probabilities. The method may be seen as a principled extension of the super-paramagnetic clustering. The model depends on two parameters: the temperature and the kernel bandwidth. The clustering is obtained from the posterior marginal adjacency membership probabilities and does not depend on any particular value of the parameters. We elicit an informative prior based on random graph theory and kernel density estimation. A stochastic population Monte Carlo algorithm, based on parallel runs of the Wang–Landau algorithm, is developed to estimate the posterior adjacency membership probabilities and the parameter posterior. The convergence of the algorithm is also established. The method is applied to the whole human proteome to uncover human genes that share common evolutionary history. Our experiments and application show that good clustering results are obtained at many different values of the temperature and bandwidth parameters. Hence, instead of focusing on finding adequate values of the parameters, we advocate making clustering inference based on the study of the distribution of the posterior adjacency membership probabilities. This article has online supplementary material.

18.
A new online clustering method called E2GK (Evidential Evolving Gustafson–Kessel) is introduced. This partitional clustering algorithm is based on the concept of a credal partition, defined in the theoretical framework of belief functions. A credal partition is derived online by an algorithm adapted from the Evolving Gustafson–Kessel (EGK) algorithm, making it possible to partition data streams online with a meaningful interpretation of the data structure. A comparative study with the original online procedure shows that E2GK outperforms EGK on several input data sets. To demonstrate its performance, experiments were conducted on synthetic data sets as well as on data collected from a real application. A study of parameter sensitivity is also carried out, and solutions are proposed to limit complexity issues.

19.
A modified version of the Akaike information criterion and two modified versions of the Bayesian information criterion are proposed to select the number of principal components and to choose the penalty parameters of penalized splines in a joint model of paired functional data. Numerical results show that, compared with an existing procedure based on cross-validation, the procedure based on the information criteria is computationally much faster while giving similar performance.
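The flavor of criterion-based tuning can be shown on a toy problem: pick a model's complexity by minimizing BIC instead of cross-validating, evaluating each candidate with one cheap fit. This is only an analogy under stated assumptions (polynomial degree standing in for the paper's number of principal components and spline penalty parameters):

```python
import numpy as np

def bic_poly_degree(t, y, max_degree=8):
    # Choose the polynomial degree minimizing
    # BIC = n * log(RSS / n) + (number of parameters) * log(n).
    n = len(y)
    best_d, best_bic = 0, np.inf
    for d in range(max_degree + 1):
        coef = np.polyfit(t, y, d)
        rss = float(((np.polyval(coef, t) - y) ** 2).sum())
        bic = n * np.log(rss / n) + (d + 1) * np.log(n)
        if bic < best_bic:
            best_d, best_bic = d, bic
    return best_d
```

Each candidate costs a single least-squares fit, whereas cross-validation would refit every candidate once per fold; this is the computational gap the numerical results above refer to.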

20.
Increasingly, scientific studies yield functional image data, in which the observed data consist of sets of curves recorded on the pixels of the image. Examples include temporal brain response intensities measured by fMRI and NMR frequency spectra measured at each pixel.

This article presents a new methodology for improving the characterization of pixels in functional imaging, formulated as a spatial curve clustering problem. Our method operates on curves as a unit. It is nonparametric and involves multiple stages: (i) wavelet thresholding, aggregation, and Neyman truncation to effectively reduce dimensionality; (ii) clustering based on an extended EM algorithm; and (iii) multiscale penalized dyadic partitioning to create a spatial segmentation. We motivate the different stages with theoretical considerations and arguments, and illustrate the overall procedure on simulated and real datasets. Our method appears to offer substantial improvements over monoscale pixel-wise methods.

An Appendix giving some theoretical justifications of the methodology, together with computer code, documentation, and the dataset, is available in the online supplements.
