Similar literature
20 similar documents found
1.
In this work we address a technique for effectively clustering points in specific convex sets, called homogeneous boxes, whose sides are aligned with the coordinate axes (isothetic condition). The proposed clustering approach is based on homogeneity conditions rather than on a distance measure and, although it was originally developed in the context of the logical analysis of data, it is now placed inside the framework of supervised clustering. First, we introduce the basic concepts in box geometry; then, we consider a generalized clustering algorithm based on a class of graphs, called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets, and compare their overall performance to the accuracy levels of competing methods for a wide range of real data sets. The results show that the proposed method performs comparably with other supervised learning methods in terms of accuracy.
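
The box idea in this abstract can be illustrated with a minimal sketch. It assumes that "homogeneous" means the box contains labelled points of a single class only; the abstract does not spell out the definition, and the function names `bounding_box`, `contains` and `is_homogeneous` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned (isothetic) box that contains all given points."""
    dims = range(len(points[0]))
    lower = [min(p[i] for p in points) for i in dims]
    upper = [max(p[i] for p in points) for i in dims]
    return lower, upper


def contains(box: Box, p: Point) -> bool:
    """True if p lies inside the closed box."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, p, upper))


def is_homogeneous(box: Box, points: List[Point], labels: List[str]) -> bool:
    """Assumed definition: all labelled points inside the box share one label."""
    labels_inside = {lab for p, lab in zip(points, labels) if contains(box, p)}
    return len(labels_inside) <= 1


points = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (0.5, 0.5)]
labels = ["a", "a", "b", "a"]
box_a = bounding_box([p for p, lab in zip(points, labels) if lab == "a"])
box_all = bounding_box(points)
```

Here `box_a` spans only class "a" points, while `box_all` also covers the class "b" point and so fails the homogeneity check.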

2.
Fuzzy Sets and Systems, 2004, 141(2): 301-317
This paper presents fuzzy clustering algorithms for mixed features of symbolic and fuzzy data. El-Sonbaty and Ismail proposed fuzzy c-means (FCM) clustering for symbolic data and Hathaway et al. proposed FCM for fuzzy data. In this paper we give a modified dissimilarity measure for symbolic and fuzzy data and then give FCM clustering algorithms for these mixed data types. Numerical examples and comparisons illustrate that the modified dissimilarity gives better results. Finally, the proposed clustering algorithm is applied to real data with mixed feature variables of symbolic and fuzzy data.
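
For orientation, the standard FCM membership update can be sketched as follows. This is the textbook rule, not the paper's algorithm: the modified dissimilarity for symbolic and fuzzy data is not given in the abstract, so the sketch takes a precomputed dissimilarity table as input. The name `fcm_memberships` and the layout `dissim[i][k]` (cluster `i`, object `k`, values treated like squared distances) are assumptions.

```python
from typing import List


def fcm_memberships(dissim: List[List[float]], m: float = 2.0) -> List[List[float]]:
    """Standard FCM update: u[i][k] = 1 / sum_j (D[i][k] / D[j][k]) ** (1 / (m - 1)).

    dissim[i][k] is the (squared-distance-like) dissimilarity between
    cluster i and object k; m > 1 is the fuzzifier.
    """
    c, n = len(dissim), len(dissim[0])
    exponent = 1.0 / (m - 1.0)
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        # An object that coincides with one or more prototypes gets its
        # membership shared equally among exactly those clusters.
        zeros = [i for i in range(c) if dissim[i][k] == 0.0]
        if zeros:
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            total = sum((dissim[i][k] / dissim[j][k]) ** exponent for j in range(c))
            u[i][k] = 1.0 / total
    return u


# Two clusters, two objects (illustrative numbers only).
u = fcm_memberships([[1.0, 4.0], [3.0, 4.0]], m=2.0)
```

For each object the memberships over all clusters sum to one, so swapping in a different dissimilarity changes the weights but not this normalization.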

3.
We consider the number $K_n$ of clusters at a distance level $d_n \in (0, 1)$ of $n$ independent random variables uniformly distributed in $[0, 1]$, or the number $K_n$ of connected components in the random interval graph generated by these variables and $d_n$, and, depending upon how fast $d_n \to 0$ as $n \to \infty$, determine the asymptotic distribution of $K_n$, with rates of convergence, and of related random variables that describe the cluster sizes. © 2004 Wiley Periodicals, Inc. Random Struct. Alg., 2004
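
The quantity studied here can be simulated with a few lines of code. The sketch assumes that two points belong to the same cluster when consecutive sorted points are at most the distance level apart, so the cluster count is one plus the number of gaps that exceed it; the function names are hypothetical.

```python
import random
from typing import Iterable


def count_clusters(points: Iterable[float], d: float) -> int:
    """Number of clusters of points on the line at distance level d.

    Assumption: consecutive sorted points are linked when their gap is <= d,
    so each gap larger than d starts a new cluster.
    """
    xs = sorted(points)
    if not xs:
        return 0
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if b - a > d)


def simulate_kn(n: int, d: float, seed: int = 0) -> int:
    """One draw of the cluster count for n uniform points on [0, 1]."""
    rng = random.Random(seed)
    return count_clusters([rng.random() for _ in range(n)], d)


example = count_clusters([0.1, 0.15, 0.5, 0.52, 0.9], 0.1)
```

In the example the gaps 0.35 and 0.38 exceed the level 0.1, giving three clusters; repeated calls to `simulate_kn` with different seeds give an empirical picture of the distribution the abstract analyses.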

4.
Advances in Data Analysis and Classification - The Ministry of Social Development in Mexico is in charge of creating and assigning social programmes targeting specific needs in the population for...

5.
For clustering objects, we often collect not only continuous variables, but binary attributes as well. This paper proposes a model-based clustering approach with mixed binary and continuous variables where each binary attribute is generated by a latent continuous variable that is dichotomized with a suitable threshold value, and where the scores of the latent variables are estimated from the binary data. In economics, such variables are called utility functions and the assumption is that the binary attributes (the presence or the absence of a public service or utility) are determined by low and high values of these functions. In genetics, the latent response is interpreted as the "liability" to develop a qualitative trait or phenotype. The estimated scores of the latent variables, together with the observed continuous ones, allow the use of a multivariate Gaussian mixture model for clustering, instead of a mixture of discrete and continuous distributions. After describing the method, this paper presents the results for both simulated and real-case data and compares the performance of the multivariate Gaussian mixture model with that of a mixture of joint multivariate and multinomial distributions. Results show that the former model outperforms the mixture model for variables with different scales, both in terms of classification error rate and reproduction of the cluster means.

6.
We propose a new procedure for sparse factor analysis (FA) such that each variable loads only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices and their least squares function is minimized through specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
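
The link between a sparsest loading matrix and variable clustering can be shown with a small sketch. It only reads clusters off a loading matrix that already has one nonzero entry per row; it does not implement the SSFA estimation itself, and the name `clusters_from_loadings` and the numbers are illustrative assumptions.

```python
from typing import Dict, List


def clusters_from_loadings(loadings: List[List[float]]) -> Dict[int, List[int]]:
    """Group variables (rows) by the single factor (column) they load on."""
    clusters: Dict[int, List[int]] = {}
    for var, row in enumerate(loadings):
        nonzero = [j for j, value in enumerate(row) if value != 0.0]
        if len(nonzero) != 1:
            raise ValueError(f"row {var} must have exactly one nonzero loading")
        clusters.setdefault(nonzero[0], []).append(var)
    return clusters


# Four variables, two common factors (made-up loadings).
loadings = [
    [0.8, 0.0],
    [0.0, 0.5],
    [-0.7, 0.0],
    [0.0, 0.9],
]
groups = clusters_from_loadings(loadings)
```

Variables 0 and 2 load on factor 0 and variables 1 and 3 on factor 1, which is the variable clustering the abstract refers to.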

7.
Let $X = (X_t)_{t \in [0,1]}$ be a stochastic process with label $Y \in \{0, 1\}$. We assume that $X$ is some Brownian diffusion when $Y = 0$, while $X$ is another Brownian diffusion when $Y = 1$. Based on an explicit computation of the Bayes rule, we construct an empirical classification rule $\hat g$ drawn from an i.i.d. sample of copies of $(X, Y)$. In a nonparametric setting, we prove that $\hat g$ is a consistent rule, and we derive its rate of convergence under mild assumptions on the model.

8.
Supervised fuzzy pattern recognition   Total citations: 1, self-citations: 0, citations by others: 1
This paper is devoted to the problem of supervised fuzzy pattern recognition. The cases with non-fuzzy and fuzzy labels are considered. Based on the properties of linearly separable fuzzy classes, some algorithms are proposed for building matching functions of these classes. All algorithms are computer oriented and can be implemented for the automatic recognition of fuzzy patterns.

9.
Let $\rho$ be an unknown Borel measure defined on the space $Z := X \times Y$ with $X \subset \mathbb{R}^d$ and $Y = [-M, M]$. Given a set $z$ of $m$ samples $z_i = (x_i, y_i)$ drawn according to $\rho$, the problem of estimating a regression function $f_\rho$ using these samples is considered. The main focus is to understand what is the rate of approximation, measured either in expectation or probability, that can be obtained under a given prior $f_\rho \in \Theta$, i.e., under the assumption that $f_\rho$ is in the set $\Theta$, and what are possible algorithms for obtaining optimal or semioptimal (up to logarithms) results. The optimal rate of decay in terms of $m$ is established for many priors given either in terms of smoothness of $f_\rho$ or its rate of approximation measured in one of several ways. This optimal rate is determined by two types of results. Upper bounds are established using various tools in approximation such as entropy, widths, and linear and nonlinear approximation. Lower bounds are proved using Kullback-Leibler information together with Fano inequalities and a certain type of entropy. A distinction is drawn between algorithms which employ knowledge of the prior in the construction of the estimator and those that do not. Algorithms of the second type which are universally optimal for a certain range of priors are given.

10.
In high-dimensional data, one often seeks a few interesting low-dimensional projections that reveal important features of the data. Projection pursuit is a procedure for searching high-dimensional data for interesting low-dimensional projections via the optimization of a criterion function called the projection pursuit index. Very few projection pursuit indices incorporate class or group information in the calculation. Hence, they cannot be adequately applied in supervised classification problems to provide low-dimensional projections revealing class differences in the data. This article introduces new indices derived from linear discriminant analysis that can be used for exploratory supervised classification.

11.
Advances in Data Analysis and Classification - We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) (Cox in J R Stat Soc Ser B (Methodol)...

12.
Principal component analysis (PCA) is an important tool for dimension reduction in multivariate analysis. Regularized PCA methods, such as sparse PCA and functional PCA, have been developed to incorporate special features in many real applications. Sometimes additional variables (referred to as supervision) are measured on the same set of samples, which can potentially drive low-rank structures of the primary data of interest. Classical PCA methods cannot make use of such supervision data. In this article, we propose a supervised sparse and functional principal component (SupSFPC) framework that can incorporate supervision information to recover underlying structures that are more interpretable. The framework unifies and generalizes several existing methods and flexibly adapts to the practical scenarios at hand. The SupSFPC model is formulated in a hierarchical fashion using latent variables. We develop an efficient modified expectation-maximization (EM) algorithm for parameter estimation. We also implement fast data-driven procedures for tuning parameter selection. Our comprehensive simulation and real data examples demonstrate the advantages of SupSFPC. Supplementary materials for this article are available online.

13.
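Supervised fuzzy pattern recognition cross-iteration model   Total citations: 2, self-citations: 0, citations by others: 2
Starting from the concept of fuzzy pattern recognition, this paper establishes a supervised fuzzy pattern recognition cross-iteration algorithm that, supervised by the decision maker's experience and preferences, determines the optimal objective weights and the optimal membership degrees of the alternatives while identifying each alternative's grade of superiority. The algorithm integrates subjective and objective weight identification methods for the cases in which decision preference information is completely unknown, partially unknown, or completely known. The local convergence of the algorithm is rigorously proved.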

14.
Stability is a major requirement to draw reliable conclusions when interpreting results from supervised statistical learning. In this article, we present a general framework for assessing and comparing the stability of results, which can be used in real-world statistical learning applications as well as in simulation and benchmark studies. We use the framework to show that stability is a property of both the algorithm and the data-generating process. In particular, we demonstrate that unstable algorithms (such as recursive partitioning) can produce stable results when the functional form of the relationship between the predictors and the response matches the algorithm. Typical uses of the framework in practical data analysis would be to compare the stability of results generated by different candidate algorithms for a dataset at hand or to assess the stability of algorithms in a benchmark study. Code to perform the stability analyses is provided in the form of an R package. Supplementary material for this article is available online.

15.
16.
Min-cut clustering   Total citations: 1, self-citations: 0, citations by others: 1
We describe a decomposition framework and a column generation scheme for solving a min-cut clustering problem. The subproblem to generate additional columns is itself an NP-hard mixed integer programming problem. We discuss strong valid inequalities for the subproblem and describe some efficient solution strategies. Computational results on compiler construction problems are reported. This paper is dedicated to Phil Wolfe on the occasion of his 65th birthday. This research was supported by NSF grants DMS-8719128 and DDM-9115768, and by an IBM grant to the Computational Optimization Center, Georgia Institute of Technology.

17.
In many clustering systems (hierarchies, pyramids and more generally weak hierarchies) clusters are generated by two elements only. This paper is devoted to such clustering systems (called binary clustering systems). It provides some basic properties, links with (closed) weak hierarchies and some qualitative versions of bijection theorems that occur in Numerical Taxonomy. Moreover, a way to associate a binary clustering system to every clustering system is discussed. Finally, introducing the notion of weak ultrametrics, a bijection between indexed weak hierarchies and weak ultrametrics is obtained (the standard theorem involves closed weak hierarchies and quasi-ultrametrics).

18.
Fuzzy variables     
The purpose of this study is to explore a possible axiomatic framework from which a rigorous theory of fuzziness may be constructed. The approach we propose is analogous to the sample space concept of probability theory. A fuzzy variable is a mapping from an abstract space (called the pattern space) onto the real line. The membership function is obtained as the extension of a special type of capacity (called a scale) from the pattern space to the real line via the fuzzy variable. In essence we are proposing an entirely new definition of a fuzzy set on the line as a mapping to the line rather than on the line. The current definition of a transformation of a fuzzy set is obtained as a derived result of our model. In addition, we derive the membership function of sums and products of fuzzy sets and present an example which reinforces the credibility of our approach.

19.
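A new supervised learning algorithm for probabilistic neural networks   Total citations: 3, self-citations: 0, citations by others: 3
A new supervised learning algorithm for probabilistic neural networks (PNN) is proposed: learning vector quantization is used to cluster the training samples of each class; regions are constructed for the smoothing parameter σ and for the cluster points nearest to the center of each pattern class; and a genetic algorithm trains the network within the constructed regions. Experiments show that the algorithm outperforms other PNN learning algorithms in classification performance.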
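
As background for this entry, the basic PNN decision rule can be sketched as follows, assuming a Gaussian kernel with one shared smoothing parameter `sigma`. The learning vector quantization step and the genetic-algorithm training described in the abstract are not shown; `pnn_classify` and the pattern-center data are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def pnn_classify(
    x: Sequence[float],
    centers: Dict[Hashable, List[Sequence[float]]],
    sigma: float,
) -> Hashable:
    """Return the class whose pattern centers give the largest mean kernel response.

    centers maps each class label to its pattern centers (for example, the
    cluster points produced by a prior clustering step).
    """
    best_label, best_score = None, -1.0
    for label, patterns in centers.items():
        score = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * sigma ** 2))
            for p in patterns
        ) / len(patterns)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


centers = {
    "a": [(0.0, 0.0), (0.0, 1.0)],
    "b": [(5.0, 5.0), (6.0, 5.0)],
}
near_a = pnn_classify((0.2, 0.4), centers, sigma=1.0)
near_b = pnn_classify((5.5, 5.0), centers, sigma=1.0)
```

The choice of `sigma` controls how quickly a center's influence decays with distance, which is why the abstract treats it as a parameter to be tuned.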

20.
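Yuan Yuan, Guo Jinli. Operations Research and Management Science (运筹与管理), 2022, 31(12): 234-239
Complex networks have become a general method for analyzing complex systems. With the rise of artificial intelligence and machine learning, more and more scholars have turned their attention to machine learning on complex networks. Supervised learning is an important part of machine learning, and this paper studies and summarizes supervised learning methods based on complex networks. First, starting from the theoretical foundations of complex networks and of supervised learning, it clarifies the concepts and measurement methods of similarity and dissimilarity functions, systematically reviews methods for constructing complex networks, and explains the concept of supervised learning and its place in machine learning. Second, it introduces several commonly used supervised learning algorithms and reviews the state of research on each. Then, it proposes directions for future attention in complex-network-based supervised learning. Finally, it describes the limitations of these methods, providing a reference for researchers in the field.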
