Similar Articles
1.
Quasi-ultrametric multi-way dissimilarities and their respective sets of k-balls extend the fundamental bijection in classification between ultrametric pairwise dissimilarities and indexed hierarchies. We show that the nonempty Galois closed subsets of a finite entity set coincide with the k-balls of some quasi-ultrametric multi-way dissimilarity. This result relates the order-theoretic, Galois-connection-based approach to clustering to the dissimilarity-based one. Moreover, it provides an effective way to specify easy-to-interpret cluster systems from complex data sets and to derive informative attribute implications.
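A small example makes the notion of a Galois closed subset concrete. The sketch below assumes a binary entity–attribute context stored as a dict from each entity to its set of attributes. That representation and the function names are hypothetical choices for illustration, not taken from the paper. A subset of entities is closed when it equals the set of all entities that share every attribute common to the subset.

```python
def common_attributes(entities, context):
    """Derivation A -> A': attributes shared by every entity in the subset."""
    if not entities:
        # Convention: the empty subset maps to the full attribute set.
        return set().union(*context.values())
    return set.intersection(*(set(context[e]) for e in entities))


def entities_having(attributes, context):
    """Derivation B -> B': entities that possess every attribute in B."""
    return {e for e, attrs in context.items() if attributes <= set(attrs)}


def closure(entities, context):
    """Galois closure A -> A''; always a superset of A."""
    return entities_having(common_attributes(entities, context), context)


def is_closed(entities, context):
    return closure(entities, context) == set(entities)


# Hypothetical toy context: four entities described by attributes a-d.
context = {
    "x1": {"a", "b"},
    "x2": {"a", "b", "c"},
    "x3": {"a", "c"},
    "x4": {"d"},
}
print(sorted(closure({"x1"}, context)))  # ['x1', 'x2']
```

In this toy context, {"x1"} is not closed: every entity having both a and b also includes x2. By contrast, {"x1", "x2"} is closed.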

2.
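A New Classification Method (cited 5 times: 0 self-citations, 5 by others)
Building on attribute-clustering networks, this paper proposes a pile nearest-neighbor classification method. By adding supervised information to unsupervised attribute clustering, the method adaptively selects the optimal number of piles. The number of neighbors examined for a sample depends on the size of the pile it belongs to, so different samples do not necessarily examine the same number of neighbors. The method is applicable to classification problems with high-dimensional, small-sample data. Applied to cancer identification from gene expression profiles, it substantially improved classification performance.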

3.
In this work, we assess the suitability of cluster analysis for the gene grouping problem posed by microarray data. Gene clustering is the exercise of grouping genes based on attributes, which are generally the expression levels over a number of conditions or subpopulations. The hope is that similarity with respect to expression is often indicative of similarity with respect to much more fundamental and elusive qualities, such as function. By formally defining the true gene-specific attributes as parameters, such as expected expression across the conditions, we obtain a well-defined gene clustering parameter of interest, which greatly facilitates the statistical treatment of gene clustering. We point out that genome-wide collections of expression trajectories often lack natural clustering structure prior to ad hoc gene filtering. The gene filters in common use introduce a certain circularity into most gene cluster analyses: genes are points in the attribute space, a filter is applied to depopulate certain areas of the space, and then clusters are sought (and often found!) in the “cleaned” attribute space. As a result, statistical investigations of cluster number and clustering strength are as much a study of the stringency and nature of the filter as of any biological gene clusters. In the absence of natural clusters, gene clustering may still be a worthwhile exercise in data segmentation. In this context, partitions can be fruitfully encoded in adjacency matrices, and the sampling distribution of such matrices can be studied with a variety of bootstrapping techniques.
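A partition can be encoded as an adjacency (co-membership) matrix M, where M[i][j] = 1 when items i and j fall in the same cluster. The sketch below assumes each clustering run assigns cluster labels to the same n genes, for instance after resampling the conditions. Averaging the matrices over such runs then estimates how often each pair of genes is co-clustered. The function names are hypothetical, and the specific bootstrapping schemes studied in the paper are not reproduced.

```python
def comembership(labels):
    """Adjacency matrix of a partition: entry (i, j) is 1 if items i and j share a cluster."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]


def mean_comembership(label_runs):
    """Entry-wise average of adjacency matrices from several clusterings of the same items.

    With bootstrap replicates, entry (i, j) estimates how often items i and j co-cluster.
    """
    mats = [comembership(labels) for labels in label_runs]
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


# Two hypothetical clusterings of the same three genes.
print(mean_comembership([[0, 0, 1], [0, 1, 1]]))
# [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
```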

4.
Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose various structural conditions on the clustering schemes, under the general heading of functoriality. Functoriality refers to the idea that one should be able to compare the results of clustering algorithms as one varies the dataset, for example by adding points or by applying functions to it. We show that, within this framework, one can prove a theorem analogous to one of Kleinberg (Becker et al. (eds.), NIPS, pp. 446–453, MIT Press, Cambridge, 2002), in which, for example, one obtains an existence and uniqueness theorem instead of a nonexistence result. We obtain a full classification of all clustering schemes satisfying a condition we refer to as excisiveness. The classification can be changed by varying the notion of maps of finite metric spaces. The conditions occur naturally when one considers clustering as the statistical version of the geometric notion of connected components. By varying the degree of functoriality required of the schemes, it is possible to construct richer families of clustering schemes that are sensitive to density.
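The abstract links these conditions to connected components. The sketch below illustrates that link with the standard single-linkage construction; it does not reproduce the paper's classification. At scale r, a finite metric space is clustered by taking the connected components of the graph that joins points at distance at most r, computed with union-find. The function name `single_linkage_at` is hypothetical.

```python
def single_linkage_at(dist, r):
    """Connected components of the graph with an edge i-j whenever dist[i][j] <= r."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= r:
                parent[find(i)] = find(j)  # merge the two components

    # Group indices by root; each block is in increasing order, so sorting orders blocks by first member.
    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


pts = [0, 1, 5, 6, 20]
dist = [[abs(a - b) for b in pts] for a in pts]
print(single_linkage_at(dist, 1))  # [[0, 1], [2, 3], [4]]
print(single_linkage_at(dist, 4))  # [[0, 1, 2, 3], [4]]
```

Two operations can only merge blocks and never separate points that were already together: enlarging r, and adding points to the space at a fixed r. This compatibility with maps between metric spaces is the kind of behavior the functorial framework formalizes.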

5.
Sequential clustering aims to determine homogeneous and/or well-separated clusters within a given set of entities, one at a time, until no more such clusters can be found. We consider a bi-criterion sequential clustering problem with two criteria. The radius of a cluster (the maximum dissimilarity between an entity chosen as center and any other entity of the cluster) is the homogeneity criterion. The split of a cluster (the minimum dissimilarity between an entity in the cluster and one outside it) is the separation criterion. An O(N^3) algorithm is proposed for determining the radii and splits of all efficient clusters, which leads to an O(N^4) algorithm for bi-criterion sequential clustering with radius and split as criteria. The algorithm is illustrated on the well-known Ruspini data set.
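The radius and split criteria defined above can be evaluated for a single candidate cluster given a dissimilarity matrix. The sketch below does only that; it is not the paper's O(N^3) enumeration of efficient clusters. The function names, including `best_radius` (which picks the member that minimizes the radius as the center), are hypothetical.

```python
def radius(d, cluster, center):
    """Homogeneity: largest dissimilarity between the chosen center and a cluster member."""
    return max(d[center][x] for x in cluster)


def best_radius(d, cluster):
    """Radius when the center is the cluster member that minimizes it."""
    return min(radius(d, cluster, c) for c in cluster)


def split(d, cluster):
    """Separation: smallest dissimilarity between a member and an entity outside the cluster."""
    inside = set(cluster)
    outside = [y for y in range(len(d)) if y not in inside]
    if not outside:
        return float("inf")  # the whole entity set has no outside entity
    return min(d[x][y] for x in inside for y in outside)


pts = [0, 1, 2, 10, 11]
d = [[abs(a - b) for b in pts] for a in pts]
print(best_radius(d, [0, 1, 2]), split(d, [0, 1, 2]))  # 1 8
```

A cluster is attractive when its radius is small (homogeneous) and its split is large (well separated). The bi-criterion problem trades these two values off against each other.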

6.
In this paper we present a new method, named CL.E.KMODES, for clustering categorical data sets. The proposed method is a modified k-modes algorithm that incorporates a new four-step dissimilarity measure based on elements of the methodological framework of the ELECTRE I multicriteria method. The four-step dissimilarity measure introduces an alternative and more accurate way of assigning objects to clusters. In particular, it compares each object with each mode on every attribute they have in common, and then chooses the most appropriate mode, and hence cluster, for that object. The robustness of the proposed method is verified on seven widely used data sets using six clustering evaluation measures.
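The sketch below shows the classic k-modes baseline, which uses simple matching dissimilarity. It is offered as a reference point for the method above. The paper's ELECTRE I-based four-step measure would take the place of `matching_dissimilarity` and is not reproduced here. All function names and the toy records are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def update_mode(records):
    """Attribute-wise most frequent category; returns () for an empty cluster."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def assign(records, modes):
    """Index of the nearest mode for each record (ties go to the lowest index)."""
    return [min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
            for r in records]


def k_modes(records, modes, max_iter=10):
    labels = None
    for _ in range(max_iter):
        new_labels = assign(records, modes)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # An empty cluster yields () from update_mode, so it keeps its previous mode.
        modes = [update_mode([r for r, l in zip(records, labels) if l == k]) or modes[k]
                 for k in range(len(modes))]
    return labels, modes


records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]
labels, modes = k_modes(records, [records[0], records[2]])
print(labels)  # [0, 0, 1, 1]
```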

7.
Multicriteria conflict arises in pairwise comparisons where each alternative outperforms the other on some criterion, which imposes a trade-off. Comparing two alternatives can be difficult if their respective advantages are of high magnitude (that is, if the attribute spread is large). In this paper, we investigate the extent to which conflict in a comparison situation can lead decision makers to express incomplete preferences, that is, to refuse to compare the two alternatives or to be unable to compare them with confidence. We report on an experiment in which subjects expressed preferences on pairs of alternatives involving varying degrees of conflict. The results show that attribute spread has a different effect depending on whether participants are allowed to express incomplete preferences. When incomparability statements are available, a large attribute spread increases their frequency. When only indifference and preference answers are permitted, it increases the use of indifference statements instead. From these results we derive some implications for preference elicitation methods involving comparison tasks.

8.
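Fuzzy techniques are now applied in many intelligent systems, for example in fuzzy relations and fuzzy clustering. Clustering is an important data mining task with considerable research and application value. It partitions data objects into clusters so that objects in the same cluster have highly similar attribute features. Drawing on database mining techniques, this paper addresses multi-attribute decision problems whose attribute values are interval numbers. For these problems, it proposes an interval fuzzy ISODATA dynamic clustering method based on interval-number membership degrees.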

9.
The attribute-oriented induction (AOI) method is one of the most important data mining methods. Its input consists of a relational table and a concept tree (concept hierarchy) for each attribute. Its output is a small relation summarizing the general characteristics of the task-relevant data. Although AOI is very useful for inducing general characteristics, it can only be applied to relational data, where there is no order among the data items. If the data are ordered, existing AOI methods cannot find the generalized knowledge. To address this weakness, this paper proposes a dynamic programming algorithm, based on AOI techniques, for finding generalized knowledge in an ordered list of data. The algorithm discovers a sequence of K generalized tuples describing the general characteristics of different segments of data along the list, where K is a user-specified parameter.
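The abstract does not give the recurrence, so the following is only a generic illustration of the dynamic-programming idea. An ordered list is split into K contiguous segments so that the total per-segment cost is minimal. The cost used here is a hypothetical stand-in for the loss incurred by describing a segment with a single generalized tuple: the number of items that differ from the segment's most frequent value. The function names are also hypothetical.

```python
from collections import Counter


def segment_cost(seg):
    """Hypothetical cost: number of items that differ from the segment's most frequent value."""
    return len(seg) - Counter(seg).most_common(1)[0][1]


def best_segmentation(items, k):
    """Split items (ordered, len >= k) into k non-empty contiguous segments of minimal total cost."""
    n = len(items)
    INF = float("inf")
    # best[j][i]: minimal cost of splitting items[:i] into j non-empty segments.
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):  # the last segment is items[s:i]
                c = best[j - 1][s] + segment_cost(items[s:i])
                if c < best[j][i]:
                    best[j][i], cut[j][i] = c, s
    # Walk back through the stored cut points to recover the segments.
    segments, i = [], n
    for j in range(k, 0, -1):
        s = cut[j][i]
        segments.append(items[s:i])
        i = s
    return best[k][n], segments[::-1]


print(best_segmentation("aaabbbaa", 3))  # (0, ['aaa', 'bbb', 'aa'])
```

The table has O(K·n) entries. Each entry scans O(n) split points, so a cost function that runs in constant time after precomputation gives O(K·n^2) overall. The naive `segment_cost` used here is slower.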

10.
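This paper addresses uncertain multi-attribute decision making in which attribute information is unevenly distributed and most evaluation information is two-dimensional. For this setting, it proposes an attribute information aggregation method based on the two-dimensional interval density weighted (TDIDW) operator. Following the characteristics of the density operator's aggregation process, the paper first defines the TDIDW operator and its composite operator. It then presents a method for grouping evaluation information based on grey interval clustering, and a method for determining the density weighting vector based on a nonlinear model. Finally, it verifies the approach with a numerical example. The results show that the method effectively addresses the inaccurate evaluation results caused by unevenly distributed attribute information.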

11.
Journal of Complexity, 2002, 18(1): 375–391
The process of partitioning a large set of patterns into disjoint and homogeneous clusters is fundamental in knowledge acquisition. It is called clustering in the literature and is applied in various fields, including data mining, statistical data analysis, compression and vector quantization. k-means is a very popular algorithm and one of the best for implementing the clustering process. Its time complexity is dominated by the product of the number of patterns, the number of clusters and the number of iterations, and it often converges to a local minimum. In this paper, we present an improvement of the k-means clustering algorithm that aims at better time complexity and partitioning accuracy. Our approach uses a windowing technique to reduce the number of patterns that need to be examined for similarity in each iteration. The windowing is based on a well-known spatial data structure, the range tree, which allows fast range searches.
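For reference, the sketch below is a plain Lloyd-style k-means, the baseline whose cost the paper aims to reduce. Each iteration computes the distance from every one of the n patterns to every one of the k centers. That is where the n × k × (number of iterations) cost comes from. The range-tree windowing that prunes this work is not reproduced, and the function name and toy data are hypothetical.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means on tuples of numbers, starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: n * k squared-distance evaluations per iteration.
        new_labels = [min(range(len(centers)),
                          key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])))
                      for pt in points]
        if new_labels == labels:
            break  # converged, possibly to a local minimum
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for k in range(len(centers)):
            members = [pt for pt, l in zip(points, labels) if l == k]
            if members:
                centers[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centers


labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(labels, centers)  # [0, 0, 1, 1] [(0.0, 0.5), (10.0, 10.5)]
```

The result depends on the initial centers, which is one reason k-means can stop at a local minimum.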

12.
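Intuitionistic Fuzzy Clustering Based on Set-Valued Statistics (cited once: 0 self-citations, 1 by others)
This paper considers clustering problems in which sample attribute values cannot be given directly, or can be given only with difficulty. For such problems, it proposes an intuitionistic fuzzy clustering method based on set-valued statistics. Formulas for the attribute intuitionistic fuzzy similarity matrix are discussed for four cases, and the steps of the clustering method are given. Finally, the method is illustrated and analyzed with a numerical example.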

13.
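This paper addresses a class of cluster analysis problems with uncertain, interval-valued multi-index information. Following the approach of the traditional FCM clustering algorithm for numerical data, it proposes a new clustering algorithm. The paper first describes the clustering problem with interval-valued multi-index information. Second, it gives two theorems for this kind of information, on determining the optimal partition and the optimal cluster centers. It then gives the computational steps of the FCM clustering algorithm for interval-valued multi-index information. A feature of the algorithm is that cluster centers are represented as exact numerical values, and the two theorems establish the algorithm's convergence. Finally, a numerical example illustrates the proposed algorithm.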
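For reference, the standard (numerical) FCM alternates the two updates below. The algorithm above is described as following this scheme for interval-valued data. The notation is the usual textbook one and not necessarily the paper's: n objects x_k, c cluster centers v_i, memberships u_{ik}, and a fuzzifier m > 1. The membership update assumes x_k coincides with no center.

```latex
J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^{2},
\qquad \sum_{i=1}^{c} u_{ik} = 1 \quad \text{for each } k,

u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)} \right)^{-1},
\qquad
v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}} .
```

Each update minimizes J_m with the other set of variables held fixed, which is why the objective does not increase from one iteration to the next.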

14.
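An attribute matching algorithm based on fuzzy clustering is proposed. The algorithm uses a fuzzy similarity relation that reflects both the name similarity and the semantic similarity of attributes, which improves matching accuracy. Similar attributes are fuzzy-clustered with the equivalence closure method. This yields multi-level attribute classifications that reflect the fuzziness of attribute matching more objectively and faithfully. In addition, no matching parameters need to be set during the matching process, which avoids human-induced error.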
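The equivalence closure step can be sketched as follows. A reflexive, symmetric fuzzy similarity matrix R is repeatedly composed with itself under max-min composition until it stops changing. The result is a fuzzy equivalence relation, and cutting it at a level λ gives a crisp partition; lowering λ gives coarser partitions, hence the multi-level classification. The similarity values and function names below are hypothetical. The paper's name and semantic similarity measures are not reproduced.

```python
def maxmin_compose(a, b):
    """Max-min composition of two fuzzy relations given as square matrices."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square R under max-min composition until stable (R must be reflexive and symmetric)."""
    while True:
        r2 = maxmin_compose(r, r)
        if r2 == r:
            return r
        r = r2


def lambda_cut_classes(r, lam):
    """Classes of the crisp equivalence {(i, j) : r[i][j] >= lam} of a fuzzy equivalence r."""
    classes, seen = [], set()
    for i in range(len(r)):
        if i not in seen:
            cls = [j for j in range(len(r)) if r[i][j] >= lam]
            seen.update(cls)
            classes.append(cls)
    return classes


# Hypothetical similarities between three attributes.
R = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.6], [0.3, 0.6, 1.0]]
T = transitive_closure(R)
print(T)                          # [[1.0, 0.8, 0.6], [0.8, 1.0, 0.6], [0.6, 0.6, 1.0]]
print(lambda_cut_classes(T, 0.7))  # [[0, 1], [2]]
```

With a reflexive R, each squaring can only increase entries, so the loop stops after at most about log2(n) squarings.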

15.
Sub-dominant theory provides efficient tools for clustering. However, it classically works only for ultrametrics and for ad hoc extensions such as Jardine and Sibson's 2-ultrametrics. In this paper we study the extension of the notion of sub-dominant to other distance models in classification that account for overlapping clusters. We prove that a given dissimilarity admits exactly one lower-maximal quasi-ultrametric and exactly one lower-maximal weak k-ultrametric. We also prove the existence of lower-maximal strongly Robinsonian dissimilarities, which in general are not unique. The lower-maximal weak k-ultrametric (for k = 2) and the lower-maximal quasi-ultrametric can be constructed in polynomial time.
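The classical case that the paper generalizes is the sub-dominant ultrametric of a dissimilarity d, the largest ultrametric lying below d. This is the minimax path distance, the one that single linkage produces. The sketch below computes it with a Floyd–Warshall-style minimax update. It covers only the classical ultrametric case, not the quasi-ultrametric or weak k-ultrametric constructions, and the function name is hypothetical.

```python
def subdominant_ultrametric(d):
    """Largest ultrametric u <= d: u[i][j] is the minimum, over paths i..j, of the largest step."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Allow paths through k: a path's cost is its largest step.
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


d = [[0, 2, 7], [2, 0, 3], [7, 3, 0]]
print(subdominant_ultrametric(d))  # [[0, 2, 3], [2, 0, 3], [3, 3, 0]]
```

The direct dissimilarity 7 between entities 0 and 2 is lowered to 3, because the path 0-1-2 never takes a step larger than 3.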

16.
An axiomatization of the Choquet integral is proposed in the context of multiple criteria decision making without any commensurability assumption. The key axiom, named Commensurability Through Interaction, states that the importance of an attribute i takes only one or two values when a second attribute k varies. When the importance takes two values, the point of discontinuity is exactly the value on attribute k that is commensurate with the fixed value on attribute i. If the weight of criterion i does not depend on criterion k for any values of the criteria other than i and k, then criteria i and k are independent. Applying this construction to every pair (i, k) of criteria yields a partition of the set of criteria. Within each block, the criteria interact with one another, and it is thus possible to construct vectors of values on the attributes that are commensurate. Criteria in any two different blocks of this partition are completely independent. Hence commensurability between two blocks cannot be ensured. This is not a problem, however, since the Choquet integral is additive between subsets of criteria that are independent.
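The discrete Choquet integral works as follows for a score vector x and a capacity μ. Sort the scores increasingly as x_(1) ≤ … ≤ x_(n), with x_(0) = 0. The integral is the sum of (x_(i) − x_(i−1)) · μ(A_(i)), where A_(i) is the set of criteria ranked i-th or later in this order. The sketch below represents the capacity as a dict keyed by frozensets, a hypothetical choice for illustration. It presumes the scores are already on a common scale, which is exactly what the axiomatization above avoids assuming.

```python
def choquet(x, mu):
    """Discrete Choquet integral of scores x (dict criterion -> value) w.r.t. capacity mu.

    mu maps frozensets of criteria to [0, 1], with mu(empty set) = 0 and mu(all criteria) = 1.
    """
    order = sorted(x, key=x.get)  # criteria by increasing score
    total, prev = 0.0, 0.0
    for idx, crit in enumerate(order):
        upper = frozenset(order[idx:])  # criteria ranked at or after crit in the sorted order
        total += (x[crit] - prev) * mu[upper]
        prev = x[crit]
    return total


# Hypothetical non-additive capacity on two criteria.
mu = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 1.0,
}
print(choquet({"a": 0.6, "b": 0.2}, mu))  # about 0.32, i.e. 0.2 * 1.0 + 0.4 * 0.3
```

When μ is additive, the integral reduces to the ordinary weighted sum of the scores.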

17.
Cluster analysis is an important task in data mining. It consists of grouping a set of objects so that similarities among objects within the same group are maximal while similarities among objects from different groups are minimal. The particle swarm optimization (PSO) algorithm is a well-known metaheuristic optimization algorithm that has been successfully applied to the clustering problem. However, it has two major shortcomings. First, PSO converges rapidly during the initial stages of the search, but its convergence becomes very slow near the global optimum. Second, it may get trapped in a local optimum if the global best and local best values remain equal to the particle's position over a certain number of iterations. In this paper, we hybridize PSO with a heuristic search algorithm to overcome these shortcomings. In the proposed algorithm, called PSOHS, particle swarm optimization produces an initial solution to the clustering problem. A heuristic search algorithm then improves the quality of this solution by searching around it. The superiority of the proposed PSOHS clustering method over other popular clustering methods is established on seven benchmark and real datasets: Iris, Wine, Crude Oil, Cancer, CMC, Glass and Vowel.

18.
Partitioning clustering, a technique for classifying n objects into k disjoint clusters, has been developed over many years and is widely used in many applications. In this paper, a new overlapping clustering approach is defined. It differs from traditional clustering algorithms in three respects. First, it is overlapping, because clusters are allowed to overlap with one another. Second, it is non-exhaustive, because an object may belong to no cluster. Third, the goals considered in this research are to maximize the average number of objects contained in a cluster and to maximize the distances among cluster centers. The goals in previous research, by contrast, are to maximize the similarities of objects in the same cluster and to minimize the similarities of objects in different clusters. Furthermore, the new clustering also differs from traditional fuzzy clustering, because the object–cluster relationship is represented by a crisp value rather than by a fuzzy membership degree. Accordingly, a new overlapping partitioning cluster (OPC) algorithm is proposed to provide overlapping and non-exhaustive clustering of objects. Finally, several simulated and real-world data sets are used to evaluate the effectiveness and efficiency of the OPC algorithm, and the results indicate that the algorithm can generate satisfactory clustering results.

19.
Clustering multimodal datasets can be problematic when a conventional algorithm such as k-means is applied, because of its implicit assumption that the data follow a Gaussian distribution. This paper proposes a tandem clustering process for multimodal datasets. The proposed method first divides the multimodal dataset into many small pre-clusters by applying the k-means or fuzzy k-means algorithm. These pre-clusters are then clustered again by agglomerative hierarchical clustering, using the Kullback–Leibler divergence as the initial measure of dissimilarity. Benchmark results show that the proposed approach is effective at extracting multimodal clusters. It is also efficient in computational time and relatively robust in the presence of outliers.
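The sketch below shows one way the KL divergence could serve as a dissimilarity between two pre-clusters. It assumes, for illustration only, that each pre-cluster is summarized by a univariate Gaussian fitted to its members; the abstract does not say how pre-clusters are summarized. KL divergence is asymmetric, so the sketch also offers a symmetrized version for use as a linkage dissimilarity. The Gaussian summary, the symmetrization and all function names are assumptions of this sketch.

```python
import math
from statistics import mean, pvariance


def gaussian_summary(values):
    """Hypothetical pre-cluster summary: mean and population variance (must be > 0)."""
    return mean(values), pvariance(values)


def kl_gaussian(mu0, var0, mu1, var1):
    """KL(N(mu0, var0) || N(mu1, var1)) for univariate Gaussians."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)


def symmetric_kl(mu0, var0, mu1, var1):
    """Symmetrized divergence, usable as a dissimilarity between two pre-clusters."""
    return kl_gaussian(mu0, var0, mu1, var1) + kl_gaussian(mu1, var1, mu0, var0)


a = gaussian_summary([1.0, 2.0, 3.0])
b = gaussian_summary([2.0, 3.0, 4.0])
print(symmetric_kl(*a, *b))
```

In a tandem scheme, these pairwise values would fill the initial dissimilarity matrix between pre-clusters before agglomerative merging begins.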

20.
Clustering is the process of grouping a set of objects into classes of similar objects. Previous clustering algorithms share a common problem: they use only one set of attributes both for partitioning the data space and for measuring the similarity between objects. This problem has limited the use of existing algorithms in some practical situations. Hence, this paper introduces a new clustering algorithm that partitions the data space by constructing a decision tree using one attribute set and measures the degree of similarity using another. Three different partitioning methods are presented. The algorithm is explained with an illustration. The performance and accuracy of the four partitioning methods are evaluated and compared.
