Similar Literature
Found 20 similar documents; search took 109 ms.
1.
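Combining rough set and fuzzy set theory, a new method for information retrieval is proposed. The method first performs fuzzy clustering of the known text information by keyword; it then uses rough set theory to compute the importance of each keyword; finally, it retrieves text information according to the maximum-similarity principle. If the result set contains a large number of texts, they are sorted by similarity to the known texts from high to low, so that the more similar relevant documents are returned first.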

2.
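Fuzzy techniques are now applied in many intelligent systems, for example fuzzy relations and fuzzy clustering. Clustering is an important task in data mining: it divides data objects into multiple clusters such that the attribute features of objects within the same cluster are highly similar, and it has considerable research and application value. Drawing on data mining techniques for databases, an interval fuzzy ISODATA dynamic clustering method based on interval-number membership degrees is proposed for multi-attribute decision problems whose attribute features are interval numbers.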

3.
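Because information is incomplete, the data we obtain to describe the features of objects are often interval values. Fuzzy clustering analysis based on interval values can preserve more of the information. The paper proposes, for the first time, an interval-valued max-min method for constructing similarity coefficients, gives the computational steps for clustering multi-index interval-number information together with an interval-valued clustering method, and finally illustrates the practical application of the method with a numerical example.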

4.
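Cluster analysis is widely used in medical research and other fields of scientific research. It covers two topics, sample clustering and indicator clustering; this paper attempts to enrich the latter. Indicator clustering is based on a similarity coefficient, and the commonly used one is the simple correlation coefficient [1], from which the maximum correlation coefficient method and the minimum correlation coefficient method of indicator clustering are derived. Neither the maximum nor the minimum correlation coefficient reflects well the correlation between two groups of indicators, so the simple correlation coefficient has certain limitations as a similarity coefficient and may make the indicator clustering results uninterpretable. The canonical correlation coefficient is a direct generalization of the simple correlation coefficient and describes the correlation between two groups of indicators better, so it is natural to use it as the similarity coefficient for indicator clustering. 1. The canonical correlation coefficient [2,3]…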

5.
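A method for determining attribute weights in continuous-valued decision tables   Total citations: 1, self-citations: 0, citations by others: 1
Combining fuzzy clustering with the notion of attribute importance from rough set theory, a method for determining attribute weights is proposed for continuous-valued decision tables. The method first converts the continuous-valued decision table into a fuzzy decision table, then computes the similarity matrix between objects with respect to the condition attributes, then uses the squaring method to turn the similarity matrix into an equivalence matrix, and finally derives the weight of each continuous attribute from fuzzy clustering together with the rough set procedure for attribute importance. An example illustrates that the method is reasonable and effective.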
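The squaring step named in this abstract (turning a fuzzy similarity matrix into an equivalence matrix) can be illustrated with a minimal sketch. It assumes the standard max-min composition and a reflexive, symmetric input matrix; the function names and the sample matrix are hypothetical, and the attribute-weight computation of the paper itself is not reproduced here.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square the matrix repeatedly (R -> R o R) until it stops changing.

    For a reflexive, symmetric fuzzy similarity matrix the fixed point is a
    fuzzy equivalence matrix. Only max and min are applied, so every entry is
    one of the original values and exact comparison is safe.
    """
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(t, lam):
    """Group objects whose equivalence degree is at least lam."""
    clusters, assigned = [], set()
    for i in range(len(t)):
        if i in assigned:
            continue
        group = [j for j in range(len(t)) if t[i][j] >= lam]
        assigned.update(group)
        clusters.append(group)
    return clusters


# Hypothetical similarity matrix for three objects.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
equivalence = transitive_closure(similarity)
```

Cutting the resulting matrix at different thresholds, for example `lambda_cut_clusters(equivalence, 0.8)`, gives progressively coarser or finer groupings of the objects.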

6.
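Based on an orthogonal function system and the FCM algorithm, a new method for time series clustering is proposed. The method first maps a time series of length n into the L_2 space through a nonlinear mapping, and then obtains the similarity between time series by computing the distance between the corresponding functions. On this basis, the time series are clustered with the FCM algorithm. The method overcomes the computational difficulty that the high dimensionality of time series brings to clustering. Experimental results show that, for high-dimensional time series, the method still clusters well at a compression rate of 80%.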
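The abstract names FCM (fuzzy c-means) without spelling out how memberships are assigned. As a minimal sketch, the standard FCM membership update for one point, given its distances to each cluster center, can be written as below. The function name is hypothetical, the fuzzifier `m = 2` is an assumed default, and the paper's mapping into L_2 is not reproduced; the distances are simply taken as given.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM membership update for a single data point.

    distances: distances from the point to each cluster center.
    m: fuzzifier (> 1); m = 2 is a common choice and an assumption here.
    Returns membership degrees that sum to 1.
    """
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0 for d in distances):
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        total = sum(hits)
        return [h / total for h in hits]

    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# A point three times farther from the second center than from the first.
memberships = fcm_memberships([1.0, 3.0])
```

In a full FCM loop this update alternates with recomputing each center as a membership-weighted mean until the memberships stabilize.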

7.
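In a service economy oriented toward customer relationships, customer segmentation is the most effective means for an enterprise to identify customers with similar characteristics and, on that basis, provide targeted products and services. Using data mining techniques, a fuzzy clustering algorithm is applied to customer segmentation to find customers with similar characteristics and value, and an empirical application is carried out in the telecommunications industry.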

8.
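Hypergraph clustering is one of the mainstream clustering methods today. Its classical version arose in research on very-large-scale integrated (VLSI) circuits. In recent years, various improved versions have been proposed and widely applied in machine vision; for example, they often perform well in image clustering and motion segmentation. This paper introduces hypergraph clustering into text clustering. First, given that text data are highly sparse, SVD (or PCA) is used to reduce their dimensionality; second, hypergraph normalized-cut clustering based on large hyperedges is applied to the low-dimensional projection of the texts; finally, clustering accuracy is used to evaluate the result. In experiments on two text data sets, hypergraph normalized-cut clustering outperformed traditional k-means and hierarchical clustering.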

9.
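Image segmentation separates and extracts regions of interest from the background. To keep the feature information of the segmented image complete, a similarity function is constructed from the gray values and spatial distances of the image, yielding a graph-based gray-value similarity matrix; image segmentation is thereby converted into a graph-theoretic minimum-cut problem, which is then solved with a spectral clustering algorithm. Because spectral clustering requires a large amount of memory and computation, a random-sampling spectral clustering algorithm that takes probability into account is proposed. In the implementation, the image is filtered in preprocessing to reduce the influence of background noise on the segmentation. The results show that the algorithm is stable and improves segmentation quality relative to existing algorithms.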

10.
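To address the shortcomings of methods for determining indicator weights in comprehensive water environment quality evaluation, a decision method for comprehensive water quality evaluation based on the learning vector quantization (LVQ) neural network is proposed, exploiting the network's strong nonlinear computation and its ability to cluster similar features. Applied to the evaluation of comprehensive water quality indicators, it provides a simple and fast classification method for improving comprehensive water quality evaluation.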

11.
This paper first proposes a new line symmetry (LS) based distance, which calculates the amount of symmetry of a point with respect to the first principal axis of a data set. The proposed distance uses a recently developed point symmetry (PS) based distance in its computation. Kd-tree based nearest neighbor search is used to reduce the complexity of computing the closest symmetric point. An evolutionary clustering technique is then described that uses this new principal axis based LS distance to assign points to different clusters. The proposed GA with line symmetry distance based (GALS) clustering technique is able to detect any type of cluster, irrespective of geometrical shape, size or convexity, as long as it possesses the characteristics of LS. GALS is compared with the existing genetic algorithm based K-means clustering technique (GAK-means), the existing genetic algorithm with PS based clustering technique (GAPS), the spectral clustering technique, and the average linkage clustering technique. Five artificially generated data sets with different characteristics and seven real-life data sets are used to demonstrate the superiority of the proposed GALS clustering technique. In one part of the experiments, the utility of the proposed genetic LS distance based clustering technique is demonstrated by segmenting a satellite image of a part of the city of Kolkata. The proposed technique is able to distinguish different landcover types in the image. In the last part of the paper, a genetic algorithm is used to search for a suitable line of symmetry for each cluster.

12.
Clustering is a popular data analysis and data mining technique. Since the clustering problem is NP-complete in nature, the larger the problem, the harder it is to find the optimal solution and the longer it takes to reach a reasonable result. A popular clustering technique is based on K-means, in which the data are partitioned into K clusters. In this method, the number of clusters is predefined and the technique is highly dependent on an initial choice of elements that represent the clusters well. A large area of clustering research has focused on improving the clustering process so that the clusters do not depend on this initial choice of cluster representatives. Another problem in clustering is the local minimum problem. Although approaches such as K-Harmonic means clustering solve the initialization problem, becoming trapped in local minima remains a problem. In this paper we develop a new algorithm for solving this problem based on a tabu search technique: Tabu K-Harmonic means (TabuKHM). The experimental results on the Iris data set and other well-known data sets illustrate the robustness of the TabuKHM clustering algorithm.
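The dependence on initialization described in this abstract can be seen in a small sketch of plain K-means (Lloyd's iterations) on one-dimensional data. This illustrates the problem only; it is not the TabuKHM algorithm, and the function name, data points and starting centers are made up for the example.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain K-means on 1-D data, starting from the given centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            groups[nearest].append(x)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum((x - centers[k]) ** 2 for k, g in enumerate(groups) for x in g)
    return centers, sse


data = [0, 1, 10, 11, 20, 21]  # three well-separated pairs

# Starting centers spread across the data recover the three pairs.
good_centers, good_sse = kmeans_1d(data, [0, 10, 20])

# Starting centers crowded at one end get stuck in a poor local minimum.
bad_centers, bad_sse = kmeans_1d(data, [0, 1, 10])
```

Both runs converge, but the second stops at a much larger SSE than the first, which is the kind of local minimum that restarts, K-Harmonic means, or a search heuristic such as tabu search aim to avoid.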

13.
Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Though many effective and efficient clustering algorithms have been developed and deployed, most of them still lack an automatic or online way to decide the optimal number of clusters. In this paper, we define clustering gain as a measure of clustering optimality, which is based on the squared error sum as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. According to the experimental results, our clustering measure performs well, producing intuitively reasonable clustering configurations in Euclidean space. Furthermore, the measure can also be used to estimate the desired number of clusters for partitional clustering methods. Therefore, the clustering gain measure provides a promising technique for achieving a higher level of quality for a wide range of clustering methods.

14.
For hierarchical clustering, dendrograms are a convenient and powerful visualization technique. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this article we extend (dissimilarity) matrix shading with several reordering steps based on seriation techniques. Both ideas, matrix shading and reordering, have been well known for a long time. However, only recent algorithmic improvements allow us to solve or approximately solve the seriation problem efficiently for larger problems. Furthermore, seriation techniques are used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is able to present the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows us to judge cluster quality but also makes misspecification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples. Experiments show that dissimilarity plots scale very well with increasing data dimensionality.

Supplemental materials with additional experiments for this article are available online.

15.
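Cluster analysis is an important data mining technique and a form of unsupervised learning that groups data according to their degree of similarity. The competitive decision algorithm is a new optimization algorithm based on the ideas that competition drives optimization and that decisions determine outcomes. Taking the characteristics of cluster analysis into account, a competitive decision algorithm is designed to solve the clustering problem; experimental testing and verification, and comparison with the results of other algorithms, show that it obtains good results.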

16.
Cluster analysis is a popular technique in statistics and computer science with the objective of grouping similar observations into relatively distinct groups generally known as clusters. Semi-supervised clustering assumes that some additional information about group memberships is available. Under the most frequently considered scenario, labels are known for some portion of the data and unavailable for the rest of the observations. In this paper, we discuss a general type of semi-supervised clustering defined by so-called positive and negative constraints. Under positive constraints, some data points are required to belong to the same cluster. Negative constraints, by contrast, specify that particular points must represent different data groups. We outline a general framework for semi-supervised clustering with constraints that naturally incorporates the additional information into the EM algorithm traditionally used in mixture modeling and model-based clustering. The developed methodology is illustrated on synthetic and classification datasets. A dendrochronology application is considered and thoroughly discussed.

17.
In data mining, the unsupervised learning technique of clustering is a useful method for ascertaining trends and patterns in data. Most general clustering techniques do not take into consideration the time-order of data. In this paper, mathematical programming and statistical techniques and methodologies are combined to develop a seasonal clustering technique for determining clusters of time series data. We apply this technique to weather and aviation data to determine probabilistic distributions of arrival capacity scenarios, which can be used for efficient traffic flow management. In general, this technique may be used for seasonal forecasting and planning.

18.
Partitioning clustering is a technique for classifying n objects into k disjoint clusters; it has been developed for years and is widely used in many applications. In this paper, a new overlapping cluster algorithm is defined. It differs from traditional clustering algorithms in three respects. First, the new clustering is overlapping, because clusters are allowed to overlap with one another. Second, the clustering is non-exhaustive, because an object is permitted to belong to no cluster. Third, the goals considered in this research are the maximization of the average number of objects contained in a cluster and the maximization of the distances among cluster centers, while the goals in previous research are the maximization of the similarities of objects in the same cluster and the minimization of the similarities of objects in different clusters. Furthermore, the new clustering also differs from traditional fuzzy clustering, because the object–cluster relationship in the new clustering is represented by a crisp value rather than by a fuzzy membership degree. Accordingly, a new overlapping partitioning cluster (OPC) algorithm is proposed to provide overlapping and non-exhaustive clustering of objects. Finally, several simulated and real-world data sets are used to evaluate the effectiveness and the efficiency of the OPC algorithm, and the outcomes indicate that the algorithm can generate satisfactory clustering results.

19.
For several years, model-based clustering methods have successfully tackled many of the challenges presented by data-analysts. However, as the scope of data analysis has evolved, some problems may be beyond the standard mixture model framework. One such problem is when observations in a dataset come from overlapping clusters, whereby different clusters will possess similar parameters for multiple variables. In this setting, mixed membership models, a soft clustering approach whereby observations are not restricted to single cluster membership, have proved to be an effective tool. In this paper, a method for fitting mixed membership models to data generated by a member of an exponential family is outlined. The method is applied to count data obtained from an ultra running competition, and compared with a standard mixture model approach.

20.
Clustering analysis plays an important role in the field of data mining. Nowadays, hierarchical clustering is becoming one of the most widely used clustering techniques. However, most hierarchical clustering algorithms cannot meet the requirements of high execution efficiency and high clustering accuracy at the same time. After analyzing the advantages and disadvantages of the hierarchical algorithms, the paper puts forward a two-stage clustering algorithm, named Chameleon Based on Clustering Feature Tree (CBCFT), which hybridizes the Clustering Tree of the BIRCH algorithm with the CHAMELEON algorithm. By calculating the time complexity of CBCFT, the paper argues that it increases linearly with the number of data points. By experimenting on a sample data set, the paper demonstrates that CBCFT is able to identify clusters with large variance in size and shape and is robust to outliers. Moreover, the result of CBCFT is similar to that of CHAMELEON, but CBCFT overcomes CHAMELEON's low execution efficiency. Although the execution time of CBCFT is longer than that of BIRCH, its clustering result is much more satisfactory. Finally, through a customer segmentation case for the Hubei branch of Chinese Petroleum Corp., the paper demonstrates that the clustering result is meaningful and useful. The research is partially supported by the National Natural Science Foundation of China (grants #70372049 and #70121001).
