Similar literature
Found 20 similar documents (search time: 425 ms)
1.
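Fuzzy techniques are now used in many intelligent systems, for example through fuzzy relations and fuzzy clustering. Clustering is an important data mining task: it partitions data objects into clusters such that the attribute features of objects within the same cluster are highly similar, and it has considerable research and application value. Drawing on database mining techniques, this paper proposes an interval fuzzy ISODATA dynamic clustering method, based on interval-number membership degrees, for multi-attribute decision-making problems whose attribute values are interval numbers.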

2.
Data clustering, also called unsupervised learning, is a fundamental issue in data mining: it is used to understand the structure of an unlabeled collection of data and to divide it into separate groups based on similarity. Recent studies have shown that clustering techniques that optimize a single objective may not provide satisfactory results, because no single validity measure works well on all kinds of data sets. Moreover, the performance of clustering algorithms degrades as the overlap among clusters in a data set increases. These facts have motivated us to develop a fuzzy multi-objective particle swarm optimization framework for data clustering, termed FMOPSO, which is able to deliver more effective results than state-of-the-art clustering algorithms. The key challenge in designing the FMOPSO framework is how to resolve confusion in cluster assignment for points that have significant membership in more than one cluster. The proposed framework addresses this problem by identifying the points that have significant membership in multiple classes, excluding them, and re-classifying them into single class assignments. To ascertain the superiority of the proposed algorithm, statistical tests have been performed on a variety of numerical and categorical real-life data sets. Our empirical study shows that the proposed framework significantly outperforms state-of-the-art data clustering algorithms in terms of both efficiency and effectiveness.

3.
石子烨, 梁恒, 白峰杉. 《计算数学》 (Mathematica Numerica Sinica), 2014, 36(3): 325-334
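Data partitioning, whose basic tasks are classification and clustering, is one of the core problems of data mining and is widely applied in practice. Research on directed network data in particular is at the frontier of the field. However, the asymmetric structure of such problems makes the construction of models and algorithms intrinsically difficult, so relatively few results are available. Borrowing ideas from molecular dynamics, this paper proposes a new semi-supervised classification model and algorithm for network data. The algorithm applies not only to undirected network data with symmetric relations but also to directed networks with asymmetric relations. Numerical experiments on journal citation network data demonstrate the feasibility and effectiveness of the model and algorithm.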

4.
A clustering methodology based on biological visual models, which imitates how humans visually cluster data by spatially associating patterns, has recently been proposed. The method is based on Cellular Neural Networks and a set of resolution adjustments. The Cellular Neural Network rebuilds low-density areas, while different resolutions are tried to find the best clustering option. The algorithm has demonstrated good performance compared to other clustering techniques. However, its main drawbacks are its inability to operate on data sets with more than two dimensions and the computational time required by the resolution adjustment mechanism. This paper proposes a new version of this clustering methodology that addresses these flaws. In the new approach, a pre-processing stage is incorporated featuring a Self-Organizing Map that maps complex high-dimensional relations onto a reduced lattice while preserving the topological organization of the initial data set. This reduced representation is used as the two-dimensional data set for further processing. In the new version, the resolution adjustment process is also accelerated by an optimization method that combines the Hill-Climbing and Random Search techniques. Rather than evaluating all possible resolutions, the optimization strategy finds the best resolution for a clustering problem in a limited number of iterations. The proposed approach has been evaluated on several two-dimensional and high-dimensional datasets. Experimental evidence shows that the proposed algorithm performs the clustering task on complex problems while running, on average, 46% faster than the original method. The approach is also compared to other popular clustering techniques reported in the literature. Computational experiments demonstrate competitive results in comparison to other algorithms in terms of accuracy and robustness.

5.
Functional data clustering: a survey   Total citations: 1 (self-citations: 0; citations by others: 1)
Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods that work directly on the evaluation points of the curves. The second group consists of filtering methods, which first approximate the curves in a finite basis of functions and then perform clustering on the basis expansion coefficients. The third group is composed of methods that perform dimensionality reduction of the curves and clustering simultaneously, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods, which use clustering algorithms based on distances specific to functional data. A software review and an illustration of the application of these algorithms to real data are also presented.

6.
Unsupervised classification is a highly important task in machine learning. Although the support vector machine (SVM) has achieved great success in supervised classification, it is much less used to classify unlabeled data points, and SVM-based clustering methods have many drawbacks, including sensitivity to nonlinear kernels and random initializations, high computational cost, and unsuitability for imbalanced datasets. In this paper, to exploit the advantages of SVM and overcome the drawbacks of SVM-based clustering methods, we propose a completely new two-stage unsupervised classification method that requires no initialization: first, a new unsupervised kernel-free quadratic surface SVM (QSSVM) model is proposed to avoid selecting kernels and related kernel parameters; then a golden-section algorithm is designed to generate an appropriate classifier for balanced and imbalanced data. By studying certain properties of the proposed model, a convergent decomposition algorithm is developed to solve this non-convex QSSVM model effectively and efficiently (in terms of computational cost). Numerical tests on artificial and public benchmark data indicate that the proposed unsupervised QSSVM method outperforms well-known clustering methods (including SVM-based and other state-of-the-art methods), particularly in terms of classification accuracy. Moreover, we extend the proposed method and apply it to credit risk assessment by incorporating T-test based feature weights. The promising numerical results on benchmark personal credit data and real-world corporate credit data demonstrate the effectiveness, efficiency and interpretability of the proposed method, and indicate its significant potential in certain real-world applications.

7.
Clustering is an important problem in data mining. It can be formulated as a nonsmooth, nonconvex optimization problem. For most global optimization techniques this problem is challenging even for medium-sized data sets. In this paper, we propose an approach that allows one to apply local methods of smooth optimization to clustering problems. We apply an incremental approach to generate starting points for cluster centers, which enables us to deal with the nonconvexity of the problem. The hyperbolic smoothing technique is applied to handle the nonsmoothness of the clustering problems and to make it possible to apply smooth optimization algorithms to solve them. Results of numerical experiments with eleven real-world data sets, and a comparison with state-of-the-art incremental clustering algorithms, demonstrate that smooth optimization algorithms in combination with the incremental approach are a powerful alternative to existing clustering algorithms.
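The abstract does not state its formulas, so the following is only a sketch of the standard forms usually meant by these terms, not the paper's own formulation. The symbols (data points a_i, centers x_j, smoothing parameter tau) are assumptions introduced here for illustration. The inner minimum is what makes the clustering objective nonsmooth; hyperbolic smoothing replaces the kink of max{0, y} with a smooth function that converges to it as tau tends to zero.

```latex
% Minimum sum-of-squares clustering of points a_1,...,a_m with centers x_1,...,x_k
% (notation assumed for illustration):
\[
  \min_{x_1,\dots,x_k} \; f(x_1,\dots,x_k)
  = \frac{1}{m}\sum_{i=1}^{m} \min_{1 \le j \le k} \lVert x_j - a_i \rVert^2 .
\]
% Hyperbolic smoothing of \psi(y) = \max\{0, y\} with parameter \tau > 0:
\[
  \phi(y,\tau) = \frac{y + \sqrt{y^2 + \tau^2}}{2},
  \qquad
  0 < \phi(y,\tau) - \max\{0, y\} \le \frac{\tau}{2}.
\]
```

Because the approximation error is bounded by tau/2, a smooth solver can be run for a decreasing sequence of tau values, each run starting from the previous solution.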

8.
In this paper, we present a new clustering method that involves data envelopment analysis (DEA). The proposed DEA-based clustering approach employs the piecewise production functions derived from the DEA method to cluster data with input and output items. Thus, each evaluated decision-making unit (DMU) not only learns which cluster it belongs to, but also identifies the type of production function it faces. This is important for managerial decision-making, where decision-makers are interested in knowing what changes in the combination of input resources are required for a DMU to be classified into a desired cluster or class. In particular, we use the fundamental CCR model to set up the DEA clustering approach. While the approach is developed here for the CCR model, it can easily be extended to other DEA models without loss of generality. Two examples are given to explain the use and effectiveness of the proposed DEA-based clustering method.

9.
Goldfarb has proposed a unified approach to pattern recognition. In Goldfarb's approach, data from a pseudometric space are isometrically embedded in a pseudo-Euclidean space. The data representation is built for each particular data set, so the resulting data space is data dependent. The aim of this paper is to extend Goldfarb's approach to fuzzy clustering. A fuzzy clustering procedure for a pseudometric data set is given. A generalisation of this algorithm that obtains the cluster substructure of a fuzzy class is also proposed.

10.
With the rapid growth in the size of scientific data across disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale in many scientific fields. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring, which has several advantages: it is a model-free approach under a general model framework, and hence avoids the complication of specifying an actual model form with a huge number of candidate variables; under mild conditions that do not require the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimension. In particular, we impose a conditional independence assumption on the response and the censoring variable given each covariate, instead of assuming that the censoring variable is independent of the response and the covariates. Moreover, we also propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors and the response. The computation of the newly proposed methods does not require any complicated numerical optimization, and it is fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively for various configurations. The application is illustrated with an analysis of a genetic data set.

11.
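To address the risks that arise from uncertain decision-maker preferences and from deviation from group consensus in big-data environments for emergency decision-making in major emergencies, a two-stage, large-group, risk-based emergency decision-making method based on mining user-generated content (UGC) big data is proposed. First, data mining and natural language processing are used to extract the public's preference information about the event from UGC and to build a system of emergency decision attributes; attribute weights are determined by combining the TF-IDF method with expert evaluations. Second, a two-stage decision process with open opinions is established: decision risk is quantified according to the reliability and accuracy of the decision-makers' opinions, a clustering method yields the corresponding member weights, and TOPSIS is used to rank the decision alternatives. Finally, a case study of the “8·12” Tianjin Port explosion accident, together with a comparative analysis, verifies the feasibility and effectiveness of the proposed method.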
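The final ranking step named in this abstract, TOPSIS, can be sketched as follows. This is a minimal illustration of the textbook TOPSIS procedure only; it does not reproduce the paper's UGC mining, TF-IDF weighting, or clustering-based member weights. The function name `topsis` and its parameters are hypothetical, and the weights are simply passed in as given.

```python
import math

def topsis(matrix, weights, benefit):
    """Score alternatives (rows) over criteria (columns) with textbook TOPSIS.

    matrix  : list of rows, one numeric value per criterion
    weights : one weight per criterion (assumed to be given)
    benefit : benefit[j] is True if larger values of criterion j are better
    Returns a closeness score in [0, 1] per alternative; higher is better.
    Assumes no criterion column is all zeros and the rows are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    # Ideal and anti-ideal points depend on whether a criterion is benefit or cost.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)   # distance to the ideal point
        d_minus = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_minus / (d_plus + d_minus))
    return scores
```

Sorting the alternatives by descending score gives the ranking; `math.dist` requires Python 3.8 or later.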

12.
Severe constraints imposed by the nature of endless sequences of data collected from unstable phenomena have driven the understanding and development of automated analysis strategies, such as data clustering techniques. However, current clustering validation approaches are inadequate for data streams because they do not properly evaluate how well behavior changes are represented. This paper proposes a novel function to continuously evaluate data stream clustering, inspired by the Lyapunov energy functions used by techniques such as the Hopfield artificial neural network and the Bidirectional Associative Memory (BAM). The proposed function considers three terms: i) the intra-cluster distance, which allows cluster compactness to be evaluated; ii) the inter-cluster distance, which reflects cluster separability; and iii) an entropy estimate of the clustering model, which permits evaluation of the level of uncertainty in data streams. A first set of experiments illustrates the proposed function applied to scenarios of continuous evaluation of data stream clustering. Further experiments were conducted to compare this new function to well-established clustering indices, and the results confirm that our proposal reflects the same information obtained with external clustering indices.
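The three ingredients listed above can be computed for a single snapshot of a clustering as in the sketch below. The abstract does not say how the paper defines or combines these terms, so everything here is an assumption made for illustration: the function name `clustering_terms` is hypothetical, distances are Euclidean distances to or between centroids, and the entropy is the Shannon entropy of the cluster-size proportions.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def euclid(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def clustering_terms(clusters):
    """Return (intra, inter, entropy) for a list of non-empty clusters.

    intra   : mean distance from each point to its cluster centroid (compactness)
    inter   : mean pairwise distance between centroids (separability)
    entropy : Shannon entropy (natural log) of the cluster-size proportions
    These definitions are illustrative assumptions, not the paper's function.
    """
    cents = [centroid(c) for c in clusters]
    n = sum(len(c) for c in clusters)
    intra = sum(euclid(p, m) for c, m in zip(clusters, cents) for p in c) / n
    pairs = list(combinations(cents, 2))
    inter = sum(euclid(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    entropy = -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
    return intra, inter, entropy
```

In a streaming setting these quantities would be recomputed, or updated incrementally, as each window of data arrives, so that shifts in compactness, separability, or uncertainty can be tracked over time.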

13.
Cluster analysis is an important task in data mining and refers to grouping a set of objects such that the similarities among objects within the same group are maximal while the similarities among objects from different groups are minimal. The particle swarm optimization (PSO) algorithm is a well-known metaheuristic optimization algorithm that has been successfully applied to the clustering problem. However, it has two major shortcomings. The PSO algorithm converges rapidly during the initial stages of the search process, but near the global optimum its convergence becomes very slow. Moreover, it may get trapped in a local optimum if the global best and local best values are equal to the particle's position over a certain number of iterations. In this paper we hybridize PSO with a heuristic search algorithm to overcome these shortcomings. In the proposed algorithm, called PSOHS, particle swarm optimization is used to produce an initial solution to the clustering problem, and a heuristic search algorithm is then applied to improve the quality of this solution by searching around it. The superiority of the proposed PSOHS clustering method over other popular methods for the clustering problem is established on seven benchmark and real datasets: Iris, Wine, Crude Oil, Cancer, CMC, Glass and Vowel.

14.
Application of honey-bee mating optimization algorithm on clustering   Total citations: 4 (self-citations: 0; citations by others: 4)
Cluster analysis is an attractive data mining technique that is used in many fields. One popular class of data clustering algorithms is center-based clustering. K-means is a popular clustering method because of its simplicity and its high speed in clustering large datasets. However, K-means has two shortcomings: it depends on the initial state and converges to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many studies in clustering have sought to overcome the local optima problem. Over the last decade, modeling the behavior of social insects, such as ants and bees, for the purpose of search and problem solving has been the focus of the emerging area of swarm intelligence. Honey-bees are among the most closely studied social insects. Honey-bee mating may also be considered a typical swarm-based approach to optimization, in which the search algorithm is inspired by the process of mating in real honey-bees. Honey-bees have also been used to model agent-based systems. In this paper, we propose an application of honey-bee mating optimization to clustering (HBMK-means). We compared HBMK-means with other heuristic clustering algorithms, such as GA, SA, TS, and ACO, by implementing them on several well-known datasets. Our findings show that the proposed algorithm works better than the best of these.

15.
The aim of this paper is to model lifetime data for systems that have failure modes by using a finite mixture of Weibull distributions. This involves estimating the unknown parameters, which is an important task in statistics, especially in life testing and reliability analysis. The proposed approach draws on different methods to develop the estimates, such as maximum likelihood estimation (MLE) through the EM algorithm. In addition, Bayesian estimation will be investigated, and some other extensions such as graphical, Non-Linear Median Rank Regression and Monte Carlo simulation methods can be used to model the system under consideration. A numerical application illustrates the proposed approach. This paper also presents a comparison of the fitted probability density functions, reliability functions and hazard functions of the 3-parameter Weibull and Weibull mixture distributions, using the proposed approach and other conventional methods that characterize the distribution of failure times for the system components. Goodness-of-fit (GOF) is used to determine the best distribution for modeling lifetime data; priority is given to the proposed approach, which yields more accurate parameter estimates.
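The three functions compared in this abstract (density, reliability, hazard) have simple closed forms for a finite Weibull mixture, sketched below. This is an illustration under stated assumptions rather than the paper's model: each component is a 2-parameter Weibull given as a hypothetical `(weight, shape, scale)` tuple, the weights are assumed to sum to 1, and parameter estimation (EM, Bayesian, and so on) is not shown.

```python
import math

def weibull_pdf(t, shape, scale):
    """Density of a 2-parameter Weibull; returns 0 for t <= 0 for simplicity."""
    if t <= 0:
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_reliability(t, shape, scale):
    """Survival function R(t) = exp(-(t/scale)^shape), with R(t) = 1 for t <= 0."""
    if t <= 0:
        return 1.0
    return math.exp(-((t / scale) ** shape))

def mixture_pdf(t, components):
    """Weighted sum of component densities; components = [(weight, shape, scale), ...]."""
    return sum(w * weibull_pdf(t, b, e) for w, b, e in components)

def mixture_reliability(t, components):
    """Weighted sum of component reliability functions."""
    return sum(w * weibull_reliability(t, b, e) for w, b, e in components)

def mixture_hazard(t, components):
    """Hazard rate h(t) = f(t) / R(t) of the mixture."""
    return mixture_pdf(t, components) / mixture_reliability(t, components)
```

Note that the mixture hazard is the ratio of the mixed density to the mixed reliability, not the weighted sum of the component hazards.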

16.
Candidate groups search for K-harmonic means data clustering   Total citations: 2 (self-citations: 0; citations by others: 2)
Clustering is a very popular data analysis and data mining technique. K-means is one of the most popular clustering methods. Although K-means is easy to implement and works fast in most situations, it suffers from two major drawbacks: sensitivity to initialization and convergence to local optima. K-harmonic means clustering has been proposed to overcome the first drawback, sensitivity to initialization. In this paper we propose a new algorithm, candidate groups search (CGS), combined with K-harmonic means to solve the clustering problem. Computational results show that CGS achieves better performance with less computational time in clustering, especially for large datasets or when the number of centers is large.
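To make the contrast between the two objectives concrete, the sketch below evaluates the K-means objective (squared distance to the nearest center) next to the K-harmonic means objective in its commonly cited form (harmonic mean of the p-th powers of the distances to all centers). The abstract gives no formulas and the CGS search itself is not shown; the function names, the exponent `p`, and the small `eps` guard are assumptions made for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_objective(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)

def khm_objective(points, centers, p=2.0, eps=1e-12):
    """Sum over points of the harmonic mean of d(x, c)^p over all centers.

    Every center contributes to every point's term, so the objective varies
    smoothly with all centers instead of depending only on the nearest one.
    eps guards against division by zero when a point coincides with a center.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for c in centers:
            d = max(math.sqrt(sq_dist(x, c)), eps)
            inv_sum += 1.0 / d ** p
        total += k / inv_sum
    return total
```

With a single center and p = 2 the two objectives coincide; with several centers, the harmonic mean is dominated by the nearest center but still responds to the others, which is the usual explanation for the reduced sensitivity to initialization.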

17.
Conventional open pit mine optimization models for designing mining phases and the ultimate pit limit do not consider expected variations and uncertainty in the metal content available in a mineral deposit (supply) and in commodity prices (market demand). Unlike the conventional approach, a stochastic framework relies on multiple realizations of the input data so as to account for uncertainty in metal content and financial parameters, reflecting potential supply and demand. This paper presents a new method that jointly considers uncertainty in metal content and commodity prices, and incorporates time-dependent discounted values of mining blocks when designing optimal production phases and the ultimate pit limit, while honouring production capacity constraints. The structure of a graph representing the stochastic framework is proposed, and it is solved with a parametric maximum flow algorithm. Lagrangian relaxation and the subgradient method are integrated into the proposed approach to facilitate producing practical designs. An application at a copper deposit in Canada demonstrates the practical aspects of the approach and the quality of its solutions relative to conventional methods, as well as the effectiveness of the proposed stochastic approach in solving mine planning and design problems.

18.
Medoid-based fuzzy clustering generates clusters of objects based on relational data, which record pairwise similarities or dissimilarities among objects. Compared with single-medoid based approaches, our recently proposed fuzzy clustering with multiple-weighted medoids has shown superior clustering performance in experimental studies. In this paper, we present a new version of fuzzy relational clustering in this family, called fuzzy clustering with multi-medoids (FMMdd). Based on the new objective function of FMMdd, update equations can be derived more conveniently. Moreover, a unified view of FMMdd and two existing fuzzy relational approaches, fuzzy c-medoids (FCMdd) and assignment-prototype (A-P), can be established, which allows us to conduct further analytical study of the effectiveness and feasibility of the proposed approach as well as the limitations of existing ones. The robustness of FMMdd is also investigated. Our theoretical and numerical studies show that the proposed approach produces good-quality clusters with rich cluster-based information and that it is less sensitive to noise.

19.
The curse of high dimensionality arises more and more frequently in statistics. Many techniques have been developed to address this challenge for classification problems. We propose a novel feature screening procedure for dichotomous response data. The new method can be implemented as easily as the t-test marginal screening approach, and the proposed procedure is free of any subexponential tail probability conditions and moment requirements and is not restricted to a specific model structure. We prove that our method possesses the sure screening property, illustrate the effect of screening by Monte Carlo simulation, and apply it to a real data example.

20.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.
