Similar literature
Found 20 similar documents; search time: 25 ms
1.
This paper presents DivClusFD, a new divisive hierarchical method for the unsupervised classification of functional data. Data of this type have the peculiarity that differences among clusters may be caused by changes in level as well as in shape. Different clusters can be separated in different subregions, and there may be no subregion in which all clusters are separated. In each division step, the DivClusFD method explores the functions and their derivatives at several fixed points, seeking the subregion in which the highest number of clusters can be separated. The number of clusters is estimated via the gap statistic. The functions are assigned to the new clusters by combining the k-means algorithm with functional boxplots, which identify functions that have been incorrectly classified because of their atypical local behavior. The DivClusFD method provides the number of clusters, the classification of the observed functions into the clusters, and guidelines that may be used for interpreting the clusters. A simulation study using synthetic data and tests of the performance of the DivClusFD method on real data sets indicate that this method is able to classify functions accurately.

2.
The electronic structures and atomic magnetic moments of Ni_n clusters (n = 2–6) have been studied. Compared with crystalline nickel, some clusters show a marked increase in magnetism, some a marked decrease, and some show ferrimagnetism. The symmetry of a cluster has a great effect on its magnetic moment: clusters with similar symmetry have similar magnetic moments. The magnetic moment of small clusters does not seem to increase or decrease monotonically with cluster size, because adding or removing one atom may completely change the symmetry of a small cluster. As the surface layer of ultrafine particles is made of many different polyhedra with low symmetry, and the alignment of the polyhedra is complicated, the surface layer as a whole presents short-range order. The calculated results explain long-standing anomalies in surface magnetism.

3.
Clustering is a popular data analysis and data mining technique. Because clustering problems are NP-complete in nature, the larger the problem, the harder it is to find the optimal solution and the longer it takes to reach a reasonable result. A popular clustering technique is based on K-means, in which the data are partitioned into K clusters. In this method, the number of clusters is predefined and the technique is highly dependent on the initial identification of elements that represent the clusters well. A large area of research in clustering has focused on improving the clustering process so that the clusters do not depend on the initial identification of cluster representation. Another problem in clustering is that of local minima. Although methods such as K-Harmonic means clustering solve the initialization problem, becoming trapped in local minima remains a problem. In this paper we develop a new algorithm for solving this problem based on a tabu search technique, Tabu K-Harmonic means (TabuKHM). Experimental results on the Iris data set and other well-known data sets illustrate the robustness of the TabuKHM clustering algorithm.
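
The two objectives mentioned in this abstract can be compared with a minimal sketch. It assumes the usual textbook forms: the K-means objective sums each point's squared distance to its nearest center, while the K-Harmonic means objective sums, over points, the harmonic mean of the p-th powers of the distances to all centers. The function names, the toy data and the choice p = 2 are illustrative assumptions, and the tabu search step of TabuKHM is not shown.

```python
import math

def sq_dist(x, c):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def kmeans_objective(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)

def khm_objective(points, centers, p=2.0, eps=1e-12):
    """Sum over points of K / sum_k (1 / d_k**p), i.e. the harmonic mean
    of the p-th powers of the distances to all K centers."""
    k = len(centers)
    total = 0.0
    for x in points:
        # eps guards against division by zero when a point sits on a center.
        inv_sum = sum(1.0 / max(math.sqrt(sq_dist(x, c)) ** p, eps)
                      for c in centers)
        total += k / inv_sum
    return total

# Toy data (hypothetical): two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [(0.0, 0.5), (10.0, 0.5)]   # one center per group
bad = [(5.0, 0.0), (5.0, 1.0)]     # both centers between the groups

for name, centers in (("good", good), ("bad", bad)):
    print(name, kmeans_objective(points, centers), khm_objective(points, centers))
```

Because every center contributes to every point's term, the K-Harmonic means objective is less sensitive to where the centers start than the nearest-center objective, which is the property the abstract refers to.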

4.
Spatio-temporal clusters in the 1997–2003 fire sequences of the Tuscany region (central Italy) have been identified and analysed using the scan statistic, a method devised to detect clusters in epidemiology. Results showed that the method reliably finds clusters of events and evaluates their significance via Monte Carlo replication. The evaluation of the presence of spatial and temporal patterns in fire occurrence, and of their significance, could have a great impact on forthcoming studies on fire occurrence prediction.

5.
Given a set of data, very little is known about tests to determine the number of clusters and/or the elements of the clusters. Even in the simplest case of deciding between only one or two clusters with multivariate normal data, the number of tests needed seems theoretically to be infinite. Alternatively, suppose N independent estimates of generalized variances (GVs) are computed from a given set of p-dimensional vector observations. Assuming multivariate normality, tests based on GVs are proposed which objectively and uniquely determine, simultaneously, the number of clusters and their corresponding elements. Only a reasonably small number of tests is required for this stepwise procedure. The exact percentage points are either available from existing tables or can be computed from a result presented.
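
As an illustration of the quantity these tests are built on, the sketch below computes a generalized variance, taken here in its standard sense as the determinant of the sample covariance matrix, for two-dimensional data. The function names and the toy data are assumptions for illustration; the test procedure itself is not described in the abstract and is not reproduced.

```python
def covariance_2d(rows):
    """Unbiased sample covariance matrix of a list of (x, y) pairs."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return ((sxx, sxy), (sxy, syy))

def generalized_variance_2d(rows):
    """Determinant of the 2x2 sample covariance matrix."""
    (sxx, sxy), (_, syy) = covariance_2d(rows)
    return sxx * syy - sxy * sxy

# Toy data (hypothetical): one tight group, and the same group pooled
# with a second group located far away.
group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

print(generalized_variance_2d(group_a))
print(generalized_variance_2d(group_a + group_b))
```

Pooling two well-separated groups inflates the generalized variance relative to either group alone, which gives some intuition for why GV estimates can carry information about cluster structure.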

6.
Cluster analysis of genome-wide expression data from DNA microarray hybridization studies is a useful tool for identifying biologically relevant gene groupings (DeRisi et al. 1997; Weiler et al. 1997). It is hence important to apply a rigorous yet intuitive clustering algorithm to uncover these genomic relationships. In this study, we describe a novel clustering algorithm framework based on a variant of the Generalized Benders Decomposition, denoted as the Global Optimum Search (Floudas et al. 1989; Floudas 1995), which includes a procedure to determine the optimal number of clusters to be used. The approach involves a pre-clustering of data points to define an initial number of clusters and the iterative solution of a Linear Programming problem (the primal problem) and a Mixed-Integer Linear Programming problem (the master problem), both derived from a Mixed-Integer Nonlinear Programming problem formulation. Badly placed data points are removed to form new clusters, thus ensuring tight groupings among the data points and incrementing the number of clusters until the optimum number is reached. We apply the proposed clustering algorithm to experimental DNA microarray data centered on the Ras signaling pathway in the yeast Saccharomyces cerevisiae and compare the results to those obtained with some commonly used clustering algorithms. Our algorithm compares favorably against these algorithms in terms of intra-cluster similarity and inter-cluster dissimilarity, often considered two key tenets of clustering. Furthermore, our algorithm can predict the optimal number of clusters, and the biological coherence of the predicted clusters is analyzed through gene ontology.

7.
We present an optimum metallic-bond scheme to study the geometric structures of sodium clusters Na_n (n ⩽ 15) systematically by combining the characteristics of metallic bonds with first-principles molecular dynamics simulation. The scheme provides an optimum way to examine almost all stable structures of sodium clusters and to determine their ground-state structures. It is interesting to note that the larger sodium clusters (13 ⩽ n ⩽ 15) have some plane-like substructures on their surfaces, which resemble fragments of the (110) plane, the plane with the highest atomic area density in the bulk bcc sodium crystal. We also propose a possible way to understand the formation of large icosahedral sodium clusters (1500 < n < 22000).

8.
A major challenge in the biological sciences is the reconstruction of the Tree of Life. To this end, large genomic databases like GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees that reveal information which was not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves: sets of clusters with the potential to construct informative supertrees. Thus maximal potential candidate clusters for informative supertree construction can be identified in large databases through groves, prior to inferring trees for each cluster. Groves also have the potential to lead to informative supermatrix construction. We developed methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.

9.
In distributed computing, the recent paradigm shift from centrally owned clusters to organizationally distributed computational grids introduces a number of new challenges in resource management and scheduling. In this work, we study the problem of Selfish Load Balancing, which extends the well-known load balancing (LB) problem to scenarios in which each processor is concerned only with the performance of its local jobs. We propose a simple mathematical model for such systems and a novel function for computing the cost of the execution of foreign jobs. Then, we use the game-theoretic framework to analyze the model in order to compute the expected result of LB performed in a grid formed by two clusters. We show that, firstly, LB is a socially optimal strategy, and secondly, for similarly loaded clusters, it is sufficient to collaborate during longer time periods in order to make LB the dominant strategy for each cluster. However, we show that if we allow clusters to make decisions depending on their current queue length, LB will never be performed. Then, we propose an LB algorithm which balances the load more equitably, even in the presence of overloaded clusters. Our algorithms do not use any external forms of compensation (such as money). The load is balanced only by considering the parameters of execution of jobs. This analysis is assessed experimentally by simulation, involving scenarios with multiple clusters and heterogeneous load.

10.
In this paper, we investigate the problem of determining the number of clusters in the k-modes based categorical data clustering process. We propose a new categorical data clustering algorithm with automatic selection of k. The new algorithm extends the k-modes clustering algorithm by introducing a penalty term into the objective function to make more clusters compete for objects. In the new objective function, we employ a regularization parameter to control the number of clusters in a clustering process. Instead of finding k directly, we choose a suitable value of the regularization parameter such that the corresponding clustering result is the most stable one among all the generated clustering results. Experimental results on synthetic and real data sets are used to demonstrate the effectiveness of the proposed algorithm.
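
For orientation, the baseline that this abstract extends can be sketched as follows. This is a minimal version of plain k-modes under common assumptions: simple-matching (mismatch count) dissimilarity, and cluster modes taken attribute by attribute. The penalty term and regularization parameter of the proposed algorithm are not specified in the abstract and are not included; all names and the toy records are illustrative.

```python
from collections import Counter

def mismatch(a, b):
    """Simple-matching dissimilarity: number of attributes that differ."""
    return sum(1 for u, v in zip(a, b) if u != v)

def assign(records, modes):
    """Index of the closest mode for each record."""
    return [min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records]

def update_modes(records, labels, modes):
    """Per cluster, take the most frequent category of each attribute."""
    new_modes = []
    for j, old in enumerate(modes):
        members = [r for r, lab in zip(records, labels) if lab == j]
        if not members:
            new_modes.append(old)  # keep the old mode for an empty cluster
            continue
        new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                               for col in zip(*members)))
    return new_modes

def k_modes(records, modes, max_iter=20):
    """Alternate assignment and mode update until the labels stop changing."""
    labels = assign(records, modes)
    for _ in range(max_iter):
        modes = update_modes(records, labels, modes)
        new_labels = assign(records, modes)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, modes

# Toy categorical records (hypothetical): colour, size, shape.
records = [
    ("red", "small", "round"), ("red", "small", "oval"),
    ("red", "large", "round"), ("blue", "large", "square"),
    ("blue", "large", "oval"), ("blue", "small", "square"),
]
labels, modes = k_modes(records, [records[0], records[3]])
print(labels, modes)
```

In this baseline the number of clusters is fixed by the number of initial modes, which is exactly the limitation the penalized objective is meant to remove.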

11.

In model-based clustering, mixture models are used to group data points into clusters. A useful concept introduced for Gaussian mixtures by Malsiner Walli et al. (Stat Comput 26:303–324, 2016) is that of sparse finite mixtures, where the prior distribution on the weight distribution of a mixture with K components is chosen in such a way that a priori the number of clusters in the data is random and is allowed to be smaller than K with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it is illustrated that the concept of sparse finite mixtures is very generic and easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyper prior is considered for the parameters determining the weight distribution. By suitable matching of these priors, it is shown that the choice of this hyper prior is far more influential on the cluster solution than whether a sparse finite mixture or a Dirichlet process mixture is taken into consideration.


12.
Using the classical distribution-function approach to simple liquids, we estimate the orientational interaction between clusters consisting of a particle and its nearest neighbors. We show that there are density and temperature ranges where the interaction changes sign as a function of the cluster radius. On this basis, the corresponding model of interacting cubic and icosahedral clusters (of the type of a spin glass model) is proposed and solved in the replica-symmetric approximation. We show that the glass order parameter grows continuously on cooling and the replica-symmetry-breaking temperature can be identified with the glass transition temperature. We also show that on cooling a system of particles with a Lennard-Jones interaction, cubic clusters freeze first. The transition temperature for icosahedral clusters is somewhat lower; therefore, the cubic structure of the short-range order is more likely in a Lennard-Jones glass near transition.

13.
The TCLUST procedure performs robust clustering with the aim of finding clusters with different scatter structures and weights. An Eigenvalues Ratio constraint is considered by TCLUST in order to achieve a wide range of clustering alternatives depending on the allowed differences among cluster scatter matrices. Moreover, this constraint avoids finding uninteresting spurious clusters. In order to guarantee the robustness of the method against the presence of outliers and background noise, the method allows for trimming of a given proportion of observations self-determined by the data. Based on this “impartial trimming”, the procedure is assumed to have good robustness properties. As was done for the trimmed k-means method, this article studies robustness properties of the TCLUST procedure in the univariate case with two clusters by means of the influence function. The conclusion is that TCLUST has a robustness behavior close to that of the trimmed k-means, despite the fact that it addresses a more general clustering approach.

14.
This paper presents an extension of the standard regression tree method to clustered data. Previous work extending tree methods to accommodate correlated data is mainly based on the multivariate repeated-measures approach. We propose a “mixed effects regression tree” method where the correlated observations are viewed as nested within clusters rather than as vectors of multivariate repeated responses. The proposed method can handle unbalanced clusters, allows observations within clusters to be split, and can incorporate random effects and observation-level covariates. We implemented the proposed method using a standard tree algorithm within the framework of the expectation-maximization (EM) algorithm. The simulation results show that the proposed regression tree method provides substantial improvements over standard trees when the random effects are non-negligible. A real data example is used to illustrate the method.

15.
The primary model for cluster analysis is the latent class model. This model yields the mixture likelihood. Due to numerous local maxima, the success of the EM algorithm in maximizing the mixture likelihood depends on the initial starting point of the algorithm. In this article, good starting points for the EM algorithm are obtained by applying classification methods to randomly selected subsamples of the data. The performance of the resulting two-step algorithm, classification followed by EM, is compared to, and found superior to, the baseline algorithm of EM started from a random partition of the data. Though the algorithm is not complicated, comparing it to the baseline algorithm and assessing its performance with several classification methods is nontrivial. The strategy employed for comparing the algorithms is to identify canonical forms for the easiest and most difficult datasets to cluster within a large collection of cluster datasets and then to compare the performance of the two algorithms on these datasets. This has led to the discovery that, in the case of three homogeneous clusters, the most difficult datasets to cluster are those in which the clusters are arranged on a line and the easiest are those in which the clusters are arranged on an equilateral triangle. The performance of the two-step algorithm is assessed using several classification methods and is shown to be able to cluster large, difficult datasets consisting of three highly overlapping clusters arranged on a line with 10,000 observations and 8 variables.
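
The two-step idea, classification on a subsample followed by EM, can be sketched on a toy one-dimensional Gaussian mixture. Everything specific here is an assumption made for illustration: the "classification" step is a simple cut at the midpoint of the subsample's range (a stand-in for the classification methods the article evaluates), the subsample is every other point rather than a random draw so that the run is reproducible, and the data are invented. No numerical safeguards such as log-sum-exp are included.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def split_at_midpoint(sample):
    """Toy classification step: cut the sample at the midpoint of its range."""
    cut = (min(sample) + max(sample)) / 2.0
    return [[x for x in sample if x <= cut], [x for x in sample if x > cut]]

def params_from_partition(groups):
    """Turn a hard partition into (weight, mean, variance) per component."""
    n = sum(len(g) for g in groups)
    params = []
    for g in groups:
        mean = sum(g) / len(g)
        var = max(sum((x - mean) ** 2 for x in g) / len(g), 1e-6)
        params.append((len(g) / n, mean, var))
    return params

def em(data, params, n_iter=50):
    """Plain EM for a univariate Gaussian mixture, started from `params`."""
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            w = [pi * normal_pdf(x, m, v) for pi, m, v in params]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: re-estimate weights, means and variances.
        new_params = []
        for k in range(len(params)):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = max(sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk, 1e-6)
            new_params.append((nk / len(data), mean, var))
        params = new_params
    return params

# Toy data (hypothetical): two well-separated groups.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.2, 4.8, 5.1]
start = params_from_partition(split_at_midpoint(data[::2]))  # step 1
fitted = em(data, start)                                     # step 2
print(fitted)
```

The point of the design is only that EM receives its starting values from a cheap partition of part of the data instead of a random partition of all of it; on overlapping clusters the quality of that first partition matters far more than it does in this easy example.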

16.
We consider the asymptotic probability distribution of the size of a reversible random coagulation-fragmentation process in the thermodynamic limit. We prove that the distributions of small, medium and the largest clusters converge to Gaussian, Poisson and 0-1 distributions in the supercritical stage (post-gelation), respectively. We show also that the mutually dependent distributions of clusters will become independent after the occurrence of a gelation transition. Furthermore, it is proved that all the number distributions of clusters are mutually independent at the critical stage (gelation), but the distributions of medium and the largest clusters are mutually dependent with positive correlation coefficient in the supercritical stage. When the fragmentation strength goes to zero, there will exist only two types of clusters in the process: one type consists of the smallest clusters, and the other is the largest one, which has a size nearly equal to the volume (total number of units).

17.
We show that the Schrödinger propagator can be expanded in terms of resonances at energy levels at which a barrier separates the interaction region from infinity. The expansions hold for all times with errors small in the semi-classical parameter. As a byproduct we obtain a result on the approximation of clusters of resonant states by clusters of eigenfunctions of a self-adjoint reference operator.

18.
We consider random processes occurring on bond percolation clusters and represented as a generalization of the “divide and color model” introduced by Häggström in 2001. We investigate the asymptotic behaviors for bond percolation clusters with uncorrelated weights. For subcritical and supercritical phases, we prove the law of large numbers and central limit theorems in the models corresponding to the so-called quenched and annealed probabilities.

19.
Wave propagation is used in many fields for measurement and characterization. Corresponding multiphase models usually use a continuous approach. Nevertheless, systems like wetted rocks may be residually saturated in certain situations. In such cases, one fluid is distributed as clusters, each different in size and shape. A single continuous phase cannot account for a variety of fluid clusters that are either disconnected from each other or connected only via thin liquid films. Therefore, we present a model that considers a heterogeneous distribution of disconnected fluid clusters in the form of harmonic oscillators. These oscillators are described and distinguished by their mass, damping and eigenfrequency. Hence, the model makes it possible to characterize different clusters and includes an additional damping mechanism due to oscillations of the fluid clusters. (© 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)

20.
Generalized linear and additive models are very efficient regression tools, but many parameters have to be estimated if categorical predictors with many categories are included. The method proposed here focusses on the main effects of categorical predictors by using tree-type methods to obtain clusters of categories. When the predictor has many categories, one wants to know in particular which of the categories have to be distinguished with respect to their effect on the response. The tree-structured approach allows the detection of clusters of categories that share the same effect while letting other predictors, in particular metric predictors, have a linear or additive effect on the response. An algorithm for the fitting is proposed and various stopping criteria are evaluated. The preferred stopping criterion is based on p values representing a conditional inference procedure. In addition, the stability of clusters is investigated, and the relevance of predictors is assessed by bootstrap methods. Several applications show the usefulness of the tree-structured approach, and small simulation studies demonstrate that the fitting procedure works well.
