20 similar articles found (search time: 15 ms)
1.
This paper presents an approach to the local stereo correspondence problem. The primitives, or features, used are groups of collinear connected edge points called segments. Each segment has several associated attributes. We have verified that the attribute differences for true matches cluster in a cloud around a center. For each candidate pair of primitives we therefore compute the distance between the difference of its attributes and the cluster center. The correspondence is established on the basis of the minimum distance criterion (similarity constraint). We have designed an image understanding system to learn the best representative cluster center. For this purpose, a new learning method is derived from the Fuzzy c-Means (FcM) algorithm, in which the dispersion of the true samples in the cluster is taken into account through the Mahalanobis distance. This is the main contribution of the paper. The better performance of the proposed local stereo-matching learning method is illustrated by a comparative analysis against classical local methods without learning.
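The minimum-distance criterion described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: the two-attribute difference vectors, the cluster center, the covariance matrix, and the function names `mahalanobis_2d` and `best_match` are all hypothetical, and the learning of the center via the modified FcM procedure is not shown.

```python
import math

def mahalanobis_2d(x, center, cov):
    """Mahalanobis distance of a 2-D vector x from center under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must have a positive determinant")
    # Closed-form inverse of the 2x2 covariance matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - center[0], x[1] - center[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

def best_match(candidates, center, cov):
    """Index of the candidate attribute-difference vector closest to the center."""
    return min(range(len(candidates)),
               key=lambda i: mahalanobis_2d(candidates[i], center, cov))

# Hypothetical cluster of true matches: wide spread in the first attribute,
# narrow spread in the second.
center = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 0.25))
candidates = [(2.0, 0.0), (0.0, 1.5), (3.0, 1.0)]
chosen = best_match(candidates, center, cov)
print(chosen)
```

With these made-up numbers the second candidate is the closest in plain Euclidean distance, but the first is selected once the dispersion of the cluster is taken into account, which is the effect the abstract attributes to the Mahalanobis distance.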
2.
3.
The fast-growing number of complete genome sequences prompts the development of new phylogenetic approaches. Until recently, understanding the phylogeny of prokaryotes was based on the comparison of highly conserved genes. Several novel whole-genome methods have been proposed during the last few years. Here, we present a novel method of taxonomic analysis, constructed on the basis of gene content and lengths of orthologous genes of 66 completely sequenced genomes of unicellular organisms using Clusters of Orthologous Groups (COGs). Our input data consist of average protein lengths related to ~5000 COGs from 66 genomes. We clustered these data using an application of the information bottleneck method for unsupervised clustering. This approach is not a regular distance-based method, which distinguishes it from other recently published whole-genome based clustering techniques. Although our comprehensive genome clustering is independent of phylogenies based on the level of homology of individual genes, it correlates well with the standard “tree of life” based on sequence similarity of 16S rRNA.
4.
《Journal of computational science》2013,4(4):219-231
In this paper we propose two new competitive unsupervised clustering algorithms. The first algorithm deals with ultrametric data and has a computational cost of O(n). The second algorithm has two strong features: it is fast, and it is flexible with respect to both the type of data processed and the precision. The second algorithm has a computational cost of O(n^2) in the worst case and O(n) in the average case. These complexities are due to the exploitation of ultrametric distance properties. In the first method, we use the order induced by an ultrametric in a given space to demonstrate how data proximity can be explored quickly. In the second method, we create an ultrametric space from a data sample, chosen uniformly at random, in order to obtain a global view of proximities in the data set according to the similarity criterion. We then use this proximity profile to cluster the global set. We present an example of our algorithms and compare their results with those of a classic clustering method.
5.
Clustering is an important problem in data mining. It can be formulated as a nonsmooth, nonconvex optimization problem. For most global optimization techniques this problem is challenging even on medium-sized data sets. In this paper, we propose an approach that allows one to apply local methods of smooth optimization to solve clustering problems. We apply an incremental approach to generate starting points for cluster centers, which enables us to deal with the nonconvexity of the problem. The hyperbolic smoothing technique is applied to handle the nonsmoothness of the clustering problems and to make possible the application of smooth optimization algorithms to solve them. Results of numerical experiments with eleven real-world data sets, and the comparison with state-of-the-art incremental clustering algorithms, demonstrate that smooth optimization algorithms in combination with the incremental approach are a powerful alternative to existing clustering algorithms.
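As a rough illustration of the smoothing idea mentioned above: one widely used hyperbolic smoothing replaces the nonsmooth function max(0, y) by (y + sqrt(y^2 + tau^2)) / 2, which is differentiable everywhere for tau > 0. Whether the paper uses exactly this form is an assumption, and the function name `hyperbolic_smooth_max0` is hypothetical.

```python
import math

def hyperbolic_smooth_max0(y, tau):
    """Smooth approximation of max(0, y); smaller tau > 0 gives a tighter fit."""
    return 0.5 * (y + math.sqrt(y * y + tau * tau))

# The approximation error is largest at y = 0, where it equals tau / 2,
# and it shrinks as tau is reduced.
for tau in (1.0, 0.1, 0.01):
    worst = max(abs(hyperbolic_smooth_max0(y, tau) - max(0.0, y))
                for y in (-2.0, -0.5, 0.0, 0.5, 2.0))
    print(tau, worst)
```

A smooth solver can then be run for a decreasing sequence of tau values, each run starting from the previous solution.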
6.
Rivera-García, Diego; García-Escudero, Luis A.; Mayo-Iscar, Agustín; Ortega, Joaquín 《Advances in Data Analysis and Classification》2019,13(1):201-225
Many clustering algorithms for data that are curves or functions have recently been proposed. However, the presence of contamination in the sample of...
7.
M. Crane, V. Patrangenaru 《Journal of multivariate analysis》2011,102(2):225-237
In this article we develop a nonparametric methodology for estimating the mean change for matched samples on a Lie group. We then note that for k≥5, a manifold of projective shapes of k-ads in 3D has the structure of a (3k−15)-dimensional Lie group that is equivariantly embedded in a Euclidean space; testing for mean change therefore amounts to a one-sample test for extrinsic means on this Lie group. The Lie group technique leads to a large-sample test and a nonparametric bootstrap test for one population extrinsic mean on a projective shape space, as recently developed by Patrangenaru, Liu and Sughatadasa. On the other hand, in the absence of occlusions, the 3D projective shape of a spatial k-ad can be recovered from a stereo pair of images, thus allowing one to test for mean glaucomatous change in 3D projective shape from standard stereo-pair eye images.
8.
We introduce a method for edge detection based on clustering the pixels of a given digital image into two sets (edge pixels and non-edge pixels). The process associates with each pixel a vector representing the differences in brightness with respect to the surrounding pixels. Clustering is driven by the norms of these vectors, so it takes place in \(\mathbb{R}\), which allows us to use a simple DC (Difference of Convex) optimization algorithm to obtain the clusters. A novel thinning technique, based on calculation of the edge phase angles, refines the classification obtained by the clustering algorithm. The results of some numerical experiments are also provided.
9.
Jenny Bryan 《Journal of multivariate analysis》2004,90(1):44-66
In this work, we assess the suitability of cluster analysis for the gene grouping problem confronted with microarray data. Gene clustering is the exercise of grouping genes based on attributes, which are generally the expression levels over a number of conditions or subpopulations. The hope is that similarity with respect to expression is often indicative of similarity with respect to much more fundamental and elusive qualities, such as function. By formally defining the true gene-specific attributes as parameters, such as expected expression across the conditions, we obtain a well-defined gene clustering parameter of interest, which greatly facilitates the statistical treatment of gene clustering. We point out that genome-wide collections of expression trajectories often lack natural clustering structure, prior to ad hoc gene filtering. The gene filters in common use induce a certain circularity to most gene cluster analyses: genes are points in the attribute space, a filter is applied to depopulate certain areas of the space, and then clusters are sought (and often found!) in the “cleaned” attribute space. As a result, statistical investigations of cluster number and clustering strength are just as much a study of the stringency and nature of the filter as they are of any biological gene clusters. In the absence of natural clusters, gene clustering may still be a worthwhile exercise in data segmentation. In this context, partitions can be fruitfully encoded in adjacency matrices and the sampling distribution of such matrices can be studied with a variety of bootstrapping techniques.
10.
The problem of file organization which we consider involves altering the placement of records on pages of a secondary storage device. In addition, we want this reorganization to be done in-place, i.e., using the file's original storage space for the newly reorganized file. The motivation for such a physical change is to improve the database system's performance. For example, by placing frequently and jointly accessed records on the same page or pages, we can try to minimize the number of page accesses made in answering a set of queries. The optimal assignment (or reassignment) of records to clusters is exactly what record clustering algorithms attempt to do. However, record clustering algorithms usually do not solve the entire problem, i.e., they do not specify how to efficiently reorganize the file to reflect the clustering assignment which they determine. Our algorithm is a companion to general record clustering algorithms since it actually transforms the file. The problem of optimal file reorganization is NP-hard. Consequently, our reorganization algorithm is based on heuristics. The algorithm's time and space requirements are reasonable and its solution is near optimal. In addition, the reorganization problem which we consider in this paper is similar to the problem of join processing when indexes are used. The research of this author was partially supported by the National Science Foundation under grant IST-8696157.
11.
A clustering methodology based on biological visual models, which imitates how humans visually cluster data by spatially associating patterns, has recently been proposed. The method is based on Cellular Neural Networks and some resolution adjustments. The Cellular Neural Network rebuilds low-density areas, while different resolutions are tried to find the best clustering option. The algorithm has demonstrated good performance compared to other clustering techniques. However, its main drawbacks are its inability to operate on data sets with more than two dimensions and the computational time required by the resolution adjustment mechanism. This paper proposes a new version of this clustering methodology to address these flaws. In the new approach, a pre-processing stage is incorporated, featuring a Self-Organization Map that maps complex high-dimensional relations into a reduced lattice while preserving the topological organization of the initial data set. This reduced representation is employed as the two-dimensional data set for further processing. In the new version, the resolution adjustment process is also accelerated through an optimization method that combines the Hill-Climbing and Random Search techniques. Rather than evaluating all possible resolutions, the optimization strategy finds the best resolution for a clustering problem in a limited number of iterations. The proposed approach has been evaluated on several two-dimensional and high-dimensional data sets. Experimental evidence shows that the proposed algorithm performs the clustering task on complex problems on average 46% faster than the original method. The approach is also compared to other popular clustering techniques reported in the literature. Computational experiments demonstrate competitive results in comparison to other algorithms in terms of accuracy and robustness.
12.
A parity subgraph of a graph is a spanning subgraph such that the degree of each vertex has the same parity in both the subgraph and the original graph. Known results include that every graph has an odd number of minimal parity subgraphs. Define a disparity subgraph to be a spanning subgraph such that each vertex has degrees of opposite parities in the subgraph and the original graph. (Only graphs with all even-order components can have disparity subgraphs.) Every even-order spanning tree contains both a unique parity subgraph and a unique disparity subgraph. Moreover, every minimal disparity subgraph is shown to be paired by sharing a spanning tree with an odd number of minimal parity subgraphs, and every minimal parity subgraph is similarly paired with either one or an even number of minimal disparity subgraphs.
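The two definitions above can be checked by brute force on a small example. The sketch below assumes a simple graph given as an edge list and takes the graph to be a four-vertex path, which is its own spanning tree of even order; the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def degrees(vertices, edges):
    """Degree of every vertex in the spanning subgraph given by edges."""
    deg = Counter({v: 0 for v in vertices})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def is_parity_subgraph(vertices, graph_edges, sub_edges):
    """True if every vertex degree has the same parity in subgraph and graph."""
    g, s = degrees(vertices, graph_edges), degrees(vertices, sub_edges)
    return all(g[v] % 2 == s[v] % 2 for v in vertices)

def is_disparity_subgraph(vertices, graph_edges, sub_edges):
    """True if every vertex degree has opposite parity in subgraph and graph."""
    g, s = degrees(vertices, graph_edges), degrees(vertices, sub_edges)
    return all(g[v] % 2 != s[v] % 2 for v in vertices)

def spanning_subgraphs(edges):
    """Yield every subset of the edge list (each is a spanning subgraph)."""
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            yield list(subset)

vertices = ["a", "b", "c", "d"]
path = [("a", "b"), ("b", "c"), ("c", "d")]
parity = [s for s in spanning_subgraphs(path)
          if is_parity_subgraph(vertices, path, s)]
disparity = [s for s in spanning_subgraphs(path)
             if is_disparity_subgraph(vertices, path, s)]
print(parity)
print(disparity)
```

On this path exactly one edge subset passes each test, in line with the statement about even-order trees: the whole path is the parity subgraph and the single middle edge is the disparity subgraph.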
13.
In this paper, a version of the K-median problem, one of the most popular and best-studied clustering measures, is discussed. The model considered uses squared Euclidean distance terms, to which the K-means algorithm has been successfully applied. A fast and robust algorithm based on DC (Difference of Convex functions) programming and DC Algorithms (DCA) is investigated. Preliminary numerical results on real-world databases show the efficiency and the superiority of the appropriate DCA with respect to the standard K-means algorithm.
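For orientation, the squared-Euclidean objective that the abstract refers to, together with one iteration of the standard K-means (Lloyd) baseline, can be sketched as below. This is the baseline only, not the DCA scheme of the paper; the data, the starting centers, and the function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def clustering_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def lloyd_step(points, centers):
    """Assign each point to its nearest center, then move centers to group means."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        groups[j].append(p)
    new_centers = []
    for group, old in zip(groups, centers):
        if group:
            new_centers.append(tuple(sum(col) / len(group) for col in zip(*group)))
        else:
            new_centers.append(old)  # keep an empty cluster's center unchanged
    return new_centers

points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
centers = [(0.0, 0.0), (1.0, 0.0)]
before = clustering_cost(points, centers)
centers = lloyd_step(points, centers)
after = clustering_cost(points, centers)
print(before, after)
```

The objective is nonconvex in the centers, so a step like this only guarantees that the cost does not increase; the result depends on the starting centers.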
14.
Clustering analysis plays an important role in the field of data mining. Hierarchical clustering is becoming one of the most widely used clustering techniques. However, most hierarchical clustering algorithms cannot meet the requirements of high execution efficiency and high clustering accuracy at the same time. After analyzing the advantages and disadvantages of the hierarchical algorithms, the paper puts forward a two-stage clustering algorithm, named Chameleon Based on Clustering Feature Tree (CBCFT), which hybridizes the Clustering Tree of the BIRCH algorithm with the CHAMELEON algorithm. By calculating the time complexity of CBCFT, the paper argues that it increases linearly with the number of data points. By experimenting on a sample data set, the paper demonstrates that CBCFT is able to identify clusters with large variance in size and shape and is robust to outliers. Moreover, the result of CBCFT is similar to that of CHAMELEON, but CBCFT overcomes CHAMELEON's low execution efficiency. Although the execution time of CBCFT is longer than that of BIRCH, its clustering result is much more satisfactory. Finally, through a case study of customer segmentation at the Hubei branch of Chinese Petroleum Corp., the paper demonstrates that the clustering result is meaningful and useful.
The research is partially supported by National Natural Science Foundation of China (grants #70372049 and #70121001).
15.
《Nonlinear Analysis: Hybrid Systems》2008,2(3):735-749
This paper deals with the modelling of switching systems and focuses on the characterization of the local functioning modes using an online clustering approach. The system considered is represented as a weighted sum of local linear models, where each model may have its own structure. This implies that the parameters and the order of the switching system may change when the system switches. Moreover, possible constants of the local models are also unknown. The method presented consists of two steps. First, an online estimation method for the Markov parameter matrix of the local linear models is established. Second, these parameters are labelled using a dynamical decision space worked out with learning techniques, each local model being represented by a cluster. The paper ends with an example and a discussion aimed at illustrating the method’s performance.
16.
A novel sparse spectral clustering method using linear algebra techniques is proposed. Spectral clustering methods solve an eigenvalue problem containing a graph Laplacian. The proposed method exploits the structure of the Laplacian to construct an approximation, not in terms of a low rank approximation but in terms of capturing the structure of the matrix. With this approximation, the size of the eigenvalue problem can be reduced. To obtain the indicator vectors from the eigenvectors the method proposed by Zha et al. (2002) [26], which computes a pivoted LQ factorization of the eigenvector matrix, is adapted. This formulation also gives the possibility to extend the method to out-of-sample points.
17.
Dynamic clustering for interval data based on L2 distance (cited 2 times: 0 self-citations, 2 by others)
Francisco de A. T. de Carvalho, Paula Brito, Hans-Hermann Bock 《Computational Statistics》2006,21(2):231-250
Summary: This paper introduces a partitioning clustering method for objects described by interval data. It follows the dynamic clustering approach and uses an L2 distance. Particular emphasis is put on the standardization problem, for which we propose and investigate three standardization techniques for interval-type variables. Moreover, various tools for cluster interpretation are presented and illustrated with simulated and real-case data.
18.
One of the most promising approaches for clustering is based on methods of mathematical programming. In this paper we propose new optimization methods based on DC (Difference of Convex functions) programming for hierarchical clustering. A bilevel hierarchical clustering model is considered with different optimization formulations. They are all nonconvex, nonsmooth optimization problems, for which we investigate attractive DC optimization algorithms called DCA. Numerical results on some artificial and real-world databases are reported. The results demonstrate that the proposed algorithms are more efficient than related existing methods.
19.
《数学的实践与认识》2015,(13)
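No unified model has yet been established for the saliency-based visual attention mechanism that simulates human vision. Analysis of human vision shows that conspicuous, compact, and high-contrast targets attract the eye more strongly. On this basis, a salient-object visual attention algorithm that combines global contrast with random walks is proposed, and the visual saliency detection problem is cast as a Markov random walk problem. First, the global contrast of the color and orientation of the input image is computed to form feature vectors; the distances between these vectors determine the edge weights of the graph representation, from which the transition matrix of the random walk model is constructed. Random walks on a fully connected graph and on a k-regular graph are then used to extract the global and local properties of the image, respectively, and the two are combined to obtain the saliency map, from which the salient objects are determined. Simulation experiments on existing public international test sets, with comparisons against other saliency-based visual attention detection methods, show that the detection results of the method are more accurate and reasonable, demonstrating that the algorithm is practical and feasible.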
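The construction of the transition matrix from feature distances can be sketched as follows for the fully connected case. Several choices here are assumptions rather than details given in the abstract: the edge weight is taken to be the plain Euclidean distance between feature vectors, the saliency score is taken to be the stationary distribution of the walk, and the one-dimensional features and function names are hypothetical. The k-regular graph and the fusion of the two maps are not shown.

```python
import math

def transition_matrix(features):
    """Row-normalized edge weights; weight = Euclidean feature distance (assumed)."""
    n = len(features)
    weights = [[0.0 if i == j else math.dist(features[i], features[j])
                for j in range(n)] for i in range(n)]
    matrix = []
    for row in weights:
        total = sum(row)
        if total == 0.0:
            raise ValueError("a node has zero distance to every other node")
        matrix.append([w / total for w in row])
    return matrix

def stationary_distribution(matrix, iters=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

# Four hypothetical image regions; the last one contrasts with the rest.
features = [(0.1,), (0.2,), (0.3,), (0.9,)]
P = transition_matrix(features)
saliency = stationary_distribution(P)
print(saliency)
```

Because the weights are symmetric, the stationary probability of a node is proportional to the sum of its edge weights, so the region whose features differ most from all others receives the highest score.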
20.
The following mixture model-based clustering methods are compared in a simulation study with one-dimensional data, fixed number of clusters and a focus on outliers and uniform “noise”: an ML-estimator (MLE) for Gaussian mixtures, an MLE for a mixture of Gaussians and a uniform distribution (interpreted as “noise component” to catch outliers), an MLE for a mixture of Gaussian distributions where a uniform distribution over the range of the data is fixed (Fraley and Raftery in Comput J 41:578–588, 1998), a pseudo-MLE for a Gaussian mixture with improper fixed constant over the real line to catch “noise” (RIMLE; Hennig in Ann Stat 32(4): 1313–1340, 2004), and MLEs for mixtures of t-distributions with and without estimation of the degrees of freedom (McLachlan and Peel in Stat Comput 10(4):339–348, 2000). The RIMLE (using a method to choose the fixed constant first proposed in Coretto, The noise component in model-based clustering. Ph.D thesis, Department of Statistical Science, University College London, 2008) is the best method in some, and acceptable in all, simulation setups, and can therefore be recommended.
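To make the role of the uniform “noise component” concrete, the sketch below evaluates, for a one-dimensional mixture of Gaussians plus a uniform distribution on a fixed interval, the posterior probability that a point belongs to the noise component. It only evaluates a model with given parameters and does not implement any of the estimators compared in the study; all parameter values and function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def noise_posterior(x, weights, mus, sigmas, noise_weight, lo, hi):
    """Posterior probability that x was generated by the uniform noise on [lo, hi]."""
    noise = noise_weight / (hi - lo) if lo <= x <= hi else 0.0
    gauss = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
    total = noise + gauss
    return noise / total if total > 0.0 else 0.0

# Two Gaussian clusters plus 10% uniform noise over the interval [-10, 20].
params = dict(weights=[0.45, 0.45], mus=[0.0, 5.0], sigmas=[1.0, 1.0],
              noise_weight=0.1, lo=-10.0, hi=20.0)
near_cluster = noise_posterior(0.0, **params)
far_outlier = noise_posterior(15.0, **params)
print(near_cluster, far_outlier)
```

A point at a cluster center is attributed to the noise component with small probability, while a point far from both clusters is attributed to it almost entirely, which is how such a component absorbs outliers instead of distorting the Gaussian estimates.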