首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 549 毫秒
1.
We formulate a discrete optimization problem that leads to a simple and informative derivation of a widely used class of spectral clustering algorithms. Regarding the algorithms as attempting to bi-partition a weighted graph with N vertices, our derivation indicates that they are inherently tuned to tolerate all partitions into two non-empty sets, independently of the cardinality of the two sets. This approach also helps to explain the difference in behaviour observed between methods based on the unnormalized and normalized graph Laplacian. We also give a direct explanation of why Laplacian eigenvectors beyond the Fiedler vector may contain fine-detail information of relevance to clustering. We show numerical results on synthetic data to support the analysis. Further, we provide examples where normalized and unnormalized spectral clustering is applied to microarray data—here the graph summarizes similarity of gene activity across different tissue samples, and accurate clustering of samples is a key task in bioinformatics.  相似文献   

2.
与一般相似度函数相关的谱聚类的收敛性   总被引:1,自引:0,他引:1       下载免费PDF全文
谱聚类算法由与相似度函数相关的图Laplace 算子的特征函数产生. 本文证明与一般相似度函数相关的谱聚类算法的收敛性, 并使用覆盖数方法对收敛性给出量化估计. 当相似度函数是欧氏空间子集上一个Lipschitz s > 0 函数时, O(√log(n + 1)/√n) 形式的收敛率得到证实. 我们同时指出一个相应函数集的覆盖数的增长可以表现任意差.  相似文献   

3.
In this paper we study a special class of multiobjective discrete control problems on dynamic networks. We assume that the dynamics of the system is controlled by p actors (players) and each of them intend to minimize his own integral-time cost by a certain trajectory. Applying Nash and Pareto optimality principles we study multiobjective control problems on dynamic networks where the dynamics is described by a directed graph.Polynomial-time algorithms for determining the optimal strategies of the players in the considered multiobjective control problems are proposed exploiting the special structure of the underlying graph. Properties of time-expanded networks are characterized. A constructive scheme which consists of several algorithms is presented.  相似文献   

4.
One of the most significant discussions in the field of machine learning today is on the clustering ensemble. The clustering ensemble combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known for their high ability to solve optimization problems, especially the problem of the clustering ensemble. To date, despite the major contributions to find consensus cluster partitions with application of genetic algorithms, there has been little discussion on population initialization through generative mechanisms in genetic-based clustering ensemble algorithms as well as the production of cluster partitions with favorable fitness values in first phase clustering ensembles. In this paper, a threshold fuzzy C-means algorithm, named TFCM, is proposed to solve the problem of diversity of clustering, one of the most common problems in clustering ensembles. Moreover, TFCM is able to increase the fitness of cluster partitions, such that it improves performance of genetic-based clustering ensemble algorithms. The fitness average of cluster partitions generated by TFCM are evaluated by three different objective functions and compared against other clustering algorithms. In this paper, a simple genetic-based clustering ensemble algorithm, named SGCE, is proposed, in which cluster partitions generated by the TFCM and other clustering algorithms are used as the initial population used by the SGCE. The performance of the SGCE is evaluated and compared based on the different initial populations used. The experimental results based on eleven real world datasets demonstrate that TFCM improves the fitness of cluster partitions and that the performance of the SGCE is enhanced using initial populations generated by the TFCM.  相似文献   

5.
Many machine learning based algorithms contain a training step that is done once. The training step is usually computational expensive since it involves processing of huge matrices. If the training profile is extracted from an evolving dynamic dataset, it has to be updated as some features of the training dataset are changed. This paper proposes a solution how to update this profile efficiently. Therefore, we investigate how to update the training profile when the data is constantly evolving. We assume that the data is modeled by a kernel method and processed by a spectral decomposition. In many algorithms for clustering and classification, a low dimensional representation of the affinity (kernel) graph of the embedded training dataset is computed. Then, it is used for classifying newly arrived data points. We present methods for updating such embeddings of the training datasets in an incremental way without the need to perform the entire computation upon the occurrences of changes in a small number of the training samples. Efficient computation of such an algorithm is critical in many web based applications.  相似文献   

6.
An approach for factoring general boolean functions was described in Golumbic and Mintz [Factoring logic functions using graph partitioning, in: Proceedings of IEEE/ACM International Conference on Computer Aided Design, November 1999, pp. 195-198] and Mintz and Golumbic [Factoring Boolean functions using graph partitioning, Discrete Appl. Math. 149 (2005) 131-153] which is based on graph partitioning algorithms. In this paper, we present a very fast algorithm for recognizing and factoring read-once functions which is needed as a dedicated factoring subroutine to handle the lower levels of that factoring process. The algorithm is based on algorithms for cograph recognition and on checking normality.For non-read-once functions, we investigate their factoring based on their corresponding graph classes. In particular, we show that if a function F is normal and its corresponding graph is a partial k-tree, then F is a read 2k function and a read 2k formula for F can be obtained in polynomial time.  相似文献   

7.
Fitness landscape theory is a mathematical framework for numerical analysis of search algorithms on combinatorial optimization problems. We study a representation of fitness landscape as a weighted directed graph. We consider out forest and in forest structures in this graph and establish important relationships among the forest structures of a directed graph, the spectral properties of the Laplacian matrices, and the numbers of local optima of the landscape. These relationships provide a new approach for computing the numbers of local optima for various problem instances and neighborhood structures.  相似文献   

8.
This is primarily an expository paper surveying up-to-date known results on the spectral theory of 1-Laplacian on graphs and its applications to the Cheeger cut, maxcut and multi-cut problems. The structure of eigenspace, nodal domains, multiplicities of eigenvalues, and algorithms for graph cuts are collected.  相似文献   

9.
Graph Theoretic and Spectral Analysis of Enron Email Data   总被引:1,自引:0,他引:1  
Analysis of social networks to identify communities and model their evolution has been an active area of recent research. This paper analyzes the Enron email data set to discover structures within the organization. The analysis is based on constructing an email graph and studying its properties with both graph theoretical and spectral analysis techniques. The graph theoretical analysis includes the computation of several graph metrics such as degree distribution, average distance ratio, clustering coefficient and compactness over the email graph. The spectral analysis shows that the email adjacency matrix has a rank-2 approximation. It is shown that preprocessing of data has significant impact on the results, thus a standard form is needed for establishing a benchmark data. Anurat Chapanond is currently a Ph.D. student in Computer Science, RPI. Anurat graduated B. Eng. degree in Computer Engineering from Chiangmai University (Thailand) in 1997, M. S. in Computer Science from Columbia University in 2002. His research interest is in web data mining analyses and algorithms. M.S. Krishnamoorthy received the B.E. degree (with honors) from Madras University in 1969, the M. Tech degree in Electrical Engineering from the Indian Institute of Technology, Kanpur, in 1971, and the Ph. D. degree in Computer Science, also from the Indian Institute of Technology, in 1976. From 1976 to 1979, he was an Assistant Professor of Computer Science at the Indian Institute of Technology, Kanpur. From 1979 to 1985, he was an Assistant Professor of Computer Science at Rensselaer Polytechnic Institute, Troy, NY, and since, 1985, he has been an Associate Professor of Computer Science at Rensselaer. Dr. Krishnamoorthy's research interests are in the design and analysis of combinatorial and algebraic algorithms, visualization algorithms and programming environments. Bulent Yener is an Associate Professor in the Department of Computer Science and Co-Director of Pervasive Computing and Networking Center at Rensselaer Polytechnic Institute in Troy, New York. He is also a member of Griffiss Institute of Information Assurance. Dr. Yener received MS. and Ph.D. degrees in Computer Science, both from Columbia University, in 1987 and 1994, respectively. Before joining to RPI, he was a Member of Technical Staff at the Bell Laboratories in Murray Hill, New Jersey. His current research interests include bioinformatics, medical informtatics, routing problems in wireless networks, security and information assurance, intelligence and security informatics. He has served on the Technical Program Committee of leading IEEE conferences and workshops. Currently He is an associate editor of ACM/Kluwer Winet journal and the IEEE Network Magazine. Dr. Yener is a Senior Member of the IEEE Computer Society.  相似文献   

10.
Tag SNP selection is an important problem in genetic association studies. A class of algorithms to perform this task, among them a popular tool called Tagger, can be described as searching for a minimal vertex cover of a graph. In this article this approach is contrasted with a recently introduced clustering algorithm based on the graph theoretical concept of dominant sets. To compare the performance of both procedures comprehensive simulation studies have been performed using SNP data from the ten ENCODE regions included in the HapMap project. Quantitative traits have been simulated from additive models with a single causative SNP. Simulation results suggest that clustering performs always at least as good as Tagger, while in more than a third of the considered instances substantial improvement can be observed. Additionally an extension of the clustering algorithm is described which can be used for larger genomic data sets.  相似文献   

11.
We introduce a new kind of graph called multi-edge graph which arises in many applications such as finite state Markov chains and broadcasting communication networks. A cluster in such a graph is a set of nodes such that for any ordered pair of nodes, there is a path of multi-edges from one to the other such that these edges remain within the same set. We give two algorithms to partition a multi-edge graph into maximal clusters. Both these algorithms are based on the depth-first search algorithm to find strongly connected components of the directed graph. We also discuss some applications of clustering in multi-edge graphs.  相似文献   

12.
帅天平  胡晓东 《应用数学》2005,18(3):411-416
本文讨论了一类在线变尺寸装箱问题,假定箱子的尺寸可以是不同的.箱子是在线到达的,仅当箱子到达后其尺寸才知道.给定一个带有核元的物品表及其上的核元关系图.我们的目标是要将表中元素装入到达的箱子中,保证任何箱子所装物品不互为核元,即所装物品对应的点所导出的子图是个空图,并使得所用的箱子总长最小.我们证明了该问题是NPHard的,并给出了基于图的点染色、图的团分解和基于背包问题的近似算法,给出了算法的时间复杂度和性能界.  相似文献   

13.
Graph complexity measures such as tree-width, clique-width and rank-width are important because they yield Fixed Parameter Tractable algorithms. These algorithms are based on hierarchical decompositions of the considered graphs, and on boundedness conditions on the graph operations that, more or less explicitly, recombine the components of decompositions into larger graphs. Rank-width is defined in a combinatorial way, based on a tree, and not in terms of graph operations. We define operations on graphs that characterize rank-width and help for the construction of Fixed Parameter Tractable algorithms, especially for problems specified in monadic second-order logic.  相似文献   

14.
We present efficient (parallel) algorithms for two hierarchical clustering heuristics. We point out that these heuristics can also be applied to solving some algorithmic problems in graphs, including split decomposition. We show that efficient parallel split decomposition induces an efficient parallel parity graph recognition algorithm. This is a consequence of the result of S. Cicerone and D. Di Stefano [[7]] that parity graphs are exactly those graphs that can be split decomposed into cliques and bipartite graphs.  相似文献   

15.
Finding complete subgraphs in a graph, that is, cliques, is a key problem and has many real-world applications, e.g., finding communities in social networks, clustering gene expression data, modeling ecological niches in food webs, and describing chemicals in a substance. The problem of finding the largest clique in a graph is a well-known difficult combinatorial optimization problem and is called the maximum clique problem. In this paper, we formulate a very convenient continuous characterization of the maximum clique problem based on the symmetric rank-one non-negative approximation of a given matrix and build a one-to-one correspondence between stationary points of our formulation and cliques of a given graph. In particular, we show that the local (resp. global) minima of the continuous problem corresponds to the maximal (resp. maximum) cliques of the given graph. We also propose a new and efficient clique finding algorithm based on our continuous formulation and test it on the DIMACS data sets to show that the new algorithm outperforms other existing algorithms based on the Motzkin–Straus formulation and can compete with a sophisticated combinatorial heuristic.  相似文献   

16.
Given a row-stochastic matrix describing pairwise similarities between data objects, spectral clustering makes use of the eigenvectors of this matrix to perform dimensionality reduction for clustering in fewer dimensions. One example from this class of algorithms is the Robust Perron Cluster Analysis (PCCA+), which delivers a fuzzy clustering. Originally developed for clustering the state space of Markov chains, the method became popular as a versatile tool for general data classification problems. The robustness of PCCA+, however, cannot be explained by previous perturbation results, because the matrices in typical applications do not comply with the two main requirements: reversibility and nearly decomposability. We therefore demonstrate in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.  相似文献   

17.
We consider the problem of subspace clustering with data that is potentially corrupted by both dense noise and sparse gross errors. In particular, we study a recently proposed low rank subspace clustering approach based on a nonconvex modeling formulation. This formulation includes a nonconvex spectral function in the objective function that makes the optimization task challenging, e.g., it is unknown whether the alternating direction method of multipliers (ADMM) framework proposed to solve the nonconvex model formulation is provably convergent. In this paper, we establish that the spectral function is differentiable and give a formula for computing the derivative. Moreover, we show that the derivative of the spectral function is Lipschitz continuous and provide an explicit value for the Lipschitz constant. These facts are then used to provide a lower bound for how the penalty parameter in the ADMM method should be chosen. As long as the penalty parameter is chosen according to this bound, we show that the ADMM algorithm computes iterates that have a limit point satisfying first-order optimality conditions. We also present a second strategy for solving the nonconvex problem that is based on proximal gradient calculations. The convergence and performance of the algorithms is verified through experiments on real data from face and digit clustering and motion segmentation.  相似文献   

18.
This paper discusses the graph covering problem in which a set of edges in an edge- and node-weighted graph is chosen to satisfy some covering constraints while minimizing the sum of the weights. In this problem, because of the large integrality gap of a naive linear programming (LP) relaxation, LP rounding algorithms based on the relaxation yield poor performance. Here we propose a stronger LP relaxation for the graph covering problem. The proposed relaxation is applied to designing primal–dual algorithms for two fundamental graph covering problems: the prize-collecting edge dominating set problem and the multicut problem in trees. Our algorithms are an exact polynomial-time algorithm for the former problem, and a 2-approximation algorithm for the latter problem. These results match the currently known best results for purely edge-weighted graphs.  相似文献   

19.
The defining characteristic of fixed interval scheduling problems is that each job has a finite number of fixed processing intervals. A job can be processed only in one of its intervals on one of the available machines, or is not processed at all. A decision has to be made about a subset of the jobs to be processed and their assignment to the processing intervals such that the intervals on the same machine do not intersect. These problems arise naturally in different real-life operations planning situations, including the assignment of transports to loading/unloading terminals, work planning for personnel, computer wiring, bandwidth allocation of communication channels, printed circuit board manufacturing, gene identification and examining computer memory structures. We present a general formulation of the interval scheduling problem, show its relations to cognate problems in graph theory, and survey existing models, results on computational complexity and solution algorithms.  相似文献   

20.
One of the most promising approaches for clustering is based on methods of mathematical programming. In this paper we propose new optimization methods based on DC (Difference of Convex functions) programming for hierarchical clustering. A bilevel hierarchical clustering model is considered with different optimization formulations. They are all nonconvex, nonsmooth optimization problems for which we investigate attractive DC optimization Algorithms called DCA. Numerical results on some artificial and real-world databases are reported. The results demonstrate that the proposed algorithms are more efficient than related existing methods.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号