Similar literature
Found 20 similar articles (search time: 0 ms)
1.
In this paper, we develop a family of data clustering algorithms that combine the strengths of existing spectral approaches to clustering with various desirable properties of fuzzy methods. In particular, we show that the developed method, "Fuzzy-RW," outperforms other frequently used algorithms on data sets with different geometries. As applications, we discuss data clustering of biological and face recognition benchmarks such as the IRIS and YALE face data sets.

2.

In this paper, an ordinal multilevel latent Markov model based on separate random effects is proposed. In detail, two distinct second-level discrete effects are considered in the model, one affecting the initial probability vector and the other affecting the transition probability matrix of the first-level ordinal latent Markov process. To model these separate effects, we consider a bi-dimensional mixture specification that avoids unverifiable assumptions on the random effect distribution and yields a two-way clustering of second-level units. Starting from a general model where the two random effects are dependent, we also obtain the independence model as a special case. The proposal is applied to data on the physical health status of a sample of elderly residents grouped into nursing homes. A simulation study assessing the performance of the proposal is also included.

3.
4.
In cricket, when a batsman is dismissed towards the end of a day's play, he is often replaced by a lower-order batsman (a 'night watchman'), in the hope that the remaining recognised batsmen can start their innings on the following day. A dynamic programming analysis suggests that the common practice of using a lower-order batsman is often sub-optimal. Towards the end of a day's play, when the conventional wisdom seems to be to use a night watchman, it may be best to send in the next recognised batsman in the batting order. Sending in a night watchman may be good judgement when there are several recognised batsmen and several lower-order batsmen still to play (say four of each). However, with smaller numbers (two of each, for example) and very few overs left to play, it may be better to send in a recognised batsman.

5.
Type II topoisomerases are enzymes that change the topology of DNA by performing strand-passage. In particular, they unknot knotted DNA very efficiently. Motivated by this experimental observation, we investigate transition probabilities between knots. We use the BFACF algorithm to generate ensembles of polygons in Z^3 of fixed knot type. We introduce a novel strand-passage algorithm which generates a Markov chain in knot space. The entries of the corresponding transition probability matrix determine state transitions in knot space and can track the evolution of different knots after repeated strand-passage events. We outline future applications of this work to DNA unknotting.

6.
Functional data clustering: a survey   Cited by: 1 (self-citations: 0, citations by others: 1)
Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods working directly on the evaluation points of the curves. The second group is defined by filtering methods, which first approximate the curves in a finite basis of functions and then perform clustering using the basis expansion coefficients. The third group is composed of methods which simultaneously perform dimensionality reduction of the curves and clustering, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods using clustering algorithms based on specific distances for functional data. A software review as well as an illustration of the application of these algorithms to real data are presented.

7.
The importance of unsupervised clustering and topic modeling is well recognized, with ever-increasing volumes of text data available from numerous sources. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide-and-conquer strategy, called DC-NMF. Given an input matrix where the columns represent data items, we build a binary tree structure of the data items using a recently proposed efficient algorithm for computing rank-2 NMF, and then gather information from the tree to initialize the rank-k NMF, which needs only a few iterations to reach a desired solution. We also investigate various criteria for selecting the node to split when growing the tree. We demonstrate the scalability of our algorithm for computing general rank-k NMF as well as its effectiveness in clustering and topic modeling for large-scale text data sets, by comparing it to other frequently utilized state-of-the-art algorithms. The value of the proposed approach lies in the highly efficient and accurate method for initializing rank-k NMF and in the scalability achieved from the divide-and-conquer approach of the algorithm and the properties of rank-2 NMF. In summary, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains, along with an open-source software library called SmallK.

8.
Factor clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of data and a clustering of the transformed data, optimizing a common criterion. Probabilistic distance (PD)-clustering is an iterative, distribution-free, probabilistic clustering method. Factor PD-clustering (FPDC) is based on PD-clustering and involves a linear transformation of the original variables into a reduced number of orthogonal ones using a common criterion with PD-clustering. This paper demonstrates that Tucker3 decomposition can be used to accomplish this transformation. Factor PD-clustering alternates between Tucker3 decomposition and PD-clustering on the transformed data until convergence is achieved. This method can significantly improve the performance of the PD-clustering algorithm; large data sets can thus be partitioned into clusters with increasing stability and robustness of the results. Real and simulated data sets are used to compare FPDC with its main competitors: it performs equally well when clusters are elliptically shaped but outperforms its competitors with non-Gaussian shaped clusters or noisy data.

9.
With an increasing population of mobile subscribers, the signalling traffic needed to control subscriber mobility expands rapidly. Subscriber mobility is controlled through location registration based on the so-called location area, the basic area unit for paging, which consists of a number of cells. There is a tradeoff between the two kinds of signalling traffic: paging and location updating. As location areas include a larger number of cells, the traffic volume for paging increases while that for location updating decreases. Given the patterns of both call arrivals and subscriber mobility, our problem is to minimise the total signalling traffic by optimally partitioning the whole area into location areas. We show that this problem can be transformed into the so-called clique partitioning problem (CPP). We also demonstrate the process of implementing the algorithm for solving the CPP for real-world problems defined on the cellular network in Seoul.
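The paging versus location-updating tradeoff described in this abstract can be illustrated with a small Python sketch. It is not the clique partitioning algorithm of the paper: the cost model (every call pages all cells of its location area, and every move across an area boundary triggers one update), the function names, and the brute-force search are all assumptions made for illustration, and the search is only practical for a handful of cells.

```python
def signalling_cost(partition, call_rate, move_rate):
    """Total signalling cost of a partition under a toy model (an assumption).

    partition: list of location areas, each a list of cell names
    call_rate: dict cell -> call arrival rate
    move_rate: dict (cell_a, cell_b) -> rate of movement between the two cells
    """
    area_of = {cell: k for k, area in enumerate(partition) for cell in area}
    # Paging: a call to any cell pages every cell in the same location area.
    paging = sum(len(area) * sum(call_rate[c] for c in area) for area in partition)
    # Updating: only moves that cross a location-area boundary cost an update.
    updating = sum(rate for (a, b), rate in move_rate.items()
                   if area_of[a] != area_of[b])
    return paging + updating


def set_partitions(cells):
    """Yield every partition of the list `cells` into non-empty blocks."""
    if not cells:
        yield []
        return
    first, rest = cells[0], cells[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` forms its own location area
        for i in range(len(part)):  # or joins one of the existing areas
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def best_partition(cells, call_rate, move_rate):
    """Exhaustively pick the cheapest partition (tiny instances only)."""
    return min(set_partitions(cells),
               key=lambda p: signalling_cost(p, call_rate, move_rate))
```

With three hypothetical cells where most movement happens between `a` and `b`, the sketch merges those two cells and leaves `c` on its own, since larger areas raise the paging term while smaller ones raise the updating term.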

10.
We present sufficient conditions for the transience of random walks with bounded jumps in random media on a Cayley tree.

11.
A new numerical method is proposed to predict the effect of particle clustering on grain boundaries on the mechanical properties of a ceramic-particle-reinforced metal matrix composite, and micromechanical finite-element simulations of the stress–strain responses of composites with random and clustered arrangements of ceramic particles are carried out. The particular material modeled and analyzed is a TiC-particle-reinforced Al matrix composite processed by powder metallurgy. A representative volume element of a composite microstructure with 5 vol.% TiC is reconstructed based on the tetrakaidecahedral grain boundary structure by using a modified random sequential adsorption. The model proposed in this study accurately represents the stress concentrations and particle-particle interactions during deformation of the powder-metallurgy-processed composite. A comparison with the random-arrangement model shows that the present numerical approach is more accurate in simulating the behavior of the composite material.

12.
13.
Many optimization algorithms require gradients of the model functions, but computing accurate gradients can be computationally expensive. We study the implications of using inexact gradients in the context of the multilevel optimization algorithm MG/Opt. MG/Opt recursively uses (typically cheaper) coarse models to obtain search directions for finer-level models. However, MG/Opt requires the gradient on the fine level to define the recursion. Our primary focus here is the impact of the gradient errors on the multilevel recursion. We analyze, partly through model problems, how MG/Opt is affected under various assumptions about the source of the error in the gradients, and demonstrate that in many cases the effect of the errors is benign. Computational experiments are included.

14.
An appropriate distance is an essential ingredient in various real-world learning tasks. Distance metric learning seeks a metric that reflects the data configuration much better than the commonly used ones. We offer an algorithm that simultaneously learns a Mahalanobis-like distance and performs K-means clustering, aiming to combine data rescaling and clustering so that the data separability grows iteratively in the rescaled space with its sequential clustering. At each step of the algorithm, a global optimization problem is solved in order to minimize the cluster distortions based on the current cluster configuration. The obtained weight matrix can also be used as a cluster validation characteristic. Namely, closeness of such matrices learned during a sample process can indicate the clusters' readiness; i.e., it estimates the true number of clusters. Numerical experiments performed on synthetic and real datasets verify the high reliability of the proposed method.

15.
16.
17.
18.
Suppose that S_n is the permutation group of degree n, A is a subset of the set of natural numbers ℕ, and T_n(A) is the set of all permutations from S_n whose cycle lengths belong to the set A. Permutations from T_n(A) are usually called A-permutations. We consider a wide class of sets A of positive asymptotic density. Suppose that ζ_mn is the number of cycles of length m of a random permutation uniformly distributed on T_n(A). It is shown in this paper that the finite-dimensional distributions of the random process {ζ_mn, m ∈ A} weakly converge as n → ∞ to the finite-dimensional distributions of a Poisson process on A.
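The objects in this abstract (A-permutations and the cycle counts for each length m) can be made concrete with a short Python sketch. The function names are hypothetical, and the sampler uses plain rejection from the uniform distribution on all permutations, which is an assumption chosen for simplicity; it yields a uniform draw from the A-permutations but can be slow when they are rare. It does not reproduce the paper's limit theorem.

```python
import random
from collections import Counter


def cycle_length_counts(perm):
    """Map each cycle length m to the number of cycles of that length.

    perm is a list with perm[i] the image of i (0-based indexing).
    """
    seen = [False] * len(perm)
    counts = Counter()
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:  # walk the cycle containing `start`
            seen[j] = True
            j = perm[j]
            length += 1
        counts[length] += 1
    return counts


def is_a_permutation(perm, allowed):
    """True if every cycle length of perm belongs to the set `allowed` (A)."""
    return all(m in allowed for m in cycle_length_counts(perm))


def sample_a_permutation(n, allowed, rng, max_tries=100_000):
    """Draw uniformly from the A-permutations of degree n by rejection."""
    for _ in range(max_tries):
        perm = list(range(n))
        rng.shuffle(perm)  # uniform on all n! permutations
        if is_a_permutation(perm, allowed):
            return perm
    raise RuntimeError("no A-permutation found within max_tries")
```

Repeated calls such as `cycle_length_counts(sample_a_permutation(n, A, rng))[m]` give empirical samples of the cycle count for length m, which can then be compared against a Poisson distribution for growing n.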

19.
A continuum dislocation-based model for the size-dependent plastic deformation at the microscale is presented. An outlook to its application to the single-slip bending of a thin strip is given. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

20.
This paper provides a selective review of recent developments in econometric/statistical modeling of quantile treatment effects under both selection on observables and selection on unobservables. First, we discuss identification, estimation and inference of quantile treatment effects under the framework of selection on observables. Then, we consider the case where the treatment variable is endogenous or self-selected, for which an instrumental variable method provides a powerful tool. Finally, extensions to data-rich environments and to the regression discontinuity design, as well as some other approaches to identifying quantile treatment effects, are discussed. In particular, some directions for future research in this area are addressed.
