Similar articles (20 found)
1.
Advanced statistical techniques and data mining methods are recognized as powerful support for mass spectrometry (MS) data analysis. In particular, because of their unsupervised nature, data clustering methods have attracted increasing interest for exploring, identifying, and discriminating pathological cases from MS clinical samples. Supporting biomarker discovery in protein profiles has drawn special attention from biologists and clinicians. However, the huge amount of information contained in a single sample, that is, the high dimensionality of MS data, makes the effective identification of biomarkers a challenging problem. In this paper, we present a data mining approach for the analysis of MS data, in which the mining phase is developed as a clustering task. Under the natural assumption of modeling MS data as time series, we propose a new representation model of MS data that significantly reduces the dimensionality of such data while preserving the relevant features. Besides reducing dimensionality (which typically affects the effectiveness and efficiency of computational methods), the proposed representation model also alleviates the critical task of preprocessing the raw spectra in the overall process of MS data analysis. We evaluated our MS data clustering approach on publicly available proteomic datasets, and the experimental results show the effectiveness of the proposed approach, which can be used to aid clinicians in studying and formulating diagnoses of pathological states.

2.
3.
Scalability of clustering algorithms is a critical issue facing the data mining community. One way to handle this issue is to use only a subset of all instances. This paper develops an optimization-based approach to the partitional clustering problem using an algorithm specifically designed for noisy performance, a problem that arises when using a subset of instances. Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality. In addition, these results become more persuasive as the problem size grows.
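The idea of clustering on a subset of instances can be illustrated with a minimal sketch. This is not the optimization algorithm developed in the paper: it swaps in plain Lloyd-style k-means with farthest-first initialization, and the function names, the sampling fraction, and the synthetic data are all assumptions made for illustration only.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd iterations with farthest-first initialization (illustrative choice)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep the old center if the group is empty.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def cluster_on_subset(points, k, fraction, seed=0):
    """Fit centers on a random subset only, then label every instance by its nearest center."""
    rng = random.Random(seed)
    m = max(k, int(len(points) * fraction))
    subset = rng.sample(points, m)
    centers = kmeans(subset, k, rng)
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical data: two well-separated Gaussian blobs of 200 points each.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
data = blob_a + blob_b

# Only 10% of the instances are used to estimate the centers.
centers, labels = cluster_on_subset(data, k=2, fraction=0.1)
```

Here the expensive iterative step touches only 40 of the 400 points; the full data set is scanned once at the end to assign labels. Whether solution quality survives the subsampling depends on the data, which is the question the abstract's numerical results address.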

4.
In this paper, we present a new clustering method based on data envelopment analysis (DEA). The proposed DEA-based clustering approach uses the piecewise production functions derived from the DEA method to cluster data with input and output items. Thus, each evaluated decision-making unit (DMU) learns not only the cluster it belongs to but also the type of production function it faces. This is important for managerial decision-making, where decision-makers want to know what changes in the combination of input resources are required for a DMU to be classified into a desired cluster or class. In particular, we examine the fundamental CCR model to set up the DEA clustering approach. Although the approach is developed for the CCR model, it can be easily extended to other DEA models without loss of generality. Two examples are given to explain the use and effectiveness of the proposed DEA-based clustering method.

5.
4OR - Partitioning a given data-set into subsets based on similarity among the data is called clustering. Clustering is a major task in data mining and machine learning having many applications...

6.
A new empirical likelihood approach is developed to analyze data from two-stage sampling designs, in which a primary sample of rough or proxy measures for the variables of interest and a validation subsample of exact information are available. The validation sample is assumed to be a simple random subsample of the primary one. The proposed empirical likelihood approach can flexibly use all the information from both the specific models and the two available samples. It retains several desirable features of the empirical likelihood method and improves the asymptotic efficiency of existing inferential procedures. The asymptotic properties of the new approach are derived. Numerical studies are carried out to assess its finite-sample performance.

7.
Maximal-margin-based frameworks have emerged as a powerful tool for supervised learning. The extension of these ideas to the unsupervised case, however, is problematic because the underlying optimization entails a discrete component. In this paper, we first study the computational complexity of maximal hard margin clustering and show that the hard margin clustering problem can be solved exactly in O(n^{d+2}) time, where n is the number of data points and d is the dimensionality of the input data. However, since it is well known that many datasets 'express' themselves primarily in far fewer dimensions, our interest is in evaluating whether a careful use of dimensionality reduction can lead to practical and effective algorithms. We build upon these observations and propose a new algorithm that gradually increases the number of features used in the separation model in each iteration, and we analyze the convergence properties of this scheme. We report promising numerical experiments based on a 'truncated' version of this approach. Our experiments indicate that, for a variety of datasets, solutions comparable to those from other existing techniques can be obtained in significantly less time.

8.
A model-based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation maximisation (EM) algorithm is used to estimate clustMD; in the presence of nominal data, a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering simulated mixed-type data and prostate cancer patients, on whom mixed data have been recorded.

9.
10.
Finite mixture regression models are useful for modeling the relationship between a response and predictors arising from different subpopulations. In this article, we study high-dimensional predictors and a high-dimensional response and propose two procedures to cluster observations according to the link between the predictors and the response. To reduce the dimension, we propose using the Lasso estimator, which accounts for sparsity, and a rank-penalized maximum likelihood estimator, which accounts for the matrix structure. To choose the number of components and the sparsity level, we construct a collection of models by varying those two parameters and select a model from this collection with a non-asymptotic criterion. We extend these procedures to functional data, where predictors and responses are functions, using a wavelet-based approach. For each situation, we provide algorithms and evaluate our methods on both simulated and real datasets to understand how they work in practice.

11.
The analysis of finite mixture models for exponential repeated data is considered. The mixture components correspond to different unknown groups of the statistical units. Dependency and variability of repeated data are taken into account through random effects. For each component, an exponential mixed model is thus defined. For parameter estimation in this mixture of exponential mixed models, the EM algorithm cannot be used directly because the marginal distribution of each mixture component cannot be derived analytically. In this paper, we propose two parameter estimation methods. The first uses a linearisation specific to the exponential distribution hypothesis within each component. The second uses a Metropolis–Hastings algorithm as a building block of a general MCEM algorithm.

12.
Functional data clustering: a survey (total citations: 1; self-citations: 0; citations by others: 1)
Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods working directly on the evaluation points of the curves. The second group is defined by filtering methods, which first approximate the curves in a finite basis of functions and then perform clustering using the basis expansion coefficients. The third group is composed of methods that perform dimensionality reduction of the curves and clustering simultaneously, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods, which use clustering algorithms based on specific distances for functional data. A software review and an illustration of the application of these algorithms to real data are also presented.
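The second group (filtering methods) lends itself to a short sketch: project each sampled curve onto a finite basis, then cluster the coefficient vectors. The example below is only an illustration under stated assumptions; the cosine basis on a midpoint grid, the simple k-means routine, and the synthetic curves are choices made here, not methods taken from the survey.

```python
import math
import random

def cosine_coefficients(values, n_basis):
    """Coefficients of a curve in the basis cos(pi*k*t), k = 0..n_basis-1.

    The curve is assumed to be sampled at the midpoints t_i = (i + 0.5) / n,
    where these cosines are exactly orthogonal, so each coefficient is a
    normalized inner product.
    """
    n = len(values)
    coeffs = []
    for k in range(n_basis):
        weight = 1.0 if k == 0 else 2.0
        total = sum(y * math.cos(math.pi * k * (i + 0.5) / n) for i, y in enumerate(values))
        coeffs.append(weight * total / n)
    return tuple(coeffs)

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(vectors, k, iters=50):
    """Lloyd-style k-means with farthest-first initialization (illustrative choice)."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Hypothetical curves: ten noisy copies each of cos(pi*t) and cos(2*pi*t) on a 50-point grid.
rng = random.Random(0)
n_grid = 50
grid = [(i + 0.5) / n_grid for i in range(n_grid)]
curves = [[math.cos(math.pi * t) + rng.gauss(0, 0.1) for t in grid] for _ in range(10)]
curves += [[math.cos(2 * math.pi * t) + rng.gauss(0, 0.1) for t in grid] for _ in range(10)]

# Step 1 (filtering): replace each 50-point curve by 4 basis coefficients.
features = [cosine_coefficients(c, n_basis=4) for c in curves]
# Step 2: cluster the coefficient vectors instead of the raw curves.
labels = kmeans_labels(features, k=2)

# Coefficients of a noiseless cos(pi*t) curve, for reference.
clean = cosine_coefficients([math.cos(math.pi * t) for t in grid], n_basis=4)
```

The clustering step works in four dimensions rather than fifty, which is the practical appeal of filtering; the price is that the result depends on the chosen basis and its size.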

13.
New clustering methods for interval data (total citations: 3; self-citations: 0; citations by others: 3)
Summary: In this paper we propose two clustering methods for interval data based on the dynamic cluster algorithm. These methods use different homogeneity criteria as well as different kinds of cluster representations (prototypes). Some tools to interpret the final partitions are also introduced. An application of one of the methods concludes the paper.

14.
15.
We present a new exact approach for solving bi-objective integer linear programs. The new approach employs two existing exact algorithms from the literature, the balanced box method and the ε-constraint method, in two stages. A computational study shows that the new approach has three desirable characteristics. (1) It solves fewer single-objective integer linear programs. (2) Its solution time is significantly smaller. (3) It is competitive with the two-stage algorithm proposed by Leitner et al. (2016).

16.
This paper develops a new numerical technique to price an American option written on an underlying asset that follows a bivariate diffusion process. The technique exploits the supermartingale representation of an American option price together with a coarse approximation of its early exercise surface, which is based on an efficient implementation of the least-squares Monte Carlo algorithm (LSM) of Longstaff and Schwartz (Rev Financ Stud 14:113–147, 2001). Our approach also avoids two main issues associated with LSM, namely its inherent bias and the problem of selecting basis functions. Extensive numerical results show that our approach yields very accurate prices in a computationally efficient manner. Finally, the flexibility of our method allows it to be extended to a much larger class of optimal stopping problems than is addressed in this paper.

17.
In this paper we demonstrate how Gröbner bases and other algebraic techniques can be used to explore the geometry of the probability space of Bayesian networks with hidden variables. These techniques employ a parametrisation of Bayesian networks by moments rather than conditional probabilities. We show that whilst Gröbner bases help to explain the local geometry of these spaces, a complementary analysis, modelling the positivity of probabilities, enhances and completes the geometrical picture. We report some recent geometrical results in this area and discuss a possible general methodology for the analysis of such problems.

18.
This study presents a methodology that further discriminates among the efficient decision-making units (DMUs) in a two-stage data envelopment analysis (DEA) context. The methodology extends the single-stage network-based ranking method, which uses the eigenvector centrality concept from social network analysis to rank efficient DMUs. The mathematical formulation that allows the method to work in the two-stage DEA context is laid out and then applied to a real-world problem. In addition to its basic ranking function, the exercise highlights two features of the method that are not available in standard DEA: suggesting a benchmark unit for each input, intermediate, and output factor, and identifying the strengths of each efficient unit. The methodology thereby increases the practical value of DEA.
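Eigenvector centrality itself can be sketched with a few lines of power iteration. The example below only illustrates the centrality concept; it does not reproduce how the paper builds its network from DEA results, and the small symmetric weight matrix is entirely hypothetical.

```python
def eigenvector_centrality(weights, iters=1000, tol=1e-12):
    """Power iteration for the dominant eigenvector of a nonnegative square matrix.

    Scores are rescaled to sum to 1 after every step. Convergence assumes the
    underlying graph is connected and not bipartite.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iters):
        updated = [sum(weights[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(updated)
        updated = [u / total for u in updated]
        if max(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores

# Hypothetical network of four units: units 0, 1, 2 form a triangle,
# and unit 3 is connected only to unit 0.
network = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

scores = eigenvector_centrality(network)
# Units ordered from most to least central.
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

A unit scores highly when it is linked to other high-scoring units, so unit 0 comes first and the peripheral unit 3 comes last, while the two symmetric units 1 and 2 receive equal scores.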

19.
In a data stream environment, most conventional clustering algorithms are not sufficiently efficient, because large volumes of data arrive in a stream and the data points unfold over time. The problems of clustering time-evolving metric data and time-evolving categorical data have each been well explored in recent years, but clustering mixed-type time-evolving data remains challenging because of the gap between the structures of metric and categorical attributes. In this paper, we devise a generalized framework, termed Equi-Clustream, to dynamically cluster mixed-type time-evolving data. It comprises three algorithms: a Hybrid Drifting Concept Detection Algorithm, which detects concept drift between the current and the previous sliding window; a Hybrid Data Labeling Algorithm, which assigns an appropriate cluster label to each data vector of the current non-drifting window based on the clustering result of the previous sliding window; and a visualization algorithm, which analyses the relationship between the clusters at different timestamps and visualizes the evolving trends of the clusters. The efficacy of the proposed framework is shown by experiments on synthetic and real-world datasets.

20.
We propose a new clustering approach, called optimality-based clustering, that clusters data points based on their latent decision-making preferences. We assume that each data point is a decision generated by a decision-maker who (approximately) solves an optimization problem, and we cluster the data points by identifying, for each cluster, a common objective function of the optimization problems such that the worst-case optimality error is minimized. We propose three different clustering models and test them in a diet recommendation application.
