1.

A time series approach for clustering mass spectrometry data

Francesco Gullo Giovanni Ponti Andrea Tagarelli Giuseppe Tradigo Pierangelo Veltri 《Journal of computational science》2012,3(5):344-355

Advanced statistical techniques and data mining methods have been recognized as powerful support for mass spectrometry (MS) data analysis. In particular, because of their unsupervised nature, data clustering methods have attracted increasing interest for exploring, identifying, and discriminating pathological cases from MS clinical samples. Supporting biomarker discovery in protein profiles has drawn special attention from biologists and clinicians. However, the huge amount of information contained in a single sample, that is, the high dimensionality of MS data, makes the effective identification of biomarkers a challenging problem. In this paper, we present a data mining approach for the analysis of MS data, in which the mining phase is developed as a task of clustering MS data. Under the natural assumption of modeling MS data as time series, we propose a new representation model of MS data that significantly reduces the high dimensionality of such data while preserving the relevant features. Besides reducing dimensionality (which typically affects the effectiveness and efficiency of computational methods), the proposed representation model also alleviates the critical task of preprocessing the raw spectra in the whole process of MS data analysis. We evaluated our MS data clustering approach on publicly available proteomic datasets, and the experimental results show the effectiveness of the proposed approach, which can be used to aid clinicians in studying and formulating diagnoses of pathological states.
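The abstract does not spell out its representation model, so the following is only a minimal sketch of the general idea of treating a spectrum as a time series and shrinking it. It swaps in piecewise aggregate approximation (PAA), a standard time-series reduction, which is not claimed to be the authors' model. The function name `paa` and the toy `spectrum` values are hypothetical.

```python
def paa(signal, n_segments):
    """Piecewise aggregate approximation (PAA): split the series into
    n_segments near-equal chunks and keep the mean of each chunk.
    Stand-in technique for illustration, not the paper's model."""
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments        # first index of chunk i
        end = (i + 1) * n // n_segments    # one past the last index of chunk i
        chunk = signal[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


# Hypothetical toy spectrum: intensities ordered by m/z, read as a time series.
spectrum = [0.0, 0.1, 0.9, 1.0, 0.2, 0.1, 0.0, 0.0]
reduced = paa(spectrum, 4)   # 8 values compressed to 4 segment means
```

The reduced vectors can then be passed to any clustering routine in place of the raw spectra; how well peaks survive depends on the segment count, which is a tuning choice.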

2.

An efficient decentralized clustering algorithm for aggregation of noisy multi-mean data (total citations: 1; self-citations: 0; citations by others: 1)

Séamus Ó Buadhacháin Gregory Provan 《Journal of Heuristics》2015,21(2):301-328

3.

An optimization approach to partitional data clustering

J Kim J Yang S Ólafsson 《The Journal of the Operational Research Society》2009,60(8):1069-1084

Scalability of clustering algorithms is a critical issue facing the data mining community. One method to handle this issue is to use only a subset of all instances. This paper develops an optimization-based approach to the partitional clustering problem using an algorithm specifically designed for noisy performance, which is a problem that arises when using a subset of instances. Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality. In addition, these results become more persuasive as the size of the problem grows.
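To illustrate only the subset idea from this abstract, here is a minimal sketch that fits centroids on a partial set of instances and then labels every instance. It uses plain Lloyd-style k-means as a stand-in, not the paper's optimization algorithm. The helper names, the systematic sampling step `data[::5]`, and the toy data are all assumptions made for the example.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, init, iters=20):
    """Plain Lloyd iterations starting from the given initial centroids."""
    centroids = [tuple(c) for c in init]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        updated = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical data: two well-separated groups of 50 points each.
data = [(i * 0.01, 0.0) for i in range(50)] + [(10 + i * 0.01, 0.0) for i in range(50)]

sample = data[::5]                                    # use 20 of the 100 instances
centroids = kmeans(sample, [sample[0], sample[-1]])   # fit on the subset only
labels = [nearest(p, centroids) for p in data]        # label all instances
```

The fit touches one fifth of the data, which is where the time saving comes from; the centroids are then noisy estimates of the full-data centroids, which is the difficulty the abstract refers to.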

4.

A new clustering approach using data envelopment analysis

Rung-Wei Po Yuh-Yuan Guh Miin-Shen Yang 《European Journal of Operational Research》2009

In this paper, we present a new clustering method that involves data envelopment analysis (DEA). The proposed DEA-based clustering approach employs the piecewise production functions derived from the DEA method to cluster the data with input and output items. Thus, each evaluated decision-making unit (DMU) not only knows the cluster that it belongs to, but also checks the production function type that it confronts. This is important for managerial decision-making, where decision-makers are interested in knowing the changes required in combining input resources so that a DMU can be classified into a desired cluster/class. In particular, we examine the fundamental CCR model to set up the DEA clustering approach. While this approach has been carried out for the CCR model, it can be easily extended to other DEA models without loss of generality. Two examples are given to explain the use and effectiveness of the proposed DEA-based clustering method.

5.

A competitive optimization approach for data clustering and orthogonal non-negative matrix factorization

Dehghanpour-Sahron Ja’far Mahdavi-Amiri Nezam 《4OR: A Quarterly Journal of Operations Research》2021,19(4):473-499

Partitioning a given data-set into subsets based on similarity among the data is called clustering. Clustering is a major task in data mining and machine learning having many applications...

6.

An empirical likelihood approach to data analysis under two-stage sampling designs

Ming Zheng Wen Yu 《Statistics & probability letters》2011,81(8):947-956

A new empirical likelihood approach is developed to analyze data from two-stage sampling designs, in which a primary sample of rough or proxy measures for the variables of interest and a validation subsample of exact information are available. The validation sample is assumed to be a simple random subsample from the primary one. The proposed empirical likelihood approach is capable of utilizing all the information from both the specific models and the two available samples flexibly. It maintains some nice features of the empirical likelihood method and improves the asymptotic efficiency of the existing inferential procedures. The asymptotic properties are derived for the new approach. Some numerical studies are carried out to assess the finite sample performance.

7.

An efficient algorithm for maximal margin clustering

Jiming Peng Lopamudra Mukherjee Vikas Singh Dale Schuurmans Linli Xu 《Journal of Global Optimization》2012,52(1):123-137

Maximal margin based frameworks have emerged as a powerful tool for supervised learning. The extension of these ideas to the unsupervised case, however, is problematic since the underlying optimization entails a discrete component. In this paper, we first study the computational complexity of maximal hard margin clustering and show that the hard margin clustering problem can be precisely solved in $O(n^{d+2})$ time where n is the number of the data points and d is the dimensionality of the input data. However, since it is well known that many datasets commonly ‘express’ themselves primarily in far fewer dimensions, our interest is in evaluating if a careful use of dimensionality reduction can lead to practical and effective algorithms. We build upon these observations and propose a new algorithm that gradually increases the number of features used in the separation model in each iteration, and analyze the convergence properties of this scheme. We report on promising numerical experiments based on a ‘truncated’ version of this approach. Our experiments indicate that for a variety of datasets, good solutions equivalent to those from other existing techniques can be obtained in significantly less time.

8.

Model based clustering for mixed data: clustMD

Damien McParland Isobel Claire Gormley 《Advances in Data Analysis and Classification》2016,10(2):155-169

A model based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation maximisation (EM) algorithm is used to estimate clustMD; in the presence of nominal data a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering simulated mixed type data and prostate cancer patients, on whom mixed data have been recorded.

9.

Convex clustering for binary data

Choi Hosik Lee Seokho 《Advances in Data Analysis and Classification》2019,13(4):991-1018

10.

Model-based regression clustering for high-dimensional data: application to functional data

Emilie Devijver 《Advances in Data Analysis and Classification》2017,11(2):243-279

Finite mixture regression models are useful for modeling the relationship between response and predictors arising from different subpopulations. In this article, we study high-dimensional predictors and high-dimensional response and propose two procedures to cluster observations according to the link between predictors and the response. To reduce the dimension, we propose to use the Lasso estimator, which takes into account the sparsity and a maximum likelihood estimator penalized by the rank, to take into account the matrix structure. To choose the number of components and the sparsity level, we construct a collection of models, varying those two parameters and we select a model among this collection with a non-asymptotic criterion. We extend these procedures to functional data, where predictors and responses are functions. For this purpose, we use a wavelet-based approach. For each situation, we provide algorithms and apply and evaluate our methods both on simulated and real datasets, to understand how they work in practice.

11.

A mixture model-based approach to the clustering of exponential repeated data

M.J. Martinez C. Lavergne C. Trottier 《Journal of multivariate analysis》2009,100(9):1938-1951

The analysis of finite mixture models for exponential repeated data is considered. The mixture components correspond to different unknown groups of the statistical units. Dependency and variability of repeated data are taken into account through random effects. For each component, an exponential mixed model is thus defined. When considering parameter estimation in this mixture of exponential mixed models, the EM-algorithm cannot be directly used since the marginal distribution of each mixture component cannot be analytically derived. In this paper, we propose two parameter estimation methods. The first one uses a linearisation specific to the exponential distribution hypothesis within each component. The second approach uses a Metropolis–Hastings algorithm as a building block of a general MCEM-algorithm.

12.

Functional data clustering: a survey (total citations: 1; self-citations: 0; citations by others: 1)

Julien Jacques Cristian Preda 《Advances in Data Analysis and Classification》2014,8(3):231-255

Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods working directly on the evaluation points of the curves. The second group is defined by filtering methods, which first approximate the curves in a finite basis of functions and then perform clustering using the basis expansion coefficients. The third group is composed of methods that simultaneously perform dimensionality reduction of the curves and clustering, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods using clustering algorithms based on specific distances for functional data. A software review as well as an illustration of the application of these algorithms on real data are presented.

13.

New clustering methods for interval data (total citations: 3; self-citations: 0; citations by others: 3)

Marie Chavent Francisco de A. T. de Carvalho Yves Lechevallier Rosanna Verde 《Computational Statistics》2006,21(2):211-229

Summary: In this paper, we propose two clustering methods for interval data based on the dynamic cluster algorithm. These methods use different homogeneity criteria as well as different kinds of cluster representations (prototypes). Some tools to interpret the final partitions are also introduced. An application of one of the methods concludes the paper.

14.

Region-restricted clustering for geographic data mining

Joachim Gudmundsson Marc van Kreveld Giri Narasimhan 《Computational Geometry》2009,42(3):231-240

15.

A two-stage approach for bi-objective integer linear programming

Rui Dai Hadi Charkhgard 《Operations Research Letters》2018,46(1):81-87

We present a new exact approach for solving bi-objective integer linear programs. The new approach employs two of the existing exact algorithms in the literature, namely the balanced box and the ε-constraint methods, in two stages. A computational study shows that the new approach has three desirable characteristics. (1) It solves fewer single-objective integer linear programs. (2) Its solution time is significantly smaller. (3) It is competitive with the two-stage algorithm proposed by Leitner et al. (2016).
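As a rough illustration of the ε-constraint method named in this abstract, the sketch below enumerates the nondominated points of a tiny bi-objective integer program. It shows only the textbook ε-constraint loop, not the paper's two-stage algorithm. The toy problem (minimize `f1 = x` and `f2 = y` subject to `x + y >= 4` with `x, y` in 0..4) is hypothetical, and the single-objective "solver" is plain enumeration, standing in for a real integer programming solver.

```python
from itertools import product
import math

# Hypothetical toy feasible set: integer x, y in 0..4 with x + y >= 4.
FEASIBLE = [(x, y) for x, y in product(range(5), repeat=2) if x + y >= 4]


def f1(p):
    return p[0]


def f2(p):
    return p[1]


def epsilon_constraint(points):
    """Minimize f1 subject to f2 <= eps, then tighten eps and repeat.
    Ties in f1 are broken by f2 so each returned point is nondominated."""
    front = []
    solves = 0
    eps = math.inf
    while True:
        solves += 1                      # one single-objective solve per pass
        candidates = [p for p in points if f2(p) <= eps]
        if not candidates:               # infeasible: the front is complete
            break
        best = min(candidates, key=lambda p: (f1(p), f2(p)))
        front.append((f1(best), f2(best)))
        eps = f2(best) - 1               # integer objectives allow a step of 1
    return front, solves


front, solves = epsilon_constraint(FEASIBLE)
```

The loop needs one solve per nondominated point plus a final infeasible solve, which is why reducing the number of single-objective solves is a natural target for improvement.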

16.

American option pricing under stochastic volatility: an efficient numerical approach

Farid AitSahlia Manisha Goswami Suchandan Guha 《Computational Management Science》2010,7(2):171-187

This paper develops a new numerical technique to price an American option written upon an underlying asset that follows a bivariate diffusion process. The technique presented here exploits the supermartingale representation of an American option price together with a coarse approximation of its early exercise surface that is based on an efficient implementation of the least-squares Monte–Carlo algorithm (LSM) of Longstaff and Schwartz (Rev Financ Stud 14:113–147, 2001). Our approach also has the advantage of avoiding two main issues associated with LSM, namely its inherent bias and the basis functions selection problem. Extensive numerical results show that our approach yields very accurate prices in a computationally efficient manner. Finally, the flexibility of our method allows for its extension to a much larger class of optimal stopping problems than addressed in this paper.

17.

Bayesian networks for discrete multivariate data: an algebraic approach to inference

J. Q. Smith J. Croft 《Journal of multivariate analysis》2003,84(2):387-402

In this paper we demonstrate how Gröbner bases and other algebraic techniques can be used to explore the geometry of the probability space of Bayesian networks with hidden variables. These techniques employ a parametrisation of Bayesian networks by moments rather than conditional probabilities. We show that whilst Gröbner bases help to explain the local geometry of these spaces, a complementary analysis, modelling the positivity of probabilities, enhances and completes the geometrical picture. We report some recent geometrical results in this area and discuss a possible general methodology for the analyses of such problems.

18.

Network-based method for ranking of efficient units in two-stage DEA models

J S Liu W-M Lu 《The Journal of the Operational Research Society》2012,63(8):1153-1164

This study presents a methodology that is able to further discriminate the efficient decision-making units (DMUs) in a two-stage data envelopment analysis (DEA) context. The methodology is an extension of the single-stage network-based ranking method, which utilizes the eigenvector centrality concept in social network analysis to determine the rank of efficient DMUs. The mathematical formulation for the method to work under the two-stage DEA context is laid out and then applied to a real-world problem. In addition to its basic ranking function, the exercise highlights two particular features of the method that are not available in standard DEA: suggesting a benchmark unit for each input/intermediate/output factor, and identifying the strengths of each efficient unit. With the methodology, the value of DEA greatly increases.
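The eigenvector centrality concept mentioned in this abstract can be sketched with power iteration on a small graph. This is only an illustration of the centrality measure itself: the paper builds its network from DEA results, whereas the 4-node undirected graph below is a hypothetical toy, and the function name and iteration count are assumptions.

```python
import math


def eigenvector_centrality(adj, iters=100):
    """Power iteration: repeatedly multiply by the adjacency matrix and
    renormalize. Converges for a connected, non-bipartite graph."""
    n = len(adj)
    scores = [1.0] * n
    for _ in range(iters):
        updated = [sum(adj[i][j] * scores[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in updated))
        scores = [v / norm for v in updated]
    return scores


# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(adj)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Node 0 ranks first because it is connected to every other node, while the pendant node 3 ranks last; a unit scores highly when it is linked to other highly scored units, which is the property the ranking method relies on.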

19.

Equi-Clustream: a framework for clustering time evolving mixed data

Ravi Sankar Sangam Hari Om 《Advances in Data Analysis and Classification》2018,12(4):973-995

In a data stream environment, most of the conventional clustering algorithms are not sufficiently efficient, since large volumes of data arrive in a stream and these data points unfold with time. The problems of clustering time-evolving metric data and time-evolving categorical data have each been well explored in recent years, but the problem of clustering mixed type time-evolving data remains a challenging issue due to an awkward gap between the structure of metric and categorical attributes. In this paper, we devise a generalized framework, termed Equi-Clustream, to dynamically cluster mixed type time-evolving data, which comprises three algorithms: a Hybrid Drifting Concept Detection Algorithm that detects the drifting concept between the current sliding window and previous sliding window, a Hybrid Data Labeling Algorithm that assigns an appropriate cluster label to each data vector of the current non-drifting window based on the clustering result of the previous sliding window, and a visualization algorithm that analyses the relationship between the clusters at different timestamps and also visualizes the evolving trends of the clusters. The efficacy of the proposed framework is shown by experiments on synthetic and real world datasets.

20.

Optimality-based clustering: An inverse optimization approach

《Operations Research Letters》2022,50(2):205-212

We propose a new clustering approach, called optimality-based clustering, that clusters data points based on their latent decision-making preferences. We assume that each data point is a decision generated by a decision-maker who (approximately) solves an optimization problem and cluster the data points by identifying a common objective function of the optimization problems for each cluster such that the worst-case optimality error is minimized. We propose three different clustering models and test them in the diet recommendation application.