A random version of principal component analysis in data clustering

Institution:	1. Regional Institute of Oncology, Iasi, Romania;2. Faculty of Physics, Alexandru Ioan Cuza University, Iasi, Romania;3. Department of Biomedical Sciences, Faculty of Medical Bioengineering, “Grigore T. Popa” University of Medicine and Pharmacy, Iasi, Romania

Abstract:	Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance/correlation matrix of the analyzed data. However, to work properly with high-dimensional data sets, PCA imposes severe mathematical constraints on the minimum number of distinct replicates, or samples, that must be included in the analysis. Generally, improper sampling arises when the number of data points is small relative to the number of degrees of freedom that characterize the ensemble. In the life sciences it is often important to have an algorithm that can accept poorly dimensioned data sets, including degenerate ones. Here a new random projection algorithm is proposed, in which a random symmetric matrix replaces the covariance/correlation matrix of PCA while maintaining the data clustering capacity. We demonstrate that what matters for the clustering efficiency of PCA is not the exact form of the covariance/correlation matrix, but simply its symmetry.

Keywords:	Principal component analysis; Random projection; Dimensionality reduction; Data clustering; Protein structure; Structural bioinformatics
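
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `symmetric_projection`, the construction of the random symmetric matrix as (A + Aᵀ)/2 with Gaussian A, and the choice to keep the eigenvectors with the largest eigenvalues are all assumptions made here for illustration. The sketch projects centered data onto eigenvectors of either the covariance matrix (ordinary PCA) or a random symmetric matrix of the same size.

```python
import numpy as np


def symmetric_projection(X, n_components=2, matrix="random", seed=0):
    """Project the rows of X onto eigenvectors of a symmetric matrix.

    matrix="covariance": ordinary PCA on the feature covariance matrix.
    matrix="random": a random symmetric matrix stands in for the covariance.
    ASSUMPTION: the random matrix is built as (A + A.T) / 2 with Gaussian A;
    the abstract does not specify the construction.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature (column)
    n_samples, n_features = Xc.shape

    if matrix == "covariance":
        S = Xc.T @ Xc / max(n_samples - 1, 1)
    else:
        rng = np.random.default_rng(seed)
        A = rng.standard_normal((n_features, n_features))
        S = (A + A.T) / 2.0  # symmetric by construction

    # eigh handles symmetric matrices: real eigenvalues in ascending order
    # and orthonormal eigenvectors as columns.
    _, eigvecs = np.linalg.eigh(S)

    # ASSUMPTION: keep the eigenvectors with the largest eigenvalues.
    W = eigvecs[:, ::-1][:, :n_components]
    return Xc @ W, W


# Poorly dimensioned toy data: 6 samples, 50 features, two shifted groups.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.standard_normal((3, 50)),
    rng.standard_normal((3, 50)) + 10.0,
])

scores, W = symmetric_projection(X, n_components=2, matrix="random", seed=0)
print(scores.shape)  # (6, 2)
```

Because the eigenvectors of any real symmetric matrix are orthonormal, the projection never increases distances between samples, whichever matrix is used. Passing `matrix="covariance"` gives the ordinary PCA scores for comparison on the same data.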
This article is indexed in ScienceDirect and other databases.
