Classification for high-throughput data with an optimal subset of principal components

Authors:	Joon Jin Song, Yuan Ren, Fenglan Yan

Affiliation:	(a) Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA; (b) Department of Poultry Science, University of Arkansas, Fayetteville, AR 72701, USA

Abstract:	High-throughput data have been widely used in biological and medical studies to discover gene and protein functions. Because of their high dimensionality, principal component analysis (PCA) is often used for dimension reduction. When a few principal components (PCs) are selected for dimension reduction or for determining the dimension, they are typically ranked by their variances, that is, their eigenvalues. However, this approach is not always effective for subsequent multivariate analysis, particularly classification. To maximize the information retained by a subset of the components, we apply a different ranking criterion, the canonical variate criterion, which considers within- and between-group variance rather than the total variance used by the classical criterion. Four prevalent classification methods are considered and compared using leave-one-out cross-validation. The methods are illustrated with three real high-throughput data sets: two microarray data sets and one nuclear magnetic resonance (NMR) spectra data set.
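The contrast between the two ranking rules can be illustrated with a minimal Python/NumPy sketch. It assumes that, for a single component, the canonical variate criterion reduces to the ratio of the between-group to the within-group sum of squares of that component's scores; the paper's exact formula is not given in the abstract. It also uses a nearest-centroid classifier as a hypothetical stand-in for the four classification methods, which the abstract does not name. All function names are hypothetical. PCA and component ranking are refit inside every leave-one-out fold so that the held-out sample never influences which components are kept.

```python
import numpy as np

def fit_pca(X, tol=1e-10):
    """Center X (n samples x p features); return mean, loadings, eigenvalues.
    Components come out sorted by decreasing eigenvalue (classical ranking)."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    keep = s > tol * s[0]            # drop numerically zero components (p >= n case)
    return mean, Vt[keep], s[keep] ** 2 / (X.shape[0] - 1)

def between_within_ratio(z, y):
    """Between-group / within-group sum of squares for one score vector z (assumed criterion)."""
    grand = z.mean()
    between = within = 0.0
    for g in np.unique(y):
        zg = z[y == g]
        between += len(zg) * (zg.mean() - grand) ** 2
        within += ((zg - zg.mean()) ** 2).sum()
    return between / within if within > 0 else np.inf

def select_components(scores, y, k, criterion):
    """Indices of the k components kept under the chosen ranking rule."""
    k = min(k, scores.shape[1])
    if criterion == "variance":
        return np.arange(k)          # PCs are already in eigenvalue order
    ratios = [between_within_ratio(scores[:, j], y) for j in range(scores.shape[1])]
    return np.argsort(ratios)[::-1][:k]

def nearest_centroid_predict(Z_train, y_train, z_test):
    """Hypothetical stand-in classifier: assign z_test to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    return classes[np.argmin(((centroids - z_test) ** 2).sum(axis=1))]

def loocv_error(X, y, k, criterion):
    """Leave-one-out error rate, refitting PCA and the ranking in every fold."""
    n = X.shape[0]
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        mean, V, _ = fit_pca(X[train])
        Z_train = (X[train] - mean) @ V.T
        cols = select_components(Z_train, y[train], k, criterion)
        z_test = (X[i] - mean) @ V[cols].T
        pred = nearest_centroid_predict(Z_train[:, cols], y[train], z_test)
        errors += int(pred != y[i])
    return errors / n

# Synthetic example: the class signal lives in a low-variance direction.
rng = np.random.default_rng(0)
n = 40
y = np.repeat([0, 1], n // 2)
X = np.column_stack([
    rng.normal(0, 10, n),                                   # high-variance noise
    rng.normal(0, 5, n),                                    # more noise
    np.where(y == 1, 1.0, -1.0) + rng.normal(0, 0.3, n),    # class signal
    rng.normal(0, 0.5, n),
    rng.normal(0, 0.5, n),
])
err_var = loocv_error(X, y, k=1, criterion="variance")
err_cv = loocv_error(X, y, k=1, criterion="canonical")
print(f"LOOCV error, variance ranking:  {err_var:.3f}")
print(f"LOOCV error, canonical ranking: {err_cv:.3f}")
```

In this synthetic setup the largest-variance component carries only noise, so keeping one component ranked by eigenvalue tends to classify near chance, whereas the between/within ranking picks the third component, which separates the two groups.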

Keywords:	Principal component analysis; Canonical variate analysis; High-throughput data; Classification
This article is indexed in ScienceDirect and other databases.
