Data mining the NCI cancer cell line compound GI(50) values: identifying quinone subtypes effective against melanoma and leukemia cell classes

Authors:	Kenneth A. Marx, Philip O'Neil, Patrick Hoffman, M. L. Ujwal

Institution:	AnVil, Inc, 25 Corporate Drive, Burlington, Massachusetts 01803, USA. kenneth_marx@uml.edu

Abstract:	Using data mining techniques, we studied a subset (1400) of compounds from the large public National Cancer Institute (NCI) compound data repository. We first assigned a functional class identity to each of the 60 NCI cancer testing cell lines via hierarchical clustering of gene expression data. Although they derive from nine clinical tissue types, the 60 cell lines were placed into six classes: melanoma, leukemia, renal, lung, colorectal, and a sixth class of mixed-tissue cell lines not found in any of the other five. We then carried out supervised machine learning using the GI(50) values measured on the panel of 60 NCI cancer cell lines. For separate 3-class and 2-class problems, we achieved clear cell line class separation at high stringency, p < 0.01 (Bonferroni-corrected t-statistic), using feature reduction clustering algorithms embedded in RadViz, an integrated high-dimensional analytic and visualization tool. We started with the 1400 compound GI(50) values as input and selected only those compounds, or features, that were significant in carrying out the classification. With this approach, we identified two small sets of compounds that were most effective in completely separating the melanoma class from the non-melanoma classes and the leukemia class from the non-leukemia classes. To validate these results, we showed that the GI(50) values of these two compound sets were highly accurate classifiers under five standard analytical algorithms. One compound set was most effective against the melanoma class cell lines (14 compounds), and the other set was most effective against the leukemia class cell lines (30 compounds). Each of the two compound sets was significantly enriched in a different type of substituted p-quinone.
Of the 14 compounds in the melanoma set, 11 were internal substituted p-quinones, and of the 30 compounds in the leukemia set, 6 were external substituted p-quinones. Attempts to subclassify melanoma or leukemia cell lines by clinical cancer subtype met with limited success. For example, using GI(50) values for the 30 compounds we identified as effective against all leukemia cell lines, we could separate cell lines of acute lymphoblastic leukemia (ALL) origin from leukemia cell lines of non-ALL origin without significant overlap from non-leukemia cell lines. Based upon clustering using GI(50) values for the 60 cancer cell lines laid out by the RadViz algorithm, these two compound subsets did not overlap with clusters containing any of the NCI's 92 compounds of known mechanism of action, a few of which are quinones. Given their structural patterns, the two p-quinone subtypes we identified would be expected to possess different redox potentials and substrate specificities for enzymatic reduction in vivo. These two p-quinone subtypes represent valuable information that may be used to elucidate pharmacophores for the design of compounds to treat these two cancer tissue types in the clinic.
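
The feature-selection step described in the abstract (keeping only compounds whose GI(50) values separate one cell line class from the rest at a Bonferroni-corrected p < 0.01) can be illustrated with a minimal sketch. This is not the RadViz implementation. It assumes a pooled-variance two-sample t-test, and the function names, compound labels, and GI(50) numbers are hypothetical and invented for illustration. The p-value is obtained by simple numerical integration so that the example needs only the Python standard library; in practice a statistics package would normally be used.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (illustrative only)."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability mass in [-|t|, |t|]
    return max(0.0, 1.0 - central)


def pooled_t(xs, ys):
    """Pooled-variance t statistic and degrees of freedom (an assumed test variant)."""
    n1, n2 = len(xs), len(ys)
    sp2 = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)
    if sp2 == 0:
        raise ValueError("pooled variance is zero; t statistic undefined")
    t = (mean(xs) - mean(ys)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def select_features(gi50, labels, target, alpha=0.01):
    """Return compounds whose p-value is below alpha / (number of compounds)."""
    in_class = [i for i, lab in enumerate(labels) if lab == target]
    out_class = [i for i, lab in enumerate(labels) if lab != target]
    threshold = alpha / len(gi50)  # Bonferroni correction
    selected = []
    for compound, values in gi50.items():
        t, df = pooled_t([values[i] for i in in_class],
                         [values[i] for i in out_class])
        if two_sided_p(t, df) < threshold:
            selected.append(compound)
    return selected


# Hypothetical log10 GI(50) values for 8 cell lines (made-up numbers, not NCI data).
TOY_LABELS = ["melanoma"] * 3 + ["other"] * 5
TOY_GI50 = {
    "cpd_A": [-7.9, -8.1, -8.0, -5.0, -5.2, -4.9, -5.1, -4.8],
    "cpd_B": [-5.1, -4.9, -5.3, -5.0, -5.2, -4.8, -5.1, -4.9],
    "cpd_C": [-6.0, -5.5, -6.4, -5.6, -6.1, -5.4, -6.3, -5.9],
}

if __name__ == "__main__":
    print(select_features(TOY_GI50, TOY_LABELS, "melanoma"))
```

With 1400 candidate compounds, as in the study, the same correction would require each compound's p-value to fall below 0.01 / 1400, which is why only small compound sets survive the selection.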

Keywords:
This article is indexed in PubMed and other databases.
