Breast Cancer Molecular Signatures as Determined by SAGE:
Correlation with Lymph Node Status
Abstract
Background: Global gene expression profiling with DNA microarray platforms has been used extensively to classify breast carcinomas and to correlate those classes with clinical characteristics, including outcome. We generated a high-resolution breast cancer Serial Analysis of Gene Expression (SAGE) database of approximately 2.7 million tags and performed unsupervised statistical analyses to obtain a molecular classification of invasive ductal breast carcinomas in correlation with clinicopathological features.
Results: Unsupervised statistical analysis by means of a random forest approach identified two main clusters of breast carcinomas, which differed in their lymph node status (P = 0.01); this suggested that lymph node status leads to globally distinct expression profiles. A total of 245 transcripts (55 up-modulated and 190 down-modulated) were differentially expressed between lymph node (+) [LN(+)] and LN(-) primary breast tumors (fold change, ≥2; P < 0.05). Several LN(+) up-modulated transcripts were validated in independent sets of human breast tumors by means of real-time reverse transcription-PCR (RT-PCR). We validated significant overexpression of transcripts for HOXC10 (P = 0.001), TPD52L1 (P = 0.007), ZFP36L1 (P = 0.011), PLINP1 (P = 0.013), DCTN3 (P = 0.025), DEK (P = 0.031), and CSNK1D (P = 0.04) in LN(+) breast carcinomas. Moreover, the DCTN3 (P = 0.022) and RHBDD2 (P = 0.002) transcripts were confirmed by real-time RT-PCR to be overexpressed in tumors that recurred within 6 years of follow-up. In addition, meta-analysis was employed to compare SAGE data associated with LN(+) status with publicly available breast cancer DNA microarray data sets.
Conclusions: Using SAGE, we have generated evidence that the pattern of gene expression in primary breast cancers at the time of surgical removal can discriminate tumors with lymph node metastatic involvement, and we have identified specific transcripts that behave as predictors of recurrence.
Contact: maaldaz-at-mdanderson.org
Supplementary information:
Supplementary data file 1: Differentially expressed genes between LN(+) vs. LN(-) primary breast carcinomas (fold change; P < 0.05). (44 KB Excel file, zipped.)
Supplementary data files 2 & 3: Meta-analysis: a cross-platform comparison of gene expression profiles. (2-28 KB Excel files, zipped.)
Figure 1. SAGE profiles of 27 primary invasive breast carcinomas. A. The SAGE profiles of 27 breast carcinomas are visualized in a two-dimensional multidimensional scaling plot in which each dot represents one sample and the relative distances between samples are correlated with their random forest (RF) dissimilarities. Breast carcinomas are colored by their RF clustering memberships: cluster A (fuchsia), composed of 78% LN(+) carcinomas, and cluster B (blue), composed of 87% LN(-) carcinomas. B. Hierarchical clustering of 245 differentially expressed genes (55 up-modulated transcripts and 190 down-modulated transcripts) according to the patient's LN status based on pathologic diagnosis. The color scale at the bottom of the picture represents expression level: low expression is shown in green and high expression in red. C, D. Results of meta-analysis (from publicly available gene expression microarray data sets) of 55 up-modulated (C) and 55 down-modulated (D) transcripts identified by SAGE. Red or green boxes indicate statistically significant agreement between our study and previously published studies, not only on LN status but also in association with other progression parameters such as metastasis or relapse. Red, statistically significant P values (P < 0.05) associated with gene overexpression in LN(+), metastasis, and relapse (DFS); green, statistically significant down-modulated expression; gray, unavailable data.
Figure 2. Validation assays of SAGE expression profiles in an independent set of primary invasive breast carcinomas (n = 40). A. Real-time RT-PCR of seven up-modulated transcripts (HOXC10, TPD52L1, ZFP36L1, PLINP1, DCTN3, DEK, CSNK1D, RHBDD2) in LN(+) carcinomas. B. Real-time RT-PCR of two up-modulated transcripts (DCTN3, RHBDD2) in recurrent breast carcinomas. Values are the mean ± 2 SE of log2-transformed real-time RT-PCR values of the assayed gene relative to 18S rRNA, which was used as the normalizing control.
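The summary statistic named in the Figure 2 legend (mean ± 2 SE of log2-transformed expression relative to 18S rRNA) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names and the input values are hypothetical, and the standard error is assumed here to be the sample standard deviation divided by the square root of n.

```python
import math
import statistics


def log2_relative_expression(target, reference):
    """log2 of the target-gene quantity divided by the 18S rRNA quantity."""
    return math.log2(target / reference)


def mean_with_2se(values):
    """Return (mean, lower, upper), where the bounds are mean -/+ 2 standard errors."""
    n = len(values)
    mean = statistics.mean(values)
    # Assumption: SE = sample standard deviation / sqrt(n)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - 2 * se, mean + 2 * se


# Hypothetical (target, 18S rRNA) quantities for one group of tumors;
# these numbers are for illustration only and are not data from the study.
samples = [(8.0, 1.0), (4.0, 1.0), (16.0, 2.0), (2.0, 1.0)]

log_values = [log2_relative_expression(t, r) for t, r in samples]
mean, lower, upper = mean_with_2se(log_values)
print(f"mean = {mean:.2f}, mean - 2 SE = {lower:.2f}, mean + 2 SE = {upper:.2f}")
```

Working on the log2 scale makes a two-fold difference in relative expression correspond to one unit, so up- and down-modulation are displayed symmetrically around the group mean.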
Figure 3. Hierarchical clustering of primary breast carcinomas based on real-time RT-PCR validation data. A. Cluster showing two nodes on the basis of LN status distribution (P = 0.0001). B. Cluster showing two nodes on the basis of recurrence status distribution (P = 0.001).
Figure 4. DCTN3 immunohistochemical staining in normal (tumor-adjacent), ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), and metastatic breast samples.

