Pathway analysis of primary central nervous system lymphoma

Han W. Tun, David Personett, Karen A. Baskerville, David M. Menke, Kurt A. Jaeckle, Pamela Kreinest, Brandy Edenfield, Abba C. Zubair, Brian P. O'Neill, Weil R. Lai, Peter J. Park and Michael McKinney
This article has an Erratum 112(8):3530


Primary central nervous system (CNS) lymphoma (PCNSL) is a diffuse large B-cell lymphoma (DLBCL) confined to the CNS. A genome-wide gene expression comparison between PCNSL and non-CNS DLBCL was performed, the latter consisting of both nodal and extranodal DLBCL (nDLBCL and enDLBCL), to identify a “CNS signature.” Pathway analysis with the program SigPathway revealed that PCNSL is characterized notably by significant differential expression of multiple extracellular matrix (ECM) and adhesion-related pathways. The most significantly up-regulated gene is the ECM-related osteopontin (SPP1). Expression at the protein level of ECM-related SPP1 and CHI3L1 in PCNSL cells was demonstrated by immunohistochemistry. The alterations in gene expression can be interpreted within several biologic contexts with implications for PCNSL, including CNS tropism (ECM and adhesion-related pathways, SPP1, DDR1), B-cell migration (CXCL13, SPP1), activated B-cell subtype (MUM1), lymphoproliferation (SPP1, TCL1A, CHI3L1), aggressive clinical behavior (SPP1, CHI3L1, MUM1), and aggressive metastatic cancer phenotype (SPP1, CHI3L1). The gene expression signature discovered in our study may represent a true “CNS signature” because we contrasted PCNSL with wide-spectrum non-CNS DLBCL on a genomic scale and performed an in-depth bioinformatic analysis.


Primary central nervous system (CNS) lymphoma (PCNSL) is a diffuse large B-cell lymphoma (DLBCL) with a tropism for the CNS microenvironment and is confined to the CNS. Biologically, PCNSL is interesting in that it is a B-cell lymphoma in the CNS, where very few B lymphocytes, if any, are found under normal circumstances.1 Some studies have indicated that PCNSL is of germinal center B-cell origin.2,3 According to a gene expression study, non-CNS DLBCL has been classified into 3 groups: germinal center B cell type, activated B cell type (ABC), and type 3.4 PCNSL has been shown to have immunophenotypic features of ABC.5 These findings taken together indicate that PCNSL develops from a B cell that has been exposed to a germinal center influence outside the CNS. Therefore, understanding the mechanisms that mediate B-cell migration and adaptation to the CNS microenvironment is an important goal in research into the biology of PCNSL.

PCNSL remains incurable in most patients.6 Obviously, a better understanding of its biology is crucial to improve the prognosis. To this end, many studies, including DNA microarray studies, have been performed comparing PCNSL to non-CNS DLBCL, usually of nodal type. Of these, the largest microarray study to date compared PCNSL to nodal DLBCL and revealed several important molecular properties, including features linked to angiotropism.7 In our opinion, it is important to contrast PCNSL with all types of non-CNS DLBCL (both nDLBCL and enDLBCL) on a genomic scale and to use in-depth bioinformatic analysis, especially pathway analysis, to identify the “CNS signature.” We performed such a study and revealed new biologic insights.


Study subjects

Fresh frozen samples of PCNSL, nDLBCL, and enDLBCL from immunocompetent patients were obtained under the protocol approved by the Institutional Review Board of the Mayo Clinic. These samples were surplus tissues after the establishment of definitive pathologic diagnosis. The pathologic diagnosis was confirmed by central pathology review (D.M.M.). Totals of 13 PCNSL, 11 nDLBCL, and 19 enDLBCL were used in this study. For CNS tumors, 11 of 13 were stereotactic needle biopsies; the other 2 were from resections. The quality of the samples was ascertained by CD20 immunohistochemical stain; generally an estimated 80% or more content of B cells was observed. The enDLBCL samples originated from spleen (1), tonsil (4), adenoid (1), skin (1), bone (3), stomach (1), liver (1), testes (2), ovary (1), epidural (1), pericardium (1), thyroid (1), and pleural (1).

Microarray protocols

Total RNA was extracted from the dissected lymphoma tissue using a kit from QIAGEN (RNeasy Mini Kit; Valencia, CA). A fraction of the total RNA was used to perform a quality check for RNA integrity using the Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). Only samples yielding profiles of intact total RNA (retention of both ribosomal bands and the broad central peak of mRNA) were used for the microarray analyses reported in this paper. The mRNA in the sample was amplified with RiboAmp HS RNA Amplification Kit (Arcturus Engineering, Sunnyvale, CA). The resulting amplified RNA (aRNA) preparations were labeled either with Alexa Fluor 555 (lymphoma sample) or Alexa Fluor 647 (reference RNA) from Invitrogen (Carlsbad, CA). The reference RNA was the “Universal” human RNA from Stratagene (Santa Clara, CA). The Alexa dyes have been shown to have reduced labeling bias.8 The labeled samples were hybridized to Agilent Human Genomic Oligo 60-mer microarrays (41 061 probes) in Agilent microarray chambers (G2534A) at 60°C for 17 hours. After washing and drying, the array was scanned for analysis in a confocal laser scanner (ScanArray Express, PerkinElmer Life and Analytical Sciences, Waltham, MA), and ImaGene software (version 6; BioDiscovery, El Segundo, CA) was used to process the images.

Validation of microarray results

Validation of microarray results was accomplished using quantitative real-time polymerase chain reaction (RT-PCR; detailed protocol is in “Detailed protocol for quantitative real-time PCR,” available on the Blood website; see the Supplemental Materials link at the top of the online article). Briefly, a portion of cDNA that was also used for microarray experiments was used to quantitate 10 genes: ATP5J, BCL-6, CD10, CD44, CHI3L1, COX6B1, IRF4, SPP1, TFPI2, and GAPDH. The level of GAPDH was used as a reference for obtaining the levels of the other mRNAs, and the ratios of CNS/nodal and CNS/extranodal were calculated. These ratios were then plotted vs ratios determined from the microarray data, and the correlation coefficient was determined after linear regression. We also performed immunohistochemistry (IHC) on SPP1 and CHI3L1 (detailed protocol in “Detailed protocol for immunohistochemistry procedures” in the Supplemental Materials).
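The validation arithmetic described above can be illustrated with a minimal Python sketch. It assumes that each mRNA level is first divided by the GAPDH level of the same group, that group ratios (for example, CNS/nodal) are then formed, and that agreement with the microarray ratios is summarized by the Pearson correlation coefficient, which equals the correlation coefficient of a simple linear regression. The function names and the numbers are hypothetical and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def group_ratio(level_a, level_b, ref_a, ref_b):
    """Ratio of reference-normalized levels, e.g. CNS/nodal with GAPDH as reference."""
    return (level_a / ref_a) / (level_b / ref_b)

# Hypothetical ratios for illustration only (not data from the study).
array_ratios = [0.5, 1.0, 2.0, 4.0, 8.0]
qpcr_ratios = [0.6, 1.1, 1.8, 4.5, 7.5]
r = pearson_r(array_ratios, qpcr_ratios)
```

Because a single extreme point can dominate such a coefficient, it is reasonable to report the value both with and without a suspected outlier.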

Bioinformatics methods

Clustering and parametric tests.

The gene list used was approximately 11 500 in number (genes “present” on 35 of 43 arrays). In GeneSpring (version 7.2, Agilent) the LOWESS method of normalization was used, and unsupervised clustering of genome-wide expression profiles of PCNSL and non-CNS nodal and extranodal DLBCL was performed using “standard correlation” metric in GeneSpring. Genes identified by Fisher discriminant analysis (FDA) were also used to cluster the 43 samples, using cluster 3.0.9 The clustering was performed using uncentered correlation and complete linkage using genes identified in the FDA for genes separating the samples into 2 classes (CNS and non-CNS) at P less than .01.
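As an illustration of two steps named above, the following Python sketch shows a presence filter (a gene is kept if it has a usable value on at least 35 arrays) and the uncentered correlation, which is computed like the Pearson coefficient but without subtracting the means. How GeneSpring flags a gene as “present” is not described here, so representing an absent value as None is an assumption made for this sketch.

```python
import math

def present_on_enough_arrays(values, min_present=35):
    """True if a gene has a usable (non-None) value on at least min_present arrays."""
    return sum(v is not None for v in values) >= min_present

def uncentered_correlation(x, y):
    """Uncentered correlation: like Pearson's r, but the means are not subtracted."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / math.sqrt(sxx * syy)

def uncentered_distance(x, y):
    """Distance commonly used for clustering with this metric: 1 - correlation."""
    return 1.0 - uncentered_correlation(x, y)
```

With complete linkage, the distance between two clusters is taken as the largest such distance between any pair of their members.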

Pathway analysis.

Data were imported into GeneSpring where LOWESS normalization was performed. The data were trimmed to those genes that were present on at least 35 of the 43 arrays, approximately 11 500 in number. In effect, this processing removed most of the weaker signals. Because additional global normalization did not change the SigPathway (Sun Microsystems, Santa Clara, CA) results substantively (not shown), the results presented in this article are from data with only LOWESS normalization performed. In this type of normalization, the ratios of the 2 channels on each array are adjusted to correct for nonlinearity between the ratios and signal intensity. The data were exported into Microsoft Excel (Microsoft, Redmond, WA), then to Star Calc, and saved in “comma-separated value” format. For pathway analysis, the data were imported into R (version 2.2.1) and analyzed with the SigPathway package10 (version 1.1.3, available as a Bioconductor package). Missing values were imputed using the K-nearest neighbor method11 exactly as previously described.12
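The general idea of K-nearest neighbor imputation can be sketched in Python as follows. This is a generic illustration, not the cited procedure: the choice of k, the root-mean-square distance over shared observed arrays, and the unweighted average of neighbor values are all assumptions made for the example.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries in a genes x arrays matrix from the k most similar genes."""
    def distance(a, b):
        # Root-mean-square difference over arrays where both genes are observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other genes with an observed value on array j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled

# Hypothetical toy matrix: the first gene is missing its third value.
toy = [[1.0, 2.0, None], [1.0, 2.0, 3.0], [1.1, 2.1, 5.0], [9.0, 9.0, 9.0]]
imputed = knn_impute(toy, k=2)
```

In the toy matrix, the two genes closest to the first gene supply the values 3.0 and 5.0, so the missing entry is filled with their mean.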

The SigPathway package performs “pathway analysis” on microarray data by first compiling genes on the microarray into functional (ontologic and pathway associations) categories based on database searches, producing many gene sets for statistical testing. Assessment of the differential expression of each gene set in pair-wise comparisons of phenotypes in the experimental data is accomplished by calculating a composite t score for each gene set, and then using permutation methods to determine 2 statistical parameters, NT and NE, for each gene set. NT is a measure of the degree to which a given gene set differs from the other gene sets on the array. Rows of the gene × array matrix (where the genes are in rows and the arrays are in columns) are permuted for this calculation (gene labels are permuted). NE is a measure of the degree to which the gene set composite expression is different between phenotypes; columns of the gene × array matrix are permuted for calculation of this parameter (sample labels are permuted). The program ranks gene sets according to the average of the rank-orders of NT and NE; false discovery rate (q value) is calculated to adjust for multiple testing problems. The rank is required to be high in both rank-orders to minimize false positives.
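The two permutation schemes described above can be contrasted in a short Python sketch. It is not the SigPathway implementation and does not compute NT or NE themselves: it assumes a composite score equal to the sum of per-gene Welch t statistics and returns simple permutation p values, one obtained by permuting gene labels and one by permuting sample labels. All function names are hypothetical.

```python
import math
import random

def t_stat(a, b):
    """Welch t statistic for two lists of values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def gene_t_stats(matrix, labels):
    """Per-gene t statistics; matrix is genes x arrays, labels are 0/1 per array."""
    stats = []
    for row in matrix:
        a = [v for v, g in zip(row, labels) if g == 1]
        b = [v for v, g in zip(row, labels) if g == 0]
        stats.append(t_stat(a, b))
    return stats

def set_score(stats, gene_set):
    """Composite score of a gene set: sum of its per-gene t statistics."""
    return sum(stats[i] for i in gene_set)

def gene_label_pvalue(matrix, labels, gene_set, n_perm=200, seed=0):
    """Gene labels permuted: compare the set with random gene sets of equal size."""
    rng = random.Random(seed)
    stats = gene_t_stats(matrix, labels)
    observed = abs(set_score(stats, gene_set))
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(range(len(matrix)), len(gene_set))
        if abs(set_score(stats, random_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def sample_label_pvalue(matrix, labels, gene_set, n_perm=200, seed=0):
    """Sample labels permuted: compare the set's score with scores under shuffled phenotypes."""
    rng = random.Random(seed)
    observed = abs(set_score(gene_t_stats(matrix, labels), gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(gene_t_stats(matrix, shuffled), gene_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The first test asks whether the gene set stands out from other genes on the array, and the second asks whether it is associated with the phenotype. Requiring both to be extreme mirrors the rationale given above for ranking on both parameters.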


Dimensional reduction methods using matrix decomposition have been applied extensively to microarray data. One useful method in this class is Fisher discriminant analysis (FDA), which maximizes separation of phenotypes.13 As these authors describe, FDA is a matrix decomposition method whereby orthonormal dimensions are determined that maximize the separation between classes. Microarray data (∼11 500 genes on at least 35 of 43 arrays) were exported from GeneSpring into Microsoft Excel, in which a text file was composed for analysis by BioSystAnSe. The FDA was performed with the Singular Value Decomposition option at a criterion of P less than .01. The lymphoma samples were classified either by 3 phenotypes (CNS, nodal, extranodal) or by 2 phenotypes (CNS, non-CNS).
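For readers unfamiliar with the method, the textbook two-class Fisher discriminant can be sketched in Python for the simplest case of 2 features (for example, 2 genes). This is an illustration of the principle only, assuming the direction w = Sw^-1 (mean_a - mean_b), where Sw is the pooled within-class scatter matrix. It does not reproduce the Singular Value Decomposition option or the P-value criterion of the software used in this study, and the function names are hypothetical.

```python
def fisher_direction_2d(class_a, class_b):
    """Fisher discriminant direction w = Sw^-1 (mean_a - mean_b) for 2-feature samples."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Apply the inverse of the 2 x 2 scatter matrix to the mean difference.
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)

def project(point, w):
    """Project a 2-feature sample onto the discriminant direction."""
    return point[0] * w[0] + point[1] * w[1]

# Hypothetical expression values for 2 genes in 2 phenotypes.
class_a = [(5.0, 1.0), (6.0, 2.0), (5.5, 0.5), (6.5, 1.5)]
class_b = [(1.0, 1.2), (2.0, 1.8), (1.5, 0.7), (2.5, 1.6)]
w = fisher_direction_2d(class_a, class_b)
```

Projecting each sample onto w gives a single score per sample; in this toy example the scores of the 2 classes do not overlap.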

Image acquisition and preparation

Pathology slides were viewed with a Leica DMLB optical microscope (Leica Microsystems, Wetzlar, Germany). Cytoseal-60 mounting media (Richard Allen, Kalamazoo, MI) was used. Images were acquired using a SPOT RT Color Camera (Diagnostic Instruments, Sterling Heights, MI), and were processed with SPOT Advanced program version 2.0 (Diagnostic Instruments) and Adobe Photoshop version 6.0 software (Adobe Systems, San Jose, CA).


Initial characterization of the microarray data

In the final dataset, there were a total of 43 high-quality lymphoma samples that produced reliable microarray data, from which filtering yielded approximately 11 500 genes that were present on at least 35 arrays, a reasonable compromise that optimized both gene numbers and data quality. A standard clustering of these approximately 11 500 filtered and LOWESS-normalized genes is shown in Figure 1, where the array data are averaged according to the 3 phenotypes. Within the basis list of approximately 11 500 genes, there were 50 genes that were significantly (Student t test; P < .05) expressed at 2-fold or greater difference between the PCNSL and non-CNS DLBCL (Table 1).

Figure 1

Unsupervised clustering of genome-wide expression profiles of PCNSL and non-CNS nodal and extranodal DLBCL. The gene list used was approximately 11 500 in number (genes present on at least 35 of the 43 arrays). The metric used was “standard correlation” in GeneSpring. Because the 2-color array method involved a reference standard, the colors do not represent actual gene expression levels in the tumor samples but rather the ratio of the tumor mRNA to the reference mRNA. The LOWESS method of normalization was used. To the right of the cluster are shown 10 genes of interest enlarged from the cluster; the colored bars correspond to the 3 phenotypes identified at the bottom of the cluster (“Brain,” “Extranodal,” and “Nodal”; left to right). SPP1 (osteopontin), CHI3L1 (chitinase-3 like 1), IRF4 (MUM1), S-100B (S-100 calcium binding protein beta), SERPINA3 (serine proteinase inhibitor, clade A, member 3), CRYAB (crystallin alpha B), LUM (lumican), COL1A2 (collagen type 1 alpha 2), COL6A1 (collagen type 6 alpha 1), and LAMA4 (laminin alpha 4).

Table 1

Genes at least two-fold different between PCNSL and non-CNS DLBCL at P < 0.05

For validation of the microarray data, we performed quantitative RT-PCR for a set of 10 genes, which included several extracellular matrix (ECM)-related genes (which will be shown to be important under “Pathway analysis of the DLBCL gene expression dataset”), several others of interest, and GAPDH. There was excellent agreement between the microarray and quantitative RT-PCR data. Figure 2 shows a plot of the averages of quantitative RT-PCR values for multiple samples of the CNS and non-CNS phenotypes versus their corresponding values on the microarrays. There was one outlier; without this point, the linear correlation coefficient was 0.94 and highly significant (P < .001); the correlation was still statistically significant when the outlier was included (R = 0.79; P < .02).

Figure 2

Validation of selected genes using quantitative RT-PCR. The blue squares represent CNS/nodal sample ratios; the red inverted triangles are the CNS/extranodal sample ratios. The ratios obtained using quantitative RT-PCR are plotted along the y-axis, whereas the ratios calculated from the microarray data are plotted on the x-axis. The genes analyzed were ATP5J, BCL-6, CD10, CD44, CHI3L1, COX6B1, IRF4, SPP1, TFPI2, and GAPDH. The correlation coefficient shown is that calculated without the non-CNS outlier at the right (CHI3L1 CNS/nodal ratio). The correlation remains significant when including this outlier (R = 0.79; P < .02).

Pathway analysis of the DLBCL gene expression dataset

The bioinformatics program SigPathway was used to identify those gene sets that were most powerful in contrasting phenotypes.10 The PCNSL was contrasted pair-wise with non-CNS DLBCL (nDLBCL + enDLBCL), nDLBCL, and enDLBCL for a total of 3 comparisons (Tables 2-4; these tables are abbreviated in the main text; full versions are included in Tables S1-S3). Table 2 shows the pathway analysis results for the contrast between the 13 CNS samples versus 30 “non-CNS” samples (nDLBCL and enDLBCL combined). This contrast led to discoveries of primary importance in this study: assessment of biologic pathways unique to PCNSL. Table 3 contrasts CNS with nDLBCL samples; Table 4 contrasts CNS with enDLBCL samples. Our results in Tables 2 to 4 show numerous gene sets for which there are high values of NT and NE along with corresponding low q values (as shown numerically in the full-length Tables S1-S3). This indicates statistically strong differential expression of these biologic pathways between the phenotypes. In each table, there are as many as 20 ranked gene sets exhibiting high NT and NE parameters associated with q values less than 0.0001, indicating very high statistical reliability (shown in Tables S1-S3).

Table 2

SigPathway results: PCNSL versus non-CNS DLBCL

Table 3

SigPathway results: PCNSL versus nodal DLBCL

Table 4

SigPathway results: PCNSL versus extranodal DLBCL

Examination of these 3 tables reveals that the PCNSL phenotype differentially expresses 2 major types of ontologic gene sets: one type that primarily sets apart the PCNSL phenotype from both nDLBCL and enDLBCL combined and other gene sets that differentiate PCNSL from each non-CNS group, either nDLBCL or enDLBCL separately. Gene sets of the first type will appear in all 3 tables. For example, in the PCNSL versus non-CNS contrast (Table 2), there are several gene sets that exhibit biologic associations with the ECM and adhesion: gene set 1 (ECM-receptor interaction), gene set 2 (basement membrane), gene set 3 (focal adhesion), gene set 4 (ECM structural constituent), gene set 8 (basal lamina), gene set 19 (extracellular structure organization and biogenesis), gene set 20 (ECM organization and biogenesis), gene set 24 (ECM [sensu Metazoa]), gene set 25 (ECM), gene set 26 (collagen), and gene set 33 (ECM/adhesion molecules). Nine of these 11 gene sets also appear in the PCNSL versus nDLBCL contrast (Table 3), whereas 10 of these 11 gene sets also appear in the PCNSL versus enDLBCL contrast (Table 4). After removing duplicate listings, there were 244 unique genes in these 11 gene sets listed from Table 2. Their expression levels are plotted (in black) overlying the approximately 11 500 total genes (shown in color) in the scatter-plot in Figure 3A. The normalized levels for non-CNS samples (n = 30) were averaged and plotted against averages of the normalized levels for the PCNSL samples (n = 13). Of these 244 ECM and adhesion-related genes (plotted in black), 170 lie above the line of equivalent expression, indicating relatively higher expression in non-CNS samples; 74 genes are expressed at higher levels in the PCNSL samples. Several of these ECM and adhesion-related genes are labeled in Figure 3A.
Notably, SPP1 (secreted phosphoprotein 1, osteopontin; NM_000582) and CHI3L1 (chitinase-3 like 1, cartilage glycoprotein-39, ECM structural hydrolase; NM_001276) are expressed at much higher levels in PCNSL (9.7-fold and 2.7-fold, PCNSL > non-CNS, respectively). Several other ECM-related genes labeled in Figure 3A are expressed at higher levels in the non-CNS samples: TFPI2 (tissue factor pathway inhibitor 2; NM_006528; 5.1-fold non-CNS > PCNSL), FBN1 (fibrillin 1; NM_000138; 3.4-fold non-CNS > PCNSL), COL1A2 (collagen type 1 alpha2; NM_000089; 3.7-fold non-CNS > PCNSL), and LUM (lumican, collagen binding protein; NM_002345; 3.6-fold non-CNS > PCNSL). Differential expression of genes in the sterol biosynthesis pathway also appears to be part of the CNS pathway signature, as this pathway is found to be significant in all 3 contrasts.
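The bookkeeping behind these counts (pooling gene sets without duplicates, tallying genes on either side of the line of equivalent expression, and expressing differences as fold changes) can be sketched in Python. The function names and the example values are hypothetical; only the arithmetic is intended to match the description above.

```python
def unique_genes(gene_sets):
    """Union of several gene sets, so genes listed more than once are counted once."""
    return set().union(*gene_sets)

def fold_change(mean_a, mean_b):
    """Return (fold, higher_group) for two positive mean expression levels."""
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("means must be positive")
    if mean_a >= mean_b:
        return mean_a / mean_b, "a"
    return mean_b / mean_a, "b"

def count_directions(pairs):
    """Count genes higher in group a, higher in group b, and tied."""
    higher_a = sum(1 for a, b in pairs if a > b)
    higher_b = sum(1 for a, b in pairs if b > a)
    return higher_a, higher_b, len(pairs) - higher_a - higher_b

# Hypothetical gene sets sharing one gene.
pooled = unique_genes([{"SPP1", "LUM"}, {"LUM", "FBN1"}])
```

Here group "a" and group "b" stand for any two phenotypes being compared, for example PCNSL and non-CNS DLBCL.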

Figure 3

Expression of selected gene sets. (A) Expression of a set of 244 ECM and adhesion-related genes that distinguish PCNSL from non-CNS DLBCL. LOWESS normalization was performed using genes present on at least 35 of the 43 arrays. The colored points are normalized gene ratios for these approximately 11 500 genes, whereas the black points are the ECM and adhesion-related genes. (B) Expression of a set of 92 cytokine genes that distinguish PCNSL and nodal DLBCL. LOWESS normalization was performed using genes present on at least 35 of the 43 arrays. The colored points are normalized gene ratios for these approximately 11 500 genes, whereas the black points are the cytokine-related genes. (C) Expression of a set of 159 apoptosis-related genes that distinguish PCNSL from extranodal DLBCL. LOWESS normalization was performed using genes present on at least 35 of the 43 arrays. The colored points are normalized gene ratios for these approximately 11 500 genes, whereas the black points are the apoptosis-related genes. The range of colors in these panels reflects range of gene levels in the CNS phenotype in panel A, or the Nodal and Extranodal phenotype in panels B and C, respectively. Specifically, the gene level refers to the ratio formed by dividing the gene level in the tumor by the level of the universal reference. Red indicates levels more than 1.0, whereas green indicates fractions less than 1.0.

Other groupings of ontologic gene sets differentiate the PCNSL from either enDLBCL or nDLBCL separately. For example, the gene sets 10, 15, and 16 in Table 3, where CNS and nDLBCL are contrasted, are involved in cytokine production; these groups do not appear in Table 4, where CNS and enDLBCL are contrasted. Only gene set 41 in Table 4 is linked to cytokines, and it concerns receptor functions, not cytokine production. There are 92 unique genes in gene sets 10, 15, and 16 combined; they are plotted in black overlying all approximately 11 500 genes used in the analysis (plotted in color) in Figure 3B. Several individual genes in this composite are labeled: S-100B, IRF4, CXCL13, BMP7, BC 027979, and IL8 are elevated 2-fold or more in the PCNSL; TNFSF17, TNFSF13B, and VEGFC are elevated in nDLBCL at least 2-fold with respect to CNS samples. Another grouping of gene sets in Table 3 concerns the immunologic functions and responses of T cells and B cells: gene sets 4, 5, 20, 26, 27, and 41. None of these gene sets appear in the contrast between CNS and EN (Table 4). The contrast PCNSL versus nDLBCL also showed significant differential expression of many metabolic pathways. Of these, lipoprotein-related pathways are absent in the PCNSL versus enDLBCL contrast.

The PCNSL versus enDLBCL contrast (Table 4) exhibits several gene sets associated with apoptosis (6, 17, 32, 33, 34, 35) that do not appear in the PCNSL versus nDLBCL contrast. These 7 gene sets contain 159 unique genes, which are plotted (in black) overlying all approximately 11 500 genes in the basis (color) in Figure 3C. Several genes of interest in this composite are labeled: S-100B, AF 217966 (CED4-like death effector filament forming), and AK074291 (Oligo capping), which are relatively higher in expression in the PCNSL; and U45880 (X-linked inhibitor of apoptosis protein, XIAP), S56204 (insulin-like growth factor binding protein 3), and CAPN2 (calpain type 2), which are expressed at least 2-fold higher in the enDLBCL relative to PCNSL. Moreover, certain chromatin and chromosome-related pathways (5, 7, 10, 21, 24) showed significant differential expression between PCNSL and enDLBCL. They are conspicuously absent from the PCNSL versus nDLBCL contrast. The contrast results for PCNSL versus enDLBCL also reveal that certain aspects of amine metabolism separate these 2 phenotypes (gene sets 19 and 25).

FDA of the DLBCL gene expression dataset

When FDA was applied to the discrimination of 2 classes, CNS from non-CNS, with the criterion of P less than .01 reliability, 172 genes were identified, and these included 7 of the ECM and adhesion-related group, 3 of the cytokine group, and 4 of the apoptosis group. Clustering (using Cluster 3.0) with these 172 genes completely separated the CNS samples from the non-CNS samples (Figure 4; the ratio data were log-transformed and clustered with Cluster 3.0 using uncentered correlation and complete linkage). The right-hand portion of Figure 4 shows enlarged views of several gene clusters of interest, including SPP1 (Figure 4A), DDR1 and DKK1 (Figure 4B), a group of ECM-related genes COL6A1, COL1A2, and LAMA4 (Figure 4C), and a cluster containing CRYAB (Figure 4D). Careful examination of the main clustering results reveals patterns of expression that separate most of the samples according to the 3 main phenotypes (“brain” [CNS], nodal, and extranodal), as well as patterns that subdivide the CNS phenotype into 2 subclasses.

Figure 4

Clustering results using FDA genes separating 2 classes: CNS versus non-CNS. The left-hand plot shows the complete tree, whereas 4 regions within the tree are shown at right. Shades of red indicate ratios more than 1.0; shades of green indicate ratios less than 1.0; black indicates a ratio of 1.0.

When FDA was performed to identify genes that separated 3 classes (CNS, nodal, extranodal) at P less than .01, 144 genes were identified. The gene list was similar to that from the 2-class analysis and contained many of the genes in Table 1 and in the SigPathway analyses (not shown). In this analysis, the CNS phenotypes were separated into 2 adjacent groups, one of which contained only CNS samples (n = 8) and the other of which contained the other 5 CNS samples and one EN sample (not shown).

Immunohistochemical studies of DLBCL

Two genes that were strongly and reliably up-regulated in PCNSL were selected for IHC. Expression of osteopontin (SPP1) and chitinase-3 like 1 (CHI3L1) proteins was studied in DLBCL samples from PCNSL (n = 15: 5 samples used in the microarray study plus 10 additional), nDLBCL (n = 10: 5 samples used in the microarray study plus 5 additional), and enDLBCL (n = 7: 5 samples used in the microarray study plus 2 additional). Figure 5 shows examples of osteopontin IHC from PCNSL (Figure 5A,D), nDLBCL (Figure 5B), and enDLBCL (skin, Figure 5C). The 2 views of the PCNSL sample show that most of the tumor cells express moderate to high levels of osteopontin, whereas tumor cells from the 2 samples of non-CNS DLBCL contain little or no osteopontin. The staining pattern for SPP1 was predominantly nuclear, but cytoplasmic staining was also seen. At least some positive staining for SPP1 was seen in 100% of PCNSL and 80% of non-CNS DLBCL. Heavy staining was present in 92% of PCNSL and 26% of non-CNS DLBCL. Figure 6 contains images of CHI3L1 IHC in PCNSL (Figure 6A,D), nDLBCL (Figure 6B), and enDLBCL (spleen, Figure 6C). The 2 views of the PCNSL sample show that most tumor cells express moderate levels of CHI3L1 immunoreactivity with some heavy staining of astrocyte-like cells; the nodal and extranodal samples exhibit some moderate immunoreactivity in nontumor cells (probably macrophages). The staining pattern for CHI3L1 was both cytoplasmic and nuclear, but the nuclear staining was predominant. Positive staining for CHI3L1 was seen in 73% of PCNSL and 41% of non-CNS DLBCL. Heavy staining was present in 40% of PCNSL and 18% of non-CNS DLBCL.

Figure 5

Osteopontin immunohistochemistry in DLBCL. The immunoperoxidase complexes were visualized with diaminobenzidine (brown), and the sections were counterstained with hematoxylin. (A) PCNSL: original magnification ×200. Nearly every tumor cell of this brain biopsy is immunoreactive. (B) Nodal DLBCL: original magnification ×200. Essentially no tumor cell contains immunoreactivity. (C) Extranodal DLBCL (skin): original magnification ×200. Essentially no tumor cell contains immunoreactivity. (D) PCNSL: original magnification ×1000 oil. Cross section of a small vessel, probably a vein, surrounded by osteopontin-positive tumor cells.

Figure 6

Chitinase-3-like 1 immunohistochemistry in DLBCL. The immunoperoxidase complexes were visualized with diaminobenzidine (brown), and the sections were counterstained with hematoxylin. (A) PCNSL: original magnification ×200. Most tumor cells express moderate levels of CHI3L1 with a minority expressing strong levels. The largest profiles with heavy immunoreactivity are possibly astrocytes (see also panel D). (B) Nodal DLBCL: original magnification ×200. Most of the tumor cells contain low levels of immunoreactivity. The larger, strongly positive cells may be macrophages. (C) Extranodal DLBCL (spleen): original magnification ×200. (D) PCNSL: original magnification ×400. This is a higher power view of the PCNSL shown in panel A, showing astrocyte-like profiles with moderate to strong levels of immunoreactivity.


We have identified alterations in the gene expression signature of DLBCL that correlate with anatomic locations, using a statistically powerful method, SigPathway, which makes use of the fact that many, if not most, genes are coregulated according to activation or repression of particular pathways. Most important in the present study was the finding that the PCNSL expresses a unique set of ECM and adhesion-related pathways and genes when contrasted with non-CNS DLBCL. This “CNS signature” for PCNSL was readily attained by contrasting PCNSL with all the non-CNS DLBCL combined (nDLBCL + enDLBCL). Similar findings were also seen when PCNSL was contrasted with either nDLBCL or enDLBCL, further indicating that these findings are uniquely important for PCNSL. The most significant gene set found in pathway analysis was the ECM-receptor pathway, suggesting that the interaction between the CNS microenvironment and lymphoma cells is of great importance for PCNSL. At the single gene expression level, we also found significant, differential expression of numerous ECM and adhesion-related genes. In addition, we demonstrated the up-regulation in PCNSL of 2 important ECM-related genes, SPP1 and CHI3L1, at the protein level.

The differential expression of ECM-related genes, especially adhesion genes, has been long suspected in PCNSL but has not been proven in previous studies.14-16 The reason for this may be that most of the genes in these pathways do not differentially express at a statistically significant level when tested individually. However, when they are analyzed in groups by SigPathway, many ECM-related pathways, including focal adhesion pathways, are found to be significantly implicated in PCNSL biology. Thus, the SigPathway results demonstrate the power of pathway analysis to infer statistically reliable biologic mechanisms in DLBCL, in particular indicating the existence of a unique expression profile for PCNSL. The results of the FDA analyses also provide support for the idea of a unique PCNSL signature. FDA is a matrix decomposition method that identifies those genes whose composite expression patterns are most powerful in distinguishing the phenotypes, and is mathematically very different from the methods of SigPathway. From the basis dataset of approximately 11 500 genes, subsets of fewer than 200 genes were identified by FDA that have expression patterns that can completely, or nearly completely, classify a DLBCL sample according to one of either 2 or 3 phenotypes. Many of the genes identified by FDA were also present in the SigPathway results.

By several measures, the most significant up-regulated gene in PCNSL is an ECM-related gene, SPP1 (osteopontin; OPN). The SigPathway results implicate SPP1 in numerous cellular functions, including cell communication, focal adhesion, immune cell activation, and immune cell migration. This multiplicity of function is consistent with the literature, which has shown SPP1 involvement in various aspects of cancer biology, including cellular proliferation, invasion, metastasis, and regulation of cytokine expression and angiogenesis.17,18 A high level of expression of SPP1 has been associated with aggressive cancers and poor prognosis.17 Our immunohistochemical finding of a predominantly nuclear staining pattern in PCNSL cells is unusual, as SPP1 staining is usually cytoplasmic in other malignant tumors.19 The nuclear localization of SPP1 has been linked to cellular proliferation.20 SPP1 has been found to be up-regulated in other CNS diseases, such as multiple sclerosis,21 glioblastoma multiforme, and astrocytomas.17 It appears that SPP1 plays an important role in the pathogenesis of CNS diseases. To our knowledge, this is the first report of significant up-regulation of SPP1 in PCNSL. It is noteworthy that SPP1 has not been previously reported to be expressed significantly in B cells.

Our IHC experiments show that CHI3L1 expression is higher in PCNSL compared with non-CNS DLBCL. CHI3L1 (YKL-40) is an ECM-related gene widely implicated in the biology of several types of cancer. It has a role in cancer cell proliferation, differentiation, survival, invasiveness, metastasis, angiogenesis, and remodeling of ECM surrounding the tumor.22 Highest serum levels of CHI3L1 are found in patients with metastatic cancer with the shortest recurrence-free interval and shortest overall survival.22 The presence of immunohistochemically detected CHI3L1 in breast cancer is associated with a poor prognosis.23 Other significantly up-regulated ECM/adhesion genes in our study include DDR1 and TACSTD1 (EpCAM). DDR1 is a member of a novel family of receptor tyrosine kinases thought to play a role in cell adhesion.24 It has been shown to be consistently and selectively expressed in human brain tumors.25 TACSTD1 (EpCAM) is a cell adhesion molecule expressed by a variety of carcinomas.26 It is also expressed in normal retina and retinoblastoma.27

Several previous studies of PCNSL are relevant to other findings in our experiments. Up-regulation of CXCL13, a B cell–attracting chemokine, has been shown previously to occur in PCNSL.28 We also found that RGS13, which regulates germinal center B-cell responsiveness to CXCL13,29 is up-regulated. Another up-regulated gene, MUM1, a marker of the ABC subtype of nDLBCL, has been reported as expressed in more than 90% of PCNSL.5

Other up-regulated genes in our dataset have been implicated previously in cancer biology. TCL1A has been implicated in lymphatic leukemias and lymphomas.30 CRYAB has been found to be up-regulated in cancers and correlated with risk of cancer recurrence.31,32 DKK3 has been implicated in cancer, although its exact role has not been clarified.33 We found that MGST1, one of the glutathione S-transferases, is up-regulated in PCNSL. Glutathione S-transferases have been implicated in lymphomas34 and chemotherapy resistance.35 S-100B has been implicated in neoplasia, especially in melanoma.36,37 TF (transferrin) was the second most up-regulated gene in PCNSL. Transferrin has been shown to act as an autocrine regulator of cellular proliferation.38 The receptor for transferrin is frequently expressed in non-Hodgkin lymphomas.39

We have interpreted gene expression alterations within the context of signaling or regulatory pathways to develop hypotheses of biologic mechanisms in DLBCL. In doing so, it is important to keep in mind that the tumors are heterogeneous. In PCNSL samples, malignant B cells are admixed with infiltrating immune cells and cells from the CNS microenvironment. These non-B cells may express some of the implicated genes or indirectly influence B-cell gene expression. Thus, it remains possible that some cell type other than a B cell is the site of expression of a given gene of interest. It is our opinion that the contribution of surrounding tissue is probably minor but that the presence of non-B cells may in some instances be functionally relevant to some aspects of the expression profiles. Sorting out complexities like these will require extensive follow-up experiments with immunohistochemical procedures and microdissection combined with microarray or quantitative RT-PCR approaches.

We think that our findings have significant biologic relevance for lymphoma research and development of novel treatments. The findings indicate that the CNS microenvironment is of great importance for PCNSL. The ECM and adhesion-related pathways may determine some of the biologic characteristics of PCNSL, such as CNS tropism. Individual genes discovered in our study may have roles in different aspects of the biology of PCNSL. SPP1 and DDR1 may play a role in CNS tropism of PCNSL. CXCL13 and SPP1 are probably relevant to the B-cell migration involved in the pathogenesis of PCNSL. B-cell proliferation may be associated with increased expression of SPP1, TCL1A, and CHI3L1. Elevated MUM1 expression indicates that PCNSL is of activated B-cell subtype as previously reported. The coordinate up-regulation of SPP1, CHI3L1, and MUM1 is consistent with the known aggressive clinical behavior of PCNSL. SPP1 and CHI3L1 have been associated with aggressive metastatic cancers, suggesting that PCNSL has an aggressive metastatic cancer phenotype. Because our approach contrasted PCNSL with a wide spectrum of non-CNS DLBCL on a genomic scale with in-depth bioinformatic analysis, the gene expression signature identified in our study may represent a true “CNS signature.”

Table S1

Supplementary PDF file available online.

Table S2

Supplementary PDF file available online.

Table S3

Supplementary PDF file available online.


Contribution: H.W.T. designed the study, performed the statistical analysis and interpretation, and wrote the manuscript. D.P. and K.A.B. performed bench work. D.M.M. performed a pathology review. K.A.J. obtained samples. P.K., B.E., and A.C.Z. performed the immunohistochemistry work. B.P.O. obtained samples. W.R.L. and P.J.P. performed statistical analysis and interpretation. M.M. designed the study, performed the statistical analysis and interpretation, and wrote the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Han W. Tun, Department of Hematology and Oncology, Mayo Clinic Jacksonville, 4500 San Pablo Road, Jacksonville, FL 32224; e-mail: Tun.Han{at}; or Michael McKinney, Department of Molecular Pharmacology and Therapeutics, 4500 San Pablo Road, Jacksonville, FL 32224; e-mail: mckinney{at}


The authors thank Kathleen Roberson for expert secretarial help.

This work was supported by the Mayo Foundation (research program, M.M.; internal grant, H.T.), the University of Iowa/Mayo Clinic Lymphoma Specialized Programs of Research Excellence (SPORE; P50 CA97274), the Mayo SPORE in Brain Cancer (P50 CA108961), and the Immunochemistry Core at the Mayo Clinic Jacksonville (MCJ) Cancer Center, a National Cancer Institute-designated Comprehensive Cancer Center (P30 CA15083).

P30 CA15083, National Institutes of Health


  • The online version of this article contains a data supplement.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted October 19, 2007.
  • Accepted December 31, 2007.

