Genome-wide copy number profiling reveals molecular evolution from diagnosis to relapse in childhood acute lymphoblastic leukemia

Jun J. Yang, Deepa Bhojwani, Wenjian Yang, Xiangjun Cai, Gabriele Stocco, Kristine Crews, Jinhua Wang, Debra Morrison, Meenakshi Devidas, Stephen P. Hunger, Cheryl L. Willman, Elizabeth A. Raetz, Ching-hon Pui, William E. Evans, Mary V. Relling and William L. Carroll


The underlying pathways that lead to relapse in childhood acute lymphoblastic leukemia (ALL) are unknown. To comprehensively characterize the molecular evolution of relapsed childhood B-precursor ALL, we used human 500K single-nucleotide polymorphism arrays to identify somatic copy number alterations (CNAs) in 20 diagnosis/relapse pairs relative to germ line. We identified 758 CNAs, 66.4% of which were less than 1 Mb, and deletions outnumbered amplifications by approximately 3.5:1. Although CNAs persisting from diagnosis to relapse were observed in all 20 cases, 17 patients exhibited differential CNA patterns from diagnosis to relapse. Of the 396 CNAs observed in 20 relapse samples, only 69 (17.4%) were novel (absent in the matched diagnosis samples). EBF1 and IKZF1 deletions were particularly frequent in this relapsed ALL cohort (25.0% and 35.0%, respectively), suggesting their role in disease recurrence. In addition, we noted concordance in global gene expression and DNA copy number changes (P = 2.2 × 10^-16). Finally, relapse-specific focal deletion of MSH6 and, consequently, reduced gene expression were found in 2 of 20 cases. In an independent cohort of children with ALL, reduced expression of MSH6 was associated with resistance to mercaptopurine and prednisone, thereby providing a plausible mechanism by which this acquired deletion contributes to drug resistance at relapse.


Owing to advances in risk-adapted combination chemotherapy, the great majority of children with acute lymphoblastic leukemia (ALL) will survive beyond 5 years after initial presentation of leukemia.1 However, 15% to 20% of ALL patients will have a relapse, making relapsed ALL the fifth most common childhood cancer in the United States.2 Unlike newly diagnosed ALL, the prognosis of relapsed ALL is relatively poor, even with aggressive retrieval therapies. Five-year event-free survival of relapsed ALL is only 10% to 40%, depending on the length of first remission.3 Consistent with clinical observations, Klumper et al have demonstrated that leukemia cells obtained at relapse were dramatically more resistant to many chemotherapeutic agents compared with cells from the initial tumor of the same person.4 For example, the drug concentration of prednisone needed to kill 50% of leukemia cells (LC50) increased by as much as 357-fold from diagnosis to relapse.

Analyses of antigen receptor gene rearrangements indicate that, in almost all cases, the relapsed population of cells is clonally related to the original diagnostic sample.5 Thus, whereas 50% of relapsed leukemia samples exhibit a novel clonal marker that was absent at diagnosis, 94% of relapse samples share at least one marker with the corresponding initial leukemia. Similar studies comparing genetic lesions between diagnosis and relapse provide evidence for the evolution or selection of the relapse leukemic clone.6,7 Lesions specific to relapse are of particular importance because they constitute strong candidates responsible for drug resistance, which is clinically much more prominent at the time of relapse. Indeed, genetic features characteristic of relapse have been reported previously (eg, more frequent deletions of the CDKN2A locus8,9 and loss of heterozygosity of the glucocorticoid receptor).10 However, a comprehensive picture of the genetic alterations in ALL from diagnosis to relapse, especially those specific to relapse, remains elusive, and the exact genetic underpinnings of the emergence of drug-resistant cells are poorly understood.

Recent breakthroughs in genomic technology now allow for a global analysis of the leukemia cell genome to determine changes in gene dose and subsequent alterations in key biologic pathways that account for the phenotypic changes associated with relapsed disease. To this end, we have used the high-resolution Affymetrix 500K single-nucleotide polymorphism (SNP) arrays to profile genome-wide DNA copy number in 60 samples comprising matched germ line, diagnosis, and relapse samples (triplets) from 20 childhood B-precursor ALL patients who experienced a hematologic relapse.


Copy number and gene expression analysis in matched diagnosis and relapse samples

Twenty pediatric ALL patients with a hematologic relapse after first remission were studied (Tables 1 and S1, available on the Blood website; see the Supplemental Materials link at the top of the online article). Samples were obtained from the Children's Oncology Group (COG) cell bank based on availability, and patients (or parents) had consented for their use in research studies. All patients were treated for a primary B-precursor ALL on COG protocols P9904 (n = 1), P9905 (n = 11), or P9906 (n = 8, Table S1).11 In this series of protocols, patients were assigned to therapy based on National Cancer Institute risk criteria (age and presenting leukocyte count), molecular abnormalities, and presence/absence of extramedullary disease. During induction, National Cancer Institute standard-risk patients received 3 drugs and high-risk patients received 4 drugs. At the end of induction, patients were reclassified as low risk (P9904), standard risk (P9905), and high risk (P9906) and subsequently received risk-directed therapy. Patients with Philadelphia chromosome-positive ALL, patients with hypodiploidy, and infants were not included in these 3 protocols. The biologic studies described in this report were approved by Institutional Review Boards at the New York University Medical Center and St Jude Children's Research Hospital.

Table 1

Somatic copy number alterations at diagnosis and relapse in 20 childhood B-precursor ALL cases

For all 20 patients, bone marrow or peripheral blood samples were obtained at initial diagnosis (diagnosis sample), at remission (germ line sample), and at relapse (relapse sample). DNA from these samples was applied to Affymetrix GeneChip Human Mapping 250K Nsp array (Affymetrix, Santa Clara, CA) and 250K Sty array as per the manufacturer's recommendation. Raw signal intensities were first summarized by CNAT4.0 using an independent set of germ line DNA arrays (250 ALL patients enrolled on COG 9906) as reference. Signal intensities were then normalized to median intensity values consistent across chips. Segmented DNA copy numbers were inferred using DNAcopy, a circular binary segmentation algorithm developed on the R platform, and normalized to a median of 2 copies.12 Somatic copy number alterations (CNAs) in the diagnosis and relapse sample were determined by comparison with the matched germ line sample, and a selected number of CNAs were confirmed by genomic real-time polymerase chain reaction (PCR) with details described in Document S1.
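The final two steps described above (normalizing to a median of 2 copies and calling somatic CNAs against the matched germ line sample) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used CNAT4.0 and the R package DNAcopy; the function names, the toy values, and the 0.3-copy tolerance are assumptions made purely for illustration.

```python
from statistics import median

def normalize_to_diploid(copy_numbers):
    """Rescale copy number estimates so that the sample median equals 2 copies."""
    m = median(copy_numbers)
    return [2.0 * x / m for x in copy_numbers]

def call_somatic(tumor, germline, tolerance=0.3):
    """Label each segment 'loss', 'gain', or 'none' relative to germ line.

    `tolerance` (in copies) is a hypothetical cutoff, not a value from the study.
    """
    calls = []
    for t, g in zip(tumor, germline):
        diff = t - g
        if diff <= -tolerance:
            calls.append("loss")
        elif diff >= tolerance:
            calls.append("gain")
        else:
            calls.append("none")
    return calls

# Toy segment-level copy numbers for one patient (invented for illustration).
# The third segment is a one-copy state already present in germ line,
# so it is inherited rather than somatic.
germline = normalize_to_diploid([2.0, 2.0, 1.0, 2.0, 2.0])
diagnosis = normalize_to_diploid([2.0, 1.0, 1.0, 3.0, 2.0])
somatic_calls = call_somatic(diagnosis, germline)
print(somatic_calls)
```

Comparing against the patient's own germ line, rather than against a fixed value of 2 copies, is what keeps inherited copy number variants out of the somatic call set in this sketch.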

Of 20 patients, 17 had RNA extracted from the diagnosis and relapse samples. Amplified cRNA was labeled and hybridized to Affymetrix U133A microarray. Data analysis was performed as previously described, and gene expression profiles for 13 patients have been published earlier.13 Expression of MSH6 was verified by transcript real-time PCR.

Detailed description of sample preparation, copy number estimation and quality control, real-time PCR verification of copy number change and gene expression, and all statistical analyses are given in Document S1.

In vitro drug sensitivity in primary ALL samples

Cytotoxicity was determined in diagnostic ALL blasts for prednisone and mercaptopurine in 152 patients enrolled on St Jude Children's Research Hospital Total Therapy Protocol XV.14 LC50 (lethal concentration to 50% of leukemia cells) was determined using the 4-day in vitro 3-[4,5-dimethylthiazole-2-yl]-2,5-diphenyl tetrazolium bromide drug resistance assay, as previously described.15 ALL cells isolated from bone marrow or peripheral blood samples of a subset of these patients were also analyzed for gene expression on the Affymetrix U133A array. The association between MSH6 expression (probe set 211450_s_at) and drug sensitivity was statistically assessed as detailed in Document S1. The actual number of patients evaluable in each analysis is described in “Results.”


Recurring copy number alterations in leukemic blasts

The genome-wide copy number analysis revealed a total of 758 somatic genetic lesions in 20 pairs of diagnosis and relapse leukemia samples (Figure 1; Table 1). The number of genetic lesions varied significantly among patients, ranging from 3 to 84 per sample. These CNAs included gross copy number changes consistent with conventional cytogenetic analysis but were mostly cryptic. Thus, the median size of CNAs identified in this study was 353 kb, with 22.7% less than 100 kb, and 66.4% less than 1 Mb. The median copy number loss per sample was 9 at diagnosis and 9.5 at relapse. Copy number gains were less common (P < .001), with a median of only 3.5 amplification events per sample at diagnosis and 4 at relapse. Across patients, there was a slight increase of CNAs at relapse (P = .035).

Figure 1

Overview of somatic copy number changes in 20 childhood ALL patients. (A) Comparison of copy number changes (relative to germ line) in the matched diagnosis and relapse samples. For each sample (column), copy number changes are indicated by color (blue indicates loss; red, gain) from chromosome 1 to X. Each row represents a segment of the genome. Sample type is indicated by the color bars at the top of each column (orange indicates diagnosis; green, relapse), and numbers denote patient ID. (B) Frequency (number of the affected cases) of somatic copy number gains (red, left panel) and copy number losses (blue, right panel). CNAs in the diagnosis samples are indicated above the baseline (dark red and dark blue), whereas CNAs in the relapse samples are below the baseline (bright red and bright blue). CNAs are mapped according to their chromosomal position, from chromosome 1 to X.

Although all 44 autosome arms showed one or more CNAs, a number of regions appeared to be affected more frequently (Figure 1B). The most common CNA events were deletions at 9p21.3, occurring in 12 of 20 (60.0%) cases and persisting from diagnosis to relapse (Figure 2A). Of these 12 cases, 11 exhibited deletion of both CDKN2A and CDKN2B, whereas one patient had deletion of the former only. In line with prior reports,8,9 we also observed that patients with deletion of CDKN2A at diagnosis experienced a significantly shorter first remission than patients with wild-type CDKN2A (P = .008, Figure S1). Of interest, patient 13 exhibited a homozygous deletion of CDKN2A at diagnosis but only hemizygous deletion at relapse (confirmed by genomic real-time PCR, Table S3). However, this patient's diagnosis and relapse samples shared 9 CNAs, indicating common clonality. CNAs involving several transcription regulators essential in early lymphoid specification and B lineage commitment16 (PAX5, EBF1, and IKZF1) were also common in this relapsed ALL cohort. Of 20 patients, deletion of PAX5 at 9p13.2 was observed in 7 (35.0%) cases at both diagnosis and relapse, 2 of which included flanking genes and 5 of which involved only the PAX5 gene (Figure 2B). Somewhat surprisingly, patient 5 exhibited PAX5 deletion only at diagnosis but not at relapse (Table S3). As with patient 13 (for CDKN2A), this patient's diagnosis and relapse samples also shared 16 CNAs, indicating common clonality. Copy number losses of IKZF1 (7p12.2) were present at both diagnosis and relapse in 5 patients, but 2 additional patients developed IKZF1 deletion at relapse (Figure 2C; Table S3). Similarly, focal deletion of EBF1 (5q33.3) was shared at diagnosis and relapse in 3 cases and present only at relapse in 2 additional cases (Figure 2D; Table S3).

Figure 2

Somatic copy number alterations of CDKN2A/B, PAX5, IKZF1, and EBF1. Copy number heatmaps at 9p21.3 (A), 9p13.2 (B), 7p12.2 (C), and 5q33.3 (D) are shown by patient (orange indicates diagnosis; green, relapse; from left to right: patients 1 to 20).

Comparison of genomic alterations between diagnosis and relapse

A systematic enumeration of CNA events in the matched diagnosis and relapse samples revealed features that are common and those that differ at these 2 time points. As summarized in Table 1, of 74 copy number gains observed in 20 diagnosis samples, 71 (95.9%) persisted in the relapse leukemia cells from the same person. Likewise, 256 of 288 (88.9%) copy number loss events at diagnosis remained at relapse. Conversely, 24 novel amplifications and 45 novel deletions arose at relapse, accounting for 25.3% and 14.9% of total copy number gains and losses at relapse, respectively. Together, all 20 cases exhibited genetic lesions persisting from diagnosis to relapse (median, 22 shared CNAs per patient), whereas 17 of 20 patients (85.0%) either lost or acquired genetic lesions from diagnosis to relapse (median, 3 diagnosis-specific or relapse-specific CNAs per patient). It should also be noted that the majority of the diagnosis- or relapse-specific CNAs were focal, with a median size of 537 kb. However, an early relapse case (patient 19) exhibited novel large-scale copy number gains (trisomy of 5, 8, 10, 17, and 21) at the time of relapse, whereas his primary leukemia was karyotypically normal.
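The proportions in this paragraph follow from the counts of persisting and novel CNAs. As an arithmetic check, the short Python sketch below recomputes them from the counts quoted above; the variable names and the helper `pct` are ours and are not part of the study.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts quoted in the paragraph above (Table 1).
gains_dx, gains_shared, gains_novel = 74, 71, 24
losses_dx, losses_shared, losses_novel = 288, 256, 45

# CNAs present at relapse are the persisting ones plus the novel ones.
gains_rel = gains_shared + gains_novel
losses_rel = losses_shared + losses_novel

summary = {
    "gains_persisting_pct": pct(gains_shared, gains_dx),
    "losses_persisting_pct": pct(losses_shared, losses_dx),
    "novel_gain_pct": pct(gains_novel, gains_rel),
    "novel_loss_pct": pct(losses_novel, losses_rel),
    "relapse_cnas": gains_rel + losses_rel,
    "total_cnas": gains_dx + losses_dx + gains_rel + losses_rel,
    "loss_to_gain_ratio": round(
        (losses_dx + losses_rel) / (gains_dx + gains_rel), 1
    ),
}

for key, value in summary.items():
    print(key, value)
```

The totals reproduce the 396 relapse CNAs and the 758 CNAs overall. The novel-loss share computes to about 14.95%, which is given as 14.9% in the text, so a difference in the last digit there reflects the rounding convention only.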

To quantitatively assess the degree of genotypic change in leukemia cells from diagnosis to relapse, we performed an unsupervised hierarchical clustering analysis in which all 40 tumor samples were grouped on the basis of CNA pattern. Not surprisingly, in all 20 cases, paired diagnosis and relapse samples from the same person clustered together (Figure 3), consistent with the distinctive nature of individual cases. Further, 19 pairs (except patient 1) clustered into 3 groups with distinct CNA patterns, and the average time to relapse differed significantly among groups (P < .01, Figure 3 bottom panel). Thus, most members of group 1 exhibited persisting deletions on 9p and had shortest first remissions (< 20 months). Cases in group 3 were characterized by trisomy of chromosome 21, and all experienced a relatively prolonged first remission (> 20 months in all cases). The same clustering pattern was also observed when only the diagnosis samples were included in the analysis (Figure S2), suggesting that certain somatic CNA lesions present at diagnosis might be predictive of whether a patient would relapse early vs late. Common clonal origin of the recurrent leukemia was also suggested by matched copy number patterns between diagnosis and relapse at 6 TCR/Ig loci (Figure S3).
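The observation that each diagnosis sample groups with its own relapse sample can be illustrated with a much simpler check than full hierarchical clustering: for every sample, find the closest other sample by CNA pattern. The Python sketch below is a deliberate simplification under stated assumptions. It uses a Hamming distance over per-segment states, whereas the distance metric and linkage actually used in the study are not given here, and the sample names and profiles are invented.

```python
def hamming(a, b):
    """Number of genomic segments whose copy number state differs."""
    return sum(1 for x, y in zip(a, b) if x != y)

def nearest_neighbor(profiles):
    """Map each sample name to the name of its closest other sample."""
    nearest = {}
    for name, vec in profiles.items():
        candidates = [
            (hamming(vec, other_vec), other)
            for other, other_vec in profiles.items()
            if other != name
        ]
        nearest[name] = min(candidates)[1]
    return nearest

# Toy CNA state vectors (-1 loss, 0 neutral, +1 gain); values are invented.
profiles = {
    "P1_diagnosis": [-1, 0, 0, 1, 0, 0],
    "P1_relapse":   [-1, 0, 0, 1, -1, 0],
    "P2_diagnosis": [0, 1, 1, 0, 0, -1],
    "P2_relapse":   [0, 1, 1, 0, 0, 0],
}
pairs = nearest_neighbor(profiles)
print(pairs)
```

In this toy data each pair differs at a single segment (one relapse-specific loss, one diagnosis-specific loss), so the matched sample is always the nearest neighbor, mirroring the pairing seen in the clustering.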

Figure 3

Unsupervised hierarchical clustering of diagnosis and relapse samples based on CNA pattern. In all 20 cases, paired diagnosis and relapse samples from the same person clustered next to each other. Nineteen cases clustered into 3 groups, and the average time to relapse (in months) differed significantly. Numbers at the bottom of the heatmap indicate patient ID (orange indicates diagnosis; green, relapse).

In addition to the DNA level analyses, we also profiled gene expression in 17 of these 20 matched pairs to investigate whether copy number changes from diagnosis to relapse are reflected by concordant alterations in gene expression (Figure S4). There was a significant correlation between the change in DNA copy number and the change in gene expression from diagnosis to relapse (P = 2.2 × 10^-16). In addition, across patients, correlations in gene expression (∼14 500 genes) were observed between diagnosis and relapse (r^2 ranged from 0.18 to 0.73). Consistent with our earlier observations,13 correlation coefficients by gene expression decreased gradually from early relapse cases to late relapse cases, exhibiting an inverse correlation with time to relapse (P = .029, Figure S5).
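The concordance analysis compares, gene by gene, the change in copy number with the change in expression between diagnosis and relapse. A minimal Python sketch of that idea follows. It assumes a Pearson correlation on per-gene differences, which is our choice for illustration (the statistical method used in the study is described in Document S1), and all values are invented.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy per-gene values for one patient (invented for illustration).
copy_number_dx = [2, 2, 2, 2, 2]
copy_number_rel = [1, 2, 3, 2, 1]
log2_expr_dx = [8.0, 7.5, 6.0, 9.0, 7.0]
log2_expr_rel = [7.1, 7.6, 6.8, 8.9, 6.2]

# Change from diagnosis to relapse for each gene.
delta_cn = [r - d for d, r in zip(copy_number_dx, copy_number_rel)]
delta_expr = [r - d for d, r in zip(log2_expr_dx, log2_expr_rel)]

r = pearson_r(delta_cn, delta_expr)
print(round(r, 3))
```

Working with differences rather than absolute values removes each gene's baseline expression level, so the correlation reflects only what changed between the two time points.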

MSH6 deletion at relapse is associated with drug resistance

Genetic lesions acquired at relapse are more likely to account for resistance to retrieval therapy as clinical drug resistance is generally more prominent after disease recurrence. We therefore focused on relapse-specific CNAs that were both focal and recurrent (occurred in more than one patient). One CNA that met such criteria involved a region of chromosome 2p16.3, which was hemizygously deleted in a relapse-specific manner in 2 of 20 patients (Figure 4A). Both deletions were focal (affecting 2 and 4 genes, respectively) and overlapped at the MSH6 gene, whereas one even extended upstream into the MSH2 gene. Real-time PCR of MSH6 in both DNA and RNA derived from these 2 patients confirmed SNP array-inferred copy number loss and consequent reduction of the MSH6 expression (Figure 4B). MSH6 is a critical component of the cellular mismatch repair machinery and is implicated in tumorigenesis17 and resistance to thiopurines and DNA alkylating agents.18 To explore the relevance of the relapse-specific MSH6 deletion to drug resistance, we examined the relationship between MSH6 gene expression and mercaptopurine sensitivity in an independent cohort of children with ALL (Figure 4C). MSH6 expression was significantly lower in leukemic blasts resistant to mercaptopurine than in sensitive cells (n = 66, P = .006). In addition, prednisone resistance was inversely related to MSH6 expression (Figure 4D; n = 51, P = .028). This significant association between MSH6 expression and glucocorticoid sensitivity was replicated in a previously published dataset,19 including 176 childhood ALL samples (P = .008, Figure S8).

Figure 4

MSH6 and drug resistance. (A) DNA copy number at 2p16.3, as inferred by SNP array. Each column represents a sample (orange indicates diagnosis; green, relapse; from left to right: patients 1-20). (B) Relative DNA copy number loss and reduced RNA expression of MSH6 in the 2 diagnosis/relapse pairs were confirmed by real-time PCR. (C) MSH6 gene expression and mercaptopurine sensitivity in primary ALL samples (n = 66). Samples with LC50 more than or less than 2 mM were considered as resistant (n = 11) or sensitive (n = 55), respectively. (D) MSH6 gene expression and prednisone sensitivity in primary ALL samples (n = 51). Samples with LC50 more than or less than 4 μM were considered as resistant (n = 17) or sensitive (n = 34), respectively. Boxes include data between the 25th and 75th percentiles, and whiskers indicate the minimal and maximal values excluding the outliers.
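The comparison in panels C and D dichotomizes samples by an LC50 threshold and then compares MSH6 expression between the resistant and sensitive groups. The Python sketch below illustrates that two-step logic. The Mann-Whitney U statistic is our assumption for the group comparison (the test actually used is detailed in Document S1), and the sample values, units, and threshold are invented for illustration.

```python
from statistics import median

def split_by_lc50(samples, threshold):
    """Split (lc50, expression) pairs into resistant and sensitive expression lists."""
    resistant = [expr for lc50, expr in samples if lc50 > threshold]
    sensitive = [expr for lc50, expr in samples if lc50 < threshold]
    return resistant, sensitive

def mann_whitney_u(a, b):
    """U statistic for group a: pairs where a exceeds b, with ties counted as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Toy (LC50, log2 expression) pairs; values and units are hypothetical.
samples = [(5.0, 6.1), (3.0, 6.4), (1.0, 7.2), (0.5, 7.0), (0.8, 7.5), (1.5, 6.9)]
resistant, sensitive = split_by_lc50(samples, threshold=2.0)

u_resistant = mann_whitney_u(resistant, sensitive)
print(median(resistant), median(sensitive), u_resistant)
```

A U statistic of 0 for the resistant group means that no resistant sample had higher expression than any sensitive sample, which is the most extreme form of the pattern reported in the figure.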


Relapsed ALL remains a formidable challenge in pediatric oncology. The etiology of de novo and/or acquired drug resistance at relapse is largely unknown, and further intensification of therapy is not likely to improve outcomes. Matched samples collected at multiple time points (eg, diagnosis, remission, and relapse) offer unique opportunities for characterizing pathways leading to disease recurrence and drug resistance. Prior studies of loss of heterozygosity using microsatellite markers20 or low-density arrays10 (10K) report similarity as well as differences between diagnosis and relapse. Whereas both studies confirmed deletions at 9p as the most common genetic lesions in ALL at both time points, the extent of concordance between diagnosis and relapse was probably underestimated primarily because of the low resolution and inability of these allelotype analyses to discriminate copy number gains and uniparental disomy. Although these studies also reported genetic lesions that were specific to relapse, such as loss of heterozygosity of the glucocorticoid receptor, none was observed in more than one case.

The acquisition of resistance-conferring mutations induced by initial treatment may be responsible for the relative drug resistance noted at relapse. Alternatively, the relapse population may be derived from a minor population already present at diagnosis and arise as a result of selection pressures imposed by treatment. Several recent studies report compelling evidence in favor of the latter hypothesis. For instance, TCR/Ig rearrangement patterns at multiple time points during and after therapy reveal that “relapse clones” can be present at diagnosis but usually at a very low level.7 During initial therapy, this minor population exhibits only moderate reduction relative to the bulk of diagnostic leukemic cells but rapidly expands before clinical relapse. Consistent with this notion, we observed that patient 13 had a homozygous deletion of CDKN2A at diagnosis but a hemizygous deletion at relapse, yet he also exhibited multiple CNAs shared at diagnosis and relapse. Because reacquisition of CDKN2A from a homozygous deletion is doubtful, we speculate that 2 leukemic subclones (distinguished by deletion status at CDKN2A locus) were present at diagnosis, and both were derived from a common tumor-initiating cell (characterized by the shared CNAs).

Overall, we found a high concordance between genomic lesions at the DNA level and mRNA expression levels. It is also noteworthy that the number of genes demonstrating copy number changes from diagnosis to relapse appeared to be small compared with those observed in genome-wide gene expression studies.13,21 This is not entirely unanticipated because gene regulation at the transcription level is much more diverse than that on the DNA level. For instance, genes exhibiting normal diploidy can still be up-regulated/down-regulated on the mRNA and protein levels. However, genetic lesions detected at the DNA level are conceivably more stable than changes at the mRNA level.

Somatically acquired deletions at CDKN2A/B, PAX5, EBF1, and IKZF1 loci have been previously reported in pediatric ALL, with prevalence varying among molecular subtypes.8,9,22-24 However, the frequencies of these lesions (except PAX5) appeared to be higher in the relapsed cases analyzed here relative to newly diagnosed B-precursor ALL22: CDKN2A, 60.0% vs 33.9% (P = .038); EBF1, 25.0% vs 4.2% (P = .001); and IKZF1, 35.0% vs 8.9% (P = .002). We also identified cases where deletions of EBF1 and IKZF1 were seen exclusively at relapse, possibly indicating a role in disease recurrence. Correlations of these particular genomic lesions at diagnosis to risk of relapse have not been investigated so far. Our cohort consisted of patients who eventually relapsed, but it would be informative to study the relative frequency of these CNAs in patients who relapse vs those who do not. In addition, we did not observe an association of mRNA expression of these genes (CDKN2A/B, PAX5, IKZF1) with drug sensitivity in an independent cohort of ALL samples, with the exception of CDKN2A/B, which exhibited an inverse correlation between expression and prednisone IC50 (P = .022, data not shown). Although the number of cases is small and the majority of cases did not exhibit known molecular abnormalities, these findings nonetheless raise the question as to whether deletion of key regulators of early B-cell development, such as IKZF1 and EBF1, carries any potential prognostic value at diagnosis.
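Frequency comparisons of this kind reduce to a test on a 2x2 table of deleted vs non-deleted cases in two cohorts. The Python sketch below implements a two-sided Fisher exact test as one way to do this. It is an illustration under loud assumptions: the text does not state which test produced the reported P values, and it does not give the size of the comparison cohort, so the comparator counts used here (8 of 192) are hypothetical numbers chosen only because they correspond to 4.2%. Only the 5 of 20 relapsed cases with EBF1 deletion come from this study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x deleted cases in the first cohort.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# EBF1: 5 of 20 relapsed cases deleted (this study);
# 8 of 192 comparator cases deleted (HYPOTHETICAL counts, see lead-in).
p_value = fisher_exact_two_sided(5, 15, 8, 184)
print(p_value)
```

Because the comparator counts and the choice of test are assumptions, the printed value is not expected to reproduce the P value reported in the text; the sketch only shows how such a comparison can be set up.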

Other than more frequent deletions of the CDKN2A locus at relapse,9,10 attempts to characterize genetic alterations that may be responsible for the emergence of drug-resistant cells have failed to identify additional common abnormalities. This can be attributed to incomplete coverage of the genome in low-resolution studies but may also reflect the extreme diversity in the development of drug resistance. In this study, however, we have described an aberration in the mismatch repair (MMR) system, that is, loss of MSH6, as a recurrent relapse-specific alteration. MSH6, a component of the MutSα complex, is required for recognition of the anomalous DNA structure arising from incorporation of thiopurines.25 Indeed, MSH6 was the most frequently mutated gene identified in a recessive genetic screen for determinants of 6-thioguanine resistance.18 This is clinically relevant because most childhood ALL treatment regimens include prolonged use of thiopurines26 (eg, 6-mercaptopurine or 6-thioguanine), which may lead to the selection of leukemic cells with defective MMR (eg, relapse-specific deletion of MSH6). Additional evidence also exists that DNA repair ability is markedly reduced at relapse relative to leukemic cells at diagnosis.27,28 Somewhat unexpectedly, we observed a significant correlation between MSH6 expression and glucocorticoid sensitivity. It can be argued that cells with defective MMR probably accumulate additional mutations that may lead to resistance to glucocorticoids, despite MMR itself not being directly involved in glucocorticoid-induced cell death.29

In conclusion, we have described recurrent as well as unique molecular alterations that evolve from diagnosis to relapse in childhood ALL. These changes provide biologic insights into the development of relapsed disease and subsequent resistance to retrieval therapy. The data generated from this study combining genome-wide copy number alterations and transcript expression in paired diagnosis and relapse samples provide a unique opportunity to dissect pathways and identify potential therapeutic strategies for relapsed childhood ALL.

Supplemental materials (PDF files available online): Figure S1, Figure S2, Figure S3, Figure S4, Figure S5, Figure S8, Table S1, Table S3, and Document S1.


Contribution: J.J.Y. and D.B. performed research, analyzed data, and wrote the paper; W.Y. and J.W. analyzed data; X.C. and D.M. performed research; G.S. and K.C. performed research and analyzed data; M.D. identified specimens and provided clinical data; S.P.H. designed research and identified specimens; C.L.W. and E.A.R. designed research; C.-h.P. provided validation data; W.E.E. designed research and provided validation data; M.V.R. and W.L.C. were the principal investigators, designed research, and wrote the paper.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: William L. Carroll, New York University Cancer Institute, Stephen D. Hassenfeld Children's Center, 160 East 32nd Street, 2nd Floor, New York, NY 10016; e-mail: William.carroll{at}


This work was supported by the National Cancer Institute (NCI; SPEC U01 CA114762), the Penelope London Foundation, the Friedman Fund for Childhood Leukemia, the Walter Family Pediatric Leukemia Fund, the Pediatric Cancer Foundation (CA093552-02, NCI CA 51 001, CA 78 224, CA21765), and the National Institutes of Health (NIH)/National Institute of General Medical Sciences Pharmacogenetics Research Network and Database (U01 GM61393, U01GM61374) from the NIH, American Lebanese Syrian Associated Charities, and CureSearch.



  • *J.J.Y. and D.B. contributed equally to this work.

  • The online version of this article contains a data supplement.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted June 23, 2008.
  • Accepted August 10, 2008.

