Genome-wide association study identifies germline polymorphisms associated with relapse of childhood acute lymphoblastic leukemia

Jun J. Yang, Cheng Cheng, Meenakshi Devidas, Xueyuan Cao, Dario Campana, Wenjian Yang, Yiping Fan, Geoff Neale, Nancy Cox, Paul Scheet, Michael J. Borowitz, Naomi J. Winick, Paul L. Martin, W. Paul Bowman, Bruce Camitta, Gregory H. Reaman, William L. Carroll, Cheryl L. Willman, Stephen P. Hunger, William E. Evans, Ching-Hon Pui, Mignon Loh and Mary V. Relling


With the use of risk-directed therapy for childhood acute lymphoblastic leukemia (ALL), outcome has improved dramatically in the past 40 years. However, a substantial portion of patients, many of whom have no known risk factors, experience relapse. Taking a genome-wide approach, in the present study, we evaluated the relationships between genotypes at 444 044 single nucleotide polymorphisms (SNPs) and the risk of relapse in 2535 children with newly diagnosed ALL after adjusting for genetic ancestry and treatment regimen. We identified 134 SNPs that were reproducibly associated with ALL relapse. Of 134 relapse SNPs, 133 remained prognostic after adjusting for all known relapse risk factors, including minimal residual disease, and 111 were significant even among patients who were negative for minimal residual disease after remission induction therapy. The C allele at rs7142143 in the PYGL gene was associated with 3.6-fold higher risk of relapse than the T allele (P = 6.7 × 10⁻⁹). Fourteen of the 134 relapse SNPs, including variants in PDE4B and ABCB1, were also associated with antileukemic drug pharmacokinetics and/or pharmacodynamics. In the present study, we systematically identified host genetic variations related to treatment outcome of childhood ALL, most of which were prognostic independent of known risk factors for relapse, and some of which also influenced outcome by affecting host disposition of antileukemic drugs. The trials are registered as follows: COG P9904, NCT00005585; COG P9905, NCT00005596; COG P9906, NCT00005603; St Jude Total XIIIB, NCI-T93-0101D; and St Jude Total XV, NCT00137111.


In childhood acute lymphoblastic leukemia (ALL), risk-adapted combination therapy has led to dramatic improvements in outcome, with 5-year survival rates of more than 85% in most industrialized countries.1 However, ALL relapse still carries a very poor prognosis and the majority of relapses continue to occur in patients without apparent “high-risk” features.2-4 Therefore, more accurate risk classification of newly diagnosed disease is needed to reduce relapses and improve the overall outcome of ALL.

Prior efforts to improve risk stratification primarily focused on genetic variations of the tumor or on assessment of early antileukemic response; for example, the upfront prednisone response5 and minimal residual disease (MRD) after remission induction.6 The role of inherited genetic variations on treatment response in children with ALL has received increasing attention.7 For example, candidate gene studies have suggested that genetic polymorphisms in metabolizing enzymes (GSTM1) and transporters (ABCC4) of antileukemic agents affect treatment outcome.8,9 Recent studies have also identified associations between sequence variations in cytokine and chemokine genes and early treatment response,7,10 indicating tumor-microenvironment interaction as one plausible mechanism by which host genetic variations affect ALL outcome.11

Prognostic features of ALL are strongly dependent on treatment, and the lack of “replication” of pharmacogenetic findings in chemotherapy trials (and particularly outside of the context of clinical trials) may be attributed to modest differences in therapy among trials. To maximize the opportunity of identifying genotype-phenotype associations generalizable to diverse treatment regimens, in the present study, we applied an iterative “discovery versus replication” genomic screen to a core panel of frontline ALL trials in the Children's Oncology Group (COG) and at St Jude Children's Research Hospital. Our goal was to agnostically test single nucleotide polymorphisms (SNPs) throughout the genome for their association with the risk of relapse in a large group of children with ALL (N = 2535) from diverse racial/ethnic backgrounds.


Patients and treatment

Germline DNA samples were collected at remission in children with newly diagnosed ALL treated on St Jude Children's Research Hospital Total Therapy XIIIB12 and XV13 or the COG P990614 or COG P9904/P9905 studies14 (Table 1). The studies were approved by the institutional review boards and informed consent was obtained from the parents, the guardians, or the patients in accordance with the Declaration of Helsinki. ALL treatment regimens were as described for the St Jude12,13 and COG14 protocols.

Table 1

Patient characteristics

Genotyping and genetic ancestry

Genotyping and quality control were performed as described previously. Briefly, germline DNA was genotyped using either the Affymetrix GeneChip human Mapping 500K sets or SNP6.0 array. Genotyping calls were determined using the BRLMM15 or the Birdseed algorithms16 for the 500K array and the SNP6.0 array, respectively. Genotypes were coded assuming an additive genetic model (based on the number of B alleles). Only SNPs common to both Affymetrix array platforms were investigated in this study. We also excluded 26 303 SNPs because of low minor allele frequency (< 0.5%) or poor call rate (< 95%), and 136 patients because of poor genotyping (call rate < 95%).
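The quality-control thresholds and additive coding described above can be sketched as follows. This is an illustration only, not the authors' pipeline: the function names and the use of `None` for a missing call are assumptions made for the example.

```python
from typing import Optional, Sequence

# Additive coding: 0, 1, or 2 copies of the B allele; None marks a failed call
# (the None convention is an assumption for this sketch).
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of non-missing calls (usable per SNP or per patient)."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency computed from B-allele counts."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    b_freq = sum(called) / (2 * len(called))
    return min(b_freq, 1.0 - b_freq)


def passes_snp_qc(genotypes: Sequence[Genotype],
                  min_maf: float = 0.005,
                  min_call_rate: float = 0.95) -> bool:
    """Apply the two SNP-level thresholds quoted in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)
```

The same `call_rate` helper could be applied across SNPs for one patient to mirror the patient-level exclusion (call rate < 95%).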

European, African, Asian, and Native-American genetic ancestries were estimated using STRUCTURE,17 with HapMap CEU, YRI, CHB/JPT, and a cohort of Native Americans18 as reference ancestral populations.

GWAS for germline SNP genotypes related to risk of relapse

For this genome-wide association study (GWAS), relapse was defined as disease recurrence in bone marrow and/or extramedullary sites. Lineage switch, second malignancy, and death in remission were incorporated in the analyses as competing events. All genotype-relapse association analyses were stratified by 9 risk-adapted treatment arms: St Jude Total XIIIB low risk,12 St Jude Total XIIIB high risk,12 St Jude Total XV low risk,13 St Jude Total XV standard/high risk,13 COG P9906, and COG P9904/9905 regimens A, B, C, and D.14 Known risk factors (leukocyte count [≥ vs < 50 000/μL], age [≥ vs < 10 years], the presence or absence of molecular ALL subtypes [including MLL rearrangements, ETV6-RUNX1, TCF3-PBX1, or BCR-ABL1], DNA index [≥ vs < 1.16], and MRD [negative, positive, and high positive as defined below]) were included in the multivariate analysis together with SNP genotype.
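To make the role of competing events concrete, the sketch below estimates a cumulative incidence of relapse with the standard nonparametric (Aalen-Johansen type) estimator. It is not the Gray test or the Fine and Gray regression used in the study; it only illustrates how a competing event removes a patient from the risk set without counting as a relapse. The event codes (0 = censored, 1 = relapse, 2 = competing event) are assumptions for the example.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Return [(time, cumulative incidence of `cause`)] at each event time.

    events: 0 = censored, 1 = relapse, 2 = competing event (assumed coding).
    """
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    n_at_risk = len(times)
    event_free = 1.0  # probability of no event of any type so far
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        counts = by_time[t]
        n_cause = counts[cause]
        n_any = sum(c for e, c in counts.items() if e != 0)
        if n_any:
            # Increment uses the event-free probability just before t.
            cif += event_free * n_cause / n_at_risk
            event_free *= 1.0 - n_any / n_at_risk
            curve.append((t, cif))
        # Everyone observed at t (events and censored) leaves the risk set.
        n_at_risk -= sum(counts.values())
    return curve
```

With four patients observed at times 1 to 4 (relapse, competing event, censored, relapse), the estimate rises to 0.25 at time 1, stays flat at the competing event, and reaches 0.75 at time 4.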

To maximize the statistical power for detecting associations between SNP genotype and relapse risk, we performed a GWAS inclusive of all 2535 patients. SNPs were individually tested for association with relapse using a 2-step procedure. First, the Gray test (stratifying on treatment arms) was applied to each SNP and, second, those SNPs achieving P < .05 were subsequently analyzed using the Fine and Gray hazard rate regression model19 including both genetic ancestry (European, African, Asian, and Native American) and treatment arm as covariates, and genotypes were treated as ordinal variables under an additive genetic model. To balance false-positive and false-negative error rates, we examined the profile information criteria20 and set the P value threshold for declaring SNPs significantly related to relapse hazard at 4.4 × 10⁻³. To further prioritize the top relapse SNPs, we applied an iterative resampling procedure to determine whether a SNP was reproducibly associated with relapse (Figure 1). We first split 2535 patients at a 1:1 ratio into a discovery and a replication cohort, balancing on treatment arm, genetic ancestry, CNS disease status, and treatment outcome. SNPs were individually tested in the discovery cohort for association with relapse by the 2-step procedure (Gray test followed by the Fine and Gray hazard regression model) described above. Significant SNPs (at P < 4.4 × 10⁻³) from the discovery cohort GWAS were tested for replication. In the replication cohort, a SNP was considered validated if its genotype was associated with relapse at the 0.05 significance level by both the Gray test and the hazard rate regression model. This discovery-replication procedure was repeated 100 times (Figure 1), with each iteration generating a list of replicated SNPs. Those SNPs that were replicated at least 10 times exceeded the frequency that would be expected by chance (P = .028 by the binomial model [100, 0.05]) and were designated as “relapse SNPs.” These SNPs were selected for further analyses with additional phenotypes.
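The chance threshold quoted above can be checked directly: under a binomial model with 100 trials and success probability 0.05, the probability of 10 or more successes is about .028. The short sketch below reproduces that arithmetic; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability of replicating in at least 10 of 100 rounds by chance alone,
# treating each round as an independent trial with a 0.05 success rate.
p_chance = binomial_upper_tail(100, 0.05, 10)
print(round(p_chance, 3))  # 0.028
```

Note that the resampled splits share patients, so treating the 100 rounds as independent trials is a simplifying assumption of the binomial model itself.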

Association of relapse SNPs with additional prognostic phenotypes

Clinical presenting features.

Patient characteristics at diagnosis have been associated with relapse risk. In the present study, we assessed whether genotypes associated with relapse were also related to these prognostic patient characteristics. For each of 134 relapse SNPs, we tested the association between genotype and presenting leukocyte count ≥ versus < 50 000/μL, age ≥ versus < 10 years, and ALL blast DNA index ≥ versus < 1.16 by Fisher test. The association between SNP genotypes and molecular ALL subtypes (MLL rearrangements, ETV6-RUNX1, TCF3-PBX1, or BCR-ABL1, or T-ALL) was tested using the χ² test.


Of 2535 patients included in the GWAS of relapse, MRD status was available at the end of remission induction therapy (day 28 in the COG studies and day 46 in the St Jude studies) in 2289 children. MRD was measured by flow cytometry and classified as negative (< 0.01%), positive (≥ 0.01% but < 1%), or high-positive (≥ 1%) for St Jude patients. In COG patients, the MRD classification was nearly identical: negative (≤ 0.01%), positive (> 0.01% but ≤ 1%), or high-positive (> 1%). The Spearman rank correlation test was used to determine the association between genotypes and MRD, both of which were treated as ordinal variables.
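The two MRD classifications differ only in how values exactly at the 0.01% and 1% boundaries are assigned. A minimal sketch of the ordinal coding, assuming MRD is given as a percentage and using hypothetical group labels `"stjude"` and `"cog"`:

```python
# Ordinal codes used for rank-based analysis (the numeric codes are an assumption).
MRD_ORDINAL = {"negative": 0, "positive": 1, "high-positive": 2}


def classify_mrd(percent: float, group: str) -> str:
    """Classify end-of-induction MRD (%) using the thresholds stated in the text."""
    if group == "stjude":
        # negative < 0.01%; positive >= 0.01% but < 1%; high-positive >= 1%
        if percent < 0.01:
            return "negative"
        if percent < 1.0:
            return "positive"
        return "high-positive"
    if group == "cog":
        # negative <= 0.01%; positive > 0.01% but <= 1%; high-positive > 1%
        if percent <= 0.01:
            return "negative"
        if percent <= 1.0:
            return "positive"
        return "high-positive"
    raise ValueError(f"unknown group: {group!r}")
```

For example, a value of exactly 0.01% is classified as positive under the St Jude definition and negative under the COG definition.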

Antileukemic drug disposition.

SNPs that have been previously associated with 4 pharmacokinetic and pharmacodynamic phenotypes were analyzed for their overlap with relapse SNPs in the present analyses. These SNPs were identified in GWASs from a subset of patients enrolled on St Jude Total Therapy XIIIB and XV protocols. Methotrexate plasma clearance was assessed in 699 children with ALL,21 and the GWAS adjusted for ancestry, dosage schedule, and sex. Those SNPs with P < .05 for their association with clearance were assessed for overlap with SNPs associated with relapse. Intracellular methotrexate polyglutamate accumulation in ALL blasts at 44 hours after upfront methotrexate therapy was determined in 144 newly diagnosed patients who received preinduction therapy with IV methotrexate.22 Those SNPs with P < .05 for their association with methotrexate polyglutamates (adjusting for subtype, ancestry, and treatment) were assessed for overlap with relapse SNPs. Dexamethasone apparent oral plasma clearance was determined in 334 patients at week 7 of continuation therapy23; those SNPs with P < .05 for their association with apparent oral clearance (adjusting for age, ancestry, and treatment arm) were assessed for overlap with relapse SNPs. Asparaginase antibody levels were measured in 403 patients using ELISA serially during therapy, and the total cumulative asparaginase Ab area under the curve was estimated23; those SNPs with P < .05 for their association with Ab exposure were assessed for overlap with SNPs associated with relapse. Associations between genotype and all 4 pharmacokinetic phenotypes were evaluated by linear regression tests.

Statistical analysis

Statistical and computational analyses were performed using R Version 2.9.1 and SAS Version 9.2 software.


Identification of genomic loci associated with ALL relapse risk

Using an iterative resampling strategy, we investigated which of the 444 044 germline SNP genotypes were reproducibly associated with relapse across treatment regimens (Figure 1). Dividing the 2535 patients into a discovery and a replication cohort at a 1:1 ratio, we used the discovery cohort to perform a genome-wide screen and then filtered SNPs based on their association in the replication cohort (Figure 1). A total of 134 SNPs were successfully replicated in at least 10 of 100 rounds of discovery-replication tests (the top 25 SNPs are shown in Table 2 and a full listing is provided in supplemental Table 1, available on the Blood Web site; see the Supplemental Materials link at the top of the online article). These 134 SNPs, representing 88 independent genomic loci (pairwise r² < 0.5, supplemental Figure 1), were also significant in a GWAS inclusive of all 2535 patients (P < .004) and thus were prioritized for subsequent analyses. A total of 73 of 134 SNPs were annotated to 43 genes (within 10 kb of a gene), all of which were intronic except for a synonymous coding variant in the ATP8A2 gene (rs6491066).
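One way to picture the grouping of SNPs into independent loci by pairwise r² is sketched below. The text does not describe how the grouping was computed, so this is an assumption throughout: r² is approximated as the squared Pearson correlation of additive genotype dosages, and loci are formed greedily.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two additive genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)


def count_independent_loci(snps, threshold=0.5):
    """Greedy count: a SNP starts a new locus if its r2 with every
    existing locus representative is below the threshold."""
    representatives = []
    for dosages in snps.values():
        if all(dosage_r2(dosages, rep) < threshold for rep in representatives):
            representatives.append(dosages)
    return len(representatives)
```

In practice, the result of a greedy pass depends on the order in which SNPs are visited, so a real analysis would typically fix that order (for example, by P value).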

Figure 1

Iterative resampling approach to identify 134 SNPs reproducibly associated with ALL relapse. A total of 2535 children with newly diagnosed ALL were split into a discovery and a replication cohort at a 1:1 ratio with balanced representation of treatment and clinical features. A GWAS was performed on the discovery cohort, and SNPs were then filtered based on replication in the remaining patients (the replication cohort). Resampling was performed for 100 iterations and 134 SNPs were selected as “relapse SNPs” because they were successfully replicated in multiple rounds of resampling. Ip indicates information profiling (see “GWAS for germline SNP genotypes related to risk of relapse”).
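The resampling loop in Figure 1 can be outlined as follows. This is a schematic sketch, not the authors' code: patients are represented as dictionaries, the balancing variables are collapsed into a single stratum key, and `is_significant` is a hypothetical callback standing in for the 2-step test (Gray test followed by Fine and Gray regression).

```python
import random
from collections import defaultdict


def stratified_split(patients, stratum_key, rng):
    """Split patients 1:1 within each stratum so both cohorts stay balanced."""
    by_stratum = defaultdict(list)
    for patient in patients:
        by_stratum[stratum_key(patient)].append(patient)

    discovery, replication = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        half = len(members) // 2
        discovery.extend(members[:half])
        # In odd-sized strata the extra patient goes to the replication cohort.
        replication.extend(members[half:])
    return discovery, replication


def count_replications(patients, stratum_key, is_significant,
                       n_iter=100, seed=0):
    """Count the rounds in which a SNP passes discovery and then replication.

    is_significant(cohort, alpha) is a hypothetical stand-in for the
    association test; thresholds follow the text (4.4e-3, then 0.05).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        discovery, replication = stratified_split(patients, stratum_key, rng)
        if is_significant(discovery, 4.4e-3) and is_significant(replication, 0.05):
            hits += 1
    return hits
```

A SNP would then be designated a relapse SNP when `count_replications` returns at least 10, matching the rule stated in the Methods.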

Table 2

Top 25 SNPs associated with ALL relapse identified through genome-wide association and multiple rounds of resampling

Across the genome, the strongest association with relapse risk was observed at 14q22.1 in the PYGL gene (rs7142143). Each copy of the C allele at this PYGL intronic SNP (rs7142143) rendered a 3.6-fold increase in the risk of relapse (P = 6.7 × 10⁻⁹, Figure 2) and an association of this SNP with relapse was replicated in 79 of 100 rounds of resampling (Table 2).

Figure 2

Association of genotypes at the PYGL SNP (rs7142143) with the risk of ALL relapse. The cumulative incidence of any relapse was compared for each genotype group at rs7142143 (CC/CT or TT) in all patients (A) and in those patients negative for MRD at the end of remission induction (B). The P value was estimated using the Fine and Gray hazard regression model.

Many GWAS investigations of treatment response neglect to consider the confounding of germline SNP genotypes with known prognostic factors. In the present study, of 134 relapse SNPs, 73 were associated with one or more clinical presenting features that are recognized as prognostic in childhood ALL (P < .05, supplemental Table 2). For example, 32 SNPs had alleles that were related to leukocyte count ≥ 50 000/μL at diagnosis and to relapse risk; genotypes at 19 SNPs were enriched in children older than 10 years and associated with relapse; 16 SNPs were associated with hyperdiploid ALL (DNA index ≥ 1.16) and with a lower risk of relapse. In addition, 61 of 134 SNPs were associated with ALL molecular subtype and/or lineage (MLL rearrangements; ETV6-RUNX1, TCF3-PBX1, or BCR-ABL1; or T-cell ALL). However, 133 of 134 relapse SNPs remained prognostic after adjusting for all known risk factors (presenting features and MRD, supplemental Table 1), indicating their independent associations with relapse risk.

Associations of relapse SNPs with MRD

Early reduction of leukemia burden, as determined by MRD status at the end of remission induction therapy, is highly prognostic for ALL relapse. However, the majority of relapses still occur among patients with negative MRD status.14 Therefore, in the present study, we characterized the relationships between the 134 relapse SNPs and this early treatment response measure and determined the prognostic value of germline polymorphisms. Of 134 relapse SNPs, 133 (99%) remained prognostic after adjusting for MRD (P < .05) and 110 (82%) were significantly associated with relapse even within MRD-negative patients (eg, rs7142143, which is shown in Figure 2B with full listing in supplemental Table 1), strongly suggesting that these genetic variants substantially contribute to interpatient variability in outcome beyond what is explained by MRD. Of 134 relapse SNPs, 34 were significantly associated with MRD, with the alleles related to higher MRD status always linked to higher relapse risk. For example, genotypes at rs10834571 in the LUZP2 gene were most strongly associated with MRD (P = 3 × 10⁻⁴). Although the G allele at this SNP was related to higher MRD at the end of remission induction and a subsequent higher risk of relapse, it also identified patients at risk of relapse even within the MRD-negative patient population (supplemental Figure 2).

Relationships between relapse SNPs and pharmacologic phenotypes

To explore the mechanisms by which SNPs might influence treatment outcome of ALL, we examined the association of the 134 relapse SNPs with 4 pharmacokinetic and pharmacodynamic endophenotypes available in a subset of patients: methotrexate plasma clearance, intracellular accumulation of polyglutamated (active) methotrexate, dexamethasone plasma clearance, and asparaginase antibody levels. Three relapse SNPs were associated with methotrexate plasma clearance after high-dose intravenous methotrexate, and all 3 genotypes were related to low clearance (more drug exposure) and lower risk of relapse. Genotypes at 7 SNPs were associated with polyglutamated methotrexate levels in ALL blasts, 5 of which were associated with reduced levels of methotrexate polyglutamate and a higher relapse risk, including 3 SNPs within the PDE4B gene. Dexamethasone plasma clearance differed significantly by genotype at 4 relapse SNPs, 2 of which were within the ABC transporter gene ABCB1 and both associated with higher dexamethasone clearance and a higher relapse risk (Figure 3). Finally, 5 relapse SNPs were related to higher asparaginase Ab levels, 2 of which were associated with a higher relapse rate. As summarized in Table 3, 14 of 134 relapse SNPs were significant for at least 1 of the 4 pharmacologic phenotypes in a manner consistent with a pharmacokinetically intuitive association with relapse (ie, lower drug exposure translated into a higher risk of relapse).

Figure 3

An example of relapse-associated SNPs affecting pharmacokinetics and pharmacodynamics of antileukemic agents. Genotype at ABCB1 SNP rs10264856 is associated with both ALL relapse (A) and dexamethasone apparent oral plasma clearance (B). Note that the C allele is linked to lower dexamethasone clearance and also lower cumulative incidence of relapse. The association of SNP genotype with clearance and with relapse was estimated by linear regression and the Fine and Gray hazard regression model, respectively. Dexamethasone clearance was determined in St Jude Total XV protocol at week 7 of continuation therapy, which is shown for those in the standard-/high-risk arm.

Table 3

Relationships between relapse-associated SNPs and pharmacokinetics and pharmacodynamics of antileukemic drugs


Although current risk classification schemes identify children with more aggressive ALL, a substantial portion of these high-risk patients still do not respond to more intensive chemotherapy. Conversely, a substantial portion of the children with ALL who eventually relapse are initially classified as being low risk at diagnosis.24 For example, even for MRD (the best clinical predictor of relapse in childhood ALL), more than 50% of relapse cases exhibit undetectable levels of leukemia burden after initial therapy and therefore cannot be reclassified for more intense treatment.3,4,14 For these reasons, it was particularly impressive that in the present study, germline variations distinguished patients at risk of relapse even within the group showing negative MRD, indicating that these genetic factors can significantly improve current models of risk stratification.

Capitalizing on the nearly population-based capture of childhood ALL patients in the United States to clinical trials led by the COG and St Jude, our present study systematically identified germline genetic variations associated with risk of relapse among 2535 children with ALL, the largest group of children with cancer ever studied for genetic determinants of outcome. Nevertheless, the number of patients receiving identical therapy is too small to separate patients into simple “discovery versus replication” cohorts for traditional genome-wide outcome analyses. Because therapy so strongly affects the likelihood of relapse, each discovery and replication cohort must be balanced for therapy arms. Therefore, we developed an iterative resampling approach25,26 in which patients were divided into a discovery and a replication cohort 100 times, each time balancing on treatment arm, ancestry, CNS status, and treatment outcome. By conducting the study in this manner, the replicated SNPs would be expected to be robust for multiple types of therapy. In fact, 133 of 134 relapse SNPs selected using this approach maintained prognostic significance after adjusting for treatment regimens and patient characteristics.

The top-ranked SNP is in the PYGL gene. PYGL, glycogen phosphorylase, is a target of adenosine monophosphate (AMP), which plays a critical role in response to antileukemic agents such as mercaptopurine and methotrexate.27 Interestingly, PYGL expression in diagnostic ALL blasts was shown to be positively correlated with in vitro response to prednisolone in 173 children with ALL (P = .002).28 PYGL has also been shown to be significantly overexpressed in a multidrug-resistant cancer cell line.29 Another interesting hit among the top-ranking genes is PDE4B, which encodes phosphodiesterase 4B, the primary regulator of cyclic AMP signaling in B lymphocytes.30,31 Prior studies have already shown that inhibition of PDE4B induces apoptosis in chronic lymphocytic leukemia and diffuse large cell lymphoma32,33 and sensitizes cells to glucocorticoid-induced cell death.34,35 In ALL, pharmacologic inhibition of PDE4 results in growth suppression and dexamethasone sensitivity,36 suggesting glucocorticoid response as a plausible mechanism by which PDE4B is linked to ALL relapse.

Over the past 50 years of studying prognostic features for ALL, many have been associated with relapse (eg, high leukocyte count, age ≥ 10 years, and blasts that are not hyperdiploid or carry unfavorable translocations such as MLL rearrangements or BCR-ABL1); however, the mechanisms by which these features confer a higher relapse risk after chemotherapy remain mostly unclear.1 In the present study, we did find that a majority (73 of 134) of relapse SNPs were related to one or more clinical presenting features. Although nearly all SNPs remained prognostic after adjusting for those features, there may nevertheless be mechanisms by which germline polymorphisms associate with both poor outcome and with presenting ALL features.

We found herein that 14 of the 134 relapse SNPs (11%) were also associated with unfavorable pharmacokinetics of commonly used antileukemic agents, suggesting that some inherited variation is likely to affect relapse by affecting host disposition of antileukemic agents. Two SNPs annotated to ABCB1, rs10264856 and rs4728709, are in linkage disequilibrium with each other (r² > 0.9) and were associated with increased dexamethasone apparent oral clearance. This would translate into lower plasma levels of dexamethasone, a phenotype that we have shown previously to be related to higher risk of ALL relapse.23 Interestingly, many drugs that are used to treat ALL, including glucocorticoids such as dexamethasone,37 anthracyclines,38 and vincristine,38 are substrates for the ABCB1 transporter (also known as MDR1 or multidrug resistance protein), and thus this association may be plausibly linked to both mechanisms of intrinsic blast resistance and to unfavorable host pharmacokinetic characteristics. Candidate SNPs in ABCB1 have been shown previously to be associated with outcome in ALL.39,40

We conclude that germline SNPs were associated with risk of ALL relapse in this large cohort of newly diagnosed children with ALL, even among patients who were negative for MRD, so inherited SNP genotypes may be a useful additional feature for risk classification in this disease.


Contribution: J.J.Y., C.C., and M.V.R. conceived and designed the study; M.D., D.C., Y.F., G.N., M.J.B., N.J.W., P.L.M., W.P.B., B.C., C.L.W., S.P.H., W.E.E., C.-H.P., M.L., and M.V.R. acquired the data; J.J.Y. and M.V.R. wrote the manuscript; J.J.Y., D.C., G.N., C.L.W., S.P.H., W.E.E., C.-H.P., and M.V.R. critically revised the manuscript for important intellectual content; C.C., M.D., X.C., W.Y., N.C., and P.S. performed the statistical analysis; and D.C., M.J.B., G.H.R., W.L.C., S.P.H., W.E.E., and M.V.R. obtained funding for the study.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Mary V. Relling, St Jude Children's Research Hospital, Department of Pharmaceutical Sciences, 262 Danny Thomas Pl, Memphis, TN 38105; e-mail: mary.relling{at}


The authors thank all patients and their parents who participated in the St Jude Total Therapy XIIIB/XV and COG P9906/P9904/P9905 studies; clinicians and research staff at the St Jude and COG institutions; Dr Jeannette Pullen from the University of Mississippi at Jackson and Dr Andrew Carroll from the University of Alabama at Birmingham for assistance in classification of patients with ALL; and Dr Mark Shriver at the Pennsylvania State University for sharing SNP genotype data of the 105 Native-American references.

This work was supported by the National Institutes of Health (grants CA142665, CA21765, CA158568, CA156449, CA36401, CA98543, CA114766, CA140729, and U01GM92666) and the American Lebanese Syrian Associated Charities (ALSAC). The Jeffrey Pride Foundation and the National Childhood Cancer Foundation provided support for genome-wide genotyping of the COG specimens. S.P.H. holds the Ergen Family Chair in Pediatric Cancer. J.J.Y. is supported by an American Society of Hematology Scholar Award and an Alex's Lemonade Stand Foundation for Childhood Cancer Young Investigator Grant. These sponsors provided financial support for the St Jude and the COG clinical trials, as well as for the genetic studies, but were not directly involved in the design or execution of the study, genomic data collection/management/analysis, or review/approval of this manuscript.



  • The online version of this article contains a data supplement.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted July 1, 2012.
  • Accepted September 14, 2012.

