Blood Journal
Leading the way in experimental and clinical research in hematology

Gene expression classifiers for relapse-free survival and minimal residual disease improve risk classification and outcome prediction in pediatric B-precursor acute lymphoblastic leukemia

  1. Huining Kang1,
  2. I.-Ming Chen1,2,
  3. Carla S. Wilson1,
  4. Edward J. Bedrick1,
  5. Richard C. Harvey1,
  6. Susan R. Atlas1,
  7. Meenakshi Devidas2,3,
  8. Charles G. Mullighan4,
  9. Xuefei Wang1,
  10. Maurice Murphy1,
  11. Kerem Ar1,
  12. Walker Wharton1,
  13. Michael J. Borowitz2,5,
  14. W. Paul Bowman2,6,
  15. Deepa Bhojwani7,
  16. William L. Carroll2,7,
  17. Bruce M. Camitta2,8,
  18. Gregory H. Reaman2,9,
  19. Malcolm A. Smith10,
  20. James R. Downing4,
  21. Stephen P. Hunger2,11, and
  22. Cheryl L. Willman1,2
  1. 1University of New Mexico Cancer Center and Departments of Pathology, Internal Medicine, Mathematics and Statistics, and Physics and Astronomy, University of New Mexico, Albuquerque;
  2. 2Children's Oncology Group, Arcadia, CA;
  3. 3Children's Oncology Group Statistics and Data Center, and Department of Epidemiology and Health Policy Research, College of Medicine, University of Florida, Gainesville;
  4. 4Department of Pathology, St Jude Children's Research Hospital, Memphis, TN;
  5. 5Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD;
  6. 6Cook Children's Medical Center, Fort Worth, TX;
  7. 7Departments of Pediatrics, Hematology, and Oncology and Cancer Center, New York University Medical Center, NY;
  8. 8Departments of Pediatrics, Hematology, Oncology, and Transplantation, Medical College of Wisconsin, Milwaukee;
  9. 9Department of Hematology-Oncology, Children's National Medical Center, Washington, DC;
  10. 10Pediatric Oncology, Clinical Investigations Branch, National Cancer Institute, Bethesda, MD; and
  11. 11Children's Hospital, Department of Pediatrics, and University of Colorado Cancer Center, University of Colorado Denver School of Medicine, Aurora

Abstract

To determine whether gene expression profiling could improve outcome prediction in children with acute lymphoblastic leukemia (ALL) at high risk for relapse, we profiled pretreatment leukemic cells in 207 uniformly treated children with high-risk B-precursor ALL. A 38-gene expression classifier predictive of relapse-free survival (RFS) could distinguish 2 groups with differing relapse risks: low (4-year RFS, 81%, n = 109) versus high (4-year RFS, 50%, n = 98; P < .001). In multivariate analysis, the gene expression classifier (P = .001) and flow cytometric measures of minimal residual disease (MRD; P = .001) each provided independent prognostic information. Together, they could be used to classify children with high-risk ALL into low- (87% RFS), intermediate- (62% RFS), or high- (29% RFS) risk groups (P < .001). A 21-gene expression classifier predictive of end-induction MRD effectively substituted for flow MRD, yielding a combined classifier that could distinguish these 3 risk groups at diagnosis (P < .001). These classifiers were further validated on an independent high-risk ALL cohort (P = .006) and retained independent prognostic significance (P < .001) in the presence of other recently described poor prognostic factors (IKAROS/IKZF1 deletions, JAK mutations, and kinase expression signatures). Thus, gene expression classifiers improve ALL risk classification and allow prospective identification of children who respond to or fail current treatment regimens. These trials were registered at http://clinicaltrials.gov under NCT00005603.

Introduction

Through the optimization and progressive intensification of standard chemotherapeutic regimens, remarkable advances have been achieved in the treatment of pediatric acute lymphoblastic leukemia (ALL).1-3 In parallel, laboratory investigations have provided remarkable insights into the biologic and genetic heterogeneity of this disease with the characterization of several recurring genetic abnormalities (hyperdiploidy, hypodiploidy, t[12;21][ETV6-RUNX1], t[1;19][TCF3-PBX1], t[9;22][BCR-ABL1], and translocations involving 11q23[MLL]) that are associated with distinct therapeutic outcomes and clinical phenotypes.2 Detailed risk classification schemes, incorporating pretreatment clinical characteristics (such as age, sex, and presenting white blood cell [WBC] count), the presence or absence of recurring cytogenetic abnormalities, and measures of minimal residual disease (MRD) at the end of induction therapy, are now used to tailor the intensity of therapy to a child's relative relapse risk (categorized as low, standard/intermediate, high, or very high).4-6 Yet, despite refinements in risk classification and improvements in overall survival, the second most common cause of cancer-related mortality in children in the United States remains relapsed ALL.7 Whereas relapses are more frequent in children with very high-risk disease, associated with BCR-ABL1 or hypodiploidy, relapses occur within all currently defined risk groups.1,7 Indeed, the majority of relapses occur in children initially assigned to the standard/intermediate- or high-risk categories.7 Thus, a primary challenge in pediatric ALL is to prospectively identify those children with higher-risk disease who do not benefit from therapeutic intensification and who require the development of new therapies for cure.7

In this study, we determined whether gene expression profiling could be used to improve risk classification and outcome prediction in high-risk pediatric ALL, a risk category largely defined by pretreatment clinical characteristics (age > 10 years and presenting WBC > 50 000/μL) and the absence of genetic abnormalities associated with low (hyperdiploidy, t[12;21][ETV6-RUNX1]) or very high (hypodiploidy, t[9;22][BCR-ABL1]) risk disease.4 More than 25% of children diagnosed with ALL are initially classified as high-risk. Outcomes in this form of ALL remain poor with high rates of relapse and relapse-free survival (RFS) of only 45% to 60%.7 Furthermore, the underlying genetic features associated with this form of ALL have not been well characterized. Thus, gene expression profiling and other comprehensive genomic technologies, such as assessment of genome copy number abnormalities or DNA sequencing, have the potential to resolve the underlying genetic heterogeneity of this form of ALL and to capture genetic differences that impact treatment response that can be exploited for improved risk classification and the identification of novel therapeutic targets.8-15

From the gene expression profiles obtained in the pretreatment leukemic cells of 207 uniformly treated children with high-risk ALL, we used supervised learning algorithms and extensive cross-validation techniques to build a 42-probe-set (38-gene) expression classifier predictive of RFS. In multivariate analysis, the best predictive model for RFS was this gene expression classifier combined with either flow cytometric measures of MRD determined at the end of induction therapy (day 29), or, a 23-probe-set (21-gene) molecular classifier derived from pretreatment samples that could predict levels of end-induction flow MRD at initial diagnosis. The application of these classifiers separated children with high-risk ALL into 3 distinct risk groups with significantly different survivals in the initial patient cohort used for modeling and in a second independent cohort of high-risk ALL patients used for validation. The gene expression classifier for RFS alone and combined with flow MRD also retained independent prognostic significance in the presence of other genetic abnormalities (IKAROS/IKZF1 deletions,16 JAK mutations,17 and gene expression signatures reflective of activated tyrosine kinases16,18) that we and others have recently discovered and determined to be associated with a poor outcome in pediatric ALL. Thus, gene expression classifiers significantly enhance outcome prediction and risk classification in high-risk ALL and, in particular, identify a group of children most likely to fail current therapeutic approaches and for whom novel therapies must be developed for cure.

Methods

Patient selection

Patient samples and clinical and outcome data for this study were obtained from the Children's Oncology Group (COG) Clinical Trial P9906. COG P9906 enrolled 272 eligible high-risk B-precursor ALL patients between March 15, 2000 and April 25, 2003; all patients were treated uniformly with a modified augmented Berlin-Frankfurt-Münster Study Group (BFM) regimen.6,19 This trial targeted a subset of newly diagnosed high-risk ALL patients who had experienced a poor outcome (44% RFS at 4 years) in prior studies.5,20 Patients with central nervous system disease or testicular leukemia were eligible for the trial regardless of age or WBC count at diagnosis. Patients with very high-risk features (BCR-ABL1 or hypodiploidy) were excluded, whereas those with low-risk features (trisomies of chromosomes 4 or 10; t[12;21][ETV6-RUNX1]) were excluded unless they had central nervous system disease or testicular leukemia. The majority of patients had MRD assessed by flow cytometry, as previously described; cases were defined as MRD positive or MRD negative at the end of induction therapy (day 29) using a threshold of 0.01%.6 For this study, previously cryopreserved residual pretreatment leukemia specimens were available on a representative cohort of 207 of the 272 (76%) registered patients. With the exception of differences in presenting WBC count, these 207 patients were highly similar in all other clinical and outcome parameters to all 272 patients accrued to this trial (see supplemental Table 1, available on the Blood website; see the Supplemental Materials link at the top of the online article). For validation of the performance of the classifiers, an independent set of 84 children with high-risk ALL, previously treated on COG Trial 1961, was used as a validation cohort14 (supplemental Section 2 provides detailed patient characteristics of the validation cohort). 
Treatment protocols were approved by the National Cancer Institute and participating institutions through their institutional review boards. Informed consent for clinical trial registration, sample submission, and participation in these research studies was obtained from all patients or their guardians in accordance with the Declaration of Helsinki.

Microarray analyses

RNA was purified from 207 pretreatment diagnostic samples with more than 80% blasts (131 bone marrow, 76 peripheral blood) and hybridized to HG_U133A_Plus2.0 oligonucleotide microarrays (Affymetrix) after RNA quantification, cDNA preparation, and labeling (supplemental Section 3). Signals were scanned (Affymetrix GeneChip Scanner) and analyzed with Affymetrix Microarray Suite (MAS 5.0). The expression signal matrix used for outcome analyses corresponded to a filtered list of 23 775 probe sets (supplemental Section 4). This gene expression dataset may be accessed via the National Cancer Institute caArray site (https://array.nci.nih.gov/caarray/) or at Gene Expression Omnibus (http://www.ncbi.nih.gov/geo) under accession number GSE11877.

Statistical analyses

RFS was calculated from the date of trial enrollment to either the date of first event (relapse) or last follow-up. Patients in clinical remission, or with a second malignancy, or with a toxic death as a first event were censored at the date of last contact. As described in detail in supplemental Sections 4C and 5 to 9, a Cox score was used to rank genes based on their association with RFS, and a Cox proportional hazards model–based supervised principal components analysis21 was used to build the gene expression classifier for RFS from the rank-ordered gene list. Similarly, for the development of the gene expression classifier predictive of end-induction MRD, a modified t test was used to rank genes expressed in pretreatment cells according to their association with day 29 flow MRD, defined as positive or negative at a threshold of 0.01%.6 Diagonal linear discriminant analysis22,23 was then used to build a prediction model and the classifier for MRD from the top-ranked genes. The likelihood-ratio test (LRT) score and the prediction error rate were used in the model construction and evaluation. To avoid overfitting, extensive cross-validation was used to determine the numbers of top-ranked genes to be included.23 Nested cross-validations provided predictions for individual cases as well as overall measures of the selected models' performance.22,23
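The gene-ranking step described above can be illustrated with a minimal Python sketch. It computes a univariate Cox score, taken here to be the score statistic of the Cox partial likelihood evaluated at a zero coefficient, for each probe set and orders probe sets by its absolute value. This is not the study's code (the analyses used Stata and R, with details in the supplement); the function names, the toy data, and the assumption of no tied event times are illustrative only.

```python
import math

def cox_score(times, events, x):
    """Univariate Cox score U / sqrt(V) at beta = 0 (assumes no tied event times)."""
    u = 0.0
    v = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored cases contribute only through the risk sets
        # Risk set: everyone still under observation at this relapse time.
        risk = [x[j] for j, t_j in enumerate(times) if t_j >= t_i]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        v += sum((r - mean) ** 2 for r in risk) / len(risk)
    return u / math.sqrt(v) if v > 0 else 0.0

def rank_probe_sets(times, events, expression):
    """Order probe-set ids by decreasing absolute Cox score."""
    scores = {pid: cox_score(times, events, vals) for pid, vals in expression.items()}
    return sorted(scores, key=lambda pid: abs(scores[pid]), reverse=True)

# Hypothetical toy data: follow-up times, relapse indicators (1 = relapse,
# 0 = censored), and expression values for two made-up probe sets.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 1]
expression = {
    "probe_a": [5.0, 4.0, 3.5, 3.0, 1.0, 0.5],  # high expression, early relapse
    "probe_b": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # little relation to outcome
}
ranking = rank_probe_sets(times, events, expression)
```

A positive score indicates that higher expression accompanies earlier relapse in the toy data; ranking by absolute value keeps probe sets associated with either better or worse outcome.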

For the first multivariate analysis, which tested the predictive power of the gene expression classifier for RFS relative to flow cytometric measures of MRD and to other clinical and genetic variables, a multivariate Cox proportional hazards regression analysis was performed with the risk score (determined by the gene expression classifier for RFS), WBC (on a log scale), and flow cytometric measures of MRD as explanatory variables. The LRT was performed to determine whether the risk score defined by the gene expression classifier for RFS was a significant predictor of time to relapse, adjusting for WBC and MRD.
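As a hedged illustration of how such a likelihood-ratio test yields a P value, the sketch below compares the maximized log partial likelihoods of two nested models that differ by a single covariate (for example, a model with WBC and MRD versus one that also includes the risk score). The function name and the log-likelihood values are hypothetical and are not taken from the study; the closed form used applies only to 1 degree of freedom.

```python
import math

def lrt_p_value(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic and P value for nested models differing by 1 parameter."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the nested model")
    # For 1 degree of freedom, P(chi-square > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log partial likelihoods (illustrative values only).
statistic, p_value = lrt_p_value(loglik_reduced=-310.0, loglik_full=-304.6)
```

With more than one added covariate the chi-square reference distribution has more degrees of freedom, and a statistics library would be the natural choice for the tail probability.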

To determine whether the gene expression classifier for RFS and the combined classifier (with flow cytometric measures of MRD) retained prognostic importance in the presence of new ALL-associated genetic abnormalities associated with a poor outcome that we and others have recently described, we accessed our recently published data reporting IKZF1/IKAROS deletions16 and JAK mutations17 in ALL, as these studies were performed using DNA samples from the same cohort of patients with high-risk ALL (COG P9906) reported in this study. The primary DNA copy number variation data reporting IKZF1 deletions16 may be accessed at http://target.cancer.gov/data. The JAK mutation data17 may be accessed at http://www.pnas.org/content/suppl/2009/05/22/0811761106.DCSupplemental/0811761106SI.pdf. A multivariate Cox proportional hazards regression analysis was performed with each expression classifier and included IKZF1/IKAROS deletions, JAK mutations, and kinase gene expression signatures as additional explanatory variables. A LRT was then performed to determine whether the classifiers retained independent prognostic significance adjusting for the effects of all covariates. All statistical analyses used Stata Version 9 and R.

Results

Patients and clinical risk factors

The median age of the 207 high-risk B-precursor ALL patients registered to COG Trial P9906 was 13 years (range: 1-20 years; Table 1). Whereas 23 of the 207 ALL patients had a t(1;19)(TCF3-PBX1) and 21 had various translocations involving MLL, the remaining 163 high-risk cases had no other known recurring cytogenetic abnormalities (Table 1). RFS in these 207 patients was 66.3% at 4 years (95% CI, 59%-73%; Figure 1A). Day 29 MRD, measured using flow cytometric techniques (end-induction flow MRD), was detected in 35% (67 of 191) of cases (Table 1).6 Among pretreatment clinical variables (age, sex, and central nervous system involvement), the presence of recurrent cytogenetic abnormalities (TCF3-PBX1 and MLL), and measures of MRD, only end-induction flow MRD and increasing WBC count were significantly associated with decreased RFS, and both retained significance in multivariate analysis (LRT based on Cox regression, P < .001; Table 1). A trend toward declining RFS was also observed among the 25% of children with Hispanic/Latino ethnicity (P = .049; Table 1).

Table 1

Association of RFS with clinical and genetic features in the high-risk ALL cohort

Figure 1

Performance of the 42-probe-set (38-gene) gene expression classifier for prediction of RFS. (A-B) Kaplan-Meier survival estimates of RFS in the full cohort of 207 patients (A) and in the low- versus high-risk groups distinguished with the gene expression classifier for RFS (B). HR is the hazard ratio estimated using Cox regression. (C) A gene expression heatmap is shown with the rows representing the 42 probe sets (containing 38 unique genes) composing the gene expression classifier for RFS. The columns represent patient samples sorted from left to right by time to relapse or last follow-up. Red indicates high expression relative to the mean; green, low expression relative to the mean; R, relapse; and C, continuous remission.
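For readers unfamiliar with the Kaplan-Meier estimates shown in these panels, the following minimal Python sketch computes the product-limit estimate from follow-up times and event indicators, with censored cases (remission, second malignancy, or toxic death as first event, per the RFS definition in "Statistical analyses") leaving the risk set without lowering the curve. The function names and toy data are hypothetical; confidence intervals are not computed here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one relapse."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        relapses = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if relapses:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # relapsed and censored cases both leave the risk set
    return curve

def survival_at(curve, t):
    """Step-function lookup of the estimated RFS at time t."""
    value = 1.0
    for time, s in curve:
        if time <= t:
            value = s
    return value

# Hypothetical toy data: years of follow-up and relapse indicators (1 = relapse).
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 0, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
rfs_at_3_years = survival_at(toy_curve, 3)
```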

A gene expression classifier predictive of survival

Gene expression profiles were obtained from pretreatment leukemic samples in each of the 207 high-risk ALL patients. To develop a gene expression–based classifier predictive of RFS, each of the 23 775 informative probe sets on the gene expression microarrays was ranked based on strength of association with RFS (Cox score).21 As detailed in supplemental Sections 4C, 5, and 8, a Cox proportional hazards model–based supervised principal component analysis was used to build the expression classifier for RFS, which was optimized by performing 20 iterations of 5-fold cross-validation.21 The final model incorporated the top 42 Affymetrix microarray probe sets corresponding to 38 unique genes (see supplemental Table 4 for the gene list; false discovery rate = 8.45%, significance analysis of microarrays [SAM]).24 The predicted gene expression classifier–based risk score for relapse for a given patient was computed via nested leave-one-out cross-validation (LOOCV) over the full model-building procedure (supplemental Sections 5 and 8). With a threshold of zero, the gene expression classifier–derived risk scores significantly separated the 207 high-risk ALL patients into low (4-year RFS, 81%; 95% CI, 72%-87%; n = 109) versus high (4-year RFS, 50%; 95% CI, 39%-60%; n = 98) risk groups (Figure 1B-C). Increased expression of BMPR1B, CTGF (CCN2), TTYH2, IGJ, NT5E (CD73), CDC42EP3, and TSPAN7, and decreased expression of NR4A3 (NOR-1), RGS1-2, and BTG3 were observed in the high gene expression risk group with the poorest outcome (Figure 1C). In a multivariate Cox regression analysis, the LRT revealed that the gene expression classifier for RFS provided significant independent information for outcome prediction, even after adjusting for flow MRD and WBC count (P = .001).
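The nested leave-one-out scheme and the zero threshold can be sketched as follows. The key point is that the entire model-building procedure (ranking, choice of the number of probe sets, fitting) is repeated without the held-out patient, so that each risk score is an out-of-sample prediction. This is a simplified skeleton, not the supplemental procedure: `build_model` is a hypothetical callable standing in for the full pipeline, the toy model simply centers a single value, and treating a score of exactly zero as low risk is an assumption.

```python
def nested_loocv(samples, build_model):
    """Return one out-of-sample prediction per sample."""
    predictions = []
    for i, held_out in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        # All tuning and fitting happens inside build_model, on training data only.
        model = build_model(training)
        predictions.append(model(held_out))
    return predictions

def assign_risk_group(score, threshold=0.0):
    """Dichotomize a risk score (assumption: exactly zero counts as low risk)."""
    return "high" if score > threshold else "low"

def build_centered_score(training):
    """Toy stand-in for the model-building procedure: score = value - training mean."""
    center = sum(training) / len(training)
    return lambda value: value - center

# Hypothetical one-dimensional toy data.
toy_values = [1.0, 2.0, 3.0, 10.0]
toy_scores = nested_loocv(toy_values, build_centered_score)
toy_groups = [assign_risk_group(s) for s in toy_scores]
```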

Improving risk classification and outcome prediction by combining the gene expression classifier and flow cytometric measures of MRD

Flow cytometric measures of MRD (flow MRD), measured at the end of induction therapy (day 29), were also capable of distinguishing 2 groups of patients with significantly different outcomes within the high-risk ALL cohort (Figure 2A).6 However, the independent prognostic impact of the gene expression–based classifier for RFS could further split both the flow MRD-negative patients (Figure 2B) and flow MRD-positive patients (Figure 2C) into 2 distinct patient groups with significantly different RFS (P = .001 and P = .005, respectively). It was particularly striking that the application of the gene expression classifier to the flow MRD-negative patients (Figure 2B) distinguished a group of high-risk ALL patients who did extremely well in the COG P9906 clinical trial (87% RFS at 4 years; 95% CI, 77%-93%). Similarly, applying the gene expression classifier to the flow MRD-positive patients distinguished a group of patients who did relatively well (68% RFS at 4 years; 95% CI, 47%-82%) from those who had an extremely poor outcome (Figure 2C). As both the gene expression classifier for RFS and flow MRD provided independent prognostic information in a multivariate Cox regression analysis (each P = .001), we built a combined risk classifier using these 2 variables; this combined classifier was capable of distinguishing 4 distinct prognostic groups within this cohort of high-risk ALL patients (Figure 2D). The 72 patients in the lowest risk group (38% of cases in the cohort; Table 2), who had low-risk gene expression classifier scores and negative end-induction flow MRD, showed significantly better RFS than the other groups (P < .001). Whereas all 20 cases with a t(1;19)(TCF3-PBX1) were contained within this lowest risk group (Figure 2D-E), it is of interest that another 52 patients lacking known recurring cytogenetic abnormalities were also assigned to this risk group (Table 2).
Similarly, the 38 patients in the highest risk group (20% of cohort), who had high gene expression classifier risk scores and positive end-induction flow MRD, displayed significantly worse RFS (29% RFS at 4 years; 95% CI, 14%-46%, which continued to decline at 5 years; P < .001; Figures 2C-E; Table 2). No significant survival differences (P = .57) were observed among those with discordant predictors, either those patients with low gene expression classifier risk scores and positive end-induction flow MRD (28 of 191, 15% of cohort) or those with high gene expression classifier risk scores and negative end-induction flow MRD (52 of 191, 27% of cohort). These 2 groups were thus combined into an intermediate-risk group (Figure 2E). Figure 2E provides the Kaplan-Meier survival estimates for the 3 groups defined by the combined classifier and highlights the significant differences in RFS. These 3 risk groups varied significantly in age and in the presence of the known recurring cytogenetic abnormalities (Table 2). Whereas the 17 patients with MLL translocations were distributed within the low- and intermediate-risk groups, all 20 cases with t(1;19)(TCF3-PBX1) were in the lowest risk group, as discussed above (Table 2; Figure 2E). Interestingly, of the 8 relapses that occurred in the lowest risk group, all 8 were ALL cases with t(1;19)(TCF3-PBX1). Children in each of the 3 risk groups had similar proportions of relapse within the bone marrow or isolated to the central nervous system (Table 2).
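The assignment rule of the combined classifier described above can be written as a short Python sketch: concordant low and concordant high predictors define the low- and high-risk groups, and the 2 discordant combinations are collapsed into the intermediate-risk group. The function name and the example patients are hypothetical.

```python
from collections import Counter

def combined_risk_group(ge_high_risk, mrd_positive):
    """Combine the gene expression risk call with end-induction flow MRD status."""
    if ge_high_risk and mrd_positive:
        return "high"
    if not ge_high_risk and not mrd_positive:
        return "low"
    return "intermediate"  # either discordant combination

# Hypothetical patients: (gene expression risk is high?, day 29 flow MRD positive?)
example_patients = [(False, False), (True, False), (False, True), (True, True), (False, False)]
group_counts = Counter(combined_risk_group(ge, mrd) for ge, mrd in example_patients)
```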

Figure 2

Kaplan-Meier estimates of RFS based on the gene expression classifier for RFS and end-induction (day 29) MRD. (A) Day 29 flow cytometric measures of MRD separated patients into 2 groups with significantly different RFS. (B-C) After dividing patients by their end-induction flow MRD status, an independent effect of the gene expression classifier for RFS is observed among both the flow MRD-negative (< 0.01% blasts; B) and flow MRD-positive (> 0.01% blasts; C) patients. (D-E) Combining the risk scores determined from the gene expression classifier and flow MRD yields 4 distinct outcome groups (D); the 2 discordant groups show no significant difference in RFS (P = .572) and are therefore collapsed into an intermediate-risk group for RFS prediction (E). (E) The hazard ratios (HR) and corresponding P values are based on the Cox regression (intermediate-risk vs low-risk, HR = 3.73, P = .001; high-risk vs intermediate-risk, HR = 2.27, P = .002). The P value reported in the lower left corner corresponds to the test for differences among all groups.

Table 2

Clinical and genetic features of the three risk groups determined by the combined application of the gene expression classifier for RFS and flow cytometric measures of MRD

To ensure that the gene expression classifier could improve outcome prediction in high-risk ALL patients lacking known recurring cytogenetic abnormalities, we built a second gene expression classifier for RFS using a subset of 163 of the original 207 COG 9906 high-risk ALL patients, excluding those cases with MLL (n = 21) or TCF3-PBX1 (E2A-PBX1) translocations (n = 23), again using a Cox proportional hazards model–based supervised principal component analysis with extensive cross-validation (see supplemental Section 10). The resulting classifier for RFS contained 32 probe sets (29 unique genes; list provided in supplemental Table 8) and had a high degree of overlap (84%) with the genes in the initial classifier (supplemental Table 4). With a threshold of zero, the risk scores derived from this second classifier also significantly separated the 163 ALL cases into low- (4-year RFS, 76%; 95% CI, 64%-84%; n = 88) versus high- (4-year RFS, 52%; 95% CI, 40%-64%; n = 75) risk groups (P = .001; Figure 3A). Flow cytometric measures of end-induction MRD were also capable of distinguishing 2 risk groups within these 163 high-risk ALL cases (Figure 3B), and application of the gene expression classifier further divided both the flow MRD-negative (Figure 3C) and flow MRD-positive (Figure 3D) patients into distinct risk groups with significantly different outcomes. Combining this second classifier for RFS with end-induction flow MRD yielded 4 distinct risk groups with significantly different outcomes (P < .001; Figure 3E). As no significant survival differences were observed between the 2 groups with discordant predictors, these groups were combined into an intermediate-risk group (Figure 3F).
As shown in Figure 3F, the Kaplan-Meier survival estimates for the 3 risk groups defined by this second combined classifier demonstrated highly significant differences in RFS (low [83% 4-year RFS; 95% CI, 70%-90%], intermediate [60% 4-year RFS; 95% CI, 44%-72%], and high [35% 4-year RFS; 95% CI, 19%-44%]; P < .001). These results demonstrate that gene expression classifiers significantly refine risk classification in high-risk ALL cases lacking known cytogenetic abnormalities.

Figure 3

Kaplan-Meier estimates of RFS based on the gene expression classifier for RFS modeled on high-risk ALL cases lacking known recurring cytogenetic abnormalities and end-induction (day 29) MRD. (A) The second gene expression classifier modeled only on those high-risk ALL cases (n = 163; supplemental Table 8) from the COG 9906 ALL cohort lacking recurring cytogenetic abnormalities resolves 2 distinct risk groups of patients with significantly different RFS. (B) Day 29 flow MRD status separated these 163 ALL cases into 2 groups with significantly different RFS. (C-D) After dividing patients by their end-induction flow MRD status, an independent effect of the gene expression classifier for RFS is observed among both the flow MRD-negative (< 0.01% blasts; C) and flow MRD-positive (> 0.01% blasts; D) patients. (E-F) Combining the risk scores determined from the gene expression classifier and flow MRD yields 4 distinct outcome groups (E); the 2 discordant groups show no significant difference in RFS and are therefore collapsed into an intermediate-risk group for RFS prediction (F). (F) The hazard ratios (HR) and corresponding P values are based on the Cox regression (high-risk vs intermediate-risk, HR = 2.26, P = .007; intermediate-risk vs low-risk, HR = 2.77, P = .008). The P value reported in the lower left corner corresponds to the test for differences among all groups.

A gene expression classifier predictive of end-induction flow MRD

The clinical application of a combined classifier using the gene expression classifier for RFS and day 29 flow MRD would require waiting until the end of induction therapy, precluding earlier intervention in patients who were destined to ultimately fail therapy. To develop a gene expression classifier predictive of end-induction MRD in diagnostic pretreatment specimens, 23 775 informative probe sets from 191 patients (of the 207 patients who had day 29 MRD results available) were ranked on their association with MRD (supplemental Sections 6 and 9). Using a threshold of 1% for the false discovery rate, SAM identified 352 probe sets significantly associated with positive end-induction flow MRD (supplemental Table 6). A diagonal linear discriminant analysis model22,23 predicting MRD was built and optimized by performing 100 iterations of 10-fold cross-validation. The final model incorporated the top 23 probe sets (21 unique genes; supplemental Table 5), which separated the patients into 2 groups with significantly different outcomes (log rank test, P = .014). Figure 4A shows the receiver operating characteristic (ROC) curve for the nested LOOCV predictions of the classifier. The 23 probe sets in the gene expression classifier predictive of end-induction MRD (Figure 4B) include the genes BAALC, P2RY5, TNFSF4, E2F8, IRF4, CDC42EP3, and KLF4, and 2 probe sets each for EPB41L2 and PARP15. When the gene expression classifier predictive of MRD was substituted for the day 29 flow MRD data and then combined with the expression classifier for RFS, 3 distinct risk groups were resolved that had significantly different RFS at 4 years (low-, 82%; intermediate-, 63%; and high-risk, 45%; Figure 4C). 
Whereas still highly statistically significant (P < .001), the combined classifier using the gene expression classifier for RFS and the gene expression classifier predicting end-induction MRD (Figure 4C) was slightly less discriminatory than the one combining the gene expression classifier for RFS and flow MRD (Figure 2E).
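To make the MRD prediction model concrete, the following is a minimal Python sketch of diagonal linear discriminant analysis: class means are estimated per probe set, a pooled variance is estimated per probe set with covariances between probe sets ignored, and a new sample is assigned to the class with the smallest variance-scaled distance after a prior adjustment. It is an illustration under simplifying assumptions (no gene selection, no cross-validation, nonzero pooled variances), not the procedure of the supplement; the function names and toy expression values are hypothetical.

```python
import math

def fit_dlda(X, y):
    """Estimate class means, pooled per-feature variances, and class priors."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means = {}
    pooled = [0.0] * p
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for r in rows:
            for j in range(p):
                pooled[j] += (r[j] - means[c][j]) ** 2
    variances = [s / (n - len(classes)) for s in pooled]  # assumed nonzero
    priors = {c: y.count(c) / n for c in classes}
    return means, variances, priors

def predict_dlda(model, x):
    """Assign x to the class with the smallest diagonal discriminant score."""
    means, variances, priors = model

    def score(c):
        distance = sum((xj - mj) ** 2 / vj
                       for xj, mj, vj in zip(x, means[c], variances))
        return distance - 2.0 * math.log(priors[c])

    return min(means, key=score)

# Hypothetical toy data: 2 probe sets, 3 MRD-negative and 3 MRD-positive cases.
X_toy = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.3], [3.0, 2.0], [3.2, 2.2], [2.9, 1.7]]
y_toy = ["neg", "neg", "neg", "pos", "pos", "pos"]
dlda_model = fit_dlda(X_toy, y_toy)
```

For example, `predict_dlda(dlda_model, [1.1, 5.0])` falls close to the MRD-negative class means in this toy set and is labeled accordingly.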

Figure 4

Gene expression classifier for prediction of end-induction (day 29) flow MRD in pretreatment samples combined with the gene expression classifier for RFS. (A) The ROC curve shows the high accuracy of the 23-probe-set MRD classifier (LOOCV error rate of 24.61%; sensitivity 71.64%, specificity 77.42%) in predicting MRD. The area under the ROC curve (0.80) is significantly greater than that of an uninformative ROC curve (0.5; P < .001). (B) Heatmap of the 23-probe-set predictor of MRD presented in rows (false discovery rate < .001%, SAM). The columns represent patient samples with positive or negative end-induction flow MRD, whereas the rows are the specific predictor genes. Red: high expression relative to the mean; green: low expression relative to the mean. (C) Kaplan-Meier estimates of RFS for the risk groups determined by combining the gene expression classifiers for RFS and MRD, analogous to Figure 2E, with the gene expression predictor for MRD replacing day 29 flow MRD. The 3 risk groups have significantly different RFS (log rank test, P < .001).
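The quoted error rate, sensitivity, and specificity can be related to a 2 x 2 table of predicted versus observed MRD status. The cell counts below are not reported in the text; they are inferred here from the 67 flow MRD-positive cases among 191 patients and the quoted percentages, and should be read as an assumption that happens to reproduce those percentages. The function name is hypothetical.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and error rate from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "error_rate": (fn + fp) / total,
    }

# Inferred counts (assumption): 48 of 67 MRD-positive and 96 of 124 MRD-negative
# cases predicted correctly, for 191 patients in total.
mrd_metrics = classifier_metrics(tp=48, fn=19, tn=96, fp=28)
mrd_percentages = {name: round(100 * value, 2) for name, value in mrd_metrics.items()}
```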

Validation of the classifiers in an independent dataset

We next determined whether the gene expression classifiers were predictive of outcome in a second independent cohort of 84 children with high-risk ALL treated on a different clinical trial (COG/Children's Cancer Group [CCG] 1961).14,19 In contrast to the initial COG 9906 high-risk ALL cohort, a WBC count of more than 50 000/μL (LRT, P = .014) and male sex (LRT, P = .018) were associated with a worse RFS (supplemental Section 2).14,19 Flow MRD was not evaluated in the CCG 1961 trial. The initial 38-gene expression classifier for RFS (supplemental Table 4) that we developed from COG P9906 predicted a risk score among these 84 patients that was significantly associated with RFS (Cox proportional hazards regression, P = .006), even after adjusting for sex and WBC count (multivariate Cox regression, P = .01). The gene expression classifier risk scores split the 84 children from CCG 1961 into high (n = 28) and low (n = 56) risk groups (Figure 5A). Unlike in our initial cohort, the proportion of children with WBC counts > 50 000/μL was significantly greater in the high-risk group (82%, 23 of 28) than in the low-risk group defined by the expression classifier (55%, 31 of 56; Fisher exact test, P = .017). Similar to the COG 9906 cohort, all children with t(1;19)(TCF3-PBX1) were in the lowest risk group, although this cytogenetic abnormality by itself did not predict RFS. We next tested the effect of the combined gene expression classifiers for RFS and MRD and were able to resolve 3 distinct risk groups with significantly different outcomes (Figure 5B), demonstrating that these classifiers were capable of resolving distinct risk groups in an independent cohort of children with high-risk ALL.
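The Fisher exact comparison of WBC counts between the 2 expression-defined risk groups can be sketched in Python from the counts given above (23 of 28 versus 31 of 56). The two-sided convention used here, summing the probabilities of all tables no more likely than the observed one, is an assumption about how the test was run, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum over all tables that are no more probable than the observed table.
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Rows: high-risk and low-risk groups; columns: WBC > 50 000/uL yes, no.
p_wbc = fisher_exact_two_sided(23, 5, 31, 25)
```

Under this convention the computed value is approximately .017, in line with the reported P value.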

Figure 5

Kaplan-Meier estimates of RFS using the combined gene expression classifiers for RFS and MRD in an independent cohort of 84 children with high-risk ALL. (A) The gene expression classifier for RFS separates children into low- and high-risk groups in an independent cohort of 84 children with high-risk ALL treated on COG Trial 1961.14,16 (B) Application of the combined gene expression classifiers for RFS and MRD shows significant separation of 3 risk groups: low (47 of 84, 56%), intermediate (22 of 84, 26%), and high (15 of 84, 18%), similar to our initial cohort (Figure 4C).

Gene expression classifiers retain independent prognostic significance in the presence of new genetic factors associated with a poor outcome in pediatric ALL

We and others have recently identified new genetic features in pediatric ALL that are associated with a poor outcome, including IKAROS/IKZF1 deletions,16 JAK mutations,17 and gene expression signatures reflective of activated tyrosine kinase signaling pathways (termed kinase signatures).16,18 Two of these studies16,18 first reported the discovery of ALL cases that lacked a classic BCR-ABL1 translocation, but that had gene expression profiles reflective of tyrosine kinase activation. Our more recent work17 has determined that the majority of these cases have activating mutations of the JAK family of tyrosine kinases. We thus wished to determine whether the gene expression classifier for RFS, or the combined classifier, retained independent prognostic significance in the presence of these genetic abnormalities. As detailed in “Statistical analyses,” our studies reporting IKAROS/IKZF1 deletions,16 activated kinase signatures,16 and JAK mutations17 used samples from the same COG 9906 high-risk ALL cohort; thus, we could readily perform this multivariate analysis.

As shown in Table 3, activated kinase signatures, JAK family mutations, and IKAROS/IKZF1 deletions were each significantly associated with the highest risk group as defined by the gene expression classifier for RFS in the COG 9906 high-risk ALL cases. Not only did the gene expression classifier for RFS assign all 38 cases with a kinase signature to the highest risk group, it also assigned another 60 cases to this risk group (Table 3). Similarly, whereas all cases with JAK mutations were assigned to the highest risk group by the gene expression classifier for RFS, an additional 74 cases lacking these mutations were also assigned to this high-risk group (Table 3). The gene expression classifier also refined risk classification in the presence of IKAROS/IKZF1 deletions (Table 3). In a multivariate Cox regression analysis, only the gene expression classifier for RFS (P = .005) and IKAROS/IKZF1 deletions (P = .003) retained prognostic significance (Table 4). An LRT determined that the gene expression classifier for RFS retained independent prognostic significance (P = .014) when adjusting for all other covariates. We also examined the association between risk groups as defined by the combined gene expression classifier for RFS and end-induction flow MRD (the combined classifier) with kinase signatures, JAK family mutations, and IKAROS/IKZF1 deletions (Table 5; Figure 6). Again, significant associations between each of these variables and the 3 risk groups (low, intermediate, and high) defined by the combined classifier were seen (Table 5). As shown in Figure 6, the application of the combined classifier refined risk classification and distinguished different patient groups with statistically significantly different RFS in the presence or absence of a kinase signature (Figure 6A-B), in the presence or absence of JAK mutations (Figure 6C-D), and in the presence or absence of IKAROS/IKZF1 deletions (Figure 6E-F). In a multivariate Cox regression analysis (Table 6), only the combined classifier retained independent prognostic significance for outcome prediction. The LRT revealed that the combined classifier retained independent prognostic significance after adjusting for the effects of all other genetic abnormalities (P = .001).
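The likelihood ratio tests reported here can be illustrated with a short Python sketch, assuming the LRT compares two nested Cox models (with and without the classifier term) and refers twice the difference in log partial likelihood to a chi-square distribution. The function name and the log-likelihood values are hypothetical and are not taken from Tables 4 or 6.

```python
from math import erfc, exp, sqrt


def lrt_pvalue(loglik_reduced, loglik_full, df=1):
    """P value of a likelihood ratio test between two nested models.

    loglik_reduced : maximized log (partial) likelihood without the extra term(s)
    loglik_full    : maximized log (partial) likelihood with the extra term(s)
    df             : number of extra parameters (only 1 or 2 supported here)
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("full model cannot fit worse than the nested reduced model")
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        return erfc(sqrt(statistic / 2.0))
    if df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        return exp(-statistic / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Hypothetical log partial likelihoods; the LRT statistic is 2 * 3.0 = 6.0.
p = lrt_pvalue(loglik_reduced=-100.0, loglik_full=-97.0, df=1)
print(f"LRT P = {p:.4f}")
```

A small P value indicates that adding the classifier improves the model fit beyond what the other covariates already explain, which is the sense in which the classifier "retains independent prognostic significance."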

Table 3

Association of kinase gene expression signatures, JAK mutations, and IKAROS/IKZF1 deletions with the low- versus high-risk groups defined by the gene expression classifier for RFS

Table 4

Multivariate Cox regression analysis of the prognostic significance of the risk group determined by the gene expression classifier for RFS in the presence of genetic factors in ALL associated with a poor outcome

Table 5

Association of kinase gene expression signatures, JAK mutations, and IKAROS/IKZF1 deletions with the three risk groups defined by the combined gene expression classifier for RFS and flow cytometric measures of MRD

Figure 6

Kaplan-Meier estimates of RFS using the combined gene expression classifier for RFS and flow cytometric measures of MRD in the presence of kinase signatures, JAK mutations, and IKAROS/IKZF1 deletions. (A-B) Application of the original 42-probe-set (38-gene; supplemental Table 4) gene expression classifier for RFS combined with end-induction flow cytometric measures of MRD distinguishes 2 distinct risk groups in COG 9906 ALL patients with kinase signatures (A) and 3 risk groups in those patients lacking kinase signatures (B). (C-D) Application of the combined classifier also resolves 2 distinct and statistically significant risk groups in ALL patients with JAK mutations (C) and 3 risk groups in those patients lacking JAK mutations (D). (E-F) Application of the combined classifier distinguishes 3 risk groups with statistically significant differences in RFS in patients with (E) and without (F) IKAROS/IKZF1 deletions. The P value reported in the lower left corner corresponds to the log rank test for differences among all groups.

Table 6

Multivariate Cox regression analysis of the prognostic significance of the risk group determined by the combined gene expression classifier for RFS and flow cytometric measures of MRD in the presence of genetic factors in ALL associated with a poor outcome

Discussion

Whereas gene expression–profiling studies in the acute leukemias have identified gene expression signatures associated with recurrent cytogenetic abnormalities8,25,26 and in vitro drug responsiveness,9–11,15 fewer studies have reported and validated gene expression classifiers predictive of survival.13,14 In this study, gene expression classifiers predictive of RFS and end-induction MRD were derived from the gene expression profiles obtained in the pretreatment samples of 207 children with B-precursor high-risk ALL. A 42-probe-set (containing 38 unique genes) expression classifier predictive of RFS was capable of resolving 2 distinct groups of patients with significantly different outcomes within the category of pediatric ALL patients traditionally defined as high risk. In multivariate analyses, only the gene expression–based classifier for RFS and flow cytometric measures of end-induction MRD provided independent prognostic information for outcome prediction. By combining the risk scores derived from the gene expression classifier for RFS with end-induction flow MRD, 3 distinct groups of patients with strikingly different treatment outcomes could be identified. Similar results were obtained when modeling only those high-risk ALL cases that lacked any known recurring cytogenetic abnormalities.

Perhaps most importantly, in terms of the future potential clinical utility of gene expression-based classifiers for risk classification, we further demonstrated that both the gene expression classifier for RFS and the combination of this classifier with end-induction flow MRD retained independent prognostic significance for outcome prediction in the presence of new genetic abnormalities that we and others have recently discovered and found to be associated with a poor outcome in pediatric ALL (IKAROS/IKZF1 deletions, JAK mutations, and kinase signatures). The combined classifier further refined outcome prediction in the presence of each of these mutations or signatures, distinguishing which cases with JAK mutations, kinase signatures, or IKAROS/IKZF1 deletions would have a good (low-risk), intermediate, or poor (high-risk) outcome (Table 5; Figure 6). Thus, whereas IKZF1 deletions and JAK mutations are exciting new targets for the development of novel therapeutic approaches in pediatric ALL, assessment of these genetic abnormalities alone may not be fully sufficient for risk classification or to predict overall outcome. As gene expression profiles reflect the full constellation and consequence of the multiple genetic abnormalities seen in each ALL patient and as measures of MRD are a functional biologic measure of residual or resistant leukemic cells, they may have an enhanced clinical utility for refinement of risk classification and outcome prediction.

The results reported in this study, as well as those of other recent studies,16–18 reveal the striking molecular and biologic heterogeneity within children who have traditionally been classified as high-risk ALL. Unexpectedly, 72 of 207 (38%) of the high-risk ALL patients studied in the COG 9906 ALL cohort were found by the combined gene expression classifier for RFS and flow MRD classifier to have a significantly better survival (87% RFS at 4 years) compared with the entire cohort (66% survival at 4 years). This group of patients, which included all 20 cases with t(1;19)(TCF3-PBX1) and an additional 52 cases whose underlying genetic abnormalities remain to be discovered, was characterized by high expression of the tumor suppressor genes and signaling proteins RGS2, NFKBIB, NR4A3, DDX21, and BTG3.27–30 Application of the combined classifier also identified 38 of 207 (20%) patients in the COG 9906 cohort who had a dismal 4-year RFS of 29% (approaching 0% at 5 years). Highly expressed in this group of patients with the worst outcome were genes (BMPR1B, CTGF [CCN2], TTYH2, IGJ, PON2, CD73, CDC42EP3, TSPAN7, and SEMA6A) involved in adaptive cell signaling responses to transforming growth factor β, stem cell function, B-cell development and differentiation, and the regulation of tumor growth.27–45 These highest risk cases lacked expression of the genes (NR4A3, BTG3, RGS1, and RGS2) whose relatively high expression characterized the ALL cases with the best outcome. Not surprisingly, given that all cases with an activated kinase signature were assigned to the highest risk group with the combined classifier, 6 of the genes associated with our kinase signature (BMPR1B, ECM1, IGJ, PON2, SEMA6A, and TSPAN7) were contained within our gene expression classifier for RFS. The genes that characterize the risk groups defined by the combined classifier provide important clues to the multiple complex pathways and mechanisms of leukemic transformation in pediatric ALL.

The kinetics of early treatment response, best assessed by molecular or flow cytometric measures of MRD after the first 1 to 3 months of therapy, are a potent predictor of outcome in leukemia. Yet, MRD data are not available at initial diagnosis, and relapses occur in some pediatric ALL patients (such as those with t[1;19][TCF3-PBX1]) who have an excellent (negative) end-induction MRD response. Ideally, one would want to identify as early as possible those ALL patients who are most likely to fail therapy so that novel treatment interventions or alternative induction methods could be used. Using the combined gene expression classifier for RFS and end-induction flow MRD, we identified 38 patients in the initial cohort of 207 patients who were destined to ultimately fail intensified traditional therapy for ALL. We therefore built a 23-probe-set (21-gene) gene expression classifier predictive of day 29 flow MRD in diagnostic, pretreatment samples that could successfully replace end-induction flow MRD in our risk model. Among several interesting genes in the classifier predictive of end-induction MRD was BAALC, a novel marker of early progenitor cells that has been reported to confer a worse outcome and primary resistance in acute leukemia, including ALL and acute myeloid leukemia in adults.46,47 Given the relatively old age (mean = 13 years) of the children and adolescents in our ALL cohort and the presence of genes in our gene expression classifiers for RFS and MRD that have previously been associated with a poor outcome in adult ALL (such as CTGF43,44 and BAALC46,47), we hypothesize that the gene expression classifiers that we have developed for pediatric ALL may also be useful for risk classification and outcome prediction in adults with ALL. These studies are now in progress.

The results of our studies provide evidence that improved outcome prediction and risk classification can be achieved in ALL through the development of gene expression classifiers. The application of gene expression classifiers allows for the prospective identification of a significant subgroup of ALL patients with little chance for cure on contemporary chemotherapeutic regimens. Further analysis of these expression profiles, coupled with other comprehensive genomic studies, will hopefully lead to the continued identification of novel targets and more effective therapies for these children.

Authorship

Contribution: H.K. performed statistical analyses, designed and developed classifiers, and prepared the manuscript; I.-M.C. and R.C.H. performed leukemia sample processing, gene expression arrays, and correlative data analysis; C.S.W. and W.W. performed data analysis and review and prepared the manuscript; E.J.B. performed statistical analyses and designed and developed classifiers; S.R.A. conducted statistical analyses, data analysis, and review; M.D. conducted COG clinical and statistical analyses, data review, and analysis; C.G.M. completed IKAROS, collaborated in JAK studies, and reviewed the data; X.W. performed statistical analyses, model building, data interpretation, and review; M.M. conducted statistical analyses and database and data warehouse development; K.A. technically performed RNA isolations/microarrays; M.J.B. performed research, data analysis, and review, and prepared the manuscript; W.P.B. designed COG studies and conducted data analysis and review; D.B. completed arrays on independent dataset and performed data analysis and review; W.L.C. designed COG studies and performed data analysis, review, and arrays from independent cohort; B.M.C. designed COG studies, performed data analysis and review, and prepared the manuscript; G.H.R. designed COG and CCG studies and performed data analysis and review; M.A.S. coordinated National Cancer Institute TARGET studies for JAK/IKAROS analyses and participated in data review; J.R.D. completed IKAROS, collaborated in JAK studies, and performed data review; S.P.H. designed COG studies, performed data analysis and review, and prepared the manuscript; and C.L.W. oversaw all aspects of this project, performed data analysis and review and statistical analysis review, and prepared the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Cheryl L. Willman, University of New Mexico Cancer Research Facility, 2325 Camino de Salud NE, Room G03, MSC08 4630 1 University of New Mexico, Albuquerque, NM 87131; e-mail: cwillman@salud.unm.edu.

Acknowledgments

This work was supported by National Institutes of Health NCI U01 CA114762 Strategic Partnerships to Evaluate Cancer Gene Signatures Program (C.L.W.), NCI U10 CA98543 supporting the Children's Oncology Group and Statistics and Data Center (G.H.R.), and a subcontract to NCI U10 CA98543 in support of the National Cancer Institute TARGET Initiative. Additional funding was provided by a Leukemia & Lymphoma Society Specialized Center of Research Program Grant 7388-06 (C.L.W.). University of New Mexico Cancer Center Shared Resources: Keck-UNM Genomics Resource, Biostatistics, and Bioinformatics and Computational Biology, partially supported by NCI P30 CA118100 (C.L.W.), were also critical for this work.

Footnotes

  • The online version of this article contains a data supplement.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted May 1, 2009.
  • Accepted November 6, 2009.

References
