Blood Journal
Leading the way in experimental and clinical research in hematology

Commonly altered genomic regions in acute myeloid leukemia are enriched for somatic mutations involved in chromatin remodeling and splicing

  1. Anna Dolnik (1,*),
  2. Julia C. Engelmann (2,*),
  3. Maren Scharfenberger-Schmeer (3),
  4. Julian Mauch (1),
  5. Sabine Kelkenberg-Schade (3),
  6. Berit Haldemann (3),
  7. Tamara Fries (3),
  8. Jan Krönke (1),
  9. Michael W. M. Kühn (1),
  10. Peter Paschka (1),
  11. Sabine Kayser (1),
  12. Stephan Wolf (3),
  13. Verena I. Gaidzik (1),
  14. Richard F. Schlenk (1),
  15. Frank G. Rücker (1),
  16. Hartmut Döhner (1),
  17. Claudio Lottaz (2),
  18. Konstanze Döhner (1), and
  19. Lars Bullinger (1)
  1. Department of Internal Medicine III, University Hospital of Ulm, Ulm, Germany;
  2. Statistical Bioinformatics, Institute for Functional Genomics, University of Regensburg, Regensburg, Germany; and
  3. Genomics and Proteomics Core Facilities, Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany

Abstract

Acute myeloid leukemia (AML) is characterized by molecular heterogeneity. As commonly altered genomic regions point to candidate genes involved in leukemogenesis, we used microarray-based comparative genomic hybridization and single nucleotide polymorphism profiling data of 391 AML cases to further narrow down genomic regions of interest. Targeted resequencing of 1000 genes located in the critical regions was performed in a representative cohort of 50 AML samples comprising all major cytogenetic subgroups. We identified 120 missense/nonsense mutations as well as 60 insertions/deletions affecting 73 different genes (∼ 3.6 tumor-specific aberrations/AML). While most of the newly identified alterations were nonrecurrent, we observed an enrichment of mutations affecting genes involved in epigenetic regulation including known candidates like TET2, TET1, DNMT3A, and DNMT1, as well as mutations in the histone methyltransferases NSD1, EZH2, and MLL3. Furthermore, we found mutations in the splicing factor SFPQ and in the nonclassic regulators of mRNA processing CTCF and RAD21. These splicing-related mutations affected 10% of AML patients in a mutually exclusive manner. In conclusion, we could identify a large number of alterations in genes involved in aberrant splicing and epigenetic regulation in genomic regions commonly altered in AML, highlighting their important role in the molecular pathogenesis of AML.

Introduction

Acute myeloid leukemia (AML) is a genetically heterogeneous clonal disorder characterized by the accumulation of somatic genetic alterations in hematopoietic progenitor cells.1 Recent next-generation sequencing (NGS) analyses of AML genomes and exomes led to the identification of several new candidate genes, which might contribute to the pathogenesis of the disease.2-6 At the same time, the first NGS results further highlighted the substantial heterogeneity of AML at the molecular level and the difficulty of distinguishing causative driver mutations from passenger aberrations.

For example, sequencing of the first AML genome revealed somatic mutations in 10 different genes. Except for mutations in the known candidate genes FLT3 and NPM1, none of the newly discovered mutations were found to be recurrent by screening an independent test cohort of 187 AML cases.3 Similarly, sequencing of a first treatment-related AML (tAML) genome revealed somatic single nucleotide variations (SNVs), insertions, or deletions (indels) in 28 genes, and again none of these was found recurrent in an additional 93 tAML samples besides a previously reported NUP98 mutation.4 This illustrates the great variation of genomic changes in AML, which might lead to phenotypically and clinically similar manifestations.

However, NGS analyses also discovered potential new “driver” mutations such as those affecting the metabolic enzyme IDH1 (isocitrate dehydrogenase 1).5 Mutations in IDH1 and its mitochondrial variant IDH2 occur in 7.6% and 8.7% of AMLs, respectively,7 and have recently been reported to lead to global DNA hypermethylation via a change in cell metabolism.8 Thus, 2-hydroxyglutarate, the aberrant product of mutated IDH enzymes, impairs the function of the α-ketoglutarate–dependent methylcytosine dioxygenase TET2. Notably, TET2 is also found mutated in 8% of AML patients, and these mutations have a similar effect on the epigenetic signature.8 Interestingly, the second newly identified recurrent mutation in AML affects another enzyme involved in epigenetic regulation, DNMT3A (DNA methyltransferase 3 alpha).2 In addition to the initial whole-genome analysis, sequencing the exomes of 9 AML cases with a monocytic phenotype added this gene to a growing list of regulators of the epigenome affected in AML.6

Furthermore, in addition to the well-known recurrently mutated genes affecting tyrosine kinases (such as FLT3, KIT, and JAK2) or transcription factors (such as CEBPA, WT1, and RUNX1),1 NGS-based approaches recently showed genes required for the initial steps of RNA splicing to be frequently implicated in hematopoietic malignancies.9,10 Exome sequencing of myelodysplastic syndrome (MDS) cases revealed recurrent mutations in different components of the RNA splicing machinery (including SF3B1, SRSF2, and U2AF1).10 In MDS patients with ring sideroblasts, up to 75% of cases carry an SF3B1 mutation, suggesting a great importance of the genetic alteration of major splicing components for the deregulation of hematopoiesis.9,10 Screening of the respective genes in AML revealed mutations in 26% of AML derived from MDS, but only 7% of de novo AML cases.10

Compared with other cancers, such as diffuse large B-cell lymphoma (DLBCL),11,12 basal-like breast cancer,13 and lung cancer,14 in which in general more than 30 mutations were detected per case, in de novo AML the mutation frequency is substantially lower. Similarly, microarray-based genomic profiling in AML revealed fewer alterations than seen in other hematopoietic tumors such as acute lymphoblastic leukemia.15 Nevertheless, array-based comparative genomic hybridization (aCGH) and single nucleotide polymorphism (SNP) array-based analyses also identified recurrently gained and/or lost regions in AML that point to genes of interest.16,17 For example, commonly deleted regions in AML indicate tumor suppressor genes that are also targeted by inactivating mutations (such as TP53 in the deleted 17p13 locus), and amplified regions point to genes that are hit by activating mutations (such as the JAK2 gene, which can also be overexpressed because of high-level amplifications involving the 9p24 locus).17 In addition, recent refined screening efforts revealed many more candidate regions of interest that warrant further exploration,18 and given the low overall number of aberrations in AML, focusing on the respective candidate regions likely increases the chance of finding new recurrent mutations in AML.

Based on aCGH and SNP microarray data of 391 AML cases, we selected 1000 candidate genes of potential pathogenetic relevance located in genomic regions affected by recurrent genomic gains and losses. Using a targeted resequencing approach, the coding exons of the respective candidate genes were analyzed in 50 paired diagnosis and remission AML samples. We identified 120 novel tumor-specific missense or nonsense mutations and 60 indels. In accordance with recent studies, these aberrations mostly affected histone-modifying enzymes, but also allowed us to identify novel mutations in nonclassic regulators of transcriptional elongation and splicing.

Methods

Patient samples

For NGS, diagnostic and matched remission samples were collected from 52 adult patients enrolled on German-Austrian AML Study Group (AMLSG) treatment protocols for younger adults (AMLSG-HD98A [NCT00146120] and AMLSG 07-04 [NCT00151242]) and comprised Ficoll gradient–purified mononuclear cells (with blast counts > 80% in all analyzed cases). Written informed consent was obtained from all patients in accordance with the Declaration of Helsinki, and the study was approved by the institutional review board of each participating center. For Sanger sequencing-based mutation screening, 120 additional AML samples from patients enrolled in the AMLSG-HD98A trial were used.

DNA capture array design, library preparation, targeted enrichment, and NGS

We reviewed the results of a previously published study of 157 cases of cytogenetically normal AML (CN-AML) profiled with Affymetrix GeneChip Human Mapping SNP Array 50K or 500K platforms,16 as well as 234 AML cases with complex karyotype (CK-AML) analyzed with either aCGH (n = 131), Affymetrix GeneChip Human Mapping SNP 250K (n = 61), or Genome-Wide Human SNP Array 6.0 (n = 42) platforms.17,18 Based on this analysis, we focused on genes located in minimally deleted/gained regions of recurrent aberrations and generated a list of candidate genes, available in supplemental Table 1 (see the Supplemental Materials link at the top of the article), which we used for designing a custom DNA capture array (NimbleGen; Roche) for targeted enrichment of all coding exons of the respective genes, corresponding to 2.16 Mb of sequence.

After library preparation and targeted enrichment of DNA as outlined in the supplemental Methods, NGS was performed with an Illumina Genome Analyzer IIx using 100-bp paired-end read technology with the aim of reaching an average coverage of 50x. Because of low coverage, 2 of the 52 initially sequenced samples had to be excluded from further analysis. Reads were processed as detailed in the supplemental Methods using the customized filtering pipeline depicted in Figure 1 and supplemental Figures 1 and 2.

Validation of SNVs and indels by Sanger sequencing

To confirm SNVs and indels, a selection of high-confidence alterations was analyzed by direct sequencing in paired tumor and remission samples. In brief, genomic regions of 300-500 bp were amplified by PCR using specific primers. After PCR purification (QIAGEN), DNA was directly sequenced using the ABI Ready Reaction Dye Terminator Cycle Sequencing Kit (Applied Biosystems) as previously published.19

Gene set analysis

We performed overrepresentation analysis on the list of genes carrying high-confidence SNVs or indels using Gene Ontology (GO) terms from the Biologic Process branch20 and an implementation of the hypergeometric test, which takes the hierarchical structure of GO terms into account.21 We calculated the deviation between observed and expected gene counts, which were generated on the basis of the 1000 genes present on the custom microarray. The overrepresentation of a specific GO term was considered significant at P < .01.
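The core of such an overrepresentation test can be sketched as a plain hypergeometric tail probability. This is a minimal illustration only: it ignores the GO hierarchy that the cited implementation takes into account, and the function name and the example counts below are hypothetical, not values from this study. Only the universe size of 1000 genes and the P < .01 threshold come from the text.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_term, n_hits, n_hits_in_term):
    """One-sided P(X >= n_hits_in_term) when n_hits genes are drawn without
    replacement from n_universe genes, n_term of which carry the GO term."""
    total = comb(n_universe, n_hits)
    upper = min(n_term, n_hits)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_hits - k)
        for k in range(n_hits_in_term, upper + 1)
    )
    return tail / total

# Hypothetical counts: 1000 genes on the capture array (the universe),
# 40 of them annotated with some GO term, 73 mutated genes, 10 in the term.
expected_overlap = 73 * 40 / 1000      # expected count under the null
p_value = hypergeom_enrichment_p(1000, 40, 73, 10)
is_significant = p_value < 0.01        # threshold stated in the text
```

Comparing `expected_overlap` with the observed overlap corresponds to the deviation between observed and expected gene counts described above.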

Copy number estimates based on sequencing coverage

For all targeted segments of the 1000 genes selected, basewise coverage was calculated and scaled to mean = 0 and unit variance. Segments were summarized by taking the median of all values of a segment. Coverage in the diagnosis samples was normalized by subtracting the coverage of the segments in the remission sample such that coverage of the diagnosis sample reflects somatic copy number changes.
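The normalization steps above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names and the toy coverage values are hypothetical, and the population standard deviation is assumed for the scaling because the text does not specify which estimator was used.

```python
from statistics import mean, median, pstdev

def scale(values):
    """Scale values to mean 0 and unit variance (z-scores).
    Assumes the coverage is not constant, otherwise the deviation is 0."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def segment_scores(basewise, segments):
    """basewise: per-base coverage across all targets of one sample.
    segments: (start, end) index pairs into basewise, end exclusive."""
    z = scale(basewise)
    return [median(z[start:end]) for start, end in segments]

def somatic_copy_signal(diagnosis, remission, segments):
    """Diagnosis segment score minus the matched remission segment score."""
    d = segment_scores(diagnosis, segments)
    r = segment_scores(remission, segments)
    return [dv - rv for dv, rv in zip(d, r)]

# Toy data: two 4-base segments; the second is gained at diagnosis only.
segments = [(0, 4), (4, 8)]
remission = [10, 12, 10, 12, 10, 12, 10, 12]
diagnosis = [10, 12, 10, 12, 20, 24, 20, 24]
signal = somatic_copy_signal(diagnosis, remission, segments)
```

Subtracting the remission scores removes coverage differences that stem from the sequence composition of individual targets, so that what remains reflects somatic changes.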

SNP microarray analysis and gene expression profiling

All cases analyzed by NGS were also investigated by SNP microarray profiling using 500K and SNP 6.0-microarray platforms (Affymetrix) to derive copy number estimates. Paired analysis of diagnosis and remission samples was carried out using dChipSNP or Genotyping console version 2.0 (Affymetrix) as previously described.18 For selected cases, gene expression was profiled using Affymetrix HG U133plus2.0 microarrays as previously reported,19 and data are available at Gene Expression Omnibus (accession no. GSE38987).

Results

Screening for somatic mutations in genomic regions recurrently altered in AML

In this study, we applied a cost-effective targeted resequencing approach to identify mutations in the coding sequence of 1000 genes located in recurrently altered genomic regions in AML, which are likely to harbor genes of relevance to leukemogenesis and were identified in our previous analyses (supplemental Table 1). To capture the spectrum of AML, we selected a representative AML sample cohort (n = 50) including 19 CN-AML cases with known mutations in routinely tested genes (CEBPA, NPM1, FLT3, or WT1), 9 CN-AML cases without any known mutations, 7 AML with complex karyotype, 14 core-binding factor (CBF) AML cases characterized by either t(8;21) or inv(16), and 1 case with MLL translocation t(11q23). For these cases, diagnosis (= “tumor”) and complete remission (= “germline”) samples were compared. On average, 28.6% of reads were uniquely mapped on target, and we obtained a 58.2-fold mean target coverage of diagnosis and remission samples. In addition, 86.1% of the target regions were covered by more than 10 reads, which was previously reported to be sufficient for variant calling (supplemental Table 2).6

After filtering out changes in intergenic or intronic regions and germline polymorphisms, we identified a total of 268 predicted somatic variants in all sequenced exons of the 50 AML patients (Figure 1A), on average 5.3 SNVs per library (Figure 1B). After selecting variants not present in dbSNP132, we observed on average 3.6 novel SNVs per library (in total n = 182 SNVs); while 1.2 of these SNVs either affected noncoding RNAs (ncRNAs), 5′- or 3′UTRs, or were synonymous, only 2.4 SNVs resulted in an amino acid change or a stop codon (in total n = 120, or 2.2 missense SNVs, 0.2 nonsense SNVs per library; Figure 1B, supplemental Tables 3-4). Based on this initial analysis, we selected 64 SNVs for validation by Sanger sequencing, which confirmed 42 (65.6%) SNVs as somatic, whereas 2 (3.1%) were germline mutations and 20 (31.3%) were false-positive calls. Based on those results, we refined our filtering criteria to obtain a list of candidates with a greater than 95% probability of being true-positive calls. Using these refined filter criteria, we found 60 nonsilent SNVs in 50 cases, of which 90.2% were predicted to be damaging for protein function (Figure 1, Table 1).

Figure 1

Overview of the data-filtering procedure for the identification of single nucleotide variations (SNVs) in paired tumor-remission AML samples (n = 50). (A) Overview and numbers of SNVs in the individual filtering steps. A detailed description of the filtering parameters is given in supplemental Methods. All detected SNVs and indels are listed in supplemental Table 4. (B) Mean number of SNVs per patient with certain properties. dbSNP132 indicates single nucleotide polymorphism database, build 132, NCBI; UTR, untranslated region; and ncRNA, noncoding RNA.

Table 1

Summary of high-confidence somatic SNVs and indels found in 50 AML patients

With regard to the detection of insertions and deletions (up to 12 bp in size), we detected a total of 135 indels in the 50 AML cases. Validation by Sanger sequencing and filtering out false-positive indel calls in homopolymer sequences revealed a total of 79 high-confidence indels affecting coding exons; of the 1.6 indels per sample, 0.2 indels were annotated as variations and 0.2 indels affected 5′- or 3′UTRs, thereby resulting in 1.2 indels in which the frameshift will impact the protein level (n = 60; orange fraction in Figure 1B, and supplemental Figure 2).

Recurrently altered genomic regions are enriched for somatic mutations associated with chromatin remodeling

To further assess the pathologic relevance of discovered genes, we performed a gene set overrepresentation analysis of the affected genes. The analysis revealed a specific and significant enrichment of functionally related sets of genes, with 5 of the 17 most significant gene sets being implicated in epigenetic regulation of transcription (P < .01; supplemental Table 5). Interestingly, the most significantly enriched functional categories included gene sets involved in chromatin modification, chromosome organization, and histone H4 acetylation, pointing to a general importance of epigenetic deregulation in AML.

Individual spectrum of identified somatic mutations

The high-confidence somatic SNVs and indels affected 73 different genes, mostly in a nonrecurrent manner except for not yet reported mutations in RAD21 and the known gene mutations commonly seen in AML (Table 1, supplemental Table 4). In accordance, the gene mutations known to be present in our analyzed leukemia samples, such as NPM1 mutations, were also detected by our NGS-based approach, thereby validating the quality of our data (note: FLT3-ITDs could not be detected because of the large size of the inserted duplications).

While we detected on average 3.6 somatic frameshift, missense, or nonsense mutations per case, the number of aberrations in individual cases ranged from 0 to 23 (Figure 2A). Notably, we observed at least 1 frameshift, missense, or nonsense mutation in 48 (96%) of 50 samples; only 2 samples showed no novel aberrations (nos. 26 and 40, apart from the routinely screened FLT3-ITD in no. 40), whereas 1 case (no. 46) was characterized by 23 aberrations.

Figure 2

Targeted resequencing results of 50 paired tumor-remission AML samples. (A) Distribution of numbers and categories of somatically acquired point mutations among the 50 cases. (B) Fraction of reads reporting mutated frameshift/missense/nonsense alleles from targeted resequencing data for each case. No frameshift/missense/nonsense mutations were found for cases 26 and 40. Mutations in recurrently mutated AML genes identified in this screen are shown as colored points, with nonrecurrent mutations as black points. CBF-AML indicates core-binding factor AML; CK-AML, complex karyotype AML; CN-AML, cytogenetically normal AML; FLT3*, FLT3 with internal tandem duplications (FLT3-ITD); dbSNP132, single nucleotide polymorphism database, build 132, NCBI; UTR, untranslated region; and ncRNA, noncoding RNA.

Genes affected by somatic mutations: identification of old and new targets

Recent studies have demonstrated the deregulation of the epigenetic code in cancer, and in agreement with this observation, we found several mutations in chromatin modifiers regulating transcription by setting or removing epigenetic marks. As expected, we detected mutations in TET2, reported to play a role in DNA demethylation, in 10% of the samples in our patient cohort, but in contrast to previous studies, we also observed a nonrecurrent damaging mutation in TET1 in 1 CBF-AML case.22 This suggests that, as in lung, breast, and prostate cancer, TET1 mutations might also be a rare event in AML.23 Similarly, we detected DNMT3A mutations,2,6 but also found a DNMT1 mutation in a patient with CK-AML, which might affect microsatellite stability and mismatch DNA repair.24 Furthermore, we detected mutations in the histone methyltransferase–coding gene NSD1, which has been found to be mutated at low frequency in AML.6 While the NSD1 SNVs (resulting in p.P530L and p.Y1971C) identified in our analysis are predicted to be damaging, they do not cluster in any specific domain of the protein and differ from those recently described.6 In addition, we found mutations in the histone methyltransferase genes MLL, MLL3, and EZH2, as well as in the histone demethylase KDM5A and factors involved in histone acetylation like TRRAP and EP300. Although in our cohort these genes were affected in a nonrecurrent manner, most of them were already reported mutated in cancer, including hematopoietic malignancies, thereby suggesting a potential pathogenic role.

New recurrent mutations in AML were found in RAD21, which in our sample cohort occurred at the same frequency as alterations in DNMT3A (Table 2). Because RAD21 was initially described as a component of the cohesin complex, which regulates chromosome cohesion and segregation in mitosis as well as double-strand break (DSB) DNA repair by tethering DNA molecules,25 mutations in RAD21 might contribute to genome instability. Notably, we also detected a mutation in the CTCF gene coding for a zinc finger protein, which similarly to cohesin mediates chromatin interactions, regulates gene expression, and has been shown to cooperate with RAD21.26 Moreover, it has recently been reported that in addition to the regulation of transcription via 3-dimensional (3D) modulation of chromatin structure and DNA looping, CTCF plays an important role in pre-mRNA processing and regulates splicing by initiating polymerase II (Pol II) pausing and spliceosome assembly.27 With regard to alterations affecting splicing, we also found a novel missense mutation in a component of the spliceosome SFPQ/PSF in a CN-AML patient with NPM1 mutation and FLT3-ITD (Table 1). SFPQ encodes an RNA and DNA-binding protein with multiple functions involved in processing of pre-mRNA and regulating transcription in complex with histone deacetylases (HDAC) and Sin3A.28,29 Interestingly, SFPQ has recently been identified in a mass spectrometry screen as part of the RAD21 interactome.30

Table 2

Recurrent somatic SNVs and indels found in 50 AML patients

As for other well-known genes implicated in AML, we found recurrent mutations in WT1 (12% of the samples), NRAS, and KRAS (in 10% and 6% of the samples, respectively), and KIT (in 30% of 13 CBF-AML samples) at incidences similar to those previously reported in AML.1 Furthermore, we observed mutations in the transcription factor-coding gene GATA2, which plays a critical role in hematopoietic differentiation and was recently reported to be mutated in 3.6% of acute monocytic leukemias.6 In line with previous findings, 2 of the GATA2 mutations were located in the zinc finger–containing DNA-binding domain most frequently affected in hematopoietic malignancies (Table 2). Other transcription factors with nonrecurrent damaging SNVs and indels included E2F1, RUNX1, SPI1/PU.1, and IKZF1 (Table 1). E2F1 deficiency impairs DSB DNA damage repair and contributes to genome instability, and we found an E2F1 missense mutation predicted to be damaging (p.I188L) in a CK-AML case. While RUNX1 mutations are common in AML (reported in 5.6% of cases),31 alterations in SPI1/PU.1 are considered to be a rare event because only 1 mutation was found in a previous analysis of 112 AMLs with normal karyotype.32 Nevertheless, the SPI1/PU.1 mutation causing a stop codon in a CN-AML patient who also exhibited a TET2 mutation might be functionally relevant. Mutations, deletions, and dominant-negative forms of IKZF1 were mainly reported in lymphoid malignancies, and thus far have not been found in AML.33 In our study, a missense IKZF1 mutation was found in a subclone, as it was only seen in 26% of the tumor reads of a GATA2 mutant CN-AML.

Co-occurrence of SNVs and CNVs

As sequencing coverage is proportional to copy number, we assessed copy number variations (CNVs) based on our NGS data to determine whether the genes carrying newly identified aberrations are affected by 2 genomic hits. To eliminate the variability in target coverage originating from the sequence composition of different exons, we normalized the coverage in tumors by subtracting the coverage of the matched remission samples. In parallel, we estimated copy number variations using SNP microarray platforms and compared the findings. Based on both analyses, we did not find any case in which the mutated gene was also affected by deletion of the unmutated copy, thus suggesting that haploinsufficiency or dominant-negative effects play a significant role, provided that the remaining copy is not silenced by epigenetic mechanisms.

In line with this observation, we generally did not observe the fraction of reads reporting the variant to be greater than ∼ 58%, except for CENPJ and RSBN1L with read frequencies of 68% and 83%, respectively (Figure 2B). While the high frequency of RSBN1L mutations was not linked to a CNV or acquired uniparental disomy (UPD), we found an acquired copy number neutral loss of heterozygosity, that is, a UPD, in the region of chromosome 13 encompassing CENPJ in the relapse sample, which might indicate the presence of a subclone with UPD encompassing CENPJ at diagnosis (supplemental Figure 3).

Furthermore, we could detect a gain on chromosome 8 to be associated with RAD21 mutations on 8q24.11 in 2 of 3 cases (Figure 3). In a t(8;21) CBF-AML case with trisomy 8, we observed 2 different frameshift insertions in exon 9 affecting 1 allele of RAD21, thereby suggesting that the unmutated chromosome was duplicated (Figure 3A-C). In contrast, in a CK-AML case, we found a heterozygous nonsense mutation resulting in a truncated RAD21 lacking 199 amino acids, which was located in the gained part of chromosome 8. Supported by 49% of reads, in this case the mutation most likely affected 2 copies of RAD21 given a ∼ 80% fraction of leukemic cells in the sample (Figure 3D-F). While one might speculate about a dominant-negative effect of the truncated RAD21 protein based on the first case, a dose-dependent impact of RAD21 mutations on the regulation of cellular processes such as gene expression provides a possible explanation for a selective advantage associated with duplication of the mutated allele in the latter case.
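The reasoning for the CK-AML case can be made explicit with a small worked calculation. It is a sketch under assumptions that are not stated in the text: the gained region is taken to be present in 3 copies in leukemic cells, nonleukemic cells are taken to be diploid, and reads are assumed to sample all alleles without bias. The function name is hypothetical; the 80% leukemic cell fraction and the 49% observed read fraction are the values given above.

```python
def expected_vaf(purity, mutated_copies, tumor_copies, normal_copies=2):
    """Expected fraction of reads carrying the variant, given the fraction
    of leukemic cells (purity) and the copy numbers in each cell type."""
    mutant_alleles = purity * mutated_copies
    total_alleles = purity * tumor_copies + (1 - purity) * normal_copies
    return mutant_alleles / total_alleles

purity = 0.80      # ~80% leukemic cells, as stated in the text
observed = 0.49    # fraction of reads supporting the nonsense mutation

# Assumed scenario: 3 copies of the region in leukemic cells.
vaf_one_copy = expected_vaf(purity, 1, 3)    # 0.8 / 2.8, roughly 0.29
vaf_two_copies = expected_vaf(purity, 2, 3)  # 1.6 / 2.8, roughly 0.57

# Which scenario lies closer to the observed read fraction?
best_fit = min((1, 2), key=lambda m: abs(expected_vaf(purity, m, 3) - observed))
```

Under these assumptions the observed 49% lies closer to the value expected for 2 mutated copies than to that for 1, which is in line with the interpretation that the mutated allele was duplicated.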

Figure 3

Sequencing coverage and SNP Array 6.0 analysis reveal copy number variations in region with RAD21 mutations. (A) DNA sequence chromatograms of tumor and remission samples show tumor-specific frameshift mutations in RAD21: insertions */+TTAG (chr8: 117866622) and */+TT (chr8: 117866620) affecting 1 RAD21 allele in a patient with t(8;21)–positive CBF-AML. (B) Analysis of the coverage variation across the genome (indicated on the horizontal axis) reveals a gain of chromosome 8 (chr 8). The coverage of the tumor sample was normalized by subtracting the coverage of the remission sample; each point corresponds to the scaled and normalized median read number in a 300-bp-wide target region. (C) SNP 6.0 copy number variation analysis confirms the diagnosis-specific gain of chr 8 in the patient. Top line (in blue) indicates plot of chr 8 from diagnosis sample; and bottom line (in green) corresponds to chr 8 in matched remission sample. (D) Somatic nonsense mutation in exon 9 of RAD21 (chr8: 117864815, c.C1294T:p.Q432X) confirmed by Sanger DNA sequencing of tumor and remission samples in a patient with CK-AML. (E) Coverage variation indicates a gain of the long arm of chr 8. The plot is generated as described in panel B. (F) 250K SNP array analysis reveals the diagnosis-specific gain on the long arm of chr 8.

Incidence of RAD21 mutations

To evaluate the incidence of RAD21 mutations, we screened the coding exons 2 to 14 of RAD21 for the presence of mutations in 120 AML patient samples. We found 6 novel SNVs in 6 patients, which included synonymous (n = 2), missense (n = 3), and nonsense (n = 1) mutations (supplemental Figure 4). Mutations were distributed over the entire gene and did not cluster in any particular exon. All missense SNVs were predicted to be damaging and have an impact on protein function.

In combination with the NGS findings, a total of 7 of 170 AML cases (4.1%) presented with RAD21 mutations affecting the protein level. Notably, the RAD21 SNVs were significantly more frequent in patients with RAS mutations (P = .022, Fisher test based on 4 RASmut/RAD21mut, 27 RASmut/RAD21wt, 3 RASwt/RAD21mut, and 136 RASwt/RAD21wt patients), and all CN-AML cases with RAD21 mutations harbored an NPM1 mutation (Table 3).
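The reported association can be checked from the four counts given above. The following sketch implements a two-sided Fisher exact test directly from the hypergeometric distribution; the function name is hypothetical, and the two-sided definition (summing all tables no more probable than the observed one) is an assumption, because the text does not state which variant of the test was used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7   # guards against floating-point ties
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * tolerance
    )

# Counts from the text: rows = RAS mutated / RAS wild type,
# columns = RAD21 mutated / RAD21 wild type.
p_value = fisher_exact_two_sided(4, 27, 3, 136)   # about 0.022
mutation_rate = 7 / 170                            # about 4.1%
```

With these counts the one-sided and two-sided variants give the same value, so the result matches the reported P = .022 either way.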

Table 3

SNVs and indels found in RAD21 in 170 AML patients

Discussion

In this study, we sequenced the coding exons of 1000 genes, which were defined by their location in recurrently altered genomic regions, in a cohort of 50 AML cases representative of all major cytogenetic AML subgroups. While the use of remission samples as a germline DNA surrogate carries the risk of missing acquired mutations that persist after therapy, we could identify 73 genes affected by somatic nonsilent mutations and small indels. Most of the genes were targeted in a nonrecurrent manner, thereby confirming the complexity and variation of events underlying malignant transformation and confirming that genomic regions recurrently altered in AML are also targeted by gene mutations. Notably, this subset of affected genes was significantly enriched in functional GO categories like chromatin modification, chromosome organization, and histone H4 acetylation, and in 40% of patients mutations affected at least 1 gene linked to epigenetic regulation of transcription. As in AML, a similar enrichment of mutations in genes involved in regulation of transcription, chromatin modification, and DNA methylation was found in other hematologic malignancies such as DLBCL12 and follicular lymphoma,11 as well as in many solid tumors.34,35

In addition to somatic mutations affecting “classic” epigenetic regulators, our study revealed mutations in so-called 3D-chromatin modifiers, which alter patterns of gene expression from DNA loci through regulation of DNA looping and modulation of chromatin architecture in the nucleus.36,37 For example, we observed a frameshift deletion in CTCF, encoding a DNA-binding protein and transcriptional regulator. Interestingly, a somatic nonsynonymous CTCF mutation was recently also identified in acute lymphoblastic leukemia (ALL).38 As a nuclear factor mediating intra- and interconnections of chromosomes and an insulator of transcription, CTCF has also been shown to promote RNA-Pol II pausing and to regulate mRNA splicing.27,39 The unexpected function of CTCF in splicing seems to have a direct impact on specific target molecules such as CD45, for which isoforms are strictly regulated in lymphocyte development, but CTCF also appears to have a global effect on splicing as determined by high-throughput RNA sequencing of CTCF-depleted Burkitt lymphoma–derived cell lines.27

Notably, CTCF often cooperates with RAD21, an evolutionarily conserved subunit of the cohesin complex,26,40 which now turns out to be also recurrently mutated in AML. To the best of our knowledge, this is the first report of RAD21 mutations in AML, which affect ∼ 4% of patients and seem not to be restricted to a specific cytogenetic AML subgroup, as we detected RAD21 mutations in CBF-AML, CK-AML, and CN-AML with NPM1 mutation. While 4 of the 7 cases had a RAS mutation, additional mutational screening is needed to elucidate a possible role of RAD21 mutations in RAS-promoted leukemogenesis and to further highlight the potential impact of gene dosage.

Besides an important role in DSB DNA repair, where RAD21 provides a linkage necessary for homologous recombination of damaged and intact chromatids, RAD21 alone or in complex with other proteins governs chromatin topology and mediates long-range chromosomal interactions in cis.41 Furthermore, it also directly regulates transcription of genes implicated in pluripotency, cell proliferation, and differentiation,37 and monoallelic loss of RAD21 has been associated with impaired bone marrow stem cell clonogenic regeneration, while homozygous deletion results in early embryonic death.42 In addition, in siRNA-based screens, RAD21 was recently also identified as a functional epigenetic silencing factor along with other genes like DNMT3A, and TRIM24, both of which are also mutated in AML.43

Recently, stalling of Pol II at RAD21/CTCF-binding sites was identified, whereas knockdown of RAD21 abolished this pausing of transcription.39 Exploration of the RAD21 interactome by yeast-2-hybrid screen and immunoprecipitation-coupled mass spectrometry identified several molecules involved in mRNA processing and splicing as potential interaction partners of RAD21, among them a splicing factor, SFPQ, which was also found mutated in this study.30

This is in accordance with recent reports that identified a growing number of mutations affecting genes associated with mRNA splicing in hematologic malignancies. For example, in patients with MDS, multiple components of the RNA splicing machinery were found mutated, especially genes of the early mRNA processing complex (E/A complex), which is involved in the earliest step of spliceosome formation.10 In MDS, mutations comprise the E/A-complex components U2AF35, SRSF2, ZRSR2, SF3B1, and less frequently SF3A1, PRPF40B, U2AF65, and SF1. Although these genes were not contained in our candidate gene list, we detected a nonsynonymous damaging mutation in SFPQ/PSF, coding for an RNA and DNA-binding protein that is a component of the spliceosome C complex with an essential role in catalytic step II of the splicing reaction. In cancer, SFPQ has been shown to be translocated to TFE3 and ABL in papillary renal cell carcinoma and in B-cell progenitor ALL, respectively.44,45 Thus, mutated SFPQ might play a significant role in deregulation of splicing in leukemic cells, especially as depletion of SFPQ also inhibits formation of the early spliceosome.28,46

Screening of AMLs derived from MDS revealed mutations of splicing-associated genes in ∼ 26% of the cases, whereas at the same time only ∼ 7% of de novo AML had mutations in 1 of the 8 splicing factors analyzed.10 In AML, we now found mutations in several genes coding for multifunctional proteins and noncanonical regulators of splicing, such as RAD21 and CTCF, as well as additional mutations in classic mRNA processing factors like SFPQ.

While 10% of the cases in our cohort harbored 1 of these mutations, our findings suggest that, in contrast to the mutations affecting the E/A complex in MDS, additional levels of splicing deregulation might be involved in the pathogenesis of AML and thus might play a greater role than initially thought. Furthermore, in line with the recent reports demonstrating the complexity of genomic aberrations in AML, we also observed a wide spectrum of different mutations affecting well-known as well as novel AML targets. Large comprehensive trials are warranted to determine the incidence and clinical impact of these individual mutations and might give further hints with regard to the most relevant aberrations. In addition, the delineation of all mutations that lead to deregulated splicing or epigenetic silencing will provide the basis for a better understanding of the complex nature of common leukemia phenotypes, which then could be directly targeted in the future.


Contribution: A.D. designed and performed research, analyzed and interpreted data, and wrote the manuscript; J.C.E. designed the research, developed analysis software, analyzed and interpreted data, and wrote the manuscript; M.S.-S., S.K.-S., T.F., J.K., M.W.M.K., P.P., S.K., S.W., and V.I.G. performed research; J.M. and F.G.R. performed research and analyzed and interpreted data; B.H. and R.F.S. analyzed and interpreted data; H.D. and K.D. designed research, contributed vital reagents, analyzed and interpreted data, and wrote the manuscript; C.L. designed the research, provided the essential computing environment, and wrote the manuscript; and L.B. designed the research, performed research, analyzed and interpreted data, and wrote the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Lars Bullinger, MD, Department of Internal Medicine III, University Hospital of Ulm, Albert-Einstein-Allee 23, 89081 Ulm, Germany; e-mail: lars.bullinger{at}


For excellent technical assistance, we thank Ulla Botzenhardt, Sabrina Skambraks, Michaela Rehl, Carmen Blersch, and Marianne Habdank, and we thank Daniela Späth from our study center for providing clinical information. For helpful suggestions and discussion, we thank Stefan Fröhling. Furthermore, we thank all the members of the German-Austrian AMLSG for their participation in the clinical trials and for providing leukemia samples.

This work was supported in part by Bundesministerium für Bildung und Forschung (BMBF) grants 01GS0871 (L.B., K.D., H.D.), 01GS0882 (C.L.), and TS-01GR0802-2 (M.S.-S.), and L.B. was supported by the Deutsche Forschungsgemeinschaft (Heisenberg-Stipendium BU 1339/3-1).


  • * A.D. and J.C.E. contributed equally to this work.

  • This article contains a data supplement.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted December 28, 2011.
  • Accepted September 2, 2012.

