Mutational and structural analysis of diffuse large B-cell lymphoma using whole-genome sequencing

Ryan D. Morin, Karen Mungall, Erin Pleasance, Andrew J. Mungall, Rodrigo Goya, Ryan D. Huff, David W. Scott, Jiarui Ding, Andrew Roth, Readman Chiu, Richard D. Corbett, Fong Chun Chan, Maria Mendez-Lago, Diane L. Trinh, Madison Bolger-Munro, Greg Taylor, Alireza Hadj Khodabakhshi, Susana Ben-Neriah, Julia Pon, Barbara Meissner, Bruce Woolcock, Noushin Farnoud, Sanja Rogic, Emilia L. Lim, Nathalie A. Johnson, Sohrab Shah, Steven Jones, Christian Steidl, Robert Holt, Inanc Birol, Richard Moore, Joseph M. Connors, Randy D. Gascoyne and Marco A. Marra

  • Figure 1

    Mutation spectra and significantly mutated genes. (A) The somatic point mutation spectrum observed genome-wide in each of the 40 cases. Overall, mutations affecting TA base pairs were more common than those affecting CG pairs, with TA>CG transitions the most common mutation observed on average. Some of the outliers, such as RG043, RG014, and RG111, harbored mutations in genes involved in DNA repair (supplemental Table 1; “Discussion”). (B) Mutated genes with significant evidence for positive selection (false discovery rate = 0.08) are ordered on the x-axis by their estimated selective pressure. The y-axis shows the adjusted P value, so highly significant genes, typically those with more observed mutations, lie toward the upper right. The size of each circle is proportional to the number of cases in the patient cohort in which a nonsilent or splice site SNV was identified. Significant genes identified in the larger patient cohort of our previous RNA-seq study are purple, and those identified in separate studies³⁻⁵ are blue. The remaining 41 genes, shown in pink, have not, to our knowledge, been identified by others as significant targets of point mutation in DLBCL. Crosshairs denote genes with secondary support for mutations from other studies or from the 13 DLBCL cell lines sequenced here (see supplemental Table 2 for details and references). Genes affected by splice site mutations included the known tumor suppressor genes MLL2, RB1, CREBBP, and TP53, as well as others with signatures indicative of inactivation, including DNAH5 and SGK1.
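A spectrum like the one in panel A groups each SNV into one of six strand-collapsed classes, so that A>G on one strand and T>C on the other both count as a TA>CG transition. The Python below is a minimal sketch of that bookkeeping, assuming each SNV is supplied as a (reference base, alternate base) pair on the reference strand. The function names and the toy input are hypothetical, and this is not the pipeline used in the study.

```python
from collections import Counter

# Watson-Crick complements, used to collapse a substitution onto one strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def substitution_class(ref: str, alt: str) -> str:
    """Name a single-base substitution by base pair, collapsed onto the
    pyrimidine (C or T) strand: A>G and T>C both become "TA>CG"."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: describe the opposite strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}{COMPLEMENT[ref]}>{alt}{COMPLEMENT[alt]}"


def spectrum(snvs):
    """Fraction of one case's SNVs that fall into each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


# Toy input for one hypothetical case (not data from this study).
case_snvs = [("A", "G"), ("T", "C"), ("C", "T"), ("G", "T")]
print(spectrum(case_snvs))  # {'CG>AT': 0.25, 'CG>TA': 0.25, 'TA>CG': 0.5}
```

Collapsing onto the pyrimidine strand is a common convention because the sequenced strand of a double-stranded mutation is arbitrary; it reduces twelve possible base changes to six classes.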

  • Figure 2

    Mutations affecting GNA13 in a large cohort of DLBCLs. Guided by the prevalence of GNA13 mutations in our DLBCL cohorts analyzed by RNA-seq and WGS, we sought to ascertain the full mutational landscape of this gene across a large number of de novo DLBCL cases (n = 279), of which 182 had been classified as GCB or non-GCB using immunohistochemistry.³⁰ (A) Nonsilent SNVs, indels, or splice site mutations were detected in a total of 40 patients (14.3%), with many cases harboring more than one mutation. As many as five nonsilent GNA13 mutations were observed in a single patient. Overall, multiple truncation-inducing mutations, including frameshift indels and premature stop codons, were observed. The ratio of transitions to transversions and the large number of mutations affecting the WRCY/RGYW motif are consistent with AID-mediated mutation; however, there was no observable enrichment of mutations at the 5ʹ end of the locus (supplemental Table 5). In agreement with our previous observation, GNA13 mutations were strongly enriched in GCB cases, with 29 of 89 GCB cases (32.6%) having at least one mutation in this gene and only 2 of 91 non-GCB cases (2.2%) mutated. (B) We mapped each of the mutations to the solved structure of Gα13 (PDB accession no. 3AB3) and observed some nonsynonymous mutations in close proximity to the catalytic site (C), including multiple residues that interact directly with the substrate (GTP). Taken together with the prevalence of truncating mutations, these observations lead us to predict that these mutations likely inhibit the signaling activity of Gα13.
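The AID argument rests on two simple tallies: the transition/transversion ratio and how many mutated bases sit in a WRCY or RGYW hotspot. In WRCY (W = A/T, R = A/G, Y = C/T) the targeted base is the C, and in its reverse complement RGYW it is the G. The sketch below assumes 0-based positions into a reference sequence string and (reference, alternate) base pairs. All function names are hypothetical, and the code is only an illustration of the motif logic, not the analysis behind supplemental Table 5.

```python
# IUPAC codes needed for the AID hotspot motifs: W = A/T, R = A/G, Y = C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG", "Y": "CT"}


def matches(motif: str, seq: str) -> bool:
    """True if seq is as long as motif and every base fits its IUPAC code."""
    return len(seq) == len(motif) and all(b in IUPAC[m] for m, b in zip(motif, seq))


def in_aid_hotspot(seq: str, pos: int) -> bool:
    """True if the reference base at 0-based `pos` is the targeted C of a WRCY
    motif or the targeted G of its reverse complement, RGYW."""
    seq = seq.upper()
    if seq[pos] == "C":
        start = pos - 2  # C is the 3rd letter of WRCY
        return start >= 0 and matches("WRCY", seq[start:start + 4])
    if seq[pos] == "G":
        start = pos - 1  # G is the 2nd letter of RGYW
        return start >= 0 and matches("RGYW", seq[start:start + 4])
    return False  # A/T positions are outside these two motifs


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine or pyrimidine<->pyrimidine change."""
    purines = {"A", "G"}
    return ref != alt and (ref in purines) == (alt in purines)


def ts_tv_ratio(snvs) -> float:
    """Transition/transversion ratio for a list of (ref, alt) pairs."""
    ts = sum(is_transition(r, a) for r, a in snvs)
    tv = len(snvs) - ts
    return ts / tv if tv else float("inf")


# "AGCT" satisfies both motifs: C at index 2 fits WRCY, G at index 1 fits RGYW.
print(in_aid_hotspot("AGCT", 2), in_aid_hotspot("AGCT", 1))  # True True
print(in_aid_hotspot("GGCG", 2))                              # False
print(ts_tv_ratio([("C", "T"), ("A", "G"), ("C", "A")]))     # 2.0
```

A slice that runs off the end of the sequence is shorter than four bases, so `matches` rejects it rather than raising an error.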

  • Figure 3

    Overview of detected rearrangements, CNAs, SNVs, SHM, and focal deletions. (A) Inner arcs represent somatic rearrangements from each of the patient genomes, with a different color depicting each case. Cumulative summaries of all somatic CNAs detected across all 96 cases are depicted in blue (deleted regions) and red (amplified regions). SHM targets identified from these genomes²² are indicated with blue circles whose diameter is proportional to the number of mutated cases. (B) Small deletions are often not detectable by copy number analysis methods. Our de novo assembly-based pipeline identified breakpoints representing small deletions (indicated by blue bars), some of which affected a single gene. Two cases had such deletions affecting ETV6. Of note, a fusion involving ETV6 and the immunoglobulin heavy chain locus was observed in a separate case. Deletions affecting other genes likely to be relevant to DLBCL are also shown. FHIT, shown here with a focal deletion, was also a common target of larger deletions by CNA analysis. S1PR2 was also a significant target of somatic point mutations and functionally cooperates with the proteins encoded by GNA13 and GNAI2 (“Discussion”). The 2 deletions affecting TP63 in a single case are also shown (see supplemental Figure 12); the upper 3 transcripts represent TA isoforms, whereas the lower 2 correspond to ΔN isoforms. UTX is a histone demethylase that acts on H3K27, the same lysine targeted by EZH2, which is a target of activating mutations in NHL. A recently described small-molecule inhibitor of EZH2 activity showed efficacy in DLBCL cell lines with UTX mutations.³¹
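Once assembled breakpoints define a small deletion's coordinates, deciding whether it "affected a single gene" comes down to interval overlap against a gene annotation. The sketch below assumes half-open, BED-style coordinates (start inclusive, end exclusive). The `Interval` class, function names, chromosome labels, and gene names are hypothetical placeholders, not real annotations or the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, as in BED files)
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_hit(deletions, genes):
    """Map each deletion's name to the sorted names of the genes it overlaps."""
    return {d.name: sorted(g.name for g in genes if overlaps(d, g)) for d in deletions}


def single_gene_deletions(deletions, genes):
    """Deletions that remove (part of) exactly one annotated gene."""
    return {d: hits[0] for d, hits in genes_hit(deletions, genes).items() if len(hits) == 1}


# Toy coordinates and placeholder gene names; not real annotations.
genes = [
    Interval("chrA", 100, 200, "GENE_1"),
    Interval("chrA", 300, 400, "GENE_2"),
    Interval("chrB", 100, 200, "GENE_3"),
]
deletions = [
    Interval("chrA", 150, 180, "del_1"),  # inside GENE_1 only
    Interval("chrA", 150, 350, "del_2"),  # spans GENE_1 and GENE_2
    Interval("chrA", 210, 290, "del_3"),  # intergenic
]
print(single_gene_deletions(deletions, genes))  # {'del_1': 'GENE_1'}
```

The nested scan is quadratic, which is fine for a handful of deletions; a genome-wide annotation would call for an interval tree or sorted sweep instead.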

  • Figure 4

    A likely chromothripsis event resulting in loss of the CDKN2A/B locus. Shown are regions of somatic copy number loss (blue) detected by HMMCopy analysis of a single case. Gray arcs represent rearrangement breakpoints and connections, as determined by contigs from whole-genome assembly of that case. (A) The rearranged region includes a series of deleted segments and encompasses many genes. The coordinated loss of genetic material and the focused rejoining of fragments in these discrete regions are indicative of a single mutational event followed by DNA repair within a single cell cycle, consistent with the chromothripsis model.³² (B) An expanded view of the boxed region from (A). One of the deleted segments encompasses both CDKN2A and CDKN2B, which are known targets of focal deletion in DLBCL⁶ and were also commonly deleted in this cohort.
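A commonly cited copy number signature of chromothripsis is a localized region that switches back and forth many times between a small number of copy number states, as in the alternating deleted and retained segments of panel A. The sketch below is a crude screen for that pattern only. It assumes segments are given as (start, end, copy number) tuples covering one region, and its thresholds (`min_switches`, `max_states`) are illustrative assumptions, not values from this study. It ignores the breakpoint-joining evidence from assembly that the caption also relies on.

```python
def state_switches(segments) -> int:
    """Number of adjacent segment pairs whose copy number differs."""
    ordered = sorted(segments)  # tuples sort by start coordinate first
    states = [cn for _, _, cn in ordered]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def looks_like_chromothripsis(segments, min_switches=10, max_states=2) -> bool:
    """Crude screen: many copy number switches confined to few states.
    Thresholds are hypothetical defaults chosen for illustration."""
    distinct = {cn for _, _, cn in segments}
    return len(distinct) <= max_states and state_switches(segments) >= min_switches


# Toy region: 12 contiguous 1-kb segments alternating between 2 copies and 1 copy.
oscillating = [(i * 1000, (i + 1) * 1000, 2 if i % 2 == 0 else 1) for i in range(12)]
print(state_switches(oscillating), looks_like_chromothripsis(oscillating))  # 11 True

# A single simple deletion has only 2 switches, so it is not flagged.
simple = [(0, 1000, 2), (1000, 5000, 1), (5000, 6000, 2)]
print(looks_like_chromothripsis(simple))  # False
```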

  • Figure 5

    Timing of chromosomal duplications in DLBCL evolution. The sequence data can be used to approximate the relative time at which individual large amplifications/gains occurred during the evolution of the tumor. Here, the timing estimate for amplifications detected in each genome is shown for 5 chromosomes commonly affected by such events. The genomic coordinates of amplifications are shown on the x-axis, with separate colors indicating events detected in different individuals. The y-axis shows the estimated time of each event, with events near the bottom arising earlier in tumor development and those near the top arising later. Only events whose timing could be estimated precisely (confidence interval width < 0.2) were included. Samples involving approximated REL amplifications are shown separately (supplemental Figure 17). Although they arose relatively late in some cases, gains of chromosome 18 were nonetheless among the earliest of all amplification events detected (supplemental Figure 18). The genes targeted by amplifications of 11q and 1q have not been conclusively identified. The position of ETS1 is indicated because it is also a significant target of somatic point mutations in our data, and thus a potential novel oncogene. The region of overlap among the 1q gains contains a small number of genes, including IKBKE, which encodes IκB kinase ε, a positive regulator of RELA.³⁸ Among the 96 DLBCL cases analyzed for CNAs, IKBKE expression was significantly higher in amplified cases (P = .00745, Wilcoxon rank-sum test).
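The caption does not spell out the timing model. One widely used approach, assumed here and not necessarily the one used in this study, counts point mutations inside a gained segment by how many copies carry them. Suppose a segment goes from 2 to 3 copies at relative time t (0 = start of somatic mutation accumulation, 1 = sampling) and mutations accrue at a constant rate μ per copy. Mutations on the gained allele before t end up on two copies, so n2 ≈ μt. Mutations on one copy comprise those on the other allele before t (μt) plus all post-gain mutations on three copies (3μ(1 − t)), so n1 + 2n2 ≈ 3μ. Solving gives t = 3n2 / (n1 + 2n2). The sketch assigns each mutation a multiplicity from its variant allele fraction (VAF), assuming known tumor purity and diploid normal contamination. All function names, the purity value, and the VAFs are hypothetical, and confidence intervals (for example, by bootstrapping mutations) are not shown.

```python
def expected_vaf(multiplicity: int, purity: float, tumor_cn: int = 3) -> float:
    """Expected VAF of a somatic mutation carried on `multiplicity` of
    `tumor_cn` tumor copies, diluted by diploid normal cells."""
    return multiplicity * purity / (tumor_cn * purity + 2 * (1 - purity))


def assign_multiplicity(vaf: float, purity: float, tumor_cn: int = 3) -> int:
    """Pick the multiplicity (1 or 2) whose expected VAF is closest to `vaf`."""
    return min((1, 2), key=lambda m: abs(vaf - expected_vaf(m, purity, tumor_cn)))


def gain_timing(n1: int, n2: int) -> float:
    """Relative time of a single-copy gain (2 -> 3 copies), assuming a constant
    per-copy mutation rate. n1: one-copy mutations; n2: two-copy mutations."""
    denom = n1 + 2 * n2
    if denom == 0:
        raise ValueError("no mutations in the gained segment")
    # Sampling noise can push the raw estimate above 1, so cap it there.
    return min(1.0, 3 * n2 / denom)


# Toy VAFs from a hypothetical 3-copy segment in a sample of 80% purity.
vafs = [0.20, 0.55, 0.25, 0.60, 0.22, 0.18]
mult = [assign_multiplicity(v, purity=0.8) for v in vafs]
n1, n2 = mult.count(1), mult.count(2)
print(n1, n2, round(gain_timing(n1, n2), 2))  # 4 2 0.75
```

Nearest-expected-VAF assignment is the simplest choice; with low coverage or low purity the two expected VAFs draw close together, which is one reason precise timing requires many mutations in the gained segment.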