Association of polygenic risk score with the risk of chronic lymphocytic leukemia and monoclonal B-cell lymphocytosis

Geffen Kleinstern, Nicola J. Camp, Lynn R. Goldin, Celine M. Vachon, Claire M. Vajdic, Silvia de Sanjose, J. Brice Weinberg, Yolanda Benavente, Delphine Casabonne, Mark Liebow, Alexandra Nieters, Henrik Hjalgrim, Mads Melbye, Bengt Glimelius, Hans-Olov Adami, Paolo Boffetta, Paul Brennan, Marc Maynadie, James McKay, Pier Luigi Cocco, Tait D. Shanafelt, Timothy G. Call, Aaron D. Norman, Curtis Hanson, Dennis Robinson, Kari G. Chaffee, Angela R. Brooks-Wilson, Alain Monnereau, Jacqueline Clavel, Martha Glenn, Karen Curtin, Lucia Conde, Paige M. Bracci, Lindsay M. Morton, Wendy Cozen, Richard K. Severson, Stephen J. Chanock, John J. Spinelli, James B. Johnston, Nathaniel Rothman, Christine F. Skibola, Jose F. Leis, Neil E. Kay, Karin E. Smedby, Sonja I. Berndt, James R. Cerhan, Neil Caporaso and Susan L. Slager

Key Points

  • PRS, based on the known CLL loci, predicts CLL risk with high discrimination.

  • This PRS predicts risk of monoclonal B-cell lymphocytosis, a precursor to CLL and a condition that has clinical impact beyond risk for CLL.


Inherited loci have been found to be associated with risk of chronic lymphocytic leukemia (CLL). A combined polygenic risk score (PRS) of representative single nucleotide polymorphisms (SNPs) from these loci may improve risk prediction over individual SNPs. Herein, we evaluated the association of a PRS with CLL risk and its precursor, monoclonal B-cell lymphocytosis (MBL). We assessed its validity and discriminative ability in an independent sample and evaluated effect modification and confounding by family history (FH) of hematological cancers. For discovery, we pooled genotype data on 41 representative SNPs from 1499 CLL cases and 2459 controls from the InterLymph Consortium. For validation, we used data from 1267 controls from Mayo Clinic and 201 CLL, 95 MBL, and 144 controls with a FH of CLL from the Genetic Epidemiology of CLL Consortium. We used odds ratios (ORs) to estimate disease associations with PRS and c-statistics to assess discriminatory accuracy. In InterLymph, the continuous PRS was strongly associated with CLL risk (OR, 2.49; P = 4.4 × 10⁻⁹⁴). We replicated these findings in the Genetic Epidemiology of CLL Consortium and Mayo controls (OR, 3.02; P = 7.8 × 10⁻³⁰) and observed high discrimination (c-statistic = 0.78). When jointly modeled with FH, PRS retained its significance, along with FH status. Finally, we found a highly significant association of the continuous PRS with MBL risk (OR, 2.81; P = 9.8 × 10⁻¹⁶). In conclusion, our validated PRS was strongly associated with CLL risk, adding information beyond FH. The PRS provides a means of identifying those individuals at greater risk for CLL as well as those at increased risk of MBL, a condition that has potential clinical impact beyond CLL.


Chronic lymphocytic leukemia (CLL) is a subtype of non-Hodgkin lymphoma (NHL) and is characterized by an absolute B-cell lymphocyte population >5 × 10⁹/L, with clonal cells having a characteristic immunophenotype.1 CLL has 1 of the strongest familial risks among cancer sites,2 with a sixfold to ninefold increased risk for first-degree relatives of CLL cases.3,4 Thus far, family studies have identified only a few rare variants that potentially explain CLL susceptibility in a handful of families.5-7 In contrast, genome-wide association studies (GWAS) have provided clear evidence that common inherited variants have a role in the etiology of CLL with >40 loci identified to date.8-15 Each of these loci contains common single nucleotide polymorphisms (SNPs) that are statistically associated with CLL risk. Based on their locations, these SNPs may act through B-cell development, apoptosis, and telomere length maintenance, but because their effect sizes are small (with odds ratios [ORs] <2.0 per variant allele), the individual SNPs are weak predictors of CLL risk. It has been shown in other cancers that combining representative SNPs from susceptibility loci into a single summary measure, known as a polygenic risk score (PRS), improves risk prediction.16,17

Monoclonal B-cell lymphocytosis (MBL) is an established precursor to CLL18 and has a similar immunophenotype to that of CLL, but patients have an absolute B-cell count <5.0 × 10⁹/L and no evidence of lymphadenopathy.1 MBL increases with age, affecting <0.5% of individuals younger than age 40, 5% of those age 40 to 60, and >10% of those older than age 60.19-22 Studies have also shown higher prevalence of MBL among individuals with 2 or more family members with CLL compared with the general population.23 Beyond aging and family history (FH) of CLL, little is known about factors that increase the risk of MBL. Two genetic epidemiology studies evaluated at most 10 CLL susceptibility loci with MBL risk and found evidence of association15,24; 1 epidemiological study suggested that lifetime exposure to several infectious agents may be associated with MBL risk.25

Herein we examine the association with CLL risk of a PRS that combines representative SNPs from all of the established CLL susceptibility loci. We evaluate whether the PRS association is modified by family history status or other potential CLL risk factors. We demonstrate the validity and discriminative ability of this risk score in an independent set of CLL cases and 2 sets of controls and, finally, we examine the association of the PRS with MBL risk.


Study populations


The International Lymphoma Epidemiology Consortium (InterLymph) is a scientific forum for epidemiology research in NHL, including the CLL subtype. Through this venue, we identified those contributing NHL case-control studies with incident CLL cases and controls. The individual-level exposure data from risk factor questionnaires were pooled and harmonized through the InterLymph Subtypes Project.26,27


The NHL GWAS was a large international initiative to identify genetic loci associated with specific NHL subtypes, including CLL.10,28-30 Contributing studies with CLL cases and controls were from 8 prospective cohort studies, 8 population-based case-control studies from InterLymph, 5 clinic- or hospital-based case-control studies from InterLymph, and 1 family-based study from the Genetic Epidemiology of CLL (GEC) Consortium, with a total of 2849 CLL cases and 7983 controls available (supplemental Table 1 available on the Blood Web site). Details about the GWAS data can be found elsewhere,10,13,14 but in brief, samples were genotyped using the Illumina OmniExpress, Affymetrix 6.0, or Illumina HumanCNV370-duo arrays. Each genotyping array underwent rigorous quality control as previously detailed.10,13,14 Within the contributing studies of the NHL GWAS, we used a subset of 1499 CLL cases and 2459 controls from 8 InterLymph case-control studies who also had harmonized risk factor questionnaire data (Table 1). Six of the 8 studies were population based and the remaining 2 were clinic/hospital-based case-control studies of incident CLL cases. All 1499 CLL cases and 2459 controls were non-Hispanic Caucasians.

Table 1.

Demographic characteristics for InterLymph case-control studies with exposure data by CLL/control status

GEC Consortium.

The GEC Consortium is an international collection of families from North America with 2 or more relatives with CLL.15,31 Families with 2 or more living members with CLL were ascertained from 9 institutions (Table 2). First-degree relatives of CLL cases were also recruited at 7 of the 9 institutions. Recruitment at each site occurred through hematology clinics, cancer registries, or the Internet. Flow cytometry for MBL screening in unaffected relatives was done on fresh or frozen blood samples as previously described.23 Only MBL with a CLL-like phenotype, defined by the presence of monoclonal B lymphocytes coexpressing CD19 and CD5 with weak or no expression of CD20,32 were included because this is the most common MBL phenotype and it has an immunophenotype similar to that of CLL while remaining clinically different from CLL. A total of 93% of the MBL were low-count MBL, defined by clonal B-cell counts <0.5 × 10⁹ cells/L. Independent GEC samples contributed to both the discovery (supplemental Table 1) and validation stages. For our validation set, peripheral blood DNA was genotyped on Illumina OmniExpress at the National Cancer Institute Cancer Genomics Research Laboratory. Genotypes were called using Illumina GenomeStudio software, and duplicates showed >99% concordance. Extensive quality control metrics were used, including removing monomorphic SNPs, SNPs with call rates <95%, or SNPs with extreme Hardy-Weinberg disequilibrium (P < 10⁻⁵). We also dropped samples with call rates <90%, with gender discordance, or with a monozygotic twin already genotyped. After exclusions, 1149 samples (98%) passed quality control. We then removed patients who had other B-cell lymphoproliferative disorders (eg, Hodgkin lymphoma, multiple myeloma) and those who were included in the prior CLL GWAS studies within InterLymph. From the remaining 1031 individuals, we selected subsets of unrelated individuals for our analyses.
Specifically, for validating the PRS in CLL cases and controls using GEC samples, we selected CLL cases (n = 135) and controls (n = 83) unrelated to each other. For evaluating our PRS in MBL cases and controls using GEC samples, we selected MBL (n = 95) and controls (n = 58) unrelated to each other. For our analyses using another independent set of Mayo Clinic controls (N = 1267; Table 2), we selected 201 unrelated CLL, 95 MBL, and 144 controls from the GEC Consortium. All GEC samples were non-Hispanic Caucasians.
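The genotype quality control thresholds described above can be illustrated with a minimal Python sketch. The function names, the genotype encoding (0, 1, or 2 copies of one allele, with None for a missing call), and the use of a 1-degree-of-freedom chi-square test for Hardy-Weinberg proportions are assumptions made for illustration; the source does not state which Hardy-Weinberg test was applied.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test of Hardy-Weinberg proportions.

    Assumption: a chi-square test stands in for whatever test the
    original pipeline used.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is undefined
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(genotypes, min_call_rate=0.95, hwe_p_threshold=1e-5):
    """Apply the SNP-level filters: call rate, monomorphism, HWE."""
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    allele_total = sum(called)
    if allele_total == 0 or allele_total == 2 * len(called):
        return False  # monomorphic
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_chi2_pvalue(*counts) >= hwe_p_threshold


def sample_passes_qc(genotypes, min_call_rate=0.90):
    """Apply the sample-level call-rate filter only (hypothetical helper)."""
    if not genotypes:
        return False
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes) >= min_call_rate
```

Sex-discordance and twin checks need pedigree and sex-chromosome data and are left out of this sketch.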

Table 2.

Demographic characteristics for the GEC study by CLL/MBL/controls status

Mayo Clinic controls.

As another set of controls to validate the PRS associations, we pooled genotype data from 1267 non-Hispanic Caucasian controls (unknown for MBL and FH status) from Mayo Clinic. These controls were selected from patients seen in the general medicine practice at Mayo Clinic Rochester for a prescheduled general medical examination.29,33 They were genotyped on the Illumina OmniExpress. We followed the same rigorous quality control metrics as in the NHL GWAS. These samples were not previously used in the GWAS that discovered the CLL susceptibility SNPs. The mean age at consent was 56 years (standard deviation = 15) and 48% were men.

Contributing studies were approved by local ethics review committees, and all participants provided written, informed consent.

Genetic effects: PRS

To compute a PRS, we first identified representative SNPs from each of the CLL susceptibility loci. Representative SNPs are those with the most significant P value in the locus based on our fine mapping efforts.14 During the fine mapping stage, we included all discovered SNPs to date8-15 and excluded previous representative SNPs from 3 CLL loci (3q28, 4q26, 6p25.2) because these loci did not replicate (all SNPs, P > 10⁻⁶) in the most recent and largest GWAS of CLL.14 Then, using these representative SNPs (N = 41), we computed a PRS for each individual. The PRS is a weighted sum of the number of risk alleles across the representative CLL SNPs, with the weights being the log of the OR reported for each SNP (supplemental Table 2):14

$$\mathrm{PRS}_j = \sum_{i=1}^{41} x_{ij}\,\log(\mathrm{OR}_i)$$

where $x_{ij}$ is the number of risk alleles carried by the jth individual at the ith SNP, and $\mathrm{OR}_i$ is the per-allele OR from the most recent and largest GWAS of CLL.14 The weights allow one to account for the effect size of each SNP on CLL risk. Using no weights means the SNPs are equally weighted. We show the distribution of the PRS by cases and controls in the combined study (supplemental Figures 1 and 3); for sensitivity analyses, we show the case and control PRS distributions by study (supplemental Figures 2 and 4).
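As a minimal sketch of the score just described, the following Python function sums each individual's risk-allele counts weighted by the log of the per-allele OR. The function name, the use of the natural logarithm, and the example ORs are illustrative assumptions; the actual weights are those in supplemental Table 2.

```python
import math


def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts, with log(OR) as weights.

    risk_allele_counts: number of risk alleles (0, 1, or 2) at each SNP
    odds_ratios: per-allele OR for each SNP, in the same order
    Assumption: natural log; the source says only "log of the OR".
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one OR is required per SNP")
    for count in risk_allele_counts:
        if not 0 <= count <= 2:
            raise ValueError("risk-allele counts must lie between 0 and 2")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )


# Hypothetical 3-SNP example (not the study's 41 SNPs or its ORs).
example_score = polygenic_risk_score([2, 1, 0], [1.5, 1.2, 1.1])
```

Setting every weight to 1 instead of log(OR) gives the unweighted count of risk alleles, which is the equally weighted alternative mentioned in the text.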


From the InterLymph Subtypes project, a positive FH was defined as a person self-reporting any hematological malignancy among first-degree relatives.26,27 Hematological malignancies were defined as any NHL, Hodgkin lymphoma, multiple myeloma, or leukemia. From InterLymph, 67% had FH data available (71% among controls, 60% among cases). From the GEC Consortium, all members were FH positive for CLL, as defined by the inclusion criteria. FH data were not available for the Mayo Clinic controls.

Environmental exposures

Using the exposure data from the InterLymph Consortium, we evaluated effect modification and confounding of the PRS association by exposures that have been identified to be associated with CLL risk.26 These factors were harmonized in the InterLymph Subtypes Project26,27 and included history of total and recreational sun exposure; history of ever living or working on a farm; history of any atopy including asthma, eczema, hay fever, and allergies (allergies to plants, animals, dust, insects, mold, and food); and adult height. These exposure data were not available in the GEC Consortium or Mayo Clinic controls.

Statistical analyses

Differences in the distribution of risk factor data between CLL and controls were assessed using 2-sided χ² tests or Student t test, where appropriate. Logistic regression was used to estimate OR and 95% confidence intervals (CIs) to assess the association of PRS with CLL or MBL risk and to evaluate effect modification of the PRS by FH or environmental exposures by including an interaction term in the model. Likelihood ratio tests comparing models with and without the interaction effects were used to test significance. We evaluated PRS as a continuous or categorical predictor with the PRS categorized by quintiles (for simplifying the interpretation) using the cutoff points based on the distribution of all controls (N = 7983) used in the NHL GWAS,13 our largest available control sample representing the general population. The middle quintile was used as the reference category, containing the most common PRS value observed in the general population. All regression models were adjusted for age, sex, study, and socioeconomic status (if available). To evaluate model discriminatory ability, we computed a c-statistic and 95% CI34 for the adjusted regression models. The c-statistic is equivalent to the area under the receiver operating characteristic curve and is the probability that the measure or predicted risk is higher for a case than for a control.35 A c-statistic = 0.5 is equivalent to chance, c-statistic >0.7 indicates good discrimination between cases and controls, c-statistic >0.8 indicates strong discrimination, and c-statistic = 1 indicates perfect discrimination.36 We also evaluated the concordance of the observed log OR with that of the published log OR using Pearson’s correlation coefficient. Two-sided P < .05 indicated statistical significance. Statistical analyses were performed using R 3.4.0 and SAS 9.4 (SAS Institute, Cary, NC) software programs.
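For orientation, the interaction model and likelihood ratio test described above can be written as follows. The notation is assumed for illustration and is not taken from the source: $\mathbf{z}$ collects the adjustment covariates (age, sex, study, and socioeconomic status where available), FH is coded as a binary indicator, and $\ell$ denotes a maximized log-likelihood.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(\text{case})
  &= \beta_0 + \beta_1\,\mathrm{PRS} + \beta_2\,\mathrm{FH}
     + \beta_3\,(\mathrm{PRS}\times\mathrm{FH})
     + \boldsymbol{\gamma}^{\top}\mathbf{z}, \\
\Lambda
  &= 2\,\bigl(\ell_{\text{with interaction}} - \ell_{\text{without interaction}}\bigr)
     \;\sim\; \chi^2_{1}
     \quad \text{asymptotically, under } H_0\colon \beta_3 = 0 .
\end{aligned}
```

In the model without the interaction term, $e^{\beta_1}$ is the OR per 1-unit increase in the continuous PRS. With a binary FH indicator and a continuous PRS, the test has 1 degree of freedom; a categorical PRS would add one interaction parameter per non-reference category.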


The median PRS in the subset of 8 case-control studies from InterLymph consisting of 1499 CLL cases and 2459 controls was 8.25 and 7.50, respectively (Figure 1A), which is similar to that of the overall 2849 CLL cases and 7983 controls from the NHL GWAS (supplemental Figure 1). For sensitivity analyses, we assessed the PRS distribution among NHL GWAS controls (supplemental Figure 2A) and cases (supplemental Figure 2B) by study. We observed consistent PRS distributions across the studies, especially for those with larger sample sizes (N > 50). Among all InterLymph CLL cases, almost one-half (49%) were in the upper PRS quintile (Q5), whereas only 5% were in the lowest quintile (Q1) (Table 3). The PRS was strongly associated with CLL risk (continuous PRS effect: OR, 2.49; P = 4.4 × 10⁻⁹⁴) with a 3.64-fold increased risk (CI, 2.94-4.51) for upper (Q5) vs middle (Q3) quintile, and a 1.65-fold increased risk (CI, 1.31-2.08) for Q4 vs Q3 quintile. We also observed a significant inverse association (OR, 0.36; CI, 0.26-0.48) for Q1 vs Q3 quintile and for Q2 vs Q3 quintile (OR, 0.73; CI, 0.56-0.94). The PRS had high discrimination accuracy (c-statistic = 0.79; CI, 0.78-0.80).
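The quintile assignment and the c-statistic reported above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the percentile interpolation is the default of `statistics.quantiles`, a score equal to a cutoff is placed in the lower quintile, and the c-statistic is computed directly on a single score rather than on predicted risks from the adjusted regression models used in the study.

```python
import statistics
from bisect import bisect_left


def quintile_cutoffs(control_scores):
    """Four cut points (20th, 40th, 60th, 80th percentiles) from controls."""
    return statistics.quantiles(control_scores, n=5)


def assign_quintile(score, cutoffs):
    """Return 1..5; a score equal to a cutoff goes to the lower quintile."""
    return bisect_left(cutoffs, score) + 1


def c_statistic(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as one-half, which makes the value equal to the area
    under the receiver operating characteristic curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))
```

Deriving the cutoffs from controls only, as the study does with the 7983 NHL GWAS controls, keeps the quintile boundaries tied to the general-population distribution rather than to the case mix of any one data set.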

Figure 1.

Polygenic risk score distribution by InterLymph CLL and controls and GEC CLL, MBL, controls, and Mayo Clinic controls. Histograms of polygenic risk scores (x-axis) and density (y-axis). (A) InterLymph CLL (dashed red line) and controls (solid purple line). (B) Mayo controls (purple), GEC controls (green), GEC MBL (blue), and GEC CLL (red). Vertical lines indicate the median for the corresponding polygenic risk score distribution.

Table 3.

PRS by case-control status and the association with CLL risk by FH status: InterLymph data

FH of any hematological malignancy in first-degree relatives (FH+) was also associated with CLL risk (OR, 2.04; CI, 1.53-2.73; supplemental Table 3). The model with FH alone showed good discrimination (c-statistic = 0.70; CI, 0.68-0.72), but its discrimination was significantly lower than that of the model with PRS alone (P < .001). We considered the definition of FH restricted to any leukemia and had similar results (results not shown). When we modeled FH and PRS together, they were both individually statistically significant (both P < .0001; supplemental Table 4), and the c-statistic increased to 0.80 (CI, 0.78-0.81). We next stratified the InterLymph cases and controls by FH status to evaluate heterogeneity of the PRS effect by FH. The median PRS in the FH+ CLL cases and controls was 8.43 and 7.55, respectively, and in the non-FH (FH−) CLL cases and controls, the median PRS was 8.23 and 7.49, respectively (supplemental Figure 3). Although we did not observe a significant interaction between FH and PRS (P = .21), we observed differences in effect size when considering a continuous PRS variable in the model: a 3.79-fold (CI, 2.44-5.87) increased risk among the FH+ group compared with a 2.46-fold (CI, 2.19-2.76) increased CLL risk among the FH− group (Table 3). We evaluated the effect of other CLL risk factors on the association of the PRS with CLL risk and found little to no evidence of confounding or effect modification; all interaction P values were >.10 (supplemental Table 3).

To validate the PRS association, we used an independent replication sample of unrelated CLL cases and controls from the GEC Consortium. As part of the eligibility requirement within GEC, all individuals have a FH of CLL. The median PRS was 8.47 and 7.96 in the cases and controls, respectively. The PRS distribution by recruitment site was consistent for controls, especially for recruitment sites with >15 individuals, whereas for CLL it was less stable across recruitment sites (supplemental Figure 4). We validated the PRS association with CLL risk (continuous PRS effect: OR, 2.44; CI, 1.65-3.62; P = 9.0 × 10⁻⁶) (Table 4), and the discrimination ability was comparable to that of the InterLymph data (Table 4; c-statistic = 0.80; CI, 0.74-0.85). Because of the limited number of unrelated controls from the GEC Consortium, we also compared the GEC CLL cases to a separate set of 1267 Mayo controls (Table 5). The distribution of the PRS among the Mayo controls was strikingly similar to that of the InterLymph controls and controls from the NHL GWAS (supplemental Figure 4), with a median PRS = 7.59. We again observed a strong association with PRS (continuous PRS effect: OR, 3.02; CI, 2.49-3.65; P = 7.8 × 10⁻³⁰) and good discrimination (c-statistic = 0.78; CI, 0.74-0.81). Finally, there was good concordance between the published log OR for the 41 CLL SNPs and the observed log OR from the analyses using the GEC CLL cases and Mayo controls (Pearson’s correlation ρ = 0.53, P = 3.6 × 10⁻⁴; supplemental Figure 5; supplemental Table 2).
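A reader can roughly cross-check a reported P value against its OR and 95% CI. The sketch below assumes the interval is a Wald interval on the log-odds scale, which the source does not state, so the result is only an approximation; the function name is hypothetical.

```python
import math


def wald_check(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-OR standard error, z score, and 2-sided P value.

    Assumption: the 95% CI is exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value


# GEC CLL cases vs GEC controls, continuous PRS (values from the text).
se, z, p_value = wald_check(2.44, 1.65, 3.62)
```

For these inputs the recovered P value is on the order of 10⁻⁵, in line with the reported value; small differences are expected because the published OR and CI are rounded.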

Table 4.

Association between PRS and GEC CLL/MBL risk compared with GEC controls

Table 5.

Association between PRS and GEC CLL/MBL/controls risk compared with Mayo Clinic controls

We next evaluated the association of the PRS with risk of MBL using the GEC Consortium data. The median PRS in the MBLs was 8.40 compared with that of 7.90 in the controls. The PRS distribution by recruitment site with N > 15 individuals was fairly consistent for MBL (supplemental Figure 4B). Similar to CLL risk, we found a significant association between PRS and risk of MBL (continuous PRS effect: OR, 2.30; CI, 1.44-3.67; P = .001) (Table 4). When we compared the MBL with the Mayo controls, we also observed a significant association (continuous PRS effect: OR, 2.81; CI, 2.18-3.61; P = 9.8 × 10⁻¹⁶) with a 4.34-fold (CI, 2.21-8.50) increased risk of MBL for Q5 vs Q3 quintiles (Table 5; Figure 1B).

We next compared the PRS distributions between the GEC controls and the Mayo controls. The GEC controls are from CLL families and therefore have an elevated risk of CLL because of their strong FH of CLL. Because of the positive FH status of the GEC controls, we hypothesized that the PRS distribution among the GEC controls would be higher than that of the Mayo controls (unknown for FH status). As hypothesized, we observed a higher median PRS in GEC controls (median = 8.09) compared with the Mayo controls (median = 7.59). This finding was significant (P = 3.0 × 10⁻⁷; Table 5).


This is the first study to identify and validate a highly significant association of the combined effects of 41 known common susceptibility loci with CLL risk. The PRS in our study had high discriminatory value with a c-statistic of 0.79 in the InterLymph data and 0.78 in the GEC validation data. These values are higher than the discrimination reported for PRSs of nonhematological cancers.16,37-39

Both FH status and PRS retained significance in a multivariate model and were robust to the inclusion of other environmental risk factors. The fact that FH is still significant suggests that more CLL loci may be identified to account for all of the observed familial risk. We also observed that modeling FH and PRS together (c-statistic = 0.80) did not improve discrimination beyond that of PRS alone (c-statistic = 0.79), but a significant improvement was observed over modeling FH alone (c-statistic = 0.70). When PRS was stratified by FH status, we observed stronger effects among the FH+ group compared with that of the FH− group, but a formal test of interaction was not statistically significant. This was a limitation of our study. The lack of a significant interaction may be due to sample size, the extent of missing FH data, or our definition of FH status, which was self-reported and not validated using medical records. As such, larger sample sizes or a more refined FH definition that is specific to FH of CLL may tease out the joint effects of FH and PRS. Alternatively, the effect of FH may diminish as more CLL susceptibility loci are identified and included in the PRS.

Our study also clearly demonstrated a strong genetic contribution to MBL risk. We may now consider age, FH of CLL, and CLL PRS as risk factors for MBL. Studies are showing that MBL is an important clinical phenotype,40 especially for those with high-count MBL (ie, those with an absolute clonal B-cell count >0.5 × 10⁹ cells/L). Specifically, in addition to increased risk of CLL, individuals with high-count MBL have a greater risk of hospitalizations from infections41 and may have greater risk of nonhematological cancers42 than controls. Although 93% of the MBL were low-count MBL, they are at greater risk of progression to CLL because of their strong FH of CLL. Moreover, although not all of these MBL from CLL families will progress to CLL, our PRS findings may also provide insight into identifying those individuals with MBL who may be at greater risk of progressing to CLL. Future studies will be needed to validate this hypothesis. However, we note that it is currently premature to apply a screening test using the PRS because there are no existing clinical guidelines (treatment or preventive treatment) for newly diagnosed MBL, especially for low-count MBL. Furthermore, the natural history of low-count MBL is understudied.

Our study also demonstrated an enrichment of common, low-risk inherited variants among members with a strong family history of CLL, suggesting that low-risk variants play a role in CLL families with 2 or more members with CLL. This is supported by our findings that the distribution of the PRS among GEC controls from CLL families is significantly different from that of the Mayo controls. We note that the Mayo controls had unknown FH status of CLL or hematological malignancies in general. Given that a positive FH status of CLL is rare, this limitation is unlikely to change the results of our findings. We also observed this effect when validating the PRS with GEC CLL cases and controls in that the effect size was attenuated when using GEC controls compared with Mayo controls.

In summary, we have shown and validated that a combined score of known CLL inherited common variants is a strong predictor of CLL risk with high discrimination, which lends support for the need to further evaluate its potential for clinical utility. Because the PRS does not account for all of the familial risk, more CLL loci may be identified. We further showed that the PRS also predicted risk of MBL, suggesting that the PRS provides a continuum of risk from controls to MBL to CLL. Future prospective research studies will be needed to evaluate whether those MBL with high PRS are more likely to progress to CLL than those MBL with low PRS. Furthermore, druggable targets or other effective means of preventing progression to MBL and to CLL will need to be developed before targeted surveillance using the PRS can be realized in clinical practice.


The Utah studies were supported by grants from the National Institutes of Health, National Cancer Institute (grant CA134674); partial support for data collection from the Utah Population Database (UPDB) and the Utah Cancer Registry (UCR); and partial support for all datasets within the UPDB from the Huntsman Cancer Institute (HCI) and the HCI Comprehensive Cancer Center Support grant (grant P30 CA42014); the UCR is supported in part by the National Institutes of Health, National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) Program (contract HHSN261201000026C), with additional support from the Utah State Department of Health and the University of Utah. The Genetic Epidemiology of CLL Consortium was supported by a grant from the National Institutes of Health, National Cancer Institute (118444). The British Columbia site was supported by grants from the Canadian Institutes for Health Research, Canadian Cancer Society, and Michael Smith Foundation for Health Research. The University of California San Francisco Molecular Epidemiology of Non-Hodgkin Lymphoma (UCSF2) studies were supported by grants from the National Institutes of Health, National Cancer Institute (grants CA1046282 and CA154643); the collection of cancer incidence data used in this study was supported by the California Department of Health Services as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the National Cancer Institute’s SEER Program (contract HHSN261201000140C) that was awarded to the Cancer Prevention Institute of California, the University of Southern California (contract HHSN261201000035C), and to the Public Health Institute (contract HHSN261201000034C); and from the Centers for Disease Control and Prevention’s National Program of Cancer Registries awarded to the Public Health Institute (agreement #1U58 DP000807-01).
The European multi-center case–control study (EpiLymph) was supported by grants from the European Commission (grants QLK4-CT-2000-00422 and FOOD-CT-2006-023103); the Spanish Ministry of Health CIBER de Epidemiología y Salud Pública (grants PI11/01810, PI14/01219, RCESP C03/09, RTICESP C03/10, and RTIC RD06/0020/0095); the Marató de TV3 Foundation (grant 051210); the Agència de Gestió d’Ajuts Universitaris i de Recerca – Generalitat de Catalunya (grant 2014SRG756), which had no role in the data collection, analysis, or interpretation of the results; the National Institutes of Health (contract NO1-CO-12400); the Compagnia di San Paolo (Programma Oncologia); the Federal Office for Radiation Protection (grants StSch4261 and StSch4420); the José Carreras Leukemia Foundation (grant DJCLS-R12/23); the German Federal Ministry for Education and Research (grant BMBF-01-EO-1303); the Health Research Board, Ireland, and Cancer Research Ireland; the Fondation de France; and the Association de Recherche Contre le Cancer. The Czech Republic site was supported by grants from MH CZ–DRO (Masaryk Memorial Cancer Institute grant 00209805) and the Regional Centre for Applied Molecular Oncology (grant CZ.1.05/2.1.00/03.0101). The Environmental and Genetic Risks Factors Study in Adult Lymphoma (ENGELA) was supported by grants from the Association pour la Recherche contre le Cancer, Institut National du Cancer, Fondation de France, Fondation contre la Leucémie, and Agence nationale de sécurité sanitaire de l’alimentation, de l’environnement et du travail. The Scandinavian Lymphoma Etiology Study (SCALE) was supported by grants from the Swedish Cancer Society (grant 2009/659); Stockholm County Council (grant 20110209); the Strategic Research Program in Epidemiology at Karolinska Institute; Swedish Cancer Society (grant 02 6661); National Institutes of Health, National Cancer Institute (grant 5R01 CA69669-02); and Plan Denmark.
The Mayo site was supported by grants from the National Institutes of Health, National Cancer Institute (grant CA97274); Specialized Programs of Research Excellence in Human Cancer (grant P50 CA97274); Molecular Epidemiology of Non-Hodgkin Lymphoma Survival (grant R01 CA129539); Henry J. Predolin Foundation (grant R01 CA92153); National Center for Advancing Translational Science (grant UL1 TR000135); and Mayo Clinic Cancer Center (grant P30 CA15083). The National Cancer Institute SEER study was supported by grants from the Intramural Research Program of the National Cancer Institute, National Institutes of Health, and Public Health Service (grants N01-PC-65064, N01-PC-67008, N01-PC-67009, N01-PC-67010, and N02-PC-71105). The New South Wales, Australia, site was supported by grants from the Australian National Health and Medical Research Council (grant ID990920), the Cancer Council NSW, and the University of Sydney Faculty of Medicine. This study was also supported by grants from the National Institutes of Health, National Cancer Institute (grant R25 CA92049; Mayo Cancer Genetic Epidemiology Training Program). The Mayo Clinic Center for Individualized Medicine provided the Mayo Clinic Biobank materials.

The ideas and opinions expressed herein are those of the authors, and endorsement by the State of California, the California Department of Health Services, the National Cancer Institute, or the Centers for Disease Control and Prevention or their contractors and subcontractors is not intended nor should be inferred.


Contribution: All authors wrote the manuscript; G.K., N.J.C., M.L., P.L.C., T.G.C., C.H., D.R., K.G.C., A.M., J.C., K.C., S.J.C., J.J.S., N.E.K., J.R.C., N.C., and S.L.S. analyzed and/or interpreted the data; N.J.C., C.M.V., S.d.S., J.B.W., Y.B., D.C., H.H., B.G., H.-O.A., P. Boffetta, M. Maynadie, J.M., P.L.C., A.D.N., D.R., K.G.C., A.M., M.G., K.C., R.K.S., J.J.S., C.F.S., J.F.L., J.R.C., N.C., and S.L.S. collected and assembled the data; and G.K., N.J.C., S.d.S., T.G.C., A.D.N., K.C., J.R.C., N.C., and S.L.S. conceived and designed the study.

Conflict-of-interest disclosure: M. Maynadie is a consultant to Janssen Pharmaceuticals and receives funding from Novartis and BMS. A.R.B.W. has stock in Xenon Pharmaceuticals. M.G. is the principal investigator on an Amgen institutional clinical trial. P.M.B. receives research funding from Navidea. J.F.L. receives research funding from ACERTA and Janssen Pharmaceuticals. The remaining authors declare no competing financial interests.

Correspondence: Susan L. Slager, Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905; e-mail: slager{at}


  • * G.K., N.J.C., and L.R.G. contributed equally to this work.

  • J.R.C., N.C., and S.L.S. jointly supervised the work.

  • The online version of this article contains a data supplement.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted November 3, 2017.
  • Accepted March 23, 2018.

