Blood Journal

Retroviral vector insertion sites associated with dominant hematopoietic clones mark “stemness” pathways

  1. Olga S. Kustikova1,2,
  2. Hartmut Geiger3,
  3. Zhixiong Li1,
  4. Martijn H. Brugman4,
  5. Stuart M. Chambers5,
  6. Chad A. Shaw6,
  7. Karin Pike-Overzet7,
  8. Dick de Ridder8,
  9. Frank J. T. Staal7,
  10. Gottfried von Keudell2,
  11. Kerstin Cornils2,
  12. Kalpana Jekumar Nattamai3,
  13. Ute Modlich1,
  14. Gerard Wagemaker4,
  15. Margaret A. Goodell5,6,
  16. Boris Fehse2, and
  17. Christopher Baum1,3
  1. Department of Experimental Hematology, Hannover Medical School, Germany;
  2. Bone Marrow Transplantation, University Medical Center Hamburg-Eppendorf, Hamburg, Germany;
  3. Division of Experimental Hematology, Cincinnati Children's Hospital Medical Center, OH;
  4. Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands;
  5. Center for Cell and Gene Therapy and Cell and Molecular Biology Program, Baylor College of Medicine, Houston, TX;
  6. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX;
  7. Department of Immunology, Erasmus Medical Center, Rotterdam, The Netherlands;
  8. Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands


Evidence from model organisms and clinical trials reveals that the random insertion of retrovirus-based vectors in the genome of long-term repopulating hematopoietic cells may increase self-renewal or initiate malignant transformation. Clonal dominance of nonmalignant cells is a particularly interesting phenotype as it may be caused by the dysregulation of genes that affect self-renewal and competitive fitness. We have accumulated 280 retrovirus vector insertion sites (RVISs) from murine long-term studies resulting in benign or malignant clonal dominance. Of these RVISs, 22.5% are located in or near (within 100 kb [kilobases] of) known proto-oncogenes, 49.6% in signaling genes, and 27.9% in other or unknown genes. The resulting insertional dominance database (IDDb) shows substantial overlaps with the transcriptome of hematopoietic stem/progenitor cells and the retrovirus-tagged cancer gene database (RTCGD). RVISs preferentially marked genes with high expression in hematopoietic stem/progenitor cells, and Gene Ontology revealed an overrepresentation of genes associated with cell-cycle control, apoptosis signaling, and transcriptional regulation, including major “stemness” pathways. The IDDb forms a powerful resource for the identification of genes that stimulate or transform hematopoietic stem/progenitor cells and is an important reference for vector biosafety studies in human gene therapy.


In analogy to their replication-competent ancestors,1,2 the semirandom insertion of replication-deficient retrovirus-based vectors may alter cell fate by up-regulating cellular proto-oncogenes or disrupting tumor suppressor genes.3-12 Such forms of insertional mutagenesis have always represented a safety concern in the development of human gene therapy, although initial studies did not reveal major consequences of random vector insertions.13 The advent of sensitive technologies to detect vector insertion sites in mixed samples,14-16 the completion of the murine and human genome projects,17 the design of improved animal models with long-term follow-up,3,18 and the increasing efficiency of retrovirus-mediated gene delivery in clinical trials9,19-22 have all contributed to a revised interpretation of vector-mediated insertional mutagenesis. Clonal imbalance triggered by vector insertion is thus expected to represent the rule rather than the exception.23-25

Preclinical models and clinical trials revealed that the semirandom insertion of retrovirus-based vectors in the genome of long-term repopulating hematopoietic cells may increase self-renewal and/or initiate malignant transformation.3-11 Increased self-renewal can be transitory, resulting in clonal succession such that a given dominant clone is replaced by others over time.4,9,11 It is likely, although not always formally shown, that replication stress as caused by extended culture of cells prior to transplantation,5 serial bone marrow transplantation (BMT) in myeloablated recipients,3 cytotoxic chemotherapy,10 or chronic infection9 may trigger the clonal dominance. Long-term observation is required to detect such clones, as the growth kinetics of insertional mutants may be relatively slow and multiple competitor cells are often cotransplanted or present in the host.3,4,10

If more than one proto-oncogene is up-regulated by random vector insertion,5 tumor-promoting sequences are encoded by the vector,7 or cells with pre-existing tumor-promoting lesions are transduced,26 clonal leukemias, lymphomas, or sarcomas may result as a consequence of random vector insertion, as previously observed in studies with replication-competent retroviruses (RCRs) such as murine leukemia virus (MLV).1,2,12 In contrast, clonal dominance was not detected following retroviral vector-mediated gene transfer in transplanted T cells, although a fifth of the retroviral vector insertion sites (RVISs) affected the expression of neighboring genes.27 This supports the conclusion that clonal selection requires a triad consisting of dysregulated expression of genes that regulate cell fitness, a cell type with extensive self-renewal potential, and a milieu with a selection pressure for the fittest mutants.

Our work has focused on a relatively simple serial BMT model in C57BL/6 mice. The “normal” genetic background of this strain, the relatively low incidence of host-derived tumors (< 3% under our experimental conditions), and the availability of an allelic variant in the CD45 panleukocyte antigen in a congenic strain (B6 CD45.1) to distinguish donor and host cells render this model particularly attractive both for gene discovery by, and for preclinical safety studies of, retroviral gene transfer into hematopoietic cells.

In the present report, we summarize data from several laboratories that used this model to develop a database of RVISs detected in dominant clones contributing to phenotypically intact, mildly dysplastic, and overtly malignant hematopoiesis. We describe the validation of our experimental conditions to detect genetic lesions underlying clonal dominance, and several important genetic and biological insights obtained from the newly established insertional dominance database (IDDb). These analyses underline the validity of our approach to discover genes that regulate fitness and potentially transform self-renewing cells in vivo, promoting a systematic extension for both gene discovery and vector biosafety studies in the context of different cell types and selection conditions.

Materials and methods

Transplantation conditions and analysis of healthy and leukemic hematopoiesis in mice

All BMT studies were performed in C57BL/6 mice. In brief, donor bone marrow cells were cultured ex vivo to stimulate gene transfer using vectors based on MLV, and cells were transplanted into lethally irradiated recipients aged 12 to 16 weeks. Mice were kept in the animal facilities of the participating institutions, according to local animal experimentation guidelines. Food and water were supplied ad libitum. Table 1 summarizes the transplantation conditions and vectors used (for further details, see Document S1, available on the Blood website; see the Supplemental Materials link at the top of the online article). Mice were humanely killed when symptomatic (leukemic) or after 2.5 to 7 months in the healthy cases and examined for pathologic abnormalities, including histologic, morphologic (blood smears and cytospins of bone marrow and spleen), and flow cytometry analyses.5 Animal experiments were approved by the institutional animal research review boards of the principal investigators listed in Table 1.

Table 1

Overview of murine bone marrow transplantation (BMT) experiments

Cell culture

K562 cells were cultivated and transduced as described.28

Ligation-mediated polymerase chain reaction

Ligation-mediated polymerase chain reaction (LMPCR) was performed as described.4,5,15

Insertion site analysis

Fragments containing retroviral genomic junctions were submitted to further analysis using the following resources: BLAST29 or, in some cases, Ensembl30 for sequence searches; the mouse Retrovirus Tagged Cancer Gene Database (RTCGD)31; and/or the stem cell database (SCDb).32 Gene Ontology (GO) describes genes' biological roles and is arranged in a quasi-hierarchical structure from more general terms to more specific. To determine abundance for each GO category, the frequency of retroviral inserts was calculated and compared with the expected frequency observed by chance, as described.33 GO analysis was confirmed by the Expression Analysis Systematic Explorer (EASE).34

Expression arrays

Mouse bone marrow cells were depleted of lineage-committed cells (CD5, CD45R [B220], CD11b, anti–Gr-1, 7-4, and Ter-119; Lineage depletion kit; Miltenyi Biotec, Bergisch-Gladbach, Germany) using AutoMACS (magnetic cell sorter) (Miltenyi Biotec) in 2 independent experiments. The lineage-depleted cells were selected for CD117+ cells (c-kit selection kit; Miltenyi Biotec). Lineage−/c-Kit+/Sca-1+ (LSK) cells were selected on a fluorescence-activated cell sorting (FACS) DiVa (BD Biosciences, San Jose, CA). Purity for both experiments was greater than 96%. RNA was isolated (Qiashredder and RNeasy; QIAGEN, Hilden, Germany) directly after sorting (day 0) or after maintaining the cells in serum-free medium supplemented with mSCF, mTPO, and Flt3L for 2 days. Quality was assessed using an Agilent 2100 BioAnalyzer (Agilent Technologies, Palo Alto, CA). Total RNA (100 ng) from LSK cells was used in the GeneChip Eukaryotic Small Sample Target Labeling Assay Version II (Affymetrix, Santa Clara, CA) to generate biotinylated cRNA. cRNA (11 μg) was fragmented for 35 minutes at 95°C. Fragmented cRNA (10 μg) was then hybridized to mouse 430 2.0 microarray (Affymetrix) for 16 hours at 45°C followed by washing, staining, and scanning at 570 nm, according to standard methods.35 The expression data were normalized as described.36,37 For each gene, the highest expression was determined. For some, only the most highly expressed probe set was used. To determine the association of vector insertion with gene expression, a Cochran-Armitage test for trend was performed.38
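The Cochran-Armitage test for trend named above can be illustrated with a minimal sketch. It assumes that genes are grouped into ordered expression bins, that each bin has a count of genes carrying an insertion, and that the bins are scored 1, 2, 3, and so on; the function name, the scoring, and the use of the large-sample normal approximation (variance without the N/(N-1) correction) are assumptions made for illustration and are not taken from the cited method.

```python
import math

def cochran_armitage_trend(hits, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions.

    hits[i]   : number of genes with an insertion in ordered bin i
    totals[i] : number of genes in ordered bin i
    scores[i] : ordinal score of bin i (assumed 1..k if not given)
    Returns (z, two_sided_p) using the normal approximation.
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))
    n_total = sum(totals)
    p_bar = sum(hits) / n_total  # pooled proportion of hit genes

    # Score-weighted deviation of observed hits from the pooled expectation.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, hits, totals))

    # Variance of the statistic under the null hypothesis of no trend.
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Toy counts, invented for illustration only (not data from the study).
z, p = cochran_armitage_trend(hits=[1, 5, 10], totals=[100, 100, 100])
```

A positive z indicates that the proportion of hit genes rises with the bin score; equal proportions in all bins give z = 0.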

Pathway analysis

Gene symbols were entered into NetAffx, and the corresponding Affymetrix IDs for the mouse 430 2.0 arrays were retrieved. The resulting Affymetrix IDs were entered in the Ingenuity Pathway Analysis tool to generate direct and indirect pathways. For each dataset, the 10 functions and diseases with the most genes assigned to them are displayed.


Experimental setup

The RVISs described in this study are derived from murine experiments (mostly C57BL/6), using several replication-deficient MLV-based vectors for gene transfer into ex vivo–cultured hematopoietic cells. The vectors used include a group encoding fluorescent proteins (EGFP, DsRed), a group encoding transmembrane proteins that serve as selection markers (dLNGFR, human tCD34 and flCD34, murine tCD34 and flCD34, MDR1), and a vector expressing a gene associated with DNA repair (XRCC4). As a positive control for a transforming vector expressing a strong oncogene, the large T antigen (TAg) from simian virus 40 (SV40) was used (Figure 1; Table 2). TAg transforms cells by sequestering 2 tumor suppressor genes, Rb and p53.39 The transforming potential of the TAg vector was initially evaluated in 32D cells, revealing insertion sites with potential contribution to transformation (Z.L., unpublished data, January 2006). Four RVISs from these studies were also included in the IDDb (1.4% of the database).

Figure 1

Experimental setup of murine BMT studies using donor cells modified with different retroviral vectors. The enhancer-promoter contained in the long terminal repeat (LTR), the cDNA encoded by the vector, and the 3′ untranslated region (3′ UTR) are indicated in Table 2. LD indicates low dose of retroviral vector; HD, high dose; and exp, expansion in vivo.

Table 2

Modules of retroviral vectors used in this study

If the vectors do not encode oncogenic sequences, RVISs present in dominant clones may mark events that initiate increased self-renewal.4 Importantly, we noted transcriptional dysregulation of the mutated alleles in all cases tested so far.4 If the vectors encode oncogenic sequences such as TAg, the insertional events may either collaborate with the encoded oncogene to initiate tumor formation or promote the expansion of dominant malignant clones whose initial transformation is primarily dependent on the vector-encoded oncogene.7 Mice were prospectively examined for several months; in a subset of the studies, serial BMT was performed to increase replicative stress and observation time (Figure 1; Table 1).

Validation of LMPCR

Different methods have been described to recover insertion sites from retrovirally transduced cells.14-16,27,40,41 To identify insertion sites of dominant clones, it was crucial to disregard insertion sites present in minor clones. Ligation-mediated PCR (LMPCR) as opposed to the much more sensitive “linear amplification-mediated PCR” (LAMPCR) has previously been shown to lack the sensitivity to detect all insertion sites present in highly polyclonal samples.16 However, we noted that the bands obtained by LMPCR correlated well with Southern blot results obtained in clonal samples, and recovery of RVISs was in the range of 80% when using a single restriction enzyme.4,5,42 We thus decided to select dominant bands that are isolated from analytical gels for direct sequencing, ignoring weak bands that might reflect insertion sites present in minor clones.

We validated this approach by examining DNA from K562 clones that contained a known number of retroviral vector insertions and DNA from a K562 mass culture obtained after transduction with a high MOI of a marking vector.28 Although LMPCR reproducibly showed “dominant bands” of molecular weights ranging from 100 to 800 base pairs (bp) in clonal samples, polyclonal DNA yielded a smear of multiple minor bands (Figure 2A). To examine the minimal proportion of clonal DNA required for detection of dominant bands, we mixed DNA from a clone with 6 insertions (validated by Southern blot, not shown) with DNA from a polyclonal retrovirally transduced mass culture. If the clonal DNA constituted greater than 70% of the sample, LMPCR reproducibly revealed its insertion sites as dominant bands, whereas minor PCR products progressively disappeared. Major PCR products were recovered largely irrespective of their size (Figure 2B).

Figure 2

LMPCR validation. (A) DNA of K562 mass cultures and cell clones containing different numbers of retroviral insertions28 was subjected to insertion site amplification by LMPCR using the conditions described in “Materials and methods.” In contrast to the clonal DNA, mass culture DNA does not reveal dominant bands except when cells were propagated for several weeks, revealing a clonal imbalance. (B) Mixing mass culture DNA with increasing amounts of DNA from clone 2.4 reveals that LMPCR recovers dominant bands if these contribute greater than 70% of the population.

Direct sequencing of the PCR product confirmed the presence of RVISs (data not shown). We typically performed 2 LMPCRs to confirm reproducibility.

Composition and content of the IDDb

The sequence data obtained by LMPCR were blasted against the mouse genome to identify genes potentially affected by the insertion site. We also examined whether the hit loci were contained in the RTCGD,2 and listed the experimental conditions as these may affect selection (vector, transplantation conditions, and potential development of malignancy; Table S1). In total, we identified 276 RVISs from a total of 120 C57BL/6 mice (receiving retrovirally engineered bone marrow cells), and 4 RVISs from 2 C3H/HeJ mice (developing leukemia after receiving 32D cells transduced with a TAg vector). On average, we thus retrieved 2.3 insertions per animal, reflecting the low number of dominant clones. Only 16.4% of these mice presented with leukemia, manifesting with a latency of 5 to 10 months after gene transfer.

Overall, 22.5% of the RVISs contained in the IDDb are located in or near to known proto-oncogenes as defined by the RTCGD2 and additional literature, 49.6% in genes encoding proteins involved in various processes of cell signaling, 20% in other (often metabolic), and 7.9% in unknown genes. When bone marrow cells were transplanted to secondary recipients, the proportion of insertions in proto-oncogenes increased from 15% (primary recipients) to 24% (secondary recipients) in mice with normal hematopoiesis (Figure 3A). Thus, the IDDb perfectly reproduced the findings of our previous study performed in mice that showed no signs of hematopoietic malignancies.4 Considering that proto-oncogenes represent 1.06% (n = 231) of the murine genome (Entrez Gene, May 11, 2006), this is a gross overrepresentation. For comparison, 37% of the RVISs recovered from leukemias were in or close to proto-oncogenes (Figure 3A), strongly suggesting that the RVISs were causally involved in promoting a competitive advantage and inducing transformation.5
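The size of this overrepresentation can be made concrete with a rough back-of-the-envelope sketch. It uses only the figures quoted above (280 RVISs, 22.5% in or near proto-oncogenes, 1.06% of genes being proto-oncogenes) and assumes, purely for illustration, that under the null hypothesis each insertion independently marks a proto-oncogene with probability 0.0106. That null model ignores the 100-kb proximity window and the known preference of MLV for regions around the TSS, so the resulting number is indicative only and is not the statistic used in the study.

```python
import math

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term built in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1 - p)
        )
        total += math.exp(log_term)
    return total

n_rvis = 280                  # RVISs in the IDDb (from the text)
observed_fraction = 0.225     # in or near proto-oncogenes (from the text)
background_fraction = 0.0106  # proto-oncogenes among murine genes (from the text)

observed_hits = round(n_rvis * observed_fraction)          # 63 insertions
fold_enrichment = observed_fraction / background_fraction  # roughly 21-fold
# Assumed null model: independent insertions, uniform over genes.
p_value = binomial_upper_tail(observed_hits, n_rvis, background_fraction)
```

Under this simplified model about 3 of 280 insertions would be expected in proto-oncogenes by chance, against 63 observed.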

Figure 3

Retroviral vector insertion site (RVIS) distribution according to gene classes and type of transgene. (A) RVISs in known proto-oncogenes (POGs) increase in frequency over serial BMT and are most pronounced in leukemic clones. (B) No major impact of the transgene class was found except when the vector encoded a potent oncogene (TAg), which increased the probability to select for RVIS in POGs. SIGs indicates signaling genes; OGs, other genes.

We next asked whether the different transgenes encoded by the vectors affected clonal selection. We subdivided the hit genes into 4 groups: proto-oncogenes as defined by the RTCGD2,31 and additional literature, signaling genes, other genes, and unknown genes. EGFP and DsRed encode fluorescent proteins which are not known to cause significant changes of signaling networks. Twenty-five percent (n = 70) of the hits were recovered using these vectors. In this subgroup, the distribution of hits in the 4 gene groups was almost identical to that obtained within the set of transgenes that encode surface marker proteins for which an effect on cellular signaling cannot be ruled out (MDR1, dLNGFR, human tCD34, human flCD34, murine tCD34, murine flCD34, XRCC4) (Figure 3B). In contrast, a control group in which the transgene encoded the potent oncoprotein TAg of SV40 showed a distinctively higher proportion of RVISs in proto-oncogenes (41% versus ∼ 20% with other vectors). This supports the conclusion that RVISs recovered from tumors induced by replication-deficient vectors encoding oncogenes contribute to clonal selection.7

Interestingly, RVISs of the oncogenic TAg vectors overlapped with those observed in healthy retrovirally marked hematopoiesis exhibiting clonal dominance. Four of the 13 proto-oncogene hits observed in tumors induced by TAg vector insertion were also observed in dominant clones transduced by other vectors (Sema4b, AB041803, BC013781, Fli1), and 3 additional proto-oncogene hits selected in TAg vector-transduced tumors occurred in gene families that were also marked using other vectors (eg, BclX was hit by the TAg vector and Mcl1 by the dLNGFR vector; growth factor receptors Axl and Csf1r were hit by TAg vectors and Csf3r by the DsRed vector).

In further support of selection for clonal dominance largely irrespective of the type of transgene encoded, 4.6% (n = 13) of RVISs affect the Mds/Evi1 locus which encodes a transcription factor expressed in primitive hematopoietic cells.43 Rearrangement and ectopic expression of this allele contribute to human and murine leukemia.43 Evi1 represents the third most frequent insertion site listed in the RTCGD.2,31 Sixteen percent of the other RVISs found in the IDDb are common RVISs (CRVISs); that is, independent insertion sites recovered from different cell clones but affecting the same gene. Because CRVISs are a strong indication of selection for an important biological function,2 it is interesting that only 52% of the IDDb-CRVISs represent known proto-oncogenes (Table 3). Summarizing all RVISs in known proto-oncogenes, those forming novel CRVISs in our database and those occurring in genes with an established role in stem cell self-renewal and hematopoiesis, a group of at least 48 genes encoding growth factors, signal transducers, and transcription factors can be extracted which represent interesting candidates for future functional studies (Table 3). Interestingly, 81% of these RVISs were found in secondary and/or leukemic transplant recipients.
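The definition of a CRVIS given above (independent insertion sites from different clones affecting the same gene) translates into a simple grouping step, sketched below. The record layout, the clone identifiers, and the function name are hypothetical and chosen only for illustration; the gene symbols are borrowed from the text, but the toy records are invented and do not reproduce Table 3 or Table S1.

```python
from collections import defaultdict

def find_common_insertion_sites(insertions, min_clones=2):
    """Return genes hit in at least `min_clones` independent clones.

    insertions: iterable of (clone_id, gene_symbol) pairs.
    Repeated recovery of the same gene from the same clone counts once.
    """
    clones_by_gene = defaultdict(set)
    for clone_id, gene in insertions:
        clones_by_gene[gene].add(clone_id)
    return {
        gene: len(clones)
        for gene, clones in clones_by_gene.items()
        if len(clones) >= min_clones
    }

# Invented toy records: (hypothetical clone ID, gene symbol).
toy_insertions = [
    ("clone_A", "Evi1"),
    ("clone_B", "Evi1"),
    ("clone_C", "Evi1"),
    ("clone_A", "Fli1"),
    ("clone_D", "Fli1"),
    ("clone_D", "Fli1"),  # same clone recovered twice, counted once
    ("clone_E", "Mcl1"),
]
crvis = find_common_insertion_sites(toy_insertions)
```

Counting clones rather than raw sequence reads keeps a single dominant clone, recovered in several LMPCR reactions, from being mistaken for independent selection events.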

Table 3

Insertions in known proto-oncogenes, genes with an established role in hematopoiesis, and common insertion sites in the insertional dominance database (IDDb)

Insertion site distribution in relation to the transcription start site

To further address the potential selective pressure present on the mutated alleles, we analyzed the distribution of the RVISs with respect to the transcriptional start site (TSS) of the next neighboring gene. In unselected freshly transduced cells, MLV vectors have a preference for insertions in the 10-kb (kilobase) window around the TSS,40 with a peak in the ± 1-kb window,41 whereas HIV and derived vectors tend to prefer actively transcribed sequences, in particular beyond +2 kb downstream of the TSS.40,41,44 The reference data obtained in previous studies (kindly provided by D. Russell and G. Trobridge; Trobridge et al41) are shown in Figure 4A (lines), in comparison with our database (columns). For this comparison, we divided the IDDb hits into 4 classes: class 1 consists of known self-renewal genes, proto-oncogenes, and CRVISs present in the IDDb (Table 3); class 2 represents genes with a known or putative role in cellular signaling networks; class 3 collects other genes; and class 4 unknown genes.
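The positional analysis described above amounts to computing a strand-aware distance between each insertion and the TSS of the neighboring gene and assigning it to a window. The following is a minimal sketch of that bookkeeping; the window edges, coordinate convention, and function names are assumptions for illustration and do not necessarily match the binning used for Figure 4A.

```python
import bisect

# Assumed window edges in kb relative to the TSS (illustrative only).
EDGES_KB = [-20, -10, -5, -1, 0, 1, 2.5, 5, 10, 20]

def signed_distance_to_tss(insertion_pos, tss_pos, gene_strand):
    """Distance in bp; negative = upstream of the TSS, positive = downstream.

    Positions are chromosome coordinates. For a gene on the minus strand,
    larger coordinates lie upstream, so the sign is flipped.
    """
    offset = insertion_pos - tss_pos
    return offset if gene_strand == "+" else -offset

def tss_window(distance_bp, edges_kb=EDGES_KB):
    """Label the window [lower, upper) in kb that contains the distance."""
    d = distance_bp / 1000
    if d < edges_kb[0]:
        return f"< {edges_kb[0]} kb"
    if d >= edges_kb[-1]:
        return f">= {edges_kb[-1]} kb"
    i = bisect.bisect_right(edges_kb, d)
    return f"{edges_kb[i - 1]} to {edges_kb[i]} kb"

# An insertion 3 kb upstream of a plus-strand gene with its TSS at 100 000.
window = tss_window(signed_distance_to_tss(97_000, 100_000, "+"))
```

Counting insertions per window for each gene class, and comparing against the reference distribution for unselected cells, then gives the kind of comparison shown in Figure 4A.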

Figure 4

Type of mutations. Data are shown with respect to gene class 1 (common insertion sites, proto-oncogenes, and self-renewal genes), class 2 (signaling genes), and classes 3 and 4 (other and unknown genes). (A) Position of RVIS in the Insertional Dominance Database (IDDb) around the transcriptional start site. Reference data insertion sites of different vectors in freshly transduced cells, shown as lines, were kindly provided by G. Trobridge and D. Russell.41 MLV indicates murine leukemia virus vector; FV, foamy virus vector; HIV, human immunodeficiency virus vector; random, computer-predicted random insertion pattern. (B) Overrepresentation of enhancer mutations in class 1 genes. RVISs were analyzed for the different types of retroviral insertional mutations proposed earlier.12 Insertions located downstream but in an antisense orientation do not correspond to the definition of enhancer mutations suggested in Uren et al12 and are therefore labeled “Except. +/R.”

Compared with the insertion pattern of MLV in unselected cells, the IDDb shows a clear overrepresentation of class 1 events in the window between -1 and -20 kb, and also between 5 and 10 kb downstream of the TSS (Figure 4A). No overrepresentation is found within 1 kb upstream of the TSS, and class 1 hits are even underrepresented in the first kilobase of transcribed sequence. A similar picture is observed in the window around +2.5 to 5 kb. Events in classes 3 and 4 serve as an internal control, showing no enrichment over the unselected MLV pattern in the windows around -5 to -1 kb and no counterselection in the +1-kb window. The region that is most likely to contribute to clonal dominance thus resides within -1 to -5 kb upstream of the TSS, whereas insertions closely downstream of the TSS tend to be counterselected.

Vectors based on foamy virus and even more so those based on lentiviruses have been shown to have a reduced bias for the region surrounding the TSS.40,41 The IDDb with its focus on genes that support competitive fitness reveals that a simple switch to these vector types may not fully eliminate the risk of insertional mutagenesis. Looking at the window 5 kb upstream of the TSS, a switch to foamy virus–based vectors41 might reduce the probability of “productive” class 1 insertions by a factor of less than 2 and a switch to HIV-based vectors by a factor of 10. However, hits in this window only account for less than 20% of class 1 events in our database. For the majority of events located further upstream or downstream, changing the retroviral backbone does not seem to change the risk.

The position and orientation of the vector with respect to the transcription unit allows a classification of insertional mutations as follows12: enhancer mutations are typically located upstream of the transcription unit in the antisense orientation or downstream in the sense orientation, fusion transcript mutations may originate from insertions upstream of transcription units in the sense orientation, and insertions within a transcription unit may lead to aberrant splicing or termination. In the IDDb, 40% (111 of 280) of the RVISs represented enhancer mutations, the majority (76%) occurring upstream of the TSS in the antisense orientation. Enhancer mutations were more relevant in class 1 genes (55%) than in the other classes (∼ 34%). Fusion mutations represented 20% of the events in class 1, and approximately 14% in the other classes. Accordingly, insertions within transcription units were underrepresented in class 1 compared with the other classes (Figure 4B).
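The classification rules stated above can be written as a small decision function. This is a sketch of the rules as worded in the text, with hypothetical argument and label names; insertions downstream of the transcription unit in the antisense orientation fall outside the enhancer definition and are returned as a separate category, mirroring the “Except. +/R” label in Figure 4B.

```python
def classify_insertion(location, orientation):
    """Classify an insertion by position and orientation relative to a gene.

    location    : "upstream", "within", or "downstream" of the transcription unit
    orientation : "sense" or "antisense" relative to the gene
    """
    if orientation not in ("sense", "antisense"):
        raise ValueError(f"unknown orientation: {orientation!r}")
    if location == "within":
        # May lead to aberrant splicing or premature termination.
        return "intragenic"
    if location == "upstream":
        # Sense: possible fusion transcript; antisense: enhancer mutation.
        return "fusion" if orientation == "sense" else "enhancer"
    if location == "downstream":
        # Sense: enhancer mutation; antisense: outside the definition.
        return "enhancer" if orientation == "sense" else "exception"
    raise ValueError(f"unknown location: {location!r}")

# Tally a few hypothetical insertions by mutation type.
examples = [("upstream", "antisense"), ("upstream", "sense"),
            ("within", "sense"), ("downstream", "sense")]
labels = [classify_insertion(loc, ori) for loc, ori in examples]
```

Tabulating these labels per gene class gives the proportions compared in Figure 4B.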

Together, the enrichment of insertions in class 1 genes over serial transplantation and with leukemic progression (Figure 3), the skewed distribution of insertions around the TSS (Figure 4A), and the counterselection of insertions within transcription units (Figure 4B) in favor of enhancer and fusion mutations all reveal that insertional mutations strongly contributed to the occurrence of clonal dominance in our experiments.

Overlap with stem cell databases and pathway analysis

MLV vectors preferentially target active genes, but extremely high gene expression levels might impede insertions.45 To explore whether the RVISs selected in vivo represent genes expressed in primitive hematopoietic cells, as suggested from a previous study conducted with human cells,46 we compared the genes listed in the IDDb with 3 different transcriptome databases. The first is the publicly accessible stem cell database (SCDb),32 which represents a subtracted cDNA library derived from primitive hematopoietic cells present in murine fetal liver and marrow. The second database was generated from a genome-wide gene expression profiling experiment using Affymetrix full-genome mouse arrays on RNA extracted from highly purified hematopoietic stem/progenitor cells (Lin− Sca1+ c-Kit+, LSK, > 96% pure after flow sorting) obtained from steady state murine bone marrow (M.H.B., K.P., D.R., F.J.T.S., M. M. A. Verstegen, G. W., unpublished observations, July 2006), and the third database was generated using the same array and RNA extracted from highly purified hematopoietic stem cells (side population [SP] combined with LSK)47 (S.M.C. and M.A.G., unpublished data, July 2006).

We found 57% of the class 1 genes to be listed in the SCDb, as opposed to 32% for class 2 and 17% for class 3. With reference to the GO classification used in the SCDb, the IDDb shows an overrepresentation of genes encoding proteins involved in apoptosis, intracellular signaling, or transcriptional control, whereas the following gene classes are strongly underrepresented: cell adhesion, transport, chromatin regulators, protein processing, and protein synthesis (Table 4).

Table 4

Genes associated with clonal dominance preferentially belong to three GO categories: intracellular signaling, transcription factor, and apoptosis

We further studied GO in its branching into a semihierarchical tree, describing genes in categories from very general (eg, regulation of biological process, levels 1-5) to very specific (level 10+). This analysis showed a significant (P < .05, hypergeometric) overrepresentation of the following processes: cell proliferation (level 4, P = .016), positive regulation of apoptosis (level 7, P = .033), and regulation of transcription, DNA-dependent (level 8, P = .001).
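A hypergeometric overrepresentation test of the kind referred to above can be sketched as follows. The example asks how likely it is to see at least k genes from one GO category among n insertion-marked genes when the category contains K of the M annotated genes in the genome. The function name and all numbers in the example are assumptions for illustration and are not values from the study; the sketch also applies no correction for testing many categories.

```python
import math

def hypergeom_upper_tail(k, M, K, n):
    """P(X >= k) when drawing n genes without replacement from M genes,
    K of which belong to the GO category of interest."""
    numerator = 0
    for i in range(k, min(K, n) + 1):
        # Ways to pick i category genes and (n - i) non-category genes.
        numerator += math.comb(K, i) * math.comb(M - K, n - i)
    return numerator / math.comb(M, n)

# Hypothetical numbers: 20 000 annotated genes, 400 in the category,
# 200 genes marked by insertions, 12 of them in the category.
p_enrichment = hypergeom_upper_tail(k=12, M=20_000, K=400, n=200)
```

With these toy numbers only 4 category genes would be expected by chance (200 x 400 / 20 000), so observing 12 gives a small tail probability. The exact integer arithmetic from `math.comb` (Python 3.8 or later) avoids floating-point overflow for genome-sized inputs.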

A network-based pathway analysis demonstrated that RVISs clustered near genes involved in cancer and were, in addition, strongly correlated with genes involved in hematologic and immune system development, functions, and disease (Tables S2 and S3). Canonical pathway analyses performed with IDDb genes revealed a significant overrepresentation of growth factor signaling pathways, death receptor signaling pathways, and associated intracellular networks (Table 5). Strikingly, most of the genes extracted in Table 3 are connected in 2 major networks (Figure 5). Figure 5A shows major pathways contributing to hematopoietic stemness (Igf-1, Vegf, Pten, apoptosis, death receptor), whereas Figure 5B reveals the association with additional nuclear players involved in hematopoietic self-renewal and lineage commitment.

Table 5

Ingenuity pathway analysis of all genes listed in the IDDb (24 most significant results shown)

Figure 5

Ingenuity analyses of the genes listed in Table 3 reveal 2 major pathways. Note that further members of these pathways (A-B) may be highlighted when extending the analysis to the full IDDb. For example, Siva, shown at the bottom of Figure 5A, is a chromosomal neighbor of Akt1; this locus represents a CRVIS in the IDDb (Table 3; Table S1).

This suggested that the RVISs selected in the IDDb occurred preferentially in a subset of genes expressed in primitive hematopoietic cells. We further approached this question by comparing the genes listed in the IDDb with gene expression microarray data obtained from purified fractions of hematopoietic stem/progenitor cells. With respect to the most primitive fraction analyzed, SP-LSK, RVISs present in the IDDb were clearly associated with expressed genes (P < .01, Wilcoxon test). Interestingly, the level of significance increased depending on serial transplantation and the degree of transformation: primary recipients (P = .003), secondary recipients (P < .001) and leukemias (P < .001). This reveals that the vast majority of genes whose deregulation causes clonal dominance are already expressed in primitive hematopoietic cells, rather than being activated from a silent state by insertional mutagenesis.

As the initial target population of retroviral gene transfer was not such a highly purified fraction, we also used gene expression array data from LSK cells to check whether the level of transcription correlates with RVISs. LSK cells contain both short-term and long-term repopulating cells,48 and it is possible that some RVISs converted short-term to long-term clones, as can be observed as a consequence of certain oncogenic translocations49 (and references therein). On the basis of their relative expression level, genes were classified into 10 “bins” such that bin 1 represents the 10% of genes with the lowest expression levels, and bin 10 the 10% of genes with the highest. In agreement with findings made in unselected cells, RVISs present in the IDDb correlated with gene expression levels prior to transduction (Figure 6A-B). Comparing freshly isolated and cultured LSK cells, no major effect of culture conditions on the insertion profile was noted (Figure 6A-B). Interestingly, the association of RVISs with highly expressed genes tended to be more pronounced in class 1 than in classes 2 and 3 (Figure 6C).
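The binning step described above can be sketched in a few lines. Genes are ranked by expression and split into 10 equally sized bins, and the genes marked by RVISs are then counted per bin. The function names and the toy expression values are assumptions for illustration; how tied expression values and multiple probe sets per gene were handled in the actual analysis is not specified here, and this sketch simply keeps tied genes in their input order.

```python
def assign_expression_bins(expression_by_gene, n_bins=10):
    """Map each gene to a bin from 1 (lowest expression) to n_bins (highest)."""
    ranked = sorted(expression_by_gene, key=expression_by_gene.get)
    n_genes = len(ranked)
    return {
        gene: (rank * n_bins) // n_genes + 1
        for rank, gene in enumerate(ranked)
    }

def hits_per_bin(bin_by_gene, hit_genes, n_bins=10):
    """Count insertion-marked genes in each expression bin.

    Genes absent from the array data are skipped.
    """
    counts = [0] * n_bins
    for gene in hit_genes:
        if gene in bin_by_gene:
            counts[bin_by_gene[gene] - 1] += 1
    return counts

# Toy data: 20 hypothetical genes with expression values 0..19.
toy_expression = {f"g{i}": float(i) for i in range(20)}
toy_bins = assign_expression_bins(toy_expression)
toy_counts = hits_per_bin(toy_bins, ["g0", "g18", "g19", "not_on_array"])
```

The per-bin counts are the kind of input that a trend test across ordered bins would take, together with the number of genes in each bin.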

Figure 6

The probability of retroviral vector insertion but not the probability of forming a common insertion site depends on the expression level of the affected gene. (A) Array data from enriched hematopoietic progenitors containing both long-term and short-term repopulating cells (LSK cells, freshly isolated) were divided into 10 equal bins according to relative gene expression levels. The curves show the number of genes marked by RVISs in the different bins. Irrespective of the selection conditions (primary recipient, secondary recipient, or leukemia), the probability of RVIS is highest in the 40% most highly expressed genes. (B) Similar results were obtained when examining array data from LSK cells that were cultured for 2 days. (C) The selection for insertions in highly expressed genes is most pronounced for class 1 genes. (D) Expression levels of all genes detected by the arrays of LSK cells versus all RVIS genes of the IDDb, showing that the latter clearly have a much higher expression. The CRVIS genes of the IDDb are superimposed, showing that these do not cluster in the highest expression levels. Labeled genes represent CRVISs that were hit 3 times or more. Genes below the dotted line are not expressed in LSK cells.

Remarkably, the probability of forming a CRVIS does not seem to depend on the expression level. CRVISs are evenly distributed over all expression levels (Figure 6D) and even found in regions without transcriptional activity. A similar trend was observed for CRVISs that were hit 3 or more times; Evi1, the most frequent CRVIS in our dataset, does not show the highest expression level in the array (Figure 6D). Together, these data confirm the hypothesis that the risk of retroviral vector insertion in a given locus depends on its expression level in the target cell. However, the selection for CRVIS is not a sole function of the initial expression level. We conclude that CRVISs are selected based on the biological consequences of target gene dysregulation and do not necessarily reflect a higher probability of retroviral integration.

Association of proto-oncogenes with leukemogenesis

To address whether the IDDb contains novel information regarding proto-oncogenes associated with leukemogenesis, we compared our data with tumor phenotypes listed in the RTCGD (obtained in animals infected with RCRs). The 2 databases are not redundant: Many CRVISs observed in the IDDb are not listed as CRVISs in the much larger RTCGD (Igfbp4, Dph5, FasL, Gpr43, Gtf2i, Ly78, Lrcc6, Plcg2, Sesn2, Tnfsf10, Rab3gap2). These genes may be more likely to confer clonal dominance in healthy hematopoiesis than to contribute to malignant transformation. Furthermore, some genes in the IDDb have RVISs almost identical to those listed in the RTCGD, but frequently in association with distinct tumor phenotypes. The expansion and comparative analysis of these 2 databases may thus provide deeper insights into the association of genes with the induction of clonal dominance and malignant tumors.

Interestingly, the few RVISs identified to date that were associated with clonal dominance or malignant transformation in primate models and clinical trials are all found in the IDDb: BCL-2A1 was identified as the RVIS in the single case of a malignant transformation observed to date in a nonhuman primate model following the use of replication-deficient retroviral vectors.10 It is highly related to BclX and Mcl1 listed in the IDDb. LMO2 was found as a CRVIS in cases of lymphatic leukemia occurring in gene therapy for X-linked severe combined immunodeficiency (SCID-X1) disease,8 and the murine homolog is contained in the IDDb. Finally, the most frequent CRVIS in the IDDb found in association with both clonal dominance and myeloid transformation is Evi1; CRVISs in the human homolog were observed in association with clonal dominance in a recent report of patients undergoing retroviral vector-mediated gene therapy for chronic granulomatous disease (CGD).9


The present study introduces a novel database (IDDb) listing RVISs associated with clonal dominance in cases of normal, potentially preleukemic hematopoiesis or malignant transformation of hematopoietic cells. We showed that our experimental conditions select RVISs of dominant clones that contribute the majority (> 50%) of a polyclonal population. Under our experimental conditions that involve a rather profound replication stress, the dysregulated cellular genes most likely have the potential to promote proliferation and/or survival of long-term repopulating hematopoietic cells. Consistent with present concepts of oncogenesis and leukemogenesis,50,51 GO analysis revealed that 3 major gene functions contribute to clonal dominance: regulation of proliferation, apoptosis, and transcription. More importantly, pathway studies revealed that these genes are functionally connected in 2 major signaling networks (Figure 5). Of note, many of the genes listed in these classes have previously not been implicated in hematopoietic stem cell (HSC) biology.

Another important conclusion was that those genes which contribute to clonal dominance following insertional mutagenesis are more likely to be hit if they are already transcriptionally active at a relatively high level in primitive hematopoietic cells. This was also observed with reference to the transcriptome of freshly isolated cells, independent of prior cytokine stimulation. The same conclusion was derived from an independent study performed with an even longer observation period (M.H.B., K.P., D.R., F.J.T.S., M. M. A. Verstegen, G.W., unpublished observations, July 2006). Interestingly, similar findings were made with retrovirally marked human cells observed in the nonobese diabetic (NOD)/SCID xenotransplant setting,46 whereas insertion sites observed in murine tumors induced by RCR rather overlap with the transcriptome of human leukemias.52 We would therefore assume that all vectors that show an insertion bias for expressed genes and contain strong enhancer sequences raise the probability of inducing clonal dominance by insertional mutagenesis. This also implies that the risk of clonal dominance or even malignant transformation should be much lower if gene transfer occurs in cells that have partially or completely silenced the self-renewal program.

The leukemias occurring in our model typically require combinatorial genetic lesions, either by the presence of RVISs in more than one leukemia-promoting gene,5 or by a single proto-oncogenic RVIS in combination with signal alterations evoked by the vector-encoded transgene.3 Although animals were examined for hematopoietic abnormalities in compliance with recommendations,53,54 preleukemic clonal expansion might have been overlooked. Leukemogenic signal alterations are expected to be dose related, as previously observed for Evi1 and Hoxb4.55-57 The potential utility of these genes for stem cell expansion will thus depend on the ability to identify the required level of transcriptional dysregulation. Accordingly, we would assume that RVISs in such genes are only selected in vivo if the resulting extent of transcriptional dysregulation fits the selective pressure encountered in the given conditions. Insertional mutagenesis by RVISs may thus represent a powerful approach to identify genes that promote clonal survival under different selection conditions, such as exposure to cytotoxic drugs, inhibitory cytokines, irradiation, or disease-specific conditions.

Notably, not all IDDb entries can be considered as potential inducers of clonal dominance. Some genes may be accidentally marked in clones that contain more than one insertion, and intrinsic, potentially stochastic differences in cell fitness may also contribute to clonal dominance (reviewed in Spangrude et al58). A stronger focus on serially transplanted HSCs and experimental conditions favoring a single integration per cell may further increase the stringency of the screen. However, final proof requires functional studies. For a number of genes contained in the IDDb, an essential role in the regulation of cellular survival has been experimentally validated. This applies to the majority of the proto-oncogenes listed in Table 3 and the genes involved in the networks presented in Figure 5. However, only a smaller subset of these genes has previously been implicated in regulating “stemness.” Examples are Akt1, which is known to be essential for self-renewal of murine embryonic stem cells,59 Hoxb4, which stimulates HSC self-renewal without necessarily inducing leukemia,60 and Evi1, which triggers self-renewal of myeloid progenitor cells in vitro and might give rise to a myelodysplastic syndrome and myeloid leukemia.6,43 Interestingly, Akt1 together with Foxo3a and Cyclin D regulates the hibernation of HSCs,61 and all 3 genes are found in the center of the network shown in Figure 5A. Other genes that are functionally related to the last 2 examples are also found: The IDDb (Table S1) lists additional homeobox genes (Hoxa7, Hhex, Cutl1, Dlx2, Dlx3) and Ski, which is related to Evi1 in its function to interact with SMAD signaling. An extended analysis of the IDDb also reveals further members of other pathways that are not (yet) recognized by the Ingenuity software tool.

Expanding the IDDb is also of major importance for the safety analysis of RVISs in preclinical and clinical studies. Although mice and humans differ in their susceptibility to transformation and some underlying mechanisms,50 the IDDb nevertheless contains the 3 leading gene families associated with leukemia induction or preleukemic alterations observed to date in nonhuman primates and clinical trials: the Bcl2-related genes,10 Lmo2,8 and Evi1.9 Expanding our approach to studies with other animal models might eventually even reveal basic biological principles regulating stem cell fitness that have been genetically and functionally conserved between different species. A general database for vector insertion sites that also includes data from clinical trials would be of great value.


Contribution: O.S.K., G.v.K., and K.C. performed LMPCR and sequence analyses; O.S.K. organized the database and performed associated biostatistics; H.G., Z.L., K.J.N., and U.M. designed and performed animal experiments (including associated molecular biology); M.H.B., S.M.C., C.A.S., K.P.-O., D.d.R., F.J.T.S., G.W., and M.A.G. performed transcriptome studies and associated bioinformatics; and B.F. and C.B. initiated and coordinated the work, and wrote the paper together with the above colleagues.

Conflict of interest disclosure: The authors declare no competing financial interests.

Correspondence: Boris Fehse, Bone Marrow Transplantation, University Hospital Eppendorf, Martinistr. 52, 20251 Hamburg, Germany; e-mail: fehse{at}; and Christopher Baum, Experimental Hematology, OE6960, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany; e-mail: baum.christopher{at}


We thank Manfred Schmidt and Christof von Kalle for their support in establishing LMPCR and for contributing insertion sites as published in references 3 and 5. We thank Anita Badbaran for technical assistance and Kristoffer Weber for help with the figures.

This work was supported by the Deutsche Forschungsgemeinschaft (grant DFG SPP1230) (B.F., Z.L. and C.B.) and (grant DFG-FE568/5-1,2) (B.F.), the European Union (grants INHERINET-QLK3-CT-2001-00427 and CONSERT-LSHB-CT-2004-005242) (G.W., F.J.T.S., and C.B.), and the National Cancer Institute (grant R01-CA107492-01A2) (C.B.).


  • The online version of this article contains a data supplement.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted August 28, 2006.
  • Accepted October 18, 2006.

