The first congenital defect of hypoxia sensing, homozygosity for the VHL 598C>T mutation, was recently identified in Chuvash polycythemia. Subsequently, we found this mutation in 11 unrelated individuals of diverse ethnic backgrounds. To address the question of whether the VHL 598C>T substitution occurred in a single founder or resulted from recurrent mutational events in human evolution, we performed haplotype analysis of 8 polymorphic markers covering 340 kb spanning the VHL gene on 101 subjects bearing the VHL 598C>T mutation, including 72 homozygotes (61 Chuvash and 11 non-Chuvash) and 29 heterozygotes (11 Chuvash and 18 non-Chuvash), and 447 healthy unrelated individuals from Chuvash and other ethnic groups. The differences in allele frequencies for each of the 8 markers between 447 healthy controls (598C) and 101 subjects bearing the 598T allele (P < 10⁻⁷) showed strong linkage disequilibrium. Haplotype analysis indicated a founder effect. We conclude that the VHL 598C>T mutation, the most common defect of congenital polycythemia yet found, was spread from a single founder 14 000 to 62 000 years ago.
Diverse physiologic and pathophysiologic processes are regulated by tissue hypoxia.1,2 Hypoxia inducible factor 1α (HIF-1α) is a component of HIF, the “master” transcription regulator of the genes that respond to hypoxia.3 The HIF-1α protein is stable under hypoxia but is rapidly degraded in normoxia via a pathway that includes binding to the von Hippel-Lindau (VHL) protein.4 Chuvash polycythemia (CP) is a congenital disorder of augmented hypoxia sensing due to a homozygous mutation of the VHL gene.5,6 Subsequent to its original description in the Chuvash population isolate, the same mutation was detected in 11 unrelated individuals belonging to the diverse ethnic groups of Asian, white, and African-American origin.7-9
The worldwide distribution of the VHL 598C>T mutation raised the question of whether the VHL 598C>T substitution occurred in a single founder or resulted from recurrent mutational events. To address this issue, we characterized haplotypes of Chuvash and non-Chuvash subjects bearing the VHL 598C>T mutation and healthy unrelated individuals from diverse ethnic groups.
We obtained 150 Chuvash DNA samples (61 with CP and 89 controls) and 35 non-Chuvash samples; 29 of the 35 non-Chuvash samples carried the VHL mutation, and the remaining 6 came from healthy relatives who served as controls.7-9 We also examined 363 unrelated healthy individuals: Southeast Asian (n = 76), white (n = 101), African American (n = 88), and Hispanic (n = 98).
VHL 598C>T mutation screen
The VHL 598C>T mutation was identified with the Fnu4HI restriction endonuclease, which digests the wild-type allele (C) at the CP locus; the 598C>T mutation abolishes the restriction site. The genotyping was performed as previously described.6,8
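The logic of the restriction assay can be sketched in silico. The following Python fragment is illustrative only: it takes the Fnu4HI recognition sequence to be GCNGC (an assumption drawn from general enzyme references, not stated in this report), and the two toy amplicons are hypothetical strings, not the real VHL sequence. In the laboratory the genotype is read from the band pattern on a gel rather than from the sequence itself.

```python
import re

# Fnu4HI recognition sequence, taken here to be GCNGC (N = any base).
# This is an assumption of the sketch, not a detail given in the text.
FNU4HI_SITE = re.compile(r"GC[ACGT]GC")


def is_digested(amplicon: str) -> bool:
    """Return True if the amplicon contains at least one Fnu4HI site."""
    return FNU4HI_SITE.search(amplicon.upper()) is not None


def call_genotype(allele_1: str, allele_2: str) -> str:
    """Infer the 598 genotype from whether each allele is cut.

    Cut allele    -> wild type (598C), restriction site intact.
    Uncut allele  -> mutant (598T), restriction site abolished.
    """
    cut = [is_digested(allele_1), is_digested(allele_2)]
    if all(cut):
        return "598C/C"
    if not any(cut):
        return "598T/T"
    return "598C/T"


# Hypothetical toy amplicons (NOT the real VHL sequence). The C>T change
# falls on a fixed position of the site, so the site is lost.
WILD_TYPE = "AATTGCAGCAATT"  # contains GCAGC
MUTANT = "AATTGTAGCAATT"     # GCAGC -> GTAGC, no site remains

print(call_genotype(WILD_TYPE, MUTANT))
```

A heterozygote therefore yields both digested and undigested product, whereas a homozygote for 598T yields only undigested product.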
Haplotype analysis of the VHL gene
After screening 41 reported single-nucleotide polymorphisms (SNPs) spanning the VHL gene, we found 8 polymorphisms, illustrated in Figure 1 and Table 1, to be highly informative in the Chuvash population.
Estimation of haplotype frequencies and determination of statistical significance
We used the expectation maximization (EM) algorithm, which handles uncertainty in phase determination, to estimate haplotype frequencies12 as implemented by Mander.13 A log-linear model was used to test for linkage disequilibrium and disease association.13 To evaluate the haplotypes in pedigrees, we used GeneHunter14 (Whitehead Institute, Cambridge, MA). Since GeneHunter does not allow for linkage disequilibrium among the markers, we rejected any haplotypes that failed to preserve strong associations among the marker alleles.
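For readers unfamiliar with EM-based haplotype estimation, a minimal Python sketch of the general technique follows. It is not the implementation cited above; it assumes biallelic markers, Hardy-Weinberg proportions, and genotypes coded as the number of copies (0, 1, or 2) of one allele at each locus. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype[i] is the number of copies (0, 1, 2) of allele 1 at locus i.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = tuple(1 if g == 2 else 0 for g in genotype)
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for locus, b in zip(het, bits):
            h1[locus], h2[locus] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pair_sets = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for ps in pair_sets for pair in ps for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for ps in pair_sets:
            # E-step: probability of each phase resolution given current
            # frequencies (factor 2 for a pair of distinct haplotypes).
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in ps]
            total = sum(weights)
            for (a, b), w in zip(ps, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        new_freq = {h: counts[h] / n_chrom for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Toy data for two loci: six unambiguous homozygotes and two double
# heterozygotes whose phase is unknown.
toy_genotypes = [(2, 2)] * 3 + [(0, 0)] * 3 + [(1, 1)] * 2
toy_freq = em_haplotype_frequencies(toy_genotypes)
```

In the toy data the double heterozygotes are resolved almost entirely to the two haplotypes already seen in the homozygotes, which is the behavior that lets EM recover phase when markers are in strong linkage disequilibrium.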
Long and accurate polymerase chain reaction (LA PCR; Serologicals, Norcross, GA) was employed to determine haplotype phase of the VHL 598C>T alleles from individuals no. 1 and no. 2, who did not have extensive family members available for phasing the haplotypes. The primers were VHL433F (5′-AAA AAA CAC CAA ACC TTA GAG GGG TG-3′) and VHL9246R (5′-CCC AAA GCA GGA GGC AGA CAA GTC ACC-3′). The 8.8-kb gel-purified amplicon covering the markers rs779805 through 1149A>G11 was cloned into the pCR-XL-TOPO vector (Invitrogen, Carlsbad, CA) and sequenced.
Dating the origin of VHL 598C>T mutation
We estimated the time of origin of the VHL 598C>T mutation from the proportion of disease-bearing chromosomes that no longer carry the ancestral haplotype.15 The number of generations (g) since the mutation event is given by g = log[(1 − Q)/(1 − PN)]/log(1 − θ), where Q is the proportion of disease-bearing chromosomes not bearing the ancestral haplotype, PN is the frequency of the disease allele in the population, and θ is the recombination fraction between the marker and the mutation, assuming that 1% recombination corresponds to 1 megabase (Mb).
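The dating formula can be expressed as a short Python function. The sketch below implements the expression exactly as printed in this section and converts a physical distance to θ under the stated assumption of 1% recombination per Mb. The function name and the input values in the usage lines are hypothetical, chosen only to show the arithmetic; they are not data from this study.

```python
import math


def generations_since_mutation(q, p_n, distance_kb, percent_per_mb=1.0):
    """Number of generations g since the mutation event.

    q            proportion of disease-bearing chromosomes not bearing
                 the ancestral haplotype (Q in the text)
    p_n          population frequency of the disease allele (PN)
    distance_kb  marker-to-mutation distance in kilobases
    Implements g = log[(1 - Q)/(1 - PN)] / log(1 - theta) as printed.
    """
    # 1% recombination per Mb: theta = (distance in Mb) * 0.01
    theta = (distance_kb / 1000.0) * (percent_per_mb / 100.0)
    return math.log((1.0 - q) / (1.0 - p_n)) / math.log(1.0 - theta)


# Illustrative inputs only (hypothetical, not taken from the study).
g = generations_since_mutation(q=0.5, p_n=0.001, distance_kb=100.0)
years = g * 20  # assuming 20 years per generation
print(round(g), round(years))
```

Because log(1 − θ) is close to −θ for small θ, the estimate scales roughly inversely with marker distance, so a closer marker with the same Q yields a proportionally older date.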
Results and discussion
The VHL 598C>T missense mutation was found in Chuvashians, African Americans, Bangladeshi, Punjabi, English, and whites.5,7-9 This finding suggests either that the VHL 598C>T mutation occurred several times in human evolution or that it originated once in some common founder.
We compared the allele frequencies of 8 polymorphic markers in 447 healthy controls (VHL 598C) and 101 subjects bearing the VHL 598C>T mutation, who contributed 173 598T-bearing chromosomes from 72 homozygotes (61 Chuvash and 11 non-Chuvash) and 29 heterozygotes (11 Chuvash and 18 non-Chuvash). In the 144 chromosomes from the 72 homozygotes for the VHL 598T genotype, the frequency of one of the alleles for each SNP (see underlined text in Table 2) ranged from 0.73 to 1.00, whereas in 894 chromosomes from the 447 healthy controls (VHL 598C), the frequency was only 0.13 to 0.85. The linkage disequilibrium between the VHL 598T mutation and the polymorphic markers was complete for 5 SNPs (rs722509, rs779805, rs779808, rs1678607, 1149A>G) and extensive for 3 SNPs (rs1056286, rs696356, rs378630) (Table 2). Based on χ² analysis, the differences in allele frequencies for each marker between 447 healthy controls (598C) and 101 subjects bearing 598T were highly significant (P < 10⁻⁷; Table 2). This degree of linkage disequilibrium indicates that the VHL 598C>T mutation occurred only once, on the founder haplotype rs1056286T-rs722509T-rs779805G-rs779808C-rs1678607A-1149G-rs696356C-rs378630A (Table 2).
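The chromosome counts and the form of the allele-frequency comparison can be illustrated in Python. The sketch assumes a Pearson χ² test without continuity correction on a 2 × 2 table of allele counts per chromosome; the report does not specify these details. The per-marker allele counts used below are hypothetical placeholders, since the Table 2 counts are not reproduced here.

```python
import math


def mutant_chromosomes(homozygotes, heterozygotes):
    """Chromosomes carrying 598T: two per homozygote, one per heterozygote."""
    return 2 * homozygotes + heterozygotes


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]]; returns the statistic and its P value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts stated in the text.
n_mutant = mutant_chromosomes(homozygotes=72, heterozygotes=29)
n_control = 2 * 447

# Hypothetical allele counts for one marker (NOT the Table 2 values):
# founder allele on 130 of the 598T chromosomes and 300 control chromosomes.
chi2, p_value = chi_square_2x2(130, n_mutant - 130, 300, n_control - 300)
print(n_mutant, n_control, p_value < 1e-7)
```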
We have detected the VHL 598C>T mutation in 11 unrelated non-Chuvash families or individuals.7-9 Haplotype analysis demonstrated that 5 individuals (nos. 3-7) homozygous for 598C>T shared the same founder haplotype as the polycythemia patients from Chuvashia and thus are of common ancestry. Six non-Chuvash individuals (nos. 1, 2, and 8-11) had the VHL 598C>T mutation on a single allele. Phase determination on individuals no. 1 and no. 2 showed that their VHL 598C>T allele carried the same SNP pattern as the VHL 598C>T allele from Chuvash patients.
The haplotypes observed in Chuvash and non-Chuvash subjects with the VHL 598C>T mutation can be further subdivided into the core CP haplotype (covering 6 SNPs) and 2 extended CP haplotypes (covering 8 SNPs) (Figure 1). The 2 extended CP haplotypes, TTGCAT*GCA (haplotype 1) and TTGCAT*GCC (haplotype 2), cover 340 kb containing the VHL 598C>T mutation (T*) (Figure 1). In these analyses, 99.2% of Chuvash and non-Chuvash individuals homozygous for the VHL 598C>T mutation share the core CP haplotype, while 96.5% share the 2 extended CP haplotypes (72.2% haplotype 1 and 24.3% haplotype 2). Among the 6 heterozygous non-Chuvash individuals, 3 (nos. 1, 2, and 8) share the core CP haplotype. Of those without the exact CP haplotype (individuals nos. 9-11), individual no. 9 has a haplotype that may have arisen from a crossover between SNPs rs779808 and rs1678607 within the VHL gene, individual no. 10 has a C to A substitution/crossover at rs696356, and individual no. 11 has an A to C substitution at the SNP rs1678607. The frequency of the founder haplotype differed significantly between normal alleles and alleles bearing the VHL 598T mutation. Of 144 VHL 598T alleles derived from homozygotes, 72.2% bore the founder haplotype, TTGCAGCA, in striking contrast to 0.9% of 894 VHL 598C alleles derived from healthy controls without the VHL 598C>T mutation (P < .000 001). These data strongly support the conclusion that the VHL 598C>T missense mutation arose from a single founder.
We estimated the time of origin of the VHL 598C>T mutation using an approach suggested by Risch et al.15 In the 11 non-Chuvash families or individuals that we studied, we observed 6 TTGCAT*GCC haplotypes, 1 CCGCCT*GCC haplotype, 2 CTGCAT*GCC haplotypes (African American and white), 6 TTGCAT*GCA haplotypes (Bengali and Punjabi origin), and 1 TTGCCT*GCC haplotype (English) among 5 families having affected individuals homozygous for the Chuvash mutation, 5 patients heterozygous for the Chuvash mutation, and 1 African American heterozygous for the Chuvash mutation who was otherwise healthy. We assumed the absence of back mutations. For these calculations, we excluded Chuvash individuals, since their mutations are likely due to founder effect. Among the remaining controls, 1 of 363 individuals (excluding 6 nonpolycythemic relatives who were ascertained because of polycythemic probands) showed a VHL 598C>T mutation (individual no. 1), yielding an estimated population frequency of the disease allele (PN) of 0.001377. Given the genomic distance between VHL 598T and rs1056286 of 234.8 kb, 715 generations of 20 years or 14 293 years are required to account for the data. For the next closest 5′ marker, rs722509, which was 89.8 kb from the VHL 598C>T mutation, the mutation is estimated to have occurred 3110 generations or 62 188 years ago. In the 3′ region, the distance from the marker rs378630 to the VHL 598C>T mutation was 105.7 kb, and the estimated number of generations is 930, or 18 593 years. The wide range of estimated dates for the origin of the 598C>T mutation reflects the limited number of haplotypes studied in non-Chuvash subjects. In addition, because the Bengali and the Punjabi families share a unique common haplotype in the 3′ region, the estimated time could be biased by sampling several patients from this region.
The calculations concerning generation time are highly influenced by the mutation frequency in the controls, but the rarity of this mutation in controls precludes an accurate estimate of the age of the mutation. The mutation could be much older than our estimate because we only observed a single African-American control with a mutation and our calculations were all performed with whites, for whom the estimated control mutation frequency is 0. We conclude that the VHL 598C>T mutation arose in a single ancestor at least 14 000 to 62 000 years ago. This age estimate and the fact that the mutation is found in different races and ethnic groups suggest that this mutation occurred early in the development of the human race.
We conclude that homozygosity for the VHL 598C>T mutation, the first disorder of augmented hypoxia sensing and the most common genetic defect of congenital polycythemia, originated from a single founder. It is possible that this wide dissemination from the original founder is associated with some survival advantage. Such an advantage might be related to subtle improvement of iron metabolism, erythropoiesis, embryonic development, energy metabolism,1,16 or some other yet unknown effect. Another intriguing possibility is the recent demonstration of the protective role of HIF-1α in regulating vascular endothelial growth factor (VEGF) in pre-eclampsia,16,17 which is the leading cause of maternal and fetal mortality worldwide.18 At present, however, these possibilities are speculative and should prompt further in vitro and in vivo studies.
We are grateful to Baylor Human Polymorphism Resource for providing us with the DNA samples of 4 ethnic groups and to Xiangjun Gu and Wei V. Chen at the UT MD Anderson Cancer Center for their assistance in data analysis.
Josef T. Prchal, Section of Hematology/Oncology, Baylor College of Medicine, One Baylor Plaza, MS 525D, Houston, TX 77030.
Prepublished online as Blood First Edition Paper, November 6, 2003; DOI 10.1182/blood-2003-07-2550.
Supported by grants from the National Heart, Lung, and Blood Institute (HL66333 and UH1HL03679; J.T.P.); the Howard University General Clinical Research Center grant from the National Institutes of Health (2MO1 RR10284; V.G.); a grant from the National Human Genome Research Institute (R01HG02275; C.I.A.); and the Northern Ireland Leukaemia Research Fund (T.R.J.L.).
E.L. and M.J.P. contributed equally to this study.
- Submitted August 7, 2003.
- Accepted October 26, 2003.
- Copyright © 2004 by The American Society of Hematology