Global and organ-specific chronic graft-versus-host disease severity according to the 2005 NIH Consensus Criteria

Sally Arai, Madan Jagasia, Barry Storer, Xiaoyu Chai, Joseph Pidala, Corey Cutler, Mukta Arora, Daniel J. Weisdorf, Mary E. D. Flowers, Paul J. Martin, Jeanne Palmer, David Jacobsohn, Steven Z. Pavletic, Georgia B. Vogelsang and Stephanie J. Lee


In 2005, the National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic GVHD proposed a new scoring system for individual organs and an algorithm for calculating global severity (mild, moderate, severe). The Chronic GVHD Consortium was established to test these new criteria. This report includes the first 298 adult patients enrolled at 5 centers of the Consortium. Patients were assessed every 3-6 months using standardized forms recommended by the Consensus Conference. At the time of study enrollment, global chronic GVHD severity was mild in 10% (n = 32), moderate in 59% (n = 175), and severe in 31% (n = 91). Skin, lung, or eye scores determined the global severity score in the majority of cases, with the other 5 organs determining 16% of the global severity scores. Conventional risk factors predictive for onset of chronic GVHD and nonrelapse mortality in people with chronic GVHD were not associated with NIH global severity scores. Global severity scores at enrollment were associated with nonrelapse mortality (P < .0001) and survival (P < .0001); 2-year overall survival was 62% (severe), 86% (moderate), and 97% (mild). Patients with mild chronic GVHD have a good prognosis, while patients with severe chronic GVHD have a poor prognosis. This study was registered at ClinicalTrials.gov as no. NCT00637689.


Chronic GVHD is a common complication associated with high morbidity and mortality after allogeneic hematopoietic cell transplantation (HCT).1,2 Before 2005, chronic GVHD was only diagnosed after day 100, and the severity was described as “limited” (< 50% body surface area skin involvement or liver involvement only) or “extensive” (involvement of any other target organ, > 50% body surface area involvement or cirrhosis).3 In 2005, the National Institutes of Health (NIH) Consensus Working Group for Diagnosis and Staging recommended organ-specific severity scoring scales and proposed a new definition for global chronic GVHD severity.4 The new global severity score was intended to replace the “limited” versus “extensive” designation with the goal of providing a more clinically informative and discriminating severity measure for use in clinical trials and as an indicator of the need for systemic immunosuppressive treatment. The hope was also that better phenotypic classifications would assist laboratory researchers studying biologic correlates and pathophysiology of chronic GVHD.5 The NIH global severity score uses the numerical scoring system for individual organs to calculate a summary scale according to the number and severity of organs involved.

The Chronic GVHD Consortium is an NIH-funded study group established to test the new chronic GVHD criteria because the scoring system was based on consensus opinion and not empiric data. Using data from a prospective observational cohort study, we sought to: (1) describe the global and organ-specific severity of participants; (2) describe the individual organ contributions within the global categories and ascertain whether all organs need to be scored; (3) assess clinical risk factors for higher global severity; and (4) assess whether global severity predicts nonrelapse mortality and overall survival.


The Chronic GVHD Consortium began patient accrual in 2007 and this report includes the first 298 adult patients from 5 centers: Fred Hutchinson Cancer Research Center, Stanford University, University of Minnesota, Dana-Farber Cancer Institute, and Vanderbilt University. The protocol was approved by the institutional review boards of participating centers, and all patients provided written informed consent in accordance with the Declaration of Helsinki. Eligible patients were HCT recipients age 2 or older with chronic GVHD, diagnosed according to the NIH consensus criteria, and requiring systemic immunosuppressive therapy. Patients with either classic chronic GVHD (without features of acute GVHD) or overlap syndrome (features of both chronic and acute GVHD) were eligible. Cases were classified as incident (enrollment < 3 months after chronic GVHD diagnosis) or prevalent (enrollment 3 or more months after chronic GVHD diagnosis). At enrollment and every 6 months thereafter, standardized data were collected from clinicians and patients as recommended by the NIH Consensus Conference.4,6 Incident cases had an additional assessment at 3 months after enrollment. All patients were followed with serial assessments until their chronic GVHD was resolved for 1 year. Collection of longitudinal data is ongoing. Data collection forms, the ACCESS database structure, and SAS coding programs are available on request from the authors.

Clinician organ severity scoring

A clinical categorical system (0-3) is used for scoring of individual organs that describes the severity for each affected organ taking functional impact into account.4 Eight organs (skin, mouth, eyes, gastrointestinal [GI] tract, liver, lungs, joints, and female genital tract) are assessed. In general, a score of 0 means no manifestations/symptoms, a score of 1 indicates no significant impairment of function or activities of daily living (ADL), a score of 2 reflects significant impairment of ADL but no major disability, and a score of 3 indicates significant impairment of ADL with major disability. The scoring is conducted in the clinic and the only mandated laboratory tests for its completion are liver function tests, although pulmonary function tests are collected if available. An example of the 0-3 organ severity scoring is shown for skin (Figure 1). Clinicians provided the organ scoring information but global severity (mild, moderate, severe) was then calculated from these scores by computer algorithm according to the number and severity of organs reported. Mild disease was 1 or 2 organs (except lung) with score 1. Moderate disease was 3 or more organs with score 1 or lung score 1, or 1 or more organs with score 2. Severe disease was any organ with a score 3 or lung score 2. Lung dysfunction was treated differently than other organ dysfunction based on studies reporting higher mortality.7 In a single case, a patient was asymptomatic at enrollment (score 0 on all organs) although she was still on immunosuppression. This patient was combined with the mild global severity group for analysis.
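The computer algorithm itself is available from the authors only on request, so the following is a minimal sketch of the rules as stated above, not the Consortium's SAS code. The organ keys, the function name `global_severity`, and the "none" label for a patient scoring 0 on all organs are assumptions made for illustration; the study grouped its single asymptomatic patient with the mild category.

```python
from typing import Dict

# Hypothetical organ keys for the 8 organs scored 0-3 by the clinician.
ORGANS = ("skin", "mouth", "eyes", "gi", "liver", "lungs", "joints", "genital")


def global_severity(scores: Dict[str, int]) -> str:
    """Return 'none', 'mild', 'moderate', or 'severe' from 0-3 organ scores.

    Organs missing from `scores` are treated as score 0.
    """
    for organ, score in scores.items():
        if organ not in ORGANS or score not in (0, 1, 2, 3):
            raise ValueError(f"invalid entry: {organ}={score}")

    lung = scores.get("lungs", 0)
    involved = [s for s in scores.values() if s > 0]

    if not involved:
        # Asymptomatic on all organs (assumed label, not an NIH category).
        return "none"
    # Severe: any organ scored 3, or lung scored 2 or higher.
    if max(involved) == 3 or lung >= 2:
        return "severe"
    # Moderate: any organ scored 2, lung scored 1,
    # or 3 or more organs scored 1.
    if max(involved) == 2 or lung == 1 or len(involved) >= 3:
        return "moderate"
    # Mild: 1 or 2 organs other than lung, each scored 1.
    return "mild"
```

For instance, `global_severity({"skin": 1, "mouth": 1})` falls in the mild category, whereas a lung score of 1 alone is enough for moderate, reflecting the separate handling of lung dysfunction.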

Figure 1

Individual organ severity scoring within global severity categories.

Statistical considerations

For purposes of the main analyses, global severity scoring at the time of study enrollment (whether incident or prevalent case) was used (n = 298 adults). Global severity from either the enrollment visit or follow-up visits (n = 738) was used in the subsequent analyses. Statistics were descriptive for percentages and frequencies of categorical variables. For Karnofsky performance status (KPS), data were divided into tertiles for analysis.

The contribution of individual organ scores to the global severity score at each visit was quantified by calculating whether knowledge of the individual organ score was contributory (ie, helped determine the global severity score) or necessary (ie, was the only score determining the global severity score).
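One way to make these two definitions concrete is sketched below. It is an assumed operationalization, not the analysis code used for this report: an organ is treated as contributory when its own score satisfies a rule for the assigned global category, and as necessary when leaving it unscored (score 0) would lower the global category. The function names and organ keys are hypothetical.

```python
from typing import Dict, Set

RANK = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def global_severity(scores: Dict[str, int]) -> str:
    """Global category from 0-3 organ scores (missing organs count as 0)."""
    involved = [s for s in scores.values() if s > 0]
    lung = scores.get("lungs", 0)
    if not involved:
        return "none"
    if max(involved) == 3 or lung >= 2:
        return "severe"
    if max(involved) == 2 or lung == 1 or len(involved) >= 3:
        return "moderate"
    return "mild"


def contributory_organs(scores: Dict[str, int]) -> Set[str]:
    """Organs whose own score meets a rule for the assigned category."""
    level = global_severity(scores)
    involved = {o: s for o, s in scores.items() if s > 0}
    if level == "severe":
        return {o for o, s in involved.items()
                if s == 3 or (o == "lungs" and s >= 2)}
    if level == "moderate":
        # Single-organ triggers: any score 2, or any lung involvement.
        direct = {o for o, s in involved.items() if s == 2 or o == "lungs"}
        # Count trigger: 3 or more organs with score 1.
        ones = {o for o, s in involved.items() if s == 1}
        by_count = ones if len(ones) >= 3 else set()
        return direct | by_count
    return set(involved)


def necessary_organs(scores: Dict[str, int]) -> Set[str]:
    """Organs whose omission (score set to 0) lowers the global category."""
    full = RANK[global_severity(scores)]
    needed = set()
    for organ, score in scores.items():
        if score == 0:
            continue
        without = dict(scores)
        without[organ] = 0
        if RANK[global_severity(without)] < full:
            needed.add(organ)
    return needed
```

Under this reading, a patient with skin score 3 and lung score 2 has two contributory organs but no necessary organ, because either score alone already yields the severe category.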

Previously reported risk factors associated with the development of chronic GVHD as well as risk factors previously identified to be associated with increased nonrelapse mortality in patients with chronic GVHD were analyzed for their association with global severity of chronic GVHD using logistic regression. Generalized estimating equation (GEE) methods available for logistic regression in SAS Proc Genmod were used to adjust for baseline characteristics and variables that could vary over repeated observations per patient (n = 293 adults, 725 assessments). Missing data (n = 5 patients, n = 13 assessments) are attributable to missing key clinical information.

Nonrelapse mortality was defined as death without prior relapse. Survival was calculated from the time of enrollment, with patients censored at date last known alive. Cox regression was used for hazard ratio (HR) analysis of nonrelapse mortality and survival relative to severity and other risk factors.
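To illustrate how censoring at the date last known alive enters a survival estimate, the sketch below implements the Kaplan-Meier product-limit estimator in plain Python. It is an illustration only: the analyses in this report were run in SAS, the function name `kaplan_meier` is hypothetical, and the sketch does not cover Cox regression or the cumulative incidence calculation for nonrelapse mortality.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], died: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    `times` are follow-up durations from enrollment; `died[i]` is False
    when patient i was censored at the date last known alive.
    """
    if len(times) != len(died):
        raise ValueError("times and died must have the same length")

    data = sorted(zip(times, died))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Censored patients never lower the curve directly; they only shrink the risk set for later death times, which is why short follow-up widens the uncertainty of late estimates.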


Patient and transplantation characteristics of the 298 participants included in this analysis are shown in Table 1. Chronic GVHD characteristics are summarized in Table 2.

Table 1

Patient and transplantation characteristics at enrollment

Table 2

Chronic GVHD characteristics at enrollment

At the time of study enrollment, global chronic GVHD severity according to NIH Consensus Criteria was calculated from reported data as mild in 10% (n = 32), moderate in 59% (n = 175), and severe in 31% (n = 91; Figure 1). Severity distribution was similar across incident (chronic GVHD diagnosis within 3 months of enrollment) and prevalent cases (P = .35). Skin, mouth, and liver were most commonly involved in mild chronic GVHD, while moderate and severe chronic GVHD often involved the skin, mouth, eye, liver, and lung (Table 2). Overall, the global severity assignments were attributable to lung (45%), skin (36%), eye (25%), mouth (15%), liver (12%), joint (11%), genital tract (6%), and GI tract (5%; column 2, Table 3). This means that the lung score contributed to the global severity score in 45% of the visits, even though the global severity could also have been determined from other organ involvement. In the analysis of necessary organ scoring, where the percentages represent the proportion of visits at which the organ must be scored to ascertain the correct global severity (column 3, Table 3), the order was identical but frequencies were lower. The 5 least influential organs each accounted for < 6% of the global severity scores, but failure to score them would mean that global severity would be underestimated in 16% of visits.

Table 3

The importance of scoring individual organs to determine the global severity at each visit

The moderate global severity category was heterogeneous with 18 (10%) classified as moderate because of 3 or more score 1 organ manifestations and 108 (62%) classified as moderate because at least 1 organ had a score of 2 or lung score of 1. Forty-nine patients (28%) would have been classified as moderate by either criterion.

Patients assigned to the severe global category (n = 91) often had score 3 skin (n = 39, > 50% BSA, or deep sclerotic features, or impaired mobility) or score 2-3 lung involvement (n = 44, FEV1 < 60% or lung function score 6 or greater or shortness of breath after walking on flat ground or at rest), accounting for 85% of assignments to the severe category. Scores of 3 in the mouth (n = 6), eye (n = 7), GI tract (n = 2), joints (n = 4), or genital tract (n = 4, females only) occurred in 3%-11% of patients in this category (Figure 1). There was no evidence that a specific pattern of organ involvement in the severe category was associated with nonrelapse mortality (P = .94) or survival (P = .85).

Global severity of chronic GVHD was not associated with previously reported risk factors for chronic GVHD onset such as older age, female donors for male patients, unrelated or HLA-mismatched donors, conditioning intensity, peripheral blood grafts, prior CMV infection, underlying disease, disease status, or prior acute GVHD,8-10 nor with previously defined risk factors for mortality in patients with chronic GVHD, such as time to onset of chronic GVHD or thrombocytopenia (< 100 × 10⁹/L). Of the evaluated factors, only KPS ≤ 70 at time of chronic GVHD diagnosis was associated with higher global severity (Table 4).

Table 4

Risk factors for calculated NIH global severity

The median follow-up of survivors was 18.5 months (range 2-41 months). Higher NIH global severity at enrollment was associated with higher nonrelapse mortality and lower overall survival, overall P values < .0001 (Figure 2). Thrombocytopenia (platelets < 100 × 10⁹/L) was also associated with nonrelapse mortality (HR 3.4: 1.7-6.7, P = .001) and survival (HR 3.1: 1.7-5.6, P = .0006). KPS ≤ 70 at time of chronic GVHD diagnosis was associated with survival (HR 2.1: 1.2-3.8, P = .05). Nonrelapse mortality and survival were not associated with donor type, recipient age, or disease status (Table 5) or with incident or prevalent status or time from transplantation. Two-year nonrelapse mortality was 3% (95% CI, 1%-10%), 9% (4%-15%), and 32% (20%-43%), and 2-year survival was 97% (95% CI, 90%-99%), 86% (80%-92%), and 62% (50%-74%) for mild, moderate, and severe global severity, respectively. The median survival for patients with severe chronic GVHD according to NIH consensus criteria was 30 months, while it has not been reached for patients with mild or moderate chronic GVHD.

Figure 2

Cumulative incidence of nonrelapse mortality (A) and Kaplan-Meier plot of overall survival (B) according to NIH global severity at enrollment, and showing 2-year estimates (95% CI). HR, hazard ratio and 95% CI, compared with mild chronic GVHD.

Table 5

Multivariate analysis of overall mortality and nonrelapse mortality


We analyzed the spectrum of organ involvement and global and organ-specific chronic GVHD severity in 298 adult patients enrolled in the Chronic GVHD Consortium and report several findings. First, the distribution of NIH global severity scores was higher than we expected and did not differ according to duration of chronic GVHD. Second, the moderate chronic GVHD category, which comprised over half (n = 175, 59%) of the patients, appears to be quite heterogeneous and is defined primarily by patients who have at least 1 organ of moderate severity or lung score of 1. Additional refinements to the global severity scoring may be able to distinguish prognostically different subgroups within the moderate category.

Higher NIH global severity was attributable both to a greater number of organs involved and more severe individual organ scoring, with lung, skin, or eye involvement contributing to the global severity score in the majority of visits. Although we had sought to identify at least one organ system that could be deleted from the scoring system without compromising the global severity calculation, we were not able to do so. The 5 least common organs still should be scored because otherwise global severity would be underestimated in 16% of visits.

Our multivariate analysis revealed no association of NIH global severity categories and previously defined risk factors for development of chronic GVHD or nonrelapse mortality,1,3,11-16 other than a low KPS at diagnosis of chronic GVHD. These results suggest that most factors that predict onset and prognosis of chronic GVHD may be different from those that determine functional impairment and symptoms in individual organs. Our observation that KPS ≤ 70% at chronic GVHD diagnosis was associated with moderate-severe chronic GVHD concurrently or at any subsequent assessment point may just reflect functional impairment, such as shortness of breath with exertion, being incorporated into the organ scoring definitions. Low KPS has been previously identified as a risk factor for mortality in patients with chronic GVHD.1,3 There are not yet enough data in the current cohort to know whether certain organ involvements within the global severity categories are most predictive of survival: we could not detect any organ-specific associations in the severe category, but we might have had limited power. For example, eye and skin manifestations are common and often severe but may not be associated with life-threatening complications in the same direct way as lung dysfunction.

Retrospective series have reported mixed results in assessing whether NIH chronic GVHD global severity is associated with overall survival: no correlation was found in one series,17 while worse survival was reported in other series.18-21 These studies were all conducted from chart review rather than prospectively collected information. Our results are based on prospectively collected data and show that global chronic GVHD severity scores calculated according to the NIH suggested scoring algorithm do have prognostic significance. Severe global chronic GVHD was associated with higher nonrelapse mortality and lower survival, with a median survival of 30 months. There was a trend for patients with moderate chronic GVHD to have higher nonrelapse mortality and lower survival than patients with mild chronic GVHD, but this difference is not yet statistically significant, perhaps because of limited sample size, population heterogeneity, and relatively few fatal events during the current period of follow-up. We do not yet have enough cases of recurrent malignancy to analyze whether NIH chronic GVHD severity is associated with the graft-versus-tumor effect.22

Our study has several limitations. First, while we are reporting a very large cohort of patients, these are all adults and reflect the transplantation practices and clinical assessments at a limited number of large institutions with a specific interest in chronic GVHD. Notably, children, ethnic and racial minorities, and other important subgroups are absent or underrepresented in this cohort. Second, because an indication for systemic treatment was required for enrollment into the cohort, patients with very mild chronic GVHD who only needed topical therapy are not represented. Third, because of the extensive nature of the required NIH data elements, we were not able to collect additional clinical information that could have been used to improve the organ-specific grading. The provider assessment is already long, requires specific training, and is challenging in the context of a busy clinical practice. Fourth, our median follow-up of survivors (18.5 months) is still short. Although we were able to demonstrate statistical differences in nonrelapse mortality and overall survival, additional follow-up will allow more nuanced analyses and possible refinements to the global scoring system. Lastly, although individual institutions have collected some biologic samples on study participants, there is no standardized biologic repository across the consortium. Biomarker discovery in chronic GVHD populations which are well characterized phenotypically according to NIH criteria may help elucidate the biologic pathogenesis of the subtypes of chronic GVHD, potentially allowing development of targeted therapies.

In conclusion, we recommend that all organs continue to be scored according to NIH criteria in studies focused on chronic GVHD incidence and severity, although lung, skin, and eye scores contribute the most to global severity scoring. The current NIH global scoring system has prognostic significance and appears to accurately reflect different risks of nonrelapse mortality and overall survival. No apparent clinical factors predict which patients will have severe chronic GVHD, but this group, which comprised 31% of our cohort, had a median survival of only 30 months, justifying study of aggressive new interventions to improve survival in this very high-risk group.


Contribution: S.A., M.J., C.C., M.A., D.J.W., M.E.D.F., P.J.M., and S.J.L. contributed clinical data; B.S. and X.C. performed statistical analysis; S.A., M.J., and S.J.L. designed research and drafted the manuscript; and all authors contributed to analysis and interpretation of data and critical review of the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Stephanie J. Lee, MD, MPH, Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, D5-290, Seattle, WA 98109; e-mail: sjlee{at}


This work was supported by National Institutes of Health/National Cancer Institute grant CA 118953.



  • An Inside Blood analysis of this article appears at the front of this issue.

  • Presented in abstract form at the 52nd annual meeting of the American Society of Hematology, Orlando, FL, December 6, 2010.

  • The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

  • Submitted March 24, 2011.
  • Accepted July 5, 2011.

