A 73-year-old man was diagnosed with primary myelofibrosis (PMF) after abnormalities were incidentally discovered in a blood test performed to monitor his diabetes mellitus. The patient was asymptomatic. The spleen was palpable 6 cm below the left costal margin. Hemoglobin was 10.9 g/dL; white blood cell count was 13.2 × 10⁹/L, with a leukoerythroblastic blood picture and 2% blasts; platelet count was 387 × 10⁹/L; and serum lactate dehydrogenase level was 1087 U/L. BCR/ABL was negative, and the JAK2 V617F mutation was detected. Bone marrow cytogenetic study disclosed a normal 46,XY karyotype, and the marrow biopsy was typical of myelofibrosis. According to the International Prognostic Scoring System (IPSS),1 the patient had intermediate-2 risk PMF because of age over 65 years and blood blasts ≥1%. Median survival of patients with intermediate-2 risk PMF is 4 years. Given that the patient was asymptomatic and not eligible for allogeneic stem cell transplantation, should he receive ruxolitinib to try to prolong his survival?
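The IPSS assignment in this case is a simple additive count. The Python sketch below encodes the five IPSS adverse factors at diagnosis as published (age >65 years, constitutional symptoms, hemoglobin <10 g/dL, white blood cell count >25 × 10⁹/L, and blood blasts ≥1%), one point each, with 0, 1, 2, and ≥3 points defining low, intermediate-1, intermediate-2, and high risk. The class and function names are hypothetical, and the code only illustrates the arithmetic; it is not a validated clinical tool.

```python
from dataclasses import dataclass


@dataclass
class PatientAtDiagnosis:
    """Hypothetical container for the five IPSS variables."""
    age_years: float
    hemoglobin_g_dl: float
    wbc_10e9_per_l: float
    blood_blasts_pct: float
    constitutional_symptoms: bool


def ipss_score(p: PatientAtDiagnosis) -> int:
    """Count IPSS adverse factors at diagnosis (1 point each)."""
    adverse = [
        p.age_years > 65,
        p.constitutional_symptoms,
        p.hemoglobin_g_dl < 10,
        p.wbc_10e9_per_l > 25,
        p.blood_blasts_pct >= 1,
    ]
    return sum(adverse)  # True counts as 1, False as 0


def ipss_category(score: int) -> str:
    """Map the point total to the IPSS risk group (3 or more points = high)."""
    return {0: "low", 1: "intermediate-1", 2: "intermediate-2"}.get(score, "high")


# Values taken from the case described above.
case = PatientAtDiagnosis(age_years=73, hemoglobin_g_dl=10.9,
                          wbc_10e9_per_l=13.2, blood_blasts_pct=2,
                          constitutional_symptoms=False)
score = ipss_score(case)
print(score, ipss_category(score))  # 2 intermediate-2
```

Only age and blood blasts score in this patient: a hemoglobin of 10.9 g/dL and a white blood cell count of 13.2 × 10⁹/L both fall on the favorable side of their cutoffs, and he had no constitutional symptoms.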
Myelofibrosis (MF) is a clonal proliferation of a pluripotent hematopoietic stem cell that can appear de novo (primary myelofibrosis or PMF) or following previously known essential thrombocythemia or polycythemia vera (post–ET or post–PV MF).2 The disease is largely driven by mutations in the JAK2, calreticulin (CALR), or MPL genes, which abnormally activate the cytokine receptor/JAK2 pathway and its downstream effectors; additional molecular abnormalities are frequently found.3 Clinical presentation is heterogeneous.1 Thirty percent of patients are asymptomatic at diagnosis, but most complain of symptoms from anemia and splenomegaly or of constitutional symptoms (weight loss, night sweats, or low-grade fever). As the disease evolves, all patients become symptomatic because of marrow failure, increasing splenomegaly, and constitutional symptoms. Aquagenic pruritus, bone pain, infections, thrombosis, and extramedullary hematopoiesis in sites other than the spleen and liver can also occur, and some patients evolve to acute leukemia.4
Before 1995, median survival of MF patients was around 5 years.1 Later, but still before the introduction of JAK inhibitors, it increased to almost 7 years,5 an improvement ascribed to earlier diagnosis and better medical care. Except for allogeneic stem cell transplantation, which is applicable to only a minority of patients, no curative treatment of the disease currently exists, and therapy is essentially palliative, aimed at controlling disease symptoms. Conventional options include observation of asymptomatic patients, anemia-alleviating agents, cytoreductive drugs such as hydroxyurea, splenic irradiation, and splenectomy.4 The introduction of ruxolitinib has changed the therapeutic landscape of MF.
Ruxolitinib is the only JAK inhibitor approved for the treatment of patients with MF. Like all agents of this class, the drug mainly inhibits dysregulated JAK-STAT signaling, which is present in all MF patients irrespective of their JAK2 mutational status; because it is not selective for mutated JAK2, it is effective in both JAK2-positive and JAK2-negative MF. Ruxolitinib is highly effective in reducing spleen size and controlling the symptoms of MF, resulting in a marked improvement in the patients’ quality of life.6 Its approval was based on the results of 2 randomized clinical trials, Controlled Myelofibrosis Study with Oral JAK Inhibitor Treatment-I (COMFORT-I)7 and COMFORT-II,8 which compared ruxolitinib with placebo or best available therapy (BAT), respectively. However, the regulatory agencies approved discordant indications: whereas the Food and Drug Administration approved the drug for patients with intermediate- or high-risk MF, the European Medicines Agency approved it for the treatment of splenomegaly and/or constitutional symptoms of MF, irrespective of risk group.
The therapeutic effect of ruxolitinib is usually dramatic but drug-dependent: drug discontinuation or dose reduction is rapidly followed by spleen regrowth and reappearance of symptoms. Moreover, there is no clear indication of a disease-modifying effect. Indeed, patients do not achieve a complete or partial response and, quite often, not even a complete hematologic response. Reduction in the JAK2 V617F allele burden is usually modest, and improvement in bone marrow fibrosis is seen only in a minority of patients. In the absence of responses by conventional criteria, the possible survival prolongation has been ascribed to improvement in the patients’ performance status due to cytokine modulation. However, the effect of ruxolitinib on survival is a matter of controversy. Although a survival benefit has been reported for patients receiving ruxolitinib compared with those treated with placebo, BAT, or historical controls, a Cochrane Review concluded that the evidence was insufficient to draw any conclusion about the efficacy of the drug in MF, mainly because the phase 3 trials lacked the statistical power to detect a possible survival gain.9 Other authors reached a similar conclusion.10
We searched the Medline database for references on ruxolitinib treatment, considering only full published articles that analyzed the effect of the drug on survival and included a comparator (historical controls, placebo, or BAT). For this reason, studies of patients with intermediate-1 risk MF were not considered, as they had no comparators. Only a few studies fulfilled these criteria: (1) the 2 randomized clinical trials COMFORT-I and COMFORT-II, which enrolled patients with primary and post–PV/ET MF classified in the IPSS intermediate-2 and high-risk categories; (2) several updates of both trials; and (3) 3 case-control studies comparing the survival of patients treated with ruxolitinib with that of historical controls. These studies were evaluated for methodological quality using established criteria for randomized controlled trials and observational studies.11,12 Confidence in the estimates was assessed according to the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) methodology.13 Because of the limited number of studies, we preferred a narrative rather than a systematic review format and allowed some departure from strict GRADE methodology.
COMFORT-I was a double-blind, randomized, placebo-controlled phase 3 trial conducted in the United States, Canada, and Australia.7 Enrollment began in 2009 and included 309 adult patients who met the following criteria: PMF or post–PV/ET MF; IPSS risk category intermediate-2 or higher; Eastern Cooperative Oncology Group performance status ≤3; palpable splenomegaly ≥5 cm below the left costal margin; platelet count ≥100 × 10⁹/L; blood blasts <10%; and blood CD4 cell count >20 × 10⁶/L. COMFORT-II was a nonblinded phase 3 trial conducted in several European countries,8 in which patients were randomized 2:1 to ruxolitinib or BAT. Inclusion criteria were similar to those of COMFORT-I, except for the CD4 cell count; from July 2009 to January 2010, a total of 219 patients were enrolled, 146 to ruxolitinib and 73 to BAT.
The primary endpoint was the reduction in spleen volume, assessed by imaging, at 24 weeks of treatment in COMFORT-I and at 48 weeks in COMFORT-II. Overall survival and symptom reduction were secondary endpoints. Both trials established criteria allowing discontinuation of the assigned therapy and entry into an extension phase in which patients allocated to placebo (COMFORT-I) or BAT (COMFORT-II) could receive ruxolitinib. In COMFORT-I, crossover to ruxolitinib was permitted even before week 20, and all patients remaining in the control arms of the 2 trials eventually crossed over to ruxolitinib.14 The planned survival analysis according to the intention-to-treat (ITT) principle was performed after a median follow-up of 51 weeks in COMFORT-I and 61.1 weeks in COMFORT-II. No survival difference between the 2 arms was observed in COMFORT-II, whereas an advantage for the ruxolitinib arm was seen in COMFORT-I (8.4% of patients in the ruxolitinib arm had died vs 15.6% in the placebo arm; hazard ratio [HR]: 0.5, 95% confidence interval [CI]: 0.25-0.98; P = .04).
At these cutoff times, the mortality rates expected from the original IPSS series were ∼15% for the high-risk group and ∼10% for the intermediate-2 risk group.1 The observed mortality was 37 deaths (12% of the initial series) in COMFORT-I and 15 (6.7%) in COMFORT-II. A statistical power calculation based on the expected mortality shows that, at these early cutoff times, 247 events would have been necessary to detect a 30% reduction in mortality, and 65 events to detect a 50% reduction. Both trials therefore recorded fewer deaths than needed even to detect a 50% reduction, and both COMFORT-I and COMFORT-II were severely underpowered to estimate the effect of treatment on survival.
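The article does not report the inputs of this power calculation. As a cross-check, Schoenfeld's approximation for a two-sided log-rank test, d = (z(1 - α/2) + z(1 - β))² / [p(1 - p) × (ln HR)²], where d is the required number of deaths, p the fraction of patients allocated to the treated arm, and HR the target hazard ratio, reproduces both figures if one assumes α = 0.05, 80% power, and 1:1 allocation. These inputs are our assumptions, not values stated by the authors; the Python sketch below simply evaluates the formula.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, alpha: float = 0.05,
                      power: float = 0.80, p_treated: float = 0.5) -> float:
    """Deaths needed to detect `hazard_ratio` with a two-sided log-rank test.

    alpha, power and p_treated (fraction allocated to the treated arm) are
    assumed inputs; the article does not state which values were used.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_sum = z(1 - alpha / 2) + z(power)
    return z_sum ** 2 / (p_treated * (1 - p_treated) * math.log(hazard_ratio) ** 2)


for reduction in (0.30, 0.50):
    d = schoenfeld_events(1 - reduction)  # a 30% reduction means HR = 0.7
    print(f"{reduction:.0%} reduction: {d:.1f} events")
```

Under these assumptions the sketch prints about 246.8 events for a 30% reduction (HR 0.7) and 65.3 events for a 50% reduction (HR 0.5), consistent with the 247 and 65 quoted above. With the 2:1 allocation used in COMFORT-II (p_treated = 2/3), p(1 - p) falls from 1/4 to 2/9, so 12.5% more events would be needed.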
Updates of the COMFORT studies
Patients enrolled in the COMFORT studies were followed up over an extended, noncontrolled study phase, and the results were reported at 2 and 3 years for COMFORT-I,15,16 at 3 and 5 years for COMFORT-II,17,18 and at 3 years for both trials combined.14 Because many patients allocated to the control arms of both trials crossed over to active treatment with ruxolitinib, a survival analysis based on the ITT principle may fail to provide an accurate estimate of the treatment effect. In some of these follow-up reports, the rank-preserving structural failure time (RPSFT) model was used to correct the survival estimates for crossover.19 The RPSFT model works by reconstructing the survival of patients as if they had never received active treatment: time spent on treatment is shrunk by successive numerical factors until the reconstructed survival curves of the experimental and control arms coincide. The model assumes that treatment acts by lengthening survival time by this factor. When this method was applied, the nominal data suggested a survival advantage for ruxolitinib over placebo and BAT. However, the RPSFT model is a theoretical construct. The accuracy of survival estimated by the Kaplan-Meier method, and of HRs derived from Cox regression, can always be checked against the actual data once all individuals have reached the outcome. By contrast, the accuracy of the RPSFT model can never be tested, because there is no observable reality against which to compare its estimates.
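To make the mechanics concrete, the following Python sketch applies the RPSFT idea to a hypothetical, deliberately idealized dataset with no censoring. Each patient's observed time is split into time off and time on ruxolitinib; the counterfactual untreated time is computed as U = T_off + a × T_on for candidate factors a (a = exp(ψ) in the usual notation); and a is chosen where a log-rank test finds no difference in U between the randomized arms. The data, the crossover time, the grid, and all function names are invented for illustration, and a real analysis must also handle censoring (including recensoring), which this sketch omits.

```python
import math


def logrank_z(times_a, times_b):
    """Log-rank Z statistic for two samples in which every time is a death."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted(set(times_a) | set(times_b)):
        n_a = sum(x >= t for x in times_a)  # at risk in arm A
        n_b = sum(x >= t for x in times_b)
        d_a = sum(x == t for x in times_a)  # deaths in arm A at time t
        d = d_a + sum(x == t for x in times_b)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return o_minus_e / math.sqrt(var)


def counterfactual(arm, a):
    """Untreated-time estimate U = T_off + a * T_on for each (T_off, T_on) pair."""
    return [t_off + a * t_on for t_off, t_on in arm]


def rpsft_grid_search(arm_rux, arm_ctl, grid):
    """Return the factor a in `grid` that best balances U across the arms."""
    return min(grid, key=lambda a: abs(logrank_z(counterfactual(arm_rux, a),
                                                 counterfactual(arm_ctl, a))))


# Hypothetical data: untreated survival times 1..20 months in both arms, and a
# built-in effect that doubles time spent on treatment (true a = 0.5).
untreated = range(1, 21)
switch = 8  # control patients still alive at month 8 cross over
arm_rux = [(0.0, 2.0 * u) for u in untreated]
arm_ctl = [(float(u), 0.0) if u <= switch else (float(switch), 2.0 * (u - switch))
           for u in untreated]

grid = [i / 20 for i in range(6, 21)]  # candidate a = 0.30, 0.35, ..., 1.00
a_hat = rpsft_grid_search(arm_rux, arm_ctl, grid)
print(a_hat)
```

Because this toy dataset was generated from the model itself, the search recovers the built-in factor of 0.5. With real data, the same search always returns some factor, but, as noted above, there is no observed untreated survival against which that factor can be checked.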
Risk of bias or internal validity
Both COMFORT studies, as well as their successive updates, were sponsored by industry (Incyte Co. and Novartis), which is the usual practice in trials of new drugs. The data were analyzed and interpreted by the sponsors’ clinical and statistical teams, and the investigators collaborated in the interpretation of the results. The investigators were transparent in declaring conflicts of interest, and many of them had received honoraria or research funds from the sponsors or were affiliated with them. It has been pointed out that these facts must be taken into account when judging the risk of bias.20 On the positive side, an independent board reviewed the data, and randomization used a method that facilitates an even distribution of potential confounders and guarantees allocation concealment.
Lack of blinding may contribute to post-allocation selection bias and to ascertainment bias. As previously noted,10 blinding in COMFORT-I was likely to be imperfect, because the rapid disappearance of symptoms in patients under active treatment allowed physicians to easily guess whether a patient was in the treatment or the placebo arm. Moreover, COMFORT-II and the subsequent follow-ups of both trials were not blinded. Lack of blinding may influence the physician’s decision to discontinue therapy in the control group and increase the adherence of patients who know that they are receiving the new drug (performance bias). These factors may underlie the higher discontinuation rate in the placebo arm in COMFORT-I (18.2% vs 7.7%, after excluding deaths; P = .006) and in the BAT arm in COMFORT-II (52.0% vs 33.5%, after excluding deaths; P = .008). Differences in discontinuation rates may indicate so-called “informative censoring,”21 a bias in which loss to follow-up is not random but linked to the intervention under study or to an uneven distribution of prognostic factors between the 2 study arms. Although lack of blinding rarely biases the assessment of objective outcomes such as all-cause mortality (ascertainment bias), it may erode the even distribution of confounders produced by randomization.22
Uncontrolled follow-ups of randomized trials are exposed to several sources of bias that bring them close to observational studies in terms of the quality of evidence on the magnitude of the treatment effect, despite retaining the label “randomized trial.”23 Confounding bias may arise from differential use of concomitant therapies, intensity of care, or selective nonadherence. For instance, physicians may follow patients in the treatment arm more closely because they are less familiar with the possible side effects of the new drug. Closer follow-up implies earlier recognition and treatment of complications and comorbidities, which might confer a survival advantage on these patients. Selection bias may arise from differential loss to follow-up due to higher, unregistered mortality in 1 of the trial arms. Some of these biases may underlie the differences in censoring rates already mentioned in the updates of COMFORT-I and COMFORT-II. Finally, the appropriateness of using placebo as the control arm for evaluating new therapies (as in COMFORT-I) is questionable, because such an approach does not reflect real clinical practice, especially in diseases such as MF that carry an important symptom burden.
The original COMFORT trials were severely underpowered to provide a precise estimate of the effect of treatment on survival, owing to the short follow-up and the small number of events at the time of the cutoff analysis. The follow-up updates include more events. Nevertheless, because of the high rate of crossover to the new therapy in both trials, ITT-based measures of differential survival should be regarded as imprecise estimates of the “true” treatment effect. Correcting for crossover with the RPSFT model was intended to overcome this source of imprecision, but the correction depends critically on the assumptions behind the model (see above), which are transparent but not testable: there is no observable reality against which its results can be checked.
The impact of ruxolitinib therapy on the survival of MF patients harboring non–myeloproliferative neoplasm driver mutations (such as ASXL1, EZH2, SRSF2, IDH1/2, and others) has been analyzed with contradictory results. In a subgroup of 166 patients from the COMFORT-II study, ruxolitinib appeared beneficial in terms of splenomegaly, symptoms, and survival even in the minority of patients carrying these mutations.24 However, a similar analysis of 95 patients from the phase 1/2 study of ruxolitinib showed a worse outcome for patients with ≥3 driver and nondriver mutations.25 Future studies should therefore include information on the patients’ mutational status.
In a sponsor-independent analysis, Tefferi et al26 compared the long-term outcome of 51 MF patients treated with ruxolitinib in the phase 1/2 trial (NCT00509899)6 with that of 410 historical controls from the Mayo Clinic. No differences in raw survival or Dynamic International Prognostic Scoring System-Plus (DIPSS-Plus) adjusted survival were found, but insufficient data were provided to judge the quality of the analysis.
Verstovsek et al27 compared 101 patients from the NCT00509899 trial6 treated at MD Anderson Cancer Center with historical controls from 1 American and 2 Italian databases, selected to meet the trial inclusion criteria. After a median follow-up of 32 months, survival was significantly better in the ruxolitinib group than in the controls, but the benefit was restricted to patients in the IPSS high-risk category (HR 0.50, 95% CI: 0.31-0.81; P = .008). Patients receiving ruxolitinib had higher leukocyte counts and larger spleens, whereas controls were older (69% vs 50% over 65 years) and had slightly lower hemoglobin levels. The net effect of these imbalances cannot be readily determined. Notably, most, if not all, historical controls seemed to have PMF (data not shown), whereas 47% of the patients enrolled in the NCT00509899 trial had post–PV/ET MF. This potential imbalance may not be trivial, because post–PV/ET MF has been associated with longer survival than PMF in patients receiving ruxolitinib.15
More recently, Passamonti et al28 compared 100 patients with PMF assigned to the ruxolitinib arm of COMFORT-II with 350 historical controls selected from the multicenter DIPSS database. The timeline for survival analysis began at diagnosis in both groups, but patients were considered at risk only from the start of ruxolitinib (COMFORT-II group) or from the first documented progression to IPSS intermediate-2 or high risk (DIPSS group). After a median follow-up of 2.5 years for the COMFORT-II patients and 2.6 years for the DIPSS patients, 30 (30%) deaths occurred in the COMFORT-II group and 258 (86%) in the DIPSS group. Median survival was 5 years (95% CI: 2.9-7.8) for the COMFORT-II group and 3.5 years (95% CI: 3.0-3.9) for the DIPSS patients. Of note, the groups were mismatched for an important prognostic factor that penalized the DIPSS patients: the minimum platelet count threshold (100 × 10⁹/L) used in COMFORT-II was not applied to the DIPSS controls, so that 25% of the latter had platelet counts <100 × 10⁹/L. Thrombocytopenia is a well-known poor prognostic factor in PMF, as shown in the original IPSS series,1 where it was strongly correlated with anemia, as well as in the DIPSS-Plus classification29 and in other studies.30 Thrombocytopenia has also been linked to a higher frequency of evolution of MF to acute leukemia.31 Moreover, in the pooled analysis of the COMFORT studies, higher baseline platelet counts correlated with a lower risk of death.14 The groups were also mismatched for spleen size, with larger spleens in COMFORT-II, although spleen size has never been identified as an independent prognostic factor in modern MF risk classifications. Finally, the COMFORT-II patients seemed to be younger at MF diagnosis than the DIPSS patients (median age: 61 vs 67 years), whereas both groups had a roughly similar median age at the start of the survival analysis (68 years in COMFORT-II vs 67 years in DIPSS).
Therefore, patients in the DIPSS group quite likely evolved more quickly into the intermediate-2 and high-risk categories, which may have selected an MF population with a more aggressive course.
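The analytic device described for this comparison, in which time is counted from diagnosis but patients join the risk set only later, is what is usually called delayed entry or left truncation. The Python sketch below shows, on 4 hypothetical patients, how a Kaplan-Meier estimator can honor it: a patient counts as at risk at time t only if entry < t ≤ exit. The article does not say exactly which estimator the cited study used, so this is an illustration of the general technique, with invented data and names.

```python
def km_delayed_entry(start, stop, died):
    """Kaplan-Meier survival with delayed entry (left truncation).

    Time is measured from diagnosis. Patient i is in the risk set at time t
    only if start[i] < t <= stop[i]; died[i] is True for a death at stop[i].
    """
    curve, surv = [], 1.0
    for t in sorted({x for x, d in zip(stop, died) if d}):
        at_risk = sum(1 for s, x in zip(start, stop) if s < t <= x)
        deaths = sum(1 for x, d in zip(stop, died) if d and x == t)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


start = [0, 0, 2, 3]              # years from diagnosis when observation begins
stop = [1, 4, 5, 6]               # years from diagnosis at death or censoring
died = [True, True, True, False]  # the last patient is censored at 6 years

curve = km_delayed_entry(start, stop, died)
naive = km_delayed_entry([0] * 4, stop, died)  # entry times ignored
print([(t, round(s, 3)) for t, s in curve])
print([(t, round(s, 3)) for t, s in naive])
```

With entry times honored, estimated survival at 5 years is 1/6 (about 0.167); setting every entry time to 0, which counts patients as at risk before they were under observation, inflates it to 0.25. How each group's entry time is defined therefore directly shapes the survival curves being compared.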
Finally, studies with historical controls carry an increased risk of selection and recall biases, which reduces their internal validity. Moreover, important factors may change over time, such as diagnostic criteria, the distribution of prognostic factors, and the quality of care. In the ruxolitinib studies, the imbalances described above between treated patients and historical controls further erode internal validity.
Table 1 summarizes the studies of ruxolitinib in MF and the estimates of the treatment effect on survival that are derived from the clinical trials, their successive updates, and the case-control studies.
The results of the present analysis indicate that the evidence supporting survival prolongation by ruxolitinib therapy in MF patients is weak, owing to the methodological caveats of the available studies. As previously mentioned, studies using historical controls are generally considered a poor source of evidence; moreover, in 1 such study that supported a survival advantage for patients on ruxolitinib, the control group was penalized. The phase 3 trials were severely underpowered to estimate the effect of ruxolitinib on survival, and early crossover further weakened their capacity to show survival differences between the groups. In addition, there is no biological evidence, such as the achievement of a complete or partial remission, a cytogenetic or molecular response, or reversal of bone marrow fibrosis, that could support a favorable effect of ruxolitinib on the survival of MF patients. Therefore, despite the suggestion of a survival advantage for patients receiving ruxolitinib, appropriately designed phase 3 studies free of these caveats would be needed to demonstrate a survival benefit of this therapy. This does not, however, undermine the efficacy of the drug in controlling 2 of the 3 main clinical manifestations of MF (namely, splenomegaly and constitutional symptoms), which has a profound impact on the patients’ quality of life, as seen not only in clinical trials32 but also in daily practice. In this sense, ruxolitinib can be considered the current BAT for these 2 clinical manifestations of MF.
On the basis of the existing evidence, ruxolitinib is recommended for the treatment of intermediate-2 and high-risk MF patients with symptomatic splenomegaly and/or constitutional symptoms. Given the efficacy of ruxolitinib in this setting, its use in intermediate-1 risk MF patients with these symptoms seems reasonable, although the evidence is limited. By contrast, ruxolitinib should not be used in MF patients with the sole purpose of prolonging survival. These are also the recommendations of a recent consensus document issued by an expert panel on behalf of the European LeukemiaNet and the Italian Society of Hematology.33
Ruxolitinib was not prescribed to our patient, who had intermediate-2 risk MF but no spleen-derived or constitutional symptoms.
Contribution: F.C. designed the study, participated in the review and selection of the publications and in the data analysis, and wrote the manuscript; A.P. participated in the critical review of the studies, performed the data analysis, and wrote the manuscript.
Conflict-of-interest disclosure: F.C. received honoraria from Novartis and Incyte. A.P. declares no competing financial interests.
Correspondence: Francisco Cervantes, Hematology Department, Hospital Clínic, Villarroel 170, 08036 Barcelona, Spain; e-mail:.
This study has been supported in part by grant RD012/0036/0004 from the Instituto de Salud Carlos III, Spanish Ministry of Health.
- Submitted November 4, 2016.
- Accepted December 23, 2016.
- © 2017 by The American Society of Hematology