Recently, a study was published in the New England Journal of Medicine, entitled Antenatal Betamethasone for Women at Risk of Late Preterm Delivery. The authors conclude that, “Administration of betamethasone to women at risk for late preterm delivery significantly reduced the rate of neonatal respiratory complications.” So should we start doing this? We have to consider several things before deciding. What is the quality of the paper? Are the findings clinically significant? Are there any risks or unintended consequences of doing so? Are the findings generalizable?
One of the unfortunate things about this paper, and many like it, is the overuse of, and reliance on, p-values. Just because a set of data can be manipulated into having a p-value less than 0.05 doesn’t mean that the finding is “true” or “statistically significant.” And it certainly doesn’t mean that it is clinically significant. But in the lust for significant p-values, a lot of harm to the scientific community is done.
I won’t spend too much time here on p-values, but you may be unaware of the quiet revolution happening in statistics. Recently, the American Statistical Association released a report on proper use of p-values. Fivethirtyeight.com has a wonderful review of the new guidelines here. Please don’t pretend to create or read scientific literature without first understanding these issues. The bottom line: p-values are hacked, manipulated, and abused, and don’t represent what authors and readers think they do. I will talk about this elsewhere, but for now I would like to list the six bullet points from the ASA’s statement:
- P-values can indicate how incompatible the data are with a specified statistical model.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
With this understanding of p-values, let’s look at this paper in more detail.
What was the null hypothesis?
Unfortunately, this was not explicitly stated in the paper; however, the intent was to determine whether the administration of betamethasone in the late preterm period (34-36.5 weeks) would decrease respiratory and other neonatal complications. The primary outcome was a composite of need for CPAP or high flow nasal cannula for ≥ 2 hours, need for ≥ 30% inspired oxygen for ≥ 4 hours, need for mechanical ventilation, need for ECMO, or neonatal death ≤ 72 hours after birth. The study was powered appropriately only for this composite outcome.
It was not powered to study secondary outcomes. The authors also state that they chose not to perform adjustments for multiple comparisons in the secondary outcomes. Not doing this important step is a common form of p-hacking and makes any conclusions drawn from secondary outcomes questionable. Some of the secondary outcomes included were severe respiratory complications (defined as need for CPAP or high flow nasal cannula for ≥ 12 hours or need for ≥ 30% inspired oxygen for ≥ 24 hours), need for resuscitation, transient tachypnea of the newborn, surfactant use, etc.
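To make the idea of adjusting for multiple comparisons concrete, here is a minimal sketch of one common method, the Holm-Bonferroni step-down procedure. The authors did not use this (or any) adjustment, and the p-values below are hypothetical placeholders chosen for illustration, not figures from the paper; the function name `holm_adjust` is also my own.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must never decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# HYPOTHETICAL p-values for five secondary outcomes (not from the paper).
hypothetical = {
    "outcome A": 0.001,
    "outcome B": 0.02,
    "outcome C": 0.03,
    "outcome D": 0.04,
    "outcome E": 0.30,
}

names = list(hypothetical)
adjusted = holm_adjust([hypothetical[name] for name in names])

for name, adj in zip(names, adjusted):
    verdict = "survives" if adj < 0.05 else "does not survive"
    print(f"{name}: raw p = {hypothetical[name]:.3f}, adjusted p = {adj:.3f} ({verdict})")
```

In this made-up example, four of the five raw p-values are below 0.05, but only the smallest one stays below 0.05 after adjustment. That is the sense in which unadjusted secondary outcomes are questionable.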
Who did they enroll?
The authors enrolled women at risk of preterm delivery between 34-36.5 weeks and divided them into three main groups: preterm labor, ruptured membranes, and other medical or indicated deliveries, such as GHTN, IUGR, oligohydramnios, etc. About half of the women enrolled were in the medically indicated group and about half were either in the preterm labor or ruptured membranes group. About 31% of the patients had cesarean deliveries.
What did they find?
They found that the primary outcome, the composite of respiratory markers, was decreased from 14.4% in the placebo group to 11.6% in the steroid group, with a p-value of 0.02. More specifically, among the endpoints that make up the composite, only need for CPAP or high flow nasal cannula for ≥ 2 hours was different in the two groups. There was no difference in need for 30% oxygen for ≥ 4 hours or need for mechanical ventilation. None of the infants in the study needed ECMO and none died within 72 hours of birth, though two did die before discharge home, both in the betamethasone group.
Again, among the secondary neonatal respiratory outcomes, there was no adjustment made for multiple comparisons so the p-values given are mostly irrelevant. However, it is likely that one secondary outcome would have survived such adjustments: CPAP or high flow nasal cannula for ≥ 12 hours. This was present in 10.5% of the placebo group and in 6.5% of the betamethasone group. This outcome accounts for what the authors term “severe respiratory complication,” though it is arguable whether this should qualify for the term ‘severe.’ Other secondary outcomes with p-values less than 0.05 include need for resuscitation at birth, transient tachypnea of the newborn, bronchopulmonary dysplasia, and surfactant use. It is not clear if any of these would have had significant p-values if adjustments were made for multiple comparisons, though transient tachypnea of the newborn has the lowest p-value in this group.
The authors also report on other, non-respiratory, neonatal outcomes. Again, no adjustment was made for multiple comparisons. Of these, the only significant finding which would likely survive such analysis was neonatal hypoglycemia, which occurred in 24% of infants exposed to betamethasone and only 15% of those exposed to placebo. In fact, overall, this was the most “significant” finding in the study. There was no difference in things like admissions to NICU or intermediate care nursery, length of hospital stay, feeding difficulties, sepsis, necrotizing enterocolitis, hypothermia, or hyperbilirubinemia.
What did they conclude?
The administration of antenatal betamethasone in women at risk for late preterm delivery significantly decreased the rate of respiratory complications in newborns. Betamethasone administration significantly increased the rate of neonatal hypoglycemia but not the rates of other maternal or neonatal complications.
The authors do not comment on magnitude of effect. They do not use more explanatory terms like ‘number needed to treat’ (NNT) or ‘number needed to harm’ (NNH). There are no long-term follow-up data about the infants, so we don’t know if the observed differences have any long-term implications. There are no data or speculation about how a policy of universal administration of betamethasone to women at risk of late preterm delivery might lead to excessive hospitalization with increased cost, or other unintended consequences, like provoking DKA in at-risk mothers (in fact, diabetic women were excluded from the trial).
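For readers unfamiliar with these terms, the NNT (or NNH) is simply the reciprocal of the absolute difference in event rates between the two groups. The sketch below applies that formula to the overall percentages quoted above. It uses the rounded percentages rather than the raw counts, so the results are approximate, and the helper name `number_needed` is my own.

```python
def number_needed(control_rate, treated_rate):
    """Return ("NNT" or "NNH", value) from two event rates given as fractions.

    NNT: treatment lowers the event rate (a benefit).
    NNH: treatment raises the event rate (a harm).
    """
    difference = control_rate - treated_rate
    if difference == 0:
        raise ValueError("No difference between groups; NNT/NNH is undefined.")
    label = "NNT" if difference > 0 else "NNH"
    return label, 1.0 / abs(difference)


# Overall rates as quoted in this post: (placebo, betamethasone).
# These are rounded percentages, so the outputs are approximations.
outcomes = {
    "primary composite outcome": (0.144, 0.116),
    "CPAP or high flow nasal cannula >= 12 hours": (0.105, 0.065),
    "neonatal hypoglycemia": (0.15, 0.24),
}

for name, (placebo, steroid) in outcomes.items():
    label, value = number_needed(placebo, steroid)
    print(f"{name}: {label} of about {value:.1f}")
```

With these rounded inputs, the NNT is roughly 36 for the primary outcome and 25 for prolonged CPAP or high flow nasal cannula, while the NNH for neonatal hypoglycemia is roughly 11.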
The authors do provide more data in the Appendix, and there are several important and relevant pieces of information in these data. I will present the data in terms of numbers needed to treat or harm since these are more meaningful to determine clinical significance. The authors give additional information relating to the composite primary outcome (recall this was really only need for CPAP or high flow nasal cannula for ≥ 2 hours) and they provided separate data for the secondary outcome of severe respiratory complications (which was really only CPAP or high flow nasal cannula for ≥ 12 hours).
For the composite outcome,
- Among women attempting a vaginal delivery, the NNT was 59 to prevent one case
- Among women planning to have a cesarean delivery, the NNT was 10 to prevent one case
- Among women with preterm labor, the NNT was 59 to prevent one case
- Among women with preterm rupture of membranes, the NNT was 68 to prevent one case
- Among women with medically indicated deliveries, the NNT was 24 to prevent one case
- There was no difference at all among women after 36 weeks.
For the secondary outcome of severe respiratory complication,
- Among women attempting a vaginal delivery, the NNH (number needed to harm) was 500 to cause one case
- Among women planning to have a cesarean delivery, the NNT was 13.7 to prevent one case
- Among women with preterm labor, the NNT was 143 to prevent one case
- Among women with preterm rupture of membranes, the NNH was 200 to cause one case
- Among women with medically indicated deliveries, the NNT was 46 to prevent one case
- Among women enrolled ≥36 weeks, the NNT was 112 to prevent one case
Additionally, there was no statistically significant difference in the primary composite outcome or in the severe respiratory complication for male fetuses (the differences were seen only in female fetuses). Again, there was no difference among women enrolled ≥ 36 weeks for the primary outcome. The NNH was 11 for causing an additional case of neonatal hypoglycemia.
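One way to put the subgroup figures above on the same scale as the hypoglycemia figure is to convert each NNT or NNH into events per 1,000 women treated (1,000 divided by the NNT or NNH). The sketch below does only that arithmetic. Note the assumption: the hypoglycemia NNH of 11 is an overall figure, and I am not claiming it applies equally within each subgroup. The names `composite_nnt` and `per_thousand` are mine.

```python
# NNT values for the primary composite outcome, as listed above.
composite_nnt = {
    "attempted vaginal delivery": 59,
    "planned cesarean delivery": 10,
    "preterm labor": 59,
    "preterm rupture of membranes": 68,
    "medically indicated delivery": 24,
}

# Overall NNH for neonatal hypoglycemia, as listed above.
# ASSUMPTION: this is an overall figure, not a subgroup-specific one.
HYPOGLYCEMIA_NNH = 11


def per_thousand(number_needed):
    """Convert an NNT or NNH into events per 1,000 women treated."""
    return 1000.0 / number_needed


for subgroup, nnt in composite_nnt.items():
    prevented = per_thousand(nnt)
    print(f"{subgroup}: about {prevented:.0f} composite events prevented per 1,000 treated")

caused = per_thousand(HYPOGLYCEMIA_NNH)
print(f"overall: about {caused:.0f} extra cases of neonatal hypoglycemia per 1,000 treated")
```

Only the planned cesarean subgroup, at about 100 events prevented per 1,000 treated, reaches the same range as the roughly 91 extra cases of hypoglycemia per 1,000 treated overall.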
So what does all of this mean?
- We can say with some certainty that no benefit was seen after 36 weeks. So the conversation should be limited to women with gestational ages between 34 and 35.5 weeks.
- The benefit was largest among women undergoing planned cesareans and lowest among women undergoing spontaneous vaginal deliveries after ruptured membranes. The authors didn’t supply measures of significance for the subsets but it seems likely that the data are not significant for any category except indicated cesareans. Recall that the NNT for a planned cesarean was only 10 to prevent one case of the primary outcome versus 68 for a woman with premature ruptured membranes. In fact, among women attempting a vaginal delivery and among women with ruptured membranes, the secondary outcome of severe respiratory complication was slightly more likely in the steroid group (though this, like many of these outcomes, was not statistically significant). So women with planned vaginal deliveries likely do not benefit.
- The fact that these data seem only to apply to female fetuses is interesting to say the least. The authors did not comment on this at all, unfortunately. The truth is, this finding reflects, more than anything, the poor statistical value of any of the associations. But, if one were being a purist, one would not give steroids when the fetus was known to be male.
- The authors quickly gloss over the increased risk of hypoglycemia among the infants exposed to betamethasone, but this was, by far, the most significant finding in the study and must give rise to the consideration of potential risks. Since most newborns who are hypoglycemic in this gestational age will be admitted to an intermediate nursery or neonatal intensive care unit, then this risk likely offsets any benefits seen in the subgroups who benefitted from steroid administration (which are basically female fetuses who are born by planned cesarean due to a medical indication). In the groups where there was no benefit from steroids, such as fetuses after 36 weeks or males, then the hypoglycemia represents a risk with no benefit that might otherwise balance it out. This risk/benefit profile likely accounts for the observation that there were no observed differences in admission to NICU or intermediate care nurseries or lengths of hospital stay. Both groups were receiving treatment, albeit for different causes in some cases.
- The rates of neonatal sepsis were also not different in the two groups, but there was one death from sepsis in the steroid group, compared to none in the control group. There are no data presented about the severity of infections, only the frequency.
- We should not assume that reducing the rates of neonatal respiratory problems will translate in the future to lower rates of childhood respiratory issues. These data will become available in time. Also, we should note that neonatal hypoglycemia is associated with an increased risk of developmental delay (with an OR of 2.42), so we also shouldn’t assume that the stark increase in neonatal hypoglycemia is without later consequence.
So should we give the steroids or not?
Well, certainly not to everyone. It is reasonable, based on this and previous studies, to give a course of betamethasone to women who are undergoing a planned cesarean delivery for medical indications. Still, we have to consider which is worse: transient neonatal hypoglycemia or CPAP or high flow nasal cannula for ≥ 2 hours? Essentially, we are trading one for the other. How severe is the respiratory outcome that is being prevented? Remember, there were no differences in the rates of mechanical ventilation. If these are essentially equal outcomes, then ultimately we are not benefitting newborns by giving steroids, and we may be causing harm if the administration of steroids is over-applied to all pregnancies at risk for delivery between 34-37 weeks.
The authors of an accompanying editorial in the Journal also urge restraint in utilizing the data in this study and cite the need for more studies and follow-up studies. They do suggest an interesting compromise that reflects their New Zealand heritage: that we administer steroids up to 34.6 weeks, which is the practice in New Zealand and a recommendation of the World Health Organization. It would be interesting to see what would happen to the data in the current study if it included women only from 35.0-36.6 weeks. It seems probable that all statistical significance would wash away.