Published Date : January 19, 2017
Categories : Cognitive Bias
Imagine living in a different world, a world where our children are not pumped full of dangerous toxins and chemicals at the doctor’s office; a world where women aren’t forced to give birth in hospitals, pumped full of microbiome-destroying antibiotics and love-destroying pitocin; a world where we don’t have to worry about eating genetically modified food, where our crops are grown in a sustainable manner, without chemical pesticides and fertilizers; a world where our animals are not pumped full of hormones and antibiotics and they are allowed to range freely instead of being kept pent up in cages, eating pesticide-ridden, genetically modified corn; imagine a world where our food isn’t processed, stripped of nutrients; where we eat whole, natural foods, where corn syrup doesn’t make up most of our calories and diet drinks and other chemicals don’t comprise most of what we drink.
Imagine a world where doctors aren’t constantly giving us medicines that we don’t need and where we aren’t exploited by hospitals that want to steal our money and our health; imagine a world where kids aren’t constantly exposed to television and violent video games, where they go outside and play and work like regular kids; imagine a world where there are no assault rifles or semiautomatic handguns, killing our children, a world where we haven’t destroyed the purity of our water and air with countless chemicals and pollutants and carbon-based pollution; where smog doesn’t choke our children, a world where they can grow up without being poisoned by the countless chemicals in the plastics that litter our landfills and oceans, where they can truly thrive. Can you imagine this heaven on earth?
Well, you don’t have to just imagine it. It’s a real place. It’s called America, 1850.
Let’s see what life was like in this simpler, more wholesome, natural, organic world, free of the horrors of the modern age.
In 1850, a white male could expect to live to the age of 38.3 years, and a white female would hopefully see 40.5 years. Nonwhites lived only to 32.5 for men and 35 for women. Today, the average person can expect to live to 78.8 years of age.
In 1850, a woman had approximately a 1% chance of death during any given birth, and with an average of around 11 pregnancies, she had about a 1 in 9 chance of dying from pregnancy over her lifetime. Today, a woman in the United States has about a 1 in 2,500 lifetime risk of death due to childbirth, with an average of just over 2 births per woman.
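(A rough back-of-the-envelope check: if each of 11 births carries an independent 1% risk, the lifetime risk is 1 − 0.99^11 ≈ 10.5%, or about 1 in 9.)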
In 1850, a white child had a 21.7% chance of death before the age of 1 and a nonwhite child had a 34% chance of death before age 1. Today, infant mortality (death before age 1) is about 0.5%. Overall, in 1850, about half of children did not make it past the age of 5 years. Today, the proportion of children who die before age 5, including infant deaths, is just over 0.5%.
The ten leading causes of death in 1850 were:
In 2015, the 10 leading causes of death were:
In 1850, the homicide rate was about 20/100k, while today the murder rate is closer to 4.9/100k. Very few people owned guns in America in the 1850s, and none owned assault weapons or semiautomatic pistols. Today in the US there are more guns than people.
In 1850, most children started doing significant work at the age of 5, and children 14 and up worked nearly full time just trying to help the family produce enough food and other resources to survive. Today, children spend that time learning.
In 1850, few homes had indoor plumbing or reliably clean, potable water. Today, few homes don’t. In 1850, no home had refrigeration, pasteurization, or other means of preserving perishable foodstuffs. Today, this is unimaginable.
In 1850, children eating whole-grain, organic, unprocessed diets had relatively high rates of pernicious anemia, rickets, pellagra, scurvy, and other diseases related to vitamin deficiencies. Today, these diseases are practically unknown (in the US), largely due to vitamin fortification of our processed foods.
I don’t have data from 1850, but in 1900, we know that the following numbers of children were affected by the following diseases:
Today, we vaccinate against diphtheria, tetanus, whooping cough (pertussis), polio, measles, mumps, rubella, chickenpox, and the flu. Polio and smallpox have been eliminated and diphtheria nearly eliminated. Deaths from these diseases, like maternal mortality, have been reduced by about 99%, at a time when rising population density should be making these diseases more common, not less.
If you were a child in 1850 who developed Type 1 diabetes, certain leukemias or lymphomas, or a few other diseases, you just died. Today, you live a fairly normal life. If your appendix or ectopic pregnancy ruptured in 1850, you more than likely died. Today, you are easily fixed.
I could go on. But technological improvements in sanitation and hygiene, vaccinations, the development of miracle drugs like antibiotics and insulin, the dramatically improved safety of surgeries ranging from appendectomies to cesarean deliveries, improved farming techniques, fortification of processed foods with vitamins, pasteurization, water purification, refrigeration, and a host of other innovations have made us free to live longer, have more leisure time (to develop things like computers and go to space), live more comfortable and healthier lives, and die of things like cancer and Alzheimer’s disease rather than dysentery and cholera.
So you are right to blame non-organic farming methods, GMOs, vaccines, corn syrup, processed foods, diet drinks, and antibiotics for the rising rates of cancer, heart disease, Alzheimer’s, strokes, and kidney disease. They cause the diseases by allowing you to live to an old enough age to develop them. You’re welcome.
Published Date : January 18, 2017
Categories : Cognitive Bias
A common fallacy we all make is the appeal to accomplishment. This occurs when we claim that an opinion or idea is invalid because the person making the claim doesn’t have the accomplishments, experiences, or perhaps moral standing to make such a claim. “How can you stand there and criticize the chef’s cooking? Have you ever been a chef?” I prefer, however, to call this fallacy an appeal to experience.
The appeal to experience comes in many forms. For example,
I’m sure you can think of more. All of these arguments are logically absurd. You may be wrong about your ideas, and your lack of experience may even contribute to why you are wrong, but you are not wrong simply because you lack the experience. Arguments should stand on their own merits, not the merits of the ones making the arguments. An argument should only be discounted because it lacks merit.
But this fallacy is important for a different reason, and it relates in particular to the practice of anecdotal medicine. Here’s a howardism:
Our experiences lead us to awareness, not insight.
People often confuse awareness and insight. Let’s say you were a victim of a violent crime. This experience might make you a passionate advocate for reducing such crimes and helping victims of such crimes. But your experience alone will not tell you how best to do those things, and, worse, your experience may bias you in such a way that you might never be the best person to solve the problem. You might want to focus all available resources on ineffective measures related only to the one problem that has so viscerally affected you, and by doing so you might actually cause increased crime elsewhere.
Extend this metaphor to healthcare. Someone may be passionate about a certain disease (say, ovarian cancer) because she has been personally affected by it or has taken care of many patients who have died from it. This visceral bias may lead to selective interpretation of data and a desire to do whatever it takes, no matter the unintended consequences, to prevent ovarian cancer. Such passion may lead to advocacy for screening CA-125 tests or screening ultrasounds, both of which have been found to be harmful. But often it takes an objective observer to see the real risk-benefit ratio of a given intervention.
Our experiences make us aware of problems; they allow us to have empathy for others; they focus our attention; but they also bias us in a deep, visceral way, and they may intellectually cripple us, making us unaware of larger, more global issues. The person with experience is often not the right person to make decisions about the things he has personally experienced. When our personal identities are so closely tied to a particular thing, say a way of practicing medicine or teaching or anything else, we may have a problem seeing the forest as we focus too much on our one blade of grass. This is called inattentional blindness.
So how do we gain insight? Through deliberate analysis and study. This needs to be dispassionate and objective, considering all factors fairly. A person with “experience” can do this and her experience can be valuable, but she must carefully check against her own biases. We don’t need to experience a thing to gain insight into it. I don’t need to damage a ureter during a hysterectomy to gain expertise at preventing ureteric injury. I don’t need to perform a series of Zavanelli maneuvers in order to know how to do one when called upon to do so (okay, I’ve done one…).
In medicine, many discussions (a form of argument) with colleagues or patients are begun or ended with “I once had a patient who…” or “In my experience…” or “When I do this procedure…” etc. This may serve the purpose of providing context for the argument and that is fine, but too often these anecdotes are the argument. It is a claim that the experience with that patient or procedure is sufficient to end the debate, and this is the hallmark of anecdotal medicine.
Patients often find this appealing, and many will seek out the most experienced physician, assuming that this is the same as the most competent physician. This makes the assumption that experience is a predicate of insight. Patients often ask, “Have you ever seen a patient with this diagnosis before?” when they mean to ask, “Do you have competency in treating this diagnosis?” These are often not the same question.
One of my favorites (as a male gynecologist) is, “I know you’re a man so you don’t understand what it’s like to have a period….” That’s true, but I am an expert in treating problems associated with the menstrual cycle and I have discussed the menstrual cycle with thousands of women; I’ve read about the menstrual cycle in dozens of books and papers, and I possess the shared and accumulated knowledge of hundreds of years of scientific research. Even better, I’m not personally biased about the menstrual cycle and I don’t believe it causes my toes to hurt, as someone might anecdotally believe who notices that her toes hurt once a month. So while I lack the joy of personally having a period, I am qualified to treat one.
This appeal to experience argument is extended to almost every issue in modern society. Unless you have a child with autism, you can’t decide whether vaccines are related to autism. Unless you are a victim of gun violence, you aren’t entitled to have an opinion about gun laws. Unless you’ve walked a mile in my shoes, don’t tell me how to walk.
But worse than this, the argument in medicine is, “I don’t care what the evidence says, I know from my own personal experience that it works.” I have heard this more times than I care to mention. “I don’t need to read about it, I treat it every day!” “I know the literature says that this doesn’t work, but in my hands I have excellent outcomes.”
For all its faults, evidence based or science based medicine is light-years ahead of anecdotal medicine. It is odd, to me at least, that most physicians still practice anecdotal medicine. Even those who claim to believe in evidence based medicine tend to pick and choose the literature that confirms their biases and ignore the literature that disagrees with them. This is disingenuous at best and dishonest at worst. But humans, especially doctors, like to believe that their experiences are more valuable than those of other people. It makes us feel special to think that we can intuit how to do something and use our genius, rather than just blindly follow best practices.
When you practice anecdotal medicine, you are basing your patient’s health on a few personal experiences. Whether your past experience with the disease is one patient or 1,000, you are still operating on assumptions based on far fewer patients than our collective experiences of thousands or tens of thousands of patients. What’s more, your patients were “collected” in a biased way, without controls, subject to faulty memories, and the prior patient or patients you recall may not match the current patient in front of you. Even worse, you are likely committing the post hoc logical fallacy, which states that since some second thing happened after some first thing, then the first thing must have caused the second. “I gave her an antibiotic for a UTI and her pain went away.” This is an illogical statement. Her pain went away, yes, but you don’t know that it wouldn’t have gone away anyway if you hadn’t diagnosed her with a UTI and treated her. In fact, for all you know, she might have gotten better even sooner if you hadn’t misdiagnosed her and treated her with an antibiotic. An incredible amount of medical practice is based on this fallacy alone.
People really believe in anecdotes, though. Michael Shermer has said,
Anecdotal thinking comes naturally; science requires training.
I have read numerous defenses of anecdotal medicine over the years. Each of these must, ultimately, take the position that our experience with one or two patients is better than our experience with one or two thousand patients. I think this quote from Italian surgeon Nicola Basso sums up the argument:
The fact that a stone, thrown out of the window, rises instead of falls is not statistically significant, but it is a powerful observation.
I’ve always liked that quote, and Basso was a wonderful surgeon. But it’s total bunk. There is so much evidence that gravity is a thing that if you throw a stone out the window and it rises, you had better start coming up with alternative explanations:
Observation of the extraordinary or outlandish doesn’t overturn good evidence. The observation of the stone is not statistically significant for a reason, and this is not a powerful observation, it is a dangerous one. This is the same type of thinking that leads to the conclusion that vaccines cause autism. If the stone really went up, then set up some controls and replicate the finding sufficiently and it will become significant data and we will revise our theories appropriately. But don’t tell me that you ate radish juice for a month and your cancer went away. Until you disprove gravity, I won’t jump out of any windows just because you had a strange experience with a stone.
If you want to explore this type of illogical thinking more, here is a blog that discusses the power of anecdotal proof that vaccines cause autism. If you have the stomach, give it a quick look. The illogic, rhetoric, and clichés are simply amazing. What bothers me, though, is not when this illogic is applied to subjects like autism, but when the exact same logic is applied to things like magnesium tocolysis. Anecdotal evidence is anecdotal evidence, and you shouldn’t criticize this anti-vaxer if you aren’t also prepared to criticize the same logic used every day in medicine. I see pediatricians every day take rather self-pious positions about anti-vaxers, and then go to work and practice pediatrics based upon their own anecdotal experiences.
It is true that collected anecdotes may eventually amount to useful data, but we shouldn’t act on that data until it has been sufficiently collected. Marc Bekoff has said, “The plural of anecdote is not data.” We must collect the data systematically and analyze it blindly and without bias, until a reasonable probability of factuality is reached. A lot of good science starts with a keen observation of something interesting, but a lot of bad patient care also starts with such observations when they go unvetted. We shouldn’t apply anecdotes to patients until the information has been replicated, controlled, tested, and validated.
“It is not what the man of science believes that distinguishes him, but how and why he believes it. His beliefs are tentative, not dogmatic; they are based on evidence, not on authority or intuition.” – Bertrand Russell
Of course, we don’t have evidence based answers to many, many questions. This leads us often to decide what to do based on weak evidence or even “expert opinion.” I think many are so frustrated by this fact that they throw away or distrust what evidence we do have and stick to what seems right to them based on their experiences. On the other hand, many EBM purists refuse to do anything for which strong evidence is lacking. Nassim Taleb describes it in this way:
“Doctors most commonly get mixed up between absence of evidence and evidence of absence.”
Just because we don’t have evidence that something works doesn’t mean that it doesn’t work. So many things have not been appropriately studied (or cannot be studied ethically given the nature of the problem). In this case, we resort back to experience, but not just experience. We also have analogy, inference, and other powerful tools that can lead us to the best approach. But please don’t do something just because it seemed to work one time before.
For more about anecdotal medicine, read this.
Published Date : January 17, 2017
Categories : Cognitive Bias, Evidence Based Medicine, OB/Gyn
It’s ironic that a journal that’s supposed to combat high blood pressure is doing its best to raise mine with this clickbait. A study published this month in the American Journal of Hypertension entitled Maternal Blood Pressure Before Pregnancy and Sex of the Baby: A Prospective Preconception Cohort Study concluded that women with higher preconceptual blood pressures were more likely to have a male child. Give me a break. Before you run off and buy some cold medicine to raise your BP to increase your chances of having a boy baby (I guarantee this will happen), we need to have a little talk.
This is the type of nonsense that fills many medical journals and gets attention from the media. It works its way into textbooks and its novelty gets talked about by the lay press. It leads people to draw silly conclusions in their attempts to understand the information and rationalize it in the world they live in. It is garbage and it shouldn’t be published, at least not without comment. But editors and peer reviewers uncritically welcome such nonsense to make money and to get clicks and media coverage which will increase the journal’s impact factor. So far this article is by far the most popular of this month’s issue of the journal and has been picked up by over 40 news outlets. Yet at the time of writing, the free PDF of the article had been downloaded only 16 times – two of those were by me, and no doubt most of the rest were by the authors. It appears that it was published through the journal’s open access option, which apparently cost the authors $3,650. This isn’t science, it’s business.
So what’s wrong with their ridiculous paper? Let’s go through the five-step process I previously outlined in How Do I Know if a Study is Valid? and highlight the issues. Recall the steps:
How good is the study?
This Canadian study was done on women in China (where there is a great interest in selecting male fetuses). Out of 3,375 women, they analyzed data regarding 1,411 women who had preconceptual data collected and went on to deliver a singleton pregnancy. This cohort of women went on to deliver 672 females and 739 males, which skews the data towards an unexpected number of male infants. The investigators measured several independent factors preconceptually that were analyzed to see if they differed between the group having boys and the group having girls: age, years of education, smoking, passive smoking exposure, preexisting hypertension, preexisting diabetes, BMI, waist circumference, systolic blood pressure, diastolic blood pressure, mean arterial blood pressure, total cholesterol, LDL, HDL, triglycerides, and glucose. The blood pressure readings were further divided into quintiles, each of which were analyzed.
After adjusting for factors that the authors thought might otherwise confound hypertension, they found that the systolic blood pressure was an average of 2.7 mm Hg higher in the group of women who had male babies, and they claimed that this was statistically significant with a P-value of 0.0016. No other comparisons were statistically significant. The rest of the article draws misleading and unsupported conclusions from this data and provides unwarranted speculation that I won’t repeat here. They are even so bold as to make the clickbait claim that a woman with a systolic blood pressure greater than 123 is 1.5 times as likely to have a male as compared to a woman with a lower pressure. Wow. If this seems to smack common sense in the face, there’s a reason.
So what gives? Is this a good study? No. I grow increasingly frustrated by this type of study and there are many problems, but the main problem that the whole house of cards is built upon is the claim that the observed difference in systolic blood pressures (a difference of 2.7 mm Hg) is meaningful just because there is a P-value of less than 0.05. So why don’t I care about this (the P-value after all was 0.0016)?
First, I should tell you that a difference of even 1.3 mm Hg, given the number of patients in this trial, would have produced a P-value of less than 0.05 (0.0414, to be precise). Does that seem absurd? It is. The expected deviation of automated blood pressure cuffs is about ±6 mm Hg, so the study was powered to find a difference in blood pressure smaller than the standard deviation of the machine used to measure the blood pressures.
The problem is that the study has no power analysis. P-values can only be interpreted in the context of a power analysis. If a study is over-powered, then even a tiny difference will appear statistically significant, even if it is not actually significant. Let’s say the authors wanted the study to be powered to find a difference of 5 mm Hg between the two groups, if such a difference existed; in this case an N of 90 (rather than 1,411) would have been appropriate. The easiest way to fake statistical significance when there isn’t any is to over-enroll; the higher the N, the lower the P-value. This is what is called P-hacking. Given a sufficient N, even a difference of 0.1 mm Hg would be considered significant. These decisions must be made a priori or adjusted for a posteriori. They did neither.
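To make the over-enrollment point concrete, here is a minimal sketch (not the paper’s actual analysis) of how a fixed 2.7 mm Hg difference flips from unremarkable to “significant” purely as N grows. The difference and the largest group size are roughly the paper’s; the pooled standard deviation of about 16 mm Hg is my assumption, chosen because it approximately reproduces the reported P-value of 0.0016 at the paper’s enrollment:

```python
# A sketch of P-hacking by over-enrollment: the same small difference in
# mean systolic BP tested at increasing sample sizes with a two-sample
# z-test. The 2.7 mm Hg difference is the paper's; the ~16 mm Hg pooled
# SD is an assumption chosen to roughly reproduce its P = 0.0016 at
# roughly the paper's group sizes (~700 per group).
from math import sqrt

from scipy.stats import norm

DIFF_MM_HG = 2.7    # observed difference in mean systolic BP
POOLED_SD = 16.0    # assumed pooled standard deviation (mm Hg)

def two_sided_p(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided P-value for a two-sample z-test with equal group sizes."""
    se = sd * sqrt(2.0 / n_per_group)
    return 2 * norm.sf(diff / se)

for n in (45, 150, 400, 700, 5000):
    print(f"n per group = {n:5d} -> P = {two_sided_p(DIFF_MM_HG, POOLED_SD, n):.4f}")
```

Nothing about the 2.7 mm Hg difference changes from row to row; only the enrollment does.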
Second, the adjustments used for the adjusted odds ratios were selective and artificially limited. They adjusted for age, for example, but they chose not to adjust for exposure to secondhand smoke (which could reasonably affect blood pressure), nor did they adjust for the average time from enrollment to pregnancy in the study (which was the biggest difference between the two groups, at nearly five weeks – a difference that could easily reflect a difference in staff, equipment, or calibration). Since we don’t have the data, we cannot perform the adjustments ourselves, but it is very reasonable to assume that a complete adjustment might easily have resulted in a loss of the purported statistical significance.
Third, they have a serious multiple comparisons problem. This study tested multiple hypotheses: at least 22 preconceptual factors were put forward that might have influenced sex determination, ranging from maternal age to triglyceride levels (including the quintiling of blood pressures). We have to correct the significance threshold for this, and we can be very conservative and start from a nominal value of 0.025. If we apply the Bonferroni correction, accounting for 22 comparisons, then a corrected threshold of 0.0011363 becomes the new bar for significance. Using Sidak’s adjustment, the threshold becomes 0.0011501. By either adjustment, we learn that the claimed P-value of 0.0016 is not statistically significant, and that’s assuming that their P-value is otherwise adequately adjusted (it almost certainly is not).
More importantly, if no adjustments at all are made, and the paper is taken as is, then it should be remembered that the chance of finding one or more statistically significant differences when 22 hypotheses are tested is 67.65%. So it was more likely than not that at least one of the 22 probed variables would turn up as a false positive. But this is why the multiple comparisons adjustments must be made; when they are, the claim of significance goes away.
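All three of these numbers are one-line calculations. Here is a minimal sketch, assuming only what the text states (22 tests, the conservative nominal alpha of 0.025, and the usual 0.05 for the family-wise figure):

```python
# Reproducing the multiple-comparisons arithmetic above: m = 22 tests,
# the text's conservative nominal alpha of 0.025, and the usual 0.05
# for the family-wise error calculation.
m = 22

bonferroni = 0.025 / m               # ~0.0011364
sidak = 1 - (1 - 0.025) ** (1 / m)   # ~0.0011501

# Chance of at least one false positive among 22 tests at alpha = 0.05:
fwer = 1 - (1 - 0.05) ** m           # ~0.6765, the 67.65% quoted above

print(f"Bonferroni threshold: {bonferroni:.7f}")
print(f"Sidak threshold:      {sidak:.7f}")
print(f"Family-wise error:    {fwer:.2%}")
# The paper's P = 0.0016 clears neither corrected threshold.
```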
The study has numerous other embarrassing problems, but suffice it to say that an unadjusted blood pressure difference in the two groups of 2.7 mm Hg is not significant.
What is the probability that the discovered association is true?
Usually, I stop with the first question if no significant data is found, but for the sake of playing along, let’s answer this question as well. Recall that this question, which involves Bayesian updating, is based upon first determining the pretest probability of the hypothesis being tested. In this case, given all that we know with a high degree of certainty about how sex is determined, what would be a fair estimate that a difference of 5 mm Hg (let alone the observed 2.7 mm Hg) in blood pressure measurements many weeks before conception would affect the eventual sex of the fetus? I think you’re picking up what I’m putting down, so I won’t wax eloquent about this and will simply state that this hypothesis undoubtedly, prior to the publication of this study, had less than a 1% chance of being true (probably far, far, far less, but I am generous to a fault). Given this low pre-study probability, even if a high quality, unbiased trial were to find a statistically significant result, it would only change the probability of the hypothesis being true to about 14%; if we introduce even a smidgen of bias to that good trial, then the probability drops to only 5%. But, alas, this trial is not good and did not produce a significant result.
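For the curious, here is a minimal sketch of that Bayesian updating. The 1% prior is the text’s generous estimate; the 80% power and the 5% false-positive rate are my assumptions, the conventional inputs for this kind of pre-study/post-study probability calculation:

```python
# A sketch of the Bayesian updating described above: the post-study
# probability that a "statistically significant" finding is true, given
# a pre-study (prior) probability. The 1% prior is from the text; the
# 80% power and 5% false-positive rate are assumed conventional values.
prior = 0.01   # pre-study probability the hypothesis is true
power = 0.80   # assumed chance of a significant result if it IS true
alpha = 0.05   # chance of a "significant" result if it is NOT true

true_pos = prior * power           # true hypotheses flagged significant
false_pos = (1 - prior) * alpha    # false hypotheses flagged significant

posterior = true_pos / (true_pos + false_pos)
print(f"Post-study probability: {posterior:.1%}")   # ~13.9%, the "about 14%"
```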
What are other hypotheses that explain the data?
Of course, even if the data is real, we must consider what other hypotheses explain the data. Consider this picture:
What’s going on here? Let’s generate some hypotheses: a chameleon climbed up on a towel and changed its colors to blend in; a chameleon climbed up on a towel and the towel changed its colors to match the chameleon; someone who owns a cute chameleon bought a similarly colored towel and placed the animal on it to take a picture that would hopefully go viral; or, perhaps, someone is really good with Photoshop. I’m sure you can think of more. Note that the observed data is compatible with all four theories (and some more I didn’t list).
Just because we might accept that the data is real and valid doesn’t mean that any one particular alternate hypothesis is true. Most people who saw this viral picture assumed that the chameleon had changed its colors to match the towel. But those are the normal colors of the blue-bar Ambilobe panther chameleon; the owners simply posed the animal with a similar towel. But the point is that the data alone cannot tell you that; you have to generate hypotheses and decide which one is most probable based upon all available data, including data not in the study.
Why else might the authors of this paper have found the observed difference (assuming it is a real difference) besides random chance?
We could go on for a while, but you get the point. Knowing nothing else, and raising no question as to the quality of the paper’s data or its methods, we know at the outset that there is a 67.65% chance that the observed difference was due to chance alone, making that the most likely hypothesis. Again, this assumes that there even was a valid finding, and there was not.
Is the magnitude of the discovered effect clinically significant?
No, for three reasons. First, there was no discovered effect. Second, 2.7 mm Hg is well within the margin of error of the measurement device. Third, unless a prospective, randomized, triple-blinded, placebo-controlled trial produces a result that says that raising a woman’s blood pressure by about 3 points causes her to be more likely to have a male fetus, then I have no clinical use for this data. Even if such a trial occurred and a result corroborated the magnitude of effect suggested by this paper, I would have to raise the blood pressure of 1,000 women to produce 17 more males, and I doubt that this is something that anyone wants to do (maybe that increased blood pressure will lead to more miscarriages and the net effect is actually fewer males – oh, the unintended consequences).
I’m not going to deal with the fifth question relating to the cost of adoption of the intervention, because it is difficult to determine cost from fantasy.
I will say that such “science” is dangerous and irresponsible. Why didn’t the editors or peer reviewers point out the multiple problems I have mentioned (and many, many more)? Why was this paper promoted by the journal with press releases to major media outlets? Where is the editorialization that urges caution in drawing any meaningful conclusions from such preliminary work? All of these issues and more are the types of things that I discuss in An Air of Legitimacy, A Hindrance to Progress.
Please don’t take ephedrine because you want a boy. Oh, and because I wrote this, the impact factor of this journal will actually go up, since I linked to them. Geez.
Published Date : January 16, 2017
Categories : Evidence Based Medicine
In 1945, Vannevar Bush wrote his famous essay, As We May Think. Bush had risen to the level of Director of the US Office of Scientific Research and Development during World War II, where he was overwhelmed with the volume of new scientific research and publications that came across his desk. He lamented,
There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear.
In response to this problem, he famously envisioned the Memex. The Memex was to be a desk that stored millions of pages of books, research papers, notes, and other information on microfilm, but the desk would also allow complex ways of retrieving and annotating the information, and it would remember all of the user’s previous interactions with the material. In this way, he envisioned instant access to all of the information in the world relevant to your interests, and this data would be linked together to form “associative trails” or hyperlinks.
Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them. … The physician, puzzled by a patient’s reactions, strikes the trail established in studying an earlier similar case, and runs rapidly through analogous case histories, with side references to the classics for the pertinent anatomy and histology. …
The historian, with a vast chronological account of a people, parallels it with a skip trail which stops only on the salient items, and can follow at any time contemporary trails which lead him all over civilization at a particular epoch. There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. …
Thus science may implement the ways in which man produces, stores, and consults the record of the race.
Though his vision was limited by the mechanical prospects of information management imaginable in 1945, it had these essential elements:
This was bold thinking in a world where the then-extant knowledge and research were contained in printed books and journals, the best of which were hand-indexed. These books and journals, if you had access to them, could not be edited or updated. There was no effective way of searching the published literature, let alone unpublished research or the original data that the research was based upon. Once you found a relevant work, this work was not necessarily a pathway to immediately readable primary sources or related literature, especially if the literature was published later.
By 1967, Bush realized that computers would become the method by which his vision could be implemented. The ideas of hypertext and linked documents inspired the work of Douglas Engelbart (who invented the computer mouse and helped to develop the graphical user interface), Ted Nelson (who started Project Xanadu in 1960), and countless others, including Tim Berners-Lee.
Berners-Lee is famous for creating the World Wide Web in 1990. Before the World Wide Web, Berners-Lee created ENQUIRE in 1980 as a means of hyperlinking scientific data, emails, and other information for CERN (the European Organization for Nuclear Research). Eventually, those ideas culminated in the World Wide Web and the Internet as we know it today (interestingly, Berners-Lee’s father helped create the first commercial stored-program electronic computer in 1951).
The motivations for Berners-Lee’s efforts paralleled those of Vannevar Bush. The first line of the world’s first website described the WWW project as “a hypermedia information retrieval initiative aiming to give universal access to a large universe of documents.” He was interested in increasing the ease of collaboration and organizing the massive amount of information being generated in scientific research programs around the world. Communities of scientists and other interested parties could easily access the latest information, contribute to it, add notes, and help grow and apply knowledge. This format finally realized Bush’s vision in ways that even Bush couldn’t foresee (like near-instantaneous search and communication).
Yet, today, scientific publishing and dissemination of scientific information is stuck in a wormhole that connects the 1940s to the 2010s. While many improvements have been made, much of the promise of the WWW is not implemented. Here are some of the problems:
These problems could all be solved if all scientific papers were published in an open access format, with open peer ranking and review, able to be corrected or updated, with full transparency of underlying data, etc. The current academic system of rewarding academicians who publish (and punishing those who do not) emphasizes bad behaviors and encourages low quality publications (it encourages quantity over quality and research for the sake of career promotion, not advancement of the science). The current system of paid journals (either those protected by paywalls and subscription fees or those touted as open-access which charge authors a fee to publish) is also an archaic arrangement that wrongfully perpetuates the idea that the literature published in those journals is necessarily better than literature published elsewhere. These anachronisms come from a pre-industrial age and have more to do with making money and protecting established power-holders than they do with promoting scientific progress. True scientific progress is disruptive by nature, but the current system oppresses disruption.
The Internet and the World Wide Web, created expressly for the dissemination of scientific data, have revolutionized a lot of other traditional paradigms, and as one old system has been replaced with another (disruption), there has always been an existing power that fights against it. In each case, however, progress has been made and what we have now is far better than what was painfully replaced. Consider the following examples.
Encyclopedias. Before the Internet, dozens of printed encyclopedias (and references of all other sorts) were available. The Encyclopedia Britannica was perhaps the finest of these general knowledge encyclopedias. It was first published in 1768 and the last printed edition was published in 2010. It now exists online only. The last printed edition had about 40,000 articles with over 8,500 photographs and illustrations and cost $1,400 new. The subscription to the online version costs $70 per year. But I’ll bet that you just use Wikipedia like the rest of us. The English-language Wikipedia has over 5,000,000 unique articles and hundreds of millions of pictures and graphics. It is updated constantly and it hyperlinks exhaustively both to itself and to the rest of the web. Its sources are a click away, and a record of original writing and edits exists for each article, often with some exciting debate among various editors that anyone can read.
So why would anyone buy a subscription to Britannica, let alone a print edition that was out of date even before it was printed? An air of legitimacy. The Encyclopedia Britannica, like many traditional sources, markets itself as more accurate or more true than less expensive, less controlled, and more available alternatives. Britannica, on its website, cites the number of Nobel Prize winners who have contributed to articles, or American presidents, for example, as a way of claiming some authority. But Wikipedia? Anyone can edit it at any time. Surely, that’s a bad thing? Students in high school or college may get in trouble for citing Wikipedia, but not the Britannica.
But this is all bollocks. The massive peer-review system that underpins Wikipedia and similar efforts is far better than anything Britannica could ever do. Would you rather read an article about Shakespeare’s The Tempest written by one associate professor somewhere at a junior college (Britannica), or one written by over 1,800 Shakespeare enthusiasts (many of whom are also professors or published authors on the subject), edited or updated over 4,000 times (the last edit literally today), replete with over 130 clickable footnotes and hundreds of other links and reference citations (Wikipedia)? By the way, the article on The Tempest has over 100 active watchers if you want to try to make a stupid edit.
In 2005, this study in Nature found an equal number of mistakes in Wikipedia and Britannica. That was over ten years ago; I suspect that Wikipedia has grown stronger while Britannica has continued to decay. More importantly, for up-to-the-minute, useful knowledge (that can easily be fact-checked), there is no comparison. Crowdsourcing, massive peer review, deep hyperlinking, and instantaneous availability have won the battle. The same facts presented on Wikipedia have deep context and near-instantaneous recall of sources. But Britannica? You’ll just have to trust them. Britannica has survived by promoting its “air of legitimacy,” but such an appeal to authority is really just a clever marketing tool. This air of legitimacy has become a shroud of death.
News. The newsprint and traditional magazine industry is similarly dying quickly. A daily newspaper may feel like an official “record” for history, but we mostly get our news from multimedia-rich, hyperlinked, and continuously updated online sources. Many media outlets have made this transition successfully, but websites like reddit.com go a step further, fulfilling the vision of massive peer review and crowd-sourcing. I always go to Reddit or Twitter for new or breaking news. The data is more fluid, but that is usually an acceptable tradeoff that makes the information more immediately useful. If you want the final version of a news story, you really just have to wait about a decade for a book to come out; by then, the information doesn’t have the same usefulness. Quick and fluid versus slow and fixed is always a battle in news information.
Video. Traditionally, we had four or five sources of television. Today, we have a massive amount of content immediately available, most at low or no cost. Instead of 20 or so central creators of television and movies, we have millions. YouTube is a marketplace that is free to enter, free to consume (except for the advertisements, of course) and creates a more level playing field for all. This means that any content creator who creates something worthwhile has an opportunity for success. In the old system, quality was not as important because there was limited competition. With only three or four over-the-air networks, any content at all would draw consumers. It wasn’t better content, but it too had an air of legitimacy because it came from a “major” network. But this is what all monopolies claim – that there is no need for competition and that a closed system is best to preserve quality (and control).
Music. Before I get back to the scientific literature, I’ll list music as a final example. The music industry has gone from a few centralized producers and distributors to a system where anyone can create music and sell it and distribute it through the same platforms as the biggest players. They may not have the promotion and radio access that still drives musical consumption trends, but anyone can make a song or an album and anyone can download it for free or for maybe 99 cents. This has resulted in higher quality and more choices. The music industry fought tooth and nail against this transition, which was spurred on by file-sharing websites like Napster (the music industry once sued LimeWire for $75 trillion!). The music industry story shows us that necessary change will happen whether an industry supports it or not (and whether it is legal or not).
Scientific publishing. So what about medical and scientific journals? As I stated above, because of a publishing cycle that is stuck 200 years ago in the print era, the dream of Berners-Lee and Bush is largely unfulfilled for scientific literature. You cannot, in the vast majority of cases, click on a hyperlinked reference and see the original paper instantaneously after you find the abstract or reference in PubMed or some other source. Literature is not organized in a convenient place by its metadata so that an interested researcher can quickly see all of the relevant literature and sort it in a variety of ways, with immediate access to full text articles and the original full data sets of the authors. There is not a place where massive peer review occurs or where authors can update and correct their original papers. You cannot easily go to one place and see the most important papers in your field, regardless of where they were published, organized and ranked by massive peer review, and stay up-to-date with and contribute to your field. Far too much trust and credibility is extended to journal editors and scientific journalists, many of whom have no real qualifications even to be in those positions.
Two factors have led to this current crisis in the way scientific literature is disseminated: the perpetuation of the expensive, subscription based, journal system, in which journals claim some legitimacy compared to other methods of publication, and the push for academic scientists to publish as many articles as possible to further their careers.
Irony?
Are paywall-protected, subscription-only journals really the protectors of high-quality publications? In other words, is something more likely to be true because it is published in the New England Journal of Medicine rather than on a Wikipedia-like commons for scientific publications? This is difficult to answer because such a comparison doesn’t exist. If you do the work of producing a good publication, you are obviously going to submit it to the best journal you can; for this fact alone, journals like the NEJM typically produce higher quality literature.
But the editors of leading journals decide which papers to publish largely based upon impact and how well the study is apparently produced. Impact is typically a function of novelty, and novelty can often be an indicator of faulty findings. High quality production can easily be faked, and often is. So studies with dubious conclusions or false-positive findings are all too common in leading journals. Articles in traditional journals are plagued by many problems which I will only briefly mention here.
It is a subject for another day to think about why academicians are wrongly incentivized to produce so many low quality publications, but the problem has only gotten worse in recent years as the number of publications increases meteorically and pay-for-publication, open access journals thrive. All of the problems listed above for traditional journals exist for these journals too, plus one additional problem: little or no quality editorship or peer review. There are now over 10,000 open access or pay-to-publish journals. The quality varies widely, but many will publish any paper submitted, for a small fee, within a couple of hours of receipt. Beall’s List provides a list of many of these predatory journals.
Indeed, the extremely low quality of many open access journals has helped the paywall journals maintain their air of legitimacy, but I wouldn’t be so bold as to guess that more true hypotheses are published in paywalled journals than in open access journals. Both have similar problems, and these problems are largely corrected with a massively reviewed, free-to-view format. Academic institutions could easily learn to embrace such a system, and they could and should give more credit to faculty who produce high-impact, high-quality papers as judged by the community that uses them.
But first the pay-to-play system has to go away. The open-access journals are already freely viewable but don’t exist in a framework that provides the type of system I am describing. The paywall articles need to become free to view, as well. Ironically, this will probably occur in the scientific arena in the same way that it occurred in the music arena. Just as Napster and other file-sharing websites forced publishers to change their paradigm, so too will websites like Sci-Hub encourage this breakthrough. Sci-Hub is the Napster of scientific papers, with over 58,000,000 papers ready to download and hundreds of thousands of downloads per day.
For the paywall journals to truly give way to a new system, they must go the way of Britannica or adapt. This means that stakeholders must get past the air of legitimacy that the preeminent journals maintain, and this will not be easy. A lot of academic pride and a lot of careers are tied to publication in these journals; and a lot of money is made by the publishers. But these factors are a hindrance to progress.
Consumers of scientific publications are actually misled by the gravitas of an article published in an august journal. They wrongfully assume that it is of higher value or better quality because it is published in such a journal. I heard a debater in a recent Intelligence Squared debate about the FDA comment that a particular paper was from a “reasonable journal” and a “reasonable university,” as if this were all that needed to be said to end the debate. He did not refer to the merits of the data, but instead made a logical fallacy through an appeal to authority. Science based on faith. A paper could be complete garbage and meet this shallow requirement. Even metrics like how many times an article has been cited or how many times a publication has been downloaded are not very valuable in determining its quality; these often just reflect how accessible it is and the prevailing bias. Does it come from an open access journal or a widely subscribed journal? It will be accessed more. Does it agree with the prevailing bias of the scientific community? It will be published in a leading journal.
Authors must lead this revolution. The authors of scientific literature don’t make money for their authorship (at least not directly), so that shouldn’t work against the revolution; their incentive is advancement of the field and advancement of their careers. If they want their research read and utilized, then they should be in favor of open dissemination. If they want their research to have high impact, then they should want a wide readership. If they want their findings to be a true part of the scientific process, then they should want the best peer review and commentary about their data. If they want the scientific process to work as designed, and true progress to be made, then they must fight against the current system and replace it.
Published Date : January 14, 2017
Categories : Evidence Based Medicine, OB/Gyn
This is an excellent clinical question, and the type of question that a clinician encounters almost daily. How should we determine the answer to the question? Rather than just tell you, allow me to go through the thought process that should be applied to this and countless other questions.
First, we must make sure we understand the problem and what factors are important to consider. We must also decide what the important outcomes are and make sure that we are not making crucial decisions based on surrogate outcomes alone. Most mistakes in answering a clinical question like this one actually occur in this step. If we are asking the wrong question, we’ll probably get the wrong answer.
So what is oligohydramnios?
Oligohydramnios has numerous definitions, and this lack of clarity will contribute to confusion in answering our question. It has been defined in many different ways:
So first we must decide which of these definitions to use. The truth is, we don’t know how much fluid is “normal.” We can define normal by looking at a cross-section of patients and describing the distribution of fluid volumes in a large population of patients, and this has been done. We can then say that anything above or below a certain percentile (the patients on the tails of the bell-shaped curve) is “abnormal,” and this thinking leads to percentile diagnoses, such as oligohydramnios being defined as less than the 5th percentile. But that doesn’t necessarily mean that having that amount of fluid is associated with negative outcomes. So abnormal, in that context, may not necessarily be “bad.”
Part of the problem with defining oligohydramnios by either the AFI < 5 cm or the AFV < 2 cm is that the techniques to measure fluid with ultrasound just aren’t that good. The largest study to compare ultrasound methods to objective measurements of amniotic fluid (like dye dilution studies or direct measurements) found that the AFI method only had a sensitivity of 10% and a specificity of 96% while the AFV method (single deepest pocket) only had a sensitivity of 5% and a specificity of 98%. This makes both tests very unreliable for predicting how much fluid is present. Other studies have shown that these predictive values don’t improve even when a variety of techniques are combined.
So our techniques for measuring amniotic fluid just aren’t that good. They have low predictive value and significant unreliability. They also have poor reproducibility. One lesson from this is that sufficient pretest probability should be present before any type of ultrasound-based measurement of amniotic fluid is made; otherwise the result is next to meaningless.
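To see just how little a “positive” result means at low pretest probability, here is a minimal sketch using the AFI numbers above; the sensitivity and specificity come from the dye-dilution comparison, while the pretest probabilities are illustrative assumptions:

```python
# Why pretest probability matters: the positive predictive value of an
# AFI < 5 cm for true (dye-determined) oligohydramnios. Sensitivity and
# specificity are from the comparison study cited above; the pretest
# probabilities are illustrative assumptions.
SENS, SPEC = 0.10, 0.96

def ppv(sens: float, spec: float, pretest: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sens * pretest
    false_pos = (1 - spec) * (1 - pretest)
    return true_pos / (true_pos + false_pos)

for pretest in (0.05, 0.15, 0.30):
    print(f"pretest {pretest:.0%} -> PPV {ppv(SENS, SPEC, pretest):.0%}")
# 5% -> ~12%, 15% -> ~31%, 30% -> ~52%: a "positive" scan is usually
# wrong unless clinical suspicion is already substantial.
```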
A better way to define how much fluid is abnormal is by correlating the level of measured fluid with negative pregnancy outcomes, rather than as a subset of a normal distribution of amniotic fluid volumes across a population. In other words, our ability to predict how much fluid is actually in the uterus is very poor, but do the poor measurements we take correlate with good or bad outcomes in a clinically useful way? Before we look for answers to that question, we will stumble across one more related issue. Ultrasonographers are trained to exclude pockets of fluid that contain umbilical cord from their measurements; in the past, this was done only with gray scale ultrasound, but most current ultrasonographers can do a better job of excluding cord by using color Doppler. Studies have shown that the rate of over-diagnosis of oligohydramnios using color Doppler is about 20% compared to using gray scale alone. This is important because the original studies that tried to determine which levels of amniotic fluid were associated with abnormal pregnancy outcomes did not use the color Doppler method.
There are only a few studies that have looked at dye-determined, objectively measured oligohydramnios and also evaluated fetal outcomes. One found no difference in meconium-stained fluid, variable decelerations, or low Apgars; another found no difference in fetal pH; and another found no difference in a variety of intrapartum outcomes studied, including late and variable decelerations, IUGR, mode of delivery, NICU admission, etc. These facts are important because they suggest that there is no scientific evidence for most of the intrapartum negatives that are anecdotally associated with low amniotic fluid.
The literature relating an ultrasound-derived AFI < 5 to adverse outcomes is mixed, to say the least. The classic paper by Rutherford et al. from 1987 that largely established the current fears of oligohydramnios is itself plagued with methodological problems, but it armed perinatologists with a new tool in the form of AFI measurement and a reason to be worried about it. This retrospective chart review of 353 women found just 27 with an AFI of less than 5 cm. The main reason these ultrasounds were done was “postdate” pregnancy, and many of these women did not have “good dates.” Not surprisingly, the authors found that the women with low fluid were more likely to have meconium-stained fluid, intrapartum distress, etc. But is this because they were “postdate” or because they had a low amount of amniotic fluid? In other words, do the fetal distress and low fluid have a common cause (like chronic uteroplacental insufficiency), or is the low fluid itself causing the distress? The main problem with this influential paper is that its findings are not generalizable. The finding of low fluid in these pregnancies may have just been a way of better identifying more severe postdatism; nothing in the study implies that the results are applicable to a well-dated pregnancy, say at 30 weeks’ gestation.
Chauhan et al. conducted a meta-analysis of 18 trials that evaluated the relationship between low fluid and adverse outcomes. They concluded that an AFI of less than 5 was associated with an increased risk of cesarean delivery for fetal distress and an increased risk of 5-minute Apgar scores less than 7, but no evidence of true distress, such as fetal acidosis. This finding is interesting and raises the question of whether bias related to the providers’ knowledge of the presence of oligohydramnios might have affected their management of the labors. How many women were perhaps unnecessarily induced and unnecessarily delivered by cesarean because the provider was anticipating a negative outcome in a “high-risk” pregnancy, even though studies that look at objective measurements of oligohydramnios fail to see such associations? How well can the studies in this meta-analysis be trusted, since ultrasound is so poor at detecting oligohydramnios in the first place? Essentially, studies have shown that if you have oligohydramnios, you are more likely to be delivered by cesarean for “fetal distress” by your obstetrician, but not more likely to have actual, objective fetal distress.
Another review of studies conducted in 2016 by Rabie et al. identified 15 trials to be analyzed. They concluded that among low-risk pregnancies, women with isolated oligohydramnios were more likely to have neonates with meconium aspiration syndrome, to have a cesarean for fetal distress, and to have a NICU admission. Patients in the high risk group were more likely to have infants with low birth weight, but all other metrics were the same, including Apgar scores, NICU admissions, meconium-stained fluid, and cesarean delivery. So once again, the issue is likely that the oligohydramnios is a result of the same underlying pathology (and the presence of the diagnosis itself may bias providers). In other words, why was amniotic fluid being checked in the low-risk group of women? Largely because of postdatism; we expect higher rates of meconium aspiration syndrome, NICU admissions, and cesareans in that group. Why was the amniotic fluid being checked in the high-risk women? For all sorts of reasons, including things that affect placental blood flow like hypertension and diabetes, and given the underlying pathology behind those types of maternal conditions, when uteroplacental insufficiency occurs, we often see oligohydramnios and growth restriction. But the addition of oligohydramnios to all of the other problems of the pregnancy did not make the pregnancy (or the neonate) any worse off, though it was associated with the presence of growth restriction.
Chauhan et al., in another study, showed that if the obstetrician knew at the time of admission for labor that a patient had oligohydramnios, the patient was more likely to undergo a cesarean delivery for fetal distress; but the fetal outcome was not improved by this knowledge or action. This example of the framing effect clouds all of the non-blinded studies done on this issue, particularly with regard to the outcome of “cesarean for fetal distress,” and it also shows one of the very real iatrogenic harms of over-testing with ultrasound.
The summary of these data, particularly when the individual trials are examined, is that no evidence supports the idea that objective outcomes differ between pregnancies with ultrasound-diagnosed oligohydramnios and those without.
How should we measure oligohydramnios?
When you understand how inconsequential oligohydramnios is, and that the risks of over-diagnosis outweigh the benefits, then the answer, intuitively, is that we should use the method of ultrasound determination that labels the fewest women as having oligohydramnios, which happens to be the single deepest pocket (SDP) method, where a deepest vertical pocket of less than 2 cm defines oligohydramnios. This has been studied in several modern trials.
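To make the difference between the two cutoffs concrete, here is a toy sketch in Python; the quadrant measurements are hypothetical numbers invented purely for illustration:

```python
def classify(pockets_cm):
    """Label a four-quadrant scan under both conventions:
    AFI = sum of the deepest vertical pocket in each quadrant (oligohydramnios if < 5 cm);
    SDP = the single deepest pocket anywhere (oligohydramnios if < 2 cm)."""
    afi = sum(pockets_cm)
    sdp = max(pockets_cm)
    return {"AFI": (afi, afi < 5), "SDP": (sdp, sdp < 2)}

# One decent pocket and three shallow ones (depths in cm):
print(classify([2.4, 1.0, 0.8, 0.7]))
# {'AFI': (4.9, True), 'SDP': (2.4, False)} -- "oligohydramnios" by AFI, normal by SDP
```

Every scan that AFI flags and SDP does not is a woman spared the “high-risk” label and the cascade of interventions that follows it.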
So if you’re still planning on measuring amniotic fluid at this point, the take-away is that you should be using the single deepest pocket method, in order to prevent iatrogenic harm. Unfortunately, the rates of utilization of AFI (instead of SDP) remain very high. One wonders if this is due to the economic incentives of over-diagnosis.
Do we care about oligohydramnios then?
If oligohydramnios is seemingly unimportant, then why even look for it? And if abnormal, why try to treat it (with hydration for example)? I have previously discussed the PORTO study in relation to IUGR, but the study is useful for thinking about oligohydramnios as well. The PORTO study found that oligohydramnios was a predictor of an adverse perinatal outcome only when associated with an estimated fetal weight (EFW) less than the 3rd percentile. This observation should provide some context for what the clinical utility of oligohydramnios truly is (apart from its use in the BPP). If oligohydramnios is important, it is important because it is a sign of either ruptured membranes, significant uteroplacental insufficiency (UPI), or some relatively rare condition like Potter’s sequence.
If we have ruled out ruptured membranes, then we must assess for evidence of UPI. If there is chronic UPI, then a fetus will begin to shunt blood away from the splanchnic circulation and the renal arteries to favor the brain and heart. When this occurs, an immediate consequence may be decreased urine production. Since the whole amniotic fluid volume is turned over daily, less fetal urine can be a first sign of decreased fetal perfusion. Eventually, this selective shunting of blood will also result in a small abdominal circumference (as the liver becomes relatively smaller) and, finally, asymmetric growth restriction. By the time this IUGR becomes severe (< 3rd percentile), the rate of adverse outcomes increases. Along the way, or perhaps as a late finding, we might also observe changes in the umbilical artery velocimetry as relative placental resistance increases. These changes in the umbilical artery Doppler profile are also associated with an increased risk of adverse outcomes.
So that’s it. We care about oligohydramnios only because it is an indirect sign that there might be UPI, which is ultimately the cause of the adverse outcomes (not cord compression, placental compression, or other nonsense). So what if the fluid is low but the fetus is growing well? In that case, it may be a transient issue that resolves, an aberrant reading, a normal variation, or a sign of things to come (that is, the first sign of a placenta at risk). The only way to know is to recheck the fluid and see if the abnormality persists, and whether that persistence eventually becomes associated with growth restriction.
What about the BPP? The biophysical profile is a useful tool for determining fetal well-being. Yet, as noted above, studies don’t seem to show a difference in the BPP’s predictive value dependent upon which method of measuring amniotic fluid is used. This means that far fewer abnormal BPPs will occur if SDP is utilized rather than AFI, but with the same predictive value for abnormal outcomes. Usually, when amniotic fluid alone makes the difference in a BPP, it is related to postdatism, and in this regard, measuring amniotic fluid should likely remain a part of the BPP test. Currently, BPPs are over-utilized in clinical practice (due to diagnostic drift), and if the test is conducted with AFI rather than SDP, then a significant number of over-diagnoses are likely occurring.
Does it matter then if we correct the amniotic fluid volume by maternal hydration?
You have probably already guessed the answer to this question: No. There is simply no scientific evidence showing that iatrogenically correcting the amniotic fluid volume is valuable. Let’s say the low fluid is due to ruptured membranes; will increasing it be helpful? No. What if it is transient or has some cause other than chronic UPI? Will it be helpful then to make the number on the ultrasound report bigger? No. Well, what if it is due to chronic UPI? Won’t it be helpful to increase the amniotic fluid and therefore increase placental perfusion with forced hydration? I think that’s the spirit of the idea, but there is simply no scientific evidence that increasing maternal hydration for a short time (a few hours or perhaps days) has any impact on chronic UPI. The uteroplacental interface is not a pump that can be primed so that growth restriction and fetal oxygenation improve thereafter. So even if maternal hydration were to improve the ultrasound diagnosis of oligohydramnios, it might actually have a deleterious effect on the gestation, since it might give false reassurance that the placenta is working appropriately rather than preserving the early warning system for severe growth restriction and increasing placental resistance.
The point is, even if maternal hydration improved the diagnosis of oligohydramnios, we would need to know that it also improves fetal outcomes. This is an important point. The ultrasound measurement of amniotic fluid is a surrogate marker, and a very, very poor one. Remember how bad ultrasound is at actually measuring the true amount of amniotic fluid in the first place. Often, poor measurements come just from pressing too hard on the maternal abdomen or from an incorrect angle of the probe. It is a very poor test on which to base important decisions, particularly in cases of isolated oligohydramnios. When oligohydramnios is the only abnormal finding, the pretest probability of disease is very low, and therefore, even in the presence of a positive finding, the positive predictive value is exceedingly low. The lesson, again, is that the test shouldn’t be done when there isn’t sufficient pretest probability to justify it (like a specific maternal medical condition associated with UPI), and when it is done, it should be performed using the SDP method.
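The pretest-probability point is just Bayes’ theorem, and the arithmetic is worth seeing once. Below is a minimal sketch; the prevalence, sensitivity, and specificity are hypothetical numbers chosen for illustration, not figures from any study.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value of a test via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Suppose clinically meaningful UPI is present in 2% of otherwise low-risk
# pregnancies, and grant the isolated low-fluid finding a generous 70%
# sensitivity and 85% specificity for detecting it.
print(round(ppv(0.02, 0.70, 0.85), 3))   # ~0.087
# In a truly high-risk group (say 20% prevalence), the same test is far more useful:
print(round(ppv(0.20, 0.70, 0.85), 3))   # ~0.538
```

Even with generous test characteristics, roughly nine out of ten positive findings in a low-risk population are false alarms; the same measurement becomes informative only when the pretest probability is already meaningful.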
So does maternal hydration improve oligohydramnios?
Well, I don’t care. It’s not relevant. But if you must know, you can read this low-quality meta-analysis on the subject, which concludes that oral hydration is better than intravenous hydration and that hypotonic solutions are better than isotonic solutions. I’ll say now that I took the long way to that answer because the more direct way (searching PubMed for the latest meta-analysis) would have given you the wrong idea. Most folks would likely have read only the abstract, and most would never have asked the fundamental questions answered above.
But by now, hopefully, you realize the house of cards that this meta-analysis represents. Of the 16 papers reviewed, none used the SDP method, including the ten published after 2004 (the year it became obvious that AFI should be abandoned). Only one looked at outcomes other than the ultrasound measurement of amniotic fluid volume, and that paper, of course, found a higher rate of cesarean in the oligohydramnios group. Yet this study, a Bangladeshi paper about oral hydration, should have been excluded due to its low quality, its small sample, and its 11% fetal death rate.
Ten of the 16 papers enrolled 25 or fewer patients, making the majority of included studies statistically worthless. The highest quality and largest study in the mix is this paper from 2014, which looked at whether IV administration of 2 L of hypotonic fluid prior to attempted external cephalic version impacted the success rate of the procedure. It did not. None of the papers accounted for the margin of error in ultrasound measurement of the fluid, and most of the authors didn’t consider blinding, randomization, or placebo control as important things to do in a study.
In short, 16 low-quality studies don’t make one good one. Some of the small studies showing success of maternal hydration contain conclusions like this one: “Since it caused no complications for the mother and the fetus, it can be used as an effective method in management of oligohydramnios.” And isn’t this the problem with “scientific” publications? This Iranian study of 20 patients, using inappropriate statistical methods, underpowered to detect serious complications (like maternal heart failure or water intoxication), with no evidence of improved neonatal outcomes, no surrogate markers measured more than 90 minutes after the intervention (what was the fluid the next day or the next week?), using the wrong method for detecting fluid volume (the AFI), producing a clinically insignificant effect (a 1.5 cm increase in AFI), and published in the prestigious Journal of Caring Sciences (an open access journal), is considered science. What’s more, the authors have the hubris to claim that this is “managing” oligohydramnios.
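The “underpowered to detect serious complications” criticism can be made concrete with the classic rule of three: if zero events are observed among n patients, the upper 95% confidence bound on the true event rate is roughly 3/n. A quick sketch:

```python
# Rule of three: observing 0 events in n patients only excludes true event
# rates above ~3/n at the 95% level (exact bound: 1 - 0.05**(1/n)).
n = 20
print(3 / n)                 # 0.15 -> a 15% complication rate cannot be ruled out
print(1 - 0.05 ** (1 / n))   # ~0.139, the exact binomial bound
```

So “it caused no complications” in 20 patients is statistically compatible with a true complication rate of nearly 14%, which is hardly a safety claim.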
Yes, this paper was one of the 16 trials included in the previously cited, open-access meta-analysis, in which the lead author cited himself 17 times in the bibliography, with such irrelevant publications as “Fertility rate and subsequent pregnancy outcomes after conservative surgical techniques in postpartum hemorrhage: 15 years of literature.” Are you seeing the real problem? These studies and publications shouldn’t have been published in the first place, but the publishers are making money, and the “researchers” get published and inflate their citation counts, even if they have to cite themselves and pay for it to be printed. Somewhere, some “academician” advanced a career (and harmed patients) with such garbage.
But I’m on a rant. I’m sure the editor of PLOS One, which publishes 70% of all received manuscripts to the tune of 85 papers each and every day, saw some value in the paper other than the $1,495 publication fee (yes, that’s $127,075 per day). No wonder some people joke that the ‘L’ should be dropped from the journal’s title.
Well, anyway, the take-home points are these:
Published Date : January 10, 2017
Categories : Cognitive Bias
Imagine waking up tomorrow morning and finding a world you don’t know. Everything isn’t as it should be. Things that should be good are now bad. Things that should make people happy instead make people sad. Your fundamental ideas about the world, nature, and people are all sorely wrong. Even things you know to be true about yourself no longer hold up under examination. Imagine that you discover that almost everything you know to be true is in fact false!
The experience of this realization would be termed cognitive dissonance and we might call the anxiety and stress induced by such realization an existential crisis. We all want a feeling of internal consistency; we want to feel like what we believe comports with the world around us. We run away from situations or facts that show us that what we know to be true might not be true.
Indeed, the most important invention of the human mind is its ability to protect itself from evidence that it is wrong. We lie to ourselves almost constantly, filtering out all evidence that we might be incorrect and quickly grasping for any data that tends to support our personal worldview (confirmation bias). When some proof of our misbelief sneaks in, we either change our belief (rarely) or we disregard the proof, discredit it, ignore it, rationalize it, reject it, or misinterpret the evidence (commonly). The more deeply held the belief, the more profound the magnitude of cognitive dissonance, and, in turn, the more profound the rejection.
“The lady doth protest too much, methinks.” – Queen Gertrude, Hamlet
Typically, the louder or more banal the reaction, the greater the cognitive dissonance. We really start fighting when our worldview is under attack. Emotional reactions and passionate responses often betray a lack of confidence in the belief. In other words, people who have a well-supported position rarely need to resort to immature defenses, since logic and evidence are on their side.
That unfamiliar world I asked you to imagine waking up in is actually the world that surrounds you now. But your brain has protected you from this stark reality by convincing you otherwise. How many fundamental beliefs do you hold that are wrong or are at least poorly supported by evidence? How many assumptions that guide your daily life are actually invalid? Is it a scary thought to realize that most things you believe are wrong, incomplete, or at least poorly evidenced? This includes many things of which you are absolutely convinced, even though history will look back at those ideas and judge them as foolish and inane.
We do this on a very personal level. Do you believe that you are really awesome? You may suffer from the Dunning-Kruger effect, which occurs when we overestimate our own skills. Incompetent people often fail to recognize their own inadequacies and lack of skill, and they usually fail to recognize incompetence in others, despite plenty of external evidence of that incompetence. The more incompetent you are, the worse you are at recognizing your (or anyone else’s) incompetence.
Do you think you are really terrible at everything? You may suffer from Imposter Syndrome, which is basically the opposite of the Dunning-Kruger effect. In Imposter Syndrome, highly competent and high-achieving people are unable to internalize and believe that they are competent, despite external evidence of their awesomeness. These phenomena don’t occur because you’re awesome or because you’re incompetent; they occur because you have a view of yourself that is not compatible with reality and your brain can’t make sense of it. They occur either because you are an arrogant tool or because you lack self-esteem.
Science is not immune to the consequences of cognitive dissonance. In fact, science perpetuates the condition by lending rhetorical credence to virtually any idea. If you use PubMed enough, you’ll know that you can find literature that supports almost any belief in science you might have, including exact opposite beliefs.
Do you believe that eating processed or red meats increases your risk of pancreatic cancer, or that it has no effect? Either way, you’re right. Heinen et al. in 2009 found no evidence among 120,852 patients of a link between the risk of pancreatic cancer and eating red or processed meats. Hurray! Pass me the hot dogs. Of course, Nöthlings et al., in a 2006 study of 190,545 patients, found an increased risk of pancreatic cancer among those who consumed red or processed meats. Ah, snap! What are we to do? Pick whichever study best comports with your preexisting worldview (at least, that is what people actually do, whether it is right or not).
I can give hundreds of examples like this one. But alas, this is not science – this is confirmation bias. Yet “science” is invoked by both sides, like some Oracle at Delphi, as the authority (an appeal to authority) that justifies the potentially false belief. Nor does a consensus or plurality of experts or a mass of evidence qualify as science. That too is mere Scientism. Every correct idea starts as heresy until, finally, faced with an overwhelming burden of evidence, the majority accepts the idea as valid.
Okay, but at least we can all agree that elevated LDL cholesterol is associated with an increased risk of mortality. Well, except that this trial in JAMA from 1994 found that elevated LDL and low HDL were not associated with an increased risk of total mortality, coronary heart disease mortality, or hospitalization for myocardial infarction or unstable angina. This 2003 study found that high levels of LDL may protect against atherosclerosis. In fact, according to this 2002 study, low LDL and total cholesterol levels were associated with higher mortality in patients with heart failure. Worse, this 2007 study of over 309,000 people found that lower LDL levels achieved with statin medications were associated with a doubling of the risk of cancer. Hmm.
That feeling right now in your head is called cognitive dissonance. I am not asking you to draw any conclusions about cholesterol or statin drugs (I kind of am, I guess, and so has science, as the “cholesterol hypothesis” is being replaced), but I am demonstrating that it is easy to pick and choose conclusions from the scientific smörgåsbord. Your mind is quickly rationalizing and reconciling these papers without any real reason to do so, since you’ve never read them (“That’s a bad paper…”, “The increased cancer risk is outweighed by all the good that statins do…”, “That’s only true in people over 70…”, etc.).
We are all guilty of this. It is hardwired in our brains. It is unavoidable. Whole communities of people are guilty of this in a self-perpetuating and synergistic fashion. This large scale, cultural dissonance leads to large groups of people seeking to harmonize their beliefs with their environments. We need to rationalize the world we live in to avoid cultural dissonance.
Can you imagine a society where the fathers of young teenage boys routinely allow older adult men to have sex with their sons and, worse, consider it an honor to do so? Well, that was ancient Greek society, and the practice was encouraged and normalized by the greatest minds in Greece, from Aristophanes to Socrates. Why did that society as a whole not defend the young boys and end the shameful practice? Cultural dissonance. We are good at defending what we believe in, and we are good at defending what is commonplace, status quo, or widely accepted. We are reluctant to deviate from what we are used to. I call this cognitive inertia or normalcy inertia. We see the world not as it is, but as we need it to be.
We are slow to change our views about anything we are deeply invested in. We are comfortable with what we know, what is familiar, what is near, and what is common. We are biased towards these things. This is cognitive inertia. We are attracted to what we are told is normal or what seems normal. This leads to normalcy inertia. It is comfortable to think that everyone around us, our parents and our friends, has a good grasp of reality. If our parents, our elders, and those whom we view as smarter than ourselves or as our leaders are wrong about the world, what chance do we stand? We are taught culturally what is normal, and we can hardly see the world elsewise.
Cultural dissonance and cognitive inertia exist in science in what is termed prevailing bias. If the prevailing scientific bias is that the sun revolves around the earth, then all evidence (even evidence that refutes the idea) will be seen as supporting that prevailing belief.
It was, of course, Copernicus who made the first serious, modern suggestion that the earth revolved around the sun, and not the other way around, in 1543. In doing so, he challenged the prevailing view (the prevailing bias) of Ptolemy that had reigned relatively unchallenged for nearly 1400 years. Copernicus did not provide “proof” of this theory, but he did do something that I think is very interesting: he showed that the same celestial observations (the same data) which were accounted for by the Ptolemaic system could also be accounted for by his system. In other words, he showed that there was more than one hypothesis that could fit the data. Tycho Brahe quickly added yet a third hypothesis that was compatible with the data.
At the moment that Copernicus first showed that all known data could be explained by his system as well as by the Ptolemaic system, “science” should have immediately stopped and admitted that, barring any new evidence, both explanations were equally likely. This is a foundational principle of Bayesian updating. But, of course, that is not what happened. His contemporaries couldn’t objectively view the issue due to the prevailing bias and the cognitive inertia that went with it, just like you probably didn’t take seriously the very credible articles cited above about cholesterol. In fact, wide acceptance of the Copernican theory as even potentially valid didn’t really occur until after 1700.
Nevertheless, by 1610, Galileo had discovered the moons of Jupiter, Kepler had shown that the planetary orbits were elliptical rather than circular, and by 1639 Zupi had discovered the phases of Mercury. All of these were crucial developments and discoveries. If science had properly held heliocentrism as at least equally likely as geocentrism in 1543, it should have accepted it as the most likely explanation by 1610, and all but proven by 1639.
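The Bayesian point is simple enough to sketch in code. When two hypotheses account for the data equally well, the data cannot move their relative probabilities; only discriminating evidence can. The likelihood numbers below are purely illustrative, not measured quantities.

```python
def update(priors, likelihoods):
    """Bayes' rule over a set of competing hypotheses."""
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# 1543: geocentrism and heliocentrism explain the same observations equally
# well, so the posterior simply returns the prior.
print(update([0.5, 0.5], [0.9, 0.9]))   # [0.5, 0.5]

# 1610-1639: Jupiter's moons, elliptical orbits, and the phases of Mercury
# fit heliocentrism far better (illustrative likelihoods), and the evidence
# now discriminates between the hypotheses.
print(update([0.5, 0.5], [0.1, 0.9]))   # [0.1, 0.9] -> geocentrism, heliocentrism
```

An uninvested observer runs this computation intuitively; the invested contemporaries did not.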
But none of this happened. It took nearly another hundred years, and this is not because science is slow and careful; it is because science is biased and imperfect, and it only gives up prevailing ideas when forced to, or when those who are emotionally invested in the ideas die. This is literally what happened in the Copernican revolution. The diehards resisted and rationalized and denied until those diehards … well … died. This is cognitive inertia.
Magnesium sulfate tocolysis is an example of this that immediately comes to mind in Obstetrics. There is absolutely no scientific evidence that the practice works or is beneficial, and this has been the case for over 30 years. Yet it persists, and its true believers simply cannot be convinced otherwise. It is a prevailing bias.
It doesn’t need to be this way. The problem in all of these examples is attachment. We are too attached to the ideas and beliefs that we hold, and they define us, they give us meaning and purpose and comfort. As good as humans are at rationalizing away evidence that contradicts their beliefs, we are also pretty good at determining what is true and what isn’t … as long as we are not attached to the outcome. Indeed, our brains are perfect computers for Bayesian updating, and therefore for making decisions and deciding the epistemological value of evidence.
If you don’t care whether Chevy or Ford is better, and you are presented with objective data about a truck from each manufacturer, you will likely make a good decision. But we care too much about Chevy vs. Ford, so this is impractical … unless we blind ourselves. Blinding is helpful in science expressly because of issues like prevailing bias. As much as we talk about blinding in scientific studies, we don’t blind where it really matters. We should blind study designers to what is being studied. We should blind statisticians to what the data represent. We should blind peer reviewers to the subject of the paper. Sounds crazy? It’s not; it’s all possible and effective.
We have all learned that an “appeal to authority” is a logical fallacy, but why is it? Shouldn’t authorities be the ones most adept at deciding the truth regarding a particular issue? Shouldn’t expert peer reviewers be the first choice to review papers in their respective fields? Shouldn’t those who have dedicated their lives to a particular area of medicine be the ones who make the practice guidelines about that subject? Well … no. An appeal to authority is a fallacy because it is an appeal to bias. (I realize that some texts restrict the appeal-to-authority fallacy to appeals to unqualified authorities, but these authors miss the point and find themselves in disagreement with the history of the concept.)
Those who are most invested personally in a field are the ones least likely to make objective decisions; they are the most biased and the most emotionally invested; they are the ones most likely to have cognitive inertia; they are the ones most likely not to see the forest for the trees. What’s more, they are really good at defending their position. A rhetorician and logician no less than Socrates himself defended pederasty, and he was certainly a qualified authority. But his ability to make a convincing argument that justified his worldview doesn’t qualify as evidence of the moral rightness of pederasty.
The experts rebuffed and scorned Copernicus for over 100 years; yet a schoolboy, uninvested in how the planets work, presented with an argument for heliocentrism by Galileo and an argument for geocentrism by one of his critical contemporaries, would have easily decided that Galileo was right. In the same way, a medical student shown the body of research for and against magnesium sulfate tocolysis will dismiss the idea in mere seconds, while a high-risk obstetrical specialist with forty years’ experience clings to the falsehood like his life depends on it.
In other words, we typically don’t let facts speak for themselves; rather, we speak for them.
Some of the experts we depend on to make important scientific decisions would not be allowed on a jury to decide a case if it were revealed how invested and therefore biased they are regarding the subject matter. Yet we freely and liberally allow their opinions to go unchallenged on an endless number of scientific subjects.
The idea of prevailing bias has been studied in the scientific literature. John Ioannidis has stated that in some fields of research, like nutrition, “the extent that observed findings deviate from what is expected by chance alone would be simply a pure measure of prevailing bias.” Remember the question about whether red meats and processed meats are associated with an increased risk of pancreatic cancer? The answer, according to some very poor meta-analyses of the subject, is yes (here’s one), but this yes answer (of a very small, negligible magnitude) likely represents just the prevailing bias in the nutrition field that red meat and processed foods are bad. How do I know? Because of the poor design quality of the original studies, the poor quality of the derivative meta-analyses, and the very minute magnitude of effect. But try telling all of that to a vegan. The vegan would accept, without question, even the poorest evidence that red meat increases the risk of cancer, but would fight tooth and nail against any claim, even from a high-quality study, that vegetables cause cancer (I’m not claiming that vegetables cause cancer!).
This recent study examined how likely a person is to change his belief when presented with incontrovertible evidence that the belief is wrong. Study participants were happy to change their minds about subjects like “Thomas Edison” or “reading early predicts intelligence” when confronted with contradictory evidence, but they were very unlikely to change their minds about “abortion” or “gay marriage” given equally strong evidence. In general, they were very unlikely to change their minds about political or religious beliefs (since these are deeply held), yet likely to change their minds about things like Thomas Edison, because their beliefs about Edison don’t define them; their sense of self doesn’t depend on facts related to Edison’s life. Objectively, they should have been willing to change any belief with equal probability when contradictory evidence was presented, but they did not.
So what’s the moral of all of this? Well, I’ll let you decide, because you already have anyway. If I say something you agree with, you’ll say ‘Amen,’ and if I say something you disagree with, you’ll attack my character and my halitosis. But such is life.
Published Date : December 24, 2016
Categories : Evidence Based Medicine, OB/Gyn
It is clinically useful to think of the menstrual cycle as actually consisting of two cycles (the ovarian cycle and the uterine cycle). Each has three phases: the ovarian cycle consists of the follicular phase, the ovulation phase, and the luteal phase; the uterine cycle consists of menses, the proliferative phase, and the secretory phase.
Educating patients that the two cycles don’t always work together is useful. Sometimes this is unintentional (anovulatory bleeding) and sometimes it is intentional (IUD-induced amenorrhea). Also, educate patients that the menses associated with birth control pills is completely artificial (since it is not associated with ovulation).
Premenstrual Symptoms.
Most gynecologists are very familiar with premenstrual syndrome (PMS), which has a variety of nonspecific symptoms, like stress, anxiety, emotional lability, headache, and fatigue, and a variety of physical symptoms, like bloating, cramps, and breast pain. In fact, hundreds of symptoms have been attributed to PMS or premenstrual dysphoric disorder (PMDD). There is no doubt that many of these symptoms, confined to the luteal phase, are related to the menstrual cycle. Most are probably related to the rise in progesterone and the relatively rapid fluctuations of estrogen. But the explanation for many symptoms remains unknown, or at least poorly understood. Bloating, for example, is likely related to progesterone’s effects on the bowel; this can be a confusing symptom in women who have IBS or other functional bowel issues and who mistake those symptoms for menstrual ones. Worse, the underlying functional bowel disease may improve with the initiation of birth control pills, and the real diagnosis goes unrecognized and inadequately treated.
Indeed, the biggest problem with premenstrual symptoms is misattribution. Be careful to explore other causes of emotional lability, anxiety, and fatigue before just assuming it’s PMS and starting a birth control pill. Cyclic hormonal changes may contribute to these many nonspecific symptoms, but a good history often reveals other underlying causes. Unfortunately, few gynecologists insist upon a menstrual symptom diary before making the diagnosis.
Ovulation.
Ovulation can be detected before the fact with home urinary LH detection kits. LH peaks on the day of ovulation at an average of 42.6 IU/L, but the home kits detect urinary LH at levels of around 10-20 IU/L, therefore usually predicting ovulation the day before it happens. This is convenient for timing intercourse, since intercourse on the day of ovulation or later is unlikely to result in pregnancy.
Afterwards, ovulation can be confirmed by checking a midluteal progesterone level (day 21 of a 28-day cycle, or usually 8 days after the first positive home LH kit). Any value above 3 ng/ml is consistent with ovulation (see Williams Gynecology if your jaw just dropped to the floor in disbelief). This number confuses many practicing OB/Gyns and fertility specialists, who often cite the number 10 or even 12 as being consistent with ovulation. But 3 is the number. Why the confusion? I suspect it is because 3 ng/ml of progesterone is equivalent to 9.54 nmol/L of progesterone (which rounds up nicely to 10). That’s right: people just used the wrong units. (Science? Meh…) In the US, we typically use the conventional unit (ng/ml), but international journals, and sometimes US journals, use the SI units (nmol/L) instead. Much of the early work from the 1980s on detecting ovulation with midluteal progesterone was either European or South African, thus apparently confusing many American audiences.
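The units mix-up is simple arithmetic. Using progesterone’s molar mass of roughly 314.5 g/mol, the conversion between conventional and SI units looks like this (a sketch; the cutoff values are the ones discussed above):

```python
MOLAR_MASS = 314.46  # g/mol, progesterone

def ng_ml_to_nmol_l(ng_ml):
    # 1 ng/mL equals 1 ug/L; divide by the molar mass (g/mol) and scale to nmol.
    return ng_ml * 1000 / MOLAR_MASS

print(round(ng_ml_to_nmol_l(3), 2))    # 9.54 nmol/L -- which "rounds up nicely" to 10
print(round(ng_ml_to_nmol_l(10), 1))   # 31.8 nmol/L -- what 10 ng/ml actually equals
```

Anyone citing “10” as the ovulatory threshold is almost certainly quoting an SI-unit paper in conventional units.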
Luteal Phase Defects.
What’s worse, some descriptions of so-called “luteal phase defects” from the 1990s suggested that women with a midluteal progesterone level of less than 10 ng/ml might have the condition. The 5th to 95th percentile range of Day 21 progesterone values in normally ovulating women is 8.2 to 17 ng/ml (and if you are off by even a day or two one way or the other, then lower values of 5 and 6 become very common). Declaring that women with values of less than 10 have a “luteal phase defect” would condemn at least 15% of women to the diagnosis. Such demonization of the corpus luteum was commonplace before we better understood the many causes of miscarriage, almost none of which are related to hormone levels.
The idea of a luteal phase defect is that if the corpus luteum doesn’t secrete progesterone long enough and at adequate amounts, then a pregnancy may not implant or, if it does, it may miscarry. Pregnancies usually implant within 6-10 days after ovulation; the later the implantation occurs, the higher the rate of miscarriage. Because of these facts, physicians have attempted to give progesterone supplementation to women with a history of miscarriage or with a low serum progesterone level in early pregnancy. The majority of studies have focused on luteal phase support in pregnancies achieved with assisted reproductive technologies. But in naturally occurring, spontaneous pregnancies, there is simply no good scientific evidence that a luteal phase defect even exists, or, if it does, that progesterone supplementation is effective in reducing the risk of miscarriage. Even if the defect does exist, its diagnosis can only be established with endometrial biopsy. The problem theoretically would be the premature decidualization of the endometrium, and progesterone supplementation is not necessarily going to fix this, since it may have more to do with progesterone receptor activity or other unknown factors rather than serum levels of progesterone.
Recurrent Pregnancy Loss.
For decades, progesterone therapy has been offered by a majority of practicing gynecologists to treat recurrent pregnancy loss in women with low serum progesterone levels or a history of miscarriage, without any scientific evidence. The promotion of the idea of a “luteal phase defect” as a cause of miscarriage, along with the observed utility of progesterone supplementation in artificial reproductive cycles, spurred on this non-scientific practice. Worse, many women with a history of recurrent pregnancy loss have been sent to specialists in reproductive endocrinology, who have often recommended this practice, having been strongly biased by their ART successes.
We have known since the 1980s that low serum progesterone levels are predictive of a high likelihood of miscarriage. This, too, naturally promoted the idea that we could add back progesterone and fix the problem (we are, after all, the generation that conquered nature!). But, alas, some of this is chicken-and-egg thinking. Is the serum progesterone level low because the pregnancy is destined to fail, or is the pregnancy destined to fail because the progesterone is low? We too quickly draw conclusions from the two or three pieces of data we understand, reaching false conclusions and offering false hope. Today, our understanding of early pregnancy development is much more complex but still lacking. How good is your knowledge of pinopodes and alpha-v beta-3 integrins? How about osteopontin, Th1 and Th2 cytokines, or the progesterone-induced blocking factor (PIBF)? I’ll bet that if there were commercial lab tests available to check levels of these things (and some pharmacy willing to make the supplements), doctors would check levels and replace (or take away) in an effort to stop miscarriages. Sounds ridiculous? It’s what we have been doing with progesterone for thirty years.
The highest quality study ever performed on the subject of progesterone supplementation was published in 2015 in the New England Journal of Medicine. The PROMISE Trial randomized 836 women with a history of three or more miscarriages to receive progesterone supplementation or placebo once they achieved spontaneous pregnancies. The rate of live birth after 24 weeks was no different in the two arms. This study is currently the only quality study ever done on the issue and should be enough to end the practice once and for all. Unfortunately, I doubt it will have that impact.
Positive Feedback.
It may not surprise you that the traditional idea of how the menstrual cycle hormones interact is only a theory, and one with a lot of holes. Have you ever been confused by some of the apparent contradictions of how certain drugs work (like clomiphene or letrozole) or how the positive feedback of estrogen and progesterone affects GnRH, etc.? Don’t worry, the model is probably untrue and/or incomplete. Read more here if you are curious.
Published Date : December 24, 2016
Categories : OB/Gyn
There is a fine line between innovation and
Published Date : December 23, 2016
Categories : Cognitive Bias, Evidence Based Medicine
Frequency of the word “post-truth” in news stories according to Google Trends (spike is in Nov. 2016)
Since the 2016 election, the term post-truth has become the subject of countless essays, news pieces, and YouTube videos. Facebook and other outlets want to censor “fake news” (and therefore become the arbiters of what is true) in response to a year that is being called the “Year Truth Died.” The Oxford Dictionary has declared “post-truth” its word of the year. It is interesting how deluded people have become to think that they can decide what is true or untrue. As I have said before, this is merely hubris.
Most of the stories and posts I have seen on Facebook and news websites over the last several years that bother me for failing to meet the standard of “truth” are health-related. Naturally, those are the posts I give more attention. If anyone wants to confirm that we have apparently, as a society, become more tolerant of false claims, look no further than the health industry and its associated advertising and pseudoscientific literature. Of course, I am not saying that this has happened just in the last 10 years. It is as old as time itself.
On CNN’s website today, I read this article about apple cider vinegar, promoted by health.com. Health magazine and CNN are both owned by Time, Inc. The article is typical of the type of stories covered in CNN’s health section and in magazines like Health. Its author is Cynthia Sass, whose many popular books promote unscientific methods of dieting and other questionable nutritional tripe (“lose eight pounds in just four days”). You might say she’s part of the Alt-Health movement, pun completely intended. But her particular brand of nonsense isn’t up for consideration for censorship for being “untrue,” I guess because it makes a lot of money for CNN and others.
The Alt-Health industry is characterized by false claims and shoddy pseudoscience designed to bilk people out of billions of dollars. The use and interpretation of evidence-based medicine is tough enough as it is. Even when well-designed, randomized, placebo-controlled clinical trials are available, there is still much analysis and interpretation to be done to understand what to do with that evidence. When it comes to diet and the dietary industry, this problem is immensely more difficult. There are few randomized, placebo-controlled trials. Anecdotal evidence and testimonials are standard fare. Most published studies consist of retrospective, observational data or prospective studies that are poorly controlled and poorly powered. Where positive data do exist, the effects are often either focused on the wrong outcome or of an insignificant magnitude. Finally, there are huge economic incentives in the industry to mislead consumers.
Economic incentives? Yes, indeed. International ‘Big Pharma’ had sales in 2014 of roughly half a trillion dollars worldwide. The value of the food industry is at least 10 times that size and very competitive, with companies trying to edge each other out on something other than price, since industry margins are already so low (I should note that US Big Pharma sales are about $236B yearly). Current growth in the food industry comes from niche-markets, which are often created and fueled by spurious health claims, like organic food, gluten-free, fat-free, sugar-free, whole foods, etc.
Organic food sales alone in the US were nearly $50 billion last year. The poorly regulated supplements and vitamin industry is worth something more than $61B per year in the US. And consumers get very little in return for this money. These sales are driven directly by predatory, slick salesmanship and pseudoscience. Facebook feeds are awash with snake oil and quackery (I saw this on Facebook, for example). Because the Internet is used for this marketing, a Google search for scientific evidence about diet and dietary supplements is almost assured to return results that are heavily biased and probably false, with most coming from someone with something to sell. And since many pseudoscientific supplements and Alt-Health products are sold through multilevel marketing schemes, millions of small, personal webpages and Facebook posts offer anecdotal endorsements of products that the endorsers stand to gain from financially.
‘Big Pharma’ has become a buzzword for these salesmen. Big Pharma – bad, Natural – good. But the profit margins are much higher in the health foods, vitamins, and supplements industry. This industry does next to nothing in the way of research or safety testing. These companies spend nothing on FDA approval, and the most unscrupulous among them usually just go bankrupt if they get hit with a lawsuit. But even large, mainstream companies are in on the act. Whole Foods, for example, has the highest profit margins of any grocery chain in America, raking in the cash while implying to its consumers that the food it sells is somehow “healthier” than that of its competitors.
Back to the article about apple cider vinegar. An accompanying slide show describes “15 superfoods for the Fall.” A superfood, according to Wikipedia, is a marketing term not used by nutritionists and nutrition scientists. In other words, it’s fake news designed to part you from your money. In fact, the European Union has made it illegal to market foods using this term. I wonder if Facebook will label all articles about superfoods as ‘junk science’ or ‘disputed’ or ‘fake’? I see at least one promoted article about superfoods per day. CNN, for its part, has decided to present this fake science as truth.
This particular article reports on two studies which claim that drinking apple cider vinegar both lowers blood sugar and makes you skinnier! Just an ounce a day delivers both of these miracles! The reporter says that this is based on “some solid research.” She goes on to speculate that it might also improve gut health (the latest buzz-word) and concludes by recommending that you drink two teaspoons of organic apple cider vinegar mixed with a teaspoon of organic honey mixed in a cup of warm water each day. I always like my acetic acid to be organic – specifically, two carbons per molecule (nerdy joke).
This type of reporting on scientific issues is common on CNN, Fox News, and hundreds of other sites. The quality only goes down when it comes to blog sites like Huffington Post. Did you detect any bias or selective presentation of data in this vinegar piece? (Please don’t read it.) If this is how scientific evidence is reported on, how can we expect “fact-driven” reportage on issues involving less clear areas like sociology, economics, foreign policy, current events analysis, etc?
Scientific journalism should require some critical assessment of the data and its implications, not just cherry-picking whatever claims agree with the author’s opinions. Ms. Sass did not practice scientific journalism, so let’s spend just a minute and do the work she should have done.
One study she cited showed lower morning blood sugars if the participants ate a piece of cheese with some vinegar the night before. The study involved 11 participants (yes, eleven!) with Type 2 Diabetes, eight of whom took medications for it (why even try to have a homogeneous group?). They recorded their fasting blood sugars for three days taking nothing at night, then for three days taking water with an ounce of cheese, and finally for three days taking the same cheese with 2 tablespoons of apple cider vinegar. The cheese-and-water regimen reduced fasting sugars by 2% (P=0.928) and the vinegar-and-cheese regimen reduced sugars by 4% (P=0.046). Yup, that’s the solid research the reporter was talking about. The study doesn’t account for the margin of error in glucometers, doesn’t strictly control for the diet and activity levels of the participants, doesn’t report which medicines the participants were taking, or when, or how much, and yet the authors have the gall to conclude that the observed effect of vinegar was significant. I won’t waste much more time analyzing this paper, but suffice it to say I have seen middle school science projects more worthy of publication. Eleven unmatched people for 3 non-controlled days? Oh, and that P-value of 0.046? It wasn’t compared to the placebo arm (water and cheese). Hmm. Sounds legit.
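That last point deserves a demonstration, because it is the heart of the statistical sleight of hand. A within-group before-and-after test answers a different question than a comparison against the placebo arm, and it is easy for the former to look “significant” while the latter shows nothing. Here is a minimal sketch with entirely made-up numbers (not the study’s data), sized like the study at 11 participants per arm:

```python
from scipy import stats

# Made-up overnight drops in fasting glucose (mg/dL), 11 per arm; invented
# to make a statistical point, not taken from the vinegar study.
vinegar_drop = [6, 5, 7, 4, 6, 5, 8, 3, 6, 5, 5]
placebo_drop = [4, 5, 3, 6, 4, 5, 2, 6, 4, 5, 5]

# Within-group question ("did sugars fall after vinegar?"): trivially
# "significant," because almost any consistent overnight change will be.
print(stats.ttest_1samp(vinegar_drop, 0.0))          # p far below 0.05

# The question that matters: did vinegar beat the placebo arm?
print(stats.ttest_ind(vinegar_drop, placebo_drop))   # p ~ 0.09, not significant
```

The study’s celebrated P=0.046 was, as noted above, never a comparison against the placebo arm.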
The second paper she cites claims that apple cider vinegar improves insulin sensitivity in people who are insulin resistant or who have type 2 diabetes. This “solid research” paper included 8 controls, 11 people who were supposed to be insulin resistant, and 10 with type 2 diabetes. Participants drank either a vinegar, water, and saccharin concoction followed by a bagel, or a “placebo drink” followed by a bagel; they then did the opposite the next week. Interestingly, for people who think a smidgen of vinegar is so powerful, we aren’t told what was in the placebo drink. The authors found “slightly improved” insulin sensitivity in the type 2 diabetes group. Oh, the P-value, you ask? It was 0.07. Solid research!
The CNN writer also claims of this study, “People with pre-diabetes improved their blood glucose levels with vinegar by nearly half, while people with diabetes cut their blood glucose concentrations by 25%.” The paper makes no such claims at all; rather, it seems that the CNN expert was confused by the paper’s comparison of fasting blood sugars among the three groups (the controls, pre-diabetics, and diabetics), a comparison that reflects the disease, not the vinegar. Oh, the rigors of writing for Health magazine and CNN.
What about the weight loss claim? Those claims are supported with a link to this paper, which studied the effects of force-feeding acetic acid to fat mice. I am not even going to comment on the complete non-relevance of this paper to any of the CNN claims. But there is one last follow-up paper in humans that the author cites. The study was conducted in Japan and consisted of giving subjects three versions of a drink that had variations of seven different ingredients, one of which was vinegar (and the placebo with no vinegar had lactic acid!). The rest of the paper is just as shoddy, with inappropriate controls, incorrect statistical methods, and false claims of significance. If you must know, the low-dose group gained weight and body fat, and the high-dose group lost both, but none of it was statistically significant.
I could go on, but this type of analysis is typical of the vast majority of science reporting that appears on news websites. Interestingly, the papers cited are from 2004, 2007, and 2009. In other words, none of this was “news” anyway, but rather it was selective picking through the literature to support a narrative the author had already written. She might have missed this excellent 2014 review of the literature on vinegar. The authors lamented:
The use of food additives of natural origin to treat disease has increased significantly in recent years, despite a lack of evidence showing medical benefit. … Diabetic patients are 1.6 times more likely to use complementary and alternative medical products than nondiabetics. Moreover, obese individuals, who are usually unwilling to reduce their daily caloric intake, are often prone to use dietary gimmicks or alternative products that promise weight loss and beneficial metabolic effects.
The review is quite fair and exhaustive, and it concludes that the positive literature about vinegar comes only from low-quality studies; the authors also believe that there is significant publication bias in the area. They state that no health claims can currently be made about vinegar. Do you think CNN would publish those conclusions in a piece promoting superfoods, turmeric, and other nonsense? Of course not. Every single “news” outlet is more interested in click-bait and ad sales than facts.
Is this constant publication and promotion of health lies harmful? Of course it is. From Dr. Oz to The Doctors to the vast majority of health reporting on every major website, almost all we see are stories similar to this one on vinegar. They encourage patients to do ineffective things for their health problems (often promoting the idea that the medical field is bad) and they take their money. It is the lowest form of charlatan who preys on the weak and infirm to make a buck. I didn’t have to look hard for the vinegar story; I actually just went to the CNN Health page and picked the newest story. Such an analysis could be performed every day. Where is the fact-checking?
We don’t live in a post-truth world. That’s just the meme of the month. Humans have always cared little about facts in every walk of life; it just became more talked about in politics this year. Nothing has changed. Do we care about facts? Every day, hundreds of thousands of doctors in the United States who choose not to follow evidence-based guidelines prove that they already live in a “post-truth” world where facts are irrelevant. Patients who spend hundreds of billions of dollars on the Alt-Health industry live in a post-truth world. But this world has been here from the very beginning of time, and it isn’t going away.
So should Facebook get rid of the Alt-Health junk from my feed? Nah. I like a good laugh, and I don’t believe in censorship – I believe in effective persuasion. I am frustrated by but not afraid of the anti-vaxxers, the home-birth VBACers, the flat-earthers, etc. Censorship usually has the opposite effect than the one the censors desire. Want someone to read a book? Ban it. Instead of suppression, we should provide some competition of ideas.
Published Date : December 22, 2016
Categories : OB/Gyn, Teaching Tools
Speed in operating should be the achievement, not the aim, of every surgeon. – Russell John Howard
This old school howardism makes an excellent point: speed in surgery is the fruit of good technique, efficiency of motion, and a lack of complications. But the lust for speed may encourage risk-taking and shortcut-making, resulting in poor surgery, longer surgery, and more complications.
Is the length of surgery important?
Yes. Better outcomes are consistently observed with shorter length of surgery.
However, the relationship between shorter surgery and better outcomes is complex. Are better outcomes the result of a shorter time, or is a shorter time the result of a simpler, less morbid case? The answer is both. Shorter surgical time definitely translates into fewer complications due to the effect of the time itself. Shorter time means:
This study of about 100,000 surgical cases found that longer operating times were associated with an increased risk of urinary tract infections, organ-space surgical site infections, sepsis/septic shock, pneumonia, DVT, renal failure, wound disruption, cardiac arrest requiring CPR, and death. In fact, surgical site infections occurred at a rate of 14.1/1,000 cases per hour, starting at 42 minutes, and 16.6/1,000 cases of sepsis occurred for each additional hour beyond the standard time the case should take. A total of 116 additional negative outcomes per 1,000 cases were associated with each extra hour of surgery. By comparison, the fastest procedures (top 2.5th percentile) had the lowest composite rate of negative outcomes.
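Back-of-envelope, taking the per-hour figures quoted above at face value (a sketch of the arithmetic, not a validated risk calculator):

```python
EXCESS_PER_HOUR = 116 / 1000   # additional negative outcomes per case, per extra hour

def expected_excess(extra_hours, cases=1000):
    """Expected additional negative outcomes among cases that run long."""
    return EXCESS_PER_HOUR * extra_hours * cases

print(expected_excess(1))     # 116.0 extra outcomes per 1,000 cases running an hour over
print(expected_excess(2.5))   # 290.0 extra outcomes per 1,000 cases running 2.5 hours over
```

On these numbers, every extra hour in the operating room concedes roughly one additional complication for every nine cases.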
Of course, a shorter surgery is also a result of factors that naturally lead to shorter surgeries and produce better outcomes anyway, such as:
This last point is cited as the reason why surgery takes too long by every surgeon who has ever had a long case (“The patient was complicated…”); and if a surgeon is consistently slow, it is, of course, because all of his patients are complicated. While there is obviously some truth to this, we must be careful not to blame the patient for our own inadequacies. Think about how well your best cases go with straightforward patients, and compare those operative times to some standard metrics.
A lot of focus today is on minimally invasive surgery as a means to decrease patient complications. Yet length of surgery is often more important than the route of surgery. If minimally invasive approaches take considerably longer than open approaches (or result in steeper Trendelenburg positioning, excessive abdominal insufflation, prolonged intubation, etc.), then a patient might be better served with an open approach. For example, once a laparoscopic total colectomy takes longer than 3 hours, the patient would have had fewer total complications with an open procedure. The data are scarce, but one might assume the same to be true of hysterectomy. A robotic case that takes in excess of 3 hours, with several ports, steep Trendelenburg positioning, and high insufflation pressures, undoubtedly results in more morbidity than a 30 minute abdominal hysterectomy performed through an 8 cm incision (or a 20 minute vaginal hysterectomy performed through no incision).
So how long should surgeries take?
As little time as possible to safely complete the task.
A more pragmatic answer is difficult. Scientific studies that report average lengths of surgery are often not comparable to the time we should expect a competent surgeon to take in the real world, since most of the published cases are performed by residents in training programs. This question reminds me of a passage from Joel-Cohen’s book from the 1970s entitled Abdominal and Vaginal Hysterectomy: New Techniques Based on Time and Motion Studies. Joel-Cohen published a treatise of his own surgical techniques, which had been influenced by his own filmed and photographed studies looking for ways to improve efficiency. He says:
Although speed as such is no criterion of the surgeon’s ability, with simplicity and constancy of technique, no waste of movements, and using instruments properly, there is an enormous saving of time. It is therefore necessary for me to say that, without hurrying, my own average time for abdominal hysterectomy, that is a total hysterectomy from skin opening to complete skin closure, is usually, about twenty to twenty-five minutes. Vaginal hysterectomy with repair, anterior and posterior, is also twenty to twenty-five minutes for the complete operation, and without repair, an average of twelve to fifteen minutes. These are recorded times and not guesses.
Well. He sounds like a fun party guest. But he is right. Joel-Cohen was a master of his craft and was obviously reporting his straightforward cases, but still the times are realistic. I have done simple vaginal hysterectomies in just under 10 minutes but my average runs closer to 18 minutes (I have seen a video of a complete TVH in just 6 minutes). Routine cesareans should take between 12 and 18 minutes. Simple abdominal hysterectomies should take about 25 minutes or less. Additional procedures obviously add additional time, but even complicated cases can be performed in less than double these times.
I am not saying that Joel-Cohen operative times should be expected for the average gynecologist, but in non-teaching cases, the averages shouldn’t be considerably longer. Obviously a wide variation of lengths of surgery exists for different procedures. I know general surgeons who can do an appendectomy in 10 minutes and I know some who take considerably over an hour. Why the variation? If you can answer this question, you can understand how to become faster.
How can I become faster?
Let’s look at a few factors that contribute to shorter operative times (and better outcomes). At the outset, it should be stated clearly that speed is for the patient’s benefit, not the surgeon’s. Speed gains come from efficiency not haste.
1. Essentialism (simplicity). Simplicity of technique is the hallmark of good surgery. Not only does simplicity save time, it also presents fewer steps in which mistakes can occur and reduces the perceived technical complexity of the procedure (boosting the surgeon's self-efficacy). Einstein is often misquoted as saying, "Everything should be made as simple as possible, but not simpler." This is one of the truest principles of surgical technique.
There are some essential features of every procedure; but for every essential procedure, there are often many unnecessary steps that have been added over time. Separating the essential steps from superfluous complexity is the sine qua non of great surgical technique. Leonardo da Vinci said, “Simplicity is the ultimate sophistication.” It’s ironic that a surgical robot was named after him.
What are the steps that must be done for any given surgery? Let's think about cesarean delivery for a moment. The abdomen and uterus must be entered, the child and placenta delivered, and the uterus and abdomen closed. Now what are the best methods to accomplish each of those steps? For details about this procedure, read more here. But more to the point, what are the superfluous steps? Closure of the parietal and visceral peritoneum is unnecessary; sharp entry into and dissection of the parietal and visceral peritoneum are unnecessary; dissection of the rectus muscles off of the rectus sheath is unnecessary; irrigation and manual dilation of the cervix are unnecessary; two-layer closure of the uterus is unnecessary; non-inclusion of the vesicouterine peritoneum in the hysterotomy closure is unnecessary; routinely grasping the uterine incision with various clamps before repair is unnecessary; routine cauterization of various areas with the Bovie is unnecessary; cleaning out the uterus with a sponge is unnecessary; and reapproximation of the rectus muscles is unnecessary.
Not only are all of these things unnecessary, but most have evidence of harm and all add time and complexity to the procedure, increasing the aggregate risk and rate of complications. All of these steps at one point were considered to be crucial to a successful surgery by someone, but we need to use whatever evidence we have available today to decide whether they are truly essential. All of the above steps have failed that test.
So how do you figure out which steps are essential? Think about every single step and the alternatives that exist for each step; then search for the evidence. A lot of surgeons just don’t know that there is another way – they have totally lost sight of what is essential and necessary and what is superfluous and harmful. Do you really need the arms tucked to do a 5 minute laparoscopic tubal ligation? Do you actually need a paracervical block to do an Essure? Do you routinely need to sound the uterus prior to inserting an IUD? What good is a left tilt during a cesarean? Why did you give a preoperative antibiotic to a patient having a diagnostic laparoscopy? Do you actually need a uterine manipulator to do a tubal? Do you need a robot to perform a hysterectomy? Question every step and eliminate the things that are not productive.
Custom and habit are not excuses for continuing to do things the way they were once done. I wrote recently about the many once-performed but not evidence based steps of a vaginal delivery. Similarly unnecessary steps exist for everything that we do.
Some steps may be necessary but more than one method exists for completing each step. For example, initial laparoscopic entry may be accomplished by direct entry with an optical trocar, pre-insufflation with a Veress needle, or an open method using a Hasson cannula. Which is safer? Direct entry (if this shocks you, it’s true). Which is faster? Direct entry. We can secure the pedicles during a hysterectomy with clamps and suture or with an energy sealing device. Which is safer? The energy sealing device. Which is faster? The energy sealing device. You will find that most of the time the faster method is also the safer method.
When we eliminate unnecessary steps from surgery, a common criticism is that we are being lazy or acting in haste. This criticism is based on the usually incorrect assumption that the eliminated step was valuable in the first place. For example, I don't rip open the fascia and peritoneum during abdominal entry for a cesarean because I am in a hurry; I do it because it is associated with better patient outcomes. Remember, speed is the achievement of good surgery, not the goal. Isaac Newton said:
“Truth is ever to be found in simplicity, and not in the multiplicity and confusion of things.”
Our task is challenging only because surgery has evolved through trial and error over many decades and only recently has scientific evidence really weighed in. Procedures have accumulated unnecessary complexities. We must erase the prior assumptions and seek the essentials.
2. Confidence. One of my favorite books, Technique in the Use of Surgical Tools, by Romfh and Cramer, contains this pearl:
“The surgeon who terrorizes his operating team is advertising his inadequacies and lack of self-confidence.”
Self-confidence and self-efficacy are essential to a good surgeon. This is not the same as ego. I have written about self-efficacy here and factors that contribute to it. Many surgeons, for all their ego and brashness, lack self-efficacy. Arrogance is often a compensation for low self-esteem and low self-efficacy. What practical steps can you take to increase your sense of confidence and self-efficacy?
A confident surgeon is a deliberate surgeon. If you know what you intend to do, do it! Too frequently surgeons “stutter and stammer” with the needle, unsure of themselves and unsure of their bites. Clamps are placed and replaced three or four times because the surgeon just isn’t sure of quite where it should be. A lack of repertoire in different techniques leads surgeons to keep trying the same thing over and over again, even though it’s not working, when the case becomes challenging. Uncertainty about simple anatomy leads surgeons to look and relook, consider and reconsider. If you know the starting point and the endpoint, you can draw a line between two dots as well as anyone; but too often, surgeons lack deliberateness because they are unsure of the goal and how best to get there. Make purposeful progress with every step. Be intentional.
3. Efficiency.
Joel-Cohen and other master surgeons talk a lot about efficiency of motion and proper utilization of tools. Here are some tips for both.
How can you improve your efficiency?
How can you better use tools? If you are a student of the history of surgery, then you realize that as new tools were developed, huge gains were made in safety, ease, and efficiency. Tools are our friends, and if you find yourself struggling, you probably aren’t using the right tool or you aren’t using the tool right.
My mentor liked to say, "Surgery is a thousand little things done well." There are a lot of things that go into making surgery productive and efficient. A few seconds saved here and there add up to lots of time at the end. Anticipate where you're going and go there with purpose; bring the team along with you and anticipate their needs. Ultimately, you are no better than your team. How do you make your team better? How do you make your assistant better? Give plenty of advance notice to the scrub tech for what you need next. It takes time to open up additional suture, load a needle driver, get a scope hooked up, etc. Provide ample time and notice so that when you actually need something it is in your hand. Always see the next several steps in your mind so that you are drawing a straight line between the dots.
Lastly, error begets error. If you place a trocar through the inferior mesenteric artery because you don't know the anatomy well, you've just added 15 or 20 minutes to the case. If your initial circumferential incision around the cervix is too distal, you've likely added 10 minutes to the case as you struggle to make colpotomies. If you make the incision too low on the uterus during a cesarean for a woman who has a deep arrest, you might have added 30 minutes to the case fixing an extension into the cervix, vagina, or bladder. Mistakes like these are not a product of haste; they are a product of poor knowledge of anatomy and poor technique. Many surgeons are their own worst enemies. Avoiding missteps associated with poor technique is perhaps the greatest time-saver of all.
Published Date : December 20, 2016
Categories : OB/Gyn, Teaching Tools
The modern chief resident in Ob/Gyn (and I am certain in other specialties) has few characteristics in common with the role of a chief resident in decades past. This change, like most changes, is both good and bad. Most chief residents today are not really "chiefs" in any meaningful sense of the word, but are rather just last-year residents, still learning basic and intermediate skills, with a few administrative responsibilities. Those administrative responsibilities may or may not include things like making the call schedule, the rotation schedule, assigning residents to various tasks and conferences, and maybe dealing with vacation requests.
This level of “responsibility” is akin to the level of responsibility a fast food store manager might have, without concerns like payroll, hiring and firing, conflict resolution, etc. In other words, it’s not real responsibility and it rarely comes with real authority. A program director (or administrative secretary) is always ready to veto or change anything he might not like about the chief resident’s decision.
Several events have conspired to change the landscape for chief residents. Due to billing and liability concerns, chief residents are given far less autonomy and responsibility than in the past. Virtually every meaningful decision nowadays is made by the attending physician. This may or may not be a good thing in terms of patient quality (it probably depends on how good the attending is), but it is a devastating thing for the chief resident, who no longer feels a real sense of responsibility for the decisions she is making (while still having a safety net). Most often, the first time a physician feels ultimately responsible for the patient is during the first year of practice post-residency, but by that point the safety net is usually gone and the cost of failure is much higher.
Residency work hour restrictions, coupled with a declining volume of surgical cases, have made the last year of residency more important in terms of learning how to do more advanced procedures or becoming more skillful at basic procedures. In Ob/Gyn, according to a 2015 study of fellowship directors, most residents today graduate unable to safely and independently perform a vaginal hysterectomy and several other procedures.
In theory, these graduates entering fellowship training are the crème de la crème. If they are (I have my doubts), then our specialty is in trouble. If the numbers are this bad for graduates entering advanced training fellowships, imagine how bad they are for the average chief resident beginning the last year of training.
Since the chief residents are still not competent in many procedures, they "steal" cases from the juniors, which perpetuates the vicious cycle of unprepared chief residents. The juniors then enter their chief year woefully unprepared, and, worse, the role of chief resident as teacher is all but destroyed as they struggle just to gain some glimmer of competency for themselves.
Teaching is perhaps the most important role of a chief resident (or at least it should be). I learned a lot during the first 20 or so hysterectomies I did, but I learned a lot more during the first 10 that I taught. Even today, I usually learn far more by first-assisting someone than by doing the actual procedure. But a chief resident who is still not far enough along the learning curve to start teaching, or, more likely, who has not developed enough self-efficacy to give up the case to a junior and teach her how to do it, cannot develop competency, because competency and mastery ultimately come from teaching.
Unfortunately, today the junior faculty have actually assumed the role of chief resident, and their own competencies and sense of self-efficacy are stuck in that role unless they are lucky enough to grow out of it. An incompetent chief resident who goes into a non-academic practice may never have the opportunity to complete his education through teaching.
The Cognitive Apprenticeship Theory describes the process of a master of a skill teaching a novice apprentice. Various teaching methods can be used during such an apprenticeship. In residency training, we typically see modeling and coaching. In modeling, the apprentice learns from watching the master, and then in coaching, the master gives feedback to the novice while the novice attempts the different skills. There are other methods that are often neglected in modern residency teaching which are necessary to develop mastery.
For example, exploration allows the novice to problem solve on her own; in today’s environment of almost too much supervision and zero tolerance for any type of failure, this important method of education is almost never appropriately utilized.
Another method, articulation, is also not fully utilized. Articulation involves the novice articulating their knowledge, reasoning, and problem-solving in an effort to force synthesis and allow the novice to have knowledge gaps or reasoning errors highlighted so that they can be corrected. In medical education, the role of the novice (the resident, medical student, junior faculty member) in teaching is where most articulation occurs. Depriving the chief resident of the role of teaching junior residents deprives them of this important learning method. I should note that the depth of teaching and the perceived importance of the teaching matters. If a chief resident, for example, is tasked with teaching a junior resident or medical student only basic and non-ambiguous skills and information, rather than complex and nuanced decision making, data interpretation, and advanced skills, then the chief resident is not really benefitting.
Most authors recognize that the Cognitive Apprenticeship Theory proceeds in defined stages. The three-stage model of Fitts and Posner is frequently applied here: the cognitive stage, the associative stage, and the autonomous stage.
In the cognitive stage, novices learn the theory of the skill. In the associative stage, they practice the skill and learn from mistakes and misinterpretations while the salient and most important points of the skill are reinforced by the master. In the autonomous stage, learners are skilled enough to practice the skill independently and they possess the requisite skills that allow them to continue to learn from their practice until they become an expert. In practical terms, this may mean reading about vaginal hysterectomy and watching videos and observing and assisting in surgeries during the cognitive stage; then performing vaginal hysterectomies with a teaching assistant who gives good feedback during the associative stage; and lastly, assisting learners with vaginal hysterectomies or performing them independently without a teaching assistant (no attending) during the autonomous stage.
Most residents graduate from residency without ever starting into the autonomous stage. Chief residents may still be early in the associative stage for skills like vaginal hysterectomy at the time they graduate. Because they don’t reach the autonomous stage while still supervised, they may not learn the skills needed to continue to learn from their own experiences and therefore may never advance to mastery. They also often lack the necessary skills for lifelong learning and independent critical thinking necessary to stay up to date during their careers.
Self-efficacy is a measure of how much one believes that she is able to complete tasks or reach goals or work through problems. Interns have a low degree of self-efficacy, at least related to medical problems. In theory, a graduating resident, ready for independent practice, would have a relatively high degree of self-efficacy. This self-efficacy is necessary to progress through the autonomous stage and advance towards mastery. For people with low self-efficacy, obstacles seem larger than they really are, tasks seem more difficult than they should, and adversity quickly leads to giving up and failure. Physicians with low self-efficacy lack resiliency, and as such suffer from more anxiety and depression. Because surgeons with low self-efficacy believe that a task (a surgery) is harder than it actually is, their performance suffers and the surgery becomes more difficult, leading to a vicious cycle of prophetic self-fulfillment. Albert Bandura identified four factors that affect self-efficacy: experience, modeling, social persuasion, and physiological factors.
Experience is a major promoter of self-efficacy (either positively or negatively). Achieving expertise (in anything really) helps develop self-efficacy. Specifically, in a surgical specialty, gaining expertise in a surgical procedure fuels self-efficacy. Residents who graduate without expertise obviously have poorer self-efficacy as a result. I always like to ask learners to tell me something they are good at, something that they can teach me. The topic doesn’t really matter because I am tricking them into feeling a heightened sense of self-efficacy and accomplishment that they can translate into the current task or educational goal.
Modeling is best expressed as the feeling that "if he can do it, then so can I." Once Roger Bannister finally broke the four-minute mile threshold, his record lasted just 46 days. Four-minute miles soon became routine for top runners. The self-efficacy of every other miler in the world was raised by Bannister's feat. Bannister, by the way, was a medical resident at the time he broke four minutes (he actually had been at work at a London hospital that morning). One of the negatives of the current modeling in residency programs is that most residents rarely see surgery done well. With less accomplished senior residents, and junior faculty who lack self-efficacy, surgeries seem far more difficult than they should be. Hysterectomies that should take 20-30 minutes instead take 2-3 hours; cesareans that should last 10-20 minutes instead take 1-2 hours. Modeling as a means of self-efficacy development is largely missing from current residency training. If the chief resident struggles miserably to do a hysterectomy, what hope does the junior resident have? When the resident leaves the case feeling defeated and intimidated, self-efficacy drops dramatically. It is the attending physician's responsibility to ensure that this never happens.
Social persuasion is related to the type and amount of encouragement or discouragement that a person receives. Encouragement typically promotes higher self-efficacy, while discouragement generally promotes lower self-efficacy. As you might guess, most of residency training is focused on negative feedback and therefore discouragement leading to lower self-efficacy. We have morbidity and mortality conferences to scorn failure rather than conferences that reward successes. We always focus on what could have been done better rather than on what went well. Good teachers understand this instinctively. We don’t need to undermine self-efficacy by sharp criticism; rather we need to build on successes. No resident does every task perfectly. We should try to positively reinforce what has been done well rather than constantly point out deficiencies. This is an important part of the associative stage of the cognitive apprenticeship model; the good things and important steps are to be reinforced, not the negatives. The greatest role of a teacher is to encourage.
The physiological factors that affect self-efficacy relate to our stress response. If a situation makes us nervous and anxious, the physiological response to this stress will tend to lower our self-efficacy and consequently make us underperform. A good teaching environment is relatively stress free. Does that sound like most residency programs? The Yerkes-Dodson law describes this behavior: performance improves with physiological arousal, but only up to a point; beyond that point, further arousal degrades performance (the familiar inverted U).
Anxiety management of learners (and surgeons) is critical to effective education (and successful surgeries).
Self-efficacy is something that should be continuously fostered and promoted in residency training (and all other training). Most chief residents today have very little self-efficacy.
What are some solutions?
None of this means that they shouldn’t be supervised. But it does mean that attending physicians should sign off on their management plans at the last stage unless there is a very important reason not to. In essence, the chief resident should be serving as a junior faculty member.
Consider these two examples:
In the first, a patient comes to triage to be evaluated; the junior resident assesses the patient and presents directly to the attending physician, who formulates the plan. In the second, the junior resident assesses the patient and tells the chief resident; the chief resident formulates the plan and presents the patient to the attending physician.
This subtle distinction makes all the difference in the world in effective education. The attending may ask about the chief's thought process and may teach a general principle related to the problem, but unless the assessment and plan are completely off base, the attending physician should just approve the plan. This promotes the chief's self-efficacy, her sense of responsibility, and the junior residents' respect for the chief, and it gives the attending physician an opportunity to assess the chief's reasoning and decision making, which may need to be fine-tuned. In turn, the chief resident should do most of the teaching for the junior staff.
In surgery, whenever possible, the model should similarly be that the chief resident teaches and assists the junior resident. I honestly believe that it requires as much skill, if not more, to be a good assistant as it does to be a good surgeon. True mastery of surgery comes from the ability to teach and explain every step to a novice. Many residents graduate not understanding many of the steps of a surgery or their importance; rather, they just practice mimicry of surgeries they have seen, aping each step. This is the difference between a parrot and a person: both can say words, but the parrot doesn't know what they mean or when it should and shouldn't say them. A lot of practicing Ob/Gyns and surgeons are mere parrots in the operating room.
This also means that residents need earlier exposure to the operating room than they get in many programs. The goal should be for residents to leave the third year at the place where fourth-year residents are leaving now. We need to regain a year, and we need to do so in the face of shortened work days and fewer cases. Undoubtedly, simulation and technology will need to serve a larger and larger role in developing core skills for trainees so that each surgical opportunity becomes more valuable (and safer).
A lack of self-efficacy among chief residents and faculty also promotes a negative work environment for everyone. Most bullying behaviors in the workplace are driven by the low self-esteem of the bully. Chief residents are encouraged to be selfish (stealing cases) and hostile to juniors as they deal with their own negative feedback, anxieties, and fears, approaching graduation without knowing what they are doing. Culture is important, and bullying and negative treatment should never happen.
It’s time to end this negative cycle.
Published Date : December 19, 2016
Categories : Cognitive Bias, Evidence Based Medicine
Like dreams, statistics are a form of wish fulfillment. – Jean Baudrillard
Are you afraid of getting mauled by a bear? I have great news: just move to Mars. That's right. People who live on Earth are at very high risk of bear attacks, but there has NEVER been a report of a bear attack on Mars.
Now there are some downsides and some costs associated with living on Mars, but let me minimize those by not talking about them and scare the living daylights out of you about bears. In the 2010s alone, just in the United States, six people so far have been killed by wild black bears (one even in New Jersey!). This horrifying figure doesn’t include captive black bears, let alone brown bears and polar bears. There is clearly an epidemic of bear violence. Oh, and did you see The Revenant, with the graphic bear attack on Leo? I’m going to Mars. There hasn’t been one bear attack on humans in Martian history, let alone a death.
Not convinced? Well, why do you discount my evidence? I’ll give you nine reasons to do so.
#1. Statistics are what you make of them.
If politics and political reporting have taught us anything at all, it’s that statistics are completely dependent upon context. For example, in December of 2016, the US unemployment rate fell to 4.6%, a low not seen since August of 2007, and down considerably since a recent high of 10.0% in October of 2009, near the beginning of Obama’s presidency. Obama is the greatest jobs president in history! If your bias predisposes you to present the job market favorably, this is how to present the data.
However, if your bias inclines you to present the data differently, then you'll point out a different set of facts.
The US Labor Force Participation rate fell to 62.7% in December of 2016. This means that a record number of US working-age, able-bodied adults, well over 95 million, are not even looking for jobs in the current low-wage, poor-conditions economy. The labor participation rate has dropped every year since Obama became president.
And this low work rate has not been seen since the Carter administration, in February of 1978 (the rate was then in the midst of a decades-long rise related to full female participation in the workforce).
This low work rate comes in the face of a record number of Americans living in poverty (42.8 million) and a record number of Americans receiving food stamps (43 million). The jobs Obama did create couldn't even keep pace with US population growth.
Obama is the worst jobs president in history!
Now I told no lies in any of the above data. The same statistics can and often do lead to exact opposite conclusions. But, I selectively chose which statistics to report and discuss. You didn’t care that I did this as long as you were reading the paragraph you wanted to believe. But it bothered you when you read the other paragraph. Which conclusion is “true”? I would say that I’ll let you decide, but you already decided before you even read the data. In politics, the selective presentation of data and its context is called spin.
#2: Selective reporting of data can be used to support almost any belief.
In December of 1891, Carroll Wright, the United States Commissioner of Labor, wrote,
It is almost a daily occurrence that clear, accurate, and most carefully-compiled statistics are used to prove opposite conclusions. This is done by a jugglery of figures, and it is not the fault of the figures themselves.
Clearly nothing has changed. This political example is better than a medical one, because everyone is biased about the issue and therefore will have a predictable and visceral reaction. It is not enough when looking at data to say that you will do so in an unbiased way or that you will just let the data speak for itself and accept its conclusions. This is impossible; none of us can operate without bias. Humans so hate being wrong about what they believe that they will stop at almost nothing to twist, rationalize, or ignore whatever information exists in order to stay consistent with their prior beliefs. Our brains are wonderful at avoiding cognitive dissonance, even if it means fudging or ignoring the facts. What's more, we don't consciously know that we do this (in most cases), so we cannot consciously change the behavior. (We must use systematic methods to identify and remove bias from data interpretation.)
In an editorial appearing in the Chicago Medical Times in 1887, Finley Ellingwood, MD, made these comments while discussing theories of the biological mechanism of sex selection in embryos,
Papers have appeared in a large number of our exchanges during the past year on the determination of sex. The question is largely a theoretical one. Statistics have been used to prove every position taken, and the same statistics have been used to prove opposite positions.
Again, nothing has changed in science either. This search-satisficing bias plays out every day when people search PubMed and selectively find the one article or piece of data that appears to support their beliefs. Data must be presented in the appropriate context, and it must be presented in its entirety. This gives rise to two more important lessons that should always be kept in mind:
#3: Correlation rarely equals causation (and a lack of correlation doesn’t eliminate causation).
Let's say that you can show that more jobs or fewer jobs correlate with the Obama presidency. Does that mean that Obama is the cause? Not at all. Correlation rarely equals causation. Correlation may just be pure coincidence, or perhaps both events are caused by some third event. US unemployment rates in recent years have been strongly correlated with international unemployment rates. Does the US President affect world employment rates? Or perhaps the same international economic factors affect employment rates around the world? Perhaps those same factors affect voters' choice for president?
Here are some fun examples. Did you know that the US Highway Fatality Rate is strongly correlated to the number of lemons imported from Mexico?
Clearly, we should be spending more money on lemon importation rather than on enforcing traffic laws. While we are at it, we should get rid of most fire departments, because there is also a strong correlation between the number of firemen sent to a fire and the amount of damage done by the fire. Also, we need to ban the sale of ice cream on city streets, since the amount of ice cream sold corresponds with the number of serious crimes committed in New York City (shark attacks and drownings are also strongly correlated with ice cream sales). Oh, and grab some chocolate for this one, because the more chocolate you eat, the more likely you are to win a Nobel Prize.
Most people are good at understanding that correlation doesn’t equal causation, but they aren’t good at actually believing it. While everyone has a good chuckle at the above examples, the same folks take seriously studies that show a correlation between the amount of diet soda consumed and weight gain or other such nonsense. American and English people eat a lot of fatty foods and drink a lot of alcohol and have high rates of cardiovascular disease. The French eat a lot of fatty foods and have lower rates of CV disease and Italians drink a lot of alcohol and have lower rates of CV disease. Which of these correlations do you choose to build a health philosophy around? I could go on, but suffice it to say that correlation doesn’t equal causation. In fact, no matter how strong the correlation, there is still no implication of causation. A large amount of evidence published in medical journals is correlative; these studies should do nothing more than encourage someone to design a controlled trial to test causality – but instead, these mostly spurious correlations are adopted into practice. If you enjoy spurious correlations, this website has 30,000 of them.
What’s more, a lack of correlation doesn’t eliminate causation. Correlation simply measures the linear association between two variables, and it doesn’t relate to causation. But remember that two variables may be nonlinearly dependent even though they appear to be uncorrelated using normal tests for correlation.
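This last point is easy to demonstrate. Here is a minimal sketch (my own illustration in Python, not from any study discussed here): a variable that is perfectly determined by another can still show essentially zero linear correlation.

```python
# A minimal sketch: y is completely determined by x, yet the linear
# correlation between them is essentially zero.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)  # symmetric around zero
y = x ** 2                      # perfect (nonlinear) dependence

r, _ = pearsonr(x, y)
print(f"Pearson r = {r:.3f}")   # approximately 0
```

A standard correlation test would call these two variables unrelated, even though knowing x tells you y exactly.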
#4: More than one hypothesis is always satisfied by the data.
Measuring the correlation between two variables is all that we really do in science, and even when we find a strong correlation and make an assumption of causation (probably because we have evaluated the association prospectively and repeated the experiment several times with controls), we still must remember that more than one hypothesis is always satisfied by the data.
Ignorance of this idea is the root of most bad science. Virtually any set of data or any observation can be explained by countless different hypotheses. If you have already decided that your hypothesis is right, and then you find data (any data) that supports it, it is nearly impossible for you to decide that your hypothesis is invalid. Imagine that you come home and see the front door of your home busted open; the burglar alarm is going off and a man wearing a hoodie runs out the back door carrying your jewelry box. You follow the man and discover that he is your neighbor and in his house are your jewelry box and several other valuables. The observed data is all consistent with the hypothesis that he broke into your home and stole your possessions. But are there any other hypotheses that match the observed data? He claims that he heard your alarm go off and went over to investigate since he knew you were not home; when he discovered the front door open, he carried several of your valuables over for safe-keeping until you got home in case the burglar or others might loot your property. This is another hypothesis that fits the observed data. You’ll need additional information to determine which one is most likely.
If you were to design an experiment to test the two hypotheses, you would find a strong correlation for both; that is, you would see a P-value of less than 0.05 for both hypotheses. Remember, the frequentist statistical techniques we use simply estimate the probability of observing data like ours given that a hypothesis is true; they do not give the probability that the hypothesis is true. In this situation, there would be a finding of statistical significance both for the hypothesis that the neighbor is the burglar and for the hypothesis that he is not. Remember this next time you read a paper with a statistically significant finding. Ask yourself what other hypotheses might fit the data as well, keeping in mind that an exactly opposite hypothesis might also give rise to the same observations.
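To see why "the data fits the hypothesis" is not the same as "the hypothesis is probably true," consider a toy Bayes calculation on the burglar story. This is only an illustration in Python, and every probability in it is invented:

```python
# A toy Bayes calculation: P(data | hypothesis) is not P(hypothesis | data).
# All of these probabilities are invented for illustration.
p_guilty = 0.01            # prior: the neighbor is rarely a burglar
p_data_if_guilty = 0.90    # chance of the observed scene if guilty
p_data_if_innocent = 0.20  # chance of the observed scene if innocent

p_data = p_guilty * p_data_if_guilty + (1 - p_guilty) * p_data_if_innocent
p_guilty_given_data = p_guilty * p_data_if_guilty / p_data
print(f"P(guilty | data) = {p_guilty_given_data:.2f}")  # about 0.04
```

The scene was ninety percent likely if he were guilty, yet with a low prior he is still almost certainly innocent. Frequentist significance testing never touches that prior.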
Recall that a Type 1 error is the incorrect rejection of a true null hypothesis (in favor of some other hypothesis), while a Type 2 error is incorrectly retaining a false null hypothesis (and therefore falsely rejecting some other hypothesis). In both cases, the hypothesis being tested is the null hypothesis, not a particular alternative hypothesis. If I do a study to test the effect of magnesium sulfate on the rates of cerebral palsy among neonatal survivors of preterm deliveries, the null hypothesis would be that it has no effect. If I find a statistically significant difference in cerebral palsy among survivors between the two arms of my study, there are three possibilities: a Type 1 error (falsely rejecting the null hypothesis); my alternative hypothesis is true (magnesium sulfate decreases rates of cerebral palsy); or some other unconsidered hypothesis explains the results (maybe magnesium sulfate increases morbidity among the babies most vulnerable to cerebral palsy, resulting in fewer survivors with the disease).
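A quick simulation makes the Type 1 error concrete. This is a minimal sketch (Python; the event rate and sample size are hypothetical, not from any magnesium sulfate trial): even when two treatments are identical, about one trial in twenty will produce a "statistically significant" difference at p < 0.05.

```python
# A minimal sketch of Type 1 errors: two identical "treatments," many
# trials. The event rate and sample size are hypothetical, not taken
# from any magnesium sulfate study.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(42)
n_trials, n_per_arm, event_rate = 2_000, 500, 0.10  # no true effect

false_positives = 0
for _ in range(n_trials):
    a = rng.binomial(n_per_arm, event_rate)  # events, "treatment" arm
    b = rng.binomial(n_per_arm, event_rate)  # events, "control" arm
    _, p = fisher_exact([[a, n_per_arm - a], [b, n_per_arm - b]])
    false_positives += p < 0.05

print(f"'Significant' trials: {false_positives / n_trials:.1%}")  # ~4-5%
```

Roughly one trial in twenty "finds" a difference that isn't there, and nothing in the P-value itself tells you whether yours is one of them. The three possibilities in the paragraph above still stand.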
#5: Don’t accept the alternative hypothesis just because you reject the null hypothesis.
It is this last, third possibility that is too frequently overlooked. Always consider what other alternative hypothesis might explain the data.
#6. Always consider the magnitude of effect.
There are a lot of things that are true that just don’t matter. A LOT. Even if a statistical correlation is valid, and even if we are confident that our theory of causation is true, this doesn’t mean that this “truth” requires action. It is inarguably true that when bears attack humans, there is a great chance of death. It is also true that moving to Mars would eliminate this risk entirely. But of course the idea that we should move to Mars to avoid bears is ludicrous.
But why? Part of the answer is related to the magnitude of effect. The number of lives saved by this ridiculous scheme is inconsequential when put into the context of cost and unintended consequences (which we will get to in a second). In most experiments, when such a small magnitude of effect is observed, it is likely due to chance alone. But I used the bear example to remove that possibility; I am admitting that it's true that moving to Mars will prevent bear maulings. Even though this is true, you don't care, and neither do I.
We need perspective to appreciate the magnitude of effect. As I have mentioned before, we need to think about, talk about, and relate risks and benefits to ourselves, our colleagues, and our patients, in understandable terms.
Consider this: bear attacks are scary; they are what I call boogeymen. Boogeymen have significant emotional and visceral effects, and this tends to bias our perspective. There are about two bear-related injuries per year in the US, but about 15 people are killed every year by dogs and more than 90 by lightning. Bears are definitely more dangerous than dogs, but of course dogs are much more prevalent (and trusted). Remember, you are more likely to die drowning in a kiddie pool in the US than you are to be killed by an assault rifle. But the assault rifle is the boogeyman, not the kiddie pool. We sometimes focus too much on rare boogeymen and too little on the everyday things that we could do better. This gives us another valuable lesson.
#7: Focus on doing common things well before you worry about unusual things if you want to have the biggest impact.
The Coronis Trial looked at the long-term (three-year) consequences of different cesarean delivery techniques and generally found no differences dependent upon the techniques used (this is not to say that there aren't short-term differences). As an example, they found that the rate of hernia (about 0.2% over ten years) appeared no different among women who had sharp versus blunt dissection of the fascia. This type of study is valuable and adds to our current knowledge. I can imagine that if the study had found that one method reduced the risk of hernia from 0.2% to 0.15%, quite a few people would be very dogmatic that their method of fascial entry was superior and should always be used, even though the number needed to treat is 2000. The high number needed to treat should be evidence of the low impact of the intervention and of a significant likelihood of a false positive finding. But if you are really interested in lowering the rates of abdominal hernias, do fewer cesarean deliveries! Most obstetricians could cut their cesarean delivery rate in half if they followed current labor management guidelines; doing so would have the biggest possible impact on their patients' outcomes. Until they do these common things correctly, they really have no right to consider less common practices.
A real example of this type of boogeyman is one-layer versus two-layer closure of the uterus to prevent future uterine rupture. The Coronis Trial found 1 rupture among 1610 subsequent pregnancies in the one-layer group and 2 ruptures among 1624 subsequent pregnancies in the two-layer group. This difference is not statistically significant, but let's imagine that it was: the number needed to treat (NNT) to prevent one uterine rupture by doing only one-layer uterine closures would be at least 1638, and likely much higher with a larger sample size, so we can round to 2000. This unreasonably high number is enough to demonstrate that the difference is not significant, either statistically or clinically; yet you and I both know providers who refuse to do trials of labor in women who have had prior one-layer closures, even though my example from the Coronis Trial showed more ruptures in the two-layer group. Funny how people misunderstand statistics.
If a boogeyman is a rare thing that gets too much attention and concern because it invokes a negative visceral response (bears, sharks, assault rifles, etc.), what about a rare thing that gets too little concern because it seems innocuous? I call these the Veratri.
Veratrum is a genus of beautiful white-flowered plants. Eat one, though, and you'll probably die of a heart attack. Veratri are the opposite of boogeymen: they seem harmless at first glance and are often underestimated. There is no programmed, negative visceral response to white flowers. One example of a veratrum is the use of antibiotics for third- and fourth-degree vaginal lacerations. Currently, the American College of Obstetricians and Gynecologists recommends a single dose of a second-generation cephalosporin at the time of repair for any obstetric injury involving the anal sphincter. This is a relatively new recommendation (after hundreds of years of not doing so) based upon a recent randomized, placebo-controlled trial which showed that only 8% of women who received the antibiotic developed serious wound complications at two weeks, compared to 24% of women who received placebo. In other words, the number needed to treat (NNT) is only 6.3 patients to prevent one serious wound complication.
This low NNT (6.3) reflects both the statistical significance of the data and, more importantly, the clinical significance of giving the antibiotic. Yet many obstetricians discount this recommendation because they rarely see the condition (I have never seen a wound complication from an obstetric anal sphincter injury, but then I've only had 10-15 in my entire career) and because it contradicts decades of practice. If the opposite circumstances had existed, where the former practice had been to give antibiotics but a new study showed that it was not valuable, then these same obstetricians would fight just as hard to keep giving them. I call this cognitive inertia. It's just much easier to keep doing the same thing and justifying your current beliefs than to believe that you've been wrong about something most of your life. Cognitive inertia explains, for example, why obstetricians continue to give magnesium sulfate for tocolysis despite all available scientific evidence saying that it is ineffective for this indication.
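The NNT arithmetic behind both of these examples (the uterine rupture boogeyman above and the antibiotic veratrum) is easy to verify yourself. Here is a minimal sketch in Python, using only the figures quoted in the text:

```python
# NNT = 1 / absolute risk reduction, using the figures from the text.
def nnt(risk_without: float, risk_with: float) -> float:
    """Number needed to treat to prevent one event."""
    return 1 / (risk_without - risk_with)

# Coronis rupture data: 2/1624 (two-layer) vs 1/1610 (one-layer) closures
print(round(nnt(2 / 1624, 1 / 1610)))  # ~1638: clinically trivial

# Sphincter-injury antibiotics: 24% (placebo) vs 8% (antibiotic)
print(round(nnt(0.24, 0.08)))          # ~6: clinically meaningful
```

The two NNTs differ by more than two orders of magnitude, which is exactly the difference between a boogeyman and a veratrum.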
Both boogeymen and veratri are metaphors for the cognitive bias called base rate neglect.
But back to the bears. Let’s say that you’ve looked at all the data about bears and their heinous attacks on humans and you believe that the magnitude of effect of moving to Mars is justifiable given this epidemic; you still must consider the unintended consequences of moving to Mars.
#8. Unintended consequences are just as important as intended consequences.
A lot of true things with meaningful effects still shouldn't be done due to the unintended consequences of the action. A good study will pick outcomes that matter, as I have previously discussed. For example, the point of giving a tocolytic is to make healthier babies, so outcomes should focus almost exclusively on neonatal outcomes, not frequency of contractions or time to delivery. Likewise, cancer screenings like mammography or colonoscopy should focus on total mortality and total morbidity, not sub-outcomes like death from breast cancer or colon cancer. Similarly, interventions like prophylactic oophorectomy or salpingectomy should not be judged just on death from ovarian cancer, but on total mortality and morbidity as well as quality of life years.
But clinical outcomes aren’t the only unintended consequences to focus on. We also must consider cost. This seems unethical to many people who feel uncomfortable even considering withholding an intervention from a person or group of people due to cost; but the truth is, doing so is actually one of the most ethical things a physician can do.
Consider, for example, doing full-body CT scans every 3 months on every person in order to detect cancers and other serious pathologies very early (I can hear the mouths of radiologists across the country watering). A first-year med student can see that the unintended consequences of this are prohibitive. The radiation exposure, for example, would likely lead to more cancers over time than are prevented. What if we use MRI instead to remove this unintended consequence? Next, the savvy student will think of over-diagnosis and over-treatment of incidentalomas. This is an excellent point as well. But fewer students will think of the ethical principle of Justice.
Doing these MRIs would cost at least $4 trillion per year just for the imaging, let alone the costs from incidentalomas, etc. That's more than the entire cost of our current healthcare system. The ethical principle of Justice requires us to spend our money and resources in the most ethical way; in essence, we must get the most bang for our buck. That means that we can and should place a value on human life and on quality of life years. This cost analysis, in turn, must be considered whenever any new drug or intervention comes along for our assessment. Think of it this way: when you order an unnecessary (or low-yield) test on a patient, you might be denying a life-saving vaccine to a child.
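Where does a number like $4 trillion come from? A back-of-the-envelope sketch (Python; the per-scan price and population figure are my assumptions, chosen only to show the shape of the arithmetic):

```python
# Back-of-the-envelope estimate of quarterly full-body MRI screening.
# Both inputs below are assumptions for illustration, not figures
# from the post.
us_population = 320_000_000    # rough 2016 US population
scans_per_person_per_year = 4  # one scan every 3 months
cost_per_mri = 3_000           # hypothetical price per scan, in dollars

total_cost = us_population * scans_per_person_per_year * cost_per_mri
print(f"${total_cost / 1e12:.1f} trillion per year")  # ~$3.8 trillion
```

Even with generous assumptions, the imaging alone rivals the entire current healthcare budget before a single incidentaloma is chased. This leads us to the last lesson, a modification of a previous one: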
#9: Focus money and resources on high-impact tests and interventions before you spend resources on low-yield therapies and diagnostics.
We cannot practice medicine in a responsible way without understanding how to utilize data correctly. Unfortunately, most medical providers continue to pervert data to their own ends, and many fall back on anecdotal medicine because it is more comfortable for them.
Cognitive psychology tells us that the unaided human mind is vulnerable to many fallacies and illusions because of its reliance on its memory for vivid anecdotes rather than systematic statistics. – Steven Pinker
The keyword is unaided. There is a right way to do evidence based medicine, but most people who claim to believe in EBM are not doing it well. They have data dreams but not a systematic approach to data analysis. For all its faults, evidence based medicine is still leaps and bounds ahead of anecdotal medicine.
I’ll provide one simple piece of evidence for this assertion. A new study, just published, sought to build upon previous knowledge that female physicians are more likely than their male counterparts to follow clinical guidelines and evidence based practice. They gathered data to see if this rational female adherence to data had an impact on patient outcomes compared to the irrational male adherence to anecdotes and their own perceived omnipotence. Guess what? Patients treated by female physicians had lower 30-day mortality and lower 30-day readmissions than patients cared for by male physicians. Sorry guys.
For a more formal consideration of these principles, please read How Do I Know If A Study is Valid?
Published Date : December 14, 2016
Categories : Evidence Based Medicine, OB/Gyn
A current controversy in gynecology illustrates several issues with the implementation of evidence-based medicine into surgical procedures. The clinical question is, Should we perform universal cystoscopy at the conclusion of all hysterectomies for benign reasons to detect urological injuries? This would contrast with a strategy of selective cystoscopy after particular high-risk procedures or in particular high-risk patients or situations. To analyze this issue, we need to answer some basic questions first.
What type of injuries can occur?
The bladder can become lacerated, perforated, or bruised. It can suffer a thermal injury or have erosion or placement of mesh into the bladder.
The ureter can become lacerated, transected, or ligated. It can be angulated (kinked). It can be devascularized and/or suffer a thermal injury.
The urethra can become lacerated or perforated. It can be partially or totally obstructed. It can have erosion or placement of mesh into the urethra.
Not all of these injuries are immediately detectable by cystoscopy. Many thermal injuries, for example, do not become apparent for a few days, so immediate cystoscopy would be negative (and might even give a false sense of reassurance should the patient develop worrisome symptoms in the first few postoperative days). Some angulation injuries of the ureter still result in efflux or jetting of urine from the ureteral orifice, but subsequently cause problematic hydronephrosis, or worse if accompanied by postoperative swelling.
There obviously is a huge advantage to early detection of any of these injuries. Most of these injuries, if undetected, will result in at least one additional surgery and often several invasive and expensive tests and procedures. Serious long-term issues like renal failure or vesicovaginal or ureterovaginal fistula may occur. Early detection and correction is clearly advantageous.
What is the actual rate of urological injury during hysterectomy?
An excellent prospective data source for rates of complications during hysterectomy comes from this Finnish study (FINHYST) of 5279 hysterectomies. From this data set we can summarize the rates of major complications of hysterectomy.
One immediate observation is that the major complications rate for vaginal hysterectomy (VH) is substantially lower than the rates of complications for both abdominal hysterectomy (AH) and laparoscopic hysterectomy (LH). In particular, the rate of bladder injury ranges from a low of 0.6% with VH to a high of 1.0% with LH and the rate of ureter injury ranges from a low of 0.04% with VH to a high of 0.3% with both AH and LH.
An immediate response to this data might be that the easier cases were done vaginally rather than by LH or AH; but this common canard is not based in evidence. Gynecologists tend to do hysterectomies the way that they tend to do them. Some, like me, do virtually every case for any indication vaginally, while others do the same utilizing the abdominal or laparoscopic approaches. It is true that an advanced endometriosis case is more likely to be done laparoscopically or abdominally, but that is balanced by the fact that advanced uterine prolapse/procidentia cases are more likely to be done vaginally. Both confer increased risk of surgical complications. The vaginal cases are also, as a rule, more likely to be accompanied by procedures like colpopexies, colporrhaphies, and slings, which should confer an increased risk of these complications. So don't jump to any conclusions.
The literature is consistent in validating the vaginal route as the safest for avoiding complications, and there are likely anatomic reasons for this. I will illustrate the danger of jumping to conclusions that “make sense” with this example. David Nichols and Clyde Randall, widely regarded as two of the greatest vaginal surgeons ever, stated in their opus Vaginal Surgery,
The risk of ureteral injury is greater during a vaginal hysterectomy.
They made this statement because they believed that the ureter was drawn closer to the uterus when it was pulled down during a vaginal case, and rolled away from the uterus when pulled up as during an abdominal case (or a laparoscopic hysterectomy). They believed that the uterine artery “pulled” the ureter down with it. This belief, from the two venerable and venerated surgeons, became dogma in the 1990s, along with lots of other nonsense.
The truth is the exact opposite. This cadaveric video shows that with downward traction (as during a VH), the ureter moves from about 2 cm from the uterine artery to about 4 cm! The intuition of the experts was exactly opposite! What’s more, we can see from the Finnish data above that the rate of ureter injury is about 10 times higher with AH and LH than with VH. The proof, as they say, is in the pudding.
What other axioms have we accepted as dogma? That’s a different subject for a different time, but you likely believe many things wholeheartedly that are exactly wrong.
The Finnish authors also found that urinary retention was independently associated with prolapse, but not the route of hysterectomy (though the prolapse cases were more commonly done vaginally).
How good is cystoscopy at detecting these various injuries?
There are several issues to be considered in order to answer this question. First, many of the complications, particularly cystotomies, are noted at the time that they occur, and therefore universal cystoscopy shouldn't get the credit for identifying those injuries. Even when the injury is "confirmed with cystoscopy," the cystoscopy might have been performed because the surgeon was worried about it (we've all been there). Second, the debate is between universal cystoscopy and selective cystoscopy; in other words, the real question becomes, How many unsuspected injuries are detected when cystoscopy is performed on patients having straightforward hysterectomies for benign indications without additional risky procedures (like midurethral slings, colpopexies, or anterior repairs)?
This question further needs to be answered in the context of the different routes of hysterectomy, since some routes have higher intraoperative detection rates than others (it is easier to identify a bladder injury at the time of VH than at the time of LH, for example) as well as different expected rates of injury. What might be good for an LH may not necessarily apply to a VH. Also, we must sort out how many injuries go undetected despite cystoscopy and how many delayed injuries were due to thermal injury or some other mechanism that might have avoided detection at the time of surgery.
In this study of 2918 hysterectomies from the University of Michigan published in 2016, a retrospective analysis was performed for a series of patients before and after an institutional policy of universal cystoscopy was adopted. Before adoption of universal cystoscopy, 2.6% of patients had a urological injury at the time of hysterectomy. Of these 25 injuries, 7 went undetected at the time of surgery (two of the seven had normal cystoscopies). After adoption of universal cystoscopy, 1.8% of women had an injury. Of these 34 injuries, 2 went undetected at the time of surgery (both had normal cystoscopies).
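It helps to restate those counts as rates. A minimal sketch (Python, using only the numbers quoted above):

```python
# Recompute the delayed-detection rates from the Michigan figures above.
before = {"injuries": 25, "delayed": 7, "injury_rate": 0.026}
after = {"injuries": 34, "delayed": 2, "injury_rate": 0.018}

for label, d in (("Before universal cystoscopy", before),
                 ("After universal cystoscopy", after)):
    delayed_fraction = d["delayed"] / d["injuries"]
    print(f"{label}: {d['injury_rate']:.1%} of patients injured, "
          f"{delayed_fraction:.0%} of injuries detected late")
# Before: 2.6% injured, 28% detected late
# After:  1.8% injured, 6% detected late
```

Both the injury rate and the share of late detections fell, but as discussed below, a retrospective design cannot tell us how much of that improvement cystoscopy itself deserves.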
There are several things to observe about these data before digging deeper. The overall injury rate of 2.1% in this study is about double what we would have expected based on the Finnish data. Also, the complication rate declined over the years of the study. These observations raise two questions. First, how much does having residents and fellows increase the complication rate? Second, how much does the surgeon’s place on the learning curve affect complication rates?
It is difficult to know how much residents and fellows might contribute to the rate of complications. On the one hand, when novices hold the scissors or the knife, we would expect a higher rate of complications; on the other hand, the resident is likely to be more tentative and hesitant, with a watchful attending monitoring every step. So the two factors may cancel each other out. Some studies have tried to examine this by comparing the outcomes of cases performed by residents with attending physicians to those performed by those same attending physicians alone, and the results of these studies are favorable for the residents. This study showed an increased risk of thromboembolism and wound infections associated with resident participation, probably reflecting longer operating times. This piece from the New York Times summarizes some of these issues and some of the literature.
Studies that compare the complication rates of residents to those of their academic attendings assume that academic attending physicians have outcomes as good as those of physicians in private practice. This may not be the case; a good academic attending rarely operates himself (instead letting the residents do the cases), whereas a busy private practitioner may operate almost exclusively by herself. In FINHYST, women who had their VHs in a university setting were about twice as likely to suffer a complication as those who underwent the same procedure in a local or private setting (OR 0.69 for a local and 0.39 for a private hospital). Thus, what's good for an academic institution isn't necessarily good for the real world. In the Michigan study, the average operating time was just under three hours for each hysterectomy; that alone should tell us that these cases were not being done expertly.
We also must consider the learning curve. Over the course of the Michigan study period, more and more robotic-assisted hysterectomies were being performed. We can assume that many of the cases done before universal cystoscopy were performed early in the learning curve of the attending physicians. The decrease in complication rates from 2.6% to 1.8% over the course of the study may partially reflect progression along the learning curve. We know from other studies that the number of procedures performed per year by a surgeon is correlated with the complication rate. We don't have an appreciation of where the surgeons in the Michigan study might have been on their own personal learning curves for these procedures. So the results might be more applicable to low volume surgeons or those early in the learning curve (which defines all residents) than to high volume surgeons.
This systematic review of the impact of surgeon volume on outcomes found that low volume surgeons had 60% more intraoperative complications and 40% more postoperative complications.
Below are the data from the Michigan study about the injuries that were missed at the time of surgery.
Of the seven women with delayed detection of injuries before the implementation of universal cystoscopy, two actually had cystoscopy anyway and the injuries were still missed (or perhaps it was too early to detect the injuries). Four of the remaining five were vesicovaginal fistulae. It is difficult to know whether cystoscopy would have identified correctable injuries in these four cases. Fistulae may result from an undetected cystotomy, which most commonly occurs at the time of abdominal or laparoscopic hysterectomy during dissection of the posterior bladder wall from the anterior surface of the uterus. Indeed, all four of these fistulae occurred during an AH or LH. Increasingly common today, however, is injury related to necrosis and/or devascularization, which presents in a delayed fashion. This can happen when the Bovie is used for bladder dissection at the time of AH or when an energy device is used at the time of LH. Cystoscopy performed at the time of these procedures may have revealed an intact bladder or bruising only, therefore not changing the ultimate outcomes.
The fifth injury was a ureteral obstruction that was treated with balloon dilation. This injury is very likely to have gone undetected at the time of the original surgery, since the ureter likely still effluxed (how much efflux is enough to rule out angulation-type injuries is not yet known).
Thus, it is reasonable to conclude that even with universal cystoscopy, two or three of these injuries would have remained undetected, just as four of the nine delayed injuries weren’t identified even though cystoscopy was performed at the time of surgery. As expected, the lowest rate of delayed injury occurred with VH, where only one injury had delayed diagnosis (even though a cystoscopy was performed). Even after the adoption of universal cystoscopy, most of the injuries were still detected without cystoscopy; and the decline in significant injuries overall, like vesicovaginal fistulae, may represent better surgery performed by surgeons further along their learning curves rather than a benefit of universal cystoscopy. The authors didn’t discuss this possibility, but surgeons dealing with multiple recent cases of vesicovaginal fistulae might learn, through continuous quality improvement efforts, to use energy sources more sparingly near the bladder, improve the dissection around the vaginal cuff, and make sure cuff closure sutures are not too close to the bladder. A retrospective study cannot be used to conclude that cystoscopy is the reason for fewer cases of vesicovaginal fistulae (a prospective, randomized, crossover-type study would be ideal).
What are the costs and consequences of delayed diagnosis?
Clearly there are costs and consequences of delayed diagnosis. A study by Visco et al. previously concluded that when the rate of ureteral injury exceeds 1.5% for AH or 2.0% for VH or LH, cystoscopy becomes cost-effective. Nothing published since that analysis suggests anything different. Data like these are the basis of the current ACOG-recommended practice of selective cystoscopy. The ureteral injury rate in the Michigan data appears to be 0.46% overall, which would suggest that their current practice is not cost-effective.
There are also medicolegal considerations. Undoubtedly some percentage of all urinary tract injuries will result in liability claims, and I am sure that this percentage will be higher among women who had a delayed diagnosis. But by creating a non-evidence-based de facto standard of care (like universal cystoscopy), we may well cause such claims to go up as an unintended consequence. The perception will become that all injuries can be detected at the time of surgery and corrected (even though 44% of the delayed injuries in the Michigan study were not detected by cystoscopy), and this false perception will resonate with juries. Even more exposed will be the outliers who do not or cannot perform routine cystoscopy (in the Michigan study, even with a department-wide mandate, they were unable to achieve 100% performance of cystoscopy).
What other unintended consequences does universal cystoscopy have?
In addition to potentially creating a more negative medicolegal climate around urinary tract injuries, universal cystoscopy will be accompanied by increased cost, increased operative time, over-diagnosis of certain bladder conditions, and over-treatment of false positive findings. There are troves of data that show the negative implications of increased operative time (here is an example) and we don’t have adequate studies to assess the risks of over-diagnosis and over-treatment.
So what do we do?
Here’s what we can and should do:
If you want to learn more about diagnosing and managing urinary tract injuries, read this excellent review for more information.
Published Date : December 13, 2016
Categories : #FourTips, OB/Gyn
“If you do only routine cases eventually even they will become difficult.”
A limiting factor for many during vaginal hysterectomy is the non-descending uterus. Women who have not given birth, for example, often pose a challenge to the vaginal hysterectomist, particularly when there is another complicating factor, like obesity or a large uterus. In general, once the uterosacral ligaments are transected, the case will get dramatically easier. I have previously commented on large trials which found that nulliparity and a lack of descent are not contraindications for vaginal hysterectomy, and indeed the vaginal route results in excellent outcomes in these patients. Nichols and Randall, in their famous book, Vaginal Surgery, go so far as to state:
“Provided the uterus is movable, the less the prolapse, the easier the operations and the anatomic challenges, findings, and solutions of no two will be identical.”
Their point was that a non-prolapse vaginal hysterectomy is accompanied by fewer anatomic variations and distortions that might lead to inadvertent injuries to the bladder or ureters, for example. So here are four more tips.
1. Don’t change your approach to an LAVH.
I recently learned from a colleague what LAVH stands for: lousy at vaginal hysterectomy. While there are certainly patients who may benefit from laparoscopic assistance, like those with adnexal pathology or advanced endometriosis, a lack of descent is not usually aided by this approach (and it may even make things worse). The support of the uterus comes mainly from the uterosacral and cardinal ligaments, and division of these ligaments is not a part of the standard LAVH approach. Dividing the infundibulopelvic or utero-ovarian ligaments rarely makes the difficult portion of a vaginal hysterectomy any easier. Worse, because the patient is usually in yellow-fin style stirrups (rather than candy-cane stirrups), the vaginal portion of the case will be miserable as you struggle with an inadequately positioned patient, poor exposure, and no room for your assistants.
The exception to this rule is when the uterus is not descending due to scarring of the uterus to the anterior abdominal wall, as often happens with a previous cesarean delivery. This finding may be known in advance of the surgery due to a previous diagnostic laparoscopy, ultrasound mapping of adhesions, or the presence of Sheth’s Cervicofundal Sign on exam. Sheth’s Sign is present when traction on the cervix results in a depression of the abdominal wall. These types of adhesions can be dealt with vaginally by advanced surgeons, but, in general, it is easier to cut the uterus off the anterior abdominal wall before proceeding vaginally.
While it is true that the upper pedicles are often a challenge in non-descent cases even after division of the uterosacral ligaments, this issue is obviated by other standard vaginal techniques: use of an energy-sealing device, which extends our safe range for securing pedicles, or hemisection of the uterus and other uterine debulking techniques, which provide more room to work (even in relatively normal-sized but non-descending uteri).
2. Be flexible in making a posterior colpotomy.
Many challenging cases are abandoned too early when the surgeon is unable to make a posterior or anterior colpotomy. Of course, the anterior colpotomy can be delayed until near the end of the case and really shouldn’t be an issue. No good modern technique for vaginal hysterectomy encourages early anterior colpotomy.
But the posterior colpotomy needs to be made (in most cases) to make good progress (and allow division of the uterosacral ligaments). In non-descent cases, the traditional methods of making a posterior colpotomy may not work because of an inability to get instruments in the right positions and a lack of exposure. Don’t get frustrated by this, but instead consider a different approach. Here are two suggestions:
First, if using your standard technique of posterior colpotomy, consider using sharply curved Jorgenson scissors rather than curved Mayo scissors.
The acute angle of the scissors will significantly change the geometry and allow for a much easier entry. This is actually the same reason why these scissors are helpful in making the colpotomy at the time of an abdominal hysterectomy. If the initial incision attempt for the posterior colpotomy is too distal, then the peritoneal reflection may get pushed further and further toward the fundus, making the entry more and more difficult, particularly if there is already a lack of descent.
An alternative approach, such as the Pelosi method of cervicocolpotomy, can also be used. The Pelosi method for posterior colpotomy is not dependent upon uterine descent. In the Pelosi method, a vertical incision is made at 6 o’clock on the cervix and this incision is continued up the posterior wall of the cervix until the incision runs into the reflected peritoneum and a colpotomy is made. Using this technique, virtually any posterior colpotomy can be made with ease. To minimize bleeding, injection with a vasopressin solution should be performed first.
Once the colpotomy is made, the case can almost always be completed vaginally. This colpotomy will allow for transection of the main supporting ligaments and provide artificial uterine descent. Once the uterosacral and cardinal ligaments are divided, the case will usually get dramatically easier.
3. A right angle clamp is the right angle to take for a non-descending uterus.
As long as the uterosacral and cardinal ligaments are attached to the uterus, there won’t be a lot of descent. After making a colpotomy, dividing these ligaments is the next step. In the traditional method (and my normal method), a hysterectomy clamp (either a Heaney or a Glennard type clamp) is placed around the uterosacral ligament at the position of the yellow arrow below, whose tip points at the peritoneal edge of the posterior colpotomy.
This clamp needs to fit snugly around the ligament next to the cervix to avoid increasing the risk of injury to the ureter, which lies within 2 cm of the top of the yellow arrow in the laparoscopic view above. If the clamp is not sufficiently snug towards the cervix, then the tip of the clamp can quickly rotate outwards (laterally) towards the ureter.
If the clamp is rotated cephalad and laterally due to the depth of the vagina, then it may approximate the position of the dotted yellow arrow, bringing it into contact with the ureter. This clamp-geometry problem in a non-descent case can make it very difficult to get the clamp into the proper position. One choice is to take small serial bites with a sharply curved hysterectomy clamp; but in addition to being time consuming, this can still be very difficult when the ligament is several inches up in the vagina.
Another approach is a modification of the Purohit method of clampless vaginal hysterectomy. In this case, dissection of the ligament is accomplished with a right angle clamp and a Bovie after the posterior colpotomy has been made.
Once the vaginal epithelium has been dissected back from over the underlying ligament, the right angle clamp can be inserted along the yellow arrow above, fitting snugly against the cervix (away from the ureter).
The degree of descent is now irrelevant as this technique can be employed even several inches into the vagina. With the clamp slightly opened, a Bovie can be used to divide the tissue between the jaws of the clamp.
The clamp can be sequentially repositioned more anteriorly until the full uterosacral ligament is divided and the insertion of the cardinal ligament is also divided. A normal-size uterus usually descends significantly after this has been done on both sides, and the rest of the case proceeds normally.
4. Make some room.
In rare cases, it may be necessary to cut an episiotomy or a Schuchardt incision. Certainly don’t do this routinely, but if you find that the space between the weighted speculum posteriorly and the anterior vaginal wall retractor is only about 3 cm or less, then this step will usually become necessary for complicated cases.
Bonus tips.
I’ve previously described some other tricks for non-descent cases. The Doderlein-Kronig technique of massaging the uterosacral ligaments at the beginning of the case adds up to 1 cm of uterine descent. I find myself doing this at the beginning of most cases. Positioning the patient in the reverse Trendelenburg position and making sure that the buttocks extend off the bed helps get more depth out of the weighted speculum and enhances exposure significantly. The further in the weighted speculum is positioned, the easier it is to place clamps on the uterosacral ligaments. In fact, the assistant should routinely press the speculum into the vagina while the surgeon places the initial clamps on those ligaments.
Read more at Simplified Vaginal Hysterectomy and Four Tips For Vaginal Hysterectomy.
Published Date : December 7, 2016
Categories : Cognitive Bias, Evidence Based Medicine
Until I know this sure uncertainty,
I’ll entertain the offered fallacy. – William Shakespeare, The Comedy of Errors
My four-year-old daughter asked me to guess which two numbers she was thinking of, and all she would tell me is that the two numbers add up to 10. How do I approach this problem? I guess; but my guess is not entirely uninformed.
First, I would assume that she is talking about integers. I can also assume that she is talking about positive numbers (she probably doesn’t know much about the concept of negative numbers). She is likely thinking of two different numbers (since that’s how a four-year-old would process the phrase two numbers), and I doubt she would consider zero (nothing) as a number. This leaves just four likely options: 1+9, 2+8, 3+7, and 4+6. Knowing that people tend to think of numbers in the middle of a sequence, I picked 4+6 (and was right).
Am I psychic? No. Lucky? No. I simply bet on the most probable solution. Yet 42.5 and -32.5, or even π and 6.8584, could have been the answer. Even among rational numbers, there are an infinite number of solutions. This is how we make any decision. There is a real answer (whatever is in her head), but without her telling me the answer, I need to use all ascertainable data to make my best guess (that is, the most probable solution). When patients present with a problem, there is an actual diagnosis, but they don’t know it, so we must use all ascertainable data to make a best guess (and change that guess to the new most likely solution when we learn new information).
So what did I do when guessing the two numbers? I divided the infinite number of possible solutions into common solutions with a high probability of being correct, uncommon solutions with a low probability, and uncommon solutions that are improbable.
In other words, I made an exhaustive (infinite!) differential diagnosis once I was presented with the chief complaint (“Daddy, guess two numbers…”). In a real clinical encounter, I would then start asking a series of questions to narrow down my differential diagnosis. Note that the questions I ask are based upon the differential diagnosis, so the differential is very important and must be considered before the questions. For example,
With each additional bit of information I gather, I quickly narrow down an infinite list. If I can get answers to those four questions (don’t assume we can always get answers), I have narrowed an infinite number of choices down to just two. There is also the possibility that I cannot get definite answers to each question, and this is more often the case when dealing with clinical medicine. “Does the x-ray show a pneumonia?” The true answer might be, “There is an 80% chance that the x-ray shows a pneumonia.” Nevertheless, I can still use this information to make one diagnosis more likely (pneumonia) and another less likely (lung cancer).
A test is just another type of question. I can ask my daughter if she is talking about two different numbers; she may not know the answer. I could also perform an experiment. For example, I could have her subtract one number from another and tell me whether the answer is equal to zero. In this same way, I order tests in clinical medicine. A CBC or a CT scan is just another form of interrogation.
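For the programmers in the audience, here is a toy sketch of this filtering idea in Python. The hypothesis space and the particular questions are my own inventions for illustration, not anything from the original game:

```python
# A toy illustration of "questions as filters" over a hypothesis space.
# The candidates and questions are assumptions made up for this sketch.

# Hypothesis space: pairs of nonnegative integers that sum to 10.
candidates = [(a, 10 - a) for a in range(6)]
# [(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]

# Each answered question is a predicate; candidates that fail are discarded.
questions = [
    lambda p: p[0] != p[1],           # "Two different numbers?" -> yes
    lambda p: 0 not in p,             # "Is one of them zero?" -> no
    lambda p: all(n > 2 for n in p),  # "Are both bigger than 2?" -> yes
]

for q in questions:
    candidates = [p for p in candidates if q(p)]

print(candidates)  # [(3, 7), (4, 6)]
```

Each answer prunes the list, and prior knowledge (people favor numbers in the middle) breaks the final tie. Bayesian updating works the same way, except that instead of discarding hypotheses outright, it re-weights their probabilities with each new piece of information.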
In clinical medicine, I like to talk about diagnoses in the same three tier system. I call them horses, zebras, and Tasmanian Tigers. When you hear hoofbeats, think horses. It is true: Common things are common and rare things are rare. Our knowledge of what is common (the accessibility bias) saves us time and makes us efficient; and we are usually correct because common things explain most things. But sometimes hoofbeats are zebras, and sometimes they are Tasmanian Tigers.
Tasmanian tigers officially went extinct in 1936. They were marsupials, once found in Tasmania, mainland Australia, and New Guinea, and were hunted to extinction in Tasmania by farmers and others. Yet, every now and again, someone spots an animal in the wild they believe is a Tasmanian tiger. I have seen zebras and I know where to go find them; but I have never seen a Tasmanian tiger and, as far as I know, I never will. Still, I wouldn’t be shocked if someday another one is spotted deep in the wild bushlands of Australia.

The sum of Tasmanian tigers + zebras + horses is equal to all possible diagnoses. Most of those don’t interest me; but as long as something is possible, it is still on the list. We spend most of our time in the green space of horses, and occasionally we venture off into the goldenrod pasture of zebras. Let’s look at a practical example. Let’s consider a few choices for postmenopausal bleeding.
I have arbitrarily (and not accurately) divided this differential diagnosis into three groups based on standard deviations (yes, I know the math isn’t perfect). But the general idea is that about 95% of diagnoses are horses, while about 99.7% of all diagnoses are either horses or zebras. That last 0.3% has a ton of stuff in it, not just the ones I have listed: primary lymphoma of the vagina, hypernephroma, ligneous cervicitis, tamoxifen-affected villous papyraceous, vaginal neurofibromatosis, uterine angiolipoleiomyoma, etc. It even contains so-far undiscovered diagnoses.
We spend most of our lives with horses, and occasionally meet zebras. You could tell the next patient you see with postmenopausal bleeding not to worry, that it is benign, and you would be correct 95% of the time. But some zebras and even Tasmanian tigers are always important to consider (like endometrial, cervical, ovarian, and bladder cancers). So how can we put all this together in a cogent approach to diagnosis?
That’s it. It works. Try it out. We are always giving patients our best guesses; we need to remember that they are all guesses and as such should be subject to revision as new information comes in. That’s what Bayesian updating is, and the process I have described here is Bayesian probabilistic reasoning. It allows me to give the most accurate answer I can to a variety of questions, ranging from my daughter asking me to guess two numbers to a patient asking me to find out why she has postmenopausal bleeding.
Published Date : December 5, 2016
Categories : Cognitive Bias, Evidence Based Medicine, OB/Gyn
Here are some steps of vaginal delivery that have been used in various combinations over the last century:
All of these practices have three things in common:
Most of these practices were introduced into Obstetrics with the best of intentions, often in response and in reaction to bad outcomes that were being observed. They were interventions that “just made sense,” given how the problems were perceived at the time. They were passed on to trainees with one part dogma and one part fear – fear that doing a delivery any other way would have disastrous consequences. Many of these practices persisted for decades after good evidence said they should be abandoned (and some are still used today).
Let’s look at a few examples.
A hundred years ago or so, before the discovery of antibiotics and after the development of aseptic and antiseptic techniques, obstetricians responded to puerperal fever (once the leading cause of death for pregnant women) by loading as many aseptic and antiseptic principles as they could onto women having babies. Enemas were given to prevent later defecation from spoiling the supposedly sterile field achieved with solutions of mercury and Lysol. Sterile drapes, gloves, and technique were prerequisite. Many of the practices, like shaving the women, actually increased the risk of infection. Women who give birth today rarely undergo any type of preparation at all. Infections still happen and many are unavoidable; when they do occur, they are usually easily treated with antibiotics. But in the belief that all peripartum infections were iatrogenic (and therefore preventable), we even went so far as to routinely use only rectal examinations of the cervix (another practice that likely increased cross-contamination rates). None of these interventions were evidence-based except hand-washing and glove-wearing.
When fewer and fewer women were dying of infections (yay gloves!), neonatal and infant morbidities and mortality became the next cause célèbre. At the turn of the 20th century, cerebral palsy and other mental and physical disabilities of children were believed to be almost exclusively caused by birth trauma to the brain or other intrauterine events. This trauma supposedly occurred as the head was repeatedly assaulted by the perineum and vagina while the mother was pushing. So, once again, science came to the rescue in the form of the prophylactic episiotomy (to prevent the head from being bullied and to shorten the second stage of labor) and the prophylactic forceps delivery (with the blades of the forceps pushing out on the walls of the vagina, preserving the fetal head from trauma). Of course, these ideas seem silly today, but good intentions led to generations of children being born this way, all the while causing women and children to have countless severe traumas, perineal lacerations, incontinence, and dyspareunia. The widespread introduction of electronic fetal monitoring a generation later, without any evidence of its efficacy, was the next big push to end cerebral palsy. It did not; it did, however, hasten an incredible rise in both the cesarean delivery rate and obstetric lawsuits.
The incorrectly named meconium aspiration syndrome (MAS) gave rise to decades of other ineffective and potentially harmful interventions. The belief that aspiration of meconium was mechanically leading to pulmonary hypertension in the newborn led to numerous efforts to prevent this aspiration, including amnioinfusion to dilute the concentration of meconium in the amniotic fluid, deep suctioning at the perineum, and intubation and suctioning of meconium below the cords (first in all babies, then in babies who were depressed, and now, finally in 2016, in no babies). All of these practices were, as usual, introduced with no scientific evidence of safety or efficacy; and slowly, as studies were eventually performed that found them ineffective and potentially harmful, they have all been abandoned.
The practice of holding the baby below the level of the placenta and clamping the cord as quickly as possible was originally done because obstetricians realized that not doing so might allow the transfusion of another 100 ml or so of blood into the baby – and who would want that? Of course, over time, the extra 100 ml was understood to not be harmful and perhaps even beneficial (especially in preterm infants), and finally the 7th Edition of the Textbook of Neonatal Resuscitation recommends that all vigorous infants have at least 30-60 seconds of delayed cord clamping, with the baby right up on the mother’s chest! Joseph DeLee would roll over in his grave.
Vaginal birth is finally coming full-circle. Before doctors emerged on the scene, women gave birth in whatever position they wanted to and then they picked their babies up and held them to their chests while the cord slowly stopped pulsating and the placenta delivered. They received fewer, if any, vaginal exams (and when they did, the examiner wasn’t also sticking her hand in other infected women). If they had a small laceration, it wasn’t repaired (a practice currently endorsed by ACOG). When their babies didn’t cry, they warmed them and stimulated them until they (hopefully) started.
Now I am certainly not arguing for a return to the 18th century, where 1 in 8 women died in childbirth and half of all children died by the age of 5. I don’t think in the least that women should be denied the life-saving interventions of modern obstetrics or even just the benefit of a nice epidural. Yet, I can’t help but think that, overall, nature did a pretty good job in designing women to have babies. Birth is a natural process. Our job is to intervene when things go wrong (as they often do), but not try to redesign a natural process. Cesarean delivery is one of the greatest gifts to women and children; but its overuse today is a leading driver of maternal mortality. Too much of a good thing is not a good thing. This cautionary lesson is present in all of the above examples. Wash your hands when you do an exam; don’t use caustic acid to burn off the last remaining epithelial cells from the vagina.
What are the interventions and practices that really make a difference in a routine, normal vaginal delivery? I’m not sure that there are any. I’m just there in case something goes wrong: shoulder dystocia, prolapsed cord, labor dystocia, abruption, fetal distress, neonatal distress, postpartum hemorrhage, amniotic fluid embolism, fever and infections, etc. I have plenty to worry about. But I can worry about those things without reinventing the way women give birth.
Our predecessors were just as sure of the interventions listed above as we are today of the things we do. Which of the practices that we do today will those who come after us look at with equal scorn? I have a few ideas … but I don’t want to bias you.
Published Date : December 4, 2016
Categories : Other Stuff
The cost of US healthcare is growing at an unsustainable rate. Despite measures that were supposed to lower the rate of growth of healthcare costs (like the Affordable Care Act), the rate of growth in 2015 was 5.8%, the highest in 8 years. The total cost of healthcare in 2015 was $3.2T, with the Federal Government spending 29% of those dollars, and state and local governments contributing another 17%. Overall, healthcare costs were 18% of GDP and healthcare costs are growing at a rate significantly faster than the GDP. In 1995, healthcare costs were only 13.1% of GDP.
More important, in 2015 the Federal Government spent $990B on healthcare alone, which was 31% of an all-time record tax revenue of $3.2T (the government spent $3.7T with a deficit of $439B). State governments spent an additional $200B on healthcare, which was about 11% of state revenues. Local governments provided a similar amount of funding. Despite this record spending, there are still about 30 million uninsured people in the US, along with many millions more who have insurance that cannot practically be used due to high deductibles.
If you remember none of the above numbers, remember this: we spent the same amount on healthcare last year as the US treasury collected in a year with record high tax revenue: $3.2T, even with 1 in 11 folks uninsured. While overall healthcare cost growth was 5.8%, the federal Medicaid growth rate was 12.6%. Total federal healthcare expense growth was 8.9% in 2015 and 11% in 2014.
If you favor a governmental single-payer system, then historically high tax revenue would have to at least double to make that happen. If you took every penny from the top 5% of US earners (100% taxation), you would only have $3.1T, still not enough to cover the rising costs of US healthcare. If we assume that a stable cost of about $3.5T would be necessary for universal coverage, and add the other 2015 costs of government to that, the Treasury would need to collect about $6.3T in taxes. This is equivalent to the entire incomes of the top 30% of Americans, and 70% of all dollars earned in America by US citizens (total income was $9T). What about free education? Hah.
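The arithmetic is worth checking for yourself. Here is a back-of-the-envelope version in Python, using the rounded 2015 figures quoted above:

```python
# Back-of-the-envelope check of the single-payer arithmetic above,
# using the rounded 2015 figures from the text (in trillions of dollars).
total_federal_spending = 3.7    # total federal outlays
federal_healthcare = 0.99       # federal healthcare spending
universal_coverage_cost = 3.5   # assumed stable cost of universal coverage

# Everything else the government does, plus the full cost of coverage:
other_government_costs = total_federal_spending - federal_healthcare
required_revenue = other_government_costs + universal_coverage_cost
print(f"Required tax revenue: ${required_revenue:.1f}T")  # ~$6.2T, close to the $6.3T quoted

total_us_income = 9.0           # total income earned by US citizens
print(f"Share of all income: {required_revenue / total_us_income:.0%}")  # ~69%, i.e., about 70%
```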
The point? We need to lower the cost of healthcare. Here are 10 ways:
Bonus tip: Healthcare is not a right, it is a privilege; that doesn’t excuse us from decency and charity. Nothing that can be paid for can be construed as a right. The freedoms of speech and assembly don’t come with a cost to the right-bearer nor to the entity that grants the right. If you haven’t recently, take a look at our Bill of Rights; nothing in there costs a penny. I have the right to own a gun, but the government isn’t obligated to buy me one, nor are any of my fellow-citizens; if I want it, I’ll buy it, and me being unable to afford a gun doesn’t compel you to buy one for me.
There are a lot of things that are hard to live without, but that doesn’t give us a Constitutional right to possess them. Imagine if car ownership were a right: would everyone have a Pinto or a Ferrari? If everyone drove a Pinto, then innovation would be stymied and quality would stay low. With everyone in a Pinto, the true right to car ownership would actually be dampened because choice and aspiration are removed; even if everyone had a Ferrari, some would want minivans and Toyotas instead (or maybe a Rolls).
In any event, the dream of universal Ferrari ownership would bankrupt us all. If this sounds like a false comparison, it is not: most poor Americans would rather have a functioning car with prepaid maintenance and gas than health insurance. Mobility in most parts of the US is key to pursuing the American dream (to getting a job, visiting friends and family, even going to the doctor). The point is, healthcare is a luxury, not a right. We cannot lose track of this fact in our public discourse.
If we choose to define healthcare as a right (a relatively modern idea), then we must constrain it. What are the minimum aspects of the medical industry that must be granted to all? Breast implants? Lasik eye surgery? And what are the maximums? Liver transplants for 90 year olds? Perpetual life support for someone who is brain dead? By robbing people of their choice in healthcare, we rob them of their freedoms. It would be akin to granting you the right to free speech as long as what you say is contained on a certain list (don’t laugh – we live in an age where more and more people believe that speech should be limited). If it seems crazy to you that healthcare shouldn’t be considered a right, then you probably work in healthcare and have a rather biased and myopic view. Just because it is a priority for those of us in the field doesn’t mean it is a priority for others.
The belief that we should work to give healthcare to everyone in need has nothing to do with the belief that healthcare is a fundamental right. Indeed, imagine a world where medical charity was common (right now the tax code doesn’t incentivize it and regulatory hurdles all but prevent it). Such changes are needed so that we can truly fulfill our ethic to serve those in need. But in the meantime, we must face the reality that serious (and painful) reforms must happen, or none of us will be able to benefit from healthcare in the future.
Published Date : December 4, 2016
Categories : Evidence Based Medicine, OB/Gyn
Uncontrolled variables are a huge problem with every scientific study. When we compare two blood pressure medications to one another, the results will have more validity if the two groups of patients are very similar in every characteristic. Differences in age, severity of preexisting disease, ethnicity, gender, etc. could make whatever conclusions are found next to worthless. Randomization and blinding are our best available tools to control for variables – both those we recognize and those we don’t – as well as for controlling for biases that we are aware of and those we haven’t even imagined.
Yet studies comparing one drug to another – fraught as they are with complexity and unknowns – are still incredibly simple when compared to studies that compare surgical techniques to one another. Not only are randomization and blinding major issues, but the individual skill of the operator (which can’t be controlled for) may turn out to be the biggest variable, and there are serious concerns about the general applicability of the findings of any surgical technique study.
Imagine we were comparing the quality of kitchen cabinets built with two different machines, say two different types of wood shapers. We select a variety of outcomes to follow. Some are subjective, like beauty, sales price, and machine operator satisfaction (compare to pain scores, cosmesis ratings, or surgeon satisfaction), while others are objective, like time to manufacture, material wastage, and cost (compare to operating room time, complication rate/blood loss, and cost of procedure). The tricky bit comes next. In order to produce a higher quality study, we would like to track several thousand cabinets of different sizes and styles, and we would like to do so in a short amount of time.
We could find two different factories that already use the two different methods and look at their outcomes, but that’s not very satisfying. There are many hundreds of other important differences in the two factories that might actually be responsible for the different outcomes, and adjusting for even the known variables is an almost impossible task, let alone the unknown variables. So we can either have the same factory use the same methods and then switch to another method, have different factories using a third method switch to new methods, or some other mishmash. What’s worse, individual machine operators within each factory may have vastly different levels of skill and expertise.
A very competent and seasoned machine operator may get excellent results out of both machines, while I (having never used either) may get bollocks from both. Or someone very competent with one machine may struggle adapting to the other, and vice versa. A new method or machine may have a steep learning curve, and learning curves vary widely. Someone naive to both machines may still bring transferable skill from another, similar process with which he has competency. A group of naive operators may produce better results overall with the machine with the shorter learning curve, even though in the long term (after the study has ended) the other machine is vastly superior. What’s more, the folks who design and implement the study may favor one machine over another (perhaps their company makes that machine), and when they teach operators how to use the two machines, they are simply better at teaching their own machine (because they have more experience with it, more familiarity, truly believe in it, etc.).
Such are the problems with surgical technique/tool studies. It should suffice to say that an excellent surgeon is likely to produce better results with the worst technique on most days than a bad surgeon will produce with the very best techniques on the best days; and, in most cases, a surgeon will do much better with a technique with which she is familiar and accomplished than she will with a new technique with a steep learning curve. It turns out, people truly are the most important variable in such studies.
Randomization is difficult. Usually in such a study, different operators who use different techniques are being compared, so it is difficult to truly randomize the technique to the patient. Blinding is nearly impossible; obviously the surgeon knows what method she is using. These pitfalls automatically tend to land studies about surgical technique near the bottom of the quality evidence pile.
Even if these limitations can be overcome, the general applicability of a technique may be wanting. Just because I’m really, really good at something – and demonstrate it with super-awesome reports of my amazingness – doesn’t mean that everyone else can read a paper or watch a video or attend a weekend workshop and all of a sudden share in those amazing outcomes. A technique may be amazing, but if it is not generalizable to a large population of average-skilled surgeons (and assistants), it doesn’t mean a whole lot.
Take this study for example, which found that laparoscopic hysterectomy (LH) was associated with less pain, less need for pain medicine, and a shorter length of stay than vaginal hysterectomy (VH). This study was heavily promoted by the surgical equipment industry, since it purported to show a definitive advantage of LH over VH. The patients were randomized to receive one of the two surgical approaches and the surgeries were performed by the same team, who presented themselves as adept at both approaches. But were they? Are these findings generally applicable?
For starters, they did not use two techniques of VH that are known to decrease postoperative pain (an energy sealing device for sealing pedicles and intraoperative paracervical blockade). But aside from this, the most telling statistics presented in the paper are the average lengths of stay: 1 day for LH and just over 2 days for VH. This is simply an amazing statistic for VH length of stay in an era where same day discharge for VH is common (I personally have sent hundreds of VHs home within 5 hours of surgery). This bizarre finding tells me that the surgeons were simply more skilled with LH than VH.
What was purported as a strength of the study (the same surgeons performing both approaches) is actually a weakness when we realize that they are not equally adept at both techniques. The article reports no conflicts of interest, but a simple Google search reveals that the lead author (Ghezzi) has a financial relationship with Karl Storz GmbH & Co, KG, whose products he endorses in this and other articles. Hmm.
Not all great surgeries are generalizable. Here is Part 1 and Part 2 of an awesome straight-stick, laparoscopic extraperitoneal aortic lymph node dissection. This guy is fantastic, and if I were a woman with cancer I would let him operate on me. But his skill level and seeming ease in doing a complex surgery are not necessarily teachable to average surgeons. Fun to watch, but I don’t expect the average surgeon to be doing this anytime soon. Sometimes the techniques that get published for certain surgeries or a certain series of patients reflect outcomes and complication rates not attainable by us mere mortals. That’s okay, but it reinforces the idea that sometimes the best surgery is the one we can all do well.
The point of all of this is that when it comes to surgery, the surgeon (in most cases) is by far the most important factor in outcome differences. This excellent piece discusses some research that drives home this point (and shows some cool videos). If you are interested in finding out who the “good” surgeons are, by the way, don’t waste too much time looking. With a few noted exceptions, such transparent data simply aren’t available. Because surgeons fight against such transparency, the idea persists that we are all interchangeable cogs in a machine; scientific studies must assume this for the sake of standardization, and employers and payers don’t always recognize the importance of high-quality physicians.
But the skill of the surgeon is likely the single most important variable of any surgery. Are all board-certified OB/Gyns equal? Of course not. There is a wide variation in competencies and outcomes. Cesarean delivery rates range from around 10% to over 60% among obstetricians in similar communities who are supposedly all following the same evidence-based labor management guidelines. Vaginal hysterectomy rates vary from 0% to 95+% among board-certified gynecologists. Are we all equally competent? Hardly.
The surgical skill of surgeons, like most things in life, tends to fall along a bell-shaped curve. If we want to improve the quality of care provided to a wide variety of our patients, there is only so much that we can do in terms of increasing the surgical skill of surgeons. Residency programs are providing fewer opportunities than ever to develop the surgical skills of our future physicians (too much to learn, too little time to do it, and fewer patients who need complex surgeries). Some skills are being lost to history as the techniques and practice of them go extinct with a retiring generation of physicians (e.g., breech delivery, Scanzoni maneuver), while other skills are threatened species that exist only in some zoos and preserves (e.g., vaginal hysterectomy, external cephalic version, forceps delivery).
To increase the quality and safety of surgery, we need to address improving the surgical skills and education of our residents and young physicians in practice. But we also need to focus on enabling technologies.
An enabling technology is an innovation or invention that can be used to enhance the ability of a user. The personal computer is an example of an enabling technology. I can do a lot more things today (make movies, edit photos, write this blog, do calculus, make music, search the world’s libraries, etc.) than those who lived a generation before me; and I can do them better, quicker, and cheaper.
Ted Anderson (Vanderbilt University) has described this concept extensively in the field of gynecology. He points out that global endometrial ablation devices (like the NovaSure) are an example of an enabling technology. Ted is an expert in rollerball endometrial ablation. Yet it is unrealistic that he will be able to train the vast majority of gynecologists to be as good as he is; apart from innate skill, there just aren’t enough cases available for learners to gain sufficient experience. The outcomes of rollerball ablations performed by his trainees are considerably subpar compared to outcomes of his own series of hundreds of rollerball ablations. But with endometrial ablation devices, his learners can achieve similar outcomes. What’s more, the safety of endometrial ablation devices, cost, length of surgery, etc., are all superior. So without having to make surgeons dramatically better, we can extend the safe, quality outcomes of endometrial ablation to millions of women, not just the few thousand who have access to high quality surgeons like Dr. Anderson.
The mid-urethral sling is another example of an enabling technology. Prior to tension-free vaginal tape (TVT) and transobturator tape (TOT) procedures, incontinence surgery required considerably greater surgical skill and was more morbid for patients. Only a small percentage of gynecologists were good at things like retropubic urethropexies (i.e., Burches and MMKs) or pubovaginal slings with harvested autologous materials. I enjoyed doing laparoscopic Burch procedures and always thought it was a fun surgery; but I would much rather teach a resident how to do a TOT. I can teach almost any resident to do a TOT, but a laparoscopic Burch to only a handful. A TOT is therefore an enabling procedure: it greatly expands the reach and safety of incontinence procedures.
Use of an energy-sealing device (like the Ligasure) for vaginal hysterectomy is my favorite enabling technology. The device greatly expands the number of gynecologists who can competently perform vaginal hysterectomy and it also greatly expands the number of women who are candidates for vaginal hysterectomy. Patient outcomes are uniformly better, and even though we have to spend money on the device, total cost goes down due to shorter surgeries, fewer complications, and shorter lengths of stay.
An enabling technology might also be a particular technique for a surgery. An enabling technology doesn’t necessarily provide the best outcome in the best hands, but it provides the best outcome in average hands. There are dozens of similar examples. Yet many surgeons are opposed to using enabling technologies. Imagine if you went to work for a typesetter who refused to let you use your computer and laser printer because “real” typesetters should know how to set lead type by hand or use a Linotype machine. It may make him feel superior to all the ‘hacks out there using Macs,’ but he is providing slower work to fewer clients at higher cost. He will soon find himself without any clients (or students).
We should all embrace enabling technologies. The hallmarks of a great surgeon are not stubbornness and anachronism but rather flexibility and innovation.
Published Date : November 28, 2016
Categories : Cognitive Bias, Evidence Based Medicine, OB/Gyn
Most people, it turns out, aren’t really all that skeptical. The 2016 election has taught us that what passes as ‘news’ isn’t always very true. Fewer and fewer things seem implausible to people who have stopped being surprised by what once was seemingly impossible; and folks don’t distinguish between what’s possible and what’s probable. There are three different issues that independently and sometimes simultaneously lead people to believe untruths. Let me explain.
Our patients (and our colleagues) source information in this way. Don’t be surprised when they present with far-fetched ideas (“my IUD is causing me to have headaches”) because they definitely “researched it” and have it on good authority (here and 124,000 other places). This trending video on Facebook, published by the respectable-sounding healthforallwomen.com, currently has over 12 million views (and counting). Why so many views? Because it is a video about the “health benefits” of not wearing bras (hopefully you can figure out why it is so popular). Among the video’s several claims is that wearing a bra increases the risk of breast cancer, up to 100x! This type of social-media-promulgated drivel is what our patients see and accept at face value. Before you hate on Facebook, realize that a video like this is no less factual than a typical episode of Dr. Oz. This same website also recommends using milk thistle and dandelion root to counteract the oncogenic effects of estrogen. Wow. Does wearing a bra increase the risk of breast cancer? No, it doesn’t; but as of this writing about 12 million more people have watched that video than have read this article.
Often, our patients are most convinced by the experiences and opinions of other patients, found on message boards and chat rooms. These sources of information seem best to them, since other patients appear to answer their common questions honestly and without ulterior motive. But this source of information is often the least informed and least evidence-based. We call it anecdotal evidence, and it is undermined by the patients’ biases, false conclusions, and misassumptions.
So are doctors any better? Not in the least. The lack of robust skepticism is not related to educational attainment. Physicians live in echo chambers. They associate with colleagues who echo back to them their own practice patterns and beliefs. Physicians accept things as true because of science that they don’t really understand, believing that a p-value below 0.05 is as good as gold (especially if the conclusion of the study already agrees with what they thought anyway). Physicians have their own establishment and authority figures: if a study is printed in the New England Journal of Medicine, it is often given a pass as factual, high-quality, and impactful. And physicians crowd source information (e.g., regional standard of care) and ask their colleagues what to do (e.g., curbside consult) rather than analyze a problem using high-quality evidence (when available). The majority of physicians narcissistically prefer anecdotal experiences over systematic evidence.
We all need to step out of the echo chamber. The more firmly you believe something, the more vigilantly you should seek to challenge it. Be skeptical of all sources of information and seek to disprove what you read, like a true scientist. Don’t accept anything just because it makes sense, nor reject anything just because it seems implausible or unconscionable. Seek quality evidence for all your beliefs and theories. Consider the real possibility that you might be wrong about almost everything you believe.
Published Date : November 28, 2016
Categories : Cognitive Bias, Evidence Based Medicine
He uses statistics as a drunken man uses lamp posts – for support rather than for illumination. – Andrew Lang
This all sounds fantastic! But what exactly is accuracy? Recall that:
Accuracy = (True Positives + True Negatives) / (True Positives + False Positives + False Negatives + True Negatives).
In other words, accuracy is the total number of true results divided by the total number of results. So imagine that 1 person among 500 has a particular disease. The test fails to detect the person with disease (FN=1) while falsely identifying 2 people as having the disease (FP=2). The other 497 patients are true negatives (TN=497). There are no true positives. What is the accuracy of this test?
Accuracy = (0+497)/(0+2+1+497) or 497/500 = 99.4%!
So, a test with 99.4% accuracy failed to identify the one person with disease and falsely identified two people as having the disease who did not. If it had correctly identified the one person with the disease, it would have been 99.6% accurate, even though 2 out of 3 positive results were false positives. In neither example does the word accuracy mean what we might think it should mean intuitively.
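To make the arithmetic explicit, here is the same worked example as a few lines of Python (the counts are the hypothetical ones from the scenario above, not real data):

```python
# The worked example above in code: 1 diseased person among 500,
# missed by the test (FN), plus two false alarms (FP).
TP, FP, FN, TN = 0, 2, 1, 497

accuracy = (TP + TN) / (TP + FP + FN + TN)
print(f"Accuracy: {accuracy:.1%}")  # 99.4%, yet it found nobody with disease

# Now give the test credit for catching the one real case instead:
TP, FN = 1, 0
accuracy = (TP + TN) / (TP + FP + FN + TN)
print(f"Accuracy: {accuracy:.1%}")  # 99.6%, yet 2 of 3 positives are false
```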
Advertisers and study authors love the word accuracy. What about the other examples at the conference? Well, a test whose accuracy increases from 96% to 98% could be advertised as having “improved accuracy by 50%” (since it cut inaccuracy in half). The wording of such a claim is, in fact, as ambiguous as the word accuracy itself. The word accuracy is a tool for the dishonest.
Stop using the word accuracy and be leery of those who do.
What about the false positive rate of our imaginary test?
The false positive rate = 1 – specificity, or 1 – TN/(TN+FP). In our hypothetical scenario, the false positive rate is 0.4%. That sounds excellent, and it is indeed accurate: only two people out of 500 had false positives. But that number is clinically useless and doesn’t tell the real story. We don’t order 500 tests at a time; we order one at a time. So when a patient is sitting in front of you with a positive test result, knowing that the false positive rate is 0.4% doesn’t help you tell the patient that her result actually has a 2/3 chance of being falsely positive (nor does the accuracy rate help us understand this).
On the other hand, the positive predictive value (PPV) is helpful. Recall that PPV = TP/(TP+FP), or 1/3 in this case. That number is exactly what you need in order to counsel the patient sitting in front of you with a positive result: she has a 1 in 3 chance that her positive result is a true positive.
The Quad Screen test for antenatal Down syndrome screening is an interesting example. The test is designed to have only a 5% false positive rate. This statistic is repeated over and over again in nearly every description of the test: a 5% false positive rate. Yet the screen is “positive” at any result which gives a ≥ 1 in 270 chance of Down syndrome. About 97.9% of positive results are false positives. Knowing about the 5% false positive rate doesn’t help you in the least bit to interpret a positive test result, nor does it help you counsel a patient who might be interested in the test. The 5% false positive rate means that 1 in 20 women who take the test will have a false positive result; by contrast, about 1 in 950 women will have a true positive result. In other words, for every true positive result, there will be about 47.5 false positives, a positive predictive value of only 2.1%.
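Here is that calculation spelled out (a quick Python sketch using the figures quoted above; the roughly 1-in-950 true positive rate is taken from the text, not derived here):

```python
# PPV versus false positive rate, using the quad screen figures above.
false_positives_per_woman = 1 / 20   # the advertised 5% false positive rate
true_positives_per_woman = 1 / 950   # approximate true positive rate (from the text)

ratio = false_positives_per_woman / true_positives_per_woman
ppv = true_positives_per_woman / (true_positives_per_woman + false_positives_per_woman)

print(f"False positives per true positive: {ratio:.1f}")  # 47.5
print(f"Positive predictive value: {ppv:.1%}")            # 2.1%
```

Knowing the PPV, not the false positive rate, is what lets you tell a patient with a positive screen what her result actually means.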
Stop using the misleading false positive rate and start using positive predictive value instead.
Maybe this would have been a better quote to start with:
Truth does not consist in minute accuracy of detail, but in conveying a right impression. – Henry Alford