Why is it that I still remember that the formula for the volume of a sphere is (4/3)πr³, which I learned in tenth grade geometry? And why is it that I never even heard of a p-value, the measure commonly used to assess whether a result is “statistically significant,” until I was in medical school? I haven’t had any occasion to compute the volume of a sphere since I took calculus in college, but I have to interpret statistical findings all the time. Something is not right here.
Understanding at least the rudiments of statistics matters—and not just to me, a physician who has to make decisions about how to treat patients by evaluating articles in the medical literature that rely on statistical methodology. Understanding basic statistics matters to everyone. You need to know some statistics to realize that it is more accurate to measure the population by using sampling techniques than by trying to count everyone. You need to know some statistics to understand why Nate Silver, with his FiveThirtyEight website, was so much more on target in his predictions about the 2012 presidential elections than anyone else. And you need to know some statistics to decide, as a patient, how to evaluate the options your physician presents you with.
Just this morning, I read an article in the first section of the NY Times, “Study Discounts Testosterone-Suppressing Therapy for Early Prostate Cancer.” It turns out that millions of men with early stage prostate cancer, mainly men over the age of 65, have been treated with “Androgen Deprivation Therapy” (ADT), either by bilateral orchiectomy (surgical removal of the testes) or by drugs. A new study, published in JAMA Internal Medicine, concludes that ADT in such men does not prolong life. It does cause lots of side effects, ranging from osteoporosis to weight gain, to decreased libido, to diabetes. The article quotes one expert who was not involved in the study as saying that the findings were “eye-opening and even alarming.” According to editorial writers from the Dana Farber Cancer Institute, the treatment is a good candidate for inclusion in the “Choosing Wisely” campaign, a national effort to eliminate the use of “low value medicine,” that is, treatments that achieve little, given their cost. The article fits in nicely with a major theme of JAMA Internal Medicine, which has a section called “Less is More.” It’s a theme that resonates with me as well: I often argue on this blog that certain treatments, especially when provided to frail, older individuals, may cause more harm than good. Finding that a commonly used treatment, such as ADT in older men, doesn’t do what it promises would not be at all surprising to me. But is it true?
I looked up the article, which isn’t actually in the print issue of the journal yet; it was published in the “online first” section, which gets important articles distributed quickly. The authors looked at data on 66,717 men age 66 or older with localized prostate cancer diagnosed between 1992 and 2009. They defined “primary ADT” as orchiectomy or the use of a drug such as a luteinizing hormone-releasing hormone agonist (a drug that stimulates the pituitary to signal the testes to make testosterone until they run out, at which point testosterone levels fall) as the sole cancer therapy given to men with localized prostate cancer within 6 months of diagnosis. The outcomes they were interested in were cancer-specific mortality (that is, the death rate from prostate cancer) and overall mortality. So far so good.
But since this was not a randomized study in which some men got ADT and others received conservative management (i.e., no treatment unless symptoms develop), with the selection made based on the flip of a coin, there was no reason to believe that the two groups of men would be similar to one another. In fact, they were quite different. The men who got ADT were a good bit older than those who did not (average age 79 vs 77). They were considerably sicker, with higher rates of other diseases such as heart disease or lung disease. And they were far more likely to have “high risk” prostate cancer, based on the characteristics of the cells in their tumors (47.7% vs 23%). Their PSA scores were also much higher (an average of 19.5 in the ADT group compared to 11.1 in the other men, where 4 is the typical cutoff for normal). Simply comparing the outcomes in these two very dissimilar groups of men would not tell the whole story. Somehow, the authors needed to try to compensate for the inherent differences between the men. The only way to do that (other than scrapping this approach entirely and randomizing men to get ADT or some other treatment) is to build a statistical model.
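To see why a straight comparison of two dissimilar groups can mislead, here is a small simulation sketch. Every number in it is invented for illustration and none comes from the study: I assume 30% of men have high-risk disease, that high-risk men are more likely to be given the treatment, and that the treatment itself has no effect on survival at all.

```python
import random

def simulate(n=100_000, seed=0):
    """Simulate men whose treatment depends on risk but whose survival does not depend on treatment."""
    rng = random.Random(seed)
    # counts[(high_risk, treated)] = [deaths, men]
    counts = {(h, t): [0, 0] for h in (False, True) for t in (False, True)}
    for _ in range(n):
        high_risk = rng.random() < 0.30                          # assumed share of high-risk men
        treated = rng.random() < (0.60 if high_risk else 0.20)   # sicker men get treated more often
        died = rng.random() < (0.30 if high_risk else 0.10)      # death depends on risk only
        cell = counts[(high_risk, treated)]
        cell[0] += died
        cell[1] += 1
    return counts

def death_rate(counts, treated, high_risk=None):
    """Death rate among treated or untreated men, optionally within one risk group."""
    cells = [v for (h, t), v in counts.items()
             if t == treated and (high_risk is None or h == high_risk)]
    return sum(c[0] for c in cells) / sum(c[1] for c in cells)

counts = simulate()
crude_gap = death_rate(counts, True) - death_rate(counts, False)
high_gap = death_rate(counts, True, True) - death_rate(counts, False, True)
low_gap = death_rate(counts, True, False) - death_rate(counts, False, False)

print(f"crude gap (treated - untreated): {crude_gap:+.3f}")
print(f"gap among high-risk men:         {high_gap:+.3f}")
print(f"gap among low-risk men:          {low_gap:+.3f}")
```

In this made-up world the treated men die more often overall, simply because more of them were high risk to begin with. Once you compare like with like (high-risk men with high-risk men, low-risk with low-risk), the gap all but disappears. Real adjustment is much harder, because the study authors could measure only some of the ways the groups differed.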
Build a model the study authors did. The specifics are too complicated to describe here, and I’m not sure I fully understand them, but the approach involved a technique called “Instrumental Variable Analysis,” known as IV. Suffice it to say that when they used this approach to try to adjust for all the differences between the groups (only some of which they could specify), they concluded that the 15-year prostate cancer-specific survival rate was 85.4% in both groups. When they used a different method, the Cox multivariate model, they found that the mortality rate was 2.4/100 in the ADT group and 1.1/100 in the group treated with conservative management. After attempting to adjust for differences based on what was known about other illnesses, PSA levels, and so on, the group treated with ADT was 1.53 times more likely to die.
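A quick bit of arithmetic shows how much work the adjustment is doing. The sketch below takes the two unadjusted mortality rates quoted above (2.4/100 and 1.1/100) and compares their ratio with the adjusted figure of 1.53. The variable names are mine, and setting a crude rate ratio next to an adjusted ratio from a Cox model is only a rough comparison, not something the authors report.

```python
# Figures quoted in the text; the variable names are illustrative.
adt_rate = 2.4            # deaths per 100 in the ADT group
conservative_rate = 1.1   # deaths per 100 in the conservative-management group
adjusted_ratio = 1.53     # ratio reported after statistical adjustment

# Ratio before any adjustment for age, other illnesses, PSA, and so on.
crude_ratio = adt_rate / conservative_rate

print(f"crude ratio:    {crude_ratio:.2f}")
print(f"adjusted ratio: {adjusted_ratio:.2f}")
```

Before adjustment, men on ADT look a little more than twice as likely to die; after adjustment the figure drops to 1.53. The estimate moves a long way depending on what the model accounts for, which is worth keeping in mind when deciding how much weight to put on any single number.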
What the reader needs to understand is that the results of the study depend entirely on which model you choose. If you select IV, which the authors try hard to present as an excellent choice but which some experts consider a flawed approach, you find that ADT and conservative therapy are equivalent. If you select the more conventional approach, you find that ADT is actually worse than watchful waiting. Since neither model predicts that ADT is better than conservative management, perhaps it follows that ADT is just a bad choice for the treatment of early prostate cancer in older men. The right conclusion, I think, is that we don’t actually know what to make of ADT. If we chose yet another model, perhaps we would find that ADT is superior.
Learning about different study designs (which ones you can trust, which ones are merely suggestive, and which have to be confirmed using a better, more reliable approach) is what kids should be learning in high school and college. Learning about probability and statistics is what kids should be learning, not trigonometry and solid geometry. Our math curriculum reflects seventeenth-century mathematical knowledge: it typically includes elementary algebra, along with Euclidean geometry and perhaps calculus, which date from the fourth century BCE and the seventeenth century, respectively.
Today, big data is all the rage and there is a growing enthusiasm for learning how to milk large data sets for useful information. But the reality is that it’s not just big data that’s important and it’s not just important for a small cadre of people. We all need to learn how to make sense of what we read in the newspapers, of what our doctors tell us about different treatments. And to do that, we need to develop basic statistical literacy.