July 20, 2014

Big Data, Small Data, Any Data At All

Why is it that I still remember that the formula for the volume of a sphere is (4/3)πr³, which I learned in tenth grade geometry? And why is it that I never even heard of a p-value, the measure commonly used to assess whether a result is “statistically significant,” until I was in medical school? I haven’t had any occasion to compute the volume of a sphere since I took calculus in college, but I have to interpret statistical findings all the time. Something is not right here.

Understanding at least the rudiments of statistics matters—and not just to me, a physician who has to make decisions about how to treat patients by evaluating articles in the medical literature that rely on statistical methodology. Understanding basic statistics matters to everyone. You need to know some statistics to realize that it is more accurate to measure the population by using sampling techniques than by trying to count everyone. You need to know some statistics to understand why Nate Silver, with his FiveThirtyEight website, was so much more on target in his predictions about the 2012 presidential elections than anyone else. And you need to know some statistics to decide, as a patient, how to evaluate the options your physician presents you with.

Just this morning, I read an article in the first section of the NY Times, “Study Discounts Testosterone-Suppressing Therapy for Early Prostate Cancer.” It turns out that millions of men with early stage prostate cancer, mainly men over the age of 65, have been treated with “Androgen Deprivation Therapy” (ADT), either by bilateral orchiectomy (surgical removal of the testes) or by drugs. A new study, published in JAMA Internal Medicine, concludes that ADT in such men does not prolong life. It does cause lots of side effects, ranging from osteoporosis and weight gain to decreased libido and diabetes. The article quotes one expert who was not involved in the study as saying that the findings were “eye-opening and even alarming.” According to editorial writers from the Dana Farber Cancer Institute, the treatment is a good candidate for inclusion in the “Choosing Wisely” campaign, a national effort to eliminate the use of “low value medicine,” that is, treatments that achieve little, given their cost. The article fits in nicely with a major theme of JAMA Internal Medicine, which has a section called “Less is More.” It’s a theme that resonates with me as well: I often argue on this blog that certain treatments, especially when provided to frail, older individuals, may cause more harm than good. Finding that a commonly used treatment, such as ADT in older men, doesn’t do what it promises would not be at all surprising to me. But is it true?

I looked up the article, which isn’t actually in the print issue of the journal yet; it was published in the “online first” section, which gets important articles distributed quickly. The authors looked at data on 66,717 men age 66 or older with localized prostate cancer diagnosed between 1992 and 2009. They defined “primary ADT” as orchiectomy or the use of a drug such as a luteinizing hormone-releasing hormone agonist (a drug that stimulates the pituitary to signal the testes to make testosterone until they run out, at which point testosterone levels fall) as the sole cancer therapy given to men with localized prostate cancer within 6 months of diagnosis. The outcomes they were interested in were cancer-specific mortality (that is, the death rate from prostate cancer) and overall mortality. So far so good.

But since this was not a randomized study in which some men got ADT and others received conservative management (i.e., no treatment unless symptoms develop), with the selection made based on the flip of a coin, there was no reason to believe that the two groups of men would be similar to one another. In fact, they were quite different. The men who got ADT were a good bit older than those who did not (average age 79 vs 77). They were considerably sicker, with higher rates of other diseases such as heart disease or lung disease. And they were far more likely to have “high risk” prostate cancer, based on the characteristics of the cells in their tumors (47.7% vs 23%). Their PSA scores were also much higher (an average of 19.5 in the ADT group compared to 11.1 in the other men, where 4 is the typical cutoff for normal). Simply comparing the outcomes in these two very dissimilar groups of men would not tell the whole story. Somehow, the authors needed to try to compensate for the inherent differences between the men. The only way to do that (other than scrapping this approach entirely and randomizing men to get ADT or some other treatment) is to build a statistical model.
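This kind of confounding is easy to demonstrate. Here is a toy simulation (entirely made-up numbers, not the study’s data) in which sicker men are both more likely to receive ADT and more likely to die, even though ADT itself does nothing:

```python
import random

random.seed(0)

# Toy simulation (made-up numbers, not the study's data): "high risk" men
# are both more likely to receive ADT and more likely to die, even though
# ADT itself does nothing in this model.
def simulate(n=100_000):
    tallies = {"ADT": [0, 0], "conservative": [0, 0]}   # [deaths, total]
    for _ in range(n):
        high_risk = random.random() < 0.3
        p_adt = 0.6 if high_risk else 0.2      # sicker men get ADT more often
        arm = "ADT" if random.random() < p_adt else "conservative"
        p_death = 0.30 if high_risk else 0.10  # identical in both arms
        tallies[arm][0] += random.random() < p_death
        tallies[arm][1] += 1
    return {arm: deaths / total for arm, (deaths, total) in tallies.items()}

rates = simulate()
print(rates)   # raw death rate looks much higher in the ADT group
```

Running this shows a markedly higher raw death rate in the ADT group, purely because of who was selected for treatment. That is exactly why the study’s authors could not simply compare the two groups head to head.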

Build a model the study authors did. The specifics of what they actually did are too complicated to describe here. I’m not sure I fully understand what they did, but it involved a technique called “Instrumental Variable Analysis,” known as IV. Suffice it to say that when they used this approach to try to adjust for all the differences between the groups (only some of which they could specify), they concluded that the 15-year prostate cancer specific survival rate was 85.4% in both groups. And when they used a different method, the Cox multivariate model, they found the mortality rate was 2.4/100 in the ADT group and 1.1/100 in the group treated with conservative management; after attempting to adjust for differences based on what was known about other illnesses, PSA levels, and so on, the men treated with ADT were 1.53 times more likely to die.
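A quick back-of-the-envelope check, using only the two mortality figures quoted above, shows what the adjustment accomplished: the crude ratio between the two rates is much larger than the adjusted hazard ratio of 1.53, the difference being attributed to the groups’ dissimilar starting points.

```python
# Crude comparison using the two Cox-model mortality figures quoted above.
adt_rate = 2.4 / 100     # deaths per 100 in the ADT group
cons_rate = 1.1 / 100    # deaths per 100 under conservative management
crude_ratio = adt_rate / cons_rate
print(f"crude rate ratio: {crude_ratio:.2f}")   # → 2.18
# After statistical adjustment the paper's hazard ratio falls to 1.53:
# a large part of the raw gap is attributed to the groups' differences.
```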

What the reader needs to understand is that the results of the study depend entirely on which model you choose. If you select IV (a choice the authors try hard to defend, but one that some experts consider flawed), you find that ADT and conservative therapy are equivalent. If you select the more conventional approach, you find that ADT is actually worse than watchful waiting. Since neither model predicts that ADT is better than conservative management, perhaps it follows that ADT is just a bad choice for the treatment of early prostate cancer in older men. The right conclusion, I think, is that we don’t actually know what to make of ADT. If we chose yet another model, perhaps we would find that ADT is superior.


Learning about different study designs—which ones you can trust, which ones are merely suggestive and which have to be confirmed using a better, more reliable approach—is what kids should be learning in high school and college. Learning about probability and statistics is what kids should be learning, not trigonometry and solid geometry. Our math curriculum reflects seventeenth century mathematical knowledge (it typically includes elementary algebra, Euclidean geometry, and perhaps calculus, the latter two developed in the fourth century BCE and the seventeenth century CE, respectively).

Today, big data is all the rage and there is a growing enthusiasm for learning how to milk large data sets for useful information. But the reality is that it’s not just big data that’s important and it’s not just important for a small cadre of people. We all need to learn how to make sense of what we read in the newspapers, of what our doctors tell us about different treatments. And to do that, we need to develop basic statistical literacy.

July 14, 2014

Remembering Tacrine

A shiver went down my spine when I looked through the New England Journal of Medicine on November 13, 1986. There, in typically stodgy medicalese, was the report of a study that claimed to dramatically improve the cognition of patients with moderate to severe Alzheimer’s disease. It was what I, as a recently minted geriatrician, had been fervently wishing for but despaired of ever seeing—a glimmer of hope that we would have a medication to treat patients with Alzheimer’s disease. So I was struck in reading this week’s “Feature” article in the BMJ, “Alzheimer’s disease: still a perplexing problem,” by the author’s observation that the few drugs available for treating Alzheimer’s today are all agents that, like the drug I read about in 1986, boost levels of the neurotransmitter acetylcholine, which is vastly decreased in the brains of people with Alzheimer’s. In fact, most of the existing drugs are “anticholinesterases,” successors of the drug reported on in 1986, tetrahydroaminoacridine (ultimately shortened to tacrine), which work by blocking the enzyme that breaks down acetylcholine. The only available drug for Alzheimer’s that isn’t an anticholinesterase is memantine (Namenda), approved by the FDA in 2003, which works on a different neurotransmitter system altogether (it blocks glutamate NMDA receptors). None of these drugs is terribly effective, and NICE, the British agency that advises the National Health Service on coverage for drugs and devices, gives them only the most tepid recommendation. So what, I wondered, ever happened to William Summers, the lead author of the original New England Journal study that created such a stir nearly 28 years ago?

I was familiar with the early history of William Summers and tacrine (THA). The New England Journal of Medicine published an editorial along with the article, heralding the drug as a breakthrough, but the community of Alzheimer’s researchers reacted with venom. They commented on the shortcomings of the study: it only involved 17 patients, the patients served as their own controls, and the conclusions rested on the patients’ performance on a handful of tests of cognition, some of which were unfamiliar or of questionable clinical importance, including a “global assessment of cognition” performed by the examining physician. All valid concerns. Nonetheless, the findings were impressive: the difference between treated patients and untreated patients was statistically significant with a p-value of .0003. That is, if THA actually were ineffective, the chance of the investigators observing what they did was about 3 in 10,000. And the authors didn’t claim to have discovered a miracle drug. They said: “These encouraging initial results suggest that THA may be at least temporarily useful in the long-term palliative treatment of patients with Alzheimer’s Disease. We stress that further observations will be required before a clearer assessment of the role of this agent can be made.”
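For readers unfamiliar with p-values, a toy calculation (my own illustration, not the authors’ actual statistical test) shows what a number like .0003 means. Suppose, in the spirit of a sign test, that a useless drug gives each patient a 50-50 chance of scoring better on drug than on placebo; the hypothetical count of 12 patients, all improving, is chosen only because it yields a probability in the same range as the one reported:

```python
# A toy illustration of what a p-value measures -- NOT the authors'
# actual test. Null hypothesis: the drug is useless, so each patient
# is equally likely to do better on drug or on placebo (a coin flip).
n_patients = 12                 # hypothetical count, chosen for illustration
p_null = 0.5 ** n_patients      # chance all 12 happen to improve on drug
print(f"p = {p_null:.4f}")      # → p = 0.0002
```

A small p-value does not prove the drug works; it only says that, if the drug were useless, results this lopsided would be very unlikely.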

The deeper concern seemed to have nothing to do with the merits—or weaknesses—of the study. Basically, the question was: Who is William Summers? Summers was not part of the established Alzheimer’s research community. He carried out the study on patients in his private practice. He had gone to medical school and completed a residency in internal medicine and psychiatry at Washington University in St. Louis, Missouri and had been on the faculty at the University of Pittsburgh, the University of Southern California and, at the time the initial article came out, at UCLA. But he had no funding, let alone facilities or other forms of support, from either a major research university or a drug company. He had little background in carrying out research. In short, he wasn’t one of the boys.

The National Institute on Aging and the Alzheimer’s Association agreed to jointly sponsor the Tacrine Collaborative Study Group to evaluate the drug more definitively. But in the course of its review of Summers’ research, a routine step that takes place when a researcher applies for a chemical to be considered as an “investigational drug,” the FDA uncovered a variety of sloppy research methods. For the next 3 ½ years, Summers and the FDA were at war until finally, in 1991, the FDA formally exonerated Summers of all substantive charges. UCLA, the medical school where Summers had an appointment as a Research Associate Professor, also conducted an investigation. Reporting for the LA Times, writer Janny Scott quoted the panel as finding “severe shortcomings in the design, execution, and report of Summers’ study” but concluding that “these limitations did not undermine the conclusion that the drug had helped some of the patients studied.”

The Tacrine Collaborative largely confirmed Summers’ early findings, this time studying 215 patients at multiple clinical centers. In 1993, the FDA approved tacrine for use in the treatment of Alzheimer’s disease. Tacrine is no longer on the market, but not because it was found to be ineffective. It was simply superseded by another anticholinesterase that is easier to take (dosing is once a day instead of three times a day) and that has even fewer side effects than tacrine. The new drug, Aricept, was a blockbuster, accounting for $3 billion in sales worldwide in 2010, before it went off patent. Aricept is not a great drug, any more than tacrine was, nor are the other two anticholinesterases on the market, rivastigmine (Exelon) and galantamine (Razadyne). But Aricept, like tacrine, fills a void. It gives patients and families something to do, some drug to take; and it may confer a modest benefit, at least for a short time. It’s no worse than a great many other drugs we prescribe for patients, many of them far more toxic and far pricier.

And Summers? He dropped out of sight.

Summers didn’t entirely drop out of sight, though. PubMed, which keeps tabs on all publications in medical journals, lists 13 articles that Summers has authored since the New England Journal paper in 1986, most in minor journals such as the Journal of Alzheimer’s Disease. His most recent paper, published in 2010, is a study of the effect of a “complex antioxidant blend” on cognitive function. His website explains that he offers “care not found in modern healthcare system,” and that he uses an “innovative mix” of traditional medicine, supplements, and homeopathic remedies. Clearly, William Summers is way out on the fringe of respectability. But was he always on the fringe, or was he driven out of mainstream medicine, deterred from pursuing an innovative research career, because he was not an insider?


I don’t know the answer. And I recognize the need to adhere to high standards in the research arena. All too many published articles report studies that use weak if not downright inadequate methodology. But I can’t help wondering: are funding agencies such as the NIH, as well as private not-for-profit foundations such as the Robert Wood Johnson Foundation, stultifying research by limiting funding to very conventional applicants? I wonder, and I worry.

June 29, 2014

Dollars for Dying

A recent article in the Huffington Post focused attention on an often neglected part of the American health care scene, hospice. Unfortunately, by relying largely on interviews, chiefly with angry and very vocal family members of patients who died while enrolled in hospice, and very little on data, the article distorts reality.

The article: “...Dying became a multibillion dollar industry.”
The reality: Just under 2.5 million Americans die each year, three-quarters of them over age 65. It’s about time we started to spend money on taking care of them competently and compassionately. Hospitals are also a multi-billion dollar industry. Is that bad?

The article: “Many providers are imperiling the health of patients in a drive to boost revenues and enroll more people;” i.e., the problems are due to for-profit hospices.
The reality: 63% of the 5,500 hospices in existence today are for-profit, along with a growing proportion of hospitals and medical practices, but despite several studies comparing the quality of for-profit hospices to that of not-for-profit hospices, it’s been impossible to pinpoint any important areas where ownership status predicts quality. A study by Wachterman et al in JAMA in 2011 showed that the median length of stay was longer in for-profit hospices (20 days vs 16 days), largely reflecting a larger proportion of patients with dementia and a smaller proportion of patients with cancer. But does this demonstrate poor quality? Don’t patients with Alzheimer’s disease deserve to be enrolled in hospice? Isn’t the median length of stay in hospice widely held to be excessively short, with 35% of patients dying within a week of enrollment? And don’t other health care institutions deliberately offer some lucrative services (think hospitals providing a transplant program) in order to subsidize money-losing services (think mental health programs)? What’s so surprising or unfortunate about for-profit hospices having the business savvy to compensate for the low per diem reimbursement (Medicare paid hospices $153/day for home care in 2012, which was supposed to pay for nursing visits, home health aides, social work care, medications, and supplies)? Another study in 2011 reported the results of a national survey of hospice programs and found adherence to National Quality Forum measures of quality to be high, with for-profit hospices doing better in some areas and not-for-profit hospices in others.

The article: Hospice representatives “troll the halls” of hospitals in search of patients.
The reality: Since the seminal SUPPORT study of 1995 documenting the large proportion of hospitalized Americans who die in pain, burdened by invasive and ultimately non-beneficial care, there has been widespread acknowledgment that Americans receive too little palliative care. One study done in nursing homes found that when attending physicians received a note informing them that a patient met the criteria for hospice, the rate of hospice referral soared from 1% of eligible patients to 20%. Are reminders to physicians overly intrusive? Is it overly aggressive to have a hospice representative available at the hospital to discuss the program with patients promptly when called by the attending physician? Or does it facilitate a quick and smooth transition from curative care to comfort care?

That said, anyone who has taken care of patients enrolled in hospice is aware that the quality of hospice care is not uniformly excellent, despite surveys revealing that 75% of families with a relative under hospice care at the time of death rated quality as excellent, compared to only 49% of families whose dying relative was cared for in the hospital. What can we do to further improve the hospice experience?

Three approaches come to mind: regulatory, economic, and educational. The regulatory tack, advocated in the Huffington Post article, is reasonable—if exercised cautiously. Nursing homes were transformed from unsanitary firetraps that warehoused elderly individuals to sterile, medicalized facilities. They are now the most highly regulated American industry; some have claimed they are more tightly regulated than nuclear power plants. In the process, they stopped being homes and became institutions. The majority of hospice care is delivered in the home. The challenge will be to design regulations that promote quality and prevent abuses without destroying the essence of hospice care.

The economic strategy is what MedPAC (the Medicare Payment Advisory Commission) advocates. At MedPAC’s recommendation, the ACA includes a provision allowing Medicare to change its current uniform daily rate to a higher rate for the first few days and the final days a patient is enrolled in hospice, with a lower rate for the intervening days. This will make enrolling patients with dementia less financially attractive, thus ensuring compliance with hospice’s current eligibility criteria, though whether this will improve quality is questionable. 

The educational strategy, really an approach to communication, translates the claim that hospices enroll people who “don’t belong in hospice” into a concern about poor communication. I have watched hospital discharge planners and hospice representatives promote hospice to patients. They describe all the services that hospice will provide: nursing visits, home health aide hours, respite care, bereavement services, prescription medications, and so forth. They typically say less about what hospice does not provide, such as (with the exception of so-called open access hospice) palliative radiation, palliative chemotherapy, or blood transfusions. When I broach hospice care with a patient, I start by determining the patient’s overriding goal of care. Is it to live as long as possible? Is it to focus exclusively on comfort? Or is it somewhere in between—mainly wanting comfort but being willing to put up with certain kinds of unpleasant medical treatments in exchange for living longer? Only once I know what the patient wants—and ideally am satisfied that the family accepts the patient’s perspective—do we talk about the best way to achieve their goals. If hospice is the best way, then I tell them so. If it’s not, we discuss what approach would be most conducive to their goals.

Hospice is far from perfect and my natural tendency is to be suspicious of for-profit health care. But what's wrong with hospice care and how to fix it are not quite so obvious. We need more data and better analysis before we act.




June 23, 2014

Politicians: Keep Out

This past week, 150 Congressmen sent a letter to the Centers for Medicare and Medicaid Services (CMS), urging that it approve reimbursement for lung cancer screening with low-dose CT scans. These legislators are trying to exercise their political muscle in an arena where they have no business intervening. Medicare has a fair, transparent, and extremely thoughtful process for deciding what tests to cover. The attempt to destroy this honest, objective, and time-tested process by injecting political pressure is reprehensible.

The Medicare program is required by statute to limit coverage to tests and services that are “reasonable and necessary” for the treatment of illness.  Most “coverage decisions” continue to be made by local intermediaries, the regional contractors that function as Medicare’s agents throughout the country. But occasionally, for particularly important decisions, Medicare issues a National Coverage Determination which is then binding on all its intermediaries.  A few months ago, CMS was asked to make a decision about paying for lung cancer screening using low-dose CT scans. It has diligently been conducting a thorough, comprehensive assessment.

What Medicare often does is to ask MEDCAC, the Medicare Evidence Development and Coverage Advisory Committee, to collect information about the procedure it is supposed to evaluate and to discuss, publicly, its evaluation of the information. MEDCAC is an independent panel of 100 people, drawn from medicine, industry, science, ethics, public health, economics, and the public, from whom up to 15 people are chosen to address any particular issue that comes up. Medicare asked MEDCAC to review low-dose CT screening, and on April 30 the committee had an all-day meeting. The agenda is available online. The evidence was presented and discussed. The committee voted—each person’s vote is public and each person was asked to explain the rationale for his vote. What the committee concluded was that the evidence supporting the use of low-dose CT scanning to screen for lung cancer in the Medicare population just wasn’t there. The transcript of the entire meeting is online. It runs to 310 pages.

To be precise, the committee members were asked “how confident are you there is adequate evidence to determine if the benefits are greater than the harms” for Medicare enrollees. They could vote from “1” (little confidence) to “5” (high confidence). The average vote was 2.33. But the US Preventive Services Task Force, another independent body of experts, had recently given low-dose CT scanning a “B” grade, recommending that it be used in people ages 55-79 who have a 30 pack-year smoking history and are currently smoking or have quit within the last 15 years. How could MEDCAC vote no and the USPSTF vote yes?

It turns out that the members of the USPSTF didn’t exactly vote yes. They suggested excluding people with “a health problem that substantially limits life expectancy or the ability or willingness to have curative lung surgery.” The reason for this caveat is that screening for lung cancer, like screening tests in general, only makes sense if early detection leads to cure or at least more effective treatment. And the only truly effective treatment for the vast majority of cases of lung cancer in smokers is surgery. Major surgery: removal of all or part of a lung. So the question for Medicare is whether doing major surgery in older people with lung cancer is a good idea. 

The single study on which the USPSTF recommendation was principally based, the National Lung Screening Trial, though it included 52,000 high risk individuals randomized to screening with low-dose CT or screening with an old-fashioned chest x-ray, included relatively few people over 65 (26%), very few people over 70 (9%), and few individuals with other health conditions. So when this study, which by the way was funded by the National Cancer Institute, part of the National Institutes of Health, reported that the death rate from lung cancer was 20% lower in those screened with low-dose CT than in those screened with a conventional x-ray (a reduction from 309/100,000 to 247/100,000 over 6.5 years), its conclusion rested on the 79 excess lung cancer deaths (425 vs 346) in those getting regular x-rays. In other words, 320 people had to be screened to prevent one death. We do not know how many of these 79 deaths were in older people; we do not know how many of these 79 deaths were in people with other serious illnesses such as heart disease or diabetes; and we do not know, for those who survived their lung cancer, how many would go on to die of other illnesses in the near future.
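The arithmetic behind those two headline figures is worth making explicit. Here is a back-of-the-envelope check using only the numbers quoted above, with one stated assumption: each arm of the trial held roughly half of the ~52,000 participants.

```python
# A back-of-the-envelope check of the NLST figures quoted above.
# Assumption: each arm held roughly half of the ~52,000 participants.
xray_deaths, ldct_deaths = 425, 346
per_arm = 52_000 / 2

arr = (xray_deaths - ldct_deaths) / per_arm   # absolute risk reduction
nns = 1 / arr                                 # number needed to screen
print(f"number needed to screen: {nns:.0f}")  # comes out near the 320 cited

rel = (309 - 247) / 309                       # relative reduction in death rate
print(f"relative mortality reduction: {rel:.1%}")   # → 20.1%
```

Note how modest the absolute numbers are: a 20% relative reduction corresponds to roughly one death prevented per several hundred people screened.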

It was based on this kind of analysis that MEDCAC determined there just wasn’t enough evidence to justify ordering Medicare to reimburse for the screening of high risk older individuals with low-dose CT scanning. It didn’t say low-dose CT scanning doesn’t work at all: it said we don’t have enough information about older, sicker patients. It didn’t say low-dose CT scanning won’t work in older people: it said there isn’t a compelling enough reason to mandate reimbursement for this test, and that treating such patients for lung cancer with surgery, the only treatment associated with a high cure rate, might in fact do more harm than good.


The specific reasons that MEDCAC chose to vote no to Medicare reimbursement are not actually terribly important—though I’ve included some to give a sense of the reasons that were invoked, and all the reasons are publicly available. What is most important is that the rigorous, evidence-based process on which the decision was based be honored. Medicare has yet to issue an NCD (National Coverage Determination). It could still be swayed by political pressure, by lobbyists, by emotional personal stories of individuals whose lung cancers were detected by low-dose CT scanning and who believe their survival hinged on this. The 150 lawmakers who are pressuring Medicare are hoping to achieve exactly this end. If Medicare is to remain the excellent insurance program that it in many ways is, it must do what all third party insurers have to do: decide what to cover and what not to cover. And it should make that decision based on the facts, not on the ignorant screed of politicians.