‘No, Not That Evidence’

President Donald Trump announces his decision to pull the U.S. out of the Paris climate agreement, June 1, 2017. (Chip Somodevilla/Getty Images)
The technocratic Left has a data problem

The president of the United States had just cited his work with approval during a Rose Garden speech announcing a major change in American policy, and MIT economist John Reilly was speaking with National Public Radio. “I’m so sorry,” said host Barbara Howard. “Yeah,” Reilly replied.

This was not a triumph but a tragedy, because the president in question was Donald Trump. And the action taken was withdrawal of the United States from the Paris climate agreement.

Trump had cited Reilly’s work correctly, saying: “Even if the Paris Agreement were implemented in full” using Reilly’s economic projections, “. . . it is estimated it would only produce a two-tenths of one degree . . . Celsius reduction in global temperature by the year 2100.” But as Reilly explained on NPR, “All of us here believe the Paris agreement was an important step forward, so, to have our work used as an excuse to withdraw it is exactly the reverse of what we imagined hoping it would do.”

The failure of social science to produce the results that the technocratic Left “imagined hoping it would” has reached epidemic proportions. On many of our nation’s major domestic-policy issues, intense efforts to construct a rigorous scaffolding for a more assertive centralized state have yielded only piles of debris. “Evidence-based policymaking” was supposed to produce well-designed government programs with high returns on investment, justifying in turn further government engagement that would further lift the masses. Instead, it has shown how little we know, and how little our programs accomplish.

The frustration is palpable. But with the social sciences dominated by left-of-center technocrats, the result has been less shooting the messenger and more the messenger committing ritual suicide. Poor Reilly had to write the Wall Street Journal a letter, criticizing its use of his study and noting that the Paris agreement’s real value was higher than two-tenths of one degree, “according to several analyses.” More generally, unhelpful results are stalled or suppressed; when they emerge anyway, they are denigrated; and then, from the ensuing confusion, the researchers announce that in fact their preexisting beliefs were correct all along — all this with a seal of approval reading “peer-reviewed.”

Take health care. As Congress debated repeal or reform of the Affordable Care Act (ACA) last year, Democrats relied heavily on the claim that Republican proposals would cause hundreds of thousands of Americans to die. In the New York Times, columnist Paul Krugman warned that the GOP proposal would cause “vast suffering — including, according to the best estimates, around 200,000 preventable deaths.” Former presidential candidate Hillary Clinton tweeted that “if Republicans pass this bill, they’re the death party.” Striking claims such as these require striking evidence.

Fortunately for the accusers, a study of the highest quality had recently measured Medicaid’s effect on the physical health of recipients. (It was Medicaid coverage that was most at issue in the ACA debate.) The study, known as the Oregon Health Insurance Experiment, monitored the health of thousands of Oregonians who were randomly assigned to receive Medicaid or not. Such a random-assignment model is considered the “gold standard” for measuring the effects of a treatment or policy. The researchers had pre-specified their planned analyses, committing themselves in advance to use the data in certain ways and measure certain outcomes before they knew how the results would look. When their first results in 2011 seemed to show Medicaid working, principal investigator Katherine Baicker of Harvard University’s school of public health exulted to the New York Times: “This is an unbelievable opportunity to actually find out once and for all what expanding public health insurance does.”

Unfortunately for the accusers, when more data became available, the researchers discovered that "Medicaid coverage generated no significant improvements in measured physical health outcomes in the first 2 years." At least at first glance, this would seem to preclude strong claims that any scaling back of Medicaid would cause the death of countless Americans. But that assumes that one gives it a glance.

Krugman’s “best estimates” ignored the Oregon result entirely. Instead, he referenced an article at Vox by a professor and two graduates from Harvard’s public-health school. They in turn were relying on their own study, which had just been published by the Center for American Progress (CAP), the left-of-center think tank led by Hillary Clinton’s former policy director. Clinton’s tweet relied upon the CAP study too. Both Vox and CAP, in turn, relied on a commentary published just one day before the CAP study on the New England Journal of Medicine website by Benjamin Sommers (a professor at the same Harvard school) and colleagues.

“One question experts are commonly asked,” wrote Sommers, “is how the ACA — or its repeal — will affect health and mortality.” With a political debate raging on Capitol Hill, Sommers and the New England Journal of Medicine felt the time was right to interject the view that the mortality consequences would be substantial. He identified three lines of evidence. The first offered “conflicting observational studies on whether lack of insurance is an independent predictor of mortality.” No good. The second, the Oregon study, he characterized as offering “highly imprecise estimates” that were “unable to rule out large mortality increases or decreases.” No good. The third line included three studies that found “significant reductions in mortality in quasi-experimental analyses.” Here was something worth using.

The lead author of all three of those studies, incidentally, was Sommers.

Baicker did not rise in defense of her Oregon findings, which she had earlier said would show “once and for all” what public health insurance does. In fact, she was a co-author of the Sommers commentary, dismissing Oregon as “not well suited to evaluating mortality.”

The pattern of rejecting results even when the study design has been established in advance is pervasive. Sometimes the researchers stand by their conclusions, but this can prompt aggressive sabotage by the “peers” responsible for “review.”

Tennessee, for instance, provided conditions for an experiment similar to Oregon’s, this time focused on state-funded pre-kindergarten. Thousands of young children were assigned randomly to receive or not receive access to the program, and researchers from Vanderbilt University followed their subsequent academic performance. In 2015, the release of initial results garnered widespread attention for showing that by the end of third grade, the students who had attended pre-K were performing worse in school.

Critics called the methodology “flawed,” and one even argued that random-assignment experiments may be unreliable because those not assigned to receive free pre-K could get angry and work extra hard to help their kids catch up with the pre-K group. A common theory held that the Tennessee program was especially low-quality. But as the study’s leaders, Dale Farran and Mark Lipsey, carefully showed, the Tennessee program was not any lower-quality than others touted as better models.

When the final, peer-reviewed study appeared this year and confirmed the earlier findings, Farran and Lipsey also posted a brief note describing an experience similar to what had happened in Oregon. While their earliest, seemingly positive findings had been “relished,” the subsequent, negative ones “were not welcome. So much so,” they continued,

that it has been difficult to get the results published. Our first attempt was reviewed by pre-k advocates who had disparaged our findings when they first came out in a working paper — we know that because their reviews repeated word-for-word criticisms made in their prior blogs and commentary. . . . It is, of course, understandable that people are skeptical of results that do not confirm the prevailing wisdom, but the vitriol with which our work has been greeted is beyond mere scientific concern. 

Back in the Pacific Northwest, meanwhile, researchers were watching Seattle increase its minimum wage toward $15 per hour. When the Seattle city council adopted the policy in 2014, it also issued a public request for proposals to evaluate the policy’s effects. The city selected a proposal from the University of Washington and contracted with the team there to conduct a multi-year study. An initial report in 2016 found mixed results from the establishment of an $11-per-hour minimum; it met with limited reaction.

The picture looked different in 2017, though, once the minimum reached $13. When the team shared with the city preliminary results indicating that the policy was reducing hours worked for low-wage earners by nearly 10 percent and costing them $125 per month, the mayor sprang into action. No, he didn’t seek to revise the policy. He contacted a team at the University of California, Berkeley, known for publishing studies that favor high minimum wages and asked them to release a report just days before the UW team’s was set to appear.

Michael Reich, a professor of economics at Berkeley, was glad to help. As Seattle Weekly later documented with emails obtained via a public-disclosure request, Reich “stayed in close touch with mayoral staff, as well as a think tank that advocates for a $15 minimum wage nationwide and a New York public relations agency, to craft the messaging and timing of the study’s release.” The Berkeley report, endorsing the city’s position, appeared on June 20. When the UW report followed on June 26, Reich was ready with a city-requested memo criticizing it. The city also cut off funding to the UW team.

Federal agencies, deeply invested in the validation of their own programs, employ a simpler coping mechanism when bad news arrives: stall and bury. The U.S. Department of Labor contracted with Mathematica Policy Research in 2008 to conduct a “gold standard” evaluation of federal job-training programs. A 2016 report with initial findings on effectiveness, dated May 30, was not released until November 8 (Election Day). The results showed no increase in earnings or employment for participants more than a year after training began. Longer-term findings, promised in 2017, have yet to be released.

The Department of Health and Human Services conducted a random-assignment study of Head Start that followed children who had enrolled in 2002 through their third-grade year in 2008. The final results were not published until 2012 — with a report dated October but released the Friday before Christmas. The conclusion: “By the end of 3rd grade there were very few impacts found for either cohort in any of the four domains of cognitive, social-emotional, health and parenting practices. The few impacts that were found did not show a clear pattern of favorable or unfavorable impacts for children.”

It would be a mistake to draw from any of these studies the conclusion that the programs in question are failures and should be discontinued. In many cases the findings are simply statistically insignificant, meaning that the results were not clear-cut enough to support confidence in any conclusion. In some, the findings were mixed or conflicted with those of previous studies. These answers are best understood as “It depends” or “We don’t know.”

Does Medicaid work? It depends on who the recipient is and what counts as working. Coverage for pregnant women and children does seem to offer significant benefits. And even in the Oregon study, where physical health did not improve, mental health did. On one hand, recipients’ financial strain declined. On the other hand, a follow-up analysis showed that recipients were getting only 20 to 40 cents of value per dollar spent. And there are still other studies, some of which found that insurance coverage reduced mortality, or did in some states but not others, or did not. The UW and Berkeley teams actually agreed that a higher minimum wage showed little effect on the restaurant industry — ideally, that would prompt further investigation into why.

But neither technocrats nor demagogues have time for confusion, ambiguity, and humility. In the evidence-based-policymaking framework, results are supposed to dictate policy and expose all who stand in the way as heartless and dogmatic. How can anyone have a debate on the appropriate role of government, the importance of insurance in health care, or the competence of a federal agency to allocate resources nationwide without citing an analysis of exactly how many bodies the other side will strew in the streets?

So somebody must write the commentary that dismisses most studies of health insurance and anoints one — the one that suggests the highest body count — as the “best estimate” for Paul Krugman’s purposes. Writing in the New York Times after publication of the initial Tennessee results, Berkeley professor David L. Kirp acknowledged that “skeptical researchers” were claiming that “preschool provides only short-term educational assistance that fades out after a few years.” But, he assured readers, more-positive findings from an Oklahoma study (without random assignment) “should put that contention to rest.” Not “raise interesting questions” or “show the issue to be a complicated one,” but put that contention to rest. Nothing to see here, folks. Back to your regularly scheduled demands for universal pre-K.

Some check is needed on the impulse to slice and dice whatever results the research might yield into whatever conclusion the research community “imagined hoping” it would reach. In theory, peer review should do just that. But in this respect, the leftward lean of the ivory tower is as problematic for its distortion of the knowledge that feeds public-policy debates as it is for its suffocating effect on students and the broader culture. Peer review changes from feature to bug when the peers form an echo chamber of like-minded individuals pursuing the same ends. Academic journals become talking-points memos when they time the publication of unreviewed commentaries for maximum impact on political debates.

Alternatively, the media might play the role of auditor. But this assumes a level of sophistication and diligence on the part of reporters that is not often in evidence. When dealing with peer-reviewed content, they will expressly disavow any obligation to do more than repeat what they consider wisdom from on high. Even unreviewed analysis gets little scrutiny.

In Seattle, for instance, the city council’s bad behavior might be expected. But the New York Times covered the debacle as if nothing were amiss. “The first study” in its reporting was Berkeley’s; “the second study” was UW’s. Berkeley’s sham was consistent with past minimum-wage studies, while the UW result appeared an outlier. “Experts on the minimum wage questioned the methods of the University of Washington researchers.” The Times provided no indication that the UW team was employing a predefined methodology chosen by the city and evinced no curiosity about why this alternative Berkeley analysis had suddenly materialized. The following week, the Times editorial board applauded UW’s critics for being “data-driven, not ideological.”

In defense of the social scientists, it must be hard to think straight when the cognitive dissonance is so deafening. They know that their policies are right, and their studies should validate this, yet the data just won’t cooperate. So the claims take on the character of “fake but accurate” reporting, or of the dirty cop who plants a murder weapon on someone he’s sure is guilty. The problem, as these examples indicate, is not just that it’s wrong, but that such behavior tends to catch up with you. And when that happens, it doesn’t matter if you were right, because you and your methods and your organization and your cause are all discredited. That’s obviously a danger for the technocratic Left’s policy agenda, but it’s a danger for everyone else too.

Good data are critical to good policymaking, and we invest enormous public resources each year hiring researchers to develop them. We ask those researchers to police themselves, expecting that they will recognize that their credibility rests on their objectivity and their commitment to upholding the highest standards in their methods, regardless of what answers emerge. When they ignore those responsibilities for the sake of grabbing immediate “wins,” the eventual result is lose-lose: Academics, deservedly, lose society’s trust, and society loses a crucial source of credible expertise that might otherwise help government to operate effectively and improve people’s lives.