Monday, December 28, 2020

A Dose of Skepticism about Randomized Control Trials

When I was being socialized into economics, it was common for professors to say something like, "Economics cannot carry out experiments like the natural sciences, and thus we must turn to other standards of evidence." But the last couple of decades have shown that economists can indeed carry out experiments in the form of randomized control trials: that is, an experiment in which one group of participants, selected at random, gets a "treatment" of some kind and the other "control" group does not.
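To make the basic mechanics concrete, here is a minimal sketch with made-up numbers. Participants are split into treatment and control groups at random (a shuffle), and the treatment effect is estimated as the difference in average outcomes between the two groups. The function names, the baseline "income" distribution, the noise level, and the assumed effect of 5 are all hypothetical illustrations, not drawn from any actual study.

```python
import random
import statistics


def assign_at_random(participants, rng):
    """Shuffle participants and split them into two equal-sized groups."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (treatment, control)


def difference_in_means(treated_outcomes, control_outcomes):
    """Estimate the average treatment effect as mean(treated) - mean(control)."""
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)


def simulate_trial(n=1000, true_effect=5.0, seed=0):
    """Run one hypothetical trial and return the estimated treatment effect."""
    rng = random.Random(seed)
    # Hypothetical baseline outcome (say, monthly income) for each participant.
    baseline = {i: rng.gauss(100.0, 20.0) for i in range(n)}
    treatment, control = assign_at_random(baseline.keys(), rng)
    # Assumed model: treatment adds a fixed effect; both groups get random noise.
    treated_outcomes = [baseline[i] + true_effect + rng.gauss(0, 10) for i in treatment]
    control_outcomes = [baseline[i] + rng.gauss(0, 10) for i in control]
    return difference_in_means(treated_outcomes, control_outcomes)


if __name__ == "__main__":
    print(f"Estimated effect: {simulate_trial():.2f} (assumed true effect: 5.0)")
```

Because assignment is random, the two groups differ only by chance, so across many repetitions the difference in means centers on the assumed effect. Any single run, though, is off by some random error, which is one of the points the critics press.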

Indeed, the 2019 Nobel Prize in economics was awarded to Abhijit Banerjee, Esther Duflo, and Michael Kremer “for their experimental approach to alleviating global poverty.” Going back a little farther, Vernon L. Smith received a share of the 2002 Nobel Prize "for having established laboratory experiments as a tool in empirical economic analysis, especially in the study of alternative market mechanisms."

But as with all seismic shifts in what one group of economists does, some other economists push back. A collection of essays by prominent economists, edited by Florent Bédécarrats, Isabelle Guérin, and François Roubaud, has just been published by Oxford University Press: Randomized Control Trials in the Field of Development: A Critical Perspective. Broadly speaking, the concerns are that the evidence from such trials can often be less compelling than their advocates claim, and that such trials may in some cases be more productive as a format for producing publishable economic research than for gaining a deeper understanding of the challenges of economic development. Here, I'll give a quick sample of some main points made in the first few essays, while recommending the book itself for additional discussion.

The first essay, by Angus Deaton, is entitled "Randomization in the Tropics Revisited: A Theme and Eleven Variations." A sizeable portion of Deaton's argument focuses on treating randomized control trials as just another form of empirical study, with no presumption that they are inherently better or worse. For example, Deaton writes (I draw here on the version of the essay published as NBER Working Paper No. 27600, issued in July 2020, footnotes omitted):
The RCT is a useful tool, but I think that it is a mistake to put method ahead of substance. I have written papers using RCTs. Like other methods of investigation, they are often useful, and, like other methods, they have dangers and drawbacks. Methodological prejudice can only tie our hands. Context is always important, and we must adapt our methods to the problem at hand. It is not true that an RCT, when feasible, will always do better than an observational study. This should not be controversial, but my reading of the rhetoric in the literature suggests that the following statements might still make some uncomfortable, particularly the second: (a) RCTs are affected by the same problems of inference and estimation that economists have faced using other methods, as well as by some that are peculiarly their own, and (b) no RCT can ever legitimately claim to have established causality. My theme is that RCTs have no special status, they have no exemption from the problems of inference that econometricians have always wrestled with, and there is nothing that they, and only they, can accomplish. Just as none of the strengths of RCTs are possessed by RCTs alone, none of their weaknesses are theirs alone, and I shall take pains to emphasize those facts. There is no gold standard. There are good studies and bad studies, and that is all.
However, Deaton also argues that randomized control trials have a particular disadvantage: they use people as experimental subjects.
Yet some of the development RCTs seem to pose challenges to the most basic rules. How is informed consent handled when people do not even know they are part of an experiment? Is it OK to run experiments that might change the results of an election? Beneficence is one of the basic requirements of experimentation on human subjects. But beneficence for whom? Foreign experimenters or even local government officials are often poor judges of what people want. Thinking you know what is good for other people is not an appropriate basis for beneficence.  ...

My main concern is broader. Even in the US, nearly all RCTs on the welfare system are RCTs done by better-heeled, better-educated and paler people on lower income, less-educated and darker people. My reading of the literature is that a large majority of American experiments were not done in the interests of the poor people who were their subjects, but in the interests of rich people (or at least taxpayers or their representatives) who had accepted, sometimes reluctantly, an obligation to prevent the worst of poverty, and wanted to minimize the cost of doing so. That is bad enough, but at least the domestic poor get to vote, and are part of the society in which taxpayers live and welfare operates, so that there is a feedback from them to their benefactors. Not so in economic development, where those being aided have no influence over the donors. Some of the RCTs done by western economists on extremely poor people in India, and that were vetted by American institutional review boards, appear unethical, and likely could not have been done on American subjects. It is particularly worrying if the research addresses questions in economics that appear to have no potential benefit for the subjects. Using poor people to build a professional CV should not be accepted. Institutional review boards in the US have special protection for prisoners, whose autonomy is compromised; there appears to be no similar protection for some of the poorest people in the world. There is an uncomfortable parallel here with the debates about pharmaceutical companies testing drugs in Africa. ...

Working to benefit the citizens of other countries is fraught with difficulties. In countries ruled by regimes that do not care about the welfare of their citizens—extractive regimes that see their citizens as source of plunder—the regime, if it has complete control, will necessarily be the beneficiary of aid from abroad. ... The RCT is in itself a neutral statistical tool but as Dean Spears notes, “RCTs provide a ready and high-status language” that allows “mutual legitimization among funders, researchers, and governments.” When the RCT methodology is used as a tool for “finding out what works,” in a way that does not include freedom in its definition of what works, then it risks supporting oppression.
The next essay, by Martin Ravallion, asks "Should the Randomistas (Continue to) Rule?" He supports and offers his own angle on many of the questions about randomized control trials and statistical inference raised by Deaton, and adds concerns about how an overemphasis on RCTs may bias the research agenda. Here, I quote from the version of the paper released as NBER Working Paper 27554:
We are seeing a welcome shift toward a culture of experimentation in fighting poverty, and addressing other development challenges. RCTs have a place on the menu of tools for this purpose. However, they do not deserve the special status that advocates have given them, and which has so influenced researchers, development agencies, donors and the development community as a whole. ...  Despite frequent claims to the contrary, an RCT does not equate counterfactual outcomes between treated and control units. The absence of systematic bias does not imply that the experimental error in a one-off RCT is less than the error in some alternative non-random method. We cannot know that. Among the feasible methods in any application (with a given budget for evaluation), the RCT option need not come closer to the truth. Indeed, if the sample size for an observational study is sufficiently greater than for an RCT in the same setting, then the trials by observational study can be more often close to the truth even if they are biased. ...  Moreover, when we look at RCTs in practice, we see them confronting problems of mis-measurement, selective compliance and contamination. Then it becomes clear that the tool cannot address the questions we ask about poverty, and policies for fighting it, without making the same type of assumptions found in observational studies—assumptions that the randomistas promised to avoid. ... 

The questionable claims made about the superiority of RCTs as the “gold standard” have had a distorting influence on the use of impact evaluations to inform development policymaking. The bias stems from the fact that randomization is only feasible for a non-random subset of policies. When a program is community- or economy-wide or there are pervasive spillover effects from those treated to those not, an RCT will be of little help, and may well be deceptive. The tool is only well suited to a rather narrow range of development policies, and even then it will not address many of the questions that policymakers ask. Advocating RCTs as the best, or even only, scientific method for impact evaluation risks distorting our knowledge base for fighting poverty ...
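Ravallion's sample-size point is essentially a bias-versus-precision trade-off, and a little arithmetic shows how it can play out. The sketch below treats each study's estimation error as normally distributed: the RCT has zero bias but a small sample, while the observational study has some bias but a much larger sample. It then computes how often each estimate lands within a chosen tolerance of the true effect. The normal approximation, the outcome standard deviation, the sample sizes, the bias, and the tolerance are all hypothetical assumptions chosen for illustration; in practice, as Ravallion stresses, the bias of an observational study is unknown, so this shows only that his claim is possible, not that it holds in any particular case.

```python
import math
from statistics import NormalDist


def diff_in_means_std_error(outcome_sd: float, n_total: int) -> float:
    """Standard error of a difference in means when n_total units are split
    evenly into two groups sharing a common outcome standard deviation."""
    return outcome_sd * math.sqrt(4.0 / n_total)


def prob_within(bias: float, std_error: float, tolerance: float) -> float:
    """Probability an estimate falls within +/- tolerance of the true effect,
    assuming (hypothetically) its error is normal with this bias and std. error."""
    error = NormalDist(mu=bias, sigma=std_error)
    return error.cdf(tolerance) - error.cdf(-tolerance)


def compare(outcome_sd=20.0, n_rct=200, n_obs=20_000, obs_bias=0.5, tolerance=1.0):
    """Compare a small unbiased RCT with a large but biased observational study.
    All default values are hypothetical illustrations."""
    rct = prob_within(0.0, diff_in_means_std_error(outcome_sd, n_rct), tolerance)
    obs = prob_within(obs_bias, diff_in_means_std_error(outcome_sd, n_obs), tolerance)
    return rct, obs


if __name__ == "__main__":
    rct, obs = compare()
    print(f"Within tolerance -> RCT: {rct:.2f}, observational: {obs:.2f}")
    # With a larger bias, the ranking flips back in favor of the RCT.
    rct, obs = compare(obs_bias=3.0)
    print(f"Within tolerance -> RCT: {rct:.2f}, observational: {obs:.2f}")
```

With these made-up defaults, the unbiased RCT lands within the tolerance only about 28 percent of the time, while the biased but far larger observational study does so about 96 percent of the time. Raise the assumed bias to 3 and the observational study almost never lands within the tolerance, so which method "comes closer to the truth" depends on quantities that cannot be known in advance.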
Lant Pritchett follows with the third essay, "Randomizing Development: Method or Madness?" (I draw here from the version of the paper at his website dated June 30, 2019.) Pritchett emphasizes that "development" is a big-picture concept. He writes:
National development is a four-fold transformation of an intrinsically social grouping (country or region or society) to higher levels of capabilities in four dimensions: an economic transformation from lower productivity to higher productivity; a political transformation to governments more responsive to the broad wishes of the population, an administrative transformation to organizations (including those of the state) with higher levels of functional capability for implementation, and a social transformation to more equal treatment of the citizens of the country (usually with a sense of common identity and, to some extent, shared purpose). National development is about countries like Haiti or India or Bolivia or Indonesia achieving the high levels of economic, political, administrative, and social functional capabilities that Denmark or Japan or Australia possess.
Pritchett argues that when you compare those big-picture goals to the very limited focus of the actual randomized control trials of the sort that often end up being published in economics journals, the comparison is somewhere between ridiculous and painful. He writes (footnotes omitted):
Bill Gates has recently been promoting chicken ownership to address poverty in Africa. In an open letter, Professor Blattman of University of Chicago pointed out that cash transfers may be more cost effective than chickens and said: “It would be straightforward to run a study with a few thousand people in six countries, and eight or 12 variations, to understand which combination works best, where, and with whom. To me that answer is the best investment we could make to fight world poverty. The scholars at Innovations for Poverty Action who ran the livestock trial in Science agree with me. In fact, we’ve been trying, together, to get just such a comparative study started.” [emphasis added]

I think it is important for the development community to stop and reflect on how we, as a development community, arrived at this two-fold madness. First the madness that Bill Gates, a genius, a humanitarian, an important public intellectual, could be even semiseriously talking about chickens. Second, the madness about method, that the response of Chris Blattman, also a genius, an academic at a top global university, and also an important public intellectual would respond not “Chickens? Really?” but rather that the “best investment” to “fight world poverty” is using the right method to study the competing program and design elements of chickens versus cash transfers. 

That this is madness is, I hope, obvious. The top 20 most populous developing countries in the world are (in order): China, India, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Philippines, Ethiopia, Vietnam, Egypt, Iran, Turkey, DR Congo, Thailand, South Africa, Tanzania and Colombia. Together these countries have 4.6 billion people. Imagine gathering a couple of dozen of the leaders from any one of these countries (where “leadership” could be political, social, economic, intellectual, popular, mass movement, civil society, or any combination) and saying: “We, the experts in the development community, think ‘fighting world poverty’ is the center of the development agenda and we think that the ‘best investment’ we can make to promote development/fight poverty in your country [fill in the blank: Indonesia, Brazil, Nigeria, DRC, Tanzania, South Africa, Egypt, India] is a set of studies using the right method to resolve the questions of whether anti-poverty programs should promote chicken ownership or distribute cash and, within that, how best to design such chicken or cash transfer programs?”

I imagine two responses from country leaders. One, how could you have come to such trivial and trivializing ideas about our country’s goals, aspirations, and challenge? How can we as [Indonesians/Indians/Nigerians/Egyptians/Tanzanians] not take as outright contempt the suggestion that either “chickens” or “studies about chickens” are the top priorities for our country? Two, we can easily list for you many pressing, urgent, if not crisis, development issues affecting the current and future well-being of the citizens of our country. These questions are important whether or not your preferred method for producing research papers can address them.
There is much more in these essays worth considering, and much more in the volume as a whole. Although I am here emphasizing the criticisms of this approach, I should also note that these issues are well-known and broadly debated, and the supporters of a randomized control trial approach have answers of their own. For example, for a discussion of how experiments can move from small-scale to larger-scale and eventually to public policy, one starting point is a three-paper symposium in the Fall 2017 issue of the Journal of Economic Perspectives.
Here's the full Table of Contents for Randomized Control Trials in the Field of Development: A Critical Perspective, edited by Florent Bédécarrats, Isabelle Guérin, and François Roubaud: 

General Introduction, Florent Bédécarrats, Isabelle Guérin, and François Roubaud
1. Randomization in the Tropics Revisited: A Theme and Eleven Variations, Sir Angus Deaton
2. Should the Randomistas (Continue to) Rule?, Martin Ravallion
3. Randomizing Development: Method or Madness?, Lant Pritchett
4. The Disruptive Power of RCTs, Jonathan Morduch
5. RCTs in Development Economics, Their Critics, and Their Evolution, Timothy Ogden
6. Reducing the Knowledge Gap in Global Health Delivery: Contributions and Limitations of Randomized Controlled Trials, Andres Garchitorena, Megan Murray, Bethany Hedt-Gauthier, Paul Farmer, and Matthew Bonds
7. Trials and Tribulations: The Rise and Fall of the RCT in the WASH Sector, Dean Spears, Radu Ban, and Oliver Cumming
8. Microfinance RCTs in Development: Miracle or Mirage?, Florent Bédécarrats, Isabelle Guérin, and François Roubaud
9. The Rhetorical Superiority of Poor Economics, Agnès Labrousse
10. Are the 'Randomistas' Evaluators?, Robert Picciotto
11. Ethics of RCTs: Should Economists Care about Equipoise?, Michel Abramowicz and Ariane Szafarz
12. Using Priors in Experimental Design: How Much Are We Leaving on the Table?, Eva Vivalt
Epilogue: Randomization and Social Policy Evaluation Revisited, James J. Heckman