Here's an early example from List's work:
"So let’s go through an example whereby I think I can convince you that I am in a natural environment and that I’m learning something of importance for economics. I first got interested in charitable fundraising in 1998 when a dean at the University of Central Florida asked me to raise money for a center at UCF. ... Many charities have programs where they will match a donor’s gift. So your $100 gift means that the charity will get $200 after the match. Interestingly, however, when you go and ask those charities if matching works they say, “Of course it does, and a 2-to-1 match is much better than a 1-to-1 match, and a 3-to-1 match is better than either of them.” So I asked, “What is your empirical evidence for that?” They had none. Turns out that it was a gut feeling they had.
"I said, well, why don’t you do field experiments to learn about what works for charity? ... So what we are going to do is partner with them in one of their mail solicitations. Say they send out 50,000 letters a month. We will then randomize those 50,000 letters that go directly to households into different treatments. One household might receive a letter that says, “Please give to our charity. Every dollar you give will be matched with $3 from us.” Another household might receive the exact same letter, but the only thing that changes is that we tell them that every dollar you give will be matched by $2. Another household receives a $1 match offer. And, finally, another household will receive a letter that doesn’t mention matching. So you fill these treatment cells with thousands of households that don’t know they’re part of an experiment. We’re using randomization to learn about whether the match works. That’s an example of a natural field experiment — completed in a natural environment and the task is commonplace.
"I didn’t learn that 3-to-1 works better than 2-to-1 or 1-to-1. Empirically, what happens is, the match in and of itself works really well. We raise about 20 percent more money when there is a match available. But, the 3-to-1, 2-to-1, and 1-to-1 matches work about the same."How does List respond to the concern that we are unlikely to learn much of interest from these kinds of experiments, because the real world is just too messy for cause and effect to be discerned?
"So I come along, and I say we really need to use the tool of randomization, but we need to use it in the field. Here’s where the skepticism arose using that approach: People would say, “You can’t do that, because the world is really, really messy, and there are a lot of things that you don’t observe or control. When you go to the marketplace, there are a lot of reasons why people are behaving in the manner in which they behave. So there’s no way — you don’t have the control — to run an experiment in that environment and learn something useful. The best you can do is to just observe and take from that observation something of potential interest.
"That reasoning stems from the natural sciences. Consider the example with the chemist: If she has dirty test tubes her data are flawed. The rub is that chemists do not use randomization to measure treatment effects. When you do, you can balance the unobservables — the “dirt” — and make clean inference. As such, I think that economists’ reasoning on field experiments has been flawed for decades, and I believe it is an important reason why people have not used field experiments until the last 10 or 15 years. They have believed that because the world is really messy, you can’t have control in the same way that a chemist has control or a biologist might have control. ...
"When I look at the real world, I want it to be messy. I want there to be many, many variables that we don’t observe and I want those variables to frustrate inference. The reason why the field experiments are so valuable is because you randomize people into treatment and control, and those unobservable variables are then balanced. I’m not getting rid of the unobservables — you can never get rid of unobservables — but I can balance them across treatment and control cells. Experimentation should be used in environments that are messy; and I think the profession has had it exactly backwards for decades. They have always thought if the test tube is not clean, then you can’t experiment. That’s exactly wrong. When the test tube is dirty, it means that it’s harder to make proper causal inference by using our typical empirical approaches that model mounds and mounds of data."
"I think in many ways, it’s harder to overturn entrenched thinking in parts of the nonprofit, corporate, and public sectors, where many things are not subject to empirical testing. For instance, why don’t we know what works in education? It’s because we have not used field experiments across school districts. Each school district should be engaged in several experiments a year, and then in the end the federal government can say, “Here’s what works. Here’s a new law.” It’s unfair to future generations to pass along zero information on what policies can curb criminal activities, what policies can curb teen pregnancy, what are the best ways to overcome the racial achievement gap, why there aren’t more women in the top echelon of corporations. We don’t know because we don’t understand, we haven’t
engaged in feedback-maximization. There needs to be a transformation, and I don’t know what it’s going to take. I mean, are we going to be sitting here in 50 years and thinking, “If we only knew what worked to help close the achievement gap, if we only knew how to do that”?
"I hope my work in education induces a sea change in the way we think about how to construct curricula. Right now, we are doing a lot of work on a prekindergarten program in Chicago Heights and in a year or two I think that we will be able to tell policymakers what will help kids — and how much it will help them. But unless people adopt the field experimental approach more broadly, it will be a career that’s not fulfilled in my eyes."