Wednesday, July 3, 2013

Interview with Susan Athey on Big Data and Other Topics

Douglas Clement has a characteristically excellent interview with Susan Athey, appearing in the June 2013 issue of The Region, which is published by the Federal Reserve Bank of Minneapolis. Athey is a professor at Stanford's Business School, winner of the John Bates Clark Medal in 2007, and has also been Chief Economist at Microsoft since 2007. Here are some snippets:

On whether the arrival of "big data" means that theory is now less important:

"In fact, the need for theory is in some ways magnified by having large amounts of data. When you have a small amount of data, you can just look at the data and build your intuition from it. When you have very large amounts of data, just taking an average can cost thousands of dollars of computer time. So you’d better have an idea of what you’re doing and why before you go out to take those averages. The importance of theory to create conceptual frameworks to know what to look for has never been larger ... I think what is true is that when you have large amounts of data, if you ask it the right questions, you have a greater ability to let the data speak, and so you can be much less reliant on assumptions. But you still need a strong conceptual framework to understand what’s coming out.

"And I would say in the business world, this is where there’s an enormous scarcity of talent. I see that there are a fair number of statisticians out there, not nearly enough, but a fair number of data scientists out there. There’s a huge demand for them still. But among data scientists, the ones who can define a question and introduce a new way of looking at the data—those data scientists are rock stars. They’re pursued by every company and they move up the hierarchy very quickly. They’re giving presentations to top executives and are extraordinarily influential. And there are never enough of them."

Why economics should focus more on issues of big data: 

"I think that the data scientists should take a little more economics. That would help; economics puts a lot of emphasis on the conceptual framework. And I also think that economics should be paying a lot more attention to the statistics of big data.

"Right now, economics as a profession has very little market share in the business analysis of this big data. It’s mostly statisticians. We’re just not training our undergraduates to be qualified for these jobs. Even our graduate students, even someone with a Ph.D. from a very good economics department really doesn’t have the right skills to analyze the kinds of data sets that big Internet firms are creating. ... We’re a little bit behind. Econometrics, at the undergraduate level, is not appreciated as much as an expertise that’s extremely important for future employment, and we certainly don’t see a lot of economics majors going on to take extra steps beyond what’s required. ... I really think we need to make some changes in education. What happens at the top Ph.D. programs isn’t going to really impact the overall workforce. But what we do at the undergraduate level and whether we start offering more advanced or master’s level courses becomes more important—because, really, with just an undergraduate degree it’s hard to be very successful on the technical side at any of these firms."

How big data will generate future productivity gains: 

"Companies in all sorts of different industries are starting to generate large amounts of data. The Internet companies were built from the ground up on that data. Other companies are just starting to think about what they do with the data. If you think about these kind of general purpose innovations like the computer, it took us a while to figure out what to do with the computer. It replaced the secretary and the typewriter, but it took another 15 years before the personal computer really changed the way we do commerce, which you would say really comes with the Internet and businesses being built around it."

"With the big data, of course, the Googles and the Facebooks and so on were born on that. But if you take, say, a car manufacturer that might be getting real-time information from monitoring devices within the cars, there’s a first level of things you can do with that data. Like you can look at aggregate failure rates, or something, for certain types of things. You can identify problems."

"But there’s a whole other level of optimizations that can be done. And I think that idea will apply across many industries. They’ll start with just the basics of, let’s figure out how to prioritize problems. For example, with software you can get telemetry data about, where are the bugs? What’s causing crashes? That’s sort of the first level of what you do with data: You use the data to identify problems and make priorities. The more frequent the crash, the higher you prioritize in fixing that problem. But there’s a next level, which includes real-time machine learning, customization, personalization, optimization, where industry as a whole is just inventing what to do with it. And there could be some really radical breakthroughs in different industries. They’re just very hard to anticipate as they start to use these data."
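The "first level" Athey describes, ranking problems by how often they occur, is simple enough to sketch in a few lines. This is only an illustration: the crash signatures and the `crash_reports` list below are made up, and real telemetry would of course arrive in a very different form.

```python
from collections import Counter

# Hypothetical crash reports: each entry is the signature of one crash.
crash_reports = [
    "null_pointer_in_renderer",
    "timeout_on_save",
    "null_pointer_in_renderer",
    "out_of_memory_on_import",
    "null_pointer_in_renderer",
    "timeout_on_save",
]

# Count crashes per signature and rank them, most frequent first.
crash_counts = Counter(crash_reports)
priorities = [signature for signature, _ in crash_counts.most_common()]

for rank, signature in enumerate(priorities, start=1):
    print(rank, signature, crash_counts[signature])
```

The "next level" in the quote (real-time learning, personalization, optimization) is exactly what a frequency count like this does not capture.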

On the idea that auction design needs to focus not only on getting the highest bid, but also on attracting lots of bidders:

"So if you’re thinking about how to design an auction, or how to design a market more generally, even though it can be tempting to focus on what happens once the people are in the room, it can be more important to start with designing your marketplace to get people to come, to start with. This insight is one that I’ve brought to other settings. I think, for example, it applies in online auctions. When a large company like eBay or an online advertising firm is designing its marketplace, for example, it can be more important to design your marketplace to attract bidders and make sure they’re there to participate than it is to try to extract every last cent out of them once they get there. If potential bidders are not making enough profit to make it worth their time to come, they won’t come. And thin markets can be much more problematic."

On the difference between a monopolist search engine and a competitive one:

"[A] profit-maximizing search engine cares how much surplus the advertisers get versus the search engine. As a result of that, a monopolist search engine will tend to raise reserve prices [meaning the lowest price they’ll accept] too high in order to extract more surplus from the advertisers even if it means eliminating ads that the consumers might have liked to see. In contrast, a competitive search engine—one that’s competing for advertisers and users—will be more likely to choose the welfare-maximizing point. A more realistic model would also incorporate the other content that gets crowded off the page by the ads; such a model would be more likely to see a monopolist search engine put up too many ads relative to what consumers would like, but again competition would typically push a firm closer to welfare maximization in order to keep both sides of the market participating."
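The reserve-price point can be illustrated with a toy simulation. To be clear, this is not Athey's model: it assumes a single ad slot sold by a second-price auction, two advertisers whose values are drawn independently and uniformly between 0 and 1, and "welfare" measured simply as the value of the winning advertiser (zero if the reserve blocks the sale). Every name in the sketch is hypothetical.

```python
import random


def simulate(reserve, draws):
    """Average revenue and welfare of a second-price auction with a reserve."""
    revenue = 0.0
    welfare = 0.0
    for values in draws:
        ranked = sorted(values, reverse=True)
        highest, second = ranked[0], ranked[1]
        if highest >= reserve:
            # The winner pays the larger of the reserve and the runner-up's value.
            revenue += max(reserve, second)
            # Welfare here is just the winner's value (an assumption of this sketch).
            welfare += highest
    n = len(draws)
    return revenue / n, welfare / n


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [(rng.random(), rng.random()) for _ in range(20000)]

reserves = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9
results = {r: simulate(r, draws) for r in reserves}

best_for_revenue = max(reserves, key=lambda r: results[r][0])
best_for_welfare = max(reserves, key=lambda r: results[r][1])

print("reserve that maximizes revenue:", best_for_revenue)
print("reserve that maximizes welfare:", best_for_welfare)
```

Under these assumptions the revenue-maximizing reserve sits well above zero, while welfare is highest with no reserve at all, because any positive reserve sometimes blocks a sale that would have created value. That gap is the wedge between the seller's interest and overall welfare that the quote describes.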