Monday, May 5, 2014

Big Data in Political Campaigns

How does the collection and use of big data work in political campaigns? David W. Nickerson and Todd Rogers pull back the curtain and offer a glimpse of what's been happening in "Political Campaigns and Big Data," which appears in the Spring 2014 issue of the Journal of Economic Perspectives. Nickerson is a Notre Dame professor of political science who was "'Director of Experiments' in the Analytics Department in the 2012 re-election campaign of President Obama." Rogers is a professor of public policy at Harvard's Kennedy School who "co-founded the Analyst Institute, which uses field experiments and behavioral science insights to develop best practices in progressive political communications." They write:

Over the past six years, campaigns have become increasingly reliant on analyzing large and detailed datasets to create the necessary predictions. While the adoption of these new analytic methods has not radically transformed how campaigns operate, the improved efficiency gives data-savvy campaigns a competitive advantage. This has led the political parties to engage in an arms race to leverage ever-growing volumes of data to create votes. This paper describes the utility and evolution of data in political campaigns. The techniques used as recently as a decade or two ago by political campaigns to predict the tendencies of citizens appear extremely rudimentary by current standards.
Like all articles in JEP going back to the first issue in 1987, it is freely available courtesy of the American Economic Association. (Full disclosure: I've been managing editor of JEP since that first issue in 1987.) Here are some points from their essay that jumped out at me.

The starting point for gathering data on potential voters is the publicly available official voter file maintained in each state. As Nickerson and Rogers write: "The official voter file contains a wide range of information. In addition to personal information such as date of birth and gender, which are often valuable in developing predictive scores, voter files also contain contact information such as address and phone." In addition, while the files of course don't record who anyone voted for, they do show whether people voted, and how they voted--say, on Election Day, or using some form of early or absentee voting.

This data can then be merged with data from other sources. Census data is available as averages for each voting precinct, showing "the average household income, average level of education, average number of children per household, and ethnic distribution" for that precinct.
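To make the merging step concrete, here is a minimal sketch of attaching precinct-level averages to individual voter-file records. The field names and numbers are hypothetical, invented for illustration; they do not come from the Nickerson and Rogers paper or from any actual voter file.

```python
# Hypothetical voter-file records (illustrative field names and values only).
voter_file = [
    {"voter_id": 1, "precinct": "P-01", "birth_year": 1970, "voted_2012": True},
    {"voter_id": 2, "precinct": "P-02", "birth_year": 1988, "voted_2012": False},
    {"voter_id": 3, "precinct": "P-09", "birth_year": 1955, "voted_2012": True},
]

# Hypothetical precinct-level Census averages, keyed by precinct.
precinct_averages = {
    "P-01": {"avg_household_income": 52000, "avg_years_education": 13.1},
    "P-02": {"avg_household_income": 71000, "avg_years_education": 15.4},
}


def attach_precinct_data(voters, precincts):
    """Return new records combining each voter with their precinct's averages.

    Voters whose precinct has no Census entry are kept, just without
    the extra fields.
    """
    merged = []
    for voter in voters:
        context = precincts.get(voter["precinct"], {})
        merged.append({**voter, **context})
    return merged


merged_records = attach_precinct_data(voter_file, precinct_averages)
```

Note that every voter in a precinct receives the same Census values: the merge adds neighborhood context, not individual-level facts.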

Additional data can be purchased from commercial firms. Nickerson and Rogers report that the most cost-effective data to purchase is updated phone numbers (because the phone numbers in the state voter registration files are often outdated after a few years) as well as data about "estimated years of education, home ownership status, and mortgage information." Other information, while available, isn't cost-effective to buy. They write: "In contrast, information on magazine subscriptions, car purchases, and other consumer tastes are relatively expensive to purchase from vendors, and also tend to be available for very few individuals. Given this limited coverage, this data tends not to be useful in constructing predictive scores for the entire population—and so campaigns generally avoid or limit purchases of this kind of consumer data."

Finally, a major source of voter information is provided by voters themselves when they sign up at a candidate's website or party website. Not only do people provide information directly, but the campaign can also keep track of what sorts of topics or messages cause people to respond by clicking on a link or donating money, so much can be learned about people in that way.

These sources of information have some interesting implications. Campaigns know more about those who vote, and who are politically active, than about those who don't vote regularly or who are not politically active. Campaigns also tend to know more about their own supporters. Nickerson and Rogers write: "To the extent that predictive scores are useful and reveal true unobserved characteristics about citizens, it means that multiple organizations will produce predictive scores that recommend targeting the same sets of citizens. For example, some citizens might find themselves contacted many times, while other citizens—like those with low turnout behavior scores in 2012—might be ignored by nearly every campaign."

After collecting and collating and coordinating all this data, the question is how to use it. Nickerson and Rogers point out that focusing on those who are already very likely to vote for you, or focusing on those who are already very likely to vote against you, tends to be a waste of money. Thus, one way that data can make a campaign more cost-effective is that it can minimize spending money on those who are unpersuadable or who are already persuaded. This also reduces the risk of "backlash," in which attempts to encourage voting for your candidate rev up voters for the other side.
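As a rough illustration of that targeting logic, the sketch below keeps only voters whose predicted support falls in a middle band. The `support_score` field, the function name, and the 0.3 and 0.7 cutoffs are all assumptions made up for this example; the paper does not specify thresholds.

```python
def persuasion_targets(voters, low=0.3, high=0.7):
    """Keep voters whose predicted support lies between low and high.

    Scores near 1.0 are likely supporters and scores near 0.0 are likely
    opponents; this sketch skips both groups. The cutoffs are illustrative
    assumptions, not values from the paper.
    """
    return [v for v in voters if low <= v["support_score"] <= high]


# Hypothetical scored voters.
scored_voters = [
    {"voter_id": 1, "support_score": 0.95},  # already persuaded
    {"voter_id": 2, "support_score": 0.50},  # plausibly persuadable
    {"voter_id": 3, "support_score": 0.05},  # unlikely to be persuaded
    {"voter_id": 4, "support_score": 0.65},  # plausibly persuadable
]

targets = persuasion_targets(scored_voters)
target_ids = [v["voter_id"] for v in targets]
```

With these made-up scores, only voters 2 and 4 stay on the contact list, so no money goes to the strong supporter or the strong opponent.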

Another possible advantage is that campaigns can run small-scale experiments about what messages or actions are likely to cause a certain slice of voters to take an action--clicking on a link, volunteering time, putting up a sign, giving money--that is likely to be correlated with voting for the candidate later on. When small-scale experiments have shown what steps are likely to be effective, then the approach can be used at larger scale. How effective can such steps be? They write: "Suppose a campaign’s persuasive communications has an average treatment effect of 2 percentage points—a number on the high end of persuasion effects observed in high-expense campaigns: that is, if half of citizens who vote already planned to vote for the candidate, 52 percent would support the candidate after the persuasive communication."
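The arithmetic in that quotation can be spelled out in a few lines. The 50 percent baseline and the 2 percentage point effect come from the quoted passage; the figure of 10,000 contacted voters is my own hypothetical, chosen only to show the scale.

```python
def support_after_contact(baseline_percent, effect_points):
    """Support share, in percent, after adding the treatment effect."""
    return baseline_percent + effect_points


def extra_supporters(voters_contacted, effect_points):
    """Expected additional supporters among the voters contacted."""
    return voters_contacted * effect_points / 100


# From the quoted example: 50 percent baseline, 2 percentage point effect.
new_support = support_after_contact(50, 2)

# Hypothetical scale: 10,000 voting citizens reached by the communication.
gained = extra_supporters(10_000, 2)
```

Under these assumptions, support moves from 50 to 52 percent, which works out to about 200 additional supporters for every 10,000 voters contacted.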

Nickerson and Rogers point out in their conclusion that using big data to drive campaigning, in a very real way, makes traditional boots-on-the-ground campaigning more important than ever. After all, the bottom line of the campaign is still to push for more of your voters to turn out. Big data can help a campaign allocate resources more cost-effectively, but the campaign still needs to do the actual work.
"The improved capability to target individual voters offers campaigns an opportunity to concentrate their resources where they will be most effective. This power, however, has not radically transformed the nature of campaign work. One could argue that the growing impact of data analytics in campaigns has amplified the importance of traditional campaign work. . . . Professional phone interviews are still used for message development and tracking, but they are also essential for developing predictive scores of candidate support and measuring changes in voter preferences in randomized experiments. Similarly, better targeting has made grassroots campaign tactics more efficient and therefore more cost competitive with mass communication forms of outreach. Volunteers still need to persuade skeptical neighbors, but they are now better able to focus on persuadable neighbors and use messages more likely to resonate. This leads to higher-quality interactions and (potentially) a more pleasant volunteer experience. So while savvy campaigns will harness the power of predictive scores, the scores will only help the campaigns that were already effective."