For a big-picture overview of these issues, a useful starting point is the three-paper "Symposium on the Provision of Public Data" in the Winter 2019 issue of the Journal of Economic Perspectives.
- "The Value of US Government Data to US Business Decisions," by Ellen Hughes-Cromwick and Julia Coronado
- "On the Controversies behind the Origins of the Federal Economic Statistics," by Hugh Rockoff
- "Evolving Measurement for an Evolving Economy: Thoughts on 21st Century US Economic Statistics," by Ron S. Jarmin
But if you want to get down and dirty with the details of what changes to government statistics are being researched and considered, you will want to turn to the papers from the most recent Conference on Research in Income and Wealth, held March 16-17 in Washington, DC. The CRIW, which is administered by the National Bureau of Economic Research, has been holding conferences since 1939, bringing together researchers from government, academia, business, and nonprofits to talk about issues of economic measurement. Sixteen of the papers from the conference, most also including presentation slides, are available at the conference website.
In the Winter 2019 JEP, Hughes-Cromwick and Coronado point out that the combined annual budget for the 13 major US government statistical agencies is a little over $2 billion. For comparison, the "government-data-intensive sector" of the economy, which includes firms that rely heavily on government data, like "investment analysts, database aggregator firms, market researchers, benchmarkers, and others," now has annual revenues in the range of $300 billion or more. They also offer concrete examples of how firms in just a few industries--automobiles, energy, and financial services--use government data as a basis for their own additional calculations, supporting a very wide range of business decisions.
Rockoff points out that the main government statistical series--inflation, unemployment, and GDP--all emerged out of historical situations in which it became important for politicians to have an idea of what was actually going on. For example, early US government efforts at measuring inflation emerged from public controversies over the extent of price changes during the Civil War and the early 1890s. Early measurement of GDP and national income emerged from disputes over the extent of inequality in the opening decades of the 20th century. Some of the earliest US unemployment statistics were collected in Massachusetts in the aftermath of the Panic of 1873 and the depression that followed. As he points out, the ongoing development of these statistics was then shaped by changes in prices, output, and unemployment during World War I, the Great Depression, and World War II.
This interaction between US government policy and statistics goes back to the origins of the US Census in 1790, when James Madison (then a member of the House of Representatives) argued that the Census should do more than just count heads: it should collect other economic data as well. In the Congressional debate, Madison said:
If gentlemen have any doubts with respect to its utility, I cannot satisfy them in a better manner, than by referring them to the debates which took place upon the bills, intended, collaterally, to benefit the agricultural, commercial, and manufacturing parts of the community. Did they not wish then to know the relative proportion of each, and the exact number of every division, in order that they might rest their arguments on facts, instead of assertions and conjectures?

In my own view, Madison's plaintive cry "in order that they might rest their arguments on facts" doesn't apply only, or even mainly, to Congress. Public economic statistics are a way for all citizens to keep tabs on their society and their government, too.
In the JEP, Jarmin points out that survey-based methods of collecting government data have been seeing lower response rates. This pattern applies to the main government surveys of households, including the Current Population Survey, the Survey of Income and Program Participation (SIPP), the Consumer Expenditure Survey, the National Health Interview Survey, and the General Social Survey. Similar concerns apply to surveys of businesses: the Monthly Retail Trade Survey, the Quarterly Services Survey, and the Current Employment Statistics survey. Surveys have the considerable advantage of being nationally representative, but they also have the disadvantage of relying on what people say, rather than on what actually happened. For example, if you compare actual payments from the food stamp program to what people report on surveys, you find that many people who receive food stamp assistance do not report it (or underreport it) on the survey. Moreover, surveys are costly to carry out.
Can survey-based data be replaced by some combination of administrative data from government programs, private-sector data (which could perhaps be automatically submitted by firms), and "big data" automatically collected from websites? Sure, up to a point.
For example, people's income and work can be examined by looking at income tax data and Social Security payroll data. A private company called NPD collects point-of-sale data directly from retailers; could the government tap into this data, or perhaps contract with NPD to collect the data the government desires, rather than doing its own separate survey on retail sales? Instead of collecting price data from stores for the measure of inflation, might it be possible to use automated data from price scanners in stores, or even to scrape the data from websites that advertise prices for certain goods?
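To make the scraping idea concrete, here is a minimal sketch in Python using the standard requests and BeautifulSoup libraries. The URL and the page structure are hypothetical placeholders; a real statistical pipeline would also need to worry about site permissions, error handling, a defensible sampling frame, and constant product churn.

```python
# Minimal sketch of scraping an advertised price from a product page.
# The URL and the "price" element are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

def fetch_advertised_price(url: str) -> float:
    """Download a product page and parse out the advertised price."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Assume the page marks up its price like <span class="price">$3.99</span>.
    tag = soup.find("span", class_="price")
    if tag is None:
        raise ValueError("no price element found on page")
    return float(tag.get_text().strip().lstrip("$").replace(",", ""))

if __name__ == "__main__":
    print(fetch_advertised_price("https://example.com/products/12345"))
```

Done at scale across millions of products, this is roughly the approach behind efforts like MIT's Billion Prices Project.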
The papers presented at the CRIW conference talk about lots of specific proposals along these lines. Many are promising, and none are easy. For example, using administrative data from the IRS or Social Security raises concerns about privacy, and practical concerns about linking together data from very different computer systems. Is data collected by a private firm likely to be nationally representative? If the US government relies on a private firm for key data, how does the government make sure that the data isn't disclosed in advance, and what happens to the data if the firm doesn't perform well or goes out of business?
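On the linkage point specifically, even a "simple" deterministic match across two systems requires agreeing on how identifiers are stored. Here is a toy sketch; all field names and records are invented for illustration.

```python
# Toy illustration of record linkage across two administrative systems
# that store the "same" person differently. All records are invented.
def normalize_name(name: str) -> str:
    """Crude normalization: uppercase, drop punctuation, collapse spaces."""
    kept = "".join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

# Hypothetical extracts: one system keys on a normalized name and date
# of birth, the other stores "Last, First" free text.
tax_records = {("SMITH JOHN", "1970-01-02"): {"wages": 52000}}
benefit_records = [{"name": "Smith, John", "dob": "1970-01-02", "benefit": 1800}]

for rec in benefit_records:
    key = (normalize_name(rec["name"]), rec["dob"])
    match = tax_records.get(key)
    print("linked" if match else "unmatched", rec["name"], match)
```

Real linkage also has to contend with typos, name changes, and missing identifiers, which is why agencies often end up with probabilistic matching rather than exact joins.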
The idea of using data from barcodes to get a detailed view of sales and prices is definitely intriguing. But barcodes turn over frequently, which makes the underlying data complex to work with. As Ehrlich, Haltiwanger, Jarmin, Johnson, and Shapiro point out in their paper for the CRIW conference:
Roughly speaking, if a good defined at the barcode or SKU level is sold today, there is only a fifty percent chance it will be sold a year from today. This turnover of goods is one of the greatest challenges of using the raw item-level data for measurement, but also is an enormous opportunity. When new goods replace old goods there is frequently both a change in price and quality. Appropriately identifying whether changing item-level prices imply changes in the cost of living or instead reflect changes in product quality is a core issue for measuring activity and inflation. The statistical agencies currently perform these judgments using a combination of direct comparisons of substitutes, adjustments, and hedonics that are very hard to scale.

Moreover, if government statistics are emerging from an array of different sources and evolving over time, how does one figure out whether changes in unemployment, inflation, and GDP are a result of actual changes in the economy, or just changes in how the variables are being measured? How does one balance the desire for accurate and detailed measurement, which often takes time, with a desire for continual immediate updates to the data?
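To make the turnover problem in that passage concrete, consider the standard "matched-model" approach, in which only items observed in both periods contribute to the index. A minimal sketch with invented scanner-style data:

```python
# Matched-model price index: only barcodes observed in both periods count.
# Prices are invented scanner-style observations, keyed by barcode.
import math

prices_2018 = {"0001": 2.00, "0002": 5.00, "0003": 3.00, "0004": 4.00}
prices_2019 = {"0001": 2.10, "0002": 5.25, "0005": 3.50, "0006": 4.40}

matched = prices_2018.keys() & prices_2019.keys()
print(f"matched {len(matched)} of {len(prices_2018)} items")  # only 2 of 4

# Jevons index: unweighted geometric mean of the matched price relatives.
logs = [math.log(prices_2019[k] / prices_2018[k]) for k in matched]
index = math.exp(sum(logs) / len(logs))
print(f"matched-model index: {index:.4f}")  # 1.0500
```

The two exiting and two entering barcodes contribute nothing to the matched index, even though the replacement of old goods by new ones is exactly where much of the price and quality change happens. That is the gap the agencies' substitution, adjustment, and hedonic methods are trying to fill.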
Overall, it seems to me that one can discern a shadowy pattern emerging. There will be highly detailed, representative, and costly government statistics published at longer intervals--maybe a year or five years or even 10 years apart. These will often rely on nationally representative surveys. In between, for the single-month or three-month updates, the figures will rely more heavily on extrapolations from whatever administrative and private sources are available. These updates will not necessarily be representative, and they will be subject to later corrections. The short-term updates may not always be fully transparent, because of privacy concerns from both firms and individuals, but for the short term, they will be a reasonable way to proceed.
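In mechanical terms, this is the familiar "benchmark and extrapolate" pattern: a representative survey pins down the level at long intervals, growth rates from faster-moving sources carry the series forward, and the interim path is revised when the next benchmark arrives. A stylized sketch, with all numbers invented:

```python
# Stylized sketch of "benchmark and extrapolate": a representative survey
# fixes the level at long intervals, and monthly growth rates from a
# private or administrative source carry it forward. Numbers are invented.

annual_benchmark = 100.0                         # level from the survey
monthly_growth = [0.002, -0.001, 0.004, 0.003]   # from the fast source

level = annual_benchmark
provisional = []
for g in monthly_growth:
    level *= 1 + g
    provisional.append(round(level, 2))
print("provisional levels:", provisional)

# When the next benchmark arrives, spread ("wedge") the gap between the
# extrapolated level and the new survey level evenly across the months.
next_benchmark = 101.2
scale = (next_benchmark / level) ** (1 / len(monthly_growth))
level, revised = annual_benchmark, []
for g in monthly_growth:
    level *= (1 + g) * scale
    revised.append(round(level, 2))
print("revised levels:    ", revised)
```

This is essentially why early estimates of employment or retail sales get revised, sometimes substantially, as benchmark data arrive.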
The dream is that it becomes possible to develop better statistics while costs stay the same or even fall. But to my mind, some additional investment in government statistics would be an inexpensive way of supporting the decisions of firms and policymakers, and of providing accountability to citizens.
Here's a list of the papers (and presentation slides) available at the conference website:
- Gabriel Ehrlich, John C. Haltiwanger, Ron S. Jarmin, David Johnson, and Matthew D. Shapiro, "Re-Engineering Key National Economic Indicators" (slides)
- Aditya Aladangady, Shifrah Aron-Dine, Wendy Dunn, Laura Feiveson, Paul Lengermann, and Claudia R. Sahm, "From Transactions Data to Economic Statistics: Constructing Real-Time, High-Frequency, Geographic Measures of Consumer Spending" (slides)
- Andrea Batch, Jeffrey C. Chen, Alexander Driessen, Abe Dunn, and Kyle K. Hood, "Off to the Races: A Comparison of Machine Learning and Alternative Data for Predicting Economic Indicators" (slides)
- Rishab Guha and Serena Ng, "A Machine Learning Analysis of Seasonal and Cyclical Sales in Weekly Scanner Data" (slides)
- Rebecca J. Hutchinson, "Investigating Alternative Data Sources to Reduce Respondent Burden in United States Census Bureau Retail Economic Data Products" (slides)
- Carol Robbins, Gizem Korkmaz, Jose Bayoan Santiago Calderon, Claire Kelling, Sallie Keller, and Stephanie S. Shipp, "The Scope and Impact of Open Source Software: A Framework for Analysis and Preliminary Cost Estimates" (slides)
- Tomaz Cajner, Leland D. Crane, Ryan Decker, Adrian Hamins-Puertolas, and Christopher Kurz, "Improving the Accuracy of Economic Measurement with Multiple Data Sources: The Case of Payroll Employment Data" (slides)
- Andrew L. Baer, J. Bradford Jensen, Shawn D. Klimek, Lisa Singh, Joseph Staudt, and Yifang Wei, "Automating Response Evaluation for Franchising Questions on the 2017 Economic Census" (slides)
- Marina Gindelsky, Jeremy Moulton, and Scott A. Wentland, "Valuing Housing Services in the Era of Big Data: A User Cost Approach Leveraging Zillow Microdata" (slides)
- Abe Dunn, Dana Goldman, John Romley, and Neeraj Sood, "Quantifying Productivity Growth in Health Care Using Insurance Claims and Administrative Data" (slides)
- Edward L. Glaeser, Hyunjin Kim, Michael Luca, "Nowcasting the Local Economy: Using Yelp Data to Measure Economic Activity"
- David Copple, Bradley J. Speigner, and Arthur Turrell, "Transforming Naturally Occurring Text Data into Economic Statistics: The Case of Online Job Vacancy Postings" (slides)
- Sudip Bhattacharjee, John Cuffe, Ugochukwu Etudo, Justin Smith, and Nevada Basdeo, "Using Public Data to Generate Industrial Classification Codes" (slides)
- Don Fast and Susan Fleck, "Measuring Export Price Movements With Administrative Trade Data" (slides)
- David Friedman, Crystal G. Konny, and Brendan K. Williams, "Big Data in the U.S. Consumer Price Index: Experiences & Plans" (slides)
- W. Erwin Diewert and Robert C. Feenstra, "Estimating the Benefits of New Products" (slides) and "Estimating the Benefits of New Products: Some Approximations"