Tuesday, March 30, 2021

Data and Development

The 2021 World Development Report, one of the annual flagship reports of the World Bank, is focused on the theme of "Data for Better Lives" (released in March 2021). The WDR is always a nice mixture of big-picture overview and specific examples. Here, I'll focus on a few of the themes that occurred to me in reading the report.

First, there are lots of examples of how improved data can help economic development. For many economists, the first reaction is to think about dissemination of information related to production and markets. As the report notes: 
For millennia, farming and food supply have depended on access to accurate information. When will the rains come? How large will the yields be? What crops will earn the most money at market? Where are the most likely buyers located? Today, that information is being collected and leveraged at an unprecedented rate through data-driven agricultural business models. In India, farmers can access a data-driven platform that uses satellite imagery, artificial intelligence (AI), and machine learning (ML) to detect crop health remotely and estimate yield ahead of the harvest. Farmers can then share such information with financial institutions to demonstrate their potential profitability, thereby increasing their chance of obtaining a loan. Other data-driven platforms provide real-time crop prices and match sellers with buyers.
Other examples are about helping the government provide better-targeted public services:
The 2015 National Water Supply and Sanitation Survey commissioned by Nigeria’s government gathered data from households, water points, water schemes, and public facilities, including schools and health facilities. These data revealed that 130 million Nigerians (or more than two-thirds of the population at that time) did not meet the standard for sanitation set out by the Millennium Development Goals and that inadequate access to clean water was especially an issue for poor households and in certain geographical areas (map O.2). In response to the findings from the report based on these data, President Muhammadu Buhari declared a state of emergency in the sector and launched the National Action Plan for the Revitalization of Nigeria’s Water, Sanitation and Hygiene (WASH) Sector.
Other examples are from the private sector, like logistics platforms to help coordinate trucking services.

These platforms (often dubbed “Uber for trucks”) match cargo and shippers with trucks for last-mile transport. In lower-income countries, where the supply of truck drivers is highly fragmented and often informal, sourcing cargo is a challenge, and returning with an empty load contributes to high shipping costs. In China, the empty load rate is 27 percent versus 13 percent in Germany and 10 percent in the United States. Digital freight matching overcomes these challenges by matching cargo to drivers and trucks that are underutilized. The model also uses data insights to optimize routing and provide truckers with integrated services and working capital. Because a significant share of logistics services in lower-income countries leverage informal suppliers, these technologies also represent an opportunity to formalize services. Examples include Blackbuck (India), Cargo X (Brazil), Full Truck Alliance (China), Kobo360 (Ghana, Kenya, Nigeria, Togo, Uganda), and Lori (Kenya, Nigeria, Rwanda, South Sudan, Tanzania, Uganda). In addition to using data for matching, Blackbuck uses various data to set reliable arrival times, drawing on global positioning system (GPS) data and predictions on the length of driver stops. Lori tracks data on costs and revenues per lane, along with data on asset utilization, to help optimize services. Cargo X charts routes to avoid traffic and reduce the risk of cargo robbery. Kobo360 chooses routes to avoid armed bandits based on real-time information shared by drivers. Many of the firms also allow shippers to track their cargo in real time. Data on driver characteristics and behavior have allowed platforms to offer auxiliary services to address the challenges that truck drivers face. For example, some platforms offer financial products to help drivers pay upfront costs, such as tolls, fuel, and tires, as well as targeted insurance products. 
Kobo360 claims that its drivers increase their monthly earnings by 40 percent and that users save an average of about 7 percent in logistics costs. Lori claims that more than 40 percent of grain moving through Kenya to Uganda now moves through its platform, and that the direct costs of moving bulk grain have been reduced by 17 percent in Uganda.

Some examples combine government efforts with privately-generated data. For example, there are estimates that reducing road mortality by half could save 675,000 lives a year. But how can the government know where to invest in infrastructure and enforcement efforts?

Unfortunately, many countries facing these difficult choices have little or no data on road traffic crashes and inadequate capacity to analyze the data they do have. Official data on road traffic crashes capture only 56 percent of fatalities in low- and middle-income countries, on average. Crash reports exist, yet they are buried in piles of paper or collected by private operators instead of being converted into useful data or disseminated to the people who need the information to make policy decisions. In Kenya, where official figures underreport the number of fatalities by a factor of 4.5, the rapid expansion of mobile phones and social media provides an opportunity to leverage commuter reports on traffic conditions as a potential source of data on road traffic crashes. Big data mining, combined with digitization of official paper records, has demonstrated how disparate data can be leveraged to inform urban spatial analysis, planning, and management. Researchers worked in close collaboration with the National Police Service to digitize more than 10,000 situation reports spanning from 2013 to 2020 from the 14 police stations in Nairobi to create the first digital and geolocated administrative dataset of individual crashes in the city. They combined administrative data with data crowdsourced using a software application for mobile devices and short message service (SMS) traffic platform, Ma3Route, which has more than 1.1 million subscribers in Kenya. They analyzed 870,000 transport-related tweets submitted between 2012 and 2020 to identify and geolocate 36,428 crash reports by developing and improving natural language processing and geoparsing algorithms. ... By combining these sources of data, researchers were able to identify the 5 percent of roads ... where 50 percent of the road traffic deaths occur in the city ... This exercise demonstrates that addressing data scarcity can transform an intractable problem into a more manageable one.
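The "5 percent of roads, 50 percent of deaths" finding is a concentration calculation. Once crashes are geolocated to road segments, one can rank the segments from worst to best and ask how few of them are needed to account for half of the total. Here is a minimal sketch of that arithmetic, not taken from the report: the function name and the crash counts are hypothetical, chosen only for illustration.

```python
def smallest_share_covering(counts, target=0.5):
    """Fraction of road segments needed, worst first, to reach `target` of all crashes."""
    total = sum(counts)
    running = 0
    # Walk through segments from the most crash-prone to the least.
    for n_used, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target * total:
            return n_used / len(counts)
    return 1.0  # only reached when `counts` is empty


# Hypothetical crash counts for 20 road segments (made-up numbers).
crashes_per_segment = [40, 10, 5, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

share = smallest_share_covering(crashes_per_segment)
print(f"{share:.0%} of segments account for half of all crashes")
```

With these made-up numbers, a single segment out of 20 carries half of the crashes. The hard part in practice is not this calculation but assembling the geolocated counts in the first place, which is the point of the Nairobi example.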
There are lots of other examples in the report. "For remote populations around the world, receiving specialized medical care has been nearly impossible without having to travel miles to urban areas. Today, telehealth clinics and their specialists can monitor and diagnose patients remotely using sensors that collect patient health data and AI that helps analyze such data." Similar points can be made about delivering education services. "DigiCow, pioneered in Kenya, keeps digital health records on cows and matches farmers with qualified veterinary services."

My second main reaction to the report is that, despite the many individual examples of how data can help in economic development, there are substantial gaps in the data infrastructure for developing economies. At the national level, most countries now do a full census about once a decade, which often provides a reasonable population count at that time. But details on the population are often scanty. The report notes: 
Lack of completeness is often less of a problem in census and survey data because they are designed to cover the entire population of interest. For administrative data, the story is different. Civil registration and vital statistics systems (births and deaths) are not complete in any low-income country, compared with completeness in 22 percent of lower-middle-income countries, 51 percent of upper-middle-income countries, and 95 percent of high-income countries. These gaps leave about 1 billion people worldwide without official proof of identity. More than one-quarter of children overall, and more than half of children in Sub-Saharan Africa, under the age of five are not registered at birth.
As another example of missing data, "Ground-based sensors, deployed in Internet of Things systems, can measure some outcomes, such as air pollution, climatic conditions, and water quality, on a continual basis and at a low cost. However, adoption of these technologies is still too limited to provide timely data at scale, particularly in low-income countries."

In some cases, it's possible to use other data sources to fill in some of the gaps. For example, measuring poverty is often done by carrying out much more detailed household surveys in a few areas, and then using the once-a-decade census data to project this to the country as a whole. The result is a reasonable statistical estimate of the poverty rate for the country as a whole, but not much knowledge about the location of actual poor people across the country. The report notes: 
Estimates of poverty are usually statistically valid for a nation and at some slightly finer level of geographic stratification, but rarely are such household surveys designed to provide the refined profiles of poverty that would allow policies to mitigate poverty to target the village level or lower. Meanwhile, for decades high-resolution poverty maps have been produced by estimating a model of poverty from survey data and then mapping this model onto census data, allowing an estimate of poverty for every household in the census data. A problem with this approach is that census data are available only once a decade (and in many poorer countries even less frequently). Modifications of this approach have replaced population census data with CDR [call detail record, from phones] data or various types of remote sensing data (typically from satellites, but also from drones). This repurposing of CDR or satellite data can provide greater resolution and timelier maps of poverty. For example, using only household survey data the government of Tanzania was able to profile the level of poverty across only 20 regions of the country’s mainland. Once the household survey data were combined with satellite imagery data, it became possible to estimate poverty for each of the country’s 169 districts (map O.3). Combining the two data sources increased the resolution of the poverty picture by eightfold with essentially no loss of precision.
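The two-step logic described in the quote (estimate a model where detailed survey data exist, then apply it wherever a broader data source is available) can be shown with a deliberately tiny sketch. This is not the method used in the report or in Tanzania: the single "satellite index" covariate, the district names, and every number below are hypothetical, and actual poverty-mapping models are far richer.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one covariate."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Step 1: hypothetical survey areas with (satellite index, measured poverty rate).
survey = [(0.2, 0.62), (0.4, 0.50), (0.5, 0.47), (0.7, 0.33), (0.9, 0.20)]
intercept, slope = fit_line([x for x, _ in survey], [y for _, y in survey])

# Step 2: hypothetical districts where only the satellite index is observed.
district_index = {"District A": 0.3, "District B": 0.6, "District C": 0.8}


def predict_poverty(index):
    # Keep the predicted rate inside the valid range [0, 1].
    return min(1.0, max(0.0, intercept + slope * index))


poverty_map = {name: predict_poverty(ix) for name, ix in district_index.items()}
for name, rate in poverty_map.items():
    print(f"{name}: estimated poverty rate {rate:.2f}")
```

The survey supplies the relationship and the broader data source supplies the coverage. That is why swapping once-a-decade census data for satellite or phone data can yield maps that are both finer and more timely.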
The complementary problem to the lack of data is that data infrastructure in many low-income countries is often weak. This is a problem in the obvious way that many people and firms have a hard time accessing available data. But it's also a problem in a less obvious way: people who can't access data also can't contribute to data, and thus can't answer surveys, report on local conditions, offer feedback and advice, or offer access to data on purchase patterns and even (via cell-phone data) on location patterns. As the report notes: 
That said, efforts to move toward universal access face fundamental challenges. First, because of the continual technological innovation in mobile technology service, coverage is a moving target. Whereas in 2018, 92 percent of the world’s population lived within range of a 3G signal (offering speeds of 40 megabytes per second), that share dropped to 80 percent for 4G technology (providing faster speeds of 400 megabytes per second, which are needed for more sophisticated smartphone applications that can promote development). The recent commercial launch of 5G technology (reaching speeds of 1,000 megabytes per second) in a handful of leading-edge markets risks leaving the low-income countries even further behind. ...
The second challenge is that a substantial majority of the 40 percent of the world’s population who do not use data services live within range of a broadband signal. Of people living in low- and middle-income countries who do not access the internet, more than two-thirds stated in a survey that they do not know what the internet is or how to use it, indicating that digital literacy is a major issue.
Affordability is also a factor in low- and middle-income countries, where the cost of an entry-level smartphone represents about 80 percent of monthly income of the bottom 20 percent of households. Relatively high taxes and duties further contribute to this expense. As costs come down in response to innovation, competitive pressures, and sound government policy, uptake in use of the internet will likely increase. Yet even among those who do use the internet, consumption of data services stands at just 0.2 gigabytes per capita per month, a fraction of what this Report estimates may be needed to perform basic social and economic functions online.
As a third reaction, the report often refers to potential dangers of increasing the role of data in an economy, including invasions of personal privacy and the danger of monopolistic companies using data to exploit consumers. In high-income countries and some middle-income countries, these are certainly important subjects for discussion. But in the context of low-income economies, it seems to me that the challenges of the lack of data are so substantial that worries about problems from widespread data are premature. 

The situation reminds me of Joan Robinson's comment in her 1962 book Economic Philosophy (p. 46 of my Pelican Book edition): "The misery of being exploited by capitalists is nothing compared to the misery of not being exploited at all." In a similar spirit, one might say that the misery of data being misused or monopolized is nothing compared to the misery of data barely being used at all. 

Finally, data is of course not valuable in isolation, but rather because of the ways that it may help people and firms and government to choose different actions. In the examples above, for instance, data can help government understand the location of social needs, or help a farmer adjust agricultural practices, or help a producer ship a product to a buyer, or provide a method for someone to find work in the gig economy. Data flows are also a feedback mechanism, both for markets and for government. Without data to show the extent of problems, it's harder to hold public officials accountable.

For some previous posts with additional discussion of government data and academic data, much of it from the context of the US and other high-income countries, see: