Thursday, April 3, 2014

What's a Gini Coefficient?

When you look up economic statistics about inequality, you often see it measured with a Gini coefficient. But where does the Gini coefficient come from, how is it calculated, and intuitively what does it mean? Here are some thoughts.

The most straightforward way to think about the Gini coefficient is to start with a different but related tool for measuring inequality, a figure called a Lorenz curve. The Lorenz curve was developed by an American statistician and economist named Max Lorenz when he was a graduate student at the University of Wisconsin. His article on the topic, "Methods of Measuring the Concentration of Wealth," appeared in Publications of the American Statistical Association, Vol. 9, No. 70 (Jun., 1905), pp. 209-219. The Congressional Budget Office presented a nice tight description of a Lorenz curve in a 2011 report:
"The cumulative percentage of income can be plotted against the cumulative percentage of the population, producing a so-called Lorenz curve (see the figure). The more even the income distribution is, the closer to a 45-degree line the Lorenz curve is. At one extreme, if each income group had the same income, then the cumulative income share would equal the cumulative population share, and the Lorenz curve would follow the 45-degree line, known as the line of equality. At the other extreme, if the highest income group earned all the income, the Lorenz curve would be flat across the vast majority of the income range, following the bottom edge of the figure, and then jump to the top of the figure at the very right-hand edge.
"Lorenz curves for actual income distributions fall between those two hypothetical extremes. Typically, they intersect the diagonal line only at the very first and last points. Between those points, the curves are bow-shaped below the 45-degree line. The Lorenz curve of market income falls to the right and below the curve for after-tax income, reflecting its greater inequality. Both curves fall to the right and below the line of equality, reflecting the inequality in both market income and after-tax income."

The Gini coefficient is calculated as an area taken from the Lorenz curve. It was developed by the Italian statistician (and noted fascist thinker) Corrado Gini in a 1912 paper written in Italian (and to my knowledge not freely available on the web). The intuition is straightforward (although the mathematical formula will look a little messier). On a Lorenz curve, greater equality means that the line based on actual data is closer to the 45-degree line that shows a perfectly equal distribution. Greater inequality means that the line based on actual data will be more "bowed" away from the 45-degree line. The Gini coefficient is based on the area between the 45-degree line and the actual data line. As the CBO writes in its 2011 report:

"The Gini index is equal to twice the area between the 45-degree line and the Lorenz curve. Once again, the extreme cases of complete equality and complete inequality bound the measure. At one extreme, if income was evenly distributed and the Lorenz curve followed the 45-degree line, there would be no area between the curve and the line, so the Gini index would be zero. At the other extreme, if all income was in the highest income group, the area between the line and the curve would be equal to the entire area under the line, and the Gini index would equal one. The Gini index for [U.S.] after-tax income in 2007 was 0.489—about halfway between those two extremes."
To put it another way, the Lorenz curve plots the full range of data on the distribution of income. The Gini coefficient boils down that full range of data to a single number, which is why it's useful for comparisons. But because the Gini boils down the overall distribution of income to a single number, it also loses some detail. For example, if the Gini coefficient has risen, is this because the share going to the top 20% went up, or the top 10%, top 1%, or top 0.1%? You can see these kinds of differences on a Lorenz curve, if you know what you're looking for, but the Gini alone doesn't tell you which is true. 
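
To make the area definition and the loss of detail concrete, here is a small Python sketch. It computes the Gini as twice the area between the 45-degree line and a Lorenz curve drawn through straight-line segments (the trapezoid rule is my assumption about how to measure the area from a handful of data points). The two three-household "societies" are invented numbers, chosen so that their Gini values match while their top shares differ.

```python
from itertools import accumulate


def gini_from_lorenz(incomes):
    """Gini as twice the area between the 45-degree line and the Lorenz curve."""
    ordered = sorted(incomes)
    n = len(ordered)
    total = sum(ordered)
    shares = [0.0] + [c / total for c in accumulate(ordered)]
    # Trapezoid rule: each household is a slice of width 1/n on the horizontal axis.
    area_under_lorenz = sum((shares[i] + shares[i + 1]) / 2 for i in range(n)) / n
    area_between = 0.5 - area_under_lorenz   # the area under the 45-degree line is 1/2
    return 2 * area_between


def top_share(incomes):
    """Share of total income held by the single richest household."""
    return max(incomes) / sum(incomes)


# Two hypothetical societies (illustrative numbers only).
society_a = [10, 20, 30]
society_b = [12, 16, 32]
for name, incomes in (("A", society_a), ("B", society_b)):
    print(f"{name}: Gini = {gini_from_lorenz(incomes):.3f}, "
          f"top household share = {top_share(incomes):.1%}")
```

Both invented societies have a Gini of about 0.222, yet the richest household holds 50 percent of income in A and about 53 percent in B. The single number cannot distinguish them; the Lorenz curves can. One caveat with tiny samples: if one of n households holds everything, this calculation gives (n-1)/n rather than exactly one.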

So that's the graphical meaning of the Gini coefficient. But what is the intuitive meaning? I posted last week about an intriguing "Chartbook of Economic Inequality," written by Tony Atkinson and Salvatore Morelli. In their overview of why they use the statistics they use, they write:

"The [income] distribution is summarised in a single summary statistic, typically the Gini coefficient, which is not our preferred statistic but that most commonly published by statistical agencies. The explanation of the coefficient given by most agencies takes the form of geometry, but we prefer to describe it in terms of the mean difference. A Gini coefficient of G per cent means that, if we take any 2 households from the population at random, the expected difference is 2G per cent of the mean. So that a rise in the Gini coefficient from 30 to 40 per cent implies that the expected difference has gone up from 60 to 80 per cent of the mean."
Atkinson and Morelli add another way to interpret the Gini coefficient:
"Another useful way of thinking, suggested by Amartya Sen, is in terms of “distributionally adjusted” national income, which with the Gini coefficient is (100-G) per cent of national income. So that a rise in the Gini coefficient from 30 to 40 per cent is equivalent to reducing national income by 14 per cent (1/7)."
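
The arithmetic behind the 1/7 figure is quick to verify. In this small Python sketch, the function name adjusted_income and the choice to index national income at 100 are my own, for illustration only.

```python
def adjusted_income(national_income, gini_percent):
    """Distributionally adjusted income: (100 - G) per cent of national income."""
    return national_income * (100 - gini_percent) / 100


# National income indexed to 100 (an arbitrary illustrative scale).
before = adjusted_income(100.0, 30)    # Gini of 30 per cent
after = adjusted_income(100.0, 40)     # Gini of 40 per cent
drop = (before - after) / before       # proportional fall in adjusted income

print(f"adjusted income falls from {before} to {after}")
print(f"proportional drop = {drop:.1%}")
```

Adjusted income falls from 70 to 60, and 10 out of 70 is one-seventh, or roughly 14 per cent.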
Note: This post in part recycles some explanations of the Gini that appeared previously in this blog several years ago, but it seemed useful to put the discussion all in one place.