By a curious twist of fate, I came across data on salaries, overtime pay and bonuses in the public sector of the City by the Bay, San Francisco. The dataset covers public employees of all ranks, from the Mayor to the City Hall janitor.
Without losing a single minute, I dug into the numbers. There is nothing more exciting than looking at the wages of others, especially when you can justify it by doing data science! 🙂
This dataset covers not only top executives’ wages but also those of regular employees, including low-level staff. These data show fundamental patterns in income distribution that can easily be observed in real life. I would like to take this opportunity to invite all of you, armchair economists, to explore an exciting world of impressive numbers and stingy statistics.
We will talk about average and median pay, the Gini index of income inequality, the rich/poor income ratio, the Matthew Effect, which describes a steadily widening income gap, and career growth.
Let me explain what data I analyzed and where you can access it. This is real information about wages in the public sector of San Francisco, California (San Francisco ranked second among U.S. cities with the worst gap between rich and poor in 2015). The dataset includes the names, positions, wages, overtime pay, and bonuses of most public employees for four years, 2011 to 2014. The data is not standardized or clean, but it is good enough for our purposes. It was kindly provided by the State Administration as part of the Transparent California Project. All amounts are gross, before any deductions, in USD per calendar year.
I will skip the data preparation and analysis steps, as well as the plotting code. Please see the code on GitHub if you want the details. It is written in Python, using Jupyter, Pandas and Seaborn.
Now, let’s get started.
Transparent California’s dataset includes not only the base pay, but various benefits as well. To keep things organized, we will look at two figures: the base pay and the total pay, which includes all benefits.
Here is the base pay distribution for 4 years:
These four icicle-shaped figures represent the income distribution from 2011 to 2014. Base pay is on the vertical axis, and the probability density of the income distribution is on the horizontal axis. The dashed lines are the 1st (25th percentile), 2nd (50th percentile, or the median), and 3rd (75th percentile) quartiles of the income distribution. We can clearly see several thickenings, around $5,000, $65,000, $110,000, and $170,000. These correspond to four classes of government employees: seasonal workers, staff, highly skilled professionals, and top management. The icicle seems to shift slowly upward from one year to the next (left to right), which can be interpreted as either growing wealth or inflation.
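The dashed quartile lines can be reproduced with a few lines of Pandas. This is only a minimal sketch: the column names `Year` and `BasePay` are assumptions about how the table is laid out, and the coercion step assumes that unclean entries are non-numeric strings.

```python
import pandas as pd

def base_pay_quartiles(df: pd.DataFrame) -> pd.DataFrame:
    """Return the 25th, 50th and 75th percentiles of BasePay for each Year.

    Assumes hypothetical columns 'Year' and 'BasePay'; values that cannot
    be parsed as numbers are dropped.
    """
    pay = pd.to_numeric(df["BasePay"], errors="coerce")
    clean = df.assign(BasePay=pay).dropna(subset=["BasePay"])
    # One row per year, one column per quantile (0.25, 0.5, 0.75).
    return clean.groupby("Year")["BasePay"].quantile([0.25, 0.5, 0.75]).unstack()
```

The result has one row per year, so the quartiles of different years can be compared directly.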
Let’s take a look at the total pay distribution:
The data for 2011 is very different from the following years. It turned out that the 2011 data does not include employees’ benefits. It was also formatted differently, which greatly complicates its use in our analysis. Moreover, 2011 was the year of the Consolidated Municipal Election, and the wages of employees holding elective positions were calculated for a split year.
In the other years, benefits smoothed out the distribution, virtually eliminating the clear division between the four employee types. This means that in the public sector, a good staff member who works hard and receives benefits can earn as much as a highly skilled professional who takes it easy.
San Francisco has a reputation as an expensive city to live in. The official minimum wage comes to about $20,000 a year, while a real living wage starts at $40,000. Why, then, does Transparent California include so many workers with wages below the minimum? The answer is employment status: full-time (FT) or part-time (PT). Along with full-timers, the dataset contains information about part-timers and freelancers. The Status field appears in the dataset only in 2014.
Here is the total pay distribution split for full-timers and part-timers:
As you can see, the median pay of full-timers is about $130,000 per year.
Average and Median Wages
Let’s discuss how the average and median wages are related. This topic is often argued about on the Internet, and there is a common opinion that the average pay is much higher than the median because of top management wages.
Let’s examine the real facts.
At the top, you see an attempt to fit a bell curve to the wage distribution. The peak of the fitted curve corresponds to the average pay, which is $90,000. The bottom figure shows the quartiles of the wage distribution, and the midline inside the rectangle is the median pay, which is $85,000. As you can see, the average pay is indeed higher than the median, but the difference is small.
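To put a number on "how much higher", one can express the gap between the mean and the median relative to the median. The helper below is an illustrative sketch, not part of the original analysis; `mean_median_gap` is a hypothetical name.

```python
import numpy as np

def mean_median_gap(pay):
    """Return (mean, median, relative gap), where gap = (mean - median) / median."""
    pay = np.asarray(pay, dtype=float)
    mean = pay.mean()
    median = np.median(pay)
    return mean, median, (mean - median) / median
```

With the rounded figures quoted above, the gap is 5,000 / 85,000, or roughly 6% of the median.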
The most stirring income-related problem is equitable distribution (or, as one of Karl Marx’s slogans says, ‘from each according to his ability, to each according to his need’). Our wise ancestors left us metrics for the inequality of income distribution. The most popular ones are the Gini index and the R/P income ratio.
The Gini index is a statistical measure of the dispersion of a variable. In economics, it is used to measure how income is dispersed among population groups. The Gini index is the area between the diagonal line y = x and the Lorenz curve (green) divided by the total area of the triangle below the diagonal line (blue + green):
The Gini index ranges from 0 to 100, where 0 is complete equality (everything is blue), and 100 is perfect inequality, i.e. one person has all the income (everything is green). The Gini index is 45.0 for the USA, 47.3 for China, 42.0 for Russia, and 27.0 for Germany. Sweden has the lowest Gini index, 23.0, while the index rises to 60 or higher in African monarchies.
The graph above shows the Lorenz curve for full-time employees’ incomes. The Gini index there is 18.9, which corresponds to the Soviet times of equalization. You can interpret this as follows: if you manage to join the San Francisco City Hall staff, you will always have a decent salary. On the other hand, if you work in City Hall, there is not much room for career growth.
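The area-ratio definition of the Gini index translates almost directly into code. The sketch below assumes individual incomes are available as a plain array of non-negative numbers; the function names are illustrative and are not taken from the author's notebook. Since the triangle under the diagonal has area 0.5, the ratio simplifies to 1 minus twice the area under the Lorenz curve.

```python
import numpy as np

def lorenz_curve(incomes):
    """Return (population share, cumulative income share), both starting at 0."""
    x = np.sort(np.asarray(incomes, dtype=float))
    cum = np.cumsum(x)
    income_share = np.insert(cum / cum[-1], 0, 0.0)
    population_share = np.linspace(0.0, 1.0, len(income_share))
    return population_share, income_share

def gini_index(incomes):
    """Gini index on a 0-100 scale, computed from the Lorenz curve."""
    p, share = lorenz_curve(incomes)
    # Area under the Lorenz curve by the trapezoid rule.
    area_under = np.sum((share[1:] + share[:-1]) / 2 * np.diff(p))
    # (0.5 - area_under) / 0.5, scaled to percent.
    return 100 * (1 - 2 * area_under)
```

For a finite sample of n people the index cannot quite reach 100: when one person holds all the income, this estimate gives 100 · (n − 1) / n.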
Another interesting stratification indicator is the rich/poor income ratio. Let’s look at the dark-blue triangle at the top. It is 20% wide, which corresponds to the richest 20% in the dataset. The triangle is 31% tall. This means that the richest 20% receive 31% of all income.
Then, let’s look at the red triangle. It is 20% wide, corresponding to the poorest 20%, and 12% tall, i.e. the poorest 20% receive 12% of all income. The R/P 20% ratio is the income of the richest 20% divided by the income of the poorest 20%. For San Francisco public sector employees, R/P 20% is about 2.5. You can interpret this index as the ceiling for your career growth or social mobility.
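The R/P ratio can also be computed directly from the sorted incomes, without reading the shares off the graph. This is a hedged sketch: `rp_ratio` is a hypothetical helper, and rounding the group size to the nearest whole employee is my own choice.

```python
import numpy as np

def rp_ratio(incomes, share=0.2):
    """Total income of the richest `share` divided by that of the poorest `share`."""
    x = np.sort(np.asarray(incomes, dtype=float))
    k = int(round(len(x) * share))  # number of people in each group
    if k == 0:
        raise ValueError("sample too small for the requested share")
    return x[-k:].sum() / x[:k].sum()
```

The rounded shares read off the graph give 31 / 12, or about 2.6, which is in line with the quoted figure once rounding of the shares is taken into account.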
For comparison, let’s look at the Lorenz curve for part-timers:
There is a striking difference from the previous graph. The Lorenz curve bends notably inwards, and the green area is larger. The richest 20% area is almost twice as tall, while the poorest 20% area is practically invisible. The Gini index is 53.6, and R/P 20% is 45. This roughly corresponds to the poor countries of South America, with a pronounced stratification between wealthy capitalists and provincial workers.
As the Gospel of Matthew says, “To those who use well what they are given, even more will be given, and they will have an abundance. But from those who do nothing, even what little they have will be taken away.” (Mt 25:29). In other words, the gap between the rich and the poor keeps widening. Protosociologists have been observing this amusing phenomenon for thousands of years, and recently the broad masses have become aware of it as well. The Matthew Effect, as economists call it, implies that the only way to multiply your fortune and leave an inheritance to your children is to belong to the top of society. This idea is very unpleasant to think about, and it would be great if we could prove it wrong.
The figure below shows the total wage fund of the San Francisco government sector for 3 years. It grew from $3.70 billion in 2012 to $3.82 billion in 2014, with a total increase of 3.2%.
Let’s plot the average pay in the top and bottom deciles (the highest-paid and lowest-paid 10% of employees). You can see that the average pay in the top decile increased by 3.0%, while in the bottom decile it dropped by 12.6%.
This not only confirms the Matthew Effect, but also shows that it is profound and easily discernible to the naked eye. The Matthew Effect can be expected to have driven the strong growth of the Gini index in the U.S. over the past 30 years.
It is worth noting that because the Transparent California dataset lacks full-time/part-time labels in the earlier years, we estimated income deciles on a mixed sample of full-timers and freelancers. This could have significantly affected the result, and I am not fully confident in it, since such a large gap looks implausible.
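The decile comparison described above can be sketched as follows. This is an illustration under assumptions, not the exact procedure behind the figure: each decile is taken as the lowest or highest tenth of the sorted pay values for that year, and the function names are hypothetical.

```python
import pandas as pd

def decile_means(pay: pd.Series):
    """Return (mean of the bottom 10%, mean of the top 10%) of pay values."""
    pay = pay.dropna().sort_values()
    k = max(1, len(pay) // 10)  # number of employees in one decile
    return pay.iloc[:k].mean(), pay.iloc[-k:].mean()

def decile_change(pay_start: pd.Series, pay_end: pd.Series) -> dict:
    """Percentage change of the bottom and top decile means between two years."""
    bottom0, top0 = decile_means(pay_start)
    bottom1, top1 = decile_means(pay_end)
    return {
        "bottom": 100 * (bottom1 - bottom0) / bottom0,
        "top": 100 * (top1 - top0) / top0,
    }
```

Note that the deciles are recomputed for each year, so the people in the bottom decile of one year are not necessarily the same people as in another year.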
Since we are talking about career growth, let’s check whether it really exists or is all science fiction. I do not want to get into the details of the career ladder in the U.S. government sector, so let’s pretend that career progress comes down solely to salary raises. The Transparent California dataset is a listing of city employees, and, as expected, names repeat from year to year.
Let’s take the employees who appear in both 2012 and 2014, calculate their income growth as a percentage, and plot its distribution per year:
On the left is the probability density of income growth; on the right is the distribution of income growth.
For 20% of employees, income growth ranges from 0 to 2%, which roughly covers inflation. The most common outcome, in 50% of cases, is growth of 2% to 5%; that is the mode of the career growth rate. About 15% achieved 5-10% growth, which can be considered a high rate. No more than 5% showed outstanding income growth of over 10% per year. It is worth noting that 10% showed negative income growth, i.e. their pay dropped year over year.
To build this graph, I had to apply coarse filters to cut off a long tail formed by employees who switched from freelancing to full-time municipal employment, thereby increasing their income 50-100 times. These filters could significantly affect the distribution.
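The matching step can be sketched roughly as below. Everything specific here is an assumption made for illustration: the column names (`EmployeeName`, `Year`, `TotalPay`), the rule of keeping only names that are unique within a year, the compound annual growth formula, and the `max_ratio` cutoff, which merely stands in for the unspecified filters mentioned above.

```python
import pandas as pd

def annual_growth(df, start=2012, end=2014,
                  name_col="EmployeeName", pay_col="TotalPay", max_ratio=2.0):
    """Per-year income growth (%) for employees present in both years.

    Assumes pay_col is numeric. Names that occur more than once in a year
    are skipped, because they cannot be matched unambiguously.
    """
    def unique_pay(year):
        rows = df[(df["Year"] == year) & (df[pay_col] > 0)]
        rows = rows.drop_duplicates(subset=name_col, keep=False)
        return rows.set_index(name_col)[pay_col]

    joined = pd.concat([unique_pay(start), unique_pay(end)],
                       axis=1, keys=["start", "end"], join="inner")
    ratio = joined["end"] / joined["start"]
    # Hypothetical cutoff: drop extreme jumps in either direction.
    ratio = ratio[(ratio < max_ratio) & (ratio > 1 / max_ratio)]
    # Compound annual growth rate over the (end - start) years.
    return (ratio ** (1 / (end - start)) - 1) * 100
```

The returned series is indexed by name, so it can be passed straight to a histogram or density plot; changing `max_ratio` shows how sensitive the shape of the distribution is to the cutoff.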
It turned out that even a limited example such as the San Francisco public sector reveals a wide range of economic and social patterns.
Among them, I can mention the following:
- Salary ranges in the public sector do not vary widely;
- A highly skilled professional can earn no less than an average manager;
- The median pay is close to the average pay;
- The gap between rich and poor continues to grow;
- Even a static environment in the public sector offers opportunities for career growth.
However, this is just one economic sector in one city that is very atypical by itself. I do not recommend drawing any far-reaching conclusions about statistics in the U.S. or, especially, in the world based on this article.