Data Science, Economics, Falsifiable

The Scale of Inequality

When dealing with questions of inequality, I often get boggled by the sheer size of the numbers. People aren’t very good at intuitively parsing the difference between a million and a billion. Our brains round both to “very large”. I’m actually in a position where I get reminded of this fairly often, as the difference can become stark when programming. Running a program on a million points of data takes scant seconds. Running the same set of operations on a billion data points can take more than an hour. A million seconds is eleven and a half days. A billion seconds is 31 years.
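
If you want to check that arithmetic yourself, it takes only a couple of lines of Python (the conversion factors are the only inputs):

```python
# Rough conversions for the million vs. billion comparison above.
SECONDS_PER_DAY = 60 * 60 * 24              # 86,400
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365.25

print(1_000_000 / SECONDS_PER_DAY)          # ~11.6 days
print(1_000_000_000 / SECONDS_PER_YEAR)     # ~31.7 years
```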

Here I would like to try to give a sense of the relative scale of various concepts in inequality. Just how much wealth do the wealthiest people in the world possess compared to the rest? How much of the world’s middle class is concentrated in just a few wealthy nations? How long might it take developing nations to catch up with developed nations? How long before there exists enough wealth in the world that everyone could be rich if we just distributed it more fairly?

According to the Forbes billionaire list, there are (as of the time of writing) 2,208 billionaires in the world, who collectively control $9.1 trillion in wealth (1 trillion seconds ago was the year 29691 BCE, contemporaneous with the oldest cave paintings in Europe). This is 3.25% of the total global wealth of $280 trillion.

The US Federal Budget for 2019 is $4.4 trillion. State governments and local governments each spend another $1.9 trillion. Some $700 billion is given to those governments by the Federal government. With that subtracted, total US government spending is projected to be $7.5 trillion next year.

Therefore, the whole world population of billionaires holds assets equivalent to 1.2 years of US government outlays. Note that US government outlays aren’t equivalent to that money being destroyed. It goes to pay salaries or buy equipment. The comparison here is simply to illustrate how private wealth stacks up against the budgets that governments control.

If we go down by a factor of 1000, there are about 15 million millionaires in the world (according to Wikipedia). Millionaires collectively hold $37.1 trillion (13.25% of all global wealth). All of the wealth that millionaires hold would be enough to fund US government spending for five years.

When we see sensational headlines, like “Richest 1% now owns half the world’s wealth”, we tend to think that we’re talking about millionaires and billionaires. In fact, millionaires and billionaires only own about 16.5% of the world’s wealth (which is still a lot for 0.2% of the world’s population to hold). The rest is owned by less wealthy individuals. The global 1% makes $32,400 a year or more, which is virtually identical to the median American yearly salary. This means that almost half of Americans are in the global 1%. Canadians now have a similar median wage, which means a similar proportion of them are in the global 1%.

To give a sense of how this distorts the global middle class, I used Povcal.net, the World Bank’s online tool for poverty measurement. I looked for the percentage of each country’s population making between 75% and 125% of the median US income at purchasing power parity (which takes into account cheaper goods and services in developing countries). That works out to $64-$107 US per day, which is what you get when you divide 75% and 125% of the median US yearly income by 365 (as far as I can tell, this is the same procedure that gives us figures like $1.25 per day as the threshold for absolute poverty).
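
Here’s a quick sketch of that arithmetic. The median income figure is my own approximation of the number used above, so treat it as illustrative:

```python
# Deriving the $64-$107/day band from the median US yearly income.
median_us_income = 31_100  # USD per year; an approximate figure, close to the one used above

lower = 0.75 * median_us_income / 365
upper = 1.25 * median_us_income / 365
print(f"${lower:.0f} to ${upper:.0f} per day")  # roughly $64 to $107
```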

I grabbed what I thought would be an interesting set of countries: The G8, BRICS, The Next 11, Australia, Botswana, Chile, Spain, and Ukraine. These 28 countries had – in the years surveyed – a combined population of 5.3 billion people and had among them the 17 largest economies in the world (in nominal terms). You can see my spreadsheet collecting this data here.

The United States had by far the largest estimated middle class (73 million people), followed by Germany (17 million), Japan (12 million), France (12 million), and the United Kingdom (10 million). Canada came next with 8 million, beating most larger countries, including Brazil, Italy, Korea, Spain, Russia, China, and India. Iran and Mexico have broadly similar middle-class sizes, despite Mexico being substantially larger. Botswana ended up having a larger middle class than Ukraine.

This speaks to a couple of problems when looking at inequality. First, living standards (and therefore class distinctions) vary enormously from country to country. A standard of living that is considered middle class in North America might not be considered middle class in Europe or Japan. In fact, I’ve frequently heard it said that the North American middle class (particularly Americans and Canadians) consume more than their equivalents in Europe. Therefore, this should be read as a comparison of the North American-equivalent middle class – a group which, as I’ve already said, is about 50% encompassed by the global 1%.

Second, we tend to think of countries in Europe as generally wealthier than countries in Africa. This isn’t necessarily true. Botswana’s GDP per capita is actually three times larger than Ukraine’s when unadjusted and more than twice as large at purchasing power parity (which takes into account price differences between countries). It also has a higher GDP per capita than Serbia, Albania, and Moldova (even at purchasing power parity). Botswana, Seychelles, and Gabon have per capita GDPs at purchasing power parity that aren’t dissimilar from those possessed by some less developed European countries.

Botswana, Gabon, and Seychelles have all been distinguished by relatively high rates of growth since decolonization, which has by now made them “middle income” countries. Botswana’s growth has been so powerful and sustained that in my spreadsheet, it has a marginally larger North American-equivalent middle class than Nigeria, a country with approximately 80 times its population.

Of all the listed countries, Canada had the largest middle class as a percent of its population. This no doubt comes partially from using North American middle-class standards (and perhaps also from the omission of the small, homogeneous Nordic countries), but it is also notable that Canada has the highest median income of major countries (possibly tied with the United States) and the highest 40th percentile income. America dominates income for people in the 60th percentile and above, while Norway comes out ahead for people in the 30th percentile or below.

The total population of the (North American equivalent) middle class in these 28 countries was 170 million, which represents about 3% of their combined population.

There is a staggering difference in consumption between wealthy countries and poor countries, driven in part by the staggering difference in the size of their middle (and higher) classes – people with income to spend on things beyond immediate survival. According to Trading Economics, the total disposable income of China is $7.84 trillion (dollars are US). India has $2.53 trillion. Canada, with a population almost 40 times smaller than either, has a total disposable income of $0.96 trillion, while America, with a population about four times smaller than either China or India, has a disposable income of $14.79 trillion, larger than China and India put together. If the average Chinese person held as much wealth as the average Canadian, China alone would hold almost $300 trillion, approximately equivalent to the total amount of wealth currently in the world.
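
To make the gap concrete, here’s a rough per-capita version of that comparison. The disposable income totals are the Trading Economics figures quoted above; the population numbers are my own approximations:

```python
# Approximate disposable income per person, using the totals quoted above.
disposable_income = {  # USD per year
    "China": 7.84e12,
    "India": 2.53e12,
    "Canada": 0.96e12,
    "United States": 14.79e12,
}
population = {  # rough late-2010s populations (my approximations)
    "China": 1.39e9,
    "India": 1.34e9,
    "Canada": 3.7e7,
    "United States": 3.26e8,
}

for country, total in disposable_income.items():
    print(f"{country}: ~${total / population[country]:,.0f} per person per year")
```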

According to Wikipedia, The Central African Republic has the world’s lowest GDP per capita at purchasing power parity, making it a good candidate for the title of “world’s poorest country”. Using Povcal, I was able to estimate the median wage at $1.33 per day (or $485 US per year). If the Central African Republic grew at the same rate as Botswana did post-independence (approximately 8% year on year) starting in 2008 (the last year for which I had data) and these gains were seen in the median wage, it would take until 2139 for it to attain the same median wage as the US currently enjoys. This of course ignores development aid, which could speed up the process.

All of the wealth currently in the world is equivalent to $36,000 per person (although this is misleading, because much of the world’s wealth is illiquid – it’s in houses and factories and cars). All of the wealth currently on the TSX is equivalent to about $60,000 per Canadian. All of the wealth currently on the NYSE is equivalent to about $65,000 per American. In corporate shares alone, Canada and the US are almost twice as wealthy as the global average. This doesn’t even get into the cars, houses, and other resources that people own in those countries.

If total global wealth were to grow at the same rate as the market, we might expect to have approximately $1,000,000 per person (not inflation adjusted) sometime between 2066 and 2072, depending on population growth. If we factor in inflation and want there to be approximately $1,000,000 per person in present dollars, it will instead take until sometime between 2102 and 2111.
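
Below is a sketch of the kind of projection behind those dates. The growth, inflation, and population growth rates are assumptions I’m supplying for illustration; they roughly reproduce the ranges above but aren’t the exact figures used:

```python
# When does wealth per person cross $1,000,000, given compound growth?
import math

START_YEAR = 2018
WEALTH_PER_PERSON = 36_000  # from the estimate above
TARGET = 1_000_000

def year_reached(asset_growth, inflation=0.0, population_growth=0.01):
    # Per-capita growth: asset growth, discounted by inflation and population growth.
    per_capita = (1 + asset_growth) / ((1 + inflation) * (1 + population_growth)) - 1
    years = math.log(TARGET / WEALTH_PER_PERSON) / math.log(1 + per_capita)
    return START_YEAR + years

print(round(year_reached(0.08)))                  # nominal: lands around 2068
print(round(year_reached(0.08, inflation=0.03)))  # in present dollars: lands around 2106
```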

This assumes too much, of course. But it gives you a sense of how much we have right now and how long it will take to have – as some people incorrectly believe we already do – enough that everyone could (in a fair world) have so much they might never need to work.

This is not, of course, to say that things are fair today. It remains true that the median Canadian or American makes more money every year than 99% of the world, and that the wealth possessed by those median Canadians or Americans and those above them is equivalent to that held by the bottom 50% of the world. Many of us, very many of those reading this perhaps, are the 1%.

That’s the reality of inequality.

Data Science, Economics, Falsifiable

Is Google Putting Money In Your Pocket?

The Cambridge Analytica scandal has put tech companies front and centre. If the thinkpieces along the lines of “are the big tech companies good or bad for society” were coming out any faster, I might have to doubt even Google’s ability to make sense of them all.

This isn’t another one of those thinkpieces. Instead it’s an attempt at an analysis. I want to understand in monetary terms how much one tech company – Google – puts into or takes out of everyone’s pockets. This analysis is going to act as a template for some of the more detailed analyses of inequality I’d like to do later, so if you have a comment about methodology, I’m eager to hear it.

Here are the basics: Google is a large technology company that primarily makes money off of ad revenues. Since Google is a publicly traded company, statistics are easy to come by. In 2016, Google brought in $89.5 billion in revenue and about 89% of that was from advertising. Advertising revenue is further broken down between advertising on Google sites (e.g. Google Search, Gmail, YouTube, Google Maps, etc.), which accounts for 80% of advertising revenue, and advertising on partner sites, which covers the remainder. The remaining 11% of total revenue comes from a variety of smaller lines of business – selling corporate licenses of its GSuite office software, the Google Play Store, the Google Cloud Computing Platform, and several smaller projects.

There are two ways that we can track how Google’s existence helps or hurts you financially. First, there’s the value of the software it provides. Google’s search has become so important to our daily life that we don’t even notice it anymore – it’s like breathing. Then there’s YouTube, which has more high-quality content than anyone could watch in a lifetime. There’s Google Docs, which are almost a full (free!) replacement for Microsoft Office. There’s Gmail, which is how basically everyone I know does their email. And there’s Android, currently the only viable alternative to iOS. If you had to pay for all of this stuff, how much would you be out?

Second, we can look at how its advertising arm has changed the prices of everything we buy. If Google’s advertising system has driven an increase in spending on advertising (perhaps by starting an arms race in advertising, or by arming marketing managers with graphs, charts and metrics that they can use to trigger increased spending), then we’re all ultimately paying for Google’s software with higher prices elsewhere (we could also be paying with worse products at the same prices, as advertising takes budget that would otherwise be used on quality). On the other hand, if more targeted advertising has led to less advertising overall, then everything will be slightly less expensive (or higher quality) than the counterfactual world in which more was spent on advertising.

Once we add this all up, we’ll have some sort of answer. We’ll know if Google has made us better off, made us poorer, or if it’s been neutral. This doesn’t speak to any social benefits that Google may provide (if they exist – and one should hope they do exist if Google isn’t helping us out financially).

To estimate the value of the software Google provides, we should compare it to the most popular paid alternatives – and look into the existence of any other good free alternatives. Search has no real paid equivalent to compare against, so we can’t put a price on it directly; given how much value it clearly provides, let’s agree to break any ties in favour of Google helping us.

On the other hand, Google Docs is very easy to compare with other consumer alternatives. Microsoft Office Home Edition costs $109 yearly. WordPerfect (not that anyone uses it anymore) costs $259.99 (all prices should be assumed to be in Canadian dollars unless otherwise noted).

Free alternatives exist in the form of OpenOffice and LibreOffice, but both tend to suffer from bugs. Last time I tried to make a presentation in OpenOffice I found it crashed approximately once per slide. I had a similar experience with LibreOffice. I once installed it for a friend who was looking to save money and promptly found myself fixing problems with it whenever I visited his house.

My crude estimate is that I’d expect to spend four hours troubleshooting either free alternative per year. Weighing this time at Ontario’s minimum wage of $14/hour, and accepting that the only office suite anyone under 70 ever actually buys is Microsoft’s offering, we see that Google saves you $109 per year compared to Microsoft and $56 per year compared to using the free software.

With respect to email, there are numerous free alternatives to Gmail (like Microsoft’s Hotmail). In addition, many internet service providers bundle free email addresses in with their service. Taking all this into account, Gmail probably doesn’t provide much in the way of direct monetary value to consumers, compared to its competitors.

Google Maps is in a similar position. There are several alternatives that are also free, like Apple Maps, Waze (also owned by Google), Bing Maps, and even the Open Street Map project. Even if you believe that Google Maps provides more value than these alternatives, it’s hard to quantify it. What’s clear is that Google Maps isn’t so far ahead of the pack that there’s no point to using anything else. The prevalence of Google Maps might even be because of user laziness (or anticompetitive behaviour by Google). I’m not confident it’s better than everything else, because I’ve rarely used anything else.

Android is the last Google project worth analyzing and it’s an interesting one. On one hand, it looks like Apple phones tend to cost more than comparable Android phones. On the other hand, Apple is a luxury brand and it’s hard to tell how much of the added price you pay for an iPhone is attributable to that, to differing software, or to differing hardware. Comparing a few recent phones, there’s something like a $50-$200 gap between flagship Android phones and iPhones of the same generation. I’m going to assign a plausible-sounding $20 cost saved per phone from using Android, then multiply this by the US Android market share (53%), to get $11 for the average consumer. The error bars are obviously rather large on this calculation.

(There may also be second order effects from increased competition here; the presence of Android could force Apple to develop more features or lower its prices slightly. This is very hard to calculate, so I’m not going to try to.)

When we add this up, we see that Google Docs saves anyone who does word processing $56-$109 per year and Android saves the average phone buyer about $11 every two years. This means the average person probably sees some slight yearly financial benefit from Google, although I’m not sure the median person does. The median person and the average person do both get some benefit from Google Search, so there’s something in the plus column here, even if it’s hard to quantify.

Now, on to advertising.

I’ve managed to find an assortment of sources that give a view of total advertising spending in the United States over time, as well as changes in the GDP and inflation. I’ve compiled it all in a spreadsheet with the sources listed at the bottom. Don’t just take my word for it – you can see the data yourself. Overlapping this, I’ve found data for Google’s revenue during its meteoric rise – from $19 million in 2001 to $110 billion in 2017.

Google ad revenue represented 0.03% of US advertising spending in 2002. By 2012, a mere 10 years later, it was equivalent to 14.7% of the total. Over that same time, overall advertising spending increased from $237 billion in 2002 to $297 billion in 2012 (2012 is the last year for which I have data on total advertising spending). Note however that this isn’t a true comparison, because some Google revenue comes from outside of America. I wasn’t able to find revenue broken down in greater depth than this, so I’m using these numbers in an illustrative manner, not an exact manner.

So, does this mean that Google’s growth drove a growth in advertising spending? Probably not. As the economy is normally growing and changing, the absolute amount of advertising spending is less important than advertising spending compared to the rest of the economy. Here we actually see the opposite of what a naïve reading of the numbers would suggest. Advertising spending grew more slowly than economic growth from 2002 to 2012. In 2002, it was 2.3% of the US economy. By 2012, it was 1.9%.

This also isn’t evidence that Google (and other targeted advertising platforms) have decreased spending on advertising. Historically, advertising has represented between 1.2% of US GDP (in 1944, with the Second World War dominating the economy) and 3.0% (in 1922, during the “roaring 20s”). Since 1972, the total has been more stable, varying between 1.7% and 2.5%. A Student’s t-test finds no significant difference between post-Google levels of spending and historical levels (p-values around 0.35, both for 1919-2002 vs. 2003-2012 and for 1972-2002 vs. 2003-2012).
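
Here’s roughly what that test looks like in code. I don’t have the underlying spreadsheet handy in this form, so the file and column names below are placeholders; the grouping (pre- vs. post-2003) mirrors the comparison described above:

```python
# Comparing advertising-as-a-share-of-GDP before and after Google's rise.
import pandas as pd
from scipy import stats

# Hypothetical file with columns "year" and "ad_share_of_gdp".
df = pd.read_csv("us_ad_spending.csv")

baseline = df[(df["year"] >= 1972) & (df["year"] <= 2002)]["ad_share_of_gdp"]
post_google = df[(df["year"] >= 2003) & (df["year"] <= 2012)]["ad_share_of_gdp"]

# Student's t-test (scipy's default): is the post-Google period
# distinguishable from the historical baseline?
t_stat, p_value = stats.ttest_ind(baseline, post_google)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```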

Even if this were lower than historical bounds, it wouldn’t necessarily prove that Google (and its ilk) are causing reduced ad spending. It could be that trends would have driven advertising spending even lower, absent Google’s rise. All we can say for sure is that Google hasn’t caused an ahistorically large change in advertising rates. In fact, the only things that stand out in the advertising trends are the peak in the early 1920s that has never been recaptured and a uniquely low dip in the 1940s that was clearly caused by World War II. For all that people talk about tech disrupting advertising and ad-supported businesses, these current changes are still less drastic than changes we’ve seen in the past.

The change in advertising spending during the years Google was growing could be driven by Google and similar advertising services. But it could also be normal year-to-year variation, driven by trends similar to those that have driven it in the past. If I had a Ph.D. in advertising history, I might be able to tell you what those trends are, but from my present position, all I can say is that the current movement doesn’t seem that weird from a historical perspective.

In summary, it looks like the expected value for the average person from Google products is close to $0, but leaning towards positive. It’s likely to be positive for you personally if you need a word processor or use Android phones, but the error bounds on advertising mean that it’s hard to tell. Furthermore, we can confidently say that the current disruption in the advertising space is probably less severe than the historical disruption to the field during World War II. There’s also a chance that more targeted advertising has led to less advertising spending (and this does feel more likely than it leading to more spending), but the historical variations in data are large enough that we can’t say for sure.

Data Science, Literature, Model

Two Ideas Worth Sharing From ‘Weapons of Math Destruction’

Recently, I talked about what I didn’t like in Dr. Cathy O’Neil’s book, Weapons of Math Destruction. This time around, I’d like to mention two parts of it I really liked. I wish Dr. O’Neil had put more effort into naming the concepts she covered; I don’t have names for them from WMD, but in my head, I’ve been calling them Hidden Value Encodings and Axiomatic Judgements.

Hidden Value Encodings

Dr. O’Neil opens the book with a description of the model she uses to cook for her family. After going into a lot of detail about it, she makes this excellent observation:

Here we see that models, despite their reputation for impartiality, reflect goals and ideology. When I removed the possibility of eating Pop-Tarts at every meal, I was imposing my ideology on the meals model. It’s something we do without a second thought. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.

It is far too easy to view models as entirely empirical, as math made form and therefore blind to value judgements. But that couldn’t be further from the truth. It’s value judgements all the way down.

Imagine a model that tries to determine when a credit card transaction is fraudulent. Fraudulent credit card transactions cost the credit card company money, because it must refund the stolen amount to the customer. Incorrectly flagging legitimate transactions as fraudulent also costs the company money, either through customer support time, or because a customer who gets fed up by constant false positives switches to a different credit card provider.

If you were tasked by one of the major credit card companies with building a model to predict which transactions were fraudulent, you would probably build into your model a variable cost for failing to catch fraudulent transactions (equivalent to the cost the company must bear if the transaction is fraudulent) and a fixed cost for labelling innocuous transactions as fraudulent (equivalent to the average cost of a customer support call, plus the chance of a false positive pushing someone over the edge into switching cards multiplied by the cost of their lost business over the next few years).

From this encoding, we can already see that our model would want to automatically approve all transactions below the fixed cost of dealing with false positives [1], while applying increasing scrutiny to more expensive items, especially expensive items with big resale value or items more expensive than the cardholder normally buys (as both of these point strongly toward fraud).
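
A minimal sketch of that decision rule, with a made-up false positive cost (nothing here is any real bank’s system):

```python
# Flag a transaction only when the expected fraud loss outweighs the
# fixed cost of a false positive (support call plus some churn risk).
FALSE_POSITIVE_COST = 15.0  # dollars; an assumed figure

def should_flag(fraud_probability: float, amount: float) -> bool:
    expected_fraud_loss = fraud_probability * amount
    return expected_fraud_loss > FALSE_POSITIVE_COST

print(should_flag(0.90, 5.00))     # False: small transactions aren't worth flagging
print(should_flag(0.02, 2000.00))  # True: big-ticket items get flagged at modest probabilities
```

As footnote [1] notes, real systems complicate this first approximation, since certain patterns of small transactions are themselves a fraud signal.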

This seems innocuous and logical. It is also encoding at least two sets of values. First, it encodes the values associated with capitalism. At the most basic level, this algorithm “believes” that profit is good and losses are bad. It aims to maximize profit for the bank, and while we may hold this as a default assumption for most algorithms associated with companies, that does not mean it is devoid of values; instead, it encodes all of the values associated with capitalism [2]. Second, the algorithm encodes some notion that customers have the freedom to choose between alternatives (even more so than is encoded by default in accepting capitalism).

By applying a cost to false positives (and likely it would be a cost that rises with each previous false positive), you are tacitly acknowledging that customers could take their business elsewhere. If customers instead had no freedom to choose who they did business with, you could merely encode as your loss from false positives the fixed cost of fielding support calls. Since outsourced phone support is very cheap, your algorithm would care much less about false positives if there was no consumer choice.

As far as I can tell, there is no “value-free” place to stand. An algorithm in the service of a hospital that helps diagnose patients or focus resources on the most ill encodes the value that “it is better to be healthy than sick; better to be alive than dead”. These values might be (almost-)universal, but they still exist, they are still encoded, and they still deserve to be interrogated when we put functions of our society in the hands of software governed by them.

Axiomatic Judgements

One of the most annoying parts of being a child is the occasional requirement to accept an imposition on your time or preferences with the explanation “because I say so”. “Because I say so” isn’t an argument; it’s a request that you acknowledge adults’ overwhelming physical, earning, and social power as giving them a right to set arbitrary rules for you. Some algorithms, forced onto unwelcoming and less powerful populations (teachers, job-seekers, etc.), have adopted this MO as well. Instead of having to prove that they have beneficial effects or that their outputs are legitimate, they define things such that their outputs are always correct and brook no criticism.

Here’s Dr. O’Neil talking about a value-added teaching model in Washington, D.C.:

When Mathematica’s scoring system tags Sarah Wysocki and 205 other teachers as failures, the district fires them. But how does it ever learn if it was right? It doesn’t. The system itself has determined that they were failures, and that is how they are viewed. Two hundred and six “bad” teachers are gone. That fact alone appears to demonstrate how effective the value-added model is. It is cleansing the district of underperforming teachers. Instead of searching for the truth, the score comes to embody it.

She contrasts this with how Amazon operates: “if Amazon.com, through a faulty correlation, started recommending lawn care books to teenage girls, the clicks would plummet, and the algorithm would be tweaked until it got it right.” On the other hand, the teacher rating algorithm doesn’t update, doesn’t check whether it is firing good teachers, and doesn’t take an accounting of its own costs. It holds it as axiomatic – a basic fact beyond questioning – that its results are the right results.

I am in full agreement with Dr. O’Neil’s criticism here. Not only does it push past the bounds of fairness to make important decisions, like hiring and firing, through opaque formulae that are neither explained to those being judged nor subject to basic accountability, but doing so is also a professional black mark on all of the statisticians involved.

Whenever you train a model, you hold some data back. This is your test data and you will use it to assess how well your model did. That gets you through to “production” – to having your model out in the field. This is an exciting milestone, not only because your model is now making decisions and (hopefully) making them well, but because now you’ll have way more data. You can see how your new fraud detection algorithm does by the volume of payouts and customer support calls. You can see how your new leak detection algorithm does by customers replying to your emails and telling you if you got it right or not.
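
For readers who haven’t done this before, the hold-out step is a one-liner in most libraries. A minimal sketch with scikit-learn and synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data; in practice X and y would be your real features and labels.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# Hold 20% of the data back; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The held-out set gives a first estimate of real-world performance,
# before any production feedback exists.
print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```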

A friend of mine who worked in FinTech once told me that they approved 1.5% of everyone who applied for their financial product, no matter what. They’d keep the score their model gave to that person on record, then see how the person fared in reality. If they used the product responsibly despite a low score, or used it recklessly despite a high score, it was viewed as valuable information that helped the team make their model that much better. I can imagine a team of data scientists, heads together around a monitor, looking through features and asking each other “huh, do any of you see what we missed here?” and it’s a pleasant image [3].
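
Here’s a sketch of that feedback trick, with everything except the 1.5% figure made up for illustration:

```python
# Approve a small random slice of applicants regardless of score, so the
# model's mistakes can be observed and corrected later.
import random

EXPLORATION_RATE = 0.015  # the 1.5% from the anecdote above
SCORE_CUTOFF = 0.7        # assumed approval threshold

def decide(score: float) -> dict:
    exploration = random.random() < EXPLORATION_RATE
    approved = exploration or score >= SCORE_CUTOFF
    # Keep the score on record either way, so real outcomes can later be
    # compared against what the model predicted.
    return {"score": score, "approved": approved, "exploration": exploration}

print(decide(0.42))
print(decide(0.91))
```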

Value-added teaching models and psychological pre-screens for hiring do nothing of the sort (even though it would be trivial for them to!). They give results and those results are defined as the ground truth. There’s no room for messy reality to work its way back into the cycle. There’s no room for the creators to learn. The algorithm will be flawed and imperfect, like all products of human hands. That is inevitable. But it will be far less perfect than it could be. Absent feedback, it is doomed to always be flawed, in ways both subtle and gross, and in ways unknown to its creators and victims.

Like most Canadian engineering students, I made a solemn vow:

…in the presence of these my betters and my equals in my calling, [I] bind myself upon my honour and cold iron, that, to the best of my knowledge and power, I will not henceforward suffer or pass, or be privy to the passing of, bad workmanship or faulty material in aught that concerns my works before mankind as an engineer…

Sloppy work, like that value-added teacher model, is the very definition of bad workmanship. Would that I never suffer something like that to leave my hands and take life in the world! It is no Quebec Bridge collapse, but the value-added teaching model and other doomed-to-fail algorithms like it represent a slow-motion accident, steadily stealing jobs and happiness from people, with no appeal or remorse.

I can accept stains on the honour of my chosen profession. Those are inevitable. But in a way, stains on our competence are so much worse. Models that take in no feedback are both, but the second really stings me.

Footnotes

[1] This first approximation isn’t correct in practice, because certain patterns of small transactions are consistent with fraud. I found this out the hard way, when a certain Bitcoin exchange’s credit card verification procedure (withdrawing less than a dollar, then refunding it a few days later, after you tell them how much they withdrew) triggered the fraud detection software at my bank. Apparently credit card thieves will often do a similar thing (minus the whole “ask the cardholder how much was withdrawn” step), as a means of checking if the card is good without cluing in the cardholder. ^

[2] I don’t mean this as a criticism of capitalism. I seek merely to point out that (like all other economic systems) capitalism is neither value-neutral nor inevitable. “Capitalism” encodes values like “people are largely rational”, “people often act to maximize their gains” and “choice is fundamentally good and useful”. ^

If socialist banks had ever made it to the point of deploying algorithms (instead of collapsing under the weight of their flawed economic system), those algorithms would also encode values (like “people will work hard for the good of the whole” and “people are inherently altruistic” and “it is worth it to sacrifice efficiency in the name of fairness”).

[3] Dulce et decorum est… get the fucking data science right. ^

Data Science, Literature, Model

Two Fallacies From ‘Weapons of Math Destruction’

Much thanks to Cody Wild for providing editing and feedback. That said, I would like to remind my readers that I deserve full credit for all errors and that all opinions expressed here are only guaranteed to be mine.

[12 minute read]

I recently read Weapons of Math Destruction by Dr. Cathy O’Neil and found it an enormously frustrating book. It’s not that the whole book was rubbish – that would have made things easy. No, the real problem with this book is that the crap and the pearls were so closely mixed that I had to stare at every sentence very, very carefully in hopes of figuring out which was which. There’s some good stuff in here. But much of Dr. O’Neil’s argumentation relies on two new (to me) fallacies. It’s these fallacies (which I’ve dubbed the Ought-Is Fallacy and the Availability Bait-and-Switch) that I want to explore today.

Ought-Is Fallacy

It’s a commonly repeated truism that “correlation doesn’t imply causation”. People who’ve been around the statistics block a bit longer might echo Randall Munroe and retort that “correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there'”. Understanding why a graph like this:

[Figure: a scatterplot of gun ownership vs. mass shootings by country. Image copyright The New York Times, 2017; used here for purposes of commentary and criticism. In addition to this graph obviously being anchored, using it is obviously fair use.]

is utter horsecrap [1], despite how suggestive it looks, is the work of a decent education in statistics. Here correlation doesn’t imply causation. On the other hand, it’s not hard to find excellent examples where correlation really does mean causation:

[Figure: source, the US National Centers for Environmental Administration. This would be a risky graph to use if echo chambers didn’t mean that I know literally no one who doesn’t believe in global warming. Having to spell “centre” wrong and use inferior units is a small price to pay for the fact that the American government immediately releases everything it creates into the public domain.]

When trying to understand the ground truth, it’s important that you don’t confuse correlation with causation. But not every human endeavour is aimed at determining the ground truth. Some endeavours really do just need to understand which activities and results are correlated. Principal among these is insurance.

Let’s say I wanted to sell you “punched in the face” insurance. You’d pay a small premium every month and if you were ever punched in the face hard enough to require dental work, I’d pay you enough to cover it [2]. I’d probably charge you more if you were male, because men are much, much more likely to be seriously injured in an assault than women are.

I’m just interested in pricing my product. It doesn’t actually matter if being a man is causal of more assaults or just correlated with it. It doesn’t matter if men aren’t inherently more likely to assault and be assaulted compared to women (for a biological definition of “inherently”). It doesn’t matter what assault rates would be like in a society without toxic masculinity. One thing and one thing alone matters: on average, I will have to pay out more often for men. Therefore, I charge men more.

If you were to claim that, because there may be nothing inherent in maleness that causes assault and being assaulted, men shouldn’t have to pay more, you would be making a moral argument, not an empirical one. You would also be committing the ought-is fallacy. Just because your beliefs tell you that some aspect of the world should be a certain way, or that it would be more moral for the world to be a certain way, does not mean the world actually is that way or that everyone must agree to order the world as if that were true.

This doesn’t prevent you from making a moral argument that we should ignore certain correlates in certain cases in the interest of fairness; it merely means that you should not make an empirical argument about what is ultimately a question of values.

The ought-is fallacy came up literally whenever Weapons of Math Destruction talked about insurance, as well as when it talked about sentencing disparities. Here’s one example:

But as the questions continue, delving deeper into the person’s life, it’s easy to imagine how inmates from a privileged background would answer one way and those from tough inner-city streets another. Ask a criminal who grew up in comfortable suburbs about “the first time you were ever involved with the police,” and he might not have a single incident to report other than the one that brought him to prison. Young black males, by contrast, are likely to have been stopped by police dozens of times, even when they’ve done nothing wrong. A 2013 study by the New York Civil Liberties Union found that while black and Latino males between the ages of fourteen and twenty-four made up only 4.7 percent of the city’s population, they accounted for 40.6 percent of the stop-and-frisk checks by police. More than 90 percent of those stopped were innocent. Some of the others might have been drinking underage or carrying a joint. And unlike most rich kids, they got in trouble for it. So if early “involvement” with the police signals recidivism, poor people and racial minorities look far riskier.

Now I happen to agree with Dr. O’Neil that we should not allow race to end up playing a role in prison sentence length. There are plenty of legitimate things to factor into a sentence: seriousness of the crime, remorse, etc. I don’t think race should be one of these criteria, and since the sequence of events that Dr. O’Neil describes makes this far from the default in the criminal justice system, I think doing more to ensure race stays out of sentencing is an important moral responsibility we have as a society.

But Dr. O’Neil’s empirical criticism of recidivism models is entirely off base. In this specific example, she is claiming that some characteristics that correlate with recidivism should not be used in recidivism models even though they improve the accuracy, because they are not per se causative of crime.

Because of systematic racism and discrimination in policing [3], the recidivism rate among black Americans is higher. If the only thing you care about is maximizing the prison sentence of people who are most likely to re-offend, then your model will tag black people for longer sentences. It does not matter what the “cause” of this is! Your accuracy will still be higher if you take race into account.

To say “black Americans seem to have a higher rate of recidivism, therefore we should punish them more heavily” is almost to commit the opposite fallacy, the is-ought. Instead, we should say “yes, empirically there’s a high rate of recidivism among black Americans, but this is probably caused by social factors and regardless, if we don’t want to create a population of permanently incarcerated people, with all of the vicious cycle of discrimination that this creates, we should aim for racial parity in sentencing”. This is a very strong (and I think persuasive) moral claim [4].

It certainly is more work to make a complicated moral claim that mentions the trade-offs we must make between punishment and fairness (or between what is morally right and what is expedient) than it is to make a claim that makes no reference to these subtleties. When we admit that we are sacrificing accuracy in the name of fairness, we do open up an avenue for people to attack us.

Despite this disadvantage, I think keeping our moral and empirical claims separate is very important. When you make the empirical claim that “being black isn’t causative of higher rates of recidivism, therefore the models are wrong when they rank black Americans as more likely to reoffend”, instead of the corresponding ethical claim, then you are making two mistakes. First, there’s lots of room to quibble about what “causative” even means, beyond simple genetic causation. Because you took an empirical and not ethical position, you may have to fight any future evidence to the contrary of your empirical position, even if the evidence is true; in essence, you risk becoming an enemy of the truth. If the truth becomes particularly obvious (and contrary to your claims) you risk looking risible and any gains you achieved will be at risk of reversal.

Second, I would argue that it is ridiculous to claim that universal human rights must rest on claims of genetic identicalness between all groups of people (and making the empirical claim above, rather than a moral claim, implicitly embraces this premise). Ashkenazi Jews are (on average) about 15 IQ points ahead of other groups. Should we assign them different moral worth because of this? I would argue no [5]. The only criterion for full moral worth as a human, and for all the universal rights that humans are entitled to, is being human.

As genetic engineering becomes possible, it will be especially problematic to have a norm that the moral worth of humans can be modified by their genetic predisposition to pro-social behaviour. Everyone, but most especially the left, which views diversity and flourishing as some of its most important projects, should push back against both the is-ought and ought-is fallacies and fight for an expansive definition of universal human rights.

Availability Bait-and-Switch

Imagine someone told you the following story:

The Fair Housing Act has been an absolute disaster for my family! My brother was trying to sublet his apartment to a friend for the summer. Unfortunately, one of the fair housing inspectors caught wind of this and forced him to put up notices that it was for rent. He had to spend a week showing random people around it and some snot-nosed five-year-old broke one of his vases while he was showing that kid’s mother around. I know there were problems before, but is the Fair Housing Act really worth it if it can cause this?

Most people would say the answer to the above is “yes, it really was worth it, oh my God, what is wrong with you?”

But it’s actually hard to think that, because you just read a long, vivid, easily imaginable example of exactly what was wrong with the current regime and only a quick throw-away reference to there being problems with the old way of doing things. Some people might say that it’s better to at least mention that the other way of doing things had its problems too. I disagree strenuously.

When you make a throw-away reference to problems with another way of doing things, while focusing all of your descriptive effort on the problems of the current way (or vice-versa), you are committing the Availability Bait-and-Switch. You are also giving a false illusion of balance; people will remember that you mentioned both had problems, but that isn’t the impression they’ll walk away with. You will have tricked your readers into thinking you gave a balanced treatment (or at least paved the way for a defence against claims that you didn’t give a balanced treatment) while doing nothing of the sort!

We are all running corrupted hardware. One of the most notable cognitive biases we have is the availability heuristic. We judge probabilities based on what we can easily recall, not on any empirical basis. If you were asked “are there more words in the average English language book that start with k, or have k as the third letter?”, you’d probably say “start with k!” [6]. In fact, words with “k” as the third letter show up more often. But these words are harder to recall and therefore much less available to your brain.

If I were to give you a bunch of very vivid examples of how algorithms can ruin your life (as Dr. O’Neil repeatedly does, most egregiously in chapters 1, 5, and 8) and then mention off-hand that human decision making also used to ruin a lot of people’s lives, you’d probably come out of our talk much more concerned with algorithms than with human decision making. This was a thing I had to deliberately fight against while reading Weapons of Math Destruction.

Because for a book about how algorithms are destroying everything, there was a remarkable paucity of data on this destruction. I cannot recall seeing any comparative analysis (backed up by statistics, not anecdotes) of the costs and benefits of human decision making and algorithmic decision making, as it applied to Dr. O’Neil’s areas of focus. The book was all the costs of one and a vague allusion to the potential costs of the other.

If you want to give your readers an accurate snapshot of the ground truth, your examples must be representative of the ground truth. If algorithms cause twice as much damage as human decision making in certain circumstances (and again, I’ve seen zero proof that this is the case) then you should interleave every two examples of algorithmic destruction with one of human pettiness. As long as you aren’t doing this, you are lying to your readers. If you’re committed to lying, perhaps for reasons of pithiness or flow, then drop the vague allusions to the costs of the other way of doing things. Make it clear you’re writing a hatchet job, instead of trying to claim epistemic virtue points for “telling both sides of the story”. At least doing things that way is honest [7].

Footnotes

[1] This is a classic example of “anchoring”, a phenomenon where you appear to have a strong correlation in a certain direction because of a single extreme point. When you have anchoring, it’s unclear how generalizable your conclusion is – as the whole direction of the fit could be the result of the single extreme point.

Here’s a toy example:

Note that the thing that makes me suspicious of anchoring here is that we have a big hole with no data and no way of knowing what sort of data goes there (it’s not likely we can randomly generate a bunch of new countries and plot their gun ownership and rate of mass shootings). If we did some more readings (ignoring the fact that in this case we can’t) and got something like this:

I would no longer be worried about anchoring. It really isn’t enough just to look at the correlation coefficient either. The image labelled “Also Not Anchored” has a marginally lower correlation coefficient than the anchored image, even though (I would argue) it is FAR more likely to represent a true positive correlation. Note also we have no way to tell that more data will necessarily give us a graph like the third. We could also get something like this:

In which we have a fairly clear trend of noisy data with an average of 2.5 irrespective of our x-value and a pair of outliers driving a slight positive correlation.
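
You can see the same effect numerically: a single extreme point can manufacture a strong Pearson correlation out of data that otherwise shows none. A quick sketch (the numbers are arbitrary):

```python
# One far-away point can dominate a Pearson correlation.
import numpy as np

rng = np.random.default_rng(0)

# A cluster of points with no real relationship between x and y.
x = rng.uniform(0, 1, 30)
y = rng.uniform(0, 5, 30)
print(np.corrcoef(x, y)[0, 1])  # nowhere near +1 or -1

# The same cluster plus one extreme point.
x_anchored = np.append(x, 10.0)
y_anchored = np.append(y, 40.0)
print(np.corrcoef(x_anchored, y_anchored)[0, 1])  # jumps to nearly 1
```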

Also, the NYT graph isn’t normalized to population, which is kind of a WTF level mistake. They include another graph that is normalized later on, but the graph I show is the preview image on Facebook. I was very annoyed with the smug liberals in the comments of the NYT article, crowing about how conservatives are too stupid to understand statistics. But that’s a rant for another day…  ^

[2] I’d very quickly go out of business because of the moral hazard and adverse selection built into this product, but that isn’t germane to the example. ^

[3] Or at least, this is my guess as to the most plausible factors in the recidivism rate discrepancy. I think social factors – especially when social gaps are so clear and pervasive – seem much more likely than biological ones. The simplest example of the disparity in policing – and its effects – is the relative rates of being stopped by police during Stop and Frisk given above by Dr. O’Neil. ^

[4] It’s possible that variations in Monoamine oxidase A or some other gene amongst populations might make some populations more predisposed (in a biological sense) to violence or other antisocial behaviour. Given that violence and antisocial behaviour are relatively uncommon (e.g. about six in every one thousand Canadian adults are incarcerated or under community supervision on any given day), any genetic effect that increases them would both be small on a social level and lead to a relatively large skew in terms of supervised populations.

This would occur in the same way that repeat offenders tend to be about one standard deviation below median societal IQ but the correlation between IQ and crime explains very little of the variation in crime. This effect exists because crime is so rare.

It is unfortunately easy for people to take a statement like “Group X is 5% more likely to be violent” and believe that people in Group X are something like 5% likely to assault them. This obviously isn’t true. Given that there are about 7.5 assaults for every 1000 Canadians each year, a population that was instead 100% Group X (with their presumed 5% higher assault rate) would see about 7.875 assaults per 1000 people, a difference of about one additional assault per 2,700 people.

Unfortunately, if society took its normal course, we could expect to see Group X very overrepresented in prison. As soon as Group X gets a reputation for violence, juries would be more likely to convict, bail would be less likely, sentences might be longer (out of fear of recidivism), etc. Because many jobs (and in America, social benefits and rights) are withdrawn after you’ve been sentenced to jail, formerly incarcerated members of Group X would see fewer legal avenues to make a living. This could become even worse if non-criminal members of Group X were denied some jobs due to fear of future criminality, leaving Group X members with few options overall but the black and grey economies and further tightening the spiral of incarceration and discrimination.

In this case, I think the moral thing to do as a society is to ignore any evidence we have about between-group differences in genetic propensities to violence. Ignoring results isn’t the same thing as pretending they are false or banning research; we aren’t fighting against truth, simply saying that some small extra predictive power into violence is not worth the social cost that Group X would face in a society that is entirely unable to productively reason about statistics.  ^

[5] Although we should be ever vigilant against people who seek to do the opposite and use genetic differences between Ashkenazi Jews and other populations as a basis for their Nazi ideology. As Hannah Arendt said, the Holocaust was a crime against humanity perpetrated on the body of the Jewish people. It was a crime against humanity (rather than “merely” a crime against Jews) because Jews are human. ^

[6] Or at least, you would if I hadn’t warned you that I was about to talk about biases. ^

[7] My next blog post is going to be devoted to what I did like about the book, because I don’t want to commit the mistakes I’ve just railed against (and because I think there was some good stuff in the book that bears reviewing). ^

Data Science, Politics

Thoughts (and Data) on Charity & Taxes

The other day, I posed a question to my friends on Facebook:

Do you think countries with higher taxes see more charitable donations or fewer charitable donations? What sort of correlation would you expect between the two (weak positive? weak negative? strong positive? strong negative?).

I just crunched some numbers and I’ll post them later. First I want to give people a chance to guess and test their calibration.

I was doing research for a future blog post on libertarianism and wanted to check one of the fundamental assumptions that many libertarians make: in the absence of a government, private charity would provide many of the same social services that are currently provided by the government.

I honestly wasn’t sure what I’d find. But I was curious to see what people would suggest. Answers fell into four main camps:

  1. Charitable giving and support for a welfare state might be caused by the same thing, so there will be a weak positive correlation.
  2. Tax incentives for charitable donations shift the utility of donating, such that people in higher tax countries will donate more, as they get more utility per dollar spent (they get the same good feelings from charity, but also receive a bigger rebate come tax time). People who thought up this mechanism predicted a weak positive correlation.
  3. This whole thing will be hopelessly confounded by other variables and no conclusion would survive proper controls.
  4. Libertarians are right. Taxes drain money that would go to private charity, so we should see a strong(ish) negative correlation.

I was surprised (but probably shouldn’t have been) to find that these tracked people’s political views. The more libertarian I thought someone was, the more likely they were to believe in a negative correlation. Meanwhile, people who were really into the welfare state tended to assume that charitable donations and taxes would be correlated.

In order to figure out who was right, I grabbed the most recent World Giving Index and correlated it with data about personal income tax levels (and sales tax levels, just to see what happened).

There are a number of flaws with this analysis. I’m not looking for confounding variables. Like at all. When it comes to things as tied to national character as charity and taxes (and how they interact!), this is a serious error in the analysis. I’m also using pretty poor metrics. It would be best to compare something like average tax rate with charitable donation amount per capita. Unfortunately, I couldn’t find any good repositories of this data and didn’t want to spend the hours it would take to build a really solid database of my own.

I decided to restrict my analysis to OECD countries (minus Turkey, which I was missing data on). You’ll have to take my word that I made this decision before I saw any of the data (it turns out that there is essentially no correlation between income tax rate and percent of people who donate to charity when looking at all countries where I have data for both).

Caveats aside, what did I see?

There was a weak positive correlation (I’m using a simple Pearson correlation, as implemented by Google Sheets here, nothing fancy) between the percentage of a population that engaged in charitable giving and the highest income tax bracket in a country. There was a weaker, negative correlation between sales tax and the percent of a population that engaged in charitable giving, but more than 60% of this came from the anchoring effect of the USA, with its relatively high charitable giving and lack of a Federal sales tax. The correlation with income tax rates wasn’t similarly vulnerable to removing the United States (in fact, it jumped up by about 12% when the US was removed).
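
I did this in Google Sheets; an equivalent check in Python would look something like the sketch below. The file and column names are placeholders standing in for the data in the linked spreadsheet:

```python
# Pearson correlations between charitable giving and tax rates, with and
# without the United States as a possible anchor.
import pandas as pd

df = pd.read_csv("oecd_giving_and_taxes.csv")
# assumed columns: "country", "pct_donating", "top_income_tax_rate", "sales_tax_rate"

print(df["pct_donating"].corr(df["top_income_tax_rate"]))  # .corr() is Pearson by default
print(df["pct_donating"].corr(df["sales_tax_rate"]))

# Re-run without the US to see how much each correlation depends on one country.
no_us = df[df["country"] != "United States"]
print(no_us["pct_donating"].corr(no_us["top_income_tax_rate"]))
print(no_us["pct_donating"].corr(no_us["sales_tax_rate"]))
```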

Here are the graphs. I’ve deliberately omitted trend lines because I’m a strong believer in the constellation test.

 

All the data available is in a publicly viewable Google Sheet.

I don’t think these data give a particularly clear answer about the likelihood of private charity replacing government-sponsored welfare programs in a hypothetical libertarian state. But they do suggest to me that the burden of proof should probably rest on libertarians. These results should make you view any claims that charitable giving is held back by the government with skepticism, but they should by no means prevent you from being convinced by good evidence.

I am happy to see that my results largely line up with better academic studies (as reported by the WSJ). It seems that if we look at the past few decades, decreases in the tax rates for the highest income brackets have been associated with decreases in charitable giving, at least in the United States. Whether this represents a correlated increase in selfishness, or fewer individuals donating as the utility of donating decreases, is difficult to know.

The WSJ article also mentions that government grants to a charity reduce private donation by about 75% of the grant amount. I don’t know if this represents donations that are lost entirely, or merely substituted for other (presumably needier) charities. If it’s the first, then this would be strong evidence for the libertarian perspective. If it’s the latter, then it means that many people intuitively understand and accept the key effective altruism concept of “room for more funding“, at least as far as the government is concerned.

Conclusions

Finding good answers to the question of whether private charity would replace government welfare turned out to be harder than I thought. The main problem was the quality of data that is easily available. While it was easy to find statistics good enough for a simple, limited analysis, I wasn’t able to find a convenient table with all of the data I needed. This is where actual researchers have a huge advantage over random people on the internet. They have access to cheap labour in the volumes necessary to find and tabulate high quality data.

I’m very glad I posed the question to my friends before figuring out the answer. It never occurred to me to consider the effect of tax incentives on charitable giving. I’m now of the weakly held opinion that the main way taxes affect charitable donations is by offsetting the costs with rebates. I’m also fascinated by the extent to which people’s guesses tracked their political leanings. This shows that (on my Facebook wall, at least) people hold opinions that are motivated by a genuine desire to see the most effective possible government. Differing axioms and exposure to different data lead to differing conceptions of what this would be, but everyone is ultimately on the same team.

I will try and remember this next time I think someone’s preferred government policy is a terrible idea. It’s probably much more productive to try and figure out why they believe their policy objectives will lead to the best outcomes and argue about that, rather than slipping into clichéd insults.

I was also reminded that it’s fun and rewarding to spend a few hours doing data analysis (especially when you get the same results as studies that get reported on in the WSJ).