Economics, Falsifiable, Politics

Franchise Economics: Why Tim Hortons Has Become A Flashpoint In The Minimum Wage Fight

Since the minimum wage increase took effect on January 1st, Tim Hortons has been in the news. Many local franchisees have been clawing back benefits, removing paid breaks, or otherwise taking measures to reduce the costs associated with an increased minimum wage.

TVO just put out a piece by the Christian socialist Michael Coren about this ongoing saga. It loudly declares that “Tim Hortons doesn’t deserve your sympathy”. Unfortunately, Mr. Coren is incorrect. Everyone involved here (Tim Hortons the corporation, Tim Hortons franchisees, and Tim Hortons workers) is caught between a rock and a hard place. They all deserve your sympathy.

This Tim Hortons could be literally anywhere in suburban or rural Canada. Image Credit: Marek Ślusarczyk via Wikimedia Commons

It is a truism that a minimum wage increase must result in either declining profits, cuts to other costs, or rising prices. While supporters of the minimum wage increase would love to see it all come out of profits, that isn’t reasonable.

Basic economics tells us that as we approach a perfect market, profits should fall to zero. The key assumptions underpinning this are global perfect information (so no one can have any innovations that allow them to do better than anyone else) and zero start-up costs (so anyone can enter any market at any time). Obviously, these assumptions aren’t true in reality, but when it comes to fast food, they’re fairly close to true.

It is relatively cheap to start a fast-food restaurant (compared to, say, opening a factory). The start-up costs for a McDonald’s, KFC, or Wendy’s are $1 million to $2.3 million, while a Subway costs about $100,000 to $250,000 to start. This means that whenever someone sees fast-food restaurants making large profits in an area, they can open their own and take a share of the business, driving everyone’s profits down.

They’re probably driven down much lower than you think. If you had to guess, what would you say the profit margins for a fast-food restaurant are? If you’re anything like people in this study, you probably think something like 35%. The actual answer is 6% [1].

In addition to telling me that the average fast-food restaurant has a 6% profit margin, that link helpfully told me that 29% of a fast-food restaurant’s operating expenses are labour costs. Raising wages by 20% raises those labour costs by 20%, which increases total costs by about 6% (0.29 × 0.20 ≈ 0.058) [2]. The minimum wage isn’t making fast-food restaurant owners make do with a little less in the way of profits. It’s essentially wiping out profits.
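To see how little is left, here is a minimal Python sketch of that arithmetic. It assumes a single representative restaurant whose revenue stays fixed (no price changes, no change in sales), treats the whole 29% labour share as wages, and applies the 6% margin to revenue; the function name `margin_after_wage_hike` is hypothetical and only for illustration.

```python
def margin_after_wage_hike(margin, labour_share_of_costs, wage_increase):
    """Return the new profit margin (as a fraction of revenue) after a wage increase.

    Simplifying assumptions: revenue is fixed (no price or volume changes)
    and every dollar of labour cost is wages that rise by `wage_increase`.
    """
    revenue = 1.0
    costs = revenue * (1 - margin)            # e.g. 94% of revenue
    labour = costs * labour_share_of_costs    # e.g. 29% of those costs
    new_costs = costs + labour * wage_increase
    return (revenue - new_costs) / revenue


# Labour share times wage rise gives the increase in total costs.
cost_increase = 0.29 * 0.20
new_margin = margin_after_wage_hike(margin=0.06, labour_share_of_costs=0.29, wage_increase=0.20)
print(f"Total costs rise by {cost_increase:.1%}")
print(f"Profit margin falls from 6.0% to {new_margin:.2%}")
```

Under those assumptions the margin drops from 6% to roughly half a percent. Footnote [2] covers why the real picture is messier (benefits, other rising costs, taxes).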

Now maybe your response to that is “well, my heart doesn’t really bleed for that big multinational losing its profits”. But that’s not how Tim Hortons works. Tim Hortons, like almost all fast-food chains, is a franchise. Tim Hortons the corporation makes money by collecting fees from and providing services to Tim Hortons the restaurants, which are owned by the mythical small business owners™ that everyone (even the proponents of the minimum wage increase) claims to care so much about.

Most of these owners aren’t scions of wealthy families, but are instead ordinary members of their communities who saw opening a Tim Hortons as an investment, a vocation, or as a way to give back. They need to eat as much as their workers.

Faced with rising labour costs and no real profit buffer to absorb them, these owners can only cut costs or raise prices.

Except they can’t raise prices.

That’s the rub of a franchise system. The corporate office wants everything to be exactly the same at every store. They set prices and every store must follow them. But there are divergent incentives here. Tim Hortons the corporation makes a profit by selling supplies to its franchisees; critically, they make a profit on supplies whether those franchisees turn a profit or not. They really don’t want to raise prices, because raising prices will hurt their bottom line.

It’s well known that (in general) the more expensive something is, the less people want it. Raising prices will hurt the sales volume of Tim Hortons franchises, which will decrease the profits at corporate Tim Hortons. The minimum wage hike affects Tim Hortons the corporation very little. They might see slightly increased shipping costs, but their costs are far less dependent on Canadian minimum wage labour. Honestly, the minimum wage increase probably is a net good for Tim Hortons the corporation. More money in people’s pockets means more money spent on fast-food.

Tim Hortons the corporation probably won’t say it, because they don’t want to antagonize their franchisees, but this minimum wage hike is great for them.

So, Tim Hortons franchisees have to cut costs or run charities. Given that they are running restaurants and not charities, we can probably assume that they’re going to cut costs. Why does it have to be labour costs that get cut? Can’t they just get their supplies for cheaper?

Here the franchise system bites them again. If they were independent restaurateurs, they might be able to source cheaper ingredients, reduce the ply of the toilet paper in their bathrooms, etc. and get their profits back this way.

But they’re franchisees. Tim Hortons the corporation has a big list of everything you need to run a Tim Hortons and you are only allowed to buy it from them. They get to set the prices however they want. And what they want is to keep them steady.

The only cost that Tim Hortons the corporation doesn’t control is labour. So, this is what franchisees have to cut.

There are two ways to decrease your labour costs. You can “increase productivity”, or you can cut wages and benefits. “Increase productivity” is the clinical and uninformative way of saying “fire 20% of your workers and verbally abuse the others until they work faster” or “fire 20% of your workers and replace them with machines”. While increased productivity is generally desirable from an economics point of view, it is often more ambiguous from a moral point of view.

Given that the minimum wage was just raised and it is illegal to pay any less than it, Tim Hortons franchisees cannot cut wages. So, if they’re against firing their employees and want to keep making literally any money, they have to cut benefits.

This might make it seem like corporate Tim Hortons is the bad guy here. They aren’t. The executives at Tim Hortons labour under what is called a fiduciary duty. They have a legal obligation to protect shareholder interests from harm and to act for the good of the corporation, not their own private good or for their private moral beliefs. They are responding to the minimum wage hike the way the government has told them to respond [3].

Minimum wage jobs suck. For all that economists claim there is no moral judgement implied in a wage, that it merely shows the intersection of the amount of supply of a certain type of labour and the demand for that labour, it can be hard to believe that there is no moral dimension to this when people making one wage struggle to make ends meet, while those earning another can buy fancy cars they don’t even need.

It is popular to blame business owners and capitalists for the wages their workers make and to say that it shows how little they value their workers. I don’t think that’s merited here. Corporate Tim Hortons has crunched the numbers and decided that if they raise prices, fewer people will buy coffee, their profits will decrease, and they might be personally liable for breach of fiduciary duty. In the face of rising costs, franchisees try to do whatever they can to stay afloat. We can say that caring about profits more than the wages their workers make shows immense selfishness on the part of these franchisees, but it’s little different than the banal selfishness anyone shows when they care more about making money for themselves than making money and giving it away – or the selfishness we show when we want our coffee to be cheaper than it can be when made by someone earning a wage that can comfortably support a family.


[1] As long as there are other available investments approximately as risky as opening a fast-food restaurant that return at least 6%, profits shouldn’t drop any lower than that. In this way, inefficiencies in other sectors could stop fast-food restaurants from behaving as if they were in a perfectly free market, even if their own market were one.

[2] This calculation is flawed, in that total labour costs probably include other costs (like benefits) beyond simple wage income. On the other hand, it isn’t just wages that are going up. Other increased costs probably balance out any inaccuracies, making the conclusions essentially correct. This is to say nothing of corporate taxes, which further reduce profits.

[3] We can’t blame fiduciary duty, because fiduciary duty is how investing at all can happen. You might not like investing, but without investing, saving for retirement or having a national pension plan is impossible. If your response to this is to say “well let’s just tear down capitalism and start over”, I’d like to remind you that people tried that and it led to a) famine, b) gulags, c) death squads, d) more famine, and e) persistent shortages of every consumer good imaginable, including food.


2018 Predictions

Inspired by Slate Star Codex, this is my second year of making predictions (see also: my previous predictions, their scores, and my recent LessWrong post about these predictions).

Before I jump into the predictions, I want to mention that I’ve created templates so that anyone who wants to can also take a stab at it; the templates focus on international events and come in two versions:

  • Long (which assumes you read global news a lot)
  • Short (which is less demanding)

With both these sheets, the idea is to pick a limited number of probabilities (I recommend 51%, 60%, 70%, 80%, and 90%) and assign one to each item that you have an opinion on. At the end of the year, you count the number of correct items in each probability bin and use that to see how close you were to ideal. This gives you an answer to the important question: “when I say something is 80% likely to happen, how likely, really, is it to happen?”
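The year-end grading step is simple enough to sketch in a few lines of Python. This is an illustrative sketch, not the templates’ own scoring method: each prediction is assumed to be a `(probability, came_true)` pair, where `came_true` means the prediction as written turned out correct, and the helper `calibration_table` and the example data are hypothetical.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (probability, came_true) pairs by stated probability.

    Returns a dict mapping each stated probability, in ascending order, to
    (number correct, number of predictions, observed hit rate).
    """
    bins = defaultdict(list)
    for probability, came_true in predictions:
        bins[probability].append(came_true)
    return {
        p: (sum(outcomes), len(outcomes), sum(outcomes) / len(outcomes))
        for p, outcomes in sorted(bins.items())
    }


# Hypothetical predictions, for illustration only.
example = [(0.8, True), (0.8, True), (0.8, False), (0.8, True), (0.9, True), (0.6, False)]
for p, (right, total, rate) in calibration_table(example).items():
    print(f"{p:.0%}: {right}/{total} right ({rate:.0%})")
```

A well-calibrated forecaster’s observed hit rate in each bin should land close to the bin’s stated probability; with only a handful of items per bin, some scatter is expected.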

You can also make your own (or use the set of questions Slate Star Codex normally uses). If you do make your own, please link your post (and maybe also your template?) in the comments or post it to the front page. It’s my hope that this post can serve as a convenient place for the LW community to look at the predictions of everyone who wants to participate in this experiment!

With that out of the way, here are my guesses for the next year.

Canada

  1. Liberals remain ahead in the CBC Poll Tracker seat projection – 70%
  2. Trudeau has a higher net favorability rating than Andrew Scheer according to the CBC Leader Meter on January 1, 2019 – 80%
  3. Marijuana is legalized in time for Canada Day – 60%
  4. Marijuana is legalized in 2018 – 90%
  5. At least one court finds the assisted dying bill isn’t in line with Carter v Canada – 70%
  6. Ontario PC party wins the election – 60%
  7. The Ontario election results in a minority government – 80%
  8. The Quebec election results in a minority government – 80%
  9. No BC snap election in 2018 – 90%
  10. No terrorist attack in Canada that kills > 10 Canadians in 2018 – 90%
  11. More Canadian opioid poisoning deaths in 2018 than in 2017 – 60%
  12. Canada does better at the 2018 Winter Olympics (in both gold medals and total medals) than in 2014 – 90%
  13. Canada does not win a gold medal in men’s hockey at the 2018 Olympics – 70%
  14. Canada does win a gold medal in women’s hockey at the 2018 Olympics – 51%

United States

  1. Trump announces that the US is pulling out of NAFTA and begins the process of putting the US withdrawal into motion – 51%
  2. Less than 100km of concrete wall on the border with Mexico will be constructed – 90%
  3. No registry of Muslims created – 90%
  4. Congress doesn’t take action to extend DACA – 80%
  5. No department of the Federal Government is eliminated – 90%
  6. There isn’t a government shutdown before the midterm elections – 60%
  7. Democrats take back the House in the 2018 midterm elections – 80%
  8. Democrats take back the Senate in the 2018 midterm elections – 60%
  9. Mueller’s investigation finishes in 2018 – 60%
  10. Impeachment proceedings aimed at Trump are not started in 2018 – 80%
  11. Trump is still president at the end of 2018 – 90%
  12. No terrorist attack in America that kills > 10 Americans – 70%
  13. No terrorist attack in America that kills > 100 Americans – 90%
  14. Susan Collins doesn’t get the Obamacare stabilization measures she was promised – 70%
  15. More US opioid poisoning deaths in 2018 than in 2017 – 80%

South America

  1. FARC peace deal remains in place on January 1, 2019 – 80%
  2. The black market exchange rate for Venezuelan Bolivars is above 110,000 to the US dollar on January 1, 2019 (as measured by DolarToday) – 80%
  3. Inflation in Venezuela is above 100% for the year of 2018 (as measured by DolarToday) – 90%
  4. United Socialist party retains control of the Venezuelan presidency in 2018 – 90%
  5. Protests (and the official response to those protests) result in more than 100 fatalities in Venezuela in 2018 – 60%
  6. Protests (and the official response to those protests) do not result in more than 1000 fatalities in Venezuela in 2018 – 70%
  7. Major Venezuelan opposition groups do not enter any sort of power sharing agreement with the Venezuelan regime in 2018 – 80%

Middle East

  1. No Israeli politician is indicted by the ICC over settlement activity in 2018 – 90%
  2. There isn’t an election in Israel in 2018 – 80%
  3. US does not physically relocate its embassy to Jerusalem in 2018 – 90%
  4. No Palestinian-led Intifada in Israel that results in the deaths of >1000 combined attackers, security forces, and civilians (this is a conflict characterized by suicide bombing and police responses) – 70%
  5. No Israeli-led operation in the West Bank or Gaza that results in the deaths of >1000 combined soldiers, civilians, and militants (this is a conflict characterized by rocket fire and military strikes) – 70%
  6. Fatah and Hamas do not meaningfully reconcile in 2018 (e.g. Fatah still doesn’t control Gaza by January 1, 2019) – 51%
  7. No significant resurgence in ISIL in 2018 (e.g. it does not gain territory over the next year) – 80%
  8. Fewer casualties in the Syrian Civil War in 2018 than in 2017 – 70%
  9. No power sharing agreement or durable ceasefire (typified by the three months following the agreement each having fewer than 500 fatalities) in Syria in 2018 – 80%
  10. Bashar Al Assad is still President of Syria on January 1, 2019 – 90%
  11. Protests in Iran do not result in more than 1000 fatalities by the end of 2018 – 70%
  12. Protests in Iran do not result in more than 100 fatalities by the end of 2018 – 51%
  13. Hassan Rouhani is still President of Iran on January 1, 2019 – 90%
  14. No new international sanctions against Iran (does not include adding new organizations or individuals to old categories and requires coordinated participation of at least two countries) – 80%
  15. No new US sanctions against Iran (does not include adding new organizations or individuals to old categories) – 51%
  16. No attack on the Iranian nuclear program by Israel – 90%
  17. Iran does not withdraw from the deal limiting its nuclear program – 90%
  18. Conditional on Iran remaining in the nuclear deal, inspectors find no evidence of violations after the deal began – 90%
  19. Yemen Civil War continues – 60%
  20. Saudi Arabia pulls troops out of Yemen – 51%
  21. Mohammed bin Salman either remains as crown prince of Saudi Arabia, or becomes king (i.e. no coup or succession shake-up) – 80%
  22. Rockets fired from Yemen cause casualties in another country – 51%
  23. No resolution or lifting of embargo in the Qatar crisis – 80%
  24. OPEC production cuts continue through to the end of 2018 – 60%

Africa

  1. No power sharing between ZANU-PF and the opposition will happen in Zimbabwe before the elections (if they occur) in 2018 – 80%
  2. Zimbabwe will hold an election in 2018 – 70%
  3. No peace deal ends South Sudan fighting – 70%
  4. Libya still has two rival governments on January 1, 2019 – 70%
  5. No protests, riots, or rebellion in Egypt that kills >100 people in a one week period – 80%
  6. No protests, riots, or rebellion in Tunisia that kills >50 people in a one week period – 90%
  7. No terrorist attack in Tunisia kills >20 people – 80%
  8. Zuma is not impeached in 2018 – 51%

Asia

  1. Inflation rate in Japan still remains below 1% in 2018 – 70%
  2. Japanese constitutional reform (removing pacifism) does not occur in 2018 – 51%
  3. China will not deploy its military against Taiwan or Hong Kong in 2018 – 90%
  4. North Korea will test a submarine launched ballistic missile in 2018 – 70%
  5. North Korea will not test nuclear weapons or launch any missiles during the 2018 Olympics – 80%
  6. North Korea will test a nuclear weapon in 2018 – 51%
  7. No country will attempt to shoot down a North Korean missile test in 2018 – 80%
  8. If there is an attempt, it will succeed – 51%
  9. North Korea tests a missile that is judged by experts at 38 North as likely able to carry a plausible North Korean nuclear weapon to the United States – 60%
  10. No current member of China’s Politburo Standing Committee visits North Korea in 2018 – 70%
  11. No meeting between Kim Jong-un and Moon Jae-in in 2018 – 90%

Europe

  1. No resolution to the crisis in Ukraine – 80%
  2. Russian GDP growth is less than 3% – 80%
  3. No gain of greater than 20% in the value of the ruble vs. the dollar – 70%
  4. Sanctions against Russia are not significantly rolled back (e.g. sanctions by all members of the G7 against Rosneft, Novatek, Gazprombank and Vnesheconombank remain in place at the end of 2018) – 90%
  5. Angela Merkel remains chancellor of Germany – 60%
  6. Germany holds another election before a government can be formed – 51%
  7. No date set for another Scottish referendum in 2018 – 80%
  8. Theresa May remains prime minister of the United Kingdom – 70%
  9. The UK does not terminate the process of Brexit in 2018 – 90%
  10. There is no final Brexit withdrawal deal reached in 2018 (Donald Tusk wishes to have one by October) – 51%
  11. No snap election/vote of no-confidence in the UK in 2018 – 80%
  12. Poland’s EU voting rights aren’t suspended – 90%
  13. Poland and Hungary continue to refuse to accept migrant quotas – 90%

Grading my 2017 Predictions

Now is the big reveal. Just how did I do in 2017?

Canada

  1. Trudeau ends the year with a lower approval rating than he started – 60%
  2. No bill introduced that changes the electoral system away from first past the post in 2019 – 50%
  3. No referendum scheduled on changing the electoral system away from first past the post before 2019 – 70%
  4. A bill legalizing marijuana is passed by the House of Commons – 90%
  5. The senate doesn’t block attempts to legalize marijuana – 80%
  6. At least one court finds the assisted dying bill isn’t in line with Carter v Canada – 60%
  7. Ontario Liberal Approval rating remains below 30% – 80%
  8. Patrick Brown “unsure” rating remains above 40% – 70%
  9. Kellie Leitch is not the next CPC leader – 80%
  10. Michael Chong is not the next CPC leader – 70%
  11. Maxime Bernier is not the next CPC leader – 90%
  12. No terrorist attack that kills >10 Canadians – 70%
  13. No terrorist attack that kills >100 Canadians – 90%
  14. At least one large technology company (valuation >$10 billion and >1,000 employees) will open a Waterloo office in 2017 – 80%

United States

  1. Trump will veto at least 1 bill passed by the House and Senate – 70%
  2. Changes to NAFTA will not significantly affect Canada (e.g. introduce tariffs, eliminate visas, etc) – 80%
  3. Less than 100km of concrete wall on the border with Mexico will be constructed – 80%
  4. Unemployment rate changes by less than 0.5% in 2017 – 90%
  5. Bay Area housing prices increase in 2017 – 90%
  6. Protests (in America) on Trump’s inauguration day draw at least 1 million people – 80%
  7. Protests (in America) on Trump’s inauguration day draw at least 5 million people – 50%
  8. Protests (in America) on Trump’s inauguration day draw less than 10 million people – 70%
  9. Protests outside of America on Trump’s inauguration day draw at least 1 million people – 60%
  10. Terrorist attack in America that kills at least 10 Americans – 70%
  11. No terrorist attack in America that kills at least 100 Americans – 70%
  12. No registry of Muslims created in America – 90%
  13. New Supreme Court Justice is named to the USSC – 90%
  14. No repeal of any of: the individual mandate, the prohibition on denying coverage for pre-existing conditions, children remaining on their parents’ insurance plans until they are 25 – 70%
  15. gov is taken offline or otherwise rendered inoperative by the new administration – 80%
  16. No Federal Department is eliminated – 80%

South America

  1. No setback to the FARC peace deal significant enough to cause >1000 rebels to rearm – 70%
  2. On the black market, the exchange rate for Venezuelan Bolivars to US Dollars remains above 3000 bolivars per dollar. (As measured by DolarToday) – 80%
  3. Inflation in Venezuela for 2017 is higher than 100% (As measured by DolarToday) – 90%
  4. United Socialist party retains control of the Venezuelan presidency – 70%
  5. No uprising in Venezuela leading to >1000 combined civilian and soldier deaths – 70%

Middle East

  1. The “Regulation” Bill, legalizing many illegal settlements, is passed in Israel – 60%
  2. No Israeli politician is indicted by the ICC over settlement activity in 2017 – 80%
  3. The US moves its embassy to Jerusalem – 50%
  4. OPEC agreement fails (as evidenced by Saudi Arabia increasing oil production to >10.058 million BPD) – 50%
  5. Iraq takes back Mosul – 90%
  6. Mosul Dam does not fail – 70%
  7. Fewer casualties in Syrian Civil War in 2017 than in 2016 – 60%
  8. No new international sanctions against Iran – 80%
  9. No new US sanctions against Iran – 50%
  10. No attack on the Iranian nuclear program by Israel – 80%
  11. Iran does not withdraw from the deal limiting its nuclear program – 80%
  12. Conditional on Iran remaining in the nuclear deal, inspectors find no evidence of violations after the deal began – 90%
  13. Yemen Civil War continues – 60%

Africa

  1. Power transition in The Gambia requires ECOWAS troops – 50%
  2. Power transition occurs in The Gambia – 70%
  3. No peace deal ends South Sudan fighting – 50%
  4. IS or affiliated groups do not hold more territory in Africa at the end of 2017 than at the beginning – 90%
  5. Libya has a single government by the end of 2017 – 50%
  6. No protests, riots, or rebellion in Egypt that kills >100 people in a one week period – 80%
  7. No protests, riots, or rebellion in Tunisia that kills >50 people in a one week period – 90%
  8. At least one terrorist attack kills >50 people in Tunisia – 50%

Asia

  1. Inflation rate in Japan remains below 1% in 2017 – 70%
  2. No Japanese snap election in 2017 – 90%
  3. Scandal involving Thailand’s new king makes its way to a major Western Newspaper – 50%
  4. Saenuri Party loses in the 2017 South Korean election – 80%
  5. China will send at least one diplomatic “insult” to the US (e.g. expelling an ambassador or consul or closing one of its embassies or consulates) – 60%
  6. By the end of 2017, none of the young lawmakers associated with the Umbrella Revolution will be in the Hong Kong parliament – 60%
  7. The Hong Kong lawmakers who are appealing their ban from parliament will have their final appeals denied – 80%
  8. China will not deploy its military against either Hong Kong or Taiwan in 2017 – 90%
  9. North Korea detonates a nuclear weapon – 70%
  10. North Korea does not demonstrate a completed weapon system (e.g. miniaturized bomb and ICBM capable of threatening the continental United States) – 90%

Europe

  1. No resolution to the crisis in Ukraine – 70%
  2. Crimea remains part of Russia – 90%
  3. Russian GDP growth is less than 2% – 80%
  4. No gain of greater than 15% in the value of the ruble vs the dollar – 60%
  5. Angela Merkel remains Chancellor of Germany – 60%
  6. Marine Le Pen does not become President of France – 70%
  7. Geert Wilders does not become Prime Minister of the Netherlands – 70%
  8. UK invokes Article 50 – 60%
  9. Conditional on the UK invoking article 50, this occurs behind schedule – 70%
  10. Conditional on the UK leaving the EU, Scotland prepares for another referendum – 80%
  11. No snap election called in the UK – 80%
  12. No regional independence movement (e.g. Scotland, Catalonia) achieves success in Europe in 2017 – 90%
  13. Sanctions against Russia are not significantly rolled back (e.g. sanctions by all members of the G7 against Rosneft, Novatek, Gazprombank and Vnesheconombank remain in place at the end of 2017) – 60%

Personal

  1. I will not break up with anyone I am currently dating – 90%
  2. I will buy a car – 50%
  3. I will still be working at my current job at the end of 2017 – 80%
  4. I will not move to another city in 2017 – 90%
  5. Conditional on remaining in my current city, I will not move to a different apartment in 2017 – 80%
  6. I will read at least 40 books this year – 80%
  7. I will read at least 10 non-fiction books this year – 50%
  8. I will start reading (and read at least 50 pages) of at least 10 books people recommended to me this year – 60%
  9. I will write at least 200,000 words this year – 80%
  10. I will post at least 15 blog posts or short stories – 80%
  11. I will post at least 25 blog posts or short stories – 50%
  12. I will be >15% over or under-confident for at least 2 confidence levels in these predictions (before taking into account this prediction) – 80%


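I thought that making my predictions mostly numerical would make them easy to grade. This mostly worked, but there were a few edge cases, judgement calls, and other amusing things that I want to explicitly mention (the numbers below count continuously through all the 2017 predictions rather than restarting in each category):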

  • Throughout my predictions I used the word “remains”. I regret this, because it is ambiguous. I think I intended it to mean “on January 1st, 2018, X remains true”, but it can also be parsed as “during all of 2017, X will be true”. I feel it’s most accurate to grade these according to my intent. For my 2018 predictions, I will use clearer language.
  • For 8, the two most recent polls I could find were both from November. In one, Patrick Brown had a “don’t know” rating of 50%; in the other, it was 34%. I found polls by Googling ‘ontario leader popularity’ and ‘ontario leader popularity politics’, and since November was the last month in which I could find any, I only used the November polls. Their average is 42%, which is above the 40% threshold, so I’m considering the prediction successful. The lack of good aggregation of Ontario political information is part of why I would like to create a website tracking the Ontario election this year.
  • While I was correct about who wasn’t going to be the next leader of the Conservative Party, I definitely got emotionally involved, to the point that I was severely miscalibrated. Bernier came far closer to winning than Leitch or Chong, and both of those fringe candidates had much lower chances than it felt like they did.
  • WRT 24 and 25, I’m not counting the Las Vegas shooting as a terrorist attack because it lacked a political motive (as far as we currently know). I think I overestimated the risk of a terrorist attack because of the availability heuristic (the previous two years had seen a higher-than-normal number of successful and deadly terrorist attacks). A proper estimate would have focused more on the base rate.
  • I don’t think Trump’s cancellation of advertising comes anywhere close to fulfilling 29, so I’m marking it as failed.
  • 31 is borderline, with many former FARC fighters “joining criminal gangs or a dissident FARC movement that has about 1,000 fighters nationwide”. Given that this still implies fewer than 1,000 members rearmed to continue the fight as FARC, I think the prediction holds.
  • 38 is also borderline, but ultimately, I think there is a difference between an announcement of intent to move and an actual movement. Since I was going to mark 28 as a success if Trump hadn’t signed the tax bill by now, it’s only fair that I mark 38 as a failure.
  • I was way under-confident in the stability of the Mosul Dam (41). Compare my probability with the one on the Good Judgment Project and you’ll see I really overstated the risk compared to the consensus.
  • WRT 43 and 44, the US Treasury added new groups to existing designations, but these are neither new sanctions nor international sanctions.
  • For 59, I think this Daily Mail headline counts: “Thailand’s colourful new King brought ‘his mistress AND his former air stewardess wife’ to his father’s lavish cremation ceremony with both marching in bearskin hats”. I want to stay away from predicting “scandals” in the future, though, because it’s a very fuzzy concept.
  • While North Korea claims to have a complete, miniaturized ICBM, it looks like it is still about a year away from actually fielding a weapon able to hit the US mainland. Therefore 66 is a success.
  • 69 is only provisionally true and needs to be revised when more GDP data is available.
  • I am apparently rubbish at predicting snap elections, given that I got both 77 and 58 wrong, while being highly confident in my wrongness.
  • Out of all of my failed predictions, the one that surprised me the most was the OPEC deal holding. I really thought that it would fall apart.

A complete list of the sources I used when grading all non-personal predictions is available here.


The whole point of having predictions with few allowed probabilities (for me it was 50%, 60%, 70%, 80% and 90%) is that you can then check how accurate these were by pooling your answers. Here’s how I did:

Of my predictions at a 50% confidence level, I got 7 right and 6 wrong (54%).
Of my predictions at a 60% confidence level, I got 9 right and 4 wrong (69%).
Of my predictions at a 70% confidence level, I got 16 right and 4 wrong (80%).
Of my predictions at an 80% confidence level, I got 20 right and 6 wrong (77%).
Of my predictions at a 90% confidence level, I got 17 right and 2 wrong (89%).
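The per-bin percentages, and the claim further down that no bin was more than 10 percentage points from perfect calibration, can be rechecked from the right/wrong tallies alone. A short Python sketch, using only the counts listed above (the `results` layout and `hit_rates` helper are illustrative choices, not part of the original grading):

```python
# (right, wrong) tallies at each confidence level, from the 2017 results above.
results = {0.5: (7, 6), 0.6: (9, 4), 0.7: (16, 4), 0.8: (20, 6), 0.9: (17, 2)}


def hit_rates(results):
    """Map each confidence level to the fraction of predictions that came true."""
    return {p: right / (right + wrong) for p, (right, wrong) in results.items()}


rates = hit_rates(results)
for p, rate in rates.items():
    gap = (rate - p) * 100  # positive: under-confident; negative: overconfident
    print(f"{p:.0%}: {rate:.0%} observed, {gap:+.0f} percentage points")

largest_gap = max(abs(rate - p) for p, rate in rates.items()) * 100
print(f"Largest gap: {largest_gap:.1f} percentage points")
```

The largest gap comes out at exactly 10 points, in the 70% bin, with the 50% to 70% bins all landing above their stated probability.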

If you prefer graphs, here are the results on a graph. The red line shows what I would get if I were a perfect judge of probability. The blue line is actual me. Whenever the red line is below my results, I was under-confident. Whenever it’s above them, I was overconfident.

I’m pleased that in general (excepting 70% vs. 80%), things I thought were more likely were in fact more likely. I appear to be fairly under-confident at lower probability levels (50% through 70%), and fairly good at higher confidence levels (80% and 90%), although of course this is just one year and some of this could be due to chance and luck.

My meta-calibration was quite poor. I was never more than 10 percentage points off from perfect calibration, despite my worry that I would be more than 15 points off for at least two confidence levels.

Advice, Model

Improvement Without Superstition

[7 minute read]

When you make continuous, incremental improvements to something, one of two things can happen. You can improve it a lot, or you can fall into superstition. I’m not talking about black cats or broken mirrors, but rather humans becoming addicted to whichever steps were last seen to work, instead of whichever steps produce their goal.

I’ve seen superstition develop first hand. It happened in one of the places you might least expect it – in a biochemistry lab. In the summer of 2015, I found myself trying to understand which mutants of a certain protein were more stable than the wildtype. Because science is perpetually underfunded, the computer that drove the equipment we were using was ancient and frequently crashed. Each crash wiped out an hour or two of painstaking, hurried labour and meant we had less time to use the instrument to collect actual data. We really wanted to avoid crashes! Therefore, over the course of that summer, we came up with about 12 different things to do before each experiment (in sequence) to prevent them from happening.

We were fairly sure that 10 of the 12 things were useless; we just didn’t know which ten. There may have been no good reason that opening the instrument, closing it, then opening it again to load our sample would prevent computer crashes, but as far as we could tell, the machine crashed far less when we did that. It was the same for the other eleven. The graduate student I worked with, more self-aware than I was, joked to me: “this is how superstitions get started,” and I laughed along. Until I read two articles in The New Yorker.

In The Score (How Childbirth Went Industrial), Dr. Atul Gawande talks about the influence of the Apgar score on childbirth. Through a process of continuous competition and optimization, doctors have found ways to increase the Apgar scores of infants in their first five minutes of life – and ways to handle difficult births that maximize those scores. The result of this has been a shocking (six-fold) decrease in infant mortality. All of this happened despite the fact that, according to Gawande, “[in] a ranking of medical specialties according to their use of hard evidence from randomized clinical trials, obstetrics came in last. Obstetricians did few randomized trials, and when they did they ignored the results.”

Similarly, in The Bell Curve (What happens when patients find out how good their doctors really are), Gawande found that the differences between the best CF (cystic fibrosis) treatment centres and the rest turned out to hinge on how rigorously each centre followed the guidelines established by big clinical trials. That is to say, those that followed the accepted standard of care to the letter had much lower survival rates than those that hared off after any potentially lifesaving idea.

It seems that obstetricians and CF specialists were able to get incredible results without too much in the way of superstitions. Even things that look at first glance to be minor superstitions often turned out not to be. For example, when Gawande looked deeper into a series of studies that showed forceps were as good as or better than Caesarian sections, he was told by an experienced obstetrician (who was himself quite skilled with forceps) that these trials probably benefitted from serious selection effects (in general, only doctors particularly confident in their forceps skills volunteer for studies of them). If forceps were used on the same industrial scale as Caesarian sections, that doctor suspected that they’d end up worse.

But I don’t want to give the impression that there’s something about medicine as a field that allows doctors to make these sorts of improvements without superstition. In The Emperor of All Maladies, Dr. Siddhartha Mukherjee spends some time talking about the now-discontinued practices of “super-radical” mastectomy and “radical” chemotherapy. In both treatments, doctors believed that if some amount of a treatment was good, more must be better. And for a while, it seemed better. Cancer survival rates improved after these procedures were introduced.

But randomized controlled trials showed that there was no benefit to those invasive, destructive procedures beyond that offered by their less-radical equivalents. Despite this evidence, surgeons and oncologists clung to these treatments with an almost religious zeal, long after they should have given up and abandoned them. Perhaps they couldn’t bear to believe that they had needlessly poisoned or maimed their patients. Or perhaps the superstition was so strong that they felt they were courting doom by doing anything else.

The simplest way to avoid superstition is to wait for large-scale trials. But from both Gawande articles, I get a sense that matches anecdotal evidence from my own life and that of my friends. It’s the sense that if you want to do something, anything, important – if you want to increase your productivity or manage your depression/anxiety, or keep CF patients alive – you’re likely to do much better if you take the large-scale empirical results and use them as a springboard (or ignore them entirely if they don’t seem to work for you).

For people interested in nootropics, melatonin, or vitamins, there are self-blinding trials, which provide many of the benefits of larger trials without the wait. But for other interventions, it’s very hard to effectively blind yourself. If you want to see if meditation improves your focus, for example, then you can’t really hide from yourself the fact that you meditated on certain days [1].

When I think about how far from the established evidence I’ve gone to increase my productivity, I worry about the chance I could become superstitious.

For example, trigger-action plans (TAPs) have a lot of evidence behind them. They’re also entirely useless to me (I think because I lack a visual imagination with which to prepare a trigger) and I haven’t tried to make one in years. The Pomodoro method is widely used to increase productivity, but I find I work much better when I cut out the breaks entirely – or work through them and later take an equivalent amount of time off whenever I please. I use pomos only as a convenient, easy to Beemind measure of how long I worked on something.

I know modest epistemologies are supposed to be out of favour now, but I think it can be useful to pause, reflect, and wonder: when is one like the doctors saving CF patients and when is one like the doctors doing super-radical mastectomies? I’ve written at length about the productivity regime I’ve developed. How much of it is chaff?

It is undeniable that I am better at things. I’ve rigorously tracked the outputs on Beeminder and the graphs don’t lie. Last year I averaged 20,000 words per month. This year, it’s 30,000. When I started my blog more than a year ago, I thought I’d be happy if I could publish something once per month. This year, I’ve published 1.1 times per week.

But people get better over time. The uselessness of super-radical mastectomies was masked by other cancer treatments getting better. Survival rates went up, but when the accounting was finished, none of that was to the credit of those surgeries.

And it’s not just uselessness that I’m worried about, but also harm; it’s possible that my habits have constrained my natural development, rather than promoting it. This has happened in the past, when poorly chosen metrics made me fall victim to Campbell’s Law.

From the perspective of avoiding superstition: even if you believe that medicine cannot wait for placebo-controlled trials to try new, potentially life-saving treatments, surely you must admit that placebo-controlled trials are good for determining which things aren’t worth it (take as an example the very common knee surgery, arthroscopic partial meniscectomy, which has repeatedly performed no better than sham surgery when subjected to controlled trials).

Scott Alexander recently wrote about an exciting new antidepressant failing in Stage I trials. When the drug was first announced, a few brave souls managed to synthesize some. When they tried it, they reported amazing results, results that we now know to have been placebo. Look. You aren’t getting an experimental drug synthesized and trying it unless you’re pretty familiar with nootropics. Is the state of self-experimentation really that poor among the nootropics community? Or is it really hard to figure out if something works on you or not [2]?

Still, reflection isn’t the same thing as abandoning the inside view entirely. I’ve been thinking up heuristics since I read Dr. Gawande’s articles; armed with these, I expect to have a reasonable shot at knowing when I’m at risk of becoming superstitious. They are:

  • If you genuinely care only about the outcome, not the techniques you use to attain it, you’re less likely to mislead yourself (beware the person with a favourite technique or a vested interest!).
  • If the thing you’re trying to improve doesn’t tend to get better on its own and you’re only trying one potentially successful intervention at a time, fewer of your interventions will turn out to be superstitions and you’ll need to prune less often (much can be masked by a steady rate of change!).
  • If you regularly abandon sunk costs (“You abandon a sunk cost. You didn’t want to. It’s crying.”), superstitions do less damage, so you can afford to spend less mental effort on avoiding them.

Finally, it might be that you don’t care that some effects are placebo, so long as you get them and get them repeatedly. That’s what happened with the experiment I worked on that summer. We knew we were superstitious, but we didn’t care. We just needed enough data to publish. And eventually, we got it.

[Special thanks go to Tessa Alexanian, who provided incisive comments on an earlier draft. Without them, this would be very much an incoherent mess. This was cross-posted on Less Wrong 2.0 and as of the time of posting it here, there’s at least one comment over there.]


[1] Even so, there are things you can do here to get useful information. For example, you could get in the habit of collecting information on yourself (like happiness, focus, etc.) for a month or so, then try several combinations of interventions you think might work (e.g. A, B, C, AB, BC, CA, ABC, then back to baseline) for a few weeks each. Assuming that at least one of the interventions doesn’t work, you’ll have a placebo to compare against. Just be sure to correct any results for multiple comparisons.

[2] That people still buy anything from HVMN (after they rebranded themselves in what might have been an attempt to avoid a study showing their product did no better than coffee) actually makes me suspect the latter explanation is true, but still.
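The schedule in footnote [1] can be sketched in a few lines of Python. This is a minimal sketch that assumes three hypothetical interventions labelled A, B, and C. It uses the Bonferroni correction as one common (and conservative) way to correct for multiple comparisons. The footnote doesn’t name a specific method, and the 0.05 threshold is just a conventional choice.

```python
from itertools import combinations

# Hypothetical interventions, labelled as in the footnote.
interventions = ["A", "B", "C"]

# Every non-empty combination, smallest first: A, B, C, AB, AC, BC, ABC
# ("AC" is the footnote's "CA"; order within a combination doesn't matter).
conditions = [
    "".join(combo)
    for size in range(1, len(interventions) + 1)
    for combo in combinations(interventions, size)
]

# Each condition runs for a few weeks, bracketed by baseline tracking periods.
schedule = ["baseline"] + conditions + ["baseline"]

# Bonferroni correction: with m comparisons against baseline, test each one
# at alpha / m so the chance of any false positive stays at most alpha.
alpha = 0.05
m = len(conditions)
per_test_alpha = alpha / m

print(schedule)
print(f"{m} comparisons, per-test threshold {per_test_alpha:.4f}")
```

With seven conditions, each comparison has to clear a threshold of about 0.007 rather than 0.05. That stricter threshold is the price of testing every combination.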


Book Review: The Managed Heart

[16 minute read]

Content warning: reading this book left me in a low state of existential panic and unable to respond appropriately to other people’s emotions for about a week. You have been warned.

If you’ve followed my blog for any amount of time, you probably know that I’m a big fan of the sociologist and feminist scholar Professor Arlie Russell Hochschild. Previously I have reviewed her books “Strangers in Their Own Land” and “The Second Shift”. I’ve also published a practical guide to sharing housework, inspired by reading “The Second Shift”. Today I’m going to review The Managed Heart, the book that first brought Professor Hochschild to mainstream attention.

But before I begin the review, I’d like to talk about words.

Words are handles to grasp concepts. These handles (like the concepts they evoke) are by necessity blurry and fuzzy. They change. Is Pluto a planet? It depends on what “planet” means to you. If you’re an academic astronomer, you might answer this differently than one of the kids who sent Neil deGrasse Tyson hate mail.

Language must necessarily grow and evolve. I’ve given up trying to police the meaning of literally (although you’ll have to take the Oxford Comma from my cold, dead hands). That said, I really wish that every subculture dominated by people under thirty took one fucking second to do a fucking lit review before they grab academic sounding words for their HuffPo think pieces or blog posts.

(I live in a glass house here. I am loosely associated with the Rationalist Community, a group of people who have based their whole philosophy on the literal arch-enemies of the rationalist philosophical tradition. “Empiricist Community” didn’t sound as smart or clever, so it lost out as a name despite the fact it was far more accurate.)

Technical words mean specific things and their definitions are policed so that academic disagreement (and more rarely, agreement) can happen at all. Academics need to have a clear(ish) view of what concept-handles they’re playing with and clear(ish) boundaries on those concepts, lest they spend all of their days arguing about definitions, like a Clinton caught in a lie. Currently we filter that sort of person out of the general academic discourse by letting them go study Hegel, but there’s always a risk of that spilling over, to disastrous effect.

Worse, when a technical word is stolen for general vocabulary, it often comes to mean what people think it should mean, rather than what it originally meant. The concepts it named, which were important enough to need names, are left to float, handle-less. For example, “market failure” is at risk of coming to mean “weird consequences of markets”, not “markets that are trapped away from the Pareto frontier, such that there is an unrealized opportunity to make someone/some metric better off without making anyone/anything else worse off”. The phrase “market failure” doesn’t evoke the technical definition very well, so that definition is at risk of being elided in popular discourse.

A further consequence is that academic debate becomes meaningless, confusing, or incomprehensible to ordinary people (academics can police the language they use in their own discussions, but not the linguistic drift that inevitably follows when those same terms are misused elsewhere). Non-academics assume that academics are using the colloquial sense of a term, when in fact they’re saying something else. Switching meanings like this often has serious consequences for the veracity of arguments!

When an economist says “a minimum wage can lead to market failures”, many people think the economist is saying “it would be better if people could be paid less”, when they might actually be saying “when a minimum wage exists, a company may fail to hire a low-productivity worker (say a high school graduate, or someone who doesn’t speak the dominant language very well) while forcing another worker to work overtime; if no minimum wage existed, the company could hire that worker, making both the hired worker and the existing employee (now freed from overtime) better off, while leaving the company no worse off”.
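The technical definition comes down to comparing two outcomes. Is there a change that leaves nobody worse off and at least one party better off (a Pareto improvement)? A market failure in that sense is a case where such a change exists but the market doesn’t make it. Here is a minimal Python sketch of that comparison, applied to the minimum wage example above. The function name, party names, and welfare scores are all made up for illustration, and only the direction of each change is meant to mirror the example.

```python
from typing import Dict


def is_pareto_improvement(before: Dict[str, float], after: Dict[str, float]) -> bool:
    """True if no party is worse off in `after` and at least one is strictly better off."""
    no_one_worse = all(after[party] >= before[party] for party in before)
    someone_better = any(after[party] > before[party] for party in before)
    return no_one_worse and someone_better


# Hypothetical welfare scores (arbitrary units) for the three parties in the example.
with_minimum_wage = {"unhired_worker": 0.0, "existing_employee": 1.0, "company": 5.0}
without_minimum_wage = {"unhired_worker": 1.0, "existing_employee": 2.0, "company": 5.0}

# The worker gets hired, the employee loses the overtime, the company is no worse off.
print(is_pareto_improvement(with_minimum_wage, without_minimum_wage))  # True
# Going the other way makes two parties worse off, so it is not an improvement.
print(is_pareto_improvement(without_minimum_wage, with_minimum_wage))  # False
```

The check only says whether one outcome Pareto-dominates another. It says nothing about why a market gets stuck short of the better outcome, which is the substantive part of the economist’s claim.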

All this is to say that “emotional labour” is a key concept from The Managed Heart. The term was coined in this book. And as near as I can tell, it has literally never been used properly in a blog post or think piece.

So before I talk about what emotional labour (in the academic sense) is, I’d like to give several examples of what it isn’t.

Emotional labour isn’t the mental load that women have to carry when managing the chores and children of a household. Infuriatingly, this subject was covered by Professor Hochschild in another book. It has a whole chapter devoted to it! Properly termed, it would be “responsibility for managing the second shift” or something like that.

Emotional labour isn’t women helping men process and figure out their feelings without compensation. Under the framework introduced in The Managed Heart, I’d suggest that it could be called “feeling rules promoting asymmetric empathizing”, which I will admit is much less catchy.

Emotional labour isn’t even the work women do to manage their feelings in a relationship so that men feel supported and validated. That comes up in The Managed Heart and is one subset of “emotion work”.

I am not claiming that any of these other contenders for the term “emotional labour” do not exist, are not real problems, or do not deserve academic study of their own. I believe that they do exist, are real problems, and deserve study (much of which has been done by Professor Hochschild). But I am also going to ignore them, pretend they don’t exist, and talk only about emotional labour as it was defined by Professor Hochschild: “the commercialization of our capacity to influence our own feelings”.

Unpacking that seemingly simple definition will provide fodder for most of my review.

First, what are feelings?

Professor Hochschild carefully charts the development of theories of emotion. There’s Darwin’s physical theory of emotion, which holds that emotions are the evolutionary vestiges of certain acts. Teeth bared in a rictus of anger are, to Darwin, the evolutionary vestige of actually biting. Anger emerges as the remnant of what would have been aggressive action and shows up in situations where our ancestors might have been aggressive.

Freud had some nonsense about dammed-up libido (I have a policy of ignoring everything Freud said that involves the words “libido”, “oedipal”, or “fixation”, and I’m not going to break it just for this review). William James held that emotions were signifiers of physical change; to James, the emotion of anger was merely what we feel when our body prepares to fight and is solely a consequence of underlying physiological processes.

Later theorists, like Gerth and Mills, situated emotions in a social context. They talked about how culture might influence emotions and how inchoate emotions might be made understandable when others interpreted them for us. For example, if a bride cries when left at the altar on her wedding day, her mother’s explanation “you must be furious” gives name and focus to her roiling emotions. The bride may come to believe that she is crying because she is angry, and that the roil of emotions in her belly is anger. Had her mother instead suggested that she was feeling “sorrow”, then perhaps that would have been the name she chose.

Professor Hochschild builds on these definitions (and many others) to get one she’s satisfied with. To her, emotion is a sense, like proprioception or touch. It allows us to sense how we relate to others’ actions or to developments. Emotion in a Hochschildian framework doesn’t just lead to action (e.g. I was angry so I attacked him), it also leads to cognition (e.g. I paused to wonder why I was so sad).

Professor Hochschild holds emotion up as one of the most important senses because it acts as a signal function. There is the tautological sense in which emotion lets us know how we feel about something, but there is also the sense in which it warns us. We talk about a twinge of jealousy or a sinking dread. These emotions help us realize that all is not right.

Emotions can be consonant with a situation (e.g. I feel so happy on my wedding day), or dissonant (e.g. I should be happy at my wedding, but I’m really just scared). Dissonant emotions are most often the ones we seek to change, but as emotion becomes commercialized, we are increasingly asked to change our consonant emotions as well.

What do we do when we can’t change our emotions? And how do we effect a change?

Surface acting is one way we can deal (in a socially acceptable manner) with “feeling the wrong thing”. In surface acting, we change our countenance or face, but make no attempt to change how we feel. We might grin through pain, wear a fixed smile, or hide that we want to cry. We may not fool anyone else and we certainly don’t fool ourselves, but sometimes surface acting allows us to pay our emotional dues to those around us.

Surface acting can feel exhausting; you can’t rest or relax while you are presenting a fake face to the world. Therefore, it is often beneficial for us to be able to engage in deep acting.

Deep acting is the sincere attempt to engender an emotion that you are currently not feeling. There are two ways that you can attempt deep acting. In the first, you can try and chivy and talk yourself into feeling what you desire. When someone says they are trying to fall in love, or conversely trying not to fall too hard, they are engaging in this first form of deep acting.

The second form of deep acting shares much with method acting. Method acting encourages the actor to bring in emotions from other parts of their life and use them to animate the emotions of their character. In deep acting, you push on your emotions by using memories of other emotional states. Deep acting might look like “I was unhappy on the day of my wedding, so I brought up memories of things I like about my partner until I was smiling“.

Society imposes on us many feeling rules, which we interact with by doing the emotion work of deep acting or the feigning of surface acting. Here’s a simple feeling rule: it is considered impolite to feel anything other than happiness for a friend’s promotion. If you instead feel jealous, there will be a strong societal expectation that you show none of it. Instead, you must transmute the jealousy into joy via deep acting, or hide it via surface acting.

You might think that feeling rules only apply when you aren’t interacting with the people you’re closest with. Professor Hochschild disagrees. She believes that feeling rules bind us especially tightly when we are with our closest friends or our romantic partners. She talks briefly here (and in depth in “The Second Shift”) about the economy of gratitude that exists in a relationship and how it requires constant emotion work to maintain. You expect your partner to be excited on your behalf when you get a promotion, or self-flagellating and apologetic if they cheat. Closeness acts like a filter; only people who instinctively manage their emotions in a way that is pleasing to you (or, in the case of partners who try to “win someone over”, put in a lot of effort) end up close to you, so the reality of the emotion work underlying close relationships is often obscured. Part of Professor Hochschild’s purpose in studying emotion work at work was to pull back this curtain and view emotion work that wasn’t so unconscious and unthinking.

There certainly can be a gendered dynamic to emotion work. Professor Hochschild believes that men are trained to expect a certain amount of emotion work from women: fluffing of the ego, soothing of the temper, etc. She also believes that emotion work is unevenly spread because women are better trained in it and men tend to be better off. Within the context of a heterosexual relationship, this often manifests in the unconscious deal of a man providing physical security through his more highly paid work, in exchange for a woman’s emotional labour and her labour around the house (this idea is more thoroughly dissected in “The Second Shift”).

The primary marketplace and arena of emotion work is “emotional bowing”. Emotional bowing encompasses two types of exchanges, improvisational and straight. In a straight exchange, you are following the rules and exchange rates of society. When you repay advice from a senior colleague with sincere gratitude, you are engaging in a straight exchange.

When the gratitude is feigned or obviously false, or the advice is given grudgingly, you are still trying to play out the straight exchange, but you are quibbling about the exchange rate. Similarly, when you brush aside gratitude and claim the advice you gave was “my pleasure”, you are making a rather different point about the exchange rate and showing kindness and graciousness – and perhaps making something clear about the emotional tone you expect at your workplace. Even kindness can become a demand for future emotion work.

Many disagreements, especially among close friends and lovers, are caused by different notions of the exchange rate between actions. In these close relationships, emotion work is just one way that we can repay others, but it is often the one that breaks down in response to problems, when we suddenly realize the thing we “should” be feeling takes actual work to feel.

In an improvisational exchange, the feeling rules themselves are called into question, often using sarcasm or irony. A man may jokingly tell a crying male friend “remember, men never cry”. By ironically referencing the feeling rule (that men cannot show emotion), he gives his friend permission to violate it. This sort of exchange requires clear knowledge and understanding of how everyone involved interprets feeling rules, so it is uncommon outside of close relationships.

When the crying man rejects the toxic masculinity that causes men to disown their emotions, referencing the feeling rule might cheer him up, as he is reminded that even his sorrow is a radical act in line with his values. But if he instead embraces that conception of masculinity, referencing the feeling rule might add to his grief and make him feel a failure. Only his friends would know which is likely to occur, so only his friends would risk an improvisational exchange.

This particular part of the book brought on my existential crisis, as I found myself unable to respond to emotional displays with anything other than attempts to calculate what was given, expected, and owed. I do now wonder if this is a common experience, or if my response was somewhat atypical? In either case, a warning before I (potentially) inflicted this on anyone else seemed prudent.

Anyway, all of this background brings us to emotional labour, the true topic of this book. Emotional labour is when emotion work is removed from its normal place in the home and in broader society, and starts to become part of someone’s economic responsibilities. Physical labour has long been commoditized and therefore made anonymous – that is to say, it does not matter which particular person manufactures your car, because any other labourer could have done it approximately as well. While emotional labour has long existed, it is only recently (with a decline in manufacturing jobs and increase in service jobs) that it has become commoditized and therefore gone mainstream.

Professor Hochschild takes a somewhat Marxist approach to the dangers of emotional labour. In the same way that Marx worried about labourers being alienated from the physical products of their work, Professor Hochschild worries about the effects of labourers being alienated from the emotional products of their work.

Like all of Professor Hochschild’s books, The Managed Heart is in some sense an ethnography. The subjects of this book are bill collectors (who are required to do the emotional labour of avoiding sympathy or pity) and flight attendants (who must do the emotional labour of providing a cheery, relaxed façade). In both of these cases, these required emotions (and the feeling rules that produce them) might be variously consonant and dissonant with what the worker may wish to feel.

Earlier, I said that workers are increasingly being asked to change even their consonant feelings. Take as an example the bill collector, moved by pity or charity to seek a repayment schedule that works for their client, or a flight attendant furious at a customer who is repeatedly belittling them. In both of these cases, emotions are functioning correctly (both as a signal function, and in accordance with societal feeling rules), but economic realities demand that the worker feel something else. Corporate requirements impose a new set of feeling rules, which may clash with extant ones, potentially grinding up workers in the process.

Acting in response to these alien feeling rules can be exhausting. For flight attendants, Professor Hochschild identified three stances they can take towards their work, each with its own risks:

In the first, the worker identifies too wholeheartedly with the job, and therefore risks burnout. In the second, the worker clearly distinguishes herself from the job and is less likely to suffer burnout; but she may blame herself for making this very distinction and denigrate herself as “Just an actor, not sincere.” In the third, the worker distinguishes herself from her act, does not blame herself for this, and sees the job as positively requiring the capacity to act; for this worker there is some risk of estrangement from acting altogether, and some cynicism about it – “We’re just illusion makers.”

No job is entirely without risks (both physical and psychological), yet work must get done. I would have liked to see Professor Hochschild engage better with this fact. Her potential solution (to give workers more control over the emotional labour they are required to do) is not as free of costs as she seems to think it is. Unless the practice is universal, any company that refuses to give its employees control over their emotional labour may find itself at a steep advantage. The threat of this (if emotional labour is indeed a competitive advantage) might be enough to keep whole industries scared of allowing any worker control, absent a mechanism for perfect coordination.

(It seems like the best way to free people from emotional labour would be to prove that it is not important. But we are social animals and so I doubt such a proof is forthcoming. Or possible at all.)

Still, there is often something deeply troubling about how emotional labour is framed. Professor Hochschild gives the example of a seminar about “reducing stress and making work more pleasant” at the flight attendant recurrent training centre. Belying the messaging, it seemed like the real purpose of the seminar was to convince the flight attendants to sublimate any anger they might, in the future, feel at passengers into emotions less risky for the company. A pleasant working environment was secondary to the corporate goals.

In the model of emotion-as-signal-function, anger is important. Indeed, it seems that negative emotions (specifically the negative affect/fear cluster) are particularly important to living a safe life. There seems to be something deeply wrong and dangerous to workers in telling them that all anger in their professional life is their own problem, to be appropriately handled, rather than occasionally indicative of a customer who is seriously overstepping lines.

Regardless of the right or wrong of it, the flight attendants interviewed in the book had to manage their anger, and they talked about several strategies they had developed to do so (some of which were taught to them at recurrent training). They might put themselves in the angry customer’s shoes and try to imagine that person as suffering from some life event that explained and excused their behaviour. Or they might remind themselves that they only had to deal with the customer for a little while, allowing them to drive out their anger and replace it with relief. Asking co-workers for emotional support was officially discouraged, because it might lead to anger spreading. Flight attendants who could help their colleagues feel the officially sanctioned emotions (e.g. by defusing anger with light-hearted joking) were valued members of teams.

Professor Hochschild suggests that we are trained for emotional labour from a young age. Or rather, that some children are. She suggests that working-class children are prepared to have their actions governed by rules, while middle-class children are prepared to have their feelings governed by rules. Note that this isn’t necessarily explicit. I recall receiving no specific training on emotion management, but I know that I picked it up somewhere and that I’m somewhat disturbed by people who seem unable or unwilling to practice emotion management.

One way that emotion work is taught (or not) is by family dynamics. Professor Hochschild suggests that many working-class families use a positional family control system, while middle-class families use a personal control system. In a positional family, authority is derived from a certain mixture of age, gender, employment status, parenthood, etc. Those with authority make decisions within their spheres of authority and the other members of the family must act in accordance with these decisions, although they don’t have to like it.

In a personal control system, control is achieved via appeals to the emotions of a child. Because all decisions of the child are framed as a choice (but with an obvious correct answer), this can lead to a maddening chain of explanations. Whenever the child states their preference, the parent will explain the decision in more detail and explain why the child should feel differently, such that they’ll have the “correct” preference. I can’t remember if this was explicitly mentioned, but it seems to me that this would also serve the purpose of inculcating in the child a strong understanding of normative feeling rules.

There is also a relationship between these control systems and discipline. Professor Hochschild cites research that middle-class parents are more likely to sanction intent, while working-class parents are more likely to sanction actions. The working-class parent sanctions the child because of the results of a temper-tantrum. The middle-class parent sanctions the child because they lost their temper.

Professor Hochschild suggests that the sum of this is three messages sent to a (middle-class) child:

  • Feelings in others, particularly their superiors, are important and worth trying to understand.
  • Their own feelings are important and a valid reason for making decisions.
  • Feelings are meant to be managed, controlled, and yoked to rules.

It’s clear that because of this education, feeling rules are a gender and class issue. First, the feeling rules learned in childhood act as a middle- and upper-class shibboleth, making it clear who was raised outside of those classes. Working-class members looking for upwards mobility will have to do catch-up work that is entirely invisible – except in lapses – to those they are seeking to blend in with.

Second, in a world in which the higher ranks of government and corporations are biased towards men, women are given a particular incentive to be sensitive towards the feelings of men, while men have no corresponding requirement to be sensitive towards the feelings of women. Combine this with a toxic masculinity that leaves men little room to acknowledge or talk about feelings and you’re left with a situation where many men will seriously lack the capacity to understand – or even the knowledge that they should be trying to understand – the feelings of women in their lives.

Professor Hochschild frames the intersection of class and feeling rules somewhat more bluntly than I have:

More precisely, the class messages that parents pass on to their children may be roughly as follows. Middle class: “Your feelings count because you are (or will be) considered important by others.” Lower class: “Your feelings don’t count because you aren’t (or won’t be) considered important by others.”

Note that this was written in the 80s and Professor Hochschild did suggest that orientation towards controlling emotions might soon (after the time of publication) cut across class lines due to the advent of automation. To an extent I think this has been borne out, but I feel like there is also an aesthetic element here. Class determines what emotions are acceptable to show (although of course this relationship is complicated and fickle, much like fashion), which also determines what people are raised to be able to do.

For a book that was supposed to focus on emotional labour, remarkably little of it concerned actual interviews with labourers. The case studies here were much less in-depth than in “The Second Shift” or “Strangers in Their Own Land”. This necessarily made the book harder to read and more academic and dry in tone. Ethnography often gives me the thrill of meeting (vicariously) interesting people (and of discovering that people I haven’t given much thought to are shockingly interesting!), but I found that distinctly lacking in this book.

(It’s much more theoretical than practical and I have to say that I prefer Professor Hochschild’s more practical books.)

That’s not to say the book wasn’t interesting or thought-provoking. On the contrary, I often found thinking about it overwhelming. It introduced me to powerful models in areas of my life where I’d previously done little modelling.

If you want to better understand emotion, I recommend this book. If you want to read an entertaining ethnography, or see in-depth case studies of how emotional management ties into work, I’m less certain that you should. If you want an introduction to Professor Hochschild’s work, I also recommend skipping this one until you’ve read “The Second Shift”; that book is much more focused and somewhat better written.

Really, I think that my view of The Managed Heart illustrates a common problem, known to anyone who goes back and reads the earlier (and less polished) work of a beloved author. People grow, change, and develop. I can see some of the things I loved about Professor Hochschild’s later work here, but many other parts were missing.

Luckily, Professor Hochschild has written several other books and they undoubtedly have more of what I like most about her. My ambivalence about the style (although not the contents) of this book has not at all dulled my resolve to read more of her work. Expect to see more reviews of Professor Hochschild’s books here in the future.

Model, Politics

Four Narratives on Mohammed Bin Salman

[10 minute read]

Since June 21st of this year, Mohammed bin Salman (often known by his initials, MBS) has been the crown prince of Saudi Arabia. This required what was assuredly not a palace coup, because changes of government or succession are never coups, merely “similar to coups”, “coup-like”, “coup-esque”, or “coupLite™” [1]. As crown prince, MBS has championed a loosening of religious restrictions on women and entertainment, a decrease in reliance on oil for state revenues, and a harder line with Qatar and Iran.

Media coverage has been, uh, split. Here’s an editorial in The Washington Post comparing MBS to Putin, while an editorial in The New York Times fawningly declares “Saudi Arabia’s Arab Spring, at Last” [2]. Given that there’s so much difference in opinion on MBS, I thought it might be useful to collect and summarize some of the common narratives, before giving my own perspective on the man.

MBS as the Enlightened Despot

Historical Archetype: Frederick the Great.
Proponents: Al Arabiya [3], optimistic western journalists.
Don’t talk to them about: The war in Yemen, the blockade of Qatar, the increased stifling of dissent.

Exemplified by the fawning column above, this school of thought holds that MBS is a dynamic young leader who will reform the Saudi economy, end its dependence on oil, overhaul its institutions, end corruption, and “restore” a more moderate form of Islam.

They point to several initiatives that back this up. There’s the Vision 2030 plan that aims to spur entrepreneurship and reduce corruption. There are much-needed educational reforms. There’s the decision to allow women to drive and attend sports games. There’s the lifting of bans on entertainment. For some of them, the ambiguous clamp-down on “corruption” is even further evidence that MBS is very serious about his reforms.

To supporters, MBS has achieved much in very little time, which they take to be clear evidence of a strong work ethic and a keen intelligence. His current crop of reforms gives them clear hope that clerical power can be shattered and Saudi Arabia can one day become a functioning, modern democracy.

MBS as a character in Game of Thrones

Historical Archetype: Richard Nixon.
Proponents: Cynical western journalists, Al Jazeera.
Don’t talk to them about: How real-life politics is never actually as interesting or well planned as Game of Thrones.

Cersei Lannister’s quotable warning, that “when you play a game of thrones you win or you die”, might imply that MBS is on somewhat shaky ground. Proponents of the first view might dispute that, and proponents of the next might rejoice in it. Proponents of this view point out that, so far, MBS seems to be winning.

By isolating Qatar and launching a war in Yemen, he has checked Iranian influence on the Arabian Peninsula. Whether or not it’s valid, his corruption crackdown has sidelined many potential sources of competition (and will probably net much-needed liquid cash for the state coffers; it is ironic that the Saudi state now turns to sources of liquidity other than the literal liquid that made it so rich). His conflict with Qatar might yet result in the shutdown of Al Jazeera, the most popular TV channel in the Arabic-speaking world and long a thorn in the side of Saudi Arabian autocracy.

People who view the conflict through this lens either aren’t particularly concerned with right or wrong (e.g. westerners who just want to get their realpolitik fix) or think that the very fact that MBS might be engaging in HBO-worthy realpolitik proves he is guilty of a grave crime (e.g. Al Jazeera, westerners worrying that the region might become even more unstable).

MBS as an overreaching tyrant

Historical Archetype: Joseph II (epitaph: “Here lies Joseph II, who failed in all he undertook.”)
Proponents: Arab Spring activists and their allies.
Don’t talk to them about: How much better MBS is than any plausible alternative.

Saudi Arabia is a rentier state with an unusual relationship with its population. Saudi state revenues are not derived from taxation (which almost invariably results in calls for responsible government), but instead from oil money. This money is distributed back to citizens via cushy government jobs. In Saudi Arabia, two-thirds of citizen employment is in the public sector. The private sector is almost wholly the purview of expats, who (if I’m reading the latest official Saudi employment report right) hold 75% of the non-governmental jobs [4].

With oil set to become obsolete in the next fifty years, Saudi Arabia is in a very bad position. The only thing that can save it is a diversified economy, but the path there isn’t smooth. Overarching reform of an economy is difficult and normally relies on extensive, society-wide consultation. Proponents of this theory see MBS as intent on centralizing power so that he can achieve this transformation single-handedly.

They note that the reversal of the ban on women driving has been paired with intense pressure on the very activists who originally agitated for its removal, pressure to say nothing and to avoid celebrations. They also note that the anti-corruption sweep conveniently removes many people who could have stood in MBS’s way as he embarks on his reforms and expropriates their wealth for the state [5]. They note that independent economists and other civil society figures – just the sort of people who could have provided (and did provide) nuanced feedback on Vision 2030 – have found themselves suddenly detained on MBS’s orders.

Proponents of this theory believe that MBS is trying to modernize Saudi Arabia, but that he is doomed to fail in his attempts without building a (possibly democratic) consensus around the direction of the kingdom. They believe that Saudi Arabia cannot have the civil society necessary for reform until the government stops viewing rights as something it gives the citizens (and that they must be grateful for) and starts viewing them as an inherent human birthright.

If you believe this, you’ll most likely see MBS as moving the kingdom further from this ideal. And you might see the invasion and ongoing war in Yemen as the sort of cluster-fuck we can expect from MBS’s too-rapid attempts to accumulate and use power.

My View

I would first like to note that one advantage of caricaturing other views and then providing a synthesis is that you get to appear reasonable and nuanced by comparison. I’m going to claim that as my reward for going through the work to post this, but please do remember that other people have nuanced views too. I got where I am by reading or listening to them!

My overarching concern with respect to Saudi Arabia is checking the spread of Wahhabi fundamentalism. Saudi Arabia has been exporting this worldwide, with disastrous effects. Wahhabism may not be the official ideology of the so-called Islamic State (Daesh), but it is inextricably tied to their barbarism. Or rather, their barbarity is inextricably tied to and influenced by Wahhabism. It is incredibly easy to find articles by authors, Muslim or not (many of them academics), noting the connection between Wahhabism and terrorism.

The takfiri impulses of Wahhabism [6] underlie the takfiri doctrine so beloved of Daesh. Of course, the vast, vast majority of Wahhabis engage in neither terrorism, nor public executions of (by Canadian standards) innocent people. But insofar as those things do happen in the Sunni world, Wahhabi men are unusually likely to be the perpetrators. It is tempting to go further, to claim that conservatives are wrong – that there is no Islamic terrorism problem, merely a Wahhabi terrorism problem [7] – but this would be false.

(There is terrorism conducted by Shia Muslims and by other Sunni sects and to call terrorism a solely Wahhabi problem makes it sound like there are no peaceful Wahhabis. A much more accurate (and universal, as this is true across almost all religions and populations) single cause would be masculinity, as almost all terrorists are men.)

Still, the fact that so much terrorism can be traced back to a close western ally [8] is disquieting and breeds some amount of distrust of the west in some parts of the Islamic world (remember always that Muslims are the primary victims of Islamic terrorism; few have better reasons to despise Islamic terrorism than the terrorists’ co-religionists and most-frequent victims).

Beyond terrorist groups like Daesh, Wahhabism fuels sectarian conflicts, strips rights from women, makes life even more dangerous for queer people in Muslim countries, and leads to the arrest and persecution of atheists. I am in general a staunch liberal and I believe that most religions can coexist peacefully and many represent paths towards human flourishing. I do not believe this about Wahhabism. It stifles flourishing and breeds misery wherever it lands. It must be stopped.

The fact that Wahhabism at home is a problem for MBS (the Wahhabi clergy is an alternative, non-royal power centre that he can’t directly control) could give me some hope that he might stop supporting Wahhabism. Certainly he has made statements to that effect. But it is very unclear if he has any real interest in ending Saudi Arabia’s $100 billion effort to export Wahhabism abroad. I would be unsurprised if he deals with the domestic problems inherent in displacing the clergy (i.e. they might not want to be displaced without a messy fight) by sending the most recalcitrant and troublesome members abroad, where they won’t mess up his own plans.

There’s the added wrinkle of Iran. MBS clearly hates Iran and Wahhabism considers Iranian Shiites heretical by default. MBS could easily hold onto Wahhabism abroad simply for its usefulness in checking Iranian influence.

Second to this concern is my concern for the human rights of Yemenis. MBS launched a war that has been marked by use of cluster munitions and flagrant disregard for civilian casualties. MBS instigated this war and was defense minister for much of its duration. Its existence and his utter failure to hold his troops to humanitarian standards is a major black mark against him.

Finally, I care about human rights inside Saudi Arabia. It seems clear that in general, the human rights situation inside the country will improve with MBS in power. There really doesn’t exist a plausible power centre that is more likely to make the average Saudi freer. That said, MBS has detained activists and presided over the death sentence of peaceful protestors.

The average Saudi who does not rock the boat may see her life improve. But the activists who have struggled for human rights will probably not be able to enjoy them themselves.

What this means is that MBS is better than almost all plausible replacements (in the short-term), but he is by no means a good leader, or a morally upstanding individual. In the long term, he might stunt the very civil society that Saudi Arabia needs to become a society that accepts and promotes human flourishing [9]. And if he fails in his quest to modernize Saudi society, we’re much more likely to see unrest, repression, and a far worse regime than we are to see democratic change.

In the long run, we’re all dead. But before that, Saudi Arabia may be in for some very uncomfortable changes.


[1] As near as I can tell, the change was retroactively made all proper by the Allegiance Council, as soon as the fait was truly accompli. Reports that they approved it beforehand seem to come only from sources with a very vested interest in that being true. ^

[2] There’s something deeply disturbing about a major news organization comparing a change in which unelected despot will lead a brutal dictatorship with a movement that earnestly strove for democratic change. ^

[3] A note on news outlets linked to throughout this post: Al Arabiya is owned by Saudi Arabia and therefore tends to view everything Saudi Arabia does in the best possible light. Al Jazeera is owned by Qatar (which is currently being blockaded by Saudi Arabia) and tends to view the kingdom in the worst possible light. The Arab Tyrants Manual Podcast that informed my own views here is produced by Iyad El-Baghdadi, who was arrested for his Arab Spring reporting by the United Arab Emirates (a close ally of Saudi Arabia) and later exiled. This has somewhat soured his already dim view of Arab dictatorships. ^

[4] Foreigners make up about 53% of the total labour force and almost all of them work in the private sector. Saudis holding private jobs are ~15.5% of the labour force based on these numbers. If we divide 15.5% by 53% plus 15.5%, we get roughly 23% of private jobs held by Saudis. I think for the purposes of this comparison, Saudi Aramco, the state oil giant, counts as the public sector.
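Spelled out, the division above uses the footnote’s own rounded figures and treats every foreign worker as a private-sector employee. Since the footnote only says almost all of them are, this if anything slightly understates the Saudi share:

```latex
\text{Saudi share of private jobs}
  \approx \frac{15.5\%}{53\% + 15.5\%}
  = \frac{15.5}{68.5}
  \approx 0.226
```

That is a little under a quarter of private-sector jobs.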

Remember also that Saudi Arabia has a truly dismal adult labour force participation rate, a side effect of their deeply misogynistic public policy. ^

[5] Furthermore, they point out that it is basically impossible to tell if a Saudi royal is corrupt or not, because there is no clear boundary between the personal fortune of the Saud dynasty and the state coffers. Clearing up this particular ambiguity seems low on the priority list of a man who just bought a half-billion dollar yacht.

(If you’re not too lazy to click on a footnote, but are too lazy to click on a link, it was MBS. MBS bought the giant yacht. Spoilers.) ^

[6] I’ve long held the belief that Wahhabism is dangerous. When talking about this with my Muslim friends, I was often hesitant and apologetic. I needn’t have been. Their vehemence in criticism of Wahhabism often outstripped mine. That was because they had all of my reasons to dislike Wahhabism, plus the unique danger takfir presented to them.

Takfir is the idea that Wahhabis (or their ideological descendants) may deem other Muslims to be infidels if they do not follow Wahhabism’s austere commandments. This often leads to the execution or lynching of more moderate Muslims at the hands of takfiris. As you may have guessed, most North American Muslims could be subject to takfir by Wahhabis or others of their ilk.

Remember: there are Quranic rules of conduct (oft broken, but still existing) that govern how ISIL may treat Christians or Jews. With those they declare takfir against, there are no such niceties. Daesh ecstatically executes Muslims it deems apostates.

Takfir is one of the many reasons that it is easy to find articles by Muslim authors decrying Wahhabism. Many Muslims legitimately fear a form of Islam that would happily deem them heretical and execute them. ^

[7] It is commonly reported that 15 of the 19 September 11 hijackers were Saudi men, brought up on Wahhabism. The link between Wahhabism, takfir, and terrorism is another reason it is common to find non-Wahhabi Muslims opposed to Wahhabism. Here’s a sampling of English language reporting on Daesh from Muslim countries. Indeed, in many sources I’ve read, the word takfiri was exclusively followed by “terrorist” or “terrorists”. ^

[8] It remains baffling and disgusting that politicians like Donald Trump, Theresa May, and Justin Trudeau can claim to oppose terrorism, while also maintaining incredibly close relationships with Saudi Arabia, which was described in a leaked diplomatic cable as “the most significant source of funding to Sunni terrorist groups worldwide”. ^

[9] To create a civil society, Saudi Arabia would need to lift restrictions on the press, give activists some official power, and devolve more power to elected municipalities. Civil society is the corona of pressure groups, advisors, and influencers that exist around a government and allow people to build common knowledge about their desires. Civil society helps you understand just how popular or unpopular a government policy is and gives you a lever to pull if you want to influence it.

A functioning civil society protects a government from its own mistakes (by making an outcry possible before any deed is irreversibly done) and helps ensure that the government is responsible to the will of the people.

That MBS is working hard to prevent civil society shows that he has no desire for feedback and believes he knows better than literally everyone else in the country who is not already his sycophant. I see few ways this could end well. ^

Model, Philosophy

When Remoter Effects Matter

In utilitarianism, “remoter effects” are the results of our actions influencing other people (and are hotly debated). I think that remoter effects are often overstated, especially (as Sir Bernard Williams said in Utilitarianism: For and Against) when they give the conventionally ethical answer. For example, a utilitarian might claim that the correct answer to the hostage dilemma [1] is to kill no one, because killing weakens the sanctity of human life and may lead to more deaths in the future.

When debating remoter effects, I think it’s worthwhile to split them into two categories: positive and negative. Positive remoter effects are when your actions cause others to refrain from some negative action they might otherwise take. Negative remoter effects are when your actions make it more likely that others will engage in a negative action [2].

Of late, I’ve been especially interested in ways that positive and negative remoter effects matter in political disagreements. To what extent will acting in an “honourable” [3] or pro-social way convince one’s opponents to do the same? Conversely, does fighting dirty bring out the same tendency in your opponents?

Some of my favourite bloggers are doubtful of the first proposition:

In “Deontologist Envy”, Ozy writes that we shouldn’t necessarily be nice to our enemies in the hopes that they’ll be nice to us:

In general people rarely have their behavior influenced by their political enemies. Trans people take pains to use the correct pronouns; people who are overly concerned about trans women in bathrooms still misgender them. Anti-racists avoid the use of slurs; a distressing number of people who believe in human biodiversity appear to be incapable of constructing a sentence without one. Social justice people are conscientious about trigger warnings; we are subjected to many tedious articles about how mentally ill people should be in therapy instead of burdening the rest of the world with our existence.

In “The Blues of Self-Regulation”, David Schraub talks about how this specifically applies to Republicans and Democrats:

The problem being that, even when Democrats didn’t change a rule protecting the minority party, Republicans haven’t even blinked before casting them aside the minute they interfered with their partisan agenda.

Both of these points are basically correct. Everything that Ozy says about asshats on the internet is true and David wrote his post in response to Republicans removing the filibuster for Supreme Court nominees.

But I still think that positive remoter effects are important in this context. When they happen (and I will concede that this is rare), it is because you are consistently working against the same political opponents and at least some of those opponents are honourable people. My favourite example here (although it is from war, not politics) is the Christmas Day Truce. This truce was so successful and widespread that high command undertook to move men more often to prevent a recurrence.

In politics, I view positive remoter effects as key to Senator John McCain repeatedly torpedoing the GOP healthcare plans. While Senators Murkowski and Collins framed their disagreements with the law around their constituents, McCain specifically mentioned the secretive, hurried and partisan approach to drafting the legislation. This stood in sharp contrast to Obamacare, which had numerous community consultations, went through committee and took special (and perhaps ridiculous) care to get sixty senators on board.

Imagine that Obamacare had been passed after secret drafting and no consultations. Imagine if Democrats had dismantled even more rules in the Senate. They may have gotten a few more of their priorities passed or had a stronger version of Obamacare, but right now, they’d be seeing all that rolled back. Instead of evidence of positive remoter effects, we’d be seeing a clear case of negative ones.

When dealing with political enemies, positive remoter effects require a real sacrifice. It’s not enough not to do things that you don’t want to do anyway (like all the examples Ozy listed) and certainly not enough to refrain from doing things to third parties. For positive remoter effects to matter at all – for your opponents (even the honourable ones) not to say “well, they did it first and I don’t want to lose” – you need to give up some tools that you could use to advance your interests. Tedious journalists don’t care about you scrupulously using trigger warnings, but may appreciate not receiving death threats on Twitter.

Had right-wingers refrained from doxxing feminist activists (or even applied any social consequences at all against those who did so), all principled people on the left would be refusing to engage in doxxing against them. As it stands, that isn’t the case and those few leftists who ask their fellow travelers to refrain are met with the entirely truthful response: “but they started it!”

This highlights what might be an additional requirement for positive remoter effects in the political sphere: you need a clearly delimited coalition from which you can eject misbehaving members. Political parties are set up admirably for this. They regularly kick out members who fail to act as decorously as their office demands. Social movements have a much harder time, with predictable consequences – it’s far too easy for the most reprehensible members of any group to quickly become the representatives, at least as far as tactics are concerned.

Still, with positive remoter effects, you are not aiming at a movement or party broadly. Instead you are seeking to find those honourable few in it and inspire them on a different path. When it works (as it did with McCain), it can work wonders. But it isn’t something to lay all your hopes on. Some days, your enemies wake up and don’t screw you over. Other days, you have to fight.

Negative remoter effects seem so obvious as to require almost no explanation. While it’s hard (but possible) to inspire your opponents to civility with good behaviour, it’s depressingly easy to bring them down to your level with bad behaviour. Acting honourably guarantees little, but acting dishonourably basically guarantees a similar response. Insofar as honour is a useful characteristic, it is useful precisely because it stops this slide towards mutual annihilation.


[1] In the hostage dilemma, you are one of ten hostages, captured by rebels. The rebel leader offers you a gun with a single bullet. If you kill one of your fellow hostages, all of the survivors (including you) will be set free. If you refuse, all of the hostages (including you) will be killed. You are guarded such that you cannot use the weapon against your captors. Your only option is to kill another hostage, or let all of the hostages be killed.

Here, I think remoter effects fail to salvage the conventional answer and the only proper utilitarian response is to kill one of the other hostages. ^

[2] Here I’m using “negative” in a roughly utilitarian sense: negative actions are those that tend to reduce the total utility of the world. When used towards good ends, negative actions consume some of the positive utility that the ends generate. When used towards ill ends, negative actions add even more disutility. This definition is robust against different preferred plans of action (e.g. it works across liberals and conservatives, who might both agree that political violence tends to reduce utility, even if it doesn’t always reduce utility enough to rule it out in the face of certain ends), but isn’t necessarily robust across all terminal values (e.g. if you care only about reducing suffering and I care only for increasing happiness we may have different opinions on the tendency of reproduction towards good or ill).

Negative actions are roughly equivalent to “defecting”. “Roughly” because it is perhaps more accurate to say that the thing that makes defecting so pernicious is that it involves negative actions of a special class, those that generate extra disutility (possibly even beyond what simple addition would suggest) when both parties engage in them. ^

[3] I used “honourable” in several important places and should probably define it. When discussing actions, I think honourable actions are the opposite of “negative” actions as defined above: actions that tend towards the good, but can be net ill if used for bad ends. When describing “people” as honourable, I’m pointing to people who tend to reinforce norms around cooperation. This is more or less equivalent to being inherently reluctant to use negative actions to advance goals unless provoked.

My favourite example of honour is Salah ad-Din. He sent his own personal physician to tend to King Richard, who was his great enemy, and used his own money to buy back a child kidnapped into slavery. Conveniently for me, Salah ad-Din shows both sides of what it means to be honourable. He personally executed Raynald of Châtillon after Raynald ignored a truce, attacked Muslim caravans, and tortured many of the caravaners to death. To Guy of Lusignan, King of Jerusalem (who was captured in the same battle as Raynald and wrongly feared he was next to die), Salah ad-Din said: “[i]t is not the wont of kings, to kill kings; but that man had transgressed all bounds, and therefore did I treat him thus.” ^

History, Model

Warriors and Soldiers

Epistemic Status: Full of sweeping generalizations because I don’t want to make it 10x longer by properly unpacking all the underlying complexity.

[9 minute read]

In 2006, Dr. Atul Gawande wrote an article in The New Yorker about maternal care entitled “How Childbirth Went Industrial“. It’s an excellent piece from an author who consistently produces excellent pieces. In it, Gawande charts the rise of the C-section, from its origin as a technique so dangerous it was considered tantamount to murder (and consequently banned on living mothers), to its current place as one of the most common surgical procedures carried out in North American hospitals.

The C-section – and epidurals and induced labour – have become so common because obstetrics has become ruthlessly focused on maximizing the Apgar score of newborns. Along the way, the field ditched forceps (possibly better for the mother yet tricky to use or teach), a range of maneuvers for manually freeing trapped babies (likewise difficult), and general anesthetic (genuinely bad for infants, or at least for the Apgar scores of infants).

The C-section has taken the place of much of the specialized knowledge of obstetrics of old, not least because it is easy to teach and easy for even relatively less skilled doctors to get right. When Gawande wrote the article, there was debate about offering women in their 39th week of pregnancy C-sections as an alternative to waiting for labour. Based on the stats, this hasn’t quite come to pass, but C-sections have become slightly more prevalent since the article was written.

I noticed two laments in the piece. First, Gawande wonders at the consequences of such an essential aspect of the human experience being increasingly (and, based on the studies that show forceps are just as good as C-sections, arguably unnecessarily) medicalized. Second, there’s a sense throughout the article that difficult and hard-won knowledge is being lost.

The question facing obstetrics was this: Is medicine a craft or an industry? If medicine is a craft, then you focus on teaching obstetricians to acquire a set of artisanal skills—the Woods corkscrew maneuver for the baby with a shoulder stuck, the Lovset maneuver for the breech baby, the feel of a forceps for a baby whose head is too big. You do research to find new techniques. You accept that things will not always work out in everyone’s hands.

But if medicine is an industry, responsible for the safest possible delivery of millions of babies each year, then the focus shifts. You seek reliability. You begin to wonder whether forty-two thousand obstetricians in the U.S. could really master all these techniques. You notice the steady reports of terrible forceps injuries to babies and mothers, despite the training that clinicians have received. After Apgar, obstetricians decided that they needed a simpler, more predictable way to intervene when a laboring mother ran into trouble. They found it in the Cesarean section.

Medicine would not be the first craft to industrialize. The quasi-mythical King Ludd who gave us the word “Luddite” was said to be a weaver, put out of business by improved mechanical knitting machines. English programs turn out thousands of writers every year, all with an excellent technical command of the English language, but most with none of the emotive power of Gawande. Following the rules is good enough when you’re writing for a corporation that fears to offend, or for technical clarity. But the best writers don’t just know how to follow the rules. They know how and when to break them.

If Gawande were a student of military history, he’d have another metaphor for what is happening to medicine: warriors are being replaced by soldiers.

If you ever find yourself in possession of a spare hour and feel like being lectured breathlessly by a wide-eyed enthusiast, find your local military history buff (you can identify them by their collection of swords or antique guns) and ask them whether there’s any difference between soldiers and warriors.

You can go do this now, or I can fill in, having given this lecture many times myself.

Imagine your favourite (or least favourite) empire from history. You don’t get yourself an empire by collecting bottle caps. To create one, you need some kind of army. To staff your army, you have two options. Warriors, or soldiers.

(Of course, this choice isn’t made just by empires. Their neighbours must necessarily face the same conundrum.)

Warriors are the heroes of movies. They were almost always the product of training that started at a young age and more often than not were members of a special caste. Think medieval European knights, Japanese samurai, or the Hashashin fida’i. Warriors were notable for their eponymous mastery of war. A knight was expected to understand strategy and tactics, riding, shooting, fighting (both on foot and mounted), and wrestling. Warriors wanted to live up to their warrior ethos, which normally emphasized certain virtues, like courage and mercy (to other warriors, not to any common peasant drafted to fight them).

Soldiers were whichever conscripts or volunteers someone could get into a reasonable standard of military order. They knew only what they needed to complete their duties: perhaps one or two simple weapons, how to march in formation, how to cook, and how to repair some of their equipment [1]. Soldiers just wanted to make it through the next battle alive. In service to this, they were often brutally efficient in everything they did. Fighting wasn’t an art to them – it was simple butchery and the simpler and quicker the better. Classic examples of soldiers are the Roman Legionaries, Greek Hoplites, and Napoleon’s Grande Armée.

The techniques that soldiers learned were simple because they needed to be easy to teach to ignorant peasants on a mass scale in a short time. Warriors had their whole childhood for elaborate training.

(Or at least, that’s the standard line. In practice, things were never quite as clear-cut as that; veteran soldiers might have been as skilled as any warrior, for example. The general point remains though: one on one, you would always have bet on a warrior over a soldier.)

But when you talk about armies, a funny thing happens. Soldiers dominated [2]. Individually, they might have been kind of crap at what they did. Taken as a whole though, they were well-coordinated. They looked out for each other. They fought as a team. They didn’t foolishly break ranks, or charge headlong into the enemy. When Germanic warriors came up against Roman soldiers, they were efficiently butchered. The Germans went into battle looking for honour and perhaps a glorious death. The Romans happily gave them the latter and so lived (mostly) to collect their pensions. Whichever empire you thought about above almost certainly employed soldiers, not warriors.

It turns out that discipline and common purpose have counted for rather a lot more in military history than simple strength of arms. Of this particular point, I can think of no better example than the rebellion that followed the Meiji Restoration. The few rebel samurai, wonderfully trained and unholy terrors in single combat, were easily slaughtered by the Imperial conscripts, who knew little more than which side of a musket to point at the enemy.

The very fact that the samurai didn’t embrace the firing line is a point against them. Their warrior code, which esteemed individual skill, left them no room to adopt this devastating new technology. And no one could command them to take it up, because they were mostly prima donnas where their honour was concerned.

I don’t want to be too hard on warriors. They were actually an efficient solution to the problem of national defence if a population was small and largely agrarian, lacked political cohesion or logistical ability, or was otherwise incapable of supporting a large army. Under these circumstances, polities could not afford to keep a large population under arms at all times. This gave them several choices: they could rely on temporary levies, who would be largely untrained; they could have a large professional army that paid for itself largely through raiding; or they could have a small, elite cadre of professional warriors.

All of these strategies had disadvantages. Levies tended to have very brittle morale, and calling up a large proportion of a population makes even a successfully prosecuted war economically devastating. Raiding tends to make your neighbours really hate you, leading to more conflicts. It can also be very bad for discipline and can backfire on your own population in lean times. Professional warriors will always be dwarfed in numbers by opponents using any other strategy.

Historically, it was never as simple as using just one strategy (e.g. European knights were augmented with and eventually supplanted by temporary levies), but there was a clear lean towards one strategy or another in most resource-limited historical polities. It took complex cultural technology and a well-differentiated economy to support a large force of full-time soldiers, and wherever these preconditions were lacking, you just had to make do with what you could get [3].

When conditions suddenly call for a struggle (whether against a foreign adversary, to boost profits, or to cure disease), it is useful to look at how many societal resources are thrown at the fight. When resources are scarce, we should expect to see a few brilliant generalists, or many poorly trained conscripts. When resources are thick on the ground, the amount that can be spent on brilliant people is quickly saturated and the benefits of training your conscripts quickly accrue. From one direction or the other, you’ll approach the concept of soldiers.

Doctors as soldiers, not as warriors, is the concept Gawande is brushing up against in his essay. These new doctors will be more standardized, with less room for individual brilliance, but more affordances for working well in teams. The prima donnas will be banished (as they aren’t good team players, even when they’re brilliant). Dr. Gregory House may have been the model doctor in the Victorian Age, or maybe even in the fifties. But I doubt any hospital would want him now. It may be that this standardization is just the thing we need to overcome persistent medical errors, improve outcomes across the board, and make populations healthier. But I can sympathize with the position that it might be causing us to lose something beautiful.

In software development, where I work, a similar trend can be observed. Start-ups aggressively court ambitious generalists, for whom freedom to build things their way is more important than market-rate compensation and a better incentive than even the lottery that is stock options. At start-ups, you’re likely to see languages that are “fun” to work with, often dynamically typed, even though these languages are often considered less inherently comprehensible than their more “enterprise-friendly” statically typed brethren.

It’s with languages like Java (or its Microsoft clone, C#) and C++ that companies like Google and Amazon build the underlying infrastructure that powers large tracts of the internet. Among the big pure software companies, Facebook is the odd one out for using PHP (and this choice required them to rewrite the code underlying the language from scratch to make it performant enough for their large load).

It’s also at larger companies where teamwork, design documents, and comprehensibility start to be very important (although there’s still room for superstars at all of the big “tech” companies; it’s only in companies further removed from tech, and therefore outside much of the competition for top talent, where being a good team player and writing comprehensible code might top brilliance as a qualifier). This isn’t to say that no one hiring for top talent appreciates things like good documentation or comprehensibility, merely that it is easy for a culture that esteems individual brilliance to overlook these things as a mark of competence.

Here the logic goes that anyone smart enough for the job will be smart enough to untangle the code of their predecessors. As anyone who’s been involved in the untangling can tell you, there’s a big difference between “smart enough to untangle this mess” and “inclined to wade through this genius’s spaghetti code to get to the part that needs fixing”.

No doubt there exist countless other examples in fields I know nothing about.

The point of gathering all these examples and shoving them into my metaphor is this: I think there exist two important transitions that can occur when a society needs to focus a lot of energy on a problem. The transition from conscripts to soldiers isn’t very interesting, as it’s basically the outcome of a process of continuous improvement.

But the transition from warriors to soldiers is. It’s amazing that we can often get better results by replacing a few highly skilled generalists who apply a lot of hard-fought decision-making with a veritable army of less well-trained, but highly regimented and organized specialists. It’s a powerful testament to the usefulness of group intelligence. Of course, sometimes (e.g. Google, or the Mongols) you get both, but these are rare happy accidents.

Being able to understand where this transition is occurring helps you understand where we’re putting effort. Understanding when it’s happening within your own sphere of influence can help you weather it.

Also note that this transition doesn’t only go in one direction. As manufacturing becomes less and less prevalent in North America, we may return to the distant past, when manufacturing stuff was only undertaken by very skilled artisans making unique objects.


[1] Note the past tense throughout much of this essay; when I speak about soldiers and warriors, I’m referring only to times before the 1900s. I know comparatively little about how modern armies are set up.

[2] Best of all were the Mongols, who combined the lifelong training of warriors with the discipline and organization of soldiers. When Mongols clashed with European knights in Hungary, their “dishonourable” tactics (feints, followed by feigned retreats and skirmishing) easily took the day. This was all possible through a system of signal flags that allowed Subutai to command the whole battle from a promontory. European leaders were expected to show their bravery by being in the thick of fighting, which gave them no overall control over their lines.

[3] Historically, professional armies with good logistical support could somewhat pay for themselves by expanding an empire, which brought in booty and slaves. This is distinct from raiding (which does not seek to incorporate other territories) and has its own disadvantages (rebellion, over-extension, corruption, massive unemployment among unskilled labourers, etc.).

Data Science, Literature, Model

Two Ideas Worth Sharing From ‘Weapons of Math Destruction’

Recently, I talked about what I didn’t like in Dr. Cathy O’Neil’s book, Weapons of Math Destruction. This time around, I’d like to mention two parts of it I really liked. I wish Dr. O’Neil had put more effort into naming the concepts she covered; WMD doesn’t give them names, but in my head, I’ve been calling them Hidden Value Encodings and Axiomatic Judgements.

Hidden Value Encodings

Dr. O’Neil opens the book with a description of the model she uses to cook for her family. After going into a lot of detail about it, she makes this excellent observation:

Here we see that models, despite their reputation for impartiality, reflect goals and ideology. When I removed the possibility of eating Pop-Tarts at every meal, I was imposing my ideology on the meals model. It’s something we do without a second thought. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.

It is far too easy to view models as entirely empirical, as math made form and therefore blind to value judgements. But that couldn’t be further from the truth. It’s value judgements all the way down.

Imagine a model that tries to determine when a credit card transaction is fraudulent. Fraudulent credit card transactions cost the credit card company money, because it must refund the stolen amount to the customer. Incorrectly flagging legitimate transactions as fraudulent also costs the company money, either through customer support time, or if the customer gets so fed up with constant false positives that they switch to a different credit card provider.

If one of the major credit card companies tasked you with building a model to predict which transactions were fraudulent, you would probably build into your model a variable cost for failing to catch fraudulent transactions (equal to the amount the company must refund if the transaction is fraudulent) and a fixed cost for labelling innocuous transactions as fraudulent (equal to the average cost of a customer support call, plus the probability that a false positive pushes someone over the edge into switching cards multiplied by the value of their lost business over the next few years).

From this encoding, we can already see that our model would want to automatically approve all transactions below the fixed cost of dealing with false positives [1], while applying increasing scrutiny to more expensive items, especially expensive items with big resale value or items more expensive than the cardholder normally buys (as both of these point strongly toward fraud).

This seems innocuous and logical. It is also encoding at least two sets of values. First, it encodes the values associated with capitalism. At the most basic level, this algorithm “believes” that profit is good and losses are bad. It aims to maximize profit for the bank, and while we may hold this as a default assumption for most algorithms associated with companies, that does not mean it is devoid of values; instead it encodes all of the values associated with capitalism [2]. Second, the algorithm encodes some notion that customers have freedom to choose between alternatives (even more so than is encoded by default in accepting capitalism).

By applying a cost to false positives (and likely it would be a cost that rises with each previous false positive), you are tacitly acknowledging that customers could take their business elsewhere. If customers instead had no freedom to choose who they did business with, you could merely encode as your loss from false positives the fixed cost of fielding support calls. Since outsourced phone support is very cheap, your algorithm would care much less about false positives if there were no consumer choice.
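The cost comparison described in the last few paragraphs can be sketched as a simple decision rule. This is a minimal illustration, not how any real card issuer works: the function names, probabilities, and dollar figures are all hypothetical, and a real system would estimate the fraud probability with a trained model rather than take it as given. Setting the switching probability to zero stands in for a world without consumer choice.

```python
# A toy, cost-based rule for flagging card transactions.
# All names and numbers are hypothetical illustrations.


def false_positive_cost(support_call: float, p_switch: float, lost_business: float) -> float:
    """Expected cost of wrongly flagging a legitimate transaction.

    One support call, plus the chance that this false positive pushes the
    customer into switching cards, times the value of their future business.
    """
    return support_call + p_switch * lost_business


def should_flag(amount: float, p_fraud: float, fp_cost: float) -> bool:
    """Flag when the expected loss from approving exceeds that from flagging.

    Approving a fraudulent transaction costs the full amount (it is refunded).
    Flagging a legitimate one costs fp_cost. Flagging real fraud is treated as free.
    """
    expected_loss_if_approved = p_fraud * amount
    expected_loss_if_flagged = (1 - p_fraud) * fp_cost
    return expected_loss_if_approved > expected_loss_if_flagged


if __name__ == "__main__":
    with_choice = false_positive_cost(support_call=5.0, p_switch=0.02, lost_business=1000.0)
    no_choice = false_positive_cost(support_call=5.0, p_switch=0.0, lost_business=1000.0)
    # The same $200 purchase with a 5% fraud probability:
    print(should_flag(200.0, 0.05, with_choice))  # False: approved when customers can leave
    print(should_flag(200.0, 0.05, no_choice))    # True: flagged when they cannot
```

Under this rule, any transaction with a fraud probability of 50% or less and an amount below the false-positive cost is always approved, which matches the first approximation above (and shares its flaw, discussed in footnote [1]).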

As far as I can tell, there is no “value-free” place to stand. An algorithm in the service of a hospital that helps diagnose patients or focus resources on the most ill encodes the value that “it is better to be healthy than sick; better to be alive than dead”. These values might be (almost-)universal, but they still exist, they are still encoded, and they still deserve to be interrogated when we put functions of our society in the hands of software governed by them.

Axiomatic Judgements

One of the most annoying parts of being a child is the occasional requirement to accept an imposition on your time or preferences with the explanation “because I say so”. “Because I say so” isn’t an argument; it’s a request that you acknowledge adults’ overwhelming physical, earning, and social power as giving them a right to set arbitrary rules for you. Some algorithms, forced onto unwelcoming and less powerful populations (teachers, job-seekers, etc.), have adopted this MO as well. Instead of having to prove that they have beneficial effects or that their outputs are legitimate, they define things such that their outputs are always correct and brook no criticism.

Here’s Dr. O’Neil talking about a value-added teaching model in Washington, D.C.:

When Mathematica’s scoring system tags Sarah Wysocki and 205 other teachers as failures, the district fires them. But how does it ever learn if it was right? It doesn’t. The system itself has determined that they were failures, and that is how they are viewed. Two hundred and six “bad” teachers are gone. That fact alone appears to demonstrate how effective the value-added model is. It is cleansing the district of underperforming teachers. Instead of searching for the truth, the score comes to embody it.

She contrasts this with how Amazon operates: “if Amazon.com, through a faulty correlation, started recommending lawn care books to teenage girls, the clicks would plummet, and the algorithm would be tweaked until it got it right.” The teacher rating algorithm, on the other hand, doesn’t update, doesn’t check whether it is firing good teachers, and doesn’t take an accounting of its own costs. It holds it as axiomatic (a basic fact beyond questioning) that its results are the right results.

I am in full agreement with Dr. O’Neil’s criticism here. Not only does it push past the bounds of fairness to make important decisions, like hiring and firing, through opaque formulae that are not explained to those who are being judged and lack basic accountability, but it’s a professional black mark on all of the statisticians involved.

Whenever you train a model, you hold some data back. This is your test data and you will use it to assess how well your model did. That gets you through to “production” – to having your model out in the field. This is an exciting milestone, not only because your model is now making decisions and (hopefully) making them well, but because now you’ll have way more data. You can see how your new fraud detection algorithm does by the volume of payouts and customer support calls. You can see how your new leak detection algorithm does by customers replying to your emails and telling you if you got it right or not.
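The hold-out step can be sketched in a few lines of standard-library Python. This is a minimal illustration under assumed names (`train_test_split` and `accuracy` here are hand-rolled helpers, not a particular library's API), and the "model" is a stand-in; the point is that the same scoring function used on held-back test data can be rerun later on outcomes collected in production.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold back a fraction as test data."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled rows are held back; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, labelled_rows):
    """Fraction of (features, label) pairs that the predictor gets right."""
    correct = sum(1 for features, label in labelled_rows if predict(features) == label)
    return correct / len(labelled_rows)


if __name__ == "__main__":
    data = [(x, x % 2) for x in range(100)]  # toy (feature, label) pairs
    train, test = train_test_split(data)
    model = lambda x: x % 2  # stand-in for a model fitted on `train`
    print(len(train), len(test), accuracy(model, test))
```

Once the model is in production, real-world outcomes become new labelled rows, and scoring them with the same `accuracy` function is how you learn whether the model is still right.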

A friend of mine who worked in FinTech once told me that they approved 1.5% of everyone who applied for their financial product, no matter what. They’d keep the score their model gave to that person on record, then see how the person fared in reality. If they used the product responsibly despite a low score, or used it recklessly despite a high score, it was viewed as valuable information that helped the team make their model that much better. I can imagine a team of data scientists, heads together around a monitor, looking through features and asking each other “huh, do any of you see what we missed here?” and it’s a pleasant image [3].
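The friend's approach is a form of exploration: a small random slice of decisions ignores the model so that its mistakes can surface. Below is a minimal sketch assuming a single score threshold. The 1.5% figure comes from the anecdote, but every name (`decide`, `informative_cases`, the `"responsible"` outcome label) is hypothetical, not the company's actual system.

```python
import random

EXPLORE_RATE = 0.015  # approve 1.5% of applicants no matter what


def decide(applicant_id, score, threshold, rng, log):
    """Approve on score, plus a random 1.5% sample; record every decision."""
    explored = rng.random() < EXPLORE_RATE
    approved = score >= threshold or explored
    log.append({"id": applicant_id, "score": score,
                "approved": approved, "explored": explored})
    return approved


def informative_cases(log, outcomes, threshold):
    """IDs of applicants the model scored below threshold, approved only
    because of exploration, who then used the product responsibly."""
    return [
        entry["id"]
        for entry in log
        if entry["explored"]
        and entry["score"] < threshold
        and outcomes.get(entry["id"]) == "responsible"
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    log = []
    for i in range(10_000):
        decide(i, score=rng.random(), threshold=0.7, rng=rng, log=log)
    # Pretend (hypothetically) that every approved applicant turned out fine.
    outcomes = {e["id"]: "responsible" for e in log if e["approved"]}
    print(len(informative_cases(log, outcomes, threshold=0.7)))
```

High-scored applicants who behave recklessly need no exploration to find, since they are approved anyway; it is the low-scored, responsible applicants that only a random sample can reveal.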

Value-added teaching models, or psychological pre-screens for hiring, do nothing of the sort (even though it would be trivial for them to!). They give results and those results are defined as the ground truth. There’s no room for messy reality to work its way back into the cycle. There’s no room for the creators to learn. The algorithm will be flawed and imperfect, like all products of human hands. That is inevitable. But it will be far less perfect than it could be. Absent feedback, it is doomed to always be flawed, in ways both subtle and gross, and in ways unknown to its creators and victims.

Like most Canadian engineering students, I made a solemn vow:

…in the presence of these my betters and my equals in my calling, [I] bind myself upon my honour and cold iron, that, to the best of my knowledge and power, I will not henceforward suffer or pass, or be privy to the passing of, bad workmanship or faulty material in aught that concerns my works before mankind as an engineer…

Sloppy work, like that value-added teacher model, is the very definition of bad workmanship. Would that I never suffer something like that to leave my hands and take life in the world! It is no Quebec Bridge, but the value-added teaching model and other doomed-to-fail algorithms like it represent a slow-motion accident, steadily stealing jobs and happiness from people with no appeal or remorse.

I can accept stains on the honour of my chosen profession. Those are inevitable. But in a way, stains on our competence are so much worse. Models that take in no feedback are both, but the second really stings me.


[1] This first approximation isn’t correct in practice, because certain patterns of small transactions are consistent with fraud. I found this out the hard way, when a certain Bitcoin exchange’s credit card verification procedure (withdrawing less than a dollar, then refunding it a few days later, after you tell them how much they withdrew) triggered the fraud detection software at my bank. Apparently credit card thieves will often do a similar thing (minus the whole “ask the cardholder how much was withdrawn” step), as a means of checking if the card is good without cluing in the cardholder.

[2] I don’t mean this as a criticism of capitalism. I seek merely to point out that, like all other economic systems, capitalism is neither value-neutral nor inevitable. “Capitalism” encodes values like “people are largely rational”, “people often act to maximize their gains” and “choice is fundamentally good and useful”.

If socialist banks had ever made it to the point of deploying algorithms (instead of collapsing under the weight of their flawed economic system), those algorithms would also encode values (like “people will work hard for the good of the whole” and “people are inherently altruistic” and “it is worth it to sacrifice efficiency in the name of fairness”).

[3] Dulce et decorum est… get the fucking data science right.

Data Science, Literature, Model

Two Fallacies From ‘Weapons of Math Destruction’

Much thanks to Cody Wild for providing editing and feedback. That said, I would like to remind my readers that I deserve full credit for all errors and that all opinions expressed here are only guaranteed to be mine.

[12 minute read]

I recently read Weapons of Math Destruction by Dr. Cathy O’Neil and found it an enormously frustrating book. It’s not that the whole book was rubbish; that would have made things easy. No, the real problem with this book is that the crap and the pearls were so closely mixed that I had to stare at every sentence very, very carefully in hopes of figuring out which was which. There’s some good stuff in here. But much of Dr. O’Neil’s argumentation relies on two new (to me) fallacies. It’s these fallacies (which I’ve dubbed the Ought-Is Fallacy and the Availability Bait-and-Switch) that I want to explore today.

Ought-Is Fallacy

It’s a commonly repeated truism that “correlation doesn’t imply causation”. People who’ve been around the statistics block a bit longer might echo Randall Munroe and retort that “correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’”. Understanding why a graph like this:

In addition to this graph obviously being anchored, using it is obviously fair use.
Image Copyright The New York Times, 2017. Used here for purposes of commentary and criticism.

is utter horsecrap [1], despite how suggestive it looks, is the work of a decent education in statistics. Here correlation doesn’t imply causation. On the other hand, it’s not hard to find excellent examples where correlation really does mean causation:

This would be a risky graph to use if echo chambers didn't mean that I know literally no one who doesn't believe in global warming
Source: The National Centers for Environmental Information. Having to spell “centre” wrong and use inferior units is a small price to pay for the fact that the American government immediately releases everything it creates into the public domain.

When trying to understand the ground truth, it’s important that you don’t confuse correlation with causation. But not every human endeavour is aimed at determining the ground truth. Some endeavours really do just need to understand which activities and results are correlated. Principal among these is insurance.

Let’s say I wanted to sell you “punched in the face” insurance. You’d pay a small premium every month and if you were ever punched in the face hard enough to require dental work, I’d pay you enough to cover it [2]. I’d probably charge you more if you were male, because men are much, much more likely to be seriously injured in an assault than women are.

I’m just interested in pricing my product. It doesn’t actually matter if being a man causes more assaults or is just correlated with them. It doesn’t matter if men aren’t inherently more likely to assault and be assaulted compared to women (for a biological definition of “inherently”). It doesn’t matter what assault rates would be like in a society without toxic masculinity. One thing and one thing alone matters: on average, I will have to pay out more often for men. Therefore, I charge men more.
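The pricing logic in the last paragraph needs only observed claim rates, never an explanation of them. Here is a minimal sketch; the helper names, the made-up claims history, the $2,000 payout, and the 1.2 loading factor (a markup to cover costs and profit) are all hypothetical.

```python
from collections import defaultdict


def claim_rates(history):
    """Observed claim frequency per group from (group, made_claim) records,
    one record per policy-year."""
    counts = defaultdict(lambda: [0, 0])  # group -> [claims, policy-years]
    for group, made_claim in history:
        counts[group][0] += int(made_claim)
        counts[group][1] += 1
    return {group: claims / total for group, (claims, total) in counts.items()}


def monthly_premium(annual_claim_rate, payout, loading=1.2):
    """Expected annual payout, marked up by a loading factor, split over 12 months."""
    return annual_claim_rate * payout * loading / 12


if __name__ == "__main__":
    # Made-up history: 3 claims per 100 policy-years for men, 1 per 100 for women.
    history = ([("men", True)] * 3 + [("men", False)] * 97
               + [("women", True)] * 1 + [("women", False)] * 99)
    rates = claim_rates(history)
    for group, rate in rates.items():
        # Nothing here asks *why* the rates differ; only that they do.
        print(group, round(monthly_premium(rate, payout=2000.0), 2))
```

The code has no notion of causation at all: swap in any grouping that correlates with claims and it prices accordingly, which is exactly the insurer's indifference described above.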

If you were to claim that, because there may be nothing inherent in maleness that causes assault and being assaulted, men shouldn’t have to pay more, you would be making a moral argument, not an empirical one. You would also be committing the ought-is fallacy. Just because your beliefs tell you that some aspect of the world should be a certain way, or that it would be more moral for the world to be a certain way, does not mean the world actually is that way or that everyone must agree to order the world as if that were true.

This doesn’t prevent you from making a moral argument that we should ignore certain correlates in certain cases in the interest of fairness; it merely means you should not make an empirical argument about what is ultimately a question of values.

The ought-is fallacy came up literally whenever Weapons of Math Destruction talked about insurance, as well as when it talked about sentencing disparities. Here’s one example:

But as the questions continue, delving deeper into the person’s life, it’s easy to imagine how inmates from a privileged background would answer one way and those from tough inner-city streets another. Ask a criminal who grew up in comfortable suburbs about “the first time you were ever involved with the police,” and he might not have a single incident to report other than the one that brought him to prison. Young black males, by contrast, are likely to have been stopped by police dozens of times, even when they’ve done nothing wrong. A 2013 study by the New York Civil Liberties Union found that while black and Latino males between the ages of fourteen and twenty-four made up only 4.7 percent of the city’s population, they accounted for 40.6 percent of the stop-and-frisk checks by police. More than 90 percent of those stopped were innocent. Some of the others might have been drinking underage or carrying a joint. And unlike most rich kids, they got in trouble for it. So if early “involvement” with the police signals recidivism, poor people and racial minorities look far riskier.

Now I happen to agree with Dr. O’Neil that we should not allow race to end up playing a role in prison sentence length. There are plenty of good things to include in a sentence length: seriousness of crime, remorse, etc. I don’t think race should be one of these criteria, and since the sequence of events that Dr. O’Neil mentions makes this far from the default in the criminal justice system, I think doing more to ensure race stays out of sentencing is an important moral responsibility we have as a society.

But Dr. O’Neil’s empirical criticism of recidivism models is entirely off base. In this specific example, she is claiming that some characteristics that correlate with recidivism should not be used in recidivism models even though they improve the accuracy, because they are not per se causative of crime.

Because of systematic racism and discrimination in policing [3], the recidivism rate among black Americans is higher. If the only thing you care about is maximizing the prison sentence of people who are most likely to re-offend, then your model will tag black people for longer sentences. It does not matter what the “cause” of this is! Your accuracy will still be higher if you take race into account.

To say “black Americans seem to have a higher rate of recidivism, therefore we should punish them more heavily” is almost to commit the opposite fallacy, the is-ought. Instead, we should say “yes, empirically there’s a high rate of recidivism among black Americans, but this is probably caused by social factors and regardless, if we don’t want to create a population of permanently incarcerated people, with all of the vicious cycle of discrimination that this creates, we should aim for racial parity in sentencing”. This is a very strong (and I think persuasive) moral claim [4].

It certainly is more work to make a complicated moral claim that mentions the trade-offs we must make between punishment and fairness (or between what is morally right and what is expedient) than it is to make a claim that makes no reference to these subtleties. When we admit that we are sacrificing accuracy in the name of fairness, we do open up an avenue for people to attack us.

Despite this disadvantage, I think keeping our moral and empirical claims separate is very important. When you make the empirical claim that “being black isn’t causative of higher rates of recidivism, therefore the models are wrong when they rank black Americans as more likely to reoffend”, instead of the corresponding ethical claim, then you are making two mistakes. First, there’s lots of room to quibble about what “causative” even means, beyond simple genetic causation. Because you took an empirical and not ethical position, you may have to fight any future evidence to the contrary of your empirical position, even if the evidence is true; in essence, you risk becoming an enemy of the truth. If the truth becomes particularly obvious (and contrary to your claims) you risk looking risible and any gains you achieved will be at risk of reversal.

Second, I would argue that it is ridiculous to claim that universal human rights must rest on claims of genetic identicalness between all groups of people (and trying to make the empirical claim above, rather than a moral claim, implicitly embraces this premise). Ashkenazi Jews are (on average) about 15 IQ points ahead of other groups. Should we give them any different moral worth because of this? I would argue no [5]. The only criterion for full moral worth as a human, and for all the universal rights to which every human is entitled, is being human.

As genetic engineering becomes possible, it will be especially problematic to have a norm that the moral worth of humans can be modified by their genetic predisposition to pro-social behaviour. Everyone, but most especially the left, which views diversity and flourishing as among its most important projects, should push back against both the is-ought and ought-is fallacies and fight for an expansive definition of universal human rights.

Availability Bait-and-Switch

Imagine someone told you the following story:

The Fair Housing Act has been an absolute disaster for my family! My brother was trying to sublet his apartment to a friend for the summer. Unfortunately, one of the fair housing inspectors caught wind of this and forced him to put up notices that it was for rent. He had to spend a week showing random people around it and some snot-nosed five-year-old broke one of his vases while he was showing that kid’s mother around. I know there were problems before, but is the Fair Housing Act really worth it if it can cause this?

Most people would say the answer to the above is “yes, it really was worth it, oh my God, what is wrong with you?”

But it’s actually hard to think that, because you just read a long, vivid, easily imaginable example of exactly what was wrong with the current regime and only a quick throwaway reference to there being problems with the old way of doing things. Some people might say that it’s better to at least mention that the other way of doing things had its problems too. I disagree strenuously.

When you make a throwaway reference to problems with another way of doing things, while focusing all of your descriptive effort on the problems of the current way (or vice versa), you are committing the Availability Bait-and-Switch. You are also creating a false impression of balance: people will remember that you mentioned both had problems, but that is not the impression they will take away. You will have tricked your readers into thinking you gave a balanced treatment (or at least paved the way for a defence against claims that you didn’t) while doing nothing of the sort!

We are all running corrupted hardware. One of the most notable cognitive biases we have is the availability heuristic. We judge probabilities based on what we can easily recall, not on any empirical basis. If you were asked “are there more words in the average English language book that start with k, or have k as the third letter?”, you’d probably say “start with k!” [6]. In fact, words with “k” as the third letter show up more often. But these words are harder to recall and therefore much less available to your brain.
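The k question is a counting problem, which is exactly what recall is bad at and a computer is good at. A minimal Python sketch that tallies both categories for whatever text you give it; the function name `k_positions` and the rule that a "word" is any run of ASCII letters are simplifying assumptions made for illustration:

```python
import re
from collections import Counter


def k_positions(text: str) -> Counter:
    """Count words whose first letter is 'k' versus words whose third letter is 'k'.

    A "word" here is any run of ASCII letters, lowercased; this is a
    simplifying assumption, not a linguistic definition.
    """
    counts = Counter(first=0, third=0)
    for word in re.findall(r"[a-z]+", text.lower()):
        if word[0] == "k":
            counts["first"] += 1
        if len(word) >= 3 and word[2] == "k":
            counts["third"] += 1
    return counts


# An invented sentence, used only to show the function working.
sample = "The baker asked the kind joker to make a cake."
print(k_positions(sample))
```

The sample sentence is contrived and says nothing about English in general; to test the claim itself, you would feed the function the full text of real books.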

If I were to give you a bunch of very vivid examples of how algorithms can ruin your life (as Dr. O’Neil repeatedly does, most egregiously in chapters 1, 5, and 8) and then mention off-hand that human decision making also used to ruin a lot of people’s lives, you’d probably come out of our talk much more concerned with algorithms than with human decision making. This was a thing I had to deliberately fight against while reading Weapons of Math Destruction.

That fight was necessary because, for a book about how algorithms are destroying everything, there was a remarkable paucity of data on this destruction. I cannot recall seeing any comparative analysis (backed up by statistics, not anecdotes) of the costs and benefits of human decision making and algorithmic decision making, as applied to Dr. O’Neil’s areas of focus. The book was all the costs of one and a vague allusion to the potential costs of the other.

If you want to give your readers an accurate snapshot of the ground truth, your examples must be representative of the ground truth. If algorithms cause twice as much damage as human decision making in certain circumstances (and again, I’ve seen zero proof that this is the case) then you should interleave every two examples of algorithmic destruction with one of human pettiness. As long as you aren’t doing this, you are lying to your readers. If you’re committed to lying, perhaps for reasons of pithiness or flow, then drop the vague allusions to the costs of the other way of doing things. Make it clear you’re writing a hatchet job, instead of trying to claim epistemic virtue points for “telling both sides of the story”. At least doing things that way is honest [7].
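Mechanically, the proportional interleaving described above is a round-robin that takes a fixed number of examples from each pile per round. A minimal Python sketch; the function name, the 2:1 ratio in the usage line, and the placeholder labels are all hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave_by_ratio(a: Sequence[T], b: Sequence[T], per_a: int, per_b: int) -> List[T]:
    """Alternate blocks of `per_a` items from `a` with `per_b` items from `b`.

    If one sequence runs out first, the rest of the other is still emitted,
    so no example is silently dropped.
    """
    if per_a < 1 or per_b < 1:
        raise ValueError("ratios must be positive integers")
    out: List[T] = []
    i = j = 0
    while i < len(a) or j < len(b):
        out.extend(a[i:i + per_a])  # slicing past the end just yields nothing
        i += per_a
        out.extend(b[j:j + per_b])
        j += per_b
    return out


algorithmic = ["A1", "A2", "A3", "A4"]  # hypothetical algorithmic-harm anecdotes
human = ["H1", "H2"]                    # hypothetical human-harm anecdotes
print(interleave_by_ratio(algorithmic, human, 2, 1))
```

The sketch only enforces whatever ratio it is given; the honest part is choosing that ratio from data on relative harm rather than from which stories are most vivid.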


[1] This is a classic example of “anchoring”, a phenomenon where you appear to have a strong correlation in a certain direction because of a single extreme point. When you have anchoring, it’s unclear how generalizable your conclusion is – as the whole direction of the fit could be the result of the single extreme point.

Here’s a toy example:

Note that the thing that makes me suspicious of anchoring here is that we have a big hole with no data and no way of knowing what sort of data goes there (it’s not likely we can randomly generate a bunch of new countries and plot their gun ownership and rate of mass shootings). If we did some more readings (ignoring the fact that in this case we can’t) and got something like this:

I would no longer be worried about anchoring. It really isn’t enough just to look at the correlation coefficient either. The image labelled “Also Not Anchored” has a marginally lower correlation coefficient than the anchored image, even though (I would argue) it is FAR more likely to represent a true positive correlation. Note also we have no way to tell that more data will necessarily give us a graph like the third. We could also get something like this:

Here we have a fairly clear pattern of noisy data averaging 2.5 irrespective of our x-value, with a pair of outliers driving a slight positive correlation.

Also, the NYT graph isn’t normalized to population, which is kind of a WTF-level mistake. They include another graph that is normalized later on, but the graph I show is the preview image on Facebook. I was very annoyed with the smug liberals in the comments of the NYT article, crowing about how conservatives are too stupid to understand statistics. But that’s a rant for another day…

[2] I’d very quickly go out of business because of the moral hazard and adverse selection built into this product, but that isn’t germane to the example.

[3] Or at least, this is my guess as to the most plausible factors in the recidivism rate discrepancy. I think social factors – especially when social gaps are so clear and pervasive – seem much more likely than biological ones. The simplest example of the disparity in policing – and its effects – is the relative rates of being stopped by police during Stop and Frisk given above by Dr. O’Neil.

[4] It’s possible that variations in monoamine oxidase A or some other gene amongst populations might make some populations more predisposed (in a biological sense) to violence or other antisocial behaviour. Given that violence and antisocial behaviour are relatively uncommon (e.g. about six in every one thousand Canadian adults are incarcerated or under community supervision on any given day), any genetic effect that increases them would both be small on a social level and lead to a relatively large skew in terms of supervised populations.

This would occur in the same way that repeat offenders tend to be about one standard deviation below median societal IQ, even though the correlation between IQ and crime explains very little of the variation in crime. This effect exists because crime is so rare.
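One way to make the rarity argument concrete is a liability-threshold model, which is an assumption made here for illustration rather than anything the argument depends on: suppose an underlying propensity is normally distributed and only people above a high cutoff end up under supervision. Set the cutoff so that six in a thousand of a baseline group cross it, shift a second group's mean up by a small, arbitrary 0.2 standard deviations, and compare the two groups overall and in the tail:

```python
from statistics import NormalDist

BASE_RATE = 0.006  # six in a thousand, the supervision figure quoted above
SHIFT_SD = 0.2     # hypothetical small mean shift, in standard deviations

# Assumed latent "propensity" scores: standard normal for the baseline group,
# the same spread but a slightly higher mean for the comparison group.
baseline = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=SHIFT_SD, sigma=1.0)

# Cutoff chosen so that exactly BASE_RATE of the baseline group lies above it.
cutoff = baseline.inv_cdf(1.0 - BASE_RATE)

tail_baseline = 1.0 - baseline.cdf(cutoff)
tail_shifted = 1.0 - shifted.cdf(cutoff)

# How similar the two groups are overall vs. how skewed the rare tail is.
overlap = baseline.overlap(shifted)
tail_ratio = tail_shifted / tail_baseline

print(f"overall overlap of the two distributions: {overlap:.0%}")
print(f"over-representation above the cutoff:     {tail_ratio:.2f}x")
```

Under these made-up parameters the two distributions overlap by roughly 92%, yet the shifted group is about 1.7 times as common above the cutoff. The specific numbers mean nothing; the shape of the effect (a small average difference producing a large skew in a rare tail) is the point.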

It is unfortunately easy for people to take a claim like “Group X is 5% more likely to be violent” and believe that any given member of Group X has something like a 5% chance of assaulting them. This obviously isn’t true. Given that there are about 7.5 assaults for every 1000 Canadians each year, a population that was instead 100% Group X (with its presumed 5% higher assault rate) would see about 7.875 assaults per 1000 people, a difference of about one additional assault per 2,700 people.
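The arithmetic in the paragraph above, spelled out. The 7.5 per 1000 is the approximate figure quoted there, and the 5% is the hypothetical relative increase:

```python
BASE_ASSAULTS_PER_1000 = 7.5  # approximate Canadian figure quoted above
RELATIVE_INCREASE = 0.05      # hypothetical "5% more likely"

group_x_per_1000 = BASE_ASSAULTS_PER_1000 * (1 + RELATIVE_INCREASE)
extra_per_1000 = group_x_per_1000 - BASE_ASSAULTS_PER_1000

# How many people it takes, on average, to produce one additional assault per year.
people_per_extra_assault = 1000 / extra_per_1000

print(f"Group X rate:   {group_x_per_1000:.3f} per 1000")
print(f"Extra assaults: {extra_per_1000:.3f} per 1000")
print(f"One extra assault per {people_per_extra_assault:,.0f} people")
```

A 5% relative increase on a rare event is a tiny absolute one: 0.375 extra assaults per 1000 people, or roughly one per 2,667 people.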

Unfortunately, if society took its normal course, we could expect to see Group X very overrepresented in prison. As soon as Group X got a reputation for violence, juries would be more likely to convict, bail would be less likely, sentences might be longer (out of fear of recidivism), etc. Because many jobs (and in America, social benefits and rights) are withdrawn after you’ve been sentenced to jail, formerly incarcerated members of Group X would see fewer legal avenues to make a living. This could become even worse if non-criminal members of Group X were also denied some jobs due to fear of future criminality, leaving Group X members with few options beyond the black and grey economies and further tightening the spiral of incarceration and discrimination.

In this case, I think the moral thing to do as a society is to ignore any evidence we have about between-group differences in genetic propensities to violence. Ignoring results isn’t the same thing as pretending they are false or banning research; we aren’t fighting against truth, simply saying that a small gain in our ability to predict violence is not worth the social cost that Group X would face in a society that is entirely unable to reason productively about statistics.

[5] Although we should be ever vigilant against people who seek to do the opposite and use genetic differences between Ashkenazi Jews and other populations as a basis for their Nazi ideology. As Hannah Arendt said, the Holocaust was a crime against humanity perpetrated on the body of the Jewish people. It was a crime against humanity (rather than “merely” a crime against Jews) because Jews are human.

[6] Or at least, you would if I hadn’t warned you that I was about to talk about biases.

[7] My next blog post is going to be devoted to what I did like about the book, because I don’t want to commit the mistakes I’ve just railed against (and because I think there was some good stuff in the book that bears reviewing).