I’ve seen superstition develop first hand. It happened in one of the places you might least expect it – in a biochemistry lab. In the summer of 2015, I found myself trying to understand which mutants of a certain protein were more stable than the wildtype. Because science is perpetually underfunded, the computer that drove the equipment we were using was ancient and frequently crashed. Each crash wiped out an hour or two of painstaking, hurried labour and meant we had less time to use the instrument to collect actual data. We really wanted to avoid crashes! Therefore, over the course of that summer, we came up with about 12 different things to do before each experiment (in sequence) to prevent them from happening.
We were sure that 10 out of the 12 things were probably useless; we just didn’t know which ten. There may have been no good reason that opening the instrument, closing it, then opening it again to load our sample would prevent computer crashes, but as far as we could tell, when we did that, the machine crashed far less. It was the same for the other eleven. More self-aware than I, the graduate student I worked with joked to me: “this is how superstitions get started” and I laughed along. Until I read two articles in The New Yorker.
In The Score (How Childbirth Went Industrial), Dr. Atul Gawande talks about the influence of the Apgar score on childbirth. Through a process of continuous competition and optimization, doctors have found out ways to increase the Apgar scores of infants in their first five minutes of life – and how to deal with difficult births in ways that maximize their Apgar scores. The result of this has been a shocking (six-fold) decrease in infant mortality. And all of this is despite the fact that according to Gawande, “[in] a ranking of medical specialties according to their use of hard evidence from randomized clinical trials, obstetrics came in last. Obstetricians did few randomized trials, and when they did they ignored the results.”
Similarly, in The Bell Curve (What happens when patients find out how good their doctors really are), Gawande found that the differences between the best CF (cystic fibrosis) treatment centres and the rest turned out to hinge on how rigorously each centre followed the guidelines established by big clinical trials. That is to say, those that followed the accepted standard of care to the letter had much lower survival rates than those that hared off after any potentially lifesaving idea.
It seems that obstetricians and CF specialists were able to get incredible results without too much in the way of superstitions. Even things that look at first glance to be minor superstitions often turned out not to be. For example, when Gawande looked deeper into a series of studies that showed forceps were as good as or better than Caesarian sections, he was told by an experienced obstetrician (who was himself quite skilled with forceps) that these trials probably benefitted from serious selection effects (in general, only doctors particularly confident in their forceps skills volunteer for studies of them). If forceps were used on the same industrial scale as Caesarian sections, that doctor suspected that they’d end up worse.
But I don’t want to give the impression that there’s something about medicine as a field that allows doctors to make these sorts of improvements without superstition. In The Emperor of all Maladies, Dr. Siddhartha Mukherjee spends some time talking about the now discontinued practices of “super-radical” mastectomy and “radical” chemotherapy. In both treatments, doctors believed that if some amount of a treatment was good, more must be better. And for a while, it seemed better. Cancer survival rates improved after these procedures were introduced.
But randomized controlled trials showed that there was no benefit to those invasive, destructive procedures beyond that offered by their less-radical equivalents. Despite this evidence, surgeons and oncologists clung to these treatments with an almost religious zeal, long after they should have given up and abandoned them. Perhaps they couldn’t bear to believe that they had needlessly poisoned or maimed their patients. Or perhaps the superstition was so strong that they felt they were courting doom by doing anything else.
The simplest way to avoid superstition is to wait for large scale trials. But from both Gawande articles, I get a sense that matches with anecdotal evidence from my own life and that of my friends. It’s the sense that if you want to do something, anything, important – if you want to increase your productivity or manage your depression/anxiety, or keep CF patients alive – you’re likely to do much better if you take the large scale empirical results and use them as a springboard (or ignore them entirely if they don’t seem to work for you).
For people interested in nootropics, melatonin, or vitamins, there are self-blinding trials, which provide many of the benefits of larger trials without the wait. But for other interventions, it’s very hard to effectively blind yourself. If you want to see if meditation improves your focus, for example, then you can’t really hide the fact that you meditated on certain days from yourself.
When I think about how far from the established evidence I’ve gone to increase my productivity, I worry about the chance I could become superstitious.
For example, trigger-action plans (TAPs) have a lot of evidence behind them. They’re also entirely useless to me (I think because I lack a visual imagination with which to prepare a trigger) and I haven’t tried to make one in years. The Pomodoro method is widely used to increase productivity, but I find I work much better when I cut out the breaks entirely – or work through them and later take an equivalent amount of time off whenever I please. I use pomos only as a convenient, easy to Beemind measure of how long I worked on something.
I know modest epistemologies are supposed to be out of favour now, but I think it can be useful to pause, reflect, and wonder: when is one like the doctors saving CF patients and when is one like the doctors doing super-radical mastectomies? I’ve written at length about the productivity regime I’ve developed. How much of it is chaff?
It is undeniable that I am better at things. I’ve rigorously tracked the outputs on Beeminder and the graphs don’t lie. Last year I averaged 20,000 words per month. This year, it’s 30,000. When I started my blog more than a year ago, I thought I’d be happy if I could publish something once per month. This year, I’ve published 1.1 times per week.
But people get better over time. The uselessness of super-radical mastectomies was masked by other cancer treatments getting better. Survival rates went up, but when the accounting was finished, none of that was to the credit of those surgeries.
And it’s not just uselessness that I’m worried about, but also harm; it’s possible that my habits have constrained my natural development, rather than promoting it. This has happened in the past, when poorly chosen metrics made me fall victim to Campbell’s Law.
From the perspective of avoiding superstition: even if you believe that medicine cannot wait for placebo controlled trials to try new, potentially life-saving treatments, surely you must admit that placebo controlled trials are good for determining which things aren’t worth it (take as an example the very common knee surgery, arthroscopic partial meniscectomy, which has repeatedly performed no better than sham surgery when subjected to controlled trials).
Scott Alexander recently wrote about an exciting new antidepressant failing in Stage I trials. When the drug was first announced, a few brave souls managed to synthesize some. When they tried it, they reported amazing results, results that we now know to have been placebo. Look. You aren’t getting an experimental drug synthesized and trying it unless you’re pretty familiar with nootropics. Is the state of self-experimentation really that poor among the nootropics community? Or is it really hard to figure out if something works on you or not?
Still, reflection isn’t the same thing as abandoning the inside view entirely. I’ve been thinking up heuristics since I read Dr. Gawande’s articles; armed with these, I expect to have a reasonable shot at knowing when I’m at risk of becoming superstitious. They are:
If you genuinely care only about the outcome, not the techniques you use to attain it, you’re less likely to mislead yourself (beware the person with a favourite technique or a vested interest!).
If the thing you’re trying to improve doesn’t tend to get better on its own and you’re only trying one potentially successful intervention at a time, fewer of your interventions will turn out to be superstitions and you’ll need to prune less often (much can be masked by a steady rate of change!).
Finally, it might be that you don’t care that some effects are placebo, so long as you get them and get them repeatedly. That’s what happened with the experiment I worked on that summer. We knew we were superstitious, but we didn’t care. We just needed enough data to publish. And eventually, we got it.
 Even so, there are things you can do here to get useful information. For example, you could get in the habit of collecting information on yourself for a month or so (like happiness, focus, etc.), then try several combinations of interventions you think might work (e.g. A, B, C, AB, BC, CA, ABC, then back to baseline) for a few weeks each. Assuming that at least one of the interventions doesn’t work, you’ll have a placebo to compare against. Although be sure to correct any results for multiple comparisons. ^
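As a rough sketch of what that schedule and correction could look like: the snippet below lists every combination of three placeholder interventions and computes a stricter significance threshold for each comparison. The Bonferroni correction is my choice for illustration (it is only one of several ways to correct for multiple comparisons), and the function names and the 0.05 threshold are assumptions, not anything prescribed above.

```python
from itertools import combinations


def intervention_schedule(interventions):
    """List every non-empty combination of interventions, smallest first."""
    return [
        combo
        for size in range(1, len(interventions) + 1)
        for combo in combinations(interventions, size)
    ]


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold under a Bonferroni correction (an assumed choice)."""
    return alpha / n_comparisons


# "A", "B" and "C" are placeholders for whatever interventions you want to try.
schedule = intervention_schedule(["A", "B", "C"])

# Each combination gets compared against baseline once, so the usual 0.05
# threshold is divided by the number of combinations tried.
per_test_alpha = bonferroni_alpha(0.05, len(schedule))

for combo in schedule:
    print("+".join(combo))
print(f"per-comparison alpha: {per_test_alpha:.4f}")
```

With three interventions there are seven combinations to run, so any single comparison would need to clear roughly 0.007 rather than 0.05 before it counts for much.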
Since June 21st of this year, Mohammed bin Salman (often known by his initials, MBS) has been the crown prince of Saudi Arabia. This required what was assuredly not a palace coup, because changes of government or succession are never coups, merely “similar to coups”, “coup-like”, “coup-esque”, or “coupLite™”. As crown prince, MBS has championed a loosening of religious restrictions on women and entertainment, a decrease in reliance on oil for state revenues, and a harder line with Qatar and Iran.
Historical Archetype: Frederick the Great. Proponents: Al Arabiya, optimistic western journalists. Don’t talk to them about: The war in Yemen, the blockade of Qatar, the increased stifling of dissent.
Exemplified by the fawning column above, this school of thought holds that MBS is a dynamic young leader who will reform the Saudi economy, end its dependence on oil, overhaul its institutions, end corruption, and “restore” a more moderate form of Islam.
To supporters, MBS has achieved much in very little time, which they take to be clear evidence of a strong work ethic and a keen intelligence. His current crop of reforms gives them clear hope that clerical power can be shattered and Saudi Arabia can one day become a functioning, modern democracy.
MBS as a character in Game of Thrones
Historical Archetype: Richard Nixon. Proponents: Cynical western journalists, Al Jazeera. Don’t talk to them about: How real-life politics is never actually as interesting or well planned as Game of Thrones.
Cersei Lannister’s quotable warning that “when you play a game of thrones you win or you die” might imply that MBS is on somewhat shaky ground. Proponents of the first view might dispute that and proponents of the next rejoice in it. Proponents of this view point out that so far, MBS seems to be winning.
By isolating Qatar and launching a war in Yemen, he has checked Iranian influence on the Arabian Peninsula. Whether or not it’s valid, his corruption crackdown has sidelined many potential sources of competition (and will probably net much needed liquid cash for the state coffers; it is ironic that the Saudi state now turns to sources of liquidity other than the literal liquid that made it so rich). His conflict with Qatar might yet result in the shutdown of Al Jazeera, the most popular TV channel in the Arabic speaking world and long a thorn in the side of Saudi Arabian autocracy.
People who view the conflict through this lens either aren’t particularly concerned with right or wrong (e.g. westerners who just want to get their realpolitik fix) or think that the very fact that MBS might be engaging in HBO worthy realpolitik proves he is guilty of a grave crime (e.g. Al Jazeera, westerners worrying that the region might become even more unstable).
MBS as an overreaching tyrant
Historical Archetype: Joseph II (epitaph: “Here lies Joseph II, who failed in all he undertook.”). Proponents: Arab Spring activists and their allies. Don’t talk to them about: How much better MBS is than any plausible alternative.
Saudi Arabia is a rentier state with an unusual relationship with its population. Saudi state revenues are not derived from taxation (which almost invariably results in calls for responsible government), but instead from oil money. This money is distributed back to citizens via cushy government jobs. In Saudi Arabia, two-thirds of citizen employment is in the public sector. The private sector is almost wholly the purview of expats, who (if I’m reading the latest official Saudi employment report right) hold 75% of the non-governmental jobs.
With oil set to become obsolete in the next fifty years, Saudi Arabia is in a very bad position. The only thing that can save it is a diversified economy, but the path there isn’t smooth. Overarching reform of an economy is difficult and normally relies on extensive, society-wide consultation. Proponents of this theory see MBS as intent on centralizing power so that he can achieve this transformation single-handedly.
They note that the reversal of the ban on women driving has been paired with intense pressure on the very activists who originally agitated for its removal, pressure to say nothing and to avoid celebrations. They also note that the anti-corruption sweep conveniently removes many people who could have stood in MBS’s way as he embarks on his reforms and expropriates their wealth for the state. They note that independent economists and other civil society figures – just the sort of people who could have provided (and did provide) nuanced feedback on Vision 2030 – have found themselves suddenly detained on MBS’s orders.
Proponents of this theory believe that MBS is trying to modernize Saudi Arabia, but that he is doomed to fail in his attempts without building a (possibly democratic) consensus around the direction of the kingdom. They believe that Saudi Arabia cannot have the civil society necessary for reform until the government stops viewing rights as something it gives the citizens (and that they must be grateful for), but as an inherent human birthright.
If you believe this, you’ll most likely see MBS as moving the kingdom further from this ideal. And you might see the invasion and ongoing war in Yemen as the sort of cluster-fuck we can expect from MBS’s too-rapid attempts to accumulate and use power.
I would first like to note that one advantage of caricaturing other views then providing a synthesis is that you get to appear reasonable and nuanced by comparison. I’m going to claim that as my reward for going through the work to post this, but please do remember that other people have nuanced views too. I got where I am by reading or listening to them!
The takfiri impulses of Wahhabism underlie the takfiri doctrine so beloved of Daesh. Of course, the vast, vast majority of Wahhabis engage in neither terrorism, nor public executions of (by Canadian standards) innocent people. But insofar as those things do happen in the Sunni world, Wahhabi men are unusually likely to be the perpetrators. It is tempting to go further, to claim that conservatives are wrong – that there is no Islamic terrorism problem, merely a Wahhabi terrorism problem – but this would be false.
(There is terrorism conducted by Shia Muslims and by other Sunni sects and to call terrorism a solely Wahhabi problem makes it sound like there are no peaceful Wahhabis. A much more accurate (and universal, as this is true across almost all religions and populations) single cause would be masculinity, as almost all terrorists are men.)
Still, the fact that so much terrorism can be traced back to a close western ally is disquieting and breeds some amount of distrust of the west in some parts of the Islamic world (remember always that Muslims are the primary victims of Islamic terrorism; few have better reasons to despise Islamic terrorism than the terrorists’ co-religionists and most-frequent victims).
The fact that Wahhabism at home is a problem for MBS (the Wahhabi clergy is an alternative, non-royal power centre that he can’t directly control) could give me some hope that he might stop supporting Wahhabism. Certainly he has made statements to that effect. But it is very unclear if he has any real interest in ending Saudi Arabia’s $100 billion effort to export Wahhabism abroad. I would be unsurprised if he deals with the domestic problems inherent in displacing the clergy (i.e. they might not want to be displaced without a messy fight) by sending the most recalcitrant and troublesome members abroad, where they won’t mess up his own plans.
There’s the added wrinkle of Iran. MBS clearly hates Iran and Wahhabism considers Iranian Shiites heretical by default. MBS could easily hold onto Wahhabism abroad simply for its usefulness in checking Iranian influence.
Finally, I care about human rights inside Saudi Arabia. It seems clear that in general, the human rights situation inside the country will improve with MBS in power. There really doesn’t exist a plausible power centre that is more likely to make the average Saudi freer. That said, MBS has detained activists and presided over the death sentence of peaceful protestors.
The average Saudi who does not rock the boat may see her life improve. But the activists who have struggled for human rights will probably not be able to enjoy them themselves.
What this means is that MBS is better than almost all plausible replacements (in the short-term), but he is by no means a good leader, or a morally upstanding individual. In the long term, he might stunt the very civil society that Saudi Arabia needs to become a society that accepts and promotes human flourishing. And if he fails in his quest to modernize Saudi society, we’re much more likely to see unrest, repression, and a far worse regime than we are to see democratic change.
In the long run, we’re all dead. But before that, Saudi Arabia may be in for some very uncomfortable changes.
 As near as I can tell, the change was retroactively made all proper with the Allegiance Council, as soon as the fait was truly accompli. Reports that they approved it beforehand seem to come only from sources with a very vested interest in that being true. ^
 There’s something deeply disturbing about a major news organization comparing a change in which unelected despot will lead a brutal dictatorship with a movement that earnestly strove for democratic change. ^
 A note on news outlets linked to throughout this post: Al Arabiya is owned by Saudi Arabia and therefore tends to view everything Saudi Arabia does in the best possible light. Al Jazeera is owned by Qatar (which is currently being blockaded by Saudi Arabia) and tends to view the kingdom in the worst possible light. The Arab Tyrants Manual Podcast that informed my own views here is produced by Iyad El-Baghdadi, who was arrested for his Arab Spring reporting by The United Arab Emirates (a close ally of Saudi Arabia) and later exiled. This has somewhat soured his already dim view on Arab dictatorships. ^
 Foreigners make up about 53% of the total labour force and almost all of them work in the private sector. Saudis holding private jobs are ~15.5% of the labour force based on these numbers. If we divide 15.5% by 53% plus 15.5%, we get 22% of private jobs held by Saudis. I think for purposes of this comparison, Saudi Aramco, the state oil giant, counts as the public sector.
Remember also that Saudi Arabia has a truly dismal adult labour force participation rate, a side effect of their deeply misogynistic public policy. ^
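For anyone who wants to check the division in that footnote, here is the same arithmetic as a minimal sketch. It uses only the two percentages quoted there and assumes, as the footnote roughly does, that every foreign worker is in the private sector; the variable names are mine.

```python
# Shares of the total labour force, as quoted in the footnote.
foreign_share = 53.0        # foreigners, assumed here to all hold private jobs
saudi_private_share = 15.5  # Saudis holding private jobs

# Everyone in a private job, as a share of the total labour force.
private_total = foreign_share + saudi_private_share

saudi_fraction = saudi_private_share / private_total
foreign_fraction = foreign_share / private_total

print(f"Saudis hold {saudi_fraction:.1%} of private jobs")        # 22.6%
print(f"Foreigners hold {foreign_fraction:.1%} of private jobs")  # 77.4%
```

Under that simplifying assumption the foreign share comes out a little above three quarters, which is the same ballpark as the 75% figure.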
 Furthermore, they point out that it is basically impossible to tell if a Saudi royal is corrupt or not, because there is no clear boundary between the personal fortune of the Saud dynasty and the state coffers. Clearing up this particular ambiguity seems low on the priority list of a man who just bought a half-billion dollar yacht.
(If you’re not too lazy to click on a footnote, but are too lazy to click on a link, it was MBS. MBS bought the giant yacht. Spoilers.) ^
 I’ve long held the belief that Wahhabism is dangerous. When talking about this with my Muslim friends, I was often hesitant and apologetic. I needn’t have been. Their vehemence in criticism of Wahhabism often outstripped mine. That was because they had all of my reasons to dislike Wahhabism, plus the unique danger takfir presented to them.
Takfir is the idea that Wahhabis (or their ideological descendants) may deem other Muslims to be infidels if they do not follow Wahhabism’s austere commandments. This often leads to the execution or lynching of more moderate Muslims at the hands of takfiris. As you may have guessed, most North American Muslims could be declared infidels by Wahhabis or others of their ilk.
Takfir is one of the many reasons that it is easy to find articles by Muslim authors decrying Wahhabism. Many Muslims legitimately fear a form of Islam that would happily deem them heretical and execute them. ^
It is commonly reported that 15 of the 19 September 11 hijackers were Saudi men, brought up on Wahhabism. The link between Wahhabism, takfir, and terrorism is another reason it is common to find non-Wahhabi Muslims opposed to Wahhabism. Here’s a sampling of English language reporting on Daesh from Muslim countries. Indeed, in many sources I’ve read, the word takfiri was exclusively followed by “terrorist” or “terrorists”. ^
 To create a civil society, Saudi Arabia would need to lift restrictions on the press, give activists some official power, and devolve more power to elected municipalities. Civil society is the corona of pressure groups, advisors, and influencers that exist around a government and allow people to build common knowledge about their desires. Civil society helps you understand just how popular or unpopular a government policy is and gives you a lever to pull if you want to influence it.
A functioning civil society protects a government from its own mistakes (by making an outcry possible before any deed is irreversibly done) and helps ensure that the government is responsible to the will of the people.
That MBS is working hard to prevent civil society shows that he has no desire for feedback and believes he knows better than literally everyone else in the country who is not already his sycophant. I see few ways this could end well. ^
In utilitarianism, “remoter effects” are the result of our actions influencing other people (and are hotly debated). I think that remoter effects are often overstated, especially (as Sir Bernard Williams said in Utilitarianism for and against) when they give the conventionally ethical answer. For example, a utilitarian might claim that the correct answer to the hostage dilemma is to kill no one, because killing weakens the sanctity of human life and may lead to more deaths in the future.
When debating remoter effects, I think it’s worthwhile to split them into two categories: positive and negative. Positive remoter effects are when your actions cause others to refrain from some negative action they might otherwise take. Negative remoter effects are when your actions make it more likely that others will engage in a negative action.
Of late, I’ve been especially interested in ways that positive and negative remoter effects matter in political disagreements. To what extent will acting in an “honourable” or pro-social way convince one’s opponents to do the same? Conversely, does fighting dirty bring out the same tendency in your opponents?
Some of my favourite bloggers are doubtful of the first proposition:
In “Deontologist Envy”, Ozy writes that we shouldn’t necessarily be nice to our enemies in the hopes that they’ll be nice to us:
In general people rarely have their behavior influenced by their political enemies. Trans people take pains to use the correct pronouns; people who are overly concerned about trans women in bathrooms still misgender them. Anti-racists avoid the use of slurs; a distressing number of people who believe in human biodiversity appear to be incapable of constructing a sentence without one. Social justice people are conscientious about trigger warnings; we are subjected to many tedious articles about how mentally ill people should be in therapy instead of burdening the rest of the world with our existence.
The problem being that, even when Democrats didn’t change a rule protecting the minority party, Republicans haven’t even blinked before casting them aside the minute they interfered with their partisan agenda.
Both of these points are basically correct. Everything that Ozy says about asshats on the internet is true and David wrote his post in response to Republicans removing the filibuster for Supreme Court nominees.
But I still think that positive remoter effects are important in this context. When they happen (and I will concede that this is rare), it is because you are consistently working against the same political opponents and at least some of those opponents are honourable people. My favourite example here (although it is from war, not politics) is the Christmas Day Truce. This truce was so successful and widespread that high command undertook to move men more often to prevent a recurrence.
In politics, I view positive remoter effects as key to Senator John McCain repeatedly torpedoing the GOP healthcare plans. While Senators Murkowski and Collins framed their disagreements with the law around their constituents, McCain specifically mentioned the secretive, hurried and partisan approach to drafting the legislation. This stood in sharp contrast to Obamacare, which had numerous community consultations, went through committee and took special (and perhaps ridiculous) care to get sixty senators on board.
Imagine that Obamacare had been passed after secret drafting and no consultations. Imagine if Democrats had dismantled even more rules in the senate. They may have gotten a few more of their priorities passed or had a stronger version of Obamacare, but right now, they’d be seeing all that rolled back. Instead of evidence of positive remoter effects, we’d be seeing a clear case of negative ones.
When dealing with political enemies, positive remoter effects require a real sacrifice. It’s not enough not to do things that you don’t want to do anyway (like all the examples Ozy listed) and certainly not enough to refrain from doing things to third parties. For positive remoter effects to matter at all – for your opponents (even the honourable ones) not to say “well, they did it first and I don’t want to lose” – you need to give up some tools that you could use to advance your interests. Tedious journalists don’t care about you scrupulously using trigger warnings, but may appreciate not receiving death threats on Twitter.
Had right-wingers refrained from doxxing feminist activists (or even applied any social consequences at all against those who did so), all principled people on the left would be refusing to engage in doxxing against them. As it stands, that isn’t the case and those few leftists who ask their fellow travelers to refrain are met with the entirely truthful response: “but they started it!”
This highlights what might be an additional requirement for positive remoter effects in the political sphere: you need a clearly delimited coalition from which you can eject misbehaving members. Political parties are set up admirably for this. They regularly kick out members who fail to act as decorously as their office demands. Social movements have a much harder time, with predictable consequences – it’s far too easy for the most reprehensible members of any group to quickly become the representatives, at least as far as tactics are concerned.
Still, with positive remoter effects, you are not aiming at a movement or party broadly. Instead you are seeking to find those honourable few in it and inspire them on a different path. When it works (as it did with McCain), it can work wonders. But it isn’t something to lay all your hopes on. Some days, your enemies wake up and don’t screw you over. Other days, you have to fight.
Negative remoter effects seem so obvious as to require almost no explanation. While it’s hard (but possible) to inspire your opponents to civility with good behaviour, it’s depressingly easy to bring them down to your level with bad behaviour. Acting honourably guarantees little, but acting dishonourably basically guarantees a similar response. Insofar as honour is a useful characteristic, it is useful precisely because it stops this slide towards mutual annihilation.
In the hostage dilemma, you are one of ten hostages, captured by rebels. The rebel leader offers you a gun with a single bullet. If you kill one of your fellow hostages, all of the survivors (including you) will be let free. If you refuse, all of the hostages (including you) will be killed. You are guarded such that you cannot use the weapon against your captors. Your only option is to kill another hostage, or let all of the hostages be killed.
Here, I think remoter effects fail to salvage the conventional answer and the only proper utilitarian response is to kill one of the other hostages. ^
 Here I’m using “negative” in a roughly utilitarian sense: negative actions are those that tend to reduce the total utility of the world. When used towards good ends, negative actions consume some of the positive utility that the ends generate. When used towards ill ends, negative actions add even more disutility. This definition is robust against different preferred plans of actions (e.g. it works across liberals and conservatives, who might both agree that political violence tends to reduce utility, even if it doesn’t always reduce utility enough to rule it out in the face of certain ends), but isn’t necessarily robust across all terminal values (e.g. if you care only about reducing suffering and I care only for increasing happiness we may have different opinions on the tendency of reproduction towards good or ill).
Negative actions are roughly equivalent to “defecting”. “Roughly” because it is perhaps more accurate to say that the thing that makes defecting so pernicious is that it involves negative actions of a special class, those that generate extra disutility (possibly even beyond what simple addition would suggest) when both parties engage in them. ^
 I used “honourable” in several important places and should probably define it. When discussing actions, I think honourable actions are the opposite of “negative” actions as defined above: actions that tend towards the good, but can be net ill if used for bad ends. When describing “people” as honourable, I’m pointing to people who tend to reinforce norms around cooperation. This is more or less equivalent to being inherently reluctant to use negative actions to advance goals unless provoked.
My favourite example of honour is Salah ad-Din. He sent his own personal physician to tend to King Richard, who was his great enemy, and used his own money to buy back a child kidnapped into slavery. Conveniently for me, Salah ad-Din shows both sides of what it means to be honourable. He personally executed Raynald of Châtillon after Raynald ignored a truce, attacked Muslim caravans, and tortured many of the caravaners to death. To Guy of Lusignan, King of Jerusalem (who was captured in the same battle as Raynald and wrongly feared he was next to die), Salah ad-Din said: “[i]t is not the wont of kings, to kill kings; but that man had transgressed all bounds, and therefore did I treat him thus.” ^
Epistemic Status: Full of sweeping generalizations because I don’t want to make it 10x longer by properly unpacking all the underlying complexity.
[9 minute read]
In 2006, Dr. Atul Gawande wrote an article in The New Yorker about maternal care entitled “How Childbirth Went Industrial”. It’s an excellent piece from an author who consistently produces excellent pieces. In it, Gawande charts the rise of the C-section, from its origin as a technique so dangerous it was considered tantamount to murder (and consequently banned on living mothers), to its current place as one of the most common surgical procedures carried out in North American hospitals.
The C-section – and epidurals and induced labour – have become so common because obstetrics has become ruthlessly focused on maximizing the Apgar score of newborns. Along the way, the field ditched forceps (possibly better for the mother yet tricky to use or teach), a range of maneuvers for manually freeing trapped babies (likewise difficult), and general anesthetic (genuinely bad for infants, or at least for the Apgar scores of infants).
The C-section has taken the place of much of the specialized knowledge of obstetrics of old, not the least because it is easy to teach and easy for even relatively less skilled doctors to get right. When Gawande wrote the article, there was debate about offering women in their 39th week of pregnancy C-sections as an alternative to waiting for labour. Based on the stats, this hasn’t quite come to pass, but C-sections have become slightly more prevalent since the article was written.
I noticed two laments in the piece. First, Gawande wonders at the consequences of such an essential aspect of the human experience being increasingly (and based off of the studies that show forceps are just as good as C-sections, arguably unnecessarily) medicalized. Second, there’s a sense throughout the article that difficult and hard-won knowledge is being lost.
The question facing obstetrics was this: Is medicine a craft or an industry? If medicine is a craft, then you focus on teaching obstetricians to acquire a set of artisanal skills—the Woods corkscrew maneuver for the baby with a shoulder stuck, the Lovset maneuver for the breech baby, the feel of a forceps for a baby whose head is too big. You do research to find new techniques. You accept that things will not always work out in everyone’s hands.
But if medicine is an industry, responsible for the safest possible delivery of millions of babies each year, then the focus shifts. You seek reliability. You begin to wonder whether forty-two thousand obstetricians in the U.S. could really master all these techniques. You notice the steady reports of terrible forceps injuries to babies and mothers, despite the training that clinicians have received. After Apgar, obstetricians decided that they needed a simpler, more predictable way to intervene when a laboring mother ran into trouble. They found it in the Cesarean section.
Medicine would not be the first industry to industrialize. The quasi-mythical King Ludd that gave us the phrase “Luddite” was said to be a weaver, put out of business by the improved mechanical knitting machines. English programs turn out thousands of writers every year, all with an excellent technical command of the English language, but most with none of the emotive power of Gawande. Following the rules is good enough when you’re writing for a corporation that fears to offend, or for technical clarity. But the best writers don’t just know how to follow the rules. They know how and when to break them.
If Gawande were a student of military history, he’d have another metaphor for what is happening to medicine: warriors are being replaced by soldiers.
If you ever find yourself in possession of a spare hour and feel like being lectured breathlessly by a wide-eyed enthusiast, find your local military history buff (you can identify them by their collection of swords or antique guns) and ask them whether there’s any difference between soldiers and warriors.
You can go do this now, or I can fill in, having given this lecture many times myself.
Imagine your favourite (or least favourite) empire from history. You don’t get yourself an empire by collecting bottle caps. To create one, you need some kind of army. To staff your army, you have two options. Warriors, or soldiers.
(Of course, this choice isn’t made just by empires. Their neighbours must necessarily face the same conundrum.)
Warriors are the heroes of movies. They were almost always the product of training that started at a young age and more often than not were members of a special caste. Think medieval European Knights, Japanese Samurai, or the Hashashin fida’i. Warriors were notable for their eponymous mastery of war. A knight was expected to understand strategy and tactics, riding, shooting, fighting (both on foot and mounted), and wrestling. Warriors wanted to live up to their warrior ethos, which normally emphasized certain virtues, like courage and mercy (to other warriors, not to any common peasant drafted to fight them).
Soldiers were whichever conscripts or volunteers someone could get into a reasonable standard of military order. They knew only what they needed to complete their duties: perhaps one or two simple weapons, how to march in formation, how to cook, and how to repair some of their equipment. Soldiers just wanted to make it through the next battle alive. In service to this, they were often brutally efficient in everything they did. Fighting wasn’t an art to them – it was simple butchery and the simpler and quicker the better. Classic examples of soldiers are the Roman Legionaries, Greek Hoplites, and Napoleon’s Grande Armée.
The techniques that soldiers learned were simple because they needed to be easy to teach to ignorant peasants on a mass scale in a short time. Warriors had their whole childhood for elaborate training.
(Or at least, that’s the standard line. In practice, things were never quite as clear cut as that – veteran soldiers might have been as skilled as any warrior, for example. The general point remains though; one on one, you would always have bet on a warrior over a soldier.)
But when you talk about armies, a funny thing happens. Soldiers dominated. Individually, they might have been kind of crap at what they did. Taken as a whole though, they were well-coordinated. They looked out for each other. They fought as a team. They didn’t foolishly break ranks, or charge headlong into the enemy. When Germanic warriors came up against Roman soldiers, they were efficiently butchered. The Germans went into battle looking for honour and perhaps a glorious death. The Romans happily gave them the latter and so lived (mostly) to collect their pensions. Whichever empire you thought about above almost certainly employed soldiers, not warriors.
It turns out that discipline and common purpose have counted for rather a lot more in military history than simple strength of arms. Of this particular point, I can think of no better example than the rebellion that followed the Meiji restoration. The few rebel samurai, wonderfully trained and unholy terrors in single combat, were easily slaughtered by the Imperial conscripts, who knew little more than which side of a musket to point at the enemy.
The very fact that the samurai didn’t embrace the firing line is a point against them. Their warrior code, which esteemed individual skill, left them no room to adopt this devastating new technology. And no one could command them to take it up, because they were mostly prima donnas where their honour was concerned.
I don’t want to be too hard on warriors. They were actually an efficient solution to the problem of national defence if a population was small and largely agrarian, lacked political cohesion or logistical ability, or was otherwise incapable of supporting a large army. Under these circumstances, polities could not afford to keep a large population under arms at all times. This gave them several choices: they could rely on temporary levies, who would be largely untrained. They could have a large professional army that paid for itself largely through raiding, or they could have a small, elite cadre of professional warriors.
All of these strategies had disadvantages. Levies tended to have very brittle morale, and calling up a large proportion of a population makes even a successfully prosecuted war economically devastating. Raiding tends to make your neighbours really hate you, leading to more conflicts. It can also be very bad for discipline and can backfire on your own population in lean times. Professional warriors will always be dwarfed in numbers by opponents using any other strategy.
Historically, it was never as simple as solely using just one strategy (e.g. European knights were augmented with and eventually supplanted by temporary levies), but there was a clear lean towards one strategy or another in most resource-limited historical polities. It took complex cultural technology and a well-differentiated economy to support a large force of full time soldiers and wherever these pre-conditions were lacking, you just had to make do with what you could get.
When conditions suddenly call for a struggle – whether that struggle is against a foreign adversary, to boost profits, or to cure disease – it is useful to look at how many societal resources are thrown at the fight. When resources are scarce, we should expect to see a few brilliant generalists, or many poorly trained conscripts. When resources are thick on the ground, the amount that can be spent on brilliant people is quickly saturated and the benefits of training your conscripts quickly accrue. From one direction or another, you’ll approach the concept of soldiers.
Doctors as soldiers, not as warriors, is the concept Gawande is brushing up against in his essay. These new doctors will be more standardized, with less room for individual brilliance, but more affordances for working well in teams. The prima donnas will be banished (as they aren’t good team players, even when they’re brilliant). Dr. Gregory House may have been the model doctor in the Victorian Age, or maybe even in the fifties. But I doubt any hospital would want him now. It may be that this standardization is just the thing we need to overcome persistent medical errors, improve outcomes across the board, and make populations healthier. But I can sympathize with the position that it might be causing us to lose something beautiful.
In software development, where I work, a similar trend can be observed. Start-ups aggressively court ambitious generalists, for whom freedom to build things their way is more important than market rate compensation and is a better incentive than even the lottery that is stock-options. At start-ups, you’re likely to see languages that are “fun” to work with, often dynamically typed, even though these languages are often considered less inherently comprehensible than their more “enterprise-friendly” statically typed brethren.
It’s with languages like Java (or its Microsoft clone, C#) and C++ that companies like Google and Amazon build the underlying infrastructure that powers large tracts of the internet. Among the big pure software companies, Facebook is the odd one out for using PHP (and this choice required them to rewrite the code underlying the language from scratch to make it performant enough for their large load).
It’s also at larger companies where team work, design documents, and comprehensibility start to be very important (although there’s room for super-stars at all of the big “tech” companies still; it’s only in companies more removed from tech and therefore outside a lot of the competition for top talent where being a good team player and writing comprehensible code might top brilliance as a qualifier). This isn’t to say that no one hiring for top talent appreciates things like good documentation, or comprehensibility. Merely that it is easy for a culture that esteems individual brilliance to ignore that these things are a mark of competence.
Here the logic goes that anyone smart enough for the job will be smart enough to untangle the code of their predecessors. As anyone who’s been involved in the untangling can tell you, there’s a big difference between “smart enough to untangle this mess” and “inclined to wade through this genius’s spaghetti code to get to the part that needs fixing”.
No doubt there exist countless other examples in fields I know nothing about.
The point of gathering all these examples and shoving them into my metaphor is this: I think there exist two important transitions that can occur when a society needs to focus a lot of energy on a problem. The transition from conscripts to soldiers isn’t very interesting, as it’s basically the outcome of a process of continuous improvement.
But the transition from warriors to soldiers is. It’s amazing that we can often get better results by replacing a few highly skilled generalists who apply a lot of hard fought decision making, with a veritable army of less well trained, but highly regimented and organized specialists. It’s a powerful testament to the usefulness of group intelligence. Of course, sometimes (e.g. Google, or the Mongols) you get both, but these are rare happy accidents.
Being able to understand where this transition is occurring helps you understand where we’re putting effort. Understanding when it’s happening within your own sphere of influence can help you weather it.
Also note that this transition doesn’t only go in one direction. As manufacturing becomes less and less prevalent in North America, we may return to the distant past, when manufacturing stuff was only undertaken by very skilled artisans making unique objects.
 Note the past tense throughout much of this essay; when I speak about soldiers and warriors, I’m referring only to times before the 1900s. I know comparatively little about how modern armies are set up. ^
 Best of all were the Mongols, who combined the lifelong training of warriors with the discipline and organization of soldiers. When Mongols clashed with European knights in Hungary, their “dishonourable” tactics (feints, followed by feigned retreats and skirmishing) easily took the day. This was all possible through a system of signal flags that allowed Subutai to command the whole battle from a promontory. European leaders were expected to show their bravery by being in the thick of fighting, which gave them no overall control over their lines. ^
 Historically, professional armies with good logistical support could somewhat pay for themselves by expanding an empire, which brought in booty and slaves. This is distinct from raiding (which does not seek to incorporate other territories) and has its own disadvantages (rebellion, over-extension, corruption, massive unemployment among unskilled labourers, etc.). ^
Recently, I talked about what I didn’t like in Dr. Cathy O’Neil’s book, Weapons of Math Destruction. This time around, I’d like to mention two parts of it I really liked. I wish Dr. O’Neil put more effort into naming the concepts she covered; I don’t have names for them from WMD, but in my head, I’ve been calling them Hidden Value Encodings and Axiomatic Judgements.
Hidden Value Encodings
Dr. O’Neil opens the book with a description of the model she uses to cook for her family. After going into a lot of detail about it, she makes this excellent observation:
Here we see that models, despite their reputation for impartiality, reflect goals and ideology. When I removed the possibility of eating Pop-Tarts at every meal, I was imposing my ideology on the meals model. It’s something we do without a second thought. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.
It is far too easy to view models as entirely empirical, as math made form and therefore blind to value judgements. But that couldn’t be further from the truth. It’s value judgements all the way down.
Imagine a model that tries to determine when a credit card transaction is fraudulent. Fraudulent credit card transactions cost the credit card company money, because they must refund the stolen amount to the customer. Incorrectly flagging legitimate transactions as fraudulent also costs a company money, either through customer support time or through lost business when a customer gets so fed up with constant false positives that they switch to a different credit card provider.
If you were tasked with building a model to predict which credit card transactions were fraudulent by one of the major credit card companies, you would probably build into your model a variable cost for failing to catch fraudulent transactions (equivalent to the cost the company must bear if the transaction is fraudulent) and a fixed cost for labelling innocuous transactions as fraudulent (equivalent to the average cost of a customer support call plus the average chance of a false positive pushing someone over the edge into switching cards multiplied by the cost of their lost business over the next few years).
From this encoding, we can already see that our model would want to automatically approve all transactions below the fixed cost of dealing with false positives, while applying increasing scrutiny to more expensive items, especially expensive items with big resale value or items more expensive than the cardholder normally buys (as both of these point strongly toward fraud).
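To make the encoding concrete, here is a minimal sketch of the decision rule it implies. Everything in it is an assumption for illustration: the names `FraudCosts` and `should_flag`, the dollar figures, and the idea that some upstream model hands us a `fraud_probability`. It is not how any real card issuer works.

```python
from dataclasses import dataclass


@dataclass
class FraudCosts:
    """Hypothetical cost inputs; none of these figures come from a real issuer."""
    support_call_cost: float    # average cost of one customer support call
    churn_probability: float    # chance a false positive makes the customer switch cards
    lost_business_value: float  # value of that customer's business over the next few years

    @property
    def false_positive_cost(self) -> float:
        # Fixed cost of wrongly flagging an innocuous transaction.
        return self.support_call_cost + self.churn_probability * self.lost_business_value


def should_flag(amount: float, fraud_probability: float, costs: FraudCosts) -> bool:
    """Flag when the expected loss from approving exceeds the expected cost of flagging."""
    expected_missed_fraud = fraud_probability * amount  # variable cost: the refund
    expected_false_alarm = (1.0 - fraud_probability) * costs.false_positive_cost
    return expected_missed_fraud > expected_false_alarm


costs = FraudCosts(support_call_cost=5.0, churn_probability=0.01, lost_business_value=500.0)
print(should_flag(8.0, 0.30, costs))     # small purchase, fairly suspicious
print(should_flag(2000.0, 0.01, costs))  # large purchase, barely suspicious
```

Under this rule, a transaction smaller than the false-positive cost only gets flagged when the model thinks fraud is more likely than not, which is roughly the "approve the small stuff" behaviour described above. Notice that the only things the function knows how to care about are dollars lost by the company.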
This seems innocuous and logical. It is also encoding at least two sets of values. First, it encodes the values associated with capitalism. At the most basic level, this algorithm “believes” that profit is good and losses are bad. It is aimed to maximize profit for the bank and while we may hold this as a default assumption for most algorithms associated with companies, that does not mean it is devoid of values; instead it encodes all of the values associated with capitalism. Second, the algorithm encodes some notion that customers have freedom to choose between alternatives (even more so than is encoded by default in accepting capitalism).
By applying a cost to false positives (and likely it would be a cost that rises with each previous false positive), you are tacitly acknowledging that customers could take their business elsewhere. If customers instead had no freedom to choose who they did business with, you could merely encode as your loss from false positives the fixed cost of fielding support calls. Since outsourced phone support is very cheap, your algorithm would care much less about false positives if there was no consumer choice.
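A rough sketch of how that one assumption shows up in the numbers. The function name, the doubling of churn risk with each prior false positive, and all default figures are hypothetical choices of mine, picked only to show the shape of the effect.

```python
def false_positive_cost(
    support_call_cost: float,
    prior_false_positives: int = 0,
    customer_can_switch: bool = True,
    base_churn_probability: float = 0.01,  # hypothetical
    churn_growth: float = 2.0,             # hypothetical: risk doubles per prior false positive
    lost_business_value: float = 500.0,    # hypothetical
) -> float:
    """Cost the model assigns to wrongly flagging one innocuous transaction."""
    if not customer_can_switch:
        # No consumer choice: the only loss is the (cheap) support call.
        return support_call_cost
    # Consumer choice: each earlier false positive makes leaving more likely.
    churn_probability = min(1.0, base_churn_probability * churn_growth ** prior_false_positives)
    return support_call_cost + churn_probability * lost_business_value


for priors in range(4):
    with_choice = false_positive_cost(2.0, priors, customer_can_switch=True)
    without_choice = false_positive_cost(2.0, priors, customer_can_switch=False)
    print(priors, with_choice, without_choice)
```

The captive-customer column never moves, so a model trained against it has little reason to avoid annoying people.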
As far as I can tell, there is no “value-free” place to stand. An algorithm in the service of a hospital that helps diagnose patients or focus resources on the most ill encodes the value that “it is better to be healthy than sick; better to be alive than dead”. These values might be (almost-)universal, but they still exist, they are still encoded, and they still deserve to be interrogated when we put functions of our society in the hands of software governed by them.
Axiomatic Judgements
One of the most annoying parts of being a child is the occasional requirement to accept an imposition on your time or preferences with the explanation “because I say so”. “Because I say so” isn’t an argument, it’s a request that you acknowledge adults’ overwhelming physical, earning, and social power as giving them a right to set arbitrary rules for you. Some algorithms, forced onto unwelcoming and less powerful populations (teachers, job-seekers, etc.) have adopted this MO as well. Instead of having to prove that they have beneficial effects or that their outputs are legitimate, they define things such that their outputs are always correct and brook no criticism.
Here’s Dr. O’Neil talking about a value-added teaching model in Washington, D.C.:
When Mathematica’s scoring system tags Sarah Wysocki and 205 other teachers as failures, the district fires them. But how does it ever learn if it was right? It doesn’t. The system itself has determined that they were failures, and that is how they are viewed. Two hundred and six “bad” teachers are gone. That fact alone appears to demonstrate how effective the value-added model is. It is cleansing the district of underperforming teachers. Instead of searching for the truth, the score comes to embody it.
She contrasts this with how Amazon operates: “if Amazon.com, through a faulty correlation, started recommending lawn care books to teenage girls, the clicks would plummet, and the algorithm would be tweaked until it got it right.” On the other hand, the teacher rating algorithm doesn’t update, doesn’t check if it is firing good teachers, and doesn’t take an accounting of its own costs. It holds it as axiomatic – a basic fact beyond questioning – that its results are the right results.
I am in full agreement with Dr. O’Neil’s criticism here. Not only does it push past the bounds of fairness to make important decisions, like hiring and firing, through opaque formulae that are not explained to those who are being judged and lack basic accountability, but it’s a professional black mark on all of the statisticians involved.
Whenever you train a model, you hold some data back. This is your test data and you will use it to assess how well your model did. That gets you through to “production” – to having your model out in the field. This is an exciting milestone, not only because your model is now making decisions and (hopefully) making them well, but because now you’ll have way more data. You can see how your new fraud detection algorithm does by the volume of payouts and customer support calls. You can see how your new leak detection algorithm does by customers replying to your emails and telling you if you got it right or not.
A friend of mine who worked in FinTech once told me that they approved 1.5% of everyone who applied for their financial product, no matter what. They’d keep the score their model gave to that person on record, then see how the person fared in reality. If they used the product responsibly despite a low score, or used it recklessly despite a high score, it was viewed as valuable information that helped the team make their model that much better. I can imagine a team of data scientists, heads together around a monitor, looking through features and asking each other “huh, do any of you see what we missed here?” and it’s a pleasant image.
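I never saw their code, so what follows is only my guess at the shape of such a feedback loop. The function names, the score threshold, and the way outcomes are recorded are all assumptions; only the 1.5% figure comes from the anecdote.

```python
import random


def decide(score: float, threshold: float, rng: random.Random,
           explore_rate: float = 0.015) -> tuple[bool, bool]:
    """Return (approved, was_exploration).

    A small random slice of applicants is approved regardless of score,
    so the team later sees outcomes the model would otherwise have hidden.
    """
    if rng.random() < explore_rate:
        return True, True
    return score >= threshold, False


def surprises(records: list[tuple[float, bool]], threshold: float) -> list[tuple[float, bool]]:
    """Pick out exploration approvals whose real outcome contradicts the score.

    Each record is (score, used_responsibly). A low score with a good outcome,
    or a high score with a bad one, is exactly what the model is missing.
    """
    return [(score, ok) for score, ok in records if (score >= threshold) != ok]


rng = random.Random(0)
print(decide(0.2, threshold=0.5, rng=rng))
print(surprises([(0.2, True), (0.9, True), (0.8, False), (0.1, False)], threshold=0.5))
```

The important design choice is that the score is stored but ignored for the exploration slice. Without that, the model only ever sees outcomes for people it already liked.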
Value-added teaching models, or psychological pre-screens for hiring, do nothing of the sort (even though it would be trivial for them to!). They give results and those results are defined as the ground truth. There’s no room for messy reality to work its way back into the cycle. There’s no room for the creators to learn. The algorithm will be flawed and imperfect, like all products of human hands. That is inevitable. But it will be far less perfect than it could be. Absent feedback, it is doomed to always be flawed, in ways both subtle and gross, and in ways unknown to its creators and victims.
Like most Canadian engineering students, I made a solemn vow:
…in the presence of these my betters and my equals in my calling, [I] bind myself upon my honour and cold iron, that, to the best of my knowledge and power, I will not henceforward suffer or pass, or be privy to the passing of, bad workmanship or faulty material in aught that concerns my works before mankind as an engineer…
Sloppy work, like that value-added teacher model, is the very definition of bad workmanship. Would that I never suffer something like that to leave my hands and take life in the world! It is no Quebec Bridge, but the value-added teaching model and other doomed-to-fail algorithms like it represent a slow-motion accident, steadily stealing jobs and happiness from people with no appeal or remorse.
I can accept stains on the honour of my chosen profession. Those are inevitable. But in a way, stains on our competence are so much worse. Models that take in no feedback are both, but the second really stings me.
 This first approximation isn’t correct in practice, because certain patterns of small transactions are consistent with fraud. I found this out the hard way, when a certain Bitcoin exchange’s credit card verification procedure (withdrawing less than a dollar, then refunding it a few days later, after you tell them how much they withdrew) triggered the fraud detection software at my bank. Apparently credit card thieves will often do a similar thing (minus the whole “ask the cardholder how much was withdrawn” step), as a means of checking if the card is good without cluing in the cardholder. ^
 I don’t mean this as a criticism of capitalism. I seek merely to point out that (like all other economic systems) capitalism is neither value neutral, nor inevitable. “Capitalism” encodes values like “people are largely rational”, “people often act to maximize their gains” and “choice is fundamentally good and useful”. ^
If socialist banks had ever made it to the point of deploying algorithms (instead of collapsing under the weight of their flawed economic system), those algorithms would also encode values (like “people will work hard for the good of the whole” and “people are inherently altruistic” and “it is worth it to sacrifice efficiency in the name of fairness”).
 Dulce et decorum est… get the fucking data science right. ^
Much thanks to Cody Wild for providing editing and feedback. That said, I would like to remind my readers that I deserve full credit for all errors and that all opinions expressed here are only guaranteed to be mine.
[12 minute read]
I recently read Weapons of Math Destruction by Dr. Cathy O’Neil and found it an enormously frustrating book. It’s not that the whole book was rubbish – that would have made things easy. No, the real problem with this book is that the crap and the pearls were so closely mixed that I had to stare at every sentence very, very carefully in hopes of figuring out which one each was. There’s some good stuff in here. But much of Dr. O’Neil’s argumentation relies on two new (to me) fallacies. It’s these fallacies (which I’ve dubbed the Ought-Is Fallacy and the Availability Bait-and-Switch) that I want to explore today.
It’s a commonly repeated truism that “correlation doesn’t imply causation”. People who’ve been around the statistics block a bit longer might echo Randall Munroe and retort that “correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there'”. Understanding why a graph like this:
Is utter horsecrap, despite how suggestive it looks, is the work of a decent education in statistics. Here correlation doesn’t imply causation. On the other hand, it’s not hard to find excellent examples where correlation really does mean causation:
When trying to understand the ground truth, it’s important that you don’t confuse correlation with causation. But not every human endeavour is aimed at determining the ground truth. Some endeavours really do just need to understand which activities and results are correlated. Principal among these is insurance.
Let’s say I wanted to sell you “punched in the face” insurance. You’d pay a small premium every month and if you were ever punched in the face hard enough to require dental work, I’d pay you enough to cover it. I’d probably charge you more if you were male, because men are much, much more likely to be seriously injured in an assault than women are.
I’m just interested in pricing my product. It doesn’t actually matter if being a man is causal of more assaults or just correlated with it. It doesn’t matter if men aren’t inherently more likely to assault and be assaulted compared to women (for a biological definition of “inherently”). It doesn’t matter what assault rates would be like in a society without toxic masculinity. One thing and one thing alone matters: on average, I will have to pay out more often for men. Therefore, I charge men more.
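The arithmetic behind that pricing decision is short enough to sketch. The claim rates, payout, and margin below are made-up placeholders, not real assault or dental statistics, and `monthly_premium` is a name I invented for the example.

```python
def monthly_premium(claim_probability_per_month: float,
                    average_payout: float,
                    margin: float = 0.2) -> float:
    """Price a policy from the expected payout plus a margin.

    Nothing here asks *why* one group claims more often than another;
    the price only depends on how often the insurer expects to pay.
    """
    expected_payout = claim_probability_per_month * average_payout
    return expected_payout * (1.0 + margin)


# Hypothetical monthly claim rates for two groups of customers.
group_rates = {"group_a": 0.0010, "group_b": 0.0004}
for group, rate in group_rates.items():
    print(group, round(monthly_premium(rate, average_payout=2000.0), 2))
```

Whether the gap between the two rates is causal or merely correlational never enters the calculation, which is the whole point.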
If you were to claim that because there may be nothing inherent in maleness that causes assault and being assaulted, therefore men shouldn’t have to pay more, you are making a moral argument, not an empirical one. You are also committing the ought-is fallacy. Just because your beliefs tell you that some aspect of the world should be a certain way, or that it would be more moral for the world to be a certain way, does not mean the world actually is that way or that everyone must agree to order the world as if that were true.
This doesn’t prevent you from making a moral argument that we should ignore certain correlates in certain cases in the interest of fairness; it means merely that you should not make an empirical argument about what is ultimately a question of values.
The ought-is fallacy came up literally whenever Weapons of Math Destruction talked about insurance, as well as when it talked about sentencing disparities. Here’s one example:
But as the questions continue, delving deeper into the person’s life, it’s easy to imagine how inmates from a privileged background would answer one way and those from tough inner-city streets another. Ask a criminal who grew up in comfortable suburbs about “the first time you were ever involved with the police,” and he might not have a single incident to report other than the one that brought him to prison. Young black males, by contrast, are likely to have been stopped by police dozens of times, even when they’ve done nothing wrong. A 2013 study by the New York Civil Liberties Union found that while black and Latino males between the ages of fourteen and twenty-four made up only 4.7 percent of the city’s population, they accounted for 40.6 percent of the stop-and-frisk checks by police. More than 90 percent of those stopped were innocent. Some of the others might have been drinking underage or carrying a joint. And unlike most rich kids, they got in trouble for it. So if early “involvement” with the police signals recidivism, poor people and racial minorities look far riskier.
Now I happen to agree with Dr. O’Neil that we should not allow race to end up playing a role in prison sentence length. There are plenty of good things to include in a sentence length: seriousness of crime, remorse, etc. I don’t think race should be one of these criteria and since the sequence of events that Dr. O’Neil mentions make this far from the default in the criminal justice system, I think doing more to ensure race stays out of sentencing is an important moral responsibility we have as a society.
But Dr. O’Neil’s empirical criticism of recidivism models is entirely off base. In this specific example, she is claiming that some characteristics that correlate with recidivism should not be used in recidivism models even though they improve the accuracy, because they are not per se causative of crime.
Because of systematic racism and discrimination in policing, the recidivism rate among black Americans is higher. If the only thing you care about is maximizing the prison sentence of people who are most likely to re-offend, then your model will tag black people for longer sentences. It does not matter what the “cause” of this is! Your accuracy will still be higher if you take race into account.
To say “black Americans seem to have a higher rate of recidivism, therefore we should punish them more heavily” is almost to commit the opposite fallacy, the is-ought. Instead, we should say “yes, empirically there’s a high rate of recidivism among black Americans, but this is probably caused by social factors and regardless, if we don’t want to create a population of permanently incarcerated people, with all of the vicious cycle of discrimination that this creates, we should aim for racial parity in sentencing”. This is a very strong (and I think persuasive) moral claim.
It certainly is more work to make a complicated moral claim that mentions the trade-offs we must make between punishment and fairness (or between what is morally right and what is expedient) than it is to make a claim that makes no reference to these subtleties. When we admit that we are sacrificing accuracy in the name of fairness, we do open up an avenue for people to attack us.
Despite this disadvantage, I think keeping our moral and empirical claims separate is very important. When you make the empirical claim that “being black isn’t causative of higher rates of recidivism, therefore the models are wrong when they rank black Americans as more likely to reoffend”, instead of the corresponding ethical claim, then you are making two mistakes. First, there’s lots of room to quibble about what “causative” even means, beyond simple genetic causation. Because you took an empirical and not ethical position, you may have to fight any future evidence to the contrary of your empirical position, even if the evidence is true; in essence, you risk becoming an enemy of the truth. If the truth becomes particularly obvious (and contrary to your claims) you risk looking risible and any gains you achieved will be at risk of reversal.
Second, I would argue that it is ridiculous to claim that universal human rights must rest on claims of genetic identicalness between all groups of people (and trying to make the empirical claim above, rather than a moral claim, implicitly embraces this premise). Ashkenazi Jews are (on average) about 15 IQ points ahead of other groups. Should we give them any different moral worth because of this? I would argue no. The only criterion for full moral worth as a human and all universal rights that all humans are entitled to is being human.
As genetic engineering becomes possible, it will be especially problematic to have a norm that moral worth of humans can be modified by their genetic predisposition to pro-social behaviour. Everyone, but most especially the left, which views diversity and flourishing as some of its most important projects, should push back against both the is-ought and ought-is fallacies and fight for an expansive definition of universal human rights.
Imagine someone told you the following story:
The Fair Housing Act has been an absolute disaster for my family! My brother was trying to sublet his apartment to a friend for the summer. Unfortunately, one of the fair housing inspectors caught wind of this and forced him to put up notices that it was for rent. He had to spend a week showing random people around it and some snot-nosed five-year-old broke one of his vases while he was showing that kid’s mother around. I know there were problems before, but is the Fair Housing Act really worth it if it can cause this?
Most people would say the answer to the above is “yes, it really was worth it, oh my God, what is wrong with you?”
But it’s actually hard to think that. Because you just read a long, vivid, easily imaginable example of what exactly was wrong with the current regime and a quick throwaway reference to there being problems with the old way things were done. Some people might say that it’s better to at least mention that the other way of doing things had its problems too. I disagree strenuously.
When you make a throw-away reference to problems with another way of doing things, while focusing all of your descriptive effort on the problems of the current way (or vice-versa), you are committing the Availability Bait-and-Switch. And you are giving a very false illusion of balance; people will remember that you mentioned both had problems, but they will not take this away as their impression. You will have tricked your readers into thinking you gave a balanced treatment (or at least paved the way for a defence against claims that you didn’t give a balanced treatment) while doing nothing of the sort!
We are all running corrupted hardware. One of the most notable cognitive biases we have is the availability heuristic. We judge probabilities based on what we can easily recall, not on any empirical basis. If you were asked “are there more words in the average English language book that start with k, or have k as the third letter?”, you’d probably say “start with k!”. In fact, words with “k” as the third letter show up more often. But these words are harder to recall and therefore much less available to your brain.
If I were to give you a bunch of very vivid examples of how algorithms can ruin your life (as Dr. O’Neil repeatedly does, most egregiously in chapters 1, 5, and 8) and then mention off-hand that human decision making also used to ruin a lot of people’s lives, you’d probably come out of our talk much more concerned with algorithms than with human decision making. This was a thing I had to deliberately fight against while reading Weapons of Math Destruction.
Because for a book about how algorithms are destroying everything, there was a remarkable paucity of data on this destruction. I cannot recall seeing any comparative analysis (backed up by statistics, not anecdotes) of the costs and benefits of human decision making and algorithmic decision making, as it applied to Dr. O’Neil’s areas of focus. The book was all the costs of one and a vague allusion to the potential costs of the other.
If you want to give your readers an accurate snapshot of the ground truth, your examples must be representative of the ground truth. If algorithms cause twice as much damage as human decision making in certain circumstances (and again, I’ve seen zero proof that this is the case) then you should interleave every two examples of algorithmic destruction with one of human pettiness. As long as you aren’t doing this, you are lying to your readers. If you’re committed to lying, perhaps for reasons of pithiness or flow, then drop the vague allusions to the costs of the other way of doing things. Make it clear you’re writing a hatchet job, instead of trying to claim epistemic virtue points for “telling both sides of the story”. At least doing things that way is honest.
 This is a classic example of “anchoring”, a phenomenon where you appear to have a strong correlation in a certain direction because of a single extreme point. When you have anchoring, it’s unclear how generalizable your conclusion is – as the whole direction of the fit could be the result of the single extreme point.
Here’s a toy example:
Note that the thing that makes me suspicious of anchoring here is that we have a big hole with no data and no way of knowing what sort of data goes there (it’s not likely we can randomly generate a bunch of new countries and plot their gun ownership and rate of mass shootings). If we did some more readings (ignoring the fact that in this case we can’t) and got something like this:
I would no longer be worried about anchoring. It really isn’t enough just to look at the correlation coefficient either. The image labelled “Also Not Anchored” has a marginally lower correlation coefficient than the anchored image, even though (I would argue) it is FAR more likely to represent a true positive correlation. Note also we have no way to tell that more data will necessarily give us a graph like the third. We could also get something like this:
In which we have a fairly clear trend of noisy data with an average of 2.5 irrespective of our x-value and a pair of outliers driving a slight positive correlation.
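The effect is easy to reproduce with made-up numbers. The sketch below is only an illustration (the data points are invented, not taken from the NYT graph): a small cluster with no trend at all shows a strong correlation as soon as one far-away point is added.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented cluster with no trend between x and y.
cluster_x = [1, 2, 3, 4, 5]
cluster_y = [2, 3, 2, 3, 2]

# One invented extreme point, far away from the cluster.
anchor_x, anchor_y = 20, 10

r_cluster = pearson_r(cluster_x, cluster_y)
r_anchored = pearson_r(cluster_x + [anchor_x], cluster_y + [anchor_y])

print(f"cluster only:       r = {r_cluster:.2f}")
print(f"with extreme point: r = {r_anchored:.2f}")
```

The whole direction of the fit in the second case comes from the single extreme point, which is why the correlation coefficient alone can't tell you whether a trend is anchored.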
Also, the NYT graph isn’t normalized to population, which is kind of a WTF level mistake. They include another graph that is normalized later on, but the graph I show is the preview image on Facebook. I was very annoyed with the smug liberals in the comments of the NYT article, crowing about how conservatives are too stupid to understand statistics. But that’s a rant for another day…
I’d very quickly go out of business because of the moral hazard and adverse selection built into this product, but that isn’t germane to the example.
Or at least, this is my guess as to the most plausible factors in the recidivism rate discrepancy. I think social factors – especially when social gaps are so clear and pervasive – seem much more likely than biological ones. The simplest example of the disparity in policing – and its effects – is the relative rates of being stopped by police during Stop and Frisk given above by Dr. O’Neil.
 It’s possible that variations in Monoamine oxidase A or some other gene amongst populations might make some populations more predisposed (in a biological sense) to violence or other antisocial behaviour. Given that violence and antisocial behaviour are relatively uncommon (e.g. about six in every one thousand Canadian adults are incarcerated or under community supervision on any given day), any genetic effect that increases them would both be small on a social level and lead to a relatively large skew in terms of supervised populations.
This would occur in the same way that repeat offenders tend to be about one standard deviation below median societal IQ but the correlation between IQ and crime explains very little of the variation in crime. This effect exists because crime is so rare.
It is unfortunately easy for people to take things like “Group X is 5% more likely to be violent”, and believe that people in Group X are something like 5% likely to assault them. This obviously isn’t true. Given that there are about 7.5 assaults for every 1000 Canadians each year, a population that was instead 100% Group X (with their presumed 5% higher assault rate) would see about 7.875 assaults per 1000 people, a difference of about one additional assault per 2,700 people.
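For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes “5% more likely” means a relative increase on the base rate; the variable names are just for illustration.

```python
BASE_ASSAULTS_PER_1000 = 7.5  # base rate quoted in the text
RELATIVE_INCREASE = 0.05      # "5% more likely", read as a relative increase

# Rate in a hypothetical population made up entirely of Group X.
group_x_rate = BASE_ASSAULTS_PER_1000 * (1 + RELATIVE_INCREASE)

# Extra assaults per 1000 people, and how many people that spreads over.
extra_per_1000 = group_x_rate - BASE_ASSAULTS_PER_1000
people_per_extra_assault = 1000 / extra_per_1000

print(f"Group X rate: {group_x_rate:.3f} assaults per 1000 people")
print(f"One additional assault per {people_per_extra_assault:.0f} people")
```

The point survives whichever way you round: a relative increase on a rare event is a very small absolute increase.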
Unfortunately, if society took its normal course, we could expect to see Group X very overrepresented in prison. As soon as Group X gets a reputation for violence, juries would be more likely to convict, bail would be less likely, sentences might be longer (out of fear of recidivism), etc. Because many jobs (and in America, social benefits and rights) are withdrawn after you’ve been sentenced to jail, formerly incarcerated members of Group X would see fewer legal avenues to make a living. This could become even worse if even non-criminal members of Group X were denied some jobs due to fear of future criminality, leaving Group X members with few overall options but the black and grey economies and further tightening the spiral of incarceration and discrimination.
In this case, I think the moral thing to do as a society is to ignore any evidence we have about between-group differences in genetic propensities to violence. Ignoring results isn’t the same thing as pretending they are false or banning research; we aren’t fighting against truth, simply saying that some small extra predictive power into violence is not worth the social cost that Group X would face in a society that is entirely unable to productively reason about statistics.
Although we should be ever vigilant against people who seek to do the opposite and use genetic differences between Ashkenazi Jews and other populations as a basis for their Nazi ideology. As Hannah Arendt said, the Holocaust was a crime against humanity perpetrated on the body of the Jewish people. It was a crime against humanity (rather than “merely” a crime against Jews) because Jews are human.
Or at least, you would if I hadn’t warned you that I was about to talk about biases.
My next blog post is going to be devoted to what I did like about the book, because I don’t want to commit the mistakes I’ve just railed against (and because I think there was some good stuff in the book that bears reviewing).
Last week, I used the Graph Model of Conflict Resolution to find a set of stable equilibria in the present conflict between North Korea and the USA. They were:
The tense status quo (s. 0)
An American troop withdrawal, paired with North Korea giving up its nuclear weapons (s.10)
All out conventional warfare on the Korean Peninsula (s. 4)
All out nuclear warfare on the Korean Peninsula (s. 5)
But how much can we trust these results? How much do they depend on my subjective ranking of the belligerents’ preferences? How much do they depend on the stability metrics I used?
To get a sense of this, I’m going to add another stability metric into the mix, come up with three new preference vectors, and look at how the original results change when we consider a North Korean invasion to be irreversible. After these eight new stability calculations, we’ll have nine slightly different ways of looking at the conflict; this should help us guess which equilibria are robust to my subjective choices and which might exist only because of how I framed the problem.
Alternative Stability Metrics
Previously we assessed stable states using Nash Stability and Sequential Stability. Sequential Stability allowed us to see what would happen if the decision makers were looking two moves ahead and assuming that their opponents wouldn’t “cut off the nose to spite the face” – it assumes, in essence, that people will only sanction by moving to states that they like more, not states they like less.
Maybe that’s a bad assumption when dealing with Trump and Kim Jong-un. In this case, wouldn’t it be better to use Symmetric Metarationality? With Symmetric Metarationality, all sanctioning unilateral moves are on the table. Symmetric Metarationality also allows decision makers to respond to sanctioning. In effect, it lets them look three moves ahead, instead of the two allowed by Sequential Stability.
Before we see how this new metric changes things, let’s review our states, preference vectors, and stability analysis from last time.
The states are:
Or in plain English:
s. 1: Nuclear strike by the US, NK keeps nuclear weapons
s. 2: Unilateral US troop withdrawal
s. 4: North Korean invasion with only conventional US responses
s. 5: North Korean invasion with US nuclear strike
s. 6: US withdrawal and North Korean Invasion
s. 8: Unilateral North Korean abandonment of nuclear weapons
s. 9: US strike and North Korean abandonment of nuclear weapons
s. 10: Coordinated US withdrawal and NK abandonment of nuclear weapons
s. 12: NK invasion after abandoning nuclear weapons; conventional US response
s. 13: NK invasion after abandoning nuclear weapons; US nuclear strike
s. 14: US withdrawal paired with NK nuclear weapons abandonment and invasion
From these states, we saw the following equilibria and unilateral improvements:
When dealing with Symmetric Metarationality, I find it very helpful to modify the chart above so that it also includes unilateral moves. After we make this change and blank out our results, we get the following:
From here, we use a simple algorithm. First, all states without unilateral improvements are Nash Stable. Next, we check each unilateral improvement in the remaining states against the opponent’s unilateral actions, then against the original actor’s best unilateral action from each of the resulting states. If none of the results ranks lower than where the original actor started, the move is unstable. Otherwise it’s stable by Symmetric Metarationality (and we’ll mark it with “S”). As with Sequential Stability, you can’t truly call this done until you check for states that are simultaneously sanctioned (this is often easy because simultaneous sanctioning is only a risk when both sides are unstable).
An example: There exists a unilateral improvement for America from s. 4 to s. 5. From s. 5, North Korea can move to s. 1, 13, or 9. America disprefers both s. 1 and s. 13 to s. 4 and has no moves out of them, so the threat of North Korea taking either of those actions is an effective sanction and makes s. 4 stable on the American side.
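The check in this example can be sketched in a few lines of Python. This is a rough illustration, not a full GMCR implementation. Only the relations stated above (the US prefers s. 5 to s. 4, and s. 4 to both s. 1 and s. 13; North Korea can move from s. 5 to s. 1, 9, or 13; the US has no moves out of s. 1 or s. 13) come from the post. The rank numbers, the placement of s. 9, and the lack of US moves out of s. 9 are placeholder assumptions, and the function names are hypothetical.

```python
# Ordinal ranks for the US (higher = more preferred). Only 5 > 4 > {1, 13}
# is taken from the text; the numbers and the placement of s. 9 are
# placeholders for illustration.
US_RANK = {5: 4, 9: 3, 4: 2, 1: 1, 13: 0}

# Unilateral moves used in the example. The empty entry for s. 9 is an
# assumption; the rest follows the text.
US_MOVES = {4: [5], 1: [], 9: [], 13: []}
NK_MOVES = {5: [1, 9, 13]}


def smr_sanctions(start, target, rank, own_moves, opp_moves):
    """Return the opponent replies that sanction the move start -> target.

    A reply is a sanction if the actor does not prefer it to `start` and
    none of the actor's follow-up moves reaches a state preferred to `start`.
    """
    sanctions = []
    for reply in opp_moves.get(target, []):
        if rank[reply] > rank[start]:
            continue  # the actor ends up better off, so this is no threat
        escapes = [m for m in own_moves.get(reply, []) if rank[m] > rank[start]]
        if not escapes:
            sanctions.append(reply)
    return sanctions


def smr_stable(state, rank, own_moves, opp_moves):
    """A state is stable for the actor if every improvement is sanctioned."""
    improvements = [t for t in own_moves.get(state, []) if rank[t] > rank[state]]
    return all(
        smr_sanctions(state, t, rank, own_moves, opp_moves) for t in improvements
    )


sanctions = smr_sanctions(4, 5, US_RANK, US_MOVES, NK_MOVES)
stable = smr_stable(4, US_RANK, US_MOVES, NK_MOVES)
print(sanctions, stable)
```

Note that one effective sanction is enough: wherever s. 9 sits in the US ranking, the threats of s. 1 and s. 13 already deter the move.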
Once we repeat this for all states across both sides, we get the following:
We’ve kept all of our old equilibria and gained a new one in s. 12: “NK invasion after abandoning nuclear weapons; conventional US response”.
Previously, s. 12 wasn’t stable because North Korea preferred the status quo (s. 0) to it and the US had no UIs from the status quo. North Korea moving from s. 12 to s. 0 is sanctioned in Symmetric Metarationality by the US unilateral move from s. 0 to s. 1, which leaves North Korea with only the option of moving from s. 1 to s. 5. State 5 is dispreferred to s. 12 by North Korea, so it can’t risk leaving s. 12 for s. 0. State 12 was always Nash Stable for the US, so it becoming stable for North Korea makes it an equilibrium point.
To put this another way (and to put an example on what I said above), using Symmetric Metarationality allows us to model a world where the adversaries see each other as less rational and more spiteful. In this world, NK doesn’t trust the US to remain at s. 0 if it were to call for a truce after an invasion, so any invasion that starts doesn’t really end.
It was heartening to see all of our existing equilibria remain where they were. Note that I did all of the work in this post without knowing what the results would be and fully prepared to publish even if my initial equilibria never turned up again; that they showed up here made me somewhat relieved.
Previously we modelled invasions as reversible. But is this a realistic assumption? It’s very possible that the bad will from an invasion could last for quite a while, making other strategies very difficult to try out. It’s also likely that America wouldn’t just let North Korean troops give up and slink away without reprisal. If this is the case, maybe we should model a North Korean invasion as irreversible. This will mean that there can be no unilateral improvements for North Korea from s. 4, 5, or 6 to s. 0, 1, 2, 8, 9, or 10.
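One way to picture this restriction is as a filter on North Korea’s unilateral moves. The sketch below assumes the binary state numbering that the state numbers imply (US strike = 1, US withdrawal = 2, NK invasion = 4, NK abandonment = 8); that bit assignment and the function names are assumptions made for illustration.

```python
# Assumed bit values for each option, inferred from the state numbers.
STRIKE, WITHDRAW, INVADE, ABANDON = 1, 2, 4, 8
NK_BITS = INVADE | ABANDON
NK_CHOICES = (0, INVADE, ABANDON, INVADE | ABANDON)


def feasible(state):
    """States where the US both strikes and withdraws are excluded."""
    return not (state & STRIKE and state & WITHDRAW)


def nk_moves(state, irreversible=0):
    """States North Korea can reach alone by changing only its own options.

    Options listed in `irreversible` cannot be undone once taken.
    """
    us_part = state & ~NK_BITS
    moves = []
    for choice in NK_CHOICES:
        target = us_part | choice
        if target == state or not feasible(target):
            continue
        if state & irreversible & ~target:
            continue  # this move would undo an irreversible option
        moves.append(target)
    return moves


print(nk_moves(4))                       # invasion treated as reversible
print(nk_moves(4, irreversible=INVADE))  # invasion treated as irreversible
```

With the invasion marked irreversible, none of s. 4, 5, or 6 can reach s. 0, 1, 2, 8, 9, or 10 any more, which is exactly the restriction described above.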
In practical terms, modelling an invasion as irreversible costs North Korea one unilateral improvement, from s. 4 to s. 0. Let’s see if this changes the results at all (we’re back to sequential stability):
We end up losing the simultaneous sanctioning that made s. 4 a stable state, leaving us with only three stable states: the status quo, a trade of American withdrawal for the North Korean nuclear program, and all out nuclear war on the Korean Peninsula.
We’ve now tried three different ways of looking at this problem. Three equilibria (s. 0, 10, 5) showed up in all cases, one in two cases (s. 4), and one in one case (s. 12). We’re starting to get a sense for which equilibria are particularly stable and which are more liable to only pop up under certain conditions. But how will our equilibria fare when faced with different preference vectors?
What if we’ve underestimated how much North Korea and the United States care about getting what they want and overestimated how much they care about looking reasonable? I’m going to try ranking the states so that North Korea always prefers invading and the US always prefers first that North Korea doesn’t invade the South and second that they have no nuclear weapons program.
Since we’re modelling the actors as more belligerent, let’s also assume for the purposes of these analyses that invasions are irreversible.
Here are the preference vectors we’ll use to find equilibria:
Here we have only two stable states, s. 5 and 12. Both of these involve war on the Korean Peninsula; not even the status quo is stable. State 2 is at risk of simultaneous sanctioning, but the resulting states (4, 12, 5, 13) aren’t dispreferred to s. 2 for either actor, so no simultaneous sanctioning occurs. There really are just two equilibria.
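The resulting states of a simultaneous move can be worked out mechanically: each side’s options come from the state that side was trying to move to. The sketch below assumes the binary numbering implied by the state numbers (US strike = 1, US withdrawal = 2, NK invasion = 4, NK abandonment = 8). The particular improvements out of s. 2 used here (the US to s. 0 or s. 1, North Korea to s. 6 or s. 14) are inferred from the resulting states listed above rather than stated outright, so treat them as an assumption.

```python
from itertools import product

# Assumed bit masks for each side's options.
US_BITS = 1 | 2  # strike, withdraw
NK_BITS = 4 | 8  # invade, abandon nuclear program


def simultaneous_outcome(us_target, nk_target):
    """State reached when both sides move at once.

    The US options are read from the state the US was moving to, and the
    North Korean options from the state North Korea was moving to.
    """
    return (us_target & US_BITS) | (nk_target & NK_BITS)


def simultaneous_outcomes(us_improvements, nk_improvements):
    """All states that can result from one improvement by each side."""
    pairs = product(us_improvements, nk_improvements)
    return sorted(simultaneous_outcome(u, n) for u, n in pairs)


# Inferred improvements out of s. 2 (an assumption, see above).
outcomes = simultaneous_outcomes([0, 1], [6, 14])
print(outcomes)
```

Simultaneous sanctioning would then require these outcomes to be dispreferred to s. 2 by both actors, which is the comparison the paragraph above reports as failing.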
Symmetric Metarationality gives us the exact same result. Only s. 5 and s. 12 are stable. This is suspicious, as the conflict has managed to stay in s. 0 for quite some time. If these preferences were correct, North Korea would have already invaded South Korea and been met with a nuclear response.
What if these preferences are substantially correct and both sides are more aggressive than we initially suspected, but North Korea ranks being attacked by nuclear weapons below s. 0 and s. 10? That state of affairs is perhaps more reasonable than the blatantly suicidal North Korea we just imagined. How does a modicum of self-preservation change the results?
If we’re assuming that North Korea has broadly similar preferences to our last variation, but doesn’t want to get attacked by nuclear weapons, we get the following preference vectors:
Here are the annotated preference vectors we’ll use to assess stability with Sequential Stability and Symmetric Metarationality. Since we’re leaving the belligerency of the United States the same, we’ll continue to view invading as an irreversible action.
One “minor” change – deciding that North Korea really doesn’t want to be nuked – and we again have the status quo and a negotiated settlement (in addition to two types of war) as stable equilibria. Does this hold when we’re using Symmetric Metarationality?
Again, we have s. 0, 5, 10, and 12 as our equilibria.
As we’ve seen throughout, Symmetric Metarationality tends to give very similar answers to Sequential Stability. It’s still worth doing – it helps reassure us that our results are robust – but I hope by now you’re beginning to see why I could feel comfortable making an initial analysis based just off of Sequential Stability.
What if, instead of underestimating the bloodthirstiness of our belligerents, we’ve been overestimating it? It’s entirely possible that both sides strongly disprefer all options that involve violence (and the more violence an option involves, the more they disprefer it) but talk up their position in hopes of receiving concessions. In this case, let’s give our actors these preference vectors:
(Note that I’m only extending “peacefulness” to these two actors; I’m assuming that North Korea would happily try and annex South Korea if there was no need to fight America to do so)
There are fewer unilateral improvements in this array than in many of the previous ones.
This is perhaps the most surprising result we’ve seen so far. If both powers are all talk with nothing behind it and both powers know and understand this, then they’ll stick in the current high-tension equilibrium or fight a war. The only stable states here are s. 0, 4, and 5. State 10, the “negotiated settlement” state, is entirely absent. We’ll revisit this scenario with hypergame analysis later, to see what happens if the bluff is believed.
Here we see more equilibria than we’ve seen in any of the other examples. States 2 (unilateral US withdrawal) and 8 (North Korea unilaterally abandoning its nuclear weapons program) make their debut and s. 0, 4, 5, 10, and 12 appear again.
Remember, Symmetric Metarationality is very risk averse; it considers not just opponents’ unilateral improvements, but all of their unilateral moves as fair game. The fact that s. 0 has unilateral moves for either side that are aggressive leaves the actors too scared to move to it, even from states that they disprefer. This explains the presence of s. 2 and s. 8 in the equilibrium for the first time; they’re here because in this model both sides are so scared of war that if they blink first, they’ll be more relieved at the end of tension than they will be annoyed at moving away from their preferences.
I think in general this is a poor assumption, which is why I tend to find Sequential Stability a more useful concept than Symmetric Metarationality. That said, I don’t think this is impossible as a state of affairs, so I’m glad that I observed it. In general, this is actually one of my favourite things about the Graph Model of Conflict Resolution: using it you can very quickly answer “what ifs”, often in ways that are easily bent to understandable narratives.
Why Sensitivity Analysis?
The cool thing about sensitivity analysis is that it shows you the equilibria a conflict can fall into and how sensitive those equilibria are to your judgement calls. There are 12 possible states in this conflict, but only 7 of them showed up in any stability analysis at all. Within those seven, only 5 showed up more than once.
Here’s a full accounting of the states that showed up (counting our first model, there were nine possible simulations for each equilibrium to show up in):
s. 2: Unilateral US troop withdrawal
s. 4: North Korean invasion with only conventional US responses
s. 5: North Korean invasion with US nuclear strike
s. 8: Unilateral North Korean abandonment of nuclear weapons
s. 10: Coordinated US withdrawal and NK abandonment of nuclear weapons
s. 12: NK invasion after abandoning nuclear weapons; conventional US response
Of the five that showed up more than once, four showed up more than half the time. These then are the most robust equilibria; equilibria that half of the reasonable changes we attempted couldn’t dislodge.
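The tally itself is easy to reproduce. The sketch below transcribes the equilibria reported in the prose of this post for each of the nine analyses (the run labels are shorthand invented for this example) and counts how often each state appears.

```python
from collections import Counter

# Equilibria reported in the prose for each of the nine analyses.
# The labels are shorthand for illustration only.
RUNS = {
    "original preferences, SEQ": {0, 4, 5, 10},
    "original preferences, SMR": {0, 4, 5, 10, 12},
    "irreversible invasion, SEQ": {0, 5, 10},
    "more belligerent, SEQ": {5, 12},
    "more belligerent, SMR": {5, 12},
    "self-preserving NK, SEQ": {0, 5, 10, 12},
    "self-preserving NK, SMR": {0, 5, 10, 12},
    "peaceful bluffers, SEQ": {0, 4, 5},
    "peaceful bluffers, SMR": {0, 2, 4, 5, 8, 10, 12},
}

# Count how many analyses each state showed up in.
counts = Counter(state for equilibria in RUNS.values() for state in equilibria)

more_than_once = sorted(s for s, c in counts.items() if c > 1)
more_than_half = sorted(s for s, c in counts.items() if c > len(RUNS) / 2)

for state, count in sorted(counts.items()):
    print(f"s. {state}: {count} of {len(RUNS)} analyses")
print("more than once:", more_than_once)
print("more than half the time:", more_than_half)
```

Keeping the results in a structure like this also makes it cheap to add a tenth or eleventh variation later and see whether the robust set changes.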
Note “most robust” is not necessarily equivalent to “most likely”. To get actual probabilities on outcomes, we’d have to put probabilities on the initial conditions. Even then, the Graph Model of Conflict Resolution as we’ve currently talked about it does little to explain how decision makers move between equilibria; because this scenario starts in equilibrium, it’s hard to see how it makes it to any of the other equilibria.
Hopefully I’ll be able to explain one way we can model changes in states in my next post, which will cover Hypergame Analysis – the tool we use when actors lack a perfect understanding of one another’s preferences.
Every day, there are conflicts between decision makers. These occur on the international scale (think the Cuban Missile Crisis), the provincial level (Ontario’s sex-ed curriculum anyone?) and the local level (Toronto’s bike lane kerfuffle). Conflict is inevitable. Understanding it, regrettably, is not.
The final results of many conflicts can look baffling from the outside. Why did the Soviet Union retreat in the Cuban missile crisis? Why do some laws pass and others die on the table?
The most powerful tool I have for understanding the ebb and flow of conflict is the Graph Model of Conflict Resolution (GMCR). I had the immense pleasure of learning about it under the tutelage of Professor Keith Hipel, one of its creators. Over the next few weeks, I’d like to share it with you.
GMCR is done in two stages, modelling and analysis.
To model a problem, there are four steps:
Select a point in time for the model
Make a list of the players and their options
Remove outcomes that don’t make sense
Create preference vectors for all players
The easiest way to understand this is to see it done.
Let’s look at the current nuclear stand-off on the Korean peninsula. I wrote this on Sunday, October 29th, 2017, so that’s the point in time we’ll use. To keep things from getting truly out of hand in our first example, let’s just focus on the US and North Korea (I’ll add in South Korea and China in a later post). What options does each side have?
US: Nuclear strike on North Korea
US: Withdraw troops and normalize relations
North Korea: Invasion of South Korea
North Korea: Abandon nuclear program and submit to inspections
I went through a few iterations here. I originally wrote the US option “Nuclear strike” as “Pre-emptive strike”. I changed it to be more general. A nuclear strike could be pre-emptive, but it also could be in response to North Korea invading South Korea.
It’s pretty easy to make a chart of all these states:
If you treat each action that the belligerents can make as a binary variable (yes=1 or no=0), the states will have a natural ordering based off of the binary sum of the actions taken and not taken. This specific ordering isn’t mandatory – you can use any ordering scheme you want – but I find it useful.
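A minimal sketch of this numbering, assuming the options are ordered as in the list above so that the US strike is worth 1, US withdrawal 2, the North Korean invasion 4, and North Korean abandonment 8. That ordering is inferred from the state numbers used in these posts, and the option labels and function names are just for illustration.

```python
# Options in assumed bit order: position i contributes 2**i to the state number.
OPTIONS = [
    "US nuclear strike",
    "US withdrawal",
    "NK invasion",
    "NK abandons nuclear program",
]


def state_number(choices):
    """Turn a tuple of yes/no choices (1 or 0, in OPTIONS order) into a state number."""
    return sum(bit << i for i, bit in enumerate(choices))


def describe(number):
    """List the options that are switched on in a given state number."""
    return [name for i, name in enumerate(OPTIONS) if (number >> i) & 1]


print(state_number((1, 0, 1, 0)))  # US strikes and NK invades
print(describe(10))
print(describe(0))  # nothing switched on: the status quo
```

Because the numbering is just a binary sum, you can always recover which actions a state involves from its number alone.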
You may also notice that “Status quo” appears nowhere on this chart. That’s an interesting consequence of how actions are represented in the GMCR. Status quo is simply neither striking nor withdrawing for the US, or neither invading nor abandoning their nuclear program for North Korea. Adding an extra row for it would just result in us having to do more work in the next step, where we remove states that can’t exist.
I’ve colour coded some of the cells to help with this step. Removing nonsensical outcomes always requires a bit of judgement. Here we aren’t removing any outcomes that are highly dispreferred. We are supposed to restrict ourselves solely to removing outcomes that seem like they could never ever happen.
To that end, I’ve highlighted all cases where America withdraws troops and strikes North Korea. I’m interpreting “withdraw” here to mean more than just withdrawing troops – I think it would mean that the US would be withdrawing all forms of protection from South Korea. Given that, it wouldn’t make sense for the US to get involved in a nuclear war with North Korea while all the while loudly proclaiming that they don’t care what happens on the Korean peninsula. Not even Nixon’s “madman” diplomacy could encompass that.
On the other hand, I don’t think it’s necessarily impossible for North Korea to give up its nuclear weapons program and invade South Korea. There are a number of gambits where this might make sense – for example, it might believe that if they attacked South Korea after renouncing nuclear weapons, China might back them or the US would be unable to respond with nuclear missiles. Ultimately, I think these should be left in.
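With states stored as binary sums, removing the nonsensical outcomes is a one-line filter. The sketch assumes the US strike is worth 1 and US withdrawal is worth 2 in the state number, which is inferred from the numbering used in these posts.

```python
# Assumed bit values for the two US options.
STRIKE, WITHDRAW = 1, 2


def is_feasible(state):
    """A state is kept unless the US both strikes and withdraws."""
    return not (state & STRIKE and state & WITHDRAW)


# Four binary options give 2**4 = 16 candidate states.
feasible_states = [s for s in range(16) if is_feasible(s)]
removed_states = [s for s in range(16) if not is_feasible(s)]

print(feasible_states)
print(removed_states)
```

The states where North Korea both invades and abandons its nuclear program pass the filter, matching the decision to leave them in.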
Here’s the revised state-space, with the twelve remaining states:
The next step is to figure out how each decision maker prioritizes the states. I’ve found it’s helpful at this point to tag each state with a short plain language explanation.
s. 1: Nuclear strike by the US, NK keeps nuclear weapons
s. 2: Unilateral US troop withdrawal
s. 4: North Korean invasion with only conventional US responses
s. 5: North Korean invasion with US nuclear strike
s. 6: US withdrawal and North Korean Invasion
s. 8: Unilateral North Korean abandonment of nuclear weapons
s. 9: US strike and North Korean abandonment of nuclear weapons
s. 10: Coordinated US withdrawal and NK abandonment of nuclear weapons
s. 12: NK invasion after abandoning nuclear weapons; conventional US response
s. 13: NK invasion after abandoning nuclear weapons; US nuclear strike
s. 14: US withdrawal paired with NK nuclear weapons abandonment and invasion
While describing these, I’ve tried to avoid talking about causality. I didn’t describe s. 5 as “North Korean invasion in response to US nuclear strike” or “US nuclear strike in response to North Korean invasion”. Both of these are valid and would depend on which states preceded s. 5.
Looking at all of these states, here’s how I think both decision makers would order them (in order of most preferred to least preferred):
The US prefers North Korea give up its nuclear program and wants to keep protecting South Korea. Its secondary objective is to seem like a reasonable actor on the world stage – which means that it has some preference against using pre-emptive strikes or nuclear weapons on non-nuclear states.
North Korea wants to unify the Korean peninsula under its banner, protect itself against regime change, and end the sanctions its nuclear program has brought. Based on the Agreed Framework, I do think North Korea would be willing to give up nuclear weapons in exchange for a normalization of relations with the US and sanctions relief.
Once we have preference vectors, we’ve modelled the problem. Now it’s time for stability analysis.
A state is stable for a player if it isn’t advantageous for the player to shift states. A state is globally stable if it is not advantageous for any player to shift states. When a player can move to a state they prefer over the current state without any input from their opponent, this is a “unilateral improvement” (UI).
There are a variety of ways we can define “advantageous”, which lead to various definitions of stability:
Nash Stability (R): Stable if the actor has no unilateral improvements. States that are Nash stable tend to be pretty bad; these include both sides attacking in a nuclear war or both prisoners defecting in the prisoner’s dilemma. Nash stability ignores the concept of risk; it will never move to a less preferred state in the hopes of making it to a more preferred state.
General Metarationality (GMR): Stable if the actor has no unilateral improvements that aren’t sanctioned by unilateral moves by others. This tends to lead to less confusing results than Nash stability; Cooperation in the prisoner’s dilemma is stable in General Metarationality. General Metarationality accepts the existence of risk, but refuses to take any.
Symmetric Metarationality (SMR): Stable if an actor has no unilateral improvements that aren’t sanctioned by opponents’ unilateral moves after it has a chance to respond to them. This is equivalent to GMR, but with a chance to respond. Here we start to see the capacity to take on some risk.
Sequential Stability (SEQ): Stable if the actor has no unilateral improvements that aren’t sanctioned by opponents’ unilateral improvements. This basically assumes fairly reasonable opponents, the type who won’t cut off their nose to spite their face. Your mileage may vary as to how appropriate this assumption is. Like SMR, this system takes on some risk.
Limited Move Stability (LS): A state is stable if after N moves and countermoves (with both sides acting optimally), there exists no improvement. This is obviously fairly risky as any assumptions you make about your opponents’ optimal actions may turn out to be wrong (or wishful thinking).
Non-myopic Stability (NM): Equivalent to LS with N set equal to infinity. This predicts stable states where there are no improvements after any amount of posturing and state changes, as long as both players act entirely optimally.
The two stability metrics most important to the GMCR (at least as I was taught it) are Nash Stability (denoted with r) and Sequential Stability (denoted with s). These have the advantage of being simple enough to calculate by hand while still explaining most real-world equilibria quite well.
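To make these two definitions concrete, here is a minimal sketch that checks Nash and Sequential Stability for the prisoner's dilemma mentioned above. Everything in it is my own illustration rather than part of any GMCR software: the state labels ("CD" means A cooperates and B defects), the numeric ranks, and the function names are all assumptions, and I treat a unilateral improvement as sanctioned when the opponent has an improvement of their own that leaves the mover no better off than where they started.

```python
PLAYERS = ("A", "B")
STATES = ["CC", "CD", "DC", "DD"]  # first letter is A's choice, second is B's

# Hypothetical preference vectors for the prisoner's dilemma (higher rank = more preferred).
PREFS = {
    "A": {"DC": 4, "CC": 3, "DD": 2, "CD": 1},
    "B": {"CD": 4, "CC": 3, "DD": 2, "DC": 1},
}


def unilateral_moves(player, state):
    """States the player can reach by changing only their own choice."""
    i = PLAYERS.index(player)
    flipped = "D" if state[i] == "C" else "C"
    return [state[:i] + flipped + state[i + 1:]]


def unilateral_improvements(player, state):
    """Unilateral moves that the player prefers to the current state (UIs)."""
    rank = PREFS[player]
    return [t for t in unilateral_moves(player, state) if rank[t] > rank[state]]


def nash_stable(player, state):
    """Nash: stable if the player has no unilateral improvements at all."""
    return not unilateral_improvements(player, state)


def seq_stable(player, state):
    """SEQ: stable if every UI can be sanctioned by an opponent's UI."""
    rank = PREFS[player]
    opponent = PLAYERS[1 - PLAYERS.index(player)]
    for target in unilateral_improvements(player, state):
        responses = unilateral_improvements(opponent, target)
        # Sanctioned if some response leaves the player no better off than before.
        if not any(rank[r] <= rank[state] for r in responses):
            return False
    return True


def equilibria(is_stable):
    """States that are stable for every player under the given metric."""
    return [s for s in STATES if all(is_stable(p, s) for p in PLAYERS)]


print(equilibria(nash_stable))  # ['DD']
print(equilibria(seq_stable))   # ['CC', 'DD']
```

Under these assumed preferences, mutual defection is the only Nash equilibrium, while Sequential Stability also keeps mutual cooperation stable, because each prisoner expects the other to retaliate against a defection.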
To do stability analysis, you write out the preference vectors of both sides, along with any unilateral improvements that they can make. You then use this to decide the stability of each state for each player. If both players are stable at a state by any of the chosen stability metrics, the state overall is stable. A state can also be stable if both players have unilateral improvements from it that result in both ending up in a dispreferred state if taken simultaneously. This is called simultaneous sanctioning and is denoted with u.
The choice of stability metrics will determine which states are stable. If you only use Nash stability, you’ll get a different result than if you combine Sequential Stability and Nash Stability.
Here’s the stability analysis for this conflict (using Nash Stability and Sequential Stability):
Before talking about the outcome, I want to mention a few things.
Look at s. 9 for the US. They prefer s. 8 to s. 9 and the two differ only on a US move. Despite this, s. 8 isn’t a unilateral improvement over s. 9 for the US. This system is called the Graph Model of Conflict Resolution for a reason. States can be viewed as nodes on a directed graph, which implies that some nodes may not have a connection. Or, to put it in simpler terms, some actions can’t be taken back. Once the US has launched a nuclear strike, it cannot un-launch it.
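Here is a rough sketch of how irreversible moves turn the state space into a directed graph. The encoding is my assumption: I read each state number as a bitmask of options, with bit 0 standing for a US nuclear strike (irreversible) and bit 2 standing for a North Korean invasion (treated as reversible). Those two bit positions are inferred from the state numbers discussed in this post, and the Option class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    owner: str        # which player controls this option
    name: str
    bit: int          # assumed position of the option in the state number
    reversible: bool  # can the option be switched off again once taken?


# Only the two options whose bit positions can be inferred from the text.
OPTIONS = [
    Option("US", "nuclear strike", bit=0, reversible=False),
    Option("NK", "invade", bit=2, reversible=True),
]


def unilateral_moves(player, state):
    """States reachable when the player toggles exactly one of their options."""
    reachable = []
    for opt in OPTIONS:
        if opt.owner != player:
            continue
        already_taken = bool(state & (1 << opt.bit))
        if already_taken and not opt.reversible:
            continue  # no edge back: a launched strike cannot be un-launched
        reachable.append(state ^ (1 << opt.bit))
    return reachable


print(unilateral_moves("US", 8))  # [9]
print(unilateral_moves("US", 9))  # []
print(unilateral_moves("NK", 4))  # [0]
```

The edge from s. 8 to s. 9 exists but the reverse edge does not, which is why s. 8 can never be a unilateral improvement from s. 9, no matter how much the US prefers it.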
This holds less true for abandoning a nuclear program or withdrawing troops; both of those are fairly easy to undo (as we found out after the collapse of the Agreed Framework). Invasions on the other hand are in a tricky category. They’re somewhat reversible (you can stop and pull out), but the consequences linger. Ultimately I’ll call them reversible, but note that this is debatable and the analysis could change if you change this assumption.
In a perfect world, I’d go through this exercise four or five different times, each time with different assumptions about preferences or the reversibility of certain states or with different stability metrics and see how each factor changes the results. My next blog post will go through this in detail.
The other thing to note here is the existence of simultaneous sanctioning. Both sides have a UI from s. 4; NK to s. 0 and the US to s. 5. Unfortunately, if you take these together, you get s. 1, which both sides disprefer to s. 4. This means that once a war starts the US will be hesitant to launch a nuclear strike and North Korea would be hesitant to withdraw – in case they withdrew just as a strike happened. In reality, we get around double binds like this with negotiated truces – or unilateral ultimatums (e.g. “withdraw by 08:00 tomorrow or we will use nuclear weapons”).
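The s. 4 example can be checked mechanically. This is a small sketch, again assuming that state numbers are bitmasks of options and that the two players change different options. The ranks below are placeholders that only encode the orderings stated above (each side prefers its own UI to s. 4, and prefers s. 4 to s. 1); the function names are my own.

```python
def combined_state(start, move_a, move_b):
    """State reached if both players make their moves from `start` at once."""
    changed_a = start ^ move_a  # options flipped by the first player
    changed_b = start ^ move_b  # options flipped by the second player
    return start ^ changed_a ^ changed_b


def simultaneously_sanctioned(start, ui_a, ui_b, rank_a, rank_b):
    """True if taking both UIs together leaves both players worse off than `start`."""
    both = combined_state(start, ui_a, ui_b)
    return rank_a[both] < rank_a[start] and rank_b[both] < rank_b[start]


# Placeholder ranks (higher = more preferred), covering only the states needed here.
NK_RANK = {0: 3, 4: 2, 1: 1}
US_RANK = {5: 3, 4: 2, 1: 1}

print(combined_state(4, 0, 5))                                  # 1
print(simultaneously_sanctioned(4, 0, 5, NK_RANK, US_RANK))     # True
```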
There are four stable equilibria in this conflict:
The status quo
A coordinated US withdrawal of troops (but not a complete withdrawal of US interest) and North Korean renouncement of nuclear weapons
All out conventional war on the Korean Peninsula
All out nuclear war on the Korean Peninsula
I don’t think these equilibria are particularly controversial. The status quo has held for a long time, which would be impossible if it wasn’t a stable equilibrium. Meanwhile, s. 10 looks kind of similar to the Iran deal, with the US removing sanctions and doing some amount of normalization in exchange for the end of Iran’s nuclear program. State 5 is the worst-case scenario that we all know is possible.
Because we’re currently in a stable state, it seems unlikely that we’ll shift to one of the other states that could exist. In actuality, there are a few ways this could happen. A third party could intervene with its own preference vectors and shake up the equilibrium. For example, China could use the threat of economic sanctions (or the threat of ending economic sanctions) to try and get North Korea and the US to come to a détente. There also could be an error in judgement on the part of one of the parties. A false alarm could quickly turn into a very real conflict. It’s also possible that one party could mistake the other’s preferences, leading to them taking a course of action that they incorrectly believe isn’t sanctioned.
In future posts, I plan to show how these can all be taken into account, using the GMCR framework for Third Party Intervention and Coalitional Analysis, Strength of Preferences, and Hypergame Analysis.
Even without those additions, the GMCR is a powerful tool. I encourage you to try it out for other conflicts and see what the results are. I certainly found that the best way to really understand it was to run it a few times.
Note: I know it’s hard to play around with the charts when they’re embedded as images. You can see copyable versions of them here.
Note: This blog post is about housework and chores. If disability or mental illness makes chores difficult for you to do and having someone breezily describe it as “easy” will be bad for you, I recommend skipping it. This is meant to help people who are able to split chores with a partner – but historically haven’t – begin to do so. It isn’t meant to be a cudgel with which to beat people who have difficulty with chores due to ability status. If this describes you, you are not lazy or broken and your difficulties are real and valid.
So, you’ve seen the comic by Emma, or read The Second Shift (which also happens to be my favourite term for the chores and childcare that happens after or before work), or maybe someone has linked you here with a pointed note. In any case, I’m going to assume you’re reading this because you’ve realized that you don’t help your partner with much around the house, don’t share much of the management of household chores with your partner, or aren’t very good at household chores and want to get better.
There are three main things you need to work on if you want to be able to split both the act of doing chores and the mental load of keeping track of them with your partner. These are: general skills, noticing things, and keeping track of what needs to happen. It’s difficult to work on any of these in isolation. Getting better at chores will help you feel empowered to notice when they need to be done or keep track of the schedule of doing them. Doing chores whenever you notice they need to be done will give you the practice you need to get better at them.
I think it would be a confusing guide if I laid it all out as holistically as you’ll be working on everything. In the interest of making this digestible, I’ve given each of the key areas their own subsection, with an additional final section that talks about dealing with some of the issues that may arise as you and your partner negotiate and re-negotiate the second shift.
If you honestly don’t have any housework skills at all (either because you lacked an adult to model them for you, or adults refused to model them for you because of your gender, or any other reason) you’re going to need to start by building them up. It may seem like a good idea to ask your partner for help with this task.
It might not be. If your partner is frustrated with you because they feel you aren’t pulling your weight around the house, asking them to teach you will only increase the short-term stress on them. You’ll probably expect them to respond really positively to your change of heart, but you shouldn’t be surprised if they’re instead grumbly. Teaching someone how to do something is work. Teaching you chores would mean that for a while, all chores will take them longer.
It’s possible that your expectation that your partner be thrilled that you’re helping out will clash with any annoyance they have at doing chores more slowly in order to teach you and leave both of you feeling out of sorts. You’ll be hurt that your partner isn’t appreciating your “gift”, while your partner might feel like it’s taken you too long to even offer. It’s also possible that seeing you learn might convince your partner that you can’t do chores correctly, which will make them reluctant to delegate chores to you and ruin your whole enterprise before it really begins.
If it turns out your partner is a bad choice, cadge lessons from your closest friends. They don’t have to live with you and they aren’t starting from a place of frustration. For many friends, it’s definitely worth a few pointers to have someone else do the grunt-work of their chores for them. And that’s exactly the deal I suggest you make.
That said, if your relationship with your partner is one where you can talk honestly and openly (and if it isn’t, um, what are you doing?) you can cut out the guessing and just ask them what they’d prefer. Talking with your partner has a further advantage: you can ask them what chores they’d most like you to learn. I have some samples here, but if these are the chores your partner minds least (while I know at least one person who hates each of these, they also just happen to be the chores I find most tolerable), you may want to substitute them for chores your partner especially hates (like fucking sweeping, the objectively worst chore).
Think about the type of food you (and your friends or your partner) like to eat, then go looking online for recipes that match. I’m very partial to the President’s Choice recipes website, as well as the blog Cookie and Kate, but Google is your friend here. Once you have a recipe in mind, contact your chosen teacher and ask if you can buy the ingredients and make it for them. Make it clear that the meal will only happen if they teach you things like basic knife skills and how to boil water.
Repeat this process with several different friends until you can make 2-3 recipes unaided. Ideally these shouldn’t have much overlap in technique (e.g. one soup, one stir fry with rice, and one pasta dish). Once you have the basics under your belt, you should be able to pick the rest up as you go along, assuming you end up doing at least some of the cooking in your household.
There are four good reasons to learn to do the dishes:
It’s easy to learn and hard to get wrong
It’s an excellent way to train your ability to notice things
Doing the dishes doesn’t preclude talking with people
Which means that you can get a reputation as helpful simply by doing the dishes whenever someone invites you over for a meal, without sacrificing any time hanging out with your friends
You can learn to do the dishes the same way as cooking. Just ask a friend if you can come over, hang out, and do their dishes. Basically no one will say no to this. It can also be combined with learning to make food if you want to save some time.
Whenever you do dishes at home, especially if it’s part of your set of chores, you should remember that the dishes aren’t truly done until you’ve put them away. Don’t leave them in the dishwasher or drying rack for days!
Laundry is a chore that has to be scheduled (unless you like running out of underwear), so learning it will allow you to practice that aspect of the second shift. You can learn laundry the same as you would dishes or cooking, or maybe even at the same time if you are picking recipes with lots of dead time.
There are two important things to note about laundry:
If you don’t want everything to be horribly wrinkled, you need to take it out of the dryer as soon as it’s done.
If you are doing laundry for someone else (and especially if that person wears feminine clothes), you must ask them “is there anything in this load that can’t go in the dryer or needs to go in on delicate?”. Many things (especially hosiery) can be ruined by the wrong dryer setting, or by going in the dryer at all.
Cleaning the washroom
I’ve found that people give me an inordinate amount of credit (relative to the work involved) whenever I clean a washroom. I think this is because (oddly) most people hate cleaning the washroom. These people are mistaken. In all households where the washroom has been cleaned in the last year or so, this is one of the least gross rooms to clean.
(That said, this is one chore I wouldn’t recommend learning at the same time as you cook!)
People are very cavalier about food. Food spills rarely get cleaned up properly, leading to stickiness or mold in the kitchen. Kitchen sinks are often a disaster of old food, soggy vegetables, and clogged drains. I find it impossible to clean a kitchen without retching at least once from some food that’s gone off.
Bathrooms, on the other hand, rarely smell all that bad (and when they do, it’s more of a faint lingering odour, as opposed to the concentrated wretchedness you might find at the back of the fridge). People are incredibly embarrassed by any spills they cause in the bathroom and try to completely clean them up. If you wear gloves and wash your hands regularly, you should rarely be grossed out cleaning the bathroom (with the exception of the shower drain, which becomes a yawning abyss as soon as anyone in the house has hair past shoulder length).
Most people (especially people in their twenties) don’t realize all this and treat cleaning the bathroom as only marginally less heroic than cleaning up nuclear waste.
Take advantage of this fact and offer to clean your friend’s washroom if they show how to do it. You really only need to do this once or twice to get the hang of it. Then you’ll be all set to take over what’s probably your partner’s least favourite chore.
Once you’ve learned some things
You can show off your skills to your partner. If you started learning before your inability to do chores became a problem in the relationship, you were probably having your partner teach you, in which case you can skip this step. If you instead learned from friends, you need to make your partner aware that you can now do things around the home.
Ideally, you would clean a room or make a dinner and then have your partner make non-judgemental suggestions about how you could do it better. Be prepared to spot genuine conflicts of values; you might view things as clean after a quick wipe, when your partner considers them clean only after a thorough scrub. I suggest that you and your partner put some time into negotiating a combined standard if your preferences aren’t already congruent. Remember that if you haven’t been doing the chores much, you aren’t really negotiating from a position of strength. Also remember that diverging cleanliness preferences aren’t really a good reason to go back to doing nothing.
Within a month or so of starting your journey towards chores competence, you should be ready to take stuff off your partner’s plate. Note that the chores I’ve outlined above don’t represent half the housework for a typical couple (unless you do a significant amount of yard work or take over all of the cooking), so you’ll probably have to learn a few more things. Once you’ve built up goodwill from actually doing some chores, it should be fine to have your partner teach you how to do the remaining ones.
I actually recommend learning how to do every chore that gets regularly done. This allows you to do it if your partner is gone or sick (or if you ever break up). It also helps you discover which chores you don’t mind and which you despise (I’m looking at you, cleaning the kitchen). It’s probably best to split up the housework such that you and your partner spend a similar amount of time on the chores you don’t mind, in addition to trying to balance the overall amount of work.
Being able to do some chores means you’ve graduated from Chores 101. In Chores 202, you should develop the ability to do chores without prompting. It’s one thing to clean the washroom when asked, or make dinner when your partner loudly declares “I’m hungry”. It’s quite another to say to your partner “hey, I think this is as messy as I ever want the bathroom to get, will it disrupt your routine if I clean it tonight?” or “hey dear, does cauliflower mac and cheese sound good for dinner at six?” and then follow through.
When you take ownership of a chore and follow through on it, your partner can begin to drop the chore from their mind. Instead of looking around the washroom every so often, thinking about when they need to tell you to clean it, they can enjoy their shits in peace; instead of reminding you to go grocery shopping as a subtle way of telling you it’s your night to cook, they can relax and assume you’ll cook something delicious.
To build up your ability to notice things, you should pick a handful of chores and internally declare them MY RESPONSIBILITY. For chores that are your responsibility, you are forbidden to think “somebody should do that”. Whenever this thought happens, replace it with “I should do that!”.
With dishes this is especially easy. Look at the sink whenever you’re in the kitchen. If you don’t have anything urgent to do and there are some dishes in the sink, immediately do them (this is especially useful while waiting for the microwave, coffee maker, or toaster). On nights when your partner is cooking, head into the kitchen midway through their meal prep and start doing any dishes they’re done with. If you time this right, almost all the dishes can be done by the time you start eating and you can keep your partner company to boot.
You should aim to never be asked about something that is your responsibility (outside of extenuating circumstances, like “finals week”).
It’s obviously unfair to expect one person to notice everything wrong with the house (especially if people in the house have different cleanliness preferences). Note that this applies to your partner just as much as it applies to you. Neither of you should have to notice everything! This probably requires you and your partner to talk about what wrong means to you and come to a clear consensus. You should judge the state of the house off of this consensus, not off of how it feels to you personally.
There’s one final step to noticing things. When your partner asks you to do something (like get out a specific dish from the dishwasher), notice what else could be done and assume that the ask was as expansive as possible. Don’t just get out a single dish. Empty the whole dishwasher. When asked to take the laundry out of the dryer, fold it and put it away too. When you do the bare minimum, you push all the rest of the work onto your partner.
Keeping Track of What Needs to Happen
This is the last thing you need to get good at if you really want to share the mental load of chores with your partner.
Almost all chores spawn meta-chores. Cooking provides a simple example; you can’t cook if you don’t pay the power bill, buy groceries, and keep your cooking surfaces relatively clean. Even less involved chores probably require the occasional shopping trip, while children spawn a truly staggering amount of secondary work (like doctor’s appointments, vaccinations, permission slips, pre-school applications, birthday party invitations to sort, and homework to look over).
You can’t truly have ownership of a chore without taking responsibility for the chores it spawns. If your partner has to ask you every week if they need to pick up more cleaning supplies at the store, you’ve done a poor job managing the meta-chores. Your partner can only really banish a chore from their head once you’ve shown a clear track record of managing the meta-chores too.
If your memory isn’t great, assistive technology can really help. Apparently virtual assistants are now good enough that saying “Okay Google, remind me to buy dryer sheets next time I’m at a store” actually works. If you don’t want to share everything you ever do with Google or Apple, a pen and paper or notes to yourself on a calendar can work just as well.
You don’t need to do everything here yourself. If your partner regularly shops or is on their way to the grocery store for something they need, it’s totally fine to ask them to grab something you need on the way. The thing you want to avoid is the sort of cascading failure (e.g. a lack of soap means that laundry isn’t done for two weeks) that promotes chores they thought would be safely done to the top of their attention.
Ultimately, responsibility for your chores means that you should be able to do it even if no one else comes and saves you. In the same way that you want to train yourself to replace “someone should do that” with “I should do that” for the physical act of the chore, you need to replace things like “someone should buy more soap” with “I need to make sure we get more soap”.
Problems Sharing the Second Shift
I got the idea to write this after a friend shared Emma’s comic on their Facebook wall. Seeing the sense of hopelessness or anxiety it gave people who hadn’t been raised to know how to do chores or recognize when they had to be done was very eye-opening for me. One common complaint among people unused to chores was that it would be very stressful for them to try and notice every time something wasn’t perfect in order to swoop in and fix it.
I think this is a very reasonable thing to worry about if you and your partner are incapable of talking about things like “what does good enough look like?” and “how can we split these up, so that neither of us has to constantly ensure absolutely everything is perfect?”. In mainstream society, there’s a tendency for couples not to talk about their preferences and instead believe that true love necessarily provides intuition into everything your partner could want.
This becomes a real disaster when everyone assumes that their own way of doing things is the only reasonable way people would want to do it. In this case, genuinely different standards end up being misinterpreted as incompetence or subtle resistance.
All this is to say: if you’re worried that you can’t do anything to your partner’s nebulous standards, the root cause of this problem might be that you have no clue what those standards are and don’t know how to talk about them, not that noticing things is inherently very stressful. You should also make sure that you haven’t just ignored ten years of requests to do things to a certain standard, maybe because it was more convenient for you to ignore them?
I will say that if it feels impossible or very stressful to try and keep track of everything, this should be taken as evidence of how your partner might feel about it too. Foisting all that work onto them is a step of last resort that should only be undertaken after you’ve talked with them and made sure it isn’t just as costly for them to do all the management as it would be for you to do it.
Once you’ve overcome (or renegotiated) the stressful aspects of the second shift and taken on your share of it, it’s pretty natural to expect your partner to express a lot of gratitude. This may not necessarily happen or may not happen right away, especially if it’s taken you a very long time to start caring. “What took them so long?” is probably a more realistic response than “my hero!”.
If you feel underpraised, stop and consider how often you praise your partner for doing housework. If you already do, that’s awesome. Tell them that while this isn’t a quid pro quo, you’d be more motivated to do chores if they praised you too. If you don’t praise them, perhaps ye should give as ye expect to receive? Positive reinforcement probably will help you continue to do chores, but you and your partner may have to work through some lingering feelings before they’re quite willing to take that final step.
 Or partners. Or roommates. Or family. Endlessly caveating for all potential relationships that can occur in shared spaces is inimical to good flow and I’m vain enough about my writing that I’m going to sacrifice some nuance in the name of readability. ^
 For more about how the “economy of gratitude” can intersect with chores, see pages 54, 147, and 308 of The Second Shift by Professor Arlie Russell Hochschild (eBook version). ^
 Make sure to do the grocery shopping yourself, as grocery shopping is a skill all on its own. You haven’t fully appreciated just how taxing it can be until you’ve found yourself in the produce aisle, futilely scanning for an obscure vegetable and frantically Googling things like “can you use green onions instead of shallots?” or “what is the difference between scallions and shallots?”. (Learning to cook was full of onion related trauma for me) ^
 There is a big difference between your partner doing a chore while you relax and do other things and your partner doing a chore while you keep them company and help them with little things. If there are chores you are genuinely hopeless at that you still want to be a part of, you can help your partner out by making their life less boring and providing some company. Even people who can’t boil water without burning down the kitchen can fetch things from the fridge. ^
 It’s deeply unfair for people to be held to standards that they don’t know about. Having a clear conversation about chore expectations allows you and your partner to avoid the feeling that you’re being judged by capricious and mysterious standards. ^
 I am a bona fide expert at stressing out over little things and found a ten-minute conversation codifying the implicit assumptions my partner and I had around chores eliminated basically all of the stress I had. I now know that they find disorder much more stressful than lack of cleanliness and really appreciate me keeping things organized (I’m the opposite, so gave little thought to order), while they now know my esophageal problems make it very hard for me to eat food that is weirdly prepared (my partner is a very proficient cook with an iron gut, which sometimes leads to culinary experiments that are a bit beyond my ability to choke down; I stick to recipes).
Still, if this is very stressful for you even after a conversation, there is nothing wrong or broken about you! Be prepared to challenge your assumption that this will necessarily be stressful, but if your assumption is borne out, you should probably try something else. Maybe you can compensate for not managing the chores in other ways (perhaps by doing more of the actual work of chores)? I think splitting all aspects of chores evenly is a useful default, but each partnership needs to figure out for themselves what feels fair and achievable to them! ^
Previously I described regulation as a regressive tax. It may not kill jobs per se, but it certainly shifts them towards people with university degrees, largely at the expense of those without. I’m beginning to rethink that position; I’m increasingly worried that many types of regulation are actually leading to a net loss of jobs. There remains a paucity of empirical evidence on this subject. Today I’m going to present a (I believe convincing) model of how regulations could kill jobs, but I’d like to remind everyone that models are less important than evidence and should only be the focus of discussion in situations like this, where the evidence is genuinely sparse.
Let’s assume that regulation has no first order effect on jobs. All jobs lost through regulation (and make no mistake, there will be lost jobs) are offset by different jobs in regulatory compliance or the jobs created when the compliance people spend the money they make, etc., on to infinity. So far, this is all fine and dandy.
Talking to members of the local start-up community, I reckon that many small sized hardware start-ups spend the equivalent of an engineer’s salary on regulatory compliance yearly. Instead of a hypothetical engineer (or marketer, or salesperson, etc.), they’re providing a salary to a lawyer, or a technician at the FCC, or some other mid-level bureaucrat.
No matter how well this person does their job, they aren’t creating anything of value. There’s no chance that they’ll come up with or contribute to a revolutionary new product that drives a lot of economic growth and ends up creating dozens, hundreds, or (in very rare cases) thousands of jobs. An engineer could.
There are obviously many ways that even successful start-ups with all the engineers they need can fail to create jobs on net. They could disrupt an established industry in a way that causes layoffs at the existing participants (although it’s probably fallacious to believe that this will cause net job losses either, given the lump of labour fallacy). Also, something like 60% of start-ups fail. In the case of failure, money from wealthy investors is transferred to other people and I doubt most people care if the beneficiaries are engineers or in compliance.
But discounting all that, I think what this boils down to is: when you’re paying an engineer, there’s a chance that the engineer will invent something that drives productivity growth (leading to cheaper prices and maybe even new industries previously thought impossible). When you pay someone in sales or marketing, you get a chance to get your product in front of customers and see it really take off. When you’re paying for regulatory compliance, you get an often-useless stamp of approval, or have to make expensive changes because some rent-seeking corporation got spurious requirements written into the regulation.
Or the regulatory agency catches a fatal flaw and averts a catastrophe. I’m not saying that never happens. Just that I think it’s much rarer than many people might believe. Seeing the grinding wheels of regulation firsthand has cured me of all my youthful idealistic approval for it. Sometimes consumers need to be protected from out of control profit-seeking, sure. But once you’ve been forced to actually do some regulatory compliance, you start to understand just how much regulation exists to prevent established companies from having to compete against new entrants. This makes everything more expensive and everyone but a few well-connected shareholders worse off.
Regulation has real trade-offs; there are definite goods, but also definite downsides. And now I think the downsides are even worse than I first predicted.