Literature, Model

Does Amateurish Writing Exist?

[Warning: Spoilers for Too Like the Lightning]

What marks writing as amateurish (and whether “amateurish” or “low-brow” works are worthy of awards) has been a topic of contention in the science fiction and fantasy community for the past few years, with the rise of Hugo slates and the various forms of “puppies”.

I’m not talking about the learning works of genuine amateurs. These aren’t stories that use big words for the sake of sounding smart (and at the cost of slowing down the stories), or over-the-top fanfiction-esque rip-offs of more established works (well, at least not since the Wheel of Time nomination in 2014). I’m talking about that subtler thing, the feeling that bubbles up from the deepest recesses of your brain and says “this story wasn’t written as well as it could be”.

I’ve been thinking about this a lot recently because about ¾ of the way through Too Like The Lightning by Ada Palmer, I started to feel put off [1]. And the only explanation I had for this was the word “amateurish” – which popped into my head devoid of any reason. This post is an attempt to unpack what that means (for me) and how I think it has influenced some of the genuine disagreements around rewarding authors in science fiction and fantasy [2]. Your tastes might be calibrated differently and if you disagree with my analysis, I’d like to hear about it.

Now, there are times when you know something is amateurish and that’s okay. No one should be surprised that John Ringo’s Paladin of Shadows series – books that he explicitly wrote for himself – is parsed by most people as pretty amateurish. When pieces aren’t written explicitly for the author only, I expect some consideration of the audience. Ideally the writer should be having fun too, but if they’re writing for publication, they have to be writing to an audience. This doesn’t mean that they must write exactly what people tell them they want. People can be terrible judges of what they want!

This also doesn’t necessarily imply pandering. People like to be challenged. If you look at the most popular books of the last decade on Goodreads, few of them could be described as pandering. I’m familiar with two of the top three books there and both of them kill off a fan favourite character. People understand that life involves struggle. Lois McMaster Bujold – who has won more Hugo awards for best novel than any living author – once said she generated plots by considering “what’s the worst possible thing I can do to these people?” The results of this method speak for themselves.

Meditating on my reaction to books like Paladin of Shadows in light of my experiences with Too Like The Lightning is what led me to believe that, among more technically proficient books, the “amateurish” ones are those that lose sight of what the audience will enjoy and follow only what the author enjoys. This may involve a character that the author heavily identifies with – the Marty Stu or Mary Sue phenomena – who is lovingly described overcoming obstacles and generally being “awesome” but doesn’t “earn” any of this. It may also involve gratuitous sex, violence, engineering details, gun details, political monologuing (I’m looking at you, Atlas Shrugged), or tangents about constitutional history (this is how most of the fiction I write manages to become unreadable).

I realized this when I was reading Too Like the Lightning. I loved the world building and I found the characters interesting. But (spoilers!) when it turned out that all of the politicians were literally in bed with each other or when the murders the protagonist carried out were described in grisly, unrepentant detail, I found myself liking the book a lot less. This is – I think – what spurred the label amateurish in my head.

I think this is because there aren’t a lot of people who actually want to read about brutal torture-executions or literally incestuous politics. It’s not (I think) that I’m prudish; it seemed like some of the scenes were written to be deliberately off-putting. And I understand that this might be part of the theme of the work, and that these scenes were probably necessary for the author’s creative vision. But they didn’t work for me and they seemed like a thing that wouldn’t work for a lot of people that I know. They were discordant and jarring. They weren’t pulled off as well as they would have had to be to keep me engaged as a reader.

I wonder if a similar process is what caused the changes that the Sad Puppies are now lamenting at the Hugo Awards. To many readers, the sexualized violence or sexual violence that can find its way into science fiction and fantasy books (I’d like to again mention Paladin of Shadows) is incredibly off-putting. I find it incredibly off-putting. Books that incorporate a lot of it feel like they’re ignoring the chunk of the audience that is me and my friends, and while reading them it’s hard for me not to feel that the writers are fairly amateurish. I normally prefer works that meditate on the causes and uses of violence when they incorporate it – I’d put N.K. Jemisin’s truly excellent Broken Earth series in this category – and it seems like readers who think this way are starting to dominate the Hugos.

For the people who previously had their choices picked year after year, this (as well as all the thinkpieces explaining why their favourite books are garbage) feels like an attack. Add to this the fact that some of the books that started winning had a more literary bent and you have some fans of the genre believing that the Hugos are going to amateurs who are just cruising to victory by alluding to famous literary works. These readers look suspiciously on crowds who tell them they’re terrible if they don’t like books that are less focused on the action and excitement they normally read for. I can see why that’s a hard sell, even though I’ve thoroughly enjoyed the last few Hugo winners [3].

There’s obviously an inferential gap here, if everyone can feel angry about the crappy writing everyone else likes. For my part, I’ll probably be using “amateurish” only to describe books that are technically deficient. For books that are genuinely well written but seem to focus more on what the author wants than (on what I think) their likely audience wants, well, I won’t have a snappy term, I’ll just have to explain it like that.

Footnotes

[1] A disclaimer: the work of a critic is always easier than that of a creator. I’m going to be criticizing writing that’s better than my own here, which is always a risk. Think of me not as someone criticizing from on high, but as someone frantically taking notes right before a test I hope to barely pass. ^

[2] I want to separate the Sad Puppies, who I view as people sad that action-packed books were being passed over in favour of more literary ones from the Rabid Puppies, who just wanted to burn everything to the ground. I’m not going to make any excuses for the Rabid Puppies. ^

[3] As much as I can find some science fiction and fantasy too full of violence for my tastes, I’ve also had little to complain about in the past, because my favourite author, Lois McMaster Bujold, has been reliably winning Hugo awards since before I was born. I’m not sure why there was never a backlash around her books. Perhaps it’s because they’re still reliably space opera, so class distinctions around how “literary” a work is don’t come up when Bujold wins. ^

Model, Politics, Quick Fix

The Awkward Dynamics of the Conservative Leadership Debates

Tanya Granic Allen is the most idealistic candidate I’ve ever seen take the stage in a Canadian political debate. This presents some awkward challenges for the candidates facing her, especially Mulroney and Elliott.

First, there’s the simple fact of her idealism. I think Granic Allen genuinely believes everything she says. For her, knowing what’s right and what’s wrong is simple. There isn’t a whole lot of grey. She even (bless her) probably believes that this will be an advantage come election time. People overwhelmingly don’t like the equivocation of politicians, so Granic Allen must assume her unequivocal moral stances will be a welcome change.

For many people, it must be. Even for those who find it grating, it seems almost vulgar to attack her. It’s clear that she isn’t in this for herself and doesn’t really care about personal power. Whether she could maintain that innocence in the face of the very real need to make political compromises remains an open question, but for now she does represent a certain vein of ideological conservatism in a form that is unsullied by concerns around electability.

The problem here is that the stuff Granic Allen is pushing – “conscience rights” and “parental choice” – is exactly the sort of thing that can mobilize opposition to the PC party. Fighting against sex-ed and abortion might play well with the base, but Elliott and Mulroney know that unbridled social conservatism is one of the few things that can force the province’s small-l liberals to hold their noses and vote for the big-L Liberal Party. In an election where we can expect embarrassingly low turnout (it was 52% in 2014), this can play a major role.

A less idealistic candidate would temper themselves to help the party in the election. Granic Allen has no interest in doing this, which basically forces the pragmatists to navigate the tricky act of distancing themselves from her popular (with the base) proposals so that they might carry the general election.

Second, there’s the difficult interaction between the anti-rational and anti-empirical “common sense” conservatism pushed by Granic Allen and Ford, and the pragmatic, informed conservatism of Elliott and Mulroney.

For Ford and Granic Allen, there’s a moral nature to truth. They live in a just world where something being good is enough to make it true. Mulroney and Elliott know that reality has an anti-partisan bias.

Take clean energy contracts. Elliott quite correctly pointed out that ripping up contracts willy-nilly will lead to a terrible business climate in Ontario. Ripping up contracts is the sort of suggestion we normally see from the hard left (and have seen in practice in places the hard left idolizes, like Venezuela). But Granic Allen is committed to a certain vision of the world, and in her vision of the world, government getting out of the way can’t help but be good.

Christine Elliott has (and this is a credit to her) shown that she’s not very ideological, in that she can learn how the world really works and subordinate ideology to truth, even when it’s inconvenient. This would make her a more effective premier than either Granic Allen or Ford, but it might hurt her in the leadership race. I’ve seen her freeze a couple of times when faced with defending how the world really works to an audience that is ideologically prevented from acknowledging the truth.

(See, for example, the look on her face when she was forced to defend her vote to ban conversion therapy. Elliott’s real defense of that bill probably involves phrases like “stuck in the past”, “ignorant quacks” and “vulnerable children who need to be protected from people like you”. But she knew that a full-throated defense of gender dysphoria as a legitimate problem wouldn’t win her any votes in this race.)

As Joseph Heath has pointed out, this tension between reality and ideology is responsible for the underrepresentation of modern conservatives among academics. Since the purpose of the academy is (broadly) truth-seeking, we shouldn’t be surprised to see it select against an ideology that explicitly rejects not only the veracity of many of the products of this truth-seeking (see, for example, Granic Allen’s inability to clearly state that humans are causing climate change) but also the worthwhileness of the whole endeavour.

When everything is trivially knowable via the proper application of “common sense”, there’s no point in thinking deeply. There’s no point in experts. You just figure out what’s right and you do it. Anything else just confuses the matter and leaves the “little guy” to get shafted by the elites.

Third, the carbon tax has produced a stark, unvoiced split between the candidates. On paper, all of them oppose it. In reality, only Ford and Granic Allen seriously believe they have any chance of stopping it. I’m fairly sure that Elliott and Mulroney plan to mount a token opposition, then quickly fold when they’re reminded that raising taxes and giving money to provinces is a thing the Federal Government is allowed to do. This means that they’re counting on money from the carbon tax to balance their budget proposals. They can’t say this, because Ford and Granic Allen are forcing them to the right here, but I would bet that they’re privately using it to reassure fiscally conservative donors about the deficit.

Being unable to discuss what is actually the centrepiece of their financial plans leaves Elliott and Mulroney unable to give very good information about how they plan to balance the budget. They have to fall back on empty phrases like “line-by-line audit” and “efficiencies”, because anything else feels like political suicide.

This shows just how effective Granic Allen has been at being a voice for the grassroots. By staking out positions that resonate with the base, she’s forcing the other leadership contestants to endorse them or risk losing to her. Note especially how she’s been extracting promises from Elliott and Mulroney whenever possible – normally around things she knows they don’t want to agree to but that play well with the base. By doing this, she hopes to remove much of their room to manoeuvre in the general election and prevent any big pivot to the centre.

Whether this will work really depends on how costly politicians find breaking promises. Conventional wisdom holds that they aren’t particularly bothered by it. I wonder if Granic Allen’s idealism blinds her to this fact. I’m certain that she herself wouldn’t break a promise except under the greatest duress.

On the left, it’s very common to see a view of politics that emphasizes pure and moral people. The problem with the system, says the communist, is that we let greedy people run it. If we just replaced them all with better people, we’d get a fair society. Granic Allen is certainly no communist. But she does seem to believe in the “just need good people” theory of government – and whether she wins or loses, she’s determined to bring all the other candidates with her.

This isn’t an incrementalist approach, which is why it feels so foreign to people like me. Granic Allen seems to be making the decision that she’d rather the Conservatives lose (again!) to the Liberals than that they win without a firm commitment to do things differently.

The conflict in the Ontario Conservative party – the conflict that surfaced when Patrick Brown was torpedoed by his rivals – is around how far the party is willing to go to win. The Ontario Conservatives aren’t the first party to go through this. When UK Labour members picked Jeremy Corbyn, they clearly chose ideological purity over electability.

In the Ontario PC party, Granic Allen and Ford have clearly staked out a position emphasizing purity. Mulroney and Elliott have just as clearly chosen to emphasize success. Now it’s up to the members. I’m very interested to see what they decide.

Economics, Model, Quick Fix

Not Just Zoning: Housing Prices Driven By Beauty Contests

No, this isn’t a post about very pretty houses or positional goods. It’s about the type of beauty contest described by John Maynard Keynes.

Imagine a newspaper that publishes one hundred pictures of strapping young men. It asks everyone to send in the names of the five that they think are most attractive. It offers a prize: if your selection matches the five men most often appearing in everyone else’s selections, you’ll win $500.

You could just do what the newspaper asked and send in the names of those men that you think are especially good looking. But that’s not very likely to give you the win. Everyone’s tastes are different and the people you find attractive might not be very attractive to anyone else. If you’re playing the game a bit smarter, you’ll instead pick the five people that you think have the broadest appeal.

You could go even deeper and realize that many other people will be trying to win and so will also be trying to pick the most broadly appealing people. Therefore, you should pick people that you think most people will view as broadly appealing (which differs from picking broadly appealing people if you know something about what most people find attractive that isn’t widely known). This can go on indefinitely (although Yudkowsky’s Law of Ultrafinite Recursion states that “In practice, infinite recursions are at most three levels deep”, which gives me a convenient excuse to stop before this devolves into “I know you know I know that you know that…” ad infinitum).
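To make the first couple of levels concrete, here’s a minimal simulation sketch in Python. Everything in it – the split of attractiveness into a shared component plus private taste, and all the parameters – is an illustrative assumption of mine, not anything from Keynes. It compares a level-0 strategy (submit your own favourites) against a level-1 strategy (submit the candidates with the broadest appeal):

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_voters = 100, 1000

shared = rng.normal(size=n_candidates)  # appeal most people agree on
taste = rng.normal(scale=1.5, size=(n_voters, n_candidates))  # idiosyncratic taste
perceived = shared + taste  # how attractive each voter privately finds each candidate

def top_five(scores):
    """Indices of the five highest-scoring candidates."""
    return set(np.argsort(scores)[-5:].tolist())

# The winning answer: the five candidates named most often across all ballots,
# assuming everyone else naively votes their own taste (level-0).
counts = np.zeros(n_candidates)
for ballot in perceived:
    for c in top_five(ballot):
        counts[c] += 1
consensus = top_five(counts)

# Level-0: submit your own favourites. Level-1: submit the broadly appealing five.
level0 = np.mean([len(top_five(ballot) & consensus) for ballot in perceived])
level1 = len(top_five(shared) & consensus)

print(f"level-0 ballots match the winning five {level0:.1f}/5 times on average")
print(f"a level-1 ballot matches the winning five {level1}/5 times")
```

Because private taste swamps the shared component here, a level-0 ballot should overlap with the winning five far less than the level-1 ballot does – the numeric version of “pick the five people that you think have the broadest appeal”.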

This thought experiment was relevant to an economist because many assets work like this. Take gold: its value cannot be fully explained by its prettiness or industrial usefulness; some of its value comes from the belief that someone else will want it in the future and be willing to pay more for it than they would for a similarly useful or pretty metal. For whatever reason, we have a collective delusion that gold is especially valuable. Because this delusion is collective enough, it almost stops being a delusion. The delusion gives gold some of its value.

When it comes to houses, beauty contests are especially relevant in Toronto and Vancouver. Faced with many years of steadily rising house prices, people are willing to pay a lot for a house because they believe that they can unload it on someone else in a few years or decades for even more.

When talking about highly speculative assets (like Bitcoin), it’s easy to point out the limited intrinsic value they hold. Bitcoin is an almost pure Keynesian Beauty Contest asset, with most of its price coming from an expectation that someone else will want it at a comparable or better price in the future. Houses are obviously fairly intrinsically valuable, especially in very desirable cities. But the fact that they hold some intrinsic value cannot by itself prove that none of their value comes from beliefs about how much they can be unloaded for in the future – see again gold, which has value both as an article of commerce and as a beauty contest asset.

There’s obviously an element of self-fulfilling prophecy here, with steadily increasing house prices needed to sustain this myth. Unfortunately, the housing market seems especially vulnerable to this sort of collective mania, because the sunk cost fallacy makes many people unwilling to sell their houses at a price below what they paid for it. Any softening of the market removes sellers, which immediately drives up prices again. Only a massive liquidation event, like we saw in 2007-2009 can push enough supply into the market to make prices truly fall.

But this isn’t just a self-fulfilling prophecy. There’s deliberateness here as well. To some extent, public policy is used to guarantee that house prices continue to rise. NIMBY residents and their allies in city councils deliberately stall projects that might affect property values. Governments provide tax credits or access to tax-advantaged savings accounts for homes. In America, mortgage interest is even tax-deductible!

All of these programs ultimately make housing more expensive wherever supply cannot expand to meet the artificially increased demand – which basically describes any dense urban centre. Therefore, these home-buying programs fail to accomplish their goal of making houses more affordable, but do serve to guarantee that housing prices will continue to go up. Ultimately, they really just represent a transfer of wealth from taxpayers generally to those specific people who own homes.

Unfortunately, programs like this are very sticky. Once people buy into the collective delusion that home prices must always go up, they’re willing to heavily leverage themselves to buy a home. Any dip in the price of homes can wipe out the value of this asset, making it worth less than the money owed on it. Since this tends to make voters very angry (and also leaves many people with no money), governments of all stripes are very motivated to avoid it.

This might imply that the smart thing is to buy into the collective notion that home prices always go up. There are so many people invested in this belief at all levels of society (banks, governments, and citizens) that it can feel like home prices are too important to fall.

Which would be entirely convincing, except, I’m pretty sure people believed that in 2007 and we all know how that ended. Unfortunately, it looks like there’s no safe answer here. Maybe the collective mania will abate and home prices will stop being buoyed ever upwards. Or maybe they won’t and the prices we currently see in Toronto and Vancouver will be reckoned cheap in twenty years.

Better zoning laws can help make houses cheaper. But it really isn’t just zoning. The beauty contest is an important aspect of the current unaffordability.

Economics, Model

Against Job Lotteries

In simple economic theory, wages are supposed to act as signals. When wages increase in a sector, it should signal people that there’s lots of work to do there, incentivizing training that will be useful for that field, or causing people to change careers. On the flip side, when wages decrease, we should see a movement out of that sector.

This is all well and good. It explains why the United States has seen (over the past 45 years) little movement in the number of linguistics degrees, a precipitous falloff in library science degrees, some decrease in English degrees, and a large increase in engineering and business degrees [1].

This might be the engineer in me, but I find things that are working properly boring. What I’m really interested in is when wage signals break down and are replaced by a job lottery.

Job lotteries exist whenever there are two tiers to a career. On one hand, you’ll have people making poverty wages and enduring horrendous conditions. On the other, you’ll see people with cushy wages, good job security, and (comparatively) reasonable hours. Job lotteries exist in the “junior doctor” system of the United Kingdom, in the academic system of most western countries, and in teaching in Ontario (up until very recently). There’s probably a much less extreme version of this going on even in STEM jobs (in that many people go in thinking they’ll work for Google or the next big unicorn and end up building websites for the local chamber of commerce or writing internal tools for the company billing department [2]). A slightly different type of job lottery exists in industries where fame plays a big role: writing, acting, music, video games, and other creative endeavours.

Job lotteries are bad for two reasons. Compassionately, it’s really hard to see idealistic, bright, talented people endure terrible conditions all in the hope of something better, something that might never materialize. Economically, it’s bad when people spend a lot of time unemployed or underemployed because they’re hopeful they might someday get their dream job. Both of these reasons argue for us to do everything we can to dismantle job lotteries.

I do want to make a distinction between the first type of job lottery (doctors in the UK, professors, teachers), which is a property of how institutions have happened to evolve, and the second, which seems much more inherent to human nature. “I’ll just go with what I enjoy” is a very common media strategy that will tend to split artists (of all sorts) into a handful of mega-stars, a small group of people making a modest living, and a vast mass of hopefuls searching for their break. To fix this would require careful consideration and the building of many new institutions – projects I think we lack the political will and the know-how for.

The problems in the job market for professors, doctors, or teachers feel different. These professions don’t rely on tastemakers and network effects. There’s also no stark difference in skills that would imply discontinuous compensation. This doesn’t imply that skills are flat – just that they exist on a steady spectrum, which should imply that pay could reasonably follow a similar smooth distribution. In short, in all of these fields, we see problems that could be solved by tweaks to existing institutions.

I think institutional change is probably necessary because these job lotteries present a perfect storm of misdirection to our primate brains. That is to say: (1) people are really bad at probability, and (2) the price level for the highest earners suggests that lots of people should be entering the industry. Combined, this means that people will fixate on the highest earners, without really understanding how unlikely it is that they’ll end up among them.

Two heuristics drive our inability to reason about probabilities: the representativeness heuristic (ignoring base rates and information about reliability in favour of what feels “representative”) and the availability heuristic (events that are easier to imagine or recall feel more likely). The combination of these heuristics means that people are uniquely sensitive to accounts of the luckiest members of a profession (especially if this is the social image the profession projects) and unable to correctly predict their own chances of reaching that desired outcome (because they can imagine how they will successfully persevere and make everything come out well).

Right now, you’re probably laughing to yourself, convinced that you would never make a mistake like this. Well, let’s try an example.

Imagine a scenario in which only ten percent of current Ph.D. students will get tenure (basically true). Now, Ph.D. students are quite bright and are incredibly aware of their long odds. Let’s say that if a student three years into a program makes a guess as to whether or not they’ll get a tenure track job offer, they’re correct 80% of the time. If a student tells you they think they’ll get a tenure track job offer, how likely do you think it is that they will? Stop reading right now and make a guess.

Seriously, make a guess.

This won’t work if you don’t try.

Okay, you can keep reading.

It is not 80%. It’s not even 50%. It’s 31%. This is probably best illustrated visually.

[Image: a probability visualization built with Craft Design Online, which has inadvertently created a great probability visualization tool.]

There are four things that can happen here (I’m going to conflate tenure track job offers with tenure out of a desire to stop typing “tenure track job offers”).

  • Ten students will get tenure. Of these ten, eight (0.8 x 10) will correctly believe they will get it (1/green) and two (10 – 0.8 x 10) will incorrectly believe they won’t (2/yellow).
  • Ninety students won’t get tenure. Of these 90, 18 (90 – 0.8 x 90) will incorrectly believe they will get tenure (3/orange) and 72 (0.8 x 90) will correctly believe they won’t get tenure (4/red).

Twenty-six students – those coloured green (1) and orange (3) – believe they’ll get tenure. But we know that only eight of them really will, which works out to just below the 31% I gave above.
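If you’d rather check the arithmetic than squint at a picture, the whole calculation fits in a few lines of Python (using only the numbers from the example above):

```python
students = 100
base_rate = 0.10   # fraction of Ph.D. students who actually get tenure
accuracy = 0.80    # fraction whose self-prediction turns out to be correct

gets_tenure = base_rate * students            # 10 students
no_tenure = students - gets_tenure            # 90 students

true_positives = accuracy * gets_tenure       # 8: correctly predict tenure (green)
false_positives = (1 - accuracy) * no_tenure  # 18: incorrectly predict tenure (orange)

p = true_positives / (true_positives + false_positives)
print(f"P(tenure | believes they'll get it) = {p:.1%}")  # 30.8%
```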

Almost no one can do this kind of reasoning, especially if they aren’t primed for a trick. The stories we build in our heads about the future feel so solid that we ignore the base rate. We think that we’ll know if we’re going to make it. And even worse, we think that a feeling of “knowing” if we’ll make it provides good information. We think that relatively accurate predictors provide useful information even in the face of a small base rate. They clearly don’t. When the base rate is small (here 10%), the base rate is the single greatest predictor of your chances.

But this situation doesn’t even require small chances for us to make mistakes. Imagine you had two choices: a career that leaves you feeling fulfilled 100% of the time, but is so competitive that you only have an 80% chance of getting into it (assume in the other 20%, you either starve or work a soul-crushing fast food job with negative fulfillment) or a career where you are 100% likely to get a job, but will only find it fulfilling 80% of the time.

Unless that last 20% of fulfillment is strongly super-linear [3][4], or you place no value at all on eating/avoiding McDrudgery, it is better to take the guaranteed career. But many people looking at this probably rounded 80% up to 100% – another known flaw in human reasoning. You can very easily have a job lottery even when the majority of people in a career are in the “better” tier of the job, because many entrants to the field will treat “majority” as “all” and stick with it when they end up shafted.
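For the curious, here’s that comparison worked through in Python – a sketch, where the -20 score for the starve/fast-food outcome and the cubic utility function are arbitrary stand-ins I picked for illustration:

```python
def expected_fulfillment(p_job, fulfillment, fallback=0.0):
    """Probability-weighted fulfillment, assuming a linear utility of fulfillment."""
    return p_job * fulfillment + (1 - p_job) * fallback

risky = expected_fulfillment(0.80, 100, fallback=-20)  # competitive career
safe = expected_fulfillment(1.00, 80)                  # guaranteed career
print(risky, safe)  # 76.0 80.0 -> the sure thing wins under linear utility

# Only a strongly super-linear utility - one that values the last 20% of
# fulfillment far more than the first 80% - can flip the ranking:
cubic = lambda f: max(f, 0) ** 3
risky_cubic = 0.80 * cubic(100) + 0.20 * cubic(-20)
safe_cubic = 1.00 * cubic(80)
print(risky_cubic > safe_cubic)  # True: 800,000 vs 512,000
```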

Now, you might believe that these problems aren’t very serious, or that surely people making a decision as big as a college major or career would correct for them. But these fallacies date to the 70s! Many people still haven’t heard of them. And the studies that first identified them found them to be pretty much universal. Look, the CIA couldn’t even get people to do probability right. You think the average job seeker can? You think you can? Make a bunch of predictions for the next year and then talk with me when you know how calibrated (or uncalibrated) you are.
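If you do make those predictions, the bookkeeping for checking your calibration is simple. Here’s a minimal sketch in Python, with hypothetical prediction records standing in for your own: bucket predictions by stated confidence, then compare each bucket’s claimed probability to the fraction that actually came true.

```python
from collections import defaultdict

# (stated probability, whether it actually happened) - hypothetical records
predictions = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.7, True), (0.7, False), (0.7, True), (0.7, False),
    (0.5, True), (0.5, False), (0.5, False),
]

buckets = defaultdict(list)
for stated, happened in predictions:
    buckets[stated].append(happened)

# Well calibrated means each bucket's observed frequency tracks its stated probability.
for stated in sorted(buckets):
    outcomes = buckets[stated]
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%} -> observed {observed:.0%} (n={len(outcomes)})")
```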

If we could believe that people would become better at probabilities, we could assume that job lotteries would take care of themselves. But I think it is clear that we cannot rely on that, so we must try to dismantle them directly. Unfortunately, there’s a reason many jobs are this way: current workers have stacked the deck in their own favour. This is really great for them, but really bad for the next group of people entering the workforce. I can’t help but believe that some of the instability faced by millennials is a consequence of past generations entrenching their benefits at our expense [5]. Other job lotteries have come about because of poorly planned policies, bad enrolment caps, etc.

This suggests the two ways we can deal with a job lottery: we can limit supply indirectly (by making the job – or the perception of the job once you’ve “made it” – worse), or limit supply directly (by changing the credentials necessary for the job, or implementing other training caps). In many of the examples of job lotteries I’ve found, limiting the supply directly might be a very effective way to deal with the problem.

I can make this claim because limiting supply directly has worked in the real world. Faced with a chronic 33% oversupply of teachers and soaring unemployment rates among teaching graduates, Ontario chose to cut in half the number of slots in teacher’s college and double the length of teacher’s college programs. No doubt this was annoying for the colleges, which made good money off of those largely doomed extraneous pupils, but it did end the oversupply and tighten the job market for teachers, and it was probably better for the economy than the counterfactual.

Why? Because having people who’ve completed four years of university do an extra year or two of schooling only to wait around and hope for a job is a real drag. They could be doing something productive with that time! The advantage of increasing the gatekeeping around a job lottery – and increasing it as early as possible – is that you force people to go find something productive to do. It is much better for an economy to have hopeful proto-teachers who would otherwise become professional resume submitters go into insurance, or real estate, or tutoring, or anything at all productive and commensurate with their education and skills.

There’s a cost here, of course. When you’re gatekeeping (e.g. for teacher’s college or medical school), you’re going to be working with lossy proxies for the thing you actually care about, which is performance in the eventual job. The lossier the proxy, the more you needlessly depress the quality of the people who are allowed to do the job – which is a serious concern when you’re dealing with heart surgery, or with the people providing foundational education to your next generation.

You can also find some cases where increasing selectiveness at an early stage doesn’t successfully force failed applicants to stop wasting their time and get on with their lives. I was very briefly enrolled in a Ph.D. program in biomedical engineering a few years back. Several professors I interviewed with while considering graduate school wanted to make sure I had no designs on medical school – because they were tired of their graduate students abandoning research as soon as their Ph.D. was complete. For these students, who didn’t make it into medical school after undergrad, a Ph.D. was a ticket to another shot at getting in [6]. Anecdotally, I’ve seen people who fail to get into medical school or optometry get a master’s degree, then try again.

Banning extra education before medical school cuts against the idea that people should be able to better themselves, or persevere to get to their dreams. It would be institutionally difficult. But I think that it would, in this case, probably be a net good.

There are other fields where limiting supply is rather harmful. Graduate students are very necessary for science. If we punitively limited their number, we might find a lot of valuable scientific progress grinding to a standstill. We could try to replace graduate students with a class of professional scientific assistants, but as long as the lottery for professorship is so appealing (for those who are successful), I bet we’d see a strong preference for Ph.D. programs over professional assistantships.

These costs sometimes make it worth it to go right to the source of the job lottery: the salaries and benefits of people already employed [7]. Of course, this has its own downsides. In the case of doctors, high salaries and benefits are useful for making really clever applicants choose to go into medicine rather than engineering or law. For other jobs, there are the problems of practicality and fairness.

First, it is very hard to get people to agree to wage or benefit cuts and it almost always results in lower morale – even if you have “sound macro-economic reasons” for it. In addition, many jobs with lotteries have them because of union action, not government action. There is no czar here to change everything. Second, people who got into those careers made those decisions based on the information they had at the time. It feels weird to say “we want people to behave more rationally in the job market, so by fiat we will change the salaries and benefits of people already there.” The economy sometimes accomplishes that on its own, but I do think that one of the roles of political economics is to decrease the capriciousness of the world, not increase it.

We can of course change the salaries and benefits only for new employees. But this somewhat confuses the signalling (for a long time, people’s principal examples of the profession will still come from the earlier cohort). It also rarely alleviates a job lottery, because in practice these deals give new employees reduced salaries and benefits only for a time. Once they get seniority, they’ll expect to enjoy all the perks of seniority.

Adjunct professorships feel like a failed attempt to remove the job lottery for full professorships. Unfortunately, they’ve only worsened it, by giving people a toe-hold that makes them feel like they might someday claw their way up to a full professorship. I feel that when it comes to professors, the only tenable thing to do is greatly reduce salaries (making them closer to the salary progression of mechanical engineers, rather than doctors), hire far more professors, cap graduate students wherever there is high under- and unemployment, and have more professional assistants who do short two-year college courses. Of course, this is easy to say and much harder to do.

If these problems feel intractable and all the solutions feel like they have significant downsides, welcome to the pernicious world of job lotteries. When I thought of writing about them, coming up with solutions felt like by far the hardest part. There’s a complicated trade-off between proportionality, fairness, and freedom here.

Old-fashioned economic theory held that the freer people were, the better off they would be. I think modern economists increasingly believe this is false. Is a world in which people are free to get very expensive training – despite very long odds of a job, and despite cognitive biases that keep them from understanding just how punishing those odds are; training, in short, that they’d in expectation be better off without – a better one than a world where they can’t?

I increasingly believe that it isn’t. And I increasingly believe that having rough encounters with reality early on, and having smooth salary gradients, is important to avoid that world. Of course, this is easy for me to say. I’ve been very deliberate about taking my skin out of job lotteries. I dropped out of graduate school. I write often and would like to someday make money off of writing, but I viscerally understand the odds of that happening, so I’ve been very careful to have a day job that I’m happy with [8].

If you’re someone who has made the opposite trade, I’m very interested in hearing from you. What experiences do you have that I’m missing that allowed you to make that leap of faith?

Footnotes:

[1] I should mention that there’s a difference between economic value, normative/moral value, and social value, and I am only talking about economic value here. I wouldn’t be writing a blog post if I didn’t think writing was important. I wouldn’t be learning French if I didn’t think learning other languages was a worthwhile endeavour. And I love libraries.

And yes, I know there are many career opportunities for people holding those degrees and no I don’t think they’re useless. I simply think a long-term shift in labour market trends have made them relatively less attractive to people who view a degree as a path to prosperity. ^

[2] That’s not to knock these jobs. I found my time building internal tools for an insurance company to be actually quite enjoyable. But it isn’t the fame and fortune that some bright-eyed kids go into computer science seeking. ^

[3] That is to say, that you enjoy each additional percentage of fulfillment at a multiple (greater than one) of the previous one. ^

[4] This almost certainly isn’t true, given that the marginal happiness curve for basically everything is logarithmic (it’s certainly true for money and I would be very surprised if it wasn’t true for everything else); people may enjoy a 20% fulfilling career twice as much as a 10% fulfilling career, but they’ll probably enjoy a 90% fulfilling career very slightly more than an 80% fulfilling career. ^

[5] It’s obvious that all of this applies especially to unions, which typically fight for seniority to matter quite a bit when it comes to job security and pay and do whatever they can to bid up wages, even if that hurts hiring. This is why young Canadians end up supporting unions in theory but avoiding them in practice. ^

[6] I really hope that this doesn’t catch on. If an increasing number of applicants to medical school already have graduate degrees, it will be increasingly hard for those with “merely” an undergraduate degree to get into medical school. Suddenly we’ll be requiring students to do 11 years of potentially useless training, just so that they can start the multi-year training to be a doctor. This sort of arms race is the epitome of wasted time.

In many European countries, you can enter medical school right out of high school, and this seems like the obviously correct thing to do vis-à-vis minimizing wasted time. ^

[7] The behaviour of Uber drivers shows job lotteries on a small scale. As Uber drivers’ pay rises, more people join and all drivers spend more time waiting around, doing nothing. In the long run (here meaning eight weeks), an increase in per-trip fares leads to no change whatsoever in take-home pay.

The taxi medallion system that Uber has largely supplanted prevented this. It moved the job lottery one step further back, with getting the medallion becoming the primary hurdle, forcing those who couldn’t get one to go work elsewhere, but allowing taxi drivers to largely avoid dead times.

Uber could restrict supply, but it doesn’t want to and its customers certainly don’t want it to. Uber’s chronic driver oversupply (relative to a counterfactual where drivers waited around very little) is what allows it to react quickly during peak hours and ensure there’s always an Uber relatively close to where anyone would want to be picked up. ^

[8] I do think that I would currently be a much better writer if I’d instead tried to transition immediately to writing, rather than finding a career and writing on the side. Having a substantial safety net removes almost all of the urgency that I’d imagine I’d have if I was trying to live on (my non-existent) writing income.

There’s a flip side here too. I’ve spent all of zero minutes trying to monetize this blog or worrying about SEO, because I’m not interested in that and I have no need to. I also spend zero time fretting over popularizing anything I write (again, I don’t enjoy this). Having a safety net makes this something I do largely for myself, which makes it entirely fun. ^

Advice, Model

Improvement Without Superstition

[7 minute read]

When you make continuous, incremental improvements to something, one of two things can happen. You can improve it a lot, or you can fall into superstition. I’m not talking about black cats or broken mirrors, but rather humans becoming addicted to whichever steps were last seen to work, instead of whichever steps produce their goal.

I’ve seen superstition develop first-hand. It happened in one of the places you might least expect it – a biochemistry lab. In the summer of 2015, I found myself trying to understand which mutants of a certain protein were more stable than the wildtype. Because science is perpetually underfunded, the computer that drove the equipment we were using was ancient and frequently crashed. Each crash wiped out an hour or two of painstaking, hurried labour and meant we had less time to use the instrument to collect actual data. We really wanted to avoid crashes! Therefore, over the course of that summer, we came up with about 12 different things to do (in sequence) before each experiment to prevent crashes.

We were sure that 10 out of the 12 things were probably useless; we just didn’t know which ten. There may have been no good reason that opening the instrument, closing it, then opening it again to load our sample would prevent computer crashes, but as far as we could tell, when we did that, the machine crashed far less. It was the same for the other eleven. More self-aware than I, the graduate student I worked with joked to me: “this is how superstitions get started”, and I laughed along. Until I read two articles in The New Yorker.

In The Score (How Childbirth Went Industrial), Dr. Atul Gawande talks about the influence of the Apgar score on childbirth. Through a process of continuous competition and optimization, doctors have found out ways to increase the Apgar scores of infants in their first five minutes of life – and how to deal with difficult births in ways that maximize their Apgar scores. The result of this has been a shocking (six-fold) decrease in infant mortality. And all of this is despite the fact that according to Gawande, “[in] a ranking of medical specialties according to their use of hard evidence from randomized clinical trials, obstetrics came in last. Obstetricians did few randomized trials, and when they did they ignored the results.”

Similarly, in The Bell Curve (What happens when patients find out how good their doctors really are), Gawande found that the differences between the best CF (cystic fibrosis) treatment centres and the rest turned out to hinge on how rigorously each centre followed the guidelines established by big clinical trials. That is to say, those that followed the accepted standard of care to the letter had much lower survival rates than those that hared off after any potentially lifesaving idea.

It seems that obstetricians and CF specialists were able to get incredible results without too much in the way of superstitions. Even things that look at first glance to be minor superstitions often turned out not to be. For example, when Gawande looked deeper into a series of studies that showed forceps were as good as or better than Caesarian sections, he was told by an experienced obstetrician (who was himself quite skilled with forceps) that these trials probably benefitted from serious selection effects (in general, only doctors particularly confident in their forceps skills volunteer for studies of them). If forceps were used on the same industrial scale as Caesarian sections, that doctor suspected that they’d end up worse.

But I don’t want to give the impression that there’s something about medicine as a field that allows doctors to make these sorts of improvements without superstition. In The Emperor of all Maladies, Dr. Siddhartha Mukherjee spends some time talking about the now discontinued practices of “super-radical” mastectomy and “radical” chemotherapy. In both treatments, doctors believed that if some amount of a treatment was good, more must be better. And for a while, it seemed better. Cancer survival rates improved after these procedures were introduced.

But randomized controlled trials showed that there was no benefit to those invasive, destructive procedures beyond that offered by their less-radical equivalents. Despite this evidence, surgeons and oncologists clung to these treatments with an almost religious zeal, long after they should have given up and abandoned them. Perhaps they couldn’t bear to believe that they had needlessly poisoned or maimed their patients. Or perhaps the superstition was so strong that they felt they were courting doom by doing anything else.

The simplest way to avoid superstition is to wait for large-scale trials. But from both Gawande articles, I get a sense that matches anecdotal evidence from my own life and that of my friends. It’s the sense that if you want to do something, anything, important – if you want to increase your productivity, or manage your depression/anxiety, or keep CF patients alive – you’re likely to do much better if you take the large-scale empirical results and use them as a springboard (or ignore them entirely if they don’t seem to work for you).

For people interested in nootropics, melatonin, or vitamins, there are self-blinded trials, which provide many of the benefits of larger trials without the wait. But for other interventions, it’s very hard to effectively blind yourself. If you want to see if meditation improves your focus, for example, then you can’t really hide from yourself the fact that you meditated on certain days [1].

When I think about how far from the established evidence I’ve gone to increase my productivity, I worry about the chance I could become superstitious.

For example, trigger-action plans (TAPs) have a lot of evidence behind them. They’re also entirely useless to me (I think because I lack a visual imagination with which to prepare a trigger) and I haven’t tried to make one in years. The Pomodoro method is widely used to increase productivity, but I find I work much better when I cut out the breaks entirely – or work through them and later take an equivalent amount of time off whenever I please. I use pomos only as a convenient, easy-to-Beemind measure of how long I worked on something.

I know modest epistemologies are supposed to be out of favour now, but I think it can be useful to pause, reflect, and wonder: when is one like the doctors saving CF patients and when is one like the doctors doing super-radical mastectomies? I’ve written at length about the productivity regime I’ve developed. How much of it is chaff?

It is undeniable that I am better at things. I’ve rigorously tracked the outputs on Beeminder and the graphs don’t lie. Last year I averaged 20,000 words per month. This year, it’s 30,000. When I started my blog more than a year ago, I thought I’d be happy if I could publish something once per month. This year, I’ve published 1.1 times per week.

But people get better over time. The uselessness of super-radical mastectomies was masked by other cancer treatments getting better. Survival rates went up, but when the accounting was finished, none of that was to the credit of those surgeries.

And it’s not just uselessness that I’m worried about, but also harm; it’s possible that my habits have constrained my natural development, rather than promoting it. This has happened in the past, when poorly chosen metrics made me fall victim to Campbell’s Law.

From the perspective of avoiding superstition: even if you believe that medicine cannot wait for placebo controlled trials to try new, potentially life-saving treatments, surely you must admit that placebo controlled trials are good for determining which things aren’t worth it (take as an example the very common knee surgery, arthroscopic partial meniscectomy, which has repeatedly performed no better than sham surgery when subjected to controlled trials).

Scott Alexander recently wrote about an exciting new antidepressant failing in Stage I trials. When the drug was first announced, a few brave souls managed to synthesize some. When they tried it, they reported amazing results, results that we now know to have been placebo. Look. You aren’t getting an experimental drug synthesized and trying it unless you’re pretty familiar with nootropics. Is the state of self-experimentation really that poor among the nootropics community? Or is it really hard to figure out if something works on you or not [2]?

Still, reflection isn’t the same thing as abandoning the inside view entirely. I’ve been thinking up heuristics since I read Dr. Gawande’s articles; armed with these, I expect to have a reasonable shot at knowing when I’m at risk of becoming superstitious. They are:

  • If you genuinely care only about the outcome, not the techniques you use to attain it, you’re less likely to mislead yourself (beware the person with a favourite technique or a vested interest!).
  • If the thing you’re trying to improve doesn’t tend to get better on its own and you’re only trying one potentially successful intervention at a time, fewer of your interventions will turn out to be superstitions and you’ll need to prune less often (much can be masked by a steady rate of change!).
  • If you regularly abandon sunk costs (“You abandon a sunk cost. You didn’t want to. It’s crying.”), superstitions do less damage, so you can afford to spend less mental effort on avoiding them.

Finally, it might be that you don’t care that some effects are placebo, so long as you get them and get them repeatedly. That’s what happened with the experiment I worked on that summer. We knew we were superstitious, but we didn’t care. We just needed enough data to publish. And eventually, we got it.

[Special thanks go to Tessa Alexanian, who provided incisive comments on an earlier draft. Without them, this would be very much an incoherent mess. This was cross-posted on Less Wrong 2.0 and as of the time of posting it here, there’s at least one comment over there.]

Footnotes:

[1] Even so, there are things you can do here to get useful information. For example, you could get in the habit of collecting information on yourself for a month or so (like happiness, focus, etc.), then try several combinations of interventions you think might work (e.g. A, B, C, AB, BC, CA, ABC, then back to baseline) for a few weeks each. Assuming that at least one of the interventions doesn’t work, you’ll have a placebo to compare against. Although be sure to correct any results for multiple comparisons. ^
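As a sketch of what the analysis of such a self-experiment might look like – the focus scores below are simulated, and Welch’s t-test with a Bonferroni correction is just one reasonable analysis choice, not the only defensible one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical daily focus scores (0-10): a month of baseline, then a few
# weeks under each intervention combination.
baseline = rng.normal(5.0, 1.5, 30)
conditions = {
    "A": rng.normal(5.1, 1.5, 21),   # simulated: no real effect
    "B": rng.normal(6.2, 1.5, 21),   # simulated: real effect
    "AB": rng.normal(6.3, 1.5, 21),
}

alpha = 0.05 / len(conditions)  # Bonferroni correction for multiple comparisons
for name, scores in conditions.items():
    t, p = stats.ttest_ind(scores, baseline, equal_var=False)  # Welch's t-test
    verdict = "keep" if (p < alpha and t > 0) else "can't distinguish from placebo"
    print(f"{name}: t = {t:+.2f}, p = {p:.3f} -> {verdict}")
```

The usual caveat applies: day-to-day scores are autocorrelated and the effect sizes here are invented, so treat this as bookkeeping for the design above rather than a substitute for real trials.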

[2] That people still buy anything from HVMN (after they rebranded themselves in what might have been an attempt to avoid a study showing their product did no better than coffee) actually makes me suspect the latter explanation is true, but still. ^

Model, Politics

Four Narratives on Mohammed Bin Salman

[10 minute read]

Since June 21st of this year, Mohammed bin Salman (often known by his initials, MBS) has been the crown prince of Saudi Arabia. This required what was assuredly not a palace coup, because changes of government or succession are never coups, merely “similar to coups”, “coup-like”, “coup-esque”, or “coupLite™” [1]. As crown prince, MBS has championed a loosening of religious restrictions on women and entertainment, a decrease in reliance on oil for state revenues, and a harder line with Qatar and Iran.

Media coverage has been, uh, split. Here’s an editorial in The Washington Post comparing MBS to Putin, while an editorial in The New York Times fawningly declares “Saudi Arabia’s Arab Spring, at Last” [2]. Given that there’s so much difference in opinion on MBS, I thought it might be useful to collect and summarize some of the common narratives, before giving my own perspective on the man.

MBS as the Enlightened Despot

Historical Archetype: Frederick the Great.
Proponents: Al Arabiya [3], optimistic western journalists.
Don’t talk to them about: The war in Yemen, the blockade of Qatar, the increased stifling of dissent.

Exemplified by the fawning column above, this school of thought holds that MBS is a dynamic young leader who will reform the Saudi economy, end its dependence on oil, overhaul its institutions, end corruption, and “restore” a more moderate form of Islam.

They point to several initiatives that back this up. There’s the Vision 2030 plan that aims to spur entrepreneurship and reduce corruption. There are much-needed educational reforms. There’s the decision to allow women to drive and attend sports games. There’s the lifting of bans on entertainment. For some of them, the ambiguous clamp-down on “corruption” is even further evidence that MBS is very serious about his reforms.

To supporters, MBS has achieved much in very little time, which they take to be clear evidence of a strong work ethic and a keen intelligence. His current crop of reforms gives them clear hope that clerical power can be shattered and Saudi Arabia can one day become a functioning, modern democracy.

MBS as a character in Game of Thrones

Historical Archetype: Richard Nixon.
Proponents: Cynical western journalists, Al Jazeera.
Don’t talk to them about: How real-life politics is never actually as interesting or well planned as Game of Thrones.

Cersei Lannister’s quotable warning that “when you play a game of thrones you win or you die” might imply that MBS is on somewhat shaky ground. Proponents of the first view might dispute that; proponents of the next rejoice in it. Proponents of this view point out that, so far, MBS seems to be winning.

By isolating Qatar and launching a war in Yemen, he has checked Iranian influence on the Arabian Peninsula. Whether or not it’s valid, his corruption crackdown has sidelined many potential sources of competition (and will probably net much-needed liquid cash for the state coffers; it is ironic that the Saudi state now turns to sources of liquidity other than the literal liquid that made it so rich). His conflict with Qatar might yet result in the shutdown of Al Jazeera, the most popular TV channel in the Arabic-speaking world and long a thorn in the side of Saudi Arabian autocracy.

People who view the conflict through this lens either aren’t particularly concerned with right or wrong (e.g. westerners who just want to get their realpolitik fix) or think that the very fact that MBS might be engaging in HBO worthy realpolitik proves he is guilty of a grave crime (e.g. Al Jazeera, westerners worrying that the region might become even more unstable).

MBS as an overreaching tyrant

Historical Archetype: Joseph II (epitaph: “Here lies Joseph II, who failed in all he undertook.”)
Proponents: Arab spring activists and their allies.
Don’t talk to them about: How much better MBS is than any plausible alternative.

Saudi Arabia is a rentier state with an unusual relationship with its population. Saudi state revenues are not derived from taxation (which almost invariably results in calls for responsible government), but instead from oil money. This money is distributed back to citizens via cushy government jobs. In Saudi Arabia, two-thirds of citizen employment is in the public sector. The private sector is almost wholly the purview of expats, who (if I’m reading the latest official Saudi employment report right) hold 75% of the non-governmental jobs [4].

With oil set to become obsolete in the next fifty years, Saudi Arabia is in a very bad position. The only thing that can save it is a diversified economy, but the path there isn’t smooth. Overarching reform of an economy is difficult and normally relies on extensive, society-wide consultation. Proponents of this theory see MBS as intent on centralizing power so that he can achieve this transformation single-handedly.

They note that the reversal of the ban on women driving has been paired with intense pressure on the very activists who originally agitated for its removal, pressure to say nothing and to avoid celebrations. They also note that the anti-corruption sweep conveniently removes many people who could have stood in MBS’s way as he embarks on his reforms and expropriates their wealth for the state [5]. They note that independent economists and other civil society figures – just the sort of people who could have provided (and did provide) nuanced feedback on Vision 2030 – have found themselves suddenly detained on MBS’s orders.

Proponents of this theory believe that MBS is trying to modernize Saudi Arabia, but that he is doomed to fail without building a (possibly democratic) consensus around the direction of the kingdom. They believe that Saudi Arabia cannot have the civil society necessary for reform until the government stops viewing rights as something it grants citizens (and for which they must be grateful) and starts treating them as an inherent human birthright.

If you believe this, you’ll most likely see MBS as moving the kingdom further from this ideal. And you might see the invasion and ongoing war in Yemen as the sort of cluster-fuck we can expect from MBS’s too-rapid attempts to accumulate and use power.

My View

I would first like to note that one advantage of caricaturing other views and then providing a synthesis is that you get to appear reasonable and nuanced by comparison. I’m going to claim that as my reward for putting in the work to write this post, but please do remember that other people have nuanced views too. I got where I am by reading or listening to them!

My overarching concern with respect to Saudi Arabia is checking the spread of Wahhabi fundamentalism. Saudi Arabia has been exporting this world-wide, with disastrous effects. Wahhabism may not be the official ideology of the so-called Islamic State (Daesh), but it is inextricably tied to their barbarism. Or rather, their barbarity is inextricably tied to and influenced by Wahhabism. It is incredibly easy to find articles by authors, Muslim or not (many of them academics), noting the connection between Wahhabism and terrorism.

The takfiri impulses of Wahhabism [6] underlie the doctrine so beloved of Daesh. Of course, the vast, vast majority of Wahhabis engage in neither terrorism, nor public executions of (by Canadian standards) innocent people. But insofar as those things do happen in the Sunni world, Wahhabi men are unusually likely to be the perpetrators. It is tempting to go further, to claim that conservatives are wrong – that there is no Islamic terrorism problem, merely a Wahhabi terrorism problem [7] – but this would be false.

(There is terrorism conducted by Shia Muslims and by other Sunni sects and to call terrorism a solely Wahhabi problem makes it sound like there are no peaceful Wahhabis. A much more accurate (and universal, as this is true across almost all religions and populations) single cause would be masculinity, as almost all terrorists are men.)

Still, the fact that so much terrorism can be traced back to a close western ally [8] is disquieting and breeds some amount of distrust of the west in some parts of the Islamic world (remember always that Muslims are the primary victims of Islamic terrorism; few have better reasons to despise it than the terrorists’ co-religionists and most-frequent victims).

Beyond terrorist groups like Daesh, Wahhabism fuels sectarian conflicts, strips rights from women, makes life even more dangerous for queer people in Muslim countries, and leads to the arrest and persecution of atheists. I am in general a staunch liberal and I believe that most religions can coexist peacefully and that many represent paths towards human flourishing. I do not believe this about Wahhabism. It stifles flourishing and breeds misery wherever it lands. It must be stopped.

The fact that Wahhabism at home is a problem for MBS (the Wahhabi clergy is an alternative, non-royal power centre that he can’t directly control) could give me some hope that he might stop supporting Wahhabism. Certainly he has made statements to that effect. But it is very unclear if he has any real interest in ending Saudi Arabia’s $100 billion effort to export Wahhabism abroad. I would be unsurprised if he deals with the domestic problems inherent in displacing the clergy (i.e. they might not want to be displaced without a messy fight) by sending the most recalcitrant and troublesome members abroad, where they won’t mess up his own plans.

There’s the added wrinkle of Iran. MBS clearly hates Iran and Wahhabism considers Iranian Shiites heretical by default. MBS could easily hold onto Wahhabism abroad simply for its usefulness in checking Iranian influence.

Second to this concern is my concern for the human rights of Yemenis. MBS instigated a war that has been marked by the use of cluster munitions and flagrant disregard for civilian casualties, and he was defence minister for much of its duration. The war’s existence and his utter failure to hold his troops to humanitarian standards are a major black mark against him.

Finally, I care about human rights inside Saudi Arabia. It seems clear that, in general, the human rights situation inside the country will improve with MBS in power. There really doesn’t exist a plausible power centre more likely to make the average Saudi freer. That said, MBS has detained activists and presided over death sentences for peaceful protestors.

The average Saudi who does not rock the boat may see her life improve. But the activists who have struggled for human rights will probably not be able to enjoy them themselves.

What this means is that MBS is better than almost all plausible replacements (in the short-term), but he is by no means a good leader, or a morally upstanding individual. In the long term, he might stunt the very civil society that Saudi Arabia needs to become a society that accepts and promotes human flourishing [9]. And if he fails in his quest to modernize Saudi society, we’re much more likely to see unrest, repression, and a far worse regime than we are to see democratic change.

In the long run, we’re all dead. But before that, Saudi Arabia may be in for some very uncomfortable changes.

Footnotes

[1] As near as I can tell, the change was retroactively made all proper with the Allegiance Council, as soon as the fait was truly accompli. Reports that they approved it beforehand seem to come only from sources with a very vested interest in that being true. ^

[2] There’s something deeply disturbing about a major news organization comparing a change in which unelected despot will lead a brutal dictatorship with a movement that earnestly strove for democratic change. ^

[3] A note on news outlets linked to throughout this post: Al Arabiya is owned by Saudi Arabia and therefore tends to view everything Saudi Arabia does in the best possible light. Al Jazeera is owned by Qatar (which is currently being blockaded by Saudi Arabia) and tends to view the kingdom in the worst possible light. The Arab Tyrants Manual Podcast that informed my own views here is produced by Iyad El-Baghdadi, who was arrested for his Arab Spring reporting by the United Arab Emirates (a close ally of Saudi Arabia) and later exiled. This has somewhat soured his already dim view of Arab dictatorships. ^

[4] Foreigners make up about 53% of the total labour force and almost all of them work in the private sector. Saudis holding private jobs make up ~15.5% of the labour force based on these numbers. If we divide 15.5% by (53% + 15.5%), we get roughly 22% of private jobs held by Saudis. I think for purposes of this comparison, Saudi Aramco, the state oil giant, counts as the public sector.
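If you want to double-check that arithmetic, here it is as a quick Python snippet (using only the numbers quoted above):

```python
# Footnote [4]'s arithmetic: what share of private-sector jobs do Saudis hold?
foreign_share = 0.53         # foreigners as a share of the total labour force
saudi_private_share = 0.155  # Saudis in private jobs, as a share of the labour force

# Almost all foreigners work in the private sector, so the private sector is
# approximately these two groups combined.
private_total = foreign_share + saudi_private_share
print(f"{saudi_private_share / private_total:.1%}")  # 22.6%, rounded down to 22% above
```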

Remember also that Saudi Arabia has a truly dismal adult labour force participation rate, a side effect of its deeply misogynistic public policy. ^

[5] Furthermore, they point out that it is basically impossible to tell if a Saudi royal is corrupt or not, because there is no clear boundary between the personal fortune of the Saud dynasty and the state coffers. Clearing up this particular ambiguity seems low on the priority list of a man who just bought a half-billion-dollar yacht.

(If you’re not too lazy to click on a footnote, but are too lazy to click on a link, it was MBS. MBS bought the giant yacht. Spoilers.) ^

[6] I’ve long held the belief that Wahhabism is dangerous. When talking about this with my Muslim friends, I was often hesitant and apologetic. I needn’t have been. Their vehemence in criticism of Wahhabism often outstripped mine. That was because they had all of my reasons to dislike Wahhabism, plus the unique danger takfir presented to them.

Takfir is the idea that Wahhabis (or their ideological descendants) may deem other Muslims to be infidels if they do not follow Wahhabism’s austere commandments. This often leads to the execution or lynching of more moderate Muslims at the hands of takfiris. As you may have guessed, most North American Muslims could be declared takfir by Wahhabis or others of their ilk.

Remember: there are Quranic rules of conduct (oft broken, but still existing) that govern how Daesh may treat Christians or Jews. With those they declare takfir, there are no such niceties. Daesh ecstatically executes Muslims it deems takfir.

Takfir is one of the many reasons that it is easy to find articles by Muslim authors decrying Wahhabism. Many Muslims legitimately fear a form of Islam that would happily deem them heretical and execute them. ^

[7] It is commonly reported that 15 of the 19 September 11 hijackers were Saudi men, brought up on Wahhabism. The link between Wahhabism, takfir, and terrorism is another reason it is common to find non-Wahhabi Muslims opposed to Wahhabism. Here’s a sampling of English language reporting on Daesh from Muslim countries. Indeed, in many sources I’ve read, the word takfiri was exclusively followed by “terrorist” or “terrorists”. ^

[8] It remains baffling and disgusting that politicians like Donald Trump, Theresa May, and Justin Trudeau can claim to oppose terrorism while also maintaining incredibly close relationships with Saudi Arabia, which was described in a leaked diplomatic cable as “the most significant source of funding to Sunni terrorist groups worldwide”. ^

[9] To create a civil society, Saudi Arabia would need to lift restrictions on the press, give activists some official power, and devolve more power to elected municipalities. Civil society is the corona of pressure groups, advisors, and influencers that exist around a government and allow people to build common knowledge about their desires. Civil society helps you understand just how popular or unpopular a government policy is and gives you a lever to pull if you want to influence it.

A functioning civil society protects a government from its own mistakes (by making an outcry possible before any deed is irreversibly done) and helps ensure that the government is responsible to the will of the people.

That MBS is working hard to prevent civil society shows that he has no desire for feedback and believes he knows better than literally everyone else in the country who is not already his sycophant. I see few ways this could end well. ^

Model, Philosophy

When Remoter Effects Matter

In utilitarianism, “remoter effects” are the results of our actions influencing other people (and are hotly debated). I think that remoter effects are often overstated, especially (as Bernard Williams noted in Utilitarianism: For and Against) when they give the conventionally ethical answer. For example, a utilitarian might claim that the correct answer to the hostage dilemma [1] is to kill no one, because killing weakens the sanctity of human life and may lead to more deaths in the future.

When debating remoter effects, I think it’s worthwhile to split them into two categories: positive and negative. Positive remoter effects are when your actions cause others to refrain from some negative action they might otherwise take. Negative remoter effects are when your actions make it more likely that others will engage in a negative action [2].

Of late, I’ve been especially interested in ways that positive and negative remoter effects matter in political disagreements. To what extent will acting in an “honourable” [3] or pro-social way convince one’s opponents to do the same? Conversely, does fighting dirty bring out the same tendency in your opponents?

Some of my favourite bloggers are doubtful of the first proposition:

In “Deontologist Envy”, Ozy writes that we shouldn’t necessarily be nice to our enemies in the hopes that they’ll be nice to us:

In general people rarely have their behavior influenced by their political enemies. Trans people take pains to use the correct pronouns; people who are overly concerned about trans women in bathrooms still misgender them. Anti-racists avoid the use of slurs; a distressing number of people who believe in human biodiversity appear to be incapable of constructing a sentence without one. Social justice people are conscientious about trigger warnings; we are subjected to many tedious articles about how mentally ill people should be in therapy instead of burdening the rest of the world with our existence.

In “The Blues of Self-Regulation”, David Schraub talks about how this specifically applies to Republicans and Democrats:

The problem being that, even when Democrats didn’t change a rule protecting the minority party, Republicans haven’t even blinked before casting them aside the minute they interfered with their partisan agenda.

Both of these points are basically correct. Everything that Ozy says about asshats on the internet is true and David wrote his post in response to Republicans removing the filibuster for Supreme Court nominees.

But I still think that positive remoter effects are important in this context. When they happen (and I will concede that this is rare), it is because you are consistently working against the same political opponents and at least some of those opponents are honourable people. My favourite example here (although it is from war, not politics) is the Christmas Day Truce. This truce was so successful and widespread that high command undertook to move men more often to prevent a recurrence.

In politics, I view positive remoter effects as key to Senator John McCain repeatedly torpedoing the GOP healthcare plans. While Senators Murkowski and Collins framed their disagreements with the law around their constituents, McCain specifically mentioned the secretive, hurried and partisan approach to drafting the legislation. This stood in sharp contrast to Obamacare, which had numerous community consultations, went through committee and took special (and perhaps ridiculous) care to get sixty senators on board.

Imagine that Obamacare had been passed after secret drafting and no consultations. Imagine if Democrats had dismantled even more rules in the senate. They may have gotten a few more of their priorities passed or had a stronger version of Obamacare, but right now, they’d be seeing all that rolled back. Instead of evidence of positive remoter effects, we’d be seeing a clear case of negative ones.

When dealing with political enemies, positive remoter effects require a real sacrifice. It’s not enough to refrain from things that you don’t want to do anyway (like all the examples Ozy listed), and it’s certainly not enough to refrain from doing things to third parties. For positive remoter effects to matter at all – for your opponents (even the honourable ones) not to say “well, they did it first and I don’t want to lose” – you need to give up some tools that you could use to advance your interests. Tedious journalists don’t care about you scrupulously using trigger warnings, but they may appreciate not receiving death threats on Twitter.

Had right-wingers refrained from doxxing feminist activists (or even applied any social consequences at all against those who did so), all principled people on the left would be refusing to engage in doxxing against them. As it stands, that isn’t the case and those few leftists who ask their fellow travelers to refrain are met with the entirely truthful response: “but they started it!”

This highlights what might be an additional requirement for positive remoter effects in the political sphere: you need a clearly delimited coalition from which you can eject misbehaving members. Political parties are set up admirably for this. They regularly kick out members who fail to act as decorously as their office demands. Social movements have a much harder time, with predictable consequences – it’s far too easy for the most reprehensible members of any group to quickly become the representatives, at least as far as tactics are concerned.

Still, with positive remoter effects, you are not aiming at a movement or party broadly. Instead you are seeking to find those honourable few in it and inspire them on a different path. When it works (as it did with McCain), it can work wonders. But it isn’t something to lay all your hopes on. Some days, your enemies wake up and don’t screw you over. Other days, you have to fight.

Negative remoter effects seem so obvious as to require almost no explanation. While it’s hard (but possible) to inspire your opponents to civility with good behaviour, it’s depressingly easy to bring them down to your level with bad behaviour. Acting honourably guarantees little, but acting dishonourably basically guarantees a similar response. Insofar as honour is a useful characteristic, it is useful precisely because it stops this slide towards mutual annihilation.

Notes:

[1] In the hostage dilemma, you are one of ten hostages, captured by rebels. The rebel leader offers you a gun with a single bullet. If you kill one of your fellow hostages, all of the survivors (including you) will be let free. If you refuse, all of the hostages (including you) will be killed. You are guarded such that you cannot use the weapon against your captors. Your only options are to kill another hostage, or to let all of the hostages be killed.

Here, I think remoter effects fail to salvage the conventional answer and the only proper utilitarian response is to kill one of the other hostages. ^

[2] Here I’m using “negative” in a roughly utilitarian sense: negative actions are those that tend to reduce the total utility of the world. When used towards good ends, negative actions consume some of the positive utility that the ends generate. When used towards ill ends, negative actions add even more disutility. This definition is robust against different preferred plans of action (e.g. it works across liberals and conservatives, who might both agree that political violence tends to reduce utility, even if it doesn’t always reduce utility enough to rule it out in the face of certain ends), but isn’t necessarily robust across all terminal values (e.g. if you care only about reducing suffering and I care only for increasing happiness, we may have different opinions on the tendency of reproduction towards good or ill).

Negative actions are roughly equivalent to “defecting”. “Roughly” because it is perhaps more accurate to say that the thing that makes defecting so pernicious is that it involves negative actions of a special class, those that generate extra disutility (possibly even beyond what simple addition would suggest) when both parties engage in them. ^

[3] I used “honourable” in several important places and should probably define it. When discussing actions, I think honourable actions are the opposite of “negative” actions as defined above: actions that tend towards the good, but can be net ill if used for bad ends. When describing “people” as honourable, I’m pointing to people who tend to reinforce norms around cooperation. This is more or less equivalent to being inherently reluctant to use negative actions to advance goals unless provoked.

My favourite example of honour is Salah ad-Din. He sent his own personal physician to tend to King Richard, who was his great enemy, and used his own money to buy back a child kidnapped into slavery. Conveniently for me, Salah ad-Din shows both sides of what it means to be honourable. He personally executed Raynald of Châtillon after Raynald ignored a truce, attacked Muslim caravans, and tortured many of the caravaners to death. To Guy of Lusignan, King of Jerusalem (who was captured in the same battle as Raynald and wrongly feared he was next to die), Salah ad-Din said: “[i]t is not the wont of kings, to kill kings; but that man had transgressed all bounds, and therefore did I treat him thus.” ^

History, Model

Warriors and Soldiers

Epistemic Status: Full of sweeping generalizations because I don’t want to make it 10x longer by properly unpacking all the underlying complexity.

[9 minute read]

In 2006, Dr. Atul Gawande wrote an article in The New Yorker about maternal care entitled “How Childbirth Went Industrial“. It’s an excellent piece from an author who consistently produces excellent pieces. In it, Gawande charts the rise of the C-section, from its origin as a technique so dangerous it was considered tantamount to murder (and consequently banned on living mothers), to its current place as one of the most common surgical procedures carried out in North American hospitals.

The C-section – and epidurals and induced labour – have become so common because obstetrics has become ruthlessly focused on maximizing the Apgar score of newborns. Along the way, the field ditched forceps (possibly better for the mother yet tricky to use or teach), a range of maneuvers for manually freeing trapped babies (likewise difficult), and general anesthetic (genuinely bad for infants, or at least for the Apgar scores of infants).

The C-section has taken the place of much of the specialized knowledge of obstetrics of old, not least because it is easy to teach and easy for even relatively less skilled doctors to get right. When Gawande wrote the article, there was debate about offering women in their 39th week of pregnancy C-sections as an alternative to waiting for labour. Based on the stats, this hasn’t quite come to pass, but C-sections have become slightly more prevalent since the article was written.

I noticed two laments in the piece. First, Gawande wonders at the consequences of such an essential aspect of the human experience being increasingly (and, based on the studies that show forceps are just as good as C-sections, arguably unnecessarily) medicalized. Second, there’s a sense throughout the article that difficult and hard-won knowledge is being lost.

The question facing obstetrics was this: Is medicine a craft or an industry? If medicine is a craft, then you focus on teaching obstetricians to acquire a set of artisanal skills—the Woods corkscrew maneuver for the baby with a shoulder stuck, the Lovset maneuver for the breech baby, the feel of a forceps for a baby whose head is too big. You do research to find new techniques. You accept that things will not always work out in everyone’s hands.

But if medicine is an industry, responsible for the safest possible delivery of millions of babies each year, then the focus shifts. You seek reliability. You begin to wonder whether forty-two thousand obstetricians in the U.S. could really master all these techniques. You notice the steady reports of terrible forceps injuries to babies and mothers, despite the training that clinicians have received. After Apgar, obstetricians decided that they needed a simpler, more predictable way to intervene when a laboring mother ran into trouble. They found it in the Cesarean section.

Medicine would not be the first industry to industrialize. The quasi-mythical King Ludd that gave us the phrase “Luddite” was said to be a weaver, put out of business by the improved mechanical knitting machines. English programs turn out thousands of writers every year, all with an excellent technical command of the English language, but most with none of the emotive power of Gawande. Following the rules is good enough when you’re writing for a corporation that fears to offend, or for technical clarity. But the best writers don’t just know how to follow the rules. They know how and when to break them.

If Gawande were a student of military history, he’d have another metaphor for what is happening to medicine: warriors are being replaced by soldiers.

If you ever find yourself in possession of a spare hour and feel like being lectured breathlessly by a wide-eyed enthusiast, find your local military history buff (you can identify them by their collection of swords or antique guns) and ask them whether there’s any difference between soldiers and warriors.

You can go do this now, or I can fill in, having given this lecture many times myself.

Imagine your favourite (or least favourite) empire from history. You don’t get yourself an empire by collecting bottle caps. To create one, you need some kind of army. To staff your army, you have two options. Warriors, or soldiers.

(Of course, this choice isn’t made just by empires. Their neighbours must necessarily face the same conundrum.)

Warriors are the heroes of movies. They were almost always the product of training that started at a young age and more often than not were members of a special caste. Think medieval European Knights, Japanese Samurai, or the Hashashin fida’i. Warriors were notable for their eponymous mastery of war. A knight was expected to understand strategy and tactics, riding, shooting, fighting (both on foot and mounted), and wrestling. Warriors wanted to live up to their warrior ethos, which normally emphasized certain virtues, like courage and mercy (to other warriors, not to any common peasant drafted to fight them).

Soldiers were whichever conscripts or volunteers someone could get into a reasonable standard of military order. They knew only what they needed to complete their duties: perhaps one or two simple weapons, how to march in formation, how to cook, and how to repair some of their equipment [1]. Soldiers just wanted to make it through the next battle alive. In service to this, they were often brutally efficient in everything they did. Fighting wasn’t an art to them – it was simple butchery and the simpler and quicker the better. Classic examples of soldiers are the Roman Legionaries, Greek Hoplites, and Napoleon’s Grande Armée.

The techniques that soldiers learned were simple because they needed to be easy to teach to ignorant peasants on a mass scale in a short time. Warriors had their whole childhood for elaborate training.

(Or at least, that’s the standard line. In practice, things were never quite as clear cut as that – veteran soldiers might have been as skilled as any warrior, for example. The general point remains though; one on one, you would always have bet on a warrior over a soldier.)

But when you talk about armies, a funny thing happens. Soldiers dominated [2]. Individually, they might have been kind of crap at what they did. Taken as a whole though, they were well-coordinated. They looked out for each other. They fought as a team. They didn’t foolishly break ranks, or charge headlong into the enemy. When Germanic warriors came up against Roman soldiers, they were efficiently butchered. The Germans went into battle looking for honour and perhaps a glorious death. The Romans happily gave them the latter and so lived (mostly) to collect their pensions. Whichever empire you thought about above almost certainly employed soldiers, not warriors.

It turns out that discipline and common purpose have counted for rather a lot more in military history than simple strength of arms. Of this particular point, I can think of no better example than the rebellion that followed the Meiji restoration. The few rebel samurai, wonderfully trained and unholy terrors in single combat, were easily slaughtered by the Imperial conscripts, who knew little more than which end of a musket to point at the enemy.

The very fact that the samurai didn’t embrace the firing line is a point against them. Their warrior code, which esteemed individual skill, left them no room to adopt this devastating new technology. And no one could command them to take it up, because they were mostly prima donnas where their honour was concerned.

I don’t want to be too hard on warriors. They were actually an efficient solution to the problem of national defence if a population was small and largely agrarian, lacked political cohesion or logistical ability, or was otherwise incapable of supporting a large army. Under these circumstances, polities could not afford to keep a large population under arms at all times. This gave them several choices: they could rely on temporary levies, who would be largely untrained; they could field a large professional army that paid for itself largely through raiding; or they could maintain a small, elite cadre of professional warriors.

All of these strategies had disadvantages. Levies tended to have very brittle morale, and calling up a large proportion of a population made even a successfully prosecuted war economically devastating. Raiding tended to make your neighbours really hate you, leading to more conflicts. It could also be very bad for discipline and could backfire on your own population in lean times. Professional warriors would always be dwarfed in numbers by opponents using any other strategy.

Historically, it was never as simple as solely using just one strategy (e.g. European knights were augmented with and eventually supplanted by temporary levies), but there was a clear lean towards one strategy or another in most resource-limited historical polities. It took complex cultural technology and a well-differentiated economy to support a large force of full time soldiers and wherever these pre-conditions were lacking, you just had to make do with what you could get [3].

When conditions suddenly call for a struggle – whether against a foreign adversary, to boost profits, or to cure disease – it is useful to look at how many societal resources are thrown at the fight. When resources are scarce, we should expect to see a few brilliant generalists, or many poorly trained conscripts. When resources are thick on the ground, the amount that can be spent on brilliant people is quickly saturated and the benefits of training your conscripts quickly accrue. From one direction or another, you’ll approach the concept of soldiers.

Doctors as soldiers, not as warriors, is the concept Gawande is brushing up against in his essay. These new doctors will be more standardized, with less room for individual brilliance, but more affordances for working well in teams. The prima donnas will be banished (as they aren’t good team players, even when they’re brilliant). Dr. Gregory House may have been the model doctor in the Victorian Age, or maybe even in the fifties. But I doubt any hospital would want him now. It may be that this standardization is just the thing we need to overcome persistent medical errors, improve outcomes across the board, and make populations healthier. But I can sympathize with the position that it might be causing us to lose something beautiful.

In software development, where I work, a similar trend can be observed. Start-ups aggressively court ambitious generalists, for whom freedom to build things their way is more important than market-rate compensation and is a better incentive than even the lottery that is stock options. At start-ups, you’re likely to see languages that are “fun” to work with, often dynamically typed, even though these languages are often considered less inherently comprehensible than their more “enterprise-friendly” statically typed brethren.

It’s with languages like Java (or its Microsoft clone, C#) and C++ that companies like Google and Amazon build the underlying infrastructure that powers large tracts of the internet. Among the big pure software companies, Facebook is the odd one out for using PHP (and this choice required them to rewrite the code underlying the language from scratch to make it performant enough for their large load).

It’s also at larger companies where teamwork, design documents, and comprehensibility start to be very important (although there’s room for super-stars at all of the big “tech” companies still; it’s only in companies more removed from tech, and therefore outside a lot of the competition for top talent, where being a good team player and writing comprehensible code might top brilliance as a qualifier). This isn’t to say that no one hiring for top talent appreciates things like good documentation or comprehensibility. Merely that it is easy for a culture that esteems individual brilliance to ignore that these things are marks of competence.

Here the logic goes that anyone smart enough for the job will be smart enough to untangle the code of their predecessors. As anyone who’s been involved in the untangling can tell you, there’s a big difference between “smart enough to untangle this mess” and “inclined to wade through this genius’s spaghetti code to get to the part that needs fixing”.

No doubt there exist countless other examples in fields I know nothing about.

The point of gathering all these examples and shoving them into my metaphor is this: I think there exist two important transitions that can occur when a society needs to focus a lot of energy on a problem. The transition from conscripts to soldiers isn’t very interesting, as it’s basically the outcome of a process of continuous improvement.

But the transition from warriors to soldiers is. It’s amazing that we can often get better results by replacing a few highly skilled generalists who apply a lot of hard fought decision making, with a veritable army of less well trained, but highly regimented and organized specialists. It’s a powerful testament to the usefulness of group intelligence. Of course, sometimes (e.g. Google, or the Mongols) you get both, but these are rare happy accidents.

Being able to understand where this transition is occurring helps you understand where we’re putting effort. Understanding when it’s happening within your own sphere of influence can help you weather it.

Also note that this transition doesn’t only go in one direction. As manufacturing becomes less and less prevalent in North America, we may return to the distant past, when manufacturing stuff was only undertaken by very skilled artisans making unique objects.

Footnotes:

[1] Note the past tense throughout much of this essay; when I speak about soldiers and warriors, I’m referring only to times before the 1900s. I know comparatively little about how modern armies are set up. ^

[2] Best of all were the Mongols, who combined the lifelong training of warriors with the discipline and organization of soldiers. When Mongols clashed with European knights in Hungary, their “dishonourable” tactics (feints, followed by feigned retreats and skirmishing) easily took the day. This was all possible through a system of signal flags that allowed Subutai to command the whole battle from a promontory. European leaders were expected to show their bravery by being in the thick of fighting, which gave them no overall control over their lines. ^

[3] Historically, professional armies with good logistical support could somewhat pay for themselves by expanding an empire, which brought in booty and slaves. This is distinct from raiding (which does not seek to incorporate other territories) and has its own disadvantages (rebellion, over-extension, corruption, massive unemployment among unskilled labourers, etc.). ^

Data Science, Literature, Model

Two Ideas Worth Sharing From ‘Weapons of Math Destruction’

Recently, I talked about what I didn’t like in Dr. Cathy O’Neil’s book, Weapons of Math Destruction. This time around, I’d like to mention two parts of it I really liked. I wish Dr. O’Neil had put more effort into naming the concepts she covered; I don’t have names for them from WMD, but in my head, I’ve been calling them Hidden Value Encodings and Axiomatic Judgements.

Hidden Value Encodings

Dr. O’Neil opens the book with a description of the model she uses to cook for her family. After going into a lot of detail about it, she makes this excellent observation:

Here we see that models, despite their reputation for impartiality, reflect goals and ideology. When I removed the possibility of eating Pop-Tarts at every meal, I was imposing my ideology on the meals model. It’s something we do without a second thought. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.

It is far too easy to view models as entirely empirical, as math made form and therefore blind to value judgements. But that couldn’t be further from the truth. It’s value judgements all the way down.

Imagine a model that tries to determine when a credit card transaction is fraudulent. Fraudulent credit card transactions cost the credit card company money, because it must refund the stolen amount to the customer. Incorrectly flagging innocuous transactions as fraudulent also costs the company money, either through customer support time, or because a customer so fed up by constant false positives switches to a different credit card provider.

If you were tasked with building a model to predict which credit card transactions were fraudulent by one of the major credit card companies, you would probably build into your model a variable cost for failing to catch fraudulent transactions (equivalent to the cost the company must bear if the transaction is fraudulent) and a fixed cost for labelling innocuous transactions as fraudulent (equivalent to the average cost of a customer support call plus the average chance of a false positive pushing someone over the edge into switching cards multiplied by the cost of their lost business over the next few years).

From this encoding, we can already see that our model would want to automatically approve all transactions below the fixed cost of dealing with false positives [1], while applying increasing scrutiny to more expensive items, especially expensive items with big resale value or items more expensive than the cardholder normally buys (as both of these point strongly toward fraud).
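Here’s a minimal sketch of that decision rule in Python. Every number (and the idea that a fraud probability arrives from some upstream model) is invented for illustration:

```python
# Hypothetical cost-sensitive flagging rule; all numbers are made up.
SUPPORT_CALL_COST = 15.00     # fixed cost of handling a false positive
CHURN_PROBABILITY = 0.02      # chance a false positive drives the customer away
LOST_BUSINESS_VALUE = 500.00  # value of a customer's business over the next few years

FALSE_POSITIVE_COST = SUPPORT_CALL_COST + CHURN_PROBABILITY * LOST_BUSINESS_VALUE  # $25


def should_flag(amount: float, fraud_probability: float) -> bool:
    """Flag only when the expected loss from fraud outweighs the
    expected cost of inconveniencing an innocent customer."""
    expected_fraud_loss = fraud_probability * amount
    expected_false_positive_cost = (1 - fraud_probability) * FALSE_POSITIVE_COST
    return expected_fraud_loss > expected_false_positive_cost


# Cheap transactions sail through; expensive ones draw scrutiny.
print(should_flag(20.00, fraud_probability=0.05))    # False
print(should_flag(2000.00, fraud_probability=0.05))  # True
```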

This seems innocuous and logical. It is also encoding at least two sets of values. First, it encodes the values associated with capitalism. At the most basic level, this algorithm “believes” that profit is good and losses are bad. It aims to maximize profit for the bank, and while we may hold this as a default assumption for most algorithms associated with companies, that does not mean it is devoid of values; instead it encodes all of the values associated with capitalism [2]. Second, the algorithm encodes some notion that customers have the freedom to choose between alternatives (even more so than is encoded by default in accepting capitalism).

By applying a cost to false positives (and likely it would be a cost that rises with each previous false positive), you are tacitly acknowledging that customers could take their business elsewhere. If customers instead had no freedom to choose who they did business with, you could merely encode as your loss from false positives the fixed cost of fielding support calls. Since outsourced phone support is very cheap, your algorithm would care much less about false positives if there was no consumer choice.

As far as I can tell, there is no “value-free” place to stand. An algorithm in the service of a hospital that helps diagnose patients or focus resources on the most ill encodes the value that “it is better to be healthy than sick; better to be alive than dead”. These values might be (almost-)universal, but they still exist, they are still encoded, and they still deserve to be interrogated when we put functions of our society in the hands of software governed by them.

Axiomatic Judgements

One of the most annoying parts of being a child is the occasional requirement to accept an imposition on your time or preferences with the explanation “because I say so”. “Because I say so” isn’t an argument, it’s a request that you acknowledge adults’ overwhelming physical, earning, and social power as giving them a right to set arbitrary rules for you. Some algorithms, forced onto unwelcoming and less powerful populations (teachers, job-seekers, etc.) have adopted this MO as well. Instead of having to prove that they have beneficial effects or that their outputs are legitimate, they define things such that their outputs are always correct and brook no criticism.

Here’s Dr. O’Neil talking about a value-added teaching model in Washington, D.C.:

When Mathematica’s scoring system tags Sarah Wysocki and 205 other teachers as failures, the district fires them. But how does it ever learn if it was right? It doesn’t. The system itself has determined that they were failures, and that is how they are viewed. Two hundred and six “bad” teachers are gone. That fact alone appears to demonstrate how effective the value-added model is. It is cleansing the district of underperforming teachers. Instead of searching for the truth, the score comes to embody it.

She contrasts this with how Amazon operates: “if Amazon.​com, through a faulty correlation, started recommending lawn care books to teenage girls, the clicks would plummet, and the algorithm would be tweaked until it got it right.” On the other hand, the teacher rating algorithm doesn’t update, doesn’t check if it is firing good teachers, and doesn’t take an accounting of its own costs. It holds it as axiomatic – a basic fact beyond questioning – that its results are the right results.

I am in full agreement with Dr. O’Neil’s criticism here. Not only does it push past the bounds of fairness to make important decisions, like hiring and firing, through opaque formulae that are not explained to those being judged and that lack basic accountability, but it’s also a professional black mark on all of the statisticians involved.

Whenever you train a model, you hold some data back. This is your test data and you will use it to assess how well your model did. That gets you through to “production” – to having your model out in the field. This is an exciting milestone, not only because your model is now making decisions and (hopefully) making them well, but because now you’ll have way more data. You can see how your new fraud detection algorithm does by the volume of payouts and customer support calls. You can see how your new leak detection algorithm does by customers replying to your emails and telling you if you got it right or not.
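For readers who haven’t seen this workflow, here’s a minimal sketch of the holdout idea, using scikit-learn and synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in real life these would be your transactions and labels.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# Hold back 20% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The held-out set gives an honest estimate of performance on unseen data.
print(f"held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```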

A friend of mine who worked in FinTech once told me that they approved 1.5% of everyone who applied for their financial product, no matter what. They’d keep the score their model gave to that person on record, then see how the person fared in reality. If they used the product responsibly despite a low score, or used it recklessly despite a high score, it was viewed as valuable information that helped the team make their model that much better. I can imagine a team of data scientists, heads together around a monitor, looking through features and asking each other “huh, do any of you see what we missed here?” and it’s a pleasant image [3].
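In code, that exploration policy might look something like the sketch below. The function, threshold, and return values are all invented; only the 1.5% figure comes from the anecdote:

```python
import random

def decide(model_score: float, approval_threshold: float = 0.7) -> str:
    """Approve a random 1.5% of applicants regardless of their score,
    keeping the score on record to compare against the real outcome."""
    if random.random() < 0.015:
        return "approve (exploration: revisit this score later)"
    return "approve" if model_score >= approval_threshold else "decline"
```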

Value added teaching models, or psychological pre-screens for hiring do nothing of the sort (even though it would be trivial for them to!). They give results and those results are defined as the ground truth. There’s no room for messy reality to work its way back into the cycle. There’s no room for the creators to learn. The algorithm will be flawed and imperfect, like all products of human hands. That is inevitable. But it will be far less perfect than it could be. Absent feedback, it is doomed to always be flawed, in ways both subtle and gross, and in ways unknown to its creators and victims.

Like most Canadian engineering students, I made a solemn vow:

…in the presence of these my betters and my equals in my calling, [I] bind myself upon my honour and cold iron, that, to the best of my knowledge and power, I will not henceforward suffer or pass, or be privy to the passing of, bad workmanship or faulty material in aught that concerns my works before mankind as an engineer…

Sloppy work, like that value-added teacher model, is the very definition of bad workmanship. Would that I never suffer something like that to leave my hands and take life in the world! It is no Quebec Bridge, but the value-added teaching model and other doomed-to-fail algorithms like it represent a slow-motion accident, steadily stealing jobs and happiness from people with no appeal or remorse.

I can accept stains on the honour of my chosen profession. Those are inevitable. But in a way, stains on our competence are so much worse. Models that take in no feedback are both, but the second really stings me.

Footnotes

[1] This first approximation isn’t correct in practice, because certain patterns of small transactions are consistent with fraud. I found this out the hard way, when a certain Bitcoin exchange’s credit card verification procedure (withdrawing less than a dollar, then refunding it a few days later, after you tell them how much they withdrew) triggered the fraud detection software at my bank. Apparently credit card thieves will often do a similar thing (minus the whole “ask the cardholder how much was withdrawn” step), as a means of checking if the card is good without cluing in the cardholder. ^

[2] I don’t mean this as a criticism of capitalism. I seek merely to point out that (like all other economic systems) capitalism is neither value neutral, nor inevitable. “Capitalism” encodes values like “people are largely rational”, “people often act to maximize their gains” and “choice is fundamentally good and useful”.

If socialist banks had ever made it to the point of deploying algorithms (instead of collapsing under the weight of their flawed economic system), those algorithms would also encode values (like “people will work hard for the good of the whole” and “people are inherently altruistic” and “it is worth it to sacrifice efficiency in the name of fairness”). ^

[3] Dulce et decorum est… get the fucking data science right. ^

Data Science, Literature, Model

Two Fallacies From ‘Weapons of Math Destruction’

Much thanks to Cody Wild for providing editing and feedback. That said, I would like to remind my readers that I deserve full credit for all errors and that all opinions expressed here are only guaranteed to be mine.

[12 minute read]

I recently read Weapons of Math Destruction by Dr. Cathy O’Neil and found it an enormously frustrating book. It’s not that the whole book was rubbish – that would have made things easy. No, the real problem with this book is that the crap and the pearls were so closely mixed that I had to stare at every sentence very, very carefully in hopes of figuring out which was which. There’s some good stuff in here. But much of Dr. O’Neil’s argumentation relies on two new (to me) fallacies. It’s these fallacies (which I’ve dubbed the Ought-Is Fallacy and the Availability Bait-and-Switch) that I want to explore today.

Ought-Is Fallacy

It’s a commonly repeated truism that “correlation doesn’t imply causation”. People who’ve been around the statistics block a bit longer might echo Randall Munroe and retort that “correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there'”. Understanding why a graph like this:

[Figure: a suggestive-looking scatter plot. Original alt text: “In addition to this graph obviously being anchored, using it is obviously fair use.” Image copyright The New York Times, 2017; used here for purposes of commentary and criticism.]

is utter horsecrap [1], despite how suggestive it looks, is the work of a decent education in statistics. Here, correlation doesn’t imply causation. On the other hand, it’s not hard to find excellent examples where correlation really does mean causation:

[Figure: a global warming graph. Original alt text: “This would be a risky graph to use if echo chambers didn’t mean that I know literally no one who doesn’t believe in global warming.” Source: The National Centers for Environmental Administration. Having to spell “centre” wrong and use inferior units is a small price to pay for the fact that the American government immediately releases everything it creates into the public domain.]

When trying to understand the ground truth, it’s important that you don’t confuse correlation with causation. But not every human endeavour is aimed at determining the ground truth. Some endeavours really do just need to understand which activities and results are correlated. Principal among these is insurance.

Let’s say I wanted to sell you “punched in the face” insurance. You’d pay a small premium every month and if you were ever punched in the face hard enough to require dental work, I’d pay you enough to cover it [2]. I’d probably charge you more if you were male, because men are much, much more likely to be seriously injured in an assault than women are.

I’m just interested in pricing my product. It doesn’t actually matter if being a man is causal of more assaults or just correlated with it. It doesn’t matter if men aren’t inherently more likely to assault and be assaulted compared to women (for a biological definition of “inherently”). It doesn’t matter what assault rates would be like in a society without toxic masculinity. One thing and one thing alone matters: on average, I will have to pay out more often for men. Therefore, I charge men more.
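To make that concrete, here’s a toy premium calculation. Every number is invented; the point is only that causation never appears anywhere in it:

```python
# Toy "punched in the face" insurance pricing; all numbers are made up.
AVERAGE_DENTAL_PAYOUT = 3000.00
LOADING = 1.25  # overhead and profit margin

# Observed annual claim rates by group. Whether the difference is biological,
# cultural, or an artifact of toxic masculinity never enters the calculation;
# only the correlation does.
CLAIM_RATE = {"male": 0.004, "female": 0.001}

def monthly_premium(group: str) -> float:
    expected_annual_payout = CLAIM_RATE[group] * AVERAGE_DENTAL_PAYOUT
    return expected_annual_payout * LOADING / 12

print(f"male:   ${monthly_premium('male'):.2f}/month")    # $1.25
print(f"female: ${monthly_premium('female'):.2f}/month")  # $0.31
```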

If you were to claim that, because there may be nothing inherent in maleness that causes assault and being assaulted, men shouldn’t have to pay more, you are making a moral argument, not an empirical one. You are also committing the ought-is fallacy. Just because your beliefs tell you that some aspect of the world should be a certain way, or that it would be more moral for the world to be a certain way, does not mean the world actually is that way or that everyone must agree to order the world as if that were true.

This doesn’t prevent you from making a moral argument that we should ignore certain correlates in certain cases in the interest of fairness; it merely means you should not dress up what is ultimately a question of values as an empirical claim.

The ought-is fallacy came up literally whenever Weapons of Math Destruction talked about insurance, as well as when it talked about sentencing disparities. Here’s one example:

But as the questions continue, delving deeper into the person’s life, it’s easy to imagine how inmates from a privileged background would answer one way and those from tough inner-city streets another. Ask a criminal who grew up in comfortable suburbs about “the first time you were ever involved with the police,” and he might not have a single incident to report other than the one that brought him to prison. Young black males, by contrast, are likely to have been stopped by police dozens of times, even when they’ve done nothing wrong. A 2013 study by the New York Civil Liberties Union found that while black and Latino males between the ages of fourteen and twenty-four made up only 4.7 percent of the city’s population, they accounted for 40.6 percent of the stop-and-frisk checks by police. More than 90 percent of those stopped were innocent. Some of the others might have been drinking underage or carrying a joint. And unlike most rich kids, they got in trouble for it. So if early “involvement” with the police signals recidivism, poor people and racial minorities look far riskier.

Now I happen to agree with Dr. O’Neil that we should not allow race to end up playing a role in prison sentence length. There are plenty of legitimate factors to include in sentencing: seriousness of the crime, remorse, etc. I don’t think race should be one of these criteria, and since the sequence of events that Dr. O’Neil mentions makes this far from the default in the criminal justice system, I think doing more to ensure race stays out of sentencing is an important moral responsibility we have as a society.

But Dr. O’Neil’s empirical criticism of recidivism models is entirely off base. In this specific example, she is claiming that some characteristics that correlate with recidivism should not be used in recidivism models even though they improve the accuracy, because they are not per se causative of crime.

Because of systematic racism and discrimination in policing [3], the recidivism rate among black Americans is higher. If the only thing you care about is maximizing the prison sentence of people who are most likely to re-offend, then your model will tag black people for longer sentences. It does not matter what the “cause” of this is! Your accuracy will still be higher if you take race into account.

To say “black Americans seem to have a higher rate of recidivism, therefore we should punish them more heavily” is almost to commit the opposite fallacy, the is-ought. Instead, we should say “yes, empirically there’s a high rate of recidivism among black Americans, but this is probably caused by social factors and regardless, if we don’t want to create a population of permanently incarcerated people, with all of the vicious cycle of discrimination that this creates, we should aim for racial parity in sentencing”. This is a very strong (and I think persuasive) moral claim [4].

It certainly is more work to make a complicated moral claim that mentions the trade-offs we must make between punishment and fairness (or between what is morally right and what is expedient) than it is to make a claim that makes no reference to these subtleties. When we admit that we are sacrificing accuracy in the name of fairness, we do open up an avenue for people to attack us.

Despite this disadvantage, I think keeping our moral and empirical claims separate is very important. When you make the empirical claim that “being black isn’t causative of higher rates of recidivism, therefore the models are wrong when they rank black Americans as more likely to reoffend”, instead of the corresponding ethical claim, you are making two mistakes. First, there’s lots of room to quibble about what “causative” even means, beyond simple genetic causation. Because you took an empirical and not an ethical position, you may have to fight any future evidence contrary to your empirical position, even if that evidence is true; in essence, you risk becoming an enemy of the truth. If the truth becomes particularly obvious (and contrary to your claims) you risk looking risible, and any gains you achieved will be at risk of reversal.

Second, I would argue that it is ridiculous to claim that universal human rights must rest on claims of genetic identicalness between all groups of people (and trying to make the empirical claim above, rather than a moral claim implicitly embraces this premise). Ashkenazi Jews are (on average) about 15 IQ points ahead of other groups. Should we give them any different moral worth because of this? I would argue no [5]. The only criteria for full moral worth as a human and all universal rights that all humans are entitled to is being human.

As genetic engineering becomes possible, it will be especially problematic to have a norm that moral worth of humans can be modified by their genetic predisposition to pro-social behaviour. Everyone, but most especially the left, which views diversity and flourishing as some of its most important projects should push back against both the is-ought and ought-is fallacies and fight for an expansive definition of universal human rights.

Availability Bait-and-Switch

Imagine someone told you the following story:

The Fair Housing Act has been an absolute disaster for my family! My brother was trying to sublet his apartment to a friend for the summer. Unfortunately, one of the fair housing inspectors caught wind of this and forced him to put up notices that it was for rent. He had to spend a week showing random people around it and some snot-nosed five-year-old broke one of his vases while he was showing that kid’s mother around. I know there were problems before, but is the Fair Housing Act really worth it if it can cause this?

Most people would say the answer to the above is “yes, it really was worth it, oh my God, what is wrong with you?”

But it’s actually hard to believe that answer right after reading the story. You’ve just been given a long, vivid, easily imaginable example of exactly what is wrong with the current regime, and only a quick throwaway reference to there having been problems with the old way of doing things. Some people might say that it’s better to at least mention that the other way of doing things had its problems too. I disagree strenuously.

When you make a throwaway reference to problems with one way of doing things while focusing all of your descriptive effort on the problems of the other, you are committing the Availability Bait-and-Switch. You are also creating a false sense of balance: people will remember that you mentioned both had problems, but that is not the impression they will take away. You will have tricked your readers into thinking you gave a balanced treatment (or at least paved the way for a defence against claims that you didn’t) while doing nothing of the sort!

We are all running corrupted hardware. One of the most notable cognitive biases we have is the availability heuristic. We judge probabilities based on what we can easily recall, not on any empirical basis. If you were asked “are there more words in the average English language book that start with k, or have k as the third letter?”, you’d probably say “start with k!” [6]. In fact, words with “k” as the third letter show up more often. But these words are harder to recall and therefore much less available to your brain.
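You can check this for yourself in a few lines. One caveat: the claim is about words as they occur in books (weighted by how often they are used), so counting a raw dictionary is only a rough proxy, and the word-list path below is an assumption about your system.

```python
# Rough check of the availability example. Assumes a Unix-style word list
# at /usr/share/dict/words; any plain text word list will do.
with open("/usr/share/dict/words") as f:
    words = [w.strip().lower() for w in f if len(w.strip()) >= 3]

first = sum(w[0] == "k" for w in words)
third = sum(w[2] == "k" for w in words)
print(f"words starting with k:     {first}")
print(f"words with k third letter: {third}")
```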

If I were to give you a bunch of very vivid examples of how algorithms can ruin your life (as Dr. O’Neil repeatedly does, most egregiously in chapters 1, 5, and 8) and then mention off-hand that human decision making also used to ruin a lot of people’s lives, you’d probably come out of our talk much more concerned with algorithms than with human decision making. This was a thing I had to deliberately fight against while reading Weapons of Math Destruction.

Because for a book about how algorithms are destroying everything, there was a remarkable paucity of data on this destruction. I cannot recall seeing any comparative analysis (backed up by statistics, not anecdotes) of the costs and benefits of human decision making and algorithmic decision making, as it applied to Dr. O’Neil’s areas of focus. The book was all the costs of one and a vague allusion to the potential costs of the other.

If you want to give your readers an accurate snapshot of the ground truth, your examples must be representative of it. If algorithms cause twice as much damage as human decision making in certain circumstances (and again, I’ve seen zero proof that this is the case), then every two examples of algorithmic destruction should be interleaved with one of human pettiness. As long as you aren’t doing this, you are lying to your readers. If you’re committed to lying, perhaps for reasons of pithiness or flow, then drop the vague allusions to the costs of the other way of doing things. Make it clear you’re writing a hatchet job, instead of trying to claim epistemic virtue points for “telling both sides of the story”. At least that way is honest [7].

Footnotes

[1] This is a classic example of “anchoring”, a phenomenon where you appear to have a strong correlation in a certain direction because of a single extreme point. When you have anchoring, it’s unclear how generalizable your conclusion is – as the whole direction of the fit could be the result of the single extreme point.

Here’s a toy example:

Note that the thing that makes me suspicious of anchoring here is that we have a big hole with no data and no way of knowing what sort of data goes there (it’s not likely we can randomly generate a bunch of new countries and plot their gun ownership and rate of mass shootings). If we did some more readings (ignoring the fact that in this case we can’t) and got something like this:

I would no longer be worried about anchoring. It really isn’t enough just to look at the correlation coefficient either. The image labelled “Also Not Anchored” has a marginally lower correlation coefficient than the anchored image, even though (I would argue) it is FAR more likely to represent a true positive correlation. Note also that we have no way to tell whether more data would give us a graph like the third. We could also get something like this:

In which we have a fairly clear trend of noisy data with an average of 2.5 irrespective of our x-value and a pair of outliers driving a slight positive correlation.
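For anyone who wants to see the mechanism numerically (this is made-up data, not the values behind the plots above), a single extreme point can manufacture a strong correlation out of pure noise:

```python
import numpy as np

rng = np.random.default_rng(42)

# Pure noise: y hovers around 2.5 regardless of x, so r should be near zero.
x = rng.uniform(0, 10, 30)
y = rng.normal(2.5, 0.5, 30)
print(f"r without outlier: {np.corrcoef(x, y)[0, 1]:+.2f}")  # close to 0

# Append a single extreme point far out along both axes and r jumps.
x_anchored = np.append(x, 50)
y_anchored = np.append(y, 12)
print(f"r with one anchor: {np.corrcoef(x_anchored, y_anchored)[0, 1]:+.2f}")  # large
```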

Also, the NYT graph isn’t normalized to population, which is kind of a WTF-level mistake. They include another graph that is normalized later on, but the graph I show is the preview image on Facebook. I was very annoyed with the smug liberals in the comments of the NYT article, crowing about how conservatives are too stupid to understand statistics. But that’s a rant for another day… ^

[2] I’d very quickly go out of business because of the moral hazard and adverse selection built into this product, but that isn’t germane to the example. ^

[3] Or at least, this is my guess as to the most plausible factors in the recidivism rate discrepancy. I think social factors – especially when social gaps are so clear and pervasive – seem much more likely than biological ones. The simplest example of the disparity in policing – and its effects – is the relative rate of being stopped by police during Stop and Frisk, given above by Dr. O’Neil. ^

[4] It’s possible that variations in Monoamine oxidase A or some other gene amongst populations might make some populations more predisposed (in a biological sense) to violence or other antisocial behaviour. Given that violence and antisocial behaviour are relatively uncommon (e.g. about six in every one thousand Canadian adults are incarcerated or under community supervision on any given day), any genetic effect that increases them would both be small on a social level and lead to a relatively large skew in terms of supervised populations.

This would occur via the same mechanism by which repeat offenders tend to be about one standard deviation below the median societal IQ, even though the correlation between IQ and crime explains very little of the variation in crime. The effect exists because crime is so rare: when you select only the extreme tail of a distribution, even a weak correlation produces a large difference in the selected group.
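A quick simulation makes this concrete. The correlation of -0.35 and the 0.6% selection rate are numbers I have chosen to roughly match the figures above, not empirical estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical model: standardized IQ and a latent "crime propensity"
# correlated with IQ at only r = -0.35 (so IQ explains ~12% of the variance).
r = -0.35
iq = rng.standard_normal(n)
propensity = r * iq + np.sqrt(1 - r**2) * rng.standard_normal(n)

# Crime is rare: select only the top 0.6% of the propensity distribution,
# roughly matching the incarceration/supervision rate above.
offenders = propensity > np.quantile(propensity, 0.994)
print(f"mean IQ of selected group: {iq[offenders].mean():+.2f} SD")  # about -1 SD
```

A weak correlation plus extreme selection yields a selected group a full standard deviation below the mean, even though knowing someone’s IQ tells you almost nothing about whether they will commit a crime.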

It is unfortunately easy for people to take a claim like “Group X is 5% more likely to be violent” and believe that people in Group X are something like 5% likely to assault them. This obviously isn’t true. Given that there are about 7.5 assaults for every 1000 Canadians each year, a population that was instead 100% Group X (with their presumed 5% higher assault rate) would see about 7.875 assaults per 1000 people – a difference of about one additional assault per 2,700 people.
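The arithmetic, for anyone who wants to check:

```python
baseline = 7.5 / 1000        # assaults per person per year, from above
group_x = baseline * 1.05    # the presumed 5% higher rate
extra = group_x - baseline   # 0.000375 extra assaults per person per year
print(f"one extra assault per {1 / extra:,.0f} people")  # ~2,667
```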

Unfortunately, if society took its normal course, we could expect to see Group X very overrepresented in prison. As soon as Group X gets a reputation for violence, juries would be more likely to convict, bail would be less likely, sentences might be longer (out of fear of recidivism), etc. Because many jobs (and in America, social benefits and rights) are withdrawn after you’ve been sentenced to jail, formerly incarcerated members of Group X would see fewer legal avenues to make a living. This could become even worse if even non-criminal members of Group X were denied some jobs due to fear of future criminality, leaving Group X members with few options beyond the black and grey economies and further tightening the spiral of incarceration and discrimination.

In this case, I think the moral thing to do as a society is to ignore any evidence we have about between-group differences in genetic propensities to violence. Ignoring results isn’t the same thing as pretending they are false or banning research; we aren’t fighting against the truth, simply saying that a small amount of extra predictive power about violence is not worth the social cost that Group X would face in a society that is entirely unable to reason productively about statistics. ^

[5] Although we should be ever vigilant against people who seek to do the opposite and use genetic differences between Ashkenazi Jews and other populations as a basis for their Nazi ideology. As Hannah Arendt said, the Holocaust was a crime against humanity perpetrated on the body of the Jewish people. It was a crime against humanity (rather than “merely” a crime against Jews) because Jews are human. ^

[6] Or at least, you would if I hadn’t warned you that I was about to talk about biases. ^

[7] My next blog post is going to be devoted to what I did like about the book, because I don’t want to commit the mistakes I’ve just railed against (and because I think there was some good stuff in the book that bears reviewing). ^