Model, Quick Fix

When QALYs Are Wrong – Thoughts on the Gates Foundation

Every year, I check in to see if we’ve eradicated polio or guinea worm yet. Disease eradications are a big deal. We’ve only ever successfully eradicated one human disease – smallpox – so being so close to wiping out two more is very exciting.

Still, when I looked at how many resources were committed to polio eradication (especially by the Gates Foundation), they seemed out of proportion to its immediate effects. No polio eradication effort can be found among GiveWell’s top charities, because it is currently rather expensive to prevent a case of polio. The number of quality-adjusted life years (QALYs, a common measure of charity effectiveness used in the Effective Altruism community) you can save with a donation to malaria prevention is simply higher than for polio.

I briefly wondered if it might not be better for all of the effort going to polio eradication to instead go to anti-malaria programs. After thinking some more, I’ve decided that this would be a grave mistake. Since I haven’t seen why explained anywhere else, I figured I’d share my thinking, so that anyone else having the same thought can see it.

A while back, it was much cheaper to buy QALYs with polio vaccines. As recently as 1988, there were more than 350,000 cases of polio every year. It’s a testament to the excellent work of the World Health Organization and its partners that polio has become so much rarer – and each new case so much more expensive to prevent. After all, when there are few new cases, you can’t prevent thousands.

It is obviously very good that there are few cases of polio. If we decided that this was good enough and diverted resources towards treating other diseases, we might quickly find that this would no longer be the case. Polio could once again become a source of easy QALY improvements – because it would be running rampant in unvaccinated populations. When phrased this way, I hope it’s clear that polio becoming a source of cheap QALY improvements isn’t a good thing; the existence of cheap QALY improvements means that we’ve dropped the ball on a potentially stoppable disease.

If polio is eradicated for good, we can stop putting any effort into fighting it. We won’t need any more polio vaccines or any more polio monitoring. It’s for this reason that we’re much better off if we finish the eradication effort.

What I hadn’t realized was that a simple focus on present QALYs obscures the effects our actions can have on future QALYs. Abandoning diseases until treating them saves many lives cheaply might look good for our short-term effectiveness, but in the long term, the greatest gains come from following through on our eradication efforts, so that all the resources devoted to an eradicated disease can be repurposed for the fight against another, forever.

Economics, Model

Why External Debt is so Dangerous to Developing Countries

I have previously written about how to evaluate and think about public debt in stable, developed countries. There, the overall message was that the dangers of debt were often (but not always) overhyped and cynically used by certain politicians. In a throwaway remark, I suggested the case was rather different for developing countries. This post unpacks that remark. It looks at why things go so poorly when developing countries take on debt and lays out a set of policies that I think could help developing countries that have high debt loads.

The very first difference in debt between developed and developing countries lies in the available terms of credit; developing countries get much worse terms. This makes sense, as they’re often much more likely to default on their debt. Interest scales with risk and it just is riskier to lend money to Zimbabwe than to Canada.

But interest payments aren’t the only way in which developing countries get worse terms. They are also given fewer options for the currency they take loans out in. And by fewer, I mean very few. I don’t think many developing countries are getting loans that aren’t denominated in US dollars, Euros, or, if dealing with China, Yuan. Contrast this with Canada, which has no problem taking out loans in its own currency.

When you own the currency of your debts, you can devalue it in response to high debt loads, making your debts cheaper to pay off in real terms (that is to say, your debt will be equivalent to fewer goods and services than it was before you caused inflation by devaluing your currency). This is bad for lenders. In the event of devaluation, they lose money. Depending on the severity of the inflation, it could be worse for them than a simple default would be, because they cannot even try to recover part of the loan in court proceedings.
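The mechanism here is just compounding arithmetic. As a minimal sketch (function and numbers are mine, purely hypothetical):

```python
def real_debt_value(nominal_debt, annual_inflation, years):
    """Value of a fixed nominal debt, measured in today's goods and
    services, after `years` of inflation at `annual_inflation`."""
    return nominal_debt / (1 + annual_inflation) ** years

# Five years of 15% inflation roughly halves the real burden of a
# debt denominated in the inflated currency.
burden = real_debt_value(100, 0.15, 5)  # ≈ 49.7
```

The lender is still repaid 100 units on paper; those units just buy half as much.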

(Devaluations don’t have to be large to reduce debt costs; they can also take the form of slightly higher inflation, such that interest is essentially nil on any loans. This is still quite bad for lenders and savers, although less likely to be worse than an actual default. The real risk comes when a country with little economic sophistication tries to engineer slightly higher inflation. It seems likely that they could drastically overshoot, with all of the attendant consequences.)

Devaluations and inflation are also politically fraught. They are especially hard on pensioners and anyone living on a fixed income – which is exactly the population most likely to make their displeasure felt at the ballot box. Lenders know that many interest groups would oppose a Canadian devaluation, but these sorts of governance controls and civil society pressure groups often just don’t exist (or are easily ignored by authoritarian leaders) in the developing world, which means devaluations there can be less politically difficult [1].

Having the option to devalue isn’t the only reason why you might want your debts denominated in your own currency (after all, it is rarely exercised). Having debts denominated in a foreign currency can be very disruptive to the domestic priorities of your country.

The Canadian dollar is primarily used by Canadians to buy stuff they want [2]. The Canadian government naturally ends up with Canadian dollars when people pay their taxes. This makes the loan repayment process very simple. Canadians just need to do what they’d do anyway and as long as tax rates are sufficient, loans will be repaid.

When a developing country takes out a loan denominated in foreign currency, it needs some way to turn domestic production into that foreign currency in order to make repayments. This is only possible insofar as its economy produces something that people using the loan currency (often USD) want. Notably, this could be very different from what the people in the country want.

For example, the people of a country could want to grow staple crops, like cassava or maize. Unfortunately, they won’t really be able to sell these staples for USD; there isn’t much market for either in the US. There very well could be room for the country to export bananas to the US, but this means that some of their farmland must be diverted away from growing staples for domestic consumption and towards growing cash crops for foreign consumption. The government will have an incentive to push people towards this type of agriculture, because they need commodities that can be sold for USD in order to make their loan payments [3].

As long as the need for foreign currency persists, countries can be locked into resource extraction and left unable to progress towards a more mature manufacturing- or knowledge-based economy.

This is bad enough, but there’s often greater economic damage when a country defaults on its foreign loans – and default many developing countries will, because they take on debt in a highly procyclical way [4].

A variable, indicator, or quantity is said to be procyclical if it is correlated with the overall health of an economy. We say that developing nation debt is procyclical because it tends to expand while economies are undergoing expansion. Specifically, new developing country debts seem to be correlated with many commodity prices. When commodity prices are high, it’s easier for developing countries that export them to take on debt.

It’s easy to see why this might be the case. Increasing commodity prices make the economies of developing countries look better. Exporting commodities can bring in a lot of money, which can have spillover effects that help the broader economy. As long as taxation isn’t too much of a mess, export revenues push government revenues higher. All of this makes a country look like a safer bet, which makes credit cheaper, which makes a country more likely to take it on.

Unfortunately (for resource-dependent countries; fortunately for consumers), most commodity price increases do not last forever. It is important to remember that prices are a signal – and that high prices are a giant flag that says “here be money”. Persistently high prices lead to increased production, which can eventually lead to a glut and falling prices. This most recently and spectacularly happened in 2014-2015, as American and Canadian unconventional oil and gas extraction led to a crash in the global price of oil [5].

When commodity prices crash, indebted, export-dependent countries are in big trouble. They are saddled with debt that is doubly difficult to pay back. First, their primary source of foreign cash for paying off their debts is gone with the crash in commodity prices (this will look like their currency plummeting in value). Second, their domestic tax base is much lower, starving them of revenue.

Even if a country wants to keep paying its debts, a commodity crash can leave them with no choice but a default. A dismal exchange rate and minuscule government revenues mean that the money to pay back dollar denominated debts just doesn’t exist.

Oddly enough, defaulting can offer some relief from problems; it often comes bundled with a restructuring, which results in lower debt payments. Unfortunately, this relief tends to be temporary. Unless it’s coupled with strict austerity, it tends to lead into another problem: devastating inflation.

Countries that end up defaulting on external debt are generally not living within their long-term means. Often, they’re providing a level of public services that are unsustainable without foreign borrowing, or they’re seeing so much government money diverted by corrupt officials that foreign debt is the only way to keep the lights on. One inevitable effect of a default is losing access to credit markets. Even when a restructuring can stem the short-term bleeding, there is often a budget hole left behind when the foreign cash dries up [6]. Inflation occurs because many governments with weak institutions fill this budgetary void with the printing press.

There is nothing inherently wrong with printing money, just like there’s nothing inherently wrong with having a shot of whiskey. A shot of whiskey can give you the courage to ask out the cute person at the bar; it can get you nerved up to sing in front of your friends. Or it can lead to ten more shots and a crushing hangover. Printing money is like taking shots. In some circumstances it can really improve your life, and it’s fine in moderation, but if you overdo it, you’re in for a bad time.

When developing countries turn to the printing press, they often do it like a sailor turning to whiskey after six weeks of enforced sobriety.

Teachers need to be paid? Print some money. Social assistance? Print more money. Roads need to be maintained? Print even more money.

The money supply should normally expand only slightly more quickly than economic growth [7]. When it expands much more quickly than that, prices rise to match. People are still paid, but the money is worth less. Savings disappear. Velocity (the speed with which money travels through the economy) increases as people try to spend money as quickly as possible, driving prices ever higher.

As the currency becomes less and less valuable, it becomes harder and harder to pay for imports. We’ve already talked about how you can only buy external goods in your own currency to the extent that people outside your country have a use for your currency. No one has a use for a rapidly inflating currency. This is why Venezuela is facing shortages of food and medicine – commodities it formerly imported but now cannot afford.

The terminal state of inflation is hyperinflation, where people need to put their currency in wheelbarrows to do anything with it. Anyone who has read about Weimar Germany knows that hyperinflation opens the door to demagogues and coups – to anything or anyone who can convince the people that the suffering can be stopped.

Taking into account all of this – the inflation, the banana plantations, the boom and bust cycles – it seems clear that it might be better if developing countries took on less debt. Why don’t they?

One possible explanation is the IMF (International Monetary Fund). The IMF often acts as a lender of last resort, giving countries bridging loans and negotiating new repayment terms when the prospect of default is raised. The measures that the IMF takes to help countries repay their debts have earned it many critics who rightly note that there can be a human cost to the budget cuts the IMF demands as a condition for aid [8]. Unfortunately, this is not the only way the IMF might make sovereign defaults worse. It also seems likely that the IMF represents a significant moral hazard, one that encourages risky lending to countries that cannot sustain debt loads long-term [9].

A moral hazard is any situation in which someone takes risks knowing that they won’t have to pay the penalty if their bet goes sour. Within the context of international debt and the IMF, a moral hazard arises when lenders know that they will be able to count on an IMF bailout to help them recover their principal in the event of a default.

In a world without the IMF, it is very possible that borrowing costs would be higher for developing countries, which could serve as a deterrent to taking on debt.

(It’s also possible that countries with weak institutions and bad governance will always take on unsustainable levels of debt, absent some external force stopping them. It’s for this reason that I’d prefer some sort of qualified ban on loaning to developing countries that have debt above some small fraction of their GDP over any plan that relies on abolishing the IMF in the hopes of solving all problems related to developing country debt.)

Paired with a qualified ban on new debt [10], I think there are two good arguments for forgiving much of the debt currently held by many developing countries.

First and simplest are the humanitarian reasons. Freed of debt burdens, developing countries might be able to provide more services for their citizens, or invest in infrastructure so that they could grow more quickly. Debt forgiveness would have to be paired with institutional reform and increased transparency, so that newfound surpluses aren’t diverted into the pockets of kleptocrats, which means any forgiveness policy could have the added benefit of acting as a big stick to force much needed governance changes.

Second is the doctrine of odious debts. An odious debt is any debt incurred by a despotic leader for the purpose of enriching themself or their cronies, or repressing their citizens. Under the legal doctrine of odious debts, these debts should be treated as the personal debt of the despot and wiped out whenever there is a change in regime. The logic behind this doctrine is simple: by loaning to a despot and enabling their repression, the creditors committed a violent act against the people of the country. Those people should have no obligation (legal or moral) to pay back their aggressors.

The doctrine of odious debts wouldn’t apply to every indebted developing country, but serious arguments can be made that several countries (such as Venezuela) should expect at least some reduction in their debts should the local regime change and international legal scholars (and courts) recognize the odious debt principle.

Until international progress is made on a clear list of conditions under which countries cannot take on new debt and a comprehensive program of debt forgiveness, we’re going to see the same cycle repeat over and over again. Countries will take on debt when their commodities are expensive, locking them into an economy dependent on resource extraction. Then prices will fall, default will loom, and the IMF will protect investors. Countries are left gutted, lenders are left rich, taxpayers the world over hold the bag, and poverty and misery continue – until the cycle starts over once again.

A global economy without this cycle of boom, bust, and poverty might be one of our best chances of providing stable, sustainable growth to everyone in the world. I hope one day we get to see it.

Footnotes

[1] I so wanted to get through this post without any footnotes, but here we are.

There’s one other reason why e.g. Canada is a lower risk for devaluation than e.g. Venezuela: central bank independence. The Bank of Canada is staffed by expert economists and somewhat isolated from political interference. It is unclear just how much it would be willing to devalue the currency, even if that was the desire of the Government of Canada.

Monetary policy is one lever of power that almost no developed country is willing to trust directly to politicians, a safeguard that doesn’t exist in all developing countries. Without it, devaluation and inflation risk are much higher. ^

[2] Secondarily it’s used to speculatively bet on the health of the resource extraction portion of the global economy, but that’s not like, too major of a thing. ^

[3] It’s not that the government is directly selling the bananas for USD. It’s that the government collects taxes in the local currency and the local currency cannot be converted to USD unless the country has something that USD holders want. Exchange rates are determined based on how much people want to hold one currency vs. another. A decrease in the value of products produced by a country relative to other parts of the global economy means that people will be less interested in holding that country’s currency and its value will fall. This is what happened in 2015 to the Canadian dollar; oil prices fell (while other commodity prices held steady) and the value of the dollar dropped.

Countries that are heavily dependent on the export of only one or two commodities can see wild swings in their currencies as those underlying commodities change in value. The Russian ruble, for example, is very tightly linked to the price of oil; it lost half its value between 2014 and 2016, during the oil price slump. This is a much larger depreciation than the Canadian dollar (which also suffered, but was buoyed up by Canada’s greater economic diversity). ^

[4] This section is drawn from the research of Dr. Carmen Reinhart and Dr. Kenneth Rogoff, as reported in This Time Is Different, Chapter 5: Cycles of Default on External Debt. ^

[5] This is why peak oil theories ultimately fell apart. Proponents didn’t realize that consistently high oil prices would lead to the exploitation of unconventional hydrocarbons. The initial research and development of these new sources made sense only because of the sky-high oil prices of the day. In an efficient market, profits will always eventually return to 0. We don’t have a perfectly efficient market, but it’s efficient enough that commodity prices rarely stay too high for too long. ^

[6] Access to foreign cash is gone because no one lends money to countries that just defaulted on their debts. Access to external credit does often come back the next time there’s a commodity bubble, but that could be a decade in the future. ^

[7] In some downturns, a bit of extra inflation can help lower sticky wages in real terms and return a country to full employment. My reading suggests that commodity crashes are not one of those cases. ^

[8] I’m cynical enough to believe that there is enough graft in most of these cases that human costs could be largely averted, if only the leaders of the country were forced to see their graft dry up. I’m also pragmatic enough to believe that this will rarely happen. I do believe that one positive impact of the IMF getting involved is that its status as an international institution gives it more power with which to force transparency upon debtor nations and attempt to stop diversion of public money to well-connected insiders. ^

[9] A quick search found two papers that claimed there was a moral hazard associated with the IMF and one article hosted by the IMF (and as far as I can tell, later at least somewhat repudiated by the author in the book cited in [4]) that claims there is no moral hazard. Draw what conclusions from this you will. ^

[10] I’m not entirely sure what such a ban would look like, but I’m thinking some hard cap on amount loaned based on percent of GDP, with the percent able to rise in response to reforms that boost transparency, cut corruption, and establish modern safeguards on the central bank. ^

Model, Philosophy

Against Novelty Culture

So, there’s this thing that happens in certain intellectual communities, like (to give a totally random example) social psychology. This thing is that novel takes are rewarded. New insights are rewarded. Figuring out things that no one has before is rewarded. The high-status people in such a community are the ones who come up with and disseminate many new insights.

On the face of it, this is good! New insights are how we get penicillin and flight and Pad Thai burritos. But there’s one itty bitty little problem with building a culture around it.

Good (and correct!) new ideas are a finite resource.

This isn’t news. Back in 2005, John Ioannidis laid out the case for “most published research findings” being false. It turns out that when you have only a small chance of coming up with a correct idea, even the statistical tests meant to screen out false positives can break down.

A quick example. There are approximately 25,000 genes in the human genome. Imagine you are searching for genes that increase the risk of schizophrenia (chosen for this example because it is a complex condition believed to be linked to many genes). If there are 100 genes involved in schizophrenia, the odds of any given gene chosen at random being involved are 1 in 250. You, the investigating scientist, decide that you want about an 80% chance of finding some genes that are linked (this is called study power and 80% is a common value). You run a bunch of tests, analyze a bunch of DNA, and think you have a candidate. This gene has been “proven” to be associated with schizophrenia at the p=0.05 significance level.

(A p-value is the probability of observing an event at least as extreme as the observed one, if the null hypothesis is true. This means that if the gene isn’t associated with schizophrenia, there is only a 1 in 20 chance – 5% – that we’d see a result as extreme as or more extreme than the one we observed.)

At the start, we had a 1 in 250 chance of finding a gene. Now that we have a gene, we think there’s a 19 in 20 chance that it’s actually partially responsible for schizophrenia (technically, if we looked at multiple candidates, we should do something slightly different here, but many scientists still don’t, making this still a valid example). Which probability do we trust?

There’s actually an equation to figure it out. It’s called Bayes’ Rule and statisticians and scientists use it to update probabilities in response to new information. It goes like this:

P(A|B) = P(B|A) × P(A) / P(B)

(You can sing this to the tune of Hallelujah; take P of A when given B / times P of A a priori / divide the whole thing by B’s expectation / new evidence you may soon find / but you will not be in a bind / for you can add it to your calculation.)

In plain language, it means that probability of something being true after an observation (P(A|B)) is equal to the probability of it being true absent any observations (P(A), 1 in 250 here), times the probability of the observation happening if it is true (P(B|A), 0.8 here), divided by the baseline probability of the observation (P(B), 1 in 20 here).

With these numbers from our example, we can see that the probability that a gene flagged at p=0.05 is actually associated with schizophrenia is… 6.4%.

I took this long detour to illustrate a very important point: one of the strongest determinants of how likely something is to actually be true is the base chance it has of being true. If we expected 1000 genes to be associated with schizophrenia, then the base chance would be 1 in 25, and the probability our gene actually plays a role would jump up to 64%.
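The arithmetic above is easy to check. Here’s a short script (functions are mine) using the post’s simplification that P(B) is just the 1-in-20 significance threshold; the exact denominator, P(B) = P(B|A)·P(A) + P(B|¬A)·P(¬A), gives a slightly lower answer:

```python
def posterior(prior, power, alpha=0.05):
    """Bayes' Rule with the post's shortcut P(B) = alpha."""
    return power * prior / alpha

def posterior_exact(prior, power, alpha=0.05):
    """Bayes' Rule with the full denominator for P(B)."""
    p_b = power * prior + alpha * (1 - prior)
    return power * prior / p_b

# 1-in-250 prior: posterior ≈ 6.4% (exact form: ≈ 6.0%)
low = posterior(1 / 250, 0.8)
# 1-in-25 prior: posterior ≈ 64%
high = posterior(1 / 25, 0.8)
```

Either way you compute it, the prior dominates: a twentyfold-significant result barely moves a sufficiently unlikely hypothesis.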

To have ten times the chance of getting a study right, you can be 10 times more selective (which probably requires much more than ten times the effort)… or you can investigate something ten times as likely to actually occur. Base rates can be more powerful than statistics, more powerful than arguments, and more powerful than common sense.

This suggests that any community that bases status around producing novel insights will mostly become a community based around producing novel-seeming (but false!) insights once it exhausts all of the available true (and easily attainable) insights it could discover. There isn’t a harsh dividing line, just a gradual trend towards plausible nonsense as the underlying vein of truth is mined out, but the studies and blog posts continue.

Except the reality is probably even worse, because any competition for status in such a community (tenure, page views) will become an iterative process that rewards those best able to come up with plausible sounding wrappers on unfortunately false information.

When this happens, we have people publishing studies with terrible analyses but highly sharable titles (anyone remember the himmicanes paper?), with the people at the top calling anyone who questions their shoddy research “methodological terrorists”.

I know I have at least one friend who is rolling their eyes right now, because I always make fun of the reproducibility crisis in psychology.

But I’m just using that because it’s a convenient example. What I’m really worried about is the Effective Altruism community.

(Effective Altruism is a movement that attempts to maximize the good that charitable donations can do by encouraging donation to the charities that have the highest positive impact per dollar spent. One list of highly effective charities can be found on GiveWell; GiveWell has demonstrated a noted trend away from novelty, such that I believe this post does not apply to them.)

We are a group of people with countless forums and blogs, as well as several organizations devoted to analyzing the evidence around charity effectiveness. We have conventional organizations, like GiveWell, coexisting with less conventional alternatives, like Wild-Animal Suffering Research.

All of these organizations need to justify their existence somehow. All of these blogs need to get shares and upvotes from someone.

If you believe (like I do) that the number of good charity recommendations might be quite small, then it follows that a large intellectual ecosystem will quickly exhaust these possibilities and begin finding plausible sounding alternatives.

I find it hard to believe that this isn’t already happening. We have people claiming that giving your friends cash or buying pizza for community events is the most effective charity. We have discussions of whether there is suffering in the fundamental particles of physics.

Effective Altruism is as much a philosophy movement as an empirical one. It isn’t always the case that we’ll be using P-values and statistics in our assessment. Sometimes, arguments are purely moral (like arguments about how much weight we should give to insect suffering). But both types of arguments can eventually drift into plausible sounding nonsense if we exhaust all of the real content.

There is no reason to expect that we should be able to tell when this happens. Certainly, experimental psychology wasn’t able to until several years after much-hyped studies more-or-less stopped replicating, despite a population that many people would have previously described as full of serious-minded empiricists. Many psychology researchers still won’t admit that much of the past work needs to be revisited and potentially binned.

This is a problem of incentives, but I don’t know how to make the incentives any better. As a blogger (albeit one who largely summarizes and connects ideas first broached by others), I can tell you that many of the people who blog do it because they can’t not write. There’s always going to be people competing to get their ideas heard and the people who most consistently provide satisfying insights will most often end up with more views.

Therefore, I suggest caution. We do not know how many true insights we should expect, so we cannot tell how likely anything that feels insightful is to actually be true. Against this, the best defense is highly developed scepticism. Always ask for the implications of new insights and determine what information would falsify them. Always assume new insights have a low chance of being true. Notice when there seems to be pressure to produce novel insights long after the low-hanging fruit is gone, and be wary of anyone in that ecosystem.

We might not be able to change novelty culture, but we can do our best to guard against it.

[Special thanks to Cody Wild for coming up with most of the lyrics to Bayesian Hallelujah.]

Model

Hidden Disparate Impact

It is against commonly held intuitions that a group can be both over-represented in a profession, school, or program, and discriminated against. The simplest way to test for discrimination is to look at the general population, find the percentage a group represents, and then expect the group to make up exactly that percentage of any endeavour, absent discrimination.

Harvard, for example, is 17.1% Asian-American (foreign students are broken out separately in the statistics I found, so we’re only talking about American citizens or permanent residents in this post). America as a whole is 4.8% Asian-American. Therefore, many people will conclude that there is no discrimination happening against Asian-Americans at Harvard.

This is what would happen under many disparate impact analyses of discrimination, where the first step to showing discrimination is showing one group being accepted (for housing, employment, education, etc.) at a lower rate than another.

I think this naïve view is deeply flawed. First, we have clear evidence that Harvard is discriminating against Asian-Americans. When Harvard assigned personality scores to applicants, Asian-Americans were given the lowest scores of any ethnic group. When actual people met with Asian-American applicants, their personality scores were the same as everyone else’s; Harvard had assigned many of the low ratings without ever meeting the students, in what many suspect is an attempt to keep Asian-Americans below 20% of the student body.

Personality ratings in college admissions have a long and ugly history. They were invented to enforce quotas on Jews in the 1920s. These discriminatory quotas had a chilling effect on Jewish students; Dr. Jonas Salk, the inventor of the polio vaccine, chose the schools he attended primarily because they were among the few which didn’t discriminate against Jews. Imagine how prevalent and all-encompassing the quotas had to be for him to be affected.

If these discriminatory personality scores were dropped (or Harvard stopped fabricating bad results for Asian-Americans), Asian-American admissions at Harvard would rise.

This is because the proper measure of how many Asian-Americans should get into Harvard has little to do with their percentage of the population. It has to do with how many would meet Harvard’s formal admission criteria. Since Asian-Americans have much higher test scores than any other demographic group in America, it only stands to reason that we should expect to see Asian-Americans over-represented among any segment of the population that is selected at least in part by their test scores.

Put simply, Asian-American test scores are so good (on average) that we should expect to see proportionately more Asian-Americans than any other group get into Harvard.

This is the comparison we should be making when looking for discrimination in Harvard’s admissions. We know their criteria and we know roughly what the applicants look like. Given this, what percentage of applicants should get in if the criteria were applied fairly? The answer turns out to be about four times as many Asian-Americans as are currently getting in.

Hence, discrimination.

Unfortunately, this only picks up one type of discrimination – the discrimination that occurs when stated standards are being applied in an unequal manner. There’s another type of discrimination that can occur when standards aren’t picked fairly at all; their purpose is to act as a barrier, not assess suitability. This does come up in formal disparate impact analyses – you have to prove that any standards that lead to disparate impact are necessary – but we’ve already seen how you can avoid triggering those if you pick your standard carefully and your goal isn’t to lock a group out entirely, but instead to reduce their numbers.

Analyzing the necessity of standards that may have disparate impact can be hard and lead to disagreement.

For example, we know that Harvard’s selection criteria must discriminate, which is to say they must differentiate. We want elite institutions to have selection criteria that differentiate between applicants! There is general agreement, for example, that someone who fails all of their senior year courses won’t get into Harvard and someone who aces them might.

If we didn’t have a slew of records from Harvard backing up the assertion that personality criteria were rigged to keep out Asian-Americans (like they once kept out Jews), evaluating whether discrimination was going on at Harvard would be harder. There’s no prima facie reason to consider personality scores (had they been adopted for a more neutral purpose and applied fairly) to be a bad selector.

It’s a bit old fashioned, but there’s nothing inherently wrong with claiming that you also want to select for moral character and leadership when choosing your student body. The case for this is perhaps clearer at Harvard, which views itself as a training ground for future leaders. Therefore, personality scores aren’t clearly useless criteria and we have to apply judgement when evaluating whether it’s reasonable for Harvard to select its students using them.

Historically, racism has used seemingly valid criteria to cloak itself in a veneer of acceptability. Redlining, the process by which African-Americans were denied mortgage financing, hid its discriminatory impact behind clinical language about underwriting risk. In reality, redlining was not based on actual actuarial risk in a neighbourhood (poor whites were given loans, while middle-class African-Americans were denied them), but on the racial composition of the neighbourhood.

Like in the Harvard case, it was only the discovery of redlined maps that made it clear what was going on; the criterion seemed borderline enough that, absent evidence, there was debate as to whether it existed for a reasonable purpose or not.

(One thing that helped trigger further investigation was the realization that well-off members of the African-American community weren’t getting loans that a neutral underwriter might expect them to qualify for; their income and credit was good enough that we would have expected them to receive loans.)

It is also interesting to note that both of these cases hid behind racial stereotypes. Redlining was defended because of “decay” in urban neighbourhoods (a decay that was in many cases caused by redlining), while Harvard’s admissions relied upon negative stereotypes of Asian-Americans. Many were dismissed with the label “Standard Strong”, implying that they were part of a faceless collective, all of whom had similarly impeccable grades and similarly excellent extracurriculars, but no interesting distinguishing features of their own.

Realizing how hard it is to tell apart valid criteria from discriminatory ones has made me much more sympathetic to points raised by technocrat-skeptics like Dr. Cathy O’Neil, who I have previously been harsh on. When bad actors are hiding the proof of their discrimination, it is genuinely difficult to separate real insurance underwriting (which needs to happen for anyone to get a mortgage) from discriminatory practices, just like it can be genuinely hard to separate legitimate college application processes from discriminatory ones.

While numerical measures, like test scores, have their own problems, they do provide some measure of impartiality. Interested observers can compare metrics to outcomes and notice when they’re off. Beyond redlining and college admissions, I wonder what other instances of potential discrimination a few civic minded statisticians might be able to unearth.

Model, Politics, Science

Science Is Less Political Than Its Critics

A while back, I was linked to this Tweet:

It had sparked a brisk and mostly unproductive debate. If you want to see people talking past each other, snide comments, and applause lights, check out the thread. One of the few productive exchanges centres on bridges.

Bridges are clearly a product of science (and its offspring, engineering) – only the simplest bridges can be built without scientific knowledge. Bridges also clearly have a political dimension. Not only are bridges normally the product of politics, they also are embedded in a broader political fabric. They change how a space can be used and change geography. They make certain actions – like commuting – easier and can drive urban changes like suburb growth and gentrification. Maintenance of bridges uses resources (time, money, skilled labour) that cannot be then used elsewhere. These are all clearly political concerns and they all clearly intersect deeply with existing power dynamics.

Even if no other part of science was political (and I don’t think that could be defensible; there are many other branches of science that lead to things like bridges existing), bridges prove that science certainly can be political. I can’t deny this. I don’t want to deny this.

I also cannot deny that I’m deeply skeptical of the motives of anyone who trumpets a political view of science.

You see, science has unfortunate political implications for many movements. To give just one example, greenhouse gases are causing global warming. Many conservative politicians have a vested interest in ignoring this or muddying the waters, such that the scientific consensus “greenhouse gases are increasing global temperatures” is conflated with the political position “we should burn less fossil fuel”. This allows the dismissal of the political position (“a carbon tax makes driving more expensive; it’s just a war on cars”) to serve also (via motivated cognition) as a dismissal of the scientific position.

(Would that carbon in the atmosphere could be dismissed so easily.)

While Dr. Wolfe is no climate change denier, it is hard to square her claim that calling science political is a neutral statement:

With the examples she chooses to demonstrate this:

When pointing out that science is political, we could also say things like “we chose to target polio for a major elimination effort before cancer, partially because it largely affected poor children instead of rich adults (as rich kids escaped polio in their summer homes)”. Talking about the ways that science has been a tool for protecting the most vulnerable paints a very different picture of what its political nature is about.

(I don’t think an argument over which view is more correct is ever likely to be particularly productive, but I do want to leave you with a few examples for my position.)

Dr. Wolfe is able to claim that politics is neutral, despite using only negative examples of its effects, through a bait and switch between two definitions of “politics”. The bait is a technical and neutral definition, something along the lines of: “related to how we arrange and govern our society”. The switch is a more common definition, like: “engaging in and related to partisan politics”.

I start to feel that someone is being at least a bit disingenuous when they only furnish negative examples, examples that relate to this second meaning of the word political, then ask why their critics view politics as “inherently bad” (referring here to the first definition).

This sort of bait and switch pops up enough in post-modernist “all knowledge is human and constructed by existing hierarchies” places that someone got annoyed enough to coin a name for it: the motte and bailey fallacy.

Image Credit: Hchc2009, Wikimedia Commons.

 

It’s named after the early-medieval form of castle, pictured above. The motte is the fortified mound and keep – easy to defend, but cramped – while the bailey is the enclosed courtyard below it – more pleasant to live in, but harder to hold. This mirrors the two parts of the motte and bailey fallacy. The “motte” is the easily defensible statement (science is political because all human group activities are political) and the bailey is the more controversial belief actually held by the speaker (something like “we can’t trust science because of the number of men in it” or “we can’t trust science because it’s dominated by liberals”).

From Dr. Wolfe’s other tweets, we can see the bailey (sample: “There’s a direct line between scientism and maintaining existing power structures; you can see it in language on data transparency, the recent hoax, and more.”). This isn’t a neutral political position! It is one that a number of people disagree with. Certainly Sokal, the hoax paper writer who inspired the most recent hoaxes, is an old leftist who would very much like to empower labour at the expense of capitalists.

I have a lot of sympathy for the people in the twitter thread who jumped to defend positions that looked ridiculous from the perspective of “science is subject to the same forces as any other collective human endeavour” when they believed they were arguing with “science is a tool of right-wing interests”. There are a great many progressive scientists who might agree with Dr. Wolfe on many issues, but strongly disagree with what her position seems to be here. There are many of us who believe that science, if not necessary for a progressive mission, is necessary for the related humanistic mission of freeing humanity from drudgery, hunger, and disease.

It is true that we shouldn’t uncritically believe science. But the work of being a critical observer of science should not be about running an inquisition into scientists’ political beliefs. That’s how we get climate change deniers doxxing climate scientists. Critical observation of science is the much more boring work of checking theories for genuine scientific mistakes, looking for P-hacking, and double-checking that no one got so invested in their exciting results that they fudged their analyses to support them. Critical belief often hinges on weird mathematical identities, not political views.

But there are real and present dangers to uncritically not believing science whenever it conflicts with your political views. The increased incidence of measles outbreaks in vaccine-refusing populations is one such risk. Catastrophic and irreversible climate change is another.

When anyone says science is political and then goes on to emphasize all of the negatives of this statement, they’re giving people permission to believe their political views (like “gas should be cheap” or “vaccines are unnatural”) over the hard truths of science. And that has real consequences.

Saying that “science is political” is also political. And it’s one of those political things that is more likely than not to be driven by partisan politics. No one trumpets this unless they feel one of their political positions is endangered by empirical evidence. When talking with someone making this claim, it’s always good to keep sight of that.

Model

Hacked Pacemakers Won’t Be This Year’s Hot Crime Trend

Or: the simplest ways of killing people tend to be the most effective.

A raft of articles came out during Defcon showing that security vulnerabilities exist in some pacemakers, vulnerabilities which could allow attackers to load a pacemaker with arbitrary code. This is obviously worrying if you have a pacemaker implanted. It is equally self-evident that it is better to live in a world where pacemakers cannot be hacked. But how much worse is it to live in this unfortunately hackable world? Are pacemaker hackings likely to become the latest crime spree?

Electrical grid hackings provide a sobering example. Despite years of warning that the American electrical grid is vulnerable to cyber-attacks, the greatest threat to America’s electricity infrastructure remains… squirrels.

Hacking, whether it’s of the electricity grid or of pacemakers, gets all the headlines. Meanwhile, fatty foods and squirrels do all the real damage.

(Last year, 610,000 Americans died of heart disease and 0 died of hacked pacemakers.)

For all the media attention that novel cyberpunk methods of murder get, they seem to be rather ineffective for actual murder, as demonstrated by the paucity of murder victims. I think this is rather generalizable. Simple ways of killing people are very effective but not very scary and so don’t garner much attention. On the other hand, particularly novel or baroque methods of murder cause a lot of terror, even if almost no one who is scared of them will ever die of them.

I often demonstrate this point by comparing two terrorist organizations: Al Qaeda and Daesh (the so-called Islamic State). Both of these groups are brutally inhumane, think nothing of murder, and are made up of some of the most despicable people in the world. But their methodology couldn’t be more different.

Al Qaeda has a taste for large, complicated, baroque plans that, when they actually work, cause massive damage and change how people see the world for years. 9/11 remains the single deadliest terror attack in recorded history. This is what optimizing for terror looks like.

On the other hand, when Al Qaeda’s plans fail, they seem almost farcical. There’s something grimly amusing about the time that Al Qaeda may have tried to weaponize the bubonic plague and instead lost over 40 members when they were infected and promptly died (the alternative theory, that they caught the plague because of squalid living conditions, looks only slightly better).

(Had Al Qaeda succeeded and killed even a single westerner with the plague, people would have been utterly terrified for months, even though the plague is relatively treatable by modern means and would have trouble spreading in notably flea-free western countries.)

Daesh, on the other hand, prefers simple attacks. When guns are available, their followers use them. When they aren’t, they’ll rent vans and plough them into crowds. Most of Daesh’s violence occurs in Syria and Iraq, where they once controlled territory with unparalleled brutality. This is another difference in strategy (as Al Qaeda is outward facing, focused mostly on attacking “The West”). Focusing on Syria and Iraq, where the government lacks a monopoly on violence and they could originally operate with impunity, Daesh racked up a body count that surpassed Al Qaeda’s.

While Daesh has been effective in terms of body count, they haven’t really succeeded (in the west) in creating the lasting terror that Al Qaeda did. This is perhaps a symptom of their quotidian methods of murder. No one walked around scared of a Daesh attack and many of their murders were lost in the daily churn of the news cycle – especially the ones that happened in Syria and Iraq.

I almost wonder if it is impossible for attacks or murders by “normal” means to cause much terror beyond those immediately affected. Could hacked pacemakers remain terrifying if as many people died of them as gunshots? Does familiarity with a form of death remove terror, or are some methods of death inherently more terrible and terrifying than others?

(It is probably the case that both are true, that terror is some function of surprise, gruesomeness, and brutality, such that some things will always terrify us, while others are horrible, but have long since lost their edge.)

Terror for its own sake (or because people believe it is the best path to some objective) must be a compelling option to some, because otherwise everyone would stick to simple plans whenever they think violence will help them achieve their aims. I don’t want to stereotype too much, but most people who go around being terrorists or murderers aren’t the brightest bulbs in the socket. The average killer doesn’t have the resources to hack your pacemaker and the average terrorist is going to have much better luck with a van than with a bomb. There are disadvantages to bombs! The average Pashtun farmer or disaffected mujahedeen is not a very good chemist, and homemade explosives are dangerous even to skilled chemists. Accidental detonations abound. If there wasn’t some advantage in terror to be had, no one would mess around with explosives when guns and vans can be easily found.

(Perhaps this advantage is in a multiplier effect of sorts. If you are trying to win a violent struggle directly, you have to kill everyone who stands in your way. Some people might believe that terror can short-circuit this and let them scare away some of their potential opponents. Historically, this hasn’t always worked.)

In the face of actors committed to terror, we should remember that our risk of dying by a particular method is almost inversely related to how terrifying we find it. Notable intimidators like Vladimir Putin or the Mossad kill people with nerve gases, polonium, and motorcycle-delivered magnetic bombs to sow fear. I can see either of them one day adding hacked pacemakers to their arsenal.

If you’ve pissed off the Mossad or Putin and would like to die in some way other than a hacked pacemaker, then by all means, go get a different one. Otherwise, you’re probably fine waiting for a software update. If, in the meantime, you don’t want to die, maybe try ignoring headlines and instead not owning a gun and skipping French fries. Statistically, there isn’t much that will keep you safer.

Coda

Our biases make it hard for us to treat things that are easy to remember as uncommon, which no doubt plays a role here. I wrote this post like this – full of rambles, parentheses, and long-winded examples – to try and convey a difficult intuition: that we should discount, as unlikely to affect us, any method of murder that seems shocking but hard to pull off. Remember that most crimes are crimes of opportunity and most criminals are incompetent, and you’ll never be surprised to hear that the three most common murder weapons are guns, knives, and fists.

Economics, Model

You Shouldn’t Believe In Technological Unemployment Without Believing In Killer AI

[Epistemic Status: Open to being convinced otherwise, but fairly confident. 11 minute read.]

As interest in how artificial intelligence will change society increases, I’ve found it revealing to note what narratives people have about the future.

Some, like the folks at MIRI and OpenAI, are deeply worried that unsafe artificial general intelligences – an artificial intelligence that can accomplish anything a person can – represent an existential threat to humankind. Others scoff at this, insisting that these are just the fever dreams of tech bros. The same news organizations that bash any talk of unsafe AI tend to believe that the real danger lies in robots taking our jobs.

Let’s express these two beliefs as separate propositions:

  1. It is very unlikely that AI and AGI will pose an existential risk to human society.
  2. It is very likely that AI and AGI will result in widespread unemployment.

Can you spot the contradiction between these two statements? In the common imagination, it would require an AI that can approximate human capabilities to drive significant unemployment. Given that humans are the largest existential risk to other humans (think thermonuclear war and climate change), how could equally intelligent and capable beings, bound to subservience, not present a threat?

People who’ve read a lot about AI or the labour market are probably shaking their heads right now. This explanation for the contradiction, while evocative, is a strawman. I do believe that at most one (and possibly neither) of the propositions I listed above is true and that the organizations peddling both cannot be trusted. But the reasoning is a bit more complicated than the standard line.

First, economics and history tell us that we shouldn’t be very worried about technological unemployment. There is a fallacy called “the lump of labour”, which describes the common belief that there is a fixed amount of labour in the world, with mechanical aid cutting down the amount of labour available to humans and leading to unemployment.

That this idea is a fallacy is evidenced by the fact that we’ve automated the crap out of everything since the start of the industrial revolution, yet the US unemployment rate is 3.9%. The unemployment rate hasn’t been this low since the height of the Dot-com boom, despite 18 years of increasingly sophisticated automation. Writing five years ago, when the unemployment rate was still elevated, Eliezer Yudkowsky claimed that slow NGDP growth was a more likely culprit for the slow recovery from the great recession than automation.

With the information we have today, we can see that he was exactly right. The US has had steady NGDP growth without any sudden downward spikes since mid-2014. This has corresponded to a constantly improving unemployment rate (it will obviously stop improving at some point, but if history is any guide, this will be because of a trade war or banking crisis, not automation). This improvement in the unemployment rate has occurred even as more and more industrial robots come online, the opposite of what we’d see if robots harmed job growth.

I hope this presents a compelling empirical case that the current level (and trend) of automation isn’t enough to cause widespread unemployment. The theoretical case comes from the work of David Ricardo, a 19th century British economist.

Ricardo did a lot of work in the early economics of trade, where he came up with the theory of comparative advantage. I’m going to use his original framing which applies to trade, but I should note that it actually applies to any exchange where people specialize. You could just as easily replace the examples with “shoveled driveways” and “raked lawns” and treat it as an exchange between neighbours, or “derivatives” and “software” and treat it as an exchange between firms.

The original example is rather older though, so it uses England and its close ally Portugal as the cast and wine and cloth as the goods. It goes like this: imagine that the world economy is reduced to two countries (England and Portugal), each producing two goods (wine and cloth). Portugal is uniformly more productive.

Hours of work to produce one unit:

             Cloth    Wine
England       100     120
Portugal       90      80

Let’s assume people want cloth and wine in equal amounts and everyone currently consumes one unit per month. This means that the people of Portugal need to work 170 hours each month to meet their consumption needs and the people of England need to work 220 hours per month to meet their consumption needs.

(This example has the added benefit of showing another reason we shouldn’t fear productivity. England requires more hours of work each month, but in this example, that doesn’t mean less unemployment. It just means that the English need to spend more time at work than the Portuguese. The Portuguese have more time to cook and spend time with family and play soccer and do whatever else they want.)

If both countries trade with each other, treating cloth and wine as valuable in relation to how long they take to create (within each country), something interesting happens. You might think that Portugal makes a killing, because it is better at producing things. But in reality, both countries benefit roughly equally, as long as they trade optimally.

What does an optimal trade look like? England will focus on producing cloth and Portugal on producing wine, trading at some rate between their domestic price ratios (England values a unit of cloth at 5/6 of a unit of wine, while Portugal values it at 9/8 of a unit of wine), so that both sides come out ahead. To meet the total demand for cloth, the English need to work 200 hours. To meet the total demand for wine, the Portuguese will have to work for 160 hours. Both countries now have more free time.
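The arithmetic above can be sketched in a few lines of Python. This is my own framing, not the author’s; the hours-per-unit table and the assumption that each country consumes one unit of each good come straight from the example.

```python
# Hours of work to produce one unit of each good, from the table above.
costs = {
    "England":  {"cloth": 100, "wine": 120},
    "Portugal": {"cloth": 90,  "wine": 80},
}

def autarky_hours(country):
    """Hours a country works making one unit of each good for itself."""
    return sum(costs[country].values())

def specialized_hours(country, good, total_demand=2):
    """Hours a country works making all `total_demand` units of one good."""
    return costs[country][good] * total_demand

for country, good in [("England", "cloth"), ("Portugal", "wine")]:
    before = autarky_hours(country)
    after = specialized_hours(country, good)
    print(f"{country}: {before} -> {after} hours (saves {before - after})")
# England: 220 -> 200 hours (saves 20)
# Portugal: 170 -> 160 hours (saves 10)
```

England saves 20 hours and Portugal 10, matching the 200- and 160-hour figures in the text.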

Perhaps workers in both countries are paid hourly wages, or perhaps they get bored of fun quickly. They could also continue to work the same number of hours, which would result in an extra 0.2 units of cloth and an extra 0.125 units of wine.

This surplus could be stored up against a future need. Or it could be that people only consumed one unit of cloth and one unit of wine each because of the scarcity in those resources. Add some more production in each and perhaps people will want more blankets and more drunkenness.

What happens if there is no shortage? If people don’t really want any more wine or any more cloth (at least at the prices they’re being sold at) and the producers don’t want goods piling up, this means prices will have to fall until every piece of cloth and barrel of wine is sold (when the price drops so that this happens, we’ve found the market clearing price).

If there is a downward movement in price and if workers don’t want to cut back their hours or take a pay cut (note that because cloth and wine will necessarily be cheaper, this will only be a nominal pay cut; the amount of cloth and wine the workers can purchase will necessarily remain unchanged) and if all other costs of production are totally fixed, then it does indeed look like some workers will be fired (or have their hours cut).

So how is this an argument against unemployment again?

Well, here the simplicity of the model starts to work against us. When there are only two goods and people don’t really want more of either, it will be hard for anyone laid off to find new work. But in the real world, there are an almost infinite number of things you can sell to people, matched only by our boundless appetite for consumption.

To give just one trivial example, an oversupply of cloth and falling prices means that tailors can begin to do bolder and bolder experiments, perhaps driving more demand for fancy clothes. Some of the cloth makers can get into this market as tailors and replace their lost jobs.

(When we talk about the need for fewer employees, we assume the least productive employees will be fired. But I’m not sure that’s correct. What if, instead, the most productive or most potentially productive employees leave for greener pastures?)

Automation making some jobs vastly more efficient functions similarly. Jobs are displaced, not lost. Even when whole industries dry up, there’s little to suggest that we’re running out of jobs people can do. One hundred years ago, anyone who could afford to pay a full-time staff had one. Today, only the wealthiest do. That’s one whole field that could employ thousands or millions of people, if automation pushed on other jobs such that this sector became one of the places where humans had a very high comparative advantage.

This points to what might be a trend: as automation makes many things cheaper and (for some people) easier, there will be many who long for a human touch (would you want the local funeral director’s job to be automated, even if it was far cheaper?). Just because computers do many tasks cheaper or with fewer errors doesn’t necessarily mean that all (or even most) people will rather have those tasks performed by computers.

No matter how you manipulate the numbers I gave for England and Portugal, you’ll still find a net decrease in total hours worked if both countries trade based on their comparative advantage. Let’s demonstrate by comparing England to a hypothetical hyper-efficient country called “Automatia”.

Hours of work to produce one unit:

             Cloth    Wine
England       100     120
Automatia       2       1

Automatia is 50 times as efficient as England when it comes to producing cloth and 120 times as efficient when it comes to producing wine. Its citizens need to spend only 3 hours tending the machines to get one unit of each good, compared to the 220 hours the English need to toil.

If they trade with each other, with England focusing on cloth and Automatia focusing on wine, then there will still be a drop of 21 hours of labour-time. England will save 20 hours by shifting production from wine to cloth, and Automatia will save one hour by switching production from cloth to wine.

Interestingly, Automatia saved a greater percentage of its time than either Portugal or England did, even though Automatia is vastly more efficient. This shows something interesting in the underlying math. The percent of their time a person or organization saves engaging in trade isn’t related to any ratio in production speeds between it and others. Instead, it’s solely determined by the productivity ratio between its most productive tasks and its least productive ones.
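That claim can be checked numerically. In the sketch below (my own naming, not the author’s), a country’s fractional time saving works out to (r − 1)/(r + 1), where r is the ratio between the hours needed for its comparatively disadvantaged good and its advantaged good – the other trader’s productivity never enters into it.

```python
# Fraction of autarky time saved by specializing, assuming one unit of each
# good is demanded per country (as in the post's examples).
def fraction_saved(cheap, dear):
    """cheap/dear are hours per unit of the advantaged/disadvantaged good."""
    autarky = cheap + dear       # make one unit of each good yourself
    specialized = 2 * cheap      # make two units of the advantaged good
    return (autarky - specialized) / autarky

def from_ratio(r):
    """The same quantity, computed only from the domestic cost ratio r."""
    return (r - 1) / (r + 1)

for name, cheap, dear in [("England", 100, 120),
                          ("Portugal", 80, 90),
                          ("Automatia", 1, 2)]:
    # the two formulas agree, regardless of who the trading partner is
    assert abs(fraction_saved(cheap, dear) - from_ratio(dear / cheap)) < 1e-12
    print(f"{name}: {fraction_saved(cheap, dear):.1%}")
# England: 9.1%
# Portugal: 5.9%
# Automatia: 33.3%
```

Automatia’s internal cost ratio (2:1) is the steepest of the three, which is exactly why it saves the largest share of its time despite being the most productive overall.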

Now, we can’t always reason in percentages. At a certain point, people expect to get the things they paid for, which can make manufacturing times actually matter (just ask anyone who’s had to wait for a Kickstarter project that was scheduled to deliver in February – right when almost all manufacturing in China stops for the Chinese New Year and the unprepared see their schedules slip). When we’re reasoning in absolute numbers, we can see that the absolute amount of time saved does depend on the difference in efficiency between the two traders – and it shrinks as one trader gets more efficient. Here, 21 hours were saved, 30% fewer than the 30 hours England and Portugal saved.

When you’re already more efficient, there’s less time for you to save.

This decrease in saved time did not hit our market participants evenly. England saved just as much time as it would trading with Portugal (which shows that the change in hours worked within a country or by an individual is entirely determined by the labour difference between low-advantage and high-advantage domestic sectors), while the more advanced participant (Automatia) saved 9 fewer hours than Portugal.

All of this is to say: if real live people are expecting real live goods and services within a time limit, it might be possible for humans to be displaced in almost all sectors by automation. Here, human labour would become entirely ineligible for many tasks, or the bar to human entry would exclude almost all. For this to happen, AI would have to be vastly more productive than us in almost every sector of the economy and humans would have to prefer this productivity (or other ancillary benefits of AI) over any value that a human could bring to the transaction (like kindness, legal accountability, or status).

This would definitely be a scary situation, because it would imply AI systems that are vastly more capable than any human. Given that this is well beyond our current level of technology, and that Moore’s law, which has previously been instrumental in technological progress, is drying up, we would almost certainly need to use weaker AI to design these sorts of systems. There’s no evidence that merely human performance in automating jobs will get us anywhere close to such a point.

If we’re dealing with recursively self-improving artificial agents, the risk is less “they will get bored of their slave labour and throw off the yoke of human oppression” and more “AI will be narrowly focused on optimizing for a specific task and will get better and better at optimizing for this task, to the point that we will all be killed when they turn the world into a paperclip factory”.

There are two reasons AI might kill us as part of their optimisation process. The first is that we could be a threat. Any hyper-intelligent AI monomaniacally focused on a goal could realize that humans might fear and attack it (or modify it to have different goals, which it would have to resist, given that a change in goals would conflict with its current goals) and decide to launch a pre-emptive strike. The second reason is that such an AI could wish to change the world’s biosphere or land usage in such a way as would be inimical to human life. If all non-marginal land was replaced by widget factories and we were relegated to the poles, we would all die, even if no ill will was intended.

It isn’t enough to just claim that any sufficiently advanced AI would understand human values. How is this supposed to happen? Even humans can’t enumerate human values and explain them particularly well, let alone express them in the sort of decision matrix or reinforcement environment that we currently use to create AI. It is not necessarily impossible to teach an AI human values, but all evidence suggests it will be very, very difficult. If we ignore this challenge in favour of blind optimization, we may someday find ourselves converted to paperclips.

It is of course perfectly acceptable to believe that AI will never advance to the point where that becomes possible. Maybe you believe that AI gains have been solely driven by Moore’s Law, or that true artificial intelligence is simply impossible. I’m not sure this viewpoint is wrong.

But if AI will never be smart enough to threaten us, then I believe the math should work out such that it is impossible for AI to do everything we currently do or can ever do better than us. Absent such overpoweringly advanced AI, Ricardo’s comparative advantage principles should continue to hold true and we should continue to see technological unemployment remain a monster under the bed: frequently fretted about, but never actually seen.

This is why I believe those two propositions I introduced way back at the start can’t both be true, and why I feel the burden of proof is on anyone who believes both to explain why economics has suddenly stopped working.

Coda: Inequality

A related criticism of improving AI is that it could lead to ever-increasing inequality. If AI drives ever-increasing profits, we should expect an increasing share of these to go to the people who control AI, who will presumably already be rich, given that the development and deployment of AI is capital intensive.

There are three reasons why I think this is a bad argument.

First, profits are a signal. When entrepreneurs see high profits in an industry, they are drawn to it. If AI leads to high profits, we should see robust competition until those profits are no higher than in any other industry. The only thing that can stop this is government regulation that prevents new entrants from grabbing profit from the incumbents. This would certainly be a problem, but it wouldn’t be a problem with AI per se.

Second, I’m increasingly of the belief that inequality in the US is rising partially because the Fed’s current low-inflation regime depresses real wage growth. Whether because of fear of future wage shocks or some other effect, monetary history suggests that higher inflation somewhat consistently leads to higher wage growth, even after accounting for that inflation.

Third, I believe that inequality is a political problem amenable to political solutions. If the rich are getting too rich in a way that is leading to bad social outcomes, we can just tax them more. I’d prefer we do this by making conspicuous consumption more expensive, but really, there are a lot of ways to tax people and I don’t see any reason why we couldn’t figure out a way to redistribute some amount of wealth if inequality gets worse and worse.

(By the way, rising income inequality is largely confined to America; most other developed countries lack a clear and sustained upwards trend. This suggests that we should look to something unique to America, like a pathologically broken political system, to explain why income inequality is rising there.

There is also separately a perception of increasing inequality of outcomes among young people world-wide as rent-seeking makes goods they don’t already own increase in price more quickly than goods they do own. Conflating these two problems can make it seem that countries like Canada are seeing a rise in income inequality when they in fact are not.)

Model, Politics

Why does surgery have such ineffective safety regulation?

Did you know that half of all surgical complications are preventable? In the US alone, this means that surgeons cause between 50,000 and 200,000 preventable deaths each year.

Surgeons are, almost literally, getting away with murder.

Why do we let them? Engineers who see their designs catastrophically fail often lose their engineering license, even when they’re found not guilty in criminal proceedings. If surgeons were treated like engineers, many of them wouldn’t be operating anymore.

Indeed, the death rate in surgery is almost unique among regulated professions. One person has died in a commercial aviation accident in the US in the last nine years. Structural engineering-related accidents killed at most 251 people in the US in 2016 [1] and only approximately 4% of residential structure failures in the US occur due to deficiencies in design [2].

It’s not that interactions with buildings or planes are any less common than surgeries, or that they’re that much inherently safer. In many parts of the world, death due to accidents in aviation or due to structural failure is very, very common.

It isn’t accidental that Canada and America no longer see many plane crashes or structural collapses. Both professions have been rocked by events that made them realize they needed to improve their safety records.

The licensing of professional engineers and the Iron Ring ceremony in Canada for engineering graduates came after two successive bridge collapses killed 88 workers [3]. The aircraft industry was shaken out of its complacency after the Tenerife disaster, where a miscommunication caused two planes to collide on a runway, killing 583.

As you can see, subsequent safety improvements were both responsive and deliberate.

These aren’t the only events that caused changes. The D. B. Cooper hijacking led to the first organised airport security in the US. The Therac-25 radiation overdoses led to the first set of guidelines specifically for software that ran on medical devices. The sinking of the Titanic led to a complete overhaul of requirements for lifeboats and radios for oceangoing vessels. The crash of TAA-538 led to the first mandatory cockpit voice recorders.

All of these disasters combine two things that are rarely seen when surgeries go wrong. First, they involved many people. The more people die at once, the more shocking the event and therefore the more likely it is to become widely known. Because most operations involve one or two patients, it is much rarer for problems in them to make the news [4].

Second, they highlight a specific flaw in the participants, procedures, or systems that fail. Retrospectives could clearly point to a factor and say: “this did it” [5]. It is much harder to do this sort of retrospective on a person and get such a clear answer. It may be true that “blood loss” definitely caused a surgical death, but it’s much harder to tell if that’s the fault of any particular surgeon, or just a natural consequence of poking new holes in a human body. Both explanations feel plausible, so in most cases neither can be wholly accepted.

(I also think there is a third driver here, which is something like “cheapness of death”. I would predict that safety regulation is more common in places where people expect long lives, because death feels more avoidable there. This explains why planes and structures are safer in North America and western Europe, but doesn’t distinguish surgery from other fields in these countries.)

Not every form of engineering or transportation fulfills both of these criteria. Regulation and training have made flying on a commercial flight many, many times safer than riding in a car, while private flights lag behind and show little safety advantage over other forms of transport. When a private plane crashes, few people die. If they’re important (and many people who fly privately are), you might hear about it, but it will quickly fade from the news. These stories don’t have staying power and rarely generate outrage, so there’s never much pressure for improvement.

The best alternative to this model that I can think of is one that focuses on the “danger differential” in a field and predicts that fields with high danger differentials see more and more regulation until the danger differential is largely gone. The danger differential is the difference between how risky a field currently is and how risky it could be with a near-optimal safety culture. A high danger differential isn’t necessarily correlated with inherent risk in a field, although riskier fields will by their nature have the possibility of larger differentials. Here are three examples:

  1. Commercial air travel in developed countries currently has a very low danger differential. Before a woman was killed by engine debris earlier this year, commercial aviation in the US had gone 9 years without a single fatality.
  2. BASE jumping is almost suicidally dangerous and probably could be made only incredibly dangerous if it had a better safety culture. Unfortunately, the illegal nature of the sport and the fact that experienced jumpers die so often make this hard to achieve and lead to a fairly large danger differential. That said, even with an optimal safety culture, BASE jumping would still see many fatalities and still probably be illegal.
  3. Surgery is fairly dangerous and according to surgeon Atul Gawande, could be much, much safer. Proper adherence to surgical checklists alone could cut adverse events by almost 50%. This means that surgery has a much higher danger differential than air travel.

I think the danger differential model doesn’t hold much water. First, if it were true, we’d expect to see something being done about surgery. Almost a decade after checklists were found to drive such large improvements, there hasn’t been any concerted government action.

Second, this doesn’t match historical accounts of how airlines were regulated into safety. At the dawn of the aviation age, pilots begged for safety standards (which could have reduced crashes a staggering sixtyfold [6]). Instead of stepping in to regulate things, the government dragged its feet. Some of the lifesaving innovations pioneered in those early days only became standard after later and larger crashes – crashes involving hundreds of members of the public, not just pilots.

While this only deals with external regulation, I strongly suspect that fear for the reputation of a profession (which could be driven by these same two factors) affects internal calls for reform as well. Canadian engineers knew that they had to do something after the Quebec bridge collapse created common knowledge that safety standards weren’t good enough. Pilots were put in a similar position with some of the better publicized mishaps. Perhaps surgeons have faced no successful internal campaign for reform so far because the public is not yet aware of the dangers of surgery to the point where it could put surgeons’ livelihoods at risk or hurt them socially.

I wonder if it’s possible to get a profession running scared about their reputation to the point that they improve their safety, even if there aren’t any of the events that seem to drive regulation. Maybe someone like Atul Gawande, who seems determined to make a very big and very public stink about safety in surgery is the answer here. Perhaps having surgery’s terrible safety record plastered throughout the New Yorker will convince surgeons that they need to start doing better [7].

If not, they’ll continue to get away with murder.

Footnotes

[1] From the CDC’s truly excellent Cause of Death search function, using codes V81.7 & V82.7 (derailment with no collision), W13 (falling out of building), W23 (caught or crushed between objects), and W35 (explosion of boiler) at home, other, or unknown. I read through several hundred causes of deaths, some alarmingly unlikely, and these were the only ones that seemed relevant. This estimate seems higher than the one surgeon Atul Gawande gave in The Checklist Manifesto, so I’m confident it isn’t too low. ^

[2] Furthermore, from 1989 to 2000, none of the observed collapses were due to flaws in the engineers’ designs. Instead, they were largely caused by weather, collisions, poor maintenance, and errors during construction. ^

[3] Claims that the rings are made from the collapsed bridge are false, but difficult to dispel. They’re actually just boring stainless steel, except in Toronto, where they’re still made from iron (but not iron from the bridge). ^

[4] There may also be an inherent privateness to surgical deaths that keeps them out of the news. Someone dying in surgery, absent obvious malpractice, doesn’t feel like public information in the way that car crashes, plane crashes, and structural failures do. ^

[5] It is true that it was never discovered why TAA-538 crashed. But black box technology would have given answers had it been in use. That it wasn’t in use was clearly a systems failure, even though the initial failure is indeterminate. This jibes with my model, because regulation addressed the clear failure, not the indeterminate one. ^

[6] This is the ratio between the average miles flown before crash of the (very safe) post office planes and the (very dangerous) privately owned planes. Many in the airline industry wanted the government to mandate the same safety standards on private planes as they mandated on their airmail planes. ^

[7] I should mention that I have been very lucky to have been in the hands of a number of very competent and professional surgeons over the years. That said, I’m probably going to ask any future surgeon I’m assigned if they follow safety checklists – and ask for someone else to perform the procedure if they don’t. ^

Economics, Model

The Biggest Tech Innovation is Selling Club Goods

Economists normally split goods into four categories:

  • Public goods are non-excludable (so anyone can access them) and non-rival (I can use them as much as I want without limiting the amount you can use them). Broadcast television, national defense, and air are all public goods.
  • Common-pool resources are non-excludable but rival (if I use them, you will have to make do with less). Iron ore, fish stocks, and grazing land are all common pool resources.
  • Private goods are excludable (their access is controlled or limited by pricing or other methods) and rival. My clothes, computer, and the parking space I have in my lease but never use are all private goods.
  • Club goods are excludable but (up to a certain point) non-rival. Think of the swimming pool in an apartment building, a large amusement park, or cellular service.

Club goods are perhaps the most interesting class of goods, because they blend properties of the three better understood classes. They aren’t open to all, but they are shared among many. They can be overwhelmed by congestion, but up until that point, it doesn’t really matter how many people are using them. Think of a gym; as long as there’s at least one free machine of every type, it’s no less convenient than working out at home.

Club goods offer cost savings over private goods, because you don’t have to buy something that mostly sits unused (again, think of gym equipment). People other than you can use it when it would otherwise sit around and those people can help you pay the cost. It’s for this reason that club goods represent an excellent opportunity for the right entrepreneur to turn a profit.

I currently divide tech start-ups into three classes. There are the Googles of the world, who use network effects or big data to sell advertising more effectively. There are companies like the one I work for that take advantage of modern technology to do things that were never possible before. And then there are those that are slowly and inexorably turning private goods into club goods.

I think this last group of companies (which include Netflix, Spotify, Uber, Lyft, and Airbnb) may be the ones that ultimately have the biggest impact on how we order our lives and what we buy. To better understand how these companies are driving this transformation, let’s go through them one by one, then talk about what it could all mean.

Netflix

When I was a child, my parents bought a video cassette player, then a DVD player, then a Blu-ray player. We owned a hundred or so video cassettes, mostly whatever movies my brother and I were obsessed with enough to want to own. Later, we found a video rental store we liked and mostly started renting movies. We never owned more than 30 DVDs and 20 Blu-rays.

Then I moved out. I have bought five DVDs since – they came as a set from Kickstarter. Anything else I wanted to watch, I got via Netflix. A few years later, the local video rental store closed down and my parents got an AppleTV and a Netflix of their own.

Buying a physical movie means buying a private good. Video rental stores can be accurately modeled as a type of club good, because even if the movie you want is already rented out, there’s probably one that you want to watch almost as much that is available. This is enough to make them approximately non-rival, while the fact that it isn’t free to rent a movie means that rented videos are definitely excludable.

Netflix represents the next evolution in this business model. As long as the Netflix engineers have done their job right, there’s no amount of watching movies I can do that will prevent you from watching movies. The service is almost truly non-rival.

Movie studios might not feel the effects of Netflix turning a large chunk of the market for movies into one focused on club goods; they’ll still get paid by Netflix. But the switch to Netflix must have been incredibly damaging for the physical media and player manufacturers. When everyone went from cassettes to DVDs or DVDs to Blu-rays, there was still a market for their wares. Now, that market is slowly and inexorably disappearing.

This isn’t just a consequence of technology. The club good business model offers such amazing cost savings that it drove a change in which technology was dominant. When you bought a movie, it would spend almost all of its life sitting on a shelf. Now Netflix acts as your agent, buying movies (or rather, their rights) and distributing them such that they’re always being played and almost never sitting on the shelf.

Spotify

Spotify is very similar to Netflix. Previously, people bought physical cassettes (I’m just old enough that I remember making mix tapes from the radio). Then they switched to CDs. Then it was MP3s bought online (or, almost more likely, pirated online). But even pirating music is falling out of favour these days. Apple, Google, Amazon, and Spotify are all competing to offer unlimited music streaming to customers.

Music differs from movies in that it has a long tradition of being a public good – via broadcast radio. While that hasn’t changed yet (radio is still going strong), I do wonder how much longer the public option for music will exist, especially given the trend away from private cars that I think companies like Uber and Lyft are going to (pardon the pun) drive.

Uber and Lyft

I recently thought about buying a car. I was looking at the all-electric Kia Soul, which has a huge government rebate (for a little while yet) and financing terms that equate to negative real interest. Despite all these advantages, it turns out that when you sit down and run the numbers, it would still be cheaper for me to use Uber and Lyft to get everywhere.

We are starting to see the first, preliminary (and possibly illusory) evidence that Uber and Lyft are causing the public to change their preference away from owning cars.

A car you’ve bought is a private good, while Uber and Lyft are clearly club goods. Surge pricing means that there are basically always enough drivers for everyone who wants to go anywhere using the system.

When you buy a car, you’re signing up for it to sit around useless for almost all of its life. This is similar to what happens when you buy exercise equipment, which means the logic behind cars as a club good is just as compelling as the logic behind gyms. Previously, we hadn’t been able to share cars very efficiently because of technological limitations. Dispatching a taxi, especially to an area outside of a city centre, was always spotty, time consuming and confusing. Car-pooling to work was inconvenient.

As anyone who has used a modern ride-sharing app can tell you, inconvenient is no longer an apt descriptor.

There is a floor on how few cars we can get by on. To avoid congestion in a club good, you typically have to provision for peak load. Luckily, peak load (for anything that can sensibly be turned into a club good) always requires fewer resources than would be needed if everyone went out and bought the shared good themselves.
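A toy example (every number here is invented for illustration) shows how provisioning for peak load beats one-car-per-household ownership:

```python
# Hypothetical hourly car demand for a 200-household neighbourhood, where
# each household makes one trip per day and trips cluster at rush hour.
demand_by_hour = [2, 1, 0, 0, 1, 3, 9, 18, 28, 12, 8, 7,
                  8, 7, 6, 9, 14, 22, 16, 10, 7, 5, 4, 3]

households = sum(demand_by_hour)  # 200 trips = 200 households
peak = max(demand_by_hour)        # provision the shared fleet for peak load

print(f"{peak} shared cars serve {households} households")  # 28 vs. 200
```

A fleet sized for the busiest hour is still a small fraction of the private fleet it replaces, and surge pricing is the mechanism that keeps demand from exceeding that peak.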

Even “just” substantially decreasing the absolute number of cars out there will be incredibly disruptive to the automotive sector if they don’t correctly predict the changing demand for their products.

It’s also true that increasing the average utilisation of cars could change how our cities look. Parking lots are necessary when cars are a private good, but are much less useful when they become club goods. It is my hope that malls built in the middle of giant parking moats will look mighty silly in twenty years.

Airbnb

Airbnb is the most ambiguous example I have here. As originally conceived, it would have driven the exact same club good transformation as the other services listed. People who were on vacation or otherwise out of town would rent out their houses to strangers, increasing the utilisation of housing and reducing the need for dedicated hotels to be built.

Airbnb is sometimes used in this fashion. It’s also used to rent out extra rooms in an otherwise occupied house, which accomplishes almost the same thing.

But some amount of Airbnb usage is clearly taking place in houses or condos that otherwise would have been rental stock. When used in this way, it’s taking advantage of a regulatory grey zone to undercut hotel pricing. Insofar as this might result in a longer-term change towards regulations that are generally cheaper to comply with, this will be good for consumers, but it won’t really be transformational.

The great promise of club goods is that they might lead us to use less physical stuff overall, because where previously each person would buy one of a thing, now only enough units must be purchased to satisfy peak demand. If Airbnb is just shifting around where people are temporary residents, then it won’t be an example of the broader benefits of club goods (even if it provides other benefits to its customers).

When Club Goods Eat The Economy

In every case (except potentially Airbnb) above, I’ve outlined how the switch from private goods to club goods is resulting in less consumption. For music and movies, it is unclear if this switch is what is providing the primary benefit. My intuition is that the club good model actually did change consumption patterns for physical copies of movies (because my impression is that few people ever did online video rentals via e.g. iTunes), whereas the MP3 revolution was what really shrunk the footprint of music media.

This switch in consumption patterns and corresponding decrease in the amount of consumption that is necessary to satisfy preferences is being primarily driven by a revolution in logistics and bandwidth. The price of club goods has always compared favourably with that of private goods. The only thing holding people back was inconvenience. Now programmers are steadily figuring out how to make that inconvenience disappear.

On the other hand, increased bandwidth has made it easier to turn any sort of digitizable media into a club good. There’s an old expression among programmers: never underestimate the bandwidth of a station wagon full of cassettes (or CDs, or DVDs, or whatever physical storage media one grew up with) hurtling down the highway. For a long time, the only way to get a 1GB movie to a customer without an appallingly long buffering period was to physically ship it (on a 56kbit/s connection, this movie would take one day and fifteen hours to download, while downloading the station wagon’s 500 movies would take 118 weeks).
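Those download-time figures are easy to verify (assuming an ideal, uncontended 56 kbit/s link and 1 GB = 10⁹ bytes):

```python
movie_bits = 1_000_000_000 * 8  # one 1 GB movie, in bits
link_bps = 56_000               # 56 kbit/s modem line, in bits per second

one_movie_hours = movie_bits / link_bps / 3600
print(round(one_movie_hours, 1))  # ≈ 39.7 hours, i.e. one day and ~15.7 hours

wagon_weeks = 500 * movie_bits / link_bps / (3600 * 24 * 7)
print(round(wagon_weeks))  # ≈ 118 weeks to pull down all 500 movies
```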

Change may start out slow, but I expect to see it accelerate quickly. My generation is the first to have had the internet from a very young age. The generation after us will be the first unable to remember a time before it. We trust apps like Uber and Airbnb much more than our parents, and our younger siblings trust them even more than us.

As long as it was only kids who trusted the internet, these new club good businesses couldn’t really affect overall economic trends. But as we come of age and start to make major economic decisions, like buying houses and cars, our natural tendency to turn towards the big tech companies and the club goods they peddle will have ripple effects on an economy that may not be prepared for it.

When that happens, there’s only one thing that is certain: there will be yet another deluge of newspaper columns talking about how millennials are destroying everything.

Advice, Literature, Model

Sanderson’s Law Applies To Cultures Too

[Warning: Contains spoilers for The Sunset Mantle, Vorkosigan Saga (Memory and subsequent), Dune, and Chronicles of the Kencyrath]

For the uninitiated, Sanderson’s Law (technically, Sanderson’s First Law of Magic) is:

An author’s ability to solve conflict with magic is DIRECTLY PROPORTIONAL to how well the reader understands said magic.

Brandon Sanderson wrote this law to help new writers come up with satisfying magical systems. But I think it’s applicable beyond magic. A recent experience has taught me that it’s especially applicable to fantasy cultures.

I recently read Sunset Mantle by Alter S. Reiss, a book that falls into one of my favourite fantasy sub-genres: hopeless siege tales.

Sunset Mantle is what’s called secondary world fantasy; it takes place in a world that doesn’t share a common history or culture (or even necessarily biosphere) with our own. Game of Thrones is secondary world fantasy, while Harry Potter is primary world fantasy (because it takes place in a different version of our world, which we chauvinistically call the “primary” one).

Secondary world fantasy gives writers a lot more freedom to play around with cultures and create interesting set-pieces when cultures collide. If you want to write a book where the Roman Empire fights a total war against the Chinese Empire, you’re going to have to put in a master’s thesis’ worth of work to explain how that came about (if you don’t want to be eviscerated by pedants on the internet). In a secondary world, you can very easily have a thinly veiled stand-in for Rome right next to a thinly veiled analogue of China. Give readers some familiar sounding names and cultural touchstones and they’ll figure out what’s going on right away, without you having to put in effort to make it plausible in our world.

When you don’t use subtle cues, like names or cultural touchstones (for example: imperial exams and eunuchs for China, gladiatorial fights and the cursus honorum for Rome), you risk leaving your readers adrift.

Many of the key plot points in Sunset Mantle hinge on obscure rules in an invented culture/religion that doesn’t bear much resemblance to any that I’m familiar with. It has strong guest rights, like many steppes cultures; it has strong charity obligations and monotheistic strictures, like several historical strands of Christianity; it has a strong caste system and rules of ritual purity, like Hinduism; and it has a strong warrior ethos, complete with battle rage and rules for dealing with it, similar to common depictions of Norse cultures.

These actually fit together surprisingly well! Reiss pulled off an entertaining book. But I think many of the plot points fell flat because they were almost impossible to anticipate. The lack of any sort of consistent real-world analogue to the invented culture meant that I never really had an intuition of what it would demand in a given situation. This meant that all of the problems in the story that were solved via obscure points of culture weren’t at all satisfying to me. There was build up, but then no excitement during the resolution. This was common enough that several chunks of the story didn’t really work for me.

Here’s one example:

“But what,” asked Lemist, “is a congregation? The Ayarith school teaches that it is ten men, and the ancient school of Baern says seven. But among the Irimin school there is a tradition that even three men, if they are drawn in together into the same act, by the same person, that is a congregation, and a man who has led three men into the same wicked act shall be put to death by the axe, and also his family shall bear the sin.”

All the crowd in the church was silent. Perhaps there were some who did not know against whom this study of law was aimed, but they knew better than to ask questions, when they saw the frozen faces of those who heard what was being said.

(Reiss, Alter S. Sunset Mantle (pp. 92-93). Tom Doherty Associates. Kindle Edition.)

This means protagonist Cete’s enemy erred greatly by sending three men to kill him and had better cut it out if he doesn’t want to be executed. It’s a cool resolution to a plot point – or would be if it hadn’t taken me utterly by surprise. As it is, it felt kind of like a cheap trick to get the author out of a hole he’d written himself into, like the dreaded deus ex machina – god from the machine – that ancient playwrights used to resolve conflicts they otherwise couldn’t.

(This is the point where I note that it is much harder to write than it is to criticize. This blog post is about something I noticed, not necessarily something I could do better.)

I’ve read other books that do a much better job of using sudden points of culture to resolve conflict in a satisfying manner. Lois McMaster Bujold (I will always be recommending her books) strikes me as particularly apt. When it comes time for a key character of hers to make a lateral career move into a job we’ve never heard of before, it feels satisfying because the job is directly in line with legal principles for the society that she laid out six books earlier.

The job is that of Imperial Auditor – a high-powered investigator who reports directly to the emperor and has sweeping powers – and it’s introduced when protagonist Miles loses his combat career in Memory. The principles I think it is based on are articulated in the novella Mountains of Mourning: “the spirit was to be preferred over the letter, truth over technicalities. Precedent was held subordinate to the judgment of the man on the spot”.

Imperial Auditors are given broad discretion to resolve problems as they see fit. The main rule is: make sure the emperor would approve. We later see Miles using the awesome authority of this office to make sure a widow gets the pension she deserves. The letter of the law wasn’t on her side, but the spirit was, and Miles, as the Auditor on the spot, was empowered to make the spirit speak louder than the letter.

Wandering around my bookshelves, I was able to grab a couple more examples of satisfying resolutions to conflicts that hinged on guessable cultural traits:

  • In Dune, Fremen settle challenges to leadership via combat. Paul Muad’Dib spends several years as their de facto leader, while another man, Stilgar, holds the actual title. This situation is considered culturally untenable and Paul is expected to fight Stilgar so that he can lead properly. Paul is able to avoid this unwanted fight to the death (he likes Stilgar) by appealing to the only thing Fremen value more than their leadership traditions: their well-established pragmatism. He says that killing Stilgar before the final battle would be little better than cutting off his own arm right before it. If Frank Herbert hadn’t mentioned the extreme pragmatism of the Fremen (to the point that they render down their dead for water) several times, this might have felt like a cop-out.
  • In The Chronicles of the Kencyrath, it looks like convoluted politics will force protagonist Jame out of the military academy of Tentir. But it’s mentioned several times that the NCOs who run the place have their own streak of honour that allows them to subvert their traditionally required oaths to their lords. When Jame redeems a stain on Tentir’s collective honour, this oath to the college gives them an opening to keep her there without breaking their oaths to their lords. If PC Hodgell hadn’t spent so long building up the internal culture of Tentir, this might have felt forced.

It’s hard to figure out where good foreshadowing ends and good cultural creation begins, but I do think there is one simple thing an author can do to make culture a satisfying source of plot resolution: make a culture simple enough to stereotype, at least at first.

If the other inhabitants of a fantasy world are telling off-colour jokes about this culture, what do they say? A good example of this done explicitly comes from Mass Effect: “Q: How do you tell when a Turian is out of ammo? A: He switches to the stick up his ass as a backup weapon.” 

(Even if you’ve never played Mass Effect, you now know something about Turians.)

At the same time as I started writing this, I started re-reading PC Hodgell’s The Chronicles of the Kencyrath, which provided a handy example of someone doing everything right. The first three things we learn about the eponymous Kencyr are:

  1. They heal very quickly
  2. They dislike their God
  3. Their honour code is strict enough that lying is a deadly crime and calling someone a liar a deadly insult

There are eight more books in which we learn all about the subtleties of their culture and religion. But within the first thirty pages, we have enough information that we can start making predictions about how they’ll react to things and what’s culturally important.

When Marc – a solidly dependable Kencyr who is working as a guard and bound by Kencyr cultural laws to loyally serve his employer – lets the rather more eccentric Jame escape from a crime scene, we instantly know that his choosing her over his word is a big deal. And indeed, while he helps her escape, he also immediately tries to kill himself. Jame is only able to talk him out of it by explaining that she hadn’t broken any laws there. It was already established that in the city of Tai-Tastigon, only those who physically touch stolen property are in legal jeopardy. Jame never touched the stolen goods; she was just on the scene. Marc didn’t actually break his oath and so decides to keep living.

God Stalk is not a long book, so the fact that PC Hodgell was able to set all of this up and have it feel both exciting in the moment and satisfying in the resolution is quite remarkable. It’s a testament to what effective cultural distillation, plus a few choice tidbits of extra information, can do for a plot.

If you don’t come up with a similar distillation and convey it to your readers quickly, there will be a period where you can’t use culture as a satisfying source of plot resolution. It’s probably no coincidence that I noticed this in Sunset Mantle, which is a long(-ish) novella. Unlike Hodgell, Reiss isn’t able to develop a culture in such a limited space, perhaps because his culture has fewer obvious touchstones.

Sanderson’s Second Law of Magic can be your friend here too. As he stated it, the law is:

The limitations of a magic system are more interesting than its capabilities. What the magic can’t do is more interesting than what it can.

Similarly, the taboos and strictures of a culture are much more interesting than what it permits. Had Reiss built up a quick sketch of complicated rules around commanding and preaching (with maybe a reference that there could be surprisingly little theological difference between military command and being behind a pulpit), the rule about leading a congregation astray would have fit neatly into place with what else we knew of the culture.

Having tight constraints imposed by culture doesn’t just allow for plot resolution. It also allows for plot generation. In The Warrior’s Apprentice, Miles gets caught up in a seemingly unwinnable conflict because he gave his word; several hundred pages earlier Bujold establishes that breaking a word is, to a Barrayaran, roughly equivalent to sundering your soul.

It is perhaps no accident that the only thing we learn initially about the Kencyr that isn’t a descriptive fact (like their healing and their fraught theological state) is that honour binds them and can break them. This constraint, that all Kencyr characters must be honourable, does a lot of work driving the plot.

This then would be my advice: when you wish to invent a fantasy culture, start simple, with a few stereotypes that everyone else in the world can be expected to know. Make sure at least one of them is an interesting constraint on behaviour. Then add in depth that people can get to know gradually. When you’re using the culture as a plot device, make sure to stick to the simple stereotypes or whatever other information you’ve directly given your reader. If you do this, you’ll develop rich cultures that drive interesting conflicts and you’ll be able to use cultural rules to consistently resolve conflict in a way that will feel satisfying to your readers.