Hidden Disparate Impact

It is against commonly held intuitions that a group can be both over-represented in a profession, school, or program, and discriminated against. The simplest way to test for discrimination is to look at the general population, find the percentage that a group represents, then expect that group to make up exactly that percentage of any endeavour, absent discrimination.

Harvard, for example, is 17.1% Asian-American (foreign students are broken out separately in the statistics I found, so we’re only talking about American citizens or permanent residents in this post). America as a whole is 4.8% Asian-American. Therefore, many people will conclude that there is no discrimination happening against Asian-Americans at Harvard.

This is what would happen under many disparate impact analyses of discrimination, where the first step to showing discrimination is showing one group being accepted (for housing, employment, education, etc.) at a lower rate than another.
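The population-parity check amounts to a single division. Here is a minimal Python sketch using only the two percentages quoted above; the function name and the pass/fail rule are my own shorthand for the naïve test, not a formal legal standard.

```python
def representation_ratio(share_in_institution, share_in_population):
    """Return how over- or under-represented a group is.

    A ratio above 1 means the group is over-represented relative to
    the general population; below 1 means under-represented.
    """
    return share_in_institution / share_in_population


harvard_share = 17.1  # percent Asian-American at Harvard (figure quoted in this post)
us_share = 4.8        # percent Asian-American in America (figure quoted in this post)

ratio = representation_ratio(harvard_share, us_share)

# The naive test only asks whether the group falls short of its population share.
naive_verdict = "no discrimination" if ratio >= 1 else "possible discrimination"

print(f"Representation ratio: {ratio:.2f}")
print(f"Naive verdict: {naive_verdict}")
```

By this measure, Asian-Americans are over-represented more than three-fold, so the test stops before it ever looks at who applied or how they were evaluated.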

I think this naïve view is deeply flawed. First, we have clear evidence that Harvard is discriminating against Asian-Americans. When Harvard assigned personality scores to applicants, Asian-Americans were given the lowest scores of any ethnic group. When actual people met with Asian-American applicants, their personality scores were the same as everyone else’s; Harvard had assigned many of the low ratings without ever meeting the students, in what many suspect is an attempt to keep Asian-Americans below 20% of the student body.

Personality ratings in college admissions have a long and ugly history. They were invented to enforce quotas on Jews in the 1920s. These discriminatory quotas had a chilling effect on Jewish students; Dr. Jonas Salk, the inventor of the polio vaccine, chose the schools he attended primarily because they were among the few which didn’t discriminate against Jews. Imagine how prevalent and all-encompassing the quotas had to be for him to be affected.

If these discriminatory personality scores were dropped (or Harvard stopped fabricating bad results for Asian-Americans), Asian-American admissions at Harvard would rise.

This is because the proper measure of how many Asian-Americans should get into Harvard has little to do with their percentage of the population. It has to do with how many would meet Harvard’s formal admission criteria. Since Asian-Americans have much higher test scores than any other demographic group in America, it only stands to reason that we should expect to see Asian-Americans over-represented among any segment of the population that is selected at least in part by their test scores.

Put simply, Asian-American test scores are so good (on average) that we should expect to see proportionately more Asian-Americans than any other group get into Harvard.

This is the comparison we should be making when looking for discrimination in Harvard’s admissions. We know their criteria and we know roughly what the applicants look like. Given this, how many applicants from each group should get in if the criteria were applied fairly? The answer turns out to be about four times as many Asian-Americans as are currently getting in.

Hence, discrimination.
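To make the shape of that comparison concrete, here is a rough Python sketch. Every number in it is invented for illustration and none of it is Harvard data; the group names, applicant counts, and qualification rates are all assumptions. The point is only that the baseline is the pool of applicants who meet the stated criteria, not the general population.

```python
# All figures below are made up for illustration; they are NOT real admissions data.
applicants = {"group_a": 1000, "group_b": 4000}      # hypothetical applicant counts
meets_criteria = {"group_a": 0.40, "group_b": 0.10}  # hypothetical share meeting the stated criteria
actual_admits = {"group_a": 100, "group_b": 400}     # hypothetical observed admits


def expected_shares(applicants, meets_criteria):
    """Share of the admitted class each group would hold if the stated
    criteria were applied identically to everyone."""
    qualified = {g: applicants[g] * meets_criteria[g] for g in applicants}
    total = sum(qualified.values())
    return {g: count / total for g, count in qualified.items()}


def observed_shares(admits):
    """Share of the admitted class each group actually holds."""
    total = sum(admits.values())
    return {g: count / total for g, count in admits.items()}


expected = expected_shares(applicants, meets_criteria)
observed = observed_shares(actual_admits)

for group in applicants:
    # A ratio well above 1 means the group is admitted far less often
    # than the stated criteria alone would predict.
    shortfall = expected[group] / observed[group]
    print(f"{group}: expected {expected[group]:.0%}, "
          f"observed {observed[group]:.0%}, ratio {shortfall:.2f}")
```

In this toy example, group_a makes up 20% of applicants and 20% of admits, so it looks perfectly fair at a glance. Measured against the qualified pool, it should have been half the class.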

Unfortunately, this only picks up one type of discrimination – the discrimination that occurs when stated standards are being applied in an unequal manner. There’s another type of discrimination that can occur when standards aren’t picked fairly at all; their purpose is to act as a barrier, not to assess suitability. This does come up in formal disparate impact analyses – you have to prove that any standards that lead to disparate impact are necessary – but we’ve already seen how you can avoid triggering those if you pick your standard carefully and your goal isn’t to lock a group out entirely, but instead to reduce their numbers.

Analyzing the necessity of standards that may have disparate impact can be hard and lead to disagreement.

For example, we know that Harvard’s selection criteria must discriminate, which is to say they must differentiate. We want elite institutions to have selection criteria that differentiate between applicants! There is a general agreement, for example, that someone who fails all of their senior year courses won’t get into Harvard and someone who aces them might.

If we didn’t have a slew of records from Harvard backing up the assertion that personality criteria were rigged to keep out Asian-Americans (like they once kept out Jews), evaluating whether discrimination was going on at Harvard would be harder. There’s no prima facie reason to consider personality scores (had they been adopted for a more neutral purpose and applied fairly) to be a bad selector.

It’s a bit old fashioned, but there’s nothing inherently wrong with claiming that you also want to select for moral character and leadership when choosing your student body. The case for this is perhaps clearer at Harvard, which views itself as a training ground for future leaders. Therefore, personality scores aren’t clearly useless criteria and we have to apply judgement when evaluating whether it’s reasonable for Harvard to select its students using them.

Historically, racism has used seemingly valid criteria to cloak itself in a veneer of acceptability. Redlining, the process by which African-Americans were denied mortgage financing, hid its discriminatory impact with clinical language about underwriting risk. In reality, redlining was based not on actual actuarial risk in a neighbourhood (poor whites were given loans, while middle-class African-Americans were denied them), but on the racial composition of the neighbourhood.

As in the Harvard case, it was only the discovery of redlined maps that made it clear what was going on; the criterion was seemingly borderline enough that, absent evidence, there was debate as to whether it existed for a reasonable purpose or not.

(One thing that helped trigger further investigation was the realization that well-off members of the African-American community weren’t getting loans that a neutral underwriter might expect them to qualify for; their income and credit were good enough that we would have expected them to receive loans.)
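That kind of check can be sketched in a few lines of Python: restrict attention to applicants who clear a neutral bar, then compare approval rates by group. The records and thresholds below are entirely invented for illustration; real underwriting uses far more than two variables, and the cut-offs here are assumptions, not actual lending standards.

```python
from collections import defaultdict

# Invented records for illustration only: (group, income, credit_score, approved)
applications = [
    ("group_a", 90_000, 760, True),
    ("group_a", 40_000, 640, True),
    ("group_a", 85_000, 720, True),
    ("group_b", 95_000, 780, False),
    ("group_b", 88_000, 750, False),
    ("group_b", 92_000, 730, True),
    ("group_b", 35_000, 600, False),
]

# Hypothetical thresholds a neutral underwriter might use.
MIN_INCOME = 80_000
MIN_CREDIT = 700


def approval_rate_among_qualified(records):
    """Approval rate per group, counting only applicants who meet
    the neutral income and credit thresholds."""
    qualified = defaultdict(int)
    approved = defaultdict(int)
    for group, income, credit, was_approved in records:
        if income >= MIN_INCOME and credit >= MIN_CREDIT:
            qualified[group] += 1
            approved[group] += int(was_approved)
    return {g: approved[g] / qualified[g] for g in qualified}


rates = approval_rate_among_qualified(applications)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of qualified applicants approved")
```

A large gap between groups among applicants who all clear the same bar doesn't prove intent, but it is exactly the kind of mismatch that justifies digging further.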

It is also interesting to note that both of these cases hid behind racial stereotypes. Redlining was defended because of “decay” in urban neighbourhoods (a decay that was in many cases caused by redlining), while Harvard’s admissions relied upon negative stereotypes of Asian-Americans. Many were dismissed with the label “Standard Strong”, implying that they were part of a faceless collective, all of whom had similarly impeccable grades and similarly excellent extracurriculars, but no interesting distinguishing features of their own.

Realizing how hard it is to tell valid criteria apart from discriminatory ones has made me much more sympathetic to points raised by technocrat-skeptics like Dr. Cathy O’Neil, whom I have previously been harsh on. When bad actors are hiding the proof of their discrimination, it is genuinely difficult to separate real insurance underwriting (which needs to happen for anyone to get a mortgage) from discriminatory practices, just like it can be genuinely hard to separate legitimate college application processes from discriminatory ones.

While numerical measures, like test scores, have their own problems, they do provide some measure of impartiality. Interested observers can compare metrics to outcomes and notice when they’re off. Beyond redlining and college admissions, I wonder what other instances of potential discrimination a few civic-minded statisticians might be able to unearth.

Socratic Form Microscopy