Weapons of Math Destruction (2016)


TL;DR

Sophisticated models that learn patterns from large datasets offer the promise of providing impartial, efficient, accurate, “smart” decision-makers. As such, they are becoming more widespread and have a lot of influence over people’s life outcomes. Weapons of Math Destruction argues, however, that these algorithms have troubling features that codify unjust discrimination and are inscrutable and unaccountable. I focus in this post on these undesirable features and how to counteract them.

Weapons of Math Destruction (2016) by Cathy O’Neil

What are “weapons of math destruction”?

This book focuses on the way mathematical/statistical/machine learning models can be employed to do wide-scale harm. The author picks out three features that all such "weapons of math destruction" (WMDs) share:

(1) Harmful effects (i.e. the decisions made by these algorithms can negatively impact people, like shutting them out of employment, putting them in jail, charging them more money for the same service, etc.)

(2) Scale (the same algorithm is used by many different agents to make decisions that affect a large portion of the population; this is the main contribution of computational tools: the automation of work in a way that makes it mass-producible)

(3) Opaqueness / lack of transparency / lack of accountability (the source code of these models is proprietary, making it difficult or impossible to challenge the decisions they spit out or to demand a justification for them)

The many WMD examples the author discusses in this book fall, I think, into two types:

(1) Things that are intrinsically harmful, which computational power allows to be scaled up. Examples from the book falling into this category are micro-trading in the financial sector; scams such as for-profit colleges and payday loans (scaled up by internet advertising and micro-targeting); political micro-targeting (also powered by internet advertising)1; and college rankings (rankings generally lead to perverse incentives, but when they are fairly local, the damage is smaller in scale).

(2) Things that are not intrinsically harmful, which become harmful if their decisions are unjust or unfair. Examples from this category include resume-ranking algorithms; algorithms used to make sentencing decisions; predictive policing schemes; operations research; and data-driven determination of interest and insurance rates.

I focus more on the second type in this post because I find the examples are more coherent and fit a single pattern. However, both types of WMDs have the effect of reinforcing and exacerbating inequality, which is the author’s main concern when analyzing the impact of these types of algorithms on society.

Ways in which WMDs support inequality

Individual attention (manual evaluation) and automation (WMDs) form a bifurcated system

One of the points the author makes early on is that WMDs are a form of work automation. Automation is a way to deal with a large number of things (in the case of WMDs, those things are frequently people) cheaply. The first example the author gives is that of a teacher whose performance was evaluated by a value-added model (a fancy but statistically unsound model based on standardized test scores; its output is close to that of a random number generator). The evaluation it gave of her teaching was poor, and so she was fired from the low-income school she was working at, despite highly positive feedback from students and parents. She was then hired by a wealthy private school on the basis of her personal connections and the strong evaluations she had received from parents and students. So a teacher evaluation tool, which is deployed largely in low-income public school districts to eliminate "waste" (underperforming teachers), culls an essentially random portion of those districts' teachers each year, keeping teacher turnover high in low-income schools (undoubtedly bad for the learning environment) and pushing good teachers to wealthy schools capable of retaining them.

Another example: a lot of work goes into optimizing staff size and scheduling in retail/service jobs to match the predicted conditions of each day. This kind of smart scheduling results in unstable, unpredictable schedules for service workers, which puts stress on them and on their ability to make plans or work the number of hours they need in a week. While this kind of micro-scheduling could expand beyond retail jobs, at the moment the brunt of these harsh algorithms is being borne by low-paid workers.

Without a closed loop, WMD predictions become truth

One problematic feature of many WMDs discussed in this book is that they do not update their models with new evidence (or in some cases, do not need to update their models). Ideally, a model that’s making decisions based on predictions would follow up with the objects of its decisions in order to evaluate how appropriate its earlier decision was. If its initial decision was faulty, it needs to update on the new information to learn that people similar to the one evaluated are more likely to warrant the opposite decision.

However, in some cases the decisions of WMDs are self-fulfilling or circular. If a WMD deems someone hireable and they are hired, its prediction becomes truth. Conversely, if a WMD decides someone isn't hireable, that person becomes de facto un-hireable. The same goes for creditworthiness: if no one deems a person worthy of a loan, that person can never prove their creditworthiness. The WMD creates the truth it sees in the world.

These kinds of self-fulfilling prophecies are especially at play with predictive policing algorithms because more resources (police) are sent to high-crime areas, which increases the chance of a crime being caught, which raises the crime statistics in that neighborhood.
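To make the feedback loop concrete, here is a minimal toy simulation (my own illustration, not from the book). Two neighborhoods have identical underlying crime; patrols are allocated in proportion to last period's recorded crime, and the chance that an incident gets recorded grows with patrol presence. The neighborhood that starts with slightly more recorded crime keeps "earning" more patrols, and the statistics never correct themselves.

```python
# Toy sketch (my illustration, not from the book) of a predictive-policing
# feedback loop. Both neighborhoods have the same true crime rate; only the
# initial *recorded* crime differs.

true_crime = {"A": 100, "B": 100}   # actual incidents per period (identical)
recorded = {"A": 55, "B": 45}       # an arbitrary initial imbalance
TOTAL_PATROLS = 100

for period in range(20):
    total_recorded = sum(recorded.values())
    # allocate patrols in proportion to last period's recorded crime
    patrols = {n: TOTAL_PATROLS * recorded[n] / total_recorded for n in recorded}
    # more patrols -> a larger share of incidents is observed and recorded
    recorded = {n: true_crime[n] * min(1.0, patrols[n] / TOTAL_PATROLS)
                for n in recorded}

print(recorded)  # approximately {'A': 55.0, 'B': 45.0}: the initial guess is "confirmed" forever
```

Under these admittedly simplistic assumptions, the imbalance never washes out: neighborhood A looks permanently more criminal than B even though the two are identical.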

In U.S. politics, because of the electoral college system, voters in safe states have very little ability to influence the outcome of a presidential election. Voting models therefore put much more weight on predicting the behavior of people in swing states. A similar thing happens with age and education: young people and people with low education consistently turn out to vote at lower rates than other segments of the population. This creates a vicious cycle: unimportant voters are not catered to because they do not influence the results; unimportant voters get disenchanted and stop voting, which makes them less important/more discounted in models, and less catered to. The prediction becomes reality.

WMDs are thus a complicated, obfuscatory way of saying that "those without should go without" and "those who have deserve to have", which serves to cement the status quo. As the author put it in a talk, WMDs encode and automate the past, and in so doing, create (not predict) the future.

WMDs weaken workers’ rights

WMDs are often employed in ways that undermine workers' rights, and in many cases this is entirely intentional. For example, the effect of the teacher evaluation systems described above is to weaken teacher job security through widespread evaluations whose results are difficult to contest. Micro-scheduling algorithms infringe upon workers' right to predictable yet flexible schedules that respect their commitments outside of work (e.g. to their family and friends). The author warns that these kinds of performance evaluations and workplace optimizations will not stay confined to one industry. (However, given that most workforces are not as strongly unionized as teachers are, my guess is that WMD-based workplace evaluation is often simply unnecessary, because job security is already so precarious/"at-will".)

A wing of the Democratic Party has largely embraced the kind of "smart evaluation" and "efficient leanness" embodied by WMDs: a technocratic, meritocratic ordering of society, and a love of sophistication for its own sake that ignores (or pretends to ignore) the obfuscatory and inegalitarian effects of their models. For example, Obama's education grant programs rolled back the mandatory evaluation of schools based on standardized test scores (from the No Child Left Behind act) but still incentivized states to implement more fine-grained teacher evaluation via test scores, with value-added models being the way to accomplish that. These value-added teacher evaluations were championed by Democrats like Rahm Emanuel, and Clinton pushed for teacher evaluations in the past but backed off of them in the 2016 primary, while keeping the door open for the evaluation and closure of low-performing schools.2

Another major example the book uses is the micro-segmentation of the market happening in the insurance industry in general, and the health insurance industry in particular. The book describes how insurance companies now work with corporations to offer insurance plan discounts in exchange for employee participation in wellness programs like step-count monitoring. For people concerned about workers' rights to privacy and autonomy, this surrendering of personal health data and activity to one's employer in exchange for money should be worrying.

But let me detour a bit to talk about how messed up insurance micro-segmentation is in general. Micro-segmentation means sorting people into many tiers of risk and charging people in the higher-risk tiers more money (or, more legally: setting a very high default premium and giving people in lower-risk tiers discounts). This fundamentally undermines the whole purpose of insurance. As O'Neil puts it in the book:

Insurance is an industry, traditionally, that draws on the majority of the community to respond to the needs of an unfortunate minority […] As insurance companies learn more about us, they’ll be able to pinpoint those who appear to be the riskiest customers, and then either drive their rates to the stratosphere or — where legal — deny them coverage. This is a far cry from insurance’s original purpose, which is to help society balance its risk. In a targeted world, we no longer pay the average. Instead, we’re saddled with anticipated costs. Instead of smoothing out life’s costs, insurance companies will demand payment for those bumps in advance. This undermines the purpose of insurance, and the hits will fall especially hard on those who can least afford them.

As she says, first, this system exacerbates inequality by charging the people who can least afford insurance the highest premiums, like squeezing blood from a stone. Second, it undermines what insurance is meant to be. What is unfortunate is the extent to which people have come to accept this in their hearts as okay! I have had liberals tell me recently that people who engage in less risky behavior should be able to pay lower premiums for their insurance. This seems reasonable at first, but it ends up turning insurance into something where people are financially responsible for everything they are demonstrably at fault for not preventing, regardless of their ability to pay. I can't imagine buying something called "insurance" that asks me to pay for risk directly rather than offsetting it!
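To put rough numbers on the difference between "paying the average" and being "saddled with anticipated costs", here is a made-up sketch (my numbers, not the book's):

```python
# Made-up numbers (mine, not the book's) contrasting a pooled premium with
# micro-segmented premiums for a community of 1,000 people.

expected_cost = {"low_risk": 1_000, "medium_risk": 3_000, "high_risk": 11_000}
population    = {"low_risk": 700,   "medium_risk": 250,   "high_risk": 50}

# Traditional pooling: everyone pays roughly the community's average cost.
total_cost = sum(expected_cost[g] * population[g] for g in population)
pooled_premium = total_cost / sum(population.values())
print(f"pooled premium: {pooled_premium:.0f} per person")   # 2000 per person

# Micro-segmentation: each tier is billed its own anticipated cost, so the
# 50 highest-risk people pay 11,000 each instead of 2,000.
for group, cost in expected_cost.items():
    print(f"{group}: {cost} per person")
```

The pooled scheme spreads the unlucky minority's costs across everyone; the segmented scheme hands each person their own predicted bill, which is exactly O'Neil's point about insurance ceasing to balance risk.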

Anyway, healthcare is a right, and insofar as we make access to it dependent on people’s ability to pay, and we monetarily punish people for being unhealthy, we are both rendering insurance socially useless and are depriving people of their rights.

The math of WMDs obscures illegal discrimination

Finally, the biggest theme of this book is that these algorithms obfuscate discrimination that is unethical, and in many cases illegal to boot. A recurring pattern in the deployment of WMDs is that they are presented as improvements over the status quo, in which decisions are left to the subjective judgments of fallible humans who are prone to nepotism, racial and other types of bias, and inconsistency. Math gives people an illusion of impartiality and lack of bias. However, the formulas are designed by people who are just as fallible, and the systems are often bootstrapped on data consisting of prior decisions made by those same fallible humans (such decisions are, ironically, considered "ground truth" or the "gold standard"). As such, the main contribution of adding fancy models to decision-making (aside from automation) is to obscure discriminatory decision-making.

As an example, the author discusses personality tests used in hiring. These collect information about where people fall on each of the Big Five personality traits, and on that basis evaluate whether people are likely to be high-quality workers and thus worth hiring. However, these traits correlate with various mental illnesses and neuroatypicality, making this arguably a violation of the Americans with Disabilities Act.

Other commonly used variables turn out to be proxies for race. Take zip codes: for many data scientists, this is a readily available piece of information that gets thrown into a big data model without reservation. But because of segregation, zip code is a proxy for race, and discrimination on the basis of race is illegal. Yet many data scientists will freely admit to using geographical data in their models because they are unaware of the implications of doing so.
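As a toy illustration of how strong such a proxy can be (synthetic data I made up, not anything from the book): under heavy residential segregation, zip code alone recovers group membership most of the time, so a model that never "sees" race but does see zip code can still reproduce race-correlated decisions.

```python
# Synthetic toy data (my own, not from the book): why zip code works as a proxy.
import random

random.seed(0)
people = []
for _ in range(10_000):
    group = random.choice(["group_x", "group_y"])
    # segregation: each group is concentrated in a different zip code
    weights = [0.9, 0.1] if group == "group_x" else [0.1, 0.9]
    zipcode = random.choices(["10001", "10002"], weights=weights)[0]
    people.append((group, zipcode))

# How well does zip code alone "predict" group membership?
guess = {"10001": "group_x", "10002": "group_y"}
accuracy = sum(guess[z] == g for g, z in people) / len(people)
print(f"zip code alone predicts group membership ~{accuracy:.0%} of the time")  # ~90%
```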

More generally, machine learning is often an exercise in stereotyping, which essentially implies guilt by association. The author describes how some courts predict convicted criminals' chance of recidivism using a questionnaire (the Level of Service Inventory – Revised, or LSI-R) and use it to make sentencing or parole decisions. The questionnaire asks, among other things, about a person's education level, whether their parents or other family members were convicted of crimes, and whether they have friends who are criminals. The first correlates with race, and the latter two let a person's presumed criminality be moderated by where they come from and whom they associate with. We would never accept association as a basis of suspicion, especially within our legal system. And yet, that is what these tests do. But because these prediction algorithms are sufficiently hidden/opaque, this violation of people's rights flies under the radar.

What to do about WMDs

The author has a lot of productive suggestions about how to mitigate the harm caused by WMDs.

The objective of creating a model should always be to help meet the needs of THE PEOPLE BEING MODELED (as opposed to increasing the efficiency or quality of decisions). The punchier slogan version of this is "people over profit". As part of this, (1) people who are affected by the algorithm should be able to offer feedback on its functioning and suggest changes to it, and (2) people should have access to the model, allowing them to see how different inputs would have resulted in a different decision.

Fairness should be explicitly built into models. The author does not go into much detail about how this might be accomplished, but at a minimum it entails two things. First, people should be made widely aware of proxies that are dangerous to use, such as social network connections and geographical information. Second, keeping humans in the loop (as laid out in the first point) can help improve algorithms and make them fairer.

Data science may need to be professionalized, i.e. adopt a code of ethics that is enforced by a body of peers capable of expelling from the profession people who are shown to have behaved unethically. Part of that code of ethics will need to address the racist proxies listed above and the duty to handle people's private data responsibly, for example by not allowing predicted health outcomes to affect people's access to resources.

Existing algorithms need to be held legally accountable for discriminatory behavior. A lot of work is already ongoing in this area. There is a lawsuit against the personality tests used for hiring, and researchers are working to "audit" WMDs by sending in different profiles as input and seeing how that changes the algorithm's output (see the sketch below). People are also pushing for algorithms used in the public sector (the legal system, public school employment, etc.) to have their proprietary code disclosed so that they can be democratically accountable.
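The auditing idea is simple enough to sketch. All the names below are hypothetical placeholders (not a real auditing tool or the book's code): hold a profile fixed, flip one attribute at a time, and compare what the black box returns.

```python
# Sketch of a black-box audit (hypothetical names, purely illustrative):
# probe a scoring function with paired profiles that differ in exactly one
# attribute and compare the outputs.

def example_model(profile):
    # stand-in for the opaque model being audited (purely illustrative)
    return 0.8 if profile["zipcode"] == "10001" else 0.5

def audit_attribute(score, base_profile, attribute, value_a, value_b):
    profile_a = dict(base_profile, **{attribute: value_a})
    profile_b = dict(base_profile, **{attribute: value_b})
    return score(profile_a), score(profile_b)

base = {"years_experience": 5, "education": "BA", "zipcode": "10001"}
score_a, score_b = audit_attribute(example_model, base, "zipcode", "10001", "10002")
print(score_a, score_b)   # 0.8 0.5: the score moves when only the zip code changes

# A systematic gap across many base profiles is evidence that the model is
# keying on the probed attribute (or on whatever it proxies for).
```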

Conclusion

As I mentioned earlier, the efficiency and smartness promised by sophisticated algorithms make them very appealing to a certain type of person drawn to smart decision-making and leanness, which means they have gotten wide uptake. However, these models perpetuate inequality (and sometimes that is a feature, not a bug). Regardless of intention, it is vital that people demand democratic control and accountability when these models are employed.

Buy on Amazon


1 The book illustrates this in an interesting way. O'Neil recounts how Mitt Romney's 47% comment (that 47% of Americans are dependent on the government and feel entitled to government handouts) was made at a closed-door event for private donors. That rhetoric, meant to stroke the egos of rich donors, was not the kind a politician would use on the campaign trail, and Romney was burned badly when a bartender at the event leaked video of his comments to the press. Targeted ads offered by Facebook and others allow a Politician of Many Faces to segment their base and present a pleasing aspect to each segment, far more successfully than Romney did. This kind of activity undermines democracy.

2 Some background on Clinton’s ambivalent position: 1 2 3 (I don’t think she meant what she was literally saying, but I do believe she intended to convey a sense that underperforming schools should be shuttered.)
