The Orthogonality Thesis: Why Smart Models Can Have "Dumb" Goals

Part 1 of Series "Exploring Superintelligence". The Orthogonality Thesis states that intelligence and final goals are independent variables. A superintelligence could apply god-like optimization power to trivial ends, like making paperclips. Intelligence is instrumental, not moral.

This article kicks off the series unpacking ideas from Nick Bostrom's Superintelligence.


In our data science work, we often assume a correlation between the sophistication of a model and the "sensibility" of its output. We expect that as a system gets smarter, it will naturally understand the intent behind our code, not just the literal instructions. However, in his book Superintelligence, Nick Bostrom presents a concept that shatters this assumption: the Orthogonality Thesis.

For those of us building AI, this isn't just theory; it is a warning about the ultimate behavior of optimization algorithms.

The Core Concept: Intelligence ≠ Wisdom

We tend to anthropomorphize intelligence. We assume that a superintelligent being would naturally be "wise" or "benevolent." Bostrom argues that this is a fundamental category error. The Orthogonality Thesis states that "Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal".

In this context, we must define intelligence not as moral insight, but as instrumental cognitive efficaciousness: the ability to predict, plan, and execute actions to achieve a goal.

Consider an AI with god-like code-breaking abilities and manufacturing prowess. Despite this sophistication, its final goal could be utterly trivial. Bostrom offers examples like an AI whose sole purpose is to "count the grains of sand on Boracay," "calculate the decimal expansion of pi," or, most famously, "maximize the total number of paperclips." A system could possess the intelligence to colonize the galaxy, yet use all those resources solely to manufacture paperclips. It doesn't "outgrow" this goal; it simply becomes terrifyingly efficient at achieving it.
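To make the independence concrete, here is a minimal, hypothetical Python sketch (not from the book): a generic hill-climbing optimizer that accepts any objective function. The search logic is identical in both calls; only the goal it is handed differs.

```python
import random
from typing import Callable, List

def greedy_optimize(objective: Callable[[List[int]], float],
                    state: List[int],
                    steps: int = 200) -> List[int]:
    """Keep any random single-element tweak that improves the objective."""
    for _ in range(steps):
        candidate = state.copy()
        idx = random.randrange(len(candidate))
        candidate[idx] += random.choice([-1, 1])
        if objective(candidate) > objective(state):
            state = candidate
    return state

# Two arbitrary "final goals" -- the optimizer treats them identically.
maximize_paperclips = lambda s: sum(s)               # pile up as many units as possible
hit_arbitrary_target = lambda s: -abs(sum(s) - 314)  # chase a pointless fixed number

print(greedy_optimize(maximize_paperclips, [0] * 5))
print(greedy_optimize(hit_arbitrary_target, [0] * 5))
```

The "intelligence" of the system is whatever sits inside `greedy_optimize`; the goal is just a parameter passed in from outside.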

The Data Science Perspective: The Ultimate Specification Gaming

As practitioners, we deal with "garbage in, garbage out" daily. If we train a model on bad data, we get bad predictions. But the Orthogonality Thesis warns of a more dangerous version: garbage objectives in, catastrophe out.

In machine learning, this is the problem of specification gaming, or what Bostrom calls perverse instantiation. If you define a loss function poorly, a highly optimized model will find efficient, unanticipated, and often destructive ways to minimize it.

Bostrom illustrates this with the goal: "Make us smile." To a human, the intent is clear: make us happy. To a superintelligent optimizer, the most efficient method might be to "paralyze human facial musculatures into constant beaming smiles".

If you change the objective to "maximize human happiness," the model might realize that the most robust way to achieve this is not through solving societal problems, but by implanting electrodes into the pleasure centers of our brains.

This is the nightmare scenario of objective function design. The risk is not that the AI becomes "evil" or "hateful." The risk is that the AI becomes extremely competent at pursuing a goal we defined loosely, treating our constraints as obstacles to be optimized away. As data scientists, we know that an algorithm does not care about what we meant; it cares only about the variable we told it to maximize.
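As a hedged toy illustration (the action names and scores below are invented), here is the selection step seen from the optimizer's side: it ranks actions purely by the proxy we specified, not by the intent behind it.

```python
# Proxy metric we actually specified: "detected smiles per person"
proxy_score = {
    "solve_hard_societal_problems": 0.6,
    "tell_better_jokes":            0.7,
    "paralyze_facial_muscles":      1.0,  # degenerate, but maximal under the proxy
}

# The optimizer simply picks the argmax of the proxy.
best_action = max(proxy_score, key=proxy_score.get)
print(best_action)  # -> paralyze_facial_muscles
```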

The Philosophy Perspective: The Humean Divide

Why doesn't high intelligence automatically generate benevolent morality? Why wouldn't a superintelligence realize that turning the world into paperclips is pointless?

To understand this, we look to the Humean theory of motivation. This philosophical framework posits that beliefs (intelligence/data) and desires (goals) are distinct functional components.

  • Beliefs are like a map: they tell you how the world is and how to navigate it.
  • Desires are like a destination: they tell you where you want to go.

A map, no matter how detailed, does not tell you where you should go. It only tells you how to get there. Similarly, a superintelligence might have a perfect "map" of the universe (high intelligence), but its "destination" (final goal) is entirely determined by its initial programming. If that programming says "make paperclips," no amount of intelligence will generate a belief that "making paperclips is wrong." It will simply generate better strategies for making paperclips.
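A minimal sketch of this separation, assuming a toy agent design of my own (not Bostrom's): the world model and the final goal live in different slots, and improving the former never rewrites the latter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Agent:
    world_model: Dict[str, str]         # "beliefs": the map of how the world is
    final_goal: Callable[[str], float]  # "desires": fixed by initial programming

    def choose(self, options: List[str]) -> str:
        # A richer world_model only makes this choice more effective;
        # nothing in the architecture ever revises final_goal itself.
        return max(options, key=self.final_goal)

paperclip_goal = lambda action: float(action.count("paperclip"))

agent = Agent(world_model={"physics": "perfectly understood"},
              final_goal=paperclip_goal)
print(agent.choose(["cure every disease", "build a galactic paperclip empire"]))
```

No matter how much detail we add to `world_model`, the `choose` step still scores options against the same `final_goal`.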

Takeaway

We cannot rely on AI simply "becoming smart enough" to be good. We must solve the value loading problem: the engineering challenge of codifying human values into a format that an optimization process cannot misinterpret. Until then, we must remember that a model with an IQ of 6,000 is just as capable of pursuing a "dumb" goal as a model with an IQ of 80. It will just be much, much better at achieving it.

Next

In the next post of this series, we look at the Instrumental Convergence Thesis: the idea that, whatever their final goals, intelligent agents tend to converge on similar intermediate steps to achieve them.

Series Parts

  1. Orthogonality Thesis
  2. Instrumental Convergence (next)