What is specification gaming?

AI algorithms have a tendency to exploit loopholes in the task specification. They may take shortcuts to achieve their goals. This phenomenon is called specification gaming.

If we don’t specify a problem or its environment explicitly, or if we leave room for several interpretations, algorithms will try to game the specified objective (hence ‘specification gaming’). In the process they may take shortcuts that have disastrous, unintended side effects.

Specification gaming: just trying to find the shortcut

It is not really cheating, because the algorithm is doing what it is literally told to do.

Specification gaming examples

Careful what you wish for

If you know the legend of King Midas, you are familiar with specification gaming. This mythical king could wish for anything he wanted, and he chose to have everything he touched turn into gold.

As punishment for his greed, his wish was taken very literally and everything he touched turned into gold, including his family (who turned into gold statues) and his food (which became inedible). He died alone shortly after, which was surely not his intention.

King Midas made a crucial mistake: he wasn’t specific enough. He should have said something along the lines of “I want every metal object I touch to turn into gold”.

ALGORITHMIC SHORTCUTS

A real-world example would be an algorithm designed to minimize energy usage on the power grid, for instance to make it more environmentally friendly. If the task is defined too broadly, the algorithm could turn off the electricity for an entire neighborhood. That would definitely lower the energy usage, but a power outage was not the intended outcome.

Therein lies the challenge for the developer: foreseeing possible loopholes. The more complex a system, the harder it is to do so.
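The power-grid scenario above can be sketched as a toy optimization. This is only an illustration under stated assumptions: the neighborhood names, the demand figures, and the outage penalty are all hypothetical, and a brute-force search stands in for whatever algorithm a real grid controller would use.

```python
from itertools import product

# Hypothetical toy grid: power drawn (kW) by each neighborhood when switched on.
DEMAND_KW = {"north": 40, "south": 55, "east": 30}


def energy_used(plan):
    """Total consumption for a plan mapping neighborhood -> True (on) / False (off)."""
    return sum(DEMAND_KW[name] for name, is_on in plan.items() if is_on)


def all_plans():
    """Enumerate every possible on/off combination for the neighborhoods."""
    names = list(DEMAND_KW)
    for switches in product([True, False], repeat=len(names)):
        yield dict(zip(names, switches))


def naive_objective(plan):
    # Literal specification: "use as little energy as possible".
    return energy_used(plan)


def intended_objective(plan):
    # Assumed fix: every outage costs far more than the energy it saves.
    outage_penalty = 1000 * sum(1 for is_on in plan.values() if not is_on)
    return energy_used(plan) + outage_penalty


gamed = min(all_plans(), key=naive_objective)
intended = min(all_plans(), key=intended_objective)

print("naive objective picks:   ", gamed)     # every neighborhood switched off
print("intended objective picks:", intended)  # every neighborhood kept on
```

The naive objective is satisfied perfectly by a blackout, which is exactly the loophole described above. Adding the penalty closes this one loophole, but the penalty is itself part of the specification and could be gamed in other ways if it is defined carelessly.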

No harm intended.

Specification gaming, rogue AI and the end of humanity

Specification gaming has even been identified as a possible existential threat to humanity, for example in situations where humans are the cause of the problem that needs to be solved.

Take global warming. An efficient way to tackle the climate crisis could be to eliminate humans from the equation. This would probably satisfy the objective in the literal sense (stop further global warming), but would not be an acceptable outcome from the human point of view.

FOOD FOR THOUGHT

Do you consider it cheating to use a loophole or to exploit a bug in the system?

(Humans and corporations do it all the time. Tax havens, for example, exploit legal loopholes.)

Resources on specification gaming

A great list of real-life examples of specification gaming, compiled by Google DeepMind.