What is specification gaming?
AI algorithms have a tendency to exploit loopholes in the task specification. They may take shortcuts to achieve their goals. This phenomenon is called specification gaming.
If we specify a problem or its environment too loosely, or leave room for several interpretations, an algorithm will tend to optimize the objective exactly as written rather than as intended (hence ‘specification gaming’). In the process it may take shortcuts that have disastrous, unintended side effects.
It is not really cheating, because the algorithm is doing what it is literally told to do.
Specification gaming examples
Careful what you wish for
If you know the legend of King Midas, you are familiar with specification gaming. This mythical king could wish for anything he wanted, and he chose to have everything he touched turn into gold.
As punishment for his greed, his wish was taken very literally: everything he touched turned into gold, including his family (who became gold statues) and his food (which became inedible). He died alone shortly after, which was surely not his intention.
King Midas made a crucial mistake: he wasn’t specific enough. He should have said something along the lines of “I want every metal object I touch to turn into gold”.
A real-life example would be an algorithm that is designed to minimize the energy usage on the power grid, e.g. in order to be more environmentally friendly. If the task is too broadly defined, the algorithm could turn the electricity off for the entire neighborhood. That would definitely lower the energy usage, but a power outage was not the intended outcome.
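The power grid example can be sketched as a toy program. This is only an illustration: the demand figure, the supply options and the function names are all hypothetical, and a real grid controller would be far more complex. The first controller minimizes energy usage and nothing else; the second minimizes the same objective but only among options that still meet demand.

```python
# Toy model: all names and numbers are hypothetical, chosen for illustration only.
DEMAND_KWH = 80  # energy the neighborhood actually needs
SUPPLY_OPTIONS_KWH = [0, 40, 80, 100, 120]  # supply levels the controller may pick


def energy_used(supply_kwh):
    """Literal objective: total energy delivered to the neighborhood."""
    return supply_kwh


def naive_controller(options):
    # Minimizes energy usage and nothing else.
    return min(options, key=energy_used)


def constrained_controller(options, demand_kwh):
    # Same objective, but only over options that still meet demand.
    feasible = [s for s in options if s >= demand_kwh]
    return min(feasible, key=energy_used)


naive_choice = naive_controller(SUPPLY_OPTIONS_KWH)
constrained_choice = constrained_controller(SUPPLY_OPTIONS_KWH, DEMAND_KWH)
print(naive_choice)        # 0: the "optimal" answer is a blackout
print(constrained_choice)  # 80: the least energy that still meets demand
```

The naive controller is not misbehaving: zero supply really is the minimum of the objective it was given. The fix here is a single explicit constraint, but that only works because the loophole was anticipated.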
Therein lies the challenge for the developer: foreseeing possible loopholes. The more complex a system, the harder it is to do so.
No harm intended.
Specification gaming, rogue AI and the end of humanity
Specification gaming has even been identified as a possible existential threat to humanity, for example in situations where humans are themselves the cause of the problem to be solved.
Take global warming. An efficient way to tackle the climate crisis could be to eliminate humans from the equation. This would probably satisfy the objective in the literal sense (stop further global warming), but would not be an acceptable outcome from the human point of view.
Resources on specification gaming
A great list of real-life examples of specification gaming, compiled by Google’s DeepMind.