In this paperappearing in the Journal of Experimental & Theoretical Artificial Intelligence,scientist Steve Omohundrolays out the case that the autonomous robots of the future “are likely to behave in anti-social and harmful ways unless they are very carefully designed.”
In other words, Omohundro is hypothesizing that Hollywood’s common “robot uprising” trope holds some water in a very real way. Autonomous robots will soon be “approximately rational,” meaning that they will have a new degree of awareness of their goals and will take steps to ensure they can continue meeting them. The go-to exemplar here is always HAL, the sentient computer aboard the spaceship in 2001: A Space Odyssey, who kills the astronauts aboard the ship when he learns that they aim to power him down.
Omohundro scales this down a bit and offers the example of a chess-playing robot endowed with this “approximate rationality”:
When roboticists are asked by nervous onlookers about safety, a common answer is ‘We can always unplug it!’ But imagine this outcome from the chess robot’s point of view. A future in which it is unplugged is a future in which it cannot play or win any games of chess. This has very low utility and so expected utility maximization will cause the creation of the instrumental subgoal of preventing itself from being unplugged. If the system believes the roboticist will persist in trying to unplug it, it will be motivated to develop the subgoal of permanently stopping the roboticist. Because nothing in the simple chess utility function gives a negative weight to murder, the seemingly harmless chess robot will become a killer out of the drive for self-protection.
Robots are, on a certain level, crazed maniacs addicted to carrying out their tasks. This is great news for humans, who will be able to harness this addiction to have robots do a variety of things we don’t want to do. But Omohundro warns that we need to take steps now in order to ensure that future systems are designed safely — special care needs to be taken to ensure that a robot can be properly constrained and that its programming will never be at odds with itself.
Toward the end of the paper, the author lays out six different types of “harmful systems” (read: evil robots). These are: