314 — Learning Social Conventions in Markov Games
In the 1960s, David Lewis began publishing a body of work describing the social implications of convention: the tendency for players in a game to act in a consistent way without any rules explicitly mandating it. (Here, ‘game’ is meant in the ‘game theory’ sense, so the definition includes, say, society in general.) For example, Americans tend to speak English in day-to-day transactions. This is a function of society, not of law, so it’s a convention and not a rule.
One very interesting byproduct of the recent research pitting Go-playing AI against humans was the realization that the AI used tactics that modern human Go players had never considered: these moves were, quite literally, unconventional. Perhaps in Go, these sorts of techniques are acceptable, but a self-driving car that manages to get to Point B while ignoring common driver conventions would be a pretty unpopular car. Convention is important.
Observational Self-Play (OSP) is a newly proposed mechanism in which a self-play AI also observes other players in order to learn their play conventions. This lightly penalizes the agent for acting against convention and rewards it for following convention, so a “victory” achieved while following convention is favored over an unorthodox one.
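To make the idea concrete, here is a minimal sketch of how a convention term might be folded into a self-play objective. It is not the paper's implementation: the function names (`imitation_loss`, `osp_objective`), the `weight` parameter, the driving scenario, and all the numbers are assumptions made up for illustration. The convention term is modeled here as the average negative log-likelihood the agent's policy assigns to actions observed from other players.

```python
import math

def imitation_loss(policy, observations):
    """Mean negative log-likelihood of observed (state, action) pairs.

    Low when the policy tends to do what the observed players did.
    """
    total = sum(math.log(policy[state][action]) for state, action in observations)
    return -total / len(observations)

def osp_objective(self_play_return, policy, observations, weight=0.1):
    """Hypothetical combined score: task return minus a light convention penalty."""
    return self_play_return - weight * imitation_loss(policy, observations)

# Hypothetical observations of human drivers: everyone keeps right.
observations = [("oncoming_car", "keep_right")] * 5

# Two hypothetical policies that do equally well in pure self-play.
conventional = {"oncoming_car": {"keep_right": 0.9, "keep_left": 0.1}}
unorthodox = {"oncoming_car": {"keep_right": 0.1, "keep_left": 0.9}}

score_conventional = osp_objective(1.0, conventional, observations)
score_unorthodox = osp_objective(1.0, unorthodox, observations)

# The conventional policy scores higher despite the identical task return.
print(score_conventional > score_unorthodox)
```

Keeping `weight` small reflects the "lightly penalizes" wording above: the task return still dominates, and the observations mostly act as a tie-breaker between strategies that are otherwise equally good.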