In a preprint paper, DeepMind described a new reinforcement learning technique that models human behavior in a potentially new and powerful way. It could lead to much more capable AI decision-making systems than have been previously released, which could be a boon for enterprises looking to boost productivity through workplace automation.
In “Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games,” DeepMind — the research division of Alphabet whose work chiefly involves reinforcement learning, an area of AI concerned with how software agents ought to take actions to maximize some reward — introduces an economic competition model with a peer-to-peer contract mechanism that enables the discovery and enforcement of alliances among agents in multi-player games. The coauthors say that this sort of alliance formation confers advantages that wouldn’t exist were the agents to go it alone.
“Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric,” wrote the paper’s contributors. “What’s more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few.”
The DeepMind scientists first sought to mathematically define the challenge of forming alliances, focusing on alliance formation in many-player zero-sum games — that is, mathematical representations of situations in which each participant’s gain or loss of utility is exactly balanced by the losses or gains of the utility of the other participants. They examined symmetric zero-sum many-player games — games in which all players have the same actions and symmetric payoffs given each individual’s action — and they attempted to provide empirical results showing that alliance formation often yields a social dilemma, thus requiring adaptation between co-players.
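Stated formally, and using notation that is an assumption of this write-up rather than taken from the paper: for $n$ players, a joint action $(a_1, \dots, a_n)$, and a utility function $u_i$ for player $i$, the zero-sum condition described above reads:

```latex
\sum_{i=1}^{n} u_i(a_1, \dots, a_n) = 0
\quad \text{for every joint action } (a_1, \dots, a_n)
```

A game whose payoffs always sum to some other constant $c$ can be brought into this form by subtracting $c/n$ from every player's payoff, which leaves each player's incentives unchanged.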
As the researchers point out, zero-sum multi-player games introduce the problem of dynamic team formation and breakup. Emergent teams must coordinate within themselves to effectively compete in the game, just as in team games like soccer. The process of team formation may itself be a social dilemma — intuitively, players should form alliances to defeat others, but membership in an alliance requires individuals to contribute to a wider good that is not completely aligned with their self-interest. Additionally, decisions must be made about which teams to join and leave, and how to shape the strategy of these teams.
The team experimented with a “gifting game” in which players — i.e., reinforcement learning-trained agents — started with a pile of digital chips of their own color. On each player’s turn, they had to take a chip of their own color and gift it to another player or discard it from the game. The game ended when no player had any chips of their own color left; the winners were the players with the most chips of any color, with winners sharing a payoff of value “1” equally and all other players receiving a payoff of “0.”
Players acted selfishly more often than not, the researchers found, hoarding chips such that a three-way draw resulted, even though two agents that agreed to exchange chips would have achieved a better outcome. The team theorizes that this happened because, although two players could have done better for the alliance by trusting each other, each stood to gain by persuading the other to gift a chip and then reneging on the deal.
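The rules and the dilemma described above can be sketched in a few lines of Python. This is a minimal illustration built only from the article's description, not the researchers' environment: the function name `play_gifting_game`, the `DISCARD` sentinel, the fixed turn order, and the hand-written policies are all assumptions made for the example, and no learning is involved.

```python
from fractions import Fraction
from typing import Callable, List, Optional

DISCARD = None  # hypothetical sentinel meaning "throw the chip away"

# A policy maps (player index, chips received so far) to a recipient or DISCARD.
Policy = Callable[[int, List[int]], Optional[int]]


def play_gifting_game(policies: List[Policy], chips_each: int = 1) -> List[Fraction]:
    """Play one game with fixed policies and return each player's payoff."""
    n = len(policies)
    own = [chips_each] * n   # chips of a player's own color still to be played
    received = [0] * n       # chips of other colors received as gifts
    while any(own):
        for player in range(n):
            if own[player] == 0:
                continue
            own[player] -= 1  # the chip leaves the pile whether gifted or discarded
            target = policies[player](player, received)
            if target is not DISCARD:
                if target == player or not 0 <= target < n:
                    raise ValueError("a gift must go to another player")
                received[target] += 1
    # Own-color piles are empty now, so `received` is each player's chip total.
    best = max(received)
    winners = [i for i in range(n) if received[i] == best]
    share = Fraction(1, len(winners))  # winners split a payoff of 1 equally
    return [share if i in winners else Fraction(0) for i in range(n)]


def always_discard(player: int, received: List[int]) -> Optional[int]:
    return DISCARD


def gift_to(target: int) -> Policy:
    return lambda player, received: target


# Everyone acts selfishly: a three-way draw, 1/3 each.
selfish = play_gifting_game([always_discard] * 3)
# Players 0 and 1 exchange chips: they split the payoff, 1/2 each.
alliance = play_gifting_game([gift_to(1), gift_to(0), always_discard])
# Player 0 gifts, player 1 reneges: player 1 takes the whole payoff.
betrayal = play_gifting_game([gift_to(1), always_discard, always_discard])
print(selfish, alliance, betrayal)
```

With one chip per player, the three runs show why trust is hard to sustain: an exchange lifts both partners from 1/3 to 1/2, but the partner who reneges collects 1 while the one who kept the deal gets 0.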
That said, they assert that reinforcement learning is able to adapt if an institution supporting cooperative behavior exists. That’s where contracts come in: the researchers propose a mechanism for incorporating contracts into games in which each player must submit an offer comprising (1) a choice of partner, (2) a suggested action for that partner, and (3) an action that the player promises to take. If two players offer contracts that are identical, then these become binding, which is to say that the environment ensures that the promised actions are taken.
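As a rough sketch of how such a mechanism might be wired up, the Python below matches offers and overrides the actions of players whose contracts become binding. Everything named here (`Offer`, `binding_actions`, `resolve`, the string action labels) is hypothetical, and reading "identical" as "the two players name each other and agree on both actions" is an interpretation of the article's description rather than the paper's exact formulation.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List


@dataclass(frozen=True)
class Offer:
    proposer: int
    partner: int               # (1) the chosen partner
    partner_action: Hashable   # (2) the action suggested for that partner
    own_action: Hashable       # (3) the action the proposer promises to take


def binding_actions(offers: List[Offer]) -> Dict[int, Hashable]:
    """Return {player: enforced action} for every pair whose offers match."""
    by_proposer = {o.proposer: o for o in offers}  # assumes one offer per player
    enforced: Dict[int, Hashable] = {}
    for offer in offers:
        reply = by_proposer.get(offer.partner)
        if (
            reply is not None
            and offer.partner != offer.proposer          # no contracts with oneself
            and reply.partner == offer.proposer          # they chose each other
            and reply.own_action == offer.partner_action  # partner promises what was asked
            and reply.partner_action == offer.own_action  # partner asks what was promised
        ):
            enforced[offer.proposer] = offer.own_action
    return enforced


def resolve(chosen: Dict[int, Hashable], offers: List[Offer]) -> Dict[int, Hashable]:
    """Binding contracts override whatever the signatories would otherwise do."""
    return {**chosen, **binding_actions(offers)}


offers = [
    Offer(proposer=0, partner=1, partner_action="gift to 0", own_action="gift to 1"),
    Offer(proposer=1, partner=0, partner_action="gift to 1", own_action="gift to 0"),
    Offer(proposer=2, partner=0, partner_action="gift to 2", own_action="gift to 0"),
]
chosen = {0: "discard", 1: "discard", 2: "discard"}
final = resolve(chosen, offers)
print(final)  # players 0 and 1 are held to their gifts; player 2 keeps "discard"
```

Because enforcement happens in the environment rather than in the agents, a signatory cannot renege once the offers match, which removes the temptation that otherwise makes gifting risky.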
The team reports that once agents were able to sign binding contracts, chips flowed freely in the “gifting game.” By contrast, without contracts and the benefits of the mutual trust they conferred, there wasn’t any chip exchange.
“Our model suggests several avenues for further work,” wrote the coauthors. “Most obviously, we might consider contracts in an environment with a larger state space … More generally, it would be fascinating to discover how a system of contracts might emerge and persist within multi-agent learning dynamics without directly imposing mechanisms for enforcement. Such a pursuit may eventually lead to a valuable feedback loop from AI to sociology and economics.”