Agent-based models (ABMs) are often used to investigate how decisions made by individuals within a system lead to systemic outcomes that might not be obvious from knowing those micro decisions. DeepMind (a Google-owned company) has published a paper on using machine learning techniques to study how agents in Prisoner’s Dilemma-style games learn whether to cooperate with or exploit each other. In the paper’s conclusion DeepMind joins a chorus of researchers who propose the use of agent-based modelling to assess how changes in regulations will affect behaviour, including testing for unintended consequences of policy. From an asset owner’s perspective this has potential application to the design of defined contribution (DC) retirement arrangements, particularly the ability to model decision-making in response to choice with incomplete information, and how this might lock people into different decision paths. If one has a paternalistic perspective, designing such systems to have a “least harm” bias makes sense. Developing the tools to test whether a system encourages harmful behaviour would seem a necessary part of that process.
In its paper (Multi-agent Reinforcement Learning in Sequential Social Dilemmas), DeepMind describes applying its experience of using neural networks for decision-making to repeated play of Prisoner’s Dilemma-style games. The results showed cooperation (playing so that both players benefit) and defection (playing for individual benefit at the expense of the other player) emerging spontaneously in each game. Whether players learnt to cooperate or defect depended not only on the game being played but also on the “cognitive ability” of the player.
As noted in the paper, from a structural perspective the single-play version of each game in the paper is identical to the Prisoner’s Dilemma. However, incorporating repeated play and learning by the players introduces temporal and path dependence to the strategies employed by the players and to the behaviours and outcomes that result. The paper notes that a number of real-world dilemmas that could be considered single-play Prisoner’s Dilemma-style games are actually better thought of as repeated, sequential games of the type modelled in the paper. Real-world problems given as examples are the extraction of renewable vs non-renewable resources and the emergence of social behaviour patterns from experience of sustainable vs unsustainable social behaviours.
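The difference between single play and repeated play can be illustrated with a minimal sketch. It is not the paper’s method: it uses two fixed, hand-written strategies rather than learning agents, and the payoff values (5, 3, 1, 0) are the conventional textbook numbers, assumed here purely for illustration. The function names are likewise hypothetical.

```python
# Illustrative Prisoner's Dilemma payoffs as (player A, player B).
# The values 5 > 3 > 1 > 0 are an assumption, not taken from the paper.
C, D = "C", "D"  # cooperate, defect
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def best_response(opponent_move):
    """Single play: the move that maximises one's own payoff."""
    return max((C, D), key=lambda move: PAYOFFS[(move, opponent_move)][0])


def always_defect(own_history, opp_history):
    """Fixed strategy: defect every round."""
    return D


def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else C


def play(strategy_a, strategy_b, rounds=10):
    """Repeated play: each move can depend on the history so far."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

In single play, `best_response` returns defect whatever the other player does. Over ten rounds, however, two tit-for-tat players score 30 each while two permanent defectors score 10 each, so the outcome depends on the path the players take rather than on the one-shot logic alone.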
While many in the investment industry aim to exploit machine learning for its potential to assist in security selection, portfolio management or trading, this paper from DeepMind shows that these advances could also be used to model financial decision-making and the impact of policy in more realistic simulations. Possible applications of such modelling include insight into how market strategies might evolve, or into the unintended consequences of different regulations on the financial industry.
The DeepMind paper also shows how successful players pursued strategies that they were able to execute, even if there were theoretically better strategies available. In an investment context this resonates with the concept of asset owners selecting an investment strategy that their governance allows them to execute effectively in preference to a theoretically “better” strategy which can’t be executed successfully. This suggests that, for an asset owner in a competitive environment, the appropriate approach is to understand one’s governance and build a strategy that can be executed within that governance capability (or to improve the governance capability to match the desired strategy). Making best use of a finite supply of governance capability requires a full exploration of beliefs and objectives in order to identify the strategies where successful execution is most likely. This is particularly important in harder areas such as sustainability.