

In 2016, we introduced AlphaGo, the first artificial intelligence (AI) program to defeat humans at the ancient game of Go. Two years later, its successor, AlphaZero, learned from scratch to master Go, chess and shogi. Now, in a paper in the journal Nature, we describe MuZero, a significant step forward in the pursuit of general-purpose algorithms. MuZero masters Go, chess, shogi and Atari without needing to be told the rules, thanks to its ability to plan winning strategies in unknown environments.

For many years, researchers have sought methods that can both learn a model that explains their environment and then use that model to plan the best course of action. Until now, most approaches have struggled to plan effectively in domains such as Atari, where the rules or dynamics are typically unknown and complex.

MuZero, first introduced in a preliminary paper in 2019, solves this problem by learning a model that focuses only on the aspects of the environment that matter most for planning. By combining this model with AlphaZero's powerful lookahead tree search, MuZero set a new state of the art on the Atari benchmark while simultaneously matching the performance of AlphaZero in the classic planning challenges of Go, chess and shogi. In doing so, MuZero demonstrates a significant leap forward in the capabilities of reinforcement learning algorithms.
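Concretely, the MuZero paper factors this learned model into three functions: a representation function that encodes an observation into a hidden state, a dynamics function that steps that hidden state forward given an action and predicts the immediate reward, and a prediction function that outputs a policy and a value. Here is a minimal sketch of that structure; the class name, numpy stand-ins and shapes are illustrative toys, not DeepMind's networks:

```python
# A minimal sketch of MuZero's three learned functions (representation,
# dynamics, prediction). Only the structure follows the paper; every
# implementation detail below is an invented stand-in.
import numpy as np

class ToyMuZeroModel:
    def representation(self, observation):
        # h: encode the raw observation into an abstract hidden state.
        return np.tanh(observation)

    def dynamics(self, hidden_state, action):
        # g: from a hidden state and an action, predict the next hidden
        # state and the immediate reward. No pixels are reconstructed.
        next_state = np.tanh(hidden_state + action)
        reward = float(next_state.mean())
        return next_state, reward

    def prediction(self, hidden_state):
        # f: predict a policy and a value to guide the tree search.
        logits = hidden_state[:2]
        policy = np.exp(logits) / np.exp(logits).sum()
        value = float(hidden_state.sum())
        return policy, value

def evaluate_plan(model, observation, actions):
    """Unroll the learned model along one candidate action sequence,
    the basic operation the lookahead tree search repeats many times."""
    state = model.representation(observation)
    total_reward = 0.0
    for action in actions:
        state, reward = model.dynamics(state, action)
        total_reward += reward
    _, value = model.prediction(state)
    return total_reward + value  # estimated return of this plan

model = ToyMuZeroModel()
print(evaluate_plan(model, np.array([0.1, -0.4, 0.3]), actions=[1, 0, 1]))
```

The design choice worth noting is that the hidden state is never asked to reproduce the observation; it only has to support accurate predictions of the policy, value and reward, the quantities that actually matter for choosing an action.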

The ability to plan is an important part of human intelligence, allowing us to solve problems and make decisions about the future. For example, if we see dark clouds forming, we might predict it will rain and decide to take an umbrella with us before we venture out. Humans learn this ability quickly and can generalise to new scenarios, a trait we would also like our algorithms to have.

Researchers have tried to tackle this major challenge in AI by using two main approaches: lookahead search or model-based planning.

Systems that use lookahead search, such as AlphaZero, have achieved remarkable success in classic games such as checkers, chess and poker, but they rely on being given knowledge of their environment's dynamics, such as the rules of the game or an accurate simulator. This makes it difficult to apply them to messy real-world problems, which are typically complex and hard to distill into simple rules.

Model-based systems aim to address this issue by learning an accurate model of an environment's dynamics and then using it to plan. However, the complexity of modelling every aspect of an environment means these algorithms have been unable to compete in visually rich domains such as Atari. Until now, the best results on Atari have come from model-free systems, such as DQN, R2D2 and Agent57. As the name suggests, model-free algorithms do not use a learned model and instead estimate the best action to take next directly from experience (the sketch below shows this idea in its simplest form).

MuZero uses a different approach to overcome these limitations. Instead of trying to model the entire environment, MuZero models only the aspects that are important to the agent's decision-making process.
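For contrast, the model-free idea mentioned above fits in a few lines. Below is a minimal tabular Q-learning sketch on an invented two-state task; the environment, hyperparameters and scale are toys for illustration, while DQN, R2D2 and Agent57 build on the same principle with deep networks and many further refinements:

```python
# A minimal model-free sketch: tabular Q-learning on an invented toy task.
# The agent never learns a model of step(); it only estimates action
# values directly from sampled experience.
import random

N_STATES, N_ACTIONS = 2, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters

def step(state, action):
    # Hidden toy dynamics: action 1 usually leads to the rewarding
    # state 1. The agent never sees this function, only its samples.
    next_state = action if random.random() < 0.8 else 1 - action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
state = 0
for _ in range(5000):
    # Epsilon-greedy: mostly take the action with the highest estimate.
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: move the estimate toward the observed reward
    # plus the discounted value of the best next action.
    q[state][action] += ALPHA * (reward + GAMMA * max(q[next_state]) - q[state][action])
    state = next_state

print(q)  # action 1 should end up valued higher in both states
```

Nothing in this loop predicts what the environment will do next; that simplicity is the appeal of model-free methods, and the ability to look ahead is exactly what MuZero's compact learned model adds back.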
