This project takes a detailed look at the games that have been used as AI benchmarks and grand challenges over the past few decades, and what we've learned from them. The story is sometimes told as one of consistent progress, with more and more games falling to AI victories: chess was conquered in 1997 by Deep Blue, Go in 2016 by AlphaGo, Atari games in the period 2015–2020 using Deep Q-Networks, and so on.
And indeed, a number of impressive techniques have been developed in the process. But we'd like to go back and analyze these results in more detail. What kinds of challenges, specifically, do these games pose, and how do those challenges relate to challenges found in non-game decision-making problems? How general-purpose are the algorithms that have been successful at playing games? We'd like both qualitative and quantitative answers to those questions.
A few questions (of many): What do the games' state spaces look like, and what features of each game pose the most difficulty? What does the gameplay look like qualitatively if you watch it? Do the challenging aspects change when different algorithms are used to play the same game, or are the challenges more a feature of the games themselves? In a multi-game benchmark set like Atari, can we identify games that are good proxies for different kinds of decision-making? And how robust are any of these findings to changes in rules, problem size, computational resources, and algorithm hyperparameters?
Funding provided by:
Collaborators: David Dunleavy, Amy K. Hoover