What does AI's success at game playing tell us?

This project takes a detailed look at the games that have been used as AI benchmarks and grand challenges over the past few decades, and what we've learned from them. There is sometimes a narrative of consistent progress as more and more games fall to AI: chess was conquered in 1997 by Deep Blue, Go in 2016 by AlphaGo, Atari games from 2015 to 2020 using Deep Q-Networks, and so on.

And indeed, a number of impressive techniques have been developed in the process. But we'd like to go back and analyze these results in more detail. What kinds of challenges, specifically, do these games pose, and how do those challenges relate to challenges found in non-game decision-making problems? How general-purpose are the algorithms that have been successful at playing games? We'd like both qualitative and quantitative answers to those questions.

A few questions (of many): What do the games' state spaces look like, and what features of each game pose the most difficulty? What does the gameplay look like qualitatively if you watch it? Do the challenging aspects change when different algorithms are used to play the same game, or are challenges more of a feature of the games themselves? In a multi-game benchmark set like Atari, can we identify games that are good proxies for different kinds of decision-making? And how robust are any of these findings to changes in rules, problem size, computational resources, and algorithm hyperparameters?

Publications:

Estimates for the branching factors of Atari games, CoG 2021
Investigating vanilla MCTS scaling on the GVG-AI game corpus, CIG 2016

Blog posts:

DeepMind AlphaStar (Starcraft II bot) roundup (2019)
How difficult is the GVG-AI competition? (2016)

Funding provided by:

The National Science Foundation, under the Robust Intelligence program of the Division of Information & Intelligent Systems (IIS), Grant No. 1948017
American University, through a College of Arts & Sciences faculty startup grant

Current collaborators: Amy K. Hoover, Javad Rajabi

Past collaborators: Britton Horn, David Dunleavy

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.