Like many AI researchers, I've been intrigued by recent advances in large language models (LLMs), and generative AI more broadly. At the same time, I'm concerned about their reliability, robustness, and ease of control. This cluster of projects looks into what we can do about that.
The strategy I've been pursuing (with many collaborators) is to put the LLM inside a larger AI system. One idea I picked up from game-design research, and now apply when thinking about AI, is to ask: what's the "core loop"? In reinforcement learning it's act–observe–update. In chatbot-style AI it's prompt–generate–prompt–generate.
What's a good core AI loop? In my opinion, a lot of "agentic AI" work as of 2025 is too ad hoc on that question, and relies too much on the LLM itself as the top-level AI system. I think we can do better if we start with a classical AI algorithm – evolution, planning, or even symbolic inference – as the core loop, then look for the brittle parts that can be judiciously LLM-ified. The hope is that this combines some of the strengths of each paradigm.
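To make the idea concrete, here is a minimal sketch (not code from any of these projects): a classical evolutionary loop in which only the variation operator is delegated to an LLM. The `llm_mutate` function is a hypothetical stand-in, stubbed with a random edit so the example runs without any API access; the selection step stays classical and fully inspectable.

```python
import random

def llm_mutate(candidate: str) -> str:
    """Hypothetical LLM call: ask a language model to propose a small
    variation of `candidate`. Stubbed here with a trivial random edit
    so the sketch runs without an API."""
    return candidate + random.choice("abcdefghijklmnopqrstuvwxyz ")

def fitness(candidate: str, target: str = "hello world") -> int:
    """Toy fitness: number of positions matching a target string."""
    return sum(a == b for a, b in zip(candidate, target))

def evolve(pop_size: int = 16, generations: int = 50) -> str:
    """Classical evolutionary core loop; only mutation is LLM-ified."""
    population = ["" for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half (classical, transparent).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Variation: the brittle, hand-designed mutation operator is
        # the part replaced by an LLM call.
        children = [llm_mutate(random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    print(evolve())
```

The point of the structure is that the outer loop's guarantees and bookkeeping come from the classical algorithm, while the LLM is confined to a role where its fallibility is tolerable.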
Separately, I've grown concerned about the reproducibility of published results that use closed-weight LLMs like ChatGPT, so I've been doing some work on that too.
Publications:
Funding provided by:
Collaborators (current): Adam Gaier, Amy K. Hoover, Ioannis Koutis, Joel Lehman, Elliot Meyerson, Arash Moradi Karkaj, Ben Samuel, Mike Treanor