Targeting specific distributions of trajectories in MDPs

David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas, Michael L. Littman (2006). Targeting specific distributions of trajectories in MDPs. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1213–1218.

Abstract

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.

