Targeting specific distributions of trajectories in MDPs

“Targeting specific distributions of trajectories in MDPs” by David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas, and Michael L. Littman. In Proceedings of the 21st National Conference on Artificial Intelligence, 2006, pp. 1213-1218.

Abstract

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
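The locally-consistent policy computation the abstract describes can be illustrated on a small trajectory tree. In the simplified case where each action deterministically leads to one child, the local probability of moving to a child is just the target probability mass of that child's subtree divided by the mass at the current node; following these local probabilities then reproduces the target distribution over complete trajectories exactly. The sketch below is illustrative only — the tree, the `target` distribution, and all names are hypothetical, not taken from the paper:

```python
# Minimal sketch: local policy computation on a trajectory tree,
# assuming each action deterministically reaches one child node.
# Tree structure and target distribution are made-up examples.

tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": []}
# Target distribution over complete trajectories (leaves of the tree).
target = {"A1": 0.25, "A2": 0.25, "B": 0.5}

def subtree_mass(node):
    """Total target probability of trajectories passing through this node."""
    kids = tree.get(node, [])
    if not kids:
        return target[node]
    return sum(subtree_mass(k) for k in kids)

def local_policy(node):
    """Normalize children's subtree masses into local transition probabilities."""
    masses = {k: subtree_mass(k) for k in tree[node]}
    total = sum(masses.values())
    return {k: m / total for k, m in masses.items()}

policy = {n: local_policy(n) for n in tree if tree[n]}
# Multiplying local probabilities along root -> A -> A1 gives
# 0.5 * 0.5 = 0.25, matching target["A1"].
```

When actions have stochastic outcomes, exact matching is generally impossible, which is where the paper's conditions for exact solvability and its heuristic algorithm come in.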

BibTeX entry:

@inproceedings{TTDMDP:AAAI06,
   author = {David L. Roberts and Mark J. Nelson and Charles L. Isbell and
	Michael Mateas and Michael L. Littman},
   title = {Targeting specific distributions of trajectories in {MDPs}},
   booktitle = {Proceedings of the 21st National Conference on Artificial
	Intelligence},
   pages = {1213--1218},
   year = {2006}
}
