A globally optimal algorithm for TTD-MDPs


“A globally optimal algorithm for TTD-MDPs” by Sooraj Bhat, David L. Roberts, Mark J. Nelson, Charles L. Isbell Jr., and Michael Mateas. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 2007, pp. 1196-1203.

Abstract

In this paper, we discuss the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs) — a variant of MDPs in which the goal is to realize a specified distribution of trajectories through a state space — as a general agent-coordination framework.

We present several advances to previous work on TTD-MDPs. We improve on the existing algorithm for solving TTD-MDPs by deriving a greedy algorithm that finds a policy that provably minimizes the global KL-divergence from the target distribution. We test the new algorithm by applying TTD-MDPs to drama management, where a system must coordinate the behavior of many agents to ensure that a game follows a coherent storyline, is in keeping with the author's desires, and offers a high degree of replayability.
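To make the objective concrete: the paper's algorithm minimizes the KL-divergence between the trajectory distribution induced by a policy and the author-specified target distribution. The sketch below (not the paper's algorithm, just an illustration of the measure being optimized) computes KL(target ‖ induced) over a toy set of story trajectories; the trajectory names and probabilities are invented for the example.

```python
import math

def kl_divergence(target, induced):
    """KL(target || induced) over a shared set of trajectories.

    Both arguments map trajectory identifiers to probabilities.
    Trajectories with zero target probability contribute nothing;
    a trajectory the induced distribution cannot reach yields infinity.
    """
    total = 0.0
    for traj, p in target.items():
        if p > 0:
            q = induced.get(traj, 0.0)
            if q == 0.0:
                return float("inf")
            total += p * math.log(p / q)
    return total

# Toy example: three story trajectories with an authored target distribution.
target = {"t1": 0.5, "t2": 0.3, "t3": 0.2}
induced_a = {"t1": 0.5, "t2": 0.3, "t3": 0.2}  # matches the target exactly
induced_b = {"t1": 0.8, "t2": 0.1, "t3": 0.1}  # skewed toward one storyline

print(kl_divergence(target, induced_a))  # 0.0
print(kl_divergence(target, induced_b))  # positive: the skew is penalized
```

A policy that realizes the target distribution exactly achieves zero divergence; any skew toward particular storylines (hurting replayability) shows up as a positive penalty, which is the global quantity the paper's greedy algorithm provably minimizes.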

Although we show that suboptimal greedy strategies will fail in some cases, we validate previous work that suggests that they can work well in practice. We also show that our new algorithm provides guaranteed accuracy even in those cases, with little additional computational cost. Further, we illustrate how this new approach can be applied online, eliminating the memory-intensive offline sampling necessary in the previous approach.

BibTeX entry:

@inproceedings{TTDMDP:AAMAS07,
   author = {Sooraj Bhat and David L. Roberts and Mark J. Nelson and
	Isbell Jr., Charles L. and Michael Mateas},
   title = {A globally optimal algorithm for {TTD}-{MDPs}},
   booktitle = {Proceedings of the 6th International Joint Conference on
	Autonomous Agents and Multiagent Systems},
   pages = {1196--1203},
   year = {2007}
}
