Monte Carlo Tree Search in continuous spaces using Voronoi optimistic optimization with regret bounds

AAAI Oral (2020)

Abstract

Many important applications, including robotics, data-center management, and process control, require planning action sequences in domains with continuous state and action spaces and discontinuous objective functions. Monte Carlo tree search (MCTS) is an effective strategy for planning in discrete action spaces. We provide a novel MCTS-like algorithm (VOOT) for deterministic environments with continuous state-action spaces, which, in turn, is based on a novel black-box function-optimization algorithm (VOO) to efficiently sample actions. The VOO algorithm uses Voronoi partitioning to guide sampling and is particularly efficient in high-dimensional spaces. The VOOT algorithm has an instance of VOO at each node in the tree. We provide regret bounds for both algorithms and demonstrate their empirical effectiveness in several high-dimensional problems, including two difficult robotics planning problems.
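As a rough illustration of what "Voronoi partitioning to guide sampling" can mean, the following is a minimal sketch and not the authors' implementation. It assumes a box-bounded search space, an exploration probability (`explore_prob`) that chooses between uniform sampling and sampling inside the Voronoi cell of the best point found so far, and plain rejection sampling to draw from that cell. The function name `voo_maximize` and all parameter names are hypothetical and do not come from the abstract.

```python
import random


def voo_maximize(f, bounds, n_iters=100, explore_prob=0.3, max_tries=200, seed=0):
    """Sketch of Voronoi-guided black-box maximization (assumed details).

    f       : objective taking a list of floats and returning a float
    bounds  : list of (low, high) pairs, one per dimension
    Returns : (best_point, best_value) among the evaluated points
    """
    rng = random.Random(seed)

    def sample_uniform():
        # Draw one point uniformly from the box defined by `bounds`.
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def sq_dist(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    points, values = [], []
    for _ in range(n_iters):
        if not points or rng.random() < explore_prob:
            # Exploration: sample anywhere in the search space.
            x = sample_uniform()
        else:
            # Exploitation: sample inside the Voronoi cell of the best point,
            # i.e. accept a candidate only if the best point is its nearest
            # neighbour among all evaluated points.
            best_idx = max(range(len(points)), key=values.__getitem__)
            best = points[best_idx]
            x = None
            for _ in range(max_tries):
                cand = sample_uniform()
                d_best = sq_dist(cand, best)
                if all(d_best <= sq_dist(cand, p) for p in points):
                    x = cand
                    break
            if x is None:
                # Assumed fallback when the cell is too small to hit by rejection.
                x = sample_uniform()
        points.append(x)
        values.append(f(x))

    best_idx = max(range(len(points)), key=values.__getitem__)
    return points[best_idx], values[best_idx]


def toy_objective(x):
    # Hypothetical test function with its maximum (0.0) at (0.3, -0.2).
    return -((x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2)


toy_bounds = [(-1.0, 1.0), (-1.0, 1.0)]
best_point, best_value = voo_maximize(toy_objective, toy_bounds)
```

The point of this design is that the Voronoi cells are never constructed explicitly: membership in the best point's cell is checked with nearest-neighbour distance comparisons only. That is one plausible reason such a scheme could stay cheap as the dimension grows, although uniform rejection sampling as written here becomes slow once the best cell is small.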

Authors

Beomjoon Kim (MIT), Kyungjae Lee (Seoul National University), Sungbin Lim (Kakao Brain), Leslie Kaelbling (MIT), Tomas Lozano-Perez (MIT)

Keywords

Robotics, Optimization

Publication date

2020.02.07