Name: Eluder Dimension and the Sample Complexity of Optimistic Exploration
Start: 2013-12-07T16:40:00-0800
End: 2013-12-07T17:00:00-0800

Back To Schedule

Eluder Dimension and the Sample Complexity of Optimistic Exploration

This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper confidence bound (UCB) algorithms, but recent work has shown that a simple posterior sampling algorithm, sometimes called Thompson sampling, also shares a close theoretical connection with optimistic approaches. In this paper, we develop a regret bound that holds for both classes of algorithms. This bound applies broadly and can be specialized to many model classes. It depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models.
None

Speakers

Benjamin Van Roy

Dan Russo

Saturday December 7, 2013 4:40pm - 5:00pm PST
Harvey's Convention Center Floor, CC

Orals

posterid Sat48
location Poster# Sat48

NIPS 2013

Benjamin Van Roy

Dan Russo

Attendees (0)

NIPS 2013

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Benjamin Van Roy

Dan Russo

Attendees (0)