NOTE: BIS 526 students are required to attend in person (47 College St., Room 106A). All others are requested to attend via Zoom.

Speaker: Eric Laber, Ph.D.

Title: "Model-assisted reinforcement learning"


In reinforcement learning (RL), methods for estimating an optimal policy are broadly categorized as model-based or model-free. Model-based RL seeks to learn a complete model for the system under study, i.e., the complete system dynamics. Given an estimated model of the system, one can derive the optimal policy using Monte Carlo methods or, in some specialized cases, direct calculation. When there is strong domain knowledge to inform a high-quality and parsimonious system dynamics model, model-based RL can be extremely effective, especially when data are scarce. However, if the posited models are misspecified, the resulting estimated policy can perform poorly. Furthermore, model diagnostics for Markov decision processes and other commonly used models in RL are under-developed, making it difficult to identify and correct a misspecified model. Model-free methods do not require a model of the underlying system dynamics, making them potentially more robust to misspecification. However, because they impose less structure on the data-generating model, they tend to have higher variance and are less suitable when data are scarce. The strengths and weaknesses of model-based and model-free RL are complementary, and it is natural to combine them to obtain the strengths of both and the weaknesses of neither. We propose a doubly-robust estimator for an optimal decision strategy that is: (i) consistent if either the posited dynamics model is correctly specified or the estimating equation underpinning a model-free method is unbiased; and (ii) asymptotically efficient if the dynamics model is correctly specified and the estimating equation is unbiased. In simulation experiments, we show that the doubly-robust estimator compares favorably with state-of-the-art model-free and model-based methods in terms of expected cumulative utility. (Joint work with Owen Leete.)
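The double-robustness property described in the abstract can be illustrated in a toy single-step (contextual-bandit) setting. The sketch below is not the speaker's estimator; it is a minimal, standard doubly-robust off-policy value estimate, with a made-up reward model, that shows how the importance-weighted residual correction keeps the estimate consistent even when the posited model is badly misspecified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-step setting (illustrative, not from the talk):
# scalar state s ~ Uniform(0, 1), binary action, true mean
# reward r(s, a) = s + a.
n = 10_000
s = rng.uniform(size=n)
b_prob = 0.5                       # behavior policy: a = 1 w.p. 0.5
a = rng.binomial(1, b_prob, n)
r = s + a + rng.normal(scale=0.1, size=n)

def pi(s):
    """Target policy: always take action 1."""
    return np.ones_like(s, dtype=int)

def dr_value(q_hat):
    """Doubly-robust estimate of the target policy's value:
    a model term Q_hat(s, pi(s)) plus an importance-weighted
    residual correction. Consistent if either Q_hat or the
    behavior probabilities are correct."""
    rho = (a == pi(s)) / b_prob    # importance weight
    return np.mean(q_hat(s, pi(s)) + rho * (r - q_hat(s, a)))

good_q = lambda s, a: s + a               # correctly specified model
bad_q = lambda s, a: np.zeros_like(s)     # badly misspecified model

# True value of the target policy is E[s] + 1 = 1.5; both
# estimates land near it, since the behavior policy is known.
print(dr_value(good_q))
print(dr_value(bad_q))
```

With the correct model the correction term has mean zero and low variance; with the zero model the importance-weighted term alone recovers the value, at the cost of higher variance, mirroring the consistency trade-off stated in the abstract.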


  • Duke University


Host Organizations

Lectures and Seminars