Chapter 5: Monte Carlo Methods

June 8, 2026

Monte Carlo methods

-require only experience, not complete knowledge of the environment
-solve RL problems by averaging sampled returns
-are defined here only for episodic tasks, where every episode eventually terminates
-extend DP ideas like policy evaluation, policy improvement, and GPI using sample experience
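
A minimal sketch of "averaging sampled returns", assuming each episode is recorded as a plain list of rewards and that returns are discounted by a factor gamma. The names discounted_return and average_return and the sample data are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma=1.0):
    """Return G = R_1 + gamma*R_2 + gamma^2*R_3 + ... for one finished episode."""
    g = 0.0
    # Scan backward so each step is a single multiply-add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def average_return(episodes, gamma=1.0):
    """Monte Carlo estimate: the mean of the sampled returns."""
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


# Illustrative data (an assumption): three terminated episodes from one start state.
episodes = [[0, 0, 1], [0, 1], [0, 0, 0, 0]]
estimate = average_return(episodes, gamma=1.0)
```

The estimate uses only completed episodes, which is why termination matters: a return is not available until the episode ends.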

Monte Carlo Prediction

first-visit MC - estimate $v_{\pi}(s)$ as the average of returns following the first visit to $s$ in each episode

every-visit MC - estimate $v_{\pi}(s)$ by averaging returns after every visit to $s$
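
The two estimators differ only in which visits contribute a return. The sketch below assumes each episode is a list of (state, reward) pairs generated by following the policy, where the reward is the one received on leaving that state. The function name mc_prediction and the first_visit flag are hypothetical.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate state values by averaging returns observed after visits to each state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Compute the return G_t for every time step by scanning backward.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        seen = set()
        for t, (state, _) in enumerate(episode):
            # First-visit MC skips every visit to a state after the first one.
            if first_visit and state in seen:
                continue
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Illustrative episode (an assumption): state "A" is visited twice.
episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
v_first = mc_prediction([episode], first_visit=True)
v_every = mc_prediction([episode], first_visit=False)
```

With gamma = 1, the returns at the three time steps are 3, 2, and 2. First-visit MC records only the return 3 for "A", while every-visit MC averages 3 and 2 to get 2.5. Both give 2 for "B".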