Chapter 5: Monte Carlo Methods

June 8, 2026

Monte Carlo methods

-require only experience (sample sequences of states, actions, and rewards), not complete knowledge of the environment's dynamics

-solve RL problems by averaging sampled returns

-are defined here only for episodic tasks: every episode must terminate so that its return is well defined

-extend DP ideas like policy evaluation, policy improvement, and GPI using sample experience
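
To make "averaging sampled returns" concrete, here is one common way to write it. The notation is an assumption, not taken from the notes above: $\gamma$ is the discount factor, $T$ is the terminal time step of an episode, $N(s)$ is the number of recorded visits to $s$, and $G^{(i)}(s)$ is the return that followed the $i$-th recorded visit.

$$G_t = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{T-t-1} R_T = \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}$$

$$V(s) = \frac{1}{N(s)} \sum_{i=1}^{N(s)} G^{(i)}(s) \approx v_{\pi}(s)$$

Because each episode terminates, $T$ is finite and every $G_t$ can be computed once the episode ends.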

first-visit MC - estimate $v_{\pi}(s)$ as the average of returns following the first visit to $s$ in each episode

every-visit MC - estimate $v_{\pi}(s)$ by averaging returns after every visit to $s$
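
The two estimators above differ only in which visits contribute a return. Below is a minimal sketch, not a definitive implementation. It assumes each episode is a list of `(state, reward)` pairs, where `reward` is the reward received after leaving that state. The function name `mc_prediction`, the `first_visit` flag, and the toy episodes are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate state values by averaging sampled returns.

    episodes: list of episodes, each a list of (state, reward) pairs,
              where reward is the reward received after leaving state.
    first_visit: if True, only the first visit to a state in each episode
                 contributes a return; otherwise every visit does.
    """
    returns_sum = defaultdict(float)  # sum of returns recorded per state
    visit_count = defaultdict(int)    # number of returns recorded per state

    for episode in episodes:
        # Time step of the first occurrence of each state in this episode.
        first_index = {}
        for t, (state, _) in enumerate(episode):
            first_index.setdefault(state, t)

        # Walk backwards so the return G can be accumulated incrementally.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit and first_index[state] != t:
                continue  # skip later visits under first-visit MC
            returns_sum[state] += G
            visit_count[state] += 1

    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}


# Toy data (made up): state "A" is visited twice in the first episode.
episodes = [
    [("A", 1.0), ("B", 0.0), ("A", 2.0)],
    [("B", 1.0)],
]
v_first = mc_prediction(episodes, gamma=1.0, first_visit=True)
v_every = mc_prediction(episodes, gamma=1.0, first_visit=False)
print(v_first)
print(v_every)
```

In the toy data, the estimates for "A" differ between the two variants because "A" occurs twice in the first episode. "B" is visited at most once per episode, so both variants give it the same estimate.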