Reinforcement learning

Algorithms

https://csc413-2020.github.io/assets/slides/lec10.pdf

Repeat forever:

Notes:

• On-policy (roughly: it can only learn from data generated by the current policy).
• Works with large and continuous action spaces.
• Works with stochastic policies.
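
The properties listed above (on-policy, stochastic policy) match policy-gradient methods such as REINFORCE. Assuming that is the kind of algorithm meant here, the following is a minimal sketch on a toy multi-armed bandit, not the method from the slides. The names `softmax` and `reinforce_bandit`, the reward model, and all hyperparameters are hypothetical choices made for illustration, and the "repeat forever" loop is cut off after a fixed number of steps.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style updates on a bandit (assumed toy setting).

    The policy is a softmax over one logit per action. Each step samples
    an action from the *current* policy (on-policy), observes a noisy
    reward, and moves the logits along reward * grad log pi(action).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(reward_means)
    for _ in range(steps):  # stands in for "repeat forever"
        probs = softmax(logits)
        # Stochastic policy: sample rather than take the argmax.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = reward_means[action] + rng.gauss(0.0, 0.1)
        # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - probs[k].
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * reward * grad_log_prob
    return softmax(logits)


if __name__ == "__main__":
    # Arm 1 pays more on average, so its probability should grow.
    print(reinforce_bandit([0.0, 1.0]))
```

Because every update uses an action sampled from the current policy, old samples cannot be reused once the logits change, which is the on-policy restriction noted above.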

Q-learning

https://csc413-2020.github.io/assets/slides/lec11.pdf

Train a neural network representing your $Q(s_t, a_t)$ function.
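• Don't always take the best action during training; explore instead (e.g. with an $\epsilon$-greedy policy, which takes a random action with probability $\epsilon$ and the best-known action otherwise).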
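
A minimal sketch of the two ingredients above: $\epsilon$-greedy action selection and a Q-learning update. To stay self-contained it uses a lookup table in place of the neural network, so it is tabular Q-learning rather than the deep variant. The function names, the learning rate `lr`, and the discount `gamma` are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_learning_update(q, state, action, reward, next_state, done,
                      lr=0.5, gamma=0.9):
    """Move Q(s, a) toward the target r + gamma * max_a' Q(s', a').

    `q` is a list of per-state lists of action values (a table standing
    in for the network). Terminal transitions use the reward alone.
    """
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += lr * (target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [1.0, 2.0]]  # 2 states x 2 actions
    a = epsilon_greedy(q[0], epsilon=0.1, rng=rng)
    q_learning_update(q, state=0, action=a, reward=1.0, next_state=1, done=False)
    print(q)
```

With a neural network, the same target would typically serve as the regression label for the network's output at $(s_t, a_t)$ instead of being written into a table.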