EP 042 • NOV 20
Scaling On-Policy RL for LLMs
We discuss the shift from supervised fine-tuning to reinforcement learning driven by feedback from the environment.
#RLHF
#PPO