Configure and run a full RL pipeline using the cookbook's RL abstractions with `RLDatasetBuilder`. In tutorials 05-06 you wrote RL loops manually. The cookbook also provides `rl.train.Config` + ...
Supervised fine-tuning teaches a model from example outputs. Reinforcement learning (RL) teaches from *rewards* -- the model generates its own outputs, and a reward function scores them. The model ...
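The reward-driven loop described above can be sketched with a toy policy. This is only an illustration under stated assumptions, not the cookbook's API: `reward_fn`, `softmax`, `train`, and the three candidate "outputs" are hypothetical names made up for this example, and the update rule is plain REINFORCE on a softmax over three logits rather than anything the cookbook is confirmed to use.

```python
import math
import random


def reward_fn(output: str) -> float:
    """Hypothetical reward: 1.0 if the output is the target answer, else 0.0."""
    return 1.0 if output == "4" else 0.0


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=200, lr=0.5, seed=0):
    """Toy RL loop: sample an output, score it, reinforce rewarded samples."""
    rng = random.Random(seed)
    candidates = ["3", "4", "5"]  # toy "outputs" the policy can generate
    logits = [0.0, 0.0, 0.0]      # toy policy parameters, one per candidate
    for _ in range(steps):
        probs = softmax(logits)
        # The policy generates its own output by sampling.
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        # The reward function scores that output.
        r = reward_fn(candidates[i])
        # REINFORCE: d log pi(i) / d logit_j = (1 if j == i else 0) - probs[j]
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return dict(zip(candidates, softmax(logits)))


if __name__ == "__main__":
    print(train())
```

No example outputs are supplied anywhere in this loop: the policy only ever sees the scalar reward for what it sampled, which is the contrast with supervised fine-tuning drawn above.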