brianfitzgerald/minRL

MinRL

Simple, clean, heavily commented implementation of various policy gradient algorithms applied to LLMs.

General principles of this codebase:

  • Heavily comment each line as an educational reference, à la nanoGPT.
  • Unit- and system-tested to allow for easy re-implementation.
  • Easy to follow control flow. No async inference for the time being.
  • Reasonably optimized: uses vLLM for inference, with the sleep API and LoRA support.
  • Easy to install and use, and easy to hack on to add new algorithms and tasks.
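To give a flavor of the kind of algorithm implemented here, below is a minimal sketch of the group-relative advantage computation at the heart of GRPO (one of the policy gradient variants this codebase covers). The function name and tensor shapes are illustrative, not taken from this repo's actual API.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages, GRPO-style (illustrative sketch).

    GRPO samples a group of completions per prompt and scores each one
    against its group, instead of learning a value-function baseline:
    subtract the group's mean reward and divide by the group's std.

    rewards: shape (num_prompts, group_size), one scalar reward per
             sampled completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)   # per-prompt group mean
    std = rewards.std(dim=-1, keepdim=True)     # per-prompt group std
    return (rewards - mean) / (std + eps)       # normalized advantage
```

The resulting advantages weight the per-token log-probabilities in the policy gradient loss; completions that beat their group average are reinforced, the rest are suppressed.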

Influences / references:

  • TRL GRPOTrainer
  • GRPO-Zero
  • LoRA Without Regret
  • prime-rl from Prime Intellect

Commands

To run training:

python train.py

Logs:

tensorboard --logdir runs

Modal:

uv run modal run -d modal_train.py::training
