Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Paper • 2505.20686 • Published • 2
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 10
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 5
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 5
Cornell-AGI
university
AI & ML interests
Reinforcement Learning from Human Feedback
Organization Card
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Paper • 2410.04612 • Published
Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
8B • Updated • 4
Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
8B • Updated • 3
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 23 • 2
Models (20)
Cornell-AGI/apo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 7
Cornell-AGI/ppo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 7
Cornell-AGI/rebel_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 7
Cornell-AGI/grpo_math_qwen2.5_3b
Text Generation • 3B • Updated • 7
Cornell-AGI/grpo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 8
Cornell-AGI/ppo_math_qwen2.5_3b
Text Generation • 3B • Updated • 11
Cornell-AGI/rebel_math_qwen2.5_3b
Text Generation • 3B • Updated • 7
Cornell-AGI/apo_math_qwen2.5_3b
Text Generation • 3B • Updated • 7
Cornell-AGI/grpo_math_qwen2.5_7b
Text Generation • 8B • Updated • 89
Cornell-AGI/ppo_math_qwen2.5_7b
Text Generation • 8B • Updated • 89
Datasets (15)
Cornell-AGI/math_size_qwen2.5_7b_eval
Viewer • Updated • 7.5k • 17
Cornell-AGI/math_size_qwen2.5_3b_eval
Viewer • Updated • 7.5k • 7
Cornell-AGI/math_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.5k • 38
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 5
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 5
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 10
Cornell-AGI/amazon_movie_tv_item_mxbai
Viewer • Updated • 10.5k • 10
Cornell-AGI/amazon_movie_tv_llama_mxbai
Viewer • Updated • 17.1k • 25
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2
Viewer • Updated • 116k • 14 • 1
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 23 • 2