DeepSeek-R1 is absolutely wild, and so are we! 🚀 Just 5 days after DeepSeek-R1's release, we've replicated its pure reinforcement learning approach on math reasoning: no reward models, no supervised fine-tuning, starting from a base model. The results are striking:

🧠 A 7B model + 8K MATH examples for verification + reinforcement learning = "aha moment"
🌟 Long chain-of-thought and self-reflection emerge naturally
🔥 Record math performance:
✅ 33.3% on AIME
✅ 62.5% on AMC
✅ 77.2% on MATH
📈 Outperforms Qwen2.5-Math-7B-Instruct and matches strong methods like PRIME and rStar-MATH, despite using over 50x less data and just the simple PPO algorithm!

We proudly open-source our complete training code and methodology to the research community. By sharing these resources, we hope our simple reinforcement learning recipe serves as an inspiration for future work on reinforcement learning.

🔬 Check out more details and our findings in our blog: https://lnkd.in/gfK9pgtt
🖥️ The training code can be found at: https://lnkd.in/gmwwfigt
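The "no reward model" part of a recipe like this usually comes down to a rule-based verifiable reward: check the model's final answer against the reference and assign a fixed score. Here is a minimal sketch of that idea; the function names, the `\boxed{...}` answer convention, and the +1/-1 reward values are illustrative assumptions, not the exact code from the linked repository.

```python
def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in a model response,
    tracking nested braces with a simple depth counter (illustrative helper)."""
    marker = r"\boxed{"
    idx = text.rfind(marker)
    if idx == -1:
        return None
    depth = 0
    start = idx + len(marker)
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            if depth == 0:
                return text[start:i].strip()
            depth -= 1
    return None  # unbalanced braces: treat as no answer


def rule_based_reward(response: str, gold_answer: str) -> float:
    """Binary verifiable reward for RL training: +1 if the final boxed
    answer matches the reference string, -1 otherwise. No learned reward
    model is involved; the scale is an assumption for this sketch."""
    pred = extract_boxed_answer(response)
    if pred is None:
        return -1.0
    return 1.0 if pred == gold_answer.strip() else -1.0
```

In a PPO loop, this scalar would be the episode-level reward for each sampled solution; real implementations typically add answer normalization (e.g. comparing `\frac{1}{2}` with `0.5`) on top of exact string matching.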
Interesting
Everyone can tune their own LLM.
The explanation and the code are very clear and easy to understand. Thanks
Awesome Qian Liu and congrats on your new job!