Alessandro Cappelli’s Post

Alessandro Cappelli

Cofounder & Research Scientist

📜 A new blog post explaining the fundamentals of RL for LLMs: the perfect place to build an intuitive understanding of PPO and RLHF from first principles. 👇👇

Pre-trained LLMs are often compared to Lovecraft's Shoggoth, a monstrous, shape-shifting beast. PPO is the algorithm that brought it under control. Take a look! 🔗 https://lnkd.in/eXqDvrUV
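As a quick reminder of the objective the post is about, here is the standard PPO clipped surrogate (Schulman et al., 2017) in generic notation; the blog's own derivation and notation may differ, and in RLHF a KL penalty toward a reference policy is typically added on top:

```latex
% Standard PPO clipped surrogate objective; \hat{A}_t is an advantage estimate,
% \epsilon the clipping range, and r_t(\theta) the policy probability ratio.
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} .
```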

Giorgio Angelotti

ML Consultant @ Vesuvius Challenge | PhD in AI | Physicist | Data Scientist

4mo

Sir, nice blog post. I have a silly question. RL is a framework for obtaining a (sub-)optimal policy for a sequential decision-making problem under uncertainty. Usually, after the agent takes an action, an environment changes the state of the system in a possibly stochastic way. In what you are describing, the next state of the system is dictated deterministically by the action (the next token) produced by the agent (the LLM). Therefore, wouldn't the problem be better formalized as a multi-armed bandit? Using the general RL formalism would make more sense if you were to train on full conversations rather than just prompt-answer pairs; in that case, the "environment" could be the human user, who changes the state stochastically with their prompts. Does this make sense to you?
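A minimal sketch of the distinction the comment draws, contrasting a token-level formulation with a contextual-bandit view; the names (TokenLevelMDP, ContextualBandit, reward_model) are hypothetical and chosen only for this illustration, not taken from the blog post:

```python
# Illustration of the two formalizations discussed in the comment above.
from dataclasses import dataclass
from typing import List, Tuple


def reward_model(tokens: List[str]) -> float:
    """Stand-in for a learned reward model: here it simply prefers short answers."""
    return -float(len(tokens))


@dataclass
class TokenLevelMDP:
    """Token-level view: each generated token is one action.

    The 'environment' transition is deterministic: the next state is just the
    current state (prompt + tokens so far) with the chosen token appended.
    Reward is sparse, given only once the whole answer is finished.
    """
    prompt: List[str]

    def reset(self) -> List[str]:
        return list(self.prompt)

    def step(self, state: List[str], token: str, done: bool) -> Tuple[List[str], float]:
        next_state = state + [token]                       # deterministic transition
        reward = reward_model(next_state) if done else 0.0  # terminal reward only
        return next_state, reward


@dataclass
class ContextualBandit:
    """Bandit view: the whole completion is a single action ('arm') chosen for a
    given context (the prompt); there are no intermediate states."""
    prompt: List[str]

    def pull_arm(self, completion: List[str]) -> float:
        return reward_model(self.prompt + completion)


if __name__ == "__main__":
    prompt = ["What", "is", "RL", "?"]
    completion = ["A", "framework", "."]

    # Token-level rollout: deterministic state growth, reward only at the end.
    mdp = TokenLevelMDP(prompt)
    state, total = mdp.reset(), 0.0
    for i, tok in enumerate(completion):
        state, r = mdp.step(state, tok, done=(i == len(completion) - 1))
        total += r
    print("token-level return:", total)

    # Bandit view: the same completion treated as one action.
    bandit = ContextualBandit(prompt)
    print("bandit reward:     ", bandit.pull_arm(completion))
```

In the token-level view the transition is trivially deterministic and all the uncertainty sits in the terminal reward, which is what motivates reading single prompt-answer training as a (contextual) bandit; multi-turn training on full conversations would reintroduce a genuinely stochastic environment, namely the user.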

