Reinforcement Learning for Large-Scale Recommendation Systems

Very excited to deliver a talk on using reinforcement learning to optimize user retention in large-scale recommendation systems with my colleague Gaurav Chakravorty at the QCon 2024 conference today (https://qconsf.com/). If you are attending QCon this week, let me know and I will be happy to connect. RL (or, more broadly, sequential decision making) has been very effective in many applications, the latest being aligning LLMs with human preferences. We will discuss how some RL concepts can be used to optimize a long-term reward (i.e. user retention) in a large-scale recommendation system. Full schedule: https://lnkd.in/guiaCarC
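For a flavor of what "optimizing long-term reward" means mechanically, here is a minimal, hypothetical sketch of a TD(0)-style value update driven by a retention signal. Everything here (the `RetentionAgent` name, the discrete state abstraction) is illustrative, not from the talk:

```python
# Hypothetical sketch: TD(0)-style value update where the "reward" is a
# session-level retention signal (1 if the user came back, else 0).

class RetentionAgent:
    def __init__(self, n_states, gamma=0.99, lr=0.1):
        self.v = [0.0] * n_states   # value estimate per (discretized) user state
        self.gamma = gamma          # discount: how much future retention counts
        self.lr = lr                # learning rate

    def update(self, state, retained, next_state):
        # TD target: immediate retention signal + discounted future value.
        # A large gamma is what makes the agent care about long-term retention,
        # not just the next click.
        target = float(retained) + self.gamma * self.v[next_state]
        self.v[state] += self.lr * (target - self.v[state])

agent = RetentionAgent(n_states=3)
agent.update(state=0, retained=1, next_state=1)
```

The key contrast with supervised ranking: the target bootstraps from the value of the *next* user state, so actions are credited for their downstream effect on retention.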
Saurabh Gupta’s Post
More Relevant Posts
-
Reinforcement Learning agents used to struggle to complete simple multi-step tasks. Without a basic understanding of the world, the only way to make progress was exhaustive trial and error. Kamyar Azizzadenesheli explains how LLMs changed everything. 🎧/🎥: https://bit.ly/3wajpqP
-
Multi-Agent Reinforcement Learning, 2024 edition, MIT Press: https://lnkd.in/dn7XqQHT
-
🚀 Excited to share our latest blog post on learning to watermark LLM-generated text using reinforcement learning. We explore the potential of embedding detectable signals into LLM-generated text to track misuse and offer a new approach to watermark design space. Our model-level watermark, embedding signals into LLM weights, demonstrates enhanced accuracy, robustness, and adaptability, even allowing for open-sourcing of the watermarked model. Discover more and explore our open-source code: [Read the full post here](https://bit.ly/3IIuCSl). #SocialMediaMarketing #ReinforcementLearning #LLM #Watermarking
-
“We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases.” Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks https://lnkd.in/gewg9T_h source: https://lnkd.in/g2JZvuKi
-
Hey all, in my previous blog I covered the basics of RL: model-based methods (planning), model-free methods (learning), and function approximation (e.g. deep Q-networks, DQN). Here I cover policy gradients, importance sampling, natural policy gradients, TRPO, and finally PPO, which is widely used these days for RLHF (Reinforcement Learning from Human Feedback) on LLMs. The mathematical details and derivations get quite involved in some of these topics; I have tried to keep them as simple as possible. While reading the individual papers, from the policy gradient paper (NIPS 1999) to PPO (2017), I focused on the key ideas each one introduced and what it improved. #reinforcement_learning #rl #dl #deeprl #markov_decision_process #deeplearning #policy_based_rl #npg #trpo #ppo
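As a taste of the final stop on that tour: the clipped surrogate objective at the heart of PPO fits in a few lines. This is a minimal NumPy sketch; the ratio and advantage values below are purely illustrative:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s), computed from log-probs in practice.
    # PPO takes the pessimistic (elementwise min) of the unclipped and
    # clipped objectives, which bounds how far one update can move the policy.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))  # negated for gradient descent

# A ratio far above 1 + eps gets clipped, so the objective stops rewarding
# further movement in that direction.
loss = ppo_clip_loss(np.array([1.5]), np.array([2.0]))
```

With ratio 1.5 and advantage 2.0, the clipped term (1.2 × 2.0 = 2.4) wins over the unclipped one (3.0), which is exactly the trust-region-like behavior TRPO achieves with a much heavier constrained optimization.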
-
Our paper on a utility-based approach to reinforcement learning was accepted for presentation at AAMAS later this year, and is available for reading on arxiv https://lnkd.in/dktJb4Qa
Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning
arxiv.org
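The core idea, as the title suggests, is to apply a (possibly nonlinear) utility function to the vector-valued return rather than committing to a fixed linear scalarization. A toy illustration of why that matters (the utility function and return values here are invented, not from the paper):

```python
import numpy as np

def linear_scalarization(returns, weights):
    # Standard multi-objective shortcut: weighted sum of per-objective returns
    return np.dot(returns, weights)

def utility(returns):
    # A nonlinear utility that penalizes imbalance between two objectives:
    # it prefers (5, 5) over (10, 0) even though the totals are equal.
    return returns.sum() - abs(returns[0] - returns[1])

balanced, lopsided = np.array([5.0, 5.0]), np.array([10.0, 0.0])
# Linear scalarization with equal weights cannot tell these apart...
same = linear_scalarization(balanced, [0.5, 0.5]) == linear_scalarization(lopsided, [0.5, 0.5])
# ...but a nonlinear utility can express the preference for balance.
prefer_balanced = utility(balanced) > utility(lopsided)
```

Single-objective RL then falls out as the special case where the utility is the identity on a scalar return.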
-
🌟 Excited to share our latest blog post on Action-constrained Policy Gradient with Normalizing Flows for reinforcement learning! 🌟 Action-constrained reinforcement learning (ACRL) is crucial for addressing safety-critical and resource-allocation related decision making problems. Our new approach utilizes a normalizing flow model to learn a mapping between the feasible action space and a distribution on a latent variable. We tackle the challenge of action sampling for convex and non-convex constraints, and integrate the learned normalizing flow with the DDPG algorithm. The result? Significantly fewer constraint violations and improved speed on continuous control tasks. Access the full post here: https://bit.ly/3OC57p7 #ReinforcementLearning #PolicyGradient #AIResearch #SocialMediaMarketing
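As a much-simplified stand-in for the learned flow, an invertible squashing map already shows why mapping a latent variable to the feasible set guarantees zero constraint violations for box constraints. This tanh map is illustrative only, not the paper's normalizing-flow model:

```python
import math

def latent_to_action(z, low, high):
    # Invertible map from an unconstrained latent z to the feasible box
    # [low, high]; every sampled z yields a feasible action by construction.
    return low + (high - low) * (math.tanh(z) + 1.0) / 2.0

def action_to_latent(a, low, high):
    # Exact inverse (atanh), so action densities can be transformed back to
    # latent space -- the property a normalizing flow needs for training.
    u = 2.0 * (a - low) / (high - low) - 1.0
    return math.atanh(u)

a = latent_to_action(0.7, low=-2.0, high=2.0)
assert -2.0 <= a <= 2.0                      # feasible, no projection step needed
z = action_to_latent(a, low=-2.0, high=2.0)  # round-trips back to the latent
```

A learned flow generalizes this to non-convex feasible sets, where no closed-form squashing function exists; DDPG then acts in the latent space and lets the flow handle feasibility.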
-
For a patient introduction to reinforcement learning, look no further than Oliver S's ongoing series; in part 2, we dive deep into the inner workings of Markov decision processes.
Introducing Markov Decision Processes, Setting up Gymnasium Environments and Solving them via…
towardsdatascience.com
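Solving a small MDP, as the series goes on to do, boils down to iterating the Bellman optimality backup. A self-contained two-state value-iteration example (the transition table is invented for illustration):

```python
# States 0, 1; actions 0, 1. P[s][a] = (next_state, reward).
# Deterministic toy MDP, made up for illustration.
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 2.0)}}
gamma = 0.9

V = {s: 0.0 for s in P}
for _ in range(500):  # value iteration: repeat the Bellman optimality backup
    V = {s: max(r + gamma * V[s2] for (s2, r) in P[s].values()) for s in P}

# Greedy policy with respect to the converged values
policy = {s: max(P[s], key=lambda a: P[s][a][1] + gamma * V[P[s][a][0]])
          for s in P}
```

Here V[1] converges to 2/(1 − 0.9) = 20 and the greedy policy always picks action 1; Gymnasium environments replace this explicit table with `reset()`/`step()` calls, which is why model-free methods become necessary.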
-
Leveled up my RL skills! After some trial and error, I finally trained an agent to ace the Acrobot-v1 challenge. Reinforcement Learning is truly fascinating! Talk about defying gravity! The video is recorded at 15 FPS instead of the ideal 30 FPS. Here is the repository if you have some tinkering in mind :) https://lnkd.in/g7cWineW #ReinforcementLearning #DeepLearning #GymEnvironment #NeuralNetworks #MachineLearning
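For anyone curious before opening the repo: Acrobot-v1 is commonly solved with a value-based method such as DQN, whose core is the Bellman target below. This is a minimal sketch, independent of the linked repository:

```python
import numpy as np

def dqn_target(reward, next_q_values, done, gamma=0.99):
    # Bellman target for DQN: bootstrap from the best next action,
    # unless the episode terminated (done), in which case only the
    # immediate reward counts.
    return reward + gamma * (1.0 - float(done)) * np.max(next_q_values)

# Acrobot gives -1 per step, so a non-terminal target bootstraps upward
# from the best next-state Q-value...
t1 = dqn_target(-1.0, np.array([0.5, 1.0, 0.2]), done=False)
# ...while reaching the goal height terminates the episode with reward 0.
t2 = dqn_target(0.0, np.array([0.5, 1.0, 0.2]), done=True)
```

The network is then regressed toward these targets over minibatches from a replay buffer; the per-step −1 reward is what pushes the agent to swing up as fast as possible.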
-
From Clinical Problem Solvers– Spaced Learning Series Episode 4 – Recurrent Presyncope https://lnkd.in/g-2V66cJ