Reinforcement Learning for Large-Scale Recommendation Systems

Very excited to deliver a talk on using reinforcement learning to optimize user retention in large-scale recommendation systems with my colleague Gaurav Chakravorty at the QCon 2024 conference today (https://qconsf.com/). If you are attending QCon this week, let me know and I will be happy to connect. RL (or, more broadly, sequential decision making) has been very effective in many applications, the latest being aligning LLMs with human preferences. We will discuss how some RL concepts can be used to optimize a long-term reward (i.e. user retention) in a large-scale recommendation system. Full schedule: https://lnkd.in/guiaCarC
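For a flavor of what "optimizing long-term reward" means mechanically, here is a minimal, hypothetical sketch of a TD(0)-style value update driven by a retention signal. Everything here (the `RetentionAgent` name, the discrete state abstraction) is illustrative, not from the talk:

```python
# Hypothetical sketch: TD(0)-style value update where the "reward" is a
# session-level retention signal (1 if the user came back, else 0).

class RetentionAgent:
    def __init__(self, n_states, gamma=0.99, lr=0.1):
        self.v = [0.0] * n_states   # value estimate per (discretized) user state
        self.gamma = gamma          # discount: how much future retention counts
        self.lr = lr                # learning rate

    def update(self, state, retained, next_state):
        # TD target: immediate retention signal + discounted future value.
        # A large gamma is what makes the agent care about long-term retention,
        # not just the next click.
        target = float(retained) + self.gamma * self.v[next_state]
        self.v[state] += self.lr * (target - self.v[state])

agent = RetentionAgent(n_states=3)
agent.update(state=0, retained=1, next_state=1)
```

The key contrast with supervised ranking: the target bootstraps from the value of the *next* user state, so actions are credited for their downstream effect on retention.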
Saurabh Gupta’s Post
More Relevant Posts
-
Reinforcement Learning agents used to struggle to complete simple multi-step tasks. Without a basic understanding of the world, the only way to make progress was exhaustive trial and error. Kamyar Azizzadenesheli explains how LLMs changed everything. 🎧/🎥: https://bit.ly/3wajpqP
-
Multi-Agent Reinforcement Learning, 2024 edition, MIT Press: https://lnkd.in/dn7XqQHT
-
🚀 Excited to share our latest blog post on learning to watermark LLM-generated text using reinforcement learning. We explore the potential of embedding detectable signals into LLM-generated text to track misuse and offer a new approach to watermark design space. Our model-level watermark, embedding signals into LLM weights, demonstrates enhanced accuracy, robustness, and adaptability, even allowing for open-sourcing of the watermarked model. Discover more and explore our open-source code: [Read the full post here](https://bit.ly/3IIuCSl). #SocialMediaMarketing #ReinforcementLearning #LLM #Watermarking
-
“We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases.” Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks https://lnkd.in/gewg9T_h source: https://lnkd.in/g2JZvuKi
-
Hey all, in my previous blog I covered the basics of RL: model-based methods (planning), model-free methods (learning), and function approximation (e.g. deep Q-networks, DQN). Here I cover policy gradients, importance sampling, natural policy gradients, TRPO, and finally PPO, which is widely used these days for RLHF (Reinforcement Learning from Human Feedback) on LLMs. The mathematical details and derivations get quite involved in some of these topics; I have tried to keep them as simple as possible. While reading the individual papers, from the policy gradient paper (NIPS 1999) to PPO (2017), I focused on the key ideas each one introduced and what it improved. #reinforcement_learning #rl #dl #deeprl #markov_decision_process #deeplearning #policy_based_rl #npg #trpo #ppo
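As a taste of the final stop on that tour: the clipped surrogate objective at the heart of PPO fits in a few lines. This is a minimal NumPy sketch; the ratio and advantage values below are purely illustrative:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s), computed from log-probs in practice.
    # PPO takes the pessimistic (elementwise min) of the unclipped and
    # clipped objectives, which bounds how far one update can move the policy.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))  # negated for gradient descent

# A ratio far above 1 + eps gets clipped, so the objective stops rewarding
# further movement in that direction.
loss = ppo_clip_loss(np.array([1.5]), np.array([2.0]))
```

With ratio 1.5 and advantage 2.0, the clipped term (1.2 × 2.0 = 2.4) wins over the unclipped one (3.0), which is exactly the trust-region-like behavior TRPO achieves with a much heavier constrained optimization.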
-
Our paper on a utility-based approach to reinforcement learning was accepted for presentation at AAMAS later this year, and is available for reading on arxiv https://lnkd.in/dktJb4Qa
Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning
arxiv.org
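The core idea, as the title suggests, is to apply a (possibly nonlinear) utility function to the vector-valued return rather than committing to a fixed linear scalarization. A toy illustration of why that matters (the utility function and return values here are invented, not from the paper):

```python
import numpy as np

def linear_scalarization(returns, weights):
    # Standard multi-objective shortcut: weighted sum of per-objective returns
    return np.dot(returns, weights)

def utility(returns):
    # A nonlinear utility that penalizes imbalance between two objectives:
    # it prefers (5, 5) over (10, 0) even though the totals are equal.
    return returns.sum() - abs(returns[0] - returns[1])

balanced, lopsided = np.array([5.0, 5.0]), np.array([10.0, 0.0])
# Linear scalarization with equal weights cannot tell these apart...
same = linear_scalarization(balanced, [0.5, 0.5]) == linear_scalarization(lopsided, [0.5, 0.5])
# ...but a nonlinear utility can express the preference for balance.
prefer_balanced = utility(balanced) > utility(lopsided)
```

Single-objective RL then falls out as the special case where the utility is the identity on a scalar return.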
-
🌟 Excited to share our latest blog post on Action-constrained Policy Gradient with Normalizing Flows for reinforcement learning! 🌟 Action-constrained reinforcement learning (ACRL) is crucial for addressing safety-critical and resource-allocation related decision making problems. Our new approach utilizes a normalizing flow model to learn a mapping between the feasible action space and a distribution on a latent variable. We tackle the challenge of action sampling for convex and non-convex constraints, and integrate the learned normalizing flow with the DDPG algorithm. The result? Significantly fewer constraint violations and improved speed on continuous control tasks. Access the full post here: https://bit.ly/3OC57p7 #ReinforcementLearning #PolicyGradient #AIResearch #SocialMediaMarketing
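As a much-simplified stand-in for the learned flow, an invertible squashing map already shows why mapping a latent variable to the feasible set guarantees zero constraint violations for box constraints. This tanh map is illustrative only, not the paper's normalizing-flow model:

```python
import math

def latent_to_action(z, low, high):
    # Invertible map from an unconstrained latent z to the feasible box
    # [low, high]; every sampled z yields a feasible action by construction.
    return low + (high - low) * (math.tanh(z) + 1.0) / 2.0

def action_to_latent(a, low, high):
    # Exact inverse (atanh), so action densities can be transformed back to
    # latent space -- the property a normalizing flow needs for training.
    u = 2.0 * (a - low) / (high - low) - 1.0
    return math.atanh(u)

a = latent_to_action(0.7, low=-2.0, high=2.0)
assert -2.0 <= a <= 2.0                      # feasible, no projection step needed
z = action_to_latent(a, low=-2.0, high=2.0)  # round-trips back to the latent
```

A learned flow generalizes this to non-convex feasible sets, where no closed-form squashing function exists; DDPG then acts in the latent space and lets the flow handle feasibility.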
-
For a patient introduction to reinforcement learning, look no further than Oliver S's ongoing series; in part 2, we dive deep into the inner workings of Markov decision processes.
Introducing Markov Decision Processes, Setting up Gymnasium Environments and Solving them via…
towardsdatascience.com
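Solving a small MDP, as the series goes on to do, boils down to iterating the Bellman optimality backup. A self-contained two-state value-iteration example (the transition table is invented for illustration):

```python
# States 0, 1; actions 0, 1. P[s][a] = (next_state, reward).
# Deterministic toy MDP, made up for illustration.
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 2.0)}}
gamma = 0.9

V = {s: 0.0 for s in P}
for _ in range(500):  # value iteration: repeat the Bellman optimality backup
    V = {s: max(r + gamma * V[s2] for (s2, r) in P[s].values()) for s in P}

# Greedy policy with respect to the converged values
policy = {s: max(P[s], key=lambda a: P[s][a][1] + gamma * V[P[s][a][0]])
          for s in P}
```

Here V[1] converges to 2/(1 − 0.9) = 20 and the greedy policy always picks action 1; Gymnasium environments replace this explicit table with `reset()`/`step()` calls, which is why model-free methods become necessary.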
-
Leveled up my RL skills! After some trial and error, I finally trained an agent to ace the Acrobot-v1 challenge. Reinforcement Learning is truly fascinating! Talk about defying gravity! The video is recorded at 15 FPS instead of the ideal 30 FPS. Here is the repository if you have some tinkering in mind :) https://lnkd.in/g7cWineW #ReinforcementLearning #DeepLearning #GymEnvironment #NeuralNetworks #MachineLearning
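For anyone curious before opening the repo: Acrobot-v1 is commonly solved with a value-based method such as DQN, whose core is the Bellman target below. This is a minimal sketch, independent of the linked repository:

```python
import numpy as np

def dqn_target(reward, next_q_values, done, gamma=0.99):
    # Bellman target for DQN: bootstrap from the best next action,
    # unless the episode terminated (done), in which case only the
    # immediate reward counts.
    return reward + gamma * (1.0 - float(done)) * np.max(next_q_values)

# Acrobot gives -1 per step, so a non-terminal target bootstraps upward
# from the best next-state Q-value...
t1 = dqn_target(-1.0, np.array([0.5, 1.0, 0.2]), done=False)
# ...while reaching the goal height terminates the episode with reward 0.
t2 = dqn_target(0.0, np.array([0.5, 1.0, 0.2]), done=True)
```

The network is then regressed toward these targets over minibatches from a replay buffer; the per-step −1 reward is what pushes the agent to swing up as fast as possible.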
-
From Clinical Problem Solvers– Spaced Learning Series Episode 4 – Recurrent Presyncope https://lnkd.in/g-2V66cJ