Charith Peris, PhD’s Post

Senior Applied Scientist | Responsible AI | Artificial General Intelligence at Amazon

Here’s a blog post that breaks down our work on Attribute Controlled Fine-tuning for Large Language Models. We introduced a new method that trains an auxiliary model to control a specific attribute (in this case, toxicity). The approach regularizes the LLM's fine-tuning by penalizing deviations from the desired (non-toxic) distribution, using the auxiliary model trained alongside the core LLM. This work, published at #EMNLP2024, was led by our intern Tao Meng (UCLA) together with our collaborators Ninareh Mehrabi, Palash Goyal, PhD, Anil Ramakrishna, PhD, Aram Galstyan, Richard Zemel, Kai-Wei Chang and Rahul Gupta. Blogpost: https://lnkd.in/eRzvKSKi Paper: https://lnkd.in/eHNwgWAt
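To give a flavor of the idea, here is a toy sketch of what "penalizing deviations from the desired distribution" can look like as a training objective. This is my own illustrative simplification, not the paper's actual objective: the function names, the KL-divergence form of the penalty, and the weight `lam` are all assumptions for the example.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_loss(task_loss, model_dist, auxiliary_dist, lam=0.5):
    """Toy combined objective: the usual fine-tuning task loss plus a
    KL penalty pulling the LLM's output distribution toward the
    auxiliary model's desired (e.g. non-toxic) distribution.
    `lam` is a hypothetical regularization weight."""
    return task_loss + lam * kl_divergence(model_dist, auxiliary_dist)

# Toy next-token distributions over a 3-token vocabulary.
p_model = [0.7, 0.2, 0.1]   # current LLM output distribution
p_aux   = [0.5, 0.3, 0.2]   # auxiliary model's desired distribution

loss = regularized_loss(task_loss=1.2, model_dist=p_model, auxiliary_dist=p_aux)
```

The larger the gap between the LLM's distribution and the auxiliary model's desired one, the larger the penalty, so gradient descent on this combined loss nudges the LLM toward the non-toxic distribution while still optimizing the task loss.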

Detoxification of large language models via regularized fine-tuning

amazon.science
