In our #NeurIPS2024 paper "I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token", we equip LLMs with a special "I Don't Know" ([IDK]) token that the model learns to predict instead of "hallucinating" wrong completions. We fine-tune LLMs with a novel IDK objective that shifts probability mass towards the new [IDK] token for uncertain predictions. By design, the IDK objective requires no annotations or handcrafted examples! All details in our paper: https://lnkd.in/dJX-FX7a (joint work with Roi Cohen, Eden Biran and Gerard de Melo).
LLMs "hallucinate": making up information instead of revealing their lack of knowledge 🤔 Our #NeurIPS2024 paper adds an 𝙸–𝙳𝚘𝚗'𝚝–𝙺𝚗𝚘𝚠 🤷🏾♂️ token to the LLM and fine-tunes it to predict this instead of hallucinating. No extra annotation is needed for training! The regular cross-entropy loss shifts all wrong probability mass to the gold token, while our IDK objective divides the shifted probability mass between the gold and 𝙸–𝙳𝚘𝚗'𝚝–𝙺𝚗𝚘𝚠 🤷🏾♂️ token for wrong predictions. Joint work w/ Roi Cohen, Konstantin Dobler, Eden Biran