Behrooz Omidvar-Tehrani’s Post

Senior ML Scientist @ AWS | LLM Agents for Amazon Q

9mo Edited

We are thrilled to announce that our ICML paper, titled "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation," is now available on arXiv. This paper presents one of the first automated, interpretable, task-specific evaluation methods for Retrieval-Augmented Generation (RAG) in Q&A contexts. For a summary of our contributions, check out the following 🧵 from my co-author Laurent Callot on 𝕏: https://lnkd.in/gBF5iXaR. https://lnkd.in/g6r3Xv27 #ICML #LLMEvaluation #AmazonScience #RAG
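The post describes evaluating a RAG system with an automatically generated, task-specific exam. As a minimal sketch of the scoring side only, and assuming the exam consists of multiple-choice questions, the code below grades a system under test by accuracy. `ExamQuestion`, `grade`, and `answer_fn` are hypothetical names. Exam generation and the paper's further analysis of the results are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ExamQuestion:
    """One multiple-choice item (hypothetical format) generated from task documents."""
    question: str
    options: Sequence[str]
    answer_index: int  # index of the correct option in `options`


def grade(exam: Sequence[ExamQuestion],
          answer_fn: Callable[[str, Sequence[str]], int]) -> float:
    """Return the fraction of exam questions the system under test answers correctly.

    `answer_fn` stands in for a RAG pipeline: it receives the question text and
    the options, and returns the index of the option it chooses.
    """
    if not exam:
        raise ValueError("exam must contain at least one question")
    correct = sum(
        1 for q in exam if answer_fn(q.question, q.options) == q.answer_index
    )
    return correct / len(exam)
```

Swapping in different retrievers or generators behind `answer_fn` while keeping the exam fixed is what makes scores comparable across RAG configurations.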

Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation

arxiv.org

5 Comments

Oguzhan (Ouz) Gencoglu

Co-founder & Head of AI @ Root Signals | Measure and Control Your GenAI

9mo

Interesting work. The link to your implementation that was mentioned in your paper does not seem to be alive: https://github.com/amazon-science/auto-rag-eval Any pointers to your repo?

2 Reactions

Niraj Jetly

Software Engineering Leader at Amazon Web Services (AWS), Ex-CTO/VP Engineering, Board Member

9mo

Congratulations Behrooz Omidvar-Tehrani, it’s very interesting.

1 Reaction

Niccolo' Gentile, PhD

Research Scientist @ Foyer Group | AI Applied Research | Ex-Amazon

9mo

Amir Ali Aynetchi this looks very interesting

2 Reactions

Sarthak Jain

Signal Processing/NLP @Sony @UofSC AIISC | AI/ML @IIITD

9mo

Congrats Behrooz Omidvar-Tehrani

1 Reaction


More Relevant Posts

Alexander Shabalin

PhD Student at Constructor University
1mo Edited
I'm happy to announce that our paper "TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings" was accepted as an oral 📣 presentation at the A* #AAAI2025 conference!

➡ Paper on arXiv: https://lnkd.in/d5uZv77A
➡ Code on GitHub: https://lnkd.in/dp2DbEiE

🔬 The paper mainly investigates the latent space for text diffusion models. It proposes encoding text with a BERT-type model before processing it with a diffusion model, instead of simply mapping each token to its embedding.

Other key contributions:

🔍 Although the latent representation can easily be decoded to text with any model, more complex context-dependent decoders perform much better because they can correct inaccuracies of the diffusion model.

🔍 The popular self-conditioning technique improves the quality of the text diffusion model because it increases prediction confidence at each denoising step. As a side effect, the number of denoising steps can be greatly reduced.

🔍 Text diffusion models require adding a larger amount of noise at the beginning of the forward process, because the distance between text entities (tokens) in latent space is much larger than between pixel values in image diffusion. If the amount of noise is insufficient, the diffusion model easily reconstructs the original text.
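The last point can be illustrated with a toy experiment: take a table of random token embeddings, apply the standard DDPM-style forward step x_t = sqrt(ᾱ)·x_0 + sqrt(1 − ᾱ)·ε, and check whether a nearest-neighbour lookup still recovers the original token. This is a minimal sketch with random vectors; `make_table` and `recovery_rate` are illustrative names, and nothing here is TEncDM's encoder, decoder, or noise schedule.

```python
import math
import random


def make_table(vocab=50, dim=16, seed=1):
    """Hypothetical stand-in for a token embedding table (random, not trained)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]


def nearest(vec, table):
    """Index of the embedding in `table` closest to `vec` (squared L2 distance)."""
    return min(range(len(table)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, table[i])))


def recovery_rate(table, alpha_bar, trials=200, seed=0):
    """Fraction of noised embeddings that still decode to their original token.

    Forward step: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps,
    followed by rescaling and a nearest-neighbour lookup as a trivial "denoiser".
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tok = rng.randrange(len(table))
        x_t = [math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
               for v in table[tok]]
        # Undo the signal scaling before the lookup.
        guess = [v / math.sqrt(alpha_bar) for v in x_t]
        hits += int(nearest(guess, table) == tok)
    return hits / trials
```

With `table = make_table()`, `recovery_rate(table, 0.99)` should be essentially 1.0, so a denoiser would have almost nothing to learn at that noise level, while `recovery_rate(table, 0.01)` should drop towards chance. Because token embeddings sit far apart, the task only becomes non-trivial at fairly heavy noise, which is the intuition behind adding more noise early in the forward process.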

TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

arxiv.org
Michael Katz

Principal Research Scientist at IBM TJ Watson Research Center; Executive Council Board Member at ICAPS International Conference on Automated Planning and Scheduling
9mo
There has been an enormous effort from the research community to solve planning problems with Large Language Models, with dozens of proposed methods essentially performing an external combinatorial search that calls the LLM at each search node (e.g., ToT, RAP, LATS, ...). Unfortunately, these papers do not analyze how cost-(in)efficient such approaches are. Not to worry, we do that for you! In our recent paper, we show how wasteful the methods that call language models at every search step are, and we propose a very simple alternative: use the language models before the search to generate code that is then run at every step of the search. We show that, with very little feedback from a human (which we hope to automate in the future), we can solve entire datasets in seconds with 100% accuracy. We hope that the scientific community can embrace the mindset of being economical with the expensive resource that large language models are. The paper: https://lnkd.in/ePb8yYaq Harsha Kokel, Kavitha Srinivas, Shirin Sohrabi #planning #LLM
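The cost argument can be made concrete with a plain search. An LLM-in-the-loop search pays one model call per expanded node, whereas generating the successor function once costs a single call. The sketch below is an ordinary breadth-first search on a hypothetical two-jug puzzle. Its hand-written `jug_successors` stands in for LLM-generated code, and the `calls` counter shows how many model calls a per-node approach would have needed. It is an illustration under those assumptions, not the paper's code or benchmarks.

```python
from collections import deque


def jug_successors(state, caps=(3, 5)):
    """Successor function for a two-jug puzzle (hypothetical stand-in for generated code)."""
    a, b = state
    ca, cb = caps
    pour_ab = min(a, cb - b)  # amount that can move from jug A into jug B
    pour_ba = min(b, ca - a)  # amount that can move from jug B into jug A
    return {
        (ca, b), (a, cb),             # fill either jug
        (0, b), (a, 0),               # empty either jug
        (a - pour_ab, b + pour_ab),   # pour A -> B
        (a + pour_ba, b - pour_ba),   # pour B -> A
    }


def bfs(start, is_goal, successors):
    """Breadth-first search; returns (plan as list of states, successor calls made)."""
    frontier = deque([start])
    parent = {start: None}
    calls = 0
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            plan = []
            while state is not None:
                plan.append(state)
                state = parent[state]
            return plan[::-1], calls
        calls += 1  # a per-node LLM approach would pay one model call here
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None, calls
```

Calling `bfs((0, 0), lambda s: 4 in s, jug_successors)` returns a six-move plan ending with one jug holding 4 units. Every non-goal state it expands would have been a separate LLM call in a per-node approach, while the code-generation approach pays once for `jug_successors`.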

Planning with Language Models Through The Lens of Efficiency

arxiv.org
Devam Mondal
7mo
Excited to announce that the preprint of my latest paper, "Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison," written in collaboration with Carlo Lipizzi, has been released on arXiv. Given the traditional "black-box" architecture of LLMs, understanding their training data is difficult because it relies on various metrics and intermediate components (perplexity, embeddings, etc.). We take a novel approach by considering the content the LLM produces and avoiding statistical methods, focusing instead on broad ideas through knowledge graphs. We use similarity metrics for content matching, and we explore graph theory in the realm of large language models through a novel isomorphism metric that assesses idea structure. We plan to submit this paper to TACL, and we hope that it advances the use of knowledge graphs and other visual knowledge representation techniques in large language models. Stay tuned. Link to the paper: https://lnkd.in/e4pEiD_2
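The post does not spell out its similarity or isomorphism metrics, so the following is only a minimal sketch of the general idea of comparing knowledge graphs. Each graph is a set of hypothetical (subject, relation, object) triples, compared by Jaccard overlap of triples and of entities. The names `compare_graphs`, `triple_overlap`, and `node_overlap` are illustrative, and this is not the paper's isomorphism metric.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 1.0 for two empty sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nodes(graph):
    """All entities appearing as a subject or an object in a set of triples."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def compare_graphs(g1, g2):
    """Compare two knowledge graphs given as iterables of (subject, relation, object)."""
    g1, g2 = set(g1), set(g2)
    return {
        "triple_overlap": jaccard(g1, g2),          # exact fact-level agreement
        "node_overlap": jaccard(nodes(g1), nodes(g2)),  # shared concepts, ignoring relations
    }
```

Entity overlap is more forgiving than triple overlap: two graphs can mention the same concepts while connecting them differently, which is the kind of structural difference an isomorphism-style metric is meant to capture.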

Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison

arxiv.org
Yamil Garcia

Tech enthusiast, embedded systems engineer, and passionate educator! I specialize in Embedded C, Python, and C++, focusing on microcontrollers, firmware development, and hardware-software integration.
1mo
Scientific papers are sometimes hard to understand because of their complex structure and long text, which can make it hard to know where to start. Luckily, we can use language models to simplify the reading process by summarizing them. https://lnkd.in/esVJ3eDj
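One practical detail when summarizing a whole paper with BART: checkpoints such as `facebook/bart-large-cnn` accept at most 1024 input tokens, so long text is usually split first. The sketch below is a simple word-based chunker with overlap. `chunk_words` and its default sizes are assumptions to tune, and word counts only approximate BART's subword tokens.

```python
def chunk_words(text, max_words=600, overlap=50):
    """Split `text` into chunks of at most `max_words` words, consecutive chunks
    sharing `overlap` words so sentences cut at a boundary keep some context.

    max_words is kept well below BART's 1024-token limit because one word is
    often more than one subword token (an assumption, not a guarantee).
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each chunk could then be passed to a Hugging Face Transformers summarization pipeline, for example `pipeline("summarization", model="facebook/bart-large-cnn")`, and the partial summaries concatenated or summarized again. The linked KDnuggets article may structure this differently.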

How to Summarize Scientific Papers Using the BART Model with Hugging Face Transformers - KDnuggets

kdnuggets.com
Venkatesh Vinayakarao

Principal Engineer (Map Search @ Here) | Ex-Microsoft Bing, Yahoo | PhD in Code Search | IIIT Delhi, Carnegie Mellon University
8mo
Haven't you seen the sign, "RAGGING is strictly prohibited"? Seems like RAG'ging is the hot thing nowadays. "Retrieval-Augmented Generation (RAG) has recently emerged as a method to extend beyond the pre-trained knowledge of Large Language Models by augmenting the original prompt with relevant passages or documents retrieved by an Information Retrieval (IR) system." - From "The Power of Noise: Redefining Retrieval for RAG Systems". https://lnkd.in/gjUzYuYE.
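To make the quoted definition concrete, here is a minimal sketch of the "augment the original prompt with retrieved passages" step. The retriever is a toy term-overlap scorer standing in for a real IR system, and `retrieve` and `build_prompt` are hypothetical names, not taken from the cited paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, passages, k=2):
    """Rank passages by occurrences of query terms (toy lexical scorer); drop zero scores."""
    q_terms = set(tokenize(query))
    scored = []
    for idx, passage in enumerate(passages):
        counts = Counter(tokenize(passage))
        score = sum(counts[t] for t in q_terms)
        scored.append((score, -idx, passage))  # ties: earlier passage ranks first
    scored.sort(reverse=True)
    return [p for score, _, p in scored[:k] if score > 0]


def build_prompt(query, passages, k=2):
    """Prepend the retrieved passages to the user's question."""
    hits = retrieve(query, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(hits))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```

The cited paper's point is precisely about what ends up in `context`: which passages (relevant, related, or even random) the retriever places in front of the question.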

The Power of Noise: Redefining Retrieval for RAG Systems

arxiv.org
Javier González

Senior Principal Research Manager @ Microsoft Research Cambridge
6mo
We are thrilled to announce the release of our latest paper, “Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models”, authored by Aditya Nori and myself. In this paper, we explore the reasoning abilities of large language models (LLMs) by distinguishing between two key aspects: the accuracy with which an LLM solves a problem, and its capacity to understand and process the fundamental elements that lead to that solution. We introduce a framework to assess how effectively LLMs can replicate real-world reasoning mechanisms using two probabilistic concepts that are essential for connecting causes to their effects: the probability of necessity (PN) and the probability of sufficiency (PS). By viewing LLMs as abstract machines that process information through a natural language interface, we examine the conditions under which LLMs can compute suitable approximations of these quantities. We hope you enjoy reading it! Paper: https://lnkd.in/eXZ6Jdvv #AIResearch #MachineLearning #ArtificialIntelligence #MicrosoftResearch #LLMs #Innovation
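For readers unfamiliar with PN and PS, a small worked example may help. In Pearl's counterfactual notation, PN = P(Y_{x'} = y' | X = x, Y = y) (would the effect have been absent without the cause?) and PS = P(Y_x = y | X = x', Y = y') (would the cause have produced the effect where both were absent?). The sketch below computes both by enumeration in a hypothetical structural model Y = f(X, U) with independent binary X and U. It illustrates the definitions only and is not the paper's framework or its LLM-based estimation.

```python
from itertools import product


def pn_ps(p_x, p_u, f):
    """Probability of necessity and of sufficiency for binary X, Y in a toy SCM.

    Model: Y = f(X, U) with X ~ Bernoulli(p_x), U ~ Bernoulli(p_u), independent.
    A counterfactual Y_x keeps the same U and plugs in x (abduction, action,
    prediction on a finite model).
        PN = P(Y_{x=0} = 0 | X = 1, Y = 1)
        PS = P(Y_{x=1} = 1 | X = 0, Y = 0)
    """
    pn_num = pn_den = ps_num = ps_den = 0.0
    for x, u in product((0, 1), repeat=2):
        w = (p_x if x else 1 - p_x) * (p_u if u else 1 - p_u)
        y = f(x, u)
        if x == 1 and y == 1:          # factual world: cause and effect present
            pn_den += w
            pn_num += w * (f(0, u) == 0)
        if x == 0 and y == 0:          # factual world: cause and effect absent
            ps_den += w
            ps_num += w * (f(1, u) == 1)
    if pn_den == 0 or ps_den == 0:
        raise ValueError("conditioning event has zero probability")
    return pn_num / pn_den, ps_num / ps_den
```

For `pn_ps(0.5, 0.3, lambda x, u: max(x, u))`, i.e. Y = X OR U, this gives PN = 0.7 and PS = 1.0 (up to floating-point rounding). That matches the closed forms that hold under exogeneity and monotonicity, PN = [P(y|x) − P(y|x')] / P(y|x) and PS = [P(y|x) − P(y|x')] / [1 − P(y|x')], with P(y|x) = 1 and P(y|x') = 0.3.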

3 Comments
Kevin Shen

AI Solutions for Materials Science | Computation, ML & Data Science, Cheminformatics
10mo
What do language models learn? Besides patterns and relationships, there’s also raw information and knowledge. This paper does some really nice analysis of the latter. (It’s also part of a really interesting larger series of papers.) https://lnkd.in/gDdf6_gs

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

arxiv.org
Mark Madsen

Applied analytics/data/ML advisor, architect, public speaker, with deep technical knowledge and cross-industry business experience
8mo
Another day, another paper with the same conclusion. This paper is directly related to a prior post (LLM to do regressions, just say no). In this case the paper is "Are Language Models Actually Useful for Time Series Forecasting?" TL;DR - No, they are not, relative to existing methods. The authors are kind in that the abstract saves you the reading if you aren't interested in the underlying details. If you remove the LLM component you get the same or better results. Share and enjoy: https://lnkd.in/gVYKi-Gt
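A claim like "the LLM component adds nothing" is always judged against baselines. As a minimal sketch (not the paper's ablation setup, whose baselines the post does not list), here are two classic reference forecasts, last-value (naive) and seasonal naive, scored by mean absolute error. The function names are illustrative.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season):
    """Repeat the values from the most recent full season, cycling as needed."""
    if len(history) < season:
        raise ValueError("history must cover at least one full season")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]
```

On a toy series with period 3, such as `[1, 2, 3, 1, 2, 3, 1, 2, 3]` split into six training points and three held-out points, `seasonal_naive_forecast` reproduces the held-out values exactly (MAE 0), while `naive_forecast` has an MAE of 1.0.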

2406.16964

arxiv.org
Farshad Ghodsian

Sr. Technical Product Manager - AI Infrastructure & MLOps @ AMD
10mo Edited
If you are curious about how different advanced Retrieval Augmented Generation (RAG) techniques compare to each other and to the basic/naive RAG approach, then this recently published paper is worth a read. It takes a systematic approach to evaluating different RAG techniques, including HyDE, LLM Reranking, MMR, etc., and even combines techniques to see how they fare against each other. TLDR: If you are not already using or experimenting with HyDE, LLM Rerank, or Sentence Window Retrieval in your RAG system, you should be, as they were shown to be the most effective for retrieval precision: https://lnkd.in/eBjvAdAU #GenAI #RAG
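Of the techniques named above, MMR (Maximal Marginal Relevance) is compact enough to show in full. The sketch below is the standard greedy formulation over precomputed embedding vectors; `mmr`, `cosine`, and `lambda_` are illustrative names, and this is not the evaluated paper's implementation. `lambda_` trades relevance to the query against redundancy with passages already chosen.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_=0.5):
    """Maximal Marginal Relevance selection.

    Greedily picks the candidate d maximising
        lambda_ * sim(d, query) - (1 - lambda_) * max_{s in selected} sim(d, s),
    so later picks must be relevant and also different from earlier ones.
    Returns indices into doc_vecs in selection order.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(doc_vecs[i], query_vec)
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_ * relevance - (1 - lambda_) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties go to the earliest candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a query `[1.0, 0.8]` and documents `[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]`, plain top-2 by similarity would return the two identical documents, whereas `mmr(..., k=2)` returns indices `[0, 2]`: the exact duplicate is penalised for redundancy.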

2404.01037.pdf

arxiv.org
Bhaskarjit Sarmah

Head RQA AI Labs at BlackRock | Gen AI Leader
2mo
Excited to share our latest paper titled - 𝐇𝐨𝐰 𝐭𝐨 𝐂𝐡𝐨𝐨𝐬𝐞 𝐚 𝐓𝐡𝐫𝐞𝐬𝐡𝐨𝐥𝐝 𝐟𝐨𝐫 𝐚𝐧 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐌𝐞𝐭𝐫𝐢𝐜 𝐟𝐨𝐫 𝐋𝐚𝐫𝐠𝐞 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬. This work addresses the critical gap in research on setting robust thresholds for LLM evaluation metrics, which are essential for the reliable deployment of LLMs. Drawing inspiration from model risk management practices in regulated industries like finance, we propose a systematic methodology to define these thresholds. The approach involves identifying risks associated with the specific application, understanding stakeholders' risk tolerance, and applying statistically rigorous procedures using ground-truth data.
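As a rough illustration of choosing a threshold from ground-truth data and a stated risk tolerance, the sketch below picks the lowest metric threshold at which, among outputs that would pass, the fraction labeled bad stays within a tolerance. `choose_threshold` and `max_bad_rate` are hypothetical names. This is a plain empirical rule, not the paper's methodology. A statistically rigorous version would also account for sampling uncertainty, for example with a confidence bound on the bad rate.

```python
def choose_threshold(scores, labels, max_bad_rate):
    """Lowest threshold t such that, among items with score >= t, the empirical
    fraction of bad items (label 0) is at most max_bad_rate.

    scores: metric values for evaluated outputs; labels: 1 = acceptable, 0 = bad
    (ground truth from human review). Returns None if no threshold qualifies.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    for t in sorted(set(scores)):  # lowest threshold first = most outputs accepted
        passed = [lab for s, lab in zip(scores, labels) if s >= t]
        bad_rate = passed.count(0) / len(passed)
        if bad_rate <= max_bad_rate:
            return t
    return None
```

The tolerance is where stakeholders' risk appetite enters: on the same labeled data, a stricter `max_bad_rate` pushes the threshold up and rejects more outputs.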

Dhagash Mehta, Ph.D.

Head of Applied Artificial Intelligence Research for Investment Management
2mo

My new paper, titled 'How to Choose a Threshold for an Evaluation Metric for Large Language Models', co-authored with Bhaskarjit Sarmah, Mingshu Li, Ph.D., Jingrao Lyu, Nathalia Castellanos, and Stefano Pasquali, is out on arXiv today. Here is the preprint: https://lnkd.in/eGQMPSEm

How to Choose a Threshold for an Evaluation Metric for Large Language Models

arxiv.org

4 Comments
