Derek Thomas’ Post

Chief Hugging Officer at 🤗 Best hugger in the company!

6mo

I couldn't be more proud to write my first official 🤗 blogpost 🎉! I explore the often overlooked yet crucial topic of profiling LLM deployments, specifically with TGI's Benchmarking Tool. It’s a vast area but so vital for understanding the performance nuances of our models, especially for different use-cases. Have you ever encountered surprises or challenges while profiling LLMs? I’d love to hear your experiences and insights! Check out the blog and let’s discuss! 📈 https://lnkd.in/gwK56EXa #LLM #GenAI #TGI #performance

Benchmarking Text Generation Inference

huggingface.co

7 Comments

Rafael Hernández Murcia

Machine Learning Team Lead | Data Scientist Senior Manager | Lecturer | Kaggle Master

6mo

The benchmarking tool of TGI is very useful. Thanks for the thoughtful analysis! I really like the way you described the different tradeoffs linked to user experience. Regarding your statement: “It's important to keep track of actual user behavior. When we estimate user behavior we have to start somewhere and make educated guesses.” What do you think about load testing tools for this purpose? In my team we found it really useful to simulate different applications and set rate limits on TGI using k6. It would be great if you have time for a chat about this topic any time soon. Maybe that could be a great idea for a second blog post 🤗

1 Reaction

Shahnawaz Gaur

6mo

LLM deployments are increasingly significant in the AI field, serving as the backbone for a variety of applications, from natural language processing to generative content creation. Their importance lies in their ability to understand, generate, and interact with human-like text, making them crucial for advancing AI technologies and applications. TGI is known for various things depending on the context. It can refer to TGI Fridays, an American restaurant chain; Triumph Group, recognized for its stock performance and market outlook; and Tropical General Investments, known for its efforts to tackle hunger and foster entrepreneurship. Additionally, in the context of technology and AI, TGI refers to Text Generation Inference's Benchmarking Tool, crucial for profiling LLM deployments. Understanding performance nuances in AI modeling is crucial because it allows for the optimization of models for specific tasks, ensuring they operate efficiently and effectively. It also helps in identifying and mitigating potential biases, ensuring fairness and accuracy in AI applications.

4 Reactions

Rajiv Shah

Bridging the gap from demo to production for Generative AI

6mo

Very useful at covering the nuances around deploying LLMs!

3 Reactions

Abhishek Bisht

<NLP,Machine Learning, GenAI ,Techie> 🦙

6mo

Thanks! A much-needed write-up from HF on the TGI benchmarking tool, which is an underrated hidden gem IMO 🤗

1 Reaction

Jeff Boudier

Product + Growth at Hugging Face

6mo

Great post Derek! TGI Benchmarking Tool makes it easy to measure latency and throughput for one's particular use case and data, which are the only measures that matter!
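As a rough illustration of the two measures mentioned in this comment, here is a minimal sketch of deriving latency and throughput from per-request timestamps. It is not TGI's benchmarking tool or its output format; `RequestTiming`, `summarize`, and the field names are hypothetical, and the timestamps are assumed to have been collected by whatever client sent the requests.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RequestTiming:
    """Hypothetical record of one generation request (times in seconds)."""
    start: float        # when the request was sent
    first_token: float  # when the first generated token arrived
    end: float          # when the last generated token arrived
    tokens: int         # number of generated tokens


def summarize(timings: List[RequestTiming]) -> Dict[str, float]:
    """Aggregate latency and throughput over a batch of requests."""
    # Time to first token: how long a user waits before seeing any output.
    ttft = [t.first_token - t.start for t in timings]
    # Average gap between tokens after the first one (decode latency).
    per_token = [
        (t.end - t.first_token) / max(t.tokens - 1, 1) for t in timings
    ]
    # Throughput: total tokens produced over the wall-clock span of the batch.
    wall = max(t.end for t in timings) - min(t.start for t in timings)
    return {
        "mean_time_to_first_token_s": mean(ttft),
        "mean_latency_per_token_s": mean(per_token),
        "throughput_tokens_per_s": sum(t.tokens for t in timings) / wall,
    }
```

Latency describes what a single user experiences, while throughput describes how much work the server completes overall, so the two are usually reported together.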

3 Reactions

Marcel Boersma, PhD

Building AI-driven audits | AI taskforce member | Senior (Engineering) Manager | PhD University of Amsterdam | Researching computational and AI techniques

6mo

Aleksei Maliutin


More Relevant Posts

Antonio Zarauz Moreno

Cognitive-AI Tech Lead @Credicorp
6mo
Excellent guide on how to deploy and benchmark LLMs.

Francesco Tonolini

Applied Scientist at Amazon
3mo
Excited to have presented our paper on Bayesian Prompt Ensembles at ACL 2024. In this paper, we explore the use of multiple prompt re-formulations to capture model uncertainty in LLMs. We derive the connection between prompt re-formulation and model uncertainty/Bayesian NNs and propose a method to learn the relative importance of different prompts for best performance. As part of this work, we open-sourced a code package and tutorial with which you can a) easily use LLMs to perform classification tasks and b) apply our method, BayesPE, to improve uncertainty estimation through the use of multiple prompts. Hope it is useful to folks out there! Paper: https://lnkd.in/etpRRwJC Code: https://lnkd.in/eDHdQ6k4

Bayesian prompt ensembles: Model uncertainty estimation for black-box large language models

amazon.science
Abha Dawesar
2w
What is your biggest challenge in evaluating LLMs today? Here's a must-read article by Rama Ramakrishnan on gaining value from LLMs with a practical LLM cost equation. What sort of data curation will you need? How will you fix errors? Can you estimate ongoing run costs? #GenAI #ArtificialIntelligence #BusinessTransformation https://lnkd.in/e4tyKWzi

A Practical Guide to Gaining Value From LLMs

sloanreview.mit.edu
Hendrix Liu

Co-Founder @ Keywords AI (YC W24) | LLM evals + prompt management
5mo
I just published a blog on how to fine-tune LLMs and create custom datasets. In this blog, I cover:
- What fine-tuning is and its benefits
- When to fine-tune a model
- Methods for fine-tuning, including full fine-tuning and parameter-efficient fine-tuning
- How to prepare your custom dataset
Check it out here: https://lnkd.in/gDcrrBGP

Fine-Tuning LLMs with Custom Datasets

keywordsai.substack.com

1 Comment
Lucas Lima

Data & AI Consulting Director - Generative AI @ NTT Data // I transform novelty ideas into reality to bring positive impact to people's Lives
2mo
https://goo.gle/3AVM3yc Another scenario where RAG is a better choice to ground responses...

"Trade-offs of the RAG approach

Advantages to using this approach are that RAG automatically benefits from ongoing model evolution, particularly improvements in the LLM generating the final response. As this LLM advances, it can better utilize the context retrieved by RAG, leading to more accurate and insightful outputs even with the same retrieved data generated by the query LLM.

A disadvantage is that modifying the user's prompt can sometimes lead to a less intuitive user experience. In addition, the effectiveness of grounding depends on the quality of the generated queries to Data Commons."

Grounding AI in reality with a little help from Data Commons

research.google
Daniel Puente Viejo

Generative AI Engineer II @ NTT Data | Machine Learning Engineer | NLP | Deep Learning | Deep Knowledge Graphs | Data Science | Microsoft Azure | Amazon Web Services (AWS)
7mo
🥳 New small article published!! 📄 Key metrics for evaluating a RAG system 📄 RAG systems are becoming increasingly popular, but questions remain about evaluation methods to ensure they are working well. In this article you will find some innovative approaches to evaluate your RAG system and see how good it is without much effort. It has been written to be intuitive and easy to follow. You can see it displayed on Medium. Thank you very much for reading it, I hope you like it!! Medium publication: https://lnkd.in/dqcyb6_s #rag #llm #genai #machinelearning

Key metrics for evaluating a RAG system

medium.com

2 Comments
Sam Miller

Co-Founder, Artanis | AI PhD, Alan Turing Institute
6mo Edited
**WorkBench - New LLM Evaluation Dataset**

LLM agents can access external tools, such as calendars and search engines, via API calls. They’re able to overcome some common shortcomings of LLMs like i) inability to interact with the real world, and ii) hallucinations.

Businesses are excited about agents. They take LLMs from "cool they can summarize an email for me" to "OMG they can send emails, schedule meetings and chase leads too!" Even better, this can all be done via a chat interface like Slack.

But are these agents reliable in practice? WorkBench shows that, for agents out of the box, the answer is No. WorkBench is a sandbox environment with databases and tools representing a realistic workplace environment. We tested 5 models and found the best-performing approach (ReAct with GPT-4) was **only 43% accurate**. In many cases, errors led to severe consequences such as sending private emails to the wrong people!

I want to thank all the people who helped shape WorkBench: Olly, my co-lead author, for making this a lot of fun to work on. Bertie, Patricio, Tanaya for contributions throughout, this paper hugely benefited from frequent iteration. Tom, Duncan, Tishtrya, Daniel, Tony, Vlad for great feedback at the end, making this accessible to a non-academic audience.

Paper: https://lnkd.in/eU44UC_J Github: https://lnkd.in/eRVHSzJG
6 Comments
Randy Johnston

We help accountants use technology better so they can achieve their goals. Our specialty is CPA firms and we have consulted for most Top 100 firms, technology providers, and a variety of industries in North America.
1mo
Well written with speed comparison tables

Anthropic Claude: How to use the impressive ChatGPT rival | Digital Trends

digitaltrends.com
Mahmood Khan

AI/Python developer@AgileLoop | ML Engineer | Large Action Models | Transformers| AGI | OpenCV | Java | Python | Restful APIs | GCP | Prompt Engineer |
2mo
Anthropic's contextual retrieval, where you can simply add context to your chunks before embedding them, for better understanding.
Aymeric Roucher

Building agents @ Hugging Face 🤗 | Polytechnique - Cambridge
2mo Edited

Anthropic just released a chunk improvement technique that vastly improves RAG performance! 🔥

Crash reminder: Retrieval Augmented Generation (RAG) is a widely-used technique for improving your LLM chatbot's answers to user questions. It goes like this: instead of generating an LLM answer straight away, it just adds a previous step called Retrieval, that retrieves relevant documents from your knowledge base through semantic search, and just appends the top K documents to the prompt.

➡️ As a result, the LLM answer is grounded in context.

⛔️ The difficulty with this retrieval step is that when you split your documents into chunks that will be retrieved, you lose context. So important chunks could be missed.

💡 Anthropic has just released a blog post that shows that you can use one LLM call to generate a bit of context for each chunk. Then you embed together the original chunk + this bit of added context, so that the embedding is much more representative of the document in its context!

🤔 Isn't that crazy expensive? Well it would have been before, but not so much anymore with their new Prompt caching feature that makes duplicating thousands of requests with the same prompt much less expensive. They give an indicative price tag of only $1.02 per million chunks processed!

✅ And this vastly improves performance on their benchmark!

Read their blog post 👉 https://lnkd.in/ehw85wn5
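The "generate a bit of context for each chunk, then embed chunk + context" step described above can be sketched as follows. This is a minimal illustration and not Anthropic's implementation: `contextualize_chunks` and `toy_context` are hypothetical names, the context generator is a placeholder for a real LLM call, and placing the context before the chunk is an assumption about ordering.

```python
from typing import Callable, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Return one string per chunk: generated context followed by the chunk.

    `generate_context(document, chunk)` stands in for one LLM call per chunk.
    The returned strings are what would be passed to the embedding model
    instead of the bare chunks.
    """
    return [f"{generate_context(document, chunk)}\n\n{chunk}" for chunk in chunks]


def toy_context(document: str, chunk: str) -> str:
    """Placeholder for the LLM call: situates the chunk using the first line."""
    title = document.splitlines()[0]
    return f"This chunk is from the document titled '{title}'."
```

Because the context generator is passed in as a function, the placeholder can be swapped for a real model call without changing the rest of the indexing code.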
Juan L. Chulilla Ph.D.

NATO Serge Lazareff Prize. Founding partner, Red Team Shield. European Defense Agency non-Government Expert. Corporate and User Researcher. Lecturer @ UNIR
9mo Edited
I just read an interesting post about the possible expiration of RAG (Retrieval Augmented Generation) based frameworks. Recall that these frameworks extend the capabilities of a large language model in documentation or Q&A (Questions and Answers) systems by chunking the documentation, vectorizing it with an embedding engine, retrieving the most relevant chunks for the answer by measuring the semantic distance to the input and using the aggregate of these segments to generate the answer. A Twitter user named @shaun.agi has shared a brief analysis claiming that the future context window of new LLMs, with Google's Gemini at the forefront, renders the advantages of RAG systems moot. Here is his analysis: https://lnkd.in/d2DVXcq4 In my view, this conclusion is not accurate, especially in the short term. Firstly, while 10M tokens might seem substantial, and indeed it is for a single issue, it falls short for specialized domain data products or documentation amassed over years by significant organizations. At Red Team Shield, S.L., for instance, we manage documentation several orders of magnitude larger. Secondly, the use of proprietary LLMs outside an organization's premises is not always viable for handling the most sensitive information. It might be suitable for Open Source information, but not for data we do not want to leave the company's premises. For these reasons, I believe the author's claim that a 10M context window can solve 90% of use cases is an overstatement. On the contrary, I anticipate that sensitive information will continue to be managed in the short to medium term using modular versions of RAG systems, complete with all necessary redundancies and guardrails, and achieving the best balance with the investment of human hours in information management. Remember, humans may be out of the loop in operations, but never in responsibilities. 
Short term == summer 2024. Medium term == summer 2025. Who knows == September 2025.
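The RAG steps recalled in this post (chunk the documentation, vectorize it, retrieve the closest chunks, and aggregate them for the answer) can be sketched in a few lines. This is a toy illustration only: the word-count "embedding" is a stand-in for a real embedding engine, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, size: int = 8) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Aggregate the retrieved chunks into the prompt sent to the LLM."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Only the top-k chunks reach the model in this setup, which is why corpus size and data locality, the two objections raised in the post, are handled outside the context window.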

Shaun.AGI (@agishaun) on X

twitter.com
