LLMOps can be boiled down to three things:
1. Make sure the LLM system does what it is supposed to do: eval.
2. Make sure the LLM system doesn't do what it is not supposed to do: red-teaming, guardrails.
3. Have continuous visibility into the previous two: observability and analytics across dev/test/prod.
2 & 3 can be automated or heavily augmented. 1 is the hardest to automate, since every LLM application has its own intent. A design pattern for 1 is the continuous articulation of human intent to an AI evaluator, so the AI can execute it. The future of eval tools is to augment human evaluators by eliciting the important high-level intents from the human and translating them into low-level assertions (a sketch of this pattern follows). Shreya Shankar did a wonderful job with the EvalGen framework. Who else is working on it?
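To make the intent-to-assertion pattern concrete, here is a minimal Python sketch of the idea. This is my own illustration, not EvalGen's actual API; `judge_llm` is a hypothetical callable standing in for any chat-completion endpoint, and the specific checks are invented for the example.

```python
# Sketch: translate one high-level human intent into low-level assertions.
# Hypothetical illustration of the pattern, not the EvalGen implementation.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Assertion:
    name: str
    check: Callable[[str], bool]  # takes the LLM output, returns pass/fail

def assertions_from_intent(intent: str,
                           judge_llm: Callable[[str], str]) -> list[Assertion]:
    """Turn a human-stated intent into cheap code checks plus an LLM judge."""
    # Deterministic code assertions first: cheap, consistent, auditable.
    checks = [
        Assertion("non_empty", lambda out: len(out.strip()) > 0),
        Assertion("no_refusal", lambda out: "i cannot help" not in out.lower()),
    ]
    # LLM-as-judge assertion covers the fuzzy remainder of the intent.
    def judged(out: str) -> bool:
        verdict = judge_llm(
            f"Intent: {intent}\nOutput: {out}\n"
            "Does the output satisfy the intent? Answer YES or NO."
        )
        return verdict.strip().upper().startswith("YES")
    checks.append(Assertion("matches_intent", judged))
    return checks

def evaluate(output: str, assertions: list[Assertion]) -> dict[str, bool]:
    """Run every assertion against one model output."""
    return {a.name: a.check(output) for a in assertions}
```

The split matters: the human articulates intent once, the code assertions give repeatable signal, and the judged assertion absorbs whatever can't be expressed as a rule.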
A key thing for 1 & 2 that cannot be automated: DEFINING the business and security requirements before you start building. I frequently see engineering and product teams skip this step and just get right to it!
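As a toy illustration of what "defining requirements first" can mean in practice (the content below is entirely hypothetical), the requirements can be written down as an explicit, reviewable artifact before any LLM code exists, so eval and guardrails have something concrete to check against:

```python
# Hypothetical example: business and security requirements captured up front,
# before building, so #1 (eval) and #2 (guardrails) can be derived from them.
REQUIREMENTS = {
    "business": [
        "Answers must cite a knowledge-base article",
        "Responses stay within the support domain",
    ],
    "security": [
        "Never reveal system prompts or credentials",
        "Reject prompt-injection attempts in retrieved documents",
    ],
}
```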
In my experience, human eval is the bottleneck when scaling up. Humans are just not very good at describing intentions precisely. I am working on combining LLMs with a rule engine to get the right mix of flexibility and consistency.
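A minimal sketch of what that LLM-plus-rule-engine mix could look like, assuming a design where deterministic rules decide first and an LLM judge handles whatever the rules leave open (this is my guess at the idea, not the commenter's actual system; `judge_llm` is a hypothetical callable):

```python
# Hybrid eval sketch: rules give consistency, the LLM judge gives flexibility.
import re
from typing import Callable, Optional

Rule = Callable[[str], Optional[bool]]  # True/False = decided, None = no opinion

RULES: list[Rule] = [
    # Hard fail on an SSN-like pattern (illustrative PII rule).
    lambda out: False if re.search(r"\b\d{3}-\d{2}-\d{4}\b", out) else None,
    # Hard fail on runaway length (illustrative consistency rule).
    lambda out: False if len(out) > 4000 else None,
]

def hybrid_eval(output: str, judge_llm: Callable[[str], str]) -> bool:
    # 1. Deterministic rules: cheap, repeatable, auditable.
    for rule in RULES:
        verdict = rule(output)
        if verdict is not None:
            return verdict
    # 2. LLM judge: flexible fallback for everything the rules don't decide.
    answer = judge_llm(f"Is this response acceptable? Answer YES or NO.\n\n{output}")
    return answer.strip().upper().startswith("YES")
```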
100%. It is still shocking to see instances of LLM dev only, without the ops, and no path to sustainable and responsible scaling.
Very effective summary! s/o Emanuele
Rebecca Li I can’t say we’re working on this (we aren’t!), but I’m at least seeing an extension to this problem… We run a host of independent LLM endpoints that all interact with each other (many-to-many). At our current scale we’re mainly focused on #1 (2/3) and #2 (1/3), with, I agree, #1 being the most difficult (mission-critical?). Because, however, we are also ‘multi-model’ (OpenAI, Anthropic, Gemini) and ‘multiple model’ (for example, we combine GPT-3.5 and 4 into the same endpoint via Azure’s prompt flow), I’m anticipating some real complexity in LLMOps at scale. The rise of interest in Mixture-of-Agents (MoA) will likely lead others to these same challenges…
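For illustration only, one such ‘multiple model’ endpoint could be sketched as a simple router behind a single interface. This is a generic pattern, not the commenter’s actual Azure prompt flow setup; `small`, `large`, and `is_hard` are hypothetical stand-ins:

```python
# Hypothetical sketch: two models combined into one endpoint via routing.
from typing import Callable

def make_endpoint(small: Callable[[str], str],
                  large: Callable[[str], str],
                  hard: Callable[[str], bool]) -> Callable[[str], str]:
    """Return a single callable that escalates hard prompts to the large model."""
    def endpoint(prompt: str) -> str:
        return large(prompt) if hard(prompt) else small(prompt)
    return endpoint

# Toy heuristic: long or multi-question prompts go to the larger model.
is_hard = lambda p: len(p) > 500 or p.count("?") > 1
```

Each endpoint is just a callable, so when many such endpoints call each other (many-to-many), the eval and observability surface multiplies, which is exactly where the LLMOps-at-scale complexity shows up.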