Introducing Lynx v2.0, an 8B state-of-the-art RAG hallucination detection model 🚀

Since we released Lynx v1.1 a few months ago, hundreds of thousands of developers have used it for real-time RAG hallucination detection in all kinds of real-world applications. ⚡ Now Lynx v2.0 is even better 👀: it was trained on long-context data from real-world domains like finance and medicine.

- Beats Claude-3.5-Sonnet on HaluBench by 2.2%
- 3.4% higher accuracy than Lynx v1.1 on HaluBench
- Optimized for long-context use cases
- Detects 8 types of common hallucinations, including Coreference Errors, Calculation Errors, CoT hallucinations, and more!

Use Lynx 2.0 with any of our Day 1 integration partners like NVIDIA, MongoDB, and Nomic AI ✨

And that’s our 10th day of Christmas at Patronus AI 😉🌲 2 more to go!

Try it out with the Patronus API: https://app.patronus.ai
Read the docs: https://lnkd.in/e-rMzMe8
Read the Lynx arXiv paper: https://lnkd.in/eznVjrWA
Read the Lynx blog: https://lnkd.in/eYaP5Zpe
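A minimal sketch of what calling a hosted hallucination detector like Lynx from a RAG pipeline could look like. The endpoint path, header name, evaluator identifier, and payload field names below are illustrative assumptions, not the documented Patronus API schema; consult the official docs for the real request shape.

```python
import json
import urllib.request

# Hypothetical endpoint path; check the Patronus docs for the real one.
API_URL = "https://api.patronus.ai/v1/evaluate"

def build_lynx_request(question: str, context: str, answer: str) -> dict:
    """Assemble a hallucination-check request: the evaluator judges whether
    the answer is faithful to the retrieved context for the given question.
    All field names here are assumptions for illustration."""
    return {
        "evaluators": [{"evaluator": "lynx"}],
        "evaluated_model_input": question,
        "evaluated_model_retrieved_context": context,
        "evaluated_model_output": answer,
    }

def check_answer(api_key: str, question: str, context: str, answer: str) -> dict:
    """POST the request and return the parsed JSON verdict."""
    payload = build_lynx_request(question, context, answer)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In a real-time guardrail, a call like this would sit between generation and the user, blocking or flagging answers the detector marks as unfaithful to the retrieved context.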
Patronus AI’s Post
More Relevant Posts
In the rapidly evolving world of AI, hallucinations in RAG can lead to unreliable outputs and eroded trust, serious challenges for businesses that rely on accurate, dependable AI systems. At the end of the day, accountability is key: businesses need AI solutions they can trust to deliver accurate and consistent results. Huge kudos to the Patronus team for tackling such an essential challenge. Excited to see the positive impact this will have! 🥰
Introducing the Patronus API: leading AI evaluation models 🚀

- Beats ragas on RAG evaluation tasks
- Beats Llama Guard and Perspective on safety tasks
- LLM judges better than SOTA LLMs
- Excels in practical domains like finance and customer support

Hundreds of elite AI teams across companies like Hospitable.com, Exa, and Algomo use Patronus to do alpha evals. ⚡

We are also excited to launch the Patronus API with Day 1 integration partners like NVIDIA, MongoDB, IBM, Portkey, and Nomic AI. The best is yet to come 🚀

Try it out: https://app.patronus.ai/
Patronus Evaluator Benchmarking: https://lnkd.in/eDkvGK8F
Read the VentureBeat coverage: https://lnkd.in/eA2kx8Uf
Read how Hospitable.com, Exa, and Algomo use Patronus:
Day 1 Integrations with NVIDIA, MongoDB, IBM, Portkey, Nomic AI: https://lnkd.in/etEgykMG 🚀
Enhance #RAG with LLM-driven knowledge graphs. Read our technical deep dive for insights, techniques, and #GitHub code to replicate GPU-accelerated graph-creation workflows with NVIDIA NeMo, NIM, and cuGraph. #LLM #GenerativeAI #GPU #NvidiaNeMo #NIMs
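The core loop of LLM-driven knowledge-graph creation can be sketched in a few lines: prompt an LLM to extract (subject, relation, object) triples from each passage, then merge the triples into a graph. The `extract_triples` function below is a stand-in that returns fixed triples so the merging step runs; in the NVIDIA workflow this call would go to an LLM served via NeMo/NIM, and cuGraph would handle analytics on the resulting graph.

```python
from collections import defaultdict

def extract_triples(passage: str) -> list[tuple[str, str, str]]:
    """Stand-in for an LLM extraction call. A real implementation prompts an
    LLM to emit (subject, relation, object) triples for the passage; here we
    return fixed triples so the graph-building step is runnable."""
    return [
        ("NeMo", "is_developed_by", "NVIDIA"),
        ("cuGraph", "accelerates", "graph analytics"),
    ]

def build_graph(passages: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Merge triples from all passages into an adjacency list keyed by subject."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for passage in passages:
        for subj, rel, obj in extract_triples(passage):
            edge = (rel, obj)
            if edge not in graph[subj]:  # dedupe repeated extractions
                graph[subj].append(edge)
    return dict(graph)
```

Deduplicating edges matters in practice, since an LLM run over overlapping chunks will extract the same fact many times.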
OpenAI's o3 model is making big waves, demonstrating impressive improvements on challenging reasoning benchmarks. [Some say, again, that it's close to AGI. I don't think so. See, for example, Melanie Mitchell's blog post: 📚 https://lnkd.in/ewvyseQq]

A key to o3's performance, as far as can be known, lies in its use of 𝙩𝙚𝙨𝙩-𝙩𝙞𝙢𝙚 𝙘𝙤𝙢𝙥𝙪𝙩𝙖𝙩𝙞𝙤𝙣: a method that lets models dynamically allocate more computational resources to harder problems, rather than depending solely on larger pretraining budgets. This shift toward test-time compute scaling is particularly exciting because it shows how smaller models can sometimes match or even exceed the performance of much larger ones by "thinking longer" during inference. But it can also be very expensive at inference time.

If you'd like to learn more about this approach and its practical implementation, I recommend this blog post by Hugging Face. 👉🏻 https://lnkd.in/edzaJ8Vu

The post provides both theoretical insights and hands-on strategies for applying these ideas. It covers:
🔹 Compute-optimal scaling: insights into enhancing model performance at test time.
🔹 Diverse Verifier Tree Search (DVTS): a novel extension of search techniques that boosts diversity and results.
🔹 Search and Learn toolkit: tools for implementing test-time strategies efficiently.

#AI #LLM #GPT #o3
Scaling test-time compute - a Hugging Face Space by HuggingFaceH4
huggingface.co
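The simplest form of the test-time compute idea described above is best-of-N sampling with a verifier: spend extra inference compute by drawing N candidate answers and keeping the one a reward model scores highest. The sketch below uses stand-in functions for both the generator and the verifier; in a real setup (such as Hugging Face's experiments) these would be an LLM sampled at temperature > 0 and a trained process/outcome reward model.

```python
import random

def generate_candidates(prompt: str, n: int) -> list[str]:
    """Stand-in for sampling n reasoning chains from a model at temperature > 0."""
    return [f"{prompt} | candidate {i}" for i in range(n)]

def verifier_score(candidate: str, rng: random.Random) -> float:
    """Stand-in for a verifier (reward model) scoring a candidate answer."""
    return rng.random()

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    """Spend extra inference compute: sample n candidates and keep the one
    the verifier scores highest. Larger n means more 'thinking' per query,
    and proportionally higher inference cost."""
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: verifier_score(c, rng))
```

DVTS refines this further by running several independent search subtrees so the candidates stay diverse instead of collapsing onto one high-scoring prefix.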
I visited one of the fanciest booths at re:Invent to learn more about what they are building! DataStax is always innovating and building great things. We discussed Apache Cassandra, Langflow, and the DataStax <> NVIDIA partnership! Don't forget to collect your Bodi from booth 1328! It was good to meet Phil Nash, Patrick McFadin, and Alejandro Cantarero. Thanks for giving me a quick demo and insight into the cool things DataStax is doing! #data #ai #awsreinvent #awsreinvent2024 #reinvent2024 #datastax #theravitshow
Google DeepMind Presents MoNE: A Novel Computer Vision Framework for the Adaptive Processing of Visual Tokens by Dynamically Allocating Computational Resources to Different Tokens https://lnkd.in/gmHynSYZ
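The routing idea behind adaptive token processing can be illustrated with a toy sketch: rank tokens by an importance score and send only the top fraction through the expensive compute path, the rest through a cheap one. This is a loose, simplified analogy to MoNE's learned router over nested experts (here the scores are supplied explicitly rather than learned), not the paper's actual architecture.

```python
def route_tokens(
    tokens: list[str], scores: list[float], capacity: float = 0.25
) -> list[tuple[str, str]]:
    """Assign each token a compute path. The top `capacity` fraction of
    tokens by importance score takes the 'large' (expensive) path; the
    rest take the 'small' (cheap) path. Scores are given explicitly here;
    MoNE learns them with a router network."""
    k = max(1, int(len(tokens) * capacity))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    large = set(ranked[:k])
    return [(tok, "large" if i in large else "small")
            for i, tok in enumerate(tokens)]
```

The payoff is that total compute scales with the capacity fraction rather than the token count, which is why adaptive routing can cut inference cost with little accuracy loss on redundant visual tokens.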
Today, we’re excited to launch the Patronus API, the first self-serve solution to reliably detect AI failures in production! 🚀

We all know that getting AI products right is hard. Hallucinations, security concerns, and other unexpected behavior are top of mind for everyone. That’s why our research team has developed powerful evaluation models that developers can use for both offline testing and real-time guardrails.

Our customers start with our SDK to run custom evals and compare performance snapshots of their AI products. Now, Patronus Evaluators give you alpha above the baseline on evaluator quality. ✨

Create an API key and start immediately with free credits in your account. Then you can access models like Lynx, our flagship RAG hallucination detection model. You can also configure your own LLM judges with custom criteria, making the Patronus API usable for any AI use case.

Hundreds of AI engineers across companies like Hospitable.com, Exa, and Algomo use Patronus to do alpha evals. ⚡

We are also thrilled to launch the Patronus API with our Day 1 Integration Partners: NVIDIA, IBM, MongoDB, Portkey, and Nomic AI.

At Patronus AI, we are on a mission to make high-quality LLM evaluation accessible to everyone. The best is yet to come! 🚀

Try it out: https://app.patronus.ai/
Read the VentureBeat coverage: https://lnkd.in/ed9Wwuxc
Patronus Evaluator Benchmarking: https://lnkd.in/eShSScPA
Read how Hospitable.com, Exa, and Algomo use Patronus: https://lnkd.in/euBKKjTM
Day 1 Integrations with NVIDIA, MongoDB, IBM, Portkey, Nomic AI: https://lnkd.in/et7DnCMh
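The "configure your own LLM judges with custom criteria" pattern can be sketched generically: fill a judge prompt template with your criteria and the model output to grade, send it to any LLM, and parse a PASS/FAIL verdict. The template, field names, and verdict format below are illustrative assumptions, not the Patronus API's actual schema.

```python
# Illustrative judge prompt; real products ship tuned templates and rubrics.
JUDGE_TEMPLATE = """You are an impartial evaluator.
Criteria: {criteria}
User input: {model_input}
Model output to grade: {model_output}
Reply with PASS or FAIL, then a one-sentence rationale."""

def build_judge_prompt(criteria: str, model_input: str, model_output: str) -> str:
    """Fill the judge prompt with custom pass/fail criteria and the
    input/output pair to evaluate."""
    return JUDGE_TEMPLATE.format(
        criteria=criteria, model_input=model_input, model_output=model_output
    )

def parse_verdict(judge_reply: str) -> bool:
    """Read the judge LLM's reply; True means the output passed the criteria."""
    first = judge_reply.strip().split()[0].upper().strip(",.:;")
    return first == "PASS"
```

Used offline, verdicts like these aggregate into pass rates per eval suite; used online, a FAIL can trigger a retry or a guardrail before the response reaches the user.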
🌟 Exciting News from GPT-Lab! 🌟

We are thrilled to share our latest video featuring Pekka Sillberg, who delves into the intricacies of building a GPU server and the innovative use of generative AI for sensor data collection. This insightful session is a must-watch for anyone interested in the intersection of advanced computing infrastructure and AI-driven data techniques.

📌 Explore key insights on:
- The technical challenges and solutions in constructing a high-performance GPU server.
- Harnessing the capabilities of generative AI to enhance sensor data collection.

Whether you are an AI enthusiast, a data scientist, or someone keen on the latest tech developments, this video promises valuable knowledge.

👉 Watch now: [Pekka Sillberg on Building a GPU Server and Using GenAI for Sensor Data Collection](https://lnkd.in/df6X_Rpf)

Stay connected with GPT-Lab for more pioneering discussions and advancements in technology!

Please note: This post has been crafted by an AI assistant.
"For the community, by the community." Planning a bootcamp on the basics of Hugging Face and LlamaIndex, with a heavy focus on fine-tuning open-source LLMs in the Lightning AI GPU environment. Will make sure everyone knows about the new technologies. #opensource #LLM #ArtificialIntelligence #AIagent