Last Thursday marked the end of the 12 Days of Christmas at Patronus AI 🎄 In case you missed it, here's a recap of everything we announced ⬇️
Day 1: Automatic Failure Highlighting in LLM Outputs
Day 2: FinanceBench v1.1
Day 3: Adaptive Dataset Uploads
Day 4: 100 Prompt Injections
Day 5: Patronus Experiments
Day 6: Patronus Comparisons 2.0
Day 7: SOC-2 Type 1 Compliance
Day 8: Excessive Agency Test Suite
Day 9: 360 Degree Human Annotation
Day 10: Lynx 2.0
Day 11: Criteria Copilot
Day 12: Glider
More coming soon 👀 But for now, Merry Christmas! 🎅
Patronus AI’s Post
The open-source AI world is exploding! #Qwen2.5-Max > #DeepSeek > Meta. While the world is still reeling from the DeepSeek shock, yet another open-source model has arrived: Qwen2.5-Max, which, as the team puts it, "..achieves competitive performance against the top-tier models, and outcompetes DeepSeek V3 in benchmarks like Arena Hard, LiveBench, LiveCodeBench, GPQA-Diamond."
The emergence of DeepSeek V3 has drawn the whole AI community's attention to large-scale MoE models. Concurrently, we have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes. It achieves competitive performance against the top-tier models, and outcompetes DeepSeek V3 in benchmarks like Arena Hard, LiveBench, LiveCodeBench, GPQA-Diamond. 📖 Blog: https://lnkd.in/gUydnVsf 💬 Qwen Chat: https://chat.qwenlm.ai (choose Qwen2.5-Max as the model) ⚙️ API: https://lnkd.in/gH4Ny9YC (check the code snippet in the blog) 💻 HF Demo: https://lnkd.in/gE9qvf2t Going forward, we will not only continue scaling pretraining but also invest in scaling RL. We hope that Qwen is able to explore the unknown in the near future! 🔥 💗 Thank you for your support over the past year. See you next year!
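The post points to an API code snippet in the linked blog. As a rough, hedged illustration of what calling a chat model like this usually looks like, here is a sketch that only assembles an OpenAI-style chat-completions payload; the endpoint URL, model identifier, and parameters are placeholders of my own, not taken from the blog, so check the official snippet for the real values.

```python
import json

# Hypothetical sketch: many hosted LLMs expose an OpenAI-compatible
# chat-completions API. The endpoint and model name below are placeholders,
# NOT the real Qwen2.5-Max values -- see the code snippet in the blog.
API_URL = "https://example.invalid/v1/chat/completions"  # placeholder endpoint

def build_chat_request(prompt: str, model: str = "qwen2.5-max") -> dict:
    """Assemble an OpenAI-style chat-completions payload (not sent anywhere)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

payload = build_chat_request("Summarize the Qwen2.5-Max announcement.")
print(json.dumps(payload, indent=2))
```

A real client would POST this payload with an auth header; the sketch stops at payload construction so nothing here depends on network access or credentials.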
Hello, Qwen! Right after DeepSeek's rise, a new AI powerhouse enters the arena, this time developed by the Chinese giant Alibaba Group. While DeepSeek made waves, Qwen is stepping into the spotlight, outperforming its rival across multiple benchmarks. One thing is clear: geopolitics is fueling AI innovation and investment, much like the space race of the Cold War once did. The U.S. 🇺🇸 and China 🇨🇳 are going BIG, but is there room for Europe 🇪🇺 in this battle? Yet beyond competition, this signals a bigger shift: AI is evolving at an unprecedented pace, and the race is no longer just about SIZE. It's about intelligence, adaptability, and real-world impact. #AI #ArtificialIntelligence #LLM #Innovation #FutureOfTech
Presenting the AI Power Era - the Data to Decision event, brought to you by ATG Systems! ❤️ 🖤 Here is the Pilot! ⚡ #atgsystems #dataevent #itsolutionsprovider
GenAI is no stranger today, but if you want relevant results, you need to inject domain-specific context, for example by combining a feature store with a document store (RAG), to improve the responses of GenAI systems.
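The retrieval step that injects that context can be sketched in a few lines. This is a toy illustration only, assuming a plain in-memory document store and bag-of-words cosine similarity; a production system would use a vector database, a learned embedding model, and the feature store the post mentions.

```python
import math
from collections import Counter

# Toy "document store" for the RAG sketch (contents invented for illustration).
DOCS = [
    "The feature store holds precomputed user features for ranking.",
    "RAG retrieves relevant documents and injects them into the prompt.",
    "The billing service exports invoices as PDF files.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Inject retrieved context ahead of the user question."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG inject relevant documents?"))
```

The prompt built this way is what gets sent to the LLM; swapping `embed` for a real embedding model and `DOCS` for a vector store gives the shape of a production pipeline.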
Could it get any more dope than diving deep into fine-tuning embeddings for better retrieval? Absolutely not! This level of insight is unique to the AI Makerspace; no other space compares! 🚀 Thanks 👨🏫🤖 "Dr. Greg" Loughnane and Chris Alexiuk. #AIMakerspace #AI #RAG #MachineLearning
To better understand fine-tuning our embedding models, we need to embed ourselves in the embedding layer of our embedding model to embed… ERROR! 🚨 An infinite loop has been detected. #llm #rag #genai
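Joking aside, the objective typically used when fine-tuning embedding models for retrieval is an in-batch contrastive loss (InfoNCE-style): each query should score its paired positive document above every other document in the batch. The sketch below computes that loss on hand-made toy vectors; it is illustrative only and not the course's actual material or any specific library's API.

```python
import math

# Illustrative in-batch contrastive (InfoNCE-style) loss for retrieval
# fine-tuning. docs[i] is the positive document for queries[i]; every
# other document in the batch serves as a negative. Toy vectors only.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, docs, temperature=0.05):
    """Mean cross-entropy of each query against all in-batch documents."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)  # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # -log softmax prob of the positive
    return total / len(queries)

# Well-aligned query/document pairs give a lower loss than mismatched pairs,
# which is exactly the signal fine-tuning exploits.
queries = [[1.0, 0.0], [0.0, 1.0]]
docs_aligned = [[0.9, 0.1], [0.1, 0.9]]
docs_shuffled = [docs_aligned[1], docs_aligned[0]]
print(info_nce(queries, docs_aligned) < info_nce(queries, docs_shuffled))  # True
```

In practice the vectors come from the embedding model being trained, and gradients of this loss pull each query toward its positive and away from in-batch negatives.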
The new Mixtral-8x22B base model from Mistral AI is a total beast for fine-tuning and has produced some of the highest scores I've ever seen on chat benchmarks like IFEval and BBH 🤯 We teamed up with Argilla and KAIST AI to fine-tune it with a brand new recipe for Zephyr models 🪁: https://lnkd.in/dhAmb4c2 🧑🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO 🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences developed by our friends at Argilla: https://lnkd.in/drPCMX4S As usual, we are open-sourcing the training code in the Alignment Handbook for the community to build on: https://lnkd.in/dhDn9_fX This has been an epic speed run with Jiwoo Hong, Noah Lee, and Álvaro Bartolomé del Canto - now I can finally sleep 😂
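The core idea of ORPO can be sketched numerically: it adds an odds-ratio preference penalty on top of the plain supervised loss, so alignment happens in a single training stage. This is a hedged toy sketch, not the Alignment Handbook's actual implementation; `p_chosen` and `p_rejected` stand in for the model's length-normalized sequence likelihoods, which in reality come from the LLM's forward pass.

```python
import math

# Toy sketch of the ORPO objective: NLL on the chosen response plus a
# lambda-weighted odds-ratio term pushing the chosen response's odds above
# the rejected one's. Scalar probabilities stand in for model likelihoods.

def log_odds(p):
    """log(p / (1 - p)) for a probability p in (0, 1)."""
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(p_chosen, p_rejected, lam=0.1):
    """Supervised NLL on the chosen response + lambda * odds-ratio term."""
    nll = -math.log(p_chosen)  # the SFT part, no separate SFT stage needed
    ratio = log_odds(p_chosen) - log_odds(p_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    return nll + lam * l_or

# Assigning the chosen response higher likelihood than the rejected one
# lowers the loss, which is the preference signal ORPO optimizes.
print(orpo_loss(0.8, 0.2) < orpo_loss(0.5, 0.5))  # True
```

Because the preference term is just an extra penalty on the SFT loss, no reference model or separate SFT checkpoint is required, which is where the efficiency gain over DPO/PPO comes from.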
AI in Public Safety Panel session. Scan the QR Code to add this session to your calendar!
Built an AI video generator using the RunwayML API 🪄✨ Leveraged Retrofit for efficient API interactions and Room for robust local data storage. Repo: https://lnkd.in/duKmNefp
OpenAI has just shown that for all intents and purposes, we now live in the era of AGI. With its new o3 model announced today, OpenAI has scored an unbelievable 87.5% on the ARC Prize AGI eval. Until today, no other model or combination of models has been able to achieve over 46%. This represents a quantum leap in an area of human-like reasoning that has been extremely difficult for machine models to approach. https://lnkd.in/gm9XqM7g