Last Thursday marked the end of the 12 Days of Christmas at Patronus AI 🎄 In case you missed it, here's a recap of everything we announced ⬇️
Day 1: Automatic Failure Highlighting in LLM Outputs
Day 2: FinanceBench v1.1
Day 3: Adaptive Dataset Uploads
Day 4: 100 Prompt Injections
Day 5: Patronus Experiments
Day 6: Patronus Comparisons 2.0
Day 7: SOC-2 Type 1 Compliance
Day 8: Excessive Agency Test Suite
Day 9: 360 Degree Human Annotation
Day 10: Lynx 2.0
Day 11: Criteria Copilot
Day 12: Glider
More coming soon 👀 But for now, Merry Christmas! 🎅
Patronus AI’s Post
The open-source AI world is exploding! #Qwen2.5-Max > #DeepSeek > Meta. While the world is still reeling from the DeepSeek shock, yet another open-source model has arrived: Qwen2.5-Max, which, as the team puts it, "..achieves competitive performance against the top-tier models, and outcompetes DeepSeek V3 in benchmarks like Arena Hard, LiveBench, LiveCodeBench, GPQA-Diamond."
The emergence of DeepSeek V3 has drawn the whole AI community's attention to large-scale MoE models. Concurrently, we have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes. It achieves competitive performance against the top-tier models, and outcompetes DeepSeek V3 in benchmarks like Arena Hard, LiveBench, LiveCodeBench, GPQA-Diamond. 📖 Blog: https://lnkd.in/gUydnVsf 💬 Qwen Chat: https://chat.qwenlm.ai (choose Qwen2.5-Max as the model) ⚙️ API: https://lnkd.in/gH4Ny9YC (check the code snippet in the blog) 💻 HF Demo: https://lnkd.in/gE9qvf2t Going forward, we will not only continue scaling pretraining but also invest in scaling RL. We hope that Qwen is able to explore the unknown in the near future! 🔥 💗 Thank you for your support over the past year. See you next year!
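The post points to an API code snippet in the linked blog. As a rough, hedged illustration of what calling a chat model like this usually looks like, here is a sketch that only assembles an OpenAI-style chat-completions payload; the endpoint URL, model identifier, and parameters are placeholders of my own, not taken from the blog, so check the official snippet for the real values.

```python
import json

# Hypothetical sketch: many hosted LLMs expose an OpenAI-compatible
# chat-completions API. The endpoint and model name below are placeholders,
# NOT the real Qwen2.5-Max values -- see the code snippet in the blog.
API_URL = "https://example.invalid/v1/chat/completions"  # placeholder endpoint

def build_chat_request(prompt: str, model: str = "qwen2.5-max") -> dict:
    """Assemble an OpenAI-style chat-completions payload (not sent anywhere)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

payload = build_chat_request("Summarize the Qwen2.5-Max announcement.")
print(json.dumps(payload, indent=2))
```

A real client would POST this payload with an auth header; the sketch stops at payload construction so nothing here depends on network access or credentials.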
Hello, Qwen! Right after DeepSeek's rise, a new AI powerhouse enters the arena, this time developed by the Chinese giant Alibaba Group. While DeepSeek made waves, Qwen is stepping into the spotlight, outperforming its rival across multiple benchmarks. One thing is clear: geopolitics is fueling AI innovation and investment, much like the space race of the Cold War once did. The U.S. 🇺🇸 and China 🇨🇳 are going BIG, but is there room for Europe 🇪🇺 in this battle? Yet beyond competition, this signals a bigger shift: AI is evolving at an unprecedented pace, and the race is no longer just about SIZE. It's about intelligence, adaptability, and real-world impact. #AI #ArtificialIntelligence #LLM #Innovation #FutureOfTech
Presenting the AI Power Era - the Data to Decision event, brought to you by ATG Systems! ❤️ 🖤 Here is the Pilot! ⚡ #atgsystems #dataevent #itsolutionsprovider
GenAI is no stranger today, but if you want relevant results, you need to inject domain-specific context, for example by combining a feature store with a document store (RAG), to improve the responses of GenAI systems.
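The retrieval step that injects that context can be sketched in a few lines. This is a toy illustration only, assuming a plain in-memory document store and bag-of-words cosine similarity; a production system would use a vector database, a learned embedding model, and the feature store the post mentions.

```python
import math
from collections import Counter

# Toy "document store" for the RAG sketch (contents invented for illustration).
DOCS = [
    "The feature store holds precomputed user features for ranking.",
    "RAG retrieves relevant documents and injects them into the prompt.",
    "The billing service exports invoices as PDF files.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Inject retrieved context ahead of the user question."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG inject relevant documents?"))
```

The prompt built this way is what gets sent to the LLM; swapping `embed` for a real embedding model and `DOCS` for a vector store gives the shape of a production pipeline.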
Could it get any more dope than diving deep into fine-tuning embeddings for better retrieval? Absolutely not! This level of insight is unique to the AI Makerspace; no other space compares! 🚀 Thanks 👨🏫🤖 "Dr. Greg" Loughnane and Chris Alexiuk. #AIMakerspace #AI #RAG #MachineLearning
To better understand fine-tuning our embedding models, we need to embed ourselves in the embedding layer of our embedding model to embed… ERROR! 🚨 An infinite loop has been detected. #llm #rag #genai
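Joking aside, the objective typically used when fine-tuning embedding models for retrieval is an in-batch contrastive loss (InfoNCE-style): each query should score its paired positive document above every other document in the batch. The sketch below computes that loss on hand-made toy vectors; it is illustrative only and not the course's actual material or any specific library's API.

```python
import math

# Illustrative in-batch contrastive (InfoNCE-style) loss for retrieval
# fine-tuning. docs[i] is the positive document for queries[i]; every
# other document in the batch serves as a negative. Toy vectors only.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, docs, temperature=0.05):
    """Mean cross-entropy of each query against all in-batch documents."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)  # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # -log softmax prob of the positive
    return total / len(queries)

# Well-aligned query/document pairs give a lower loss than mismatched pairs,
# which is exactly the signal fine-tuning exploits.
queries = [[1.0, 0.0], [0.0, 1.0]]
docs_aligned = [[0.9, 0.1], [0.1, 0.9]]
docs_shuffled = [docs_aligned[1], docs_aligned[0]]
print(info_nce(queries, docs_aligned) < info_nce(queries, docs_shuffled))  # True
```

In practice the vectors come from the embedding model being trained, and gradients of this loss pull each query toward its positive and away from in-batch negatives.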
The new Mixtral-8x22B base model from Mistral AI is a total beast for fine-tuning and has produced some of the highest scores I've ever seen on chat benchmarks like IFEval and BBH 🤯 We teamed up with Argilla and KAIST AI to fine-tune it with a brand new recipe for Zephyr models 🪁: https://lnkd.in/dhAmb4c2 🧑🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO 🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences developed by our friends at Argilla: https://lnkd.in/drPCMX4S As usual, we are open-sourcing the training code in the Alignment Handbook for the community to build on: https://lnkd.in/dhDn9_fX This has been an epic speed run with Jiwoo Hong, Noah Lee, and Álvaro Bartolomé del Canto - now I can finally sleep 😂
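The core idea of ORPO can be sketched numerically: it adds an odds-ratio preference penalty on top of the plain supervised loss, so alignment happens in a single training stage. This is a hedged toy sketch, not the Alignment Handbook's actual implementation; `p_chosen` and `p_rejected` stand in for the model's length-normalized sequence likelihoods, which in reality come from the LLM's forward pass.

```python
import math

# Toy sketch of the ORPO objective: NLL on the chosen response plus a
# lambda-weighted odds-ratio term pushing the chosen response's odds above
# the rejected one's. Scalar probabilities stand in for model likelihoods.

def log_odds(p):
    """log(p / (1 - p)) for a probability p in (0, 1)."""
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(p_chosen, p_rejected, lam=0.1):
    """Supervised NLL on the chosen response + lambda * odds-ratio term."""
    nll = -math.log(p_chosen)  # the SFT part, no separate SFT stage needed
    ratio = log_odds(p_chosen) - log_odds(p_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    return nll + lam * l_or

# Assigning the chosen response higher likelihood than the rejected one
# lowers the loss, which is the preference signal ORPO optimizes.
print(orpo_loss(0.8, 0.2) < orpo_loss(0.5, 0.5))  # True
```

Because the preference term is just an extra penalty on the SFT loss, no reference model or separate SFT checkpoint is required, which is where the efficiency gain over DPO/PPO comes from.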
AI in Public Safety Panel session. Scan the QR Code to add this session to your calendar!
Built an AI video generator using the RunwayML API 🪄✨ Leveraged Retrofit for efficient API interactions and Room for robust local data storage. Repo: https://lnkd.in/duKmNefp
OpenAI has just shown that for all intents and purposes, we now live in the era of AGI. With its new o3 model announced today, OpenAI has scored an unbelievable 87.5% on the ARC Prize AGI eval. Until today, no other model or combination of models has been able to achieve over 46%. This represents a quantum leap in an area of human-like reasoning that has been extremely difficult for machine models to approach. https://lnkd.in/gm9XqM7g