Janusz Naklicki’s Post

Faster, more accurate, less data needed = BaseModel.AI

Jaroslaw Krolewski

synerise.com | basemodel.ai | cleora.ai | wislakrakow.com | agh.edu.pl

🔥 Another great example of BaseModel.ai power, created by Synerise. 😎

In May 2024, Meta #AI researchers posted a preprint to arXiv titled "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" [1]. The preprint introduced a novel recommender called 'HSTU' (Hierarchical Sequential Transduction Units), promising new state-of-the-art results on sequential recommendation tasks as well as scalability to exceptionally large datasets. #HSTU achieved significant improvements over the prior state of the art (SASRec) on all metrics across all datasets. #HSTU is yet another attempt at adapting (modified) #Transformers to generative recommendation, after #DeepMind's #TIGER model (benchmarked in a previous post).

While exact HSTU training and inference times are not reported, the model is based on a modified Transformer architecture. Meta AI's team has optimized the architecture significantly, allowing training 2-15x faster than Transformer++. Yet even with those optimizations, BaseModel's training and inference processes are orders of magnitude faster.

Explore the detailed comparison between BaseModel and Meta AI's HSTU for sequential #recommendations here: https://lnkd.in/d_6EXs6V
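For readers unfamiliar with the task: "sequential recommendation" means scoring the next item a user will interact with, given their ordered interaction history. The sketch below is not HSTU's actual architecture (see the paper for that); it is a minimal SASRec-style causal self-attention layer in NumPy, with hypothetical names, just to illustrate the Transformer-based next-item setup that HSTU modifies.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def next_item_scores(item_seq, item_emb, W_q, W_k, W_v):
    """Score every catalog item as the next item after `item_seq`.

    item_seq : 1-D array of item ids (the user's interaction history)
    item_emb : (num_items, d) item embedding table
    W_q, W_k, W_v : (d, d) query/key/value projections
    """
    x = item_emb[item_seq]                 # (T, d) embedded history
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    d = x.shape[-1]
    att = q @ k.T / np.sqrt(d)             # (T, T) attention logits
    # Causal mask: position t may only attend to positions <= t,
    # so the model is trained to predict the *next* item.
    mask = np.triu(np.ones_like(att, dtype=bool), k=1)
    att[mask] = -np.inf
    h = softmax(att) @ v                   # (T, d) contextualized states
    # Dot the final position's state against all item embeddings.
    return h[-1] @ item_emb.T              # (num_items,) next-item scores

# Toy usage with random weights (illustrative only).
rng = np.random.default_rng(0)
num_items, d = 50, 16
item_emb = rng.normal(size=(num_items, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
scores = next_item_scores(np.array([3, 17, 42]), item_emb, W_q, W_k, W_v)
print(scores.shape)  # one score per catalog item
```

A real system would stack several such layers, add position embeddings and feed-forward blocks, and rank the top-k scores; HSTU replaces this standard attention block with its own transduction unit to gain the reported speedups.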

BaseModel vs HSTU for sequential recommendations

sair.synerise.com
