This article by Pere Martra explores a structured pruning technique for state-of-the-art models that use a GLU architecture, enabling the creation of smaller, more efficient large language models. #Llama #LLM
Towards Data Science’s Post
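For intuition about the technique the article describes, here is a rough, hypothetical PyTorch sketch of structured pruning on a GLU-style MLP block (gate/up/down projections, as in Llama-family models). The function name and the L1-norm importance score are illustrative assumptions, not the article's actual code.

```python
import torch
import torch.nn as nn

def prune_glu_mlp(gate_proj: nn.Linear, up_proj: nn.Linear,
                  down_proj: nn.Linear, keep_ratio: float = 0.8):
    # Score each intermediate neuron, here by the L1 norm of its gate weights
    # (an assumed criterion; other importance measures are possible).
    importance = gate_proj.weight.abs().sum(dim=1)
    k = int(importance.numel() * keep_ratio)
    keep = torch.topk(importance, k).indices.sort().values

    # Remove whole rows of the gate/up projections and the matching columns
    # of the down projection, so the pruned block stays shape-consistent.
    new_gate = nn.Linear(gate_proj.in_features, k, bias=False)
    new_up = nn.Linear(up_proj.in_features, k, bias=False)
    new_down = nn.Linear(k, down_proj.out_features, bias=False)
    new_gate.weight.data = gate_proj.weight.data[keep].clone()
    new_up.weight.data = up_proj.weight.data[keep].clone()
    new_down.weight.data = down_proj.weight.data[:, keep].clone()
    return new_gate, new_up, new_down
```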
More Relevant Posts
-
How DeepSeek Achieves Top Performance with Innovative Techniques DeepSeek leverages a Mixture of Experts (MoE) architecture and advanced training methods to outperform even more expensive models. By dynamically routing tasks to specialized "experts," it maximizes efficiency and accuracy. Want to learn more? Check out the details here: https://hubs.la/Q0347b7q0 #AI #MachineLearning #DeepSeek #Innovation #Tech
How DeepSeek uses a Mixture of Experts architecture and other training techniques to outperform more expensive models 👇 https://hubs.la/Q0347b7q0
DeepSeek-V3 Redefines LLM Performance and Cost Efficiency
deeplearning.ai
-
Its Mixture of Experts architecture allows DeepSeek to attain a new level of learning and reasoning.
-
DeepSeek leverages Mixture of Experts (MoE) to dynamically activate only a subset of the model for specific tasks, significantly reducing the compute load. In this architecture, input data is routed to the most relevant experts—specialized sub-networks within the model—based on task requirements. Rather than engaging the entire model, only a small fraction of its parameters are used during computation, optimizing resource utilization while maintaining accuracy. This approach allows DeepSeek to efficiently scale complex AI workloads, delivering high performance with reduced computational costs and energy consumption. The world of AI is lighting up.
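To make the routing idea concrete, here is a minimal, illustrative top-k MoE layer in PyTorch. It is not DeepSeek's implementation (which adds shared experts, load balancing, and other refinements); it only shows the basic mechanism of scoring experts per token and running just the top k of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer: a router scores every expert
    and only the k best are evaluated per token, so most parameters stay idle."""
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, ids = torch.topk(scores, self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = ids[:, slot] == e        # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```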
-
DeepSeek: indeed a breakthrough, showing that we can run inference with lower compute and power, along with a better training methodology using MoE. While this could spawn an offshoot of LLMs built on similar techniques, what matters most is how widely DeepSeek will be adopted in enterprise applications. OpenAI through Microsoft Azure, Gemini through Google Cloud, Claude through Amazon Web Services (AWS), and others have seen stronger adoption because there is a strong sense of guarantee around data security. It's only a matter of time before we have something similar, or perhaps better, that can be adopted faster and more securely. This also shows how much our Wall Street guru(s) understand technology. 😊😎 The trader-vs-investor dilemma, or should I say the FOMO syndrome or fear of apocalypse... 🙄 #generativeai #openai #aws #microsoft #google
-
This article provides great insights into how DeepSeek was able to train a better model using the Mixture of Experts technique. However, as correctly pointed out in the article, even the people who trained DeepSeek with MoE techniques could not fully explain why costs fell so dramatically. The implications of DeepSeek's findings are huge: it means we may now be able to train a better model for roughly 5 million dollars instead of 20 million or more.
-
DeepSeek-V3, a new model from Hangzhou upstart DeepSeek, delivers outstanding performance and may change the equation for training costs. This open large language model outperforms Llama 3.1 405B and GPT-4o on key benchmarks and achieves exceptional scores in coding and math. The model uses a mixture-of-experts (MoE) architecture, which allows it to process inputs more efficiently by activating only a subset of its parameters. Trained on roughly 15 trillion tokens, DeepSeek-V3 demonstrates superior performance across a range of tasks while maintaining cost efficiency. 🐋 Read the full article here.
-
An insightful article highlighting DeepSeek-V3's impressive performance, as it outperforms Llama 3.1 405B and GPT-4o in coding and math tasks.
-
I'm excited to share my latest article on "What’s Behind the Architecture of FAISS in Semantic Search"! In this deep dive, I explore the inner workings of Facebook AI Similarity Search (FAISS) and its role in powering semantic search, a key innovation driving modern AI and machine learning applications. From vector embeddings to efficient similarity retrieval, this article breaks down the core concepts, architecture, and use cases of FAISS. Check it out to understand how FAISS makes large-scale search faster and smarter! 🔗 #AI #SemanticSearch #FAISS #MachineLearning #ArtificialIntelligence #NLP #DataScience #TechInnovation Pooja Porchezhian
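Since the post covers vector embeddings and similarity retrieval, here is a minimal FAISS sketch of the core idea. Random vectors stand in for real sentence embeddings, and the exact IndexFlatL2 stands in for the approximate index types (IVF, HNSW, etc.) that large-scale deployments typically use; treat the dimensions and sizes as illustrative assumptions.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 384                                                # embedding dimension, e.g. from a sentence encoder
rng = np.random.default_rng(0)
corpus = rng.random((10_000, d)).astype("float32")     # stand-in for document embeddings

index = faiss.IndexFlatL2(d)                           # exact L2 search; approximate indexes trade recall for speed
index.add(corpus)

query = rng.random((1, d)).astype("float32")           # stand-in for an encoded query
distances, ids = index.search(query, k=5)              # top-5 nearest neighbours
print(ids[0], distances[0])
```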
What's Behind the Architecture of FAISS in Semantic Search
link.medium.com
-
ModernBERT: Revisiting BERT in the Era of LLMs and Generative AI
LightOn and Answer.ai have reimagined BERT for the modern AI landscape, introducing ModernBERT—a powerful update with cutting-edge architecture, enhanced capabilities, and remarkable performance improvements.
Key Highlights:
- Model Variants: Available in two sizes—Base (139M parameters) and Large (395M parameters).
- Superior Performance: Achieves better results across all metrics compared to the original BERT and RoBERTa models.
- Extended Context Length: Supports an impressive 8,192-token context length, 16 times greater than the original BERT.
- Advanced Architecture: Incorporates Flash Attention 2, Rotary Positional Embeddings (RoPE), and alternating attention mechanisms for efficient processing.
- Massive Training Dataset: Trained on 2 trillion tokens, primarily in English and programming languages.
- Faster Inference: Delivers 2-4x speed improvements when handling mixed-length inputs.
- Open-Source License: Released under the Apache 2.0 license for broader community access and usage.
- Easily Accessible: Available on Hugging Face and integrated into the Transformers library (see the usage sketch below).
For more details:
Models: https://lnkd.in/ethiJ2xh
Blog: https://lnkd.in/ebiEzb4P
Paper: https://lnkd.in/ezR8MUBF
#ModernBERT #AI #NLP #MachineLearning #GenerativeAI #Transformers #BERT #LLMs #DeepLearning #ArtificialIntelligence #HuggingFace #FlashAttention #OpenSource #Research #Code #DataScience Umar Iftikhar
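As a minimal usage sketch of that Transformers integration: the snippet below assumes a recent transformers release (4.48+) that ships ModernBERT support and uses the answerdotai/ModernBERT-base checkpoint from Hugging Face for masked-token prediction.

```python
from transformers import pipeline

# Fill-mask with ModernBERT (assumes transformers >= 4.48 and internet access
# to download the checkpoint on first run).
fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
predictions = fill_mask("The capital of France is [MASK].")
for p in predictions[:3]:
    print(p["token_str"], round(p["score"], 3))
```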
ModernBERT - a answerdotai Collection
huggingface.co
-
Thanks Tushar Madaan for pointing me to Claude.ai, loving the mindmaps you can build! This one is about what may constitute architecture significance: https://lnkd.in/e_WTAusK
Claude
claude.ai