I am very happy that after my Machine Learning PhD at MIT, Rajat Dandekar, Sreedath Panat and I returned to India and started Vizuara. It gives me an opportunity to make posts like these:

At 6pm today, we are launching our next YouTube playlist: “Build DeepSeek from Scratch”

This series is a part of our vision to make AI accessible to all.

Ever since DeepSeek was launched, everyone has been focused on:
- Flashy headlines
- Company wars
- Building LLM applications powered by DeepSeek

I very strongly think that students, researchers, engineers and working professionals should focus on the foundations.

The real question we should ask ourselves is: “Can I build the DeepSeek architecture and model myself, from scratch?”

If you ask this question, you will discover that a number of key ingredients make DeepSeek work:
(1) Mixture of Experts (MoE)
(2) Multi-head Latent Attention (MLA)
(3) Rotary Positional Encodings (RoPE)
(4) Multi-token prediction (MTP)
(5) Supervised Fine-Tuning (SFT)
(6) Group Relative Policy Optimisation (GRPO)

Our aim with the “Build DeepSeek from Scratch” playlist is:
- To teach you the mathematical foundations behind all 6 ingredients above.
- To code all 6 ingredients above, from scratch.
- To assemble these ingredients and run a “mini DeepSeek” on your own.

After this, you will be among the top 0.1% of ML/LLM engineers who can build DeepSeek ingredients on their own.

This playlist won’t be a 1-hour or 2-hour video. It will be a mega playlist of 35-40 videos with a total duration of 40+ hours. It will be in-depth. No fluff. Solid content.

Join us for the 6pm premiere here: https://lnkd.in/gRcNE-sg

P.S.: Attached is a small GIF showing the notes we have made. This is just 5-10% of the total amount of notes and material we have prepared for this series!
Hi Raj, we are definitely on the same page re: the big picture. The AI/GPU arms race will not end well and, yes, people continue to be excluded from the equation. I'd love to talk to you about a novel discovery I documented the other day: multi-agent reasoning workflows within a single task model, including iterative refinement loops before a query is even generated. I have a model in production that can assess complex home-buying scenarios through psychological, financial, appraisal, and lending lenses, with risk checks, all within a single query, backtested every which way I know how. The unlock for this is layers of natural-language "directives" and expansions of system prompts through the documents themselves. It is lightning in a bottle, achieved through perfect alignment of IVFFlat, hybrid-search RAG, and a cross-encoder reranker. There are of course many other requirements, but they are not technical ones... Given your proficiency navigating the internals of DeepSeek, though, I have no doubt that you would be able to advance this in ways well beyond mine (I am merely self-taught through my structured knowledge). Thanks!
Completely agreed with the approach and focusing on fundamentals. It always helps to visualize with clarity.
Brilliant! Raj Abhijit Dandekar, we will carry your mission forward at the Alchemist Club Studios (www.alchemistclubstudios.com). The Alchemist Apprentices are learning neural networks from scratch.
I'm looking forward to this! There may come a time when we can train such advanced reasoning models on our laptops; let's hope that time is not very far 😀
This is a great initiative 👍 Mathematical concepts are so important. We need educators like you 👏
Finally, I have been searching for something like this ever since it came out! Looking forward to the series!
Dr. Raj, Thank you very much. I am with you. I am proud of you. I will study your material rigorously and will do coding. With regards, Dr. Nirmalya Sen Phone number: 8584934379
From scratch as in no transformers? No trl library, pure PyTorch if not pure Python/NumPy?
Are the videos available? I can't find any video where the coding starts.
This is an incredible initiative, truly focusing on the foundational aspects of AI and deep learning. Building DeepSeek from scratch will not only deepen our understanding but also empower us to innovate and create new solutions. Looking forward to diving into the mathematical foundations and coding journey with your mega playlist. Thank you for making AI more accessible and empowering us to be part of the top 0.1% of ML/LLM engineers.