🚀 DeepSeek-R1: A Paradigm Shift in LLM Reasoning! The AI landscape just witnessed a major breakthrough! DeepSeek-R1, a revolutionary Large Language Model (LLM), has proven that pure Reinforcement Learning (RL) can significantly enhance reasoning capabilities without a single byte of supervised fine-tuning data. Github Repo: https://lnkd.in/g9SyDsdy I decided to run it locally on my system and test its reasoning with a fun query running: “How many ‘r’ are there in the word strawberry?” 🍓 DeepSeek-R1 responded with a fascinating chain of thought, refer to the image below. This demonstrates the model’s ability to reason step-by-step, showcasing the power of RL-driven training in handling language tasks with ease! 🔥 Why Is DeepSeek-R1 Special? 💡 Zero Supervised Data, 100% RL Forget traditional supervised fine-tuning—DeepSeek-R1-Zero evolved entirely through RL, improving itself over multiple iterations. 📊 Crushing Benchmarks Across Domains: •🧮 Mathematics: Achieved a stunning 97.3% on MATH-500, surpassing many top-tier models. •💻 Coding: Scored an impressive 96.3 percentile on Codeforces, demonstrating expert-level coding skills. •🧠 General Reasoning: Excelled across diverse logic and reasoning benchmarks. ❤️ Open-Source Power DeepSeek-AI has generously open-sourced versions of the model, ranging from 1.5B to 70B parameters, giving the AI community access to cutting-edge reasoning capabilities. This is a game-changer in AI research. Smaller models achieving remarkable feats through knowledge distillation show that size isn’t everything! Exciting times ahead 🚀 #AI #MachineLearning #DeepLearning #LLM #GenAI #AIAgents #aiagents #ReinforcementLearning #DeepSeekR1 #OpenSource
Shivansh Srivastava’s Post
More Relevant Posts
-
If you're tackling the Traveling Salesman Problem (TSP) or interested in challenging optimization problems, take a look at our latest publication. We've developed an open-source tool that simplifies generating large-scale TSP datasets. It's ideal for those exploring modern machine learning methods (like LLMs) or benchmarking traditional algorithms. This work is part of a larger project where we’re exploring how Large Language Models (LLMs) can be applied to combinatorial optimization problems. While there’s a perception that LLMs aren’t suited for NP-hard problems, we’re pushing the boundaries to see what they can achieve. We’ll be sharing the results of this project in a future publication. Key highlights: - Flexibility: Define your own problem scenarios and generate the data you need. - Scale: From thousands to millions of instances, this tool helps you generate datasets that support large-scale projects. - Accessibility: Easy to use, with pre-solved datasets for benchmarking. Check out the full publication here: https://lnkd.in/gdb43EKJ Share your thoughts and stay tuned for more updates on our LLM project! #AI #MachineLearning #Optimization #TSP #DataGeneration #LLMs #CombinatorialOptimization #ShareYourAI #ReadyTensor
To view or add a comment, sign in
-
-
Scaling Laws for Pre-Training Agents and World Models The paper explores how scaling (i.e., increasing model parameters, dataset size, and compute resources) affects the performance of embodied AI agents, specifically in behavior cloning and world modeling tasks. 🔹Introduction and Motivation The research aims to understand the impact of scaling on pre-trained embodied agents, similar to how scaling laws have been studied extensively for large language models (LLMs). ◾◾The focus is on two tasks: ◾World Modeling (WM): Learning to predict future observations based on past sequences. ◾Behavior Cloning (BC): Learning to mimic actions taken by humans or agents in specific environments. 🔹Scaling Laws ◾◾The paper demonstrates that scaling laws similar to those in LLMs apply to both world modeling and behavior cloning. ◾◾Power Laws are used to establish relationships between model size, dataset size, and training compute (FLOPs). These power laws allow researchers to predict the optimal model and dataset sizes for a given compute budget. 🔹Key Findings ◾◾World Modeling: The study finds that optimal scaling in world modeling depends heavily on the tokenizer used. A tokenizer with a higher compression rate (more tokens per observation) shifts the optimal trade-off toward larger model sizes. ◾◾Behavior Cloning: ◾When using tokenized input observations, the scaling laws skew towards favoring more data over larger models. ◾However, when using a CNN-based architecture (i.e., where observations are encoded as continuous embeddings), the scaling laws shift towards favoring larger models over more data. 🔹 Conclusion ◾◾This study establishes that scaling laws for pre-training can be extended beyond language models to embodied AI tasks like world modeling and behavior cloning. ◾◾The findings help optimize resource allocation for training agents in complex environments, providing guidelines on how to balance model size and dataset size. #GenAI #AI #LLM #datascience #machinelearning #Scaling #RAG #CNN Reference : https://lnkd.in/gkFeJacf
To view or add a comment, sign in
-
Unlock the potential of time series data with LLMs! Here's how Large Time Series Models (LTSM) are revolutionizing forecasting: - 📊 Harnesses vast data like LLMs for comprehensive learning - 🧩 Tokenizes time series data into symbolic representations - 🔧 Fine-tunes pre-trained models to avoid overfitting - 🌐 Diverse datasets improve model generalization #AI #TimeSeries #DataScience - 🛠️ Reprogramming LLMs involves steps like tokenization, model selection, and prompt engineering. - 📉 Statistical prompts outperform text prompts for time series modeling. - 📈 Medium-sized models excel in short-term forecasting, while diverse datasets enhance overall performance. - 📚 Includes open-source frameworks for benchmarking and reprogramming your own LTSM. https://lnkd.in/gd24uuWf
To view or add a comment, sign in
-
Good overview of the newest #LLM #Deepseek. Kore.ai allows businesses to ingest any LLM into our platform to assist with complex problems and processes. The choosing of the right platform to accommodate the customer or employee experience is critical to great outcomes. #thefutureisnow #aiforprocess #PartnerPowered Carl Katz Amy Jeschke
Language Models, AI Agents, Agentic Applications, Development Frameworks & Data-Centric Productivity Tools
𝗗𝗲𝗲𝗽𝗦𝗲𝗲𝗸-𝗥𝗲𝗮𝘀𝗼𝗻𝗲𝗿 for Multi-round Conversation management. In each round of the conversation, the model outputs the CoT (reasoning_content) and the final answer (content). In subsequent rounds of the conversation, the CoT from previous rounds is not concatenated into the context, as illustrated in the following diagram:| 🚀 𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗳𝗼𝗿 𝗣𝗿𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 Some tasks involve complex problems that demand logical rigor and step-by-step analysis. DeepSeek-Reasoner is a Language model engineered to excel in tasks requiring advanced reasoning, problem-solving & structured decision-making. 𝗪𝗵𝗮𝘁 𝗦𝗲𝘁𝘀 𝗗𝗲𝗲𝗽𝗦𝗲𝗲𝗸-𝗥𝗲𝗮𝘀𝗼𝗻𝗲𝗿 𝗔𝗽𝗮𝗿𝘁? 🔹 𝘚𝘱𝘦𝘤𝘪𝘢𝘭𝘪𝘴𝘦𝘥 𝘓𝘰𝘨𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯𝘨: Unlike general-purpose models, DeepSeek-Reasoner is fine-tuned for tasks like mathematical proofs, code debugging, scientific research & multi-step decision-making. It breaks down problems into clear, logical steps—not just final answers. 🔹 𝘉𝘦𝘯𝘤𝘩𝘮𝘢𝘳𝘬 𝘋𝘰𝘮𝘪𝘯𝘢𝘯𝘤𝘦: Outperforms leading models (e.g., GPT-4, Claude) on reasoning-focused benchmarks like MATH (competition-level math) and Codeforces coding challenges. 🔹𝘓𝘰𝘯𝘨-𝘊𝘰𝘯𝘵𝘦𝘹𝘵 𝘔𝘢𝘴𝘵𝘦𝘳𝘺: Handles 32k-token contexts, retaining critical details across lengthy dialogues or technical documents. Perfect for parsing research papers or multi-file codebases. 👉 Explore DeepSeek-Reasoner on Hugging Face. 👉 Read what Clem Delangue 🤗 is writing, and also Aravind Srinivas. 👉 DeepSeek AI documentation & API Guides: https://lnkd.in/dpaa-9W7 👉 Share how you would leverage a reasoning-first AI or any questions in the comments! #AI #MachineLearning #TechInnovation #DataScience #STEM #DeepLearning DeepSeek AI
To view or add a comment, sign in
-
-
Recently, LLMs have achieved impressive feats. 👇 𝟵𝟳% 𝗮𝗰𝗰𝘂𝗿𝗮𝗰𝘆 in identifying causal directionality (GPT-3, PaLM). 𝟵𝟮% 𝗮𝗰𝗰𝘂𝗿𝗮𝗰𝘆 in counterfactual reasoning (GPT-4). 𝟴𝟲% 𝗮𝗰𝗰𝘂𝗿𝗮𝗰𝘆 in necessary cause identification (GPT-4). We see a standout area: causal reasoning with Large Language Models (LLMs) like GPT-3 and GPT-4. These statistics translate to better predictive outcomes, risk assessments, and actionable insights. For businesses, this means enhanced decision-making precision and speed And you know what? Their capabilities extend significantly when combined with specialized algorithms and structured knowledge. Algorithms refine LLM insights by... - enabling structured, multi-step causal analysis - optimizing the processing of causal relationships - maintaining logical consistency across decisions External knowledge sources like knowledge graphs provide context and structure missing from LLM training datasets by... - offering a structured schema of relationships and dependencies - serving as a reference to confirm the empirical validity of the LLM’s inferences - supplying historical and environmental details that LLMs might overlook The result? A robust system capable of complex causal reasoning. #dataanalytics #ai #technology #data #artificialintelligence 𝗟𝗼𝗼𝗸𝗶𝗻𝗴 𝘁𝗼 𝗰𝗼𝗹𝗹𝗲𝗰𝘁 𝗱𝗮𝘁𝗮 𝘁𝗼 𝗯𝘂𝗶𝗹𝗱 𝘆𝗼𝘂𝗿 𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗴𝗿𝗮𝗽𝗵𝘀? Nimble 's AI-powered web data access solutions have won the trust of leading data teams at Uber, Pinterest, Discord and more. ➡️ https://lnkd.in/db9iPu-D
To view or add a comment, sign in
-
-
🏆 LLM Models Showdown: Open Source vs Closed Source - Who’s Leading the Race? The race for dominance in the world of Large Language Models (LLMs) is heating up! With the latest benchmarking results, it's clear that both Open Source and Closed Source models are fighting neck-and-neck in various categories of performance. 🚀 🔓 Open Source LLMs like NVLM-D, Llama 3-V, and InternVL2 are making huge strides in areas like document understanding, VQA, and diagram comprehension. On the other hand, 🔐 Closed Source LLMs like GPT-4V, Claude 3.5, and Gemini 1.5 Pro are excelling in specific areas like OCR and chart-based QA. Let’s break it down: 📊 DocVQA: Open Source LLMs edge out the competition with NVLM-D and Llama series. InternVL2 tops with 94.1. 📈 ChartQA: Claude 3.5 Sonnet (Closed Source) shines with 90.8, slightly ahead of the Open Source leaders. 🤔 Text-based understanding (MMMLU, GSM8K, HumanEval): Both camps are strong, but Claude 3.5 edges the win. The future of AI development is at a crossroads between open collaboration and proprietary advancements. Which side are you on? 🧐 🔗 Dive deeper into the detailed performance comparison with this in-depth analysis! 👇 (CHECK COMMENTS) #AI #MachineLearning #LLM #OpenSourceAI #ClosedSourceAI #GPT #Llama #ClaudeAI #GeminiPro #DataScience #AICompetition #ArtificialIntelligence
To view or add a comment, sign in
-
-
Self-Taught Reasoner (STaR) employs an iterative method of rationale generation, correction, and fine-tuning to enhance a model's reasoning abilities. This approach enables models to achieve performance on par with much larger counterparts on tasks like GSM8K and CommonsenseQA, as demonstrated in a 2022 study. Implementation Steps: 1: Begin with a large language model (such as GPT-J) and compile a small dataset containing examples with reasoning rationales (e.g., for math problems). 2: Use few-shot prompting to generate rationales and answers for a broad set of problems from the dataset. 3: For incorrect answers, re-prompt the model to generate a rationale based on the correct answer (referred to as providing a "hint"). 4: Fine-tune the model using both the initial correct rationales and the newly corrected examples. 5: Repeat steps 2-4 through several iterations (typically 30-40 as per the study) to gradually improve the model's reasoning performance. Key Insights: - STaR allows models to learn from their mistakes by generating rationales, even when initial answers are incorrect. - The corrected rationales play a critical role in the learning process. - The iterative approach improved GPT-J's performance from 5.8% to 10.7%. - This method reduces the reliance on large, manually labeled rationale datasets, offering a more scalable approach to improving reasoning. Although the paper is from 2022, the technique remains relevant and effective today. You can find the paper here: https://lnkd.in/e-dUyyth #AI #MachineLearning #NaturalLanguageProcessing #ReasoningModels #SelfTaughtReasoner #STaRMethod #FewShotLearning #IterativeLearning #ModelOptimization #DeepLearning #AIResearch #RationaleGeneration #FineTuning #GPTJ #AIInnovation #CommonsenseQA #GSM8K Umar Iftikhar
To view or add a comment, sign in
-
Future of Transformers State machines excel at managing deterministic situations. Advancing from this foundation, State Space Models (SSMs) take the baton and introduce probabilistic elements, making them adept at handling dynamic systems that change continuously, much like the stock markets. SSMs are well-suited for applications in control systems, signal processing, and increasingly in machine learning for modeling sequences. They are particularly valuable for handling time series data and are commonly employed in modeling physical systems, financial forecasting, weather prediction, and other scenarios requiring predictions over time under conditions of uncertainty. SSMs are capable of managing long-term dependencies within data. When integrated with attention mechanisms, they can significantly enhance model performance. Specifically, Large Language Models (LLMs) built with SSM frameworks can process extensive temporal text or data effectively. The use of Sliding Window Attention allows these models to focus on important data segments that may not follow straightforward patterns by concentrating on smaller data windows to identify complex relationships. Adding Multi-Layer Perceptrons (MLPs) enables these models to perform complex computations, enhancing their ability to understand and remember factual information. This forms the core of the SAMBA architecture, which excels in tasks requiring deep language understanding and generation, such as responding to questions or writing code. SAMBA's efficiency in managing longer text passages without losing track of earlier content makes it particularly effective for complex language tasks that involve substantial context. This efficiency marks a significant progression in natural language processing, promising improved performance in practical applications where large-scale text comprehension is essential. The potential of SAMBA extends beyond text; its architecture can also be adapted for analyzing numbers, speech, and video. By incorporating cross-attention, SAMBA can detect patterns across different modalities, broadening its applicability and enhancing its analytical power. This versatility points to a promising direction for future research and application across diverse data types. Innovation Hacks AI #generativeai #ai #llm Link to the paper:https://lnkd.in/dTKM-x77 Github: https://lnkd.in/dGEZ3XYb
To view or add a comment, sign in
-
🇮🇳 | AI/ML Fanatic | Ex- AI Developer @DIRO | Ex- AI/ML Developer Inter @MonsterAPI | Ex-AI Intern @MetaGeeksTechnologies | Ex-Project Intern @STMicroElectronics |
1moThe AI world is evolving at an alarming pace