Day 7 In GenAI: Exploring BERT's Core and Fine-Tuning

Day 7 of my Generative AI journey was packed with insights as I wrapped up the BERT paper, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al. I also delved into Umair Jamil's video, "BERT explained: Training, Inference, BERT vs GPT/LLM, Fine-tuning, [CLS] token", which added another layer of clarity to my understanding. (Check it out: https://lnkd.in/dGZg6mQw)

Unpacking the BERT Paper
The architecture and pre-training strategies in BERT are a masterclass in optimization and versatility. Here's what stood out:
→ [CLS] Token: Used for sentence-level tasks, this token aggregates information from the whole input sequence and is pivotal for classification heads.
→ Masked Language Modeling (MLM): Masking random tokens teaches BERT to predict words from their surrounding context, enabling a bidirectional understanding of language.
→ Next Sentence Prediction (NSP): A task that helps BERT model sentence relationships, making it effective for tasks like Q&A and summarization.

Insights from Umair Jamil's Video
Umair Jamil's explanation bridged the gap between theory and practice. He not only demystified BERT's architecture but also touched on:
→ Training Techniques: How large-scale datasets and compute are utilized for pre-training.
→ Fine-Tuning: The ease of adapting BERT for tasks like sentiment analysis and named entity recognition by tweaking the final layers.
→ Comparisons: Distinguishing BERT from GPT and other LLMs, emphasizing its pre-training methodology and bidirectional nature.

And that was it for today 🤷♂️

What's Next?
Now it's time to put theory into practice. I'll try implementing BERT tomorrow.

#GenAI #BERT_learnings
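The MLM corruption scheme from the paper (15% of positions selected; of those, 80% become [MASK], 10% a random token, 10% left unchanged) can be sketched in plain Python. This is a toy illustration, not the real BERT tokenizer pipeline; the token list and vocabulary are made up:

```python
import random

MASK_TOKEN = "[MASK]"
TOY_VOCAB = ["the", "a", "cat", "dog", "runs", "sleeps"]  # stand-in vocabulary

def mlm_mask(tokens, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption: ~15% of positions are selected for
    prediction; of those, 80% become [MASK], 10% a random token, and
    10% are left unchanged. Returns (corrupted, labels) where labels[i]
    holds the original token at selected positions and None elsewhere."""
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:          # position selected for prediction
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK_TOKEN)            # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted.append(rng.choice(TOY_VOCAB)) # 10%: random token
            else:
                corrupted.append(tok)                   # 10%: keep original
        else:
            labels.append(None)               # not part of the MLM loss
            corrupted.append(tok)
    return corrupted, labels
```

The 10% "keep original" case matters: at fine-tuning time no [MASK] tokens appear, so the model must not learn that only masked positions carry prediction targets.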
Firoj Paudel’s Post
More Relevant Posts
An interesting read for anyone who uses GPT every day. TLDR: Having the model "reread" the prompt once improves performance, so use a template like this: "Q: {Input Query}. Read the question again: {Input Query}." https://lnkd.in/eezqPa2f
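As a quick sketch, the template amounts to a one-line prompt formatter (the function name is mine, not from the linked paper):

```python
def reread_prompt(query: str) -> str:
    """Build a 're-read the question' prompt: state the query, then repeat it."""
    return f"Q: {query}. Read the question again: {query}"
```

Usage: `reread_prompt("Which weighs more, a kilo of steel or a kilo of feathers?")` yields a prompt in which the question appears twice before the model answers.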
🔍 Explore GPT-2 Right in Your Browser with Transformer Explainer 🧠 Ever wondered how transformer models like GPT-2 work under the hood? 💭 Transformer Explainer is an interactive tool that brings the architecture of GPT-2 to life, running directly in your browser. Built on a model derived from Andrej Karpathy's nanoGPT project and converted to ONNX Runtime, it allows for seamless in-browser execution. You can input prompts, see token probabilities, and visualize every computation across layers, all in real time. 🔥 For anyone serious about understanding transformers, this tool is a must-try. #ai #machinelearning #gpt2 #transformers #techtools #dataviz #deeplearning #transformer_explainer
OpenAI just dropped a bombshell: GPT O1 is live, and it's a game-changer 🔥. Quick breakdown of what matters: • 34% fewer major mistakes than preview • 50% faster response time • Full multimodal capabilities • New Pro tier ($200/month) with unlimited access The most interesting part? They've introduced "O1 Pro Mode" - essentially GPT-4 Turbo with maximum compute power for solving complex problems. Real numbers from their demo: → Competition math: significant boost over base O1 → Reliability metrics show measurable improvements → Multimodal analysis solving complex engineering problems For developers: API access is coming with structured outputs, function calling, and image understanding capabilities. Key takeaway: This isn't just an incremental update. OpenAI is clearly positioning this as their most capable model yet, especially for technical users. What intrigues me most is the decision to launch a Pro tier at $200. It signals a clear shift toward power users and professional applications. What's your take - is unlimited access to the most powerful AI worth $200/month for your business?
Developers can now fine-tune GPT-4o with custom datasets, boosting performance and accuracy for their specific needs. #AI #MachineLearning #TechUpdate
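For context, fine-tuning flows like this typically take chat-format examples uploaded as a JSONL file. A minimal sketch of preparing such data (the example content is invented, and this shows only the data format, not the upload/job API calls):

```python
import json

def chat_example(user_msg, assistant_msg,
                 system_msg="You are a helpful assistant."):
    """One training record in the chat fine-tuning JSONL format:
    a system/user/assistant message triple."""
    return {"messages": [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": assistant_msg},
    ]}

examples = [
    chat_example("What does BERT stand for?",
                 "Bidirectional Encoder Representations from Transformers."),
]
# JSONL: one JSON object per line, ready to upload as a training file.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
```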
Looking forward to "consulting companies" that start selling it, and most importantly customers that for some reason decide to buy it. Those use-cases will be worth a lot of our attention, unlike "gamechanger" demos like this. I don't believe that strategy advice is possible to imitate. And it is nothing more than imitation. It may look believable to someone on the outside, but it's still just a word salad with no understanding of context, problem, risks, analysis of alternatives, priorities, forks on the path, etc. If the situation is new and unique - there is no training data for "AI" to learn from. If the situation is not unique - congratulations, you'll get an average of strategies tried and implemented by your competition, just later than them. I don't think that is how you gain strategic advantage 😎 Humans can learn from very limited data. Humans can transfer knowledge and logic between industries, markets and cultures. And most importantly, humans can make judgement calls when it makes sense to do it and when it does not. Whether your current human consultants do that or not is a completely different conversation 😁 #AI #o1
I tested GPT o1 this morning…. Wow. Strategy has historically been one of the highest margin services a business can offer. But your strategy practice is about to get COOKED by agentic AI if you aren't ready to adapt. Advanced reasoning for all is here. Knowledge is no longer the bottleneck. In my testing of OpenAI's GPT-o1 preview, I gave it a MBA style business case along with a complete list of deliverables that a McKinsey team might typically create to address the case. (I created both of these synthetically with help from Claude 3.5 Sonnet, you can read the raw material in the blog linked below.) It thought for 125 seconds… Then, I kid you not, it wrote for 40 minutes straight. It produced 7 consecutive sections of the deliverable plan I requested. The material was well thought out, though it was light on the level of detail for each atomic element. But… This was done ZERO SHOT. I literally sent a single prompt with my context (albeit a well-structured request and well-structured context) and it put out this whole response in one go. With additional rounds of iteration with the user or longer inference time on the reasoning steps, this strategy is rapidly filled out in a more robust capacity. I started screen recording at Deliverable 2 because I know how the new foundational model releases go. They’re a bit unhinged / loose around the edges, so you’ll get some very novel and unique behavior from a request that may then get cut off, canceled, deleted, or altered by the filtering system retroactively. 40 mins in it finally triggered a “hmm something’s wrong” response, stopped writing, then deleted its response. If you want to see the full text of what was produced up to the cutoff, read my blog here: https://lnkd.in/ga4JqNf2 #AI #agents #agenticengineering #o1 #gpto1 #innovation #disruption #artificialintelligence #chainofthought #reasoning
Critical review of the following post, which claims that GPT-o1 will generate strategic plans that will "cook" strategy consulting companies. Note: I didn't even try hard. Summary: GPT-o1 read the provided business situation document, which was written to set up the output like placing a ball on a tee, and parroted it back as a "strategic outline." GPT is still a stochastic parrot 🦜 Fact: Imitation is not a strategy. If EnergyX is willing to spend 16 weeks--a third of a year--and millions of dollars to be told that they need to play catch-up with their competitors, there's no hope for them. Given their competitors aren't as vacuous as EnergyX, they will continue to innovate rather than sit on their hands as did EnergyX. How did they ever succeed as a startup? And 15% is not good to begin with, unless every competitor has no more than 10-15% of the market. How will this draw and retain high AI/ML talent? "We're thinking they are really smart," I imagine. Any C-suite that would take this path and pay for it is already looking for their parachute packages, and exchanging this engagement for references to their next McKinsey-friendly executive position. Sorry. I flew Frontier today and I'm in no mood for more bad business pitches. But thanks for giving me an easy challenge for my weekend cool-down. Bonus: I'm pretty sure Musk wouldn't be happy with the name.
Companies won't use AI for strategy. But consultancies will. If you're an overworked consultant on 14h grinds, why wouldn't you use it? The big consultancies might still hesitate on this. But if you are a smaller one, you can now compete with the big ones on deliveries. Hundreds of pages on strategy paper? No problem. And if you think strategy consulting is about deliveries... you don't know consulting work.
In today's bold, new AI world, the almost daily release of "new" AI models is becoming the norm. This rapid innovation is a natural part of the technology cycle, driving progress and opening up unprecedented possibilities. From a market perspective, this is beneficial, offering us a plethora of choices. However, we must choose wisely. The Diffusion of Innovations theory explains this well: these advancements spread through society in stages, from early adopters to the majority, and finally to the laggards. Let's stay curious, adaptable, and ensure we harness this technology responsibly, without sacrificing privacy, security, or the intrinsic value of human beings. Together, we can shape a future where technology and humanity thrive in harmony. #AI #Data #Innovation #TechCycle #FutureReady #EthicalAI
AI Developer Experience at Google DeepMind 🔵 prev: Tech Lead at Hugging Face, AWS ML Hero 🤗 Sharing my own views and AI News
DeepSeek V3 beaten! Ai2 released Tülu 3 405B, an open-source post-trained model built on Llama 3.1 405B. It outperforms DeepSeek V3, the base model behind DeepSeek R1, and is on par with OpenAI GPT-4o. 👀 Tülu 3 405B is not a reasoning model like R1 or o1, but the team shared that they are working on one! 🔥 Tülu Recipe:
1️⃣ Data Curation: Collect a diverse mix of public datasets and synthetic data using persona-driven methods, focusing on core skills like reasoning, coding, and safety.
2️⃣ Supervised Finetuning (SFT): Train initial models on a carefully curated mix, iteratively refining the mix and decontaminating against evaluation datasets.
3️⃣ Preference Tuning (DPO): Optimize models using Direct Preference Optimization with a mix of on-policy (SFT vs. other models) and off-policy data.
4️⃣ Reinforcement Learning with Verifiable Rewards (RLVR): RL (PPO)-based optimization on skills with automatically checkable rewards, e.g. math and instruction following.
Training code, datasets, and model weights are all released: Code: https://lnkd.in/eNmZS7aP Models: https://lnkd.in/e7wyGd6j Paper: https://lnkd.in/eXR2xZKD
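To make the DPO step concrete: the per-pair objective is -log σ(β·margin), where the margin compares the policy-vs-reference log-probability ratios of the chosen and rejected responses. A toy sketch on scalar summed log-probs (illustrative only, not the Tülu training code):

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss on summed log-probs: -log sigmoid(beta * margin),
    where margin = (policy - reference) log-ratio for the chosen response
    minus the same log-ratio for the rejected response."""
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference, the margin is 0 and the loss is log 2; as the policy shifts probability mass toward the chosen response relative to the reference, the loss falls below log 2.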
Mocking LLMs for not getting certain tasks right is like mocking knives because a steak knife can't saw a tree. We're talking about an early technology that's existed for 2.5 years. A technology that has already surpassed 99.9% of humans on many expert tasks when it comes to quality: legal comprehension, protein folding calculations, maths Olympiad questions, data foraging, even creative ventures like ideation in physics or copywriting (when done well). This is the baby technology. GPT 2/3 was the flint, GPT 4 is perhaps the bronze age. We've not put all the heft or knowhow into making the industrial bandsaw yet. We'll get there. It might not be LLMs, but it'll be something. We're still seeing big progress as we integrate ideas from cognitive science, and I suspect we will continue to see these gains at least for the next six months. GPT o1 integrates Chain of Thought (metaprompting - humans do this by asking "and what next") into the existing model, which gives a huge uptick in efficacy on reasoning problems. And that happened just by changing the way we trained the model, not even changing its architecture significantly. Software engineers, lawyers, researchers, and even some doctors are already using these tools to great effect. We need to stop thinking 'if' and start asking 'therefore what?'.
Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer
2mo
BERT's bidirectional nature is key to nuanced understanding. Fine-tuning for specific tasks unlocks its practical potential. How do you envision leveraging BERT's contextual embeddings for zero-shot learning?