Tero Keski-Valkama’s Post

To get the elephant out of the way first: the graph says no such thing. The y-axis is "attention on RL (reinforcement learning)", not any indicator of the strength or intelligence of the models.

I am now seeing a lot of posts claiming that "RLHF is barely RL" and that we should make the models "more RL". This is partly true but subtly incorrect. What we want is more search, not sparse scalar rewards. By search I obviously mean the algorithmic sense, not the "search engine" sense: search is about exploring the solution space to find satisfactory or best possible solutions. RL means using search to optimize sparse scalar reward feedback.

RL in the classical sense doesn't scale, because any reasonable reward signal isn't informative enough to learn about the world. That's why RL systems generally rely on all sorts of hacks, from proxy reward functions, like intrinsic rewards and differentiable critics, to world models that learn by self-supervision. Over the history of RL, we have actually done less and less RL and more and more self-supervised learning, to get more information and training feedback from the world. We have also swapped naive rewards for depictions of goal states (for controllability) or for preferences (e.g. DPO, which sidesteps the impossible challenge of defining a good reward signal).

May I also point out how terribly bad an idea it is to give a single reward objective to a highly intelligent entity? It will make a huge mess, no matter how smart you think you were when you set it to be rewarded by profit or by closed sales. It will not only make a mess; it will also lead to a monomaniacal specialist AI, not a generalist one. It will be weak and highly exploitable by AIs not encumbered by the same dysfunctions.

What we need is more search over open-ended, non-imitative objectives, not "more RL".
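A toy Python sketch of the information-starvation point, assuming an invented bit-string task: with a reward that only fires at the goal, greedy local search degenerates into a random walk, while a denser, self-supervised-style signal guides the same search straight to the solution.

import random

def hill_climb(score, n_bits=20, steps=2000, seed=0):
    # Greedy local search: flip one random bit, keep the flip whenever
    # the score does not decrease.
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(steps):
        i = rng.randrange(n_bits)
        y = x[:]
        y[i] ^= 1
        if score(y) >= score(x):
            x = y
    return x

TARGET = [1] * 20
sparse = lambda x: 1.0 if x == TARGET else 0.0            # fires only at the goal
dense = lambda x: sum(a == b for a, b in zip(x, TARGET))  # per-bit feedback

# Under the sparse reward every comparison is 0 >= 0, so the search is
# blind; under the dense signal the identical search solves the task.
print(sum(hill_climb(sparse)), "of 20 bits correct under the sparse reward")
print(sum(hill_climb(dense)), "of 20 bits correct under dense feedback")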
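In the same spirit, here is a minimal sketch of one of the proxy signals mentioned above, a curiosity-style intrinsic reward: the agent is rewarded with the prediction error of a forward model that is itself trained by self-supervision on every transition. The linear model below is a deliberately tiny stand-in for a real world model.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))  # toy linear forward model: s_next ~ W @ s

def intrinsic_reward(s, s_next, lr=0.01):
    # Proxy reward = squared prediction error of the forward model.
    # The model updates by self-supervision on the observed transition,
    # so every step yields dense feedback with no hand-designed reward.
    global W
    err = s_next - W @ s
    W += lr * np.outer(err, s)  # one SGD step on the squared error
    return float(err @ err)     # transitions the model cannot predict score high

Novel transitions earn a high reward; as the model learns them, the reward fades, pushing the agent to keep exploring.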
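And for the preference route: DPO (Direct Preference Optimization, Rafailov et al., 2023) replaces the scalar reward with pairwise preferences. A minimal sketch of the per-pair loss, taking sequence log-probabilities under the trained policy and a frozen reference model (the variable names here are illustrative, not from any particular library):

import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO per-pair loss: -log sigmoid(beta * margin), where the margin is
    # how much more the policy prefers the chosen answer over the rejected
    # one, relative to the reference model. No reward function is ever defined.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))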
