With the o3 model making a massive leap forward, are we so close to achieving AGI that OpenAI no longer sees it as the ultimate destination?

o3 scored 87.5% on the ARC-AGI benchmark at high compute and 76.5% at standard compute. It took four years to progress from 0% (GPT-3) to 5% (GPT-4o), but only six months to leap to an impressive 87.5%. This isn't just incremental progress; it's a groundbreaking transformation.

Meanwhile, the Financial Times reports that OpenAI is considering removing its AGI clause with Microsoft to unlock more funding. Why would OpenAI move away from a clause tied to its core mission?

AGI is hard to define and even harder to measure. If AGI means solving ANY problem a human can easily handle, we're still far from that point. But what's happening right now might be just as significant. With models like o3, we're entering the era of sub-AGIs: systems that outperform humans in specific domains and can execute end-to-end tasks autonomously. These systems have the potential for massive economic impact.

As for OpenAI, scaling this kind of research requires billions in R&D. Just three months after the o1 model, OpenAI announced o3. The need for significant funding is clear, and the focus may now be shifting from chasing ill-defined AGI ambitions to delivering high-performing, adaptable systems.

The pace of AI advancement is truly remarkable!
Roy Nissim’s Post
DeepSeek has just released its latest open-source model, DeepSeek R1, and it's making waves in the AI community. R1 is not only on par with OpenAI's o1 model in performance but is also roughly 27x cheaper than OpenAI's API pricing! Will DeepSeek's aggressive pricing and open-source approach start eroding the pricing power of leading AI labs?

Key highlights:
1️⃣ Performance: DeepSeek R1 matches OpenAI's o1 and outperforms Claude 3.5 Sonnet.
2️⃣ Unbeatable pricing: DeepSeek R1 costs $0.55 per million input tokens and $2.19 per million output tokens, versus $15.00 and $60.00 for OpenAI o1. That is roughly 27x cheaper, offering incredible value.
3️⃣ Open source for all: DeepSeek R1 is released under a permissive MIT license, allowing users to run it privately and build around it freely.
4️⃣ Transparency and innovation: alongside R1, DeepSeek released distilled Qwen and Llama versions produced with the same training pipeline used for R1.

The pricing in particular appears aggressive, substantially undercutting the American AI labs. This adds to last month's release of DeepSeek V3, which was also priced substantially below general market rates. With its strong performance, affordability, and open-source nature (allowing the community to build around it extensively), DeepSeek seems ready to capture a larger share of the market. For example, on OpenRouter, DeepSeek V3 (released in December) has been the most-used open-source model over the last month!
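The ~27x figure checks out with simple arithmetic. A minimal sketch using only the list prices quoted above; the 10M/2M token workload is an arbitrary example, not a real usage figure:

```python
# Cost comparison at the per-million-token list prices quoted above.
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "deepseek-r1": (0.55, 2.19),
    "openai-o1": (15.00, 60.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the quoted list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example workload: 10M input tokens, 2M output tokens.
r1_cost = workload_cost("deepseek-r1", 10_000_000, 2_000_000)  # ≈ $9.88
o1_cost = workload_cost("openai-o1", 10_000_000, 2_000_000)    # $270.00
print(f"o1 costs {o1_cost / r1_cost:.1f}x more than R1")       # ~27x
```

The exact ratio depends on the input/output mix of the workload, since the input and output multipliers differ slightly, but it lands near 27x for typical mixes.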
I was playing with some of the AI text-to-image tools this weekend and I have a few takeaways.

First, the evolution in the models is pretty incredible. The progression of OpenAI's DALL-E models on the prompt "A cute baby sea otter" is wild (see attached).

Second, Google's Imagen is too hard (maybe impossible?) to use outside of Vertex AI Studio. Even when following their docs, you run into errors with their SDKs due to (I'm guessing) some sort of limited release or preview that isn't mentioned in the docs. I'm still wading through Colab notebooks trying to find a path through the weeds. One thing that OpenAI has done really well is allow easy access via a (somewhat) reliable API, needing only an API key to get up and running: no convoluted auth process of spinning up service accounts (although Google is simplifying this with their Gemini API). Side note: I LOVE the Google free tiers, and a generous allotment with an easy API has me looking there first for text generation.

Third, I'm looking for interesting ways to self-host, or use PaaS services to host, a Hugging Face model and dataset to run my own image generation. With prices still pretty even across all providers at around $0.04/image, a more affordable alternative is worth pursuing when generating at scale.
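On the at-scale point, a back-of-envelope cost sketch; the $0.04/image price comes from the post, while the $600/month self-hosting cost is a made-up placeholder for whatever GPU instance or PaaS plan you would actually use:

```python
import math

API_PRICE_PER_IMAGE = 0.04  # roughly current per-image price across providers

def api_cost(images_per_month: int) -> float:
    """Monthly API spend at the per-image list price."""
    return images_per_month * API_PRICE_PER_IMAGE

def breakeven_images(self_host_monthly_cost: float) -> int:
    """Images/month at which self-hosting matches API spend."""
    return math.ceil(self_host_monthly_cost / API_PRICE_PER_IMAGE)

print(f"100k images/month via API: ${api_cost(100_000):,.0f}")
# A hypothetical $600/month GPU instance pays for itself at:
print(f"break-even: {breakeven_images(600.0):,} images/month")  # 15,000
```

The break-even ignores engineering time and GPU utilization, so treat it as a lower bound on when self-hosting starts to make sense.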
OpenAI Must Earn $100 Bn to Prove #AGI's Worth to Microsoft. An odd way to define whether AGI has been achieved. https://lnkd.in/g97UxFwe #GenerativeAI
Software will eat labour. OpenAI has announced its latest model, o3, and it has just shattered the ARC-AGI benchmark, scoring an incredible 87.5%. This benchmark pushes an AI to its limits, evaluating its ability to tackle novel problems rather than just regurgitate its training data. It is essentially a test for AGI. This is what scaling LLM compute by $18bn gets you. OpenAI was the first to discover this, and it's leading the charge toward AGI.
Why aren't the LLM-based products being built as useful as they should be? Most of what we're seeing are just wrappers around the OpenAI API or other GenAI alternatives. We need to build products in which other components contribute as well: the greater those components' contribution, the more we can differentiate our product. In this regard, knowledge graphs could play a critical role. We could use a knowledge graph to ground and refine the model's responses, making it a specialist rather than a generalist.
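A minimal sketch of that grounding idea: retrieve facts about an entity in the user's question from a knowledge graph and prepend them to the prompt, so a generalist model answers from curated domain knowledge. The tiny triple store and the drug-interaction facts below are purely illustrative:

```python
# A toy knowledge graph stored as (subject, relation) -> object triples.
KG = {
    ("aspirin", "treats"): "headache",
    ("aspirin", "interacts_with"): "warfarin",
    ("warfarin", "class"): "anticoagulant",
}

def retrieve_facts(entity: str) -> list[str]:
    """All triples mentioning the entity, rendered as plain sentences."""
    return [f"{s} {r.replace('_', ' ')} {o}"
            for (s, r), o in KG.items() if entity in (s, o)]

def build_prompt(question: str, entity: str) -> str:
    """Prepend retrieved facts so the model answers from graph knowledge."""
    facts = "\n".join(retrieve_facts(entity))
    return (f"Known facts:\n{facts}\n\n"
            f"Question: {question}\n"
            "Answer using only the facts above.")

print(build_prompt("Can aspirin be taken with warfarin?", "aspirin"))
```

In a real system the lookup would be a SPARQL or Cypher query against a proper graph store, and the prompt would go to whichever chat-completion API you use; the point is that the differentiation lives in the graph, not the wrapper.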
Wait, is the new GPT-4o a smaller and less intelligent model?

We have completed running our independent evals on OpenAI's GPT-4o release from yesterday and are consistently measuring materially lower eval scores than for the August release of GPT-4o.

GPT-4o (yesterday's Nov release) vs GPT-4o (Aug):
➤ Artificial Analysis Quality Index decreased from 77 to 71 (now equal to GPT-4o mini)
➤ GPQA Diamond decreased from 51% to 39%; MATH decreased from 78% to 69%
➤ Speed increased from ~80 output tokens/s to ~180 tokens/s
➤ No pricing change

Our output-speed benchmarks are currently measuring ~180 output tokens/s for the Nov 20th model, while the August model shows ~80 tokens/s. We have generally observed significantly faster speeds on launch day for OpenAI models (likely because OpenAI provisions capacity ahead of adoption), but we have not previously seen a 2x speed difference.

Based on this data, we conclude it is likely that OpenAI's Nov 20th GPT-4o is a smaller model than the August release. Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.
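For reference, an output-speed figure like those above is just generated tokens divided by wall-clock generation time. A trivial sketch with made-up token counts mirroring the ~80 vs ~180 tokens/s measurements:

```python
def output_speed(tokens_generated: int, elapsed_seconds: float) -> float:
    """Output throughput in tokens per second."""
    return tokens_generated / elapsed_seconds

# Illustrative numbers only: 10-second generations at the two observed speeds.
aug = output_speed(800, 10.0)    # ~80 tokens/s (August model)
nov = output_speed(1800, 10.0)   # ~180 tokens/s (Nov 20th model)
print(f"speed ratio: {nov / aug:.2f}x")  # 2.25x
```

Real benchmarks average many requests and usually exclude time-to-first-token, so single-request numbers will be noisier than the figures quoted in the post.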
OpenAI confirmed the release of the o3 and o3-mini reasoning models, claiming advancements toward AGI with step-by-step reasoning, adjustable thinking times, and superior benchmark performance. While addressing safety with "deliberative alignment," o3's highlights include a record-breaking 25.2% score on FrontierMath. The launch follows increased competition and leadership changes within OpenAI.

How did OpenAI roll it out?
• Launch of o3 models: OpenAI unveiled o3 and o3-mini, successors to the o1 reasoning model, with claims of approaching AGI capabilities under certain conditions.
• Enhanced reasoning: o3 features a "private chain of thought," enabling step-by-step reasoning, with adjustable thinking time for improved accuracy.
• Improved performance: outperforms o1 on multiple benchmarks, including a record-breaking 25.2% on FrontierMath, where no other model exceeds 2%.
• Multifunctional design: offers low, medium, and high reasoning-time modes to suit varying complexity and response requirements.
• AGI proximity: scored 87.5% on ARC-AGI, showcasing progress toward artificial general intelligence.
• Competition: released amidst a surge in reasoning models from rivals like Google, Alibaba, and DeepSeek.
• Safety measures: employs "deliberative alignment" techniques for improved safety and transparency.
• Future plans: partnering with ARC-AGI on next-gen benchmarks, with staggered release plans starting January 2025.
• Leadership change: Alec Radford, a key scientist behind OpenAI's GPT models, departed to pursue independent research.
https://lnkd.in/dZUvxGXY
OpenAI announces new o3 models | TechCrunch
https://techcrunch.com
OpenAI rolls out the full version of o1, its hot reasoning model.

Excerpt: "The model, which seems geared toward scientists, engineers, and coders, is designed to solve thorny problems. The researchers said it's the first model that OpenAI trained to 'think' before it responds, meaning it tends to give more detailed and accurate responses than other AI helpers."
OpenAI rolls out the full version of o1, its hot reasoning model
msn.com
LLMs are a race to the bottom. OpenAI responds to DeepSeek R1 by launching the new o3-mini. It's priced at $1.10 per million input tokens and $4.40 per million output tokens, less than half the price of GPT-4o (currently $2.50/$10.00) and massively cheaper than o1 ($15.00/$60.00). It is competitive with DeepSeek R1 on pricing, and wins some and loses some against R1 on benchmarks. A huge win for consumers, with competitive open-source LLMs pushing the market. https://lnkd.in/gPZse6SW
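The price comparison as arithmetic, using the per-million-token figures quoted above; the 75% input share used for the blended rate is an arbitrary assumption about workload mix, not anything from OpenAI:

```python
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "o3-mini": (1.10, 4.40),
    "gpt-4o": (2.50, 10.00),
    "o1": (15.00, 60.00),
}

def blended_price(model: str, input_share: float = 0.75) -> float:
    """Blended $/1M tokens for a workload that is mostly input tokens."""
    in_price, out_price = PRICES[model]
    return input_share * in_price + (1 - input_share) * out_price

for model in PRICES:
    print(f"{model}: ${blended_price(model):.2f} per 1M blended tokens")
# o3-mini comes in under half of GPT-4o's blended rate and far below o1's.
```

Reasoning models also burn hidden "thinking" output tokens, so real o3-mini and o1 bills skew further toward the output price than this blend suggests.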
OpenAI launches o3-mini, its latest 'reasoning' model | TechCrunch
https://techcrunch.com
2x Founder | PhD, MBA | Cutting Through AI Complexity to Drive Real-World Impact
OpenAI's o3 ARC-AGI testing report: https://arcprize.org/blog/oai-o3-pub-breakthrough
FT on OpenAI potentially ditching the AGI clause: https://www.ft.com/content/2c14b89c-f363-4c2a-9dfc-13023b6bce65
Sam Altman says "We Know How To Build AGI" in his recent blog: https://blog.samaltman.com/reflections