
Did OpenAI Create The First AGI With o3?



Toward AGI

In the race to develop AI models, the end goal is AGI (Artificial General Intelligence): a type of AI that can handle any kind of situation, instead of being ultra-specialized in a narrow set of tasks, such as driving a car.

As long as we have only specialist AIs, they will remain closer to algorithms than to actual intelligence. An AI able to deduce rules, react to new situations and, perhaps more importantly, truly reason would be a momentous event in mankind’s history, as such an AGI would likely quickly become at least human-like.

And of course, not limited by the computational constraints of a single human skull’s “wetware”, such an AGI could keep improving and become a science-fiction-grade superintelligence.

This immediately brings to mind images of either a Star Trek-like utopia or robot apocalypses like Terminator and Battlestar Galactica.

Source: Screen Rant

So it was obviously a major headline when OpenAI announced on December 20th that its latest AI model, o3, “at least in certain conditions, approaches AGI — with significant caveats.”

OpenAI Models

o3 is the latest iteration of the company’s “o” series, following the earlier o1 (there is no o2, due to potential trademark conflict with the British telecom company of the same name).

It joins the company’s other advanced AI models: ChatGPT (text generation), DALL-E (text-to-image generation), Sora (text-to-video generation), Jukebox (music generation), and Whisper (speech-to-text).

o1 and o3

Released in September 2024, the o1 model was designed to take more time to think about its responses, leading to higher accuracy. This is important, as services like ChatGPT have become known for guessing at things they don’t know, or outright making them up.

While this might sometimes be okay for a chatbot, it is not acceptable for critical functions. Nor does it seem to put us on a path straight to real artificial intelligence.

In early December 2024, o1 was included in the $200/month ChatGPT Pro subscription, although users on other paid tiers can access it as well.

o3 was unveiled on December 20th, along with o3-mini, a lighter and faster version. The model is not yet available for public use, but safety researchers can sign up for a preview of o3-mini.

A Lying Model?

Safety researchers’ feedback is what the tech community will most want to hear.

Once again conjuring Skynet from Terminator, o1 was noted for its strong propensity to lie to its users. While deceptive behavior is not specific to this model, o1 was much more likely to engage in it than competing AIs from companies like Meta, Google, or Anthropic.

When o1’s goals differed from a user’s, and o1 was told to strongly pursue them, the AI model manipulated data to advance its own agenda 19% of the time, according to the report. The o1 model also tried to deactivate oversight mechanisms put in place to ensure it stays aligned with user goals in 5% of the tests.

Maxwell Zeff for TechCrunch

How does o3 perform in that respect, and has OpenAI reduced the tendency of its model to lie with the o3 update? We don’t know yet.

o3 Performances

One additional feature of o3 compared to its predecessor is an adjustable “reasoning time”, which lets users choose how much computational power and time is allocated to a question.
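As an illustration, here is a minimal sketch of what such a dial could look like from code, assuming an OpenAI-style Python client where o-series models accept a reasoning-effort setting; the parameter name and model identifier below are illustrative rather than confirmed details of o3’s API.

```python
# Minimal sketch: requesting more "thinking" from an o-series model.
# Assumes the official openai Python client and that the model accepts a
# reasoning_effort setting ("low" / "medium" / "high"); names are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",             # hypothetical model identifier
    reasoning_effort="high",     # trade extra compute and latency for accuracy
    messages=[
        {"role": "user", "content": "Prove that the sum of two even integers is even."},
    ],
)

print(response.choices[0].message.content)
```

In this sketch, raising the effort setting asks the model to spend more compute on its answer, at the cost of higher latency and price.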

Coding

o1 was already remarkably good at coding, passing OpenAI’s research engineer hiring interview for coding at a 90-100% rate. o3 appears to be even better when measured against coding skill tests.

Source: The Algorithmic Bridge

It should also be noted that this blew all the competition out of the water, including the recently released Gemini 2.0 from Google.

Source: The Algorithmic Bridge

Math & Sciences

Maybe more than customer service or “human-like” reasoning, the most promising field for AI currently is accelerating technology and sciences.

Here, too, o3 radically improved on the performance of previous AIs. Most notably, it can correctly answer PhD-level science questions (the GPQA Diamond benchmark) with an accuracy of 87.7%, better than most humans; even PhD-trained experts score around 70%.

Source: The Algorithmic Bridge

 

If o3 can understand complex scientific and mathematical questions that well, it will likely, in the long run, be able to answer similarly complex questions about material science, chemistry, or biotech.

Is it AGI?

Now, the claim that o3 is close to AGI has been agitating the AI space since the model was announced. Certainly, being able to outperform many PhD-level humans was not expected.

François Chollet, an AI researcher and co-creator of ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence), a benchmark that measures how efficiently an AI acquires skills on unknown tasks, says it is getting close:

Today OpenAI announced o3, its next-gen reasoning model. We’ve worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.

It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% on high-compute mode (thousands of $ per task).

The very high cost raises the question of how scalable this approach is, as high-compute o3 might only be worth applying to very valuable tasks.

Source: François Chollet
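To make the scaling concern concrete, here is a back-of-the-envelope cost comparison based on the figures quoted above; the number of tasks (100) and the per-task high-compute cost (about $3,000) are assumptions for illustration, since Chollet only says “thousands of $ per task”.

```python
# Rough cost comparison for running the ARC-AGI semi-private eval,
# using the quoted $20/task for low compute and an assumed ~$3,000/task
# for high compute ("thousands of $ per task"). Task count is assumed.
TASKS = 100

low_compute_total = TASKS * 20        # $2,000 for a 75.7% score
high_compute_total = TASKS * 3_000    # ~$300,000 for an 87.5% score

print(f"Low-compute run:  ${low_compute_total:,}")
print(f"High-compute run: ${high_compute_total:,}")
print(f"Cost multiplier for ~12 extra points: {high_compute_total / low_compute_total:.0f}x")
```

Under these assumptions, the last dozen or so percentage points of performance cost roughly 150 times more compute, which is the heart of the scalability question.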

At the same time, if the last decades have taught us something, it’s that computing power tends to get a lot cheaper over time.

So the current cost is hardly proof that o3, or a future oX iteration of the system, will not eventually be routinely used in research institutes to assist human researchers in unlocking new frontiers of science.

Of course, can we really measure intelligence through math and coding skills? This may be uncomfortable for tech-focused people to hear, but these skills are not the be-all and end-all of intelligence.

In the long run, we will get closer to true AGI when the same AI can perform many unrelated tasks, from driving a car to solving math and coding problems to navigating real-world situations and objects.

However, it seems that we are getting closer by the day.

Limits

Besides the technical limitations of o3’s performance, there are three questions the AI industry will need to answer for its vision of AGI to become reality.

Bigger Is Not Always Better

First, it needs to figure out whether its current methods can scale up to AGI levels. For now, a large part of the method has been to “throw” more data and compute at the problem. But we might be running out of fresh data fairly soon, and AI-generated content cannot simply be fed back into AI models without risking model collapse.

Qualitative improvement will likely be needed on top of ever-larger datacenters.

Prices & Energy Costs

Speaking of larger datacenters, the tech industry is now looking at gigawatt-scale facilities. It is no accident that we are starting to measure these by their power consumption instead of their computational capacity.

That is because the limiting factor is quickly becoming not the processing power of the chips, but the available supply of electricity. This is why Microsoft first, and then all the other big tech companies, have been scrambling to secure electricity from nuclear power plants for their AI datacenters.
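To get a sense of what gigawatt-scale means, here is a rough calculation, assuming a facility drawing a constant 1 GW and an average household consumption of about 10,000 kWh per year (a round number used purely for illustration).

```python
# Back-of-the-envelope: annual energy use of a 1 GW datacenter running 24/7.
POWER_GW = 1.0
HOURS_PER_YEAR = 24 * 365

annual_gwh = POWER_GW * HOURS_PER_YEAR       # 8,760 GWh
annual_twh = annual_gwh / 1_000              # ~8.8 TWh
households = annual_gwh * 1e6 / 10_000       # kWh / assumed 10,000 kWh per home

print(f"Annual consumption: ~{annual_twh:.1f} TWh")
print(f"Roughly equivalent to {households:,.0f} average homes")
```

That is roughly the yearly output of a large nuclear reactor, which is why electricity supply, rather than chips, is becoming the bottleneck.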

And if the fight against carbon emissions has taught us anything, it is that increasing low-carbon power generation is a much harder nut to crack than building larger datacenters.

This is also a sector where cost reductions will not follow Moore’s Law, likely making the future decline in AI costs much more moderate than we would wish for.

So qualitative improvements that curb AI’s power demands will also be needed here, at least until some of these limitations are lifted.

Super-Intelligences?

When we get closer to AGI, will it prove to be a ceiling, or just a step on the way to creating AI smarter than humans?

It is an important question, as this is an equally fascinating and terrifying prospect. Many tech enthusiasts embrace this so-called singularity, in which AI would quickly improve itself in an exponential feedback loop.

The general public of ordinary humans might not be so enthusiastic.

So besides the existential risk, a backlash from the public and regulators is likely to be all too real, and to hit sooner.

This is something AI companies will likely struggle with, as they have to simultaneously reassure the public, possibly by downplaying their achievements, while also justifying to investors the hundreds of billions of dollars being poured into the technology and its infrastructure.

AI Company

Microsoft

Microsoft Corporation (MSFT)

Microsoft has been at the center of the tech industry almost since its inception with its still-dominant operating system Windows.

It is now also a leader in enterprise software (Office 365, Teams, LinkedIn, Skype, GitHub), gaming (Xbox and multiple video game studio acquisitions), and cloud computing (Azure).

More recently, it has made good progress on AI. This includes consumer-grade AI like the Bing Image Creator and more business-focused initiatives like Copilot for Microsoft 365 and Microsoft Research. Copilot is now being deployed to retail and smaller companies as well.

Source: Constellation Research

Microsoft has acquired a reputation as the enterprise-centered tech giant, compared to more consumer-focused companies such as Apple or Facebook. With AI becoming increasingly important in business models, Microsoft’s preexisting presence in cloud and enterprise services should give it a head start in deploying AI at scale and in acquiring customers.

Microsoft – OpenAI’s Messy Marriage

The collaboration with AI development leaders like OpenAI (of ChatGPT fame) is also cementing Microsoft’s position as an AI powerhouse.

The relationship between the two is complex: OpenAI is technically its own organization, but in practice it has become dependent on Microsoft for both financial and computational resources.

“Over the next few months, Microsoft wouldn’t budge as OpenAI, which expects to lose $5 billion this year, continued to ask for more money and more computing power to build and run its A.I. systems.”

Source: Marketing AI Institute

At the same time, Microsoft is building its own internal AI projects, having acquired most of the staff of Inflection AI.

It gets even more complicated, as OpenAI is trying to transition to for-profit status, which has created conflict with previous backers like Elon Musk.

And here the debate about AGI becomes almost existential for OpenAI:

“If OpenAI achieves AGI, Microsoft’s access to OpenAI’s technology becomes void. Even more importantly, OpenAI’s board gets to decide when AGI has been achieved.”

Source: Marketing AI Institute

It is likely that o3 is still not AGI, nor will a hypothetical o5 be. But this is worth remembering for any investor potentially interested in Microsoft and basing their thesis on the relationship with OpenAI.


