xAI Launches Grok 4 Model, Which Has Achieved New Industry Benchmarks

xAI has taken its next big step, with the launch of Grok 4, the latest version of its foundational system, which it’s claiming as “the world’s most powerful AI model” right off the bat.

According to xAI, the latest model has set new records on industry benchmarks for accuracy and relevance, going well beyond human-level performance on various tasks.

Grok 4 is built on the back of xAI’s “Colossus” data cluster, which means that it’s able to utilize up to 200,000 Nvidia H100 GPUs to power its responses. That’s a massive amount of computing power, though xAI still trails Meta in overall potential compute (Meta reportedly has around 600,000 GPUs, as well as its own AI chips).

Yet even so, as noted, Grok 4 has achieved record high scores on several AI performance tests, including ARC-AGI and “Humanity’s Last Exam,” which includes 2,500 questions across hundreds of subjects.

Grok 4

Indeed, Grok 4 has reportedly achieved top-level performance in most of its tests, while X owner Elon Musk has praised the model as the most functional, valuable AI experience that he’s seen thus far.

As per Musk:

“Grok 4 is at the point where it essentially never gets math/physics exam questions wrong, unless they are skillfully adversarial. It can identify errors or ambiguities in questions, then fix the error in the question or answer each variant of an ambiguous question. [It’s] the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books.”

So, Grok 4 performs pretty well, which could help to justify xAI’s massive valuation, and its rapid spending, with Elon’s AI start-up pushing hard to become a genuine player in the broader AI race, and beat out both OpenAI (who Elon hates) and Meta (who Elon hates) for overall market supremacy.

Though beating them will be difficult.

As noted, Meta still has far more technical capacity than xAI, while OpenAI has a much stronger market presence, at least from a consumer perspective.

ChatGPT has become synonymous with AI use, and it’ll be tough for xAI’s Grok to beat it on that front, especially as X, which is the primary access point for Grok, continues to lose users.

But Grok does have its own standalone app, and xAI is looking to secure deals to provide Grok as the foundational operating system for new AI projects. That could also include government operating systems and improvements, which Elon’s former crew at the Department of Government Efficiency (DOGE) is looking to implement. Then again, maybe Musk’s more recent feud with President Trump will put a dampener on that, which could end up significantly impeding xAI’s monetization opportunities.

But if Grok 4 performs as well as xAI claims, then maybe securing deals won’t be such a problem, though more recent issues with Musk interfering with xAI’s code, and turning Grok into a racist megaphone, will also no doubt hamper confidence in the system.

And that does appear to be a feature, not a bug.

In assessing the steps that Grok 4 takes to answer a query, it seems that the process does indeed check in on what Elon thinks, and factors that into its response.

Grok 4 response

It’s pretty concerning that Elon appears to be weighting his own statements higher than others’, as that could significantly skew Grok’s responses.

Will that end up slowing xAI’s revenue potential, and impacting both X’s AI project and X the platform, which is now part of xAI? It seems likely, and with xAI reportedly valued at $113 billion, it’s hard to see how, exactly, it’s going to be able to live up to that price tag if Grok doesn’t significantly exceed expectations.

On that front, X is also introducing new pricing tiers for Grok access, as a means to generate more money from the project.

Grok 4 benchmarks

As you can see in this overview, “SuperGrok” access will cost $30 per month, and is aimed at the general public, while X is also adding a new “SuperGrok Heavy” tier for larger-scale projects.

SuperGrok Heavy will run multiple Grok systems in parallel, and then compare their responses to select the best. xAI says that this can help to produce more accurate responses, though SuperGrok will be enough for most use cases.
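xAI hasn’t detailed how that comparison works, but the general idea of running several models side by side and picking a winner can be sketched in a few lines. The example below is a minimal illustration only, assuming a simple majority vote across responses. The function names (`select_best`, `run_parallel`) and the stand-in “agents” are hypothetical, and none of this reflects xAI’s actual implementation or API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def select_best(responses: Sequence[str]) -> str:
    """Return the most common response; ties go to the earliest one seen.

    Majority voting is an assumption here, not xAI's documented method.
    """
    if not responses:
        raise ValueError("no responses to compare")
    counts = Counter(responses)
    top = max(counts.values())
    for response in responses:
        if counts[response] == top:
            return response
    raise ValueError("unreachable: a top-voted response always exists")


def run_parallel(agents: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Send the same prompt to every agent at once, then pick a winner."""
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        responses = list(pool.map(lambda agent: agent(prompt), agents))
    return select_best(responses)


# Hypothetical stand-ins for separate model instances.
agents = [lambda p: "4", lambda p: "5", lambda p: "4"]
answer = run_parallel(agents, "What is 2 + 2?")
```

Here, two of the three stand-in agents agree, so their answer wins out. A real system would likely use something more sophisticated than exact-match voting, such as having a model judge the candidate responses.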

Basically, if Grok 4 is as good as xAI claims, then it could become a significant earner for the company. And if xAI makes more money, then X the platform doesn’t need to rely on ad dollars so much, though that could also mean that X will then ease up on its moderation measures, which would infect the data feeding into Grok, and reduce its value.

I don’t know, it seems like there are too many variables within that to put a heap of reliance on Grok 4 as your foundational AI model, but again, if it is able to meet these noted benchmarks, maybe that won’t matter.

Oh, also, Grok’s coming to Tesla vehicles as well.

Maybe that’ll be another way for xAI to make money, by implementing an xAI subscription fee into Tesla sales.

I would still be hesitant about putting too much trust in Elon’s AI projects, given their various controversies thus far, but the initial performance data for Grok 4 makes it at least worth watching.


