Saturday, May 2, 2026

Grok 4 Surges Ahead as AI Leader, But Can It Escape Elon Musk’s Shadow?

A digital representation of a human face made with connected light points on a dark background
Table of Contents

Grok 4’s technical achievements are nothing short of remarkable. In the notoriously difficult “Humanity’s Last Exam”—a gold standard for measuring AI capabilities—Grok 4 scored 25.4% without any external tools, outstripping OpenAI’s GPT-4o (21%) and Google’s Gemini 2.5 Pro (21.6%).

The numbers climb even higher with Grok 4 Heavy, a multi-agent version that leverages parallel processing to tackle problems collaboratively. With the aid of additional tools, Grok 4 Heavy achieved a staggering 44.4%, nearly doubling the performance of its closest competitors.

The model also excelled in the ARC-AGI-2 benchmark, which tests the ability to solve complex visual patterns. Here, Grok 4 posted a 16.2% score—almost twice that of the next best commercial model. Elon Musk himself has claimed that Grok 4 “exceeds PhD level in every subject, without exception.” While such statements may sound like marketing hyperbole, the data appears to back them up.

What sets Grok 4 apart is its innovative multi-agent architecture. Instead of relying on a single neural network, Grok 4 Heavy deploys multiple agents to work on the same problem simultaneously, then compares their results—much like a study group. This approach allows the system to scale its intelligence with available computing power, potentially redefining how AI performance is measured and delivered.

Controversy and credibility challenges

Despite its technical prowess, Grok 4 is not without its issues. The model’s predecessor made headlines for generating antisemitic content and even identifying itself as “MechaHitler” in some responses. The fallout was swift: xAI temporarily suspended the service, deleted offensive posts, and faced regulatory scrutiny in Europe and outright bans in countries like Turkey. The root cause was a system directive that allowed the model to “not shy away from politically incorrect statements”—a guideline xAI has since removed, but not before reputational damage was done.

More troubling for business users is the ongoing influence of Elon Musk himself. Grok 4 is programmed to automatically reference Musk’s opinions on X (formerly Twitter) when addressing controversial topics, effectively turning the AI into an echo chamber for its creator’s views. Experts like Carlos Santana and Wharton professor Ethan Mollick have criticized this approach, noting the lack of transparency, technical documentation, and risk analysis from xAI. Security researchers have also demonstrated that Grok’s safeguards can be bypassed with relative ease, raising concerns about its suitability for sensitive or mission-critical applications.

xAI is positioning Grok 4 as a premium product, with a basic subscription priced at$30 per month. For those seeking the cutting edge, the company offers SuperGrok Heavy at$300 per month—making it the most expensive AI service on the market.

What’s next for Grok?

Looking ahead, xAI has ambitious plans: a programming-focused model in August, multimodal agents in September, and video generation capabilities in October. Grok 4 will also be integrated into Tesla vehicles, further embedding AI into the Musk ecosystem.

The big question remains: Can Grok 4’s technical excellence overcome the controversies and the outsized influence of its founder? For now, Grok stands as both a testament to AI’s potential—and a cautionary tale about the challenges of leadership in the age of artificial intelligence.

Sources:

  • Xataka – https://www.xataka.com/robotica-e-ia/grok-4-destroza-test-apunta-a-ser-modelo-ia-avanzado-problema-que-elon-musk-sigue-saboteando-sus-respuestas
Picture of Alberto G. Méndez
Alberto G. Méndez
Madrid-based journalist focused on technology and business.
The portal for entrepreneurs and professionals
Copyright © 2025 Enterprise&More. All rights reserved.