Elon Musk’s AI company, xAI, late on Monday released its latest flagship AI model, Grok 3, and unveiled new capabilities for the Grok iOS and web apps.
Grok, xAI’s answer to models like OpenAI’s GPT-4o and Google’s Gemini, can analyze images and respond to questions, and powers a number of features on Musk’s social network, X. Grok 3, which has been in development for several months, was optimistically slated for release in 2024, but missed that deadline.
Monday’s launch is an ambitious one.
xAI has been using an enormous data center in Memphis containing around 200,000 GPUs to train Grok 3. In a post on X, Musk claimed that Grok 3 was developed with “10x” more computing power than its predecessor, Grok 2, using an expanded training data set that ostensibly includes filings from court cases.
“Grok 3 is an order of magnitude more capable than Grok 2,” Musk said during a live-streamed presentation on Monday. “[It’s a] maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically correct.”
Grok 3 is a family of models, to be precise. A smaller version of Grok 3, Grok 3 mini, responds to questions more quickly at the cost of some accuracy. Not all the models and related features of Grok 3 are available yet (some are in beta), but they began rolling out on Monday.
xAI claims Grok 3 beats GPT-4o on benchmarks including AIME (which evaluates a model’s performance on a sampling of math questions) and GPQA (which assesses models using PhD-level physics, biology, and chemistry problems). An early version of Grok 3 also scored competitively in Chatbot Arena, a crowdsourced test that pits different AI models against each other and has users vote on their preferred responses, according to xAI.

Two models in the new Grok 3 family, Grok 3 Reasoning and Grok 3 mini Reasoning, can carefully “think through” problems, similar to “reasoning” models like OpenAI’s o3-mini and Chinese AI company DeepSeek’s R1. Reasoning models try to fact-check themselves before giving out results, which helps them avoid some of the pitfalls that normally trip up models.
xAI claims that Grok 3 Reasoning surpasses the best version of o3-mini — o3-mini-high — on several popular benchmarks, including a newer mathematics benchmark called AIME 2025.

These reasoning models can be accessed via the Grok app. Users can ask Grok 3 to “Think” or, for more difficult queries, use “Big Brain” mode, which devotes additional computing to reasoning. xAI describes the reasoning models as best suited for mathematics, science, and programming questions.
Musk said some of the reasoning models’ “thoughts” are obscured in the Grok app to prevent distillation, a method used by AI model developers to extract knowledge from other models. Recently, DeepSeek was accused of distilling OpenAI’s models to create its own.
Grok’s reasoning models underpin a new feature in the Grok app called DeepSearch, xAI’s answer to AI-powered research tools like OpenAI’s deep research. DeepSearch scans the internet and X to analyze information and deliver an abstract in response to a question.
Subscribers to X’s Premium+ tier ($50 per month) will get access to Grok 3 first, and other features will be gated behind a new plan that xAI’s calling SuperGrok. Priced at $30 per month or $300 per year (if leaks are to be believed), SuperGrok unlocks additional reasoning and DeepSearch queries, and throws in unlimited image generation.

As soon as about a week from now, the Grok app will gain a “voice mode” that gives Grok models a synthesized voice, Musk said. A few weeks after that, Grok 3 models will be available via xAI’s enterprise API, along with the DeepSearch capability.
xAI plans to open-source Grok 2 in the coming months, Musk said.
“Our general approach is that we will open-source the last version [of Grok] when the next version is fully out,” he continued. “When Grok 3 is mature and stable, which is probably within a few months, then we’ll open-source Grok 2.”
When Musk announced Grok roughly two years ago, he pitched the AI model as edgy, unfiltered, and anti-“woke” — in general, willing to answer controversial questions other AI systems won’t. He delivered on some of that promise. Told to be vulgar, for example, Grok and Grok 2 would happily oblige, spewing colorful language you likely wouldn’t hear from ChatGPT.
But Grok models prior to Grok 3 hedged on political subjects and wouldn’t cross certain boundaries. In fact, one study found that Grok leaned to the political left on topics like transgender rights, diversity programs, and inequality.
Musk has blamed the behavior on Grok’s training data (public web pages) and pledged to “shift Grok closer to politically neutral.” It’s not yet clear whether xAI has achieved that goal, or what the consequences might be.