Great news for xAI: Grok is now pretty good at answering questions about Baldur’s Gate

by dharm
February 20, 2026 · 7:10 PM
In this photo illustration, the logo of 'OpenAI' is displayed on a mobile phone screen in front of a computer screen displaying the photograph of Elon Musk.


Different AI labs have different priorities. OpenAI has traditionally focused on consumer users, for instance, while its rival Anthropic tends to target enterprises. Elon Musk’s xAI, we discovered recently, has been placing particular emphasis on video-game walkthroughs.

On Friday, Business Insider’s Grace Kay published a detailed and far-reaching report about xAI, the AI startup recently acquired by SpaceX, with particular emphasis on how Musk is making life difficult for employees. But this particular anecdote stood out:

In one instance last year, a model release was delayed for several days because Musk was dissatisfied with how the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. High-level engineers were pulled from other projects to improve the responses before launch, they said.

Of course, you can imagine the frustration of any respected and experienced engineer who shows up to work thinking he’ll be tackling fundamental problems of knowledge and machine intelligence, only to be sidetracked into helping a 54-year-old man beat his video game. But the anecdote raises an even more pressing question: Did Musk end up getting the gaming skills he wanted?

To answer that question, our resident RPG enthusiast Ram Iyer put together a set of five general questions about Baldur’s Gate, which we ran against xAI’s Grok and three other major models in a kind of quasi-benchmark that I’ve decided to call BaldurBench.

In the interest of journalistic transparency, I’ve made all the chat transcripts public, so you can see them here: Grok, ChatGPT, Claude, and Gemini.

First, the good news: Grok actually gives pretty good information. Its responses were a bit dense with gamer jargon — “save-scumming” instead of saving and “DPS” instead of damage — but the answers were both useful and well-informed, provided you knew what it was talking about. Grok also really loves tables and theorycraft, which is about what you would expect.

There are lots of Baldur’s Gate guides out there and the models were generally drawing from the same ones, so the biggest differences were stylistic. ChatGPT prefers bulleted lists and sentence fragments, while Gemini loves to bold important words.

The biggest surprise was Claude, which was particularly concerned about giving me information that would spoil my experience of the game. When I asked about good party compositions, it closed the guidance by saying “don’t stress too much and just play what sounds fun to you.” Thanks, Claude!

It’s important to bear in mind that, thanks to Business Insider’s reporting, we know this is a subject area where xAI has specifically focused on reaching parity. So we shouldn’t read too much into the fact that, after the reported sprint, Grok’s advice turned out about the same as the other models’. Still, it’s nice to know xAI can make it work if it tries.
