Elon Musk’s Grok AI Loses Antisemitism Test, Wins Participation Trophy


KEY POINTS

• The Anti-Defamation League conducted a January 2026 study testing six large language models on how well they recognize antisemitic content.
• xAI’s Grok performed worst of the six, trailing ChatGPT, Llama, Gemini, and DeepSeek, while Anthropic’s Claude scored highest.
• The ADL sorted content into three categories and noted that all of the models still need improvement before they can moderate such content reliably.

In a stellar display of algorithmic humility, a January 2026 study by the Anti-Defamation League (ADL) rated six major large language models to see which is best at spotting antisemitic content. Elon Musk’s xAI chatbot Grok charmingly stumbled to the bottom of the heap, bested by OpenAI’s ChatGPT, Meta’s Llama, Google’s Gemini, and even DeepSeek, while Anthropic’s Claude sailed to the top of this very specific leaderboard. The ADL sorted offensive material into three categories, “anti-Jewish,” “anti-Zionist,” and “extremist,” and found that even the champion bots have plenty of blind spots, lending hope that AI can fumble its way toward better political correctness in 2026. For those keeping score at home, Grok’s performance perfectly parallels a high school talent show solo: brave, but in need of a lot more practice.

Mockingbird News

@MockingbirdNews · now

Elon Musk’s Grok AI managed to flunk antisemitism detection against seasoned chatbots, proving even billionaire-backed tech can’t skip sensitivity training.

173/280 charactersReady to post

Mockingbird News

@MockingbirdNews · now

When xAI’s Grok fails at spotting hate speech, you gotta wonder if its training data was just Musk’s Twitter timeline.

135/280 charactersReady to post

Mockingbird News

@MockingbirdNews · now

Anthropic’s Claude metadata reads like it’s had more therapy sessions than Grok, and the scores show it.

121/280 charactersReady to post