Anybody Can AI
ChatGPT loses to 1970s Atari 2600 Chess

PLUS: Meta’s LLaMA 3.1 Remembers 42% of Harry Potter

Jenil Soni
June 19, 2025

ChatGPT “Absolutely Wrecked” by 1970s Atari 2600 Chess

Citrix engineer Robert Caruso pitted ChatGPT 4 against Atari's 1979 Video Chess for the Atari 2600, played through an emulator, and the chatbot surprisingly lost decisively. Despite his attempts to help, such as providing board layouts and switching to standard chess notation, ChatGPT repeatedly misidentified pieces and made rookie blunders: it missed pawn forks, confused rooks with bishops, and lost track of its own pieces. After about 90 minutes of error-riddled play, ChatGPT conceded defeat.

Key Points:

Language Models vs Purpose-Built Engines - Vintage chess software like Atari's was built for precise move evaluation, even on minimal compute, while ChatGPT, as a language model, isn't designed to track game state or enforce rules.
Persistent Errors - ChatGPT kept making elementary mistakes after being corrected, even claiming it would play better if the game were restarted, a reminder that LLMs still struggle with structured reasoning over long sequential tasks.
A Humbling Reminder - The match shows that specialized tools such as chess engines still outperform general-purpose AI on the tasks they were built for, underscoring the continued importance of domain-specific systems.

Conclusion
While ChatGPT excels at text, conversation, and general reasoning, this defeat shows it isn't the right tool for every job. For structured, rule-based tasks like chess, it is still no match for even early single-purpose engines. The smart strategy remains to pair LLMs with specialized systems rather than expect one model to do everything.

Meta’s LLaMA 3.1 Remembers 42% of Harry Potter

Research from Stanford, Cornell, and West Virginia University reveals that Meta's LLaMA 3.1 70B has memorized about 42% of the first Harry Potter book well enough to reproduce 50-token excerpts verbatim at least half the time when prompted with the preceding text.
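The "at least half the time" bar is stricter than it sounds. Below is a minimal sketch, assuming (as a simplification, not necessarily the researchers' exact method) that the chance of reproducing an excerpt verbatim is the product of the model's per-token probabilities for each of the 50 target tokens. The function names, the threshold parameter, and the per-token values are hypothetical and chosen for illustration.

```python
import math

def continuation_probability(token_logprobs):
    """Chance of emitting an exact continuation, assumed here to be the
    product of per-token conditional probabilities (i.e. exp of summed log-probs)."""
    return math.exp(sum(token_logprobs))

def is_extractable(token_logprobs, threshold=0.5):
    # Hypothetical rule: an excerpt counts as memorized if the model would
    # reproduce it verbatim at least `threshold` of the time.
    return continuation_probability(token_logprobs) >= threshold

# Hypothetical excerpt: 50 tokens, each predicted with probability 0.99
logprobs = [math.log(0.99)] * 50
print(round(continuation_probability(logprobs), 3))  # prints 0.605
print(is_extractable(logprobs))                      # prints True
```

Even at 99% confidence per token, a 50-token passage comes out verbatim only about 60% of the time; at 98% per token it drops to roughly 36%, below the bar. Clearing the threshold therefore requires near-certain recall of every token in the excerpt.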

Key Points:

Massive Memorization Spike - LLaMA 1 (65B) retained just 4.4% of the book, but LLaMA 3.1 (70B) skyrocketed to 42%, suggesting greater memorization with larger training volumes.
Popular Books Memorized More - LLaMA 3.1 also recited large sections of other famous works such as The Hobbit and 1984, while recalling just 0.13% of lesser-known titles, indicating a memorization bias toward widely circulated and quoted texts.
Copyright & Fair Use Reignited - The findings complicate copyright debates: LLaMA's ability to recite long verbatim passages could raise new legal questions, affect ongoing litigation, and shape policies for releasing open-source models.

Conclusion

LLaMA 3.1's ability to recall nearly half of the first Harry Potter book challenges the notion that memorization is rare in large language models. It highlights the trade-off between model scale and copyright risk and raises the question: how should future open models balance knowledge retention with ethical and legal safeguards?

🚀 Other AI updates to look out for

“Kangaroo” Revealed as Hailuo 02—#2 Video Model

A mysterious top-performing video model dubbed “Kangaroo” on Artificial Analysis was recently identified as Hailuo 02 (0616) from MiniMax. It now ranks #2, just behind ByteDance’s Seedance 1.0 and ahead of Google’s Veo 3.

Trump Admin’s AI.gov Initiative Leaked via GitHub

Details of the Trump administration's "whole-government AI initiative," dubbed AI.gov, surfaced in a now-deleted GitHub repository, revealing plans to launch a federal AI hub on July 4, 2025. Managed by the GSA's tech team, it includes an AI chatbot, an all-in-one API connecting agencies to AI models from OpenAI, Google, and Anthropic, and a dashboard called "CONSOLE" for monitoring usage. The leak has sparked serious concerns about privacy, security, and the automation of federal jobs.

Alta Raises $11M for AI Personal Styling Agent

Alta, founded by Jenny Wang, has secured $11 million in seed funding led by Menlo Ventures to develop an AI-driven personal styling agent. The app generates custom outfit suggestions tailored to each user's wardrobe, occasions, budget, weather, and lifestyle, and even supports virtual try-ons via avatars.

Thank you for reading.