Samsung’s Tiny AI Model Outperforms Massive Language Models in Complex Reasoning

In a surprising twist in the AI arms race, Samsung researchers have unveiled a miniature model that outperforms some of the largest language models on reasoning tasks while using a fraction of the computational power.

The Tiny Recursive Model (TRM), developed by Alexia Jolicoeur-Martineau of Samsung SAIL Montréal, challenges the long-held belief that bigger always means better in artificial intelligence. The model contains just 7 million parameters, less than 0.01% of the size of today’s leading large language models (LLMs), yet it delivers state-of-the-art results on challenging reasoning benchmarks such as ARC-AGI, which tests an AI’s capacity for abstract, human-like problem solving.

Rethinking AI’s Growth Obsession

The industry has spent years scaling up models in pursuit of smarter, more capable systems, but that approach has limits. While LLMs excel at generating fluent text, they often falter on multi-step reasoning, where a single early mistake can derail an entire solution. Techniques such as Chain-of-Thought (CoT) prompting, which have a model reason step by step, improve accuracy but demand large amounts of compute and high-quality reasoning data, and they still don’t guarantee sound logic.

Samsung’s TRM offers an alternative: smarter reasoning through recursion rather than scale.

How Samsung’s Tiny Recursive Model Works

Building on the earlier Hierarchical Reasoning Model (HRM), TRM simplifies the idea of recursive reasoning. Where HRM used two networks that refine an answer at different frequencies, TRM relies on a single compact network that repeatedly revises both its internal reasoning and its final answer.

Here’s how it works:
The model receives a question, an initial guess, and an internal reasoning state. It refines its reasoning over several cycles, then updates its answer based on that improved reasoning. This process repeats up to 16 times, allowing the system to gradually correct itself, much like a person double-checking their work.
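
The refinement loop just described can be illustrated with a toy Python sketch. This is not Samsung’s code: TRM’s learned two-layer network is replaced by a hypothetical hand-written rule, `tiny_net`, and the “question” is simply a number whose square root the model must find. In the spirit of TRM’s single network, one function plays both roles: called with the question `x`, it updates the latent reasoning state `z`; called without it, it revises the answer `y`. The six inner updates per cycle are an assumed, illustrative value; the cap of 16 outer cycles mirrors the figure above.

```python
import math
from typing import Optional, Tuple


def tiny_net(y: float, z: float, x: Optional[float] = None) -> float:
    """Hypothetical stand-in for a single learned network.

    With the question x: update the latent reasoning state z.
    Without x: revise the answer y using the latent state.
    """
    if x is not None:
        # Latent step: blend the old state with the current error x - y^2.
        return 0.5 * z + 0.5 * (x - y * y)
    # Answer step: Newton-style correction driven by the latent state.
    return y + z / (2.0 * y)


def recursive_solve(x: float, y0: float = 1.0, n_latent: int = 6,
                    max_steps: int = 16, tol: float = 1e-6) -> Tuple[float, int]:
    """Refine an initial guess for sqrt(x); return (answer, outer cycles used)."""
    y, z = y0, 0.0  # initial guess and empty reasoning state
    for step in range(1, max_steps + 1):
        # Refine the internal reasoning several times...
        for _ in range(n_latent):
            z = tiny_net(y, z, x)
        # ...then revise the answer based on that reasoning.
        y_new = tiny_net(y, z)
        if abs(y_new - y) < tol:  # answer has stopped changing: stop early
            return y_new, step
        y = y_new
    return y, max_steps


answer, steps = recursive_solve(2.0)
print(f"sqrt(2) ~= {answer:.6f} after {steps} cycles")
```

Each outer cycle fixes part of the previous cycle’s error, so the answer converges over several cycles rather than in one shot, which is the self-correcting behavior the article describes.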

Interestingly, a smaller two-layer version of TRM outperformed deeper variants, suggesting that a leaner design helps the model generalize better and avoid overfitting, a common problem when training data is scarce.

Record-Breaking Results with Minimal Resources

The numbers tell the story. On the Sudoku-Extreme benchmark, TRM achieved 87.4% accuracy, compared with HRM’s 55%. On Maze-Hard, which tests an AI’s ability to find paths through complex mazes, TRM scored 85.3%, again beating HRM’s 74.5%.

The most striking results came on ARC-AGI, a demanding test of abstract reasoning. TRM reached 44.6% accuracy on ARC-AGI-1 and 7.8% on ARC-AGI-2, surpassing its larger HRM predecessor and even some of the world’s biggest commercial AI models, such as Gemini 2.5 Pro, which scored 4.9% on ARC-AGI-2.
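
The margins quoted above are easy to recompute. The short Python script below is a convenience sketch, not anything from the research itself: it takes the accuracy figures reported in this article and expresses each gap both in percentage points and as a ratio. The dictionary layout and the `gaps` helper are illustrative.

```python
# Accuracy figures (%) as reported in this article: (TRM, comparison model).
results = {
    "Sudoku-Extreme (vs HRM)": (87.4, 55.0),
    "Maze-Hard (vs HRM)": (85.3, 74.5),
    "ARC-AGI-2 (vs Gemini 2.5 Pro)": (7.8, 4.9),
}


def gaps(table):
    """Return {benchmark: (percentage-point gain, ratio)}, rounded for display."""
    return {
        name: (round(trm - other, 1), round(trm / other, 2))
        for name, (trm, other) in table.items()
    }


for name, (points, ratio) in gaps(results).items():
    print(f"{name}: +{points} points ({ratio}x)")
```

The ratio view matters for ARC-AGI-2: a gap of under three percentage points still means TRM solved roughly 1.6 times as many tasks as Gemini 2.5 Pro.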

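Samsung also made training more efficient by simplifying the Adaptive Computation Time (ACT) mechanism, which decides when the model can stop refining an answer. HRM’s version needed an extra forward pass at every training step; TRM’s simpler version drops it without sacrificing performance.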
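
The general idea behind ACT-style halting, and why per-step overhead adds up, can be sketched in Python. This is a hypothetical illustration, not TRM’s training code: it assumes a halting head that emits a logit after each refinement step, and it stops as soon as that logit is positive. The `passes_per_step` parameter is an assumed knob for counting cost; setting it to 2 stands in for a design that needs an extra forward pass per step. The names `run_with_halting` and `toy_step` are invented for this example.

```python
from typing import Callable, Tuple

# A step function takes the current state and returns (new_state, halt_logit).
StepFn = Callable[[float], Tuple[float, float]]


def run_with_halting(step_fn: StepFn, state: float, max_steps: int = 16,
                     passes_per_step: int = 1) -> Tuple[float, int, int]:
    """Run up to max_steps refinement steps, stopping once the halting head fires.

    Returns (final_state, steps_taken, forward_passes).
    """
    passes = 0
    steps = 0
    for steps in range(1, max_steps + 1):
        state, halt_logit = step_fn(state)
        passes += passes_per_step
        # logit > 0 is the same test as sigmoid(logit) > 0.5
        if halt_logit > 0.0:
            break
    return state, steps, passes


def toy_step(state: float) -> Tuple[float, float]:
    # Hypothetical: each step adds one unit of "progress", and the halting
    # head becomes confident once progress exceeds 4.5.
    new_state = state + 1.0
    return new_state, new_state - 4.5


print(run_with_halting(toy_step, 0.0))
print(run_with_halting(toy_step, 0.0, passes_per_step=2))
```

In this toy run, both configurations halt after the same five steps, but the two-pass variant spends twice as many forward passes getting there; over a full training run, removing that extra pass is a direct saving.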

A New Path for AI Development

TRM’s success challenges the prevailing notion that only massive models can achieve top-tier reasoning. Samsung’s findings instead point to a new direction in AI design, one focused on recursive, self-correcting architectures that make better use of limited resources.

By showing that strong reasoning doesn’t have to come at an enormous computational cost, TRM opens the door to more sustainable, efficient AI systems, a development that could reshape how the entire industry thinks about progress.