China’s Moonshot AI Surpasses GPT-5 and Claude Sonnet 4.5, Redefining the Global AI Race

A new Chinese artificial intelligence model has shaken up the global tech landscape. Beijing-based Moonshot AI has unveiled its Kimi K2 Thinking model, which outperformed OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 in several benchmark tests—sparking debate about whether the balance of AI innovation is beginning to tilt toward China.

Valued at around US$3.3 billion and backed by tech giants Alibaba Group and Tencent Holdings, Moonshot released Kimi K2 Thinking on November 6 as a fully open-source model. Analysts are calling the release another “DeepSeek moment,” a reference to the earlier milestone when Chinese developers upended expectations about the cost of building frontier AI systems.

Breaking Performance Records

According to the company’s GitHub release, Kimi K2 Thinking scored 44.9% on Humanity’s Last Exam—a rigorous reasoning benchmark of 2,500 questions—surpassing GPT-5’s 41.7%. The model also led the BrowseComp benchmark, which measures web-browsing and information-seeking capabilities, with a 60.2% score, and ranked first in the Seal-0 research benchmark with 56.3%.

Industry observers say the results are significant because Kimi K2 Thinking ships with open weights that anyone can download, yet it matches or exceeds the performance of proprietary models like GPT-5 on these benchmarks. VentureBeat described it as “a turning point” for open-source AI, where the performance gap between public and closed models “has effectively collapsed.”

Affordable Power: A Cost Revolution

Adding to the surprise, CNBC reported that Kimi K2 Thinking was trained for only US$4.6 million—a fraction of what U.S. firms reportedly spend. Estimates from the South China Morning Post suggest the model’s API costs are six to ten times lower than those of OpenAI and Anthropic.

The model uses a Mixture-of-Experts architecture with one trillion total parameters, of which 32 billion are active for each token, and employs INT4 quantization, which Moonshot says roughly doubles generation speed without sacrificing accuracy.
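To put those figures in perspective, here is a back-of-the-envelope sketch in Python. It uses only the two parameter counts reported above. The memory estimate assumes every weight is stored at exactly 4 bits and ignores quantization scales and any layers kept at higher precision, and the 16-bit baseline is included purely for comparison; neither assumption comes from Moonshot's release.

```python
# Parameter counts as reported in the article.
TOTAL_PARAMS = 1_000_000_000_000   # one trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion active parameters


def active_fraction(total_params: int, active_params: int) -> float:
    """Share of the model's parameters that are actually used at once."""
    return active_params / total_params


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Approximate weight storage in bytes.

    Assumption: every weight uses exactly `bits_per_weight` bits, with no
    overhead for quantization scales or higher-precision layers.
    """
    return num_params * bits_per_weight // 8


share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
int4_gb = weight_bytes(TOTAL_PARAMS, 4) / 1e9
fp16_gb = weight_bytes(TOTAL_PARAMS, 16) / 1e9  # hypothetical 16-bit baseline

print(f"Active share of parameters: {share:.1%}")
print(f"Approx. weights at 4 bits:  {int4_gb:,.0f} GB")
print(f"Approx. weights at 16 bits: {fp16_gb:,.0f} GB")
```

Under these assumptions, only about 3.2% of the parameters are in use at any one time, and 4-bit weights occupy roughly 500 GB, a quarter of what a 16-bit copy would need. That gives a rough sense of why a sparse, quantized model can be cheaper to run than its headline parameter count suggests.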

Hugging Face co-founder Thomas Wolf commented that Kimi K2 Thinking was another example of an open model outperforming closed systems.

“Is this another DeepSeek moment?” he asked. “Should we expect one every few months now?”

Technical Strengths and Ongoing Gaps

Moonshot researchers claim the model sets “new records” across reasoning, coding, and agentic tasks. It can autonomously execute 200–300 sequential tool calls, maintaining coherent reasoning over hundreds of steps—an ability that impressed independent testers.

Consultancy Artificial Analysis found that Kimi K2 Thinking achieved 93% accuracy on its Tau-2 Bench Telecom test, the highest result it has recorded. Still, Nathan Lambert of the Allen Institute for AI noted that while open models like Kimi K2 Thinking are closing the gap, top closed systems remain roughly four to six months ahead in raw performance.

Competing Through Cost and Innovation

Chinese experts say the country’s strategy now centers on cost efficiency and architectural innovation rather than sheer computing power. IT system architect Zhang Ruiwang observed that since U.S. models still lead in overall capability, Chinese firms must “compete through cost-effectiveness.”

According to Zhang Yi of iiMedia Research, training costs for Chinese AI models have dropped sharply thanks to improved data curation and model design.

“It marks a shift away from the old approach of simply stacking more GPUs,” he said.

The model’s Modified MIT License allows full commercial and derivative use, with one condition: any deployer serving over 100 million monthly users or earning more than US$20 million per month must display the Kimi K2 name on the product interface.
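The attribution condition can be read as a simple either-or test. The Python sketch below encodes the two thresholds as this article reports them. It is not taken from the license text itself, and the function and constant names are hypothetical, chosen only for illustration.

```python
# Thresholds as reported in the article (not quoted from the license text).
USER_THRESHOLD = 100_000_000          # monthly users
REVENUE_THRESHOLD_USD = 20_000_000    # monthly revenue in US dollars


def must_display_kimi_k2(monthly_users: int, monthly_revenue_usd: float) -> bool:
    """Return True if a deployer exceeds either reported threshold.

    Hypothetical helper: exceeding one threshold is enough, and a value
    exactly at a threshold does not count as exceeding it.
    """
    return (
        monthly_users > USER_THRESHOLD
        or monthly_revenue_usd > REVENUE_THRESHOLD_USD
    )


# A small startup falls below both thresholds.
print(must_display_kimi_k2(2_000_000, 150_000))
# A large platform exceeds the user threshold, regardless of revenue.
print(must_display_kimi_k2(250_000_000, 5_000_000))
```

On this reading, most deployers would fall below both thresholds and face no attribution requirement at all.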

A Turning Point in Global AI

Industry reaction has been swift. Menlo Ventures partner Deedy Das called the release “a turning point in AI,” noting that “a Chinese open-source model is now #1.” Lambert added that the rise of Moonshot and DeepSeek has put “serious pricing pressure” on Western developers.

Together with peers like DeepSeek, Qwen, and Baichuan, Moonshot is helping redefine how China competes in the AI race—not by matching the U.S. model-for-model, but by innovating in efficiency, accessibility, and cost.

Whether this signals a lasting shift or a temporary convergence remains to be seen. But the message is clear: the global AI race is no longer a one-sided contest, and the era of affordable, open, high-performance models has arrived.
