China’s Kimi K2 Thinking Outperforms GPT-5 on a Landmark AI Benchmark

China’s Moonshot AI has entered the global AI arena with its latest model, Kimi K2 Thinking, which the company says surpasses OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 on several high-profile benchmarks. Unlike the multi-billion-dollar models developed in the U.S., Kimi K2 Thinking reportedly cost under $5 million to train, suggesting that smaller players can make significant advances in artificial intelligence at a fraction of the cost.

Beijing-based Moonshot AI unveiled Kimi K2 Thinking as an evolution of its earlier Kimi K2 model. The model uses a Mixture-of-Experts (MoE) architecture: it is composed of specialized sub-networks, or "experts," and a routing mechanism activates only a subset of them for any given input. It is designed for advanced reasoning, planning, and problem-solving, and it can execute multi-step processes, verify its own outputs, and use tools such as web browsers to refine its responses. Results on benchmarks such as Humanity’s Last Exam, BrowseComp, and Seal-0 indicate that the model performs exceptionally well on reasoning, analytical challenges, and information retrieval.
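
To make the Mixture-of-Experts idea concrete, here is a minimal Python sketch of top-k expert routing. It is an illustration only, not Moonshot's implementation: the function names (route_top_k, moe_forward), the four toy experts, and the hard-coded router scores are all assumptions made for this example. In a real model the experts are neural sub-networks and the router scores are learned and computed for each token.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and normalize their weights.

    Hypothetical helper: returns a list of (expert_index, weight) pairs.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Toy experts (assumed for illustration): each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Stand-in router scores; a trained router would produce these per token.
router_scores = [0.1, 2.0, -1.0, 2.0]

selected = route_top_k(router_scores, k=2)
output = moe_forward(1.0, experts, router_scores, k=2)
print(selected, output)
```

Only two of the four experts run for this input, which is the property that lets an MoE model carry a very large total parameter count while using a fraction of it on each step.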

On the closely watched benchmark “Humanity’s Last Exam” (HLE), which comprises 2,500 expert-level questions across a range of domains, the model scored 44.9%, compared with GPT-5’s 41.7%, a lead of 3.2 percentage points.

Kimi K2 Thinking has one trillion parameters and is available on Hugging Face, a rare instance of a highly capable AI model being made open source. This openness contrasts with the paywalled, proprietary nature of most Western models and allows developers around the world to experiment with advanced AI capabilities without costly barriers.

The model’s arrival has intensified debates about the shifting balance of power in AI development. While the U.S. has long led the field with massive investment and proprietary systems, China’s combination of lower training costs, looser regulations, and innovative engineering presents a credible challenge. Nvidia CEO Jensen Huang noted China’s strategic advantages in energy costs and AI development, although he later clarified that the U.S. remains slightly ahead.

Moonshot’s release signals that the AI race is no longer purely about resource-heavy models; efficiency, architecture, and open access are increasingly influential. If Kimi K2 Thinking consistently matches or outperforms GPT-5, it could reshape perceptions of where the next era of AI innovation will come from and who will define global AI standards.