China is advancing a new approach to chip design that aims to counter US export controls and reduce its reliance on foreign semiconductor technology. The strategy, known as chip stacking, focuses on vertically layering older, domestically produced chips to boost performance. The question gaining global attention is whether this method can meaningfully challenge Nvidia’s lead in AI computing.
At the heart of the idea is a simple shift in direction: if China cannot produce cutting-edge chips, it will try to build more capable systems with the chips it can still manufacture. Wei Shaojun, vice president of the China Semiconductor Industry Association and a professor at Tsinghua University, recently described an architecture that merges 14-nanometer logic chips with 18-nanometer DRAM through three-dimensional hybrid bonding. These specifications are important because they sit just outside the targets of current US export restrictions, placing the concept within reach for Chinese fabs.

The technique relies on bringing memory and compute much closer together. Instead of data constantly bouncing between the processor and external memory, the stacked design places the two directly on top of one another. Hybrid bonding allows direct copper-to-copper connections at a pitch below 10 micrometers, cutting down the delays that slow conventional chip layouts.
Wei has suggested that this setup could offer competitive performance while consuming less power. He cited figures of around 120 teraflops and efficiency near 2 teraflops per watt. But even with these gains, the gap remains wide. Nvidia’s A100 GPU, a 2020 part built on a 7nm process, delivers up to 312 teraflops, and the company’s newer GPUs have since moved to 4nm-class manufacturing. That underscores the challenge of matching hardware built on far more advanced process nodes. New architectures can reduce some bottlenecks, but they cannot erase the efficiency and density advantages that come with modern manufacturing.
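The arithmetic behind those figures can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the numbers quoted above; the function names are illustrative, and the comparison assumes the two throughput figures refer to comparable numeric precisions, which the cited remarks do not confirm.

```python
# Figures quoted in the article; everything else here is illustrative.
STACKED_TFLOPS = 120.0           # throughput cited by Wei Shaojun
STACKED_TFLOPS_PER_WATT = 2.0    # efficiency cited by Wei Shaojun
A100_PEAK_TFLOPS = 312.0         # peak figure quoted for Nvidia's A100


def implied_power_watts(tflops: float, tflops_per_watt: float) -> float:
    """Power draw implied by a throughput figure and an efficiency figure."""
    return tflops / tflops_per_watt


def throughput_ratio(tflops: float, reference_tflops: float) -> float:
    """Throughput expressed as a fraction of a reference chip's peak."""
    return tflops / reference_tflops


power = implied_power_watts(STACKED_TFLOPS, STACKED_TFLOPS_PER_WATT)
ratio = throughput_ratio(STACKED_TFLOPS, A100_PEAK_TFLOPS)

print(f"Implied power draw: {power:.0f} W")    # 60 W
print(f"Share of A100 peak: {ratio:.0%}")      # 38%
```

On these numbers the stacked design would draw roughly 60 watts while reaching a little over a third of the A100's quoted peak.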
Still, China’s interest in chip stacking goes beyond raw speed. Huawei founder Ren Zhengfei has emphasized the value of clustering and stacking chips instead of trying to win a race at the process level. With TSMC and Samsung pushing to 3nm and beyond, China is choosing a path where it can differentiate through design, packaging, and software.
Another factor is Nvidia’s CUDA ecosystem, which has become the backbone of modern AI development. Competing directly with CUDA would require years of software investment and widespread developer adoption. By proposing a different computing model, China aims to bypass this dependence rather than duplicate it.
The strategy has promise, but obstacles remain. Stacking concentrates heat in a smaller footprint, and older nodes like 14nm draw more power for the same work than modern alternatives. Removing that heat from a vertical layout is complex. Manufacturing yields are also harder to control, since a defect in any layer can ruin the full stack. And even if the hardware matures, the software ecosystem for these architectures will take time to build.
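The yield problem compounds with every layer, which a short Python sketch can illustrate. It assumes each layer fails independently and that a single bad layer scraps the whole stack; the 90 percent per-layer yield is a hypothetical value chosen for illustration, not a figure from any fab, and the model ignores bonding losses as well as mitigations such as testing dies before stacking.

```python
from math import prod
from typing import Sequence


def stack_yield(layer_yields: Sequence[float]) -> float:
    """Fraction of stacks in which every layer is good.

    Assumes layers fail independently and one bad layer ruins the stack.
    """
    for y in layer_yields:
        if not 0.0 <= y <= 1.0:
            raise ValueError(f"yield must be between 0 and 1, got {y}")
    return prod(layer_yields)


# Hypothetical per-layer yield of 90%, for illustration only.
two_layers = stack_yield([0.90, 0.90])
four_layers = stack_yield([0.90] * 4)

print(f"Two-layer stack yield:  {two_layers:.1%}")   # 81.0%
print(f"Four-layer stack yield: {four_layers:.1%}")  # 65.6%
```

Under these assumptions, a process that looks healthy per layer loses almost a fifth of its output at two layers and about a third at four.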
The most realistic near-term use cases are workloads where memory bandwidth is more important than peak compute. Some AI inference tasks, data analysis, and specialized applications could benefit from the design’s strengths. Matching Nvidia’s performance across general AI training is still a distant target.
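One common way to reason about bandwidth-bound versus compute-bound workloads is the roofline model, sketched below in Python. The peak figures are the ones quoted in this article, but both memory bandwidth values and both arithmetic intensities are hypothetical placeholders picked to show the crossover, not measurements of any real chip.

```python
def attainable_tflops(peak_tflops: float,
                      bandwidth_tb_per_s: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: throughput is capped by whichever limit is lower,
    peak compute or memory bandwidth times arithmetic intensity."""
    return min(peak_tflops, bandwidth_tb_per_s * flops_per_byte)


# Hypothetical chips: bandwidth values are assumptions for illustration.
stacked = {"peak_tflops": 120.0, "bandwidth_tb_per_s": 4.0}
conventional = {"peak_tflops": 312.0, "bandwidth_tb_per_s": 2.0}

# Hypothetical workloads, described by FLOPs performed per byte moved.
workloads = [("bandwidth-bound", 10.0), ("compute-bound", 200.0)]

for name, intensity in workloads:
    s = attainable_tflops(flops_per_byte=intensity, **stacked)
    c = attainable_tflops(flops_per_byte=intensity, **conventional)
    print(f"{name}: stacked {s:.0f} TFLOPS, conventional {c:.0f} TFLOPS")
```

With these placeholder values, the stacked chip comes out ahead on the bandwidth-bound workload (40 versus 20 teraflops) and well behind on the compute-bound one (120 versus 312), which mirrors the split between inference-style tasks and general training described above.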
What this shift does show is China’s commitment to finding creative ways around supply chain barriers. Instead of trying to reproduce Western chipmaking at a disadvantage, it is exploring architectural innovation where export controls have less influence. That makes chip stacking a meaningful development in the global AI hardware race, even if it is not yet a direct threat to Nvidia’s leadership.
As the semiconductor landscape evolves, this approach highlights how competition may expand beyond process nodes to include system design, packaging, and integrated software strategies.