APAC Companies Shift AI Inference to the Edge as Cost and Latency Pressures Grow

Asia Pacific businesses are investing heavily in artificial intelligence, yet many still find it hard to turn that spending into real value. A growing number of industry studies point to the same issue: the infrastructure behind these projects is often not designed to support the speed, scale, or responsiveness that modern AI workloads require. As a result, even well-funded initiatives fall short of their expected return.

Akamai believes it has an answer. The company is rolling out its Inference Cloud, built with NVIDIA hardware and powered by Blackwell GPUs, to bring AI processing closer to users. The logic is straightforward. If AI tools need to make fast decisions, the computation should happen near the people or devices that depend on them, not in distant data centers. Akamai says this approach can cut costs, reduce latency, and support applications that rely on real-time responses.

Jay Jenkins, Akamai’s CTO of Cloud Computing, says this shift reflects a deeper change in how enterprises are using AI. The industry is discovering that inference, not training, is now the real bottleneck.

The Infrastructure Gap Slowing AI Projects

Jenkins notes that many organizations underestimate the jump from pilot projects to full deployment. It is one thing to experiment with generative AI in a controlled environment and another to run models at scale every day. Expenses climb quickly, especially for companies dependent on centralized cloud setups and large GPU clusters. Latency also becomes a problem when models need to run multiple inference steps over long distances.

According to Jenkins, performance issues can erode the user experience and reduce the business impact leaders expect. Multi-cloud complexity, strict data requirements, and varying compliance rules across Asia Pacific widen the gap further.

Inference Takes Center Stage

Across the region, businesses are moving beyond small trials and integrating AI directly into products and services. That means inference is happening constantly. Whether it is language processing, computer vision, or multimodal workloads, the computing load has grown faster than most teams expected.

This is especially true in markets with multiple languages, local regulations, and varied data environments. Centralized infrastructure struggles to deliver the sub-second responsiveness many services now require.

Why Edge Infrastructure Matters

Running inference closer to the source of data can reshape performance and cost. It cuts travel time for information and reduces the need to route massive volumes of data through major cloud hubs. This is crucial for physical AI systems like robots, autonomous machines, or smart city tools that rely on millisecond-level decisions.

Akamai’s internal analysis shows that customers in India and Vietnam saw meaningful reductions in the cost of running image-generation workloads at the edge. Improved GPU utilization and lower egress fees played a big role.

Industries Leading the Shift

Retail and e-commerce are among the first movers. Slow responses can push customers away, so tools like personalized recommendations, search, and multimodal shopping systems benefit immediately from local inference.

Financial services are also accelerating adoption. Fraud detection, payment approvals, and transaction scoring involve chains of rapid AI decisions. Running these workloads at the edge helps firms act faster and keep sensitive data within national borders.

Cloud and GPU Partnerships Take on New Importance

As demand grows, cloud providers and hardware manufacturers are forming closer partnerships to deliver AI-ready infrastructure. Akamai’s collaboration with NVIDIA is a prominent example, with GPUs, DPUs, and AI software positioned in thousands of edge locations.

This distributed model supports both performance and compliance. With nearly half of large APAC enterprises struggling to navigate differing data rules across markets, local processing is becoming essential. Security features like zero-trust controls, data-aware routing, and bot protection are now built in by default.

Preparing for Agentic AI and Automation

Agentic systems run many decisions in sequence, which requires ultra-low latency. Jenkins says the diversity of the APAC region makes this challenging but achievable. Countries vary widely in connectivity and regulations, so AI systems need to run wherever conditions allow. Research suggests most enterprises already rely on public cloud, but many expect to adopt more edge services by 2027.
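A rough back-of-the-envelope sketch shows why sequential decisions are so sensitive to distance. It assumes each step must wait for the previous one, so every step pays one network round trip plus model compute time. The function name and all figures below are hypothetical, chosen only to illustrate how the round trip compounds; they are not measurements from Akamai or any customer.

```python
def chain_latency_ms(steps: int, round_trip_ms: float, compute_ms: float) -> float:
    """Total latency when every step waits for the previous step to finish."""
    return steps * (round_trip_ms + compute_ms)


# Hypothetical figures: same model compute time, different network distance.
distant = chain_latency_ms(steps=10, round_trip_ms=120.0, compute_ms=30.0)
nearby = chain_latency_ms(steps=10, round_trip_ms=10.0, compute_ms=30.0)

print(f"distant region: {distant:.0f} ms, nearby edge site: {nearby:.0f} ms")
```

Under these assumed numbers, the compute time is identical in both cases; the gap comes entirely from the round trip being paid once per step.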

This transition will require infrastructure that can store data locally, route tasks to the nearest suitable location, and continue operating even when networks are unstable.
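The routing requirement described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how Akamai's platform works: the `Site` type, the `pick_site` function, the site names, and the latency values are all hypothetical. It picks the lowest-latency healthy site, optionally restricted to the country where the data originates, and returns nothing when no site qualifies so the caller can fall back to local handling.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    """A hypothetical inference location (not a real platform object)."""
    name: str
    country: str
    latency_ms: float
    healthy: bool = True


def pick_site(sites: Sequence[Site], data_country: str,
              must_stay_local: bool) -> Optional[Site]:
    """Return the lowest-latency healthy site that satisfies the residency rule."""
    candidates = [
        s for s in sites
        if s.healthy and (not must_stay_local or s.country == data_country)
    ]
    # default=None lets the caller handle "no suitable site" explicitly.
    return min(candidates, key=lambda s: s.latency_ms, default=None)


# Illustrative sites and latencies only; none of these are measured values.
SITES = [
    Site("edge-hanoi", "VN", 8.0, healthy=False),  # simulated outage
    Site("edge-hcmc", "VN", 14.0),
    Site("core-singapore", "SG", 45.0),
]

local = pick_site(SITES, data_country="VN", must_stay_local=True)
print(local.name if local else "no suitable site: queue the task locally")
```

In this sketch the unhealthy site is skipped even though it is nominally the closest, and a request whose data must stay in a country with no available site gets `None` rather than being routed across a border.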

What Companies Should Plan For

As inference moves to the edge, organizations will need new processes for managing distributed AI. Models will be updated across multiple locations, and teams will need visibility into performance, cost, and errors across core and edge systems.

Data governance will evolve as well. Keeping data closer to where it originates can simplify compliance in a region with many regulatory frameworks. Security, however, becomes more complex. Every edge site must be protected, and companies need strong defenses for APIs, data pipelines, and fraud prevention.
