Akamai Technologies unveiled the first global-scale implementation of the NVIDIA AI Grid reference design. By integrating NVIDIA AI infrastructure into Akamai's infrastructure, and leveraging intelligent workload orchestration across its network, Akamai intends to move the industry beyond isolated AI factories toward a unified, distributed grid for AI inference. The move marks a significant step in the evolution of Akamai's Inference Cloud, introduced late last year.

As the first to operationalize the AI Grid, Akamai is rolling out thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, giving enterprises a platform to run agentic and physical AI with the responsiveness of local compute and the scale of the global web. At the heart of the AI Grid is an intelligent orchestrator that acts as a real-time broker for AI requests. Applying Akamai's expertise in application performance optimization to AI, this workload-aware control plane optimizes "tokenomics" by radically improving cost per token, time-to-first-token, and throughput.

A major differentiator for Akamai is the ability for customers to access fine-tuned or sparsified models through its enormous global edge footprint, which offers a massive cost and performance advantage for the long tail of AI workloads. For example:

Cost Efficiency at Scale: Enterprises can dramatically reduce inference costs by matching workloads to the right compute tier automatically. The orchestrator applies techniques like semantic caching and intelligent routing to direct requests to right-sized resources, reserving premium GPU cycles for the workloads that demand them. Underpinning this is Akamai Cloud, built on open-source infrastructure with generous egress allowances to support data-intensive AI operations at scale.

Real-Time Responsiveness: Gaming studios can deliver AI-driven NPC interactions that respond within milliseconds and maintain player immersion. Financial institutions can execute personalized fraud detection and marketing recommendations in the moment between login and first screen. Broadcasters can transcode and dub content in real time for global audiences. These outcomes are powered by Akamai's globally distributed edge network, which spans more than 4,400 locations and combines integrated caching, serverless edge compute, and high-performance connectivity to process requests at the point of user contact, bypassing the round-trip lag of origin-dependent clouds.

Production-Grade AI at the Core: Large language models, continuous post-training, and multi-modal inference workloads require sustained, high-density compute that only dedicated infrastructure can deliver. Akamai's multi-thousand GPU clusters, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, provide the concentrated horsepower for the heaviest AI workloads, complementing the distributed edge with centralized scale.
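
To make the idea of semantic caching and tier-aware routing concrete, the following is a minimal illustrative sketch in Python, not Akamai's orchestrator. Everything in it is an assumption made for illustration: the names `SemanticCache`, `choose_tier`, and `handle` are hypothetical, the bag-of-words `embed` function stands in for a real embedding model, and the word-count rule is a placeholder for whatever workload signals a production control plane would actually use.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache that reuses answers for near-duplicate prompts."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, prompt):
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # Only reuse a stored answer when the prompts are similar enough.
        return best_response if best_score >= self.threshold else None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))


def choose_tier(prompt, max_edge_words=50):
    # Assumed rule: short prompts go to the edge tier, long prompts to
    # the core GPU tier. A real orchestrator would use richer signals.
    return "edge" if len(prompt.split()) <= max_edge_words else "core"


def handle(prompt, cache, run_model):
    # Serve from the cache when possible; otherwise route to a tier,
    # run the model there, and remember the answer for similar prompts.
    cached = cache.lookup(prompt)
    if cached is not None:
        return "cache", cached
    tier = choose_tier(prompt)
    response = run_model(tier, prompt)
    cache.store(prompt, response)
    return tier, response
```

In this sketch, a request that closely resembles an earlier one is answered from the cache without touching a GPU, while a cache miss is sent to the smallest tier that the placeholder rule considers adequate. The caller supplies `run_model`, a hypothetical function that takes a tier name and a prompt and returns the model's response.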