DigitalOcean announced that its Inference Cloud Platform is delivering 2X production inference throughput for Character.ai, a leading AI entertainment platform that operates one of the most demanding production inference workloads in the market, handling over a billion queries per day. The gains come from a tightly integrated software and hardware collaboration with AMD. Character.ai leverages both proprietary and open-source models to power its high-volume, high-concurrency, latency-sensitive applications. By migrating these workloads to DigitalOcean's Inference Cloud Platform, Character.ai achieved significantly higher request throughput while adhering to rigorous latency targets.

Compared to standard, non-optimized GPU infrastructure, this transition reduced the cost per token by 50% and substantially expanded usable capacity for their end users. This performance milestone builds on DigitalOcean's growing momentum with large-scale AI customers like Character.ai, supporting platform expansion and richer multimodal experiences.

DigitalOcean worked closely with Character.ai and AMD to deploy AMD Instinct™ GPUs optimized specifically for inference workloads. Rather than treating GPUs as interchangeable infrastructure, DigitalOcean's platform integrates hardware-aware scheduling and optimized inference runtimes to extract higher sustained performance per node. AMD has invested heavily in ROCm™, its open end-to-end AI software stack.

Through deep collaboration, the teams optimized ROCm with vLLM, AITER (AMD's inference-focused runtime and optimization framework for transformer workloads), and deployment configurations for Character.ai's workloads on DigitalOcean AMD Instinct™ MI300X and MI325X GPUs, contributing to the throughput improvement. In collaboration with Character.ai, DigitalOcean engineers tuned distributed inference configurations to balance latency, throughput, and concurrency.
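To illustrate the kind of deployment configuration involved (the exact settings used by Character.ai and DigitalOcean are not published; the model name and every parameter value below are hypothetical), a vLLM server on a ROCm-enabled, 8-GPU MI300X node might be launched with tensor parallelism and concurrency limits tuned toward a latency target:

```shell
# Hypothetical vLLM serving configuration for an 8-GPU AMD Instinct MI300X node.
# The model name and all values are illustrative, not Character.ai's settings.
#   --tensor-parallel-size    shards the model across the node's 8 GPUs
#   --max-num-seqs            caps concurrent sequences to protect tail latency
#   --gpu-memory-utilization  leaves headroom for activation memory spikes
#   --max-model-len           bounds context length to keep the KV cache small
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 8 \
  --max-num-seqs 256 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192
```

Raising `--max-num-seqs` generally increases aggregate throughput at the cost of per-request latency, which is the central knob in the tuning described above.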

In some production scenarios, these optimizations increased throughput by 2X under the same latency constraints, directly improving total cost of ownership.

Operating large-scale AI inference under real production constraints

This approach reflects DigitalOcean's broader strategy: GPUs matter, but outcomes matter more. DigitalOcean is designing, operating, and optimizing systems that can yield significantly more reliable performance for its customers.
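A back-of-envelope model shows why latency, throughput, and concurrency must be balanced together (all numbers below are illustrative assumptions, not measured values). Continuous-batching inference servers run one decode step for the whole batch at a time, so throughput grows with concurrency while per-token latency equals the step time; tuning amounts to finding the largest batch whose step time stays within the latency budget:

```python
# Toy model of the latency/throughput/concurrency tradeoff in batched decoding.
# The cost constants are assumptions for illustration; real step times come
# from profiling the serving stack.

def step_time_ms(batch_size: int) -> float:
    """Hypothetical decode-step time: a fixed cost plus a per-sequence cost."""
    return 20.0 + 0.15 * batch_size  # assumed costs, not measured values

def throughput_tokens_per_s(batch_size: int) -> float:
    """Each decode step emits one token per sequence in the batch."""
    return batch_size * 1000.0 / step_time_ms(batch_size)

def best_batch_under_budget(latency_budget_ms: float, max_batch: int = 512) -> int:
    """Largest concurrency whose per-token latency stays within the budget."""
    return max(b for b in range(1, max_batch + 1)
               if step_time_ms(b) <= latency_budget_ms)

if __name__ == "__main__":
    for budget in (30.0, 50.0):
        b = best_batch_under_budget(budget)
        print(f"budget {budget} ms -> batch {b}, "
              f"{throughput_tokens_per_s(b):.0f} tok/s")
```

In this toy model, relaxing the latency budget from 30 ms to 50 ms roughly doubles sustainable throughput, which mirrors the kind of headroom that configuration tuning can unlock without new hardware.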

Unlike traditional cloud approaches that emphasize GPU availability alone, DigitalOcean's Inference Cloud is designed to operate AI applications in production. The Character.ai deployment reflects a broader shift in how AI infrastructure is built and evaluated. As inference workloads scale, customers are prioritizing predictable performance, operational simplicity, and cost efficiency over raw hardware specifications.

For additional information on the specific testing methodologies, hardware configurations, and performance benchmarks used to achieve these results, as well as important information regarding performance variability, please see the technical deep-dive.