AIC and ScaleFlux have announced a joint hardware platform designed to accelerate emerging Inference Context Memory Storage (introduced as ICMS and now referred to as CMX) deployments for large-scale AI inference infrastructure. CMX architectures address the challenge of AI agent workloads by introducing a high-performance storage layer that can hold and serve large context datasets outside GPU memory while maintaining the low latency required for inference operations. By combining the AIC F2032-G6 JBOF Storage System with ScaleFlux NVMe SSDs and NVIDIA's latest data-center networking technologies, including the NVIDIA BlueField-4 DPU and NVIDIA ConnectX-9 SuperNIC, the companies are delivering a purpose-built hardware platform optimized for the rapidly growing context memory storage tier in modern AI clusters.
The AIC F2032-G6 JBOF platform provides an ideal foundation for this new infrastructure tier. Designed as a high-density NVMe storage system, the platform integrates BlueField-4 DPUs and/or ConnectX-9 SuperNICs to deliver high-throughput, low-latency connectivity between GPU servers and shared context memory storage. When populated with ScaleFlux NVMe SSDs, the system delivers a powerful and efficient hardware configuration for CMX deployments.
ScaleFlux SSD technology is designed to sustain the high-IOPS, low-latency data access patterns typical of KV-cache workloads while improving storage efficiency and overall system utilization. Together, these components minimize the critical "time to first token" and reduce the time GPUs lose waiting for data. Lower wait times translate to higher GPU utilization and greater ROI on those multi-million (or billion!) dollar investments in AI infrastructure.
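To make the offload pattern concrete, here is a minimal sketch of how an inference server might park a session's KV cache on a shared NVMe tier and pull it back on resume. This is an illustrative toy, not AIC's or ScaleFlux's software stack: the mount point, function names, and tensor shapes are assumptions, and a production CMX path would move data over RDMA/NVMe-oF through the DPU or SuperNIC rather than local file I/O.

```python
import numpy as np
from pathlib import Path

# Toy illustration of KV-cache offload to a context-memory storage tier.
# The mount point, names, and shapes below are assumptions for this sketch.
CACHE_DIR = Path("/mnt/cmx")  # assumed NVMe-backed mount for context data
CACHE_DIR.mkdir(parents=True, exist_ok=True)


def offload_kv(session_id: str, keys: np.ndarray, values: np.ndarray) -> None:
    """Park a session's KV tensors on the shared tier, freeing GPU HBM."""
    np.savez(CACHE_DIR / f"{session_id}.npz", keys=keys, values=values)


def reload_kv(session_id: str) -> tuple[np.ndarray, np.ndarray]:
    """Restore previously computed KV tensors so prefill is not recomputed."""
    with np.load(CACHE_DIR / f"{session_id}.npz") as f:
        return f["keys"], f["values"]


# Example: one session's cache for a single layer (shapes are illustrative).
kv = np.zeros((8, 128, 4096), dtype=np.float16)  # heads x head_dim x tokens
offload_kv("session-42", kv, kv)
keys, values = reload_kv("session-42")
```

The point of the sketch is the division of labor: prefill computes the KV tensors once, the storage tier holds them between turns, and resuming a session becomes a read instead of a recompute, which is what shortens time to first token.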
Context memory is emerging as a new data tier in AI infrastructure. By pairing ScaleFlux NVMe SSDs with AIC's high-density JBOF platform and NVIDIA's advanced data-center networking technologies, AIC and ScaleFlux are delivering a hardware solution optimized for the next generation of AI inference pipelines. The joint platform helps AI infrastructure operators address several key challenges associated with long-context inference workloads, including:

- Expanding KV-cache requirements driven by larger context windows and persistent AI sessions (a rough sizing example follows below)
- Efficient offloading of context memory from GPU HBM and system DRAM
- High-performance shared storage capable of serving context data to large GPU clusters
- Scalable infrastructure architectures for agentic AI and multi-modal inference workloads
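To see why the first challenge above pushes context data off the GPU, a back-of-envelope sizing calculation helps. The model dimensions below are illustrative assumptions (roughly a 70B-class transformer with grouped-query attention), not figures from AIC or ScaleFlux:

```python
# Rough KV-cache footprint for a single long-context session.
# All dimensions are assumed for illustration (~70B-class model with GQA).
layers = 80
kv_heads = 8           # grouped-query attention key/value heads
head_dim = 128
context_len = 128_000  # long context window, in tokens
bytes_per_elem = 2     # FP16

# Factor of 2 accounts for storing both keys and values.
kv_bytes = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
print(f"{kv_bytes / 1e9:.1f} GB per session")  # ~41.9 GB for one session
```

At roughly 42 GB per 128K-token session, a handful of concurrent persistent sessions exhausts even a large HBM pool, which is exactly the pressure a shared NVMe context tier is meant to absorb.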
As organizations deploy increasingly sophisticated AI services, the need for scalable context memory infrastructure is expected to grow rapidly. Solutions such as the AIC F2032-G6 JBOF with ScaleFlux NVMe SSDs provide a flexible and efficient hardware platform to support this new layer in the AI data pipeline. Together, AIC and ScaleFlux are enabling AI infrastructure builders to deploy high-performance context memory storage systems that help maximize GPU utilization while supporting the next generation of long-context AI applications.