Groq announced its partnership with Meta to deliver fast inference for the official Llama API, giving developers the fastest, most cost-effective way to run the latest Llama models. Now in preview, Llama 4 models on the Llama API, accelerated by Groq, run on the Groq LPU, the world's most efficient inference chip. That means developers can run Llama models with no tradeoffs: low cost, fast responses, predictable low latency, and reliable scaling for production workloads.

Unlike general-purpose GPU stacks, Groq is vertically integrated for one job: inference. Builders are increasingly switching to Groq because every layer, from custom silicon to cloud delivery, is engineered to deliver consistent speed and cost efficiency without compromise. The Llama API is the first-party access point for Meta's openly available models, optimized for production use.

With Groq infrastructure, developers get:

- Throughput of up to 625 tokens per second
- Minimal lift to get started: just three lines of code to migrate from OpenAI (see the sketch below)
- No cold starts, no tuning, no GPU overhead
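
To make the migration claim concrete, here is a minimal sketch of what switching an existing OpenAI SDK integration to an OpenAI-compatible Llama API endpoint typically looks like: change the API key, the base URL, and the model name, and the rest of the code stays the same. The base URL, environment variable, and model name below are illustrative placeholders, not confirmed values from this announcement.

```python
# Sketch: reuse the OpenAI Python SDK against an OpenAI-compatible endpoint.
# Only three lines change from a stock OpenAI integration: api_key, base_url, model.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["LLAMA_API_KEY"],          # 1) swap the API key
    base_url="https://api.llama.com/compat/v1/",  # 2) point at the compatible endpoint (illustrative URL)
)

response = client.chat.completions.create(
    model="llama-4-maverick",                     # 3) pick a Llama model (illustrative name)
    messages=[{"role": "user", "content": "Summarize what the Groq LPU is optimized for."}],
)
print(response.choices[0].message.content)
```

Because the endpoint follows the OpenAI chat completions format, existing request handling, streaming, and error-handling code generally carries over without further changes.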