Kuaishou Technology announced that on December 1, 2025, Kling AI officially unveiled Kling O1, positioned as the industry's first unified multimodal creation tool. Powered by next-generation video and imaging architectures, Kling O1 integrates text, video, image, and subject inputs, consolidating all generation and editing tasks into a single, all-encompassing engine. This launch definitively resolves the "consistency challenge" regarding characters and scenes in AI video generation, providing a deeply integrated, one-stop solution tailored for film, television, social media, advertising, and e-commerce.
As the pioneer of unified multimodal video models, Kling O1 is engineered on a Multimodal Visual Language (MVL) framework. It transcends the boundaries of traditional single-task video generation models by fusing a comprehensive spectrum of capabilities - including reference-based video generation, text-to-video generation, start- and end-frame generation, video in-painting (content insertion and removal), video modification and transformation, style re-rendering, and shot extension - into one versatile engine. Regarding video duration, Kling O1 restores temporal control to the creator, supporting generation lengths between 3 and 10 seconds.
Whether crafting a brief visual impact or a sustained narrative arc, pacing is entirely user-defined. Notably, as part of the unified model, Kling O1's first- and last-frame capabilities will soon support the 3-10 second range, further enhancing narrative flexibility. Kling AI also unveiled the Kling O1 image model, enabling seamless end-to-end workflows from basic image generation to advanced detail editing.
Users can generate images from text alone or upload up to 10 reference images to inspire and guide new creations. The model offers four key advantages. First, it ensures high feature retention, keeping subject elements stable and consistent.
Second, it offers high-precision detail editing, aligning every adjustment with user expectations. Third, it maintains accurate style control, ensuring a consistent visual tone throughout. Fourth, it delivers exceptionally rich creativity, enabling more dynamic and expressive creative outputs, truly making "what you envision is what you get" a reality.
Combining both generation and editing into a single solution, the all-new Kling O1 is uniquely adapted for film, television, social media, advertising, and e-commerce scenarios. Whether building narratives from scratch or restructuring existing elements, Kling O1 flexibly leverages its referencing and editing capabilities to streamline production. In film and television production, utilizing robust image (subject) consistency and a dedicated subject library, Kling O1 maintains strict continuity of characters, costumes, and props across every shot, effortlessly generating coherent cinematic sequences.
For video editors and social media creators, with prompts as simple as "remove passersby from the background" or "enhance sky color," Kling O1 automatically performs pixel-level intelligent repair and reconstruction. To solve the hassles of scheduling models and outfit changes, Kling O1 allows for a 24/7 virtual runway. By uploading model and clothing images and inputting prompts, Kling O1 flawlessly renders fabric textures and details, producing high-quality video lookbooks at scale.
Kling O1's robust and full-featured capabilities are driven by innovations in its technical architecture. The brand-new Kling O1 video model unifies traditionally fragmented video generation, editing, and comprehension features, creating a next-generation unified generative foundation. By combining the Multimodal Transformer with built-in multimodal comprehension and a multimodal long context, Kling O1 thoroughly integrates diverse tasks into a single, cohesive workflow.

















