FusionFrames: efficient architectural aspects for text-to-video generation pipeline
arXiv, 2023
Kandinsky: an improved text-to-image synthesis with image prior and latent diffusion
EMNLP 2023
Kandinsky 3.0 technical report
arXiv, 2023
CRAFT: Cultural Russian-oriented dataset adaptation for focused text-to-image generation
Doklady Mathematics, 2024
Kandinsky 3: Text-to-image synthesis for multifunctional generative framework
EMNLP 2024
Improveyourvideos: Architectural improvements for text-to-video generation pipeline
IEEE Access, 2024
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
NAACL 2025
VIVAT: Virtuous Improving VAE Training through Artifact Mitigation
arXiv, 2025
NABLA: Neighborhood Adaptive Block-Level Attention
arXiv, 2025
Time-Correlated Video Bridge Matching
arXiv, 2025
Publications
Here we share the results of our research and development. We are convinced that openness and collaboration are the foundation of progress in the field of artificial intelligence. This section contains publications that describe in detail the architectural solutions, training methods, and stages of creating Kandinsky.
A line-up of Text-to-Image and Image Editing models that generate HD images (1280×768, 1024×1024) from English and Russian text prompts with precise details and accurate text rendering, powered by a 6-billion-parameter architecture.
Image Lite
A line-up of Text-to-Video and Image-to-Video models that produce up to 10 seconds of high-quality SD video (768x512, 512x512, 24 fps) with strong consistency, powered by a 2-billion-parameter Diffusion Transformer for fast, efficient generation.
Video Lite
A line-up of Text-to-Video and Image-to-Video models designed for high-quality HD video synthesis (1280x768, 24 fps), delivering rich motion dynamics, precise camera control, and understanding of English and Russian prompts. The Diffusion Transformer is scaled to 19 billion parameters for superior visual quality.
Video Pro
2021-2022
Malevich/Kandinsky 1.0

1.3B/12B parameters

Text-to-image generation at 256x256 resolution. Autoregressive approach, trained on Russian-language data.
July 2023
Kandinsky 2.2

4.8B parameters

Generates photorealistic images at 1024x1024 resolution. Supports ControlNet for precise editing and 4-second video clips.
November 2023
Kandinsky Video 1.0

17.5B parameters

Generates 8-second videos from text. Uses keyframe interpolation at 512x512 resolution.
May 2024
Kandinsky Video 1.1

17.5B parameters

Generates 5.5-second videos. Trained on 4.6M text-video pairs, with captions generated by LLaVA-1.5.
June 2025
Kandinsky 4.1

14B parameters

Refined image generation with improved aesthetics, guided by expert artist feedback.
2022-2023
Kandinsky 2.0/2.1

2B/3.3B parameters

Diffusion-based generation. Supports 101 languages. Enables inpainting, outpainting, and image editing at resolutions up to 768x768.
November 2023
Kandinsky 3.0

11.9B parameters

High-fidelity image generation with improved text alignment.
April 2024
Kandinsky Flash/3.1

11.9B/15.5B parameters

Flash: Generates images in just 4 steps — 10x faster than standard diffusion.
3.1: Uses Flash as a refiner to boost quality. Adds cultural awareness for Russian-language prompts.
December 2024
Kandinsky 4.0

17.5B parameters

Generates images and 12-second videos from text. Supports audio generation conditioned on video input.
November 2025
Kandinsky 5.0
6B/2B/19B parameters

New family based on Flow Matching.
Includes:
  • Image Lite (6B): generates high-quality HD images, with strong text rendering and understanding of Russian concepts. A variant of this model can edit input images given a text prompt.
  • Video Lite (2B): a lightweight video generation model that ranks #1 among open-source models in its class.
  • Video Pro (19B): generates high-quality HD video with rich motion dynamics and precise camera control, and understands English and Russian prompts.
History
Our Mission
Our mission is to move the industry forward by creating models and approaches that open up new formats of creativity, accelerate scientific discoveries, and unlock the potential of generative AI.
About Kandinsky Lab
We are a team of researchers, engineers, and analysts who create state-of-the-art image and video generation models. We combine deep expertise in machine learning, data, and computing systems, and we share research, code, and models to develop artificial intelligence together with the community.