Kandinsky 5 - video output examples from a 24GB GPU

About two weeks ago, the news of the Kandinsky 5 lite models came up on here ([https://www.reddit.com/r/StableDiffusion/comments/1nuipsj/opensourced_kandinsky_50_t2v_lite_a_lite_2b/](https://www.reddit.com/r/StableDiffusion/comments/1nuipsj/opensourced_kandinsky_50_t2v_lite_a_lite_2b/)) with a nice video from the repo's page and ComfyUI nodes included. However, what wasn't mentioned on their repo page (originally) was that it needed 48GB of VRAM for the VAE decoding... ahem.

**In the last few days, that has been taken care of, and it now tootles along using ~19GB on the run and spiking up to ~24GB on the VAE decode.**

https://preview.redd.it/5y3bin2aduuf1.png?width=817&format=png&auto=webp&s=7145052efce232663aad7e61166caa694db27636
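
For reference, here is a minimal sketch (mine, not from the post or the repo) of how you could check peak VRAM yourself with PyTorch's allocator stats if you're driving the run from Python; the ~19GB / ~24GB figures above were simply observed during a normal ComfyUI run.

```python
# Hedged sketch: measure peak VRAM around the sampling / VAE decode step.
# Assumes a CUDA build of PyTorch.
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the sampler and VAE decode here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM allocated: {peak_gb:.1f} GB")
```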

* Speed: unable to implement MagCache in my workflow yet ([https://github.com/Zehong-Ma/ComfyUI-MagCache](https://github.com/Zehong-Ma/ComfyUI-MagCache))
* Who can use it: owners of GPUs with 24GB+ VRAM
* Model's unique selling point: making 10s videos out of the box
* GitHub page: [https://github.com/ai-forever/Kandinsky-5](https://github.com/ai-forever/Kandinsky-5)
* **Very important caveat**: the requirements messed up my ComfyUI install (the PyTorch, to be specific), so I'd suggest a fresh trial install to keep it initially separate from your working install - i.e. know what you're doing with PyTorch (see the version-check sketch after this list).
* Is it any good?: eye-of-the-beholder time, and each model has particular strengths in particular scenarios - also, it's 10s out of the box. It takes about 12min in total for each gen, and I want to go play the new BF6 (these are my first 2 gens).
* Workflow?: in the repo
* Particular model used for the video below: Kandinsky5lite_t2v_sft_10s.safetensors
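
On that caveat: a quick, hedged sketch (not from the repo) for snapshotting your existing PyTorch setup before running their requirements, so you can tell afterwards whether the install swapped it out.

```python
# Sketch: record the PyTorch / CUDA versions of a working ComfyUI environment
# before installing the Kandinsky-5 requirements, then rerun and compare.
# Nothing here is Kandinsky-specific; it only reads torch's own version info.
import torch

print("torch version :", torch.__version__)         # e.g. 2.4.1+cu121
print("CUDA runtime  :", torch.version.cuda)        # None on a CPU-only build
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU           :", torch.cuda.get_device_name(0))
```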

[I'm making no comment on their #1 claims.](https://preview.redd.it/1yxlsop8guuf1.png?width=816&format=png&auto=webp&s=8b1b307b273f9b63e85558f117919908253f781d)

Test videos are below, using a prompt I made with an LLM and fed to their text encoders.

Not cherry-picked either way:

* 768x512
* length: 10s
* 48fps (interpolated from 24fps)
* 50 steps
* 11.94s/it
* render time: 9min 09s for a 10s video (it took longer in total as I added post-processing to the flow). I also have not yet got MagCache working - see the arithmetic sketch after this list
* RTX 4090 (24GB VRAM) with 64GB of system RAM
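
Rough arithmetic on those settings (my sketch, not the repo's numbers - the measured render time comes out a little shorter than steps × s/it suggests, since the per-iteration speed isn't constant over the run):

```python
# Sketch: how the settings above relate to frame count and sampling time.
duration_s = 10       # clip length
native_fps = 24       # model output before interpolation
interp_factor = 2     # 24fps -> 48fps
steps = 50
sec_per_it = 11.94    # reported sampler speed

native_frames = duration_s * native_fps         # 240 frames generated
output_frames = native_frames * interp_factor   # 480 frames at 48fps
est_sampling_min = steps * sec_per_it / 60      # ~10 min, vs ~9 min measured

print(native_frames, output_frames, round(est_sampling_min, 1))
```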

https://reddit.com/link/1o5epv7/video/xk32u4wikuuf1/player

https://preview.redd.it/8t1gkm3kbuuf1.png?width=1949&format=png&auto=webp&s=ce36344737441a8514eac525c1ef7cc02372bac7
