✅ Release the checkpoints of Latte-1

We release the checkpoints of Latte-1 on Hugging Face.

Inference

Latte-1 is now integrated into diffusers.

You can easily run Latte using the following code. We also support inference with 4/8-bit quantization, which can reduce GPU memory usage from 17 GB to 9 GB.


```python
# Please update diffusers to at least version 0.30.0.
import torch
import imageio
from diffusers import LattePipeline
from diffusers.models import AutoencoderKLTemporalDecoder
from torchvision.utils import save_image

torch.manual_seed(0)

device = "cuda" if torch.cuda.is_available() else "cpu"
video_length = 16  # 1 (text-to-image) or 16 (text-to-video)
pipe = LattePipeline.from_pretrained("maxin-cn/Latte-1", torch_dtype=torch.float16).to(device)

# Use the temporal decoder of the VAE.
vae = AutoencoderKLTemporalDecoder.from_pretrained(
    "maxin-cn/Latte-1", subfolder="vae_temporal_decoder", torch_dtype=torch.float16
).to(device)
pipe.vae = vae

prompt = "a cat wearing sunglasses and working as a lifeguard at pool."
videos = pipe(prompt, video_length=video_length, output_type="pt").frames.cpu()

# Save the result: an MP4 for a video, a PNG for a single image.
if video_length > 1:
    videos = (videos.clamp(0, 1) * 255).to(dtype=torch.uint8)
    imageio.mimwrite("./latte_output.mp4", videos[0].permute(0, 2, 3, 1), fps=8)
else:
    save_image(videos[0], "./latte_output.png")
```

Xin Ma

I'm a Ph.D. candidate at Monash University. My research interests include image super-resolution and inpainting, model compression, face recognition, video generation, and large-scale generative models.