Text-to-Video Generation

Latte - A first open-source Transformer-based Video Diffusion Generation Framework (TMLR 2025)

A simple and general latent video diffusion model incorporating spatio-temporal Transformers for video generation.


LaVie - A High-Quality Video Generation Framework (IJCV 2024)

A large-scale text-to-video framework that produces high-quality and temporally coherent videos. This framework operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.
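The three-stage cascade can be pictured as a chain of shape transformations. The sketch below is only an illustration and not LaVie's actual code: the function names, the clip sizes, the number of inserted frames, and the upscaling factor are all assumed values chosen for the example, and each stage is a placeholder that tracks the output shape instead of running a diffusion model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    """Shape of a video clip: number of frames and spatial resolution."""
    frames: int
    height: int
    width: int


def base_t2v(prompt: str) -> VideoShape:
    # Hypothetical stand-in for the base T2V model: a short, low-resolution clip.
    # The numbers are illustrative assumptions, not values from the source text.
    return VideoShape(frames=16, height=320, width=512)


def temporal_interpolation(video: VideoShape, inserted_per_gap: int = 3) -> VideoShape:
    # Insert `inserted_per_gap` new frames between every pair of adjacent frames.
    gaps = video.frames - 1
    return VideoShape(video.frames + gaps * inserted_per_gap, video.height, video.width)


def super_resolution(video: VideoShape, scale: int = 4) -> VideoShape:
    # Upscale each frame spatially; the frame count is unchanged.
    return VideoShape(video.frames, video.height * scale, video.width * scale)


def run_cascade(prompt: str) -> VideoShape:
    # Base generation, then temporal interpolation, then spatial super-resolution.
    video = base_t2v(prompt)
    video = temporal_interpolation(video)
    video = super_resolution(video)
    return video


if __name__ == "__main__":
    print(run_cascade("a panda playing guitar"))
```

Under these assumed settings, each stage changes only one axis: interpolation adds frames and super-resolution adds pixels, so the stages can be developed and swapped independently.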
