Publications

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Published in International Conference on Learning Representations (ICLR), 2024
Latte: Latent Diffusion Transformer for Video Generation
arXiv preprint arXiv:2401.03048
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
arXiv preprint arXiv:2309.15103