Publications

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Published in International Conference on Learning Representations (ICLR), 2024
Latte: Latent Diffusion Transformer for Video Generation
arXiv preprint arXiv:2401.03048
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
Published in International Journal of Computer Vision (IJCV), 2024, JCR Q1 & CCF-A