Welcome 👋 I am currently a Ph.D. candidate at Monash University. Previously, I received my M.S. degree from the University of Chinese Academy of Sciences, where I studied at CRIPAC under the leadership of Prof. Tieniu Tan and was supervised by Prof. Ran He. Before that, I obtained my B.E. degree from Jiangsu University. My research interests include image super-resolution and inpainting, model compression, face recognition, video generation, and large-scale generative models. I am always open to research collaborations on deep generative models for images and videos. Feel free to contact me if you are interested.

Work Experience

Shanghai Artificial Intelligence Laboratory
Research Intern
December 2022 – Present Shanghai

Research on generative models. During this period:

  • A high-quality text-to-video generation framework LaVie is proposed.
  • A general Transformer-based latent video diffusion model, referred to as Latte, is introduced.
Algorithm Engineer
June 2021 – September 2022 Beijing
Research on model compression. A model compression tool was developed to help developers rapidly deploy models to edge devices, improving inference speed without compromising accuracy. The tool has been widely deployed across business lines at Meituan. One paper was accepted by CVPR 2022 during this period.
Algorithm Intern
April 2020 – August 2020 Beijing
Research on image dewatermarking. An image dewatermarking algorithm based on an attention mechanism and self-supervised learning was proposed. The service has since launched on the Meituan App. Related work was accepted by ICPR 2020 and selected for oral presentation.

Recent Publications

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Published in International Conference on Learning Representations (ICLR), Spotlight, 2024
Latte: Latent Diffusion Transformer for Video Generation
arXiv preprint arXiv:2401.03048
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
arXiv preprint arXiv:2309.15103

Recent Projects

Latte - A Transformer-based Video Diffusion Generation Framework
A simple and general latent video diffusion model incorporating spatio-temporal Transformers for video generation.
LaVie - A High-Quality Video Generation Framework
A large-scale text-to-video framework that produces high-quality and temporally coherent videos. This framework operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Granted Patents

  • Image super-resolution method of deep neural network fusing mutual information, CN110211035B

  • Attention-mechanism-based image completion method and device, CN112184582B

  • Cartoon style image conversion model training method, image generation method and device, CN112232485B

  • Image completion method based on uncertainty estimation, CN112686817B

Academic Activities

  • Conference Reviewer:

    • IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    • International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    • IEEE International Conference on Multimedia and Expo (ICME)

    • Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

  • Journal Reviewer:

    • Signal, Image and Video Processing

    • IEEE Transactions on Circuits and Systems for Video Technology

    • International Journal of Computer Vision