One paper, Cinemo, was accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Abstract
Diffusion models have achieved significant progress in the task of image animation due to their powerful generative capabilities. However, preserving appearance consistency with the static input image, and avoiding abrupt motion changes in the generated animation, remain challenging. In this paper, we introduce Cinemo, a novel image animation approach that aims at achieving better appearance consistency and motion smoothness. The core of Cinemo is to learn the distribution of motion residuals, rather than directly predicting frames as in existing diffusion models. During inference, we further mitigate sudden motion changes in the generated video by introducing a novel DCT-based noise refinement strategy. To counteract the over-smoothing of motion, we introduce a dynamics degree control design for better control of the magnitude of motion. Altogether, these strategies enable Cinemo to produce highly consistent, smooth, and motion-controllable results. Extensive comparisons with several state-of-the-art methods demonstrate the effectiveness and superiority of our proposed approach. Finally, we also demonstrate how our model can be applied to motion transfer or video editing of any given video. The project page is available at https://maxin-cn.github.io/cinemo_project/.
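The motion-residual idea above can be sketched in a toy form: instead of generating frames directly, a model predicts per-frame residuals that are added back onto the static input image. This is a simplified pixel-space illustration with a hypothetical function name; Cinemo itself learns the residual distribution in latent space with a diffusion model.

```python
import numpy as np

def frames_from_residuals(first_frame, residuals):
    # Toy reconstruction: each output frame is the static input image
    # plus its predicted motion residual. In Cinemo this happens on
    # latents produced by a diffusion model, not raw pixels.
    return np.stack([first_frame + r for r in residuals])
```

Because every frame is anchored to the same input image, appearance consistency comes largely for free; the model only has to capture how pixels move, not what they look like.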
Setup
Download and set up the repo:
git clone https://github.com/maxin-cn/Cinemo
cd Cinemo
conda env create -f environment.yml
conda activate cinemo
Animation
You can sample from our pre-trained Cinemo models. Weights for our pre-trained Cinemo model can be found here. The sampling script accepts various arguments for adjusting the number of sampling steps, changing the classifier-free guidance scale, etc.:
bash pipelines/animation.sh
Related model weights will be downloaded automatically, and the generated results can be found here.
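At inference time, the DCT-based noise refinement mentioned in the abstract mixes frequency bands between two signals. The sketch below shows the general technique on a single 2D array: keep the low-frequency DCT band from a reference and the high-frequency band from the sampled noise. The function name and the simple square cutoff mask are assumptions for illustration; the exact refinement used by Cinemo differs in its details.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_low_freq_mix(noise, ref, cutoff=0.25):
    # Transform both arrays to the DCT domain (orthonormal basis).
    N = dctn(noise, norm="ortho")
    R = dctn(ref, norm="ortho")
    # Hypothetical square mask selecting the low-frequency corner.
    h, w = noise.shape[-2:]
    mask = np.zeros((h, w))
    mask[: int(h * cutoff), : int(w * cutoff)] = 1.0
    # Low frequencies come from the reference, high from the noise.
    mixed = R * mask + N * (1.0 - mask)
    return idctn(mixed, norm="ortho")
```

Anchoring the low-frequency content of the initial noise to a reference is one way to suppress abrupt global changes while leaving the high-frequency detail free for the diffusion model to resample.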