Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models  

Xin Ma¹ Yaohui Wang² Gengyun Jia³ Xinyuan Chen² Yuan-Fang Li¹ Cunjian Chen¹ Yu Qiao²

¹Monash University ²Shanghai Artificial Intelligence Laboratory ³Nanjing University of Posts and Telecommunications

[Paper] [GitHub]


Example results from Cinemo (input image and animated video for each prompt):

"Red Panda Eating Bamboo"

"Car Moving"

"Fireworks"

"Flowers Swaying"

"Girl Walking on the Beach"

"House Rotating"

"People Running"

"Shark Swimming"

"Windmill Turning"

Methodology

Image animation aims to generate dynamic visual content from input static images. Diffusion models have become mainstream in image animation research due to their powerful generative capabilities, achieving remarkable success. However, maintaining consistency with the detailed information of the input static image over time (such as style, background, and object of the input static image) and ensuring smoothness in animated video narratives guided by textual prompts remain considerable challenges. In this paper, we propose a novel method called Cinemo, which can perform motion-controllable image animation with strong consistency. Our method introduces a novel framework focused on understanding the distribution of motion residuals, rather than directly generating subsequent frames. Additionally, an effective method based on the structural similarity index is proposed to control the motion intensity. Furthermore, we propose noise refinement based on discrete cosine transform to ensure layout consistency. These three strategies help Cinemo generate highly consistent and motion-controlled image animation results. Compared to previous methods, Cinemo offers simpler and more precise user control and better generative performance. Extensive experiments against several baseline methods, including both commercial tools and research approaches, across multiple metrics, underscore the effectiveness and superiority of our proposed approach.
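The idea of modeling motion residuals instead of frames can be illustrated with a minimal sketch. It assumes, purely for illustration, that a residual is the difference between each frame and the input image; the function names `motion_residuals` and `reconstruct_frames` are hypothetical, and the actual method may define residuals differently (for example, in a latent space).

```python
import numpy as np


def motion_residuals(frames: np.ndarray) -> np.ndarray:
    """Return each frame minus the first frame.

    frames has shape (T, H, W, C); frames[0] plays the role of the
    input static image. This definition is an assumption made for
    illustration, not the paper's confirmed formulation.
    """
    return frames - frames[:1]


def reconstruct_frames(first_frame: np.ndarray, residuals: np.ndarray) -> np.ndarray:
    """Add (predicted) residuals back onto the input image."""
    return first_frame[None] + residuals


# Toy video: 4 random frames of size 8x8 with 3 channels.
rng = np.random.default_rng(0)
video = rng.random((4, 8, 8, 3))
residuals = motion_residuals(video)
restored = reconstruct_frames(video[0], residuals)
```

Under this assumption the residual of the first frame is zero, so the appearance of the input image is carried by the image itself and a generative model only has to account for what changes over time.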

Comparisons

We show the animated results generated by different methods for the prompt "girl smiling".
We qualitatively compare our method with both commercial tools and research approaches, including Pika Labs, Genmo, ConsistI2V, DynamiCrafter, I2VGen-XL, SEINE, PIA and SVD.

Analysis

The ablation studies and potential applications are presented here.

Motion intensity controllability

We demonstrate that our method can finely control the motion intensity of animated videos. The prompt is "shark swimming".
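One way a structural-similarity-based motion intensity could be measured is sketched below. It assumes that intensity is one minus the mean SSIM between consecutive frames, and it uses a simplified single-window (global) SSIM rather than the usual sliding-window version; `global_ssim` and `motion_intensity` are hypothetical names, not the authors' code.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM computed over the whole image (a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def motion_intensity(frames: np.ndarray, data_range: float = 1.0) -> float:
    """Assumed definition: 1 - mean SSIM over consecutive frame pairs."""
    scores = [
        global_ssim(frames[t], frames[t + 1], data_range)
        for t in range(len(frames) - 1)
    ]
    return 1.0 - float(np.mean(scores))


# A static clip should score near zero; unrelated frames should score higher.
rng = np.random.default_rng(0)
static_clip = np.repeat(rng.random((1, 16, 16)), 5, axis=0)
noisy_clip = rng.random((5, 16, 16))
static_score = motion_intensity(static_clip)
noisy_score = motion_intensity(noisy_clip)
```

A scalar like this could serve as a conditioning value during training and as a user-facing control at inference time, though how the method actually consumes it is not stated on this page.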


Effectiveness of DCTInit

We demonstrate that the proposed DCTInit stabilizes the video generation process and effectively mitigates sudden motion changes. Decomposing in the DCT frequency domain also mitigates the color inconsistency caused by FFT-based frequency-domain decomposition. The prompts for the first and second rows are "woman smiling" and "robot dancing", respectively.
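A DCT-based noise refinement of this kind can be sketched as a frequency-domain mix: low-frequency coefficients come from the input image and the remaining coefficients come from random noise. This is an illustrative reading only; the function `dct_init`, the square low-frequency mask, and the `cutoff` parameter are assumptions, and the real DCTInit may differ (for example, by operating on noised latents).

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0] /= np.sqrt(2.0)
    return d


def dct2(x: np.ndarray) -> np.ndarray:
    """2D DCT-II of a single-channel array."""
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T


def idct2(c: np.ndarray) -> np.ndarray:
    """Inverse of dct2 (the basis matrices are orthogonal)."""
    return dct_matrix(c.shape[0]).T @ c @ dct_matrix(c.shape[1])


def dct_init(image: np.ndarray, noise: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Keep the image's low frequencies and the noise's other frequencies.

    cutoff is a hypothetical fraction of each axis treated as low frequency.
    """
    h, w = image.shape
    mask = np.zeros((h, w))
    mask[: max(1, int(h * cutoff)), : max(1, int(w * cutoff))] = 1.0
    mixed = dct2(image) * mask + dct2(noise) * (1.0 - mask)
    return idct2(mixed)


rng = np.random.default_rng(0)
image = rng.random((16, 16))
noise = rng.standard_normal((16, 16))
refined = dct_init(image, noise)
```

The refined noise shares the coarse layout of the input image while keeping random high-frequency detail. Because the DCT is real-valued, the mix involves no complex phase handling, unlike an FFT-based variant.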


Motion control by prompt

We demonstrate that our method does not rely on complex guiding instructions and even simple textual prompts can yield satisfactory visual effects.


Motion transfer/Video editing

We demonstrate that our proposed method can also be applied to motion transfer and video editing. We use an off-the-shelf image editing method to edit the first frame of the input video.

Gallery

More animation results generated by our method are shown here.

"Big Cat Yawning"

"Downward Flow of Waterfall"

"Bubbles Floating Upwards"

"Car Driving on the Road"

"City Lightning"

"Clouds in the Sky Moving Slowly"

"Dragon Glowing Eyes"

"Ducks Swimming on the Water"

"Flames Burning and Light Snow Falling"

"People Walking"

"Planet Rotating"

"River Flowing"

"Shark Falling into the Sea"

"Snowman Waving His Hand"

"Tank Moving"

"Tree Swaying"

"Woman Blinking"

"Woman Walking"

"Sea Swell"

"Space Station Moving"

"Girl Dancing under the Stars"

Project page template is borrowed from DreamBooth.