Authors: Y Wu, C Huang, F Yang, F Wang
Year: 2025
Published in: arXiv
Institution: NVIDIA, National Taiwan University
Research Area: Motion Customization of Text-to-Video Diffusion Models
Discipline: Computer Vision, Pattern Recognition
MotionMatcher is a framework for motion customization in text-to-video (T2V) diffusion models. It trains on high-level spatio-temporal motion features rather than a pixel-level objective, and the authors report state-of-the-art performance.
Methods: Fine-tunes a pre-trained text-to-video diffusion model at the feature level. To learn the motion shown in a reference video, the training objective compares spatio-temporal motion features instead of pixel values.
Key Findings: Feature-level matching is effective for motion customization in T2V models; it captures complex motion accurately and avoids content leakage from the reference video.
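The contrast drawn under Methods, between a pixel-level objective and a feature-level motion objective, can be illustrated with a toy sketch. This is not the paper's implementation. The function `toy_motion_features` is a hypothetical stand-in that uses simple frame-to-frame differences, and the tiny four-pixel "videos" are invented for illustration. The sketch only shows why an objective on motion features can ignore appearance while a pixel objective cannot.

```python
from typing import List

Video = List[List[float]]  # frames x flattened pixels (toy representation)


def mse(a: Video, b: Video) -> float:
    """Mean squared error over all entries of two equally shaped videos."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for x, y in zip(frame_a, frame_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def toy_motion_features(video: Video) -> Video:
    """ASSUMPTION: hypothetical stand-in for a motion feature extractor.

    Uses frame-to-frame differences, which cancel any static appearance.
    """
    return [
        [nxt_px - prev_px for prev_px, nxt_px in zip(prev, nxt)]
        for prev, nxt in zip(video, video[1:])
    ]


def pixel_loss(generated: Video, reference: Video) -> float:
    """Pixel-level objective: compares raw frames directly."""
    return mse(generated, reference)


def motion_feature_loss(generated: Video, reference: Video) -> float:
    """Feature-level objective: compares motion features, not pixels."""
    return mse(toy_motion_features(generated), toy_motion_features(reference))


# Reference: a bright pixel moving one position to the right per frame.
reference = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Same motion, different appearance: every pixel is brighter by 0.5.
same_motion = [[px + 0.5 for px in frame] for frame in reference]

# Similar appearance, no motion: the first frame repeated.
static = [list(reference[0]) for _ in reference]

print(pixel_loss(same_motion, reference))           # penalizes the appearance change
print(motion_feature_loss(same_motion, reference))  # ignores it
print(motion_feature_loss(static, reference))       # penalizes the missing motion
```

In this toy setting the pixel objective penalizes a video that differs from the reference only in appearance, while the motion-feature objective does not. That is the intuition behind avoiding content leakage from the reference video.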