Authors: K Dalal, D Koceja, G Hussein, J Xu, Y Zhao, Y Song, S Han, KC Cheung, J Kautz, C Guestrin, T Hashimoto, S Koyejo, Y Choi, Y Sun, X Wang
Year: 2025
Published in: arXiv
Institution: NVIDIA, Stanford University, UT Austin, University of California Berkeley, University of California San Diego
Research Area: Video Generation, Diffusion Models, Test-Time Training
Discipline: Computer Science
The paper adds Test-Time Training (TTT) layers to a pre-trained Transformer to generate coherent one-minute videos from text storyboards; the approach outperforms baselines in storytelling coherence but still faces efficiency limitations and visual artifacts.
Methods: TTT layers are embedded in a pre-trained Transformer and fine-tuned on a dataset curated from Tom and Jerry cartoons; the approach is evaluated against Mamba 2, Gated DeltaNet, and sliding-window attention layers (a minimal TTT-layer sketch follows this entry).
Key Findings: TTT layers produce more coherent multi-scene stories over one-minute videos than the baseline sequence-modeling layers in human evaluation.
Citations: 52
Sample Size: 100
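To make the method description concrete, here is a minimal sketch of a TTT-Linear-style layer, assuming the hidden state is itself a small linear model updated by gradient steps on a self-supervised reconstruction loss as the sequence is processed. The projection names, learning rate, and per-token loop are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn


    class TTTLinearLayer(nn.Module):
        """Sketch of a TTT layer whose hidden state is a per-sequence linear model W."""

        def __init__(self, dim: int, lr: float = 0.1):
            super().__init__()
            self.lr = lr
            # Hypothetical projections producing the training view, target, and query.
            self.proj_k = nn.Linear(dim, dim)  # input to the inner model
            self.proj_v = nn.Linear(dim, dim)  # reconstruction target
            self.proj_q = nn.Linear(dim, dim)  # query used to produce the output

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, dim)
            b, t, d = x.shape
            # The inner model's weights are the hidden state, reset per sequence.
            W = x.new_zeros(b, d, d)
            outputs = []
            for i in range(t):
                k = self.proj_k(x[:, i])  # (b, d)
                v = self.proj_v(x[:, i])  # (b, d)
                q = self.proj_q(x[:, i])  # (b, d)
                # Self-supervised loss: reconstruct v from k with the inner model W.
                pred = torch.bmm(k.unsqueeze(1), W).squeeze(1)      # (b, d)
                err = pred - v
                # One gradient step on 0.5 * ||kW - v||^2 w.r.t. W: grad = outer(k, err).
                grad = torch.bmm(k.unsqueeze(2), err.unsqueeze(1))  # (b, d, d)
                W = W - self.lr * grad
                # Output: apply the updated inner model to the query view.
                outputs.append(torch.bmm(q.unsqueeze(1), W).squeeze(1))
            return torch.stack(outputs, dim=1)                       # (b, t, d)

In the paper's setting, layers of this kind are interleaved with the attention blocks of the pre-trained Transformer so that context can be carried across the full one-minute sequence.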
Authors: Y Wu, C Huang, F Yang, F Wang
Year: 2025
Published in: arXiv
Institution: NVIDIA, National Taiwan University
Research Area: Motion Customization of Text-to-Video Diffusion Models
Discipline: Computer Vision, Pattern Recognition
MotionMatcher is a framework for motion customization of text-to-video (T2V) diffusion models that fine-tunes against high-level spatio-temporal motion features rather than pixel-level objectives, achieving state-of-the-art performance.
Methods: A pre-trained T2V diffusion model is fine-tuned at the feature level, matching spatio-temporal motion features extracted from the reference video instead of optimizing a pixel-level objective (see the loss sketch after this entry).
Key Findings: The feature-level objective captures complex motion accurately while avoiding content leakage from the reference videos.
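For illustration, the sketch below shows what a feature-level motion-matching objective can look like, assuming an extractor that returns spatio-temporal motion features (e.g., temporal-attention activations) from the diffusion backbone. The callable, layer choices, normalization, and weighting are assumptions, not MotionMatcher's exact formulation.

    import torch
    import torch.nn.functional as F


    def motion_feature_loss(
        extract_features,             # callable: latents -> list of feature tensors (assumed)
        pred_latents: torch.Tensor,   # latents for the video being fine-tuned
        ref_latents: torch.Tensor,    # latents of the reference (motion) video
    ) -> torch.Tensor:
        """Compare high-level motion features instead of pixel or noise values."""
        pred_feats = extract_features(pred_latents)
        with torch.no_grad():
            ref_feats = extract_features(ref_latents)
        loss = pred_latents.new_zeros(())
        for f_pred, f_ref in zip(pred_feats, ref_feats):
            # Normalize so the objective emphasizes motion structure over magnitude.
            loss = loss + F.mse_loss(
                F.normalize(f_pred, dim=-1),
                F.normalize(f_ref, dim=-1),
            )
        return loss / max(len(pred_feats), 1)

Matching in feature space, rather than on raw pixels or noise predictions, is what lets the fine-tuned model transfer the reference motion without copying its appearance.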