Computer Vision Research

This page lists 3 peer-reviewed papers in the discipline of Computer Vision in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (3 of 3)

Visual cognition in multimodal large language models

Authors: LM Schulze Buschoff, E Akata, M Bethge

Year: 2025

Published in: Nature Machine ..., 2025 - nature.com

Institution: Max Planck Institute

Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)

Discipline: Cognitive Science, Artificial Intelligence, Computer Vision

Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.

Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.

Key Findings: Vision-based models interpret visual data proficiently but fall short of human-like performance in understanding physical interactions, causal relationships, and social preferences.

DOI: https://doi.org/10.1038/s42256-024-00963-y

Citations: 70

MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching

Authors: Y Wu, C Huang, F Yang, F Wang

Year: 2025

Published in: ArXiv

Institution: Nvidia, National Taiwan University

Research Area: Motion Customization of Text-to-Video Diffusion Models

Discipline: Computer Vision, Pattern Recognition

MotionMatcher is a novel framework for motion customization in text-to-video (T2V) diffusion models, using high-level spatio-temporal motion features rather than pixel-level objectives, achieving state-of-the-art performance.

Methods: Fine-tunes pre-trained text-to-video diffusion models at the feature level, comparing spatio-temporal motion features rather than optimizing pixel-level objectives, to customize motion from reference videos.

Key Findings: Efficacy of motion customization in T2V models; ability to accurately capture complex motion and avoid content leakage from reference videos.

The value of AI guidance in human examination of synthetically-generated faces

Authors: A Boyd, P Tinsley, K Bowyer, A Czajka

Year: 2023

Published in: Proceedings of the AAAI ..., 2023 - ojs.aaai.org

Institution: University of Notre Dame, Clemson University, University of North Carolina at Charlotte

Research Area: Computer Vision, Human-Computer Interaction, Synthetic Media Detection

Discipline: Artificial Intelligence, Computer Vision, Human-Computer Interaction

Citations: 20