Authors: LM Schulze Buschoff, E Akata, M Bethge
Year: 2025
Published in: Nature Machine ..., 2025 - nature.com
Institution: Max Planck Institute
Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)
Discipline: Cognitive Science, Artificial Intelligence, Computer Vision
Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.
Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.
Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.
DOI: https://doi.org/10.1038/s42256-024-00963-y
Citations: 70