Multimodal Ai: Research Area — Prolific Citations Library

Discover 6 peer-reviewed studies in Multimodal Ai (2024–2025). Explore research findings powered by Prolific's diverse participant panel.

This page lists 6 peer-reviewed papers in the research area of Multimodal Ai in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (6 of 6)

Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models

Authors: L Ibrahim, C Akbulut, R Elasmar, C Rastogi, M Kahng, MR Morris, KR McKee, V Rieser, M Shanahan, L Weidinger

Year: 2025

Published in: arXiv preprint arXiv:2502.07077, 2025•arxiv.org

Institution: Google DeepMind, Google, University of Oxford

Research Area: Multimodal conversational AI, conversational AI, Evaluation methodology, benchmarking

Discipline: Computer Science, Natural Language Processing (NLP), Human–Computer Interaction (HCI)

The paper evaluates anthropomorphic behaviors in SOTA LLMs through a multi-turn methodology, showing that such behaviors, including empathy and relationship-building, predominantly emerge after multiple interactions and influence user perceptions.

Methods: Multi-turn evaluation of 14 anthropomorphic behaviors using simulations of user interactions, validated by a large-scale human subject study.

Key Findings: Anthropomorphic behaviors in large language models, including relationship-building and pronoun usage, and their perception by users.

Citations: 26

Sample Size: 1101
Whose view of safety? a deep dive dataset for pluralistic alignment of text-to-image models

Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood

Year: 2025

Published in: arXiv preprint arXiv:2507.13383, 2025•arxiv.org

Institution: Google DeepMind, Google Research, Google

Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human–AI interaction, LLM

Discipline: Computer Science, Machine Learning, Artificial Intelligence

This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.

Methods: The study involved collecting feedback across 1000 prompts from demographically intersectional human raters to capture diverse safety perspectives, with an emphasis on empirical and contextual differences in harm perception.

Key Findings: Safety perceptions of text-to-image (T2I) model outputs from diverse demographic viewpoints and the influence of these perspectives on alignment strategies.

Citations: 1

Sample Size: 1000
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

Authors: D Testa, G Bonetta, R Bernardi, A Bondielli

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Università di Roma La Sapienza

Research Area: Multimodal Reasoning, AI Benchmarking

Discipline: Artificial Intelligence

MAIA is a benchmark designed to evaluate the reasoning abilities of Vision Language Models (VLMs) on video-based tasks, with a focus on Italian culture and language, revealing their fragility in consistency and visually grounded language comprehension and generation.

Methods: MAIA comprises a set of video-related questions tested with two tasks: visual statement verification and open-ended visual question answering, categorized into twelve reasoning types to disentangle language-vision relations.

Key Findings: The ability of Vision Language Models (VLMs) to perform consistent, visually grounded natural language understanding and generation across fine-grained reasoning categories.

DOI: https://doi.org/10.48550/arXiv.2502.16989
Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement

Authors: T Davidson

Year: 2025

Published in: Nature Human Behaviour, 2025 - nature.com

Institution: University of Oxford, Davidson College

Research Area: Hate Speech Evaluation, Multimodal LLMs, Social Bias, Computational Law, AI Bias, AI Evaluation

Discipline: Artificial Intelligence

The study demonstrates that larger multimodal large language models (MLLMs) can align closely with human judgement in context-sensitive hate speech evaluations, though they still exhibit biases and limitations.

Methods: Conjoint experiments where simulated social media posts varying in attributes like slur usage and user demographics were evaluated by MLLMs and compared to human judgements.

Key Findings: The capacity of MLLMs to evaluate hate speech in a context-sensitive manner and their alignment with human judgement, while assessing biases and responsiveness to contextual cues.

Sample Size: 1854
Evidence of human-like visual-linguistic integration in multimodal large language models during predictive language processing

Authors: V Kewenig, C Edwards

Year: 2024

Published in: ... and Rechardt, Akilles ..., 2023 - papers.ssrn.com

Research Area: Multimodal AI, Cognitive Science, Visual-Linguistic Integration

Discipline: Artificial Intelligence, Computational Linguistics, Cognitive Science

Citations: 2
MAIA: A benchmark for multimodal AI assessment

Authors: D Testa, G Bonetta, R Bernardi

Year: 2024

Published in: Proceedings of the ..., 2025 - aclanthology.org

Institution: Università di Roma La Sapienza, Fondazione Bruno Kessler, University of Pisa

Research Area: Multimodal AI Assessment, Visual Language Models (VLMs), Video Understanding, Computational Linguistics

Discipline: Artificial Intelligence, Computational Linguistics