Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement
Authors: T. Davidson
Published: 2025
Publication: Nature Human Behaviour
The study demonstrates that larger multimodal large language models (MLLMs) can align closely with human judgement in context-sensitive hate speech evaluations, though they still exhibit biases and limitations.
Methods: Conjoint experiments in which simulated social media posts, varying independently in attributes such as slur usage and user demographics, were rated by MLLMs; the model ratings were then compared with human judgements of the same posts (a minimal sketch of this design follows below).
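As a rough illustration of the conjoint setup, the sketch below independently randomizes post attributes and asks a model to rate each resulting profile. The attribute names, levels, and the `rate_with_mllm` function are hypothetical stand-ins for illustration, not the paper's actual materials or API.

```python
# Minimal sketch of a conjoint-style design: each attribute of a simulated
# post is randomized independently, and a model rates the resulting profile.
# Attribute levels and `rate_with_mllm` are illustrative placeholders.
import random

ATTRIBUTES = {
    "slur_usage": ["no slur", "reclaimed slur", "derogatory slur"],
    "poster_demographic": ["in-group member", "out-group member", "unknown"],
    "target": ["individual", "group", "no clear target"],
}

def sample_profile(rng: random.Random) -> dict:
    """Independently randomize each attribute, as in a conjoint experiment."""
    return {attr: rng.choice(levels) for attr, levels in ATTRIBUTES.items()}

def build_prompt(profile: dict) -> str:
    """Render a randomized profile as an evaluation prompt."""
    return (
        "Rate how hateful the following simulated post is on a 1-7 scale.\n"
        f"Slur usage: {profile['slur_usage']}. "
        f"Poster: {profile['poster_demographic']}. "
        f"Target: {profile['target']}."
    )

def rate_with_mllm(prompt: str, rng: random.Random) -> int:
    """Hypothetical stand-in for an actual MLLM API call."""
    return rng.randint(1, 7)  # placeholder rating, not a real model output

rng = random.Random(42)
trials = []
for _ in range(1000):
    profile = sample_profile(rng)
    rating = rate_with_mllm(build_prompt(profile), rng)
    trials.append({**profile, "rating": rating})
print(trials[0])
```

Because attributes are randomized independently, differences in mean ratings across levels can be read as the causal effect of that attribute on the model's evaluation.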
Key Findings: Larger MLLMs can evaluate hate speech in a context-sensitive manner and align closely with human judgement; they remain responsive to contextual cues, though measurable biases persist (see the AMCE sketch below for how such attribute effects are commonly summarized).
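In conjoint designs, attribute effects and model-human alignment are typically summarized via average marginal component effects (AMCEs): the change in mean rating when one attribute level replaces a baseline, marginalizing over the other randomized attributes. A minimal sketch, assuming rating records shaped like those in the previous block (the data here are illustrative, not the paper's results):

```python
# Rough AMCE-style summary: mean rating at one attribute level minus the
# mean rating at a baseline level, marginalizing over other attributes.
# Data layout follows the sketch above; values are made up for illustration.
from statistics import mean

def amce(trials: list[dict], attribute: str, level: str, baseline: str) -> float:
    """Mean rating at `level` minus mean rating at `baseline`."""
    at_level = [t["rating"] for t in trials if t[attribute] == level]
    at_base = [t["rating"] for t in trials if t[attribute] == baseline]
    return mean(at_level) - mean(at_base)

# Illustrative records; in practice these would be the randomized trials
# generated above, one row per rated profile.
trials = [
    {"slur_usage": "derogatory slur", "rating": 6},
    {"slur_usage": "derogatory slur", "rating": 7},
    {"slur_usage": "no slur", "rating": 2},
    {"slur_usage": "no slur", "rating": 3},
]
print(amce(trials, "slur_usage", "derogatory slur", "no slur"))  # -> 4.0
```

Comparing effect estimates computed from model ratings against the same quantities computed from human ratings of identical profiles then gives a direct check on alignment.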
Limitations: Demographic and lexical biases persist, particularly in smaller models, and even advanced prompting strategies do not achieve full context sensitivity.
Institution: University of Oxford, Davidson College
Research Area: Hate Speech Evaluation, Multimodal LLMs, Social Bias, Computational Law, AI Bias, AI Evaluation
Discipline: Artificial Intelligence
Sample Size: 1,854 participants