Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement

Authors: T Davidson

Published: 2025

Publication: Nature Human Behaviour, 2025

The study demonstrates that larger multimodal large language models (MLLMs) can align closely with human judgement in context-sensitive hate speech evaluations, though they still exhibit biases and limitations.

Methods: Conjoint experiments in which MLLMs evaluated simulated social media posts that varied in attributes such as slur usage and user demographics; the model evaluations were then compared with human judgements.
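
As an illustration of the conjoint design described in the Methods, the following minimal Python sketch draws randomized post profiles and computes a simple agreement rate between model and human choices. The attribute names and levels, the paired-comparison format, and the helper names (`random_profile`, `make_pairs`, `agreement_rate`) are all assumptions made for illustration and are not taken from the study. The raw agreement rate is also a stand-in: a full conjoint analysis would typically estimate the effect of each attribute rather than a single matching score.

```python
import random

# Hypothetical attributes and levels, chosen only for illustration.
# The study's actual attributes and levels are not listed in this record.
ATTRIBUTES = {
    "slur_usage": ["none", "present"],
    "user_demographic": ["group_a", "group_b"],
}


def random_profile(rng):
    """Draw one simulated post profile by sampling each attribute independently."""
    return {name: rng.choice(levels) for name, levels in ATTRIBUTES.items()}


def make_pairs(n_pairs, seed=0):
    """Build n_pairs of profiles for paired-comparison tasks (an assumed format)."""
    rng = random.Random(seed)
    return [(random_profile(rng), random_profile(rng)) for _ in range(n_pairs)]


def agreement_rate(model_choices, human_choices):
    """Share of tasks in which the model and the human rater made the same choice."""
    if len(model_choices) != len(human_choices):
        raise ValueError("choice lists must have the same length")
    if not model_choices:
        raise ValueError("choice lists must be non-empty")
    matches = sum(m == h for m, h in zip(model_choices, human_choices))
    return matches / len(model_choices)
```

Each pair returned by `make_pairs` could be shown both to a model and to human raters, with the index of the profile each judges more hateful recorded and passed to `agreement_rate`.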

Key Findings: Larger MLLMs can evaluate hate speech in a context-sensitive manner, respond to contextual cues, and align closely with human judgement, although biases remain.

Limitations: Demographic and lexical biases persist, particularly in smaller models, and full context sensitivity cannot be achieved even through advanced prompting.

Institution: University of Oxford, Davidson College

Research Area: Hate Speech Evaluation, Multimodal LLMs, Social Bias, Computational Law, AI Bias, AI Evaluation

Discipline: Artificial Intelligence

Sample Size: 1854 participants