Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025
Institution: Google DeepMind, Google Research, Google
Research Area: AI Alignment, AI Safety, Safety Evaluation, Multimodal Evaluation, Human–AI Interaction, LLMs
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: The study collected safety feedback on 1000 prompts from demographically intersectional human raters, capturing diverse safety perspectives and emphasizing empirical and contextual differences in harm perception.
Key Findings: Perceptions of the safety of text-to-image (T2I) model outputs vary across demographic groups, and accounting for these diverse perspectives informs T2I alignment strategies.
Citations: 1
Sample Size: 1000
Authors: J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang
Year: 2023
Published in: arXiv preprint arXiv:2310.12773, 2023
Institution: Peking University
Research Area: Reinforcement Learning from Human Feedback (RLHF), Safe AI, Reinforcement Learning
Discipline: Artificial Intelligence, Machine Learning
DOI: https://doi.org/10.48550/arXiv.2310.12773
Citations: 598