Discover 5 peer-reviewed studies in AI Safety (2023–2025). Explore research findings powered by Prolific's diverse participant panel.
This page lists 5 peer-reviewed papers in the research area of AI Safety in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: A Dahlgren Lindström, L Methnani, L Krause
Year: 2025
Published in: Ethics and Information Technology, 2025 - Springer
Institution: Umeå University, Vrije Universiteit Amsterdam
Research Area: AI Alignment, AI Safety, Reinforcement Learning from Human Feedback (RLHF), Sociotechnical Systems
Discipline: Artificial Intelligence, Ethics
The paper critiques AI alignment efforts using RLHF and RLAIF, highlighting theoretical and practical limitations in meeting the goals of helpfulness, harmlessness, and honesty, and advocates for a broader sociotechnical approach to AI safety and ethics.
Methods: Sociotechnical critique of RLHF techniques with an analysis of theoretical frameworks and practical implementations.
Key Findings: RLHF and RLAIF face theoretical and practical limits in delivering helpfulness, harmlessness, and honesty (the HHH principle), motivating a broader sociotechnical approach to alignment.
DOI: https://doi.org/10.1007/s10676-025-09837-2
Citations: 14
-
Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025 - arxiv.org
Institution: Google DeepMind, Google Research, Google
Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human–AI interaction, LLM
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: Collected safety feedback on 1,000 prompts from demographically intersectional human raters, emphasizing empirical and contextual differences in harm perception (a toy aggregation sketch follows this entry).
Key Findings: Perceptions of harm in text-to-image (T2I) model outputs vary across demographic groups, and these differences inform pluralistic alignment strategies.
Citations: 1
Sample Size: 1000
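As a rough illustration of the pluralistic-evaluation idea behind DIVE, here is a minimal Python sketch, assuming a simplified feedback format invented for this example (not the dataset's actual schema or the paper's code). It averages harm ratings per demographic group for a prompt and flags divergence between groups.

```python
from collections import defaultdict

# Hypothetical rater feedback as (demographic_group, prompt_id, harm_rating);
# invented values, not the DIVE dataset's actual schema.
ratings = [
    ("group_a", "prompt_01", 3),
    ("group_a", "prompt_01", 4),
    ("group_b", "prompt_01", 1),
    ("group_b", "prompt_01", 2),
]

def mean_harm_by_group(rows, prompt_id):
    """Average harm rating per demographic group for one prompt."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for group, pid, harm in rows:
        if pid == prompt_id:
            totals[group][0] += harm
            totals[group][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

means = mean_harm_by_group(ratings, "prompt_01")
# A large gap between group means flags a prompt where safety
# perceptions diverge across demographic groups.
print(means, max(means.values()) - min(means.values()))
# -> {'group_a': 3.5, 'group_b': 1.5} 2.0
```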
-
Authors: K Grosse, N Ebert
Year: 2025
Published in: arXiv preprint, 2025 - arxiv.org
Institution: IBM Research, ZHAW
Research Area: Security and privacy risks, LLM, human–AI interaction, AI Safety
Discipline: Computer Science
A survey of 3,270 UK adults reveals significant security and privacy risks in the use of AI conversational agents: a third of respondents engage in risky behaviors that could enable attacks, and many are unaware of how their data are used or how to opt out.
Methods: Representative survey of UK adults conducted via the Prolific platform, focusing on usage behaviors of AI conversational agents (a toy prevalence sketch follows this entry).
Key Findings: User behaviors related to security and privacy risks, data sanitization practices, attempts to jailbreak AI models, and awareness of data usage policies.
Sample Size: 3270
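As a minimal sketch of how a prevalence figure like "a third of respondents" is derived from survey data of this kind, assuming invented response fields that are not the study's actual questionnaire items:

```python
# Hypothetical survey responses; field names are invented stand-ins,
# not the study's actual questionnaire items.
responses = [
    {"pasted_private_data": True,  "tried_jailbreak": False},
    {"pasted_private_data": False, "tried_jailbreak": False},
    {"pasted_private_data": False, "tried_jailbreak": True},
]

RISKY_FIELDS = ("pasted_private_data", "tried_jailbreak")

# A respondent counts as "risky" if they report any of the behaviors.
risky = sum(any(r[f] for f in RISKY_FIELDS) for r in responses)
print(f"{risky}/{len(responses)} respondents ({risky / len(responses):.0%}) "
      f"reported at least one risky behavior")
```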
-
Authors: A Li, Q Xiao, P Cao, J Tang, Y Yuan, Z Zhao
Year: 2024
Published in: arXiv preprint arXiv ..., 2024 - arxiv.org
Institution: Beijing University, Alibaba Group
Research Area: Reinforcement Learning from AI Feedback (RLAIF), Safety and Utility of Open-domain Language Models, Open Source LLM
Discipline: Artificial Intelligence
Citations: 12
-
Authors: A Hari, MS Abdulla
Year: 2023
Published in: 2023 - iimk.ac.in (Indian Institute of Management Kozhikode)
Research Area: AI Safety, Information Technology Strategy
Discipline: Artificial Intelligence
Citations: 3