Discover 5 peer-reviewed studies in AI Safety (2023–2025). Explore research findings powered by Prolific's diverse participant panel.
This page lists 5 peer-reviewed papers in the research area of AI Safety in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: A Dahlgren Lindström, L Methnani, L Krause
Year: 2025
Published in: Ethics and Information Technology, 2025 - Springer
Institution: Umeå University, Vrije Universiteit Amsterdam
Research Area: AI Alignment, AI Safety, Reinforcement Learning from Human Feedback (RLHF), Sociotechnical Systems
Discipline: Artificial Intelligence, Ethics
The paper critiques AI alignment efforts using RLHF and RLAIF, highlighting theoretical and practical limitations in meeting the goals of helpfulness, harmlessness, and honesty, and advocates for a broader sociotechnical approach to AI safety and ethics.
Methods: Sociotechnical critique of RLHF techniques with an analysis of theoretical frameworks and practical implementations.
Key Findings: RLHF and RLAIF face theoretical and practical limits in delivering helpfulness, harmlessness, and honesty (the HHH principle), motivating a broader sociotechnical approach to alignment.
DOI: https://doi.org/10.1007/s10676-025-09837-2
Citations: 14
-
Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025 - arxiv.org
Institution: Google DeepMind, Google Research, Google
Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human–AI interaction, LLM
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: Collected safety feedback on 1,000 prompts from demographically intersectional human raters, emphasizing empirical and contextual differences in harm perception (a toy aggregation sketch follows this entry).
Key Findings: Perceptions of harm in text-to-image (T2I) model outputs vary across demographic groups, and these differences inform pluralistic alignment strategies.
Citations: 1
Sample Size: 1000
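As a rough illustration of the pluralistic-evaluation idea behind DIVE, here is a minimal Python sketch, assuming a simplified feedback format invented for this example (not the dataset's actual schema or the paper's code). It averages harm ratings per demographic group for a prompt and flags divergence between groups.

```python
from collections import defaultdict

# Hypothetical rater feedback as (demographic_group, prompt_id, harm_rating);
# invented values, not the DIVE dataset's actual schema.
ratings = [
    ("group_a", "prompt_01", 3),
    ("group_a", "prompt_01", 4),
    ("group_b", "prompt_01", 1),
    ("group_b", "prompt_01", 2),
]

def mean_harm_by_group(rows, prompt_id):
    """Average harm rating per demographic group for one prompt."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for group, pid, harm in rows:
        if pid == prompt_id:
            totals[group][0] += harm
            totals[group][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

means = mean_harm_by_group(ratings, "prompt_01")
# A large gap between group means flags a prompt where safety
# perceptions diverge across demographic groups.
print(means, max(means.values()) - min(means.values()))
# -> {'group_a': 3.5, 'group_b': 1.5} 2.0
```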
-
Authors: K Grosse, N Ebert
Year: 2025
Published in: arXiv preprint, 2025 - arxiv.org
Institution: IBM Research, ZHAW
Research Area: Security and privacy risks, LLM, human–AI interaction, AI Safety
Discipline: Computer Science
A survey of 3,270 UK adults reveals significant security and privacy risks in the use of AI conversational agents: a third of respondents engage in risky behaviors that could enable attacks, and many are unaware of how their data are used or how to opt out.
Methods: Representative survey of UK adults conducted via the Prolific platform, focusing on usage behaviors of AI conversational agents (a toy prevalence sketch follows this entry).
Key Findings: User behaviors related to security and privacy risks, data sanitization practices, attempts to jailbreak AI models, and awareness of data usage policies.
Sample Size: 3270
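As a minimal sketch of how a prevalence figure like "a third of respondents" is derived from survey data of this kind, assuming invented response fields that are not the study's actual questionnaire items:

```python
# Hypothetical survey responses; field names are invented stand-ins,
# not the study's actual questionnaire items.
responses = [
    {"pasted_private_data": True,  "tried_jailbreak": False},
    {"pasted_private_data": False, "tried_jailbreak": False},
    {"pasted_private_data": False, "tried_jailbreak": True},
]

RISKY_FIELDS = ("pasted_private_data", "tried_jailbreak")

# A respondent counts as "risky" if they report any of the behaviors.
risky = sum(any(r[f] for f in RISKY_FIELDS) for r in responses)
print(f"{risky}/{len(responses)} respondents ({risky / len(responses):.0%}) "
      f"reported at least one risky behavior")
```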
-
Authors: A Li, Q Xiao, P Cao, J Tang, Y Yuan, Z Zhao
Year: 2024
Published in: arXiv preprint arXiv ..., 2024 - arxiv.org
Institution: Beijing University, Alibaba Group
Research Area: Reinforcement Learning from AI Feedback (RLAIF), Safety and Utility of Open-domain Language Models, Open Source LLM
Discipline: Artificial Intelligence
Citations: 12
-
Authors: A Hari, MS Abdulla
Year: 2023
Published in: 2023 - iimk.ac.in (Indian Institute of Management Kozhikode)
Research Area: AI Safety, Information Technology Strategy
Discipline: Artificial Intelligence
Citations: 3