Discover 11 peer-reviewed studies in AI Alignment (2024–2026). Explore research findings powered by Prolific's diverse participant panel.
This page lists 11 peer-reviewed papers in the research area of AI Alignment in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: C Yuan, B Ma, Z Zhang, B Prenkaj, F Kreuter, G Kasneci
Year: 2026
Published in: arXiv preprint arXiv:2601.08634, 2026
Institution: Munich Center for Machine Learning, LMU Munich, Technical University of Munich
Research Area: Artificial Intelligence, AI Ethics, AI Alignment, Political Science, Computational Social Science
Discipline: Computer Science, Natural Language Processing (NLP)
This paper examines how large language models' (LLMs) political outputs shift when the models are explicitly primed with different moral values. Rather than simply assigning personas (e.g., "pretend to be liberal"), the authors condition models to endorse or reject specific moral values (e.g., utilitarianism, fairness, authority), then measure how those moral primes move the models' positions in...
DOI: https://doi.org/10.48550/arXiv.2601.08634
-
Authors: N Petrova, A Gordon, E Blindow
Year: 2026
Published in: Open review
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups (an illustrative sketch of the core ranking model follows this entry).
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Sample Size: 23,404
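As a rough illustration of the ranking machinery named above (and not the paper's actual implementation, which is hierarchical, Bayesian, and post-stratified), the sketch below fits a plain Bradley-Terry-Davidson model, i.e., pairwise comparisons with ties, by maximum likelihood. The model names and comparison data are hypothetical.

```python
# Minimal Bradley-Terry-Davidson sketch: pairwise preferences with ties,
# fit by maximum likelihood. Toy data; not HUMAINE's hierarchical Bayesian
# model with post-stratification.
import numpy as np
from scipy.optimize import minimize

models = ["model_a", "model_b", "model_c"]          # hypothetical systems
# (index_i, index_j, tie): tie=0 means i beat j; tie=1 means the pair tied.
comparisons = [(0, 1, 0), (0, 2, 0), (1, 2, 1), (0, 1, 0), (2, 1, 0)]

def neg_log_lik(params):
    theta, log_nu = params[:-1], params[-1]          # log-strengths, log tie parameter
    nu = np.exp(log_nu)
    ll = 0.0
    for i, j, tie in comparisons:
        pi_i, pi_j = np.exp(theta[i]), np.exp(theta[j])
        denom = pi_i + pi_j + nu * np.sqrt(pi_i * pi_j)
        # Davidson (1970): P(i wins) = pi_i/denom, P(tie) = nu*sqrt(pi_i*pi_j)/denom
        ll += np.log((nu * np.sqrt(pi_i * pi_j) if tie else pi_i) / denom)
    return -ll

res = minimize(neg_log_lik, x0=np.zeros(len(models) + 1), method="BFGS")
strengths = res.x[:-1] - res.x[:-1].mean()           # center to fix the scale
for name, s in sorted(zip(models, strengths), key=lambda t: -t[1]):
    print(f"{name}: {s:+.3f}")
```

A posterior probability of being top-ranked, as reported above, would come from sampling this kind of likelihood in a Bayesian framework rather than from a single maximum-likelihood fit.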
-
Authors: A Dahlgren Lindström, L Methnani, L Krause
Year: 2025
Published in: Ethics and Information Technology, 2025 - Springer
Institution: Umeå University, Vrije Universiteit Amsterdam
Research Area: AI Alignment, AI Safety, Reinforcement Learning from Human Feedback (RLHF), Sociotechnical Systems
Discipline: Artificial Intelligence, Ethics
The paper critiques AI alignment efforts using RLHF and RLAIF, highlighting theoretical and practical limitations in meeting the goals of helpfulness, harmlessness, and honesty, and advocates for a broader sociotechnical approach to AI safety and ethics.
Methods: Sociotechnical critique of RLHF techniques with an analysis of theoretical frameworks and practical implementations.
Key Findings: The alignment of AI systems with human values and the efficacy of RLHF techniques in achieving the HHH principle (helpfulness, harmlessness, honesty).
DOI: https://doi.org/10.1007/s10676-025-09837-2
Citations: 14
-
Authors: C Qian, AT Parisi, C Bouleau, V Tsai
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Google, Google DeepMind
Research Area: Human-AI Alignment, Collective Reasoning, Social Biases, LLM Simulation of Human Behavior, AI Bias
Discipline: Natural Language Processing, Artificial Intelligence, Computational Social Science
This study examines human-AI alignment in collective reasoning using an empirical framework, demonstrating how LLMs either mirror or mask human biases depending on context, cues, and model-specific inductive biases.
Methods: The study uses the Lost at Sea social psychology task in a large-scale online experiment, simulating LLM groups conditioned on human decision-making data across varying conditions of visible or pseudonymous demographics.
Key Findings: Alignment of LLM behavior with human social reasoning, focusing on collective decision-making and biases in group interactions.
Citations: 1
Sample Size: 748
-
Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025
Institution: Google DeepMind, Google Research, Google
Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human–AI interaction, LLM
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: The study involved collecting feedback across 1000 prompts from demographically intersectional human raters to capture diverse safety perspectives, with an emphasis on empirical and contextual differences in harm perception.
Key Findings: Safety perceptions of text-to-image (T2I) model outputs from diverse demographic viewpoints and the influence of these perspectives on alignment strategies.
Citations: 1
Sample Size: 1000
-
Authors: M Ku, T Li, K Zhang, Y Lu, X Fu, W Zhuang
Year: 2024
Published in: arXiv preprint arXiv:2310.01596, 2023
Institution: University of Waterloo, Ohio State University, University of California Santa Barbara, University of Pennsylvania
Research Area: AI alignment, Representation learning, Cognitive computational modeling, Vision foundation models evaluation, Multimodal, Vision models
Discipline: Computer Science, Artificial Intelligence, Machine Learning
This paper presents a method for aligning machine vision model representations with human visual similarity judgments across different abstraction levels, improving how well models reflect human perceptual and conceptual organization and enhancing generalization and uncertainty prediction (an illustrative sketch of this kind of evaluation follows this entry).
DOI: https://doi.org/10.48550/arXiv.2310.01596
Citations: 59
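As an illustrative sketch of this kind of representation-human alignment evaluation (not the paper's method or data), a common baseline is to correlate a model's pairwise embedding similarities with human similarity judgments; the embeddings and ratings below are random stand-ins.

```python
# Sketch: Spearman correlation between model embedding similarities and
# human similarity judgments over image pairs. All data here is synthetic.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, dim = 8, 64
embeddings = rng.normal(size=(n_images, dim))        # stand-in for vision-model features
n_pairs = n_images * (n_images - 1) // 2
human_ratings = rng.uniform(0, 1, size=n_pairs)      # stand-in human judgments

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

model_sims = [cosine(embeddings[i], embeddings[j])
              for i, j in combinations(range(n_images), 2)]

rho, p = spearmanr(model_sims, human_ratings)
print(f"model-human alignment (Spearman rho): {rho:.3f}, p = {p:.3f}")
```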
-
Authors: JD Lomas, W van der Maden, S Bandyopadhyay
Year: 2024
Published in: Advanced Design ..., 2024 - Elsevier
Institution: Delft University of Technology, Playpower Labs, Hong Kong Polytechnic University, Utrecht University
Research Area: AI Alignment, Affective Computing, Emotional Expression in Generative AI, Human Perception of AI Emotions
Discipline: Affective Computing, Artificial Intelligence, Human-Computer Interaction (HCI)
This study evaluates how well generative AI systems (like DALL·E 2/3 and Stable Diffusion) can generate emotionally expressive content that aligns with how humans perceive those emotions, finding that model performance varies by emotion type and model, with implications for designing more emotionally aligned AI.
DOI: https://doi.org/10.1016/j.ijadr.2024.10.002
Citations: 5
-
Authors: M Lerner, F Dorner, E Ash, N Goel
Year: 2024
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024 - aclanthology.org
Institution: ETH Zürich, University of Oxford
Research Area: Fairness in AI, Content Moderation, Human-AI Alignment
Discipline: Computational Social Science
Citations: 5
-
Authors: Andrew Konya, Aviv Ovadya, Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin
Year: 2024
Published in: arXiv
Institution: AI & Democracy Foundation, Remesh, University of Washington
Research Area: AI Alignment, Public Will, Expert Intelligence, Rule-based Reward
Discipline: Artificial Intelligence, Computational Social Science
-
Authors: Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne G. E. Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
Year: 2024
Published in: arXiv
Institution: Allen Institute for AI, Duke University, University of California Berkeley, University of Washington
Research Area: LLM Safety Moderation, Interpretable AI (XAI), LLM Alignment, Steerable AI
Discipline: Artificial Intelligence
-
Authors: Mehdi Khamassi, Marceau Nahon, Raja Chatila
Year: 2024
Published in: arXiv
Institution: Sorbonne University
Research Area: AI Alignment, AI Ethics, Computational Cognition
Discipline: Artificial Intelligence, Ethics, Computational Cognition