Discover 11 peer-reviewed studies in AI Alignment (2024–2026). Explore research findings powered by Prolific's diverse participant panel.
This page lists 11 peer-reviewed papers in the research area of AI Alignment in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: C Yuan, B Ma, Z Zhang, B Prenkaj, F Kreuter, G Kasneci
Year: 2026
Published in: arXiv preprint arXiv:2601.08634, 2026
Institution: Munich Center for Machine Learning, LMU Munich, Technical University of Munich
Research Area: Artificial Intelligence, AI Ethics, AI Alignment, Political Science, Computational Social Science
Discipline: Computer Science, Natural Language Processing (NLP)
This paper examines how large language models' (LLMs) political outputs shift when the models are explicitly primed with different moral values. Rather than simply assigning personas (e.g., "pretend to be liberal"), the authors condition models to endorse or reject specific moral values (e.g., utilitarianism, fairness, authority), then measure how those moral primes move the models' positions in...
DOI: https://doi.org/10.48550/arXiv.2601.08634
-
Authors: N Petrova, A Gordon, E Blindow
Year: 2026
Published in: Open review
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups (an illustrative sketch of the core ranking model follows this entry).
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Sample Size: 23,404
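As a rough illustration of the ranking machinery named above (and not the paper's actual implementation, which is hierarchical, Bayesian, and post-stratified), the sketch below fits a plain Bradley-Terry-Davidson model, i.e., pairwise comparisons with ties, by maximum likelihood. The model names and comparison data are hypothetical.

```python
# Minimal Bradley-Terry-Davidson sketch: pairwise preferences with ties,
# fit by maximum likelihood. Toy data; not HUMAINE's hierarchical Bayesian
# model with post-stratification.
import numpy as np
from scipy.optimize import minimize

models = ["model_a", "model_b", "model_c"]          # hypothetical systems
# (index_i, index_j, tie): tie=0 means i beat j; tie=1 means the pair tied.
comparisons = [(0, 1, 0), (0, 2, 0), (1, 2, 1), (0, 1, 0), (2, 1, 0)]

def neg_log_lik(params):
    theta, log_nu = params[:-1], params[-1]          # log-strengths, log tie parameter
    nu = np.exp(log_nu)
    ll = 0.0
    for i, j, tie in comparisons:
        pi_i, pi_j = np.exp(theta[i]), np.exp(theta[j])
        denom = pi_i + pi_j + nu * np.sqrt(pi_i * pi_j)
        # Davidson (1970): P(i wins) = pi_i/denom, P(tie) = nu*sqrt(pi_i*pi_j)/denom
        ll += np.log((nu * np.sqrt(pi_i * pi_j) if tie else pi_i) / denom)
    return -ll

res = minimize(neg_log_lik, x0=np.zeros(len(models) + 1), method="BFGS")
strengths = res.x[:-1] - res.x[:-1].mean()           # center to fix the scale
for name, s in sorted(zip(models, strengths), key=lambda t: -t[1]):
    print(f"{name}: {s:+.3f}")
```

A posterior probability of being top-ranked, as reported above, would come from sampling this kind of likelihood in a Bayesian framework rather than from a single maximum-likelihood fit.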
-
Authors: A Dahlgren Lindström, L Methnani, L Krause
Year: 2025
Published in: Ethics and Information Technology, 2025 - Springer
Institution: Umeå University, Vrije Universiteit Amsterdam
Research Area: AI Alignment, AI Safety, Reinforcement Learning from Human Feedback (RLHF), Sociotechnical Systems
Discipline: Artificial Intelligence, Ethics
The paper critiques AI alignment efforts using RLHF and RLAIF, highlighting theoretical and practical limitations in meeting the goals of helpfulness, harmlessness, and honesty, and advocates for a broader sociotechnical approach to AI safety and ethics.
Methods: Sociotechnical critique of RLHF techniques with an analysis of theoretical frameworks and practical implementations.
Key Findings: The alignment of AI systems with human values and the efficacy of RLHF techniques in achieving the HHH principle (helpfulness, harmlessness, honesty).
DOI: https://doi.org/10.1007/s10676-025-09837-2
Citations: 14
-
Authors: C Qian, AT Parisi, C Bouleau, V Tsai
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Google, Google DeepMind
Research Area: Human-AI Alignment, Collective Reasoning, Social Biases, LLM Simulation of Human Behavior, AI Bias
Discipline: Natural Language Processing, Artificial Intelligence, Computational Social Science
This study examines human-AI alignment in collective reasoning using an empirical framework, demonstrating how LLMs either mirror or mask human biases depending on context, cues, and model-specific inductive biases.
Methods: The study uses the Lost at Sea social psychology task in a large-scale online experiment, simulating LLM groups conditioned on human decision-making data across varying conditions of visible or pseudonymous demographics.
Key Findings: Alignment of LLM behavior with human social reasoning, focusing on collective decision-making and biases in group interactions.
Citations: 1
Sample Size: 748
-
Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025
Institution: Google DeepMind, Google Research, Google
Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human–AI interaction, LLM
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: The study involved collecting feedback across 1000 prompts from demographically intersectional human raters to capture diverse safety perspectives, with an emphasis on empirical and contextual differences in harm perception.
Key Findings: Safety perceptions of text-to-image (T2I) model outputs from diverse demographic viewpoints and the influence of these perspectives on alignment strategies.
Citations: 1
Sample Size: 1000
-
Authors: M Ku, T Li, K Zhang, Y Lu, X Fu, W Zhuang
Year: 2024
Published in: arXiv preprint arXiv:2310.01596, 2023
Institution: University of Waterloo, Ohio State University, University of California Santa Barbara, University of Pennsylvania
Research Area: AI alignment, Representation learning, Cognitive computational modeling, Vision foundation models evaluation, Multimodal, Vision models
Discipline: Computer Science, Artificial Intelligence, Machine Learning
This paper presents a method for aligning machine vision model representations with human visual similarity judgments across different abstraction levels, improving how well models reflect human perceptual and conceptual organization and enhancing generalization and uncertainty prediction (an illustrative sketch of this kind of evaluation follows this entry).
DOI: https://doi.org/10.48550/arXiv.2310.01596
Citations: 59
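As an illustrative sketch of this kind of representation-human alignment evaluation (not the paper's method or data), a common baseline is to correlate a model's pairwise embedding similarities with human similarity judgments; the embeddings and ratings below are random stand-ins.

```python
# Sketch: Spearman correlation between model embedding similarities and
# human similarity judgments over image pairs. All data here is synthetic.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, dim = 8, 64
embeddings = rng.normal(size=(n_images, dim))        # stand-in for vision-model features
n_pairs = n_images * (n_images - 1) // 2
human_ratings = rng.uniform(0, 1, size=n_pairs)      # stand-in human judgments

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

model_sims = [cosine(embeddings[i], embeddings[j])
              for i, j in combinations(range(n_images), 2)]

rho, p = spearmanr(model_sims, human_ratings)
print(f"model-human alignment (Spearman rho): {rho:.3f}, p = {p:.3f}")
```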
-
Authors: JD Lomas, W van der Maden, S Bandyopadhyay
Year: 2024
Published in: Advanced Design ..., 2024 - Elsevier
Institution: Delft University of Technology, Playpower Labs, Hong Kong Polytechnic University, Utrecht University
Research Area: AI Alignment, Affective Computing, Emotional Expression in Generative AI, Human Perception of AI Emotions
Discipline: Affective Computing, Artificial Intelligence, Human-Computer Interaction (HCI)
This study evaluates how well generative AI systems (like DALL·E 2/3 and Stable Diffusion) can generate emotionally expressive content that aligns with how humans perceive those emotions, finding that model performance varies by emotion type and model, with implications for designing more emotionally aligned AI.
DOI: https://doi.org/10.1016/j.ijadr.2024.10.002
Citations: 5
-
Authors: M Lerner, F Dorner, E Ash, N Goel
Year: 2024
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024 - aclanthology.org
Institution: ETH Zürich, University of Oxford
Research Area: Fairness in AI, Content Moderation, Human-AI Alignment
Discipline: Computational Social Science
Citations: 5
-
Authors: Andrew Konya, Aviv Ovadya, Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin
Year: 2024
Published in: arXiv
Institution: AI & Democracy Foundation, Remesh, University of Washington
Research Area: AI Alignment, Public Will, Expert Intelligence, Rule-based Reward
Discipline: Artificial Intelligence, Computational Social Science
-
Authors: Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne G. E. Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
Year: 2024
Published in: arXiv
Institution: Allen Institute for AI, Duke University, University of California Berkeley, University of Washington
Research Area: LLM Safety Moderation, Interpretable AI (XAI), LLM Alignment, Steerable AI
Discipline: Artificial Intelligence
-
Authors: Mehdi Khamassi, Marceau Nahon, Raja Chatila
Year: 2024
Published in: arXiv
Institution: Sorbonne University
Research Area: AI Alignment, AI Ethics, Computational Cognition
Discipline: Artificial Intelligence, Ethics, Computational Cognition