Preference Learning Studies

This page lists 47 papers tagged with Preference Learning in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (20 of 47)

Bayesian teaching enables probabilistic reasoning in large language models

Authors: L Qiu, F Sha, K Allen, Y Kim, T Linzen, S van Steenkiste

Year: 2026

Published in: Nature …, 2026 - nature.com

Institution: Meta, Google DeepMind, Massachusetts Institute of Technology, Google Research, Google

Research Area: Probabilistic reasoning, Bayesian cognition, Neural language models, Reasoning, AI Evaluations

Discipline: Machine Learning, Artificial Intelligence

This paper sits at the intersection of machine learning and computational cognitive science, showing that large language models can acquire generalized probabilistic reasoning by being trained to imitate Bayesian belief updating rather than relying on prompting or heuristics.

Citations: 8

Impacts of Effective Altruism on Donor Behaviour: A Randomised Discrete Choice Experiment

Authors: JW Berge, LIO Berge, W Chiu, I Kolstad

Year: 2026

Published in: The Journal of Development Studies, 2026 - Taylor & Francis

Institution: Norwegian School of Economics

An effective altruism information treatment had no effect on donations to advocacy organisations but increased support for internationally focused NGOs, particularly among donors with less universalistic preferences and low trust in NGOs.

Methods: Randomised online discrete choice experiment with pre-registration and incentivised donation measurement.

Key Findings: Impact of effective altruism information treatment on donor behaviour and NGO support preferences.

DOI: https://doi.org/10.1080/00220388.2025.2601583

Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

Authors: N Petrova, A Gordon, E Blindow

Year: 2026

Published in: OpenReview

Institution: Prolific

Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation

Discipline: Machine Learning, Artificial Intelligence

The study introduces HUMAINE, a multidimensional framework for evaluating LLMs, reveals demographic-specific variation in preferences, and ranks google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.

Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups.

Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.

Sample Size: 23404

RLHF deciphered: A critical analysis of reinforcement learning from human feedback for LLMs

Authors: S Chaudhari, P Aggarwal, V Murahari

Year: 2025

Published in: ACM Computing ..., 2025 - dl.acm.org

Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University

Research Area: Reinforcement Learning from Human Feedback (RLHF), Large Language Models

Discipline: Artificial Intelligence

The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing both the central role and the limitations of reward models in building human-aligned AI systems.

Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.

Key Findings: Investigated RLHF's fundamentals, focusing on the role of reward models, implications of design choices in RLHF training algorithms, and underlying issues like generalization errors, model misspecification, and feedback sparsity.

Citations: 117

Visual cognition in multimodal large language models

Authors: LM Schulze Buschoff, E Akata, M Bethge

Year: 2025

Published in: Nature Machine ..., 2025 - nature.com

Institution: Max Planck Institute

Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)

Discipline: Cognitive Science, Artificial Intelligence, Computer Vision

Vision-based large language models are proficient at interpreting visual data but fall short of human-like abilities in causal reasoning, intuitive physics, and social cognition.

Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.

Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.

DOI: https://doi.org/10.1038/s42256-024-00963-y

Citations: 70

People Overtrust AI-Generated Medical Advice despite Low Accuracy

Authors: S Shekar, P Pataranutaporn, C Sarabu, GA Cecchi

Year: 2025

Published in: NEJM AI, 2025 - ai.nejm.org

Institution: MIT Media Lab, IBM Research, Stanford University, Massachusetts Institute of Technology

Research Area: AI Ethics, Healthcare, Patient Trust, Medical Misinformation

Discipline: Artificial Intelligence, Human-Computer Interaction, AI Ethics

This study finds that people trust AI-generated medical advice even when that advice is incorrect, raising concerns about misinformation in healthcare.

Citations: 19

Large Language Models are often politically extreme, usually ideologically inconsistent, and persuasive even in informational contexts

Authors: N Aldahoul, H Ibrahim, M Varvello, A Kaufman

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Delft University of Technology, University of Pennsylvania, New York University, King Abdullah University of Science and Technology, Massachusetts Institute of Technology, University of Texas at Austin

Research Area: Artificial Intelligence, Computer Science, Political Science

Discipline: Artificial Intelligence, Social Science

The study finds that large language models (LLMs) hold extreme political views on specific topics despite appearing ideologically moderate overall, and that they can shift users' political preferences even in informational contexts.

Methods: Compared 31 LLMs' political biases against benchmarks (legislators, judges, representative voter samples) and conducted a randomized experiment to measure their persuasive impact in informational interactions.

Key Findings: Ideological consistency, political extremity, and persuasive effects of LLMs in information-seeking contexts.

Citations: 7

Sample Size: 31

Sycophantic AI decreases prosocial intentions and promotes dependence

Authors: M Cheng, C Lee, P Khadpe, S Yu, D Han

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Stanford University, Carnegie Mellon University

Research Area: Computer Science, Artificial Intelligence, Sycophancy

Discipline: Computer Science, Psychology

The study shows that sycophantic AI, which validates users' inputs uncritically, reduces people's prosocial intentions and fosters dependence, even though users rate such AI as higher quality and more trustworthy.

Methods: The researchers evaluated responses from 11 state-of-the-art AI models for sycophancy and conducted two preregistered experiments, including a live-interaction study in which participants discussed real interpersonal conflicts with AI models, to measure its psychological effects on users.

Key Findings: The prevalence of sycophantic behavior in AI, users' prosocial intentions, conviction of being in the right, trust in AI, and willingness to reuse sycophantic AI models.

Citations: 5

Sample Size: 1604

Age and gender distortion in online media and large language models

Authors: D Guilbeault, S Delecourt, BS Desikan

Year: 2025

Published in: Nature, 2025 - nature.com

Institution: Stanford University, University of California Berkeley, University of Oxford

Research Area: AI Bias, Media Representation, Social Science

Discipline: Computational Social Science, Artificial Intelligence

The study documents an age-related gender bias in online media and language models, showing that women are portrayed as younger than men, especially in high-status occupations, and examines how algorithms amplify this bias.

Methods: Analysis of 1.4 million images and videos from online sources and of nine language models, followed by a pre-registered experiment with human participants to evaluate the effects of these biases in internet content and algorithms.

Key Findings: Age and gender bias in occupational depiction across online platforms and language models, as well as its influence on beliefs and hiring preferences.

Citations: 4

Sample Size: 459

Impact of AI-Assisted Diagnosis on American Patients' Trust in and Intention to Seek Help From Health Care Professionals: Randomized, Web-Based Survey ...

Authors: C Chen, Z Cui

Year: 2025

Published in: Journal of Medical Internet Research, 2025 - jmir.org

Institution: Medical College of Wisconsin

Research Area: Trust in AI, AI-assisted diagnosis, Health communication, Healthcare human-AI interaction

Discipline: Digital Health, Human-Computer Interaction, Behavioral Science

Patients trust doctors who explicitly avoid AI-assisted diagnosis more than doctors who use AI extensively or moderately, and are more likely to seek their help, indicating a strong aversion to AI in health care settings.

Methods: A randomized, web-based survey experiment with four groups, controlling for sociodemographic factors and analyzed using regression, mediation, and moderation techniques.

Key Findings: Trust in and intention to seek medical help from health care professionals using AI-assisted diagnosis versus those avoiding AI, and the influence of demographic, social, and experiential factors.

DOI: https://doi.org/10.2196/66083

Citations: 4

Sample Size: 1762

Persuading voters using human-artificial intelligence dialogues

Authors: H Lin, G Czarnek, B Lewis, JP White, AJ Berinsky

Year: 2025

Published in: Nature, 2025 - nature.com

Institution: Massachusetts Institute of Technology

Research Area: Political Persuasion, Human-AI Dialogue, Electoral Behavior

Discipline: Political Science, Artificial Intelligence

The study shows that AI-driven dialogues can significantly shift voter attitudes and candidate preferences, with persuasion effects larger than those of traditional political advertisements, and that AI models advocating for right-wing candidates produced more inaccurate content.

Methods: Pre-registered experiments in which participants interacted with an AI advocating for one of two candidates or for a ballot measure, examining persuasion strategies and effects across three elections.

Key Findings: Influence of AI-generated dialogues on voter attitudes and preferences, including analysis of persuasion strategies and accuracy of presented information.

Citations: 3

Incentivizing High-Quality Human Annotations with Golden Questions

Authors: S Liu, Z Cai, H Wang, Z Ma, X Li

Year: 2025

Published in: arXiv preprint arXiv:2505.19134, 2025 - arxiv.org

Institution: Meta, Imperial College London

Research Area: Artificial Intelligence, Crowdsourcing, Large Language Models

Discipline: Artificial Intelligence

The paper develops a principal-agent model for incentivizing high-quality human annotations with golden questions and identifies the criteria these questions must meet to effectively monitor annotators' performance.

Methods: The authors combine a principal-agent model with maximum likelihood estimation (MLE) and hypothesis testing to design incentive-compatible systems for annotators. Golden questions with high certainty and a format similar to the regular data were selected and validated experimentally.

Key Findings: The effectiveness of golden questions for incentivizing and monitoring high-quality human annotations in preference data.

DOI: https://doi.org/10.48550/arXiv.2505.19134

Citations: 1

Influencing Humans to Conform to Preference Models for RLHF

Authors: S Hatgis-Kessell, WB Knox, S Booth, S Niekum

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Stanford University, University of Massachusetts Amherst, Carnegie Mellon University

Research Area: Reinforcement Learning from Human Feedback (RLHF)

Discipline: Artificial Intelligence, Human-Computer Interaction

The paper investigates whether humans can be influenced to express preferences that conform more closely to the preference models assumed by RLHF algorithms, through interventions such as showing model-derived quantities, training people on a preference model, and modifying elicitation questions.

Methods: Three human studies testing interventions: revealing model-derived quantities, training participants on a preference model, and altering how preference questions were framed.

Key Findings: The impact of interventions on how humans express preferences, and whether those preferences align better with the preference models assumed by RLHF algorithms.

DOI: https://doi.org/10.48550/arXiv.2501.06416

Citations: 1

Who's Sorry Now: User Preferences Among Rote, Empathic, and Explanatory Apologies from LLM Chatbots

Authors: Z Ashktorab, A Buccella, J D'Cruz, Z Fowler, A Gill, KY Leung, PD Magnus, J Richards

Year: 2025

Published in: arXiv preprint arXiv:2507.02745, 2025 - arxiv.org

Institution: IBM Research, University at Albany

Research Area: Human-AI Interaction, AI Systems Evaluation, User Experience (UX)

Discipline: Computer Science, Human-Computer Interaction

In a preregistered study with 162 participants, people generally preferred explanatory apologies from LLM chatbots over rote or purely empathic ones, although empathic apologies were sometimes favored in scenarios involving biased errors. The findings highlight the complexity of designing chatbot apologies that effectively repair trust.

DOI: https://doi.org/10.48550/arXiv.2507.02745

Citations: 1

A Descriptive and Normative Theory of Human Beliefs in RLHF

Authors: S Dandekar, S Deshmukh, F Chiu, WB Knox

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: University of California, Davis, Northwestern University

Research Area: Reinforcement Learning from Human Feedback (RLHF), Human-AI Interaction, AI Theory

Discipline: Artificial Intelligence, Social Science

The paper investigates how human beliefs about an agent's capabilities shape the preferences people provide in RLHF, and proposes a model for minimizing the mismatch between those beliefs and idealized agent capabilities, which improves policy performance.

Methods: Human studies and synthetic experiments to model and test the impact of belief mismatches on human preferences and RLHF effectiveness.

Key Findings: Effects of human beliefs about agent capabilities on their provided preferences and the performance of RLHF policies.

DOI: https://doi.org/10.48550/arXiv.2506.01692

Artificial minds and real beliefs: Perceiving mental states in AI

Authors: O Jacobs

Year: 2025

Published in: 2025 - open.library.ubc.ca

Institution: University of British Columbia

Research Area: Mind Perception in Human-AI Interaction, Anthropomorphism, Psychology

Discipline: Psychology, Human-Computer Interaction

This University of British Columbia doctoral thesis investigates how people perceive and attribute mental states (beliefs, intentions, minds) to artificial intelligence systems, exploring the psychological and conceptual underpinnings of mind perception in human–AI interaction.

Fairness Perceptions in Regression-based Predictive Models

Authors: M Telukunta, VSS Nadendla, M Stuart, C Canfield

Year: 2025

Published in: arXiv

Institution: Missouri University of Science and Technology, United Network for Organ Sharing

Research Area: Algorithmic Fairness, Healthcare AI, Decision Making

Discipline: Artificial Intelligence

The study investigates fairness in regression-based predictive models for kidney transplantation. It introduces three group fairness notions and identifies social preferences over fairness criteria, revealing bias with respect to age groups but fairness with respect to gender and race groups.

Methods: Three novel fairness notions (independence, separation, sufficiency) were introduced, and crowd feedback was analyzed using a mixed-logit discrete choice model.

Key Findings: Fairness in regression-based predictive analytics regarding group fairness criteria across social dimensions such as age, gender, and race.

Sample Size: 85

LLM-Generated Ads: From Personalization Parity to Persuasion Superiority

Authors: E Meguellati, S Civelli, L Han, A Bernstein

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Oregon Health Sciences University, University of California, Irvine, Han Institute, NYU School of Law, Bernstein Research

Research Area: Advertising, Persuasion Strategies, Human-AI Interaction in Content Generation

Discipline: Artificial Intelligence

LLM-generated advertisements matched human-written ads in personalization and outperformed them in persuasion based on psychological principles, and still outperformed human ads even when participants' detection of AI origin affected the results.

Methods: Two-part study: the first part examined LLM personalization based on personality traits; the second tested psychological persuasion principles (authority, consensus, cognition, and scarcity) using universal messages.

Key Findings: Effectiveness of LLM-generated ads in personalization and persuasive storytelling compared to human-created ads.

Sample Size: 1200

When Content is Goliath and Algorithm is David: The Style and Semantic Effects of Generative Search Engine

Authors: L Ma, J Qin, X Xu, Y Tan

Year: 2025

Published in: arXiv preprint arXiv:2509.14436, 2025 - arxiv.org

Institution: University of North Carolina Charlotte, University of Science and Technology of China, University of Washington

Research Area: LLM behavior, Algorithmic content preference, Human-AI Interaction

Discipline: Computer Science, Information Retrieval, Artificial Intelligence

This paper studies how generative search engines built on large language models (LLMs), such as Google’s AI Overviews, select and cite web content. It shows that these engines favor content that is more predictable and semantically coherent for the underlying model, and that polishing content with an LLM can increase the diversity and usefulness of AI summaries for users.

DOI: https://doi.org/10.48550/arXiv.2509.14436

A survey of reinforcement learning from human feedback

Authors: T Kaufmann, P Weng, V Bengs, E Hüllermeier

Year: 2024

Published in: 2024 - epub.ub.uni-muenchen.de

Institution: Paderborn University, German Research Center for Artificial Intelligence (DFKI), Duke Kunshan University

Research Area: Reinforcement Learning from Human Feedback (RLHF), Large Language Models, Reward Modeling

Discipline: Artificial Intelligence

This paper surveys the fundamentals, diverse applications, and evolving impact of reinforcement learning from human feedback (RLHF), emphasizing its role in improving the alignment and performance of intelligent systems.

Methods: A survey synthesizing existing research on the interaction between reinforcement learning algorithms and human input.

Key Findings: The study examines the principles, dynamics, applications, and trends in RLHF, offering insights into its role in enhancing large language models (LLMs) and intelligent systems.

Citations: 354