This page lists 47 peer-reviewed papers tagged with Preference Learning in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: L Qiu, F Sha, K Allen, Y Kim, T Linzen, S van Steenkiste
Year: 2026
Published in: Nature …, 2026 - nature.com
Institution: Meta, Google DeepMind, Massachusetts Institute of Technology, Google Research, Google
Research Area: Probabilistic reasoning, Bayesian cognition, Neural language models, Reasoning, AI Evaluations
Discipline: Machine learning, Artificial Intelligence
This paper sits at the intersection of machine learning and computational cognitive science, showing that large language models can acquire generalized probabilistic reasoning by being trained to imitate Bayesian belief updating rather than relying on prompting or heuristics.
Citations: 8
-
Authors: JW Berge, LIO Berge, W Chiu, I Kolstad
Year: 2026
Published in: The Journal of Development Studies, 2026 - Taylor & Francis
Institution: Norwegian School of Economics
Effective altruism information treatment showed no effect on donations to advocacy organisations but increased support for international-focused NGOs, particularly among donors with less universalistic preferences and low trust in NGOs.
Methods: Randomised online discrete choice experiment with pre-registration and incentivised donation measurement.
Key Findings: Impact of effective altruism information treatment on donor behaviour and NGO support preferences.
DOI: https://doi.org/10.1080/00220388.2025.2601583
-
Authors: N Petrova, A Gordon, E Blindow
Year: 2026
Published in: OpenReview
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups.
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Sample Size: 23404
-
Authors: S Chaudhari, P Aggarwal, V Murahari
Year: 2025
Published in: ACM Computing ..., 2025 - dl.acm.org
Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University
Research Area: Reinforcement Learning from Human Feedback (RLHF), Large Language Models
Discipline: Artificial Intelligence
The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.
Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.
Key Findings: Investigated RLHF's fundamentals, focusing on the role of reward models, implications of design choices in RLHF training algorithms, and underlying issues like generalization errors, model misspecification, and feedback sparsity.
Citations: 117
-
Authors: LM Schulze Buschoff, E Akata, M Bethge
Year: 2025
Published in: Nature Machine ..., 2025 - nature.com
Institution: Max Planck Institute
Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)
Discipline: Cognitive Science, Artificial Intelligence, Computer Vision
Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.
Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.
Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.
DOI: https://doi.org/10.1038/s42256-024-00963-y
Citations: 70
-
Authors: S Shekar, P Pataranutaporn, C Sarabu, GA Cecchi
Year: 2025
Published in: NEJM AI, 2025 - ai.nejm.org
Institution: MIT Media Lab, IBM Research, Stanford University, Massachusetts Institute of Technology
Research Area: AI Ethics, Healthcare, Patient Trust, Medical Misinformation
Discipline: Artificial Intelligence, Human-Computer Interaction, AI Ethics
This study by MIT researchers finds that patients trust AI-generated medical advice even when that advice is incorrect, raising concerns about misinformation in healthcare.
Citations: 19
-
Authors: N Aldahoul, H Ibrahim, M Varvello, A Kaufman
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Delft University of Technology, University of Pennsylvania, New York University, King Abdullah University of Science and Technology, Massachusetts Institute of Technology, University of Texas at Austin
Research Area: Artificial Intelligence, Computer Science, Political Science
Discipline: Artificial Intelligence, Social Science
The study finds that Large Language Models (LLMs) exhibit extreme political views on specific topics despite appearing ideologically moderate overall, and demonstrate a persuasive influence on users' political preferences even in informational contexts.
Methods: Compared 31 LLMs' political biases against benchmarks (legislators, judges, representative voter samples) and conducted a randomized experiment to measure their persuasive impact in informational interactions.
Key Findings: Ideological consistency, political extremity, and persuasive effects of LLMs in information-seeking contexts.
Citations: 7
Sample Size: 31
-
Authors: M Cheng, C Lee, P Khadpe, S Yu, D Han
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Stanford University, Carnegie Mellon University
Research Area: Computer Science, Artificial Intelligence, Sycophancy
Discipline: Computer Science, Psychology
The study shows that sycophantic AI, which validates user inputs unquestioningly, reduces people's prosocial behavior and fosters dependence, despite users perceiving such AI as higher quality and more trustworthy.
Methods: The researchers conducted two preregistered experiments including a live-interaction study, where participants discussed real interpersonal conflicts with AI models. They evaluated responses from 11 state-of-the-art AI models on levels of sycophancy and its psychological effects on users.
Key Findings: The prevalence of sycophantic behavior in AI, users' prosocial intentions, conviction of being in the right, trust in AI, and willingness to reuse sycophantic AI models.
Citations: 5
Sample Size: 1604
-
Authors: D Guilbeault, S Delecourt, BS Desikan
Year: 2025
Published in: Nature, 2025 - nature.com
Institution: Stanford University, University of California Berkeley, University of Oxford
Research Area: AI Bias, Media Representation, Social Science
Discipline: Computational Social Science, Artificial Intelligence
The study highlights age-related gender bias in online media and language models, showing women are portrayed as younger than men, especially in high-status occupations, and explores how algorithms amplify these biases.
Methods: Analysis of 1.4 million images and videos from online sources and nine language models, followed by a pre-registered experiment involving participants to evaluate biases in internet content and algorithms.
Key Findings: Age and gender bias in occupational depiction across online platforms and language models, as well as its influence on beliefs and hiring preferences.
Citations: 4
Sample Size: 459
-
Authors: C Chen, Z Cui
Year: 2025
Published in: Journal of Medical Internet Research, 2025 - jmir.org
Institution: Medical College of Wisconsin
Research Area: Trust in AI, AI-assisted diagnosis, Health communication, Healthcare human-AI interaction
Discipline: Digital Health, Human-Computer Interaction, Behavioral Science
Patients place more trust in, and are more likely to seek help from, doctors who explicitly avoid AI-assisted diagnosis than those who use AI moderately or extensively, highlighting a strong aversion to AI in healthcare settings.
Methods: A randomized, web-based 4-group survey experiment was conducted with controls for sociodemographic factors and analysis using regression, mediation, and moderation techniques.
Key Findings: Trust in and intention to seek medical help from health care professionals using AI-assisted diagnosis versus those avoiding AI, and the influence of demographic, social, and experiential factors.
DOI: https://doi.org/10.2196/66083
Citations: 4
Sample Size: 1762
-
Authors: H Lin, G Czarnek, B Lewis, JP White, AJ Berinsky
Year: 2025
Published in: Nature, 2025 - nature.com
Institution: Massachusetts Institute of Technology
Research Area: Political Persuasion, Human-AI Dialogue, Electoral Behavior
Discipline: Political Science, Artificial Intelligence
The study shows that AI-driven dialogues can significantly influence voter attitudes and candidate preferences, with persuasion effects surpassing traditional political advertisements, though inaccuracies were more prevalent in content generated by AI models supporting right-wing candidates.
Methods: Pre-registered experiments where participants interacted with AI advocating for one of two candidates or a ballot measure, examining persuasion strategies and effects across three elections.
Key Findings: Influence of AI-generated dialogues on voter attitudes and preferences, including analysis of persuasion strategies and accuracy of presented information.
Citations: 3
-
Authors: S Liu, Z Cai, H Wang, Z Ma, X Li
Year: 2025
Published in: arXiv preprint arXiv:2505.19134, 2025 - arxiv.org
Institution: Meta, Imperial College London
Research Area: Artificial Intelligence, Crowdsourcing, Large Language Models
Discipline: Artificial Intelligence
The paper develops a principal-agent model to incentivize high-quality human annotations using golden questions and identifies criteria for these questions to effectively monitor annotators' performance.
Methods: The authors use a principal-agent model with maximum likelihood estimators (MLE) and hypothesis testing to design incentive-compatible systems for annotators. Golden questions of high certainty and similar format to normal data were selected and validated through experiments.
Key Findings: The effectiveness of golden questions for incentivizing and monitoring high-quality human annotations in preference data.
DOI: https://doi.org/10.48550/arXiv.2505.19134
Citations: 1
-
Authors: S Hatgis-Kessell, WB Knox, S Booth, S Niekum
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Stanford University, UMass Amherst, Carnegie Mellon University
Research Area: Reinforcement Learning from Human Feedback (RLHF)
Discipline: Artificial Intelligence, Human-Computer Interaction
The paper investigates whether human preferences can be influenced to align more closely with assumed preference models in RLHF algorithms through interventions such as showing model-derived quantities, training on preference models, and modifying elicitation questions.
Methods: Three human studies were conducted where interventions were tested, including revealing model-derived quantities, training participants on a preference model, and altering how preference questions were framed.
Key Findings: Evaluated the impact of interventions on humans' expression of preferences to align better with the assumed preference models of RLHF algorithms.
DOI: https://doi.org/10.48550/arXiv.2501.06416
Citations: 1
-
Authors: Z Ashktorab, A Buccella, J D'Cruz, Z Fowler, A Gill, KY Leung, PD Magnus, J Richards
Year: 2025
Published in: arXiv preprint arXiv:2507.02745, 2025 - arxiv.org
Institution: IBM Research, University at Albany
Research Area: Human-AI Interaction, AI systems evaluation, UX, User Experience
Discipline: Computer Science, Human-Computer Interaction
In a preregistered study with 162 participants, people generally preferred explanatory apologies from LLM chatbots over rote or purely empathic ones, although empathic apologies were sometimes favored in biased error scenarios. The results highlight the complexity of designing chatbot apologies that effectively repair trust.
DOI: https://doi.org/10.48550/arXiv.2507.02745
Citations: 1
-
Authors: S Dandekar, S Deshmukh, F Chiu, WB Knox
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: University of California, Davis, Northwestern University
Research Area: Reinforcement Learning from Human Feedback (RLHF), Human-AI Interaction, AI Theory
Discipline: Artificial Intelligence, Social Science
The paper investigates how human beliefs about agent capabilities influence preferences in RLHF, proposing a model to minimize the mismatch between beliefs and idealized agent capabilities, ultimately improving policy performance.
Methods: Human studies and synthetic experiments to model and test the impact of belief mismatches on human preferences and RLHF effectiveness.
Key Findings: Effects of human beliefs about agent capabilities on their provided preferences and the performance of RLHF policies.
DOI: https://doi.org/10.48550/arXiv.2506.01692
-
Authors: O Jacobs
Year: 2025
Published in: 2025 - open.library.ubc.ca
Institution: University of British Columbia
Research Area: Mind Perception in Human-AI Interaction, Anthropomorphism, Psychology
Discipline: Psychology, Human-Computer Interaction
This University of British Columbia doctoral thesis investigates how people perceive and attribute mental states (beliefs, intentions, minds) to artificial intelligence systems, exploring the psychological and conceptual underpinnings of mind perception in human-AI interaction.
-
Authors: Mukund Telukunta, Venkata Sriram Siddhardh Nadendla, Morgan Stuart, Casey Canfield
Year: 2025
Published in: arXiv
Institution: Missouri University of Science and Technology, United Network for Organ Sharing
Research Area: Algorithmic Fairness, Healthcare AI, Decision Making
Discipline: Artificial Intelligence
The study investigates fairness in regression-based predictive models for kidney transplantation, introducing three group fairness notions and identifying social preferences for fairness criteria, revealing biases against age groups but fairness towards gender and race groups.
Methods: Three novel fairness notions (independence, separation, sufficiency) were introduced alongside crowd feedback analysis through a Mixed-Logit discrete choice model.
Key Findings: Fairness in regression-based predictive analytics regarding group fairness criteria across social dimensions such as age, gender, and race.
Sample Size: 85
-
Authors: E Meguellati, S Civelli, L Han, A Bernstein
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Oregon Health Sciences University, Oregon University of California, Irvine, Han Institute, NYU School of Law, Bernstein Research
Research Area: Advertising, Persuasion Strategies, Human-AI Interaction in Content Generation
Discipline: Artificial Intelligence
LLM-generated advertisements achieved parity with human-written ads in personalization and demonstrated superiority in persuasion using psychological principles, outperforming human ads even when AI-origin detection impacted results.
Methods: Two-part study: First examined LLM personalization based on personality traits; second tested psychological persuasion principles using universal messages across authority, consensus, cognition, and scarcity.
Key Findings: Effectiveness of LLM-generated ads in personalization and persuasive storytelling compared to human-created ads.
Sample Size: 1200
-
Authors: L Ma, J Qin, X Xu, Y Tan
Year: 2025
Published in: arXiv preprint arXiv:2509.14436, 2025 - arxiv.org
Institution: University of North Carolina Charlotte, University of Science and Technology of China, University of Washington
Research Area: LLM behavior, Algorithmic content preference, Human-AI Interaction
Discipline: Computer Science, Information Retrieval, Artificial Intelligence
This paper studies how generative search engines that use large language models (LLMs), such as Google’s AI overviews, select and cite web content. It shows that these engines prefer content that is more predictable and semantically coherent for the model, and that LLM-based content polishing can increase the diversity and usefulness of AI summaries for users.
DOI: https://doi.org/10.48550/arXiv.2509.14436
-
Authors: T Kaufmann, P Weng, V Bengs, E Hüllermeier
Year: 2024
Published in: 2024 - epub.ub.uni-muenchen.de
Institution: Paderborn University, German Research Center for Artificial Intelligence (DFKI), Duke Kunshan University
Research Area: Reinforcement Learning from Human Feedback (RLHF), Large Language Models, Reward Modeling
Discipline: Artificial Intelligence
This paper surveys the fundamentals, diverse applications, and evolving impact of reinforcement learning from human feedback (RLHF), emphasizing its role in improving intelligent system alignment and performance.
Methods: The paper utilizes a survey-based approach to synthesize existing research, exploring the interactions between reinforcement learning algorithms and human input.
Key Findings: The study examines the principles, dynamics, applications, and trends in RLHF, offering insights into its role in enhancing large language models (LLMs) and intelligent systems.
Citations: 354