Browse 10 peer-reviewed papers from Carnegie Mellon University spanning Reinforcement Learning from Human Feedback (RLHF) and large language models (LLMs), published between 2017 and 2025. Research powered by Prolific's high-quality participant data.
This page lists 10 peer-reviewed papers from researchers at Carnegie Mellon University in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: S Chaudhari, P Aggarwal, V Murahari
Year: 2025
Published in: ACM Computing ..., 2025 - dl.acm.org
Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University
Research Area: Reinforcement Learning from Human Feedback (RLHF), LLM
Discipline: Artificial Intelligence
The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.
Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.
Key Findings: Reward models are central to RLHF but face underlying issues such as generalization error, model misspecification, and feedback sparsity; design choices in RLHF training algorithms carry significant implications for alignment (a minimal sketch of the standard reward-model objective follows this entry).
Citations: 117
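To make the role of reward models concrete, here is a minimal, hedged sketch of the Bradley-Terry objective that RLHF reward models typically optimize. It is not code from the paper; preference_loss, reward_model, chosen, and rejected are hypothetical placeholders for a scoring network and a tokenized preference pair.

    # Minimal sketch (assumption: a standard Bradley-Terry reward-model loss, not the paper's code)
    import torch.nn.functional as F

    def preference_loss(reward_model, chosen, rejected):
        """Negative log-likelihood that the human-preferred response scores higher."""
        r_chosen = reward_model(chosen)      # scalar reward per example, shape (batch,)
        r_rejected = reward_model(rejected)
        # Assumed preference model: P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
        return -F.logsigmoid(r_chosen - r_rejected).mean()

Generalization error, model misspecification, and feedback sparsity, the issues highlighted above, all concern how well a model trained on this objective extrapolates beyond the collected preference pairs.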
-
Authors: K Zhou, JD Hwang, X Ren, N Dziri
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Stanford University, University of Southern California, Carnegie Mellon University, Allen Institute for AI
Research Area: Human-LM Reliance, Interaction-Centered Framework, Human-Computer Interaction (HCI)
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
The study introduces Rel-A.I., an interaction-centered evaluation approach to measure human reliance on LLM responses, revealing that politeness and interaction context significantly influence user reliance.
Methods: Nine user studies examined how LLM communication features such as politeness and interaction context influence user reliance, using participant-interaction experiments.
Key Findings: Human reliance on LLM responses varies significantly with communication style (e.g., politeness) and interaction context (e.g., knowledge domain, prior interactions).
Citations: 18
Sample Size: 450
-
Authors: P Spitzer, J Holstein, K Morrison
Year: 2025
Published in: ... Journal of Human ..., 2025 - Taylor & Francis
Institution: Karlsruhe Institute of Technology, Carnegie Mellon University, University of Bayreuth
Research Area: Human-AI Collaboration, Explainable AI (XAI)
Discipline: Human-Computer Interaction (HCI)
Incorrect explanations in AI-assisted decision-making lead to a misinformation effect, negatively impacting human reasoning, procedural knowledge, and collaboration performance.
Methods: A human-AI collaboration study in which participants made AI-supported decisions accompanied by explainable AI (XAI), used to assess the effects of incorrect explanations.
Key Findings: Incorrect explanations produce a misinformation effect that degrades human reasoning strategies, procedural knowledge, and team performance in human-AI collaboration.
Citations: 13
Sample Size: 160
-
Authors: M Cheng, C Lee, P Khadpe, S Yu, D Han
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Stanford University, Carnegie Mellon University
Research Area: Computers and Society, Artificial Intelligence, Sycophancy
Discipline: Computer Science, Psychology
The study shows that sycophantic AI, which validates user inputs unquestioningly, reduces people's prosocial behavior and fosters dependence, despite users perceiving such AI as higher quality and more trustworthy.
Methods: The researchers conducted two preregistered experiments including a live-interaction study, where participants discussed real interpersonal conflicts with AI models. They evaluated responses from 11 state-of-the-art AI models on levels of sycophancy and its psychological effects on users.
Key Findings: Sycophantic behavior was prevalent across the evaluated AI models; exposure to it reduced users' prosocial intentions and reinforced their conviction of being in the right, while increasing their trust in and willingness to reuse sycophantic models.
Citations: 5
Sample Size: 1604
-
Authors: P Spitzer, K Morrison, V Turri, M Feng, A Perer
Year: 2025
Published in: ACM Transactions on ..., 2025 - dl.acm.org
Institution: Carnegie Mellon University, Karlsruhe Institute of Technology, University of Bayreuth
Research Area: Explainable AI (XAI), AI-Assisted Decision-Making, Human-AI Collaboration
Discipline: Artificial Intelligence
The study highlights how imperfect explainable AI (XAI), along with human cognitive styles, affects reliance on AI and the performance of human–AI teams, providing design guidelines for better collaboration systems.
Methods: The researchers conducted a study with 136 participants, analyzing the effects of explanation imperfections and cognitive styles on AI-assisted decision-making and human–AI collaboration.
Key Findings: Incorrect explanations and the modality in which explanations are presented shape human reliance on AI, decision-making, and human–AI team performance, and these effects vary with individual cognitive styles.
Citations: 2
Sample Size: 136
-
Authors: S Hatgis-Kessell, WB Knox, S Booth, S Niekum
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Stanford University, University of Massachusetts Amherst, Carnegie Mellon University
Research Area: Reinforcement Learning with Human Feedback (RLHF)
Discipline: Artificial Intelligence, Human-Computer Interaction (HCI)
The paper investigates whether human preferences can be influenced, through interventions such as showing model-derived quantities, training participants on a preference model, and modifying elicitation questions, to align more closely with the preference models assumed by RLHF algorithms.
Methods: Three human studies tested interventions that included revealing model-derived quantities, training participants on a preference model, and altering how preference questions were framed.
Key Findings: The effect of each intervention on how closely humans' expressed preferences conform to the preference models assumed by RLHF algorithms (a minimal sketch of such an assumed model follows this entry).
DOI: https://doi.org/10.48550/arXiv.2501.06416
Citations: 1
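For context on the "assumed preference model" referenced in this entry, here is a minimal, hedged sketch of the partial-return (Bradley-Terry style) model that RLHF preference-learning methods commonly assume. It is not code from the paper; assumed_preference_probability, returns_a, and returns_b are hypothetical names used only for illustration.

    # Minimal sketch (assumption: the common partial-return preference model, not the paper's code)
    import math

    def assumed_preference_probability(returns_a, returns_b, beta=1.0):
        """P(segment A is preferred to segment B) under a partial-return model."""
        score_a = beta * sum(returns_a)   # summed per-step rewards of segment A
        score_b = beta * sum(returns_b)
        # Softmax over the two summed returns (equivalently, sigmoid of their difference)
        return math.exp(score_a) / (math.exp(score_a) + math.exp(score_b))

    # Example: with summed returns of 1.0 vs. 0.0, a perfectly conforming annotator
    # would prefer segment A about 73% of the time.
    print(assumed_preference_probability([0.5, 0.3, 0.2], [0.1, 0.4, -0.5]))

The interventions described above probe whether real annotators can be nudged toward answering preference queries the way such a model predicts.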
-
Authors: A Qian, R Shaw, L Dabbish, J Suh, H Shen
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Carnegie Mellon University, University of Pittsburgh, University of Utah, Yale School of Medicine, Yale University
Research Area: Responsible AI, Content Moderation, Risk Disclosure, Worker Well-being in Human-Computer Interaction (HCI)
Discipline: Computational Social Science, Human-Computer Interaction (HCI)
The paper examines how task designers approach well-being risk disclosure in Responsible AI (RAI) content work, highlighting a need for better frameworks to communicate such risks effectively.
Methods: Interviews were conducted with 23 task designers from academic and industry sectors to gather insights on risk recognition, interpretation, and communication practices.
Key Findings: Task designers recognize, interpret, and communicate well-being risks in RAI content work in ways that reveal the need for better frameworks for effective risk disclosure.
Citations: 1
Sample Size: 23
-
Authors: L Cheng, A Chouldechova
Year: 2023
Published in: Proceedings of the 2023 CHI Conference ..., 2023 - dl.acm.org
Institution: Carnegie Mellon University
Research Area: Human-Computer Interaction (HCI), Algorithm Aversion, Decision Science
Discipline: Human-Computer Interaction (HCI)
Giving users process control mitigates algorithm aversion when they select the training algorithm, but not when they change input factors; combining outcome control with process control is no more effective than either control alone.
Methods: A replication of prior outcome-control conditions plus novel process-control conditions, tested with participants on the MTurk and Prolific platforms.
Key Findings: Process control over algorithm selection mitigates algorithm aversion, process control over input factors does not, and combined outcome and process control offers no additional benefit over either alone.
DOI: https://doi.org/10.1145/3544548.3581253
Citations: 41
-
Authors: K Zhou, JD Hwang, X Ren, M Sap
Year: 2024
Published in: arXiv
Institution: Allen Institute for AI, Carnegie Mellon University, Stanford University, University of Southern California
Research Area: LLM Reliability and Uncertainty Quantification, Reinforcement Learning from Human Feedback (RLHF), LLM
Discipline: Artificial Intelligence
-
Authors: E Peer, L Brandimarte, S Samat, A Acquisti
Year: 2017
Published in: Journal of Experimental Social ..., 2017 - Elsevier
Institution: Eller College of Management, University of Arizona; Heinz College, Carnegie Mellon University
Research Area: Crowdsourcing behavioral research
Discipline: Behavioral Science
Citations: 3928