Browse 10 peer-reviewed papers from Carnegie Mellon University spanning Reinforcement Learning from Human Feedback (RLHF) and large language models (LLMs), published between 2017 and 2025. Research powered by Prolific's high-quality participant data.
This page lists 10 peer-reviewed papers from researchers at Carnegie Mellon University in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: S Chaudhari, P Aggarwal, V Murahari
Year: 2025
Published in: ACM Computing ..., 2025 - dl.acm.org
Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University
Research Area: Reinforcement Learning from Human Feedback (RLHF), LLM
Discipline: Artificial Intelligence
The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.
Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.
Key Findings: Reward models are central to RLHF but face underlying issues such as generalization error, model misspecification, and feedback sparsity; design choices in RLHF training algorithms carry significant implications for alignment (a minimal sketch of the standard reward-model objective follows this entry).
Citations: 117
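To make the role of reward models concrete, here is a minimal, hedged sketch of the Bradley-Terry objective that RLHF reward models typically optimize. It is not code from the paper; preference_loss, reward_model, chosen, and rejected are hypothetical placeholders for a scoring network and a tokenized preference pair.

    # Minimal sketch (assumption: a standard Bradley-Terry reward-model loss, not the paper's code)
    import torch.nn.functional as F

    def preference_loss(reward_model, chosen, rejected):
        """Negative log-likelihood that the human-preferred response scores higher."""
        r_chosen = reward_model(chosen)      # scalar reward per example, shape (batch,)
        r_rejected = reward_model(rejected)
        # Assumed preference model: P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
        return -F.logsigmoid(r_chosen - r_rejected).mean()

Generalization error, model misspecification, and feedback sparsity, the issues highlighted above, all concern how well a model trained on this objective extrapolates beyond the collected preference pairs.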
-
Authors: K Zhou, JD Hwang, X Ren, N Dziri
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Stanford University, University of Southern California, Carnegie Mellon University, Allen Institute for AI
Research Area: Human-LM Reliance, Interaction-Centered Framework, Human-Computer Interaction (HCI)
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
The study introduces Rel-A.I., an interaction-centered evaluation approach to measure human reliance on LLM responses, revealing that politeness and interaction context significantly influence user reliance.
Methods: Nine user studies examined how LLM communication features such as politeness and interaction context influence user reliance, using participant-interaction experiments.
Key Findings: Human reliance on LLM responses varies significantly with communication style (e.g., politeness) and interaction context (e.g., knowledge domain, prior interactions).
Citations: 18
Sample Size: 450
-
Authors: P Spitzer, J Holstein, K Morrison
Year: 2025
Published in: ... Journal of Human ..., 2025 - Taylor & Francis
Institution: Karlsruhe Institute of Technology, Carnegie Mellon University, University of Bayreuth
Research Area: Human-AI Collaboration, Explainable AI (XAI)
Discipline: Human-Computer Interaction (HCI)
Incorrect explanations in AI-assisted decision-making lead to a misinformation effect, negatively impacting human reasoning, procedural knowledge, and collaboration performance.
Methods: A human-AI collaboration study in which participants made AI-supported decisions accompanied by explainable AI (XAI), used to assess the effects of incorrect explanations.
Key Findings: Incorrect explanations produce a misinformation effect that degrades human reasoning strategies, procedural knowledge, and team performance in human-AI collaboration.
Citations: 13
Sample Size: 160
-
Authors: M Cheng, C Lee, P Khadpe, S Yu, D Han
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Stanford University, Carnegie Mellon University
Research Area: Computers and Society, Artificial Intelligence, Sycophancy
Discipline: Computer Science, Psychology
The study shows that sycophantic AI, which validates user inputs unquestioningly, reduces people's prosocial behavior and fosters dependence, despite users perceiving such AI as higher quality and more trustworthy.
Methods: The researchers conducted two preregistered experiments including a live-interaction study, where participants discussed real interpersonal conflicts with AI models. They evaluated responses from 11 state-of-the-art AI models on levels of sycophancy and its psychological effects on users.
Key Findings: Sycophantic behavior was prevalent across the evaluated AI models; exposure to it reduced users' prosocial intentions and reinforced their conviction of being in the right, while increasing their trust in and willingness to reuse sycophantic models.
Citations: 5
Sample Size: 1604
-
Authors: P Spitzer, K Morrison, V Turri, M Feng, A Perer
Year: 2025
Published in: ACM Transactions on ..., 2025 - dl.acm.org
Institution: Carnegie Mellon University, Karlsruhe Institute of Technology, University of Bayreuth
Research Area: Explainable AI (XAI), AI-Assisted Decision-Making, Human-AI Collaboration
Discipline: Artificial Intelligence
The study highlights how imperfect explainable AI (XAI), along with human cognitive styles, affects reliance on AI and the performance of human–AI teams, providing design guidelines for better collaboration systems.
Methods: The researchers conducted a study with 136 participants, analyzing the effects of explanation imperfections and cognitive styles on AI-assisted decision-making and human–AI collaboration.
Key Findings: Incorrect explanations and the modality in which explanations are presented shape human reliance on AI, decision-making, and human–AI team performance, and these effects vary with individual cognitive styles.
Citations: 2
Sample Size: 136
-
Authors: S Hatgis-Kessell, WB Knox, S Booth, S Niekum
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Stanford University, University of Massachusetts Amherst, Carnegie Mellon University
Research Area: Reinforcement Learning with Human Feedback (RLHF)
Discipline: Artificial Intelligence, Human-Computer Interaction (HCI)
The paper investigates whether human preferences can be influenced, through interventions such as showing model-derived quantities, training participants on a preference model, and modifying elicitation questions, to align more closely with the preference models assumed by RLHF algorithms.
Methods: Three human studies tested interventions that included revealing model-derived quantities, training participants on a preference model, and altering how preference questions were framed.
Key Findings: The effect of each intervention on how closely humans' expressed preferences conform to the preference models assumed by RLHF algorithms (a minimal sketch of such an assumed model follows this entry).
DOI: https://doi.org/10.48550/arXiv.2501.06416
Citations: 1
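For context on the "assumed preference model" referenced in this entry, here is a minimal, hedged sketch of the partial-return (Bradley-Terry style) model that RLHF preference-learning methods commonly assume. It is not code from the paper; assumed_preference_probability, returns_a, and returns_b are hypothetical names used only for illustration.

    # Minimal sketch (assumption: the common partial-return preference model, not the paper's code)
    import math

    def assumed_preference_probability(returns_a, returns_b, beta=1.0):
        """P(segment A is preferred to segment B) under a partial-return model."""
        score_a = beta * sum(returns_a)   # summed per-step rewards of segment A
        score_b = beta * sum(returns_b)
        # Softmax over the two summed returns (equivalently, sigmoid of their difference)
        return math.exp(score_a) / (math.exp(score_a) + math.exp(score_b))

    # Example: with summed returns of 1.0 vs. 0.0, a perfectly conforming annotator
    # would prefer segment A about 73% of the time.
    print(assumed_preference_probability([0.5, 0.3, 0.2], [0.1, 0.4, -0.5]))

The interventions described above probe whether real annotators can be nudged toward answering preference queries the way such a model predicts.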
-
Authors: A Qian, R Shaw, L Dabbish, J Suh, H Shen
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Carnegie Mellon University, University of Pittsburgh, University of Utah, Yale School of Medicine, Yale University
Research Area: Responsible AI, Content Moderation, Risk Disclosure, Worker Well-being in Human-Computer Interaction (HCI)
Discipline: Computational Social Science, Human-Computer Interaction (HCI)
The paper examines how task designers approach well-being risk disclosure in Responsible AI (RAI) content work, highlighting a need for better frameworks to communicate such risks effectively.
Methods: Interviews were conducted with 23 task designers from academic and industry sectors to gather insights on risk recognition, interpretation, and communication practices.
Key Findings: Task designers recognize, interpret, and communicate well-being risks in RAI content work in ways that reveal the need for better frameworks for effective risk disclosure.
Citations: 1
Sample Size: 23
-
Authors: L Cheng, A Chouldechova
Year: 2023
Published in: Proceedings of the 2023 CHI Conference ..., 2023 - dl.acm.org
Institution: Carnegie Mellon University
Research Area: Human-Computer Interaction (HCI), Algorithm Aversion, Decision Science
Discipline: Human-Computer Interaction (HCI)
Giving users process control mitigates algorithm aversion when they select the training algorithm, but not when they change input factors; combining outcome control with process control is no more effective than either control alone.
Methods: A replication of prior outcome-control conditions plus novel process-control conditions, tested with participants on the MTurk and Prolific platforms.
Key Findings: Process control over algorithm selection mitigates algorithm aversion, process control over input factors does not, and combined outcome and process control offers no additional benefit over either alone.
DOI: https://doi.org/10.1145/3544548.3581253
Citations: 41
-
Authors: K Zhou, JD Hwang, X Ren, M Sap
Year: 2024
Published in: arXiv
Institution: Allen Institute for AI, Carnegie Mellon University, Stanford University, University of Southern California
Research Area: LLM Reliability and Uncertainty Quantification, Reinforcement Learning from Human Feedback (RLHF), LLM
Discipline: Artificial Intelligence
-
Authors: E Peer, L Brandimarte, S Samat, A Acquisti
Year: 2017
Published in: Journal of Experimental Social ..., 2017 - Elsevier
Institution: Eller College of Management, University of Arizona; Heinz College, Carnegie Mellon University
Research Area: Crowdsourcing behavioral research
Discipline: Behavioral Science
Citations: 3928