Browse 7 peer-reviewed papers from researchers at Princeton University, spanning LLM and Reinforcement Learning from Human Feedback (RLHF) research (2022–2025). These papers are part of the Prolific Citations Library, a curated collection of research powered by high-quality participant data from Prolific.
-
Authors: S Chaudhari, P Aggarwal, V Murahari
Year: 2025
Published in: ACM Computing ..., 2025 - dl.acm.org
Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University
Research Area: Reinforcement Learning from Human Feedback (RLHF), LLM
Discipline: Artificial Intelligence
The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.
Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.
Key Findings: Reward models are central to RLHF but are subject to underlying issues including generalization error, model misspecification, and feedback sparsity; design choices in RLHF training algorithms carry significant implications for human-aligned AI systems.
Citations: 117
-
Authors: F Salvi, M Horta Ribeiro, R Gallotti, R West
Year: 2025
Published in: Nature Human Behaviour, 2025 - nature.com
Institution: EPFL, Fondazione Bruno Kessler, Princeton University
Research Area: Conversational Persuasion of LLM, Human-Computer Interaction (HCI), Behavioral Science, LLM
Discipline: Behavioral Science
GPT-4 can use personalized arguments to be more persuasive in debates, outperforming humans in 64.4% of AI-human comparisons when personalization is applied.
Methods: Preregistered controlled study involving multiround debates with random assignment to conditions focusing on AI-human comparisons, personalization, and opinion strength.
Key Findings: GPT-4 was more persuasive than human opponents in debates, particularly when it tailored its arguments using personal information about its interlocutor.
Citations: 65
Sample Size: 900
-
Authors: SSY Kim, JW Vaughan, QV Liao, T Lombrozo
Year: 2025
Published in: Proceedings of the ..., 2025 - dl.acm.org
Institution: Wake Forest University, University of Illinois at Urbana-Champaign, Princeton University, University of California Berkeley
Research Area: Appropriate Reliance on LLMs, Explainable AI, Human-AI Interaction, Cognitive Psychology
Discipline: Cognitive Psychology, Artificial Intelligence, Human-Computer Interaction (HCI)
The study examines factors that influence users' reliance on LLM responses, finding that explanations increase reliance, while the presence of sources and inconsistencies within explanations reduce reliance on incorrect responses.
Methods: Think-aloud study followed by a pre-registered, controlled experiment to assess the impact of explanations, sources, and inconsistencies in LLM responses on user reliance.
Key Findings: Explanations increased users' reliance on LLM responses regardless of correctness, whereas sources and inconsistent explanations helped users calibrate reliance on incorrect responses.
DOI: https://doi.org/10.1145/3706598.3714020
Citations: 38
Sample Size: 308
-
Authors: JQ Zhu, JC Peterson, B Enke, TL Griffiths
Year: 2025
Published in: Nature Human Behaviour, 2025 - nature.com
Institution: Princeton University, Boston University, Harvard University
Research Area: Strategic decision-making, Machine learning, Computational Cognitive Science
Discipline: Artificial Intelligence
This study used deep neural networks to analyze human strategic decision-making, predicting choices more accurately than existing theories and uncovering the context-dependent nature of reasoning and decision-making in complex games.
Methods: Deep neural networks trained on data from procedurally generated matrix games with over 2,400 variations; models were modified for interpretability.
Key Findings: Deep neural networks predicted human choices in the initial play of two-player matrix games more accurately than existing theories, revealing context-dependent strategic reasoning shaped by game complexity.
DOI: https://doi.org/10.1038/s41562-025-02230-5
Citations: 16
Sample Size: 90000
-
Authors: S Zorowitz, J Solis, Y Niv, D Bennett
Year: 2023
Published in: Nature Human Behaviour, 2023 - nature.com
Institution: Princeton University, Rutgers University, Monash University
Research Area: Research Methodology, Behavioral Research, Experimental Psychology (focus on data quality and spurious correlations)
Discipline: Behavioral Science
DOI: https://doi.org/10.1038/s41562-023-01640-7
Citations: 110
-
Authors: V Kewenig, A Lampinen, SA Nastase
Year: 2023
Published in: arXiv preprint arXiv ..., 2023 - arxiv.org
Institution: University College London, Princeton University, University of Exeter
Research Area: Computational Linguistics, Cognitive Science
Discipline: Computational Linguistics
DOI: https://doi.org/10.48550/arXiv.2308.06035
Citations: 3
-
Authors: K Vodrahalli, T Gerstenberg
Year: 2022
Published in: Advances in Neural ..., 2022 - proceedings.neurips.cc
Institution: Columbia University, Princeton University, Intel, Stanford University, Massachusetts Institute of Technology
Research Area: Human-AI Collaboration, Human Behavior Modeling, Decision Making
Discipline: Artificial Intelligence
Citations: 70