Large Language Models Papers

This page lists 35 peer-reviewed papers in the research area of Large Language Models in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (20 of 35)

RLHF deciphered: A critical analysis of reinforcement learning from human feedback for LLMs

Authors: S Chaudhari, P Aggarwal, V Murahari

Year: 2025

Published in: ACM Computing ..., 2025 - dl.acm.org

Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University

Research Area: Reinforcement Learning from Human Feedback (RLHF), Large Language Models

Discipline: Artificial Intelligence

The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.

Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.

Key Findings: Investigated RLHF's fundamentals, focusing on the role of reward models, implications of design choices in RLHF training algorithms, and underlying issues like generalization errors, model misspecification, and feedback sparsity.

Citations: 117

Visual cognition in multimodal large language models

Authors: LM Schulze Buschoff, E Akata, M Bethge

Year: 2025

Published in: Nature Machine ..., 2025 - nature.com

Institution: Max Planck Institute

Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)

Discipline: Cognitive Science, Artificial Intelligence, Computer Vision

Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.

Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.

Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.

DOI: https://doi.org/10.1038/s42256-024-00963-y

Citations: 70

On the conversational persuasiveness of GPT-4

Authors: F Salvi, M Horta Ribeiro, R Gallotti, R West

Year: 2025

Published in: Nature Human Behaviour, 2025 - nature.com

Institution: EPFL, Fondazione Bruno Kessler, Princeton University

Research Area: Conversational Persuasion of LLM, Human-Computer Interaction, Behavioral Science, Large Language Models

Discipline: Behavioral Science

GPT-4 can use personalized arguments to be more persuasive in debates, outperforming humans in 64.4% of AI-human comparisons when personalization is applied.

Methods: Preregistered controlled study involving multiround debates with random assignment to conditions focusing on AI-human comparisons, personalization, and opinion strength.

Key Findings: Effectiveness of persuasion by GPT-4, especially when using personalized arguments, compared to humans in debates.

Citations: 65

Sample Size: 900

LLM-generated messages can persuade humans on policy issues

Authors: H Bai, JG Voelkel, S Muldowney, JC Eichstaedt

Year: 2025

Published in: Nature ..., 2025 - nature.com

Institution: Stanford University

Research Area: Political Persuasion, Large Language Models

Discipline: Computational Social Science

LLM-generated messages can effectively persuade humans on policy issues similarly to human-crafted messages, with differences in perceived persuasion mechanisms.

Methods: Three pre-registered experiments were conducted comparing the persuasive effectiveness of LLM-generated and human-generated messages on policy attitudes, using control conditions with neutral messages.

Key Findings: Influence of LLM-generated messages on participants' policy attitudes and perceived characteristics of the message authors.

Citations: 37

Sample Size: 4829

Comparing the persuasiveness of role-playing large language models and human experts on polarized US political issues

Authors: K Hackenburg, L Ibrahim, BM Tappin, M Tsakiris

Year: 2025

Published in: AI & SOCIETY, 2025 - Springer

Institution: Oxford Internet Institute, University of Oxford

Research Area: Political Communication and Persuasion, Large Language Models

Discipline: Political Science, Artificial Intelligence

GPT-4's ability to generate persuasive messages rivaled human experts on polarized US political issues, suggesting AI tools may have significant implications for political campaigns and democracy.

Methods: Pre-registered experiment where GPT-4 generated partisan role-playing persuasive messages, which were compared to those from human persuasion experts.

Key Findings: Persuasive impact of GPT-4-generated messages versus human expert messages on U.S. political issues.

Citations: 35

Sample Size: 4955

Can large language models assess personality from asynchronous video interviews? A comprehensive evaluation of validity, reliability, fairness, and rating patterns

Authors: T Zhang, A Koutsoumpis, JK Oostrom

Year: 2025

Published in: IEEE Transactions ..., 2024 - ieeexplore.ieee.org

Institution: Southeast University, Vrije Universiteit, Tilburg University

Research Area: LLM Personality Assessment, Human-AI Interaction, Large Language Models

Discipline: Human-AI Interaction, Social Science, Humanities

LLMs like GPT-3.5 and GPT-4 can rival or outperform task-specific AI models in assessing personality traits from asynchronous video interviews, but show uneven performance, low reliability, and potential biases, warranting cautious use in high-stakes scenarios.

Methods: The study evaluated GPT-3.5 and GPT-4 performance in assessing personality traits and interview performance using simulated AVI responses, comparing them with ratings from task-specific AI and human annotators.

Key Findings: Validity, reliability, fairness, and rating patterns of LLMs (GPT-3.5 and GPT-4) in personality assessment from asynchronous video interviews.

Citations: 31

Sample Size: 685

Scaling language model size yields diminishing returns for single-message political persuasion

Authors: K Hackenburg, BM Tappin, P Röttger, SA Hale

Year: 2025

Published in: Proceedings of the ..., 2025 - pnas.org

Institution: University of California Berkeley, University of Cambridge, University of Oxford, Max Planck Institute

Research Area: Political Persuasion, Large Language Models

Discipline: Computational Social Science, Political Science

Scaling language model sizes leads to diminishing returns in generating persuasive political messages, with larger models providing minimal gains compared to smaller ones after controlling for task completion metrics like coherence and relevance.

Methods: Generated 720 political messages using 24 LLMs of varying sizes and tested their persuasiveness through a large-scale randomized survey experiment.

Key Findings: Persuasive capability of language models across different sizes in generating political messages.

Citations: 31

Sample Size: 25982

Large Language Models are overconfident and amplify human bias

Authors: F Sun, N Li, K Wang, L Goette

Year: 2025

Published in: arXiv preprint arXiv:2505.02151, 2025 - arxiv.org

Institution: HKU Business School

Research Area: LLM Overconfidence and Human Bias Amplification, Bias, Large Language Models

Discipline: Artificial Intelligence, Behavioral Science

Large language models (LLMs) are overconfident and amplify human bias, especially in cases where their certainty declines. LLM input doubles overconfidence in human decision making even though it improves accuracy.

Methods: Algorithmically constructed reasoning problems with known ground truths were used to evaluate LLMs' confidence; comparisons were drawn with human performance using similar experimental protocols.

Key Findings: LLM confidence levels, correctness probabilities, comparison of bias between LLMs and humans, and effects of LLM input on human decision making.

Citations: 21

The levers of political persuasion with conversational artificial intelligence

Authors: K Hackenburg, BM Tappin, L Hewitt, E Saunders

Year: 2025

Published in: Science, 2025 - science.org

Institution: London School of Economics and Political Science, Stony Brook University

Research Area: Political Persuasion with Conversational AI, Large Language Models, Factual Accuracy in AI Systems

Discipline: Political Science, Computational Social Science

This Science paper shows that conversational AI chatbots can systematically influence political opinions at scale, and that techniques like post-training and prompting make them far more persuasive, but that increased persuasion is tied to reduced factual accuracy in what the AI says.

Citations: 12

Trick or Neat: Adversarial Ambiguity and Language Model Evaluation

Authors: A Karamolegkou, O Eberle, P Rust, C Kauf, A Søgaard

Year: 2025

Published in: arXiv

Institution: Aleph Alpha, Massachusetts Institute of Technology

Research Area: Adversarial Ambiguity, Language Model Evaluation, Artificial Intelligence, Natural Language Processing, Large Language Models, AI Evaluation, Red Teaming

Discipline: Natural Language Processing

The paper assesses language models' sensitivity to ambiguity using an adversarial dataset and finds that direct prompting poorly identifies ambiguity, while linear probes achieve high accuracy in decoding ambiguity from model representations.

Methods: An adversarial ambiguity dataset was introduced with various types of ambiguities and transformations; models were tested using direct prompts and linear probes trained on internal representations.

Key Findings: Language models' ability to detect ambiguity, including syntactic, lexical, and phonological types, as well as performance under adversarial variations.

Citations: 2

Benchmarking World-Model Learning

Authors: A Warrier, D Nguyen, M Naim, M Jain, Y Liang, K Schroeder, C Yang, JB Tenenbaum, S Vollmer, K Ellis, Z Tavares

Year: 2025

Published in: arXiv preprint arXiv …, 2025 - arxiv.org

Institution: Basis Research Institute, DFKI GmbH, Harvard University, Quebec AI Institute, University of Cambridge, Massachusetts Institute of Technology, Cornell University

Research Area: Agent learning, World Models, Benchmarking, Evaluation protocols, Reinforcement Learning from Human Feedback (RLHF), Large Language Models

Discipline: Computer Science, Artificial Intelligence, Machine Learning

The paper introduces WorldTest, a novel protocol for evaluating model-learning agents using reward-free exploration and behavior-based scoring, and demonstrates that humans outperform models on the AutumnBench suite of tasks, revealing significant gaps in world-model learning.

Methods: The authors proposed WorldTest, a protocol separating reward-free interaction from scored tests in related environments, with evaluations done using AutumnBench, a dataset of 43 grid-world environments and 129 tasks across prediction, planning, and causal dynamics.

Key Findings: Performance of model-learning agents and humans in acquiring world models for masked-frame prediction, planning, and understanding causal dynamics.

Citations: 1

Sample Size: 517

Incentivizing High-Quality Human Annotations with Golden Questions

Authors: S Liu, Z Cai, H Wang, Z Ma, X Li

Year: 2025

Published in: arXiv preprint arXiv:2505.19134, 2025 - arxiv.org

Institution: Meta, Imperial College London

Research Area: Artificial Intelligence, Crowdsourcing, Large Language Models

Discipline: Artificial Intelligence

The paper develops a principal-agent model to incentivize high-quality human annotations using golden questions and identifies criteria for these questions to effectively monitor annotators' performance.

Methods: The authors use a principal-agent model with maximum likelihood estimators (MLE) and hypothesis testing to design incentive-compatible systems for annotators. Golden questions of high certainty and similar format to normal data were selected and validated through experiments.

Key Findings: The effectiveness of golden questions for incentivizing and monitoring high-quality human annotations in preference data.

DOI: https://doi.org/10.48550/arXiv.2505.19134

Citations: 1

Towards Strategic Persuasion with Language Models

Authors: Z Cheng, J You

Year: 2025

Published in: arXiv preprint arXiv:2509.22989, 2025 - arxiv.org

Institution: University of Southern California, University of California Berkeley

Research Area: Artificial Intelligence, Computer Science, Computer Science and Game Theory, Strategic Persuasion, Reinforcement Learning, Language Models, Large Language Models, Reinforcement Learning from Human Feedback (RLHF)

Discipline: Artificial Intelligence

This paper introduces a scalable framework, utilizing Bayesian Persuasion, to evaluate and train LLMs for strategic persuasion, demonstrating significant persuasion gains and effective strategies through reinforcement learning.

Methods: Repurposed human-human persuasion datasets for evaluation and training; applied Bayesian Persuasion framework; used reinforcement learning to optimize LLMs for strategic persuasion.

Key Findings: The persuasive capabilities and strategies of large language models (LLMs) in various settings.

Citations: 1

Whose view of safety? A deep DIVE dataset for pluralistic alignment of text-to-image models

Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood

Year: 2025

Published in: arXiv preprint arXiv:2507.13383, 2025 - arxiv.org

Institution: Google DeepMind, Google Research, Google

Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human-AI Interaction, Large Language Models

Discipline: Computer Science, Machine Learning, Artificial Intelligence

This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.

Methods: The study involved collecting feedback across 1000 prompts from demographically intersectional human raters to capture diverse safety perspectives, with an emphasis on empirical and contextual differences in harm perception.

Key Findings: Safety perceptions of text-to-image (T2I) model outputs from diverse demographic viewpoints and the influence of these perspectives on alignment strategies.

Citations: 1

Sample Size: 1000

Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents

Authors: K Grosse, N Ebert

Year: 2025

Published in: arXiv

Institution: IBM Research, ZHAW

Research Area: Security and privacy risks, Large Language Models, Human-AI Interaction, AI Safety

Discipline: Computer Science

A survey of 3,270 UK adults reveals significant security and privacy risks in the use of AI conversational agents: a third of respondents engage in risky behavior that can enable attacks, and many are unaware of how their data are used or of the option to opt out.

Methods: Representative survey conducted via the Prolific platform targeting UK adults, focusing on usage behaviors of AI conversational agents.

Key Findings: User behaviors related to security and privacy risks, data sanitization practices, attempts to jailbreak AI models, and awareness of data usage policies.

Sample Size: 3270

tAIfa: Enhancing Team Effectiveness and Cohesion with AI-Generated Automated Feedback

Authors: Mohammed Almutairi, Charles Chiang, Yuxin Bai, Diego Gomez-Zara

Year: 2025

Published in: arXiv

Institution: University of Notre Dame

Research Area: Human-AI Interaction, Team Effectiveness, Automated Feedback, Large Language Models

Discipline: Human-Computer Interaction

tAIfa, an AI tool using LLMs, enhances team communication and cohesion through automated feedback based on interaction analysis.

Methods: Between-subjects study where team interactions were analyzed by an AI agent (tAIfa) to deliver feedback on strengths and areas for improvement.

Key Findings: Team communication, contributions, and cohesion with and without tAIfa's feedback.

Sample Size: 18

Gemma 2: Improving Open Language Models at a Practical Size

Authors: Gemma Team

Year: 2024

Published in: arXiv

Institution: Google DeepMind, Google

Research Area: Large Language Models, Model Efficiency, Architecture

Discipline: Artificial Intelligence

Gemma 2 introduces scalable Transformer-based language models (2B-27B parameters) enhanced with techniques like local-global and group-query attention, achieving state-of-the-art performance for their size and competing with larger models.

Methods: The study applied modifications to the Transformer architecture, such as local-global attentions and group-query attention, as well as knowledge distillation training for select model sizes.

Key Findings: Performance of lightweight language models in terms of efficiency and competitiveness with larger models.

Citations: 1649

A survey of reinforcement learning from human feedback

Authors: T Kaufmann, P Weng, V Bengs, E Hüllermeier

Year: 2024

Published in: 2024 - epub.ub.uni-muenchen.de

Institution: Paderborn University, German Research Center for Artificial Intelligence (DFKI), Duke Kunshan University

Research Area: Reinforcement Learning from Human Feedback (RLHF), Large Language Models, Reward Modeling

Discipline: Artificial Intelligence

This paper surveys the fundamentals, diverse applications, and evolving impact of reinforcement learning from human feedback (RLHF), emphasizing its role in improving intelligent system alignment and performance.

Methods: The paper utilizes a survey-based approach to synthesize existing research, exploring the interactions between reinforcement learning algorithms and human input.

Key Findings: The study examines the principles, dynamics, applications, and trends in RLHF, offering insights into its role in enhancing large language models (LLMs) and intelligent systems.

Citations: 354

Predicting results of social science experiments using large language models

Authors: L Hewitt, A Ashokkumar, I Ghezae, R Willer

Year: 2024

Published in: Preprint, 2024 - samim.io

Institution: Stanford University, New York University

Research Area: Social Science Experiments, Large Language Model Prediction, Large Language Models

Discipline: Computational Social Science

The study presents a framework using large language models to predict outcomes of social science field experiments, achieving 78% accuracy but facing challenges with experiments on complex social issues.

Methods: The authors used an automated framework powered by large language models to predict outcomes of 276 field experiments drawn from the economics literature.

Key Findings: The prediction accuracy of large language models for outcomes of field experiments addressing various human behaviors.

Citations: 68

Sample Size: 276

Evidence of a log scaling law for political persuasion with large language models

Authors: K Hackenburg, BM Tappin, P Röttger, S Hale

Year: 2024

Published in: arXiv preprint arXiv ..., 2024 - arxiv.org

Institution: University of Oxford, The Alan Turing Institute, Royal Holloway, University of London, Bocconi University, Meedan

Research Area: LLM scaling laws, Political Persuasion, Large Language Models, AI Social Science

Discipline: Political Science, Artificial Intelligence

Persuasiveness of messages generated by large language models follows a log scaling law with diminishing returns as model size increases, and task completion appears to primarily drive this capability.

Methods: Generated 720 persuasive messages on 10 U.S. political issues using 24 language models of varying sizes; evaluated persuasiveness through a large-scale randomized survey experiment.

Key Findings: Persuasiveness of large language model-generated political messages across different model sizes.

Citations: 17

Sample Size: 25982
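
Illustration: the "log scaling law with diminishing returns" described in the entry above can be pictured with a small sketch. This is not the authors' analysis or data. The function name, the intercept, and the slope below are hypothetical values chosen only to show the shape of such a relationship: each doubling of model size adds the same fixed gain, so the gain per added parameter shrinks as models grow.

```python
import math

def log_scaling_effect(params_billions, intercept=1.0, slope=0.5):
    """Hypothetical log scaling law: effect = intercept + slope * ln(size).

    intercept and slope are made-up illustration values, not estimates
    from any paper listed on this page.
    """
    return intercept + slope * math.log(params_billions)

# Hypothetical model sizes in billions of parameters, doubling each step.
sizes = [1, 2, 4, 8, 16, 32, 64]
effects = [log_scaling_effect(s) for s in sizes]

# Under a log law, every doubling adds the same amount: slope * ln(2).
doubling_gains = [b - a for a, b in zip(effects, effects[1:])]

# The gain per additional billion parameters therefore keeps shrinking.
marginal_gains = [
    (e2 - e1) / (s2 - s1)
    for (s1, e1), (s2, e2) in zip(zip(sizes, effects), zip(sizes[1:], effects[1:]))
]

for size, gain in zip(sizes[1:], marginal_gains):
    print(f"up to {size:>2}B parameters: gain per added billion = {gain:.4f}")
```

Under these assumptions, the constant per-doubling gain alongside a falling per-parameter gain is what "diminishing returns" means for a log-shaped curve.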