2025 Papers

This page lists 139 peer-reviewed papers published in 2025 in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (20 of 139)

The placebo effect of artificial intelligence in Human-Computer Interaction (HCI)

Authors: T Kosch, R Welsch, L Chuang, A Schmidt

Year: 2025

Published in: ACM Transactions on ..., 2023 - dl.acm.org

Institution: Aalto University

Research Area: User Expectations, HCI Research Bias, Artificial Intelligence, AI Bias

Discipline: Human-Computer Interaction

The belief in receiving adaptive AI support positively impacts user performance, demonstrating a placebo effect in Human-Computer Interaction.

Methods: Two experiments where participants completed word puzzles under conditions with or without supposed AI support; in reality, no AI assistance was provided.

Key Findings: Impact of perceived AI support on user expectations and task performance.

DOI: https://doi.org/10.1145/3529225

Citations: 149

Sample Size: 469

RLHF deciphered: A critical analysis of reinforcement learning from human feedback for LLMs

Authors: S Chaudhari, P Aggarwal, V Murahari

Year: 2025

Published in: ACM Computing ..., 2025 - dl.acm.org

Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University

Research Area: Reinforcement Learning from Human Feedback (RLHF), Large Language Models

Discipline: Artificial Intelligence

The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.

Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.

Key Findings: Investigated RLHF's fundamentals, focusing on the role of reward models, implications of design choices in RLHF training algorithms, and underlying issues like generalization errors, model misspecification, and feedback sparsity.

Citations: 117

What large language models know and what people think they know

Authors: M Steyvers, H Tejeda, A Kumar, C Belem

Year: 2025

Published in: Nature Machine ..., 2025 - nature.com

Institution: University of California Irvine

Research Area: Computational Linguistics, Computational Social Science, AI Ethics, Trust in AI

Discipline: Computational Social Science

LLMs often lead users to overestimate response accuracy, especially with longer explanations; adjusting explanation styles to align with model confidence narrows the calibration and discrimination gaps, enhancing trust in AI-assisted decision making.

Methods: Conducted experiments using multiple-choice and short-answer questions to study user confidence versus model-stated confidence; varied explanation length and alignment with model internal confidence.

Key Findings: Calibration gap (human vs. model confidence), discrimination gap (ability to distinguish correct vs. incorrect answers), and effects of explanation style and length on user trust.

Citations: 100

Toxic content and user engagement on social media: Evidence from a field experiment

Authors: G Beknazar-Yuzbashev, R Jiménez-Durán, J McCrosky

Year: 2025

Published in: 2025 - econstor.eu

Institution: Mozilla Foundation, Columbia University, Bocconi University, Stanford University, University of Warwick

Research Area: Social Media, User Engagement, Toxicity

Discipline: Social Science

Reducing exposure to toxic content on social media lowers user engagement but also decreases the toxicity of user-generated content, highlighting a trade-off for platforms between reduced toxicity and increased engagement.

Methods: Pre-registered browser extension field experiment on Facebook, Twitter, and YouTube to randomly hide toxic content for six weeks; supplemented with a survey experiment.

Key Findings: Impact of reduced exposure to toxic content on advertising impressions, time spent, engagement, and user-generated content toxicity; explored curiosity and alignment between engagement and welfare.

Citations: 76

Visual cognition in multimodal large language models

Authors: LM Schulze Buschoff, E Akata, M Bethge

Year: 2025

Published in: Nature Machine ..., 2025 - nature.com

Institution: Max Planck Institute

Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)

Discipline: Cognitive Science, Artificial Intelligence, Computer Vision

Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.

Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.

Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.

DOI: https://doi.org/10.1038/s42256-024-00963-y

Citations: 70

On the conversational persuasiveness of GPT-4

Authors: F Salvi, M Horta Ribeiro, R Gallotti, R West

Year: 2025

Published in: Nature Human Behaviour, 2025 - nature.com

Institution: EPFL, Fondazione Bruno Kessler, Princeton University

Research Area: Conversational Persuasion of LLM, Human-Computer Interaction, Behavioral Science, Large Language Models

Discipline: Behavioral Science

GPT-4 can use personalized arguments to be more persuasive in debates, outperforming humans in 64.4% of AI-human comparisons when personalization is applied.

Methods: Preregistered controlled study involving multiround debates with random assignment to conditions focusing on AI-human comparisons, personalization, and opinion strength.

Key Findings: Effectiveness of persuasion by GPT-4, especially when using personalized arguments, compared to humans in debates.

Citations: 65

Sample Size: 900

Human detection of political speech deepfakes across transcripts, audio, and video

Authors: M Groh, A Sankaranarayanan, N Singh, DY Kim

Year: 2025

Published in: Nature ..., 2024 - nature.com

Institution: Northwestern University, Massachusetts Institute of Technology

Research Area: Deepfakes, Media Forensics, Human Perception of AI-Generated Content, Political Communication

Discipline: Computational Social Science

Humans are better at detecting deepfake political speeches using audio-visual cues than relying on text alone; state-of-the-art text-to-speech audio makes deepfakes harder to discern.

Methods: Five pre-registered randomized experiments with varied base rates of misinformation, audio sources, question framings, and media modalities were conducted.

Key Findings: Human accuracy in discerning real political speeches from deepfakes across media formats and contextual variables.

DOI: https://doi.org/10.1038/s41467-024-51998-z

Citations: 63

Sample Size: 2215

Dimensions of diversity in human perceptions of algorithmic fairness

Authors: N Grgić-Hlača, G Lima, A Weller

Year: 2025

Published in: Proceedings of the 2nd ..., 2022 - dl.acm.org

Institution: Max Planck Institute, École Polytechnique Fédérale de Lausanne, University of Cambridge, The Alan Turing Institute

Research Area: Algorithmic Fairness, Human Perception, Diversity in AI Decision-Making

Discipline: Social Science, Artificial Intelligence

This study examines how sociodemographic factors and personal experience influence perceptions of fairness in algorithmic decision-making, particularly in bail decisions, highlighting the importance of diverse perspectives in regulatory oversight.

Methods: Explored perceptions of procedural fairness using surveys to assess the influence of demographics and personal experiences.

Key Findings: Impact of demographics (age, education, gender, race, political views) and personal experience on perceptions of fairness of algorithmic feature use in bail decisions.

DOI: https://doi.org/10.1145/3551624.3555306

Citations: 62

One-Minute Video Generation with Test-Time Training

Authors: K Dalal, D Koceja, G Hussein, J Xu, Y Zhao, Y Song, S Han, KC Cheung, J Kautz, C Guestrin, T Hashimoto, S Koyejo, Y Choi, Y Sun, X Wang

Year: 2025

Published in: ArXiv

Institution: Nvidia, Stanford University, UT Austin, University of California Berkeley, University of California San Diego

Research Area: Video Generation, Diffusion Models, Test-Time Training

Discipline: Computer Science

The paper introduces Test-Time Training (TTT) layers into Transformers to generate coherent one-minute videos from text storyboards, outperforming baselines in storytelling coherence but facing efficiency and artifact challenges.

Methods: Experimentation with Test-Time Training layers embedded in pre-trained Transformer models, evaluated using a dataset curated from Tom and Jerry cartoons and compared against Mamba 2, Gated DeltaNet, and sliding-window attention layers.

Key Findings: Effectiveness of video generation methods in creating coherent multi-scene stories in one-minute videos.

Citations: 52

Sample Size: 100

Fostering appropriate reliance on large language models: The role of explanations, sources, and inconsistencies

Authors: SSY Kim, JW Vaughan, QV Liao, T Lombrozo

Year: 2025

Published in: Proceedings of the ..., 2025 - dl.acm.org

Institution: Wake Forest University, University of Illinois at Urbana-Champaign, Princeton University, University of California Berkeley

Research Area: Appropriate Reliance on LLMs, Explainable AI (XAI), Human-AI Interaction, Cognitive Psychology

Discipline: Cognitive Psychology, Artificial Intelligence, Human-Computer Interaction

The study examines factors that influence users' reliance on LLM responses, finding explanations increase reliance, while sources and inconsistent explanations reduce reliance on incorrect responses.

Methods: Think-aloud study followed by a pre-registered, controlled experiment to assess the impact of explanations, sources, and inconsistencies in LLM responses on user reliance.

Key Findings: Users' reliance on LLM responses, accuracy, and the influence of explanations, inconsistencies, and sources on these measures.

DOI: https://doi.org/10.1145/3706598.3714020

Citations: 38

Sample Size: 308

LLM-generated messages can persuade humans on policy issues

Authors: H Bai, JG Voelkel, S Muldowney, JC Eichstaedt

Year: 2025

Published in: Nature ..., 2025 - nature.com

Institution: Stanford University

Research Area: Political Persuasion, Large Language Models

Discipline: Computational Social Science

LLM-generated messages persuade humans on policy issues about as effectively as human-crafted messages, with differences in perceived persuasion mechanisms.

Methods: Three pre-registered experiments were conducted comparing the persuasive effectiveness of LLM-generated and human-generated messages on policy attitudes, using control conditions with neutral messages.

Key Findings: Influence of LLM-generated messages on participants' policy attitudes and perceived characteristics of the message authors.

Citations: 37

Sample Size: 4829

Comparing the persuasiveness of role-playing large language models and human experts on polarized US political issues

Authors: K Hackenburg, L Ibrahim, BM Tappin, M Tsakiris

Year: 2025

Published in: AI & SOCIETY, 2025 - Springer

Institution: Oxford Internet Institute, University of Oxford

Research Area: Political Communication and Persuasion, Large Language Models

Discipline: Political Science, Artificial Intelligence

GPT-4's ability to generate persuasive messages rivaled human experts on polarized US political issues, suggesting AI tools may have significant implications for political campaigns and democracy.

Methods: Pre-registered experiment where GPT-4 generated partisan role-playing persuasive messages, which were compared to those from human persuasion experts.

Key Findings: Persuasive impact of GPT-4-generated messages versus human expert messages on U.S. political issues.

Citations: 35

Sample Size: 4955

Taking the person seriously: Ethically aware IS research in the era of reinforcement learning-based personalization

Authors: T Greene, G Shmueli, S Ray

Year: 2025

Published in: Journal of the Association for ..., 2023 - aisel.aisnet.org

Institution: National Tsing Hua University, Copenhagen Business School

Research Area: Information Systems Ethics, Reinforcement Learning for Personalization

Discipline: Information Systems

The paper examines the ethical risks of reinforcement learning-based personalization and proposes three research directions for IS scholars to address its societal implications and inadequacies in existing regulations.

Methods: The study presents a conceptual analysis of emergent features and societal risks associated with reinforcement learning-based personalization and proposes research directions.

Key Findings: Potential harms of reinforcement learning-based personalization, such as reduced autonomy, social and political destabilization, and mass surveillance, alongside the limitations of current data protection laws.

DOI: https://aisel.aisnet.org/jais/vol24/iss6/6

Citations: 33

When AI-Based Agents Are Proactive: Implications for Competence and System Satisfaction in Human-AI Collaboration

Authors: C Diebel, M Goutier, M Adam, A Benlian

Year: 2025

Published in: Business & Information Systems ..., 2025 - Springer

Institution: Technical University of Darmstadt, University of Goettingen

Research Area: Human-AI Collaboration, System Satisfaction, User Competence

Discipline: Information Systems, Human-Computer Interaction, Artificial Intelligence

Proactive AI-based agent assistance decreases users' competence-based self-esteem and system satisfaction, especially for users with higher AI knowledge.

Methods: Vignette-based online experiment using self-determination theory as the framework to evaluate user responses to proactive vs. reactive AI assistance.

Key Findings: Impact of proactive vs. reactive AI help on users' competence-based self-esteem and system satisfaction, moderated by users' AI knowledge levels.

DOI: https://doi.org/10.1007/s12599-024-00918-y

Citations: 32

Can large language models assess personality from asynchronous video interviews? A comprehensive evaluation of validity, reliability, fairness, and rating patterns

Authors: T Zhang, A Koutsoumpis, JK Oostrom

Year: 2025

Published in: IEEE Transactions ..., 2024 - ieeexplore.ieee.org

Institution: Southeast University, Vrije Universiteit, Tilburg University

Research Area: LLM Personality Assessment, Human-AI Interaction, Large Language Models

Discipline: Human-AI Interaction, Social Science, Humanities

LLMs like GPT-3.5 and GPT-4 can rival or outperform task-specific AI models in assessing personality traits from asynchronous video interviews, but show uneven performance, low reliability, and potential biases, warranting cautious use in high-stakes scenarios.

Methods: The study evaluated GPT-3.5 and GPT-4 performance in assessing personality traits and interview performance using simulated AVI responses, comparing them with ratings from task-specific AI and human annotators.

Key Findings: Validity, reliability, fairness, and rating patterns of LLMs (GPT-3.5 and GPT-4) in personality assessment from asynchronous video interviews.

Citations: 31

Sample Size: 685

Scaling language model size yields diminishing returns for single-message political persuasion

Authors: K Hackenburg, BM Tappin, P Röttger, SA Hale

Year: 2025

Published in: Proceedings of the ..., 2025 - pnas.org

Institution: University of California Berkeley, University of Cambridge, University of Oxford, Max Planck Institute

Research Area: Political Persuasion, Large Language Models

Discipline: Computational Social Science, Political Science

Scaling language model size yields diminishing returns in generating persuasive political messages, with larger models providing minimal gains over smaller ones after controlling for task completion metrics like coherence and relevance.

Methods: Generated 720 political messages using 24 LLMs of varying sizes and tested their persuasiveness through a large-scale randomized survey experiment.

Key Findings: Persuasive capability of language models across different sizes in generating political messages.

Citations: 31

Sample Size: 25982

The challenges of providing explanations of AI systems when they do not behave like users expect

Authors: M Riveiro, S Thill

Year: 2025

Published in: Proceedings of the 30th ACM Conference on User ..., 2022 - dl.acm.org

Institution: Linköping University, University of Skövde

Research Area: Explainable AI (XAI), Human-Computer Interaction

Discipline: Human-Computer Interaction

Users prefer factual explanations when AI outputs match expectations and mechanistic explanations when outputs deviate, with preferences influenced by response format (multiple-choice vs free text).

Methods: Participants were presented with scenarios involving an automated text classifier and asked to express their preference for explanations either through multiple-choice or free text responses.

Key Findings: User-desired content of AI explanations based on whether system behaviour aligns or deviates from expectations.

DOI: https://doi.org/10.1145/3503252.3531306

Citations: 30

The more followers the better? The impact of food influencers on consumer behaviour in the social media context

Authors: A Misra, TD Dinh, SY Ewe

Year: 2025

Published in: British Food Journal, 2024 - emerald.com

Institution: Monash University

Research Area: Consumer Behavior, Social Media Marketing

Discipline: Marketing

The study found that the number of followers and content type of food influencers significantly shape consumer behavior in the social media context, highlighting their role in effective marketing strategies for the food industry.

Methods: Quantitative analysis examining the relationship between influencers' follower counts, content type, and consumer reactions using social media data.

Key Findings: Effects of influencer follower count and the type of content influencers communicate on consumer behavior.

DOI: https://doi.org/10.1108/BFJ-01-2024-0096

Citations: 30

Consumer engagement with brands' COVID-19 messaging on social media: the role of perceived brand-social issue fit and brand opportunism

Authors: J Mundel, J Yang

Year: 2025

Published in: Journal of Interactive Advertising, 2021 - Taylor & Francis

Institution: Arizona State University, Loyola University

Research Area: Consumer Behavior, Social Media Marketing, COVID-19 Studies

Discipline: Marketing, Social Science

Brands with strong perceived fit between their products and COVID-19 messaging showed higher consumer engagement and positive attitudes, while those with lower perceived fit faced negative evaluations due to perceptions of opportunism.

Methods: Analyzed consumer responses to Instagram ads using perceived brand-social issue fit as a determinant of ad evaluations, brand attitudes, and engagement intentions.

Key Findings: Consumer responses, ad evaluations, brand attitudes, engagement intentions, and perceived brand opportunism based on fit between product type and COVID-19 messaging.

DOI: https://www.tandfonline.com/doi/abs/10.1080/15252019.2021.1958274

Citations: 29

Impact of annotator demographics on sentiment dataset labeling

Authors: Y Ding, J You, TK Machulla, J Jacobs, P Sen

Year: 2025

Published in: Proceedings of the ..., 2022 - dl.acm.org

Institution: University of California Irvine, University of Florida, State University of New York at Buffalo, University of Waterloo, Virginia Tech

Research Area: Computational Social Science, Human-Computer Interaction, Sentiment Analysis

Discipline: Computational Social Science

Demographic differences among annotators significantly affect sentiment dataset labels, causing up to a 4.5% accuracy difference in sentiment prediction models.

Methods: Crowdsourced annotations from >1000 workers combined with demographic data; analysis of multimodal sentiment datasets and evaluation using machine learning models.

Key Findings: Impact of annotator demographics on sentiment labeling and its effect on model predictions.

DOI: https://doi.org/10.1145/3555632

Citations: 28

Sample Size: 1000