This page lists 40 peer-reviewed papers (2025–2026) in the discipline of Artificial Intelligence from the Prolific Citations Library, a curated collection of academic research powered by high-quality human data collected through Prolific.
-
Authors: L Qiu, F Sha, K Allen, Y Kim, T Linzen, S van Steenkiste
Year: 2026
Published in: Nature …, 2026 - nature.com
Institution: Meta, Google DeepMind, Massachusetts Institute of Technology, Google Research, Google
Research Area: Probabilistic reasoning, Bayesian cognition, Neural language models, Reasoning, AI Evaluations
Discipline: Machine learning, Artificial intelligence
This paper sits at the intersection of machine learning and computational cognitive science, showing that large language models can acquire generalized probabilistic reasoning by being trained to imitate Bayesian belief updating rather than relying on prompting or heuristics (a minimal worked Bayes update follows this entry).
Citations: 8
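Since the entry's central idea is training models to imitate Bayesian belief updating, a minimal worked example of a single Bayes update may help; the probabilities below are invented for illustration, and this sketches Bayes' rule in general, not the paper's training procedure.

```python
# Minimal single-step Bayesian belief update; all numbers are invented.
prior = 0.30             # P(H): belief in a hypothesis before seeing evidence
p_e_given_h = 0.80       # P(E | H): probability of the evidence if H is true
p_e_given_not_h = 0.20   # P(E | not H)

marginal = p_e_given_h * prior + p_e_given_not_h * (1 - prior)  # P(E), total probability
posterior = p_e_given_h * prior / marginal                      # Bayes' rule: P(H | E)
print(f"updated belief: {posterior:.3f}")  # 0.632: the evidence raised the belief from 0.30
```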
-
Authors: L Dai, Z Wang, L Chen, J Jin
Year: 2026
Published in: scholarspace.manoa.hawaii.edu, 2026
Institution: Shanghai International Studies University
Research Area: Socio-Economic Impacts of AI, Algorithmic Systems
Discipline: Computer Science, Artificial Intelligence
AI errors lead to broader negative generalizations about other AI systems compared to human errors, largely due to perceptions of AI's inflexibility and inability to learn from mistakes.
Methods: Conducted four one-factor experiments across distinct contexts to compare human responses to AI errors and human errors.
Key Findings: Generalization of error perceptions from one AI system to others, and psychological mechanisms driving this process.
-
Authors: N Petrova, A Gordon, E Blindow
Year: 2026
Published in: OpenReview
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups (a simplified sketch of the pairwise-preference core follows this entry).
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Sample Size: 23404
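The Methods name a hierarchical Bayesian Bradley-Terry-Davidson model; as a rough intuition pump, here is a minimal plain Bradley-Terry fit by gradient ascent. It omits the Davidson tie parameter, the hierarchy, and the post-stratification the paper uses, and the model names and win counts are hypothetical.

```python
# Simplified Bradley-Terry sketch: latent "strengths" fitted to pairwise wins.
# Hypothetical data; not the paper's hierarchical Bayesian implementation.
import numpy as np

models = ["model_a", "model_b", "model_c"]
# wins[i, j] = number of head-to-head comparisons model i won against model j
wins = np.array([[0, 12, 20],
                 [8,  0, 15],
                 [5, 10,  0]], dtype=float)

theta = np.zeros(len(models))      # latent strength per model
for _ in range(500):               # gradient ascent on the Bradley-Terry log-likelihood
    p = 1.0 / (1.0 + np.exp(theta[None, :] - theta[:, None]))  # p[i, j] = P(i beats j)
    n = wins + wins.T                                          # total i-vs-j comparisons
    theta += 0.01 * (wins - n * p).sum(axis=1)
    theta -= theta.mean()          # strengths are identified only up to a constant

for name, t in sorted(zip(models, theta), key=lambda x: -x[1]):
    print(f"{name}: strength {t:+.2f}")
```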
-
Authors: S Chaudhari, P Aggarwal, V Murahari
Year: 2025
Published in: ACM Computing ..., 2025 - dl.acm.org
Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University
Research Area: Reinforcement Learning from Human Feedback (RLHF), LLM
Discipline: Artificial Intelligence
The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.
Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.
Key Findings: The role of reward models in RLHF, the implications of design choices in RLHF training algorithms, and underlying issues such as generalization errors, model misspecification, and feedback sparsity.
Citations: 117
-
Authors: LM Schulze Buschoff, E Akata, M Bethge
Year: 2025
Published in: Nature Machine ..., 2025 - nature.com
Institution: Max Planck Institute
Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)
Discipline: Cognitive Science, Artificial Intelligence, Computer Vision
Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.
Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.
Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.
DOI: https://doi.org/10.1038/s42256-024-00963-y
Citations: 70
-
Authors: N Grgić-Hlača, G Lima, A Weller
Year: 2025
Published in: Proceedings of the 2nd ..., 2022 - dl.acm.org
Institution: Max Planck Institute, École Polytechnique Fédérale de Lausanne, University of Cambridge, The Alan Turing Institute
Research Area: Algorithmic Fairness, Human Perception, Diversity in AI Decision-Making
Discipline: Social Science, Artificial Intelligence
This study examines how sociodemographic factors and personal experience influence perceptions of fairness in algorithmic decision-making, particularly in bail decisions, highlighting the importance of diverse perspectives in regulatory oversight.
Methods: Explored perceptions of procedural fairness using surveys to assess the influence of demographics and personal experiences.
Key Findings: Impact of demographics (age, education, gender, race, political views) and personal experience on perceptions of fairness of algorithmic feature use in bail decisions.
DOI: https://doi.org/10.1145/3551624.3555306
Citations: 62
-
Authors: SSY Kim, JW Vaughan, QV Liao, T Lombrozo
Year: 2025
Published in: Proceedings of the ..., 2025 - dl.acm.org
Institution: Wake Forest University, University of Illinois at Urbana-Champaign, Princeton University, University of California Berkeley
Research Area: Appropriate Reliance on LLMs, Explainable AI, Human-AI Interaction, Cognitive Psychology
Discipline: Cognitive Psychology, Artificial Intelligence, Human-Computer Interaction (HCI)
The study examines factors that influence users' reliance on LLM responses, finding that explanations increase reliance, while the presence of sources and inconsistencies within explanations reduce reliance on incorrect responses.
Methods: Think-aloud study followed by a pre-registered, controlled experiment to assess the impact of explanations, sources, and inconsistencies in LLM responses on user reliance.
Key Findings: Users' reliance on LLM responses, accuracy, and the influence of explanations, inconsistencies, and sources on these measures.
DOI: https://doi.org/10.1145/3706598.3714020
Citations: 38
Sample Size: 308
-
Authors: K Hackenburg, L Ibrahim, BM Tappin, M Tsakiris
Year: 2025
Published in: AI & SOCIETY, 2025 - Springer
Institution: Oxford Internet Institute, University of Oxford
Research Area: Political Communication and Persuasion, LLM
Discipline: Political Science, Artificial Intelligence
GPT-4's ability to generate persuasive messages rivaled human experts on polarized US political issues, suggesting AI tools may have significant implications for political campaigns and democracy.
Methods: Pre-registered experiment where GPT-4 generated partisan role-playing persuasive messages, which were compared to those from human persuasion experts.
Key Findings: Persuasive impact of GPT-4-generated messages versus human expert messages on U.S. political issues.
Citations: 35
Sample Size: 4955
-
Authors: C Diebel, M Goutier, M Adam, A Benlian
Year: 2025
Published in: Business & Information Systems ..., 2025 - Springer
Institution: Technical University of Darmstadt, University of Goettingen
Research Area: Human-AI Collaboration, System Satisfaction, User Competence
Discipline: Information Systems, Human-Computer Interaction (HCI), Artificial Intelligence
Proactive AI-based agent assistance decreases users' competence-based self-esteem and system satisfaction, especially for users with higher AI knowledge.
Methods: Vignette-based online experiment using self-determination theory as the framework to evaluate user responses to proactive vs. reactive AI assistance.
Key Findings: Impact of proactive vs. reactive AI help on users' competence-based self-esteem and system satisfaction, moderated by users' AI knowledge levels.
DOI: https://doi.org/10.1007/s12599-024-00918-y
Citations: 32
-
Authors: F Sun, N Li, K Wang, L Goette
Year: 2025
Published in: arXiv preprint arXiv:2505.02151, 2025 - arxiv.org
Institution: HKU Business School
Research Area: LLM Overconfidence and Human Bias Amplification, Bias, LLM
Discipline: Artificial Intelligence, Behavioral Science
Large language models (LLMs) exhibit overconfidence, especially when their certainty declines, and their input amplifies human bias, doubling overconfidence in human decision-making despite improving accuracy.
Methods: Algorithmically constructed reasoning problems with known ground truths were used to evaluate LLMs' confidence; comparisons were drawn with human performance using similar experimental protocols (a minimal overconfidence-gap calculation follows this entry).
Key Findings: LLM confidence levels, correctness probabilities, comparison of bias between LLMs and humans, and effects of LLM input on human decision making.
Citations: 21
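To make the overconfidence measure concrete, a minimal sketch: compare mean stated confidence with realized accuracy on problems whose ground truth is known. The records below are invented, not the paper's data.

```python
# Overconfidence gap = mean stated confidence - realized accuracy.
# Hypothetical evaluation records: (stated confidence in [0, 1], was correct?)
records = [
    (0.95, True), (0.90, False), (0.85, True), (0.99, False), (0.80, True),
]

mean_confidence = sum(c for c, _ in records) / len(records)
accuracy = sum(ok for _, ok in records) / len(records)
print(f"mean confidence:    {mean_confidence:.2f}")              # 0.90
print(f"accuracy:           {accuracy:.2f}")                     # 0.60
print(f"overconfidence gap: {mean_confidence - accuracy:+.2f}")  # positive = overconfident
```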
-
Authors: S Shekar, P Pataranutaporn, C Sarabu, GA Cecchi
Year: 2025
Published in: NEJM AI, 2025 - ai.nejm.org
Institution: MIT Media Lab, IBM Research, Stanford University, Massachusetts Institute of Technology
Research Area: AI Ethics, Healthcare, Patient Trust, Medical Misinformation
Discipline: Artificial Intelligence, Human-Computer Interaction (HCI), AI Ethics
This paper reports a study by MIT researchers showing that patients trust AI-generated medical advice even when that advice is incorrect, raising concerns about misinformation in healthcare.
Citations: 19
-
Authors: K Zhou, JD Hwang, X Ren, N Dziri
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Stanford University, University of Southern California, Carnegie Mellon University, Allen Institute for AI
Research Area: Human-LM Reliance, Interaction-Centered Framework, Human-Computer Interaction (HCI)
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
The study introduces Rel-A.I., an interaction-centered evaluation approach to measure human reliance on LLM responses, revealing that politeness and interaction context significantly influence user reliance.
Methods: Nine user studies were conducted, analyzing user reliance influenced by LLM communication features such as politeness and context through participant interaction experiments.
Key Findings: The degree of human reliance on LLM responses based on communication style (e.g., politeness) and interaction context (e.g., knowledge domain, prior interactions).
Citations: 18
Sample Size: 450
-
Authors: JQ Zhu, JC Peterson, B Enke, TL Griffiths
Year: 2025
Published in: Nature Human Behaviour, 2025 - nature.com
Institution: Princeton University, Boston University, Harvard University
Research Area: Strategic decision-making, Machine learning, Computational Cognitive Science
Discipline: Artificial Intelligence
This study used deep neural networks to analyze human strategic decision-making, predicting choices more accurately than existing theories and uncovering the context-dependent nature of reasoning and decision-making in complex games.
Methods: Deep neural networks trained on data from procedurally generated matrix games with over 2,400 variations; models were modified for interpretability (an illustrative behavioral baseline follows this entry).
Key Findings: Human choices and reasoning in initial play of two-player matrix games, focusing on strategic decision-making and response to game complexity.
DOI: https://doi.org/10.1038/s41562-025-02230-5
Citations: 16
Sample Size: 90000
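The trained networks themselves cannot be reproduced in a few lines, but the kind of interpretable behavioral baseline they are typically compared against can: a one-parameter quantal-response model, i.e., a softmax over expected payoffs against an assumed uniformly random opponent. The payoff matrix and sensitivity value below are invented.

```python
# Quantal-response baseline for the row player's first move in a 3x3 matrix game.
# Illustrative only; not the paper's deep network. Payoffs are made up.
import numpy as np

payoffs = np.array([[3.0, 0.0, 5.0],   # payoffs[i, j] = row player's payoff for
                    [1.0, 4.0, 1.0],   # playing row i against column j
                    [2.0, 2.0, 2.0]])

lam = 1.5                               # sensitivity: larger = closer to best response
expected = payoffs.mean(axis=1)         # expected payoff vs a uniformly random opponent
logits = lam * expected
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # softmax over actions
print(dict(zip("ABC", probs.round(3)))) # choice probabilities for actions A, B, C
```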
-
Authors: P. Schoenegger, F. Salvi, J. Liu, X. Nan, R. Debnath, B. Fasolo, E. Leivada, G. Recchia, F. Günther, A. Zarifhonarvar, J. Kwon, Z. Ul Islam, M. Dehnert, D. Y. H. Lee, M. G. Reinecke, D. G. Kamper, M. Kobaş, A. Sandford, J. Kgomo, L. Hewitt, S. Kapoor, K. Oktar, E. E. Kucuk, B. Feng, C. R. Jones, I. Gainsburg, S. Olschewski, N. Heinzelmann, F. Cruz, B. M. Tappin, T. Ma, P. S. Park, R. Onyonka, A. Hjorth, P. Slattery, Q. Zeng, L. Finke, I. Grossmann, A. Salatiello, E. Karger
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: London School of Economics and Political Science, University of Cambridge, University College London, Massachusetts Institute of Technology, University of Oxford, Modulo Research, Stanford University, Federal Reserve Bank of Chicago, ETH Zürich, University of Johannesburg
Research Area: Computation and Language
Discipline: Social Science, Artificial Intelligence
This paper compares a frontier LLM (Claude 3.5 Sonnet) against incentivized human persuaders in a conversational quiz setting, finding that the AI out-persuaded the humans even though the human persuaders earned real-money bonuses tied to performance.
Citations: 16
-
Authors: S de Jong, V Paananen, B Tag
Year: 2025
Published in: Proceedings of the ACM on ..., 2025 - dl.acm.org
Institution: Aalborg University, Monash University
Research Area: Cognitive Forcing, Human-AI Interaction, AI Explainability (XAI), Decision-Making in AI Systems
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
Partial explanations encourage critical thinking and reduce user overreliance on incorrect AI suggestions, with performance varying based on individual need for cognition and task difficulty.
Methods: Two experiments were conducted: (1) participants identified shortest paths in weighted graphs, and (2) participants corrected spelling and grammar errors in text, with AI suggestions accompanied by no, partial, or full explanations (a minimal shortest-path ground-truth sketch follows this entry).
Key Findings: Effectiveness of partial explanations in reducing overreliance on incorrect AI suggestions, and interaction of explanation type with task difficulty and user need for cognition.
DOI: https://doi.org/10.1145/3710946
Citations: 14
Sample Size: 474
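In the first task, calling an AI suggestion "incorrect" presupposes a ground-truth shortest path; below is a minimal Dijkstra sketch of how that ground truth could be computed. The graph is a made-up example, not one of the paper's stimuli.

```python
# Ground-truth shortest path via Dijkstra's algorithm; hypothetical graph.
import heapq

graph = {  # node -> list of (neighbor, edge weight)
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}

def shortest_path(start, goal):
    queue = [(0, start, [start])]   # (cost so far, current node, path taken)
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        for nbr, weight in graph[node]:
            if cost + weight < best.get(nbr, float("inf")):
                best[nbr] = cost + weight
                heapq.heappush(queue, (cost + weight, nbr, path + [nbr]))
    return float("inf"), []

print(shortest_path("A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```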
-
Authors: A Dahlgren Lindström, L Methnani, L Krause
Year: 2025
Published in: Ethics and Information ..., 2025 - Springer
Institution: Umeå University, Vrije Universiteit Amsterdam
Research Area: AI Alignment, AI Safety, Reinforcement Learning from Human Feedback (RLHF), Sociotechnical Systems
Discipline: Artificial Intelligence, Ethics
The paper critiques AI alignment efforts using RLHF and RLAIF, highlighting theoretical and practical limitations in meeting the goals of helpfulness, harmlessness, and honesty, and advocates for a broader sociotechnical approach to AI safety and ethics.
Methods: Sociotechnical critique of RLHF techniques with an analysis of theoretical frameworks and practical implementations.
Key Findings: The alignment of AI systems with human values and the efficacy of RLHF techniques in achieving the HHH principle (helpfulness, harmlessness, honesty).
DOI: https://doi.org/10.1007/s10676-025-09837-2
Citations: 14
-
Authors: A Okoso, K Otaki, S Koide, Y Baba
Year: 2025
Published in: ACM Transactions on Recommender Systems, 2025 - dl.acm.org
Institution: Toyota Central R&D Labs, Toyota
Research Area: Human-Computer Interaction (HCI)
Discipline: Machine Learning, Artificial Intelligence
The study demonstrates that tailoring the tone of textual explanations in recommender systems to domains and user attributes, such as age and personality traits, can enhance users' perceptions and engagement.
Methods: Two online user studies: (1) 470 participants evaluated synthetic explanations with six tones across three domains (movies, hotels, and home products), (2) 103 participants engaged with a real-world dataset from the hotel domain using a personalized recommender system.
Key Findings: The perceived effects of different textual explanation tones on users, examined across domains (movies, hotels, home products) and user attributes (e.g., age, personality traits).
DOI: https://doi.org/10.1145/3718101
Citations: 13
Sample Size: 573
-
Authors: L Muttenthaler, K Greff, F Born, B Spitzer, S Kornblith
Year: 2025
Published in: Nature, 2025 - nature.com
Institution: Google DeepMind; Google; Machine Learning Group, Technische Universität Berlin; BIFOLD, Berlin Institute for the Foundations of Learning and Data; Max Planck Institute
Research Area: Cognitive Alignment, Computer Vision, Multi-level Conceptual Knowledge
Discipline: Artificial Intelligence, Cognitive Science
This paper presents a method for aligning machine vision model representations with human visual similarity judgments across different abstraction levels, improving how well models reflect human perceptual and conceptual organization and enhancing generalization and uncertainty prediction (a toy odd-one-out check follows this entry).
Citations: 11
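A toy version of the alignment target: do a model's embedding similarities reproduce a human odd-one-out judgment on a triplet? The three-dimensional "embeddings" and the expected human answer below are invented; this illustrates the evaluation idea, not the paper's method.

```python
# Odd-one-out from embedding similarities; toy vectors, not real model features.
import numpy as np

emb = {
    "dog":   np.array([1.0, 0.2, 0.0]),
    "wolf":  np.array([0.9, 0.3, 0.1]),
    "chair": np.array([0.0, 0.1, 1.0]),
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def odd_one_out(x, y, z):
    # The item outside the most similar pair is the odd one out.
    pairs = {(x, y): cos(emb[x], emb[y]),
             (x, z): cos(emb[x], emb[z]),
             (y, z): cos(emb[y], emb[z])}
    closest = max(pairs, key=pairs.get)
    return ({x, y, z} - set(closest)).pop()

print(odd_one_out("dog", "wolf", "chair"))  # "chair", matching the intuitive human choice
```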
-
Authors: MM Karim, S Khan, DH Van, X Liu, C Wang, Q Qu
Year: 2025
Published in: Future Internet, 2025 - mdpi.com
Institution: Chinese Academy of Sciences, Zhejiang University, South-Central Minzu University
Research Area: Artificial Intelligence, Data Annotation, Multi-Agent Systems
Discipline: Artificial Intelligence
The paper reviews the role of AI agents powered by large language models in addressing challenges in data annotation, focusing on architectures, workflows, real-world applications, and future research directions for improving efficiency, scalability, transparency, and bias mitigation.
Methods: Comprehensive review and analysis of AI agent architectures, workflows, applications, and evaluation methods in data annotation across multiple industries.
Key Findings: Capabilities of LLM-driven agents in reasoning, adaptive learning, collaborative annotation, and their impact on quality assurance, cost, scalability, and bias mitigation.
Citations: 10
-
Authors: Z Chen, J Kalla, Q Le, S Nakamura-Sakai
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Not determined
Research Area: Artificial Intelligence and Social Science, Persuasion Studies, Political Persuasion, LLM Chatbots, Democratic Societies
Discipline: Artificial Intelligence, Social Science
The study evaluates the cost-effectiveness and persuasive risks of Large Language Model (LLM) chatbots in political contexts, finding that while LLMs are as persuasive as campaign ads under exposure, their large-scale influence is currently limited by scalability and cost barriers.
Methods: Two survey experiments combined with real-world simulation exercises to measure the persuasiveness of LLM chatbots compared to traditional campaign tactics, focusing on both exposure and acceptance phases of persuasion.
Key Findings: Short- and long-term persuasive effects of LLMs, cost-effectiveness of LLM-based persuasion ($48–$74 per persuaded voter), and scalability compared to traditional campaign approaches (a back-of-envelope cost calculation follows this entry).
Citations: 7
Sample Size: 10417
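The cost-per-persuaded-voter figure can be illustrated with a back-of-envelope calculation; both inputs below are invented, chosen only so the result lands inside the paper's reported $48–$74 range.

```python
# Hypothetical back-of-envelope: cost per persuaded voter.
cost_per_conversation = 0.25  # invented LLM + delivery cost per voter reached, USD
persuasion_rate = 0.004       # invented share of reached voters who are persuaded

print(f"${cost_per_conversation / persuasion_rate:.2f} per persuaded voter")  # $62.50
```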