This page lists 40 peer-reviewed papers (2025–2026) in the discipline of Artificial Intelligence from the Prolific Citations Library, a curated collection of academic research powered by high-quality human data collected through Prolific.
-
Authors: L Qiu, F Sha, K Allen, Y Kim, T Linzen, S van Steenkiste
Year: 2026
Published in: Nature …, 2026 - nature.com
Institution: Meta, Google DeepMind, Massachusetts Institute of Technology, Google Research, Google
Research Area: Probabilistic reasoning, Bayesian cognition, Neural language models, Reasoning, AI Evaluations
Discipline: Machine learning, Artificial intelligence
This paper sits at the intersection of machine learning and computational cognitive science, showing that large language models can acquire generalized probabilistic reasoning by being trained to imitate Bayesian belief updating rather than relying on prompting or heuristics (a minimal worked Bayes update follows this entry).
Citations: 8
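Since the entry's central idea is training models to imitate Bayesian belief updating, a minimal worked example of a single Bayes update may help; the probabilities below are invented for illustration, and this sketches Bayes' rule in general, not the paper's training procedure.

```python
# Minimal single-step Bayesian belief update; all numbers are invented.
prior = 0.30             # P(H): belief in a hypothesis before seeing evidence
p_e_given_h = 0.80       # P(E | H): probability of the evidence if H is true
p_e_given_not_h = 0.20   # P(E | not H)

marginal = p_e_given_h * prior + p_e_given_not_h * (1 - prior)  # P(E), total probability
posterior = p_e_given_h * prior / marginal                      # Bayes' rule: P(H | E)
print(f"updated belief: {posterior:.3f}")  # 0.632: the evidence raised the belief from 0.30
```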
-
Authors: L Dai, Z Wang, L Chen, J Jin
Year: 2026
Published in: scholarspace.manoa.hawaii.edu, 2026
Institution: Shanghai International Studies University
Research Area: Socio-Economic Impacts of AI, Algorithmic Systems
Discipline: Computer Science, Artificial Intelligence
AI errors lead to broader negative generalizations about other AI systems compared to human errors, largely due to perceptions of AI's inflexibility and inability to learn from mistakes.
Methods: Conducted four one-factor experiments across distinct contexts to compare human responses to AI errors and human errors.
Key Findings: Generalization of error perceptions from one AI system to others, and psychological mechanisms driving this process.
-
Authors: N Petrova, A Gordon, E Blindow
Year: 2026
Published in: OpenReview
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups (a simplified sketch of the pairwise-preference core follows this entry).
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Sample Size: 23404
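The Methods name a hierarchical Bayesian Bradley-Terry-Davidson model; as a rough intuition pump, here is a minimal plain Bradley-Terry fit by gradient ascent. It omits the Davidson tie parameter, the hierarchy, and the post-stratification the paper uses, and the model names and win counts are hypothetical.

```python
# Simplified Bradley-Terry sketch: latent "strengths" fitted to pairwise wins.
# Hypothetical data; not the paper's hierarchical Bayesian implementation.
import numpy as np

models = ["model_a", "model_b", "model_c"]
# wins[i, j] = number of head-to-head comparisons model i won against model j
wins = np.array([[0, 12, 20],
                 [8,  0, 15],
                 [5, 10,  0]], dtype=float)

theta = np.zeros(len(models))      # latent strength per model
for _ in range(500):               # gradient ascent on the Bradley-Terry log-likelihood
    p = 1.0 / (1.0 + np.exp(theta[None, :] - theta[:, None]))  # p[i, j] = P(i beats j)
    n = wins + wins.T                                          # total i-vs-j comparisons
    theta += 0.01 * (wins - n * p).sum(axis=1)
    theta -= theta.mean()          # strengths are identified only up to a constant

for name, t in sorted(zip(models, theta), key=lambda x: -x[1]):
    print(f"{name}: strength {t:+.2f}")
```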
-
Authors: S Chaudhari, P Aggarwal, V Murahari
Year: 2025
Published in: ACM Computing ..., 2025 - dl.acm.org
Institution: University of Massachusetts Amherst, Carnegie Mellon University, Princeton University
Research Area: Reinforcement Learning from Human Feedback (RLHF), LLM
Discipline: Artificial Intelligence
The paper critically analyzes reinforcement learning from human feedback (RLHF) for large language models (LLMs), emphasizing the importance and limitations of reward models in improving human-aligned AI systems.
Methods: Analyzed RLHF frameworks through reinforcement learning principles; conducted a categorical literature review to identify modeling challenges, assumptions, and framework limitations.
Key Findings: The role of reward models in RLHF, the implications of design choices in RLHF training algorithms, and underlying issues such as generalization errors, model misspecification, and feedback sparsity.
Citations: 117
-
Authors: LM Schulze Buschoff, E Akata, M Bethge
Year: 2025
Published in: Nature Machine ..., 2025 - nature.com
Institution: Max Planck Institute
Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)
Discipline: Cognitive Science, Artificial Intelligence, Computer Vision
Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.
Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.
Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.
DOI: https://doi.org/10.1038/s42256-024-00963-y
Citations: 70
-
Authors: N Grgić-Hlača, G Lima, A Weller
Year: 2025
Published in: Proceedings of the 2nd ..., 2022 - dl.acm.org
Institution: Max Planck Institute, École Polytechnique Fédérale de Lausanne, University of Cambridge, The Alan Turing Institute
Research Area: Algorithmic Fairness, Human Perception, Diversity in AI Decision-Making
Discipline: Social Science, Artificial Intelligence
This study examines how sociodemographic factors and personal experience influence perceptions of fairness in algorithmic decision-making, particularly in bail decisions, highlighting the importance of diverse perspectives in regulatory oversight.
Methods: Explored perceptions of procedural fairness using surveys to assess the influence of demographics and personal experiences.
Key Findings: Impact of demographics (age, education, gender, race, political views) and personal experience on perceptions of fairness of algorithmic feature use in bail decisions.
DOI: https://doi.org/10.1145/3551624.3555306
Citations: 62
-
Authors: SSY Kim, JW Vaughan, QV Liao, T Lombrozo
Year: 2025
Published in: Proceedings of the ..., 2025 - dl.acm.org
Institution: Wake Forest University, University of Illinois at Urbana-Champaign, Princeton University, University of California Berkeley
Research Area: Appropriate Reliance on LLMs, Explainable AI, Human-AI Interaction, Cognitive Psychology
Discipline: Cognitive Psychology, Artificial Intelligence, Human-Computer Interaction (HCI)
The study examines factors that influence users' reliance on LLM responses, finding that explanations increase reliance, while the presence of sources and inconsistencies within explanations reduce reliance on incorrect responses.
Methods: Think-aloud study followed by a pre-registered, controlled experiment to assess the impact of explanations, sources, and inconsistencies in LLM responses on user reliance.
Key Findings: Users' reliance on LLM responses, accuracy, and the influence of explanations, inconsistencies, and sources on these measures.
DOI: https://doi.org/10.1145/3706598.3714020
Citations: 38
Sample Size: 308
-
Authors: K Hackenburg, L Ibrahim, BM Tappin, M Tsakiris
Year: 2025
Published in: AI & SOCIETY, 2025 - Springer
Institution: Oxford Internet Institute, University of Oxford
Research Area: Political Communication and Persuasion, LLM
Discipline: Political Science, Artificial Intelligence
GPT-4's ability to generate persuasive messages rivaled human experts on polarized US political issues, suggesting AI tools may have significant implications for political campaigns and democracy.
Methods: Pre-registered experiment where GPT-4 generated partisan role-playing persuasive messages, which were compared to those from human persuasion experts.
Key Findings: Persuasive impact of GPT-4-generated messages versus human expert messages on U.S. political issues.
Citations: 35
Sample Size: 4955
-
Authors: C Diebel, M Goutier, M Adam, A Benlian
Year: 2025
Published in: Business & Information Systems ..., 2025 - Springer
Institution: Technical University of Darmstadt, University of Goettingen
Research Area: Human-AI Collaboration, System Satisfaction, User Competence
Discipline: Information Systems, Human-Computer Interaction (HCI), Artificial Intelligence
Proactive AI-based agent assistance decreases users' competence-based self-esteem and system satisfaction, especially for users with higher AI knowledge.
Methods: Vignette-based online experiment using self-determination theory as the framework to evaluate user responses to proactive vs. reactive AI assistance.
Key Findings: Impact of proactive vs. reactive AI help on users' competence-based self-esteem and system satisfaction, moderated by users' AI knowledge levels.
DOI: https://doi.org/10.1007/s12599-024-00918-y
Citations: 32
-
Authors: F Sun, N Li, K Wang, L Goette
Year: 2025
Published in: arXiv preprint arXiv:2505.02151, 2025 - arxiv.org
Institution: HKU Business School
Research Area: LLM Overconfidence and Human Bias Amplification, Bias, LLM
Discipline: Artificial Intelligence, Behavioral Science
Large language models (LLMs) exhibit overconfidence, especially when their certainty declines, and their input amplifies human bias, doubling overconfidence in human decision-making despite improving accuracy.
Methods: Algorithmically constructed reasoning problems with known ground truths were used to evaluate LLMs' confidence; comparisons were drawn with human performance using similar experimental protocols (a minimal overconfidence-gap calculation follows this entry).
Key Findings: LLM confidence levels, correctness probabilities, comparison of bias between LLMs and humans, and effects of LLM input on human decision making.
Citations: 21
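To make the overconfidence measure concrete, a minimal sketch: compare mean stated confidence with realized accuracy on problems whose ground truth is known. The records below are invented, not the paper's data.

```python
# Overconfidence gap = mean stated confidence - realized accuracy.
# Hypothetical evaluation records: (stated confidence in [0, 1], was correct?)
records = [
    (0.95, True), (0.90, False), (0.85, True), (0.99, False), (0.80, True),
]

mean_confidence = sum(c for c, _ in records) / len(records)
accuracy = sum(ok for _, ok in records) / len(records)
print(f"mean confidence:    {mean_confidence:.2f}")              # 0.90
print(f"accuracy:           {accuracy:.2f}")                     # 0.60
print(f"overconfidence gap: {mean_confidence - accuracy:+.2f}")  # positive = overconfident
```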
-
Authors: S Shekar, P Pataranutaporn, C Sarabu, GA Cecchi
Year: 2025
Published in: NEJM AI, 2025 - ai.nejm.org
Institution: MIT Media Lab, IBM Research, Stanford University, Massachusetts Institute of Technology
Research Area: AI Ethics, Healthcare, Patient Trust, Medical Misinformation
Discipline: Artificial Intelligence, Human-Computer Interaction (HCI), AI Ethics
This paper reports a study by MIT researchers showing that patients trust AI-generated medical advice even when that advice is incorrect, raising concerns about misinformation in healthcare.
Citations: 19
-
Authors: K Zhou, JD Hwang, X Ren, N Dziri
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Stanford University, University of Southern California, Carnegie Mellon University, Allen Institute for AI
Research Area: Human-LM Reliance, Interaction-Centered Framework, Human-Computer Interaction (HCI)
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
The study introduces Rel-A.I., an interaction-centered evaluation approach to measure human reliance on LLM responses, revealing that politeness and interaction context significantly influence user reliance.
Methods: Nine user studies were conducted, analyzing user reliance influenced by LLM communication features such as politeness and context through participant interaction experiments.
Key Findings: The degree of human reliance on LLM responses based on communication style (e.g., politeness) and interaction context (e.g., knowledge domain, prior interactions).
Citations: 18
Sample Size: 450
-
Authors: JQ Zhu, JC Peterson, B Enke, TL Griffiths
Year: 2025
Published in: Nature Human Behaviour, 2025 - nature.com
Institution: Princeton University, Boston University, Harvard University
Research Area: Strategic decision-making, Machine learning, Computational Cognitive Science
Discipline: Artificial Intelligence
This study used deep neural networks to analyze human strategic decision-making, predicting choices more accurately than existing theories and uncovering the context-dependent nature of reasoning and decision-making in complex games.
Methods: Deep neural networks trained on data from procedurally generated matrix games with over 2,400 variations; models were modified for interpretability (an illustrative behavioral baseline follows this entry).
Key Findings: Human choices and reasoning in initial play of two-player matrix games, focusing on strategic decision-making and response to game complexity.
DOI: https://doi.org/10.1038/s41562-025-02230-5
Citations: 16
Sample Size: 90000
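The trained networks themselves cannot be reproduced in a few lines, but the kind of interpretable behavioral baseline they are typically compared against can: a one-parameter quantal-response model, i.e., a softmax over expected payoffs against an assumed uniformly random opponent. The payoff matrix and sensitivity value below are invented.

```python
# Quantal-response baseline for the row player's first move in a 3x3 matrix game.
# Illustrative only; not the paper's deep network. Payoffs are made up.
import numpy as np

payoffs = np.array([[3.0, 0.0, 5.0],   # payoffs[i, j] = row player's payoff for
                    [1.0, 4.0, 1.0],   # playing row i against column j
                    [2.0, 2.0, 2.0]])

lam = 1.5                               # sensitivity: larger = closer to best response
expected = payoffs.mean(axis=1)         # expected payoff vs a uniformly random opponent
logits = lam * expected
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # softmax over actions
print(dict(zip("ABC", probs.round(3)))) # choice probabilities for actions A, B, C
```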
-
Authors: P. Schoenegger, F. Salvi, J. Liu, X. Nan, R. Debnath, B. Fasolo, E. Leivada, G. Recchia, F. Günther, A. Zarifhonarvar, J. Kwon, Z. Ul Islam, M. Dehnert, D. Y. H. Lee, M. G. Reinecke, D. G. Kamper, M. Kobaş, A. Sandford, J. Kgomo, L. Hewitt, S. Kapoor, K. Oktar, E. E. Kucuk, B. Feng, C. R. Jones, I. Gainsburg, S. Olschewski, N. Heinzelmann, F. Cruz, B. M. Tappin, T. Ma, P. S. Park, R. Onyonka, A. Hjorth, P. Slattery, Q. Zeng, L. Finke, I. Grossmann, A. Salatiello, E. Karger
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: London School of Economics and Political Science, University of Cambridge, University College London, Massachusetts Institute of Technology, University of Oxford, Modulo Research, Stanford University, Federal Reserve Bank of Chicago, ETH Zürich, University of Johannesburg
Research Area: Computation and Language
Discipline: Social Science, Artificial Intelligence
This paper compares a frontier LLM (Claude 3.5 Sonnet) against incentivized human persuaders in a conversational quiz setting, finding that the AI out-persuaded the humans even though the human persuaders earned real-money bonuses tied to performance.
Citations: 16
-
Authors: S de Jong, V Paananen, B Tag
Year: 2025
Published in: Proceedings of the ACM on ..., 2025 - dl.acm.org
Institution: Aalborg University, Monash University
Research Area: Cognitive Forcing, Human-AI Interaction, AI Explainability (XAI), Decision-Making in AI Systems
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
Partial explanations encourage critical thinking and reduce user overreliance on incorrect AI suggestions, with performance varying based on individual need for cognition and task difficulty.
Methods: Two experiments were conducted: (1) participants identified shortest paths in weighted graphs, and (2) participants corrected spelling and grammar errors in text, with AI suggestions accompanied by no, partial, or full explanations (a minimal shortest-path ground-truth sketch follows this entry).
Key Findings: Effectiveness of partial explanations in reducing overreliance on incorrect AI suggestions, and interaction of explanation type with task difficulty and user need for cognition.
DOI: https://doi.org/10.1145/3710946
Citations: 14
Sample Size: 474
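In the first task, calling an AI suggestion "incorrect" presupposes a ground-truth shortest path; below is a minimal Dijkstra sketch of how that ground truth could be computed. The graph is a made-up example, not one of the paper's stimuli.

```python
# Ground-truth shortest path via Dijkstra's algorithm; hypothetical graph.
import heapq

graph = {  # node -> list of (neighbor, edge weight)
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}

def shortest_path(start, goal):
    queue = [(0, start, [start])]   # (cost so far, current node, path taken)
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        for nbr, weight in graph[node]:
            if cost + weight < best.get(nbr, float("inf")):
                best[nbr] = cost + weight
                heapq.heappush(queue, (cost + weight, nbr, path + [nbr]))
    return float("inf"), []

print(shortest_path("A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```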
-
Authors: A Dahlgren Lindström, L Methnani, L Krause
Year: 2025
Published in: Ethics and Information ..., 2025 - Springer
Institution: Umeå University, Vrije Universiteit Amsterdam
Research Area: AI Alignment, AI Safety, Reinforcement Learning from Human Feedback (RLHF), Sociotechnical Systems
Discipline: Artificial Intelligence, Ethics
The paper critiques AI alignment efforts using RLHF and RLAIF, highlighting theoretical and practical limitations in meeting the goals of helpfulness, harmlessness, and honesty, and advocates for a broader sociotechnical approach to AI safety and ethics.
Methods: Sociotechnical critique of RLHF techniques with an analysis of theoretical frameworks and practical implementations.
Key Findings: The alignment of AI systems with human values and the efficacy of RLHF techniques in achieving the HHH principle (helpfulness, harmlessness, honesty).
DOI: https://doi.org/10.1007/s10676-025-09837-2
Citations: 14
-
Authors: A Okoso, K Otaki, S Koide, Y Baba
Year: 2025
Published in: ACM Transactions on Recommender Systems, 2025 - dl.acm.org
Institution: Toyota Central R&D Labs, Toyota
Research Area: Human-Computer Interaction (HCI)
Discipline: Machine Learning, Artificial Intelligence
The study demonstrates that tailoring the tone of textual explanations in recommender systems to domains and user attributes, such as age and personality traits, can enhance users' perceptions and engagement.
Methods: Two online user studies: (1) 470 participants evaluated synthetic explanations with six tones across three domains (movies, hotels, and home products), (2) 103 participants engaged with a real-world dataset from the hotel domain using a personalized recommender system.
Key Findings: The perceived effects of different textual explanation tones on users, examined across domains (movies, hotels, home products) and user attributes (e.g., age, personality traits).
DOI: https://doi.org/10.1145/3718101
Citations: 13
Sample Size: 573
-
Authors: L Muttenthaler, K Greff, F Born, B Spitzer, S Kornblith
Year: 2025
Published in: Nature, 2025 - nature.com
Institution: Google DeepMind; Google; Machine Learning Group, Technische Universität Berlin; BIFOLD, Berlin Institute for the Foundations of Learning and Data; Max Planck Institute
Research Area: Cognitive Alignment, Computer Vision, Multi-level Conceptual Knowledge
Discipline: Artificial Intelligence, Cognitive Science
This paper presents a method for aligning machine vision model representations with human visual similarity judgments across different abstraction levels, improving how well models reflect human perceptual and conceptual organization and enhancing generalization and uncertainty prediction (a toy odd-one-out check follows this entry).
Citations: 11
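A toy version of the alignment target: do a model's embedding similarities reproduce a human odd-one-out judgment on a triplet? The three-dimensional "embeddings" and the expected human answer below are invented; this illustrates the evaluation idea, not the paper's method.

```python
# Odd-one-out from embedding similarities; toy vectors, not real model features.
import numpy as np

emb = {
    "dog":   np.array([1.0, 0.2, 0.0]),
    "wolf":  np.array([0.9, 0.3, 0.1]),
    "chair": np.array([0.0, 0.1, 1.0]),
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def odd_one_out(x, y, z):
    # The item outside the most similar pair is the odd one out.
    pairs = {(x, y): cos(emb[x], emb[y]),
             (x, z): cos(emb[x], emb[z]),
             (y, z): cos(emb[y], emb[z])}
    closest = max(pairs, key=pairs.get)
    return ({x, y, z} - set(closest)).pop()

print(odd_one_out("dog", "wolf", "chair"))  # "chair", matching the intuitive human choice
```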
-
Authors: MM Karim, S Khan, DH Van, X Liu, C Wang, Q Qu
Year: 2025
Published in: Future Internet, 2025 - mdpi.com
Institution: Chinese Academy of Sciences, Zhejiang University, South-Central Minzu University
Research Area: Artificial Intelligence, Data Annotation, Multi-Agent Systems
Discipline: Artificial Intelligence
The paper reviews the role of AI agents powered by large language models in addressing challenges in data annotation, focusing on architectures, workflows, real-world applications, and future research directions for improving efficiency, scalability, transparency, and bias mitigation.
Methods: Comprehensive review and analysis of AI agent architectures, workflows, applications, and evaluation methods in data annotation across multiple industries.
Key Findings: Capabilities of LLM-driven agents in reasoning, adaptive learning, collaborative annotation, and their impact on quality assurance, cost, scalability, and bias mitigation.
Citations: 10
-
Authors: Z Chen, J Kalla, Q Le, S Nakamura-Sakai
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Not determined
Research Area: Artificial Intelligence and Social Science, Persuasion Studies, Political Persuasion, LLM Chatbots, Democratic Societies
Discipline: Artificial Intelligence, Social Science
The study evaluates the cost-effectiveness and persuasive risks of Large Language Model (LLM) chatbots in political contexts, finding that while LLMs are as persuasive as campaign ads under exposure, their large-scale influence is currently limited by scalability and cost barriers.
Methods: Two survey experiments combined with real-world simulation exercises to measure the persuasiveness of LLM chatbots compared to traditional campaign tactics, focusing on both exposure and acceptance phases of persuasion.
Key Findings: Short- and long-term persuasive effects of LLMs, cost-effectiveness of LLM-based persuasion ($48–$74 per persuaded voter), and scalability compared to traditional campaign approaches (a back-of-envelope cost calculation follows this entry).
Citations: 7
Sample Size: 10417
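The cost-per-persuaded-voter figure can be illustrated with a back-of-envelope calculation; both inputs below are invented, chosen only so the result lands inside the paper's reported $48–$74 range.

```python
# Hypothetical back-of-envelope: cost per persuaded voter.
cost_per_conversation = 0.25  # invented LLM + delivery cost per voter reached, USD
persuasion_rate = 0.004       # invented share of reached voters who are persuaded

print(f"${cost_per_conversation / persuasion_rate:.2f} per persuaded voter")  # $62.50
```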