Reasoning Studies

This page lists 27 peer-reviewed papers tagged with Reasoning in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (20 of 27)

Bayesian teaching enables probabilistic reasoning in large language models

Authors: L Qiu, F Sha, K Allen, Y Kim, T Linzen, S van Steenkiste

Year: 2026

Published in: Nature …, 2026 - nature.com

Institution: Meta, Google DeepMind, Massachusetts Institute of Technology, Google Research, Google

Research Area: Probabilistic reasoning, Bayesian cognition, Neural language models, Reasoning, AI Evaluations

Discipline: Machine learning, Artificial Intelligence

This paper sits at the intersection of machine learning and computational cognitive science, showing that large language models can acquire generalized probabilistic reasoning by being trained to imitate Bayesian belief updating rather than relying on prompting or heuristics.

Citations: 8

Cognitive empathy and dehumanization co-vary with dark triad traits and agency detection sensitivity

Authors: K Rudnicki, O Borowiecki, K Poels, B Beersma

Year: 2026

Published in: Evolution and Human …, 2026 - Elsevier

Institution: University of Antwerp, University of Bialystok, VU University, Emory University

Research Area: Personality psychology, Social cognition, Cognitive neuroscience

Discipline: Evolutionary psychology, Human behavioral ecology

A preregistered study finds that psychopathy, more than the other Dark Triad traits, is linked to poorer cognitive empathy and greater dehumanization, and that this empathy–psychopathy link is especially strong among people who are less sensitive at detecting agency in others.

Visual cognition in multimodal large language models

Authors: LM Schulze Buschoff, E Akata, M Bethge

Year: 2025

Published in: Nature Machine ..., 2025 - nature.com

Institution: Max Planck Institute

Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)

Discipline: Cognitive Science, Artificial Intelligence, Computer Vision

Vision-based large language models show proficiency in interpreting visual data but fall short of human-like abilities in causal reasoning, intuitive physics, and social cognition.

Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.

Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.

DOI: https://doi.org/10.1038/s42256-024-00963-y

Citations: 70

LLM-generated messages can persuade humans on policy issues

Authors: H Bai, JG Voelkel, S Muldowney, JC Eichstaedt

Year: 2025

Published in: Nature ..., 2025 - nature.com

Institution: Stanford University

Research Area: Political Persuasion, Large Language Models

Discipline: Computational Social Science

LLM-generated messages can persuade humans on policy issues about as effectively as human-crafted messages, although participants perceive differences in how the two kinds of messages persuade.

Methods: Three pre-registered experiments were conducted comparing the persuasive effectiveness of LLM-generated and human-generated messages on policy attitudes, using control conditions with neutral messages.

Key Findings: Influence of LLM-generated messages on participants' policy attitudes and perceived characteristics of the message authors.

Citations: 37

Sample Size: 4829

To rely or not to rely? Evaluating interventions for appropriate reliance on large language models

Authors: JY Bo, S Wan, A Anderson

Year: 2025

Published in: Proceedings of the 2025 CHI Conference ..., 2025 - dl.acm.org

Institution: University of Toronto

Research Area: Appropriate reliance on LLMs, Human-Computer Interaction, AI-assisted decision making

Discipline: Human-Computer Interaction

This paper evaluates interventions designed to help people rely appropriately on large language models in AI-assisted decision making.

Citations: 25

Large Language Models are overconfident and amplify human bias

Authors: F Sun, N Li, K Wang, L Goette

Year: 2025

Published in: arXiv preprint arXiv:2505.02151, 2025 - arxiv.org

Institution: HKU Business School

Research Area: LLM Overconfidence and Human Bias Amplification, Bias, Large Language Models

Discipline: Artificial Intelligence, Behavioral Science

Large language models (LLMs) are overconfident, especially when their certainty is lower, and they amplify human bias: LLM input doubles overconfidence in human decision making even as it improves accuracy.

Methods: Algorithmically constructed reasoning problems with known ground truths were used to evaluate LLMs' confidence; comparisons were drawn with human performance using similar experimental protocols.

Key Findings: LLM confidence levels, correctness probabilities, comparison of bias between LLMs and humans, and effects of LLM input on human decision making.

Citations: 21

Don't Be Fooled: The Misinformation Effect of Explanations in Human-AI Collaboration

Authors: P Spitzer, J Holstein, K Morrison

Year: 2025

Published in: ... Journal of Human ..., 2025 - Taylor & Francis

Institution: Karlsruhe Institute of Technology, Carnegie Mellon University, University of Bayreuth

Research Area: Human-AI Collaboration, Explainable AI (XAI)

Discipline: Human-Computer Interaction

Incorrect explanations in AI-assisted decision-making lead to a misinformation effect, negatively impacting human reasoning, procedural knowledge, and collaboration performance.

Methods: A study on human-AI collaboration involving AI-supported decision-making paired with explainable AI (XAI) to assess the effects of incorrect explanations.

Key Findings: Impact of incorrect explanations on human reasoning strategies, procedural knowledge, and team performance in human-AI collaboration.

Citations: 13

Sample Size: 160

Transforming data annotation with AI agents: A review of architectures, reasoning, applications, and impact

Authors: MM Karim, S Khan, DH Van, X Liu, C Wang, Q Qu

Year: 2025

Published in: Future Internet, 2025 - mdpi.com

Institution: Chinese Academy of Sciences, Zhejiang University, South-Central Minzu University

Research Area: Artificial Intelligence, Data Annotation, Multi-Agent Systems

Discipline: Artificial Intelligence

The paper reviews the role of AI agents powered by large language models in addressing challenges in data annotation, focusing on architectures, workflows, real-world applications, and future research directions for improving efficiency, scalability, transparency, and bias mitigation.

Methods: Comprehensive review and analysis of AI agent architectures, workflows, applications, and evaluation methods in data annotation across multiple industries.

Key Findings: Capabilities of LLM-driven agents in reasoning, adaptive learning, collaborative annotation, and their impact on quality assurance, cost, scalability, and bias mitigation.

Citations: 10

Reflection-Philosophy Order Effects and Correlations Across Samples

Authors: N Byrd

Year: 2025

Published in: Analysis, 2025 (DOI: 10.1093/analys/anaf015; preprint: https://osf.io/preprints/psyarxiv/y8sdm)

Institution: Stevens Institute of Technology

Research Area: Behavioral Research Methods, Experimental Psychology, Crowdsourcing Platforms

Discipline: Psychology

Reflective reasoning correlates with certain philosophical decisions. The study suggests bidirectional causal paths between reflection and philosophy, and finds that test order influences reflection test outcomes but not philosophical decisions.

Methods: Participants from four sources (Amazon Mechanical Turk, CloudResearch, Prolific, and a university) were tested on reflective reasoning and their decisions on 10 philosophical thought experiments.

Key Findings: Impact of reflective reasoning on philosophical decisions and the effect of test order on reflection and philosophy outcomes.

Citations: 4

To Mask or to Mirror: Human-AI Alignment in Collective Reasoning

Authors: C Qian, AT Parisi, C Bouleau, V Tsai

Year: 2025

Published in: Proceedings of the ..., 2025 - aclanthology.org

Institution: Google, Google DeepMind

Research Area: Human-AI Alignment, Collective Reasoning, Social Biases, LLM Simulation of Human Behavior, AI Bias

Discipline: Natural Language Processing, Artificial Intelligence, Computational Social Science

This study examines human-AI alignment in collective reasoning using an empirical framework, demonstrating how LLMs either mirror or mask human biases depending on context, cues, and model-specific inductive biases.

Methods: The study uses the Lost at Sea social psychology task in a large-scale online experiment, simulating LLM groups conditioned on human decision-making data across varying conditions of visible or pseudonymous demographics.

Key Findings: Alignment of LLM behavior with human social reasoning, focusing on collective decision-making and biases in group interactions.

Citations: 1

Sample Size: 748

All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

Authors: D Testa, G Bonetta, R Bernardi, A Bondielli

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Università di Roma La Sapienza

Research Area: Multimodal Reasoning, AI Benchmarking

Discipline: Artificial Intelligence

MAIA is a benchmark designed to evaluate the reasoning abilities of Vision Language Models (VLMs) on video-based tasks, with a focus on Italian culture and language. It reveals that VLMs remain fragile in consistency and in visually grounded language comprehension and generation.

Methods: MAIA comprises a set of video-related questions tested with two tasks: visual statement verification and open-ended visual question answering, categorized into twelve reasoning types to disentangle language-vision relations.

Key Findings: The ability of Vision Language Models (VLMs) to perform consistent, visually grounded natural language understanding and generation across fine-grained reasoning categories.

DOI: https://doi.org/10.48550/arXiv.2502.16989

Take caution in using LLMs as human surrogates: Scylla ex machina

Authors: Y Gao, D Lee, G Burtch, S Fazelpour

Year: 2024

Published in: arXiv preprint arXiv:2410.19599, 2024 - arxiv.org

Institution: Boston University, Northeastern University

Research Area: LLMs as Human Surrogates, Social Science Research Methods, Human Behavior Simulation

Discipline: Economics, Artificial Intelligence, Social Science

LLMs fail to accurately replicate human behavior in the 11-20 money request game, cautioning against their use as surrogates for human cognition in social science research.

Methods: The study evaluates the reasoning depth of various advanced LLMs through their performance on the 11-20 money request game, analyzing failure points related to input language, roles, and safeguarding.

Key Findings: The ability of LLMs to replicate human-like behavior and reasoning distribution in the context of social science simulations.

Citations: 25

"Better Be Computer or I'm Dumb": A Large-Scale Evaluation of Humans as Audio Deepfake Detectors

Authors: K Warren, T Tucker, A Crowder, D Olszewski

Year: 2024

Published in: Proceedings of the ..., 2024 - dl.acm.org

Institution: University of Florida

Research Area: Audio Deepfake Detection, Human Factors in AI Security, Perceptual Studies, AI Security

Discipline: Computer Science

Humans outperform machine learning models in classifying real human audio versus deepfakes, but are often misled by preconceptions about generated content, highlighting the need for more synergistic approaches between human and machine decision-making.

Methods: A large-scale user study was conducted where over 1,200 participants evaluated audio samples from three widely-cited deepfake datasets. Performance was quantitatively measured and thematic analysis was used to explore user reasoning and differences from machine classification.

Key Findings: Comparison of human and machine classification performance on audio deepfake detection, analysis of user reasoning, and evaluation of error patterns between both humans and models.

DOI: https://doi.org/10.1145/3658644.3670325

Citations: 14

Sample Size: 1200

Large language models amplify human biases in moral decision-making

Authors: V Cheung, M Maier, F Lieder

Year: 2024

Published in: PsyArXiv preprint, 2024 - files.osf.io

Institution: University College London

Research Area: AI Ethics, Moral Decision-Making, Cognitive Biases in LLMs, AI Bias

Discipline: Artificial Intelligence, Ethics

Citations: 11

Virtual lab coats: The effects of verified source information on social media post credibility

Authors: J Geels, P Graßl, H Schraffenberger, M Tanis

Year: 2024

Published in: PLOS ONE, 2024 - journals.plos.org

Institution: Network Institute

Research Area: Source Credibility, Social Media, Information Flow

Discipline: Social Science

Citations: 11

Detecting Fallacies in Climate Misinformation: A Technocognitive Approach to Identifying Misleading Argumentation

Authors: F Zanartu, J Cook, M Wagner, J Garcia

Year: 2024

Published in: arXiv

Institution: Monash University, University of Melbourne

Research Area: Artificial Intelligence, Computational Social Science, Misinformation Detection, Fallacy Analysis in Climate Communication

Discipline: Artificial Intelligence, Computational Social Science

Citations: 6

Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Prosocial Benchmark Dataset

Authors: S Schmer-Galunder, R Wheelock, Z Jalan

Year: 2024

Published in: Proceedings of the ..., 2024 - ojs.aaai.org

Institution: Google DeepMind, Google, Accenture, Amazon

Research Area: AI Ethics and Prosocial Data Annotation

Discipline: Artificial Intelligence, Ethics, Behavioral Science

DOI: https://doi.org/10.1609/aies.v7i1.31726

Citations: 3

Attributions toward artificial agents in a modified Moral Turing Test

Authors: Eyal Aharoni, Sharlene Fernandes, Daniel J. Brady, Caelan Alexander, Michael Criner, Kara Queen, Javier Rando, Eddy Nahmias, Victor Crespo

Year: 2024

Published in: Nature

Institution: Duke University, ETH Zurich, Georgia State University

Research Area: Moral Responsibility, Agency in AI, Human-AI Moral Interaction

Discipline: AI Ethics

Can Large Language Models Understand Symbolic Graphics Programs?

Authors: Z Qiu, W Liu, H Feng, Z Liu, T Xiao

Year: 2024

Published in: arXiv

Institution: Massachusetts Institute of Technology, Max Planck Institute, University of Cambridge

Research Area: Computational cognition, LLM evaluation, Program synthesis, Multimodal reasoning

Discipline: Artificial Intelligence

The paper introduces SGP-Bench, a benchmark testing whether LLMs can answer semantic and spatial questions about images purely from graphics programs (SVG/CAD), effectively probing "visual imagination without vision." The authors show that current LLMs struggle, sometimes performing near chance, even when the images are trivial for humans, but demonstrate that Symbolic Instruction Tuning (SIT) can meaningfully improve thi...

Experiences of Censorship on TikTok Across Marginalised Identities

Authors: Eddie L. Ungless, Nina Markl, Björn Ross

Year: 2024

Published in: arXiv

Institution: University of Edinburgh, University of Essex

Research Area: Computational Social Science, Human-Computer Interaction, Media Studies

Discipline: Computational Social Science