This page lists 27 peer-reviewed papers tagged with Reasoning in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: L Qiu, F Sha, K Allen, Y Kim, T Linzen, S van Steenkiste
Year: 2026
Published in: Nature …, 2026 - nature.com
Institution: Meta, Google DeepMind, Massachusetts Institute of Technology, Google Research, Google
Research Area: Probabilistic reasoning, Bayesian cognition, Neural language models, Reasoning, AI Evaluations
Discipline: Machine learning, Artificial Intelligence
This paper sits at the intersection of machine learning and computational cognitive science, showing that large language models can acquire generalized probabilistic reasoning by being trained to imitate Bayesian belief updating rather than relying on prompting or heuristics.
Citations: 8
-
Authors: K Rudnicki, O Borowiecki, K Poels, B Beersma
Year: 2026
Published in: Evolution and Human …, 2026 - Elsevier
Institution: University of Antwerp, University of Bialystok, VU University, Emory University
Research Area: Personality psychology, Social cognition, Cognitive neuroscience
Discipline: Evolutionary psychology, human behavioral ecology
In a preregistered study, psychopathy (more than the other Dark Triad traits) is linked to worse cognitive empathy and greater dehumanization, and this empathy–psychopathy link is especially strong among people who are less sensitive at detecting agency in others.
-
Authors: LM Schulze Buschoff, E Akata, M Bethge
Year: 2025
Published in: Nature Machine ..., 2025 - nature.com
Institution: Max Planck Institute
Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)
Discipline: Cognitive Science, Artificial Intelligence, Computer Vision
Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.
Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.
Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.
DOI: https://doi.org/10.1038/s42256-024-00963-y
Citations: 70
-
Authors: H Bai, JG Voelkel, S Muldowney, JC Eichstaedt
Year: 2025
Published in: Nature ..., 2025 - nature.com
Institution: Stanford University
Research Area: Political Persuasion, Large Language Models
Discipline: Computational Social Science
LLM-generated messages can effectively persuade humans on policy issues similarly to human-crafted messages, with differences in perceived persuasion mechanisms.
Methods: Three pre-registered experiments were conducted comparing the persuasive effectiveness of LLM-generated and human-generated messages on policy attitudes, using control conditions with neutral messages.
Key Findings: Influence of LLM-generated messages on participants' policy attitudes and perceived characteristics of the message authors.
Citations: 37
Sample Size: 4829
-
Authors: JY Bo, S Wan, A Anderson
Year: 2025
Published in: Proceedings of the 2025 CHI Conference ..., 2025 - dl.acm.org
Institution: University of Toronto
Research Area: Appropriate reliance on LLMs, Human-Computer Interaction, AI-assisted decision making
Discipline: Human-Computer Interaction
This paper explores the latest advancements and key trends in the field of Human-Computer Interaction (HCI), focusing on novel interfaces and user experience paradigms.
Citations: 25
-
Authors: F Sun, N Li, K Wang, L Goette
Year: 2025
Published in: arXiv preprint arXiv:2505.02151, 2025 - arxiv.org
Institution: HKU Business School
Research Area: LLM Overconfidence and Human Bias Amplification, Bias, Large Language Models
Discipline: Artificial Intelligence, Behavioral Science
Large language models (LLMs) are overconfident and amplify human bias, especially in cases where their certainty declines; LLM input doubles overconfidence in human decision making even though it improves accuracy.
Methods: Algorithmically constructed reasoning problems with known ground truths were used to evaluate LLMs' confidence; comparisons were drawn with human performance using similar experimental protocols.
Key Findings: LLM confidence levels, correctness probabilities, comparison of bias between LLMs and humans, and effects of LLM input on human decision making.
Citations: 21
-
Authors: P Spitzer, J Holstein, K Morrison
Year: 2025
Published in: ... Journal of Human ..., 2025 - Taylor & Francis
Institution: Karlsruhe Institute of Technology, Carnegie Mellon University, University of Bayreuth
Research Area: Human-AI Collaboration, Explainable AI (XAI)
Discipline: Human-Computer Interaction
Incorrect explanations in AI-assisted decision-making lead to a misinformation effect, negatively impacting human reasoning, procedural knowledge, and collaboration performance.
Methods: A study on human-AI collaboration involving AI-supported decision-making paired with explainable AI (XAI) to assess the effects of incorrect explanations.
Key Findings: Impact of incorrect explanations on human reasoning strategies, procedural knowledge, and team performance in human-AI collaboration.
Citations: 13
Sample Size: 160
-
Authors: MM Karim, S Khan, DH Van, X Liu, C Wang, Q Qu
Year: 2025
Published in: Future Internet, 2025 - mdpi.com
Institution: Chinese Academy of Sciences, Zhejiang University, South-Central Minzu University
Research Area: Artificial Intelligence, Data Annotation, Multi-Agent Systems
Discipline: Artificial Intelligence
The paper reviews the role of AI agents powered by large language models in addressing challenges in data annotation, focusing on architectures, workflows, real-world applications, and future research directions for improving efficiency, scalability, transparency, and bias mitigation.
Methods: Comprehensive review and analysis of AI agent architectures, workflows, applications, and evaluation methods in data annotation across multiple industries.
Key Findings: Capabilities of LLM-driven agents in reasoning, adaptive learning, collaborative annotation, and their impact on quality assurance, cost, scalability, and bias mitigation.
Citations: 10
-
Authors: N Byrd
Year: 2025
Published in: Byrd, N. (2025). Reflection-Philosophy Order Effects and Correlations Across Samples. Analysis. DOI: 10.1093/analys/anaf015. https://osf.io/preprints/psyarxiv/y8sdm
Institution: Stevens Institute of Technology
Research Area: Behavioral Research Methods, Experimental Psychology, Crowdsourcing Platforms
Discipline: Psychology
Reflective reasoning correlates with certain philosophical decisions, and the study suggests bidirectional causal paths between reflection and philosophy, with test order effects influencing reflection test outcomes but not philosophical decisions.
Methods: Participants from four sources (Amazon Mechanical Turk, CloudResearch, Prolific, and a university) were tested on reflective reasoning and their decisions on 10 philosophical thought experiments.
Key Findings: Impact of reflective reasoning on philosophical decisions and the effect of test order on reflection and philosophy outcomes.
Citations: 4
-
Authors: C Qian, AT Parisi, C Bouleau, V Tsai
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Google, Google DeepMind
Research Area: Human-AI Alignment, Collective Reasoning, Social Biases, LLM Simulation of Human Behavior, AI Bias
Discipline: Natural Language Processing, Artificial Intelligence, Computational Social Science
This study examines human-AI alignment in collective reasoning using an empirical framework, demonstrating how LLMs either mirror or mask human biases depending on context, cues, and model-specific inductive biases.
Methods: The study uses the Lost at Sea social psychology task in a large-scale online experiment, simulating LLM groups conditioned on human decision-making data across varying conditions of visible or pseudonymous demographics.
Key Findings: Alignment of LLM behavior with human social reasoning, focusing on collective decision-making and biases in group interactions.
Citations: 1
Sample Size: 748
-
Authors: D Testa, G Bonetta, R Bernardi, A Bondielli
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Università di Roma La Sapienza
Research Area: Multimodal Reasoning, AI Benchmarking
Discipline: Artificial Intelligence
MAIA is a benchmark designed to evaluate the reasoning abilities of Vision Language Models (VLMs) on video-based tasks, with a focus on Italian culture and language, revealing their fragility in consistency and visually grounded language comprehension and generation.
Methods: MAIA comprises a set of video-related questions tested with two tasks: visual statement verification and open-ended visual question answering, categorized into twelve reasoning types to disentangle language-vision relations.
Key Findings: The ability of Vision Language Models (VLMs) to perform consistent, visually grounded natural language understanding and generation across fine-grained reasoning categories.
DOI: https://doi.org/10.48550/arXiv.2502.16989
-
Authors: Y Gao, D Lee, G Burtch, S Fazelpour
Year: 2024
Published in: arXiv preprint arXiv:2410.19599, 2024 - arxiv.org
Institution: Boston University, Northeastern University
Research Area: LLMs as Human Surrogates, Social Science Research Methods, Human Behavior Simulation
Discipline: Economics, Artificial Intelligence, Social Science
LLMs fail to accurately replicate human behavior in the 11-20 money request game, cautioning against their use as surrogates for human cognition in social science research.
Methods: The study evaluates the reasoning depth of various advanced LLMs through their performance on the 11-20 money request game, analyzing failure points related to input language, roles, and safeguarding.
Key Findings: The ability of LLMs to replicate human-like behavior and reasoning distribution in the context of social science simulations.
Citations: 25
-
Authors: K Warren, T Tucker, A Crowder, D Olszewski
Year: 2024
Published in: Proceedings of the ..., 2024 - dl.acm.org
Institution: University of Florida
Research Area: Audio Deepfake Detection, Human Factors in AI Security, Perceptual Studies, AI Security
Discipline: Computer Science
Humans outperform machine learning models in classifying real human audio versus deepfakes, but are often misled by preconceptions about generated content, highlighting the need for more synergistic approaches between human and machine decision-making.
Methods: A large-scale user study was conducted where over 1,200 participants evaluated audio samples from three widely-cited deepfake datasets. Performance was quantitatively measured and thematic analysis was used to explore user reasoning and differences from machine classification.
Key Findings: Comparison of human and machine classification performance on audio deepfake detection, analysis of user reasoning, and evaluation of error patterns between both humans and models.
DOI: https://doi.org/10.1145/3658644.3670325
Citations: 14
Sample Size: 1200
-
Authors: V Cheung, M Maier, F Lieder
Year: 2024
Published in: Psyarxiv preprint, 2024 - files.osf.io
Institution: University College London
Research Area: AI Ethics, Moral Decision-Making, Cognitive Biases in LLMs, AI Bias
Discipline: Artificial Intelligence, Ethics
Citations: 11
-
Authors: J Geels, P Graßl, H Schraffenberger, M Tanis
Year: 2024
Published in: Plos one, 2024 - journals.plos.org
Institution: Network Institute
Research Area: Source Credibility, Social Media, Information Flow
Discipline: Social Science
Citations: 11
-
Authors: F Zanartu, J Cook, M Wagner, J Garcia
Year: 2024
Published in: arXiv
Institution: Monash University, University of Melbourne
Research Area: Artificial Intelligence, Computational Social Science, Misinformation Detection, Fallacy Analysis in Climate Communication
Discipline: Artificial Intelligence, Computational Social Science
Citations: 6
-
Authors: S Schmer-Galunder, R Wheelock, Z Jalan
Year: 2024
Published in: Proceedings of the ..., 2024 - ojs.aaai.org
Institution: Google DeepMind, Google, Accenture, Amazon
Research Area: AI Ethics and Prosocial Data Annotation
Discipline: Artificial Intelligence, Ethics, Behavioral Science
DOI: https://doi.org/10.1609/aies.v7i1.31726
Citations: 3
-
Authors: Eyal Aharoni, Sharlene Fernandes, Daniel J. Brady, Caelan Alexander, Michael Criner, Kara Queen, Javier Rando, Eddy Nahmias, Victor Crespo
Year: 2024
Published in: Nature
Institution: Duke University, ETH Zurich, Georgia State University
Research Area: Moral Responsibility, Agency in AI, Human-AI Moral Interaction
Discipline: AI Ethics
-
Authors: Z Qiu, W Liu, H Feng, Z Liu, T Xiao
Year: 2024
Published in: arXiv
Institution: Massachusetts Institute of Technology, Max Planck Institute, University of Cambridge
Research Area: Computational cognition, LLM evaluation, Program synthesis, Multimodal reasoning
Discipline: Artificial Intelligence
Introduces SGP-Bench, a benchmark testing whether LLMs can answer semantic and spatial questions about images purely from graphics programs (SVG/CAD), effectively probing “visual imagination without vision.” The authors show current LLMs struggle - sometimes near chance - even when images are trivial for humans, but demonstrate that Symbolic Instruction Tuning (SIT) can meaningfully improve thi...
-
Authors: Eddie L. Ungless, Nina Markl, Björn Ross
Year: 2024
Published in: arXiv
Institution: University of Edinburgh, University of Essex
Research Area: Computational Social Science, Human-Computer Interaction, Media Studies
Discipline: Computational Social Science