Large Sample Studies

This page lists 24 peer-reviewed papers tagged with Large Sample in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (20 of 24)

M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset

Authors: J Geng, J Tonglet, I Gurevych

Year: 2026

Published in: arXiv preprint arXiv:2510.23508, 2025 - arxiv.org

Institution: KU Leuven, TU Darmstadt, Ubiquitous Knowledge Processing Lab, MBZUAI, ATHENE

Research Area: Human-Computer Interaction

Discipline: Machine Learning, Artificial Intelligence

M4FC is a new dataset that addresses limitations in existing multimodal fact-checking datasets by providing multilingual and multicultural claims verified by professional fact-checkers across six fact-checking tasks.

Methods: The dataset was created by pairing 4,982 images with 6,980 claims, which were verified by professional fact-checkers from 22 organizations covering diverse cultural and geographic contexts. The claims are available in up to ten languages and span six different multimodal fact-checking tasks.

Key Findings: The study measured the efficacy of the M4FC dataset across six multimodal fact-checking tasks, with a focus on how combining intermediate tasks affects the performance of verdict prediction.

Citations: 3

Sample Size: 6980

Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

Authors: N Petrova, A Gordon, E Blindow

Year: 2026

Published in: Open review

Institution: Prolific

Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation

Discipline: Machine Learning, Artificial Intelligence

The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.

Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups.

Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.

Sample Size: 23404

Human detection of political speech deepfakes across transcripts, audio, and video

Authors: M Groh, A Sankaranarayanan, N Singh, DY Kim

Year: 2025

Published in: Nature ..., 2024 - nature.com

Institution: Northwestern University, Massachusetts Institute of Technology

Research Area: Deepfakes, Media Forensics, Human Perception of AI-Generated Content, Political Communication

Discipline: Computational Social Science

Humans are better at detecting deepfake political speeches using audio-visual cues than relying on text alone; state-of-the-art text-to-speech audio makes deepfakes harder to discern.

Methods: Five pre-registered randomized experiments with varied base rates of misinformation, audio sources, question framings, and media modalities were conducted.

Key Findings: Human accuracy in discerning real political speeches from deepfakes across media formats and contextual variables.

DOI: https://doi.org/10.1038/s41467-024-51998-z

Citations: 63

Sample Size: 2215

LLM-generated messages can persuade humans on policy issues

Authors: H Bai, JG Voelkel, S Muldowney, JC Eichstaedt

Year: 2025

Published in: Nature ..., 2025 - nature.com

Institution: Stanford University

Research Area: Political Persuasion, Large Language Models

Discipline: Computational Social Science

LLM-generated messages can effectively persuade humans on policy issues similarly to human-crafted messages, with differences in perceived persuasion mechanisms.

Methods: Three pre-registered experiments were conducted comparing the persuasive effectiveness of LLM-generated and human-generated messages on policy attitudes, using control conditions with neutral messages.

Key Findings: Influence of LLM-generated messages on participants' policy attitudes and perceived characteristics of the message authors.

Citations: 37

Sample Size: 4829

Comparing the persuasiveness of role-playing large language models and human experts on polarized US political issues

Authors: K Hackenburg, L Ibrahim, BM Tappin, M Tsakiris

Year: 2025

Published in: AI & SOCIETY, 2025 - Springer

Institution: Oxford Internet Institute, University of Oxford

Research Area: Political Communication and Persuasion, Large Language Models

Discipline: Political Science, Artificial Intelligence

GPT-4's ability to generate persuasive messages rivaled human experts on polarized US political issues, suggesting AI tools may have significant implications for political campaigns and democracy.

Methods: Pre-registered experiment where GPT-4 generated partisan role-playing persuasive messages, which were compared to those from human persuasion experts.

Key Findings: Persuasive impact of GPT-4-generated messages versus human expert messages on U.S. political issues.

Citations: 35

Sample Size: 4955

Scaling language model size yields diminishing returns for single-message political persuasion

Authors: K Hackenburg, BM Tappin, P Röttger, SA Hale

Year: 2025

Published in: Proceedings of the ..., 2025 - pnas.org

Institution: University of California Berkeley, University of Cambridge, University of Oxford, Max Planck Institute

Research Area: Political Persuasion, Large Language Models

Discipline: Computational Social Science, Political Science

Scaling language model sizes leads to diminishing returns in generating persuasive political messages, with larger models providing minimal gains compared to smaller ones after controlling for task completion metrics like coherence and relevance.

Methods: Generated 720 political messages using 24 LLMs of varying sizes and tested their persuasiveness through a large-scale randomized survey experiment.

Key Findings: Persuasive capability of language models across different sizes in generating political messages.

Citations: 31

Sample Size: 25982

Laypeople's use of and attitudes toward large language models and search engines for health queries: survey study

Authors: T Mendel, N Singh, DM Mann, B Wiesenfeld

Year: 2025

Published in: Journal of medical ..., 2025 - jmir.org

Institution: The City University of New York, George Washington University, New York University

Research Area: LLMs in Digital Health, Health Queries, User Attitudes

Discipline: Digital Health

Laypeople primarily use search engines over large language models (LLMs) for health queries, perceiving LLMs as less useful but less biased and more human-like while exhibiting no significant difference in trust or ease of use.

Methods: A screening survey followed by logistic regression analysis and a follow-up survey; comparisons were performed using ANOVA, Tukey post hoc tests, and paired-sample Wilcoxon tests.

Key Findings: Demographics and behaviors of LLM and search engine users for health queries, perceived usefulness, ease of use, trustworthiness, bias, and anthropomorphism.

Citations: 21

Sample Size: 2002

Capturing the complexity of human strategic decision-making with machine learning

Authors: JQ Zhu, JC Peterson, B Enke, TL Griffiths

Year: 2025

Published in: Nature Human Behaviour, 2025 - nature.com

Institution: Princeton University, Boston University, Harvard University

Research Area: Strategic decision-making, Machine learning, Computational Cognitive Science

Discipline: Artificial Intelligence

This study used deep neural networks to analyze human strategic decision-making, predicting choices more accurately than existing theories and uncovering the context-dependent nature of reasoning and decision-making in complex games.

Methods: Deep neural networks trained on data from procedurally generated matrix games with over 2,400 variations; models were modified for interpretability.

Key Findings: Human choices and reasoning in initial play of two-player matrix games, focusing on strategic decision-making and response to game complexity.

DOI: https://doi.org/10.1038/s41562-025-02230-5

Citations: 16

Sample Size: 90000

Collaborating with AI agents: Field experiments on teamwork, productivity, and performance

Authors: H Ju, S Aral

Year: 2025

Published in: arXiv preprint arXiv:2503.18238, 2025 - arxiv.org

Institution: Johns Hopkins Carey Business School, MIT Sloan School of Management

Research Area: Human-AI Collaboration, Teamwork, Organizational Productivity

Discipline: Human-AI Interaction

Collaboration with AI agents increases productivity, reshapes communication patterns, and improves text quality while human teams excel in image quality; AI requires fine-tuning for multimodal workflows.

Methods: Large-scale randomized controlled trials using Pairit platform with human-human and human-AI teams performing collaborative marketing tasks.

Key Findings: Productivity, communication patterns, workflow processes, ad quality (text and image), and ad performance metrics.

DOI: https://doi.org/10.48550/arXiv.2503.18238

Citations: 14

Sample Size: 2310

What's in the black box? How algorithmic knowledge promotes corrective and restrictive actions to counter misinformation in the USA, the UK, South Korea and Mexico

Authors: M Chung

Year: 2025

Published in: Internet Research, 2023 - emerald.com

Institution: University of Washington, Emory University

Research Area: Algorithmic Knowledge, Misinformation Countermeasures, Comparative Media Studies, Information Science

Discipline: Information Science

The study examines how algorithmic knowledge influences attitudes and actions against misinformation, revealing that perceptions of media influence on self and others predict corrective actions and support for regulation differently across four countries.

Methods: Four national surveys were conducted in the USA, UK, South Korea, and Mexico, with data analyzed through multigroup structural equation modeling (SEM).

Key Findings: Algorithmic knowledge, perceived influence of misinformation on self and others, intention to correct misinformation, support for regulation and content moderation.

DOI: https://doi.org/10.1108/INTR-07-2022-0578

Citations: 14

Sample Size: 5432

A Framework to Assess the Persuasion Risks Large Language Model Chatbots Pose to Democratic Societies

Authors: Z Chen, J Kalla, Q Le, S Nakamura-Sakai

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Not determined

Research Area: Artificial Intelligence and Social Science, Persuasion Studies, Political Persuasion, LLM Chatbots, Democratic Societies

Discipline: Artificial Intelligence, Social Science

The study evaluates the cost-effectiveness and persuasive risks of Large Language Model (LLM) chatbots in political contexts, finding that while LLMs are as persuasive as campaign ads once people are exposed to them, their large-scale influence is currently limited by scalability and cost barriers.

Methods: Two survey experiments combined with real-world simulation exercises to measure the persuasiveness of LLM chatbots compared to traditional campaign tactics, focusing on both exposure and acceptance phases of persuasion.

Key Findings: Short- and long-term persuasive effects of LLMs, cost-effectiveness of LLM-based persuasion ($48-$74 per persuaded voter), and scalability compared to traditional campaign approaches.

Citations: 7

Sample Size: 10417

Bias in the loop: How humans evaluate AI-generated suggestions

Authors: J Beck, S Eckman, C Kern, F Kreuter

Year: 2025

Published in: arXiv preprint arXiv:2509.08514, 2025 - arxiv.org

Institution: National Institutes of Health, National Center for Biotechnology Information

Research Area: Human-Computer Interaction

Discipline: Human-Computer Interaction

Human attitudes toward AI strongly influence performance in collaborative tasks, with skeptics showing better error detection and accuracy, while automation favorability increases overreliance on AI suggestions.

Methods: Randomized experiment with a controlled annotation task manipulating AI suggestion quality, task burden, and performance-based financial incentives; collected demographic, attitudinal, and behavioral data.

Key Findings: Impact of AI suggestion quality, task burden, and financial incentives on participant performance metrics (accuracy, correction activity, overcorrection, undercorrection); influence of demographic and psychological characteristics on performance.

Citations: 4

Sample Size: 2784

Conversational AI increases political knowledge as effectively as self-directed internet search

Authors: L Luettgau, HR Kirk, K Hackenburg, J Bergs, H Davidson, H Ogden, D Siddarth, S Huang

Year: 2025

Published in: arXiv

Institution: AI Security Institute, I Policy Directorate, Collective Intelligence Project, Anthropic

Research Area: Experimental evaluation, RCT, Survey Research

Discipline: Computer Science, Human-Computer Interaction

Conversational AI is as effective as self-directed internet searches in increasing political knowledge, reducing misinformation beliefs, and promoting accuracy among users in the UK during the 2024 election period.

Methods: A national survey (N=2,499) measured conversational AI usage for political information-seeking, followed by a series of randomised controlled trials (N=2,858) comparing conversational AI to self-directed internet search in improving political knowledge.

Key Findings: Extent of conversational AI usage for political knowledge-seeking in the UK and its efficacy in enhancing political knowledge and reducing misinformation compared to traditional internet searches.

Citations: 3

Sample Size: 5357

iNews: A multimodal dataset for modeling personalized affective responses to news

Authors: T Hu, N Collier

Year: 2025

Published in: arXiv preprint arXiv:2503.03335, 2025 - arxiv.org

Institution: University of Cambridge

Research Area: Affective Computing, Natural Language Processing, Computational Social Science

Discipline: Computational Social Science

The iNews dataset is a multimodal resource for studying personalized affective responses to news, improving modeling accuracy by incorporating annotator persona metadata.

Methods: 292 demographically diverse UK participants annotated 2,899 Facebook news posts with multidimensional labels (e.g., emotions, valence, arousal), combined with comprehensive participant persona data.

Key Findings: Modeled personalized affective responses to news through annotations capturing valence, arousal, emotions, and persona metadata.

Citations: 2

Sample Size: 2899

Repeated Measure Designs are Superior for (Most) Experimental Survey Research Applications

Authors: D Jordan, T Ollerenshaw, A Trexler

Year: 2025

Published in: 2025 - weekendu.uh.edu

Institution: University of Houston, Duke University

Research Area: Experimental Survey Research Methodology

Discipline: Social Science, Research Methodology

Repeated measure designs offer enhanced precision with minimal bias, suitable for various experiments despite slight attenuation of treatment effects.

Methods: Experimentally manipulated six classic political science experiments across three sample types, including extensions with proximity manipulation and sample-type variations.

Key Findings: Suitability and precision of repeated measure designs in survey experiments, including treatment effect estimations and design applicability across different sample types and methodologies.

Citations: 1

Sample Size: 13163

A meta-analysis of the persuasive power of large language models

Authors: L Hölbling, S Maier, S Feuerriegel

Year: 2025

Published in: Scientific Reports, 2025 - nature.com

Institution: University of Lausanne, University of Zurich, University of St. Gallen

Research Area: LLMs in Persuasion, Meta-Analysis, Artificial Intelligence, Human-Computer Interaction

Discipline: Artificial Intelligence

Large language models (LLMs) demonstrate similar persuasive performance to humans overall, but their effectiveness varies widely based on contextual factors such as model type, conversation design, and domain.

Methods: Systematic review and meta-analysis using Hedges' g to compute standardized effect sizes, with exploratory moderator analyses and publication bias checks (Egger's test, trim-and-fill analysis).

Key Findings: The persuasive effectiveness of LLMs compared to humans across various contexts and studies.

Sample Size: 17422

Multimodal machine learning for video based single question mental health assessment

Authors: B Grimm, P Yilmam, B Talbot, L Larsen

Year: 2025

Published in: npj Digital Medicine, 2025 - nature.com

Institution: Videra Health

Research Area: Computational Mental Health Assessment, Multimodal Machine Learning

Discipline: Computational Health, Digital Medicine

A multimodal machine learning model using text (MPNet) and voice (HuBERT) analysis predicts depression, anxiety, and trauma from a single video-based question with strong performance and demographic consistency while significantly reducing assessment time.

Methods: Multimodal analysis combining MPNet for textual data and HuBERT for prosodic voice features trained on video-based responses.

Key Findings: Efficient prediction of self-reported scores for depression (PHQ-9), anxiety (GAD-7), and trauma (PCL-5) from brief video responses.

Sample Size: 2420

No Evidence of Experimenter Demand Effects in Three Online Psychology Experiments

Authors: L Woodley, X Roberts-Gaal, R Calcott, F Cushman

Year: 2025

Published in: files.osf.io

Institution: Harvard University

Research Area: Experimental Psychology, Research Methodology, Replication Studies

Discipline: Psychology, Social Science

Explicit demand cues do not alter participant behavior, judgments, or attitudes in online psychology experiments, despite participants adjusting their beliefs about study hypotheses.

Methods: Three preregistered experiments on Prolific tested the impact of explicit demand cues on participant behavior using a dictator game, a moral dilemma vignette, and a group attitude intervention. Participants were randomly assigned to receive information about the study hypothesis or no information.

Key Findings: Whether explicit demand cues influence behavior, judgments, or attitudes in online psychology studies.

Sample Size: 2254

Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents

Authors: K Grosse, N Ebert

Year: 2025

Published in: arXiv

Institution: IBM Research, ZHAW

Research Area: Security and privacy risks, Large Language Models, Human-AI Interaction, AI Safety

Discipline: Computer Science

A survey of 3,270 UK adults reveals significant security and privacy risks in AI conversational agent usage, with a third engaging in risky behavior that enables attacks and many unaware of how their data are used or that they can opt out.

Methods: Representative survey conducted via Prolific platform targeting UK adults, focusing on usage behaviors of AI conversational agents.

Key Findings: User behaviors related to security and privacy risks, data sanitization practices, attempts to jailbreak AI models, and awareness of data usage policies.

Sample Size: 3270

Influence of believed AI involvement on the perception of digital medical advice

Authors: M Reis, F Reis, W Kunde

Year: 2024

Published in: Nature Medicine, 2024 - nature.com

Institution: University of Cambridge, Julius Maximilians Universität

Research Area: AI in Healthcare, Medical Ethics, Cognitive Psychology, Human-Computer Interaction (HCI) in Medicine

Discipline: AI in Healthcare, Medical Ethics, Cognitive Psychology

The study found that medical advice labeled as being sourced from AI (or AI supervised by humans) is perceived as less reliable and empathetic compared to advice labeled as originating solely from a human physician, resulting in reduced willingness to follow such advice.

Methods: Two preregistered studies were conducted where participants were presented with identical medical advice scenarios but with manipulated labels for the advice source ('AI', 'human physician', 'human+AI').

Key Findings: Participants' perceptions of reliability, empathy, and willingness to follow medical advice based on the perceived source.

Citations: 78

Sample Size: 2280