This page lists 23 peer-reviewed papers tagged with Large Sample in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: N Petrova, A Gordon, E Blindow
Year: 2026
Published in: OpenReview
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups.
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Sample Size: 23404
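As a rough illustration of the Bradley-Terry-Davidson model named in the Methods line, the sketch below computes win, tie, and loss probabilities for a single pairwise comparison. It shows only the textbook Davidson tie extension. The function name, the strength values, and the tie parameter `nu` are assumptions for illustration; the hierarchical Bayesian structure and census post-stratification described in the entry are not reproduced here.

```python
import math


def btd_probabilities(strength_i, strength_j, nu):
    """Return (P(i wins), P(tie), P(j wins)) for one comparison.

    strength_i, strength_j: positive model strengths (hypothetical values).
    nu: non-negative tie parameter; nu = 0 reduces to plain Bradley-Terry.
    """
    tie_mass = nu * math.sqrt(strength_i * strength_j)
    total = strength_i + strength_j + tie_mass
    return strength_i / total, tie_mass / total, strength_j / total


# Hypothetical example: model i is twice as strong as model j.
p_win, p_tie, p_loss = btd_probabilities(2.0, 1.0, 0.5)
print(p_win, p_tie, p_loss)
```

The three probabilities always sum to one, and a larger `nu` shifts probability mass toward ties, most strongly when the two strengths are close.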
-
Authors: M Groh, A Sankaranarayanan, N Singh, DY Kim
Year: 2024
Published in: Nature ..., 2024 - nature.com
Institution: Northwestern University, Massachusetts Institute of Technology
Research Area: Deepfakes, Media Forensics, Human Perception of AI-Generated Content, Political Communication
Discipline: Computational Social Science
Humans are better at detecting deepfake political speeches using audio-visual cues than relying on text alone; state-of-the-art text-to-speech audio makes deepfakes harder to discern.
Methods: Five pre-registered randomized experiments with varied base rates of misinformation, audio sources, question framings, and media modalities were conducted.
Key Findings: Human accuracy in discerning real political speeches from deepfakes across media formats and contextual variables.
DOI: https://doi.org/10.1038/s41467-024-51998-z
Citations: 63
Sample Size: 2215
-
Authors: H Bai, JG Voelkel, S Muldowney, JC Eichstaedt
Year: 2025
Published in: Nature ..., 2025 - nature.com
Institution: Stanford University
Research Area: Political Persuasion, Large Language Models
Discipline: Computational Social Science
LLM-generated messages can effectively persuade humans on policy issues similarly to human-crafted messages, with differences in perceived persuasion mechanisms.
Methods: Three pre-registered experiments were conducted comparing the persuasive effectiveness of LLM-generated and human-generated messages on policy attitudes, using control conditions with neutral messages.
Key Findings: Influence of LLM-generated messages on participants' policy attitudes and perceived characteristics of the message authors.
Citations: 37
Sample Size: 4829
-
Authors: K Hackenburg, L Ibrahim, BM Tappin, M Tsakiris
Year: 2025
Published in: AI & SOCIETY, 2025 - Springer
Institution: Oxford Internet Institute, University of Oxford
Research Area: Political Communication and Persuasion, Large Language Models
Discipline: Political Science, Artificial Intelligence
GPT-4's ability to generate persuasive messages rivaled that of human experts on polarized U.S. political issues, suggesting AI tools may have significant implications for political campaigns and democracy.
Methods: Pre-registered experiment in which GPT-4 generated persuasive messages while role-playing partisan personas; these messages were compared with those written by human persuasion experts.
Key Findings: Persuasive impact of GPT-4-generated messages versus human expert messages on U.S. political issues.
Citations: 35
Sample Size: 4955
-
Authors: K Hackenburg, BM Tappin, P Röttger, SA Hale
Year: 2025
Published in: Proceedings of the ..., 2025 - pnas.org
Institution: University of California Berkeley, University of Cambridge, University of Oxford, Max Planck Institute
Research Area: Political Persuasion, Large Language Models
Discipline: Computational Social Science, Political Science
Scaling language model size yields diminishing returns in generating persuasive political messages: larger models provide minimal gains over smaller ones after controlling for task completion metrics such as coherence and relevance.
Methods: Generated 720 political messages using 24 LLMs of varying sizes and tested their persuasiveness through a large-scale randomized survey experiment.
Key Findings: Persuasive capability of language models across different sizes in generating political messages.
Citations: 31
Sample Size: 25982
-
Authors: T Mendel, N Singh, DM Mann, B Wiesenfeld
Year: 2025
Published in: Journal of medical ..., 2025 - jmir.org
Institution: The City University of New York, George Washington University, New York University
Research Area: LLMs in Digital Health, Health Queries, User Attitudes
Discipline: Digital Health
Laypeople primarily use search engines rather than large language models (LLMs) for health queries. They perceive LLMs as less useful but also less biased and more human-like, with no significant difference in trust or ease of use.
Methods: A screening survey followed by logistic regression analysis and a follow-up survey; comparisons were performed using ANOVA, Tukey post hoc tests, and paired-sample Wilcoxon tests.
Key Findings: Demographics and behaviors of LLM and search engine users for health queries, perceived usefulness, ease of use, trustworthiness, bias, and anthropomorphism.
Citations: 21
Sample Size: 2002
-
Authors: JQ Zhu, JC Peterson, B Enke, TL Griffiths
Year: 2025
Published in: Nature Human Behaviour, 2025 - nature.com
Institution: Princeton University, Boston University, Harvard University
Research Area: Strategic decision-making, Machine learning, Computational Cognitive Science
Discipline: Artificial Intelligence
This study used deep neural networks to analyze human strategic decision-making, predicting choices more accurately than existing theories and uncovering the context-dependent nature of reasoning and decision-making in complex games.
Methods: Deep neural networks trained on data from procedurally generated matrix games with over 2,400 variations; models were modified for interpretability.
Key Findings: Human choices and reasoning in initial play of two-player matrix games, focusing on strategic decision-making and response to game complexity.
DOI: https://doi.org/10.1038/s41562-025-02230-5
Citations: 16
Sample Size: 90000
-
Authors: H Ju, S Aral
Year: 2025
Published in: arXiv preprint arXiv:2503.18238, 2025 - arxiv.org
Institution: Johns Hopkins Carey Business School, MIT Sloan School of Management
Research Area: Human-AI Collaboration, Teamwork, Organizational Productivity
Discipline: Human-AI Interaction
Collaboration with AI agents increases productivity, reshapes communication patterns, and improves text quality while human teams excel in image quality; AI requires fine-tuning for multimodal workflows.
Methods: Large-scale randomized controlled trials using Pairit platform with human-human and human-AI teams performing collaborative marketing tasks.
Key Findings: Productivity, communication patterns, workflow processes, ad quality (text and image), and ad performance metrics.
DOI: https://doi.org/10.48550/arXiv.2503.18238
Citations: 14
Sample Size: 2310
-
Authors: M Chung
Year: 2023
Published in: Internet Research, 2023 - emerald.com
Institution: University of Washington, Emory University
Research Area: Algorithmic Knowledge, Misinformation Countermeasures, Comparative Media Studies, Information Science
Discipline: Information Science
The study examines how algorithmic knowledge influences attitudes and actions against misinformation, revealing that perceptions of media influence on self and others predict corrective actions and support for regulation differently across four countries.
Methods: Four national surveys were conducted in the USA, UK, South Korea, and Mexico, with data analyzed through multigroup structural equation modeling (SEM).
Key Findings: Algorithmic knowledge, perceived influence of misinformation on self and others, intention to correct misinformation, support for regulation and content moderation.
DOI: https://doi.org/10.1108/INTR-07-2022-0578
Citations: 14
Sample Size: 5432
-
Authors: Z Chen, J Kalla, Q Le, S Nakamura-Sakai
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Not specified
Research Area: Artificial Intelligence and Social Science, Persuasion Studies, Political Persuasion, LLM Chatbots, Democratic Societies
Discipline: Artificial Intelligence, Social Science
The study evaluates the cost-effectiveness and persuasive risks of Large Language Model (LLM) chatbots in political contexts, finding that while LLMs are as persuasive as campaign ads once people are exposed to them, their large-scale influence is currently limited by scalability and cost barriers.
Methods: Two survey experiments combined with real-world simulation exercises to measure the persuasiveness of LLM chatbots compared to traditional campaign tactics, focusing on both exposure and acceptance phases of persuasion.
Key Findings: Short- and long-term persuasive effects of LLMs, cost-effectiveness of LLM-based persuasion ($48-$74 per persuaded voter), and scalability compared to traditional campaign approaches.
Citations: 7
Sample Size: 10417
-
Authors: J Beck, S Eckman, C Kern, F Kreuter
Year: 2025
Published in: arXiv preprint arXiv:2509.08514, 2025 - arxiv.org
Institution: National Institutes of Health, National Center for Biotechnology Information
Research Area: Human-Computer Interaction
Discipline: Human-Computer Interaction
Human attitudes toward AI strongly influence performance in collaborative tasks: skeptics show better error detection and accuracy, while favorable attitudes toward automation increase overreliance on AI suggestions.
Methods: Randomized experiment with a controlled annotation task manipulating AI suggestion quality, task burden, and performance-based financial incentives; collected demographic, attitudinal, and behavioral data.
Key Findings: Impact of AI suggestion quality, task burden, and financial incentives on participant performance metrics (accuracy, correction activity, overcorrection, undercorrection); influence of demographic and psychological characteristics on performance.
Citations: 4
Sample Size: 2784
-
Authors: L Luettgau, HR Kirk, K Hackenburg, J Bergs, H Davidson, H Ogden, D Siddarth, S Huang
Year: 2025
Published in: arXiv
Institution: AI Security Institute, I Policy Directorate, Collective Intelligence Project, Anthropic
Research Area: Experimental evaluation, RCT, Survey Research
Discipline: Computer Science, Human-Computer Interaction
Conversational AI is as effective as self-directed internet searches in increasing political knowledge, reducing misinformation beliefs, and promoting accuracy among users in the UK during the 2024 election period.
Methods: A national survey (N=2,499) measured conversational AI usage for political information-seeking, followed by a series of randomised controlled trials (N=2,858) comparing conversational AI to self-directed internet search in improving political knowledge.
Key Findings: Extent of conversational AI usage for political knowledge-seeking in the UK and its efficacy in enhancing political knowledge and reducing misinformation compared to traditional internet searches.
Citations: 3
Sample Size: 5357
-
Authors: T Hu, N Collier
Year: 2025
Published in: arXiv preprint arXiv:2503.03335, 2025 - arxiv.org
Institution: University of Cambridge
Research Area: Affective Computing, Natural Language Processing, Computational Social Science
Discipline: Computational Social Science
The iNews dataset is a multimodal resource for studying personalized affective responses to news, improving modeling accuracy by incorporating annotator persona metadata.
Methods: 292 demographically diverse UK participants annotated 2,899 Facebook news posts with multidimensional labels (e.g., emotions, valence, arousal), combined with comprehensive participant persona data.
Key Findings: Modeled personalized affective responses to news through annotations capturing valence, arousal, emotions, and persona metadata.
Citations: 2
Sample Size: 2899
-
Authors: D Jordan, T Ollerenshaw, A Trexler
Year: 2025
Published in: 2025 - weekendu.uh.edu
Institution: University of Houston, Duke University
Research Area: Experimental Survey Research Methodology
Discipline: Social Science, Research Methodology
Repeated-measures designs offer enhanced precision with minimal bias and suit a wide range of experiments, despite slight attenuation of treatment effects.
Methods: Six classic political science experiments were experimentally manipulated across three sample types, including extensions with proximity manipulation and sample-type variations.
Key Findings: Suitability and precision of repeated-measures designs in survey experiments, including treatment effect estimations and design applicability across different sample types and methodologies.
Citations: 1
Sample Size: 13163
-
Authors: L Hölbling, S Maier, S Feuerriegel
Year: 2025
Published in: Scientific Reports, 2025 - nature.com
Institution: University of Lausanne, University of Zurich, University of St. Gallen
Research Area: LLMs in Persuasion, Meta-Analysis, Artificial Intelligence, Human-Computer Interaction
Discipline: Artificial Intelligence
Large language models (LLMs) demonstrate similar persuasive performance to humans overall, but their effectiveness varies widely based on contextual factors such as model type, conversation design, and domain.
Methods: Systematic review and meta-analysis using Hedges' g to compute standardized effect sizes, with exploratory moderator analyses and publication bias checks (Egger's test, trim-and-fill analysis).
Key Findings: The persuasive effectiveness of LLMs compared to humans across various contexts and studies.
Sample Size: 17422
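For readers unfamiliar with the effect-size metric named in the Methods line, the following is a minimal sketch of Hedges' g for two independent groups, using the common small-sample correction factor 1 - 3/(4(n1 + n2) - 9). The function name and the input values are hypothetical and are not taken from the meta-analysis. How the authors pooled or weighted individual studies is not stated in this entry and is not shown here.

```python
import math


def hedges_g(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Standardized mean difference with small-sample bias correction."""
    # Pooled variance across the two groups.
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    # Cohen's d: raw mean difference in pooled standard deviation units.
    d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    # Approximate correction factor that shrinks d toward zero.
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return d * correction


# Hypothetical inputs: LLM condition vs. human condition on an attitude scale.
g = hedges_g(mean_1=5.2, mean_2=5.0, sd_1=1.1, sd_2=1.0, n_1=150, n_2=150)
print(round(g, 3))
```

The correction matters mainly for small studies. With a few hundred participants per study it changes the estimate only slightly.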
-
Authors: B Grimm, P Yilmam, B Talbot, L Larsen
Year: 2025
Published in: npj Digital Medicine, 2025 - nature.com
Institution: Videra Health
Research Area: Computational Mental Health Assessment, Multimodal Machine Learning
Discipline: Computational Health, Digital Medicine
A multimodal machine learning model using text (MPNet) and voice (HuBERT) analysis predicts depression, anxiety, and trauma from a single video-based question with strong performance and demographic consistency while significantly reducing assessment time.
Methods: Multimodal analysis combining MPNet for textual data and HuBERT for prosodic voice features trained on video-based responses.
Key Findings: Efficient prediction of self-reported scores for depression (PHQ-9), anxiety (GAD-7), and trauma (PCL-5) from brief video responses.
Sample Size: 2420
-
Authors: L Woodley, X Roberts-Gaal, R Calcott, F Cushman
Year: 2025
Published in: files.osf.io
Institution: Harvard University
Research Area: Experimental Psychology, Research Methodology, Replication Studies
Discipline: Psychology, Social Science
Explicit demand cues do not alter participant behavior, judgments, or attitudes in online psychology experiments, despite participants adjusting their beliefs about study hypotheses.
Methods: Three preregistered experiments on Prolific tested the impact of explicit demand cues on participant behavior using a dictator game, a moral dilemma vignette, and a group attitude intervention. Participants were randomly assigned to receive information about the study hypothesis or no information.
Key Findings: Whether explicit demand cues influence behavior, judgments, or attitudes in online psychology studies.
Sample Size: 2254
-
Authors: K Grosse, N Ebert
Year: 2025
Published in: arXiv
Institution: IBM Research, ZHAW
Research Area: Security and privacy risks, Large Language Models, Human-AI Interaction, AI Safety
Discipline: Computer Science
A survey of 3,270 UK adults reveals significant security and privacy risks in the use of AI conversational agents: a third of respondents engage in risky behavior that could enable attacks, and many are unaware of how their data are used or that they can opt out.
Methods: Representative survey conducted via Prolific platform targeting UK adults, focusing on usage behaviors of AI conversational agents.
Key Findings: User behaviors related to security and privacy risks, data sanitization practices, attempts to jailbreak AI models, and awareness of data usage policies.
Sample Size: 3270
-
Authors: M Reis, F Reis, W Kunde
Year: 2024
Published in: Nature Medicine, 2024 - nature.com
Institution: University of Cambridge, Julius Maximilians Universität
Research Area: AI in Healthcare, Medical Ethics, Cognitive Psychology, Human-Computer Interaction (HCI) in Medicine
Discipline: AI in Healthcare, Medical Ethics, Cognitive Psychology
The study found that medical advice labeled as being sourced from AI (or AI supervised by humans) is perceived as less reliable and empathetic compared to advice labeled as originating solely from a human physician, resulting in reduced willingness to follow such advice.
Methods: Two preregistered studies were conducted where participants were presented with identical medical advice scenarios but with manipulated labels for the advice source ('AI', 'human physician', 'human+AI').
Key Findings: Participants' perceptions of reliability, empathy, and willingness to follow medical advice based on the perceived source.
Citations: 78
Sample Size: 2280
-
Authors: D Guilbeault, S Delecourt, T Hull, BS Desikan, M Chu
Year: 2024
Published in: Nature, 2024 - nature.com
Institution: University of California Berkeley, Institute For Public Policy Research, Columbia University, University of Southern California Los Angeles
Research Area: Gender Bias, Computational Social Science, Online Media, AI Bias
Discipline: Computational Social Science
Online images significantly amplify gender bias compared to text, with biases in visual content impacting societal beliefs about gender roles.
Methods: Analyzed 3,495 social categories using over one million images from platforms like Google, Wikipedia, and IMDb, compared visual content to billions of words from the same platforms, and conducted a preregistered national experiment to assess the psychological impact on participants' beliefs.
Key Findings: The prevalence and psychological impact of gender bias in online images compared to text, including gender associations and representation disparities.
DOI: https://doi.org/10.1038/s41586-024-07068-x
Citations: 72
Sample Size: 3495