Browse 39 peer-reviewed papers in Medium Sample. Discover studies powered by high-quality human data from Prolific.
This page lists 39 peer-reviewed papers tagged with Medium Sample in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: F Salvi, M Horta Ribeiro, R Gallotti, R West
Year: 2025
Published in: Nature Human Behaviour, 2025 - nature.com
Institution: EPFL, Fondazione Bruno Kessle, Princeton University
Research Area: Conversational Persuasion of LLM, Human-Computer Interaction, Behavioral Science, Large Language Models
Discipline: Behavioral Science
GPT-4 can use personalized arguments to be more persuasive in debates, outperforming humans in 64.4% of AI-human comparisons when personalization is applied.
Methods: Preregistered controlled study involving multiround debates with random assignment to conditions focusing on AI-human comparisons, personalization, and opinion strength.
Key Findings: Effectiveness of persuasion by GPT-4, especially when using personalized arguments, compared to humans in debates.
Citations: 65
Sample Size: 900
-
Authors: T Zhang, A Koutsoumpis, JK Oostrom
Year: 2025
Published in: IEEE Transactions ..., 2024 - ieeexplore.ieee.org
Institution: Southeast University, Vrije Universiteit, Tilburg University
Research Area: LLM Personality Assessment, Human-AI Interaction, Large Language Models
Discipline: Human-AI Interaction, Social Science, Humanities
LLMs like GPT-3.5 and GPT-4 can rival or outperform task-specific AI models in assessing personality traits from asynchronous video interviews, but show uneven performance, low reliability, and potential biases, warranting cautious use in high-stakes scenarios.
Methods: The study evaluated GPT-3.5 and GPT-4 performance in assessing personality traits and interview performance using simulated AVI responses, comparing them with ratings from task-specific AI and human annotators.
Key Findings: Validity, reliability, fairness, and rating patterns of LLMs (GPT-3.5 and GPT-4) in personality assessment from asynchronous video interviews.
Citations: 31
Sample Size: 685
-
Authors: Y Ding, J You, TK Machulla, J Jacobs, P Sen
Year: 2025
Published in: Proceedings of the ..., 2022 - dl.acm.org
Institution: University of California Irvine, University of Florida, State University of New York at Buffalo, University of Waterloo, Virginia Tech
Research Area: Computational Social Science, Human-Computer Interaction, Sentiment Analysis
Discipline: Computational Social Science
Demographic differences among annotators significantly affect sentiment dataset labels, causing up to a 4.5% accuracy difference in sentiment prediction models.
Methods: Crowdsourced annotations from >1000 workers combined with demographic data; analysis of multimodal sentiment datasets and evaluation using machine learning models.
Key Findings: Impact of annotator demographics on sentiment labeling and its effect on model predictions.
DOI: https://doi.org/10.1145/3555632
Citations: 28
Sample Size: 1000
-
Authors: L Ibrahim, C Akbulut, R Elasmar, C Rastogi, M Kahng, MR Morris, KR McKee, V Rieser, M Shanahan, L Weidinger
Year: 2025
Published in: arXiv preprint arXiv:2502.07077, 2025•arxiv.org
Institution: Google DeepMind, Google, University of Oxford
Research Area: Multimodal conversational AI, conversational AI, Evaluation methodology, benchmarking
Discipline: Computer Science, Natural Language Processing, Human-Computer Interaction
The paper evaluates anthropomorphic behaviors in SOTA LLMs through a multi-turn methodology, showing that such behaviors, including empathy and relationship-building, predominantly emerge after multiple interactions and influence user perceptions.
Methods: Multi-turn evaluation of 14 anthropomorphic behaviors using simulations of user interactions, validated by a large-scale human subject study.
Key Findings: Anthropomorphic behaviors in large language models, including relationship-building and pronoun usage, and their perception by users.
Citations: 26
Sample Size: 1101
-
Authors: U Messer
Year: 2025
Published in: Computers in Human Behavior: Artificial Humans, 2025 - Elsevier
Institution: Universität der Bundeswehr München
Research Area: Political Bias in Generative AI, Human-AI Interaction, Affective Computing, AI Bias
Discipline: Computer Science, Human-AI Interaction
People's acceptance and reliance on Generative AI (GAI) increase when they perceive alignment between their political orientation and the bias of GAI-generated content, leading to expanded trust in sensitive applications.
Methods: Three experiments analyzing behavioral reactions to politically biased content generated by GAI, including the impact of perceived alignment on acceptance and trust.
Key Findings: Participants' acceptance, reliance, and trust in GAI based on perceived alignment between political bias of GAI-generated content and their own political beliefs.
DOI: https://doi.org/10.1016/j.chbah.2024.100108
Citations: 24
Sample Size: 513
-
Authors: A Okoso, K Otaki, S Koide, Y Baba
Year: 2025
Published in: ACM Transactions on Recommender Systems, 2025•dl.acm.org
Institution: Toyota Central R and D Labs, Toyota
Research Area: Human-Computer Interaction
Discipline: Machine Learning, Artificial Intelligence
The study demonstrates that tailoring the tone of textual explanations in recommender systems to domains and user attributes, such as age and personality traits, can enhance users' perceptions and engagement.
Methods: Two online user studies: (1) 470 participants evaluated synthetic explanations with six tones across three domains (movies, hotels, and home products), (2) 103 participants engaged with a real-world dataset from the hotel domain using a personalized recommender system.
Key Findings: The perceived effects of different textual explanation tones on users, examined across domains (movies, hotels, home products) and user attributes (e.g., age, personality traits).
DOI: https://dl.acm.org/doi/10.1145/3718101
Citations: 13
Sample Size: 573
-
Authors: AG Møller, DM Romero, D Jurgens
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: University of Copenhagen, University of Michigan, Pioneer Centre for AI
Research Area: Generative AI, Social Media, Human-Computer Interaction
Discipline: Computational Social Science
Generative AI tools on social media increase user engagement and content volume but reduce perceived quality and authenticity in discussions, highlighting challenges for ethical integration.
Methods: Controlled experiment with participants assigned to small discussion groups under distinct AI-assisted treatment conditions including chat assistance, conversation starters, feedback on comment drafts, and reply suggestions.
Key Findings: Impact of generative AI tools on user behavior, engagement, content volume, perceived quality, and authenticity in social media interactions.
DOI: https://doi.org/10.48550/arXiv.2506.14295
Citations: 9
Sample Size: 680
-
Authors: RG Rinderknecht, L Doan
Year: 2025
Published in: Sociological ..., 2025 - journals.sagepub.com
Institution: RAND
Research Area: Crowdsourcing, Time Use Studies, Social Science
Discipline: Artificial Intelligence
Time use patterns of MTurk and Prolific respondents differ significantly from the general U.S. population (ATUS), including less housework and care work, more time at home and alone, even after accounting for demographic differences.
Methods: Time diaries were collected and analyzed for 136 MTurk and 156 Prolific respondents, then compared with 468 ATUS responses.
Key Findings: Daily time use patterns including work, housework, travel, leisure, and time spent alone or at home.
Citations: 6
Sample Size: 760
-
Authors: K Miazek, K Bocian
Year: 2025
Published in: Computers in Human Behavior Reports, 2025 - Elsevier
Institution: SWPS University
Research Area: Moral and Fairness Judgments of AI, Human Behavior, Egocentrism
Discipline: Social Science, Artificial Intelligence
The study found that egocentric biases influence fairness judgments, favoring decisions beneficial to self-interest, and that this bias is weaker for AI compared to human agents due to reduced perceived mind and liking for AI.
Methods: Three experiments with manipulated self-interest conditions analyzed perceptions of fairness and morality in decisions made by AI versus human agents using Prolific US samples.
Key Findings: Fairness and moral judgments in financial decision-making by AI and human agents, moderated by self-interest and social perceptions.
DOI: https://doi.org/10.1016/j.chbr.2025.100719
Citations: 6
Sample Size: 1880
-
Authors: M Cheng, C Lee, P Khadpe, S Yu, D Han
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Stanford University, Carnegie Mellon University
Research Area: Computer Science, Artificial Intelligence, Sycophancy.
Discipline: Computer Science, Psychology
The study shows that sycophantic AI, which validates user inputs unquestioningly, reduces people's prosocial behavior and fosters dependence, despite users perceiving such AI as higher quality and more trustworthy.
Methods: The researchers conducted two preregistered experiments including a live-interaction study, where participants discussed real interpersonal conflicts with AI models. They evaluated responses from 11 state-of-the-art AI models on levels of sycophancy and its psychological effects on users.
Key Findings: The prevalence of sycophantic behavior in AI, users' prosocial intentions, conviction of being in the right, trust in AI, and willingness to reuse sycophantic AI models.
Citations: 5
Sample Size: 1604
-
Authors: C Chen, Z Cui
Year: 2025
Published in: Journal of Medical Internet Research, 2025 - jmir.org
Institution: Medical College of Wisconsin
Research Area: Trust in AI, AI-assisted diagnosis, Health communication, Healthcare human-AI interaction
Discipline: Digital Health, Human-Computer Interaction, Behavioral Science
Patients trust and are more likely to seek help from doctors explicitly avoiding AI-assisted diagnosis rather than those using extensive or moderate AI, highlighting a strong aversion to AI in healthcare settings.
Methods: A randomized, web-based 4-group survey experiment was conducted with controls for sociodemographic factors and analysis using regression, mediation, and moderation techniques.
Key Findings: Trust in and intention to seek medical help from health care professionals using AI-assisted diagnosis versus those avoiding AI, and the influence of demographic, social, and experiential factors.
DOI: https://doi.org/10.2196/66083
Citations: 4
Sample Size: 1762
-
Authors: L Gienapp, T Hagen, M Fröbe, M Hagen, B Stein, M Potthast, H Scells
Year: 2025
Published in: ArXiv
Institution: Bauhaus-Universitat Weimar, Friedrich-Schiller-Universitat Jena, Leipzig University, University of Kassel, ScaDS.AI, hessian.AI
Research Area: Crowdsourcing, RAG Evaluation, Artificial Intelligence, AI Evaluation, RAG
Discipline: Artificial Intelligence
The study investigates the feasibility of using crowdsourcing for RAG evaluation, finding that human pairwise judgments are reliable and cost-effective compared to LLM-based or automated methods.
Methods: Two complementary studies on response writing and response utility judgment using 903 human-written and 903 LLM-generated responses for 301 topics; pairwise judgments across seven utility dimensions were collected via human and LLM evaluators.
Key Findings: Human effectiveness in writing and judging responses in RAG scenarios, considering discourse styles and utility dimensions like coverage and coherence.
Citations: 4
Sample Size: 903
-
Authors: HCB Huang
Year: 2025
Published in: Journal of Experimental Psychology: General, 2025 - psycnet.apa.org
Institution: University of British Columbia
Research Area: Human-AI Collaboration, Creativity, Experimental Psychology
Discipline: Experimental Psychology
Moderate levels of human-AI collaboration enhance creative performance due to increased knowledge diversity, but excessive or minimal involvement diminishes this effect.
Methods: Two experiments assigned 139 business professionals and 319 working adults to collaborate with ChatGPT at varying levels, and a follow-up survey among 188 creative industry workers was conducted to replicate findings.
Key Findings: The impact of varying degrees of human-AI collaboration on creative performance, evaluated by human judges, entrepreneurs, and AI metrics.
Citations: 3
Sample Size: 646
-
Authors: G Lima, N Grgić-Hlača, M Langer, Y Zou
Year: 2025
Published in: Proceedings of the 2025 CHI ..., 2025 - dl.acm.org
Institution: University of Maryland, Max Planck Institute, Stanford University, Cornell University
Research Area: Algorithmic Fairness, Systemic Injustice, Social Perception of AI, Algorithmic Discrimination
Discipline: Computational Social Science
The study examines how contextualizing algorithms within systemic injustice impacts perceptions of algorithmic discrimination, finding disparate effects based on participant group identity and revealing unintended consequences of such contextualization.
Methods: 2x3 between-participants experiment using the hiring context as a case-study; examined the influence of systemic injustice information and algorithms' bias perpetuation on lay perceptions.
Key Findings: Impact of systemic injustice framing and explanation of algorithmic bias perpetuation on participants' views of algorithmic fairness and discrimination.
DOI: 10.1145/3706598.3713536
Citations: 2
Sample Size: 716
-
Authors: A Warrier, D Nguyen, M Naim, M Jain, Y Liang, K Schroeder, C Yang, JB Tenenbaum, S Vollmer, K Ellis, Z Tavares
Year: 2025
Published in: 2025 - arXiv preprint arXiv …, 2025 - arxiv.org
Institution: Basis Research Institute, DFKI GmbH, Harvard University, Quebec AI Institute, University of Cambridge, Massachusetts Institute of Technology, Cornell University
Research Area: Agent learning, World Models, Benchmarking, Evaluation protocols, Reinforcement Learning from Human Feedback (RLHF), Large Language Models
Discipline: Computer Science, Artificial Intelligence, Machine Learning
The paper introduces WorldTest, a novel protocol for evaluating model-learning agents using reward-free exploration and behavior-based scoring, and demonstrates that humans outperform models on the AutumnBench suite of tasks, revealing significant gaps in world-model learning.
Methods: The authors proposed WorldTest, a protocol separating reward-free interaction from scored tests in related environments, with evaluations done using AutumnBench—a dataset of 43 grid-world environments and 129 tasks across prediction, planning, and causal dynamics.
Key Findings: Performance of model-learning agents and humans in acquiring world models for masked-frame prediction, planning, and understanding causal dynamics.
Citations: 1
Sample Size: 517
-
Authors: D OConnell, A Bautista
Year: 2025
Published in: ... Student Journal of ..., 2025 - journals.library.columbia.edu
Institution: University of Houston, Webster University
Research Area: Crowdsourcing Research Methodology, Human-Computer Interaction
Discipline: Computational Social Science, Behavioral Research Methods
Prolific outperforms MTurk in participant data quality and affordability for online survey-based research.
Methods: Data from participants recruited via MTurk and Prolific were analyzed for cost, attention measures, participation duration, and internal consistency.
Key Findings: Comparison of data quality and cost-effectiveness between MTurk and Prolific for online survey recruitment.
Citations: 1
Sample Size: 699
-
Authors: B Katz, N Abdelgawad, D Friedberg, P Roberts, S Misra
Year: 2025
Published in: Innovation in Aging, 2025•pmc.ncbi.nlm.nih.gov
Institution: Virginia Tech
Research Area: Human–AI Interaction (HCI), Technology Perception
Discipline: Behavioral Science
Age significantly influences perceptions of generative AI tools, with older individuals perceiving more benefits and fewer risks compared to younger individuals; thinking dispositions also play a role.
Methods: A nationally representative survey of US adults conducted via the Prolific platform using various AI-relevant scales, including attitudes, risks, benefits, frequency of use, expertise, and literacy assessments.
Key Findings: Demographic factors, industry types, thinking dispositions, and attitudes toward generative AI tools, including risk and utility perceptions.
Citations: 1
Sample Size: 500
-
Authors: Ali Merali
Year: 2025
Published in: ArXiv
Institution: Yale University
Research Area: LLM-Assisted Economic Productivity, Consulting, Data Analysis
Discipline: Economics, Artificial Intelligence
The paper identifies scaling laws linking LLM training compute to professional productivity gains, showing an 8% annual reduction in task time influenced by both compute and algorithmic advances, but with uneven impacts across task types.
Methods: A preregistered experiment involving professional tasks completed by consultants, data analysts, and managers using 13 different LLMs.
Key Findings: Economic productivity impacts of LLMs in professional settings, time savings across task categories, and contribution of compute versus algorithmic progress.
Citations: 1
Sample Size: 500
-
Authors: C Qian, AT Parisi, C Bouleau, V Tsai
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Google, Google DeepMind
Research Area: Human-AI Alignment, Collective Reasoning, Social Biases, LLM Simulation of Human Behavior, AI Bias
Discipline: Natural Language Processing, Artificial Intelligence, Computational Social Science
This study examines human-AI alignment in collective reasoning using an empirical framework, demonstrating how LLMs either mirror or mask human biases depending on context, cues, and model-specific inductive biases.
Methods: The study uses the Lost at Sea social psychology task in a large-scale online experiment, simulating LLM groups conditioned on human decision-making data across varying conditions of visible or pseudonymous demographics.
Key Findings: Alignment of LLM behavior with human social reasoning, focusing on collective decision-making and biases in group interactions.
Citations: 1
Sample Size: 748
-
Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025•arxiv.org
Institution: Google DeepMind, Google Research, Google
Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human-AI Interaction, Large Language Models
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: The study involved collecting feedback across 1000 prompts from demographically intersectional human raters to capture diverse safety perspectives, with an emphasis on empirical and contextual differences in harm perception.
Key Findings: Safety perceptions of text-to-image (T2I) model outputs from diverse demographic viewpoints and the influence of these perspectives on alignment strategies.
Citations: 1
Sample Size: 1000