Crowdsourcing Studies

This page lists 76 peer-reviewed papers tagged with Crowdsourcing in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (20 of 76)

Assessing credibility factors of short-form social media posts: A crowdsourced online experiment

Authors: J Li, M Kuutila, E Huusko, N Kariyakarawana

Year: 2023

Published in: Proceedings of the 15th ..., 2023 - dl.acm.org

Institution: University of Oulu

Research Area: Social Media Credibility, Crowdsourcing, Human-Computer Interaction

Discipline: Human-Computer Interaction

The credibility of short-form health-related social media posts is influenced by factors such as author profession and post engagement metrics, and experts are encouraged to actively participate in correcting information online.

Methods: Crowdsourced online credibility assessment using health-themed social media posts with varied content features deployed across three platforms; quantitative and qualitative data collection.

Key Findings: Credibility factors like author profession, engagement metrics (likes/shares), and personal strategies influencing perceived trustworthiness of social media posts.

DOI: 10.1145/3605390.3605406

Citations: 11
Crowdsourced comparative judgement for evaluating learner texts: How reliable are judges recruited from an online crowdsourcing platform?

Authors: P Thwaites, N Vandeweerd, M Paquot

Year: 2025

Published in: Applied Linguistics, 2025 - academic.oup.com

Institution: UCLouvain, Radboud University Nijmegen, Fonds de la Recherche Scientifique – FNRS

Research Area: Applied Linguistics, Educational Assessment, Crowdsourcing

Discipline: Applied Linguistics

The study demonstrates that crowdsourcing platforms can recruit judges to evaluate learner texts with reliability and validity comparable to assessments conducted by trained linguists.

Methods: Judges recruited via an online crowdsourcing platform conducted comparative judgement assessments of learner texts to measure writing proficiency.

Key Findings: Reliability and concurrent validity of learner text evaluations performed via crowdsourced judges compared to linguist evaluations.

Citations: 10
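
Comparative judgement data of the kind described above are often scaled with a Bradley-Terry model, which turns "text A beat text B" decisions into one strength score per text. The entry does not say which model the authors fitted, so the following is only a minimal sketch under that assumption; the function name, the iteration count, and the example comparisons are all hypothetical.

```python
def bradley_terry(n_items, comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) index pairs.

    Uses the classic minorization-maximization update and returns
    strengths normalized to sum to 1. Illustrative only; this is not
    the analysis pipeline of the listed paper.
    """
    wins = [0] * n_items
    pair_counts = {}  # unordered pair -> number of times it was judged
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strengths = [1.0] * n_items
    for _ in range(iterations):
        updated = []
        for i in range(n_items):
            denom = 0.0
            for (a, b), n in pair_counts.items():
                if i == a or i == b:
                    j = b if i == a else a
                    denom += n / (strengths[i] + strengths[j])
            # Items that were never compared keep their current strength.
            updated.append(wins[i] / denom if denom > 0 else strengths[i])
        total = sum(updated)
        strengths = [s / total for s in updated]
    return strengths


# Hypothetical judgements for three learner texts (indices 0, 1, 2).
example_comparisons = (
    [(0, 1)] * 3 + [(1, 0)]
    + [(1, 2)] * 3 + [(2, 1)]
    + [(0, 2)] * 3 + [(2, 0)]
)
example_strengths = bradley_terry(3, example_comparisons)
```

A higher strength means a text won more of its comparisons against comparably strong opponents. Reliability statistics reported in comparative judgement studies are computed on top of such estimates and are not shown here.
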
The Daily Lives of Crowdsourced US Respondents: A Time Use Comparison of MTurk, Prolific, and ATUS

Authors: RG Rinderknecht, L Doan

Year: 2025

Published in: Sociological ..., 2025 - journals.sagepub.com

Institution: RAND

Research Area: Crowdsourcing, Time Use Studies, Social Science

Discipline: Artificial Intelligence

Time use patterns of MTurk and Prolific respondents differ significantly from the general U.S. population (ATUS), including less housework and care work, more time at home and alone, even after accounting for demographic differences.

Methods: Time diaries were collected and analyzed for 136 MTurk and 156 Prolific respondents, then compared with 468 ATUS responses.

Key Findings: Daily time use patterns including work, housework, travel, leisure, and time spent alone or at home.

Citations: 6

Sample Size: 760
"Optimal" Feedback Use in Crowdsourcing Contests: Source Effect and Priming Intervention

Authors: TK Koh

Year: 2025

Published in: Organization Science, 2025 - pubsonline.informs.org

Institution: University of North Carolina Chapel Hill

Research Area: Crowdsourcing Contests, Feedback Use, Priming Intervention, Organizational Science

Discipline: Behavioral Science

The paper examines how solvers in crowdsourcing contests prioritize feedback from seekers over peers, even when equally constructive, and proposes an intervention to improve feedback usage for better outcomes.

Methods: The study involved a field survey and three online experiments to test the theorized source effect and the proposed feedback evaluation intervention.

Key Findings: Solvers' feedback usage patterns, the source effect of feedback (seeker vs. peer), and the influence of feedback constructiveness on idea quality and solvers’ winning prospects.

Citations: 5
Reflection-Philosophy Order Effects and Correlations Across Samples

Authors: N Byrd

Year: 2025

Published in: Analysis, 2025. DOI: 10.1093/analys/anaf015. Preprint: https://osf.io/preprints/psyarxiv/y8sdm

Institution: Stevens Institute of Technology

Research Area: Behavioral Research Methods, Experimental Psychology, Crowdsourcing Platforms

Discipline: Psychology

Reflective reasoning correlates with certain philosophical decisions, and the study suggests bidirectional causal paths between reflection and philosophy, with test order effects influencing reflection test outcomes but not philosophical decisions.

Methods: Participants from four sources (Amazon Mechanical Turk, CloudResearch, Prolific, and a university) were tested on reflective reasoning and their decisions on 10 philosophical thought experiments.

Key Findings: Impact of reflective reasoning on philosophical decisions and the effect of test order on reflection and philosophy outcomes.

Citations: 4
The Viability of Crowdsourcing for RAG Evaluation

Authors: L Gienapp, T Hagen, M Fröbe, M Hagen, B Stein, M Potthast, H Scells

Year: 2025

Published in: ArXiv

Institution: Bauhaus-Universität Weimar, Friedrich-Schiller-Universität Jena, Leipzig University, University of Kassel, ScaDS.AI, hessian.AI

Research Area: Crowdsourcing, RAG Evaluation, Artificial Intelligence, AI Evaluation, RAG

Discipline: Artificial Intelligence

The study investigates the feasibility of using crowdsourcing for RAG evaluation, finding that human pairwise judgments are reliable and cost-effective compared to LLM-based or automated methods.

Methods: Two complementary studies on response writing and response utility judgment using 903 human-written and 903 LLM-generated responses for 301 topics; pairwise judgments across seven utility dimensions were collected via human and LLM evaluators.

Key Findings: Human effectiveness in writing and judging responses in RAG scenarios, considering discourse styles and utility dimensions like coverage and coherence.

Citations: 4

Sample Size: 903
Unlocking creativity with Artificial Intelligence: Field and experimental evidence on the Goldilocks (curvilinear) effect of human-AI collaboration.

Authors: HCB Huang

Year: 2025

Published in: Journal of Experimental Psychology: General, 2025 - psycnet.apa.org

Institution: University of British Columbia

Research Area: Human-AI Collaboration, Creativity, Experimental Psychology

Discipline: Experimental Psychology

Moderate levels of human-AI collaboration enhance creative performance due to increased knowledge diversity, but excessive or minimal involvement diminishes this effect.

Methods: Two experiments assigned 139 business professionals and 319 working adults to collaborate with ChatGPT at varying levels, and a follow-up survey among 188 creative industry workers was conducted to replicate findings.

Key Findings: The impact of varying degrees of human-AI collaboration on creative performance, evaluated by human judges, entrepreneurs, and AI metrics.

Citations: 3

Sample Size: 646
Credtwi: Investigating Social Media Credibility with a Browser Plugin

Authors: J Li, E Huusko, NN Ahooie, M Kuutila

Year: 2025

Published in: ... Journal of Human ..., 2025 - Taylor & Francis

Institution: University of Oulu

Research Area: Social Media Credibility, Human-Computer Interaction (HCI) in Social Media, Crowdsourcing

Discipline: Human-Computer Interaction

Credtwi, a browser plugin for assessing tweet credibility, revealed that perceived Twitter credibility declines with use and author verification status heavily influences perceived credibility.

Methods: A browser plugin was used for crowdsourced credibility assessment through participant questionnaires during a week-long field study.

Key Findings: Perceptions of online tweet credibility, factors affecting tweet credibility (e.g., verification status, bio), variations in credibility assessments across genders.

DOI: 10.1080/10447318.2025.2480885

Citations: 2

Sample Size: 150
Data quality in crowdsourcing and spamming behavior detection

Authors: Y Ba, MV Mancenido, EK Chiou, R Pan

Year: 2025

Published in: Behavior Research Methods, 2025 - Springer

Institution: University of Delaware, National Taiwan University, University of British Columbia, Monash University

Research Area: Crowdsourcing, Data Quality, Spamming Behavior Detection, LLM Applications in Behavioral Research

Discipline: Computer Science, Artificial Intelligence, Large Language Models

The paper introduces a systematic method to evaluate crowdsourced data quality and detect spam behaviors through variance decomposition, proposing a spammer index and credibility metrics to improve consistency and reliability in labeling tasks.

Methods: Variance decomposition, Markov chain models, and generalized random effects models were used to assess annotator consistency and credibility; metrics were applied to both simulated and real-world data from two crowdsourcing platforms.

Key Findings: Quality of crowdsourced data, spammer behaviors, annotators’ consistency, and credibility.

Citations: 2
Caution when Crowdsourcing: Prolific as a Superior Platform Compared with MTurk

Authors: D O'Connell, A Bautista

Year: 2025

Published in: ... Student Journal of ..., 2025 - journals.library.columbia.edu

Institution: University of Houston, Webster University

Research Area: Crowdsourcing Research Methodology, Human-Computer Interaction

Discipline: Computational Social Science, Behavioral Research Methods

Prolific outperforms MTurk in participant data quality and affordability for online survey-based research.

Methods: Data from participants recruited via MTurk and Prolific were analyzed for cost, attention measures, participation duration, and internal consistency.

Key Findings: Comparison of data quality and cost-effectiveness between MTurk and Prolific for online survey recruitment.

Citations: 1

Sample Size: 699
Crowdsourcing: a modern tool for robust research sampling

Authors: JS Michel, G Sawhney, GP Watson

Year: 2025

Published in: How to Conduct and ..., 2025 - elgaronline.com

Institution: Auburn University

Research Area: Crowdsourcing, Research Methodology, Social Science

Discipline: Social Science

Crowdsourcing is a versatile tool that leverages collective intelligence for efficient task completion, with applications across fields including decentralized finance, blockchain technologies, and industrial-organizational (I-O) psychology research and practice.

Methods: The paper discusses the theoretical and practical applications of crowdsourcing in various domains, referencing prior work and examples such as Wikipedia, crowdfunding platforms, and blockchain networks.

Key Findings: The applications and impact of crowdsourcing in different fields, particularly its role in Industrial-Organizational Psychology for data collection and analysis.

Citations: 1
Evaluating LLM-contaminated Crowdsourcing Data Without Ground Truth

Authors: Y Zhang, J Pang, Z Zhu, Y Liu

Year: 2025

Published in: arXiv preprint arXiv:2506.06991, 2025 - arxiv.org

Institution: Rutgers University, University of California Santa Cruz

Research Area: Artificial Intelligence, Computational Social Science

Discipline: Computational Social Science

The paper proposes a training-free scoring mechanism using peer prediction to detect and mitigate LLM-assisted cheating in crowdsourced annotation tasks, with theoretical guarantees and empirical validation.

Methods: A peer prediction-based mechanism quantifies correlations between worker answers while conditioning on LLM-generated labels, without requiring ground truth or high-dimensional training data.

Key Findings: Detection of LLM-assisted low-effort cheating in crowdsourced annotation tasks, focusing on theoretical effectiveness and empirical robustness.

DOI: 10.48550/arXiv.2506.06991

Citations: 1
Evaluating mobile-based data collection for crowdsourcing behavioral research

Authors: DT Esch, N Mylonopoulos, V Theoharakis

Year: 2025

Published in: Behavior Research Methods, 2025 - Springer

Institution: University of Cologne, University of Piraeus, Aristotle University of Thessaloniki

Research Area: Crowdsourcing Behavioral Research, Mobile Data Collection

Discipline: Behavioral Research Methods

Mobile-based responses via platforms like Pollfish are comparable in quality to computer-based ones from MTurk and Prolific, though attentiveness varies significantly across platforms and is influenced by factors like incentives, distractions, and System 1 thinking.

Methods: Two studies distributed the same survey across MTurk, Prolific, Pollfish, and Qualtrics panels to compare data quality and analyze attentiveness scores.

Key Findings: Attentiveness, device usage (mobile vs. computer), and factors influencing data quality such as incentives, respondent activity, distractions, and survey familiarity.

Citations: 1
Locating Risk: Task Designers and the Challenge of Risk Disclosure in RAI Content Work

Authors: A Qian, R Shaw, L Dabbish, J Suh, H Shen

Year: 2025

Published in: arXiv preprint arXiv ..., 2025 - arxiv.org

Institution: Carnegie Mellon University, University of Pittsburgh, University of Utah, Yale School of Medicine, Yale University

Research Area: Responsible AI, Content Moderation, Risk Disclosure, Worker Well-being in Human-Computer Interaction (HCI)

Discipline: Computational Social Science, Human-Computer Interaction

The paper examines how task designers approach well-being risk disclosure in Responsible AI (RAI) content work, highlighting a need for better frameworks to communicate such risks effectively.

Methods: Interviews were conducted with 23 task designers from academic and industry sectors to gather insights on risk recognition, interpretation, and communication practices.

Key Findings: How task designers recognize, interpret, and communicate well-being risks in RAI content work.

Citations: 1

Sample Size: 23
Real-World Summarization: When Evaluation Reaches Its Limits

Authors: P Schmidtová, O Dušek, S Mahamood

Year: 2025

Published in: ArXiv

Institution: Charles University, Trivago

Research Area: Summarization evaluation, Natural Language Processing, LLM-as-a-Judge, AI Evaluation

Discipline: Natural Language Processing

Simpler metrics like word overlap surprisingly correlate well with human judgments in summarization evaluation, outperforming complex methods in out-of-domain applications, though LLMs remain unreliable for assessment due to annotation biases.

Methods: Human evaluation campaigns with categorical error assessment, span-level annotations, and comparison of traditional metrics, trainable models, and LLM-as-a-judge approaches.

Key Findings: Effectiveness of summarization evaluation methods and their correlation with human judgment, along with business impacts of incorrect information in generated summaries.

Citations: 1
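
To make the "word overlap" idea in this entry concrete, here is a minimal sketch of one simple overlap score: unigram F1 between a generated summary and a reference. The entry does not say which overlap metric the authors used, so the function name, the whitespace tokenization, and the example sentences are assumptions for illustration only.

```python
from collections import Counter


def unigram_overlap_f1(candidate, reference):
    """F1 over shared lowercase whitespace tokens (with multiplicity).

    A hypothetical stand-in for a word-overlap metric; real evaluations
    typically use a proper tokenizer and may also count longer n-grams.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical summary pair, not taken from the paper's data.
example_score = unigram_overlap_f1(
    "the hotel has a pool",
    "the hotel has a large pool",
)
```

A score of 1.0 means the two texts use exactly the same words with the same counts, and 0.0 means they share none.
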
"Who" designs better? A competition among human, artificial intelligence and human-AI collaboration

Authors: KHT Vo

Year: 2025

Published in: Design Science, 2025 - cambridge.org

Institution: Indiana University

Research Area: Human-AI Collaboration in Design

Discipline: Human-Computer Interaction

This research examines whether a machine, specifically Artificial Intelligence, can be creative by comparing design solutions for a practical competition – a light fixture for a pediatric waiting room – among AI, collaboration efforts and a human designer.

Citations: 1
Why you shouldn’t trust data collected on MTurk

Authors: CS Kay

Year: 2025

Published in: Behavior Research Methods, 2025 - Springer

Institution: Stanford University

Research Area: Behavioral Research Methods

Discipline: Behavioral Science, Behavioral Research Methods

Data collected on Amazon's Mechanical Turk (MTurk) shows substantial quality issues, with semantic antonym pairs being positively correlated instead of negatively, even after implementing data screening and using high-reputation participants.

Methods: 27 semantic antonym pairs were administered to participants from Connect (N=100), Prolific (N=100), and MTurk (N=400, N=600) to examine response quality and correlation patterns.

Key Findings: The correlation of responses to semantic antonym pairs as an indicator of data quality across different survey platforms.

Citations: 1

Sample Size: 1200
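
The antonym-pair check described in this entry can be illustrated with a short sketch: for each pair of opposite-meaning items, correlate the ratings across participants and flag pairs whose correlation is positive, since attentive respondents should produce a negative one. The item names, ratings, and function names below are hypothetical, and the code is not the paper's own analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant responses")
    return cov / sqrt(var_x * var_y)


def flag_positive_antonym_pairs(responses, pairs):
    """Return {pair: r} for antonym pairs with a positive correlation.

    responses maps an item name to one rating per participant, in the
    same participant order for every item.
    """
    flagged = {}
    for first, second in pairs:
        r = pearson(responses[first], responses[second])
        if r > 0:
            flagged[(first, second)] = r
    return flagged


# Hypothetical 1-5 ratings from five participants per sample.
attentive_sample = {"talkative": [5, 4, 2, 1, 4], "quiet": [1, 2, 4, 5, 2]}
careless_sample = {"talkative": [5, 5, 4, 1, 2], "quiet": [5, 4, 4, 1, 2]}
antonym_pairs = [("talkative", "quiet")]

attentive_flags = flag_positive_antonym_pairs(attentive_sample, antonym_pairs)
careless_flags = flag_positive_antonym_pairs(careless_sample, antonym_pairs)
```

In this toy data the attentive sample yields no flagged pairs, while the careless sample flags the pair because participants gave similar ratings to both opposites.
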
Fairness Perceptions in Regression-based Predictive Models

Authors: Mukund Telukunta, Venkata Sriram Siddhardh Nadendla, Morgan Stuart, Casey Canfield

Year: 2025

Published in: ArXiv

Institution: Missouri University of Science and Technology, United Network for Organ Sharing

Research Area: Algorithmic Fairness, Healthcare AI, Decision Making

Discipline: Artificial Intelligence

The study investigates fairness in regression-based predictive models for kidney transplantation, introducing three group fairness notions and identifying social preferences for fairness criteria, revealing biases against age groups but fairness towards gender and race groups.

Methods: Three novel fairness notions (independence, separation, sufficiency) were introduced alongside crowd feedback analysis through a Mixed-Logit discrete choice model.

Key Findings: Fairness in regression-based predictive analytics regarding group fairness criteria across social dimensions such as age, gender, and race.

Sample Size: 85
Leveraging Social Media and Crowdsourcing to Recruit and Retain Military Veterans With Posttraumatic Stress Disorder or Experience of Harmful Gambling ...

Authors: C Heath, JM Williams, D Leightley

Year: 2025

Published in: JMIR mHealth and ..., 2025 - mhealth.jmir.org

Institution: Swansea University, King's College London, Reykjavík University

Research Area: mHealth Interventions, Crowdsourcing, Social Media Recruitment, Mental Health Research (PTSD, Harmful Gambling)

Discipline: Digital Health, Mental Health Research

Social media and online platforms such as Facebook and Prolific were effective recruitment channels, but recruiting and retaining military veterans with PTSD or harmful gambling for a digital mHealth intervention pilot study remained challenging.

Methods: Multiple recruitment strategies were used, including paid and unpaid advertisements on Facebook, Prolific, direct mailing, event hosting with veterans' charities, snowball sampling, and incentives.

Key Findings: The effectiveness of different recruitment strategies for enrolling military veterans with PTSD or harmful gambling into a digital intervention study.

Sample Size: 79
Making the Switch: Towards Intelligent Integration of Gestures As an Input Modality for Microtask Crowdsourcing

Authors: G Allen, U Gadiraju

Year: 2025

Published in: Proceedings of the 4th Annual Symposium on ..., 2025 - dl.acm.org

Institution: TU Delft

Research Area: Gesture Recognition, Crowdsourcing, Input Modalities in HCI

Discipline: Human-Computer Interaction

Switching input modalities in microtask crowdsourcing does not affect worker accuracy or perceived cognitive load but influences task completion time; ergonomically informed gestures can integrate effectively without impacting worker experiences.

Methods: A between-subjects study was conducted across 16 experimental conditions with varying input modality sequences to assess impacts on task outcomes and worker experiences.

Key Findings: Effect of switching input modalities on task completion time, accuracy, and perceived cognitive load among crowd workers.

DOI: 10.1145/3729176.3729184

Sample Size: 717