This page lists 76 peer-reviewed papers tagged with Crowdsourcing in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: J Li, M Kuutila, E Huusko, N Kariyakarawana
Year: 2023
Published in: Proceedings of the 15th ..., 2023 - dl.acm.org
Institution: University of Oulu
Research Area: Social Media Credibility, Crowdsourcing, Human-Computer Interaction
Discipline: Human-Computer Interaction
The credibility of short-form, health-related social media posts is influenced by factors such as the author's profession and the post's engagement metrics; the authors encourage experts to participate actively in correcting information online.
Methods: Crowdsourced online credibility assessment using health-themed social media posts with varied content features deployed across three platforms; quantitative and qualitative data collection.
Key Findings: Credibility factors like author profession, engagement metrics (likes/shares), and personal strategies influencing perceived trustworthiness of social media posts.
DOI: https://doi.org/10.1145/3605390.3605406
Citations: 11
-
Authors: P Thwaites, N Vandeweerd, M Paquot
Year: 2025
Published in: Applied Linguistics, 2025 - academic.oup.com
Institution: UCLouvain, Radboud University Nijmegen, Fonds de la Recherche Scientifique – FNRS
Research Area: Applied Linguistics, Educational Assessment, Crowdsourcing
Discipline: Applied Linguistics
The study demonstrates that crowdsourcing platforms can recruit judges to evaluate learner texts with reliability and validity comparable to assessments conducted by trained linguists.
Methods: Judges recruited via an online crowdsourcing platform conducted comparative judgement assessments of learner texts to measure writing proficiency.
Key Findings: Reliability and concurrent validity of learner text evaluations performed via crowdsourced judges compared to linguist evaluations.
Citations: 10
-
Authors: RG Rinderknecht, L Doan
Year: 2025
Published in: Sociological ..., 2025 - journals.sagepub.com
Institution: RAND
Research Area: Crowdsourcing, Time Use Studies, Social Science
Discipline: Artificial Intelligence
Time use patterns of MTurk and Prolific respondents differ significantly from those of the general U.S. population as measured by the American Time Use Survey (ATUS): respondents spend less time on housework and care work and more time at home and alone, even after accounting for demographic differences.
Methods: Time diaries were collected and analyzed for 136 MTurk and 156 Prolific respondents, then compared with 468 ATUS responses.
Key Findings: Daily time use patterns including work, housework, travel, leisure, and time spent alone or at home.
Citations: 6
Sample Size: 760
-
Authors: TK Koh
Year: 2025
Published in: Organization Science, 2025 - pubsonline.informs.org
Institution: University of North Carolina Chapel Hill
Research Area: Crowdsourcing Contests, Feedback Use, Priming Intervention, Organizational Science
Discipline: Behavioral Science
The paper examines how solvers in crowdsourcing contests prioritize feedback from seekers over feedback from peers, even when both are equally constructive, and proposes an intervention to improve feedback use and, in turn, contest outcomes.
Methods: The study involved a field survey and three online experiments to test the theorized source effect and the proposed feedback evaluation intervention.
Key Findings: Solvers' feedback usage patterns, the source effect of feedback (seeker vs. peer), and the influence of feedback constructiveness on idea quality and solvers’ winning prospects.
Citations: 5
-
Authors: N Byrd
Year: 2025
Published in: Analysis, 2025 - "Reflection-Philosophy Order Effects and Correlations Across Samples" (DOI: https://doi.org/10.1093/analys/anaf015; preprint: https://osf.io/preprints/psyarxiv/y8sdm)
Institution: Stevens Institute of Technology
Research Area: Behavioral Research Methods, Experimental Psychology, Crowdsourcing Platforms
Discipline: Psychology
Reflective reasoning correlates with certain philosophical decisions, and the study suggests bidirectional causal paths between reflection and philosophy; test order influenced reflection test outcomes but not philosophical decisions.
Methods: Participants from four sources (Amazon Mechanical Turk, CloudResearch, Prolific, and a university) completed reflective reasoning tests and made decisions about 10 philosophical thought experiments.
Key Findings: Impact of reflective reasoning on philosophical decisions and the effect of test order on reflection and philosophy outcomes.
Citations: 4
-
Authors: L Gienapp, T Hagen, M Fröbe, M Hagen, B Stein, M Potthast, H Scells
Year: 2025
Published in: ArXiv
Institution: Bauhaus-Universität Weimar, Friedrich-Schiller-Universität Jena, Leipzig University, University of Kassel, ScaDS.AI, hessian.AI
Research Area: Crowdsourcing, RAG Evaluation, Artificial Intelligence, AI Evaluation, RAG
Discipline: Artificial Intelligence
The study investigates the feasibility of using crowdsourcing to evaluate retrieval-augmented generation (RAG), finding that human pairwise judgments are reliable and cost-effective compared with LLM-based or automated methods.
Methods: Two complementary studies on response writing and response utility judgment using 903 human-written and 903 LLM-generated responses for 301 topics; pairwise judgments across seven utility dimensions were collected via human and LLM evaluators.
Key Findings: Human effectiveness in writing and judging responses in RAG scenarios, considering discourse styles and utility dimensions like coverage and coherence.
Citations: 4
Sample Size: 903
-
Authors: HCB Huang
Year: 2025
Published in: Journal of Experimental Psychology: General, 2025 - psycnet.apa.org
Institution: University of British Columbia
Research Area: Human-AI Collaboration, Creativity, Experimental Psychology
Discipline: Experimental Psychology
Moderate levels of human-AI collaboration enhance creative performance due to increased knowledge diversity, but excessive or minimal involvement diminishes this effect.
Methods: Two experiments assigned 139 business professionals and 319 working adults to collaborate with ChatGPT at varying levels; a follow-up survey of 188 creative-industry workers was conducted to replicate the findings.
Key Findings: The impact of varying degrees of human-AI collaboration on creative performance, evaluated by human judges, entrepreneurs, and AI metrics.
Citations: 3
Sample Size: 646
-
Authors: J Li, E Huusko, NN Ahooie, M Kuutila
Year: 2025
Published in: ... Journal of Human ..., 2025 - Taylor & Francis
Institution: University of Oulu
Research Area: Social Media Credibility, Human-Computer Interaction (HCI) in Social Media, Crowdsourcing
Discipline: Human-Computer Interaction
A field study with Credtwi, a browser plugin for assessing tweet credibility, found that the perceived credibility of Twitter declined with use and that author verification status strongly influenced perceived credibility.
Methods: A browser plugin was used for crowdsourced credibility assessment through participant questionnaires during a week-long field study.
Key Findings: Perceptions of online tweet credibility, factors affecting tweet credibility (e.g., verification status, bio), variations in credibility assessments across genders.
DOI: https://doi.org/10.1080/10447318.2025.2480885
Citations: 2
Sample Size: 150
-
Authors: Y Ba, MV Mancenido, EK Chiou, R Pan
Year: 2025
Published in: Behavior Research Methods, 2025 - Springer
Institution: University of Delaware, National Taiwan University, University of British Columbia, Monash University
Research Area: Crowdsourcing, Data Quality, Spamming Behavior Detection, LLM Applications in Behavioral Research
Discipline: Computer Science, Artificial Intelligence, Large Language Models
The paper introduces a systematic method to evaluate crowdsourced data quality and detect spam behaviors through variance decomposition, proposing a spammer index and credibility metrics to improve consistency and reliability in labeling tasks.
Methods: Variance decomposition, Markov chain models, and generalized random effects models were used to assess annotator consistency and credibility; metrics were applied to both simulated and real-world data from two crowdsourcing platforms.
Key Findings: Quality of crowdsourced data, spammer behaviors, annotators’ consistency, and credibility.
Citations: 2
-
Authors: D O'Connell, A Bautista
Year: 2025
Published in: ... Student Journal of ..., 2025 - journals.library.columbia.edu
Institution: University of Houston, Webster University
Research Area: Crowdsourcing Research Methodology, Human-Computer Interaction
Discipline: Computational Social Science, Behavioral Research Methods
Prolific outperforms MTurk in participant data quality and affordability for online survey-based research.
Methods: Data from participants recruited via MTurk and Prolific were analyzed for cost, attention measures, participation duration, and internal consistency.
Key Findings: Comparison of data quality and cost-effectiveness between MTurk and Prolific for online survey recruitment.
Citations: 1
Sample Size: 699
-
Authors: JS Michel, G Sawhney, GP Watson
Year: 2025
Published in: How to Conduct and ..., 2025 - elgaronline.com
Institution: Auburn University
Research Area: Crowdsourcing, Research Methodology, Social Science
Discipline: Social Science
Crowdsourcing is a versatile tool that leverages collective intelligence for efficient task completion, with applications across fields including decentralized finance, blockchain technologies, and Industrial-Organizational (I-O) psychology research and practice.
Methods: The paper discusses the theoretical and practical applications of crowdsourcing in various domains, referencing prior work and examples such as Wikipedia, crowdfunding platforms, and blockchain networks.
Key Findings: The applications and impact of crowdsourcing in different fields, particularly its role in Industrial-Organizational Psychology for data collection and analysis.
Citations: 1
-
Authors: Y Zhang, J Pang, Z Zhu, Y Liu
Year: 2025
Published in: arXiv preprint arXiv:2506.06991, 2025 - arxiv.org
Institution: Rutgers University, University of California Santa Cruz
Research Area: Artificial Intelligence, Computational Social Science
Discipline: Computational Social Science
The paper proposes a training-free scoring mechanism using peer prediction to detect and mitigate LLM-assisted cheating in crowdsourced annotation tasks, with theoretical guarantees and empirical validation.
Methods: A peer prediction-based mechanism quantifies correlations between worker answers while conditioning on LLM-generated labels, without requiring ground truth or high-dimensional training data.
Key Findings: Detection of LLM-assisted low-effort cheating in crowdsourced annotation tasks, focusing on theoretical effectiveness and empirical robustness.
DOI: https://doi.org/10.48550/arXiv.2506.06991
Citations: 1
-
Authors: DT Esch, N Mylonopoulos, V Theoharakis
Year: 2025
Published in: Behavior Research Methods, 2025 - Springer
Institution: University of Cologne, University of Piraeus, Aristotle University of Thessaloniki
Research Area: Crowdsourcing Behavioral Research, Mobile Data Collection
Discipline: Behavioral Research Methods
Mobile-based responses collected via platforms such as Pollfish are comparable in quality to computer-based responses from MTurk and Prolific, though attentiveness varies significantly across platforms and is influenced by factors such as incentives, distractions, and System 1 thinking.
Methods: Two studies distributed the same survey across MTurk, Prolific, Pollfish, and Qualtrics panels to compare data quality and analyze attentiveness scores.
Key Findings: Attentiveness, device usage (mobile vs. computer), and factors influencing data quality such as incentives, respondent activity, distractions, and survey familiarity.
Citations: 1
-
Authors: A Qian, R Shaw, L Dabbish, J Suh, H Shen
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Carnegie Mellon University, University of Pittsburgh, University of Utah, Yale School of Medicine, Yale University
Research Area: Responsible AI, Content Moderation, Risk Disclosure, Worker Well-being in Human-Computer Interaction (HCI)
Discipline: Computational Social Science, Human-Computer Interaction
The paper examines how task designers approach well-being risk disclosure in Responsible AI (RAI) content work, highlighting a need for better frameworks to communicate such risks effectively.
Methods: Interviews were conducted with 23 task designers from academic and industry sectors to gather insights on risk recognition, interpretation, and communication practices.
Key Findings: How task designers recognize, interpret, and communicate well-being risks in RAI content work.
Citations: 1
Sample Size: 23
-
Authors: P Schmidtová, O Dušek, S Mahamood
Year: 2025
Published in: ArXiv
Institution: Charles University, Trivago
Research Area: Summarization evaluation, Natural Language Processing, LLM-as-a-Judge, AI Evaluation
Discipline: Natural Language Processing
Simpler metrics like word overlap surprisingly correlate well with human judgments in summarization evaluation, outperforming complex methods in out-of-domain applications, though LLMs remain unreliable for assessment due to annotation biases.
Methods: Human evaluation campaigns with categorical error assessment, span-level annotations, and comparison of traditional metrics, trainable models, and LLM-as-a-judge approaches.
Key Findings: Effectiveness of summarization evaluation methods and their correlation with human judgment, along with business impacts of incorrect information in generated summaries.
Citations: 1
-
Authors: KHT Vo
Year: 2025
Published in: Design Science, 2025 - cambridge.org
Institution: Indiana University
Research Area: Human-AI Collaboration in Design
Discipline: Human-Computer Interaction
This research examines whether a machine, specifically artificial intelligence, can be creative by comparing design solutions for a practical competition brief (a light fixture for a pediatric waiting room) produced by AI, by human-AI collaboration, and by a human designer.
Citations: 1
-
Authors: CS Kay
Year: 2025
Published in: Behavior Research Methods, 2025 - Springer
Institution: Stanford University
Research Area: Behavioral Research Methods
Discipline: Behavioral Science, Behavioral Research Methods
Data collected on Amazon's Mechanical Turk (MTurk) show substantial quality issues: responses to semantic antonym pairs correlated positively rather than negatively, even after data screening and restriction to high-reputation participants.
Methods: 27 semantic antonym pairs were administered to participants from Connect (N=100), Prolific (N=100), and MTurk (N=400, N=600) to examine response quality and correlation patterns.
Key Findings: The correlation of responses to semantic antonym pairs as an indicator of data quality across different survey platforms.
Citations: 1
Sample Size: 1200
-
Authors: Mukund Telukunta, Venkata Sriram Siddhardh Nadendla, Morgan Stuart, Casey Canfield
Year: 2025
Published in: ArXiv
Institution: Missouri University of Science and Technology, United Network for Organ Sharing
Research Area: Algorithmic Fairness, Healthcare AI, Decision Making
Discipline: Artificial Intelligence
The study investigates fairness in regression-based predictive models for kidney transplantation, introducing three group fairness notions and eliciting social preferences over fairness criteria; the analysis revealed bias across age groups but fairness across gender and race groups.
Methods: Three group fairness notions (independence, separation, sufficiency) were formulated for regression models, and crowd feedback was analyzed with a mixed-logit discrete choice model.
Key Findings: Fairness in regression-based predictive analytics regarding group fairness criteria across social dimensions such as age, gender, and race.
Sample Size: 85
-
Authors: C Heath, JM Williams, D Leightley
Year: 2025
Published in: JMIR mHealth and ..., 2025 - mhealth.jmir.org
Institution: Swansea University, King's College London, Reykjavík University
Research Area: mHealth Interventions, Crowdsourcing, Social Media Recruitment, Mental Health Research (PTSD, Harmful Gambling)
Discipline: Digital Health, Mental Health Research
Social media and online platforms such as Facebook and Prolific were effective recruitment channels, but recruiting and retaining military veterans with PTSD or harmful gambling for a digital mHealth intervention pilot study remained challenging.
Methods: Multiple recruitment strategies were used, including paid and unpaid Facebook advertisements, Prolific, direct mailing, events hosted with veterans' charities, snowball sampling, and incentives.
Key Findings: The effectiveness of different recruitment strategies for enrolling military veterans with PTSD or harmful gambling into a digital intervention study.
Sample Size: 79
-
Authors: G Allen, U Gadiraju
Year: 2025
Published in: Proceedings of the 4th Annual Symposium on ..., 2025 - dl.acm.org
Institution: TU Delft
Research Area: Gesture Recognition, Crowdsourcing, Input Modalities in HCI
Discipline: Human-Computer Interaction
Switching input modalities in microtask crowdsourcing does not affect worker accuracy or perceived cognitive load but does influence task completion time; ergonomically informed gestures can be integrated effectively without harming worker experience.
Methods: A between-subjects study was conducted across 16 experimental conditions with varying input modality sequences to assess impacts on task outcomes and worker experiences.
Key Findings: Effect of switching input modalities on task completion time, accuracy, and perceived cognitive load among crowd workers.
DOI: https://doi.org/10.1145/3729176.3729184
Sample Size: 717