Generative AI in crowdwork for web and social media research: A survey of workers at three platforms
Abstract
Crowdsourcing plays an important role in Web and social media research, from data annotation to online experiments and user surveys. With the emergence of Generative AI (GenAI), researchers are considering how models and tools such as GPT might replace crowdwork. Many have already evaluated GPT on annotation tasks. However, it is less clear how GenAI might affect other types of tasks, or to what extent crowdworkers have already incorporated it into their work processes. We therefore asked crowdworkers directly about their use of GenAI, via a survey conducted at two points in time across three commercial platforms. We found evidence that workers' self-reported use of GenAI did not change over time but was strongly associated with the platform on which they operate: MTurk workers used GenAI much more often than those working on Clickworker and Prolific. As most respondents reported that survey completion is their "usual type of task", we discuss the implications of GenAI use for user surveys, drawing on specific examples of ICWSM research.
Study specs
- Authors: E. Christoforou, G. Demartini
- Institution: University of Sheffield, University of Southampton
- Discipline: Artificial Intelligence
- Year: 2024
- Human Data Platform: Prolific
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.