From lab-testing to web-testing in cognitive research: Who you test is more important than how you test
Abstract
The transition to web-testing, although promising, entails many new concerns. Web-testing is harder to monitor, so researchers need to ensure that the quality of the data collected is comparable to the quality of data typically achieved by lab-testing. Our study contributes to this issue by being the first to distinguish between the impact of web-testing and the impact of sourcing individuals from different participant pools, including crowdsourcing platforms. We presented a fairly general working memory task to 196 MTurk participants, 300 Prolific participants, and 255 students from the University of Geneva, allowing for a comparison of data quality across different participant pools. Among university students, 215 were web-tested and 40 were lab-tested, allowing for a comparison of testing modalities within the same participant pool. Data quality was measured by assessing multiple data characteristics (i.e., reaction time, accuracy, anomalous values) and the presence of two behavioral benchmark effects. Our results revealed that who you test (i.e., participant pool) is more important than how you test (i.e., testing modality). Concerning how you test, our results showed that web-testing incurs a small yet acceptable loss of data quality compared to lab-testing. Concerning who you test, Prolific participants were almost indistinguishable from web-tested students, but MTurk participants differed drastically from the other pools. Our results therefore encourage the use of web-testing in the domain of cognitive psychology, even when using complex paradigms. Nevertheless, these results urge caution regarding how researchers select web-based participant pools when conducting online research.
Study specs
- Authors: K Uittenhove, S Jeanneret, E Vergauwe
- Discipline: Cognitive Research, Psychology
- Year: 2023
- Human Data Platform: Prolific
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.