From lab-testing to web-testing in cognitive research: Who you test is more important than how you test

83 citations

Abstract

The transition to web-testing, although promising, entails many new concerns. Web-testing is harder to monitor, so researchers need to ensure that the quality of the data collected is comparable to the quality of data typically achieved by lab-testing. Our study makes a novel contribution to this issue by being the first to distinguish between the impact of web-testing and the impact of sourcing individuals from different participant pools, including crowdsourcing platforms. We presented a fairly general working memory task to 196 MTurk participants, 300 Prolific participants, and 255 students from the University of Geneva, allowing for a comparison of data quality across different participant pools. Among university students, 215 were web-tested and 40 were lab-tested, allowing for a comparison of testing modalities within the same participant pool. Data quality was measured by assessing multiple data characteristics (i.e., reaction time, accuracy, anomalous values) and the presence of two behavioral benchmark effects. Our results revealed that who you test (i.e., participant pool) is more important than how you test (i.e., testing modality). Concerning how you test, our results showed that web-testing incurs a small yet acceptable loss of data quality compared to lab-testing. Concerning who you test, Prolific participants were almost indistinguishable from web-tested students, but MTurk participants differed drastically from the other pools. Our results therefore encourage the use of web-testing in the domain of cognitive psychology, even when using complex paradigms. Nevertheless, these results urge caution regarding how researchers select web-based participant pools when conducting online research.

Peer Review & Critical Discussion

3 threads

Potential Selection Bias in 2023 Cohort

Dr. Sarah J.
Verified PhD Candidate
12 replies

The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.

2 hours ago

Non-naive Participants Issue

M. Chen (OpenAI)
Data Scientist
8 replies

I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.

5 hours ago

RLHF Applicability to This Study Design

Prof. R. Williams
Verified Researcher
15 replies

The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.

1 day ago
