Careless participants are essential for our phishing study: Understanding the impact of screening methods
Abstract
Online surveys using crowdsourcing services have been widely adopted in academic research projects aimed at understanding human perception and behavior. Because there is a concern that online surveys may include dishonest or careless responses by crowdworkers who perform a large number of tasks, or responses by bots, several screening methods have been proposed to discard such low-quality responses. However, in security research, especially in phishing research where the attention of participants is considered to influence the results, the elimination of careless responses may lead to the removal of participants who should be included in the research. In this study, we address the following research question: “Does the adoption of existing screening methods bias the results of security surveys?” Using Amazon Mechanical Turk and Prolific Academic, two popular crowdsourcing platforms used in online surveys, we conducted online user studies (N = 600) on security knowledge, security behavior, and phishing email detection performance to elucidate the influence of screening methods on the results. The obtained results indicate that the adoption of the instructional manipulation check (IMC) screening method triggers bias in the demographics of the participants, as well as differences in the results of phishing email detection performance. In addition, the degree of these differences depends on the crowdsourcing platform. We also demonstrated that it is non-trivial to determine the correlation between screening methods and factors that can influence the results of a survey on security behavior. These findings suggest that caution should be exercised when applying screening methods such as attention checks and IMC in studies where the extent of user attention could have a significant impact on the results.
Study specs
- Authors: T Matsuura, AA Hasegawa, M Akiyama
- Institution: Waseda University
- Discipline: Human-Computer Interaction
- Year: 2021
- Human Data Platform: Prolific
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.