Can intelligent agents improve data quality in online questionnaires? A pilot study
Abstract
We explored the utility of chatbots for improving the quality of data collected through online surveys. Three hundred Australian adults sampled via Prolific Academic were randomized to chatbot-supported or unassisted online questionnaire conditions. The questionnaire comprised validated measures, along with challenge items formulated to be confusing yet aligned with the validated targets. The chatbot condition provided optional assistance with item clarity via a virtual support agent. Chatbot use and user satisfaction were measured through session logs and user feedback. Data quality was operationalized as between-group differences in relationships among validated and challenge measures. Findings broadly supported chatbot utility for online surveys, showing that most participants with chatbot access utilized it, found it helpful, and demonstrated modestly improved data quality (vs. controls). Absence of confusion for one challenge item is believed to have contributed to an underestimated effect. Findings show that assistive chatbots can enhance data quality, will be utilized by many participants if available, and are perceived as beneficial by most users. Scope constraints for this pilot study are believed to have led to underestimated effects. Future testing with longer-form questionnaires incorporating expanded item difficulty may further understanding of chatbot utility for survey completion and data quality.
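The abstract operationalizes data quality as between-group differences in the relationships among validated and challenge measures, but does not name the statistic used. One common way to compare such relationships, assumed here purely for illustration, is to compute a Pearson correlation between the validated and challenge scores in each group and compare the two correlations with a Fisher r-to-z test. The function names and the toy scores below are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two independent correlations via Fisher's r-to-z transform.

    Returns the z statistic and a two-sided p-value. Requires |r| < 1 and
    at least four observations per group.
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Toy scores for illustration only (NOT data from the study).
chatbot_validated = [1, 2, 3, 4, 5, 6]
chatbot_challenge = [1, 2, 3, 4, 6, 5]
control_validated = [1, 2, 3, 4, 5, 6]
control_challenge = [3, 1, 4, 2, 6, 5]

r_chatbot = pearson_r(chatbot_validated, chatbot_challenge)
r_control = pearson_r(control_validated, control_challenge)
z_stat, p_value = fisher_z_difference(
    r_chatbot, len(chatbot_validated), r_control, len(control_validated)
)
print(f"r (chatbot) = {r_chatbot:.3f}, r (control) = {r_control:.3f}")
print(f"z = {z_stat:.3f}, two-sided p = {p_value:.3f}")
```

Under this assumed approach, a stronger validated-to-challenge correlation in the chatbot group than in the control group would suggest that assistance helped participants answer the confusing items as intended.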
Study specs
Randomized participants into chatbot-supported and unassisted survey conditions; assessed chatbot use, user satisfaction, and data quality via validated and deliberately confusing challenge items.
- Authors
- A Söderström, A Shatte
- Institution
- University of Helsinki
- Discipline
- Research Methodology, Behavioral Science
- Sample Size
- N=300
- Study Type
- Experimental Study
- Year
- 2025
- Human Data Platform
- Prolific
Measured Outcomes
Effects of chatbot assistance on data quality, user satisfaction, and usage patterns in online questionnaires.
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.