Taking the person seriously: Ethically aware IS research in the era of reinforcement learning-based personalization
Abstract
Advances in reinforcement learning and implicit data collection on large-scale commercial platforms mark the beginning of a new era of personalization aimed at the adaptive control of human user environments. We present five emergent features of this new paradigm of personalization that endanger persons and societies at scale and analyze their potential to reduce personal autonomy, destabilize social and political systems, and facilitate mass surveillance and social control, among other concerns. We argue that current data protection laws, most notably the European Union's General Data Protection Regulation, are limited in their ability to adequately address many of these issues. Nevertheless, we believe that IS researchers are well-situated to engage with and investigate this new era of personalization. We propose three distinct directions for ethically aware reinforcement learning-based personalization research uniquely suited to the strengths of IS researchers across the sociotechnical spectrum.
Study specs
The study presents a conceptual analysis of emergent features and societal risks associated with reinforcement learning-based personalization and proposes research directions.
- Discipline: Information Systems
- Study Type: Literature Review
- Year: 2025
- Human Data Platform: Prolific
Measured Outcomes
Potential harms of reinforcement learning-based personalization, such as reduced autonomy, social and political destabilization, and mass surveillance, alongside the limitations of current data protection laws.
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.