Human-in-the-loop fairness: Integrating stakeholder feedback to incorporate fairness perspectives in responsible AI

7 citations

Abstract

Fairness is a growing concern for high-risk decision-making using Artificial Intelligence (AI) but ensuring it through purely technical means is challenging: there is no universally accepted fairness measure, fairness is context-dependent, and there might be conflicting perspectives on what is considered fair. Thus, involving stakeholders, often without a background in AI or fairness, is a promising avenue. Research to directly involve stakeholders is in its infancy, and many questions remain on how to support stakeholders to feedback on fairness, and how this feedback can be integrated into AI models. Our work follows an approach where stakeholders can give feedback on specific decision instances and their outcomes with respect to their fairness, and then to retrain an AI model. In order to investigate this approach, we conducted two studies of a complex AI model for credit rating used in loan applications. In study 1, we collected feedback from 58 lay users on loan application decisions, and conducted offline experiments to investigate the effects on accuracy and fairness metrics. In study 2, we deepened this investigation by showing 66 participants the results of their feedback with respect to fairness, and then conducted further offline analyses. Our work contributes two datasets and associated code frameworks to bootstrap further research, highlights the opportunities and challenges of employing lay user feedback for improving AI fairness, and discusses practical implications for developing AI applications that more closely reflect stakeholder views about fairness.

7
Citations
Research
Paper Only

Peer Review & Critical Discussion

3 threads

Potential Selection Bias in 2023 Cohort

DSJDr. Sarah J.
Verified PhD Candidate
12 replies

The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.

2 hours ago

Non-naive Participants Issue

MCM. Chen (OpenAI)
Data Scientist
8 replies

I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.

5 hours ago

RLHF Applicability to This Study Design

PRWProf. R. Williams
Verified Researcher
15 replies

The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.

1 day ago

Verify your expertise to join discussion

Create an account and verify your credentials to participate in peer discussions.