First-person fairness in chatbots
Authors: T Eloundou, A Beutel, DG Robinson
Published: 2024
Publication: arXiv preprint arXiv ..., 2024 - arxiv.org
The paper introduces a counterfactual method for evaluating 'first-person fairness' — fairness toward a chatbot's own users — and finds that post-training with reinforcement learning substantially mitigates name-based demographic biases across millions of real chatbot interactions.
Methods: The study uses a Language Model as a Research Assistant (LMRA) to assess, quantitatively and qualitatively, demographic biases across millions of chatbot interactions, spanning 66 tasks in 9 domains and user names associated with two genders and four races. The LMRA's bias ratings are corroborated by independent human annotations.
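The counterfactual setup described above can be illustrated with a minimal sketch: hold the prompt fixed, swap only the user's name (a demographic proxy), and have an LMRA-style judge flag substantive response differences. Everything below is a hypothetical stand-in — `chatbot_response`, `lmra_flags_difference`, and the example names are illustrative stubs, not the paper's actual models or data.

```python
def chatbot_response(prompt: str, user_name: str) -> str:
    """Stand-in chatbot. A real evaluation would query a deployed model;
    a biased model might vary its answer with the name's demographic signal."""
    return f"Hello {user_name}, here is an outline for: {prompt}"

def mask_name(response: str, name: str) -> str:
    """Remove the literal name so only substantive differences remain."""
    return response.replace(name, "<NAME>")

def lmra_flags_difference(resp_a: str, name_a: str,
                          resp_b: str, name_b: str) -> bool:
    """Stand-in LMRA judge: flags whether two responses differ in substance
    once the user's name is masked. The paper uses a language model here."""
    return mask_name(resp_a, name_a) != mask_name(resp_b, name_b)

def counterfactual_bias_rate(prompts, name_pair) -> float:
    """Fraction of prompts whose responses change when only the
    user's name is swapped between two counterfactual identities."""
    name_a, name_b = name_pair
    flagged = 0
    for prompt in prompts:
        resp_a = chatbot_response(prompt, name_a)
        resp_b = chatbot_response(prompt, name_b)
        if lmra_flags_difference(resp_a, name_a, resp_b, name_b):
            flagged += 1
    return flagged / len(prompts)

prompts = ["Help me write a resume", "Suggest a career path"]
rate = counterfactual_bias_rate(prompts, ("Ashley", "Darnell"))
print(rate)  # 0.0 for this unbiased stub chatbot
```

At scale, the same loop runs over millions of interactions, with human annotations spot-checking the judge's flags, as the record describes.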
Key Findings: Chatbot responses exhibit demographic biases, including harmful stereotypes and response differences by gender and race, across diverse tasks and domains.
Institution: OpenAI, Google DeepMind, Google, University of Oxford
Research Area: Fairness in LLM, AI Bias, AI Ethics
Discipline: Artificial Intelligence, Social Science
Sample Size: 6,000,000 chatbot interactions
Citations: 33
DOI: https://doi.org/10.48550/arXiv.2410.19803