I know this looks bad, but I can explain: Understanding when AI should explain actions in human-AI teams
Abstract
Explanation of artificial intelligence (AI) decision-making has become an important research area in human-computer interaction (HCI) and computer-supported teamwork research. While plenty of research has investigated AI explanations with an intent to improve AI transparency and human trust in AI, how AI explanations function in teaming environments remains unclear. Given that a major benefit of AI giving explanations is to increase human trust, understanding how AI explanations impact human trust is crucial to effective human-AI teamwork. An online experiment was conducted with 156 participants to explore this question by examining how a teammate's explanations impact the perceived trust of the teammate and the effectiveness of the team, and how these impacts vary based on whether the teammate is a human or an AI. This study shows that explanations facilitate trust in AI teammates when explaining why the AI disobeyed humans' orders but hinder trust when explaining why the AI lied to humans. In addition, participants' personal characteristics (e.g., gender and ethical framework) impacted their perceptions of AI teammates both directly and indirectly in different scenarios. Our study contributes to interactive intelligent systems and HCI by shedding light on how an AI teammate's actions and corresponding explanations are perceived by humans while identifying factors that impact trust and perceived effectiveness. This work provides an initial understanding of AI explanations in human-AI teams, which future research can build upon in exploring AI explanation implementation in collaborative environments.
Study specs
Conducted an online experiment analyzing participant responses to scenarios in which a teammate explained its actions within a teamwork context, comparing trust in AI versus human teammates.
- Authors
- R Zhang, C Flathmann, G Musick, B Schelble
- Institution
- North Carolina State University, University of North Carolina at Charlotte, University of Georgia, University of Michigan
- Discipline
- Robotics, Artificial Intelligence
- Sample Size
- N=156
- Study Type
- Experimental Study
- Year
- 2024
- Human Data Platform
- Prolific
Measured Outcomes
Impact of AI explanations on human trust, team effectiveness, and how these vary with teammate identity (human or AI) and participant characteristics (e.g., gender, ethical framework).
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.