REL-AI: An interaction-centered approach to measuring human-lm reliance
Abstract
The ability to communicate uncertainty and knowledge limitations is crucial for the safety of large language models (LLMs). Current evaluations of these abilities typically examine the correspondence between model accuracy and its internal probabilities or linguistic outputs. However, evaluation of the uncertainty of LLM communication should also focus on the behaviors of their human interlocutors: how much do users rely on what the LLM says? We introduce an interaction-centered evaluation approach called Rel-A.I. (pronounced “rely”) that quantifies whether and how humans rely on LLMs’ responses, complementing existing calibration evaluations. Through nine user studies with 450 participants, we investigate three crucial aspects that influence user reliance. We show that emphatic expressions of politeness (e.g., “I’m happy to help!”) that precede LLM answers will cause participants to perceive these models as more competent, and in turn, rely 30% more on their generations. Additionally, the context of the interaction, such as the knowledge domain and nature of previous interactions with the LLM, substantially influences user reliance (e.g., users will rely 10% more on LLMs when responding to questions involving calculations). Our results show that calibration and language quality alone are insufficient in informing which LLMs are safely calibrated, and illustrate the need to consider features of the interactional context.
Study specs
Nine user studies were conducted, analyzing user reliance influenced by LLM communication features such as politeness and context through participant interaction experiments.
- Institution
- Stanford University,University of Southern California,Carnegie Mellon University,Allen Institute for AI
- Sample Size
- N=450
- Study Type
- Evaluation Study
- Year
- 2025
- Human Data Platform
- Prolific
- Source
- View Source Google Scholar
Measured Outcomes
The degree of human reliance on LLM responses based on communication style (e.g., politeness) and interaction context (e.g., knowledge domain, prior interactions).
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.
Verify your expertise to join discussion
Create an account and verify your credentials to participate in peer discussions.