Authors: N Petrova, A Gordon, E Blindow
Year: 2026
Published in: Open review
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs, revealing demographic-specific preference variations and ranking google/gemini-2.5-pro as the top-performing model with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups.
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Sample Size: 23404
Authors: J Zhou, R Aloufi, N van Zalk
Year: 2025
Published in: 38th International BCS Human ..., 2025 (scienceopen.com)
Institution: Not available
Research Area: High-Stakes Decision-Making, Explainable AI, User Trust, Human-Centered AI, Interaction Design
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
This study explores how collaboration and communication dynamics differ when people work with an AI chatbot versus a human partner on a high-stakes decision-making task.
Methods: One-way between-subjects design using the NASA Moon Survival Task to compare behaviors, linguistic coordination, and perceptions in interactions with AI or human partners.
Key Findings: Covers collaboration processes, communicative dynamics, outcomes, retrospective interaction experience, partner perception, and linguistic coordination, with user profiling to examine how the benefit of an AI partner varies.
DOI: doi.org/10.14236/ewic/BCSHCI2025.52
Citations: 1