Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework
Authors: N Petrova, A Gordon, E Blindow
Published: 2026
Publication: Open review
The study introduces HUMAINE, a multidimensional evaluation framework for LLMs that reveals demographic-specific variation in preferences. It ranks google/gemini-2.5-pro first overall, with a posterior probability of 95.6%.
Methods: Multi-turn naturalistic conversations analyzed using a hierarchical Bayesian Bradley-Terry-Davidson model with post-stratification to census data, stratified across 22 demographic groups.
Key Findings: Performance of 28 LLMs across five human-centric dimensions, accounting for demographic-specific preferences.
Limitations: The sample is limited to US and UK populations, which may introduce demographic or regional bias; some dimensions, such as Trust, Ethics & Safety, show high tie rates and may be ambiguously measured.
Institution: Prolific
Research Area: Human-centered AI evaluation, Bayesian statistics, Responsible AI, AI alignment, LLM Evaluation
Discipline: Machine Learning, Artificial Intelligence
Sample Size: 23,404 participants
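
Illustrative sketch: the Methods entry names a Bradley-Terry-Davidson model with post-stratification. The snippet below is a minimal sketch of those two ingredients only, not the study's implementation. It omits the hierarchical Bayesian fitting entirely, and the function names, parameter names (theta_i, theta_j, nu), and group labels are assumptions chosen for illustration.

```python
import math


def davidson_probs(theta_i, theta_j, nu):
    """Win/tie/loss probabilities for one comparison of model i against model j.

    Davidson's extension of Bradley-Terry adds a tie outcome whose weight is
    nu * sqrt(p_i * p_j), where p = exp(theta) is a model's strength.
    All names here are hypothetical; nu >= 0 controls how common ties are.
    """
    p_i, p_j = math.exp(theta_i), math.exp(theta_j)
    tie_weight = nu * math.sqrt(p_i * p_j)
    denom = p_i + p_j + tie_weight
    return p_i / denom, tie_weight / denom, p_j / denom


def poststratify(group_estimates, population_shares):
    """Combine per-group estimates into one population-level estimate.

    Each group's estimate is weighted by its share of the target population
    (for example, a census share), renormalized over the groups present.
    """
    total = sum(population_shares[g] for g in group_estimates)
    weighted = sum(group_estimates[g] * population_shares[g] for g in group_estimates)
    return weighted / total


if __name__ == "__main__":
    # Made-up inputs, not values from the study.
    win, tie, loss = davidson_probs(theta_i=0.4, theta_j=0.0, nu=0.5)
    print(f"win={win:.3f} tie={tie:.3f} loss={loss:.3f}")

    estimates = {"group_a": 0.62, "group_b": 0.48}
    shares = {"group_a": 0.3, "group_b": 0.7}
    print(f"post-stratified estimate={poststratify(estimates, shares):.3f}")
```

A larger nu moves probability mass toward ties, which is one way a dimension with a high tie rate would show up in such a model.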