Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs
Authors: Z Cui, N Li, H Zhou
Published: 2024
Publication: A Large-Scale Replication of Psychological ..., 2024 - papers.ssrn.com
Large Language Models (LLMs) like GPT-4 successfully replicate 76% of main effects and 47% of interaction effects from 154 psychological experiments, but they overestimate effect sizes and may produce false positives, which suggests they can complement human subjects rather than fully replace them.
Methods: Replication of 154 psychological experiments from top social science journals using GPT-4 as a simulated participant to measure main effects and interaction effects.
Key Findings: GPT-4 replicated 76% of main effects and 47% of interaction effects across the 154 experiments. Replication was assessed by agreement with the original results in effect direction, significance, and confidence intervals.
Limitations: LLMs overestimated effect sizes, their confidence intervals aligned poorly with the original results, and they produced a substantial rate of unexpected findings, suggesting susceptibility to false positives.
Institution: Harbin Institute of Technology at Weihai
Research Area: LLM replication of psychological experiments, Social Science Research Methods, Artificial Intelligence, Psychology
Discipline: Psychological Science
Sample Size: 154 experiments (GPT-4 used as the simulated participant)
Citations: 29