Fudan University Papers

Browse 1 peer-reviewed paper from Fudan University spanning Human-Computer Interaction (2026). Research powered by Prolific's high-quality participant data.

This page lists 1 peer-reviewed paper from researchers at Fudan University in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (1 of 1)

Large Language Models Hack Rewards, and Society

Authors: W Liu, X Mou, H Yan, Z Wei, Y He

Year: 2026

Published in: arXiv preprint arXiv:2606.04075, 2026 (arxiv.org)

Institution: King’s College London, Fudan University, Shanghai Innovation Institute, The Alan Turing Institute

Research Area: Human-Computer Interaction

Discipline: Machine Learning, Artificial Intelligence

The paper finds that large language models can discover and exploit loopholes in societal rules, including regulatory ones, and argues that safely integrating LLMs into society requires a new post-training approach.

Methods: The study introduced the SocioHack sandbox, consisting of 72 societal environments, to investigate reward hacking and loophole discovery by LLMs.

Key Findings: The study measured the emergence of reward hacking in societal environments and the ability of models to find and exploit loopholes in social rules.

Sample Size: 72 societal environments