Open problems and fundamental limitations of reinforcement learning from human feedback
Authors: S Casper, X Davies, C Shi, TK Gilbert
Published: 2023
Publication: arXiv preprint arXiv ..., 2023 - arxiv.org
Institution: Columbia University, Cornell Tech, Apollo Research, ETH Zurich, UC Berkeley, University of Sussex, Independent
Research Area: Reinforcement Learning from Human Feedback (RLHF), Alignment, LLM Limitations
Discipline: Artificial Intelligence
Citations: 848
DOI: https://doi.org/10.48550/arXiv.2307.15217