Safe RLHF: Safe reinforcement learning from human feedback
Authors: J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang, Y Yang
Published: 2023
Publication: arXiv preprint arXiv ..., 2023 - arxiv.org
Institution: Peking University
Research Area: Reinforcement Learning from Human Feedback (RLHF), Safe AI, Reinforcement Learning
Discipline: Artificial Intelligence, Machine Learning
Citations: 598
DOI: https://doi.org/10.48550/arXiv.2310.12773