HRLAIF: Improvements in helpfulness and harmlessness in open-domain reinforcement learning from AI feedback
Authors: A Li, Q Xiao, P Cao, J Tang, Y Yuan, Z Zhao
Published: 2024
Publication: arXiv preprint arXiv ..., 2024 - arxiv.org
Research paper: HRLAIF: Improvements in helpfulness and harmlessness in open-domain reinforcement learning from AI feedback
Institution: Beijing University, Alibaba Group
Research Area: Reinforcement Learning from AI Feedback (RLAIF), Safety and Utility of Open-domain Language Models, Open Source LLM
Discipline: Artificial Intelligence
Citations: 12