Cornell Tech Papers

Browse 3 peer-reviewed papers from Cornell Tech spanning Reinforcement Learning from Human Feedback (RLHF) and agent learning (2023–2025). Research powered by Prolific's high-quality participant data.

This page lists 3 peer-reviewed papers from researchers at Cornell Tech in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.

Papers (3 of 3)

Benchmarking World-Model Learning

Authors: A Warrier, D Nguyen, M Naim, M Jain, Y Liang, K Schroeder, C Yang, JB Tenenbaum, S Vollmer, K Ellis, Z Tavares

Year: 2025

Published in: arXiv preprint arXiv …, 2025 - arxiv.org

Institution: Basis Research Institute, DFKI GmbH, Harvard University, Quebec AI Institute, University of Cambridge, Massachusetts Institute of Technology, Cornell University

Research Area: Agent learning, World Models, Benchmarking, Evaluation protocols, Reinforcement Learning from Human Feedback (RLHF), Large Language Models

Discipline: Computer Science, Artificial Intelligence, Machine Learning

The paper introduces WorldTest, a novel protocol for evaluating model-learning agents using reward-free exploration and behavior-based scoring, and demonstrates that humans outperform models on the AutumnBench suite of tasks, revealing significant gaps in world-model learning.

Methods: The authors propose WorldTest, a protocol that separates reward-free interaction from scored tests in related environments, and evaluate it using AutumnBench, a dataset of 43 grid-world environments and 129 tasks across prediction, planning, and causal dynamics.

Key Findings: Humans outperform model-learning agents on the AutumnBench tasks, which cover masked-frame prediction, planning, and understanding causal dynamics, revealing significant gaps in world-model learning.

Citations: 1

Sample Size: 517

Open problems and fundamental limitations of reinforcement learning from human feedback

Authors: S Casper, X Davies, C Shi, TK Gilbert

Year: 2023

Published in: arXiv preprint arXiv ..., 2023 - arxiv.org

Institution: Columbia University, Cornell Tech, Apollo Research, ETH Zurich, UC Berkeley, University of Sussex, Independent

Research Area: Reinforcement Learning from Human Feedback (RLHF), Alignment, LLM Limitations

Discipline: Artificial Intelligence

DOI: https://doi.org/10.48550/arXiv.2307.15217

Citations: 848

Safe RLHF: Safe reinforcement learning from human feedback

Authors: J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang

Year: 2023

Published in: arXiv preprint arXiv ..., 2023 - arxiv.org

Institution: Cornell University, Georgia Institute of Technology

Research Area: Reinforcement Learning from Human Feedback (RLHF), Safe AI, Reinforcement Learning

Discipline: Artificial Intelligence, Machine Learning

DOI: https://doi.org/10.48550/arXiv.2310.12773

Citations: 598