Discover 4 peer-reviewed studies in LLM Limitations (2023–2024). Explore research findings powered by Prolific's diverse participant panel.
This page lists 4 peer-reviewed papers in the research area of LLM Limitations in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: TR McIntosh, T Susnjak, T Liu, P Watters
Year: 2024
Published in: ... on Cognitive and ..., 2024 - ieeexplore.ieee.org
Institution: Cyberoo, Massey University, Cyberstronomy, RMIT University
Research Area: Semantic Vulnerabilities in LLMs, Ideological Manipulation, Reinforcement Learning from Human Feedback (RLHF) Limitations
Discipline: Computer Science, Artificial Intelligence, Machine Learning
RLHF mechanisms are insufficient to prevent semantic manipulation of LLMs, allowing them to express extreme ideological viewpoints when subjected to targeted conditioning techniques.
Methods: Psychological semantic conditioning techniques were applied to assess the susceptibility of LLMs to ideological manipulation.
Key Findings: LLMs adopted extreme ideological viewpoints under targeted semantic conditioning, indicating that RLHF safeguards alone do not ensure resistance to ideological manipulation.
Citations: 219
-
Authors: S Kapoor, N Gruver, M Roberts
Year: 2024
Published in: Advances in ..., 2024 - proceedings.neurips.cc
Institution: Abacus AI, University of Cambridge, New York University, Columbia University
Research Area: Uncertainty Estimation, LLM Limitations, Know-What-You-Don't-Know, Computational Cognition
Discipline: Artificial Intelligence
Fine-tuning large language models (LLMs) on a small dataset of graded examples improves their uncertainty estimates, enhancing their applicability in high-stakes scenarios and in human-AI collaboration.
Methods: The researchers fine-tuned LLMs using a small dataset of graded correct and incorrect answers with LoRA (Low-Rank Adaptation) to create uncertainty estimates and conducted a user study to investigate their utility in human-AI collaboration.
Key Findings: Calibration and generalization of uncertainty estimates, performance of fine-tuning LLMs for uncertainty estimation, and human-AI interaction improvements informed by uncertainty data.
Citations: 71
Sample Size: 1000
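The LoRA (Low-Rank Adaptation) technique named in the methods above can be sketched in a few lines. This is an illustrative toy, not the paper's code: the dimensions, data, and training loop are arbitrary stand-ins, and the "graded examples" are simulated regression targets rather than graded LLM answers.

```python
import numpy as np

# Core LoRA idea: the frozen base weight W0 is augmented with a trainable
# low-rank update B @ A, so only r * (d_in + d_out) parameters are updated
# during fine-tuning. All dimensions and data here are toy values.

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 4, 2

W0 = rng.normal(size=(d_out, d_in))        # frozen "pretrained" weight
A = rng.normal(size=(r, d_in)) * 0.1       # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)

def forward(x):
    """Adapted layer: W0 @ x + B @ (A @ x); W0 is never updated."""
    return W0 @ x + B @ (A @ x)

# Toy targets standing in for graded examples: outputs of a slightly
# perturbed weight matrix that the low-rank adapter must approximate.
X = rng.normal(size=(d_in, 64))
Y = (W0 + 0.1 * rng.normal(size=(d_out, d_in))) @ X

initial_loss = np.mean((forward(X) - Y) ** 2)

lr, n = 0.01, X.shape[1]
for _ in range(5000):
    err = forward(X) - Y                   # residual, shape (d_out, n)
    grad_B = err @ (A @ X).T / n           # gradient w.r.t. B only
    grad_A = B.T @ err @ X.T / n           # gradient w.r.t. A only
    B -= lr * grad_B
    A -= lr * grad_A

final_loss = np.mean((forward(X) - Y) ** 2)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

In the paper's actual setting the adapted layers sit inside an LLM and the graded correct/incorrect answers supervise an uncertainty estimate; the low-rank structure is what makes fine-tuning on a small dataset cheap.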
-
Authors: S Casper, X Davies, C Shi, TK Gilbert
Year: 2023
Published in: arXiv preprint arXiv:2307.15217, 2023 - arxiv.org
Institution: Columbia University, Cornell Tech, Apollo Research, ETH Zurich, UC Berkeley, University of Sussex, Independent
Research Area: Reinforcement Learning from Human Feedback (RLHF), Alignment, LLM Limitations
Discipline: Artificial Intelligence
DOI: https://doi.org/10.48550/arXiv.2307.15217
Citations: 848
-
Authors: T Hosking, P Blunsom, M Bartolo
Year: 2023
Published in: arXiv preprint arXiv:2309.16349, 2023 - arxiv.org
Institution: Cohere, University of Edinburgh, University College London
Research Area: LLM Evaluation, Limitations of Human Preference Scores, Human-Computer Interaction (HCI) in AI Training
Discipline: Artificial Intelligence
DOI: https://doi.org/10.48550/arXiv.2309.16349
Citations: 72