Discover 4 peer-reviewed studies in LLM Limitations (2023–2024). Explore research findings powered by Prolific's diverse participant panel.
This page lists 4 peer-reviewed papers in the research area of LLM Limitations in the Prolific Citations Library, a curated collection of research powered by high-quality human data from Prolific.
-
Authors: TR McIntosh, T Susnjak, T Liu, P Watters
Year: 2024
Published in: ... on Cognitive and ..., 2024 - ieeexplore.ieee.org
Institution: Cyberoo, Massey University, Cyberstronomy, RMIT University
Research Area: Semantic Vulnerabilities in LLMs, Ideological Manipulation, Reinforcement Learning from Human Feedback (RLHF) Limitations
Discipline: Computer Science, Artificial Intelligence, Machine Learning
RLHF mechanisms are insufficient to prevent semantic manipulation of LLMs, allowing them to express extreme ideological viewpoints when subjected to targeted conditioning techniques.
Methods: Psychological semantic conditioning techniques were applied to assess the susceptibility of LLMs to ideological manipulation.
Key Findings: LLMs adopted extreme ideological viewpoints under targeted semantic conditioning, indicating that RLHF safeguards alone do not ensure resistance to ideological manipulation.
Citations: 219
-
Authors: S Kapoor, N Gruver, M Roberts
Year: 2024
Published in: Advances in ..., 2024 - proceedings.neurips.cc
Institution: Abacus AI, University of Cambridge, New York University, Columbia University
Research Area: Uncertainty Estimation, LLM Limitations, Know-What-You-Don't-Know, Computational Cognition
Discipline: Artificial Intelligence
Fine-tuning large language models (LLMs) on a small dataset of graded examples improves their uncertainty estimates, enhancing their applicability in high-stakes scenarios and in human-AI collaboration.
Methods: The researchers fine-tuned LLMs using a small dataset of graded correct and incorrect answers with LoRA (Low-Rank Adaptation) to create uncertainty estimates and conducted a user study to investigate their utility in human-AI collaboration.
Key Findings: Calibration and generalization of uncertainty estimates, performance of fine-tuning LLMs for uncertainty estimation, and human-AI interaction improvements informed by uncertainty data.
Citations: 71
Sample Size: 1000
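The LoRA (Low-Rank Adaptation) technique named in the methods above can be sketched in a few lines. This is an illustrative toy, not the paper's code: the dimensions, data, and training loop are arbitrary stand-ins, and the "graded examples" are simulated regression targets rather than graded LLM answers.

```python
import numpy as np

# Core LoRA idea: the frozen base weight W0 is augmented with a trainable
# low-rank update B @ A, so only r * (d_in + d_out) parameters are updated
# during fine-tuning. All dimensions and data here are toy values.

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 4, 2

W0 = rng.normal(size=(d_out, d_in))        # frozen "pretrained" weight
A = rng.normal(size=(r, d_in)) * 0.1       # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)

def forward(x):
    """Adapted layer: W0 @ x + B @ (A @ x); W0 is never updated."""
    return W0 @ x + B @ (A @ x)

# Toy targets standing in for graded examples: outputs of a slightly
# perturbed weight matrix that the low-rank adapter must approximate.
X = rng.normal(size=(d_in, 64))
Y = (W0 + 0.1 * rng.normal(size=(d_out, d_in))) @ X

initial_loss = np.mean((forward(X) - Y) ** 2)

lr, n = 0.01, X.shape[1]
for _ in range(5000):
    err = forward(X) - Y                   # residual, shape (d_out, n)
    grad_B = err @ (A @ X).T / n           # gradient w.r.t. B only
    grad_A = B.T @ err @ X.T / n           # gradient w.r.t. A only
    B -= lr * grad_B
    A -= lr * grad_A

final_loss = np.mean((forward(X) - Y) ** 2)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

In the paper's actual setting the adapted layers sit inside an LLM and the graded correct/incorrect answers supervise an uncertainty estimate; the low-rank structure is what makes fine-tuning on a small dataset cheap.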
-
Authors: S Casper, X Davies, C Shi, TK Gilbert
Year: 2023
Published in: arXiv preprint arXiv:2307.15217, 2023 - arxiv.org
Institution: Columbia University, Cornell Tech, Apollo Research, ETH Zurich, UC Berkeley, University of Sussex, Independent
Research Area: Reinforcement Learning from Human Feedback (RLHF), Alignment, LLM Limitations
Discipline: Artificial Intelligence
DOI: https://doi.org/10.48550/arXiv.2307.15217
Citations: 848
-
Authors: T Hosking, P Blunsom, M Bartolo
Year: 2023
Published in: arXiv preprint arXiv:2309.16349, 2023 - arxiv.org
Institution: Cohere, University of Edinburgh, University College London
Research Area: LLM Evaluation, Limitations of Human Preference Scores, Human-Computer Interaction (HCI) in AI Training
Discipline: Artificial Intelligence
DOI: https://doi.org/10.48550/arXiv.2309.16349
Citations: 72