Authors: P. Schoenegger, F. Salvi, J. Liu, X. Nan, R. Debnath, B. Fasolo, E. Leivada, G. Recchia, F. Günther, A. Zarifhonarvar, J. Kwon, Z. Ul Islam, M. Dehnert, D. Y. H. Lee, M. G. Reinecke, D. G. Kamper, M. Kobaş, A. Sandford, J. Kgomo, L. Hewitt, S. Kapoor, K. Oktar, E. E. Kucuk, B. Feng, C. R. Jones, I. Gainsburg, S. Olschewski, N. Heinzelmann, F. Cruz, B. M. Tappin, T. Ma, P. S. Park, R. Onyonka, A. Hjorth, P. Slattery, Q. Zeng, L. Finke, I. Grossmann, A. Salatiello, E. Karger
Year: 2025
Published in: arXiv preprint arXiv ..., 2025
Institution: London School of Economics and Political Science, University of Cambridge, University College London, Massachusetts Institute of Technology, University of Oxford, Modulo Research, Stanford University, Federal Reserve Bank of Chicago, ETH Zürich, University of Johannesburg
Research Area: Computation and Language
Discipline: Social Science, Artificial Intelligence
This paper compares a frontier LLM (Claude Sonnet 3.5) against incentivized human persuaders in a conversational quiz setting, finding that the AI is more persuasive than humans who received real-money bonuses tied to their performance.
Citations: 16
Authors: A. Karamolegkou, O. Eberle, P. Rust, C. Kauf, A. Søgaard
Year: 2025
Published in: arXiv
Institution: Aleph Alpha, Massachusetts Institute of Technology
Research Area: Adversarial Ambiguity, Language Model Evaluation, Artificial Intelligence, Computation and Language, LLM, AI Evaluation, Red Teaming
Discipline: Natural Language Processing
The paper assesses language models' sensitivity to ambiguity using an adversarial dataset and finds that direct prompting poorly identifies ambiguity, while linear probes achieve high accuracy in decoding ambiguity from model representations.
Methods: An adversarial ambiguity dataset was introduced with various types of ambiguities and transformations; models were tested using direct prompts and linear probes trained on internal representations.
Key Findings: Across syntactic, lexical, and phonological ambiguity, and under adversarial variations, direct prompting identifies ambiguity poorly, whereas linear probes decode it from model representations with high accuracy.
Citations: 2