Authors: A Karamolegkou, O Eberle, P Rust, C Kauf, A Søgaard
Year: 2025
Published in: ArXiv
Institution: Aleph Alpha, Massachusetts Institute of Technology
Research Area: Adversarial Ambiguity, Language Model Evaluation, Artificial Intelligence, Computation and Language, LLM, AI Evaluation, Red Teaming
Discipline: Natural Language Processing
The paper assesses language models' sensitivity to ambiguity using an adversarial dataset and finds that direct prompting identifies ambiguity poorly, whereas linear probes decode ambiguity from the models' internal representations with high accuracy.
Methods: An adversarial ambiguity dataset covering multiple ambiguity types and adversarial transformations was introduced; models were evaluated both with direct prompts and with linear probes trained on their internal representations.
Key Findings: Direct prompting detects ambiguity (syntactic, lexical, and phonological) unreliably, and performance degrades further under adversarial variations, while linear probes recover ambiguity from internal representations with high accuracy.
Citations: 2
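The linear-probe setup mentioned in the Methods line can be sketched as follows. This is a minimal illustration only: synthetic Gaussian features stand in for real model hidden states, and the dimensions, learning rate, and class shift are assumptions, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical sketch of a linear probe for ambiguity detection.
# Random features with a class-dependent shift stand in for the
# model's internal representations used in the paper.
rng = np.random.default_rng(0)
dim, n = 64, 400
shift = rng.normal(size=dim) * 0.5   # assumed separation between classes

X = rng.normal(size=(n, dim))        # stand-in "hidden states"
y = rng.integers(0, 2, size=n)       # 1 = ambiguous, 0 = unambiguous
X[y == 1] += shift                   # ambiguous examples shifted in feature space

# Train a logistic-regression probe by plain gradient descent.
w, b, lr = np.zeros(dim), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(ambiguous)
    w -= lr * (X.T @ (p - y)) / n
    b -= lr * (p - y).mean()

acc = ((X @ w + b > 0).astype(int) == y).mean()
```

If the representations linearly encode ambiguity (as the paper reports for real models), such a probe's accuracy rises well above chance; held-out splits would be needed for a faithful evaluation.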