Authors: Elyas Meguellati, Assad Zeghina, Shazia Sadiq, Gianluca Demartini
Year: 2025
Published in: arXiv (preprint)
Institution: University of Queensland, University of Strasbourg
Research Area: Natural Language Processing, Harmful Content Detection
Discipline: Natural Language Processing
The paper introduces an approach that uses LLM-based semantic augmentation for harmful content detection on social media, achieving performance comparable to models trained on human-annotated data but at substantially reduced annotation cost.
Methods: The researchers use LLMs to clean noisy text and generate explanations as a context-rich preprocessing step, then evaluate the augmented training sets on several high-context datasets: the SemEval 2024 persuasive memes dataset, the Google Jigsaw toxic comments dataset, and the Facebook hateful memes dataset.
Key Findings: LLM-based semantic augmentation effectively enhances training sets for social media tasks such as propaganda detection, hateful meme classification, and toxicity identification.
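The augmentation pipeline described in the Methods summary (clean noisy text, then generate an explanation to enrich each training instance) could be sketched as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable is a hypothetical placeholder for any chat-completion client, and the prompts and field names are assumptions.

```python
"""Hedged sketch of LLM-based semantic augmentation for harmful-content
detection. `llm` is a placeholder for a real LLM client; `fake_llm` below
is a deterministic stub used only to demonstrate the flow."""

from typing import Callable, Dict, List


def augment_example(text: str, llm: Callable[[str], str]) -> Dict[str, str]:
    # Step 1: normalize noisy social-media text (slang, typos,
    # broken encoding) into clean prose.
    cleaned = llm(f"Rewrite this social media post as clean text:\n{text}")
    # Step 2: generate a short explanation of the post's intent,
    # giving the downstream classifier extra semantic context.
    explanation = llm(f"Briefly explain the intent of this post:\n{cleaned}")
    # The augmented training instance pairs cleaned text with its explanation.
    return {"text": cleaned, "explanation": explanation}


def augment_dataset(posts: List[str], llm: Callable[[str], str]) -> List[Dict[str, str]]:
    # Apply the two-step augmentation to every post in the corpus.
    return [augment_example(p, llm) for p in posts]


# Deterministic stand-in for a real LLM, used only to demo the pipeline.
def fake_llm(prompt: str) -> str:
    return "cleaned text" if prompt.startswith("Rewrite") else "neutral intent"


if __name__ == "__main__":
    print(augment_dataset(["gr8 pic lol!!"], fake_llm))
```

In a real setting, `fake_llm` would be replaced by a call to an actual model, and the resulting `(text, explanation)` pairs would form the augmented training set evaluated in the paper.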