Browse 38 peer-reviewed papers from University of California researchers spanning Human-AI Interaction and Computational Social Science (2024–2025). This page lists those papers as part of the Prolific Citations Library, a curated collection of research powered by high-quality human participant data from Prolific.
-
Authors: M Steyvers, H Tejeda, A Kumar, C Belem
Year: 2025
Published in: Nature Machine ..., 2025 - nature.com
Institution: University of California Irvine
Research Area: Computational Linguistics, Computational Social Science, AI Ethics, Trust in AI
Discipline: Computational Social Science
LLM explanations often lead users to overestimate response accuracy, especially when explanations are longer; adjusting explanation style to reflect the model's internal confidence narrows both the calibration and discrimination gaps, supporting better-placed trust in AI-assisted decision making.
Methods: Conducted experiments using multiple-choice and short-answer questions to study user confidence versus model-stated confidence; varied explanation length and alignment with model internal confidence.
Key Findings: Calibration gap (human vs. model confidence), discrimination gap (ability to distinguish correct vs. incorrect answers), and effects of explanation style and length on user trust.
Citations: 100
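The two gap measures above can be sketched numerically. This is an illustrative computation, assuming the calibration gap is the mean absolute difference between human and model confidence and discrimination is the mean confidence separation between correct and incorrect answers; the paper's exact definitions may differ.

```python
import numpy as np

def calibration_gap(human_conf, model_conf):
    """Mean absolute difference between human-judged and model-stated confidence."""
    return float(np.mean(np.abs(np.asarray(human_conf) - np.asarray(model_conf))))

def discrimination(conf, correct):
    """Mean confidence on correct answers minus mean confidence on incorrect ones."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return float(conf[correct].mean() - conf[~correct].mean())

# Toy numbers: four model answers, the first two actually correct.
human = [0.9, 0.8, 0.85, 0.7]   # human confidence in each answer
model = [0.6, 0.9, 0.5, 0.4]    # model's stated confidence
right = [1, 1, 0, 0]

gap = calibration_gap(human, model)        # 0.2625
disc_human = discrimination(human, right)  # 0.075: humans barely discriminate
disc_model = discrimination(model, right)  # 0.3: the model discriminates better
```

A small calibration gap with a large discrimination gap would mean users trust the right amount overall but cannot tell good answers from bad ones.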
-
Authors: K Dalal, D Koceja, G Hussein, J Xu, Y Zhao, Y Song, S Han, KC Cheung, J Kautz, C Guestrin, T Hashimoto, S Koyejo, Y Choi, Y Sun, X Wang
Year: 2025
Published in: ArXiv
Institution: Nvidia, Stanford University, UT Austin, University of California Berkeley, University of California San Diego
Research Area: Video Generation, Diffusion Models, Test-Time Training
Discipline: Computer Science
The paper introduces Test-Time Training (TTT) layers into Transformers to generate coherent one-minute videos from text storyboards, outperforming baselines in storytelling coherence but facing efficiency and artifact challenges.
Methods: Experimentation with Test-Time Training layers embedded in pre-trained Transformer models, evaluated using a dataset curated from Tom and Jerry cartoons and compared against Mamba 2, Gated DeltaNet, and sliding-window attention layers.
Key Findings: Effectiveness of video generation methods in creating coherent multi-scene stories in one-minute videos.
Citations: 52
Sample Size: 100
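The core idea of a Test-Time Training layer can be sketched in a few lines. This is a simplified illustration, not the paper's architecture: the layer's hidden state is a weight matrix updated at inference time by one gradient step per token on a self-supervised reconstruction loss.

```python
import numpy as np

def ttt_linear(tokens, lr=0.1):
    """Toy TTT layer: the hidden state is a weight matrix W, updated at
    inference time by one gradient step per token on the self-supervised
    reconstruction loss 0.5 * ||W x - x||^2."""
    d = tokens.shape[1]
    W = np.zeros((d, d))
    outputs = []
    for x in tokens:
        outputs.append(W @ x)       # predict with the current state
        err = W @ x - x             # reconstruction error
        W -= lr * np.outer(err, x)  # dL/dW = err x^T
    return np.stack(outputs)

# After seeing a token once, the layer reconstructs it a little better.
out = ttt_linear(np.array([[1.0, 0.0], [1.0, 0.0]]))
```

Because the state is itself a learnable model, the layer can in principle keep absorbing context over very long sequences, which is the property the paper exploits for minute-long video.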
-
Authors: SSY Kim, JW Vaughan, QV Liao, T Lombrozo
Year: 2025
Published in: Proceedings of the ..., 2025 - dl.acm.org
Institution: Wake Forest University, University of Illinois at Urbana-Champaign, Princeton University, University of California Berkeley
Research Area: Appropriate Reliance on LLMs, Explainable AI, Human-AI Interaction, Cognitive Psychology
Discipline: Cognitive Psychology, Artificial Intelligence, Human-Computer Interaction (HCI)
The study examines factors that influence users' reliance on LLM responses, finding that explanations increase reliance, while sources and inconsistent explanations reduce reliance on incorrect responses.
Methods: Think-aloud study followed by a pre-registered, controlled experiment to assess the impact of explanations, sources, and inconsistencies in LLM responses on user reliance.
Key Findings: Users' reliance on LLM responses, accuracy, and the influence of explanations, inconsistencies, and sources on these measures.
DOI: https://doi.org/10.1145/3706598.3714020
Citations: 38
Sample Size: 308
-
Authors: K Hackenburg, BM Tappin, P Röttger, SA Hale
Year: 2025
Published in: Proceedings of the ..., 2025 - pnas.org
Institution: University of California Berkeley, University of Cambridge, University of Oxford, Max Planck Institute
Research Area: Political Persuasion, LLM
Discipline: Computational Social Science, Political Science
Scaling language model sizes leads to diminishing returns in generating persuasive political messages, with larger models providing minimal gains compared to smaller ones after controlling for task completion metrics like coherence and relevance.
Methods: Generated 720 political messages using 24 LLMs of varying sizes and tested their persuasiveness through a large-scale randomized survey experiment.
Key Findings: Persuasive capability of language models across different sizes in generating political messages.
Citations: 31
Sample Size: 25982
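The diminishing-returns pattern can be illustrated with made-up numbers (not the paper's data): each 10x increase in parameter count buys a smaller persuasiveness gain than the last.

```python
import numpy as np

# Hypothetical numbers for illustration only (not the paper's data):
params = np.array([1e8, 1e9, 1e10, 1e11])    # model size in 10x steps
persuasion = np.array([4.0, 5.5, 6.2, 6.4])  # mean persuasiveness rating

gains = np.diff(persuasion)  # marginal gain per 10x step: 1.5, 0.7, 0.2
```

Shrinking marginal gains of this shape are what the paper reports after controlling for coherence and relevance.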
-
Authors: Y Ding, J You, TK Machulla, J Jacobs, P Sen
Year: 2022
Published in: Proceedings of the ..., 2022 - dl.acm.org
Institution: University of California Irvine, University of Florida, State University of New York at Buffalo, University of Waterloo, Virginia Tech
Research Area: Computational Social Science, Human-Computer Interaction (HCI), Sentiment Analysis
Discipline: Computational Social Science
Demographic differences among annotators significantly affect sentiment dataset labels, causing up to a 4.5% accuracy difference in sentiment prediction models.
Methods: Crowdsourced annotations from >1000 workers combined with demographic data; analysis of multimodal sentiment datasets and evaluation using machine learning models.
Key Findings: Impact of annotator demographics on sentiment labeling and its effect on model predictions.
DOI: https://doi.org/10.1145/3555632
Citations: 28
Sample Size: 1000
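Measuring how annotator demographics shift model accuracy reduces to grouping annotations by demographic and comparing per-group agreement with model predictions. A minimal sketch with a hypothetical record schema (`group`, `label`, `pred`):

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Per-demographic-group accuracy of model predictions against annotator labels."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += int(r["label"] == r["pred"])
    return {g: hits[g] / totals[g] for g in totals}

# Toy annotations: the model matches group B's labels but only half of group A's.
data = [
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 0, "pred": 1},
    {"group": "B", "label": 1, "pred": 1},
    {"group": "B", "label": 0, "pred": 0},
]
acc = accuracy_by_group(data)  # {'A': 0.5, 'B': 1.0}
```

A spread between groups of the kind shown here, at scale, is what produces the 4.5% accuracy difference the paper reports.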
-
Authors: K Zhou, JD Hwang, X Ren, N Dziri
Year: 2025
Published in: Proceedings of the ..., 2025 - aclanthology.org
Institution: Stanford University, University of Southern California, Carnegie Mellon University, Allen Institute for AI
Research Area: Human-LM Reliance, Interaction-Centered Framework, Human-Computer Interaction (HCI)
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence
The study introduces Rel-A.I., an interaction-centered evaluation approach to measure human reliance on LLM responses, revealing that politeness and interaction context significantly influence user reliance.
Methods: Nine user studies were conducted, analyzing user reliance influenced by LLM communication features such as politeness and context through participant interaction experiments.
Key Findings: The degree of human reliance on LLM responses based on communication style (e.g., politeness) and interaction context (e.g., knowledge domain, prior interactions).
Citations: 18
Sample Size: 450
-
Authors: S Carney, I Riveros, S Tully
Year: 2025
Published in: Available at SSRN 4988760, 2025 - papers.ssrn.com
Institution: University of Southern California
Research Area: Consumer Engagement with AI Disclosures, Social Media Marketing, Social Psychology
Discipline: Social Science
AI-generated content disclosures on social media reduce consumer engagement primarily due to a decrease in parasocial connections, as users perceive creators to exert less effort; signaling greater effort can mitigate this effect.
Methods: Analysis of TikTok engagement data following AIGC disclosure implementation, supplemented by six preregistered experiments.
Key Findings: Impact of AIGC disclosures on consumer engagement and the mediating role of parasocial connections.
Citations: 6
-
Authors: D Guilbeault, S Delecourt, BS Desikan
Year: 2025
Published in: Nature, 2025 - nature.com
Institution: Stanford University, University of California Berkeley, University of Oxford
Research Area: AI Bias, Media Representation, Social Science
Discipline: Computational Social Science, Artificial Intelligence
The study highlights age-related gender bias in online media and language models, showing women are portrayed as younger than men, especially in high-status occupations, and explores how algorithms amplify these biases.
Methods: Analysis of 1.4 million images and videos from online sources and nine language models, followed by a pre-registered experiment involving participants to evaluate biases in internet content and algorithms.
Key Findings: Age and gender bias in occupational depiction across online platforms and language models, as well as its influence on beliefs and hiring preferences.
Citations: 4
Sample Size: 459
-
Authors: KO Alberts, AD Castel
Year: 2025
Published in: Experimental Aging Research, 2025 - Taylor & Francis
Institution: University of California Los Angeles
Research Area: Cognitive Aging, Associative Memory, Trustworthiness of Artificial Faces, Human-AI Interaction, Psychology, Trust in AI
Discipline: Psychology, Psychobiology, Aging Research
Older adults perceive artificial faces as just as trustworthy as real faces, whereas young adults find artificial faces less trustworthy; older adults also show no difference in memory accuracy between face types.
Methods: Participants viewed real and artificial faces associated with scam or neutral conditions, then rated trustworthiness and were tested on associative memory.
Key Findings: Associative memory and perceived trustworthiness of real and artificial faces across young and older adults.
Citations: 1
-
Authors: Y Zhang, J Pang, Z Zhu, Y Liu
Year: 2025
Published in: arXiv preprint arXiv:2506.06991, 2025 - arxiv.org
Institution: Rutgers University, University of California Santa Cruz
Research Area: Artificial Intelligence, Computational Social Science
Discipline: Computational Social Science
The paper proposes a training-free scoring mechanism using peer prediction to detect and mitigate LLM-assisted cheating in crowdsourced annotation tasks, with theoretical guarantees and empirical validation.
Methods: A peer prediction-based mechanism quantifies correlations between worker answers while conditioning on LLM-generated labels, without requiring ground truth or high-dimensional training data.
Key Findings: Detection of LLM-assisted low-effort cheating in crowdsourced annotation tasks, focusing on theoretical effectiveness and empirical robustness.
DOI: https://doi.org/10.48550/arXiv.2506.06991
Citations: 1
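The flavor of such a mechanism can be sketched with a simple output-agreement score conditioned on LLM labels; this is an illustrative construction, not the paper's exact mechanism. A worker scores well for agreeing with peers on the same task beyond what agreement across different tasks with the same LLM label would predict, so simply copying the LLM label earns roughly zero.

```python
def peer_score(answers, llm_labels, worker):
    """Output-agreement score for one worker, conditioned on LLM labels:
    P(agree with a peer on the same task) minus P(agree with a peer on a
    different task that shares the same LLM label)."""
    same, cross = [], []
    tasks = list(answers)
    for t in tasks:
        mine = answers[t].get(worker)
        if mine is None:
            continue
        same.extend(int(mine == a) for w, a in answers[t].items() if w != worker)
        for u in tasks:
            if u == t or llm_labels[u] != llm_labels[t]:
                continue
            cross.extend(int(mine == a) for w, a in answers[u].items() if w != worker)
    if not same or not cross:
        return 0.0
    return sum(same) / len(same) - sum(cross) / len(cross)

# Two tasks whose LLM-suggested label is 0; w3 just copies the LLM.
answers = {
    "t1": {"w1": 1, "w2": 1, "w3": 0},
    "t2": {"w1": 0, "w2": 0, "w3": 0},
}
llm = {"t1": 0, "t2": 0}
s_honest = peer_score(answers, llm, "w1")  # 0.5: informative answers score well
s_copy = peer_score(answers, llm, "w3")    # 0.0: copying the LLM earns nothing
```

No ground truth is needed: the score only compares workers against each other, conditioned on the freely available LLM labels.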
-
Authors: Z Cheng, J You
Year: 2025
Published in: arXiv preprint arXiv:2509.22989, 2025 - arxiv.org
Institution: University of Southern California, University of California Berkeley
Research Area: Artificial Intelligence, Computers and Society, Computer Science and Game Theory, Strategic Persuasion, Reinforcement Learning, Language Models, LLM, RLHF
Discipline: Artificial Intelligence
This paper introduces a scalable framework, utilizing Bayesian Persuasion, to evaluate and train LLMs for strategic persuasion, demonstrating significant persuasion gains and effective strategies through reinforcement learning.
Methods: Repurposed human-human persuasion datasets for evaluation and training; applied Bayesian Persuasion framework; used reinforcement learning to optimize LLMs for strategic persuasion.
Key Findings: The persuasive capabilities and strategies of large language models (LLMs) in various settings.
Citations: 1
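The Bayesian Persuasion framework itself has a standard worked example (the Kamenica–Gentzkow binary setting, not taken from this paper): a sender commits to a signaling scheme that pins the receiver's posterior exactly at the action threshold.

```python
def optimal_signal(prior, threshold):
    """Binary Bayesian Persuasion (Kamenica-Gentzkow textbook case).

    The sender wants the receiver to act; the receiver acts iff the posterior
    P(state=1 | signal) >= threshold. The sender's optimal scheme always sends
    'act' in state 1, and sends 'act' in state 0 just often enough that the
    posterior lands exactly on the threshold.

    Returns (q, act_prob): q = P('act' | state=0), and the overall action rate.
    """
    if prior >= threshold:  # the receiver already acts by default
        return 1.0, 1.0
    q = prior * (1 - threshold) / ((1 - prior) * threshold)
    return q, prior + (1 - prior) * q

# Prior P(state=1) = 0.3, action threshold 0.5: persuasion lifts the action
# rate from 0.3 (under full disclosure) to 0.6.
q, act_prob = optimal_signal(0.3, 0.5)
```

Evaluating an LLM against this benchmark asks how close its persuasion strategies come to the commitment-optimal action rate.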
-
Authors: S Dandekar, S Deshmukh, F Chiu, WB Knox
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: University of California, Davis, Northwestern University
Research Area: Reinforcement Learning from Human Feedback (RLHF), Human-AI Interaction, AI Theory
Discipline: Artificial Intelligence, Social Science
The paper investigates how human beliefs about agent capabilities influence preferences in RLHF, proposing a model to minimize the mismatch between beliefs and idealized agent capabilities, ultimately improving policy performance.
Methods: Human studies and synthetic experiments to model and test the impact of belief mismatches on human preferences and RLHF effectiveness.
Key Findings: Effects of human beliefs about agent capabilities on their provided preferences and the performance of RLHF policies.
DOI: https://doi.org/10.48550/arXiv.2506.01692
-
Authors: M Zhuang, E Deschrijver, R Ramsey, O Turel
Year: 2025
Published in: Scientific Reports, 2025 - nature.com
Institution: Monash University, The University of Melbourne, KU Leuven, California State University Fullerton
Research Area: Human-AI Interaction, Social Bias, Decision-Making
Discipline: Social Science, Human-AI Interaction
The study found that humans exhibit similar discriminatory behavior toward both AI and human agents, with resource allocation being influenced more by decision alignment than the recipient's identity.
Methods: A preregistered experiment was conducted where participants distributed resources between themselves and either human or AI agents based on dot estimation decisions.
Key Findings: Discriminatory behavior and resource allocation preferences toward AI and human agents as influenced by decision congruency.
DOI: https://doi.org/10.1038/s41598-025-94631-9
Sample Size: 500
-
Authors: E Meguellati, S Civelli, L Han, A Bernstein
Year: 2025
Published in: arXiv preprint arXiv ..., 2025 - arxiv.org
Institution: Oregon Health Sciences University, University of California Irvine, Han Institute, NYU School of Law, Bernstein Research
Research Area: Advertising, Persuasion Strategies, Human-AI Interaction in Content Generation
Discipline: Artificial Intelligence
LLM-generated advertisements matched human-written ads in personalization and surpassed them in persuasion when applying psychological principles, outperforming human ads even when detection of AI origin affected the results.
Methods: Two-part study: First examined LLM personalization based on personality traits; second tested psychological persuasion principles using universal messages across authority, consensus, cognition, and scarcity.
Key Findings: Effectiveness of LLM-generated ads in personalization and persuasive storytelling compared to human-created ads.
Sample Size: 1200
-
Authors: Cameron R. Jones, Benjamin K. Bergen
Year: 2025
Published in: ArXiv
Institution: University of California San Diego
Research Area: Artificial Intelligence, Computational Linguistics, Turing Test, AI Evaluation
Discipline: Artificial Intelligence
GPT-4.5 passed the Turing test: it was judged human 73% of the time, more often than the actual human participants and the other models, providing the first empirical evidence of an AI passing the standard test.
Methods: Randomised, controlled, pre-registered Turing Test where 5-minute conversations were conducted between human participants and AI systems, followed by judgments on which partner was human.
Key Findings: The ability of AI systems (ELIZA, GPT-4o, LLaMa-3.1-405B, GPT-4.5) to mimic human conversational behavior and be perceived as human.
-
Authors: Y Yin, N Jia, CJ Wakslak
Year: 2024
Published in: Proceedings of the National Academy of ..., 2024 - pnas.org
Institution: University of Southern California
Research Area: Human-AI Interaction, Social Perception of AI, Media Effects
Discipline: Social Sciences
AI responses make recipients feel more heard and provide better emotional support than human responses, but labeling a response as AI-generated diminishes this effect.
Methods: Experiment and follow-up study to assess recipient reactions to AI vs. human-generated responses and determine emotional support efficacy.
Key Findings: The degree to which recipients feel heard, emotion detection accuracy, and third-party ratings of emotional support quality.
DOI: https://doi.org/10.1073/pnas.2319112121
Citations: 201
-
Authors: D Guilbeault, S Delecourt, T Hull, BS Desikan, M Chu
Year: 2024
Published in: Nature, 2024 - nature.com
Institution: University of California Berkeley, Institute For Public Policy Research, Columbia University, University of Southern California
Research Area: Gender Bias, Computational Social Science, Online Media, AI Bias
Discipline: Computational Social Science
Online images significantly amplify gender bias compared to text, with biases in visual content impacting societal beliefs about gender roles.
Methods: Analyzed 3,495 social categories using over one million images from platforms like Google, Wikipedia, and IMDb, compared visual content to billions of words from the same platforms, and conducted a preregistered national experiment to assess the psychological impact on participants' beliefs.
Key Findings: The prevalence and psychological impact of gender bias in online images compared to text, including gender associations and representation disparities.
DOI: https://doi.org/10.1038/s41586-024-07068-x
Citations: 72
Sample Size: 3495
-
Authors: M Ku, T Li, K Zhang, Y Lu, X Fu, W Zhuang
Year: 2024
Published in: - arXiv preprint arXiv …, 2023 - arxiv.org
Institution: University of Waterloo, Ohio State University, University of California Santa Barbara, University of Pennsylvania
Research Area: AI alignment, Representation learning, Cognitive computational modeling, Vision foundation models evaluation, Multimodal, Vision models
Discipline: Computer Science, Artificial Intelligence, Machine Learning
This paper presents a method for aligning machine vision model representations with human visual similarity judgments across different abstraction levels, improving how well models reflect human perceptual and conceptual organization and enhancing generalization and uncertainty prediction.
DOI: https://doi.org/10.48550/arXiv.2310.01596
Citations: 59
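A common probe for this kind of representational alignment (an assumed stand-in, not necessarily the paper's exact method) correlates pairwise model-embedding similarities with human similarity judgments:

```python
import numpy as np

def alignment_score(emb, human_sim):
    """Pearson correlation between pairwise cosine similarities of model
    embeddings and human similarity judgments (upper triangle only)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    model_sim = emb @ emb.T
    iu = np.triu_indices(len(emb), k=1)
    return float(np.corrcoef(model_sim[iu], human_sim[iu])[0, 1])

# Three items: the model treats the first two as identical, and humans agree.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
human_sim = np.array([[1.0, 1.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
r = alignment_score(emb, human_sim)  # 1.0 on this toy set
```

Running the same probe at several abstraction levels (e.g. basic categories vs. broad concepts) reveals where model and human organization diverge.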
-
Authors: AYJ Ha, J Passananti, R Bhaskar, S Shan
Year: 2024
Published in: Proceedings of the ..., 2024 - dl.acm.org
Institution: University of California Santa Barbara, The University of Chicago, Institute of Education, University College London
Research Area: Human-Computer Interaction (HCI), Generative AI, Digital Forensics
Discipline: Human-Computer Interaction (HCI), Generative AI, Digital Forensics
The paper investigates the effectiveness of different approaches, including both human and automated detectors, in distinguishing human art from AI-generated images, finding that a combination of methods offers the best performance despite persistent weaknesses.
Methods: Comparison of human art across 7 styles with AI-generated images from 5 generative models, assessed using 5 automated detectors and 3 human groups (crowdworkers, professional artists, expert artists).
Key Findings: Detection accuracy and robustness of human and automated methods in identifying AI-generated images under benign and adversarial conditions.
DOI: 10.1145/3658644.3670306
Citations: 52
Sample Size: 3993
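A combination of human and automated judgments can be as simple as a weighted blend of the two signals; this is a hypothetical sketch (the weight `w` and the threshold are assumptions, not values from the paper):

```python
def combined_score(human_vote_rate, detector_scores, w=0.5):
    """Weighted blend of the fraction of human raters flagging an image as
    AI-generated and the mean score from automated detectors. The weight w
    is an assumption, not a value from the paper."""
    auto = sum(detector_scores) / len(detector_scores)
    return w * human_vote_rate + (1 - w) * auto

# 80% of raters flag the image; three detectors score it 0.9, 0.6, 0.7.
s = combined_score(0.8, [0.9, 0.6, 0.7])
flag = s >= 0.5  # flag as AI-generated past an (assumed) 0.5 threshold
```

Blending helps because human raters and automated detectors tend to fail on different adversarial perturbations.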
-
Authors: B Lebrun, S Temtsin, A Vonasch
Year: 2024
Published in: Frontiers in Robotics and ..., 2024 - frontiersin.org
Institution: University of Lausanne, University of California Berkeley, University of Massachusetts Amherst, Arizona State University
Research Area: AI in Social Science Research, Survey Methodology, Data Quality
Discipline: Artificial Intelligence
The study examines the integrity of online questionnaire responses and concludes that humans can identify AI-generated text with 76% accuracy, but current AI detection systems are ineffective, raising concerns about data quality in online surveys.
Methods: Human participants and automatic AI detection systems were tested on their ability to differentiate AI-generated text from human-generated text in the context of online questionnaires.
Key Findings: The study measured the ability of humans and AI detection tools to correctly identify whether text was generated by a human or an AI system for online questionnaire responses.
DOI: https://doi.org/10.3389/frobt.2023.1277635
Citations: 26