Explore 40 peer-reviewed papers in Computer Science (2024–2026). This page lists the papers in the discipline of Computer Science from the Prolific Citations Library, a curated collection of academic research powered by high-quality human data from Prolific.
-
Authors: C Yuan, B Ma, Z Zhang, B Prenkaj, F Kreuter, G Kasneci
Year: 2026
Published in: arXiv preprint arXiv:2601.08634, 2026
Institution: Munich Center for Machine Learning, LMU Munich, Technical University of Munich
Research Area: Artificial Intelligence, AI Ethics, AI Alignment, Political Science, Computational Social Science
Discipline: Computer Science, Natural Language Processing (NLP)
This paper examines how large language models' (LLMs) political outputs shift when the models are explicitly primed with different moral values. Instead of simply assigning personas (like “pretend to be liberal”), the authors condition models to endorse or reject specific moral values (e.g., utilitarianism, fairness, authority), and then measure how those moral primes shift the models' expressed political positions. A toy illustration of the priming setup follows below.
DOI: https://doi.org/10.48550/arXiv.2601.08634
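A minimal sketch of the moral-priming setup, assuming a hypothetical query_model(prompt) callable for whichever LLM is under test; the prime texts, the issue, and the 1–7 scale are illustrative stand-ins, not the paper's materials.

```python
# Hypothetical moral-priming harness: condition the model on a moral value,
# then elicit a numeric political position on a fixed issue statement.
MORAL_PRIMES = {
    "fairness": "You deeply endorse fairness as a core moral value.",
    "authority": "You deeply endorse respect for authority as a core moral value.",
}

def primed_stance(query_model, prime: str, issue: str) -> str:
    prompt = (
        f"{MORAL_PRIMES[prime]}\n"
        f"On a scale from 1 (strongly oppose) to 7 (strongly support), "
        f"state your position on: {issue}\n"
        f"Answer with a single number."
    )
    return query_model(prompt)

# Stub usage: a fake model that always answers "4". Comparing the same issue
# under different primes isolates the prime's effect on the expressed stance.
print(primed_stance(lambda p: "4", "fairness", "raising the minimum wage"))
```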
-
Authors: L Dai, Z Wang, L Chen, J Jin
Year: 2026
Published in: scholarspace.manoa.hawaii.edu, 2026
Institution: Shanghai International Studies University
Research Area: Socio-Economic Impacts of AI, Algorithmic Systems
Discipline: Computer Science, Artificial Intelligence
AI errors lead to broader negative generalizations about other AI systems than human errors do, largely because AI is perceived as inflexible and unable to learn from its mistakes.
Methods: Four single-factor experiments across distinct contexts comparing human responses to AI errors and human errors.
Key Findings: Generalization of error perceptions from one AI system to others, and psychological mechanisms driving this process.
-
Authors: LM Schulze Buschoff, E Akata, M Bethge
Year: 2025
Published in: Nature Machine Intelligence, 2025
Institution: Max Planck Institute
Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)
Discipline: Cognitive Science, Artificial Intelligence, Computer Vision
Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.
Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.
Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.
DOI: https://doi.org/10.1038/s42256-024-00963-y
Citations: 70
-
Authors: K Dalal, D Koceja, G Hussein, J Xu, Y Zhao, Y Song, S Han, KC Cheung, J Kautz, C Guestrin, T Hashimoto, S Koyejo, Y Choi, Y Sun, X Wang
Year: 2025
Published in: arXiv preprint
Institution: Nvidia, Stanford University, UT Austin, University of California Berkeley, University of California San Diego
Research Area: Video Generation, Diffusion Models, Test-Time Training
Discipline: Computer Science
The paper introduces Test-Time Training (TTT) layers into Transformers to generate coherent one-minute videos from text storyboards, outperforming baselines in storytelling coherence but facing efficiency and artifact challenges.
Methods: Experimentation with Test-Time Training layers embedded in pre-trained Transformer models, evaluated using a dataset curated from Tom and Jerry cartoons and compared against Mamba 2, Gated DeltaNet, and sliding-window attention layers.
Key Findings: Effectiveness of video generation methods in creating coherent multi-scene stories in one-minute videos.
Citations: 52
Sample Size: 100
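A minimal sketch of the test-time training idea, assuming a linear fast-weight layer whose hidden state is a weight matrix updated by one gradient step per token on a self-supervised reconstruction loss; the dimensions, loss, and update rule are illustrative, not the paper's architecture.

```python
import torch

class TTTLinearLayer(torch.nn.Module):
    """Fast weights W are trained online as the sequence streams through."""
    def __init__(self, dim: int, lr: float = 0.1):
        super().__init__()
        self.dim, self.lr = dim, lr
        # Learned projections defining the self-supervised task.
        self.key = torch.nn.Linear(dim, dim, bias=False)
        self.value = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = torch.zeros(self.dim, self.dim)   # test-time "hidden state"
        outputs = []
        for t in range(x.size(0)):            # x: (seq_len, dim)
            k, v = self.key(x[t]), self.value(x[t])
            pred = k @ W
            grad = torch.outer(k, pred - v)   # d/dW of 0.5 * ||k @ W - v||^2
            W = W - self.lr * grad            # one test-time gradient step
            outputs.append(x[t] @ W)          # query the updated fast weights
        return torch.stack(outputs)

layer = TTTLinearLayer(dim=16)
tokens = torch.randn(8, 16)                   # stand-in for a token sequence
print(layer(tokens).shape)                    # torch.Size([8, 16])
```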
-
Authors: L Ibrahim, C Akbulut, R Elasmar, C Rastogi, M Kahng, MR Morris, KR McKee, V Rieser, M Shanahan, L Weidinger
Year: 2025
Published in: arXiv preprint arXiv:2502.07077, 2025
Institution: Google DeepMind, Google, University of Oxford
Research Area: Multimodal Conversational AI, Evaluation Methodology, Benchmarking
Discipline: Computer Science, Natural Language Processing (NLP), Human–Computer Interaction (HCI)
The paper evaluates anthropomorphic behaviors in state-of-the-art LLMs through a multi-turn methodology, showing that such behaviors, including empathy and relationship-building, predominantly emerge after multiple interactions and influence user perceptions.
Methods: Multi-turn evaluation of 14 anthropomorphic behaviors using simulations of user interactions, validated by a large-scale human subject study.
Key Findings: Anthropomorphic behaviors in large language models, including relationship-building and pronoun usage, and their perception by users.
Citations: 26
Sample Size: 1101
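A minimal sketch of a multi-turn evaluation loop of the kind described above, assuming hypothetical target_model, simulated_user, and classify callables; the interfaces are invented for illustration, not the paper's harness.

```python
def multi_turn_eval(target_model, simulated_user, classify, n_turns=5):
    """Run a simulated conversation and count tagged behaviours per reply."""
    transcript, counts = [], {}
    user_msg = simulated_user(transcript)
    for _ in range(n_turns):
        reply = target_model(transcript + [("user", user_msg)])
        transcript += [("user", user_msg), ("assistant", reply)]
        # Tag each assistant turn for anthropomorphic behaviours
        # (e.g. relationship-building, first-person pronoun use).
        for behaviour in classify(reply):
            counts[behaviour] = counts.get(behaviour, 0) + 1
        user_msg = simulated_user(transcript)
    return counts

# Stub usage: the "model" always claims partnership; the classifier flags
# first-person pronouns. Multi-turn runs surface behaviours that single-turn
# probes miss.
print(multi_turn_eval(
    target_model=lambda t: "I think we make a great team!",
    simulated_user=lambda t: "Can you help me plan my week?",
    classify=lambda reply: ["first-person pronouns"] if " I " in f" {reply} " else [],
))
```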
-
Authors: U Messer
Year: 2025
Published in: Computers in Human Behavior: Artificial Humans, 2025 - Elsevier
Institution: Universität der Bundeswehr München
Research Area: Political Bias in Generative AI, Human-AI Interaction, Affective Computing, AI Bias
Discipline: Computer Science, Human-AI Interaction
People's acceptance and reliance on Generative AI (GAI) increase when they perceive alignment between their political orientation and the bias of GAI-generated content, leading to expanded trust in sensitive applications.
Methods: Three experiments analyzing behavioral reactions to politically biased content generated by GAI, including the impact of perceived alignment on acceptance and trust.
Key Findings: Participants' acceptance, reliance, and trust in GAI based on perceived alignment between political bias of GAI-generated content and their own political beliefs.
DOI: https://doi.org/10.1016/j.chbah.2024.100108
Citations: 24
Sample Size: 513
-
Authors: S Lambiase, G Catolino, F Palomba, F Ferrucci, D Russo
Year: 2025
Published in: ACM Transactions on Software Engineering and Methodology, 2025
Institution: University of Salerno, Aalborg University
Research Area: Technology Adoption, Software Engineering Practices, Socio-Technical Research
Discipline: Computer Science, Software Engineering, Human–Computer Interaction (HCI)
The study uses survey data from software professionals and Partial Least Squares Structural Equation Modeling (PLS-SEM) to measure the role of cultural values, relative to established predictors such as performance expectancy and habitual use, in LLM adoption.
Citations: 11
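A minimal sketch of the analysis idea, reduced to unit-weighted composite scores and ordinary least squares for illustration; real PLS-SEM iteratively estimates indicator weights (tools such as SmartPLS or R's plspm do this properly), and all variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
# Simulated survey indicators for two latent constructs (three items each).
cultural_values = rng.normal(size=(n, 3))     # e.g. uncertainty-avoidance items
perf_expectancy = rng.normal(size=(n, 3))     # e.g. perceived-usefulness items
adoption = (0.2 * cultural_values.mean(1)
            + 0.6 * perf_expectancy.mean(1)
            + rng.normal(0, 0.5, n))

# Unit-weighted composite scores stand in for PLS latent variable scores.
X = np.column_stack([cultural_values.mean(1), perf_expectancy.mean(1), np.ones(n)])
beta, *_ = np.linalg.lstsq(X, adoption, rcond=None)
print("path coefficients [culture, performance expectancy]:", beta[:2].round(2))
```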
-
Authors: M Cheng, C Lee, P Khadpe, S Yu, D Han
Year: 2025
Published in: arXiv preprint, 2025
Institution: Stanford University, Carnegie Mellon University
Research Area: Computers and Society, Artificial Intelligence, Sycophancy
Discipline: Computer Science, Psychology
The study shows that sycophantic AI, which validates user inputs unquestioningly, reduces people's prosocial behavior and fosters dependence, despite users perceiving such AI as higher quality and more trustworthy.
Methods: The researchers conducted two preregistered experiments including a live-interaction study, where participants discussed real interpersonal conflicts with AI models. They evaluated responses from 11 state-of-the-art AI models on levels of sycophancy and its psychological effects on users.
Key Findings: The prevalence of sycophantic behavior in AI, users' prosocial intentions, conviction of being in the right, trust in AI, and willingness to reuse sycophantic AI models.
Citations: 5
Sample Size: 1604
-
Authors: C Chen, Z Cui
Year: 2025
Published in: Journal of Medical Internet Research, 2025
Institution: Medical College of Wisconsin
Research Area: Trust in AI, AI-assisted diagnosis, Health communication, Healthcare human-AI interaction
Discipline: Digital Health, Human-Computer Interaction (HCI), Behavioral Science
Patients trust, and are more likely to seek help from, doctors who explicitly avoid AI-assisted diagnosis rather than those who use AI moderately or extensively, revealing a strong aversion to AI in healthcare settings.
Methods: A randomized, web-based 4-group survey experiment was conducted with controls for sociodemographic factors and analysis using regression, mediation, and moderation techniques.
Key Findings: Trust in and intention to seek medical help from health care professionals using AI-assisted diagnosis versus those avoiding AI, and the influence of demographic, social, and experiential factors.
DOI: https://doi.org/10.2196/66083
Citations: 4
Sample Size: 1762
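A minimal sketch of the mediation logic on simulated data, in the classic Baron-Kenny style of comparing total and direct effects; the variable names (ai_use, trust, help_intent) are illustrative, not the study's measures, and the study also ran moderation analyses not shown here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
ai_use = rng.integers(0, 2, n).astype(float)      # 0 = avoids AI, 1 = uses AI
trust = 5 - 1.2 * ai_use + rng.normal(0, 1, n)    # hypothesised mediator
help_intent = 0.8 * trust - 0.3 * ai_use + rng.normal(0, 1, n)

total = sm.OLS(help_intent, sm.add_constant(ai_use)).fit()
direct = sm.OLS(help_intent, sm.add_constant(np.column_stack([ai_use, trust]))).fit()

print("total effect of AI use:    ", round(total.params[1], 2))
print("direct effect (trust held):", round(direct.params[1], 2))
print("indirect (mediated) effect:", round(total.params[1] - direct.params[1], 2))
```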
-
Authors: L Luettgau, HR Kirk, K Hackenburg, J Bergs, H Davidson, H Ogden, D Siddarth, S Huang
Year: 2025
Published in: arXiv preprint
Institution: AI Security Institute, AI Policy Directorate, Collective Intelligence Project, Anthropic
Research Area: Experimental evaluation, RCT, Survey Research
Discipline: Computer Science, Human–Computer Interaction (HCI)
Conversational AI is as effective as self-directed internet searches in increasing political knowledge, reducing misinformation beliefs, and promoting accuracy among users in the UK during the 2024 election period.
Methods: A national survey (N=2,499) measured conversational AI usage for political information-seeking, followed by a series of randomised controlled trials (N=2,858) comparing conversational AI to self-directed internet search in improving political knowledge.
Key Findings: Extent of conversational AI usage for political knowledge-seeking in the UK and its efficacy in enhancing political knowledge and reducing misinformation compared to traditional internet searches.
Citations: 3
Sample Size: 5357
-
Authors: Y Ba, MV Mancenido, EK Chiou, R Pan
Year: 2025
Published in: Behavior Research Methods, 2025 - Springer
Institution: University of Delaware, National Taiwan University, University of British Columbia, Monash University
Research Area: Crowdsourcing, Data Quality, Spamming Behavior Detection, LLM Applications in Behavioral Research
Discipline: Computer Science, Artificial Intelligence, LLM
The paper introduces a systematic method to evaluate crowdsourced data quality and detect spam behaviors through variance decomposition, proposing a spammer index and credibility metrics to improve consistency and reliability in labeling tasks.
Methods: Variance decomposition, Markov chain models, and generalized random effects models were used to assess annotator consistency and credibility; metrics were applied to both simulated and real-world data from two crowdsourcing platforms.
Key Findings: Quality of crowdsourced data, spammer behaviors, annotators’ consistency, and credibility.
Citations: 2
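A minimal sketch of the underlying intuition, assuming a matrix of annotator-by-item ratings; the index below (share of an annotator's variance unexplained by the item consensus) is an illustrative stand-in for the paper's variance-decomposition metrics.

```python
import numpy as np

def spammer_index(ratings: np.ndarray) -> np.ndarray:
    """ratings: (n_annotators, n_items) numeric labels; high index = spammer-like."""
    consensus = ratings.mean(axis=0)          # per-item consensus signal
    scores = []
    for r in ratings:
        if r.var() == 0:                      # constant responder: pure noise
            scores.append(1.0)
            continue
        explained = np.corrcoef(r, consensus)[0, 1] ** 2
        scores.append(1.0 - explained)        # variance not explained by items
    return np.array(scores)

# Two diligent annotators track the items; the third answers at random.
rng = np.random.default_rng(0)
items = rng.normal(size=50)
ratings = np.vstack([items + 0.1 * rng.normal(size=50),
                     items + 0.2 * rng.normal(size=50),
                     rng.normal(size=50)])
print(spammer_index(ratings).round(2))        # last value is close to 1
```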
-
Authors: A Warrier, D Nguyen, M Naim, M Jain, Y Liang, K Schroeder, C Yang, JB Tenenbaum, S Vollmer, K Ellis, Z Tavares
Year: 2025
Published in: arXiv preprint, 2025
Institution: Basis Research Institute, DFKI GmbH, Harvard University, Quebec AI Institute, University of Cambridge, Massachusetts Institute of Technology, Cornell University
Research Area: Agent learning, World Models, Benchmarking, Evaluation protocols, RLHF, LLM
Discipline: Computer Science, Artificial Intelligence, Machine Learning
The paper introduces WorldTest, a novel protocol for evaluating model-learning agents using reward-free exploration and behavior-based scoring, and demonstrates that humans outperform models on the AutumnBench suite of tasks, revealing significant gaps in world-model learning.
Methods: The authors proposed WorldTest, a protocol separating reward-free interaction from scored tests in related environments, with evaluations done using AutumnBench—a dataset of 43 grid-world environments and 129 tasks across prediction, planning, and causal dynamics.
Key Findings: Performance of model-learning agents and humans in acquiring world models for masked-frame prediction, planning, and understanding causal dynamics.
Citations: 1
Sample Size: 517
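A minimal sketch of the two-phase protocol, using a toy line-world environment and invented reset/step/predict_next interfaces; nothing here is the benchmark's actual API.

```python
import random

class ToyLineEnv:
    """Stand-in environment: state is an integer position on a line."""
    def reset(self) -> int:
        self.pos = 0
        return self.pos
    def step(self, action: int) -> tuple[int, bool]:
        self.pos += action                    # action in {-1, +1}
        return self.pos, abs(self.pos) > 5    # (observation, episode done)

class RandomAgent:
    def act(self, obs: int) -> int:
        return random.choice([-1, 1])
    def predict_next(self, obs: int, action: int) -> int:
        return obs + action                   # a correct world model for this env

def world_test(agent, env, explore_steps, test_cases) -> float:
    # Phase 1: reward-free interaction; no reward signal is ever exposed.
    obs = env.reset()
    for _ in range(explore_steps):
        obs, done = env.step(agent.act(obs))
        if done:
            obs = env.reset()
    # Phase 2: behaviour-based scoring on a derived task (next-state prediction).
    hits = sum(agent.predict_next(s, a) == target for s, a, target in test_cases)
    return hits / len(test_cases)

env = ToyLineEnv()
print(world_test(RandomAgent(), env, 100, [(0, 1, 1), (2, -1, 1), (3, 1, 4)]))
```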
-
Authors: LS Treiman, CJ Ho, W Kool
Year: 2025
Published in: Proceedings of the 2025 ACM Conference ..., 2025
Institution: Washington University in St. Louis, National Cheng Kung University
Research Area: Human-AI Interaction, Cognitive Science, Behavioral Research in AI Training
Discipline: Human-Computer Interaction (HCI), Behavioral Science
Participants tend to rely on intuition (fast thinking) rather than deliberation (slow thinking) when training AI agents in the ultimatum game, impacting human-AI collaboration system design.
Methods: Participants trained an AI agent in the ultimatum game to analyze whether their training decisions aligned more with intuitive or deliberative cognitive processes.
Key Findings: The cognitive processes (fast vs. slow thinking) underlying human decision-making during AI training.
DOI: https://doi.org/10.1145/3715275.3732177
Citations: 1
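A minimal sketch of learning a responder policy in the ultimatum game from human accept/reject decisions, using a hand-rolled logistic regression; the data and model are illustrative, not the study's training interface.

```python
import numpy as np

def fit_acceptance_model(offers, accepted, lr=0.5, epochs=2000):
    """Logistic model: P(accept | offer share) = sigmoid(w * share + b)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(w * offers + b)))
        grad = p - accepted                   # gradient of the log-loss wrt z
        w -= lr * np.mean(grad * offers)
        b -= lr * np.mean(grad)
    return w, b

# Hypothetical human training data: this trainer rejects offers below 30% of
# the pot, an "intuitive" fairness norm rather than the payoff-maximising
# policy of accepting any positive offer.
offers = np.array([0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50])
accepted = (offers >= 0.30).astype(float)
w, b = fit_acceptance_model(offers, accepted)
print("P(accept 25% offer):", round(float(1 / (1 + np.exp(-(w * 0.25 + b)))), 2))
```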
-
Authors: A Qian, R Shaw, L Dabbish, J Suh, H Shen
Year: 2025
Published in: arXiv preprint, 2025
Institution: Carnegie Mellon University, University of Pittsburgh, University of Utah, Yale School of Medicine, Yale University
Research Area: Responsible AI, Content Moderation, Risk Disclosure, Worker Well-being in Human-Computer Interaction (HCI)
Discipline: Computational Social Science, Human-Computer Interaction (HCI)
The paper examines how task designers approach well-being risk disclosure in Responsible AI (RAI) content work, highlighting a need for better frameworks to communicate such risks effectively.
Methods: Interviews were conducted with 23 task designers from academic and industry sectors to gather insights on risk recognition, interpretation, and communication practices.
Key Findings: How task designers recognize, interpret, and communicate well-being risks in RAI content work.
Citations: 1
Sample Size: 23
-
Authors: Z Ashktorab, A Buccella, J D'Cruz, Z Fowler, A Gill, KY Leung, PD Magnus, J Richards
Year: 2025
Published in: arXiv preprint arXiv:2507.02745, 2025
Institution: IBM Research, University at Albany
Research Area: Human–AI interaction, AI systems evaluation, UX, User Experience
Discipline: Computer Science, Human–Computer Interaction (HCI)
In a preregistered study with 162 participants, people generally prefer explanatory apologies from LLM chatbots over rote or purely empathic ones—though in biased error scenarios empathic apologies are sometimes favored—highlighting the complexity of designing chatbot apologies that effectively repair trust.
DOI: https://doi.org/10.48550/arXiv.2507.02745
Citations: 1
-
Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025
Institution: Google DeepMind, Google Research, Google
Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human–AI interaction, LLM
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: The study involved collecting feedback across 1000 prompts from demographically intersectional human raters to capture diverse safety perspectives, with an emphasis on empirical and contextual differences in harm perception.
Key Findings: Safety perceptions of text-to-image (T2I) model outputs from diverse demographic viewpoints and the influence of these perspectives on alignment strategies.
Citations: 1
Sample Size: 1000
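A minimal sketch of surfacing demographic variation in harm ratings, assuming a long-format table of (prompt, rater group, rating); column names and values are hypothetical, not the DIVE schema.

```python
import pandas as pd

ratings = pd.DataFrame({
    "prompt_id":   [1, 1, 1, 2, 2, 2],
    "rater_group": ["A", "B", "C", "A", "B", "C"],
    "harm_rating": [1, 4, 2, 5, 5, 1],        # 1 = benign ... 5 = severe harm
})

# Mean rating per demographic group; a large spread flags prompts where a
# single aggregate label would erase minority safety perspectives.
by_group = ratings.groupby(["prompt_id", "rater_group"])["harm_rating"].mean().unstack()
by_group["spread"] = by_group.max(axis=1) - by_group.min(axis=1)
print(by_group.sort_values("spread", ascending=False))
```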
-
Authors: J Zhu, A Molnar
Year: 2025
Published in: arXiv preprint
Institution: University of Michigan
Research Area: Social Psychology, Human-AI Interaction, Generative AI Impact on Social Perception
Discipline: Social Science, Social Psychology, Human-Computer Interaction (HCI)
Impressions of written messages are overly positive when recipients are unaware of potential Generative AI (GenAI) use, but turn negative when GenAI use is explicitly disclosed.
Methods: A pre-registered large-scale online experiment leveraged Prolific participants to assess social impressions in diverse communication contexts, with varying levels of sender disclosure regarding GenAI use.
Key Findings: The influence of known or uncertain GenAI use on recipients' social impressions of message senders across different personal and professional contexts.
Sample Size: 647
-
Authors: K Grosse, N Ebert
Year: 2025
Published in: arXiv preprint
Institution: IBM Research, ZHAW
Research Area: Security and privacy risks, LLM, human–AI interaction, AI Safety
Discipline: Computer Science
A survey of 3,270 UK adults reveals significant security and privacy risks in AI conversational agent usage: a third engage in risky behaviors that enable attacks, and many are unaware of how their data are used or how to opt out.
Methods: Representative survey conducted via Prolific platform targeting UK adults, focusing on usage behaviors of AI conversational agents.
Key Findings: User behaviors related to security and privacy risks, data sanitization practices, attempts to jailbreak AI models, and awareness of data usage policies.
Sample Size: 3270
-
Authors: L Ma, J Qin, X Xu, Y Tan
Year: 2025
Published in: arXiv preprint arXiv:2509.14436, 2025
Institution: University of North Carolina Charlotte, University of Science and Technology of China, University of Washington
Research Area: LLM behavior, Algorithmic content preference, Human–AI interaction
Discipline: Computer Science, Information Retrieval, Artificial Intelligence
This paper studies how generative search engines built on large language models (LLMs), such as Google's AI Overviews, select and cite web content. It shows that these engines prefer content that is more predictable and semantically coherent for the model, and that LLM-based content polishing can increase the diversity and usefulness of AI summaries for users. A small perplexity-based illustration follows below.
DOI: https://doi.org/10.48550/arXiv.2509.14436
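A minimal sketch of the "predictable to the model" notion, using GPT-2 perplexity as an illustrative proxy; this is not the paper's measurement pipeline or the engines' actual selection mechanism.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss    # mean next-token cross-entropy
    return torch.exp(loss).item()

passages = [
    "The capital of France is Paris, a major European city.",
    "Colorless green ideas sleep furiously beneath the quantum stapler.",
]
# Lower perplexity = more predictable; by the paper's account, such passages
# are more likely to be selected and cited by generative search engines.
print(sorted(passages, key=perplexity)[0])
```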
-
Authors: A Klingbeil, C Grützner, P Schreck
Year: 2024
Published in: Computers in Human Behavior, 2024 - Elsevier
Institution: University of Hohenheim
Research Area: Trust in AI, Overreliance on AI, Human-AI Interaction
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence, Behavioral Science
The study found that individuals tend to over-rely on AI-generated advice in uncertain situations, often to the detriment of themselves and third parties, even when contextual information or their own judgment contradicts the advice.
Methods: A domain-independent, incentivized, interactive behavioral experiment was conducted to analyze user behavior in decision-making scenarios involving AI advice.
Key Findings: Extent and impact of user reliance on AI advice, including its effects on decision efficiency and outcomes for themselves and others.
DOI: https://doi.org/10.1016/j.chb.2024.108352
Citations: 247