Explore 40 peer-reviewed papers in Computer Science (2024–2026). This page lists the papers in the discipline of Computer Science from the Prolific Citations Library, a curated collection of academic research powered by high-quality human data from Prolific.
-
Authors: C Yuan, B Ma, Z Zhang, B Prenkaj, F Kreuter, G Kasneci
Year: 2026
Published in: arXiv preprint arXiv:2601.08634, 2026
Institution: Munich Center for Machine Learning, LMU Munich, Technical University of Munich
Research Area: Artificial Intelligence, AI Ethics, AI Alignment, Political Science, Computational Social Science
Discipline: Computer Science, Natural Language Processing (NLP)
This paper examines how large language models' (LLMs) political outputs shift when the models are explicitly primed with different moral values. Instead of simply assigning personas (like “pretend to be liberal”), the authors condition models to endorse or reject specific moral values (e.g., utilitarianism, fairness, authority), and then measure how those moral primes shift the models' expressed political positions. A toy illustration of the priming setup follows below.
DOI: https://doi.org/10.48550/arXiv.2601.08634
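A minimal sketch of the moral-priming setup, assuming a hypothetical query_model(prompt) callable for whichever LLM is under test; the prime texts, the issue, and the 1–7 scale are illustrative stand-ins, not the paper's materials.

```python
# Hypothetical moral-priming harness: condition the model on a moral value,
# then elicit a numeric political position on a fixed issue statement.
MORAL_PRIMES = {
    "fairness": "You deeply endorse fairness as a core moral value.",
    "authority": "You deeply endorse respect for authority as a core moral value.",
}

def primed_stance(query_model, prime: str, issue: str) -> str:
    prompt = (
        f"{MORAL_PRIMES[prime]}\n"
        f"On a scale from 1 (strongly oppose) to 7 (strongly support), "
        f"state your position on: {issue}\n"
        f"Answer with a single number."
    )
    return query_model(prompt)

# Stub usage: a fake model that always answers "4". Comparing the same issue
# under different primes isolates the prime's effect on the expressed stance.
print(primed_stance(lambda p: "4", "fairness", "raising the minimum wage"))
```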
-
Authors: L Dai, Z Wang, L Chen, J Jin
Year: 2026
Published in: scholarspace.manoa.hawaii.edu, 2026
Institution: Shanghai International Studies University
Research Area: Socio-Economic Impacts of AI, Algorithmic Systems
Discipline: Computer Science, Artificial Intelligence
AI errors lead to broader negative generalizations about other AI systems than human errors do, largely because AI is perceived as inflexible and unable to learn from its mistakes.
Methods: Four single-factor experiments across distinct contexts comparing human responses to AI errors and human errors.
Key Findings: Generalization of error perceptions from one AI system to others, and psychological mechanisms driving this process.
-
Authors: LM Schulze Buschoff, E Akata, M Bethge
Year: 2025
Published in: Nature Machine Intelligence, 2025
Institution: Max Planck Institute
Research Area: Visual Cognition, Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs)
Discipline: Cognitive Science, Artificial Intelligence, Computer Vision
Vision-based large language models show proficiency in visual data interpretation but fall short in human-like abilities for causal reasoning, intuitive physics, and social cognition.
Methods: Controlled experiments evaluating model performance on tasks related to intuitive physics, causal reasoning, and intuitive psychology using visual processing benchmarks.
Key Findings: Model capabilities in understanding physical interactions, causal relationships, and social preferences.
DOI: https://doi.org/10.1038/s42256-024-00963-y
Citations: 70
-
Authors: K Dalal, D Koceja, G Hussein, J Xu, Y Zhao, Y Song, S Han, KC Cheung, J Kautz, C Guestrin, T Hashimoto, S Koyejo, Y Choi, Y Sun, X Wang
Year: 2025
Published in: arXiv preprint
Institution: Nvidia, Stanford University, UT Austin, University of California Berkeley, University of California San Diego
Research Area: Video Generation, Diffusion Models, Test-Time Training
Discipline: Computer Science
The paper introduces Test-Time Training (TTT) layers into Transformers to generate coherent one-minute videos from text storyboards, outperforming baselines in storytelling coherence but facing efficiency and artifact challenges.
Methods: Experimentation with Test-Time Training layers embedded in pre-trained Transformer models, evaluated using a dataset curated from Tom and Jerry cartoons and compared against Mamba 2, Gated DeltaNet, and sliding-window attention layers.
Key Findings: Effectiveness of video generation methods in creating coherent multi-scene stories in one-minute videos.
Citations: 52
Sample Size: 100
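A minimal sketch of the test-time training idea, assuming a linear fast-weight layer whose hidden state is a weight matrix updated by one gradient step per token on a self-supervised reconstruction loss; the dimensions, loss, and update rule are illustrative, not the paper's architecture.

```python
import torch

class TTTLinearLayer(torch.nn.Module):
    """Fast weights W are trained online as the sequence streams through."""
    def __init__(self, dim: int, lr: float = 0.1):
        super().__init__()
        self.dim, self.lr = dim, lr
        # Learned projections defining the self-supervised task.
        self.key = torch.nn.Linear(dim, dim, bias=False)
        self.value = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = torch.zeros(self.dim, self.dim)   # test-time "hidden state"
        outputs = []
        for t in range(x.size(0)):            # x: (seq_len, dim)
            k, v = self.key(x[t]), self.value(x[t])
            pred = k @ W
            grad = torch.outer(k, pred - v)   # d/dW of 0.5 * ||k @ W - v||^2
            W = W - self.lr * grad            # one test-time gradient step
            outputs.append(x[t] @ W)          # query the updated fast weights
        return torch.stack(outputs)

layer = TTTLinearLayer(dim=16)
tokens = torch.randn(8, 16)                   # stand-in for a token sequence
print(layer(tokens).shape)                    # torch.Size([8, 16])
```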
-
Authors: L Ibrahim, C Akbulut, R Elasmar, C Rastogi, M Kahng, MR Morris, KR McKee, V Rieser, M Shanahan, L Weidinger
Year: 2025
Published in: arXiv preprint arXiv:2502.07077, 2025
Institution: Google DeepMind, Google, University of Oxford
Research Area: Multimodal Conversational AI, Evaluation Methodology, Benchmarking
Discipline: Computer Science, Natural Language Processing (NLP), Human–Computer Interaction (HCI)
The paper evaluates anthropomorphic behaviors in state-of-the-art LLMs through a multi-turn methodology, showing that such behaviors, including empathy and relationship-building, predominantly emerge after multiple interactions and influence user perceptions.
Methods: Multi-turn evaluation of 14 anthropomorphic behaviors using simulations of user interactions, validated by a large-scale human subject study.
Key Findings: Anthropomorphic behaviors in large language models, including relationship-building and pronoun usage, and their perception by users.
Citations: 26
Sample Size: 1101
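A minimal sketch of a multi-turn evaluation loop of the kind described above, assuming hypothetical target_model, simulated_user, and classify callables; the interfaces are invented for illustration, not the paper's harness.

```python
def multi_turn_eval(target_model, simulated_user, classify, n_turns=5):
    """Run a simulated conversation and count tagged behaviours per reply."""
    transcript, counts = [], {}
    user_msg = simulated_user(transcript)
    for _ in range(n_turns):
        reply = target_model(transcript + [("user", user_msg)])
        transcript += [("user", user_msg), ("assistant", reply)]
        # Tag each assistant turn for anthropomorphic behaviours
        # (e.g. relationship-building, first-person pronoun use).
        for behaviour in classify(reply):
            counts[behaviour] = counts.get(behaviour, 0) + 1
        user_msg = simulated_user(transcript)
    return counts

# Stub usage: the "model" always claims partnership; the classifier flags
# first-person pronouns. Multi-turn runs surface behaviours that single-turn
# probes miss.
print(multi_turn_eval(
    target_model=lambda t: "I think we make a great team!",
    simulated_user=lambda t: "Can you help me plan my week?",
    classify=lambda reply: ["first-person pronouns"] if " I " in f" {reply} " else [],
))
```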
-
Authors: U Messer
Year: 2025
Published in: Computers in Human Behavior: Artificial Humans, 2025 - Elsevier
Institution: Universität der Bundeswehr München
Research Area: Political Bias in Generative AI, Human-AI Interaction, Affective Computing, AI Bias
Discipline: Computer Science, Human-AI Interaction
People's acceptance and reliance on Generative AI (GAI) increase when they perceive alignment between their political orientation and the bias of GAI-generated content, leading to expanded trust in sensitive applications.
Methods: Three experiments analyzing behavioral reactions to politically biased content generated by GAI, including the impact of perceived alignment on acceptance and trust.
Key Findings: Participants' acceptance, reliance, and trust in GAI based on perceived alignment between political bias of GAI-generated content and their own political beliefs.
DOI: https://doi.org/10.1016/j.chbah.2024.100108
Citations: 24
Sample Size: 513
-
Authors: S Lambiase, G Catolino, F Palomba, F Ferrucci, D Russo
Year: 2025
Published in: ACM Transactions on Software Engineering and Methodology, 2025
Institution: University of Salerno, Aalborg University
Research Area: Technology Adoption, Software Engineering Practices, Socio-Technical Research
Discipline: Computer Science, Software Engineering, Human–Computer Interaction (HCI)
The study uses survey data from software professionals and Partial Least Squares Structural Equation Modeling (PLS-SEM) to measure the role of cultural values, relative to established predictors such as performance expectancy and habitual use, in LLM adoption.
Citations: 11
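A minimal sketch of the analysis idea, reduced to unit-weighted composite scores and ordinary least squares for illustration; real PLS-SEM iteratively estimates indicator weights (tools such as SmartPLS or R's plspm do this properly), and all variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
# Simulated survey indicators for two latent constructs (three items each).
cultural_values = rng.normal(size=(n, 3))     # e.g. uncertainty-avoidance items
perf_expectancy = rng.normal(size=(n, 3))     # e.g. perceived-usefulness items
adoption = (0.2 * cultural_values.mean(1)
            + 0.6 * perf_expectancy.mean(1)
            + rng.normal(0, 0.5, n))

# Unit-weighted composite scores stand in for PLS latent variable scores.
X = np.column_stack([cultural_values.mean(1), perf_expectancy.mean(1), np.ones(n)])
beta, *_ = np.linalg.lstsq(X, adoption, rcond=None)
print("path coefficients [culture, performance expectancy]:", beta[:2].round(2))
```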
-
Authors: M Cheng, C Lee, P Khadpe, S Yu, D Han
Year: 2025
Published in: arXiv preprint, 2025
Institution: Stanford University, Carnegie Mellon University
Research Area: Computers and Society, Artificial Intelligence, Sycophancy
Discipline: Computer Science, Psychology
The study shows that sycophantic AI, which validates user inputs unquestioningly, reduces people's prosocial behavior and fosters dependence, despite users perceiving such AI as higher quality and more trustworthy.
Methods: The researchers conducted two preregistered experiments including a live-interaction study, where participants discussed real interpersonal conflicts with AI models. They evaluated responses from 11 state-of-the-art AI models on levels of sycophancy and its psychological effects on users.
Key Findings: The prevalence of sycophantic behavior in AI, users' prosocial intentions, conviction of being in the right, trust in AI, and willingness to reuse sycophantic AI models.
Citations: 5
Sample Size: 1604
-
Authors: C Chen, Z Cui
Year: 2025
Published in: Journal of Medical Internet Research, 2025
Institution: Medical College of Wisconsin
Research Area: Trust in AI, AI-assisted diagnosis, Health communication, Healthcare human-AI interaction
Discipline: Digital Health, Human-Computer Interaction (HCI), Behavioral Science
Patients trust, and are more likely to seek help from, doctors who explicitly avoid AI-assisted diagnosis rather than those who use AI moderately or extensively, revealing a strong aversion to AI in healthcare settings.
Methods: A randomized, web-based 4-group survey experiment was conducted with controls for sociodemographic factors and analysis using regression, mediation, and moderation techniques.
Key Findings: Trust in and intention to seek medical help from health care professionals using AI-assisted diagnosis versus those avoiding AI, and the influence of demographic, social, and experiential factors.
DOI: https://doi.org/10.2196/66083
Citations: 4
Sample Size: 1762
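A minimal sketch of the mediation logic on simulated data, in the classic Baron-Kenny style of comparing total and direct effects; the variable names (ai_use, trust, help_intent) are illustrative, not the study's measures, and the study also ran moderation analyses not shown here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
ai_use = rng.integers(0, 2, n).astype(float)      # 0 = avoids AI, 1 = uses AI
trust = 5 - 1.2 * ai_use + rng.normal(0, 1, n)    # hypothesised mediator
help_intent = 0.8 * trust - 0.3 * ai_use + rng.normal(0, 1, n)

total = sm.OLS(help_intent, sm.add_constant(ai_use)).fit()
direct = sm.OLS(help_intent, sm.add_constant(np.column_stack([ai_use, trust]))).fit()

print("total effect of AI use:    ", round(total.params[1], 2))
print("direct effect (trust held):", round(direct.params[1], 2))
print("indirect (mediated) effect:", round(total.params[1] - direct.params[1], 2))
```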
-
Authors: L Luettgau, HR Kirk, K Hackenburg, J Bergs, H Davidson, H Ogden, D Siddarth, S Huang
Year: 2025
Published in: arXiv preprint
Institution: AI Security Institute, AI Policy Directorate, Collective Intelligence Project, Anthropic
Research Area: Experimental evaluation, RCT, Survey Research
Discipline: Computer Science, Human–Computer Interaction (HCI)
Conversational AI is as effective as self-directed internet searches in increasing political knowledge, reducing misinformation beliefs, and promoting accuracy among users in the UK during the 2024 election period.
Methods: A national survey (N=2,499) measured conversational AI usage for political information-seeking, followed by a series of randomised controlled trials (N=2,858) comparing conversational AI to self-directed internet search in improving political knowledge.
Key Findings: Extent of conversational AI usage for political knowledge-seeking in the UK and its efficacy in enhancing political knowledge and reducing misinformation compared to traditional internet searches.
Citations: 3
Sample Size: 5357
-
Authors: Y Ba, MV Mancenido, EK Chiou, R Pan
Year: 2025
Published in: Behavior Research Methods, 2025 - Springer
Institution: University of Delaware, National Taiwan University, University of British Columbia, Monash University
Research Area: Crowdsourcing, Data Quality, Spamming Behavior Detection, LLM Applications in Behavioral Research
Discipline: Computer Science, Artificial Intelligence, LLM
The paper introduces a systematic method to evaluate crowdsourced data quality and detect spam behaviors through variance decomposition, proposing a spammer index and credibility metrics to improve consistency and reliability in labeling tasks.
Methods: Variance decomposition, Markov chain models, and generalized random effects models were used to assess annotator consistency and credibility; metrics were applied to both simulated and real-world data from two crowdsourcing platforms.
Key Findings: Quality of crowdsourced data, spammer behaviors, annotators’ consistency, and credibility.
Citations: 2
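A minimal sketch of the underlying intuition, assuming a matrix of annotator-by-item ratings; the index below (share of an annotator's variance unexplained by the item consensus) is an illustrative stand-in for the paper's variance-decomposition metrics.

```python
import numpy as np

def spammer_index(ratings: np.ndarray) -> np.ndarray:
    """ratings: (n_annotators, n_items) numeric labels; high index = spammer-like."""
    consensus = ratings.mean(axis=0)          # per-item consensus signal
    scores = []
    for r in ratings:
        if r.var() == 0:                      # constant responder: pure noise
            scores.append(1.0)
            continue
        explained = np.corrcoef(r, consensus)[0, 1] ** 2
        scores.append(1.0 - explained)        # variance not explained by items
    return np.array(scores)

# Two diligent annotators track the items; the third answers at random.
rng = np.random.default_rng(0)
items = rng.normal(size=50)
ratings = np.vstack([items + 0.1 * rng.normal(size=50),
                     items + 0.2 * rng.normal(size=50),
                     rng.normal(size=50)])
print(spammer_index(ratings).round(2))        # last value is close to 1
```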
-
Authors: A Warrier, D Nguyen, M Naim, M Jain, Y Liang, K Schroeder, C Yang, JB Tenenbaum, S Vollmer, K Ellis, Z Tavares
Year: 2025
Published in: arXiv preprint, 2025
Institution: Basis Research Institute, DFKI GmbH, Harvard University, Quebec AI Institute, University of Cambridge, Massachusetts Institute of Technology, Cornell University
Research Area: Agent learning, World Models, Benchmarking, Evaluation protocols, RLHF, LLM
Discipline: Computer Science, Artificial Intelligence, Machine Learning
The paper introduces WorldTest, a novel protocol for evaluating model-learning agents using reward-free exploration and behavior-based scoring, and demonstrates that humans outperform models on the AutumnBench suite of tasks, revealing significant gaps in world-model learning.
Methods: The authors proposed WorldTest, a protocol separating reward-free interaction from scored tests in related environments, with evaluations done using AutumnBench—a dataset of 43 grid-world environments and 129 tasks across prediction, planning, and causal dynamics.
Key Findings: Performance of model-learning agents and humans in acquiring world models for masked-frame prediction, planning, and understanding causal dynamics.
Citations: 1
Sample Size: 517
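A minimal sketch of the two-phase protocol, using a toy line-world environment and invented reset/step/predict_next interfaces; nothing here is the benchmark's actual API.

```python
import random

class ToyLineEnv:
    """Stand-in environment: state is an integer position on a line."""
    def reset(self) -> int:
        self.pos = 0
        return self.pos
    def step(self, action: int) -> tuple[int, bool]:
        self.pos += action                    # action in {-1, +1}
        return self.pos, abs(self.pos) > 5    # (observation, episode done)

class RandomAgent:
    def act(self, obs: int) -> int:
        return random.choice([-1, 1])
    def predict_next(self, obs: int, action: int) -> int:
        return obs + action                   # a correct world model for this env

def world_test(agent, env, explore_steps, test_cases) -> float:
    # Phase 1: reward-free interaction; no reward signal is ever exposed.
    obs = env.reset()
    for _ in range(explore_steps):
        obs, done = env.step(agent.act(obs))
        if done:
            obs = env.reset()
    # Phase 2: behaviour-based scoring on a derived task (next-state prediction).
    hits = sum(agent.predict_next(s, a) == target for s, a, target in test_cases)
    return hits / len(test_cases)

env = ToyLineEnv()
print(world_test(RandomAgent(), env, 100, [(0, 1, 1), (2, -1, 1), (3, 1, 4)]))
```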
-
Authors: LS Treiman, CJ Ho, W Kool
Year: 2025
Published in: Proceedings of the 2025 ACM Conference ..., 2025
Institution: Washington University in St. Louis, National Cheng Kung University
Research Area: Human-AI Interaction, Cognitive Science, Behavioral Research in AI Training
Discipline: Human-Computer Interaction (HCI), Behavioral Science
Participants tend to rely on intuition (fast thinking) rather than deliberation (slow thinking) when training AI agents in the ultimatum game, impacting human-AI collaboration system design.
Methods: Participants trained an AI agent in the ultimatum game to analyze whether their training decisions aligned more with intuitive or deliberative cognitive processes.
Key Findings: The cognitive processes (fast vs. slow thinking) underlying human decision-making during AI training.
DOI: https://doi.org/10.1145/3715275.3732177
Citations: 1
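A minimal sketch of learning a responder policy in the ultimatum game from human accept/reject decisions, using a hand-rolled logistic regression; the data and model are illustrative, not the study's training interface.

```python
import numpy as np

def fit_acceptance_model(offers, accepted, lr=0.5, epochs=2000):
    """Logistic model: P(accept | offer share) = sigmoid(w * share + b)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(w * offers + b)))
        grad = p - accepted                   # gradient of the log-loss wrt z
        w -= lr * np.mean(grad * offers)
        b -= lr * np.mean(grad)
    return w, b

# Hypothetical human training data: this trainer rejects offers below 30% of
# the pot, an "intuitive" fairness norm rather than the payoff-maximising
# policy of accepting any positive offer.
offers = np.array([0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50])
accepted = (offers >= 0.30).astype(float)
w, b = fit_acceptance_model(offers, accepted)
print("P(accept 25% offer):", round(float(1 / (1 + np.exp(-(w * 0.25 + b)))), 2))
```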
-
Authors: A Qian, R Shaw, L Dabbish, J Suh, H Shen
Year: 2025
Published in: arXiv preprint, 2025
Institution: Carnegie Mellon University, University of Pittsburgh, University of Utah, Yale School of Medicine, Yale University
Research Area: Responsible AI, Content Moderation, Risk Disclosure, Worker Well-being in Human-Computer Interaction (HCI)
Discipline: Computational Social Science, Human-Computer Interaction (HCI)
The paper examines how task designers approach well-being risk disclosure in Responsible AI (RAI) content work, highlighting a need for better frameworks to communicate such risks effectively.
Methods: Interviews were conducted with 23 task designers from academic and industry sectors to gather insights on risk recognition, interpretation, and communication practices.
Key Findings: How task designers recognize, interpret, and communicate well-being risks in RAI content work.
Citations: 1
Sample Size: 23
-
Authors: Z Ashktorab, A Buccella, J D'Cruz, Z Fowler, A Gill, KY Leung, PD Magnus, J Richards
Year: 2025
Published in: arXiv preprint arXiv:2507.02745, 2025
Institution: IBM Research, University at Albany
Research Area: Human–AI interaction, AI systems evaluation, UX, User Experience
Discipline: Computer Science, Human–Computer Interaction (HCI)
In a preregistered study with 162 participants, people generally prefer explanatory apologies from LLM chatbots over rote or purely empathic ones—though in biased error scenarios empathic apologies are sometimes favored—highlighting the complexity of designing chatbot apologies that effectively repair trust.
DOI: https://doi.org/10.48550/arXiv.2507.02745
Citations: 1
-
Authors: C Rastogi, TH Teh, P Mishra, R Patel, D Wang, M Díaz, A Parrish, AM Davani, Z Ashwood
Year: 2025
Published in: arXiv preprint arXiv:2507.13383, 2025
Institution: Google DeepMind, Google Research, Google
Research Area: AI alignment, safety evaluation, AI Safety, Multimodal evaluation, Human–AI interaction, LLM
Discipline: Computer Science, Machine Learning, Artificial Intelligence
This research introduces the DIVE dataset to enable pluralistic alignment in text-to-image models by accounting for diverse safety perspectives, revealing demographic variations in harm perception and advancing T2I model alignment strategies.
Methods: The study involved collecting feedback across 1000 prompts from demographically intersectional human raters to capture diverse safety perspectives, with an emphasis on empirical and contextual differences in harm perception.
Key Findings: Safety perceptions of text-to-image (T2I) model outputs from diverse demographic viewpoints and the influence of these perspectives on alignment strategies.
Citations: 1
Sample Size: 1000
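A minimal sketch of surfacing demographic variation in harm ratings, assuming a long-format table of (prompt, rater group, rating); column names and values are hypothetical, not the DIVE schema.

```python
import pandas as pd

ratings = pd.DataFrame({
    "prompt_id":   [1, 1, 1, 2, 2, 2],
    "rater_group": ["A", "B", "C", "A", "B", "C"],
    "harm_rating": [1, 4, 2, 5, 5, 1],        # 1 = benign ... 5 = severe harm
})

# Mean rating per demographic group; a large spread flags prompts where a
# single aggregate label would erase minority safety perspectives.
by_group = ratings.groupby(["prompt_id", "rater_group"])["harm_rating"].mean().unstack()
by_group["spread"] = by_group.max(axis=1) - by_group.min(axis=1)
print(by_group.sort_values("spread", ascending=False))
```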
-
Authors: J Zhu, A Molnar
Year: 2025
Published in: arXiv preprint
Institution: University of Michigan
Research Area: Social Psychology, Human-AI Interaction, Generative AI Impact on Social Perception
Discipline: Social Science, Social Psychology, Human-Computer Interaction (HCI)
Impressions of written messages are overly positive when recipients are unaware of potential Generative AI (GenAI) use, but turn negative when GenAI use is explicitly disclosed.
Methods: A pre-registered large-scale online experiment leveraged Prolific participants to assess social impressions in diverse communication contexts, with varying levels of sender disclosure regarding GenAI use.
Key Findings: The influence of known or uncertain GenAI use on recipients' social impressions of message senders across different personal and professional contexts.
Sample Size: 647
-
Authors: K Grosse, N Ebert
Year: 2025
Published in: arXiv preprint
Institution: IBM Research, ZHAW
Research Area: Security and privacy risks, LLM, human–AI interaction, AI Safety
Discipline: Computer Science
A survey of 3,270 UK adults reveals significant security and privacy risks in AI conversational agent usage: a third engage in risky behaviors that enable attacks, and many are unaware of how their data are used or how to opt out.
Methods: Representative survey conducted via Prolific platform targeting UK adults, focusing on usage behaviors of AI conversational agents.
Key Findings: User behaviors related to security and privacy risks, data sanitization practices, attempts to jailbreak AI models, and awareness of data usage policies.
Sample Size: 3270
-
Authors: L Ma, J Qin, X Xu, Y Tan
Year: 2025
Published in: arXiv preprint arXiv:2509.14436, 2025
Institution: University of North Carolina Charlotte, University of Science and Technology of China, University of Washington
Research Area: LLM behavior, Algorithmic content preference, Human–AI interaction
Discipline: Computer Science, Information Retrieval, Artificial Intelligence
This paper studies how generative search engines built on large language models (LLMs), such as Google's AI Overviews, select and cite web content. It shows that these engines prefer content that is more predictable and semantically coherent for the model, and that LLM-based content polishing can increase the diversity and usefulness of AI summaries for users. A small perplexity-based illustration follows below.
DOI: https://doi.org/10.48550/arXiv.2509.14436
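A minimal sketch of the "predictable to the model" notion, using GPT-2 perplexity as an illustrative proxy; this is not the paper's measurement pipeline or the engines' actual selection mechanism.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss    # mean next-token cross-entropy
    return torch.exp(loss).item()

passages = [
    "The capital of France is Paris, a major European city.",
    "Colorless green ideas sleep furiously beneath the quantum stapler.",
]
# Lower perplexity = more predictable; by the paper's account, such passages
# are more likely to be selected and cited by generative search engines.
print(sorted(passages, key=perplexity)[0])
```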
-
Authors: A Klingbeil, C Grützner, P Schreck
Year: 2024
Published in: Computers in Human Behavior, 2024 - Elsevier
Institution: University of Hohenheim
Research Area: Trust in AI, Overreliance on AI, Human-AI Interaction
Discipline: Human-Computer Interaction (HCI), Artificial Intelligence, Behavioral Science
The study found that individuals tend to over-rely on AI-generated advice in uncertain situations, often to the detriment of themselves and third parties, even when contextual information or their own judgment contradicts the advice.
Methods: A domain-independent, incentivized, interactive behavioral experiment was conducted to analyze user behavior in decision-making scenarios involving AI advice.
Key Findings: Extent and impact of user reliance on AI advice, including its effects on decision efficiency and outcomes for themselves and others.
DOI: https://doi.org/10.1016/j.chb.2024.108352
Citations: 247