Unlocking creativity with Artificial Intelligence: Field and experimental evidence on the Goldilocks (curvilinear) effect of human-AI collaboration.
Abstract
Humans create artificial intelligence (AI), but can AI help humans create? Numerous studies show how AI enhances productivity; however, little is known about creativity—another aspect of performance that requires higher level problem-solving. To understand how AI affects the creative process, I conducted two experiments by assigning 139 business professionals and 319 working adults to collaborate in varying degrees with ChatGPT on an entrepreneurial challenge. In contrast to the well-documented positive correlation between AI usage and productivity and early studies suggesting the same for creativity, the present research shows a Goldilocks (curvilinear) effect: Moderate (vs. low or high) human–AI collaboration increases creative performance. This effect, holding across general creativity rated by human judges (either the crowdsourced public or specific trained individuals), business values by entrepreneurs, and AI-evaluated creativity, is explained by the generation of new diverse ideas (i.e., knowledge diversity) rather than problem restructuring during the brainstorming stage. I further replicate the Goldilocks phenomenon with multisource–multiwave surveys among workers in the creative industries (N = 188). Overall, these findings provide timely insights to the broader public regarding the effective approach to working with AI tools, such as ChatGPT, in daily and professional life. This research emphasizes the importance of striking the right balance—not too little, not too much—when working with AI technologies. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
Study specs
Two experiments assigned 139 business professionals and 319 working adults to collaborate with ChatGPT to varying degrees. A follow-up multisource, multiwave survey of 188 workers in the creative industries replicated the findings.
- Authors: HCB Huang
- Institution: University of British Columbia
- Discipline: Experimental Psychology
- Sample Size: N = 646
- Study Type: Experimental Study
- Year: 2025
- Human Data Platform: Prolific
Measured Outcomes
The impact of varying degrees of human-AI collaboration on creative performance, measured as general creativity rated by human judges (crowdsourced or trained), business value rated by entrepreneurs, and AI-evaluated creativity.
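For readers unfamiliar with how a curvilinear (inverted-U) effect is usually checked, the sketch below shows one common approach: fit a quadratic regression and see whether the squared term is negative and the turning point falls inside the observed range. This is an illustration only, not the paper's analysis. The data are synthetic, and the names `collab`, `creativity`, `fit_quadratic`, and `turning_point` are hypothetical.

```python
import numpy as np


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x^2; returns (b0, b1, b2)."""
    # np.polyfit returns coefficients from the highest degree down.
    b2, b1, b0 = np.polyfit(x, y, deg=2)
    return b0, b1, b2


def turning_point(b1, b2):
    """x-value at which the fitted parabola reaches its extremum."""
    return -b1 / (2.0 * b2)


# Synthetic illustration only: these numbers are NOT from the study.
rng = np.random.default_rng(0)
collab = np.linspace(0.0, 1.0, 200)  # assumed scale: 0 = no AI use, 1 = full AI use
creativity = 3.0 + 4.0 * collab - 4.0 * collab**2 + rng.normal(0.0, 0.2, collab.size)

b0, b1, b2 = fit_quadratic(collab, creativity)
peak = turning_point(b1, b2)

# Inverted U: negative squared term and a peak inside the observed range.
is_inverted_u = bool(b2 < 0 and collab.min() < peak < collab.max())
print(f"b2={b2:.2f}, peak at {peak:.2f}, inverted U: {is_inverted_u}")
```

A negative squared term is a necessary sign of an inverted U rather than a sufficient one, so in practice it is usually paired with a check that the slope is positive below the peak and negative above it.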
Peer Review & Critical Discussion
Potential Selection Bias in 2023 Cohort
The participant pool shows a concerning overrepresentation of users from high-income demographics. Looking at Table 3, we can see that 78% of respondents had annual incomes above $75k, which significantly limits the generalizability of these findings to broader populations.
Non-naive Participants Issue
I've noticed a methodological concern regarding participant naivety. Given that Prolific users often complete multiple studies, there's a real risk that participants had prior exposure to similar experimental paradigms, which could confound the results.
RLHF Applicability to This Study Design
The implications for RLHF training pipelines are understated. If we accept the authors' conclusions about preference stability, this has direct consequences for how we should structure reward model training. The temporal decay effect described in Section 4.2 is particularly relevant.