Academic Publication

An Empirical Study of the Non-Determinism of ChatGPT in Code Generation

Citations: 160

Published Date: February 28, 2025

Research Abstract & Technology Focus

There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable, non-deterministically returning very different code for the same prompt. Such non-determinism affects the correctness and consistency of the generated code, undermines developers’ trust in LLMs, and yields low reproducibility in LLM-based papers. Nevertheless, there is no work investigating how serious this non-determinism threat is.

To fill this gap, this article conducts an empirical study on the non-determinism of ChatGPT in code generation. We chose to study ChatGPT because it is already highly prevalent in the code generation research literature. We report results from a study of 829 code generation problems across three code generation benchmarks (i.e., CodeContests, APPS and HumanEval) with three aspects of code similarities: semantic similarity, syntactic similarity, and structural similarity. Our results reveal that ChatGPT exhibits a high degree of non-determinism under the default setting: the ratio of coding tasks with zero equal test output across different requests is 75.76%, 51.00% and 47.56% for three different code generation datasets (i.e., CodeContests, APPS and HumanEval), respectively. In addition, we find that setting the temperature to 0 does not guarantee determinism in code generation, although it indeed brings less non-determinism than the default configuration (temperature = 1). In order to put LLM-based research on firmer scientific foundations, researchers need to take into account non-determinism in drawing their conclusions.
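The headline metric in the abstract, the ratio of coding tasks with zero equal test output across requests, can be illustrated with a small sketch. This is not the authors' implementation. It assumes one plausible reading of the metric, namely that a task counts when no two of its sampled responses produce identical test outputs. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def has_equal_pair(outputs_per_request):
    """Return True if any two requests produced identical test outputs."""
    return any(a == b for a, b in combinations(outputs_per_request, 2))


def zero_equal_output_ratio(tasks):
    """Fraction of tasks in which no two requests share the same test outputs.

    `tasks` maps a task id to a list with one entry per request; each entry
    is a tuple holding the outputs of the generated code on the task's tests.
    Assumption: this pairwise reading of "zero equal test output" is ours,
    not a definition taken from the paper.
    """
    if not tasks:
        raise ValueError("need at least one task")
    zero_equal = sum(
        1 for outputs in tasks.values() if not has_equal_pair(outputs)
    )
    return zero_equal / len(tasks)


# Hypothetical data: three requests per task, outputs recorded per test case.
sample_tasks = {
    "task_a": [("1", "2"), ("1", "3"), ("0", "2")],  # all requests differ
    "task_b": [("4",), ("4",), ("5",)],              # two requests agree
    "task_c": [("x",), ("y",), ("z",)],              # all requests differ
    "task_d": [("7",), ("7",), ("7",)],              # fully consistent
}

if __name__ == "__main__":
    print(f"{zero_equal_output_ratio(sample_tasks):.2%}")
```

In this toy data, two of the four tasks have no pair of requests with matching outputs, so the ratio is one half. Comparing this ratio between runs sampled at temperature 1 and at temperature 0 would mirror the comparison the abstract describes.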

AI Semantic Synergy Context

Connecting this academic literature to real-world market discussions and products.

Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues

Since its introduction in November 2022, ChatGPT has rapidly gained popularity due to its remarkable ability in language understanding and human-like responses. ChatGPT, based on GPT-3.5 architectu...

Self-Collaboration Code Generation via ChatGPT

Although large language models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex...

The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis

ChatGPT’s ‘Adult Mode’ Could Spark a New Era of Intimate Surveillance

OpenAI plans to allow sexting with ChatGPT. A human-AI interaction expert warns of a privacy nightmare.

Cite this Market Intelligence Report

Reference our AI-mapped synergy between this research and the commercial market to instantly build authority.

"Commercial Applications of An Empirical Study of the Non-Determinism of ChatGPT in Code Generation." ROIpad Intelligence Index, 2026. Available at: https://roipad.com/saas-metrics/research/cr_MTAuMTE0NS8zNjk3MDEw/an-empirical-study-of-the-non-determinism-of-chatgpt-in-code-generation

Commercial Realization

Startups and Open Source tools heavily associated with the concepts explored in this paper.

GitHub
DanOps-1/Gpt-Agreement-Payment
ChatGPT Plus/Team/Pro subscription protocol end-to-end replay toolkit · hCaptcha visual solving...
GitHub
study8677/awesome-architecture
🧭 Architecture-first system design: 26 bilingual tutorials, 25 arc...
Product Hunt
Study OS
A minimalist focus timer with tasks, notes & study music
Product Hunt
Skipper Study
Day Skipper theory practice, one round at a time
