Academic Publication

An Empirical Study of the Non-Determinism of ChatGPT in Code Generation

133
Citations
February 28, 2025
Published Date

Research Abstract & Technology Focus

There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable, non-deterministically returning very different code for the same prompt. Such non-determinism affects the correctness and consistency of the generated code, undermines developers’ trust in LLMs, and yields low reproducibility in LLM-based papers. Nevertheless, there is no work investigating how serious this non-determinism threat is.

To fill this gap, this article conducts an empirical study on the non-determinism of ChatGPT in code generation. We chose to study ChatGPT because it is already highly prevalent in the code generation research literature. We report results from a study of 829 code generation problems across three code generation benchmarks (i.e., CodeContests, APPS and HumanEval) with three aspects of code similarities: semantic similarity, syntactic similarity, and structural similarity. Our results reveal that ChatGPT exhibits a high degree of non-determinism under the default setting: the ratio of coding tasks with zero equal test output across different requests is 75.76%, 51.00% and 47.56% for three different code generation datasets (i.e., CodeContests, APPS and HumanEval), respectively. In addition, we find that setting the temperature to 0 does not guarantee determinism in code generation, although it indeed brings less non-determinism than the default configuration (temperature = 1). In order to put LLM-based research on firmer scientific foundations, researchers need to take into account non-determinism in drawing their conclusions.
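
The headline metric in the abstract, the ratio of coding tasks with zero equal test output across requests, can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading of the metric (a task counts when no two requests produce identical test outputs), and the function names and sample data below are hypothetical.

```python
from itertools import combinations


def has_zero_equal_outputs(outputs_per_request):
    """Return True if no two requests produced identical test outputs.

    Assumption: each element is the tuple of test outputs obtained by
    running the code returned for one request on the task's test cases.
    """
    return all(a != b for a, b in combinations(outputs_per_request, 2))


def zero_equal_ratio(tasks):
    """Fraction of tasks for which no pair of requests agrees on test outputs."""
    if not tasks:
        raise ValueError("need at least one task")
    flagged = sum(has_zero_equal_outputs(outs) for outs in tasks.values())
    return flagged / len(tasks)


# Hypothetical data: three requests per task, each with its test outputs.
tasks = {
    "task_a": [("1", "2"), ("1", "2"), ("1", "3")],  # two requests agree
    "task_b": [("x",), ("y",), ("z",)],              # all requests differ
    "task_c": [("ok",), ("ok",), ("ok",)],           # fully consistent
    "task_d": [("7", "8"), ("7", "9"), ("0", "8")],  # all requests differ
}

ratio = zero_equal_ratio(tasks)
print(f"{ratio:.2%}")
```

Under this reading, a higher ratio means more tasks for which repeated requests never agree, which is the sense in which the abstract reports 75.76%, 51.00% and 47.56% for the three benchmarks.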

AI Semantic Synergy Context

Connecting this academic literature to real-world market discussions and products.

crossref.org › academic paper
0%

Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues

Since its introduction in November 2022, ChatGPT has rapidly gained popularity due to its remarkable ability in language understanding and human-like responses. ChatGPT, based on GPT-3.5 architectu...

crossref.org › academic paper
0%

Self-Collaboration Code Generation via ChatGPT

Although large language models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex...

crossref.org › academic paper
0%

The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis

No description provided.

roipad.com › trend story
0%

ChatGPT’s ‘Adult Mode’ Could Spark a New Era of Intimate Surveillance

OpenAI plans to allow sexting with ChatGPT. A human-AI interaction expert warns of a privacy nightmare.

Frequently Asked Questions (FAQ)

Curated market intelligence mapped to this research.

Are there open-source GitHub repositories related to An Empirical Study of the Non-Determinism of ChatGPT in Code Generation?

Yes, open-source projects like DanOps-1/Gpt-Agreement-Payment (ChatGPT Plus/Team/Pro subscription protocol end-to-end replay toolkit · hCaptcha visual solver · empirical study of anti-fraud mechanisms / End-to-end protocol replay toolkit for ChatGPT Plus/Tea...) are actively building upon these concepts.

Which startups are commercializing the technology behind An Empirical Study of the Non-Determinism of ChatGPT in Code Generation?

Products like Study OS are bringing this to market. Their focus is: A minimalist focus timer with tasks, notes & study music.
