Academic Publication

A Systematic Literature Review on Automated Software Vulnerability Detection Using Machine Learning

Published Date: March 31, 2025

Research Abstract & Technology Focus

In recent years, numerous Machine Learning (ML) models, including Deep Learning (DL) and classic ML models, have been developed to detect software vulnerabilities. However, there is a notable lack of comprehensive and systematic surveys that summarize, classify, and analyze the applications of these ML models in software vulnerability detection. This absence may lead to critical research areas being overlooked or under-represented, resulting in a skewed understanding of the current state of the art in software vulnerability detection. To close this gap, we propose a comprehensive and systematic literature review that characterizes the different properties of ML-based software vulnerability detection systems using six major Research Questions (RQs).

Our systematic approach used a custom web scraper to extract a set of studies from four widely used online digital libraries: ACM Digital Library, IEEE Xplore, ScienceDirect, and Google Scholar. We manually analyzed the extracted studies to filter out work unrelated to software vulnerability detection, then created taxonomies and addressed the RQs. Our analysis indicates a significant upward trend in applying ML techniques for software vulnerability detection over the past few years. Prominent conference venues include the International Conference on Software Engineering (ICSE), the International Symposium on Software Reliability Engineering (ISSRE), the Mining Software Repositories (MSR) conference, and the ACM International Conference on the Foundations of Software Engineering (FSE), whereas Information and Software Technology (IST), Computers & Security (C&S), and Journal of Systems and Software (JSS) are the leading journal venues.

Our results reveal that 39.1% of the subject studies use hybrid sources, whereas 37.6% use benchmark data for software vulnerability detection. Code-based data are the most commonly used data type among the subject studies, with source code being the predominant subtype. Graph-based and token-based input representations are the most popular, accounting for 57.2% and 24.6% of the subject studies, respectively. Among the input embedding techniques, graph embedding and token vector embedding are the most frequently used, accounting for 32.6% and 29.7% of the subject studies, respectively. Additionally, 88.4% of the subject studies use DL models, with recurrent neural networks and graph neural networks being the most popular subcategories, whereas only 7.2% use classic ML models. Among the vulnerability types covered by the subject studies, CWE-119, CWE-20, and CWE-190 are the most frequent. Among model-building tools, Keras with a TensorFlow backend and PyTorch are the most frequently used, each appearing in 42 studies. In addition, Joern is the most popular tool for code representation, appearing in 24 studies.
Finally, we summarize the challenges and future directions in the context of software vulnerability detection, providing valuable insights for researchers and practitioners in the field.

AI Semantic Synergy Context

Connecting this academic literature to real-world market discussions and products.

Advancing cybersecurity: a comprehensive review of AI-driven detection techniques

Abstract: As the number and cleverness of cyber-attacks keep increasing rapidly, it's more important than ever to have good ways to detect and prevent them. Recognizing cyber threats quickly and accu...

GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning

Physics-informed machine learning: A comprehensive review on applications in anomaly detection and condition monitoring

E-Commerce Fraud Detection Based on Machine Learning Techniques: Systematic Literature Review

Address Snyk and Socket security audit findings in skill docs

Security audits by Snyk and Socket identified critical vulnerabilities in the 'codebase-to-course' skill, including risky credential handling, third-party content exposure from arbitrary repo intak...

Cite this Market Intelligence Report

"Commercial Applications of A Systematic Literature Review on Automated Software Vulnerability Detection Using Machine Learning." ROIpad Intelligence Index, 2026. Available at: https://roipad.com/saas-metrics/research/cr_MTAuMTE0NS8zNjk5NzEx/a-systematic-literature-review-on-automated-software-vulnerability-detection-using-machine-learning

Commercial Realization

Startups and Open Source tools heavily associated with the concepts explored in this paper.

GitHub: wanshuiyin/Auto-claude-code-research-in-sleep
ARIS ⚔️ (Auto-Research-In-Sleep). Lightweight Markdown-only skills...

GitHub: Narcooo/inkos
Autonomous novel writing CLI agent: AI agents write, audit, and re...

Product Hunt: Brila
One-page websites from real Google Maps reviews

Product Hunt: LaReview
Open-source free next-generation code review
