Academic Publication

Data-centric Artificial Intelligence: A Survey

Citations: 164
Published Date: May 31, 2025

Research Abstract & Technology Focus

Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler of its great success is the availability of abundant and high-quality data for building machine learning models. Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI. The attention of researchers and practitioners has gradually shifted from advancing model design to enhancing the quality and quantity of the data. In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals (training data development, inference data development, and data maintenance) and the representative methods. We also organize the existing literature from automation and collaboration perspectives, discuss the challenges, and tabulate the benchmarks for various tasks. We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle. We hope it can help the readers efficiently grasp a broad picture of this field, and equip them with the techniques and further research ideas to systematically engineer data for building AI systems. A companion list of data-centric AI resources will be regularly updated on https://github.com/daochenzha/data-centric-AI.

AI Semantic Synergy Context

Connecting this academic literature to real-world market discussions and products.

openalex.org › research concept
0%

Artificial Intelligence for Social Good: Transforming the Lives of Tribal Women Entrepreneurs

The convergence of Artificial Intelligence for Social Good (AI4SG) and the marginalized economic sectors is a boundary in the inclusive technological development. The study explores the potential o...

news.ycombinator.com › AI insight
0%

Show HN: Autoresearch@home

Autoresearch@home represents a significant step towards democratizing and decentralizing AI research, particularly in the realm of large language models. By framing itself as "SETI@home, but for mo...

roipad.com › trend story
0%

Dev runs data-center AI model on MacBook — and it changes everything

A developer just pulled off running a massive data-center AI model on a MacBook Pro. And it may show Apple is winning the AI race after all. (via Cult of Mac - Your source for the latest Apple news...

roipad.com › trend story
0%

An experimental guide to Answer Engine Optimization

How I optimized my marketing site for AI answer engines. Markdoc, llms.txt, middleware rewrites, and the signals that actually matter.

crossref.org › academic paper
0%

Promises and challenges of generative artificial intelligence for human learning


Frequently Asked Questions (FAQ)

Curated market intelligence mapped to this research.

What is the core focus of the research titled 'Data-centric Artificial Intelligence: A Survey'?

This literature focuses on: Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler of its great success is the availability of abundant and high-quality data for building machine learning models. Recently, the role of data in AI has ...

Are there open-source GitHub repositories related to Data-centric Artificial Intelligence: A Survey?

Yes, open-source projects like BigBodyCobain/Shadowbroker (Open-source intelligence for the global theater. Track everything from the corporate/private jets of the wealthy, and spy satellites, to seismic ev...) are actively building upon these concepts.

Which startups are commercializing the technology behind Data-centric Artificial Intelligence: A Survey?

Products like JusticeFlow - AI-Powered Legal Practice are bringing this to market. Their focus is: Legal Practice Management with AI-Powered Intelligence.

What other academic literature is closely related to 'Data-centric Artificial Intelligence: A Survey'?

Highly correlated activity was mapped. An entry titled 'Artificial Intelligence for Social Good: Transforming the Lives of Tribal Women Entrepreneurs' discusses this: The convergence of Artificial Intelligence for Social Good (AI4SG) and the marginalized economic sectors is a boundary in the inclusive technologic...

How is the concept of 'Data-centric Artificial Intelligence: A Survey' being discussed by engineers on Hacker News?

Highly correlated activity was mapped. An entry titled 'Show HN: Autoresearch@home' discusses this: Autoresearch@home represents a significant step towards democratizing and decentralizing AI research, particularly in the realm of large language m...

Commercial Realization

Startups and Open Source tools heavily associated with the concepts explored in this paper.

Associated Media Narrative