Data Quality, Strategic Importance
Data Set
AI Synthesis & Market Narrative
Data set quality is a critical concern: dozens of AI disease-prediction models were found to have been trained on dubious data, which highlights the risks of deploying such models. At the same time, large-scale data sets, such as pangenomes in genomics and seismic data in geophysics, underpin advanced research, and national performance in AI model development is increasingly benchmarked, underscoring the strategic importance of robust data.
Correlated Linguistic Patterns
["AI disease-prediction models trained on dubious data",
"pangenome",
"AI model performance",
"data quality"]
Curiosity Velocity (60 Days)
WIKIPEDIA API
Tracing the intersection of media narratives and actual public search interest. The dashed line is the 7-day simple moving average (SMA).
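For readers unfamiliar with the 7-day SMA mentioned in the caption, here is a minimal sketch of how such a smoothing line could be computed. It assumes a trailing window (each point averages that day and the six days before it). The chart's actual method is not stated, so the function name `simple_moving_average` and the sample numbers are purely illustrative.

```python
from collections import deque


def simple_moving_average(values, window=7):
    """Trailing simple moving average.

    Positions that do not yet have `window` observations are returned
    as None (an assumption; a real chart may handle the warm-up
    period differently).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    buf = deque()   # holds the most recent `window` values
    total = 0.0     # running sum of the values in `buf`
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / window if len(buf) == window else None)
    return out


# Hypothetical daily page-view counts, not real data.
daily_views = [10, 12, 9, 14, 20, 18, 15, 30]
sma = simple_moving_average(daily_views)
```

With a trailing window the smoothed line lags behind sudden spikes in interest, which is why a burst in the raw series shows up only gradually in the dashed line.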
Driving Media Context
Colossal Biosciences said it cloned red wolves. Is it for real?
If you want to capture something wolflike, it’s best to embark before dawn. So on a morning this January, with the eastern horizon still pink-hued, I drove w...
Dozens of AI disease-prediction models were trained on dubious data
The models are designed to predict someone’s risk of diabetes or stroke. A few might already have been used on patients.
logrittr: A Verbose Pipe Operator for Logging dplyr Pipelines
dplyr verbs are descriptive: let’s make them more verbose!
Yet another pipe for R.
Motivation
In SAS, every DATA step prints a log:
NOTE: There were...
Want to understand the current state of AI? Check out these charts.
If you’re following AI news, you’re probably getting whiplash. AI is a gold rush. AI is a bubble. AI is taking your job. AI can’t even read a clock. The 2026...
Swell-driven bursts of 26 s and 16 s seismic spectral peaks in the Gulf of Guinea
The study suggests that long-period seismic bursts are triggered by ocean swell through the activation of cracks and fluid movement in the crust, linking wav...
The DOJ Misled a Judge About How It’s Using Voter Roll Data
The acting head of the DOJ’s voting section told a judge last week that the agency had not touched the nonpublic voter roll data it has collected. That wasn’...
Question hangs as UFO data set for release: If aliens exist, what would they think of us?
For generations, human beings have wondered: What would alien life from another planet be like? But we rarely ask the opposite: What would they think of us? ...
Young People Are Falling Behind, but Not Because of AI
The case that AI is already stealing young people’s jobs is based on a statistical mirage.
Why I made a river my co-author
Anne Poelina gives first authorship to a source with deep knowledge about water — the river itself.
Same model, better shape: why centering improves MCMC
The Emergency departments leading the transformation of Alzheimer’s and dementia care (ED-LEAD) study, which I have written about in the past, is approaching...
SaaS Metrics