A dataset and analysis of 178 AI models' writing styles, identifying similarity clusters and distinctiveness based on 3,095 standardized AI responses.
Raw Developer Origin & Technical Request
Hacker News
Apr 8, 2026
We have a dataset of 3,095 standardized AI responses across 43 prompts. From each response, we extract a 32-dimension stylometric fingerprint (lexical richness, sentence structure, punctuation habits, formatting patterns, discourse markers).

Some findings:

- 9 clone clusters (>90% cosine similarity on z-normalized feature vectors)
- Mistral Large 2 and Large 3 2512 score 84.8% on a composite metric combining 5 independent signals
- Gemini 2.5 Flash Lite writes 78% like Claude 3 Opus. Costs 185x less
- Meta has the strongest provider "house style" (37.5x distinctiveness ratio)
- "Satirical fake news" is the prompt that causes the most writing convergence across all models
- "Count letters" causes the most divergence

The composite clone score combines: prompt-controlled head-to-head similarity, per-feature Pearson correlation across challenges, response length correlation, cross-prompt consistency, and aggregate cosine similarity.

Tech: stylometric extraction in Node.js, z-score normalization, cosine similarity for aggregate, Pearson correlation for per-feature tracking. Analysis script is ~1400 lines.
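
For readers who want to see how the named building blocks fit together, here is a minimal Node.js sketch of z-score normalization, cosine similarity, and Pearson correlation, plus a pairwise scan using the >90% cosine threshold mentioned above. This is not the actual analysis script: every function name (`zNormalize`, `cosineSimilarity`, `pearson`, `findClonePairs`) is hypothetical, the use of population standard deviation is an assumption, and the composite clone score is left out because the post does not say how its five signals are weighted.

```javascript
// Illustrative sketch only; names and conventions are assumptions, not the original script.

// Z-normalize each feature (column) across all fingerprints (rows).
// Assumes population standard deviation (divide by n).
function zNormalize(rows) {
  const n = rows.length;
  const dims = rows[0].length;
  const means = new Array(dims).fill(0);
  const stds = new Array(dims).fill(0);
  for (const row of rows) row.forEach((v, j) => { means[j] += v / n; });
  for (const row of rows) row.forEach((v, j) => { stds[j] += (v - means[j]) ** 2 / n; });
  for (let j = 0; j < dims; j++) stds[j] = Math.sqrt(stds[j]);
  // A zero-variance feature carries no signal, so it is mapped to 0.
  return rows.map((row) =>
    row.map((v, j) => (stds[j] === 0 ? 0 : (v - means[j]) / stds[j]))
  );
}

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Pearson correlation between two equal-length series,
// e.g. one feature's value per prompt for two models.
function pearson(x, y) {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    cov += (x[i] - meanX) * (y[i] - meanY);
    varX += (x[i] - meanX) ** 2;
    varY += (y[i] - meanY) ** 2;
  }
  if (varX === 0 || varY === 0) return 0;
  return cov / Math.sqrt(varX * varY);
}

// Return every pair of models whose z-normalized fingerprints
// exceed the cosine threshold (0.9 mirrors the ">90%" in the post).
function findClonePairs(names, rows, threshold = 0.9) {
  const z = zNormalize(rows);
  const pairs = [];
  for (let i = 0; i < z.length; i++) {
    for (let j = i + 1; j < z.length; j++) {
      const similarity = cosineSimilarity(z[i], z[j]);
      if (similarity > threshold) {
        pairs.push({ a: names[i], b: names[j], similarity });
      }
    }
  }
  return pairs;
}
```

With real data, `rows` would hold one aggregated 32-dimension fingerprint per model, and `pearson` would be called once per feature on the per-prompt values of two models. Grouping the resulting pairs into clusters (for example with union-find) is a further step this sketch does not attempt.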