A dataset and analysis of 178 AI models' writing styles, identifying similarity clusters and distinctiveness based on 3,095 standardized AI responses.
Raw Developer Origin & Technical Request
Hacker News
Apr 8, 2026
We have a dataset of 3,095 standardized AI responses across 43 prompts. From each response, we extract a 32-dimension stylometric fingerprint (lexical richness, sentence structure, punctuation habits, formatting patterns, discourse markers).

Some findings:

- 9 clone clusters (>90% cosine similarity on z-normalized feature vectors)
- Mistral Large 2 and Large 3 2512 score 84.8% on a composite metric combining 5 independent signals
- Gemini 2.5 Flash Lite writes 78% like Claude 3 Opus and costs 185x less
- Meta has the strongest provider "house style" (37.5x distinctiveness ratio)
- "Satirical fake news" is the prompt that causes the most writing convergence across all models
- "Count letters" causes the most divergence

The composite clone score combines: prompt-controlled head-to-head similarity, per-feature Pearson correlation across challenges, response length correlation, cross-prompt consistency, and aggregate cosine similarity.

Tech: stylometric extraction in Node.js, z-score normalization, cosine similarity for aggregate, Pearson correlation for per-feature tracking. Analysis script is ~1400 lines.
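
For readers who want to see the three building blocks named above in code, here is a minimal Node.js sketch of z-score normalization, cosine similarity, and Pearson correlation. It is not the actual analysis script: the function names (`zNormalize`, `cosineSimilarity`, `pearson`), the 3-feature toy fingerprints, and the model names are all made up for illustration. Only the 0.9 threshold comes from the post, and the composite clone score is not reproduced because its weights are not given.

```javascript
// Illustrative sketch only. Names and data are hypothetical, not from the
// author's script. Real fingerprints have 32 dimensions; these have 3.

// Z-normalize each feature (column) across all models (rows),
// using the population standard deviation.
function zNormalize(rows) {
  const n = rows.length;
  const dims = rows[0].length;
  const means = new Array(dims).fill(0);
  const variances = new Array(dims).fill(0);
  for (const row of rows) row.forEach((v, j) => { means[j] += v / n; });
  for (const row of rows) {
    row.forEach((v, j) => { variances[j] += (v - means[j]) ** 2 / n; });
  }
  // A constant feature has zero variance; map it to 0 to avoid dividing by 0.
  return rows.map((row) =>
    row.map((v, j) =>
      variances[j] > 0 ? (v - means[j]) / Math.sqrt(variances[j]) : 0
    )
  );
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Pearson correlation, e.g. one feature tracked for two models across prompts.
function pearson(x, y) {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    cov += (x[i] - meanX) * (y[i] - meanY);
    varX += (x[i] - meanX) ** 2;
    varY += (y[i] - meanY) ** 2;
  }
  if (varX === 0 || varY === 0) return 0;
  return cov / Math.sqrt(varX * varY);
}

// Made-up fingerprints for three hypothetical models.
const fingerprints = {
  modelA: [0.62, 18.0, 3.1],
  modelB: [0.61, 18.5, 3.0],
  modelC: [0.45, 11.0, 7.5],
};
const names = Object.keys(fingerprints);
const z = zNormalize(names.map((name) => fingerprints[name]));

// Flag pairs above the >90% cosine similarity threshold quoted in the post.
const CLONE_THRESHOLD = 0.9;
const clonePairs = [];
for (let i = 0; i < names.length; i++) {
  for (let j = i + 1; j < names.length; j++) {
    if (cosineSimilarity(z[i], z[j]) > CLONE_THRESHOLD) {
      clonePairs.push([names[i], names[j]]);
    }
  }
}
console.log(clonePairs);
```

Normalizing per feature before taking the cosine keeps large-scale features (such as sentence length) from swamping small-scale ones (such as a lexical richness ratio). Grouping the flagged pairs into clusters would need an extra step, which the post does not describe.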