Token compression/cost optimization for LLM interactions.
Raw Developer Origin & Technical Request
GitHub Issue
Apr 6, 2026
First, I want to say that this is a super fun project!
That said, I want to flag a couple of points you may already be aware of, because I think the current framing is misleading.
### 1. Tokens ≠ Words
The README uses "tokens" and "words" interchangeably ("75% less word", "few token do trick").
Tokens are subword units — "polymorphism" can be 3+ tokens, "useMemo" is 2. The 69→19
counts in the examples don't appear to be validated with an actual tokenizer, and look
closer to word counts on cherry-picked examples.
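A quick way to see why word counts and token counts diverge, without pulling in a real tokenizer like tiktoken, is OpenAI's published rule of thumb that one token is roughly four characters of English. The sample sentence below is hypothetical; real token counts require running an actual tokenizer.

```python
def word_count(text: str) -> int:
    # Naive whitespace word count, the kind the README examples appear to use
    return len(text.split())

def estimated_tokens(text: str) -> int:
    # Rough estimate only: ~4 characters per token (OpenAI's rule of thumb).
    # Identifiers like "useMemo" and long words like "polymorphism" split
    # into multiple subword tokens, so tokens usually outnumber words.
    return max(1, round(len(text) / 4))

sample = "useMemo caches the result of polymorphism-heavy render logic"
print(word_count(sample))       # 8 words
print(estimated_tokens(sample)) # ~15 tokens by the heuristic
```

Even this crude heuristic shows the token count running well ahead of the word count, which is why "69→19 words" cannot be read as "69→19 tokens".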
### 2. The skill itself costs input tokens
The skill file (~1.28 KB, ~300–350 tokens) is injected as context on **every request**.
The README only accounts for output token savings, but:
- Input tokens are also billed
- The skill overhead is **fixed per request**, regardless of response length
- For short responses, the overhead can exceed the savings
The net saving is `(output tokens saved) - (skill input tokens added)`. Break-even only
happens above a certain response length, so the "75% less cost" claim doesn't hold for
short interactions.
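The break-even point above can be sketched with the numbers from this issue (~325 skill tokens of fixed overhead, a claimed 75% output-token reduction). Both figures are assumptions taken from the text, not measurements, and this counts tokens only; a dollar break-even would also need to weight input and output tokens by their (usually different) prices.

```python
import math

SKILL_OVERHEAD = 325   # assumed: skill file injected as input on every request
SAVINGS_RATE = 0.75    # assumed: the README's claimed output-token reduction

def net_tokens_saved(baseline_output_tokens: int) -> float:
    # net saving = (output tokens saved) - (skill input tokens added)
    return baseline_output_tokens * SAVINGS_RATE - SKILL_OVERHEAD

def break_even_output_tokens() -> int:
    # Smallest baseline response length at which the skill pays for itself
    return math.ceil(SKILL_OVERHEAD / SAVINGS_RATE)

print(break_even_output_tokens())  # 434: responses shorter than this lose tokens
print(net_tokens_saved(100))       # -250.0: a short reply costs more with the skill
```

Under these assumptions a response has to be roughly 430+ tokens before the skill saves anything at all, which is longer than many chat turns.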
---
Happy to submit a PR with corrected wording if you're open to it.
Adjacent Repository Pain Points
Other highly discussed features and pain points extracted from JuliusBrussee/caveman.