Vocab file generation (`vocab.bin`) for the C decoder in Flash-MoE.
GitHub Issue
Mar 21, 2026
It seems this file is also missing. An LLM created it after some analysis, and it worked for me.
```
#!/usr/bin/env python3
"""Export vocab.bin for the C decoder (token_id -> string mapping).

Usage: python export_vocab.py [tokenizer.json] [output.bin]

Binary format (must match load_vocab() in infer.m):
    uint32_t num_entries
    uint32_t max_id
    for each entry (sorted by token_id):
        uint16_t byte_len
        char[byte_len] (UTF-8 bytes)
"""
import json
import struct
import sys
import os


def find_tokenizer():
    """Search common locations for tokenizer.json."""
    candidates = []
    if len(sys.argv) > 1:
        candidates.append(sys.argv[1])
    candidates.extend(
        [
            "tokenizer.json",
            "metal_infer/tokenizer.json",
            "../tokenizer.json",
        ]
    )
    # HuggingFace cache
    hf_base = os.path.expanduser("~/.cache/huggingface/hub")
    if os.path.isdir(hf_base):
        for root, dirs, files in os.walk(hf_base):
            if "tokenizer.json" in files:
                candidates.append(os.path.join(root, "tokenizer.json"))
    for p in candidates:
        if os.path.isfile(p):
            return p
    return None


def main():
    tok_path = find_tokenizer()
    out_path = (
        sys.argv[-1]
        if len(sys.argv) > 1 and not sys.argv[-1].endswith(".json")
        else "vocab.bin"
    )
    if not tok_path:
        print("ERROR: tokenizer.json not found. Pass as first argument:")
        print(" python export...
```
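To sanity-check a generated `vocab.bin`, the layout from the docstring above can be round-tripped with a few lines of Python. This is only a sketch, not the repository's code: `write_vocab` and `read_vocab` are hypothetical helper names, little-endian byte order is an assumption (the docstring does not state it), and so is the idea that the i-th record corresponds to token id i with `max_id` equal to the last index. Check these against `load_vocab()` in `infer.m` before relying on them.

```python
import struct


def write_vocab(path, tokens):
    """Write tokens (list of str; list index assumed to be the token id)."""
    with open(path, "wb") as f:
        # Header: uint32 num_entries, uint32 max_id.
        # Assumption: little-endian, and max_id is the last list index.
        max_id = len(tokens) - 1 if tokens else 0
        f.write(struct.pack("<II", len(tokens), max_id))
        for tok in tokens:
            data = tok.encode("utf-8")
            if len(data) > 0xFFFF:
                raise ValueError("token does not fit a uint16 length prefix")
            # Record: uint16 byte_len followed by the raw UTF-8 bytes.
            f.write(struct.pack("<H", len(data)))
            f.write(data)


def read_vocab(path):
    """Read the file back and return (num_entries, max_id, tokens)."""
    with open(path, "rb") as f:
        num_entries, max_id = struct.unpack("<II", f.read(8))
        tokens = []
        for _ in range(num_entries):
            (byte_len,) = struct.unpack("<H", f.read(2))
            # errors="replace" keeps the check going if a token is a
            # partial UTF-8 sequence (possible with byte-level vocabularies).
            tokens.append(f.read(byte_len).decode("utf-8", errors="replace"))
    return num_entries, max_id, tokens
```

Calling `read_vocab("vocab.bin")` on the exporter's output and comparing a few entries against `tokenizer.json` should show quickly whether the file matches what the decoder expects.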