v0.4.0: local embeddings via quantized Gemma 4 (no API cost)

safishamsi/graphify

Status: Open

Opened: Apr 6, 2026

## Summary

Add an optional local embedding pass using a quantized model (leading candidate: **Gemma 4**, Q4/Q8 via `llama.cpp` or `ollama`) to generate `semantically_similar_to` edges across all nodes without any API calls.

## Motivation

Currently, semantic similarity edges come from Claude's judgment during extraction. That is one pass per file, subjective, and paid for in API tokens. A local embedding pass would:

- Generate embeddings for every node (label + docstring) after the AST and semantic passes
- Add cosine-similarity edges above a configurable threshold, marked `INFERRED`
- Make cross-file concept linking exhaustive rather than sampled
- Work fully offline, with embeddings cached per node alongside the existing SHA256 file cache
- Cost zero API tokens after the initial model download

The two approaches complement rather than replace each other: Claude finds the *interesting* cross-cutting edges, while local embeddings find the *exhaustive* ones. Both end up in the same graph.

## Design

**Model**: Gemma 4 Q4 or Q8 via `llama.cpp` or `ollama`. Produces strong semantic embeddings for code + text at ~2-4 GB of RAM, no GPU required.

**Pipeline position**: after Part C (build + cluster), before export. Reads all node labels + docstrings, generates embeddings in batch, computes pairwise cosine similarity, and adds edges above the threshold.

**Threshold**: configurable, default ~0.82, exposed as `--embed-threshold 0.82`.

**Backend**: support both `llama-cpp-python` and `ollama` client, auto-detect which...
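The embedding pass described above (embed label + docstring in batch, compute pairwise cosine similarity, keep pairs above the threshold, mark them `INFERRED`, cache per node) can be sketched roughly as follows. This is a minimal pure-Python sketch under stated assumptions, not graphify's implementation: the node dict shape (`id`, `label`, `docstring`), the edge fields (`relation`, `confidence`, `score`), and the `embed_fn` hook are all hypothetical. `embed_fn` stands in for whichever backend (`llama-cpp-python` or `ollama`) gets auto-detected; `toy_embed` is a word-count stand-in that only exists so the example runs without a model. Cache keys are SHA256 hashes of each node's text, in the spirit of the existing SHA256 file cache.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical backend hook: takes a batch of texts, returns one vector per text.
# A real pass would wrap llama-cpp-python or ollama here.
EmbedFn = Callable[[List[str]], List[Vector]]


def node_text(node: dict) -> str:
    """Text to embed for a node: label plus docstring (hypothetical node shape)."""
    return f"{node['label']}\n{node.get('docstring') or ''}".strip()


def _normalize(v: Sequence[float]) -> Vector:
    """Scale to unit length so cosine similarity becomes a plain dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0] * len(v)


def embed_nodes(nodes: List[dict], embed_fn: EmbedFn, cache: Dict[str, Vector]) -> Dict[str, Vector]:
    """Embed every node, reusing vectors cached under the SHA256 of the node text."""
    keys = {n["id"]: hashlib.sha256(node_text(n).encode("utf-8")).hexdigest() for n in nodes}
    missing = [n for n in nodes if keys[n["id"]] not in cache]
    if missing:
        # One batched call covering every node not already in the cache.
        vectors = embed_fn([node_text(n) for n in missing])
        for n, v in zip(missing, vectors):
            cache[keys[n["id"]]] = _normalize(v)
    return {n["id"]: cache[keys[n["id"]]] for n in nodes}


def similarity_edges(vectors: Dict[str, Vector], threshold: float = 0.82) -> List[dict]:
    """Compare every pair of nodes and keep pairs whose cosine score is at or above threshold."""
    ids = sorted(vectors)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Vectors are unit length, so the dot product is the cosine similarity.
            score = sum(x * y for x, y in zip(vectors[a], vectors[b]))
            if score >= threshold:
                edges.append({
                    "source": a,
                    "target": b,
                    "relation": "semantically_similar_to",
                    "confidence": "INFERRED",  # hypothetical field name for the INFERRED marker
                    "score": round(score, 4),
                })
    return edges


def toy_embed(texts: List[str]) -> List[Vector]:
    """Stand-in embedder (word counts over a tiny vocabulary) so the sketch runs offline."""
    vocab = ["parse", "token", "graph", "edge", "cache"]
    return [[float(t.lower().count(w)) for w in vocab] for t in texts]


# Hypothetical example nodes.
NODES = [
    {"id": "a", "label": "parse_tokens", "docstring": "Parse a token stream."},
    {"id": "b", "label": "tokenize", "docstring": "Split text into token parse units."},
    {"id": "c", "label": "build_graph", "docstring": "Add each edge to the graph."},
]

if __name__ == "__main__":
    vectors = embed_nodes(NODES, toy_embed, cache={})
    print(similarity_edges(vectors, threshold=0.82))
```

Normalizing once, at cache time, means every later comparison is a plain dot product. The pairwise loop is O(n²) in the number of nodes; on large graphs a vectorized matrix product (e.g. with NumPy) or an approximate nearest-neighbour index would likely be needed, which this sketch does not attempt.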

