Gemini Executive Synthesis

Graphify's query mechanism, evolving from keyword-based BFS to embedding-based semantic search.

Technical Positioning
An AI coding assistant skill that turns code/docs into a queryable knowledge graph.
SaaS Insight & Market Implications
This issue proposes a major upgrade to Graphify's query functionality: replacing keyword-based BFS with embedding-based semantic search. Today a query only finds nodes whose labels contain the query's literal terms, which limits Graphify's usefulness as an "understanding tool." The proposal addresses this by embedding node labels and source context with a local backend (`sentence-transformers` with `all-MiniLM-L6-v2` by default) or, optionally, the OpenAI embeddings API or nomic-embed served through ollama, and then ranking nodes by cosine similarity to the query. Because queries match on conceptual meaning rather than exact keywords, developers can navigate unfamiliar codebases without knowing the relevant identifiers in advance. This makes Graphify a more capable tool for code comprehension and analysis. For B2B SaaS, the feature is a clear differentiator: it moves the product from basic search toward semantic code understanding, which matters for enterprise code intelligence and modernization work.
Proprietary Technical Taxonomy
BFS keyword matching, grep with graph traversal, embedding-based semantic search, find concepts by meaning, `sentence-transformers`, `all-MiniLM-L6-v2`, OpenAI embeddings API, nomic-embed via ollama

Raw Developer Origin & Technical Request

Source: GitHub Issue, Apr 6, 2026
Repo: safishamsi/graphify
v3: semantic query with embeddings

## Problem

The current `/graphify query` is BFS keyword matching: essentially grep with graph traversal. Searching "find what handles authentication" only works if the word "auth" appears in node labels.
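To make the limitation concrete, here is a minimal Python sketch of grep-style keyword seeding followed by BFS. It illustrates the behaviour under stated assumptions and is not Graphify's actual query code. The matching rule (any query word longer than three characters must appear as a substring of a node label), the function names `keyword_seeds` and `bfs`, and the tiny example graph are all hypothetical.

```python
from collections import deque


def keyword_seeds(labels: dict[str, str], query: str) -> list[str]:
    """Return nodes whose label contains any query word (grep-style substring match)."""
    words = [w for w in query.lower().split() if len(w) > 3]  # skip short filler words
    return [node for node, label in labels.items()
            if any(w in label.lower() for w in words)]


def bfs(edges: dict[str, list[str]], seeds: list[str], depth: int) -> set[str]:
    """Expand breadth-first from the seed nodes, up to `depth` hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for nbr in edges.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen


# Hypothetical graph: labels are node names, edges are adjacency lists.
labels = {
    "n1": "check_credentials",  # implements authentication, but the word never appears
    "n2": "hash_password",
    "n3": "render_dashboard",
}
edges = {"n1": ["n2"], "n2": ["n1"], "n3": []}

seeds = keyword_seeds(labels, "find what handles authentication")
print(seeds, bfs(edges, seeds, depth=2))  # [] set()
```

The node `check_credentials` does the authentication work, yet the query never reaches it because none of the query's words occur in its label. The traversal cannot recover from an empty seed set.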

## Goal

Replace keyword BFS with embedding-based semantic search so queries find concepts by meaning, not exact string match.

## Plan

**Embedding backend (local by default):**
- `sentence-transformers` with `all-MiniLM-L6-v2` (80MB, no API key, works offline)
- Optional: OpenAI embeddings API, nomic-embed via ollama
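One way to keep the backend pluggable is a small embedder interface that uses the local model by default. The sketch below relies on hypothetical names (`Embedder`, `make_embedder`, and a `"hashing"` backend). Only the `SentenceTransformer(...).encode(...)` call is the real `sentence-transformers` API. The hashing embedder is a dependency-free placeholder for tests. It captures word overlap only, not meaning, so it cannot substitute for `all-MiniLM-L6-v2`. OpenAI or nomic-embed backends would be further branches in `make_embedder`.

```python
import hashlib
import math
from typing import Protocol, Sequence


class Embedder(Protocol):
    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class SentenceTransformerEmbedder:
    """Local default from the plan: all-MiniLM-L6-v2, no API key, offline once downloaded."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2") -> None:
        self.model_name = model_name
        self._model = None  # loaded lazily so constructing the backend stays cheap

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        if self._model is None:
            from sentence_transformers import SentenceTransformer  # optional extra
            self._model = SentenceTransformer(self.model_name)
        vectors = self._model.encode(list(texts), normalize_embeddings=True)
        return vectors.tolist()


class HashingEmbedder:
    """Hypothetical test stand-in: feature-hashed bag of words, L2-normalised.
    Captures word overlap only, NOT meaning."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        out = []
        for text in texts:
            vec = [0.0] * self.dim
            for tok in text.lower().split():
                # sha256 gives a stable bucket across runs (built-in hash() is salted)
                bucket = int.from_bytes(hashlib.sha256(tok.encode()).digest()[:4], "little")
                vec[bucket % self.dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # empty text stays all zeros
            out.append([v / norm for v in vec])
        return out


def make_embedder(backend: str = "local") -> Embedder:
    """Pick a backend; OpenAI or ollama/nomic-embed would be extra branches here."""
    if backend == "local":
        return SentenceTransformerEmbedder()
    if backend == "hashing":
        return HashingEmbedder()
    raise ValueError(f"unknown embedding backend: {backend!r}")
```

In tests, `make_embedder("hashing")` avoids the model download. In normal use, the default `"local"` backend loads `all-MiniLM-L6-v2` on its first `embed` call.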

**What changes:**
- On graph build, embed every node label + source context, store vectors in `graph.json`
- `/graphify query` computes query embedding, ranks nodes by cosine similarity, then does BFS from top-k hits
- `semantically_similar_to` edge detection can use embeddings instead of LLM (faster, cheaper)
- Node similarity surfaced in graph visualization
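The query path in the plan (embed nodes at build time, rank them by cosine similarity, then run BFS from the top-k hits) can be sketched in plain Python, together with embedding-based `semantically_similar_to` detection. Several details are assumptions for illustration: the graph layout (`nodes` with an `embedding` field, `edges` with `source`/`target`), the 3-dimensional toy vectors, the query vector, and the 0.9 threshold. Real `all-MiniLM-L6-v2` vectors have 384 dimensions.

```python
import json
import math
from collections import deque
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_query(graph: dict, query_vec: list[float], top_k: int = 2, depth: int = 1):
    """Rank nodes by cosine similarity to the query, then BFS out from the top-k hits."""
    ranked = sorted(graph["nodes"], key=lambda n: cosine(query_vec, n["embedding"]), reverse=True)
    seeds = [n["id"] for n in ranked[:top_k]]
    adj: dict[str, list[str]] = {}
    for e in graph["edges"]:  # treat edges as undirected for traversal
        adj.setdefault(e["source"], []).append(e["target"])
        adj.setdefault(e["target"], []).append(e["source"])
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, d = frontier.popleft()
        if d < depth:
            for nbr in adj.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, d + 1))
    return seeds, seen


def similarity_edges(nodes: list[dict], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Propose `semantically_similar_to` pairs from embeddings instead of an LLM call."""
    return [(a["id"], b["id"]) for a, b in combinations(nodes, 2)
            if cosine(a["embedding"], b["embedding"]) >= threshold]


# Toy graph; in practice each embedding comes from label + source context at build time.
graph = {
    "nodes": [
        {"id": "check_credentials", "embedding": [0.9, 0.1, 0.0]},
        {"id": "hash_password", "embedding": [0.8, 0.2, 0.1]},
        {"id": "session_store", "embedding": [0.5, 0.5, 0.0]},
        {"id": "render_dashboard", "embedding": [0.0, 0.1, 0.9]},
    ],
    "edges": [
        {"source": "check_credentials", "target": "hash_password"},
        {"source": "check_credentials", "target": "session_store"},
        {"source": "session_store", "target": "render_dashboard"},
    ],
}
graph = json.loads(json.dumps(graph))  # vectors are plain float lists, so graph.json round-trips them

query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedding of "find what handles authentication"
print(semantic_query(graph, query_vec))
print(similarity_edges(graph["nodes"]))  # [('check_credentials', 'hash_password')]
```

The pairwise scan in `similarity_edges` is O(n²) in the number of nodes. A large graph would likely need an approximate nearest-neighbour index instead.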

**New optional dependency:**
```
pip install "graphifyy[embeddings]"
```
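Because `[embeddings]` is an optional extra, the semantic path has to cope with `sentence-transformers` being absent. Below is a minimal sketch using the standard-library `importlib.util.find_spec`. The helper names and the fallback to keyword BFS are hypothetical design choices, not something the issue specifies.

```python
import importlib.util


def embeddings_available() -> bool:
    """True if the optional `embeddings` extra (sentence-transformers) is importable."""
    # find_spec locates the package without importing it (and without loading any model)
    return importlib.util.find_spec("sentence_transformers") is not None


def query_mode() -> str:
    """Hypothetical policy: use semantic search when possible, else keep keyword BFS."""
    return "semantic" if embeddings_available() else "keyword"


def require_embeddings() -> None:
    """Fail with an actionable message when a semantic-only feature is requested."""
    if not embeddings_available():
        raise ImportError('semantic query needs the embeddings extra: pip install "graphifyy[embeddings]"')


print(query_mode())
```

With this fallback, the default install keeps working without the 80MB model, and installing the extra switches queries to semantic mode.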

## Why this matters

This is the difference between a search tool and an understanding tool. "Find what connects the optimizer to the attention mechanism" should work even if those exact words don't appear together anywhere in the codebase.
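One way to read a "connects X to Y" query is as a two-step search: pick seed nodes for each concept, then find the shortest path between the two seed sets. The sketch below assumes the seeds have already been chosen, for example by ranking nodes against each phrase by embedding similarity. The function name and the small ML-flavoured graph are hypothetical.

```python
from collections import deque
from typing import Optional


def shortest_connection(adj: dict[str, list[str]], sources: list[str],
                        targets: list[str]) -> Optional[list[str]]:
    """Multi-source BFS: shortest path from any source seed to any target seed, or None."""
    target_set = set(targets)
    parent: dict[str, Optional[str]] = {s: None for s in sources}  # doubles as visited set
    frontier = deque(sources)
    while frontier:
        node = frontier.popleft()
        if node in target_set:
            path = []
            cur: Optional[str] = node
            while cur is not None:  # walk parents back to the source seed
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nbr in adj.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                frontier.append(nbr)
    return None


adj = {
    "AdamW": ["Trainer"],
    "Trainer": ["AdamW", "TransformerBlock", "DataLoader"],
    "TransformerBlock": ["Trainer", "MultiHeadSelfAttention"],
    "MultiHeadSelfAttention": ["TransformerBlock"],
    "DataLoader": ["Trainer"],
}
optimizer_seeds = ["AdamW"]                   # assumed top hits for "the optimizer"
attention_seeds = ["MultiHeadSelfAttention"]  # assumed top hits for "the attention mechanism"
print(shortest_connection(adj, optimizer_seeds, attention_seeds))
# ['AdamW', 'Trainer', 'TransformerBlock', 'MultiHeadSelfAttention']
```

The label `AdamW` never contains the word "optimizer", which is exactly the case that keyword matching misses. The semantic ranking step is what would select it as a seed.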

Developer Debate & Comments

No active discussions extracted for this entry yet.

Adjacent Repository Pain Points

Other highly discussed features and pain points extracted from safishamsi/graphify.

- Graphify's worked examples and their completeness, specifically the `graph.html` output.
- Graphify's semantic similarity feature, specifically adding local embeddings via quantized models (Gemma 4).
- Graphify's user onboarding and visualization of its output.
- Security vulnerabilities in Graphify's `_fetch_tweet` function (SSRF) and Neo4j Cypher export (injection).
- Graphify's language support expansion to include COBOL.

Engagement Signals

Replies: 0
Issue Status: open
