Needle – A 26M parameter open-source function-calling (tool use) model, distilled from Gemini, designed for efficient execution on consumer devices using a Simple Attention Networks architecture.
Raw Developer Origin & Technical Request
Hacker News
May 13, 2026
Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.
We were always frustrated by how little effort goes into building agentic models that run on budget phones, so we investigated and arrived at an observation: agentic experiences are built on tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.
Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling on consumer devices (phones, watches, glasses...).
Training:
- Pretrained on 200B tokens across 16 TPU v6e (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)
You can test it right now and finetune on your Mac/PC: github.com/cactus-compute/ne...
The full writeup on the architecture is here: github.com/cactus-compute/ne...
We found that the "no FFN" finding generalizes beyond function calling to any task where the model has access to external structured knowledge (retrieval-augmented generation (RAG), tool use). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. Experimental results to be published.
While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.
This is part of our broader work on Cactus (github.com/cactus-compute/ca...), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: news.ycombinator.com/item
Needle is MIT licensed. Weights: huggingface.co/Cactus-Compute/ne...
GitHub: github.com/cactus-compute/ne...
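To make the "retrieval-and-assembly" framing from the post concrete, here is a toy sketch of the three steps it names: match the query to a tool name, extract argument values, emit JSON. This is not how Needle works internally (the post describes a learned attention-and-gating model); the tool registry, keyword sets, regex patterns, and the output shape with `name` and `arguments` keys are all hypothetical and chosen only for illustration.

```python
import json
import re

# Hypothetical tool registry: names, keywords, and patterns are illustrative
# assumptions, not taken from Needle or its training data.
TOOLS = {
    "set_timer": {
        "keywords": {"timer", "countdown"},
        "args": {"minutes": r"(\d+)\s*(?:minutes|minute|min)\b"},
    },
    "send_message": {
        "keywords": {"message", "text"},
        "args": {"recipient": r"\bto\s+([A-Z][a-z]+)"},
    },
}


def match_tool(query):
    """Retrieval step: pick the tool whose keywords overlap the query most."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {name: len(spec["keywords"] & words) for name, spec in TOOLS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_args(tool, query):
    """Assembly step: copy argument values straight out of the query text."""
    args = {}
    for arg_name, pattern in TOOLS[tool]["args"].items():
        found = re.search(pattern, query)
        if found:
            value = found.group(1)
            # Numeric strings become ints; everything else stays a string.
            args[arg_name] = int(value) if value.isdigit() else value
    return args


def call_for(query):
    """Emit a single-shot function call as JSON, or None if no tool matches."""
    tool = match_tool(query)
    if tool is None:
        return None
    return json.dumps({"name": tool, "arguments": extract_args(tool, query)})


if __name__ == "__main__":
    print(call_for("Set a timer for 10 minutes"))
    print(call_for("Send a message to Alice"))
```

Nothing in this sketch stores facts about the world: every value in the output is copied from the query or the tool list. That is the intuition behind the claim that the task is closer to matching and copying than to reasoning, although a real model replaces the keyword and regex heuristics with learned attention.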