Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model
Needle distills Gemini tool calling into a 26M model. It positions itself as a lightweight, efficient solution for agentic models on budget consumer devices, arguing that 'massive models are overkill' for tool calling, which is fundamentally 'retrieval-and-assembly.'
Product Positioning & Context
- Pretrained on 200B tokens across 16 TPU v6e (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)

You can test it right now and finetune on your Mac/PC: https://github.com/cactus-compute/needle

The full writeup on the architecture is here: https://github.com/cactus-compute/needle/blob/main/docs/simp...

We found that the "no FFN" finding generalizes beyond function calling to any task where the model has access to external structured knowledge (retrieval-augmented generation, tool use). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. Experimental results to be published.

While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.

This is part of our broader work on Cactus (https://github.com/cactus-compute/cactus), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: https://news.ycombinator.com/item?id=44524544

Everything is MIT licensed. Weights: https://huggingface.co/Cactus-Compute/needle
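To make the "retrieval-and-assembly" framing concrete, here is a minimal sketch of what a single-shot function call involves on the consuming side: the model picks a tool name from a provided list and assembles arguments, and the caller checks the result against the schema. The tool names, the schema layout, and the `validate_call` helper are all hypothetical illustrations, not Needle's actual input or output format; see the repository for the real interface.

```python
import json

# Hypothetical tool schema for illustration only (not Needle's format).
# Each tool maps argument names to the Python type expected after JSON parsing.
TOOLS = {
    "set_timer": {
        "required": {"duration_seconds": int},
        "optional": {"label": str},
    },
    "send_message": {
        "required": {"recipient": str, "body": str},
        "optional": {},
    },
}


def validate_call(raw_output, tools=TOOLS):
    """Parse a model's JSON output and check it against the tool schema.

    Returns (name, arguments) on success and raises ValueError otherwise.
    """
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("output must be a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in tools:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    spec = tools[name]
    allowed = {**spec["required"], **spec["optional"]}
    missing = [key for key in spec["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, value in args.items():
        if key not in allowed:
            raise ValueError(f"unexpected argument: {key!r}")
        # Exact type match, so a JSON boolean is not accepted as an int.
        if type(value) is not allowed[key]:
            raise ValueError(f"argument {key!r} has the wrong type")
    return name, args
```

A caller would pass the raw model output to `validate_call` and dispatch only when it returns; rejecting malformed or out-of-schema calls up front is one reasonable design when a very small model drives real actions on a device.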
GitHub: https://github.com/cactus-compute/needle
Community Voice & Feedback
I had a thing[1] over 10 years ago that could handle this kind of problem using SPARQL and knowledge graphs.

My question is how effective is it at handling ambiguity.

Can I send it something like a text message "lets catch up at coffee tomorrow 10:00" and a command like "save this" and have it choose a "add appointment" action from hundreds (or even tens) of possible tools?

[1] https://github.com/nlothian/Acuitra/wiki/About
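One way to probe the ambiguity question above is a small harness that scores tool selection on hand-written cases. The sketch below is an assumption-laden illustration: `selection_accuracy`, `keyword_baseline`, the tool names, and the second test case are all hypothetical, and the baseline is a stand-in for whatever function wraps the real model. It says nothing about how Needle itself performs.

```python
import re


def selection_accuracy(predict, cases):
    """Return the fraction of (query, expected_tool) cases where predict(query) matches."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(1 for query, expected in cases if predict(query) == expected)
    return hits / len(cases)


def keyword_baseline(query):
    """Hypothetical stand-in for a model: picks a tool using simple text cues."""
    text = query.lower()
    # A clock time such as "10:00" or a relative day hints at a calendar entry.
    if re.search(r"\b\d{1,2}:\d{2}\b", text) or "tomorrow" in text:
        return "add_appointment"
    return "save_note"


# Illustrative cases: the first mirrors the comment, the second is made up.
CASES = [
    ('Message: "lets catch up at coffee tomorrow 10:00" Command: "save this"',
     "add_appointment"),
    ('Message: "the door code changed again" Command: "save this"',
     "save_note"),
]

if __name__ == "__main__":
    print(f"accuracy: {selection_accuracy(keyword_baseline, CASES):.2f}")
```

Swapping `keyword_baseline` for a function that calls the model with the full tool list, and growing `CASES` to tens or hundreds of tools, would turn this into a direct test of the scenario the comment describes.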
Discovery Source
Hacker News. Aggregated via automated community intelligence tracking.