An agent-tuned caching system for RAG applications, featuring a two-tier cache (tool-result and semantic) and an autonomous agent for monitoring and configuration optimization.
Raw Developer Origin & Technical Request
Hacker News
May 11, 2026
Last weekend I built chat.betterdb.com as a RAG over the Valkey/Redis/Dragonfly docs. The goal was to eat our own dog food and test our caching libraries in public. It also saved me from having to come up with various demo/test scenarios, since I could extend the building in public to the demo.

There is a tool-result cache sitting between the SDK and the tools. Each call is normalized and then checked before executing. If it hits, we return from the cache; if not, we check the semantic cache, which embeds the prompt and runs a KNN search via valkey-search. If the cosine distance is close enough, we again skip the LLM and stream the cached response. In both cases, if we miss, we store the prompt embedding, the actual model, and the input and output tokens from OpenAI's usage report, so a future hit has the dollars avoided as data.

The two tiers handle different shapes. Predefined questions, copy-pasted questions, and checking the same thing again after some time all produce byte-identical strings, which the tool cache catches. Human paraphrase is what the semantic tier exists for.

This Wednesday was a bank holiday where I live, so I used it to extend this further - the libraries the chat relies on now store metadata in the Valkey (or Redis, if that's your preference) instance, and our monitoring reads and analyzes that data and suggests improvements. The suggestions are also exposed through our MCP server, so the chat's agent can check and create suggestions as well, and since this is just a demo, it can also approve its own suggestions (do not do this in a real production environment, unless you are a true LLM believer). The libs also read their config from the Valkey instance, so no restart is needed. I hooked it up to a cron job on Vercel and let it run overnight and through the next day.

Between Run 1 and Run 3, it started making fewer tool calls. In the first run it suggested several different TTL changes and applied them. Runs 1 and 2 had similar suggestions, because TTL is the wrong point of control here - the affected tools take natural-language input (`How fast is XADD?` and `XADD performance` are two different strings that "mean" the same thing), so the exact-match tool cache doesn't fire and those calls end up covered by the semantic cache. An actual fix would be to move these tools from the exact-match checks to the semantic cache checks - a code change, not a config change. It was an indicator of a problem the system can't fix on its own. In the future the routing might also become configurable, so this could be solved without redeploying and tested and verified in quicker loops. Run 3 just didn't propose anything new. Tool calls went 15 -> 13 -> 8 across the three runs.

Curious how others running similar loops decide what the agent can touch. Am I too skeptical of hallucinations and overly cautious?

The chat can be found at chat.betterdb.com (it has links to all of the repos in it), and a more detailed write-up can be found at betterdb.com/blog/cache-that-t...
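For readers who want the lookup order in concrete form, here is a minimal sketch of the two-tier check described in the post. It is not the BetterDB libraries' code. Every name in it (`TwoTierCache`, `toy_embed`, `max_distance`, `tokens_avoided`) is hypothetical, both tiers are keyed on a single prompt string for brevity, the word-count "embedding" stands in for a real embedding model, and the in-memory linear scan stands in for a KNN query against valkey-search.

```python
import hashlib
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str
    model: str
    input_tokens: int
    output_tokens: int
    embedding: Counter


def normalize(prompt: str) -> str:
    # Lowercase and collapse whitespace so trivially different strings share a key.
    return " ".join(prompt.lower().split())


def toy_embed(prompt: str) -> Counter:
    # Assumption: a bag-of-words count vector, standing in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", prompt.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


@dataclass
class TwoTierCache:
    max_distance: float = 0.35  # hypothetical threshold, not a value from the post
    exact: dict = field(default_factory=dict)
    entries: list = field(default_factory=list)
    tokens_avoided: int = 0

    def answer(self, prompt, call_model):
        # Tier 1: exact match on the normalized string.
        key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
        entry, tier = self.exact.get(key), "exact"
        if entry is None:
            # Tier 2: nearest stored embedding (linear scan stands in for KNN).
            vec = toy_embed(prompt)
            nearest = min(self.entries, key=lambda e: cosine_distance(vec, e.embedding), default=None)
            if nearest is not None and cosine_distance(vec, nearest.embedding) <= self.max_distance:
                entry, tier = nearest, "semantic"
        if entry is not None:
            # A hit knows what the original call cost, so the saving can be recorded.
            self.tokens_avoided += entry.input_tokens + entry.output_tokens
            return entry.text, tier
        # Miss: call the model and store the answer with its usage numbers.
        text, model, tokens_in, tokens_out = call_model(prompt)
        entry = Entry(text, model, tokens_in, tokens_out, toy_embed(prompt))
        self.exact[key] = entry
        self.entries.append(entry)
        return text, "miss"
```

Here `call_model` is any callable returning `(text, model, input_tokens, output_tokens)`. The sketch counts tokens avoided rather than dollars, since converting to cost would need the pricing of the model in use.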