
Hacker News Show HN: Agent-desktop – Native desktop automation CLI for AI agents

A faster, cheaper, and more robust alternative to pixel-based desktop automation for AI agents, leveraging OS-native structured UI information.

Traction Score: 89
Discussions: 29
Launch Date: May 2, 2026

Product Positioning & Context

AI Executive Synthesis
The current paradigm of pixel-based desktop automation for AI agents is fundamentally flawed: slow, token-expensive, and fragile. Agent-desktop directly addresses this critical pain point by providing structured access to UI elements via native accessibility APIs. This shift from pixel-scraping to semantic understanding is a significant leap, mirroring the evolution seen in web automation. For B2B SaaS, this tool is foundational for building reliable, scalable enterprise automation solutions that integrate AI agents with legacy or desktop-bound applications. The progressive skeleton traversal for context management is a crucial innovation, mitigating token cost and context window limitations of LLMs. This enables more sophisticated, robust, and cost-effective agentic workflows, driving efficiency in areas like customer support, internal operations, and data entry.
I've been building computer-use tools for a while, and I quietly launched this about a month ago (122 Stars on GH). I figured it was worth sharing here.

Over the last few months, a lot of computer-use agents have come out: Codex, Claude Code, CUA, and others. Most of them seem to work roughly like this:
1. Take a screenshot
2. Have the model predict pixel coordinates
3. Click x,y
4. Take another screenshot
5. Repeat

That works, but it's slow, expensive in tokens, and fragile. If the UI shifts a few pixels, things break. And the model still doesn't know what any element actually is.

But the OS already exposes structured UI information:

- macOS: Accessibility API
- Windows: UI Automation
- Linux: AT-SPI

Screen readers have used these APIs for years. On the web, Playwright beat screenshot scraping for the same reason: structured access is just a better abstraction than pixels.

So I built a desktop equivalent: agent-desktop.

It's a cross-platform CLI for structured desktop automation through the accessibility tree. One Rust binary, about 15 MB, no runtime dependencies. It exposes 53 commands with JSON output, so an LLM can inspect and operate native apps without screenshots or vision models. Inspired by agent-browser by Vercel Labs.

A typical loop looks like this:

agent-desktop snapshot --app Slack -i --compact
agent-desktop click @e12
agent-desktop type @e5 "ship it"
agent-desktop press cmd+return

So the loop becomes:

1. Snapshot
2. Decide
3. Act
4. Snapshot again
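
The loop above can be sketched as a small Python driver. This is only an illustration under stated assumptions: it assumes the `agent-desktop` binary is on PATH and that `snapshot` prints a JSON document to stdout (the post says commands produce JSON but does not show the schema). The names `agent_loop`, `cli_runner`, and `decide` are hypothetical and not part of the tool.

```python
import json
import subprocess
from typing import Callable, List, Optional

# A runner takes CLI arguments and returns stdout. It is injected so the
# loop can be exercised without the real binary.
Runner = Callable[[List[str]], str]


def cli_runner(args: List[str]) -> str:
    """Shell out to agent-desktop (assumed to be installed and on PATH)."""
    done = subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True, check=True
    )
    return done.stdout


def agent_loop(
    app: str,
    decide: Callable[[dict], Optional[List[str]]],
    run: Runner = cli_runner,
    max_steps: int = 10,
) -> List[List[str]]:
    """Snapshot, decide, act, snapshot again, until decide() returns None."""
    history: List[List[str]] = []
    for _ in range(max_steps):
        # Flags copied from the post's example; the JSON shape is an assumption.
        snapshot = json.loads(run(["snapshot", "--app", app, "-i", "--compact"]))
        action = decide(snapshot)  # e.g. ["click", "@e12"], or None to stop
        if action is None:
            break
        run(action)
        history.append(action)
    return history
```

Here `decide` stands in for the model call: it receives the parsed snapshot and returns the next command as an argument list, such as `["type", "@e5", "ship it"]`.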

The main design problem was context size.

A naive approach would dump the full accessibility tree into the model, but real apps get huge. Slack can easily exceed 50,000 tokens for a full tree dump, which makes the approach impractical.

The approach I ended up using is progressive skeleton traversal:

- First pass: return a shallow tree, typically depth 3, with deeper containers truncated and annotated with children_count
- Named containers get references so the agent can request only that subtree
- The agent drills down into the relevant region with --root @e3
- References are scoped and invalidated only for that subtree
- After acting, the agent can re-query just that region instead of re-snapshotting the whole app
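
As a rough illustration of the traversal described in this list, here is a minimal Python sketch over a toy in-memory tree. It is not the project's Rust implementation. The `Node` type, the `skeleton` function, and the choice to hand out refs only to named containers that were truncated are all assumptions made for the example; only the `children_count` annotation and the `@eN` ref style come from the post.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Toy stand-in for one accessibility element (hypothetical type)."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def skeleton(
    node: Node,
    max_depth: int = 3,
    refs: Optional[Dict[str, Node]] = None,
    depth: int = 0,
) -> Tuple[dict, Dict[str, Node]]:
    """Return a depth-limited view plus a ref table for later drill-down."""
    if refs is None:
        refs = {}
    out: dict = {"role": node.role}
    if node.name:
        out["name"] = node.name
    if depth >= max_depth and node.children:
        # Too deep: report only how many children were cut off.
        out["children_count"] = len(node.children)
        if node.name:
            # Named containers get a ref so the agent can request the subtree.
            ref = f"@e{len(refs) + 1}"
            refs[ref] = node
            out["ref"] = ref
    else:
        kids = [skeleton(c, max_depth, refs, depth + 1)[0] for c in node.children]
        if kids:
            out["children"] = kids
    return out, refs
```

Drilling down is then a second call rooted at a referenced node, for example `skeleton(refs["@e1"], max_depth=1)`, which plays the role that `--root @e3` plays in the CLI. The second call builds a fresh ref table, a simple way to keep refs scoped to the subtree.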

In practice, this reduced token usage by about 78% to 96% versus full-tree dumps in Electron apps like Slack, VS Code, and Notion.

A few implementation details that may be interesting here:

- Rust workspace with strict platform/core separation through a PlatformAdapter trait
- Accessibility-first activation chain; mouse synthesis is the fallback, not the default
- Deterministic element refs like @e1, @e2, with optimistic re-identification across UI shifts
- Structured errors with machine-readable codes plus retry suggestions
- C ABI via cdylib, so it can be loaded directly from Python, Swift, Go, Node, Ruby, or C without shelling out
- Batch operations in a single call
- Support for windows, menus, sheets, popovers, alerts, and notifications
- Special handling for Chromium/Electron accessibility trees, which can get very deep and noisy
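
To show why structured errors with machine-readable codes and retry suggestions help an agent, here is a hedged Python sketch of a retry wrapper. The error shape used here (`error.code`, `error.retryable`, `error.suggestion`) and the code `STALE_REF` are invented for illustration. The post does not document the real schema, so treat every field name as hypothetical.

```python
import json
from typing import Callable, List


def run_with_retry(
    run: Callable[[List[str]], str],
    args: List[str],
    refresh: Callable[[], None],
    max_attempts: int = 3,
) -> dict:
    """Run a command; on a retryable structured error, refresh and try again."""
    for attempt in range(1, max_attempts + 1):
        reply = json.loads(run(args))
        error = reply.get("error")  # hypothetical field name
        if error is None:
            return reply
        if not error.get("retryable") or attempt == max_attempts:
            # Surface the machine-readable code so the caller can branch on it.
            raise RuntimeError(f"{error.get('code')}: {error.get('suggestion', '')}")
        refresh()  # e.g. take a new snapshot before the next attempt
    raise ValueError("max_attempts must be at least 1")
```

The point of the design is that the caller never has to parse free-form error text: a code decides whether to retry, and the suggestion says what to do first.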

Why I think this matters: pixel-based desktop control feels like a leaky abstraction. The OS already knows the UI semantically. Accessibility APIs give you roles, names, actions, hierarchy, focus, selection, and state directly. That seems like a much better substrate for desktop agents than screenshot loops.

If you're building your own desktop agent, internal automation tool, or research prototype, this may be useful.

Install:

npm install -g agent-desktop
agent-desktop snapshot --app Finder -i

Repo: https://github.com/lahfir/agent-desktop

I'd especially love feedback from people who've built desktop automation before. What are the biggest pain points you've run into, and what would you want a tool like this to support?
Tags: desktop automation CLI, AI agents, native apps, accessibility tree, cross-platform, Rust binary, JSON output, LLM


Deep-Dive FAQs

What is Agent-desktop – Native desktop automation CLI for AI agents?
Our AI analysis describes Agent-desktop – Native desktop automation CLI for AI agents as a faster, cheaper, and more robust alternative to pixel-based desktop automation for AI agents, leveraging OS-native structured UI information. The synthesis reads: "The current paradigm of pixel-based desktop automation for AI agents is fundamentally flawed: slow, token-expensive, and fragile. Agent-desktop dir..."
Where did Agent-desktop – Native desktop automation CLI for AI agents originate?
Data for Agent-desktop – Native desktop automation CLI for AI agents was aggregated directly from the Hacker News community ecosystem, representing raw developer and early-adopter sentiment.
When was Agent-desktop – Native desktop automation CLI for AI agents publicly launched?
The initial public indexing or launch date for Agent-desktop – Native desktop automation CLI for AI agents within our tracked developer communities was recorded on May 2, 2026.
How popular is Agent-desktop – Native desktop automation CLI for AI agents?
Agent-desktop – Native desktop automation CLI for AI agents has a traction score of 89 and 29 recorded discussions.
Which technical categories define Agent-desktop – Native desktop automation CLI for AI agents?
Based on metadata extraction, Agent-desktop – Native desktop automation CLI for AI agents is categorized under topics such as: desktop automation CLI, AI agents, native apps, accessibility tree.
What are some commercial alternatives to Agent-desktop – Native desktop automation CLI for AI agents?
Our semantic intelligence engine identifies potential commercial alternatives in the SaaS space, such as MiniMax CLI, which offers overlapping value propositions.
Are there open-source alternatives related to Agent-desktop – Native desktop automation CLI for AI agents?
Yes, the GitHub ecosystem contains correlated projects. For example, a repository named jackwener/opencli shares highly similar architectural descriptions and topics.
How does the creator describe Agent-desktop – Native desktop automation CLI for AI agents?
The original author or development team describes the product as follows: "I've been building computer-use tools for a while, and I quietly launched this about a month ago (122 Stars on GH). I figured it was worth sharing here. Over the last few months, a lot of computer-u..."

Community Voice & Feedback

DeathArrow • May 2, 2026
I presume this only works if you use native OS interfaces like MFC in Windows, Cocoa in macOS or GTK in Linux. It would be nice if it could work if you use GUI libraries that talk directly to hardware like Capy for Zig, egui for Rust or Dear ImGui for C++.
TheFragenTaken • May 2, 2026
I've long thought about why the tools we have operate on screenshots, and not the accessibility tree. To me the latter would have seemed like the obvious choice from the beginning (structured data), but yet, here we are with pixels. Happy to see progress being made here.
xnx • May 2, 2026
The best desktop automation system would take HDMI input and output USB keystrokes and mouse movements so that it can be plugged into any computer transparently, including work computers.
rado • May 2, 2026
Interesting, would be nice to see a demo video apart from that unclear GIF
someone654 • May 2, 2026
Looks very interesting. I especially like that the language environment is abstracted away through the CLI, so that one is not stuck with, for example, Python to write UI logic (or having to create your own CLI wrapper around PyAutoGUI). How can one help with implementing Linux and Windows support?
zuzululu • May 2, 2026
This is neat! Tried the Finder example and was impressed how quick it was. I would love it if it can support iOS simulator, iPhone? I am using Maestro but it is so damn slow and seems to be token hungry.
z3ratul163071 • May 2, 2026
i knew it... macos
esperent • May 2, 2026
Looks interesting but like every single one of these computer use apps I've seen, it's macOS only. Does anyone know of a Linux one?
jstanley • May 2, 2026
lahfir, I vouched your (currently still dead) comment because it was interesting to me. I expect the reason it is dead is that it seems LLM-generated (you "quietly" launched it on github? Who says that?). Also, your comment claims that the tool is cross-platform and implies that it works on Mac, Windows, and Linux, but the graphic on the github README says it only works on Mac.

Discovery Source

Hacker News

Aggregated via automated community intelligence tracking.
