Understudy – a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session, teachable by demonstrating a task once.
Hacker News
Mar 13, 2026
I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.

Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in getting feedback on is teach-by-demonstration: you do a task once, the agent records screen video plus semantic events, extracts the intent rather than screen coordinates, and turns it into a reusable skill.

Demo video:
In the demo, I teach it: Google Image search -> download a photo -> remove the background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps and route options, with GUI hints used only as a fallback. In this example, it can also prefer faster routes when they are available instead of repeating every GUI step.

Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.

npm install -g @understudy-ai/understudy
understudy wizard
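This is a minimal TypeScript sketch of what "extracts the intent rather than coordinates" could mean. Everything in it is an assumption for illustration. The event shape, the `RecordedEvent` and `IntentStep` types, and the `extractIntent` function are hypothetical. They are not Understudy's actual API or skill format. The sketch also assumes that the recorded semantic events carry an accessibility label for the target element. Under that assumption, each event becomes a step keyed on the app and the label, and the raw screen coordinates survive only as a fallback hint.

```typescript
// Hypothetical event and step shapes for illustration; not Understudy's real schema.
type RecordedEvent = {
  kind: "click" | "type";
  app: string; // app that received the event
  targetLabel?: string; // accessibility label of the element, when available (assumed)
  x: number; // screen coordinates captured at record time
  y: number;
  text?: string; // typed text for "type" events
};

type IntentStep = {
  action: "click" | "type";
  app: string;
  target: string; // semantic target, not a pixel position
  text?: string;
  guiHint: { x: number; y: number }; // coordinates kept only as a replay fallback
};

// Turn a raw recording into intent steps keyed on app + element label.
// Consecutive "type" events into the same target are merged into one step,
// so per-keystroke noise collapses into a single "type this text" intent.
function extractIntent(events: RecordedEvent[]): IntentStep[] {
  const steps: IntentStep[] = [];
  for (const ev of events) {
    const target = ev.targetLabel ?? "unlabeled element";
    const prev = steps[steps.length - 1];
    if (
      ev.kind === "type" &&
      prev !== undefined &&
      prev.action === "type" &&
      prev.app === ev.app &&
      prev.target === target
    ) {
      prev.text = (prev.text ?? "") + (ev.text ?? "");
      continue;
    }
    const step: IntentStep = {
      action: ev.kind,
      app: ev.app,
      target,
      guiHint: { x: ev.x, y: ev.y }, // first event's coordinates for this step
    };
    if (ev.kind === "type") step.text = ev.text ?? "";
    steps.push(step);
  }
  return steps;
}

// Usage: a click into a search field followed by two typing bursts.
const demoSteps = extractIntent([
  { kind: "click", app: "Browser", targetLabel: "Search", x: 400, y: 120 },
  { kind: "type", app: "Browser", targetLabel: "Search", x: 400, y: 120, text: "Ada " },
  { kind: "type", app: "Browser", targetLabel: "Search", x: 400, y: 120, text: "Lovelace" },
]);
console.log(demoSteps.map((s) => `${s.action} ${s.target}${s.text ? `: ${s.text}` : ""}`));
// [ 'click Search', 'type Search: Ada Lovelace' ]
```

Keying steps on labels rather than pixels is what would let a replay survive a moved window or a different screen resolution. The stored coordinates are only consulted if the labeled element cannot be found. A real recorder would also draw on the screen video and on far richer event types than the two assumed here.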
GitHub: github.com/understudy-ai/und...

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.
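The replay behavior described in the post can be sketched as a small selection rule. That behavior covers intent steps, alternative routes, GUI hints used only as a fallback, and preferring faster routes when they exist. Everything here is an assumption for illustration. The `Route` and `SkillStep` types, the route kinds, the cost estimates, and the `chooseRoute` and `fillParams` helpers are all hypothetical. They are not Understudy's published skill format. The `{subject}` placeholder stands in for whatever mechanism lets the same skill be rerun for a different person, as in the Elon Musk example.

```typescript
// Hypothetical skill format for illustration; not Understudy's published schema.
type RouteKind = "shell" | "api" | "browser" | "gui";

type Route = {
  kind: RouteKind;
  estimatedMs: number; // rough cost estimate used to rank routes (assumed)
  available: boolean; // e.g. is the required CLI installed or the app running?
};

type SkillStep = {
  intent: string; // may contain {placeholders}, e.g. "search images of {subject}"
  routes: Route[];
};

// Substitute {name} placeholders with run-time parameters.
// Names not present as own keys of `params` are left untouched.
function fillParams(template: string, params: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (whole: string, name: string) =>
    Object.prototype.hasOwnProperty.call(params, name) ? params[name] ?? whole : whole,
  );
}

// Pick the fastest available non-GUI route; use the GUI route only as a fallback.
// Returns undefined when no route is available at all.
function chooseRoute(step: SkillStep): Route | undefined {
  const available = step.routes.filter((r) => r.available);
  const faster = available
    .filter((r) => r.kind !== "gui")
    .sort((a, b) => a.estimatedMs - b.estimatedMs);
  return faster[0] ?? available.find((r) => r.kind === "gui");
}

// Usage: the same taught step replayed for a new subject.
const searchStep: SkillStep = {
  intent: "search images of {subject}",
  routes: [
    { kind: "gui", estimatedMs: 20000, available: true },
    { kind: "shell", estimatedMs: 3000, available: false },
    { kind: "api", estimatedMs: 5000, available: true },
    { kind: "browser", estimatedMs: 8000, available: true },
  ],
};
console.log(fillParams(searchStep.intent, { subject: "Elon Musk" }), "via", chooseRoute(searchStep)?.kind);
// search images of Elon Musk via api
```

Ranking by a static cost estimate is the simplest possible policy. A real runtime could just as plausibly probe routes at run time or learn costs from earlier runs.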