GitHub Issue

Windows companion app — open source Electron rewrite

Discovered On Apr 8, 2026
Primary Metric open
Hey! I built an open-source Windows version of Clicky using Electron + TypeScript.

## Repo

**https://github.com/tekram/clicky-windows**

## What it does

Same core experience as the macOS version, rebuilt for Windows:

- **Screen capture** via Electron desktopCapturer
- **Push-to-talk** with global hotkey (Ctrl+Alt+Space)
- **Claude vision**: screenshots + conversation context sent to Claude API
- **TTS**: ElevenLabs (cloud) or Windows SAPI (local/free)
- **Cursor overlay**: parses `[POINT:x,y:label:screenN]` tags and animates a pointing cursor
- **System tray** app with chat window and settings panel

## Key differences from macOS version

- **BYOK (Bring Your Own Keys)**: users enter their own API keys (Anthropic, AssemblyAI, ElevenLabs) via the UI
- **HIPAA mode**: toggle forces local-only processing (Whisper for transcription, Windows SAPI for TTS)
- **No proxy dependency**: works with direct API calls, but supports optional proxy URL for orgs

## Collaboration ideas

1. **Shared proxy**: would you be open to letting the Windows version use your Cloudflare Worker as an optional free tier for non-HIPAA users?
2. **Protocol spec**: could standardize the POINT tag format, conversation schema, etc. in a shared doc so both platforms stay compatible
3. **Cross-linking**: happy to reference the macOS version prominently, and vice versa

## Screenshots

The app features a modern dark UI with:

- Inline API key setup in the chat window
- Apple-inspired design language
- Smooth...
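
For anyone picking up the protocol-spec idea, here is a minimal sketch of parsing the `[POINT:x,y:label:screenN]` tag format mentioned above. It is not taken from either codebase: the `PointTag` shape and the `parsePointTags` name are hypothetical, and it assumes integer coordinates and a label containing no `:` or `]`. Whether `N` is zero- or one-based is not stated in the issue, so the sketch just passes the number through.

```typescript
// Hypothetical shape for a parsed tag; field names are assumptions for illustration.
interface PointTag {
  x: number;
  y: number;
  label: string;
  screen: number;
}

// Extracts every [POINT:x,y:label:screenN] tag and returns the text with the tags removed.
// Assumes integer coordinates and a label without ':' or ']'.
function parsePointTags(text: string): { tags: PointTag[]; cleaned: string } {
  // A fresh regex per call avoids shared lastIndex state on a global regex.
  const pattern = /\[POINT:(\d+),(\d+):([^:\]]+):screen(\d+)\]/g;
  const tags: PointTag[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    tags.push({
      x: Number(match[1]),
      y: Number(match[2]),
      label: match[3],
      screen: Number(match[4]),
    });
  }
  // Strip the tags, then collapse the double spaces they leave behind.
  const cleaned = text.replace(pattern, "").replace(/\s{2,}/g, " ").trim();
  return { tags, cleaned };
}
```

Returning the cleaned text alongside the tags lets the same reply drive both the overlay and TTS without reading the tag aloud.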

Developer & User Discourse

dfordp • Apr 8, 2026
finally I was looking for this
tekram • Apr 8, 2026
Did you try it?
graz68a • Apr 9, 2026
finally, can you please add OpenRouter support so we can get rid of Anthropic? Also, can you please add an optional text chatbot interaction (useful for taking notes on long replies)?
m13v • Apr 9, 2026
nice work on the Windows port. the HIPAA mode with local-only Whisper + SAPI is a solid differentiator.

one thing from building push-to-talk with vision on macOS: the screenshot capture timing matters more than you'd expect. if you capture the screenshot when the user starts speaking, by the time they finish their question (2-5 seconds later) the screen might have changed. we ended up capturing at push-to-talk release rather than press, and optionally a second capture mid-conversation if the user references something visual. the screen context at speech-end is almost always more relevant than at speech-start.
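
a rough sketch of the capture-on-release idea, in TypeScript since that is what the Windows port uses. `PushToTalkSession` and the injected `capture` function are hypothetical names, not code from either app; in Electron the `capture` callback would presumably wrap the existing screen-capture path, but that wiring is not shown here.

```typescript
// Hypothetical capture callback: resolves to some screenshot payload (e.g. a data URL).
type Capture = () => Promise<string>;

class PushToTalkSession {
  private recording = false;
  private readonly capture: Capture;

  constructor(capture: Capture) {
    this.capture = capture;
  }

  // Key press: start recording only. Deliberately no screenshot here,
  // because the screen may change while the user is still speaking.
  press(): void {
    this.recording = true;
  }

  // Key release: stop recording and capture the screen as it looks at speech-end.
  // Returns null if release arrives without a matching press.
  async release(): Promise<string | null> {
    if (!this.recording) return null;
    this.recording = false;
    return this.capture();
  }
}
```

the point of the design is just that the screenshot call lives in `release()`, so whatever the user was looking at when they finished the question is what gets sent.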

for the cursor overlay with POINT tags, have you considered using accessibility APIs alongside the coordinate-based approach? on Windows, UIA gives you element names and roles which makes the model's instructions more reliable than pixel coordinates. 'click the Submit button' is more robust than 'click at 450,320' especially across different screen resolutions.
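
a minimal sketch of combining the two approaches: try to match the tag's label against element names first, and fall back to the model's coordinates if nothing matches. The `UiElement` shape and `resolveTarget` are hypothetical and simplified; how the element list is obtained from UIA is out of scope here, and the exact-name match is an assumption (a real implementation would likely need fuzzier matching and role filtering).

```typescript
// Hypothetical, simplified view of a UI element as an accessibility API might expose it.
interface UiElement {
  name: string;
  role: string;
  bounds: { x: number; y: number; width: number; height: number };
}

interface Target {
  x: number;
  y: number;
  source: "element" | "coordinates";
}

// Prefer an element whose name matches the label (case-insensitive);
// otherwise fall back to the coordinates supplied with the POINT tag.
function resolveTarget(
  label: string,
  fallback: { x: number; y: number },
  elements: UiElement[],
): Target {
  const wanted = label.trim().toLowerCase();
  const match = elements.find((e) => e.name.trim().toLowerCase() === wanted);
  if (match) {
    // Point at the center of the matched element's bounding box.
    return {
      x: match.bounds.x + match.bounds.width / 2,
      y: match.bounds.y + match.bounds.height / 2,
      source: "element",
    };
  }
  return { x: fallback.x, y: fallback.y, source: "coordinates" };
}
```

keeping the `source` field around makes it easy to log how often the coordinate fallback is actually hit.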
m13v • Apr 9, 2026
our macOS implementation of accessibility-based element targeting alongside coordinate control: https://github.com/mediar-ai/mcp-server-macos-use/blob/main/Sources/MCPServer/main.swift

and the cross-platform element abstraction that handles UIA on Windows: https://github.com/mediar-ai/terminator/blob/main/crates/terminator/src/element.rs