Comment on: Windows companion app — open source Electron rewrite

Repo: farzaa/clicky by m13v
Posted: Apr 9, 2026
nice work on the Windows port. the HIPAA mode with local-only Whisper + SAPI is a solid differentiator.

one thing from building push-to-talk with vision on macOS: the screenshot capture timing matters more than you'd expect. if you capture the screenshot when the user starts speaking, by the time they finish their question (2-5 seconds later) the screen might have changed. we ended up capturing at push-to-talk release rather than press, and optionally a second capture mid-conversation if the user references something visual. the screen context at speech-end is almost always more relevant than at speech-start.

for the cursor overlay with POINT tags, have you considered using accessibility APIs alongside the coordinate-based approach? on Windows, UIA gives you element names and roles, which makes the model's instructions more reliable than pixel coordinates. 'click the Submit button' is more robust than 'click at 450,320', especially across different screen resolutions.
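to make the capture-at-release timing concrete, here's a minimal sketch in TypeScript. every name in it (`PushToTalkSession`, `Utterance`, `capture`, `now`) is hypothetical and not taken from the clicky codebase. the capture function is injected, so in an Electron app it would presumably wrap something like `desktopCapturer.getSources` and return a promise; that wiring is an assumption and is left out here.

```typescript
// Hypothetical sketch: these names are illustrative, not from the clicky repo.
// T is whatever the injected capture function returns (an image, a Promise, ...).
interface Utterance<T> {
  transcript: string;
  screenshot: T;        // taken when the key was released, not when it was pressed
  capturedAtMs: number; // timestamp of the capture
  heldForMs: number;    // how long the push-to-talk key was held
}

class PushToTalkSession<T> {
  private pressedAtMs: number | null = null;
  private readonly capture: () => T;
  private readonly now: () => number;

  constructor(capture: () => T, now: () => number = () => Date.now()) {
    this.capture = capture;
    this.now = now;
  }

  // Key down: audio recording would start here, but the screen is
  // deliberately NOT captured yet, since it may change while the user talks.
  press(): void {
    this.pressedAtMs = this.now();
  }

  // Key up: the user has finished the question, so capture the screen now.
  release(transcript: string): Utterance<T> {
    if (this.pressedAtMs === null) {
      throw new Error("release() called without a matching press()");
    }
    const capturedAtMs = this.now();
    const screenshot = this.capture();
    const heldForMs = capturedAtMs - this.pressedAtMs;
    this.pressedAtMs = null;
    return { transcript, screenshot, capturedAtMs, heldForMs };
  }
}
```

the point of the shape is that `press()` only records a timestamp, so whatever the user navigated to during the 2-5 seconds of speaking is what ends up in `screenshot`. an optional second capture mid-conversation could call the same injected `capture` function again.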
State: Open • Comments: 5