My first screen capture tool was called Cropmon. Built it in late 2024 as a straightforward Electron app — HTML/CSS for the overlay, Node.js for file I/O, and Chromium's built-in desktopCapturer API for grabbing the screen.
It shipped fast. Worked fine for basic recording. Along the way I spent a lot of time with other capture tools — Snagit, CleanShot X, Screen Studio, Loom, OBS. They're all genuinely great products, and I learned a ton from each of them: how CleanShot nails the post-capture editing flow, how Screen Studio makes recordings look cinematic, how OBS manages scene composition and multi-source streaming. That study shaped my sense of what an all-in-one screen capture tool should feel like. But it also made the gaps in my own architecture impossible to ignore.
When I tried to make Cropmon a serious screen recorder capable of 4K 60fps recording with built-in annotation and editing, things fell apart.
By April 2026 I'd rewritten the entire capture pipeline in C++ and rebranded the project as Snapr.
Here's why, and what it actually took.
The wall#
Electron's desktopCapturer hands you a MediaStream. You pipe that into a MediaRecorder, pick a codec, and you're recording. The API surface is tiny. The constraints are not.
30fps, and there's nothing you can do about it. Electron's desktopCapturer is built on Chromium's WebRTC pipeline — an API designed for video calls, not screen recording. The default frame rate caps at 30fps. You can pass frameRate: { ideal: 60, max: 60 } in the constraints, but the engine largely ignores it.
The reasons are structural. Chromium's auto-throttle system actively monitors CPU/GPU load and will down-throttle frame rate and resolution when it decides the system is under pressure — which, at 4K, is basically always. On top of that, the capture engine uses a variable frame rate model that only captures when it detects screen damage. If nothing moves on screen, no frame is produced. This makes maintaining a constant 60fps structurally impossible.
Even if Chromium did deliver 60fps, the pipeline itself is inefficient: frames travel from the OS compositor → GPU memory → Chromium's capture layer → IPC to the renderer process → JavaScript MediaRecorder. Every hop adds latency. Native tools like OBS bypass all of this by talking directly to the OS capture API (DXGI on Windows, ScreenCaptureKit on macOS) and feeding frames straight to a hardware encoder.
For a screen recording tool, 30fps feels choppy. Cursor movement stutters. UI animations look janky. It's fine for a video call. It's not fine for a product demo or tutorial.
No system audio on macOS. Electron's desktopCapturer still can't capture system audio (loopback) on macOS. This has been an open issue since 2017, and it's blocked on Chromium itself adding ScreenCaptureKit support upstream. The only workaround is asking users to install BlackHole or Soundflower, third-party virtual audio drivers that route system output through a loopback device. For a screen recorder, telling users "install a third-party audio driver first" isn't a real solution.
No access to the OS. Enumerating windows with exact bounds, capturing a single window without decorations, reading global mouse/keyboard events for input visualization, warping the cursor for scroll capture — none of this is available from Chromium's sandbox. Each one would require a separate native module or a creative hack.
I could've patched around each issue. But the fundamental problem was architectural: I was running a real-time video pipeline inside a web browser runtime.
The split#
Instead of a full rewrite, I split the app into two processes:
Electron stays in charge of everything the user sees — the selection overlay, the image/video editor, the timeline, annotation tools, settings, workspace management. React 19 and Zustand handle the UI state. This is what Electron is genuinely good at.
A native CLI binary (C++20, Objective-C++ on macOS, WinRT/D3D11 on Windows) handles everything that needs to be fast and close to the OS — screenshot capture, video recording, audio recording, input event tracking, scroll capture, and frame stitching.
They talk through the simplest possible interface: Electron spawns the CLI as a child process and reads JSON from stdout.
Electron Main Process
│
│ spawn('./capture', ['record', '--target', '0,0,3840,2160', ...])
│
▼
Native CLI (C++20, macOS example)
│
│ ScreenCaptureKit → AVAssetWriter → H.264 hardware encoder
│ CGEventTap → input events
│ CoreAudio → system audio + microphone
│
▼
JSON result on stdout → Electron parses, opens editor
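To make the bridge concrete, here is a minimal sketch of the Electron side, assuming the CLI prints exactly one JSON object on stdout before it exits. The result fields (ok, outputPath, error) and the ResultCollector class are hypothetical names for illustration; the real schema isn't shown in this post.

```typescript
// Hypothetical result shape; the real CLI's JSON schema may differ.
interface CaptureResult {
  ok: boolean;
  outputPath?: string;
  error?: string;
}

// Accumulates stdout chunks and turns the child's exit into a CaptureResult.
// stdout can arrive split at arbitrary boundaries, so nothing is parsed
// until the process has exited.
class ResultCollector {
  private chunks: string[] = [];

  push(chunk: string): void {
    this.chunks.push(chunk);
  }

  // exitCode is null when the process was killed by a signal (e.g. a segfault).
  finish(exitCode: number | null): CaptureResult {
    if (exitCode !== 0) {
      return { ok: false, error: `capture process exited abnormally (code ${exitCode})` };
    }
    try {
      const parsed = JSON.parse(this.chunks.join("")) as Partial<CaptureResult> | null;
      if (parsed && typeof parsed.ok === "boolean") {
        return parsed as CaptureResult;
      }
      return { ok: false, error: "capture process printed an unexpected JSON shape" };
    } catch {
      return { ok: false, error: "capture process printed invalid JSON" };
    }
  }
}
```

In the main process this would be wired to the child returned by spawn: push each chunk from the stdout "data" event (converted to a string), then call finish with the code from the "close" event. A crash in the CLI then surfaces as an ordinary failed result instead of taking the app down.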
Why a separate process, not a native addon#
I already had a small node-addon-api module for window property manipulation on macOS. I considered linking the recording code the same way. But process isolation won out.
A recording session runs for minutes. If the native code segfaults — and during development, Objective-C++ memory management made sure it did — a crash in a child process means the recording fails gracefully. A crash in a native addon means the entire Electron app dies, and the user loses whatever they were working on.
Process isolation also means the CLI is independently testable:
./snapr-cli snap --area 0,0,1920,1080 -o test.png
./snapr-cli record --target window --fps 60 -o demo.mp4
No Electron needed. No JavaScript. Just a binary and some arguments.
There's one more reason I chose this path, and it's forward-looking: a standalone CLI is the most natural interface for AI agents. As coding assistants and automation tools get more capable, they'll need to interact with desktop applications programmatically — take a screenshot of a specific window, record a region, capture a scrollable page. A CLI with structured JSON output is exactly what an agent can call. A native addon buried inside an Electron process is not. I don't know when this will matter at scale, but I'd rather have the interface ready than retrofit it later.
Inside the native pipeline#
macOS: ScreenCaptureKit + AVAssetWriter#
The recording flow on macOS:
- ScreenCaptureKit provides an SCStream that delivers raw BGRA pixel buffers through a delegate callback on a dedicated dispatch queue
- Each frame goes into an AVAssetWriterInputPixelBufferAdaptor; Apple's hardware H.264 encoder handles the heavy lifting
- System audio arrives through a separate SCStreamOutput callback, encoded as AAC at 128kbps stereo, 48kHz
- Microphone is recorded to a separate sidecar file (AAC 64kbps mono, 48kHz) and mixed later with microsecond-precise timestamp alignment
The bitrate formula: width × height × 6 bps. For 4K (3840×2160), that's roughly 49.8 Mbps. That's high, but hardware encoding on Apple Silicon handles it without visible CPU impact.
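The arithmetic behind that formula, as a small worked example. The function name is mine; only the width × height × 6 rule comes from the text above.

```typescript
// Target video bitrate in bits per second: width × height × 6.
function targetBitrateBps(width: number, height: number): number {
  return width * height * 6;
}

const bps = targetBitrateBps(3840, 2160);
console.log(bps);                    // 49766400
console.log((bps / 1e6).toFixed(1)); // "49.8" (Mbps, decimal units)
```

If the same target is used when recording at 60fps, that budget works out to 6 / 60 = 0.1 bits per pixel per frame, which is why a hardware encoder is doing the heavy lifting here.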
The critical detail: the Electron process is completely idle during recording. It shows the control bar and waits for a stop signal. All encoding happens in the CLI process on the GPU.
Windows: D3D11 + Media Foundation#
Same architecture, different APIs. DXGI Desktop Duplication for frame capture, Direct3D 11 for GPU access, Media Foundation for H.264 encoding. WASAPI for audio.
What changed#
| | Electron (Cropmon) | Native CLI (Snapr) |
|---|---|---|
| Max FPS | 30 (Chromium cap) | 60 (hardware encoder) |
| 4K recording | Choppy, frame drops | Smooth, GPU-accelerated |
| System audio (macOS) | Needs third-party driver | Native ScreenCaptureKit |
| Input event recording | Not possible | Native CGEventTap |
| Scroll capture | Not possible | Cursor warp + stitch |
Features that only exist because of native#
Going native wasn't just about fixing what was broken. It opened doors to features that couldn't exist in the Electron-only architecture.
Scroll capture#
This one's my favorite. The CLI can capture an entire scrollable page — a long web page, a chat thread, a document — as a single stitched image.
Because the CLI operates at the OS level, it can control the cursor, send synthetic scroll events, and capture frames directly from any window. This means scroll capture works with any application — browsers, chat apps, documents, code editors, even spreadsheets — as long as the content scrolls. It's not limited to a specific rendering engine or framework.
None of this is possible from Electron's sandbox. You can't control the cursor, send input events to another application's window, or access its scroll state from a Chromium process.
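This post doesn't go into how the frames get stitched, so treat the following as a generic illustration rather than Snapr's actual algorithm. One common approach is to find how many rows at the bottom of the image so far reappear at the top of the next frame, then append only the new rows. In this toy sketch each frame is modeled as an array of row signatures (say, a hash per pixel row), which is an assumption made to keep the example small.

```typescript
// Returns how many rows at the end of `prev` match the start of `next`.
// Tries the largest possible overlap first.
function findOverlap(prev: string[], next: string[]): number {
  const max = Math.min(prev.length, next.length);
  for (let n = max; n > 0; n--) {
    let match = true;
    for (let i = 0; i < n; i++) {
      if (prev[prev.length - n + i] !== next[i]) {
        match = false;
        break;
      }
    }
    if (match) return n;
  }
  return 0;
}

// Stitches frames captured between scroll steps into one tall image,
// dropping the rows each frame shares with what has already been kept.
function stitch(frames: string[][]): string[] {
  const out: string[] = [];
  for (const frame of frames) {
    const overlap = findOverlap(out, frame);
    out.push(...frame.slice(overlap));
  }
  return out;
}
```

A frame identical to the previous one contributes nothing, which doubles as a simple signal that the page has stopped scrolling. Real content with repeated rows (blank space, table stripes) can produce false overlaps, so a production version would need something more robust than exact row equality.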
Device capture#
On macOS, the CLI uses AVFoundation to record the screen of a connected iPhone or iPad over USB. Electron has no API for this — it would require a native module that bridges AVCaptureDeviceInput with a video writer. With the native CLI, it's just another subcommand.
Clean desktop#
Before a capture starts, the CLI can hide all desktop icons and restore them after. A small detail, but it makes screenshots look professional without manual cleanup. On Windows, the CLI sends messages directly to the SysListView32 window handle. Again — OS-level access that Electron can't provide.
The awkward parts#
That said, this architecture isn't a free lunch either. There are tradeoffs I deal with daily.
Display ID mismatch. Electron's screen.getAllDisplays() returns IDs that don't match what EnumDisplayMonitors gives on Windows. The bridge has to match displays by comparing resolution, scale factor, and origin coordinates. Multi-monitor setups with identical displays are a nightmare.
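A rough sketch of that matching step, assuming both sides have already been converted into the same coordinate space (which is its own problem on scaled Windows setups). The DisplayInfo shape and function names are hypothetical, and Electron's numeric display IDs are assumed to be stringified first.

```typescript
// Hypothetical, already-normalized description of one monitor.
interface DisplayInfo {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
  scaleFactor: number;
}

function sameGeometry(a: DisplayInfo, b: DisplayInfo): boolean {
  return (
    a.x === b.x &&
    a.y === b.y &&
    a.width === b.width &&
    a.height === b.height &&
    Math.abs(a.scaleFactor - b.scaleFactor) < 0.01
  );
}

// Maps each Electron display id to a native display id. The value is null
// when there is no candidate or more than one, so the caller can fail
// loudly instead of recording the wrong screen.
function matchDisplays(
  electronSide: DisplayInfo[],
  nativeSide: DisplayInfo[]
): Map<string, string | null> {
  const result = new Map<string, string | null>();
  for (const e of electronSide) {
    const candidates = nativeSide.filter((n) => sameGeometry(e, n));
    result.set(e.id, candidates.length === 1 ? candidates[0].id : null);
  }
  return result;
}
```

With identical monitors, the origin is the only field that tells them apart, so any disagreement between the two coordinate systems turns a clean match into a null.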
Video serving hack. Chromium's FFmpeg demuxer requires HTTP Range headers to seek in video. Electron's protocol.handle() doesn't support Range headers (electron#38749). So there's a local HTTP server on 127.0.0.1 with an explicit file allowlist, just to serve recorded videos to the <video> element in the editor.
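The core of such a server is small. Below is a simplified sketch of the two pieces that matter: an allowlist check and single-range parsing for the Range header. All names here are hypothetical, and it deliberately ignores multi-range requests.

```typescript
// Inclusive byte range within a file.
interface ByteRange {
  start: number;
  end: number;
}

// Parses "bytes=start-end", "bytes=start-" and "bytes=-suffixLength".
// Returns null when the header is missing, malformed, or unsatisfiable.
// (A stricter server would answer 416 for the unsatisfiable case.)
function parseRange(header: string | undefined, size: number): ByteRange | null {
  if (!header) return null;
  const m = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!m) return null;
  const first = m[1];
  const last = m[2];
  if (first === "" && last === "") return null;

  let start: number;
  let end: number;
  if (first === "") {
    // Suffix form: the final N bytes of the file.
    const n = Number(last);
    if (n === 0) return null;
    start = Math.max(size - n, 0);
    end = size - 1;
  } else {
    start = Number(first);
    end = last === "" ? size - 1 : Math.min(Number(last), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start, end };
}

// Value for the Content-Range header of a 206 Partial Content response.
function contentRange(range: ByteRange, size: number): string {
  return `bytes ${range.start}-${range.end}/${size}`;
}

// Only files explicitly registered by the app are ever served.
const allowedFiles = new Set<string>();

function isAllowed(path: string): boolean {
  return allowedFiles.has(path);
}
```

A request handler would reject anything isAllowed refuses, answer with status 206 plus the Content-Range header when parseRange succeeds, and fall back to a full 200 response when no Range header is present.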
Two build systems. CMake for the CLI, electron-vite + electron-builder for the app. macOS universal binaries mean building twice (arm64 + x64) and lipo-ing them together. The CI pipeline is not pretty.
Objective-C++ has a learning curve (for me). I'm not an Objective-C++ veteran, and it shows. ScreenCaptureKit is async-heavy (completion handlers, dispatch queues, semaphores), and the memory management rules are subtle. I've had crashes that only appeared 30 seconds after the actual bug, in a completely unrelated callback. The learning curve was steep, but it's the price of talking directly to Apple's frameworks.
Debugging across the boundary. When something goes wrong during a recording, the bug could be in the CLI's C++ code, the JSON bridge layer, or the Electron state machine. Logs from three different runtimes, two processes, no shared debugger. I've gotten better at structured logging and crash dumps, but it's never going to be as simple as console.log.
Would I do it again#
Yes. Every app is different, but if yours has parts that fundamentally depend on native OS capabilities — hardware encoding, input event access, system audio, window manipulation — I'd figure out that boundary early and design the split from the start. It doesn't have to be big. Even a thin CLI that handles one native operation saves you from retrofitting the architecture later.
Electron for UI, native for performance — it's genuinely the right tradeoff for a desktop tool. React gives me UI development speed I can't match in AppKit or WinUI. C++ gives me hardware encoders and platform APIs that JavaScript will never reach.
The real lesson: know where your abstraction stops. Electron abstracts the OS away. For buttons and text fields, that's fine. For a real-time video pipeline capturing half a billion pixels per second at 4K60, you need the actual OS.
Snapr is an all-in-one screen capture tool for macOS and Windows. Capture screenshots, record at 4K 60fps, scroll capture long pages, annotate fast, and edit — all in one app. The free tier covers all of this with no watermarks or time limits. Pro unlocks a growing set of advanced capture and post-editing features — one-time purchase.