Keeping the Cursor App Stable


Many of our users spend their entire day using Cursor, which means even rare crashes can be extremely disruptive. At the same time, the challenge of keeping the app stable has grown as we've added users and shipped increasingly ambitious features like subagents, instant grep, browser use, and more.

Most of these crashes are caused by the app running out of memory (OOM). Over the past few months, we've implemented systems to give us observability into crashes and memory pressure, high-confidence fixes and optimizations for hot paths, and guardrails to catch regressions before they ship.

Our OOM-per-session rate aggregated across all versions of the Cursor app has fallen 80% since its late-February peak, while OOM-per-request has fallen 73% since March 1. This post details the systems we built to get there.

#Detecting and measuring instability

Our desktop app is built on the open-source foundations of Visual Studio Code and Electron, which gives it a multi-process architecture. This means crashes can occur in either the renderer processes, which power the editor and the new agents window, or the utility processes, which power extensions, storage, and agent functionality.

Renderer crashes are the most severe because they completely prevent the user from using the editor. We've found that they are mostly caused by hitting V8 memory limits, and they are the focus of our most recent efforts. Extension crashes can also disrupt important functionality like language services, but typically recover without disrupting the user as much.

Every fatal crash is reported by our telemetry along with context such as the affected process, type of crash, device and application metadata, and minidumps and stack traces where available.

From these crash events, we've built metrics that we can break down by app version and compute on a per-session or per-request basis. The per-session rate roughly captures how many sessions experience crashes, while the per-request rate captures how severe the crash problem is for affected sessions. The resulting dashboards update within minutes of crash events, so we're able to track releases of new versions closely and detect potential regressions quickly.
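As a rough illustration of the two rates, the sketch below aggregates per-session records by app version. The record shape, the field names, and the exact definitions (share of sessions with at least one OOM, and OOM crashes divided by total requests) are assumptions made for this example, not Cursor's actual telemetry schema or formulas.

```typescript
// Hypothetical per-session record; field names are illustrative only.
interface SessionStats {
  version: string;
  sessionId: string;
  requests: number;   // requests made during the session
  oomCrashes: number; // OOM crashes observed during the session
}

interface VersionRates {
  oomPerSession: number; // assumed: fraction of sessions with at least one OOM
  oomPerRequest: number; // assumed: OOM crashes divided by total requests
}

function ratesByVersion(sessions: SessionStats[]): Record<string, VersionRates> {
  const totals: Record<
    string,
    { sessions: number; affected: number; requests: number; ooms: number }
  > = {};

  // Accumulate raw counts for each app version.
  for (const s of sessions) {
    const t = totals[s.version] ?? { sessions: 0, affected: 0, requests: 0, ooms: 0 };
    t.sessions += 1;
    if (s.oomCrashes > 0) t.affected += 1;
    t.requests += s.requests;
    t.ooms += s.oomCrashes;
    totals[s.version] = t;
  }

  // Turn counts into rates, guarding against versions with no requests.
  const rates: Record<string, VersionRates> = {};
  for (const version of Object.keys(totals)) {
    const t = totals[version];
    rates[version] = {
      oomPerSession: t.affected / t.sessions,
      oomPerRequest: t.requests > 0 ? t.ooms / t.requests : 0,
    };
  }
  return rates;
}
```

Keeping the raw counts separate from the derived rates makes it straightforward to re-aggregate across versions, which is how an "all versions" figure would be produced under these assumptions.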

#Dual debugging strategies

We take a two-pronged approach to debugging app crashes and out-of-memory issues.

#Top-down

The first is a top-down investigation focused on the most memory-intensive features. If a feature is known to be memory-intensive, we can link crash metrics to the corresponding feature flag in Statsig, our experimentation platform, then A/B test it to measure its contribution to crash rates.
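To make the A/B comparison concrete, here is a minimal sketch of how a feature's contribution to crash rates could be estimated from two experiment groups. It does not use the Statsig API; the `GroupCounts` shape and the choice of a pooled two-proportion z-score are assumptions for illustration only.

```typescript
// Hypothetical per-group counts exported from an experiment.
interface GroupCounts {
  sessions: number;        // sessions in the group
  crashedSessions: number; // sessions with at least one crash
}

interface Comparison {
  controlRate: number;
  treatmentRate: number;
  absoluteDiff: number; // treatment minus control
  zScore: number;       // pooled two-proportion z-score
}

function compareCrashRates(control: GroupCounts, treatment: GroupCounts): Comparison {
  const controlRate = control.crashedSessions / control.sessions;
  const treatmentRate = treatment.crashedSessions / treatment.sessions;

  // Pooled crash rate under the null hypothesis that the flag has no effect.
  const pooled =
    (control.crashedSessions + treatment.crashedSessions) /
    (control.sessions + treatment.sessions);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / control.sessions + 1 / treatment.sessions),
  );

  return {
    controlRate,
    treatmentRate,
    absoluteDiff: treatmentRate - controlRate,
    zScore: standardError > 0 ? (treatmentRate - controlRate) / standardError : 0,
  };
}
```

A large positive `zScore` would suggest that sessions with the feature enabled crash more often than chance alone explains. An experimentation platform would normally report this kind of statistic itself.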

We can also track proxy metrics which correlate strongly with crashes and may be easier to observe in development. One such metric is oversize message payloads. Because our app uses a multi-process architecture, data is constantly being passed between the editor, extensions, and agents through inter-process channels and a persistence layer. We instrument both to track messages larger than a threshold, since oversize messages correlate strongly with memory issues, and we attach callstacks so we can trace each one back to its source in our application code.
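The instrumentation described above might look something like the following wrapper around a message-sending function. This is a sketch under stated assumptions: the `Send` signature, the `OversizeReport` shape, the default threshold, and the use of serialized JSON length as a size proxy are all hypothetical and not taken from Cursor's code.

```typescript
// Hypothetical report emitted when a payload exceeds the threshold.
interface OversizeReport {
  channel: string;
  approxSize: number; // serialized length in UTF-16 code units, not exact bytes
  stack: string;      // callstack of the sender, for tracing back to the source
}

// Assumed shape of an inter-process send function.
type Send = (channel: string, payload: unknown) => void;

function instrumentSend(
  send: Send,
  report: (r: OversizeReport) => void,
  threshold: number = 1024 * 1024, // hypothetical default; the post names no value
): Send {
  return (channel, payload) => {
    // JSON length is a cheap proxy for size. A real channel would more likely
    // measure the buffer it already serializes rather than serialize twice.
    const serialized = JSON.stringify(payload);
    const approxSize = serialized === undefined ? 0 : serialized.length;

    if (approxSize > threshold) {
      // Capture the stack at the call site so the report points at the sender.
      report({ channel, approxSize, stack: new Error("oversize payload").stack ?? "" });
    }
    send(channel, payload);
  };
}
```

Wrapping the send function, rather than editing each call site, means every sender is covered without changing application code, and the captured stack still identifies the caller.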

To reconstruct what happens at the moment of a specific crash, we add breadcrumbs (special metadata logs attached to errors) for features like parallel agent usage, tool calls, and terminals, so that each crash event carries a record of the activity that preceded it.
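A common way to implement breadcrumbs is a small bounded buffer of recent events that gets copied into the crash report. The sketch below assumes this design; the class name, field names, and default capacity are hypothetical rather than Cursor's implementation.

```typescript
// Hypothetical breadcrumb record.
interface Breadcrumb {
  timestamp: number;
  category: string; // e.g. "agent", "tool-call", "terminal"
  message: string;
}

class BreadcrumbTrail {
  private readonly items: Breadcrumb[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 50) {
    this.capacity = capacity;
  }

  // Record an event, dropping the oldest one once the trail is full.
  add(category: string, message: string, timestamp: number = Date.now()): void {
    this.items.push({ timestamp, category, message });
    if (this.items.length > this.capacity) {
      this.items.shift();
    }
  }

  // Return a copy so later events do not mutate an already-built report.
  snapshot(): Breadcrumb[] {
    return this.items.slice();
  }
}

// Attach the recent activity to a crash event.
function buildCrashReport(processName: string, reason: string, trail: BreadcrumbTrail) {
  return { processName, reason, breadcrumbs: trail.snapshot() };
}
```

Bounding the trail keeps the bookkeeping itself from adding to memory pressure, which matters when the crashes being investigated are out-of-memory errors.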

#Bottom-up

In bottom-up investigations we trace individual crash events back to their root cause. The first step is to capture what happened at the moment the process died. We run a crash watcher service in the main process that uses the Chrome DevTools Protocol (CDP) to detect out-of-memory errors and capture crash stacks in real time, and we have patched Electron upstream to make it possible to obtain these stacks without the heavyweight CDP machinery. These crash stacks feed a daily automation that analyzes each stack in detail, opens PRs with optimizations for stacks with high-confidence fixes, and verifies issue resolution version over version.

To understand how memory accumulates over the course of a session, we look at heap snapshots. When we detect that Cursor is using too much memory, we prompt the user to capture and send one. These snapshots can contain sensitive information such as the contents of open editors or chats, so sending them is entirely opt-in. But they're highly valuable for tracing the accumulation of memory pressure back to specific objects and retainers, so we appreciate it when users choose to participate.
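One way to decide when to show such a prompt is to compare heap usage against the heap limit and rate-limit how often the user is asked. The sketch below is an assumption-laden illustration: the 80% ratio, the one-hour cooldown, and the `HeapPressureMonitor` class are hypothetical. The `HeapStats` field names mirror those returned by Node's `v8.getHeapStatistics()`, and a snapshot could then be written with `v8.writeHeapSnapshot()` once the user agrees.

```typescript
// Subset of the fields returned by Node's v8.getHeapStatistics().
interface HeapStats {
  used_heap_size: number;
  heap_size_limit: number;
}

class HeapPressureMonitor {
  private lastPromptAt = -Infinity;
  private readonly ratio: number;
  private readonly cooldownMs: number;

  // Both defaults are hypothetical values chosen for this example.
  constructor(ratio: number = 0.8, cooldownMs: number = 60 * 60 * 1000) {
    this.ratio = ratio;
    this.cooldownMs = cooldownMs;
  }

  // Returns true when the user should be asked whether to capture a snapshot.
  shouldPrompt(stats: HeapStats, now: number = Date.now()): boolean {
    const usage = stats.used_heap_size / stats.heap_size_limit;
    if (usage < this.ratio) return false;
    // Avoid nagging: prompt at most once per cooldown window.
    if (now - this.lastPromptAt < this.cooldownMs) return false;
    this.lastPromptAt = now;
    return true;
  }
}
```

The monitor only decides whether to ask. Capturing and uploading the snapshot stays behind explicit user consent, in keeping with the opt-in behavior described above.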

To understand memory usage patterns across the full user base, we run continuous heap allocation profiling at a low sampling rate. We aggregate this data per app version to build a breakdown of memory pressure by callstack. This gives us a bird's-eye view of memory pressure...
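The per-version breakdown by callstack can be sketched as a simple aggregation over sampled allocations. The `AllocationSample` shape and the function below are hypothetical, and because the sampling rate is low, the resulting byte counts should be read as relative estimates rather than exact totals.

```typescript
// Hypothetical sampled allocation record.
interface AllocationSample {
  version: string;
  stack: string[]; // frames, innermost first
  bytes: number;   // bytes attributed to this sample
}

interface StackPressure {
  stack: string;
  bytes: number;
  share: number; // fraction of sampled bytes for this version
}

function memoryByCallstack(samples: AllocationSample[], version: string): StackPressure[] {
  const bytesByStack: Record<string, number> = {};
  let total = 0;

  // Sum sampled bytes per distinct callstack for the requested version.
  for (const s of samples) {
    if (s.version !== version) continue;
    const key = s.stack.join(" <- ");
    bytesByStack[key] = (bytesByStack[key] ?? 0) + s.bytes;
    total += s.bytes;
  }

  // Largest contributors first.
  return Object.keys(bytesByStack)
    .map((stack) => ({
      stack,
      bytes: bytesByStack[stack],
      share: total > 0 ? bytesByStack[stack] / total : 0,
    }))
    .sort((a, b) => b.bytes - a.bytes);
}
```

Comparing the `share` of a given stack between two versions is one way a regression in a specific code path could show up in this kind of data.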
