Decoupling a High-Throughput Engagement Service from a Monetization System | by Stephanie Dover | Jun, 2026 | Medium
Decoupling a High-Throughput Engagement Service from a Monetization System
Stephanie Dover
5 min read · 6 days ago
At large-scale consumer platforms, product reuse can quietly turn into technical debt.
In this case, a non-monetized engagement points feature shared a backend with a wallet-based monetization service.
The engagement system handled over a million transactions per second, far beyond what the transactional payments backend was built for.
What began as a pragmatic shortcut led to rising latency, ballooning storage costs, and operational strain on a system optimized for financial correctness rather than lightweight interactivity.
The only sustainable fix was to decouple them completely.

Problem Context

The monetization system prioritized:
- Durable writes
- Idempotency
- Strong transactional guarantees

The engagement system prioritized:
- Speed
- Throughput
- Low operational cost

By sharing a backend, the engagement workload forced the payments infrastructure to scale inefficiently, adding expensive transactional overhead to a workload that didn't need it.
Scaling further wasn't the answer.
Isolation was.

Designing the New Backend

The decoupling required two major components:
- A dedicated API and data model built for high-TPS, low-latency operations.
- A live migration pipeline capable of achieving full data parity with zero downtime.

Infrastructure Overview

| Component | Purpose |
| --- | --- |
| Go microservice | Core logic and API layer |
| Protobuf + gRPC | Internal RPC communication |
| AWS DynamoDB | Primary datastore: high throughput, flexible schema |
| AWS Kinesis Streams | Real-time change-data capture |
| AWS Lambda functions | Stream processors handling event ordering and writes |
| Redis cache | Idempotency layer to prevent duplicate writes during dual-write and stream replay |
| Terraform | Infrastructure-as-code provisioning |
| CloudWatch metrics | Observability for throughput, lag, and latency |
| Feature flag service | Safe rollout and traffic control |

The new backend used a lightweight schema aligned to engagement interactions: simpler, cheaper, and better suited to massive write volume.

Architecture Diagrams

Diagram 1: System Overview

flowchart TD
A[Engagement API] --> B[Old DynamoDB Table]
B --> C[AWS Kinesis Stream]
C --> D[AWS Lambda Functions]
D --> E[New DynamoDB Table]
D --> F["Redis Cache (idempotency)"]
F --> E
E --> G[Feature-flag Dual-Write]

Diagram 2: Migration Lifecycle

flowchart TD
A["1. Export snapshot → S3 → Import new table"] --> B["2. Sync updates via Kinesis + Lambda"]
B --> C["3. Redis ensures idempotency for dual-writes & replayed events"]
C --> D["4. Feature flag directs dual-writes"]
D --> E["5. Cutover & validation"]

Migration Strategy

Building the API was straightforward.
Migrating live data at 1M+ TPS without downtime was the challenge.

1. Snapshot Bootstrap

AWS provides a built-in mechanism to export a DynamoDB table snapshot to S3, which can then be imported into a new table.
This seeded the new database with a point-in-time baseline, with no long-running scans or Glue jobs required.

2. Real-Time Sync via Kinesis + Lambda

Once the snapshot was imported, Kinesis Streams captured every subsequent change (insert, update, delete) from the source table.
Each event was processed by an AWS Lambda consumer that replayed the change into the new DynamoDB table.
Maintaining transaction order was critical: out-of-sequence events could cause corruption or lost updates.
To handle retries and potential duplicate delivery, I introduced a Redis-based idempotency layer.
Each event carried a unique transaction ID. Before processing, Lambda performed a fast Redis lookup to check whether that ID had already been written.
If it was found, the event was skipped, eliminating double writes both from Kinesis replays and from the feature-flagged dual-write traffic hitting the same endpoint.
This lightweight Redis layer made the migration safe, giving effectively exactly-once writes on top of at-least-once delivery without compromising throughput.
Monitoring the IteratorAge and Duration metrics in CloudWatch remained critical.
If IteratorAge rose, processing was falling behind the stream, meaning either smaller batches or more concurrency were needed.
With tuning and caching in place, the pipeline kept pace with over a million updates per second.
The full migration completed within hours, not days.

Cutover with Feature Flags

After the real-time sync stabilized, I rolled out the new backend via a feature-flagged dual-write:
- Dual-write requests to both APIs.
- Use Redis for idempotency checks to prevent duplicate writes.
- Validate data parity.
- Monitor Kinesis lag until it reaches zero.
- Cut traffic to the old API.

Once validation passed, the engagement service ran entirely on its new infrastructure.
The monetization system was finally free of the extra load, and both systems could scale...
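
For illustration, the idempotency check from the real-time sync step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation used in the migration: `InMemoryIdempotencyStore` is a hypothetical in-process stand-in for Redis, its `claim` method imitates the semantics of Redis `SET key value NX EX ttl` (set only if absent, with an expiry), and the event fields `txn_id`, `key`, and `delta` are invented for the example.

```python
import time


class InMemoryIdempotencyStore:
    """Hypothetical stand-in for Redis, used only to keep the sketch runnable.

    claim() imitates `SET txn_id 1 NX EX ttl`: it succeeds only for the
    first caller that presents a given transaction ID within the TTL.
    """

    def __init__(self):
        self._expires_at = {}

    def claim(self, txn_id, ttl_seconds=3600.0):
        now = time.monotonic()
        expiry = self._expires_at.get(txn_id)
        if expiry is not None and expiry > now:
            return False  # already claimed: treat the event as a duplicate
        self._expires_at[txn_id] = now + ttl_seconds
        return True

    def release(self, txn_id):
        # Give the ID back so a retry can re-process a failed write.
        self._expires_at.pop(txn_id, None)


def apply_events(events, store, table):
    """Apply point deltas to `table` (a dict standing in for the new
    datastore), skipping any event whose transaction ID was already seen.
    Returns the number of events actually applied.
    """
    applied = 0
    for event in events:
        if not store.claim(event["txn_id"]):
            continue  # duplicate from a replay or from dual-write traffic
        try:
            table[event["key"]] = table.get(event["key"], 0) + event["delta"]
        except Exception:
            store.release(event["txn_id"])  # let a later retry succeed
            raise
        applied += 1
    return applied


if __name__ == "__main__":
    store = InMemoryIdempotencyStore()
    points = {}
    batch = [
        {"txn_id": "t1", "key": "user#1", "delta": 5},
        {"txn_id": "t2", "key": "user#1", "delta": 3},
        {"txn_id": "t1", "key": "user#1", "delta": 5},  # redelivered event
    ]
    print(apply_events(batch, store, points), points)
```

The sketch claims the transaction ID in a single step rather than doing a lookup followed by a separate write, since two consumers handling the same event concurrently could otherwise both pass the lookup. Whether that matters in practice depends on how the stream consumers and the dual-write path are partitioned.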