Show HN: WASM scanner to debug Postgres deadlocks without leaking SQL

gwei | 1 pts | 0 comments

PostgreSQL Deadlock ShareLock Transaction Audit | StackEngine

How to Fix PostgreSQL Deadlock Detected on ShareLock Transaction (With Root Cause Analysis)

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins

TL;DR

What broke: Two concurrent transactions acquired locks in inverse order — PostgreSQL's deadlock detector killed one to break the cycle, rolling back that transaction entirely.

How to fix it: Enforce a consistent lock acquisition order across all transactions touching the same rows; use SELECT FOR UPDATE with explicit ordering or SKIP LOCKED for queue-style workloads.

Use our Client-Side Sandbox below to paste your transaction logic and auto-refactor the lock ordering with zero data leaving your browser.

The Incident (What Does the Error Mean?)

ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 67890;
blocked by process 67890.
Process 67890 waits for ShareLock on transaction 12345;
blocked by process 12345.
HINT: See server log for query details.
CONTEXT: while updating tuple (0,42) in relation "orders"

PostgreSQL does not check for deadlocks continuously. A session runs the deadlock check only after it has waited on a lock for deadlock_timeout (default: 1 second). If the check finds a cycle, PostgreSQL aborts one of the transactions involved, and the statement that was waiting fails with this error. All work done by the aborted transaction is rolled back, so your application must detect SQLSTATE 40P01 and retry the whole transaction. If it swallows the error instead, the write is lost without anyone noticing.
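As a rough illustration of detecting that error code in application code, the Python sketch below checks the SQLSTATE carried by a driver exception. It assumes the driver exposes the code as `pgcode` (as psycopg2 does) or `sqlstate` (as psycopg 3 does); the helper name `is_deadlock` is hypothetical.

```python
DEADLOCK_SQLSTATE = "40P01"  # SQLSTATE for deadlock_detected


def is_deadlock(exc):
    """Return True if a driver exception carries SQLSTATE 40P01.

    Assumption: the driver exposes the SQLSTATE as `pgcode` (psycopg2)
    or `sqlstate` (psycopg 3). Other drivers may use a different attribute.
    """
    code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
    return code == DEADLOCK_SQLSTATE
```

A caller would typically use this inside an `except` block to decide whether to roll back and retry or to re-raise.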

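ShareLock here does not mean a table-level shared lock. Every transaction holds an exclusive lock on its own transaction ID, and a session that needs a row locked by another transaction waits by requesting a ShareLock on that transaction's ID. The message therefore describes row-level contention: Transaction A holds a lock on Row 1 and wants Row 2, while Transaction B holds Row 2 and wants Row 1.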
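The cycle described above can be pictured as a wait-for graph. The Python sketch below is a simplified model, not PostgreSQL's actual detector: it assumes each blocked process waits on exactly one other process, and the function name `find_cycle` is hypothetical. The process IDs are the ones from the sample error message.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the processes that form a wait cycle, or None if there is none.

    waits_for maps a blocked process ID to the process ID blocking it.
    Simplification: each process waits on at most one other process.
    """
    for start in waits_for:
        path = []
        seen = set()
        node = start
        # Follow "is blocked by" edges until the chain ends or repeats.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        # A deadlock exists only if the chain leads back to where it began.
        if node == start:
            return path
    return None


# Process 12345 waits on 67890, and 67890 waits on 12345.
print(find_cycle({12345: 67890, 67890: 12345}))  # → [12345, 67890]
```

A chain that simply ends (one process waiting on another that is not blocked) returns `None`: that is ordinary lock waiting, not a deadlock.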

The Attack Vector / Blast Radius

This is not a one-off failure. In high-concurrency environments it is a recurring production degradation pattern:

Connection pool exhaustion: Sessions waiting on locks pile up. Each blocked session waits at least deadlock_timeout (1s by default) before a deadlock check even runs, so with 50 concurrent conflicting transactions your connection pool can saturate before the detector clears them.

Cascading retry storms: Naive retry logic without exponential backoff causes the same transactions to immediately re-conflict, worsening throughput under load.

Silent data loss: Applications that catch the error without retrying lose writes permanently — especially dangerous in financial ledgers, inventory systems, and order management where the rolled-back transaction updated multiple tables.

Replication lag amplification: Rolled-back transactions and their retries still generate WAL on the primary. Under sustained deadlock storms this extra write volume can increase streaming-replica lag, in bad cases beyond what your recovery objectives (RPO/RTO) allow.

ORM blind spots: Hibernate, SQLAlchemy, and ActiveRecord often wrap operations in implicit transactions with non-deterministic lock ordering based on object graph traversal order — making this nearly impossible to debug without query-level logging.
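To make the retry-storm point concrete, here is a minimal Python sketch of an exponential backoff schedule with "full jitter". The function name `backoff_delays` and the default `base` and `cap` values are assumptions chosen for illustration, not recommendations from the article.

```python
import random


def backoff_delays(max_retries=5, base=0.1, cap=5.0, rng=random.random):
    """Yield one sleep duration in seconds per retry attempt.

    Full jitter: each delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)), so transactions that collided once
    are unlikely to wake up and collide again at the same moment.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

The `rng` parameter exists only so the schedule can be made deterministic in tests; in production the default `random.random` spreads retries out.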

How to Fix It

Basic Fix — Enforce Consistent Lock Ordering

For this class of deadlock, the root cause is lock acquisition order inversion. Fix it by sorting the rows you intend to lock before acquiring locks.

-- Transaction A and B both update accounts: sender and receiver
-- BAD: Each transaction locks in application-determined order (non-deterministic)
- BEGIN;
- UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- locks row 1
- UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- waits for row 2
- COMMIT;

-- (Concurrent Transaction B)
- BEGIN;
- UPDATE accounts SET balance = balance - 50 WHERE id = 2; -- locks row 2
- UPDATE accounts SET balance = balance + 50 WHERE id = 1; -- DEADLOCK
- COMMIT;

-- GOOD: Always lock in ascending ID order regardless of transaction direction
+ BEGIN;
+ -- Pre-sort: always lock lower ID first
+ SELECT id FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
+ UPDATE accounts SET balance = balance - 100 WHERE id = 1;
+ UPDATE accounts SET balance = balance + 100 WHERE id = 2;
+ COMMIT;
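The same ordering rule can be pushed into application code so that callers cannot get it wrong. The sketch below is one possible shape, assuming a psycopg2-style cursor (psycopg2 adapts a Python list to a PostgreSQL array, which is what `= ANY(%s)` expects). The helper names `lock_rows_in_order` and `transfer` are hypothetical, and the caller is assumed to commit or roll back the surrounding transaction.

```python
def lock_rows_in_order(cur, ids):
    """Lock the given accounts rows in ascending id order; return locked ids."""
    ordered = sorted(set(ids))
    cur.execute(
        "SELECT id FROM accounts WHERE id = ANY(%s) ORDER BY id FOR UPDATE",
        (ordered,),
    )
    return [row[0] for row in cur.fetchall()]


def transfer(cur, sender_id, receiver_id, amount):
    """Move `amount` between two accounts after locking both rows in id order."""
    # Lock first, in a deterministic order, whatever the transfer direction.
    lock_rows_in_order(cur, [sender_id, receiver_id])
    cur.execute(
        "UPDATE accounts SET balance = balance - %s WHERE id = %s",
        (amount, sender_id),
    )
    cur.execute(
        "UPDATE accounts SET balance = balance + %s WHERE id = %s",
        (amount, receiver_id),
    )
```

With this in place, a transfer from account 2 to account 1 and a transfer from 1 to 2 both lock row 1 before row 2, so neither can hold the row the other needs first.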

Enterprise Best Practice — SKIP LOCKED + Advisory Locks + Retry Logic

-- BAD: Blocking SELECT FOR UPDATE with no timeout, no retry handling
- SELECT * FROM job_queue WHERE status = 'pending' FOR UPDATE;

-- GOOD: Non-blocking queue consumption with SKIP LOCKED
+ SELECT * FROM job_queue
+ WHERE status = 'pending'
+ ORDER BY created_at ASC
+ LIMIT 1
+ FOR UPDATE SKIP LOCKED;
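A worker might wrap the SKIP LOCKED query roughly as follows. This is a sketch under assumptions: it takes a psycopg2-style cursor, and it assumes `job_queue` has an `id` column and accepts a `'running'` status, neither of which appears in the original query. The function name `claim_next_job` is hypothetical.

```python
CLAIM_SQL = """
    SELECT id FROM job_queue
    WHERE status = 'pending'
    ORDER BY created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
"""


def claim_next_job(cur):
    """Return the id of one unlocked pending job, or None if none is free.

    Rows already locked by other workers are skipped rather than waited on,
    so concurrent workers never block each other here.
    """
    cur.execute(CLAIM_SQL)
    row = cur.fetchone()
    if row is None:
        return None
    job_id = row[0]
    # Assumed column and status value; adapt to the real schema.
    cur.execute(
        "UPDATE job_queue SET status = 'running' WHERE id = %s",
        (job_id,),
    )
    return job_id
```

The row lock lasts until the surrounding transaction commits or rolls back, so the claim and the status update should happen in the same transaction.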

-- GOOD: Application-level retry with 40P01 detection (Python/psycopg2 example)
+ import random
+ import time
+
+ import psycopg2
+ from psycopg2 import errors
+
+ def execute_with_retry(conn, fn, max_retries=5):
+     """Run fn(cur) in one transaction; retry on deadlock (SQLSTATE 40P01)."""
+     for attempt in range(max_retries):
+         try:
+             with conn.cursor() as cur:
+                 fn(cur)
+             conn.commit()
+             return
+         except errors.DeadlockDetected:
+             # The server already aborted the transaction; reset the connection.
+             conn.rollback()
+             if attempt == max_retries - 1:
+                 break  # no point sleeping before giving up
+             # Exponential backoff plus jitter to avoid immediate re-collision.
+             wait = (2 ** attempt) + random.uniform(0, 0.5)
+             time.sleep(wait)
+     raise Exception("Max retries exceeded on deadlock")

-- GOOD: PostgreSQL advisory locks for application-level mutex (no row lock needed)
+ SELECT pg_advisory_xact_lock(hashtext('transfer:' || LEAST(1,2)::text || ':' ||...
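The SQL above is cut off in the source, so the sketch below does not try to complete it. As a separate, application-side alternative, the lock key can be derived in Python and passed to `pg_advisory_xact_lock(bigint)`. The function name `transfer_lock_key` and the choice of BLAKE2b are assumptions made for this illustration.

```python
import hashlib


def transfer_lock_key(account_a, account_b):
    """Derive a signed 64-bit advisory-lock key for an unordered account pair.

    The two ids are sorted first, so a transfer from A to B and a transfer
    from B to A map to the same key and serialize on the same lock.
    """
    low, high = sorted((account_a, account_b))
    digest = hashlib.blake2b(
        f"transfer:{low}:{high}".encode(), digest_size=8
    ).digest()
    # pg_advisory_xact_lock takes a bigint, i.e. a signed 64-bit integer.
    return int.from_bytes(digest, "big", signed=True)
```

It could be used as `cur.execute("SELECT pg_advisory_xact_lock(%s)", (transfer_lock_key(1, 2),))` at the start of the transaction. Two different pairs can hash to the same key; that only causes extra serialization, not incorrect results.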
