Detecting fraud rings: the social-graph problem in disguise

Detecting fraud rings: the social-graph problem in disguise Fraud at scale usually gives itself away through repetition. One phone number behind twelve identities. Same device fingerprint reused across signups. A shipping address that keeps showing up on accounts that supposedly have nothing to do with each other.

At some point the accounts stop looking independent. They’re a ring.

Finding rings is a graph problem. Nodes are accounts, edges are shared attributes, and what you want is connected components.

The usual recommendation is to throw the data into Neo4j and solve it there with Cypher queries. That absolutely works. It’s also a lot of infrastructure for something your warehouse can often handle directly.

This post is how to do it with regular SQL.

If you work on AML or transaction monitoring, the same shape applies. KYC profile overlaps are graph problems too, just with different attribute labels on the nodes.

What a fraud ring looks like

Simple example:

Account A and account B share a phone number at signup.

Account B and account C share a shipping address.

Account C and account D share a device fingerprint.

Account E and account A share an IP at first transaction.

A, B, C, D, and E are connected. Looked at individually they look like five separate customers. Together they’re almost always one operator running cards or laundering accounts.

Single shared attributes catch the obvious cases. The harder problem is following those connections outward and figuring out which accounts belong to the same cluster.

The data

Assume you have an accounts table and a shared_attributes table:

-- accounts account_id | signup_date | status -----------|-------------|-------- 1 | 2026-01-01 | active 2 | 2026-01-02 | active ...

-- shared_attributes account_a | account_b | attribute_type | attribute_value ----------|-----------|----------------|---------------- 1 | 7 | phone | +1-555-0100 2 | 14 | shipping_address | 123 Main St ... The shared_attributes table has one row per pair of accounts that share at least one identifying attribute. The exact build step depends on what identifiers you have available. The recursive part below assumes it exists.

If you don’t have it pre-built, you can build it inline:

SELECT a1.account_id AS account_a, a2.account_id AS account_b, 'phone' AS attribute_type, a1.phone AS attribute_value FROM accounts a1 JOIN accounts a2 ON a1.phone = a2.phone AND a1.account_id a2.account_id; The account_id predicate dedupes so you don’t get both A-B and B-A. Repeat the shape for each attribute and UNION ALL the results. Materialize it as a table if it gets large.

Finding pairs (the easy part)

Pairs are just a SELECT against shared_attributes.

SELECT account_a, account_b, count(*) AS shared_attribute_count FROM shared_attributes GROUP BY 1, 2 HAVING count(*) >= 2 ORDER BY shared_attribute_count DESC; Two shared attributes are usually a much stronger signal than one. Shared phone by itself might just be roommates or family. Shared phone plus device plus address is where things start getting interesting fast.

Finding clusters (the recursive part)

For clusters of arbitrary size you need a recursive CTE. Start from an account, follow every edge to find accounts connected to it, then follow every edge from those, and so on, until no new accounts get added.

WITH RECURSIVE ring AS ( -- Seed: pick a starting account SELECT account_a AS member, 1 AS depth FROM shared_attributes WHERE account_a = 1

UNION

-- Step: add accounts connected to any current ring member SELECT CASE WHEN s.account_a = r.member THEN s.account_b ELSE s.account_a END AS member, r.depth + 1 FROM ring r JOIN shared_attributes s ON r.member IN (s.account_a, s.account_b) WHERE r.depth 10 SELECT DISTINCT member FROM ring; One easy mistake here is using UNION ALL instead of UNION.

If A connects to B and B connects back to A, the recursion keeps walking the same loop forever unless duplicates get removed during each pass. Regular UNION handles that automatically.

In practice most of the clusters I’ve seen are relatively small, so a depth cap like 10 is usually enough to keep runaway recursion under control. The cap is also there for the rare malformed shared_attributes table that would otherwise blow up the query.

A note on platform support

Recursive CTEs work in Postgres, Snowflake, BigQuery, SQL Server, MySQL 8+, and SQLite. The syntax above is close to ANSI standard SQL and runs in all of them with minor adjustments. Postgres tends to be the cleanest. Snowflake and BigQuery both handle it but you’ll see different optimizer behavior.

BigQuery’s GoogleSQL graph queries went GA in 2025. If you’re already on BigQuery and your rings are large enough that the recursive CTE feels slow, the native graph syntax is the cleaner path before you reach for an external graph database.

Finding all rings (not just one seed)

The query above starts from one account. To enumerate every distinct ring across your whole population you...

Detecting fraud rings: the social-graph problem in disguise

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

It's Not Just X. It's Y

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy