Eight window-function tricks beyond LAG and ROW_NUMBER

Eight window-function tricks beyond LAG and ROW_NUMBER The first post in this series got more reader response than I expected. The comments were full of practical follow-up questions and a few good corrections, and one thing that kept coming up was a request to go deeper on window functions specifically. Pattern 6 in the previous post leaned heavily on them. People wanted to see what else lives in that toolbox.

This is that.

LAG and ROW_NUMBER are usually the first two window functions analysts learn. They are the right starting point. But after a year or two of writing SQL you start running into shapes those two can’t quite hit, and you start hitting problems where those two stop being enough.

Eight patterns below. Same disclaimer as the previous post: I work as a data analyst on a program-integrity team. Examples use generic transaction tables and synthetic scenarios. Nothing here reflects actual systems, procedures, or data from any specific employer.

1. QUALIFY

QUALIFY lets you filter directly on window-function output without wrapping the whole query in another CTE or subquery.

SELECT cardholder_id, timestamp, count(*) OVER ( PARTITION BY cardholder_id ORDER BY timestamp RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW ) AS tx_5min FROM transactions QUALIFY tx_5min >= 5; Without QUALIFY this usually turns into an extra query layer just to filter on the computed column. Not a big deal in tiny examples, but once fraud rules start stacking together it gets messy fast.

Support is still inconsistent depending on the engine. Snowflake, BigQuery, DuckDB, Databricks, ClickHouse and a few others have it. Postgres, MySQL, SQLite still don’t.

If you’re on Postgres, the fallback is usually just a CTE and an outer WHERE clause.

2. Frame specs (ROWS vs RANGE)

A lot of people write:

OVER (PARTITION BY x ORDER BY y) and never think much about the frame underneath it. Usually fine until sliding windows start behaving strangely.

The two big ones:

ROWS BETWEEN N PRECEDING AND CURRENT ROW

RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW

ROWS counts rows. RANGE works off the ORDER BY value itself.

The difference starts mattering once duplicate timestamps show up. ROWS and RANGE stop behaving the same way at that point.

For fraud velocity checks, RANGE on timestamps is usually the right fit because you’re actually asking “how many transactions happened within this time window,” not “what happened in the previous 5 rows.”

For rolling hourly baselines or trailing-week comparisons, ROWS is often cleaner because each row already represents a fixed bucket.

One related thing that came up in the comments on the previous post and is worth surfacing here: the timestamp you key off matters more than people realize. Authorization time and settlement time can be days apart. If you use settlement time on a velocity or impossible-travel check, charges from days ago suddenly look like they happened “now” in the warehouse, and the windows blow up with false positives. Use auth time. Most card systems have both columns; the home-grown ones don’t always make the distinction obvious.

3. FILTER for conditional aggregation

Most people start with CASE WHEN inside SUM or COUNT:

sum(CASE WHEN amount > 100 THEN 1 ELSE 0 END) FILTER does the same thing with less noise:

SELECT merchant_id, count(*) FILTER (WHERE amount > 100) AS high_value_count, count(*) FILTER (WHERE timestamp::time BETWEEN '00:00' AND '06:00') AS off_hours_count FROM transactions GROUP BY merchant_id; Works inside window functions too:

sum(amount) FILTER (WHERE merchant_category = 'gas_station') OVER ( PARTITION BY cardholder_id ORDER BY timestamp ROWS BETWEEN 100 PRECEDING AND CURRENT ROW ) AS rolling_gas_spend Mostly just easier to read once queries get large.

Support varies though. Postgres, Snowflake, DuckDB and ClickHouse support it. MySQL still doesn’t. BigQuery handles this a little differently with IF inside the aggregate instead.

4. FIRST_VALUE / LAST_VALUE / NTH_VALUE

These are useful anytime you need one row’s value carried across the whole partition.

Example: tagging every transaction with the first merchant a cardholder ever used.

SELECT cardholder_id, timestamp, merchant_id, first_value(merchant_id) OVER ( PARTITION BY cardholder_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS first_merchant_ever, last_value(merchant_id) OVER ( PARTITION BY cardholder_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS most_recent_merchant FROM transactions; The LAST_VALUE behavior trips people up constantly.

Without the explicit frame, LAST_VALUE usually returns the current row instead of the actual last row in the partition because the default frame stops at the current row.

NTH_VALUE works the same way conceptually. Useful for questions like “what was the third transaction this cardholder ever made?“

5. LEAD chains for sequence detection

LAG gets most of the...

Eight window-function tricks beyond LAG and ROW_NUMBER

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast