Rolling Aggregations for Real-Time AI | Hopsworks<br>All posts<br>TL;DR
Rolling aggregations are among the most useful input data for AI systems, enabling behavioral change detection and anomaly detection in real-time. They capture recent trends and patterns in a compressed representation, enabling interactive AI systems to meet the lowest latency requirements for feature freshness. Rolling aggregations are used by both real-time ML systems (e.g., credit card fraud and personalized recommendations) and interactive agents (e.g., compressed user history/behavior or summary of recent activity).
Rolling Aggregations - the Queen of AI Aggregations
A rolling aggregation computes statistics over a continuously shifting window of data. You aggregate over the last N values or time period at each point.
Figure 1. A rolling aggregation computes an aggregation function (e.g., SUM, AVG, MIN, MAX, STDDEV, etc) over a window size of time-series data (e.g. last hour).
Traditional windowed aggregations in stream processing (e.g., Apache Flink) are not designed to compute rolling aggregations, as they are computationally expensive - for every new event, you have to recompute the aggregate over all N events in the bucket. Instead, they use tumbling or hopping (sliding) windows to group data into windows of time, such as the last 10 minutes or hour, and compute aggregations over those windows.
Figure 2. Tumbling windows and hopping windows introduce a delay between when an event arrives and when an aggregation result is computed. Rolling aggregations are computed immediately when the event arrives.
Figure 2 shows how tumbling windows only output aggregations after the window length and watermark has passed, while hopping windows wait until the hop size has passed. While tumbling and hopping windows make computing aggregations over streaming data computationally tractable, they introduce latency. The output aggregations are as stale as the window length + watermark or the hop size. In AI terms, the feature data they output is stale. Stale data means your interactive AI applications will not be intelligent, just laggy. In contrast, rolling aggregations output results immediately when a new event arrives, producing fresh feature data.
Figure 3. Brief history of increasing feature freshness for rolling aggregations.
In Figure 3, you can see a brief history of the journey from tumbling/hopping windows to solutions for computing rolling aggregations. The first adopted approach was tiled window aggregations that combined stream processing with further computation at request-time. A lower cost solution was developed by Feldera recently based on incremental views for streaming processing. And recently, with RonDB, we developed native database support for parallel processing of aggregations - avoiding the need for stream processing. We now describe these approaches.
Shift Left and Shift Right with Tiled Window Aggregations and Chronon
AirBnB’s Chronon framework provided the first novel solution to reduce the computational overhead of computing rolling aggregations with an approach called tiled time window aggregations. Tecton Rift (based on DuckDB) and Chalk.ai (based on Apache Velox) also provide variants on this solution for scalable rolling aggregations.
Say you want to compute a precise 240 hour aggregation, you could decompose the events into 24-hour tiles, computed daily at 12am. The idea is that you can compute the 240 hour aggregation by composing partial aggregates for the 24-hour tiles. This works trivially for some aggregations (e.g., min, max), but requires maintaining additional state for others (e.g., mean).
Now imagine, a request arrives to compute a rolling aggregation at 1pm. Your tiles are only from 12am to 11.59pm. You will not have yet computed a tile for the current day’s events (from 12am-1pm) and you won’t have a tile for the events from 1pm of the last day in the interval (the tile for that day contains events not included in the interval). These events that lie outside the tiles are called head and tail events, respectively. Tiled window aggregations are computed by composing the partial aggregates with on-demand aggregations over the unaligned head/tail events.
Figure 4. Tiled aggregation combines precomputed partial aggregates (tiles) with on-demand computation to compose aggregations from both partial aggregates and recent events (head/tail events that are not included in a tile).
More specifically, tiles partition a window of length N into M tiles, where MRolling aggregations can be computed in stream processing systems such as Apache Flink with OVERaggregates. However, even though Apache Flink’s OVERaggregates can be partitioned over many workers, they do not scale well with increasing window size and increasing event throughput, as every new event triggers the recalculation of the aggregation function, and its computational cost is proportional to the window size, see Figure 5.
Figure 5....