Pipe-Lining Time Series Calculations for Cache Efficiency (2014)

September 8th, 2014 by admin

I always like to investigate new technology, and this week I found a nice automatic technique for improving cache use that I had previously only seen people implement by hand.

Consider a database query with three steps (three SQL SELECTs). Some databases pass the results of each step to temporary tables in main memory. When the first step is finished, these intermediate results are loaded back into CPU cache to be transformed by the second step, then written to a new temporary table in main memory, and so on.

To eliminate this back-and-forth, vector-based statistical functions can be pipelined, with the output of one function becoming input for the next, whose output feeds a third function, etc. Intermediate results stay in the pipeline inside CPU cache, with only the full result being materialized at the end.
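The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration of the idea, not ExtremeDB's implementation: generators stand in for pipelined vector functions, a short list stands in for a cache-sized tile, and the names `tiles`, `scale`, `offset`, `total` and `TILE_SIZE` are all hypothetical.

```python
from typing import Iterable, Iterator, List

TILE_SIZE = 4  # hypothetical tile length; a real engine would size tiles to fit CPU cache


def tiles(values: List[float], size: int = TILE_SIZE) -> Iterator[List[float]]:
    """Split the input into fixed-size tiles (the last one may be shorter)."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def scale(source: Iterable[List[float]], factor: float) -> Iterator[List[float]]:
    """Step 1: multiply every element, one tile at a time."""
    for tile in source:
        yield [x * factor for x in tile]


def offset(source: Iterable[List[float]], delta: float) -> Iterator[List[float]]:
    """Step 2: add a constant, one tile at a time."""
    for tile in source:
        yield [x + delta for x in tile]


def total(source: Iterable[List[float]]) -> float:
    """Step 3: reduce the stream; only this final result is materialized."""
    return sum(sum(tile) for tile in source)


def pipelined(values: List[float]) -> float:
    # Each tile passes through all three steps before the next tile is read,
    # so no full-length intermediate list is ever built.
    return total(offset(scale(tiles(values), 2.0), 1.0))


def materialized(values: List[float]) -> float:
    # The "temporary table" style: every step builds a full intermediate list.
    step1 = [x * 2.0 for x in values]
    step2 = [x + 1.0 for x in step1]
    return sum(step2)
```

Both functions compute the same answer; the difference is that `materialized` holds two full-length intermediates, while `pipelined` only ever holds one tile per stage.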

This technology is part of ExtremeDB, whose makers have a video that explains it well:

[Video: Time Series Calculations]

Moving Averages Stock Price Example

This is what the actual code would look like to calculate the 5-day and 21-day moving averages for a stock and detect the points where the faster moving average (5-day) crosses over or under the slower moving average (21-day):

select seq_map(ClosePrice, seq_cross(seq_sub(seq_window_agg_avg(ClosePrice, 5), seq_window_agg_avg(ClosePrice, 21)), 1)) from Security;

Two invocations of ‘seq_window_agg_avg’ execute over the closing price sequence, ‘ClosePrice’, to obtain 5-day and 21-day moving averages.

The function ‘seq_sub’ subtracts the 21-day moving average from the 5-day moving average.

The result “feeds” a fourth function, ‘seq_cross’, to identify where the 5- and 21-day moving averages cross.

Finally, the function ‘seq_map’ maps the crossovers to the original ‘ClosePrice’ sequence, returning closing prices where the moving averages crossed.
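To make the four steps concrete, here is a rough Python equivalent that works on a whole list at once. It is a sketch under assumptions, not the ExtremeDB functions themselves: the helper names are hypothetical, the averages are treated as undefined until a full window is available, a crossing is taken to be any sign change in either direction, and the meaning of the `1` argument to ‘seq_cross’ is not modelled.

```python
from typing import List, Optional


def moving_avg(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average; None until a full window exists (an assumption)."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def subtract(a: List[Optional[float]], b: List[Optional[float]]) -> List[Optional[float]]:
    """Element-wise a - b, undefined wherever either input is undefined."""
    return [None if x is None or y is None else x - y for x, y in zip(a, b)]


def crossings(diff: List[Optional[float]]) -> List[int]:
    """Positions where the difference changes sign versus the previous defined value."""
    positions: List[int] = []
    prev: Optional[float] = None
    for i, d in enumerate(diff):
        if d is None:
            continue
        if prev is not None and (prev < 0) != (d < 0):
            positions.append(i)
        prev = d
    return positions


def crossover_prices(prices: List[float], fast: int, slow: int) -> List[float]:
    """Closing prices at the points where the fast average crosses the slow one."""
    diff = subtract(moving_avg(prices, fast), moving_avg(prices, slow))
    return [prices[i] for i in crossings(diff)]
```

For the example in the post, the call would be `crossover_prices(close_prices, 5, 21)`, where `close_prices` is a hypothetical list of daily closes.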

This approach eliminates the need to create, populate and query temporary tables in main memory, outside CPU cache. Each invocation of ‘mco_seq_window_agg_avg_double()’ processes one "tile" of input data and produces one tile of output, which is passed to ‘mco_seq_sub_double()’. That function in turn produces one tile of output that becomes the input to ‘mco_seq_cross_double()’, whose output tile is passed to ‘mco_seq_map_double()’. When the last function, ‘mco_seq_map_double()’, has exhausted its input, the whole process repeats from the top to produce new tiles for additional processing.
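The tile-by-tile flow described above can be imitated with Python generators. This is a hedged sketch, not the `mco_seq_*` C API: every name below is hypothetical, a crossing is assumed to mean a sign change in either direction, and a small Python list stands in for a cache-resident tile. The point it illustrates is that windowed stages must carry a little state across tile boundaries, and that the result does not depend on the tile size.

```python
from collections import deque
from itertools import tee
from typing import Iterable, Iterator, List, Optional


def read_tiles(prices: List[float], size: int) -> Iterator[List[float]]:
    """Feed the input to the pipeline one tile at a time."""
    for start in range(0, len(prices), size):
        yield prices[start:start + size]


def window_avg(source: Iterable[List[float]], window: int) -> Iterator[List[Optional[float]]]:
    """Moving average per tile; the deque carries the window across tile boundaries."""
    recent: deque = deque(maxlen=window)
    for tile in source:
        out: List[Optional[float]] = []
        for price in tile:
            recent.append(price)
            out.append(sum(recent) / window if len(recent) == window else None)
        yield out


def sub(left, right) -> Iterator[List[Optional[float]]]:
    """Subtract matching tiles element by element."""
    for a, b in zip(left, right):
        yield [None if x is None or y is None else x - y for x, y in zip(a, b)]


def cross(source) -> Iterator[List[bool]]:
    """One tile of flags per input tile: True where the sign flips."""
    prev: Optional[float] = None  # last defined value, kept across tiles
    for tile in source:
        flags: List[bool] = []
        for d in tile:
            flags.append(d is not None and prev is not None and (prev < 0) != (d < 0))
            if d is not None:
                prev = d
        yield flags


def map_prices(flags, prices) -> Iterator[List[float]]:
    """Keep only the prices at flagged positions, tile by tile."""
    for flag_tile, price_tile in zip(flags, prices):
        yield [p for f, p in zip(flag_tile, price_tile) if f]


def crossover_prices(prices: List[float], fast: int, slow: int, tile_size: int) -> List[float]:
    # tee() lets three consumers share one tile stream; because they advance
    # in lockstep, it buffers at most one tile at a time.
    a, b, c = tee(read_tiles(prices, tile_size), 3)
    diff = sub(window_avg(a, fast), window_avg(b, slow))
    result: List[float] = []
    for tile in map_prices(cross(diff), c):
        result.extend(tile)  # only the final output is materialized
    return result
```

Pulling one tile from `map_prices` pulls exactly one tile through every upstream stage, which mirrors the "repeat from the top" behaviour in the description.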

A very cool idea!

And yes, ExtremeDB are the same guys who posted the top STAC-M3 benchmark for a while (in 2012/13, I think).

Tags: column database, tick data, time series, timeseries database.
