A synthetic order analytics pipeline built on CDC from Postgres to ClickHouse


Repository: el10savio/ecommrt on GitHub



eCommerce Real Time CDC Pipeline

A real-time order analytics pipeline built on Change Data Capture (CDC) over the Brazilian Ecommerce Dataset. Orders flow from a Go producer through Kafka into PostgreSQL, where Debezium captures every write via WAL replication and streams it through Kafka into ClickHouse, with the results visualised in Grafana.
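To make the "captures every write" step concrete, here is a minimal sketch of decoding a Debezium change event in Go. It assumes the JSON converter runs with schemas disabled (with schemas enabled, the same fields sit under a `payload` key). `ChangeEvent`, `DecodeChange`, and the column names in the sample message are hypothetical and not taken from this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChangeEvent mirrors the Debezium envelope: the row state before and after
// the change, an operation code, and an event timestamp in milliseconds.
type ChangeEvent struct {
	Before map[string]any `json:"before"`
	After  map[string]any `json:"after"`
	Op     string         `json:"op"` // "c" create, "u" update, "d" delete, "r" snapshot read
	TsMs   int64          `json:"ts_ms"`
}

// DecodeChange parses one Kafka message value into a ChangeEvent.
func DecodeChange(value []byte) (ChangeEvent, error) {
	var ev ChangeEvent
	if err := json.Unmarshal(value, &ev); err != nil {
		return ChangeEvent{}, fmt.Errorf("decode change event: %w", err)
	}
	return ev, nil
}

func main() {
	// Hypothetical message; the column names are illustrative only.
	msg := []byte(`{"before":null,"after":{"order_id":"o-1","order_status":"delivered"},"op":"c","ts_ms":1700000000000}`)
	ev, err := DecodeChange(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Op, ev.After["order_id"])
}
```

An insert arrives with `before` set to null and `op` set to `c`, which is the case a downstream sink sees most often for an append-only orders stream.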

Architecture

    ┌─────────────┐  orders.ingest   ┌──────────────────────────────────────┐
    │  Producer   │─────────────────▶│                                      │
    └─────────────┘                  │                Kafka                 │
                                 ┌──▶│                                      │
                                 │   └──────────────┬───────────────────────┘
                                 │   orders.ingest  │  postgres.public.*
                                 │                  │
                                 │       ┌──────────┴──────────┐
                                 │       │                     │
                                 │       ▼                     ▼
                                 │   Consumers            ClickHouse
                                 │       │                     │
                                 │       ▼                     ▼
                                 │  PostgreSQL              Grafana
                                 │       │
                                 │       ▼
                                 │   Debezium
                                 │       │
                                 └───────┘ postgres.public.*

A static dataset is turned into a real-time stream as follows:

The producer iterates over the dataset and spawns a bounded number of goroutines to send records to Kafka. It keeps running until the dataset is exhausted.

Kafka distributes the records to the consumers, which write them into Postgres.

Debezium, the CDC component, picks up the changes to the orders table from the Postgres WAL and publishes them back to Kafka, from where ClickHouse ingests them.

Grafana then queries ClickHouse to render the real-time dashboard.
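The bounded fan-out in the first step can be sketched with a buffered channel used as a semaphore. This is an illustration under stated assumptions, not the repository's producer: `Sender` is a hypothetical stand-in for a Kafka client, and `Publish` and `countingSender` are names invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Sender is a hypothetical stand-in for a Kafka producer client.
type Sender interface {
	Send(topic string, value []byte) error
}

// Publish sends every record to topic using at most maxInFlight goroutines
// at a time and returns the number of failed sends.
func Publish(s Sender, topic string, records [][]byte, maxInFlight int) int {
	if maxInFlight < 1 {
		maxInFlight = 1
	}
	sem := make(chan struct{}, maxInFlight)
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	for _, rec := range records {
		sem <- struct{}{} // blocks while maxInFlight sends are in progress
		wg.Add(1)
		go func(rec []byte) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot for the next record
			if err := s.Send(topic, rec); err != nil {
				mu.Lock()
				failed++
				mu.Unlock()
			}
		}(rec)
	}
	wg.Wait() // return only once the whole dataset has been sent
	return failed
}

// countingSender is a test double that counts the sends it receives.
type countingSender struct {
	mu   sync.Mutex
	sent int
}

func (c *countingSender) Send(topic string, value []byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sent++
	return nil
}

func main() {
	s := &countingSender{}
	records := [][]byte{[]byte("order-1"), []byte("order-2"), []byte("order-3")}
	failed := Publish(s, "orders.ingest", records, 2)
	fmt.Println(s.sent, failed)
}
```

The semaphore caps concurrency without a fixed worker pool, so the loop over the dataset naturally slows down when Kafka does.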

Prerequisites

Docker Desktop (Mac or Linux)

Docker Compose v2 (docker compose plugin)

Go 1.22+

The Olist Brazilian E-Commerce Dataset (place the CSV files under data/olist/)

Required CSVs:

data/olist/olist_orders_dataset.csv

data/olist/olist_order_items_dataset.csv

data/olist/customers_dataset.csv

data/olist/products_dataset.csv

data/olist/sellers_dataset.csv

Getting Started

Place the Olist CSV files under data/olist/, then run:

make setup

setup.sh builds the app image, starts all services in dependency order, applies the Postgres schema, backfills reference tables, waits for each healthcheck, and registers the Debezium connector.

Once it completes, open Grafana at http://localhost:3000 and open the eCommerce Real Time CDC Pipeline dashboard.

Dashboard Layout

| Row | Panels |
| --- | --- |
| Business KPIs | Total Orders, Total Revenue, Avg Order Value, Unique Customers |
| Live Trends | Orders per Minute, Revenue per Minute (pre-aggregated) |
| Leaderboards | Top 10 Products by Orders, Top 10 Customers by Order Count, Top 10 Sellers by Revenue |
| Pipeline Health | Event Throughput, Kafka Consumer Lag, Write Latency p99, Errors & Duplicate Skips |
| Application Metrics | CPU, Memory RSS, Goroutines, all per container, all replicas |

Observability runs alongside the pipeline: both the producer and each consumer replica expose Go runtime and custom business metrics via OpenTelemetry to Prometheus, scraped individually per container.

References

Debezium PostgreSQL Connector

Olist Brazilian E-Commerce Dataset
