GitHub - el10savio/ecommrt: A real-time order analytics pipeline built on CDC from Postgres to ClickHouse on the Brazilian Ecommerce Dataset · GitHub
eCommerce Real Time CDC Pipeline
A real-time order analytics pipeline built on Change Data Capture (CDC) over the Brazilian Ecommerce Dataset. Orders flow from a Go producer through Kafka into PostgreSQL, where Debezium captures every write via WAL replication and streams it through Kafka into ClickHouse, with the results visualised in Grafana.
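To make the CDC step more concrete, here is a minimal sketch of decoding one Debezium change event in Go. It assumes the JSON converter is configured without embedded schemas, so the envelope fields (`before`, `after`, `op`, `ts_ms`) sit at the top level. The `ChangeEvent` type, the `DecodeChange` helper, and the sample row are illustrative and are not taken from this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChangeEvent mirrors the Debezium change-event envelope, assuming schemas
// are disabled in the converter. With schemas enabled, the same fields sit
// under a top-level "payload" key instead.
type ChangeEvent struct {
	Before map[string]any `json:"before"` // row before the change; null for inserts
	After  map[string]any `json:"after"`  // row after the change; null for deletes
	Op     string         `json:"op"`     // c=create, u=update, d=delete, r=snapshot read
	TsMs   int64          `json:"ts_ms"`  // when the connector processed the event
}

// DecodeChange parses one Kafka message value into a ChangeEvent.
func DecodeChange(value []byte) (ChangeEvent, error) {
	var ev ChangeEvent
	if err := json.Unmarshal(value, &ev); err != nil {
		return ChangeEvent{}, fmt.Errorf("decode change event: %w", err)
	}
	return ev, nil
}

func main() {
	// Hypothetical event for an insert into the orders table.
	msg := []byte(`{"before":null,"after":{"order_id":"abc123","order_status":"delivered"},"op":"c","ts_ms":1700000000000}`)

	ev, err := DecodeChange(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Op, ev.After["order_id"]) // prints "c abc123"
}
```

A consumer of these events would typically branch on `Op` and read `After` for inserts and updates, and `Before` for deletes.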
Architecture
┌─────────────┐  orders.ingest   ┌──────────────────────────────────────┐
│  Producer   │─────────────────▶│                                      │
└─────────────┘                  │                Kafka                 │
                             ┌──▶│                                      │
                             │   └──────────────┬───────────────────────┘
                             │  orders.ingest   │  postgres.public.*
                             │                  │
                             │       ┌──────────┴──────────┐
                             │       │                     │
                             │       ▼                     ▼
                             │   Consumers            ClickHouse
                             │       │                     │
                             │       ▼                     ▼
                             │   PostgreSQL             Grafana
                             │       │
                             │       ▼
                             │   Debezium
                             │       │
                             └───────┘  postgres.public.*
The dataset is turned into a real-time stream as follows:
The producer iterates over the dataset and spawns a limited number of goroutines to send each order to Kafka. It keeps running until the dataset is exhausted.
Kafka distributes the orders to the consumers, which write them into Postgres.
Debezium, the CDC component, picks up the changes to the orders table and publishes them back to Kafka, from where ClickHouse ingests them.
Grafana then queries ClickHouse to render the real-time dashboard.
Prerequisites
Docker Desktop (Mac or Linux)
Docker Compose v2 (docker compose plugin)
Go 1.22+
The Olist Brazilian E-Commerce Dataset (place the CSV files under data/olist/)
Required CSVs:
data/olist/olist_orders_dataset.csv
data/olist/olist_order_items_dataset.csv
data/olist/customers_dataset.csv
data/olist/products_dataset.csv
data/olist/sellers_dataset.csv
Getting Started
Place the Olist CSV files under data/olist/, then run:
make setup
setup.sh builds the app image, starts all services in dependency order, applies the Postgres schema, backfills reference tables, waits for each healthcheck, and registers the Debezium connector.
Once it completes, open Grafana at http://localhost:3000 and open the eCommerce Real Time CDC Pipeline dashboard.
Dashboard Layout
| Row | Panels |
| --- | --- |
| Business KPIs | Total Orders, Total Revenue, Avg Order Value, Unique Customers |
| Live Trends | Orders per Minute, Revenue per Minute (pre-aggregated) |
| Leaderboards | Top 10 Products by Orders, Top 10 Customers by Order Count, Top 10 Sellers by Revenue |
| Pipeline Health | Event Throughput, Kafka Consumer Lag, Write Latency p99, Errors & Duplicate Skips |
| Application Metrics | CPU, Memory RSS, Goroutines, all per container, all replicas |
Observability runs alongside the pipeline: both the producer and each consumer replica expose Go runtime and custom business metrics via OpenTelemetry to Prometheus, scraped individually per container.
References
Debezium PostgreSQL Connector
Olist Brazilian E-Commerce Dataset