Deploy Platforms Instead of Charts


Torque packages the stack root, profiles, app assets, public access checks, and run evidence into one repeatable release program. The proof run used real payment events and verified S3 objects, Redpanda schema contracts and offsets, ClickHouse rows, Iceberg tables, Trino queries, Spark batch output, replay/backfill, and public endpoints.


Platform architecture: payments stream in real time while batch features rebuild from the lake.

The API writes raw events to Redpanda and S3. Flink scores the stream with Ray Serve, ClickHouse stores decisions, Spark commits Iceberg tables through a REST catalog, Trino exposes analyst SQL, and Argo schedules replayable feature jobs.

The architecture diagram has four lanes:

stream: Payments API (public event ingress) -> raw -> Redpanda (payments.raw topic) -> consume -> Flink (continuous risk job)
scoring: Flink (feature window) -> score -> Ray Serve (risk model endpoint) -> write -> ClickHouse (payment decisions)
batch: Argo (scheduled workflow) -> run -> Spark (bounded feature job) -> persist -> S3 + Iceberg (curated features)
telemetry: Workloads (API, jobs, model service) -> signals -> SigNoz (service health surface) -> store -> ClickHouse (observability backend)

The central object in this article is the Torque stack file. It is not a values file for one chart. It is an ordered deployment contract for a full fraud platform: host setup, Kubernetes access, cloud storage, data services, application workloads, public checks, batch processing, replay, and final verification.

Helmfile can coordinate Helm releases. Argo CD and Flux can keep Kubernetes objects in sync with Git. Terraform and Pulumi can create cloud resources. Argo Workflows can run jobs. This stack uses that kind of tooling where it fits, but the stack boundary is wider. The deployment starts with a remote Linux host, creates a Firecracker-backed k3s lab, opens a controlled tunnel for local Kubernetes access, creates or checks the S3 bucket, installs the platform services, deploys the fraud workloads, runs Spark, runs replay, and verifies the data path from outside the cluster.
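The idea of an ordered deployment contract can be sketched in a few lines of Python. This is not Torque's stack file schema or its engine. The stage names paraphrase the steps listed above, and `Stage`, `run_stack`, `ok`, and `STAGES` are hypothetical names used only for illustration. The point is the shape: stages run in a fixed order, and a failure stops everything after it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One step of a hypothetical ordered deployment contract."""
    name: str
    action: Callable[[], bool]  # returns True when the stage succeeded


def run_stack(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for stage in stages:
        if not stage.action():
            break
        completed.append(stage.name)
    return completed


def ok() -> bool:
    """Placeholder action; a real stage would do work and verify it."""
    return True


# Stage names paraphrase the article; they are not Torque identifiers.
STAGES = [
    Stage("host-setup", ok),
    Stage("k3s-lab", ok),
    Stage("kube-tunnel", ok),
    Stage("s3-bucket", ok),
    Stage("platform-services", ok),
    Stage("fraud-workloads", ok),
    Stage("spark-batch", ok),
    Stage("replay", ok),
    Stage("external-verification", ok),
]
```

Because later stages depend on earlier ones (workloads need the cluster, verification needs the workloads), stopping at the first failure is the simplest policy that keeps the recorded evidence honest.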

The source package lives under stacks/fraud-platform. The rendered stack file is attached here as stack.yaml. The production-shaped entrypoint is stack-packaged.yaml, with profile values such as values-prod.yaml. The lab profile expects two environment variables: TORQUE_LAB_SSH for the target host and TORQUE_LAB_PUBLIC_IP for public endpoint checks.
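A preflight check for the two lab variables might look like the following. This is a minimal sketch, not part of Torque. Only the variable names come from the text above; the helper `missing_lab_vars` and the example value are assumptions.

```python
from typing import List, Mapping

# Variable names are taken from the lab profile description.
REQUIRED_LAB_VARS = ("TORQUE_LAB_SSH", "TORQUE_LAB_PUBLIC_IP")


def missing_lab_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required lab variables that are unset or blank."""
    return [name for name in REQUIRED_LAB_VARS if not env.get(name, "").strip()]


# Example with a made-up value; pass os.environ to check the real environment.
print(missing_lab_vars({"TORQUE_LAB_SSH": "ops@lab-host"}))
```

Running a check like this before the first stage turns a confusing mid-run SSH or endpoint failure into an immediate, named error.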

Platform Architecture

The platform has two data paths. The stream path starts at the payments API. That API is the public event ingress. It accepts generated card payment events, writes each raw event to the payments.raw Redpanda topic, and stores a raw JSON object in S3. Redpanda is more than a queue in this stack. It also exposes a schema registry with subjects for raw payment events, risk events, and payment decisions, and the verification step checks those subjects for backward compatibility.
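To make the backward compatibility check concrete, here is a deliberately simplified sketch of the rule: a new schema version is backward compatible when a reader using it can still read records written with the previous version. The real check is performed by the schema registry, not by code like this. The schema representation, the function `backward_violations`, and the payment fields are all hypothetical, and the sketch treats any type change as a violation even though real schema formats allow some promotions.

```python
from typing import Dict, List

# Simplified schema: field name -> {"type": ..., optional "default": ...}.
Schema = Dict[str, dict]


def backward_violations(old: Schema, new: Schema) -> List[str]:
    """List reasons the new schema could not read data written with the old one."""
    problems = []
    for name, spec in new.items():
        if name not in old:
            # Old records never carried this field, so the reader needs a default.
            if "default" not in spec:
                problems.append(f"added field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
    # Fields dropped from the new schema are simply ignored by the reader.
    return problems


# Hypothetical payment event fields, for illustration only.
V1 = {"payment_id": {"type": "string"}, "amount": {"type": "double"}}
V2_OK = {**V1, "channel": {"type": "string", "default": "card"}}
V2_BAD = {**V1, "channel": {"type": "string"}}
```

Here `V2_OK` passes because the added field has a default, while `V2_BAD` fails because old records give the reader nothing to put in `channel`.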

Flink consumes the raw topic and builds the scoring request. The design keeps the stream processor separate from the model service: Ray Serve owns the model endpoint and returns the score. Flink writes the decision to ClickHouse and also produces risk output back through Redpanda. ClickHouse is the fast operator store in this design: decisions, fraud rates, merchant behavior, and batch summaries are available without reading lake files.
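The separation between stream processing and model serving can be illustrated with an injected scorer. This is a sketch under stated assumptions, not the Flink job or the Ray Serve deployment: the feature names, the 0.8 threshold, the decision labels, and the functions `build_scoring_request`, `decide`, and `stub_scorer` are all hypothetical. In the real platform the scorer would be a network call to the model endpoint.

```python
from typing import Callable, Dict, List

# The model client is injected, so the stream logic never imports the model.
Scorer = Callable[[Dict[str, float]], float]


def build_scoring_request(event: dict, recent_amounts: List[float]) -> Dict[str, float]:
    """Turn one raw event plus a window of recent amounts into model features."""
    return {
        "amount": float(event["amount"]),
        "window_count": float(len(recent_amounts)),
        "window_total": float(sum(recent_amounts)),
    }


def decide(event: dict, recent_amounts: List[float], scorer: Scorer,
           threshold: float = 0.8) -> dict:
    """Score the event through the injected client and form a decision row."""
    score = scorer(build_scoring_request(event, recent_amounts))
    return {
        "payment_id": event["payment_id"],
        "score": score,
        "decision": "review" if score >= threshold else "approve",
    }


def stub_scorer(features: Dict[str, float]) -> float:
    """Stand-in for the model endpoint: larger amounts score higher."""
    return min(1.0, features["amount"] / 1000.0)
```

Swapping `stub_scorer` for a client that calls the model service changes nothing in `decide`, which is the practical benefit of keeping the two concerns apart.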

The batch path starts from S3. Argo submits a Spark workflow after the platform is reachable. Spark reads raw payment objects and risk decisions, computes aggregate fraud features, writes a curated JSON artifact back to S3, inserts batch rows into ClickHouse, and commits three Iceberg tables through the REST catalog: raw_payments, risk_events, and batch_feature_summary. Trino is the analyst surface over both stores. It queries ClickHouse for low-latency decisions and Iceberg for lakehouse tables. SigNoz covers service health and uses ClickHouse as its telemetry store.
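The aggregation at the heart of the batch job can be shown in plain Python. This is not the Spark job from the stack. It is a small in-memory sketch of the same kind of join and aggregate, and the field names, the "review" label, and the function `merchant_fraud_features` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable


def merchant_fraud_features(payments: Iterable[dict],
                            decisions: Iterable[dict]) -> Dict[str, dict]:
    """Join payments to decisions by payment_id and aggregate per merchant."""
    flagged = {d["payment_id"] for d in decisions if d["decision"] == "review"}
    totals = defaultdict(lambda: {"payments": 0, "flagged": 0, "amount": 0.0})
    for p in payments:
        row = totals[p["merchant_id"]]
        row["payments"] += 1
        row["amount"] += float(p["amount"])
        if p["payment_id"] in flagged:
            row["flagged"] += 1
    # Every merchant in totals has at least one payment, so the division is safe.
    return {
        merchant: {**row, "fraud_rate": row["flagged"] / row["payments"]}
        for merchant, row in totals.items()
    }


# Made-up sample rows, for illustration only.
PAYMENTS = [
    {"payment_id": "p1", "merchant_id": "m1", "amount": 10.0},
    {"payment_id": "p2", "merchant_id": "m1", "amount": 30.0},
    {"payment_id": "p3", "merchant_id": "m2", "amount": 5.0},
]
DECISIONS = [
    {"payment_id": "p1", "decision": "review"},
    {"payment_id": "p2", "decision": "approve"},
    {"payment_id": "p3", "decision": "approve"},
]
```

Because the function depends only on its inputs, rerunning it over the same raw objects yields the same features, which is the property that makes replay and backfill meaningful.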

The Kubernetes layout is also part of the architecture. The platform labels Firecracker nodes by workload role: control, observability, events, processing, machine learning and batch, and analytics. Redpanda runs in the data namespace. Flink runs in the stream namespace. Ray and Spark run in the machine learning namespace. The API and generator run in the apps namespace. SigNoz and its ClickHouse store run in observability. Trino and Iceberg REST run on the analytics node so SQL access is separate from stream processing and model serving.
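The placement described above can be written down as data and checked. The sketch below records only what the text states: which namespace each workload runs in, and that Trino and Iceberg REST run on the analytics node. All strings are descriptive placeholders rather than confirmed namespace or label values, and the names `NODE_ROLES`, `NAMESPACE_OF`, `NODE_ROLE_OF`, `workloads_by_namespace`, and `unknown_roles` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Node roles named in the article; the exact label values are assumptions.
NODE_ROLES = ("control", "observability", "events", "processing", "ml-batch", "analytics")

# Workload -> namespace, as described in the text (placeholder strings).
NAMESPACE_OF: Dict[str, str] = {
    "redpanda": "data",
    "flink": "stream",
    "ray": "machine-learning",
    "spark": "machine-learning",
    "payments-api": "apps",
    "generator": "apps",
    "signoz": "observability",
    "signoz-clickhouse": "observability",
}

# Only the node placements the article states explicitly.
NODE_ROLE_OF: Dict[str, str] = {"trino": "analytics", "iceberg-rest": "analytics"}


def workloads_by_namespace() -> Dict[str, List[str]]:
    """Group workloads by namespace, with names sorted for stable output."""
    grouped = defaultdict(list)
    for workload, namespace in NAMESPACE_OF.items():
        grouped[namespace].append(workload)
    return {ns: sorted(names) for ns, names in grouped.items()}


def unknown_roles() -> List[str]:
    """Return workloads pinned to a node role that is not in NODE_ROLES."""
    return sorted(w for w, role in NODE_ROLE_OF.items() if role not in NODE_ROLES)
```

Keeping the layout as data makes it cheap to assert, before deployment, that no workload targets a node role the cluster does not have.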

Stream And Scoring