The Qbeast split-plane SaaS Architecture



Deploying into customer clouds securely, at scale, on AWS, Azure, and GCP.

A data lakehouse is a modern approach to data analytics built on a disaggregated, open stack. It is disaggregated in that it relies on an object store for data persistence and one or more query engines for query execution. Being open refers to the use of open file and table formats, along with interoperability across components from both vendors and the open source ecosystem.

At Qbeast, we develop technology that organizes the data layout of lakehouse tables to improve performance and reduce compute costs. The data layout refers to the files that make up a table and their contents. Organizing it means making deliberate choices about the number of files and what they contain, with the goal of maximizing query efficiency. We do this through clustering guided by an index tree, which lets us prune data significantly at query time and update table clustering incrementally as new data arrives.

This post is not about the indexing itself, though. It is about our implementation of data layout as infrastructure: the setup we developed to enable and support our customers with their lakehouse operations. We received requests to support all three major cloud providers (AWS, Google Cloud (GCP), and Microsoft Azure) and ultimately implemented support for all of them. Our goal was to centralize and automate as much as possible to reduce the complexity of managing multi-cloud deployments. To that end, we implemented a control plane using Crossplane.

Given widespread concern about data leaving the customer's account, we deploy directly into the customer environment. We currently offer two deployment modes: managed and provided. Because our components run on Kubernetes, customers can either provide us with a Kubernetes namespace (provided) or have us provision a cluster on their behalf (managed). In both cases, we are responsible for managing the Qbeast components ourselves.

With the control plane running in the Qbeast account and the data plane running in the customer account, we refer to this as a split-plane architecture. This post discusses the key architectural decisions and trade-offs we encountered in building it. We are still early in development and fully expect to adapt and refine our approach as we gain operational experience.

Building a control plane to support multiple clouds

Our customers run on all three major cloud providers, so we decided to support all of them: AWS, Azure, and GCP.

When choosing the tooling our SaaS would use to deploy into customer infrastructure, our first thought was a classic Terraform- and Helm-based solution, the setup we had relied on in our early deployments. At the time, the Crossplane project was maturing and gaining traction in the ecosystem, so we gave it a shot, first using it to manage part of our own infrastructure, then in real-world proofs of concept.

Crossplane offers a set of features that fit our needs remarkably well:

API-first design: Crossplane extends the Kubernetes Custom Resource Definition mechanism with managed resources that map to cloud resources (a VPC, an object storage bucket) or Kubernetes-native objects (a service account, a Helm release). Deploying a resource becomes creating an object in the Kubernetes cluster where Crossplane is installed.

Centralized resource state: Crossplane mirrors in Kubernetes the state of every resource it manages, embedding the status directly into the corresponding Kubernetes object. This makes it easy to understand the current state of a deployment at a glance.

Reconciliation loop: Crossplane continuously drives the target resources toward the desired state, retrying until they conform.

Native Kubernetes: Crossplane builds on the reliability of Kubernetes, so all the existing tooling for deploying and troubleshooting Kubernetes workloads applies.

Multi-cloud coverage: all three major cloud providers are supported, covering every kind...
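
To make the reconciliation loop and the mirrored resource state from the list above more concrete, here is a minimal toy sketch in Python. It is not Crossplane's API (Crossplane's controllers are written in Go and run inside Kubernetes); the names `ManagedResource`, `FakeCloud`, and `reconcile` are hypothetical and exist only for illustration. The sketch assumes a desired state (`spec`), an observed state that is written back into `status`, and a loop that retries until the two match.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedResource:
    """Hypothetical stand-in for a managed resource (not a Crossplane type)."""
    name: str
    spec: dict                                  # desired state declared by the user
    status: dict = field(default_factory=dict)  # last observed state, mirrored back


class FakeCloud:
    """Toy in-memory 'cloud' whose first apply call per resource fails."""

    def __init__(self):
        self.resources = {}
        self._failed_once = set()

    def observe(self, name):
        # Return the current external state, or None if nothing exists yet.
        return self.resources.get(name)

    def apply(self, name, spec):
        # Simulate one transient provider error per resource.
        if name not in self._failed_once:
            self._failed_once.add(name)
            raise RuntimeError("transient provider error")
        self.resources[name] = dict(spec)


def reconcile(resource, cloud, max_attempts=5):
    """Drive the external state toward resource.spec; True once it conforms."""
    for _ in range(max_attempts):
        observed = cloud.observe(resource.name)
        if observed == resource.spec:
            # Mirror the observed state into the resource's status.
            resource.status = {"ready": True, "observed": observed}
            return True
        try:
            cloud.apply(resource.name, resource.spec)
        except RuntimeError as err:
            # Record the failure and try again on the next iteration.
            resource.status = {"ready": False, "error": str(err)}
    return False


cloud = FakeCloud()
bucket = ManagedResource("demo-bucket", {"region": "eu-west-1", "versioning": True})
converged = reconcile(bucket, cloud)
print(converged, bucket.status)
```

In this sketch, calling `reconcile` again after the external state has drifted (for example, after someone changes `versioning` directly in `cloud.resources`) restores the declared `spec`, which is the property the reconciliation loop is meant to provide.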
