Control Planes Are Control Systems


The Control Plane Is a Control System — byatt.io

The modern control plane is a control system. One might think that goes without saying, but what’s critical here is that this claim has theoretical content. When I say “the control plane is a control system,” this is not a metaphor (at least, no more than any other sentence is a metaphor). I’m claiming that the language, theory, and results of a century of control systems engineering make formal assertions about the systems we run, and, more to the point, that we can use those assertions to first explore and then constrain the design space when we build control planes.

The vocabulary of control

First, let’s establish what I mean by a control system. The vocabulary is blessedly short and finite. A control system has the following components:

The plant: the system being governed.

The reference: the description of what we want the plant to look like. This is the end goal, the desired state.

One or more sensors: the instrumentation that tells us what reality currently looks like.

The error: the measured difference between the reference and the sensor readings.

The controller: the component that consumes the error and develops a plan for correction.

The actuator: the component that applies the controller’s corrections to the plant. This is the point at which governance concretely occurs.

This maps onto the control plane with surprisingly little effort. Every component exists, with clean correspondence:

Control theory      Distributed systems
plant               data plane
reference           desired state, manifests
sensor              status reporting, health checks, watches, telemetry
error               drift
controller          reconciler, controller, operator
actuator            mutating operations

This correspondence is structural, not incidental. A Kubernetes reconciler very literally computes an error term — desired state minus observed state — and selects an actuation intended to drive that error to zero. It is a negative feedback loop with exactly the same structure the block diagram above describes, and that block diagram is the object about which control theory makes, supports, and proves claims.
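To make the correspondence concrete, here is a minimal sketch of a reconcile loop in Python. It is not Kubernetes code: the `ReplicaPlant` class and the `observe`, `actuate`, and `reconcile` functions are hypothetical stand-ins, and the plant’s entire state is reduced to a single replica count. Each function is labeled with the control-theory role it plays.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPlant:
    """Hypothetical plant: a data-plane object whose only state is a replica count."""
    replicas: int


def observe(plant: ReplicaPlant) -> int:
    """Sensor: read the current (observed) state of the plant."""
    return plant.replicas


def actuate(plant: ReplicaPlant, delta: int) -> None:
    """Actuator: apply a mutating operation to the plant."""
    plant.replicas += delta


def reconcile(desired: int, plant: ReplicaPlant) -> int:
    """Controller: compute the error and choose a correction that drives it to zero.

    Returns the error observed before acting.
    """
    error = desired - observe(plant)  # reference minus sensor reading
    if error != 0:
        actuate(plant, error)  # apply the full correction in one step (gain of 1)
    return error


if __name__ == "__main__":
    plant = ReplicaPlant(replicas=2)
    desired = 5  # the reference, e.g. a replica count declared in a manifest
    for tick in range(3):
        err = reconcile(desired, plant)
        print(f"tick {tick}: error={err}, replicas={plant.replicas}")
```

In this toy, applying the whole error at once converges in a single tick, but only because the plant responds instantly and exactly. The failure modes below are what happens when those assumptions stop holding.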

One difference worth noting, because rigor demands it: our control planes are sampled, discrete-time, nonlinear systems, and the controller usually has at best a crude model of the plant: we don’t model the precise execution state and out-of-order CPU pipeline of every downstream data plane. The elegant closed-form results of classical control broadly assume linearity and continuity we don’t have. Some of the algebra and, more pointedly, the differential calculus may not always transfer. What does transfer, however, is the taxonomy: the enumeration of the ways feedback loops behave and the ways they fail, and the library of mechanisms whose stabilizing properties are formally understood. For software architects, those mechanisms and their properties are what we need to understand to build better systems and deliver them to our organizations.

Modeling Problems

Any engineer or architect who has designed, built, or operated a control plane has encountered unstable control. This is the control failure where the system never reaches steady state. In the language of our control systems vocabulary, the plant never agrees with the reference; actual state never achieves desired state. Predicting instabilities can, at first, appear intractable, as though our only path is to aggressively exercise and monitor the control plane, wait for pathologies, and engineer them out.
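A toy example makes “never reaches steady state” concrete. Assume, purely for illustration, a linear plant that responds instantly and a proportional controller that applies a fixed fraction `gain` of the error each tick. The error then evolves as e(k+1) = (1 - gain) * e(k), so the loop converges only when 0 < gain < 2: at exactly 2 it oscillates forever, and above 2 it oscillates with growing amplitude. Real control planes are not linear, so treat the `simulate` function (a hypothetical helper) as an illustration of the failure shape rather than a design rule.

```python
def simulate(gain: float, reference: float = 10.0, start: float = 0.0,
             ticks: int = 8) -> list[float]:
    """Toy discrete-time proportional loop on an idealized linear plant.

    Each tick the controller measures the error and the actuator moves the
    plant by gain * error. Returns the error observed at each tick.
    """
    state = start
    errors = []
    for _ in range(ticks):
        error = reference - state  # sensor reading compared to the reference
        errors.append(error)
        state += gain * error  # proportional correction applied to the plant
    return errors


if __name__ == "__main__":
    for gain in (0.5, 1.5, 2.0, 2.5):
        print(gain, [round(e, 2) for e in simulate(gain)])
```

Gain 0.5 decays smoothly (10, 5, 2.5, ...), gain 1.5 overshoots on alternate ticks but still settles, gain 2.0 flips between +10 and -10 indefinitely, and gain 2.5 diverges. None of these runs contains a bug; the stability is a property of the loop, not of any single component.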

However, if we appropriately model the control plane as a control system and apply control-theoretic framing, we can get a robust enumeration of this problem space. This lets us rely on prior art to get ahead of failures: instead of designing against the happy path plus whatever failure modes we happen to imagine (a famously weak form of engineering imagination), we can take the known failure modes of feedback systems, ask what each one means in a control plane, and design against that.

This body of theory gives us four primary destabilization mechanisms: four reasonably well-understood ways a control loop stops converging on its reference and starts generating incorrect, oscillating, or diverging corrections.

Lag. There is always non-zero latency between the control plane’s observation of the data plane and the application of a change. Watch propagation, cache staleness, work queues, and the actuation itself all take real time. In control terms this is dead time in the loop, and dead time is poison: the controller is always correcting an error that existed some time ago. The canonical incident shape: an autoscaler observes high load and adds capacity; the capacity takes ninety seconds to become visible in the metrics; the autoscaler observes the same high load at the next tick and adds capacity again. The controller has amplified its own in-flight correction. Nothing malfunctioned....
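The autoscaler incident can be reproduced in a few lines. This is a hypothetical toy, not any real autoscaler: `simulate_autoscaler` assumes a fixed capacity requirement, a scale-up-only policy that requests the whole observed shortfall each tick, and a tick of roughly 30 seconds, so `delay_ticks=3` stands in for the ninety-second lag. The only thing varied between the two runs is the dead time.

```python
from collections import deque


def simulate_autoscaler(needed: int, start: int, delay_ticks: int,
                        ticks: int) -> list[int]:
    """Toy scale-up-only autoscaler with dead time in the loop.

    The controller sees only *visible* capacity. Capacity it requests becomes
    visible delay_ticks ticks later (delay_ticks must be >= 1).
    Returns total provisioned capacity (visible + in flight) after each tick.
    """
    if delay_ticks < 1:
        raise ValueError("delay_ticks must be >= 1")
    visible = start
    in_flight = deque([0] * delay_ticks)  # requested capacity, oldest first
    history = []
    for _ in range(ticks):
        visible += in_flight.popleft()  # capacity requested delay_ticks ago lands
        error = needed - visible  # the sensor sees only visible capacity
        in_flight.append(max(error, 0))  # actuate: request the whole shortfall
        history.append(visible + sum(in_flight))
    return history


if __name__ == "__main__":
    # Assumed tick of ~30 s, so delay_ticks=3 approximates a ninety-second lag.
    print("delay 1:", simulate_autoscaler(needed=10, start=4, delay_ticks=1, ticks=6))
    print("delay 3:", simulate_autoscaler(needed=10, start=4, delay_ticks=3, ticks=6))
```

With one tick of delay, each correction lands before the next observation and provisioned capacity settles at 10. With three ticks, the controller requests the same six replicas three times before any of them become visible, and it settles at 22, more than double what the load needs. No line of the loop is wrong; the overshoot comes entirely from the dead time.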
