What Happens to Platform Teams? | David Curlewis
This weekend the weather was a bit crap, so no golf. As a result I caught up on some reading. I read Martin Fowler’s June Fragments, which had a number of good posts, but one in particular got the thinking-juices flowing: Jamie Hurst’s blog post about AI-assisted engineering from his perspective as a Principal Engineer.
I enjoyed his take, and it mirrors what I’m seeing in an engineering leadership role and within my teams. Since 2+ decades of my career (including my current role) have been in or close to platform engineering, it got me thinking about what impacts I’m seeing day-to-day due to the speed at which teams can now prototype-if-not-build their own software.
What is a platform team?
Just in case there’s anyone reading this who isn’t sure what a “platform team” is (and to ensure we’re all on the same page even if you are) - I define a platform team as any engineering team building for internal customers (i.e. users within the company). This is very broad, and open to nit-picking, I know. But let’s keep it simple for now, cool?
The reason for a platform function is to centralise certain capabilities that are generalisable, where it wouldn’t make sense for every team to build out those capabilities themselves. Often the capabilities we build are commodity abstractions - commodities being those services which these days are common or table-stakes (e.g. storage, compute, networking, messaging systems, build & deploy, etc). We used to live by the 80:20 rule, whereby we would build for and support the majority of users and use-cases, while a smaller minority could choose to walk a different path, but at the risk of being unsupported to some extent.
There’s a second reason too. Even if building the thing were free, most product engineers don’t want to carry the cognitive load of running it; the version upgrades, the after-hours support, the future roadmap juggling… centralising it means someone else carries that load.
Story time with Uncle Dave
I like analogies (and pictures). One I often go back to is that of the platform team building nicely paved highways from here to your chosen destination. We make it easy for lots of people to get there with minimal fuss.
You can still walk there yourself though, and we might even give you a machete so you can beat your own path through the bush. But it’ll be slower and harder to maintain.
If on the other hand we see a significant portion of our users all deciding to take the bush track (maybe because our nice highway curves off in the wrong direction or takes too long to get there) then we can step in, chat to the original bush path creator, and offer to send our graders and asphalt machines in to widen and pave the path, with us taking on future maintenance and upkeep too.
AI changes this analogy now though. Instead of machetes, our users have AI-bulldozers, so if they choose not to use our highway, they can drive their bulldozer through the bush to their destination, leaving behind a still-pretty-rural dirt road, but better than the old bush track.
One shift we’re making (more on this below) is to offer lower-level APIs. Think of it as us, rather than still handing out machetes in this new AI-bulldozer world, getting into the business of building road-laying machines for our users to use instead of their somewhat agricultural bulldozers. Working a little closer with the user at the early stages means we end up with paved rather than dirt roads. Easier to maintain, built to a known baseline level of “good” thanks to us providing the machinery and building materials, and not as big a lift-and-shift if we do end up taking over ownership of the road in future.
The build-it-once argument is losing ground
I say “We used to live by the 80:20 rule” because I think it’s changing in the AI era. The cost of developing software systems is falling, so the argument for having a team to centralise any common software required is losing ground. After all, why should a team wait for us to add some functionality to a common inference runtime library when, thanks to the power of modern models and parallel agent execution, they can probably build their own library from scratch in a day?! It’s a fair question, and while any experienced engineer will tell you that the real cost isn’t up-front development but the total cost of ownership over years, the attraction of “just building it yourself” is hard for some to resist. Notice though that cheap building only erodes the first reason for a platform team. The second one, sparing people the cognitive load, gets stronger the more that gets built: there is more surface area and there are more moving parts to keep in your head.
This also isn’t me saying they should resist it. We are being told more and more these days...