Building a Soviet Nail Factory: how KPIs killed efficiency

In 2008, I landed my second job, in the network team at Orange Portails, the division behind the websites and search engine of the French telecom operator Orange. The place ran like clockwork: a comprehensive technical setup, a dedicated team for every part of the business, and room to focus on what I do best. A few years later, none of that mattered: thanks to an obsession with the numbers, we could no longer deliver new services on time.

Disclaimer

This is a story I like to tell to warn people about Goodhart’s law.1 As these events happened almost 15 years ago, my recollection is a bit fuzzy. I left in 2012.

Goodhart’s law often gets the credit, but Campbell’s law describes my experience even better: the more you lean on a number to make decisions, the faster people corrupt it. ❦

The first years#

During my first years, the department operated like a startup. Its cradle was the French company Echo. They built a search engine. France Télécom bought it and renamed it Voila. It was the most visited search engine in France in the early 2000s. France Télécom consolidated the portal activities into the Wanadoo Portails division, later renamed Orange Portails.

The technical environment was excellent. We had many internal tools:2 a ticket system, an RRD-based graphing tool, an IPAM, a reporting tool, and an SNMP-based alerting tool.3 We deployed our Linux servers with CFEngine. We installed systems and applications from internal Debian repositories. We documented everything in a private MediaWiki instance. Supervision was performed with an ancestor of Xymon. The network architecture was clean and scalable with little legacy. We onboarded new people in a day.

At the time, SaaS was not really a thing. I remember we considered, with a couple of colleagues, selling Wiremaps as a SaaS, with homomorphic encryption for the database. But who would outsource their observability stack? ❦

Snalert was a metacircular alerting tool in Perl. It was able to poll a very large number of SNMP targets in a short timespan. All our monitoring was SNMP-based, including system monitoring. ❦

It was a nurturing environment for me. I developed several tools: lldpd, an 802.1AB implementation, Snimpy, a pythonic binding for Net-SNMP, Wiremaps, a layer-2 discovery tool with a time machine to know which device is connected where, Kitérő, a tool to simulate network conditions, QCSS-3, a controller for load-balancers, and ipoo, a service available through a Jabber chatbot and a Greasemonkey script to expose IP-related information. I added SNMP support for Keepalived and Quagga. I also started this blog, with articles like “Anycast DNS,” TLS-related articles like “TLS computational DoS mitigation,” SNMP-related articles like “Integration of Net-SNMP into an event loop,” Linux-related articles like “Tuning Linux IPv4 route cache,” and an article about VXLAN long before it was cool.

The collapse#

When we needed new servers, the on-site team would take a set from the inventory, install our base Linux distribution on them, put them in the datacenter, and cable them to the top-of-the-rack switches. We opened a ticket describing the servers we needed, and one week later, our servers were available. 💫

Orange wanted to know if this team was performing well, so they asked for KPIs. They decided to use the number of tickets completed in a year. They asked to double this number. So instead of one ticket for a new service, we would open six tickets—one per server. By the end of the year, the KPIs had more than doubled.

Everybody saw it as a success for performance management. So, they asked to do the same for the next year. Now, we needed to open a ticket per server and per step. Again, the KPIs doubled. Behind the scenes, the tickets went to different people and were no longer handled in order. So, for the next year, it was decided to have meta-tickets and meetings to follow the progress of these tickets. Of course, all these extra steps pushed the KPI even higher.

This performance management method spread to the other teams.4 Everything became slower. Instead of a couple of weeks, a new service now took six months. We built a Soviet nail factory. But the KPIs were good, and we stopped caring.

My team also managed the rules of many Linux-based firewalls. To increase our KPIs, we used the same method: rather than accepting one ticket with a flow matrix, we requested one ticket per flow. ❦

Let me give you another example. We had to estimate the impact of each night operation. We weren’t half bad: we declared most operations “without any expected impact.” Most of the time, there was no impact. One time out of five, there was a 5-second impact. We were told to try harder to meet our expected impact. What did we do? We started declaring a 5-second expected impact. One day, we got a 30-second impact and were told we failed to match the expected impact. In the end, we declared most...

Building a Soviet Nail Factory: how KPIs killed efficiency

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

Apple WWDC 2026 Livestream

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

It's Not Just X. It's Y