How Long Does That Response Take for Real?

memcached - a distributed memory object caching system

Home

About

Downloads

Blog

Mailing List

Docs

Bugs

Sponsor Us!

How Long Does That Response Take... For Real? - Dormando (May 8, 2026)

Introduction

Why does memcached not have response time metrics? This is a frustrating question with an unsatisfying answer: the metrics would be misleading.

Kicking off with a spoiler: memcached response time is best measured by sampling response times from the client. This takes the entire round trip into consideration and gives the most actionable information, most of the time. The rest of this post is an exploration of why.

To Measure Time, You Have to Start Somewhere

What is the goal of measuring response time? We want it to inform us of the health of the system and its upstream impact. We want to drop the times in a graph and throw an alert if it gets out of whack, or correlate it with other data if a service is impacted. This seems like a silly question but it is important to ensure a metric actually answers what we think it does.

In most common services measuring response time works the same way: A request arrives at the service, and it notes the time when it begins to process it. When it is ready to ship the response back to the client, it checks the time again and compares it with the start. Easy, right?

Finding the Start for a Web API Call

A typical application is processing a request for much longer than a millisecond. It might compete for resources, make sub-requests to other services, read data from disk, etc. An application has many dependencies that influence how long it takes to generate a response.

A service reads a request from the network, notes the time it was received, then enqueues it or ships it off to other threads to process. In a Go app, for example, many lightweight threads are starting and stopping as a request moves through the program.

Finding the Start for a Memcached Call

That start time is key here. Memcached is unlike most infrastructure software: requests are typically processed in less than a millisecond. The request load does not usually change this number! This seems absurd, because we can observe response times much higher than this from a client when under load.

When we measure time is critical. Memcached processes requests as soon as they are read off of a network socket. Responses are very quickly generated. The first chance it gets to measure time is close to the end.

What Influences Total Time?

A large image will take longer to process than a smaller one. If loading a product category for a store, more items will take longer to process. It may make many database calls (or calls to memcached!) to decorate products with size, price, and inventory level.

What happens if a server gets overloaded? Does it keep reading requests from the network, creating a queue internally, then process them as it can? Does it refuse further requests and let a load balancer redirect to another server?

The thread model for memcached is one worker thread per CPU core. When requests are sent to memcached one thread gets notified that sockets are ready to read. It then iterates through the “ready” sockets one at a time, reading data from the network. Worker threads operate independently from each other, only sharing cache data.

If a worker thread has a lot of sockets to read from at the same time, the last socket in the list will have the worst response time. Sadly we cannot measure the time a request waits in a queue, only the time spent processing requests. A GET request will take the same amount of time regardless of how busy the server is.

What happens when memcached is overloaded? Requests will sit in OS network buffers waiting to be read, without any way of kicking off the stop watch. Now we fail to answer to our original goal: internal response times do not tell us much. Best case we are wasting CPU tracking the measurement.

How Measuring Time Can Mislead

I did lie a bit: internal response time can vary. The problem with response time is it does not tell us where to look.

Misconfigured? More worker threads than CPU cores? Too many other programs running?

If a client sends 100 requests at the same time, each individual request will take a tenth of a millisecond, but the client may only see responses after all 100 are processed.

Huge responses take the same amount of time for memcached to process as small ones. A client will take a lot longer to read and parse a megabyte than a kilobyte.

SET requests scale poorly in memcached. A very high SET load can cause requests to take measurably longer… but only SET commands! GETs are still stuck in a network queue.

We use SSD storage when extstore is enabled, which can slow down. This is legitimate and we should measure specifically time waiting on disk. This gives us one number that says “The disk is slow”, rather than something vague.

Measure from the Client

We recommend looking at the total...

How Long Does That Response Take for Real?

Related Articles

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI