How Long Does That Response Take for Real?

How Long Does That Response Take... For Real? - Dormando (May 8, 2026)

Introduction

Why does memcached not have response time metrics? This is a frustrating question with an unsatisfying answer: the metrics would be misleading.

Kicking off with a spoiler: memcached response time is best measured by sampling response times from the client. This takes the entire round trip into consideration and gives the most actionable information, most of the time. The rest of this post is an exploration of why.
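As a rough illustration of what sampling from the client could look like, here is a minimal Python sketch. The `LatencySampler` class, its `sample_rate` parameter, and the `fake_get` stand-in are all hypothetical names invented for this example; they are not part of memcached or of any client library. In a real application `fake_get` would be replaced by whatever call the client library exposes.

```python
import math
import random
import time


class LatencySampler:
    """Records round-trip times for a sampled fraction of calls (hypothetical helper)."""

    def __init__(self, sample_rate=0.01, rng=None):
        self.sample_rate = sample_rate
        self.samples_ms = []
        self._rng = rng or random.Random()

    def call(self, fn, *args, **kwargs):
        # Time only a fraction of calls so the measurement stays cheap.
        if self._rng.random() >= self.sample_rate:
            return fn(*args, **kwargs)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Full round trip as the caller sees it, in milliseconds.
            self.samples_ms.append((time.perf_counter() - start) * 1000.0)

    def percentile(self, p):
        """Nearest-rank percentile of the recorded samples, or None if empty."""
        if not self.samples_ms:
            return None
        ordered = sorted(self.samples_ms)
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]


def fake_get(key):
    # Stand-in for a real client call; assumed for the sake of the example.
    return "value-for-" + key


sampler = LatencySampler(sample_rate=1.0)  # sample every call in this demo
value = sampler.call(fake_get, "user:42")
```

Because the timer wraps the whole call, the sample includes time spent waiting in network buffers, on the wire, and in the client's own parsing, which is the part a server-side timer cannot see.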

To Measure Time, You Have to Start Somewhere

What is the goal of measuring response time? We want it to inform us of the health of the system and its upstream impact. We want to drop the times in a graph and throw an alert if they get out of whack, or correlate them with other data if a service is impacted. This seems like a silly question, but it is important to ensure a metric actually answers what we think it does.

In most common services, measuring response time works the same way: a request arrives at the service, which notes the time when it begins processing. When the service is ready to ship the response back to the client, it checks the time again and compares it with the start. Easy, right?
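The start/stop pattern described above can be sketched in a few lines of Python. This is only an illustration of the general approach; `timed_handler` and `record` are made-up names, and a real service would feed the duration into its metrics system instead of a list.

```python
import time


def timed_handler(handler, record):
    """Wrap a request handler so each call reports its processing time."""
    def wrapper(request):
        start = time.perf_counter()  # request arrives: note the time
        try:
            return handler(request)
        finally:
            # Response is ready: compare against the start.
            record(time.perf_counter() - start)
    return wrapper


durations = []
handle = timed_handler(lambda request: request.upper(), durations.append)
response = handle("ping")
```

Note that the clock starts only once the service has the request in hand. Any time the request spent waiting before that point is invisible to this measurement.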

Finding the Start for a Web API Call

A typical application is processing a request for much longer than a millisecond. It might compete for resources, make sub-requests to other services, read data from disk, etc. An application has many dependencies that influence how long it takes to generate a response.

A service reads a request from the network, notes the time it was received, then enqueues it or ships it off to other threads to process. In a Go app, for example, many lightweight threads are starting and stopping as a request moves through the program.

Finding the Start for a Memcached Call

That start time is key here. Memcached is unlike most infrastructure software: requests are typically processed in less than a millisecond. The request load does not usually change this number! This seems absurd, because we can observe response times much higher than this from a client when under load.

When we measure time is critical. Memcached processes requests as soon as they are read off of a network socket. Responses are very quickly generated. The first chance it gets to measure time is close to the end.

What Influences Total Time?

A large image will take longer to process than a smaller one. If loading a product category for a store, more items will take longer to process. It may make many database calls (or calls to memcached!) to decorate products with size, price, and inventory level.

What happens if a server gets overloaded? Does it keep reading requests from the network, creating a queue internally, then process them as it can? Does it refuse further requests and let a load balancer redirect to another server?

The thread model for memcached is one worker thread per CPU core. When requests are sent to memcached, one thread gets notified that sockets are ready to read. It then iterates through the "ready" sockets one at a time, reading data from the network. Worker threads operate independently from each other, only sharing cache data.

If a worker thread has a lot of sockets to read from at the same time, the last socket in the list will have the worst response time. Sadly we cannot measure the time a request waits in a queue, only the time spent processing requests. A GET request will take the same amount of time regardless of how busy the server is.
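A toy model makes the gap between the two views concrete. The sketch below is a simplification and not memcached's actual code: it assumes every request became readable at the same instant, that a single worker serves one request per socket in list order, and that each request costs the same fixed processing time. The function name and the numbers are illustrative only.

```python
def observed_latencies_us(ready_sockets, service_time_us):
    """Return (internal, observed) latency per socket, in microseconds.

    Assumes all requests became readable at t=0 and one worker thread
    serves the ready sockets in order, one request per socket.
    """
    results = []
    clock = 0
    for _ in range(ready_sockets):
        start = clock             # server-side stopwatch can only start at read
        clock += service_time_us  # the request is processed
        internal = clock - start  # what a server-side metric would report
        observed = clock          # time since arrival, queue wait included
        results.append((internal, observed))
    return results


latencies = observed_latencies_us(ready_sockets=4, service_time_us=10)
# latencies == [(10, 10), (10, 20), (10, 30), (10, 40)]
```

The internal number stays flat at 10 microseconds for every socket while the observed number grows with position in the list, so a server-side metric would report a healthy service even as the last client in line waits four times longer.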

What happens when memcached is overloaded? Requests will sit in OS network buffers waiting to be read, without any way of kicking off the stopwatch. Now we fail to answer our original goal: internal response times do not tell us much. Best case, we are wasting CPU tracking the measurement.

How Measuring Time Can Mislead

I did lie a bit: internal response time can vary. The problem with response time is it does not tell us where to look.

Misconfigured? More worker threads than CPU cores? Too many other programs running?

If a client sends 100 requests at the same time, each individual request will take a tenth of a millisecond, but the client may only see responses after all 100 are processed.

Huge responses take the same amount of time for memcached to process as small ones. A client will take a lot longer to read and parse a megabyte than a kilobyte.

SET requests scale poorly in memcached. A very high SET load can cause requests to take measurably longer… but only SET commands! GETs are still stuck in a network queue.

We use SSD storage when extstore is enabled, and the disk can slow down. This is a legitimate source of latency, and we should specifically measure time spent waiting on disk. This gives us one number that says "The disk is slow", rather than something vague.
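One way to get a narrowly scoped number like this is to time only the stage in question instead of the whole request. The Python sketch below shows that idea in general terms. It is not how memcached or extstore implements its counters; `StageTimers` and the `"disk_wait"` label are hypothetical names chosen for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimers:
    """Accumulates time spent in named stages (hypothetical helper)."""

    def __init__(self):
        self.total_s = defaultdict(float)
        self.count = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Only the wrapped block is charged to this stage.
            self.total_s[name] += time.perf_counter() - start
            self.count[name] += 1


timers = StageTimers()
with timers.stage("disk_wait"):
    pass  # a real program would block on the disk read here
```

A counter scoped this tightly points at one component. If the "disk_wait" total climbs, the disk is the place to look, which is exactly what a general response time number cannot say.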

Measure from the Client

We recommend looking at the total...
