What Python async exposes that synchronous code hides
Asynchronous IO does more than overlap work. It changes the shape of time<br>inside a system, making behaviours visible that synchronous code hides. Hidden<br>waits become explicit. Accidental coupling shows up immediately. Backpressure<br>appears as measurable queueing rather than blocked threads.
One value of asynchrony is that it acts as a diagnostic tool. The moment two steps<br>depend on each other, the dependency becomes visible because the awaiting<br>coroutine cannot progress until the awaited operation completes. What looked OK in<br>synchronous code becomes an operational constraint you can now see, measure,<br>and reason about.
The example below shows how Python async reshapes execution flow. A long‑lived<br>call to the server yields instead of blocking, allowing the client to process<br>the next image while the previous upload is still in flight. The result is not<br>just a speedup of 6.7 times, it is a clearer picture of where your system is<br>coupled, where it is IO‑bound, and where backpressure begins to form.
Backpressure is the situation where a service becomes overloaded, slowing down<br>those that call in. In synchronous code, this slowdown is communicated<br>downstream to code that relies on the now slower process. Threads in the<br>calling process block while they wait, and no useful work gets done. The process seizes up. Synchrony<br>has coupled the performance of the two processes together.
Backpressure is the system signalling "I am saturated", and it shows up as<br>silence: growing waits, stalled pipelines, or accumulating tasks, rather than<br>explicit errors. The system slows down before it falls over.
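As a minimal sketch of backpressure showing up as measurable waiting, the snippet below puts a bounded `asyncio.Queue` between a fast producer and a slow consumer. The function names, queue size, and timings are illustrative assumptions, not part of the article's example. Once the queue is full, each `await queue.put(...)` suspends for roughly one consumer service time, and that wait can be recorded directly.

```python
import asyncio
import time


async def producer(queue: asyncio.Queue, n_items: int, put_waits: list, depths: list) -> None:
    """Enqueue items as fast as the queue allows, recording each wait and depth."""
    for item in range(n_items):
        start = time.perf_counter()
        await queue.put(item)  # suspends here whenever the queue is full
        put_waits.append(time.perf_counter() - start)
        depths.append(queue.qsize())
    await queue.put(None)  # sentinel: tells the consumer to stop


async def consumer(queue: asyncio.Queue, service_time: float) -> None:
    """Drain the queue slowly; the sleep stands in for slow downstream work."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(service_time)


async def run_pipeline(n_items: int = 8, maxsize: int = 2, service_time: float = 0.05):
    """Run producer and consumer together and return the recorded signals."""
    queue = asyncio.Queue(maxsize=maxsize)
    put_waits, depths = [], []
    await asyncio.gather(
        producer(queue, n_items, put_waits, depths),
        consumer(queue, service_time),
    )
    return put_waits, depths


if __name__ == "__main__":
    waits, depths = asyncio.run(run_pipeline())
    for i, (wait, depth) in enumerate(zip(waits, depths)):
        print(f"put {i}: waited {wait * 1000:.1f} ms, queue depth {depth}")
```

The first puts return almost immediately. Later puts wait because the queue is full, which is the producer being slowed to the consumer's pace without any error being raised.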
To get a handle on such conditions in the asynchronous examples, OpenTelemetry spans<br>are included to show how observability aligns naturally with async boundaries.<br>Because each Python await is a structured yield point, tracing maps<br>directly onto the concurrency model.
Asynchrony: the fundamental idea
The fundamental idea is to put long-lived operations into a separate execution<br>context so that other work can progress while waiting for the long-lived<br>operation to complete.
In this way we overlap the completion of tasks. The long-lived execution is<br>started, but the call to it returns immediately. The other work can then<br>start.
It is important that the other work has no dependence on the long-lived work.<br>If the second body of work depended on the first, the first<br>would have to complete before the second could be started. In this case, the<br>two bodies of work would have to operate serially, not concurrently.
For concurrency to work, the work items must be independent of one another.
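The difference between independent and dependent work can be sketched with `asyncio`. In this illustration, `long_lived_call` and `other_work` are hypothetical stand-ins that use `asyncio.sleep` in place of real IO. In `overlapped`, the long-lived call is started as a task and the other work proceeds while it is in flight. In `serial`, the second step needs the first step's result, so the two waits add up.

```python
import asyncio
import time


async def long_lived_call(label: str, delay: float) -> str:
    """Stand-in for a slow IO operation such as a remote request."""
    await asyncio.sleep(delay)
    return f"{label} done"


async def other_work(delay: float) -> str:
    """Work that does not need the long-lived call's result."""
    await asyncio.sleep(delay)
    return "other work done"


async def overlapped(delay: float = 0.1):
    """Independent work: start the long call, then do other work meanwhile."""
    start = time.perf_counter()
    # create_task schedules the coroutine and returns immediately.
    pending = asyncio.create_task(long_lived_call("upload", delay))
    other = await other_work(delay)  # runs while the upload is in flight
    result = await pending           # collect the result once it is needed
    return result, other, time.perf_counter() - start


async def serial(delay: float = 0.1):
    """Dependent work: the second step consumes the first step's result."""
    start = time.perf_counter()
    first = await long_lived_call("upload", delay)
    second = await long_lived_call(f"follow-up to '{first}'", delay)
    return second, time.perf_counter() - start


if __name__ == "__main__":
    _, _, t_overlap = asyncio.run(overlapped())
    _, t_serial = asyncio.run(serial())
    print(f"overlapped: {t_overlap:.2f}s, serial: {t_serial:.2f}s")
```

With equal delays, the overlapped version takes roughly one delay and the serial version roughly two, because the dependency forces the second step to wait.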
Our example
Our example is a long-lived RESTful call that receives client JPG data. During<br>the long server wait, the client performs useful work preparing the next<br>image file for upload.
In addition, observability support is shown using OpenTelemetry: a tracer<br>creates spans, and each span is a timed, structured record of<br>one operation. Observability across the client and server sides of the JPG<br>processing is built by linking many spans together.
Input/Output
Input/Output (IO) is a common operation that takes time. Your code might have<br>to wait for:
a remote service, such as a payment provider or cloud storage, to return
a database query
a large file to be read into memory
machine learning inference
Kafka partitions to respond
A Synchronous Approach
When programming synchronously, the thread that is executing must wait (block)<br>for the entire IO operation. In a single threaded program, nothing else is<br>getting done while you wait.
In a threaded program using a thread pool, long IO wait times make it more<br>likely that all threads block.
For example, in a pool of 32 threads, if the average blocking time is 2s and 32<br>requests are received within 2s, all 32 threads will be blocked. A 33rd request<br>will have to wait on a thread becoming available. Queued requests inherit the<br>delay.
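The arithmetic above can be reproduced at a smaller scale. The sketch below uses a pool of 4 threads, a 0.1s blocking time, and 5 simultaneous requests instead of 32 threads, 2s, and 33 requests, purely to keep the run short. `handle_request` and `latencies` are hypothetical helper names, and `time.sleep` stands in for a blocking IO call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(submitted_at: float, blocking_time: float) -> float:
    """Simulate a handler that blocks on IO; return its end-to-end latency."""
    time.sleep(blocking_time)  # the thread is held for the whole wait
    return time.perf_counter() - submitted_at


def latencies(pool_size: int, n_requests: int, blocking_time: float) -> list:
    """Submit n_requests at once to a fixed-size pool and collect each latency."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [
            pool.submit(handle_request, time.perf_counter(), blocking_time)
            for _ in range(n_requests)
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    for i, latency in enumerate(latencies(pool_size=4, n_requests=5, blocking_time=0.1), 1):
        print(f"request {i}: {latency:.2f}s")
```

The first four requests each take about one blocking time. The fifth takes about two, because it waits in the queue for a thread and then blocks in its own right. This is the queued request inheriting the delay.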
If you have a downstream system waiting on the 32 requests completing, you now<br>have backpressure building and being communicated around your distributed<br>system.
The slow remote service is slowing down your downstream service, which is likely<br>to slow any other downstream services. One of those could be a user interface, giving<br>your end user a degraded experience.
In a synchronous system you want to give yourself a heads-up when production<br>moves towards backpressure. Observability of wait times, configuration (the use<br>of 32 threads as opposed to another number), and resource usage (all 32<br>threads being occupied) are important signals to track.
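One way to capture two of these signals, queue wait time and thread occupancy, is to wrap each unit of work with a little bookkeeping. The sketch below is an assumption-laden illustration: `PoolStats`, `instrumented`, and `run_load` are hypothetical names, and in production the numbers would more likely be exported to a metrics or tracing system than held in memory.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PoolStats:
    """Thread-safe counters for pool occupancy and queue wait times."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.busy = 0
        self.peak_busy = 0
        self.queue_waits = []

    def started(self, queue_wait: float) -> None:
        """Record that a worker picked up a request after queue_wait seconds."""
        with self._lock:
            self.busy += 1
            self.peak_busy = max(self.peak_busy, self.busy)
            self.queue_waits.append(queue_wait)

    def finished(self) -> None:
        """Record that a worker has become free again."""
        with self._lock:
            self.busy -= 1


def instrumented(stats: PoolStats, submitted_at: float, blocking_time: float) -> None:
    """Run one simulated blocking request while updating the stats."""
    stats.started(time.perf_counter() - submitted_at)
    try:
        time.sleep(blocking_time)  # stand-in for blocking IO
    finally:
        stats.finished()


def run_load(pool_size: int, n_requests: int, blocking_time: float) -> PoolStats:
    """Submit a burst of requests and return the collected stats."""
    stats = PoolStats()
    # Leaving the with-block waits for all submitted work to finish.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(n_requests):
            pool.submit(instrumented, stats, time.perf_counter(), blocking_time)
    return stats


if __name__ == "__main__":
    stats = run_load(pool_size=2, n_requests=6, blocking_time=0.05)
    print(f"peak busy threads: {stats.peak_busy}")
    print(f"longest queue wait: {max(stats.queue_waits):.2f}s")
```

When peak occupancy equals the configured pool size and queue waits start to grow, the pool is saturated. That is the early warning this paragraph describes.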
Using asynchrony
Python's asynchrony provides:
The event loop — a scheduler within Python
Coroutines — functions that can suspend and resume
Tasks — scheduled coroutines managed by the event loop
Await points — explicit yield boundaries where a coroutine...