The Not So Simple Task of Identifying Retail Flow | by Sam Markelon | Proof Reading | May, 2026 | Medium
Proof Reading
Proof is a new institutional equities broker.
The Not So Simple Task of Identifying Retail Flow
Sam Markelon
SEC data indicates that about 60% of US households hold stocks, up steadily from 50% a decade ago. Further, individual (or retail) investors account for approximately a third of daily trading volume [1]. The ability to identify retail flow in real time is of interest to Proof for a number of reasons. Most obviously, the overwhelming majority of retail flow is internalized by wholesalers (more on this later). This flow is inaccessible to us, so being able to identify it in real time, along with general persistent patterns in retail flow, would let us adjust how we handle certain orders and tune the parameters of certain algos. Moreover, in extreme situations, when there is outsized retail volume in a particular symbol, anomalous market events can occur that we should account for. Examples include increased price volatility, opportunities for accessible retail liquidity as wholesalers pass trades through to exchanges, and potentially available liquidity from the wholesalers themselves as they hedge the positions induced by heavy retail activity in a single direction. Such an event occurred during the GME saga in early 2021 [2].

So, given that identifying retail flow is a worthwhile endeavor, how do we go about doing it? A good first step is to have access to all trades that occur in the US equities market in (roughly) real time. Luckily, the US Securities Information Processor (SIP) provides a consolidated feed of all trades from every trading venue, as well as the prevailing NBBO [3,4]. At Proof, we ingest the SIP data feed in real time from Exegy [5].

To actually identify retail flow on the SIP, we need to know where to look. More than 90% of marketable retail orders are routed directly to off-exchange retail wholesalers and filled there, i.e., internalized [6]. Citadel and Virtu alone account for at least 70% of this wholesaler activity [7].
Retail brokers like Robinhood, E*Trade, and Fidelity first route client orders directly to these wholesalers, where the large majority of the time the wholesaler acts as a market maker and fills the retail client's order directly. The wholesalers pay the retail brokers in exchange for the flow. This system is called “Payment for Order Flow” (PFOF). While there is much debate about this practice (see [8]), the friendly interpretation of PFOF is that the retail client and the wholesaler both benefit. The retail client benefits from commission-free trading and often from price improvement on their fill relative to the prevailing NBBO (the retail broker passes some of the fee it receives from the wholesaler on to the client). Wholesalers, of course, take the spread as profit while getting a steady stream of flow that is thought to be less risky than institutional flow.

So great, we can simply look at our SIP data feed, pick out all trades marked as executing on these wholesalers’ platforms, and identify almost all retail flow! Unfortunately, it is not that simple. These wholesaler platforms operate under a different regulatory model than exchanges like NYSE and NASDAQ. They must still report trade information to the SIP within 10 seconds of execution (they usually report much faster than this), but they do so through FINRA’s trade reporting facility (TRF) [9]. The TRF then passes this information along to the SIP. However, the identifier of where the trade occurred is not present, as it would be had the trade occurred on a lit exchange. Rather, all wholesalers’ trades are simply identified with the same nondescript TRF identifier. This would perhaps not be so bad on its own if only retail wholesalers reported in this way, but that is not the case. Dark pools also report trades in the same manner and end up with the same identifier on the SIP feed.
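
To make the identifier problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not our production code: the `Trade` record and the `off_exchange` helper are hypothetical, the `true_source` field is an invented ground-truth label that does not exist on the SIP, and the use of "D" as the market-center code for TRF-reported prints is an assumption that should be checked against the feed specification in use.

```python
from dataclasses import dataclass

# Assumption: "D" marks prints reported through the FINRA TRF on the
# consolidated feed. Verify the actual code in your vendor's feed spec.
TRF_CODE = "D"


@dataclass(frozen=True)
class Trade:
    symbol: str
    price: float
    size: int
    venue: str        # market-center code carried on the consolidated feed
    true_source: str  # hypothetical ground truth, NOT available on the SIP


def off_exchange(trades):
    """Return the trades reported through the TRF, i.e. all off-exchange prints."""
    return [t for t in trades if t.venue == TRF_CODE]


# Toy data: one lit-exchange print and two off-exchange prints.
trades = [
    Trade("GME", 20.01, 100, "N", "exchange"),
    Trade("GME", 20.0049, 50, "D", "wholesaler"),
    Trade("GME", 20.005, 5000, "D", "dark_pool"),
]

trf = off_exchange(trades)

# What the feed shows: a single venue code for every off-exchange print.
venues_seen = {t.venue for t in trf}

# What the feed hides: the prints actually came from different kinds of venues.
sources_hidden = {t.true_source for t in trf}
```

Filtering on the venue code isolates off-exchange prints, but `venues_seen` collapses to a single value while `sources_hidden` does not, which is exactly why the venue field alone cannot separate wholesaler fills from dark pool fills.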
As dark pools now account for 35–45% of daily trading volume [10], we are left with the problem of distinguishing retail flow from other dark trades.

Interestingly, in theory, it is not difficult to identify retail trades that occur on exchange (very occasionally wholesalers pass marketable orders through to exchanges, and retail limit orders are generally routed directly to exchanges). This is because a number of exchanges have retail liquidity programs, in which specific flags are appended to retail orders (see [1,15]), making them easy to identify. However, this is not a pragmatic approach until we have first figured out a way to identify retail flow filled by wholesalers. First, to be able to observe these flags we would need to subscribe to proprietary...