Moving Averages
-->
Home
Blog
RSS
Moving Averages
I discuss moving or rolling averages, which are algorithms to compute means over different subsets of sequential data.
Published
04 June 2022
A moving average, sometimes called a rolling average, is a sequence of averages, constructed over subsets of a sequential data set. Moving averages are commonly used to process time series, particularly to smooth over noisy observations. For example, consider the noisy function in Figure 111. If these data represents a time series, we may not be interested in a single point estimate of the mean (red dashed line). Instead, we may be interested in a sequence of mean estimates, constructed as the data becomes available. By having an average that “moves”, we could construct a new sequential data set that is a smoothed version of the input data.
Figure 1. A noisy function (blue solid line), along with the mean (red line, "x" markers) of the entire data set.
While moving averages are fairly simple conceptually, there are details that are useful to understand. In this post, I’ll discuss and implement the cumulative average, the simple moving average and the weighted moving average. I’ll also discuss a special case of the weighted moving average, the exponentially weighted moving average.
Cumulative average
Consider a sequential data set of NNN observations,
{x1,x2,…,xN}.(1)<br>\{ x_1, x_2, \dots, x_N \}. \tag{1}<br>{x1,x2,…,xN}.(1)
These data do not need to be a time series, but they do need to be ordered. For simplicity, I will refer to the index n∈{1,2,…,N}n \in \{1, 2, \dots, N\}n∈{1,2,…,N} as “time”. Thus, at time nnn, the datum xnx_nxn has just been observed, while the datum xn+1x_{n+1}xn+1 is the next observation. These are useful ways of talking about the problem setup, but the operations being discussed could be applied to any sequential data, such as a row of pixels in an image.
Perhaps the simplest way to make the point estimate in Figure 111 “reactive” to new data is to compute a cumulative average (CA). In a CA, the sample mean is re-calculated as each new datum arrives. At time point nnn, the CA is defined as
CAn=x1+x2+⋯+xnn.(2)<br>\text{CA}_n = \frac{x_1 + x_2 + \dots + x_n}{n}. \tag{2}<br>CAn=nx1+x2+⋯+xn.(2)
The effect is a sequence of mean estimates that account for increasingly more of the data. See Figure 222 for an example of a simple CA applied to the data in Figure 111. Clearly, at time point NNN, the mean of the entire data set (red dashed line) is equal to CAN\text{CA}_NCAN (right-most point of blue dashed line).
Figure 2. The cumulative average (blue dashed line, "^" markers) computed on the function from Figure 111.
We can implement a CA in Python as follows:
def ca(data):<br>"""Compute cumulative average over data.<br>"""<br>return [mean(data[:i]) for i in range(1, len(data)+1)]
def mean(data):<br>"""Compute mean of data.<br>"""<br>return sum(data) / len(data)
Notice, however, that re-computing the mean at each time point is inefficient. We could simply undo the normalization of the previous step, add the new datum, and then re-normalize. Formally, the relationship between CAn\text{CA}_nCAn and CAn+1\text{CA}_{n+1}CAn+1 is
CAn+1=n⋅CAn+xn+1n+1.(3)<br>\text{CA}_{n+1} = \frac{n \cdot \text{CA}_n + x_{n+1}}{n+1}. \tag{3}<br>CAn+1=n+1n⋅CAn+xn+1.(3)
This results in a faster implementation:
def ca_iter(data):<br>"""Compute cumulative average over data.<br>"""<br>m = data[0]<br>out = [m]<br>for i in range(1, len(data)):<br>m = ((m * i) + data[i]) / (i+1)<br>out.append(m)<br>return out
As we’ll see, this idea of iteratively updating the mean, rather than fully recalculating it, is typical in moving average calculations.
Simple moving average
While the CA in Figure 222 is certainly an improvement upon a single average in Figure 111, we may want a sequence of estimates that are even more reactive. Look, for example, at the difference between the time series and the CA at n=100n=100n=100. The time series is rapidly increasing in value, but the CA is slow to react. One way to make the CA more reactive is to compute the average over a moving window of the data, where “window” refers to the subset of data in consideration. In other words, we can “forget” old data.
At each time point nnn, a simple moving average (SMA) computes the average over a backward-looking, fixed-width window. Let KKK denote the window size. Then for each index nnn, we want to compute
SMA(K)n=xn−k+1+xn−k+2+⋯+xn−1+xnK.(4)<br>\text{SMA}(K)_n = \frac{x_{n-k+1} + x_{n-k+2} + \dots + x_{n-1} + x_n}{K}. \tag{4}<br>SMA(K)n=Kxn−k+1+xn−k+2+⋯+xn−1+xn.(4)
Note that based on the definition in Equation 444, the first K−1K-1K−1 elements in the sequence are ill-defined. We could define them in one of two ways:
for mK,SMA(K)m={undefined,(x1+x2+⋯+xm−1+xm)/m,(5)<br>\text{for } m \lt K, \quad \text{SMA}(K)_m = \begin{cases}<br>\text{undefined}, \\ (x_1 + x_2 + \dots + x_{m-1} + x_m) / m,<br>\end{cases} \tag{5}<br>for mK,SMA(K)m={undefined,(x1+x2+⋯+xm−1+xm)/m,(5)
Following...