Twitter’s Anomaly Detection Algorithm Simplified

Have you ever wondered how a company knows when it's receiving too much data? How do you draw the line between a lot and too much?

Well, that is the problem anomaly detection tries to solve. Anomaly detection uses statistical methods to find outliers (or "spikes") in time-series data that deviate from what we would expect.

Anomaly detection is an extremely important part of today's finance and economy sector, as companies want to analyze how their data is being consumed and where they may have a problem, or a potential avenue for growth.

In 2015, Twitter came up with a neat, robust approach to finding anomalies in time-series data. They made their work open-source and list possible uses such as identifying bots or scammers and, more generally, tracking user engagement. [1]

In their work, they attempt to capture two types of anomalies.

  1. Global/Local: Because of natural company growth, the algorithm must flag anomalies that deviate only slightly from seasonal trends (local anomalies). It must also catch very large deviations (global anomalies).
  2. +/-: The algorithm catches anomalies in both directions. Positive anomalies can be beneficial, such as strong user feedback or trending topics; negative anomalies can signal low user engagement or potential software issues.

The Algorithm:

The algorithm builds off the ESD (Extreme Studentized Deviate) test for outliers. The ESD test is a simple outlier test, similar to Grubbs' test, that is applied to approximately normally distributed data.

The generalized ESD test does not work on its own, primarily because it is a parametric test (it assumes a normal distribution), while Twitter's data is multimodal due to its seasonality.

Also, since Twitter does not say "hey, let's put 17 anomalies into our dataset today," Grubbs' test and plain ESD fall short: an upper limit on the number of anomalies must be supplied, yet we do not know in advance how many anomalies our data contains.
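To make that limitation concrete, here is a minimal sketch of the generalized ESD test in Python (NumPy/SciPy assumed; the function and variable names are mine, not Twitter's). Note the `max_anomalies` parameter: the test requires this upper bound even though the true anomaly count is unknown.

```python
import numpy as np
from scipy import stats

def generalized_esd(values, max_anomalies, alpha=0.05):
    """Generalized ESD test; returns indices of detected outliers.

    Repeatedly removes the point farthest from the mean (in units of
    the sample standard deviation) and compares each test statistic
    R_i to a t-distribution-based critical value lambda_i.
    """
    x = np.asarray(values, dtype=float)
    idx = np.arange(len(x))
    N = len(x)
    candidates, n_out = [], 0
    for i in range(1, max_anomalies + 1):
        mean, std = x.mean(), x.std(ddof=1)
        dev = np.abs(x - mean)
        j = int(dev.argmax())
        R = dev[j] / std                 # test statistic for step i
        candidates.append(int(idx[j]))
        x, idx = np.delete(x, j), np.delete(idx, j)
        # Critical value from the t-distribution (one per step)
        p = 1 - alpha / (2 * (N - i + 1))
        t = stats.t.ppf(p, N - i - 1)
        lam = (N - i) * t / np.sqrt((N - i - 1 + t**2) * (N - i + 1))
        if R > lam:
            n_out = i                    # largest i with R_i > lambda_i
    return sorted(candidates[:n_out])

values = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.1, 0.9, 10.0]
print(generalized_esd(values, max_anomalies=3))  # flags the spike at index 8
```

Applied directly to raw, seasonal traffic data, a test like this would misfire: the mean and standard deviation are computed over all the seasonal modes at once.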

So what did Twitter’s team do?

They leveraged time-series decomposition, breaking the data into three primary components: trend, seasonality, and residual. After decomposing the raw data, they analyzed the residual. Whereas the original multimodal dataset had a large standard deviation, the residual distribution is a tall, skinny curve, meaning a small standard deviation.

(Figures from Twitter's presentation.)

So by looking at the residual, we now have an approximately normal distribution. Next, the algorithm applies an ESD test to the residual data. But not just any ESD test.

The Twitter team altered the ESD test to use robust statistics: the median in place of the mean, and the median absolute deviation (MAD) in place of the standard deviation.

In summary, the algorithm first uses STL decomposition to break up the raw time series. It then detects global and local anomalies by applying the Seasonal Hybrid ESD (SH-ESD) test, which substitutes the median and the median absolute deviation (MAD) for the mean and standard deviation. Now the algorithm can account for both local and global anomalies. Problem solved.
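The robust ESD step above can be sketched as follows (my own simplification, not Twitter's R code): the median and MAD replace the mean and standard deviation, so extreme points cannot inflate the spread estimate and mask each other.

```python
import numpy as np
from scipy import stats

def esd_robust(residual, max_anomalies, alpha=0.05):
    """Median/MAD variant of the generalized ESD test, applied to the
    residual component of an STL decomposition (the ESD half of SH-ESD)."""
    x = np.asarray(residual, dtype=float)
    idx = np.arange(len(x))
    N = len(x)
    candidates, n_out = [], 0
    for i in range(1, max_anomalies + 1):
        med = np.median(x)
        # 1.4826 makes MAD consistent with sigma for normal data
        mad = 1.4826 * np.median(np.abs(x - med))
        dev = np.abs(x - med)
        j = int(dev.argmax())
        R = dev[j] / mad                 # robust test statistic
        candidates.append(int(idx[j]))
        x, idx = np.delete(x, j), np.delete(idx, j)
        p = 1 - alpha / (2 * (N - i + 1))
        t = stats.t.ppf(p, N - i - 1)
        lam = (N - i) * t / np.sqrt((N - i - 1 + t**2) * (N - i + 1))
        if R > lam:
            n_out = i
    return sorted(candidates[:n_out])

# Residual-like data with one positive and one negative spike
resid = [0.1, -0.1, 0.05, 0.0, -0.05, 0.08, -0.08, 0.03, 6.0, -5.0]
print(esd_robust(resid, max_anomalies=4))  # flags indices 8 and 9
```

Because the test runs on the residual in both tails, it picks up the positive and negative anomalies described earlier.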

My explanation is a bit generalized; if you want to see the algorithm's exact mechanics or its performance, refer to the official paper.

This system has many drawbacks as well; still, it's cool to see a company tackle such a large problem with such a simple and efficient algorithm.

Open-Source Github Package in R:

