Anomaly Detection in Manufacturing

Today, I am going to talk about the use of machine learning tools in manufacturing. There are now efficient methods to detect anomalies in industrial machines, or in any type of equipment that can continuously record modes of operation, command values and sensor measurements. Anomaly detection is a powerful tool for users, who can increase productivity through monitoring and failure prevention, and it is also essential to machine manufacturers, who can use it for technology development in a process of continuous improvement. It is now possible to exploit machine learning and big data techniques to target higher productivity.

The Problem

The problem is quite simple to present. On one side, there is a set of machines, robots, pieces of equipment or vehicles able to record the values of commands and sensors over time. On the other side, there is a company or manufacturer that wants to improve productivity and reliability by using the data generated by the machines.

The questions are:

  • Where and when is there an abnormal behavior?
  • Where and when did failures happen, or where and when are they going to happen?
  • Where and when should we invest in maintenance or new equipment?

But the amount of data is very large and it is not clear what to look for. Indeed, this is worse than searching for a needle in a haystack, as we don’t even know whether we are looking for a needle. Fortunately, we have one important starting point: we are looking for something unusual, and looking for abnormal situations is exactly what machine learning is good at.

There is no real need to tell the algorithms what to look for, other than that it is abnormal.

The Machine

To illustrate the power of machine learning algorithms for anomaly detection in industrial machines, I built a software model of a machine with sensors. It is simplified on purpose so that the results can be presented in this post. The characteristics of the model are the following:

  • 3 command signals with respectively 15, 3 and 3 modes of operation
  • 5 sensors with complex behavior like non-linearity, integral and differential components as well as noise.
  • 40 days of recordings, with around 6 hours per day and 1 measurement per minute
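The model itself is not listed in this post, so the following is only a minimal sketch of what such a simulated machine could look like. The function name `simulate_machine`, the mode-switching probability, the gains and the noise levels are all hypothetical values chosen for illustration. Only the overall shape (3 commands with 15, 3 and 3 modes, 5 sensors, 40 days of 6 hours at 1 sample per minute) comes from the list above.

```python
import math
import random


def simulate_machine(n_days=40, hours_per_day=6, seed=0):
    """Return a list of rows: 3 command values followed by 5 sensor values."""
    rng = random.Random(seed)
    n_modes = (15, 3, 3)   # modes of operation per command signal
    commands = [0, 0, 0]
    lagged = 0.0           # state of the slow (integral) sensor
    previous = 0.0         # last drive value, for the differential sensor
    rows = []
    for _ in range(n_days * hours_per_day * 60):   # one sample per minute
        # Each command occasionally jumps to another mode (assumed 1% chance).
        for i, n in enumerate(n_modes):
            if rng.random() < 0.01:
                commands[i] = rng.randrange(n)
        c0, c1, c2 = commands
        drive = c0 + 2.0 * c1
        lagged += 0.05 * (drive - lagged)   # first-order lag, e.g. a temperature
        sensors = [
            c0 + rng.gauss(0, 0.1),                                # linear response
            math.tanh(0.3 * c0) * (1 + c2) + rng.gauss(0, 0.05),   # non-linear
            lagged + rng.gauss(0, 0.1),                            # integral component
            (drive - previous) + rng.gauss(0, 0.1),                # differential component
            c1 * c2 + rng.gauss(0, 0.1),                           # interaction of commands
        ]
        previous = drive
        rows.append(commands + sensors)   # concatenation creates a new list
    return rows


data = simulate_machine()
print(len(data), len(data[0]))   # 14400 samples, 8 columns
```

Each row has 8 values, which is where the 8 dimensions of the data set come from.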

The traces of the sensors are shown in the figure below for the whole period. Of course, it is not possible to see every single measurement here, but we can see that there are different modes of operation. A first observation is that everything looks normal!

Trace of variables

The area between the two green dotted lines shows the zone where one anomaly was inserted in the data. The black dashed line shows the border between the training period and the detection period used by the algorithm. Everything before that line is used to train the algorithm, and the result of this training is used to detect abnormal behavior in the final part of the data, called the detection period.

Zoom on traces

If we zoom in on a short period of time, we can observe different modes of operation, noise on the measurements, and integral effects that simulate the latency we may have, for example, with a temperature measurement. Although the modes of operation are quite clear in the figure, we can also see that there are lots of levels, transitions and noise. This makes it difficult to analyze the signals and detect abnormal situations.

3D view of three variables

Though there are 8 dimensions in this data set, we can also arbitrarily select three of them and plot them in a 3D space (see figure). This just shows that it is not going to be easy to find something unusual.

Well, even a small data set means lots of data in a high-dimensional space. How do we move forward?

The Results

The approach consists of running a clustering algorithm on the training data to find the regular modes of operation and the sensor values that we declare as normal. We can then calculate a probability of failure or abnormal situation by computing a kind of distance to the normal modes of operation. Note again that we never tell the algorithm what is abnormal. We just tell it to find something in the detection data that never happened in the training data. And this is already very useful. So what are the results?
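The post does not say which clustering algorithm or which distance is used, so the following is only a sketch of the general idea, assuming plain k-means and a Euclidean distance to the nearest cluster center. The function names, the farthest-point initialization and the choice of threshold (the largest distance seen during training) are assumptions made for illustration. Only the standard library is used here; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import math


def distance_to_normal(point, centroids):
    """Distance from a sample to the closest learned mode of operation."""
    return min(math.dist(point, c) for c in centroids)


def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: distance_to_normal(p, centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, n_iter=20):
    """Plain k-means: returns k centroids fitted to the given points."""
    centroids = init_centroids(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:   # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def fit_detector(train, k):
    """Learn the normal modes and a threshold from the training period only."""
    centroids = kmeans(train, k)
    # Assumed rule: anything farther than the worst training sample is abnormal.
    threshold = max(distance_to_normal(p, centroids) for p in train)
    return centroids, threshold


def detect(samples, centroids, threshold):
    """Flag samples of the detection period that are too far from every mode."""
    return [distance_to_normal(p, centroids) > threshold for p in samples]


# Toy training data: two normal modes around (0, 0) and (5, 5).
offsets = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1), (0.0, 0.0)]
train = [(cx + dx, cy + dy)
         for cx, cy in [(0.0, 0.0), (5.0, 5.0)]
         for dx, dy in offsets]
centroids, threshold = fit_detector(train, k=2)

flags = detect([(0.05, 0.0), (2.5, 2.5), (5.0, 5.05)], centroids, threshold)
print(flags)   # [False, True, False]
```

The sample at (2.5, 2.5) sits between the two normal modes and is flagged, even though nothing in the code describes what an anomaly looks like.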

There are two ways of showing the data: simple projections, to observe two variables at a time, and principal component analysis (PCA), to map the data into a 3D space while retaining as much of the variance as possible.
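For readers who have not used PCA, here is a small sketch of the projection step. It is not the code behind the figures in this post: it assumes a simple power iteration with deflation on the covariance matrix, written with the standard library only, and the function name `pca` and its parameters are hypothetical. A library routine (for example scikit-learn's `PCA`) would normally be used instead.

```python
import math
import random


def pca(rows, n_components=3, n_iter=200, seed=0):
    """Project rows onto their first principal components.

    Returns (projected_rows, components). Components are unit vectors,
    ordered from the largest explained variance to the smallest.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(n_components):
        # Random start vector, normalized to unit length.
        v = [rng.gauss(0, 1) for _ in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:   # no variance left in any direction
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove this direction so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, components


# Toy data: most of the variation lies along the direction (1, 2, 0).
rows = [(t, 2.0 * t + 0.1 * (-1) ** t, 0.5 * (t % 3)) for t in range(30)]
projected, components = pca(rows, n_components=2)
print(len(projected), len(projected[0]))   # 30 rows, 2 coordinates each
```

The sign of each component is arbitrary, so two runs with different start vectors may give mirrored plots of the same structure.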

2D scatter var0-var2

Let’s first look at variables var0 and var2 in a scatter plot. Each point is a sample. Blue points belong to the training period and green points to the detection period. Red points are those identified as abnormal by the algorithm. The black points are the detected modes of operation. It is very interesting to see that the algorithm detected two zones (in red). The one in the center does indeed correspond to the injected error, and the other one is a normal mode that was not present during the training period. Both are good candidates for investigation if we want to understand failures and decide on a maintenance inspection.

2D scatter var1-var4

If we plot the samples in a different space (var1, var4), the mode of operation that was not in the training data is clearly identified (top-left), while the injected error is difficult to observe, squeezed between normal modes in the center.

PCA map

The above figure shows the same samples with the same coloring scheme, but in a 3D space reduced by PCA. It is interesting to see that PCA is able to present the data in a space where we start to see the different modes of operation (15, 3 and 3). The following video shows the same figure with the axes rotating, to reveal more details of the clustering.

But let’s come back to the variable-time space to see the detected anomaly. The next figure shows the five variables over time around the area detected as abnormal. The points in red are those with a high probability of anomaly, and the green dotted lines show the actual period where the error was injected. The match is perfect, although it is hard to see by eye that this period is abnormal: the injected error is not trivial, as it is made of a non-linear combination of some variables. In addition, the fourth variable has two distinct modes of operation during the error period. This is why, in the 3D space, we observe not one but two red zones for this anomaly, on top of a third one for the normal mode not present during training.

Detected period

Technology Breakthrough

We are lucky! There are finally methods to exploit the tons of data generated by industrial machines and sensors, and those methods are coming from machine learning and AI.

Anomaly probability

Algorithms can be implemented on cloud infrastructure, for example with Apache Spark, to compute the probability of anomalies (as in the figure, where the color map from blue to purple indicates the level of probability) and to help prevent upcoming failures.
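How the probability in the figure is computed is not detailed in this post. As one possible illustration, the sketch below turns a distance to the nearest normal mode into a score between 0 and 1. The function name `anomaly_probability`, the exponential form and the `scale` parameter are assumptions, not the method actually used for the figure.

```python
import math


def anomaly_probability(distance, scale):
    """Map a distance to the nearest normal mode onto a score in [0, 1].

    `scale` (hypothetical parameter) is the distance at which the score
    reaches 1 - exp(-1), about 0.63; it could be set from the spread of
    the training data.
    """
    return 1.0 - math.exp(-(distance / scale) ** 2)


# Scores grow from 0 (on a normal mode) towards 1 (far from every mode).
for d in (0.0, 0.5, 1.0, 3.0):
    print(d, round(anomaly_probability(d, scale=1.0), 3))
```

A score like this is convenient for a color map, but it is only a monotonic rescaling of the distance and not a calibrated probability of failure.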

This new tool brings three important benefits to machine users and manufacturers:

  1. productivity increase: fewer failures and better quality
  2. mastery of technology: insights and continuous improvement
  3. metrics for maintenance: data-driven investments

One thing is for sure: Industry 4.0 is coming, and of course

Math will rock your world!