Today, I am going to talk about the use of machine learning in the manufacturing industry. There are now efficient methods to detect anomalies in industrial machines, or in any type of equipment that can continuously record modes of operation, command values and sensor measurements. Anomaly detection is a powerful tool for users to increase productivity through monitoring and failure prevention, but it is also essential to machine manufacturers for technology development in a process of continuous improvement. It is now possible to exploit machine learning and big data techniques to target higher productivity.
The problem is quite simple to state. On one side, a set of machines, robots, pieces of equipment or vehicles that record command and sensor values over time. On the other side, a company or manufacturer that wants to improve productivity and reliability using the data those machines generate.
The questions are:
- Where and when is there abnormal behavior?
- Where and when did failures happen, or when are they going to happen?
- Where and when should we invest in maintenance or new equipment?
But the amount of data is very large and it is not clear what to look for. Indeed, this is worse than searching for a needle in a haystack, as we don't even know whether we are looking for a needle. Fortunately, we have one important starting point: we are looking for something unusual, and that is exactly what machine learning is good at, finding abnormal situations.
There is no real need to tell the algorithms what to look for, other than "abnormal".
To illustrate the power of machine learning algorithms for anomaly detection in industrial machines, I built a software model of a machine with sensors. It is simplified on purpose, so that results can be presented in this post. The characteristics of the model are the following:
- 3 command signals with 15, 3 and 3 modes of operation respectively
- 5 sensors with complex behavior such as non-linearity, integral and differential components, as well as noise
- 40 days of records, with around 6 hours per day and 1 measurement per minute
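To make this concrete, here is a minimal sketch of how such synthetic data could be generated in NumPy: discrete command modes, a non-linear sensor response with noise, and a first-order lag to mimic the latency of a temperature probe. This is not the actual model used for the figures; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 40 * 6 * 60  # 40 days x ~6 hours/day x 1 measurement/minute

def command(n_modes, n, hold=60):
    """A command signal that holds a randomly chosen mode for `hold` samples."""
    levels = rng.choice(n_modes, size=n // hold + 1)
    return levels.repeat(hold)[:n].astype(float)

# Three command signals with 15, 3 and 3 modes of operation
cmd = np.column_stack([command(15, n), command(3, n), command(3, n)])

# A sensor with a non-linear response to the commands, plus noise
sensor_fast = np.tanh(cmd[:, 0] - 7) + 0.5 * cmd[:, 1] + rng.normal(0, 0.1, n)

# A sensor with an integral (first-order lag) effect, like a temperature probe
sensor_slow = np.empty(n)
state = 0.0
for t in range(n):
    state += 0.05 * (cmd[t, 2] - state)  # slowly track the command level
    sensor_slow[t] = state + rng.normal(0, 0.02)
```

Even this toy generator already produces the levels, transitions and lagged responses that make the real traces hard to read by eye.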
The traces of the sensors are shown in the figure below for the whole period. Of course, it is not possible to see every single measurement here, but we can see that there are different modes of operation. A first observation: everything looks normal!
The area between the two green dotted lines shows the zone where one anomaly was inserted in the data. The black dashed line shows the border between the training period and the detection period used by the algorithm: everything before the line is used to train the algorithm, and the result of this training is used to detect abnormal behavior in the final part of the data, named the detection period.
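In code, this split is nothing more than an index into the time series. The 75% border below is an illustrative assumption, not the value used for the figures:

```python
import numpy as np

samples = np.arange(14400)           # one row per recorded minute
border = int(0.75 * len(samples))    # position of the black dashed line (assumed)

train = samples[:border]             # used to learn the normal modes of operation
detect = samples[border:]            # scanned for abnormal behavior
```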
If we zoom in on a short period of time, we can observe different modes of operation, noise on the measurements, as well as integral effects that simulate the latency we may have, for example, with a temperature measurement. Although the modes of operation are quite clear in the figure, we can also see that there are lots of levels, transitions and noise. This makes it difficult to analyze the signals and detect abnormal situations.
Though there are 8 dimensions in this data set, we can arbitrarily select three of them and plot the samples in a 3D space (see figure). This already shows that it is not going to be easy to find something unusual.
Well, even with a small data set, this is a lot of data in a high-dimensional space. How do we move forward?
The algorithm consists in running a clustering algorithm on the training data to find the regular modes of operation and the sensor values that we declare as normal. We can then compute a probability of failure or abnormal situation from a kind of distance to the nearest normal mode of operation. Note again that we never tell the algorithm what is abnormal. We just ask it to find anything in the detection data that never happened during the training period. And this is already very useful. So what are the results?
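Here is a minimal, self-contained sketch of this idea: a tiny k-means written directly in NumPy (rather than any specific library), with toy 2-D data standing in for the 8 real dimensions. The threshold rule at the end is an illustrative choice, not the one used for the figures.

```python
import numpy as np

def fit_modes(X, k, n_iter=50):
    """Tiny k-means: find k cluster centers, the 'normal' operating modes."""
    # Farthest-point initialization keeps this sketch deterministic
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # assign each sample to its nearest center, then move centers to the mean
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def anomaly_score(X, centers):
    """Distance of each sample to its nearest normal mode."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).min(axis=1)

# Toy training data: two tight normal modes
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0.0, 0.1, (200, 2)),
                   rng.normal(5.0, 0.1, (200, 2))])
centers = fit_modes(train, k=2)

# Flag anything farther from every mode than the worst training sample
threshold = anomaly_score(train, centers).max()
new = np.array([[0.0, 0.1], [5.1, 5.0], [2.5, 2.5]])
flags = anomaly_score(new, centers) > threshold
```

The first two new samples sit inside known modes and are left alone; the third falls between the modes, something never seen in training, and gets flagged. Note that nowhere did we describe what an anomaly looks like.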
There are two ways of showing the data: simple projections to observe two variables at a time, and PCA to map the data into a 3D space while preserving as much of the variance as possible.
Let’s first look at variables var0 and var2 in a scatter plot. Each point is a sample. Blue points are from the training period and green from the detection period. Red points are identified as abnormal by the algorithm. The black points are the detected modes of operation. It is very interesting to see that the algorithm detected two zones (in red). One in the center indeed corresponds to the injected error, and the other one is a normal mode that was not present during the training period. Both are good candidates for investigation if we want to understand failures and decide on a maintenance inspection.
If we plot the samples in a different space (var1, var4), the mode of operation missing from the training data is clearly identified (top-left), while the injected error is difficult to observe, squeezed between normal modes in the center.
The above figure shows the same samples with the same coloring scheme, but in a 3D PCA-reduced space. It is interesting to see that PCA presents the data in a space where we start to distinguish the different modes of operation (15, 3 and 3). The following video rotates the axes of the figure to show more details of the clustering.
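The PCA mapping behind such a figure fits in a few lines of NumPy, via the singular value decomposition of the centered data. The random matrix below merely stands in for the real 8-dimensional samples:

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project samples onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                  # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)          # variance fraction per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                # stand-in for the 8-dimensional samples
coords, var_ratio = pca_project(X)           # coords is (500, 3), ready for a 3D plot
```

The `var_ratio` values tell us how much of the total variance survives the reduction, which is worth checking before trusting any structure seen in the 3D plot.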
But let’s come back to the variable-time space to see the detected anomaly. The next figure shows the five variables in time, around the area detected as abnormal. The points in red have a high probability of anomaly, and indeed the green dotted lines show the real period where the error was injected. The match is perfect, even though the anomaly is hard to spot by eye: the injected error is not trivial, as it is made of a non-linear combination of some of the variables. Moreover, the fourth variable has two distinct modes of operation during the error period. This is why, in the 3D space, we observe not one but two red zones for this anomaly (two zones for this problem, on top of a third one for the normal mode not present during training).
We are lucky! There are finally methods to exploit the tons of data generated by industrial machines and sensors, and those methods come from machine learning and AI.
Algorithms can be implemented on cloud infrastructures, for example with Spark, to compute the probability of anomalies (as in the figure on the right, where the color map from blue to purple indicates the probability level) and to prevent upcoming failures.
This new tool brings three important benefits to machine users and manufacturers:
- productivity increase: fewer failures and better quality
- mastery of technology: insights and continuous improvement
- metrics for maintenance: data driven investments
This is for sure: Industry 4.0 is coming, and of course
Math will rock your world !