Last year, in collaboration with Analog Devices, Inc. (ADI), we expanded Kenning, our open source AI optimization, deployment and evaluation framework, with AutoML support, which automates the training and optimization of AI models for specific tasks and hardware. The first use case we demonstrated for this flow involved finding the right model for detecting anomalies.

Continuing our work in the area of anomaly detection, we are introducing the Zephyr Sensor Anomalies library - an easy-to-use library for embedded devices running Zephyr RTOS, which allows for scanning sensor readings for various anomalies or patterns. The detection can be run in the background on a separate thread, while the application is performing other tasks. In this article, we describe the unique challenges that arise in anomaly detection problems, the features of the library, and how it can be used to improve the evaluation of AI models in Kenning.

Zephyr Sensor Anomalies library

Challenges of anomaly detection on embedded devices

One of the common use cases for Machine Learning (ML) models running on microcontrollers is anomaly detection or pattern recognition on sensor data. In these problems, we analyze a series of data samples from sensors and try to determine which samples constitute anomalous data.

For training an anomaly detection model, we need to take into consideration factors like detection delay and problems with accurate evaluation. The detection should, ideally, happen as soon as possible, in order to mitigate possible issues it might be pointing towards. If an anomaly spans 50 samples, we would much rather receive a signal from the model after 5 samples than after sample 45. And, unless the sole purpose of the MCU is anomaly detection, the process of checking for anomalies should run in the background, affecting the rest of the application as little as possible.
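
To make the notion of detection delay concrete, here is a minimal C sketch (not part of the library) that measures it for a single anomaly. The function name `detection_delay` and the layout of the `flags` array are assumptions made purely for illustration.

```c
#include <stddef.h>

// Returns the offset (in samples) from the start of an anomaly to the first
// sample the model flagged inside it, or -1 if the anomaly was missed.
// flags[i] is nonzero when the model flagged sample i; the anomaly covers
// indices [start, start + len). All names here are illustrative.
static int detection_delay(const unsigned char *flags, size_t start, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (flags[start + i]) {
            return (int)i;
        }
    }
    return -1;
}
```

For an anomaly spanning 50 samples, a model that first flags the sample at offset 5 gets a delay of 5, while one that only reacts at offset 45 gets a delay of 45.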

Overview of the Zephyr Sensor Anomalies library

The Zephyr Sensor Anomalies library is a new Zephyr RTOS library that provides lightweight functions for running anomaly detection and pattern recognition Machine Learning models using one of two supported ML backends:

The library allows sensor data collection and periodic model execution to be performed in the background, while the rest of the application on the MCU runs normally. If an anomaly occurs, a user-defined callback function is called. We showcase that in the demo below.

In this demo, we build an example application for the stm32f746g_disco board, with two 3-channel sensors attached. In the application, we bake in a simple CNN (convolutional neural network) trained on the minispot dataset (detecting anomalies in accelerometer data from a robot). A key code snippet from the application can be seen below.

The model itself was trained and optimized for the TFLite-Micro ML execution framework, using Kenning. All resources (scripts, configuration files etc.) used for that workflow are available in the library’s GitHub repository, along with necessary documentation.

The dataset is available here.

We run the example app in Renode using a Python script provided with the library. The script then feeds data to the simulated sensors through Renode’s Python API. The application runs and detects anomalies using the Zephyr Sensor Anomalies library.


Below is a code snippet from the example application. The library consists of two parts. We first initialize the provider (for reading data from sensors), and then initialize the detector and pass the provider handle to the detector. At the end, we define the callback function and start the detection, which will run in a separate thread and call the callback if it finds an anomaly.

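#include <stdio.h>

// Called by the detector when it finds an anomaly; ctx is unused here
static void log_score_cb(void *ctx, float score)
{
    (void)ctx;
    printf("Anomaly detected with probability %.5f\n", (double)score);
}

int main(void)
{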
    // Initializing sensor data collection
    static struct provider_reader pr = {
        .dst_N = 0,
        .ps_N = 0,
    };
    static struct provider_hdr_entry hdr[128];
    provider_reader_register_all_sensor(&pr);
    provider_reader_hdr(&pr, hdr);
    // Initializing anomaly detection with smoothing enabled (to avoid raising multiple alarms for 1 anomaly)
    detector_init();
    struct detector_classifier dc = {0};
    detector_classifier_init(&g_detector, &dc, &pr, DETECTOR_SMOOTHING_EXP_SMOOTHING, 100, 0.5);
    detector_classifier_register_cb(&dc, log_score_cb, NULL);
    detector_classifier_start(&dc, 6, DETECTOR_DEFAULT_OPTS);
    // ... (execution continues, while anomaly detection is running in the background)
}
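
The snippet above enables exponential smoothing so that a single anomaly does not raise multiple alarms. How the library implements this internally is not shown here; as a generic illustration of the idea (not the library's code), the sketch below applies an exponential moving average to raw scores and reports an alarm only when the smoothed score first crosses a threshold. The `ema_alarm` type, its fields and both functions are hypothetical and do not correspond to the arguments passed to `detector_classifier_init`.

```c
#include <stdbool.h>

// Hypothetical state for smoothing raw anomaly scores.
struct ema_alarm {
    float alpha;     // weight of the newest score, in (0, 1]
    float threshold; // smoothed score at or above this counts as anomalous
    float smoothed;  // current smoothed score
    bool active;     // true while the smoothed score stays at or above threshold
};

static void ema_alarm_init(struct ema_alarm *a, float alpha, float threshold)
{
    a->alpha = alpha;
    a->threshold = threshold;
    a->smoothed = 0.0f;
    a->active = false;
}

// Feeds one raw score. Returns true only on the update where the smoothed
// score first reaches the threshold, so a long anomaly raises a single alarm.
static bool ema_alarm_update(struct ema_alarm *a, float score)
{
    a->smoothed = a->alpha * score + (1.0f - a->alpha) * a->smoothed;
    bool above = a->smoothed >= a->threshold;
    bool raised = above && !a->active;
    a->active = above;
    return raised;
}
```

With `alpha = 0.5` and `threshold = 0.6`, one isolated high score is not enough to trigger an alarm, while a sustained run of high scores triggers exactly one.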

The execution flow of the example application is shown in the diagram below. The provider automatically discovers the sensors defined in the devicetree of the board it runs on, which makes it easy to keep track of the sensors present. It periodically collects readings from all sensors, which can then be forwarded to the detector running a Machine Learning model (optionally, a sliding window mechanism or a smoothing mechanism can be enabled).

The sensor data from the provider can also be used in other ways - for example it can be dumped over a serial port, to create a dataset for training and evaluation.
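
As a rough sketch of what dumping readings for a dataset could look like, the function below formats one multi-channel reading as a CSV line that an application could then write to its serial console. The CSV layout, the timestamp column and the `format_csv_row` name are our assumptions; the library's actual dump format may differ.

```c
#include <stddef.h>
#include <stdio.h>

// Formats one reading as "timestamp,ch0,ch1,...\n" into buf.
// Returns the number of characters written (excluding the terminating NUL),
// or -1 if the buffer is too small. Illustrative helper, not a library API.
static int format_csv_row(char *buf, size_t size, unsigned long timestamp_ms,
                          const float *channels, size_t n_channels)
{
    int written = snprintf(buf, size, "%lu", timestamp_ms);
    if (written < 0 || (size_t)written >= size) {
        return -1;
    }
    size_t used = (size_t)written;

    for (size_t i = 0; i < n_channels; i++) {
        written = snprintf(buf + used, size - used, ",%.4f", (double)channels[i]);
        if (written < 0 || (size_t)written >= size - used) {
            return -1;
        }
        used += (size_t)written;
    }

    written = snprintf(buf + used, size - used, "\n");
    if (written < 0 || (size_t)written >= size - used) {
        return -1;
    }
    return (int)(used + (size_t)written);
}
```

Each row then holds one sample of a 3-channel sensor, and a host-side script can collect the rows into a file for training and evaluation.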

Zephyr Sensor Anomalies library graph

Our library, together with the real-time evaluation features recently added to Kenning, also enables better evaluation of an anomaly detection model by providing the user with a set of time-based quality metrics. We will now go into a little more detail about them and why they are beneficial when it comes to anomaly detection problems.

Time-based quality metrics for evaluating models with Kenning

While regular quality metrics used in classification problems (such as accuracy, precision, F1 score, or G-mean) are essential in training and evaluating a model, anomaly detection models benefit considerably from time-based quality metrics (such as detection delay or, if the sliding window mechanism is enabled, fault detection rate and false alarm rate). This is because an anomaly often spans multiple samples in a continuous stream of data. In this case, we usually do not need the model to correctly classify every single sample of the anomaly; flagging even a single sample within the anomaly is often sufficient. The following two graphs illustrate this.

Zephyr Sensor Anomalies library graph

In the graphs, actual anomalous samples are marked in red, while the white area marks the samples that a hypothetical evaluated model classified as anomalous.

In the graph above, a single 5-sample anomaly occurred in the evaluated period. The model correctly classified 3 of those samples, and also classified 2 non-anomalous samples as part of the anomaly. With 3 true positives, 2 false positives and 2 false negatives, precision and recall are both 0.6, which makes for a rather low F1 score of 0.6. However, in practical terms, the model actually performed quite well by catching the multi-sample anomaly. And if we treat all 5 contiguous samples the model flagged as a single ‘anomaly occurred’ signal, we can say there were no false positives. The quality of the model is therefore not accurately described by the low F1 score, but our time-based metrics (fault detection rate, false alarm rate and average detection delay) reflect it quite well.
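
As a quick check of the sample-level arithmetic, the sketch below computes the F1 score from true positive, false positive and false negative counts using the standard formula. The function name is ours, and the counts used in the note that follows are read from the description in the text rather than from the graph itself.

```c
// F1 score from sample-level counts: 2*TP / (2*TP + FP + FN).
// Returns 0 when there are no positive samples or predictions at all.
static double f1_score(unsigned tp, unsigned fp, unsigned fn)
{
    unsigned denom = 2u * tp + fp + fn;
    return denom == 0u ? 0.0 : (2.0 * tp) / denom;
}
```

With 3 correctly flagged samples, 2 non-anomalous samples flagged and 2 anomalous samples missed, `f1_score(3, 2, 2)` evaluates to 0.6, even though the anomaly as a whole was caught.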

An opposite example can be seen in the graph below: we have a comparatively higher F1 score because the model flagged all anomalous samples. But in practical applications this model performed worse than the one shown previously, because one of the two anomalies it flagged was a false alarm. Again, the classic classification metric (F1 score) inaccurately reflects the quality of the model, while the time-based metrics reflect it quite well (the false alarm rate is 0.5).

Zephyr Sensor Anomalies library graph

As we can see, time-based metrics allow for checking the quality of the model by assessing how well it performs in correctly flagging entire anomalous events, instead of evaluating how well it can classify each individual data point. The new Kenning real-time evaluation loop, with a dedicated report format for anomaly detection (both of which were developed alongside the library), supports those metrics.
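
The event-level view can be sketched in a few lines of C. The definitions below are one plausible reading of the metrics discussed above, not necessarily the exact ones Kenning uses: an anomaly counts as detected if at least one sample inside it was flagged, a contiguous run of flagged samples counts as a false alarm if it touches no anomalous sample, and the delay is the offset of the first flag within a detected anomaly. The `event_metrics` type and `evaluate_events` function are hypothetical names.

```c
#include <stddef.h>

struct event_metrics {
    double fault_detection_rate; // detected anomalies / all anomalies
    double false_alarm_rate;     // false-alarm runs / all flagged runs
    double mean_delay;           // mean offset of the first flag in detected anomalies
};

// truth[i] and pred[i] are nonzero for anomalous / flagged samples.
static struct event_metrics evaluate_events(const unsigned char *truth,
                                            const unsigned char *pred, size_t n)
{
    size_t anomalies = 0, detected = 0, alarms = 0, false_alarms = 0;
    size_t delay_sum = 0;

    // Pass 1: walk ground-truth runs and look for the first flag inside each.
    for (size_t i = 0; i < n;) {
        if (!truth[i]) { i++; continue; }
        size_t start = i;
        int found = 0;
        for (; i < n && truth[i]; i++) {
            if (pred[i] && !found) {
                found = 1;
                delay_sum += i - start;
            }
        }
        anomalies++;
        detected += (size_t)found;
    }

    // Pass 2: walk flagged runs; a run touching no anomalous sample is a false alarm.
    for (size_t i = 0; i < n;) {
        if (!pred[i]) { i++; continue; }
        int overlaps = 0;
        for (; i < n && pred[i]; i++) {
            overlaps |= truth[i];
        }
        alarms++;
        false_alarms += overlaps ? 0u : 1u;
    }

    struct event_metrics m = {
        .fault_detection_rate = anomalies ? (double)detected / anomalies : 0.0,
        .false_alarm_rate = alarms ? (double)false_alarms / alarms : 0.0,
        .mean_delay = detected ? (double)delay_sum / detected : 0.0,
    };
    return m;
}
```

For an anomaly at samples 4 to 8 with flags at samples 6 to 10, this returns a fault detection rate of 1.0, a false alarm rate of 0.0 and a mean delay of 2 samples. For a fully flagged anomaly followed by one unrelated flagged run, the false alarm rate becomes 0.5.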

A dedicated evaluation application provided with the Zephyr Sensor Anomalies library, together with the new Kenning features mentioned above, allows for running a real-time evaluation loop. Kenning accomplishes this by running the application in Renode and feeding test data to the simulated sensors (using Renode’s Python API), while receiving detection results through the Kenning Protocol working in asynchronous mode over UART.

An example evaluation workflow can be found in the README file of the GitHub repository.

You can find a sample anomaly detection report here. It was generated from a real-time evaluation of a simple CNN trained on the aforementioned minispot dataset.

Simplified anomaly detection on embedded devices

The Zephyr Sensor Anomalies library is an easy-to-use Zephyr RTOS library that simplifies anomaly detection for MCUs. It provides tools that automatically detect the sensors present in the system and collect readings from them. It allows for feeding those readings to an anomaly detection process running in the background, and for setting up callbacks that are called when an anomaly occurs. Together with new features in Kenning (developed alongside the library), it also lets the user evaluate AI models using time-based quality metrics (such as detection rate, false alarm rate, and average delay).

If the features of the library sound interesting to you, and you would like to learn more about how Antmicro helps customers with creating, optimizing, deploying and evaluating AI models, reach out to us at contact@antmicro.com.