Antmicro's work with Time Sensitive Networking support in the Zephyr RTOS
Topics: Open networking, Open tooling
Solving problems that require real-time calculations and precise control typically calls for using an RTOS. While we have been working with a wide variety of RTOSs for various applications (like Contiki-NG for IoT, RTEMS for space applications, eCos for satellite equipment, FreeRTOS in many other fields etc), Antmicro’s RTOS of choice these days has been Zephyr, a Linux-Foundation driven, well-structured, vendor-independent and scalable real-time OS. We’ve ported and adapted Zephyr to many platforms, encouraged its use as a standard on RISC-V, promoted it in less standard contexts like FPGA devices.
So if you have a single device with a real-time requirement, typically it’s not that hard to decide how to approach the problem - just use Zephyr!
The problem starts when there are multiple heterogeneous devices that have to communicate in a standardised and robust way, performing a complex operation involving a network protocol while never leaving the “real-time” world. Scenarios like this are typical in the aerospace, automotive, robotics industries, and increasingly those industries are looking to reuse technologies known from the commercial/consumer market to leverage the massive scale offered by omnipresent, commodity tech.
For your everyday use case, the easiest way to connect multiple devices is of course Ethernet, but plain old Ethernet does not list real-time capabilities in its dictionary - how then can it be used in a real-time use case?
The set of standards that define Time Sensitive Networking is the answer to that problem. Leveraging the physical and logical foundations of Ethernet and extends it to cover real-time use cases by defining different aspects of time sensitive communication: clock synchronization, traffic shaping, scheduling, fault tolerance etc.
TSN seems then like a good fit, and sure enough, open source support for TSN is widely available in Linux. In the RTOS world however, there has previously not existed a proper implementation of TSN, readily available and tested on real hardware platforms, well, not until Zephyr 1.13!
Initial work: towards a TSN implementation in Zephyr
As a member of the Zephyr Project, Antmicro is always excited to add new functionalities to the OS, especially in fields that open it up for adoption in new use cases. Here, we were happy to work with another Zephyr project member, Intel on getting gPTP support added to Zephyr. “gPTP” stands for “generic Precision Time Protocol” and is responsible for clock synchronization. When we joined the project it was already in progress, but far from being finished. We implemented the missing state machines and fixed various bugs in the existing code.
The initial target was making Zephyr’s clocks synchronize with external Grand Masters.
After initial support was done and merged, we proceeded with configuring Zephyr nodes as Grand Masters, as well as ensuring operational Zephyr-to-Zephyr clock synchronization.
Qav: an important part of TSN
PTP is only a part of Time Sensitive Networking. Another important part of TSN is queue management.
The platform of our choice (SAM E70 Xplained) has multiple hardware queues built into its MAC controller, which allowed us to use the same platform to extend Zephyr’s TSN capabilities.
Antmicro implemented support for credit-based shaper algorithms in Zephyr, which are described in the 802.1Qav standard.
The work in that area required us to design an API to manage the Qav-capable Ethernet queues. Through this API/management interface, we made it possible to set and read various parameters, like idle slope, delta bandwidth, traffic class, etc.
Additionally, some status parameters were implemented. These are now shown in the regular networking shell in Zephyr for the supported network interfaces.
Running a basic Zephyr gPTP sample application
To find out how to run the basic gPTP sample, please refer to Zephyr’s official documentation.
More general info about the gPTP stack implementation can be found in a dedicated chapter.
Testing in Renode Cloud Environment
A network stack plays a critical role in an operating system like Zephyr. It is also constantly under very heavy development by various parties. Our work on the TSN/gPTP support was heavily influenced by all the changes in the networking subsystem. As can be expected in large development campaigns, these completely unrelated things would break our implementation repeatedly.
The reason for that was lack of more sophisticated testing of the setup. Sure, there were multiple unit tests which directly tested our stack implementation, but Time Sensitive Networking can be broken by seemingly minor changes in other parts of the networking stack.
Obviously, network protocol testing is difficult. You can either use synthetic tests that easily get outdated and don’t really reflect real life scenarios or you can create complicated physical network setups connected to a CI system - which is costly, difficult to maintain and creates only a single, static configuration.
A much better, more scalable solution is to use simulation. With Antmicro’s Renode open source simulation framework you can create script-defined complex configurations, allowing you to verify virtually every scenario imaginable.
In Renode’s 1.7 release, Antmicro added support for the SAM E70 platform, along with Ethernet with gPTP capabilities. With these new features we were able to create a CI setup testing upstream Zephyr in a virtual environment.
And thanks to an integration with the Robot Framework, it’s very easy to create new test cases in Renode. That’s why, for Zephyr, we decided to create a suite of tests verifying a range of aspects of a single application.
This lines up perfectly with the introduction of Renode Cloud Environment - a new CI system introduced by Antmicro, that you’ll be able to read about more on our blog soon. Here is a sneak peek of the TSN testing setup running in RCE.
First, Renode verifies if the board sends a PTP packet, which means that the PTP stack is started properly. Next, we analyze its reception and the proper reaction from the recipient’s OS. We analyze whether the compile-time configuration of the PTP stack is properly reflected in its runtime, and the highest level properties: whether the correct Grand Master node has been selected and whether the slave nodes are properly synchronizing their clocks.
The whole testing setup can be easily recreated with upstream Renode and Zephyr. For instructions, please refer to the TSN testing tutorial.
Building a TSN system?
If you’d like to use TSN is your system, and feel that an RTOS like Zephyr is a good fit for your needs, be sure to reach out to us at firstname.lastname@example.org - we’d be happy to help you apply TSN on existing and new hardware, and perhaps in simulation, for real-world use cases!