Running out of time: How to test time-sensitive networks effectively

Connected vehicles depend on networks that won’t bog down when handling real-time data. Specialized TSN tests ensure messages arrive on time and intact.

Jeff Warra, Spirent
Ethernet has been the backbone of IT infrastructure for more than 40 years, but it was never designed to be a deterministic network. That’s why, over time, Ethernet standards have evolved to address low-latency applications. More recently, we’ve seen Audio/Video Bridging (AVB) evolve into Time-Sensitive Networking (TSN).

In tests of TSN functions, software might mimic the actions of vehicle systems that include sensors and driver displays.

Standards bodies, including the Institute of Electrical and Electronics Engineers (IEEE) and the Internet Engineering Task Force (IETF), are constantly improving the standards for TSN, which define mechanisms to guarantee data transport with bounded low latency and low delay variation for the time-sensitive transmission of data over Ethernet networks. The target is to guarantee that application end-to-end latency requirements are not exceeded. Most of the standards address deterministic, consistent transmission latency and the zero congestion loss typically required for applications such as streaming audio and video, industrial controls, and real-time autonomous driving control and feedback.

TSN looks to put in place:

- Timing and synchronization
- Bounded low latency
  - Credit-based shaper (basically, a means of selecting audio/video frames for transmission)
  - Frame preemption (temporarily suspending transmission of non-critical frames)
  - Scheduled traffic flows
  - Cyclic queuing and forwarding (basically, a transmission selection algorithm for calculating deterministic delays through a bridged network)
  - Asynchronous traffic shaping (basically, mechanisms for a bridge to handle frame queues deterministically that don’t depend on clock synchronization)
- Ultra-reliable networks
  - Frame replication and elimination (so bridges and end stations can replicate frames for redundant transmission and identify duplicates)
  - Path control
  - Per-stream filtering and policing (which lets a bridge count frames, filter and police traffic based on the particular data stream to which the frame belongs, and specifies a synchronized cyclic time schedule)
  - Time sync reliability across domains
- Dedicated resources and API
  - Enhancements to Stream Reservation Protocol (basically, checks before making a connection to see if current resources can handle the connection)
  - Link-local Registration Protocol (for replicating a registration database between the two ends of a point-to-point link)

Today, there’s a veritable alphabet soup of TSN standards and extensions for timing and traffic shaping. It’s a lot for designers to consider as they develop devices, and the best way to navigate this complexity is to use tools that test each of the standards. Here are just some of the TSN standards Spirent is testing today:

IEEE 802.1AS-Rev – Enhanced generalized Precision Time Protocol: Adds support for performance, redundancy and aggregation.
IEEE 802.1Qbv – Time Aware Shaper: Achieves the theoretical lowest possible latency in engineered networks.
IEEE 802.1Qbu & IEEE 802.3br – Packet Pre-emption: Reduces latency of time-sensitive streams in non-engineered networks.
IEEE 802.1CB – Frame Replication & Elimination: Supports zero switchover time when a link fails or frames are dropped (aka Seamless Redundancy).
IEEE 802.1Qcc – Enhanced Stream Reservation Protocol: Adds support for class configurations, shapers and replication.
IEEE 802.1Qci – Per Stream Filtering & Policing: Assigns flows to policer.
IEEE 802.1Qch – Cyclic Queuing & Forwarding: Supports known latencies, no central controller needed, limits hops.
IEEE 802.1Qcr – Asynchronous Traffic Shaping: Supports zero congestion loss for asynchronous traffic, and deterministic latency without using network topology information.

Advanced testing drives quality

Packet-switched networks are being used to move the world’s data, from planes, trains and automobiles to industrial motor controllers, autonomous vehicles and the Internet of Things (IoT). The demand to create systems-of-systems has never been greater, and it introduces new challenges for product and network development engineers. With TSN, the convergence of those domains is upon us. Engineers need to gain expertise in new technologies to properly evaluate architecture designs and implement these highly sophisticated, time-aware networks.

New techniques for testing are needed to ensure smooth and reliable signal flows inside a network for safety-critical applications. Designing a quality product on time and under budget can be difficult if not impossible these days, and the key to success in this challenge is to partner with vendors that have performed the same or similar feats.

A simple cost breakdown of where and when bugs get found and fixed. Reference: Project Cost Management, Sharan Kalwani — IEEE SEM

It goes without saying: Finding product defects in early design stages saves time and cost for a project. But it might seem problematic to find defects without all the other members in the network being present. The way to address this dilemma is by testing a product against standards for its application area. Examples for automotive would be the AUTOSAR, OPEN Alliance or Avnu test suites, which have been developed to help ensure conformance to industry standards.

TSN specifications are defining ways to enable zero congestion loss of time-critical data flows. But loss due to congestion is only half the issue, as network designers and architects also must measure worst-case end-to-end latency within the network. Latency measurements must also consider primary and redundant flows during fault recovery conditions. It is important to measure normal vs. redundant data path flows in a live network, as this parameter is extremely critical to industrial motor and autonomous vehicle control applications. Using the Spirent emulated devices, engineers can quickly check the Best Master Clock Algorithm (BMCA) inside real devices on the network to ensure they recover from faulted conditions. Once the emulated device timing is validated, the Spirent end-to-end latency measurement can be observed and characterized under various switch load conditions to validate against application requirements and to ensure safe and reliable data flows can be realized.
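As a rough illustration of the measurement described above, the sketch below summarizes latency samples for a primary and a redundant path and checks the worst case against an application budget. The function names, the sample values and the 100 µs budget are hypothetical and chosen only for illustration; they do not come from any Spirent product.

```python
def summarize_latency(samples_ns):
    """Return min, max and delay variation (max - min) for one path."""
    if not samples_ns:
        raise ValueError("need at least one latency sample")
    lo, hi = min(samples_ns), max(samples_ns)
    return {"min_ns": lo, "max_ns": hi, "variation_ns": hi - lo}


def worst_case_within_budget(primary_ns, redundant_ns, budget_ns):
    """Compare the worst latency seen on either path with the budget."""
    worst = max(summarize_latency(primary_ns)["max_ns"],
                summarize_latency(redundant_ns)["max_ns"])
    return worst <= budget_ns, worst


# Hypothetical samples in nanoseconds; the redundant path is the longer one.
primary = [41_000, 42_500, 40_800, 47_200]
redundant = [63_000, 61_900, 88_400, 62_700]

ok, worst = worst_case_within_budget(primary, redundant, budget_ns=100_000)
print(summarize_latency(primary))
print(summarize_latency(redundant))
print("worst case:", worst, "ns, within budget:", ok)
```

The check uses the maximum rather than the average because a time-critical flow has to meet its deadline on the slowest frame, including frames that travel the redundant path during fault recovery.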

Testing the network with a mix of real devices and emulated devices using “generalized Precision Time Protocol” (gPTP) functions will enable designers to find out what scenarios can help them optimize and check network traffic to ensure no loss of time-critical data flows. (Briefly, gPTP provides a way of synchronizing network devices without forcing each of them to have a super-precise clock. It can provide accuracy to 1 µsec. across seven hops.) This checking process employs purpose-built equipment that contains metrics for counters and timers.

Time-sensitive networking came out of technical standards collectively known as audio video bridging or AVB. The specification for timing and synchronization for time-sensitive applications (gPTP) is IEEE 802.1AS. To understand the basics of gPTP and how it synchronizes a network, it is helpful to consider a simple example.

Consider the case where a master clock wants to transfer time across a network to a slave. The master transmits a sync message and perhaps a follow-up message that carries to the slave the time the sync message was generated. Suppose the message leaves the master at some time T1. It arrives at the slave sometime later at T2. For synchronization, the slave needs to know how long it took the message to get through the network. If the two clocks already agree, T2 is equal to T1 plus the message transit time; any difference is the offset the slave must correct.
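The arithmetic behind this example can be sketched in a few lines. This is a minimal illustration that assumes the transit time is already known; the function names and the numbers are hypothetical.

```python
def clock_offset_ns(t1_ns, t2_ns, transit_ns):
    """Offset of the slave clock relative to the master.

    t1_ns: time the sync message left the master (master clock)
    t2_ns: time the message arrived at the slave (slave clock)
    transit_ns: time the message spent crossing the network
    A positive result means the slave clock is ahead of the master.
    """
    return t2_ns - (t1_ns + transit_ns)


def corrected_time_ns(slave_time_ns, offset_ns):
    """Remove the measured offset from a slave clock reading."""
    return slave_time_ns - offset_ns


# Hypothetical values: the message takes 350 ns to cross the network,
# and the slave stamps its arrival 550 ns later than a synchronized
# clock would have.
offset = clock_offset_ns(t1_ns=1_000_000, t2_ns=1_000_900, transit_ns=350)
print("offset:", offset, "ns")
print("corrected arrival time:", corrected_time_ns(1_000_900, offset), "ns")
```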

We need to know the network transit time to synchronize the two nodes. We’ll consider one way to determine transit time called peer-to-peer. The time it takes a message to traverse the wires between devices, plus the time messages spend in devices, is calculated through something called a peer delay request-response mechanism. There is a special field in synchronization messages called a synchronization correction field. That correction field accumulates as the message goes through the network. By the time the message arrives at the final destination the correction field contains the total transit time through the network.
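The two ideas in this paragraph can be sketched as follows. The first function is the usual four-timestamp peer delay calculation, simplified by assuming that both clocks run at the same rate. The second accumulates link delays and residence times the way the correction field does. The `Hop` class, the function names and all numeric values are hypothetical.

```python
from dataclasses import dataclass


def mean_link_delay_ns(t1, t2, t3, t4):
    """One-way link delay from a peer delay request-response exchange.

    t1: request sent (initiator clock)   t2: request received (responder clock)
    t3: response sent (responder clock)  t4: response received (initiator clock)
    The round trip minus the responder turnaround is split in half, which
    assumes the link is symmetric. The clocks need not agree on the time of
    day because each difference is taken on a single clock.
    """
    return ((t4 - t1) - (t3 - t2)) / 2.0


@dataclass
class Hop:
    link_delay_ns: float  # delay on the link leading into this bridge
    residence_ns: float   # time the sync message spends inside the bridge


def accumulated_correction_ns(hops, last_link_delay_ns):
    """Total transit time as a correction field would accumulate it."""
    correction = 0.0
    for hop in hops:
        correction += hop.link_delay_ns + hop.residence_ns
    return correction + last_link_delay_ns


# Responder clock is 1,000 ns ahead; true one-way delay is 150 ns and the
# responder takes 500 ns to turn the request around.
print(mean_link_delay_ns(t1=0, t2=1_150, t3=1_650, t4=800))

# Two bridges between master and slave, then a final 100 ns link.
path = [Hop(link_delay_ns=150, residence_ns=2_000),
        Hop(link_delay_ns=120, residence_ns=1_800)]
print(accumulated_correction_ns(path, last_link_delay_ns=100))
```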

Network synchronization also involves timestamps, and it’s important that frequencies of the various clocks in the network are synchronized together so the timestamps are consistent. gPTP has a means for ensuring this synchronization inexpensively.
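One way to picture this is as a frequency ratio measured between neighbors rather than a physical adjustment of each oscillator. The sketch below estimates that ratio from two pairs of timestamps taken some time apart, one pair on each clock. It is a simplified illustration; the function names and values are hypothetical.

```python
def rate_ratio(remote_first_ns, remote_second_ns,
               local_first_ns, local_second_ns):
    """Ratio of the neighbor clock frequency to the local clock frequency.

    Each pair of timestamps brackets the same real-time interval, once as
    counted by the neighbor and once as counted locally.
    """
    local_elapsed = local_second_ns - local_first_ns
    if local_elapsed <= 0:
        raise ValueError("local timestamps must be increasing")
    return (remote_second_ns - remote_first_ns) / local_elapsed


def ratio_to_ppm(ratio):
    """Express a frequency ratio as a parts-per-million deviation."""
    return (ratio - 1.0) * 1e6


# Over one local second, the neighbor counts 100 µs more than we do.
r = rate_ratio(remote_first_ns=5_000, remote_second_ns=1_000_105_000,
               local_first_ns=0, local_second_ns=1_000_000_000)
print("rate ratio:", r)
print("deviation:", round(ratio_to_ppm(r), 3), "ppm")
```

With the ratio known, an interval measured on one clock can be scaled to the other clock's time base, so timestamps stay consistent without an expensive oscillator in every node.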

Store-and-forward vs. cut-through

RAM, CPU, and software stack functions can be viewed as being analogous to voltage, resistance, and current flow in electrical systems.

Consider ohms, volts and amperes, the fundamental units of electronics. Just as resistance impedes current flow, a microprocessor (MPU) real-time scheduler affects the processing of data. If you’re constantly flushing your RAM, your ability to handle larger data flows becomes limited (analogous to over/under voltage). Signal latency (analogous to amperes) can reduce your overall signal update rates if your software stack is too large or not optimized.

When developing a device, whether a switch or an endpoint, designers should take into consideration the number of packets a CPU will ultimately have to deal with. One of the critical parameters is frames per second (FPS) and the device’s ability to keep up with all of the Ethernet traffic routing to ensure deterministic latency when forwarding packets. Depending on the Ethernet switching mode, store-and-forward or cut-through, the effects on the microprocessor, RAM and software stack can drastically change product performance with regard to signal latency, jitter, and capacity or bandwidth.
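To get a feel for the frame rates involved, the sketch below computes the maximum frames per second on a link and the time available to handle each frame. It assumes standard Ethernet framing, where every frame occupies an extra 20 bytes on the wire (preamble, start-of-frame delimiter and inter-frame gap). The function names are hypothetical.

```python
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap
WIRE_OVERHEAD_BYTES = 20


def max_frames_per_second(link_bps, frame_bytes):
    """Highest frame rate a link can carry for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def time_budget_ns_per_frame(link_bps, frame_bytes):
    """Time available to process each frame at full line rate."""
    return 1e9 / max_frames_per_second(link_bps, frame_bytes)


for size in (64, 512, 1518):
    fps = max_frames_per_second(1_000_000_000, size)
    budget = time_budget_ns_per_frame(1_000_000_000, size)
    print(f"{size:5d}-byte frames at 1 Gb/s: {fps:12.0f} fps, "
          f"{budget:8.0f} ns per frame")
```

Minimum-size frames are the hard case: at 1 Gb/s a device has well under a microsecond per frame, which is why small-frame load is a useful stress test for the CPU and software stack.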

(As a quick review, store-and-forward switching receives the entire frame at an intermediate station (which could be a microprocessor) before sending it on. This station spends processor cycles working with the data while routing it to its final destination. The technique tends to find use where a packet must be reviewed by the processor, which can result in transmission delays and poor performance of the Ethernet link. Cut-through switching begins transmitting a frame before the whole frame has been received, as a way to streamline data flow and reduce latency.)
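The latency difference between the two modes can be estimated with simple arithmetic. The sketch below counts only the delay the switches add, ignoring cable propagation and queuing. It assumes a cut-through switch starts forwarding after the 14-byte Ethernet header; real devices may start earlier or later. The function names and the three-hop, 100 Mb/s example are hypothetical.

```python
def serialization_ns(n_bytes, link_bps):
    """Time to clock n_bytes onto or off a link."""
    return n_bytes * 8 * 1e9 / link_bps


def store_and_forward_ns(frame_bytes, link_bps, hops, processing_ns=0.0):
    """Each switch waits for the whole frame before forwarding it."""
    return hops * (serialization_ns(frame_bytes, link_bps) + processing_ns)


def cut_through_ns(link_bps, hops, header_bytes=14, processing_ns=0.0):
    """Each switch waits only for the header (assumed 14 bytes)."""
    return hops * (serialization_ns(header_bytes, link_bps) + processing_ns)


link = 100_000_000  # 100 Mb/s
sf = store_and_forward_ns(frame_bytes=1518, link_bps=link, hops=3)
ct = cut_through_ns(link_bps=link, hops=3)
print(f"store-and-forward: {sf / 1000:.1f} µs added by three switches")
print(f"cut-through:       {ct / 1000:.1f} µs added by three switches")
```

Store-and-forward delay grows with frame size, so it also varies from frame to frame, while the cut-through figure does not depend on frame length. The price is that a cut-through switch forwards a frame before it can check the frame CRC.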

To wit:

Memory (volt analogy) – All the processing in the world will not solve a RAM issue, and constantly swapping RAM hinders the best stacks.
SW Stack (ampere analogy) – All the RAM in the world will not solve a stack issue.
Computer Process (ohm analogy) – Overtaxed processor schedulers hinder the best stacks.

As you are deploying a time-aware network, take timing errors into consideration. The two main contributors to timing errors are the accuracy of the correction field and of the peer delay calculations for each member in the network. Do they reflect the actual delay experienced by the sync messages? Possible errors fall into two groups. Variable errors include quantization errors in timestamps, synchronization and rounding errors in local device-under-test (DUT) clocks, and phase noise in oscillators. Fixed errors include asymmetric delays in the physical layer (which depend on the timestamp type and point) and differences in cable length between the forward and reverse paths.
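Two of these error sources are easy to quantify. The sketch below simulates a peer delay exchange over a link whose forward and reverse delays differ, and shows that the resulting error is half the asymmetry. It also shows how a timestamp is truncated to the resolution of the timestamping clock. The function names, the 1,000 ns turnaround and the 8 ns tick (a 125 MHz clock) are assumptions made for illustration.

```python
def measured_mean_delay_ns(forward_ns, reverse_ns, turnaround_ns=1_000.0):
    """Link delay a peer delay exchange reports on an asymmetric link."""
    t1 = 0.0                   # request sent
    t2 = t1 + forward_ns       # request received
    t3 = t2 + turnaround_ns    # response sent
    t4 = t3 + reverse_ns       # response received
    return ((t4 - t1) - (t3 - t2)) / 2.0


def asymmetry_error_ns(forward_ns, reverse_ns):
    """Error in the delay applied to sync messages on the forward path."""
    return measured_mean_delay_ns(forward_ns, reverse_ns) - forward_ns


def quantize_ns(timestamp_ns, tick_ns=8):
    """Truncate a timestamp to the resolution of the timestamping clock."""
    return (timestamp_ns // tick_ns) * tick_ns


# Forward path 500 ns, reverse path 540 ns: the measurement reports 520 ns.
print("measured delay:", measured_mean_delay_ns(500.0, 540.0), "ns")
print("asymmetry error:", asymmetry_error_ns(500.0, 540.0), "ns")

# An event at 1,003 ns is stamped as 1,000 ns by an 8 ns clock.
print("quantized timestamp:", quantize_ns(1_003), "ns")
```

The asymmetry error is fixed for a given link and cannot be seen by the protocol itself, which is one reason for measuring it with external test equipment. Quantization error changes from one timestamp to the next and shows up as jitter.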

Tests of a time-aware bridge in a TSN that uses gPTP protocol might introduce delays to verify the capacity of the bridge to deal with message transmission problems.

In a typical device being tested, the application turnaround time can and will impact the downstream path delay calculation when internal DUT hardware and software resources are shared between the application functions and network communications.

Having a Spirent test bed in place that can emulate, measure and impair gPTP networked devices using industry-proven techniques and products helps develop robust products with accurate timing.

Tests using “generalized Precision Time Protocol” (gPTP) functionality let designers optimize and check network traffic to ensure no loss of time-critical data flows. These tests use purpose-built equipment that contains metrics for counters and timers. Typical gPTP test equipment tracks over 40 measurements in real time for each received stream, including:

Advanced sequencing: In-order, lost, reordered, late and duplicate
Latency: Avg, min, max and short-term avg; first/last frame arrival timestamp
Latency modes: LILO, LIFO and FIFO
Data integrity: Generate Errors: IP checksum, TCP/UDP checksum, frame CRC, embedded CRC and PRBS bit errors
Histograms: Jitter, Inter-arrival, Latency, Sequence
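To make two of these measurement groups concrete, the sketch below classifies received sequence numbers and computes the three latency modes from bit-level timestamps. It is a simplified illustration, not the algorithm used by any particular test product: it omits the "late" category, assumes sequence numbers start at zero, and uses hypothetical function names and values.

```python
def classify_sequence(seq_numbers):
    """Count in-order, reordered, duplicate and lost frames in a stream."""
    counts = {"in_order": 0, "reordered": 0, "duplicate": 0, "lost": 0}
    seen = set()
    next_expected = 0
    for seq in seq_numbers:
        if seq in seen:
            counts["duplicate"] += 1
            continue
        seen.add(seq)
        if seq >= next_expected:
            # At or ahead of the expected number; skipped numbers are
            # treated as missing until they arrive.
            counts["in_order"] += 1
            next_expected = seq + 1
        else:
            # Arrived after a frame with a higher sequence number.
            counts["reordered"] += 1
    # Any number below next_expected that never arrived is lost.
    counts["lost"] = next_expected - len(seen)
    return counts


def latency_modes_ns(tx_first, tx_last, rx_first, rx_last):
    """Latency of one frame through a device under the three modes.

    tx_first/tx_last: first and last bit entering the device
    rx_first/rx_last: first and last bit leaving the device
    """
    return {
        "FIFO": rx_first - tx_first,  # first bit in to first bit out
        "LIFO": rx_first - tx_last,   # last bit in to first bit out
        "LILO": rx_last - tx_last,    # last bit in to last bit out
    }


print(classify_sequence([0, 1, 3, 2, 2, 5]))
print(latency_modes_ns(tx_first=0, tx_last=100, rx_first=250, rx_last=350))
```

In the example stream, frame 2 arrives after frame 3 and then arrives again, and frame 4 never arrives. Of the latency modes, LIFO subtracts the time needed to receive the frame, so it is the natural choice for a store-and-forward device.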

Spirent also automatically calculates the following IEEE 802.1AS gPTP clock information:

IEEE 802.1AS Clock Results: States (Clock Identity, State, Clock Accuracy), Timers (Current, Min, Max, Avg Mean Path Delay), Counters (TX/RX Announce, TX/RX Sync, TX/RX Follow up, TX/RX Peer Delay Request, TX/RX Peer Delay Response, TX/RX Peer Delay Follow up)
IEEE 802.1AS Time Properties Results: States (gPTP Time Scale, Current UTC offset Valid, Leap59, Leap61; Time/Frequency Traceable, Time Source), Counters (Current UTC offset)
IEEE 802.1AS Clock Sync Results: Timers (Time of Day), Counters (Current offset, Positive/Negative offset Peak and deviation; Current, Min, Max, Average Mean Path Delay; Average offset plus/minus deviation; Step Removed, Minimum Pdelay Request Interval; Peer Mean Path Delay; Sync/Follow-up/Pdelay Correction Field Response and follow-up; Invalid Timestamp count)
IEEE 802.1AS Parent Clock Info Results: States (Parent Stats, Step Mode), Timers (Observed Parent offset scaled log Variance; Observed Parent Clock Phase Change Rate; Grandmaster Identity; Grandmaster Clock Class, accuracy, offset Variance; Grandmaster Priority 1, 2)
IEEE 802.1AS Message Rate Results: Counters (Announce Rate: TX/RX Min, Max, Average Packets per second; Sync Rate: RX Min, Max, Average Packets per second; Follow up Rate: RX Min, Max, Average Packets per second; Peer Delay Request Rate: RX Min, Max, Average; Peer Delay Response Rate: RX Min, Max, Average; Peer Delay Response Follow up Rate: RX Min, Max, Average)
IEEE 802.1 State Summary Results: Counters (Faulty, Disabled Count, Listening Count; Pre-Master Count, Master/Slave Count; Passive, Uncalibrated Count, 802.1AS up/down)