TY - GEN
T1 - Understanding the Micro-Behaviors of Hardware Offloaded Network Stacks with Lumina
AU - Yu, Zhuolong
AU - Su, Bowen
AU - Bai, Wei
AU - Raindel, Shachar
AU - Braverman, Vladimir
AU - Jin, Xin
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/9/1
Y1 - 2023/9/1
N2 - Hardware offloaded network stacks are widely adopted in modern datacenters to meet the demand for high throughput, ultra-low latency and low CPU overhead. To fully leverage their exceptional performance, users need to have a deep understanding of their behaviors. Despite many efforts on testing software network stacks, hardware network stacks impose unique challenges to testing tools due to their kernel bypass nature and high performance. In this paper, we present Lumina, a tool to test the correctness and performance of hardware network stacks. Lumina leverages network programmability to emulate various network scenarios at line rate. With user-friendly interfaces, Lumina enables developers to inject deterministic events, thus facilitating the development of precise and reproducible tests. Given the limited resource and flexibility of programmable network devices, we mirror all the packets to dedicated servers and dump them for offline analysis. We leverage Lumina to test four RDMA NICs from NVIDIA and Intel, and identify bugs that can significantly degrade performance or mislead network operations. Lumina also enables us to capture unexpected micro-behaviors which are missing or not clearly described in public documents and specifications. Vendors have confirmed the critical bugs we discovered and will include bug fixes in future releases.
AB - Hardware offloaded network stacks are widely adopted in modern datacenters to meet the demand for high throughput, ultra-low latency and low CPU overhead. To fully leverage their exceptional performance, users need to have a deep understanding of their behaviors. Despite many efforts on testing software network stacks, hardware network stacks impose unique challenges to testing tools due to their kernel bypass nature and high performance. In this paper, we present Lumina, a tool to test the correctness and performance of hardware network stacks. Lumina leverages network programmability to emulate various network scenarios at line rate. With user-friendly interfaces, Lumina enables developers to inject deterministic events, thus facilitating the development of precise and reproducible tests. Given the limited resource and flexibility of programmable network devices, we mirror all the packets to dedicated servers and dump them for offline analysis. We leverage Lumina to test four RDMA NICs from NVIDIA and Intel, and identify bugs that can significantly degrade performance or mislead network operations. Lumina also enables us to capture unexpected micro-behaviors which are missing or not clearly described in public documents and specifications. Vendors have confirmed the critical bugs we discovered and will include bug fixes in future releases.
KW - RDMA
KW - event injection
KW - hardware offloaded network stack
KW - network testing
KW - programmable networking
UR - https://www.scopus.com/pages/publications/85174026700
U2 - 10.1145/3603269.3604837
DO - 10.1145/3603269.3604837
M3 - Conference contribution
AN - SCOPUS:85174026700
T3 - SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference
SP - 1074
EP - 1087
BT - SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference
PB - Association for Computing Machinery, Inc
T2 - 2023 ACM SIGCOMM Conference, ACM SIGCOMM 2023
Y2 - 10 September 2023 through 14 September 2023
ER -