
Our upcoming NSDI 2022 paper “Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets”

Our upcoming NSDI 2022 paper “Packet Order Matters” shows a surprising result: deliberately delaying packets can improve the performance of backend servers (e.g., those used for Network Function Virtualization) by up to about a factor of 2! This applies to both throughput and latency (including the time spent in our Reframer). We show three different scenarios in which Reframer can be deployed. Source code is available here.

This is joint work with:

Hamid Ghasemirahni, Tom Barbette, Georgios P. Katsikas, Alireza Farshin, Amir Roozbeh, Massimo Girondi, Marco Chiesa, Gerald Q. Maguire Jr., and Dejan Kostić.

The full abstract is below:

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system’s caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits.

In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
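To make the core idea concrete, here is a minimal C++ sketch of the buffer-and-group logic at the heart of Reframer. It is not the paper’s actual (DPDK-based) implementation; the Packet struct, the flow key, and the 50 μs delay budget are illustrative assumptions.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <map>
#include <thread>
#include <vector>

struct Packet {
    uint64_t flow_key;            // e.g., a hash of the 5-tuple (placeholder)
    std::vector<uint8_t> payload; // simplified packet contents
};

class Reorderer {
public:
    explicit Reorderer(std::chrono::microseconds max_delay)
        : max_delay_(max_delay) {}

    // Buffer an arriving packet, grouped by its flow key.
    void push(Packet pkt) {
        if (buffer_.empty())
            first_arrival_ = std::chrono::steady_clock::now();
        buffer_[pkt.flow_key].push_back(std::move(pkt));
    }

    // Once the oldest buffered packet has waited max_delay_, release
    // everything: all packets of one flow back to back, then the next
    // flow, so the downstream stage sees high temporal locality.
    std::vector<Packet> flush_if_due() {
        std::vector<Packet> out;
        if (buffer_.empty() ||
            std::chrono::steady_clock::now() - first_arrival_ < max_delay_)
            return out; // nothing due yet
        for (auto& [key, pkts] : buffer_)
            for (auto& p : pkts)
                out.push_back(std::move(p));
        buffer_.clear();
        return out;
    }

private:
    std::chrono::microseconds max_delay_;            // bounded extra delay
    std::chrono::steady_clock::time_point first_arrival_;
    std::map<uint64_t, std::vector<Packet>> buffer_; // flow key -> packets
};

int main() {
    Reorderer r(std::chrono::microseconds(50)); // illustrative delay budget
    for (uint64_t i = 0; i < 6; ++i)
        r.push(Packet{i % 2, {}});              // interleaved flows 0,1,0,1,...
    std::this_thread::sleep_for(std::chrono::microseconds(60));
    for (auto& p : r.flush_if_due())            // prints flows 0,0,0,1,1,1
        std::printf("flow %llu\n", static_cast<unsigned long long>(p.flow_key));
}
```

Fed the interleaved input above, the sketch emits flow 0’s packets back to back and then flow 1’s: precisely the increase in locality that lets caches deliver their benefits downstream.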

Our PAM 2021 paper: “What you need to know about (Smart) Network Interface Cards”

In our PAM 2021 paper, we study the performance of (smart) Network Interface Cards (NICs) for widely deployed packet classification operations, focusing on four 100-200 GbE NICs from one of the largest NIC vendors worldwide.

We show that the forwarding throughput of the tested NICs sharply degrades when (i) the forwarding plane is updated and (ii) packets match multiple forwarding tables in the NIC.

Moreover, we uncover that the standard DPDK rule update API realizes slow and non-atomic rule updates as a sequence of rule insertion and deletion operations.

We solve this problem by introducing a direct in-memory rule update mechanism that achieves 80% higher throughput than the standard DPDK rule update API.
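The sketch below illustrates why an update realized as delete-then-insert is non-atomic. It uses the real DPDK rte_flow_destroy/rte_flow_create calls, but it assumes an initialized EAL and a configured port, and the Ethernet match and queue action are placeholders; it is not our in-memory update mechanism.

```cpp
#include <rte_flow.h>

// Replace the action of an installed rule by destroying it and
// re-creating it -- the pattern the standard update path boils down to.
struct rte_flow *update_rule_nonatomic(uint16_t port_id,
                                       struct rte_flow *old_rule,
                                       uint16_t new_queue) {
    struct rte_flow_error err;

    // Step 1: remove the old rule.
    rte_flow_destroy(port_id, old_rule, &err);

    // <-- Non-atomic window: packets arriving here match *no* rule and
    //     fall through to the default path, so forwarding behavior is
    //     briefly wrong. Updating the rule's action directly in memory
    //     avoids this window entirely.

    // Step 2: insert the replacement rule.
    struct rte_flow_attr attr = {};
    attr.ingress = 1;
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH }, // illustrative match
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_queue queue = { .index = new_queue };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    return rte_flow_create(port_id, &attr, pattern, actions, &err);
}
```

Besides being non-atomic, each destroy/create pair pays the full cost of rule removal and insertion in the NIC, which is what makes the standard path slow under frequent updates.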

This is joint work with Georgios P. Katsikas, Tom Barbette, Marco Chiesa, Dejan Kostić, and Gerald Q. Maguire Jr.

Our Cheetah paper at NSDI 2020: “A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency”

Tom will present our Cheetah paper at NSDI 2020 this February in Santa Clara, CA. This work on load balancing across multiple servers comes right after our RSS++ paper on intra-server load balancing. The Cheetah paper is available here. This is joint work with Tom Barbette, Chen Tang, Haoran Yao, Dejan Kostić, Gerald Q. Maguire Jr., Panagiotis Papadimitratos, and Marco Chiesa.

The abstract is below:

Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, meeting both these requirements at the same time has been an elusive goal. Today’s load balancers minimize PCC violations at the price of non-uniform load distribution.

This paper presents Cheetah, a load balancer that supports uniform load distribution and PCC while being scalable, memory efficient, resilient to clogging attacks, and fast at processing packets. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both a stateless and stateful manner, depending on the operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2-3x.
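As a rough illustration of the cookie idea (not Cheetah’s actual data plane, which carries the cookie in an existing header field such as the TCP timestamp option and runs at switch speed), here is a hedged C++ sketch; the Pkt model, the xor obfuscation, and the round-robin policy are assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

struct Pkt {
    bool has_cookie = false; // false only on the first packet of a connection
    uint16_t cookie = 0;     // echoed back on every subsequent packet
};

class CheetahLikeLB {
public:
    CheetahLikeLB(std::vector<uint32_t> servers, uint16_t secret)
        : servers_(std::move(servers)), secret_(secret) {}

    // Pick a server for this packet. First packets get a fresh
    // assignment from any load-balancing policy; later packets are
    // routed purely by the cookie, so the mapping stays consistent
    // even as the pool of servers (or load balancers) changes.
    uint32_t dispatch(Pkt& p) {
        if (!p.has_cookie) {
            uint16_t idx = next_++ % servers_.size(); // any policy works here
            p.cookie = idx ^ secret_;                 // obfuscated server index
            p.has_cookie = true;
            return servers_[idx];
        }
        uint16_t idx = p.cookie ^ secret_; // recover the index, no state needed
        // Simplification: assumes the slot is still valid. The paper's
        // design keeps the indexed entry alive until connections drain.
        return servers_.at(idx);
    }

private:
    std::vector<uint32_t> servers_; // server IDs (e.g., addresses)
    uint16_t secret_;               // keeps cookies from leaking pool layout
    uint16_t next_ = 0;             // round-robin state for first packets
};
```

The key property: because the server choice travels with the connection itself, the load balancer can use any policy for the first packet (uniform, least-loaded, etc.) without ever consulting per-connection state again, which is how uniform distribution and PCC stop being at odds.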

Marco’s presentation at RIPE about unpredictable Internet clouds

At the RIPE meeting in October 2019, our colleague Marco Chiesa presented some of our findings from the study on the effects of recent traffic engineering trends in the public Internet. The details are in the RIPE blog post. An excerpt of our findings is shown below.

TCP RTT measured between two Amazon VMs deployed in Oregon and Virginia. The black dotted vertical lines mark moments when the RTT changed, indicating potential traffic-engineering activity (such as path changes).
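For context, a measurement of this kind can be approximated with a few lines of socket code. This Linux-only sketch times a TCP handshake and then reads the kernel’s smoothed RTT estimate; the study’s own methodology and endpoints are not reproduced here, and the target address is whatever you pass on the command line.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s <ipv4-addr> <port>\n", argv[0]);
        return 1;
    }
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(static_cast<uint16_t>(std::atoi(argv[2])));
    if (inet_pton(AF_INET, argv[1], &addr.sin_addr) != 1) {
        std::fprintf(stderr, "bad address\n");
        return 1;
    }

    // The SYN/SYN-ACK exchange takes one round trip, so the connect()
    // duration approximates the RTT to the peer.
    auto t0 = std::chrono::steady_clock::now();
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        std::perror("connect");
        return 1;
    }
    auto t1 = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0);
    std::printf("handshake RTT: %lld us\n", static_cast<long long>(us.count()));

    // Cross-check with the kernel's smoothed RTT estimate (microseconds).
    tcp_info ti{};
    socklen_t len = sizeof(ti);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
        std::printf("kernel srtt: %u us\n", ti.tcpi_rtt);
    close(fd);
    return 0;
}
```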