We are hugely honored that our “Packet Order Matters!” paper received the Community Award at NSDI 2022! More details are available in our earlier post.
Presentation video is below:
[youtube https://www.youtube.com/watch?v=0M8EiLYgYpA]
We are happy to announce that on May 30, 2022 Waleed Reda successfully defended his PhD thesis at both KTH and UC Louvain! Waleed was initially co-advised equally by Marco Canini and Dejan Kostić; by the time of the defense, his advisors were Dejan Kostić and Marco Chiesa at KTH, and Peter van Roy at UC Louvain. Adam Morrison was a superb opponent at the defense. Waleed’s thesis (the first to come out of Dejan Kostić’s ERC ULTRA project) is available online:
Accelerating Distributed Storage in Heterogeneous Settings
Here’s the Zoom screenshot from this hybrid defense:
At NSDI ’22, Waleed presented our RedN paper, which shows a surprising result: Remote Direct Memory Access (RDMA), as implemented in widely deployed RDMA Network Interface Cards, is Turing complete. We leverage this finding to reduce the tail latency of services running on busy servers by 35x! The full abstract is below, and the video is on the USENIX YouTube channel. This is joint work with Waleed Reda, Marco Canini (KAUST), Dejan Kostić, and Simon Peter (UW).
It is becoming increasingly popular for distributed systems to exploit offload to reduce load on the CPU. Remote Direct Memory Access (RDMA) offload, in particular, has become popular. However, RDMA still requires CPU intervention for complex offloads that go beyond simple remote memory access. As such, the offload potential is limited and RDMA-based systems usually have to work around such limitations.
We present RedN, a principled, practical approach to implementing complex RDMA offloads, without requiring any hardware modifications. Using self-modifying RDMA chains, we lift the existing RDMA verbs interface to a Turing complete set of programming abstractions. We explore what is possible in terms of offload complexity and performance with a commodity RDMA NIC. We show how to integrate these RDMA chains into applications, such as the Memcached key-value store, allowing us to offload complex tasks such as key lookups. RedN can reduce the latency of key-value get operations by up to 2.6× compared to state-of-the-art KV designs that use one-sided RDMA primitives (e.g., FaRM-KV), as well as traditional RPC-over-RDMA approaches. Moreover, compared to these baselines, RedN provides performance isolation and, in the presence of contention, can reduce latency by up to 35× while providing applications with failure resiliency to OS and process crashes.
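To make the idea of self-modifying RDMA chains concrete, here is a toy simulation of the mechanism the abstract describes. Real RedN executes chained verbs (e.g., RDMA WRITE, CAS, WAIT) directly on a commodity RNIC; in this sketch a work request is just a Python dict and "the NIC" is a loop, and all names are illustrative rather than RedN's actual API. The key trick shown is that a WRITE can target a field of a *later* work request in the same chain, which yields a data-dependent branch without any CPU involvement:

```python
# Toy simulation of self-modifying RDMA work-request (WR) chains, the idea
# behind RedN. This is a conceptual sketch, NOT the real ibverbs interface:
# a WR is a dict, memory is a dict, and the "NIC" is a Python loop.

def run_chain(chain, mem):
    """Execute work requests in order. A WRITE may target the chain
    itself, so an earlier WR can rewrite a later one (self-modification)."""
    for wr in chain:
        if wr["op"] == "WRITE":
            dst = wr["dst"]
            if isinstance(dst, tuple):
                # (wr_index, field): patch a later WR in the chain itself.
                chain[dst[0]][dst[1]] = mem[wr["src"]]
            else:
                # Ordinary memory-to-memory copy.
                mem[dst] = mem[wr["src"]]
    return mem

def conditional_copy(flag):
    """A data-dependent branch built from two WRITEs: WR0 reads mem["flag"]
    and patches WR1's source field, so WR1 copies either mem["a"] or
    mem["b"] into mem["out"] depending on the flag's value."""
    mem = {"flag": "a" if flag else "b", "a": 111, "b": 222, "out": None}
    chain = [
        {"op": "WRITE", "src": "flag", "dst": (1, "src")},  # patch next WR
        {"op": "WRITE", "src": None,   "dst": "out"},       # src set at runtime
    ]
    return run_chain(chain, mem)["out"]
```

Calling `conditional_copy(True)` yields `111` and `conditional_copy(False)` yields `222`: the branch outcome is decided by data in memory, not by control flow on the CPU, which is the building block that makes the verb set Turing complete.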
We are happy to announce that Amir Roozbeh successfully defended his PhD thesis! Prof. Gerald Q. Maguire Jr. has, as usual, done a stellar job as a co-advisor. Prof. Jonathan M. Smith was a superb opponent at the defense seminar. Amir’s thesis is available online:
This is the second year in which we couldn’t take the obligatory hallway shot, so here is the fake gift giving over Zoom!
Our upcoming NSDI 2022 paper Packet Order Matters shows a surprising result: deliberately delaying packets can improve the performance of backend servers (e.g., those used for Network Function Virtualization) by up to about a factor of 2! This applies to both throughput and latency (including the time spent in our Reframer). We show three different scenarios in which Reframer can be deployed. Source code is available here.
This is joint work with:
Hamid Ghasemirahni, Tom Barbette, Georgios P. Katsikas, Alireza Farshin, Amir Roozbeh, Massimo Girondi, Marco Chiesa, Gerald Q. Maguire Jr., and Dejan Kostić.
Full abstract is below:
Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system’s caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits.
In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
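The core reordering step in the abstract can be sketched in a few lines. This is a deliberately simplified model, not Reframer's actual implementation (which buffers packets at line rate with μs-scale timers): it takes one buffered batch and compacts it so that the backend sees runs of same-flow packets, which is what restores cache locality. The function name and packet representation are illustrative:

```python
# Minimal sketch of Reframer's core idea: after briefly buffering a batch of
# packets, flush them grouped by flow so downstream processing sees long runs
# of same-flow packets. Flows are compacted but never reordered internally,
# and first-seen flows flush first.

from collections import OrderedDict

def reframe(packets, flow_key=lambda p: p["flow"]):
    """Return the batch with packets grouped by flow: flow groups appear in
    first-arrival order, and packets within each flow keep arrival order."""
    flows = OrderedDict()
    for p in packets:
        flows.setdefault(flow_key(p), []).append(p)
    return [p for group in flows.values() for p in group]

# An interleaved arrival pattern A, B, A, B becomes A, A, B, B after
# reframing, so per-flow state and code stay hot in the cache.
batch = [
    {"flow": "A", "seq": 1},
    {"flow": "B", "seq": 1},
    {"flow": "A", "seq": 2},
    {"flow": "B", "seq": 2},
]
grouped = reframe(batch)
```

The trade-off the paper quantifies is visible even here: grouping requires holding packets for a short buffering window (the μs-scale delay), which is repaid by higher throughput and lower end-to-end latency at the backend.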