We are happy to announce that Massimo Girondi successfully defended his licentiate thesis (the licentiate is a KTH degree roughly half-way to a PhD)! Marco Chiesa did an excellent job as co-advisor, and, as always, we are very grateful to Prof. Gerald Q. Maguire Jr. for his key insights. Giuseppe Siracusano was a superb opponent at the licentiate seminar, with Amir Payberah as the examiner. Massimo’s thesis (the second licentiate thesis of this project) is available online:
Group shot of Networked Systems Laboratory members (Massimo is beneath the KTH logo). Image taken by Voravit Tanyingyong
A few weeks later, Dejan hands the gift to Massimo in the hallway that Massimo chose for the shot. It definitely looks better than the opposite side we used in the past! (Image taken by Voravit Tanyingyong.)
Can networking applications achieve suitable performance with the IOMMU enabled at high rates? Our recent PeerJ CS article answers this question by characterizing the performance implications of the IOMMU and its cache (the IOTLB) on recent Intel Xeon Scalable and AMD EPYC processors at 200 Gbps. Our study shows that enabling the IOMMU at high rates can reduce throughput by up to 20% due to excessive IOTLB misses. Moreover, we present mitigation techniques that use hugepage-backed buffers in the Linux kernel to recover the throughput lost to this “IOTLB wall”. This is joint work with Alireza Farshin (KTH), Luigi Rizzo (Google), Khaled Elmeleegy (Google), and Dejan Kostic (KTH). Follow the links for the PDF and code.
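The intuition behind the hugepage mitigation can be sketched with simple arithmetic: an IOTLB entry caches the translation for one page, so larger pages let the same number of entries cover far more buffer memory. The snippet below is a back-of-the-envelope sketch assuming a hypothetical 32 MiB pool of 2 KiB packet buffers that is contiguous and page-aligned; it does not model the IOTLB's actual capacity or replacement policy on any particular CPU, and the numbers are not taken from the article.

```python
# Back-of-the-envelope sketch: how many distinct page translations (IOTLB entries)
# are needed to cover a DMA buffer pool. Pool size is hypothetical, not from the paper.

KIB = 1024
MIB = 1024 * KIB


def translations_needed(pool_bytes: int, page_bytes: int) -> int:
    """Pages needed to cover a contiguous, page-aligned pool (ceiling division)."""
    return -(-pool_bytes // page_bytes)


# Hypothetical pool: 16384 packet buffers of 2 KiB each = 32 MiB.
pool = 16384 * 2 * KIB

for name, page in [("4 KiB pages", 4 * KIB), ("2 MiB hugepages", 2 * MIB)]:
    print(f"{name}: {translations_needed(pool, page)} translations")
```

With 4 KiB pages the pool spans 8192 translations, while 2 MiB hugepages cover it with 16, so a small IOTLB is far more likely to hold the working set when buffers are hugepage-backed.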
At NSDI ’22, Waleed presented our RedN paper, which shows a surprising result: Remote Direct Memory Access (RDMA), as implemented in widely deployed RDMA Network Interface Cards (NICs), is Turing complete. We leverage this finding to reduce the tail latency of services running on busy servers by up to 35×! The full abstract is below. This is joint work with Waleed Reda, Marco Canini (KAUST), Dejan Kostić, and Simon Peter (UW).
It is becoming increasingly popular for distributed systems to exploit offload to reduce load on the CPU. Remote Direct Memory Access (RDMA) offload, in particular, has become popular. However, RDMA still requires CPU intervention for complex offloads that go beyond simple remote memory access. As such, the offload potential is limited and RDMA-based systems usually have to work around such limitations.
We present RedN, a principled, practical approach to implementing complex RDMA offloads, without requiring any hardware modifications. Using self-modifying RDMA chains, we lift the existing RDMA verbs interface to a Turing complete set of programming abstractions. We explore what is possible in terms of offload complexity and performance with a commodity RDMA NIC. We show how to integrate these RDMA chains into applications, such as the Memcached key-value store, allowing us to offload complex tasks such as key lookups. RedN can reduce the latency of key-value get operations by up to 2.6× compared to state-of-the-art KV designs that use one-sided RDMA primitives (e.g., FaRM-KV), as well as traditional RPC-over-RDMA approaches. Moreover, compared to these baselines, RedN provides performance isolation and, in the presence of contention, can reduce latency by up to 35× while providing applications with failure resiliency to OS and process crashes.
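To make the idea of self-modifying chains more concrete, here is a toy Python model of how a conditional (`if x == constant: write`) can be built from an in-order queue of operations in which an earlier work request (WR) patches a later one. Everything here is an assumption for illustration: the opcodes (`COPY`, `CAS`, `WRITE`, `NOOP`), the `tag` header field, and the helper names are hypothetical, and the model ignores real RDMA details such as memory registration, doorbells, and ordering guarantees. It sketches the general technique only; it is not the ibverbs API and not RedN's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Toy, single-threaded model of an in-order NIC work queue.
# Hypothetical opcodes and fields: NOT the ibverbs API, NOT RedN's code.


@dataclass
class WR:
    opcode: str          # "NOOP", "COPY", "CAS" or "WRITE"
    tag: Any = None      # header field that an earlier WR may overwrite
    args: tuple = ()


def run(chain: List[WR], mem: Dict[str, Any]) -> None:
    """Execute WRs strictly in order; earlier WRs may rewrite later ones."""
    for wr in chain:
        if wr.opcode == "COPY":
            # Copy a data word into the tag of a *later* WR (self-modification).
            src, target = wr.args
            chain[target].tag = mem[src]
        elif wr.opcode == "CAS":
            # Compare the target WR's whole header (opcode, tag) and swap it
            # only on a match (trivially atomic in this single-threaded model).
            target, expected, new = wr.args
            if (chain[target].opcode, chain[target].tag) == expected:
                chain[target].opcode, chain[target].tag = new
        elif wr.opcode == "WRITE":
            dst, value = wr.args
            mem[dst] = value
        # "NOOP" does nothing.


def if_equal_then_write(mem: Dict[str, Any], key: str, constant: Any,
                        dst: str, value: Any) -> None:
    """Encode `if mem[key] == constant: mem[dst] = value` as a WR chain."""
    chain = [
        WR("COPY", args=(key, 2)),                                     # copy x into WR 2's tag
        WR("CAS", args=(2, ("NOOP", constant), ("WRITE", constant))),  # enable WR 2 iff x == constant
        WR("NOOP", args=(dst, value)),                                 # a disabled WRITE
    ]
    run(chain, mem)


mem = {"x": 42, "result": 0}
if_equal_then_write(mem, "x", 42, "result", 1)
print(mem["result"])
```

The key design point is that the value being tested becomes part of a later WR's header, so a single compare-and-swap on that header both performs the comparison and decides whether the later WRITE runs. If `x` were 7 instead of 42, the header would read `("NOOP", 7)`, the compare would fail, and the WRITE would stay disabled.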
We are hugely honored that our “Packet Order Matters!” paper received the Community Award at NSDI 2022! More details are available in our earlier post.
We are happy to announce that on May 30, 2022, Waleed Reda successfully defended his PhD thesis at both KTH and UC Louvain! Marco Canini initially co-advised Waleed equally with Dejan Kostic; by the time of the defense, Waleed’s advisors were Dejan Kostic and Marco Chiesa at KTH, and Peter van Roy at UC Louvain. Adam Morrison was a superb opponent at the defense. Waleed’s thesis (the first to come out of this ERC project) is available online: