Hamid Ghasemirahni’s Licentiate Defense

We are happy to announce that Hamid Ghasemirahni successfully defended his licentiate thesis (the licentiate is a degree at KTH, halfway to a PhD)! Marco Chiesa has done an excellent job as a co-advisor, and we are once again very grateful to Prof. Gerald Q. Maguire Jr. for his key insights. Prof. Al Davis was a superb opponent at the licentiate seminar. Hamid’s thesis (hopefully one of many to come in this project) is available online:

Packet Order Matters!: Improving Application Performance by Deliberately Delaying Packets

We couldn’t take the obligatory hallway shot, so we faked the gift giving over Zoom!

Our PAM 2021 paper: “What you need to know about (Smart) Network Interface Cards”

In our PAM 2021 paper, we study the performance of (smart) Network Interface Cards (NICs) for widely deployed packet classification operations, focusing on four 100-200 GbE NICs from one of the largest NIC vendors worldwide.

We show that the forwarding throughput of the tested NICs sharply degrades when i) the forwarding plane is updated and ii) packets match multiple forwarding tables in the NIC.

Moreover, we uncover that the standard DPDK rule update API realizes slow and non-atomic rule updates, carrying out each update as a sequence of rule insertion and deletion operations.

We solve this problem by introducing a direct in-memory rule update mechanism that achieves 80% higher throughput than the standard DPDK rule update API.
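To make the update path concrete, here is a minimal, hedged sketch (our illustration of the general rte_flow pattern, not the paper's measurement code or our in-memory mechanism) of why a rule update with the stock DPDK flow API is neither fast nor atomic: the update is expressed as a destroy of the old rule followed by a create of the new one, and packets arriving in between match neither.

    /*
     * Illustrative sketch only: updating a classification rule with the
     * standard DPDK rte_flow calls.  There is no in-place modification
     * for a generic rule, so the update becomes destroy + create,
     * leaving a window during which neither rule is installed.
     */
    #include <stddef.h>
    #include <rte_flow.h>

    static struct rte_flow *
    update_rule(uint16_t port_id, struct rte_flow *old_flow,
                const struct rte_flow_attr *attr,
                const struct rte_flow_item pattern[],
                const struct rte_flow_action actions[])
    {
        struct rte_flow_error err;

        /* Step 1: remove the old rule from the NIC. */
        if (old_flow != NULL && rte_flow_destroy(port_id, old_flow, &err) != 0)
            return NULL;

        /* Step 2: install the replacement rule.  Traffic hitting the NIC
         * between step 1 and step 2 is classified without this rule. */
        return rte_flow_create(port_id, attr, pattern, actions, &err);
    }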

This is joint work with Georgios P. Katsikas, Tom Barbette, Marco Chiesa, Dejan Kostic, and Gerald Q. Maguire Jr.

Our ASPLOS ’21 Paper: “PacketMill: Toward Per-Core 100-Gbps Networking”

ASPLOS ’21 will feature Alireza’s presentation of our paper titled “PacketMill: Toward Per-Core 100-Gbps Networking”. This is joint work with Alireza Farshin, Tom Barbette, Amir Roozbeh, Gerald Q. Maguire Jr., and Dejan Kostić.

The full abstract (with the video and more resources below):

We present PacketMill, a system for optimizing software packet processing, which (i) introduces a new model to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network (specifically DPDK) drivers, to mitigate inefficiencies and produce a customized binary for a given network function. Our evaluation results show that PacketMill increases throughput (up to 36.4 Gbps – 70%) & reduces latency (up to 101 µs – 28%) and enables nontrivial packet processing (e.g., router) at ≈100 Gbps, when new packets arrive >10× faster than main memory access times, while using only one processing core.
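To give a rough feel for the metadata point (i), here is a hypothetical sketch, not PacketMill's actual code or DPDK's real data structures: the receive path fills the network function's own, minimal metadata structure directly, instead of a generic descriptor that must be converted per packet.

    /* Hypothetical sketch of the metadata idea (simplified stand-in
     * types, not PacketMill's code): fill only the fields this network
     * function reads, directly in its own structure, instead of
     * converting from a generic descriptor for every packet. */
    #include <stdint.h>

    /* Generic descriptor, standing in for a full-blown DPDK rte_mbuf. */
    struct generic_desc {
        uint8_t *buf;        /* packet bytes in the DMA buffer           */
        uint16_t data_len;   /* frame length                             */
        uint16_t vlan_tci;   /* offloaded parsing result                 */
        uint64_t ol_flags;   /* ... plus many fields this NF never reads */
    };

    /* Application-specific metadata: only what this NF consumes. */
    struct nf_packet {
        uint8_t *data;
        uint16_t length;
    };

    /* Conventional path: the driver fills the generic descriptor and
     * the NF converts it, per packet, into its own representation. */
    static inline void convert(const struct generic_desc *d, struct nf_packet *p)
    {
        p->data   = d->buf;
        p->length = d->data_len;
    }

    /* Customized path: a modified receive routine writes struct
     * nf_packet directly, so convert() and its memory traffic disappear. */
    static inline void rx_fill_custom(struct nf_packet *p,
                                      uint8_t *dma_buf, uint16_t len)
    {
        p->data   = dma_buf;
        p->length = len;
    }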

PacketMill Webpage: https://packetmill.io/

PacketMill Paper: https://packetmill.io/docs/packetmill-asplos21.pdf
PacketMill source code: https://github.com/aliireza/packetmill
PacketMill Slides with English transcripts: https://people.kth.se/~farshin/documents/packetmill-asplos21-slides.pdf

Our OSDI 2020 Paper: “Assise: Performance and Availability via Client-local NVM in a Distributed File System”

At USENIX OSDI 2020, Waleed presented our paper titled “Assise: Performance and Availability via Client-local NVM in a Distributed File System”. The slides and video are available at the USENIX site. Alternatively, the PDF is available here, while the video is available below:

This is joint work with researchers spread all over our planet: Thomas E. Anderson (University of Washington), Marco Canini (KAUST), Jongyul Kim (KAIST), Dejan Kostić (KTH Royal Institute of Technology), Youngjin Kwon (KAIST), Simon Peter (The University of Texas at Austin), Waleed Reda (KTH Royal Institute of Technology and Université catholique de Louvain), Henry N. Schuh (University of Washington), and Emmett Witchel (The University of Texas at Austin).

The full abstract is as follows:

The adoption of low latency persistent memory modules (PMMs) upends the long-established model of remote storage for distributed file systems. Instead, by colocating computation with PMM storage, we can provide applications with much higher IO performance, sub-second application failover, and strong consistency. To demonstrate this, we built the Assise distributed file system, based on a persistent, replicated coherence protocol that manages client-local PMM as a linearizable and crash-recoverable cache between applications and slower (and possibly remote) storage. Assise maximizes locality for all file IO by carrying out IO on process-local, socket-local, and client-local PMM whenever possible. Assise minimizes coherence overhead by maintaining consistency at IO operation granularity, rather than at fixed block sizes.
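As a purely illustrative sketch (hypothetical names and stubbed helpers, not Assise's code), the locality-first policy above can be pictured as serving each read from the closest level of PMM that holds the requested block, falling back to remote storage only when no local level does:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    enum cache_level {
        PROCESS_LOCAL_PMM,   /* fastest: the process's own PMM cache          */
        SOCKET_LOCAL_PMM,    /* PMM attached to the same CPU socket           */
        CLIENT_LOCAL_PMM,    /* PMM on another socket of the same machine     */
        REMOTE_STORAGE       /* slowest: replicas / cold storage over the net */
    };

    /* Stub standing in for the coherence-metadata lookup a real system
     * would perform: does 'level' currently cache 'block'? */
    static bool level_has_block(enum cache_level level, uint64_t block)
    {
        (void)level; (void)block;
        return false;
    }

    /* Stub standing in for the actual data path at a given level. */
    static void level_read_block(enum cache_level level, uint64_t block, void *dst)
    {
        (void)level; (void)block;
        memset(dst, 0, 4096);   /* placeholder for a 4 KiB read */
    }

    /* Serve a read from the closest level that holds the block. */
    static void locality_first_read(uint64_t block, void *dst)
    {
        for (int lvl = PROCESS_LOCAL_PMM; lvl <= REMOTE_STORAGE; lvl++) {
            if (lvl == REMOTE_STORAGE ||
                level_has_block((enum cache_level)lvl, block)) {
                level_read_block((enum cache_level)lvl, block, dst);
                return;
            }
        }
    }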

We compare Assise to Ceph/BlueStore, NFS, and Octopus on a cluster with Intel Optane DC PMMs and SSDs for common cloud applications and benchmarks, such as LevelDB, Postfix, and FileBench. We find that Assise improves write latency up to 22x, throughput up to 56x, fail-over time up to 103x, and scales up to 6x better than its counterparts, while providing stronger consistency semantics.