KTH Inauguration on November 15, 2019

Yesterday was a superb day for us, with the KTH graduation ceremony taking place at the Stockholm Concert Hall, followed by a banquet at the Stockholm City Hall. Kirill and Georgios received their PhD degrees, and here is how we looked!

From the left: Georgios Katsikas, Dejan Kostic, and Kirill Bogdanov


Alireza Farshin’s Licentiate Defense

We are happy to announce that Alireza Farshin successfully defended his licentiate thesis (the licentiate is a degree at KTH halfway to a PhD)! We are once again very grateful to Prof. Gerald Q. Maguire Jr. for a fantastic co-advising job. Prof. Babak Falsafi was a superb opponent at the licentiate seminar. Alireza’s thesis is available online:

Realizing Low-Latency Internet Services via Low-Level Optimization of NFV Service Chains: Every nanosecond counts!

Dejan handing Alireza the traditional licentiate degree gift at KTH (image credit: Marco Chiesa).


Alireza Farshin (front, right) with his doctoral advisors (behind, left: Dejan Kostic, behind, right: Gerald Q. Maguire, Jr.). Front left is Prof. Babak Falsafi, the opponent at the licentiate seminar (image credit: Marco Chiesa).

Video of our presentation at EuroSys 2019: “Make the Most out of Last Level Cache in Intel Processors”

On March 26, 2019 in Dresden, Alireza Farshin presented our EuroSys 2019 paper on unlocking a performance-enhancing feature that has existed in Intel processors for almost a decade. We are making the video of the talk available. The slides are available as well.

CPUs typically have cache memory, which speeds up access to the most commonly used data and thus to a large extent masks the long latency of the main memory (DRAM). Over the past decade we have witnessed ever-increasing counts of “cores” (the basic building blocks of CPUs that can operate independently). For the last nine years, the largest, last-level cache of Intel processors has been split into “slices,” and each slice is physically faster to access from the core to which it is attached than from the other cores. Our work first showed that reading data from the nearest slice is 20% faster on Intel’s Haswell CPUs, and 40% faster on the newer Skylake architecture. We then proceeded to show that a large fraction of these potential gains can be realized when the application carefully places its working set (most commonly used data) in the slices nearest to the cores that will process the data. To showcase the benefits of our approach on real applications, we have built a transparent software layer called CacheDirector that reduced the tail latency (processing time at the 99th percentile) by 21% for packets going through a service chain working at 100 Gbps. Handling traffic at such high speeds is vital for meeting increasing network demands.
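
To make the idea of slice-aware placement more concrete, here is a minimal Python sketch. It is not the CacheDirector implementation, and it does not use Intel's actual address-to-slice mapping: `toy_slice_of` is a made-up placeholder hash, and `find_offset_in_slice`, the 64-byte line size, and the slice count of 8 are assumptions chosen for illustration. The sketch only shows the general principle: given some mapping from addresses to slices, scan a buffer for the first cache line that lands in the slice closest to the core that will process the data.

```python
CACHE_LINE = 64  # bytes; assumed cache-line size


def toy_slice_of(addr: int, n_slices: int = 8) -> int:
    """Hypothetical stand-in for the address-to-slice mapping.

    XOR-folds the cache-line number byte by byte. This is NOT the
    real mapping used by any processor; it only makes the sketch runnable.
    """
    line = addr // CACHE_LINE
    folded = 0
    while line:
        folded ^= line & 0xFF
        line >>= 8
    return folded % n_slices


def find_offset_in_slice(base_addr, length, target_slice, slice_of=toy_slice_of):
    """Return the smallest cache-line-aligned offset inside the buffer
    [base_addr, base_addr + length) whose line maps to target_slice,
    or None if no line in the buffer maps there."""
    first = (-base_addr) % CACHE_LINE  # bytes up to the next line boundary
    for off in range(first, length, CACHE_LINE):
        if slice_of(base_addr + off) == target_slice:
            return off
    return None


if __name__ == "__main__":
    # Assume the core handling this buffer is closest to slice 3.
    off = find_offset_in_slice(0x10000, 4096, target_slice=3)
    print("place hot data at offset", off)
```

An application would then put its most frequently accessed bytes at the returned offset, so that they are served from the nearby slice. On real hardware the placeholder hash would have to be replaced by the processor's actual mapping, which this sketch does not attempt to model.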

Our work has the clear benefit of increasing performance “for free,” or of reducing energy consumption for the same amount of work, even in finely tuned systems. Many applications can benefit from our contribution with relatively small changes. Future important societal applications will also clearly benefit from the higher probability of receiving responses with predictable latency.