I am a doctoral student in the Network Systems Laboratory (NSLab) at KTH Royal Institute of Technology, where I do research under the supervision of Professor Dejan Kostić and Professor Gerald Q. Maguire Jr. I received my B.Sc. in Electrical Engineering (Electronics) from Sharif University of Technology, Tehran, Iran, and my M.Sc. in Electrical Engineering (Digital Electronic Systems) from Amirkabir University of Technology, Tehran, Iran.
My research interests include computer networks and networked systems. In my doctoral studies, I work on improving the performance of Network Function Virtualization (NFV) service chains using low-level optimization techniques. You can read more in my licentiate thesis.
[EuroSys'19] Make the Most out of Last Level Cache in Intel Processors Alireza Farshin, Amir Roozbeh, Gerald Q. Maguire Jr., Dejan Kostić (Acceptance Rate: 45/207 ≈ 21.7%)
We exploited the characteristics of the non-uniform cache architecture (NUCA) in recent Intel processors to introduce a new memory management scheme. Our results showed that the proposed scheme could reduce tail latencies in latency-critical Network Function Virtualization (NFV) service chains by 21.5%. Furthermore, our work demonstrated that optimizing computer systems and taking advantage of nanosecond improvements could have a significant impact on the performance of networking applications. Please follow the links for the paper, the presentation slides, the poster, and the video (the EuroSys recording is also available here). This work has been featured in the Ericsson Blog, Tech Xplore, AlphaGalileo, Twitter, the KTH main page, and KTH EECS news.
[ATC'20] Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks Alireza Farshin, Amir Roozbeh, Gerald Q. Maguire Jr., Dejan Kostić (Acceptance Rate: 65/348 ≈ 18.7%)
We study the current implementation of Direct Cache Access (DCA) in Intel processors, called Data Direct I/O (DDIO) technology. Our paper shows that it is important to understand the details of DDIO and to tune it appropriately for a given Internet service to achieve high performance, especially with the introduction of multi-hundred-gigabit networks. A preliminary version of this paper was presented at the EuroSys'20 poster session. Please follow the links for the ATC'20 paper materials (i.e., paper + slides + video) and the EuroSys'20 poster materials (i.e., extended abstract + poster + video teaser).
[ASPLOS'21] PacketMill: Toward Per-Core 100-Gbps Networking Alireza Farshin, Tom Barbette, Amir Roozbeh, Gerald Q. Maguire Jr., Dejan Kostić (Acceptance Rate: 75/398 ≈ 18.8%)
We present PacketMill, a system for optimizing software packet processing, which (i) introduces a new model to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network (specifically DPDK) drivers, to mitigate inefficiencies and produce a customized binary for a given network function. Our evaluation results show that PacketMill increases throughput by up to 36.4 Gbps (70%), reduces latency by up to 101 µs (28%), and enables nontrivial packet processing (e.g., a router) at ≈100 Gbps, when new packets arrive >10× faster than main memory access times, while using only one processing core. Please follow the links for the ASPLOS'21 extended abstract, paper, slides, and the video with English/Farsi/French subtitles. This work has been featured in the Ericsson Blog.
[NSDI'22] Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets [To Appear] Hamid Ghasemirahni, Tom Barbette, Georgios Katsikas, Alireza Farshin, Massimo Girondi, Amir Roozbeh, Marco Chiesa, Gerald Q. Maguire Jr., Dejan Kostić (Acceptance Rate Spring: 28/104 ≈ 26.9%)
We systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality, as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing µs-scale delays for some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.