Accelerating Distributed Storage in Heterogeneous Settings
Time: Mon 2022-05-30 15.00
Location: Ka-Sal C (Sven-Olof Öhrvik), Kistagången 16, Kista
Video link: https://kth-se.zoom.us/meeting/register/u5Iqd-yhqTorGNLFoosJXIJmTXZl3rsxS55J
Language: English
Subject area: Computer Science
Doctoral student: Waleed Reda, Network Systems Laboratory (NS Lab)
Opponent: Adam Morrison, Tel Aviv University
Supervisors: Professor Dejan Kostic, Kommunikationssystem, CoS; Professor Peter Van Roy, Université catholique de Louvain; Associate Professor Marco Chiesa, Network Systems Laboratory (NS Lab)
This work was also supported by a fellowship from the Erasmus Mundus Joint Doctorate in Distributed Computing (EMJD-DC), funded by the European Commission (EACEA) (FPA 2012-0030).
Abstract
Heterogeneity in cloud environments is a fact of life: workload skews, network path changes, and the diversity of server hardware components all impact the performance of distributed storage. In this dissertation, we show that heterogeneity can in fact be one of the primary causes of service degradation for storage systems. We then tackle this challenge by building next-generation distributed storage systems that can operate amidst heterogeneity while providing fast and predictable response times. First, we study skews in cloud workloads and propose scheduling strategies for key-value stores that seek to optimize latency. We then conduct a measurement study in one of the largest cloud provider networks to quantify variations in network latencies and their possible implications for storage services. Next, with fast non-volatile RAM (NVRAM) now becoming commercially available, we look into how storage systems can deal with the increasing diversity of storage technologies. We design and evaluate a distributed file system that can manage data across NVRAM and other types of storage while providing low latency and high scalability. Lastly, we build a framework that transforms commodity Remote Direct Memory Access (RDMA) NICs into Turing machines capable of performing arbitrary computations. This provides yet another compute resource on server machines, and we show how we can leverage it to accelerate common storage tasks as well as real storage applications.