Skip to main content

Improving the performance of stream processing pipeline for vehicle data

Master's thesis presentation

Time: Thu 2020-10-22 13.00

Location: Zoom - https://kth-se.zoom.us/j/65326854876

Participating: Wenyu Gu

Export to calendar

The growing amount of position-dependent data (containing both geo position data (i.e. latitude, longitude) and also vehicle/driver-related information) collected from sensors on vehicles poses a challenge to computer programs to process the aggregate amount of data from many vehicles. While handling this growing amount of data, the computer programs that process this data need to exhibit low latency and high throughput – as otherwise the value of the results of this processing will be reduced. As a solution, big data and cloud computing technologies have been widely adopted by industry.

This thesis examines a cloud-based processing pipeline that processes vehicle location data. The system receives real-time vehicle data and processes the data in a streaming fashion. The goal is to improve the performance of this streaming pipeline, mainly with respect to latency, throughput and cost.

The work began by looking at the current solution using AWS Kinesis and AWS Lambda. A benchmarking environment was created and used to measure the current system’s performance.

Additionally, a literature study was conducted to find a processing framework that best meets both industrial and academic requirements. After a comparison, Flink was chosen as the new framework. A new solution was designed to use Fink. Next the performance of the current solution and the new Flink solution were compared using the same benchmarking environment and. The conclusion is that the new Flink solution has 86.2% lower latency while supporting triple the throughput of the current system at almost same cost.

Keywords: Cloud computing, Stream processing, Flink, Amazon Web Service (AWS)