Batch processing to utilize parallelism

Time: Mon 2020-05-18 16.00

Lecturer: Emma Lind and Mattias Stahre

Location: Zoom Meeting

The threat level in an environment (in this thesis, specifically for aircraft) can be determined by analyzing radar signals. This task is critical and has to be solved quickly and with high accuracy. To classify a radar emitter, its electromagnetic pulses have to be identified. Usually, several emitters transmit radar pulses at the same time in an environment. These pulses need to be sorted into groups, where each group contains pulses from the same emitter. This thesis aims to find a fast and accurate solution that sorts the pulses in parallel. During the last decade, GPU performance has developed considerably compared to CPU performance. However, the performance benefit of a GPU can only be utilized by writing code that performs many calculations in parallel.

This thesis evaluates how a clustering algorithm, applied to batches of data, can be used in a real-time system. The selected approach clusters batches of data in parallel, ahead of subsequent analysis, and exploits the advantages of a GPU. The first problem was to find a suitable clustering algorithm. The second problem was to find an optimal batch size, one at which the clustering algorithm achieves high clustering accuracy and the batches of pulses are processed quickly in parallel. A quantitative method based on experiments was used to measure performance, accuracy, and parallelism as a function of batch size when using the selected clustering algorithm. The algorithm selected for clustering the data was DBSCAN because of its advantages: the number of clusters does not have to be specified in advance, it can find clusters of arbitrary shape in a data set, and it has low time complexity.
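As a rough illustration of the batch idea described above, the following is a minimal sketch and not the thesis implementation. It assumes each pulse is a hypothetical two-feature tuple, uses a plain-Python DBSCAN with illustrative `eps` and `min_pts` values, and uses a thread pool only as a stand-in for the GPU: it shows that batches are clustered independently, not that they run faster.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force neighborhood query (includes the point itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


def make_batches(pulses, batch_size):
    """Split the pulse stream into consecutive fixed-size batches."""
    return [pulses[i:i + batch_size] for i in range(0, len(pulses), batch_size)]


def cluster_batches(pulses, batch_size, eps, min_pts, workers=4):
    """Cluster every batch independently; labels are local to each batch."""
    batches = make_batches(pulses, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: dbscan(b, eps, min_pts), batches))


# Hypothetical pulses from two emitters (features are made up), plus one stray.
pulses = [
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (1.05, 1.05), (5.05, 5.05), (1.1, 1.1), (5.1, 5.1), (9.0, 0.0), (1.0, 1.05),
]
results = cluster_batches(pulses, batch_size=6, eps=0.3, min_pts=3)
print(results)
```

In this toy data the second batch contains only two pulses from the second emitter, so with `min_pts=3` they are labeled as noise. This is one way the batch size can affect clustering accuracy.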

The evaluation shows that implementing parallel batch processing is possible while achieving sufficient clustering accuracy when compared to a previous implementation using a maximum likelihood solution.

The optimal batch size in terms of data points is hard to determine, since it depends heavily on the input data. One solution would be to adapt the batch size to the input data. However, a high level of parallelism introduces an additional delay that depends on the difference between the batch size (in ms) and the time it takes to process the batch. The system will therefore be slower to output its result for a given batch than a sequential system. For a time-critical system, a high level of parallelism is unsuitable because it leads to slow response times.
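The trade-off between parallelism and response time can be made concrete with a simplified timing model. This is an assumption for illustration, not the model used in the thesis: a batch covers `batch_ms` of input, takes `process_ms` to process, and a new batch arrives every `batch_ms`. All function names are hypothetical.

```python
from math import ceil


def workers_needed(batch_ms, process_ms):
    """Smallest number of batches that must be in flight to keep up with input."""
    return max(1, ceil(process_ms / batch_ms))


def extra_delay_ms(batch_ms, process_ms):
    """Delay beyond one batch interval; zero when processing fits in the interval."""
    return max(0.0, process_ms - batch_ms)


def worst_case_latency_ms(batch_ms, process_ms):
    """The oldest pulse waits for its batch to fill, then for processing."""
    return batch_ms + process_ms


# Processing fits inside one batch interval: sequential operation suffices.
print(workers_needed(10, 8), extra_delay_ms(10, 8), worst_case_latency_ms(10, 8))

# Processing takes 3.5 batch intervals: four batches in flight, later output.
print(workers_needed(10, 35), extra_delay_ms(10, 35), worst_case_latency_ms(10, 35))
```

Under these assumptions, more parallel batches let the system keep up with the input rate, but the result for any single batch still arrives only after its full processing time has elapsed.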

Keywords: Parallelization, GPU, Unsupervised learning, Commodity GPU, Clustering, Signal Separation, DBSCAN