Project #17
Title: Exploring the Efficiency of Big Data Processing with Hadoop MapReduce
Leader's Name: Brian Ye
Member2 Name: Anders Ye
Related paper:
(see below)
Presentation Day: May 20
Model: LE
Abstract:
Processing Big Data has long been a challenge for scientists in both academia and industry. Hadoop MapReduce is a commonly used engine for processing Big Data. The framework reads and writes data through a distributed file system, the Hadoop Distributed File System (HDFS). The software in the framework is also assumed to be reliable and fault-tolerant. As the authors of the related paper state, Hadoop MapReduce often lacks appropriate indexes. Because Hadoop MapReduce has a performance problem due to its data layout and indexing, we want to combine an appropriate indexing technique and data layout of the database with Hadoop MapReduce in order to improve job performance and query processing. This project will take a theoretical approach, since it is hard to implement an application given the size of the data set that would be handled.
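
To illustrate why a missing index can hurt a selective query, the following is a minimal sketch and not the method of the related paper. It assumes that small in-memory Python lists stand in for HDFS blocks and that each block is kept sorted by key. The function names full_scan and indexed_scan are hypothetical and exist only for this example. The sketch counts how many records a range query has to read with and without the sorted layout.

```python
from bisect import bisect_left, bisect_right


def full_scan(blocks, lo, hi):
    """No index: every record of every block is read and then filtered."""
    records_read = 0
    hits = []
    for block in blocks:
        for key in block:
            records_read += 1
            if lo <= key <= hi:
                hits.append(key)
    return hits, records_read


def indexed_scan(blocks, lo, hi):
    """Assumes each block is sorted by key, so binary search finds the range."""
    records_read = 0
    hits = []
    for block in blocks:
        start = bisect_left(block, lo)   # first position with key >= lo
        end = bisect_right(block, hi)    # first position with key > hi
        records_read += end - start      # only the matching slice is read
        hits.extend(block[start:end])
    return hits, records_read


# Toy data set (assumption): 1000 keys spread over 4 sorted blocks.
blocks = [list(range(offset, 1000, 4)) for offset in range(4)]

scan_hits, scan_read = full_scan(blocks, 100, 119)
index_hits, index_read = indexed_scan(blocks, 100, 119)

print("full scan read:", scan_read, "records")
print("indexed scan read:", index_read, "records")
```

Both functions return the same matching keys. Only the number of records touched differs, which is the effect the project aims to study theoretically at a much larger scale.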