
Seminar 2015-08-20

Performance Characterization of In-Memory Data Analytics on Scale-up Servers

Speaker: Ahsan Javed Awan, KTH, ICT, SCS


With a deluge in the volume and variety of data collected, large-scale web enterprises (such as Yahoo, Facebook, and Google) run big data analytics applications on clusters of commodity servers. However, it has recently been reported that using clusters is a case of over-provisioning, since the majority of analytics jobs do not process huge data sets and modern scale-up servers are adequate to run them. Additionally, commonly used predictive analytics, such as machine learning algorithms, work on filtered datasets that easily fit into the memory of modern scale-up servers. Therefore, modern scale-up servers are becoming an important processing platform for big data analytics.


In this seminar, I will talk about lessons learned from deploying Apache Spark-based data analysis workloads on a scale-up server. I will explain why Spark-based applications do not scale on a NUMA machine.

Slides: A.Awan.pdf (pdf 2.7 MB)

Belongs to: Software and Computer Systems
Last changed: Oct 12, 2015