
Project #02

2.pdf

Title: Distributed data management using Apache Cassandra

Leader's Name: Alexander Roghult
Member2 Name: Erik Ranby
Member3 Name: Frej Connolly

Related paper: Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen, Sergio Gómez-Villamor, Victor Muntés-Mulero, Serge Mankovskii. Solving Big Data Challenges for Enterprise Application Performance Management. Proceedings of the VLDB Endowment, Volume 5, Issue 12, August 2012, Pages 1724-1735.
http://dl.acm.org/citation.cfm?id=2367512

Presentation Day: May 25

Model: ES

Abstract: The aim of this project is to gain hands-on experience with a distributed database system by developing an application that uses one. The authors of the paper "Solving Big Data Challenges for Enterprise Application Performance Management" evaluated the performance of six open-source data stores: HBase, Cassandra, Redis, Voldemort, VoltDB and MySQL. They found that Cassandra was the clear winner with regard to scalability, which is why we wish to use Cassandra.

Apache Cassandra is a distributed database management system designed to handle large amounts of data with no single point of failure. It was initially developed at Facebook and is now a top-level project at Apache.

The application should use Cassandra in a way that exercises the system’s scalability. To evaluate this behaviour, a data set of 5-10 GB will be distributed over 2-5 nodes. We will evaluate an application that queries Cassandra while more data is being inserted. We will then stage a failure of one or more nodes and measure the effect on the application. We are also interested in examining how the data is distributed and replicated across the nodes.
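
To make the distribution and replication question concrete, the following is a minimal sketch of a token ring, not Cassandra's actual implementation. It assumes one token per node, an MD5 hash for token placement, and replicas placed on the next nodes clockwise around the ring; a real cluster may use a different partitioner, virtual nodes and other replication strategies. The node names, the row keys and the Ring class are hypothetical and exist only for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an assumption made for this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified token ring: one token per node, replicas on the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        # Sort the (token, node) pairs so that list order matches ring order.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key):
        """Return the nodes that store `key`, primary replica first."""
        # The primary is the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if there is none.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def readable(self, key, failed):
        """A key stays readable while at least one of its replicas is alive."""
        return any(node not in failed for node in self.replicas(key))


# Hypothetical four-node cluster with two copies of every row.
ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=2)
keys = [f"row-{i}" for i in range(1000)]

# Count how many rows each node stores (primary and replica copies).
load = {name: 0 for _, name in ring.entries}
for key in keys:
    for node in ring.replicas(key):
        load[node] += 1

# Stage a failure of one node and count the rows that can no longer be read.
failed = {"node2"}
lost = sum(1 for key in keys if not ring.readable(key, failed))

print(load)
print("unreadable rows after failure:", lost)
```

With only one token per node, the load in this sketch can be quite uneven, which is one of the effects we would want to compare against the real cluster. The staged failure in the sketch only checks whether some replica survives; it says nothing about consistency levels or latency, which the actual measurements would have to cover.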

p1724_tilmannrabl_vldb2012.pdf