Project #04
Title: Personalized PageRank on MapReduce
Leader's Name: Anton Lund
Member2 Name: Andreas Cederholm
Member3 Name: Isaac Rondon Sosa
Member4 Name: Muhammad Haky Rufianto
Related Paper: Bahmani, Bahman, Kaushik Chakrabarti, and Dong Xin. "Fast personalized pagerank on mapreduce." Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM, 2011.
http://131.107.65.14/pubs/145447/mod113-bahmani.pdf
Presentation Day: May 20
Model: ES
PageRank is an algorithm that ranks web pages according to the number and quality of the links pointing to them. Personalized PageRank is a variant that is applied to a network or graph together with a defined source node. It performs random walks through the network, with random jumps that take the walk back to the source node. With a Monte Carlo approach, the stationary distribution over visited nodes can be approximated from how often each node is visited. Because the random walks are independent of each other, this approach may benefit from being implemented in MapReduce for processing large graphs.
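As an illustration of the Monte Carlo idea described above, the following is a minimal single-machine sketch in Python. It is not taken from the related paper: the function name `personalized_pagerank`, the restart probability of 0.15, the handling of dead-end nodes (jumping back to the source) and the toy graph are all assumptions made for the example.

```python
import random
from collections import Counter


def personalized_pagerank(graph, source, restart_prob=0.15, num_steps=100_000, seed=0):
    """Estimate Personalized PageRank by one long random walk with restarts.

    graph: dict mapping each node to a list of its out-neighbors (assumed format).
    source: the node every random jump returns to.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    visits = Counter()
    node = source
    for _ in range(num_steps):
        visits[node] += 1
        neighbors = graph.get(node, [])
        # Jump back to the source with probability restart_prob,
        # or when the current node has no outgoing links (assumed policy).
        if not neighbors or rng.random() < restart_prob:
            node = source
        else:
            node = rng.choice(neighbors)
    # Visit frequencies approximate the stationary distribution.
    return {n: count / num_steps for n, count in visits.items()}


# Hypothetical toy graph: "d" links to "a" but cannot be reached from "a".
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = personalized_pagerank(toy_graph, "a")
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this sketch, nodes that cannot be reached from the source never appear in the result, which reflects the personalized nature of the ranking.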
This project aims to investigate and show how the Personalized PageRank algorithm can be implemented in a MapReduce framework, and to test it on an application domain. The application will be a crawler that maps a web domain and ranks its pages according to the algorithm. We aim to use common frameworks such as Apache Nutch and Apache Hadoop to build the application.
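To give a rough idea of how the random walks could be expressed as map and reduce steps, here is a small in-memory sketch in Python. It is only a stand-in for the MapReduce programming model: it does not use the Hadoop or Nutch APIs, and it does not reproduce the walk construction of the related paper, which may differ. All names (`simulate_round`, `mapreduce_ppr`), the parameters and the toy graph are hypothetical.

```python
import random
from collections import defaultdict


def simulate_round(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return [reducer(key, values) for key, values in groups.items()]


def mapreduce_ppr(graph, source, num_walks=200, walk_length=50,
                  restart_prob=0.15, seed=0):
    rng = random.Random(seed)

    def extend_walk(walk_id, walk):
        # Map step: extend one walk by a single node.
        neighbors = graph.get(walk[-1], [])
        if not neighbors or rng.random() < restart_prob:
            next_node = source  # random jump (or dead end): back to the source
        else:
            next_node = rng.choice(neighbors)
        yield walk_id, walk + [next_node]

    def keep_walk(walk_id, walks):
        # Reduce step: each walk id has exactly one extended walk.
        return walk_id, walks[0]

    def emit_visits(walk_id, walk):
        # Map step: emit (node, 1) for every node visited by the walk.
        for node in walk:
            yield node, 1

    def sum_visits(node, ones):
        # Reduce step: total number of visits per node.
        return node, sum(ones)

    # Naive scheme (assumption): one round per walk step.
    walks = [(i, [source]) for i in range(num_walks)]
    for _ in range(walk_length):
        walks = simulate_round(walks, extend_walk, keep_walk)

    counts = simulate_round(walks, emit_visits, sum_visits)
    total = sum(count for _, count in counts)
    return {node: count / total for node, count in counts}


# Hypothetical toy link graph standing in for a crawled web domain.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
mr_scores = mapreduce_ppr(toy_graph, "a")
print(sorted(mr_scores.items(), key=lambda item: -item[1]))
```

The sketch uses one round per walk step, which is simple but would mean many jobs on a real cluster; reducing the number of rounds is one of the questions the project would need to look into.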