Experiences from working with large graphs at Telefonica Research
Speaker: Vasiliki Kalavri
In this talk, I will give an overview of the projects I worked on during my 6-month internship at Telefonica Research.
1. Leveraging semi-metricity in graph analysis.
In this project, we leverage the concept of the metric backbone to improve the efficiency of large-scale graph analytics. The metric backbone is the minimum subgraph that preserves the shortest paths of a weighted graph. While one can compute the metric backbone by solving the all-pairs shortest paths (APSP) problem, this approach incurs prohibitive time and space complexity for big graphs. Instead, we propose an algorithm that computes the metric backbone without solving APSP and scales to large graphs. We then use the metric backbone in place of the original graph to improve the performance of graph analytics applications on two different systems: a batch graph processing system (Apache Giraph) and a graph database (Neo4j).
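To make the definition concrete, here is a minimal Python sketch of the naive APSP-style baseline mentioned above, not the scalable algorithm presented in the talk. It assumes an undirected graph with positive weights and no parallel edges, given as (u, v, weight) triples. The function names are hypothetical. An edge is kept only if no alternative path between its endpoints is strictly shorter than the edge itself.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, source):
    """Single-source shortest-path distances over positive edge weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def metric_backbone(edges):
    """Naive baseline: run Dijkstra from every node, then keep the metric edges.

    Assumes an undirected graph with positive weights and no parallel edges.
    """
    edges = list(edges)
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    # One shortest-path tree per node: this is the cost that becomes
    # prohibitive on large graphs.
    dist_from = {u: dijkstra(adj, u) for u in list(adj)}
    # An edge is semi-metric (and dropped) when some indirect path is
    # strictly shorter. Ties are kept, which is a conservative choice.
    return [(u, v, w) for u, v, w in edges if dist_from[u][v] >= w]


if __name__ == "__main__":
    triangle = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 3.0)]
    print(metric_backbone(triangle))
```

In the triangle example, the edge between a and c has weight 3.0 but the path through b has length 2.0, so that edge is semi-metric and is left out of the backbone.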
2. Automatic classification of trackers.
In this work, we aim to build a machine learning classifier that can automatically discover new trackers on the web. We analyze user traffic logs to identify features that characterize trackers and to gain insight into their behavior and network structure. From these logs, we build a bipartite graph of web accesses and apply classification algorithms that exploit the graph structure to identify trackers.
This part of the talk will present early results from analyzing this dataset and describe the properties that we believe can serve as discriminating features in this classification problem. Then, I will outline the data analysis pipeline that we have built using Apache Flink to analyze such a dataset both locally and on a distributed cluster.
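As a rough illustration of the bipartite graph of web accesses, the plain-Python sketch below (not Flink code, and not the pipeline from the talk) builds the two sides of such a graph from hypothetical (first_party_site, requested_host) pairs. The record format, the function names, and the degree feature are all assumptions made for the example. The actual log schema and the features used in the project are not specified here.

```python
from collections import defaultdict


def build_bipartite(records):
    """Build a bipartite graph from (first_party_site, requested_host) pairs.

    The record format is hypothetical; real traffic logs would need parsing
    and cleaning before reaching this step.
    """
    hosts_of = defaultdict(set)  # first-party site -> hosts it triggers requests to
    sites_of = defaultdict(set)  # requested host -> first-party sites it appears on
    for site, host in records:
        if site == host:
            continue  # skip same-host requests; only cross-host edges are kept
        hosts_of[site].add(host)
        sites_of[host].add(site)
    return hosts_of, sites_of


def degree_feature(sites_of):
    """Illustrative feature: number of distinct sites on which each host appears."""
    return {host: len(sites) for host, sites in sites_of.items()}


if __name__ == "__main__":
    records = [
        ("news.example", "cdn.example"),
        ("shop.example", "cdn.example"),
        ("news.example", "news.example"),
        ("news.example", "ads.example"),
    ]
    hosts_of, sites_of = build_bipartite(records)
    print(degree_feature(sites_of))
```

Keeping both adjacency maps makes it cheap to look at the graph from either side, which is what graph-based classification on a bipartite structure typically needs.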