From Big Data to Software Engineering, and back!
Speaker: Amine Benelallam, Univ Rennes, Inria, CNRS, IRISA, France
Title: From Big Data to Software Engineering, and back!
With the rapidly growing availability of big data and the advances made in building scalable and complex software, big data and software engineering should go hand in hand to empower a new generation of software-enabled solutions. On the one hand, Big Data Software Engineering (BDSE) promises to improve the development of big data applications. On the other hand, big data analysis techniques may provide the means to improve software development, design, and reuse, introducing quantitative empirical software engineering as a new data science discipline.
Software Engineering for Big Data: Big data processing platforms provide high-level languages that simplify the implementation and distributed execution of big data processing and analysis routines. Nonetheless, these languages are data-source-agnostic and not user-friendly. To exploit the valuable information held in these datasets, non-expert end users are expected, amongst other skills, to be good at programming.
Domain-Specific Languages (DSLs) that are aware of the data structure and the data source, that is, ad-hoc processing languages tailored to a data source, not only democratize the analysis of these datasets but also help to build scalable and easy-to-optimize languages. In the first part of the talk, I will report on our experience in scaling up common graph management operations implemented on top of ATL, a DSL for graph transformations. We show how, thanks to the high level of abstraction of ATL, we were able to map its execution semantics to a well-known distributed programming model. On a cluster of 8 machines, we improve the performance of the transformation by a factor of 6.
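To make the idea of mapping a rule-based transformation onto a distributed programming model more concrete, here is a minimal sketch in Python. It is not ATL and not the implementation presented in the talk: the abstract does not name the distributed model, so a map/reduce-style split into a local phase and a global phase is assumed purely for illustration. The toy source model, the single Class-to-Table rule, and all function names are hypothetical, and threads stand in for cluster machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source model: "Class" elements linked by "extends" references.
SOURCE = {
    "A": {"kind": "Class", "extends": None},
    "B": {"kind": "Class", "extends": "A"},
    "C": {"kind": "Class", "extends": "B"},
    "D": {"kind": "Class", "extends": "A"},
}


def split(model, n):
    """Partition element ids into n chunks, one per (simulated) worker."""
    ids = sorted(model)
    return [ids[i::n] for i in range(n)]


def local_match_apply(model, chunk):
    """Map-like phase: apply the rule to each element of a chunk independently.

    References to other elements are left unresolved, because the target of
    the referenced element may be produced by a different worker.
    """
    out = []
    for eid in chunk:
        src = model[eid]
        out.append((f"T_{eid}", {"kind": "Table", "parent_ref": src["extends"]}))
    return out


def global_resolve(partials):
    """Reduce-like phase: merge partial outputs and resolve cross-chunk references."""
    target = dict(pair for part in partials for pair in part)
    for elem in target.values():
        ref = elem.pop("parent_ref")
        elem["parent"] = f"T_{ref}" if ref is not None else None
    return target


def transform(model, workers=2):
    """Run the local phase in parallel, then the global resolution phase."""
    chunks = split(model, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: local_match_apply(model, c), chunks))
    return global_resolve(partials)


result = transform(SOURCE, workers=2)
```

Because the rule is declarative and each element is matched independently, the result does not depend on how the model is partitioned, which is what makes this kind of parallelization possible. As a side note on the figure quoted above, a speed-up of 6 on 8 machines corresponds to a parallel efficiency of 6/8 = 75%.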
Big Data for Software Engineering: The wide availability of open-source software repositories, such as version control system repositories, artifact repositories, and bug databases, opens up new opportunities for a better understanding of software engineering processes, evolution, and trends, thanks to existing big data processing and analysis platforms. In the second part of the talk, I will report on our experience in mining Java library artifacts and their dependencies as a large-scale temporal graph. We introduce the Maven-miner tool, its implementation, its architecture, and the design decisions we made to run such complex and time-consuming tasks on top of a budget-limited infrastructure. We released 5 different versions, covering a total of 20 bug fixes and feature enhancements. We analyzed more than 2.8M Maven artifacts, resulting in a graph database with roughly 2.5M nodes and 12M dependencies. Finally, we discuss the research opportunities this very large dataset would enable.
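The notion of a temporal dependency graph can be illustrated with a small in-memory sketch in Python. This is not Maven-miner's schema or API, which the abstract does not describe: the class names, the `org.example` coordinates, and the release dates below are all invented for illustration. Each node is an artifact identified by its `groupId:artifactId:version` coordinates and stamped with a release date, and each edge is a dependency.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Artifact:
    coordinates: str          # "groupId:artifactId:version"
    released: date            # release date gives the graph its temporal dimension
    depends_on: tuple = ()    # coordinates of direct dependencies


class TemporalDependencyGraph:
    """Hypothetical in-memory stand-in for a graph database of artifacts."""

    def __init__(self):
        self.nodes = {}

    def add(self, coordinates, released, depends_on=()):
        self.nodes[coordinates] = Artifact(coordinates, released, tuple(depends_on))

    def edge_count(self):
        return sum(len(a.depends_on) for a in self.nodes.values())

    def snapshot(self, as_of):
        """Artifacts that already existed on the given date."""
        return sorted(c for c, a in self.nodes.items() if a.released <= as_of)

    def dependents_of(self, coordinates, as_of):
        """Artifacts released on or before `as_of` that depend on `coordinates`."""
        return sorted(
            a.coordinates
            for a in self.nodes.values()
            if coordinates in a.depends_on and a.released <= as_of
        )


# Invented sample data, for illustration only.
graph = TemporalDependencyGraph()
graph.add("org.example:core:1.0", date(2015, 1, 10))
graph.add("org.example:core:2.0", date(2017, 6, 1))
graph.add("org.example:web:1.0", date(2015, 3, 5), ["org.example:core:1.0"])
graph.add("org.example:cli:1.0", date(2016, 9, 15), ["org.example:core:1.0"])
graph.add("org.example:web:2.0", date(2018, 2, 20), ["org.example:core:2.0"])

early = graph.dependents_of("org.example:core:1.0", date(2016, 1, 1))
later = graph.dependents_of("org.example:core:1.0", date(2017, 1, 1))
```

Queries of this kind, such as how the set of dependents of a library version grows over time, are one example of what a temporal view of the dependency graph makes possible. At the scale reported above, a linear scan like this one would presumably give way to indexed queries in the graph database.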
My name is Amine BENELALLAM. I am a post-doctoral fellow in Software Engineering working in the DiverSE Inria team, led by Prof. Olivier Barais. In particular, I am interested in unleashing the power of Model-driven Engineering (MDE), either by enabling the development of scalable MDE-based applications or by redefining the boundaries of the MDE approach to handle time-related aspects.
Not long ago, I started working on quantitative empirical software engineering as a means to improve software development, design, and reuse. I am also working on improving knowledge representation in self-adaptive systems based on novel MDE techniques. This is done by recognizing the importance of the temporal dimension and introducing time as a built-in concept at application design time.