Seminar 2015-10-28 A.J.
A compiler automated decoupled access-execute approach
Speaker: Alexandra Jimborean
Alexandra Jimborean received her PhD from the University of Strasbourg, France, for her work on adapting the polytope model for dynamic and speculative parallelization. She is currently an Associate Senior Lecturer (Assist. Prof.) at Uppsala University, where she researches compile-time and run-time code analyses, optimizations for performance and energy efficiency, and software-hardware co-design. Her research focuses on transforming code in the compiler's intermediate representation to generate self-tuning code that automatically and dynamically adapts to the execution environment.
Computer architects have been successful in reducing energy expenditure with negligible impact on performance, using techniques such as Dynamic Voltage and Frequency Scaling (DVFS). However, as we approach the end of Dennard scaling, we lose the quadratic energy benefit of reduced voltage and are limited to the linear performance/energy trade-off of adjusting frequency. To continue benefiting from DVFS, we provide compiler support that addresses the limits of current hardware.
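The quadratic versus linear distinction can be illustrated with the standard first-order CMOS dynamic power model. This model is an assumption added here for illustration and is not taken from the talk; the symbols (activity factor \alpha, switched capacitance C, supply voltage V, clock frequency f, cycle count N) are likewise introduced only for this sketch.

```latex
% First-order dynamic power of a CMOS circuit (assumed model)
P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f

% Dynamic energy for a task of N cycles, which runs for t = N / f
E_{\mathrm{dyn}} = P_{\mathrm{dyn}} \cdot \frac{N}{f} \approx \alpha \, C \, V^{2} N
```

Under this model, lowering V gives a quadratic reduction in dynamic energy. When V can no longer be lowered, changing f alone reduces power only linearly in f.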
One promising approach is software decoupled access-execute, in which the compiler transforms the code into coarse-grain phases that are well matched to the DVFS capabilities of the hardware. The method has proven efficient for statically analyzable codes, since the compiler can perform complex memory analysis and apply static heuristics to design a lean and efficient access phase. Yet general-purpose applications pose significant challenges due to the notorious pointer aliasing problem, complex control flow, and unknown runtime events. We propose a universal compile-time method to decouple general-purpose applications (e.g., serial codes such as SPEC-CPU2006) using multiversioning. To address statically unknown events, we designed and compared a purely software approach and a technique based on a novel use of Hardware Transactional Memory. Our solutions overcome the challenges of complex code and show that irregular or memory-bound applications can see significant efficiency gains from automatic decoupled execution, with a negligible performance loss in other cases.
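The basic idea of splitting a loop into an access phase and an execute phase can be sketched by hand in C. This is only an illustrative sketch, not the speaker's compiler transformation: the function names, the chunk size, and the `set_freq_low`/`set_freq_high` hooks are hypothetical stand-ins (here they only record a flag), and `__builtin_prefetch` assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>

/* Hypothetical DVFS hooks: stand-ins for whatever mechanism actually
 * changes the core frequency. Here they only record the requested level. */
static int current_freq_level = 1;            /* 0 = low, 1 = high */
static void set_freq_low(void)  { current_freq_level = 0; }
static void set_freq_high(void) { current_freq_level = 1; }

/* Assumed chunk size; a real system would size it to the cache. */
enum { CHUNK = 256 };

/* Access phase: performs no computation, only address generation and
 * prefetches for the data the execute phase will read. */
static void access_phase(const double *a, const size_t *idx,
                         size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++) {
        __builtin_prefetch(&a[idx[i]], 0 /* read */, 3 /* keep in cache */);
    }
}

/* Execute phase: the original computation, now expected to hit in cache. */
static double execute_phase(const double *a, const size_t *idx,
                            size_t begin, size_t end)
{
    double sum = 0.0;
    for (size_t i = begin; i < end; i++) {
        sum += a[idx[i]] * 2.0;
    }
    return sum;
}

/* Decoupled loop: for each chunk, run the memory-bound access phase at
 * low frequency, then the compute-bound execute phase at high frequency. */
double dae_sum(const double *a, const size_t *idx, size_t n)
{
    double total = 0.0;
    for (size_t begin = 0; begin < n; begin += CHUNK) {
        size_t end = (n - begin > CHUNK) ? begin + CHUNK : n;
        set_freq_low();
        access_phase(a, idx, begin, end);
        set_freq_high();
        total += execute_phase(a, idx, begin, end);
    }
    return total;
}
```

The indirect access `a[idx[i]]` is used because irregular, memory-bound accesses are the case the abstract highlights. The sketch leaves out the harder problems the talk addresses, such as aliasing, complex control flow, and statically unknown runtime events.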