Advancing Dependability of SRAM-FPGA

Towards Improved Mitigation and Fault Injection Strategies for Single Event Upset

Time: Fri 2025-11-14 14.00

Location: Kollegiesalen, Brinellvägen 8, Stockholm

Video link: https://kth-se.zoom.us/j/69161493932

Language: English

Subject area: Information and Communication Technology

Doctoral student: Trishna Rajkumar, Elektronik och inbyggda system

Opponent: Professor Peeter Ellervee, Tallinn University of Technology, Tallinn, Estonia

Supervisor: Associate professor Johnny Öberg, Elektronik och inbyggda system; Professor Elena Dubrova, Elektronik och inbyggda system

QC 20251020

Abstract

SRAM-based field programmable gate arrays (SRAM-FPGAs) are a class of programmable integrated circuits that use static random-access memory (SRAM) cells to configure their logic and routing resources. These devices play a pivotal role in digital computing owing to their inherent parallelism, high logic capacity and reconfigurability. These attributes have led to their widespread adoption in space missions, aerospace, medical devices, data centers, nuclear reactors and high-energy particle accelerators. In hazardous radiation environments, SRAM-FPGAs are valued not only for their high performance and cost-effectiveness but also for their ability to support design updates with minimal manual intervention and no physical hardware modifications. However, these devices are vulnerable to single event upset (SEU), a radiation-induced error that inverts SRAM cell contents. Since the configuration memory, which stores the FPGA functionality and the routing information, is composed of SRAM cells, such changes can have catastrophic consequences in safety-critical applications. Continued CMOS scaling further exacerbates this problem through reduced feature size and an increased volume of configuration bits. As devices shrink, they become more susceptible to multiple errors that weaken the traditional mitigation schemes. Moreover, the exponential growth in configurable elements increases the cost and complexity of validation techniques such as fault injection. Addressing these dependability challenges in SRAM-FPGAs forms the core objective of this thesis. To achieve this, we introduce techniques to (1) detect failures in the scrubber, a widely used mitigation scheme for configuration memory; (2) optimize fault injection to reduce experimental time; and (3) identify vulnerable areas of the FPGA fabric to streamline dependability efforts.

To detect scrubber failures, this work introduces two non-invasive, log-based frameworks: a Markov chain model for scrubber health monitoring, and AnoDe, a self-supervised failure detection system. They cater to varying levels of domain knowledge, with the former leveraging IP specifications and the latter requiring none, making them adaptable to diverse operational scenarios. For optimizing fault injection, a Bayesian sampling framework is proposed to reduce the number of injections by integrating prior knowledge with the observed data. This method maintains the statistical confidence and the black-box nature of classical statistical fault injection while addressing the inflated sample size caused by parameter uncertainty. Finally, the thesis presents learning-based strategies using Monte Carlo Tree Search and Long Short-Term Memory models to identify critical bits in the configuration memory without reverse engineering the FPGA layout. These approaches integrate seamlessly into existing fault injection setups and are particularly valuable in environments with limited access to radiation facilities. Collectively, the methods developed in this work advance the state of the art in SEU resilience and enable the broader adoption of commercial SRAM-FPGAs in safety-critical domains.
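
As a rough illustration of why parameter uncertainty inflates the sample size in classical statistical fault injection, the sketch below uses the standard finite-population formula for estimating a proportion together with a conjugate Beta-Binomial update. It is not the thesis's framework: the function names, the uniform prior, and the example numbers are assumptions made only for this illustration.

```python
import math


def sfi_sample_size(population, margin, z=1.96, p=0.5):
    """Injections needed to estimate a failure proportion.

    population: number of possible faults (e.g. configuration bits)
    margin:     desired margin of error (0.01 means +/- 1 %)
    z:          normal quantile for the confidence level (1.96 ~ 95 %)
    p:          assumed failure proportion; 0.5 is the worst case used
                when nothing is known about the design
    """
    denom = 1.0 + (margin ** 2) * (population - 1) / (z ** 2 * p * (1.0 - p))
    return math.ceil(population / denom)


def beta_update(alpha, beta, failures, injections):
    """Conjugate update of a Beta(alpha, beta) belief about p."""
    return alpha + failures, beta + (injections - failures)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    bits = 1_000_000  # hypothetical number of configuration bits

    # Worst-case assumption p = 0.5 gives the largest sample size.
    print("worst case:", sfi_sample_size(bits, 0.01))

    # Hypothetical pilot campaign: 3 failures in 100 injections,
    # starting from a uniform Beta(1, 1) prior.
    a, b = beta_update(1, 1, failures=3, injections=100)
    p_hat = beta_mean(a, b)

    # Plugging in the updated estimate shrinks the required sample.
    print("with prior knowledge:", sfi_sample_size(bits, 0.01, p=p_hat))
```

Plugging a point estimate back into the formula is a simplification; a full Bayesian treatment would work with the whole posterior distribution rather than its mean.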

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-371773