
Improving the Precision of Automatic Program Repair with Machine Learning

Time: Fri 2023-02-24, 13:30

Location: Kollegiesalen, Brinellvägen 6

Video link: https://kth-se.zoom.us/j/63393781380

Language: English

Respondent: He Ye, Theoretical Computer Science, TCS

Opponent: Associate Professor Baishakhi Ray, Columbia University

Supervisors: Professor Martin Monperrus, Theoretical Computer Science, TCS; Professor Benoit Baudry, Software and Computer Systems, SCS

QC 20230208


Automatic program repair is a research field that aims to eliminate software bugs and vulnerabilities automatically. It holds great promise for reducing debugging costs and increasing the productivity of software development. A test suite is one of the most widely used specifications in automatic program repair: it specifies correct program behavior and guides patch generation. However, a test suite is an incomplete specification that covers only a limited set of input-output pairs. As a result, automatic program repair can generate patches that merely satisfy the test suite yet fail to repair the buggy program in general. Generating a large number of such incorrect patches leads to low precision in automatic program repair.
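The overfitting problem can be made concrete with a small example. The following is a minimal, hypothetical Python illustration. The functions `buggy_abs`, `overfitting_patch`, and `correct_patch` and the two-case test suite are invented for this sketch and are not taken from the thesis. Precision is computed here as the ratio of correct patches to all generated (test-passing) patches, which follows the wording used later in this abstract.

```python
# Hypothetical example: the intended behavior is the absolute value of x.
def buggy_abs(x: int) -> int:
    return x  # bug: negative inputs are returned unchanged

# A test suite is a finite set of input-output pairs, i.e. an incomplete specification.
TEST_SUITE = [(3, 3), (-2, 2)]

def overfitting_patch(x: int) -> int:
    # Passes every test by special-casing the one failing input, without fixing the bug.
    return 2 if x == -2 else x

def correct_patch(x: int) -> int:
    return -x if x < 0 else x

def passes(program, tests) -> bool:
    """True if the program produces the expected output for every test."""
    return all(program(inp) == out for inp, out in tests)

def precision(num_correct: int, num_generated: int) -> float:
    """Precision = correct patches / all generated patches."""
    return num_correct / num_generated if num_generated else 0.0

candidates = {"overfitting": overfitting_patch, "correct": correct_patch}
plausible = [name for name, patch in candidates.items() if passes(patch, TEST_SUITE)]

print(plausible)                     # ['overfitting', 'correct']
print(overfitting_patch(-5))         # -5: still wrong outside the test suite
print(precision(1, len(plausible)))  # 0.5
```

Both candidates satisfy the test suite, but only one of them repairs the bug in general. Half of the generated patches are therefore incorrect, which is exactly the low-precision situation described above.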

In this thesis, we focus on improving the precision of automatic program repair from three perspectives: patch generation, patch assessment in practice, and patch assessment for scientific usage. This thesis makes the following contributions to automatic program repair. First, to increase the precision of patch generation, we propose two learning-based automatic program repair approaches that encourage the generation of more correct patches with fewer candidate patches. Second, to increase the precision of patch assessment in practice, we propose building a probabilistic model based on static code features to discard incorrect patches, thereby increasing the ratio of correct patches to all generated patches. Third, to increase the precision of patch assessment for scientific usage, we propose using automatically generated test cases to discard incorrect patches.
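To show how filtering patches with a probabilistic model raises the ratio of correct patches to all retained patches, here is a minimal sketch in Python. It assumes a logistic-regression model over three static features of a patch diff. The `Patch` fields, the feature set, the `WEIGHTS` values, the 0.5 threshold, and the labeled candidates are all hypothetical choices made for illustration. The abstract does not state which model or features the thesis uses, so this is not the thesis's actual classifier.

```python
import math
from dataclasses import dataclass

@dataclass
class Patch:
    # Hypothetical static features extracted from a patch's code diff.
    name: str
    adds_constant_condition: int  # 1 if the patch adds a check against a literal
    deletes_statement: int        # 1 if the patch only removes code
    lines_changed: int
    correct: bool                 # ground-truth label, known only in this toy example

# Hypothetical weights; a real model would learn them from labeled patches.
WEIGHTS = {"bias": 1.0, "adds_constant_condition": -3.0,
           "deletes_statement": -1.5, "lines_changed": -0.2}

def p_correct(patch: Patch) -> float:
    """Logistic model: P(correct | features) = sigmoid(w . x + b)."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["adds_constant_condition"] * patch.adds_constant_condition
         + WEIGHTS["deletes_statement"] * patch.deletes_statement
         + WEIGHTS["lines_changed"] * patch.lines_changed)
    return 1.0 / (1.0 + math.exp(-z))

def filter_patches(patches, threshold: float = 0.5):
    """Discard patches that the model considers likely incorrect."""
    return [p for p in patches if p_correct(p) >= threshold]

def precision(patches) -> float:
    """Ratio of correct patches to all patches in the list."""
    return sum(p.correct for p in patches) / len(patches) if patches else 0.0

candidates = [
    Patch("p1", 0, 0, 1, True),
    Patch("p2", 1, 0, 2, False),
    Patch("p3", 0, 1, 1, False),
    Patch("p4", 0, 0, 3, True),
    Patch("p5", 0, 0, 2, False),
]
kept = filter_patches(candidates)

print([p.name for p in kept])                            # ['p1', 'p4', 'p5']
print(precision(candidates), round(precision(kept), 3))  # 0.4 0.667
```

In this toy run the filter discards two incorrect patches and keeps both correct ones, so precision rises from 0.4 to about 0.667. One incorrect patch (`p5`) still slips through, which shows that such a filter improves precision without guaranteeing it.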