Interpretable, Interaction-Aware Vehicle Trajectory Prediction with Uncertainty
Time: Fri 2021-02-26 10:00
Subject area: Computer Science
Doctoral student: Joonatan Mänttäri, Robotics, Perception and Learning (RPL)
Opponent: Professor Michael Felsberg, Linköping University
Supervisor: Associate Professor John Folkesson, Robotics, Perception and Learning (RPL)
Autonomous driving technologies have made great strides in recent years, with several companies and research groups close to producing vehicles with full autonomy. Self-driving cars promise many advantages, including increased traffic safety and ride-sharing capabilities that reduce environmental impact. To achieve these benefits, many modules must work together on an autonomous platform to solve the required tasks. One of these tasks is predicting the future positions and maneuvers of surrounding human drivers. An autonomous driving platform must be able to reason about, and predict, the future trajectories of other agents in traffic so that it can ensure its planned maneuvers remain safe and feasible throughout their execution. Because many traffic scenarios are stochastic, these predictions should also account for the inherent uncertainty caused by both the road structure and the driving styles of human drivers. Since vehicles in many traffic scenarios change their behavior based on the actions of others, for example by yielding or changing lanes, these interactions should be taken into account to produce more robust predictions. Lastly, the prediction methods should provide a level of transparency and traceability. On a self-driving platform with many safety-critical tasks, it is important to be able to identify where an error occurred in a failure case, and what caused it. This helps prevent the problem from recurring, and can also aid in finding new and relevant test cases for simulation.
In this thesis, we present a deep-learning-based framework for vehicle trajectory prediction that fulfills these criteria. We first show that, by operating on a generic representation of the traffic scene, our model can implicitly learn interactions between vehicles by capturing the spatio-temporal features in the data with recurrent and convolutional operations, and can produce predictions for all vehicles simultaneously. We then explore different methods for incorporating uncertainty about the actions of human drivers, and show that Conditional Variational Autoencoders (CVAEs) are well suited to our prediction method, allowing it to produce multi-modal predictions that account for different maneuvers as well as variations within them. To address the issue of transparency in deep learning methods, we also develop an interpretability framework for deep learning models operating on sequences of images. Using the proposed Temporal Masks method, we can show, both spatially and temporally, what the models base their output on, for all modes of input and without requiring a dedicated model architecture. Finally, all these extensions are incorporated into one method, and the resulting prediction module is implemented and interfaced with a real-world autonomous driving research platform.
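The CVAE idea mentioned above can be illustrated with a minimal sketch: at test time, a context vector encoding the observed trajectory conditions a decoder, and drawing several latent samples z ~ N(0, I) yields several candidate futures, one per mode. Everything here is a hypothetical stand-in for the thesis's trained networks: the layer sizes, the linear "encoder" and "decoder", and the random weights are purely illustrative.

```python
import numpy as np

# Illustrative CVAE-style sampling for multi-modal trajectory prediction.
# NOT the thesis implementation: weights are random stand-ins for trained models.
rng = np.random.default_rng(0)

OBS_LEN, PRED_LEN, LATENT_DIM, HIDDEN = 8, 12, 4, 16

# Hypothetical "trained" weights, randomly initialised for illustration.
W_enc = rng.standard_normal((OBS_LEN * 2, HIDDEN)) * 0.1
W_dec = rng.standard_normal((HIDDEN + LATENT_DIM, PRED_LEN * 2)) * 0.1

def encode(history):
    """Encode an observed (OBS_LEN, 2) trajectory into a context vector."""
    return np.tanh(history.reshape(-1) @ W_enc)

def decode(context, z):
    """Decode one future (PRED_LEN, 2) trajectory from context and latent z."""
    out = np.concatenate([context, z]) @ W_dec
    return out.reshape(PRED_LEN, 2)

def predict_multimodal(history, n_samples=5):
    """Sample several latent vectors to obtain a set of candidate futures."""
    ctx = encode(history)
    zs = rng.standard_normal((n_samples, LATENT_DIM))  # z ~ N(0, I) at test time
    return np.stack([decode(ctx, z) for z in zs])      # (n_samples, PRED_LEN, 2)

history = np.cumsum(rng.standard_normal((OBS_LEN, 2)), axis=0)
preds = predict_multimodal(history)
print(preds.shape)  # (5, 12, 2): five candidate future trajectories
```

Each latent sample produces a distinct trajectory, which is how a single conditioning input can yield predictions covering different maneuvers (e.g. lane keep vs. lane change) as well as variation within a maneuver.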
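The intuition behind temporal interpretability for image-sequence models can also be sketched. The Temporal Masks method itself learns a mask jointly over the sequence; the toy version below, a frame-wise occlusion baseline with an assumed toy "model", only illustrates the underlying principle: perturb each time step and measure how much the output changes.

```python
import numpy as np

# Simplified perturbation-based temporal saliency (a hypothetical stand-in,
# not the thesis's Temporal Masks method): occlude each frame in turn and
# record the change in the model's output.
rng = np.random.default_rng(1)
T, H, W = 6, 4, 4
frames = rng.random((T, H, W))

# Toy model: a weighted sum over frames, so true frame importance is known.
frame_weights = np.array([0.05, 0.1, 0.6, 0.15, 0.05, 0.05])

def model(seq):
    return float(np.sum(seq.mean(axis=(1, 2)) * frame_weights))

baseline = model(frames)
importance = np.empty(T)
for t in range(T):
    occluded = frames.copy()
    occluded[t] = 0.0                       # mask out frame t
    importance[t] = abs(baseline - model(occluded))

print(importance.argmax())  # frame 2, which dominates the toy model's output
```

Because this probes only the input-output behavior, it applies to any sequence model without a dedicated architecture, which is the property the abstract highlights.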