Weakly-Supervised Action Localization

Two very exciting seminars by two notable speakers:
Josef Sivic and Cees Snoek.

Time: Fri 2017-01-13 15.00 - 16.20

Location: Room 304, Teknikringen 14

Participating: Cees Snoek

Abstract: We strive for spatiotemporal localization of actions in videos. The state of the art relies on action proposals at test time and selects the best one with a classifier that demands carefully annotated boxes at train time. Annotating action boxes in video is cumbersome, tedious, and error-prone. In this talk I will highlight two recent approaches for action localization that avoid box annotations. In the first one, we propose to annotate actions in video with points on a sparse subset of frames only. We introduce an overlap measure between action proposals and points and incorporate them all into the objective of a non-convex Multiple Instance Learning optimization. In the second one, we present VideoLSTM, a new architecture for end-to-end sequence learning of actions in video. Starting from the soft-Attention LSTM, VideoLSTM makes three novel contributions. First, to exploit the spatial correlation, we hardwire convolutions in the soft-Attention LSTM architecture. Second, we introduce motion to guide the attention towards the relevant spatiotemporal locations. And finally, we demonstrate how the attention from VideoLSTM can be used for action localization by relying on just the action class label.

Cees Snoek: www.ceessnoek.info