Research Seminar on AI: Video Object Segmentation

Tuesday, 29 June 2021, 4:00 p.m.

The event series "Research Seminar on AI", hosted by the Cluster of Excellence "Internet of Production" (IoP) and the AI Center, starts with a talk by Paul Voigtlaender. He will introduce the computer vision task of video object segmentation (VOS) and one of his methods for tackling it. In VOS, one or more objects in a video must be tracked over time at the pixel level. VOS has many important applications and is a challenging task. To address these challenges, the talk introduces Fast End-to-End Embedding Learning for Video Object Segmentation (FEELVOS). FEELVOS is a simple and fast method which, unlike previous work on VOS, does not rely on first-frame fine-tuning. To segment a video, FEELVOS uses, for each frame, a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame. In contrast to previous work, this embedding is used only as internal guidance for a convolutional network. A novel dynamic segmentation head allows the network, including the embedding, to be trained end-to-end for the multiple-object segmentation task with a cross-entropy loss. At the time of publication, FEELVOS achieved a new state of the art in VOS without fine-tuning.
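
To make the global matching idea more concrete, the following is a minimal sketch and not the FEELVOS implementation. It assumes that each pixel already has an embedding vector, that frames are flattened to lists of pixels, and that the distance between two embeddings is their squared Euclidean distance squashed into the range [0, 1). The function names `embedding_distance` and `global_matching` are hypothetical and chosen for illustration only.

```python
import math


def embedding_distance(e_p, e_q):
    """Distance in [0, 1) between two embedding vectors (assumed form).

    Computes 1 - 2 / (1 + exp(s)) for the squared Euclidean distance s,
    written as tanh(s / 2), which is the same value but avoids overflow.
    """
    squared = sum((a - b) ** 2 for a, b in zip(e_p, e_q))
    return math.tanh(squared / 2.0)


def global_matching(current, first, first_mask):
    """Nearest-neighbour distance from each current pixel to the object.

    current:    embeddings of the current frame, one vector per pixel.
    first:      embeddings of the first frame, one vector per pixel.
    first_mask: booleans, True where a first-frame pixel belongs to the object.
    Returns one distance per current-frame pixel; small values suggest
    the pixel resembles the object as it appeared in the first frame.
    """
    object_embeddings = [e for e, m in zip(first, first_mask) if m]
    if not object_embeddings:
        # Illustrative choice: with no object pixels, report maximal distance.
        return [1.0] * len(current)
    return [
        min(embedding_distance(e_p, e_q) for e_q in object_embeddings)
        for e_p in current
    ]


# Toy example: two pixels per frame, two-dimensional embeddings.
first_frame = [(0.0, 0.0), (10.0, 10.0)]
object_mask = [True, False]
current_frame = [(0.0, 0.0), (3.0, 4.0)]
distances = global_matching(current_frame, first_frame, object_mask)
```

In this toy example the first current-frame pixel has the same embedding as the object pixel and gets distance 0, while the second lies far away and gets a distance close to 1. A local matching step could presumably be sketched in the same way, by comparing against the previous frame and restricting the candidate pixels to a spatial neighbourhood. In the method described above, such distance maps serve only as internal guidance for the convolutional network and are not the final segmentation.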

Paul Voigtlaender is a fifth-year Ph.D. student at the Chair for Computer Vision with Prof. Dr. Bastian Leibe at RWTH Aachen University. He received his M.Sc. and B.Sc. degrees in Computer Science from RWTH Aachen University. His research interests include Video Object Segmentation, Multi-Object Tracking and Segmentation, Single Object Tracking, and semi-automatic annotation of datasets.