Large-scale Data-driven Video Analysis

Videos have become ubiquitous on the Internet, which has greatly increased the need to analyze their semantic content automatically for vision applications such as classification, recognition, and tracking. Recently, deep learning-based methods have proved very successful in automatic video analysis and have delivered state-of-the-art results on various video analysis tasks. However, existing algorithms usually have very high complexity and heavy storage and processing requirements, which drastically limits their scalability and real-time capability.

To address this, we develop algorithms that drastically reduce the computational and storage requirements of video analysis so that it can run in real time. Specifically, we focus on algorithms that carry out a portion of the analysis directly on the compressed video, using codec-specific features such as motion vectors, coding block types, and transform coefficients for a preliminary semantic analysis. We also develop low-complexity, hierarchical algorithms for vision tasks such as action recognition and person/object tracking, which aim to decrease the computational burden of state-of-the-art feature-based machine learning and deep learning approaches.
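As a rough illustration of the hierarchical idea, the sketch below uses motion vectors for a cheap first pass and forwards only frames with noticeable motion to a more expensive analysis stage. It is a minimal example under stated assumptions, not the method from the publications listed here: the motion vectors are assumed to have been extracted from the bitstream already and are given as plain (dx, dy) pairs per frame, and the function names and the threshold value are hypothetical.

```python
from math import hypot


def mean_motion_magnitude(motion_vectors):
    """Average length of one frame's block motion vectors.

    motion_vectors is a list of (dx, dy) pairs (assumed input format).
    A frame without motion vectors is treated as having no motion.
    """
    if not motion_vectors:
        return 0.0
    total = sum(hypot(dx, dy) for dx, dy in motion_vectors)
    return total / len(motion_vectors)


def select_frames_for_full_analysis(frames_mvs, threshold=1.0):
    """Return the indices of frames whose mean motion exceeds the threshold.

    Only these frames would be decoded and passed to a costly
    pixel-domain model; the threshold is an illustrative value.
    """
    return [
        index
        for index, mvs in enumerate(frames_mvs)
        if mean_motion_magnitude(mvs) > threshold
    ]


# Toy input: four frames with hand-made motion vectors.
example_frames = [
    [(0, 0), (0, 0)],   # static frame
    [(3, 4), (0, 0)],   # one block moves by 5 pixels
    [(0.5, 0)],         # slight motion, below the threshold
    [],                 # no motion vectors available
]
selected = select_frames_for_full_analysis(example_frames)
```

In this toy input only the second frame passes the screening step, so the expensive stage would run on one frame out of four. A real system would tune the threshold per codec and content, and could also use coding block types or transform coefficients as screening features.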

Related publications:

Vignesh Srinivasan, Sebastian Lapuschkin, Cornelius Hellge, Klaus-Robert Müller, and Wojciech Samek:
Interpretable human action recognition in compressed domain,
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), New Orleans, LA, USA, March 2017.

Serhan Gül, Jan Timo Meyer, Thomas Schierl, Cornelius Hellge, and Wojciech Samek:
Hybrid video object tracking in H.265/HEVC video streams,
Proceedings of the International Workshop on Multimedia Signal Processing (MMSP), Montreal, Canada, September 2016.

Vignesh Srinivasan, Serhan Gül, Sebastian Bosse, Jan Timo Meyer, Thomas Schierl, Cornelius Hellge, and Wojciech Samek:
On the robustness of action recognition methods in compressed and pixel domain,
Proceedings of the European Workshop on Visual Information Processing (EUVIP), Marseille, France, October 2016.