Pose and gesture analysis play an important role in many applications, such as human-machine interaction, behaviour analysis, video surveillance, annotation, search and retrieval, motion capture for the entertainment and media industry, and interactive web-based applications. For many years, we have been researching and developing real-time depth and video analysis algorithms, focusing mainly on body, head, eye, hand and finger tracking as well as gesture and gait analysis.
These modules are used in a range of applications, including the annotation of gestural behaviour in humanities research, ambient assisted living solutions, interaction with applications and media content, and novel video communication applications.
Low-cost depth sensors have made real-time recognition of a large variety of more complex static and dynamic hand gestures possible. This allows users to interact with media content or manipulate objects without physical contact, in an easy and non-intrusive way. The video below shows an example of orientation-independent gesture recognition based on a depth sensor.
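To illustrate what "orientation-independent" can mean in practice, here is a minimal sketch of one common approach; it is an assumption for illustration, not the method used in our system. It assumes a depth image in millimetres with 0 marking missing measurements, and that the hand is the object closest to the sensor. The hand pixels are segmented with a depth band, and the point set is then rotated so that its main axis, estimated from second-order central moments, lies along x. Features computed on the normalized points no longer depend on how the hand is rotated in the image plane. The function names and the 150 mm band are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def segment_nearest(depth: List[List[int]], band_mm: int = 150) -> List[Point]:
    """Return (x, y) pixels lying within band_mm of the closest valid depth.

    Assumes depth values in millimetres, 0 = no measurement, and that the
    hand is the object nearest to the sensor (hypothetical assumption).
    """
    valid = [d for row in depth for d in row if d > 0]
    if not valid:
        return []
    nearest = min(valid)
    return [(float(x), float(y))
            for y, row in enumerate(depth)
            for x, d in enumerate(row)
            if 0 < d <= nearest + band_mm]


def principal_angle(points: List[Point]) -> float:
    """Angle (radians) of the main axis, from second-order central moments."""
    if not points:
        raise ValueError("empty point set")
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    mu20 = sum((x - mx) ** 2 for x, _ in points)
    mu02 = sum((y - my) ** 2 for _, y in points)
    mu11 = sum((x - mx) * (y - my) for x, y in points)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def normalize_orientation(points: List[Point]) -> List[Point]:
    """Centre the points and rotate them so the main axis lies along x."""
    theta = principal_angle(points)
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    c, s = math.cos(theta), math.sin(theta)
    # Rotation by -theta: the main-axis direction (cos t, sin t) maps to (1, 0).
    return [((x - mx) * c + (y - my) * s, -(x - mx) * s + (y - my) * c)
            for x, y in points]
```

For example, a diagonal and an anti-diagonal "hand" stroke normalize to the same set of x coordinates. Note that the main axis is only defined up to 180°, so a practical recognizer would still have to resolve that ambiguity, for instance from the wrist position.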
In humanities research fields such as psycholinguistics and neuropsychology, researchers analyse video recordings of interview sessions. Current practice is to annotate the video content manually in order to develop or validate theories. Given the tremendous amount of video material and the time required for manual annotation, the need for automatic video analysis is obvious. Our pose and gesture algorithms enable fast, robust and automatic annotation of human motion and behaviour in a wide variety of scenarios.
In the German-funded project AUVIS (2012-2015), a video analysis framework was developed that automatically annotates gestural behaviour according to the NEUROGES coding system.
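Automatic annotation essentially turns tracking output into time-aligned labels of the kind a human coder would otherwise enter by hand. The following is a deliberately simplified, hypothetical sketch: it assumes per-frame 2D hand positions from a tracker and splits the recording into "movement" and "rest" intervals using a speed threshold. The labels, the threshold and the function names are illustrative assumptions; they do not reproduce the NEUROGES categories or the AUVIS implementation.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float]       # hand position in pixels for one frame
Segment = Tuple[float, float, str]   # (start_s, end_s, label)


def frame_labels(positions: List[Position], fps: float,
                 speed_thresh: float) -> List[str]:
    """Label each frame 'movement' or 'rest' from the hand speed in px/s."""
    labels = []
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        speed = math.hypot(x1 - x0, y1 - y0) * fps
        labels.append("movement" if speed > speed_thresh else "rest")
    # Frame 0 has no predecessor, so it inherits the label of frame 1.
    return labels[:1] + labels if labels else ["rest"] * len(positions)


def to_segments(labels: List[str], fps: float) -> List[Segment]:
    """Merge runs of identical frame labels into time intervals in seconds."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start / fps, i / fps, labels[start]))
            start = i
    return segments
```

With 25 fps input, a hand that rests for five frames, moves for five and rests again yields three intervals: rest from 0.0 to 0.2 s, movement from 0.2 to 0.4 s, and rest from 0.4 to 0.6 s. A real annotation system would additionally smooth the per-frame labels and discard implausibly short intervals before exporting them to an annotation tool.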
Early work on hand tracking and finger gesture analysis was carried out for avatar animation. Based on stable and robust hand and head tracking, the 2D positions of the hands and the head rotation are mapped to body animation parameters (BAPs) as defined in the video standard MPEG-4 Part 2 (Visual). These animation parameters are sent to the receiving side at a very low bandwidth compared to full video transmission. Thanks to the high-quality skin-colour segmentation, the system also provides gesture recognition: ten different gestures from American Sign Language are recognized and immediately reproduced by the avatar. The figure below shows the animated avatar (left) and the corresponding input video (right).
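The hand positions used for the avatar come from skin-colour segmentation. As a minimal sketch of that idea, assuming a fixed chrominance threshold in YCbCr space (a classic textbook technique swapped in for illustration; the thresholds and function names are hypothetical and not taken from the described system), a skin mask and the 2D centroid of the skin pixels could be computed as follows.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255

# Hypothetical thresholds: a commonly cited Cb/Cr skin range,
# not the values used in the system described above.
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)


def rgb_to_cbcr(r: int, g: int, b: int) -> Tuple[float, float]:
    """Full-range BT.601 (JPEG/JFIF) chroma components of an RGB pixel."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel: Pixel) -> bool:
    """True if the pixel's chroma falls inside the assumed skin range."""
    cb, cr = rgb_to_cbcr(*pixel)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_mask(image: List[List[Pixel]]) -> List[List[bool]]:
    """Per-pixel skin/non-skin mask for a row-major RGB image."""
    return [[is_skin(p) for p in row] for row in image]


def centroid(mask: List[List[bool]]) -> Tuple[float, float]:
    """2D centroid (x, y) of all skin pixels in the mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("no skin pixels found")
    n = len(pts)
    return sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n
```

Chrominance thresholds ignore luminance, which gives some robustness to lighting changes. In a real tracker, the mask would first be split into connected regions so that the face and each hand get their own centroid, rather than one centroid for all skin pixels as in this sketch.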
- O. Schreer, S. Masneri, H. Lausberg, H. Skomroch: Coding Hand Movement Behavior and Gesture with NEUROGES Supported by Automatic Video Analysis, 9th International Conference on Methods and Techniques in Behavioral Research (Measuring Behavior 2014), Wageningen, The Netherlands, August 27-29, 2014.
- O. Schreer, S. Masneri: Automatic Video Analysis for Annotation of Human Body Motion in Humanities Research, International Workshop on Multimodal Corpora, held in conjunction with the 9th edition of the Language Resources and Evaluation Conference, Reykjavik, Iceland, May 26-31, 2014.
- O. Schreer, D. Schneider: Supporting linguistic research using generic automatic audio/video analysis, Language Documentation & Conservation Special Publication: Potentials of Language Documentation: Methods, Analyses, and Utilization, ed. by Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts, and Paul Trilsbeek, ISBN 978-0-9856211-0-0, no. 6, pp. 39–46, 2012.