Wolfgang Paier, research associate in the "Vision and Imaging Technologies" department at Fraunhofer Heinrich Hertz Institute (HHI), has won this year's Best Paper Award presented by the European Conference on Visual Media Production (CVMP). He received the award for his paper "Neural Face Models for Example-Based Visual Speech Synthesis," in which he explores an example-based animation approach combined with a neural face model to animate realistic faces. The paper is co-authored by Dr. Anna Hilsmann and Prof. Peter Eisert, who work in the same department at Fraunhofer HHI.
The CVMP conference was held virtually on December 7-8, 2020. The event brings together production and post-production specialists from the worlds of film, broadcast and games with researchers from the fields of imaging and graphics. Every year, the conference organizers recognize the best full-paper submission with a 1,000 euro prize. This is the second time that Fraunhofer HHI has won the award.
Wolfgang Paier completed his studies at FU Berlin with a Master of Science degree in Computer Science in 2013. He wrote his thesis, entitled "Acquisition of 3D-Head-Models using SLR-Cameras and RGBZ-Sensors," at Fraunhofer HHI. Prof. Peter Eisert, who leads the "Vision and Imaging Technologies" department, and former Fraunhofer HHI researcher David Blumenthal-Barby provided academic guidance and supervision. Wolfgang Paier has been part of the institute's "Computer Vision and Graphics" research group since 2011, first as a student and, from 2013, as a research associate. His work addresses the capture and animation of human faces for driving realistic avatars. In his research, he combines new technologies such as deep neural networks with traditional model-based methods.
Creating realistic animations of human faces with computer graphics models is still a challenging task. The award-winning paper addresses this problem by exploring example-based animation techniques for speech synthesis, focusing on the mouth movements of the depicted person. For the animation, an actress was recorded in Fraunhofer HHI's volumetric studio while speaking words, sentences and a short text. These recordings were divided into individual visemes, the distinct mouth shapes produced during speech. With the help of a deep neural network, this data was then used to generate an animatable face model. Unlike other face-model approaches, this model captures both geometry and texture and can thus realistically represent complex regions such as the mouth cavity and the eyes.
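The example-based idea described above can be sketched in a few lines: each phoneme of a target utterance is mapped to a viseme class, and a matching mouth segment is looked up in the recorded database. The phoneme-to-viseme table, the segment identifiers, and the selection rule below are illustrative assumptions, not details taken from the paper, which additionally uses a neural face model to render the selected examples.

```python
# Illustrative sketch of example-based viseme selection.
# The mapping table and segment database are assumptions for
# demonstration, not taken from the award-winning paper.

# Hypothetical phoneme -> viseme class mapping
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "o": "rounded", "u": "rounded",
}

# Hypothetical database: viseme class -> recorded example segments
SEGMENT_DB = {
    "bilabial": ["seg_017", "seg_042"],
    "labiodental": ["seg_003"],
    "open": ["seg_101", "seg_115"],
    "rounded": ["seg_220"],
}

def select_segments(phonemes, db=SEGMENT_DB):
    """Map each phoneme to its viseme class and pick the first
    recorded example segment for it. A real system would instead
    optimize the choice for smooth transitions between segments."""
    sequence = []
    for ph in phonemes:
        viseme = PHONEME_TO_VISEME.get(ph)
        if viseme is None or viseme not in db:
            continue  # skip phonemes without a recorded example
        sequence.append(db[viseme][0])
    return sequence

print(select_segments(["m", "a", "p", "u"]))
# ['seg_017', 'seg_101', 'seg_017', 'seg_220']
```

In practice, the concatenated example segments would then be passed through the learned face model, which generates consistent geometry and texture for rendering.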
"We are very pleased that our novel research approach is gaining more visibility through this award. Data-driven methods based on neural networks will be the future of realistic facial animation," said Dr. Anna Hilsmann, head of the "Computer Vision and Graphics" group at Fraunhofer HHI.
The paper is part of the EU project Content4All, which aims to make content more accessible for the sign language community by implementing an automatic sign-translation workflow with a photo-realistic 3D human avatar.
You can find further information on the research topic here.