When applying deep learning to computer vision: is the whole greater than the sum of two parts?

July 1, 2020

Dr. Anna Hilsmann, head of the Computer Vision & Graphics (CVG) research group in the Department for Vision and Imaging Technologies (VIT) at Fraunhofer Heinrich Hertz Institute, was recently awarded 1.6 million euros in funding from the Federal Ministry of Education and Research (BMBF) for her four-year project entitled “Model-based deep learning for computer vision problems (MoDL).” The project will shed light on the value of integrating a priori knowledge into deep neural networks in the context of computer vision.

BMBF recognizes the disparity between the high number of qualified female scientists and the low number of women conducting research in the field of artificial intelligence (AI). To tap this potential and enhance gender diversity in a growing and dynamic field, BMBF opened a call for funding to promote female researchers exploring innovative themes in AI. Dr. Hilsmann’s previous research lies at the intersection of computer vision, machine learning, computer graphics, and visual computing, with applications in multimedia, industry, augmented reality, security, and medical technology. In response to the call, she proposed a project that will demonstrate the value of combining two forms of knowledge in artificial intelligence: (a) a priori knowledge generated from physical, heuristic, or statistical models; and (b) deep neural network–derived knowledge.

Independently, each form of knowledge has an Achilles heel. The former can only approximate complex relationships. The latter is highly dependent on the representativeness (i.e., quality and quantity) of training data and suffers from the “black box” problem. MoDL addresses these weaknesses by developing solutions that combine both forms of knowledge. Specifically, by integrating model-based knowledge into deep neural networks, Dr. Hilsmann will enhance the interpretability and generalizability of deep neural network models and reduce the amount of training data required to address complex questions in computer vision. As use cases, Dr. Hilsmann will consider three main computer vision tasks for the generation of high-quality models from visual data: (i) reconstruction of three-dimensional geometry of complex objects; (ii) acquisition and modeling of non-rigid movements; and (iii) estimation and modeling of reflection properties, texture, and shading. Beyond these use cases, the results of this study promise to have broader applications in the field of artificial intelligence.