MoDL

MoDL: Model-based Deep Learning for Computer Vision Problems

Duration: July 2020 - June 2024

MoDL will demonstrate the value of combining two forms of information in artificial intelligence: (a) a priori knowledge generated from physical, heuristic, or statistical models; and (b) knowledge derived from deep neural networks. Independently, each form of knowledge has an Achilles heel. The former can only provide an approximation of complex relationships. The latter is highly dependent on the representativeness (i.e., quality and quantity) of training data and suffers from the “black box” conundrum. MoDL addresses these weaknesses by developing solutions that combine both forms of information. Specifically, by integrating model-based knowledge into deep neural networks, we will enhance the interpretability and generalizability of deep neural network models and reduce the amount of training data required to address complex questions in computer vision.

As use cases, MoDL will consider three main computer vision tasks for the generation of high-quality models from visual data: (i) reconstruction of the three-dimensional geometry of complex objects; (ii) acquisition and modeling of non-rigid movements; and (iii) estimation and modeling of reflection properties, texture, and shading. The results of this project, however, promise to have broader applications in the field of artificial intelligence.

To generate strong, robust, and generalizable models, this project takes an interdisciplinary approach, combining computer vision with computer graphics in an analysis-by-synthesis scheme for deep learning: computer graphics models can be used as a priori knowledge to improve computer vision methods. The improved algorithms yield better data, which in turn leads to better models.
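The analysis-by-synthesis idea can be illustrated with a toy example. The sketch below is not the project's method: it assumes a simple Lambertian shading model as the computer graphics prior, known surface normals, and plain gradient descent in place of a deep network. All names (`render`, `fit_by_synthesis`, the test scene) are hypothetical. The loop renders a synthetic observation from the current parameter estimate, compares it with the real observation, and updates the parameters to reduce the difference.

```python
import math

def render(normals, s):
    """Forward (graphics) model, assumed Lambertian: intensity = max(0, n . s).

    s is the light direction scaled by the surface albedo.
    """
    return [max(0.0, sum(n_k * s_k for n_k, s_k in zip(n, s))) for n in normals]

def fit_by_synthesis(normals, observed, steps=500, lr=0.1):
    """Estimate s by repeatedly rendering and reducing the mean squared error."""
    s = [0.0, 0.0, 1.0]  # initial guess: light coming from the viewing direction
    for _ in range(steps):
        synthetic = render(normals, s)  # synthesis step
        grad = [0.0, 0.0, 0.0]
        for n, syn, obs in zip(normals, synthetic, observed):
            if syn > 0.0:  # clamped (shadowed) samples contribute no gradient
                for k in range(3):
                    grad[k] += 2.0 * (syn - obs) * n[k] / len(normals)
        # analysis step: move the parameters against the error gradient
        s = [s_k - lr * g_k for s_k, g_k in zip(s, grad)]
    return s

def unit(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

# Hypothetical scene: six surface normals roughly facing the camera.
normals = [unit(v) for v in [(0, 0, 1), (1, 0, 1), (-1, 0, 1),
                             (0, 1, 1), (0, -1, 1), (1, 1, 2)]]
true_s = [0.3, 0.2, 0.8]             # ground truth used only to simulate data
observed = render(normals, true_s)   # stands in for a real measurement
estimate = fit_by_synthesis(normals, observed)
```

Because the forward model restricts the solution to three parameters, six observations are enough to recover them here. This is the same effect, in miniature, that a model-based prior is meant to have on the amount of data a learned method needs.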

The methods developed in this project will also contribute to the following points:

Interpretability: By integrating a priori knowledge, the decisions of an AI system could become more comprehensible and plausible.

Generalizability: Since model-based prior knowledge is not purely learned from data, AI solutions can generalize to unseen data in a better and more controlled way.

Training data: Because a priori knowledge restricts the solution space, training becomes possible even with smaller data sets.