Artificial Intelligence for Health: Establishment of a benchmarking process for AI algorithms

April 8, 2019

The Focus Group on "AI for Health" (FG-AI4H), with the participation of the Fraunhofer Heinrich Hertz Institute HHI, invites experts from the fields of medicine, artificial intelligence (AI), data analysis, and regulation to participate in the benchmarking processes for artificial intelligence in healthcare. To safely implement AI technologies in the healthcare sector, further insights into their reliability as well as their training processes need to be obtained for different data sets. To this end, FG-AI4H is developing a benchmarking process for AI algorithms in the healthcare sector that can be used as an international, independent standard evaluation framework.

Topic groups established at past meetings are now agreeing on a pragmatic approach to start the benchmarking process for each application case. This includes a clear definition of the application field (e.g., the health condition to be diagnosed) and of the results the AI models are expected to deliver in it, the identification of adequate sources of training and testing data, and the simplified preparation of heterogeneous data from multiple sources. The benchmarking itself is conducted with secure, confidential testing data. These data will be obtained from different sources to determine whether the use of AI models can be standardized across different populations, measurement objects, and healthcare systems. Furthermore, it is advisable to compare the performance of the AI model with that of a person (e.g., a pathologist) or of a person with AI assistance, using the same use case in each comparison. These comparisons can provide meaningful insights into how the AI algorithms work.
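To illustrate the kind of comparison described above, the following minimal Python sketch computes accuracy separately for each data source, once for an AI model and once for a human reader on the same cases. It is an illustration only and not the FG-AI4H procedure: the function name `accuracy_by_source`, the record fields, the choice of accuracy as the metric, and the toy records are all assumptions made for this example.

```python
from collections import defaultdict


def accuracy_by_source(records, predictor_key):
    """Return {source: accuracy} for one predictor over labelled test records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["source"]] += 1
        if rec[predictor_key] == rec["label"]:
            correct[rec["source"]] += 1
    return {src: correct[src] / total[src] for src in total}


# Hypothetical, made-up test records for illustration; not real benchmark data.
records = [
    {"source": "site_a", "label": 1, "model": 1, "human": 1},
    {"source": "site_a", "label": 0, "model": 0, "human": 1},
    {"source": "site_b", "label": 1, "model": 0, "human": 1},
    {"source": "site_b", "label": 0, "model": 0, "human": 0},
]

# Same cases, same metric: once for the model, once for the human reader.
model_acc = accuracy_by_source(records, "model")
human_acc = accuracy_by_source(records, "human")

for src in sorted(model_acc):
    print(f"{src}: model={model_acc[src]:.2f} human={human_acc[src]:.2f}")
```

Reporting results per source rather than as a single pooled figure makes it visible when a model performs well on one population or healthcare system but poorly on another.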

Once this procedure is established, AI models can be submitted to an online platform for evaluation using the test data. A benchmarking process established in this way should produce reliable, robust, and independent evaluation systems. In addition, independent data sets are provided for model validation; these can be used to report on multivariable predictive models in the healthcare sector in accordance with best-practice examples.

The FG-AI4H held its fourth meeting in Shanghai, China, from April 2 to 5, 2019. Further meetings this year will take place in Geneva, Switzerland; Zanzibar, Tanzania; and New Delhi, India.

The comment "WHO and ITU establish benchmarking process for artificial intelligence in health" was published in "The Lancet", one of the world's oldest, most prestigious, and best-known general medical journals.

Please find the article (DOI: 10.1016/S0140-6736) here.

Martina Müller

Prof. Dr.-Ing. Thomas Wiegand