Electrocardiography (ECG) is a widely used, non-invasive diagnostic procedure whose interpretation is increasingly supported by automatic algorithms. However, progress in this field has been limited by a lack of data for training such algorithms and a lack of suitable evaluation procedures for making different algorithms comparable. To address these problems, researchers at the Fraunhofer Heinrich Hertz Institute (HHI), in cooperation with the Physikalisch-Technische Bundesanstalt (PTB), have prepared the largest public clinical ECG data set to date and presented first benchmarking results on it.
Cardiovascular diseases (CVDs) are among the diseases with the highest mortality rates worldwide. The ECG is a non-invasive instrument for assessing the general cardiac condition of patients and serves as an initial examination in the diagnosis of CVDs. A second major field of application, which is likely to become even more important in the future, is telemedicine, especially the monitoring of long-term ECGs. However, ECGs are mostly evaluated by physicians, often still inexperienced, with little or no algorithmic decision support. Deep learning algorithms can recognize patterns in large amounts of data in a way that otherwise perhaps only experienced cardiologists can. Automatic ECG interpretation algorithms based on deep learning could therefore considerably reduce the workload of medical personnel.
Currently, however, research on automatic ECG diagnosis is held back by several challenges. Existing algorithms with excellent performance have typically been trained on non-public data sets and are therefore unavailable to the wider scientific community, while public data sets have been too small for training, and especially for reliably evaluating, machine learning algorithms. Furthermore, the evaluation methodology is not standardized, which makes results difficult to compare across studies.
The Fraunhofer HHI researchers' paper in Scientific Data addresses the lack of training data by providing a data set prepared in cooperation with the Physikalisch-Technische Bundesanstalt. It contains 21,837 12-lead ECGs from 18,885 patients and is thus the largest public clinical data set of its kind to date (about 40 times larger than the previously used PTB Diagnostic Database). It provides machine-readable findings with more than 70 different ECG annotations made by up to two cardiologists. The data set is also diverse: it covers many comorbidities, includes healthy subjects, who are often underrepresented in clinical data sets, and spans a range of signal qualities. It is therefore well suited for training and evaluating machine learning algorithms on real-world data. The data set is publicly available on PhysioNet and is described in a Data Descriptor in Scientific Data. A joint press release by PTB and HHI is available via IDW.
A companion benchmarking article addresses the problem of evaluation methodology by defining several benchmarking tasks with clearly specified evaluation procedures, ranging from the prediction of ECG annotations through age and gender prediction to signal quality assessment. It also evaluates state-of-the-art algorithms on these tasks and thus provides first benchmarking results on the new PTB-XL data set, against which other research groups can compare their own results. For clinical use, quality criteria beyond quantitative accuracy are also of great importance. These include, for example, quantifying the reliability of predictions and the interpretability of the algorithms. The article investigates these aspects only in an exploratory way, and they need to be analyzed in detail in future research.
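The benchmarking article defines its own tasks and metrics. Purely as an illustration of what a fixed, reproducible evaluation procedure can look like, the following sketch makes two assumptions. First, as in the recommended split distributed with PTB-XL, every record carries a fold number from 1 to 10, with folds 1 to 8 used for training, fold 9 for validation and fold 10 for testing. Second, a multi-label annotation classifier is scored with macro-averaged ROC AUC. The function names are hypothetical and are not part of any published tool.

```python
from typing import List, Sequence, Tuple


def split_by_fold(folds: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Return record indices for train (folds 1-8), validation (fold 9), test (fold 10).

    Assumption: each record has a fold number 1..10, following the recommended
    PTB-XL split convention; this helper itself is hypothetical.
    """
    train = [i for i, f in enumerate(folds) if 1 <= f <= 8]
    val = [i for i, f in enumerate(folds) if f == 9]
    test = [i for i, f in enumerate(folds) if f == 10]
    return train, val, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive record scores higher than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of the per-annotation AUCs of a multi-label task."""
    n_labels = len(y_true[0])
    aucs = []
    for k in range(n_labels):
        labels = [row[k] for row in y_true]
        scores = [row[k] for row in y_score]
        aucs.append(roc_auc(labels, scores))
    return sum(aucs) / n_labels


if __name__ == "__main__":
    # Toy fold assignment for eight records (illustrative, not real PTB-XL data).
    train, val, test = split_by_fold([1, 2, 9, 10, 10, 10, 10, 5])
    print(train, val, test)  # [0, 1, 7] [2] [3, 4, 5, 6]

    # Toy test-set labels and model scores for two annotations.
    y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
    y_score = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.3], [0.1, 0.4]]
    print(macro_auc(y_true, y_score))  # 0.875
```

The point of fixing the split and the metric in advance is that every team that reports results evaluates on exactly the same held-out records, using the same score. This is what makes the numbers directly comparable.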
The benchmarking article has been published as a preprint on arXiv.