QoT Dataset for SDM Networks

This dataset has been developed within the framework of the project STARFALL (16KIS1419), funded by the German Ministry of Education and Research (BMBF).

1. Overview

This dataset collection comprises eighteen novel, publicly available quality of transmission (QoT) datasets designed specifically for space-division multiplexing (SDM) networks. All datasets in the collection share the same data structure.

The datasets were generated using two distinct network topologies: (i) the continental core optical network of the United States (CONUS) and (ii) a short-haul topology, as illustrated in Fig. 3.2 of [1]. For each topology, three scenarios with different fiber and amplifier types were investigated: (i) bundles of seven single-mode fibers (Bu-SMFs) with standard C-band erbium-doped fiber amplifiers (EDFAs); (ii) seven-core, hexagonally arranged weakly-coupled multi-core fibers (WC-MCFs) that share the same physical properties as the Bu-SMFs and also use standard C-band EDFAs; and (iii) seven-core, hexagonally arranged MCFs with the parameters described in [2], paired with the SDM amplifiers detailed in [3]. The formation of spatial super-channels was assumed, and three network switching strategies were evaluated: (i) independent switching (Ind-Sw), (ii) joint switching (J-Sw), and (iii) fractional joint switching (FrJ-Sw) [4]. Table 1 provides an overview of the datasets and their corresponding titles.

Each QoT dataset consists of lightpath samples. A single sample captures the characteristics of the lightpath under test, along with a few network-wide parameters relevant to that lightpath. All samples are labelled with the QoT metrics optical signal-to-noise ratio (OSNR), signal-to-noise ratio (SNR), and bit error ratio (BER). Additionally, each sample includes a binary class label (0 or 1) indicating whether the BER of the lightpath is above (0) or below (1) a predefined BER threshold. The threshold is determined by the forward error correction (FEC) mechanism of the transceivers. The total number of samples and the number of samples in each class are listed in Table 1.
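The class-label convention described above can be sketched in a few lines. This is only an illustration: the threshold value and the function name are hypothetical, since the FEC-limited BER threshold is not stated here, and the handling of a BER exactly equal to the threshold is an assumption.

```python
# Hypothetical value for illustration only; the actual FEC-limited
# BER threshold is not given in this description.
ASSUMED_BER_THRESHOLD = 1e-2


def class_label(ber: float, threshold: float = ASSUMED_BER_THRESHOLD) -> int:
    """Return 0 if the BER is above the threshold and 1 if it is below.

    Assumption: a BER exactly equal to the threshold is treated as
    "above" (label 0); the boundary case is not specified in the text.
    """
    return 1 if ber < threshold else 0


# Three example BER values: below, equal to, and above the assumed threshold.
labels = [class_label(ber) for ber in (3e-5, 1e-2, 4e-2)]
```

With this convention, class 1 collects the lightpaths whose BER stays below the threshold.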

Table 1. SDM QoT Dataset

                                                                 Number of samples
Dataset  Title                                                 Total  Class 1  Class 0
01       CONUS, Bu-SMFs, J-Sw                                 860276   555157   305119
02       CONUS, Bu-SMFs, FrJ-Sw                               864105   554846   309259
03       CONUS, Bu-SMFs, Ind-Sw                               862713   553737   308976
04       CONUS, WC-MCFs, J-Sw                                 827790   546163   281627
05       CONUS, WC-MCFs, FrJ-Sw                               831394   546556   284838
06       CONUS, WC-MCFs, Ind-Sw                               827556   545632   281924
07       CONUS, WC-MCFs [2], SDM Amplifier [3], J-Sw          877840   618699   259141
08       CONUS, WC-MCFs [2], SDM Amplifier [3], FrJ-Sw        897896   624036   273860
09       CONUS, WC-MCFs [2], SDM Amplifier [3], Ind-Sw        903145   627341   275804
10       Short-haul, Bu-SMFs, J-Sw                            157745   124815    32930
11       Short-haul, Bu-SMFs, FrJ-Sw                          171534   131746    39788
12       Short-haul, Bu-SMFs, Ind-Sw                          174372   132696    41676
13       Short-haul, WC-MCFs, J-Sw                            136650   108320    28330
14       Short-haul, WC-MCFs, FrJ-Sw                          182361   145051    37310
15       Short-haul, WC-MCFs, Ind-Sw                          170306   131616    38690
16       Short-haul, WC-MCFs [2], SDM Amplifier [3], J-Sw     138419   124613    13806
17       Short-haul, WC-MCFs [2], SDM Amplifier [3], FrJ-Sw   151531   134372    17159
18       Short-haul, WC-MCFs [2], SDM Amplifier [3], Ind-Sw   154679   136919    17760
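
Because Table 1 lists absolute counts, the class balance of each dataset has to be derived from them. The following sketch does this for three rows copied from the table; the dictionary and function names are illustrative and not part of the dataset files.

```python
# (total, class 1, class 0) counts copied from Table 1 for three datasets.
TABLE_1 = {
    "01": (860276, 555157, 305119),
    "10": (157745, 124815, 32930),
    "16": (138419, 124613, 13806),
}


def class_shares(total: int, class_1: int, class_0: int) -> tuple:
    """Return the fractions of class 1 and class 0 samples in a dataset."""
    if class_1 + class_0 != total:
        raise ValueError("class counts do not add up to the total")
    return class_1 / total, class_0 / total


for dataset_id, counts in TABLE_1.items():
    share_1, share_0 = class_shares(*counts)
    print(f"dataset {dataset_id}: class 1 = {share_1:.1%}, class 0 = {share_0:.1%}")
```

For these three rows, class 1 accounts for roughly 65%, 79%, and 90% of the samples, so the imbalance is worth taking into account when the class label is used as a training target.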

2. Background

To produce synthetic data for network monitoring, Fraunhofer HHI's optical network planning tool, PLATON, was used. A comprehensive overview of PLATON's capabilities can be found in [5]. To adapt the tool for SDM network planning, PLATON was enhanced by incorporating different switching architectures and various fiber types, and by updating the nonlinear analytical channel model used to derive the QoT metrics.

To generate all SDM QoT datasets from network simulation data, a pipeline was developed that transforms the data obtained from the network planning tool into structured datasets.

3. Networking Simulation Scenarios

The eighteen QoT datasets were built from data collected in simulations of SDM elastic optical networks (SDM-EONs) operating in the C-band. Different data rates were realized by employing different modulation formats while maintaining a fixed symbol rate of 32 GBd. The frequency grid used a fixed channel spacing of 37.5 GHz. The simulation scenarios for datasets 01–09 are based on the CONUS network topology, whereas datasets 10–18 use the smaller topology depicted in Fig. 3.2 of [1].

Each CONUS simulation run handles 50,000 traffic requests, while each short-haul run processes 10,000 requests. Traffic requests are uniformly distributed across all nodes equipped with add/drop capabilities. The inter-arrival time (tia) and holding time (th) of all connections follow a Poisson distribution, with expected values of E[tia] = 5 seconds and E[th] = {4000, 5000, 6000, 7000} seconds.

The same lightpath provisioning procedure was employed in all simulations. The algorithm identifies the three shortest, most link-disjoint paths between each node pair using Dijkstra’s algorithm [1]. Spectrum and spatial resources are allocated according to a first-fit strategy [1], while five modulation formats are considered: quadrature phase-shift keying (QPSK), and 4-, 8-, 16-, 32-, and 64-quadrature amplitude modulation (QAM).

Each dataset relies on a set of eight simulations, split evenly: half use a more pessimistic lookup table (assuming fully loaded links), while the other half use a more optimistic table (assuming empty links). Across all simulations, incoming traffic data rates from 100 Gb/s to 2100 Gb/s in increments of 50 Gb/s were tested. However, certain data rates were excluded because their required spatial allocation, based on the appropriate modulation format, exceeded the limit of seven cores. In total, 144 simulations were carried out to produce the complete set of eighteen datasets.
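
The counts quoted in this section can be checked with a short sketch. One assumption is made and should be read as such: the eight simulations per dataset are taken to be the four holding-time values combined with the two lookup tables. This matches the stated numbers but is not spelled out in the text. All variable names are illustrative.

```python
from itertools import product

NUM_DATASETS = 18
HOLDING_TIMES_S = (4000, 5000, 6000, 7000)      # E[th] values from the text
LOOKUP_TABLES = ("pessimistic", "optimistic")   # fully loaded vs. empty links

# Candidate traffic data rates: 100 to 2100 Gb/s in 50 Gb/s steps.
# Some of these were excluded in the simulations (more than seven cores needed);
# which ones is not stated, so the full grid is listed here.
CANDIDATE_RATES_GBPS = list(range(100, 2100 + 1, 50))

# ASSUMPTION: eight runs per dataset = 4 holding times x 2 lookup tables.
runs_per_dataset = list(product(HOLDING_TIMES_S, LOOKUP_TABLES))
total_runs = NUM_DATASETS * len(runs_per_dataset)

print(len(CANDIDATE_RATES_GBPS), len(runs_per_dataset), total_runs)
```

Under this reading, the grid contains 41 candidate data rates, and 18 datasets with 8 runs each give the 144 simulations mentioned above.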

4. Dataset Structure

The dataset $X \in \mathbb{R}^{D \times N}$ consists of $D$ samples $x^{(d)} \in \mathbb{R}^{N}$, $d \in \mathbb{N}$, $d \le D$. Each sample represents a lightpath, characterized by $N$ features $x_n \in \mathbb{R}$, $n \in \mathbb{N}$, $n \le N$, describing both the lightpath itself and the status of the network links it traverses. Each feature is a scalar value, representing the status of the individual spatial spaces allocated to the lightpath.
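
As a minimal sketch of this layout, the dataset can be pictured as D rows with N real-valued entries each. The toy values and helper names below are placeholders, not taken from the dataset files, and the sketch assumes that samples and features are indexed from 1 in the notation, so the helpers convert to Python's 0-based indexing.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shape(X: Matrix) -> Tuple[int, int]:
    """Return (D, N) and check that every sample has the same N features."""
    D = len(X)
    N = len(X[0]) if D else 0
    if any(len(row) != N for row in X):
        raise ValueError("all samples must have the same number of features")
    return D, N


def sample(X: Matrix, d: int) -> List[float]:
    """Return sample x^(d), with d counted from 1 (assumed convention)."""
    return X[d - 1]


def feature(X: Matrix, d: int, n: int) -> float:
    """Return feature x_n of sample d, both counted from 1."""
    return X[d - 1][n - 1]


# Toy matrix with D = 2 samples and N = 3 features (placeholder values).
X = [
    [0.0, 1.5, 2.0],
    [1.0, 0.5, 3.0],
]
D, N = shape(X)
```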

5. Tasks

These datasets aim to support the development of machine learning (ML)-driven automation in SDM networks and to make studies more efficient, reliable, and comparable. Additionally, they can be used for ML-based QoT estimation, framed either as a classification problem (predicting the class label) or as a regression problem (predicting a QoT metric).
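
For either task, the QoT labels have to be separated from the input features before training. The sketch below shows one way to do this for a single sample held as a dictionary. Every key name and value is hypothetical, because the actual column names and file format are not described here.

```python
# Hypothetical column names; the real names in the dataset files may differ.
REGRESSION_TARGETS = ("osnr", "snr", "ber")
CLASS_TARGET = "class"


def split_sample(sample: dict) -> tuple:
    """Split one lightpath sample into (features, regression targets, class label)."""
    targets = {key: sample[key] for key in REGRESSION_TARGETS}
    label = sample[CLASS_TARGET]
    features = {
        key: value
        for key, value in sample.items()
        if key not in REGRESSION_TARGETS and key != CLASS_TARGET
    }
    return features, targets, label


# Placeholder sample with made-up feature names and values.
toy_sample = {
    "path_length_km": 1200.0,
    "num_links": 4,
    "osnr": 21.3,
    "snr": 17.8,
    "ber": 1e-4,
    "class": 1,
}
features, targets, label = split_sample(toy_sample)
```

Keeping the labels out of the feature set matters here: BER and the class label are directly related, so a classifier must not see the BER as an input.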

6. Files

The dataset files are available for download on this webpage. If you are interested in receiving the dataset, please follow the link.

7. Subject Keywords

Quality of Transmission, Space-division Multiplexing, Inter-core Crosstalk, Classification, Regression, Lightpath-based QoT, GN-Model, Simulation, SD-FEC, SDM-EON, ML.

8. Conflicts of Interest

The authors have no conflicts of interest to declare.

References

[1] J. M. Simmons, Optical Network Design and Planning, 2nd ed. (Springer, 2014). 

[2] T. Hayashi, T. Taru, O. Shimakawa, T. Sasaki and E. Sasaoka, "Uncoupled multi-core fiber enhancing signal-to-noise ratio," Opt. Express, 20, B94-B103 (2012). 

[3] H. Takeshita, Y. Shimomura and K. Hosokawa, "Demonstration of C+L 8.7-THz 7-core multicore EDFA with a single pump laser diode using pump recycling technology," in European Conference on Optical Communications (2023), pp. 246-249.

[4] D. M. Marom and M. Blau, "Switching solutions for WDM-SDM optical networks," IEEE Communications Magazine, 53, 60-68 (2015). 

[5] G. Bergk, B. Shariati, P. Safari and J. K. Fischer, "ML-assisted QoT estimation: a dataset collection and data visualization for dataset quality evaluation," Journal of Optical Communications and Networking, 14, 43-55 (2022).