Machine Learning (ML)-based Quality of Transmission (QoT) estimation has recently attracted significant attention from the optical networking community. Numerous papers have been published addressing different aspects of this challenge. However, each contribution uses a different, locally generated dataset. In most cases, the description provided, mainly the number of samples and the features used, is not sufficient to reproduce a statistically similar dataset. Thus, the reported results are difficult to reproduce and hardly comparable. Publicly available datasets enable fair comparisons of different ML approaches and allow researchers to focus on the ML-specific challenges rather than on creating an in-house dataset for their work.
This dataset collection includes multiple datasets, which can be used to benchmark various ML-assisted QoT estimation approaches such as regression and classification. The datasets can be used in lightpath-based or network-wide QoT estimation problems. All datasets of this collection have the same data structure. They differ only in the simulated networking scenario they rely on. All samples of the datasets are labelled with the QoT metrics Optical Signal-to-Noise Ratio (OSNR), Signal-to-Noise Ratio (SNR), and Bit-Error-Ratio (BER). Additionally, each sample includes a binary class label (0 or 1) indicating whether the lightpath is above (0) or below (1) a predefined BER threshold. The BER threshold is constrained by the Forward Error Correction (FEC) mechanism of the transceivers.
The QoT dataset collection provides datasets for the task of ML-assisted QoT estimation in Elastic Optical Networks (EONs). The datasets are listed in Table 1. Each dataset includes two distinct data representations: a multidimensional network status representation and a unidimensional lightpath representation. A network status sample describes the entire network status, including all the already active lightpaths, at the instance of provisioning a new lightpath. The new lightpath is also referred to as the lightpath under test or probe lightpath. One network status sample comprises N feature matrices that together describe the state of the network. In contrast, a lightpath sample describes mainly the characteristics of the lightpath under test and a few network-wide descriptive values relevant to the lightpath under test in a single vector. The dataset samples in the network status dataset and the lightpath dataset are obtained from identical network state instances.
Table 1: The QoT dataset collection
|01||CONUS - 37.5 GHz fixed channel spacing|
|02||CONUS - 37.5 GHz fixed channel spacing||100, 200, 300, and 400 Gb/s connection requests||Load Balancing and Predefined Transceiver Mode|
|03||TSNN - 37.5 GHz fixed channel spacing||100, 200, 300, and 400 Gb/s connection requests||Load Balancing and Online Transceiver Mode|
|04||CONUS - 37.5 GHz fixed channel spacing||50, 100, 150, 200, 250, and 300 Gb/s connection requests||Load Balancing and Predefined Transceiver Mode|
Table 2 shows the size and class balance of the four QoT datasets. Each QoT dataset contains more than 1.2 million samples. Most samples belong to the positive class because, in operational optical networks, provisioned lightpaths rarely fail the given QoT requirements.
Table 2: Size and class-balance of the QoT datasets.
Number of samples
|Dataset||Total||Positive class||Negative class||Positive class [%]||Negative class [%]|
The available datasets are created using the Planning Tool for Optical Networks (PLATON) developed by Fraunhofer HHI. The statistics of network simulations, including information about all established, released, and rejected lightpaths and their QoT metrics, are recorded in a Traffic Engineering Database (TED). The QoT metrics are computed by a QoT estimation module interfaced with a nonlinear channel model. The channel model analytically calculates the OSNR, SNR, and BER of optical channels on a link level with respect to Amplified Spontaneous Emission (ASE) and Nonlinear Interference (NLI) noise. We transformed the network simulation data saved in the TEDs into an ML dataset for QoT estimation tasks using a data pipeline of the Data Analytics Toolkit for Optical Networks (DALTON), developed by Fraunhofer HHI, which we adapted for this specific use case. The next sections provide a short description of PLATON and of the nonlinear channel model used to create the datasets of the QoT dataset collection.
The network planning tool PLATON operates in either dynamic network operation mode or offline network planning mode. PLATON is built upon a set of routing and spectrum allocation algorithms which form the basis for its features and capabilities. Depending on which features are enabled and how they are adjusted, PLATON introduces specific constraints and modifies the service provisioning procedure. The simulations used to create the datasets of this collection were carried out using the dynamic network operation mode, which relies on a Discrete Event Simulation (DES) engine.
In its dynamic network operation mode, PLATON’s traffic generation module triggers lightpath provisioning procedures by frequently producing traffic requests. The routing module finds available routes between the two nodes for which the connection is requested. In the next step, eligible transceiver configurations that can potentially serve the connection request are pre-selected from a pool of available transceiver configurations. The list of potential transceiver configurations is sorted in descending order of efficiency. The efficiency of a lightpath is defined as the number of bits per symbol times the quotient of the symbol rate and the number of links the route includes. Starting from the most efficient potential transceiver configuration, the spectrum allocation module assigns frequency slots under the wavelength continuity constraint, and the QoT estimation module validates whether the pre-selected transceiver configuration can be used to establish one or multiple lightpaths that all fulfil the given QoT requirement. If the QoT requirement is expected to be satisfied, the requested service is served with the selected lightpath configuration. Otherwise, the next most efficient configuration is considered. If none of the eligible configurations can be used, the request is blocked.
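The efficiency ordering described above can be sketched as follows. This is a minimal illustration, not PLATON's implementation: the `Candidate` class, its field names, and the example values are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate: a transceiver configuration on a given route."""
    name: str
    bits_per_symbol: float
    symbol_rate_gbaud: float
    num_links: int  # number of links in the route


def efficiency(c: Candidate) -> float:
    # Bits per symbol times the quotient of symbol rate and number of links.
    return c.bits_per_symbol * (c.symbol_rate_gbaud / c.num_links)


def sort_by_efficiency(candidates):
    # Most efficient candidate first, as in the provisioning procedure.
    return sorted(candidates, key=efficiency, reverse=True)


# Illustrative values only (not taken from the datasets).
candidates = [
    Candidate("QPSK", 2, 32.0, 3),
    Candidate("16QAM", 4, 32.0, 3),
    Candidate("8QAM", 3, 32.0, 3),
]
ordered = [c.name for c in sort_by_efficiency(candidates)]
print(ordered)
```

A provisioning loop would then walk through this list, trying spectrum allocation and QoT validation for each candidate until one succeeds or the list is exhausted.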
The routing, spectrum allocation, and transceiver configuration algorithms are configurable and support various features to be enabled or disabled. This can lead to different constraints on the simulation scenario. The most important features and constraints used in the process of creating the datasets are defined next.
Features and Constraints of PLATON
QoT Violation Awareness
Establishing a new lightpath may violate the QoT of active lightpaths that share fiber links with that new lightpath. If QoT Violation Awareness is enabled, PLATON recalculates the QoT of all active lightpaths that share a link with the new lightpath when a new lightpath is established. This has two consequences. First, the minimum QoT assurance during the provisioning phase guarantees not only that the QoT of the new lightpath is equal to or above the predefined minimum, but also that the QoT of all affected active lightpaths remains at or above the required minimum after the degradation caused by the new lightpath. Second, the QoT metrics of affected lightpaths are updated after the establishment of a new lightpath. The updated QoT metrics are traced accordingly in the TED of PLATON for each event. If QoT Violation Awareness is disabled, QoT violation is not considered during lightpath provisioning and the QoT of active lightpaths is not updated after the establishment of a new lightpath. Disabling QoT Violation Awareness significantly reduces computational complexity.
The BVT module of PLATON emulates the configuration of network transceivers. It can be set to operate either in Online or Predefined Transceiver Mode. While the Predefined Transceiver Mode is enabled, PLATON can only choose from a set of predefined transceiver configurations available in a pool of configurations. A transceiver configuration is a tuple of symbol rate, modulation format, FEC scheme, channel width, and launch power. The predefined transceiver mode allows PLATON to take into account the transmission modes of commercial transceivers according to their datasheet. While Online Transceiver Mode is enabled, PLATON can combine any transceiver configurations. In this case multiple transceiver configurations are assembled in a list of potential candidate configurations to establish a lightpath. The Online Transceiver Mode provides the highest level of flexibility, yet it may suggest a transceiver configuration that is not available for corresponding commercial transceivers and can also lead to overprovisioning.
Load Balancing allows serving incoming connection requests that exceed the capacity of available transceivers in the network. In , a load balancing engine is proposed that distributes the traffic of large connection requests over various routes when the shortest paths between the source and destination node are very congested. PLATON includes a load balancing mechanism that splits the traffic volume of a single connection request across multiple lightpaths if the incoming connection request cannot be established over a single lightpath, either due to insufficient capacity of the transceivers or inadequate QoT. Note that each lightpath of the load-balanced set is assigned to an individual frequency slot. However, unlike the mechanism in , the entire set is forced to use the same transceiver configuration and route. For instance, a 400 Gb/s connection can only be served by four 100 Gb/s or two 200 Gb/s lightpaths but not by a 100 Gb/s and a 300 Gb/s lightpath.
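The constraint that all lightpaths of a load-balanced set share one configuration can be illustrated with a short sketch. It assumes that the per-lightpath rate must divide the requested rate exactly, which matches the 400 Gb/s example but is not stated as a general rule in the text; the function name is hypothetical.

```python
def uniform_splits(request_gbps, lightpath_rates_gbps):
    """Return (count, rate) pairs of identical lightpaths that carry the request.

    Assumption: the request must be matched exactly, so only rates that
    divide the requested data rate are considered.
    """
    splits = []
    for rate in sorted(set(lightpath_rates_gbps), reverse=True):
        if rate < request_gbps and request_gbps % rate == 0:
            splits.append((request_gbps // rate, rate))
    return splits


# A 400 Gb/s request with 100, 200, and 300 Gb/s lightpaths available.
print(uniform_splits(400, [100, 200, 300]))  # [(2, 200), (4, 100)]
```

Mixed sets such as one 100 Gb/s plus one 300 Gb/s lightpath never appear, because every element of a split uses the same rate.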
The QoT estimation module of PLATON determines the expected OSNR, SNR and the BER of a lightpath under test. PLATON supports the configuration of interchangeable micro-services for this purpose. To create the ML datasets, we ran simulations in which the QoT estimation module interfaces a nonlinear channel model that computes the OSNR based on the Gaussian noise model . The OSNR is the ratio between the signal power and the sum of the ASE noise and the NLI noise. The NLI noise is calculated using the analytical approximation of the Gaussian noise reference formula for the case of non-identical channels [2, Eq. 120, 124, 125, 126]. The SNR is calculated based on [3, Eq. 34]. The BER is computed as a function of SNR [4, p. 158, 465]. When requesting QoT estimates from the channel model, PLATON provides physical layer parameters for each span of the links that comprise the route of the lightpath under test and the relevant lightpath characteristics. The detailed list of these parameters is provided in the individual description of each dataset.
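The OSNR definition given above (signal power over the sum of ASE and NLI noise) can be written as a small helper. This is only a sketch of the final ratio: the noise powers themselves come from the channel model and are treated here as given inputs, all powers are assumed to be in the same linear unit and referenced to the same bandwidth, and the function name and example values are illustrative.

```python
import math


def osnr_db(signal_power_w, ase_noise_w, nli_noise_w):
    """OSNR in dB: signal power divided by the sum of ASE and NLI noise power."""
    total_noise_w = ase_noise_w + nli_noise_w
    return 10.0 * math.log10(signal_power_w / total_noise_w)


# Illustrative values: 1 mW signal, 8 uW ASE noise, 2 uW NLI noise.
value = osnr_db(1e-3, 8e-6, 2e-6)
print(round(value, 2))  # about 20 dB
```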
All datasets of the developed QoT dataset collection include data collected from simulated long-haul networks in dynamic network operation. Each dataset relies on a set of eight simulations performed by PLATON. One simulation includes 100,000 connection requests that are generated between randomly chosen node pairs. The traffic simulator generates connection requests with Poisson-distributed inter-arrival and holding times. The networks are simulated as flex-rate fixed-grid EONs operating in the C-band. Flex-rate allows established wavelengths to carry various data rates along lightpaths. The frequency grid implements a fixed channel spacing of 37.5 GHz. To reduce computation time, the QoT Violation Awareness feature is disabled for all simulations. The Load Balancing feature is always turned on. We considered two different look-up tables for the pre-selection of eligible transceiver configurations. One is more optimistic and assumes empty links. The other is more pessimistic and assumes fully loaded links. The selected routing algorithm finds the three most link-disjoint shortest paths based on Dijkstra’s algorithm . The spectrum allocation uses a First-Fit algorithm .
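First-Fit spectrum allocation under the wavelength continuity constraint can be sketched as below. This is the textbook form of the technique rather than PLATON's code; the data layout (a dictionary mapping each link to its set of occupied slot indices) and all names are assumptions made for the example.

```python
def first_fit(route_links, occupancy, num_slots):
    """Return the lowest slot index that is free on every link of the route.

    occupancy maps a link identifier to the set of occupied slot indices.
    Returns None if no slot satisfies the continuity constraint.
    """
    for slot in range(num_slots):
        if all(slot not in occupancy.get(link, set()) for link in route_links):
            return slot
    return None


# Toy example with four slots on a two-link route.
occupancy = {"A-B": {0, 1}, "B-C": {0, 2}}
print(first_fit(["A-B", "B-C"], occupancy, num_slots=4))  # 3
```

Slots 0 to 2 are each blocked on at least one link, so the first slot that is free along the whole route is slot 3.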
Configurations Specific to Dataset 01
All samples of the dataset 01 are from eight simulations of the Continental Core Optical Network of the United States (CONUS) . In the networking scenario, the traffic simulator generates connection requests for data rates of 100, 200, or 400 Gb/s with a probability of 0.4, 0.4, and 0.2, respectively. PLATON runs in Online Transceiver Mode.
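How connection requests with these data-rate probabilities might be drawn between random node pairs is sketched below. The probabilities come from the text; the function name, the node labels, and the seed are illustrative assumptions, and arrival and holding times are left out.

```python
import random

DATA_RATES_GBPS = [100, 200, 400]
PROBABILITIES = [0.4, 0.4, 0.2]  # as stated for this dataset


def generate_requests(nodes, num_requests, seed=None):
    """Draw (source, destination, data_rate) tuples for random node pairs."""
    rng = random.Random(seed)
    requests = []
    for _ in range(num_requests):
        src, dst = rng.sample(nodes, 2)  # two distinct nodes
        rate = rng.choices(DATA_RATES_GBPS, weights=PROBABILITIES, k=1)[0]
        requests.append((src, dst, rate))
    return requests


# Hypothetical node labels, for illustration only.
demo = generate_requests(["n1", "n2", "n3", "n4"], num_requests=5, seed=0)
for request in demo:
    print(request)
```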
Configurations Specific to Dataset 02
All samples of the dataset 02 are from eight simulations of CONUS. In the networking scenario, the traffic simulator generates connection requests for data rates of 100, 200, or 400 Gb/s with a probability of 0.4, 0.4, and 0.2, respectively. PLATON runs in Predefined Transceiver Mode.
Configurations Specific to Dataset 03
All samples of the dataset 03 are from eight simulations of the Telefónica Spain National Network (TSNN) . In the networking scenario, the traffic simulator generates connection requests for data rates of 100, 200, or 400 Gb/s with a probability of 0.4, 0.4, and 0.2, respectively. PLATON runs in Online Transceiver Mode.
Configurations Specific to Dataset 04
All samples of the dataset 04 are from eight simulations of CONUS. In the networking scenario, the traffic simulator generates connection requests for data rates of 50, 100, 150, 200, 250, and 300 Gb/s with equal probability. PLATON runs in Predefined Transceiver Mode.
The datasets are stored in the netCDF file format. All datasets of the QoT dataset collection include a Jupyter notebook that shows how to load and view the data. The datasets contain self-descriptive information about their structure. A comprehensive description of each dataset can be found in the datasheet that can be downloaded together with each dataset.
Not all features included in the datasets are necessarily suitable for classification. Select only those that are relevant and reasonable for the approach you want to follow to solve the QoT estimation task.
The Jupyter notebook included with each dataset of this collection shows how to load and view the data.
1.0.0 Initial release of the QoT dataset collection.
The datasets can be used for the task of ML-based QoT estimation, including classification and regression problems. Specifically, the network status representation can be used for network-wide QoT estimation tasks, while the lightpath-based representation can be used for the design of per-lightpath QoT estimation models. The samples in both data representations are obtained from identical network state instances.
Quality of Transmission, Classification, Estimation, Network-wide QoT, Lightpath-based QoT, GN-Model, Simulation, HD-FEC, EON.
The authors have no conflicts of interest to declare.
[1] B. Shariati et al., “Impact of Spatial and Spectral Granularity on the Performance of SDM Networks Based on Spatial Superchannel Switching,” J. Lightwave Technol., vol. 35, no. 13, pp. 2559–2568, 2017, doi: 10.1109/JLT.2017.2692301.
[2] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri, “Analytical Results on Channel Capacity in Uncompensated Optical Links with Coherent Detection,” in 37th European Conference and Exhibition on Optical Communication (ECOC), Palexpo, Geneva, Switzerland, 2011.
[3] R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity Limits of Optical Fiber Networks,” J. Lightwave Technol., vol. 28, no. 4, pp. 662–701, 2010, doi: 10.1109/JLT.2009.2039464.
[4] F. Xiong, Digital modulation techniques, 2nd ed. Boston, London: Artech House, 2006.
[5] J. M. Simmons, “Routing Algorithms,” in Optical Networks, Optical network design and planning, J. M. Simmons, Ed., New York: Springer, 2014, pp. 89–145.
[6] J. M. Simmons, “Wavelength Assignment,” in Optical Networks, Optical network design and planning, J. M. Simmons, Ed., New York: Springer, 2014, pp. 187–227.
[7] Monarch Network Architects, DARPA Core Optical Networks (CORONET) Continental United States (CONUS) topology. [Online]. Available: monarchna.com/CORONET_CONUS_Topology.xls (accessed: Oct. 28 2020).