Advanced MPEG-DASH

Motivation

Dynamic Adaptive Streaming over HTTP (DASH) [1] is an emerging MPEG standard that has attracted considerable market interest and is expected to be widely deployed in the coming years. In recent years the paradigm for audio/video streaming has shifted. Because audio/video streaming is a real-time service, media was traditionally transported over RTP/UDP. However, the tremendous success of proprietary streaming solutions based on HTTP/TCP has shown that the assumption that TCP-based solutions are inadequate for audio/video streaming services was incorrect. In fact, HTTP streaming has proven valuable for streaming services in many respects, provided that the increased latency is acceptable.

A non-exhaustive list of benefits follows:

  • Avoidance of problems with firewall and NAT traversal, typical in services based on RTP/UDP
  • Reduced server load, due to reuse of widely deployed network caches
  • Complexity moved to the client: adaptation to available bandwidth is done at the client
  • Servers are agnostic to delivered content
  • Deployment of Content Delivery Networks (CDNs) can be kept simple, e.g., surrogate servers do not need any special capability (they are media agnostic) and act as simple caches

Moreover, the available network resources may vary over time, depending on the Internet traffic at any given moment. DASH allows clients to adapt continuously to the resources currently available in the network in order to maintain service quality. An example of a DASH session is presented in this animation.
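The client-side adaptation described above can be illustrated with a minimal sketch. It assumes a simple throughput-based rule: pick the highest advertised bitrate that fits within a fraction of the measured throughput. The function name, the safety factor, and the bitrate ladder are hypothetical and chosen only for illustration. DASH itself does not prescribe any adaptation algorithm.

```python
def select_representation(bandwidths_bps, throughput_bps, safety=0.8):
    """Return the highest advertised bitrate that fits the measured throughput.

    Only a fraction (safety) of the throughput is budgeted, to leave headroom
    for fluctuations. Falls back to the lowest bitrate when nothing fits.
    """
    if not bandwidths_bps:
        raise ValueError("at least one Representation bitrate is required")
    budget = throughput_bps * safety
    candidates = [b for b in bandwidths_bps if b <= budget]
    return max(candidates) if candidates else min(bandwidths_bps)


# Hypothetical bitrate ladder (bits per second) and throughput samples.
ladder = [500_000, 1_000_000, 2_500_000, 5_000_000]
for measured in (6_000_000, 1_500_000, 300_000):
    print(measured, "->", select_representation(ladder, measured))
```

A real client would typically also take the buffer level into account and smooth the throughput estimate over several segments. This sketch only shows the basic decision.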

Overview

Dynamic Adaptive Streaming over HTTP (DASH) [1] defines the format of the Media Presentation Description (MPD), an XML document that describes the media data available at the servers. The MPD allows clients to select the data they are interested in and request it from the server. DASH further defines segments, which are HTTP resources available for download. A segment is either an initialization segment, i.e. a segment containing the initialization data for decoding, or a media segment, which contains the media data corresponding to a specific time interval of the whole content. The segment formats are based on MPEG-2 TS and the ISO Base Media File Format. However, segments based on other formats can be specified as long as the requirements described in [1] are fulfilled.
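As a rough illustration of the two segment types, the following sketch lists the HTTP resources a client would request to play a given time interval. It assumes that all media segments have the same duration and are numbered consecutively. The URL names, the template syntax, and the function name are hypothetical and are not taken from [1].

```python
import math


def download_plan(init_url, media_template, t_start, t_end,
                  segment_duration, start_number=1):
    """List the URLs needed to play the interval [t_start, t_end) in seconds."""
    if segment_duration <= 0 or not 0 <= t_start < t_end:
        raise ValueError("need segment_duration > 0 and 0 <= t_start < t_end")
    # Number of the media segment that covers t_start.
    first = start_number + math.floor(t_start / segment_duration)
    # Number of the last media segment that overlaps the interval.
    last = start_number + math.ceil(t_end / segment_duration) - 1
    media = [media_template.format(number=n) for n in range(first, last + 1)]
    # The initialization segment is fetched before any media segment.
    return [init_url] + media


# Hypothetical URLs: 2-second segments, playing seconds 4 to 10.
plan = download_plan("init.mp4", "seg-{number}.m4s", 4, 10, 2)
print(plan)
```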

Figure 2 shows how the media presentation is organized in the MPD:

The media presentation described in the MPD is organized in Periods, which are high-level time intervals of the whole presentation timeline. Periods are typically used for content splicing and advert insertion. Each Period may offer several AdaptationSets. Each AdaptationSet contains different media data of the offered content, e.g. one AdaptationSet may contain data for the video component and another data for the audio component, although it is also possible to offer more than one component in an AdaptationSet. Within each AdaptationSet, the data are offered in different qualities (alternative versions). Each of these alternative versions is referred to as a Representation, which consists of one or more Segments. The MPD provides the URLs of those Segments so that clients can request the data available at the server.
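The hierarchy described above can be made concrete with a small sketch that walks a hand-written MPD using Python's standard xml.etree.ElementTree module. The MPD below is a simplified illustration and has not been validated against the schema in [1]. All ids, bitrates, and URLs are made up.

```python
import xml.etree.ElementTree as ET

# Namespace of the DASH MPD schema.
NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Simplified, hypothetical MPD: one Period, a video and an audio AdaptationSet.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="p0">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v-low" bandwidth="500000">
        <SegmentList>
          <SegmentURL media="v-low/seg-1.m4s"/>
          <SegmentURL media="v-low/seg-2.m4s"/>
        </SegmentList>
      </Representation>
      <Representation id="v-high" bandwidth="2500000">
        <SegmentList>
          <SegmentURL media="v-high/seg-1.m4s"/>
          <SegmentURL media="v-high/seg-2.m4s"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4">
      <Representation id="a-main" bandwidth="128000">
        <SegmentList>
          <SegmentURL media="a-main/seg-1.m4s"/>
          <SegmentURL media="a-main/seg-2.m4s"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""


def summarize(mpd_xml):
    """Return one (period, mimeType, representation, bandwidth, urls) tuple
    per Representation, following Period > AdaptationSet > Representation."""
    root = ET.fromstring(mpd_xml)
    rows = []
    for period in root.findall("dash:Period", NS):
        for aset in period.findall("dash:AdaptationSet", NS):
            for rep in aset.findall("dash:Representation", NS):
                urls = [s.get("media") for s in
                        rep.findall("dash:SegmentList/dash:SegmentURL", NS)]
                rows.append((period.get("id"), aset.get("mimeType"),
                             rep.get("id"), int(rep.get("bandwidth")), urls))
    return rows


rows = summarize(MPD_XML)
for row in rows:
    print(row)
```

Explicit segment lists are used here only to keep the example short. Other ways of addressing segments are not covered by this sketch.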

Advanced DASH - DASH with SVC and MVC

MPEG-DASH allows for efficient transmission of layered codecs, such as Scalable Video Coding (SVC) and Multiview Video Coding (MVC). For efficient transmission of layered codecs, each layer is described by a different Representation. Enhancement layers/views are considered dependent Representations that depend on Representations containing lower layers, referred to as complementary Representations. The dependency on other Representations is indicated in the MPD by the dependencyId attribute (dep_id in the following figures), which lists the additional Representations that need to be downloaded in order to decode a dependent Representation.
Figure 3 shows how the media presentation is organized for SVC in the MPD:

Splitting the different layers into different Representations allows for advanced and efficient use of network resources and client operation, as discussed in [2]-[6].
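The dependency signalling between layers can be sketched as follows. The sketch assumes the client has already extracted, for each Representation, the list of Representation ids it depends on. The Representation ids, the dictionary layout, and the function name are hypothetical. The function returns every Representation that must be downloaded for a chosen layer, lowest layer first.

```python
def representations_to_download(rep_id, dependency_ids):
    """Return rep_id plus every Representation it depends on, base layer first.

    dependency_ids maps a Representation id to the list of ids it depends on
    (an empty list for the base layer). Unknown ids raise KeyError.
    """
    ordered = []
    visiting = set()

    def visit(current):
        if current in ordered:
            return
        if current in visiting:
            raise ValueError("cyclic dependency at " + current)
        visiting.add(current)
        for dep in dependency_ids[current]:
            visit(dep)  # complementary Representations come first
        visiting.discard(current)
        ordered.append(current)

    visit(rep_id)
    return ordered


# Hypothetical SVC stream: a base layer and two enhancement layers.
layers = {"base": [], "enh1": ["base"], "enh2": ["base", "enh1"]}
print(representations_to_download("enh2", layers))
print(representations_to_download("base", layers))
```

Because each layer is a separate HTTP resource, a client that switches from "enh2" to "enh1" simply stops requesting the segments of the top layer, and the lower layers it already requests stay the same.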

Similarly, for MVC, the concept of dependent Representations allows for advanced and efficient use of MPEG-DASH. When the different views are stored in different Representations, those Representations have to be grouped into different AdaptationSets, one per view. An example with two views is shown in the previous figure, where the Representations for the base view are grouped in AdaptationSet1 and those for the enhancement view in AdaptationSet2.

References

[1] ISO/IEC JTC1/SC29/WG11, "Information technology - Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats", ISO/IEC 23009-1:2012, 2012.

Related Publications

[2] Yago Sanchez, Thomas Schierl, Cornelius Hellge, Dohy Hong, Danny De Vleeschauwer, Werner Van Leekwijck, Yannick Le Louedec, and Thomas Wiegand: Improved caching for HTTP-based Video on Demand using Scalable Video Coding, IEEE Consumer Communications and Networking Conference (CCNC) 2011, Las Vegas, NV, USA, January 2011.

[3] Yago Sanchez, Thomas Schierl, Cornelius Hellge, Dohy Hong, Danny De Vleeschauwer, Werner Van Leekwijck, Yannick Le Louedec, and Thomas Wiegand: iDASH: Improved Dynamic Adaptive Streaming over HTTP using Scalable Video Coding, ACM Multimedia Systems 2011, San Jose, CA, USA, February 2011.

[4] Yago Sanchez, Cornelius Hellge, Werner Van Leekwijck, Yannick Le Louédec and Thomas Schierl: Scalable Video Coding based DASH for efficient usage of network resources, Position Paper for the Third W3C Web and TV workshop, Los Angeles, CA, USA, September 2011.

[5] Cornelius Hellge, Yago Sanchez, Thomas Schierl, Thomas Wiegand, Danny De Vleeschauwer, Dohy Hong, and Yannick Le Louedec: CDNs with DASH and iDASH using Priority Caching, Proceedings of Pacific-Rim Conference on Multimedia (PCM 2011), Sydney, Australia, December 2011.

[6] Yago Sanchez, Thomas Schierl, Cornelius Hellge, Thomas Wiegand, Dohy Hong, Danny De Vleeschauwer, W. Van Leekwijck, and Yannick Le Louedec: Efficient HTTP-based Streaming using Scalable Video Coding, Elsevier Signal Processing: Image Communication, November 2011.