Volumetric Video Formats


Volumetric video is a new way of experiencing immersive media content. Objects (e.g., people) are captured with multiple cameras so that they can later be viewed from any angle at any point in time. This enables a variety of applications, particularly in Augmented Reality (AR) and Virtual Reality (VR). Volumetric objects are typically stored as point clouds or meshes. Compared to traditional two-dimensional video, volumetric video must store geometric information in addition to texture (i.e., color information), which results in a very large amount of data. For example, a raw point cloud object consisting of 2.8 million points requires a bandwidth of about 110 Gbit/s at 30 frames per second. Efficient compression is therefore essential for such applications.
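As a rough illustration of why raw volumetric data is so large, the following Python sketch shows how point count, per-point size, and frame rate combine into an uncompressed bitrate. The text does not state how many bits each point occupies, so the sketch only back-solves the per-point size that the quoted figure would correspond to. The function names are hypothetical and chosen for illustration.

```python
def raw_bitrate_bps(points_per_frame: int, bits_per_point: float, fps: float) -> float:
    """Uncompressed bitrate in bits per second for a point cloud sequence."""
    return points_per_frame * bits_per_point * fps


def implied_bits_per_point(bitrate_bps: float, points_per_frame: int, fps: float) -> float:
    """Per-point size in bits that a quoted raw bitrate would correspond to."""
    return bitrate_bps / (points_per_frame * fps)


points = 2_800_000   # points per frame, from the text
fps = 30             # frames per second, from the text
quoted_bps = 110e9   # about 110 billion bits per second, from the text

# The per-point size is not given in the text; it is derived here, not assumed.
bits = implied_bits_per_point(quoted_bps, points, fps)
print(f"implied per-point size: {bits:.0f} bits")
print(f"raw bitrate: {raw_bitrate_bps(points, bits, fps) / 1e9:.1f} Gbit/s")
```

The relation is linear in all three factors, so halving the point count or the frame rate halves the raw bitrate.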

The Multimedia Communications group develops tools for the compression, packaging, and multiplexing of mesh-based volumetric video, optimizing the trade-off between compression efficiency and rendered quality. The group also works on a volumetric video player for mobile AR and VR applications and on cloud-based streaming solutions.

Content creation and compression

Figure 1 illustrates content preparation for a mesh-based volumetric video. Each recorded object typically consists of three media sources: a two-dimensional video that represents the texture for each frame, three-dimensional meshes that describe the shape of the volumetric object, and an audio track. Each of these resources is compressed using the corresponding encoder, after which the resulting bitstreams are synchronously multiplexed into a single MP4 file.
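The synchronous multiplexing step can be pictured as interleaving the samples of the three encoded tracks in timestamp order. The Python sketch below is a simplified illustration under that assumption; it does not write a real MP4 file, and the `Sample` type, the `multiplex` function, and the sample timings are hypothetical.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Sample:
    timestamp_ms: int  # presentation time of the sample
    track: str         # "video" (texture), "mesh", or "audio"
    payload: bytes     # already-encoded data for this sample


def multiplex(*tracks: Iterable[Sample]) -> Iterator[Sample]:
    """Interleave per-track samples by timestamp.

    Each input track must already be sorted by timestamp.
    """
    return merge(*tracks, key=lambda s: s.timestamp_ms)


# Placeholder payloads stand in for real encoder output.
video = [Sample(t, "video", b"tex") for t in (0, 33, 66)]
mesh = [Sample(t, "mesh", b"geo") for t in (0, 33, 66)]
audio = [Sample(t, "audio", b"pcm") for t in (0, 21, 42, 63)]

muxed = list(multiplex(video, mesh, audio))
for sample in muxed:
    print(sample.timestamp_ms, sample.track)
```

Keeping samples of all tracks close together in time is what lets a player fetch texture, geometry, and audio for the same instant without seeking across the file.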

Volumetric video player

After content preparation, all resources of the volumetric object are contained in a single MP4 file, which can be delivered to the user with different streaming methods (e.g., HTTP adaptive streaming).

A volumetric video player then processes the file in the reverse order of content creation. Figure 2 illustrates a simplified volumetric video player architecture. The MP4 file containing the three sources of media (video, mesh, and audio) is demultiplexed into three elementary streams. Each elementary stream is decoded by the corresponding decoder and rendered in a scene. The “Rendering Engine” allows the user to interact with the volumetric video object. At the same time, the “Application” has complete control over the “Media Engine,” so the user can control playback of the volumetric video as in a traditional two-dimensional video player.
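The demultiplex-then-decode flow described above can be sketched in Python as follows. This is a minimal illustration, not the player's actual implementation: the `Sample` type, the function names, and the placeholder decoders are hypothetical, and a real player would use actual video, mesh, and audio decoders.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class Sample(NamedTuple):
    track: str         # "video", "mesh", or "audio"
    timestamp_ms: int  # presentation time of the sample
    payload: bytes     # encoded data


def demultiplex(samples: list[Sample]) -> dict[str, list[Sample]]:
    """Split an interleaved sample sequence into one stream per track."""
    streams: dict[str, list[Sample]] = defaultdict(list)
    for sample in samples:
        streams[sample.track].append(sample)
    return dict(streams)


def decode_streams(
    streams: dict[str, list[Sample]],
    decoders: dict[str, Callable[[bytes], str]],
) -> dict[str, list[tuple[int, str]]]:
    """Run each elementary stream through the decoder registered for its track."""
    return {
        track: [(s.timestamp_ms, decoders[track](s.payload)) for s in samples]
        for track, samples in streams.items()
    }


# Placeholder decoders: they only label the payload instead of decoding it.
decoders = {
    "video": lambda data: f"texture({len(data)} bytes)",
    "mesh": lambda data: f"mesh({len(data)} bytes)",
    "audio": lambda data: f"audio({len(data)} bytes)",
}

interleaved = [
    Sample("video", 0, b"tex0"),
    Sample("mesh", 0, b"geometry0"),
    Sample("audio", 0, b"a0"),
    Sample("audio", 21, b"a1"),
    Sample("video", 33, b"tex1"),
    Sample("mesh", 33, b"geometry1"),
]

streams = demultiplex(interleaved)
decoded = decode_streams(streams, decoders)
print(decoded["mesh"])
```

The decoded texture and mesh for the same timestamp would then be handed to the renderer together, while the decoded audio is played back in sync.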







