Volumetric video is immersive media content produced by capturing an object (e.g., a person) with multiple synchronized cameras in three-dimensional space. The captured object can later be viewed from any angle at any point in time. Volumetric video enables six-degrees-of-freedom (6DoF) content, through which a viewer can “move” within a video and view objects from different angles and distances.
The resulting data can be represented as point clouds [sets of points with (x, y, z) coordinates] or meshes (vertices plus a texture).
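As a rough illustration of the two representations, they can be sketched as the following data structures (the class and field names below are our own, chosen for clarity, and do not come from any standard):

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) coordinate
Color = Tuple[int, int, int]        # RGB, 8 bits per channel

@dataclass
class PointCloud:
    """A frame represented as colored points, with no connectivity."""
    points: List[Point]
    colors: List[Color]             # one color per point

@dataclass
class Mesh:
    """A frame represented as connected geometry plus a texture image."""
    vertices: List[Point]                # vertex positions
    faces: List[Tuple[int, int, int]]    # triangle indices into `vertices`
    uvs: List[Tuple[float, float]]       # per-vertex texture coordinates
    texture: bytes                       # encoded texture image

# A single triangle, once as points and once as a textured mesh:
cloud = PointCloud(points=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
                   colors=[(255, 0, 0)] * 3)
tri = Mesh(vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
           faces=[(0, 1, 2)],
           uvs=[(0, 0), (1, 0), (0, 1)],
           texture=b"")
```

The essential difference is visible in the fields: a point cloud carries color directly per point, while a mesh separates geometry (vertices and faces) from appearance (a texture mapped via UV coordinates).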
The Video-based Point Cloud Compression (V-PCC) standard compresses an object’s geometry, texture, and occupancy map separately and feeds the encoded bitstreams, along with metadata, into a multiplexer.
For mesh data, the vertices and the texture are likewise encoded separately and fed into a multiplexer.
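Both pipelines share the pattern of encoding each component independently and multiplexing the results into one stream. A toy length-prefixed multiplexer illustrates the idea (this is our own illustrative container format, not the actual V-PCC bitstream syntax):

```python
import struct

def mux_substreams(substreams):
    """Concatenate labeled, independently encoded substreams into one
    buffer, each prefixed with its name length and payload length."""
    out = b""
    for name, payload in substreams:
        tag = name.encode("ascii")
        out += struct.pack(">B I", len(tag), len(payload)) + tag + payload
    return out

def demux_substreams(buf):
    """Invert mux_substreams: recover the (name, payload) list."""
    streams, i = [], 0
    header = struct.calcsize(">B I")
    while i < len(buf):
        tag_len, length = struct.unpack_from(">B I", buf, i)
        i += header
        name = buf[i:i + tag_len].decode("ascii")
        i += tag_len
        streams.append((name, buf[i:i + length]))
        i += length
    return streams

# The V-PCC components named in the text, with dummy payloads:
muxed = mux_substreams([("geometry", b"\x01\x02"),
                        ("texture", b"\x03\x04"),
                        ("occupancy", b"\x05"),
                        ("metadata", b"m")])
```

A real multiplexer additionally handles timing, random access, and synchronization between substreams; the point here is only that each component is encoded and carried separately.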
Volumetric video requires considerably more data (i.e., geometric information and texture) than its two-dimensional counterpart. Processing this data introduces several challenges: a lack of hardware decoders, the difficulty of compressing point clouds, high bit rates (e.g., a raw point cloud object of 2.8 million points requires about 110 gigabits per second at 30 frames per second), and the high processing power needed to combine multiple volumetric objects.
The hardware in current mobile devices is inadequate for decoding volumetric video content. A workaround is to offload the processing to the cloud: a view of the three-dimensional object is rendered on a server into a two-dimensional video, which is then compressed and transmitted to the user’s device. The rendered view is updated dynamically according to user interaction. The benefits are that the existing two-dimensional video processing pipeline on mobile phones is reused, complex scenes can be displayed on legacy devices, and complex volumetric scenes can be transmitted at practicable bitrates. However, cloud-based rendering adds network latency, which increases the overall “motion-to-photon” latency.
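One server-side iteration of this split-rendering idea can be sketched as follows. The function and class names (`render_view`, `encode_2d`, `Pose`) are stand-ins of our own, not the actual system’s API:

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """6DoF user pose: position (x, y, z) and orientation (yaw, pitch, roll)."""
    position: tuple
    orientation: tuple

def render_view(scene, pose):
    # Placeholder renderer: a real server would rasterize the volumetric
    # object into a 2D image as seen from `pose`.
    return f"2D view of {scene} at {pose.position}"

def encode_2d(view):
    # Placeholder for a standard 2D video encoder (e.g., H.264/HEVC),
    # which the client can decode in hardware.
    return view.encode("utf-8")

def serve_frame(scene, pose):
    """One loop iteration: pose in from the client, encoded 2D frame out."""
    return encode_2d(render_view(scene, pose))

# In the real system this runs continuously, with the pose updated by
# user interaction:
frame = serve_frame("volumetric-actor", Pose((0.0, 1.6, 2.0), (0.0, 0.0, 0.0)))
```

The design choice this sketch highlights: the client only ever receives and decodes ordinary 2D video, so all volumetric complexity stays on the server.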
This added latency degrades the user experience and can cause motion sickness. To reduce it (beyond the gains already provided by the 5G Radio Access Network), we use efficient hardware decoders and real-time streaming protocols, and we predict the user’s head movement in 6DoF space.
This approach can reduce the effective motion-to-photon latency and improve the user’s viewing experience. To accomplish this, algorithms on the cloud server predict the user’s position for a given look-ahead time, and the correspondingly rendered image is sent to the client.
In our follow-up work (ACM Multimedia 2020), we designed a Kalman filter for head-motion prediction in our cloud-based volumetric video streaming system. We analyzed its performance on recorded head-motion traces and compared it to an autoregression model for different prediction intervals (look-ahead times). Our results show that the Kalman filter predicts head orientation 0.5 degrees more accurately than the autoregression model for a look-ahead time of 60 ms.
A pre-print version of our work can be found here.
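The idea can be illustrated with a one-dimensional constant-velocity Kalman filter for a single orientation angle. This is a simplified sketch of the general technique, not the authors’ implementation (which operates on the full 6DoF pose):

```python
class KalmanCV:
    """Constant-velocity Kalman filter for one head-orientation angle.

    State is [angle, angular velocity]; only the angle is measured.
    Q is simplified to diag(q, q) for brevity.
    """
    def __init__(self, q=1e-3, r=1e-2):
        self.x = [0.0, 0.0]                # [angle, angular velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise

    def step(self, z, dt):
        """Fold in one angle measurement z taken dt seconds after the last."""
        # Predict with the constant-velocity model F = [[1, dt], [0, 1]].
        a, v = self.x
        a_pred = a + v * dt
        (p00, p01), (p10, p11) = self.P
        # P = F P F^T + Q
        p00 = p00 + dt * (p10 + p01) + dt * dt * p11 + self.q
        p01 = p01 + dt * p11
        p10 = p10 + dt * p11
        p11 = p11 + self.q
        # Update with measurement z (H = [1, 0]); Kalman gain K = P H^T / S.
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        innov = z - a_pred
        self.x = [a_pred + k0 * innov, v + k1 * innov]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

    def predict(self, lookahead):
        """Extrapolate the filtered angle `lookahead` seconds ahead."""
        return self.x[0] + self.x[1] * lookahead

# Feed a head turning at a constant 100 deg/s, sampled every 10 ms,
# then extrapolate 60 ms ahead (the paper's example look-ahead time):
kf = KalmanCV()
for i in range(200):
    kf.step(100.0 * i * 0.01, 0.01)
predicted = kf.predict(0.06)
```

In a full system, one such filter would run per orientation angle (and per position coordinate), and the server renders the view from the extrapolated pose so the frame arrives at the client roughly when that pose occurs.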
To further increase immersion, we go beyond free-viewpoint volumetric video and present a new framework for creating interactive volumetric video content of humans. Semantic enrichment of the captured volumetric video data, combined with new hybrid geometry- and video-based animation methods, makes it possible to re-animate and alter an actor’s performance captured in a volumetric studio. These methods animate the high-quality captured data directly, instead of creating an animatable model that merely resembles it.
Details of our work on interactive volumetric video can be found here.
- J. Son, S. Gül, G.S. Bhullar, G. Hege, W. Morgenstern, A. Hilsmann, T. Ebner, S. Bliedung, P. Eisert, T. Schierl, T. Buchholz, C. Hellge, "Split Rendering for Mixed Reality: Interactive Volumetric Video in Action" In Proceedings of SIGGRAPH Asia 2020 XR (SA ’20 XR), December 2020, doi: 10.1145/3415256.3421491. [pdf] [video]
- S. Gül, S. Bosse, D. Podborski, T. Schierl, C. Hellge, "Kalman Filter-based Head Motion Prediction for Cloud-based Mixed Reality", In Proceedings of the 28th ACM International Conference on Multimedia (ACMMM), October 2020, doi: 10.1145/3394171.3413699. [pdf] [video] [slides]
- S. Gül, D. Podborski, A. Hilsmann, W. Morgenstern, P. Eisert, O. Schreer, T. Buchholz, T. Schierl, C. Hellge, "Interactive Volumetric Video from the Cloud", International Broadcasting Convention (IBC), September 2020. [pdf]
- S. Gül, D. Podborski, T. Buchholz, T. Schierl, C. Hellge, "Low-latency Cloud-based Volumetric Video Streaming Using Head Motion Prediction", In Proceedings of the 30th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV ’20). Association for Computing Machinery, Istanbul, Turkey, June 2020, doi: 10.1145/3386290.3396933. [pdf] [video] [slides]
- S. Gül, D. Podborski, J. Son, G.S. Bhullar, T. Buchholz, T. Schierl, C. Hellge, "Cloud Rendering-based Volumetric Video Streaming System for Mixed Reality Services", Proceedings of the 11th ACM Multimedia Systems Conference (MMSys), June 2020, doi: 10.1145/3339825.3393583. [pdf] [video] [slides]
- A. Hilsmann, P. Fechteler, W. Morgenstern, S. Gül, D. Podborski, C. Hellge, T. Schierl, P. Eisert, "Interactive Volumetric Video Rendering and Streaming", In: Culture and Computer Science – Extended Reality, Proceedings of KUI 2020, ISBN: 978-3-86488-169-5.
- D. Podborski, S. Gül, J. Son, G.S. Bhullar, R. Skupin, Y. Sanchez, T. Schierl, C. Hellge, "Interactive Low Latency Video Streaming Of Volumetric Content", ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020. [pdf] [video] [slides]
- A. Hilsmann, P. Fechteler, W. Morgenstern, W. Paier, I. Feldmann, O. Schreer, P. Eisert, "Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances", IET Computer Vision, Special Issue on Computer Vision for the Creative Industries, April 2020, doi: 10.1049/iet-cvi.2019.0786
- P. Eisert and A. Hilsmann, "Hybrid Human Modeling: Making Volumetric Video Animatable", In: Real VR – Immersive Digital Reality: How to Import the Real World into Head-Mounted Immersive Displays, Markus Magnor and Alexander Sorkine-Hornung (Eds.), Lecture Notes in Computer Science, Springer, April 2020, doi: 10.1007/978-3-030-41816-8_7
- O. Schreer, I. Feldmann, P. Kauff, P. Eisert, D. Tatzelt, C. Hellge, K. Müller, T. Ebner, S. Bliedung, "Lessons learnt during one year of commercial volumetric video production", International Broadcasting Convention (IBC) 2019, (top 10 best papers IBC 2019). [pdf]
- 3GPP SA WG4, Technical Report TR 26.118 (Release 15), "Virtual Reality (VR) profiles for streaming applications" [link]
- ISO/IEC 23090-8, Information technology - Coded representation of immersive media - Part 8: Network based media processing [link]
- ISO/IEC 23090-5, Information technology - Coded representation of immersive media - Part 5: Video-based point cloud compression [link]
- ISO/IEC 23090-10, Information technology - Coded representation of immersive media - Part 10: Carriage of video-based point cloud compression data [link]