The Capture of Reality

When creating experiences for immersive Virtual Reality, there are essentially two approaches. The first is manual construction through Computer-Generated Imagery (CGI), which is how most games and VR experiences are made. The second is far more automatic and attempts to ‘capture reality’ instead of actively generating it. It is this approach that we discuss in this entry. In addition to presenting the technicalities of the capture methods, we discuss their limitations and provide an innovative example of how these might be overcome in the future, drawn from a student project at the University of Bergen.

An early 360° camera (horizontally, at least), probably the first with a synchronised shutter.

360° Video

In a previous article on Virtual Reality Journalism, we discussed how 360° 3D cameras can be used to present a user with an immersive experience. This approach has several unique benefits. First of all, it is far less time-consuming to capture and re-use existing physical environments than to create them through 3D modelling. This is perhaps especially true when the environment involves human actors, as it is easier to avoid the uncanny valley effect while maintaining a high standard of realism with image capture equipment than with 3D animation.

How does it work?

360° cameras usually comprise two or more (ultra-)wide-angle lenses. In cameras with just two lenses, such as the Gear 360 or Ricoh Theta V, each lens has to capture at least 180° horizontally and vertically. The raw recordings from these lenses are separate and need to be stitched together in software into, for instance, an equirectangular image that represents the full 360° sphere (see Illustration 2). Illustration 1 shows how the equirectangular format works using a world map, perhaps our most familiar example of a spherical shape presented as a rectangle.

Illustration 1: A relatable example of the equirectangular format. The furthest point west is adjacent to the furthest point east, so we are dealing with a ‘sphere’, or more correctly a globe, that is stretched out into a rectangle. The closer we get to the poles, such as Antarctica, the more the image is stretched, as the circles of latitude become shorter towards the poles yet still have to fill the full width of the rectangle.
Illustration 2: In this equirectangular photo, captured with a Ricoh Theta V, we see the same effect as in Illustration 1. My hands, which enclose the bottom of the camera, are stretched in the same way as Antarctica on the map. The stairs, which appear to be curved, are in fact straight; the bending introduced by the lenses is especially clear when the image is viewed ‘equirectangularly’.
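
To make the equirectangular format a little more concrete, here is a minimal Python sketch of the mapping the illustrations describe. The function names, the 3840×1920 frame size and the convention that longitude 0° sits at the image centre are our own assumptions for illustration; stitching software may use different conventions.

```python
import math

def sphere_to_equirect(lon_deg, lat_deg, width, height):
    """Map a direction on the sphere to pixel coordinates.

    Assumed convention: longitude -180..180 runs left to right,
    latitude +90 (top) to -90 (bottom), as on the world map.
    """
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y

def horizontal_stretch(lat_deg):
    """How much wider a feature is drawn at this latitude than at the equator."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Straight ahead on the horizon lands in the middle of the frame.
print(sphere_to_equirect(0, 0, 3840, 1920))
# At 60 degrees latitude, features are drawn about twice as wide.
print(horizontal_stretch(60))
```

The stretch factor grows without bound as the latitude approaches ±90°, which is why Antarctica, and the hands under the camera, are smeared across the whole width of the image.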

When an equirectangular image is viewed through an HMD or a smartphone, the software displays only about 110° of the full 360° at a time, relying on the orientation sensors in the HMD or phone to decide which part of the image to present.
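
As a rough sketch of that selection step, the following Python function works out which pixel columns of an equirectangular frame fall inside the horizontal field of view for a given head rotation. It is a simplification under our own assumptions: the function name is hypothetical, only yaw (left/right rotation) is considered, and real players render the sphere in 3D rather than cropping columns.

```python
def visible_columns(yaw_deg, fov_deg, width):
    """Return the pixel column ranges [start, end) visible at a given yaw.

    Assumes yaw 0 looks at the centre of the image. Two ranges are
    returned when the view wraps around the left/right seam.
    """
    centre = (yaw_deg + 180.0) % 360.0 / 360.0 * width
    half = fov_deg / 360.0 * width / 2.0
    start = int(round(centre - half)) % width
    end = int(round(centre + half)) % width
    if start < end:
        return [(start, end)]
    # The view crosses the seam where far west meets far east.
    return [(start, width), (0, end)]

print(visible_columns(0, 110, 3840))    # looking straight ahead
print(visible_columns(180, 90, 3840))   # looking backwards, across the seam
```

Looking backwards shows why the seam matters: the view is assembled from the far right and far left edges of the image, just as the far east and far west meet on the world map.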

3D Images

Although regular 360° cameras (Gear 360; Ricoh Theta V) to a large extent cover the world as we see it in all its 360°, their images are still monoscopic. Essentially, this means that the same image is presented to each eye when viewed in an HMD, which is not the way we ordinarily see reality. As our eyes are roughly six centimetres apart, the visual feed from each eye differs slightly. It is this difference which enables us to perceive the depth of the world, at least when our eyes are not fooled by illusions exploiting the effect, such as VR itself. We discuss this in more detail in our entry on the History of VR, where we cover the invention of the Stereoscope, but a short introduction is also given here. Essentially, 3D 360° cameras use the same principle as human beings to capture depth, separating the cameras in much the same way as our eyes are separated. Such cameras are, however, more cumbersome and costly to produce, as capturing stereoscopic images doubles the number of lenses required, leading to a minimum of four: two (one per eye) for each 180° of capture. Unlike the 4K 360° monoscopic cameras available rather cheaply on the consumer market (from $200 and up), stereoscopic cameras have not yet reached the market at very reasonable prices. There is hope, however, and I can personally recommend the Vuze+, a 360° 3D camera that delivers 4K resolution per eye and comes with well-designed accompanying stitching and editing software. At $1200 the price is still a bit steep for most non-professional use, but it suggests that such cameras may soon become more affordable. We have used the Vuze+ in a research project at the University of Bergen, with good results. Its quality is comparable to that of a Ricoh Theta V, except that it delivers stereoscopic rather than monoscopic images.
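
To get a feeling for how the separation between two eyes, or two lenses, translates into depth cues, the sketch below computes the angle between the two lines of sight to a point straight ahead. The 63 mm separation is an assumed typical value and the function name is hypothetical; this is plain trigonometry, not a description of how any particular camera works.

```python
import math

def parallax_angle_deg(distance_m, eye_separation_m=0.063):
    """Angle between the two eyes' lines of sight to a point straight ahead.

    eye_separation_m defaults to an assumed typical value of 63 mm.
    """
    half_separation = eye_separation_m / 2.0
    return math.degrees(2.0 * math.atan(half_separation / distance_m))

# The angle shrinks quickly with distance, so the stereo effect is
# strongest for nearby objects and fades for distant ones.
for distance in (0.5, 2.0, 10.0):
    print(distance, round(parallax_angle_deg(distance), 2))
```

This is one reason stereoscopic 360° footage tends to look most convincing when something of interest is close to the camera.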

Regarding Resolution

A resolution of 4K per eye sounds great, yet many are disappointed when they view the recordings from a camera such as the Gear 360, Ricoh Theta or Vuze+. They may recall the picture on their 4K TV as incredibly sharp, and yet their recorded videos appear somewhat blurry and pixelated. The reason is quite simple. The 360° images do indeed have a 4K resolution, but we never see all of the pixels at once, as they are spread out over a whole sphere. To keep matters simple, let’s say that your Head-Mounted Display has a Field of View of 90° (although most have about 110°). In this case, only a quarter of the image’s 360° width is in view at any given time, so the horizontal pixel count has to be divided by four. This is somewhat simplified because of the stretching, but it should be enough to get the point across. To get an effective resolution of 4K, or even something akin to the 3K that headsets such as the HTC Vive Pro and Samsung Odyssey(+) can offer, the cameras would need a far higher resolution.
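
The arithmetic can be sketched in a few lines of Python. We assume here that a "4K" equirectangular frame is 3840×1920 pixels and, like the explanation above, we ignore the stretching; the function names are our own.

```python
def visible_pixels(image_width, image_height, fov_h_deg, fov_v_deg):
    """Approximate number of source pixels inside the field of view."""
    cols = image_width * fov_h_deg / 360.0   # full width covers 360 degrees
    rows = image_height * fov_v_deg / 180.0  # full height covers 180 degrees
    return cols, rows

def required_width(target_cols, fov_h_deg):
    """Frame width needed to put target_cols pixels across the view."""
    return target_cols * 360.0 / fov_h_deg

cols, rows = visible_pixels(3840, 1920, 90, 90)
print(cols, rows)                 # pixels actually in view
print(required_width(3840, 90))   # width needed for 4K across the view
```

Under these assumptions, a 90° view of a 4K frame contains only 960 columns of source pixels, and a frame would have to be 15360 pixels wide to put a full 3840 columns in front of the viewer.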

Another Step in Fidelity: Volumetric Video

At first thought, it may be hard to imagine how immersive 360° 3D recordings could become more detailed except by increasing the resolution. As we briefly noted, however, stereoscopy in 3D movies at the cinema or in 360° 3D recordings merely provides an illusion of depth, not actual depth. The same goes for our eyes: although they mostly perceive depth correctly, they are easily fooled. 360° 3D cameras are an example of this. They merely fool our eyes: although there seems to be depth, we cannot really move within the image, as there is no actual depth to it. Volumetric video is different, and affords positional interaction. It places the recorded imagery in a 3D (x, y, z) space, in addition to delivering stereoscopy so that we can perceive it. Unfortunately, volumetric video is very hard to create while retaining high quality, and plug-and-play solutions still seem far off. To get an idea of how volumetric video works, we recommend looking into the concepts of photogrammetry, and perhaps even creating a 3D model yourself from images captured with your smartphone. This YouTube tutorial shows you how to do this in Agisoft Photoscan Pro, which has a free trial available.
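
What it means to place imagery in an (x, y, z) space can be illustrated with a generic sketch: given a pixel and a measured depth for it, the standard pinhole camera model recovers a 3D point. Volumetric capture pipelines differ a great deal, so this is not a description of any particular system, and the camera parameters (`fx`, `fy`, `cx`, `cy`) below are made-up values for illustration.

```python
def depth_pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth into camera space.

    fx, fy are focal lengths in pixels and (cx, cy) is the image centre;
    the values used below are hypothetical.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# A pixel at the image centre lies straight ahead of the camera.
print(depth_pixel_to_point(320, 240, 2.0, 500.0, 500.0, 320.0, 240.0))
# A pixel 100 columns to the right, at the same depth, lies off to the side.
print(depth_pixel_to_point(420, 240, 2.0, 500.0, 500.0, 320.0, 240.0))
```

Once every pixel has a position like this, the viewer's head can move and the scene can be re-rendered from the new position, which is exactly what a flat stereoscopic image pair cannot offer.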

Limitations

Developed in an undergraduate course at the University of Bergen, the short 360° movie “Schizophrenia” experimented with interactive 360° video.

Despite these great innovations in the capture of reality, CGI has some benefits that neither 360° 3D nor volumetric video can really match. The most important of these is interactivity. As 360° videos are linear (that is, they have a predetermined beginning and end), the user cannot really affect what happens in the video, except by choosing which part of it to look at.

In our course in VR Journalism at the University of Bergen, where I taught students VR programming, 360° video and photogrammetry, we faced this exact limitation. A group working on an experience of the reality-shattering disorder of schizophrenia wanted hallucinations to occur when the user looked at certain areas. The students solved this by placing transparent GIFs, edited from the real footage, over the video in A-Frame, and attaching gaze event listeners that start the GIFs playing. The results were extraordinary, and could well offer a simple way of adding interaction on top of 360° videos. The experience, whose voices are in Norwegian, can be viewed here (a WebVR-capable browser such as Chrome is required).
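
The students' project was built in A-Frame, which is a JavaScript framework, and we do not reproduce their code here. The underlying idea of a gaze trigger can nevertheless be sketched in Python: compare the direction the user is looking in with the direction of a hotspot, and fire when the angle between them is small enough. All names and the 5° threshold below are our own assumptions.

```python
import math

def direction(yaw_deg, pitch_deg):
    """Unit vector for a viewing direction given as yaw and pitch."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))

def is_gazing_at(view, hotspot, radius_deg):
    """True when the angle between view and hotspot is within radius_deg."""
    dot = sum(a * b for a, b in zip(direction(*view), direction(*hotspot)))
    # Clamp to guard against rounding slightly outside [-1, 1].
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
    return angle <= radius_deg

hallucination_spot = (12.0, 1.0)   # hypothetical hotspot (yaw, pitch)
print(is_gazing_at((10.0, 0.0), hallucination_spot, 5.0))  # close enough
print(is_gazing_at((90.0, 0.0), hallucination_spot, 5.0))  # looking elsewhere
```

In a framework such as A-Frame this test is handled by the library, which raises an event on the overlay when the gaze cursor meets it; the application only has to decide what to play in response.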