The sensory scope of virtual reality systems

The sensory scope of virtual reality systems is determined by how many of the human senses are engaged. The count may be weighted by whether the senses included are “high bandwidth” or “low bandwidth” in nature. Vision, hearing, and touch have a high capacity for rapid, complex transmission and can therefore be viewed as high-bandwidth senses for communication between humans and computers; it is not surprising that these three senses have dominated virtual reality systems. By comparison, taste and smell are relatively low-bandwidth senses, and few virtual reality systems engage them. The sensory scale of a virtual reality system is the degree of sensory bandwidth engaged by communication between human and computer, which includes both the size of the signal relative to total human perception and the realism of that signal.
Vision is the single most important human sense, and three-dimensional depth perception is central to vision, so three-dimensional perception is critical for immersive virtual reality. The human eyes convert light into electrochemical signals that are transmitted through and processed by a series of increasingly complex neural cells. Some cells detect basic image components such as edges, color, and movement; higher-level cells combine these components and make macro-level interpretations of what is being seen. The cues that humans use for three-dimensional perception are grounded in this processing system and fall into three general areas: interaction among objects; the geometry of object edges; and the texture and shading of object surfaces.
Many cues for three-dimensional perception come from interaction among objects; key attributes of these interactions are overlap, scale, and parallax. An object that overlaps another is perceived as closer. Objects believed to be similar in actual size but appearing larger are perceived as closer, and an object that grows in apparent size is perceived as moving closer. Objects that shift a greater distance relative to other objects when the viewer’s head moves are perceived as closer.
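As a rough illustration of the relative-size cue, the sketch below (in Python, with illustrative values) estimates how far away an object of known physical height appears, given only the angle it subtends in the visual field; halving the angular size roughly doubles the estimated distance.

```python
import math

def estimated_distance(known_height_m, angular_size_deg):
    """Estimate distance to an object of known height from its apparent
    angular size (the 'relative size' depth cue). Distance grows as the
    angular size shrinks."""
    angular_size_rad = math.radians(angular_size_deg)
    return known_height_m / (2 * math.tan(angular_size_rad / 2))

# A 1.8 m tall figure subtending 2 degrees appears roughly 52 m away;
# the same figure subtending 4 degrees appears roughly half that distance.
print(estimated_distance(1.8, 2.0))   # ~51.6
print(estimated_distance(1.8, 4.0))   # ~25.8
```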
Binocular parallax (or stereoscopic vision) comes from the fact that the two human eyes see real-world objects from slightly different angles. Eye muscles and the brain’s neural processing work together to combine these two different images into the perception of a single, three-dimensional image. Muscles within each eye change the shape of its lens to focus at the distance of the object viewed, while other muscles change the orientation of the eyes so that the two lines of sight converge at that same distance. In real-world vision these two muscle functions work in harmony; in virtual reality they may conflict. When images are displayed very far away, the screen required for immersion is prohibitively large and it is difficult to present different images to the two eyes. When images are displayed very close to the eyes, extremely high image resolution is required and the two muscle functions tend to conflict.
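The convergence half of this pairing can be quantified with simple geometry. The hedged sketch below assumes an average interpupillary distance of about 63 mm and computes the angle between the two lines of sight for objects at several distances; the rapid fall-off beyond a few meters is one reason stereoscopic depth cues matter most at close range.

```python
import math

IPD_M = 0.063  # assumed average interpupillary distance, ~63 mm

def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Angle between the two lines of sight when both eyes fixate a point
    at the given distance. Nearer objects require a larger angle."""
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))

for d in (0.5, 2.0, 10.0):
    print(f"{d:5.1f} m -> {vergence_angle_deg(d):.2f} degrees")
# 0.5 m -> ~7.21 degrees, 2 m -> ~1.80 degrees, 10 m -> ~0.36 degrees
```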
One method of showing the two eyes different images on a distant screen is to have them view the screen through different polarizing filters; this is how “3D glasses” work in many movie theaters. Two versions of the scene, captured from slightly different perspectives, are projected on top of each other with different polarizations, and each filter passes only the image intended for its eye; the brain fuses the two perspectives into a single image with depth. However, this method has significant limitations, including loss of brightness and, with linear polarization, crosstalk between the two images when the viewer’s head tilts.
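The separation achieved by linearly polarized filters follows Malus’s law: the transmitted intensity is proportional to the squared cosine of the angle between the light’s polarization and the filter’s axis. The small sketch below illustrates why each eye sees only its intended image while the head is level, and why tilting the head lets some of the wrong image leak through.

```python
import math

def transmitted_fraction(filter_angle_deg, light_angle_deg):
    """Malus's law: fraction of linearly polarized light passed by a filter
    oriented at filter_angle relative to the light's polarization axis."""
    theta = math.radians(filter_angle_deg - light_angle_deg)
    return math.cos(theta) ** 2

# With the head level, each eye passes its own image and blocks the other.
print(transmitted_fraction(0, 0))     # 1.0  (intended image passes)
print(transmitted_fraction(90, 0))    # ~0.0 (other eye's image blocked)

# Tilting the head 20 degrees lets ~12% of the wrong image leak through,
# one reason linearly polarized systems degrade with head tilt.
print(transmitted_fraction(90, 20))   # ~0.117
```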
Another method of presenting the eyes with different images is to use “shutter glasses.” Shutter glasses alternately block the image from one eye and then the other, in synchronization with images from two different perspectives shown in succession on a single screen. When the alternating images are shown in sufficiently rapid succession, the brain combines them into a single three-dimensional image. Most head-mounted displays (HMDs) used in virtual reality are helmet- or goggle-style devices that include: a means of presenting a separate image to each eye (most often a dedicated display or half of a split screen per eye, rather than shutter glasses); relatively close, high-resolution imagery that spans more than 60 degrees of the field of view and updates with head motion; and a mechanical, optical, magnetic, or other mechanism to track head motion.
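A schematic of the time-multiplexed shutter-glasses approach is sketched below. The render_eye and sync_shutter callbacks are placeholders for whatever rendering and glasses-synchronization interface a particular system exposes; the point is simply the alternation of eye views within a fixed frame budget.

```python
import itertools
import time

FRAME_RATE_HZ = 120        # 120 single-eye frames = 60 complete stereo frames per second
FRAME_PERIOD_S = 1.0 / FRAME_RATE_HZ

def render_stereo_loop(render_eye, sync_shutter, frames=240):
    """Alternate left- and right-eye frames on a single display, signalling
    the shutter glasses which eye should be unblocked for each frame.
    render_eye and sync_shutter are hypothetical callbacks."""
    eyes = itertools.cycle(("left", "right"))
    for _ in range(frames):
        start = time.perf_counter()
        eye = next(eyes)
        sync_shutter(open_eye=eye)   # unblock this eye, block the other
        render_eye(eye)              # draw the scene from this eye's viewpoint
        # Wait out the rest of the frame period to keep a steady cadence.
        elapsed = time.perf_counter() - start
        time.sleep(max(0.0, FRAME_PERIOD_S - elapsed))
```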
An object’s edges separate it from the environment.  The geometry of these edges also provides perceptual cues about its three-dimensionality. The outer edges of an object form its outline and are the bridge between interaction among objects (including overlap, scale, and parallax as discussed above) and the internal orientation of the object. An object’s inner edges bridge the outer boundaries of the object and its inner surfaces and textures.  Together, the outer and inner edges of an object provide powerful cues about its three-dimensional size, location, orientation, and movement.
Early three-dimensional graphics used the basic geometry of object edges, generally combinations of straight lines, to create moving, transparent “wireframe” figures. Although three-dimensional graphics are now far more sophisticated, the underlying geometry of object edges remains central to three-dimensional rendering.
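The sketch below recreates the wireframe idea in miniature: the eight corners of a cube are connected by their twelve edges and projected onto a flat screen with a simple pinhole (perspective) projection, so nearer edges come out larger. The focal length and cube placement are arbitrary illustrative values.

```python
# Vertices of a cube centred 4 units in front of a pinhole camera,
# and the 12 edges connecting them.
VERTS = [(x, y, z + 4.0) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
EDGES = [(a, b) for a in range(8) for b in range(a + 1, 8)
         if bin(a ^ b).count("1") == 1]   # vertex pairs differing in one axis

FOCAL = 2.0  # assumed focal length in screen units

def project(point):
    """Perspective projection: screen position scales with 1/depth,
    which is what makes nearer edges look larger."""
    x, y, z = point
    return (FOCAL * x / z, FOCAL * y / z)

for a, b in EDGES:
    print(project(VERTS[a]), "->", project(VERTS[b]))
```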
An object’s surfaces lie within the space bounded by its edges. In addition to the interaction among objects and the geometry of object edges discussed above, the texture and lighting of an object’s surfaces provide important cues for three-dimensional perception. One of the most important aspects of perceiving the three-dimensionality of a surface is how it interacts with light. Humans are accustomed to viewing objects illuminated from above by the sun and thus most readily interpret the three-dimensionality of objects lit from above by a single light source. Nonetheless, illumination from multiple light sources or from directions other than above can also convey three-dimensionality if it is applied consistently.
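A minimal way to express the “lit from above” cue is Lambertian (diffuse) shading, in which a surface’s brightness falls off with the cosine of the angle between its surface normal and the light direction. The sketch below uses an overhead light by default; the smooth brightness gradient this produces across a tilted or curved surface is one of the signals the visual system reads as shape.

```python
import math

def lambert_brightness(normal, light_dir=(0.0, 1.0, 0.0)):
    """Diffuse (Lambertian) shading: brightness is the cosine of the angle
    between the surface normal and the light direction, clamped at zero.
    The default light shines straight down from above (+y)."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    norm = math.sqrt(sum(n * n for n in normal)) * math.sqrt(sum(l * l for l in light_dir))
    return max(0.0, dot / norm)

# A surface facing straight up is fully lit; a vertical surface is dark;
# a 45-degree slope falls in between - the gradient the eye reads as shape.
print(lambert_brightness((0, 1, 0)))        # 1.0
print(lambert_brightness((1, 0, 0)))        # 0.0
print(lambert_brightness((1, 1, 0)))        # ~0.707
```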

“Texture mapping” is an efficient method of creating surfaces for three-dimensional virtual objects by overlaying essentially two-dimensional texture images on object surfaces. Depth perception of these surfaces can then be refined through shading and reflected light. “Ray tracing” carries light simulation further by tracking individual rays of light as they reflect among objects and ultimately bounce from object surfaces to the viewer. Texture mapping, light shading, and ray tracing are computationally intensive, particularly for complex virtual environments with moving objects. Fortunately for the sake of computational economy, humans do not register as much visual detail in moving objects as in stationary objects. Thus, computational effort in virtual reality can be conserved without significant loss of perceptual realism by rendering the surfaces of moving objects in less detail than the surfaces of stationary objects.
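One way to exploit this is to pick a coarser texture level of detail (LOD) for objects that move quickly across the screen. The sketch below uses an assumed, illustrative policy (drop roughly one level each time the on-screen speed doubles past a threshold); real rendering engines use their own heuristics.

```python
import math

def texture_lod(base_lod, screen_speed_px_per_frame, threshold_px=2.0):
    """Pick a coarser texture level of detail (LOD) for fast-moving objects,
    since the eye resolves less surface detail in motion.
    base_lod 0 is the sharpest level; higher numbers are coarser."""
    if screen_speed_px_per_frame <= threshold_px:
        return base_lod            # stationary or slow-moving: full detail
    # Assumed policy: drop one level roughly each time the on-screen speed
    # doubles past the threshold.
    extra = int(math.log2(screen_speed_px_per_frame / threshold_px)) + 1
    return base_lod + extra

print(texture_lod(0, 0.5))    # 0 - slow or stationary object, full detail
print(texture_lod(0, 3.0))    # 1 - slight coarsening
print(texture_lod(0, 16.0))   # 4 - fast-moving object, much coarser surface
```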
The essence of virtual reality is fooling the human body into perceiving things that are not real. From this perspective, it is not surprising that the body can respond negatively, particularly when it receives conflicting signals from different senses and is not entirely fooled. With respect to vision, one problem with current VR imaging systems is the conflict between eye focus (adjusting each eye’s lens to the apparent distance of the object viewed) and eye convergence (orienting both eyes so their lines of sight intersect at the apparent distance of the object). This problem is more acute for HMD systems, in which images are displayed relatively close to the eyes.
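The size of this focus-convergence mismatch is often expressed in diopters (the reciprocal of distance in meters). The sketch below assumes headset optics focused at a fixed 1.5 m, an illustrative value, and shows how the mismatch grows as virtual objects are placed much nearer or farther than that fixed focal distance.

```python
def mismatch_diopters(focal_distance_m, vergence_distance_m):
    """Difference, in diopters (1/metres), between where the eyes must focus
    (the fixed optical distance of an HMD screen) and where they converge
    (the apparent distance of the virtual object). The larger the value,
    the stronger the conflict."""
    return abs(1.0 / focal_distance_m - 1.0 / vergence_distance_m)

# Headset optics focused at 1.5 m (an assumed, illustrative value):
for virtual_distance in (0.4, 1.5, 10.0):
    print(virtual_distance, "m ->", round(mismatch_diopters(1.5, virtual_distance), 2), "D")
# 0.4 m -> 1.83 D (strong conflict), 1.5 m -> 0.0 D, 10 m -> 0.57 D
```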
Another problem is latency, a lag between the motion signals that the brain receives from the semicircular canals of the inner ear (the vestibular system) and the visual motion signals that it receives from the eyes. When there is a lag in visual image processing, the body registers motion through the vestibular sense in real time but through vision only after the lag.
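The perceptual cost of latency can be approximated very simply: during a head turn, the displayed scene lags behind the head’s true orientation by the head’s angular speed multiplied by the motion-to-photon delay, as in the sketch below with illustrative numbers.

```python
def angular_error_deg(head_speed_deg_per_s, latency_ms):
    """How far the displayed image lags behind the head's true orientation:
    the scene is drawn where the head *was* one latency interval ago."""
    return head_speed_deg_per_s * (latency_ms / 1000.0)

# A moderate 100 deg/s head turn with 20 ms of motion-to-photon latency
# leaves the image 2 degrees behind; at 100 ms the error grows to 10 degrees.
print(angular_error_deg(100, 20))    # 2.0
print(angular_error_deg(100, 100))   # 10.0
```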
Eye focus conflict and virtual image latency can cause eye strain, disorientation, nausea, and even longer-term health problems. These symptoms are collectively called “Simulation Adaptation Syndrome,” or SAS. Females tend to report greater SAS than males.
People can adapt to virtual reality to some extent, and SAS is generally less severe when people are exposed to immersive virtual reality gradually through a series of sessions. The sessions start at only a couple of minutes in length and gradually increase in duration, with real-world intermissions between them.
With current technology it is difficult to avoid these problems. However, they may eventually be greatly reduced by evolving technologies such as external imaging systems with variable-distance imaging (for example, domes with multiple layers of translucent screens), holographic imaging (three-dimensional images projected in mid-air), and direct internal imaging (projecting images directly onto the retinas, or direct neural-coded transmission from a computer to the optic nerve or neural centers in the brain).
