SAP '16: Proceedings of the ACM Symposium on Applied Perception

FrankenFolk: distinctiveness and attractiveness of voice and motion

User, metric, and computational evaluation of foveated rendering methods

Perceptually lossless foveated rendering methods exploit human perception by selectively rendering at different quality levels based on eye gaze, at a lower computational cost, while still maintaining the user's perception of a full-quality render. We consider three foveated rendering methods and propose practical rules of thumb for each method to achieve significant performance gains in real-time rendering frameworks. Additionally, we contribute a new metric for perceptual foveated rendering quality building on HDR-VDP2 that, unlike traditional metrics, accounts for the loss of fidelity in peripheral vision by lowering the contrast sensitivity of the model with visual eccentricity based on the Cortical Magnification Factor (CMF). The new metric is parameterized on user-test data generated in this study. Finally, we run our metric on a novel foveated rendering method for real-time immersive 360° content with motion parallax.
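
As an illustration of the eccentricity-dependent attenuation of contrast sensitivity, a minimal sketch follows; the CMF form M(e) = M0 / (1 + e/e2) and the constant e2 are standard textbook assumptions, not the parameterization fitted to the user-test data in this paper.

    import numpy as np

    def cortical_magnification(ecc_deg, M0=1.0, e2=2.5):
        """Common CMF approximation M(e) = M0 / (1 + e/e2); e2 is an assumed constant."""
        return M0 / (1.0 + ecc_deg / e2)

    def attenuate_sensitivity(csf, ecc_deg):
        """Scale a foveal contrast sensitivity value by the CMF at a given eccentricity.

        csf     : foveal contrast sensitivity (e.g., from the metric's neural CSF)
        ecc_deg : visual eccentricity of the pixel, in degrees
        """
        return csf * cortical_magnification(ecc_deg) / cortical_magnification(0.0)

    # Example: sensitivity drops to ~1/5 of its foveal value at 10 degrees eccentricity
    print(attenuate_sensitivity(100.0, 10.0))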

Is the motion of a child perceivably different from the motion of an adult?

Binocular eye tracking calibration during a virtual ball catching task using head mounted display

When tracking the eye movements of an active observer, the quality of the tracking data is continuously affected by physical shifts of the eye-tracker on the observer's head. This is especially true for eye-trackers integrated within virtual-reality (VR) helmets, since these configurations modify the weight and inertia distribution well beyond that of the eye-tracker alone. Despite the continuous nature of this degradation, it is common practice for calibration procedures to establish eye-to-screen mappings that are fixed over the time-course of an experiment. Even with periodic recalibration, data quality can quickly suffer due to head motion. Here, we present a novel post-hoc calibration method that allows for continuous temporal interpolation between discrete calibration events. Analysis focuses on the comparison of fixed vs. continuous calibration schemes and their effects on the quality of binocular gaze data mapped to virtual targets, especially with respect to depth. The calibration results were applied to binocular eye tracking data from a VR ball catching task and improved tracking accuracy, especially in the dynamic case.
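
The continuous calibration idea can be sketched as temporal interpolation of per-event calibration parameters; the affine pupil-to-gaze mapping below is an assumed form for illustration, not the authors' exact model.

    import numpy as np

    def interpolate_calibration(t, events):
        """Linearly interpolate calibration parameters between discrete calibration events.

        t      : timestamp of the gaze sample to calibrate
        events : list of (timestamp, params) pairs, params being e.g. a 2x3 affine
                 matrix mapping raw pupil coordinates to gaze coordinates
        """
        times = np.array([e[0] for e in events])
        if t <= times[0]:
            return events[0][1]
        if t >= times[-1]:
            return events[-1][1]
        i = np.searchsorted(times, t) - 1
        (t0, p0), (t1, p1) = events[i], events[i + 1]
        w = (t - t0) / (t1 - t0)
        return (1.0 - w) * p0 + w * p1

    def apply_calibration(pupil_xy, params):
        """Map a raw pupil position through a 2x3 affine calibration matrix."""
        return params @ np.append(pupil_xy, 1.0)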

Perceptual constancy of mechanical properties of cloth under variation of external forces

Deformable objects such as cloth exhibit their mechanical properties (e.g., stiffness) through shape deformation over time under external forces. Mechanical properties are important because they tell us the affordance of an object and help us predict what type of action can be performed upon it. Previous research shows that motion statistics can be used to develop computer vision algorithms that estimate mechanical properties of cloth under an unknown wind force. It is unclear, however, what motion cues humans use to estimate mechanical properties. Estimating mechanical properties is difficult because both the intrinsic properties of the fabric and the external force contribute to the apparent motion of the fabric, so in order to achieve invariant material perception, the visual system needs to discount the effects of external force. In this paper, we investigate whether humans have an invariant representation of the mechanical properties of fabrics under varying external forces in dynamic scenes, and we then study what visual cues allow humans to achieve this perceptual constancy. The stimuli are animated videos containing a hanging fabric moving under oscillating wind. We vary both intrinsic mechanical properties, such as the mass and stiffness of the cloth, and the strength of the wind force. We discuss our results in the context of optical flow statistics. This advances the current understanding of the role of motion in the perception of material properties in dynamic scenes.

Perception of lighting and shading for animated virtual characters

The design of lighting in computer graphics is directly derived from cinematography, and many digital artists follow the conventional wisdom on how lighting is set up to convey drama, appeal, or emotion. In this paper, we are interested in investigating the most commonly used lighting techniques to more formally determine their effect on our perception of animated virtual characters. First, we commissioned a professional animator to create a sequence of dramatic emotional sentences for a typical CG cartoon character. Then, we rendered that character using a range of lighting directions, intensities, and shading techniques. Participants in our experiment rated the emotion, the intensity of the performance, and the appeal of the character. Our results provide new insights into how animated virtual characters are perceived when viewed under different lighting conditions.

Predicting destination using head orientation and gaze direction during locomotion in VR

This paper reports preliminary investigations into the extent to which future directional intention might be reliably inferred from head pose and eye gaze during locomotion. Such findings could help inform the more effective implementation of realistic detailed animation for dynamic virtual agents in interactive first-person crowd simulations in VR, as well as the design of more efficient predictive controllers for redirected walking. In three different studies, with a total of 19 participants, we placed people at the base of a T-shaped virtual hallway environment and collected head position, head orientation, and gaze direction data as they set out to perform a hidden target search task across two rooms situated at right angles to the end of the hallway. Subjects wore an NVIS nVisor ST50 HMD equipped with an Arrington Research ViewPoint eye tracker; positional data were tracked using a 12-camera Vicon MX40 motion capture system. The hidden target search task was used to blind participants to the actual focus of our study, which was to gain insight into how effectively head position, head orientation, and gaze direction data might predict people's eventual choice of which room to search first. Our results suggest that eye gaze data do have the potential to provide additional predictive value over the use of 6-DOF head-tracked data alone, despite the relatively limited field of view of the display we used.

Do I trust you, abstract creature?: a study on personality perception of abstract virtual faces

Studies in the field of social psychology have shown evidence that the dimensions of human facial features can directly impact the perception of that human's personality. Traits such as aggressiveness, trustworthiness, and dominance have been directly correlated with facial features. If the same correlations were true for virtual faces, this could be a valuable design guideline to direct the creation of characters with intended personalities. In particular, this is relevant for extremely abstract characters that have minimal facial features (often seen in video games and movies) and rely heavily on these features for portraying personality. We conducted an exploratory study in order to gain insights into how certain facial features affect the perceived personality and affinity of very abstract virtual faces. We specifically tested the effect of different head shapes, eye shapes, and eye sizes. Interestingly, our findings show that the same rules for real human faces do not apply to the perception of abstract faces, and in some cases are the complete reverse. These results provide us with a better understanding of the perception of abstract virtual faces, and a starting point for the creation of guidelines for how to portray personality using minimal facial cues.

Evaluating human gaze patterns during grasping tasks: robot versus human hand

Perception and gaze are an integral part of determining where and how to grasp an object. In this study we analyze how gaze patterns differ when participants are asked to manipulate a robotic hand to perform a grasping task compared with using their own hand. We have three findings. First, while gaze patterns for the object are similar in both conditions, participants spent substantially more time gazing at the robotic hand than at their own, particularly at the wrist and finger positions. Second, we provide evidence that for complex objects (e.g., a toy airplane) participants essentially treated the object as a collection of sub-objects. Third, a follow-up study shows that camera angles clearly displaying the features participants spend time gazing at are more effective for determining the effectiveness of a grasp from images. Our findings are relevant both for automated algorithms (where visual cues are important for analyzing objects for potential grasps) and for designing tele-operation interfaces (how best to present the visual data to the remote operator).

The effects of artificially reduced field of view and peripheral frame stimulation on distance judgments in HMDs

Numerous studies have reported underestimated egocentric distances in virtual environments through head-mounted displays (HMDs). However, it has been found that distance judgments made through Oculus Rift HMDs are much less compressed, and their relatively high device field of view (FOV) may play an important role. Some studies showed that applying constant white light in viewers' peripheral vision improved their distance judgments through HMDs. In this study, we examine the effects of the device FOV and the peripheral vision by performing a blind walking experiment through an Oculus Rift DK2 HMD with three different conditions. For the BlackFrame condition, we rendered a rectangular black frame to reduce the device field of view of the DK2 HMD to match an NVIS nVisor ST60 HMD. In the WhiteFrame and GreyFrame conditions, we changed the frame color to solid white and middle grey. From the results, we found that the distance judgments made through the black frame were significantly underestimated relative to the WhiteFrame condition. However, no significant differences were observed between the WhiteFrame and GreyFrame conditions. This result provides evidence that the device FOV and peripheral light could influence distance judgments in HMDs, and the degree of influence might not change proportionally with respect to the peripheral light brightness.

Action coordination with agents: crossing roads with a computer-generated character in a virtual environment

We investigated how people jointly coordinate their decisions and actions with a computer-generated character (agent) in a large-screen virtual environment. The task for participants was to physically cross a steady stream of traffic on a virtual road without getting hit by a car. Participants performed this task with another person or with a computer-generated character (Fig. 1). The character was programmed to be either safe (taking only large gaps) or risky (also taking relatively small gaps). We found that participants behaved in many respects similarly with real and virtual partners. They maintained similar distances between themselves and their partner, they often crossed the same gap with their partner, and they synchronized their crossing with their partner. We also found that the riskiness of the character influenced the gap choices of participants. This study demonstrates the potential for using large-screen virtual environments to study how people interact with CG characters when performing whole-body joint actions.

Learning a human-perceived softness measure of virtual 3D objects

We introduce the problem of computing a human-perceived softness measure for virtual 3D objects. As the virtual objects do not exist in the real world, we do not directly consider their physical properties but instead compute the human-perceived softness of the geometric shapes. We collect crowdsourced data where humans rank their perception of the softness of vertex pairs on virtual 3D models. We then compute shape descriptors and use a learning-to-rank approach to learn a softness measure mapping any vertex to a softness value. Finally, we demonstrate our framework with a variety of 3D shapes.
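
One way to realize the learning-to-rank step is a RankSVM-style formulation over descriptor differences; this is a sketch with assumed input formats, not the authors' exact pipeline.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_softness_ranker(desc_a, desc_b, prefer_a):
        """RankSVM-style ranker: learn w such that w . (d_a - d_b) > 0 when a is ranked softer.

        desc_a, desc_b : (n_pairs, n_features) shape descriptors of the two vertices in each pair
        prefer_a       : boolean array, True when crowd workers ranked vertex a as softer
        """
        diffs = desc_a - desc_b
        labels = np.where(prefer_a, 1, -1)
        # Symmetrize so the decision boundary passes through the origin
        X = np.vstack([diffs, -diffs])
        y = np.concatenate([labels, -labels])
        clf = LinearSVC(fit_intercept=False, C=1.0)
        clf.fit(X, y)
        return clf.coef_.ravel()

    def softness_score(descriptor, w):
        """Per-vertex softness value: projection of its shape descriptor onto the learned weights."""
        return float(descriptor @ w)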

Need a hand?: how appearance affects the virtual hand illusion

How does the appearance of a virtual hand affect own-body perception? Previous studies have compared either two or three hand models at a time, with their appearances limited to realistic hands and abstract or simple objects. To investigate the effects of different degrees of realism, render styles, and sensitivities to pain on the virtual hand illusion (VHI), we conduct two studies in which participants take on controllable hand models with six distinct appearances. We collect questionnaire data and comments regarding responses to impacts and threats to assess differences in the strength of the VHI.

Our findings indicate that an illusion can be created for any model for some participants, but that in direct comparison the effect is perceived as weakest for a non-anthropomorphic block model and strongest for a realistic human hand model. We furthermore find that the responses to our experiments vary considerably between participants.

Psychoacoustic characterization of propagation effects in virtual environments

Animated versus static views of steady flow patterns

Two experiments were conducted to test the hypothesis that animated representations of vector fields are more effective than common static representations, even for steady flow. We compared four flow visualization methods: animated streamlets, animated orthogonal line segments (where short lines were elongated orthogonal to the flow direction but animated in the direction of flow), static equally spaced streamlines, and static arrow grids. The first experiment involved a pattern detection task in which the participant searched for an anomalous flow pattern in a field of similar patterns. The results showed that both animation methods produced more accurate and faster responses. The second experiment involved mentally tracing an advection path from a central dot in the flow field and marking where the path would cross the boundary of a surrounding circle. For this task the animated streamlets resulted in better performance than the other methods, but the animated orthogonal line segments resulted in the worst performance. We conclude with recommendations for the representation of steady flow patterns.

An empirical evaluation of visuo-haptic feedback on physical reaching behaviors during 3D interaction in real and immersive virtual environments

Enhancing stress management techniques using virtual reality

Chronic stress is one of the major problems in our current fast-paced society. The body reacts to environmental stress with physiological changes (e.g., accelerated heart rate), increasing the activity of the sympathetic nervous system. Normally the parasympathetic nervous system should bring us back to a more balanced state after the stressful event is over. However, nowadays we are often under constant pressure, with a multitude of stressful events per day, which can leave us constantly out of balance. This highlights the importance of effective stress management techniques that are readily accessible to a wide audience. In this paper we present an exploratory study investigating the potential use of immersive virtual reality for relaxation, with the purpose of guiding further design decisions, especially regarding the visual content and the interactivity of virtual content. Specifically, we developed an underwater world for head-mounted display virtual reality. We performed an experiment to evaluate the effectiveness of the underwater world for relaxation, and to evaluate whether the underwater world in combination with breathing techniques for relaxation was preferred to standard breathing techniques for stress management. The underwater world was rated as more fun and more likely to be used at home than a traditional breathing technique, while providing a similar degree of relaxation.

Decoupling light reflex from pupillary dilation to measure emotional arousal in videos

Predicting the exciting portions of a video is a widely relevant problem because of applications such as video summarization, searching for similar videos, and recommending videos to users. Researchers have proposed the use of physiological indices such as pupillary dilation as a measure of emotional arousal. The key problem with using the pupil to measure emotional arousal is accounting for pupillary response to brightness changes. We propose a linear model of pupillary light reflex to predict the pupil diameter of a viewer based only on incident light intensity. The residual between the measured pupillary diameter and the model prediction is attributed to the emotional arousal corresponding to that scene. We evaluate the effectiveness of this method of factoring out pupillary light reflex for the particular application of video summarization. The residual is converted into an exciting-ness score for each frame of a video. We show results on a variety of videos, and compare against ground truth as reported by three independent coders.
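
The light-reflex model and residual described above can be sketched as a least-squares fit of pupil diameter against incident light intensity; variable names are illustrative.

    import numpy as np

    def fit_light_reflex(luminance, pupil_diameter):
        """Fit the assumed linear light-reflex model  d(t) = a * L(t) + b  by least squares."""
        A = np.column_stack([luminance, np.ones_like(luminance)])
        (a, b), *_ = np.linalg.lstsq(A, pupil_diameter, rcond=None)
        return a, b

    def arousal_residual(luminance, pupil_diameter):
        """Residual between measured pupil diameter and the light-reflex prediction,
        used as a per-frame proxy for emotional arousal."""
        a, b = fit_light_reflex(luminance, pupil_diameter)
        return pupil_diameter - (a * luminance + b)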

Emotion recognition in autism spectrum disorder: does stylization help?

We investigate the effect that stylized facial expressions have on the perception and categorization of emotions by participants with high-functioning Autism Spectrum Disorder (ASD) in contrast to two control samples: one with Attention-Deficit/Hyperactivity Disorder (ADHD), and one with neurotypically developed peers (NTD). Realtime Non-Photorealistic Rendering (NPR) techniques with different levels of abstraction are applied to stylize two animated virtual characters performing expressions for six basic emotions. Our results show that the accuracy rates of the ASD group were unaffected by the NPR styles and reached about the same performance as for the characters with realistic-looking appearance. This effect, however, was not seen in the ADHD and NTD groups.

Looking at faces: autonomous perspective invariant facial gaze analysis

Eye-tracking provides a mechanism for researchers to monitor where subjects deploy their visual attention. Eye-tracking has been used to gain insights into how humans scrutinize faces; however, the majority of these studies were conducted using desktop-mounted eye-trackers where the subject sits and views a screen during the experiment, and the stimuli are typically photographs or videos of human faces. In this paper we present a novel approach using head-mounted eye-trackers that allows for automatic generation of gaze statistics for tasks performed in real-world environments. We use a trained hierarchy of Haar cascade classifiers to automatically detect and segment faces in the eye-tracker's scene camera video. We can then determine whether fixations fall within the bounds of the face or other possible regions of interest and report relevant gaze statistics. Our method is easily adaptable to any feature-trained cascade to allow for rapid object detection and tracking. We compare our results with previous research on the perception of faces in social environments. We also explore correlations between gaze and confidence levels measured during a mock interview experiment.
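
The detect-then-test step can be sketched with OpenCV; the pretrained frontal-face cascade below stands in for the paper's trained cascade hierarchy.

    import cv2

    # Pretrained frontal-face cascade shipped with OpenCV (a stand-in, not the paper's cascades)
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def fixation_on_face(scene_frame_bgr, fixation_xy):
        """Return True if a fixation (x, y) in scene-camera coordinates lands inside a detected face."""
        gray = cv2.cvtColor(scene_frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        fx, fy = fixation_xy
        return any(x <= fx <= x + w and y <= fy <= y + h for (x, y, w, h) in faces)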

Revisiting detection thresholds for redirected walking: combining translation and curvature gains

Redirected walking enables the exploration of large virtual environments while requiring only a finite amount of physical space. Unfortunately, in living-room-sized tracked areas the effectiveness of common redirection algorithms such as Steer-to-Center is very limited. A potential solution is to increase redirection effectiveness by applying two types of perceptual manipulations (curvature and translation gains) simultaneously. This paper investigates how such a combination may affect detection thresholds for curvature gain. To this end we analyze the estimation methodology and discuss the selection process for a suitable estimation method. We then compare curvature detection thresholds obtained under different levels of translation gain using two different estimation methods: the method of constant stimuli and Green's maximum likelihood procedure. The data from both experiments show no evidence that curvature gain detection thresholds were affected by the presence of translation gain (with test levels spanning the previously estimated interval of undetectable translation gain levels). This suggests that in practice currently used levels of translation and curvature gains can be safely applied simultaneously. Furthermore, we present some evidence that curvature detection thresholds may be lower than previously reported. Our estimates indicate that users can be redirected on a circular arc with a radius of either 11.6 m or 6.4 m depending on the estimation method, vs. the previously reported value of 22 m. These results highlight that detection threshold estimates vary significantly with the estimation method and suggest the need for further studies to define an efficient and reliable estimation methodology.
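
For intuition, the reported radii can be converted to injected rotation per meter walked using standard circular-arc geometry (a relation assumed here, not quoted from the paper): walking along an arc of radius r injects 1/r radians of rotation per meter, so

    \[
    \frac{\theta}{s} = \frac{1}{r}: \quad
    r = 22\,\mathrm{m} \Rightarrow \approx 2.6^{\circ}/\mathrm{m}, \quad
    r = 11.6\,\mathrm{m} \Rightarrow \approx 4.9^{\circ}/\mathrm{m}, \quad
    r = 6.4\,\mathrm{m} \Rightarrow \approx 9.0^{\circ}/\mathrm{m}.
    \]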

Seeing jelly: judging elasticity of a transparent object

Taking advantage of computer graphics technologies, recent psychophysical studies on material perception have revealed how human vision estimates the mechanical properties of objects, such as liquid viscosity, from image features. Here we consider how humans perceive another important mechanical material property: elasticity. We simulated scenes in which a transparent cube falls onto the floor, while manipulating the elasticity of the cube, and asked observers to rate the elasticity using a 5-point scale. Human observers were quite sensitive to the change in the simulated elasticity of the cube. In comparison with the original condition, the elasticity was overestimated when only the cube contour deformation was visible, whereas it was underestimated when the cube contour deformation was hidden and only internal optical deformation was visible. The effects of contour and optical deformations on elasticity ratings were almost the same when the observers viewed white noise fields that reproduced the optical flow fields of the cube movies. Increasing the frame duration (which decreased image speed) also increased the apparent elasticity. These results suggest that human elasticity judgment is based on the pattern of image motion arising from contour and optical deformations. This scientific finding may provide a hint for computationally efficient rendering of perceptually realistic dynamic scenes.

Analyzing gaze synchrony in cinema: a pilot study

Recent advances in personalized displays now allow for the delivery of high-fidelity content only to the most sensitive regions of the visual field, a process referred to as foveation [Guenter et al. 2012]. Because foveated systems require accurate knowledge of gaze location, attentional synchrony is particularly relevant: this is observed when multiple viewers attend to the same image region concurrently.

Automatic scanpath generation with deep recurrent neural networks

Many computer vision algorithms are biologically inspired and designed based on the human visual system. Convolutional neural networks (CNNs) are similarly inspired by the primary visual cortex in the human brain. However, a key difference between current visual models and the human visual system is how the visual information is gathered and processed. We make eye movements to collect information from the environment for navigation and task performance, and we make specific eye movements to important regions in the stimulus to perform the task at hand quickly and efficiently. Researchers have used expert scanpaths to train novices to improve the accuracy of visual search tasks. One of the limitations of such a system is that an expert needs to examine each visual stimulus beforehand to generate the scanpaths. In order to extend the idea of gaze guidance to new, unseen stimuli, there is a need for a computational model that can automatically generate expert-like scanpaths. We propose a model for automatic scanpath generation using a convolutional neural network (CNN) and long short-term memory (LSTM) modules. Our model uses LSTMs due to the temporal nature of eye movement data (scanpaths), where the system makes fixation predictions based on previously examined locations.
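
An illustrative CNN + LSTM architecture for predicting the next fixation from an image and the previous fixations is sketched below in PyTorch; the layer sizes and structure are assumptions, not the authors' exact model.

    import torch
    import torch.nn as nn

    class ScanpathNet(nn.Module):
        """Toy scanpath model: a small CNN encodes the stimulus image, an LSTM consumes
        previous fixation coordinates together with the image code, and a linear head
        predicts the next (x, y) fixation normalized to [0, 1]."""

        def __init__(self, feat_dim=128, hidden_dim=256):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim), nn.ReLU())
            self.lstm = nn.LSTM(input_size=feat_dim + 2, hidden_size=hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 2)

        def forward(self, image, past_fixations):
            # image: (B, 3, H, W); past_fixations: (B, T, 2)
            code = self.cnn(image)                                   # (B, feat_dim)
            code = code.unsqueeze(1).expand(-1, past_fixations.size(1), -1)
            out, _ = self.lstm(torch.cat([code, past_fixations], dim=-1))
            return torch.sigmoid(self.head(out))                     # (B, T, 2) predicted next fixations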

Binocular tone reproduction display for an HDR panorama image

In virtual reality (VR) applications, it is often necessary to display surrounding scenery with a high dynamic range (HDR) of luminance on a binocular display device such as an Oculus Rift. The surrounding scenery of an outdoor environment usually includes both bright regions, such as a sunny area and a clear sky, and dark regions, such as shadowed areas, and the dynamic range of luminance between the bright and dark regions is quite large.

Color appearance modeling in augmented reality

Augmented Reality (AR) technology enables humans to mix synthetic objects and colors with real scenes. Applications that can benefit from AR include education, entertainment, design, and medical applications (Van Krevelen and Poelman, 2010). Over the years, most of the efforts in the AR area have been focused on fundamental problems in designing an efficient AR system. Although much progress has been made on the hardware and software sides, researchers are still tackling problems in registration, occlusion, tracking, focus, etc.

Embodied visuo-locomotive experience analysis: immersive reality based summarisation of experiments in environment-behaviour studies

Evidence-based design (EBD) for architecture involves the study of the post-occupancy behaviour of building users with the aim of providing an empirical basis for improving building performance [Hamilton and Watkins 2009]. Within EBD, the high-level, qualitative analysis of the embodied visuo-locomotive experience of representative groups of building users (e.g., children, senior citizens, individuals facing physical challenges) constitutes a foundational approach for understanding the impact of architectural design decisions and functional building performance from the viewpoint of areas such as environmental psychology, wayfinding research, human visual perception studies, spatial cognition, and the built environment [Bhatt and Schultz 2016].

Exploring users' perceived activities in a sketch-based intelligent tutoring system through eye movement data

Intelligent tutoring systems (ITS) empower instructors to make teaching more engaging by providing a platform to tutor, deliver learning material, and assess students' progress. Despite these advantages, existing ITS do not automatically assess how students engage in problem solving, how they perceive various activities, or how much time they spend on each activity leading to the solution. In this research, we present an eye tracking framework that, based on eye movement data, can assess students' perceived activities and overall engagement in a sketch-based intelligent tutoring system, "Mechanix" [Valentine et al. 2012]. Based on an evaluation involving 21 participants, we present the key eye movement features and demonstrate the potential of leveraging eye movement data to recognize students' perceived activities (reading, gazing at an image, and problem solving) with an accuracy of 97.12%.

How experts' mental model affects 3D image segmentation

3D image segmentation is a fundamental process in many scientific and medical applications. Automatic algorithms do exist, but there are many use cases where these algorithms fail. The gold standard is still manual segmentation or review. Unfortunately, existing 3D segmentation tools do not currently take into account human mental models, low-level perception actions, and higher-level cognitive tasks. Our goal is to improve the quality and efficiency of manual segmentation by analyzing the process in terms of human mental models and low-level perceptual tasks. Preliminary results from our in-depth field studies suggest that compared to novices, experts have a stronger mental model of the 3D structures they segment. To validate this assumption, we introduce a novel test instrument to explore experts' mental model in the context of 3D image segmentation. We use this test instrument to measure individual differences in various spatial segmentation and visualization tasks. The tasks involve identifying valid 2D contours, slicing planes and 3D shapes.

Learning movements from a virtual instructor

We examined the effects of perspective (first person versus third person) and immersion (immersive versus nonimmersive) on motor learning in order to assess the format of action representations. Participants viewed the instructor from either a first or a third person perspective. During immersive conditions, they wore a 6 DoF-tracked head-mounted display, as in Figure 1(a). For nonimmersive conditions, they viewed a computer monitor, as in Figure 1(b). We also evaluated whether these effects were modulated by experience.

Experienced dancers and novices practiced dances by imitating a virtual instructor and then subsequently had to perform the dances from memory without an instructor present, following a delay. Accuracy for both practice and test trials was video coded.

In line with theoretical models of motor learning, mean accuracy increased with successive trials in accordance with the power law of practice. First person perspective formats led to better accuracy, but immersive formats did not, as shown in Figure 2. Experienced dancers were more accurate than novices, but format did not interact with experience.

These results suggest that during learning, individuals across experience levels represent complex actions in first person perspective, and that virtual instruction does not require immersion to be effective.

Leveraging gaze data for segmentation and effects on comics

In this work, we present a semi-automatic method based on gaze data to identify the objects in comic images on which digital effects will look best. Our key contribution is a robust technique to cluster the noisy gaze data without having to specify the number of clusters as input. We also present an approach to segment the identified object of interest.
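
Mean-shift is one standard clustering method that does not require the number of clusters as input; the sketch below uses it purely to illustrate the problem setup, not as the paper's own clustering technique.

    import numpy as np
    from sklearn.cluster import MeanShift, estimate_bandwidth

    def cluster_gaze_points(gaze_xy, quantile=0.1):
        """Cluster noisy 2D gaze samples without specifying the number of clusters.

        gaze_xy : (n_samples, 2) array of gaze positions in image coordinates
        Returns cluster labels and cluster centers (candidate objects of interest).
        """
        bandwidth = estimate_bandwidth(gaze_xy, quantile=quantile)
        ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
        labels = ms.fit_predict(gaze_xy)
        return labels, ms.cluster_centers_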

Measuring viewers' heart rate response to environment conservation videos

Digital media, particularly pictures and videos, have long been used to influence a person's cognition as well as her consequent actions. Previous work has shown that physiological indices such as heart rate variability can be used to measure emotional arousal. We measure heart rate variability as participants watch environment conservation videos. We compare the heart rate response against the pleasantness rating recorded during an independent Internet survey.
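
A common time-domain heart rate variability index is RMSSD over inter-beat (RR) intervals; the sketch below shows that computation as an example only and is not claimed to be the exact measure used in this work.

    import numpy as np

    def rmssd(rr_intervals_ms):
        """Root mean square of successive differences of RR intervals (in milliseconds),
        a common time-domain heart rate variability index."""
        diffs = np.diff(np.asarray(rr_intervals_ms, dtype=float))
        return float(np.sqrt(np.mean(diffs ** 2)))

    # Example: RR intervals around 800 ms with mild variability
    print(rmssd([812, 790, 805, 821, 799, 810]))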

Perception of drowsiness based on correlation with facial image features

This paper presents a video-based method for detecting drowsiness. Generally, human beings can perceive fatigue and drowsiness in others by looking at their faces, and this ability has been studied in many ways. A drowsiness detection method based on facial videos has been proposed [Nakamura et al. 2014]. In that method, a set of facial features computed with computer vision techniques is classified into drowsiness degrees using the k-nearest neighbor algorithm. However, facial features that are ineffective for reproducing human perception with the machine learning method are not removed, which can decrease the detection accuracy.
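
The baseline pipeline (facial features classified by k-nearest neighbors) with a feature-selection filter in front can be sketched as follows; the univariate filter is a generic assumption for illustration, not the improvement proposed in this work.

    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    def build_drowsiness_classifier(n_features=10, n_neighbors=5):
        """k-NN drowsiness classifier with a univariate feature-selection filter in front,
        dropping facial features that carry little information about the drowsiness label."""
        return make_pipeline(
            SelectKBest(score_func=f_classif, k=n_features),
            KNeighborsClassifier(n_neighbors=n_neighbors))

    # Usage (hypothetical data): clf = build_drowsiness_classifier(); clf.fit(X_train, y_train)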

Saliency and optical flow for gaze guidance in videos

Computer-based gaze guidance techniques have important applications in computer graphics, data visualization, image analysis, and training. Bailey et al. [2009] showed that it is possible to influence exactly where attention is allocated using a technique called Subtle Gaze Direction (SGD). The SGD approach combines eye tracking with brief image-space modulations in the peripheral regions of the field of view to guide viewer gaze about a scene. A fast eye-tracker is used to monitor gaze in real-time and the modulations are terminated before they can be scrutinized by the viewer's high acuity foveal vision. The SGD technique has been shown to improve spatial learning, visual search task performance, and problem solving in static digital imagery [Sridharan et al. 2012]. However, guiding attention in videos is challenging due to competing motion cues in the visual stimuli. We propose a novel method that uses scene saliency (spatial information) and optical flow (temporal information) to enable gaze guidance in dynamic scenes. The results of a user study show that the accuracy of responses to questions related to target regions in videos was higher among subjects who were gaze guided with our approach compared to a control group that was not actively guided.
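
Combining a per-frame saliency map with optical-flow magnitude to pick a candidate modulation region might look like the sketch below; the spectral-residual saliency model requires opencv-contrib-python, and the weighting scheme is an assumption rather than the method evaluated in the paper.

    import cv2
    import numpy as np

    saliency_model = cv2.saliency.StaticSaliencySpectralResidual_create()  # needs opencv-contrib-python

    def modulation_candidate(prev_gray, curr_gray, curr_bgr, flow_weight=0.5):
        """Combine spatial saliency and optical-flow magnitude into one map and return
        the pixel location with the highest combined score (candidate modulation target)."""
        ok, sal = saliency_model.computeSaliency(curr_bgr)
        sal = sal.astype(np.float32)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)
        mag = mag / (mag.max() + 1e-6)
        combined = (1.0 - flow_weight) * sal + flow_weight * mag
        y, x = np.unravel_index(np.argmax(combined), combined.shape)
        return (int(x), int(y)), combined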

Scan path and movie trailers for implicit annotation of videos

Affective annotation of videos is important for video understanding, ranking, retrieval, and summarization. We present an approach that uses excerpts that appeared in official movie trailers as training data. Total scan path is computed as a metric for emotional arousal, based on previous eye tracking research. The arousal level of trailer excerpts is modeled as a Gaussian distribution, and the signed distance from the mean of this distribution is used to separate out exemplars of high and low emotional arousal in movies.
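
The scoring step described above can be sketched directly: compute the total scan path per excerpt, fit a Gaussian over the trailer excerpts, and use the signed distance from the mean (in standard deviations) to separate high- and low-arousal exemplars. Variable names are illustrative.

    import numpy as np

    def total_scan_path(fixations_xy):
        """Total scan path length: sum of Euclidean distances between consecutive fixations."""
        pts = np.asarray(fixations_xy, dtype=float)
        return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

    def arousal_z_scores(excerpt_scan_paths):
        """Model scan path length over trailer excerpts as a Gaussian and return the signed
        distance from the mean (in standard deviations) for each excerpt."""
        sp = np.asarray(excerpt_scan_paths, dtype=float)
        return (sp - sp.mean()) / sp.std()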

The perception of symmetry in the moving image: multi-level computational analysis of cinematographic scene structure and its visual reception

This research is driven by cognitive film studies focused on visuo-spatial perception, where the key emphasis is on the systematic study and generation of evidence that can characterise and establish correlates between principles for the synthesis of the moving image and its cognitive (e.g., embodied visuo-auditory, emotional) recipient effects on observers [Suchan and Bhatt 2016b; Suchan and Bhatt 2016a]. Within this context, we focus on the case of "symmetry" in the cinematographic structure of the moving image, and propose a multi-level model for interpreting symmetric patterns therefrom. This provides the foundation for integrating scene analysis with the analysis of its visuo-spatial perception based on eye-tracking data. This is achieved by integrating the computational semantic interpretation of the scene [Suchan and Bhatt 2016b], involving scene objects (people, objects in the scene) and cinematographic aids (camera movement, shot types, cuts, and scene structure), with perceptual artefacts (fixations, saccades, scan-paths, areas of attention).

User sensitivity to speed- and height-mismatch in VR

Facebook's purchase of Oculus VR in 2014 ushered in a new era of consumer virtual reality head-mounted displays (HMDs). Converging technological advancements in small, high-resolution displays and motion-detection devices propelled VR beyond the purview of high-tech research laboratories and into the mainstream. However, technological hurdles still remain. As more consumer-grade products develop, user comfort and experience will be of the utmost importance. One of the biggest issues for HMDs that lack external tracking is drift in the user position and rotation sensors. Drift can cause motion sickness and make stationary items in the virtual environment appear to shift in position. For developers who seek to design VR experiences that are rooted in real environments, drift can create large errors in positional tracking if left uncorrected over time. Although much of the current VR hardware makes use of external tracking devices to mitigate positional and rotational drift, the creation of head-mounted displays that can operate without external tracking devices would make VR hardware more portable and flexible, and may therefore be a goal for future development.

Until technology advances sufficiently to completely overcome the hardware problems that cause drift, software solutions are a viable option to correct for it. It may be possible to speed up and slow down users as they move through the virtual world in order to bring their tracked position back into alignment with their position in the real world. If speed changes can be implemented without users noticing the alteration, this may offer a seamless solution that does not interfere with the VR experience.

In Experiments 1 and 2, we artificially introduced speed changes that made users move through the VR environment either faster or slower than their actual real-world speed. Users were tasked with correctly identifying when they were moving at the correct true-to-life speed compared to an altered virtual movement speed. Fore-and-aft movement and side-to-side movement, initiated by seated users bending at the waist, were tested separately in the two experiments. In Experiment 3, we presented alternating views of the virtual scene from different user heights. In this study, users had to correctly distinguish the view of the virtual scene presented at their correct height from views presented at incorrect shorter and taller heights.

In Experiments 1 and 2, we found that on average speed increases and decreases up to approximately 25% went unnoticed by users, suggesting that there is flexibility for programs to add speed changes imperceptible to users to correct for drift. In contrast, Experiment 3 demonstrates that on average users were aware of height changes after virtual heights were altered by just 5 cm. These thresholds can be used by VR developers to compensate for tracking mismatches between real and virtual positions of users of virtual environments, and also by engineers to benchmark new virtual reality hardware against human perceptual abilities.
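
One way these thresholds could be applied is a per-frame translation gain that nudges the tracked position back toward the true position while staying inside the roughly 25% window that went unnoticed; the gain schedule below is an assumption, not a scheme tested in the study.

    import numpy as np

    def corrected_step(real_step, drift_error, correction_rate=0.5, max_gain_change=0.25):
        """Scale a per-frame translation to reduce accumulated drift without exceeding
        the ~25% speed change that users failed to notice.

        real_step   : user's real-world displacement this frame (3-vector)
        drift_error : virtual-minus-real positional error to be removed (3-vector)
        """
        real_step = np.asarray(real_step, dtype=float)
        drift_error = np.asarray(drift_error, dtype=float)
        step_len = np.linalg.norm(real_step)
        if step_len < 1e-6:
            return real_step
        # Project the error onto the walking direction and choose a gain that shrinks it
        direction = real_step / step_len
        along = float(np.dot(drift_error, direction))
        gain = 1.0 - correction_rate * along / step_len
        gain = float(np.clip(gain, 1.0 - max_gain_change, 1.0 + max_gain_change))
        return gain * real_step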