SAP '19: ACM Symposium on Applied Perception 2019

SESSION: Paper Session 1: Avatars and Rendering Humans

Empirical Evaluation of the Interplay of Emotion and Visual Attention in Human-Virtual Human Interaction

We examined the effect of rendering style and the interplay between attention and emotion in users during interaction with a virtual patient in a medical training simulator. The virtual patient was rendered in one of three styles sampled from the photorealistic to non-photorealistic continuum: Near-Realistic, Cartoon, or Pencil-Shader. In a mixed design study, we collected 45 participants’ emotional responses and gaze behavior using surveys and an eye tracker while they interacted with a virtual patient who was medically deteriorating over time. We used a cross-lagged panel analysis of attention and emotion to understand their reciprocal relationship over time. We also performed a mediation analysis to compare the extent to which the virtual agent’s appearance and his affective behavior impacted users’ emotional and attentional responses. Results showed an interplay between participants’ visual attention and emotion over time, and also showed that attention was a stronger variable than emotion during the interaction with the virtual human.
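
A cross-lagged panel analysis of the kind described above can be pictured as a pair of lagged regressions whose cross-paths are compared. The sketch below is a minimal illustration under assumed variable names and simulated data (standardized attention and emotion scores at two time points); it is not the authors' analysis code.

    # Minimal cross-lagged panel sketch (hypothetical data and variable names):
    # regress each construct at time 2 on both constructs at time 1 and compare
    # the cross-lagged paths (attention_t1 -> emotion_t2 vs. emotion_t1 -> attention_t2).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 45                                   # one score pair per participant and time point
    attention_t1 = rng.standard_normal(n)
    emotion_t1 = 0.3 * attention_t1 + rng.standard_normal(n)
    attention_t2 = 0.5 * attention_t1 + 0.1 * emotion_t1 + rng.standard_normal(n)
    emotion_t2 = 0.4 * emotion_t1 + 0.3 * attention_t1 + rng.standard_normal(n)

    def ols(y, *predictors):
        """Ordinary least squares with an intercept; returns the coefficient vector."""
        X = np.column_stack([np.ones_like(y), *predictors])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta

    # beta[1] is the autoregressive path, beta[2] the cross-lagged path.
    print("attention_t2 ~ attention_t1 + emotion_t1:", ols(attention_t2, attention_t1, emotion_t1))
    print("emotion_t2   ~ emotion_t1   + attention_t1:", ols(emotion_t2, emotion_t1, attention_t1))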

A psychophysical model to control the brightness and key-to-fill ratio in CG cartoon character lighting

Lighting is a commonly used tool to manipulate the appearance of virtual characters in a range of applications. However, there are few studies which systematically examine the effect of lighting changes on complex dynamic stimuli. Our study presents several perceptual experiments, designed to investigate the ability of participants to discriminate lighting levels and the ratio of light intensity projected on the two sides of a cartoon character’s face (key-to-fill ratio) in portrait lighting design. We used a standard psychophysical method for measuring discrimination, typical in low-level perceptual studies but not frequently considered for evaluating complex stimuli. We found that people can easily differentiate lighting intensities, and distinguish between shadow strength and scene brightness under bright conditions but not under dark conditions. We provide a model of the results, and empirically validate the predictions of the model. We discuss the practical implications of our results and how they can be exploited to make the process of portrait lighting for CG cartoon characters more consistent, such as a tool for manipulating shadow while maintaining the level of perceived brightness.
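
The kind of discrimination measurement described above is typically summarized by fitting a psychometric function and reading off a just-noticeable difference. The sketch below is a hypothetical illustration (a cumulative-Gaussian fit to invented key-to-fill discrimination data); it is not the authors' model.

    # Sketch: estimate a discrimination threshold (JND) for key-to-fill ratio from
    # hypothetical 2AFC data by fitting a cumulative Gaussian psychometric function.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    # Difference between comparison and reference key-to-fill ratio (hypothetical units)
    delta = np.array([-0.8, -0.4, -0.2, 0.0, 0.2, 0.4, 0.8])
    # Proportion of trials the comparison was judged higher (hypothetical data)
    p_judged = np.array([0.05, 0.20, 0.35, 0.50, 0.68, 0.82, 0.97])

    def psychometric(x, mu, sigma):
        return norm.cdf(x, loc=mu, scale=sigma)

    (mu, sigma), _ = curve_fit(psychometric, delta, p_judged, p0=(0.0, 0.3))
    jnd = sigma * norm.ppf(0.75)   # 75%-point relative to the PSE
    print(f"PSE = {mu:.3f}, JND (75%) = {jnd:.3f}")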

The Influence of the Viewpoint in a Self-Avatar on Body Part and Self-Localization

The goal of this study is to determine how a self-avatar in virtual reality, experienced from different viewpoints on the body (at eye- or chest-height), might influence body part localization, as well as self-localization within the body. Previous literature shows that people do not locate themselves in only one location, but rather primarily in the face and the upper torso. Therefore, we aimed to determine if manipulating the viewpoint to either the height of the eyes or to the height of the chest would influence self-location estimates towards these commonly identified locations of self. In a virtual reality (VR) headset, participants were asked to point at several of their body parts (body part localization) as well as “directly at you” (self-localization) with a virtual pointer. Both pointing tasks were performed before and after a self-avatar adaptation phase where participants explored a co-located, scaled, gender-matched, and animated self-avatar. We hypothesized that experiencing a self-avatar might reduce inaccuracies in body part localization, and that viewpoint would influence pointing responses for both body part and self-localization. Participants overall pointed relatively accurately to some of their body parts (shoulders, chin, and eyes), but very inaccurately to others, with large undershooting for the hips, knees, and feet, and large overshooting for the top of the head. Self-localization was spread across the body (as well as above the head) with the following distribution: the upper face (25%), the upper torso (25%), above the head (15%) and below the torso (12%). We only found an influence of viewpoint (eye- vs chest-height) during the self-avatar adaptation phase for body part localization and not for self-localization. The overall change in error distance for body part localization for the viewpoint at eye-height was small (M = –2.8 cm), while the overall change in error distance for the viewpoint at chest-height was significantly larger, and in the upwards direction relative to the body parts (M = 21.1 cm). In a post-questionnaire, there was no significant difference in embodiment scores between the viewpoint conditions. Most interestingly, having a self-avatar did not change the results on the self-localization pointing task, even with a novel viewpoint (chest-height). Possibly, body-based cues, or memory, ground the self when in VR. However, the present results caution the use of altered viewpoints in applications where veridical position sense of body parts is required.

Virtual Grasping Feedback and Virtual Hand Ownership

In this work, we investigate the influence of different visualizations on a manipulation task in virtual reality (VR). Without the haptic feedback of the real world, grasping in VR might result in intersections with virtual objects. As people are highly sensitive when it comes to perceiving collisions, it might look more appealing to avoid intersections and visualize non-colliding hand motions. However, correcting the position of the hand or fingers results in a visual-proprioceptive discrepancy and must be used with caution. Furthermore, the lack of haptic feedback in the virtual world might result in slower actions, as a user might not know exactly when a grasp has occurred. This reduced performance could be remedied with adequate visual feedback.

In this study, we analyze the performance, level of ownership, and user preference of eight different visual feedback techniques for virtual grasping. Three techniques show the tracked hand (with or without grasping feedback), even if it intersects with the grasped object. Another three techniques display a hand without intersections with the object, called the outer hand, simulating the look of a real-world interaction. One visualization is a compromise between the two groups, showing both a primary outer hand and a secondary tracked hand. Finally, in the last visualization the hand disappears during the grasping activity.

In an experiment, users performed a pick-and-place task with each feedback technique. We used high-fidelity marker-based hand tracking to control the virtual hands in real time. We found that the tracked hand visualizations result in better performance; however, the outer hand visualizations were preferred. We also found indications that ownership is higher with the outer hand visualizations.

The Influence of Visual Perspective on Body Size Estimation in Immersive Virtual Reality

The creation of realistic self-avatars that users identify with is important for many virtual reality applications. However, current approaches for creating biometrically plausible avatars that represent a particular individual require expertise and are time-consuming. We investigated the visual perception of an avatar’s body dimensions by asking males and females to estimate their own body weight and shape on a virtual body using a virtual reality avatar creation tool. In a method of adjustment task, the virtual body was presented in an HTC Vive head-mounted display either co-located with (first-person perspective) or facing (third-person perspective) the participants. Participants adjusted the body weight and dimensions of various body parts to match their own body shape and size. Both males and females underestimated their weight by 10-20% in the virtual body, but the estimates of the other body dimensions were relatively accurate and within a range of ± 6%. There was a stronger influence of visual perspective on the estimates for males, but this effect was dependent on the amount of control over the shape of the virtual body, indicating that the results might depend on where on the body the weight changes were expressed. These results suggest that this avatar creation tool could be used to allow participants to make a relatively accurate self-avatar in terms of adjusting body part dimensions, but not weight, and that the influence of visual perspective and amount of control needed over the body shape are likely gender-specific.

SESSION: Paper Session 2: Human Motions

EVA: Generating Emotional Behavior of Virtual Agents using Expressive Features of Gait and Gaze

We present a novel, real-time algorithm, EVA, for generating virtual agents with various perceived emotions. Our approach is based on using Expressive Features of gaze and gait to convey emotions corresponding to happy, sad, angry, or neutral. We precompute a data-driven mapping between gaits and their perceived emotions. EVA uses this gait emotion association at runtime to generate appropriate walking styles in terms of gaits and gaze. Using the EVA algorithm, we can simulate gaits and gazing behaviors of hundreds of virtual agents in real-time with known emotional characteristics. We have evaluated the benefits in different multi-agent VR simulation environments. Our studies suggest that the use of expressive features corresponding to gait and gaze can considerably increase the sense of presence in scenarios with multiple virtual agents.
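
The runtime gait-emotion lookup mentioned above can be pictured as a nearest-neighbour query against a precomputed table. The sketch below is a hypothetical illustration of that idea with invented gait names and affect coordinates; it is not the EVA implementation.

    # Sketch: pick the gait whose precomputed perceived-emotion coordinates are
    # closest to a requested target emotion (hypothetical 2D affect coordinates).
    import numpy as np

    # Precomputed mapping: gait clip name -> perceived emotion in a 2D affect space.
    gait_emotion = {
        "gait_slow_slumped":   np.array([-0.7, -0.5]),   # sad-like
        "gait_fast_expansive": np.array([ 0.8,  0.7]),   # happy-like
        "gait_fast_stomping":  np.array([-0.6,  0.8]),   # angry-like
        "gait_neutral":        np.array([ 0.0,  0.0]),
    }

    def select_gait(target_emotion):
        """Return the gait whose stored emotion coordinates are nearest to the target."""
        names = list(gait_emotion)
        dists = [np.linalg.norm(gait_emotion[n] - target_emotion) for n in names]
        return names[int(np.argmin(dists))]

    print(select_gait(np.array([0.9, 0.6])))   # -> "gait_fast_expansive"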

Perceptual Comparison of Procedural and Data-Driven Eye Motion Jitter

Research has shown that keyframed eye motions are perceived as more realistic when some noise is added to eyeball motions and to pupil size changes. We investigate whether this noise, rather than being motion captured, can be synthesized with standard techniques, e.g., procedural or data-driven approaches. In a two-alternative forced choice task, we compare eye animations created with four different techniques: motion captured, procedural, data-driven, and keyframed (lacking noise). Our perceptual experiment uses three character models with different levels of realism and two motions. Our results suggest that procedural and data-driven noise can be used to create animations with perceived naturalness similar to our motion-captured approach. Participants’ eye movements when viewing the animations show that animations without jitter yielded fewer fixations, suggesting that they were more easily dismissed as unnatural.
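
As one possible reading of procedural noise in this context, the sketch below adds low-amplitude, band-limited jitter to a keyframed gaze angle and pupil diameter. All amplitudes and frequencies are invented for illustration; they are not the paper's parameters.

    # Sketch: procedural jitter added to a keyframed eye animation — small band-limited
    # noise on eyeball rotation (degrees) and on pupil diameter (mm).
    import numpy as np

    def procedural_jitter(t, amplitude, freqs, rng):
        """Sum of a few sinusoids with random phase, giving smooth low-amplitude noise."""
        phases = rng.uniform(0.0, 2.0 * np.pi, size=len(freqs))
        return amplitude * sum(np.sin(2.0 * np.pi * f * t + p)
                               for f, p in zip(freqs, phases)) / len(freqs)

    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 2.0, 120)                 # 2 s of animation at 60 fps
    gaze_yaw = np.full_like(t, 5.0)                # keyframed gaze direction (deg)
    pupil_mm = np.full_like(t, 3.5)                # keyframed pupil diameter (mm)

    gaze_yaw_jittered = gaze_yaw + procedural_jitter(t, amplitude=0.15, freqs=[3.0, 7.0, 11.0], rng=rng)
    pupil_mm_jittered = pupil_mm + procedural_jitter(t, amplitude=0.05, freqs=[0.5, 1.5], rng=rng)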

Do We Have to Look at the Mirror All the Time? Effect of Partial Visuomotor Feedback on Body Ownership of a Virtual Human Tail

Studies have shown that the sense of body ownership towards virtual humanoid avatars with additional body parts can be successfully elicited when synchronous visuomotor and/or visuotactile feedback is given. In an interactive virtual reality (VR) application, however, it is difficult for users to uninterruptedly observe the added body parts, especially when they are attached to the backs of the avatars. Thus, the embodiment of such body parts needs to be achieved with limited synchronous visuomotor feedback. Commonly, this is done by looking at a virtual mirror reflection of the avatar’s movement. However, the methodology for eliciting the sense of body ownership in such conditions remains to be studied.

In this paper, we investigate whether it is possible to elicit a sense of body ownership using an avatar with a tail attached to its coccyx, even when the synchronous visuomotor feedback from a mirror is partial (i.e., interrupted or reduced, not completely asynchronous). In the experiment, participants performed a task under three conditions regarding the provision of visuomotor synchrony: feedback given constantly, given only until halfway through the trial (reduction), and given intermittently (interruption). Results suggest that the interruption or reduction of the synchronous visuomotor feedback does not significantly disturb the elicitation of body ownership of the virtual tail; body ownership elicited in this fashion is not significantly weaker than when synchronous visuomotor feedback is given constantly. Our findings create opportunities for researchers and engineers to more freely design interactive VR applications involving the embodiment of virtual avatars with extra body parts attached.

SESSION: Paper Session 3: Virtual Space

Infinity Walk in VR: Effects of Cognitive Load on Velocity during Continuous Long-Distance Walking

Bipedal walking is generally considered the most natural and common form of human locomotion in the physical world, and the most presence-enhancing form of locomotion in virtual reality (VR). However, there are significant differences in the way people walk in VR compared to their walking behaviour in the real world. For instance, previous studies have shown a significant decrease in gait parameters, in particular velocity and step length, in the virtual environment (VE). However, those studies have only considered short periods of walking. In contrast, many VR applications involve extended exposures to the VE and often include additional cognitive tasks such as way-finding. Hence, it remains an open question whether velocity during VR walking will further slow down over time or whether users of VR will eventually speed up and adapt their velocity to the VE, moving with the same speed as in the real world.

In this paper we present a study comparing the effects of a cognitive task on velocity during long-distance walking in VR with walking in the real world. To this end, we used an exact virtual replica of the users’ real surroundings. To reliably evaluate locomotion performance, we analyzed walking velocity during long-distance walking. This was achieved over 60 consecutive cycles of a left/right figure-8 protocol, which avoids the limitations of treadmill and non-consecutive (i.e., start-stop) walking protocols. The results show a significant decrease of velocity in the VE compared to the real world, even after 60 consecutive cycles, both with and without the cognitive task.

Stimulating the Brain in VR: Effects of Transcranial Direct-Current Stimulation on Redirected Walking

Redirected walking (RDW) enables virtual reality (VR) users to explore large virtual environments (VEs) in confined tracking spaces by guiding users on different paths in the real world than in the VE. However, so far, spaces larger than typical room-scale setups of 5 m × 5 m are still required to allow infinitely straight walking, i.e., to prevent a subjective mismatch between real and virtual paths. This mismatch could in theory be reduced by interacting with the underlying brain activity. Transcranial direct-current stimulation (tDCS) is a simple method for modifying ongoing cortical activity and excitability levels. Hence, this approach offers enormous potential to widen detection thresholds for RDW, and consequently reduce the above-mentioned space requirements. In this paper, we conducted a psychophysical experiment using tDCS to evaluate detection thresholds for RDW gains. In the stimulation condition, 1.25 mA of cathodal tDCS was applied over the prefrontal cortex (AF4, with Pz for the return current) for 20 minutes. Overall, tDCS failed to exert a significant effect on detection thresholds. However, for the highest gain only, path deviance was significantly modified by tDCS. In addition, subjectively reported disorientation was significantly lower during tDCS as compared to the sham condition. Along the same lines, oculomotor cybersickness symptoms after the session were significantly decreased compared to baseline in tDCS, while there was no significant effect in sham. This work presents the first use of tDCS during virtual walking and provides new vistas for future research in the area of neurostimulation in VR.

Perception of Spatial Relationships in Impossible Spaces

Impossible spaces have been used to increase the amount of virtual space available for real walking within a constrained physical space. In this technique, multiple virtual rooms are allowed to occupy overlapping portions of the physical space, in a way that is not possible in real Euclidean space. Prior work has explored detection thresholds for impossible spaces; however, very little work has considered other aspects of how impossible spaces alter participants’ perception of spatial relationships within virtual environments. In this paper, we present a within-subjects study (n = 30) investigating how impossible spaces altered participants’ perceptions of the location of objects placed in different rooms. Participants explored three layouts with varying amounts of overlap between rooms and then pointed in the direction of various objects they had been tasked to locate. Significantly more error was observed when pointing at objects in overlapping spaces as compared to the non-overlapping layout. Further analysis suggests that participants pointed towards where objects would be located in the non-overlapping layout, regardless of how much overlap was present. This suggests that, when participants are not aware that any manipulation is present, they automatically adapt their representation of the spaces based on judgments of relative size and visible constraints on the size of the whole system.

How Video Game Locomotion Methods Affect Navigation in Virtual Environments

Navigation, or the means by which people find their way in an environment, depends on the ability to combine information from multiple sources so that properties of an environment, such as the location of a goal, can be estimated. An important source of information for navigation is the spatial cues generated by self-motion. Navigation based solely on body-based cues generated by self-motion is called path integration. In virtual reality and video games, many locomotion systems, that is, methods that move users through a virtual environment, can distort or deprive users of important self-motion cues. There has been much study of this issue, and in this paper we extend that study in novel directions by assessing the effect of four game-like locomotion interfaces on navigation performance using path integration. The salient features of our locomotion interfaces are that two are primarily continuous, i.e., more like a joystick, and two are primarily discrete, i.e., more like teleportation. Our main finding is that, from the perspective of path integration, people are able to use all methods, although continuous methods outperform discrete methods.

SESSION: Paper Session 4: Viewpoint in VR

An Analysis of User Perception Regarding Body-Worn 360° Camera Placements and Heights for Telepresence

Our work investigates body-worn 360° camera placements for telepresence, to balance height and clarity of view. We conducted a user study in a Virtual Reality (VR) simulation, using a 3x3 within-subjects experimental design varying placement and height, with 26 participants. We found that shoulder mounted cameras were significantly less preferable than our other conditions due to the occlusions caused by the wearer’s head. Our results did not show a significant effect of camera height within a range of +/- 12 inches from the user’s natural height. As such, in the context of body-worn 360° cameras, there is leeway for camera height, whereas strategic bodily placements are more important. Based on these results, we provide design recommendations for content creators using wearable cameras for immersive telepresence.

Am I Floating or Not? : Sensitivity to Eye Height Manipulations in HMD-based Immersive Virtual Environments

Eye height manipulations have previously been found to affect judgments of object size and egocentric distance in both real and immersive virtual environments. In this short paper we report the results of an experiment that explores people’s sensitivity to various offsets of their eye height in VR using a forced-choice task in a wide variety of different architectural models. Our goal is to better understand the range of eye height manipulations that can be surreptitiously employed under different environmental conditions.

We exposed each of 10 standing participants to a total of 121 randomly-ordered trials, spanning 11 different eye height offsets between –80 cm and +80 cm, in 11 different highly detailed virtual indoor environments, and asked them to report whether they felt that their (invisible) feet were floating above or sunken below the virtual floor. We fit psychometric functions to the pooled data and to the data from each virtual environment and each participant individually. In the pooled data, we found a point-of-subjective-equality (PSE) very close to zero (–3.8 cm), and 25% and 75% detection thresholds of –16.1 cm and +8.6 cm respectively, for an uncertainty interval of 24.7 cm. We also observed some interesting variations in the results between individual rooms, which we discuss in more detail in the paper. Our findings can help to inform VR developers about users’ sensitivity to incorrect eye height placement, to elucidate the potential impact of various features of interior spaces on people’s tolerance of eye height manipulations, and to inform future work seeking to employ eye height manipulations to mitigate distance underestimation in VR.
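
The PSE and 25%/75% thresholds reported above come from psychometric-function fits; the sketch below shows one common way to obtain such values, fitting a cumulative Gaussian to the proportion of "floating" responses per eye-height offset. The response data are hypothetical, not the study's measurements.

    # Sketch: fit a cumulative Gaussian to the proportion of "floating" responses per
    # eye-height offset and read off the PSE and the 25%/75% detection thresholds.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    offsets_cm = np.array([-80, -64, -48, -32, -16, 0, 16, 32, 48, 64, 80])
    p_floating = np.array([0.02, 0.05, 0.10, 0.20, 0.35, 0.52, 0.75, 0.88, 0.95, 0.98, 1.00])

    def psychometric(x, pse, sigma):
        return norm.cdf(x, loc=pse, scale=sigma)

    (pse, sigma), _ = curve_fit(psychometric, offsets_cm, p_floating, p0=(0.0, 20.0))
    t25 = pse + sigma * norm.ppf(0.25)
    t75 = pse + sigma * norm.ppf(0.75)
    print(f"PSE = {pse:.1f} cm, 25%/75% thresholds = {t25:.1f}/{t75:.1f} cm, "
          f"interval = {t75 - t25:.1f} cm")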

Differences in Haptic and Visual Perception of Expressive 1DoF Motion

Humans can perceive motion through a variety of different modalities. Vision is a well-explored modality; however, haptics can greatly increase the richness of information provided to the user. The detailed differences in perception of motion between these two modalities are not well studied and can provide an additional avenue for communication between humans and haptic devices or robots. We analyze these differences in the context of users’ interactions with a non-anthropomorphic haptic device. In this study, participants experienced different levels and combinations of stiffness, jitter, and acceleration curves via a one-degree-of-freedom linear motion display. These conditions were presented with and without the opportunity for users to touch the setup. Participants rated the experiences within the contexts of emotion, anthropomorphism, likeability, and safety using the SAM scale, HRI metrics, as well as qualitative feedback. A positive correlation between stiffness and dominance, specifically due to the haptic condition, was found; additionally, with the introduction of jitter, decreases in perceived arousal and likeability were recorded. Trends relating acceleration curves to perceived dominance, as well as stiffness and jitter to valence, arousal, dominance, likeability, and safety, were also found. These results suggest the importance of considering which sensory modalities are more actively engaged during interactions and, concomitantly, which behaviors designers should employ in the creation of non-anthropomorphic interactive haptic devices to achieve a particular interpreted affective state.

SESSION: Paper Session 5: Haptics and Images

The Effect of Motion on the Perception of Material Appearance

We analyze the effect of motion on the perception of material appearance. First, we create a set of stimuli containing 72 realistic materials, rendered with varying degrees of linear motion blur. Then we launch a large-scale study on Mechanical Turk to rate a given set of perceptual attributes, such as brightness, roughness, or the perceived strength of reflections. Our statistical analysis shows that motion significantly alters the perception of certain attributes. In addition, we further investigate the perception of brightness for the particular cases of rubber and plastic materials. We create new stimuli with ten different luminance levels and seven degrees of motion. We launch a new user study to retrieve their perceived brightness. From the users’ judgements, we build two-dimensional maps showing how perceived brightness varies as a function of the luminance and motion of the material.

Comparison of subjective methods, with and without explicit reference, for quality assessment of 3D graphics

Numerous methodologies for subjective quality assessment exist in the field of image processing. In particular, the Absolute Category Rating with Hidden Reference (ACR-HR) and the Double Stimulus Impairment Scale (DSIS) are considered two of the most prominent methods for assessing the visual quality of 2D images and videos. Are these methods valid and accurate for evaluating the perceived quality of 3D graphics data? Is the presence of an explicit reference necessary, given the lack of human prior knowledge of 3D graphics data compared to natural images/videos? To answer these questions, we compare these two subjective methods (ACR-HR and DSIS) on a dataset of high-quality colored 3D models, impaired with various distortions. These subjective experiments were conducted in a virtual reality (VR) environment. Our results show differences in the performance of the methods depending on the 3D contents and the types of distortions. We show that DSIS outperforms ACR-HR in terms of accuracy and exhibits stable performance. Results also yield interesting conclusions on the importance of a reference for judging the quality of 3D graphics. We finally provide recommendations regarding the influence of the number of observers on accuracy.

Spectral Visualization Sharpening

In this paper, we propose a perceptually-guided visualization sharpening technique. We analyze the spectral behavior of an established comprehensive perceptual model to arrive at our approximated model based on an adapted weighting of the bandpass images from a Gaussian pyramid. The main benefit of this approximated model is its controllability and predictability for sharpening color-mapped visualizations. Our method can be integrated into any visualization tool as it adopts generic image-based post-processing, and it is intuitive and easy to use as viewing distance is the only parameter. Using highly diverse datasets, we show the usefulness of our method across a wide range of typical visualizations.
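
A rough intuition for the approach — reweighting the bandpass images of a multiscale decomposition before recombining them — is sketched below using a difference-of-Gaussians decomposition. The per-band weights here are invented; in the paper they follow from the approximated perceptual model and the viewing distance.

    # Sketch: sharpen an image by reweighting bandpass (difference-of-Gaussians) layers,
    # in the spirit of reweighting Gaussian-pyramid bandpass images. Weights are illustrative.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def bandpass_sharpen(image, sigmas=(1.0, 2.0, 4.0, 8.0), weights=(1.6, 1.3, 1.1, 1.0)):
        blurred = [image] + [gaussian_filter(image, s) for s in sigmas]
        bands = [blurred[i] - blurred[i + 1] for i in range(len(sigmas))]  # bandpass layers
        base = blurred[-1]                                                 # low-pass residual
        out = base + sum(w * b for w, b in zip(weights, bands))
        return np.clip(out, 0.0, 1.0)

    # Usage on a synthetic grayscale visualization image in [0, 1]:
    img = np.random.default_rng(2).random((256, 256))
    sharpened = bandpass_sharpen(img)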

SESSION: Paper Session 6: Gaze and Attention

Transsaccadic Awareness of Scene Transformations in a 3D Virtual Environment

In gaze-contingent displays, the viewer’s eye movement data are processed in real-time to adjust the graphical content. To provide a high-quality user experience, these graphical updates must occur with minimum delay. Such updates can be used to introduce imperceptible changes in virtual camera pose in applications such as networked gaming, collaborative virtual reality and redirected walking. For such applications, perceptual saccadic suppression can help to hide the graphical artifacts. We investigated whether the visibility of these updates depends on the type of image transformation. Users viewed 3D scenes in which the displacement of a target object triggered them to generate a vertical or horizontal saccade, during which a translation or rotation was applied to the virtual camera used to render the scene. After each trial, users indicated the direction of the scene change in a forced-choice task. Results show that type and size of the image transformation affected change detectability. During horizontal or vertical saccades, rotations along the roll axis were the most detectable, while horizontal and vertical translations were least noticed. We confirm that large 3D adjustments to the scene viewpoint can be introduced unobtrusively and with low latency during saccades, but the allowable extent of the correction varies with the transformation applied.
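
The kind of gaze-contingent update described above can be sketched as a loop that consumes a pending viewpoint change only while gaze velocity indicates a saccade in flight. The velocity threshold, per-saccade rotation budget, and function names below are hypothetical.

    # Sketch: apply a small camera rotation only while a saccade is detected from gaze
    # velocity. Threshold and rotation budget are illustrative values.
    import numpy as np

    SACCADE_VELOCITY_DEG_PER_S = 180.0     # illustrative saccade-detection threshold
    MAX_ROTATION_PER_SACCADE_DEG = 2.0     # illustrative allowable adjustment per saccade

    def update_camera(camera_yaw_deg, gaze_velocity_deg_per_s, pending_rotation_deg):
        """Consume part of a pending viewpoint rotation while the eye is in a saccade."""
        if gaze_velocity_deg_per_s > SACCADE_VELOCITY_DEG_PER_S and pending_rotation_deg != 0.0:
            step = np.clip(pending_rotation_deg,
                           -MAX_ROTATION_PER_SACCADE_DEG, MAX_ROTATION_PER_SACCADE_DEG)
            camera_yaw_deg += step
            pending_rotation_deg -= step
        return camera_yaw_deg, pending_rotation_deg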

Measurements of contrast sensitivity for peripheral vision

Contrast detection thresholds were measured for eccentricities from 0° to 27° and a range of stimulus frequencies from 0.125 cpd to 16 cpd. The measurements were motivated by the need to collect visual performance data for a gaze-contingent rendering system. For this application, mixed chromatic and achromatic stimuli are even more important than purely chromatic cases. Therefore, the detection of sine-gratings with Gaussian patches was measured for four mixed chromatic/achromatic color directions with a varying share of the achromatic component. To verify that our experimental setup generates results consistent with previous work, we also measured contrast thresholds for an achromatic (black-to-white) stimulus. Five observers participated in the experiments and individually determined the detection threshold for each stimulus using the QUEST method. The results, plotted as contrast sensitivity functions (CSF), follow the state-of-the-art CSF models. However, we report lower sensitivity to contrast for achromatic stimuli, caused by the small size of the stimulus. Color directions closer to the chromatic green-to-red axis show higher contrast sensitivity than achromatic stimuli, while for the yellow-to-blue axis the sensitivity is lower. A higher achromatic component in the mixed stimuli brings contrast sensitivity closer to the achromatic CSF.
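
The QUEST procedure named above is a Bayesian adaptive method; as a plainly simpler stand-in for illustration only, the sketch below runs a 2-down/1-up staircase against a simulated observer to estimate a contrast detection threshold. The observer model, step size, and stopping rule are invented.

    # Sketch: a simple 2-down/1-up staircase for contrast detection, used here as a
    # stand-in for QUEST (which is a Bayesian adaptive procedure, not shown).
    import numpy as np

    rng = np.random.default_rng(3)
    true_threshold = 0.02                       # simulated observer's contrast threshold

    def observer_detects(contrast):
        """Simulated yes/no detection with a soft threshold."""
        p = 1.0 / (1.0 + np.exp(-(contrast - true_threshold) / 0.004))
        return rng.random() < p

    contrast, step_factor = 0.1, 1.25
    correct_in_a_row, last_direction, reversals = 0, 0, []
    for _ in range(80):
        if observer_detects(contrast):
            correct_in_a_row += 1
            if correct_in_a_row == 2:            # 2-down rule: make the stimulus harder
                correct_in_a_row = 0
                if last_direction == +1:
                    reversals.append(contrast)
                last_direction = -1
                contrast /= step_factor
        else:                                    # 1-up rule: make the stimulus easier
            correct_in_a_row = 0
            if last_direction == -1:
                reversals.append(contrast)
            last_direction = +1
            contrast *= step_factor

    estimate = np.mean(reversals[-6:]) if len(reversals) >= 6 else contrast
    print(f"staircase threshold estimate: {estimate:.4f} (simulated true value {true_threshold})")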

Reading Speed Decreases for Fast Readers Under Gaze-Contingent Rendering

Gaze-contingent rendering and display could help meet the increasing resolution and frame rate demands of modern displays while reducing the required latency, bandwidth, and power. However, it is still unclear how degradation of the peripheral image impacts behavior, particularly for the important task of reading. We examined changes in reading speed with different levels of peripheral degradation, varying the size of the text, foveal region, and sub-sampling kernel. We found a wide spread of responses across subjects, with the average change in reading speed ranging from -123 words per minute (WPM) to +67 WPM. We did not find significant effects across types of peripheral degradation, but the change in reading speed was significantly inversely correlated with baseline reading speed (r=-0.513, n=17, p=0.0352), indicating that faster readers were more negatively impacted.

Towards VR Attention Guidance: Environment-dependent Perceptual Threshold for Stereo Inverse Brightness Modulation

In this paper, we propose a new method for attention and gaze redirection, specifically designed for immersive stereo displays. Exploiting the dual nature of stereo imagery, our stimulus is composed of complementary parts displayed for each individual eye. This attracts viewers’ attention due to induced binocular rivalry. In a perceptual study, we investigate size- and intensity-related perceptual thresholds of our stimulus for six different real-world panorama images. Our results show that a flexible parameterization allows the stimulus to be perceived even in complex surroundings. To prepare for technical innovations expected in future-generation virtual reality headsets, we used a commercially available head-mounted display as well as a high-resolution dps.
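
The complementary per-eye stimulus described above can be illustrated as an equal-and-opposite local brightness change in the left- and right-eye images. The region size and modulation strength in the sketch below are invented, not the thresholds measured in the study.

    # Sketch: a complementary per-eye stimulus — brighten a circular region in the
    # left-eye image and darken the same region in the right-eye image equally.
    import numpy as np

    def inverse_brightness_stimulus(left_img, right_img, center, radius_px, strength):
        """left_img/right_img: float grayscale images in [0, 1]; center: (row, col)."""
        rows, cols = np.ogrid[:left_img.shape[0], :left_img.shape[1]]
        mask = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius_px ** 2
        left, right = left_img.copy(), right_img.copy()
        left[mask] = np.clip(left[mask] + strength, 0.0, 1.0)     # brighten for one eye
        right[mask] = np.clip(right[mask] - strength, 0.0, 1.0)   # darken for the other
        return left, right

    left, right = inverse_brightness_stimulus(np.full((512, 512), 0.5),
                                              np.full((512, 512), 0.5),
                                              center=(256, 256), radius_px=40, strength=0.2)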

Assessment of Driver Attention during a Safety Critical Situation in VR to Generate VR-based Training

Crashes involving pedestrians on urban roads can be fatal. In order to prevent such crashes and provide a safer driving experience, adaptive pedestrian warning cues can help drivers detect risky pedestrians. However, it is difficult to test such systems in the wild and to train drivers using these systems in safety-critical situations. This work investigates whether low-cost virtual reality (VR) setups, along with gaze-aware warning cues, could be used for driver training, by analyzing driver attention during an unexpected pedestrian crossing on an urban road. Our analyses show significant differences in distances to crossing pedestrians, pupil diameters, and driver accelerator inputs when the warning cues were provided. Overall, there is a strong indication that VR and head-mounted displays (HMDs) could be used for generating attention-increasing driver-training packages for safety-critical situations.