SAP '18: Proceedings of the 15th ACM Symposium on Applied Perception

SESSION: Avatars

Virtual shadows for real humans in a CAVE: influence on virtual embodiment and 3D interaction

In immersive projection systems (IPS), the presence of the user's real body limits the possibility to elicit a virtual body ownership illusion. But is it still possible to embody someone else in an IPS even though the users are aware of their real body? In order to study this question, we propose to consider using a virtual shadow in the IPS, which can be similar to or different from the user's real morphology. We have conducted an experiment (N=27) to study the users' sense of embodiment whenever a virtual shadow was or was not present. Participants had to perform a 3D positioning task in which accuracy was the main requirement. The results showed that users widely accepted their virtual shadow (agency and ownership) and felt more comfortable when interacting with it (compared to no virtual shadow). Yet, due to the awareness of their real body, the users have less acceptance of the virtual shadow whenever the shadow gender differs from their own. Furthermore, the results showed that virtual shadows increase the users' spatial perception of the virtual environment by decreasing the inter-penetrations between the user and the virtual objects. Taken together, our results promote the use of dynamic and realistic virtual shadows in IPS and pave the way for further studies on "virtual shadow ownership" illusion.

Effects of anthropomorphic fidelity of self-avatars on reach boundary estimation in immersive virtual environments

Research has shown that self-avatars (life-size representations of the user in Virtual Reality (VR)) can affect how people perceive virtual environments. In this paper, we investigated whether the visual fidelity of a self-avatar affects reach boundary perception, as assessed through two variables: 1) action taken (or verbal response) and 2) correct judgment. Participants were randomly assigned to one of four conditions: i) high-fidelity self-avatar, ii) low-fidelity self-avatar, iii) no avatar (end-effector), and iv) real world as a reference task group. Results indicate that all three VR viewing conditions were significantly different from the real world with regard to correctly judging the reachability of the target. However, based on verbal responses, only the "no avatar" condition differed non-trivially from the real-world condition. Taken together with reachability data, participants in the "no avatar" condition were less likely to correctly reach to the reachable targets. Overall, participant performance improved after completing a calibration phase with feedback, such that correct judgments increased and participants reached to fewer unreachable targets.

The role of avatar fidelity and sex on self-motion recognition

Avatars are important for games and immersive social media applications. Although avatars are still not complete digital copies of the user, they often aim to represent a user in terms of appearance (color and shape) and motion. Previous studies have shown that humans can recognize their own motions in point-light displays. Here, we investigated whether recognition of self-motion is dependent on the avatar's fidelity and the congruency of the avatar's sex with that of the participants. Participants performed different actions that were captured and subsequently remapped onto three different body representations: a point-light figure, a male, and a female virtual avatar. In the experiment, participants viewed the motions displayed on the three body representations and responded to whether the motion was their own. Our results show that there was no influence of body representation on self-motion recognition performance: participants were equally sensitive in recognizing their own motion on the point-light figure and the virtual characters. In line with previous research, recognition performance was dependent on the action. Sensitivity was highest for uncommon actions, such as dancing and playing ping-pong, and was around chance level for running, suggesting that the degree of individuality of performing certain actions affects self-motion recognition performance. Our results show that people were able to recognize their own motions even when individual body shape cues were completely eliminated and when the avatar's sex differed from their own. This suggests that people might rely more on kinematic information than on shape and sex cues for recognizing their own motion. This finding has important implications for avatar design in game and immersive social media applications.

SESSION: Perception and self-movement

Perception of height in virtual reality: a study of climbing stairs

Most virtual environments that people locomote through with head-mounted displays are flat to match the physical environment that people are actively walking on. In this paper we simulated stair climbing, and evaluated how well people could assess the distance they had climbed after several minutes of the activity under various conditions. We varied factors such as the presence of virtual feet (shoes), whether the stairwell was open or enclosed, the presence or absence of passive haptic markers, and whether a subject was ascending or descending. In general, the distance climbed or descended was overestimated, consistent with prior work on the perception of height. We find that subjects have significantly better ability to estimate their error with the presence of virtual shoes than without, and when the environment was open. Having shoes also resulted in significantly higher ratings of presence. We also find a significant tendency for females to show higher ratings of simulator sickness.

Individual differences and impact of gender on curvature redirection thresholds

To enable real walking in a virtual environment (VE) that is larger than the available physical space, redirection techniques that introduce multisensory conflicts between visual and nonvisual cues to manipulate different aspects of a user's trajectory could be applied. When applied within certain thresholds, these manipulations could go unnoticed and immersion remains intact. Research effort has been spent on identifying these thresholds and a wide range of thresholds was reported in different studies. These differences in thresholds could be explained by many factors such as individual differences, walking speed, or context settings such as environment design, cognitive load, distractors, etc.

In this paper, we present a study to investigate the role of gender on curvature redirection thresholds (RDTs) using the maximum likelihood procedure with the classical two-alternative forced-choice task. Results show high variability in individuals' RDTs, and that on average women have higher curvature RDTs than men. Furthermore, results also confirm existing findings about the negative correlation between walking speed and curvature RDTs.

Judging action capabilities in augmented reality

The utility of mediated environments increases when environmental scale (size and distance) is perceived accurately. We present the use of perceived affordances (judgments of action capabilities) as an objective way to assess space perception in an augmented reality (AR) environment. The current study extends the previous use of this methodology in virtual reality (VR) to AR. We tested two locomotion-based affordance tasks. In the first experiment, observers judged whether they could pass through a virtual aperture presented at different widths and distances, and also judged the distance to the aperture. In the second experiment, observers judged whether they could step over a virtual gap on the ground. In both experiments, the virtual objects were displayed with the HoloLens in a real laboratory environment. We demonstrate that affordances for passing through and perceived distance to the aperture are similar in AR to those measured in the real world, but that judgments of gap-crossing in AR were underestimated. These differences across two affordances may result from the different spatial characteristics of the virtual objects (on the ground versus extending off the ground).

Evaluating the effects of four VR locomotion methods: joystick, arm-cycling, point-tugging, and teleporting

In this work we present two novel methods of exploring a large immersive virtual environment (IVE) viewed through a head-mounted display (HMD) using the tracked controllers that come standard with commodity-level HMD systems. With the first method, "Point-Tugging," users reach and pull the controller trigger at a point in front of them and move in the direction of the point they "tug" with the controller. With the second method, "Arm-Cycling," users move their arms while pulling the trigger on the hand-held controllers to translate in the yaw direction that their head is facing. We perform a search task experiment to directly compare four locomotion techniques: Joystick, Arm-Cycling, Point-Tugging, and Teleporting. In the Joystick condition, a joystick is used to translate the user in the yaw direction of gaze with physical rotations matching virtual rotations. In the Teleporting condition, the controllers create an arched beam that allows the user to select a point on the ground and instantly teleport to this location. We find that Arm-Cycling has advantages over the other methods and could be suitable for widespread use.

Comparison of unobtrusive visual guidance methods in an immersive dome environment

Comparing input methods and cursors for 3D positioning with head-mounted displays

Moving objects is an important task in 3D user interfaces. In this work, we focus on (precise) 3D object positioning in immersive virtual reality systems, especially head-mounted displays (HMDs). To evaluate input method performance for 3D positioning, we focus on an existing sliding algorithm, in which objects slide on any contact surface. Sliding enables rapid positioning of objects in 3D scenes on a desktop system but is yet to be evaluated in an immersive system. We performed a user study that compared the efficiency and accuracy of different input methods (mouse, hand-tracking, and trackpad) and cursor display conditions (stereo cursor and one-eyed cursor) for 3D positioning tasks with the HTC Vive. The results showed that the mouse outperformed hand-tracking and the trackpad in terms of efficiency and accuracy. Stereo cursor and one-eyed cursor did not demonstrate a significant difference in performance, yet the stereo cursor condition was rated more favourably. For situations where the user is seated in immersive VR, the mouse is thus still the best input device for precise 3D positioning.

SESSION: Haptics

Touch with foreign hands: the effect of virtual hand appearance on visual-haptic integration

Hand tracking and haptics are gaining more importance as key technologies of virtual reality (VR) systems. For designing such systems, it is fundamental to understand how the appearance of the virtual hands influences user experience and how the human brain integrates vision and haptics. However, it is currently unknown whether multi-sensory integration of visual and haptic feedback can be influenced by the appearance of virtual hands in VR. We performed a user study in VR to gain insight into the effect of hand appearance on how the brain combines visual and haptic signals using a cue-conflict paradigm. In this paper, we show that the detection of surface irregularities (bumps and holes) sensed by eyes and hands is affected by the rendering of avatar hands. However, sensitivity changes do not correlate with the degree of perceived limb ownership. Qualitative feedback provides insights into potentially distracting cues in visual-haptic integration.

Expanding the sense of touch outside the body

Under normal circumstances, our sense of touch is limited to our body. Recent evidence suggests, however, that our perception of touch can also be expanded to objects we are holding when certain tactile illusions are elicited by delivering vibrotactile stimuli in a particular manner. Here, we examined whether an extra-corporeal illusory sense of touch could be elicited using vibrotactile stimuli delivered via two independent handheld controllers while in virtual reality. Our results suggest that under the right conditions, one's sense of touch in space can be extended outside the body, and even into the empty space that surrounds us. Specifically, we show, in virtual reality, that one's sense of touch can be extended to a virtual stick one is holding, and also into the empty space between one's hands. These findings provide a means with which to expand the sense of touch beyond the hands in VR systems using two independent controllers, and also have important implications for our understanding of the human representation of touch.

Learning to feel words: a comparison of learning approaches to acquire haptic words

Recent studies have shown that decomposing spoken or written language into phonemes and transcribing each phoneme into a unique vibrotactile pattern enables people to receive lexical messages on the arm. A potential barrier to adopting this new communication system is the time and effort required to learn the association between phonemes and vibrotactile patterns. Therefore, in this study, we compared the learnability and generalizability of different learning approaches, including guided learning, self-guided learning, and a mnemonic device. We found that after 65 minutes of learning spread across 3 days, 67% of participants, including both native and non-native English speakers, following the guided learning could identify 100 haptic words with over 90% accuracy, while only 20% of participants using the self-guided learning paradigm could do so.

Interaction between static visual cues and force-feedback on the perception of mass of virtual objects

We use a force-feedback device and a game engine to measure the effects of material appearance on the perception of mass of virtual objects. We discover that the perceived mass is mainly determined by the ground-truth mass output by the force-feedback device. Different from the classic Material Weight Illusion (MWI), however, heavy-looking objects (e.g., steel) are consistently rated heavier than light-looking ones (e.g., fabric) with the same ground-truth mass. Analysis of the initial accelerated velocity of the movement trajectories of the virtual probe shows greater acceleration for materials with heavier rated mass. This effect is diminished when the participants lift the object for the second time, meaning that the influence of visual appearance disappears in the movement trajectories once it is calibrated by the force-feedback. We also show how the material categories are affected by both the visual appearance and the weight of the object. We conclude that visual appearance has a significant interaction with haptic force-feedback on the perception of mass and also affects the kinematics of how participants manipulate the object.

SESSION: Perception

Analysis of hair shine using rendering and subjective evaluation

Investigating perception time in the far peripheral vision for virtual and augmented reality

Far peripheral vision (beyond 60° eccentricity) is beginning to be supported in the latest virtual and augmented reality (VR and AR) headsets. This benefits the VR and AR experiences by allowing a greater amount of information to be conveyed, reducing visual clutter, and enabling subtle visual attention management. However, the visual properties of the far periphery are different from those of the central vision, because of the physiological differences between the areas on the visual cortex responsible for the respective vision types. In this paper, we investigate the perception time in the far peripheral vision, specifically the time it takes for a user to perceive a pattern at a high eccentricity. We have characterized the perception time in the far peripheral vision by conducting a user study on 40 participants in which the participants distinguish between two types of patterns displayed at several sizes and at various eccentricities in their field of view. Our results show that at higher eccentricities, participants take longer to perceive a pattern. Based on user study data, we are able to characterize the desired scaling of patterns at higher eccentricities, so that they can be perceived within a similar amount of time as in the central vision.

An appearance uniformity metric for 3D printing

A method is presented for perceptually characterizing appearance non-uniformities that result from 3D printing. In contrast to physical measurements, the model is designed to take into account the human visual system and variations in observer conditions such as lighting, point of view, and shape. Additionally, it is capable of handling spatial reflectance variations over a material's surface. Motivated by Schrödinger's line element approach to studying color differences, an image-based psychophysical experiment that explores paths between materials in appearance space is conducted. The line element concept is extended from color to spatially-varying appearances (including color, roughness, and gloss), which enables the measurement of fine differences between appearances along a path. We define two path functions, one interpolating reflectance parameters and the other interpolating the final imagery. An image-based uniformity model is developed, applying a trained neural network to color differences calculated from rendered images of the printed non-uniformities. The final model is shown to perform better than commonly used image comparison algorithms, including spatial pattern classes that were not used in training.

SESSION: Speech perception & eye movement

The semantic space for emotional speech and the influence of different methods for prosody isolation on its perception

Normally, when people talk to other people, they communicate not only using specific words, but also with intentional changes in their voice melody, facial expressions, and gestures. Not only is human communication inherently multimodal, it is also multi-layered. That is, it conveys not only simple semantic information, but also a wide variety of social, emotional, and functional (e.g., conversation control) information. Previous work has examined the perception of socio-emotional information conveyed by words and facial expressions. Here, we build on that work and examine the perception of socio-emotional information based solely on prosody (e.g., speech melody, rate, tempo, intensity). To examine the perception of affective prosody, it is necessary to remove all semantics from the speech signal without changing the prosody. In this paper, we compare several different state-of-the-art methods for removing semantics. We started by recording an audio database containing a German sentence spoken by 11 people in 62 different emotional states. We then removed or masked the semantics using three different techniques. We also recorded the same 62 states for a pseudo-language phrase. Each of these five sets of stimuli was subjected to a semantic differential rating task to derive and compare the semantic spaces for emotions. The results show that each of the methods successfully removed the semantic component, but also changed the perception of the emotional content. Interestingly, the pseudo-word stimuli diverged most from the normal sentences. Furthermore, although each of the filters affected the perception of the sentence in some manner, they did so in different ways.

Effects of virtual acoustics on target-word identification performance in multi-talker environments

Many virtual reality applications let multiple users communicate in a multi-talker environment, recreating the classic cocktail-party effect. While there is a vast body of research focusing on the perception and intelligibility of human speech in real-world scenarios with cocktail-party effects, there is little work on accurately modeling and evaluating the effect in virtual environments. Given the goal of evaluating the impact of virtual acoustic simulation on the cocktail-party effect, we conducted experiments to establish the signal-to-noise ratio (SNR) thresholds for target-word identification performance. Our evaluation was performed for sentences from the coordinate response measure corpus in the presence of multi-talker babble. The thresholds were established under varying sound propagation and spatialization conditions. We used a state-of-the-art geometric acoustic system integrated into the Unity game engine to simulate varying conditions of reverberance (direct sound; direct sound and early reflections; direct sound, early reflections, and late reverberation) and spatialization (mono, stereo, and binaural). Our results show that spatialization has the biggest effect on the ability of listeners to discern the target words in multi-talker virtual environments. Reverberance, on the other hand, has a slight negative effect on the ability to discern target words.

Analysis of neural correlates of saccadic eye movements

In a concurrent electroencephalography (EEG) and eye-tracking study, we explore the specific neural responses associated with saccadic eye movements. We hypothesise that there is a distinct saccade-related neural response that occurs well before a physical saccade and that this response is different for free, natural saccades versus forced saccades. Our results show a distinct and measurable brain response approximately 200 ms before a physical saccade actually occurs. This response is distinctly different for free saccades versus forced saccades. Our results open up possibilities of predicting saccades based on neural data. This is of particular relevance for creating effective gaze guidance mechanisms within a virtual reality (VR) environment and for creating faster brain computer interfaces (BCI).

A comparison of eye-head coordination between virtual and physical realities

Past research has shown that humans exhibit certain eye-head responses to the appearance of visual stimuli, and these natural reactions change during different activities. Our work builds upon these past observations by offering new insight into how humans behave in Virtual Reality (VR) compared to Physical Reality (PR). Using eye- and head-tracking technology, and by conducting a study on two groups of users (participants in VR or PR), we identify how often these natural responses are observed in both environments. We find that users statistically move their heads more often when viewing stimuli in VR than in PR, and VR users also move their heads more in the presence of text. We open a discussion for identifying the head-worn display (HWD) factors that cause this difference, as this may not only affect predictive models using eye movements as features, but also VR user experience overall.

SESSION: Selective rendering & virtual characters

Foveated depth-of-field filtering in head-mounted displays

Assessing vignetting as a means to reduce VR sickness during amplified head rotations

Redirected and amplified head movements have the potential to provide more natural interaction with virtual environments (VEs) than using controller-based input, which causes large discrepancies between visual and vestibular self-motion cues and leads to increased VR sickness. However, such amplified head movements may also exacerbate VR sickness symptoms over no amplification. Several general methods have been introduced to reduce VR sickness for controller-based input inside a VE, including a popular vignetting method that gradually reduces the field of view.

In this paper, we investigate the use of vignetting to reduce VR sickness when using amplified head rotations instead of controller-based input. We also investigate whether the induced VR sickness is a result of the user's head acceleration or velocity by introducing two different modes of vignetting, one triggered by acceleration and the other by velocity. Our dependent measures were pre and post VR sickness questionnaires as well as estimated discomfort levels that were assessed each minute of the experiment. Our results show interesting effects between a baseline condition without vignetting and the two vignetting methods, generally indicating that the vignetting methods did not succeed in reducing VR sickness for most of the participants and, instead, led to a significant increase. We discuss the results and potential explanations of our findings.

Deep learning of biomimetic visual perception for virtual humans

Future generations of advanced, autonomous virtual humans will likely require artificial vision systems that more accurately model the human biological vision system. With this in mind, we propose a strongly biomimetic model of visual perception within a novel framework for human sensorimotor control. Our framework features a biomechanically simulated, musculoskeletal human model actuated by numerous skeletal muscles, with two human-like eyes whose retinas have spatially nonuniform distributions of photoreceptors not unlike biological retinas. The retinal photoreceptors capture the scene irradiance that reaches them, which is computed using ray tracing. Within the sensory subsystem of our model, which continuously operates on the photoreceptor outputs, are 10 automatically-trained, deep neural networks (DNNs). A pair of DNNs drive eye and head movements, while the other 8 DNNs extract the sensory information needed to control the arms and legs. Thus, exclusively by means of its egocentric, active visual perception, our biomechanical virtual human learns, by synthesizing its own training data, efficient, online visuomotor control of its eyes, head, and limbs to perform tasks involving the foveation and visual pursuit of target objects coupled with visually-guided reaching actions to intercept the moving targets.

Perceptual adjustment of eyeball and pupil diameter jitter amplitudes for virtual characters