MIG '21: Motion, Interaction and Games

Full Citation in the ACM Digital Library

SESSION: Virtual Reality

Ego-Interaction: Visual Hand-Object Pose Correction for VR Experiences

Immersive virtual reality (VR) experiences may track both a user’s hands and a physical object at the same time and use the information to animate computer-generated representations of the two interacting. However, rendering the interaction without visual artefacts requires highly accurate tracking of the hands and the objects themselves, as well as their relative locations – made even more difficult when the objects are articulated or deformable. If this tracking is incorrect, the quality and immersion of the visual experience is reduced. In this paper we turn the problem around – instead of focusing on producing quality renders of hand-object interactions by improving tracking quality, we acknowledge that there will be tracking errors and focus solely on fixing the visualisations. We propose a Deep Neural Network (DNN) that modifies hand pose based on its position relative to the object. Training the network, however, requires sufficient labelled data, so we also present a new dataset of hand-object interactions – Ego-Interaction. This is the first hand-object interaction dataset with egocentric RGBD videos and 3D ground-truth data for both rigid and non-rigid objects. The Ego-Interaction dataset contains 92 sequences with 4 rigid, 1 articulated and 4 non-rigid objects and demonstrates one- and two-handed hand-object interactions, carefully captured, rigged and animated using motion capture. We provide our dataset as a general resource for researchers in the VR and AI community interested in other hand-object and egocentric tracking problems.
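
To make the architecture idea concrete, the following is a minimal sketch of a pose-correction network, assuming a plain MLP that maps the tracked hand pose plus the object pose expressed in the hand's frame to a corrected pose; the joint count, input encoding and residual design are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical pose-correction network (illustrative only, not the paper's model).
import torch
import torch.nn as nn

NUM_JOINTS = 21              # assumed hand joint count
POSE_DIM = NUM_JOINTS * 3    # e.g. per-joint rotations as axis-angle
OBJ_DIM = 7                  # object position (3) + orientation quaternion (4), in the hand's frame

class HandPoseCorrector(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(POSE_DIM + OBJ_DIM, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, POSE_DIM),
        )

    def forward(self, hand_pose, rel_object_pose):
        x = torch.cat([hand_pose, rel_object_pose], dim=-1)
        # Predict a residual so an already-correct pose passes through unchanged.
        return hand_pose + self.net(x)

model = HandPoseCorrector()
corrected = model(torch.zeros(1, POSE_DIM), torch.zeros(1, OBJ_DIM))
print(corrected.shape)  # torch.Size([1, 63])
```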

Assessing the Impact of Mixed Reality Immersion on Presence and Embodiment

When placed inside an immersive virtual simulation, subjects tend to experience the feeling of being ‘really there’ and to respond realistically to their environment, forgetting that it is not real. This behaviour is observed when subjects experience a high sense of presence, the sensation of being in a real place and that the scenario being depicted to them is real. Here we present an experiment designed to evaluate the impact of different levels of immersion, and of different blends of virtual and real objects and body representations, on participants’ subjective experience. Presence is evaluated with an innovative method combining the random introduction of breaks-in-presence (BiP) with a rapid decision-making test. Results show that the level of immersion impacts both the Sense of Presence (SoP) and the Sense of Embodiment (SoE), that the BiP has a limited impact on the SoE without breaking it, and that the level of confidence in the decision test correlates with both the SoP and the SoE.

VR Natural Walking in Impossible Spaces

Locomotion techniques in Virtual Reality (VR) are the means by which users traverse a Virtual Environment (VE) and are considered an integral and indispensable part of user interaction.

This paper investigates the potential that natural walking in impossible spaces provides as a viable locomotion technique in VR when compared to conventional alternatives, such as teleportation, arm-swinging and touchpad/joystick. In this context, impossible spaces are locally Euclidean orbit-manifolds — subspaces separated by portals that are individually consistent but are able to impossibly overlap in space without interacting.

A quantitative user experiment was conducted with n = 25 participants, who were asked to complete a set of tasks inside four houses, in each case using a different locomotion technique to navigate. After completing all tasks for a given house, participants were then asked to complete a set of three questionnaires regarding the technique used, namely the Simulator Sickness Questionnaire (SSQ), Game Experience Questionnaire (GEQ) and System Usability Scale (SUS). Time for task completion was also recorded.

It was found that natural walking in impossible spaces significantly improves (α = 0.05) immersion (as compared to teleportation and touchpad/joystick, r > 0.7) and system usability (over touchpad/joystick and arm-swinging, r ≥ 0.38), but seems to lead to slower task completion.

SESSION: Learning

GarMatNet: A Learning-based Method for Predicting 3D Garment Mesh with Parameterized Materials

Recent progress in learning-based garment mesh generation has increased efficiency while maintaining realism during the generation process. However, no previous work has focused on variations in material type controlled by a parameterized material representation under static poses. In this work, we propose a learning-based method, GarMatNet, for predicting garment deformation as a function of human pose and garment material while maintaining detailed garment wrinkles. GarMatNet consists of two components: a generally-fitting network that predicts a smoothed garment mesh and a locally-detailed network that adds detailed wrinkles on top of it. We hypothesize that material properties play an essential role in the deformation of garments. Since the influence of material type is relatively smaller than that of pose or body shape, we employ linear interpolation among the different factors to control deformation. More specifically, we apply a parameterized material space based on the mass-spring model to express the differences between materials and construct a suitable network structure with weight adjustment between material properties and poses. The experimental results demonstrate that GarMatNet is comparable to physically-based simulation (PBS) predictions and offers advantages in generalization ability, model size, and training time over the baseline model.

How to train your dog: Neural enhancement of quadruped animations

Creating realistic quadruped animations is challenging. Producing them with methods such as key-framing is time-consuming and requires considerable artistic expertise. Alternatively, motion capture methods have their own challenges (getting the animal into a studio, attaching motion capture markers, and getting the animal to put on the desired performance), and the resulting animation will still most likely require clean-up. It would be useful if an animator could provide an initial rough animation and receive a corresponding high-quality, realistic one in return. To this end, we present a deep-learning approach for the automatic enhancement of quadruped animations. Given an initial animation, possibly lacking the subtle details of true quadruped motion and/or containing small errors, our results show that a neural network can learn to add these subtleties and correct errors, producing an enhanced animation while preserving the semantics and context of the initial animation. Our work also has potential uses in other applications; for example, its ability to run in real time means it could form part of a quadruped embodiment system.

Motor Babble: Morphology-Driven Coordinated Control of Articulated Characters

Locomotion in humans and animals is highly coordinated, with many joints moving together. Learning similar coordinated locomotion in articulated virtual characters, in the absence of reference motion data, is a challenging task due to the high number of degrees of freedom and the redundancy that comes with it. In this paper, we present a method for learning locomotion for virtual characters in a low dimensional latent space which defines how different joints move together. We introduce a technique called motor babble, wherein a character interacts with its environment by actuating its joints through uncoordinated, low-level (motor) excitations, resulting in a corpus of motion data from which a manifold latent space is extracted. Dimensions of the extracted manifold define a wide variety of synergies pertaining to the character and, through reinforcement learning, we train the character to learn locomotion in the latent space by selecting a small set of appropriate latent dimensions, along with learning the corresponding policy.
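
The pipeline described above can be pictured with a short sketch: random joint excitations produce a motion corpus, a linear latent basis is extracted from it, and a policy then acts in latent coordinates. The stand-in dynamics, PCA-based manifold and dimensions below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative motor-babble pipeline (toy dynamics; PCA stands in for the
# paper's manifold extraction).
import numpy as np

rng = np.random.default_rng(0)
num_joints, num_frames, latent_dim = 30, 5000, 8

def simulate_step(q, excitation):
    """Placeholder for the physics simulator: one step under joint excitation."""
    return 0.95 * q + 0.05 * excitation   # stand-in dynamics, assumption only

# 1. Motor babble: uncoordinated random excitations produce a corpus of poses.
q = np.zeros(num_joints)
corpus = np.empty((num_frames, num_joints))
for t in range(num_frames):
    q = simulate_step(q, rng.uniform(-1.0, 1.0, num_joints))
    corpus[t] = q

# 2. Extract a low-dimensional latent basis (here plain PCA via SVD).
mean = corpus.mean(axis=0)
_, _, vt = np.linalg.svd(corpus - mean, full_matrices=False)
basis = vt[:latent_dim]                    # each row couples many joints: a "synergy"

# 3. A policy can now act in latent coordinates z and map back to joint targets.
z = rng.standard_normal(latent_dim)
joint_targets = mean + z @ basis
print(joint_targets.shape)                 # (30,)
```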

PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network

Data-driven methods for physics-based character control using reinforcement learning have been successfully applied to generate high-quality motions. However, existing approaches typically rely on Gaussian distributions to represent the action policy, which can prematurely commit to suboptimal actions when solving high-dimensional continuous control problems for highly-articulated characters. In this paper, to improve the learning performance of physics-based character controllers, we propose a framework that considers a particle-based action policy as a substitute for Gaussian policies. We exploit particle filtering to dynamically explore and discretize the action space, and track the posterior policy represented as a mixture distribution. The resulting policy can replace the unimodal Gaussian policy which has been the staple for character control problems, without changing the underlying model architecture of the reinforcement learning algorithm used to perform policy optimization. We demonstrate the applicability of our approach on various motion capture imitation tasks. Baselines using our particle-based policies achieve better imitation performance and speed of convergence as compared to corresponding implementations using Gaussians, and are more robust to external perturbations during character control. Related code is available at: https://motion-lab.github.io/PFPN.
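
The contrast with a single Gaussian can be illustrated with a small sketch of a particle-based policy: each action dimension carries a set of particles (location, bandwidth) and the network only outputs state-dependent mixture weights over them. The particle layout, bandwidths and sampling below are illustrative assumptions rather than the PFPN implementation.

```python
# Toy particle-based action policy: a per-dimension Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
action_dim, num_particles = 4, 16

# Particle locations and bandwidths (adapted by resampling/filtering in the full method).
locations = rng.uniform(-1.0, 1.0, (action_dim, num_particles))
bandwidths = np.full((action_dim, num_particles), 0.1)

def sample_action(logits):
    """logits: (action_dim, num_particles), produced by the policy network."""
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    action = np.empty(action_dim)
    for d in range(action_dim):
        k = rng.choice(num_particles, p=weights[d])          # pick a particle
        action[d] = rng.normal(locations[d, k], bandwidths[d, k])
    return action

print(sample_action(rng.standard_normal((action_dim, num_particles))))
```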

SESSION: Perception and Appearance

Interactive Viewpoint Exploration for Constructing View-Dependent Models

We introduce an interactive method for sequentially finding viewpoints for constructing view-dependent models, which represent view-specific deformations in classic 2D cartoons [Chaudhuri et al. 2004, 2007; Koyama and Igarashi 2013; Rademacher 1999]. As users design one view-specific model from a single fixed viewpoint, the system searches for successive viewpoints for subsequent modeling and instantly jumps to the next viewpoint. The users can thereby efficiently repeat the design process of view-specific deformations until they are satisfied. This method is simple enough to be easily implemented in an existing modeling system. We conduct a user study with novice and amateur users and confirm that the proposed system is effective for designing view-specific models envisioned by the users.

Perception of Motion Variations in Large-Scale Virtual Human Crowds

Virtual human crowds are regularly featured in movies and video games. With a large number of virtual characters each behaving in their own way, spectacular scenes can be produced. The more diverse the characters and their behaviors are, the more realistic the virtual crowd is expected to appear. Hence, creating virtual crowds is a trade-off between the cost associated with acquiring more diverse assets, namely more virtual characters and their animations, and achieving better realism. In this paper, our focus is on the perceived variety in virtual crowd character motions. We present an experiment exploring whether observers are able to identify crowds that include motion clones in the case of large-scale crowds (from 250 to 1000 characters). As it is not possible to acquire individual motions for such numbers of characters, we rely on a state-of-the-art motion variation approach to synthesize unique variations of existing examples for each character in the crowd. Participants then compared pairs of videos, where each character was animated either with a unique motion or using a subset of these motions. Our results show that virtual crowds with more than two motions (one per gender) were perceptually equivalent, regardless of crowd size. We believe these findings can help create efficient crowd applications, and are an additional step towards a broader understanding of the perception of motion variety.

Emulating Foveated Path Tracing

At full resolution, path tracing cannot be deployed in real time on current graphics hardware due to slow convergence times and noisy outputs, despite recent advances in denoisers. In this work, we develop a perceptual sandbox emulating a foveated path tracer to determine the eccentricity angle thresholds that enable imperceptible foveated path tracing. In a foveated path tracer the number of rays fired can be decreased, and thus performance can be increased. For this study, because current hardware limitations prohibit real-time path tracing at multiple samples per pixel, we pre-render image buffers and emulate foveated rendering as a post-process by selectively blending the pre-rendered content, driven by an eye tracker capturing eye motion. We then perform three experiments to estimate conservative thresholds of eccentricity boundaries for which image manipulations are imperceptible. Contrary to our expectation of a single threshold across the three experiments, our results indicated three different average thresholds, one for each experiment. We hypothesise that this is due to the dissimilarity of the methodologies, i.e., A-B testing vs sequential presentation vs custom adjustment of eccentricities, which affect the perceptibility of peripheral blur, among other factors. We estimate, for the first time for path tracing, specific thresholds of eccentricity that avoid perceptual repercussions whilst maintaining high performance. We also analyse the potential reductions in computational complexity due to foveation in path tracing. Our analysis shows a significant boost in path-tracing performance (2–3× or more) with our foveated rendering method, as a result of the reduction in primary rays.
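
The post-process emulation can be summarised in a few lines: blend a full-quality path-traced buffer with a cheaper peripheral buffer according to the angular eccentricity from the tracked gaze point. The inner/outer thresholds and the linear ramp below are placeholder values, not the thresholds measured in the study.

```python
# Sketch of eccentricity-driven blending between two pre-rendered buffers.
import numpy as np

def eccentricity_deg(px, py, gaze_px, gaze_py, pixels_per_degree):
    return np.hypot(px - gaze_px, py - gaze_py) / pixels_per_degree

def foveated_blend(full_img, peripheral_img, gaze, pixels_per_degree,
                   inner_deg=10.0, outer_deg=20.0):
    h, w, _ = full_img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = eccentricity_deg(xx, yy, gaze[0], gaze[1], pixels_per_degree)
    # 0 inside the fovea, 1 beyond the outer threshold, smooth ramp in between.
    t = np.clip((ecc - inner_deg) / (outer_deg - inner_deg), 0.0, 1.0)[..., None]
    return (1.0 - t) * full_img + t * peripheral_img

full = np.random.rand(240, 320, 3)       # high sample-count buffer
cheap = np.random.rand(240, 320, 3)      # low sample-count / denoised buffer
out = foveated_blend(full, cheap, gaze=(160, 120), pixels_per_degree=16.0)
print(out.shape)  # (240, 320, 3)
```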

Does Synthetic Voice alter Social Response to a Photorealistic Character in Virtual Reality?

In this paper, we investigate the effect of a realism mismatch between the voice and appearance of a photorealistic virtual character in virtual reality. While many studies have investigated voice attributes for robots, little is known about the effect voice naturalness has on the perception of realistic virtual characters. We conducted an experiment in Virtual Reality (VR) with over two hundred participants, investigating the effect of a mismatch between realistic appearance and unrealistic voice on the feeling of presence, and on the emotional response of the user to the character expressing a strong negative emotion (sadness, guilt). We predicted that the mismatched voice would lower social presence and cause users to have a negative emotional reaction and feelings of discomfort towards the character. We found that concern for the virtual character was indeed altered by the unnatural voice, though interestingly it did not affect social presence.

SESSION: Physics

A Constraint-based Formulation of Stable Neo-Hookean Materials

In computer graphics, soft body simulation is often used to animate soft tissue on characters or rubber-like objects. Both are highly incompressible; however, commonly used models such as co-rotational FEM show significant volume loss, even under moderate strain. The Neo-Hookean model has recently become popular in graphics. It has superior volume conservation, recovers from inverted states, and does not require a polar decomposition. However, solvers for Neo-Hookean finite-element problems are typically based on Newton methods, which require energy Hessians, their eigendecomposition, and sophisticated linear solvers. In addition, minimizing the energy directly in this way does not accommodate modeling incompressible materials, since it would require infinitely stiff forces. In this paper we present a constraint-based model of the Neo-Hookean energy. By decomposing the energy into deviatoric (distortional) and hydrostatic (volume preserving) constraints, we can apply iterative constrained-optimization methods that require only first-order gradients. We compare our constraint-based formulation to state-of-the-art force-based solvers and show that our method is often an order of magnitude more efficient for stiff volume-preserving materials.
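
For readers unfamiliar with the split, a short sketch of the two per-element constraint functions follows; it uses the commonly cited Stable Neo-Hookean form, so the exact rest offsets and compliances in the paper may differ.

```python
# Deviatoric and hydrostatic constraints evaluated from a tetrahedron's
# deformation gradient F (illustrative form).
import numpy as np

def deformation_gradient(x_edges, rest_edges_inv):
    """x_edges: 3x3 matrix of current edge vectors; rest_edges_inv: inverse rest edge matrix."""
    return x_edges @ rest_edges_inv

def deviatoric_constraint(F):
    # Penalises distortion; equals sqrt(3) in the rest state (F = I).
    return np.sqrt(np.trace(F.T @ F))

def hydrostatic_constraint(F, gamma=1.0):
    # Penalises volume change; gamma is a material-dependent rest offset.
    return np.linalg.det(F) - gamma

# An XPBD-style solver projects each constraint C per iteration using only
# first-order gradients:  dlambda = (-C - alpha * lam) / (grad_C^T M^-1 grad_C + alpha)
F = np.eye(3) * 1.05                      # slightly inflated element
print(deviatoric_constraint(F), hydrostatic_constraint(F))
```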

ESPEFs: Exponential Spring Potential Energy Functions for Simulating Deformable Objects

Extended Position-based Dynamics (XPBD) is a well-known method for simulating deformable objects. It extends the Position-based Dynamics (PBD) algorithm with a compliance parameter for the material stiffness and implicitly adapts the damping function within the Gauss-Seidel iteration. Although XPBD improves upon PBD, it can be cumbersome to fine-tune the parameters required to obtain the desired material properties of the deformable objects. In this paper, we introduce exponential spring potential energy functions (ESPEFs) for the XPBD simulation of deformable objects with fewer parameter adjustments. Our method reformulates the well-known spring potential energy functions on an exponential basis, which produces more vivid motion during physics-based simulations. ESPEFs enrich the hyperelasticity of deformable models without any additional effort, whereas classical methods require cumbersome parameter tuning through trial-and-error tests. To demonstrate the benefits of ESPEFs, we extensively compare our simulation results with well-known spring models, strain-based dynamics including constitutive materials, and the output of another common iterative solver (Projective Dynamics). The resulting approach is simple, stable, interactive and produces visually pleasing results.
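
Since the abstract does not give the functional form, the sketch below only illustrates the general idea of an exponential spring potential compared with a quadratic one: roughly quadratic near the rest length but stiffening much faster under large strain. The specific expression is a guess, not the ESPEF definition from the paper.

```python
# Illustrative exponential spring energy vs. the classical quadratic spring.
import numpy as np

def quadratic_spring_energy(l, l0, k):
    c = (l - l0) / l0                        # relative strain
    return 0.5 * k * c**2

def exponential_spring_energy(l, l0, k):
    c = (l - l0) / l0
    return k * (np.exp(0.5 * c**2) - 1.0)    # ~quadratic near rest, grows exponentially

for stretch in (1.05, 1.2, 1.5):
    print(stretch,
          quadratic_spring_energy(stretch, 1.0, 1.0),
          exponential_spring_energy(stretch, 1.0, 1.0))
```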

QLB: Collision-Aware Quasi-Newton Solver with Cholesky and L-BFGS for Nonlinear Time Integration

We advocate for the straightforward application of the Cholesky and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithms to the nonlinear time integration of deformable objects with dynamic collisions. At the beginning of each time step, we form and factor the Hessian matrix, accounting for all internal forces while omitting the implicit cross-coupling terms arising from collision forces between multiple dynamic objects or from self collisions. During the nonlinear solver iterations of the time step, we then implicitly update this Hessian with L-BFGS. This approach is simple to implement and can be readily applied to any nonlinear time integration scheme, including higher-order schemes and quasistatics. We show that the approach works well in a wide range of settings involving complex nonlinear materials, including heterogeneity and anisotropy, as well as collisions, including frictional contact and self collisions.
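
The core numerical idea can be condensed into a toy example: factor the start-of-step Hessian approximation once with Cholesky, then use it as the initial Hessian inside the standard L-BFGS two-loop recursion during the nonlinear iterations. The quadratic test problem, memory size and missing line search are simplifications, not the authors' solver.

```python
# Cholesky-seeded L-BFGS on a toy quadratic (illustration only).
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lbfgs_direction(grad, s_list, y_list, chol):
    """Two-loop recursion; chol is the Cholesky factor of the start-of-step Hessian."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / y.dot(s)
        a = rho * s.dot(q)
        alphas.append((a, rho, s, y))
        q -= a * y
    r = cho_solve(chol, q)                   # initial Hessian applied here
    for a, rho, s, y in reversed(alphas):
        b = rho * y.dot(r)
        r += (a - b) * s
    return -r                                # descent direction

rng = np.random.default_rng(0)
A = np.diag(rng.uniform(1.0, 10.0, 20))      # stand-in for the internal-force Hessian
b = rng.standard_normal(20)
grad = lambda x: A @ x - b                   # gradient of 0.5 x^T A x - b^T x

x = np.zeros(20)
chol = cho_factor(A + 0.1 * np.eye(20))      # factored once "at the start of the step"
s_list, y_list = [], []
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    d = lbfgs_direction(g, s_list[-8:], y_list[-8:], chol)
    x_new = x + d                            # a line search would normally go here
    s_list.append(x_new - x); y_list.append(grad(x_new) - g)
    x = x_new
print(np.linalg.norm(grad(x)))               # should be close to zero
```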

Catching and Throwing Control of a Physically Simulated Hand

We design a nominal controller for animating an articulated physics-based human arm model, including the hands and fingers, to catch and throw objects. The controller is based on a finite state machine that defines the target poses for proportional-derivative control of the hand, as well as the orientation and position of the center of the palm using the solution of an inverse kinematics solver. We then use reinforcement learning to train agents to improve the robustness of the nominal controller for achieving many different goals. Imitation learning based on trajectories output by a numerical optimization is used to accelerate the training process. The success of our controllers is demonstrated by a variety of throwing and catching tasks, including flipping objects, hitting targets, and throwing objects to a desired height, and for several different objects, such as cans, spheres, and rods. We also discuss ways to extend our approach so that more challenging tasks, such as juggling, may be accomplished.
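
The proportional-derivative stage mentioned above is the standard formulation; a minimal sketch with placeholder gains and joint counts (not the paper's values) is:

```python
# PD torques driving the hand towards a target pose from the state machine / IK.
import numpy as np

def pd_torques(q, qdot, q_target, kp=200.0, kd=20.0):
    """q, qdot, q_target: per-joint angles, velocities and targets (radians)."""
    return kp * (q_target - q) - kd * qdot

q = np.zeros(25)              # e.g. finger and wrist joints of the simulated hand
qdot = np.zeros(25)
q_target = np.full(25, 0.3)   # "close the hand" pose requested by the state machine
print(pd_torques(q, qdot, q_target)[:3])
```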

SESSION: Crowds and Navigation

SNAP: Successor Entropy based Incremental Subgoal Discovery for Adaptive Navigation

Reinforcement learning (RL) has demonstrated great success in solving navigation tasks but often fails when learning complex environmental structures. One open challenge is to incorporate low-level generalizable skills with human-like adaptive path-planning in an RL framework. Motivated by neural findings in animal navigation, we propose Successor eNtropy-based Adaptive Path-planning (SNAP), which combines a low-level goal-conditioned policy with the flexibility of a classical high-level planner. SNAP decomposes distant goal-reaching tasks into multiple nearby goal-reaching sub-tasks using a topological graph. To construct this graph, we propose an incremental subgoal discovery method that leverages the highest-entropy states in the learned Successor Representation. The Successor Representation encodes the likelihood of being in a future state given the current state and captures the relational structure of states under a policy. Our main contributions lie in discovering subgoal states that efficiently abstract the state space and in proposing a low-level goal-conditioned controller for local navigation. Since the basic low-level skill is learned independently of the state representation, our model easily generalizes to novel environments without intensive relearning. We provide empirical evidence that the proposed method enables agents to perform long-horizon sparse-reward tasks quickly, take detours during barrier tasks, and exploit shortcuts that did not exist during training. Our experiments further show that the proposed method outperforms existing goal-conditioned RL algorithms in reaching distant goals and in policy learning. To evaluate human-like adaptive path-planning, we also compare our optimal agent with human data and find that, on average, the agent finds shorter paths than the human participants.
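
One plausible reading of the entropy-based subgoal criterion can be shown on a toy transition matrix: compute the Successor Representation under the current policy, normalise each state's expected discounted occupancies into a distribution, and rank states by its entropy, so junction-like states surface as subgoal candidates. The incremental, learned version in the paper is more involved.

```python
# Successor-representation entropy on a toy 5-state chain with a junction.
import numpy as np

def successor_representation(P, gamma=0.95):
    """P: (S, S) state-transition matrix under the current policy."""
    S = P.shape[0]
    return np.linalg.inv(np.eye(S) - gamma * P)

def sr_entropy(M):
    """Entropy of each state's normalised expected discounted occupancies."""
    probs = M / M.sum(axis=1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

P = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.25, 0.25, 0.25, 0.25],   # junction state: connects to many others
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5],
])
entropy = sr_entropy(successor_representation(P))
print(np.argsort(entropy)[::-1])     # highest-entropy states are subgoal candidates
```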

Interactive Simulation of Disease Contagion in Dynamic Crowds

We propose an agent-to-agent contagion-immunity formulation that can simulate detailed COVID-19 spread within moving crowds. Specifically, we develop a diffusion-based disease contagion model for discrete systems that considers the effect of health interventions such as social distancing, immunity, and vaccination. We integrate our contagion-immunity formulation with the governing equations of motion for crowd dynamics to investigate the distribution of disease in crowds of different sizes. For the same crowd simulation, our model can interactively simulate virus spread for different initial distributions of infected people. To the best of our knowledge, our work is the first in computer graphics to simulate disease contagion within moving crowds. Our numerical results for the number of infected people in unprotected dense crowds agree with the SIS model, while our model provides richer information about disease spread and shows that vaccination is the best of the considered health interventions for preventing infection.
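
For reference, the classical SIS model that the results are compared against reduces to a two-line update; the rates below are placeholders rather than values from the paper.

```python
# Discrete-time SIS compartmental model (reference baseline, illustrative parameters).
import numpy as np

def sis_step(S, I, beta, gamma, N, dt=1.0):
    """beta: infection rate; gamma: recovery rate (recovered become susceptible again)."""
    new_infections = beta * S * I / N * dt
    recoveries = gamma * I * dt
    return S - new_infections + recoveries, I + new_infections - recoveries

N, I = 1000.0, 5.0
S = N - I
for _ in range(200):
    S, I = sis_step(S, I, beta=0.3, gamma=0.1, N=N)
print(round(I))   # approaches the endemic equilibrium N * (1 - gamma / beta)
```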

PSM: Parametric Saliency Maps for Autonomous Pedestrians

Modeling visual attention is an important aspect of simulating realistic virtual humans. This work proposes a parametric model and method for generating, in real time, saliency maps from the perspective of virtual agents that approximate those produced by vision-based saliency approaches. The model aggregates a saliency score from user-defined parameters for the objects and characters in an agent’s view and uses it to output a 2D saliency map, which can be modulated by an attention field to incorporate 3D information as well as the character’s state of attentiveness. The aggregate, parameterized structure of the method allows the user to model a range of diverse agents, and the model can be expanded with additional layers and parameters. The proposed method can be combined with normative and pathological models of the human visual field and with gaze controllers, such as the recently proposed model of egocentric distractions for casual pedestrians that we use in our results.
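
A minimal sketch of the aggregation step reads as follows: each object's user-defined parameters are combined into a scalar saliency score, which is then splatted into a 2D map over the agent's view. The parameter names, weights and Gaussian splatting are illustrative assumptions, not the paper's exact model.

```python
# Parametric saliency aggregation splatted into a 2D map (illustrative).
import numpy as np

def saliency_score(obj, weights):
    """Weighted sum of user-defined per-object parameters."""
    return sum(weights[k] * obj.get(k, 0.0) for k in weights)

def saliency_map(objects, weights, width=64, height=64, sigma=4.0):
    smap = np.zeros((height, width))
    yy, xx = np.mgrid[0:height, 0:width]
    for obj in objects:
        s = saliency_score(obj, weights)
        x, y = obj["screen_pos"]             # projected position in the agent's view
        smap += s * np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
    return smap

weights = {"motion": 0.5, "proximity": 0.3, "relevance": 0.2}    # hypothetical parameters
objects = [
    {"screen_pos": (16, 20), "motion": 0.9, "proximity": 0.2, "relevance": 0.1},
    {"screen_pos": (48, 40), "motion": 0.1, "proximity": 0.8, "relevance": 0.7},
]
smap = saliency_map(objects, weights)
print(np.unravel_index(smap.argmax(), smap.shape))   # most salient pixel
```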

A2X: An Agent and Environment Interaction Benchmark for Multimodal Human Trajectory Prediction

In recent years, human trajectory prediction (HTP) has garnered attention in the computer vision literature. Although this task has much in common with the longstanding task of crowd simulation, little has been borrowed from crowd simulation, especially in terms of evaluation protocols. The key difference between the two tasks is that HTP is concerned with forecasting multiple steps at a time and with capturing the multimodality of real human trajectories. The majority of HTP models are trained on the same few datasets, which feature small, transient interactions between real people and little to no interaction between people and the environment. Unsurprisingly, when tested on crowd egress scenarios, these models produce erroneous trajectories that accelerate too quickly and collide too frequently, but the metrics used in the HTP literature cannot convey these particular issues. To address these challenges, we propose (1) the A2X dataset, which contains simulated crowd egress and complex navigation scenarios that compensate for the lack of agent-to-environment interaction in existing real datasets, and (2) evaluation metrics that convey model performance with more reliability and nuance. A subset of these metrics are novel multiverse metrics, which are better suited to multimodal models than existing metrics. The dataset is available at: https://mubbasir.github.io/HTP-benchmark/.