MIG '20: Motion, Interaction and Games

SECTION: Session 1: Perception

Does scaling player size skew one’s ability to correctly evaluate object sizes in a virtual environment?

This study evaluates whether a navigation technique based on scaling the user’s avatar impacts the user’s ability to correctly assess the size of virtual objects in a virtual environment. The study was conducted during the CERN Open Days, with data from 177 participants over eighteen years old. We were able to observe well-established phenomena, such as the effect of inter-pupillary distance (IPD) on the perception of scale, as well as original results associated with the scaling factor and avatar embodiment. We observed that users are more prone to overestimate object sizes in the Virtual Environment (VE) when provided with an avatar, while scaling the IPD according to the scale of the user’s avatar reduces this overestimation.
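
For illustration, a minimal sketch (not the authors’ implementation) of the IPD-scaling idea the abstract reports: the rendering inter-pupillary distance is multiplied by the avatar’s scale factor, which the study found reduces size overestimation. The default IPD value and function name below are assumptions.

```python
# Hypothetical sketch: scale the stereo-rendering IPD with the avatar scale.
DEFAULT_IPD_M = 0.063  # assumed average inter-pupillary distance in meters

def scaled_ipd(avatar_scale: float, base_ipd: float = DEFAULT_IPD_M) -> float:
    """IPD to use for stereo rendering when the avatar is scaled.

    avatar_scale > 1 corresponds to a 'giant' avatar; scaling the IPD by the
    same factor keeps the perceived scale of the environment consistent.
    """
    return base_ipd * avatar_scale

print(scaled_ipd(10.0))  # 0.63 m eye separation for a 10x-scaled avatar
```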

Investigating perceptually based models to predict importance of facial blendshapes

Blendshape facial rigs are used extensively in the industry for facial animation of virtual humans. However, storing and manipulating large numbers of facial meshes is costly in terms of memory and computation for gaming applications, yet the relative perceptual importance of blendshapes has not yet been investigated. Research in Psychology and Neuroscience has shown that our brains process faces differently than other objects, so we postulate that the perception of facial expressions will be feature-dependent rather than based purely on the amount of movement required to make the expression. In this paper, we explore the noticeability of blendshapes under different activation levels, and present new perceptually based models to predict perceptual importance of blendshapes. The models predict visibility based on commonly-used geometry and image-based metrics.
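
As a rough illustration of such a predictor (not the paper’s fitted model), a logistic combination of a geometry metric and an image metric could output the probability that a blendshape activation is noticeable; all metric names and weights below are hypothetical.

```python
# Illustrative sketch only: predict blendshape visibility from geometry- and
# image-based metrics (e.g., mean vertex displacement and an image difference).
import math

def predict_visibility(mean_vertex_disp_mm: float,
                       image_diff: float,
                       activation: float,
                       w=(1.2, 3.0, 0.8), bias=-2.5) -> float:
    """Probability that the blendshape change is visible to an observer."""
    z = w[0] * mean_vertex_disp_mm + w[1] * image_diff + w[2] * activation + bias
    return 1.0 / (1.0 + math.exp(-z))

print(predict_visibility(mean_vertex_disp_mm=0.5, image_diff=0.4, activation=0.6))
```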

Eye Thought You Were Sick! Exploring Eye Behaviors for Cybersickness Detection in VR

Cybersickness induced by Virtual Reality (VR) applications is still one of the technology’s main barriers, as it can cause unwanted side effects in players that significantly hinder the overall experience. Despite the wealth of research available on this topic, it remains an unsolved problem. Although previous studies have explored methods of cybersickness mitigation as well as correlations with physiological factors, there has been little research on the potential correlation between eye behavior and cybersickness. Thanks to advances in eye-tracking technology within HMDs, detecting eye behavior has become a much easier process. This paper explores differences in pupil position and blink rate in relation to cybersickness intensity, the latter measured through the standard and a customized version of the Simulator Sickness Questionnaire (SSQ). Data were collected from a total of 34 participants over two separate playing sessions of a VR maze game, where each session presented a unique control scheme.

Performance Is Not Everything: Audio Feedback Preferred Over Visual Feedback for Grasping Task in Virtual Reality

In this work, we investigate the influence that audio and visual feedback have on a manipulation task in virtual reality (VR). Without the tactile feedback of a controller, grasping virtual objects using one’s hands can result in slower interactions because it may be unclear to the user that a grasp has occurred. Providing alternative feedback, such as visual or audio cues, may lead to faster and more precise interactions, but might also affect user preference and perceived ownership of the virtual hands. In this study, we test four feedback conditions for virtual grasping. Three of the conditions provide feedback for when a grasp or release occurs (visual, audio, or both), and one provides no feedback for these occurrences. We analyze the effect each feedback condition has on interaction performance, measure their effect on the perceived ownership of the virtual hands, and gauge user preference. In an experiment, users perform a pick-and-place task under each feedback condition. We found that audio feedback for grasping is preferred over visual feedback even though it seems to decrease grasping performance, and that there was little to no difference in ownership between our conditions.

SECTION: Session 2: Physics

CUDA Deformers for Model Reduction

Real-time deformable object simulation is important in interactive applications such as games and virtual reality. One common approach to achieving speed is to employ model reduction, a technique whereby the equations of motion of a deformable object are projected to a suitable low-dimensional space. Improving the real-time performance of model-reduced systems has been the subject of much research. While modern GPUs play an important role in real-time simulation and parallel computing, existing model reduction systems typically utilize CPUs and seldom employ GPUs. We present a method to efficiently employ GPUs for vertex position computation in model-reduced simulations. Our CUDA-based algorithm gives a substantial speedup over a CPU implementation, thanks to a system architecture that employs a GPU-friendly memory layout, reduces communication between the CPU and GPU, and enables the CPU and GPU to work in parallel.
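
The core per-frame computation that such a system offloads to the GPU is the dense mapping from reduced coordinates back to full-space vertex positions, u = Uq. The sketch below shows that mapping in numpy under assumed dimensions; in the paper this product is performed by a CUDA kernel with a GPU-friendly memory layout.

```python
# Sketch of the reduced-to-full mapping in model reduction (shapes assumed).
import numpy as np

n_vertices, r = 10000, 30                 # full DOFs = 3 * n_vertices, reduced DOFs = r
U = np.random.rand(3 * n_vertices, r)     # reduced basis (precomputed offline)
rest = np.random.rand(3 * n_vertices)     # rest-pose vertex positions

def deformed_positions(q: np.ndarray) -> np.ndarray:
    """Map reduced coordinates q (length r) to world-space vertex positions."""
    return rest + U @ q                   # dense matrix-vector product, u = U q

x = deformed_positions(np.zeros(r))       # zero reduced coordinates -> rest pose
assert np.allclose(x, rest)
```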

ConJac: Large Steps in Dynamic Simulation

We present a new approach that allows large time steps in dynamic simulations. Our approach, ConJac, is based on condensation, a technique for eliminating many degrees of freedom (DOFs) by expressing them in terms of the remaining degrees of freedom. In this work, we choose a subset of nodes to be dynamic nodes, and apply condensation at the velocity level by defining a linear mapping from the velocities of these chosen dynamic DOFs to the velocities of the remaining quasistatic DOFs. We then use this mapping to derive reduced equations of motion involving only the dynamic DOFs. We also derive a novel stabilization term that enables us to use complex nonlinear material models. ConJac remains stable at large time steps, exhibits highly dynamic motion, and displays minimal numerical damping. In marked contrast to subspace approaches, ConJac gives exactly the same configuration as the full space approach once the static state is reached. ConJac works with a wide range of moderate to stiff materials, supports anisotropy and heterogeneity, handles topology changes, and can be combined with existing solvers including rigid body dynamics.
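
A minimal sketch of condensation at the velocity level, as the abstract describes it: quasistatic DOF velocities are obtained from a linear map of the dynamic DOF velocities, so the full velocity vector is assembled from the dynamic velocities alone. The Jacobian below is a random placeholder, not the paper’s derived mapping.

```python
# Illustrative sketch of velocity-level condensation (placeholder Jacobian).
import numpy as np

n_dyn, n_quasi = 6, 12
J = np.random.rand(n_quasi, n_dyn)      # stand-in for the condensation mapping

def full_velocities(v_dyn: np.ndarray) -> np.ndarray:
    """Stack dynamic velocities with the condensed quasistatic velocities."""
    v_quasi = J @ v_dyn                 # quasistatic DOFs follow the dynamic ones
    return np.concatenate([v_dyn, v_quasi])

v = full_velocities(np.ones(n_dyn))
print(v.shape)                          # (18,) = dynamic + quasistatic DOFs
```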

Multi-resolution Clustering for Enhanced Elastic Behavior in Clustered Shape Matching

Clustered shape matching is an approach for physics-based animation of deformable objects that breaks an object into overlapping clusters of particles. At each timestep, it computes a best-fit rigid transformation between a cluster’s rest state and its current particle configuration, and Hookean springs are used to pull particles toward the resulting goal positions. In this paper, we present multi-resolution clustering as an extension to clustered shape matching. We iteratively construct fine-to-coarse sets of clusters and weights over the set of particles and compute dynamics in a single coarse-to-fine pass. We demonstrate that our approach enhances the elastic behavior available to artists and provides an intuitive parameterization to blend between stiffness and deformation richness, which are in contention in the traditional clustered shape matching approach that operates at a single spatial scale. We can specify a different stiffness value for each resolution level, where a greater weight at coarser levels results in a stiffer object while a greater weight at finer levels yields richer deformation; we evaluate a number of approaches for choosing these stiffness values and demonstrate the differences in the accompanying video.
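
For reference, the per-cluster shape-matching step the abstract builds on (standard single-resolution shape matching, not the paper’s multi-resolution extension) can be sketched as follows: compute the best-fit rotation between rest and current particle configurations and form the goal positions that the Hookean springs pull towards.

```python
# Sketch of one cluster's shape-matching step (standard formulation).
import numpy as np

def goal_positions(rest: np.ndarray, current: np.ndarray) -> np.ndarray:
    """rest, current: (n, 3) particle positions of one cluster."""
    rest_c = rest - rest.mean(axis=0)            # rest positions, centered
    cur_c = current - current.mean(axis=0)       # current positions, centered
    a_pq = cur_c.T @ rest_c                      # 3x3 covariance of the cluster
    u, _, vt = np.linalg.svd(a_pq)               # polar decomposition via SVD
    if np.linalg.det(u @ vt) < 0:                # avoid reflections
        u[:, -1] *= -1
    rotation = u @ vt                            # best-fit rotation R
    # Goal = rotated rest shape translated to the current center of mass.
    return rest_c @ rotation.T + current.mean(axis=0)
```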

Towards Animating Virtual Humans in Flooded Environments

The simulation of virtual humans organized in groups and crowds has been widely explored in the literature. Nevertheless, the simulation of virtual humans that interact with fluids is still incipient. Human behavior differs from that of ordinary rigid bodies when affected by fluids: on the one hand, agents can try to keep walking and reach their goals against fluid forces in an attempt to survive; on the other hand, humans can also be completely carried by the fluid, like a passive rigid body, depending on the conditions. A challenge in this area is that virtual agent simulation research often focuses on the realism of trajectories and interaction with the environment, obstacles, and other agents, without considering that agents might enter an environment that, under certain conditions, takes control of their movements and trajectories. With proper integration between agents and fluids, we should be able to simulate agents who continue walking despite an existing fluid (e.g., a weak stream), walk with effort to stay in the desired direction (e.g., a medium stream), or are partially or totally carried by a fluid, like a strong flow of water in a river or the sea. The main contribution of our model is to take a first step toward simulating the steering behaviors of humans in environments with fluids. We integrate two published methodologies with available source code: BioCrowds for the motion of virtual humans and SPlisHSPlasH for fluid dynamics. Results indicate that the proposed approach generates behaviors coherent with the influence of fluids on people in real events, although faithfully reproducing such events is not the objective of this paper, since additional variables would need to be incorporated for serious simulations.
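
A hypothetical sketch of the kind of agent-fluid coupling the abstract motivates: the steering velocity from the crowd model (e.g., BioCrowds) is blended with the local fluid velocity (e.g., from SPlisHSPlasH) according to how strongly the fluid affects the agent. The blending weight and function names are assumptions, not the paper’s formulation.

```python
# Hypothetical blend of crowd steering with local fluid velocity.
import numpy as np

def coupled_velocity(steering_vel: np.ndarray,
                     fluid_vel: np.ndarray,
                     fluid_influence: float) -> np.ndarray:
    """fluid_influence in [0, 1]: 0 = free walking, 1 = fully carried."""
    w = np.clip(fluid_influence, 0.0, 1.0)
    return (1.0 - w) * steering_vel + w * fluid_vel

# A weak stream barely perturbs the agent; a strong flow carries it away.
print(coupled_velocity(np.array([1.0, 0.0]), np.array([0.0, 2.0]), 0.1))
print(coupled_velocity(np.array([1.0, 0.0]), np.array([0.0, 2.0]), 0.9))
```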

SECTION: Session 3: Crowd & Characters

Appearance Controlled Face Texture Generation for Video Game Characters

Manually creating realistic digital human heads is a difficult and time-consuming task for artists. While 3D scanners and photogrammetry allow for quick and automatic reconstruction of heads, finding an actor who fits a specific character appearance description can be difficult. Moreover, modern open-world video games feature several thousand characters that cannot realistically all be cast and scanned. Therefore, researchers are investigating generative models to create heads fitting a specific character appearance description. While current methods are able to generate believable head shapes quite well, generating a corresponding high-resolution, high-quality texture that respects the character’s appearance description is not possible with current state-of-the-art methods.

This work presents a method that generates synthetic face textures under the following constraints: (i) there is no reference photograph from which to build the texture, (ii) game artists control the generative process by providing precise appearance attributes, the face shape, and the character’s age and gender, and (iii) the texture must be of adequately high resolution and look believable when applied to the given face shape. Our method builds upon earlier deep learning approaches addressing similar problems. We propose several key additions to these methods to adapt them to our context, specifically artist control and small training sets. Despite being trained on a limited amount of data, just over 100 samples, our model produces realistic textures that conform to a diverse range of skin, hair, lip, and iris colors specified through our intuitive description format and augmentations thereof.

A Social Distancing Index: Evaluating Navigational Policies on Human Proximity using Crowd Simulations

The importance of social distancing for public health is well established. However, policies and regulations regarding occupancy rates have not been designed with this in mind. While analytical tools and related measures are used in practice to evaluate how the design of a built environment serves the needs of its intended occupants, these metrics do not directly apply to the problem of preventing the spread of infectious diseases such as COVID-19. Using a crowd simulator with three levels of behavior and agent control in a given environment, we compute a novel evaluation metric for a space layout that reflects how well a safe distance can be maintained throughout the shopping experience. We refer to this metric as the Social Distancing Index (SDI); it accounts for occupancy throughput and the number of distance-based violations found. Through a case study of a realistic retail store, we demonstrate the proposed platform’s performance and output on multiple scenarios by varying agent behavior, occupancy rate, and navigational guidelines.
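
As a rough illustration (the paper’s exact SDI formulation is not reproduced here), such a metric could combine occupancy throughput with the number of pairwise distance violations observed during a simulation run:

```python
# Illustrative sketch of a distancing metric; the combination rule is assumed.
import itertools
import math

def count_violations(positions, min_distance: float = 2.0) -> int:
    """Count agent pairs closer than the required social distance."""
    return sum(1 for (ax, ay), (bx, by) in itertools.combinations(positions, 2)
               if math.hypot(ax - bx, ay - by) < min_distance)

def social_distancing_index(completed_agents: int, violations: int) -> float:
    """Higher is better: more throughput with fewer violations (assumed form)."""
    return completed_agents / (1.0 + violations)

positions = [(0.0, 0.0), (1.0, 0.5), (5.0, 5.0)]
print(count_violations(positions))                       # 1 pair closer than 2 m
print(social_distancing_index(completed_agents=120, violations=15))
```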

Extreme-Density Crowd Simulation: Combining Agents with Smoothed Particle Hydrodynamics

In highly dense crowds of humans, collisions between people occur often. It is common to simulate such a crowd as one fluid-like entity (macroscopic), and not as a set of individuals (microscopic, agent-based). Agent-based simulations are preferred for lower densities because they preserve the properties of individual people. However, their collision handling is too simplistic for extreme-density crowds. Therefore, neither paradigm is ideal for all possible densities.

In this paper, we combine agent-based crowd simulation with the concept of Smoothed Particle Hydrodynamics (SPH), a particle-based method that is popular for fluid simulation. Our combination augments the usual agent-collision handling with fluid dynamics when the crowd density is sufficiently high. A novel component of our method is a dynamic rest density per agent, which intuitively controls the crowd density that an agent is willing to accept.

Experiments show that SPH improves agent-based simulation in several ways: better stability at high densities, more intuitive control over the crowd density, and easier replication of wave-propagation effects. Our implementation can simulate tens of thousands of agents in real-time. As such, this work successfully prepares the agent-based paradigm for crowd simulation at all densities.
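
For reference, a sketch of the standard SPH density estimate that such a coupling builds on, together with the role of a per-agent rest density as a threshold for adding fluid-like forces; the kernel choice and constants below are illustrative, not the paper’s.

```python
# Sketch: SPH density estimate over neighbouring agents (poly6-style kernel).
import numpy as np

def sph_density(agent_pos, positions, h=1.0, mass=1.0):
    """Estimate local crowd density at an agent; positions includes the agent."""
    r2 = np.sum((positions - agent_pos) ** 2, axis=1)
    w = np.where(r2 < h * h, (h * h - r2) ** 3, 0.0)
    return mass * (315.0 / (64.0 * np.pi * h ** 9)) * w.sum()

def needs_fluid_term(density, rest_density):
    """Add SPH pressure forces only once the agent's accepted density is exceeded."""
    return density > rest_density

positions = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.4]])
rho = sph_density(positions[0], positions, h=1.0)
print(rho, needs_fluid_term(rho, rest_density=2.0))
```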

Watch Out! Modelling Pedestrians with Egocentric Distractions

The use of mobile devices is one of the most commonly observed distracted behaviours exhibited by pedestrians in urban environments. We develop an event-driven behaviour tree model for distracted pedestrians that includes initiating mobile device use as well as terminating or pausing it based on internal or external cues to refocus attention. We present a simple, probabilistic attention model for such pedestrians. The proposed model is not meant to be complete; it primarily focuses on computing the probability that a distracted agent looks up, based on the agent’s individual characteristics and the elements in their environment. We condition the potentially attention-grabbing elements in the environment on distraction-specific egocentric fields for visual attention. We also propose an oriented ellipse model for capturing the effects of cognitively fuzzy goals during distracted navigation. Our model is simple and intuitively parameterized, and can thus be easily edited and extended.
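
A hypothetical sketch of such a probabilistic look-up test: each attention-grabbing element inside the agent’s distraction-specific egocentric field contributes to the chance that the distracted pedestrian refocuses during the current step. Names, weights, and the combination rule are assumptions, not the paper’s model.

```python
# Hypothetical per-step "look up" probability for a distracted pedestrian.
import random

def prob_look_up(stimuli_saliences,
                 base_rate: float = 0.01,
                 attentiveness: float = 0.5) -> float:
    """Combine independent chances from a base rate and each visible stimulus."""
    p_ignore_all = 1.0 - base_rate
    for s in stimuli_saliences:
        p_ignore_all *= 1.0 - min(1.0, attentiveness * s)
    return 1.0 - p_ignore_all

def looks_up(stimuli_saliences, rng=random.random, **kw) -> bool:
    return rng() < prob_look_up(stimuli_saliences, **kw)

print(prob_look_up([0.2, 0.6]))   # two stimuli inside the egocentric field
```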

SECTION: Session 4: Machine Learning

Adult2child: Motion Style Transfer using CycleGANs

Child characters are commonly seen in leading roles in top-selling video games. Previous studies have shown that child motions are perceptually and stylistically different from those of adults. Creating motion for these characters by motion-capturing children is uniquely challenging because of confusion, lack of patience, and regulations. Retargeting adult motion, which is much easier to record, onto child skeletons does not capture the stylistic differences. In this paper, we propose that style translation is an effective way to transform adult motion capture data to the style of child motion. Our method is based on CycleGAN, which allows training on a relatively small number of child and adult motion sequences that do not even need to be temporally aligned. Our adult2child network converts short sequences of motion, called motion words, from one domain to the other. The network was trained on a motion capture database collected by our team containing 23 locomotion and exercise motions. We conducted a perception study to evaluate the success of style translation algorithms, including our algorithm and recently presented style translation neural networks. Results show that the translated adult motions are recognized as child motions significantly more often than adult motions.
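
As an illustration of the “motion word” preprocessing the abstract mentions (window length and overlap are assumptions, not the paper’s settings), a motion clip can be cut into short, fixed-length windows that the network then translates between domains:

```python
# Sketch: split a motion clip into overlapping "motion words" for translation.
import numpy as np

def to_motion_words(clip: np.ndarray, length: int = 30, stride: int = 15):
    """clip: (n_frames, n_features) -> list of (length, n_features) windows."""
    return [clip[i:i + length]
            for i in range(0, clip.shape[0] - length + 1, stride)]

clip = np.random.rand(300, 69)        # e.g. 300 frames, 23 joints x 3 channels
words = to_motion_words(clip)
print(len(words), words[0].shape)     # 19 windows of shape (30, 69)
```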

SPNets: Human-like Navigation Behaviors with Uncertain Goals

Most path planning techniques use exact, global information about the environment to make optimal or near-optimal plans. In contrast, humans navigate using only local information, which they must augment with their understanding of typical building layouts to guess what lies ahead, while integrating what they have seen already to form mental representations of building structure. Here, we propose Scene Planning Networks (SPNets), a neural network based approach for formulating the long-range navigation problem as a series of local decisions similar to what humans face when navigating. Agents navigating using SPNets build additive neural representations of previous observations to understand local obstacle structure, and use a network-based planning approach to plan the next steps towards a fuzzy goal region. Our approach reproduces several important aspects of human behavior that are not captured by either full global planning or simple local heuristics.

Deep Integration of Physical Humanoid Control and Crowd Navigation

Many multi-agent navigation approaches use simplified agent representations such as disks. These simplifications allow for fast simulation of thousands of agents but limit simulation accuracy and fidelity. In this paper, we propose a fully integrated physical character control and multi-agent navigation method. In place of sample-complex online planning methods, we extend recent deep reinforcement learning techniques, improving on multi-agent navigation models and simulated humanoids by combining Multi-Agent and Hierarchical Reinforcement Learning. We train a single short-term goal-conditioned low-level policy to provide directed walking behaviour. This task-agnostic controller can be shared by higher-level policies that perform longer-term planning. The proposed approach produces reciprocal collision avoidance, robust navigation, and emergent crowd behaviours. Furthermore, it offers several key affordances not previously possible in multi-agent navigation, including tunable character morphology and physically accurate interactions with agents and the environment. Our results show that the proposed method outperforms prior methods across environments and tasks, while also performing well in terms of zero-shot generalization over different numbers of agents and in computation time.
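
A minimal sketch of the two-level control structure described above, with both policies stubbed out; in the paper they are neural networks trained with deep reinforcement learning.

```python
# Sketch: a high-level navigation policy feeds short-term goals to a shared,
# goal-conditioned low-level walking controller (both stubbed out here).
import numpy as np

def high_level_policy(nav_observation):
    """Longer-horizon navigation policy: outputs a short-term goal direction."""
    return np.array([1.0, 0.0])                      # placeholder decision

def low_level_policy(proprioception, goal_direction):
    """Goal-conditioned walking controller shared by all high-level tasks."""
    return 0.1 * np.random.randn(20)                 # placeholder joint actions

def control_step(nav_observation, proprioception):
    goal = high_level_policy(nav_observation)        # crowd-level decision
    return low_level_policy(proprioception, goal)    # physical humanoid action

print(control_step(nav_observation=None, proprioception=np.zeros(50)).shape)
```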

Learning to Locomote: Understanding How Environment Design Matters for Deep Reinforcement Learning

Learning to locomote is one of the most common tasks in physics-based animation and deep reinforcement learning (RL). A learned policy is the product of the problem to be solved, as embodied by the RL environment, and the RL algorithm. While enormous attention has been devoted to RL algorithms, much less is known about the impact of design choices for the RL environment. In this paper, we show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results. Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits. We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.
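
To make the design space concrete, the choices the paper examines can be gathered into a single environment configuration; the field names and defaults below are illustrative, not the paper’s recommended settings.

```python
# Sketch: environment design choices collected into one configuration object.
from dataclasses import dataclass

@dataclass
class LocomotionEnvConfig:
    state_representation: str = "joint_angles"            # vs. task-relative features
    initial_state_distribution: str = "reference_states"  # vs. a fixed default pose
    reward_structure: str = "imitation_plus_task"
    control_frequency_hz: int = 30
    episode_termination: str = "early_on_fall"             # vs. fixed-length episodes
    use_curriculum: bool = False
    action_space: str = "pd_targets"                       # vs. raw torques
    torque_limit_scale: float = 1.0

config = LocomotionEnvConfig(control_frequency_hz=60, use_curriculum=True)
print(config)
```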

SECTION: Session 5: Games & Storytelling

Collaborative Storytelling with Large-scale Neural Language Models

Storytelling plays a central role in human socializing and entertainment. However, much of the research on automatic story generation assumes that stories will be generated by an agent without any human interaction. In this paper, we introduce the task of collaborative storytelling, where an artificial intelligence agent and a person collaborate to create a unique story by taking turns adding to it. We present a collaborative storytelling system that works with a human storyteller to create a story by generating new utterances based on the story so far. We constructed the storytelling system by tuning a publicly available large-scale language model on a dataset of writing prompts and their accompanying fictional works. We identify generating sufficiently human-like utterances as an important technical issue and propose a sample-and-rank approach to improve utterance quality. Quantitative evaluation shows that our approach outperforms a baseline, and we present a qualitative evaluation of our system’s capabilities.
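
A generic sketch of a sample-and-rank loop of the kind proposed: several candidate utterances are sampled from the language model and the highest-scoring one is kept. The sampling and scoring functions below are stand-ins, not the system’s actual components.

```python
# Sketch: generic sample-and-rank selection of the next story utterance.
import random

def sample_candidates(story_so_far: str, n: int = 8):
    """Stand-in for sampling n continuations from a tuned language model."""
    return [f"candidate continuation {i}" for i in range(n)]

def rank_score(story_so_far: str, candidate: str) -> float:
    """Stand-in for a scorer that prefers human-like, coherent utterances."""
    return random.random()

def next_utterance(story_so_far: str) -> str:
    candidates = sample_candidates(story_so_far)
    return max(candidates, key=lambda c: rank_score(story_so_far, c))

print(next_utterance("Once upon a time..."))
```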

An interactive staging-and-shooting solver for virtual cinematography

Research in virtual cinematography often narrows the problem down to computing the optimal viewpoint for the camera to properly convey a scene’s content. In contrast, we propose to address simultaneously the questions of placing cameras, lights, objects, and actors in a virtual environment through a high-level specification. We build on a staging language and extend it by defining complex temporal relationships between these entities. We solve such specifications by designing pruning operators that iteratively reduce the range of possible degrees of freedom for entities while satisfying temporal constraints. Our solver first decomposes the problem by analyzing the graph of relationships between entities and then solves an ordered sequence of sub-problems. Users can manipulate the current result for fine-tuning purposes or creatively explore ranges of solutions while maintaining the relationships. As a result, the proposed system is the first staging-and-shooting cinematography system that enables the specification and solving of spatio-temporal cinematic layouts.

Topology-aware Camera Control for Real-time Applications

Placing and moving virtual cameras in real-time 3D environments remains a complex task due to the many requirements that need to be satisfied simultaneously. Beyond the essential features of ensuring visibility and frame composition for one or multiple targets, an ideal camera system should provide designers with tools to create variations in camera placement and motion, and to create shots that conform to aesthetic recommendations. In this paper, we propose a controllable process that assists developers and artists in placing cinematographic cameras and camera paths throughout complex virtual environments, a task that until now has often been performed manually. With no specification and no prior knowledge of the events, our tool exploits a topological analysis of the environment to capture the potential movements of the agents, highlight linearities, and create an abstract skeletal representation of the environment. This representation is then exploited to automatically generate potentially relevant camera positions and trajectories, organized in a graph representation with visibility information. At run-time, the system can efficiently select appropriate cameras and trajectories according to artistic recommendations. We demonstrate the features of the proposed system with realistic game-like environments, highlighting its capacity to analyze a complex environment, generate relevant camera positions and camera tracks, and run efficiently with a range of different camera behaviours.

Foldit Drug Design Game Usability Study: Comparison of Citizen and Expert Scientists

In building a new drug design mode for the popular citizen science game Foldit, we focus on creating an easy-to-use and intuitive interface to convey complex scientific concepts to citizen scientist players. We hypothesize that, to be efficient in the hands of citizen scientists, such an interface will look different from well-established drug-design software used by experts. We used the relaxed think-aloud method to compare citizen and expert scientists working with our prototype interface for Foldit Drug Design Mode (FDDM). First, we tested whether the two groups provide different feedback on the usability of the prototype interface. Second, we investigated how the difference between the two groups might inform a new game design. As expected, the results confirm that expert scientists differ from citizen scientists in engaging their background knowledge when interacting with the game. We then provide a prioritized list of the background knowledge employed by the expert scientists to derive design suggestions for FDDM.