MIG '19: Motion, Interaction and Games

Full Citation in the ACM Digital Library

SESSION: Deep Learning for Posture, Gesture and Gait

DSPP: Deep Shape and Pose Priors of Humans

Prior knowledge of real human body shapes and poses is fundamental in computer games and animation (e.g., performance capture). Linear subspaces such as the popular SMPL model have limited capacity to represent the large geometric variations of human shapes and poses. Worse, random sampling from them often produces unrealistic humans, because the distribution of real humans tends to concentrate on a non-linear manifold rather than filling the full subspace. To address this problem, we propose to learn human shape and pose manifolds using a more powerful deep generator network, trained to produce samples that cannot be distinguished from real humans by a deep discriminator network. In contrast to previous work that learns both the generator and the discriminator in the original geometry spaces, we learn them in the more representative latent spaces discovered by a shape auto-encoder and a pose auto-encoder network, respectively. Random sampling from our priors produces higher-quality human shapes and poses, making them well suited to applications such as virtual human synthesis in games.
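
As a rough illustration of the idea (not the authors' implementation), the following PyTorch sketch trains a generator/discriminator pair directly on latent codes produced by a pretrained auto-encoder; all layer sizes, learning rates, and names are assumptions.

    import torch
    import torch.nn as nn

    LATENT = 32   # assumed size of the auto-encoder's latent space

    G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, LATENT))
    D = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real_codes):  # real_codes = encoder(real bodies), precomputed
        z = torch.randn(real_codes.size(0), 64)
        fake = G(z)
        # discriminator: separate real latent codes from generated ones
        loss_d = (bce(D(real_codes), torch.ones(len(real_codes), 1))
                  + bce(D(fake.detach()), torch.zeros(len(fake), 1)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # generator: produce codes the discriminator accepts as real
        loss_g = bce(D(fake), torch.ones(len(fake), 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # sampling a new body would decode a generated code:
    # mesh = decoder(G(torch.randn(1, 64)))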

Natural Posture Blending Using Deep Neural Networks

Motion synthesis approaches are widely used across domains such as gaming, virtual crowds, and simulation in production industries. With ongoing digitization, these systems are becoming increasingly indispensable. In general, the underlying technologies can be subdivided into data-driven and model-based approaches, each category with its own advantages and disadvantages. In data-driven motion synthesis, recent works present deep learning based approaches for full-body motion synthesis, which offer great potential for modeling natural motions while considering heterogeneous influence factors. In this paper, we propose a novel deep blending approach for generating collision-free, feasible posture blends between a humanoid start posture and a target posture. The network was trained on the CMU database to generate feasible postures. The proposed approach can be utilized for posture blending, motion synthesis with known start and end postures, or key-frame animation. A preliminary evaluation indicates the validity and the potential of the novel approach.

Multi-objective adversarial gesture generation

Applications for conversational virtual agents are on the rise, but producing realistic non-verbal behavior for spoken utterances remains an unsolved problem. We explore the use of a generative adversarial training paradigm to map speech to 3D gesture motion. We define the gesture generation problem as a series of smaller sub-problems, including plausible gesture dynamics, realistic joint configurations, and diverse and smooth motion. Each sub-problem is monitored by a separate adversary. To enforce realistic gesture dynamics in our output, we train a classifier to automatically detect gesture phases. We find adversarial training to be superior to a standard regression loss and discuss the benefit of each of our training objectives. We recorded a dataset of over 6 hours of natural, unrehearsed speech with high-quality motion capture, as well as audio and video recordings.

Data-driven Gaze Animation using Recurrent Neural Networks

We present a data-driven gaze animation method using recurrent neural networks. The neural network is trained with motion capture data covering different poses such as standing, sitting, and lying down, and is able to learn the constraints associated with each particular pose. A simplified version of the neural network is also presented for Level of Detail (LOD) animation. We compare various neural network architectures and show that our method produces natural gaze motion in real time. Results from a user study conducted among game industry professionals show that our method has better perceived naturalness than the procedural gaze animation system of a well-known game company. Our approach is the first to demonstrate the feasibility of synthesizing gaze motion with deep neural networks.

SESSION: Game and VR

A Mobile Game for Crowdsourced Molecular Docking Pathways

Mobile gaming has become a popular pastime in recent years, making it a viable avenue for crowdsourced data collection with scientific games. We present one such application of scientific games on mobile devices by adapting an existing molecular docking game with a user interface suitable for this platform. In this initial study, players explore the state space of molecular interactions, and data is collected for use in molecular motion planning. The results were compared to states collected from an automated Gaussian sampler commonly used in motion planning. Players were able to contribute states that could aid planners in finding molecular motion pathways with energies lower than the automated sampler's. However, challenges remain in players' ability to reach states in difficult regions, owing to the lack of molecular flexibility and of guidance that rewards exploration over simply finding the lowest-energy state.

A Creative Game Design and Programming App

We present a game creation app for tablets that builds on the popularity of video games while focusing attention on creativity and problem solving. With our app, users design and build a game by first drawing characters and objects on paper with markers and crayons, and then automatically integrating them into the app. An event-based visual programming language allows users to program the game logic. In the spirit of creative play, users can jump between the design, programming, and test phases at any point to realize their ideas. We evaluate our app with a user study to understand how gender and the use of self-made drawings influence the type of games users create and their state of flow during the process. Our results show that letting users draw their own game elements can lead to higher engagement. We also show that girls tend to spend more time programming and less time testing compared to boys, and that our app can help girls gain self-confidence.

The Case for Haptic Props: Shape, Weight and Vibro-tactile Feedback

The use of haptic props in a virtual environment setting is purported to improve both user immersion and task performance. While the efficacy of various forms of haptics has been tested through user experiments, this is not the case for hand-held tool props, an important class of input device with both gaming and non-gaming applications. From a cost and implementation-complexity perspective, it is also worth investigating the relative benefits of the different types of passive and active haptics that can be incorporated into such props.

Accordingly, in this paper we present the results of a quantitative user experiment (n = 42) designed to assess a typical VR controller against passive, weighted, and active-haptic versions of a tracked prop, measured according to game experience, performance, and stance adopted by participants. The task involved playing a VR baseball game and the prop was a truncated baseball bat.

We found a statistically significant improvement (at α = 0.05) with medium to large effect size (r > 0.38) for certain aspects of game experience (competence, immersion, flow, positive affect), performance (mean hit distance) and pose (two-handed grip) for the weighted prop over a generic controller, and in many cases over the unweighted passive prop as well. There was no significant difference between our weighted prop and the active-haptic version. This suggests that, for batting and striking tasks, tool props with passive haptics improve user experience and task performance but only if they match the weight of the original real-world tool, and that such weighting is more important than simple vibro-tactile style force-feedback.

Elicitation Study of Body Gestures for Locomotion in HMD-VR Interfaces in a Sitting-Position

Proxy gestures have proven to be a powerful tool and are widely used for desktop- and smartphone-based user interactions. However, when used for virtual travel in virtual environments (VEs), specific limitations arise concerning the gestures' naturalness and intuitiveness and the fatigue involved in performing them in real space. Proxy gestures must be natural, demand little effort, and support travel over long virtual distances without colliding with real-world boundaries. In this paper, we present a gesture elicitation study to find the most natural and intuitive body gestures for virtual travel in three different VEs in a sitting position. The paper covers two experiments. In experiment 1, we extract the most natural and intuitive gestures by asking participants to perform gestures for virtual travel in three different VEs, including selection and manipulation of objects at different difficulty levels. In experiment 2, a new group of 40 participants evaluated the extracted gestures based on appropriateness, ease of use, effort, and user preference. We identified the leaning gesture as suitable for virtual travel in VE1 and VE3, and the pointing gesture as suitable for VE2. We discuss the results and qualitative findings of both experiments.

SESSION: Animation Capture and Authoring

Robust Marker Trajectory Repair for MOCAP using Kinematic Reference

Processing motion capture data from optical markers for use in computer animations presents numerous technical challenges. Artifacts caused by noise, marker swaps, and marker occlusions often require manual intervention by a professionally trained marker tracking artist, who spends large amounts of time and effort fixing these issues. Existing automatic solutions that attempt to fix marker data lack robustness, either failing to properly detect and fix marker paths or generating solutions that are difficult to integrate into current animation pipelines. In this paper, we present a method that robustly identifies invalid marker paths, removes the associated segments, and generates new, kinematically correct paths. We start by comparing the kinematic solutions generated by commercial software against those generated by state-of-the-art methods, using this information to determine which animation keyframes are invalid. Subsequently, we regenerate marker paths with the neural-network-based method of [Holden 2018] and use a sophisticated marker-filling algorithm to combine them with the original marker paths in sections where we detect the original data to be invalid. Our method outperforms alternatives by generating solutions that are both closer to the ground truth and more robust, allowing for manual intervention if required.
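
A minimal numpy sketch of the detection step described above, assuming per-frame joint positions from both kinematic solves are available; the tolerance value is an illustrative assumption:

    import numpy as np

    def invalid_keyframes(pos_solver_a, pos_solver_b, tol_mm=15.0):
        """pos_*: (frames, joints, 3) joint positions from the two kinematic
        solves; returns indices of frames whose paths should be regenerated."""
        err = np.linalg.norm(pos_solver_a - pos_solver_b, axis=-1)  # (frames, joints)
        return np.where(err.max(axis=1) > tol_mm)[0]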

Spatial Motion Doodles: Sketching Animation in VR Using Hand Gestures and Laban Motion Analysis

We present a method for easily drafting expressive character animation by playing with instrumented rigid objects. We parse the input 6D trajectories (position and orientation over time) – called spatial motion doodles – into sequences of actions and convert them into detailed character animations using a dataset of parameterized motion clips, which are automatically fitted to the doodles in terms of global trajectory and timing. Moreover, we capture the expressiveness of the user's manipulation by analyzing Laban effort qualities in the input spatial motion doodles and transferring them to the synthetic motions we generate. We validate the ease of use of our system and the expressiveness of the resulting animations through a series of user studies, demonstrating the value of our approach for interactive digital storytelling applications dedicated to children and non-expert users, as well as for providing fast drafting tools for animators.

Parameterized Animated Activities

This work addresses the development of a character animation editing method that accommodates animation changes while preserving the animator’s original artistic intent. Our goal is to give the artist control over the automatic editing of animations by extending them with artist-defined metadata. We propose a metadata representation that describes which aspects of an animation can be varied. To make the authoring process easier, we have developed an interface for specifying the metadata. Our method extracts a collection of trajectories of both effectors and objects for the animation. We approximate and parameterize the trajectories with a series of cubic Bézier curves. Then, we generate a set of high-level parameters for editing which are related to trajectory deformations. The only possible deformations are those that preserve the fine structure of the original motion. From the trajectories, we use inverse kinematics to generate a new animation that conforms to the user’s edits while preserving the overall character of the original.
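
To illustrate the trajectory parameterization step, here is a hedged numpy sketch that fits one cubic Bézier segment to sampled trajectory points by least squares, pinning the endpoints to the data; the chord-length parameterization and single-segment scope are assumptions, as the paper fits a series of such curves:

    import numpy as np

    def fit_cubic_bezier(points):
        points = np.asarray(points, dtype=float)              # (n, dim)
        d = np.linalg.norm(np.diff(points, axis=0), axis=1)
        t = np.concatenate([[0.0], np.cumsum(d)]) / d.sum()   # chord-length params
        p0, p3 = points[0], points[-1]
        b0, b1 = (1 - t) ** 3, 3 * t * (1 - t) ** 2
        b2, b3 = 3 * t ** 2 * (1 - t), t ** 3
        # solve least squares for the two interior control points only
        rhs = points - np.outer(b0, p0) - np.outer(b3, p3)
        A = np.stack([b1, b2], axis=1)                        # (n, 2)
        ctrl, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        return np.stack([p0, ctrl[0], ctrl[1], p3])           # 4 control points

    def eval_bezier(ctrl, t):
        t = np.asarray(t)[:, None]
        return ((1 - t) ** 3 * ctrl[0] + 3 * t * (1 - t) ** 2 * ctrl[1]
                + 3 * t ** 2 * (1 - t) * ctrl[2] + t ** 3 * ctrl[3])

Editing then amounts to moving the interior control points, which deforms the trajectory while keeping its endpoints and overall shape intact.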

An automatic tool to facilitate authoring animation blending in game engines

Achieving realistic virtual humans is crucial in virtual reality applications and video games. Nowadays there are software and game development tools that are of great help in generating and simulating characters. They offer easy-to-use GUIs for creating characters by dragging and dropping features and making small modifications. Similarly, there are tools for creating animation graphs, setting blending parameters, and so on. Unfortunately, even though these tools are relatively user friendly, achieving natural animation transitions is not straightforward, and thus non-expert users tend to spend large amounts of time generating animations that are not completely free of artefacts. In this paper we present a method to automatically generate animation blend spaces in Unreal Engine, which offers two advantages: first, it provides a tool to evaluate the quality of an animation set; second, the resulting graph does not depend on user skills and is thus not prone to user errors.

SESSION: Perception

Social presence and place illusion are affected by photorealism in embodied VR

Photorealism of virtual characters and environments is becoming more achievable in Virtual Reality (VR). With this development comes the need for further investigation into the role it plays in people's responses to characters. Whether these improvements make any difference to the perception of, and response towards, a virtual character was the central question of the present study. To evaluate this, we designed a within-subjects experiment in which participants were embodied in a high-fidelity virtual body in VR and observed an animated character rendered in a photorealistic and in a simplified style. The character displayed simple interactive behaviour with the participant (eye-gaze) and was designed to express an emotional reaction in order to induce an empathetic response in participants. Our goal was to evaluate whether photorealism alone is enough to increase self-reported and behavioural signs (interpersonal distance, or proximity) of social presence, place illusion, and empathetic concern for the character in virtual reality. This was found to be the case for self-reported social presence and place illusion, while empathetic concern depended on the order of conditions. The behavioural measure of proximity was not affected by render style.

The Effect of Multimodal Emotional Expression and Agent Appearance on Trust in Human-Agent Interaction

Emotional expressivity can boost trust in human-human and human-machine interaction. Emotion expression is a multimodal phenomenon, and previous research argued that a mismatch between expressive channels provides evidence of joint audio-video emotional processing. However, while previous work studied this from the point of view of emotion recognition and processing, little is known about the effect a multimodal agent has on a human-agent interaction task. Agent appearance could influence this interaction too. Here we manipulated the agent's multimodal emotional expression ("smiling face", "smiling voice", or both) and agent type (photorealistic or cartoon-like virtual human) and assessed people's trust toward this agent. We measured trust using a mixed-methods approach, combining behavioural data from a survival task, questionnaire ratings, and qualitative comments. These methods gave different results: while people commented on the importance of emotional expressivity in the agent's voice, this factor had limited influence on trusting behaviours; and while people rated the cartoon-like agent higher than the photorealistic one on several traits, the agent's style was likewise not the most influential feature for people's trusting behaviour. These results highlight the contribution of a mixed-methods approach in human-machine interaction, as both explicit and implicit perception and behaviour contribute to the success of the interaction.

Identifying Indoor Navigation Landmarks Using a Hierarchical Multi-Criteria Decision Framework

Landmarks play a vital role in human wayfinding by providing the structure for mental spatial representations and indicating locations with which to orient. Despite growing interest in indoor navigation in the scientific community, relatively little research effort has been devoted to automated landmark identification in indoor environments. In this paper, we propose a computational framework for identifying indoor landmarks that is based on a hierarchical multi-criteria decision model and grounded in theories of spatial cognition and human information processing. Our model of landmark salience is represented as a hierarchical integration process of low-level features derived from a three-part, higher-level salience vector (i.e., cognitive, spatial, and subjective salience). We use a fuzzy hierarchical composite-weighted (objective and subjective) Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to derive rankings for identified objects at decision points (i.e., intersections). The top N objects are then selected and compared to a list of landmarks derived from an eye-tracking-based virtual reality (VR) experiment. A substantial overlap of 79% was observed between these two lists. The proposed framework is capable of reliably and accurately detecting indoor landmarks, which can be employed in the development of landmark-based robot/autonomous agent motion and indoor guidance systems.
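
For readers unfamiliar with TOPSIS, the following numpy sketch shows the core (crisp, non-fuzzy) ranking step over candidate objects at a decision point; the criteria, weights, and benefit flags are illustrative, and the paper's fuzzy composite weighting adds steps not shown here:

    import numpy as np

    def topsis(X, w, benefit):
        """X: (n_objects, n_criteria) salience scores; w: weights summing to 1;
        benefit: True where larger is better. Returns closeness in [0, 1]."""
        R = X / np.linalg.norm(X, axis=0)           # vector-normalize each criterion
        V = R * w
        ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
        anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
        d_pos = np.linalg.norm(V - ideal, axis=1)
        d_neg = np.linalg.norm(V - anti, axis=1)
        return d_neg / (d_pos + d_neg)              # rank objects by this score

    # e.g., three criteria: cognitive, spatial, subjective salience
    scores = topsis(np.random.rand(8, 3),
                    np.array([0.4, 0.4, 0.2]),
                    np.array([True, True, True]))
    top_n = np.argsort(scores)[::-1][:3]            # the top-N candidate landmarks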

SESSION: Sound

Animation Synthesis Triggered by Vocal Mimics

We propose a method that leverages the naturally time-related expressivity of the voice to control an animation composed of a set of short events. Users record themselves mimicking onomatopoeic sounds such as "Tick", "Pop", or "Chhh", each associated with a specific animation event. The recorded soundtrack is automatically analyzed to extract the time and type of every sound. We then synthesize an animation in which each event's type and timing correspond to the soundtrack. In addition to being a natural way to control animation timing, we demonstrate that multiple stories can be efficiently generated by recording different voice sequences. The use of more than one soundtrack also allows us to control different characters with overlapping actions.
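
As a sketch of the analysis step, a simple energy-based onset detector (numpy) can time-stamp the recorded sounds; classifying which onomatopoeia was uttered would require a separate classifier, and all constants below are assumptions rather than the authors' method:

    import numpy as np

    def onsets(signal, sr, win=1024, hop=512, k=3.0):
        """Return onset times (seconds) where frame energy rises above a
        threshold of mean + k standard deviations."""
        signal = np.asarray(signal, dtype=float)
        frames = np.lib.stride_tricks.sliding_window_view(signal, win)[::hop]
        energy = (frames ** 2).mean(axis=1)
        thresh = energy.mean() + k * energy.std()
        rising = np.where((energy[1:] > thresh) & (energy[:-1] <= thresh))[0] + 1
        return rising * hop / sr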

Procedural Sound Generation for Soft Bodies in Video Games

We propose a real-time, data-driven method to synthesize the sound of typical soft bodies (such as cloth and rope) in video games. For a given soft body, we perform a geometric analysis of its shape at each frame. These data are used to calculate a variety of motion events at run time that would produce sounds. We then record a database of sounds, which contains sequences of segmented sound units. Finally, we use a concatenative synthesis method to select and synthesize the actual soft-body sounds according to the extracted motion signals. Our approach is more computationally efficient compared to existing soft-body sound synthesis methods and is compatible with any particle-based soft-body physics in video games. We implement our method in Unreal Engine 4 and demonstrate its efficiency with several examples.

Automatic Sign Dance Synthesis from Gesture-based Sign Language

Automatic dance synthesis has become increasingly popular due to growing demand in computer games and animation. Existing research generates dance motions without much consideration of the context of the music. In reality, professional dancers choreograph according to the lyrics and musical features. In this research, we focus on a particular genre of dance known as sign dance, which combines gesture-based sign language with full-body dance motion. We propose a system to automatically generate sign dance from a piece of music and its corresponding sign gestures. The core of the system is a Sign Dance Model, trained by multiple regression analysis to represent the correlations between sign dance and sign gesture/music, together with a set of objective functions to evaluate the quality of the sign dance. Our system can be applied to music visualization, allowing people with hearing difficulties to understand and enjoy music.

SESSION: Learning to Move

On Learning Symmetric Locomotion

Human and animal gaits are often symmetric in nature, which points to the use of motion symmetry as a potentially useful source of structure that can be exploited for learning. By encouraging symmetric motion, the learning may be faster, converge to more efficient solutions, and be more aesthetically pleasing. We describe, compare, and evaluate four practical methods for encouraging motion symmetry. These are implemented via particular choices of structure for the policy network, data duplication, or via the loss function. We experimentally evaluate the methods in terms of learning performance and achieved symmetry, and provide summary guidelines for the choice of symmetry method. We further describe some practical and conceptual issues that arise. Because similar implementation choices exist for other types of inductive biases, the insights gained may also be relevant to other learning problems with applicable symmetry abstractions.
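
One of the four options named above, enforcing symmetry via the loss function, can be sketched as follows (PyTorch); the mirroring matrices here are toy placeholders for the task-specific left/right permutation-and-sign maps:

    import torch
    import torch.nn as nn

    # toy mirroring maps for a 4-D state / 2-D action: swap left/right channels
    M_S = torch.eye(4)[[2, 3, 0, 1]]   # state mirror
    M_A = torch.eye(2)[[1, 0]]         # action mirror

    policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

    def symmetry_loss(states):
        # a symmetric policy commutes with mirroring: pi(M_s s) = M_a pi(s)
        return ((policy(states @ M_S.T) - policy(states) @ M_A.T) ** 2).mean()

    # training would add w_sym * symmetry_loss(batch_states) to the RL objective

The data-duplication alternative instead augments each rollout with its mirrored copy, and the architectural alternative builds the constraint into the network structure itself.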

Low Dimensional Motor Skill Learning Using Coactivation

We propose an approach for motor skill learning of highly articulated characters based on the systematic exploration of low-dimensional joint coactivation spaces. Through analyzing human motion, we first show that the dimensionality of many motion tasks is much smaller than the full degrees of freedom (DOFs) of the character. Indeed, joint motion appears organized across DOFs, with multiple joints moving together and working in synchrony. We exploit such redundancy for character control by extracting task-specific joint coactivations from human recorded motion, capturing synchronized patterns of simultaneous joint movements that effectively reduce the control space across DOFs. By learning how to excite such coactivations using deep reinforcement learning, we are able to train humanlike controllers using only a small number of dimensions. We demonstrate our approach on a range of motor tasks and show its flexibility against a variety of reward functions, from minimalistic rewards that simply follow the center-of-mass of a reference trajectory to carefully shaped ones that fully track reference characters. In all cases, by learning a 10-dimensional controller on a full 28 DOF character, we reproduce high-fidelity locomotion even in the presence of sparse reward functions.
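
A hedged numpy sketch of the extraction step: coactivations taken as the leading principal components of recorded joint angles, giving a 10-D control space for a 28-DOF character (the file name and dimensions are hypothetical):

    import numpy as np

    Q = np.load("joint_angles.npy")          # (frames, 28) recorded joint angles
    mean = Q.mean(axis=0)
    _, _, Vt = np.linalg.svd(Q - mean, full_matrices=False)
    W = Vt[:10].T                            # (28, 10) coactivation basis

    def to_full_pose(z):                     # the policy acts in 10 dimensions
        return mean + W @ z                  # synchronized 28-DOF joint targets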

Self-Imitation Learning of Locomotion Movements through Termination Curriculum

Animation and machine learning research have shown great advancements in the past decade, leading to robust and powerful methods for learning complex physically-based animations. However, learning can take hours or days, especially if no reference movement data is available. In this paper, we propose and evaluate a novel combination of techniques for accelerating the learning of stable locomotion movements through self-imitation learning of synthetic animations. First, we produce synthetic and cyclic reference movement using a recent online tree search approach that can discover stable walking gaits in a few minutes. This allows us to use reinforcement learning with Reference State Initialization (RSI) to find a neural network controller for imitating the synthesized reference motion. We further accelerate the learning using a novel curriculum learning approach called Termination Curriculum (TC), that adapts the episode termination threshold over time. The combination of the RSI and TC ensures that simulation budget is not wasted in regions of the state space not visited by the final policy. As a result, our agents can learn locomotion skills in just a few hours on a modest 4-core computer. We demonstrate this by producing locomotion movements for a variety of characters.
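
The Termination Curriculum can be sketched as a single adaptive threshold on the imitation error; the schedule constants below are assumptions, not the paper's exact values:

    # episodes end early once the imitation error exceeds a threshold that is
    # tightened as the policy improves
    threshold = 1.0                          # generous at first: long episodes

    def should_terminate(imitation_error):
        return imitation_error > threshold

    def update_curriculum(mean_episode_error):
        global threshold
        # track the agent's current competence so that simulation time is not
        # wasted on states the final policy will never visit
        threshold = max(0.1, 0.99 * threshold + 0.01 * mean_episode_error)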

SESSION: Geometry for Animation

Fast Projective Skinning

We present a novel physics-based character skinning approach that improves the recent Projective Skinning in terms of animation quality and computational performance. Our method provides physically plausible animations, dynamic secondary motion effects, and global collision handling in a real-time skinning simulation. We achieve this through a custom-tailored GPU implementation of the underlying projective dynamics simulation and a high-quality upsampling from the simulation mesh to the high-resolution visualization mesh based on quadratic moving least squares.

Spring Rigs for Skinning

Animation tools have benefited greatly from advances in skinning and surface deformation techniques, yet it remains difficult to author articulated character animations that display the free and highly expressive shape change that characterizes hand-drawn animation. We present a new skinning representation that allows skeletal deformation and more flexible shape control to be combined in a single framework, along with an intuitive, sketch-based interface. Our approach offers the convenience of skeletal control and smooth skinning with the ability to embed surface deformation and animation as a core component of the skinning technique. The approach binds vertices to attachment points on the skeleton, defining a vector from bone to surface. Three types of springs are defined: inter-vertex springs help maintain surface relationships, springs from vertices to their attachment points help maintain appropriate bone offsets, and torsion springs around these attachment vectors help control deformation as bones rotate. Edits to the mesh surface can also be represented by varying the radial length and direction of these vectors, enabling a new range of expressive power. The use of sketch-based interfaces and graphics hardware makes both skeletal and mesh deformation simple to control and fast enough for interactive use.

Contact Preserving Shape Transfer For Rigging-Free Motion Retargeting

Retargeting a motion from a source to a target character is an important problem in computer animation, as it allows reusing existing rigged databases and transferring motion capture to virtual characters. Surface-based pose transfer is a promising approach to avoid the trial-and-error process of controlling the joint angles. The main contribution of this paper is to investigate whether shape transfer, instead of pose transfer, would better preserve the original contextual meaning of the source pose. To this end, we propose an optimization-based method to deform the source shape+pose using three main energy functions: similarity to the target shape, body-part volume preservation, and collision management (preserving existing contacts and preventing penetrations). The results show that our method is able to retarget complex poses, including those with several contacts, to very different morphologies. In particular, we introduce new contacts that are linked to the change in morphology and that would be difficult to obtain with previous works based on pose transfer, which aim at preserving distances between body parts. These preliminary results are encouraging and open several perspectives, such as decreasing computation time and better understanding how to model pose and shape constraints.

A Robust Interactive Facial Animation Editing System

Over the past few years, the automatic generation of facial animation for virtual characters has garnered interest among the animation research and industry communities. Recent research contributions leverage machine-learning approaches to enable impressive capabilities at generating plausible facial animation from audio and/or video signals. However, these approaches do not address the problem of animation editing, i.e., the need to correct an unsatisfactory baseline animation or to modify the animation content itself. In facial animation pipelines, editing an existing animation is just as important and time-consuming as producing a baseline. In this work, we propose a new learning-based approach for easily editing a facial animation from a set of intuitive control parameters. To cope with high-frequency components in facial movements and preserve temporal coherency in the animation, we use a resolution-preserving fully convolutional neural network that maps control parameters to sequences of blendshape coefficients. We stack an additional resolution-preserving animation autoencoder after the regressor to ensure that the system outputs natural-looking animation. The proposed system is robust and can handle coarse, exaggerated edits from non-specialist users. It also retains the high-frequency motion of the facial animation. Training and testing are performed on an extension of the B3D(AC)² database [Fanelli et al. 2010], which we make available with this paper at http://www.rennes.centralesupelec.fr/biwi3D.

SESSION: Physically-based Simulation

Volume Maps: An Implicit Boundary Representation for SPH

In this paper, we present a novel method for the robust handling of static and dynamic rigid boundaries in Smoothed Particle Hydrodynamics (SPH) simulations. We build upon the ideas of the density maps approach recently introduced by Koschier and Bender, who precompute the density contributions of solid boundaries and store them on a spatial grid that can be efficiently queried at runtime. This alleviates the problems of commonly used boundary particles, such as bumpy surfaces and inaccurate pressure forces near boundaries. Our method is based on a similar concept, but we precompute the volume contribution of the boundary geometry and store it on a grid. This maintains all benefits of density maps but offers a variety of advantages, which are demonstrated in several experiments. Firstly, in contrast to the density maps method, we can compute derivatives in the standard SPH manner by differentiating the kernel function. This results in smooth pressure forces even at lower map resolutions, so that precomputation times and memory requirements are reduced by more than two orders of magnitude compared to density maps. Furthermore, this directly fits into the SPH concept, so that volume maps can be seamlessly combined with existing SPH methods. Finally, the kernel function is not baked into the map, so the same volume map can be used with different kernels. This is especially useful when we want to incorporate common surface tension or viscosity methods that use different kernels than the fluid simulation.

Global Momentum Preservation for Position-based Dynamics

Position-based dynamics has emerged as an exceedingly popular approach for animating soft body dynamics. Unfortunately, the basic approach suffers from artificial loss of angular momentum. We propose a simple approach to preserve global linear and angular momenta of bodies by directly tracking these quantities and adjusting velocities to ensure they are preserved. This approach entails negligible computational cost, requires less than 25 lines of code, and exactly preserves global linear and angular momenta.
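
A minimal numpy sketch of such a correction for a particle system, assuming the reference momenta were tracked before the PBD solve: a uniform velocity shift restores linear momentum, and a rigid rotation rate (computed through the instantaneous inertia tensor) restores angular momentum without disturbing the linear part.

    import numpy as np

    def restore_momentum(x, v, m, P_ref, L_ref):
        """x, v: (n, 3) positions/velocities; m: (n,) masses;
        P_ref, L_ref: global momenta tracked before the solve."""
        M = m.sum()
        com = (m[:, None] * x).sum(axis=0) / M
        r = x - com
        # linear correction: shift all velocities uniformly
        # (this leaves angular momentum about the COM unchanged)
        v = v + (P_ref - (m[:, None] * v).sum(axis=0)) / M
        # angular correction: apply a rigid rotation rate delta_omega
        L = (m[:, None] * np.cross(r, v)).sum(axis=0)
        I = sum(mi * ((ri @ ri) * np.eye(3) - np.outer(ri, ri))
                for mi, ri in zip(m, r))        # instantaneous inertia tensor
        delta_omega = np.linalg.solve(I, L_ref - L)
        return v + np.cross(delta_omega, r)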

High fidelity simulation of corotational linear FEM for incompressible materials

We present a novel method of simulating incompressible materials undergoing large deformation without locking artifacts. We apply it for simulating silicone soft robots with a Poisson ratio close to 0.5. The new approach is based on the mixed finite element method (FEM) using a pressure-displacement formulation; the deviatoric deformation is still handled in a traditional fashion. We support large deformations without volume increase using the corotational formulation of linear elasticity. Stability is ensured by an implicit integration scheme which always reduces to a sparse linear system. For even more deformation accuracy we support higher order simulation through the use of Bernstein-Bézier polynomials.

Early Termination of Conjugate Gradients for Corotated Finite Elements

Since the introduction of the conjugate gradient method to computer graphics, researchers have largely treated it as a black box: an arbitrarily small tolerance is chosen and the method is run to convergence. In the context of soft-body animation, this approach results in significant wasted computation and has led researchers to consider alternative, more complex, and less versatile approaches. In this paper we argue that, in the context of corotational finite elements, fewer than 10 iterations can give a good-enough solution at a substantial saving in computational cost. We examine the use of different preconditioners for conjugate gradients, including the mass and Jacobi matrices, as well as the use of different initial guesses. We show that, for our examples, an initial guess of the previous velocity combined with the Jacobi preconditioner works best.
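
A hedged sketch of the suggested configuration: Jacobi-preconditioned conjugate gradients with a hard iteration cap and a warm start from the previous velocity (numpy, with dense matrices for brevity; a production solver would use sparse storage):

    import numpy as np

    def capped_pcg(A, b, x0, max_iter=10):
        """Approximately solve A x = b; x0 is the previous time step's velocity."""
        x = x0.copy()                      # warm start
        r = b - A @ x
        Minv = 1.0 / np.diag(A)            # Jacobi preconditioner
        z = Minv * r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):          # "good enough" solve: no tolerance test
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            z = Minv * r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x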

SESSION: Moving with Style

Motion Adaptation with Cascaded Inequality Tasks

A clip of character motion can be adapted to a change in environment or to another character of a different body size via numerical optimization with several tasks, including the objective of movement and physical constraints. Conventional methods, however, lack flexibility in designing such adaptation tasks because of their simple problem formulations. We propose a motion adaptation framework based on a cascaded series of quadratic programs. Our system introduces a layered structure of strictly prioritized tasks, each layer of which comprises arbitrary types of equality and inequality tasks. The cascaded solver identifies the optimal solution in each layer without affecting the fulfillment of the higher-layer tasks. The stable computation of the cascaded optimization supports intuitive design of spacetime tasks, even for novice users. The capability of our method was demonstrated through several experiments on motion adaptation with prioritized inequality tasks, such as environmental adaptation and adaptation of interactive behavior between two characters.

Learning a Continuous Control of Motion Style from Natural Examples

The simulation of humanoid avatars is relevant to a multitude of applications, such as movies, games, simulations for autonomous vehicles, virtual avatars, and many more. In order to simulate realistic and believable characters, it is important to synthesize motion with a natural motion style matching the character's characteristics. A female avatar, for example, should move in a female style, and different characters should vary in how strongly they express this style. However, manually defining, or acting out, a natural female or male style is non-trivial. Previous work on style transfer is insufficient here, as the style examples are not necessarily natural depictions of female or male locomotion. We propose a novel data-driven method to infer the style information from individual samples of male and female motion capture data. For this purpose, the data of 12 female and 12 male participants was captured in an experimental setting. A neural-network-based motion model is trained for each participant, and the style dimension is learned in the latent representation of these models. A linear style model is thus inferred on top of the motion models. It can be used to synthesize network models of different style expressiveness on a continuous scale while retaining the performance and content of the original network model. A user study supports the validity of our approach while highlighting issues with simpler approaches to inferring the style.

Stylistic Locomotion Modeling and Synthesis using Variational Generative Models

We propose a novel approach to create generative models for distinctive styles of locomotion for humanoid characters. Our approach requires only a single style example, or a few, plus a neutral motion database. We are inspired by the observation that human styles can be easily distinguished from a few examples. However, learning a generative model for natural human motions, which display huge amounts of variation and randomness, would require a lot of training data, and creating such a large motion database for each style would take considerable effort. One solution is motion style transfer, which provides the possibility of converting the content of a motion from one style to another. Typically, style transfer focuses on explicitly transferring the content motion to the target style. We instead propose a variational generative model that combines the large variation in a neutral motion database with style information from a limited number of examples. We formulate style motion modeling as a conditional distribution learning problem, and style transfer is implicitly applied during the model learning process. A conditional variational autoencoder (CVAE) is applied to learn the distribution, and stylistic examples are used as constraints. We demonstrate that our approach can generate any number of natural-looking, varied human motions in a style similar to the target.
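
A minimal PyTorch sketch of the CVAE idea, conditioning both encoder and decoder on a style label; the dimensions, architecture, and KL weight are assumptions, not the paper's configuration:

    import torch
    import torch.nn as nn

    class CVAE(nn.Module):
        def __init__(self, motion_dim=63, style_dim=4, latent=16):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(motion_dim + style_dim, 128),
                                     nn.ELU(), nn.Linear(128, 2 * latent))
            self.dec = nn.Sequential(nn.Linear(latent + style_dim, 128),
                                     nn.ELU(), nn.Linear(128, motion_dim))

        def forward(self, motion, style):
            # encoder predicts a Gaussian over latent codes, conditioned on style
            mu, logvar = self.enc(torch.cat([motion, style], -1)).chunk(2, -1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            return self.dec(torch.cat([z, style], -1)), mu, logvar

    def loss_fn(recon, motion, mu, logvar):
        rec = ((recon - motion) ** 2).mean()
        kld = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).mean()
        return rec + 1e-3 * kld   # KL weight is an assumption

At synthesis time, sampling z from the prior and decoding with the desired style label yields varied motions in that style.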

SESSION: Autonomous Agents and Crowds

Connecting Global and Local Agent Navigation via Topology

We present a novel topology-driven method for improving the navigation of agents in virtual environments. In agent-based crowd simulations, the combination of global path planning and local collision avoidance can cause conflicts and undesired motion. These conflicts are related to the decisions to pass obstacles or agents on certain sides. In this paper, we define an agent’s navigation behavior as a topological strategy amidst obstacles and other agents. We show how to extract such a strategy from a global path and from a local velocity. Next, we propose a simulation framework that computes these strategies for path planning, path following, and collision avoidance. By detecting conflicts between strategies, we can decide reliably when and how an agent should re-plan an alternative path. As such, this work bridges a long-existing gap between global and local planning. Experiments show that our method can improve the behavior of agents while preserving real-time performance. It can be applied to many agent-based simulations, regardless of their specific navigation algorithms. The strategy concept is also suitable for explicitly sending agents in particular directions.

Attracted by light: vision-based steering virtual characters among dark and light obstacles

This paper introduces the use of numerical optical flow (OF) in vision-based steering techniques, which control characters' locomotion trajectories using a simulation of their visual perception. In contrast to the synthetic OF used previously, numerical OF is sensitive to the contrast of objects and provides, for example, uncertain results in dark areas. We therefore propose a locomotion control technique that is robust to such uncertainty: dark areas in the scene are processed as obstacles that may nevertheless be traversed in case of necessity. As demonstrated in various scenarios, this tends to make characters avoid the darkest areas, or traverse them more carefully, as can be observed in real humans.

Joint Exploration and Analysis of High-Dimensional Design–Occupancy Templates

Crowd simulations provide a practical approach to evaluating building design alternatives with respect to human-centric criteria, such as evacuation times and flow in emergency scenarios. Coupled with Building Information Modeling (BIM) tools, they support architects' iterative exploration of design alternatives. However, methods based on manually configuring a design and a corresponding simulation are not practical for exploring the potentially very large number of design solutions that satisfy human-centric design goals and requirements. Often, for practical reasons, designers consider standard crowd configurations that do not capture the behavior of diverse occupants, who may exhibit different locomotion abilities, movement patterns, and social behaviors. We posit that a joint exploration of high-dimensional building design and occupancy features is necessary to more accurately capture the mutual relations between buildings and the behavior of their occupants. To test this hypothesis, we conducted a series of experiments to automatically explore joint high-dimensional design–occupancy patterns using an unsupervised pattern recognition technique (i.e., k-means). We demonstrate that joint design–occupancy explorations provide more accurate results than sequential exploration processes that consider default design or crowd features, despite the longer computational times needed to simulate a large number of solutions. The findings of this case study have practical applications to the design of next-generation design exploration tools that support human-centric analyses in architectural design.
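
The joint exploration can be sketched in a few lines with scikit-learn; the feature groups below are illustrative placeholders for actual design and occupancy parameters:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    X_design = np.random.rand(500, 6)     # e.g., corridor widths, exit counts
    X_occupancy = np.random.rand(500, 4)  # e.g., walking speeds, group sizes
    X = StandardScaler().fit_transform(np.hstack([X_design, X_occupancy]))
    labels = KMeans(n_clusters=8, n_init=10).fit_predict(X)
    # each cluster is a candidate design-occupancy template for inspection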

Scenario Generalization of Data-driven Imitation Models in Crowd Simulation

Crowd simulation, the study of the movement of multiple agents in complex environments, presents a unique application domain for machine learning. One challenge in crowd simulation is to imitate the movement of expert agents in highly dense crowds. An imitation model could substitute for an expert agent if it behaves as well as the expert, which would enable many exciting applications. However, to our knowledge, no prior studies have considered the critical question of how training data and training methods affect imitators when these models are applied to novel scenarios. In this work, a general imitation model is trained by applying either the Behavior Cloning (BC) method or the more sophisticated Generative Adversarial Imitation Learning (GAIL) method on three typical types of data domains: standard benchmarks for evaluating crowd models, random sampling of state-action pairs, and egocentric scenarios that capture local interactions. Simulation results suggest that (i) simpler training methods are overall better than more complex ones, and (ii) training samples with diverse agent-agent and agent-obstacle interactions help reduce collisions when the trained models are applied to new scenarios. We additionally evaluated our models on their ability to imitate real-world crowd trajectories observed in surveillance videos. Our findings indicate that models trained on representative scenarios generalize to new, unseen situations observed in real human crowds.
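
As a sketch of the simpler of the two training methods, Behavior Cloning reduces to supervised regression from states to expert actions (PyTorch; the state and action dimensions are assumptions), while GAIL would replace this loss with an adversarial one:

    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(18, 64), nn.ReLU(),  # 18-D agent state assumed
                           nn.Linear(64, 2))              # 2-D velocity action
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def bc_step(states, expert_actions):   # batches of recorded (s, a) pairs
        loss = ((policy(states) - expert_actions) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()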

SESSION: Poster Abstracts

3D Car Shape Reconstruction from a Single Sketch Image

Efficient car shape design is a challenging problem in both the automotive industry and the computer animation/games industry. In this paper, we present a system to reconstruct a 3D car shape from a single 2D sketch image. To learn the correlation between 2D sketches and 3D cars, we propose a Variational Autoencoder deep neural network that takes a 2D sketch and generates a set of multi-view depth and mask images, which are a more effective representation than a 3D mesh and can be combined to form the 3D car shape. To ensure the volume and diversity of the training data, we propose a feature-preserving car mesh augmentation pipeline. Since deep learning has limited capacity to reconstruct fine-detail features, we propose a lazy learning approach that constructs a small subspace based on a few relevant car samples in the database. Owing to the small size of such a subspace, fine details can be represented effectively with a small number of parameters. With a low-cost optimization process, a high-quality car with detailed features is created. Experimental results show that the system consistently creates highly realistic cars of substantially different shape and topology at very low computational cost.

A VR Game-based System for Multimodal Emotion Data Collection

The rising popularity of learning techniques in data analysis has recently led to an increased need for large-scale datasets. In this study, we propose a system consisting of a VR game and a software platform designed to collect the player's multimodal data, synchronized with the VR content, with the aim of creating a dataset for emotion detection and recognition. The game was implemented ad hoc to elicit joy and frustration, following the emotion elicitation process described by Roseman's appraisal theory. In this preliminary study, five participants played our VR game alongside pre-existing games and self-reported the emotions they experienced.

An interactive motion analysis framework for diagnosing and rectifying potential injuries caused through resistance training

With the rapid increase in the number of individuals participating in resistance training, the number of injuries related to these activities has grown just as quickly. Diagnosing the causes of injuries and discomfort requires a large amount of time from highly experienced physiotherapists. In this paper, we propose a new framework to analyse and visualize movement patterns during the performance of four major compound lifts. The generated analysis can be used to efficiently determine whether the exercises are being performed correctly, ensuring that the anatomy remains within its functional range of motion, in order to prevent strain or discomfort that may lead to injury.

Dancing ICE: A Rhythm Game to Control the Amount of Movement Through Pre-Recorded Healthy Moves

This paper presents a motion-based rhythm game that facilitates rehabilitation at home. In this game, the player has to collect procedurally generated nodes, based on a pool of pre-recorded moves which can be modified by medical experts. A targeted percentage of movement can be specified for each part of the body; the game will generate a level that makes the player’s movement match the targeted percentages.

Emotion Transfer for Hand Animation

We propose a new data-driven framework for synthesizing hand motion at different emotion levels. Specifically, we first capture high-quality hand motion using VR gloves. The hand motion data is then annotated with the emotion type and a latent space is constructed from the motions to facilitate the motion synthesis process. By interpolating the latent representation of the hand motion, new hand animation with different levels of emotion strength can be generated. Experimental results show that our framework can produce smooth and consistent hand motions at an interactive rate.
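
The interpolation step can be sketched in a couple of lines; the decoder and latent codes are hypothetical names for the components described above:

    import numpy as np

    def blend_emotion(z_neutral, z_emotion, strength):
        """strength in [0, 1]; values above 1 would exaggerate the emotion."""
        return ((1.0 - strength) * np.asarray(z_neutral)
                + strength * np.asarray(z_emotion))

    # hand_motion = decode(blend_emotion(z_neutral, z_happy, 0.5))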

Improving Brain Memory through Gaming Using Hand Clenching and Spreading

This paper investigates the degree to which playing games using hand motions improves memory. Based on previous studies in psychology reporting that hand exercise can affect brain function and memory, we develop an interface for playing games using hand motions. It works by detecting hand motions through image processing, translating them into keyboard-press commands, and sending those commands to the target game application; it works with any existing game without modifying the game's source code and requires only an off-the-shelf webcam. An experiment is conducted on a jumping game in which three game interfaces are compared: a keyboard, a Kinect, and the proposed interface. The results show that playing games using hand clenching/spreading leads to the best memory test performance according to the N-back test, a cognitive task commonly used to assess working memory.

Player Dominance Adjustment Motion Gaming AI for Health Promotion

This paper presents an opponent fighting-game AI for promoting balanced use of body segments during full-body motion gaming. The proposed AI, named PDAHP-AI, is based on Monte Carlo tree search and employs a recently proposed concept called Player Dominance Adjustment, in which the AI determines its actions based on the player's inputs so as to adjust the player's dominance. The basic idea is to let the player dominate the game when they perform healthy movements and, conversely, to have the AI take strong actions against the player when they perform unhealthy movements. The AI outperforms an existing dynamic difficulty adjustment AI designed for the same purpose.

Prior-less 3D Human Shape Reconstruction with an Earth Mover’s Distance Informed CNN

We propose a novel end-to-end deep learning framework, capable of 3D human shape reconstruction from a 2D image without the need of a 3D prior parametric model. We employ a “prior-less” representation of the human shape using unordered point clouds. Due to the lack of prior information, comparing the generated and ground truth point clouds to evaluate the reconstruction error is challenging. We solve this problem by proposing an Earth Mover’s Distance (EMD) function to find the optimal mapping between point clouds. Our experimental results show that we are able to obtain a visually accurate estimation of the 3D human shape from a single 2D image, with some inaccuracy for heavily occluded parts.
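
For equal-size point clouds, the exact EMD-style matching can be sketched with scipy's Hungarian solver; note that training a network requires a differentiable approximation of this metric, which the sketch omits:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def emd(pred, gt):
        """pred, gt: (n, 3) point clouds of equal size."""
        cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)   # optimal one-to-one mapping
        return cost[rows, cols].mean()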

The Influence of Step Length to Step Frequency Ratio on the Perception of Virtual Walking Motions

The Q*bird Level Designer: User-assisted procedural level design in augmented reality

Augmented reality (AR) gaming is becoming widely available thanks to improvements in hand-held devices such as phones and tablets. In this work, we describe our system for generating levels for the AR game Q*bird. In Q*bird, the player must visit every cell in the level while avoiding bees and cannon balls, similar to the 1982 arcade game Q*bert. To create a new level, designers place game elements using virtual cards. The system then generates the remainder of the level, ensuring that it is navigable. Designers can edit these levels by dragging and dropping the created geometry. To test a level, the designer can drop a character into it and play it. This system aids playtesting and level design by allowing levels to be quickly specified and tested in the same environment in which the game is played. Furthermore, it offers an example of how the design of AR levels can itself be performed in AR.

Why did the human cross the road?

"Humans at rest tend to stay at rest. Humans in motion tend to cross the road" – Isaac Newton. Even though this response is meant as a joke, implying that the answer is obvious, this important feature of real-world crowds is rarely considered in simulations. Answering the question involves several considerations, such as how agents balance reaching goals against avoiding collisions with heterogeneous entities, and how the environment is modeled. As a preliminary study, we introduce a reinforcement learning framework to train pedestrians to cross streets with bidirectional traffic. Our initial results indicate that, with a very simple goal-centric representation of agent state and a simple reward function, we can simulate interesting behaviors such as pedestrians crossing the road at crossings or waiting for cars to pass.