SA '20: SIGGRAPH Asia 2020 Technical Communications

SESSION: Session 1: Geometry and Modeling

Distorted Perspective for the Forward Camera Dolly

We present a method to add distorted perspective effects to scenes with a forward camera dolly (i.e., the camera moves in the depth direction). Our target scene is a first-person view of the main character traversing a long path. Distorted perspective has been applied to such scenes in hand-drawn animation films to create more dynamic and dramatic motion. Unfortunately, such a perspective along the camera direction is difficult to create in 3D animation. We therefore created an interactive tool for designing cartoon-like perspectives for 3D computer-generated animations. Users can control an affine transformation, including translation, rotation, and scaling, along the depth direction of the camera coordinate system on a 2D screen. We implemented the proposed deformer as a vertex shader to ensure the real-time performance of our system.
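
As a rough illustration (not the authors' implementation), a depth-dependent affine deformer of the kind described could be sketched as below; the linear depth ramp and all parameter names are assumptions made for this example.

```python
# Minimal sketch, assuming a camera-space deformer that ramps up an affine
# transform (xy-scale, twist about the view axis, xy-offset) with depth, as a
# vertex shader would apply per vertex. Not the paper's actual parameterization.
import numpy as np

def deform_vertex(v_cam, z_near, z_far, scale_far, twist_far, offset_far):
    """v_cam: (3,) vertex in camera coordinates (camera looks down -z).
    z_near/z_far: depth range over which the distortion ramps up.
    scale_far, twist_far (radians), offset_far (2,): values reached at z_far."""
    v_cam = np.asarray(v_cam, float)
    depth = -v_cam[2]
    t = np.clip((depth - z_near) / (z_far - z_near), 0.0, 1.0)  # 0 near, 1 far
    s = 1.0 + t * (scale_far - 1.0)
    a = t * twist_far
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    xy = s * (rot @ v_cam[:2]) + t * np.asarray(offset_far, float)
    return np.array([xy[0], xy[1], v_cam[2]])
```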

Semi-global Quad Mesh Structure Simplification via Separatrix Operations

This paper presents a semi-global method to simplify the structure of an all-quad mesh. The simplification aims to reduce the number of singularities while preserving boundary features. The simplification operations of our method are based on the separatrices connecting adjacent singularities. The proposed semi-global method can handle quad meshes with complex structures (e.g., quad meshes obtained via Catmull-Clark subdivision of triangle meshes) and produces quad meshes with much simpler structures.
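
For context, the singularities the method seeks to reduce are the irregular vertices of the quad mesh, i.e., interior vertices whose valence differs from 4. The following sketch only identifies such vertices; it is not the paper's separatrix-based simplification.

```python
# Minimal sketch, assuming a manifold quad mesh: list the interior vertices
# whose valence (number of incident quads) is not 4. These are the
# singularities that the separatrix operations aim to remove.
def quad_mesh_singularities(quads, boundary_vertices):
    """quads: iterable of 4-tuples of vertex indices;
    boundary_vertices: set of vertex indices lying on the boundary."""
    valence = {}
    for q in quads:
        for v in q:
            valence[v] = valence.get(v, 0) + 1
    return [v for v, k in valence.items()
            if v not in boundary_vertices and k != 4]
```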

Neurally-Guided Texturing for Garment Line Drawings

Adding texture to a line drawing is an important process in the production of comics and illustrations. Garment drawings in particular often exhibit large deformations with self-occlusions, so deforming texture patterns is essential for representing realistic garment designs. However, this is currently done manually and requires a significant amount of effort by experts. A possible approach is to infer 3D surface geometry and then apply texture to all 3D surfaces, but it is difficult to represent deep creases with this approach. In this paper, we introduce a “neurally-guided” optimization system for automatically deforming and directly mapping 2D texture patterns onto 2D line drawings, bypassing 3D geometry. First, we build a deep neural network that estimates local transformation matrices of texture patterns, called neural-guidance, from line drawings. Second, we build a 2D triangle mesh for the garment and deform the mesh to obtain texture coordinates by integrating the local transformations. Our algorithm is effective and easy to integrate into existing drawing systems. We provide several examples that demonstrate the advantages of our proposed system over previous methods and illustrate its versatility.
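
One generic way to "integrate local transformations" into texture coordinates is a least-squares fit of per-vertex UVs whose per-triangle Jacobian matches the predicted local transforms. The sketch below shows that formulation only as an illustration; the paper's actual objective and constraints may differ, and all helper names are placeholders.

```python
# Minimal sketch, assuming a 2D triangle mesh over the garment region and a
# per-triangle 2x2 target transform T[f] (e.g. network-predicted). Solve for
# texture coordinates whose per-triangle mapping matches T[f] in least squares.
import numpy as np

def integrate_transforms(verts, tris, T, pinned_idx, pinned_uv):
    """verts: (V,2) drawing-space positions, tris: (F,3) vertex indices,
    T: (F,2,2) local transforms, pinned_*: a few fixed UVs to remove the
    translational ambiguity of the solution."""
    verts = np.asarray(verts, float)
    V = len(verts)
    rows, cols, vals, rhs = [], [], [], []
    r = 0
    for f, (i, j, k) in enumerate(tris):
        # Two edges per triangle fully constrain its linear part.
        for a, b in ((i, j), (i, k)):
            e = T[f] @ (verts[b] - verts[a])      # target edge in texture space
            for d in range(2):                    # u and v components
                rows += [r, r]; cols += [2 * b + d, 2 * a + d]; vals += [1.0, -1.0]
                rhs.append(e[d]); r += 1
    w = 10.0                                      # soft pins fix the gauge
    for p, uv in zip(pinned_idx, pinned_uv):
        for d in range(2):
            rows.append(r); cols.append(2 * p + d); vals.append(w)
            rhs.append(w * uv[d]); r += 1
    A = np.zeros((r, 2 * V)); A[rows, cols] = vals
    uv = np.linalg.lstsq(A, np.array(rhs), rcond=None)[0]
    return uv.reshape(V, 2)
```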

Simple Methods to Represent Shapes with Sample Spheres

Representing complex shapes with simple primitives at high accuracy is important for a variety of applications in computer graphics and geometry processing. Existing solutions may produce suboptimal samples or are complex to implement. We present methods to approximate given shapes with a user-tunable number of spheres, balancing accuracy and simplicity: touching medial/scale-axis polar balls and k-means smallest enclosing circles. Our methods are easy to implement, run efficiently, and can approach quality similar to manual construction.
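
A minimal reading of the k-means variant is: cluster surface samples, then cover each cluster with (an approximation of) its smallest enclosing sphere. The sketch below uses Ritter's bounding-sphere approximation for brevity; this is an assumption for illustration, not necessarily the solver used in the paper.

```python
# Minimal sketch, assuming samples are points on the shape: k-means clustering
# followed by an approximate smallest enclosing sphere per cluster.
import numpy as np

def ritter_sphere(pts):
    """Approximate smallest enclosing sphere of a point set (Ritter 1990)."""
    p = pts[0]
    q = pts[np.argmax(np.linalg.norm(pts - p, axis=1))]
    r = pts[np.argmax(np.linalg.norm(pts - q, axis=1))]
    c, rad = (q + r) / 2.0, np.linalg.norm(q - r) / 2.0
    for x in pts:                              # grow the sphere to cover outliers
        d = np.linalg.norm(x - c)
        if d > rad:
            rad = (rad + d) / 2.0
            c = c + (d - rad) / d * (x - c)
    return c, rad

def kmeans_spheres(samples, k, iters=20):
    """samples: (N,3) surface points; returns up to k (center, radius) pairs."""
    samples = np.asarray(samples, float)
    centers = samples[np.random.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        label = np.argmin(((samples[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([samples[label == i].mean(0) if np.any(label == i)
                            else centers[i] for i in range(k)])
    return [ritter_sphere(samples[label == i]) for i in range(k) if np.any(label == i)]
```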

SESSION: Session 2: Computer Vision and Image Processing

DMCR-GAN: Adversarial Denoising for Monte Carlo Renderings with Residual Attention Networks and Hierarchical Features Modulation of Auxiliary Buffers

Learning-based methods for denoising single-frame Monte Carlo renderings have achieved better rendering quality in photo-realistic rendering research. However, most of these works ignore the rich information in auxiliary buffers and treat all features equally. In this paper, we propose an adversarial approach for denoising Monte Carlo renderings (DMCR-GAN) with residual attention networks and hierarchical features modulation of auxiliary buffers. Specifically, we use a residual-in-residual (RIR) structure to make the network deeper and ease the flow of low-frequency information. Moreover, we propose a convolution dense block group (CDBG) to extract hierarchical features from the auxiliary buffers and then modulate the noisy features in the RIR structure. Furthermore, we propose channel attention (CA) and spatial attention (SA) mechanisms to exploit the inter-channel and inter-spatial dependencies of features. Compared with state-of-the-art methods, our approach can restore more high-frequency information in images.
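
For readers unfamiliar with channel attention, the sketch below shows a generic squeeze-and-excitation style CA block; the reduction ratio, placement inside the RIR/CDBG structure, and all weight shapes are assumptions, not the paper's code.

```python
# Minimal sketch of a generic channel-attention (CA) block in NumPy:
# global average pooling -> small bottleneck MLP -> per-channel sigmoid gate.
import numpy as np

def channel_attention(feat, w1, b1, w2, b2):
    """feat: (C, H, W) feature map; w1: (C//r, C), w2: (C, C//r) learned weights."""
    squeeze = feat.mean(axis=(1, 2))                     # global average pooling -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze + b1)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))     # sigmoid per-channel gate
    return feat * gate[:, None, None]                    # rescale each channel
```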

PoseFromGraph: Compact 3-D Pose Estimation using Graphs

With the rising need for reliable, real-time pose estimation in resource-constrained environments such as smartphones, IoT devices, and head-mounted devices, we need an efficient and compact pose estimation framework. To this end, we propose PoseFromGraph, a light-weight 3D pose estimation framework. The inputs to PoseFromGraph are a graph, obtained by skeletonizing the 3D mesh using the prairie-fire analogy, and an RGB image; the output is the 3D pose of the object. The introduction of 3D shapes to the architecture makes our model category-agnostic. Unlike computationally expensive multi-view geometry and point-cloud based representations for pose estimation, our approach uses a message passing network to incorporate local neighborhood information while maintaining global shape properties of the graph by optimizing a neighborhood-preserving objective. PoseFromGraph surpasses state-of-the-art pose estimation methods in accuracy, achieving 84.43% on the Pascal3D dataset, while yielding a 4× reduction in space and time complexity. The resulting compact pose estimation models can facilitate on-device inference for augmented reality and robotics applications such as 3D virtual model overlay.
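
To make the message-passing idea concrete, the sketch below shows one generic propagation step over the skeleton graph; the aggregation and update functions are illustrative placeholders rather than the paper's exact architecture.

```python
# Minimal sketch of one message-passing step over a skeleton graph, assuming
# mean aggregation of neighbour features followed by a learned linear update.
import numpy as np

def message_passing_step(node_feat, edges, w_self, w_neigh):
    """node_feat: (N, D) float features; edges: list of (i, j) undirected edges;
    w_self, w_neigh: (D, D_out) learned weight matrices."""
    N = node_feat.shape[0]
    agg = np.zeros_like(node_feat)
    deg = np.zeros(N)
    for i, j in edges:                          # sum messages from neighbours
        agg[i] += node_feat[j]; agg[j] += node_feat[i]
        deg[i] += 1; deg[j] += 1
    agg /= np.maximum(deg, 1)[:, None]          # mean aggregation
    return np.maximum(0.0, node_feat @ w_self + agg @ w_neigh)   # ReLU update
```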

Learning Illumination from Diverse Portraits

We present a learning-based technique for estimating high dynamic range (HDR), omnidirectional illumination from a single low dynamic range (LDR) portrait image captured under arbitrary indoor or outdoor lighting conditions. We train our model using portrait photos paired with their ground truth illumination. We generate a rich set of such photos by using a light stage to record the reflectance field and alpha matte of 70 diverse subjects in various expressions. We then relight the subjects using image-based relighting with a database of one million HDR lighting environments, compositing them onto paired high-resolution background imagery recorded during the lighting acquisition. We train the lighting estimation model using rendering-based loss functions and add a multi-scale adversarial loss to estimate plausible high frequency lighting detail. We show that our technique outperforms the state-of-the-art technique for portrait-based lighting estimation, and we also show that our method reliably handles the inherent ambiguity between overall lighting strength and surface albedo, recovering a similar scale of illumination for subjects with diverse skin tones. Our method allows virtual objects and digital characters to be added to a portrait photograph with consistent illumination. As our inference runs in real-time on a smartphone, we enable realistic rendering and compositing of virtual objects into live video for augmented reality.
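
The image-based relighting used to generate the training data follows the standard reflectance-field formulation: a subject captured under one light at a time (OLAT) is relit by weighting each OLAT image by the HDR environment's energy around the corresponding light direction. The sketch below shows that step generically; variable names and the per-light weighting scheme are assumptions.

```python
# Minimal sketch of reflectance-field relighting, assuming olat_images holds one
# image per light-stage light and light_weights holds the RGB energy of the HDR
# environment integrated over each light's solid angle.
import numpy as np

def relight(olat_images, light_weights):
    """olat_images: (L, H, W, 3) reflectance field; light_weights: (L, 3).
    Returns the relit portrait (H, W, 3) as a weighted sum of OLAT images."""
    return np.einsum('lhwc,lc->hwc', olat_images, light_weights)
```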

SESSION: Session 3: Video Gaming and AR/VR

NeuralDrum: Perceiving Brain Synchronicity in XR Drumming

Brain synchronicity is a neurological phenomenon in which two or more individuals have their brain activation in phase when performing a shared activity. We present NeuralDrum, an extended reality (XR) drumming experience that allows two players to drum together while their brain signals are simultaneously measured. We calculate the Phase Locking Value (PLV) to determine their brain synchronicity and use this to directly affect their visual and auditory experience in the game, creating a closed feedback loop. In a pilot study, we logged and analysed the users’ brain signals and had them answer a subjective questionnaire regarding their perception of synchronicity with their partner and the overall experience. From the results, we discuss design implications to further improve NeuralDrum and propose methods to integrate brain synchronicity into interactive experiences.
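
The Phase Locking Value between two signals is a standard measure: the magnitude of the mean complex phase difference, equal to 1 when the signals are perfectly phase-locked and near 0 when their phases are unrelated. The sketch below computes it in the usual way; channel selection, frequency band, and windowing used in NeuralDrum are not specified here.

```python
# Standard PLV between two band-pass filtered signals of equal length.
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    phase_x = np.angle(hilbert(x))            # instantaneous phase of each signal
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))   # 1 = in phase
```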

FaceMagic: Real-time Facial Detail Effects on Mobile

We present a novel real-time face detail reconstruction method capable of recovering high-quality geometry on consumer mobile devices. Our system first uses a morphable model and semantic segmentation of facial parts to achieve robust self-calibration. We then capture fine-scale surface details using a patch-based Shape from Shading (SfS) approach. We pre-compute the patch-wise constant Moore–Penrose inverse matrix of the resulting linear system to achieve real-time performance. Our method achieves high interactive frame rates, and experiments show that our new approach is capable of reconstructing high-fidelity geometry with results comparable to off-line techniques. We illustrate this through comparisons with off-line and on-line related work, and include demonstrations of novel face-detail shader effects.
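
The pre-computation idea is generic: if each patch's update reduces to a linear system A x = b with a constant A, the Moore–Penrose inverse can be computed once offline, leaving only a matrix-vector product per frame. The sketch below illustrates that pattern; A and b are placeholders rather than the paper's actual SfS system.

```python
# Minimal sketch of precomputing the Moore-Penrose inverse of a patch-wise
# constant linear system so that each frame costs one matrix-vector product.
import numpy as np

class PatchSolver:
    def __init__(self, A):                 # offline, once per patch
        self.A_pinv = np.linalg.pinv(A)    # Moore-Penrose inverse of A

    def solve(self, b):                    # online, every frame
        return self.A_pinv @ b             # least-squares solution of A x = b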

Light Field Near-eye Display Resolution Enhancement Using a Staggered Microlens Array

We propose a method of improving the resolution of near-eye light field displays (LFDs) using a staggered microlens array (MLA). The staggered matrix is formed by interlacing multiple identical sparse MLAs with the same microlens periodicity. The sparse MLAs are rectilinearly shifted relative to each other such that the distance between neighboring microlenses is not one fixed value. We demonstrate that by introducing this irregular shift, the staggered MLA-based LFD can support a higher spatial display resolution.
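
As a purely geometric illustration of the layout described, several identical sparse lattices with the same period can be interleaved with different shifts so that neighboring lens spacings are no longer uniform. The offsets below are placeholders; the paper's actual shift values are not reproduced.

```python
# Illustrative sketch of a staggered lens-centre layout: identical sparse MLA
# lattices with the same period, each rectilinearly shifted by its own offset.
import numpy as np

def staggered_centers(period, n_cells, offsets):
    """offsets: list of (dx, dy) shifts, one per interlaced sparse MLA."""
    base = np.array([(i * period, j * period)
                     for i in range(n_cells) for j in range(n_cells)], float)
    return np.vstack([base + np.asarray(o, float) for o in offsets])
```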

Lazy Build of Acceleration Structures with Traversal Shaders

Modern ray tracing APIs allow developers to easily build acceleration structures (AS) with various optimization techniques. However, a visibility-driven on-demand build cannot be implemented with the current APIs due to the lack of flexibility during ray traversal. In this paper, we propose a new algorithm to lazily build ASes for real-time ray tracing with an extended programming model supporting flexible ray traversal. The core idea of our approach is a multi-pass build-traversal, which computes instance visibility and builds the visible ASes in different passes. This allows us to lazily build the entire AS only when necessary without hardware implications. Applying our algorithm to dynamic scenes, we demonstrate that the build cost is significantly reduced with minimal overhead.
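
At a high level, the multi-pass build-traversal can be read as: trace against a cheap proxy to mark visible instances, build only those instances' structures, then trace the full scene. The sketch below expresses that control flow only; every callable is a hypothetical placeholder supplied by the host renderer, not a real ray tracing API.

```python
# Minimal sketch of the multi-pass lazy-build control flow, assuming the host
# renderer supplies proxy intersection, AS build, and trace callables.
def render_frame(rays, instances, intersect_proxy, build_blas, build_tlas, trace):
    visible = [inst for inst in instances
               if any(intersect_proxy(inst, ray) for ray in rays)]   # pass 1: visibility
    blas = {inst: build_blas(inst) for inst in visible}              # build only what is hit
    tlas = build_tlas(blas)                                          # top-level structure
    return [trace(tlas, ray) for ray in rays]                        # pass 2: full traversal
```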

SESSION: Session 4: Animation and Visual Effects

Song2Face: Synthesizing Singing Facial Animation from Audio

We present Song2Face, a deep neural network capable of producing singing facial animation from an input of a singing voice and a singer label. The network architecture is built upon our insight that, although facial expression during singing varies between individuals, singing voices carry valuable information such as pitch, breath, and vibrato, to which expressions may be attributed. Therefore, our network consists of an encoder that extracts relevant vocal features from audio, and a regression network conditioned on a singer label that predicts control parameters for facial animation. In contrast to prior audio-driven speech animation methods, which initially map audio to text-level features, we show that vocal features can be directly learned from singing voices without any explicit constraints. Our network is capable of producing movements for all parts of the face, as well as rotational movement of the head itself. Furthermore, stylistic differences in expression between singers are captured via the singer label, and thus the singing style of the resulting animation can be manipulated at test time.
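
The two-stage design described (vocal-feature encoder plus a singer-conditioned regressor) can be sketched generically as below; the layer sizes, feature pipeline, and output parameterization are placeholders, not the paper's architecture.

```python
# Minimal sketch: encode per-frame vocal features, condition on a learned singer
# embedding, and regress facial control parameters (e.g. blendshape weights and
# head rotation). All shapes are illustrative assumptions.
import numpy as np

def predict_frame(audio_feat, singer_id, enc_w, singer_emb, reg_w):
    """audio_feat: (F,) vocal features for one frame; singer_id: integer label;
    enc_w: (D, F), singer_emb: (S, E), reg_w: (P, D + E) learned parameters."""
    latent = np.tanh(enc_w @ audio_feat)                    # encoder
    cond = np.concatenate([latent, singer_emb[singer_id]])  # condition on singer style
    return reg_w @ cond                                     # control parameters
```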

HedgehogAD: Interactive Design Exploration of Omni-Directional Aerodynamics

OmniAD [Martin et al. 2015] is a data-driven pipeline for physics-based aerodynamic animation, which also introduced a real-time aerodynamic model that handles omni-directional airflow. However, the framework requires a captured motion, which implies that the model is not suitable for designing a customized motion. In this paper, we present a method to add user controllability to the aerodynamic model of OmniAD by allowing the user to directly interact with the aerodynamic model. The system first visualizes the force and torque coefficients, which represent the (x, y, z) components of the aerodynamic force (i.e., drag and lift) and torque, as a set of arrows on a sphere. The user then modifies the arrows on the screen as desired. The system updates the internal representation (spherical harmonics parameters) and shows the resulting animation. We ran a user study, and the participants successfully designed physically plausible falling motions using the proposed method.
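
One generic way to "update the internal representation" after the arrows are edited is to re-fit the spherical harmonics coefficients to the edited directional samples by least squares. The sketch below shows that projection step only; the basis order, weighting, and real-versus-complex SH convention used in the paper are assumptions.

```python
# Minimal sketch: least-squares fit of SH coefficients to edited per-direction
# coefficient values. Uses SciPy's complex spherical harmonics for brevity.
import numpy as np
from scipy.special import sph_harm

def fit_sh(dirs_theta_phi, values, order):
    """dirs_theta_phi: (N,2) samples (polar theta, azimuth phi);
    values: (N,) edited values; returns SH coefficients up to `order`."""
    dirs = np.asarray(dirs_theta_phi, float)
    theta, phi = dirs[:, 0], dirs[:, 1]
    # scipy's sph_harm takes (m, l, azimuth, polar)
    basis = np.column_stack([sph_harm(m, l, phi, theta)
                             for l in range(order + 1)
                             for m in range(-l, l + 1)])
    coeffs, *_ = np.linalg.lstsq(basis, np.asarray(values, dtype=complex), rcond=None)
    return coeffs
```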

Spherical Light Integration over Spherical Caps via Spherical Harmonics

Spherical area light sources are widely used in synthetic rendering. However, traditional Monte Carlo methods can require an excessive number of samples to reach sufficient accuracy. We propose a Spherical Harmonics (SH) based method that provides a trade-off between performance and accuracy. Our key idea is an analytical integration of SH over spherical caps. The SH integration is first decomposed into a weighted sum of Zonal Harmonics (ZH) integrals, which can be evaluated using recurrence formulae. The resulting integrals can then be used to render spherical area lights efficiently, saving up to 50% of the light samples while maintaining competitive accuracy. Our method fits easily into an existing SH-based rendering framework to support near-field sphere lighting.
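
One standard identity consistent with this description is that the integral of a zonal harmonic over a cap of half-angle alpha about the ZH axis reduces to a Legendre integral with a simple recurrence: int_t^1 P_0(x) dx = 1 - t and int_t^1 P_l(x) dx = (P_{l-1}(t) - P_{l+1}(t)) / (2l + 1) for l >= 1, with t = cos(alpha). The sketch below evaluates these cap integrals that way; the paper's exact recurrence and normalization may differ.

```python
# Minimal sketch: integrals of the zonal harmonics Y_l^0 over a spherical cap of
# half-angle alpha, using Bonnet's recurrence for the Legendre polynomials and
# the identity  int_t^1 P_l(x) dx = (P_{l-1}(t) - P_{l+1}(t)) / (2l + 1).
import numpy as np

def zonal_cap_integrals(alpha, max_order):
    """Returns the integral of Y_l^0 over the cap, for l = 0 .. max_order."""
    t = np.cos(alpha)
    P = np.zeros(max_order + 2)                 # P_0 .. P_{max_order+1} at t
    P[0], P[1] = 1.0, t
    for l in range(1, max_order + 1):           # Bonnet's recurrence
        P[l + 1] = ((2 * l + 1) * t * P[l] - l * P[l - 1]) / (l + 1)
    out = np.zeros(max_order + 1)
    out[0] = np.sqrt(np.pi) * (1.0 - t)         # l = 0: 2*pi * sqrt(1/4pi) * (1 - t)
    for l in range(1, max_order + 1):
        legendre_int = (P[l - 1] - P[l + 1]) / (2 * l + 1)
        out[l] = np.sqrt(np.pi * (2 * l + 1)) * legendre_int
    return out
```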