SA '21 Technical Communications: SIGGRAPH Asia 2021 Technical Communications

Controlling Eye Blink for Talking Face Generation via Eye Conversion

A real talking face video includes not only mouth movement but also realistic blinking. For a computer-generated talking face video, realistic eye movements are critical to overcoming the uncanny valley effect. However, introducing realistic eye movements into talking face generation systems remains a great challenge. In this paper, we propose a two-stage system for generating talking face video with realistic, controllable blinking. Through eye conversion and frame replacement, our architecture ensures that the generated blinking motion is controllable. We propose an eye conversion GAN, which can convert a face image into any stage of blinking while keeping facial identity features consistent. In this network, we design joint training to improve the network’s ability to generate closed and half-closed eye images, which improves the authenticity of the eyes. Experiments on two popular datasets show that, compared with previous work, our method not only preserves the authenticity of mouth movements but also generates realistic and controllable eye blinks.

Dynamic Neural Face Morphing for Visual Effects

In this work we present a machine learning approach for face morphing in videos, between two or more identities. We devise an autoencoder architecture with distinct decoders for each identity, but with an underlying learnable linear basis for their weights. Each decoder has a learnable parameter that defines the interpolating weights (or ID weights) for the basis which can successfully decode its identity. During inference, the ID weights can be interpolated to produce a range of morphing identities. Our method produces temporally consistent results and allows blending different aspects of the identities by exposing the blending weights for each layer of the decoder network. We deploy our trained models to image compositors as 2D nodes with independent controls for the blending weights. Our approach has been successfully used in production, for the aging of David Beckham in the Malaria Must Die campaign.
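
As a minimal sketch of the idea above (not the authors' implementation), decoder weights can be written as a linear combination of a shared basis, with each identity owning one vector of ID weights. The names `blend_id_weights` and `decoder_weights`, and the toy basis values, are hypothetical and chosen only for illustration.

```python
def blend_id_weights(id_a, id_b, t):
    """Linearly interpolate two ID-weight vectors; t=0 gives id_a, t=1 gives id_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(id_a, id_b)]


def decoder_weights(basis, id_weights):
    """Combine basis weight vectors: W = sum_k id_weights[k] * basis[k]."""
    n = len(basis[0])
    return [sum(w * b[i] for w, b in zip(id_weights, basis)) for i in range(n)]


# Toy example: a basis of two weight vectors for one (flattened) decoder layer.
basis = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
id_a = [1.0, 0.0]  # assumed ID weights that decode identity A
id_b = [0.0, 1.0]  # assumed ID weights that decode identity B

# Decoder weights for a morph halfway between the two identities.
halfway = decoder_weights(basis, blend_id_weights(id_a, id_b, 0.5))
```

Using a separate `t` for each decoder layer would correspond to the per-layer blending controls the abstract mentions.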

SESSION: Interactivity and Simulation

Autocomplete Repetitive Stroking with Image Guidance

Image-guided drawing can compensate for the lack of skills but often requires a significant number of repetitive strokes to create textures. Existing automatic stroke synthesis methods are usually limited to predefined styles or require indirect manipulation that may break the spontaneous flow of drawing. We present a method to autocomplete repetitive short strokes during users’ normal drawing process. Users can draw over a reference image as usual. At the same time, our system silently analyzes the input strokes and the reference to infer strokes that follow users’ input style when certain repetition is detected. Our key idea is to jointly analyze image regions and operation history for detecting and predicting repetitions. The proposed system can reduce tedious repetitive inputs while being fully under user control.

GPU Cloth Simulation Pipeline in Lightchaser Animation Studio

We present the simulation pipeline of character effects in Lightchaser Animation Studio and how we utilize GPU resources to accelerate clothing simulations. Many characters from ancient Chinese tales in our films wear complex costumes (Figure 1). Such costumes contain five to six layers of fabric, where several hundred thousand triangles are used to show the delicate folds of different cloth materials. At the same time, a film project has more than 600 similar shots, which bring more than 1000 cloth simulation tasks. Therefore, the efficiency and accuracy of our cloth simulator are key to the CFX production pipeline.

Inverse Free-form Deformation for interactive UV map editing

Free-form deformation (FFD) is useful for manual 2D texture mapping in a 2D domain. The user first places a coarse regular grid in the texture space, and then adjusts the positions of the grid points in the image space. In this paper, we consider the inverse of this problem, namely, inverse FFD. In this problem setting, we assume that an initial image-to-texture dense mapping has already been obtained by some automatic method, such as data-driven inference. However, this initial dense mapping may not be satisfactory, so the user may want to modify it. Nonetheless, it is difficult to manually edit the dense mapping because of its huge number of degrees of freedom. We thus convert the dense mapping to a coarse FFD mapping to facilitate manual editing of the mapping. Inverse FFD is formulated as a least-squares optimization, so one can solve it very efficiently.
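
The least-squares formulation can be illustrated with a small sketch under simplifying assumptions that are not taken from the paper: the grid is a single bilinear cell with four control points, and each dense sample pairs a texture coordinate (u, v) with an image position (x, y). The helper names `bilinear_weights`, `solve`, and `fit_control_points` are hypothetical.

```python
def bilinear_weights(u, v):
    """Weights of the four corners of a single-cell grid for (u, v) in [0, 1]^2."""
    return [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_control_points(samples):
    """Fit four (x, y) control points to samples of the form ((u, v), (x, y))."""
    ata = [[0.0] * 4 for _ in range(4)]  # normal-equation matrix A^T A
    atb = [[0.0] * 4 for _ in range(2)]  # right-hand sides A^T b for x and y
    for (u, v), target in samples:
        w = bilinear_weights(u, v)
        for i in range(4):
            for j in range(4):
                ata[i][j] += w[i] * w[j]
            for c in range(2):
                atb[c][i] += w[i] * target[c]
    xs, ys = solve(ata, atb[0]), solve(ata, atb[1])
    return list(zip(xs, ys))


# Build a dense mapping from known control points, then recover them.
true_points = [(0.0, 0.0), (2.0, 0.5), (0.2, 1.0), (2.5, 2.0)]
dense = []
for i in range(5):
    for j in range(5):
        u, v = i / 4, j / 4
        w = bilinear_weights(u, v)
        pos = tuple(sum(wk * p[c] for wk, p in zip(w, true_points)) for c in range(2))
        dense.append(((u, v), pos))
fitted = fit_control_points(dense)
```

Because the unknowns enter linearly, the fit reduces to one small linear solve per coordinate, which is consistent with the efficiency claim in the abstract. A real grid would have more control points and a correspondingly larger, sparse system.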

Skeleton2Stroke: Interactive Stroke Correspondence Editing with Pose Features

Inbetweening is an important technique for computer animations where the stroke correspondence of hand-drawn illustrations plays a significant role. Previous works typically require image vectorization and enormous computational cost to achieve this goal. In this paper, we propose an interactive method to construct stroke correspondences in character illustrations. First, we utilize deep learning-based skeleton estimation to improve the accuracy of closed-area correspondences, which are obtained using a greedy algorithm. Second, we construct stroke correspondences based on the estimated closed-area correspondences. Our experiment verifies that the proposed user interface lets users achieve high accuracy with few corrections in stroke correspondence.

SESSION: Machine Learning for Graphics

A Multi-Stage Advanced Deep Learning Graphics Pipeline

In this paper we propose the Advanced Deep Learning Graphics Pipeline (ADLGP). ADLGP is a novel approach that uses existing deep learning architectures to convert scene data into rendered images. Our goal of generating frames from semantic data has produced successful renderings with similar structures and composition as target frames. We demonstrate the success of ADLGP with side-by-side comparisons of frames generated through standard rendering procedures. We assert that a fully implemented ADLGP framework would reduce the time spent in visualizing 3D environments, and help selectively offload the requirements of the current graphics rendering pipeline.

Anime Character Colorization using Few-shot Learning

In this paper, we propose an automatic Anime-style colorization method using only a small number of reference images manually colorized by artists. To accomplish this, we introduce a few-shot patch-based learning method that considers the characteristics of Anime line drawings. To streamline the learning process, we derive optimal settings with acceptable colorization accuracy and training time for a production pipeline. We demonstrate that the proposed method helps to reduce manual labor for artists.

Comic Image Inpainting via Distance Transform

Inpainting techniques for natural images have progressed significantly. However, if these methods are applied to comic images, the results are not satisfactory because of very noticeable artifacts, especially around line drawings. Line drawings are challenging to inpaint because of their high-frequency components. In this paper, we propose a novel method for inpainting comic images in the distance transform domain. In this method, we first convert a line drawing into a distance image and then inpaint the distance image. By transforming line drawings into distance images, we reduce the high-frequency components, which leads to improved inpainting performance. We compared the results of our proposed method with those of conventional methods. The results showed that the proposed method achieved 0.1% lower l1 loss, 0.5 dB higher PSNR, and 0.5% higher SSIM than the conventional inpainting methods.
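
The conversion of a line drawing into a distance image can be sketched as follows. This is a brute-force Euclidean distance transform written for clarity, not the authors' code, and a production system would use a linear-time algorithm instead. The function names and the 0.5 threshold used to recover lines are assumptions made for this illustration.

```python
import math


def distance_image(line_mask):
    """Return, for every pixel, the Euclidean distance to the nearest line pixel.

    line_mask[y][x] is True on line pixels; at least one line pixel is required.
    """
    line_pixels = [(x, y) for y, row in enumerate(line_mask)
                   for x, on in enumerate(row) if on]
    return [[min(math.hypot(x - lx, y - ly) for lx, ly in line_pixels)
             for x in range(len(row))]
            for y, row in enumerate(line_mask)]


def lines_from_distance(dist, threshold=0.5):
    """Recover a line mask by thresholding (assumed inverse step, for illustration)."""
    return [[d <= threshold for d in row] for row in dist]


# A 3x5 drawing with a single vertical line in the middle column.
mask = [[x == 2 for x in range(5)] for _ in range(3)]
dist = distance_image(mask)
recovered = lines_from_distance(dist)
```

The distance image varies smoothly away from each stroke, which is the property the abstract relies on to reduce high-frequency content before inpainting.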

Guided Image Weathering using Image-to-Image Translation

In this paper, we present a guided image weathering method that allows the user to generate the weathering process. The core of our method is a three-step procedure to generate textures at different time steps of the weathering process. The input texture is first analyzed to obtain the weathering degree (age map) for each pixel. We then train a conditional adversarial network to generate texture patches with diverse weathering effects. Once the training is finished, new weathering results can be generated by manipulating the age map, either through automatic interpolation or through manual modification by the user.

SESSION: Material Acquisition and Representations

Efficient spherical harmonic shading for separable BRDF

Spherical Harmonics (SH) are widely used in computer graphics to speed up the evaluation of the rendering equation. With a separable BRDF, the diffuse and specular contributions are traditionally computed separately. Our first contribution is to demonstrate that there is a simple relationship between both computations, but only one-way, i.e. from specular to diffuse. We show how to deduce the diffuse contribution from the specular contribution using a single multiplication. This replaces tens of multiplications in some cases and complex rotations in others. Our second contribution is an efficient way to compute the SH product between an arbitrary function and a clamped cosine, much less expensive than the traditional SH triple product.

EpiScope: Optical Separation of Reflected Components by Rotation of Polygonal Mirror

Separating reflection components is an important task in computer graphics and vision. Episcan3D has been proposed to separate the direct and indirect reflection components in real time. This method uses a scanning laser projector and a rolling shutter camera, so it requires unmanageably precise geometric alignment and temporal synchronization. In this paper, we propose a novel optical system that achieves the same function without imaging devices. In this method, the ray directions of projection, observation, and presentation are optically and mechanically synchronized by a rotating polygonal mirror. The direct or indirect components can be selected by a mask-based light-field filter. Notably, the selected reflection components can be seen directly with the naked eye, and the pixel count and frame rate of an imaging system impose no restrictions on image quality and no delays in presentation.

Experimental Analysis of Multiple Scattering BRDF Models

SESSION: Metaverse and VR

SpiCa: Stereoscopic Effect Design with 3D Pottery Wheel-type Transparent Canvas

Flow effects such as flames, smoke, and liquids play an important role in enlivening illustrations, but drawing these effects requires artistic expertise as well as a great deal of effort. In this paper, we propose a method for adding stereoscopic flow effects to character illustrations using various shapes of 3D pottery wheel-type transparent canvases. One approach to designing a flow effect to decorate a character relies on simple curved geometry to beautify its flow in an organized composition. We extend this approach to present a drawing system, SpiCa (spinning canvas), which enables users to use transparent surface-of-revolution canvases to design 3D flow effects. User evaluations showed that users were able to create such effects more easily and effectively and reduce their workload with SpiCa in comparison with an existing 2D illustration tool.

Tool-based Asymmetric Interaction for Selection in VR

Mainstream Virtual Reality (VR) devices on the market nowadays mostly use symmetric interaction design for input, yet common practice by artists suggests asymmetric interaction using different input tools in each hand could be a better alternative for 3D modeling tasks in VR. In this paper, we explore the performance and usability of a tool-based asymmetric interaction method for a 3D object selection task in VR and compare it with a symmetric interface. The symmetric VR interface uses two identical handheld controllers to select points on a sphere, while the asymmetric interface uses a handheld controller and a stylus. We conducted a user study to compare these two interfaces, and found that the asymmetric system was faster, required less workload, and was rated with better usability. We also discuss the opportunities for tool-based asymmetric input to optimize VR art workflows, and future research directions.

Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments

This paper proposes the Transition Motion Tensor, a data-driven framework that creates novel and physically accurate transitions outside of the motion dataset. It enables simulated characters to adopt new motion skills efficiently and robustly without modifying existing ones. Given several physically simulated controllers specializing in different motions, the tensor serves as a temporal guideline to transition between them. Through querying the tensor for transitions that best fit user-defined preferences, we can create a unified controller capable of producing novel transitions and solving complex tasks that may require multiple motions to work coherently. We apply our framework on both quadrupeds and bipeds, perform quantitative and qualitative evaluations on transition quality, and demonstrate its capability of tackling complex motion planning problems while following user control directives.

Marvel's Spider-Man: Miles Morales Procedural Tools for PlayStation 5 Content Authoring

Artists at Insomniac Games created a wonderful and detailed open world, Marvel's Manhattan in autumn, on the PlayStation 4 console for Marvel's Spider-Man. This technical communication provides an overview of several procedural systems developed or improved upon for the standalone game, Marvel's Spider-Man: Miles Morales, and a snapshot of the procedural processes in use during the game's production on the PlayStation 5 console. Procedural systems allowed artists at Insomniac Games to efficiently update the setting to winter, increase visual fidelity, and propagate mark-up data across a large open world environment. The combination of procedural techniques and bespoke artistry enabled Insomniac Games to create a memorable launch title for the PlayStation 5.

SESSION: Ray Tracing Techniques

Real Time Cluster Path Tracing

Sparse Volume Rendering using Hardware Ray Tracing and Block Walking

We propose a method to efficiently render sparse volumetric data using ray-tracing hardware. To realize this, we introduce a novel data structure, traversal algorithm, and density encoding that allow for an annotated BVH representation. To avoid API calls to ray-tracing hardware, which reduce rendering efficiency, we propose block walking, in which we store information about adjacent nodes in each BVH node’s corresponding field, taking advantage of knowledge of the content layout. Doing so enables us to traverse the tree more efficiently without repeatedly accessing the spatial acceleration structure maintained by the driver. We demonstrate that our method achieves higher performance and scalability with little memory overhead, enabling interactive rendering of volumetric data.

Vectorized Reservoir Sampling

Reservoir sampling is becoming an essential component of real-time rendering as it enables importance resampling with limited storage. Chao’s weighted random sampling algorithm is a popular choice because of its simplicity. Elegant as it is, it has a fundamental issue: many random numbers must be generated to update reservoirs. To address this issue, we modify Chao’s algorithm with sample warping. We apply sample warping in two different ways and compare them. We further vectorize the modified algorithm to make reservoir sampling more useful for CPU rendering and give a couple of practical examples.
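
To make the random-number issue concrete, the sketch below shows a single-sample weighted reservoir update in the style of Chao's algorithm that consumes only one uniform number for the whole stream by rescaling (warping) it after each accept/reject decision. The abstract does not describe the two warping variants the authors compare, so this is one common form of sample warping, offered as an assumption; the function name `weighted_pick_warped` is hypothetical.

```python
def weighted_pick_warped(weights, u):
    """Pick one index with probability proportional to its weight.

    Only a single uniform number u in [0, 1) is consumed: after each decision,
    u is remapped so that it is again uniform on [0, 1) for the next candidate.
    """
    chosen, total = None, 0.0
    for i, w in enumerate(weights):
        if w <= 0.0:
            continue  # zero-weight candidates can never be selected
        total += w
        p = w / total  # probability that candidate i replaces the kept sample
        if u < p:
            chosen = i
            u = u / p  # warp the accepted interval [0, p) back to [0, 1)
        else:
            u = (u - p) / (1.0 - p)  # warp the rejected interval [p, 1)
    return chosen


# Stratified check: sweep u over 1000 evenly spaced values and count the picks.
counts = [0, 0, 0]
for k in range(1000):
    counts[weighted_pick_warped([1.0, 1.0, 2.0], (k + 0.5) / 1000)] += 1
```

In the sweep, the counts come out proportional to the weights 1 : 1 : 2. Repeated warping consumes the bits of `u`, so long streams lose precision; that trade-off is outside the scope of this sketch.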

Viewport-Resolution Independent Anti-Aliased Ray Marching on Interior Faces in Cube-Map Space

This paper presents a novel approach to anti-aliased ray marching by indirect shading in cube-map space. Our volume renderer first performs ray marching on each visible interior pixel of a maximum-resolution-limited cube map, and then resamples (usually up-scales) the cube imposter in viewport space. With this viewport-resolution-independent strategy, developers can improve both ray-marching performance and anti-aliasing quality while allowing larger marching strides. Moreover, our solution also covers depth-occlusion anti-aliasing for mixed mesh-volume rendering, cube-map level-of-detail (LOD) optimization for a further performance boost, and multiple-volume rendering by leveraging GPU inline ray tracing. In addition, our implementation is developer-friendly, and the performance-quality tradeoff determined by the parameter configuration is easily controllable.


Path-traced global illumination of scenes with complex lighting remains particularly challenging at real-time framerates. Reservoir-based resampling methods for light sampling allow for significant noise reduction at the cost of very few shadow rays per pixel. However, current image-space approaches to reservoir reuse do not scale to sample lighting at further bounces, as is required for efficiently evaluating indirect illumination.

We present a novel approach to performing reservoir-based spatiotemporal importance resampling in world space, allowing for efficient light sampling at arbitrary vertices along the eye path. Our approach caches the reservoirs of the path vertices into the cells of a hash grid built entirely on the GPU. Such a structure allows for stochastic reuse of neighboring reservoirs across space and time for efficient spatiotemporal reservoir resampling at any point in space.
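
The caching scheme can be pictured with a CPU-side sketch in which a Python dictionary stands in for the GPU hash grid. This is an illustration under assumptions, not the authors' data structure: the class `ReservoirGrid`, the uniform cell size, and the string placeholders for reservoirs are all hypothetical.

```python
import math


def cell_key(position, cell_size):
    """Quantize a world-space position to the integer coordinates of its grid cell."""
    return tuple(math.floor(c / cell_size) for c in position)


class ReservoirGrid:
    """Toy world-space cache: reservoirs are bucketed by the cell containing them."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = {}  # dict keyed by cell coordinates, standing in for a hash table

    def insert(self, position, reservoir):
        """Store the reservoir of a path vertex in the cell that contains it."""
        self.cells.setdefault(cell_key(position, self.cell_size), []).append(reservoir)

    def neighbors(self, position):
        """Return reservoirs cached in the same cell: the candidates for reuse."""
        return self.cells.get(cell_key(position, self.cell_size), [])


grid = ReservoirGrid(cell_size=0.5)
grid.insert((0.1, 0.2, 0.3), "reservoir A")
grid.insert((0.4, 0.1, 0.2), "reservoir B")  # falls in the same cell as A
grid.insert((2.0, 0.0, 0.0), "reservoir C")  # falls in a different cell
candidates = grid.neighbors((0.3, 0.3, 0.3))
```

Because the lookup depends only on world-space position, a vertex at any bounce of the eye path can query the cache, which is the property that image-space reuse lacks.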