SA '19: SIGGRAPH Asia 2019 Technical Briefs

The Power of Box Filters: Real-time Approximation to Large Convolution Kernel by Box-filtered Image Pyramid

This paper presents a novel solution for approximating large convolution kernels with a weighted box-filtered image pyramid. Convolution filters are widely used but remain compute-intensive for real-time rendering when the kernel size is large. Our algorithm approximates convolution kernels, such as Gaussian and cosine filters, with two phases of downsampling and upsampling on a GPU. The computational complexity depends only on the input image resolution and is independent of the kernel size. Therefore, our method can be applied in real time to nonuniform blurs, irradiance probe generation, and ray-traced glossy global illumination, and it runs efficiently in practice.
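
As a rough illustration of the idea (not the authors' implementation), a wide blur can be approximated by summing progressively box-filtered, upsampled pyramid levels; the level weights below are purely illustrative:

```python
# Minimal NumPy sketch: approximate a wide blur with a weighted
# box-filtered image pyramid. Cost depends on image resolution,
# not on an explicit kernel width.
import numpy as np

def box_down2(img):
    """Downsample by 2 with a 2x2 box filter (image cropped to even size)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def up_to(img, shape):
    """Nearest-neighbour upsample back to the full resolution."""
    ry, rx = shape[0] // img.shape[0] + 1, shape[1] // img.shape[1] + 1
    return np.repeat(np.repeat(img, ry, 0), rx, 1)[:shape[0], :shape[1]]

def pyramid_blur(img, weights):
    """Weighted sum of progressively box-filtered pyramid levels."""
    out = weights[0] * img
    level = img
    for w in weights[1:]:
        level = box_down2(level)
        out += w * up_to(level, img.shape)
    return out

img = np.random.rand(256, 256)
blurred = pyramid_blur(img, weights=[0.1, 0.2, 0.3, 0.4])  # illustrative weights
```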

ChinaStyle: A Mask-Aware Generative Adversarial Network for Chinese Traditional Image Translation

GANs make it possible to generate artworks from appropriate collections. However, most training datasets contain paintings from only one artist or only one category, and there are few training datasets for traditional Chinese figure paintings. This paper presents a new high-quality dataset named ChinaStyle Dataset, covering six categories and containing 1,913 images in total. We further propose Mask-Aware Generative Adversarial Networks (MA-GAN) to transfer realistic portraits to different styles of Chinese paintings. Unlike existing methods, MA-GAN uses a single model trained only once on our unpaired dataset. In addition, a mask-aware strategy is used to generate the free-hand style of Chinese paintings, and a color-preserving loss is proposed to alleviate unwanted color shifts. Experimental results and a user study demonstrate that MA-GAN achieves natural and competitive performance compared with existing methods.

A Decomposition Method of Object Transfiguration

Existing deep learning-based object transfiguration methods are based on unsupervised image-to-image translation, which shows reasonable performance. However, previous methods often fail in tasks where the shape of an object changes significantly, and the shape and texture of the original object remain visible in the converted image. To address these issues, we propose a novel method that decomposes an object transfiguration task into two subtasks: object removal and object synthesis. This prevents the original object from affecting the generated object and makes the generated object better suited to the background. We explicitly formulate each task, distinguishing background from object using instance information (e.g., object segmentation masks). Compared with other methods, our model is unconstrained by the position, shape, and size of the original object. We show qualitative and quantitative comparisons with other methods, demonstrating the effectiveness of the proposed method.

Structure-Aware Image Expansion with Global Attention

We present a novel structure-aware strategy for image expansion, which aims to complete an image from a small patch. Unlike image inpainting, the majority of the pixels are absent here, which places higher demands on global structure-aware prediction to produce visually plausible results; treating expansion as inpainting from the outside is therefore ill-posed. We propose a learning-based method that combines structure-aware and visual-attention strategies to make better predictions. Our architecture consists of two stages. Since visual attention cannot be fully exploited when the global structure is absent, we first use an ImageNet-pre-trained VGG-19 to make structure-aware predictions in the pre-training stage. Then, we apply a non-local attention layer to the coarsely completed results in the refining stage. Our network predicts global structures and semantic details well from small input image patches and generates full images with structural consistency. We apply our method to a human face dataset, which contains rich semantic and structural details, and the results show its stability and effectiveness.

Flexible Ray Traversal with an Extended Programming Model

The availability of hardware-accelerated ray tracing in GPUs and standardized APIs has led to a rapid adoption of ray tracing in games. While these APIs allow programmable surface shading and intersections, most of the ray traversal is assumed to be fixed-function. As a result, the implementation of per-instance Level-of-Detail (LOD) techniques is very limited. In this paper, we propose an extended programming model for ray tracing which includes an additional programmable stage called the traversal shader that enables procedural selection of acceleration structures for instances. Using this programming model, we demonstrate multiple applications such as procedural multi-level instancing and stochastic LOD selection that can significantly reduce the bandwidth and memory footprint of ray tracing with no perceptible loss in image quality.
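
To make the traversal-shader idea concrete, here is a minimal Python sketch of stochastic LOD selection of the kind the paper demonstrates; the data structure and function names are illustrative stand-ins, not the actual proposed programming model:

```python
# Sketch: a "traversal shader" callback that procedurally selects which
# acceleration structure a ray continues into when it enters an instance.
import random
from dataclasses import dataclass

@dataclass
class Instance:
    lod_fine: object    # acceleration structure of the detailed mesh
    lod_coarse: object  # acceleration structure of the simplified mesh
    near: float         # distance up to which the fine LOD is always used
    far: float          # distance beyond which the coarse LOD is always used

def traversal_shader(t_enter: float, inst: Instance, u: float):
    """Stochastic LOD selection: the blend factor rises linearly from
    `near` to `far`, and a random number u in [0,1) picks one LOD so
    that the expected image matches a smooth LOD cross-fade."""
    blend = min(max((t_enter - inst.near) / (inst.far - inst.near), 0.0), 1.0)
    return inst.lod_coarse if u < blend else inst.lod_fine

# Example: a ray entering the instance at distance 75 picks either LOD.
inst = Instance("bvh_fine", "bvh_coarse", near=50.0, far=100.0)
print(traversal_shader(75.0, inst, random.random()))
```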

Faster RPNN: Rendering Clouds with Latent Space Light Probes

We introduce latent space light probes for fast rendering of high-albedo anisotropic materials with multiple scattering. Our Faster RPNN model improves the performance of cloud rendering by precomputing parts of the neural architecture and separating out the parts that must be inferred at runtime. The model provides a 2-3x speedup over state-of-the-art Radiance-Predicting Neural Networks (RPNN), has negligible precomputation cost and a low memory footprint, and produces low-bias results that are visually indistinguishable from computationally intensive path tracing.
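
A conceptual sketch of such a precompute/runtime split follows; the layer sizes and the way the cached latent is combined with per-sample features are illustrative assumptions, not the paper's architecture:

```python
# Sketch: cache the expensive part of a network per probe location,
# so per-sample inference only runs a cheap head.
import numpy as np

rng = np.random.default_rng(0)
W_heavy = [rng.standard_normal((64, 64)) for _ in range(8)]  # precomputable stack
w_head = rng.standard_normal(64)                             # cheap runtime head

def heavy(x):
    """The expensive part of the network, evaluated once per probe offline."""
    for W in W_heavy:
        x = np.maximum(x @ W, 0.0)  # ReLU MLP layers
    return x

# Latent-space light probes: cached outputs of the heavy stack.
probes = {i: heavy(rng.standard_normal(64)) for i in range(128)}

def radiance(probe_id, view_feature):
    """Per-sample inference touches only the cached latent and the head."""
    latent = probes[probe_id]
    return float(np.maximum(latent * view_feature, 0.0) @ w_head)

print(radiance(3, rng.standard_normal(64)))
```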

Accelerated Volume Rendering with Chebyshev Distance Maps

Volume rendering has useful applications with emerging technologies such as virtual and augmented reality. The high frame-rate targets of these technologies pose a problem for volume rendering because of its very high computational complexity compared with conventional surface rendering. We developed an efficient empty-space-skipping algorithm for accelerating volume rendering. A distance map is generated that gives the Chebyshev distance to the nearest occupied region (containing non-transparent voxels) within the volume. The distance map is used to efficiently skip empty regions during volume ray casting. We show improved performance over state-of-the-art empty-space-skipping techniques.
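
A small sketch of the core mechanism, using SciPy's chessboard distance transform as a stand-in for the paper's distance-map construction; the ray-marching loop is deliberately simplified:

```python
# Sketch: Chebyshev distance map over a binary occupancy volume,
# used to leap over empty space during ray marching.
import numpy as np
from scipy.ndimage import distance_transform_cdt

occupied = np.zeros((64, 64, 64), dtype=bool)
occupied[24:40, 24:40, 24:40] = True  # toy non-transparent region

# Chebyshev distance (in voxels) from each empty voxel to the nearest occupied one.
dist = distance_transform_cdt(~occupied, metric='chessboard')

def march(origin, direction, max_steps=512):
    """Skip dist[p] voxels at once: the Chebyshev ball of that radius
    around p is guaranteed empty, so no occupied voxel can be missed."""
    p = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d /= np.abs(d).max()                 # largest component steps one voxel
    for _ in range(max_steps):
        v = tuple(p.astype(int))
        if any(c < 0 or c >= s for c, s in zip(v, occupied.shape)):
            return None                  # ray left the volume
        if occupied[v]:
            return v                     # first occupied voxel along the ray
        p += d * max(int(dist[v]), 1)    # leap over guaranteed-empty space
    return None

print(march((0.5, 0.5, 0.5), (1.0, 1.0, 1.0)))  # -> (24, 24, 24)
```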

Outdoor Sound Propagation in Inhomogeneous Atmosphere via Precomputation

Most sound propagation simulation methods are dedicated to room scenes, and only a few can be used for outdoor scenes. Ray tracing is often used for such simulation, but it cannot accurately reproduce some acoustic effects, while wave-based methods from acoustics are accurate but suffer from low computational efficiency. We present a novel wave-based precomputation method that enables accurate and fast simulation of sound propagation in an inhomogeneous atmosphere. An extended FDTD-PE method is used to calculate the sound pressure in a 3D scene: the space is divided into a source region, in which the FDTD method is employed, and a far-field region, in which the PE (parabolic equation) method is employed, with a coupling methodology applied at the junction between the two regions. The sound pressure data is further compressed to obtain the impulse response (IR) of the source region and the sound attenuation function of the far-field region. Finally, we validated our method through various experiments, and the results indicate that it simulates sound propagation accurately, at substantially higher speed and with lower storage cost.
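
For flavor, here is a minimal 1-D acoustic FDTD update of the kind run in the near-field source region; the PE far-field solver and the coupling step are beyond this sketch, and the constants and source term are illustrative:

```python
# Sketch: 1-D linear acoustics FDTD on a staggered grid
# (pressure p at cell centres, particle velocity u at cell faces).
import numpy as np

c, rho = 343.0, 1.2          # sound speed (m/s), air density (kg/m^3)
dx = 0.05                    # grid spacing (m)
dt = 0.5 * dx / c            # time step satisfying the CFL condition
n = 400
p = np.zeros(n)
u = np.zeros(n + 1)

for step in range(600):
    # staggered leapfrog updates of the linear acoustics equations
    u[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])
    p       -= dt * rho * c**2 / dx * (u[1:] - u[:-1])
    p[n // 2] += np.exp(-((step * dt - 0.002) / 5e-4) ** 2)  # Gaussian pulse source
```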

Automatic Generation of Chinese Vector Fonts via Deep Layout Inferring

Designing a high-quality Chinese vector font library that can be directly used in real applications is very time-consuming, since such a library typically consists of a large number of glyphs. To address this problem, we propose a data-driven system in which only a small number (about 10%) of the glyphs need to be designed manually. Specifically, the system first automatically decomposes the input glyphs into vectorized components. Then, a layout prediction module based on a deep neural network is applied to learn the layout and structure information of the input characters. Finally, proper components are selected and assembled for each character based on the predicted layout, building a font library that can be directly used on computers and smart mobile devices. Experimental results demonstrate that our system synthesizes high-quality glyphs and significantly enhances the production efficiency of Chinese vector fonts.

Enhancing Piecewise Planar Scene Modeling from a Single Image via Multi-View Regularization

Recent studies on planar scene modeling from a single image employ multi-branch neural networks to simultaneously segment planes and recover 3D plane parameters. However, the generalizability and accuracy of these supervised methods heavily rely on the scale of available annotated data. In this paper, we propose multi-view regularization for network training to further enhance single-view reconstruction networks, without demanding extra annotated data. Our multi-view regularization emphasizes multi-view consistency in the training phase, making the feature embedding more robust against view change and lighting variation. Thus, the neural network trained with our regularization can be better generalized to a wide range of views and lightings. Our method achieves state-of-the-art reconstruction performance compared to previous piecewise planar reconstruction methods on the public ScanNet dataset.
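
A hedged PyTorch-style sketch of what such a multi-view consistency regularizer can look like: features from a second view, warped into the first view with the known relative pose, are pulled toward the first view's features. The warping itself and the loss weighting are assumptions, not the paper's exact formulation:

```python
import torch

def multiview_consistency_loss(feat1, feat2_warped, valid_mask):
    """feat1, feat2_warped: (B, C, H, W) feature maps, where feat2_warped
    is the second view's features resampled into view 1's pixels via the
    known pose and depth. valid_mask: (B, H, W) float mask of valid warps."""
    diff = (feat1 - feat2_warped).abs().sum(dim=1)  # per-pixel L1 over channels
    return (diff * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)
```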

Architecture of Integrated Machine Learning in Low Latency Mobile VR Graphics Pipeline

In this paper, we discuss frameworks for executing machine learning algorithms in the mobile VR graphics pipeline to improve performance and rendered image quality in real time. We analyze and compare the benefits and costs of various possibilities. We illustrate the strength of using a machine learning framework in the graphics pipeline with an application of efficient spatiotemporal super-resolution that amplifies GPU rendering power to achieve better image quality.

Unpaired Sketch-to-Line Translation via Synthesis of Sketches

Converting hand-drawn sketches into clean line drawings is a crucial step for diverse artistic works such as comics and product designs. Recent data-driven methods using deep learning have shown a great ability to automatically simplify sketches on raster images. Since it is difficult to collect or generate paired sketch and line images, the lack of training data is a major obstacle to using these models. In this paper, we propose a training scheme that requires only unpaired sketch and line images to learn sketch-to-line translation. To do this, we first generate realistic paired sketch and line images from unpaired ones using rule-based line augmentation and unsupervised texture conversion. Next, with our synthetic paired data, we train a model for sketch-to-line translation using supervised learning. Compared with unsupervised methods that use cycle-consistency losses, our model performs better at removing noisy strokes. We also show that our model simplifies complicated sketches better than models trained on a limited number of handcrafted paired examples.
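
As an illustration of what a rule-based line augmentation can look like (a generic stand-in, not the paper's actual rules), a clean line drawing can be roughened by overlaying jittered, faded copies of itself:

```python
# Sketch: synthesize a rough, sketch-like image from a clean line image.
import numpy as np

def roughen(line_img, copies=3, max_shift=2, rng=np.random.default_rng(0)):
    """line_img: float array in [0,1], 1 = ink. Returns a noisier version."""
    out = np.zeros_like(line_img)
    for _ in range(copies):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(np.roll(line_img, dy, axis=0), dx, axis=1)
        out = np.maximum(out, rng.uniform(0.4, 1.0) * shifted)  # faded jittered copy
    return np.clip(out, 0.0, 1.0)
```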

Saliency Diagrams

Keyframes are a core notion used by animators to understand and describe motion. In this paper, we take inspiration from keyframe animation to compute a feature that we call the “saliency diagram” of an animation. To create our saliency diagrams, we visualize how often each frame becomes a keyframe when using an existing keyframe-selection technique. Animators can use the resulting saliency diagram to analyze the motion.
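
A minimal sketch of the construction, with `select_keyframes` standing in for any existing keyframe-selection technique; sweeping over a range of selection budgets is an assumption about how the counts are accumulated:

```python
# Sketch: count how often each frame is chosen as a keyframe across
# many runs of a keyframe-selection routine.
import numpy as np

def saliency_diagram(motion, select_keyframes, budgets):
    """motion: (n_frames, dof) array. select_keyframes(motion, k) returns
    the indices of k chosen keyframes. Returns per-frame selection counts."""
    counts = np.zeros(len(motion), dtype=int)
    for k in budgets:
        for f in select_keyframes(motion, k):
            counts[f] += 1
    return counts
```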

Piku Piku Interpolation

We propose a sampling algorithm that reassembles real-life movements to add detail to early-stage facial animation. We examine the results of applying our algorithm with FACS data extracted from video. Using our algorithm like an interpolation scheme, animators can reduce the time required to produce detailed animation.

Fast Terrain-Adaptive Motion Generation using Deep Neural Networks

We propose a fast motion adaptation framework using deep neural networks. Traditionally, motion adaptation is performed via iterative numerical optimization. We adopt deep neural networks and replace the iterative process with feed-forward inference consisting of simple matrix multiplications. For efficient mapping from contact constraints to character motion, the proposed system is composed of two types of networks: trajectory and pose generators. The networks are trained using augmented motion capture data and are fine-tuned using an inverse-kinematics loss. In experiments, our system successfully generates multi-contact motions of a hundred characters in real time, and the resulting motions retain the naturalness present in the motion capture data.

Interactive editing of performance-based facial animation

While performance-based facial animation efficiently produces realistic animation, it still needs additional editing after automatic solving and retargeting. We review why additional editing is required and present a set of interactive editing solutions for VFX studios. The presented solutions allow artists to enhance the result of the automatic solve-retarget with a few tweaks. The methods are integrated into our performance-based facial animation framework and have been actively used in high-quality movie production.

Beyond the Screen

While working on a theme park ride project, we needed to make a projection screen act as a window revealing the virtual world behind it. To create this magical effect, we developed our own image resampling pipeline called “BeyondScreen”. For each screen, it generates a video clip that makes the audience in the ride feel as if they are looking into the virtual space; it produces a sense of depth by revealing hidden areas beyond the screen as the viewpoint moves. After ensuring that the algorithm works well, we developed custom plug-ins for Nuke, RenderMan, and Houdini so that it can be easily used in existing VFX pipelines.
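
The window illusion rests on projecting the virtual scene through the physical screen rectangle from the (moving) viewpoint. Below is the standard generalized perspective frustum computation (after Kooima) as a sketch of the underlying geometry; the paper's actual resampling pipeline is not reproduced here:

```python
# Sketch: off-axis frustum from a viewer position through a screen rectangle.
import numpy as np

def offaxis_frustum(eye, pa, pb, pc, near):
    """pa, pb, pc: screen corners (lower-left, lower-right, upper-left).
    Returns (left, right, bottom, top) at the near plane, for use with a
    glFrustum-style projection together with near/far planes."""
    vr = pb - pa; vr /= np.linalg.norm(vr)            # screen right axis
    vu = pc - pa; vu /= np.linalg.norm(vu)            # screen up axis
    vn = np.cross(vr, vu); vn /= np.linalg.norm(vn)   # screen normal (towards eye)
    va, vb, vc = pa - eye, pb - eye, pc - eye         # eye-to-corner vectors
    d = -np.dot(va, vn)                               # eye-to-screen distance
    l = np.dot(vr, va) * near / d
    r = np.dot(vr, vb) * near / d
    b = np.dot(vu, va) * near / d
    t = np.dot(vu, vc) * near / d
    return l, r, b, t

eye = np.array([0.2, 0.1, 1.5])    # tracked viewer position
pa = np.array([-1.0, -0.5, 0.0])   # screen lower-left corner
pb = np.array([ 1.0, -0.5, 0.0])   # screen lower-right corner
pc = np.array([-1.0,  0.5, 0.0])   # screen upper-left corner
print(offaxis_frustum(eye, pa, pb, pc, near=0.1))
```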

Embedded Concave Micromirror Array-based See-through Light Field Near-eye Display

We propose a direct-view see-through light field near-eye display (NED) using a semi-reflective embedded concave micromirror array (ECMMA) that can generate virtual images at different depths of focus. The ECMMA is a planar optical element with a thin, semi-reflective embedded metallic film forming the mirror array. Being a flat element, the ECMMA has zero net refractive power, and light rays originating from the background scene do not change their direction of propagation when passing through it. Therefore, the see-through view of the proposed ECMMA-NED looks clear, with negligible disturbance to the quality of the background scene.

The Potential of Light Fields in Media Productions

One aspect of the EU-funded project SAUCE is to explore the possibilities and challenges of integrating light field capturing and processing into media productions. A special light field camera was built by Saarland University [Herfet et al. 2018] and was first tested under production conditions in the test production “Unfolding” as part of the SAUCE project. Filmakademie Baden-Württemberg developed the creative concept, executed the post-production, and prepared a complete previsualization. Calibration and post-processing algorithms were developed by Trinity College Dublin and the Brno University of Technology. This document describes the challenges of building and shooting with the light field camera array, as well as its potential and challenges for post-production.

ARSpectator: Exploring Augmented Reality for Sport Events

Augmented Reality (AR) has gained a lot of interest recently and has been used for various applications. Most of these applications are, however, limited to small indoor environments. Despite the wide range of large-scale application areas that could greatly benefit from AR, there are still few AR applications that target such environments. In this work, we discuss how AR can be used to enhance the experience of on-site spectators at live sport events. We investigate the challenges of applying AR in such a large-scale environment and explore state-of-the-art technology and its suitability for an on-site AR spectator experience. We also present a concept design and explore options for implementing AR applications in large-scale environments.

Binary Space Partitioning Visibility Tree for Polygonal Light Rendering

In this paper, we present a method to render shadows for physically based materials under polygonal light sources. Direct illumination from a polygonal light source involves the triple-product integral of the lighting, the bidirectional reflectance distribution function (BRDF), and the visibility function over the polygonal domain, which is computationally intensive. To achieve real-time performance, previous work on polygonal light shading exploits analytic solutions of boundary integrals along the edges of the polygonal light, at the cost of lacking shadowing effects. We introduce a hierarchical representation of the precomputed visibility function that retains the merits of closed-form solutions for boundary integrals. Our method subdivides the polygonal light into the set of polygons visible from the point being shaded. Experimental results show that our method can render complex shadows with a GGX microfacet BRDF from polygonal light sources at interactive frame rates.
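
For reference, a classical closed-form boundary integral of this kind is Lambert's formula for the irradiance due to a uniformly emitting polygon of radiance L: with the polygon's k vertices projected onto the unit sphere around the shading point as p_1, ..., p_k (and p_{k+1} = p_1), and n the surface normal,

```latex
E = \frac{L}{2\pi} \sum_{i=1}^{k}
    \arccos\!\left(\mathbf{p}_i \cdot \mathbf{p}_{i+1}\right)
    \left(
      \frac{\mathbf{p}_i \times \mathbf{p}_{i+1}}
           {\lVert \mathbf{p}_i \times \mathbf{p}_{i+1} \rVert}
      \cdot \mathbf{n}
    \right)
```

Such edge integrals are exact in the unshadowed case, which is why retaining them while adding a precomputed visibility hierarchy is attractive.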

A Flexible Neural Renderer for Material Visualization

Photorealism in computer-generated imagery depends crucially on how well an artist can recreate real-world materials in the scene. The workflow for material modeling and editing typically involves manual tweaking of material parameters, with a standard path-tracing engine for visual feedback, so considerable time may be spent iteratively selecting and rendering materials at an appropriate quality. In this work, we propose a convolutional neural network that quickly generates high-quality ray-traced material visualizations on a shaderball. Our novel architecture allows control over environment lighting, which assists in material selection, and can also render spatially varying materials. Comparison with state-of-the-art denoising and neural rendering techniques suggests that our neural renderer is both faster and better. We provide an interactive visualization tool and an extensive dataset to foster further research in this area.

Real-time Rendering of Layered Materials with Anisotropic Normal Distributions

This paper proposes a lightweight bidirectional scattering distribution function (BSDF) model for layered materials with anisotropic reflection and refraction properties. In our method, each layer of the material can be described by a microfacet BSDF using an anisotropic normal distribution function (NDF). Furthermore, the NDFs of the layers can be defined on tangent vector fields that differ from layer to layer. Our method builds on a previous study in which isotropic BSDFs are approximated by projecting them onto base planes; however, the adequacy of this approach had not been investigated for anisotropic BSDFs. In this paper, we demonstrate that the projection is also applicable to anisotropic BSDFs and that they can be approximated by elliptical distributions using covariance matrices.
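
For concreteness, one widely used anisotropic NDF that fits this setting is the anisotropic GGX distribution; with half-vector h, tangent frame (t, b, n), and per-axis roughnesses α_x, α_y:

```latex
D(\mathbf{h}) =
  \frac{1}{\pi \, \alpha_x \alpha_y
    \left(
      \frac{(\mathbf{h} \cdot \mathbf{t})^2}{\alpha_x^2}
    + \frac{(\mathbf{h} \cdot \mathbf{b})^2}{\alpha_y^2}
    + (\mathbf{h} \cdot \mathbf{n})^2
    \right)^{2}}
```

Letting the tangent frame (t, b) vary per layer is what produces the layer-dependent anisotropy the abstract describes.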

Ray Guiding for Production Lightmap Baking

We present a ray guiding technique for improving computation times in the context of production lightmap baking. Compared to the state of the art, our method has better scalability and lower variance.

Recovering Turbulence Details using Velocity Correction for SPH Fluids

In general, the kinetic energy of water molecules in translational and rotational degrees of freedom (DOFs) dominates. However, coarse spatial discretization results in severe numerical dissipation if only the linear kinetic energy is considered. We therefore propose a novel turbulence refinement method using velocity correction for SPH simulation. In this method, surface details are enhanced by recovering the energy lost in the rotational DOFs of SPH particles. We use a free-vortex model to convert a particle's diffused and stretched angular kinetic energy into its neighbours' linear kinetic energy, so that turbulence details are efficiently generated from the shear between slices. Compared with previous methods, ours generates turbulence and vortices more vividly and stably.

PaintersView: Automatic Suggestion of Optimal Viewpoints for 3D Texture Painting

Although 3D texture painting makes it easier to grasp the overall shape than drawing directly onto a UV map, a disadvantage is that unpainted (or distorted) regions appear in the result due to, for example, self-occluded parts. To paint a model without leaving unpainted parts, the user must repeatedly change the viewpoint, which is highly time-consuming. To address this problem, we propose automatic suggestion of optimal viewpoints for 3D texture painting. As the user paints a model, the system searches for optimal viewpoints for subsequent painting and presents them as multiple suggestions; the user switches to a suggested viewpoint by clicking on it. We conducted a user study and confirmed that the proposed workflow is effective for the kind of 3D texture painting envisioned by users.
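
One plausible way to score candidate viewpoints, sketched below, is by how much unpainted surface area they expose; this illustrates the general idea and is not the paper's actual criterion:

```python
# Sketch: rank candidate viewpoints by visible unpainted surface area.
import numpy as np

def score_viewpoint(visible_faces, painted, face_area):
    """visible_faces: indices of faces visible (and front-facing) from the
    candidate view; painted: bool per face; face_area: float per face."""
    vis = np.zeros_like(painted)
    vis[visible_faces] = True
    return float(face_area[vis & ~painted].sum())

def suggest(views_visible_faces, painted, face_area, top_k=3):
    scores = [score_viewpoint(v, painted, face_area) for v in views_visible_faces]
    return np.argsort(scores)[::-1][:top_k]  # best candidate viewpoints first
```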

How NASA Uses Render Time Procedurals for Scientific Data Visualization

In data-driven visualizations, the size and accessibility of data files can greatly impact the computer graphics production pipeline. Loading large and complex data structures into 3D animation software such as Maya may cause system performance issues that limit interactivity. At NASA's Scientific Visualization Studio, we have implemented methods to procedurally read data files and generate graphics at render time. We accomplish this by creating per-frame calls in our animation software that are executed by the renderer. This procedural workflow accelerates visualization production and iteration.
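
In spirit, such a render-time procedural looks like the sketch below: a per-frame callback reads only the data for the current frame and emits geometry, so the full dataset never has to live in the animation scene. The file layout and the `emit_point` callback are illustrative, not a specific renderer's API:

```python
# Sketch: a per-frame procedural executed by the renderer.
import numpy as np

def colormap(v, vmin=0.0, vmax=1.0):
    """Simple blue-to-red ramp for a scalar value."""
    t = min(max((v - vmin) / (vmax - vmin), 0.0), 1.0)
    return (t, 0.2, 1.0 - t)

def preframe_procedural(frame, emit_point):
    """Read only the current frame's data file and emit colored points."""
    data = np.load(f"data/step_{frame:04d}.npy")  # (n, 4): x, y, z, value
    for x, y, z, value in data:
        emit_point((x, y, z), color=colormap(value))
```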

Bezalel - Towards low-cost pin-based shape displays

The use of shape-changing interfaces is widely discussed in the HCI field as a promising strategy for the physical representation of digital data. Such interfaces are expected to greatly impact a wide range of applications, such as virtual reality, architectural design, and education of blind people. Unfortunately, the widespread use of pin-based shape displays is currently limited by their typically high manufacturing costs, mainly due to the number of actuators, which tends to grow quadratically with display resolution. We therefore present Bezalel: a solution that allows 2 actuators to efficiently drive a pin-based shape display with n pins. Our solution can fully render any 2.5D shape within a time proportional to , which outperforms the 2014 Lemelson-MIT “Use it!” graduate winner solution while using half as many actuators. Additionally, results show that, for specific shapes, our approach can perform as well as the most efficient and much more expensive technologies currently in use. We expect our solution to make it possible to create low-cost actuated surfaces of different sizes, from small tactile objects to large structures such as shape-changing floors.

Latency of 30 ms Benefits First Person Targeting Tasks More Than Refresh Rate Above 60 Hz

In competitive sports, human performance makes the difference between who wins and who loses. In some competitive video games (esports), response time is an essential factor of human performance. When the athlete's equipment (computer, input and output devices) responds with lower latency, it provides a measurable advantage. In this study, we isolate latency and refresh rate by artificially increasing latency when operating at high refresh rates. Eight skilled esports athletes then performed gaming-inspired first-person targeting tasks under varying conditions of refresh rate and latency, completing the tasks as quickly as possible. We show that reduced latency has a clear benefit in task completion time, whereas increased refresh rate has relatively minor effects on performance once the inherent latency reduction present at high refresh rates is removed. Additionally, for certain tracking tasks, there is a small but marginally significant effect from high refresh rates alone.

Augmented Reality Guided Respiratory Liver Tumors Punctures: A Preliminary Feasibility Study

CT-guided radiofrequency ablation (RFA) has evolved rapidly over the past decade and become a widely accepted treatment option for patients with liver tumors. However, it is hard for doctors to locate tumors precisely while avoiding damage to the surrounding risk structures using 2D CT images, which provide only limited static information, especially in the presence of respiratory motion. This paper presents a novel augmented reality guidance modality for improving the precision of liver tumor punctures by providing visual cues from a 3D personalized anatomy with respiratory motion. The optical see-through display devices Epson MoveRio BT300 and Microsoft HoloLens are used to mix pre-operative 3D personalized data with the intra-operative physical scene. We propose an augmented-reality-based surgical navigation pipeline that transforms raw medical data into virtual guidance information and precisely superimposes this information onto the real experimental animal. In addition, to alleviate the difficulty of needle placement induced by respiratory motion, we propose a correlation model that predicts the tumor position in real time via regression-based respiration state estimation and a statistical tumor motion model. We experimentally validated the proposed system on in vivo beagle dogs with artificial lesions, showing that it can effectively improve puncture efficiency and precision. The proposed augmented reality modality is a general strategy for guiding doctors in performing precise percutaneous punctures under respiratory motion and has the potential to be used for other surgical navigation tasks.
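
A hedged sketch of what such a correlation model can look like, using a simple polynomial regression from a respiration state signal to tumor position; the paper's actual regression and statistical motion model are not reproduced here:

```python
# Sketch: fit respiration state -> tumor position offline, predict online.
import numpy as np

def fit_correlation(resp_states, tumor_positions, degree=2):
    """resp_states: (n,) respiration state samples; tumor_positions: (n, 3).
    Returns polynomial coefficients mapping state to 3-D tumor position."""
    A = np.vander(resp_states, degree + 1)          # polynomial design matrix
    coeffs, *_ = np.linalg.lstsq(A, tumor_positions, rcond=None)
    return coeffs

def predict(coeffs, resp_state):
    """Real-time prediction of tumor position from the current state."""
    degree = coeffs.shape[0] - 1
    return np.vander([resp_state], degree + 1)[0] @ coeffs
```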

Effect of Attention Adaptive Personal Audio Deliverable System on Digital Signage

The purpose of this study is to improve the comfort of spaces that use digital signage and the effectiveness of the advertisements displayed. We therefore developed a system that delivers audio only to the people who need it, such as those watching the screen.

The system combines head-direction and position detection by a camera with highly directional sound from a parametric speaker. The sound volume is increased only when the head faces the camera, and the speaker to be used is automatically selected according to the position of the person. Because the sound is highly directional, it is not delivered to anyone who does not need it.

We conducted an experiment using the system. Compared with a conventional loudspeaker, the system improved spatial comfort while maintaining advertising effectiveness, suggesting that it can create a more comfortable environment for customers.