SA '18: SIGGRAPH Asia 2018 Technical Briefs


SESSION: Character animation

A magic wand for motion capture editing and edit propagation

This paper introduces a new method for editing character animation by using a data-driven pose distance as a falloff to interpolate edited poses seamlessly into a motion sequence. This pose distance is defined using Green's function of the pose-space Laplacian. The resulting falloff shape and timing are derived from and reflect the motion itself, replacing the need for a manually adjusted falloff spline. This data-driven region of influence is somewhat analogous to the difference between a generic 2D spline and the "magic wand" selection in an image editor, but applied to the animation domain. It also supports powerful non-local edit propagation in which edits are applied to all similar poses in the entire animation sequence.
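
The core mechanism can be illustrated with a short sketch. The following is a minimal, hypothetical implementation assuming poses are given as joint-angle vectors: it builds a k-nearest-neighbour graph over the frames, forms the graph Laplacian, and reads a per-frame falloff off the Laplacian's Green's function (its pseudo-inverse). The pose representation, neighbourhood size, and kernel are illustrative choices, not the paper's exact ones.

```python
# Hedged sketch: a data-driven falloff from the Green's function of a
# pose-space graph Laplacian. Pose representation, k, and the kernel
# width are illustrative choices, not the paper's exact ones.
import numpy as np

def pose_falloff(poses, edited_idx, k=8):
    """poses: (n_frames, dof) joint-angle vectors; returns per-frame weights."""
    n = len(poses)
    d = np.linalg.norm(poses[:, None, :] - poses[None, :, :], axis=-1)
    # Symmetric k-nearest-neighbour adjacency with Gaussian affinities.
    sigma = np.median(d[d > 0])
    W = np.exp(-(d / sigma) ** 2)
    keep = np.argsort(d, axis=1)[:, 1:k + 1]   # skip self (distance 0)
    A = np.zeros_like(W)
    rows = np.repeat(np.arange(n), k)
    A[rows, keep.ravel()] = W[rows, keep.ravel()]
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(1)) - A           # graph Laplacian
    G = np.linalg.pinv(L)               # Green's function (pseudo-inverse)
    g = G[edited_idx]                   # response to a unit edit at that pose
    g = g - g.min()
    return g / g.max()                  # normalised falloff weights in [0, 1]
```

The resulting weights could modulate how strongly the pose edit is blended into each frame; thresholding them would give a simple form of the non-local edit propagation described above.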

Smeat: ADMM based tools for character deformation

Recent work on physical simulation in computer graphics has focused on energy minimization formulations of dynamics based on fast optimization methods. Because these methods can efficiently tackle a wide range of problems in the realms of both dynamic simulation and static relaxation, we adopted one such method, ADMM, and used it to implement a set of creature tools we call Smeat. We will describe this tool set and give a case study of how it was used to create character effects on "Avengers: Infinity War".
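
For readers unfamiliar with ADMM, the following minimal sketch shows the scaled-form iteration for a generic consensus problem min f(x) + g(z) subject to x = z. It is a textbook illustration of the optimization pattern, not Smeat's actual solver, and the proximal operators shown are placeholder examples.

```python
# Minimal scaled-form ADMM sketch for min_x f(x) + g(z) s.t. x = z.
# prox_f / prox_g are problem-specific; this is not Smeat's solver.
import numpy as np

def admm(prox_f, prox_g, x0, rho=1.0, iters=100):
    x = z = np.asarray(x0, dtype=float)
    u = np.zeros_like(x)                # scaled dual variable
    for _ in range(iters):
        x = prox_f(z - u, rho)          # local (e.g. per-element) solve
        z = prox_g(x + u, rho)          # global coupling / constraint solve
        u = u + x - z                   # dual ascent on the consensus gap
    return x

# Toy example: quadratic data term against an L1 regularizer.
target = np.array([3.0, -0.5, 0.2])
prox_f = lambda v, rho: (rho * v + target) / (rho + 1.0)  # prox of 0.5||x-t||^2
prox_g = lambda v, rho: np.sign(v) * np.maximum(np.abs(v) - 0.1 / rho, 0.0)
print(admm(prox_f, prox_g, np.zeros(3)))
```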

Neural network in combination with a differential evolutionary training algorithm for addressing ambiguous articulated inverse kinematic problems

Inverse kinematic systems are an important tool in many disciplines (from animated game characters to robotic structures). However, inverse kinematic problems are a challenging topic due to their computational cost, highly non-linear nature, and discontinuous, ambiguous characteristics with multiple or no solutions. Neural networks offer a flexible computational model that is able to address these difficult inverse kinematic problems where traditional, formal techniques would be difficult or impossible to apply. In this paper, we present a solution that combines an artificial neural network with a differential evolutionary algorithm for solving inverse kinematic problems. We explore the potential advantages of neural networks for providing robust solutions to a wide range of inverse kinematic problems, particularly areas involving multiple fitness criteria, optimization, pattern and comfort factors, and function approximation. We evaluate the technique through experiments measuring training times, fitness criteria, and quality metrics.
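
To make the combination concrete, here is a hedged, self-contained sketch: a differential evolution loop (DE/rand/1/bin) optimizes the weights of a tiny network that maps target positions to the joint angles of a two-link planar arm, with fitness measured as mean end-effector error. The architecture, DE constants, and arm are illustrative, not the paper's setup.

```python
# Hedged sketch: differential evolution (DE/rand/1/bin) training a tiny
# network to solve 2-link planar IK. Architecture and DE constants are
# illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
L1, L2 = 1.0, 1.0                       # link lengths

def forward(theta):                     # forward kinematics of the arm
    x = L1 * np.cos(theta[..., 0]) + L2 * np.cos(theta[..., 0] + theta[..., 1])
    y = L1 * np.sin(theta[..., 0]) + L2 * np.sin(theta[..., 0] + theta[..., 1])
    return np.stack([x, y], axis=-1)

def net(w, targets):                    # 2-8-2 tanh network, weights packed in w
    W1, b1 = w[:16].reshape(2, 8), w[16:24]
    W2, b2 = w[24:40].reshape(8, 2), w[40:42]
    return np.tanh(targets @ W1 + b1) @ W2 + b2

targets = forward(rng.uniform(-np.pi, np.pi, (200, 2)))  # reachable goals

def fitness(w):                         # mean end-effector error over goals
    return np.mean(np.linalg.norm(forward(net(w, targets)) - targets, axis=-1))

pop = rng.normal(0, 0.5, (40, 42))      # population of weight vectors
scores = np.array([fitness(w) for w in pop])
for gen in range(300):
    for i in range(len(pop)):
        # For brevity, the donors are not required to differ from i.
        a, b, c = pop[rng.choice(len(pop), 3, replace=False)]
        trial = np.where(rng.random(42) < 0.9, a + 0.8 * (b - c), pop[i])
        s = fitness(trial)
        if s < scores[i]:               # greedy selection
            pop[i], scores[i] = trial, s
print("best mean error:", scores.min())
```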

Recurrent transition networks for character locomotion

We present a novel approach, based on deep recurrent neural networks, to automatically generate transition animations given a past context of a few frames, a target character state and optionally local terrain information. The proposed Recurrent Transition Network (RTN) is trained without any gait, phase, contact or action labels. Our system produces realistic and fluid transitions that rival the quality of Motion Capture-based animations, even without any inverse-kinematics post-process. Our system could accelerate the creation of transition variations for large coverage or even replace transition nodes in a game's animation graph. The RTN also shows impressive results on a temporal super-resolution task.

SESSION: Making pretty pictures

As-compact-as-possible vectorization for character images

It is quite time-consuming to produce high-quality compact font libraries, especially for East Asian language systems that contain thousands of characters. However, existing vectorization algorithms either generate results with large storage requirements, or lose most stylish details after vectorization. To solve this problem, we propose a novel data-driven vectorization algorithm for character images that keeps the number of control points on vectorized contours as small as possible while preserving significant details. Experimental results demonstrate that our method clearly outperforms other state-of-the-art approaches by not only preserving most stylish features but also dramatically reducing the size of vectorized fonts.

Brushing element fields

Aggregate elements following certain directions have a variety of applications in graphics, design, and visualization. However, authoring oriented elements in various output domains, especially in 3D, remains challenging. We propose a novel brushing system to facilitate interactive authoring of aggregate elements with diverse properties over given output domains via an element synthesis approach. To increase output quality and reduce input workload, we further propose element fields that can automatically orient all elements into better alignment over the output domains according to partially user-specified strokes. The proposed system can effectively synthesize distinct types of elements within various output domains with higher quality and efficiency, and offers greater user-friendliness than existing practices. Our method can be applied to practical applications such as graphic design, artistic collage, and aggregate modeling.

Learning photo enhancement by black-box model optimization data generation

We address the problem of automatic photo enhancement, in which the challenge is to determine the optimal enhancement for a given photo according to its content. For this purpose, we train a convolutional neural network to predict the best enhancement for a given picture. While such machine learning techniques have shown great promise in photo enhancement, there are some limitations. One is the problem of interpretability, i.e., that it is not easy for the user to discern what has been done by a machine. In this work, we leverage existing manual photo enhancement tools as a black-box model and predict the enhancement parameters of that model. Because the tools are designed for human use, the resulting parameters can be interpreted by their users. Another problem is the difficulty of obtaining training data. We propose generating supervised training data from high-quality professional images by randomly sampling realistic de-enhancement parameters. We show that this approach allows automatic enhancement of photographs without the need for large manually labelled supervised training datasets.
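
The data-generation idea can be sketched as follows, under the assumption of a few simple, invertible enhancement operators (exposure, contrast, saturation, all hypothetical stand-ins for the actual black-box tool's parameters): sample realistic parameters, apply their inverses to a professional photo to "de-enhance" it, and use the degraded image together with the sampled parameters as a supervised training pair.

```python
# Hedged sketch of supervised-pair generation by random de-enhancement.
# The parameter set (exposure, contrast, saturation) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def enhance(img, exposure, contrast, saturation):
    """Apply simple enhancement operators to an RGB image in [0, 1]."""
    img = np.clip(img * 2.0 ** exposure, 0, 1)            # exposure (stops)
    img = np.clip((img - 0.5) * contrast + 0.5, 0, 1)     # contrast about grey
    grey = img.mean(axis=-1, keepdims=True)
    return np.clip(grey + (img - grey) * saturation, 0, 1)

def make_training_pair(professional_img):
    # Sample realistic enhancement parameters, then *invert* them so the
    # network sees a plausible "unedited" input with known ground truth.
    p = {"exposure": rng.uniform(-1, 1),
         "contrast": rng.uniform(0.8, 1.25),
         "saturation": rng.uniform(0.8, 1.25)}
    degraded = enhance(professional_img,
                       -p["exposure"], 1 / p["contrast"], 1 / p["saturation"])
    return degraded, p                  # input image and target parameters

pair = make_training_pair(rng.random((64, 64, 3)))
print(pair[1])
```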

SESSION: Interaction and education

Hybridizing education of both video games and animated films

Our animation program is a relatively small program that uses large-group projects to teach students. For 15 years, the undergraduate seniors have grouped together each year to create a single large-group animated short film, a consistently successful educational experience leading to solid foundational knowledge, successful hires, and yearly top awards. Six years ago, some of our students approached us wanting to create video games instead of films. This raised the question: can we teach both film and games without compromising the success and educational value that has come from focusing only on animated film? After some experience, our answer was yes. Though films and games have significant differences, we are still able to create both films and video games within a single program. Here we address how we leverage the similarities and approach the differences in combining the teaching of film and games.

Personalizing homemade bots with plug & play AI for STEAM education

In this study, we propose a new framework for hands-on educational modules to introduce ideas in AI and robotics casually, quickly, and effectively in one package for beginners of all ages in STEAM fields. Today, courses on introductory robotics are found everywhere, from K-12 summer camps to adult continuing education. However, because of their limited time, most of them are restricted to teaching basic skills in sensor-actuator interactions and can rarely introduce what recent exciting AI can do, such as image recognition. As a case study to demonstrate the idea of the framework, we introduce an educational module to create a toy car with a camera controlled by a Raspberry Pi. Our approach uses both physical and digital environments. Participants experience running their toy cars on a physical track using a convolutional neural network (CNN) trained on how they drive the cars in a virtual game. The proposed AI model thus assimilates a participant's game-play style in a VR environment, which is later re-enacted by the physical robot the participants assembled. The tested idea is extensible as a framework to many other robotics projects and can make ideas in AI and robotics more accessible to everyone. Through this approach, we intend to demonstrate AI's ability to personalize things and hope to stimulate participants' curiosity and motivation to learn.

Interactive design and optimization of free-formed returning boomerang

We present an interactive tool to model a returning boomerang envisioned by a user. Designing a functional and fashionable boomerang requires the computation of aerodynamics based on fluid simulation, but this computation remains too expensive for interactive design. Hence, we employ a data-driven approach [Nakamura et al. 2016; Umetani et al. 2014] that uses a simple approximation instead of fluid simulation. The result shows that our interface can interactively visualize the overall 3D flight trajectory of free-formed boomerangs. We also propose automatic assistance that maximizes two functional qualities, namely (1) "spin ability" about an axis perpendicular to the direction of flight and (2) "returning ability" to return to the thrower. By fabricating the resulting design, users can enjoy the fascinating flight of their boomerang. In addition, we conduct a user study and confirm that the proposed interface is effective for creative boomerang design.

Coded skeleton: shape changing user interface with mechanical metamaterial

We propose a design method for fabricating a novel shape-changing user interface, called the "Coded Skeleton", by computationally integrating actuators and sensors using a mechanical metamaterial. This design method realizes the deformation of various curves using simple expansion and contraction actuators, leveraging the fact that the Coded Skeleton is flexible in one deformation mode but stiff in others. We describe the design method and the structural analysis of the mechanical metamaterial, which uniquely defines the deformation, and outline how the Coded Skeleton is created and controlled using this structure. Finally, we propose three applications of the Coded Skeleton.

SESSION: Production rendering

Directional lightmap encoding insights

Lightmaps that respond to normal mapping are commonly used in video games. This short paper describes a novel parameterization of a standard lightmap encoding, Ambient Highlight Direction (AHD) --- a model for directional irradiance consisting of ambient and directional light --- that eliminates common interpolation artifacts. It also describes a technique for fitting the AHD model to lighting represented as spherical harmonics, where the unknown model parameters are solved in the null space of the constraint that irradiance is preserved.
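
As a rough illustration of fitting AHD to spherical harmonics, the sketch below takes the dominant direction from the linear SH band and then least-squares fits the ambient and directional intensities against the SH irradiance evaluated over sampled normals. This is a simplification for illustration only: the paper instead solves for the unknown parameters in the null space of the irradiance-preservation constraint.

```python
# Hedged sketch: fitting the AHD (ambient + highlight direction) model to
# irradiance given as order-1 spherical harmonics. Simplified: the paper
# solves the fit in the null space of an irradiance-preservation
# constraint; here a plain least-squares fit illustrates the idea.
import numpy as np

def fit_ahd(sh):                        # sh = [E00, E1m1, E10, E11]
    # Dominant light direction from the linear SH band (y, z, x order).
    d = np.array([sh[3], sh[1], sh[2]])
    d /= np.linalg.norm(d)
    # Sample normals over the sphere and evaluate SH irradiance there.
    u = np.random.default_rng(0).normal(size=(256, 3))
    n = u / np.linalg.norm(u, axis=1, keepdims=True)
    Y = np.column_stack([np.full(len(n), 0.282095),    # Y_0^0
                         0.488603 * n[:, 1],           # Y_1^-1
                         0.488603 * n[:, 2],           # Y_1^0
                         0.488603 * n[:, 0]])          # Y_1^1
    target = Y @ sh
    # Least-squares fit of E(n) ~= A + D * max(dot(n, d), 0).
    basis = np.column_stack([np.ones(len(n)), np.maximum(n @ d, 0.0)])
    (A, D), *_ = np.linalg.lstsq(basis, target, rcond=None)
    return A, D, d
```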

Spider-Man IG-impostors: cityscapes and beyond

Spider-Man's traversal through Manhattan in the video game Marvel's Spider-Man (2018) for Sony's PS4 platform allows the player to climb any building and jump off any structure while rotating the view 360 degrees at 30 frames per second, displayed at 4K resolution. Flat-card "billboard" style impostor systems could not represent the city environment at the desired quality, so Insomniac Games developed the 3D IG-Impostor system to represent the mid-to-distant cityscape of Marvel's Manhattan in an efficient and persistent cache. This environment data cache was then available for multi-view rendering used by other systems within the Insomniac Engine. There are no 2D impostors in Marvel's Spider-Man.

Robust deep residual denoising for Monte Carlo rendering

We propose a deep residual learning based method that consistently outperforms both state-of-the-art handcrafted denoisers and learning-based methods for single-image Monte Carlo denoising. Unlike the indirect nature of existing learning-based methods, which estimate the parameters and kernel weights of a filter, we map the noisy input image directly to its noise-free counterpart. Our method uses only three common auxiliary features (depth, normal, and albedo), and this minimal requirement on auxiliary data simplifies both the training and the integration of our method into most production rendering pipelines. We have evaluated our method on unseen images produced by a different renderer. Consistently high-quality denoising results are obtained in all cases. We plan to release our training dataset, as we are aware that the lack of publicly available training data is currently an entry barrier to learning-based denoising research for Monte Carlo rendering.
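
A hedged PyTorch sketch of the direct residual mapping is given below: a small convolutional network takes the noisy radiance plus the three auxiliary features (albedo, normal, depth) and predicts a residual that is added back to the noisy input. The depth and width of the network are illustrative, not the paper's architecture.

```python
# Hedged PyTorch sketch of direct residual denoising: the network takes
# noisy RGB plus albedo, normal and depth (10 channels in total) and
# predicts a residual added back to the noisy radiance. Depth/width of
# the network are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    def __init__(self, aux_channels=7, width=64, blocks=6):
        super().__init__()
        layers = [nn.Conv2d(3 + aux_channels, width, 3, padding=1), nn.ReLU()]
        for _ in range(blocks):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(width, 3, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, noisy_rgb, aux):
        # Residual learning: predict the correction, not the clean image.
        return noisy_rgb + self.net(torch.cat([noisy_rgb, aux], dim=1))

denoiser = ResidualDenoiser()
out = denoiser(torch.rand(1, 3, 128, 128), torch.rand(1, 7, 128, 128))
```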

Production ray tracing of feature lines

Automated feature line drawing of virtual 3D objects helps artists depict shapes and allows for creating stylistic rendering effects. High-fidelity drawing of lines that are very thin or have varying thickness and color, or lines of recursively reflected and refracted objects, is a challenging task. In this paper we describe an image-based feature detection and line drawing method that integrates naturally into a ray tracing renderer and runs as a post-process, after the pixel sampling stage. Our method supports arbitrary camera projections and surface shaders, and its performance does not depend on the geometric complexity of the scene but on the pixel sampling rate. By leveraging various attributes stored in every pixel sample, which are typically available in production renderers, e.g. for arbitrary output variables (AOVs), feature lines of reflected and refracted objects can be obtained with relative ease. The color and width of the lines can be driven by the surface shaders, which allows for achieving a wide variety of artistic styles.
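
The detection step can be illustrated with a simplified, pixel-level sketch: a pixel lies on a feature line where its neighbours disagree in object ID, depth, or normal. The paper operates on individual pixel samples and supports varying line width and color; the version below and its thresholds are illustrative only.

```python
# Hedged sketch of image-space feature detection from AOV buffers: a pixel
# is on a feature line where its neighbours disagree in object ID, depth
# or normal. The paper works per pixel *sample*; this pixel-level version
# and the thresholds are illustrative.
import numpy as np

def feature_mask(obj_id, depth, normal, depth_tol=0.01, normal_tol=0.9):
    """obj_id, depth: (H, W); normal: (H, W, 3). Returns boolean mask."""
    edges = np.zeros(obj_id.shape, dtype=bool)
    for axis in (0, 1):                              # compare 4-neighbours
        a = [slice(None)] * 2; a[axis] = slice(1, None)
        b = [slice(None)] * 2; b[axis] = slice(None, -1)
        a, b = tuple(a), tuple(b)
        hit = (obj_id[a] != obj_id[b])                        # silhouettes
        hit |= np.abs(depth[a] - depth[b]) > depth_tol        # depth creases
        hit |= (normal[a] * normal[b]).sum(-1) < normal_tol   # normal creases
        edges[a] |= hit
        edges[b] |= hit
    return edges
```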

SESSION: Rays of light

Hessian-based robust ray-tracing of implicit surfaces on GPU

In recent years, the ray tracing of implicit surfaces on the GPU has been studied by many researchers. However, existing methods struggle mainly with self-intersecting surfaces: general root-finding solutions suffer from the problem of false roots, and robust solutions are hard to generalize. In this paper, we present a robust algorithm based on extended Taylor-test adaptive marching points, which allows robust rendering of self-intersecting implicit surfaces on the GPU. We use a second-order Taylor series expansion to alleviate the problem of double roots in self-intersecting implicit surfaces. Our approach is simple to implement and is based on the Hessian matrix of the implicit surface, which yields the second-order Taylor series expansion of the univariate ray equation. We compare our results against a simulated ground truth computed with the smallest step size possible, and our algorithm gives the best visual results as well as a higher SSIM than other approaches.
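
The following sketch illustrates the core idea under stated assumptions: march along the ray and, besides the ordinary sign-change test, evaluate the second-order Taylor model of the ray equation (whose second derivative comes from the Hessian) to detect tangential double roots that a plain sign test misses. The fixed step size and the refinement strategy are illustrative, not the paper's adaptive scheme.

```python
# Hedged sketch of a second-order Taylor test along a ray. The quadratic
# model f(t+h) ~= f + f'h + 0.5 f''h^2 can dip through zero inside a step
# even when the endpoint signs agree, flagging tangential double roots.
import numpy as np

def bisect(f, a, b, iters=32):
    for _ in range(iters):
        m = 0.5 * (a + b)
        a, b = (m, b) if np.sign(f(a)) == np.sign(f(m)) else (a, m)
    return 0.5 * (a + b)

def taylor_march(f, df, ddf, t0, t1, step=0.01, eps=1e-5):
    """f, df, ddf: implicit function and its first/second derivatives
    restricted to the ray (df = grad . dir, ddf = dir^T H dir)."""
    t = t0
    while t < t1:
        v = f(t)
        if np.sign(f(t + step)) != np.sign(v):
            return bisect(f, t, t + step)          # ordinary sign change
        d1, d2 = df(t), ddf(t)
        if d2 != 0.0:
            he = -d1 / d2                          # extremum of the model
            q = v + d1 * he + 0.5 * d2 * he * he   # model value there
            if 0.0 < he < step and np.sign(q) != np.sign(v):
                # Possible tangential (double) root: f may touch zero with
                # no sign change, so locate the extremum by bisecting f'.
                te = bisect(df, t, t + step)
                if abs(f(te)) < eps:
                    return te
        t += step
    return None
```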

Error estimation for many-light rendering with supersampling

Many-light rendering unifies the computation of various visual and illumination effects, including anti-aliasing, depth of field, volumetric scattering, and subsurface scattering, into a simple direct illumination computation from many virtual point lights (VPLs). As a naive approach that sums the direct illumination from a large number of VPLs is computationally expensive, scalable methods cluster VPLs and estimate the sum by sampling a small number of VPLs for efficient computation. Although scalable methods have achieved significant speed-ups, they cannot control the error introduced by clustering, resulting in noise in the rendered images. In this paper, we propose a method to improve the estimation accuracy for many-light rendering of such visual and illumination effects. We demonstrate that our method can improve the estimation accuracy for various visual and illumination effects by up to 2.3 times compared with the previous method.

Fast raycasting using a compound deep image for VPL range determination

The concept of using multiple deep images has been explored, under a variety of different names, as a possible acceleration approach for finding ray-geometry intersections. We leverage recent advances in deep image processing from Order Independent Transparency (OIT) for fast building of a Compound Deep Image (CDI) using a coherent memory format well suited for raycasting. We explore the use of a CDI and raycasting for the problem of determining distance between Virtual Point Lights (VPLs) and geometry for indirect lighting, with the raycasting step being a small fraction of total frametime.

Reconstruction of volumetric reflectance using spatio-sequential frequency correlation imaging

In this paper, we propose a novel pro-cam technique for reconstructing the volumetric reflectance inside an object. The key concept is the use of spatio-sequentially modulated illumination to extract only the required signal at a 3D point. We discovered an effect in which a projector and camera pair with different focal lengths naturally produces a spatial frequency modulation. By combining this effect with a direct conversion technique to demodulate the signals, the resulting spatio-sequential frequency correlation enables reconstruction of the reflectance at a 3D point within the object. Experimental results based on both synthetic and real data show that the proposed method can reconstruct volumetric reflectance at discrete 3D points.

SESSION: Understanding images

Gourmet photography dataset for aesthetic assessment of food images

In this study, we present the Gourmet Photography Dataset (GPD), which is the first large-scale dataset for aesthetic assessment of food photographs. We collect 12,000 food images together with human-annotated labels (i.e., aesthetically positive or negative) to build this dataset. We evaluate the performance of several popular machine learning algorithms for aesthetic assessment of food images to verify the effectiveness and importance of our GPD dataset. Experimental results show that deep convolutional neural networks trained on GPD can achieve performance comparable to that of human experts on this task, even on unseen food photographs. Our experiments also provide insights to support further study and applications related to the visual analysis of food images.

On the convergence and mode collapse of GAN

The generative adversarial network (GAN) is a powerful generative model. However, it suffers from several problems, such as convergence instability and mode collapse. To overcome these drawbacks, this paper presents a novel GAN architecture consisting of one generator and two different discriminators. Since GAN training is the analogue of a minimax game, the proposed architecture works as follows. The generator (G) aims to produce realistic-looking samples that fool both discriminators. The first discriminator (D1) rewards high scores for samples from the data distribution, while the second (D2) conversely favors samples from the generator. Specifically, the ResBlock and minibatch discrimination (MD) architectures are adopted in D1 to improve the diversity of the samples. The leaky rectified linear unit (Leaky ReLU) and batch normalization (BN) are replaced by the scaled exponential linear unit (SELU) in D2 to alleviate the convergence problem. A new loss function that minimizes the KL divergence is designed to better optimize the model. Extensive experiments on the CIFAR-10/100 datasets demonstrate that the proposed method can effectively address the problems of convergence and mode collapse.

Removing objects from videos with a few strokes

We present a system for the removal of objects from videos. As input, the system only needs the user to draw a few strokes in at least one frame, roughly delimiting the objects to be removed. These rough masks are then automatically refined and propagated through the video. The corresponding regions are resynthesized using video inpainting techniques. Our system is able to deal with multiple, possibly crossing objects with complex motions and with dynamic textures. The result is a computational tool that can alleviate tedious manual operations when editing high-quality videos.

Dunhuang mural restoration using deep learning

Over time, the art pieces inside the Dunhuang Grottoes have suffered tremendous damage, such as mural deterioration, and they are usually difficult to repair. Although we can achieve digital preservation by modeling the caves and preserving the murals as textures in a virtual environment, we still cannot glimpse what the grottoes looked like before the damage. In this work, we propose a systematic restoration framework, based on the Generative Adversarial Network (GAN) technique, for these high-resolution but deteriorated mural textures. The main idea is to make the machine learn the transformation between deteriorated mural textures and restored mural textures. However, the resolution of the training texture images (i.e. 8192×8192) is too high for GAN technology to be applied directly due to GPU RAM limitations. Instead, our method restores a set of high-resolution yet color-inconsistent textures patch by patch and a set of low-resolution but color-consistent full textures, and then combines them to obtain the final high-resolution and color-consistent result.

SESSION: Methods for automation

Automatic site selection of cultural venues

Cultural venues, such as libraries, theatres, cinemas and galleries, contribute to a city's tourism and economy, and enrich the cultural life of the local residents. In this paper, we propose a novel approach to automatic site selection of cultural venues in an urban area, one that requires less expertise in urban planning. The two-stage approach consists of a learning stage for predicting zones as a prior constraint, and an optimisation stage for determining the number of cultural venues and their exact locations according to multiple criteria. Given an input set of urban data, our approach generates an optimal configuration of two-dimensional locations for cultural venues that complies with land use policies and provides easy access for the public. We implemented the approach using reliable methods of deep learning and stochastic optimisation, and the results demonstrate the approach's effectiveness through comparison with real-world counterparts.

Automatic route planning for GPS art generation

In this paper, we present a novel approach for the automated route generation of global positioning system (GPS) artwork. The term GPS artwork describes the generation of drawings by leaving virtual traces on digital maps. Until now, creating these images has required a manual planning phase in which the artists design the route by hand. Once the route for an artwork has been planned, GPS devices are used to track the movement. Using the presented solution, the lengthy planning phase can be significantly shortened and art creation is opened to a broader public.

Optimal and interactive keyframe selection for motion capture

Motion capture is increasingly used in games and movies. However, it often requires editing before it can be used. Unfortunately, editing is laborious because of the low-level representation of the data. Existing motion editing methods accomplish modest changes, but larger edits require the artist to "re-animate" the motion by manually selecting a subset of the frames as keyframes. In this paper, we automatically find sets of frames that serve as keyframes for editing the motion. We formulate the problem of selecting an optimal set of keyframes as a type of shortest-path problem, and solve this problem using efficient dynamic programming. Our algorithm can simplify motion capture to around 10% of the original number of frames while retaining most of its detail. By simplifying animation with our algorithm, we realize a new approach to motion editing and stylization founded on the time-tested keyframe interface.
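
A minimal sketch of the shortest-path formulation, under simplifying assumptions (linear interpolation between keyframes and a constant per-keyframe penalty in place of the paper's exact objective), is given below: frames are nodes, an edge (i, j) costs the error of interpolating the in-between frames, and dynamic programming finds the cheapest path from the first frame to the last.

```python
# Hedged sketch of keyframe selection as a shortest-path problem: frames
# are nodes, edge (i, j) costs the error of linearly interpolating the
# in-between frames, and a per-keyframe penalty trades detail for
# compactness. Linear interpolation and the penalty are illustrative.
import numpy as np

def select_keyframes(motion, penalty=1.0):
    """motion: (n_frames, dof) array. Returns keyframe indices."""
    n = len(motion)

    def seg_cost(i, j):                 # interpolation error on (i, j)
        t = np.linspace(0.0, 1.0, j - i + 1)[:, None]
        interp = (1 - t) * motion[i] + t * motion[j]
        return np.abs(interp - motion[i:j + 1]).sum()

    best = np.full(n, np.inf); best[0] = 0.0
    prev = np.zeros(n, dtype=int)
    for j in range(1, n):               # DP over the segment end frame
        for i in range(j):
            c = best[i] + seg_cost(i, j) + penalty
            if c < best[j]:
                best[j], prev[j] = c, i
    keys = [n - 1]                      # backtrack the shortest path
    while keys[-1] != 0:
        keys.append(prev[keys[-1]])
    return keys[::-1]
```

Raising the penalty yields fewer keyframes; lowering it preserves more detail, which is how a compression target such as 10% of the original frames could be reached.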

Accurate 3D locating and tracking of basketball players from multiple videos

Despite the development of pedestrian detection technologies, existing methods cannot simultaneously provide high-quality detection and fast computation for practical applications, especially for the accurate 3D locating and tracking of basketball players. We propose an algorithm that can robustly and automatically locate and track basketball players from multiple videos. After extracting the foregrounds, the voxels in the basketball court space are projected back to the foreground images. Occupied voxels are accumulated and smoothed based on an integral space for acceleration. Two Gaussian mixture models, a Grouping Gaussian Mixture Model (GGMM) and a Locating Gaussian Mixture Model (LGMM), are designed for continuously locating and grouping players, and a simple blob detector is employed to handle out-of-bounds players. Our algorithm is insensitive to occlusions, shadows, lighting, and computation errors.

SESSION: AR/VR

FlyingHand: extending the range of haptic feedback on virtual hand using drone-based object recognition

This paper presents a Head Mounted Display (HMD) integrated system that uses a drone and a virtual hand to help users explore a remote environment. The system allows users to use hand gestures to control the drone and to identify Objects of Interest (OOI) through tactile feedback. The system uses a convolutional neural network to perform object classification on the drone-captured images and provides a virtual hand to realize interaction with the objects. Accordingly, tactile feedback is also provided to users' hands to enhance the virtual hand's body ownership. The system aims to help users assess space and objects regardless of body limitations, which could not only benefit elderly or handicapped people, but also make potential contributions to environment measurement and daily life.

Head-tracked off-axis perspective projection improves gaze readability of 3D virtual avatars

Virtual avatars have been employed in many contexts, from simple conversational agents to communicating the internal state and intentions of large robots when interacting with humans. Rarely, however, are they employed in scenarios which require non-verbal communication of spatial information or dynamic interaction from a variety of perspectives. When presented on a flat screen, many illusions and visual artifacts interfere with such applications, which leads to a strong preference for physically-actuated heads and faces.

By adjusting the perspective projection used to render 3D avatars to match a viewer's physical perspective, they could provide a useful middle ground between typical 2D/3D avatar representations, which are often ambiguous in their spatial relationships, and physically-actuated heads/faces, which can be difficult to construct or impractical to use in some environments. A user study was conducted to determine to what extent a head-tracked perspective projection scheme was able to mitigate the issues in readability of a 3D avatar's expression or gaze target compared to use of a standard perspective projection. To the authors' knowledge, this is the first user study to perform such a comparison, and the results show not only an overall improvement in viewers' accuracy when attempting to follow the avatar's gaze, but a reduction in spatial biases in predictions made from oblique viewing angles.
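
The projection scheme itself can be sketched with the standard "generalized perspective projection" construction, which is consistent with (though not quoted from) the paper: given three tracked screen corners and the viewer's eye position, build an off-axis frustum so that the rendered avatar remains geometrically correct from the viewer's physical viewpoint.

```python
# Hedged sketch of a head-tracked off-axis projection (the standard
# "generalized perspective projection" construction): build a frustum
# from the screen corners and the tracked eye position so the rendered
# avatar stays geometrically correct from the viewer's viewpoint.
import numpy as np

def off_axis_projection(pa, pb, pc, eye, near=0.1, far=100.0):
    """pa, pb, pc: lower-left, lower-right, upper-left screen corners
    (world space); eye: tracked eye position. Returns a 4x4 projection."""
    vr = pb - pa; vr /= np.linalg.norm(vr)           # screen right
    vu = pc - pa; vu /= np.linalg.norm(vu)           # screen up
    vn = np.cross(vr, vu); vn /= np.linalg.norm(vn)  # screen normal
    va, vb, vc = pa - eye, pb - eye, pc - eye        # corners from the eye
    d = -(va @ vn)                                   # eye-to-screen distance
    l = (vr @ va) * near / d; r = (vr @ vb) * near / d
    b = (vu @ va) * near / d; t = (vu @ vc) * near / d
    P = np.zeros((4, 4))                             # glFrustum-style matrix
    P[0, 0] = 2 * near / (r - l); P[0, 2] = (r + l) / (r - l)
    P[1, 1] = 2 * near / (t - b); P[1, 2] = (t + b) / (t - b)
    P[2, 2] = -(far + near) / (far - near)
    P[2, 3] = -2 * far * near / (far - near)
    P[3, 2] = -1.0
    # Rotate into screen space and translate the eye to the origin.
    M = np.eye(4); M[:3, :3] = np.stack([vr, vu, vn])
    T = np.eye(4); T[:3, 3] = -eye
    return P @ M @ T
```

Updating this matrix every frame from the tracker is what keeps the avatar's gaze direction readable from oblique viewing angles.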

Efficient light field computation for view range expansion using viewpoint reduction

In this paper, we present an improved light field display method with a wider view range. In our system, two stacked transparent LCDs are used for glasses-free light field display. They modulate the uniform backlight that enters the observer's eyes to approximate the desired light field. The patterns displayed on the LCDs are optimized according to the target light field using an algorithm based on nonnegative matrix factorization (NMF). In order to achieve a wider view range and reduce the computational complexity of the pattern optimization, we utilize a small number of sampling views of the light field. To properly choose the subset of sampling views, a stochastic sampling algorithm is adopted. The effectiveness of the proposed method is demonstrated by experimental results, and similar light field display results can be generated with reduced sampling views.
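
A minimal sketch of the NMF pattern optimization is shown below, assuming the sampled target light field has been arranged as a nonnegative matrix indexed by the two layers' pixels; each rank-1 term of the factorization corresponds to one time-multiplexed pair of LCD patterns. The view sampling and stochastic view selection that the paper adds are omitted here.

```python
# Hedged sketch of the NMF pattern optimisation behind two-layer light
# field displays: arrange the (sampled) target light field as a matrix T
# indexed by front/rear layer pixels and factorise T ~= A @ B with
# multiplicative updates; each rank-1 term is one time-multiplexed pair
# of LCD patterns. View sampling/selection is omitted here.
import numpy as np

def nmf(T, rank=3, iters=200, eps=1e-9):
    rng = np.random.default_rng(0)
    A = rng.random((T.shape[0], rank))   # front-layer patterns (columns)
    B = rng.random((rank, T.shape[1]))   # rear-layer patterns (rows)
    for _ in range(iters):
        A *= (T @ B.T) / (A @ B @ B.T + eps)   # multiplicative updates
        B *= (A.T @ T) / (A.T @ A @ B + eps)   # keep factors nonnegative
    return A, B

T = np.random.default_rng(1).random((64, 64))   # toy target light field
A, B = nmf(T)
print("relative error:", np.linalg.norm(T - A @ B) / np.linalg.norm(T))
```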

Eholo glass: electroholography glass. A lensless approach to holographic augmented reality near-eye display

We present a design and rendering method for a large-eyebox, full-parallax near-eye augmented reality (AR) display with depth of field. As developments in AR progress, field of view and sense of depth are among the most crucial factors for rendering convincing virtual objects into real environments. We adopt computer-generated holography (CGH) as the rendering method, as it can faithfully reconstruct images with real-world depth of field. Previous studies have proposed various near-eye optical designs, such as the use of a beamsplitter or a holographic optical element with a 4f lens system. However, a pure beamsplitter design suffers from a narrow field of view, while a 4f lens system introduces lens aberrations as well as focusing issues that lead to a smaller eyebox. A wide field of view that matches our eyes is crucial for an immersive experience; a narrow field of view may even lead to nausea and reduced comfort. We propose a design that uses a Dihedral Corner Reflector Array and novel beamsplitter-embedded optics as our eyepiece. Our primary contribution is a reasonably large eyebox with a simple optical design, together with real-time rendering of virtual objects with depth of field without any special optics or moving parts.

SESSION: Data acquisition

Synthesis and rendering of seamless and non-repetitive 4D texture variations for measured optical material properties

We address the one weakness of an existing fully automatic acquisition system for the spatially varying optical material behavior of real object surfaces. Its representation of spatially varying material behavior with spherical dependence on incoming light as a 4D texture (the ABTF material model) allows flexible mapping onto arbitrary 3D geometries, photo-realistic rendering, and real-time interaction, but this very texture-like representation exposes it to the common problems of texturing, which strike on two levels. First, non-seamless textures create visible border artifacts. Second, even a perfectly seamless texture causes repetition artifacts when distributed side by side in large numbers over a 3D surface. We solve both problems through our novel texture synthesis, which generates a set of seamless texture variations randomly distributed over the surface at shading time. Compared to regular 2D textures, the inter-dimensional coherence of the 4D ABTF material model poses entirely new challenges to texture synthesis, including maintaining consistent material behavior throughout the space spanned by the spatial image domain and the angular illumination hemisphere. In addition, we tackle the increased memory consumption caused by the numerous variations through a fitting scheme specifically designed to reconstruct the most prominent effects captured in the material model.

Creating a virtual human that visualizes skin strain distribution for apparel wearing simulation

This paper describes the first step of our research collaboration to create a virtual human that simulates strain of the skin to aid the design of comfortable, ergonomic sportswear. We apply insights from sports science and computer graphics. For the former, a human's kinematic properties, such as the angular rotation of each joint or the magnitude of strain on the subject's skin, play an important role in understanding and simulating human movements. In contrast, when creating computer graphics characters, artists mainly focus on a plausible appearance rather than on these properties. Therefore, creating such a virtual human model poses several significant interdisciplinary challenges. We demonstrate several collaborative efforts and initial research results, focusing on how to visualize the skin strain distribution of the lower body of a 3D human model. The human surface model, skin strain shader, and its rig are also briefly described. Finally, we discuss our early results and the future scope of our collaborative work.
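
As an illustration of the kind of quantity such a visualization needs, the sketch below computes a per-triangle Green strain from rest and deformed vertex positions, whose principal value could drive a strain-colouring shader. This is a standard strain measure stated under our own assumptions, not the paper's exact formulation.

```python
# Hedged sketch: per-triangle Green strain from rest and deformed vertex
# positions, a standard measure that could drive a strain-visualisation
# shader. Not the paper's exact formulation.
import numpy as np

def triangle_strain(rest, deformed, tris):
    """rest, deformed: (n_verts, 3); tris: (n_tris, 3) indices.
    Returns the largest principal Green strain per triangle."""
    out = np.zeros(len(tris))
    for k, (i, j, m) in enumerate(tris):
        # Rest-pose edges expressed in a local 2D basis of the triangle.
        e1, e2 = rest[j] - rest[i], rest[m] - rest[i]
        u = e1 / np.linalg.norm(e1)
        nrm = np.cross(e1, e2)
        v = np.cross(nrm / np.linalg.norm(nrm), u)
        Dm = np.array([[e1 @ u, e2 @ u], [e1 @ v, e2 @ v]])
        Ds = np.column_stack([deformed[j] - deformed[i],
                              deformed[m] - deformed[i]])
        F = Ds @ np.linalg.inv(Dm)       # 3x2 deformation gradient
        E = 0.5 * (F.T @ F - np.eye(2))  # Green strain tensor
        out[k] = np.linalg.eigvalsh(E).max()   # principal strain
    return out
```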

FIST: a fast, implicit model of the human hand with semi-anatomical structures

Human hands are frequently depicted in computer graphics. The motion of the internal structures within a human hand has a non-negligible effect on the natural change in the appearance of the hand's surface. In this work, we propose a method for expressing this change interactively with the use of an implicit model of a human hand that has semi-anatomical structures. The model is referred to as the Fast, Implicit model with Semi-anatomical sTructures, or FIST. In the FIST model, bones are modeled anatomically based on computed tomography imaging, while soft tissues are modeled artificially. The model can be controlled simply by specifying the angles of the joints. The proposed method can contribute to a compelling expression of the dynamism in such hand motions as grasping, pinching, and scratching in immersive virtual reality and games.