I3D '20: Symposium on Interactive 3D Graphics and Games

Procedural band patterns

We seek to cover a parametric domain with a set of evenly spaced bands whose number and width vary according to a density field. We propose an implicit procedural algorithm that generates the band pattern from a pixel shader and adapts to changes in the control fields in real time. Each band is uniquely identified by an integer. This allows a wide range of texturing effects, including specifying a different appearance for each individual band. Our technique also affords progressive gradations of scale, avoiding the abrupt doubling of the number of lines typical of subdivision approaches. This leads to a general approach for drawing bands, drawing splitting and merging curves, and drawing evenly spaced streamlines. Using these base ingredients, we demonstrate a wide variety of texturing effects.
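
As a rough sketch of the core idea, the integer band identity can be computed implicitly from the parametric coordinate. The following minimal 1D example (with a constant density; the paper handles spatially varying density fields and progressive scale gradations) is our illustration, not the authors' code:

```python
import math

def band(u, density):
    """Map a parametric coordinate u in [0, 1] to (band_id, t), where
    band_id is a unique integer identifying the band and t in [0, 1) is
    the position within that band. Names are illustrative only."""
    s = u * density               # one unit of s per band
    band_id = int(math.floor(s))  # implicit, unique per band
    t = s - band_id               # coordinate inside the band
    return band_id, t

# Per-band appearance: e.g., alternate dark and light stripes.
for px in range(8):
    bid, t = band(px / 8.0, density=4.0)
    print(px, bid, "dark" if bid % 2 == 0 else "light")
```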

Stochastic Substitute Trees for Real-Time Global Illumination

With the introduction of hardware-supported ray tracing and deep learning for denoising, computer graphics has made a considerable step toward real-time global illumination. In this work, we present an alternative global illumination method: the stochastic substitute tree (SST), a hierarchical structure inspired by lightcuts, with light probability distributions as inner nodes. Our approach distributes virtual point lights (VPLs) in every frame and efficiently constructs the SST over those lights by clustering according to Morton codes. Global illumination is approximated by sampling the SST, using the BRDF at the hit location as well as the intensities of the SST nodes to importance-sample directly from inner nodes of the tree. To remove the introduced Monte Carlo noise, we use a recurrent autoencoder. In combination with temporal filtering, we deliver real-time global illumination for complex scenes with challenging light distributions.
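
Clustering VPLs by Morton code means sorting them along a space-filling Z-order curve so that spatially nearby lights end up adjacent in the array, after which consecutive runs can be merged into tree nodes bottom-up. Below is a minimal sketch of the keying step, using the standard 30-bit bit-interleaving construction; this is our illustration, not code from the paper:

```python
def expand_bits(v):
    """Spread the 10 low bits of v so two zero bits separate each bit."""
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton3d(x, y, z):
    """30-bit Morton code for a point with coordinates in [0, 1]."""
    q = lambda c: min(max(int(c * 1024.0), 0), 1023)
    return (expand_bits(q(x)) << 2) | (expand_bits(q(y)) << 1) | expand_bits(q(z))

# Sorting VPL positions by Morton code groups nearby lights together,
# so the tree can be built over consecutive ranges of the sorted array.
vpls = [(0.10, 0.20, 0.90), (0.12, 0.18, 0.88), (0.80, 0.50, 0.10)]
vpls.sort(key=lambda p: morton3d(*p))
```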

Real-time Face Video Swapping From A Single Portrait

We present a novel high-fidelity real-time method to replace the face in a target video clip with the face from a single source portrait image. Specifically, we first reconstruct the illumination, albedo, camera parameters, and wrinkle-level geometric details from both the source image and the target video. Then, the albedo of the source face is modified by a novel harmonization method to match the target face. Finally, the source face is re-rendered and blended into the target video using the lighting and camera parameters from the target video. Our method runs fully automatically and at a real-time rate on any target face captured by cameras or from legacy video. More importantly, unlike existing deep-learning-based methods, our method does not need to pre-train any model; that is, it does not require pre-collecting a large image/video dataset of the source or target face for model training. We demonstrate that a high level of video-realism can be achieved by our method on a variety of human faces with different identities, ethnicities, skin colors, and expressions.

Repurposing a Relighting Network for Realistic Compositions of Captured Scenes

Multi-view stereo can be used to rapidly create realistic virtual content, such as textured meshes or a geometric proxy for free-viewpoint Image-Based Rendering (IBR). These solutions greatly simplify the content creation process compared to traditional methods, but make it difficult to modify the content of the scene. We propose a novel approach to create scenes by composing (parts of) multiple captured scenes. The main difficulty of such compositions is that lighting conditions in each captured scene are different; to obtain a realistic composition we need to make the lighting coherent. We propose a two-pass solution by adapting a multi-view relighting network: we first match the lighting conditions of each scene separately and then synthesize shadows between scenes in a subsequent pass. We also improve the realism of the composition by estimating the change in ambient occlusion in contact areas between parts and by compensating for the color balance of the different cameras used for capture. We illustrate our method with results on multiple compositions of outdoor scenes and show its application to multi-view image composition, IBR, and textured mesh creation.

DenseGATs: A Graph-Attention-Based Network for Nonlinear Character Deformation

In animation production, animators spend significant time and effort developing quality deformation systems for characters with complex appearances and details. To reduce the time spent on repetitive skinning and fine-tuning work, we propose an end-to-end approach that automatically computes deformations for new characters based on the graph information of existing high-quality skinned character meshes. We adopt the idea of regarding mesh deformations as a combination of linear and nonlinear parts and propose a novel architecture for approximating complex nonlinear deformations. Linear deformations, on the other hand, are simple and can therefore be computed directly, although not precisely. To enable our network to handle complicated graph data and inductively predict nonlinear deformations, we design a graph-attention-based (GAT) block consisting of an aggregation stream and a self-reinforced stream, which aggregate the features of neighboring nodes and strengthen the features of a single graph node, respectively. To reduce the difficulty of learning a huge number of mesh features, we introduce a dense connection pattern between a set of GAT blocks, called a “dense module”, to ensure the propagation of features through our deep framework. These strategies allow the deformation features of existing well-skinned character models to be shared with new ones; we call the resulting architecture a densely connected graph attention network (DenseGATs). We tested our DenseGATs and compared it with classical deformation methods and other graph-learning-based strategies. Experiments confirm that our network can predict highly plausible deformations for unseen characters.
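
To make the two-stream idea concrete, here is a hypothetical single block in plain NumPy: an attention-weighted aggregation over graph neighbors plus a per-node transform. All names, shapes, and details are our assumptions for illustration; the paper's block is more elaborate:

```python
import numpy as np

def gat_block(H, A, W_agg, W_self):
    """H: (N, F) node features; A: (N, N) adjacency (nonzero = edge,
    assumed to include self-loops); W_agg, W_self: learned (F, F')
    weights. Returns (N, F') features."""
    Z = H @ W_agg                              # transformed features
    scores = Z @ Z.T                           # dot-product attention logits
    scores = np.where(A > 0, scores, -1e9)     # restrict to graph neighbors
    att = np.exp(scores - scores.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)      # softmax over neighbors
    aggregated = att @ Z                       # aggregation stream
    reinforced = H @ W_self                    # self-reinforced stream
    return np.maximum(aggregated + reinforced, 0.0)  # ReLU

# A "dense module" would concatenate each block's input with its output,
# DenseNet-style, so features keep propagating through the deep stack.
```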

Generalized Microscopic Crowd Simulation using Costs in Velocity Space

To simulate the low-level (‘microscopic’) behavior of human crowds, a local navigation algorithm computes how a single person (‘agent’) should move based on its surroundings. Many algorithms for this purpose have been proposed, each using different principles and implementation details that are difficult to compare.

This paper presents a novel framework that describes local agent navigation generically as optimizing a cost function in a velocity space. We show that many state-of-the-art algorithms can be translated to this framework by combining a particular cost function with a particular optimization method. As such, we can reproduce many types of local algorithms using a single general principle.
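
As a toy instance of the framework, the sketch below samples candidate velocities and picks the one minimizing a cost with a goal term plus a collision-avoidance term. The specific cost function and sampling optimizer are placeholders of our own; the paper's point is that many published methods are exactly such (cost, optimizer) pairs:

```python
import math, random

def navigate(pos, pref_vel, neighbors, v_max=1.5, horizon=2.0, samples=200):
    """Return the sampled velocity with the lowest cost.
    neighbors: list of (position, velocity, radius) tuples."""
    def cost(v):
        # Goal term: stay close to the preferred velocity.
        c = math.hypot(v[0] - pref_vel[0], v[1] - pref_vel[1])
        # Collision term: penalize small predicted separation at the horizon.
        for npos, nvel, r in neighbors:
            dx = (pos[0] + v[0] * horizon) - (npos[0] + nvel[0] * horizon)
            dy = (pos[1] + v[1] * horizon) - (npos[1] + nvel[1] * horizon)
            c += 1.0 / max(math.hypot(dx, dy) - r, 0.05)
        return c

    best, best_cost = (0.0, 0.0), cost((0.0, 0.0))
    for _ in range(samples):
        ang, spd = random.uniform(0, 2 * math.pi), random.uniform(0, v_max)
        v = (spd * math.cos(ang), spd * math.sin(ang))
        c = cost(v)
        if c < best_cost:
            best, best_cost = v, c
    return best
```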

Our implementation of this framework, named UMANS (Unified Microscopic Agent Navigation Simulator), is freely available online. This software enables easy experimentation with different algorithms and parameters. We expect that our work will help clarify the true differences between navigation methods, enable honest comparisons between them, simplify the development of new local algorithms, make techniques available to other communities, and stimulate further research on crowd simulation.

Interactive Inverse Spatio-Temporal Crowd Motion Design

We introduce a new inverse modeling method to interactively design crowd animations. Few works focus on providing succinct, high-level, and large-scale crowd motion modeling. Our methodology is to read in real or virtual agent trajectory data and automatically infer a set of parameterized crowd motion models. Components of the motion models can then be mixed, matched, and altered, enabling new crowd motions to be produced rapidly. Our results show novel animations using real-world data, using synthetic data, and imitating real-world scenarios. Moreover, by combining our method with our interactive crowd trajectory sketching tool, we can create complex spatio-temporal crowd animations in about a minute.

Real-time Muscle-based Facial Animation using Shell Elements and Force Decomposition

We present a novel algorithm for physics-based real-time facial animation driven by muscle deformation. Unlike previous works using 3D finite elements, we use 2D shell elements to avoid the inefficient or undesired tessellation caused by the thin structure of facial muscles. To simplify the analysis and achieve real-time performance, we adopt the real-time thin-shell simulation of [Choi et al. 2007]. Our facial system is composed of four layers, based on human facial anatomy: skin, a subcutaneous layer, muscles, and the skull. Skin and muscles are composed of shell elements, the subcutaneous fatty tissue is modeled as a uniform elastic body, and the fixed part of the facial muscles is handled by a static position constraint. We control muscles to produce stretch deformation using modal analysis and apply a mass-spring force, triggered by the muscle deformation, to the skin mesh. In our system, only the skin's region of interest is affected by a muscle. To handle the coupled result of facial animation, we decouple the system according to the type of external forces applied to the skin. We show a series of real-time facial animations driven by selected major muscles that are relevant to expressive skin deformation. Our system generalizes to new types of muscles and skin meshes when their shapes or positions change.
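
For the skin coupling, a plain Hookean spring between a skin vertex and its attachment point on a muscle conveys the basic idea; the attachment scheme and constants here are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def muscle_spring_force(p_skin, p_muscle, rest_length, stiffness):
    """Force on a skin vertex from a spring attached to a (deformed)
    muscle point; when the muscle moves, the changed spring length
    drives the skin -- a toy stand-in for the paper's coupling."""
    d = p_muscle - p_skin
    length = np.linalg.norm(d)
    if length < 1e-8:
        return np.zeros(3)
    return stiffness * (length - rest_length) * (d / length)
```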

Contour-based 3D Modeling through Joint Embedding of Shapes and Contours

In this paper, we propose a novel space that jointly embeds both 2D occluding contours and 3D shapes via a variational autoencoder (VAE) and a volumetric autoencoder. Given a dataset of 3D shapes, we extract their occluding contours via projections from random views and use the occluding contours to train the VAE. Then, the obtained continuous embedding space, where each point is a latent vector that represents an occluding contour, can be used to measure the similarity between occluding contours. After that, the volumetric autoencoder is trained to first map 3D shapes onto the embedding space through a supervised learning process and then decode the merged latent vectors of three occluding contours (from three different views) of a 3D shape to its 3D voxel representation. We conduct various experiments and comparisons to demonstrate the usefulness and effectiveness of our method for sketch-based 3D modeling and shape manipulation applications.
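
The decoding step can be pictured as merging three per-view latent codes into a single vector and feeding it to the volumetric decoder, while contour similarity is simply a distance in the shared embedding space. How the merge is performed (concatenation below) is our assumption for illustration:

```python
import numpy as np

def decode_from_contours(z_front, z_side, z_top, voxel_decoder):
    """z_*: latent vectors of three occluding contours from the shared
    embedding space; voxel_decoder maps the merged code to a (D, D, D)
    occupancy grid. Concatenation is one plausible merge, assumed here."""
    z = np.concatenate([z_front, z_side, z_top])
    return voxel_decoder(z)

def contour_distance(z_a, z_b):
    """Similarity between two contours as distance between latents."""
    return float(np.linalg.norm(z_a - z_b))
```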

User-guided 3D reconstruction using multi-view stereo

We present a user-guided system for accessible 3D reconstruction and modeling of real-world objects using multi-view stereo. The system is an interactive tool where the user models the object on top of multiple selected photographs. Our tool helps the user place quads correctly aligned to the photographs using a multi-view stereo algorithm. This algorithm, in combination with user-provided information about topology, visibility, and how to separate foreground from background, creates favorable conditions for successfully reconstructing the object.

The user only needs to manually specify a coarse topology; subdivision and a global optimization algorithm then create an accurate model with the desired mesh density. This global optimization has a higher probability of converging to an accurate result than a fully automatic system.

With our proposed tool, we lower the barrier of entry for creating high-quality 3D reconstructions of real-world objects with a desirable topology. Our interactive tool offloads the most tedious and difficult parts of modeling to the computer, while giving the user control over the most common robustness issues in automatic 3D reconstruction.

The provided workflow can be a preferable alternative to using automatic scanning techniques followed by re-topologization.

Progressive Regularization of Satellite-Based 3D Buildings for Interactive Rendering

Automatic creation of lightweight 3D building models from satellite image data enables large-scale, widespread 3D interactive urban rendering. Towards this goal, we present an inverse procedural modeling method to automatically create building envelopes from satellite imagery. Our key observation is that buildings exhibit regular properties; hence, we can overcome the low-resolution, noisy, and partial building data obtained from satellites by using a two-stage inverse procedural modeling technique. Our method takes in point cloud data obtained from multi-view satellite stereo processing and produces a crisp and regularized building envelope suitable for fast rendering and optional projective texture mapping. Further, our results show highly complete building models with quality superior to that of competing approaches.

Automatic GPU Data Compression and Address Swizzling for CPUs via Modified Virtual Address Translation

We describe how to modify hardware page translation to enable CPU software to access compressed and swizzled GPU data arrays as if they were decompressed and stored in row-major order. In a shared memory system, this allows the CPU to directly access GPU data without copying it or losing the performance and bandwidth benefits of using compression and swizzling on the GPU.
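
For intuition, one common swizzling scheme stores 2D texel data in Morton (Z-order) rather than row-major order. The translation that modified page tables would perform transparently looks, per texel, like this software sketch (coordinate widths and texel size are assumptions of ours):

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of (x, y) to get a Z-order offset."""
    addr = 0
    for i in range(bits):
        addr |= ((x >> i) & 1) << (2 * i)
        addr |= ((y >> i) & 1) << (2 * i + 1)
    return addr

def linear_to_swizzled(linear_addr, width, bytes_per_texel=4):
    """Translate a row-major (CPU-visible) texel address into the
    swizzled (GPU-physical) offset -- done here per texel in software,
    but by address-translation hardware in the paper's proposal."""
    texel = linear_addr // bytes_per_texel
    x, y = texel % width, texel // width
    return morton2d(x, y) * bytes_per_texel
```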

Our method is flexible enough to support a wide variety of existing and future swizzling and compression schemes, including block-based lossless compression that requires per-block meta-data.

Providing automatic compression can improve performance, even without considering the cost of copying data. In our experiments, we observed up to 33% reduction in CPU/memory energy use and up to 35% reduction in CPU computation time.

On Ray Reordering Techniques for Faster GPU Ray Tracing

We study ray reordering as a tool for increasing the performance of existing GPU ray tracing implementations, focusing on reordering that is fully agnostic to the particular trace kernel. We summarize the existing methods for computing ray sorting keys and discuss their properties. We propose a novel modification of a previously proposed method, using termination point estimation, that is well suited to tracing secondary rays. We evaluate the ray reordering techniques in the context of wavefront path tracing using the RTX trace kernels. We show that ray reordering yields significantly higher trace speed on recent GPUs (1.3–2.0×), but that recovering the reordering overhead in the hardware-accelerated trace phase is problematic.
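
A typical sorting key quantizes a representative point of the ray into a coarse grid and interleaves the bits; for secondary rays, the termination-point variant keys on an estimate of where the ray will end. The sketch below shows only the keying step, with the estimate t_est assumed given (grid resolution and key layout are our choices):

```python
def ray_key(origin, direction, t_est, scene_min, scene_extent, bits=10):
    """Morton-style sorting key from the estimated termination point
    origin + t_est * direction. t_est is assumed to come from some
    cheap estimator (e.g., a previous frame)."""
    hi = (1 << bits) - 1
    q = [min(max(int((origin[i] + t_est * direction[i] - scene_min[i])
                     / scene_extent[i] * (hi + 1)), 0), hi) for i in range(3)]
    key = 0
    for b in range(bits):
        for axis in range(3):
            key |= ((q[axis] >> b) & 1) << (3 * b + axis)
    return key

# Sorting rays by this key groups rays that likely traverse similar BVH
# nodes, improving memory coherence during the trace.
```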

Computing Centroidal Voronoi Tessellation Using the GPU

We propose a novel algorithm to compute centroidal Voronoi tessellations using the GPU. It is based on the iterative approach of Lloyd's method and is designed to address its two major challenges: achieving fast convergence with few iterations while keeping the computation within each iteration fast. Our implementation of the algorithm can complete the computation for a large image on the order of hundreds of milliseconds and is faster than all prior work on a state-of-the-art GPU. As such, it is now easier to integrate centroidal Voronoi tessellations into interactive applications.
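
The underlying iteration is plain Lloyd's method: assign each point to its nearest site, then move each site to the centroid of its region. The serial NumPy sketch below shows the algorithm being accelerated, not the paper's GPU implementation:

```python
import numpy as np

def lloyd_cvt(points, sites, iterations=50):
    """points: (P, 2) samples of the domain; sites: (K, 2) generators.
    Brute-force nearest-site assignment; the GPU method replaces both
    steps with much faster parallel equivalents."""
    for _ in range(iterations):
        # Assignment step: label each point with its nearest site.
        d2 = ((points[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each site to the centroid of its region.
        for k in range(len(sites)):
            members = points[labels == k]
            if len(members):
                sites[k] = members.mean(axis=0)
    return sites
```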

RANDM: Random Access Depth Map Compression Using Range-Partitioning and Global Dictionary

We present a novel random-access depth map compression algorithm (RANDM) for interactive rendering. Our compressed representation provides random access to the depth values and enables real-time parallel decompression on commodity hardware. Our method partitions the depth range captured in a given scene into equal-sized intervals and uses this partition to generate three separate components that exhibit higher coherence. Each of these components is processed independently to generate the compressed stream. Our decompression algorithm is simple and performs prefix-sum computations while decoding the entropy-compressed blocks. We have evaluated the performance on large databases of depth maps and obtain a compression ratio of 20–100× with a root-mean-square (RMS) error of 0.05–2 in the disparity values of the depth map. The decompression algorithm is fast, taking about 1 microsecond per block on a single thread of an Intel Xeon CPU. To the best of our knowledge, RANDM is the first depth map compression algorithm that provides random-access capability for interactive applications.
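
A simplified sketch of the range-partitioning idea: each depth splits into an interval index and an in-interval residual, two streams that are individually more coherent than the raw values. The paper's actual scheme produces three components and entropy-codes blocks; the interval count and layout below are our assumptions:

```python
import numpy as np

def partition(depths, z_min, z_max, n_intervals=16):
    """Split depths into (interval index, residual); each stream is more
    coherent than the raw depths, so it compresses better."""
    width = (z_max - z_min) / n_intervals
    idx = np.clip(((depths - z_min) / width).astype(np.int32),
                  0, n_intervals - 1)
    residual = depths - (z_min + idx * width)
    return idx, residual

def reconstruct(idx, residual, z_min, z_max, n_intervals=16):
    """Exact inverse of partition(), enabling random-access decoding."""
    width = (z_max - z_min) / n_intervals
    return z_min + idx * width + residual
```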

The Effect of Lighting, Landmarks and Auditory Cues on Human Performance in Navigating a Virtual Maze

The human visual and auditory systems are attentive to variations in both lighting and sound, and changes in either are used in audiovisual media to draw attention. In video games specifically, many embedded navigational cues are utilised to subtly suggest navigational choices to players. Both lighting and audio cues are commonly utilised by game designers to signal specific events or to draw player focus and thereby influence navigational decisions. We analyse the influence that combinations of landmark, auditory, and illumination cues have on player navigation. A total of 134 participants navigated through a randomly assigned subset of thirty structurally similar virtual mazes with variations of lighting, spatial audio, and landmark cues. The solve times for these mazes were analysed to determine the influence of each individual cue and to evaluate any cue-competition or interaction effects. The findings demonstrate that auditory cues and subtle lighting had distinct effects on navigation and maze solve times, and that interaction and cue-competition effects were also evident.

The Role of the Field Dependence-independence Construct on the Flow-performance Link in Virtual Reality

The flow-performance link is commonly found to be weak in virtual environments (VEs). The weak association model (WAM) suggests that distraction caused by disjointed features may explain this weak association. People characterized by a field-independent (FI) or field-dependent (FD) cognitive style differ in their ability to sustain attention, so the flow-performance link may also differ between them. To explore the role of the field dependence-independence (FDI) construct on the flow-performance link in virtual reality (VR), we developed a VR experimental environment and conducted two empirical studies with it. Study 1 revealed that FD individuals have a higher dispersion of fixations and show a weaker flow-performance link. Next, we provided visual cues that utilize distractors to achieve more task-oriented attention. Study 2 found that these cues strengthen task performance, as well as the flow-performance link of FD individuals, without increasing distraction. This paper helps draw conclusions on the effects of human diversity on the flow-performance link in VEs and suggests ways to design VR systems according to individual characteristics.