ACM Transactions on Graphics (TOG): Vol. 39, No. 6. 2020


SESSION: All about sketches

A benchmark for rough sketch cleanup

Sketching is a foundational step in the design process. Decades of sketch processing research have produced algorithms for 3D shape interpretation, beautification, animation generation, colorization, etc. However, there is a mismatch between sketches created in the wild and the clean, sketch-like input required by these algorithms, preventing their adoption in practice. The recent flurry of sketch vectorization, simplification, and cleanup algorithms could be used to bridge this gap. However, they differ wildly in the assumptions they make on the input and output sketches. We present the first benchmark to evaluate and focus sketch cleanup research. Our dataset consists of 281 sketches obtained in the wild and a curated subset of 101 sketches. For this curated subset along with 40 sketches from previous work, we commissioned manual vectorizations and multiple ground truth cleaned versions by professional artists. The sketches span artistic and technical categories and were created by a variety of artists with different styles. Most sketches have Creative Commons licenses; the rest permit academic use. Our benchmark's metrics measure the similarity of automatically cleaned rough sketches to artist-created ground truth; the ambiguity and messiness of rough sketches; and low-level properties of the output parameterized curves. Our evaluation identifies shortcomings among state-of-the-art cleanup algorithms and discusses open problems for future research.

Sketch2CAD: sequential CAD modeling by sketching in context

We present a sketch-based CAD modeling system, where users create objects incrementally by sketching the desired shape edits, which our system automatically translates to CAD operations. Our approach is motivated by the close similarities between the steps industrial designers follow to draw 3D shapes, and the operations CAD modeling systems offer to create similar shapes. To overcome the strong ambiguity of parsing 2D sketches, we observe that in a sketching sequence, each step makes sense and can be interpreted in the context of what has been drawn before. In our system, this context corresponds to a partial CAD model, inferred in the previous steps, which we feed along with the input sketch to a deep neural network in charge of interpreting how the model should be modified by that sketch. Our deep network architecture then recognizes the intended CAD operation and segments the sketch accordingly, such that a subsequent optimization estimates the parameters of the operation that best fit the segmented sketch strokes. Since no dataset of paired sketching and CAD modeling sequences exists, we train our system by generating synthetic sequences of CAD operations that we render as line drawings. We present a proof-of-concept realization of our algorithm supporting four frequently used CAD operations. Using our system, participants are able to quickly model a large and diverse set of objects, demonstrating Sketch2CAD to be an alternative way of interacting with current CAD modeling systems.

Interactive liquid splash modeling by user sketches

Splashing is one of the most fascinating liquid phenomena in the real world and it is favored by artists to create stunning visual effects, both statically and dynamically. Unfortunately, the generation of complex and specialized liquid splashes is a challenging task and often requires considerable time and effort. In this paper, we present a novel system that synthesizes realistic liquid splashes from simple user sketch input. Our system adopts a conditional generative adversarial network (cGAN) trained with physics-based simulation data to produce raw liquid splash models from input sketches, and then applies model refinement processes to further improve their small-scale details. The system considers not only the trajectory of every user stroke, but also its speed, which makes the splash model simulation-ready with its underlying 3D flow. Compared with simulation-based modeling techniques that rely on trial and error, our system offers flexibility, convenience, and intuition in liquid splash design and editing. We evaluate the usability and the efficiency of our system in an immersive virtual reality environment. Thanks to this system, an amateur user can now generate a variety of realistic liquid splashes in just a few minutes.

Pixelor: a competitive sketching AI agent. So you think you can sketch?

We present the first competitive drawing agent Pixelor that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is the winner. Our AI agent can autonomously sketch a given visual concept, and achieve a recognizable rendition as quickly or faster than a human competitor. The key to victory is for the agent to learn the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting; and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors' strategies, we conducted a further human study with participants being given unlimited thinking time and training in early recognizability by feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. The code and the dataset are available at http://sketchx.ai/pixelor.
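
The optimal-transport loss mentioned above is commonly approximated with entropy-regularized Sinkhorn iterations. The NumPy sketch below is a generic illustration of that idea, not the paper's S2S-WAE loss; the point sets, weights, and the sinkhorn helper are invented for the example.

    import numpy as np

    def sinkhorn(a, b, cost, eps=0.1, n_iter=200):
        """Entropy-regularized optimal transport via Sinkhorn iterations.
        a, b: source/target weights (each sums to 1); cost: pairwise costs."""
        K = np.exp(-cost / eps)                 # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(n_iter):
            v = b / (K.T @ u)                   # scale columns to match b
            u = a / (K @ v)                     # scale rows to match a
        P = u[:, None] * K * v[None, :]         # approximate transport plan
        return np.sum(P * cost)                 # approximate OT cost

    # toy example: distance between two 2D stroke-point sets
    rng = np.random.default_rng(0)
    x, y = rng.random((50, 2)), rng.random((60, 2))
    cost = np.linalg.norm(x[:, None] - y[None, :], axis=-1) ** 2
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    print("entropic OT cost:", sinkhorn(a, b, cost))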

Lifting freehand concept sketches into 3D

We present the first algorithm capable of automatically lifting real-world, vector-format, industrial design sketches into 3D. Targeting real-world sketches raises numerous challenges due to inaccuracies, use of overdrawn strokes, and construction lines. In particular, while construction lines convey important 3D information, they add significant clutter and introduce multiple accidental 2D intersections. Our algorithm exploits the geometric cues provided by the construction lines and lifts them to 3D by computing their intended 3D intersections and depths. Once lifted to 3D, these lines provide valuable geometric constraints that we leverage to infer the 3D shape of other artist-drawn strokes. The core challenge we address is inferring the 3D connectivity of construction and other lines from their 2D projections by separating 2D intersections into 3D intersections and accidental occlusions. We efficiently address this complex combinatorial problem using a dedicated search algorithm that leverages observations about designer drawing preferences, and uses those to explore only the most likely solutions of the 3D intersection detection problem. We demonstrate that our separator outputs are of comparable quality to human annotations, and that the 3D structures we recover enable a range of design editing and visualization applications, including novel view synthesis and 3D-aware scaling of the depicted shape.

Continuous curve textures

Repetitive patterns are ubiquitous in natural and human-made objects, and can be created with a variety of tools and methods. Manual authoring provides an unmatched degree of freedom and control, but can require significant artistic expertise and manual labor. Computational methods can automate parts of the manual creation process, but are mainly tailored for discrete pixels or elements instead of more general continuous structures. We propose an example-based method to synthesize continuous curve patterns from exemplars. Our main idea is to extend prior sample-based discrete element synthesis methods to consider not only sample positions (geometry) but also their connections (topology). Since continuous structures can exhibit higher complexity than discrete elements, we also propose robust, hierarchical synthesis to enhance output quality. Our algorithm can generate a variety of continuous curve patterns fully automatically. For further quality improvement and customization, we also present an autocomplete user interface to facilitate interactive creation and iterative editing. We evaluate our methods and interface via different patterns, ablation studies, and comparisons with alternative methods.

SESSION: Animation: Fluid

An extended cut-cell method for sub-grid liquids tracking with surface tension

Simulating liquid phenomena utilizing Eulerian frameworks is challenging, since highly energetic flows often induce severe topological changes, creating thin and complex liquid surfaces. Thus, capturing structures that are small relative to the grid size becomes intractable, since continually increasing the resolution will scale sub-optimally due to the pressure projection step. Previous methods successfully relied on using higher resolution grids for tracking the liquid surface implicitly; however, this technique comes with drawbacks. The mismatch of pressure samples and surface degrees of freedom will cause artifacts such as hanging blobs and permanent kinks at the liquid-air interface. In this paper, we propose an extended cut-cell method for handling liquid structures that are smaller than a grid cell. At the core of our method is a novel iso-surface Poisson Solver, which converges with second-order accuracy for pressure values while maintaining attractive discretization properties such as symmetric positive definiteness. Additionally, we extend the iso-surface assumption to be also compatible with surface tension forces. Our results show that the proposed method provides a novel framework for handling arbitrarily small splashes that can also correctly interact with objects with complex geometries.

RBF liquids: an adaptive PIC solver using RBF-FD

We introduce a novel liquid simulation approach that combines a spatially adaptive pressure projection solver with the Particle-in-Cell (PIC) method. The solver relies on a generalized version of the Finite Difference (FD) method to approximate the pressure field and its gradients in tree-based grid discretizations, possibly non-graded. In our approach, FD stencils are computed by using meshfree interpolations provided by a variant of Radial Basis Function (RBF) interpolation known as RBF Finite Differences (RBF-FD). This meshfree version of the FD method produces differentiation weights on scattered nodes with high-order accuracy. Our method adapts a quadtree/octree dynamically in a narrow band around the liquid interface, providing an adaptive particle sampling for the PIC advection step. Furthermore, RBF affords an accurate scheme for velocity transfer between the grid and particles, preserving the system's stability and avoiding numerical dissipation. We also present a data structure that connects the spatial subdivision of a quadtree/octree with the topology of its corresponding dual-graph. Our data structure makes the setup of stencils straightforward, allowing it to be updated without rebuilding it from scratch at each time-step. We show the effectiveness and accuracy of our solver by simulating incompressible inviscid fluids and comparing results with regular PIC-based solvers available in the literature.
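
For readers unfamiliar with RBF-FD, the NumPy sketch below computes Laplacian differentiation weights on scattered 2D nodes using a cubic polyharmonic spline with quadratic polynomial augmentation. This is a generic textbook recipe with an invented stencil, not the paper's adaptive tree-based solver or its particular RBF variant.

    import numpy as np

    def rbf_fd_laplacian_weights(center, neighbors):
        """Weights w with sum_i w_i f(x_i) ~ Laplacian f(center), built from
        the polyharmonic spline phi(r) = r^3 plus degree-2 polynomials."""
        pts = np.vstack([center, neighbors]) - center      # local coordinates
        n = len(pts)
        r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        A = r ** 3                                         # RBF matrix
        x, y = pts[:, 0], pts[:, 1]
        P = np.column_stack([np.ones(n), x, y, x * x, x * y, y * y])
        M = np.block([[A, P], [P.T, np.zeros((6, 6))]])    # saddle-point system
        rc = np.linalg.norm(pts, axis=-1)
        rhs = np.concatenate([9.0 * rc,                    # Laplacian of r^3 in 2D
                              [0, 0, 0, 2, 0, 2]])         # Laplacian of monomials
        return np.linalg.solve(M, rhs)[:n]

    # sanity check on f(x, y) = x^2 + y^2, whose Laplacian is 4 everywhere
    rng = np.random.default_rng(0)
    center = np.array([0.5, 0.5])
    neighbors = center + 0.15 * rng.standard_normal((12, 2))
    w = rbf_fd_laplacian_weights(center, neighbors)
    f = np.vstack([center, neighbors])
    print("approximate Laplacian:", (f[:, 0] ** 2 + f[:, 1] ** 2) @ w)   # ~ 4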

An adaptive staggered-tilted grid for incompressible flow simulation

Enabling adaptivity on a uniform Cartesian grid is challenging due to its highly structured grid cells and axis-aligned grid lines. In this paper, we propose a new grid structure - the adaptive staggered-tilted (AST) grid - to conduct adaptive fluid simulations on a regular discretization. The key mechanism underpinning our new grid structure is to allow the emergence of a new set of tilted grid cells from the nodal positions on a background uniform grid. The original axis-aligned cells, in conjunction with the populated axis-tilted cells, jointly function as the geometric primitives to enable adaptivity on a regular spatial discretization. By controlling the states of the tilted cells both temporally and spatially, we can dynamically evolve the adaptive discretizations on an Eulerian domain. Our grid structure preserves almost all the computational merits of a uniform Cartesian grid, including the cache-coherent data layout, the ease of parallelization, and the existence of high-performance numerical solvers. Further, our grid structure can be integrated into other adaptive grid structures, such as an octree or a sparsely populated grid, to accommodate a T-junction-free hierarchy. We demonstrate the efficacy of our AST grid by showing examples of large-scale incompressible flow simulation in domains with irregular boundaries.

Frequency-domain smoke guiding

We propose a simple and efficient method for guiding an Eulerian smoke simulation to match the behavior of a specified velocity field, such as a low-resolution animation of the same scene, while preserving the rich, turbulent details arising in the simulated fluid. Our method works by simply combining the high-frequency component of the simulated fluid velocity with the low-frequency component of the input guiding field. We show how to eliminate the grid-aligned artifacts that appear in naive guiding approaches, and provide a frequency-domain analysis that motivates the use of ideal low-pass and high-pass filters to prevent artificial dissipation of small-scale details. We demonstrate our method on many scenes including those with static and moving obstacles, and show that it produces high-quality results with very little computational overhead.
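
The band-combination idea can be illustrated with a minimal NumPy sketch that keeps the low frequencies of a guide field and the high frequencies of a simulated field using an ideal FFT mask. The cutoff, grid size, and toy fields below are placeholders, and the paper additionally deals with grid-aligned artifacts, filter choice, and obstacles.

    import numpy as np

    def guide_velocity(sim_u, guide_u, cutoff=0.1):
        """Blend low frequencies of the guide with high frequencies of the
        simulation, per velocity component (cutoff in cycles per sample)."""
        ky = np.fft.fftfreq(sim_u.shape[0])[:, None]
        kx = np.fft.fftfreq(sim_u.shape[1])[None, :]
        low = np.sqrt(kx ** 2 + ky ** 2) <= cutoff          # ideal low-pass mask
        blended = np.where(low, np.fft.fft2(guide_u), np.fft.fft2(sim_u))
        return np.real(np.fft.ifft2(blended))

    # toy fields: a smooth guide plus a "simulation" with extra fine detail
    n = 128
    x = np.mgrid[0:n, 0:n][1] / n
    guide = np.sin(2 * np.pi * x)
    sim = guide + 0.2 * np.random.default_rng(1).standard_normal((n, n))
    u = guide_velocity(sim, guide)
    print("retained small-scale detail:", np.sqrt(np.mean((u - guide) ** 2)))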

Semi-analytic boundary handling below particle resolution for smoothed particle hydrodynamics

In this paper, we present a novel semi-analytical boundary handling method for spatially adaptive and divergence-free smoothed particle hydrodynamics (SPH) simulations, including two-way coupling. Our method is consistent under varying particle resolutions and allows for the treatment of boundary features below the particle resolution. We achieve this by first introducing an analytic solution to the interaction of SPH particles with planar boundaries, in 2D and 3D, which we extend to arbitrary boundary geometries using signed distance fields (SDF) to construct locally planar boundaries. Using this boundary-integral-based approach, we can directly evaluate boundary contributions, for any quantity, allowing an easy integration into state-of-the-art simulation methods. Overall, our method improves interactions with small boundary features, readily handles spatially adaptive fluids, preserves particle-boundary interactions across varying resolutions, can directly be implemented in existing SPH methods, and, for non-adaptive simulations, provides a reduction in memory consumption as well as an up to 2× speedup relative to current particle-based boundary handling approaches.

SESSION: Animation: Fluids - phenomenon

Surface-only ferrofluids

We devise a novel surface-only approach for simulating the three dimensional free-surface flow of incompressible, inviscid, and linearly magnetizable ferrofluids. A Lagrangian velocity field is stored on a triangle mesh capturing the fluid's surface. The two key problems associated with the dynamic simulation of the fluid's interesting geometry are the magnetization process transitioning the fluid from a non-magnetic into a magnetic material, and the evaluation of magnetic forces. In this regard, our key observation is that for linearly magnetizable, incompressible ferrofluids, the magnetization and application of magnetic forces only require knowledge about the position of the fluid's boundary. Consequently, our approach employs a boundary element method solving the magnetization problem and evaluating the so-called magnetic pressure required for the force evaluation. The magnetic pressure is added to the Dirichlet boundary condition of a surface-only liquids solver carrying out the dynamical simulation. By only considering the fluid's surface in contrast to its whole volume, we end up with an efficient approach enabling more complex and realistic ferrofluids to be explored in the digital domain without compromising efficiency. Our approach allows for the use of physical parameters leading to accurate simulations as demonstrated in qualitative and quantitative evaluations.

Stormscapes: simulating cloud dynamics in the now

The complex interplay of a number of physical and meteorological phenomena makes simulating clouds a challenging and open research problem. We explore a physically accurate model for simulating clouds and the dynamics of their transitions. We propose first-principle formulations for computing buoyancy and air pressure that allow us to simulate the variations of atmospheric density and varying temperature gradients. Our simulation allows us to model various cloud types, such as cumulus, stratus, and stratocumulus, and their realistic formations caused by changes in the atmosphere. Moreover, we are able to simulate large-scale cloud supercells - clusters of cumulonimbus formations - that are commonly present during thunderstorms. To enable the efficient exploration of these stormscapes, we propose a lightweight set of high-level parameters that allow us to intuitively explore cloud formations and dynamics. Our method allows us to simulate cloud formations of up to about 20 km × 20 km extents at interactive rates. We explore the capabilities of physically accurate and yet interactive cloud simulations by showing numerous examples and by coupling our model with atmosphere measurements of real-time weather services to simulate cloud formations in the now. Finally, we quantitatively assess our model with cloud fraction profiles, a common measure for comparing cloud types.

A moving least square reproducing kernel particle method for unified multiphase continuum simulation

In physically-based animation, pure particle methods are popular due to their simple data structure, easy implementation, and convenient parallelization. As a pure particle-based method using Galerkin discretization, the Moving Least Square Reproducing Kernel Method (MLSRK) was developed in engineering computation as a general numerical tool for solving PDEs. The basic idea of Moving Least Square (MLS) has also been used in computer graphics to estimate the deformation gradient for deformable solids. Based on these previous studies, we propose a multiphase MLSRK framework that animates complex and coupled fluids and solids in a unified manner. Specifically, we use the Cauchy momentum equation and phase field model to uniformly capture the momentum balance and phase evolution/interaction in a multiphase system, and systematically formulate the MLSRK discretization to support general multiphase constitutive models. A series of animation examples are presented to demonstrate the performance of our new multiphase MLSRK framework, including hyperelastic, elastoplastic, viscous, fracturing, and multiphase coupling behaviors.

Simulation, modeling and authoring of glaciers

Glaciers are some of the most visually arresting and scenic elements of cold regions and high mountain landscapes. Although snow-covered terrains have previously received attention in computer graphics, simulating the temporal evolution of glaciers as well as modeling their wide range of features has never been addressed. In this paper, we combine a Shallow Ice Approximation simulation with a procedural amplification process to author high-resolution realistic glaciers. Our multiresolution method allows the interactive simulation of the formation and the evolution of glaciers over hundreds of years. The user can easily modify the environment variables, such as the average temperature or precipitation rate, to control the glacier growth, or directly use brushes to sculpt the ice or bedrock with interactive feedback. Mesoscale and small-scale landforms that are not captured by the glacier simulation, such as crevasses, moraines, seracs, ogives, or icefalls, are synthesized using procedural rules inspired by observations in glaciology and according to the physical parameters derived from the simulation. Our method lends itself to seamless integration into production pipelines to decorate reliefs with glaciers and realistic ice features.

A novel discretization and numerical solver for non-Fourier diffusion

We introduce to computer graphics the C-F diffusion model [Anderson and Tamma 2006; Xue et al. 2018] for diffusion-driven problems, which has several attractive properties: (a) it fundamentally explains diffusion from the perspective of the non-equilibrium statistical mechanical Boltzmann Transport Equation, (b) it allows for a finite propagation speed for diffusion, in contrast to the widely employed Fick's/Fourier's law, and (c) it can capture some of the most characteristic visual aspects of diffusion-driven physics, such as hydrogel swelling, limited diffusive domains for smoke flow, and snowflake and dendrite formation, which span from Fourier-type to non-Fourier-type diffusive phenomena. We propose a unified convection-diffusion formulation using this model that treats both the diffusive quantity and its associated flux as the primary unknowns, and that recovers the traditional Fourier-type diffusion as a limiting case. We design a novel semi-implicit discretization for this formulation on staggered MAC grids and a geometric Multigrid-preconditioned Conjugate Gradients solver for efficient numerical solution. To highlight the efficacy of our method, we demonstrate end-to-end examples of elastic porous media simulated with the Material Point Method (MPM), and diffusion-driven Eulerian incompressible fluids.

SESSION: Animation: Pretty solid physics research

Complementary dynamics

We present a novel approach to enrich arbitrary rig animations with elastodynamic secondary effects. Unlike previous methods which pit rig displacements and physical forces as adversaries against each other, we advocate that physics should complement artists' intentions. We propose optimizing for elastodynamic displacements in the subspace orthogonal to displacements that can be created by the rig. This ensures that the additional dynamic motions do not undo the rig animation. The complementary space is high-dimensional, algebraically constructed without manual oversight, and capable of rich high-frequency dynamics. Unlike prior tracking methods, we do not require extra painted weights, segmentation into fixed and free regions or tracking clusters. Our method is agnostic to the physical model and plugs into non-linear FEM simulations, geometric as-rigid-as-possible energies, or mass-spring models. Our method does not require a particular type of rig and adds secondary effects to skeletal animations, cage-based deformations, wire deformers, motion capture data, and rigid-body simulations.
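
To make the complementary subspace concrete, the NumPy sketch below projects a candidate secondary displacement onto the orthogonal complement of a rig Jacobian's column space with plain least squares. The paper's formulation is richer (the constraint lives inside the simulation and is not just a Euclidean post-projection), and the toy dimensions are invented.

    import numpy as np

    def complementary_component(u_dyn, J):
        """Remove from u_dyn anything the rig could produce, i.e. project onto
        the complement of the column space of the rig Jacobian J."""
        coeffs, *_ = np.linalg.lstsq(J, u_dyn, rcond=None)
        return u_dyn - J @ coeffs

    # toy example: 3 vertices in 2D (6 DOFs) driven by a 2-parameter rig
    rng = np.random.default_rng(0)
    J = rng.standard_normal((6, 2))        # d(vertex positions) / d(rig params)
    u_dyn = rng.standard_normal(6)         # candidate secondary displacement
    u_comp = complementary_component(u_dyn, J)
    print("overlap with rig space:", np.abs(J.T @ u_comp).max())   # ~ 0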

P-cloth: interactive complex cloth simulation on multi-GPU systems using dynamic matrix assembly and pipelined implicit integrators

We present a novel parallel algorithm for cloth simulation that exploits multiple GPUs for fast computation and the handling of very high resolution meshes. To accelerate implicit integration, we describe new parallel algorithms for sparse matrix-vector multiplication (SpMV) and for dynamic matrix assembly on a multi-GPU workstation. Our algorithms use a novel work queue generation scheme for a fat-tree GPU interconnect topology. Furthermore, we present a novel collision handling scheme that uses spatial hashing for discrete and continuous collision detection along with a non-linear impact zone solver. Our parallel schemes can distribute the computation and storage overhead among multiple GPUs and enable us to perform almost interactive simulation on complex cloth meshes, which can hardly be handled on a single GPU due to memory limitations. We have evaluated the performance with two multi-GPU workstations (with 4 and 8 GPUs, respectively) on cloth meshes with 0.5 -- 1.65M triangles. Our approach can reliably handle the collisions and generate vivid wrinkles and folds at 2 -- 5 fps, which is significantly faster than prior cloth simulation systems. We observe almost linear speedups with respect to the number of GPUs.

Higher-order finite elements for embedded simulation

As demands for high-fidelity physics-based animations increase, the need for accurate methods for simulating deformable solids grows. While higher-order finite elements are commonplace in engineering due to their superior approximation properties for many problems, they have gained little traction in the computer graphics community. This may partially be explained by the need for finite element meshes to approximate the highly complex geometry of models used in graphics applications. Due to the additional per-element computational expense of higher-order elements, larger elements are needed, and the error incurred due to the geometry mismatch eradicates the benefits of higher-order discretizations. One solution to this problem is the embedding of the geometry into a coarser finite element mesh. However, to date there is no adequate, practical computational framework that permits the accurate embedding into higher-order elements.

We develop a novel, robust quadrature generation method that generates theoretically guaranteed high-quality sub-cell integration rules of arbitrary polynomial accuracy. The number of quadrature points generated is bounded only by the desired degree of the polynomial, independent of the embedded geometry. Additionally, we build on recent work in the Finite Cell Method (FCM) community so as to tackle the severe ill-conditioning caused by partially filled elements by adapting an Additive-Schwarz-based preconditioner so that it is suitable for use with state-of-the-art non-linear material models from the graphics literature. Together these two contributions constitute a general-purpose framework for embedded simulation with higher-order finite elements.

We finally demonstrate the benefits of our framework in several scenarios, in which second-order hexahedra and tetrahedra clearly outperform their first-order counterparts.

Monolith: a monolithic pressure-viscosity-contact solver for strong two-way rigid-rigid rigid-fluid coupling

We propose Monolith, a monolithic pressure-viscosity-contact solver for more accurately, robustly, and efficiently simulating non-trivial two-way interactions of rigid bodies with inviscid, viscous, or non-Newtonian liquids. Our solver simultaneously handles incompressibility and (optionally) implicit viscosity integration for liquids, contact resolution for rigid bodies, and mutual interactions between liquids and rigid bodies by carefully formulating these as a single unified minimization problem. This monolithic approach reduces or eliminates an array of problematic artifacts, including liquid volume loss, solid interpenetrations, simulation instabilities, artificial "melting" of viscous liquid, and incorrect slip at liquid-solid interfaces. In the absence of solid-solid friction, our minimization problem is a Quadratic Program (QP) with a symmetric positive definite (SPD) matrix and can be treated with a single Linear Complementarity Problem (LCP) solve. When friction is present, we decouple the unified minimization problem into two subproblems so that it can be effectively handled via staggered projections with alternating LCP solves. We also propose a complementary approach for non-Newtonian fluids which can be seamlessly integrated and addressed during the staggered projections. We demonstrate the critical importance of a contact-aware, unified treatment of fluid-solid coupling and the effectiveness of our proposed Monolith solver in a wide range of practical scenarios.

An implicit updated Lagrangian formulation for liquids with large surface energy

We present an updated Lagrangian discretization of surface tension forces for the simulation of liquids with moderate to extreme surface tension effects. The potential energy associated with surface tension is proportional to the surface area of the liquid. We design discrete forces as gradients of this energy with respect to the motion of the fluid over a time step. We show that this naturally allows for inversion of the Hessian of the potential energy required when using Newton's method to solve the systems of nonlinear equations associated with implicit time stepping. The rotational invariance of the surface tension energy makes it non-convex, and we define a definiteness fix procedure as in [Teran et al. 2005]. We design a novel level-set-based boundary quadrature technique to discretize the surface area calculation in our energy-based formulation. Our approach works most naturally with Particle-In-Cell [Harlow 1964] techniques and we demonstrate our approach with a weakly incompressible model for liquid discretized with the Material Point Method [Sulsky et al. 1994]. We show that our approach is essential for allowing efficient implicit numerical integration in the limit of high surface tension materials like liquid metals.

SESSION: Computational holography

Design and fabrication of freeform holographic optical elements

Holographic optical elements (HOEs) have a wide range of applications, including their emerging use in virtual and augmented reality displays, but their design and fabrication have remained largely limited to configurations using simple wavefronts. In this paper, we present a pipeline for the design, optimization, and fabrication of complex, customized HOEs that enhances their imaging performance and enables new applications. In particular, we propose an optimization method for grating vector fields that accounts for the unique selectivity properties of HOEs. We further show how our pipeline can be applied to two distinct HOE fabrication methods. The first uses a pair of freeform refractive elements to manufacture HOEs with high optical quality and precision. The second uses a holographic printer with two wavefront-modulating arms, enabling rapid prototyping. We propose a unified wavefront decomposition framework suitable for both fabrication approaches. To demonstrate the versatility of these methods, we fabricate and characterize a series of specialized HOEs, including an aspheric lens, a head-up display lens, a lens array, and, for the first time, a full-color caustic projection element.

Neural holography with camera-in-the-loop training

Holographic displays promise unprecedented capabilities for direct-view displays as well as virtual and augmented reality applications. However, one of the biggest challenges for computer-generated holography (CGH) is the fundamental tradeoff between algorithm runtime and achieved image quality, which has prevented high-quality holographic image synthesis at fast speeds. Moreover, the image quality achieved by most holographic displays is low, due to the mismatch between the optical wave propagation of the display and its simulated model. Here, we develop an algorithmic CGH framework that achieves unprecedented image fidelity and real-time framerates. Our framework comprises several parts, including a novel camera-in-the-loop optimization strategy that allows us to either optimize a hologram directly or train an interpretable model of the optical wave propagation, and a neural network architecture that represents the first CGH algorithm capable of generating full-color high-quality holographic images at 1080p resolution in real time.
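
For context, the classical baseline that such CGH frameworks improve upon is iterative phase retrieval; the NumPy sketch below runs plain Gerchberg-Saxton for a Fourier-plane hologram. It is not the paper's camera-in-the-loop or neural method, and the target image and iteration count are placeholders.

    import numpy as np

    def gerchberg_saxton(target_amp, n_iter=100):
        """Find an SLM phase whose far-field (FFT) intensity matches
        target_amp**2; the classical alternating-projection baseline."""
        phase = np.random.default_rng(0).uniform(0, 2 * np.pi, target_amp.shape)
        for _ in range(n_iter):
            slm_field = np.exp(1j * phase)                  # unit-amplitude SLM
            img_field = np.fft.fft2(slm_field)              # propagate to image
            img_field = target_amp * np.exp(1j * np.angle(img_field))
            phase = np.angle(np.fft.ifft2(img_field))       # back to SLM plane
        return phase

    target = np.zeros((128, 128))
    target[48:80, 48:80] = 1.0                              # bright square target
    recon = np.abs(np.fft.fft2(np.exp(1j * gerchberg_saxton(target))))
    print("reconstruction MSE:", np.mean((recon / recon.max() - target) ** 2))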

Learned hardware-in-the-loop phase retrieval for holographic near-eye displays

Holography is arguably the most promising technology to provide wide field-of-view compact eyeglasses-style near-eye displays for augmented and virtual reality. However, the image quality of existing holographic displays is far from that of current generation conventional displays, effectively making today's holographic display systems impractical. This gap stems predominantly from the severe deviations of a real holographic display's "unknown" light transport from the idealized approximations used for computing holograms.

In this work, we depart from such approximate "ideal" coherent light transport models for computing holograms. Instead, we learn the deviations of the real display from the ideal light transport from the images measured using a display-camera hardware system. After this unknown light propagation is learned, we use it to compensate for severe aberrations in real holographic imagery. The proposed hardware-in-the-loop approach is robust to spatial, temporal and hardware deviations, and improves the image quality of existing methods qualitatively and quantitatively in SNR and perceptual quality. We validate our approach on a holographic display prototype and show that the method can fully compensate unknown aberrations and erroneous and non-linear SLM phase delays, without explicitly modeling them. As a result, the proposed method significantly outperforms existing state-of-the-art methods in simulation and experimentation - just by observing captured holographic images.

Rendering near-field speckle statistics in scattering media

We introduce rendering algorithms for the simulation of speckle statistics observed in scattering media under coherent near-field imaging conditions. Our work is motivated by the recent proliferation of techniques that use speckle correlations for tissue imaging applications: The ability to simulate the image measurements used by these speckle imaging techniques in a physically-accurate and computationally-efficient way can facilitate the widespread adoption and improvement of these techniques. To this end, we draw inspiration from recently-introduced Monte Carlo algorithms for rendering speckle statistics under far-field conditions (collimated sensor and illumination). We derive variants of these algorithms that are better suited to the near-field conditions (focused sensor and illumination) required by tissue imaging applications. Our approach is based on using Gaussian apodization to approximate the sensor and illumination aperture, as well as von Mises-Fisher functions to approximate the phase function of the scattering material. We show that these approximations allow us to derive closed-form expressions for the focusing operations involved in simulating near-field speckle patterns. As we demonstrate in our experiments, these approximations accelerate speckle rendering simulations by a few orders of magnitude compared to previous techniques, at the cost of negligible bias. We validate the accuracy of our algorithms by reproducing ground truth speckle statistics simulated using wave-optics solvers, and real-material measurements available in the literature. Finally, we use our algorithms to simulate biomedical imaging techniques for focusing through tissue.

SESSION: Computational robotics

RoboGrammar: graph grammar for terrain-optimized robot design

We present RoboGrammar, a fully automated approach for generating optimized robot structures to traverse given terrains. In this framework, we represent each robot design as a graph, and use a graph grammar to express possible arrangements of physical robot assemblies. Each robot design can then be expressed as a sequence of grammar rules. Using only a small set of rules, our grammar can describe hundreds of thousands of possible robot designs. The construction of the grammar limits the design space to designs that can be fabricated. For a given input terrain, the design space is searched to find the top performing robots and their corresponding controllers. We introduce Graph Heuristic Search - a novel method for efficient search of combinatorial design spaces. In Graph Heuristic Search, we explore the design space while simultaneously learning a function that maps incomplete designs (e.g., nodes in the combinatorial search tree) to the best performance values that can be achieved by expanding these incomplete designs. Graph Heuristic Search prioritizes exploration of the most promising branches of the design space. To test our method, we optimize robots for a number of challenging and varied terrains. We demonstrate that RoboGrammar can successfully generate nontrivial robots that are optimized for a single terrain or a combination of terrains.

Learning to manipulate amorphous materials

We present a method of training character manipulation of amorphous materials such as those often used in cooking. Common examples of amorphous materials include granular materials (salt, uncooked rice), fluids (honey), and visco-plastic materials (sticky rice, softened butter). A typical task is to spread a given material out across a flat surface using a tool such as a scraper or knife. We use reinforcement learning to train our controllers to manipulate materials in various ways. The training is performed in a physics simulator that uses position-based dynamics of particles to simulate the materials to be manipulated. The neural network control policy is given observations of the material (e.g. a low-resolution density map), and the policy outputs actions such as rotating and translating the knife. We demonstrate policies that have been successfully trained to carry out the following tasks: spreading, gathering, and flipping. We produce a final animation by using inverse kinematics to guide a character's arm and hand to match the motion of the manipulation tool such as a knife or a frying pan.

ADD: analytically differentiable dynamics for multi-body systems with frictional contact

We present a differentiable dynamics solver that is able to handle frictional contact for rigid and deformable objects within a unified framework. Through a principled mollification of normal and tangential contact forces, our method circumvents the main difficulties inherent to the non-smooth nature of frictional contact. We combine this new contact model with fully-implicit time integration to obtain a robust and efficient dynamics solver that is analytically differentiable. In conjunction with adjoint sensitivity analysis, our formulation enables gradient-based optimization with adaptive trade-offs between simulation accuracy and smoothness of objective function landscapes. We thoroughly analyse our approach on a set of simulation examples involving rigid bodies, visco-elastic materials, and coupled multi-body systems. We furthermore showcase applications of our differentiable simulator to parameter estimation for deformable objects, motion planning for robotic manipulation, trajectory optimization for compliant walking robots, as well as efficient self-supervised learning of control policies.
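
The mollified contact model can be illustrated with a small PyTorch sketch: a softplus penalty smooths the normal force and a tanh smooths Coulomb friction, so the force is differentiable with respect to material parameters. The smoothing functions and constants below are illustrative stand-ins, not the paper's exact mollifiers or its implicit integrator.

    import torch

    def smooth_contact_force(pos, vel, mu, k=1e4, eps=1e-3):
        """Mollified contact against the plane y = 0: softplus replaces the
        hard non-penetration constraint, tanh smooths Coulomb friction."""
        penetration = torch.nn.functional.softplus(-pos[1] / eps) * eps
        fn = k * penetration                           # smooth normal force
        ft = -mu * fn * torch.tanh(vel[0] / eps)       # smooth tangential force
        return torch.stack([ft, fn])

    mu = torch.tensor(0.5, requires_grad=True)         # friction coefficient
    pos = torch.tensor([0.0, -0.002])                  # slightly penetrating point
    vel = torch.tensor([0.3, 0.0])                     # sliding to the right
    force = smooth_contact_force(pos, vel, mu)
    force[0].backward()                                # d(tangential force)/d(mu)
    print("force:", force.detach().numpy(), " dft/dmu:", mu.grad.item())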

A harmonic balance approach for designing compliant mechanical systems with nonlinear periodic motions

We present a computational method for designing compliant mechanical systems that exhibit large-amplitude oscillations. The technical core of our approach is an optimization-driven design tool that combines sensitivity analysis for optimization with the Harmonic Balance Method for simulation. By establishing dynamic force equilibrium in the frequency domain, our formulation avoids the major limitations of existing alternatives: it handles nonlinear forces, side-steps any transient process, and automatically produces periodic solutions. We introduce design objectives for amplitude optimization and trajectory matching that enable intuitive high-level authoring of large-amplitude motions. Our method can be applied to many types of mechanical systems, which we demonstrate through a set of examples involving compliant mechanisms, flexible rod networks, elastic thin shell models, and multi-material solids. We further validate our approach by manufacturing and evaluating several physical prototypes.

Offsite aerial path planning for efficient urban scene reconstruction

With rapid development in UAV technologies, it is now possible to reconstruct large-scale outdoor scenes using only images captured by low-cost drones. The problem, however, becomes how to plan the aerial path for a drone to capture images so that two conflicting goals are optimized: maximizing the reconstruction quality and minimizing mid-air image acquisition effort. Existing approaches either resort to a pre-defined dense and thus inefficient view sampling strategy, or plan the path adaptively but require two onsite flight passes and intensive computation in-between. Hence, using these methods to capture and reconstruct large-scale scenes can be tedious. In this paper, we present an adaptive aerial path planning algorithm that can be done before the site visit. Using only a 2D map and a satellite image of the to-be-reconstructed area, we first compute a coarse 2.5D model for the scene based on the relationship between buildings and their shadows. A novel Max-Min optimization is then proposed to select a minimal set of viewpoints that maximizes reconstructability for a given viewpoint budget. Experimental results on benchmarks show that our planning approach can effectively reduce the number of viewpoints needed compared to the previous state-of-the-art method, while maintaining comparable reconstruction quality. Since no field computation or a second visit is needed, and the view number is also minimized, our approach significantly reduces the time required in the field as well as the off-line computation cost for multi-view stereo reconstruction, making it possible to reconstruct a large-scale urban scene in a short time with moderate effort.

SESSION: Differentiable graphics

Differentiable vector graphics rasterization for editing and learning

We introduce a differentiable rasterizer that bridges the vector graphics and raster image domains, enabling powerful raster-based loss functions, optimization procedures, and machine learning techniques to edit and generate vector content. We observe that vector graphics rasterization is differentiable after pixel prefiltering. Our differentiable rasterizer offers two prefiltering options: an analytical prefiltering technique and a multisampling anti-aliasing technique. The analytical variant is faster but can suffer from artifacts such as conflation. The multisampling variant is still efficient, and can render high-quality images while computing unbiased gradients for each pixel with respect to curve parameters.

We demonstrate that our rasterizer enables new applications, including a vector graphics editor guided by image metrics, a painterly rendering algorithm that fits vector primitives to an image by minimizing a deep perceptual loss function, new vector graphics editing algorithms that exploit well-known image processing methods such as seam carving, and deep generative models that generate vector content from raster-only supervision under a VAE or GAN training objective.
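
To illustrate why differentiable rasterization enables raster-loss-driven editing, the toy PyTorch sketch below fits the center and radius of a sigmoid-smoothed disk to a target raster by gradient descent. The sigmoid smoothing is only a crude stand-in for the paper's analytic prefiltering and multisampling, and all resolutions and hyperparameters are invented.

    import torch

    def soft_disk(center, radius, n=64, sharpness=40.0):
        """Render a disk whose pixel coverage is differentiable with respect
        to its center and radius (sigmoid-smoothed boundary)."""
        ys, xs = torch.meshgrid(torch.linspace(0, 1, n),
                                torch.linspace(0, 1, n), indexing="ij")
        dist = torch.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2)
        return torch.sigmoid(sharpness * (radius - dist))

    target = soft_disk(torch.tensor([0.6, 0.4]), torch.tensor(0.25)).detach()
    center = torch.tensor([0.45, 0.55], requires_grad=True)
    radius = torch.tensor(0.15, requires_grad=True)
    opt = torch.optim.Adam([center, radius], lr=0.02)
    for _ in range(400):
        opt.zero_grad()
        loss = ((soft_disk(center, radius) - target) ** 2).mean()
        loss.backward()
        opt.step()
    print("recovered center/radius:", center.detach().numpy(), radius.item())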

Modular primitives for high-performance differentiable rendering

We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines. Our design supports all crucial operations in a modern graphics pipeline: rasterizing large numbers of triangles, attribute interpolation, filtered texture lookups, as well as user-programmable shading and geometry processing, all in high resolutions. Our modular primitives allow custom, high-performance graphics pipelines to be built directly within automatic differentiation frameworks such as PyTorch or TensorFlow. As a motivating application, we formulate facial performance capture as an inverse rendering problem and show that it can be solved efficiently using our tools. Our results indicate that this simple and straightforward approach achieves excellent geometric correspondence between rendered results and reference imagery.

Differentiable refraction-tracing for mesh reconstruction of transparent objects

Capturing the 3D geometry of transparent objects is a challenging task, ill-suited for general-purpose scanning and reconstruction techniques, since these cannot handle specular light transport phenomena. Existing state-of-the-art methods, designed specifically for this task, either involve a complex setup to reconstruct complete refractive ray paths, or leverage a data-driven approach based on synthetic training data. In either case, the reconstructed 3D models suffer from over-smoothing and loss of fine detail. This paper introduces a novel, high precision, 3D acquisition and reconstruction method for solid transparent objects. Using a static background with a coded pattern, we establish a mapping between the camera view rays and locations on the background. Differentiable tracing of refractive ray paths is then used to directly optimize a 3D mesh approximation of the object, while simultaneously ensuring silhouette consistency and smoothness. Extensive experiments and comparisons demonstrate the superior accuracy of our method.

Match: differentiable material graphs for procedural material capture

We present MATch, a method to automatically convert photographs of material samples into production-grade procedural material models. At the core of MATch is a new library DiffMat that provides differentiable building blocks for constructing procedural materials, and automatic translation of large-scale procedural models, with hundreds to thousands of node parameters, into differentiable node graphs. Combining these translated node graphs with a rendering layer yields an end-to-end differentiable pipeline that maps node graph parameters to rendered images. This facilitates the use of gradient-based optimization to estimate the parameters such that the resulting material, when rendered, matches the target image appearance, as quantified by a style transfer loss. In addition, we propose a deep neural feature-based graph selection and parameter initialization method that efficiently scales to a large number of procedural graphs. We evaluate our method on both rendered synthetic materials and real materials captured as flash photographs. We demonstrate that MATch can reconstruct more accurate, general, and complex procedural materials compared to the state-of-the-art. Moreover, by producing a procedural output, we unlock capabilities such as constructing arbitrary-resolution material maps and parametrically editing the material appearance.

Functional optimization of fluidic devices with differentiable Stokes flow

We present a method for performance-driven optimization of fluidic devices. In our approach, engineers provide a high-level specification of a device using parametric surfaces for the fluid-solid boundaries. They also specify desired flow properties for inlets and outlets of the device. Our computational approach optimizes the boundary of the fluidic device such that its steady-state flow matches desired flow at outlets. In order to deal with computational challenges of this task, we propose an efficient, differentiable Stokes flow solver. Our solver provides explicit access to gradients of performance metrics with respect to the parametric boundary representation. This key feature allows us to couple the solver with efficient gradient-based optimization methods. We demonstrate the efficacy of this approach on designs of five complex 3D fluidic systems. Our approach makes an important step towards practical computational design tools for high-performance fluidic devices.

SESSION: Digital geometry processing

Opening and closing surfaces

We propose a new type of curvature flow for curves in 2D and surfaces in 3D. The flow is inspired by the mathematical morphology opening and closing operations. These operations are classically defined by composition of dilation and erosion operations. In practice, existing methods implemented this way will result in re-discretizing the entire shape, even if some parts of the surface do not change. Instead, our surface-only curvature-based flow moves the surface selectively in areas that should be repositioned. In our triangle mesh discretization, vertices in regions unaffected by the opening or closing will remain exactly in place and do not affect our method's complexity, which is output-sensitive.
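
For reference, the classical image-domain opening and closing that inspire this flow are compositions of erosion and dilation, as in the short SciPy example below. It operates on a binary image rather than on curves or surfaces, and the shapes and structuring element are invented for illustration.

    import numpy as np
    from scipy import ndimage

    # a binary disk with a thin crack and an isolated speck of noise
    n = 64
    y, x = np.mgrid[0:n, 0:n]
    mask = (x - 32) ** 2 + (y - 32) ** 2 < 20 ** 2
    mask[30:34, 10:54] = False            # thin crack through the disk
    mask[5, 5] = True                     # isolated speck
    cross = ndimage.generate_binary_structure(2, 1)

    opened = ndimage.binary_opening(mask, structure=cross, iterations=2)
    closed = ndimage.binary_closing(mask, structure=cross, iterations=2)
    print("speck removed by opening:", not opened[5, 5])     # True
    print("crack filled by closing:", bool(closed[32, 32]))  # True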

Nonlinear spectral geometry processing via the TV transform

We introduce a novel computational framework for digital geometry processing, based upon the derivation of a nonlinear operator associated to the total variation functional. Such an operator admits a generalized notion of spectral decomposition, yielding a convenient multiscale representation akin to Laplacian-based methods, while at the same time avoiding undesirable over-smoothing effects typical of such techniques. Our approach entails accurate, detail-preserving decomposition and manipulation of 3D shape geometry while taking an especially intuitive form: non-local semantic details are well separated into different bands, which can then be filtered and re-synthesized with a straightforward linear step. Our computational framework is flexible, can be applied to a variety of signals, and is easily adapted to different geometry representations, including triangle meshes and point clouds. We showcase our method through multiple applications in graphics, ranging from surface and signal denoising to enhancement, detail transfer, and cubic stylization.

Shape approximation by developable wrapping

We present an automatic tool to approximate curved geometries with piece-wise developable surfaces. At the center of our work is an algorithm that wraps a given 3D input surface with multiple developable patches, each modeled as a discrete orthogonal geodesic net. Our algorithm features a global optimization routine for effectively finding the placement of the developable patches. After wrapping the mesh, we use these patches and a non-linear projection step to generate a surface that approximates the original input, but is also amenable to simple and efficient fabrication techniques thanks to being piecewise developable. Our algorithm allows users to steer the trade-off between approximation power and the number of developable patches used. We demonstrate the effectiveness of our approach on a range of 3D shapes. Compared to previous approaches, our results exhibit a smaller or comparable error with fewer patches to fabricate.

To cut or to fill: a global optimization approach to topological simplification

We present a novel algorithm for simplifying the topology of a 3D shape, which is characterized by the number of connected components, handles, and cavities. Existing methods either limit their modifications to be only cutting or only filling, or take a heuristic approach to decide where to cut or fill. We consider the problem of finding a globally optimal set of cuts and fills that achieve the simplest topology while minimizing geometric changes. We show that the problem can be formulated as graph labelling, and we solve it by a transformation to the Node-Weighted Steiner Tree problem. When tested on examples with varying levels of topological complexity, the algorithm shows notable improvement over existing simplification methods in both topological simplicity and geometric distortions.

Sparse Cholesky updates for interactive mesh parameterization

We present a novel linear solver for interactive parameterization tasks. Our method is based on the observation that quasi-conformal parameterizations of a triangle mesh are largely determined by boundary conditions. These boundary conditions are typically constructed interactively by users, who have to take several artistic and geometric constraints into account while introducing cuts on the geometry. Commonly, the main computational burden in these methods is solving a linear system every time new boundary conditions are imposed. The core of our solver is a novel approach to efficiently update the Cholesky factorization of the linear system to reflect new boundary conditions, thereby enabling a seamless and interactive workflow even for large meshes consisting of several millions of vertices.
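
For intuition about factor updates, the NumPy sketch below implements the textbook dense rank-one Cholesky update and checks it against refactorization. The paper's contribution is performing such updates sparsely, touching only the parts of the factor affected by new boundary conditions, which this toy version does not attempt.

    import numpy as np

    def chol_update(L, v):
        """Given lower-triangular L with A = L L^T, return the factor of
        A + v v^T in O(n^2) instead of refactorizing in O(n^3)."""
        L, v = L.copy(), v.copy()
        for k in range(len(v)):
            r = np.hypot(L[k, k], v[k])
            c, s = r / L[k, k], v[k] / L[k, k]
            L[k, k] = r
            L[k + 1:, k] = (L[k + 1:, k] + s * v[k + 1:]) / c
            v[k + 1:] = c * v[k + 1:] - s * L[k + 1:, k]
        return L

    rng = np.random.default_rng(0)
    B = rng.standard_normal((6, 6))
    A = B @ B.T + 6 * np.eye(6)                        # SPD test matrix
    v = rng.standard_normal(6)
    L_new = chol_update(np.linalg.cholesky(A), v)
    print(np.allclose(L_new @ L_new.T, A + np.outer(v, v)))   # True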

SESSION: Fabrication: Carving, dicing, and printing

VDAC: volume decompose-and-carve for subtractive manufacturing

We introduce carvable volume decomposition for efficient 3-axis CNC machining of 3D freeform objects, where our goal is to develop a fully automatic method to jointly optimize setup and path planning. We formulate our joint optimization as a volume decomposition problem which prioritizes minimizing the number of setup directions while striving for a minimum number of continuously carvable volumes, where a 3D volume is continuously carvable, or simply carvable, if it can be carved with the machine cutter traversing a single continuous path. Geometrically, carvability combines visibility and monotonicity and presents a new shape property which had not been studied before. Given a target 3D shape and the initial material block, our algorithm first finds the minimum number of carving directions by solving a set cover problem. Specifically, we analyze cutter accessibility and select the carving directions based on an assessment of how likely they would lead to a small carvable volume decomposition. Next, to obtain a minimum decomposition based on the selected carving directions efficiently, we narrow down the solution search by focusing on a special kind of points in the residual volume, single access or SA points, which are points that can be accessed from one and only one of the selected carving directions. Candidate carvable volumes are grown starting from the SA points. Finally, we devise an energy term to evaluate the carvable volumes and their combinations, leading to the final decomposition. We demonstrate the performance of our decomposition algorithm on a variety of 2D and 3D examples and evaluate it against the ground truth, where possible, and solutions provided by human experts. Physically machined models are produced where each carvable volume is continuously carved following a connected Fermat spiral toolpath.
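
The direction-selection step is described as a set cover problem; the standard greedy approximation is sketched below in Python. The accessibility sets are invented for illustration, and the paper's selection additionally scores directions by how likely they lead to a small carvable-volume decomposition.

    def greedy_set_cover(universe, candidates):
        """Repeatedly pick the candidate set covering the most still-uncovered
        elements until everything is covered (standard greedy heuristic)."""
        uncovered, chosen = set(universe), []
        while uncovered:
            name, covered = max(candidates.items(),
                                key=lambda kv: len(kv[1] & uncovered))
            chosen.append(name)
            uncovered -= covered
        return chosen

    # toy example: surface points reachable by the cutter from each direction
    points = range(8)
    accessible = {"+z": {0, 1, 2, 3}, "-z": {4, 5, 6, 7},
                  "+x": {2, 3, 4}, "-x": {0, 6, 7}}
    print(greedy_set_cover(points, accessible))        # e.g. ['+z', '-z']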

Reinforced FDM: multi-axis filament alignment with controlled anisotropic strength

The anisotropy of mechanical strength on a 3D printed model can be controlled in a multi-axis 3D printing system as materials can be accumulated along dynamically varied directions. In this paper, we present a new computational framework to generate specially designed layers and toolpaths of multi-axis 3D printing for strengthening a model by aligning filaments along the directions with large stresses. The major challenge is how to effectively decompose a solid into a sequence of strength-aware and collision-free working surfaces. We formulate it as a problem of computing an optimized governing field together with a selected orientation of the fabrication setup. Iso-surfaces of the governing field are extracted as working surface layers for filament alignment. Supporting structures in curved layers are constructed by extrapolating the governing field to enable the fabrication of overhangs. Compared with planar-layer based Fused Deposition Modeling (FDM) technology, models fabricated by our method can withstand up to 6.35× higher loads in experimental tests.

DHFSlicer: double height-field slicing for milling fixed-height materials

3-axis milling enables cheap and precise fabrication of target objects from precut slabs of materials such as wood or stone. However, the space of directly millable shapes is limited, since a 3-axis mill can only carve a height-field (HF) surface during each milling pass, and the shapes' size is bounded by the slab dimensions, one of which, the height, is typically significantly smaller than the other two for many common materials. Extending 3-axis milling of precut slabs to general arbitrarily-sized shapes requires decomposing them into bounded-height 3-axis millable parts, or slices, which can be individually milled and then assembled to form the target object. We present DHFSlicer, a novel decomposition method that satisfies the above constraints and significantly reduces both milling time and material waste compared to alternative approaches. We satisfy the fabrication constraints by partitioning target objects into double height-field (DHF) slices, which can be fabricated using two milling passes: the HF surface accessible from one side is milled first, the slice is then flipped using appropriate fixtures, and then the second, remaining HF surface is milled. DHFSlicer uses an efficient coarse-to-fine decomposition process: it first partitions the inputs into maximally coarse blocks that satisfy a local DHF criterion with respect to per-block milling axes, and then cuts each block into well-sized DHF slices. It minimizes milling time and material waste by keeping the slice count small, and maximizing slice height. We validate our method by embedding it within an end-to-end DHF milling pipeline and fabricating objects from slabs of foam, wood, and MDF; demonstrate that using the obtained slices reduces milling time and material waste by 42% on average compared to existing automatic alternatives; and highlight the benefits of DHFSlicer via extensive ablation studies.

Towards spatially varying gloss reproduction for 3D printing

3D printing technology is a powerful tool for manufacturing complex shapes with high-quality textures. Gloss, next to color and shape, is one of the most salient visual aspects of an object. Unfortunately, printing a wide range of spatially-varying gloss properties using state-of-the-art 3D printers is challenging as it relies on geometrical modifications to achieve the desired appearance. A common post-processing step is to apply off-the-shelf varnishes that modify the final gloss. The main difficulty in automating this process lies in the physical properties of the varnishes, which owe their appearance to a high concentration of large particles and, as such, cannot be easily deposited with current 3D color printers. As a result, fine-grained control of gloss properties using today's 3D printing technologies is limited in terms of both spatial resolution and the range of achievable gloss. We address the above limitations and propose new printing hardware based on piezo-actuated needle valves capable of jetting highly viscous varnishes. Based on the new hardware setup, we present the complete pipeline for controlling the gloss of a given 2.5D object, from printer calibration, through material selection, to the manufacturing of models with spatially-varying reflectance. Furthermore, we discuss the potential integration with current 3D printing technology. Apart from being a viable solution for 3D printing, our method offers an additional and essential benefit of separating color and gloss fabrication which makes the process more flexible and enables high-quality color and gloss reproduction.

Neural light field 3D printing

Modern 3D printers are capable of printing large light-field displays at high resolutions. However, optimizing such displays in full 3D volume for a given light-field imagery is still a challenging task. Existing light field displays optimize over relatively small resolutions using a few co-planar layers in a 2.5D fashion to keep the problem tractable. In this paper, we propose a novel end-to-end optimization approach that encodes input light field imagery as a continuous-space implicit representation in a neural network. This allows fabricating high-resolution, attenuation-based volumetric displays that exhibit the target light fields. In addition, we incorporate the physical constraints of the material into the optimization such that the result can be printed in practice. Our simulation experiments demonstrate that our approach brings significant visual quality improvement compared to the multilayer and uniform grid-based approaches. We validate our simulations with fabricated prototypes and demonstrate that our pipeline is flexible enough to allow fabrications of both planar and non-planar displays.

SESSION: Fabrication: Computational design

Computational design of cold bent glass façades

Cold bent glass is a promising and cost-efficient method for realizing doubly curved glass façades, which are produced by attaching planar glass sheets to curved frames and must keep the occurring stress within safe limits. However, it is very challenging to navigate the design space of cold bent glass panels because of the fragility of the material, which impedes the form finding for practically feasible and aesthetically pleasing cold bent glass façades. We propose an interactive, data-driven approach for designing cold bent glass façades that can be seamlessly integrated into a typical architectural design pipeline. Our method allows non-expert users to interactively edit a parametric surface while providing real-time feedback on the deformed shape and maximum stress of cold bent glass panels. The designs are automatically refined to minimize several fairness criteria, while maximal stresses are kept within glass limits. We achieve interactive frame rates by using a differentiable Mixture Density Network trained from more than a million simulations. Given a curved boundary, our regression model is capable of handling multistable configurations and accurately predicting the equilibrium shape of the panel and its corresponding maximal stress. We show that the predictions are highly accurate and validate our results with a physical realization of a cold bent glass surface.
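
As a sketch of the kind of regression model described above, the snippet below is a minimal Mixture Density Network head in PyTorch: it maps boundary parameters to a K-component Gaussian mixture over (shape coefficients, maximal stress), so that multistable panels yield multimodal predictions. Dimensions, layer sizes, component count, and names are made up; the paper's architecture and training setup may differ.

    import torch
    import torch.nn as nn

    class PanelMDN(nn.Module):
        def __init__(self, in_dim, out_dim, k=3, hidden=128):
            super().__init__()
            self.k, self.out_dim = k, out_dim
            self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
            self.pi = nn.Linear(hidden, k)                    # mixture logits
            self.mu = nn.Linear(hidden, k * out_dim)          # component means
            self.log_sigma = nn.Linear(hidden, k * out_dim)   # diagonal log std devs

        def forward(self, x):
            h = self.body(x)
            pi = torch.log_softmax(self.pi(h), dim=-1)
            mu = self.mu(h).view(-1, self.k, self.out_dim)
            sigma = self.log_sigma(h).view(-1, self.k, self.out_dim).exp()
            return pi, mu, sigma

    def mdn_nll(pi, mu, sigma, y):
        # negative log-likelihood of targets y under the predicted mixture
        log_prob = torch.distributions.Normal(mu, sigma).log_prob(y.unsqueeze(1)).sum(-1)
        return -(torch.logsumexp(pi + log_prob, dim=-1)).mean()

    # toy usage with made-up sizes: 16 boundary parameters -> 10 shape coefficients + 1 stress value
    net = PanelMDN(in_dim=16, out_dim=11)
    x, y = torch.randn(8, 16), torch.randn(8, 11)
    mdn_nll(*net(x), y).backward()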

Freeform quad-based kirigami

Kirigami, the traditional Japanese art of paper cutting and folding, generalizes origami and has initiated new research in material science as well as graphics. In this paper we use its capabilities to perform geometric modeling with corrugated surface representations possessing an isometric unfolding into a planar domain after appropriate cuts are made. We initialize our box-based kirigami structures from orthogonal networks of curves, compute a first approximation of their unfolding via mappings between meshes, and complete the process by global optimization. Besides the modeling capabilities, we also study the interesting geometry of special kirigami structures from the theoretical side. This experimental paper strives to relate unfoldable checkerboard arrangements of boxes to principal meshes, to the transformation theory of discrete differential geometry, and to a version of the Gauss theorema egregium.

Weavecraft: an interactive design and simulation tool for 3D weaving

3D weaving is an emerging technology for manufacturing multilayer woven textiles. In this work, we present Weavecraft: an interactive, simulation-based design tool for 3D weaving. Unlike existing textile software that uses 2D representations for design patterns, we propose a novel weave block representation that helps the user to understand 3D woven structures and to create complex multi-layered patterns. With Weavecraft, users can create blocks either from scratch or by loading traditional weaves, compose the blocks into large structures, and edit the pattern at various scales. Furthermore, users can verify the design with a physically based simulator, which predicts and visualizes the geometric structure of the woven material and reveals potential defects at an interactive rate. We demonstrate a range of results created with our tool, from simple two-layer cloth and well known 3D structures to a more sophisticated design of a 3D woven shoe, and we evaluate the effectiveness of our system via a formative user study.

Freely orientable microstructures for designing deformable 3D prints

Nature offers a marvel of astonishing and rich deformation behaviors. Yet most of the objects we fabricate are comparatively inexpressive: they are either rigid or exhibit simple, homogeneous deformations when interacted with.

We explore the synthesis and fabrication of novel microstructures that mimic the effects of having oriented rigid fibers in an otherwise flexible material: the result is extremely rigid along a transverse direction while being comparatively very flexible in the locally orthogonal plane. By allowing free gradation of the rigidity direction orientation within the object, the microstructures can be designed such that, under deformation, distances along fibers in the volume are preserved while others freely change. Through a simple painting tool, this allows a designer to influence the way the volume reshapes when deformed, and results in a wide range of novel possibilities. Many gradations are possible: local free orientation of the fibers; local control of the overall material rigidity (structure density); local canceling of the effect of the fibers, obtaining a more isotropic material.

Our algorithm to synthesize the structures builds upon procedural texturing. It produces a cellular geometry that can be fabricated reliably despite 3D printing walls at a minimal thickness, allowing prints to be very flexible. The synthesis algorithm is efficient and scales to large volumes.

Appearance-preserving tactile optimization

Textures are often encountered on common objects and surfaces. Many textures combine visual and tactile aspects, each serving important purposes: most obviously, a texture alters the object's appearance or tactile feel, and it also serves visual or tactile identification and improves usability. The tactile feel and visual appearance of objects are often linked, but they may interact in unpredictable ways. Advances in high-resolution 3D printing enable highly flexible control of geometry to permit manipulation of both visual appearance and tactile properties. In this paper, we propose an optimization method to independently control the tactile properties and visual appearance of a texture. Our optimization is enabled by neural network-based models, and allows the creation of textures with a desired tactile feeling while preserving a desired visual appearance at a relatively low computational cost, for use in a variety of applications.

SESSION: Generation and inference from images

SymmetryNet: learning to predict reflectional and rotational symmetries of 3D shapes from single-view RGB-D images

We study the problem of symmetry detection of 3D shapes from single-view RGB-D images, where severely missing data renders geometric detection approaches infeasible. We propose an end-to-end deep neural network which is able to predict both reflectional and rotational symmetries of 3D objects present in the input RGB-D image. Directly training a deep model for symmetry prediction, however, can quickly run into the issue of overfitting. We therefore adopt a multi-task learning approach. Aside from symmetry axis prediction, our network is also trained to predict symmetry correspondences. In particular, given the 3D points present in the RGB-D image, our network outputs for each 3D point its symmetric counterpart corresponding to a specific predicted symmetry. In addition, our network is able to detect, for a given shape, multiple symmetries of different types. We also contribute a benchmark of 3D symmetry detection based on single-view RGB-D images. Extensive evaluation on the benchmark demonstrates the strong generalization ability of our method, in terms of high accuracy of both symmetry axis prediction and counterpart estimation. In particular, our method is robust in handling unseen object instances with large variation in shape, multi-symmetry composition, as well as novel object categories.
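
For reference, the geometric relation underlying counterpart prediction for a reflectional symmetry can be written down directly; the plane parameterization below (unit normal n, offset d, plane n·x + d = 0) is an illustrative choice, not necessarily the network's output format.

    import numpy as np

    def reflect(p, n, d):
        # mirror image of point p across the plane n·x + d = 0
        n = n / np.linalg.norm(n)
        return p - 2.0 * (np.dot(n, p) + d) * n

    p = np.array([0.3, 0.5, 1.0])
    n, d = np.array([1.0, 0.0, 0.0]), -0.1   # the plane x = 0.1
    print(reflect(p, n, d))                  # -> [-0.1  0.5  1. ]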

Monster mash: a single-view approach to casual 3D modeling and animation

We present a new framework for sketch-based modeling and animation of 3D organic shapes that can work entirely in an intuitive 2D domain, enabling a playful, casual experience. Unlike previous sketch-based tools, our approach does not require a tedious part-based multi-view workflow with the explicit specification of an animation rig. Instead, we combine 3D inflation with a novel rigidity-preserving, layered deformation model, ARAP-L, to produce a smooth 3D mesh that is immediately ready for animation. Moreover, the resulting model can be animated from a single viewpoint --- and without the need to handle unwanted inter-penetrations, as required by previous approaches. We demonstrate the benefit of our approach on a variety of examples produced by inexperienced users as well as professional animators. For less experienced users, our single-view approach offers a simpler modeling and animating experience than working in a 3D environment, while for professionals, it offers a quick and casual workspace for ideation.

Dynamic facial asset and rig generation from a single scan

The creation of high-fidelity computer-generated (CG) characters for films and games requires intensive manual labor and involves the creation of comprehensive facial assets that are often captured using complex hardware. To simplify and accelerate this digitization process, we propose a framework for the automatic generation of high-quality dynamic facial models, including rigs which can be readily deployed for artists to polish. Our framework takes a single scan as input to generate a set of personalized blendshapes, dynamic textures, as well as secondary facial components (e.g., teeth and eyeballs). Based on a facial database with over 4,000 scans with pore-level details, varying expressions and identities, we adopt a self-supervised neural network to learn personalized blendshapes from a set of template expressions. We also model the joint distribution between identities and expressions, enabling the inference of a full set of personalized blendshapes with dynamic appearances from a single neutral input scan. Our generated personalized face rig assets are seamlessly compatible with professional production pipelines for facial animation and rendering. We demonstrate a highly robust and effective framework on a wide range of subjects, and showcase high-fidelity facial animations with automatically generated personalized dynamic textures.

iOrthoPredictor: model-guided deep prediction of teeth alignment

In this paper, we present iOrthoPredictor, a novel system to visually predict teeth alignment in photographs. Our system takes a frontal face image of a patient with visible malpositioned teeth along with a corresponding 3D teeth model as input, and generates a facial image with aligned teeth, simulating a real orthodontic treatment effect. The key enabler of our method is an effective disentanglement of an explicit representation of the teeth geometry from the in-mouth appearance, where the accuracy of teeth geometry transformation is ensured by the 3D teeth model while the in-mouth appearance is modeled as a latent variable. The disentanglement enables us to achieve fine-scale geometry control over the alignment while retaining the original teeth appearance attributes and lighting conditions. The whole pipeline consists of three deep neural networks: a U-Net architecture to explicitly extract the 2D teeth silhouette maps representing the teeth geometry in the input photo, a novel multilayer perceptron (MLP) based network to predict the aligned 3D teeth model, and an encoder-decoder based generative model to synthesize the in-mouth appearance conditional on the original teeth appearance and the aligned teeth geometry. Extensive experimental results and a user study demonstrate that iOrthoPredictor is effective in qualitatively predicting teeth alignment, and applicable to the orthodontic industry.

Data-driven authoring of large-scale ecosystems

In computer graphics, populating a large-scale natural scene with plants in a fashion that both reflects the complex interrelationships and diversity present in real ecosystems and is computationally efficient enough to support iterative authoring remains an open problem. Ecosystem simulations embody many of the botanical influences, such as sunlight, temperature, and moisture, but require hours to complete, while synthesis from statistical distributions tends not to capture fine-scale variety and complexity.

Instead, we leverage real-world data and machine learning to derive a canopy height model (CHM) for unseen terrain provided by the user. Trees in the canopy layer are then fitted to the resulting CHM through a constrained iterative process that optimizes for a given distribution of species, and, finally, an understorey layer is synthesised using distributions derived from biome-specific undergrowth simulations. Such a hybrid data-driven approach has the advantage that it incorporates subtle biotic, abiotic, and disturbance factors implicitly encoded in the source data and evidences accepted biological behaviour, such as self-thinning, climatic adaptation, and gap dynamics.

SESSION: Hands and faces

RGB2Hands: real-time tracking of 3D hand interactions from monocular RGB video

Tracking and reconstructing the 3D pose and geometry of two hands in interaction is a challenging problem that has a high relevance for several human-computer interaction applications, including AR/VR, robotics, or sign language recognition. Existing works are either limited to simpler tracking settings (e.g., considering only a single hand or two spatially separated hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions. In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN that regresses multiple complementary pieces of information, including segmentation, dense matchings to a 3D hand model, and 2D keypoint positions, together with newly proposed intra-hand relative depth and inter-hand distance maps. These predictions are subsequently used in a generative model fitting framework in order to estimate pose and shape parameters of a 3D hand model for both hands. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline through an extensive ablation study. Moreover, we demonstrate that our approach offers previously unseen two-hand tracking performance from RGB, and quantitatively and qualitatively outperforms existing RGB-based methods that were not explicitly designed for two-hand interactions. Moreover, our method even performs on-par with depth-based real-time methods.

Constraining dense hand surface tracking with elasticity

Many of the actions that we take with our hands involve self-contact and occlusion: shaking hands, making a fist, or interlacing our fingers while thinking. This use of our hands illustrates the importance of tracking hands through self-contact and occlusion for many applications in computer vision and graphics, but existing methods for tracking hands and faces are not designed to treat the extreme amounts of self-contact and self-occlusion exhibited by common hand gestures. By extending recent advances in vision-based tracking and physically based animation, we present the first algorithm capable of tracking high-fidelity hand deformations through highly self-contacting and self-occluding hand gestures, for both single hands and two hands. By constraining a vision-based tracking algorithm with a physically based deformable model, we obtain an algorithm that is robust to the ubiquitous self-interactions and massive self-occlusions exhibited by common hand gestures, allowing us to track two-hand interactions and some of the most difficult possible configurations of a human hand.

Single image portrait relighting via explicit multiple reflectance channel modeling

Portrait relighting aims to render a face image under different lighting conditions. Existing methods do not explicitly consider some challenging lighting effects such as specular and shadow, and thus may fail in handling extreme lighting conditions. In this paper, we propose a novel framework that explicitly models multiple reflectance channels for single image portrait relighting, including the facial albedo, geometry as well as two lighting effects, i.e., specular and shadow. These channels are finally composed to generate the relit results via deep neural networks. Current datasets do not support learning such multiple reflectance channel modeling. Therefore, we present a large-scale dataset with the ground-truths of the channels, enabling us to train the deep neural networks in a supervised manner. Furthermore, we develop a novel module named Lighting guided Feature Modulation (LFM). In contrast to existing methods which simply incorporate the given lighting in the bottleneck of a network, LFM fuses the lighting by layer-wise feature modulation to deliver more convincing results. Extensive experiments demonstrate that our proposed method achieves better results and is able to generate challenging lighting effects.

MakeItTalk: speaker-aware talking-head animation

We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures. In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.

Speech gesture generation from the trimodal context of text, audio, and speaker identity

For human-like agents, including virtual avatars and social robots, making proper gestures while speaking is crucial in human-agent interaction. Co-speech gestures enhance interaction experiences and make the agents look alive. However, it is difficult to generate human-like gestures due to the lack of understanding of how people gesture. Data-driven approaches attempt to learn gesticulation skills from human demonstrations, but the ambiguous and individual nature of gestures hinders learning. In this paper, we present an automatic gesture generation model that uses the multimodal context of speech text, audio, and speaker identity to reliably generate gestures. By incorporating a multimodal context and an adversarial training scheme, the proposed model outputs gestures that are human-like and that match with speech content and rhythm. We also introduce a new quantitative evaluation metric for gesture generation models. Experiments with the introduced metric and subjective human evaluation showed that the proposed gesture generation model is better than existing end-to-end generation models. We further confirm that our model is able to work with synthesized audio in a scenario where contexts are constrained, and show that different gesture styles can be generated for the same speech by specifying different speaker identities in the style embedding space that is learned from videos of various speakers. All the code and data is available at https://github.com/ai4r/Gesture-Generation-from-Trimodal-Context.

SESSION: Image synthesis with generative models

PIE: portrait image embedding for semantic control

Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, albeit only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.

Neural crossbreed: neural based image metamorphosis

We propose Neural Crossbreed, a feed-forward neural network that can learn a semantic change of input images in a latent space to create the morphing effect. Because the network learns a semantic change, a sequence of meaningful intermediate images can be generated without requiring the user to specify explicit correspondences. In addition, the semantic change learning makes it possible to perform the morphing between the images that contain objects with significantly different poses or camera views. Furthermore, just as in conventional morphing techniques, our morphing network can handle shape and appearance transitions separately by disentangling the content and the style transfer for rich usability. We prepare a training dataset for morphing using a pre-trained BigGAN, which generates an intermediate image by interpolating two latent vectors at an intended morphing value. This is the first attempt to address image morphing using a pre-trained generative model in order to learn semantic transformation. The experiments show that Neural Crossbreed produces high quality morphed images, overcoming various limitations associated with conventional approaches. In addition, Neural Crossbreed can be further extended for diverse applications such as multi-image morphing, appearance transfer, and video frame interpolation.
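
A hedged sketch of how such morphing supervision can be synthesized from a pretrained class-conditional generator is shown below; `G`, the latent and class tensors, and the toy stand-in generator are placeholders rather than the paper's exact data pipeline.

    import torch

    def make_morph_triplet(G, z_a, c_a, z_b, c_b, t):
        img_a = G(z_a, c_a)
        img_b = G(z_b, c_b)
        z_t = (1 - t) * z_a + t * z_b      # linear interpolation of latent codes
        c_t = (1 - t) * c_a + t * c_b      # interpolation of class conditioning
        img_t = G(z_t, c_t)                # supervision target at morphing value t
        return img_a, img_b, img_t

    # toy stand-in generator so the sketch runs without pretrained BigGAN weights
    G = lambda z, c: torch.tanh(z[..., None, None] + c[..., None, None]).expand(-1, 3, 8, 8)
    z_a, z_b = torch.randn(1, 3), torch.randn(1, 3)
    c_a, c_b = torch.randn(1, 3), torch.randn(1, 3)
    img_a, img_b, img_half = make_morph_triplet(G, z_a, c_a, z_b, c_b, t=0.5)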

Face identity disentanglement via latent space mapping

Learning disentangled representations of data is a fundamental problem in artificial intelligence. Specifically, disentangled latent representations allow generative models to control and compose the disentangled factors in the synthesis process. Current methods, however, require extensive supervision and training, or instead, noticeably compromise quality.

In this paper, we present a method that learns how to represent data in a disentangled way, with minimal supervision, manifested solely using available pre-trained networks. Our key insight is to decouple the processes of disentanglement and synthesis, by employing a leading pre-trained unconditional image generator, such as StyleGAN. By learning to map into its latent space, we leverage both its state-of-the-art quality, and its rich and expressive latent space, without the burden of training it.

We demonstrate our approach on the complex and high dimensional domain of human heads. We evaluate our method qualitatively and quantitatively, and exhibit its success with de-identification operations and with temporal identity coherency in image sequences. Through extensive experimentation, we show that our method successfully disentangles identity from other facial attributes, surpassing existing methods, even though they require more training and supervision.

Manga filling style conversion with screentone variational autoencoder

Western color comics and Japanese-style screened manga are two popular comic styles. They mainly differ in the style of region-filling. However, the conversion between the two region-filling styles is very challenging and is currently done manually. In this paper, we identify that the major obstacle in the conversion between the two filling styles stems from the difference between the fundamental properties of screened region-filling and colored region-filling. To resolve this obstacle, we propose a screentone variational autoencoder, ScreenVAE, to map the screened manga to an intermediate domain. This intermediate domain can summarize local texture characteristics and is interpolative. With this domain, we effectively unify the properties of screening and color-filling, and ease the learning for bidirectional translation between screened manga and color comics. To carry out the bidirectional translation, we further propose a network to learn the translation between the intermediate domain and color comics. Our model can generate high-quality screened manga given a color comic, and can generate a color comic that retains the original screening intention of the bitonal manga artist. Several results are shown to demonstrate the effectiveness and convenience of the proposed method. We also demonstrate how the intermediate domain can assist other applications such as manga inpainting and photo-to-comic conversion.

SketchPatch: sketch stylization via seamless patch-level synthesis

The paradigm of image-to-image translation is leveraged for the benefit of sketch stylization via transfer of geometric textural details. Lacking the necessary volumes of data for standard training of translation systems, we advocate for operation at the patch level, where a handful of stylized sketches provide ample mining potential for patches featuring basic geometric primitives. Operating at the patch level necessitates special consideration of full sketch translation, as individual translation of patches with no regard to neighbors is likely to produce visible seams and artifacts at patch borders. Aligned pairs of styled and plain primitives are combined to form input hybrids containing styled elements around the border and plain elements within, and given as input to a seamless translation (ST) generator, whose output patches are expected to reconstruct the fully styled patch. An adversarial addition promotes generalization and robustness to diverse geometries at inference time, forming a simple and effective system for arbitrary sketch stylization, as demonstrated upon a variety of styles and sketches.

SESSION: Learning new viewpoints

Mononizing binocular videos

This paper presents the idea of mononizing binocular videos and a framework to effectively realize it. Mononize means we purposely convert a binocular video into a regular monocular video with the stereo information implicitly encoded in a visual but nearly imperceptible form. Hence, we can impartially distribute and show the mononized video as an ordinary monocular video. Unlike ordinary monocular videos, we can restore from it the original binocular video and show it on a stereoscopic display. To start, we formulate an encoding-and-decoding framework with a pyramidal deformable fusion module to exploit long-range correspondences between the left and right views, a quantization layer to suppress the restoring artifacts, and a compression noise simulation module to resist the compression noise introduced by modern video codecs. Our framework is self-supervised, as we articulate our objective function with loss terms defined on the input: a monocular term for creating the mononized video, an invertibility term for restoring the original video, and a temporal term for frame-to-frame coherence. Further, we conducted extensive experiments to evaluate our generated mononized videos and restored binocular videos for diverse types of images and 3D movies. Quantitative results on both standard metrics and user perception studies show the effectiveness of our method.
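
The structure of that self-supervised objective can be sketched as follows; the encoder/decoder stand-ins, loss weights, and the simple temporal term are illustrative placeholders, not the paper's networks or settings.

    import torch
    import torch.nn.functional as F

    def mononize_loss(E, D, left, right, prev_mono, w_mono=1.0, w_inv=1.0, w_temp=0.1):
        mono = E(left, right)
        loss_mono = F.l1_loss(mono, left)        # mononized frame should look like an ordinary (left) view
        rec_left, rec_right = D(mono)
        loss_inv = F.l1_loss(rec_left, left) + F.l1_loss(rec_right, right)  # invertibility
        loss_temp = F.l1_loss(mono, prev_mono)   # crude stand-in for frame-to-frame coherence
        return w_mono * loss_mono + w_inv * loss_inv + w_temp * loss_temp

    # toy usage with identity-like stand-ins so the sketch runs
    E = lambda l, r: 0.5 * (l + r)
    D = lambda m: (m, m)
    l, r, prev = (torch.rand(1, 3, 16, 16) for _ in range(3))
    print(mononize_loss(E, D, l, r, prev).item())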

Synthesizing light field from a single image with variable MPI and two network fusion

We propose a learning-based approach to synthesize a light field with a small baseline from a single image. We synthesize the novel view images by first using a convolutional neural network (CNN) to promote the input image into a layered representation of the scene. We extend the multiplane image (MPI) representation by allowing the disparity of the layers to be inferred from the input image. We show that, compared to the original MPI representation, our representation models the scenes more accurately. Moreover, we propose to handle the visible and occluded regions separately through two parallel networks. The synthesized images using these two networks are then combined through a soft visibility mask to generate the final results. To effectively train the networks, we introduce a large-scale light field dataset of over 2,000 unique scenes containing a wide range of objects. We demonstrate that our approach synthesizes high-quality light fields on a variety of scenes, better than the state-of-the-art methods.
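
The final fusion step reduces to a per-pixel soft blend; a minimal sketch with hypothetical tensor names is given below.

    import torch

    def fuse_views(visible_rgb, occluded_rgb, soft_mask):
        # soft_mask in [0, 1]: 1 where the visible-region branch should dominate
        return soft_mask * visible_rgb + (1.0 - soft_mask) * occluded_rgb

    vis, occ = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
    mask = torch.rand(1, 1, 32, 32)
    novel_view = fuse_views(vis, occ, mask)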

Learned feature embeddings for non-line-of-sight imaging and recognition

Objects obscured by occluders are considered lost in the images acquired by conventional camera systems, prohibiting both visualization and understanding of such hidden objects. Non-line-of-sight methods (NLOS) aim at recovering information about hidden scenes, which could help make medical imaging less invasive, improve the safety of autonomous vehicles, and potentially enable capturing unprecedented high-definition RGB-D data sets that include geometry beyond the directly visible parts. Recent NLOS methods have demonstrated scene recovery from time-resolved pulse-illuminated measurements encoding occluded objects as faint indirect reflections. Unfortunately, these systems are fundamentally limited by the quartic intensity fall-off for diffuse scenes. With laser illumination limited by eye-safety limits, recovery algorithms must tackle this challenge by incorporating scene priors. However, existing NLOS reconstruction algorithms do not facilitate learning scene priors. Even if they did, datasets that allow for such supervision do not exist, and successful encoder-decoder networks and generative adversarial networks fail for real-world NLOS data. In this work, we close this gap by learning hidden scene feature representations tailored to both reconstruction and recognition tasks such as classification or object detection, while still relying on physical models at the feature level. We overcome the lack of real training data with a generalizable architecture that can be trained in simulation. We learn the differentiable scene representation jointly with the reconstruction task using a differentiable transient renderer in the objective, and demonstrate that it generalizes to unseen classes and unseen real-world scenes, unlike existing encoder-decoder architectures and generative adversarial networks. The proposed method allows for end-to-end training for different NLOS tasks, such as image reconstruction, classification, and object detection, while being memory-efficient and running at real-time rates. We demonstrate hidden view synthesis, RGB-D reconstruction, classification, and object detection in the hidden scene in an end-to-end fashion.

A reduced-precision network for image reconstruction

Neural networks are often quantized to use reduced-precision arithmetic, as it greatly improves their storage and computational costs. This approach is commonly used in image classification and natural language processing applications. However, using a quantized network for the reconstruction of HDR images can lead to a significant loss in image quality. In this paper, we introduce QW-Net, a neural network for image reconstruction, in which close to 95% of the computations can be implemented with 4-bit integers. This is achieved using a combination of two U-shaped networks that are specialized for different tasks, a feature extraction network based on the U-Net architecture, coupled to a filtering network that reconstructs the output image. The feature extraction network has more computational complexity but is more resilient to quantization errors. The filtering network, on the other hand, has significantly fewer computations but requires higher precision. Our network recurrently warps and accumulates previous frames using motion vectors, producing temporally stable results with significantly better quality than TAA, a widely used technique in current games.
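
For intuition, the snippet below shows generic uniform quantization to signed 4-bit integers, the kind of reduced-precision arithmetic such networks rely on; QW-Net's actual scheme may choose scales and rounding differently.

    import torch

    def quantize_int4(x):
        scale = x.abs().max() / 7.0                        # map the observed range onto [-7, 7]
        q = torch.clamp(torch.round(x / scale), -8, 7)     # signed 4-bit range is [-8, 7]
        return q, scale                                    # values representable in 4 bits, plus the scale

    def dequantize(q, scale):
        return q * scale

    x = torch.randn(2, 8)
    q, s = quantize_int4(x)
    print((x - dequantize(q, s)).abs().max().item())       # worst-case quantization error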

SESSION: Learning to move and synthesize

TAP-Net: transport-and-pack using reinforcement learning

We introduce the transport-and-pack (TAP) problem, a frequently encountered instance of real-world packing, and develop a neural optimization solution based on reinforcement learning. Given an initial spatial configuration of boxes, we seek an efficient method to iteratively transport and pack the boxes compactly into a target container. Due to obstruction and accessibility constraints, our problem has to add a new search dimension, i.e., finding an optimal transport sequence, to the already immense search space for packing alone. Using a learning-based approach, a trained network can learn and encode solution patterns to guide the solution of new problem instances instead of executing an expensive online search. In our work, we represent the transport constraints using a precedence graph and train a neural network, coined TAP-Net, using reinforcement learning to reward efficient and stable packing. The network is built on an encoder-decoder architecture, where the encoder employs convolution layers to encode the box geometry and precedence graph and the decoder is a recurrent neural network (RNN) which inputs the current encoder output, as well as the current box packing state of the target container, and outputs the next box to pack, as well as its orientation. We train our network on randomly generated initial box configurations, without supervision, via policy gradients to learn optimal TAP policies to maximize packing efficiency and stability. We demonstrate the performance of TAP-Net on a variety of examples, evaluating the network through ablation studies and comparisons to baselines and alternative network designs. We also show that our network generalizes well to larger problem instances, when trained on small-sized inputs.

Scene mover: automatic move planning for scene arrangement by deep reinforcement learning

We propose a novel approach for automatically generating a move plan for scene arrangement. Given a scene like an apartment with many furniture objects, to transform its layout into another layout, one would need to determine a collision-free move plan. It could be challenging to design this plan manually because the furniture objects may block each other's way if not moved properly, and there is a large, complex search space of move action sequences that grows exponentially with the number of objects. To tackle this challenge, we propose a learning-based approach to generate a move plan automatically. At the core of our approach is a Monte Carlo tree that encodes possible states of the layout, based on which a search is performed to move a furniture object appropriately in the current layout. We trained a policy neural network embedded with an LSTM module for estimating the best actions to take in the expansion step and simulation step of the Monte Carlo tree search process. Leveraging the power of deep reinforcement learning, the network learned how to make such estimations through millions of trials of moving objects. We demonstrated our approach for moving objects under different scenarios and constraints. We also evaluated our approach on synthetic and real-world layouts, comparing its performance with that of humans and other baseline approaches.

ShapeAssembly: learning to generate programs for 3D shape structure synthesis

Manually authoring 3D shapes is difficult and time-consuming; generative models of 3D shapes offer compelling alternatives. Procedural representations are one such possibility: they offer high-quality and editable results but are difficult to author and often produce outputs with limited diversity. On the other extreme are deep generative models: given enough data, they can learn to generate any class of shape but their outputs have artifacts and the representation is not editable.

In this paper, we take a step towards achieving the best of both worlds for novel 3D shape synthesis. First, we propose ShapeAssembly, a domain-specific "assembly-language" for 3D shape structures. ShapeAssembly programs construct shape structures by declaring cuboid part proxies and attaching them to one another, in a hierarchical and symmetrical fashion. ShapeAssembly functions are parameterized with continuous free variables, so that one program structure is able to capture a family of related shapes.
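
As a structural illustration only, the hypothetical Python rendering below mimics the kind of cuboid-assembly program described above; the real ShapeAssembly language has its own syntax and attachment semantics.

    # Hypothetical stand-in for a ShapeAssembly-style program: cuboid part proxies,
    # symmetric attachments, and continuous free variables (the function arguments)
    # that capture a family of related chairs.

    def chair(seat_w=0.5, seat_d=0.5, seat_h=0.06, leg_h=0.4, leg_t=0.05):
        parts = [("seat", seat_w, seat_d, seat_h)]          # (name, width, depth, height)
        attachments = []
        for i, (sx, sz) in enumerate([(0, 0), (1, 0), (0, 1), (1, 1)]):  # 4-fold leg symmetry
            parts.append((f"leg{i}", leg_t, leg_t, leg_h))
            # attach each leg's top face to the matching corner of the seat's bottom face
            attachments.append((f"leg{i}", "top", "seat", "bottom", (sx, sz)))
        return parts, attachments

    parts, attachments = chair(seat_w=0.6, leg_h=0.45)
    for p in parts:
        print(p)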

We show how to extract ShapeAssembly programs from existing shape structures in the PartNet dataset. Then, we train a deep generative model, a hierarchical sequence VAE, that learns to write novel ShapeAssembly programs. Our approach leverages the strengths of each representation: the program captures the subset of shape variability that is interpretable and editable, and the deep generative model captures variability and correlations across shape collections that is hard to express procedurally.

We evaluate our approach by comparing the shapes output by our generated programs to those from other recent shape structure synthesis models. We find that our generated shapes are more plausible and physically-valid than those of other methods. Additionally, we assess the latent spaces of these models, and find that ours is better structured and produces smoother interpolations. As an application, we use our generative model and differentiable program interpreter to infer and fit shape programs to unstructured geometry, such as point clouds.

PhysCap: physically plausible monocular 3D motion capture in real time

Marker-less 3D human motion capture from a single colour camera has seen significant progress. However, it is a very challenging and severely ill-posed problem. In consequence, even the most accurate state-of-the-art approaches have significant limitations. Purely kinematic formulations on the basis of individual joints or skeletons, and the frequent frame-wise reconstruction in state-of-the-art methods greatly limit 3D accuracy and temporal stability compared to multi-view or marker-based motion capture. Further, captured 3D poses are often physically incorrect and biomechanically implausible, or exhibit implausible environment interactions (floor penetration, foot skating, unnatural body leaning and strong shifting in depth), which is problematic for any use case in computer graphics.

We, therefore, present PhysCap, the first algorithm for physically plausible, real-time and marker-less human 3D motion capture with a single colour camera at 25 fps. Our algorithm first captures 3D human poses purely kinematically. To this end, a CNN infers 2D and 3D joint positions, and subsequently, an inverse kinematics step finds space-time coherent joint angles and global 3D pose. Next, these kinematic reconstructions are used as constraints in a real-time physics-based pose optimiser that accounts for environment constraints (e.g., collision handling and floor placement), gravity, and biophysical plausibility of human postures. Our approach employs a combination of ground reaction force and residual force for plausible root control, and uses a trained neural network to detect foot contact events in images. Our method captures physically plausible and temporally stable global 3D human motion, without physically implausible postures, floor penetrations or foot skating, from video in real time and in general scenes. PhysCap achieves state-of-the-art accuracy on established pose benchmarks, and we propose new metrics to demonstrate the improved physical plausibility and temporal stability.

MoGlow: probabilistic and controllable motion synthesis using normalising flows

Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics. This paper introduces a new class of probabilistic, generative, and controllable motion-data models based on normalising flows. Models of this kind can describe highly complex distributions, yet can be trained efficiently using exact maximum likelihood, unlike GANs or VAEs. Our proposed model is autoregressive and uses LSTMs to enable arbitrarily long time-dependencies. Importantly, it is also causal, meaning that each pose in the output sequence is generated without access to poses or control inputs from future time steps; this absence of algorithmic latency is important for interactive applications with real-time motion control. The approach can in principle be applied to any type of motion since it does not make restrictive, task-specific assumptions regarding the motion or the character morphology. We evaluate the models on motion-capture datasets of human and quadruped locomotion. Objective and subjective results show that randomly-sampled motion from the proposed method outperforms task-agnostic baselines and attains a motion quality close to recorded motion capture.

SESSION: Light transport: Methods

Glossy probe reprojection for interactive global illumination

Recent rendering advances dramatically reduce the cost of global illumination. But even with hardware acceleration, complex light paths with multiple glossy interactions are still expensive; our new algorithm stores these paths in precomputed light probes and reprojects them at runtime to provide interactivity. Combined with traditional light maps for diffuse lighting our approach interactively renders all light paths in static scenes with opaque objects. Naively reprojecting probes with glossy lighting is memory-intensive, requires efficient access to the correctly reflected radiance, and exhibits problems at occlusion boundaries in glossy reflections. Our solution addresses all these issues. To minimize memory, we introduce an adaptive light probe parameterization that allocates increased resolution for shinier surfaces and regions of higher geometric complexity. To efficiently sample glossy paths, our novel gathering algorithm reprojects probe texels in a view-dependent manner using efficient reflection estimation and a fast rasterization-based search. Naive probe reprojection often sharpens glossy reflections at occlusion boundaries, due to changes in parallax. To avoid this, we split the convolution induced by the BRDF into two steps: we precompute probes using a lower material roughness and apply an adaptive bilateral filter at runtime to reproduce the original surface roughness. Combining these elements, our algorithm interactively renders complex scenes while fitting in the memory, bandwidth, and computation constraints of current hardware.

Path cuts: efficient rendering of pure specular light transport

In scenes lit with sharp point-like light sources, light can bounce several times on specular materials before getting into our eyes, forming purely specular light paths. However, to our knowledge, rendering such multi-bounce pure specular paths has not been handled in previous work: while many light transport methods have been devised to sample various kinds of light paths, none of them are able to find multi-bounce pure specular light paths from a point light to a pinhole camera. In this paper, we present path cuts to efficiently render such light paths. We use a path space hierarchy combined with interval arithmetic bounds to prune non-contributing regions of path space, and to slice the path space into regions small enough to empirically contain at most one solution. Next, we use an automatic differentiation tool and a Newton-based solver to find an admissible specular path within a given path space region. We demonstrate results on several complex specular configurations, including RR, TT, TRT and TTTT paths.

Slope-space integrals for specular next event estimation

Monte Carlo light transport simulations often lack robustness in scenes containing specular or near-specular materials. Widely used uni- and bidirectional sampling strategies tend to find light paths involving such materials with insufficient probability, producing unusable images that are contaminated by significant variance.

This article addresses the problem of sampling a light path connecting two given scene points via a single specular reflection or refraction, extending the range of scenes that can be robustly handled by unbiased path sampling techniques. Our technique enables efficient rendering of challenging transport phenomena caused by such paths, such as underwater caustics or caustics involving glossy metallic objects.

We derive analytic expressions that predict the total radiance due to a single reflective or refractive triangle with a microfacet BSDF and we show that this reduces to the well known Lambert boundary integral for irradiance. We subsequently show how this can be leveraged to efficiently sample connections on meshes comprised of vast numbers of triangles.

Our derivation builds on the theory of off-center microfacets and involves integrals in the space of surface slopes.

Our approach straightforwardly applies to the related problem of rendering glints with high-resolution normal maps describing specular microstructure. Our formulation alleviates problems raised by singularities in filtering integrals and enables a generalization of previous work to perfectly specular materials. We also extend previous work to the case of GGX distributions and introduce new techniques to improve accuracy and performance.

CPPM: chi-squared progressive photon mapping

We present a novel chi-squared progressive photon mapping algorithm (CPPM) that constructs an estimator by controlling the bandwidth to obtain superior image quality. Our estimator has parametric statistical advantages over prior nonparametric methods. First, we show that when a probability density function of the photon distribution is subject to uniform distribution, the radiance estimation is unbiased under certain assumptions. Next, the local photon distribution is evaluated via a chi-squared test to determine whether the photons follow the hypothesized distribution (uniform distribution) or not. If the statistical test deems that the photons inside the bandwidth are uniformly distributed, bandwidth reduction should be suspended. Finally, we present a pipeline with a bandwidth retention and conditional reduction scheme according to the test results. This pipeline not only accumulates sufficient photons for a reliable chi-squared test, but also guarantees that the estimate converges to the correct solution under our assumptions. We evaluate our method on various benchmarks and observe significant improvement in the running time and rendering quality in terms of mean squared error over prior progressive photon mapping methods.
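
A hedged sketch of the per-estimate decision is given below: photons inside the current bandwidth are binned into equal-area cells and tested against the uniform distribution with a chi-squared goodness-of-fit test, and bandwidth reduction is suspended when uniformity is not rejected. The square kernel, bin count, and significance level are illustrative choices rather than the paper's settings.

    import numpy as np
    from scipy.stats import chi2

    def keep_bandwidth(photon_xy, half_width, bins=4, alpha=0.05):
        # photon_xy: (n, 2) photon offsets from the query point, |offset| <= half_width
        counts, _, _ = np.histogram2d(photon_xy[:, 0], photon_xy[:, 1],
                                      bins=bins, range=[[-half_width, half_width]] * 2)
        expected = photon_xy.shape[0] / (bins * bins)
        statistic = ((counts - expected) ** 2 / expected).sum()
        critical = chi2.ppf(1.0 - alpha, df=bins * bins - 1)
        return statistic <= critical   # True: photons look uniform, so suspend bandwidth reduction

    rng = np.random.default_rng(0)
    uniform = rng.uniform(-1, 1, size=(400, 2))
    clumped = rng.normal(0, 0.2, size=(400, 2)).clip(-1, 1)
    print(keep_bandwidth(uniform, 1.0), keep_bandwidth(clumped, 1.0))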

Path tracing estimators for refractive radiative transfer

Rendering radiative transfer through media with a heterogeneous refractive index is challenging because the continuous refractive index variations result in light traveling along curved paths. Existing algorithms are based on photon mapping techniques, and thus are biased and result in strong artifacts. On the other hand, existing unbiased methods such as path tracing and bidirectional path tracing cannot be used in their current form to simulate media with a heterogeneous refractive index. We change this state of affairs by deriving unbiased path tracing estimators for this problem. Starting from the refractive radiative transfer equation (RRTE), we derive a path-integral formulation, which we use to generalize path tracing with next-event estimation and bidirectional path tracing to the heterogeneous refractive index setting. We then develop an optimization approach based on fast analytic derivative computations to produce the point-to-point connections required by these path tracing algorithms. We propose several acceleration techniques to handle complex scenes (surfaces and volumes) that include participating media with heterogeneous refractive fields. We use our algorithms to simulate a variety of scenes combining heterogeneous refraction and scattering, as well as tissue imaging techniques based on ultrasonic virtual waveguides and lenses. Our algorithms and publicly-available implementation can be used to characterize imaging systems such as refractive index microscopy, schlieren imaging, and acousto-optic imaging, and can facilitate the development of inverse rendering techniques for related applications.

SESSION: Light transport: Sampling

Deep combiner for independent and correlated pixel estimates

Monte Carlo integration is an efficient method to solve a high-dimensional integral in light transport simulation, but it typically produces noisy images due to its stochastic nature. Many existing methods, such as image denoising and gradient-domain reconstruction, aim to mitigate this noise by introducing some form of correlation among pixels. While those existing methods reduce noise, they are known to still suffer from method-specific residual noise or systematic errors. We propose a unified framework that reduces such remaining errors. Our framework takes a pair of images, one with independent estimates, and the other with the corresponding correlated estimates. Correlated pixel estimates are generated by various existing methods such as denoising and gradient-domain rendering. Our framework then combines the two images via a novel combination kernel. We model our combination kernel as a weighting function with a deep neural network that exploits the correlation among pixel estimates. To improve the robustness of our framework for outliers, we additionally propose an extension to handle multiple image buffers. The results demonstrate that our unified framework can successfully reduce the error of existing methods while treating them as black-boxes.

Neural control variates

We propose neural control variates (NCV) for unbiased variance reduction in parametric Monte Carlo integration. So far, the core challenge of applying the method of control variates has been finding a good approximation of the integrand that is cheap to integrate. We show that a set of neural networks can face that challenge: a normalizing flow that approximates the shape of the integrand and another neural network that infers the solution of the integral equation. We also propose to leverage a neural importance sampler to estimate the difference between the original integrand and the learned control variate. To optimize the resulting parametric estimator, we derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice. When applied to light transport simulation, neural control variates are capable of matching the state-of-the-art performance of other unbiased approaches, while providing means to develop more performant, practical solutions. Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
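
For context, the classical control variate estimator that such a method builds on has the textbook form below, with g the control variate, G its integral, and p the sampling density; the abstract's approach represents g with a normalizing flow and infers G with a second network.

    \langle F \rangle \;=\; G \;+\; \frac{1}{N}\sum_{i=1}^{N} \frac{f(x_i) - g(x_i)}{p(x_i)},
    \qquad x_i \sim p, \qquad G = \int g(x)\,\mathrm{d}x .

The estimator stays unbiased for any g whose integral G is evaluated exactly, and its variance shrinks as g approaches f.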

Screen-space blue-noise diffusion of Monte Carlo sampling error via hierarchical ordering of pixels

We present a novel technique for diffusing Monte Carlo sampling error as a blue noise in screen space. We show that automatic diffusion of sampling error can be achieved by ordering the pixels in a way that preserves locality, such as Morton's Z-ordering, and assigning the samples to the pixels from successive sub-sequences of a single low-discrepancy sequence, thus securing well-distributed samples for each pixel, local neighborhoods, and the whole image. We further show that a blue-noise distribution of the error is attainable by scrambling the Z-ordering to induce isotropy. We present an efficient technique to implement this hierarchical scrambling by defining a context-free grammar that describes infinite self-similar lookup trees. Our concept is scalable to arbitrary image resolutions, sample dimensions, and sample count, and supports progressive and adaptive sampling.
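
A minimal sketch of the ordering-and-assignment idea, without the hierarchical scrambling, is shown below; `van_der_corput` is a simple stand-in for whatever low-discrepancy sequence is actually used.

    def morton_index(x, y, bits=16):
        # interleave the bits of the pixel coordinates (Z-order / Morton code)
        idx = 0
        for b in range(bits):
            idx |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
        return idx

    def van_der_corput(i, base=2):
        # radical-inverse sequence used here as a 1D low-discrepancy stand-in
        f, r = 1.0, 0.0
        while i > 0:
            f /= base
            r += f * (i % base)
            i //= base
        return r

    def pixel_samples(x, y, spp):
        start = morton_index(x, y) * spp     # contiguous sub-sequence of one global sequence
        return [van_der_corput(start + i) for i in range(spp)]

    print(pixel_samples(3, 5, spp=4))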

Unbiased warped-area sampling for differentiable rendering

Differentiable rendering computes derivatives of the light transport equation with respect to arbitrary 3D scene parameters, and enables various applications in inverse rendering and machine learning. We present an unbiased and efficient differentiable rendering algorithm that does not require explicit boundary sampling. We apply the divergence theorem to the derivative of the rendering integral to convert the boundary integral into an area integral. We rewrite the converted area integral to a form that is suitable for Monte Carlo rendering. We then develop an efficient Monte Carlo sampling algorithm for solving the area integral. Our method can be easily plugged into a traditional path tracer and does not require dedicated data structures for sampling boundaries.

We analyze the convergence properties through bias-variance metrics, and demonstrate our estimator's advantages over existing methods for some synthetic inverse rendering examples.

Path differential-informed stratified MCMC and adaptive forward path sampling

Markov Chain Monte Carlo (MCMC) rendering is extensively studied, yet it remains largely unused in practice. We propose solutions to several practicability issues, opening up path space MCMC to become an adaptive sampling framework around established Monte Carlo (MC) techniques. We address non-uniform image quality by deriving an analytic target function for image-space sample stratification. The function is based on a novel connection between variance and path differentials, allowing analytic variance estimates for MC samples, with potential uses in other adaptive algorithms outside MCMC. We simplify these estimates down to simple expressions using only quantities known in any MC renderer. We also address the issue that most existing MCMC renderers rely on bi-directional path tracing and reciprocal transport, which can be too costly and/or too complex in practice. Instead, we apply our theoretical framework to optimize an adaptive MCMC algorithm that only uses forward path construction. Notably, we construct our algorithm by adapting (with minimal changes) a full-featured path tracer into a single-path state space Markov Chain, bridging another gap between MCMC and existing MC techniques.

SESSION: Meticulous meshes

Bijective projection in a shell

We introduce an algorithm to convert a self-intersection free, orientable, and manifold triangle mesh T into a generalized prismatic shell equipped with a bijective projection operator to map T to a class of discrete surfaces contained within the shell whose normals satisfy a simple local condition. Properties can be robustly and efficiently transferred between these surfaces using the prismatic layer as a common parametrization domain.

The combination of the prismatic shell construction and corresponding projection operator is a robust building block readily usable in many downstream applications, including the solution of PDEs, displacement maps synthesis, Boolean operations, tetrahedral meshing, geometric textures, and nested cages.

Conforming weighted Delaunay triangulations

Given a set of points together with a set of simplices we show how to compute weights associated with the points such that the weighted Delaunay triangulation of the point set contains the simplices, if possible. For a given triangulated surface, this process provides a tetrahedral mesh conforming to the triangulation, i.e. solves the problem of meshing the triangulated surface without inserting additional vertices. The restriction to weighted Delaunay triangulations ensures that the orthogonal dual mesh is embedded, facilitating common geometry processing tasks.

We show that the existence of a single simplex in a weighted Delaunay triangulation for given vertices amounts to a set of linear inequalities, one for each vertex. This means that the number of inequalities for a given triangle mesh is quadratic in the number of mesh elements, making the naive approach impractical. We devise an algorithm that incrementally selects a small subset of inequalities, repeatedly updating the weights, until the weighted Delaunay triangulation contains all constrained simplices or the problem becomes infeasible. Applying this algorithm to a range of triangle meshes commonly used in graphics demonstrates that many of them admit a conforming weighted Delaunay triangulation, in contrast to conforming or constrained Delaunay triangulations, which require additional vertices to split the input primitives.
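
The loop below is a structural sketch of this incremental scheme under stated assumptions: an active set of linear inequalities on the weights is grown by an oracle that reports violated constraints, and the weights are re-solved after each addition with scipy's LP solver. The geometric oracle (deciding which inequalities certify that each constrained simplex is weighted-Delaunay) is the paper's actual machinery and is replaced here by a toy stand-in over a fixed constraint pool.

    import numpy as np
    from scipy.optimize import linprog

    # Structural sketch: keep a small active set of linear inequalities A w <= b on the
    # weights w, solve for w, ask an oracle for inequalities the current weights violate,
    # and repeat. The geometric oracle is the paper's machinery; it is replaced here by a
    # toy stand-in over a fixed constraint pool.

    def solve_weights(n, active_A, active_b):
        # minimize a dummy objective subject to the active inequalities; weights are kept
        # nonnegative only so this toy LP stays bounded
        res = linprog(np.ones(n), A_ub=np.array(active_A), b_ub=np.array(active_b),
                      bounds=[(0, None)] * n, method="highs")
        return res.x if res.success else None

    def incremental_weights(n, oracle, max_iters=100):
        active_A, active_b = [], []
        w = np.zeros(n)
        for _ in range(max_iters):
            violated = oracle(w)              # inequalities the current weights break
            if not violated:
                return w                      # all constrained simplices are present
            for a, b in violated:
                active_A.append(a)
                active_b.append(b)
            w = solve_weights(n, active_A, active_b)
            if w is None:
                return None                   # infeasible: no conforming weighted DT exists
        return None

    # toy oracle over a fixed pool of inequalities a.w <= b (placeholder for the geometry)
    pool = [(np.array([1.0, -1.0, 0.0]), -1.0), (np.array([0.0, 1.0, -1.0]), -0.5)]
    oracle = lambda w: [(a, b) for a, b in pool if a @ w > b + 1e-9]
    print(incremental_weights(3, oracle))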

You can find geodesic paths in triangle meshes by just flipping edges

This paper introduces a new approach to computing geodesics on polyhedral surfaces---the basic idea is to iteratively perform edge flips, in the same spirit as the classic Delaunay flip algorithm. This process also produces a triangulation conforming to the output geodesics, which is immediately useful for tasks in geometry processing and numerical simulation. More precisely, our FlipOut algorithm transforms a given sequence of edges into a locally shortest geodesic while avoiding self-crossings (formally: it finds a geodesic in the same isotopy class). The algorithm is guaranteed to terminate in a finite number of operations; practical runtimes are on the order of a few milliseconds, even for meshes with millions of triangles. The same approach is easily applied to curves beyond simple paths, including closed loops, curve networks, and multiply-covered curves. We explore how the method facilitates tasks such as straightening cuts and segmentation boundaries, computing geodesic Bézier curves, extending the notion of constrained Delaunay triangulations (CDT) to curved surfaces, and providing accurate boundary conditions for partial differential equations (PDEs). Evaluation on challenging datasets such as Thingi10k indicates that the method is both robust and efficient, even for low-quality triangulations.
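
The snippet below illustrates only the local criterion that drives such a flipping pass, under the usual geodesic condition: at an interior path vertex, if the total surface angle of the triangle corners on one side of the path is below pi, the path is not locally shortest and can be straightened through that wedge. The flat fan of neighbor positions is a toy example; the FlipOut operation itself is not reproduced.

    import numpy as np

    # Local test behind an edge-flip shortening pass: sum the surface angles of the triangle
    # corners on one side of the path at an interior vertex v; a wedge angle below pi means
    # the path can be shortened by flipping edges inside that wedge.

    def corner_angle(v, a, b):
        # angle at vertex v in triangle (v, a, b)
        u, w = a - v, b - v
        cosang = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
        return np.arccos(np.clip(cosang, -1.0, 1.0))

    def wedge_angle(v, fan):
        # 'fan' lists the neighbor vertices on one side of the path, ordered from the
        # incoming path vertex to the outgoing one
        return sum(corner_angle(v, fan[i], fan[i + 1]) for i in range(len(fan) - 1))

    # toy example: a flat fan around v, with the path entering from fan[0] and leaving to fan[-1]
    v = np.array([0.0, 0.0, 0.0])
    fan = [np.array([1.0, 0.0, 0.0]),
           np.array([0.5, 0.8, 0.0]),
           np.array([-1.0, 0.4, 0.0])]
    theta = wedge_angle(v, fan)
    print(theta, "shortenable" if theta < np.pi else "locally shortest on this side")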

Fast and robust mesh arrangements using floating-point arithmetic

We introduce a novel algorithm to transform any generic set of triangles in 3D space into a well-formed simplicial complex. Intersecting elements in the input are correctly identified, subdivided, and connected to arrange a valid configuration, leading to a topologically sound partition of the space into piece-wise linear cells. Our approach does not require the exact coordinates of intersection points to calculate the resulting complex. We represent any intersection point as an unevaluated combination of input vertices. We then extend the recently introduced concept of indirect predicates [Attene 2020] to define all the necessary geometric tests that, by construction, are both exact and efficient since they fully exploit the floating-point hardware. This design makes our method robust and guaranteed correct, while being virtually as fast as non-robust floating-point based implementations. Compared with existing robust methods, our algorithm offers a number of advantages: it is much faster, has a better memory layout, scales well on extremely challenging models, and allows fully exploiting modern multi-core hardware with a parallel implementation. We thoroughly tested our method on thousands of meshes, concluding that it consistently outperforms prior art. We also demonstrate its usefulness in various applications, such as computing efficient mesh booleans, Minkowski sums, and volume meshes.
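
The sketch below conveys the "never evaluate the intersection point" idea in a simplified form: an intersection is stored as an unevaluated combination of input vertices, and an orientation predicate is computed exactly on demand. Exact rational arithmetic (Python Fractions) is used purely as a stand-in; the paper's indirect predicates achieve the same exactness with filtered floating-point arithmetic for speed.

    from fractions import Fraction as F

    # An intersection is kept as an unevaluated combination of input vertices, and geometric
    # predicates are evaluated exactly on demand. Plain rationals are used here only as a
    # simple exact stand-in for the paper's filtered floating-point indirect predicates.

    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

    class ImplicitPoint:
        """Intersection of segment (p, q) with the supporting plane of triangle (a, b, c)."""
        def __init__(self, p, q, a, b, c):
            self.p, self.q, self.a, self.b, self.c = p, q, a, b, c
        def coords(self):
            n = cross(sub(self.b, self.a), sub(self.c, self.a))
            t = F(dot(n, sub(self.a, self.p)), dot(n, sub(self.q, self.p)))
            return tuple(pi + t * (qi - pi) for pi, qi in zip(self.p, self.q))

    def orient3d(a, b, c, d):
        # sign of det[b-a, c-a, d-a]; any argument may be an ImplicitPoint
        pts = [x.coords() if isinstance(x, ImplicitPoint) else tuple(map(F, x))
               for x in (a, b, c, d)]
        a, b, c, d = pts
        det = dot(cross(sub(b, a), sub(c, a)), sub(d, a))
        return (det > 0) - (det < 0)

    # toy usage: the segment (0,0,-1)-(0,0,1) pierces the plane z = 0 exactly at the origin
    ip = ImplicitPoint((0, 0, -1), (0, 0, 1), (0, 0, 0), (3, 0, 0), (0, 3, 0))
    print(ip.coords(), orient3d((1, 0, 0), (0, 1, 0), (0, 0, 1), ip))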

SESSION: Modeling and capturing appearances

A practical ply-based appearance model of woven fabrics

Simulating the appearance of woven fabrics is challenging due to the complex interplay of lighting between the constituent yarns and fibers. Conventional surface-based models lack the fidelity and details for producing realistic close-up renderings. Micro-appearance models, on the other hand, can produce highly detailed renderings by depicting fabrics fiber-by-fiber, but become expensive when handling large pieces of clothing. Further, neither surface-based nor micro-appearance models have been shown in practice to match measurements of complex anisotropic reflection and transmission simultaneously.

In this paper, we introduce a practical appearance model for woven fabrics. We model the structure of a fabric at the ply level and simulate the local appearance of fibers making up each ply. Our model accounts for both reflection and transmission of light and is capable of matching physical measurements better than prior methods, including fiber-based techniques. Compared to existing micro-appearance models, our model is lightweight and scales to large pieces of clothing.

A wave optics based fiber scattering model

Existing fiber scattering models in rendering are all based on tracing rays through fiber geometry, but for small fibers diffraction and interference are non-negligible, so relying on ray optics can result in appearance errors. This paper presents the first wave optics based fiber scattering model, introducing an azimuthal scattering function that comes from a full wave simulation. Solving Maxwell's equations for a straight fiber of constant cross section illuminated by a plane wave reduces to solving for a 3D electromagnetic field in a 2D domain, and our fiber scattering simulator solves this 2.5D problem efficiently using the boundary element method (BEM). From the resulting fields we compute extinction, absorption, and far-field scattering distributions, which we use to simulate shadowing and scattering by fibers in a path tracer. We validate our path tracer against the wave simulation and the simulation against a measurement of diffraction from a single textile fiber. Our results show that our approach can reproduce a wide range of fibers with different sizes, cross sections, and material properties, including textile fibers, animal fur, and human hair. The renderings include color effects, softening of sharp features, and strong forward scattering that are not predicted by traditional ray-based models, though the two approaches produce similar appearance for complex fiber assemblies under many conditions.

A general framework for pearlescent materials

The unique and visually mesmerizing appearance of pearlescent materials has made them an indispensable ingredient in a diverse array of applications including packaging, ceramics, printing, and cosmetics. In contrast to their natural counterparts, such synthetic examples of pearlescence are created by dispersing microscopic interference pigments within a dielectric resin. The resulting space of materials comprises an enormous range of different phenomena ranging from smooth lustrous appearance reminiscent of pearl to highly directional metallic gloss, along with a gradual change in color that depends on the angle of observation and illumination. All of these properties arise due to a complex optical process involving multiple scattering from platelets characterized by wave-optical interference. This article introduces a flexible model for simulating the optics of such pearlescent 3D microstructures. Following a thorough review of the properties of currently used pigments and manufacturing-related effects that influence pearlescence, we propose a new model which expands the range of appearance that can be represented, and closely reproduces the behavior of measured materials, as we show in our comparisons. Using our model, we conduct a systematic study of the parameter space and its relationship to different aspects of pearlescent appearance. We observe that several previously ignored parameters have a substantial impact on the material's optical behavior, including the multi-layered nature of modern interference pigments, correlations in the orientation of pigment particles, and variability in their properties (e.g. thickness). The utility of a general model for pearlescence extends far beyond computer graphics: inverse and differentiable approaches to rendering are increasingly used to disentangle the physics of scattering from real-world observations. Our approach could inform such reconstructions to enable the predictive design of tailored pearlescent materials.

MaterialGAN: reflectance capture using a generative SVBRDF model

We address the problem of reconstructing spatially-varying BRDFs from a small set of image measurements. This is a fundamentally under-constrained problem, and previous work has relied on using various regularization priors or on capturing many images to produce plausible results. In this work, we present MaterialGAN, a deep generative convolutional network based on StyleGAN2, trained to synthesize realistic SVBRDF parameter maps. We show that MaterialGAN can be used as a powerful material prior in an inverse rendering framework: we optimize in its latent representation to generate material maps that match the appearance of the captured images when rendered. We demonstrate this framework on the task of reconstructing SVBRDFs from images captured under flash illumination using a hand-held mobile phone. Our method succeeds in producing plausible material maps that accurately reproduce the target images, and outperforms previous state-of-the-art material capture methods in evaluations on both synthetic and real data. Furthermore, our GAN-based latent space allows for high-level semantic material editing operations such as generating material variations and material morphing.
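
A hedged sketch of the latent-space optimization loop described above is given below. The tiny generator and render functions are dummy placeholders, not the MaterialGAN generator or the authors' differentiable renderer; they only make the structure (optimize a latent code so that renderings of the generated maps match the captures) runnable on its own.

    import torch

    # Latent-space inverse rendering sketch. The real system uses a pretrained StyleGAN2-based
    # generator of SVBRDF maps and a physically based differentiable renderer; both are
    # replaced by tiny dummy modules so the optimization structure runs on its own.

    torch.manual_seed(0)

    generator = torch.nn.Sequential(           # dummy stand-in for the MaterialGAN generator
        torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 4 * 16 * 16))

    def render(maps, light):                   # dummy stand-in for a differentiable renderer
        maps = maps.view(4, 16, 16)
        return maps[:3] * (0.5 + 0.5 * light) + 0.1 * maps[3:]

    captures = [torch.rand(3, 16, 16) for _ in range(5)]   # "photos" under 5 flash conditions
    lights = [torch.rand(1) for _ in range(5)]

    w = torch.zeros(64, requires_grad=True)    # latent code being optimized
    opt = torch.optim.Adam([w], lr=1e-2)
    for step in range(200):
        opt.zero_grad()
        maps = generator(w)                    # latent -> SVBRDF parameter maps
        loss = sum(torch.nn.functional.l1_loss(render(maps, L), I)
                   for I, L in zip(captures, lights))
        loss.backward()
        opt.step()
    print(float(loss))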

Mixed integer ink selection for spectral reproduction

We introduce a novel ink selection method for spectral printing. The ink selection algorithm takes a spectral image and a set of inks as input, and selects a subset of those inks that results in optimal spectral reproduction. We put forward an optimization formulation that searches a huge combinatorial space based on mixed integer programming. We show that solving this optimization in the conventional reflectance space is intractable. The main insight of this work is to solve our problem in the spectral absorbance space with a linearized formulation. The proposed ink selection copes with large-size problems for which previous methods are hopeless. We demonstrate the effectiveness of our method in a concrete setting by lifelike reproduction of handmade paintings. For a successful spectral reproduction of high-resolution paintings, we explore their spectral absorbance estimation, efficient coreset representation, and accurate data-driven reproduction.
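
As a simplified illustration of why absorbance space makes the problem tractable, the sketch below converts reflectances with a Beer-Lambert-style log transform, under which a mixture's absorbance is approximately a linear combination of the ink absorbances, so the reproduction error for any candidate ink subset reduces to a least-squares fit. The brute-force subset enumeration stands in for the paper's mixed integer program, and the transform and random spectra are simplifying assumptions rather than the paper's model.

    import itertools
    import numpy as np

    # Absorbance-space linearization sketch: evaluate each candidate ink subset by an
    # (unconstrained) least-squares fit of target absorbances. The enumeration stands in
    # for the paper's mixed integer program.

    rng = np.random.default_rng(1)
    wavelengths = 31                                             # e.g. 400-700 nm in 10 nm steps
    ink_R = rng.uniform(0.05, 0.95, size=(12, wavelengths))      # reflectances of 12 inks
    target_R = rng.uniform(0.05, 0.95, size=(200, wavelengths))  # spectra to reproduce

    to_absorbance = lambda R: -np.log(R)                         # simple linearizing transform
    A_inks = to_absorbance(ink_R)
    A_targets = to_absorbance(target_R)

    def subset_error(idx):
        # unconstrained least-squares fit of the targets with the chosen inks (the paper
        # additionally constrains the mixing weights)
        basis = A_inks[list(idx)].T                              # (wavelengths, k)
        coeffs, *_ = np.linalg.lstsq(basis, A_targets.T, rcond=None)
        return np.linalg.norm(basis @ coeffs - A_targets.T)

    k = 4                                                        # number of inks to select
    best = min(itertools.combinations(range(len(ink_R)), k), key=subset_error)
    print("selected inks:", best)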

SESSION: Neural rendering

Layered neural rendering for retiming people in video

We present a method for retiming people in an ordinary, natural video --- manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video altogether. We achieve these effects computationally via a dedicated learning-based layered video representation, where each frame in the video is decomposed into separate RGBA layers, representing the appearance of different people in the video. A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate---e.g., shadows, reflections, and motion of loose clothing. The layers can be individually retimed and recombined into a new video, allowing us to achieve realistic, high-quality renderings of retiming effects for real-world videos depicting complex actions and involving multiple individuals, including dancing, trampoline jumping, or group running.
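
The sketch below shows the editing step such a layered representation enables, assuming the per-person RGBA layers are already given (random arrays here stand in for the network's decomposition): each layer gets its own time remapping, and the retimed layers are recomposited back-to-front with the standard "over" operator.

    import numpy as np

    # Retiming sketch: per-person RGBA layers, each with its own time remapping, composited
    # back-to-front over the background. Layers and background are random placeholders for
    # the decomposition the paper's network produces.

    T, H, W = 30, 4, 4
    rng = np.random.default_rng(0)
    background = rng.random((T, H, W, 3))
    layers = [rng.random((T, H, W, 4)) for _ in range(2)]     # RGBA per person, per frame

    retimings = [lambda t: t,                    # person 0: unchanged
                 lambda t: min(2 * t, T - 1)]    # person 1: played at double speed

    def over(dst_rgb, src_rgba):
        a = src_rgba[..., 3:]
        return src_rgba[..., :3] * a + dst_rgb * (1.0 - a)

    output = np.empty_like(background)
    for t in range(T):
        frame = background[t]
        for layer, remap in zip(layers, retimings):           # back-to-front compositing
            frame = over(frame, layer[remap(t)])
        output[t] = frame
    print(output.shape)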

X-Fields: implicit neural view-, light- and time-image interpolation

We suggest to represent an X-Field ---a set of 2D images taken across different view, time or illumination conditions, i.e., video, lightfield, reflectance fields or combinations thereof---by learning a neural network (NN) to map their view, time or light coordinates to 2D images. Executing this NN at new coordinates results in joint view, time or light interpolation. The key idea to make this workable is a NN that already knows the "basic tricks" of graphics (lighting, 3D projection, occlusion) in a hard-coded and differentiable form. The NN represents the input to that rendering as an implicit map, that for any view, time, or light coordinate and for any pixel can quantify how it will move if view, time or light coordinates change (Jacobian of pixel position with respect to view, time, illumination, etc.). Our X-Field representation is trained for one scene within minutes, leading to a compact set of trainable parameters and hence real-time navigation in view, time and illumination.

Deferred neural lighting: free-viewpoint relighting from unstructured photographs

We present deferred neural lighting, a novel method for free-viewpoint relighting from unstructured photographs of a scene captured with handheld devices. Our method leverages a scene-dependent neural rendering network for relighting a rough geometric proxy with learnable neural textures. Key to making the rendering network lighting aware are radiance cues: global illumination renderings of a rough proxy geometry of the scene for a small set of basis materials and lit by the target lighting. As such, the light transport through the scene is never explicitly modeled, but resolved at rendering time by a neural rendering network. We demonstrate that the neural textures and neural renderer can be trained end-to-end from unstructured photographs captured with a double hand-held camera setup that concurrently captures the scene while being lit by only one of the cameras' flash lights. In addition, we propose a novel augmentation refinement strategy that exploits the linearity of light transport to extend the relighting capabilities of the neural rendering network to support other lighting types (e.g., environment lighting) beyond the lighting used during acquisition (i.e., flash lighting). We demonstrate our deferred neural lighting solution on a variety of real-world and synthetic scenes exhibiting a wide range of material properties, light transport effects, and geometrical complexity.
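
The augmentation exploiting linearity can be sketched as below: because light transport is linear in the illumination, a nonnegative combination of images captured under individual flash positions is itself a valid image under the correspondingly combined lighting, yielding synthetic training pairs for lighting conditions that were never captured. The arrays are placeholders, and the exact way the paper forms and uses such combinations may differ.

    import numpy as np

    # Linearity-of-light-transport augmentation sketch: randomly combine single-flash captures
    # (same viewpoint) to simulate a multi-light capture and its combined lighting descriptor.
    # All arrays are placeholders for real data.

    rng = np.random.default_rng(0)
    flash_images = rng.random((8, 64, 64, 3))      # same viewpoint, 8 different flash positions
    flash_lights = np.eye(8)                       # one-hot descriptors of the 8 flash lights

    w = rng.random(8)                              # random nonnegative mixing weights
    augmented_image = np.tensordot(w, flash_images, axes=1)    # simulated multi-light capture
    augmented_light = flash_lights.T @ w                       # its combined lighting descriptor
    print(augmented_image.shape, augmented_light)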

Deep relightable textures: volumetric performance capture with neural rendering

The increasing demand for 3D content in augmented and virtual reality has motivated the development of volumetric performance capture systems such as the Light Stage. Recent advances are pushing free-viewpoint relightable videos of dynamic human performances closer to photorealistic quality. However, despite significant efforts, these sophisticated systems are limited by reconstruction and rendering algorithms which do not fully model complex 3D structures and higher order light transport effects such as global illumination and sub-surface scattering. In this paper, we propose a system that combines traditional geometric pipelines with a neural rendering scheme to generate photorealistic renderings of dynamic performances under desired viewpoint and lighting. Our system leverages deep neural networks that model the classical rendering process to learn implicit features that represent the view-dependent appearance of the subject independent of the geometry layout, allowing for generalization to unseen subject poses and even novel subject identity. Detailed experiments and comparisons demonstrate the efficacy and versatility of our method to generate high-quality results, significantly outperforming the existing state-of-the-art solutions.

Light stage super-resolution: continuous high-frequency relighting

The light stage has been widely used in computer graphics for the past two decades, primarily to enable the relighting of human faces. By capturing the appearance of the human subject under different light sources, one obtains the light transport matrix of that subject, which enables image-based relighting in novel environments. However, due to the finite number of lights in the stage, the light transport matrix only represents a sparse sampling on the entire sphere. As a consequence, relighting the subject with a point light or a directional source that does not coincide exactly with one of the lights in the stage requires interpolation and resampling the images corresponding to nearby lights, and this leads to ghosting shadows, aliased specularities, and other artifacts. To ameliorate these artifacts and produce better results under arbitrary high-frequency lighting, this paper proposes a learning-based solution for the "super-resolution" of scans of human faces taken from a light stage. Given an arbitrary "query" light direction, our method aggregates the captured images corresponding to neighboring lights in the stage, and uses a neural network to synthesize a rendering of the face that appears to be illuminated by a "virtual" light source at the query location. This neural network must circumvent the inherent aliasing and regularity of the light stage data that was used for training, which we accomplish through the use of regularized traditional interpolation methods within our network. Our learned model is able to produce renderings for arbitrary light directions that exhibit realistic shadows and specular highlights, and is able to generalize across a wide variety of subjects. Our super-resolution approach enables more accurate renderings of human subjects under detailed environment maps, or the construction of simpler light stages that contain fewer light sources while still yielding comparable quality renderings as light stages with more densely sampled lights.
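
The sketch below shows only the classical aggregation baseline implied above, under our own simplifying assumptions: for a query light direction, the images of the nearest stage lights are blended with distance-based weights. The learned network that removes the resulting ghosting and aliasing, and the regularized interpolation actually used inside it, are not reproduced; the directions and images are synthetic placeholders.

    import numpy as np

    # Nearest-light aggregation sketch: gather the images of the k physical lights closest to
    # the query direction and blend them with inverse-distance weights. Directions and face
    # images are random placeholders.

    rng = np.random.default_rng(0)
    num_lights = 32
    dirs = rng.standard_normal((num_lights, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)          # light directions on the sphere
    images = rng.random((num_lights, 16, 16, 3))                 # one face image per stage light

    def blend_nearest(query, k=3):
        query = query / np.linalg.norm(query)
        d = 1.0 - dirs @ query                                   # angular distance proxy
        idx = np.argsort(d)[:k]
        w = 1.0 / (d[idx] + 1e-6)
        w /= w.sum()
        return np.tensordot(w, images[idx], axes=1)

    print(blend_nearest(np.array([0.0, 0.0, 1.0])).shape)        # (16, 16, 3)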

SESSION: Shape analysis

DeformSyncNet: deformation transfer via synchronized shape deformation spaces

Shape deformation is an important component in any geometry processing toolbox. The goal is to enable intuitive deformations of single or multiple shapes or to transfer example deformations to new shapes while preserving the plausibility of the deformed shape(s). Existing approaches assume access to point-level or part-level correspondence or establish them in a preprocessing phase, thus limiting the scope and generality of such approaches. We propose DeformSyncNet, a new approach that allows consistent and synchronized shape deformations without requiring explicit correspondence information. Technically, we achieve this by encoding deformations into a class-specific idealized latent space while decoding them into an individual, model-specific linear deformation action space, operating directly in 3D. The underlying encoding and decoding are performed by specialized (jointly trained) neural networks. By design, the inductive bias of our networks results in a deformation space with several desirable properties, such as path invariance across different deformation pathways, which are then also approximately preserved in real space. We qualitatively and quantitatively evaluate our framework against multiple alternative approaches and demonstrate improved performance.

Discovering pattern structure using differentiable compositing

Patterns, which are collections of elements arranged in regular or near-regular arrangements, are an important graphic art form and widely used due to their elegant simplicity and aesthetic appeal. When a pattern is encoded as a flat image without the underlying structure, manually editing the pattern is tedious and challenging as one has to both preserve the individual element shapes and their original relative arrangements. State-of-the-art deep learning frameworks that operate at the pixel level are unsuitable for manipulating such patterns. Specifically, these methods can easily disturb the shapes of the individual elements or their arrangement, and thus fail to preserve the latent structures of the input patterns. We present a novel differentiable compositing operator using pattern elements and use it to discover structures, in the form of a layered representation of graphical objects, directly from raw pattern images. This operator allows us to adapt current deep learning based image methods to effectively handle patterns. We evaluate our method on a range of patterns and demonstrate superiority in the context of pattern manipulations when compared against state-of-the-art pixel- or point-based alternatives.

MeshWalker: deep mesh understanding by random walks

Most attempts to represent 3D shapes for deep learning have focused on volumetric grids, multi-view images and point clouds. In this paper we look at the most popular representation of 3D shapes in computer graphics---a triangular mesh---and ask how it can be utilized within deep learning. The few attempts to answer this question propose to adapt convolutions and pooling to suit Convolutional Neural Networks (CNNs). This paper proposes a very different approach, termed MeshWalker, to learn the shape directly from a given mesh. The key idea is to represent the mesh by random walks along the surface, which "explore" the mesh's geometry and topology. Each walk is organized as a list of vertices, which in some manner imposes regularity on the mesh. The walk is fed into a Recurrent Neural Network (RNN) that "remembers" the history of the walk. We show that our approach achieves state-of-the-art results for two fundamental shape analysis tasks: shape classification and semantic segmentation. Furthermore, even a very small number of examples suffices for learning. This is highly important, since large datasets of meshes are difficult to acquire.
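
A minimal sketch of the two ingredients, assuming a toy tetrahedron mesh and an untrained GRU as placeholders: a random walk is generated from vertex adjacency alone, and the sequence of 3D displacements along the walk is fed to an RNN whose final state is classified.

    import numpy as np
    import torch

    # (1) a random walk over mesh vertices using only adjacency information, and
    # (2) an RNN that consumes the walk as 3D displacements and classifies the shape from its
    # final hidden state. The tetrahedron and the untrained GRU are placeholders.

    verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

    adj = {i: set() for i in range(len(verts))}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            adj[u].add(v)
            adj[v].add(u)

    def random_walk(start, length, rng):
        walk = [start]
        for _ in range(length - 1):
            walk.append(rng.choice(sorted(adj[walk[-1]])))
        return walk

    rng = np.random.default_rng(0)
    walk = random_walk(0, 16, rng)
    deltas = np.diff(verts[walk], axis=0)                 # displacement between consecutive steps

    gru = torch.nn.GRU(input_size=3, hidden_size=32, batch_first=True)
    head = torch.nn.Linear(32, 10)                        # e.g. 10 shape classes
    x = torch.from_numpy(deltas).unsqueeze(0)             # (batch=1, steps, 3)
    _, h = gru(x)
    logits = head(h[-1])
    print(logits.shape)                                   # torch.Size([1, 10])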

MapTree: recovering multiple solutions in the space of maps

In this paper we propose an approach for computing multiple high-quality near-isometric dense correspondences between a pair of 3D shapes. Our method is fully automatic and does not rely on user-provided landmarks or descriptors. This allows us to analyze the full space of maps and extract multiple diverse and accurate solutions, rather than optimizing for a single optimal correspondence as done in most previous approaches. To achieve this, we propose a compact tree structure based on the spectral map representation for encoding and enumerating possible rough initializations, and a novel efficient approach for refining them to dense pointwise maps. This leads to a new method capable of both producing multiple high-quality correspondences across shapes and revealing the symmetry structure of a shape without a priori information. In addition, we demonstrate through extensive experiments that our method is robust and results in more accurate correspondences than state-of-the-art for shape matching and symmetry detection.

Chordal decomposition for spectral coarsening

We introduce a novel solver to significantly reduce the size of a geometric operator while preserving its spectral properties at the lowest frequencies. We use chordal decomposition to formulate a convex optimization problem which allows the user to control the operator sparsity pattern. This allows for a trade-off between the spectral accuracy of the operator and the cost of its application. We efficiently minimize the energy with a change of variables and achieve state-of-the-art results on spectral coarsening. Our solver further enables novel applications including volume-to-surface approximation and detaching the operator from the mesh, i.e., one can produce a mesh tailor-made for visualization and optimize an operator separately for computation.

SESSION: VR and real-time techniques

OmniPhotos: casual 360° VR photography

Virtual reality headsets are becoming increasingly popular, yet it remains difficult for casual users to capture immersive 360° VR panoramas. State-of-the-art approaches require capture times of usually far more than a minute and are often limited in their supported range of head motion. We introduce OmniPhotos, a novel approach for quickly and casually capturing high-quality 360° panoramas with motion parallax. Our approach requires a single sweep with a consumer 360° video camera as input, which takes less than 3 seconds to capture with a rotating selfie stick or 10 seconds handheld. This is an order of magnitude faster than any previous VR photography approach supporting motion parallax. We improve the visual rendering quality of our OmniPhotos by alleviating vertical distortion using a novel deformable proxy geometry, which we fit to a sparse 3D reconstruction of captured scenes. In addition, the 360° input views significantly expand the available viewing area, and thus the range of motion, compared to previous approaches. We have captured more than 50 OmniPhotos and show video results for a large variety of scenes. We will make our code available.

Imperceptible manipulation of lateral camera motion for improved virtual reality applications

Virtual Reality (VR) systems increase immersion by reproducing users' movements in the real world. However, several works have shown that this real-to-virtual mapping does not need to be precise in order to convey a realistic experience. Being able to alter this mapping has many potential applications, since achieving an accurate real-to-virtual mapping is not always possible due to limitations in the capture or display hardware, or in the physical space available. In this work, we measure detection thresholds for lateral translation gains of virtual camera motion in response to the corresponding head motion under natural viewing, and in the absence of locomotion, so that virtual camera movement can be either compressed or expanded while these manipulations remain undetected. Finally, we propose three applications for our method, addressing three key problems in VR: improving 6-DoF viewing for captured 360° footage, overcoming physical constraints, and reducing simulator sickness. We have further validated our thresholds and evaluated our applications by means of additional user studies confirming that our manipulations remain imperceptible, and showing that (i) compressing virtual camera motion reduces visible artifacts in 6-DoF, hence improving perceived quality, (ii) virtual expansion allows for completion of virtual tasks within a reduced physical space, and (iii) simulator sickness may be alleviated in simple scenarios when our compression method is applied.
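
The manipulation itself is simple enough to sketch directly; the gain interval below is a hypothetical placeholder, not the thresholds measured in the paper.

    # Applying a lateral translation gain: the virtual camera reproduces the user's lateral
    # head offset scaled by a gain, which remains imperceptible as long as it stays inside the
    # measured detection thresholds. The interval below is a placeholder, not the paper's data.

    GAIN_MIN, GAIN_MAX = 0.9, 1.2      # hypothetical imperceptibility interval for the gain

    def virtual_head_offset(real_offset_m, gain):
        assert GAIN_MIN <= gain <= GAIN_MAX, "gain outside the imperceptible range"
        return gain * real_offset_m

    print(virtual_head_offset(0.05, 1.1))   # 5 cm of real motion rendered as 5.5 cm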

Egocentric videoconferencing

We introduce a method for egocentric videoconferencing that enables hands-free video calls, for instance by people wearing smart glasses or other mixed-reality devices. Videoconferencing portrays valuable non-verbal communication and facial expression cues, but usually requires a front-facing camera. Using a frontal camera in a hands-free setting when a person is on the move is impractical. Even holding a mobile phone camera in front of the face while sitting for a long duration is not convenient. To overcome these issues, we propose a low-cost wearable egocentric camera setup that can be integrated into smart glasses. Our goal is to mimic a classical video call, and therefore, we transform the egocentric perspective of this camera into a front-facing video. To this end, we employ a conditional generative adversarial neural network that learns a transition from the highly distorted egocentric views to frontal views common in videoconferencing. Our approach learns to transfer expression details directly from the egocentric view without using a complex intermediate parametric expressions model, as used by related face reenactment methods. We successfully handle subtle expressions not easily captured by parametric blendshape-based solutions, e.g., tongue movement, eye movements, eye blinking, strong expressions and depth-varying movements. To gain control over the rigid head movements in the target view, we condition the generator on synthetic renderings of a moving neutral face. This allows us to synthesize results at different head poses. Our technique produces temporally smooth, video-realistic renderings in real time using a video-to-video translation network in conjunction with a temporal discriminator. We demonstrate the improved capabilities of our technique by comparing against related state-of-the-art approaches.

Optimizing depth perception in virtual and augmented reality through gaze-contingent stereo rendering

Virtual and augmented reality (VR/AR) displays crucially rely on stereoscopic rendering to enable perceptually realistic user experiences. Yet, existing near-eye display systems ignore the gaze-dependent shift of the no-parallax point in the human eye. Here, we introduce a gaze-contingent stereo rendering technique that models this effect and conduct several user studies to validate its effectiveness. Our findings include experimental validation of the location of the no-parallax point, which we then use to demonstrate significant improvements of disparity and shape distortion in a VR setting, and consistent alignment of physical and digitally rendered objects across depths in optical see-through AR. Our work shows that gaze-contingent stereo rendering improves perceptual realism and depth perception of emerging wearable computing systems.

QuickETC2: fast ETC2 texture compression using luma differences

Compressed textures are indispensable in most 3D graphics applications to reduce memory traffic and increase performance. For higher-quality graphics, the number and size of textures in an application have continuously increased. Additionally, the ETC2 texture format, which is mandatory in OpenGL ES 3.0, OpenGL 4.3, and Android 4.3 (and later versions), requires more complex texture compression than the traditional ETC1 format. As a result, texture compression becomes more and more time-consuming.

To accelerate ETC2 compression, we introduce two new compression techniques, named QuickETC2. The first technique is an early compression-mode decision scheme. Instead of testing all ETC1/2 modes to compress a texel block, we select proper modes for each block by exploiting the luma difference of the block to reduce unnecessary compression overhead. The second technique is a fast luma-based T- and H-mode compression method. When clustering each texel into two groups, we replace the 3D RGB space with the 1D luma space and quickly find the two groups that have the minimum luma differences. We also selectively perform the T- or H-mode and reduce its distance candidates, according to the luma differences of each group. We have implemented both techniques with AVX2 intrinsics to exploit SIMD parallelism. According to our experiments, QuickETC2 can compress more than 2000 1K×1K-sized images per second on an octa-core CPU.
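
A sketch of the early mode-decision idea follows, with hypothetical thresholds and mode groupings rather than the cut-offs derived in the paper: the luma range of a 4×4 block rules out ETC1/2 modes before any of them is evaluated.

    import numpy as np

    # Early compression-mode decision sketch: a block's luma range routes it to a subset of
    # ETC1/2 modes. The thresholds and mode groupings are illustrative; the paper derives its
    # own cut-offs and also accelerates the T-/H-mode clustering itself using luma.

    LOW_T, HIGH_T = 8, 48              # hypothetical luma-difference thresholds (0-255 scale)

    def luma(block_rgb):               # Rec.601-style luma approximation
        r, g, b = block_rgb[..., 0], block_rgb[..., 1], block_rgb[..., 2]
        return 0.299 * r + 0.587 * g + 0.114 * b

    def candidate_modes(block_rgb):
        y = luma(block_rgb)
        d = y.max() - y.min()
        if d < LOW_T:
            return ["individual/differential"]     # near-flat block: cheap modes suffice
        if d > HIGH_T:
            return ["T", "H", "planar"]            # high-contrast block: try the richer modes
        return ["individual/differential", "T", "H", "planar"]

    block = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3))
    print(candidate_modes(block))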

Real-time rendering of decorative sound textures for soundscapes

Audio recordings contain rich information about sound sources and their properties such as the location, loudness, and frequency of events. One prevalent component in sound recordings is the sound texture, which contains a massive number of events. In such a texture, there can be some distinct and repeated sounds that we term foreground sounds. Birds chirping in the wind is one such decorative sound texture, with the chirping as a foreground sound and the wind as a background texture. To render these decorative sound textures in real time and with high quality, we create two-layer Markov models to enable smooth transitions from sound grain to sound grain, and propose a hierarchical scheme to generate Head-Related Transfer Function filters for localization cues of sounds represented as area/volume sources. Moreover, during the synthesis stage, we provide control over the frequency and intensity of sounds for customization. Lastly, foreground sounds are often blended into background textures, such as the sound of rain splats on car surfaces becoming submerged in the background rain. We develop an extraction component that outperforms existing learning-based methods to facilitate our synthesis with perceptible foreground sounds and well-defined textures.
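
One layer of the grain sequencing can be sketched as below, with synthetic grains and a random transition matrix standing in for the constructed ones; the second (higher-level) layer, the HRTF spatialization, and the foreground extraction are omitted.

    import numpy as np

    # Single-layer grain sequencing sketch: sound grains are states of a Markov chain whose
    # transition probabilities would favor smooth continuations; playback walks the chain.
    # Grains and transition probabilities here are synthetic placeholders.

    rng = np.random.default_rng(0)
    num_grains = 5
    grains = [rng.standard_normal(2048) * 0.1 for _ in range(num_grains)]   # fake audio grains

    # random row-stochastic transition matrix; in practice the probabilities would reflect
    # how smoothly one grain continues into another
    P = rng.random((num_grains, num_grains))
    P /= P.sum(axis=1, keepdims=True)

    def synthesize(seconds, sr=44100, start=0):
        out, g = [], start
        while sum(len(x) for x in out) < seconds * sr:
            out.append(grains[g])
            g = rng.choice(num_grains, p=P[g])       # sample the next grain from the chain
        return np.concatenate(out)[: int(seconds * sr)]

    print(synthesize(1.0).shape)     # (44100,)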