ACM Transactions on Graphics (TOG): Vol. 41, No. 4. 2022


SESSION: Computational photography

Blending camera and 77 GHz radar sensing for equitable, robust plethysmography

With the resurgence of non-contact vital sign sensing due to the COVID-19 pandemic, remote heart-rate monitoring has gained significant prominence. Many existing methods use cameras; however, previous work shows a performance loss for darker skin tones. In this paper, we show through light transport analysis that the camera modality is fundamentally biased against darker skin tones. We propose to reduce this bias through multi-modal fusion with a complementary and fairer modality: radar. Through a novel debiasing-oriented fusion framework, we achieve performance gains over all tested baselines and skin tone fairness improvements over the RGB modality. That is, the associated Pareto frontier between performance and fairness is improved when compared to the RGB modality. In addition, performance improvements are obtained over the radar-based method, with small trade-offs in fairness. We also open-source the largest multi-modal remote heart-rate estimation dataset of paired camera and radar measurements, with a focus on skin tone representation.

Seeing through obstructions with diffractive cloaking

Unwanted camera obstruction can severely degrade captured images, including both scene occluders near the camera and partial occlusions of the camera cover glass. Such occlusions can cause catastrophic failures for various scene understanding tasks such as semantic segmentation, object detection, and depth estimation. Existing camera arrays capture multiple redundant views of a scene to see around thin occlusions. Such multi-camera systems effectively form a large synthetic aperture, which can suppress nearby occluders with a large defocus blur, but significantly increase the overall form factor of the imaging setup. In this work, we propose a monocular single-shot imaging approach that optically cloaks obstructions by emulating a large array. Instead of relying on different camera views, we learn a diffractive optical element (DOE) that performs depth-dependent optical encoding, scattering nearby occlusions while allowing paraxial wavefronts to be focused. We computationally reconstruct unobstructed images from these superposed measurements with a neural network that is trained jointly with the optical layer of the proposed imaging system. We assess the proposed method in simulation and with an experimental prototype, validating that the proposed computational camera is capable of recovering occluded scene information in the presence of severe camera obstruction.

High dynamic range and super-resolution from raw image bursts

Photographs captured by smartphones and mid-range cameras have limited spatial resolution and dynamic range, with noisy response in underexposed regions and color artefacts in saturated areas. This paper introduces the first approach (to the best of our knowledge) to the reconstruction of high-resolution, high-dynamic range color images from raw photographic bursts captured by a handheld camera with exposure bracketing. This method uses a physically-accurate model of image formation to combine an iterative optimization algorithm for solving the corresponding inverse problem with a learned image representation for robust alignment and a learned natural image prior. The proposed algorithm is fast, has low memory requirements compared to state-of-the-art learning-based approaches to image restoration, and its features are learned end to end from synthetic yet realistic data. Extensive experiments demonstrate its excellent performance with super-resolution factors of up to ×4 on real photographs taken in the wild with handheld cameras, and high robustness to low-light conditions, noise, camera shake, and moderate object motion.

SESSION: Shape analysis and approximation

EMBER: exact mesh booleans via efficient & robust local arrangements

Boolean operators are an essential tool in a wide range of geometry processing and CAD/CAM tasks. We present a novel method, EMBER, to compute Boolean operations on polygon meshes which is exact, reliable, and highly performant at the same time. Exactness is guaranteed by using a plane-based representation for the input meshes along with recently introduced homogeneous integer coordinates. Reliability and robustness emerge from a formulation of the algorithm via generalized winding numbers and mesh arrangements. High performance is achieved by avoiding the (pre-)construction of a global acceleration structure. Instead, our algorithm performs an adaptive recursive subdivision of the scene's bounding box while generating and tracking all required data on the fly. By leveraging a number of early-out termination criteria, we can avoid the generation and inspection of regions that do not contribute to the output. With a careful implementation and a work-stealing multi-threading architecture, we are able to compute Boolean operations between meshes with millions of triangles at interactive rates. We run an extensive evaluation on the Thingi10K dataset to demonstrate that our method outperforms state-of-the-art algorithms, even inexact ones like QuickCSG, by orders of magnitude.
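
The generalized winding number at the heart of this formulation can be sketched in 2D (a floating-point toy analogue for intuition only; EMBER itself operates in 3D with exact plane-based arithmetic):

```python
import math

def winding_number(polygon, q):
    """Generalized winding number of a closed 2D polygon around query point q.
    Sums the signed angles subtended by each edge: ~1 inside, ~0 outside."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i][0] - q[0], polygon[i][1] - q[1]
        bx, by = polygon[(i + 1) % n][0] - q[0], polygon[(i + 1) % n][1] - q[1]
        # signed angle of the edge endpoints as seen from q
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total / (2.0 * math.pi)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(round(winding_number(square, (0.5, 0.5))))  # 1 (inside)
print(round(winding_number(square, (2.0, 2.0))))  # 0 (outside)
```

For Boolean classification, winding numbers of this kind are evaluated against each input solid to decide which pieces of the local arrangement belong to the output.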

TopoCut: fast and robust planar cutting of arbitrary domains

Given a complex three-dimensional domain delimited by a closed and non-degenerate input triangle mesh without any self-intersection, a common geometry processing task consists in cutting up the domain into cells through a set of planar cuts, creating a "cut-cell mesh", i.e., a volumetric decomposition of the domain amenable to visualization (e.g., exploded views), animation (e.g., virtual surgery), or simulation (finite volume computations). A large number of methods have proposed either efficient or robust solutions, sometimes restricting the cuts to form a regular or adaptive grid for simplicity; yet, none can guarantee both properties, severely limiting their usefulness in practice. At the core of the difficulty is the determination of topological relationships among large numbers of vertices, edges, faces and cells in order to assemble a proper cut-cell mesh: while exact geometric computations provide a robust solution to this issue, their high computational cost has prompted a number of faster solutions based on, e.g., local floating-point angle sorting to significantly accelerate the process --- but losing robustness in doing so. In this paper, we introduce a new approach to planar cutting of 3D domains that substitutes topological inference for numerical ordering through a novel mesh data structure, and reverts to exact numerical evaluations only in the few rare cases where it is strictly necessary. We show that our novel concept of topological cuts exploits the inherent structure of cut-cell mesh generation to save computational time while still guaranteeing exactness for, and robustness to, arbitrary cuts and surface geometry. We demonstrate the superiority of our approach over state-of-the-art methods on almost 10,000 meshes with a wide range of geometric and topological complexity. We also provide an open source implementation.

Robust computation of implicit surface networks for piecewise linear functions

Implicit surface networks, such as arrangements of implicit surfaces and materials interfaces, are used for modeling piecewise smooth or partitioned shapes. However, accurate and numerically robust algorithms for discretizing either structure on a grid are still lacking. We present a unified approach for computing both types of surface networks for piecewise linear functions defined on a tetrahedral grid. Both algorithms are guaranteed to produce a correct combinatorial structure for any number of functions. Our main contribution is an exact and efficient method for partitioning a tetrahedron using the level sets of linear functions defined by barycentric interpolation. To further improve performance, we designed look-up tables to speed up processing of tetrahedra involving few functions and introduced an efficient algorithm for identifying nested 3D regions.
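
The basic building block, intersecting the zero level set of a linear function with a single tetrahedron, can be sketched as follows (a minimal floating-point version for illustration; the paper's method is exact and handles arbitrarily many functions and material interfaces):

```python
def edge_crossing(p0, p1, f0, f1):
    """Point where a function with values f0, f1 at p0, p1 vanishes,
    assuming linear variation along the edge (exact for linear functions)."""
    t = f0 / (f0 - f1)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

def tet_level_set(verts, vals):
    """Intersections of the zero level set of a piecewise linear function
    (given by its four vertex values) with the edges of one tetrahedron."""
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    pts = []
    for i, j in edges:
        if vals[i] * vals[j] < 0:  # sign change -> one crossing on this edge
            pts.append(edge_crossing(verts[i], verts[j], vals[i], vals[j]))
    return pts

tet = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
# f(x, y, z) = x - 0.5: negative at three vertices, positive at one
print(tet_level_set(tet, [-0.5, 0.5, -0.5, -0.5]))  # three points on the plane x = 0.5
```

The paper's contribution lies in doing this partitioning exactly and efficiently for several functions at once, with look-up tables for the common low-count cases.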

Approximate convex decomposition for 3D meshes with collision-aware concavity and tree search

Approximate convex decomposition aims to decompose a 3D shape into a set of almost convex components, whose convex hulls can then be used to represent the input shape. It thus enables efficient geometry processing algorithms specifically designed for convex shapes and has been widely used in game engines, physics simulations, and animation. While prior works can capture the global structure of input shapes, they may fail to preserve fine-grained details (e.g., filling a toaster's slots), which are critical for retaining the functionality of objects in interactive environments. In this paper, we propose a novel method that addresses the limitations of existing approaches from three perspectives: (a) We introduce a novel collision-aware concavity metric that examines the distance between a shape and its convex hull from both the boundary and the interior. The proposed concavity metric preserves collision conditions and is more robust in detecting various approximation errors. (b) We decompose shapes by directly cutting meshes with 3D planes. It ensures generated convex hulls are intersection-free and avoids voxelization errors. (c) Instead of using a one-step greedy strategy, we propose employing a multi-step tree search to determine the cutting planes, which leads to a globally better solution and avoids unnecessary cuts. Through extensive evaluation on a large-scale articulated object dataset, we show that our method generates decompositions closer to the original shape with fewer components. It thus supports delicate and efficient object interaction in downstream applications.

Developability-driven piecewise approximations for triangular meshes

We propose a novel method to compute a piecewise mesh with a few developable patches and a small approximation error for an input triangular mesh. Our key observation is that a deformed mesh after enforcing discrete developability is easily partitioned into nearly developable patches. To obtain the nearly developable mesh, we present a new edge-oriented notion of discrete developability to define a developability-encouraged deformation energy, which is further optimized by the block nonlinear Gauss-Seidel method. The key to successfully applying this optimizer lies in three types of auxiliary variables. Then, a coarse-to-fine segmentation technique is developed to partition the deformed mesh into a small set of nearly discrete developable patches. Finally, we refine the segmented mesh to reduce the discrete Gaussian curvature while keeping the patches smooth and the approximation error small. In practice, our algorithm achieves a favorable tradeoff between the number of developable patches and the approximation error. We demonstrate the feasibility and practicability of our method over various examples, including seventeen physical models manufactured from paper.
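
The discrete developability notions involved build on the classical angle-defect measure of Gaussian curvature, which a short sketch can illustrate (a standard vertex-based defect for intuition; the paper introduces its own edge-oriented notion):

```python
import math

def angle_defect(center, neighbors):
    """Discrete Gaussian curvature at an interior vertex: 2*pi minus the sum
    of the tip angles of the incident triangles (neighbors ordered in a fan).
    A developable surface has zero angle defect at interior vertices."""
    total = 0.0
    n = len(neighbors)
    for i in range(n):
        a = [x - c for x, c in zip(neighbors[i], center)]
        b = [x - c for x, c in zip(neighbors[(i + 1) % n], center)]
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        total += math.acos(max(-1.0, min(1.0, dot / (na * nb))))
    return 2.0 * math.pi - total

ring = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
print(abs(angle_defect((0, 0, 0), ring)) < 1e-9)  # True: flat vertex, developable
print(angle_defect((0, 0, 1), ring) > 0)          # True: pyramid apex, not developable
```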

SESSION: Volumes and materials

Unbiased inverse volume rendering with differential trackers

Volumetric representations are popular in inverse rendering because they have a simple parameterization, are smoothly varying, and transparently handle topology changes. However, incorporating the full volumetric transport of light is costly and challenging, often leading practitioners to implement simplified models, such as purely emissive and absorbing volumes with "baked" lighting. One such challenge is the efficient estimation of the gradients of the volume's appearance with respect to its scattering and absorption parameters. We show that the straightforward approach---differentiating a volumetric free-flight sampler---can lead to biased and high-variance gradients, hindering optimization. Instead, we propose using a new sampling strategy: differential ratio tracking, which is unbiased, yields low-variance gradients, and runs in linear time. Differential ratio tracking combines ratio tracking and reservoir sampling to estimate gradients by sampling distances proportional to the unweighted transmittance rather than the usual extinction-weighted transmittance. In addition, we observe local minima when optimizing scattering parameters to reproduce dense volumes or surfaces. We show that these local minima can be overcome by bootstrapping the optimization from nonphysical emissive volumes that are easily optimized.
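
For context, the (non-differential) ratio-tracking building block that the proposed estimator extends can be sketched as follows; the extinction function, the majorant, and the test medium below are illustrative choices, not taken from the paper:

```python
import math, random

def ratio_tracking(sigma, distance, majorant, rng):
    """One unbiased sample of transmittance exp(-integral of sigma):
    sample tentative collisions from the majorant and multiply the running
    weight by (1 - sigma/majorant) at each one."""
    t, w = 0.0, 1.0
    while True:
        t -= math.log(1.0 - rng.random()) / majorant  # exponential free flight
        if t >= distance:
            return w
        w *= 1.0 - sigma(t) / majorant

rng = random.Random(7)
sigma = lambda t: 0.5 + 0.4 * math.sin(t)  # spatially varying extinction < majorant
d = 2.0
est = sum(ratio_tracking(sigma, d, 1.0, rng) for _ in range(200_000)) / 200_000
exact = math.exp(-(0.5 * d + 0.4 * (1.0 - math.cos(d))))  # analytic optical depth
print(abs(est - exact) < 0.01)  # True: unbiased estimate of the transmittance
```

Differential ratio tracking adds reservoir sampling on top of this scheme so that distances are drawn proportionally to the unweighted transmittance, giving low-variance gradients.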

Procedural texturing of solid wood with knots

We present a procedural framework for modeling the annual ring pattern of solid wood with knots. Although wood texturing is a well-studied topic, there have been few previous attempts at modeling knots inside the wood texture. Our method takes the skeletal structure of a tree log as input and produces a three-dimensional scalar field representing the time of added growth, which defines the volumetric annual ring pattern. First, separate fields are computed around each strand of the skeleton, i.e., the stem and each knot. The strands are then merged into a single field using smooth minimums. We further suggest techniques for controlling the smooth minimum to adjust the degree of smoothness and to reproduce the distortion effects observed around dead knots. Our method is implemented as a shader program running on a GPU with computation times of approximately 0.5 s per image and an input data size of 600 KB. We present rendered images of solid wood from pine and spruce as well as plywood and cross-laminated timber (CLT). Our results were evaluated by wood experts, who confirmed the plausibility of the rendered annual ring patterns.

Link to code:

MatFormer: a generative model for procedural materials

Procedural material graphs are a compact, parametric, and resolution-independent representation that is a popular choice for material authoring. However, designing procedural materials requires significant expertise and publicly accessible libraries contain only a few thousand such graphs. We present MatFormer, a generative model that can produce a diverse set of high-quality procedural materials with complex spatial patterns and appearance. While procedural materials can be modeled as directed (operation) graphs, they contain arbitrary numbers of heterogeneous nodes with unstructured, often long-range node connections, and functional constraints on node parameters and connections. MatFormer addresses these challenges with a multi-stage transformer-based model that sequentially generates nodes, node parameters, and edges, while ensuring the semantic validity of the graph. In addition to generation, MatFormer can be used for the auto-completion and exploration of partial material graphs. We qualitatively and quantitatively demonstrate that our method outperforms alternative approaches, in both generated graph and material quality.

Practical level-of-detail aggregation of fur appearance

Fur appearance rendering is crucial for the realism of computer-generated imagery, but has remained a challenge in computer graphics for many years. Much effort has been made to accurately simulate the multiple-scattered light transport among fur fibers, but the computation cost is still very high, since the number of fur fibers is usually extremely large. In this paper, we aim at reducing the number of fur fibers while preserving realistic fur appearance. We present an aggregated fur appearance model, using one thick cylinder to accurately describe the aggregated optical behavior of a bunch of fur fibers, including the multiple scattering of light among them. Then, to acquire the parameters of our aggregated model, we use a lightweight neural network to map the optical properties of individual fur fibers to those of our aggregated model. Finally, we propose a practical heuristic that guides the simplification process of fur dynamically at different bounces of the light, leading to a practical level-of-detail rendering scheme. Our method achieves nearly the same results as the ground truth, but performs 3.8×-13.5× faster.

Unbiased and consistent rendering using biased estimators

We introduce a general framework for transforming biased estimators into unbiased and consistent estimators for the same quantity. We show how several existing unbiased and consistent estimation strategies in rendering are special cases of this framework, and are part of a broader debiasing principle. We provide a recipe for constructing estimators using our generalized framework and demonstrate its applicability by developing novel unbiased forms of transmittance estimation, photon mapping, and finite differences.
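
The debiasing principle can be illustrated with the classical randomly truncated telescoping sum, which converts a sequence of increasingly accurate biased estimates into one unbiased sample of their limit (a generic sketch for intuition, not one of the paper's specific constructions):

```python
import math, random

def debias(I, p_geom, rng):
    """One unbiased sample of lim I(n), given biased estimates I(0), I(1), ...
    Draw a truncation level N ~ Geometric(p_geom) and reweight each telescoping
    difference by 1 / P(N >= n), so the expectation telescopes to the limit."""
    N = 0
    while rng.random() > p_geom:  # P(N >= n) = (1 - p_geom) ** n
        N += 1
    z = I(0)
    for n in range(1, N + 1):
        z += (I(n) - I(n - 1)) / (1.0 - p_geom) ** n
    return z

rng = random.Random(1)
I = lambda n: (1.0 + 1.0 / 2**n) ** (2**n)  # biased estimates converging to e
est = sum(debias(I, 0.5, rng) for _ in range(100_000)) / 100_000
print(abs(est - math.e) < 0.05)  # True: the average recovers the limit
```

Matching the truncation distribution to the convergence rate of I(n) keeps both the variance and the expected cost finite, which is the design question the framework formalizes.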

SESSION: An ode to solvers

A fast unsmoothed aggregation algebraic multigrid framework for the large-scale simulation of incompressible flow

Multigrid methods are quite efficient for solving the pressure Poisson equation in simulations of incompressible flow. However, for viscous liquids, geometric multigrid has turned out to be less efficient for solving the variational viscosity equation. In this contribution, we present an Unsmoothed Aggregation Algebraic MultiGrid (UAAMG) method with a multi-color Gauss-Seidel smoother, which consistently solves the variational viscosity equation in a few iterations for various material parameters. Moreover, we augment the OpenVDB data structure with Intel SIMD intrinsic functions to perform sparse matrix-vector multiplications efficiently on all multigrid levels. Our framework is 2.0 to 14.6 times faster than the state-of-the-art adaptive octree solver in commercial software for the large-scale simulation of both non-viscous and viscous flow. The code is available at

Loki: a unified multiphysics simulation framework for production

We introduce Loki, a new framework for robust simulation of fluid, rigid, and deformable objects with non-compromising fidelity on any single element, and capabilities for coupling and representation transitions across multiple elements. Loki adapts multiple best-in-class solvers into a unified framework driven by a declarative state machine where users declare 'what' is simulated but not 'when,' so an automatic scheduling system takes care of mixing any combination of objects. This leads to intuitive setups for coupled simulations such as hair in the wind or objects transitioning from one representation to another, for example bulk water FLIP particles to SPH spray particles to volumetric mist. We also provide a consistent treatment for components used in several domains, such as unified collision and attachment constraints across 1D, 2D, 3D deforming and rigid objects. Distribution over MPI, custom linear equation solvers, and aggressive application of sparse techniques keep performance within production requirements. We demonstrate a variety of solvers within the framework and their interactions, including FLIP-style liquids, spatially adaptive volumetric fluids, SPH, MPM, and mesh-based solids, including but not limited to discrete elastic rods, elastons, and FEM with state-of-the-art constitutive models. Our framework has proven powerful and intuitive enough for voluntary artist adoption and has delivered creature and FX simulations for multiple major movie productions in the preceding four years.

Automatic quantization for physics-based simulation

Quantization has proven effective in high-resolution and large-scale simulations, which benefit from bit-level memory saving. However, identifying a quantization scheme that meets the requirements of both precision and memory efficiency requires trial and error. In this paper, we propose a novel framework that allows users to obtain a quantization scheme by simply specifying either an error bound or a memory compression rate. Based on error propagation theory, our method takes advantage of auto-diff to estimate the contributions of each quantization operation to the total error. We formulate the task as a constrained optimization problem, which can be efficiently solved with analytical formulas derived for the linearized objective function. Our workflow extends the Taichi compiler and introduces dithering to improve the precision of quantized simulations. We demonstrate the generality and efficiency of our method via several challenging examples of physics-based simulation, achieving up to 2.5× memory compression without noticeable degradation of visual quality in the results. Our code and data are available at
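
The dithering idea corresponds to stochastic rounding, which keeps quantization unbiased on average; a minimal sketch follows (illustrative fixed-point parameters, not the paper's Taichi implementation):

```python
import math, random

def quantize(x, step, rng):
    """Stochastically rounded fixed-point quantization ("dithering"): round
    x/step down or up with probability given by the fractional remainder,
    so the dequantized value is unbiased: E[quantize(x)] == x."""
    q = x / step
    lo = math.floor(q)
    return step * (lo + (1 if rng.random() < q - lo else 0))

rng = random.Random(0)
x, step = 0.30, 1 / 64  # e.g. a value stored with 6 fractional fixed-point bits
avg = sum(quantize(x, step, rng) for _ in range(200_000)) / 200_000
print(abs(avg - x) < 1e-3)                       # True: unbiased on average
print(abs(quantize(x, step, rng) - x) <= step)   # True: per-sample error < one step
```

Deterministic rounding would instead lock in a bias of up to half a step per value, which accumulates over many simulation steps.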

Energetically consistent inelasticity for optimization time integration

In this paper, we propose Energetically Consistent Inelasticity (ECI), a new formulation for modeling and discretizing finite strain elastoplasticity/viscoelasticity in a way that is compatible with optimization-based time integrators. We provide an in-depth analysis for allowing plasticity to be implicitly integrated through an augmented strain energy density function. We develop ECI on the associative von-Mises J2 plasticity, the non-associative Drucker-Prager plasticity, and the finite strain viscoelasticity. We demonstrate the resulting scheme on both the Finite Element Method (FEM) and the Material Point Method (MPM). Combined with a custom Newton-type optimization integration scheme, our method enables simulating stiff and large-deformation inelastic dynamics of metal, sand, snow, and foam with larger time steps, improved stability, higher efficiency, and better accuracy than existing approaches.

Grid-free Monte Carlo for PDEs with spatially varying coefficients

Partial differential equations (PDEs) with spatially varying coefficients arise throughout science and engineering, modeling rich heterogeneous material behavior. Yet conventional PDE solvers struggle with the immense complexity found in nature, since they must first discretize the problem---leading to spatial aliasing, and global meshing/sampling that is costly and error-prone. We describe a method that approximates neither the domain geometry, the problem data, nor the solution space, providing the exact solution (in expectation) even for problems with extremely detailed geometry and intricate coefficients. Our main contribution is to extend the walk on spheres (WoS) algorithm from constant- to variable-coefficient problems, by drawing on techniques from volumetric rendering. In particular, an approach inspired by null-scattering yields unbiased Monte Carlo estimators for a large class of 2nd order elliptic PDEs, which share many attractive features with Monte Carlo rendering: no meshing, trivial parallelism, and the ability to evaluate the solution at any point without solving a global system of equations.
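
The constant-coefficient walk on spheres (WoS) algorithm that the paper generalizes can be sketched in 2D; the disk domain and harmonic boundary data below are illustrative choices:

```python
import math, random

def walk_on_spheres(x, y, g, dist, eps, rng):
    """Estimate the solution of Laplace's equation at (x, y): repeatedly jump
    to a uniform point on the largest empty circle around the walker, stopping
    within eps of the boundary and evaluating the boundary data g there."""
    while True:
        r = dist(x, y)
        if r < eps:
            return g(x, y)
        theta = 2.0 * math.pi * rng.random()
        x += r * math.cos(theta)
        y += r * math.sin(theta)

rng = random.Random(3)
dist = lambda x, y: 1.0 - math.hypot(x, y)  # distance to the unit circle, from inside
g = lambda x, y: x * x - y * y              # harmonic boundary data
est = sum(walk_on_spheres(0.3, 0.2, g, dist, 1e-3, rng) for _ in range(50_000)) / 50_000
exact = 0.3**2 - 0.2**2  # harmonic g extends to the interior as the solution itself
print(abs(est - exact) < 0.02)  # True: the walk needs no mesh or global solve
```

The paper's null-scattering-inspired extension replaces the single boundary evaluation with estimators that also account for spatially varying coefficients along the walk.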

Variational quadratic shape functions for polygons and polyhedra

Solving partial differential equations (PDEs) on geometric domains is an important component of computer graphics, geometry processing, and many other fields. Typically, the given discrete mesh is the geometric representation and should not be altered for simulation purposes. Hence, accurately solving PDEs on general meshes is a central goal and has been considered for various differential operators in recent years. While it is known that using higher-order basis functions on simplicial meshes can substantially improve accuracy and convergence, extending these benefits to general surface or volume tessellations in an efficient fashion remains an open problem. Our work proposes variationally optimized piecewise quadratic shape functions for polygons and polyhedra, which generalize quadratic P2 elements, exactly reproduce them on simplices, and inherit their beneficial numerical properties. To mitigate the associated cost of increased computation time, particularly for volumetric meshes, we introduce a custom two-level multigrid solver which significantly improves computational performance.
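
The quadratic P2 elements that these shape functions generalize can be sketched on a single triangle (the standard Lagrange basis in barycentric coordinates, reproduced here for illustration):

```python
def p2_shape_functions(l1, l2, l3):
    """Quadratic (P2) Lagrange shape functions on a triangle in barycentric
    coordinates: three vertex functions plus three edge-midpoint functions.
    Each function is 1 at its own node and 0 at the other five."""
    return [
        l1 * (2 * l1 - 1), l2 * (2 * l2 - 1), l3 * (2 * l3 - 1),  # vertices
        4 * l1 * l2, 4 * l2 * l3, 4 * l3 * l1,                    # edge midpoints
    ]

phi = p2_shape_functions(0.2, 0.3, 0.5)
print(abs(sum(phi) - 1.0) < 1e-12)       # True: partition of unity
mid = p2_shape_functions(0.5, 0.5, 0.0)  # midpoint of the edge between vertices 1, 2
print(mid[3] == 1.0)                     # True: its own basis function equals 1 there
```

On a general polygon or polyhedron there is no such closed-form basis, which is where the variational construction of the paper comes in.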

SESSION: Neural objects, materials and illumination

NeAT: neural adaptive tomography

In this paper, we present Neural Adaptive Tomography (NeAT), the first adaptive, hierarchical neural rendering pipeline for tomography. Through a combination of neural features with an adaptive explicit representation, we achieve reconstruction times far superior to existing neural inverse rendering methods. The adaptive explicit representation improves efficiency by facilitating empty space culling and concentrating samples in complex regions, while the neural features act as a neural regularizer for the 3D reconstruction.

The NeAT framework is designed specifically for the tomographic setting, which consists only of semi-transparent volumetric scenes instead of opaque objects. In this setting, NeAT outperforms the quality of existing optimization-based tomography solvers while being substantially faster.

NeROIC: neural rendering of objects from online image collections

We present a novel method to acquire object representations from online image collections, capturing high-quality geometry and material properties of arbitrary objects from photographs with varying cameras, illumination, and backgrounds. This enables various object-centric rendering applications such as novel-view synthesis, relighting, and harmonized background composition from challenging in-the-wild input. Using a multi-stage approach extending neural radiance fields, we first infer the surface geometry and refine the coarsely estimated initial camera parameters, while leveraging coarse foreground object masks to improve the training efficiency and geometry quality. We also introduce a robust normal estimation technique which eliminates the effect of geometric noise while retaining crucial details. Lastly, we extract surface material properties and ambient illumination, represented in spherical harmonics with extensions that handle transient elements, e.g., sharp shadows. The union of these components results in a highly modular and efficient object acquisition framework. Extensive evaluations and comparisons demonstrate the advantages of our approach in capturing high-quality geometry and appearance properties useful for rendering applications.

SESSION: Meshing and mapping

Compatible intrinsic triangulations

Finding distortion-minimizing homeomorphisms between surfaces of arbitrary genus is a fundamental task in computer graphics and geometry processing. We propose a simple method utilizing intrinsic triangulations, operating directly on the original surfaces without going through any intermediate domains such as a plane or a sphere. Given two models A and B as triangle meshes, our algorithm constructs a Compatible Intrinsic Triangulation (CIT), a pair of intrinsic triangulations over A and B with full correspondences in their vertices, edges and faces. Such a tessellation allows us to establish consistent images of edges and faces of A's input mesh over B (and vice versa) by tracing piecewise-geodesic paths over A and B. Our algorithm for constructing CITs, primarily consisting of carefully designed edge flipping schemes, is empirical in nature without any guarantee of success, but turns out to be robust enough to be used within a similar second-order optimization framework as was used previously in the literature. The utility of our method is demonstrated through comparisons and evaluation on a standard benchmark dataset.

Computing sparse integer-constrained cones for conformal parameterizations

We propose a novel method to generate sparse integer-constrained cone singularities with low distortion constraints for conformal parameterizations. Inspired by [Fang et al. 2021; Soliman et al. 2018], the cone computation is formulated as a constrained optimization problem, where the objective is the number of cones measured by the 0-norm of Gaussian curvature of vertices, and the constraint is to restrict the cone angles to be multiples of π/2 and control the distortion while ensuring that the Yamabe equation holds. In addition, the holonomy angles for the non-contractible homology loops are required to be multiples of π/2 for achieving rotationally seamless conformal parameterizations. The Douglas-Rachford (DR) splitting algorithm is used to solve this challenging optimization problem, and our success relies on two key components. First, replacing each integer constraint with the intersection of a box set and a sphere enables us to manage the subproblems in DR splitting update steps in the continuous domain. Second, a novel solver is developed to optimize the 0-norm without any approximation. We demonstrate the effectiveness and feasibility of our algorithm on a data set containing 3885 models. Compared to state-of-the-art methods, our method achieves a better tradeoff between the number of cones and the parameterization distortion.

Which cross fields can be quadrangulated?: global parameterization from prescribed holonomy signatures

We describe a method for the generation of seamless surface parametrizations with guaranteed local injectivity and full control over holonomy. Previous methods guarantee only one of the two. Local injectivity is required to enable these parametrizations' use in applications such as surface quadrangulation and spline construction. Holonomy control is crucial to enable guidance or prescription of the parametrization's isocurves based on directional information, in particular from cross-fields or feature curves, and more generally to constrain the parametrization topologically. To this end we investigate the relation between cross-field topology and seamless parametrization topology. Leveraging previous results on locally injective parametrization and combining them with insights on this relation in terms of holonomy, we propose an algorithm that meets these requirements. A key component relies on the insight that arbitrary surface cut graphs, as required for global parametrization, can be homeomorphically modified to assume almost any set of turning numbers with respect to a given target cross-field.

Volume parametrization quantization for hexahedral meshing

Developments in the field of parametrization-based quad mesh generation on surfaces have been impactful over the past decade. In this context, an important advance has been the replacement of error-prone rounding in the generation of integer-grid maps, by robust quantization methods. In parallel, parametrization-based hex mesh generation for volumes has been advanced. In this volumetric context, however, the state-of-the-art still relies on fragile rounding, often producing defective meshes, especially when targeting a coarse mesh resolution. We present a method to robustly quantize volume parametrizations, i.e., to determine guaranteed valid choices of integers for 3D integer-grid maps. Inspired by the 2D case, we base our construction on a non-conforming cell decomposition of the volume, a 3D analogue of a T-mesh. In particular, we leverage the motorcycle complex, a recent generalization of the motorcycle graph, for this purpose. Integer values are expressed in a differential manner on the edges of this complex, enabling the efficient formulation of the conditions required to strictly prevent forcing the map into degeneration. Applying our method in the context of hexahedral meshing, we demonstrate that hexahedral meshes can be generated with significantly improved flexibility.

SESSION: New wrinkles in cloth and shells

Simulation and optimization of magnetoelastic thin shells

Magnetoelastic thin shells exhibit great potential for realizing versatile functionalities through a broad range of combinations of material stiffness, remnant magnetization intensity, and external magnetic stimuli. In this paper, we propose a novel computational method for forward simulation and inverse design of magnetoelastic thin shells. Our system consists of two key components: forward simulation and backward optimization. On the simulation side, we have developed a new continuum mechanics model based on the Kirchhoff-Love thin-shell model to characterize the behaviors of a magnetoelastic thin shell under external magnetic stimuli. Based on this model, we propose an implicit numerical simulator, facilitated by the magnetic energy Hessian, that treats the elastic and magnetic stresses within a unified framework and is versatile enough to incorporate other thin-shell models. On the optimization side, we have devised a new differentiable simulation framework equipped with an efficient adjoint formula to accommodate various PDE-constrained inverse design problems of magnetoelastic thin-shell structures, in both static and dynamic settings. It also encompasses applications to magnetoelastic soft robots, functional origami, artworks, and meta-material designs. We demonstrate the efficacy of our framework by designing and simulating a broad array of magnetoelastic thin-shell objects that manifest complicated interactions between magnetic fields, materials, and control policies.

True seams: modeling seams in digital garments

Seams play a fundamental role in the way a garment looks, fits, feels and behaves. Seams can have very different shapes and mechanical properties depending on how fabric is overlapped, folded and stitched together, with garment designers often choosing specific seam and stitch type combinations depending on the appearance and behavior they want for the garment. Yet, virtually all 3D CAD tools for fashion and visual effects ignore most of the visual and mechanical complexity of seams, and just treat them as joint edges, their simplest possible form, drastically limiting the fidelity of digital garments. In this paper, we present a method that models seams following their true, real-life construction. Each seam brings together and overlaps the fabric pieces to be sewn, folds the fabric according to the type of seam, and stitches the resulting assembly following the type of stitch. To avoid dealing with the complexities of folding in 3D space, we cast the problem into a sequence of simpler 2D problems where we can easily shape the seam and produce a result free of self-intersections, before lifting the folded geometry back to 3D space. We run a series of constrained optimizations to enforce spatial properties in these 2D settings, allowing us to treat asymmetric seams, gatherings and overlapping construction orders. Using a variety of common seams and stitches, we show how our approach substantially improves the visual appearance of full garments, for a better and more predictive digital replica.

A GPU-based multilevel additive Schwarz preconditioner for cloth and deformable body simulation

In this paper, we wish to push the limit of real-time cloth and deformable body simulation to a higher level with 50K to 500K vertices, based on the development of a novel GPU-based multilevel additive Schwarz (MAS) preconditioner. Similar to other preconditioners under the MAS framework, our preconditioner naturally adopts multilevel and domain decomposition concepts. But contrary to previous works, we advocate the use of small, non-overlapping domains that can well exploit the parallel computing power of a GPU. Based on this idea, we investigate and invent a series of algorithms for our preconditioner, including multilevel domain construction using Morton codes, low-cost matrix precomputation by one-way Gauss-Jordan elimination, and conflict-free symmetric-matrix-vector multiplication in runtime preconditioning. Our experiments show that the preconditioner is effective, fast, cheap to precompute, and scalable with respect to stiffness and problem size. It is compatible with many linear and nonlinear solvers used in cloth and deformable body simulation with dynamic contacts, such as PCG, accelerated gradient descent, and L-BFGS. On a GPU, our preconditioner speeds up a PCG solver by approximately a factor of four, and its CPU version outperforms a number of competitors, including ILU0 and ILUT.
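
The Morton-code domain construction can be made concrete with a short sketch: quantize vertex positions, interleave their coordinate bits, sort along the resulting space-filling curve, and chop the order into small, non-overlapping domains. The bit-interleaving routine and the fixed-size grouping below are our own illustrative choices, not the paper's exact scheme (which also builds a hierarchy of levels):

```python
import numpy as np

def morton3(ix, iy, iz):
    """Interleave the bits of three 10-bit integer coordinates into a 30-bit Morton code."""
    def spread(v):
        v = int(v) & 0x3FF
        v = (v | (v << 16)) & 0x030000FF
        v = (v | (v << 8)) & 0x0300F00F
        v = (v | (v << 4)) & 0x030C30C3
        v = (v | (v << 2)) & 0x09249249
        return v
    return (spread(ix) << 2) | (spread(iy) << 1) | spread(iz)

def build_domains(vertices, domain_size=4):
    """Sort vertices along a Morton curve and split the order into small,
    non-overlapping, spatially coherent domains."""
    v = np.asarray(vertices, dtype=np.float64)
    lo, hi = v.min(axis=0), v.max(axis=0)
    q = ((v - lo) / np.maximum(hi - lo, 1e-12) * 1023).astype(np.uint32)
    codes = np.array([morton3(x, y, z) for x, y, z in q])
    order = np.argsort(codes, kind="stable")
    return [order[i:i + domain_size] for i in range(0, len(order), domain_size)]
```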

A general two-stage initialization for sag-free deformable simulations

Initializing simulations of deformable objects requires setting the rest state of all internal forces at the rest shape of the object. However, the rest shape is often not explicitly provided. In its absence, it is common to initialize by treating the given initial shape as the rest shape, which leads to sagging: undesirable deformation under gravity as soon as the simulation begins. Prior solutions to sagging are limited to specific simulation systems and material models; most cannot handle frictional contact, and they require solving expensive global nonlinear optimization problems.

We introduce a novel solution to the sagging problem that can be applied to a variety of simulation systems and materials. The key feature of our approach is that we avoid solving a global nonlinear optimization problem by performing the initialization in two stages. First, we use a global linear optimization for static equilibrium. Any nonlinearity of the material definition is handled in the local stage, which solves many small local problems efficiently and in parallel. Notably, our method can properly handle frictional contact orders of magnitude faster than prior work. We show that our approach can be applied to various simulation systems by presenting examples with mass-spring systems, cloth simulations, the finite element method, the material point method, and position-based dynamics.
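
As an illustration of the global stage, the simplest possible system, a hanging mass-spring chain with its top node pinned, even admits a closed-form solution: each spring's rest length is chosen so its tension at the given shape exactly carries the weight below it. The 1D setup and function names below are ours, purely for exposition:

```python
import numpy as np

def sagfree_rest_lengths(y, masses, k, g=9.81):
    """Toy global stage for a hanging mass-spring chain (top node pinned):
    pick each spring's rest length so spring forces balance gravity at the
    given initial shape, so the chain does not sag when simulation starts.
    y: node heights from top to bottom; spring i connects nodes i and i+1."""
    y = np.asarray(y, float)
    rest = np.empty(len(y) - 1)
    for i in range(len(y) - 1):
        # Tension in spring i must carry the weight of all nodes below it.
        tension = g * sum(masses[i + 1:])
        rest[i] = abs(y[i] - y[i + 1]) - tension / k  # Hooke: f = k (length - rest)
    return rest

def net_forces(y, masses, k, rest, g=9.81):
    """Gravity plus spring forces on each free (non-pinned) node."""
    y = np.asarray(y, float)
    f = -g * np.asarray(masses, float)
    for i in range(len(rest)):
        t = k * (abs(y[i] - y[i + 1]) - rest[i])  # tension pulls the nodes together
        f[i] -= t * np.sign(y[i] - y[i + 1])
        f[i + 1] -= t * np.sign(y[i + 1] - y[i])
    return f[1:]  # node 0 is pinned
```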

Estimation of yarn-level simulation models for production fabrics

This paper introduces a methodology for inverse-modeling of yarn-level mechanics of cloth, based on the mechanical response of fabrics in the real world. We compiled a database from physical tests of several different knitted fabrics used in the textile industry. These data span different types of complex knit patterns, yarn compositions, and fabric finishes, and the results demonstrate diverse physical properties like stiffness, nonlinearity, and anisotropy.

We then develop a system for approximating these mechanical responses with yarn-level cloth simulation. To do so, we introduce an efficient pipeline for converting between fabric-level data and yarn-level simulation, including a novel swatch-level approximation for speeding up computation, and some small-but-necessary extensions to yarn-level models used in computer graphics. The dataset used for this paper can be found at

SESSION: Contact and fracture

A unified Newton barrier method for multibody dynamics

We present a simulation framework for multibody dynamics via a universal variational integration. Our method naturally supports mixed rigid-deformables and mixed codimensional geometries, while providing guaranteed numerical convergence and accurate resolution of contact, friction, and a wide range of articulation constraints. We unify (1) the treatment of simulation degrees of freedom for rigid and soft bodies by formulating them both in terms of Lagrangian nodal displacements, (2) the handling of general linear equality joint constraints through an efficient change-of-variable strategy, (3) the enforcement of nonlinear articulation constraints based on novel distance potential energies, (4) the resolution of frictional contact between mixed dimensions and bodies with a variational Incremental Potential Contact formulation, and (5) the modeling of generalized restitution through semi-implicit Rayleigh damping. We conduct extensive unit tests and benchmark studies to demonstrate the efficacy of our method.

Affine body dynamics: fast, stable and intersection-free simulation of stiff materials

Simulating stiff materials in applications where deformations are either not significant or can safely be ignored is a fundamental task across fields. Rigid body modeling has thus long remained a critical tool and is, by far, the most popular simulation strategy currently employed for modeling stiff solids. At the same time, rigid body methods continue to pose a number of well-known challenges and trade-offs, including intersections, instabilities, inaccuracies, and/or slow performance that grows with contact-problem complexity. In this paper we revisit the stiff body problem and present ABD, a simple and highly effective affine body dynamics framework, which significantly improves on the state of the art for simulating stiff-body dynamics. We trace the challenges in rigid-body methods to the necessity of linearizing piecewise-rigid trajectories and subsequent constraints. ABD instead relaxes the unnecessary (and unrealistic) constraint that each body's motion be exactly rigid with a stiff orthogonality potential, while preserving the rigid body model's key feature of a small coordinate representation. In doing so, ABD replaces piecewise linearization with piecewise linear trajectories. This, in turn, combines the best of both worlds: compact coordinates ensure small, sparse system solves, while piecewise-linear trajectories enable efficient and accurate constraint (contact and joint) evaluations. Beginning with this simple foundation, ABD preserves all guarantees of the underlying IPC model we build it upon, e.g., solution convergence, guaranteed non-intersection, and accurate frictional contact. Over a wide range and scale of simulation problems we demonstrate that ABD brings orders-of-magnitude performance gains (two to three orders on the CPU and an order more when utilizing the GPU, obtaining 10,000× speedups) over prior IPC-based methods, while maintaining simulation quality and non-intersection of trajectories. At the same time, ABD has comparable or faster timings than state-of-the-art rigid body libraries optimized for performance without guarantees, and successfully and efficiently solves challenging simulation problems where both classes of prior rigid body simulation methods fail altogether.
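
The non-intersection guarantee that ABD inherits comes from IPC's smoothly clamped log-barrier; a minimal sketch of that barrier function (our own transcription of the commonly cited form) is:

```python
import math

def ipc_barrier(d, dhat):
    """Smoothly clamped log-barrier used in Incremental Potential Contact:
    exactly zero for distances d >= dhat, and growing without bound as
    d -> 0, which is what rules out interpenetration along the trajectory."""
    if d <= 0:
        raise ValueError("barrier is only defined for positive distances")
    if d >= dhat:
        return 0.0
    return -(d - dhat) ** 2 * math.log(d / dhat)
```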

Fast evaluation of smooth distance constraints on co-dimensional geometry

We present a new method for computing a smooth minimum distance function based on the LogSumExp function for point clouds, edge meshes, triangle meshes, and combinations of all three. We derive blending weights and a modified Barnes-Hut acceleration approach that ensure our method approximates the true distance, is conservative (points outside the zero isosurface are guaranteed to be outside the surface), and is efficient to evaluate for all the above data types. This, in combination with its ability to smooth sparsely sampled and noisy data, like point clouds, narrows the gap between data acquisition and simulation, and thereby enables new applications such as direct, co-dimensional rigid body simulation using unprocessed lidar data.
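
The LogSumExp construction at the heart of the method can be sketched in a few lines for a point cloud. This is a uniform-weight toy (the paper derives proper blending weights and a Barnes-Hut far-field approximation); note that the sum form never overestimates the true minimum distance, matching the conservativeness requirement:

```python
import numpy as np

def smooth_min_distance(q, points, alpha=50.0):
    """Smooth approximation of min_i ||q - p_i|| via LogSumExp:
    d_smooth = -(1/alpha) * log( sum_i exp(-alpha * d_i) ) <= min_i d_i.
    Larger alpha tightens the approximation; the shifted evaluation below
    avoids underflow for large alpha."""
    d = np.linalg.norm(np.asarray(points, float) - np.asarray(q, float), axis=1)
    m = d.min()
    # exp(-alpha * (d - m)) is at most 1, so the sum is numerically safe.
    s = np.exp(-alpha * (d - m)).sum()
    return m - np.log(s) / alpha
```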

Penetration-free projective dynamics on the GPU

We present a GPU algorithm for deformable simulation that offers both good computational efficiency and a penetration-free guarantee, a combination not common among existing techniques. The main idea is an algorithmic integration of projective dynamics (PD) and incremental potential contact (IPC). PD is a position-based simulation framework, favored for its robust convergence and convenient implementation. We show that PD can be employed to handle the variational optimization of an interior-point method such as IPC. While conceptually straightforward, this requires a dedicated rework of the collision resolution and the iteration modality to avoid incorrect collision projection, with improved numerical convergence. IPC exploits a barrier-based formulation, which yields an infinitely large penalty when a constraint is on the verge of being violated. This mechanism guarantees intersection-free trajectories of deformable bodies during the simulation, as long as they are apart at the rest configuration. On the downside, IPC brings a large amount of nonlinearity to the system, making PD slower to converge. To mitigate this issue, we propose a novel GPU algorithm named A-Jacobi for faster linear solves in the global step of PD. A-Jacobi is based on Jacobi iteration, but it better harvests the computational capacity of modern GPUs by lumping several Jacobi steps into a single iteration. In addition, we re-design the CCD root-finding procedure using a new minimum-gradient Newton algorithm. The saved time budget allows more iterations to accommodate stiff IPC barriers, so that the result is both realistic and collision-free. Put together, our algorithm simulates complicated models of both solids and shells on the GPU at an interactive rate or even in real time.
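
The lumping idea behind A-Jacobi can be sketched densely in NumPy: algebraically fuse two Jacobi sweeps into a single precomputed iteration, halving the number of global synchronization points per pair of sweeps. This toy is our own and omits the paper's GPU-specific data layout:

```python
import numpy as np

def fused_jacobi_solve(A, b, steps=200):
    """Solve A x = b with Jacobi iterations, two steps fused into one.
    Plain Jacobi: x_{k+1} = c + B x_k, with B = D^{-1}(D - A), c = D^{-1} b.
    Fusing:       x_{k+2} = (c + B c) + B^2 x_k, so we precompute B2 = B @ B
    and c2 = c + B @ c once, then apply a single matrix product per pass."""
    A = np.asarray(A, float)
    D_inv = 1.0 / np.diag(A)
    B = -D_inv[:, None] * A
    np.fill_diagonal(B, 0.0)           # B = D^{-1}(D - A)
    c = D_inv * b
    B2, c2 = B @ B, c + B @ c          # one-time "lumping" precomputation
    x = np.zeros_like(c)
    for _ in range(steps):             # each pass = two classical Jacobi sweeps
        x = c2 + B2 @ x
    return x
```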

Contact-centric deformation learning

We propose a novel method to machine-learn highly detailed, nonlinear contact deformations for real-time dynamic simulation. We depart from previous deformation-learning strategies, and model contact deformations in a contact-centric manner. This strategy shows excellent generalization with respect to the object's configuration space, and it allows for simple and accurate learning. We complement the contact-centric learning strategy with two additional key ingredients: learning a continuous vector field of contact deformations, instead of a discrete approximation; and sparsifying the mapping between the contact configuration and contact deformations. These two ingredients further contribute to the accuracy, efficiency, and generalization of the method. We integrate our learning-based contact deformation model with subspace dynamics, showing real-time dynamic simulations with fine contact deformation detail.

Adaptive rigidification of elastic solids

We present a method for reducing the computational cost of elastic solid simulation by treating connected sets of non-deforming elements as rigid bodies. Non-deforming elements are identified as those where the squared Frobenius norm of the strain rate falls below a threshold for several frames. Rigidification uses a breadth-first search to identify connected components while avoiding connections that would form hinges between rigid components. Rigid elements become elastic again when their approximate strain velocity rises above a threshold, which is fast to compute using a single iteration of conjugate gradient with a fixed Laplacian-based incomplete Cholesky preconditioner. With rigidification, the size of the system to solve at each time step can be greatly reduced, and if all elastic elements become rigid, it reduces to solving the rigid body system. We demonstrate our results on a variety of 2D and 3D examples, and show that our method is especially beneficial in contact-rich examples.
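
The rigidification pass can be sketched as a plain breadth-first search over the candidate (slow-deforming) elements. The sketch below is illustrative and omits the hinge-avoidance test the paper applies when merging components:

```python
from collections import deque

def rigid_components(candidates, adjacency):
    """Group candidate elements into connected components with BFS;
    elements not in `candidates` stay elastic.
    `adjacency` maps element -> neighboring elements sharing a face."""
    candidates = set(candidates)
    seen, components = set(), []
    for start in sorted(candidates):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            e = queue.popleft()
            comp.append(e)
            for n in adjacency.get(e, ()):
                if n in candidates and n not in seen:
                    seen.add(n)
                    queue.append(n)
        components.append(sorted(comp))
    return components
```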

SESSION: Image/video editing and generation

Disentangling random and cyclic effects in time-lapse sequences

Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to "re-render" the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather.

Our approach is based on Generative Adversarial Networks (GANs) that are conditioned with the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN's latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies.

We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.

Rewriting geometric rules of a GAN

Deep generative models make visual content creation more accessible to novice users by automating the synthesis of diverse, realistic content based on a collected dataset. However, the current machine learning approaches miss a key element of the creative process - the ability to synthesize things that go far beyond the data distribution and everyday experience. To begin to address this issue, we enable a user to "warp" a given model by editing just a handful of original model outputs with desired geometric changes. Our method applies a low-rank update to a single model layer to reconstruct edited examples. Furthermore, to combat overfitting, we propose a latent space augmentation method based on style-mixing. Our method allows a user to create a model that synthesizes endless objects with defined geometric changes, enabling the creation of a new generative model without the burden of curating a large-scale dataset. We also demonstrate that edited models can be composed to achieve aggregated effects, and we present an interactive interface to enable users to create new models through composition. Empirical measurements on multiple test cases suggest the advantage of our method against recent GAN fine-tuning methods. Finally, we showcase several applications using the edited models, including latent space interpolation and image editing.

ASSET: autoregressive semantic scene editing with transformers at high resolutions

We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. While previous attention mechanisms are computationally too expensive for handling high-resolution images or are overly constrained within specific image regions, hampering long-range interactions, our novel attention mechanism is both computationally efficient and effective. Our sparsified attention mechanism is able to capture long-range interactions and context, leading to synthesizing interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that were not possible to generate reliably with previous convnet and transformer approaches. We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method. Our code and dataset are available at our project page:

SESSION: Ray tracing and Monte Carlo methods

Generalized resampled importance sampling: foundations of ReSTIR

As scenes become ever more complex and real-time applications embrace ray tracing, path sampling algorithms that maximize quality at low sample counts become vital. Recent resampling algorithms building on Talbot et al.'s [2005] resampled importance sampling (RIS) reuse paths spatiotemporally to render surprisingly complex light transport with a few samples per pixel. These reservoir-based spatiotemporal importance resamplers (ReSTIR) and their underlying RIS theory make various assumptions, including sample independence. But sample reuse introduces correlation, so ReSTIR-style iterative reuse loses most convergence guarantees that RIS theoretically provides.

We introduce generalized resampled importance sampling (GRIS) to extend the theory, allowing RIS on correlated samples, with unknown PDFs and taken from varied domains. This solidifies the theoretical foundation, allowing us to derive variance bounds and convergence conditions in ReSTIR-based samplers. It also guides practical algorithm design and enables advanced path reuse between pixels via complex shift mappings.

We show a path-traced resampler (ReSTIR PT) running interactively on complex scenes, capturing many-bounce diffuse and specular lighting while shading just one path per pixel. With our new theoretical foundation, we can also modify the algorithm to guarantee convergence for offline renderers.
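
The RIS building block underlying ReSTIR-style resamplers can be sketched in one dimension with a streaming reservoir. This toy is our own construction (integrating x² over [0, 1] with an unnormalized target p̂(x) = x and uniform candidates); it shows the weighting that keeps the one-sample estimator unbiased:

```python
import random

def ris_estimate(f, p_hat, n_candidates, rng):
    """One-sample resampled importance sampling on [0, 1] with a streaming
    reservoir. Candidates x_i ~ Uniform(0, 1) get weights w_i = p_hat(x_i)
    (target over source pdf; the source pdf is 1). One survivor y is kept
    with probability proportional to w_i, and the unbiased estimate is
    f(y) / p_hat(y) * (mean of the w_i)."""
    w_sum, y = 0.0, None
    for _ in range(n_candidates):
        x = rng.random()
        w = p_hat(x)
        w_sum += w
        if rng.random() * w_sum < w:    # reservoir: keep x with probability w / w_sum
            y = x
    return f(y) / p_hat(y) * (w_sum / n_candidates)

rng = random.Random(7)
est = sum(ris_estimate(lambda x: x * x, lambda x: x, 8, rng)
          for _ in range(20000)) / 20000   # true integral of x^2 on [0, 1] is 1/3
```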

R2E2: low-latency path tracing of terabyte-scale scenes using thousands of cloud CPUs

In this paper we explore the viability of path tracing massive scenes using a "supercomputer" constructed on-the-fly from thousands of small, serverless cloud computing nodes. We present R2E2 (Really Elastic Ray Engine), a scene decomposition-based parallel renderer that rapidly acquires thousands of cloud CPU cores, loads scene geometry from a pre-built scene BVH into the aggregate memory of these nodes in parallel, and performs full path traced global illumination using an inter-node messaging service designed for communicating ray data. To balance ray tracing work across many nodes, R2E2 adopts a service-oriented design that statically replicates geometry and texture data from frequently traversed scene regions onto multiple nodes based on estimates of load, and dynamically assigns ray tracing work to lightly loaded nodes holding the required data. We port pbrt's ray-scene intersection components to the R2E2 architecture, and demonstrate that scenes with up to a terabyte of geometry and texture data (where as little as 1/250th of the scene can fit on any one node) can be path traced at 4K resolution, in tens of seconds, using thousands of tiny serverless nodes on the AWS Lambda platform.

SPCBPT: subspace-based probabilistic connections for bidirectional path tracing

Bidirectional path tracing (BDPT) can be accelerated by selecting appropriate light sub-paths for connection. However, existing algorithms need to perform frequent distribution reconstruction and incur expensive overhead. We present a novel approach, SPCBPT, for probabilistic connections that constructs the light selection distribution in sub-path space. Our approach bins the sub-paths into multiple subspaces so that sub-paths within the same subspace have low discrepancy; light sub-paths can then be selected by a subspace-based two-stage sampling method, i.e., first sampling a light subspace and then resampling the light sub-paths within this subspace. The subspace-based distribution is free of reconstruction and provides efficient light selection at a very low cost. We also propose a method that considers the Multiple Importance Sampling (MIS) term in the light selection, thus obtaining an MIS-aware distribution that minimizes the upper bound of the variance of the combined estimator; prior methods typically omit this MIS weight term. We evaluate our algorithm using various benchmarks, and the results show that our approach has superior performance and can significantly reduce noise compared with the state-of-the-art method.

Modeling and rendering non-euclidean spaces approximated with concatenated polytopes

A non-Euclidean space is characterized as a manifold with a specific structure that violates Euclid's postulates. This paper proposes to approximate a manifold with polytopes. Based on the scene designer's specification, the polytopes are automatically concatenated and embedded in a higher-dimensional Euclidean space. Then, the scene is navigated and rendered via novel methods tailored to concatenated polytopes. The proof-of-concept implementation and experiments with it show that the proposed methods bring the virtual-world users unusual and fascinating experiences, which cannot be provided in Euclidean-space applications.

Regression-based Monte Carlo integration

Monte Carlo integration is typically interpreted as an estimator of the expected value using stochastic samples. There exists an alternative interpretation in calculus where Monte Carlo integration can be seen as estimating a constant function---from the stochastic evaluations of the integrand---that integrates to the original integral. The integral mean value theorem states that this constant function should be the mean (or expectation) of the integrand. Since both interpretations result in the same estimator, little attention has been devoted to the calculus-oriented interpretation. We show that the calculus-oriented interpretation actually implies the possibility of using a more complex function than a constant one to construct a more efficient estimator for Monte Carlo integration. We build a new estimator based on this interpretation and relate our estimator to control variates with least-squares regression on the stochastic samples of the integrand. Unlike prior work, our resulting estimator is provably better than or equal to the conventional Monte Carlo estimator. To demonstrate the strength of our approach, we introduce a practical estimator that can act as a simple drop-in replacement for conventional Monte Carlo integration. We experimentally validate our framework on various light transport integrals. The code is available at
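
A toy version of the idea, fitting a linear function instead of a constant one, reads as follows. This is our own sketch: fit g(x) = a + bx to the samples by least squares, integrate g analytically over [0, 1], and add back the average residual. The actual estimator handles the statistical subtleties (choice of basis, provable variance reduction), which this sketch does not:

```python
import numpy as np

def regression_mc(f, n, rng):
    """Monte Carlo integration of f over [0, 1] with a least-squares linear
    control variate: fit g(x) = a + b x to the samples, integrate g exactly
    (a + b/2), and add the mean residual f(x) - g(x). (With an ordinary
    least-squares fit including an intercept, the mean residual is zero, so
    the estimate reduces to the analytic integral of the fitted line.)"""
    x = rng.random(n)
    y = f(x)
    b_coef, a_coef = np.polyfit(x, y, 1)    # slope, intercept
    g = a_coef + b_coef * x
    return (a_coef + b_coef / 2) + np.mean(y - g)
```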

SESSION: Sampling, reconstruction and appearance

Efficiency-aware multiple importance sampling for bidirectional rendering algorithms

Multiple importance sampling (MIS) is an indispensable tool in light-transport simulation. It enables robust Monte Carlo integration by combining samples from several techniques. However, it is well understood that such a combination is not always more efficient than using a single sampling technique. Thus a major criticism of complex combined estimators, such as bidirectional path tracing, is that they can be significantly less efficient on common scenes than simpler algorithms like forward path tracing. We propose a general method to improve MIS efficiency: By cheaply estimating the efficiencies of various technique and sample-count combinations, we can pick the best one. The key ingredient is a numerically robust and efficient scheme that uses the samples of one MIS combination to compute the efficiency of multiple other combinations. For example, we can run forward path tracing and use its samples to decide which subset of VCM to enable, and at what sampling rates. The sample count for each technique can be controlled per-pixel or globally. Applied to VCM, our approach enables robust rendering of complex scenes with caustics, without compromising efficiency on simpler scenes.

EARS: efficiency-aware russian roulette and splitting

Russian roulette and splitting are widely used techniques to increase the efficiency of Monte Carlo estimators. But, despite their popularity, there is little work on how to best apply them. Most existing approaches rely on simple heuristics based on, e.g., surface albedo and roughness. Their efficiency often hinges on user-controlled parameters. We instead iteratively learn optimal Russian roulette and splitting factors during rendering, using a simple and lightweight data structure. Given perfect estimates of variance and cost, our fixed-point iteration provably converges to the optimal Russian roulette and splitting factors that maximize the rendering efficiency. In our application to unidirectional path tracing, we achieve consistent and significant speed-ups over the state of the art.
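
The unbiasedness mechanism that any Russian roulette scheme, learned factors included, must respect can be sketched on a geometric series: terminate with probability 1 − q and reweight surviving contributions by 1/q. The example is ours, not EARS's learned estimator:

```python
import random

def rr_geometric(a, q, rng):
    """Unbiased Russian-roulette estimate of S = 1 + a + a^2 + ... = 1/(1-a),
    using the recursion S = 1 + a*S: continue with probability q and scale
    the continuation by 1/q so the expectation stays intact."""
    total, weight = 0.0, 1.0
    while True:
        total += weight
        if rng.random() >= q:    # terminate with probability 1 - q
            return total
        weight *= a / q          # survived: apply a and reweight by 1/q

rng = random.Random(11)
# a = 0.5, so the true sum is 2; q = 0.6 is an arbitrary survival probability.
est = sum(rr_geometric(0.5, 0.6, rng) for _ in range(50000)) / 50000
```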

Shape dithering for 3D printing

We present an efficient, purely geometric, algorithmic, and parameter-free approach to improve surface quality and accuracy in voxel-controlled 3D printing by counteracting quantization artifacts. Such artifacts arise from the discrete voxel sampling of the continuous shape used to control the 3D printer, and are characterized by low-frequency geometric patterns on surfaces of any orientation. They are visually disturbing, particularly on small prints or smooth surfaces, and adversely affect the fatigue behavior of printed parts. We use implicit shape dithering, displacing the part's signed distance field with a high-frequency signal whose amplitude is adapted to the (anisotropic) print resolution. We expand the reverse generalized Fourier slice theorem by shear transforms, which we leverage to optimize a 3D blue-noise mask that generates the anisotropic dither signal. As a point process it is efficient and does not adversely affect 3D halftoning. We evaluate our approach for efficiency and geometric accuracy, and show its advantages over the state of the art.
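
The core dithering step can be sketched in a few lines: perturb the signed distance field by a zero-mean signal before thresholding, so the staircase of hard quantization is broken up near the surface. White noise stands in here for the paper's optimized anisotropic blue-noise mask:

```python
import numpy as np

def dithered_voxels(sdf, noise_amplitude, rng):
    """Voxelize a signed distance field with dither: displace the SDF by a
    zero-mean signal scaled to the print resolution before thresholding.
    Only voxels within `noise_amplitude` of the surface can flip."""
    dither = (rng.random(sdf.shape) - 0.5) * 2.0 * noise_amplitude
    return (sdf + dither) < 0.0
```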

Semantically supervised appearance decomposition for virtual staging from a single panorama

We describe a novel approach to decompose a single panorama of an empty indoor environment into four appearance components: specular, direct sunlight, diffuse and diffuse ambient without direct sunlight. Our system is weakly supervised by automatically generated semantic maps (with floor, wall, ceiling, lamp, window and door labels) that have shown success on perspective views and are trained for panoramas using transfer learning without any further annotations. A GAN-based approach supervised by coarse information obtained from the semantic map extracts specular reflection and direct sunlight regions on the floor and walls. These lighting effects are removed via a similar GAN-based approach and a semantic-aware inpainting step. The appearance decomposition enables multiple applications including sun direction estimation, virtual furniture insertion, floor material replacement, and sun direction change, providing an effective tool for virtual home staging. We demonstrate the effectiveness of our approach on a large and recently released dataset of panoramas of empty homes.

MatBuilder: mastering sampling uniformity over projections

Many applications ranging from quasi-Monte Carlo integration over optimal control to neural networks benefit from high-dimensional, highly uniform samples. In the case of computer graphics, and more particularly in rendering, despite the need for uniformity, several sub-problems expose a low-dimensional structure. In this context, mastering sampling uniformity over projections while preserving high-dimensional uniformity has been intrinsically challenging. This difficulty may explain the relatively small number of mathematical constructions for such samplers. We propose a novel approach by showing that uniformity constraints can be expressed as an integer linear program that results in a sampler with the desired properties. As it turns out, complex constraints are easy to describe by means of stratification and sequence properties of digital nets. Formalized using generator matrix determinants, our new MatBuilder software solves the set of constraints by iterating the linear integer program solver in a greedy fashion to compute a problem-specific set of generator matrices that can be used as a drop-in replacement in the popular digital net samplers. The samplers created by MatBuilder achieve the uniformity of classic low discrepancy sequences. More importantly, we demonstrate the benefit of the unprecedented versatility of our constraint approach with respect to low-dimensional problem structure for several applications.

SESSION: Sketches, strokes, and ropes

Sketch2Pose: estimating a 3D character pose from a bitmap sketch

Artists frequently capture character poses via raster sketches, then use these drawings as a reference while posing a 3D character in a specialized 3D software --- a time-consuming process, requiring specialized 3D training and mental effort. We tackle this challenge by proposing the first system for automatically inferring a 3D character pose from a single bitmap sketch, producing poses consistent with viewer expectations. Algorithmically interpreting bitmap sketches is challenging, as they contain significantly distorted proportions and foreshortening. We address this by predicting three key elements of a drawing, necessary to disambiguate the drawn poses: 2D bone tangents, self-contacts, and bone foreshortening. These elements are then leveraged in an optimization inferring the 3D character pose consistent with the artist's intent. Our optimization balances cues derived from artistic literature and perception research to compensate for distorted character proportions. We demonstrate a gallery of results on sketches of numerous styles. We validate our method via numerical evaluations, user studies, and comparisons to manually posed characters and previous work.

Code and data for our paper are available at

CLIPasso: semantically-aware object sketching

Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present CLIPasso, an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distill semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.
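
The stroke primitive is easy to make concrete: a cubic Bézier curve sampled at parameter values, which a differentiable rasterizer would then turn into pixels. The plain-NumPy sampler below is illustrative only (it is not differentiable rendering):

```python
import numpy as np

def cubic_bezier(control_points, n=64):
    """Sample a cubic Bezier stroke at n parameter values using the
    Bernstein form B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3."""
    p = np.asarray(control_points, float)        # shape (4, 2): four 2D control points
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p[0] + 3 * (1 - t) ** 2 * t * p[1]
            + 3 * (1 - t) * t ** 2 * p[2] + t ** 3 * p[3])
```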

Detecting viewer-perceived intended vector sketch connectivity

Many sketch processing applications target precise vector drawings with accurately specified stroke intersections, yet free-form artist drawn sketches are typically inexact: strokes that are intended to intersect often stop short of doing so. While human observers easily perceive the artist intended stroke connectivity, manually, or even semi-manually, correcting drawings to generate correctly connected outputs is tedious and highly time-consuming. We propose a novel, robust algorithm that extracts viewer-perceived stroke connectivity from inexact free-form vector drawings by leveraging observations about local and global factors that impact human perception of inter-stroke connectivity. We employ the identified local cues to train classifiers that assess the likelihood that pairs of strokes are perceived as forming end-to-end or T-junctions based on local context. We then use these classifiers within an incremental framework that combines classifier provided likelihoods with a more global, contextual and closure-based, analysis. We demonstrate our method on over 95 diversely sourced inputs, and validate it via a series of perceptual studies; participants prefer our outputs over the closest alternative by a factor of 9 to 1.

Piecewise-smooth surface fitting onto unstructured 3D sketches

We propose a method to transform unstructured 3D sketches into piecewise smooth surfaces that preserve sketched geometric features. Immersive 3D drawing and sketch-based 3D modeling applications increasingly produce imperfect and unstructured collections of 3D strokes as design output. These 3D sketches are readily perceived as piecewise smooth surfaces by viewers, but are poorly handled by existing 3D surface techniques tailored to well-connected curve networks or sparse point sets. Our algorithm is aligned with human tendency to imagine the strokes as a small set of simple smooth surfaces joined along stroke boundaries. Starting with an initial proxy surface, we iteratively segment the surface into smooth patches joined sharply along some strokes, and optimize these patches to fit surrounding strokes. Our evaluation is fourfold: we demonstrate the impact of various algorithmic parameters, we evaluate our method on synthetic sketches with known ground truth surfaces, we compare to prior art, and we show compelling results on more than 50 designs from a diverse set of 3D sketch sources.

SESSION: Design, direct, plan and program

Rapid design of articulated objects

Designing articulated objects is challenging because, unlike with static objects, it requires complex decisions to be made regarding the form, parts, rig, poses, and motion. We present a novel 3D sketching system for rapidly authoring concepts of articulated objects for the early stages of design, when designers make such decisions. Compared to existing CAD software, which focuses on slowly but elaborately producing models consisting of precise surfaces and volumes, our system focuses on quickly but roughly producing models consisting of key curves through a small set of coherent pen and multi-touch gestures. We found that professional designers could easily learn and use our system and author compelling concepts in a short time, showing that 3D sketching can be extended to designing articulated objects and is generally applicable in film, animation, game, and product design.

Dynamic optimal space partitioning for redirected walking in multi-user environment

In multi-user Redirected Walking (RDW), the space subdivision method divides a shared physical space into sub-spaces and allocates a sub-space to each user. While this approach has the advantage of precluding any collisions between users, the conventional space subdivision method suffers from frequent boundary resets due to the reduction of available space per user. To address this challenge, we propose a space subdivision method called Optimal Space Partitioning (OSP) that dynamically divides the shared physical space in real time. By exploiting spatial information of the physical and virtual environments, OSP predicts the movement of users and divides the shared physical space into optimal sub-spaces separated by shutters. Our OSP framework is trained using deep reinforcement learning to allocate an optimal sub-space to each user and provide optimal steering. Our experiments demonstrate that OSP provides a higher sense of immersion to users by minimizing the total number of resets, while preserving the advantage of the existing space subdivision strategy: ensuring better safety by completely eliminating the possibility of collisions between users beforehand. Our project is available at

Interactive augmented reality storytelling guided by scene semantics

We present a novel interactive augmented reality (AR) storytelling approach guided by indoor scene semantics. Our approach automatically populates virtual contents in real-world environments to deliver AR stories, which match both the story plots and scene semantics. During the storytelling process, a player can participate as a character in the story. Meanwhile, the behaviors of the virtual characters and the placement of the virtual items adapt to the player's actions. An input raw story is represented as a sequence of events, which contain high-level descriptions of the characters' states, and is converted into a graph representation with automatically supplemented low-level spatial details. Our hierarchical story sampling approach samples realistic character behaviors that fit the story contexts through optimizations; and an animator, which estimates and prioritizes the player's actions, animates the virtual characters to tell the story in AR. Through experiments and a user study, we validated the effectiveness of our approach for AR storytelling in different environments.

WallPlan: synthesizing floorplans by learning to generate wall graphs

Floorplan generation has drawn widespread interest in the community. Recent learning-based methods have made significant progress in generating realistic floorplans, though complex heuristic post-processing is still necessary to obtain the desired results. In this paper, we propose a novel wall-oriented method, called WallPlan, for automatically and efficiently generating plausible floorplans from various design constraints. We pioneer the representation of the floorplan as a wall graph with room labels and cast floorplan generation as graph generation. Given the boundary as input, we first initialize the boundary with windows predicted by WinNet. Then a graph generation network GraphNet and a semantics prediction network LabelNet are coupled to generate the wall graph progressively by imitating graph traversal. WallPlan can be applied to practical architectural designs, especially those with wall-based constraints. We conduct ablation experiments, qualitative evaluations, quantitative comparisons, and perceptual studies to evaluate our method's feasibility, efficacy, and versatility. Extensive experiments demonstrate that our method requires no post-processing and produces higher-quality floorplans than state-of-the-art techniques.

Free2CAD: parsing freehand drawings into CAD commands

CAD modeling, despite being the industry-standard, remains restricted to usage by skilled practitioners due to two key barriers. First, the user must be able to mentally parse a final shape into a valid sequence of supported CAD commands; and second, the user must be sufficiently conversant with CAD software packages to be able to execute the corresponding CAD commands. As a step towards addressing both these challenges, we present Free2CAD wherein the user can simply sketch the final shape and our system parses the input strokes into a sequence of commands expressed in a simplified CAD language. When executed, these commands reproduce the sketched object. Technically, we cast sketch-based CAD modeling as a sequence-to-sequence translation problem, for which we leverage the powerful Transformers neural network architecture. Given the sequence of pen strokes as input, we introduce the new task of grouping strokes that correspond to individual CAD operations. We combine stroke grouping with geometric fitting of the operation parameters, such that intermediate groups are geometrically corrected before being reused, as context, for subsequent steps in the sequence inference. Although trained on synthetically-generated data, we demonstrate that Free2CAD generalizes to sketches created from real-world CAD models as well as to sketches drawn by novice users.

Code and data are at

SESSION: Physics-based character control

ASE: large-scale reusable adversarial skill embeddings for physically simulated characters

The incredible feats of athleticism demonstrated by humans are made possible in part by a vast repertoire of general-purpose motor skills, acquired through years of practice and experience. These skills not only enable humans to perform complex tasks, but also provide powerful priors for guiding their behaviors when learning new tasks. This is in stark contrast to what is common practice in physics-based character animation, where control policies are most typically trained from scratch for each task. In this work, we present a large-scale data-driven framework for learning versatile and reusable skill embeddings for physically simulated characters. Our approach combines techniques from adversarial imitation learning and unsupervised reinforcement learning to develop skill embeddings that produce life-like behaviors, while also providing an easy to control representation for use on new downstream tasks. Our models can be trained using large datasets of unstructured motion clips, without requiring any task-specific annotation or segmentation of the motion data. By leveraging a massively parallel GPU-based simulator, we are able to train skill embeddings using over a decade of simulated experiences, enabling our model to learn a rich and versatile repertoire of skills. We show that a single pre-trained model can be effectively applied to perform a diverse set of new tasks. Our system also allows users to specify tasks through simple reward functions, and the skill embedding then enables the character to automatically synthesize complex and naturalistic strategies in order to achieve the task objectives.

Learning to use chopsticks in diverse gripping styles

Learning dexterous manipulation skills is a long-standing challenge in computer graphics and robotics, especially when the task involves complex and delicate interactions between the hands, tools and objects. In this paper, we focus on chopsticks-based object relocation tasks, which are common yet demanding. The key to successful chopsticks skills is steady gripping of the sticks that also supports delicate maneuvers. We automatically discover physically valid chopsticks holding poses by Bayesian Optimization (BO) and Deep Reinforcement Learning (DRL), which works for multiple gripping styles and hand morphologies without the need for example data. Given as input the discovered gripping poses and desired objects to be moved, we build physics-based hand controllers to accomplish relocation tasks in two stages. First, kinematic trajectories are synthesized for the chopsticks and hand in a motion planning stage. The key components of our motion planner include a grasping model to select suitable chopsticks configurations for grasping the object, and a trajectory optimization module to generate collision-free chopsticks trajectories. Then we train physics-based hand controllers through DRL again to track the desired kinematic trajectories produced by the motion planner. We demonstrate the capabilities of our framework by relocating objects of various shapes and sizes, in diverse gripping styles and holding positions for multiple hand morphologies. Our system achieves faster learning and better control robustness than vanilla systems that attempt to learn chopsticks-based skills without a gripping pose optimization module and/or without a kinematic motion planner. Our code and models are available at this link.

Physics-based character controllers using conditional VAEs

High-quality motion capture datasets are now publicly available, and researchers have used them to create kinematics-based controllers that can generate plausible and diverse human motions without conditioning on specific goals (i.e., a task-agnostic generative model). In this paper, we present an algorithm to build such controllers for physically simulated characters having many degrees of freedom. Our physics-based controllers are learned by using conditional VAEs, which can perform a variety of behaviors that are similar to motions in the training dataset. The controllers are robust enough to generate more than a few minutes of motion without conditioning on specific goals and to allow many complex downstream tasks to be solved efficiently. To show the effectiveness of our method, we demonstrate controllers learned from several different motion capture databases and use them to solve a number of downstream tasks for which it is challenging to learn, from scratch, controllers that generate natural-looking motions. We also perform ablation studies to demonstrate the importance of the elements of the algorithm. Code and data for this paper are available at:

Learning high-DOF reaching-and-grasping via dynamic representation of gripper-object interaction

We approach the problem of high-DOF reaching-and-grasping via learning joint planning of grasp and motion with deep reinforcement learning. To resolve the sample efficiency issue in learning the high-dimensional and complex control of dexterous grasping, we propose an effective representation of grasping state characterizing the spatial interaction between the gripper and the target object. To represent gripper-object interaction, we adopt the Interaction Bisector Surface (IBS), which is the Voronoi diagram between two nearby 3D geometric objects and has been successfully applied to characterizing spatial relations between 3D objects. We found that IBS is surprisingly effective as a state representation since it well informs the fine-grained control of each finger through its spatial relation to the target object. This novel grasp representation, together with several technical contributions including a fast IBS approximation, a novel vector-based reward and an effective training strategy, facilitates learning a strong control model for high-DOF grasping with good sample efficiency, dynamic adaptability, and cross-category generality. Experiments show that it generates high-quality dexterous grasps for complex shapes with smooth grasping motions. Code and data for this paper are at
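To make the Interaction Bisector Surface concrete: it is the set of points equidistant from the two shapes. A brute-force numpy approximation, for intuition only (the paper contributes a much faster approximation; the function and parameter names here are illustrative), keeps random samples that are nearly equidistant from a gripper point set and an object point set:

```python
import numpy as np

def approx_ibs(gripper_pts, object_pts, n_samples=20000, tol=0.02, seed=0):
    """Crude Interaction Bisector Surface approximation: sample the joint
    bounding box and keep points whose distances to the two point sets
    are (nearly) equal. Brute force, for intuition only."""
    rng = np.random.default_rng(seed)
    lo = np.minimum(gripper_pts.min(0), object_pts.min(0))
    hi = np.maximum(gripper_pts.max(0), object_pts.max(0))
    q = rng.uniform(lo, hi, size=(n_samples, 3))
    # distance from each sample to the closest point of each set
    d_g = np.linalg.norm(q[:, None] - gripper_pts[None], axis=-1).min(1)
    d_o = np.linalg.norm(q[:, None] - object_pts[None], axis=-1).min(1)
    return q[np.abs(d_g - d_o) < tol]
```

For two single points the surviving samples concentrate on the bisector plane between them, which is the defining property the learned state representation samples from.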

SESSION: Large scenes and fast neural rendering

Scalable neural indoor scene rendering

We propose a scalable neural scene reconstruction and rendering method to support distributed training and interactive rendering of large indoor scenes. Our representation is based on tiles. Tile appearances are trained in parallel through a background sampling strategy that augments each tile with distant scene information via a proxy global mesh. Each tile has two low-capacity MLPs: one for view-independent appearance (diffuse color and shading) and one for view-dependent appearance (specular highlights, reflections). We leverage the phenomenon that complex view-dependent scene reflections can be attributed to virtual lights underneath surfaces at the total ray distance to the source. This lets us handle sparse samplings of the input scene where reflection highlights do not always appear consistently in input images. We show interactive free-viewpoint rendering results from five scenes, one of which covers an area of more than 100 m². Experimental results show that our method produces higher-quality renderings than a single large-capacity MLP and five recent neural proxy-geometry and voxel-based baseline methods. Our code and data are available on the project webpage

ADOP: approximate differentiable one-pixel point rendering

In this paper we present ADOP, a novel point-based, differentiable neural rendering pipeline. Like other neural renderers, our system takes as input calibrated camera images and a proxy geometry of the scene, in our case a point cloud. To generate a novel view, the point cloud is rasterized with learned feature vectors as colors and a deep neural network fills the remaining holes and shades each output pixel. The rasterizer renders points as one-pixel splats, which makes it very fast and allows us to compute gradients with respect to all relevant input parameters efficiently. Furthermore, our pipeline contains a fully differentiable physically-based photometric camera model, including exposure, white balance, and a camera response function. Following the idea of inverse rendering, we use our renderer to refine its input in order to reduce inconsistencies and optimize the quality of its output. In particular, we can optimize structural parameters like the camera pose, lens distortions, point positions and features, and a neural environment map, but also photometric parameters like camera response function, vignetting, and per-image exposure and white balance. Because our pipeline includes photometric parameters, e.g. exposure and camera response function, our system can smoothly handle input images with varying exposure and white balance, and generates high-dynamic range output. We show that due to the improved input, we can achieve high render quality, also for difficult input, e.g. with imperfect camera calibrations, inaccurate proxy geometry, or varying exposure. As a result, a simpler and thus faster deep neural network is sufficient for reconstruction. In combination with the fast point rasterization, ADOP achieves real-time rendering rates even for models with well over 100M points.
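The photometric camera model mentioned above can be pictured as a short differentiable chain from scene radiance to pixel value. A toy numpy sketch (parameter names and the fixed gamma-curve response are illustrative stand-ins, not ADOP's actual model, which learns the response function along with the other photometric parameters):

```python
import numpy as np

def photometric_model(radiance, exposure_ev=0.0, wb=(1.0, 1.0, 1.0), gamma=2.2):
    """Toy photometric pipeline: radiance -> per-image exposure ->
    per-channel white balance -> saturation -> response curve.
    Illustrative only; ADOP learns these parameters per image/camera."""
    x = radiance * (2.0 ** exposure_ev)   # exposure, in EV stops
    x = x * np.asarray(wb)                # white balance gains
    x = np.clip(x, 0.0, 1.0)              # sensor saturation
    return x ** (1.0 / gamma)             # camera response (gamma stand-in)
```

Because every step is differentiable, gradients can flow back to exposure, white balance, and response parameters, which is what lets the pipeline align input images with varying exposure.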

Egocentric scene reconstruction from an omnidirectional video

Omnidirectional videos capture environmental scenes effectively, but they have rarely been used for geometry reconstruction. In this work, we propose an egocentric 3D reconstruction method that can acquire scene geometry with high accuracy from a short egocentric omnidirectional video. To this end, we first estimate per-frame depth using a spherical disparity network. We then fuse per-frame depth estimates into a novel spherical binoctree data structure that is specifically designed to tolerate spherical depth estimation errors. By subdividing the spherical space into binary tree and octree nodes that represent spherical frustums adaptively, the spherical binoctree effectively enables egocentric surface geometry reconstruction for environmental scenes while simultaneously assigning high-resolution nodes for closely observed surfaces. This makes it possible to reconstruct an entire scene from a short video captured with a small camera trajectory. Experimental results validate the effectiveness and accuracy of our approach for reconstructing the 3D geometry of environmental scenes from short egocentric omnidirectional video inputs. We further demonstrate various applications using a conventional omnidirectional camera, including novel-view synthesis, object insertion, and relighting of scenes using reconstructed 3D models with texture.

Neural rendering in a room: amodal 3D understanding and free-viewpoint rendering for the closed scene composed of pre-captured objects

We, as human beings, can understand and picture a familiar scene from arbitrary viewpoints given a single image, whereas this is still a grand challenge for computers. We hereby present a novel solution to mimic such human perception capability based on a new paradigm of amodal 3D scene understanding with neural rendering for a closed scene. Specifically, we first learn the prior knowledge of the objects in a closed scene via an offline stage, which facilitates an online stage to understand the room with unseen furniture arrangements. During the online stage, given a panoramic image of the scene in different layouts, we utilize a holistic neural-rendering-based optimization framework to efficiently estimate the correct 3D scene layout and deliver realistic free-viewpoint rendering. In order to handle the domain gap between the offline and online stages, our method exploits compositional neural rendering techniques for data augmentation in the offline training. The experiments on both synthetic and real datasets demonstrate that our two-stage design achieves robust 3D scene understanding and outperforms competing methods by a large margin, and we also show that our realistic free-viewpoint rendering enables various applications, including scene touring and editing. Code and data are available on the project webpage:

Instant neural graphics primitives with a multiresolution hash encoding

Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920×1080.
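A simplified, non-differentiable numpy version of the encoding may help make the idea concrete: each level hashes the corners of the surrounding grid cell into a small feature table, trilinearly interpolates the retrieved features, and the per-level results are concatenated. The three hashing primes follow the paper; the table size, feature width, base resolution, and growth factor below are arbitrary choices for this sketch:

```python
import numpy as np

# spatial-hash primes from the instant-NGP paper (first coordinate unhashed)
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(xyz, tables, base_res=16, growth=1.5):
    """Simplified multiresolution hash encoding (numpy, non-differentiable).

    xyz:    (N, 3) points in [0, 1]^3
    tables: list of (T, F) feature arrays, one per resolution level
    """
    feats = []
    for lvl, table in enumerate(tables):
        res = int(base_res * growth ** lvl)
        T = np.uint64(table.shape[0])
        p = xyz * res
        p0 = np.floor(p).astype(np.uint64)   # lower grid corner
        w = p - p0                           # trilinear weights in [0, 1)
        acc = np.zeros((xyz.shape[0], table.shape[1]))
        for dx in range(2):
            for dy in range(2):
                for dz in range(2):
                    corner = p0 + np.array([dx, dy, dz], dtype=np.uint64)
                    h = corner * PRIMES      # uint64 wrap-around is intended
                    idx = (h[:, 0] ^ h[:, 1] ^ h[:, 2]) % T
                    wc = np.where([dx, dy, dz], w, 1.0 - w).prod(1)
                    acc += wc[:, None] * table[idx]
        feats.append(acc)
    return np.concatenate(feats, axis=1)     # (N, levels * F)
```

In the real system the tables are trainable parameters optimized by stochastic gradient descent together with the small MLP that consumes the concatenated features; collisions are left to the network to disambiguate across resolutions.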

SESSION: Neural geometry processing

Dual octree graph networks for learning adaptive volumetric shape representations

We present an adaptive deep representation of volumetric fields of 3D shapes and an efficient approach to learn this deep representation for high-quality 3D shape reconstruction and auto-encoding. Our method encodes the volumetric field of a 3D shape with an adaptive feature volume organized by an octree and applies a compact multilayer perceptron network for mapping the features to the field value at each 3D position. An encoder-decoder network is designed to learn the adaptive feature volume based on the graph convolutions over the dual graph of octree nodes. The core of our network is a new graph convolution operator defined over a regular grid of features fused from irregular neighboring octree nodes at different levels, which not only reduces the computational and memory cost of the convolutions over irregular neighboring octree nodes, but also improves the performance of feature learning. Our method effectively encodes shape details, enables fast 3D shape reconstruction, and exhibits good generality for modeling 3D shapes out of training categories. We evaluate our method on a set of reconstruction tasks of 3D shapes and scenes and validate its superiority over other existing approaches. Our code, data, and trained models are available at

Neural dual contouring

We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC). Like traditional DC, it produces exactly one vertex per grid cell and one quad for each grid edge intersection, a natural and efficient structure for reproducing sharp features. However, rather than computing vertex locations and edge crossings with hand-crafted functions that depend directly on difficult-to-obtain surface gradients, NDC uses a neural network to predict them. As a result, NDC can be trained to produce meshes from signed or unsigned distance fields, binary voxel grids, or point clouds (with or without normals); and it can produce open surfaces in cases where the input represents a sheet or partial surface. During experiments with five prominent datasets, we find that NDC, when trained on one of the datasets, generalizes well to the others. Furthermore, NDC provides better surface reconstruction accuracy, feature preservation, output complexity, triangle quality, and inference time in comparison to previous learned (e.g., neural marching cubes, convolutional occupancy networks) and traditional (e.g., Poisson) methods. Code and data are available at
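For intuition, here is the hand-crafted rule NDC replaces, written for a 2D scalar grid: every cell whose edges cross the zero level set gets exactly one vertex, placed here at the mean of the linear edge crossings (a crude stand-in for the QEF solve of classic dual contouring; NDC instead predicts the vertex position with a neural network, and this toy code is not from the paper):

```python
import numpy as np

def dual_contour_2d(sdf):
    """Classic (hand-crafted) dual contouring on a 2D SDF grid: one vertex
    per cell with a sign change, at the mean of the cell's edge crossings.
    Returns {cell: vertex} for each cell whose edges cross the level set."""
    H, W = sdf.shape
    verts = {}
    for i in range(H - 1):
        for j in range(W - 1):
            crossings = []
            # the four edges of cell (i, j) as (corner_a, corner_b) pairs
            edges = [((i, j), (i, j + 1)), ((i + 1, j), (i + 1, j + 1)),
                     ((i, j), (i + 1, j)), ((i, j + 1), (i + 1, j + 1))]
            for a, b in edges:
                fa, fb = sdf[a], sdf[b]
                if fa * fb < 0:                  # sign change on this edge
                    t = fa / (fa - fb)           # linear zero crossing
                    crossings.append((1 - t) * np.array(a) + t * np.array(b))
            if crossings:
                verts[(i, j)] = np.mean(crossings, axis=0)
    return verts
```

The same one-vertex-per-cell structure carries over to 3D, where quads connect vertices of the four cells around each crossing edge.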

SESSION: Convolutions and neural fields

DeltaConv: anisotropic operators for geometric deep learning on point clouds

Learning from 3D point-cloud data has rapidly gained momentum, motivated by the success of deep learning on images and the increased availability of 3D data. In this paper, we aim to construct anisotropic convolution layers that work directly on the surface derived from a point cloud. This is challenging because of the lack of a global coordinate system for tangential directions on surfaces. We introduce DeltaConv, a convolution layer that combines geometric operators from vector calculus to enable the construction of anisotropic filters on point clouds. Because these operators are defined on scalar- and vector-fields, we separate the network into a scalar- and a vector-stream, which are connected by the operators. The vector stream enables the network to explicitly represent, evaluate, and process directional information. Our convolutions are robust and simple to implement and match or improve on state-of-the-art approaches on several benchmarks, while also speeding up training and inference.

SPAGHETTI: editing implicit shapes through part aware generation

Neural implicit fields are quickly emerging as an attractive representation for learning-based techniques. However, adopting them for 3D shape modeling and editing is challenging. We introduce a method for Editing Implicit Shapes Through Part Aware GeneraTion, SPAGHETTI for short. Our architecture allows for manipulation of implicit shapes by means of transforming, interpolating and combining shape segments together, without requiring explicit part supervision. SPAGHETTI disentangles shape part representation into extrinsic and intrinsic geometric information. This characteristic enables a generative framework with part-level control. The modeling capabilities of SPAGHETTI are demonstrated using an interactive graphical interface, where users can directly edit neural implicit shapes. Our code, editing user interface demo and pre-trained models are available at

Spelunking the deep: guaranteed queries on general neural implicit surfaces via range analysis

Neural implicit representations, which encode a surface as the level set of a neural network applied to spatial coordinates, have proven to be remarkably effective for optimizing, compressing, and generating 3D geometry. Although these representations are easy to fit, it is not clear how to best evaluate geometric queries on the shape, such as intersecting against a ray or finding a closest point. The predominant approach is to encourage the network to have a signed distance property. However, this property typically holds only approximately, leading to robustness issues, and holds only at the conclusion of training, inhibiting the use of queries in loss functions. Instead, this work presents a new approach to perform queries directly on general neural implicit functions for a wide range of existing architectures. Our key tool is the application of range analysis to neural networks, using automatic arithmetic rules to bound the output of a network over a region; we conduct a study of range analysis on neural networks, and identify variants of affine arithmetic which are highly effective. We use the resulting bounds to develop geometric queries including ray casting, intersection testing, constructing spatial hierarchies, fast mesh extraction, closest-point evaluation, evaluating bulk properties, and more. Our queries can be efficiently evaluated on GPUs, and offer concrete accuracy guarantees even on randomly-initialized networks, enabling their use in training objectives and beyond. We also show a preliminary application to inverse rendering.
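The simplest instance of such range analysis is interval arithmetic: push an axis-aligned input box through each layer and keep guaranteed per-neuron bounds. A numpy sketch for a ReLU MLP (the paper's affine-arithmetic variants are tighter; this interval version only shows the core mechanism, and the function name is illustrative):

```python
import numpy as np

def relu_mlp_bounds(weights, biases, lo, hi):
    """Propagate an axis-aligned input box through a ReLU MLP with interval
    arithmetic, returning guaranteed (if loose) output bounds."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    for k, (W, b) in enumerate(zip(weights, biases)):
        c, r = (lo + hi) / 2, (hi - lo) / 2    # interval center / radius
        c2 = W @ c + b                         # affine layer, exact on center
        r2 = np.abs(W) @ r                     # radius grows through |W|
        lo, hi = c2 - r2, c2 + r2
        if k < len(weights) - 1:               # ReLU on hidden layers
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi
```

Because the bounds are sound for any weights, a query built on them (e.g. "can the level set cross this cell?") is valid even on a randomly initialized network, which is what allows such queries inside training objectives.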

DEF: deep estimation of sharp geometric features in 3D shapes

We propose Deep Estimators of Features (DEFs), a learning-based framework for predicting sharp geometric features in sampled 3D shapes. Unlike existing data-driven methods, which reduce this problem to feature classification, we propose to regress a scalar field representing the distance from point samples to the closest feature line on local patches. Our approach is the first that scales to massive point clouds by fusing distance-to-feature estimates obtained on individual patches.

We extensively evaluate our approach against related state-of-the-art methods on newly proposed synthetic and real-world 3D CAD model benchmarks. Our approach not only outperforms these methods (with improvements in recall and false-positive rates), but also generalizes to real-world scans after training on synthetic data and fine-tuning on a small dataset of scanned data.

We demonstrate a downstream application, where we reconstruct an explicit representation of straight and curved sharp feature lines from range scan data.
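The regressed quantity, the distance from each sample to the closest feature line, is straightforward to construct as a supervision target when the feature curves are known. A numpy sketch for polyline features (an illustrative helper for building ground-truth targets, not the paper's code or its learned model):

```python
import numpy as np

def distance_to_polyline(pts, poly):
    """Distance from each sample point to the closest segment of a feature
    polyline -- the scalar field a DEF-style regressor learns to predict.

    pts:  (N, D) sample points
    poly: (S+1, D) polyline vertices defining S segments
    """
    a, b = poly[:-1], poly[1:]                  # segment endpoints, (S, D)
    ab = b - a
    ap = pts[:, None] - a[None]                 # (N, S, D)
    t = (ap * ab).sum(-1) / (ab * ab).sum(-1)   # projection parameter
    t = np.clip(t, 0.0, 1.0)                    # clamp to the segment
    closest = a[None] + t[..., None] * ab[None]
    return np.linalg.norm(pts[:, None] - closest, axis=-1).min(-1)
```

Thresholding such a predicted field near zero recovers candidate feature points, from which explicit feature curves can then be reconstructed.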

We make code, pre-trained models, and our training and evaluation datasets available at

Neural jacobian fields: learning intrinsic mappings of arbitrary meshes

This paper introduces a framework designed to accurately predict piecewise linear mappings of arbitrary meshes via a neural network, enabling training and evaluating over heterogeneous collections of meshes that do not share a triangulation, as well as producing highly detail-preserving maps whose accuracy exceeds current state of the art. The framework is based on reducing the neural aspect to a prediction of a matrix for a single given point, conditioned on a global shape descriptor. The field of matrices is then projected onto the tangent bundle of the given mesh, and used as candidate jacobians for the predicted map. The map is computed by a standard Poisson solve, implemented as a differentiable layer with cached pre-factorization for efficient training. This construction is agnostic to the triangulation of the input, thereby enabling applications on datasets with varying triangulations. At the same time, by operating in the intrinsic gradient domain of each individual mesh, it allows the framework to predict highly-accurate mappings. We validate these properties by conducting experiments over a broad range of scenarios, from semantic ones such as morphing, registration, and deformation transfer, to optimization-based ones, such as emulating elastic deformations and contact correction, as well as being the first work, to our knowledge, to tackle the task of learning to compute UV parameterizations of arbitrary meshes. The results exhibit the high accuracy of the method as well as its versatility, as it is readily applied to the above scenarios without any changes to the framework.
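The final Poisson solve can be illustrated in one dimension: given a target gradient per edge, recover per-vertex values whose differences match it in the least-squares sense, pinning one vertex to fix the translational null space. This is a 1D stand-in for the per-mesh solve with cached factorization; all names are illustrative:

```python
import numpy as np

def poisson_from_gradients(target_grad, pin_value=0.0):
    """Recover per-vertex values on a 1D chain whose edge differences best
    match prescribed target gradients (least squares), pinning vertex 0."""
    m = len(target_grad)                   # m edges -> m + 1 vertices
    # difference operator D: (D x)_i = x[i+1] - x[i]
    D = np.zeros((m, m + 1))
    D[np.arange(m), np.arange(m)] = -1.0
    D[np.arange(m), np.arange(m) + 1] = 1.0
    # pin x[0] with an extra row, then solve min ||D x - g||^2
    A = np.vstack([D, np.eye(1, m + 1)])
    rhs = np.concatenate([target_grad, [pin_value]])
    x, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return x
```

On a mesh, D becomes the per-face gradient operator and the predicted jacobians play the role of the target gradients; because the system matrix depends only on the mesh, its factorization can be cached across training iterations.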

SESSION: Display, write, and unwrap

Joint neural phase retrieval and compression for energy- and computation-efficient holography on the edge

Recent deep learning approaches have shown remarkable promise for enabling high-fidelity holographic displays. However, lightweight wearable display devices cannot afford the computational demand and energy consumption of hologram generation due to limited onboard compute capability and battery life. On the other hand, if the computation is conducted entirely remotely on a cloud server, transmitting lossless hologram data is not only challenging but also results in prohibitively high latency and storage costs.

In this work, by distributing the computation and optimizing the transmission, we propose the first framework that jointly generates and compresses high-quality phase-only holograms. Specifically, our framework asymmetrically separates the hologram generation process into a high-compute remote encoding stage (on the server) and a low-compute decoding stage (on the edge). Our encoding produces lightweight latent-space data, enabling faster and more efficient transmission to the edge device. With our framework, we observed a 76% reduction in computation and consequently an 83% reduction in energy cost on edge devices, compared to existing hologram generation methods. Our framework is robust to transmission and decoding errors, approaches high image fidelity at rates as low as 2 bits per pixel, and further reduces average bit-rates and decoding times for holographic videos.

Accommodative holography: improving accommodation response for perceptually realistic holographic displays

Holographic displays have gained unprecedented attention for next-generation virtual and augmented reality applications with recent achievements in the realization of high-contrast images through computer-generated holograms (CGHs). However, these holograms show a high energy concentration in a limited angular spectrum, whereas holograms with a uniformly distributed angular spectrum suffer from severe speckle noise in the reconstructed images. In this study, we claim that these two physical phenomena, attributed to the existing CGHs, significantly limit the support of accommodation cues, which is known as one of the biggest advantages of holographic displays. To support this statement, we analyze and evaluate various CGH algorithms with contrast gradients - the change of contrast over the change of the focal diopter of the eye - simulated based on the optical configuration of the display system and human visual perception models. We introduce two approaches to improve the monocular accommodation response in the holographic viewing experience: optical and computational approaches that provide holographic images with sufficient contrast gradients. We design and conduct user experiments with our prototype of holographic near-eye displays, validating the deficient support of accommodation cues in the existing CGH algorithms and demonstrating the feasibility of the proposed solutions with significant improvements in accommodative gain.

Closed-loop control of direct ink writing via reinforcement learning

Enabling additive manufacturing to employ a wide range of novel, functional materials can be a major boost to this technology. However, making such materials printable requires painstaking trial-and-error by an expert operator, as they typically exhibit peculiar rheological or hysteresis properties. Even when suitable process parameters are found, there is no guarantee of print-to-print consistency due to material differences between batches. These challenges make closed-loop feedback an attractive option, where the process parameters are adjusted on the fly. There are several challenges in designing an efficient controller: the deposition parameters are complex and highly coupled, artifacts occur after long time horizons, simulating the deposition is computationally costly, and learning on hardware is intractable. In this work, we demonstrate the feasibility of learning a closed-loop control policy for additive manufacturing using reinforcement learning. We show that approximate, but efficient, numerical simulation is sufficient as long as it allows learning the behavioral patterns of deposition that translate to real-world experiences. In combination with reinforcement learning, our model can be used to discover control policies that outperform baseline controllers. Furthermore, the recovered policies have a minimal sim-to-real gap. We showcase this by applying our control policy in-vivo on a single-layer printer using low- and high-viscosity materials.

SESSION: Fluid simulation

Covector fluids

The animation of delicate vortical structures of gas and liquids has been of great interest in computer graphics. However, common velocity-based fluid solvers can damp the vortical flow, while vorticity-based fluid solvers suffer from performance drawbacks. We propose a new velocity-based fluid solver derived from a reformulated Euler equation using covectors. Our method generates rich vortex dynamics by an advection process that respects the Kelvin circulation theorem. The numerical algorithm requires only a small local adjustment to existing advection-projection methods and can easily leverage recent advances therein. The resulting solver emulates a vortex method without the expensive conversion between vortical variables and velocities. We demonstrate that our method preserves vorticity in both vortex filament dynamics and turbulent flows significantly better than previous methods, while also improving preservation of energy.
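The solver slots into standard advection-projection pipelines with only a small local change to the advection step. For reference, the classical building block it adjusts — a semi-Lagrangian advection step, here in 1D on a hypothetical periodic grid (not the paper's covector transport) — can be sketched as:

```python
import numpy as np

def semi_lagrangian_1d(q, u, dt, dx):
    """One semi-Lagrangian advection step on a periodic 1D grid.

    Each grid point traces backward along the (constant) velocity u and
    linearly interpolates the advected quantity q at the backtraced point.
    """
    n = len(q)
    x = np.arange(n) * dx
    x_back = (x - u * dt) % (n * dx)            # backtraced positions
    i0 = np.floor(x_back / dx).astype(int) % n  # left neighbor index
    i1 = (i0 + 1) % n                           # right neighbor index
    t = (x_back / dx) - np.floor(x_back / dx)   # interpolation weight
    return (1 - t) * q[i0] + t * q[i1]

# Advecting by exactly one cell per step shifts the array by one sample.
q = np.sin(2 * np.pi * np.arange(8) / 8)
q_new = semi_lagrangian_1d(q, u=1.0, dt=1.0, dx=1.0)
assert np.allclose(q_new, np.roll(q, 1))
```

The covector reformulation modifies what is transported through such a step, which is why existing advection-projection codes can adopt it with little change.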

Efficient kinetic simulation of two-phase flows

Real-life multiphase flows exhibit a number of complex and visually appealing behaviors, involving bubbling, wetting, splashing, and glugging. However, most state-of-the-art simulation techniques in graphics can only demonstrate a limited range of multiphase flow phenomena, due to their inability to handle the real water-air density ratio and to the large amount of numerical viscosity introduced in the flow simulation and its coupling with the interface. Recently, kinetic-based methods have achieved success in simulating large density ratios and high Reynolds numbers efficiently; but their memory overhead, limited stability, and numerically-intensive treatment of coupling with immersed solids remain enduring obstacles to their adoption in movie productions. In this paper, we propose a new kinetic solver to couple the incompressible Navier-Stokes equations with a conservative phase-field equation which remedies these major practical hurdles. The resulting two-phase immiscible fluid solver is shown to be efficient due to its massively-parallel nature and GPU implementation, as well as very versatile and reliable because of its enhanced stability to large density ratios, high Reynolds numbers, and complex solid boundaries. We highlight the advantages of our solver through various challenging simulation results that capture intricate and turbulent air-water interaction, including comparisons to previous work and real footage.

VEMPIC: particle-in-polyhedron fluid simulation for intricate solid boundaries

The comprehensive visual modeling of fluid motion has historically been a challenging task, due in no small part to the difficulties inherent in geometries that are non-manifold, open, or thin. Modern geometric cut-cell mesh generators have been shown to produce, both robustly and quickly, workable volumetric elements in the presence of these problematic geometries, and the resulting volumetric representation would seem to offer an ideal infrastructure with which to perform fluid simulations. However, cut-cell mesh elements are general polyhedra that often contain holes and are non-convex; it is therefore difficult to construct the explicit function spaces required to employ standard functional discretizations, such as the Finite Element Method. The Virtual Element Method (VEM) has recently emerged as a functional discretization that successfully operates with complex polyhedral elements through a weak formulation of its function spaces. We present a novel cut-cell fluid simulation framework that exactly represents boundary geometry during the simulation. Our approach enables, for the first time, detailed fluid simulation with "in-the-wild" obstacles, including ones that contain non-manifold parts, self-intersections, and extremely thin features. Our key technical contribution is the generalization of the Particle-In-Cell fluid simulation methodology to arbitrary polyhedra using VEM. Coupled with a robust cut-cell generation scheme, this produces a fluid simulation algorithm that can operate on previously infeasible geometries without requiring any additional mesh modification or repair.

A Clebsch method for free-surface vortical flow simulation

We propose a novel Clebsch method to simulate free-surface vortical flow. At the center of our approach lies a level-set method enhanced by a wave-function correction scheme and a wave-function extrapolation algorithm to tackle the Clebsch method's numerical instabilities near a dynamic interface. By combining the Clebsch wave function's expressiveness in representing vortical structures with the level-set function's ability to track interfacial dynamics, we can model complex vortex-interface interaction problems that exhibit rich free-surface flow details on a Cartesian grid. We showcase the efficacy of our approach by simulating a wide range of new free-surface flow phenomena that were impractical for previous methods, including horseshoe vortices, sink vortices, bubble rings, and free-surface wake vortices.

Guided bubbles and wet foam for realistic whitewater simulation

We present a method for enhancing fluid simulations with realistic bubble and foam detail. We treat bubbles as discrete air particles, two-way coupled with a sparse volumetric Euler flow, as first suggested in [Stomakhin et al. 2020]. We elaborate further on their scheme and introduce a bubble inertia correction term for improved convergence. We also show how one can add bubbles to an already existing fluid simulation using our novel guiding technique, which performs local re-simulation of fluid to achieve more interesting bubble dynamics through coupling. As bubbles reach the surface, they are converted into foam and simulated separately. Our foam is discretized with smoothed particle hydrodynamics (SPH), and we replace forces normal to the fluid surface with a fluid surface manifold advection constraint to achieve more robust and stable results. The SPH forces are derived through proper constitutive modeling of an incompressible viscous liquid, and we explain why this choice is appropriate for "wet" types of foam. This allows us to produce believable dynamics from close-up scenarios to large oceans, with just a few parameters that work intuitively across a variety of scales. Additionally, we present relevant research on air entrainment metrics and bubble distributions that have been used in this work.

The power particle-in-cell method

This paper introduces a new weighting scheme for particle-grid transfers that generates hybrid Lagrangian/Eulerian fluid simulations with uniform particle distributions and precise volume control. At its core, our approach reformulates the construction of Power Particles [de Goes et al. 2015] by computing volume-constrained density kernels. We employ these optimized kernels as particle domains within the Generalized Interpolation Material Point method (GIMP) in order to incorporate Power Particles into the Particle-In-Cell framework, hence the name the Power Particle-In-Cell method. We address the construction of volume-constrained density kernels as a regularized optimal transportation problem and describe an iterative solver based on localized Gaussian convolutions that leads to a significant performance speedup compared to [de Goes et al. 2015]. We also present novel extensions for handling free surfaces and solid obstacles that bypass the need for cell clipping and ghost particles. We demonstrate the advantages of our transfer weights by improving hybrid schemes for fluid simulation such as the Fluid Implicit Particle (FLIP) method and the Affine Particle-In-Cell (APIC) method with volume preservation and robustness to varying particle-per-cell ratio, while retaining low numerical dissipation, conserving linear and angular momenta, and avoiding particle reseeding or post-process relaxations.
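The regularized optimal transport viewpoint behind the volume-constrained kernels can be illustrated with generic entropic-OT (Sinkhorn) scaling: alternately rescale a Gibbs kernel until every particle carries equal volume while matching a prescribed cell density. This dense toy is only a sketch of that viewpoint; the paper's solver uses localized Gaussian convolutions for efficiency:

```python
import numpy as np

# Entropic OT via Sinkhorn iterations on a tiny particle/cell problem.
rng = np.random.default_rng(1)
n_particles, n_cells = 4, 6
cost = rng.random((n_particles, n_cells))        # stand-in for squared distances
eps = 0.5                                        # entropic regularization (large, for a fast toy)
K = np.exp(-cost / eps)                          # Gibbs kernel
a = np.full(n_particles, 1.0 / n_particles)      # equal particle volumes
b = np.full(n_cells, 1.0 / n_cells)              # target cell density

v = np.ones(n_cells)
for _ in range(300):                             # Sinkhorn scaling loop
    u = a / (K @ v)
    v = b / (K.T @ u)

P = u[:, None] * K * v[None, :]                  # transport plan
assert np.allclose(P.sum(axis=1), a, atol=1e-9)  # volume constraints met
assert np.allclose(P.sum(axis=0), b, atol=1e-9)
```

The rows of the converged plan play the role of per-particle density kernels with exactly prescribed volume.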

Physics informed neural fields for smoke reconstruction with sparse data

High-fidelity reconstruction of dynamic fluids from sparse multiview RGB videos remains a formidable challenge, due to the complexity of the underlying physics as well as the severe occlusion and complex lighting in the captured data. Existing solutions either assume knowledge of obstacles and lighting, or only focus on simple fluid scenes without obstacles or complex lighting, and thus are unsuitable for real-world scenes with unknown lighting conditions or arbitrary obstacles. We present the first method to reconstruct dynamic fluid phenomena by leveraging the governing physics (i.e., the Navier-Stokes equations) in an end-to-end optimization from a mere set of sparse video frames without taking lighting conditions, geometry information, or boundary conditions as input. Our method provides a continuous spatio-temporal scene representation using neural networks as the ansatz of density and velocity solution functions for fluids as well as the radiance field for static objects. With a hybrid architecture that separates static and dynamic contents apart, fluid interactions with static obstacles are reconstructed for the first time without additional geometry input or human labeling. By augmenting time-varying neural radiance fields with physics-informed deep learning, our method benefits from the supervision of images and physical priors. Our progressively growing model with regularization further disentangles the density-color ambiguity in the radiance field, which allows for a more robust optimization from the given input of sparse views. A pretrained density-to-velocity fluid model is additionally leveraged as a data prior to avoid suboptimal velocity solutions that underestimate vorticity but trivially fulfill the physical equations. Our method exhibits high-quality results with relaxed constraints and strong flexibility on a representative set of synthetic and real flow captures. Code and sample tests are at

SESSION: Benchmarks, datasets and learning

NIMBLE: a non-rigid hand model with bones and muscles

Emerging Metaverse applications demand reliable, accurate, and photorealistic reproductions of human hands to perform sophisticated operations as if in the physical world. While the real human hand exhibits some of the most intricate coordination among bones, muscles, tendons, and skin, state-of-the-art techniques unanimously focus on modeling only the skeleton of the hand. In this paper, we present NIMBLE, a novel parametric hand model that includes the missing key components, bringing 3D hand models to a new level of realism. We first annotate muscles, bones, and skin on the recent Magnetic Resonance Imaging hand (MRI-Hand) dataset [Li et al. 2021] and then register a volumetric template hand onto individual poses and subjects within the dataset. NIMBLE consists of 20 bones as triangular meshes, 7 muscle groups as tetrahedral meshes, and a skin mesh. Via iterative shape registration and parameter learning, it further produces shape blend shapes, pose blend shapes, and a joint regressor. We demonstrate applying NIMBLE to modeling, rendering, and visual inference tasks. By enforcing the inner bones and muscles to match anatomic and kinematic rules, NIMBLE can animate 3D hands to new poses with unprecedented realism. To model the appearance of skin, we further construct a photometric HandStage to acquire high-quality textures and normal maps to model wrinkles and palm prints. Finally, NIMBLE also benefits learning-based hand pose and shape estimation by either synthesizing rich data or acting directly as a differentiable layer in the inference network.

NeuralSound: learning-based modal sound synthesis with acoustic transfer

We present a novel learning-based modal sound synthesis approach that includes a mixed vibration solver for modal analysis and a radiation network for acoustic transfer. Our mixed vibration solver consists of a 3D sparse convolution network and a Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) module for iterative optimization. Moreover, we highlight the correlation between a standard numerical vibration solver and our network architecture. Our radiation network predicts the Far-Field Acoustic Transfer maps (FFAT Maps) from the surface vibration of the object. The overall running time of our learning-based approach for most new objects is less than one second on an RTX 3080 Ti GPU while maintaining a high sound quality close to the ground truth solved by standard numerical methods. We also evaluate the numerical and perceptual accuracy of our approach on different objects with various shapes and materials.
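Modal analysis amounts to extracting the smallest eigenpairs of a generalized eigenproblem K x = λ M x, which LOBPCG solves by iteratively refining a block of candidate eigenvectors (in the paper, the network supplies that initial block). A minimal SciPy sketch, with a 1D Dirichlet Laplacian standing in for the FEM stiffness matrix and M = I:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

# Sparse "stiffness matrix": tridiagonal Dirichlet Laplacian.
n = 40
K = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csr")

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 3))                  # initial eigenvector block
vals, vecs = lobpcg(K, X, largest=False, tol=1e-9, maxiter=500)

# Analytic eigenvalues of the discrete Dirichlet Laplacian for comparison.
exact = 4 * np.sin(np.pi * np.arange(1, 4) / (2 * (n + 1))) ** 2
assert np.allclose(np.sort(vals), exact, atol=1e-6)
```

A good network-provided initial block would cut the number of LOBPCG iterations, which is the interplay the mixed solver exploits.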

Implicit neural representation for physics-driven actuated soft bodies

Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discretization agnostic and widely applicable. We extend our implicit model to mandible kinematics for the particular case of facial animation and show that we can reliably reproduce facial expressions captured with high-quality capture systems. We apply the method to volumetric soft bodies, human poses, and facial expressions, demonstrating artist-friendly properties, such as simple control over the latent space and resolution invariance at test time.

SESSION: Differentiable rendering and neural fields

Efficient estimation of boundary integrals for path-space differentiable rendering

Boundary integrals are unique to physics-based differentiable rendering and crucial for differentiating with respect to object geometry. Under the differential path integral framework---which has enabled the development of sophisticated differentiable rendering algorithms---the boundary components are themselves path integrals. Although the mathematical formulation of boundary path integrals has been established, efficient estimation of these integrals remains challenging.

In this paper, we introduce a new technique to efficiently estimate boundary path integrals. A key component of our technique is a primary-sample-space guiding step for importance sampling of boundary segments. Additionally, we show that multiple importance sampling can be used to combine multiple guided samplings. Lastly, we introduce an optional edge sorting step to further improve the runtime performance. We evaluate the effectiveness of our method using several differentiable-rendering and inverse-rendering examples and provide comparisons with existing methods for reconstruction as well as gradient quality.
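Combining several sampling strategies via multiple importance sampling follows the standard Veach-style balance heuristic: each sample is weighted by its own density relative to the sum of all densities. A generic 1D sketch (not the paper's guided boundary sampling) estimating I = ∫₀¹ 3x² dx = 1 with two strategies:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

f = lambda x: 3.0 * x**2
p_a = lambda x: np.ones_like(x)          # strategy A: uniform density on [0, 1]
p_b = lambda x: 2.0 * x                  # strategy B: linear density on [0, 1]

xa = rng.random(N)                       # samples from p_a
xb = np.sqrt(1.0 - rng.random(N))        # inverse-CDF samples from p_b, in (0, 1]

def weight(x, p_self):                   # balance heuristic: w_i = p_i / sum_j p_j
    return p_self(x) / (p_a(x) + p_b(x))

I = (np.mean(weight(xa, p_a) * f(xa) / p_a(xa))
     + np.mean(weight(xb, p_b) * f(xb) / p_b(xb)))
assert abs(I - 1.0) < 0.01
```

The combined estimator stays unbiased while down-weighting each strategy wherever the other one samples the integrand better.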

DR.JIT: a just-in-time compiler for differentiable rendering

DR.JIT is a new just-in-time compiler for physically based rendering and its derivatives. DR.JIT expedites research on these topics in two ways: first, it traces high-level simulation code (e.g., written in Python) and aggressively simplifies and specializes the resulting program representation, producing data-parallel kernels with state-of-the-art performance on CPUs and GPUs.

Second, it simplifies the development of differentiable rendering algorithms. Efficient methods in this area turn the derivative of a simulation into a simulation of the derivative. DR.JIT provides fine-grained control over the process of automatic differentiation to help with this transformation.

Specialization is particularly helpful in the context of differentiation, since large parts of the simulation ultimately do not influence the computed gradients. DR.JIT tracks data dependencies globally to find and remove redundant computation.

Differentiable signed distance function rendering

Physically-based differentiable rendering has recently emerged as an attractive new technique for solving inverse problems that recover complete 3D scene representations from images. The inversion of shape parameters is of particular interest but also poses severe challenges: shapes are intertwined with visibility, whose discontinuous nature introduces severe bias in computed derivatives unless costly precautions are taken. Shape representations like triangle meshes suffer from additional difficulties, since the continuous optimization of mesh parameters cannot introduce topological changes.

One common solution to these difficulties entails representing shapes using signed distance functions (SDFs) and gradually adapting their zero level set during optimization. Previous differentiable rendering of SDFs did not fully account for visibility gradients and required the use of mask or silhouette supervision, or discretization into a triangle mesh.

In this article, we show how to extend the commonly used sphere tracing algorithm so that it additionally outputs a reparameterization that provides the means to compute accurate shape parameter derivatives. At a high level, this resembles techniques for differentiable mesh rendering, though we show that the SDF representation admits a particularly efficient reparameterization that outperforms prior work. Our experiments demonstrate the reconstruction of (synthetic) objects without complex regularization or priors, using only a per-pixel RGB loss.
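Sphere tracing, the algorithm being extended, marches along a ray by the local SDF value, which is always a safe step size. A minimal sketch of the classical procedure (the paper's contribution — the additional reparameterization output for shape derivatives — is omitted here):

```python
import numpy as np

def sphere_trace(sdf, origin, direction, max_steps=128, eps=1e-6):
    """March along the ray by the SDF value, a guaranteed-safe step size."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t                      # surface hit at distance t
        t += d                            # safe step: no surface within d
    return None                           # ray missed within the step budget

# Unit sphere at the origin; a ray from z = -3 aimed at it hits at t = 2.
sdf = lambda p: np.linalg.norm(p) - 1.0
t = sphere_trace(sdf, np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))
assert t is not None and abs(t - 2.0) < 1e-4
```

Because every intermediate SDF query is already available during the march, augmenting the loop with derivative bookkeeping is cheap, which is what makes the SDF reparameterization efficient.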

Adjoint nonlinear ray tracing

Reconstructing and designing media with continuously-varying refractive index fields remains a challenging problem in computer graphics. A core difficulty in trying to tackle this inverse problem is that light travels inside such media along curves, rather than straight lines. Existing techniques for this problem make strong assumptions on the shape of the ray inside the medium, and thus limit themselves to media where the ray deflection is relatively small. More recently, differentiable rendering techniques have relaxed this limitation, by making it possible to differentiably simulate curved light paths. However, the automatic differentiation algorithms underlying these techniques use large amounts of memory, restricting existing differentiable rendering techniques to relatively small media and low spatial resolutions.

We present a method for optimizing refractive index fields that both accounts for curved light paths and has a small, constant memory footprint. We use the adjoint state method to derive a set of equations for computing derivatives with respect to the refractive index field of optimization objectives that are subject to nonlinear ray tracing constraints. We additionally introduce discretization schemes to numerically evaluate these equations, without the need to store nonlinear ray trajectories in memory, significantly reducing the memory requirements of our algorithm. We use our technique to optimize high-resolution refractive index fields for a variety of applications, including creating different types of displays (multiview, lightfield, caustic), designing gradient-index optics, and reconstructing gas flows.
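The adjoint state method computes the gradient of an ODE-constrained objective with one backward sweep instead of differentiating through a stored computation graph. A toy discrete-adjoint sketch on a scalar ODE x' = -θx with objective J = x(T)², verified against finite differences (the paper applies this to the nonlinear ray equations and further avoids storing trajectories, which this illustration does not):

```python
import numpy as np

theta, h, N, x0 = 0.7, 0.01, 100, 1.0

def forward(th):
    # Forward Euler: x_{k+1} = x_k - th * x_k * h.
    x, xs = x0, [x0]
    for _ in range(N):
        x = x - th * x * h
        xs.append(x)
    return xs

xs = forward(theta)
lam = 2.0 * xs[-1]                 # terminal adjoint: lambda_N = dJ/dx_N
grad = 0.0
for k in reversed(range(N)):
    grad += lam * (-xs[k] * h)     # accumulate d x_{k+1} / d theta term
    lam = lam * (1.0 - theta * h)  # lambda_k = lambda_{k+1} * d x_{k+1} / d x_k

# Central finite differences as an independent check.
J = lambda th: forward(th)[-1] ** 2
fd = (J(theta + 1e-6) - J(theta - 1e-6)) / 2e-6
assert abs(grad - fd) < 1e-6
```

The backward recursion touches each step once, so the cost of the gradient matches one extra forward simulation regardless of how many parameters the refractive index field has.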

SESSION: Reconstruction

Alpha wrapping with an offset

Given an input 3D geometry such as a triangle soup or a point set, we address the problem of generating a watertight and orientable surface triangle mesh that strictly encloses the input. The output mesh is obtained by greedily refining and carving a 3D Delaunay triangulation on an offset surface of the input, while carving with empty balls of radius alpha. The proposed algorithm is controlled via two user-defined parameters: alpha and offset. Alpha controls the size of cavities or holes that cannot be traversed during carving, while offset controls the distance between the vertices of the output mesh and the input. Our algorithm is guaranteed to terminate and to yield a valid and strictly enclosing mesh, even for defect-laden inputs. Genericity is achieved using an abstract interface probing the input, enabling any geometry to be used, provided a few basic geometric queries can be answered. We benchmark the algorithm on large public datasets such as Thingi10k, and compare it to state-of-the-art approaches in terms of robustness, approximation, output complexity, speed, and peak memory consumption. Our implementation is available through the CGAL library.

Iterative Poisson surface reconstruction (iPSR) for unoriented points

Poisson surface reconstruction (PSR) remains a popular technique for reconstructing watertight surfaces from 3D point samples thanks to its efficiency, simplicity, and robustness. Yet, the existing PSR method and subsequent variants work only for oriented points. This paper shows that an improved PSR, called iPSR, can completely eliminate the requirement of point normals by proceeding in an iterative manner. In each iteration, iPSR takes as input point samples with normals directly computed from the surface obtained in the preceding iteration, and then generates a new surface with better quality. Extensive quantitative evaluation confirms that the new iPSR algorithm converges in 5--30 iterations even with randomly initialized normals. If initialized with a simple visibility-based heuristic, iPSR can further reduce the number of iterations. We conduct comprehensive comparisons with PSR and other powerful implicit-function based methods. Finally, we confirm iPSR's effectiveness and scalability on the AIM@SHAPE dataset and challenging (indoor and outdoor) scenes. Code and data for this paper are at

ComplexGen: CAD reconstruction by B-rep chain complex generation

We view the reconstruction of CAD models in the boundary representation (B-Rep) as the detection of geometric primitives of different orders, i.e., vertices, edges and surface patches, and the correspondence of primitives, which are holistically modeled as a chain complex, and show that by modeling such comprehensive structures more complete and regularized reconstructions can be achieved. We solve the complex generation problem in two steps. First, we propose a novel neural framework that consists of a sparse CNN encoder for input point cloud processing and a tri-path transformer decoder for generating geometric primitives and their mutual relationships with estimated probabilities. Second, given the probabilistic structure predicted by the neural network, we recover a definite B-Rep chain complex by solving a global optimization maximizing the likelihood under structural validness constraints and applying geometric refinements. Extensive tests on large scale CAD datasets demonstrate that the modeling of B-Rep chain complex structure enables more accurate detection for learning and more constrained reconstruction for optimization, leading to structurally more faithful and complete CAD B-Rep models than previous results.

Moving level-of-detail surfaces

We present a simple, fast, and smooth scheme to approximate Algebraic Point Set Surfaces using non-compact kernels, which is particularly suited for filtering and reconstructing point sets presenting large missing parts. Our key idea is to consider a moving level-of-detail of the input point set that is adaptive w.r.t. the evaluation location, just as the sample weights are output-sensitive in the traditional moving least squares scheme. We also introduce an adaptive progressive octree refinement scheme, driven by the resulting implicit surface, to properly capture the modeled geometry even far away from the input samples. Similarly to typical compactly-supported approximations, our operator runs in logarithmic time while defining high-quality surfaces even on challenging inputs for which only global optimizations achieve reasonable results. We demonstrate our technique on a variety of point sets featuring geometric noise as well as large holes.
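For reference, the classical moving least squares projection that such schemes build on can be sketched as a distance-weighted plane fit (a plane rather than APSS's algebraic spheres, and with a hypothetical fixed bandwidth `h`, whereas the paper adapts the level-of-detail to the evaluation location):

```python
import numpy as np

def mls_project(points, query, h=0.5):
    """Project a query point onto a plane fit with Gaussian MLS weights."""
    w = np.exp(-np.sum((points - query) ** 2, axis=1) / h**2)
    c = (w[:, None] * points).sum(0) / w.sum()          # weighted centroid
    cov = (w[:, None] * (points - c)).T @ (points - c)  # weighted covariance
    n = np.linalg.eigh(cov)[1][:, 0]                    # smallest-variance axis = normal
    return query - np.dot(query - c, n) * n             # project onto the plane

# Samples on the plane z = 0: a query above it projects straight down.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.random(200), rng.random(200), np.zeros(200)])
q = np.array([0.5, 0.5, 0.3])
p = mls_project(pts, q)
assert abs(p[2]) < 1e-8 and np.allclose(p[:2], q[:2])
```

The non-compact kernel in the paper plays the role of `w`, but with support (and sample set) that grows with distance from the data, which is what lets the operator fill large holes.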

SESSION: Reflectance, shading models and shaders

Photo-to-shape material transfer for diverse structures

We introduce a method for assigning photorealistic relightable materials to 3D shapes in an automatic manner. Our method takes as input a photo exemplar of a real object and a 3D object with segmentation, and uses the exemplar to guide the assignment of materials to the parts of the shape, so that the appearance of the resulting shape is as similar as possible to the exemplar. To accomplish this goal, our method combines an image translation neural network with a material assignment neural network. The image translation network translates the color from the exemplar to a projection of the 3D shape and the part segmentation from the projection to the exemplar. Then, the material prediction network assigns materials from a collection of realistic materials to the projected parts, based on the translated images and perceptual similarity of the materials.

One key idea of our method is to use the translation network to establish a correspondence between the exemplar and shape projection, which allows us to transfer materials between objects with diverse structures. Another key idea of our method is to use the two pairs of (color, segmentation) images provided by the image translation to guide the material assignment, which enables us to ensure the consistency in the assignment. We demonstrate that our method allows us to assign materials to shapes so that their appearances better resemble the input exemplars, improving the quality of the results over the state-of-the-art method, and allowing us to automatically create thousands of shapes with high-quality photorealistic materials. Code and data for this paper are available at

Towards practical physical-optics rendering

Physical light transport (PLT) algorithms can represent the wave nature of light globally in a scene, and are consistent with Maxwell's theory of electromagnetism. As such, they are able to reproduce the wave-interference and diffraction effects of real physical optics. However, the recent works that have proposed PLT are too expensive to apply to real-world scenes with complex geometry and materials. To address this problem, we propose a novel framework for physical light transport based on several key ideas that make PLT practical for complex scenes. First, we restrict the spatial coherence shape of light to an anisotropic Gaussian and justify this restriction with general arguments based on entropy. This restriction serves to simplify the rest of the derivations, without practical loss of generality. To describe partially-coherent light, we present new rendering primitives that generalize the radiometric radiance and irradiance, and are based on the well-known Stokes parameters. We are able to represent light of arbitrary spectral content and states of polarization, and with any coherence volume and anisotropy. We also present the wave BSDF to accurately render diffractions and wave-interference effects. Furthermore, we present an approach to importance sample this wave BSDF to facilitate bi-directional path tracing, which has been previously impossible. We show good agreement with state-of-the-art methods, but unlike them we are able to render complex scenes where all the materials are new, coherence-aware physical optics materials, and with performance approaching that of "classical" rendering methods.

Sparse ellipsometry: portable acquisition of polarimetric SVBRDF and shape with unstructured flash photography

Ellipsometry techniques allow measuring the polarization properties of materials, but require precise rotations of optical components with different configurations of lights and sensors. This results in cumbersome capture devices, carefully calibrated in lab conditions, and in very long acquisition times, usually on the order of a few days per object. Recent techniques capture polarimetric spatially-varying reflectance information, but are limited to a single view, or cover all view directions but are limited to spherical objects made of a single homogeneous material. We present sparse ellipsometry, a portable polarimetric acquisition method that captures both polarimetric SVBRDF and 3D shape simultaneously. Our handheld device consists of off-the-shelf, fixed optical components. Instead of days, the total acquisition time varies between twenty and thirty minutes per object. We develop a complete polarimetric SVBRDF model that includes diffuse and specular components, as well as single scattering, and devise a novel polarimetric inverse rendering algorithm with data augmentation of specular reflection samples via generative modeling. Our results show a strong agreement with a recent ground-truth dataset of captured polarimetric BRDFs of real-world objects.

Position-free multiple-bounce computations for smith microfacet BSDFs

Bidirectional Scattering Distribution Functions (BSDFs) encode how a material reflects or transmits the incoming light. The most commonly used model is the microfacet BSDF. It computes the material response from the microgeometry of the surface, assuming a single bounce on specular microfacets. The original model ignores multiple bounces on the microgeometry, resulting in an energy loss, especially for rough materials. In this paper, we present a new method to compute the multiple bounces inside the microgeometry, eliminating this energy loss. Our method relies on a position-free formulation of multiple bounces inside the microgeometry. We use an explicit mathematical definition of the path space that describes single and multiple bounces in a uniform way. We then study the behavior of light on the different vertices and segments in the path space, leading to a reciprocal multiple-bounce description of BSDFs. Furthermore, we present practical, unbiased Monte Carlo estimators to compute multiple scattering. Our method is less noisy than existing algorithms for computing multiple scattering, and is almost noise-free at a very low sampling rate of 2 to 4 samples per pixel (spp).
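The energy loss from single-bounce truncation can be illustrated with a toy random-walk estimator. This is not the paper's position-free formulation: the microsurface is reduced to a constant per-bounce reflectance and a fixed escape probability, chosen only so the closed-form answer is checkable:

```python
import random

def random_walk_energy(r, p_esc, rng, max_bounces=64):
    """Track throughput through repeated microfacet bounces until escape."""
    throughput = 1.0
    for _ in range(max_bounces):
        throughput *= r            # energy kept at each bounce
        if rng.random() < p_esc:   # ray leaves the microsurface
            return throughput
    return 0.0                     # path terminated (negligible throughput left)

rng = random.Random(0)
r, p_esc, n = 0.9, 0.5, 200_000
multi = sum(random_walk_energy(r, p_esc, rng) for _ in range(n)) / n
single = r * p_esc                 # truncating after one bounce loses energy
# geometric series over bounce counts: sum_k r^k * p * (1-p)^(k-1)
analytic = r * p_esc / (1.0 - r * (1.0 - p_esc))
```

With these toy numbers the single-bounce estimate recovers only 0.45 of the incident energy while the full random walk recovers about 0.82, mirroring the darkening of rough materials under the single-bounce model.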

Aδ: autodiff for discontinuous programs - applied to shaders

Over the last decade, automatic differentiation (AD) has profoundly impacted graphics and vision applications --- both broadly via deep learning and specifically for inverse rendering. Traditional AD methods ignore gradients at discontinuities, instead treating functions as continuous. Rendering algorithms intrinsically rely on discontinuities, crucial at object silhouettes and in general for any branching operation. Researchers have proposed fully-automatic differentiation approaches for handling discontinuities by restricting to affine functions, or semi-automatic processes restricted either to invertible functions or to specialized applications like vector graphics. This paper describes a compiler-based approach to extend reverse mode AD so as to accept arbitrary programs involving discontinuities. Our novel gradient rules generalize differentiation to work correctly, assuming there is a single discontinuity in a local neighborhood, by approximating the prefiltered gradient over a box kernel oriented along a 1D sampling axis. We describe when such approximation rules are first-order correct, and show that this correctness criterion applies to a relatively broad class of functions. Moreover, we show that the method is effective in practice for arbitrary programs, including features for which we cannot prove correctness. We evaluate this approach on procedural shader programs, where the task is to optimize unknown parameters in order to match a target image, and our method outperforms baselines in terms of both convergence and efficiency. Our compiler outputs gradient programs in TensorFlow, PyTorch (for quick prototypes) and Halide with an optional auto-scheduler (for efficiency). The compiler also outputs GLSL that renders the target image, allowing users to interactively modify and animate the shader, which would otherwise be cumbersome in other representations such as triangle meshes or vector art.
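The prefiltered-gradient rule at the heart of this approach can be checked numerically on the simplest discontinuous program, a step function. Prefiltering over a box kernel of width w turns the step into a ramp whose derivative with respect to the threshold is -1/w whenever the discontinuity lies inside the kernel support, whereas a naive pointwise derivative is zero almost everywhere. A minimal sketch of the underlying identity (not the paper's compiler; function names are mine):

```python
import numpy as np

def step(x, theta):
    """Discontinuous program: 1 where x > theta, else 0."""
    return (x > theta).astype(float)

def prefiltered(theta, x0, w, n=100_001):
    """Average the program over a box kernel of width w centred at x0."""
    xs = np.linspace(x0 - w / 2, x0 + w / 2, n)
    return step(xs, theta).mean()

x0, w, theta = 0.0, 0.2, 0.05   # the discontinuity lies inside the kernel
eps = 1e-3
fd = (prefiltered(theta + eps, x0, w) - prefiltered(theta - eps, x0, w)) / (2 * eps)
# Pointwise AD of `step` returns 0 almost everywhere; the prefiltered
# gradient is -1/w = -5, because moving theta slides the discontinuity
# across the kernel support.
```

The finite difference of the prefiltered program recovers -1/w, which is the quantity the compiler's gradient rules approximate along a 1D sampling axis.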

SESSION: Character animation

DeepPhase: periodic autoencoders for learning motion phase manifolds

Learning the spatial-temporal structure of body movements is a fundamental problem for character motion synthesis. In this work, we propose a novel neural network architecture called the Periodic Autoencoder that can learn periodic features from large unstructured motion datasets in an unsupervised manner. The character movements are decomposed into multiple latent channels that capture the non-linear periodicity of different body segments while progressing forward in time. Our method extracts a multi-dimensional phase space from full-body motion data, which effectively clusters animations and produces a manifold in which computed feature distances provide a better similarity measure than in the original motion space, achieving better temporal and spatial alignment. We demonstrate that the learned periodic embedding can significantly help to improve neural motion synthesis in a number of tasks, including diverse locomotion skills, style-based movements, dance motion synthesis from music, synthesis of dribbling motions in football, and motion query for matching poses within large animation databases.

Real-time controllable motion transition for characters

Real-time in-between motion generation is universally required in games and highly desirable in existing animation pipelines. Its core challenge lies in the need to satisfy three critical conditions simultaneously: quality, controllability and speed, which renders undesirable any method that needs offline computation (or post-processing) or cannot incorporate (often unpredictable) user control. To this end, we propose a new real-time transition method to address the aforementioned challenges. Our approach consists of two key components: motion manifold and conditional transitioning. The former learns important low-level motion features and their dynamics, while the latter synthesizes transitions conditioned on a target frame and the desired transition duration. We first learn a motion manifold that explicitly models the intrinsic transition stochasticity in human motions via a multi-modal mapping mechanism. Then, during generation, we design a transition model, essentially a sampling strategy over the learned manifold, based on the target frame and the desired transition duration. We validate our method on different datasets in tasks where no post-processing or offline computation is allowed. Through exhaustive evaluation and comparison, we show that our method is able to generate high-quality motions measured under multiple metrics. Our method is also robust under various target frames, including extreme cases.

GANimator: neural motion synthesis from a single sequence

We present GANimator, a generative model that learns to synthesize novel motions from a single, short motion sequence. GANimator generates motions that resemble the core elements of the original motion, while simultaneously synthesizing novel and diverse movements. Existing data-driven techniques for motion synthesis require a large motion dataset which contains the desired and specific skeletal structure. By contrast, GANimator only requires training on a single motion sequence, enabling novel motion synthesis for a variety of skeletal structures, e.g., bipeds, quadrupeds, hexapeds, and more. Our framework contains a series of generative and adversarial neural networks, each responsible for generating motion at a specific frame rate. The framework progressively learns to synthesize motion from random noise, enabling hierarchical control over the generated motion content across varying levels of detail. We show a number of applications, including crowd simulation, key-frame editing, style transfer, and interactive control, which all learn from a single input sequence. Code and data for this paper are at

Character articulation through profile curves

Computer animation relies heavily on rigging setups that articulate character surfaces through a broad range of poses. Although many deformation strategies have been proposed over the years, constructing character rigs is still a cumbersome process that involves repetitive authoring of point weights and corrective sculpts with limited and indirect shaping controls. This paper presents a new approach for character articulation that produces detail-preserving deformations fully controlled by 3D curves that profile the deforming surface. Our method starts with a spline-based rigging system in which artists can draw and articulate sparse curvenets that describe surface profiles. By analyzing the layout of the rigged curvenets, we quantify the deformation along each curve side independent of the mesh connectivity, thus separating the articulation controllers from the underlying surface representation. To propagate the curvenet articulation over the character surface, we formulate a deformation optimization that reconstructs surface details while conforming to the rigged curvenets. In this process, we introduce a cut-cell algorithm that binds the curvenet to the surface mesh by cutting mesh elements into smaller polygons possibly with cracks, and then derive a cut-aware numerical discretization that provides harmonic interpolations with curve discontinuities. We demonstrate the expressiveness and flexibility of our method using a series of animation clips.

SESSION: Learning "in style"

DCT-net: domain-calibrated translation for portrait stylization

This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars (~100), the new architecture can produce high-quality style transfer results with advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained by partial observations (i.e., stylized heads). Few-shot learning based style transfer is challenging since the learned model can easily become overfitted in the target domain, due to the biased distribution formed by only a few training examples. This paper aims to handle the challenge by adopting the key idea of "calibration first, translation later" and exploring the augmented global structure with locally-focused translation. Specifically, the proposed DCT-Net consists of three modules: a content adapter borrowing the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module using affine transformations to release spatially semantic constraints; and a texture translation module leveraging samples produced by the calibrated distribution to learn a fine-grained conversion. Experimental results demonstrate the proposed method's superiority over the state of the art in head stylization and its effectiveness on full image translation with adaptive deformations. Our code is publicly available at

StyleGAN-NADA: CLIP-guided domain adaptation of image generators

Can a generative model be trained to produce images from a specific domain, guided only by a text prompt, without seeing any image? In other words: can an image generator be trained "blindly"? Leveraging the semantic power of large scale Contrastive-Language-Image-Pre-training (CLIP) models, we present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image. We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or infeasible to reach with existing methods. We conduct an extensive set of experiments across a wide range of domains. These demonstrate the effectiveness of our approach, and show that our models preserve the latent-space structure that makes generative models appealing for downstream tasks. Code and videos available at:

SNeRF: stylized neural implicit representations for 3D scenes

This paper presents a stylized novel view synthesis method. Applying state-of-the-art stylization methods to novel views frame by frame often causes jittering artifacts due to the lack of cross-view consistency. Therefore, this paper investigates 3D scene stylization that provides a strong inductive bias for consistent novel view synthesis. Specifically, we adopt the emerging neural radiance fields (NeRF) as our choice of 3D scene representation for their capability to render high-quality novel views for a variety of scenes. However, as rendering a novel view from a NeRF requires a large number of samples, training a stylized NeRF requires a large amount of GPU memory that goes beyond an off-the-shelf GPU capacity. We introduce a new training method to address this problem by alternating the NeRF and stylization optimization steps. Such a method enables us to make full use of our hardware memory capacity to both generate images at higher resolution and adopt more expressive image style transfer methods. Our experiments show that our method produces stylized NeRFs for a wide range of content, including indoor, outdoor and dynamic scenes, and synthesizes high-quality novel views with cross-view consistency.

SESSION: Perception

Noise-based enhancement for foveated rendering

Human visual sensitivity to spatial details declines towards the periphery. Novel image synthesis techniques, so-called foveated rendering, exploit this observation and reduce the spatial resolution of synthesized images for the periphery, avoiding the synthesis of high-spatial-frequency details that are costly to generate but not perceived by a viewer. However, contemporary techniques do not make a clear distinction between the range of spatial frequencies that must be reproduced and those that can be omitted. For a given eccentricity, there is a range of frequencies that are detectable but not resolvable. While the accurate reproduction of these frequencies is not required, an observer can detect their absence if completely omitted. We use this observation to improve the performance of existing foveated rendering techniques. We demonstrate that this specific range of frequencies can be efficiently replaced with procedural noise whose parameters are carefully tuned to image content and human perception. Consequently, these frequencies do not have to be synthesized during rendering, allowing more aggressive foveation, and they can be replaced by noise generated in a less expensive post-processing step, leading to improved performance of the rendering system. Our main contribution is a perceptually-inspired technique for deriving the parameters of the noise required for the enhancement and its calibration. The method operates on rendering output and runs at rates exceeding 200 FPS at 4K resolution, making it suitable for integration with real-time foveated rendering systems for VR and AR devices. We validate our results and compare them to the existing contrast enhancement technique in user experiments.

Image features influence reaction time: a learned probabilistic perceptual model for saccade latency

We aim to ask and answer an essential question: "How quickly do we react after observing a displayed visual target?" To this end, we present psychophysical studies that characterize the remarkable disconnect between human saccadic behaviors and spatial visual acuity. Building on the results of our studies, we develop a perceptual model to predict temporal gaze behavior, particularly saccadic latency, as a function of the statistics of a displayed image. Specifically, we implement a neurologically-inspired probabilistic model that mimics the accumulation of confidence that leads to a perceptual decision. We validate our model with a series of objective measurements and user studies using an eye-tracked VR display. The results demonstrate that our model prediction is in statistical alignment with real-world human behavior. Further, we establish that many sub-threshold image modifications commonly introduced in graphics pipelines may significantly alter human reaction timing, even if the differences are visually undetectable. Finally, we show that our model can serve as a metric to predict and alter reaction latency of users in interactive computer graphics applications, and may thus improve gaze-contingent rendering, design of virtual experiences, and player performance in e-sports. We illustrate this with two examples: estimating competition fairness in a video game with two different team colors, and tuning display viewing distance to minimize player reaction time.
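The confidence-accumulation mechanism described above can be sketched as a generic drift-diffusion simulation. All constants below are illustrative stand-ins, not the fitted parameters of the paper's model; the sketch only shows why a stronger stimulus, modeled as a higher drift rate, produces shorter latencies:

```python
import random

def saccade_latency(drift, rng, threshold=1.0, noise=0.05, max_t=5000):
    """Accumulate noisy evidence until it crosses a decision threshold.

    `drift`, `threshold`, and `noise` are made-up values for illustration.
    Returns the first-passage time in (arbitrary) time steps.
    """
    evidence, t = 0.0, 0
    while evidence < threshold and t < max_t:
        evidence += drift + rng.gauss(0.0, noise)
        t += 1
    return t

rng = random.Random(1)
trials = 2000
easy = sum(saccade_latency(0.02, rng) for _ in range(trials)) / trials
hard = sum(saccade_latency(0.005, rng) for _ in range(trials)) / trials
# A higher drift rate (e.g., a higher-contrast target) crosses threshold sooner,
# so `easy` is a shorter mean latency than `hard`.
```

The expected first-passage time scales roughly as threshold divided by drift, which is why sub-threshold contrast changes can shift reaction time even when they are not visible as such.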

stelaCSF: a unified model of contrast sensitivity as the function of spatio-temporal frequency, eccentricity, luminance and area

A contrast sensitivity function, or CSF, is a cornerstone of many visual models. It explains whether a contrast pattern is visible to the human eye. The existing CSFs typically account for a subset of relevant dimensions describing a stimulus, limiting the use of such functions to either static or foveal content but not both. In this paper, we propose a unified CSF, stelaCSF, which accounts for all major dimensions of the stimulus: spatial and temporal frequency, eccentricity, luminance, and area. To model the 5-dimensional space of contrast sensitivity, we combined data from 11 papers, each of which studied a subset of this space. While previously proposed CSFs were fitted to a single dataset, stelaCSF can predict the data from all these studies using the same set of parameters. The predictions are accurate in the entire domain, including low frequencies. In addition, stelaCSF relies on psychophysical models and experimental evidence to explain the major interactions between the 5 dimensions of the CSF. We demonstrate the utility of our new CSF in a flicker detection metric and in foveated rendering.

Dark stereo: improving depth perception under low luminance

It is often desirable or unavoidable to display Virtual Reality (VR) or stereoscopic content at low brightness. For example, a dimmer display reduces the flicker artefacts that are introduced by low-persistence VR headsets. It also saves power, prolongs battery life, and reduces the cost of a display or projection system. Additionally, stereo movies are usually displayed at relatively low luminance due to polarization filters or other optical elements necessary to separate two views. However, the binocular depth cues become less reliable at low luminance. In this paper, we propose a model of stereo constancy that predicts the precision of binocular depth cues for a given contrast and luminance. We use the model to design a novel contrast enhancement algorithm that compensates for the deteriorated depth perception to deliver good-quality stereoscopic images even for displays of very low brightness.

Perception of letter glyph parameters for InfoTypography

The advent of variable font technologies---where typographic parameters such as weight, x-height and slant are easily adjusted across a range---enables encoding ordinal, interval or ratio data into text that is still readable. This is potentially valuable to represent additional information in text labels in visualizations (e.g., font weight can indicate city size in a geographical visualization) or in text itself (e.g., the intended reading speed of a sentence can be encoded with the font width). However, we do not know how different parameters, which are complex variations of shape, are perceived by the human visual system. Without this information, it is difficult to select appropriate parameters and mapping functions that maximize the perception of differences within the parameter range. We provide an empirical characterization of seven typographic parameters of Latin fonts in terms of absolute perception and just-noticeable differences (JNDs). This characterization helps visualization designers choose typographic parameters for visualizations that contain text, and supports typographers and type designers in selecting which levels of these parameters to implement to achieve differentiability between normal text, emphasized text, and different headings.
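One way such a JND characterization is used downstream: given a Weber-style fraction for a parameter, distinguishable encoding levels can be generated by spacing values one JND apart. A minimal sketch under a Weber-law assumption, with a hypothetical fraction rather than one measured in the paper:

```python
def jnd_levels(p_min, p_max, weber_fraction):
    """Parameter levels spaced one just-noticeable difference apart, assuming
    Weber's law (each step is a fixed fraction of the current value).
    The Weber fraction is illustrative, not a measured value."""
    levels = [p_min]
    while levels[-1] * (1 + weber_fraction) <= p_max:
        levels.append(levels[-1] * (1 + weber_fraction))
    return levels

# e.g., font weight from 300 to 900 with a hypothetical 20% Weber fraction
# yields 7 mutually distinguishable steps:
weights = jnd_levels(300, 900, 0.20)
```

A designer would then map data categories onto these levels rather than onto evenly spaced weights, so that adjacent encodings remain perceptually separable.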

Face deblurring using dual camera fusion on mobile phones

Motion blur of fast-moving subjects is a longstanding problem in photography and very common on mobile phones due to limited light collection efficiency, particularly in low-light conditions. While we have witnessed great progress in image deblurring in recent years, most methods require significant computational power and have limitations in processing high-resolution photos with severe local motions. To this end, we develop a novel face deblurring system based on the dual camera fusion technique for mobile phones. The system detects subject motion to dynamically enable a reference camera, e.g., the ultrawide camera commonly available on recent premium phones, and captures an auxiliary photo with faster shutter settings. While the main shot is low noise but blurry (Figure 1(a)), the reference shot is sharp but noisy (Figure 1(b)). We learn ML models to align and fuse these two shots and output a clear photo without motion blur (Figure 1(c)). Our algorithm runs efficiently on Google Pixel 6, adding 463 ms of overhead per shot. Our experiments demonstrate the advantage and robustness of our system against alternative single-image, multi-frame, face-specific, and video deblurring algorithms as well as commercial products. To the best of our knowledge, our work is the first mobile solution for face motion deblurring that works reliably and robustly over thousands of images in diverse motion and lighting conditions.

SESSION: Computational design and fabrication

Computational design of passive grippers

This work proposes a novel generative design tool for passive grippers---robot end effectors that have no additional actuation and instead leverage the existing degrees of freedom in a robotic arm to perform grasping tasks. Passive grippers are used because they offer interesting trade-offs between cost and capabilities. However, existing designs are limited in the types of shapes that can be grasped. This work proposes to use rapid-manufacturing and design optimization to expand the space of shapes that can be passively grasped. Our novel generative design algorithm takes in an object and its positioning with respect to a robotic arm and generates a 3D printable passive gripper that can stably pick the object up. To achieve this, we address the key challenge of jointly optimizing the shape and the insert trajectory to ensure a passively stable grasp. We evaluate our method on a testing suite of 22 objects (23 experiments), all of which were evaluated with physical experiments to bridge the virtual-to-real gap. Code and data are at

Computational design of high-level interlocking puzzles

Interlocking puzzles are intriguing geometric games where the puzzle pieces are held together based on their geometric arrangement, preventing the puzzle from falling apart. High-level-of-difficulty, or simply high-level, interlocking puzzles are a subclass of interlocking puzzles that require multiple moves to take out the first subassembly from the puzzle. Solving a high-level interlocking puzzle is a challenging task since one has to explore many different configurations of the puzzle pieces until reaching a configuration where the first subassembly can be taken out. Designing a high-level interlocking puzzle with a user-specified level of difficulty is even harder since the puzzle pieces have to be interlocking in all the configurations before the first subassembly is taken out.

In this paper, we present a computational approach to design high-level interlocking puzzles. The core idea is to represent all possible configurations of an interlocking puzzle as well as transitions among these configurations using a rooted, undirected graph called a disassembly graph and leverage this graph to find a disassembly plan that requires a minimal number of moves to take out the first subassembly from the puzzle. At the design stage, our algorithm iteratively constructs the geometry of each puzzle piece to expand the disassembly graph incrementally, aiming to achieve a user-specified level of difficulty. We show that our approach allows efficient generation of high-level interlocking puzzles of various shape complexities, including new solutions not attainable by state-of-the-art approaches.
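The search for a disassembly plan with a minimal number of moves is a shortest-path computation on the disassembly graph. A minimal sketch, with a hand-made adjacency dictionary standing in for the paper's configuration graph:

```python
from collections import deque

def puzzle_level(graph, start, removable):
    """Minimal number of moves from the assembled state to any configuration
    where the first subassembly can be taken out (the puzzle's "level").

    `graph` maps configuration ids to neighboring configurations (one move
    apart); `removable` is the set of configurations admitting a removal move.
    Returns None if no removal configuration is reachable (deadlock)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in removable:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

# Toy example: 5 configurations in a chain; removal only possible from config 4,
# so this puzzle is level 4.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
level = puzzle_level(graph, start=0, removable={4})
```

The design stage inverts this: puzzle-piece geometry is grown so that the breadth-first distance from the assembled state to the nearest removal configuration matches the user-specified level.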

Mixed integer neural inverse design

In computational design and fabrication, neural networks are becoming important surrogates for bulky forward simulations. A long-standing, intertwined question is that of inverse design: how to compute a design that satisfies a desired target performance? Here, we show that the piecewise linear property, very common in everyday neural networks, allows for an inverse design formulation based on mixed-integer linear programming. Our mixed-integer inverse design uncovers globally optimal or near optimal solutions in a principled manner. Furthermore, our method significantly facilitates emerging, but challenging, combinatorial inverse design tasks, such as material selection. For problems where finding the optimal solution is intractable, we develop an efficient yet near-optimal hybrid approach. Finally, our method is able to find solutions provably robust to possible fabrication perturbations among multiple designs with similar performances. Our code and data are available at
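The connection between piecewise linearity and mixed-integer programming can be seen on a toy network: fixing each ReLU's on/off state makes the network affine in its input, so inverse design reduces to a search over binary activation patterns, which is precisely what a MILP solver branches on. The sketch below enumerates the patterns by brute force; the weights and target are invented for illustration:

```python
from itertools import product

# Tiny ReLU net: f(x) = w2[0]*relu(W1[0]*x + b1[0]) + w2[1]*relu(W1[1]*x + b1[1]) + b2
W1, b1, w2, b2 = [1.0, -1.0], [0.0, 1.0], [1.0, 1.0], 0.0

def forward(x):
    return sum(w * max(W * x + b, 0.0) for w, W, b in zip(w2, W1, b1)) + b2

def inverse_design(y_target, tol=1e-9):
    """Enumerate ReLU activation patterns; within each pattern f is affine in x,
    so candidate designs are solved exactly, then checked for consistency.
    (A brute-force stand-in for MILP branching over the same binary variables.)"""
    solutions = []
    for pattern in product([0, 1], repeat=len(W1)):
        slope = sum(a * w * W for a, w, W in zip(pattern, w2, W1))
        inter = sum(a * w * b for a, w, b in zip(pattern, w2, b1)) + b2
        if abs(slope) < tol:
            continue  # f is constant on this region; no unique solution
        x = (y_target - inter) / slope
        # the assumed pattern must agree with the actual pre-activations at x
        if all((W * x + b >= -tol) == bool(a) for a, W, b in zip(pattern, W1, b1)):
            solutions.append(x)
    return sorted(solutions)

# f(x) = relu(x) + relu(1 - x) equals 2 at both x = -1 and x = 2:
designs = inverse_design(2.0)
```

For realistic networks this enumeration is exponential, which is exactly why the paper delegates the branching over activation binaries to a MILP solver with global optimality guarantees.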

Umbrella meshes: elastic mechanisms for freeform shape deployment

We present a computational inverse design framework for a new class of volumetric deployable structures that have compact rest states and deploy into bending-active 3D target surfaces. Umbrella meshes consist of elastic beams, rigid plates, and hinge joints that can be directly printed or assembled in a zero-energy fabrication state. During deployment, as the elastic beams of varying heights rotate from vertical to horizontal configurations, the entire structure transforms from a compact block into a target curved surface. Umbrella Meshes encode both intrinsic and extrinsic curvature of the target surface and in principle are free from the area expansion ratio bounds of past auxetic material systems.

We build a reduced physics-based simulation framework to accurately and efficiently model the complex interaction between the elastically deforming components. To determine the mesh topology and optimal shape parameters for approximating a given target surface, we propose an inverse design optimization algorithm initialized with conformal flattening. Our algorithm minimizes the structure's strain energy in its deployed state and optimizes actuation forces so that the final deployed structure is in stable equilibrium close to the desired surface with few or no external constraints. We validate our approach by fabricating a series of physical models at various scales using different manufacturing techniques.

SESSION: Phenomenological animation

Filament based plasma

Simulation of stellar atmospheres, such as that of our own sun, is a common task in CGI for scientific visualization, movies and games. A fibrous volumetric texture is a visually dominant feature of the solar corona---the plasma that extends from the solar surface into space. These coronal fibers can be modeled as magnetic filaments whose shape is governed by the magnetohydrostatic equation. The magnetic filaments provide a Lagrangian curve representation and their initial configuration can be prescribed by an artist or generated from magnetic flux given as a scalar texture on the sun's surface. Subsequently, the shape of the filaments is determined based on a variational formulation. The output is a visual rendering of the whole sun. We demonstrate the fidelity of our method by comparing the resulting renderings with actual images of our sun's corona.

A moving eulerian-lagrangian particle method for thin film and foam simulation

We present the Moving Eulerian-Lagrangian Particles (MELP), a novel mesh-free method for simulating incompressible fluid on thin films and foams. Employing a bi-layer particle structure, MELP jointly simulates detailed, vigorous flow and large surface deformation at high stability and efficiency. In addition, we design multi-MELP: a mechanism that facilitates the physically-based interaction between multiple MELP systems, to simulate bubble clusters and foams with non-manifold topological evolution. We showcase the efficacy of our method with a broad range of challenging thin film phenomena, including the Rayleigh-Taylor instability across double-bubbles, foam fragmentation with rim surface tension, recovery of the Plateau borders, Newton black films, as well as cyclones on bubble clusters.

Ecoclimates: climate-response modeling of vegetation

One of the greatest challenges to mankind is understanding the underlying principles of climate change. In recent years, the role of forests in climate change has received increased attention. This is due to the observation that not only does the atmosphere have a principal impact on vegetation growth, but vegetation also contributes to local variations of weather, resulting in diverse microclimates. The interconnection of plant ecosystems and weather is described and studied as ecoclimates. In this work we take steps towards simulating ecoclimates by modeling the feedback loops between vegetation, soil, and atmosphere. In contrast to existing methods that only describe the climate at a global scale, our model aims at simulating local variations of climate. Specifically, we model tree growth interactively in response to gradients of water, temperature and light. As a result, we are able to capture a range of ecoclimate phenomena that have not been modeled before, including geomorphic controls, forest edge effects, the Foehn effect and spatial vegetation patterning. To validate the plausibility of our method we conduct a comparative analysis to studies from ecology and climatology. Consequently, our method advances the state-of-the-art of generating highly realistic outdoor landscapes of vegetation.

Unified many-worlds browsing of arbitrary physics-based animations

Manually tuning physics-based animation parameters to explore a simulation outcome space or achieve desired motion outcomes can be notoriously tedious. This problem has motivated many sophisticated and specialized optimization-based methods for fine-grained (keyframe) control, each of which is typically limited to specific animation phenomena, usually complicated, and, unfortunately, not widely used.

In this paper, we propose Unified Many-Worlds Browsing (UMWB), a practical method for sample-level control and exploration of physics-based animations. Our approach supports browsing of large simulation ensembles of arbitrary animation phenomena by using a unified volumetric WORLDPACK representation based on spatiotemporally compressed voxel data associated with geometric occupancy and other low-fidelity animation state. Beyond memory reduction, the WORLDPACK representation also enables unified query support for interactive browsing: it provides fast evaluation of approximate spatiotemporal queries, such as occupancy tests that find ensemble samples ("worlds") where material is either IN or NOT IN a user-specified spacetime region. WORLDPACKS also support real-time hardware-accelerated voxel rendering by exploiting the spatially hierarchical and temporal RLE raster data structure. Our UMWB implementation supports interactive browsing (and offline refinement) of ensembles containing thousands of simulation samples, and fast spatiotemporal queries and ranking. We show UMWB results using a wide variety of physics-based animation phenomena---not just JELL-O®.
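The temporal RLE idea behind the WORLDPACK representation can be sketched for a single voxel's occupancy series (the actual representation is spatially hierarchical and packs many voxels; the function names here are illustrative):

```python
def rle_encode(frames):
    """Temporal run-length encoding of one voxel's occupancy over time:
    a list of (value, run_length) pairs. A toy analogue of the temporal
    RLE raster used by WORLDPACKS."""
    runs = []
    value, run = frames[0], 1
    for v in frames[1:]:
        if v == value:
            run += 1
        else:
            runs.append((value, run))
            value, run = v, 1
    runs.append((value, run))
    return runs

def occupied_during(runs, t0, t1):
    """Spacetime query: was the voxel ever occupied in the interval [t0, t1)?
    Answered directly on the compressed runs, without decompressing."""
    t = 0
    for value, run in runs:
        if value and t < t1 and t + run > t0:
            return True
        t += run
    return False

series = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
runs = rle_encode(series)   # [(0, 2), (1, 3), (0, 3), (1, 1), (0, 1)]
```

Queries of this IN / NOT IN form, evaluated per sample over the compressed data, are what let the browser rank thousands of simulated "worlds" against a user-specified spacetime region interactively.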

SESSION: Cloth and patterns

Computational pattern making from 3D garment models

We propose a method for computing a sewing pattern of a given 3D garment model. Our algorithm segments an input 3D garment shape into patches and computes their 2D parameterization, resulting in pattern pieces that can be cut out of fabric and sewn together to manufacture the garment. Unlike the general state-of-the-art approaches for surface cutting and flattening, our method explicitly targets garment fabrication. It accounts for the unique properties and constraints of tailoring, such as seam symmetry, the usage of darts, fabric grain alignment, and a flattening distortion measure that models woven fabric deformation, respecting its anisotropic behavior. We bootstrap a recent patch layout approach developed for quadrilateral remeshing and adapt it to the purpose of computational pattern making, ensuring that the deformation of each pattern piece stays within prescribed bounds of cloth stress. While our algorithm can automatically produce the sewing patterns, it is fast enough to admit user input to creatively iterate on the pattern design. Our method can take several target poses of the 3D garment into account and integrate them into the sewing pattern design. We demonstrate results on both skintight and loose garments, showcasing the versatile application possibilities of our approach.

NeuralTailor: reconstructing sewing pattern structures from 3D point clouds of garments

The fields of SocialVR, performance capture, and virtual try-on are often faced with the need to faithfully reproduce real garments in the virtual world. One critical task is the disentanglement of the intrinsic garment shape from deformations due to fabric properties, physical forces, and contact with the body. We propose to use a garment sewing pattern, a realistic and compact garment descriptor, to facilitate intrinsic garment shape estimation. Another major challenge is the high diversity of shapes and designs in the domain. The most common approach for deep learning on 3D garments is to build specialized models for individual garments or garment types. We argue that building a unified model for various garment designs has the benefit of generalization to novel garment types, hence covering a larger design domain than individual models would. We introduce NeuralTailor, a novel architecture based on point-level attention for set regression with variable cardinality, and apply it to the task of reconstructing 2D garment sewing patterns from 3D point clouds of garment models. Our experiments show that NeuralTailor successfully reconstructs sewing patterns and generalizes to garment types with pattern topologies unseen during training.

Clustered vector textures

Repetitive vector patterns are common in a variety of applications but can be challenging and tedious to create. Existing automatic synthesis methods target relatively simple, unstructured patterns such as discrete elements and continuous Bézier curves. This paper proposes an algorithm for generating vector patterns with diverse shapes and structured local interactions via a sample-based representation. Our main idea is adding explicit clustering as part of neighborhood similarity and iterative sample optimization for more robust sample synthesis and pattern reconstruction. The results indicate that our method can outperform existing methods on synthesizing a variety of structured vector textures. Our project page is available at
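The clustering idea described above can be illustrated with a minimal sketch: group pattern samples by a local descriptor (here simply position) with a two-cluster k-means, then restrict neighborhood matching to samples of the same cluster. The function names and the descriptor choice are ours, not the paper's.

```python
import numpy as np

def kmeans2(pts, iters=20):
    """Two-cluster k-means with farthest-point initialization (deterministic)."""
    c0 = pts[0]
    c1 = pts[np.argmax(((pts - c0) ** 2).sum(-1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        labels = np.argmin(((pts[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([pts[labels == j].mean(axis=0) for j in (0, 1)])
    return labels, centers

def matched_neighbor(query, label, pts, labels):
    """Index of the nearest sample that shares the query's cluster label."""
    same = np.flatnonzero(labels == label)
    d = ((pts[same] - query) ** 2).sum(-1)
    return int(same[np.argmin(d)])

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, _ = kmeans2(pts)
# samples 0, 1 land in one cluster and 2, 3 in the other
```

Restricting similarity comparisons to within-cluster samples is one way to keep structurally different pattern elements from being matched against each other during synthesis.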

As-locally-uniform-as-possible reshaping of vector clip-art

Vector clip-art images consist of regions bounded by a network of vector curves. Users often wish to reshape, or rescale, existing clip-art images by changing the locations, proportions, or scales of different image elements. When reshaping images depicting synthetic content they seek to preserve global and local structures. These structures are best preserved when the gradient of the mapping between the original and the reshaped curve networks is locally as close as possible to a uniform scale; mappings that satisfy this property maximally preserve the input curve orientations and minimally change the shape of the input's geometric details, while allowing changes in the relative scales of the different features. The expectation of approximate scale uniformity is local; while reshaping operations are typically expected to change the relative proportions of a subset of network regions, users expect the change to be minimal away from the directly impacted regions and expect such changes to be gradual and distributed as evenly as possible. Unfortunately, existing methods for editing 2D curve networks do not satisfy these criteria. We propose a targeted As-Locally-Uniform-as-Possible (ALUP) vector clip-art reshaping method that satisfies the properties above. We formulate the computation of the desired output network as the solution of a constrained variational optimization problem. We effectively compute the desired solution by casting this continuous problem as a minimization of a non-linear discrete energy function, and obtain the desired minimizer by using a custom iterative solver. We validate our method via perceptual studies comparing our results to those created via algorithmic alternatives and manually generated ones. Participants preferred our results over the closest alternative by a ratio of 6 to 1.
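The "locally close to a uniform scale" criterion above can be written as a per-element energy: for a 2x2 mapping gradient F, measure its Frobenius distance to the closest similarity transform (a rotation times a uniform scale). This is our illustration of the stated objective under that assumption, not the paper's exact energy or solver.

```python
import numpy as np

def closest_similarity(F):
    """Least-squares fit of a similarity matrix [[p, -q], [q, p]] to a 2x2 F."""
    a, b, c, d = F.ravel()
    p, q = (a + d) / 2.0, (c - b) / 2.0
    return np.array([[p, -q], [q, p]])

def alup_energy(F):
    """Squared Frobenius distance of F from the nearest uniform-scale rotation."""
    return float(np.linalg.norm(F - closest_similarity(F)) ** 2)

uniform = 1.5 * np.eye(2)                        # pure uniform scale
shear = np.array([[1.0, 0.5], [0.0, 1.0]])        # shear distorts orientations
# uniform scaling costs nothing; shear is penalized
```

Summing such a term over the elements of a discretized curve network, plus user constraints, yields exactly the kind of non-linear discrete energy that an iterative solver can minimize.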

SESSION: Neural pets, people and avatars

AvatarCLIP: zero-shot text-driven generation and animation of 3D avatars

3D avatar creation plays a crucial role in the digital age. However, the whole production process is prohibitively time-consuming and labor-intensive. To democratize this technology for a larger audience, we propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers lay users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with described motions, using natural language alone. Our key insight is to take advantage of the powerful vision-language model CLIP to supervise neural human generation in terms of 3D geometry, texture, and animation. Specifically, driven by natural language descriptions, we initialize 3D human geometry generation with a shape VAE network. Based on the generated 3D human shapes, a volume rendering model is utilized to further facilitate geometry sculpting and texture generation. Moreover, by leveraging the priors learned in the motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of AvatarCLIP on a wide range of avatars. Remarkably, AvatarCLIP can generate unseen 3D avatars with novel animations, achieving superior zero-shot capability. Code is available at

Text2Human: text-driven controllable human image generation

Generating high-quality and diverse human images is an important yet challenging task in vision and graphics. However, existing generative models often fall short under the high diversity of clothing shapes and textures. Furthermore, the generation process should ideally be intuitively controllable for lay users. In this work, we present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation. We synthesize full-body human images starting from a given human pose in two dedicated steps. 1) Given text describing the shapes of clothes, the human pose is first translated to a human parsing map. 2) The final human image is then generated by providing the system with more attributes about the textures of clothes. Specifically, to model the diversity of clothing textures, we build a hierarchical texture-aware codebook that stores multi-scale neural representations for each type of texture. The codebook at the coarse level captures the structural representations of textures, while the codebook at the fine level focuses on texture details. To use the learned hierarchical codebook to synthesize desired images, a diffusion-based transformer sampler with a mixture of experts is first employed to sample indices from the coarsest level of the codebook, which are then used to predict the indices of the codebook at finer levels. The predicted indices at different levels are translated to human images by a decoder learned jointly with the hierarchical codebooks. The mixture of experts allows the generated image to be conditioned on the fine-grained text input, and the prediction of finer-level indices refines the quality of clothing textures. Extensive quantitative and qualitative evaluations demonstrate that our proposed Text2Human framework can generate more diverse and realistic human images compared to state-of-the-art methods. Code and pretrained models are available on our project page.
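The coarse-to-fine codebook lookup described above can be sketched with a toy vector quantizer, where a patch is first quantized against a coarse codebook and the chosen coarse index selects which fine-level codes may refine it. The codebooks, sizes, and names here are our own toy values, not the Text2Human model.

```python
import numpy as np

def quantize(x, codebook):
    """Index of the nearest code vector to x (squared Euclidean distance)."""
    return int(np.argmin(((codebook - x) ** 2).sum(-1)))

coarse = np.array([[0.0, 0.0], [1.0, 1.0]])          # structural codes
fine = {0: np.array([[0.0, 0.1], [0.1, 0.0]]),       # detail codes per coarse code
        1: np.array([[0.9, 1.0], [1.0, 0.9]])}

patch = np.array([0.95, 0.92])
ci = quantize(patch, coarse)      # coarse structure index
fi = quantize(patch, fine[ci])    # fine detail index within that structure
recon = fine[ci][fi]              # quantized reconstruction of the patch
```

Conditioning the fine lookup on the coarse index is what makes the hierarchy useful: structure is decided first, and only details consistent with that structure are considered afterward.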

Authentic volumetric avatars from a phone scan

Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be specialized to novel identities using only small amounts of data. The model dispenses with low-dimensional latent spaces that are commonly employed for hallucinating novel identities, and instead uses a conditional representation that can extract person-specific information at multiple scales from a high-resolution registered neutral phone scan. We achieve high-quality results through the use of a novel universal avatar prior that has been trained on high-resolution multi-view video captures of facial performances of hundreds of human subjects. By fine-tuning the model using inverse rendering, we achieve increased realism and personalize its range of motion. The output of our approach is not only a high-fidelity 3D head avatar that matches the person's facial shape and appearance, but one that can also be driven using a jointly discovered shared global expression space with disentangled controls for gaze direction. Via a series of experiments, we demonstrate that our avatars are faithful representations of the subject's likeness. Compared to other state-of-the-art methods for lightweight avatar creation, our approach exhibits superior visual quality and animatability.

Artemis: articulated neural pets with appearance and motion synthesis

As humans enter a virtual era, we want to bring animals into the virtual world as companions as well. Yet computer-generated imagery (CGI) of furry animals is limited by tedious offline rendering, let alone interactive motion control. In this paper, we present ARTEMIS, a novel neural modeling and rendering pipeline for generating ARTiculated neural pets with appEarance and Motion synthesIS. ARTEMIS enables interactive motion control, real-time animation, and photo-realistic rendering of furry animals. The core of ARTEMIS is a neural-generated (NGI) animal engine, which adopts an efficient octree-based representation for animal animation and fur rendering. Animation then becomes equivalent to voxel-level deformation based on explicit skeletal warping. We further use fast octree indexing and an efficient volumetric rendering scheme to generate appearance and density feature maps. Finally, we propose a novel shading network that generates high-fidelity details of appearance and opacity under novel poses from the appearance and density feature maps. For the motion control module in ARTEMIS, we combine a state-of-the-art animal motion capture approach with a recent neural character control scheme. We introduce an effective optimization scheme to reconstruct the skeletal motion of real animals captured by a multi-view RGB and Vicon camera array, and feed the captured motion into a neural character control scheme to generate abstract control signals with motion styles. We further integrate ARTEMIS into existing engines that support VR headsets, providing an unprecedented immersive experience in which a user can intimately interact with a variety of virtual animals with vivid movements and photo-realistic appearance. Extensive experiments and showcases demonstrate the effectiveness of ARTEMIS in achieving highly realistic real-time rendering of NGI animals, enabling immersive and interactive experiences with digital animals unseen before.
We make available our ARTEMIS model and dynamic furry animal dataset at

SESSION: Faces and facial animation

Facial hair tracking for high fidelity performance capture

Facial hair is a largely overlooked topic in facial performance capture. Most production pipelines in the entertainment industry do not have a way to automatically capture facial hair or track the skin underneath it. Thus, actors are asked to shave clean before face capture, which is very often undesirable. Capturing the geometry of individual facial hairs is very challenging, and their presence makes it harder to capture the deforming shape of the underlying skin surface. Some attempts have already been made at automating this task, but only for static faces with relatively sparse 3D hair reconstructions. In particular, current methods lack the temporal correspondence needed when capturing a sequence of video frames depicting facial performance. The problem of robustly tracking the skin underneath also remains unaddressed. In this paper, we propose the first multiview reconstruction pipeline that tracks both the dense 3D facial hair, as well as the underlying 3D skin for entire performances. Our method operates with standard setups for face photogrammetry, without requiring dense camera arrays. For a given capture subject, our algorithm first reconstructs a dense, high-quality neutral 3D facial hairstyle by registering sparser hair reconstructions over multiple frames that depict a neutral face under quasi-rigid motion. This custom-built, reference facial hairstyle is then tracked throughout a variety of changing facial expressions in a captured performance, and the result is used to constrain the tracking of the 3D skin surface underneath. We demonstrate the proposed capture pipeline on a variety of different facial hairstyles and lengths, ranging from sparse and short to dense full-beards.

EyeNeRF: a hybrid representation for photorealistic synthesis, animation and relighting of human eyes

A unique challenge in creating high-quality animatable and relightable 3D avatars of real people is modeling human eyes, particularly in conjunction with the surrounding periocular face region. The challenge of synthesizing eyes is multifold as it requires 1) appropriate representations for the various components of the eye and the periocular region for coherent viewpoint synthesis, capable of representing diffuse, refractive and highly reflective surfaces, 2) disentangling skin and eye appearance from environmental illumination such that it may be rendered under novel lighting conditions, and 3) capturing eyeball motion and the deformation of the surrounding skin to enable re-gazing.

These challenges have traditionally necessitated the use of expensive and cumbersome capture setups to obtain high-quality results, and even then, modeling the full eye region holistically has remained elusive. We present a novel geometry and appearance representation that enables high-fidelity capture and photorealistic animation, view synthesis, and relighting of the eye region using only a sparse set of lights and cameras. Our hybrid representation combines an explicit parametric surface model for the eyeball surface with implicit deformable volumetric representations for the periocular region and the interior of the eye. This novel hybrid model has been designed specifically to address the various parts of that exceptionally challenging facial area: the explicit eyeball surface allows modeling refraction and high-frequency specular reflection at the cornea, whereas the implicit representation is well suited to modeling lower-frequency skin reflection via spherical harmonics and can represent non-surface structures such as hair (e.g., eyebrows) or highly diffuse volumetric bodies (e.g., the sclera), both of which are a challenge for explicit surface models. Tightly integrating the two representations in a joint framework allows controlled photoreal image synthesis and joint optimization of both the geometry parameters of the eyeball and the implicit neural network in continuous 3D space. We show that for high-resolution close-ups of the human eye, our model can synthesize high-fidelity animated gaze from novel views under unseen illumination conditions, generating visually rich eye imagery.

DeepFaceVideoEditing: sketch-based deep editing of face videos

Sketches, being simple and concise, have been used in recent deep image synthesis methods to allow intuitive generation and editing of facial images. However, it is nontrivial to extend such methods to video editing due to various challenges, from propagating manipulations appropriately and fusing multiple editing operations to ensuring temporal coherence and visual quality. To address these issues, we propose a novel sketch-based facial video editing framework, in which we represent editing manipulations in latent space and propose dedicated propagation and fusion modules to generate high-quality video editing results based on StyleGAN3. Specifically, we first design an optimization approach that represents sketch editing manipulations as editing vectors, which are propagated to the whole video sequence using a strategy suited to the editing need. Input editing operations are classified into two categories: temporally consistent editing and temporally variant editing. The former (e.g., a change of face shape) is applied to the whole video sequence directly, while the latter (e.g., a change of facial expression or dynamics) is either propagated with the guidance of expression or affects only adjacent frames within a given time window. Since users often perform different editing operations in multiple frames, we further present a region-aware fusion approach to fuse diverse editing effects. Our method supports sketch-based video editing of facial structure and expression movement, which previous works cannot achieve. Both qualitative and quantitative evaluations show the superior editing ability of our system over existing and alternative solutions.
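The two propagation modes described above can be illustrated with a toy weighting scheme: a temporally consistent edit vector is applied with full weight to every frame, while a temporally variant edit is faded with a Gaussian window around the edited frame. The function names and the Gaussian window shape are our assumptions, not the paper's exact strategy.

```python
import math

def propagate(n_frames, edit_frame, mode, sigma=2.0):
    """Per-frame weights for an editing vector.
    'consistent' edits apply everywhere; 'variant' edits fade with
    a Gaussian window (hypothetical choice) around the edited frame."""
    if mode == "consistent":
        return [1.0] * n_frames
    return [math.exp(-((t - edit_frame) ** 2) / (2 * sigma ** 2))
            for t in range(n_frames)]

def edited_latent(base, edit_vec, weight):
    """Apply a weighted editing vector to one frame's latent code."""
    return [b + weight * e for b, e in zip(base, edit_vec)]

w = propagate(9, edit_frame=4, mode="variant")
# weight peaks at the edited frame and decays for distant frames
```

Per-frame latents shifted this way would then be decoded by the generator, with a fusion step reconciling overlapping edits from different frames.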

Local anatomically-constrained facial performance retargeting

Generating realistic facial animation for CG characters and digital doubles is one of the hardest tasks in animation. A typical production workflow involves capturing the performance of a real actor using mo-cap technology and transferring the captured motion to the target digital character. This process, known as retargeting, has been used for over a decade and typically relies on either large blendshape rigs that are expensive to create or direct deformation transfer algorithms that operate on individual geometric elements and are prone to artifacts. We present a new method for high-fidelity offline facial performance retargeting that is neither expensive nor artifact-prone. Our two-step method first transfers local expression details to the target, followed by a global face surface prediction that uses anatomical constraints to stay in the feasible shape space of the target character. Our method also offers artists familiar blendshape controls for fine adjustments to the retargeted animation. As such, our method is ideally suited for the complex task of human-to-human 3D facial performance retargeting, where the quality bar is extremely high in order to avoid the uncanny valley, while also being applicable to more common human-to-creature settings. We demonstrate the superior performance of our method over traditional deformation transfer algorithms, achieving quality comparable to current blendshape-based techniques used in production while requiring significantly fewer input shapes at setup time. A detailed user study corroborates the realistic and artifact-free animations generated by our method in comparison to existing techniques.