ACM Transactions on Graphics (TOG): Vol. 42, No. 4. 2023


SESSION: Geometric Optimization

Winding Numbers on Discrete Surfaces

In the plane, the winding number is the number of times a curve wraps around a given point. Winding numbers are a basic component of geometric algorithms such as point-in-polygon tests, and their generalization to data with noise or topological errors has proven valuable for geometry processing tasks ranging from surface reconstruction to mesh booleans. However, standard definitions do not immediately apply on surfaces, where not all curves bound regions. We develop a meaningful generalization, starting with the well-known relationship between winding numbers and harmonic functions. By processing the derivatives of such functions, we can robustly filter out components of the input that do not bound any region. Ultimately, our algorithm yields (i) a closed, completed version of the input curves, (ii) integer labels for regions that are meaningfully bounded by these curves, and (iii) the complementary curves that do not bound any region. The main computational cost is solving a standard Poisson equation, or for surfaces with nontrivial topology, a sparse linear program. We also introduce special basis functions to represent singularities that naturally occur at endpoints of open curves.
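As a concrete planar example, the winding number of a closed polygonal curve around a point can be computed by summing the signed angles its edges subtend at that point; a point-in-polygon test then just checks for a nonzero result. A minimal sketch of that planar definition only (not the paper's surface-domain algorithm, which works with harmonic functions and a Poisson solve):

```python
import math

def winding_number(polygon, point):
    """Sum the signed angles subtended at `point` by each polygon edge.

    The total is 2*pi times the winding number: how many times the
    closed curve wraps counterclockwise around the point.
    """
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Signed angle from (point -> vertex i) to (point -> vertex i+1).
        a1 = math.atan2(y1 - py, x1 - px)
        a2 = math.atan2(y2 - py, x2 - px)
        da = a2 - a1
        # Wrap into (-pi, pi] so each edge contributes its true turn.
        while da > math.pi:
            da -= 2 * math.pi
        while da <= -math.pi:
            da += 2 * math.pi
        total += da
    return round(total / (2 * math.pi))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
inside = winding_number(square, (1, 1))    # 1: the curve wraps once
outside = winding_number(square, (3, 3))   # 0: point not enclosed
```

A nonzero winding number means the point is enclosed; on a surface, as the abstract notes, not every curve bounds a region, which is exactly the gap the paper's generalization fills.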

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization

This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics. Existing implementations adapt classic isosurface extraction algorithms like Marching Cubes or Dual Contouring; these techniques were designed to extract meshes from fixed, known fields, and in the optimization setting they lack the degrees of freedom to represent high-quality feature-preserving meshes, or suffer from numerical instabilities. We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives. Our main insight is to introduce additional carefully-chosen parameters into the representation, which allow local flexible adjustments to the extracted mesh geometry and connectivity. These parameters are updated along with the underlying scalar field via automatic differentiation when optimizing for a downstream task. We base our extraction scheme on Dual Marching Cubes for improved topological properties, and present extensions to optionally generate tetrahedral and hierarchically-adaptive meshes. Extensive experiments validate FlexiCubes on both synthetic benchmarks and real-world applications, showing that it offers significant improvements in mesh quality and geometric fidelity.
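For context, classic isosurface extraction places each mesh vertex at the linearly interpolated zero crossing of the scalar field along a grid edge, and that crossing is differentiable in the field values, which is what makes gradient-based mesh optimization possible at all. A 1D illustration of this standard edge interpolation (not FlexiCubes' extended parameterization):

```python
def edge_crossing(x0, x1, s0, s1):
    """Linearly interpolated zero crossing of a scalar field along an edge.

    x0, x1: edge endpoint positions (1D for simplicity)
    s0, s1: signed field values at those endpoints (opposite signs)
    The returned position is differentiable in s0 and s1, so gradients
    on mesh vertices can flow back to the underlying scalar field.
    """
    t = s0 / (s0 - s1)          # fraction of the way from x0 to x1
    return x0 + t * (x1 - x0)

# Field values -1 at x=0 and +3 at x=4: the zero crossing sits at x=1.
v = edge_crossing(0.0, 4.0, -1.0, 3.0)   # 1.0
```

FlexiCubes' contribution, per the abstract, is to add further carefully chosen per-cell parameters on top of such a scheme so the optimizer can also adjust local geometry and connectivity.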

Topology-driven approximation to rational surface-surface intersection via interval algebraic topology analysis

Computing the intersection between two parametric surfaces (SSI) is one of the most fundamental problems in geometric and solid modeling. Maintaining the SSI topology is critical to the robustness of its computation. We propose a topology-driven hybrid symbolic-numeric framework to approximate rational parametric surface-surface intersection (SSI) based on the concept of interval algebraic topology analysis (IATA), which configures the SSI topology within a 4D interval box. Within this framework, we map the SSI topology to the solutions of an algebraic system and classify and enumerate all topological cases as mixtures of four fundamental cases (or their specific sub-cases). Various complicated topological situations are covered, such as cusp points or curves, tangent points (isolated or not) or curves, tiny loops, self-intersections, or their mixtures. The theoretical formulation is implemented numerically using advanced real-solution isolation techniques and computed within a topology-driven framework that maximally utilizes the advantages of the topology maintenance of algebraic analysis, the robustness of iterative subdivision, and the efficiency of forward marching. The approach demonstrates improved robustness on benchmark topological cases when compared with available open-source and commercial solutions, including IRIT, SISL, and Parasolid.

SESSION: A Material World

A Practical Wave Optics Reflection Model for Hair and Fur

Traditional fiber scattering models, based on ray optics, are missing some important visual aspects of fiber appearance. Previous work [Xia et al. 2020] on wave scattering from ideal extrusions demonstrated that diffraction produces strong forward scattering and colorful effects that are missing from ray-based models. However, that work was unable to include some important surface characteristics such as surface roughness and tilted cuticle scales, which are known to be important for fiber appearance. In this work, we take an important step toward studying wave effects from rough fibers with arbitrary 3D microgeometry. While the full-wave simulation of realistic 3D fibers remains intractable, we develop a 3D wave optics simulator based on a physical optics approximation, using a GPU-based hierarchical algorithm to greatly accelerate the calculation. It simulates surface reflection and diffractive scattering, which are present in all fibers and typically dominate for darkly pigmented fibers. The simulation provides a detailed picture of first-order scattering, but it is not practical to use for production rendering, as this would require tabulation per fiber geometry. To practically handle geometry variations in the scene, we propose a model based on wavelet noise, capturing the important statistical features in the simulation results that are relevant for rendering. Both our simulation and practical model show granular patterns similar to those observed in optical measurements. Our compact noise model can be easily combined with existing scattering models to render hair and fur of various colors, introducing visually important colorful glints that were missing from all previous models.

SESSION: Motion Recipes and Simulation

Anatomically Detailed Simulation of Human Torso

Many existing digital human models approximate the human skeletal system using rigid bodies connected by rotational joints. While the simplification is considered acceptable for legs and arms, it significantly lacks fidelity for modeling rich torso movements in common activities such as dancing, yoga, and various sports. Research from biomechanics provides more detailed modeling for parts of the torso, but such models often operate in isolation and are not fast and robust enough to support computationally heavy applications and large-scale data generation for full-body digital humans. This paper proposes a new torso model that aims to achieve high fidelity both in perception and in functionality, while remaining computationally feasible for simulation and optimal control tasks. We build a detailed human torso model consisting of various anatomical components, including facet joints, ligaments, and intervertebral discs, by coupling efficient finite-element and rigid-body simulations. Given an existing motion capture sequence without dense markers placed on the torso, our new model is able to recover the underlying torso bone movements. Our method is so robust that it can be used to automatically "retrofit" the entire Mixamo motion database of highly diverse human motions without user intervention. We also show that our model is computationally efficient for solving trajectory optimization of highly dynamic full-body movements, without relying on any reference motion. The physiological validity of the model is verified against established literature.

HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation

Significant advancements have been made in developing parametric models for digital humans, with various approaches concentrating on parts such as the human body, hand, or face. Nevertheless, connectors such as the neck have been overlooked in these models, leaving rich anatomical priors unutilized. In this paper, we introduce HACK (Head-And-neCK), a novel parametric model for constructing the head and cervical region of digital humans. Our model seeks to disentangle the full spectrum of neck and larynx motions, facial expressions, and appearance variations, providing personalized and anatomically consistent controls, particularly for the neck regions. To build our HACK model, we acquire a comprehensive multi-modal dataset of the head and neck under various facial expressions. We employ a 3D ultrasound imaging scheme to extract the inner biomechanical structures, namely the precise 3D rotation information of the seven vertebrae of the cervical spine. We then adopt a multi-view photometric approach to capture the geometry and physically-based textures of diverse subjects, who exhibit a diverse range of static expressions as well as sequential head-and-neck movements. Using the multi-modal dataset, we train the parametric HACK model by separating the 3D head and neck depiction into various shape, pose, expression, and larynx blendshapes from the neutral expression and the rest skeletal pose. We adopt an anatomically consistent skeletal design for the cervical region, and the expression is linked to facial action units for artist-friendly controls. We also propose to optimize the mapping from the identity shape space to the PCA spaces of personalized blendshapes to augment the pose and expression blendshapes, providing personalized properties within the framework of the generic model.
Furthermore, we use larynx blendshapes to accurately control the larynx deformation and force the larynx slicing motions along the vertical direction in the UV-space for precise modeling of the larynx beneath the neck skin. HACK addresses the head and neck as a unified entity, offering more accurate and expressive controls, with a new level of realism, particularly for the neck regions. This approach has significant benefits for numerous applications, including geometric fitting and animation, and enables inter-correlation analysis between head and neck for fine-grained motion synthesis and transfer.

SESSION: Character Animation: Knowing What To Do With Your Hands

GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents

The automatic generation of stylized co-speech gestures has recently received increasing attention. Previous systems typically allow style control via predefined text labels or example motion clips, which are often not flexible enough to convey user intent accurately. In this work, we present GestureDiffuCLIP, a neural network framework for synthesizing realistic, stylized co-speech gestures with flexible style control. We leverage the power of the large-scale Contrastive Language-Image Pre-training (CLIP) model and present a novel CLIP-guided mechanism that extracts efficient style representations from multiple input modalities, such as a piece of text, an example motion clip, or a video. Our system learns a latent diffusion model to generate high-quality gestures and infuses the CLIP representation of style into the generator via an adaptive instance normalization (AdaIN) layer. We further devise a gesture-transcript alignment mechanism, based on contrastive learning, that ensures semantically correct gesture generation. Our system can also be extended to allow fine-grained style control of individual body parts. We demonstrate an extensive set of examples showing the flexibility and generalizability of our model across a variety of style descriptions. In a user study, we show that our system outperforms state-of-the-art approaches in terms of human likeness, appropriateness, and style correctness.

BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer

Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games, and the Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of the rich cross-modal datasets needed for training. In this paper, we propose a novel transformer-based framework for automatic 3D body gesture synthesis from speech. To capture the stochastic nature of body gestures during speech, we propose a variational transformer that effectively models a probabilistic distribution over gestures and can produce diverse gestures during inference. Furthermore, we introduce a mode positional embedding layer to capture the different motion speeds of different speaking modes. To cope with the scarcity of data, we design an intra-modal pre-training scheme that can learn the complex mapping between speech and 3D gestures from a limited amount of data. Our system is trained with either the Trinity speech-gesture dataset or the Talking With Hands 16.2M dataset. The results show that our system can produce more realistic, appropriate, and diverse body gestures than existing state-of-the-art approaches.

Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing and co-speech gesticulation, since motion is complex and highly ambiguous given audio, calling for a probabilistic description. Specifically, we adapt the DiffWave architecture to model 3D pose sequences, putting Conformers in place of dilated convolutions for improved modelling power. We also demonstrate control over motion style, using classifier-free guidance to adjust the strength of the stylistic expression. Experiments on gesture and dance generation confirm that the proposed method achieves top-of-the-line motion quality, with distinctive styles whose expression can be made more or less pronounced. We also synthesise path-driven locomotion using the same model architecture. Finally, we generalise the guidance procedure to obtain product-of-expert ensembles of diffusion models and demonstrate how these may be used for, e.g., style interpolation, a contribution we believe is of independent interest.
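The classifier-free guidance used here to adjust stylistic strength follows a standard recipe: blend the conditional and unconditional noise predictions and scale their difference. A minimal sketch of that standard formulation (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def cfg_denoise(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance: interpolate/extrapolate between the
    unconditional and conditional noise predictions.

    guidance_scale = 1 recovers the conditional model; larger values
    exaggerate the conditioning (e.g. a stronger stylistic expression),
    while values below 1 mute it.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_c = np.array([1.0, 0.0])   # toy conditional prediction
eps_u = np.array([0.5, 0.0])   # toy unconditional prediction
out = cfg_denoise(eps_c, eps_u, 2.0)   # [1.5, 0.0]
```

The abstract's product-of-experts extension generalizes this idea by combining the predictions of several diffusion models rather than just a conditional/unconditional pair.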

Contact Edit: Artist Tools for Intuitive Modeling of Hand-Object Interactions

Posing high-contact interactions is challenging and time-consuming, with hand-object interactions being especially difficult due to the large number of degrees of freedom (DOF) of the hand and the fact that humans are experts at judging hand poses. This paper addresses this challenge by elevating contact areas to first-class primitives. We provide end-to-end art-directable (EAD) tools to model interactions based on contact areas, directly manipulate contact areas, and compute corresponding poses automatically. To make these operations intuitive and fast, we present a novel axis-based contact model that supports real-time approximately isometry-preserving operations on triangulated surfaces, permits movement between surfaces, and is both robust and scalable to large areas. We show that use of our contact model facilitates high quality posing even for unconstrained, high-DOF custom rigs intended for traditional keyframe-based animation pipelines. We additionally evaluate our approach with comparisons to prior art, ablation studies, user studies, qualitative assessments, and extensions to full-body interaction.

SESSION: Image and Video Editing

Eventfulness for Interactive Video Alignment

Humans are remarkably sensitive to the alignment of visual events with other stimuli, which makes synchronization one of the hardest tasks in video editing. A key observation of our work is that most of the alignment we do involves salient localizable events that occur sparsely in time. By learning how to recognize these events, we can greatly reduce the space of possible synchronizations that an editor or algorithm has to consider. Furthermore, by learning descriptors of these events that capture additional properties of visible motion, we can build active tools that adapt their notion of eventfulness to a given task as they are being used. Rather than learning an automatic solution to one specific problem, our goal is to make a much broader class of interactive alignment tasks significantly easier and less time-consuming. We show that a suitable visual event descriptor can be learned entirely from stochastically-generated synthetic video. We then demonstrate the usefulness of learned and adaptive eventfulness by integrating it in novel interactive tools for applications including audio-driven time warping of video and the extraction and application of sound effects across different videos.

FactorMatte: Redefining Video Matting for Re-Composition Tasks

We propose Factor Matting, an alternative formulation of the video matting problem in terms of counterfactual video synthesis that is better suited for re-composition tasks. The goal of factor matting is to separate the contents of a video into independent components, each representing a counterfactual version of the scene where the contents of other components have been removed. We show that factor matting maps well to a more general Bayesian framing of the matting problem that accounts for complex conditional interactions between layers. Based on this observation, we present a method for solving the factor matting problem that learns augmented patch-based appearance priors to produce useful decompositions even for video with complex cross-layer interactions like splashes, shadows, and reflections. Our method is trained per-video and does not require external training data or any knowledge about the 3D structure of the scene. Through extensive experiments, we show that it is able to produce useful decompositions of scenes with such complex interactions while performing competitively on classical matting tasks as well. We also demonstrate the benefits of our approach on a wide range of downstream video editing tasks. Our project website is at:

Computational Long Exposure Mobile Photography

Long exposure photography produces stunning imagery, representing moving elements in a scene with motion-blur. It is generally employed in two modalities, producing either a foreground or a background blur effect. Foreground blur images are traditionally captured on a tripod-mounted camera and portray blurred moving foreground elements, such as silky water or light trails, over a perfectly sharp background landscape. Background blur images, also called panning photography, are captured while the camera is tracking a moving subject, to produce an image of a sharp subject over a background blurred by relative motion. Both techniques are notoriously challenging and require additional equipment and advanced skills. In this paper, we describe a computational burst photography system that operates in a hand-held smartphone camera app, and achieves these effects fully automatically, at the tap of the shutter button. Our approach first detects and segments the salient subject. We track the scene motion over multiple frames and align the images in order to preserve desired sharpness and to produce aesthetically pleasing motion streaks. We capture an under-exposed burst and select the subset of input frames that will produce blur trails of controlled length, regardless of scene or camera motion velocity. We predict inter-frame motion and synthesize motion-blur to fill the temporal gaps between the input frames. Finally, we composite the blurred image with the sharp regular exposure to protect the sharpness of faces or areas of the scene that are barely moving, and produce a final high resolution and high dynamic range (HDR) photograph. Our system democratizes a capability previously reserved to professionals, and makes this creative style accessible to most casual photographers.
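The core integration step can be illustrated simply: once the burst frames are aligned, averaging them approximates what a single long exposure does, so static regions stay sharp while moving content smears into trails. A toy sketch of that averaging intuition only (the actual system additionally predicts inter-frame motion and synthesizes blur to fill the temporal gaps between frames):

```python
import numpy as np

def synth_long_exposure(frames):
    """Average a burst of already-aligned frames to approximate the
    temporal integration of a long exposure: static pixels keep their
    value, while a moving subject spreads its energy into a trail."""
    stack = np.stack([f.astype(np.float64) for f in frames])
    return stack.mean(axis=0)

# Toy burst: a bright 'subject' pixel drifting one column per frame.
frames = [np.zeros((1, 4)) for _ in range(4)]
for t, f in enumerate(frames):
    f[0, t] = 1.0

blurred = synth_long_exposure(frames)   # [[0.25, 0.25, 0.25, 0.25]]
```

The moving pixel's energy is spread evenly across the four columns it visited, which is exactly the streak a long exposure would record.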

SESSION: Geometric Abstractions: Not Just for Cubists

ShapeCoder: Discovering Abstractions for Visual Programs from Unstructured Primitives

We introduce ShapeCoder, the first system capable of taking a dataset of shapes, represented with unstructured primitives, and jointly discovering (i) useful abstraction functions and (ii) programs that use these abstractions to explain the input shapes. The discovered abstractions capture common patterns (both structural and parametric) across a dataset, so that programs rewritten with these abstractions are more compact, and suppress spurious degrees of freedom. ShapeCoder improves upon previous abstraction discovery methods, finding better abstractions, for more complex inputs, under less stringent input assumptions. This is principally made possible by two methodological advancements: (a) a shape-to-program recognition network that learns to solve sub-problems and (b) the use of e-graphs, augmented with a conditional rewrite scheme, to determine when abstractions with complex parametric expressions can be applied, in a tractable manner. We evaluate ShapeCoder on multiple datasets of 3D shapes, where primitive decompositions are either parsed from manual annotations or produced by an unsupervised cuboid abstraction method. In all domains, ShapeCoder discovers a library of abstractions that captures high-level relationships, removes extraneous degrees of freedom, and achieves better dataset compression compared with alternative approaches. Finally, we investigate how programs rewritten to use discovered abstractions prove useful for downstream tasks.

The Visual Language of Fabrics

We introduce text2fabric, a novel dataset that links free-text descriptions to various fabric materials. The dataset comprises 15,000 natural language descriptions associated with 3,000 corresponding images of fabric materials. Traditionally, material descriptions come in the form of tags/keywords, which limits their expressivity, requires pre-existing knowledge of the appropriate vocabulary, and ultimately leads to a fragmented description system. We therefore study the use of free text as a more appropriate way to describe material appearance, taking fabrics as a common class of items that non-experts often deal with. Based on an analysis of the dataset, we identify a compact lexicon, a set of attributes, and a key structure that emerge from the descriptions. This allows us to accurately understand how people describe fabrics and to draw directions for generalization to other types of materials. We also show that our dataset enables specializing large vision-language models such as CLIP, creating a meaningful latent space for fabric appearance and significantly improving applications such as fine-grained material retrieval and automatic captioning.

ArrangementNet: Learning Scene Arrangements for Vectorized Indoor Scene Modeling

We present a novel vectorized indoor modeling approach that converts point clouds into building information models (BIM) with concise and semantically segmented polygonal meshes. Existing methods detect planar shapes and connect them to complete the scene. Some focus on floor plan reconstruction as a simplified problem to better analyze connectivity between planes of floors and walls. However, the connectivity analysis remains challenging and ill-posed when given incomplete point clouds as input. We propose ArrangementNet to estimate scene arrangements from an incomplete point cloud, which we can then easily convert into a BIM model. ArrangementNet is a novel graph neural network that consumes noisy, over-partitioned initial arrangements extracted through non-learning techniques and outputs high-quality scene arrangements. The core of ArrangementNet is an extended graph convolution that leverages co-linear and co-face relationships in the arrangement and improves the quality of prediction in complex scenes. We apply ArrangementNet to improve floor plan and ceiling arrangements and enrich them with semantic objects as scene arrangements for scene generation. Our approach faithfully models challenging scenes obtained from laser scans or multi-view stereo and shows significant improvement in BIM model reconstruction compared to the state of the art. Our code is available at

Juxtaform: interactive visual summarization for exploratory shape design

We present juxtaform, a novel approach to the interactive summarization of large shape collections for conceptual shape design. We conduct a formative study to ascertain design goals for creative shape exploration tools. Motivated by a mathematical formulation of these design goals, juxtaform integrates the exploration, analysis, selection, and refinement of large shape collections to support an interactive divergence-convergence shape design workflow. We exploit sparse, segmented sketch-stroke visual abstractions of shape and a novel visual summarization algorithm to balance the needs of shape understanding, in-situ shape juxtaposition, and visual clutter. Our evaluation is three-fold: we show that existing shape and stroke clustering algorithms do not address our design goals compared to our proposed shape corpus summarization algorithm; we compare juxtaform against a structured image gallery interface for various shape design and analysis tasks; and we present multiple compelling 2D/3D applications using juxtaform.

Patternshop: Editing Point Patterns by Image Manipulation

Point patterns are characterized by their density and correlation. While spatial variation of density is well understood, the analysis and synthesis of spatially varying correlation remain an open challenge. No tools are available to intuitively edit such point patterns, primarily due to the lack of a compact representation for spatially varying correlation. We propose a low-dimensional perceptual embedding for point correlations. This embedding can map point patterns to common three-channel raster images, enabling manipulation with off-the-shelf image editing software. To synthesize point patterns back from such images, we propose a novel edge-aware objective that carefully handles sharp variations in density and correlation. The resulting framework allows intuitive and backward-compatible manipulation of point patterns, ranging from recoloring and relighting to texture synthesis, operations that were not previously available for 2D point pattern design. The effectiveness of our approach is demonstrated in several user experiments. Code is available at

SESSION: Magical Sketching

VideoDoodles: Hand-Drawn Animations on Videos with Scene-Aware Canvases

We present an interactive system to ease the creation of so-called video doodles - videos on which artists insert hand-drawn animations for entertainment or educational purposes. Video doodles are challenging to create because, to be convincing, the inserted drawings must appear as if they were part of the captured scene. In particular, the drawings should undergo tracking, perspective deformations, and occlusions as they move with respect to the camera and to other objects in the scene - visual effects that are difficult to reproduce with existing 2D video editing software. Our system supports these effects by relying on planar canvases that users position in a 3D scene reconstructed from the video. Furthermore, we present a custom tracking algorithm that allows users to anchor canvases to static or dynamic objects in the scene, such that the canvases move and rotate to follow the position and direction of these objects. When testing our system, novices could create a variety of short animated clips in a dozen minutes, while professionals praised its speed and ease of use compared to existing tools.

StripMaker: Perception-driven Learned Vector Sketch Consolidation

Artist sketches often use multiple overdrawn strokes to depict a single intended curve. Humans effortlessly mentally consolidate such sketches by detecting groups of overdrawn strokes and replacing them with the corresponding intended curves. While this mental process is near instantaneous, manually annotating or retracing sketches to communicate this intended mental image is highly time consuming; yet most sketch applications are not designed to handle overdrawing and can only operate on overdrawing-free, consolidated sketches. We propose StripMaker, a new and robust learning based method for automatic consolidation of raw vector sketches. We avoid the need for an unsustainably large manually annotated learning corpus by leveraging observations about artist workflow and perceptual cues viewers employ when mentally consolidating sketches. We train two perception-aware classifiers that assess the likelihood that a pair of stroke groups jointly depicts the same intended curve: our first classifier is purely local and only accounts for the properties of the evaluated strokes; our second classifier incorporates global context and is designed to operate on approximately consolidated sketches. We embed these classifiers within a consolidation framework that leverages artist workflow: we first process strokes in the order they were drawn and use our local classifier to arrive at an approximate consolidation output, then use the contextual classifier to refine this output and finalize the consolidated result. We validate StripMaker by comparing its results to manual consolidation outputs and algorithmic alternatives. StripMaker achieves comparable performance to manual consolidation. In a comparative study participants preferred our results by a 53% margin over those of the closest algorithmic alternative (67% versus 14%, other/neither 19%).

Semi-supervised reference-based sketch extraction using a contrastive learning framework

Sketches reflect the drawing style of individual artists; therefore, it is important to consider their unique styles when extracting sketches from color images for various applications. Unfortunately, most existing sketch extraction methods are designed to extract sketches of a single style. Although there have been some attempts to generate various style sketches, the methods generally suffer from two limitations: low quality results and difficulty in training the model due to the requirement of a paired dataset. In this paper, we propose a novel multi-modal sketch extraction method that can imitate the style of a given reference sketch with unpaired data training in a semi-supervised manner. Our method outperforms state-of-the-art sketch extraction methods and unpaired image translation methods in both quantitative and qualitative evaluations.

SESSION: XR Displays and Perception: Seeing What's in Front of Your Eyes

Split-Lohmann Multifocal Displays

This work provides the design of a multifocal display that can create a dense stack of focal planes in a single shot. We achieve this using a novel computational lens that provides spatial selectivity in its focal length, i.e, the lens appears to have different focal lengths across points on a display behind it. This enables a multifocal display via an appropriate selection of the spatially-varying focal length, thereby avoiding time multiplexing techniques that are associated with traditional focus tunable lenses. The idea central to this design is a modification of a Lohmann lens, a focus tunable lens created with two cubic phase plates that translate relative to each other. Using optical relays and a phase spatial light modulator, we replace the physical translation of the cubic plates with an optical one, while simultaneously allowing for different pixels on the display to undergo different amounts of translations and, consequently, different focal lengths. We refer to this design as a Split-Lohmann multifocal display. Split-Lohmann displays provide a large étendue as well as high spatial and depth resolutions; the absence of time multiplexing and the extremely light computational footprint for content processing makes it suitable for video and interactive experiences. Using a lab prototype, we show results over a wide range of static, dynamic, and interactive 3D scenes, showcasing high visual quality over a large working range.
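The varifocal principle behind the Lohmann lens can be stated compactly: two complementary cubic phase plates, laterally sheared against each other, sum to a quadratic (lens) phase whose power grows linearly with the shear. A sketch of the standard derivation in one dimension (coefficients follow the common Alvarez/Lohmann convention and are an assumption here, not taken from the paper's notation):

```latex
% One plate carries the cubic phase +\tfrac{a}{3}x^3, the other -\tfrac{a}{3}x^3.
% A relative lateral shift of \pm\delta yields the combined phase
\phi(x) \;=\; \frac{a}{3}\left[(x+\delta)^3 - (x-\delta)^3\right]
        \;=\; 2a\delta\, x^2 \;+\; \frac{2a}{3}\,\delta^3 ,
% i.e. a quadratic lens term 2a\delta x^2 plus a constant piston,
% so the effective focal power is proportional to the shear \delta.
```

The Split-Lohmann design replaces this single physical translation with an optical, per-pixel one, so different display pixels effectively experience different shears and hence different focal lengths.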

Étendue Expansion in Holographic Near Eye Displays through Sparse Eye-box Generation Using Lens Array Eyepiece

In this paper, we present a novel method for the étendue expansion of near-eye holographic displays through the generation of a sparse eye-box. Conventional holographic near-eye displays have suffered from a narrow field of view or a narrow eye-box due to the limited étendue supported by a spatial light modulator. We focus on the fact that these displays typically form a dense eye-box, which could be an excessive investment of the limited étendue. By rearranging the eye-box in a sparse manner, the practical étendue can be extended. With a properly designed sparse eye-box shape, the system can provide 3D holographic images and ensure continuous light entrance to the pupil. To create a sparse eye-box, we utilize a lens array as an eyepiece lens. We optimize the spatial light modulator's phase profile for the proposed system and analyze the impact of the lens array eyepiece on the holographic image quality. In particular, we focus on the lens array specifications of lenslet pitch and focal length, deriving feasible specifications based on our analysis. We experimentally demonstrate a near-eye see-through display using the proposed system and verify the étendue expansion.

SESSION: Procedural Modeling

Rhizomorph: The Coordinated Function of Shoots and Roots

Computer graphics has dedicated a considerable amount of effort to generating realistic models of trees and plants. Many existing methods leverage procedural modeling algorithms - that often consider biological findings - to generate branching structures of individual trees. While the realism of tree models generated by these algorithms steadily increases, most approaches neglect to model the root system of trees. However, the root system not only adds to the visual realism of tree models but also plays an important role in the development of trees. In this paper, we advance tree modeling in the following ways: First, we define a physically-plausible soil model to simulate resource gradients, such as water and nutrients. Second, we propose a novel developmental procedural model for tree roots that enables us to emergently develop root systems that adapt to various soil types. Third, we define long-distance signaling to coordinate the development of shoots and roots. We show that our advanced procedural model of tree development enables - for the first time - the generation of trees with their root systems.

Example-Based Procedural Modeling Using Graph Grammars

We present a method for automatically generating polygonal shapes from an example using a graph grammar. Most procedural modeling techniques use grammars with manually created rules, but our method can create them automatically from an example. Our graph grammars generate graphs that are locally similar to a given example. We disassemble the input into small pieces called primitives and then reassemble the primitives into new graphs. We organize all possible locally similar graphs into a hierarchy and find matching graphs within the hierarchy. These matches are used to create a graph grammar that can construct every locally similar graph. Our method generates graphs using the grammar and then converts them into a planar graph drawing to produce the final shape.

Forming Terrains by Glacial Erosion

We introduce the first solution for simulating the formation and evolution of glaciers, together with their attendant erosive effects, for periods covering the combination of glacial and inter-glacial cycles. Our efficient solution includes both a fast yet accurate deep learning-based estimation of high-order ice flows and a new, multi-scale advection scheme enabling us to account for the distinct time scales at which glaciers reach equilibrium compared to eroding the terrain. We combine the resulting glacial erosion model with finer-scale erosive phenomena to account for the transport of debris flowing from cliffs. This enables us to model the formation of terrain shapes not previously adequately modeled in Computer Graphics, ranging from U-shaped and hanging valleys to fjords and glacial lakes.

SESSION: Inverse Rendering: Does Anybody Know How I Got Here?

Recursive Control Variates for Inverse Rendering

We present a method for reducing errors---variance and bias---in physically based differentiable rendering (PBDR). Typical applications of PBDR repeatedly render a scene as part of an optimization loop involving gradient descent. The actual change introduced by each gradient descent step is often relatively small, causing a significant degree of redundancy in this computation. We exploit this redundancy by formulating a gradient estimator that employs a recursive control variate, which leverages information from previous optimization steps. The control variate reduces variance in gradients, and, perhaps more importantly, alleviates issues that arise from differentiating loss functions with respect to noisy inputs, a common cause of drift to bad local minima or divergent optimizations. We experimentally evaluate our approach on a variety of path-traced scenes containing surfaces and volumes and observe that primal rendering efficiency improves by a factor of up to 10.
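To illustrate the core idea in isolation (this is not the paper's full PBDR estimator), the following Python sketch shows a control-variate estimator that reuses the previous optimization step's integrand together with its known value; all names here are hypothetical.

```python
import random

def cv_estimate(f, f_prev, F_prev, n, rng):
    """Control-variate estimate of E[f(U)], U ~ Uniform(0, 1):
    F_prev + mean(f - f_prev), where F_prev is a known (or previously
    accumulated) value of E[f_prev]. When f barely changes between
    optimization steps, f - f_prev has far lower variance than f."""
    return F_prev + sum(f(u) - f_prev(u)
                        for u in (rng.random() for _ in range(n))) / n

rng = random.Random(0)
f = lambda u: u * u               # current iteration's integrand
f_prev = lambda u: 0.95 * u * u   # previous iteration, slightly different
F_prev = 0.95 / 3.0               # exact E[f_prev]
est = cv_estimate(f, f_prev, F_prev, 1000, rng)  # should be near 1/3
```

Because only the small difference f - f_prev is sampled, a few hundred samples suffice where a plain estimator of f would need many more.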

Film Grain Rendering and Parameter Estimation

We propose a realistic film grain rendering algorithm based on statistics derived analytically from a physics-based Boolean model that Newson et al. adopted for Monte Carlo simulations of film grain. We also propose formulas for estimation of the model parameters from scanned film grain images. The proposed rendering is computationally efficient and can be used for real-time film grain simulation for a wide range of film grain parameters when the individual film grains are not visible. Experimental results demonstrate the effectiveness of the proposed approach for both constant and real-world images, with a speed-up of six orders of magnitude compared with the Monte Carlo simulations of the Newson et al. approach.
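The rendering builds on the standard Boolean model of film grain. As a minimal sketch (assuming monodisperse disk-shaped grains, a simplification of the full Newson et al. model), the grain intensity can be chosen so that the expected area coverage matches a target gray level:

```python
import math

def grain_density(u, r):
    """Boolean-model grain intensity (grains per unit area) such that the
    expected area fraction covered by disks of radius r equals the target
    gray level u in [0, 1): coverage = 1 - exp(-lambda * pi * r^2)."""
    return -math.log(1.0 - u) / (math.pi * r * r)

def expected_coverage(lam, r):
    """Expected covered area fraction of a Boolean model with intensity lam."""
    return 1.0 - math.exp(-lam * math.pi * r * r)
```

The round trip grain_density -> expected_coverage recovers the input gray level exactly, which is what makes the analytic statistics usable for rendering without Monte Carlo simulation.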

Revisiting Controlled Mixture Sampling for Rendering Applications

Monte Carlo rendering makes heavy use of mixture sampling and multiple importance sampling (MIS). Previous work has shown that control variates can be used to make such mixtures more efficient and more robust. However, the existing approaches failed to yield practical applications, chiefly because their underlying theory is based on the unrealistic assumption that a single mixture is optimized for a single integral. This is in stark contrast with rendering reality, where millions of integrals are computed---one per pixel---and each is infinitely recursive. We adapt and extend the theory introduced by previous work to tackle the challenges of real-world rendering applications. We achieve robust mixture sampling and (approximately) optimal MIS weighting for common applications such as light selection, BSDF sampling, and path guiding.
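For context, here is a minimal Python sketch of the baseline these methods build on: multi-strategy sampling combined with the classic balance heuristic (not the paper's optimized weighting). The integrand and both sampling strategies are toy choices.

```python
import math, random

def balance_weight(pdfs, i):
    """Balance-heuristic MIS weight for strategy i, given every
    strategy's pdf evaluated at the same sample point."""
    return pdfs[i] / sum(pdfs)

def mis_integrate(f, n, rng):
    """Estimate the integral of f over [0, 1] by combining two toy
    strategies: uniform sampling (pdf 1) and linear sampling
    (pdf 2x, drawn as sqrt(u))."""
    total = 0.0
    for _ in range(n):
        x = rng.random()                      # strategy 0: uniform
        p = [1.0, 2.0 * x]
        total += balance_weight(p, 0) * f(x) / p[0]
        x = math.sqrt(rng.random())           # strategy 1: pdf(x) = 2x
        p = [1.0, 2.0 * x]
        total += balance_weight(p, 1) * f(x) / p[1]
    return total / n

rng = random.Random(1)
est = mis_integrate(lambda x: x, 10000, rng)  # true value: 0.5
```

The balance weights sum to one at every point, so the combined estimator stays unbiased regardless of how well each individual strategy matches the integrand.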

SESSION: Fabricating Appearance

Scratch-based Reflection Art via Differentiable Rendering

The 3D visual optical arts create fascinating special effects by carefully designing interactions between objects and light sources. One of the essential types is 3D reflection art, which aims to create reflectors that can display different images when viewed from different directions. Existing works produce impressive visual effects. Unfortunately, they discretize the reflector surface with regular grids/facets, leading to a large parameter space and a high optimization time cost. In this paper, we introduce a new type of 3D reflection art - scratch-based reflection art - which allows for a more compact parameter space, easier fabrication, and computationally efficient optimization. To design a 3D reflection art with scratches, we formulate it as a multi-view optimization problem and introduce differentiable rendering to enable efficient gradient-based optimizers. For that, we propose an analytical scratch rendering approach, together with a high-performance rendering pipeline, allowing efficient differentiable rendering. As a consequence, we can display multiple images on a single metallic board with only a few minutes of optimization. We demonstrate our work by showing virtual objects and manufacturing our designed reflectors with a carving machine.

Meso-Facets for Goniochromatic 3D Printing

Goniochromatic materials and objects appear to have different colors depending on viewing direction. This occurs in nature, such as in wood or minerals, and in human-made objects such as metal and effect pigments. In this paper, we propose algorithms to control multi-material 3D printers to produce goniochromatic effects on arbitrary surfaces by procedurally augmenting the input surface with meso-facets, which allow distinct colors to be assigned to different viewing directions of the input surface while introducing minimal changes to that surface. Previous works apply only to 2D or 2.5D surfaces, require multiple fabrication technologies, or make considerable changes to the input surface and require special post-processing, whereas our approach requires a single fabrication technology and no special post-processing. Our framework is general, allowing different generating functions for both the shape and color of the facets. Working with implicit representations allows us to generate geometric features at the limit of device resolution without tessellation. We evaluate our approach for performance, showing negligible overhead compared to baseline color 3D print processing, and for goniochromatic quality.

Skin-Screen: A Computational Fabrication Framework for Color Tattoos

Tattoos are a highly popular medium, with both artistic and medical applications. Although the mechanical process of tattoo application has evolved historically, the results are reliant on the artisanal skill of the artist. This can be especially challenging for some skin tones, or in cases where artists lack experience. We provide the first systematic overview of tattooing as a computational fabrication technique. We built an automated tattooing rig and a recipe for the creation of silicone sheets mimicking realistic skin tones, which allowed us to create an accurate model predicting tattoo appearance. This enables several exciting applications including tattoo previewing, color retargeting, novel ink spectra optimization, color-accurate prosthetics, and more.

Orientable Dense Cyclic Infill for Anisotropic Appearance Fabrication

We present a method to 3D print surfaces exhibiting a prescribed varying field of anisotropic appearance using only standard fused filament fabrication printers. This enables the fabrication of patterns triggering reflections similar to that of brushed metal with direct control over the directionality of the reflections. Our key insight, on which we ground the method, is that the direction of the deposition paths leads to a certain degree of surface roughness, which yields a visual anisotropic appearance. Therefore, generating dense cyclic infills aligned with a line field allows us to grade the anisotropic appearance of the printed surface. To achieve this, we introduce a highly parallelizable algorithm for optimizing oriented, cyclic paths. Our algorithm outperforms existing approaches regarding efficiency, robustness, and result quality. We demonstrate the effectiveness of our technique in conveying an anisotropic appearance on several challenging test cases, ranging from patterns to photographs reinterpreted as anisotropic appearances.

SESSION: Contours, Conformality, Coarsening, and Coordinates

Differential Operators on Sketches via Alpha Contours

A vector sketch is a popular and natural geometry representation depicting a 2D shape. When viewed from afar, the disconnected vector strokes of a sketch and the empty space around them visually merge into positive space and negative space, respectively. Positive and negative spaces are the key elements in the composition of a sketch and define what we perceive as the shape. Nevertheless, the notion of positive or negative space is mathematically ambiguous: While the strokes unambiguously indicate the interior or boundary of a 2D shape, the empty space may or may not belong to the shape's exterior.

For standard discrete geometry representations, such as meshes or point clouds, some of the most robust pipelines rely on discretizations of differential operators, such as Laplace-Beltrami. Such discretizations are not available for vector sketches; defining them may enable numerous applications of classical methods on vector sketches. However, to do so, one needs to define the positive space of a vector sketch, or the sketch shape.

Even though extracting this 2D sketch shape is mathematically ambiguous, we propose a robust algorithm, Alpha Contours, that constructs a conservative estimate of it: a 2D shape that contains all input strokes in its interior or on its boundary and aligns tightly to the sketch. This allows us to define popular differential operators on vector sketches, such as Laplacian and Steklov operators.

We demonstrate that our construction enables robust tools for vector sketches, such as As-Rigid-As-Possible sketch deformation and functional maps between sketches, as well as solving partial differential equations on a vector sketch.

Min-Deviation-Flow in Bi-directed Graphs for T-Mesh Quantization

Subdividing non-conforming T-mesh layouts into conforming quadrangular meshes is a core component of state-of-the-art (re-)meshing methods. Typically, the required constrained assignment of integer lengths to T-Mesh edges is left to generic branch-and-cut solvers, greedy heuristics, or a combination of the two. This either does not scale well with input complexity or delivers suboptimal result quality. We introduce the Minimum-Deviation-Flow Problem in bi-directed networks (Bi-MDF) and demonstrate its use in modeling and efficiently solving a variety of T-Mesh quantization problems. We develop a fast approximate solver as well as an iterative refinement algorithm based on matching in graphs that solves Bi-MDF exactly. Compared to the state-of-the-art QuadWild [Pietroni et al. 2021] implementation on the authors' dataset of 300 models, our exact solver finishes after only 0.49% (total 17.06s) of their runtime (3491s) and achieves 11% lower energy, while an approximation is computed after 0.09% (3.19s) of their runtime at the cost of 24% increased energy. A novel half-arc-based T-Mesh quantization formulation extends the feasible solution space to include previously unattainable quad meshes. The Bi-MDF problem is more general than our application in layout quantization, potentially enabling similar speedups for other optimization problems that fit into the scheme, such as quad mesh refinement.

Efficient Embeddings in Exact Arithmetic

We provide a set of tools for generating planar embeddings of triangulated topological spheres. The algorithms make use of Schnyder labelings and realizers. A new representation of the realizer based on dual trees leads to a simple linear time algorithm mapping from weights per triangle to barycentric coordinates and, more importantly, also in the reverse direction. The algorithms can be implemented so that all coefficients involved are 1 or -1. This enables integer computation, making all computations exact. Being a Schnyder realizer, mapping from positive triangle weights guarantees that the barycentric coordinates form an embedding. The reverse direction enables an algorithm for fixing flipped triangles in planar realizations, by mapping from coordinates to weights and adjusting the weights (without forcing them to be positive). In a range of experiments, we demonstrate that all algorithms are orders of magnitude faster than existing robust approaches.

SESSION: Thin and Thinner: Modeling Shells and Hair

Complex Wrinkle Field Evolution

We propose a new approach for representing wrinkles, designed to capture complex and detailed wrinkle behavior on coarse triangle meshes, called Complex Wrinkle Fields. Complex Wrinkle Fields consist of an almost-everywhere-unit complex-valued phase function over the surface; a frequency one-form; and an amplitude scalar, with a soft compatibility condition coupling the frequency and phase. We develop algorithms for interpolating between two such wrinkle fields, for visualizing them as displacements of a Loop-subdivided refinement of the base mesh, and for making smooth local edits to the wrinkle amplitude, frequency, and/or orientation. These algorithms make it possible, for the first time, to create and edit animations of wrinkles on triangle meshes that are smooth in space, evolve smoothly through time, include singularities along with their complex interactions, and that represent frequencies far finer than the surface resolution.

Computational Exploration of Multistable Elastic Knots

We present an algorithmic approach to discover, study, and design multistable elastic knots. Elastic knots are physical realizations of closed curves embedded in 3-space. When endowed with the material thickness and bending resistance of a physical wire, these knots settle into equilibrium states that balance the forces induced by elastic deformation and self-contacts of the wire. In general, elastic knots can have many distinct equilibrium states, i.e. they are multistable mechanical systems. We propose a computational pipeline that combines randomized spatial sampling and physics simulation to efficiently find stable equilibrium states of elastic knots. Leveraging results from knot theory, we run our pipeline on thousands of different topological knot types to create an extensive data set of multistable knots. By applying a series of filters to this data, we discover new transformable knots with interesting geometric and physical properties. A further analysis across knot types reveals geometric and topological patterns, yielding constructive principles that generalize beyond the currently tabulated knot types. We show how multistable elastic knots can be used to design novel deployable structures and engaging recreational puzzles. Several physical prototypes at different scales highlight these applications and validate our simulation.

Sag-Free Initialization for Strand-Based Hybrid Hair Simulation

Lagrangian/Eulerian hybrid strand-based hair simulation techniques have quickly become a popular approach in VFX and real-time graphics applications. With Lagrangian hair dynamics, the inter-hair contacts are resolved in the Eulerian grid using the continuum method, i.e., the MPM scheme with the granular Drucker-Prager rheology, to avoid expensive collision detection and handling. This fuzzy collision handling makes the authoring process significantly easier. However, although current hair grooming tools provide a wide range of strand-based modeling tools for this simulation approach, the crucial sag-free initialization functionality is often ignored. Thus, when the simulation starts, gravity would cause any artistic hairstyle to sag and deform into unintended and undesirable shapes.

This paper proposes a novel four-stage sag-free initialization framework to solve stable quasistatic configurations for hybrid strand-based hair dynamic systems. These four stages are split into two global-local pairs. The first one ensures static equilibrium at every Eulerian grid node with additional inequality constraints to prevent stress from exiting the yielding surface. We then derive several associated closed-form solutions in the local stage to compute segment rest lengths, orientations, and particle deformation gradients in parallel. The second global-local step solves along each hair strand to ensure all the bend and twist constraints produce zero net torque on every hair segment, followed by a local step to adjust the rest Darboux vectors to a unit quaternion. We also introduce an essential modification for the Darboux vector to eliminate the ambiguity of the Cosserat rod rest pose in both initialization and simulation. We evaluate our method on a wide range of hairstyles, and our approach takes only a few seconds to minutes to obtain the rest quasistatic configurations for hundreds of hair strands. Our results show that our method successfully prevents sagging and has minimal impact on the hair motion during simulation.
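The four-stage framework above is specific to MPM hair with bend and twist constraints, but the underlying idea of sag-free initialization can be sketched in the simplest possible setting: a vertical chain of stretch springs whose rest lengths are chosen so that every node starts in static equilibrium. The function below is a hypothetical illustration, not the paper's solver.

```python
def sagfree_rest_lengths(lengths, masses, k, g=9.81):
    """For a chain hanging from a fixed top node, with spring i directly
    above node i, choose each rest length so the spring tension
    k * (length - rest) exactly equals the weight hanging below it.
    The net force on every node is then zero: no sag at start-up."""
    rest = []
    below = sum(masses)          # total mass carried by spring 0
    for i, l in enumerate(lengths):
        rest.append(l - below * g / k)
        below -= masses[i]       # each lower spring carries one mass fewer
    return rest

k, g = 100.0, 9.81
lengths = [1.0, 1.0, 1.0]        # current (desired) spring lengths
masses = [2.0, 1.0, 1.0]         # node masses, top to bottom
rest = sagfree_rest_lengths(lengths, masses, k, g)
```

Each rest length comes out shorter than the desired length, pre-tensioning the chain against gravity; the paper's local stages compute the analogous closed-form corrections per hair segment.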

CT2Hair: High-Fidelity 3D Hair Modeling using Computed Tomography

We introduce CT2Hair, a fully automatic framework for creating high-fidelity 3D hair models that are suitable for use in downstream graphics applications. Our approach utilizes real-world hair wigs as input, and is able to reconstruct hair strands for a wide range of hair styles. Our method leverages computed tomography (CT) to create density volumes of the hair regions, allowing us to see through the hair unlike image-based approaches which are limited to reconstructing the visible surface. To address the noise and limited resolution of the input density volumes, we employ a coarse-to-fine approach. This process first recovers guide strands with estimated 3D orientation fields, and then populates dense strands through a novel neural interpolation of the guide strands. The generated strands are then refined to conform to the input density volumes. We demonstrate the robustness of our approach by presenting results on a wide variety of hair styles and conducting thorough evaluations on both real-world and synthetic datasets. Code and data for this paper are at

SESSION: Full-Body XR: Beyond The Headset

EgoLocate: Real-time Motion Capture, Localization, and Mapping with Sparse Body-mounted Sensors

Human and environment sensing are two important topics in Computer Vision and Graphics. Human motion is often captured by inertial sensors, while the environment is mostly reconstructed using cameras. We integrate the two techniques together in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors, including 6 inertial measurement units (IMUs) and a monocular phone camera. On one hand, inertial mocap suffers from large translation drift due to the lack of the global positioning signal. EgoLocate leverages image-based simultaneous localization and mapping (SLAM) techniques to locate the human in the reconstructed scene. On the other hand, SLAM often fails when the visual feature is poor. EgoLocate involves inertial mocap to provide a strong prior for the camera motion. Experiments show that localization, a key challenge for both fields, is largely improved by our technique, compared with the state of the art of the two fields. Our code is available for research at

Towards Attention–aware Foveated Rendering

Foveated graphics is a promising approach to solving the bandwidth challenges of immersive virtual and augmented reality displays by exploiting the falloff in spatial acuity in the periphery of the visual field. However, the perceptual models used in these applications neglect the effects of higher-level cognitive processing, namely the allocation of visual attention, and thus overestimate sensitivity in the periphery in many scenarios. Here, we introduce the first attention-aware model of contrast sensitivity. We conduct user studies to measure contrast sensitivity under different attention distributions and show that sensitivity in the periphery drops significantly when the user is required to allocate attention to the fovea. We motivate the development of future foveation models with another user study and demonstrate that tolerance for foveation in the periphery is significantly higher when the user is concentrating on a task in the fovea. Analysis of our model predicts significant bandwidth savings over those afforded by current models. As such, our work forms the foundation for attention-aware foveated graphics techniques.

SESSION: Neural Light Transport

Neural Prefiltering for Correlation-Aware Levels of Detail

We introduce a practical general-purpose neural appearance filtering pipeline for physically-based rendering. We tackle the previously difficult challenge of aggregating visibility across many levels of detail from local information only, without relying on learning visibility for the entire scene. The high adaptivity of neural representations allows us to retain geometric correlations along rays and thus avoid light leaks. Common approaches to prefiltering decompose the appearance of a scene into volumetric representations with physically-motivated parameters, where the inflexibility of the fitted models limits rendering accuracy. We avoid assumptions on particular types of geometry or materials, bypassing any special-case decompositions. Instead, we directly learn a compressed representation of the intra-voxel light transport. For such high-dimensional functions, neural networks have proven to be useful representations. To satisfy the opposing constraints of prefiltered appearance and correlation-preserving point-to-point visibility, we use two small independent networks on a sparse multi-level voxel grid. Each network requires 10--20 minutes of training to learn the appearance of an asset across levels of detail. Our method achieves 70--95% compression ratios and quality improvements of around 25% over previous work. We reach interactive to real-time framerates, depending on the level of detail.

SESSION: Pushing the Boundaries

Coupling Conduction, Convection and Radiative Transfer in a Single Path-Space: Application to Infrared Rendering

In the past decades, Monte Carlo methods have shown their ability to solve PDEs, independently of the dimensionality of the integration domain and for different use-cases (e.g. light transport, geometry processing, physics simulation). Specifically, the path-space formulation of transport equations is a key ingredient to define tractable and scalable solvers, and we observe nowadays a strong interest in the definition of simulation systems based on Monte Carlo algorithms. We also observe that, when simulating combined physics (e.g. thermal rendering from a heat transfer simulation), there is a lack of coupled Monte Carlo algorithms that solve all the physics at once, in the same path space, rather than combining several independent MC estimators, a combination that would make the global solver critically sensitive to the complexity of each simulation space. This brings us to our proposal: a coupled, single path-space, Monte Carlo algorithm for efficiently solving multi-physics problems.

In this work, we combine our understanding and knowledge of Physics and Computer Graphics to demonstrate how to formulate and arrange different simulation spaces into a single path space. We define a tractable formalism for coupled heat transfer simulation using Monte Carlo, and we leverage the path-space construction to interactively compute multiple simulations with different conditions in the same scene, in terms of boundary conditions and observation time. We validate our proposal in the context of infrared rendering with different thermal simulation scenarios: e.g., room temperature simulation, visualization of heat paths within materials (detection of thermal bridges), and the heat-diffusion capacity of a thermal exchanger. We expect that our theoretical framework will foster collaboration and multidisciplinary studies. We detail the perspectives this framework opens and suggest a research agenda towards the resolution of coupled PDEs at the interface of Physics and Computer Graphics.

Walk on Stars: A Grid-Free Monte Carlo Method for PDEs with Neumann Boundary Conditions

Grid-free Monte Carlo methods based on the walk on spheres (WoS) algorithm solve fundamental partial differential equations (PDEs) like the Poisson equation without discretizing the problem domain or approximating functions in a finite basis. Such methods hence avoid aliasing in the solution, and evade the many challenges of mesh generation. Yet for problems with complex geometry, practical grid-free methods have been largely limited to basic Dirichlet boundary conditions. We introduce the walk on stars (WoSt) algorithm, which solves linear elliptic PDEs with arbitrary mixed Neumann and Dirichlet boundary conditions. The key insight is that one can efficiently simulate reflecting Brownian motion (which models Neumann conditions) by replacing the balls used by WoS with star-shaped domains. We identify such domains via the closest point on the visibility silhouette, by simply augmenting a standard bounding volume hierarchy with normal information. Overall, WoSt is an easy modification of WoS, and retains the many attractive features of grid-free Monte Carlo methods such as progressive and view-dependent evaluation, trivial parallelization, and sublinear scaling to increasing geometric detail.
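For context, the classical WoS baseline that WoSt generalizes fits in a few lines. This sketch handles only Dirichlet conditions on a toy domain (the unit disk); it is not the star-shaped-region algorithm itself.

```python
import math, random

def walk_on_spheres(x, y, dist, g, eps=1e-3, rng=random):
    """One walk-on-spheres sample of the harmonic function with Dirichlet
    data g, starting at (x, y). dist(x, y) returns the distance to the
    domain boundary; the walk jumps to a uniformly random point on the
    largest empty circle until it lands within eps of the boundary."""
    while True:
        r = dist(x, y)
        if r < eps:
            return g(x, y)
        a = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(a)
        y += r * math.sin(a)

# Demo: unit disk with boundary data g(x, y) = x, whose harmonic
# extension is u(x, y) = x, so the estimate at (0.3, 0.2) approaches 0.3.
rng = random.Random(7)
dist = lambda x, y: 1.0 - math.hypot(x, y)
g = lambda x, y: x
n = 4000
est = sum(walk_on_spheres(0.3, 0.2, dist, g, rng=rng) for _ in range(n)) / n
```

WoSt's key change is that, near Neumann boundaries, the empty ball is replaced by a star-shaped region found via the visibility silhouette, so that reflecting Brownian motion can be simulated with the same jump structure.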

A Practical Walk-on-Boundary Method for Boundary Value Problems

We introduce the walk-on-boundary (WoB) method for solving boundary value problems to computer graphics. WoB is a grid-free Monte Carlo solver for certain classes of second order partial differential equations. A similar Monte Carlo solver, the walk-on-spheres (WoS) method, has been recently popularized in computer graphics due to its advantages over traditional spatial discretization-based alternatives. We show that WoB's intrinsic properties yield further advantages beyond those of WoS. Unlike WoS, WoB naturally supports various boundary conditions (Dirichlet, Neumann, Robin, and mixed) for both interior and exterior domains. WoB builds upon boundary integral formulations, and it is mathematically more similar to light transport simulation in rendering than the random walk formulation of WoS. This similarity between WoB and rendering allows us to implement WoB on top of Monte Carlo ray tracing, and to incorporate advanced rendering techniques (e.g., bidirectional estimators with multiple importance sampling, the virtual point lights method, and Markov chain Monte Carlo) into WoB. WoB does not suffer from the intrinsic bias of WoS near the boundary and can estimate solutions precisely on the boundary. Our numerical results highlight the advantages of WoB over WoS as an attractive alternative to solve boundary value problems based on Monte Carlo.

Boundary Value Caching for Walk on Spheres

Grid-free Monte Carlo methods such as walk on spheres can be used to solve elliptic partial differential equations without mesh generation or global solves. However, such methods independently estimate the solution at every point, and hence do not take advantage of the high spatial regularity of solutions to elliptic problems. We propose a fast caching strategy which first estimates solution values and derivatives at randomly sampled points along the boundary of the domain (or a local region of interest). These cached values then provide cheap, output-sensitive evaluation of the solution (or its gradient) at interior points, via a boundary integral formulation. Unlike classic boundary integral methods, our caching scheme introduces zero statistical bias and does not require a dense global solve. Moreover, we can handle imperfect geometry (e.g., with self-intersections) and detailed boundary/source terms without repairing or resampling the boundary representation. Overall, our scheme is similar in spirit to virtual point light methods from photorealistic rendering: it suppresses the typical salt-and-pepper noise characteristic of independent Monte Carlo estimates, while still retaining the many advantages of Monte Carlo solvers: progressive evaluation, trivial parallelization, geometric robustness, etc. We validate our approach using test problems from visual and geometric computing.

Generalizing Shallow Water Simulations with Dispersive Surface Waves

This paper introduces a novel method for simulating large bodies of water as a height field. At the start of each time step, we partition the waves into a bulk flow (which approximately satisfies the assumptions of the shallow water equations) and surface waves (which approximately satisfy the assumptions of Airy wave theory). We then solve the two wave regimes separately using appropriate state-of-the-art techniques, and re-combine the resulting wave velocities at the end of each step. This strategy leads to the first heightfield wave model capable of simulating complex interactions between both deep and shallow water effects, like the waves from a boat wake sloshing up onto a beach, or a dam break producing wave interference patterns and eddies. We also analyze the numerical dispersion created by our method and derive an exact correction factor for waves at a constant water depth, giving us a numerically perfect re-creation of theoretical water wave dispersion patterns.
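The surface-wave regime above rests on the Airy dispersion relation. As a small reference sketch (the paper's exact numerical-dispersion correction factor is its own derivation and is not reproduced here):

```python
import math

def airy_omega(k, h, g=9.81):
    """Angular frequency of a linear (Airy) surface wave with wavenumber k
    at constant water depth h: omega^2 = g * k * tanh(k * h)."""
    return math.sqrt(g * k * math.tanh(k * h))

def phase_speed(k, h, g=9.81):
    """Phase speed c = omega / k. Tends to sqrt(g * h) in shallow water
    (k*h << 1, the shallow water equations' non-dispersive limit) and to
    sqrt(g / k) in deep water (k*h >> 1)."""
    return airy_omega(k, h, g) / k
```

The two limits are exactly why the method splits the flow: the bulk flow lives in the non-dispersive shallow-water limit, while the surface waves carry the depth-dependent dispersion.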

Beyond Chainmail: Computational Modeling of Discrete Interlocking Materials

We present a method for computational modeling, mechanical characterization, and macro-scale simulation of discrete interlocking materials (DIM)---3D-printed chainmail fabrics made of quasi-rigid interlocking elements. Unlike conventional elastic materials for which deformation and restoring force are directly coupled, the mechanics of DIM are governed by contacts between individual elements that give rise to anisotropic deformation constraints. To model the mechanical behavior of these materials, we propose a computational approach that builds on three key components: (a) we explore the space of feasible deformations using native-scale simulations at the per-element level; (b) based on this simulation data, we introduce the concept of strain-space boundaries to represent deformation limits for in- and out-of-plane deformations; and (c) we use the strain-space boundaries to drive an efficient macro-scale simulation model based on homogenized deformation constraints. We evaluate our method on a set of representative discrete interlocking materials and validate our findings against measurements on physical prototypes.

SESSION: Cloud Rendering: Your GPU Is Somewhere Else

Trim Regions for Online Computation of From-Region Potentially Visible Sets

Visibility computation is a key element in computer graphics applications. More specifically, a from-region potentially visible set (PVS) is an established tool in rendering acceleration, but its high computational cost means a from-region PVS is almost always precomputed. Precomputation restricts the use of PVS to static scenes and leads to high storage cost, in particular, if we need fine-grained regions. For dynamic applications, such as streaming content over a variable-bandwidth network, online PVS computation with configurable region size is required. We address this need with trim regions, a new method for generating from-region PVS for arbitrary scenes in real time. Trim regions perform controlled erosion of object silhouettes in image space, implicitly applying the shrinking theorem known from previous work. Our algorithm is the first that applies automatic shrinking to unconstrained 3D scenes, including non-manifold meshes, and does so in real time using an efficient GPU execution model. We demonstrate that our algorithm generates a tight PVS for complex scenes and outperforms previous online methods for from-viewpoint and from-region PVS. It runs at 60 Hz for realistic game scenes consisting of millions of triangles and computes PVS with a tightness matching or surpassing existing approaches.

Potentially Visible Hidden-Volume Rendering for Multi-View Warping

This paper presents the model and rendering algorithm of Potentially Visible Hidden Volumes (PVHVs) for multi-view image warping. PVHVs are 3D volumes that are occluded at a known source view, but potentially visible at novel views. Given a bound of novel views, we define PVHVs using the edges of foreground fragments from the known view and the bound of novel views. PVHVs can be used to batch-test the visibilities of source fragments without iterating individual novel views in multi-fragment rendering, and thereby cull redundant fragments prior to warping. We realize the model of PVHVs in Depth Peeling (DP). Our Effective Depth Peeling (EDP) can reduce the number of completely hidden fragments, capture important fragments early, and reduce warping cost. We demonstrate the benefit of our PVHVs and EDP in terms of memory, quality, and performance in multi-view warping.

Effect-based Multi-viewer Caching for Cloud-native Rendering

With cloud computing becoming ubiquitous, it appears as if virtually everything can be offered as a service. However, real-time rendering in the cloud forms a notable exception, where cloud adoption stops at running individual game instances in compute centers. In this paper, we explore whether a cloud-native rendering architecture is viable and scales to multi-client rendering scenarios. To this end, we propose world-space and on-surface caches to share rendering computations among viewers placed in the same virtual world. We discuss how caches can be utilized on a per-effect basis and demonstrate that a large amount of computation can be saved as the number of viewers in a scene increases. Caches can easily be set up for various effects, including ambient occlusion, direct illumination, and diffuse global illumination. Our results underline that the image quality using cached rendering is on par with screen-space rendering, and, due to its simplicity and inherent coherence, cached rendering may even have advantages in single-viewer setups. Analyzing the runtime and communication costs, we show that cached rendering is already viable in multi-GPU systems. Building on this work, cloud-native rendering may be just around the corner.

Random-Access Neural Compression of Material Textures

The continuous advancement of photorealism in rendering is accompanied by a growth in texture data and, consequently, increasing storage and memory demands. To address this issue, we propose a novel neural compression technique specifically designed for material textures. We unlock two more levels of detail, i.e., 16× more texels, using low bitrate compression, with image quality that is better than advanced image compression techniques, such as AVIF and JPEG XL.

At the same time, our method allows on-demand, real-time decompression with random access, similar to block texture compression on GPUs, enabling compression both on disk and in memory. The key idea behind our approach is to compress multiple material textures and their mipmap chains together, and to use a small neural network, optimized for each material, to decompress them. Finally, we use a custom training implementation to achieve practical compression speeds, whose performance surpasses that of general frameworks, like PyTorch, by an order of magnitude.

MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in Unbounded Scenes

Neural radiance fields enable state-of-the-art photorealistic view synthesis. However, existing radiance field representations are either too compute-intensive for real-time rendering or require too much memory to scale to large scenes. We present a Memory-Efficient Radiance Field (MERF) representation that achieves real-time rendering of large-scale scenes in a browser. MERF reduces the memory consumption of prior sparse volumetric radiance fields using a combination of a sparse feature grid and high-resolution 2D feature planes. To support large-scale unbounded scenes, we introduce a novel contraction function that maps scene coordinates into a bounded volume while still allowing for efficient ray-box intersection. We design a lossless procedure for baking the parameterization used during training into a model that achieves real-time rendering while still preserving the photorealistic view synthesis quality of a volumetric radiance field.
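To illustrate the kind of contraction involved, here is a sketch of an infinity-norm, piecewise-projective contraction in the spirit of MERF, mapping unbounded scene coordinates into [-2, 2]^3 while keeping contracted ray segments piecewise linear; treat the exact form as our reading of the idea rather than the paper's definitive equation:

```python
import numpy as np

def contract(x):
    """Map an unbounded 3D point into [-2, 2]^3: the identity inside the unit cube
    (infinity norm); otherwise divide by the largest coordinate magnitude and let
    that coordinate itself saturate toward +/-2."""
    x = np.asarray(x, dtype=float)
    n = np.max(np.abs(x))
    if n <= 1.0:
        return x.copy()
    out = x / n                               # project by the largest magnitude...
    j = int(np.argmax(np.abs(x)))             # ...except the dominant coordinate,
    out[j] = (2.0 - 1.0 / np.abs(x[j])) * np.sign(x[j])  # which saturates toward +/-2
    return out
```

Points inside the unit cube are untouched, so nearby geometry keeps full resolution, while distant geometry is squeezed into the shell between the unit cube and [-2, 2]^3.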

An Extensible, Data-Oriented Architecture for High-Performance, Many-World Simulation

Training AI agents to perform complex tasks in simulated worlds requires millions to billions of steps of experience. To achieve high performance, today's fastest simulators for training AI agents adopt the idea of batch simulation: using a single simulation engine to simultaneously step many environments in parallel. We introduce a framework for productively authoring novel training environments (including custom logic for environment generation, environment time stepping, and generating agent observations and rewards) that execute as high-performance, GPU-accelerated batched simulators. Our key observation is that the entity-component-system (ECS) design pattern, popular for expressing CPU-side game logic today, is also well-suited for providing the structure needed for high-performance batched simulators. We contribute the first fully-GPU accelerated ECS implementation that natively supports batch environment simulation. We demonstrate how ECS abstractions impose structure on a training environment's logic and state that allows the system to efficiently manage state, amortize work, and identify GPU-friendly coherent parallel computations within and across different environments. We implement several learning environments in this framework, and demonstrate GPU speedups of two to three orders of magnitude over open source CPU baselines and 5-33× over strong baselines running on a 32-thread CPU. An implementation of the OpenAI hide and seek 3D environment written in our framework, which performs rigid body physics and ray tracing in each simulator step, achieves over 1.9 million environment steps per second on a single GPU.
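The batch-simulation idea is independent of any particular engine: storing each component as one contiguous array over all environments lets a single vectorized "system" step every world at once. A minimal CPU/NumPy sketch of this structure-of-arrays pattern (illustrative only, not the paper's GPU framework):

```python
import numpy as np

# Structure-of-arrays state for N_ENVS environments x N_AGENTS agents: each
# component (position, velocity) is one contiguous array spanning every
# environment, so one vectorized system call steps the whole batch at once.
N_ENVS, N_AGENTS = 1024, 4
rng = np.random.default_rng(0)
pos = np.zeros((N_ENVS, N_AGENTS, 2))
vel = rng.normal(size=(N_ENVS, N_AGENTS, 2))

def physics_system(pos, vel, dt=0.1):
    """One batched time step: integrate positions, bounce off the [-10, 10]^2 box."""
    pos += dt * vel
    hit = np.abs(pos) > 10.0
    vel[hit] *= -1.0
    np.clip(pos, -10.0, 10.0, out=pos)

for _ in range(100):
    physics_system(pos, vel)
```

The same layout is what makes the GPU version coherent: all environments execute the same system over adjacent memory.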

SESSION: Diffusion for Geometry

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

Although the recent rapid evolution of 3D generative neural networks greatly improves 3D shape generation, it is still not convenient for ordinary users to create 3D shapes and control the local geometry of generated shapes. To address these challenges, we propose a diffusion-based 3D generation framework --- locally attentional SDF diffusion --- that models plausible 3D shapes from 2D sketch image input. Our method is built on a two-stage diffusion model. The first stage, named occupancy-diffusion, aims to generate a low-resolution occupancy field to approximate the shape shell. The second stage, named SDF-diffusion, synthesizes a high-resolution signed distance field within the occupied voxels determined by the first stage to extract fine geometry. Our model is empowered by a novel view-aware local attention mechanism for image-conditioned shape generation, which takes advantage of 2D image patch features to guide 3D voxel feature learning, greatly improving local controllability and model generalizability. Through extensive experiments in sketch-conditioned and category-conditioned 3D shape generation tasks, we validate and demonstrate the ability of our method to provide plausible and diverse 3D shapes, as well as its superior controllability and generalizability over existing work.

3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models

We introduce 3DShape2VecSet, a novel shape representation for neural fields designed for generative diffusion models. Our shape representation can encode 3D shapes given as surface models or point clouds, and represents them as neural fields. The concept of neural fields has previously been combined with a global latent vector, a regular grid of latent vectors, or an irregular grid of latent vectors. Our new representation encodes neural fields on top of a set of vectors. We draw from multiple concepts, such as the radial basis function representation, and the cross attention and self-attention function, to design a learnable representation that is especially suitable for processing with transformers. Our results show improved performance in 3D shape encoding and 3D shape generative modeling tasks. We demonstrate a wide variety of generative applications: unconditioned generation, category-conditioned generation, text-conditioned generation, point-cloud completion, and image-conditioned generation. Code:

SESSION: Character Animation

Composite Motion Learning with Task Control

We present a deep learning method for composite and task-driven motion control for physically simulated characters. In contrast to existing data-driven approaches using reinforcement learning that imitate full-body motions, we learn decoupled motions for specific body parts from multiple reference motions simultaneously and directly by leveraging the use of multiple discriminators in a GAN-like setup. In this process, no manual work is needed to produce composite reference motions for learning. Instead, the control policy explores by itself how the composite motions can be combined automatically. We further account for multiple task-specific rewards and train a single, multi-objective control policy. To this end, we propose a novel framework for multi-objective learning that adaptively balances the learning of disparate motions from multiple sources and multiple goal-directed control objectives. In addition, as composite motions are typically augmentations of simpler behaviors, we introduce a sample-efficient method for training composite control policies in an incremental manner, where we reuse a pre-trained policy as the meta policy and train a cooperative policy that adapts the meta policy to new composite tasks. We show the applicability of our approach on a variety of challenging multi-objective tasks involving both composite motion imitation and multiple goal-directed control. Code is available at

Example-based Motion Synthesis via Generative Motion Matching

We present GenMM, a generative model that "mines" as many diverse motions as possible from a single or a few example sequences. In stark contrast to existing data-driven methods, which typically require long offline training time, are prone to visual artifacts, and tend to fail on large and complex skeletons, GenMM inherits the training-free nature and the superior quality of the well-known Motion Matching method. GenMM can synthesize a high-quality motion within a fraction of a second, even with highly complex and large skeletal structures. At the heart of our generative framework lies the generative motion matching module, which utilizes bidirectional visual similarity as a generative cost function for motion matching, and operates in a multi-stage framework to progressively refine a random guess using exemplar motion matches. In addition to diverse motion generation, we show the versatility of our generative framework by extending it to a number of scenarios that are not possible with motion matching alone, including motion completion, key frame-guided generation, infinite looping, and motion reassembly.

Learning Physically Simulated Tennis Skills from Broadcast Videos

We present a system that learns diverse, physically simulated tennis skills from large-scale demonstrations of tennis play harvested from broadcast videos. Our approach is built upon hierarchical models, combining a low-level imitation policy and a high-level motion planning policy to steer the character in a motion embedding learned from broadcast videos. When deployed at scale on large video collections that encompass a vast set of examples of real-world tennis play, our approach can learn complex tennis shotmaking skills and realistically chain together multiple shots into extended rallies, using only simple rewards and without explicit annotations of stroke types. To address the low quality of motions extracted from broadcast videos, we correct estimated motion with physics-based imitation, and use a hybrid control policy that overrides erroneous aspects of the learned motion embedding with corrections predicted by the high-level policy. We demonstrate that our system produces controllers for physically-simulated tennis players that can hit the incoming ball to target positions accurately using a diverse array of strokes (serves, forehands, and backhands), spins (topspins and slices), and playing styles (one/two-handed backhands, left/right-handed play). Overall, our system can synthesize two physically simulated characters playing extended tennis rallies with simulated racket and ball dynamics. Code and data for this work are available at

DOC: Differentiable Optimal Control for Retargeting Motions onto Legged Robots

Legged robots are designed to perform highly dynamic motions. However, it remains challenging for users to retarget expressive motions onto these complex systems. In this paper, we present a Differentiable Optimal Control (DOC) framework that facilitates the transfer of rich motions from either animals or animations onto these robots. Interfacing with either motion capture or animation data, we formulate retargeting objectives whose parameters make them agnostic to differences in proportions and numbers of degrees of freedom between input and robot. Optimizing these parameters over the manifold spanned by optimal state and control trajectories, we minimize the retargeting error. We demonstrate the utility and efficacy of our modeling by applying DOC to a Model-Predictive Control (MPC) formulation, showing retargeting results for a family of robots of varying proportions and mass distribution. With a hardware deployment, we further show that the retargeted motions are physically feasible, while MPC ensures that the robots retain their capability to react to unexpected disturbances.

SESSION: Colorful Topics In Imaging

Image vectorization and editing via linear gradient layer decomposition

A key advantage of vector graphics over raster graphics is their editability. For example, linear gradients define a spatially varying color fill with a few intuitive parameters, which are ubiquitously supported in standard vector graphics formats and libraries. By layering regions filled with linear gradients, complex appearances can be created. We propose an automatic method to convert a raster image into layered regions of linear gradients. Given an input raster image segmented into regions, our approach decomposes the resulting regions into opaque and semi-transparent linear gradient fills. Our approach is fully automatic (e.g., users do not identify a background as in previous approaches) and exhaustively considers all possible decompositions that satisfy perceptual cues. Experiments on a variety of images demonstrate that our method is robust and effective.
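For context, a linear gradient fill in the SVG sense is fully described by two endpoints and two colors; the color at any point is a clamped projection onto the gradient axis. A minimal sketch (our own helper for illustration, not the paper's decomposition code):

```python
import numpy as np

def linear_gradient(p, p0, p1, c0, c1):
    """Color at point p for a linear gradient running from color c0 at p0 to c1 at p1:
    project p onto the gradient axis, clamp the parameter, and interpolate."""
    d = p1 - p0
    t = np.clip(np.dot(p - p0, d) / np.dot(d, d), 0.0, 1.0)
    return (1.0 - t) * c0 + t * c1

p0, p1 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
red, blue = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
mid = linear_gradient(np.array([0.5, 0.7]), p0, p1, red, blue)   # halfway along the axis
```

Layering several such fills with per-layer opacity is the forward model the paper inverts.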

ColorfulCurves: Palette-Aware Lightness Control and Color Editing via Sparse Optimization

Color editing in images often consists of two main tasks: changing hue and saturation, and editing lightness or tone curves. State-of-the-art palette-based recoloring approaches entangle these two tasks. A user's only lightness control is changing the lightness of individual palette colors. This is inferior to state-of-the-art commercial software, where lightness editing is based on flexible tone curves that remap lightness. However, tone curves are only provided globally or per color channel (e.g., RGB). They are unrelated to the image content. Neither tone curves nor palette-based approaches support direct image-space edits---changing a specific pixel to a desired hue, saturation, and lightness. ColorfulCurves solves both of these problems by uniting palette-based and tone curve editing. In ColorfulCurves, users directly edit palette colors' hue and saturation, per-palette tone curves, or image pixels (hue, saturation, and lightness). ColorfulCurves solves an L2,1 optimization problem in real-time to find a sparse edit that satisfies all user constraints. Our expert study found overwhelming support for ColorfulCurves over experts' preferred tools.

Seeing Photons in Color

Megapixel single-photon avalanche diode (SPAD) arrays have been developed recently, opening up the possibility of deploying SPADs as general-purpose passive cameras for photography and computer vision. However, most previous work on SPADs has been limited to monochrome imaging. We propose a computational photography technique that reconstructs high-quality color images from mosaicked binary frames captured by a SPAD array, even for high-dynamic-range (HDR) scenes with complex and rapid motion. Inspired by conventional burst photography approaches, we design algorithms that jointly denoise and demosaick single-photon image sequences. Based on the observation that motion effectively increases the color sample rate, we design a blue-noise pseudorandom RGBW color filter array for SPADs, which is tailored for imaging dark, dynamic scenes. Results on simulated data, as well as real data captured with a fabricated color SPAD hardware prototype, show that the proposed method can reconstruct high-quality images with minimal color artifacts even for challenging low-light, HDR and fast-moving scenes. We hope that this paper, by adding color to computational single-photon imaging, spurs rapid adoption of SPADs for real-world passive imaging applications.

Guided Linear Upsampling

Guided upsampling is an effective approach for accelerating high-resolution image processing. In this paper, we propose a simple yet effective guided upsampling method. Each pixel in the high-resolution image is represented as a linear interpolation of two low-resolution pixels, whose indices and weights are optimized to minimize the upsampling error. The downsampling can be jointly optimized in order to prevent missing small isolated regions. Our method can be derived from the color line model and local color transformations. Compared to previous methods, our method can better preserve detail effects while suppressing artifacts such as bleeding and blurring. It is efficient, easy to implement, and free of sensitive parameters. We evaluate the proposed method with a wide range of image operators, and show its advantages through quantitative and qualitative analysis. We demonstrate the advantages of our method for both interactive image editing and real-time high-resolution video processing. In particular, for interactive editing, the joint optimization can be precomputed, thus allowing for instant feedback without hardware acceleration.
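The core per-pixel model, in which each high-resolution pixel is a weighted blend of two low-resolution pixels chosen to match a guide, can be sketched with a brute-force local search. This naive grayscale version is for illustration only and omits the paper's joint downsampling and efficient optimization:

```python
import numpy as np

def guided_linear_upsample(lo_guide, hi_guide, lo_processed, radius=1):
    """For every high-res pixel, search a small low-res window for two pixels and a
    weight t such that t*a + (1-t)*b best matches the high-res guide value, then
    apply the same (t, indices) to the processed low-res image."""
    H, W = hi_guide.shape
    h, w = lo_guide.shape
    out = np.empty((H, W))
    for y in range(H):
        for x in range(W):
            cy, cx = min(y * h // H, h - 1), min(x * w // W, w - 1)
            g = hi_guide[y, x]
            window = [(i, j)
                      for i in range(max(cy - radius, 0), min(cy + radius + 1, h))
                      for j in range(max(cx - radius, 0), min(cx + radius + 1, w))]
            best = (np.inf, 0.5, (cy, cx), (cy, cx))
            for i in window:
                for j in window:
                    a, b = lo_guide[i], lo_guide[j]
                    t = 0.5 if a == b else float(np.clip((g - b) / (a - b), 0.0, 1.0))
                    err = abs(t * a + (1.0 - t) * b - g)
                    if err < best[0]:
                        best = (err, t, i, j)
            _, t, i, j = best
            out[y, x] = t * lo_processed[i] + (1.0 - t) * lo_processed[j]
    return out

lo = np.array([[0.0, 1.0], [2.0, 3.0]])
hi = np.repeat(np.repeat(lo, 2, axis=0), 2, axis=1)    # 4x4 nearest-neighbor guide
out = guided_linear_upsample(lo, hi, lo)               # reproduces hi exactly here
```

Because the interpolation reuses actual low-resolution values, it avoids the averaging across edges that causes bleeding in bilinear-style upsampling.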

Language-based Photo Color Adjustment for Graphic Designs

Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuitive system that can assist both experts and novices in graphic design. Given a graphic design containing a photo that needs to be recolored, our model can predict the source colors and the target regions, and then recolor the target regions with the source colors based on the given language-based instruction. The multi-granularity of the instruction allows diverse user intentions. The proposed novel task faces several unique challenges, including: 1) color accuracy for recoloring with exactly the same color from the target design element as specified by the user; 2) multi-granularity instructions for parsing instructions correctly to generate a specific result or multiple plausible ones; and 3) locality for recoloring in semantically meaningful local regions to preserve original image semantics. To address these challenges, we propose a model called LangRecol with two main components: the language-based source color prediction module and the semantic-palette-based photo recoloring module. We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training. We evaluate our model via extensive experiments and user studies. We also discuss several practical applications, showing the effectiveness and practicality of our approach. Please find the code and data at

SESSION: Surfaces, Strips, Lights

Differentiable Stripe Patterns for Inverse Design of Structured Surfaces

Stripe patterns are ubiquitous in nature and everyday life. While the synthesis of these patterns has been thoroughly studied in the literature, their potential to control the mechanics of structured materials remains largely unexplored. In this work, we introduce Differentiable Stripe Patterns---a computational approach for automated design of physical surfaces structured with stripe-shaped bi-material distributions. Our method builds on the work by Knöppel and colleagues [2015] for generating globally-continuous and equally-spaced stripe patterns. To unlock the full potential of this design space, we propose a gradient-based optimization tool to automatically compute stripe patterns that best approximate macromechanical performance goals. Specifically, we propose a computational model that combines solid shell finite elements with XFEM for accurate and fully-differentiable modeling of elastic bi-material surfaces. To resolve non-uniqueness problems in the original method, we furthermore propose a robust formulation that yields unique and differentiable stripe patterns. We combine these components with equilibrium state derivatives into an end-to-end differentiable pipeline that enables inverse design of mechanical stripe patterns. We demonstrate our method on a diverse set of examples that illustrate the potential of stripe patterns as a design space for structured materials. Our simulation results are experimentally validated on physical prototypes.

Deployable strip structures

We introduce the new concept of C-mesh to capture kinetic structures that can be deployed from a collapsed state. Quadrilateral C-meshes enjoy rich geometry and surprising relations with differential geometry: A structure that collapses onto a flat and straight strip corresponds to a Chebyshev net of curves on a surface of constant Gaussian curvature, while structures collapsing onto a circular strip follow surfaces which enjoy the linear-Weingarten property. Interestingly, allowing more general collapses actually leads to a smaller class of shapes. Hexagonal C-meshes have more degrees of freedom, but a local analysis suggests that there is no such direct relation to smooth surfaces. Besides theory, this paper provides tools for exploring the shape space of C-meshes and for their design. We also present an application for freeform architectural skins, namely paneling with spherical panels of constant radius, which is an important fabrication-related constraint.

B-rep Matching for Collaborating Across CAD Systems

Large Computer-Aided Design (CAD) projects usually require collaboration across many different CAD systems as well as applications that interoperate with them for manufacturing, visualization, or simulation. A fundamental barrier to such collaborations is the ability to refer to parts of the geometry (such as a specific face) robustly under geometric and/or topological changes to the model. Persistent referencing schemes are a fundamental aspect of most CAD tools, but models that are shared across systems cannot generally make use of these internal referencing mechanisms, creating a challenge for collaboration. In this work, we address this issue by developing a novel learning-based algorithm that can automatically find correspondences between two CAD models using the standard representation used for sharing models across CAD systems: the Boundary-Representation (B-rep). Because our method works directly on B-reps it can be generalized across different CAD applications enabling collaboration.

∇-Prox: Differentiable Proximal Algorithm Modeling for Large-Scale Optimization

Tasks across diverse application domains can be posed as large-scale optimization problems; these include graphics, vision, machine learning, imaging, health, scheduling, planning, and energy system forecasting. Independently of the application domain, proximal algorithms have emerged as a formal optimization method that successfully solves a wide array of existing problems, often exploiting problem-specific structures in the optimization. Although model-based formal optimization provides a principled approach to problem modeling with convergence guarantees, at first glance, this seems to be at odds with black-box deep learning methods. A recent line of work shows that, when combined with learning-based ingredients, model-based optimization methods are effective, interpretable, and allow for generalization to a wide spectrum of applications with little or no extra training data. However, experimenting with such hybrid approaches for different tasks by hand requires domain expertise in both proximal optimization and deep learning, which is often error-prone and time-consuming. Moreover, naively unrolling these iterative methods produces lengthy compute graphs, which when differentiated via autograd techniques results in exploding memory consumption, making batch-based training challenging. In this work, we introduce ∇-Prox, a domain-specific modeling language and compiler for large-scale optimization problems using differentiable proximal algorithms. ∇-Prox allows users to specify optimization objective functions of unknowns concisely at a high level, and intelligently compiles the problem into compute and memory-efficient differentiable solvers. One of the core features of ∇-Prox is its full differentiability, which supports hybrid model- and learning-based solvers integrating proximal optimization with neural network pipelines.
Example applications of this methodology include learning-based priors and/or sample-dependent inner-loop optimization schedulers, learned with deep equilibrium learning or deep reinforcement learning. With a few lines of code, we show ∇-Prox can generate performant solvers for a range of image optimization problems, including end-to-end computational optics, image deraining, and compressive magnetic resonance imaging. We also demonstrate ∇-Prox can be used in a completely orthogonal application domain of energy system planning, an essential task in the energy crisis and the clean energy transition, where it outperforms state-of-the-art CVXPY and commercial Gurobi solvers.
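To make the proximal-algorithm machinery concrete, here is the classic proximal-gradient method (ISTA) for an l1-regularized least-squares problem. This is the textbook building block that systems like ∇-Prox compile and differentiate through, not ∇-Prox's own API:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=1000):
    """Proximal gradient (ISTA) for min_x 0.5*||Ax - b||^2 + lam*||x||_1:
    alternate a gradient step on the smooth term with the l1 proximal step."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
x_hat = ista(A, A @ x_true, lam=0.05)        # recovers the sparse signal
```

Unrolling the loop for a fixed n_iter yields the long compute graph the abstract refers to; a compiler can instead exploit the repeated structure.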

SESSION: Most Def: Fast, Large and Learned Deformables

Fast Complementary Dynamics via Skinning Eigenmodes

We propose a reduced-space elastodynamic solver that is well suited for augmenting rigged character animations with secondary motion. At the core of our method is a novel deformation subspace based on Linear Blend Skinning that overcomes many of the shortcomings prior subspace methods face. Our skinning subspace is parameterized entirely by a set of scalar weights, which we can obtain through a small, material-aware and rig-sensitive generalized eigenvalue problem. The resulting subspace can easily capture rotational motion and guarantees that the resulting simulation is rotation equivariant. We further propose a simple local-global solver for linear co-rotational elasticity and propose a clustering method to aggregate per-tetrahedra nonlinear energetic quantities. The result is a compact simulation that is fully decoupled from the complexity of the mesh.
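The mechanics of obtaining subspace weights from a generalized eigenvalue problem can be sketched densely with NumPy. This toy version uses a plain graph Laplacian and a diagonal mass matrix; it is not the paper's material-aware, rig-sensitive construction:

```python
import numpy as np

def skinning_eigenmodes(L, M, k):
    """Smallest-k solutions of the generalized eigenproblem L w = lam M w, for a
    stiffness-like matrix L and a diagonal (lumped) mass matrix M; each column of
    the returned W is one set of per-vertex weights."""
    Mi = np.diag(1.0 / np.sqrt(np.diag(M)))   # M^(-1/2), valid for diagonal M
    lam, V = np.linalg.eigh(Mi @ L @ Mi)      # eigh sorts eigenvalues ascending
    return lam[:k], Mi @ V[:, :k]

# Toy example: graph Laplacian of a 5-vertex path with unit masses.
n = 5
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0
lam, W = skinning_eigenmodes(L, np.eye(n), 3)
```

The smallest mode is the constant (rigid) weight field; higher modes add progressively finer spatial variation, which is what gives the subspace its expressiveness.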

Motion from Shape Change

We consider motion effected by shape change. Such motions are ubiquitous in nature and the human-made environment, ranging from single cells to platform divers and jellyfish. The shapes may be immersed in various media ranging from the very viscous to air and nearly inviscid fluids. In the absence of external forces these settings are characterized by constant momentum. We exploit this in an algorithm which takes a sequence of changing shapes, say, as modeled by an animator, as input and produces corresponding motion in world coordinates. Our method is based on the geometry of shape change and an appropriate variational principle. The corresponding Euler-Lagrange equations are first order ODEs in the unknown rotations and translations and the resulting time stepping algorithm applies to all these settings without modification as we demonstrate with a broad set of examples.
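The simplest instance of the conservation principle is linear momentum: given per-vertex masses and shape-space velocities, the world-frame translational velocity is whatever cancels the shape change's momentum. A sketch covering linear momentum only (the paper's variational formulation also handles the rotational part):

```python
import numpy as np

def zero_momentum_translation(masses, shape_velocities):
    """Translational velocity of the body frame that exactly cancels the linear
    momentum of a prescribed shape change, keeping total momentum at zero."""
    return -(masses[:, None] * shape_velocities).sum(axis=0) / masses.sum()

rng = np.random.default_rng(2)
m = rng.uniform(0.5, 2.0, size=8)             # per-vertex masses
v_shape = rng.normal(size=(8, 3))             # animator-prescribed shape velocities
v_frame = zero_momentum_translation(m, v_shape)
momentum = (m[:, None] * (v_shape + v_frame)).sum(axis=0)   # should vanish
```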

Second-order Stencil Descent for Interior-point Hyperelasticity

In this paper, we present a GPU algorithm for finite element hyperelastic simulation. We show that the interior-point method, known to be effective for robust collision resolution, can be coupled with non-Newton procedures and be massively sped up on the GPU. Newton's method has been widely chosen for the interior-point family, which fully solves a linear system at each step. After that, the active set associated with collision/contact constraints is updated. Mimicking this routine using a non-Newton optimization (like gradient descent or ADMM) unfortunately does not deliver expected accelerations. This is because the barrier functions employed in an interior-point method need to be updated at every iteration to strictly confine the search to the feasible region. The associated cost (e.g., per-iteration CCD) quickly outweighs the benefit brought by the GPU, and a new parallelism modality is needed. Our algorithm is inspired by the domain decomposition method and designed to move interior-point-related computations to local domains as much as possible. We minimize the size of each domain (i.e., a stencil) by restricting it to a single element, so as to fully exploit the capacity of modern GPUs. The stencil-level results are integrated into a global update using a novel hybrid sweep scheme. Our algorithm is locally second-order, offering better convergence. It enables simulation acceleration of up to two orders of magnitude over its CPU counterpart. We demonstrate the scalability, robustness, efficiency, and quality of our algorithm in a variety of simulation scenarios with complex and detailed collision geometries.

SESSION: Material Rendering

A Full-Wave Reference Simulator for Computing Surface Reflectance

Computing light reflection from rough surfaces is an important topic in computer graphics. Reflection models developed based on geometric optics fail to capture wave effects such as diffraction and interference, while existing models based on physical optics approximations give erroneous predictions under many circumstances (e.g. when multiple scattering from the surface cannot be ignored). We present a scalable 3D full-wave simulator for computing reference solutions to surface scattering problems, which can be used to evaluate and guide the development of approximate models for rendering. We investigate the range of validity for some existing wave optics based reflection models; our results confirm these models for low-roughness surfaces but also show that prior rendering methods do not accurately predict the scattering behavior of some types of surfaces.

Our simulator is based on the boundary element method (BEM) and accelerated using the adaptive integral method (AIM), and is implemented to execute on modern GPUs. We demonstrate the simulator on domains up to 60 × 60 × 10 wavelengths, involving surface samples with significant height variations. Furthermore, we propose a new system for efficiently computing BRDF values for large numbers of incident and outgoing directions at once, by combining small simulations to characterize larger areas. Our simulator will be released as an open-source toolkit for computing surface scattering.

Pyramid Texture Filtering

We present a simple but effective technique to smooth out textures while preserving the prominent structures. Our method is built upon a key observation---the coarsest level in a Gaussian pyramid often naturally eliminates textures and summarizes the main image structures. This inspires our central idea for texture filtering, which is to progressively upsample the very low-resolution coarsest Gaussian pyramid level to a full-resolution texture smoothing result with well-preserved structures, under the guidance of each fine-scale Gaussian pyramid level and its associated Laplacian pyramid level. We show that our approach is effective at separating structure from texture of different scales, local contrasts, and forms, without degrading structures or introducing visual artifacts. We also demonstrate the applicability of our method on various applications including detail enhancement, image abstraction, HDR tone mapping, inverse halftoning, and LDR image enhancement. Code is available at
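The key observation, that coarse Gaussian pyramid levels discard texture while retaining structure, is easy to reproduce. The sketch below is an illustrative minimal setup (not the paper's guided upsampling scheme): it builds a Gaussian pyramid with the standard 5-tap binomial kernel and shows that a fine checkerboard texture vanishes at the coarsest level while a smooth ramp survives.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # 5-tap binomial filter

def blur1d(img, axis):
    """Convolve one axis with the 5-tap kernel, replicating edges."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (2, 2)
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for i, w in enumerate(KERNEL):
        sl = [slice(None), slice(None)]
        sl[axis] = slice(i, i + img.shape[axis])
        out += w * p[tuple(sl)]
    return out

def gaussian_pyramid(img, levels):
    """Repeatedly blur and halve; pyr[-1] is the coarsest level."""
    pyr = [img]
    for _ in range(levels - 1):
        blurred = blur1d(blur1d(pyr[-1], 0), 1)
        pyr.append(blurred[::2, ::2])
    return pyr

# Structure (horizontal ramp) plus fine texture (checkerboard)
ramp = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
ij = np.indices((64, 64)).sum(axis=0)
checker = 0.2 * (2.0 * (ij % 2) - 1.0)
coarse = gaussian_pyramid(ramp + checker, 4)[-1]
```

At the coarsest 8 x 8 level the checkerboard has been averaged away (the 5-tap kernel annihilates a period-2 pattern exactly in the interior), while the ramp's left-to-right gradient remains. That structure/texture separation is what the method's progressive guided upsampling then transports back to full resolution.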

SESSION: Surface Reconstruction

Globally Consistent Normal Orientation for Point Clouds by Regularizing the Winding-Number Field

Estimating normals with globally consistent orientations for a raw point cloud has many downstream geometry processing applications. Despite tremendous efforts in the past decades, it remains challenging to deal with an unoriented point cloud with various imperfections, particularly in the presence of data sparsity coupled with nearby gaps or thin-walled structures. In this paper, we propose a smooth objective function to characterize the requirements of an acceptable winding-number field, which allows one to find the globally consistent normal orientations starting from a set of completely random normals. By taking the vertices of the Voronoi diagram of the point cloud as examination points, we consider the following three requirements: (1) the winding number is either 0 or 1, (2) the occurrences of 1 and the occurrences of 0 are balanced around the point cloud, and (3) the normals align with the outside Voronoi poles as much as possible. Extensive experimental results show that our method outperforms the existing approaches, especially in handling sparse and noisy point clouds, as well as shapes with complex geometry/topology.
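As background, the winding-number field regularized above is commonly evaluated with the point-based generalized winding number (in the style of Barill et al. [2018]): each oriented point contributes the signed solid angle of a small disk, w(q) = Σᵢ aᵢ (pᵢ − q)·nᵢ / (4π‖pᵢ − q‖³). The sketch below is an illustrative implementation of that background formula, not the paper's method; it shows the field taking value 1 inside and 0 outside a consistently oriented sphere sample.

```python
import numpy as np

def winding_number(points, normals, areas, q):
    """Generalized winding number of an oriented point cloud at query q.
    Each point p_i contributes the solid angle of a small disk of area
    a_i with unit normal n_i."""
    d = points - q
    r3 = np.linalg.norm(d, axis=1) ** 3
    return float(np.sum(areas * np.einsum("ij,ij->i", d, normals)
                        / (4.0 * np.pi * r3)))

def fibonacci_sphere(n):
    """Roughly uniform sample of the unit sphere."""
    i = np.arange(n)
    z = 1.0 - 2.0 * (i + 0.5) / n
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

pts = fibonacci_sphere(2000)
nrm = pts.copy()                          # outward normals of the unit sphere
area = np.full(2000, 4.0 * np.pi / 2000)  # equal-area weights
```

With consistent outward normals the field is near 1 at the center and near 0 far outside; flipping a subset of normals pushes the field away from {0, 1}, which is exactly the behavior the paper's requirement (1) penalizes when searching for globally consistent orientations.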

Locally Meshable Frame Fields

The main robustness issue of state-of-the-art frame field based hexahedral mesh generation algorithms originates from non-meshable topological configurations, which do not admit the construction of an integer-grid map but frequently occur in smooth frame fields. In this article, we investigate the topology of frame fields and derive conditions on their meshability, which are the basis for a novel algorithm to automatically turn a given non-meshable frame field into a similar but locally meshable one. Although local meshability is only a necessary, not a sufficient, condition for the stronger requirement of meshability, our algorithm increases the success rate of generating valid integer-grid maps from the 2% achieved by state-of-the-art methods to 58% when compared on the challenging HexMe dataset [Beaufort et al. 2022]. The source code of our implementation and the data of our experiments are available at

SESSION: Neural Capturing

Neural Volumetric Reconstruction for Coherent Synthetic Aperture Sonar

Synthetic aperture sonar (SAS) measures a scene from multiple views in order to increase the resolution of reconstructed imagery. Image reconstruction methods for SAS coherently combine measurements to focus acoustic energy onto the scene. However, image formation is typically under-constrained due to a limited number of measurements and bandlimited hardware, which limits the capabilities of existing reconstruction methods. To help meet these challenges, we design an analysis-by-synthesis optimization that leverages recent advances in neural rendering to perform coherent SAS imaging. Our optimization enables us to incorporate physics-based constraints and scene priors into the image formation process. We validate our method on simulation and experimental results captured in both air and water. We demonstrate both quantitatively and qualitatively that our method typically produces reconstructions superior to those of existing approaches. We share code and data for reproducibility.

NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

We present a neural rendering-based method called NeRO for reconstructing the geometry and the BRDF of reflective objects from multiview images captured in an unknown environment. Multiview reconstruction of reflective objects is extremely challenging because specular reflections are view-dependent and thus violate the multiview consistency, which is the cornerstone for most multiview reconstruction methods. Recent neural rendering techniques can model the interaction between environment lights and the object surfaces to fit the view-dependent reflections, thus making it possible to reconstruct reflective objects from multiview images. However, accurately modeling environment lights in the neural rendering is intractable, especially when the geometry is unknown. Most existing neural rendering methods, which can model environment lights, only consider direct lights and rely on object masks to reconstruct objects with weak specular reflections. Therefore, these methods fail to reconstruct reflective objects, especially when the object mask is not available and the object is illuminated by indirect lights. We propose a two-step approach to tackle this problem. First, by applying the split-sum approximation and the integrated directional encoding to approximate the shading effects of both direct and indirect lights, we are able to accurately reconstruct the geometry of reflective objects without any object masks. Then, with the object geometry fixed, we use more accurate sampling to recover the environment lights and the BRDF of the object. Extensive experiments demonstrate that our method is capable of accurately reconstructing the geometry and the BRDF of reflective objects from only posed RGB images without knowing the environment lights and the object masks. Codes and datasets are available at

SESSION: Fabrication-Oriented Design

Masonry Shell Structures with Discrete Equivalence Classes

This paper proposes a method to model masonry shell structures where the shell elements fall into a set of discrete equivalence classes. Such shell structures can reduce the fabrication cost and simplify the physical construction thanks to the reuse of a few template shell elements. Given a freeform surface, our goal is to generate a small set of template shell elements that can be reused to produce a seamless and buildable structure that closely resembles the surface. The major technical challenge in this process is balancing the desire for high reusability of template elements with the need for a seamless and buildable final structure. To address the challenge, we define three error metrics to measure the seamlessness and buildability of shell structures made from discrete equivalence classes and develop a hierarchical cluster-and-optimize approach to generate a small set of template elements that produce a structure closely approximating the surface with low error metrics. We demonstrate the feasibility of our approach on various freeform surfaces and geometric patterns, and validate the buildability of our results with four physical prototypes. Code and data of this paper are at

Generative Design of Sheet Metal Structures

Sheet Metal (SM) fabrication is perhaps one of the most common metalworking techniques.

Despite its prevalence, SM design is manual and costly, with rigorous practices that restrict the search space, yielding suboptimal results.

In contrast, we present the first framework for automatic design of SM parts. Focusing on load-bearing applications, our novel system generates a high-performing, manufacturable SM part that adheres to the numerous constraints that SM design entails:

The resulting part minimizes manufacturing costs while adhering to structural, spatial, and manufacturing constraints. In other words, the part should be strong enough, not disturb its environment, and suit the manufacturing process. Together, these desiderata define an elaborate, sparse, and expensive-to-explore search space.

Our generative approach is a carefully designed exploration process comprising two steps: in Segment Discovery, connections from the input load to attachable regions are accumulated, and in Segment Composition, the best-performing valid combination is sought.

For Discovery, we define a slim grammar and sample parts from it using a Markov-Chain Monte Carlo (MCMC) approach, run in intercommunicating instances (i.e., chains) for diversity. This, followed by a short continuous optimization, enables building a diverse and high-quality library of substructures. During Composition, a valid and minimal-cost combination of the curated substructures is selected. To improve compliance significantly without additional manufacturing costs, we reinforce candidate parts onto themselves --- a unique SM capability called self-riveting. We provide our code and data in

We show our generative approach produces viable parts for numerous scenarios. We compare our system against a human expert and observe improvements in both part quality and design time. We further analyze our pipeline's steps with respect to resulting quality, and have fabricated some results for validation.

We hope our system will advance the field of SM design, replacing costly expert hours with minutes of standard CPU time, and make this cheap and reliable manufacturing method accessible to anyone.

Inkjet 4D Print: Self-folding Tessellated Origami Objects by Inkjet UV Printing

We propose Inkjet 4D Print, a self-folding fabrication method of 3D origami tessellations by printing 2D patterns on both sides of a heat-shrinkable base sheet, using a commercial inkjet ultraviolet (UV) printer. Compared to the previous folding-based 4D printing approach using fused deposition modeling (FDM) 3D printers [An et al. 2018], our method has merits in (1) more than 1200 times higher resolution in terms of the number of self-foldable facets, (2) 2.8 times faster printing speed, and (3) optional full-color decoration. This paper describes the material selection, the folding mechanism, the heating condition, and the printing patterns to self-fold both known and freeform tessellations. We also evaluated the self-folding resolution, the printing and transformation speed, and the shape accuracy of our method. Finally, we demonstrated applications enabled by our self-foldable tessellated objects.

SESSION: All About Meshes

Surface Simplification using Intrinsic Error Metrics

This paper describes a method for fast simplification of surface meshes. Whereas past methods focus on visual appearance, our goal is to solve equations on the surface. Hence, rather than approximate the extrinsic geometry, we construct a coarse intrinsic triangulation of the input domain. In the spirit of the quadric error metric (QEM), we perform greedy decimation while agglomerating global information about approximation error. In lieu of extrinsic quadrics, however, we store intrinsic tangent vectors that track how far curvature "drifts" during simplification. This process also yields a bijective map between the fine and coarse mesh, and prolongation operators for both scalar- and vector-valued data. Moreover, we obtain hard guarantees on element quality via intrinsic retriangulation---a feature unique to the intrinsic setting. The overall payoff is a "black box" approach to geometry processing, which decouples mesh resolution from the size of matrices used to solve equations. We show how our method benefits several fundamental tasks, including geometric multigrid, all-pairs geodesic distance, mean curvature flow, geodesic Voronoi diagrams, and the discrete exponential map.
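For context, the extrinsic quadric error metric that this paper reworks intrinsically accumulates, per vertex, squared distances to the supporting planes of nearby triangles; an edge contraction then places the new vertex at the minimizer of the summed quadric. Below is a minimal sketch of that classic machinery (Garland and Heckbert's formulation, given here as background, not the intrinsic variant introduced in the paper):

```python
import numpy as np

def plane_quadric(a, b, c, d):
    """Fundamental quadric K = p p^T for the plane ax+by+cz+d=0,
    with (a,b,c) a unit normal; v^T K v is squared plane distance."""
    p = np.array([a, b, c, d], dtype=float)
    return np.outer(p, p)

def quadric_error(Q, v):
    """Sum of squared distances from v to all planes folded into Q."""
    vh = np.append(v, 1.0)              # homogeneous coordinates
    return float(vh @ Q @ vh)

def optimal_position(Q):
    """Position minimizing the quadric error: solve the standard
    homogeneous system with the last row replaced by [0,0,0,1]."""
    A = Q.copy()
    A[3] = [0.0, 0.0, 0.0, 1.0]
    return np.linalg.solve(A, [0.0, 0.0, 0.0, 1.0])[:3]

# Three orthogonal planes x=0, y=1, z=2 intersect at (0, 1, 2).
Q = (plane_quadric(1, 0, 0, 0)
     + plane_quadric(0, 1, 0, -1)
     + plane_quadric(0, 0, 1, -2))
```

The intrinsic method replaces these 4 x 4 extrinsic quadrics with tangent-vector data that tracks how curvature drifts, but the greedy decimate-while-accumulating loop is the same.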

Robust Low-Poly Meshing for General 3D Models

We propose a robust re-meshing approach that can automatically generate visual-preserving low-poly meshes for any high-poly models found in the wild. Our method can be seamlessly integrated into current mesh-based 3D asset production pipelines. Given an input high-poly model, our method proceeds in two stages: 1) robustly extracting an offset surface mesh that is feature-preserving and guaranteed to be watertight, manifold, and self-intersection free; 2) progressively simplifying and flowing the offset mesh to bring it close to the input. The simplicity and the visual preservation of the generated low-poly are controlled by a user-specified target screen size of the input: decreasing the screen size reduces the element count of the low-poly but enlarges its visual difference from the input. We have evaluated our method on a subset of the Thingi10K dataset that contains models created by practitioners in different domains, with varying topological and geometric complexities. Compared to state-of-the-art approaches and widely used software, our method demonstrates its superiority in terms of the element count, visual preservation, and geometry and topology guarantees of the generated low-polys.

Evolutionary Piecewise Developable Approximations

We propose a novel method to compute high-quality piecewise developable approximations for triangular meshes. Central to our approach is an evolutionary genetic algorithm for optimizing the combinatorial and discontinuous fitness function, including the approximation error, the number of patches, the patch boundary length, and the penalty for small patches and narrow regions within patches. The genetic algorithm's operations (i.e., initialization, selection, mutation, and crossover) are explicitly designed to minimize the fitness function.

The main challenge is evaluating the fitness function's approximation error, as it requires developable patches, which are difficult or time-consuming to obtain. We resolve this challenge based on a critical observation: empirically, the approximation error and the mapping distortion between an input surface and its developable approximation are positively correlated. To efficiently measure distortion without explicitly generating developable shapes, we use conformal mapping techniques. We then keep the mapping distortion at a relatively low level to achieve high shape similarity in the genetic algorithm.

The feasibility and effectiveness of our method are demonstrated over 240 complex examples. Compared with the state-of-the-art methods, our results have much smaller approximation errors, fewer patches, shorter patch boundaries, and fewer small patches and narrow regions.
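The selection/mutation/crossover machinery itself is standard; what is novel above is the fitness design and the conformal-map proxy. For readers unfamiliar with that machinery, a minimal genetic algorithm over bit strings looks like the following (a generic toy with a deliberately discontinuous fitness; nothing here is specific to developable segmentation):

```python
import random

def genetic_minimize(fitness, n_bits, pop_size=40, generations=200,
                     mutation_rate=0.02, seed=0):
    """Minimal GA: tournament selection, one-point crossover,
    bit-flip mutation, and elitism (the best individual survives)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]                                  # elitism
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=fitness)     # tournament selection
            b = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate)  # bit-flip mutation
                     for bit in child]
            nxt.append(child)
        pop = nxt
        best = min(pop, key=fitness)
    return best

# Toy discontinuous fitness: Hamming distance to a hidden target pattern.
target = [1, 0] * 16

def hamming(s):
    return sum(x != y for x, y in zip(s, target))
```

Elitism plus tournament selection makes progress monotone even though the fitness is discontinuous, which is why such schemes tolerate objectives built from patch counts, boundary lengths, and penalties.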

Micro-Mesh Construction

Micro-meshes (μ-meshes) are a new structured graphics primitive, consisting of a base mesh enriched by a displacement map, that supports a large increase in geometric fidelity without commensurate memory and run-time processing costs. A new generation of GPUs supports this structure with native hardware μ-mesh ray tracing, which leverages a self-bounding, compressed displacement mapping scheme to achieve these efficiencies.

In this paper, we present an automatic method to convert an existing multi-million triangle mesh into this compact format, unlocking the advantages of the data representation for a large number of scenarios. We identify the requirements for high-quality μ-meshes, and show how existing re-meshing and displacement-map baking tools are ill-suited for their generation. Our method is based on a simplification scheme tailored to the generation of high-quality base meshes, optimized for tessellation and displacement sampling, in conjunction with algorithms for determining displacement vectors to control the direction and range of displacements. We also explore the optimization of μ-meshes for texture maps and the representation of boundaries.

We demonstrate our method with extensive batch processing, converting an existing collection of high-resolution scanned models to the micro-mesh representation, providing an open-source reference implementation, and, as additional material, the data and an inspection tool.

SESSION: Going With the Flow

A Contact Proxy Splitting Method for Lagrangian Solid-Fluid Coupling

We present a robust and efficient method for simulating Lagrangian solid-fluid coupling based on a new operator splitting strategy. We use variational formulations to approximate fluid properties and solid-fluid interactions, and introduce a unified two-way coupling formulation for SPH fluids and FEM solids using interior point barrier-based frictional contact. We split the resulting optimization problem into a fluid phase and a solid-coupling phase using a novel time-splitting approach with augmented contact proxies, and propose efficient custom linear solvers. Our technique accounts for fluids interaction with nonlinear hyperelastic objects of different geometries and codimensions, while maintaining an algorithmically guaranteed non-penetrating criterion. Comprehensive benchmarks and experiments demonstrate the efficacy of our method.

Fluid-Solid Coupling in Kinetic Two-Phase Flow Simulation

Real-life flows exhibit complex and visually appealing behaviors such as bubbling, splashing, glugging and wetting that simulation techniques in graphics have attempted to capture for years. While early approaches were not capable of reproducing multiphase flow phenomena due to their excessive numerical viscosity and low accuracy, kinetic solvers based on the lattice Boltzmann method have recently demonstrated the ability to simulate water-air interaction at high Reynolds numbers in a massively-parallel fashion. However, robust and accurate handling of fluid-solid coupling has remained elusive: be it for CG or CFD solvers, as soon as the motion of immersed objects is too fast or too sudden, pressures near boundaries and interfacial forces exhibit spurious oscillations leading to blowups. Built upon a phase-field and velocity-distribution based lattice-Boltzmann solver for multiphase flows, this paper spells out a series of numerical improvements in momentum exchange, interfacial forces, and two-way coupling to drastically reduce these typical artifacts, thus significantly expanding the types of fluid-solid coupling that we can efficiently simulate. We highlight the numerical benefits of our solver through various challenging simulation results, including comparisons to previous work and real footage.

PolyStokes: A Polynomial Model Reduction Method for Viscous Fluid Simulation

Standard liquid simulators apply operator splitting to independently solve for pressure and viscous stresses, a decoupling that induces incorrect free surface boundary conditions. Such methods are unable to simulate fluid phenomena reliant on the balance of pressure and viscous stresses, such as the liquid rope coil instability exhibited by honey. By contrast, unsteady Stokes solvers retain the coupling between pressure and viscosity, thus resolving these phenomena, albeit with a much larger and thus more computationally expensive linear system compared to the decoupled approach. To accelerate solving the unsteady Stokes problem, we propose a reduced fluid model wherein interior regions are represented with incompressible polynomial vector fields. Sets of standard grid cells are consolidated into super-cells, each of which is modelled using a quadratic field with 26 degrees of freedom. We demonstrate that the reduced field must necessarily be at least quadratic, with the affine model being unable to correctly capture viscous forces. We reproduce the liquid rope coiling instability, as well as other simulated examples, to show that our reduced model reproduces the same fluid phenomena at a smaller computational cost. Furthermore, we performed a crowdsourced user survey to verify that our method produces imperceptible differences compared to the full unsteady Stokes method.
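The dimension count in the abstract, 26 degrees of freedom per quadratic super-cell, can be checked directly: a quadratic vector field in 3D has 3 x 10 = 30 monomial coefficients, and requiring the divergence (a linear polynomial) to vanish identically removes 4 of them. An illustrative numerical verification of that count:

```python
import numpy as np

# Quadratic monomial basis in 3D: 1, x, y, z, x^2, y^2, z^2, xy, xz, yz
MONOMIALS = 10
# Derivative of each monomial w.r.t. x/y/z, expressed in the linear
# basis [1, x, y, z]: DERIV[axis][monomial] = (linear basis index, coeff)
DERIV = {
    0: {1: (0, 1), 4: (1, 2), 7: (2, 1), 8: (3, 1)},   # d/dx
    1: {2: (0, 1), 5: (2, 2), 7: (1, 1), 9: (3, 1)},   # d/dy
    2: {3: (0, 1), 6: (3, 2), 8: (1, 1), 9: (2, 1)},   # d/dz
}

# Build the 4 x 30 constraint matrix enforcing div(u) = 0 as a
# polynomial identity over the components (u, v, w).
A = np.zeros((4, 3 * MONOMIALS))
for comp in range(3):
    for mono, (row, coeff) in DERIV[comp].items():
        A[row, comp * MONOMIALS + mono] = coeff

rank = np.linalg.matrix_rank(A)
dof = 3 * MONOMIALS - rank   # dimension of incompressible quadratic fields
```

The same counting also illustrates the abstract's point about the affine model: an affine field has identically zero Laplacian, so viscous forces vanish on it, which is why the reduced field must be at least quadratic.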

Building a Virtual Weakly-Compressible Wind Tunnel Testing Facility

Virtual wind tunnel testing is a key ingredient in the engineering design process for the automotive and aeronautical industries as well as for urban planning: through visualization and analysis of the simulation data, it helps optimize lift and drag coefficients, increase peak speed, detect high pressure zones, and reduce wind noise at low cost prior to manufacturing. In this paper, we develop an efficient and accurate virtual wind tunnel system based on recent contributions from both computer graphics and computational fluid dynamics in high-performance kinetic solvers. Running on one or multiple GPUs, our massively-parallel lattice Boltzmann model meets industry standards for accuracy and consistency while exceeding current mainstream industrial solutions in terms of efficiency --- especially for unsteady turbulent flow simulation at very high Reynolds number (on the order of 10⁷) --- due to key contributions in improved collision modeling and boundary treatment, automatic construction of multiresolution grids for complex models, as well as performance optimization. We demonstrate the efficacy and reliability of our virtual wind tunnel testing facility through comparisons of our results to multiple benchmark tests, showing an increase in both accuracy and efficiency compared to state-of-the-art industrial solutions. We also illustrate the fine turbulence structures that our system can capture, indicating the relevance of our solver for both VFX and industrial product design.
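Both abstracts above build on the lattice Boltzmann method, which evolves per-cell velocity distributions through alternating collision and streaming steps. For readers new to it, here is a textbook single-phase D2Q9 BGK step with periodic boundaries (background only; the papers' multiphase, coupling, and turbulence models add far more):

```python
import numpy as np

# D2Q9 lattice: discrete velocities and their weights
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, u):
    """Second-order truncated Maxwell-Boltzmann equilibrium."""
    cu = np.einsum("qd,xyd->qxy", C, u)          # c_i . u per cell
    uu = np.einsum("xyd,xyd->xy", u, u)          # |u|^2 per cell
    return W[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*uu)

def lbm_step(f, tau=0.6):
    """One BGK collide-and-stream update with periodic boundaries."""
    rho = f.sum(axis=0)                          # density moment
    u = np.einsum("qd,qxy->xyd", C, f) / rho[..., None]  # velocity moment
    f = f - (f - equilibrium(rho, u)) / tau      # collision: relax to f_eq
    for q in range(9):                           # streaming along c_q
        f[q] = np.roll(f[q], shift=C[q], axis=(0, 1))
    return f
```

Collision conserves mass and momentum cell by cell and streaming merely moves populations, so the totals are preserved exactly; that discrete conservation structure is what makes the method attractive for the massively-parallel GPU solvers described above.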

Fluid Cohomology

The vorticity-streamfunction formulation for incompressible inviscid fluids is the basis for many fluid simulation methods in computer graphics, including vortex methods, streamfunction solvers, spectral methods, and Monte Carlo methods. We point out that current setups in the vorticity-streamfunction formulation are insufficient for simulating fluids on general non-simply-connected domains. This issue is critical in practice, as obstacles, periodic boundaries, and nonzero genus can all make the fluid domain multiply connected. These scenarios introduce nontrivial cohomology components to the flow in the form of harmonic fields. The dynamics of these harmonic fields have been previously overlooked. In this paper, we derive the missing equations of motion for the fluid cohomology components. We elucidate the physical laws associated with the new equations, and show their importance in reproducing physically correct behaviors of fluid flows on domains with general topology.

Improved Water Sound Synthesis using Coupled Bubbles

We introduce a practical framework for synthesizing bubble-based water sounds that captures the rich inter-bubble coupling effects responsible for low-frequency acoustic emissions from bubble clouds. We propose coupled-bubble oscillator models with regularized singularities, and techniques to reduce the computational cost of time stepping with dense, time-varying mass matrices. Airborne acoustic emissions are estimated using finite-difference time-domain (FDTD) methods. We propose a simple, analytical surface-acceleration model, and a sample-and-hold GPU wavesolver that is simpler and faster than prior CPU wavesolvers.

Sound synthesis results are demonstrated using bubbly flows from incompressible, two-phase simulations, as well as procedurally generated examples using single-phase FLIP fluid animations. Our results demonstrate sound simulations with hundreds of thousands of bubbles, and perceptually significant frequency transformations with fuller low-frequency content.
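For context, the fundamental frequency of a single bubble's emission, before the inter-bubble coupling this work models, is the classical Minnaert resonance f₀ = (1/2πr)·√(3γp₀/ρ). A quick sketch of that standard background result (not the paper's coupled-oscillator model):

```python
import math

def minnaert_frequency(radius, p0=101325.0, rho=998.0, gamma=1.4):
    """Minnaert resonance frequency (Hz) of an air bubble in water.
    radius in meters; p0 ambient pressure (Pa), rho water density
    (kg/m^3), gamma adiabatic index of air."""
    return math.sqrt(3.0 * gamma * p0 / rho) / (2.0 * math.pi * radius)
```

This yields the familiar rule of thumb f₀ ≈ 3/r (Hz·m): a 1 mm bubble rings near 3.3 kHz, and frequency scales inversely with radius. The perceptually important low-frequency content highlighted above arises when clouds of such bubbles couple into collective modes well below their individual Minnaert frequencies.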

SESSION: Text-Guided Generation

UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image

Text-driven image generation methods have shown impressive results recently, allowing casual users to generate high quality images by providing textual descriptions. However, similar capabilities for editing existing images are still out of reach. Text-driven image editing methods usually need edit masks, struggle with edits that require significant visual changes and cannot easily keep specific details of the edited portion. In this paper we make the observation that image-generation models can be converted to image-editing models simply by fine-tuning them on a single image. We also show that initializing the stochastic sampler with a noised version of the base image before the sampling and interpolating relevant details from the base image after sampling further increase the quality of the edit operation. Combining these observations, we propose UniTune, a novel image editing method. UniTune gets as input an arbitrary image and a textual edit description, and carries out the edit while maintaining high fidelity to the input image. UniTune does not require additional inputs, like masks or sketches, and can perform multiple edits on the same image without retraining. We test our method using the Imagen model in a range of different use cases. We demonstrate that it is broadly applicable and can perform a surprisingly wide range of expressive editing operations, including those requiring significant visual changes that were previously impossible.

SESSION: Marvelous Mappings

Galaxy Maps: Localized Foliations for Bijective Volumetric Mapping

A method is presented to compute volumetric maps and parametrizations of objects over 3D domains. As a key feature, continuity and bijectivity are ensured by construction. Arbitrary objects of ball topology, represented as tetrahedral meshes, are supported, as are arbitrary convex and star-shaped domains, with full control over the boundary mapping. The method is based on the technique of simplicial foliations, generalized to a broader class of domain shapes and applied adaptively in a novel localized manner. This increases flexibility as well as efficiency over the state of the art, while maintaining reliability in guaranteeing map bijectivity.

Variational quasi-harmonic maps for computing diffeomorphisms

Computation of injective (or inversion-free) maps is a key task in geometry processing, physical simulation, and shape optimization. Despite being a longstanding problem, it remains challenging due to its highly nonconvex and combinatorial nature. We propose the computation of variational quasi-harmonic maps to obtain smooth inversion-free maps. Our work is built on a key observation about inversion-free maps: a planar map is a diffeomorphism if and only if it is quasi-harmonic and satisfies a special Cauchy boundary condition. We hence recast the inversion-free mapping problem as an optimal control problem derived from our theoretical result, in which we search the space of parameters that define an elliptic PDE. We show that this problem can be solved by minimizing within a family of functionals. Similarly, our discretized functionals admit exactly injective maps as minimizers, empirically producing inversion-free discrete maps of triangle meshes. We design efficient numerical procedures for our problem that prioritize robust convergence paths. Experiments show that on challenging examples our methods can achieve up to orders of magnitude improvement over the state of the art in terms of speed or quality. Moreover, we demonstrate how to optimize a generic energy in our framework while restricting to quasi-harmonic maps.

Expansion Cones: A Progressive Volumetric Mapping Framework

Volumetric mapping is a ubiquitous and difficult problem in Geometry Processing and has been the subject of research in numerous and various directions. While several methods show encouraging results, the field still lacks a general approach with guarantees regarding map bijectivity. Through this work, we aim at opening the door to a new family of methods by providing a novel framework based on the concept of progressive expansion. Starting from an initial map of a tetrahedral mesh whose image may contain degeneracies but no inversions, we incrementally adjust vertex images to expand degenerate elements. By restricting movement to so-called expansion cones, this is done in such a way that the number of degenerate elements decreases strictly monotonically, without ever introducing an inversion. Adaptive local refinement of the mesh is performed to facilitate this process. We describe a prototype algorithm within this framework for the computation of maps from ball-topology tetrahedral meshes to convex or star-shaped domains. This algorithm is evaluated and compared to state-of-the-art methods, demonstrating its benefits in terms of bijectivity. We also discuss the associated cost in terms of the sometimes significant mesh refinement needed to obtain the degrees of freedom required for establishing a valid mapping. We conclude that while this algorithm is of limited immediate practical utility due to efficiency concerns, the general framework has the potential to inspire a range of novel methods that improve on the efficiency aspect.

Unsupervised Learning of Robust Spectral Shape Matching

We propose a novel learning-based approach for robust 3D shape matching. Our method builds upon deep functional maps and can be trained in a fully unsupervised manner. Previous deep functional map methods mainly focus on predicting optimised functional maps alone, and then rely on off-the-shelf post-processing to obtain accurate point-wise maps during inference. However, this two-stage procedure for obtaining point-wise maps often yields sub-optimal performance. In contrast, building upon recent insights about the relation between functional maps and point-wise maps, we propose a novel unsupervised loss to couple the functional maps and point-wise maps, and thereby directly obtain point-wise maps without any post-processing. Our approach obtains accurate correspondences not only for near-isometric shapes, but also for more challenging non-isometric shapes and partial shapes, as well as shapes with different discretisation or topological noise. Using a total of nine diverse datasets, we extensively evaluate the performance and demonstrate that our method substantially outperforms previous state-of-the-art methods, even compared to recent supervised methods. Our code is available at

SESSION: Real-Time Rendering: Gotta Go Fast!

ETER: Elastic Tessellation for Real-Time Pixel-Accurate Rendering of Large-Scale NURBS Models

We present ETER, an elastic tessellation framework for rendering large-scale NURBS models with pixel-accurate and crack-free quality at real-time frame rates. We propose a highly parallel adaptive tessellation algorithm to achieve pixel accuracy, measured by the screen-space error between the exact surface and its triangulation. To resolve a bottleneck in NURBS rendering, we present a novel evaluation method based on uniform sampling grids and accelerated by GPU Tensor Cores. Compared to evaluation based on hardware tessellation, our method achieves a significant speedup of 2.9 to 16.2 times, depending on the degrees of the patches. We develop an efficient crack-filling algorithm based on conservative rasterization and a visibility buffer to fill the tessellation-induced cracks while greatly reducing the jagged effect introduced by conservative rasterization. We integrate all our novel algorithms, implemented in CUDA, into a GPU NURBS rendering pipeline based on Mesh Shaders and hybrid software/hardware rasterization. Our performance data on a commodity GPU show that the rendering pipeline based on ETER is capable of rendering up to 3.7 million patches (0.25 billion tessellated triangles) in real time (30 FPS). With its advantages in performance, scalability, and visual quality in rendering large-scale NURBS models, a real-time tessellation solution based on ETER can be a powerful alternative or even a potential replacement for the existing pre-tessellation solutions in CAD systems.
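The pixel-accuracy criterion above amounts to choosing a tessellation density whose screen-space chord deviation stays below a fraction of a pixel. A minimal sketch of one such flatness bound, assuming a uniform parameter split and a known bound M on the surface's second derivative (the paper's criterion is adaptive and per-patch; all names here are illustrative):

```python
import math

def tess_factor(second_deriv_bound, param_len, px_per_unit, eps_px=0.5):
    """Uniform tessellation density meeting a screen-space error bound.

    The chord deviation of a curve with |f''| <= M over a parameter span L,
    split into n equal segments, is at most M * (L/n)^2 / 8. Solve for the
    smallest n such that the projected deviation stays below eps_px pixels.
    """
    world_eps = eps_px / px_per_unit          # pixel tolerance in world units
    n = math.sqrt(second_deriv_bound * param_len ** 2 / (8.0 * world_eps))
    return max(1, math.ceil(n))
```

Tighter pixel tolerances or closer viewing distances (higher `px_per_unit`) increase the returned segment count, which is the elasticity the framework exploits.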

Temporal Set Inversion for Animated Implicits

We exploit the temporal coherence of closed-form animated implicit surfaces by locally re-evaluating an octree-like discretization of the implicit field only as and where necessary to rigorously maintain a global error invariant over time, thereby saving resources in static or slowly-evolving areas far from the motion where per-frame updates are not needed. We treat implicit surface rendering as a special case of the continuous constraint satisfaction problem of set inversion, which seeks preimages of arbitrary sets under vector-valued functions. From this perspective, we formalize a temporally-coherent set inversion algorithm that localizes changes in the field by range-bounding its time derivatives using interval arithmetic. We implement our algorithm on the GPU using persistent thread scheduling and apply it to the scalar case of implicit surface and swept volume rendering, where we achieve significant speedups in complex scenes with localized deformations like those found in games and modelling applications, where interactivity is required and bounded-error approximation is acceptable.
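The key mechanism, range-bounding the field's time derivative with interval arithmetic to decide which cells need re-evaluation, can be sketched in one dimension. This is a minimal illustration with a hypothetical animated field, not the paper's GPU implementation:

```python
class Interval:
    """Closed interval [lo, hi] with basic interval arithmetic."""
    def __init__(self, lo, hi=None):
        self.lo, self.hi = lo, (lo if hi is None else hi)

    def __add__(self, other):
        other = other if isinstance(other, Interval) else Interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        other = other if isinstance(other, Interval) else Interval(other)
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def contains_zero(self):
        return self.lo <= 0.0 <= self.hi

def df_dt(x_box, t_box):
    # Hypothetical animated field f(x, t) = x^2 - t, whose time
    # derivative is the constant -1 everywhere.
    return Interval(-1.0, -1.0)

def cell_may_change(f_lo, f_hi, x_box, t_box):
    """A cell's surface crossing can change over [t0, t1] only if the
    range-bounded time derivative, integrated over the elapsed time,
    can bridge the cell's current distance to the zero level set."""
    dt = t_box.hi - t_box.lo
    drift = df_dt(x_box, t_box) * Interval(0.0, dt)  # bound on change of f
    moved = Interval(f_lo, f_hi) + drift             # f's range over t_box
    return moved.contains_zero()
```

Cells whose bounded range never crosses zero keep their cached discretization; only the rest are re-evaluated, which is where the savings in static regions come from.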

Real-Time Radiance Fields for Single-Image Portrait View Synthesis

We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., face portrait) in real-time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization. To train our triplane encoder pipeline, we use only synthetic data, showing how to distill the knowledge from a pretrained 3D GAN into a feedforward encoder. Technical contributions include a Vision Transformer-based triplane encoder, a camera data augmentation strategy, and a well-designed loss function for synthetic data training. We benchmark against the state-of-the-art methods, demonstrating significant improvements in robustness and image quality in challenging real-world settings. We showcase our results on portraits of faces (FFHQ) and cats (AFHQ), but our algorithm can also be applied in the future to other categories with a 3D-aware image generator.

SESSION: Character Animation: Interaction

Generating Activity Snippets by Learning Human-Scene Interactions

We present an approach to generate virtual activity snippets, which comprise sequenced keyframes of multi-character, multi-object interaction scenarios in 3D environments, by learning from recordings of human-scene interactions. The generation consists of two stages. First, we use a sequential deep graph generative model with a temporal module to iteratively generate keyframe descriptions, which represent abstract interactions using graphs, while preserving spatial-temporal relations through the activities. Second, we devise an optimization framework to instantiate the activity snippets in virtual 3D environments guided by the generated keyframe descriptions. Our approach optimizes the poses of character and object instances encoded by the graph nodes to satisfy the relations and constraints encoded by the graph edges. The instantiation process includes a coarse 2D optimization followed by a fine 3D optimization to effectively explore the complex solution space for placing and posing the instances. Through experiments and a perceptual study, we show that our approach generates plausible activity snippets under different settings.

GREIL-Crowds: Crowd Simulation with Deep Reinforcement Learning and Examples

Simulating crowds with realistic behaviors is a difficult but very important task for a variety of applications. Quantifying how a person balances between different conflicting criteria such as goal seeking, collision avoidance and moving within a group is not intuitive, especially if we consider that behaviors differ greatly between people. Inspired by recent advances in Deep Reinforcement Learning, we propose Guided REinforcement Learning (GREIL) Crowds, a method that learns a model for pedestrian behaviors which is guided by reference crowd data. The model successfully captures behaviors such as goal seeking, being part of consistent groups without the need to define explicit relationships, and wandering around seemingly without a specific purpose. Two fundamental concepts are important in achieving these results: (a) the per-agent state representation and (b) the reward function. The agent state is a temporal representation of the situation around each agent. The reward function is based on the idea that people try to move into situations/states in which they feel comfortable. Therefore, in order for agents to stay in a comfortable state space, we first obtain a distribution of states extracted from real crowd data; then we evaluate states based on how much of an outlier they are compared to such a distribution. We demonstrate that our system can capture and simulate many complex and subtle crowd interactions in varied scenarios. Additionally, the proposed method generalizes to unseen situations, generates consistent behaviors and does not suffer from the limitations of other data-driven and reinforcement learning approaches.
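The outlier-based reward described above can be illustrated with a simple stand-in density model: score a state by its negative Mahalanobis distance to the reference crowd states. The paper's actual state evaluation may use a different distribution estimate; this is only a sketch of the idea:

```python
import numpy as np

def state_reward(state, data_states, eps=1e-6):
    """Reward a state by how 'comfortable' it is: states close to the
    distribution of reference crowd states score high, outliers score low.
    Here the outlier measure is the negative Mahalanobis distance."""
    mu = data_states.mean(axis=0)
    cov = np.cov(data_states.T) + eps * np.eye(data_states.shape[1])
    d = state - mu
    return -float(np.sqrt(d @ np.linalg.solve(cov, d)))

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))                      # toy reference states
reward_center = state_reward(data.mean(axis=0), data) # typical state
reward_outlier = state_reward(data.mean(axis=0) + 5.0, data)  # outlier state
```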

SESSION: Making Faces With Neural Avatars

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

Emerging Metaverse applications demand accessible, accurate and easy-to-use tools for 3D digital human creations in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits for human characteristics. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables layman users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures and fine-grained animation capabilities. From a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space to generate coarse geometry, and subsequently optimize both the detailed displacements and normals using Score Distillation Sampling (SDS) from the generic Latent Diffusion Model (LDM). Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both the diversity and textural specification in the UV space. We also employ a two-stage optimization that performs SDS in both the latent and image spaces, providing compact priors for fine-grained synthesis. It also enables learning the mapping from the compact latent space into physically-based textures (diffuse albedo, specular intensity, normal maps, etc.). Our generated neutral assets naturally support blendshapes-based facial animations, thanks to the unified geometric topology. We further improve the animation ability with personalized deformation characteristics.
To this end, we learn a universal expression prior in a latent space, conditioned on the neutral asset via a cross-identity hypernetwork, and subsequently train a neural facial tracker that maps video input into the pre-trained expression space for personalized fine-grained animation. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of DreamFace. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.

SESSION: Environment Rendering: NeRFs on Earth

3D Gaussian Splatting for Real-Time Radiance Field Rendering

Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
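The splatting step can be illustrated at the level of a single pixel: each projected Gaussian contributes an anisotropic alpha, composited front to back with early termination. This toy sketch omits the projection, tiling, and sorting machinery of the actual GPU renderer:

```python
import numpy as np

def splat_pixel(px, means, covs_inv, colors, opacities, depth_order):
    """Front-to-back alpha compositing of projected 2D anisotropic
    Gaussians at a single pixel (a tiny illustration of the idea;
    the paper's tile-based rasterizer is far more elaborate)."""
    C = np.zeros(3)   # accumulated color
    T = 1.0           # accumulated transmittance
    for i in depth_order:                     # nearest Gaussian first
        d = px - means[i]
        alpha = opacities[i] * np.exp(-0.5 * d @ covs_inv[i] @ d)
        C += T * alpha * colors[i]
        T *= 1.0 - alpha
        if T < 1e-4:                          # early out once opaque
            break
    return C

# One fully opaque red Gaussian centered on the pixel.
means = np.array([[0.0, 0.0]])
covs_inv = np.array([np.eye(2)])
colors = np.array([[1.0, 0.0, 0.0]])
C = splat_pixel(np.zeros(2), means, covs_inv, colors, [1.0], [0])
```

The inverse covariance `covs_inv` is where the optimized anisotropy enters: elongated Gaussians produce elongated screen-space footprints.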

Virtual Mirrors: Non-Line-of-Sight Imaging Beyond the Third Bounce

Non-line-of-sight (NLOS) imaging methods are capable of reconstructing complex scenes that are not visible to an observer using indirect illumination. However, they assume only third-bounce illumination, so they are currently limited to single-corner configurations, and present limited visibility when imaging surfaces at certain orientations. To reason about and tackle these limitations, we make the key observation that planar diffuse surfaces behave specularly at wavelengths used in the computational wave-based NLOS imaging domain. We call such surfaces virtual mirrors. We leverage this observation to expand the capabilities of NLOS imaging using illumination beyond the third bounce, addressing two problems: imaging single-corner objects at limited visibility angles, and imaging objects hidden behind two corners. To image objects at limited visibility angles, we first analyze the reflections of the known illuminated point on surfaces of the scene as an estimator of the position and orientation of objects with limited visibility. We then image those limited visibility objects by computationally building secondary apertures at other surfaces that observe the target object from a direct visibility perspective. Beyond single-corner NLOS imaging, we exploit the specular behavior of virtual mirrors to image objects hidden behind a second corner by imaging the space behind such virtual mirrors, where the mirror image of objects hidden around two corners is formed. No specular surfaces were involved in the making of this paper.

SESSION: Fabulous Fabrication: From Knitting to Circuits

Dense, Interlocking-Free and Scalable Spectral Packing of Generic 3D Objects

Packing 3D objects into a known container is a very common task in many industries such as packaging, transportation, and manufacturing. This important problem is known to be NP-hard and even approximate solutions are challenging. This is due to the difficulty of handling interactions between objects with arbitrary 3D geometries and a vast combinatorial search space. Moreover, the packing must be interlocking-free for real-world applications. In this work, we first introduce a novel packing algorithm to search for placement locations given an object. Our method leverages a discrete voxel representation. We formulate collisions between objects as correlations of functions computed efficiently using Fast Fourier Transform (FFT). To determine the best placements, we utilize a novel cost function, which is also computed efficiently using FFT. Finally, we show how interlocking detection and correction can be addressed in the same framework resulting in interlocking-free packing. We propose a challenging benchmark with thousands of 3D objects to evaluate our algorithm. Our method demonstrates state-of-the-art performance on the benchmark when compared to existing methods in both density and speed.
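The FFT-based collision test described above can be sketched in a few lines: the overlap between an object and the current container contents, for every translation at once, is a cross-correlation of their occupancy grids. A 2D toy version (the method operates on 3D voxel grids):

```python
import numpy as np

def collision_map(container_occ, obj_occ):
    """Cross-correlate object and container occupancy via FFT.
    Entry (i, j) approximates the number of overlapping solid voxels
    when the object's origin is placed at (i, j); (near-)zero entries
    are collision-free placements."""
    shape = [c + o - 1 for c, o in zip(container_occ.shape, obj_occ.shape)]
    F = np.fft.rfftn(container_occ, shape)
    # correlation = convolution with the flipped kernel
    G = np.fft.rfftn(obj_occ[::-1, ::-1], shape)
    corr = np.fft.irfftn(F * G, shape)
    # keep only placements with the object fully inside the container
    oh, ow = obj_occ.shape
    return corr[oh - 1:container_occ.shape[0], ow - 1:container_occ.shape[1]]

# Container: 4x4 with one occupied voxel; object: a solid 2x2 block.
container = np.zeros((4, 4)); container[0, 0] = 1.0
obj = np.ones((2, 2))
overlap = collision_map(container, obj)
free = np.argwhere(overlap < 0.5)   # collision-free placements
```

The same transforms can score each free placement with a cost function (e.g. favoring low, tightly-fitting positions), which is how the method picks the best location without an explicit search.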

PCBend: Light Up Your 3D Shapes With Foldable Circuit Boards

We propose a computational design approach for covering a surface with individually addressable RGB LEDs, effectively forming a low-resolution surface screen. To achieve a low-cost and scalable approach, we propose creating designs from flat PCB panels bent in-place along the surface of a 3D printed core. Working with standard rigid PCBs enables the use of established PCB manufacturing services, allowing the fabrication of designs with several hundred LEDs.

Our approach optimizes the PCB geometry for folding, and then jointly optimizes the LED packing, circuit and routing, solving a challenging layout problem under strict manufacturing requirements. Unlike paper, PCBs cannot bend beyond a certain point without breaking. Therefore, we introduce parametric cut patterns acting as hinges, designed to allow bending while remaining compact. To tackle the joint optimization of placement, circuit and routing, we propose a specialized algorithm that splits the global problem into one sub-problem per triangle, which is then individually solved.

Our technique generates PCB blueprints in a completely automated way. After being fabricated by a PCB manufacturing service, the boards are bent and glued by the user onto the 3D printed support. We demonstrate our technique on a range of physical models and virtual examples, creating intricate surface light patterns from hundreds of LEDs.

The code and data for this paper are available at

Semantics and Scheduling for Machine Knitting Compilers

Machine knitting is a well-established fabrication technique for complex soft objects, and both companies and researchers have developed tools for generating machine knitting patterns. However, existing representations for machine knitted objects are incomplete (do not cover the complete domain of machine knittable objects) or overly specific (do not account for symmetries and equivalences among knitting instruction sequences). This makes it difficult to define correctness in machine knitting, let alone verify the correctness of a given program or program transformation. The major contribution of this work is a formal semantics for knitout, a low-level Domain Specific Language for knitting machines. We accomplish this by using what we call the fenced tangle, which extends concepts from knot theory to allow for a mathematical definition of knitting program equivalence that matches the intuition behind knit objects. Finally, using this formal representation, we prove the correctness of a sequence of rewrite rules; and demonstrate how these rewrite rules can form the foundation for higher-level tasks such as compiling a program for a specific machine and optimizing for time/reliability, all while provably generating the same knit object under our proposed semantics. By establishing formal definitions of correctness, this work provides a strong foundation for compiling and optimizing knit programs.

A Temporal Coherent Topology Optimization Approach for Assembly Planning of Bespoke Frame Structures

We present a computational framework for planning the assembly sequence of bespoke frame structures. Frame structures are one of the most commonly used structural systems in modern architecture, providing resistance to gravitational and external loads. Building frame structures requires traversing through several partially built states. If the assembly sequence is planned poorly, these partial assemblies can exhibit substantial deformation due to self-weight, slowing down or jeopardizing the assembly process. Finding a good assembly sequence that minimizes intermediate deformations is an interesting yet challenging combinatorial problem that is usually solved by heuristic search algorithms. In this paper, we propose a new optimization-based approach that models sequence planning using a series of topology optimization problems. Our key insight is that enforcing temporal coherent constraints in the topology optimization can lead to sub-structures with small deformations while staying consistent with each other to form an assembly sequence. We benchmark our algorithm on a large data set and show improvements in both performance and computational time over greedy search algorithms. In addition, we demonstrate that our algorithm can be extended to handle assembly with static or dynamic supports. We further validate our approach by generating a series of results in multiple scales, including a real-world prototype with a mixed reality assistant using our computed sequence and a simulated example demonstrating a multi-robot assembly application.

SESSION: Making Contact: Simulating and Detecting Collisions

In-Timestep Remeshing for Contacting Elastodynamics

We propose In-Timestep Remeshing, a fully coupled, adaptive meshing algorithm for contacting elastodynamics where remeshing steps are tightly integrated, implicitly, within the timestep solve. Our algorithm refines and coarsens the domain automatically by measuring physical energy changes within each ongoing timestep solve. This provides consistent, degree-of-freedom-efficient, productive remeshing that, by construction, is physics-aware and so avoids the errors, over-refinements, artifacts, per-example hand-tuning, and instabilities commonly encountered when remeshing with timestepping methods. Our in-timestep computation then ensures that each simulation step's output is both a converged stable solution on the updated mesh and a temporally consistent trajectory with respect to the model and solution of the last timestep. At the same time, the output is guaranteed safe (intersection- and inversion-free) across all operations. We demonstrate applications across a wide range of extreme stress tests with challenging contacts, sharp geometries, extreme compressions, large timesteps, and wide material stiffness ranges - all scenarios well-appreciated to challenge existing remeshing methods.

Shortest Path to Boundary for Self-Intersecting Meshes

We introduce a method for efficiently computing the exact shortest path to the boundary of a mesh from a given internal point in the presence of self-intersections. We provide a formal definition of shortest boundary paths for self-intersecting objects and present a robust algorithm for computing the actual shortest boundary path. The resulting method offers an effective solution for collision and self-collision handling while simulating deformable volumetric objects, using fast simulation techniques that provide no guarantees on collision resolution. Our evaluation includes complex self-collision scenarios with a large number of active contacts, showing that our method can successfully handle them by introducing a relatively minor computational overhead.

P2M: A Fast Solver for Querying Distance from Point to Mesh Surface

Most of the existing point-to-mesh distance query solvers, such as Proximity Query Package (PQP), Embree and Fast Closest Point Query (FCPW), are based on bounding volume hierarchy (BVH). The hierarchical organizational structure enables one to eliminate the vast majority of triangles that do not help find the closest point. In this paper, we develop a totally different algorithmic paradigm, named P2M, to speed up point-to-mesh distance queries. Our key idea is to precompute a KD tree (KDT) of mesh vertices to approximately encode the geometry of a mesh surface containing vertices, edges and faces. However, it is very likely that the closest primitive to the query point is an edge e (resp., a face f), while the KDT reports a mesh vertex v instead. We call v an interceptor of e (resp., f). The main contribution of this paper is to invent a simple yet effective interception inspection rule and an efficient flooding interception inspection algorithm for quickly finding out all the interception pairs. Once the KDT and the interception table are precomputed, the query stage proceeds by first searching the KDT and then looking up the interception table to retrieve the closest geometric primitive. Statistics show that our query algorithm runs many times faster than the state-of-the-art solvers.
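A toy version of the two-stage query illustrates the idea: find the nearest vertex, then consult that vertex's interception table for edges (or faces) that may be closer. This 2D polyline sketch uses brute-force nearest-vertex search in place of the KD tree, and a hand-built interception table:

```python
import numpy as np

def closest_point_on_segment(p, a, b):
    t = np.clip(np.dot(p - a, b - a) / np.dot(b - a, b - a), 0.0, 1.0)
    return a + t * (b - a)

# Toy "mesh": a polyline with a precomputed interception table mapping
# each vertex to the edges that can be closer to a query than the vertex.
verts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
edges = [(0, 1), (1, 2)]
interception = {0: [0], 1: [0, 1], 2: [1]}   # vertex -> intercepted edges

def query(p):
    # Stage 1: nearest vertex (a KD tree in the real method).
    v = int(np.argmin(np.linalg.norm(verts - p, axis=1)))
    # Stage 2: check the vertex's interception table for a closer edge point.
    best = verts[v]
    for e in interception[v]:
        a, b = verts[edges[e][0]], verts[edges[e][1]]
        q = closest_point_on_segment(p, a, b)
        if np.linalg.norm(p - q) < np.linalg.norm(p - best):
            best = q
    return best
```

The precomputed table replaces the BVH traversal: a query costs one tree search plus a handful of primitive tests, independent of the total triangle count.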

SESSION: Neural Image Generation and Editing

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt. Moreover, we find that in some cases the model also fails to correctly bind attributes (e.g., colors) to their corresponding subjects. To help mitigate these failure cases, we introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images. Using an attention-based formulation of GSN, dubbed Attend-and-Excite, we guide the model to refine the cross-attention units to attend to all subject tokens in the text prompt and strengthen --- or excite --- their activations, encouraging the model to generate all subjects described in the text prompt. We compare our approach to alternative approaches and demonstrate that it conveys the desired concepts more faithfully across a range of text prompts. Code is available at our project page:
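The attention-excitation objective can be sketched as a loss over the cross-attention maps: it is driven by the most-neglected subject token, so minimizing it (by updating the latent on the fly at inference time) encourages every subject to receive strong attention somewhere in the image. A toy version with made-up attention values:

```python
import numpy as np

def attend_and_excite_loss(attn, subject_tokens):
    """attn: (num_patches, num_tokens) cross-attention map.
    The loss is small only when *every* subject token attains a strong
    maximum activation at some image patch (toy version of the paper's
    attention-excitation objective)."""
    per_token_max = attn[:, subject_tokens].max(axis=0)  # strongest patch
    return float((1.0 - per_token_max).max())            # worst-case neglect

# Toy map over 16 patches and 5 tokens: token 1 is strongly attended
# somewhere, token 3 is neglected everywhere.
attn = np.full((16, 5), 0.05)
attn[3, 1] = 0.9
attn[7, 3] = 0.2
loss = attend_and_excite_loss(attn, [1, 3])   # dominated by neglected token 3
```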

Blended Latent Diffusion

The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models has finally enabled text-based interfaces for creating and editing images. Handling generic images requires a diverse underlying generative model, hence the latest works utilize diffusion models, which were shown to surpass GANs in terms of diversity. One major drawback of diffusion models, however, is their relatively slow inference time. In this paper, we present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask. Our solution leverages a text-to-image Latent Diffusion Model (LDM), which speeds up diffusion by operating in a lower-dimensional latent space and eliminating the need for resource-intensive CLIP gradient calculations at each diffusion step. We first enable LDM to perform local image edits by blending the latents at each step, similarly to Blended Diffusion. Next we propose an optimization-based solution for the inherent inability of LDM to accurately reconstruct images. Finally, we address the scenario of performing local edits using thin masks. We evaluate our method against the available baselines both qualitatively and quantitatively and demonstrate that in addition to being faster, it produces more precise results.
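The per-step blending at the heart of the method is a simple masked combination in latent space: keep the model's edited latent inside the (downsampled) user mask and the appropriately noised source latent outside it. A minimal sketch with random stand-in latents:

```python
import numpy as np

def blend_step(z_edit, z_source_t, mask):
    """One blending step: the diffusion model's edited latent inside the
    mask, the source latent (noised to the same timestep) outside it."""
    return mask * z_edit + (1.0 - mask) * z_source_t

rng = np.random.default_rng(0)
z_edit = rng.normal(size=(4, 8, 8))   # latent proposed by the LDM at step t
z_src = rng.normal(size=(4, 8, 8))    # source latent noised to step t
mask = np.zeros((1, 8, 8))
mask[:, :, :4] = 1.0                  # edit only the left half
z = blend_step(z_edit, z_src, mask)   # input to the next denoising step
```

Repeating this at every denoising step confines the edit to the masked region while the background stays consistent with the source image.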

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach. Our key insight is that by underfitting on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain. Specifically, we employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain, e.g. a specific face, and learns to map it into a word-embedding representing the concept. Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts. Together, these components are used to guide the learning of unseen concepts, allowing us to personalize a model using only a single image and as few as 5 training steps --- accelerating personalization from dozens of minutes to seconds, while preserving quality.

Code and trained encoders will be available at our project page.

Word-As-Image for Semantic Typography

A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques. Code and demo will be available at our project page.

SESSION: Material Acquisition

Towards Material Digitization with a Dual-scale Optical System

Existing devices for measuring material appearance in spatially-varying samples are limited to a single scale, either micro or mesoscopic. This is a practical limitation when the material has a complex multi-scale structure. In this paper, we present a system and methods to digitize materials at two scales, designed to include high-resolution data in spatially-varying representations at larger scales. We design and build a hemispherical light dome able to digitize flat material samples up to 11x11 cm. We estimate geometric properties, anisotropic reflectance and transmittance at the microscopic level using polarized directional lighting with a single orthogonal camera. Then, we propagate this structured information to the mesoscale, using a neural network trained with the data acquired by the device and image-to-image translation methods. To maximize the compatibility of our digitization, we leverage standard BSDF models commonly adopted in the industry. Through extensive experiments, we demonstrate the precision of our device and the quality of our digitization process using a set of challenging real-world material samples and validation scenes. Further, we demonstrate the optical resolution and potential of our device for acquiring more complex material representations by capturing microscopic attributes which affect the global appearance: we characterize the properties of textile materials such as the yarn twist or the shape of individual fly-out fibers. We also release the SEDDIDOME dataset of materials, including raw data captured by the machine and optimized parameters.

End-to-end Procedural Material Capture with Proxy-Free Mixed-Integer Optimization

Node-graph-based procedural materials are vital to 3D content creation within the computer graphics industry. Leveraging the expressive representation of procedural materials, artists can effortlessly generate diverse appearances by altering the graph structure or node parameters. However, manually reproducing a specific appearance is a challenging task that demands extensive domain knowledge and labor. Previous research has sought to automate this process by converting artist-created material graphs into differentiable programs and optimizing node parameters against a photographed material appearance using gradient descent. These methods involve implementing differentiable filter nodes [Shi et al. 2020] and training differentiable neural proxies for generator nodes to jointly optimize continuous and discrete node parameters [Hu et al. 2022a]. Nevertheless, the Neural Proxies approach exhibits critical limitations, such as long training times, inaccuracies, fixed resolutions, and confined parameter ranges, which hinder its scalability towards the broad spectrum of production-grade material graphs. These constraints fundamentally stem from the absence of faithful and efficient implementations of generic noise and pattern generator nodes, both differentiable and non-differentiable. This deficiency prevents the direct optimization of continuous and discrete generator node parameters without relying on surrogate models.

We present Diffmat v2, an improved differentiable procedural material library, along with a fully-automated, end-to-end procedural material capture framework that combines gradient-based optimization and gradient-free parameter search to match existing production-grade procedural materials against user-taken flash photos. Diffmat v2 expands the range of differentiable material graph nodes in Diffmat [Shi et al. 2020] by adding generic noise/pattern generator nodes and user-customizable per-pixel filter nodes. This allows for the complete translation and optimization of procedural materials across various categories without the need for external proprietary tools or pre-cached noise patterns. Consequently, our method can capture a considerably broader array of materials, encompassing those with highly regular or stochastic geometries. We demonstrate that our end-to-end approach yields a closer match to the target than MATch [Shi et al. 2020] and Neural Proxies [Hu et al. 2022a] when starting from initially unmatched continuous and discrete parameters.

Materialistic: Selecting Similar Materials in Images

Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic segmentation (different woods or metal should not be selected together), we formulate the problem as a similarity-based grouping problem based on a user-provided image location. In particular, we propose to leverage the unsupervised DINO [Caron et al. 2021] features coupled with a proposed Cross-Similarity Feature Weighting module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, which we release. We show that our method generalizes well to real-world images. We carefully analyze our model's behavior on varying material properties and lighting. Additionally, we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.

SESSION: Deep Geometric Learning

OctFormer: Octree-based Transformers for 3D Point Clouds

We propose octree-based transformers, named OctFormer, for 3D point cloud learning. OctFormer not only serves as a general and effective backbone for 3D point cloud segmentation and object detection but also has linear complexity and scales to large point clouds. The key challenge in applying transformers to point clouds is reducing the quadratic, and thus prohibitive, computational complexity of attention. To combat this issue, several works divide point clouds into non-overlapping windows and constrain attention to each local window. However, the number of points per window varies greatly, impeding efficient execution on GPUs. Observing that attention is robust to the shapes of local windows, we propose a novel octree attention, which leverages sorted shuffled keys of octrees to partition point clouds into local windows containing a fixed number of points while permitting window shapes to change freely. We also introduce dilated octree attention to further expand the receptive field. Our octree attention can be implemented in 10 lines of code with open-source libraries and runs 17 times faster than other point cloud attention mechanisms when the point count exceeds 200k. Built upon octree attention, OctFormer can be easily scaled up and achieves state-of-the-art performance on a series of 3D semantic segmentation and 3D object detection benchmarks, surpassing previous sparse-voxel-based CNNs and point cloud transformers in terms of both efficiency and effectiveness. Notably, on the challenging ScanNet200 dataset, OctFormer outperforms sparse-voxel-based CNNs by 7.3 mIoU. Our code and trained models are available at
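The fixed-size windowing idea can be sketched as follows: sort points by an interleaved-bit (Morton-style) key, a stand-in for the paper's shuffled octree keys, then reshape into windows of exactly `window` points and attend within each window. Because every window has the same point count, the computation maps cleanly onto GPU batching, and total cost is O(N x window) rather than O(N^2). All names here are illustrative, not the paper's implementation:

```python
import numpy as np

def morton_code(pts, bits=10):
    """Interleave the bits of quantized x/y/z coordinates so that points
    close in space tend to receive nearby codes. pts in [0, 1)^3."""
    q = np.clip((pts * (1 << bits)).astype(np.uint64), 0, (1 << bits) - 1)
    code = np.zeros(len(pts), dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            code |= ((q[:, axis] >> np.uint64(b)) & np.uint64(1)) \
                    << np.uint64(3 * b + axis)
    return code

def octree_attention(pts, feats, window=32):
    """Sort by spatial key, pad to a multiple of `window`, and run
    self-attention independently inside each fixed-size window."""
    order = np.argsort(morton_code(pts))
    x = feats[order]
    n, d = x.shape
    pad = (-n) % window
    xp = np.pad(x, ((0, pad), (0, 0)))            # equalize window sizes
    xw = xp.reshape(-1, window, d)                # (num_windows, window, d)
    scores = xw @ xw.transpose(0, 2, 1) / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                 # softmax over each window
    out = (w @ xw).reshape(-1, d)[:n]
    inv = np.empty_like(order)
    inv[order] = np.arange(n)                     # undo the spatial sort
    return out[inv]
```

Dilated attention, per the paper, would stride the window assignment to enlarge the receptive field without changing this fixed-count structure.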

Dictionary Fields: Learning a Neural Basis Decomposition

We present Dictionary Fields, a novel neural representation which decomposes a signal into a product of factors, each represented by a classical or neural field representation, operating on transformed input coordinates. More specifically, we factorize a signal into a coefficient field and a basis field, and exploit periodic coordinate transformations to apply the same basis functions across multiple locations and scales. Our experiments show that Dictionary Fields lead to improvements in approximation quality, compactness, and training time when compared to previous fast reconstruction methods. Experimentally, our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields, and higher compactness for radiance field reconstruction tasks. Furthermore, Dictionary Fields enable generalization to unseen images/3D scenes by sharing bases across signals during training which greatly benefits use cases such as image regression from partial observations and few-shot radiance field reconstruction.
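The factorization can be illustrated in 1D: a coarse coefficient field modulates a shared basis field that is queried through a periodic coordinate transform, so the same basis entries are reused at every location and scale. Grid lookups stand in for the learned fields, and all names are hypothetical:

```python
import numpy as np

def eval_dictionary_field(x, coeff_grid, basis_grid, period):
    """f(x) = sum_k C_k(x) * B_k(gamma(x)), where gamma wraps x into one
    period so the K basis functions are shared across the whole domain.
    coeff_grid: (Nc, K) coarse coefficients over [0, 1);
    basis_grid: (Nb, K) one period of the basis functions."""
    # coefficient field: nearest-neighbor lookup on the coarse grid
    ci = np.clip((x * coeff_grid.shape[0]).astype(int), 0, coeff_grid.shape[0] - 1)
    C = coeff_grid[ci]                      # (n, K)
    # basis field: periodic coordinate transform, then lookup
    u = (x / period) % 1.0
    bi = np.clip((u * basis_grid.shape[0]).astype(int), 0, basis_grid.shape[0] - 1)
    B = basis_grid[bi]                      # (n, K)
    return (C * B).sum(-1)                  # per-sample product of factors
```

Sharing `basis_grid` across many signals is what enables the few-shot and partial-observation generalization mentioned above.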

ScanBot: Autonomous Reconstruction via Deep Reinforcement Learning

Autoscanning of an unknown environment is the key to many AR/VR and robotic applications. However, autonomous reconstruction with both high efficiency and quality remains a challenging problem. In this work, we propose a reconstruction-oriented autoscanning approach, called ScanBot, which utilizes hierarchical deep reinforcement learning techniques for global region-of-interest (ROI) planning to improve the scanning efficiency and local next-best-view (NBV) planning to enhance the reconstruction quality. Given the partially reconstructed scene, the global policy designates an ROI with insufficient exploration or reconstruction. The local policy is then applied to refine the reconstruction quality of objects in this region by planning and scanning a series of NBVs. A novel mixed 2D-3D representation is designed for these policies, where a 2D quality map with tailored quality channels encoding the scanning progress is consumed by the global policy, and a coarse-to-fine 3D volumetric representation that embodies both local environment and object completeness is fed to the local policy. These two policies iterate until the whole scene has been completely explored and scanned. To speed up the learning of complex environmental dynamics and enhance the agent's memory for spatial-temporal inference, we further introduce two novel auxiliary learning tasks to guide the training of our global policy. Thorough evaluations and comparisons are carried out to show the feasibility of our proposed approach and its advantages over previous methods. Code and data are available at
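The hierarchical control flow, a global policy designating ROIs and a local policy scanning next-best views until each ROI is complete, can be sketched with a toy environment. The `ToyScene` class and the greedy policies are illustrative stand-ins for the paper's learned RL policies and 2D-3D state representations:

```python
class ToyScene:
    """Hypothetical stand-in environment: each region becomes complete
    after a fixed number of scans."""
    def __init__(self, n_regions, scans_needed=3):
        self.progress = [0] * n_regions
        self.scans_needed = scans_needed

    def fully_scanned(self):
        return all(p >= self.scans_needed for p in self.progress)

    def least_complete_roi(self):
        return min(range(len(self.progress)), key=lambda i: self.progress[i])

    def roi_complete(self, roi):
        return self.progress[roi] >= self.scans_needed

    def scan(self, roi):
        self.progress[roi] += 1

def hierarchical_scan(scene):
    """Outer loop: global ROI selection. Inner loop: local NBV scanning
    until the chosen region is sufficiently reconstructed."""
    views = 0
    while not scene.fully_scanned():
        roi = scene.least_complete_roi()   # global policy: pick worst region
        while not scene.roi_complete(roi):
            scene.scan(roi)                # local policy: scan one more view
            views += 1
    return views
```

In ScanBot the two loop bodies are replaced by trained policies consuming the 2D quality map and the coarse-to-fine 3D volume, respectively.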

SESSION: NeRFs for Avatars

AvatarReX: Real-time Expressive Full-body Avatars

We present AvatarReX, a new method for learning NeRF-based full-body avatars from video data. The learnt avatar not only provides expressive control of the body, hands and the face together, but also supports real-time animation and rendering. To this end, we propose a compositional avatar representation, where the body, hands and the face are separately modeled in a way that the structural prior from parametric mesh templates is properly utilized without compromising representation flexibility. Furthermore, we disentangle the geometry and appearance for each part. With these technical designs, we propose a dedicated deferred rendering pipeline, which can be executed at a real-time framerate to synthesize high-quality free-view images. The disentanglement of geometry and appearance also allows us to design a two-pass training strategy that combines volume rendering and surface rendering for network training. In this way, patch-level supervision can be applied to force the network to learn sharp appearance details on the basis of geometry estimation. Overall, our method enables automatic construction of expressive full-body avatars with real-time rendering capability, and can generate photo-realistic images with dynamic details for novel body motions and facial expressions.

SketchFaceNeRF: Sketch-based Facial Generation and Editing in Neural Radiance Fields

Realistic 3D facial generation based on Neural Radiance Fields (NeRFs) from 2D sketches benefits various applications. Despite the high realism of free-view rendering results of NeRFs, it is tedious and difficult for artists to achieve detailed 3D control and manipulation. Meanwhile, due to its conciseness and expressiveness, sketching has been widely used for 2D facial image generation and editing. Applying sketching to NeRFs is challenging due to the inherent uncertainty for 3D generation with 2D constraints, a significant gap in content richness when generating faces from sparse sketches, and potential inconsistencies for sequential multi-view editing given only 2D sketch inputs. To address these challenges, we present SketchFaceNeRF, a novel sketch-based 3D facial NeRF generation and editing method, to produce free-view photo-realistic images. To solve the challenge of sketch sparsity, we introduce a Sketch Tri-plane Prediction net to first inject the appearance into sketches, thus generating features given reference images to allow color and texture control. Such features are then lifted into compact 3D tri-planes to supplement the absent 3D information, which is important for improving robustness and faithfulness. However, during editing, consistency for unseen or unedited 3D regions is difficult to maintain due to limited spatial hints in sketches. We thus adopt a Mask Fusion module to transform free-view 2D masks (inferred from sketch editing operations) into the tri-plane space as 3D masks, which guide the fusion of the original and sketch-based generated faces to synthesize edited faces. We further design an optimization approach with a novel space loss to improve identity retention and editing faithfulness. Our pipeline enables users to flexibly manipulate faces from different viewpoints in 3D space, easily designing desirable facial models. Extensive experiments validate that our approach is superior to the state-of-the-art 2D sketch-based image generation and editing approaches in realism and faithfulness.
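The mask-guided fusion step reduces, at its core, to a masked blend of feature volumes: where the lifted 3D mask is active, take the sketch-edited features; elsewhere, keep the original, preserving consistency for unedited regions. A minimal sketch with generic feature arrays (shapes and names are hypothetical, not the paper's tri-plane layout):

```python
import numpy as np

def fuse_features(orig, edited, mask3d):
    """Blend edited features into the original under a mask lifted from
    the 2D sketch-edit region. orig/edited: (..., D); mask3d: (...) in [0, 1]."""
    m = mask3d[..., None]                 # broadcast mask over channels
    return m * edited + (1.0 - m) * orig  # masked: edited; unmasked: original
```

In the paper this blend happens in tri-plane feature space, with the 3D mask inferred from free-view 2D edit masks.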

HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion

Representing human performance at high fidelity is an essential building block in diverse applications, such as film production, computer games or videoconferencing. To close the gap to production-level quality, we introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input, and enables playback from novel, unseen viewpoints. Our novel representation acts as a dynamic video encoding that captures fine details at high compression rates by factorizing space-time into a temporal matrix-vector decomposition. This allows us to obtain temporally coherent reconstructions of human actors for long sequences, while representing high-resolution details even in the context of challenging motion. While most research focuses on synthesizing at resolutions of 4MP or lower, we address the challenge of operating at 12MP. To this end, we introduce ActorsHQ, a novel multi-view dataset that provides 12MP footage from 160 cameras for 16 sequences with high-fidelity, per-frame mesh reconstructions. We demonstrate challenges that emerge from using such high-resolution data and show that our newly introduced HumanRF effectively leverages this data, making a significant step towards production-level quality novel view synthesis.
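The matrix-vector factorization of space-time can be illustrated with a rank-R product of spatial feature grids and temporal vectors: a 4D feature is reconstructed as a sum over ranks of (spatial feature at x) times (temporal weight at t), so storage grows with R instead of with the full space-time volume. The dense grids below are toy stand-ins for the paper's multi-resolution structures:

```python
import numpy as np

def spacetime_feature(xi, ti, M, V):
    """Low-rank 4D feature lookup:
    feat(x, t) = sum_r M[r, x, :] * V[r, t].
    M: (R, Nx, D) spatial feature grids; V: (R, Nt) temporal vectors."""
    return np.einsum('rd,r->d', M[:, xi], V[:, ti])
```

Storage is R * (Nx * D + Nt) values instead of Nx * Nt * D for the dense space-time grid, which is what enables long, temporally coherent sequences at high compression rates.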

NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads

We focus on reconstructing high-fidelity radiance fields of human heads, capturing their animations over time, and synthesizing re-renderings from novel viewpoints at arbitrary time steps. To this end, we propose a new multi-view capture setup composed of 16 calibrated machine vision cameras that record time-synchronized images at 7.1 MP resolution and 73 frames per second. With our setup, we collect a new dataset of over 4700 high-resolution, high-framerate sequences of more than 220 human heads, from which we introduce a new human head reconstruction benchmark. The recorded sequences cover a wide range of facial dynamics, including head motions, natural expressions, emotions, and spoken language. In order to reconstruct high-fidelity human heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles (NeRSemble). We represent scene dynamics by combining a deformation field and an ensemble of 3D multi-resolution hash encodings. The deformation field allows for precise modeling of simple scene movements, while the ensemble of hash encodings helps to represent complex dynamics. As a result, we obtain radiance field representations of human heads that capture motion over time and facilitate re-rendering of arbitrary novel viewpoints. In a series of experiments, we explore the design choices of our method and demonstrate that our approach outperforms state-of-the-art dynamic radiance field approaches by a significant margin.
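The hash-ensemble idea can be sketched as a time-weighted blend of several spatial encodings: after a deformation field (omitted here) maps a sample into canonical space, its encoding is a per-timestep weighted sum over the ensemble members. Simple dense grids stand in for multi-resolution hash encodings, and all names are illustrative:

```python
import numpy as np

def ensemble_encoding(x_canon, t, grids, blend):
    """enc(x, t) = sum_m w_m(t) * H_m(x), for x already deformed into
    canonical space. grids: (M, N, D) stand-ins for M hash encodings;
    blend: (T, M) per-timestep blend weights."""
    i = int(x_canon * grids.shape[1]) % grids.shape[1]   # grid cell lookup
    return np.einsum('m,md->d', blend[t], grids[:, i])   # blend M encodings
```

The deformation field handles simple, smooth motion; varying the blend weights over time lets the ensemble capture dynamics a single static encoding could not.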