Nearly every commodity imaging system we directly interact with, or indirectly rely on, leverages power efficient, application-adjustable black-box hardware image signal processing (ISPs) units, running either in dedicated hardware blocks, or as proprietary software modules on programmable hardware. The configuration parameters of these black-box ISPs often have complex interactions with the output image, and must be adjusted prior to deployment according to application-specific quality and performance metrics. Today, this search is commonly performed manually by "golden eye" experts or algorithm developers leveraging domain expertise. We present a fully automatic system to optimize the parameters of black-box hardware and software image processing pipelines according to any arbitrary (i.e., application-specific) metric. We leverage a differentiable mapping between the configuration space and evaluation metrics, parameterized by a convolutional neural network that we train in an end-to-end fashion with imaging hardware in-the-loop. Unlike prior art, our differentiable proxies allow for high-dimension parameter search with stochastic first-order optimizers, without explicitly modeling any lower-level image processing transformations. As such, we can efficiently optimize black-box image processing pipelines for a variety of imaging applications, reducing application-specific configuration times from months to hours. Our optimization method is fully automatic, even with black-box hardware in the loop. We validate our method on experimental data for real-time display applications, object detection, and extreme low-light imaging. The proposed approach outperforms manual search qualitatively and quantitatively for all domain-specific applications tested. When applied to traditional denoisers, we demonstrate that---just by changing hyperparameters---traditional algorithms can outperform recent deep learning methods by a substantial margin on recent benchmarks.
Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost signal to noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google's flagship phone.
We present a practical and robust deep learning solution for capturing and rendering novel views of complex real world scenes for virtual exploration. Previous approaches either require intractably dense view sampling or provide little to no guidance for how users should sample views of a scene to reliably render high-quality novel views. Instead, we propose an algorithm for view synthesis from an irregular grid of sampled views that first expands each sampled view into a local light field via a multiplane image (MPI) scene representation, then renders novel views by blending adjacent local light fields. We extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. In practice, we apply this bound to capture and render views of real world scenes that achieve the perceptual quality of Nyquist rate view sampling while using up to 4000X fewer views. We demonstrate our approach's practicality with an augmented reality smart-phone app that guides users to capture input images of a scene and viewers that enable realtime virtual exploration on desktop and mobile platforms.
In cinema, large camera lenses create beautiful shallow depth of field (DOF), but make focusing difficult and expensive. Accurate cinema focus usually relies on a script and a person to control focus in realtime. Casual videographers often crave cinematic focus, but fail to achieve it. We either sacrifice shallow DOF, as in smartphone videos; or we struggle to deliver accurate focus, as in videos from larger cameras. This paper is about a new approach in the pursuit of cinematic focus for casual videography. We present a system that synthetically renders refocusable video from a deep DOF video shot with a smartphone, and analyzes future video frames to deliver context-aware autofocus for the current frame. To create refocusable video, we extend recent machine learning methods designed for still photography, contributing a new dataset for machine training, a rendering model better suited to cinema focus, and a filtering solution for temporal coherence. To choose focus accurately for each frame, we demonstrate autofocus that looks at upcoming video frames and applies AI-assist modules such as motion, face, audio and saliency detection. We also show that autofocus benefits from machine learning and a large-scale video dataset with focus annotation, where we use our RVR-LAAF GUI to create this sizable dataset efficiently. We deliver, for example, a shallow DOF video where the autofocus transitions onto each person before she begins to speak. This is impossible for conventional camera autofocus because it would require seeing into the future.
Representing smooth geometric shapes by polyhedral meshes can be quite difficult in situations where the variation of edges and face normals is prominently visible. Especially problematic are saddle-shaped areas of the mesh, where typical vertices with six incident edges are ill suited to emulate the more symmetric smooth situation. The importance of a faithful discrete representation is apparent for certain special applications like freeform architecture, but is also relevant for simulation and geometric computing.
In this paper we discuss what exactly is meant by a good representation of saddle points, and how this requirement is stronger than a good approximation of a surface plus its normals. We characterize good saddles in terms of the normal pyramid in a vertex.
We show several ways to design meshes whose normals enjoy small variation (implying good saddle points). For this purpose we define a discrete energy of polyhedral surfaces, which is related to a certain total absolute curvature of smooth surfaces. We discuss the minimizers of both functionals and in particular show that the discrete energy is minimal not for triangle meshes, but for principal quad meshes. We demonstrate our procedures for optimization and interactive design by means of meshes intended for architectural design.
Tutte embedding is one of the most common building blocks in geometry processing algorithms due to its simplicity and provable guarantees. Although provably correct in infinite precision arithmetic, it fails in challenging cases when implemented using floating point arithmetic, largely due to the induced exponential area changes.
We propose Progressive Embedding, with similar theoretical guarantees to Tutte embedding, but more resilient to the rounding error of floating point arithmetic. Inspired by progressive meshes, we collapse edges on an invalid embedding to a valid, simplified mesh, then insert points back while maintaining validity. We demonstrate the robustness of our method by computing embeddings for a large collection of disk topology meshes.
By combining our robust embedding with a variant of the matchmaker algorithm, we propose a general algorithm for the problem of mapping multiply connected domains with arbitrary hard constraints to the plane, with applications in texture mapping and remeshing.
We present a novel algorithm to refine an input atlas with bounded packing efficiency. Central to this method is the use of the axis-aligned structure that converts the general polygon packing problem to a rectangle packing problem, which is easier to achieve high packing efficiency. Given a parameterized mesh with no flipped triangles, we propose a new angle-driven deformation strategy to transform it into a set of axis-aligned charts, which can be decomposed into rectangles by the motorcycle graph algorithm. Since motorcycle graphs are not unique, we select the one balancing the trade-off between the packing efficiency and chart boundary length, while maintaining bounded packing efficiency. The axis-aligned chart often contains greater distortion than the input, so we try to reduce the distortion while bounding the packing efficiency and retaining bijection. We demonstrate the efficacy of our method on a data set containing over five thousand complex models. For all models, our method is able to produce packed atlases with bounded packing efficiency; for example, when the packing efficiency bound is set to 80%, we elongate the boundary length by an average of 78.7% and increase the distortion by an average of 0.0533%. Compared to state-of-the-art methods, our method is much faster and achieves greater packing efficiency.
We study discrete geodesic foliations of surfaces---foliations whose leaves are all approximately geodesic curves---and develop several new variational algorithms for computing such foliations. Our key insight is a relaxation of vector field integrability in the discrete setting, which allows us to optimize for curl-free unit vector fields that remain well-defined near singularities and robustly recover a scalar function whose gradient is well aligned to these fields. We then connect the physics governing surfaces woven out of thin ribbons to the geometry of geodesic foliations, and present a design and fabrication pipeline for approximating surfaces of arbitrary geometry and topology by triaxially-woven structures, where the ribbon layout is determined by a geodesic foliation on a sixfold branched cover of the input surface. We validate the effectiveness of our pipeline on a variety of simulated and fabricated woven designs, including an example for readers to try at home.
Probabilistic distribution models like Gaussian mixtures have shown great potential for improving both the quality and speed of several geometric operators. This is largely due to their ability to model large fuzzy data using only a reduced set of atomic distributions, allowing for large compression rates at minimal information loss. We introduce a new surface model that utilizes these qualities of Gaussian mixtures for the definition and control of a parametric smooth surface. Our approach is based on an enriched mesh data structure, which describes the probability distribution of spatial surface locations around each vertex via a Gaussian covariance matrix. By incorporating this additional covariance information, we show how to define a smooth surface via a nonlinear probabilistic subdivision operator based on products of Gaussians, which is able to capture rich details at fixed control mesh resolution. This entails new applications in surface reconstruction, modeling, and geometric compression.
While bidirectional path tracing is a well-established light transport algorithm, many samples are required to obtain high-quality results for specular-diffuse-glossy or glossy-diffuse-glossy reflections especially when they are highly glossy. To improve the efficiency for such light path configurations, we propose a hierarchical Russian roulette technique for vertex connections. Our technique accelerates a huge number of Russian roulette operations according to an approximate scattering lobe at an eye-subpath vertex for many cached light-subpath vertices. Our method dramatically reduces the number of random number generations needed for Russian roulette by introducing a hierarchical rejection algorithm which assigns random numbers in a top-down fashion. To efficiently reject light vertices in each hierarchy, we also introduce an efficient approximation of anisotropic scattering lobes used for the probability of Russian roulette. Our technique is easy to integrate into some existing bidirectional path tracing-based algorithms which cache light-subpath vertices (e.g., probabilistic connections, and vertex connection and merging). In addition, unlike existing many-light methods, our method does not restrict multiple importance sampling strategies thanks to the simplicity of Russian roulette. Although the proposed technique does not support perfectly specular surfaces, it significantly improves the efficiency for caustics reflected on extremely glossy surfaces in an unbiased fashion.
Multiple Importance Sampling (MIS) is a key technique for achieving robustness of Monte Carlo estimators in computer graphics and other fields. We derive optimal weighting functions for MIS that provably minimize the variance of an MIS estimator, given a set of sampling techniques. We show that the resulting variance reduction over the balance heuristic can be higher than predicted by the variance bounds derived by Veach and Guibas, who assumed only non-negative weights in their proof. We theoretically analyze the variance of the optimal MIS weights and show the relation to the variance of the balance heuristic. Furthermore, we establish a connection between the new weighting functions and control variates as previously applied to mixture sampling. We apply the new optimal weights to integration problems in light transport and show that they allow for new design considerations when choosing the appropriate sampling techniques for a given integration problem.
During the last decade, we have been witnessing the continued development of new time-of-flight imaging devices, and their increased use in numerous and varied applications. However, physics-based rendering techniques that can accurately simulate these devices are still lacking: while existing algorithms are adequate for certain tasks, such as simulating transient cameras, they are very inefficient for simulating time-gated cameras because of the large number of wasted path samples. We take steps towards addressing these deficiencies, by introducing a procedure for efficiently sampling paths with a predetermined length, and incorporating it within rendering frameworks tailored towards simulating time-gated imaging. We use our open-source implementation of the above to empirically demonstrate improved rendering performance in a variety of applications, including simulating proximity sensors, imaging through occlusions, depth-selective cameras, transient imaging in dynamic scenes, and non-line-of-sight imaging.
We present a Monte Carlo rendering framework for the physically-accurate simulation of speckle patterns arising from volumetric scattering of coherent waves. These noise-like patterns are characterized by strong statistical properties, such as the so-called memory effect. These properties are at the core of imaging techniques for applications as diverse as tissue imaging, motion tracking, and non-line-of-sight imaging. Our rendering framework can replicate these properties computationally, in a way that is orders of magnitude more efficient than alternatives based on directly solving the wave equations. At the core of our framework is a path-space formulation for the covariance of speckle patterns arising from a scattering volume, which we derive from first principles. We use this formulation to develop two Monte Carlo rendering algorithms, for computing speckle covariance as well as directly speckle fields. While approaches based on wave equation solvers require knowing the microscopic position of wavelength-sized scatterers, our approach takes as input only bulk parameters describing the statistical distribution of these scatterers inside a volume. We validate the accuracy of our framework by comparing against speckle patterns simulated using wave equation solvers, use it to simulate memory effect observations that were previously only possible through lab measurements, and demonstrate its applicability for computational imaging tasks.
Monte-Carlo Renderers must generate many color samples to produce a noise-free image, and for each of those, they must evaluate complex mathematical models representing the appearance of the objects in the scene. These models are usually in the form of shaders: Small programs that are executed during rendering in order to compute a value for the current sample.
Renderers often compile and optimize shaders just before rendering, taking advantage of the knowledge of the scene. In principle, the entire renderer could benefit from a-priori code generation. For instance, scheduling can take advantage of the knowledge of the scene in order to maximize hardware usage. However, writing such a configurable renderer eventually means writing a compiler that translates a scene description into machine code.
In this paper, we present a framework that allows generating entire renderers for CPUs and GPUs without having to write a dedicated compiler: First, we provide a rendering library in a functional/imperative language that elegantly abstracts the individual rendering concepts using higher-order functions. Second, we use partial evaluation to combine and specialize the individual components of a renderer according to a particular scene.
Our results show that the renderers we generate outperform equivalent high-performance implementations written with state-of-the-art ray tracing libraries on the CPU and GPU.
We propose a stretch-sensing soft glove to interactively capture hand poses with high accuracy and without requiring an external optical setup. We demonstrate how our device can be fabricated and calibrated at low cost, using simple tools available in most fabrication labs. To reconstruct the pose from the capacitive sensors embedded in the glove, we propose a deep network architecture that exploits the spatial layout of the sensor itself. The network is trained only once, using an inexpensive off-the-shelf hand pose reconstruction system to gather the training data. The per-user calibration is then performed on-the-fly using only the glove. The glove's capabilities are demonstrated in a series of ablative experiments, exploring different models and calibration methods. Comparing against commercial data gloves, we achieve a 35% improvement in reconstruction accuracy.
Hybrid unmanned aerial vehicles (UAV) combine advantages of multicopters and fixed-wing planes: vertical take-off, landing, and low energy use. However, hybrid UAVs are rarely used because controller design is challenging due to its complex, mixed dynamics. In this paper, we propose a method to automate this design process by training a mode-free, model-agnostic neural network controller for hybrid UAVs. We present a neural network controller design with a novel error convolution input trained by reinforcement learning. Our controller exhibits two key features: First, it does not distinguish among flying modes, and the same controller structure can be used for copters with various dynamics. Second, our controller works for real models without any additional parameter tuning process, closing the gap between virtual simulation and real fabrication. We demonstrate the efficacy of the proposed controller both in simulation and in our custom-built hybrid UAVs (Figure 1, 8). The experiments show that the controller is robust to exploit the complex dynamics when both rotors and wings are active in flight tests.
Chain reaction contraptions, commonly referred to as Rube Goldberg machines, achieve simple tasks in an intentionally complex fashion via a cascading sequence of events. They are fun, engaging and satisfying to watch. Physically realizing them, however, involves hours or even days of manual trial-and-error effort. The main difficulties lie in predicting failure factors over long chains of events and robustly enforcing an expected causality between parallel chains, especially under perturbations of the layout. We present a computational framework to help design the layout of such contraptions by optimizing their robustness to possible assembly errors. Inspired by the active learning paradigm in machine learning, we propose a generic sampling-based method to progressively approximate the success probability distribution of a given scenario over the design space of possible scene layouts. The success or failure of any given simulation is determined from a user-specified causal graph enforcing a time ordering between expected events. Our method scales to complex causal graphs and high dimensional design spaces by dividing the graph and scene into simpler sub-scenarios. The aggregated success probability distribution is subsequently used to optimize the entire layout. We demonstrate the use of our framework through a range of real world examples of increasing complexity, and report significant improvements over alternative approaches. Code and fabrication diagrams are available on the project page.
Unbiased rendering of general, heterogeneous participating media currently requires using null-collision approaches for estimating transmittance and generating free-flight distances. A long-standing limitation of these approaches, however, is that the corresponding path pdfs cannot be computed due to the black-box nature of the null-collision rejection sampling process. These techniques therefore cannot be combined with other sampling techniques via multiple importance sampling (MIS), which significantly limits their robustness and generality. Recently, Galtier et al.  showed how to derive these algorithms directly from the radiative transfer equation (RTE). We build off this generalized RTE to derive a path integral formulation of null scattering, which reveals the sampling pdfs and allows us to devise new, express existing, and combine complementary unbiased techniques via MIS. We demonstrate the practicality of our theory by combining, for the first time, several path sampling techniques in spatially and spectrally varying media, generalizing and outperforming the prior state of the art.
Transmission of radiation through spatially-correlated media has demonstrated deviations from the classical exponential law of the corresponding uncorrelated media. In this paper, we propose a general, physically-based method for modeling such correlated media with non-exponential decay of transmittance. We describe spatial correlations by introducing the Fractional Gaussian Field (FGF), a powerful mathematical tool that has proven useful in many areas but remains under-explored in graphics. With the FGF, we study the effects of correlations in a unified manner, by modeling both high-frequency, noise-like fluctuations and k-th order fractional Brownian motion (fBm) with a stochastic continuity property. As a result, we are able to reproduce a wide variety of appearances stemming from different types of spatial correlations. Compared to previous work, our method is the first that addresses both short-range and long-range correlations using physically-based fluctuation models. We show that our method can simulate different extents of randomness in spatially-correlated media, resulting in a smooth transition in a range of appearances from exponential falloff to complete transparency. We further demonstrate how our method can be integrated into an energy-conserving RTE framework with a well-designed importance sampling scheme and validate its ability compared to the classical transport theory and previous work.
We generalize photon planes to photon surfaces: a new family of unbiased volumetric density estimators which we combine using multiple importance sampling. To derive our new estimators, we start with the extended path integral which duplicates the vertex at the end of the camera and photon subpaths and couples them using a blurring kernel. To make our formulation unbiased, however, we use a delta kernel to couple these two end points. Unfortunately, sampling the resulting singular integral using Monte Carlo is impossible since the probability of generating a contributing light path by independently sampling the two subpaths is zero. Our key insight is that we can eliminate the delta kernel and make Monte Carlo estimation practical by integrating any three dimensions analytically, and integrating only the remaining dimensions using Monte Carlo. We demonstrate the practicality of this approach by instantiating a collection of estimators which analytically integrate the distance along the camera ray and two arbitrary sampling dimensions along the photon subpath (e.g., distance, direction, surface area). This generalizes photon planes to curved "photon surfaces", including new "photon cone", "photon cylinder", "photon sphere", and multiple new "photon plane" estimators. These estimators allow us to handle light paths not supported by photon planes, including single scattering, and surface-to-media transport. More importantly, since our estimators have complementary strengths due to analytically integrating different dimensions of the path integral, we can combine them using multiple importance sampling. This combination mitigates singularities present in individual estimators, substantially reducing variance while remaining fully unbiased. We demonstrate our improved estimators on a number of scenes containing homogeneous media with highly anisotropic phase functions, accelerating both multiple scattering and single scattering compared to prior techniques.
Human motion capture using video-based or sensor-based methods gives animators the capability to directly translate complex human motions to create lifelike character animations. Advances in motion capture algorithms have improved their accuracy for estimating human generalized motion coordinates (joint angles and body positions). However, the traditional motion capture pipeline is not well suited to measure short duration, high acceleration impacts, such as running and jumping footstrikes. While high acceleration impacts have minimal influence on generalized coordinates, they play a big role in exciting soft tissue dynamics.
Here we present a method for correcting motion capture trajectories using a sparse set of inertial measurement units (IMUs) collecting at high sampling rates to produce more accurate impact accelerations without sacrificing accuracy of the generalized coordinates representing gross motions. We demonstrate the efficacy of our method by correcting human motion captured experimentally using commercial motion capture systems with high rate IMUs sampling at 400Hz during basketball jump shots and running. With our method, we automatically corrected 185 jumping impacts and 1266 running impacts from 5 subjects. Post correction, we found an average increase of 84.6% and 91.1% in pelvis vertical acceleration and ankle dorsiflexion velocity respectively for basketball jump shots, and an average increase of 110% and 237% in pelvis vertical acceleration and ankle plantarflexion velocity respectively for running. In both activities, pelvis vertical position and ankle angle had small corrections on average below 2.0cm and 0.20rad respectively. Finally, when driving a human rig with soft tissue dynamics using corrected motions, we found a 143.4% and 11.2% increase in soft tissue oscillation amplitudes in basketball jump shots and running respectively. Our methodology can be generalized to correct impact accelerations for other body segments, and provide new tools to create realistic soft tissue animations during dynamic activities for more lifelike characters and better motion reconstruction for biomechanical analyses.
Hand-object interaction is challenging to reconstruct but important for many applications like HCI, robotics and so on. Previous works focus on either the hand or the object while we jointly track the hand poses, fuse the 3D object model and reconstruct its rigid and nonrigid motions, and perform all these tasks in real time. To achieve this, we first use a DNN to segment the hand and object in the two input depth streams and predict the current hand pose based on the previous poses by a pre-trained LSTM network. With this information, a unified optimization framework is proposed to jointly track the hand poses and object motions. The optimization integrates the segmented depth maps, the predicted motion, a spatial-temporal varying rigidity regularizer and a real-time contact constraint. A nonrigid fusion technique is further involved to reconstruct the object model. Experiments demonstrate that our method can solve the ambiguity caused by heavy occlusions between hand and object, and generate accurate results for various objects and interacting motions.
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands. Our approach is the first two-hand tracking solution that combines an extensive list of favorable properties, namely it is marker-less, uses a single consumer-level depth camera, runs in real time, handles inter- and intra-hand collisions, and automatically adjusts to the user's hand shape. In order to achieve this, we embed a recent parametric hand pose and shape model and a dense correspondence predictor based on a deep neural network into a suitable energy minimization framework. For training the correspondence prediction network, we synthesize a two-hand dataset based on physical simulations that includes both hand pose and shape annotations while at the same time avoiding inter-hand penetrations. To achieve real-time rates, we phrase the model fitting in terms of a nonlinear least-squares problem so that the energy can be optimized based on a highly efficient GPU-based Gauss-Newton optimizer. We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work, including tight two-hand grasps, significant inter-hand occlusions, and gesture interaction.1
We present the first method to accurately track the invisible jaw based solely on the visible skin surface, without the need for any markers or augmentation of the actor. As such, the method can readily be integrated with off-the-shelf facial performance capture systems. The core idea is to learn a non-linear mapping from the skin deformation to the underlying jaw motion on a dataset where ground-truth jaw poses have been acquired, and then to retarget the mapping to new subjects. Solving for the jaw pose plays a central role in visual effects pipelines, since accurate jaw motion is required when retargeting to fantasy characters and for physical simulation. Currently, this task is performed mostly manually to achieve the desired level of accuracy, and the presented method has the potential to fully automate this labour intense and error prone process.
The generation of quad meshes based on surface parametrization techniques has proven to be a versatile approach. These techniques quantize an initial seamless parametrization so as to obtain an integer grid map implying a pure quad mesh. State-of-the-art methods following this approach have to assume that the surface to be meshed either has no boundary, or has a boundary which the resulting mesh is supposed to be aligned to. In a variety of applications this is not desirable and non-boundary-aligned meshes or grid-parametrizations are preferred. We thus present a technique to robustly generate integer grid maps which are either boundary-aligned, non-boundary-aligned, or partially boundary-aligned, just as required by different applications. We thereby generalize previous work to this broader setting. This enables the reliable generation of trimmed quad meshes with partial elements along the boundary, preferable in various scenarios, from tiled texturing over design and modeling to fabrication and architecture, due to fewer constraints and hence higher overall mesh quality and other benefits in terms of aesthetics and flexibility.
We propose a robust 2D meshing algorithm, TriWild, to generate curved triangles reproducing smooth feature curves, leading to coarse meshes designed to match the simulation requirements necessary by applications and avoiding the geometrical errors introduced by linear meshes. The robustness and effectiveness of our technique are demonstrated by batch processing an SVG collection of 20k images, and by comparing our results against state of the art linear and curvilinear meshing algorithms. We demonstrate for our algorithm the practical utility of computing diffusion curves, fluid simulations, elastic deformations, and shape inflation on complex 2D geometries.
This paper tackles the challenging problem of constrained hexahedral meshing. An algorithm is introduced to build combinatorial hexahedral meshes whose boundary facets exactly match a given quadrangulation of the topological sphere. This algorithm is the first practical solution to the problem. It is able to compute small hexahedral meshes of quadrangulations for which the previously known best solutions could only be built by hand or contained thousands of hexahedra. These challenging quadrangulations include the boundaries of transition templates that are critical for the success of general hexahedral meshing algorithms.
The algorithm proposed in this paper is dedicated to building combinatorial hexahedral meshes of small quadrangulations and ignores the geometrical problem. The key idea of the method is to exploit the equivalence between quad flips in the boundary and the insertion of hexahedra glued to this boundary. The tree of all sequences of flipping operations is explored, searching for a path that transforms the input quadrangulation Q into a new quadrangulation for which a hexahedral mesh is known. When a small hexahedral mesh exists, a sequence transforming Q into the boundary of a cube is found; otherwise, a set of pre-computed hexahedral meshes is used.
A novel approach to deal with the large number of problem symmetries is proposed. Combined with an efficient backtracking search, it allows small shellable hexahedral meshes to be found for all even quadrangulations with up to 20 quadrangles. All 54, 943 such quadrangulations were meshed using no more than 72 hexahedra. This algorithm is also used to find a construction to fill arbitrary domains, thereby proving that any ball-shaped domain bounded by n quadrangles can be meshed with no more than 78 n hexahedra. This very significantly lowers the previous upper bound of 5396 n.
We introduce the notion of harmonic triangulations: a harmonic triangulation simultaneously minimizes the Dirichlet energy of all piecewise linear functions. By a famous result of Rippa, Delaunay triangulations are the harmonic triangulations of planar point sets. We prove by explicit counterexample that in 3D a harmonic triangulation does not exist in general. However, we show that bistellar flips are harmonic: if they decrease Dirichlet energy for one set of function values, they do so for all. This observation gives rise to the notion of locally harmonic triangulations. We demonstrate that locally harmonic triangulations can be efficiently computed, and efficiently reduce sliver tetrahedra. The notion of harmonic triangulation also gives rise to a scalar measure of the quality of a triangulation, which can be used to prioritize flips and optimize the position of vertices. Tetrahedral meshes generated by optimizing this function generally show better quality than Delaunay-based optimization techniques.
We present a data structure that makes it easy to run a large class of algorithms from computational geometry and scientific computing on extremely poor-quality surface meshes. Rather than changing the geometry, as in traditional remeshing, we consider intrinsic triangulations which connect vertices by straight paths along the exact geometry of the input mesh. Our key insight is that such a triangulation can be encoded implicitly by storing the direction and distance to neighboring vertices. The resulting signpost data structure then allows geometric and topological queries to be made on-demand by tracing paths across the surface. Existing algorithms can be easily translated into the intrinsic setting, since this data structure supports the same basic operations as an ordinary triangle mesh (vertex insertions, edge splits, etc.). The output of intrinsic algorithms can then be stored on an ordinary mesh for subsequent use; unlike previous data structures, we use a constant amount of memory and do not need to explicitly construct an overlay mesh unless it is specifically requested. Working in the intrinsic setting incurs little computational overhead, yet we can run algorithms on extremely degenerate inputs, including all manifold meshes from the Thingi10k data set. To evaluate our data structure we implement several fundamental geometric algorithms including intrinsic versions of Delaunay refinement and optimal Delaunay triangulation, approximation of Steiner trees, adaptive mesh refinement for PDEs, and computation of Poisson equations, geodesic distance, and flip-free tangent vector fields.
In volume-rendering applications, it is a de facto standard to reconstruct the underlying continuous function by using trilinear interpolation, and to estimate the gradients for the shading computations by calculating central differences on the fly. In a GPU implementation, this requires seven trilinear texture samples: one for the function reconstruction, and six for the gradient estimation. In this paper, for the first time, we show that the six additional samples can be used not just for gradient estimation, but for significantly improving the quality of the function reconstruction as well. As the additional arithmetic operations can be performed in the shadow of the texture fetches, we can achieve this quality improvement for free without reducing the rendering performance at all. Therefore, our method can completely replace the standard trilinear interpolation in the practice of GPU-accelerated volume rendering.
Procedural pattern synthesis is a fundamental tool of Computer Graphics, ubiquitous in games and special effects. By calling a single procedure in every pixel - or voxel - large quantities of details are generated at low cost, enhancing textures, producing complex structures within and along surfaces. Such procedures are typically implemented as pixel shaders.
We propose a novel procedural pattern synthesis technique that exhibits desirable properties for modeling highly contrasted patterns, that are especially well suited to produce surface and microstructure details. In particular, our synthesizer affords for a precise control over the profile, orientation and distribution of the produced stochastic patterns, while allowing to grade all these parameters spatially.
Our technique defines a stochastic smooth phase field - a phasor noise - that is then fed into a periodic function (e.g. a sine wave), producing an oscillating field with prescribed main frequencies and preserved contrast oscillations. In addition, the profile of each oscillation is directly controllable (e.g. sine wave, sawtooth, rectangular or any 1D profile). Our technique builds upon a reformulation of Gabor noise in terms of a phasor field that affords for a clear separation between local intensity and phase.
Applications range from texturing to modeling surface displacements, as well as multi-material microstructures in the context of additive manufacturing.
We tackle the problem of texture synthesis in the setting where many input images are given and a large-scale output is required. We build on recent generative adversarial networks and propose two extensions in this paper. First, we propose an algorithm to combine outputs of GANs trained on a smaller resolution to produce a large-scale plausible texture map with virtually no boundary artifacts. Second, we propose a user interface to enable artistic control. Our quantitative and qualitative results showcase the generation of synthesized high-resolution maps consisting of up to hundreds of megapixels as a case in point.
Despite the recent success of GANs in synthesizing images conditioned on inputs such as a user sketch, text, or semantic labels, manipulating the high-level attributes of an existing natural photograph with GANs is challenging for two reasons. First, it is hard for GANs to precisely reproduce an input image. Second, after manipulation, the newly synthesized pixels often do not fit the original image. In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image. Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image. We demonstrate our interactive system on several semantic image editing tasks, including synthesizing new objects consistent with background, removing unwanted objects, and changing the appearance of an object. Quantitative and qualitative comparisons against several existing methods demonstrate the effectiveness of our method.
Facial Landmark detection in natural images is a very active research domain. Impressive progress has been made in recent years, with the rise of neural-network based methods and large-scale datasets. However, it is still a challenging and largely unexplored problem in the artistic portraits domain. Compared to natural face images, artistic portraits are much more diverse. They contain a much wider style variation in both geometry and texture and are more complex to analyze. Moreover, datasets that are necessary to train neural networks are unavailable.
We propose a method for artistic augmentation of natural face images that enables training deep neural networks for landmark detection in artistic portraits. We utilize conventional facial landmarks datasets, and transform their content from natural images into "artistic face" images. In addition, we use a feature-based landmark correction step, to reduce the dependency between the different facial features, which is necessary due to position and shape variations of facial landmarks in artworks. To evaluate our landmark detection framework, we created an "Artistic-Faces" dataset, containing 160 artworks of various art genres, artists and styles, with a large variation in both geometry and texture. Using our method, we can detect facial features in artistic portraits and analyze their geometric style. This allows the definition of signatures for artistic styles of artworks and artists, that encode both the geometry and the texture style. It also allows us to present a geometric-aware style transfer method for portraits.
Photographers take wide-angle shots to enjoy expanding views, group portraits that never miss anyone, or composite subjects with spectacular scenery background. In spite of the rapid proliferation of wide-angle cameras on mobile phones, a wider field-of-view (FOV) introduces a stronger perspective distortion. Most notably, faces are stretched, squished, and skewed, to look vastly different from real-life. Correcting such distortions requires professional editing skills, as trivial manipulations can introduce other kinds of distortions. This paper introduces a new algorithm to undistort faces without affecting other parts of the photo. Given a portrait as an input, we formulate an optimization problem to create a content-aware warping mesh which locally adapts to the stereographic projection on facial regions, and seamlessly evolves to the perspective projection over the background. Our new energy function performs effectively and reliably for a large group of subjects in the photo. The proposed algorithm is fully automatic and operates at an interactive rate on the mobile platform. We demonstrate promising results on a wide range of FOVs from 70° to 120°.
Despite recent developments towards on-demand, individualized garment design and fabrication, the majority of processes in the fashion industry are still inefficient and heavily dependent on manual work. A significant amount of recent research in this area has been focused on supporting designers to digitally create sewing patterns and shapes, but there is little work on textured fabrics. Aligning textile patterns like stripes or plaid along garment seams requires an experienced tailor and is thus reserved only for expensive, high-end garments. We present an interactive algorithm for automatically aligning repetitive textile patterns along seams for a given garment, allowing a user to make design choices at each step of our pipeline. Our approach is based on the 17 wallpaper groups and the symmetries they exhibit. We exploit these symmetries to optimize the alignment of the sewing pattern with the textured fabric for each of its pieces, determining where to cut the fabric. We optionally alter the sewing pattern slightly for a perfect fit along seams, without visibly changing the 3D shape of the garment. The pieces can then be cut automatically by a CNC or laser cutter. Our approach fits within the pipeline of digital garment design, eliminating the difficult, manual step of aligning and cutting the garment pieces by hand.
Industrial knitting machines are commonly used to manufacture complicated shapes from yarns; however, designing patterns for these machines requires extensive training. We present the first general visual programming interface for creating 3D objects with complex surface finishes on industrial knitting machines. At the core of our interface is a new, augmented, version of the stitch mesh data structure. The augmented stitch mesh stores low-level knitting operations per-face and encodes the dependencies between faces using directed edge labels. Our system can generate knittable augmented stitch meshes from 3D models, allows users to edit these meshes in a way that preserves their knittability, and can schedule the execution order and location of each face for production on a knitting machine. Our system is general, in that its knittability-preserving editing operations are sufficient to transform between any two machine-knittable stitch patterns with the same orientation on the same surface. We demonstrate the power and flexibility of our pipeline by using it to create and knit objects featuring a wide range of patterns and textures, including intarsia and Fair Isle colorwork; knit and purl textures; cable patterns; and laces.
Some artists peel citrus fruits into a variety of elegant 2D shapes, depicting animals, plants, and cartoons. It is a creative art form, called Citrus Peeling Art. This art form follows the conservation principle, i.e., each shape must be created using one entire peel. Central to this art is finding optimal cut lines so that the citruses can be cut and unfolded into the desired shapes. However, it is extremely difficult for users to imagine and generate cuts for their desired shapes. To this end, we present a computational method for citrus peeling art designs. Our key insight is that instead of solving the difficult cut generation problem, we map a designed input shape onto a citrus in an attempt to cover the entire citrus and use the mapped boundary to generate the cut paths. Sometimes, a mapped shape is unable to completely cover a citrus. Consequently, we have developed five customized ways of interaction that are used to rectify the input shape so that it is suitable for citrus peeling art. The mapping process and user interactions are iteratively conducted to satisfy a user's design intentions. A large number of experiments, including a formative user study, demonstrate the capability and practicability of our method for peeling art design and construction.
Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection model used in tomographic imaging. The approach is supervised directly from 2D images in a multi-view capture setting and does not require explicit reconstruction or tracking of the object. Our method has two primary components: an encoder-decoder network that transforms input images into a 3D volume representation, and a differentiable ray-marching operation that enables end-to-end training. By virtue of its 3D representation, our construction extrapolates better to novel viewpoints compared to screen-space rendering techniques. The encoder-decoder architecture learns a latent representation of a dynamic scene that enables us to produce novel content sequences not seen during training. To overcome memory limitations of voxel-based representations, we learn a dynamic irregular grid structure implemented with a warp field during ray-marching. This structure greatly improves the apparent resolution and reduces grid-like artifacts and jagged motion. Finally, we demonstrate how to incorporate surface-based representations into our volumetric-learning framework for applications where the highest resolution is required, using facial performance capture as a case in point.
The modern computer graphics pipeline can synthesize images at remarkable visual quality; however, it requires well-defined, high-quality 3D content as input. In this work, we explore the use of imperfect 3D content, for instance, obtained from photo-metric reconstructions with noisy and incomplete surface geometry, while still aiming to produce photo-realistic (re-)renderings. To address this challenging problem, we introduce Deferred Neural Rendering, a new paradigm for image synthesis that combines the traditional graphics pipeline with learnable components. Specifically, we propose Neural Textures, which are learned feature maps that are trained as part of the scene capture process. Similar to traditional textures, neural textures are stored as maps on top of 3D mesh proxies; however, the high-dimensional feature maps contain significantly more information, which can be interpreted by our new deferred neural rendering pipeline. Both neural textures and deferred neural renderer are trained end-to-end, enabling us to synthesize photo-realistic images even when the original 3D content was imperfect. In contrast to traditional, black-box 2D generative neural networks, our 3D representation gives us explicit control over the generated output, and allows for a wide range of application domains. For instance, we can synthesize temporally-consistent video re-renderings of recorded 3D scenes as our representation is inherently embedded in 3D space. This way, neural textures can be utilized to coherently re-render or manipulate existing video content in both static and dynamic environments at real-time rates. We show the effectiveness of our approach in several experiments on novel view synthesis, scene editing, and facial reenactment, and compare to state-of-the-art approaches that leverage the standard graphics pipeline as well as conventional generative neural networks.
A key promise of Virtual Reality (VR) is the possibility of remote social interaction that is more immersive than any prior telecommunication media. However, existing social VR experiences are mediated by inauthentic digital representations of the user (i.e., stylized avatars). These stylized representations have limited the adoption of social VR applications in precisely those cases where immersion is most necessary (e.g., professional interactions and intimate conversations). In this work, we present a bidirectional system that can animate avatar heads of both users' full likeness using consumer-friendly headset mounted cameras (HMC). There are two main challenges in doing this: unaccommodating camera views and the image-to-avatar domain gap. We address both challenges by leveraging constraints imposed by multiview geometry to establish precise image-to-avatar correspondence, which are then used to learn an end-to-end model for real-time tracking. We present designs for a training HMC, aimed at data-collection and model building, and a tracking HMC for use during interactions in VR. Correspondence between the avatar and the HMC-acquired images are automatically found through self-supervised multiview image translation, which does not require manual annotation or one-to-one correspondence between domains. We evaluate the system on a variety of users and demonstrate significant improvements over prior work.
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
We present an analysis of anisotropic hyperelasticity, specifically transverse isotropy, that obtains closed-form expressions for the eigendecompositions of many common energies. We then use these to build fast and concise Newton implementations. We leverage our analysis in two separate applications. First, we show that existing anisotropic energies are not inversion-safe, and contain spurious stable states under large deformation. We then propose a new anisotropic strain invariant that enables the formulation of a novel, robust, and inversion-safe energy. The new energy fits completely within our analysis, so closed-form expressions are obtained for its eigensystem as well. Secondly, we use our analysis to rehabilitate badly-conditioned finite elements. Using this method, we can robustly simulate large deformations even when a mesh contains degenerate, zero-volume elements. We accomplish this by swapping the badly-behaved isotropic direction with a well-behaved anisotropic term. We validate our approach on a variety of examples.
Simulation methods are rapidly advancing the accuracy, consistency and controllability of elastodynamic modeling and animation. Critical to these advances, we require efficient time step solvers that reliably solve all implicit time integration problems for elastica. While available time step solvers succeed admirably in some regimes, they become impractically slow, inaccurate, unstable, or even divergent in others --- as we show here. Towards addressing these needs we present the Decomposed Optimization Time Integrator (DOT), a new domain-decomposed optimization method for solving the per time step, nonlinear problems of implicit numerical time integration. DOT is especially suitable for large time step simulations of deformable bodies with nonlinear materials and high-speed dynamics. It is efficient, automated, and robust at large, fixed-size time steps, thus ensuring stable, continued progress of high-quality simulation output. Across a broad range of extreme and mild deformation dynamics, using frame-rate size time steps with widely varying object shapes and mesh resolutions, we show that DOT always converges to user-set tolerances, generally well-exceeding and always close to the best wall-clock times across all previous nonlinear time step solvers, irrespective of the deformation applied.
Affine transformations are of vital importance in many tasks pertaining to motion design and animation. Interpolation of affine transformations is non-trivial. Typically, the given affine transformation is decomposed into simpler components which are easier to interpolate. This may lead to unintuitive results, while in some cases, such solutions may not work. In this work, we propose an interpolation framework which is based on a Lie group representation of the affine transformation. The Lie group representation decomposes the given transformation into simpler and meaningful components, on which computational tools like the exponential and logarithm maps are available in closed form. Interpolation exists for all affine transformations while preserving a few characteristics of the original transformation. A detailed analysis and several experiments of the proposed framework are included.
Using joint actuators to drive the skeletal movements is a common practice in character animation, but the resultant torque patterns are often unnatural or infeasible for real humans to achieve. On the other hand, physiologically-based models explicitly simulate muscles and tendons and thus produce more human-like movements and torque patterns. This paper introduces a technique to transform an optimal control problem formulated in the muscle-actuation space to an equivalent problem in the joint-actuation space, such that the solutions to both problems have the same optimal value. By solving the equivalent problem in the joint-actuation space, we can generate human-like motions comparable to those generated by musculotendon models, while retaining the benefit of simple modeling and fast computation offered by joint-actuation models. Our method transforms constant bounds on muscle activations to nonlinear, state-dependent torque limits in the joint-actuation space. In addition, the metabolic energy function on muscle activations is transformed to a nonlinear function of joint torques, joint configuration and joint velocity. Our technique can also benefit policy optimization using deep reinforcement learning approach, by providing a more anatomically realistic action space for the agent to explore during the learning process. We take the advantage of the physiologically-based simulator, OpenSim, to provide training data for learning the torque limits and the metabolic energy function. Once trained, the same torque limits and the energy function can be applied to drastically different motor tasks formulated as either trajectory optimization or policy learning.
Many anatomical factors, such as bone geometry and muscle condition, interact to affect human movements. This work aims to build a comprehensive musculoskeletal model and its control system that reproduces realistic human movements driven by muscle contraction dynamics. The variations in the anatomic model generate a spectrum of human movements ranging from typical to highly stylistic movements. To do so, we discuss scalable and reliable simulation of anatomical features, robust control of under-actuated dynamical systems based on deep reinforcement learning, and modeling of pose-dependent joint limits. The key technical contribution is a scalable, two-level imitation learning algorithm that can deal with a comprehensive full-body musculoskeletal model with 346 muscles. We demonstrate the predictive simulation of dynamic motor skills under anatomical conditions including bone deformity, muscle weakness, contracture, and the use of a prosthesis. We also simulate various pathological gaits and predictively visualize how orthopedic surgeries improve post-operative gaits.
Playing with a soccer ball is not easy even for a real human because of dynamic foot contacts with the moving ball while chasing and controlling it. The problem of online full-body soccer motion synthesis is challenging and has not been fully solved yet. In this paper, we present a novel motion control system that produces physically-correct full-body soccer motions: dribbling forward, dribbling to the side, and shooting, in response to an online user motion prescription specified by a motion type, a running speed, and a turning angle. This system performs two tightly-coupled tasks: data-driven motion prediction and physics-based motion synthesis. Given example motion data, the former synthesizes a reference motion in accordance with an online user input and further refines the motion to make the character kick the ball at a right time and place. Provided with the reference motion, the latter then adopts a Model Predictive Control (MPC) framework to generate a physically-correct soccer motion, by solving an optimal control problem that is formulated based on dynamics for a full-body character and the moving ball together with their interactions. Our demonstration shows the effectiveness of the proposed system that synthesizes convincing full-body soccer motions in various scenarios such as adjusting the desired running speed of the character, changing the velocity and the mass of the ball, and maintaining balance against external forces.
Analyzing human motion is a challenging task with a wide variety of applications in computer vision and in graphics. One such application, of particular importance in computer animation, is the retargeting of motion from one performer to another. While humans move in three dimensions, the vast majority of human motions are captured using video, requiring 2D-to-3D pose and camera recovery, before existing retargeting approaches may be applied. In this paper, we present a new method for retargeting video-captured motion between different human performers, without the need to explicitly reconstruct 3D poses and/or camera parameters.
In order to achieve our goal, we learn to extract, directly from a video, a high-level latent motion representation, which is invariant to the skeleton geometry and the camera view. Our key idea is to train a deep neural network to decompose temporal sequences of 2D poses into three components: motion, skeleton, and camera view-angle. Having extracted such a representation, we are able to re-combine motion with novel skeletons and camera views, and decode a retargeted temporal sequence, which we compare to a ground truth from a synthetic dataset.
We demonstrate that our framework can be used to robustly extract human motion from videos, bypassing 3D reconstruction, and outperforming existing retargeting methods, when applied to videos in-the-wild. It also enables additional applications, such as performance cloning, video-driven cartoons, and motion retrieval.
The goal of light transport acquisition is to take images from a sparse set of lighting and viewing directions, and combine them to enable arbitrary relighting with changing view. While relighting from sparse images has received significant attention, there has been relatively less progress on view synthesis from a sparse set of "photometric" images---images captured under controlled conditions, lit by a single directional source; we use a spherical gantry to position the camera on a sphere surrounding the object. In this paper, we synthesize novel viewpoints across a wide range of viewing directions (covering a 60° cone) from a sparse set of just six viewing directions. While our approach relates to previous view synthesis and image-based rendering techniques, those methods are usually restricted to much smaller baselines, and are captured under environment illumination. At our baselines, input images have few correspondences and large occlusions; however we benefit from structured photometric images. Our method is based on a deep convolutional network trained to directly synthesize new views from the six input views. This network combines 3D convolutions on a plane sweep volume with a novel per-view per-depth plane attention map prediction network to effectively aggregate multi-view appearance. We train our network with a large-scale synthetic dataset of 1000 scenes with complex geometry and material properties. In practice, it is able to synthesize novel viewpoints for captured real data and reproduces complex appearance effects like occlusions, view-dependent specularities and hard shadows. Moreover, the method can also be combined with previous relighting techniques to enable changing both lighting and view, and applied to computer vision problems like multiview stereo from sparse image sets.
We present a novel technique to relight images of human faces by learning a model of facial reflectance from a database of 4D reflectance field data of several subjects in a variety of expressions and viewpoints. Using our learned model, a face can be relit in arbitrary illumination environments using only two original images recorded under spherical color gradient illumination. The output of our deep network indicates that the color gradient images contain the information needed to estimate the full 4D reflectance field, including specular reflections and high frequency details. While capturing spherical color gradient illumination still requires a special lighting setup, reduction to just two illumination conditions allows the technique to be applied to dynamic facial performance capture. We show side-by-side comparisons which demonstrate that the proposed system outperforms the state-of-the-art techniques in both realism and speed.
We propose the first learning-based algorithm that can relight images in a plausible and controllable manner given multiple views of an outdoor scene. In particular, we introduce a geometry-aware neural network that utilizes multiple geometry cues (normal maps, specular direction, etc.) and source and target shadow masks computed from a noisy proxy geometry obtained by multi-view stereo. Our model is a three-stage pipeline: two subnetworks refine the source and target shadow masks, and a third performs the final relighting. Furthermore, we introduce a novel representation for the shadow masks, which we call RGB shadow images. They reproject the colors from all views into the shadowed pixels and enable our network to cope with inacuraccies in the proxy and the non-locality of the shadow casting interactions. Acquiring large-scale multi-view relighting datasets for real scenes is challenging, so we train our network on photorealistic synthetic data. At train time, we also compute a noisy stereo-based geometric proxy, this time from the synthetic renderings. This allows us to bridge the gap between the real and synthetic domains. Our model generalizes well to real scenes. It can alter the illumination of drone footage, image-based renderings, textured mesh reconstructions, and even internet photo collections.
Lighting plays a central role in conveying the essence and depth of the subject in a portrait photograph. Professional photographers will carefully control the lighting in their studio to manipulate the appearance of their subject, while consumer photographers are usually constrained to the illumination of their environment. Though prior works have explored techniques for relighting an image, their utility is usually limited due to requirements of specialized hardware, multiple images of the subject under controlled or known illuminations, or accurate models of geometry and reflectance. To this end, we present a system for portrait relighting: a neural network that takes as input a single RGB image of a portrait taken with a standard cellphone camera in an unconstrained environment, and from that image produces a relit image of that subject as though it were illuminated according to any provided environment map. Our method is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights. Our proposed technique produces quantitatively superior results on our dataset's validation set compared to prior works, and produces convincing qualitative relighting results on a dataset of hundreds of real-world cellphone portraits. Because our technique can produce a 640 × 640 image in only 160 milliseconds, it may enable interactive user-facing photographic applications in the future.
Users frequently seek to fabricate objects whose outer surfaces consist of regions with different surface attributes, such as color or material. Manufacturing such objects in a single piece is often challenging or even impossible. The alternative is to partition them into single-attribute volumetric parts that can be fabricated separately and then assembled to form the target object. Facilitating this approach requires partitioning the input model into parts that conform to the surface segmentation and that can be moved apart with no collisions. We propose Surface2Volume, a partition algorithm capable of producing such assemblable parts, each of which is affiliated with a single attribute, the outer surface of whose assembly conforms to the input surface geometry and segmentation. In computing the partition we strictly enforce conformity with surface segmentation and assemblability, and optimize for ease of fabrication by minimizing part count, promoting part simplicity, and simplifying assembly sequencing. We note that computing the desired partition requires solving for three types of variables: per-part assembly trajectories, partition topology, i.e. the connectivity of the interface surfaces separating the different parts, and the geometry, or location, of these interfaces. We efficiently produce the desired partitions by addressing one type of variables at a time: first computing the assembly trajectories, then determining interface topology, and finally computing interface locations that allow parts assemblability. We algorithmically identify inputs that necessitate sequential assembly, and partition these inputs gradually by computing and disassembling a subset of assemblable parts at a time. We demonstrate our method's robustness and versatility by employing it to partition a range of models with complex surface segmentations into assemblable parts. We further validate our framework via output fabrication and comparisons to alternative partition techniques.
Most additive manufacturing processes fabricate objects by stacking planar layers of solidified material. As a result, produced parts exhibit a so-called staircase effect, which results from sampling slanted surfaces with parallel planes. Using thinner slices reduces this effect, but it always remains visible where layers almost align with the input surfaces.
In this research we exploit the ability of some additive manufacturing processes to deposit material slightly out of plane to dramatically reduce these artifacts. We focus in particular on the widespread Fused Filament Fabrication (FFF) technology, since most printers in this category can deposit along slightly curved paths, under deposition slope and thickness constraints.
Our algorithm curves the layers, making them either follow the natural slope of the input surface or on the contrary, make them intersect the surfaces at a steeper angle thereby improving the sampling quality. Rather than directly computing curved layers, our algorithm optimizes for a deformation of the model which is then sliced with a standard planar approach. We demonstrate that this approach enables us to encode all fabrication constraints, including the guarantee of generating collision-free toolpaths, in a convex optimization that can be solved using a QP solver.
We produce a variety of models and compare print quality between curved deposition and planar slicing.
We present a method for designing mechanical metamaterials based on the novel concept of Voronoi diagrams induced by star-shaped metrics. As one of its central advantages, our approach supports interpolation between arbitrary metrics. This capability opens up a rich space of structures with interesting aesthetics and a wide range of mechanical properties, including isotropic, tetragonal, orthotropic, as well as smoothly graded materials. We evaluate our method by creating large sets of example structures, provided as accompanying material. We validate the mechanical properties predicted by simulation through tensile tests on a set of physical prototypes.
We present X-shells, a new class of deployable structures formed by an ensemble of elastically deforming beams coupled through rotational joints. An X-shell can be assembled conveniently in a flat configuration from standard elastic beam elements and then deployed through force actuation into the desired 3D target state. During deployment, the coupling imposed by the joints will force the beams to twist and buckle out of plane to maintain a state of static equilibrium. This complex interaction of discrete joints and continuously deforming beams allows interesting 3D forms to emerge. Simulating X-shells is challenging, however, due to unstable equilibria at the onset of beam buckling. We propose an optimization-based simulation framework building on a discrete rod model that robustly handles such difficult scenarios by analyzing and appropriately modifying the elastic energy Hessian. This real-time simulation method forms the basis of a computational design tool for X-shells that enables interactive design space exploration by varying and optimizing design parameters to achieve a specific design intent. We jointly optimize the assembly state and the deployed configuration to ensure the geometric and structural integrity of the deployable X-shell. Once a design is finalized, we also optimize for a sparse distribution of actuation forces to efficiently deploy it from its flat assembly state to its 3D target state. We demonstrate the effectiveness of our design approach with a number of design studies that highlight the richness of the X-shell design space, enabling new forms not possible with existing approaches. We validate our computational model with several physical prototypes that show excellent agreement with the optimized digital models.
We present an autonomous scanning approach which allows multiple robots to perform collaborative scanning for dense 3D reconstruction of unknown indoor scenes. Our method plans scanning paths for several robots, allowing them to efficiently coordinate with each other such that the collective scanning coverage and reconstruction quality is maximized while the overall scanning effort is minimized. To this end, we define the problem as a dynamic task assignment and introduce a novel formulation based on Optimal Mass Transport (OMT). Given the currently scanned scene, a set of task views are extracted to cover scene regions which are either unknown or uncertain. These task views are assigned to the robots based on the OMT optimization. We then compute for each robot a smooth path over its assigned tasks by solving an approximate traveling salesman problem. In order to showcase our algorithm, we implement a multi-robot auto-scanning system. Since our method is computationally efficient, we can easily run it in real time on commodity hardware, and combine it with online RGB-D reconstruction approaches. In our results, we show several real-world examples of large indoor environments; in addition, we build a benchmark with a series of carefully designed metrics for quantitatively evaluating multi-robot autoscanning. Overall, we are able to demonstrate high-quality scanning results with respect to reconstruction quality and scanning efficiency, which significantly outperforms existing multi-robot exploration systems.
The Iterative Closest Point (ICP) algorithm, commonly used for alignment of 3D models, has previously been defined using either a point-to-point or point-to-plane objective. Alternatively, researchers have proposed computationally-expensive methods that directly minimize the distance function between surfaces. We introduce a new symmetrized objective function that achieves the simplicity and computational efficiency of point-to-plane optimization, while yielding improved convergence speed and a wider convergence basin. In addition, we present a linearization of the objective that is exact in the case of exact correspondences. We experimentally demonstrate the improved speed and convergence basin of the symmetric objective, on both smooth models and challenging cases involving noise and partial overlap.
Computed tomography has emerged as the method of choice for scanning complex shapes as well as interior structures of stationary objects. Recent progress has also allowed the use of CT for analyzing deforming objects and dynamic phenomena, although the deformations have been constrained to be either slow or periodic motions.
In this work we improve the tomographic reconstruction of time-varying geometries undergoing faster, non-periodic deformations. Our method uses a warp-and-project approach that allows us to introduce an essentially continuous time axis where consistency of the reconstructed shape with the projection images is enforced for the specific time and deformation state at which the image was captured. The method uses an efficient, time-adaptive solver that yields both the moving geometry as well as the deformation field.
We validate our method with extensive experiments using both synthetic and real data from a range of different application scenarios.
A basic challenge in field-guided hexahedral meshing is to find a spatially-varying frame that is adapted to the domain geometry and is continuous up to symmetries of the cube. We introduce a fundamentally new representation of such 3D cross fields based on Cartan's method of moving frames. Our key observation is that cross fields and ordinary frame fields are locally characterized by identical conditions on their Darboux derivative. Hence, by using derivatives as the principal representation (and only later recovering the field itself), one avoids the need to explicitly account for symmetry during optimization. At the discrete level, derivatives are encoded by skew-symmetric matrices associated with the edges of a tetrahedral mesh; these matrices encode arbitrarily large rotations along each edge, and can robustly capture singular behavior even on coarse meshes. We apply this representation to compute 3D cross fields that are as smooth as possible everywhere but on a prescribed network of singular curves---since these fields are adapted to curve tangents, they can be directly used as input for field-guided mesh generation algorithms. Optimization amounts to an easy nonlinear least squares problem that behaves like a convex program in the sense that it always appears to produce the same result, independent of initialization. We study the numerical behavior of this procedure, and perform some preliminary experiments with mesh generation.
We propose an algorithm that interpolates between vector and frame fields on triangulated surfaces, designed to complement field design methods in geometry processing and simulation. Our algorithm is based on a polar construction, leveraging a conservation law from the Hopf-Poincaré theorem to match singular points using ideas from optimal transport; the remaining detail of the field is interpolated using straightforward machinery. Our model is designed with topology in mind, sliding singular points along the surface rather than having them appear and disappear, and it caters to all surface topologies, including boundary and generator loops.
Optimal transport research has surged in the last decade with wide applications in computer graphics. In most cases, however, it has focused on the special case of the so-called "balanced" optimal transport problem, that is, the problem of optimally matching positive measures of equal total mass. While this approach is suitable for handling probability distributions as their total mass is always equal to one, it precludes other applications manipulating disparate measures. Our paper proposes a fast approach to the optimal transport of constant distributions supported on point sets of different cardinality via one-dimensional slices. This leads to one-dimensional partial assignment problems akin to alignment problems encountered in genomics or text comparison. Contrary to one-dimensional balanced optimal transport that leads to a trivial linear-time algorithm, such partial optimal transport, even in 1-d, has not seen any closed-form solution nor very efficient algorithms to date. We provide the first efficient 1-d partial optimal transport solver. Along with a quasilinear time problem decomposition algorithm, it solves 1-d assignment problems consisting of up to millions of Dirac distributions within fractions of a second in parallel. We handle higher dimensional problems via a slicing approach, and further extend the popular iterative closest point algorithm using optimal transport - an algorithm we call Fast Iterative Sliced Transport. We illustrate our method on computer graphics applications such a color transfer and point cloud registration.
Polygonal meshes provide an efficient representation for 3D shapes. They explicitly captureboth shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby, generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of MeshCNN on various learning tasks applied to 3D meshes.
We present SAGNet, a structure-aware generative model for 3D shapes. Given a set of segmented objects of a certain class, the geometry of their parts and the pairwise relationships between them (the structure) are jointly learned and embedded in a latent space by an autoencoder. The encoder intertwines the geometry and structure features into a single latent code, while the decoder disentangles the features and reconstructs the geometry and structure of the 3D model. Our autoencoder consists of two branches, one for the structure and one for the geometry. The key idea is that during the analysis, the two branches exchange information between them, thereby learning the dependencies between structure and geometry and encoding two augmented features, which are then fused into a single latent code. This explicit intertwining of information enables separately controlling the geometry and the structure of the generated models. We evaluate the performance of our method and conduct an ablation study. We explicitly show that encoding of shapes accounts for both similarities in structure and geometry. A variety of quality results generated by SAGNet are presented.
Next generation smart and augmented reality systems demand a computational understanding of monocular footage that captures humans in physical spaces to reveal plausible object arrangements and human-object interactions. Despite recent advances, both in scene layout and human motion analysis, the above setting remains challenging to analyze due to regular occlusions that occur between objects and human motions. We observe that the interaction between object arrangements and human actions is often strongly correlated, and hence can be used to help recover from these occlusions. We present iMapper, a data-driven method to identify such human-object interactions and utilize them to infer layouts of occluded objects. Starting from a monocular video with detected 2D human joint positions that are potentially noisy and occluded, we first introduce the notion of interaction-saliency as space-time snapshots where informative human-object interactions happen. Then, we propose a global optimization to retrieve and fit interactions from a database to the detected salient interactions in order to best explain the input video. We extensively evaluate the approach, both quantitatively against manually annotated ground truth and through a user study, and demonstrate that iMapper produces plausible scene layouts for scenes with medium to heavy occlusion. Code and data are available on the project page.
We present an approach to the accurate and efficient large-scale simulation of the complex dynamics of ferrofluids based on physical principles. Ferrofluids are liquids containing magnetic particles that react to an external magnetic field without solidifying. In this contribution, we employ smooth magnets to simulate ferrofluids in contrast to previous methods based on the finite element method or point magnets. We solve the magnetization using the analytical solution of the smooth magnets' field, and derive the bounded magnetic force formulas addressing particle penetration. We integrate the magnetic field and force evaluations into the fast multipole method allowing for efficient large-scale simulations of ferrofluids. The presented simulations are well reproducible since our approach can be easily incorporated into a framework implementing a Fast Multipole Method and a Smoothed Particle Hydrodynamics fluid solver with surface tension. We provide a detailed analysis of our approach and validate our results against real wet lab experiments. This work can potentially open the door for a deeper understanding of ferrofluids and for the identification of new areas of applications of these materials.
While pressure forces are often the bottleneck in (near-)inviscid fluid simulations, viscosity can impose orders of magnitude greater computational costs at lower Reynolds numbers. We propose an implicit octree finite difference discretization that significantly accelerates the solution of the free surface viscosity equations using adaptive staggered grids, while supporting viscous buckling and rotation effects, variable viscosity, and interaction with scripted moving solids. In experimental comparisons against regular grids, our method reduced the number of active velocity degrees of freedom by as much as a factor of 7.7 and reduced linear system solution times by factors between 3.8 and 9.4. We achieve this by developing a novel adaptive variational finite difference methodology for octrees and applying it to the optimization form of the viscosity problem. This yields a linear system that is symmetric positive definite by construction, unlike naive finite difference/volume methods, and much sparser than a hypothetical finite element alternative. Grid refinement studies show spatial convergence at first order in L∞ and second order in L1, while the significantly smaller size of the octree linear systems allows for the solution of viscous forces at higher effective resolutions than with regular grids. We demonstrate the practical benefits of our adaptive scheme by replacing the regular grid viscosity step of a commercial liquid simulator (Houdini) to yield large speed-ups, and by incorporating it into an existing inviscid octree simulator to add support for viscous flows. Animations of viscous liquids pouring, bending, stirring, buckling, and melting illustrate that our octree method offers significant computational gains and excellent visual consistency with its regular grid counterpart.
The materials around us usually exist as mixtures of constituents, each constituent with possibly a different elasto-viscoplastic property. How can we describe the material property of such a mixture is the core question of this paper. We propose a nonlinear blending model that can capture intriguing flowing behaviors that can differ from that of the individual constituents (Fig. 1). We used a laboratory device, rheometer, to measure the flowing properties of various fluid-like foods, and found that an elastic Herschel-Bulkley model has nice agreements with the measured data even for the mixtures of these foods. We then constructed a blending model such that it qualitatively agrees with the measurements and is closed in the parameter space of the elastic Herschel-Bulkley model. We provide validations through comparisons between the measured and estimated properties using our model, and comparisons between simulated examples and captured footages. We show the utility of our model for producing interesting behaviors of various mixtures.
Popular Virtual Reality (VR) tools allow users to draw varying-width, ribbonlike 3D brush strokes by moving a hand-held controller in 3D space. Artists frequently use dense collections of such strokes to draw virtual 3D shapes. We propose SurfaceBrush, a surfacing method that converts such VR drawings into user-intended manifold free-form 3D surfaces, providing a novel approach for modeling 3D shapes. The inputs to our method consist of dense collections of artist-drawn stroke ribbons described by the positions and normals of their central polylines, and ribbon widths. These inputs are highly distinct from those handled by existing surfacing frameworks and exhibit different sparsity and error patterns, necessitating a novel surfacing approach. We surface the input stroke drawings by identifying and leveraging local coherence between nearby artist strokes. In particular, we observe that strokes intended to be adjacent on the artist imagined surface often have similar tangent directions along their respective polylines. We leverage this local stroke direction consistency by casting the computation of the user-intended manifold surface as a constrained matching problem on stroke polyline vertices and edges. We first detect and smoothly connect adjacent similarly-directed sequences of stroke edges producing one or more manifold partial surfaces. We then complete the surfacing process by identifying and connecting adjacent similarly directed edges along the borders of these partial surfaces. We confirm the usability of the SurfaceBrush interface and the validity of our drawing analysis via an observational study. We validate our stroke surfacing algorithm by demonstrating an array of manifold surfaces computed by our framework starting from a range of inputs of varying complexity, and by comparing our outputs to reconstructions computed using alternative means.
We suggest a rasterization pipeline tailored towards the needs of HMDs, where latency and field-of-view requirements pose new challenges beyond those of traditional desktop displays. Instead of image warping for low latency, or using multiple passes for foveation, we show how both can be produced directly in a single perceptual rasterization pass. We do this with per-fragment ray-casting. This is enabled by derivations of tight space-time-fovea pixel bounds, introducing just enough flexibility for the requisite geometric tests, but retaining most of the simplicity and efficiency of the traditional rasterizaton pipeline. To produce foveated images, we rasterize to an image with spatially varying pixel density. To compensate for latency, we extend the image formation model to directly produce "rolling" images where the time at each pixel depends on its display location. Our approach overcomes limitations of warping with respect to disocclusions, object motion and view-dependent shading, as well as geometric aliasing artifacts in other foveated rendering techniques. A set of perceptual user studies demonstrates the efficacy of our approach.
Current rendering techniques struggle to fulfill quality and power efficiency requirements imposed by new display devices such as virtual reality headsets. A promising solution to overcome these problems is foveated rendering, which exploits gaze information to reduce rendering quality for the peripheral vision where the requirements of the human visual system are significantly lower. Most of the current solutions model the sensitivity as a function of eccentricity, neglecting the fact that it also is strongly influenced by the displayed content. In this work, we propose a new luminance-contrast-aware foveated rendering technique which demonstrates that the computational savings of foveated rendering can be significantly improved if local luminance contrast of the image is analyzed. To this end, we first study the resolution requirements at different eccentricities as a function of luminance patterns. We later use this information to derive a low-cost predictor of the foveated rendering parameters. Its main feature is the ability to predict the parameters using only a low-resolution version of the current frame, even though the prediction holds for high-resolution rendering. This property is essential for the estimation of required quality before the full-resolution image is rendered. We demonstrate that our predictor can efficiently drive the foveated rendering technique and analyze its benefits in a series of user experiments.
We present a near-eye augmented reality display with resolution and focal depth dynamically driven by gaze tracking. The display combines a traveling microdisplay relayed off a concave half-mirror magnifier for the high-resolution foveal region, with a wide field-of-view peripheral display using a projector-based Maxwellian-view display whose nodal point is translated to follow the viewer's pupil during eye movements using a traveling holographic optical element. The same optics relay an image of the eye to an infrared camera used for gaze tracking, which in turn drives the foveal display location and peripheral nodal point. Our display supports accommodation cues by varying the focal depth of the microdisplay in the foveal region, and by rendering simulated defocus on the "always in focus" scanning laser projector used for peripheral display. The resulting family of displays significantly improves on the field-of-view, resolution, and form-factor tradeoff present in previous augmented reality designs. We show prototypes supporting 30, 40 and 60 cpd foveal resolution at a net 85° × 78° field of view per eye.
We present Vidgets, a family of mechanical widgets, specifically push buttons and rotary knobs that augment mobile devices with tangible user interfaces. When these widgets are attached to a mobile device and a user interacts with them, the widgets' nonlinear mechanical response shifts the device slightly and quickly, and this subtle motion can be detected by the accelerometer commonly equipped on mobile devices. We propose a physics-based model to understand the nonlinear mechanical response of widgets. This understanding enables us to design tactile force profiles of these widgets so that the resulting accelerometer signals become easy to recognize. We then develop a lightweight signal processing algorithm that analyzes the accelerometer signals and recognizes how the user interacts with the widgets in real time. Vidgets widgets are low-cost, compact, reconfigurable, and power efficient. They can form a diverse set of physical interfaces that enrich users' interactions with mobile devices in various practical scenarios. We demonstrate their use in three applications: photo capture with single-handed zoom, control of mobile games, and making a playable mobile music instrument.
Character animation tools are based on a keyframing metaphor where artists pose characters at selected keyframes and the software automatically interpolates the frames inbetween. Although the quality of the interpolation is critical for achieving a fluid and engaging animation, the tools available to adjust the result of the automatic inbetweening are rudimentary and typically require manual editing of spline parameters. As a result, artists spend a tremendous amount of time posing and setting more keyframes. In this pose-centric workflow, animators use combinations of forward and inverse kinematics. While forward kinematics leads to intuitive interpolations, it does not naturally support positional constraints such as fixed contact points. Inverse kinematics can be used to fix certain points in space at keyframes, but can lead to inferior interpolations, is slow to compute, and does not allow for positional contraints at non-keyframe frames. In this paper, we address these problems by formulating the control of interpolations with positional constraints over time as a space-time optimization problem in the tangent space of the animation curves driving the controls. Our method has the key properties that it (1) allows the manipulation of positions and orientations over time, extending inverse kinematics, (2) does not add new keyframes that might conflict with an artist's preferred keyframe style, and (3) works in the space of artist editable animation curves and hence integrates seamlessly with current pipelines. We demonstrate the utility of the technique in practice via various examples and use cases.
Creating animations for robotic characters is very challenging due to the constraints imposed by their physical nature. In particular, the combination of fast motions and unavoidable structural deformations leads to mechanical oscillations that negatively affect their performances. Our goal is to automatically transfer motions created using traditional animation software to robotic characters while avoiding such artifacts. To this end, we develop an optimization-based, dynamics-aware motion retargeting system that adjusts an input motion such that visually salient low-frequency, large amplitude vibrations are suppressed. The technical core of our animation system consists of a differentiable dynamics simulator that provides constraint-based two-way coupling between rigid and flexible components. We demonstrate the efficacy of our method through experiments performed on a total of five robotic characters including a child-sized animatronic figure that features highly dynamic drumming and boxing motions.
We present a computational framework for robotic animation of real-world string puppets. Also known as marionettes, these articulated figures are typically brought to life by human puppeteers. The puppeteer manipulates rigid handles that are attached to the puppet from above via strings. The motions of the marionette are therefore governed largely by gravity, the pull forces exerted by the strings, and the internal forces arising from mechanical articulation constraints. This seemingly simple setup conceals a very challenging and nuanced control problem, as marionettes are, in fact, complex coupled pendulum systems. Despite this, in the hands of a master puppeteer, marionette animation can be nothing short of mesmerizing. Our goal is to enable autonomous robots to animate marionettes with a level of skill that approaches that of human puppeteers. To this end, we devise a predictive control model that accounts for the dynamics of the marionette and kinematics of the robot puppeteer. The input to our system consists of a string puppet design and a target motion, and our trajectory planning algorithm computes robot control actions that lead to the marionette moving as desired. We validate our methodology through a series of experiments conducted on an array of marionette designs and target motions. These experiments are performed both in simulation and using a physical robot, the human-sized, dual arm ABB YuMi® IRB 14000.
It is well known that the dynamics of articulated rigid bodies can be solved in O(n) time using a recursive method, where n is the number of joints. However, when elasticity is added between the bodies (e.g., damped springs), with linearly implicit integration, the stiffness matrix in the equations of motion breaks the tree topology of the system, making the recursive O(n) method inapplicable. In such cases, the only alternative has been to form and solve the system matrix, which takes O(n3) time. We propose a new approach that is capable of solving the linearly implicit equations of motion in near linear time. Our method, which we call RedMax, is built using a combined reduced/maximal coordinate formulation. This hybrid model enables direct flexibility to apply arbitrary combinations of constraints and contact modeling in both reduced and maximal coordinates, as well as mixtures of implicit and explicit forces in either coordinate representation. We highlight RedMax's flexibility with seamless integration of deformable objects with two-way coupling, at a standard additional cost. We further highlight its flexibility by constructing an efficient internal (joint) and external (environment) frictional contact solver that can leverage bilateral joint constraints for rapid evaluation of frictional articulated dynamics.
We introduce a novel approach to measure the behavior of a geometric operator before and after coarsening. By comparing eigenvectors of the input operator and its coarsened counterpart, we can quantitatively and visually analyze how well the spectral properties of the operator are maintained. Using this measure, we show that standard mesh simplification and algebraic coarsening techniques fail to maintain spectral properties. In response, we introduce a novel approach for spectral coarsening. We show that it is possible to significantly reduce the sampling density of an operator derived from a 3D shape without affecting the low-frequency eigenvectors. By marrying techniques developed within the algebraic multigrid and the functional maps literatures, we successfully coarsen a variety of isotropic and anisotropic operators while maintaining sparsity and positive semi-definiteness. We demonstrate the utility of this approach for applications including operatorsensitive sampling, shape matching, and graph pooling for convolutional neural networks.
Establishing high-quality correspondence maps between geometric shapes has been shown to be the fundamental problem in managing geometric shape collections. Prior work has focused on computing efficient maps between pairs of shapes, and has shown a quantifiable benefit of joint map synchronization, where a collection of shapes are used to improve (denoise) the pairwise maps for consistency and correctness. However, these existing map synchronization techniques place very strong assumptions on the input shapes collection such as all the input shapes fall into the same category and/or the majority of the input pairwise maps are correct. In this paper, we present a multiple map synchronization approach that takes a heterogeneous shape collection as input and simultaneously outputs consistent dense pairwise shape maps. We achieve our goal by using a novel tensor-based representation for map synchronization, which is efficient and robust than all prior matrix-based representations. We demonstrate the usefulness of this approach across a wide range of geometric shape datasets and the applications in shape clustering and shape co-segmentation.
We introduce a new example-based approach to video stylization, with a focus on preserving the visual quality of the style, user controllability and applicability to arbitrary video. Our method gets as input one or more keyframes that the artist chooses to stylize with standard painting tools. It then automatically propagates the stylization to the rest of the sequence. To facilitate this while preserving visual quality, we developed a new type of guidance for state-of-art patch-based synthesis, that can be applied to any type of video content and does not require any additional information besides the video itself and a user-specified mask of the region to be stylized. We further show a temporal blending approach for interpolating style between keyframes that preserves texture coherence, contrast and high frequency details. We evaluate our method on various scenes from real production setting and provide a thorough comparison with prior art.
A common way to view a 360° video on a 2D display is to crop and render a part of the video as a normal field-of-view (NFoV) video. While users can enjoy natural-looking NFoV videos using this approach, they need to constantly make manual adjustment of the viewing direction not to miss interesting events in the video. In this paper, we propose an interactive and automatic navigation system for comfortable 360° video playback. Our system finds a virtual camera path that shows the most salient areas through the video, generates a NFoV video based on the path, and plays it in an online manner. A user can interactively change the viewing direction while watching a video, and the system instantly updates the path reflecting the intention of the user. To enable online processing, we design our system consisting of an offline pre-processing step, and an online 360° video navigation step. The pre-processing step computes optical flow and saliency scores for an input video. Based on these, the online video navigation step computes an optimal camera path reflecting user interaction, and plays a NFoV video in an online manner. For improved user experience, we also introduce optical flow-based camera path planning, saliency-aware path update, and adaptive control of the temporal window size. Our experimental results including user studies show that our system provides more pleasant experience of watching 360° videos than existing approaches.
We present an inverse design tool for fabric formwork - a process where flat panels are sewn together to form a fabric container for casting a plaster sculpture. Compared to 3D printing techniques, the benefit of fabric formwork is its properties of low-cost and easy transport. The process of fabric formwork is akin to molding and casting but having a soft boundary. Deformation of the fabric container is governed by force equilibrium between the pressure forces from liquid fill and tension in the stretched fabric. The final result of fabrication depends on the shapes of the flat panels, the fabrication orientation and the placement of external supports. Our computational framework generates optimized flat panels and fabrication orientation with reference to a target shape, and determines effective locations for external supports. We demonstrate the function of this design tool on a variety of models with different shapes and topology. Physical fabrication is also demonstrated to validate our approach.
We propose a novel technique for the automatic design of molds to cast highly complex shapes. The technique generates composite, two-piece molds. Each mold piece is made up of a hard plastic shell and a flexible silicone part. Thanks to the thin, soft, and smartly shaped silicone part, which is kept in place by a hard plastic shell, we can cast objects of unprecedented complexity. An innovative algorithm based on a volumetric analysis defines the layout of the internal cuts in the silicone mold part. Our approach can robustly handle thin protruding features and intertwined topologies that have caused previous methods to fail. We compare our results with state of the art techniques, and we demonstrate the casting of shapes with extremely complex geometry.
Commercially available full-color 3D printing allows for detailed control of material deposition in a volume, but an exact reproduction of a target surface appearance is hampered by the strong subsurface scattering that causes nontrivial volumetric cross-talk at the print surface. Previous work showed how an iterative optimization scheme based on accumulating absorptive materials at the surface can be used to find a volumetric distribution of print materials that closely approximates a given target appearance.
In this work, we first revisit the assumption that pushing the absorptive materials to the surface results in minimal volumetric cross-talk. We design a full-fledged optimization on a small domain for this task and confirm this previously reported heuristic. Then, we extend the above approach that is critically limited to color reproduction on planar surfaces, to arbitrary 3D shapes. Our method enables high-fidelity color texture reproduction on 3D prints by effectively compensating for internal light scattering within arbitrarily shaped objects. In addition, we propose a content-aware gamut mapping that significantly improves color reproduction for the pathological case of thin geometric features. Using a wide range of sample objects with complex textures and geometries, we demonstrate color reproduction whose fidelity is superior to state-of-the-art drivers for color 3D printers.
With the advance of personal and customized fabrication techniques, the capability to embed information in physical objects becomes evermore crucial. We present LayerCode, a tagging scheme that embeds a carefully designed barcode pattern in 3D printed objects as a deliberate byproduct of the 3D printing process. The LayerCode concept is inspired by the structural resemblance between the parallel black and white bars of the standard barcode and the universal layer-by-layer approach of 3D printing. We introduce an encoding algorithm that enables the 3D printing layers to carry information without altering the object geometry. We also introduce a decoding algorithm that reads the LayerCode tag of a physical object by just taking a photo. The physical deployment of LayerCode tags is realized on various types of 3D printers, including Fused Deposition Modeling printers as well as Stereolithography based printers. Each offers its own advantages and tradeoffs. We show that LayerCode tags can work on complex, nontrivial shapes, on which all previous tagging mechanisms may fail. To evaluate LayerCode thoroughly, we further stress test it with a large dataset of complex shapes using virtual rendering. Among 4,835 tested shapes, we successfully encode and decode on more than 99% of the shapes.
A significant fraction of the world's population have experienced virtual characters through games and movies, and the possibility of online VR social experiences may greatly extend this audience. At present, the skin deformation for interactive and real-time characters is typically computed using geometric skinning methods. These methods are efficient and simple to implement, but obtaining quality results requires considerable manual "rigging" effort involving trial-and-error weight painting, the addition of virtual helper bones, etc. The recently introduced Delta Mush algorithm largely solves this rig authoring problem, but its iterative computational approach has prevented direct adoption in real-time engines.
This paper introduces Direct Delta Mush, a new algorithm that simultaneously improves on the efficiency and control of Delta Mush while generalizing previous algorithms. Specifically, we derive a direct rather than iterative algorithm that has the same ballpark computational form as some previous geometric weight blending algorithms. Straightforward variants of the algorithm are then proposed to further optimize computational and storage cost with insignificant quality losses. These variants are equivalent to special cases of several previous skinning algorithms.
Our algorithm simultaneously satisfies the goals of reasonable efficiency, quality, and ease of authoring. Further, its explicit decomposition of rotational and translational effects allows independent control over bending versus twisting deformation, as well as a skin sliding effect.
We present a deep-learning-based method to automatically compute skin weights for skeleton-based deformation of production characters. Given a character mesh and its associated skeleton hierarchy in rest pose, our method constructs a graph for the mesh, each node of which encodes the mesh-skeleton attributes of a vertex. An end-to-end deep graph convolution network is then introduced to learn the mesh-skeleton binding patterns from a set of character models with skin weights painted by artists. The network can be used to predict the skin weight map for a new character model, which describes how the skeleton hierarchy influences the mesh vertices during deformation. Our method is designed to work for non-manifold meshes with multiple disjoint or intersected components, which are common in game production and require complex skeleton hierarchies for animation control. We tested our method on the datasets of two commercial games. Experiments show that the predicted skin weight maps can be readily applied to characters in the production pipeline to generate high-quality deformations.
We demonstrate how to acquire complete human hand bone anatomy (meshes) in multiple poses using magnetic resonance imaging (MRI). Such acquisition was previously difficult because MRI scans must be long for high-precision results (over 10 minutes) and because humans cannot hold the hand perfectly still in non-trivial and badly supported poses. We invent a manufacturing process whereby we use lifecasting materials commonly employed in film special effects industry to generate hand molds, personalized to the subject, and to each pose. These molds are both ergonomic and encasing, and they stabilize the hand during scanning. We also demonstrate how to efficiently segment the MRI scans into individual bone meshes in all poses, and how to correspond each bone's mesh to same mesh connectivity across all poses. Next, we interpolate and extrapolate the MRI-acquired bone meshes to the entire range of motion of the hand, producing an accurate data-driven animation-ready rig for bone meshes. We also demonstrate how to acquire not just bone geometry (using MRI) in each pose, but also a matching highly accurate surface geometry (using optical scanners) in each pose, modeling skin pores and wrinkles. We also give a soft tissue Finite Element Method simulation "rig", consisting of novel tet meshing for stability at the joints, spatially varying geometric and material detail, and quality constraints to the acquired skeleton kinematic rig. Given an animation sequence of hand joint angles, our FEM soft tissue rig produces quality hand surface shapes in arbitrary poses in the hand range of motion. Our results qualitatively reproduce important features seen in the photographs of the subject's hand, such as similar overall organic shape and fold formation.
Imaging objects outside a camera's direct line of sight has important applications in robotic vision, remote sensing, and many other domains. Time-of-flight-based non-line-of-sight (NLOS) imaging systems have recently demonstrated impressive results, but several challenges remain. Image formation and inversion models have been slow or limited by the types of hidden surfaces that can be imaged. Moreover, non-planar sampling surfaces and non-confocal scanning methods have not been supported by efficient NLOS algorithms. With this work, we introduce a wave-based image formation model for the problem of NLOS imaging. Inspired by inverse methods used in seismology, we adapt a frequency-domain method, f-k migration, for solving the inverse NLOS problem. Unlike existing NLOS algorithms, f-k migration is both fast and memory efficient, it is robust to specular and other complex reflectance properties, and we show how it can be used with non-confocally scanned measurements as well as for non-planar sampling surfaces. f-k migration is more robust to measurement noise than alternative methods, generally produces better quality reconstructions, and is easy to implement. We experimentally validate our algorithms with a new NLOS imaging system that records room-sized scenes outdoors under indirect sunlight, and scans persons wearing retroreflective clothing at interactive rates.
Traditional snapshot hyperspectral imaging systems include various optical elements: a dispersive optical element (prism), a coded aperture, several relay lenses, and an imaging lens, resulting in an impractically large form factor. We seek an alternative, minimal form factor of snapshot spectral imaging based on recent advances in diffractive optical technology. We thereupon present a compact, diffraction-based snapshot hyperspectral imaging method, using only a novel diffractive optical element (DOE) in front of a conventional, bare image sensor. Our diffractive imaging method replaces the common optical elements in hyperspectral imaging with a single optical element. To this end, we tackle two main challenges: First, the traditional diffractive lenses are not suitable for color imaging under incoherent illumination due to severe chromatic aberration because the size of the point spread function (PSF) changes depending on the wavelength. By leveraging this wavelength-dependent property alternatively for hyperspectral imaging, we introduce a novel DOE design that generates an anisotropic shape of the spectrally-varying PSF. The PSF size remains virtually unchanged, but instead the PSF shape rotates as the wavelength of light changes. Second, since there is no dispersive element and no coded aperture mask, the ill-posedness of spectral reconstruction increases significantly. Thus, we propose an end-to-end network solution based on the unrolled architecture of an optimization procedure with a spatial-spectral prior, specifically designed for deconvolution-based spectral reconstruction. Finally, we demonstrate hyperspectral imaging with a fabricated DOE attached to a conventional DSLR sensor. Results show that our method compares well with other state-of-the-art hyperspectral imaging methods in terms of spectral accuracy and spatial resolution, while our compact, diffraction-based spectral imaging method uses only a single optical element on a bare image sensor.
Simulating viscoelastic polymers and polymeric fluids requires a robust and accurate capture of elasticity and viscosity. The computation is known to become very challenging under large deformations and high viscosity. Drawing inspirations from return mapping based elastoplasticity treatment for granular materials, we present a finite strain integration scheme for general viscoelastic solids under arbitrarily large deformation and non-equilibrated flow. Our scheme is based on a predictor-corrector exponential mapping scheme on the principal strains from the deformation gradient, which closely resembles the conventional treatment for elastoplasticity and allows straightforward implementation into any existing constitutive models. We develop a new Material Point Method that is fully implicit on both elasticity and inelasticity using augmented Lagrangian optimization with various preconditioning strategies for highly efficient time integration. Our method not only handles viscoelasticity but also supports existing elastoplastic models including Drucker-Prager and von-Mises in a unified manner. We demonstrate the efficacy of our framework on various examples showing intricate and characteristic inelastic dynamics with competitive performance.
We present two new approaches for animating dynamic fracture involving large elastoplastic deformation. In contrast to traditional mesh-based techniques, where sharp discontinuity is introduced to split the continuum at crack surfaces, our methods are based on Continuum Damage Mechanics (CDM) with a variational energy-based formulation for crack evolution. Our first approach formulates the resulting dynamic material damage evolution with a Ginzburg-Landau type phase-field equation and discretizes it with the Material Point Method (MPM), resulting in a coupled momentum/damage solver rooted in phase field fracture: PFF-MPM. Although our PFF-MPM approach achieves convincing fracture with or without plasticity, we also introduce a return mapping algorithm that can be analytically solved for a wide range of general non-associated plasticity models, achieving more than two times speedup over traditional iterative approaches. To demonstrate the efficacy of the algorithm, we also develop a Non-Associated Cam-Clay (NACC) plasticity model with a novel fracture-friendly hardening scheme. Our NACC plasticity paired with traditional MPM composes a second approach to dynamic fracture, as it produces a breadth of organic, brittle material fracture effects on its own. Though NACC and PFF can be combined, we focus on exploring their material effects separately. Both methods can be easily integrated into any existing MPM solver, enabling the simulation of various fracturing materials with extremely high visual fidelity while requiring little additional computational overhead.
We propose a robust method for untangling an arbitrary number of cloth layers, possibly exhibiting deep interpenetrations, to a collision-free state, ready for animation. Our method relies on an intermediate, implicit representation to solve the problem: the user selects a few garments stored in a library together with their implicit approximations, and places them over a mannequin while specifying the desired order between layers. The intersecting implicit surfaces are then combined using a new family of N-ary composition operators, specially designed for untangling layers. Garment meshes are finally projected to the deformed implicit surfaces in linear time, while best preserving triangles and avoiding loss of details.
Each of the untangling operators computes the target surface for a given garment in a single step, while accounting for the order between cloth layers and their individual thicknesses. As a group, they guarantee an intersection-free output configuration. Moreover, a weight can be associated with each layer to tune their relative influence during untangling, such as leather being less deformed than cloth. Results for each layer then reflect the combined effect of the other layers, enabling us to output a plausible configuration in contact regions. As our results show, our method can be used to generate plausible, new static shapes of garments when underwear has been added, as well as collision-free configurations enabling a user to safely launch animations of arbitrarily complex layered clothing.
We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the performance of previous methods, which considered a limited subset of schedules. We define a parameterization of possible schedules much larger than prior methods and use a variant of beam search to search over it. The search optimizes runtime predicted by a cost model based on a combination of new derived features and machine learning. We train the cost model by generating and featurizing hundreds of thousands of random programs and schedules. We show that this approach operates effectively with or without autotuning. It produces schedules which are on average almost twice as fast as the existing Halide autoscheduler without autotuning, or more than twice as fast with, and is the first automatic scheduling algorithm to significantly outperform human experts on average.
We propose a new modal sound synthesis method that rapidly estimates all acoustic transfer fields of a linear modal vibration model, and greatly reduces preprocessing costs. Instead of performing a separate frequency-domain Helmholtz radiation analysis for each mode, our method partitions vibration modes into chords using optimal mode conflation, then performs a single time-domain wave simulation for each chord. We then perform transfer deconflation on each chord's time-domain radiation field using a specialized QR solver, and thereby extract the frequency-domain transfer functions of each mode. The precomputed transfer functions are represented for fast far-field evaluation, e.g., using multipole expansions. In this paper, we propose to use a single scalar-valued Far-field Acoustic Transfer (FFAT) cube map. We describe a GPU-accelerated vector wavesolver that achieves high-throughput acoustic transfer computation at accuracy sufficient for sound synthesis. Our implementation, KleinPAT, can achieve hundred- to thousand-fold speedups compared to existing Helmholtz-based transfer solvers, thereby enabling large-scale generation of modal sound models for audio-visual applications.
A typical rainfall scenario contains tens of thousands of dynamic sound sources. A characteristic of the large-scale scene is the strong randomness in raindrop distribution, which makes it notoriously expensive to synthesize such sounds with purely physical methods. Moreover, the raindrops hitting different surfaces (liquid or various solids) can emit distinct sounds, for which prior methods with unified impact sound models are ill-suited.
In this paper, we present a physically-based statistical simulation method to synthesize realistic rain sound, which respects surface materials. We first model the raindrop sound with two mechanisms, namely the initial impact and the subsequent pulsation of entrained bubbles. Then we generate material sound textures (MSTs) based on a specially designed signal decomposition and reconstruction model. This allows us to distinguish liquid surface with bubble sound and different solid surfaces with MSTs. Furthermore, we build a basic rain sound (BR-sound) bank with the proposed raindrop sound clustering method based on a statistical model, and design a sound source activator for simulating spatial propagation in an efficient manner. This novel method drastically decreases the computational cost while producing convincing sound results. Various experiments demonstrate the effectiveness of our sound simulation model.
We propose a new method for reconstructing an implicit surface from an un-oriented point set. While existing methods often involve non-trivial heuristics and require additional constraints, such as normals or labelled points, we introduce a direct definition of the function from the points as the solution to a constrained quadratic optimization problem. The definition has a number of appealing features: it uses a single parameter (parameter-free for exact interpolation), applies to any dimensions, commutes with similarity transformations, and can be easily implemented without discretizing the space. More importantly, the use of a global smoothness energy allows our definition to be much more resilient to sampling imperfections than existing methods, making it particularly suited for sparse and non-uniform inputs.
Denoising has proven to be useful to efficiently generate high-quality Monte Carlo renderings. Traditional pixel-based denoisers exploit summary statistics of a pixel's sample distributions, which discards much of the samples' information and limits their denoising power. On the other hand, sample-based techniques tend to be slow and have difficulties handling general transport scenarios. We present the first convolutional network that can learn to denoise Monte Carlo renderings directly from the samples. Learning the mapping between samples and images creates new challenges for the network architecture design: the order of the samples is arbitrary, and they should be treated in a permutation invariant manner. To address these challenges, we develop a novel kernel-predicting architecture that splats individual samples onto nearby pixels. Splatting is a natural solution to situations such as motion blur, depth-of-field and many light transport paths, where it is easier to predict which pixels a sample contributes to, rather than a gather approach that needs to figure out, for each pixel, which samples (or nearby pixels) are relevant. Compared to previous state-of-the-art methods, ours is robust to the severe noise of low-sample count images (e.g. 8 samples per pixel) and yields higher-quality results both visually and numerically. Our approach retains the generality and efficiency of pixel-space methods while enjoying the expressiveness and accuracy of the more complex sample-based approaches.
It has been shown that rendering in the gradient domain, i.e., estimating finite difference gradients of image intensity using correlated samples, and combining them with direct estimates of pixel intensities by solving a screened Poisson problem, often offers fundamental benefits over merely sampling pixel intensities. The reasons can be traced to the frequency content of the light transport integrand and its interplay with the gradient operator. However, while they often yield state of the art performance among algorithms that are based on Monte Carlo sampling alone, gradient-domain rendering algorithms have, until now, not generally been competitive with techniques that combine Monte Carlo sampling with post-hoc noise removal using sophisticated non-linear filtering.
Drawing on the power of modern convolutional neural networks, we propose a novel reconstruction method for gradient-domain rendering. Our technique replaces the screened Poisson solver of previous gradient-domain techniques with a novel dense variant of the U-Net autoencoder, additionally taking auxiliary feature buffers as inputs. We optimize our network to minimize a perceptual image distance metric calibrated to the human visual system. Our results significantly improve the quality obtained from gradient-domain path tracing, allowing it to overtake state-of-the-art comparison techniques that denoise traditional Monte Carlo samplings. In particular, we observe that the correlated gradient samples --- that offer information about the smoothness of the integrand unavailable in standard Monte Carlo sampling --- notably improve image quality compared to an equally powerful neural model that does not make use of gradient samples.
Subsurface scattering, in which light refracts into a translucent material to interact with its interior, is the dominant mode of light transport in many types of organic materials. Accounting for this phenomenon is thus crucial for visual realism, but explicit simulation of the complex internal scattering process is often too costly. BSSRDF models based on analytic transport solutions are significantly more efficient but impose severe assumptions that are almost always violated, e.g. planar geometry, isotropy, low absorption, and spatio-directional separability. The resulting discrepancies between model and usage lead to objectionable errors in renderings, particularly near geometric features that violate planarity.
This article introduces a new shape-adaptive BSSRDF model that retains the efficiency of prior analytic methods while greatly improving overall accuracy. Our approach is based on a conditional variational autoencoder, which learns to sample from a reference distribution produced by a brute-force volumetric path tracer. In contrast to the path tracer, our autoencoder directly samples outgoing locations on the object surface, bypassing a potentially lengthy internal scattering process.
The distribution is conditional on both material properties and a set of features characterizing geometric variation in a neighborhood of the incident location. We use a low-order polynomial to model the local geometry as an implicitly defined surface, capturing curvature, thickness, corners, as well as cylindrical and toroidal regions. We present several examples of objects with challenging medium parameters and complex geometry and compare to ground truth simulations and prior work.
In this paper, we introduce BiMocq2, an unconditionally stable, pure Eulerianbased advection scheme to efficiently preserve the advection accuracy of all physical quantities for long-term fluid simulations. Our approach is built upon the method of characteristic mapping (MCM). Instead of the costly evaluation of the temporal characteristic integral, we evolve the mapping function itself by solving an advection equation for the mappings. Dual mesh characteristics (DMC) method is adopted to more accurately update the mapping. Furthermore, to avoid visual artifacts like instant blur and temporal inconsistency introduced by re-initialization, we introduce multi-level mapping and back and forth error compensation. We conduct comprehensive 2D and 3D benchmark experiments to compare against alternative advection schemes. In particular, for the vortical flow and level set experiments, our method outperforms almost all state-of-art hybrid schemes, including FLIP, PolyPic and Particle-Level-Set, at the cost of only two Semi-Lagrangian advections. Additionally, our method does not rely on the particle-grid transfer operations, leading to a highly parallelizable pipeline. As a result, more than 45× performance acceleration can be achieved via even a straightforward porting of the code from CPU to GPU.
We introduce variable thickness, viscous vortex filaments. These can model such varied phenomena as underwater bubble rings or the intricate "chandeliers" formed by ink dropping into fluid. Treating the evolution of such filaments as an instance of Newtonian dynamics on a Riemannian configuration manifold we are able to extend classical work in the dynamics of vortex filaments through inclusion of viscous drag forces. The latter must be accounted for in low Reynolds number flows where they lead to significant variations in filament thickness and form an essential part of the observed dynamics. We develop and document both the underlying theory and associated practical numerical algorithms.
This paper investigates the use of fundamental solutions for animating detailed linear water surface waves. We first propose an analytical solution for efficiently animating circular ripples in closed form. We then show how to adapt the method of fundamental solutions (MFS) to create ambient waves interacting with complex obstacles. Subsequently, we present a novel wavelet-based discretization which outperforms the state of the art MFS approach for simulating time-varying water surface waves with moving obstacles. Our results feature high-resolution spatial details, interactions with complex boundaries, and large open ocean domains. Our method compares favorably with previous work as well as known analytical solutions. We also present comparisons between our method and real world examples.
Due to the enormous amount of detail and the interplay of various biological phenomena, modeling realistic ecosystems of trees and other plants is a challenging and open problem. Previous research on modeling plant ecologies has focused on representations to handle this complexity, mostly through geometric simplifications, such as points or billboards. In this paper we describe a multi-scale method to design large-scale ecosystems with individual plants that are realistically modeled and faithfully capture biological features, such as growth, plant interactions, different types of tropism, and the competition for resources. Our approach is based on leveraging inter- and intra-plant self-similarities for efficiently modeling plant geometry. We focus on the interactive design of plant ecosystems of up to 500K plants, while adhering to biological priors known in forestry and botany research. The introduced parameter space supports modeling properties of nine distinct plant ecologies while each plant is represented as a 3D surface mesh. The capabilities of our framework are illustrated through numerous models of forests, individual plants, and validations.
We present a new framework for interior scene synthesis that combines a high-level relation graph representation with spatial prior neural networks. We observe that prior work on scene synthesis is divided into two camps: object-oriented approaches (which reason about the set of objects in a scene and their configurations) and space-oriented approaches (which reason about what objects occupy what regions of space). Our insight is that the object-oriented paradigm excels at high-level planning of how a room should be laid out, while the space-oriented paradigm performs well at instantiating a layout by placing objects in precise spatial configurations. With this in mind, we present PlanIT, a layout-generation framework that divides the problem into two distinct planning and instantiation phases. PlanIT represents the "plan" for a scene via a relation graph, encoding objects as nodes and spatial/semantic relationships between objects as edges. In the planning phase, it uses a deep graph convolutional generative model to synthesize relation graphs. In the instantiation phase, it uses image-based convolutional network modules to guide a search procedure that places objects into the scene in a manner consistent with the graph. By decomposing the problem in this way, PlanIT generates scenes of comparable quality to those generated by prior approaches (as judged by both people and learned classifiers), while also providing the modeling flexibility of the intermediate relationship graph representation. These graphs allow the system to support applications such as scene synthesis from a partial graph provided by a user.
Layout is fundamental to graphic designs. For visual attractiveness and efficient communication of messages and ideas, graphic design layouts often have great variation, driven by the contents to be presented. In this paper, we study the problem of content-aware graphic design layout generation. We propose a deep generative model for graphic design layouts that is able to synthesize layout designs based on the visual and textual semantics of user inputs. Unlike previous approaches that are oblivious to the input contents and rely on heuristic criteria, our model captures the effect of visual and textual contents on layouts, and implicitly learns complex layout structure variations from data without the use of any heuristic rules. To train our model, we build a large-scale magazine layout dataset with fine-grained layout annotations and keyword labeling. Experimental results show that our model can synthesize high-quality layouts based on the visual semantics of input images and keyword-based summary of input text. We also demonstrate that our model internally learns powerful features that capture the subtle interaction between contents and layouts, which are useful for layout-aware design retrieval.
In this paper we present a unified deep inverse rendering framework for estimating the spatially-varying appearance properties of a planar exemplar from an arbitrary number of input photographs, ranging from just a single photograph to many photographs. The precision of the estimated appearance scales from plausible when the input photographs fails to capture all the reflectance information, to accurate for large input sets. A key distinguishing feature of our framework is that it directly optimizes for the appearance parameters in a latent embedded space of spatially-varying appearance, such that no handcrafted heuristics are needed to regularize the optimization. This latent embedding is learned through a fully convolutional auto-encoder that has been designed to regularize the optimization. Our framework not only supports an arbitrary number of input photographs, but also at high resolution. We demonstrate and evaluate our deep inverse rendering solution on a wide variety of publicly available datasets.
We present a model to measure the similarity in appearance between different materials, which correlates with human similarity judgments. We first create a database of 9,000 rendered images depicting objects with varying materials, shape and illumination. We then gather data on perceived similarity from crowdsourced experiments; our analysis of over 114,840 answers suggests that indeed a shared perception of appearance similarity exists. We feed this data to a deep learning architecture with a novel loss function, which learns a feature space for materials that correlates with such perceived appearance similarity. Our evaluation shows that our model outperforms existing metrics. Last, we demonstrate several applications enabled by our metric, including appearance-based search for material suggestions, database visualization, clustering and summarization, and gamut mapping.
We present a compact and efficient representation of spectra for accurate rendering using more than three dimensions. While tristimulus color spaces are sufficient for color display, a spectral renderer has to simulate light transport per wavelength. Consequently, emission spectra and surface albedos need to be known at each wavelength. It is practical to store dense samples for emission spectra but for albedo textures, the memory requirements of this approach are unreasonable. Prior works that approximate dense spectra from tristimulus data introduce strong errors under illuminants with sharp peaks and in indirect illumination. We represent spectra by an arbitrary number of Fourier coefficients. However, we do not use a common truncated Fourier series because its ringing could lead to albedos below zero or above one. Instead, we present a novel approach for reconstruction of bounded densities based on the theory of moments. The core of our technique is our bounded maximum entropy spectral estimate. It uses an efficient closed form to compute a smooth signal between zero and one that matches the given Fourier coefficients exactly. Still, a ground truth that localizes all of its mass around a few wavelengths can be reconstructed adequately. Therefore, our representation covers the full gamut of valid reflectances. The resulting textures are compact because each coefficient can be stored in 10 bits. For compatibility with existing tristimulus assets, we implement a mapping from tristimulus color spaces to three Fourier coefficients. Using three coefficients, our technique gives state of the art results without some of the drawbacks of related work. With four to eight coefficients, our representation is superior to all existing representations. Our focus is on offline rendering but we also demonstrate that the technique is fast enough for real-time rendering.
Prefiltering the reflectance of a displacement-mapped surface while preserving its overall appearance is challenging, as smoothing a displacement map causes complex changes of illumination effects such as shadowing-masking and interreflection. In this paper, we introduce a new method that prefilters displacement maps and BRDFs jointly and constructs SVBRDFs at reduced resolutions. These SVBRDFs preserve the appearance of the input models by capturing both shadowing-masking and interreflection effects. To express our appearance-preserving SVBRDFs efficiently, we leverage a new representation that involves spatially varying NDFs and a novel scaling function that accurately captures micro-scale changes of shadowing, masking, and interreflection effects. Further, we show that the 6D scaling function can be factorized into a 2D function of surface location and a 4D function of direction. By exploiting the smoothness of these functions, we develop a simple and efficient factorization method that does not require computing the full scaling function. The resulting functions can be represented at low resolutions (e.g., 42 for the spatial function and 154 for the angular function), leading to minimal additional storage. Our method generalizes well to different types of geometries beyond Gaussian surfaces. Models prefiltered using our approach at different scales can be combined to form mipmaps, allowing accurate and anti-aliased level-of-detail (LoD) rendering.