We present an approach for adding directed gaze movements to characters animated using full-body motion capture. Our approach provides a comprehensive authoring solution that automatically infers plausible directed gaze from the captured body motion, provides convenient controls for manual editing, and adds synthetic gaze movements onto the original motion. The foundation of the approach is an abstract representation of gaze behavior as a sequence of gaze shifts and fixations toward targets in the scene. We present methods for automatic inference of this representation by analyzing the head and torso kinematics and scene features. We introduce tools for convenient editing of the gaze sequence and target layout that allow an animator to adjust the gaze behavior without worrying about the details of pose and timing. A synthesis component translates the gaze sequence into coordinated movements of the eyes, head, and torso, and blends these with the original body motion. We evaluate the effectiveness of our inference methods, the efficiency of the authoring process, and the quality of the resulting animation.
Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on its center. Marker suits are often uncomfortable, and the recording volume is severely restricted, often to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion with an inside-in setup, i.e., without external sensors. This makes capture independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. We therefore propose a new method for real-time, marker-less, and egocentric motion capture: estimating the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset, an optical inside-in method, so to speak. This allows full-body motion capture in general indoor and outdoor scenes, including crowded scenes with many people nearby, and enables reconstruction during larger-scale activities. Our approach combines the strength of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. It is particularly useful for virtual reality, where users can roam and interact freely while seeing their fully motion-captured virtual body.
Inverse dynamics is an important and challenging problem in human motion modeling, synthesis and simulation, as well as in robotics and biomechanics. Previous solutions to inverse dynamics are often noisy and ambiguous, particularly when double stances occur. In this paper, we present a novel inverse dynamics method that accurately reconstructs biomechanically valid contact information, including center of pressure, contact forces, torsional torques and internal joint torques, from input kinematic human motion data. Our key idea is to apply statistical modeling techniques to a set of preprocessed human kinematic and dynamic motion data captured by a combination of an optical motion capture system, pressure insoles and force plates. We formulate the data-driven inverse dynamics problem in a maximum a posteriori (MAP) framework by estimating the most likely contact information and internal joint torques that are consistent with input kinematic motion data. We construct a low-dimensional data-driven prior model for contact information and internal joint torques to reduce the ambiguity of inverse dynamics for human motion. We demonstrate the accuracy of our method on a wide variety of human movements, including walking, jumping, running, turning and hopping, and achieve state-of-the-art accuracy in our comparison against alternative methods. In addition, we discuss how to extend the data-driven inverse dynamics framework to motion editing, filtering and motion control.
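The MAP idea can be sketched in miniature: if both the likelihood (consistency with the observed kinematics) and the data-driven prior are taken as Gaussian, the most likely solution reduces to a regularized linear solve. All matrices below are hypothetical stand-ins for illustration; the paper's actual terms couple contact forces and joint torques through the equations of motion and a learned low-dimensional prior.

```python
import numpy as np

def map_estimate(A, b, mu, L, lam):
    """Closed-form MAP solution for a Gaussian likelihood and prior.

    Solves  argmin_x ||A x - b||^2 + lam * ||L (x - mu)||^2,
    i.e. the posterior mode when the likelihood penalizes inconsistency
    with observations b and the prior pulls x toward its mean mu.
    A, b, mu, L, lam are illustrative placeholders, not the paper's terms.
    """
    lhs = A.T @ A + lam * (L.T @ L)
    rhs = A.T @ b + lam * (L.T @ L) @ mu
    return np.linalg.solve(lhs, rhs)
```

With `lam = 0` this degenerates to ordinary least squares; increasing `lam` trades observation fit against prior plausibility, which is how the low-dimensional prior disambiguates double-stance frames in spirit.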
Microscopic crowd simulators rely on models of local interaction (e.g. collision avoidance) to synthesize the individual motion of each virtual agent. The quality of the resulting motions heavily depends on this component, which has significantly improved in the past few years. Recent advances are in particular due to the introduction of a short-horizon motion prediction strategy that enables anticipated motion adaptation during local interactions among agents. However, the simplicity of the prediction techniques in existing models somewhat limits their domain of validity. In this paper, our key objective is to significantly improve the quality of simulations by expanding the applicable range of motion predictions. To this end, we present a novel local interaction algorithm with a new context-aware, probabilistic motion prediction model. By context-aware, we mean that this approach allows crowd simulators to account for many factors, such as the influence of environment layouts or in-progress interactions among agents, and has the ability to simultaneously maintain several possible alternate scenarios for future motions and to cope with uncertainties in sensing and in other agents' motions. Technically, this model introduces "collision probability fields" between agents, efficiently computed through the cumulative application of Warp Operators on a source Intrinsic Field. We demonstrate how this model significantly improves the quality of simulated motions in challenging scenarios, such as dense crowds and complex environments.
Artists routinely use gesture drawings to communicate ideated character poses for storyboarding and other digital media. During subsequent posing of the 3D character models, they use these drawings as a reference, and perform the posing itself using 3D interfaces that require time and expert 3D knowledge to operate. We propose the first method for automatically posing 3D characters directly using gesture drawings as an input, sidestepping the manual 3D posing step. We observe that artists are skilled at quickly and effectively conveying poses using such drawings, and design them to facilitate a single perceptually consistent pose interpretation by viewers. Our algorithm leverages perceptual cues to parse the drawings and recover the artist-intended poses. It takes as input a vector-format rough gesture drawing and a rigged 3D character model, and plausibly poses the character to conform to the depicted pose. No other input is required. Our contribution is two-fold: we first analyze and formulate the pose cues encoded in gesture drawings; we then employ these cues to compute a plausible image space projection of the conveyed pose and to imbue it with depth. Our framework is designed to robustly overcome the errors and inaccuracies frequent in typical gesture drawings. We exhibit a wide variety of character models posed by our method from gesture drawings of complex poses, including poses with occlusions and foreshortening. We validate our approach via result comparisons to artist-posed models generated from the same reference drawings, via studies that confirm that our results agree with viewer perception, and via comparison to algorithmic alternatives.
Volumetric micro-appearance models have provided remarkably high-quality renderings, but are highly data intensive and usually require tens of gigabytes of storage. When an object is viewed from a distance, the highest level of detail offered by these models is usually unnecessary, but traditional linear downsampling weakens the object's intrinsic shadowing structures and can yield poor accuracy. We introduce a joint optimization of single-scattering albedos and phase functions to accurately downsample heterogeneous and anisotropic media. Our method is built upon scaled phase functions, a new representation combining albedos and (standard) phase functions. We also show that modularity can be exploited to greatly reduce the amortized optimization overhead by allowing multiple synthesized models to share one set of downsampled parameters. Our optimized parameters generalize well to novel lighting and viewing configurations, and the resulting data sets offer several orders of magnitude in storage savings.
Several scalable many-light rendering methods have been proposed recently for the efficient computation of global illumination. However, gathering the contributions of virtual lights in participating media remains an inefficient and time-consuming task. In this paper, we present a novel sparse sampling and reconstruction method to accelerate the gathering step of many-light rendering for participating media. Our technique exploits the observation that scattered lighting is usually locally coherent and of low rank, even in heterogeneous media. In particular, we first introduce a matrix formulation with light segments as columns and eye ray segments as rows, and cast the gathering step as a matrix sampling and reconstruction problem. We then propose an adaptive matrix column sampling and completion algorithm that efficiently reconstructs the matrix by sampling only a small number of its elements. Experimental results show that our approach greatly improves performance, obtaining up to one order of magnitude speedup compared with other state-of-the-art methods of many-light rendering for participating media.
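The column-sampling-plus-completion idea can be illustrated on a toy low-rank matrix: fully sample a few columns to get a basis, then complete every column from only a few sampled rows. This is a generic sketch under the low-rank assumption, not the paper's adaptive sampler, which chooses which columns and elements to sample based on reconstruction error.

```python
import numpy as np

def reconstruct_from_columns(M, col_idx, row_idx, rank):
    """Reconstruct a (near) low-rank matrix from a few fully sampled
    columns plus a few sampled rows of all columns.
    Only entries M[:, col_idx] and M[row_idx, :] are assumed observed."""
    C = M[:, col_idx]                      # fully sampled columns
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    B = U[:, :rank]                        # low-rank column basis
    # Fit each column's basis coefficients from its sampled rows alone.
    coeff, *_ = np.linalg.lstsq(B[row_idx, :], M[row_idx, :], rcond=None)
    return B @ coeff                       # completed matrix
```

If the sampled columns span the column space and the sampled rows keep the basis well-conditioned, the completion is exact for an exactly low-rank matrix; in practice the scattered-lighting matrix is only approximately low rank, so more samples buy more accuracy.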
We address the challenge of efficiently rendering massive assemblies of grains within a forward path-tracing framework. Previous approaches exist for accelerating high-order scattering for a limited and static set of granular materials, often requiring scene-dependent precomputation. We significantly expand the admissible regime of granular materials by considering heterogeneous and dynamic granular mixtures with spatially varying grain concentrations, packing rates, and sizes. Our method supports both procedurally generated grain assemblies and dynamic assemblies authored in off-the-shelf particle simulation tools. The key to our speedup lies in two complementary aggregate scattering approximations which we introduce to jointly accelerate the construction of short and long light paths. For low-order scattering, we accelerate path construction using novel grain scattering distribution functions (GSDF) which aggregate intra-grain light transport while retaining important grain-level structure. For high-order scattering, we extend prior work on shell transport functions (STF) to support dynamic, heterogeneous mixtures of grains with varying sizes. We do this without scene-dependent precomputation and show how this can also be used to accelerate light transport in arbitrary continuous heterogeneous media. Our multi-scale rendering automatically restricts explicit path tracing to only the first grain along a light path, or avoids it completely, when appropriate, by switching to our aggregate transport approximations. We demonstrate our technique on animated scenes containing heterogeneous mixtures of various types of grains that could not previously be rendered efficiently. We also compare to previous work on a simpler class of granular assemblies, reporting significant computation savings, often with higher accuracy.
We explore the theory of integration with control variates in the context of rendering. Our goal is to optimally combine multiple estimators using their covariances. We focus on two applications, re-rendering and gradient-domain rendering, where we exploit coherence between temporally and spatially adjacent pixels. We propose an image-space (iterative) reconstruction scheme that employs control variates to reduce variance. We show that recent works on scene editing and gradient-domain rendering can be directly formulated as control-variate estimators, despite using seemingly different approaches. In particular, we demonstrate the conceptual equivalence of screened Poisson image reconstruction and our iterative reconstruction scheme. Our composite estimators offer practical and simple solutions that improve upon the current state of the art for the two investigated applications.
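The core of a control-variate estimator can be sketched in a few lines: given samples of the primary estimator and of a correlated auxiliary estimator whose true mean is known, the variance-optimal combination coefficient is the ratio of their covariance to the auxiliary's variance. This toy sketch is the classical single-variate case, not the paper's image-space iterative reconstruction.

```python
import numpy as np

def control_variate_estimate(f, g, mu_g):
    """Combine samples f of the quantity of interest with samples g of a
    correlated control variate whose true mean mu_g is known.
    c* = Cov(f, g) / Var(g) minimizes the combined estimator's variance."""
    c = np.cov(f, g)[0, 1] / np.var(g, ddof=1)
    return np.mean(f) - c * (np.mean(g) - mu_g)
```

When f and g are perfectly (linearly) correlated, the correction cancels all sampling noise and the estimate is exact; the weaker the correlation, the smaller the variance reduction, which is why exploiting coherence between adjacent pixels or re-rendered frames pays off.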
Wood is an important decorative material prized for its unique appearance. It is commonly rendered using artistically authored 2D color and bump textures, which reproduce color patterns on flat surfaces well. But the dramatic anisotropic specular figure caused by wood fibers, common in curly maple and other species, is harder to achieve. While suitable BRDF models exist, the texture parameter maps for these wood BRDFs are difficult to author---good results have been shown with elaborate measurements for small flat samples, but these models are not much used in practice. Furthermore, mapping 2D image textures onto 3D objects leads to distortion and inconsistencies. Procedural volumetric textures solve these geometric problems, but existing methods produce much lower quality than image textures. This paper aims to bring the best of all these techniques together: we present a comprehensive volumetric simulation of wood appearance, including growth rings, color variation, pores, rays, and growth distortions. The fiber directions required for the anisotropic specular figure follow naturally from the distortions. Our results rival the quality of textures based on photographs, but with the consistency and convenience of a volumetric model. Our model is modular, with components that are intuitive to control, fast to compute, and require minimal storage.
We introduce a co-analysis technique designed for correspondence inference within large shape collections. Such collections are naturally rich in variation, adding ambiguity to the notoriously difficult problem of correspondence computation. We leverage the robustness of correspondences between similar shapes to address the difficulties associated with this problem. In our approach, pairs of similar shapes are extracted from the collection, analyzed and matched in an efficient and reliable manner, culminating in the construction of a network of correspondences that connects the entire collection. The correspondence between any pair of shapes then amounts to a simple propagation along the minimax path between the two shapes in the network. At the heart of our approach is the introduction of a robust, structure-oriented shape matching method. Leveraging the idea of projective analysis, we partition 2D projections of a shape to obtain a set of 1D ordered regions, which are both simple and efficient to match. We lift the matched projections back to the 3D domain to obtain a pairwise shape correspondence. The emphasis given to structural compatibility is a central tool in estimating the reliability and completeness of a computed correspondence, uncovering any non-negligible semantic discrepancies that may exist between shapes. These detected differences are a deciding factor in the establishment of a network aiming to capture local similarities. We demonstrate that the combination of the presented observations into a co-analysis method allows us to establish reliable correspondences among shapes within large collections.
We present a technique for parsing widely used furniture assembly instructions, and reconstructing the 3D models of furniture components and their dynamic assembly process. Our technique takes as input a multi-step assembly instruction in a vector graphic format and begins by grouping the vector graphic primitives into semantic elements representing individual furniture parts, mechanical connectors (e.g., screws, bolts and hinges), arrows, visual highlights, and numbers. To reconstruct the dynamic assembly process depicted over multiple steps, our system identifies previously built 3D furniture components when parsing a new step, and uses them to address the challenge of occlusions while generating new 3D components incrementally. With a wide range of examples covering a variety of furniture types, we demonstrate the use of our system to animate the 3D furniture assembly process and, beyond that, semantic-aware furniture editing as well as the fabrication of personalized furniture.
We introduce a framework for action-driven evolution of 3D indoor scenes, where the goal is to simulate how scenes are altered by human actions, and specifically, by object placements necessitated by the actions. To this end, we develop an action model with each type of action combining information about one or more human poses, one or more object categories, and spatial configurations of objects belonging to these categories which summarize the object-object and object-human relations for the action. Importantly, all these pieces of information are learned from annotated photos. Correlations between the learned actions are analyzed to guide the construction of an action graph. Starting with an initial 3D scene, we probabilistically sample a sequence of actions from the action graph to drive progressive scene evolution. Each action triggers appropriate object placements, based on object co-occurrences and spatial configurations learned for the action model. We show results of our scene evolution that lead to realistic and messy 3D scenes, as well as quantitative evaluations by user studies which compare our method to manual scene creation and state-of-the-art, data-driven methods, in terms of scene plausibility and naturalness.
Visualizing changes to indoor scenes is important for many applications. When looking for a new place to live, we want to see how the interior looks not with the current inhabitant's belongings, but with our own furniture. Before purchasing a new sofa, we want to visualize how it would look in our living room. In this paper, we present a system that takes an RGBD scan of an indoor scene and produces a scene model of the empty room, including light emitters, materials, and the geometry of the non-cluttered room. Our system enables realistic rendering not only of the empty room under the original lighting conditions, but also with various scene edits, including adding furniture, changing the material properties of the walls, and relighting. These types of scene edits enable many mixed reality applications in areas such as real estate, furniture retail, and interior design. Our system contains two novel technical contributions: a 3D radiometric calibration process that recovers the appearance of the scene in high dynamic range, and a global-illumination-aware inverse rendering framework that simultaneously recovers reflectance properties of scene surfaces and lighting properties for several light source types, including generalized point and line lights.
Recent studies in vision science have shown that blue light within a certain frequency band affects human circadian rhythm and impairs our health. Although applying a light blocker to an image display can block the harmful blue light, it inevitably makes an image look like an aged photo. In this paper, we show that it is possible to reduce harmful blue light while preserving the blue appearance of an image. Moreover, we optimize the spectral transmittance profile of the blue light blocker based on psychophysical data and develop a color compensation algorithm to minimize color distortion. A prototype using notch filters is built as a proof of concept.
Binocular disparity is the main depth cue that makes stereoscopic images appear 3D. However, in many scenarios, the range of depth that can be reproduced by this cue is greatly limited and typically fixed due to constraints imposed by displays. For example, due to the low angular resolution of current automultiscopic screens, they can only reproduce a shallow depth range. In this work, we study the motion parallax cue, which is a relatively strong depth cue, and can be freely reproduced even on a 2D screen without any limits. We exploit the fact that in many practical scenarios, motion parallax provides sufficiently strong depth information that the presence of binocular depth cues can be reduced through aggressive disparity compression. To assess the strength of the effect we conduct psycho-visual experiments that measure the influence of motion parallax on depth perception and relate it to the depth resulting from binocular disparity. Based on the measurements, we propose a joint disparity-parallax computational model that predicts apparent depth resulting from both cues. We demonstrate how this model can be applied in the context of stereo and multiscopic image processing, and propose new disparity manipulation techniques, which first quantify depth obtained from motion parallax, and then adjust binocular disparity information accordingly. This allows us to manipulate the disparity signal according to the strength of motion parallax to improve the overall depth reproduction. This technique is validated in additional experiments.
Large 3D model repositories of common objects are now ubiquitous and are increasingly being used in computer graphics and computer vision for both analysis and synthesis tasks. However, images of objects in the real world have a richness of appearance that these repositories do not capture, largely because most existing 3D models are untextured. In this work we develop an automated pipeline capable of transporting texture information from images of real objects to 3D models of similar objects. This is a challenging problem, as an object's texture as seen in a photograph is distorted by many factors, including pose, geometry, and illumination. These geometric and photometric distortions must be undone in order to transfer the pure underlying texture to a new object --- the 3D model. Instead of using problematic dense correspondences, we factorize the problem into the reconstruction of a set of base textures (materials) and an illumination model for the object in the image. By exploiting the geometry of the similar 3D model, we reconstruct certain reliable texture regions and correct for the illumination, from which a full texture map can be recovered and applied to the model. Our method allows for large-scale unsupervised production of richly textured 3D models directly from image data, providing high quality virtual objects for 3D scene design or photo editing applications, as well as a wealth of data for training machine learning algorithms for various inference tasks in graphics and vision.
We analyze the dense matching problem for Internet scene images based on the observation that commonly only parts of the images can be matched, due to variations in view angle, motion, objects, etc. We thus propose regional foremost matching to reject outlier matching points while still producing dense high-quality correspondence in the remaining foremost regions. Our system initializes sparse correspondence, propagates matching with model fitting and optimization, and robustly detects foremost regions. We apply our method to several applications, including time-lapse sequence generation, Internet photo composition, automatic image morphing, and automatic rephotography.
Foveated rendering synthesizes images with progressively less detail outside the eye fixation region, potentially unlocking significant speedups for wide field-of-view displays, such as head mounted displays, where target framerate and resolution are increasing faster than the performance of traditional real-time renderers.
To study and improve potential gains, we designed a foveated rendering user study to evaluate the perceptual abilities of human peripheral vision when viewing today's displays. We determined that filtering peripheral regions reduces contrast, inducing a sense of tunnel vision. When applying a postprocess contrast enhancement, subjects tolerated up to a 2× larger blur radius before detecting differences from a non-foveated ground truth. After verifying these insights on both desktop and head mounted displays augmented with high-speed gaze-tracking, we designed a perceptual target image to strive for when engineering a production foveated renderer.
Given our perceptual target, we designed a practical foveated rendering system that reduces the number of shades by up to 70% and allows coarsened shading up to 30° closer to the fovea than Guenter et al. without introducing perceivable aliasing or blur. We filter both pre- and post-shading to address aliasing from undersampling in the periphery, introduce a novel multiresolution- and saccade-aware temporal antialiasing algorithm, and use contrast enhancement to help recover peripheral details that are resolvable by our eye but degraded by filtering.
We validate our system by performing another user study. Frequency analysis shows our system closely matches our perceptual target. Measurements of temporal stability show we obtain quality similar to temporally filtered non-foveated renderings.
We introduce Bidirectional Sound Transport (BST), a new algorithm that simulates sound propagation by bidirectional path tracing using multiple importance sampling. Our approach can handle multiple sources in large virtual environments with complex occlusion, and can produce plausible acoustic effects at an interactive rate on a desktop PC. We introduce a new metric based on the signal-to-noise ratio (SNR) of the energy response and use this metric to evaluate the performance of ray-tracing-based acoustic simulation methods. Our formulation exploits temporal coherence by using the sample distribution of the previous frame to guide the sample distribution of the current one. We show that our sample redistribution algorithm converges and better balances early and late reflections. We evaluate our approach on different benchmarks and demonstrate significant speedup over prior geometric acoustic algorithms.
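Multiple importance sampling, the key variance-reduction device named above, combines several sampling strategies by weighting each sample according to how likely every strategy was to produce it. A generic sketch of the standard balance heuristic (this is textbook MIS, not BST's sound-specific formulation):

```python
def balance_heuristic(pdfs, counts, i):
    """Balance-heuristic MIS weight for sampling strategy i.

    pdfs[j]   -- strategy j's pdf value at the sampled point
    counts[j] -- number of samples drawn with strategy j
    Weight w_i = n_i p_i / sum_j n_j p_j; weights over all strategies
    sum to 1, so combined estimators stay unbiased.
    """
    num = counts[i] * pdfs[i]
    den = sum(n * p for n, p in zip(counts, pdfs))
    return num / den
```

Intuitively, a path sampled from the light side gets a small weight wherever the eye-side strategy would have found it easily, and vice versa, which is what lets bidirectional methods handle complex occlusion without double counting.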
Crumpling a thin sheet produces a characteristic sound, composed of distinct clicking sounds corresponding to buckling events. We propose a physically based algorithm that automatically synthesizes crumpling sounds for a given thin shell animation. The resulting sound is a superposition of individually synthesized clicking sounds corresponding to visually significant and insignificant buckling events. We identify visually significant buckling events on the dynamically evolving thin surface mesh, and instantiate visually insignificant buckling events via a stochastic model that seeks to mimic the power-law distribution of buckling energies observed in many materials.
In either case, the synthesis of a buckling sound employs linear modal analysis of the deformed thin shell. Because different buckling events in general occur at different deformed configurations, the question arises whether the calculation of linear modes can be reused. We amortize the cost of the linear modal analysis by dynamically partitioning the mesh into nearly rigid pieces: the modal analysis of a rigidly moving piece is retained over time, and the modal analysis of the assembly is obtained via Component Mode Synthesis (CMS). We illustrate our approach through a series of examples and a perceptual user study, demonstrating the utility of the sound synthesis method in producing realistic sounds at practical computation times.
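The stochastic model's power-law draw can be sketched with standard inverse-CDF sampling of a truncated power law. The exponent and energy bounds below are illustrative placeholders; in the paper the distribution is fit to the material's observed buckling energies.

```python
def sample_powerlaw(u, e_min, e_max, alpha):
    """Inverse-CDF sample of a truncated power law p(E) ∝ E^(-alpha)
    on [e_min, e_max], for alpha != 1.

    u is a uniform random number in [0, 1]; u = 0 maps to e_min and
    u = 1 maps to e_max. Parameters here are hypothetical examples."""
    a = 1.0 - alpha
    return (e_min**a + u * (e_max**a - e_min**a)) ** (1.0 / a)
```

Feeding i.i.d. uniform `u` values through this map yields buckling energies whose histogram follows the desired power law, from which individual click amplitudes can then be derived.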
Tangles are a form of structured pen-and-ink 2D art characterized by repeating, recursive patterns. We present a method to procedurally generate tangle drawings, seen as recursively split sets of arbitrary 2D polygons with holes, with anisotropic and non-stationary features. We formally model tangles with group grammars, an extension of set grammars that explicitly handles the grouping of shapes necessary to represent tangle repetitions. We introduce a small set of expressive geometric and grouping operators, showing that they can respectively express complex tangle patterns and sub-pattern distributions with relatively simple grammars. We also show how users can control tangle generation in an interactive and intuitive way. Throughout the paper, we show how group grammars can, in a few tens of seconds, produce a wide variety of patterns that would take artists hours of tedious work. We validate both the quality of the generated tangles and the efficiency of the control provided to users with a user study, run with both expert and non-expert users.
In this paper, we present the concept of operator graph scheduling for high performance procedural generation on the graphics processing unit (GPU). The operator graph forms an intermediate representation that describes all possible operations and objects that can arise during a specific procedural generation. While previous methods have focused on parallelizing a specific procedural approach, the operator graph is applicable to all procedural generation methods that can be described by a graph, such as L-systems, shape grammars, or stack-based generation methods. Using the operator graph, we show that all partitions of the graph correspond to possible ways of scheduling a procedural generation on the GPU, including the scheduling strategies of previous work. As the space of possible partitions is very large, we describe three search heuristics, aiding an optimizer in finding the fastest valid schedule for any given operator graph. The best partitions found by our optimizer improve performance by 8 to 30x over the previous state of the art in GPU shape grammar and L-system generation.
This paper presents an interactive design interface for three-dimensional free-form musical wind instruments. The sound of a wind instrument is governed by acoustic resonance resulting from complicated interactions between sound waves and the internal geometry of the instrument. Thus, creating an original free-form wind instrument by manual methods is a challenging problem. Our interface provides interactive sound simulation feedback as the user edits, allowing exploration of original wind instrument designs. Sound simulation of a 3D musical wind instrument is known to be computationally expensive. To overcome this problem, we first model the wind instrument as a passive resonator, ignoring the coupled oscillation excitation from the mouthpiece. We then present a novel efficient method to estimate the resonance frequency based on the boundary element method by formulating the resonance problem as a minimum eigenvalue problem. Furthermore, we can efficiently compute an approximate resonance frequency using a new technique based on a generalized eigenvalue problem. The designs can be fabricated using a 3D printer, so we call the results "print-wind instruments" in association with woodwind instruments. We demonstrate our approach with examples of unconventional shapes performing familiar songs.
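The resonance-as-minimum-eigenvalue idea can be illustrated on a toy 1D analogue (this is not the paper's BEM formulation, which operates on the full 3D geometry): for an open-open pipe, the pressure mode satisfies -p'' = k²p with p = 0 at both ends, so the smallest eigenvalue of the discretized operator gives the fundamental wavenumber k and hence f = kc/(2π) ≈ c/(2L).

```python
import numpy as np

def fundamental_frequency(length, c=343.0, n=200):
    """Toy 1D resonance estimate: finite-difference discretization of
    -d^2p/dx^2 on an open-open pipe (Dirichlet ends); the minimum
    eigenvalue is k^2 of the fundamental mode."""
    h = length / (n + 1)
    # Tridiagonal second-difference operator.
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    k2 = np.linalg.eigvalsh(A)[0]            # smallest eigenvalue = k^2
    return np.sqrt(k2) * c / (2.0 * np.pi)   # f = k c / (2 pi)
```

For a 0.5 m pipe at c = 343 m/s this recovers the textbook fundamental c/(2L) ≈ 343 Hz; the same minimum-eigenvalue structure is what the paper's BEM formulation solves, but on the instrument's actual interior surface.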
Acquiring microscale reflectance and normals is useful for digital documentation and identification of real-world materials. However, their simultaneous acquisition has rarely been explored due to the difficulty of combining both sources of information at such a small scale. In this paper, we capture both spatially-varying material appearance (diffuse, specular and roughness) and normals simultaneously at microscale resolution. We design and build a microscopic light dome with 374 LED lights over the hemisphere, specifically tailored to the characteristics of microscopic imaging. This allows us to achieve the highest resolution for such combined information among current state-of-the-art acquisition systems. We thoroughly test and characterize our system, and provide microscopic appearance measurements of a wide range of common materials, as well as renderings of novel views to validate the applicability of our captured data. Additional applications such as bi-scale material editing from real-world samples are also demonstrated.
Many different techniques for measuring material appearance have been proposed in the last few years. These have produced large public datasets, which have been used for accurate, data-driven appearance modeling. However, although these datasets have allowed us to reach an unprecedented level of realism in visual appearance, editing the captured data remains a challenge. In this paper, we present an intuitive control space for predictable editing of captured BRDF data, which allows for artistic creation of plausible novel material appearances, bypassing the difficulty of acquiring novel samples. We first synthesize novel materials, extending the existing MERL dataset up to 400 mathematically valid BRDFs. We then design a large-scale experiment, gathering 56,000 subjective ratings on the high-level perceptual attributes that best describe our extended dataset of materials. Using these ratings, we build and train networks of radial basis functions to act as functionals mapping the perceptual attributes to an underlying PCA-based representation of BRDFs. We show that our functionals are excellent predictors of the perceived attributes of appearance. Our control space enables many applications, including intuitive material editing of a wide range of visual properties, guidance for gamut mapping, analysis of the correlation between perceptual attributes, or novel appearance similarity metrics. Moreover, our methodology can be used to derive functionals applicable to classic analytic BRDF representations. We release our code and dataset publicly, in order to support and encourage further research in this direction.
We present a novel integrated approach for estimating both spatially-varying surface reflectance and detailed geometry from a video of a rotating object under unknown static illumination. Key to our method is the decoupling of the recovery of normal and surface reflectance from the estimation of surface geometry. We define an apparent normal field with corresponding reflectance for each point (including those not on the object's surface) that best explain the observations. We observe that the object's surface passes through points where the apparent normal field and corresponding reflectance exhibit a high degree of consistency with the observations. However, estimating the apparent normal field requires knowledge of the unknown incident lighting. We therefore formulate the recovery of shape, surface reflectance, and incident lighting, as an iterative process that alternates between estimating shape and lighting, and simultaneously recovers surface reflectance at each step. To recover the shape, we first form an initial surface that passes through locations with consistent apparent temporal traces, followed by a refinement that maximizes the consistency of the surface normals with the underlying apparent normal field. To recover the lighting, we rely on appearance-from-motion using the recovered geometry from the previous step. We demonstrate our integrated framework on a variety of synthetic and real test cases exhibiting a wide range of materials and shapes.
We develop a method to acquire the BRDF of a homogeneous flat sample from only two images, taken by a near-field perspective camera, and lit by a directional light source. Our method uses the MERL BRDF database to determine the optimal set of light-view pairs for data-driven reflectance acquisition. We develop a mathematical framework to estimate error from a given set of measurements, including the use of multiple measurements in an image simultaneously, as needed for acquisition from near-field setups. The novel error metric is essential in the near-field case, where we show that using the condition number alone performs poorly. We demonstrate practical near-field acquisition of BRDFs from only one or two input images. Our framework generalizes to configurations like a fixed camera setup, where we also develop a simple extension to spatially-varying BRDFs by clustering the materials.
We present a novel method for capturing real-world, spatially-varying surface reflectance from a small number of object views (k). Our key observation is that a specific target's reflectance can be represented by a small number of custom basis materials (N) convexly blended by an even smaller number of non-zero weights at each point (n). Based on this sparse basis/sparser blend model, we develop an SVBRDF reconstruction algorithm that jointly solves for n, N, the basis BRDFs, and their spatial blend weights with an alternating iterative optimization, each step of which solves a linearly-constrained quadratic programming problem. We develop a numerical tool that lets us estimate the number of views required and analyze the effect of lighting and geometry on reconstruction quality. We validate our method with images rendered from synthetic BRDFs, and demonstrate convincing results on real objects of pre-scanned shape and lit by uncontrolled natural illumination, from very few or even a single input image.
Portraits taken with direct flash look harsh and unflattering because the light source comes from a small set of angles very close to the camera. Advanced photographers address this problem by using bounce flash, a technique where the flash is directed towards other surfaces in the room, creating a larger, virtual light source that can be cast from different directions to provide better shading variation for 3D modeling. However, finding the right direction to point a bounce flash requires skill and careful consideration of the available surfaces and subject configuration. Inspired by the impact of automation for exposure, focus and flash metering, we automate control of the flash direction for bounce illumination. We first identify criteria for evaluating flash directions, based on established photography literature, and relate these criteria to the color and geometry of a scene. We augment a camera with servomotors to rotate the flash head, and additional sensors (a fisheye camera and 3D sensors) to gather information about potential bounce surfaces. We present a simple numerical optimization criterion that finds directions for the flash that consistently yield compelling illumination and demonstrate the effectiveness of our various criteria in common photographic configurations.
Demosaicking and denoising are the key first stages of the digital imaging pipeline, but they also constitute a severely ill-posed problem: three color values per pixel must be inferred from a single noisy measurement. Earlier methods rely on hand-crafted filters or priors and still exhibit disturbing visual artifacts in hard cases such as moiré or thin edges. We introduce a new data-driven approach for these challenges: we train a deep neural network on a large corpus of images instead of using hand-tuned filters. While deep learning has shown great success, its naive application using existing training datasets does not give satisfactory results for our problem because these datasets lack hard cases. To create a better training set, we present metrics to identify difficult patches and techniques for mining community photographs for such patches. Our experiments show that this network and training procedure outperform state-of-the-art both on noisy and noise-free data. Furthermore, our algorithm is an order of magnitude faster than the previous best performing techniques.
Cell phone cameras have small apertures, which limits the number of photons they can gather, leading to noisy images in low light. They also have small sensor pixels, which limits the number of electrons each pixel can store, leading to limited dynamic range. We describe a computational photography pipeline that captures, aligns, and merges a burst of frames to reduce noise and increase dynamic range. Our system has several key features that help make it robust and efficient. First, we do not use bracketed exposures. Instead, we capture frames of constant exposure, which makes alignment more robust, and we set this exposure low enough to avoid blowing out highlights. The resulting merged image has clean shadows and high bit depth, allowing us to apply standard HDR tone mapping methods. Second, we begin from Bayer raw frames rather than the demosaicked RGB (or YUV) frames produced by hardware Image Signal Processors (ISPs) common on mobile platforms. This gives us more bits per pixel and allows us to circumvent the ISP's unwanted tone mapping and spatial denoising. Third, we use a novel FFT-based alignment algorithm and a hybrid 2D/3D Wiener filter to denoise and merge the frames in a burst. Our implementation is built atop Android's Camera2 API, which provides per-frame camera control and access to raw imagery, and is written in the Halide domain-specific language (DSL). It runs in 4 seconds on device (for a 12 Mpix image), requires no user intervention, and ships on several mass-produced cell phones.
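The burst pipeline's alignment idea can be illustrated with generic phase correlation, a standard FFT-based technique for estimating the translation between two image tiles. This is only a minimal sketch of that class of algorithm, not the paper's actual (more elaborate, multi-scale) alignment; the tile size and function name below are illustrative.

```python
import numpy as np

def phase_correlation_shift(ref, alt):
    """Estimate the circular translation (dy, dx) such that
    alt ~= np.roll(ref, (dy, dx), axis=(0, 1))."""
    cross = np.fft.fft2(alt) * np.conj(np.fft.fft2(ref))
    cross /= np.maximum(np.abs(cross), 1e-12)      # keep only the phase
    corr = np.fft.ifft2(cross).real                # impulse at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    return (int(dy - h if dy > h // 2 else dy),    # wrap to signed shifts
            int(dx - w if dx > w // 2 else dx))

# Recover a known shift of a random reference tile.
rng = np.random.default_rng(0)
tile = rng.random((64, 64))
shifted = np.roll(tile, shift=(3, -2), axis=(0, 1))
shift = phase_correlation_shift(tile, shifted)
```

Normalizing the cross-power spectrum to unit magnitude turns the correlation surface into a near-impulse at the true offset, which makes the peak easy to localize even under brightness changes.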
With the introduction of consumer light field cameras, light field imaging has recently become widespread. However, there is an inherent trade-off between the angular and spatial resolution, and thus, these cameras often sparsely sample in either the spatial or the angular domain. In this paper, we use machine learning to mitigate this trade-off. Specifically, we propose a novel learning-based approach to synthesize new views from a sparse set of input views. We build upon existing view synthesis techniques and break down the process into disparity and color estimation components. We use two sequential convolutional neural networks to model these two components and train both networks simultaneously by minimizing the error between the synthesized and ground truth images. We show the performance of our approach using only four corner sub-aperture views from the light fields captured by the Lytro Illum camera. Experimental results show that our approach synthesizes high-quality images that are superior to the state-of-the-art techniques on a variety of challenging real-world scenes. We believe our method could potentially decrease the required angular resolution of consumer light field cameras, which allows their spatial resolution to increase.
We propose a novel birefractive depth acquisition method, which allows for single-shot depth imaging by just placing a birefringent material in front of the lens. While most transmissive materials present a single refractive index per wavelength, birefringent crystals like calcite possess two, resulting in a double refraction effect. We develop an imaging model that leverages this phenomenon and the information contained in the ordinary and the extraordinary refracted rays, providing an effective formulation of the geometric relationship between scene depth and double refraction. To handle the inherent ambiguity of having two sources of information overlapped in a single image, we define and combine two different cost volume functions. We additionally present a novel calibration technique for birefringence, carefully analyze and validate our model, and demonstrate the usefulness of our approach with several image-editing applications.
We present a hybrid 3D-2D algorithm for stabilizing 360° video using a deformable rotation motion model. Our algorithm uses 3D analysis to estimate the rotation between key frames that are appropriately spaced such that the right amount of motion has occurred to make that operation reliable. For the remaining frames, it uses 2D optimization to maximize the visual smoothness of feature point trajectories. A new low-dimensional flexible deformed rotation motion model enables handling small translational jitter, parallax, lens deformation, and rolling shutter wobble. Our 3D-2D architecture achieves better robustness, speed, and smoothing ability than either pure 2D or 3D methods can provide. Stabilizing a video with our method takes less time than playing it at normal speed. The results are sufficiently smooth to be played back at high speed-up factors; for this purpose we present a simple 360° hyperlapse algorithm that remaps the video frame time stamps to balance the apparent camera velocity.
We present an automatic video completion algorithm that synthesizes missing regions in videos in a temporally coherent fashion. Our algorithm can handle dynamic scenes captured using a moving camera. State-of-the-art approaches have difficulties handling such videos because viewpoint changes cause image-space motion vectors in the missing and known regions to be inconsistent. We address this problem by jointly estimating optical flow and color in the missing regions. Using pixel-wise forward/backward flow fields enables us to synthesize temporally coherent colors. We formulate the problem as a non-parametric patch-based optimization. We demonstrate our technique on numerous challenging videos.
Extracting background features for estimating the camera path is a key step in many video editing and enhancement applications. Existing approaches often fail on highly dynamic videos that are shot by moving cameras and contain severe foreground occlusion. Based on existing theories, we present a new, practical method that can reliably identify background features in complex video, leading to accurate camera path estimation and background layering. Our approach contains a local motion analysis step and a global optimization step. We first divide the input video into overlapping temporal windows, and extract local motion clusters in each window. We form a directed graph from these local clusters, and identify background ones by finding a minimal path through the graph using optimization. We show that our method significantly outperforms other alternatives, and can be directly used to improve common video editing applications such as stabilization, compositing and background reconstruction.
We present Jump, a practical system for capturing high resolution, omnidirectional stereo (ODS) video suitable for wide scale consumption in currently available virtual reality (VR) headsets. Our system consists of a video camera built using off-the-shelf components and a fully automatic stitching pipeline capable of capturing video content in the ODS format. We have discovered and analyzed the distortions inherent to ODS when used for VR display as well as those introduced by our capture method and show that they are small enough to make this approach suitable for capturing a wide variety of scenes. Our stitching algorithm produces robust results by reducing the problem to one of pairwise image interpolation followed by compositing. We introduce novel optical flow and compositing methods designed specifically for this task. Our algorithm is temporally coherent and efficient, is currently running at scale on a distributed computing platform, and is capable of processing hours of footage each day.
Collision sequences are commonly used in games and entertainment to add drama and excitement. Authoring even two-body collisions in the real world can be difficult, as one has to synchronize the timing and the object trajectories correctly. Even after tedious trial-and-error iterations, once objects can actually be made to collide, they are difficult to capture in 3D. In contrast, synthetically generating plausible collisions is difficult because it requires adjusting various collision parameters (e.g., object mass ratio, coefficient of restitution) as well as appropriate initial conditions. We present SMASH, which reads appropriate collision parameters directly off raw input video recordings. Technically, we enable this by utilizing the laws of rigid body collision to regularize the problem of lifting 2D trajectories to a physically valid 3D reconstruction of the collision. The reconstructed sequences can then be modified and combined to easily author novel and plausible collisions. We evaluate our system on a range of synthetic scenes and demonstrate the effectiveness of our method by accurately reconstructing several complex real-world collision events.
We present a new method that achieves a two-way coupling between deformable solids and an incompressible fluid where the underlying geometric representation is entirely Eulerian. Using the recently developed Eulerian Solids approach [Levin et al. 2011], we are able to simulate multiple solids undergoing complex, frictional contact while simultaneously interacting with a fluid. The complexity of the scenarios we are able to simulate surpasses those that we have seen from any previous method. Eulerian Solids have previously been integrated using explicit schemes, but we develop an implicit scheme that allows large time steps to be taken. The incompressibility condition is satisfied in both the solid and the fluid, which has the added benefit of simplifying collision handling.
We present a scalable parallel solver for the pressure Poisson equation in fluids simulation which can accommodate complex irregular domains with on the order of a billion degrees of freedom, using a single server or workstation fitted with GPU or Many-Core accelerators. The design of our numerical technique is attuned to the subtleties of heterogeneous computing, and allows us to benefit from the high memory and compute bandwidth of GPU accelerators even for problems that are too large to fit entirely on GPU memory. This is achieved via algebraic formulations that adequately increase the density of the GPU-hosted computation as to hide the overhead of offloading from the CPU, in exchange for accelerated convergence. Our solver follows the principles of Domain Decomposition techniques, and is based on the Schur complement method for elliptic partial differential equations. A large uniform grid is partitioned in non-overlapping subdomains, and bandwidth-optimized (GPU or Many-Core) accelerator cards are used to efficiently and concurrently solve independent Poisson problems on each resulting subdomain. Our novel contributions are centered on the careful steps necessary to assemble an accurate global solver from these constituent blocks, while avoiding excessive communication or dense linear algebra. We ultimately produce a highly effective Conjugate Gradients preconditioner, and demonstrate scalable and accurate performance on high-resolution simulations of water and smoke flow.
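The Schur complement decomposition at the heart of this solver can be sketched on a tiny 1D Poisson problem: eliminate the subdomain interiors, solve a small system on the interface, then back-substitute. This is a toy dense-matrix illustration of the textbook technique only, far from the paper's preconditioned large-scale formulation.

```python
import numpy as np

# A 1D Poisson matrix: two 4-unknown subdomains coupled through one
# interface node (index 4).
n = 9
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
I = [0, 1, 2, 3, 5, 6, 7, 8]          # subdomain interiors
G = [4]                                # interface

AII, AIG = A[np.ix_(I, I)], A[np.ix_(I, G)]
AGI, AGG = A[np.ix_(G, I)], A[np.ix_(G, G)]

# Schur complement system for the interface: S x_G = b_G - A_GI A_II^{-1} b_I
# with S = A_GG - A_GI A_II^{-1} A_IG. A_II is block diagonal, so every
# solve against it decouples into independent per-subdomain problems,
# which is what lets accelerator cards work concurrently.
S = AGG - AGI @ np.linalg.solve(AII, AIG)
xG = np.linalg.solve(S, b[G] - AGI @ np.linalg.solve(AII, b[I]))
xI = np.linalg.solve(AII, b[I] - AIG @ xG)

x = np.empty(n)
x[G], x[I] = xG, xI
```

The interface system is tiny compared to the subdomain interiors, so the expensive work is exactly the part that parallelizes.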
We propose a method to simulate the rich, scale-dependent dynamics of water waves. Our method preserves the dispersion properties of real waves, yet it supports interactions with obstacles and is computationally efficient. Fundamentally, it computes wave accelerations by way of applying a dispersion kernel as a spatially variant filter, which we are able to compute efficiently using two core technical contributions. First, we design novel, accurate, and compact pyramid kernels which compensate for low-frequency truncation errors. Second, we design a shadowed convolution operation that efficiently accounts for obstacle interactions by modulating the application of the dispersion kernel. We demonstrate a wide range of behaviors, which include capillary waves, gravity waves, and interactions with static and dynamic obstacles, all from within a single simulation.
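The scale-dependent dynamics the abstract refers to come from the deep-water dispersion relation, which this minimal sketch integrates exactly in the Fourier domain. Note the hedge: a pure spectral update like this only works on obstacle-free periodic domains; the paper's contribution is precisely to apply the dispersion kernel as a spatially variant, shadowed filter so that obstacles can modulate it.

```python
import numpy as np

g = 9.81
n, L = 128, 10.0                        # grid resolution and domain size (m)
k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
kx, ky = np.meshgrid(k, k)
omega = np.sqrt(g * np.hypot(kx, ky))   # deep-water dispersion w(k) = sqrt(g|k|)

# Gaussian bump as the initial height field (zero initial velocity).
y, x = np.mgrid[0:n, 0:n]
h0 = np.exp(-((x - n // 2) ** 2 + (y - n // 2) ** 2) / 8.0)
H0 = np.fft.fft2(h0)

def height_at(t):
    # Each Fourier mode oscillates at its own frequency, so long and short
    # waves separate over time and the bump disperses into rings.
    return np.fft.ifft2(H0 * np.cos(omega * t)).real

rings = height_at(1.0)
```

Because each mode's frequency depends on its wavelength, the initial bump spreads into concentric rings rather than translating rigidly, which is the hallmark of dispersive water waves.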
We present an algorithm to accelerate a large class of image processing operators. Given a low-resolution reference input and output pair, we model the operator by fitting local curves that map the input to the output. We can then produce a full-resolution output by evaluating these low-resolution curves on the full-resolution input. We demonstrate that this faithfully models state-of-the-art operators for tone mapping, style transfer, and recoloring. The curves are computed by lifting the input into a bilateral grid and then solving for the 3D array of affine matrices that best maps input color to output color per x, y, intensity bin. We enforce a smoothness term on the matrices which prevents false edges and noise amplification. We can either globally optimize this energy, or quickly approximate a solution by locally fitting matrices and then enforcing smoothness by blurring in grid space. This latter option reduces to joint bilateral upsampling [Kopf et al. 2007] or the guided filter [He et al. 2013], depending on the choice of parameters. The cost of running the algorithm is reduced to the cost of running the original algorithm at greatly reduced resolution, as fitting the curves takes about 10 ms on mobile devices, and 1--2 ms on desktop CPUs, and evaluating the curves can be done with a simple GPU shader.
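The fit-at-low-resolution, apply-at-high-resolution idea can be sketched in one dimension: fit one affine curve per intensity bin on a low-resolution input/output pair, then evaluate those curves on the full-resolution input. This is a deliberately crude sketch with global intensity bins only, no spatial (x, y) bins and no smoothness term, and the gamma "operator" is an illustrative stand-in.

```python
import numpy as np

def fit_per_bin_curves(lo_in, lo_out, bins=16):
    """Least-squares fit one affine map out ~= a*in + b per intensity bin."""
    idx = np.minimum((lo_in * bins).astype(int), bins - 1)
    curves = np.tile([1.0, 0.0], (bins, 1))        # identity fallback
    for k in range(bins):
        m = idx == k
        if m.sum() >= 2:
            A = np.stack([lo_in[m], np.ones(m.sum())], axis=1)
            curves[k], *_ = np.linalg.lstsq(A, lo_out[m], rcond=None)
    return curves

def apply_curves(hi_in, curves):
    idx = np.minimum((hi_in * len(curves)).astype(int), len(curves) - 1)
    return curves[idx, 0] * hi_in + curves[idx, 1]

# Toy "operator": gamma correction. Fit on a 32x32 proxy, apply at 256x256.
rng = np.random.default_rng(1)
lo, hi = rng.random((32, 32)), rng.random((256, 256))
curves = fit_per_bin_curves(lo, lo ** 0.5)
out = apply_curves(hi, curves)
```

Evaluating a handful of fitted curves is far cheaper than running the original operator at full resolution, which is the source of the speedup.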
Filters with slowly decaying impulse responses have many uses in computer graphics. Recursive filters are often the fastest option for such cases. In this paper, we derive closed-form formulas for computing the exact initial feedbacks needed for recursive filtering infinite input extensions. We provide formulas for the constant-padding (e.g. clamp-to-edge), periodic (repeat) and even-periodic (mirror or reflect) extensions. These formulas were designed for easy integration into modern block-parallel recursive filtering algorithms. Our new modified algorithms are state-of-the-art, filtering images faster even than previous methods that ignore boundary conditions.
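The idea of an exact initial feedback can be shown with the simplest case: a first-order causal filter under the constant-padding (clamp-to-edge) extension, where the closed form is just the filter's fixed point under constant input. The paper derives such formulas for higher-order filters and for the periodic and even-periodic extensions; this sketch covers only the degenerate first-order case.

```python
import numpy as np

def recursive_filter(x, a, y_prev):
    """Causal first-order filter y[i] = (1 - a) * x[i] + a * y[i - 1]."""
    y = np.empty(len(x))
    for i, xi in enumerate(x):
        y_prev = (1.0 - a) * xi + a * y_prev
        y[i] = y_prev
    return y

a = 0.8
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Clamp-to-edge extends the signal with x[0] forever; the filter's response
# to that constant prologue converges to its fixed point y = x[0], which is
# therefore the exact initial feedback -- no prologue is ever filtered.
y_closed_form = recursive_filter(x, a, y_prev=x[0])

# Brute force for comparison: actually filter a long constant prologue.
padded = np.concatenate([np.full(10000, x[0]), x])
y_brute = recursive_filter(padded, a, y_prev=0.0)[-len(x):]
```

The two results agree to machine precision, but the closed form skips the 10,000-sample warm-up entirely, which is what makes exact initialization attractive inside block-parallel schemes.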
Image downscaling is arguably the most frequently used image processing tool. We present an algorithm based on convolutional filters where input pixels contribute more to the output image the more their color deviates from their local neighborhood, which preserves visually important details. In a user study we verify that users prefer our results over related work. Our efficient GPU implementation works in real-time when downscaling images from 24 M to 70 k pixels. Further, we demonstrate empirically that our method can be successfully applied to videos.
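The core weighting idea, pixels contribute more the more they deviate from their local neighborhood, can be sketched with per-block weighted averaging. This is a toy sketch of the principle only; the paper derives the weights from convolutional filters over a larger neighborhood and runs in real time on the GPU.

```python
import numpy as np

def detail_preserving_downscale(img, factor, eps=1e-6):
    """Average each factor x factor block, weighting every pixel by its
    deviation from the block mean so outliers (thin lines, speckles)
    survive the reduction."""
    h, w = img.shape
    blocks = img[:h - h % factor, :w - w % factor]
    blocks = blocks.reshape(h // factor, factor, w // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3)           # (H', W', f, f)
    mean = blocks.mean(axis=(2, 3), keepdims=True)
    wgt = np.abs(blocks - mean) + eps               # deviation -> weight
    return (wgt * blocks).sum(axis=(2, 3)) / wgt.sum(axis=(2, 3))

# A thin bright line on a dark background: plain block averaging dims it
# to 0.25, while deviation weighting keeps it at about 0.5.
img = np.zeros((64, 64))
img[32, :] = 1.0
small = detail_preserving_downscale(img, 4)
```

A plain box filter would render the one-pixel line at a quarter of its brightness after 4x downscaling; up-weighting the deviating pixels keeps it clearly visible.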
This paper introduces a novel domain-specific compiler, which translates visual computing programs written in dynamic languages to highly efficient code. We define "dynamic" languages as those such as Python and MATLAB, which feature dynamic typing and flexible array operations. Such language features can be useful for rapid prototyping; however, the dynamic computation model introduces significant overheads in program execution time. We introduce a compiler framework for accelerating visual computing programs, such as graphics and vision programs, written in general-purpose dynamic languages. Our compiler allows substantial performance gains (frequently orders of magnitude) over general compilers for dynamic languages by specializing the compiler for visual computation. Specifically, our compiler takes advantage of three key properties of visual computing programs, which permit optimizations: (1) many array data structures have small, constant, or bounded size, (2) many operations on visual data are supported in hardware or are embarrassingly parallel, and (3) humans are not sensitive to the small numerical errors in visual outputs caused by changes in floating-point precision. Our compiler integrates program transformations that have been described previously, and improves existing transformations to handle visual programs that perform complicated array computations. In particular, we show that dependent type analysis can be used to infer sizes and guide optimizations for many small-sized array operations that arise in visual programs. Programmers who are not experts on visual computation can use our compiler to produce more efficient Python programs than if they write manually parallelized C, with fewer lines of application logic.
We propose a novel example-based approach to synthesize scenes with complex relations, e.g., when one object is 'hooked', 'surrounded', 'contained' or 'tucked into' another object. Existing relationship descriptors used in automatic scene synthesis methods are based on contacts or relative vectors connecting the object centers. Such descriptors do not fully capture the geometry of spatial interactions, and therefore cannot describe complex relationships. Our idea is to enrich the description of spatial relations between object surfaces by encoding the geometry of the open space around objects, and use this as a template for fitting novel objects. To this end, we introduce relationship templates as descriptors of complex relationships; they are computed from an example scene and combine the interaction bisector surface (IBS) with a novel feature called the space coverage feature (SCF), which encodes the open space in the frequency domain. New variations of a scene can be synthesized efficiently by fitting novel objects to the template. Our method greatly enhances existing automatic scene synthesis approaches by allowing them to handle complex relationships, as validated by our user studies. The proposed method generalizes well, as it can form complex relationships with objects that have a topology and geometry very different from the example scene.
Convolutional neural networks have been successfully used to compute shape descriptors, or jointly embed shapes and sketches in a common vector space. We propose a novel approach that leverages both labeled 3D shapes and semantic information contained in the labels, to generate semantically-meaningful shape descriptors. A neural network is trained to generate shape descriptors that lie close to a vector representation of the shape class, given a vector space of words. This method is easily extendable to range scans, hand-drawn sketches and images. This makes cross-modal retrieval possible, without a need to design different methods depending on the query type. We show that sketch-based shape retrieval using semantic-based descriptors outperforms the state-of-the-art by large margins, and that mesh-based retrieval generates results of higher relevance to the query than current deep shape descriptors.
When geometric models with a desired combination of style and functionality are not available, they currently need to be created manually. We facilitate algorithmic synthesis of 3D models of man-made shapes which combines user-specified style, described via an exemplar shape, and functionality, encoded by a functionally different target shape. Our method automatically transfers the style of the exemplar to the target, creating the desired combination. The main challenge in performing cross-functional style transfer is to implicitly separate an object's style from its function: while stylistically the output shapes should be as close as possible to the exemplar, their original functionality and structure, as encoded by the target, should be strictly preserved. Recent literature points to the presence of similarly shaped, salient geometric elements as a main indicator of stylistic similarity between 3D shapes. We therefore transfer the exemplar style to the target via a sequence of element-level operations. We allow only compatible operations, ones that do not affect the target functionality. To this end, we introduce a cross-structural element compatibility metric that estimates the impact of each operation on the edited shape. Our metric is based on the global context and coarse geometry of evaluated elements, and is trained on databases of 3D objects. We use this metric to cast style transfer as a tabu search, which incrementally updates the target shape using compatible operations, progressively increasing its style similarity to the exemplar while strictly maintaining its functionality at each step. We evaluate our framework across a range of man-made objects including furniture, light fixtures, and tableware, and perform a number of user studies confirming that it produces convincing outputs combining the desired style and function.
Large repositories of 3D shapes provide valuable input for data-driven analysis and modeling tools. They are especially powerful once annotated with semantic information such as salient regions and functional parts. We propose a novel active learning method capable of enriching massive geometric datasets with accurate semantic region annotations. Given a shape collection and a user-specified region label our goal is to correctly demarcate the corresponding regions with minimal manual work. Our active framework achieves this goal by cycling between manually annotating the regions, automatically propagating these annotations across the rest of the shapes, manually verifying both human and automatic annotations, and learning from the verification results to improve the automatic propagation algorithm. We use a unified utility function that explicitly models the time cost of human input across all steps of our method. This allows us to jointly optimize for the set of models to annotate and for the set of models to verify based on the predicted impact of these actions on the human efficiency. We demonstrate that incorporating verification of all produced labelings within this unified objective improves both accuracy and efficiency of the active learning procedure. We automatically propagate human labels across a dynamic shape network using a conditional random field (CRF) framework, taking advantage of global shape-to-shape similarities, local feature similarities, and point-to-point correspondences. By combining these diverse cues we achieve higher accuracy than existing alternatives. We validate our framework on existing benchmarks demonstrating it to be significantly more efficient at using human input compared to previous techniques. 
We further validate its efficiency and robustness by annotating a massive shape dataset, labeling over 93,000 shape parts, across multiple model classes, and providing a labeled part collection more than one order of magnitude larger than existing ones.
This paper presents a numerical coarsening method for corotational elasticity, which enables interactive large deformation of high-resolution heterogeneous objects. Our method derives a coarse elastic model from a high-resolution discretization of corotational elasticity with high-resolution boundary conditions. This is in contrast to previous coarsening methods, which derive a coarse elastic model from an unconstrained high-resolution discretization of regular linear elasticity, and then apply corotational computations directly on the coarse setting. We show that previous approaches fail to handle high-resolution boundary conditions correctly, suffering from accuracy and robustness problems. Our method, on the other hand, efficiently supports accurate high-resolution boundary conditions, which are fundamental for rich interaction with high-resolution heterogeneous models. We demonstrate the potential of our method for interactive deformation of complex medical imaging data sets.
We show that many existing elastic body simulation approaches can be interpreted as descent methods, under a nonlinear optimization framework derived from implicit time integration. The key question is how to find an effective descent direction with a low computational cost. Based on this concept, we propose a new gradient descent method using Jacobi preconditioning and Chebyshev acceleration. The convergence rate of this method is comparable to that of L-BFGS or nonlinear conjugate gradient. But unlike other methods, it requires no dot product operation, making it suitable for GPU implementation. To further improve its convergence and performance, we develop a series of step length adjustment, initialization, and invertible model conversion techniques, all of which are compatible with GPU acceleration. Our experiment shows that the resulting simulator is simple, fast, scalable, memory-efficient, and robust against very large time steps and deformations. It can correctly simulate the deformation behaviors of many elastic materials, as long as their energy functions are second-order differentiable and their Hessian matrices can be quickly evaluated. For additional speedups, the method can also serve as a complement to other techniques, such as multi-grid.
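The Jacobi-preconditioned descent with Chebyshev acceleration can be sketched on a quadratic test energy, where the descent step reduces to classical Jacobi iteration and the Chebyshev recurrence takes its textbook form. This is only a sketch under that simplification; the paper applies the same recurrence to nonlinear elastic energies and adds step-length and initialization safeguards.

```python
import numpy as np

def chebyshev_jacobi_descent(A, b, rho, iters=200):
    """Minimize E(x) = 0.5 x^T A x - b^T x by gradient descent with a
    Jacobi (diagonal) preconditioner, accelerated by the Chebyshev
    recurrence. rho is an estimate of the Jacobi spectral radius.
    Note: no dot products are needed, only matrix-vector and
    elementwise operations, which is what makes it GPU-friendly."""
    d_inv = 1.0 / np.diag(A)
    x_prev = x = np.zeros_like(b)
    omega = 1.0
    for k in range(iters):
        x_jac = x - d_inv * (A @ x - b)          # preconditioned descent step
        if k == 1:
            omega = 2.0 / (2.0 - rho ** 2)
        elif k > 1:
            omega = 4.0 / (4.0 - rho ** 2 * omega)
        x_prev, x = x, omega * (x_jac - x_prev) + x_prev
    return x

# Quadratic test energy from a 1D Laplacian (an SPD system).
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
rho = np.cos(np.pi / (n + 1))                    # known Jacobi radius here
x = chebyshev_jacobi_descent(A, b, rho)
```

For this matrix the Jacobi spectral radius is known in closed form; in practice rho would be estimated, and the method tolerates a slight underestimate.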
We present a method to create personalized anatomical models ready for physics-based animation, using only a set of 3D surface scans. We start by building a template anatomical model of an average male which supports deformations due to both 1) subject-specific variations: shapes and sizes of bones, muscles, and adipose tissues and 2) skeletal poses. Next, we capture a set of 3D scans of an actor in various poses. Our key contribution is formulating and solving a large-scale optimization problem where we compute both subject-specific and pose-dependent parameters such that our resulting anatomical model explains the captured 3D scans as closely as possible. Compared to data-driven body modeling techniques that focus only on the surface, our approach has the advantage of creating physics-based models, which provide realistic 3D geometry of the bones and muscles, and naturally supports effects such as inertia, gravity, and collisions according to Newtonian dynamics.
The solution of large sparse systems of linear constraints is at the base of most interactive solvers for physically-based animation of soft body dynamics. We focus on applications with hard and tight per-frame resource budgets, such as video games, where the solution of soft body dynamics needs to be computed in a few milliseconds. Linear iterative methods are preferred in these cases since they provide approximate solutions within a given error tolerance and in a short amount of time. We present a parallel randomized Gauss-Seidel method which can be effectively employed to enable the animation of 3D soft objects discretized as large and irregular triangular or tetrahedral meshes. At the beginning of each frame, we partition the set of equations governing the system using a randomized graph coloring algorithm. The unknowns in the equations belonging to the same partition are independent of each other. Then, all the equations belonging to the same partition are solved at the same time in parallel. Our algorithm runs completely on the GPU and can support changes in the constraints topology. We tested our method as a solver for soft body dynamics within the Projective Dynamics and Position Based Dynamics frameworks. We show how the algorithmic simplicity of this iterative strategy enables great numerical stability and fast convergence speed, which are essential features for physically based animations with fixed and small hard time budgets. Compared to the state of the art, we found our method to be faster and scale better while providing stabler solutions for very small time budgets.
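The coloring-then-sweeping structure can be sketched with a greedy randomized coloring followed by colored Gauss-Seidel on a tiny chain system. The sketch runs the inner per-color loop sequentially; the point of the coloring is that this loop could run fully in parallel, since same-color unknowns never appear in each other's equations.

```python
import numpy as np

def randomized_greedy_coloring(adj, seed=0):
    """Visit nodes in random order; give each the smallest color not used
    by its neighbors, so same-color unknowns are mutually independent."""
    n = len(adj)
    color = -np.ones(n, dtype=int)
    for v in np.random.default_rng(seed).permutation(n):
        taken = {color[u] for u in adj[v]}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color

def colored_gauss_seidel(A, b, color, iters=300):
    """Gauss-Seidel sweeps, one color class at a time."""
    x = np.zeros_like(b)
    for _ in range(iters):
        for c in range(color.max() + 1):
            for i in np.where(color == c)[0]:    # parallelizable inner loop
                x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

# A chain of 12 unknowns (1D Laplacian): each node couples to its neighbors.
n = 12
adj = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
color = randomized_greedy_coloring(adj)
x = colored_gauss_seidel(A, b, color)
```

Gauss-Seidel converges for SPD systems under any unknown ordering, so regrouping the sweep by color trades nothing essential for parallelism.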
We present a framework for global parametrization that utilizes the edge lengths (squared) of the mesh as variables. Given a mesh with arbitrary topology and prescribed cone singularities, we flatten the original metric of the surface under strict bounds on the metric distortion (various types of conformal and isometric measures are supported). Our key observation is that the space of bounded distortion metrics (given any particular bounds) is convex, and a broad range of useful and well-known distortion energies are convex as well. With the addition of nonlinear Gaussian curvature constraints, the parametrization problem is formulated as a constrained optimization problem, and a solution gives a locally injective map. Our method is easy to implement. Sequential convex programming (SCP) is utilized to solve this problem effectively. We demonstrate the flexibility of the method and its uncompromised robustness and compare it to state-of-the-art methods.
We present a novel method, called Simplex Assembly, to compute inversion-free mappings with low or bounded distortion on simplicial meshes. Our method involves two steps: simplex disassembly and simplex assembly. Given a simplicial mesh and its initial piecewise affine mapping, we project the affine transformation associated with each simplex into the inversion-free and distortion-bounded space. The projection disassembles the input mesh into disjoint simplices. The disjoint simplices are then assembled to recover the original connectivity by minimizing the mapping distortion and the difference of the disjoint vertices with respect to the piecewise affine transformations, while the piecewise affine mapping is restricted inside the feasible space. Due to the use of affine transformations as variables, our method explicitly guarantees that no inverted simplex occurs, and that the mapping distortion is below the bound during the optimization. Compared with existing methods, our method is robust to an initialization with many inverted elements and positional constraints. We demonstrate the efficiency and robustness of our method through a variety of geometric processing tasks.
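The per-simplex projection step can be illustrated with a standard signed-SVD construction: clamp the singular values of each affine Jacobian into a prescribed band while forcing a positive determinant. This is a common stand-in for such projections, not necessarily the paper's exact operator:

```python
import numpy as np

def project_bounded_distortion(F, sigma_min=0.5, sigma_max=2.0):
    """Project an affine Jacobian F onto the set of orientation-preserving
    maps whose singular values lie in [sigma_min, sigma_max]."""
    U, s, Vt = np.linalg.svd(F)
    # Fold any reflection into the smallest singular value so that
    # det(U @ Vt) = +1, i.e. the projected map preserves orientation.
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1
        s[-1] *= -1
    s = np.clip(s, sigma_min, sigma_max)
    return U @ np.diag(s) @ Vt
```

A flipped (inverted) element is repaired to an orientation-preserving map, and a map that already satisfies the bounds is left unchanged.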
Tutte's embedding is one of the most popular approaches for computing parameterizations of surface meshes in computer graphics and geometry processing. Its popularity can be attributed to its simplicity, the guaranteed bijectivity of the embedding, and its relation to continuous harmonic mappings.
In this work we extend Tutte's embedding into hyperbolic cone-surfaces called orbifolds. Hyperbolic orbifolds are simple surfaces exhibiting different topologies and cone singularities and therefore provide a flexible and useful family of target domains. The hyperbolic orbifold Tutte embedding is defined as a critical point of a Dirichlet energy with special boundary constraints and is proved to be bijective, while also satisfying a set of point constraints. An efficient algorithm for computing these embeddings is developed.
We demonstrate a powerful application of the hyperbolic Tutte embedding for computing a consistent set of bijective, seamless maps between all pairs in a collection of shapes, interpolating a set of user-prescribed landmarks, in a fast and robust manner.
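For intuition, the classical Euclidean Tutte embedding that this work generalizes can be written in a few lines: pin the boundary to a convex polygon and solve one linear system placing each interior vertex at the average of its neighbors. A minimal sketch with uniform weights (dense solve, toy-sized meshes only):

```python
import numpy as np

def tutte_embedding(n_verts, edges, boundary, boundary_pos):
    """Classical Euclidean Tutte embedding with uniform weights: pin the
    boundary to a convex polygon and place every interior vertex at the
    average of its neighbors, which amounts to one linear solve."""
    nbrs = {v: set() for v in range(n_verts)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    fixed = dict(zip(boundary, boundary_pos))
    interior = [v for v in range(n_verts) if v not in fixed]
    index = {v: i for i, v in enumerate(interior)}
    A = np.zeros((len(interior), len(interior)))
    rhs = np.zeros((len(interior), 2))
    for v in interior:
        A[index[v], index[v]] = len(nbrs[v])
        for u in nbrs[v]:
            if u in fixed:
                rhs[index[v]] += fixed[u]   # pinned neighbor moves to RHS
            else:
                A[index[v], index[u]] -= 1.0
    sol = np.linalg.solve(A, rhs)
    pos = np.zeros((n_verts, 2))
    for v, p in fixed.items():
        pos[v] = p
    for v in interior:
        pos[v] = sol[index[v]]
    return pos
```

With the boundary pinned to a unit square, a single interior vertex connected to all four corners lands at the centroid.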
Parametrization based methods have recently become very popular for the generation of high quality quad meshes. In contrast to previous approaches, they allow for intuitive user control in order to accommodate all kinds of application driven constraints and design intentions. A major obstacle in practice, however, are the relatively long computations that lead to response times of several minutes already for input models of moderate complexity. In this paper we introduce a novel strategy to handle highly complex input meshes with up to several millions of triangles such that quad meshes can still be created and edited within an interactive workflow. Our method is based on representing the input model on different levels of resolution with a mechanism to propagate parametrizations from coarser to finer levels. The major challenge is to guarantee consistent parametrizations even in the presence of charts, transition functions, and singularities. Moreover, the remaining degrees of freedom on coarser levels of resolution have to be chosen carefully in order to still achieve low distortion parametrizations. We demonstrate a prototypic system where the user can interactively edit quad meshes with powerful high-level operations such as guiding constraints, singularity repositioning, and singularity connections.
In facial animation, the accurate shape and motion of the lips of virtual humans is of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach, and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We quantitatively and qualitatively show that our monocular approach reconstructs higher quality lip shapes, even for complex shapes like a kiss or lip rolling, than previous monocular approaches. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
In recent years, sophisticated image-based reconstruction methods for the human face have been developed. These methods capture highly detailed static and dynamic geometry of the whole face, or specific models of face regions, such as hair, eyes or eye lids. Unfortunately, image-based methods to capture the mouth cavity in general, and the teeth in particular, have received very little attention. The accurate rendering of teeth, however, is crucial for the realistic display of facial expressions, and currently high quality face animations resort to tooth row models created by tedious manual work. In dentistry, special intra-oral scanners for teeth were developed, but they are invasive, expensive, cumbersome to use, and not readily available. In this paper, we therefore present the first approach for non-invasive reconstruction of an entire person-specific tooth row from just a sparse set of photographs of the mouth region. The basis of our approach is a new parametric tooth row prior learned from high quality dental scans. A new model-based reconstruction approach fits teeth to the photographs such that visible teeth are accurately matched and occluded teeth plausibly synthesized. Our approach seamlessly integrates into photogrammetric multi-camera reconstruction setups for entire faces, but also enables high quality teeth modeling from normal uncalibrated photographs and even short videos captured with a mobile phone.
Significant challenges currently prohibit expressive interaction in virtual reality (VR). Occlusions introduced by head-mounted displays (HMDs) make existing facial tracking techniques intractable, and even state-of-the-art techniques used for real-time facial tracking in unconstrained environments fail to capture subtle details of the user's facial expressions that are essential for compelling speech animation. We introduce a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions. Using a monocular camera attached to an HMD, we record multiple subjects performing various facial expressions and speaking several phonetically-balanced sentences. These images are used with artist-generated animation data corresponding to these sequences to train a convolutional neural network (CNN) to regress images of a user's mouth region to the parameters that control a digital avatar. To make training this system more tractable, we use audio-based alignment techniques to map images of multiple users making the same utterance to the corresponding animation parameters. We demonstrate that this approach is also feasible for tracking the expressions around the user's eye region with an internal infrared (IR) camera, thereby enabling full facial tracking. This system requires no user-specific calibration, uses easily obtainable consumer hardware, and produces high-quality animations of speech and emotional expressions. Finally, we demonstrate the quality of our system on a variety of subjects and evaluate its performance against state-of-the-art real-time facial tracking techniques.
Modern systems for real-time hand tracking rely on a combination of discriminative and generative approaches to robustly recover hand poses. Generative approaches require the specification of a geometric model. In this paper, we propose the use of sphere-meshes as a novel geometric representation for real-time generative hand tracking. How tightly this model fits a specific user heavily affects tracking precision. We derive an optimization to non-rigidly deform a template model to fit the user data in a number of poses. This optimization jointly captures the user's static and dynamic hand geometry, thus facilitating high-precision registration. At the same time, the limited number of primitives in the tracking template allows us to retain excellent computational performance. We confirm this by embedding our models in an open source real-time registration algorithm to obtain a tracker steadily running at 60Hz. We demonstrate the effectiveness of our solution by qualitatively and quantitatively evaluating tracking precision on a variety of complex motions. We show that the improved tracking accuracy at high frame-rate enables stable tracking of extended and complex motion sequences without the need for per-frame re-initialization. To enable further research in the area of high-precision hand tracking, we publicly release source code and evaluation datasets.
We present FlexMolds, a novel computational approach to automatically design flexible, reusable molds that, once 3D printed, allow us to physically fabricate, by means of liquid casting, multiple copies of complex shapes with rich surface details and complex topology. The approach to design such flexible molds is based on a greedy bottom-up search of possible cuts over an object, evaluating for each possible cut the feasibility of the resulting mold. We use a dynamic simulation approach to evaluate candidate molds, providing a heuristic to generate forces that are able to open, detach, and remove a complex mold from the object it surrounds. We have tested the approach with a number of objects with nontrivial shapes and topologies.
Frame shapes, which are made of struts, have been widely used in many fields, such as art, sculpture, architecture, and geometric modeling. Interest in robotic fabrication of frame shapes via spatial thermoplastic extrusion has grown rapidly in recent years. In this paper, we present a novel algorithm to generate a feasible fabrication sequence for general frame shapes. To solve this non-trivial combinatorial problem, we develop a divide-and-conquer strategy that first decomposes the input frame shape into stable layers via a constrained sparse optimization model. Then we search for a feasible sequence for each layer via a local optimization method together with a backtracking strategy. The generated sequence guarantees that the already-printed part is in a stable equilibrium state at all stages of fabrication, and that the 3D printing extrusion head does not collide with the printed part during the fabrication. Our algorithm has been validated by a prototype robotic fabrication system built from a 6-axis KUKA robotic arm with a customized extrusion head. Experimental results demonstrate the feasibility and applicability of our algorithm.
Current CAD modeling techniques enable the design of objects with aesthetically pleasing smooth freeform surfaces. However, the fabrication of these freeform shapes remains challenging. Our novel method uses orthogonal principal strips to fabricate objects whose boundary consists of freeform surfaces. This approach not only lends an artistic touch to the appearance of objects, but also provides directions for reinforcement, as the surface is mostly bent along the lines of curvature. Moreover, it is unnecessary to adjust the bending of these orthogonal strips during the construction process: provided the strips possess bending rigidity, the assembly automatically recovers the designed shape as if it were memorized. Our method relies on semi-isometric mapping, which preserves the length of boundary curves and approximates the angles between boundary curves under local minimization. Applications include the fabrication of paper and sheet metal craft, and architectural models using plastic plates. We applied our technique to several freeform objects to demonstrate the effectiveness of our algorithms.
In this paper we propose failure probabilities as a semantically and mechanically meaningful measure of object fragility. We present a stochastic finite element method which exploits fast rigid body simulation and reduced-space approaches to compute spatially varying failure probabilities. We use an explicit rigid body simulation to emulate the real-world loading conditions an object might experience, including persistent and transient frictional contact, while allowing us to combine several such scenarios together. Thus, our estimates better reflect real-world failure modes than previous methods. We validate our results using a series of real-world tests. Finally, we show how to embed failure probabilities into a stress constrained topology optimization which we use to design objects such as weight bearing brackets and robust 3D printable objects.
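The notion of failure probability itself reduces to a simple Monte Carlo estimate: sample loading scenarios, evaluate a stress model, and count exceedances of the yield stress. The sketch below uses a made-up linear stress model purely for illustration; the paper's actual estimator is a stochastic FEM driven by rigid body simulation:

```python
import numpy as np

def failure_probability(stress_fn, load_sampler, yield_stress, n=10000, seed=0):
    """Toy Monte Carlo failure probability: sample loads, evaluate a
    (hypothetical) stress model, count how often the stress exceeds
    the material's yield stress."""
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(n):
        load = load_sampler(rng)
        failures += stress_fn(load) > yield_stress
    return failures / n

# Hypothetical example: stress proportional to a normally distributed load.
p = failure_probability(stress_fn=lambda F: 2.0 * F,
                        load_sampler=lambda rng: rng.normal(50.0, 10.0),
                        yield_stress=120.0)
```

With these illustrative numbers, failure requires a load above 60, i.e. one standard deviation above the mean, so the estimate should sit near 0.16.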
We present an interactive system for computational design, optimization, and fabrication of multicopters. Our computational approach allows non-experts to design, explore, and evaluate a wide range of different multicopters. We provide users with an intuitive interface for assembling a multicopter from a collection of components (e.g., propellers, motors, and carbon fiber rods). Our algorithm interactively optimizes shape and controller parameters of the current design to ensure its proper operation. In addition, we allow incorporating a variety of other metrics (such as payload, battery usage, size, and cost) into the design process and exploring tradeoffs between them. We show the efficacy of our method and system by designing, optimizing, fabricating, and operating multicopters with complex geometries and propeller configurations. We also demonstrate the ability of our optimization algorithm to improve the multicopter performance under different metrics.
We introduce a novel GPU path rendering method based on scan-line rasterization, which is highly work-efficient but traditionally considered GPU-hostile. Our method is parallelized over boundary fragments, i.e., pixels directly intersecting the path boundary. Non-boundary pixels are processed in bulk as horizontal spans, as in CPU scanline rasterizers, which saves a significant amount of winding number computation workload. The distinction also allows the majority of our algorithmic steps to focus on boundary fragments only, which leads to a highly balanced workload among the GPU threads. In addition, we develop a ray shooting pattern that minimizes the global data dependency when computing winding numbers at anti-aliasing samples. This allows us to shift the majority of winding-number-related workload to the same kernel that consumes its result, which saves a significant amount of GPU memory bandwidth. Experiments show that our method gives a consistent 2.5X speedup over state-of-the-art alternatives for high-quality rendering at Ultra HD resolution, which can increase to more than 30X in extreme cases. We can also get a consistent 10X speedup on animated input.
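The quantity at the heart of such rasterizers is the winding number at each sample. A naive per-point version of the signed crossing count (which the scanline rasterizer instead accumulates incrementally per span) looks like this:

```python
def winding_number(px, py, polygon):
    """Signed winding number of (px, py) with respect to a closed
    polygonal path, by counting signed crossings of a horizontal ray
    shot in the +x direction. Naive per-point version for illustration."""
    wn = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        side = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
        if y0 <= py < y1 and side > 0:    # upward edge crosses the ray
            wn += 1
        elif y1 <= py < y0 and side < 0:  # downward edge crosses the ray
            wn -= 1
    return wn
```

For a counter-clockwise unit square, an interior point has winding number 1, exterior points have 0, and reversing the orientation flips the sign.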
This paper tackles a challenging 2D collage generation problem, focusing on shapes: we aim to fill a given region by packing irregular and reasonably-sized shapes with minimized gaps and overlaps. To solve this nontrivial problem, we first have to analyze the boundary of individual shapes and then couple shapes with partially-matched boundaries to reduce gaps and overlaps in the collages. Second, the search space in identifying a good coupling of shapes is enormous, since arranging a shape in a collage involves a position, an orientation, and a scale factor. Yet, this matching step needs to be performed for every single shape when we pack it into a collage. Existing shape descriptors are simply infeasible to compute in a reasonable amount of time. To overcome this, we present a new scale- and rotation-invariant 2D shape descriptor, the pyramid of arclength descriptor (PAD). Its formulation is locally supported, scalable, and yet simple to construct and compute. These properties make PAD efficient for performing the partial-shape matching. Hence, we can prune away most of the search space with simple calculations and efficiently identify candidate shapes. We evaluate our method using a large variety of shapes with different types and contours. Convincing collage results in terms of visual quality and time performance are obtained.
Modern GPUs supporting compressed textures allow interactive application developers to save scarce GPU resources such as VRAM and bandwidth. Compressed textures use fixed compression ratios whose lossy representations are significantly poorer quality than traditional image compression formats such as JPEG. We present a new method in the class of supercompressed textures that provides an additional layer of compression to already compressed textures. Our texture representation is designed for endpoint compressed formats such as DXT and PVRTC and decoding on commodity GPUs. We apply our algorithm to commonly used formats by separating their representation into two parts that are processed independently and then entropy encoded. Our method preserves the CPU-GPU bandwidth during the decoding phase and exploits the parallelism of GPUs to provide up to 3X faster decode compared to prior texture supercompression algorithms. Along with the gains in decoding speed, our method maintains both the compression size and quality of current state of the art texture representations.
Our aim is to give users real-time free-viewpoint rendering of real indoor scenes, captured with off-the-shelf equipment such as a high-quality color camera and a commodity depth sensor. Image-based Rendering (IBR) can provide the realistic imagery required at real-time speed. For indoor scenes however, two challenges are especially prominent. First, the reconstructed 3D geometry must be compact, but faithful enough to respect occlusion relationships when viewed up close. Second, man-made materials call for view-dependent texturing, but using too many input photographs reduces performance. We customize a typical RGB-D 3D surface reconstruction pipeline to produce a coarse global 3D surface, and local, per-view geometry for each input image. Our tiled IBR preserves quality by economizing on the expected contributions that entire groups of input pixels make to a final image. The two components are designed to work together, giving real-time performance, while hardly sacrificing quality. Testing on a variety of challenging scenes shows that our inside-out IBR scales favorably with the number of input images.
We present a data-driven approach for mesh denoising. Our key idea is to formulate the denoising process with cascaded non-linear regression functions and learn them from a set of noisy meshes and their ground-truth counterparts. Each regression function infers the normal of a denoised output mesh facet from geometry features extracted from its neighborhood facets on the input mesh and sends the result as the input of the next regression function. Specifically, we develop a filtered facet normal descriptor (FND) for modeling the geometry features around each facet on the noisy mesh and model a regression function with neural networks for mapping the FNDs to the facet normals of the denoised mesh. To handle meshes with different geometry features and reduce the training difficulty, we cluster the input mesh facets according to their FNDs and train neural networks for each cluster separately in an offline learning stage. At runtime, our method applies the learned cascaded regression functions to a noisy input mesh and reconstructs the denoised mesh from the output facet normals.
Our method learns the non-linear denoising process from the training data and makes no specific assumptions about the noise distribution and geometry features in the input. The runtime denoising process is fully automatic for different input meshes. Our method can be easily adapted to meshes with arbitrary noise patterns by training a dedicated regression scheme with mesh data and the particular noise pattern. We evaluate our method on meshes with both synthetic and real scanned noise, and compare it to other mesh denoising algorithms. Results demonstrate that our method outperforms the state-of-the-art mesh denoising methods and successfully removes different kinds of noise for meshes with various geometry features.
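For context, a hand-tuned analogue of the filtered facet normals that feed the descriptor is a bilateral-style normal filter; the paper replaces such fixed filters with learned cascaded regressors. A small sketch with our own simplified weighting (not the paper's FND):

```python
import numpy as np

def bilateral_normal_filter(normals, neighbors, sigma=0.5, iters=3):
    """Bilateral-style facet-normal smoothing: each facet normal is
    replaced by a weighted average of its neighbors, with weights that
    decay as neighboring normals diverge, then renormalized."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    for _ in range(iters):
        out = np.empty_like(n)
        for i, nbrs in enumerate(neighbors):
            idx = [i] + list(nbrs)
            d = np.linalg.norm(n[idx] - n[i], axis=1)
            w = np.exp(-(d ** 2) / (2 * sigma ** 2))
            avg = (w[:, None] * n[idx]).sum(axis=0)
            out[i] = avg / np.linalg.norm(avg)
        n = out
    return n
```

On a noisy near-flat patch, the filtered normals stay unit-length and agree with each other more closely than the noisy inputs did.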
Given a tetrahedral mesh, the algorithm described in this article produces a smooth 3D frame field, i.e. a set of three orthogonal directions associated with each vertex of the input mesh. The field varies smoothly inside the volume, and matches the normals of the volume boundary. Such a 3D frame field is a key component for some hexahedral meshing algorithms, where it is used to steer the placement of the generated elements.
We improve the state of the art in terms of quality, efficiency, and reproducibility. Our main contribution is a non-trivial extension to 3D of the existing least-squares approach used for optimizing a 2D frame field. Our algorithm is inspired by the method proposed by Huang et al., improved with an initialization that directly enforces boundary conditions. Our initialization alone is a fast and easy way to generate frame fields that are suitable for remeshing applications. For better robustness and quality, the field can be further optimized using nonlinear optimization as in Li et al. We observe that sampling the field on vertices instead of tetrahedra significantly improves both performance and quality.
Interchangeable components allow an object to be easily reconfigured, but usually reveal that the object is composed of parts. In this work, we present a computational approach for the design of components which are interchangeable, but also form objects with a coherent appearance which conceals their composition from parts. These components allow a physical realization of Assembly Based Modelling, a popular virtual modelling paradigm in which new models are constructed from the parts of existing ones. Given a collection of 3D models and a segmentation that specifies the component connectivity, our approach generates the components by jointly deforming and partitioning the models. We determine the component boundaries by evolving a set of closed contours on the input models to maximize the contours' geometric similarity. Next, we efficiently deform the input models to enforce both C0 and C1 continuity between components while minimizing deviation from their original appearance. The user can guide our deformation scheme to preserve desired features. We demonstrate our approach on several challenging examples, showing that our components can be physically reconfigured to assemble a large variety of coherent shapes.
Example-based shape deformation allows a mesh to be easily manipulated or animated with simple inputs. As the user pulls parts of the shape, the rest of the mesh automatically changes in an intuitive way by drawing from a set of exemplars. This provides a way for virtual shapes or characters to be easily authored and manipulated, or for a set of drawings to be animated with simple inputs. We describe a new approach for example-based inverse kinematic mesh manipulation which generates high quality deformations for a wide range of inputs, and in particular works well even when provided stylized or "cartoony" examples. This approach is fast enough to run in real time, reliably uses the artist's input shapes in an intuitive way even for highly nonphysical deformations, and provides added expressiveness by allowing the input shapes to be utilized in a way which spatially varies smoothly across the resulting deformed mesh. This allows for rich and detailed deformations to be created from a small set of input shapes, and gives an easy way for a set of sketches to be brought alive with simple click-and-drag inputs.
In this paper, we present an interactive system for mechanism modeling from multi-view images. Its key feature is that the generated 3D mechanism models contain not only geometric shapes but also internal motion structures: they can be directly animated through kinematic simulation. Our system consists of two steps: interactive 3D modeling and stochastic motion parameter estimation. At the 3D modeling step, our system is designed to integrate the sparse 3D points reconstructed from multi-view images and a sketching interface to achieve accurate 3D modeling of a mechanism. To recover the motion parameters, we record a video clip of the mechanism motion and adopt stochastic optimization to recover its motion parameters by edge matching. Experimental results show that our system can achieve the 3D modeling of a range of mechanisms from simple mechanical toys to complex mechanism objects.
We propose a framework for global registration of building scans. The first contribution of our work is to detect and use portals (e.g., doors and windows) to improve the local registration between two scans. Our second contribution is an optimization based on a linear integer programming formulation. We abstract each scan as a block and model the registration of the blocks as an optimization problem that aims at maximizing the overall matching score of the entire scene. We propose an efficient solution to this optimization problem by iteratively detecting and adding local constraints. We demonstrate the effectiveness of the proposed method on buildings of various styles, and show that our approach is superior to the current state of the art.
We address the problem of autonomously exploring unknown objects in a scene by consecutive depth acquisitions. The goal is to reconstruct the scene while online identifying the objects from among a large collection of 3D shapes. Fine-grained shape identification demands a meticulous series of observations attending to varying views and parts of the object of interest. Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition. The region-level attention leads to focus-driven features which are quite robust against object occlusion. The attention model, trained with the 3D shape collection, encodes the temporal dependencies among consecutive views with deep recurrent networks. This facilitates order-aware view planning accounting for robot movement cost. In achieving instance identification, the shape collection is organized into a hierarchy, associated with pre-trained hierarchical classifiers. The effectiveness of our method is demonstrated on an autonomous robot (PR) that explores a scene and identifies the objects to construct a 3D scene model.
Demand for high-volume 3D scanning of real objects is rapidly growing in a wide range of applications, including online retailing, quality-control for manufacturing, stop motion capture for 3D animation, and archaeological documentation and reconstruction. Although mature technologies exist for high-fidelity 3D model acquisition, deploying them at scale continues to require non-trivial manual labor. We describe a system that allows non-expert users to scan large numbers of physical objects within a reasonable amount of time, and with greater ease. Our system uses novel view- and path-planning algorithms to control a structured-light scanner mounted on a calibrated motorized positioning system. We demonstrate the ability of our prototype to safely, robustly, and automatically acquire 3D models for large collections of small objects.
We present a novel approach that allows web designers to easily direct user attention via visual flow on web designs. By collecting and analyzing users' eye gaze data on real-world webpages under the task-driven condition, we build two user attention models that characterize user attention patterns between a pair of page components. These models enable a novel web design interaction for designers to easily create a visual flow to guide users' eyes (i.e., direct user attention along a given path) through a web design with minimal effort. In particular, given an existing web design as well as a designer-specified path over a subset of page components, our approach automatically optimizes the web design so that the resulting design can direct users' attention to move along the input path. We have tested our approach on various web designs of different categories. Results show that our approach can effectively guide user attention through the web design according to the designer's high-level specification.
We present a full geometric parameterization of generalized barycentric coordinates on convex polytopes. We show that these continuous and non-negative coefficients ensuring linear precision can be efficiently and exactly computed through a power diagram of the polytope's vertices and the evaluation point. In particular, we point out that well-known explicit coordinates such as Wachspress, Discrete Harmonic, Voronoi, or Mean Value correspond to simple choices of power weights. We also present examples of new barycentric coordinates, and discuss possible extensions such as power coordinates for non-convex polygons and smooth shapes.
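As a concrete instance of the coordinate families discussed, here is a direct evaluation of mean value coordinates for a convex polygon using Floater's tan(α/2) formula (a standalone sketch, independent of the power-diagram machinery):

```python
import math

def _angle(p, a, b):
    """Angle at p in the triangle (p, a, b)."""
    va = (a[0] - p[0], a[1] - p[1])
    vb = (b[0] - p[0], b[1] - p[1])
    dot = va[0] * vb[0] + va[1] * vb[1]
    cross = va[0] * vb[1] - va[1] * vb[0]
    return math.atan2(abs(cross), dot)

def mean_value_coordinates(p, poly):
    """Mean value coordinates of a point p strictly inside polygon poly,
    via the tan(alpha/2) formula: w_i = (tan(a_{i-1}/2) + tan(a_i/2)) / r_i,
    normalized to sum to one."""
    n = len(poly)
    w = []
    for i in range(n):
        prev, cur, nxt = poly[(i - 1) % n], poly[i], poly[(i + 1) % n]
        r = math.dist(p, cur)
        a0 = _angle(p, prev, cur)
        a1 = _angle(p, cur, nxt)
        w.append((math.tan(a0 / 2) + math.tan(a1 / 2)) / r)
    s = sum(w)
    return [wi / s for wi in w]
```

The coordinates are non-negative inside a convex polygon, sum to one, and reproduce the point itself (linear precision), which is the property the paper's parameterization is built around.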
This paper presents a variational method to generate cell complexes with local anisotropy conforming to the Hessian of any given convex function and for any given local mesh density. Our formulation builds upon approximation theory to offer an anisotropic extension of Centroidal Voronoi Tessellations which can be seen as a dual form of Optimal Delaunay Triangulation. We thus refer to the resulting anisotropic polytopal meshes as Optimal Voronoi Tessellations. Our approach sharply contrasts with previous anisotropic versions of Voronoi diagrams as it employs first-type Bregman diagrams, a generalization of power diagrams where sites are augmented with not only a scalar-valued weight but also a vector-valued shift. As such, our OVT meshes contain only convex cells with straight edges, and admit an embedded dual triangulation that is combinatorially-regular. We show the effectiveness of our technique using off-the-shelf computational geometry libraries.
Computing centroidal Voronoi tessellations (CVT) has many applications in computer graphics. The existing methods, such as the Lloyd algorithm and the quasi-Newton solver, are efficient and easy to implement; however, they compute only local optima due to the highly non-linear nature of the CVT energy. This paper presents a novel method, called manifold differential evolution (MDE), for computing globally optimal geodesic CVT energy on triangle meshes. Formulating the mutation operator using discrete geodesics, MDE naturally extends the powerful differential evolution framework from Euclidean spaces to manifold domains. Under mild assumptions, we show that MDE has a provable probabilistic convergence to the global optimum. Experiments on a wide range of 3D models show that MDE consistently outperforms the existing methods by producing results with lower energy. Thanks to its intrinsic and global nature, MDE is insensitive to initialization and mesh tessellation. Moreover, it is able to handle multiply-connected Voronoi cells, which are challenging for existing geodesic CVT methods.
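The local baseline that MDE is measured against is easy to state. A toy Euclidean Lloyd iteration, with Voronoi cells approximated by a dense sample set, looks like:

```python
import numpy as np

def lloyd_cvt(sites, samples, iters=50):
    """Classical Lloyd iteration for a Euclidean CVT: assign samples to
    their nearest site, then move each site to the centroid of its region.
    Voronoi cells are approximated here by a dense sample set."""
    sites = sites.copy()
    for _ in range(iters):
        d = np.linalg.norm(samples[:, None, :] - sites[None, :, :], axis=2)
        owner = d.argmin(axis=1)
        for k in range(len(sites)):
            mask = owner == k
            if mask.any():
                sites[k] = samples[mask].mean(axis=0)
    return sites
```

Each Lloyd step is guaranteed not to increase the (discrete) CVT energy, but it can stall in the local optima that the paper's global method escapes.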
This article presents a new method to optimally partition a geometric domain with capacity constraints on the partitioned regions, an important problem in many fields ranging from engineering to economics. It is known that a capacity-constrained partition can be obtained as a power diagram with the squared L2 metric. We present a method with super-linear convergence for computing optimal partitions with capacity constraints that outperforms the state of the art by an order of magnitude. We demonstrate the efficiency of our method in the context of three different applications in computer graphics and geometric processing: displacement interpolation of function distributions, blue-noise point sampling, and optimal convex decomposition of 2D domains. Furthermore, the proposed method is extended to capacity-constrained optimal partitions with respect to general cost functions beyond the squared Euclidean distance.
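The power-diagram formulation can be illustrated with a deliberately simple first-order scheme: adjust per-site weights until each power cell captures its target fraction of a discrete sample set. This damped update is only a didactic stand-in, not the super-linear solver the article describes:

```python
import numpy as np

# Capacity-constrained power diagram on a discrete sample cloud.
# Cell of site i = samples minimizing the power distance ||x - s_i||^2 - w_i.
# Growing w_i enlarges cell i, so a damped update on the weights can
# steer each cell's capacity toward its target fraction.

def capacity_power_diagram(sites, samples, targets, iters=300, step=0.1):
    w = np.zeros(len(sites))
    for _ in range(iters):
        d = ((samples[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2) - w
        labels = d.argmin(axis=1)
        cap = np.bincount(labels, minlength=len(sites)) / len(samples)
        # Inflate under-filled cells, deflate over-filled ones.
        w += step * (targets - cap)
    return w, labels
```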
Efficiently simulating light transport in various scenes with a single algorithm is a difficult and important problem in computer graphics. Two major issues have been shown to hinder the efficiency of the existing solutions: light transport due to multiple highly glossy or specular interactions, and scenes with complex visibility between the camera and light sources. While recent bidirectional path sampling methods such as vertex connection and merging/unified path sampling (VCM/UPS) efficiently deal with highly glossy or specular transport, they tend to perform poorly in scenes with complex visibility. On the other hand, Markov chain Monte Carlo (MCMC) methods have shown excellent results in scenes with complex visibility, but they behave unpredictably in scenes with glossy or specular surfaces due to their fundamental issue of sample correlation. In this paper, we show how to fuse the underlying key ideas behind VCM/UPS and MCMC into a single, efficient light transport solution. Our algorithm is specifically designed to retain the advantages of both approaches, while alleviating their limitations. Our experiments show that the algorithm can efficiently render scenes with both highly glossy or specular materials and complex visibility, without compromising the performance in simpler cases.
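The MCMC side of this fusion, and its sample-correlation issue, can be shown in a one-dimensional miniature: a Metropolis sampler distributes correlated samples in proportion to a scalar contribution function, which is the abstract principle behind Metropolis-style light transport. This toy is ours and has nothing scene-specific in it:

```python
import random

# Minimal independence Metropolis sampler over [0, 1): samples end up
# distributed proportionally to a nonnegative contribution function f.
# Consecutive samples are correlated (repeated on rejection), which is
# the MCMC weakness the abstract mentions.

def metropolis(f, n, seed=0):
    rng = random.Random(seed)
    x = rng.random()
    while f(x) <= 0.0:          # find a valid starting state
        x = rng.random()
    fx = f(x)
    out = []
    for _ in range(n):
        y = rng.random()        # uniform independent proposal
        fy = f(y)
        # Acceptance probability min(1, f(y)/f(x)) keeps f as the
        # stationary density for this symmetric-in-q proposal.
        if fy > 0.0 and rng.random() < min(1.0, fy / fx):
            x, fx = y, fy
        out.append(x)
    return out
```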
We present a novel approach to improve temporal coherence in Monte Carlo renderings of animation sequences. Unlike other approaches that exploit temporal coherence in a post-process, our technique does so already during sampling. Building on previous gradient-domain rendering techniques that sample finite differences over the image plane, we introduce temporal finite differences and formulate a corresponding 3D spatio-temporal screened Poisson reconstruction problem that is solved over windowed batches of several frames simultaneously. We further extend our approach to include second order, mixed spatio-temporal differences, an improved technique to compute temporal differences exploiting motion vectors, and adaptive sampling. Our algorithm can be built on a gradient-domain path tracer without large modifications. In particular, we do not require the ability to evaluate animation paths over multiple frames. We demonstrate that our approach effectively reduces temporal flickering in animation sequences, significantly improving the visual quality compared to both path tracing and gradient-domain rendering of individual frames.
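The reconstruction step at the heart of gradient-domain methods can be reduced to a tiny 1D screened Poisson solve: combine a noisy direct estimate with finite-difference estimates by minimizing a screened least-squares objective. This is a deliberately small illustration of the idea, not the paper's 3D spatio-temporal solver:

```python
import numpy as np

# 1D screened Poisson reconstruction: given a noisy direct estimate
# `base` and finite-difference estimates `grad`, solve
#     min_I  alpha * ||I - base||^2 + ||D I - grad||^2
# via the normal equations (alpha*Id + D^T D) I = alpha*base + D^T grad.

def screened_poisson_1d(base, grad, alpha=0.2):
    base = np.asarray(base, dtype=float)
    grad = np.asarray(grad, dtype=float)
    n = len(base)
    # Forward-difference operator D of shape (n-1, n).
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    A = alpha * np.eye(n) + D.T @ D
    b = alpha * base + D.T @ grad
    return np.linalg.solve(A, b)
```

When the differences are accurate, the pixel-wise noise in `base` is heavily smoothed, leaving mainly a small global offset, which is why sampling differences pays off.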
We present a novel technique that produces two-dimensional low-discrepancy (LD) blue noise point sets for sampling. Using one-dimensional binary van der Corput sequences, we construct two-dimensional LD point sets, and rearrange them to match a target spectral profile while preserving their low discrepancy. We store the rearrangement information in a compact lookup table that can be used to produce arbitrarily large point sets. We evaluate our technique and compare it to the state-of-the-art sampling approaches.
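The 1D building block named in the abstract, the base-2 van der Corput sequence, is short enough to state directly; pairing index/N with it yields a standard 2D low-discrepancy set (shown here as a plain LD set, before any blue-noise rearrangement, which is the paper's contribution):

```python
# Base-2 van der Corput radical inverse: reflect the binary digits of
# the index across the radix point, e.g. 3 = 0b11 -> 0.11 (binary) = 0.75.

def van_der_corput(n):
    v, denom = 0.0, 1.0
    while n:
        denom *= 2.0
        v += (n & 1) / denom
        n >>= 1
    return v

def hammersley(num):
    # Classic 2D LD construction: (i/num, radical inverse of i).
    return [(i / num, van_der_corput(i)) for i in range(num)]
```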
A common solution to reducing visible aliasing artifacts in image reconstruction is to employ sampling patterns with a blue noise power spectrum. These sampling patterns can prevent discernible artifacts by replacing them with incoherent noise. Here, we propose a new family of blue noise distributions, Stair blue noise, which is mathematically tractable and enables parameter optimization to obtain the optimal sampling distribution. Furthermore, for a given sample budget, the proposed blue noise distribution achieves a significantly larger alias-free low-frequency region compared to existing approaches, without introducing visible artifacts in the mid-frequencies. We also develop a new sample synthesis algorithm that benefits from the use of an unbiased spatial statistics estimator and efficient optimization strategies.
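As a point of reference for what "blue noise" means operationally, the classical dart-throwing baseline enforces a minimum inter-point distance, which empties the low-frequency region of the power spectrum. This sketch is only that generic baseline, not Stair blue noise or the paper's optimized synthesis algorithm:

```python
import random

# Dart throwing: accept uniform candidate points only if they keep a
# minimum distance r to all accepted points. The resulting point set
# has a blue-noise-like spectrum (energy pushed out of low frequencies).

def dart_throwing(r, max_tries=10000, seed=0):
    rng = random.Random(seed)
    pts = []
    for _ in range(max_tries):
        x, y = rng.random(), rng.random()
        if all((x - px) ** 2 + (y - py) ** 2 >= r * r for px, py in pts):
            pts.append((x, y))
    return pts
```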
We present a texture space caching and reconstruction system for Monte Carlo ray tracing. Our system gathers and filters shading on-demand, including querying secondary rays, directly within a filter footprint around the current shading point. We shade on local grids in texture space with primary visibility decoupled from shading. Unique filters can be applied per material, where any terms of the shader can be chosen to be included in each kernel. This is a departure from recent screen space image reconstruction techniques, which typically use a single, complex kernel with a set of large auxiliary guide images as input. We show a number of high-performance use cases for our system, including interactive denoising of Monte Carlo ray tracing with motion/defocus blur, spatial and temporal shading reuse, cached product importance sampling, and filters based on linear regression in texture space.
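The decoupling of shading from visibility can be conveyed with a minimal texel-keyed cache: any ray landing in an already-shaded texel reuses the stored result instead of re-evaluating the shader. The class name, texel resolution, and `shade` callback below are illustrative stand-ins, not the system's actual interface:

```python
# Minimal texture-space shading cache: shading is evaluated at most once
# per texel of a local grid and reused by all rays hitting that texel.

class TextureSpaceCache:
    def __init__(self, shade, resolution=256):
        self.shade = shade      # expensive shading function of texel coords
        self.res = resolution   # texels per unit of UV space
        self.cache = {}
        self.evals = 0          # counts actual shader evaluations

    def lookup(self, u, v):
        # Quantize the hit point's UVs to a texel key.
        key = (int(u * self.res), int(v * self.res))
        if key not in self.cache:
            self.cache[key] = self.shade(*key)
            self.evals += 1
        return self.cache[key]
```

A real system would add per-material filter kernels over neighboring texels and temporal invalidation; the dictionary above only captures the reuse idea.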