We propose a novel framework for the computational design of tensegrity structures, which are constructions made of struts and cables, held rigid by continuous tension between the elements. Tensegrities are known to be difficult to design---existing design methods are often restricted to using symmetric or templated configurations, limiting the design space to simple constructions. We introduce an algorithm to automatically create free-form stable tensegrity designs that satisfy both fabrication and geometric constraints, and faithfully approximate input geometric shapes. Our approach sidesteps the usual force-based approach in favor of a geometric optimization on the positions of the elements. Equipped with this formulation, we provide a design framework to explore the highly constrained space of tensegrity structures. We validate our method with simulations and real-world constructions.
Three-dimensional structures in building construction and architecture are realized with conflicting goals in mind: engineering considerations and financial constraints are easily at odds with creative aims. It would therefore be very beneficial if optimization and side conditions involving statics and geometry could already play a role in the early stages of design, and could be incorporated into design tools in an unobtrusive and interactive way. This paper, which is concerned with a prominent class of structures, is a substantial step towards this goal. We combine the classical work of Maxwell, Michell, and Airy with differential-geometric considerations and obtain a geometric understanding of "optimality" of surface-like lightweight structures. It turns out that total absolute curvature plays an important role. We enable the modeling of structures of minimal weight which in addition have properties relevant for building construction and design, such as planar panels, dominance of axial forces over bending, and geometric alignment constraints.
Reconfigurable assemblies consist of a common set of parts that can be assembled into different forms for use in different situations. Designing these assemblies is a complex problem, since it requires a compatible decomposition of shapes with correspondence across forms, and planning of well-matched joints to connect parts in each form. This paper presents computational methods as tools to assist the design and construction of reconfigurable assemblies, typically for furniture. There are three key contributions in this work. First, we formulate the compatible decomposition as a weakly-constrained dissection problem, and derive its solution based on a dynamic bipartite graph to construct parts across multiple forms; in particular, we optimize part reuse and preserve the geometric semantics. Second, we develop a joint connection graph to model the solution space of reconfigurable assemblies with part and joint compatibility across different forms. Third, we formulate the backward interlocking and multi-key interlocking models, with which we iteratively plan the joints consistently over multiple forms. We show the applicability of our approach by constructing reconfigurable furniture of various complexities, extend it with recursive connections to generate extensible and hierarchical structures, and fabricate a number of results using 3D printing, 2D laser cutting, and woodworking.
Recent advances in 3D printing have made it easier for ordinary users to manufacture customized objects affordably, and have therefore spurred high demand for more accessible methods for designing and fabricating 3D objects of various shapes and functionalities. In this paper we present a novel approach to model and fabricate surface-like objects composed of connected tiles, which can be used as objects in daily life, such as ornaments, covers, shades or handbags.
We propose an automatic method to infer high dynamic range illumination from a single, limited field-of-view, low dynamic range photograph of an indoor scene. In contrast to previous work that relies on specialized image capture, user input, and/or simple scene models, we train an end-to-end deep neural network that directly regresses a limited field-of-view photo to HDR illumination, without strong assumptions on scene geometry, material properties, or lighting. We show that this can be accomplished in a three step process: 1) we train a robust lighting classifier to automatically annotate the location of light sources in a large dataset of LDR environment maps, 2) we use these annotations to train a deep neural network that predicts the location of lights in a scene from a single limited field-of-view photo, and 3) we fine-tune this network using a small dataset of HDR environment maps to predict light intensities. This allows us to automatically recover high-quality HDR illumination estimates that significantly outperform previous state-of-the-art methods. Consequently, using our illumination estimates for applications like 3D object insertion produces photo-realistic results that we validate via a perceptual user study.
Inferring a high dynamic range (HDR) image from a single low dynamic range (LDR) input is an ill-posed problem in which we must compensate for data lost to under-/over-exposure and color quantization. To tackle this, we propose the first deep-learning-based approach for fully automatic inference using convolutional neural networks. Because a naive way of directly inferring a 32-bit HDR image from an 8-bit LDR image is intractable due to the difficulty of training, we take an indirect approach; the key idea of our method is to synthesize LDR images taken with different exposures (i.e., bracketed images) based on supervised learning, and then reconstruct an HDR image by merging them. By learning the relative changes of pixel values due to increased/decreased exposures using 3D deconvolutional networks, our method can reproduce not only natural tones without introducing visible noise but also the colors of saturated pixels. We demonstrate the effectiveness of our method by comparing our results not only with those of conventional methods but also with ground-truth HDR images.
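The final step of this pipeline, merging synthesized bracketed exposures into one HDR image, can be illustrated with a standard weighted average in linear radiance space. This is a simplified Debevec-style merge under the assumption of linearized pixel values, not the paper's implementation; function names and the hat-shaped weighting are illustrative:

```python
import numpy as np

def merge_exposures(ldr_images, exposure_times):
    """Merge bracketed exposures into an HDR radiance map.

    ldr_images: list of float arrays in [0, 1], assumed linear
    (i.e., after inverting the camera response curve).
    """
    eps = 1e-8
    numerator = np.zeros_like(ldr_images[0], dtype=np.float64)
    denominator = np.zeros_like(ldr_images[0], dtype=np.float64)
    for img, t in zip(ldr_images, exposure_times):
        # Hat weight: trust mid-tones, distrust under-/over-exposed pixels.
        w = 1.0 - np.abs(2.0 * img - 1.0)
        numerator += w * img / t
        denominator += w
    return numerator / (denominator + eps)
```

Pixels saturated in one exposure get low weight there and are recovered from the other brackets, which is precisely why synthesizing the missing brackets makes single-image HDR reconstruction possible.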
Camera sensors can only capture a limited range of luminance simultaneously, and in order to create high dynamic range (HDR) images a set of different exposures are typically combined. In this paper we address the problem of predicting information that has been lost in saturated image areas, in order to enable HDR reconstruction from a single exposure. We show that this problem is well-suited for deep learning algorithms, and propose a deep convolutional neural network (CNN) specifically designed to account for the challenges in predicting HDR values. To train the CNN we gather a large dataset of HDR images, which we augment by simulating sensor saturation for a range of cameras. To further boost robustness, we pre-train the CNN on a simulated HDR dataset created from a subset of the MIT Places database. We demonstrate that our approach can reconstruct high-resolution visually convincing HDR results in a wide range of situations, and that it generalizes well to reconstruction of images captured with arbitrary and low-end cameras that use unknown camera response functions and post-processing. Furthermore, we compare to existing methods for HDR expansion, and show high-quality results also for image-based lighting. Finally, we evaluate the results in a subjective experiment performed on an HDR display. This shows that the reconstructed HDR images are visually convincing, with large improvements as compared to existing methods.
A common way to generate high-quality product images is to start with a physically-based render of a 3D scene, apply image-based edits on individual render channels, and then composite the edited channels together (in some cases, on top of a background photograph). This workflow requires users to manually select the right render channels, prescribe channel-specific masks, and set appropriate edit parameters. Unfortunately, such edits cannot be easily reused for global variations of the original scene, such as a rigid-body transformation of the 3D objects or a modified viewpoint, which discourages iterative refinement of both global scene changes and image-based edits. We propose a method to automatically transfer such user edits across variations of object geometry, illumination, and viewpoint. This transfer problem is challenging since many edits may be visually plausible but non-physical, with a successful transfer dependent on an unknown set of scene attributes that may include both photometric and non-photometric features. To address this challenge, we present a transfer algorithm that extends the image analogies formulation to include an augmented set of photometric and non-photometric guidance channels and, more importantly, adaptively estimates weights for the various candidate channels in a way that matches the characteristics of each individual edit. We demonstrate our algorithm on a variety of complex edit-transfer scenarios for creating high-quality product images.
We present a method to create vector cliparts from photographs. Our approach aims at reproducing two key properties of cliparts: they should be easily editable, and they should represent image content in a clean, simplified way. We observe that vector artists satisfy both of these properties by modeling cliparts with linear color gradients, which have a small number of parameters and approximate smooth color variations well. In addition, skilled artists produce intricate yet editable artworks by stacking multiple gradients using opaque and semi-transparent layers. Motivated by these observations, our goal is to decompose a bitmap photograph into a stack of layers, each layer containing a vector path filled with a linear color gradient. We cast this problem as an optimization that jointly assigns each pixel to one or more layers and finds the gradient parameters of each layer that best reproduce the input. Since a trivial solution would be to assign each pixel to a different, opaque layer, we complement our objective with a simplicity term that favors decompositions made of few, semi-transparent layers. However, this formulation results in a complex combinatorial problem combining discrete unknowns (the pixel assignments) and continuous unknowns (the layer parameters). We propose a Monte Carlo Tree Search algorithm that efficiently explores this solution space by leveraging layering cues at image junctions. We demonstrate the effectiveness of our method by reverse-engineering existing cliparts and by creating original cliparts from studio photographs.
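The forward model that such a decomposition inverts, a stack of semi-transparent linear-gradient layers composited with the standard "over" operator, can be sketched as follows. This is a minimal illustration of the image model, not the paper's solver; the horizontal-only gradient and the function names are our simplifications:

```python
import numpy as np

def linear_gradient(width, c0, c1):
    """A horizontal linear color gradient interpolating c0 -> c1 across x."""
    t = np.linspace(0.0, 1.0, width)[None, :, None]  # shape (1, W, 1)
    return (1.0 - t) * np.asarray(c0) + t * np.asarray(c1)

def composite_over(layers, background):
    """Standard 'over' compositing of (color, alpha) layers, bottom to top."""
    out = np.asarray(background, dtype=np.float64)
    for color, alpha in layers:
        out = alpha * color + (1.0 - alpha) * out
    return out
```

The optimization described in the abstract searches, per pixel, for the layer assignments and per-layer gradient parameters (`c0`, `c1`, alpha, and the vector path) that make this composite match the photograph.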
Implicit models can be combined by using composition operators: functions that determine the resulting shape. Recently, gradient-based composition operators have been used to express a variety of behaviours including smooth transitions, sharp edges, contact surfaces, bulging, or any combination of these. The problem for designers is that building new operators is a complex task that requires specialized technical knowledge. In this work, we introduce an automatic method for deriving a gradient-based implicit operator from 2D drawings that prototype the intended visual behaviour. To solve this inverse problem, in which a shape defines a function, we introduce a general template for implicit operators. A user's sketch is interpreted as samples in the 3D operator's domain. We fit the template to the samples with a non-rigid registration approach. The process works at interactive rates and can accommodate successive refinements by the user. The final result can be applied to 3D surfaces as well as to 2D shapes. Our method is able to replicate the effect of any blending operator presented in the literature, as well as generating new ones such as non-commutative operators. We demonstrate the usability of our method with examples in font design, collision-response modeling, implicit skinning, and complex shape design.
A geometric dissection is a set of pieces which can be assembled in different ways to form distinct shapes. Dissections are used as recreational puzzles because it is striking when a single set of pieces can construct highly different forms. Existing techniques for creating dissections find pieces that reconstruct two input shapes exactly. Unfortunately, these methods only support simple, abstract shapes because an excessive number of pieces may be needed to reconstruct more complex, naturalistic shapes. We introduce a dissection design technique that supports such shapes by requiring that the pieces reconstruct the shapes only approximately. We find that, in most cases, a small number of pieces suffices to tightly approximate the input shapes. We frame the search for a viable dissection as a combinatorial optimization problem, where the goal is to search for the best approximation to the input shapes using a given number of pieces. We find a lower bound on the tightness of the approximation for a partial dissection solution, which allows us to prune the search space and makes the problem tractable. We demonstrate our approach on several challenging examples, showing that it can create dissections between shapes of significantly greater complexity than those supported by previous techniques.
Computing solutions to linear systems is a fundamental building block of many geometry processing algorithms. In many cases the Cholesky factorization of the system matrix is computed to subsequently solve the system, possibly for many right-hand sides, using forward and back substitution. We demonstrate how to exploit sparsity in both the right-hand side and the set of desired solution values to obtain significant speedups. The method is easy to implement and potentially useful in any scenario where linear problems have to be solved locally. We show that this technique is useful for geometry processing operations; in particular we consider the solution of diffusion problems. All problems benefit significantly from sparse computations in terms of runtime, which we demonstrate by providing timings for a set of numerical experiments.
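The kind of sparsity exploitation described here can be illustrated with column-oriented forward substitution: columns whose solution entry is zero contribute nothing and can be skipped, so a sparse right-hand side touches only the "reach" of its nonzeros. This dense-storage sketch is only illustrative; practical implementations operate on the sparse Cholesky factor and its elimination tree:

```python
import numpy as np

def forward_substitution_sparse_rhs(L, b):
    """Solve L y = b for lower-triangular L, exploiting zeros in b.

    Column-oriented: if y[j] is zero when column j is reached, that
    column cannot influence the solution and is skipped entirely.
    """
    y = np.asarray(b, dtype=np.float64).copy()
    n = len(y)
    for j in range(n):
        if y[j] == 0.0:
            continue  # column j contributes nothing
        y[j] /= L[j, j]
        y[j + 1:] -= L[j + 1:, j] * y[j]
    return y
```

With a right-hand side that is nonzero only near the end of the elimination order, almost the entire loop is skipped, which is the source of the speedups the abstract reports for localized problems.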
Correspondence problems are often modelled as quadratic optimization problems over permutations. Common scalable methods for approximating solutions of these NP-hard problems are the spectral relaxation for non-convex energies and the doubly stochastic (DS) relaxation for convex energies. Lately, it has been demonstrated that semidefinite programming relaxations can have considerably improved accuracy at the price of a much higher computational cost.
We present a convex quadratic programming relaxation which is provably stronger than both DS and spectral relaxations, with the same scalability as the DS relaxation. The derivation of the relaxation also naturally suggests a projection method for achieving meaningful integer solutions which improves upon the standard closest-permutation projection. Our method can be easily extended to optimization over doubly stochastic matrices, injective matching, and problems with additional linear constraints. We employ recent advances in optimization of linear-assignment type problems to achieve an efficient algorithm for solving the convex relaxation.
We present experiments indicating that our method is more accurate than local minimization or competing relaxations for non-convex problems. We successfully apply our algorithm to shape matching and to the problem of ordering images in a grid, obtaining results which compare favorably with state-of-the-art methods.
We believe our results indicate that our method should be considered the method of choice for quadratic optimization over permutations.
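The "closest-permutation projection" baseline that the abstract's projection method improves upon can itself be sketched: projecting a doubly stochastic matrix X onto the permutation matrices in Frobenius norm reduces to a linear assignment problem maximizing the inner product with X, since the other terms of the distance are constant. This is a standard construction, not the paper's improved projection:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def closest_permutation(X):
    """Project a matrix onto the set of permutation matrices in
    Frobenius norm. Because ||X - P||_F^2 = ||X||_F^2 + n - 2<X, P>
    for permutations P, this is the assignment problem maximizing <X, P>.
    """
    rows, cols = linear_sum_assignment(-X)  # negate to maximize
    P = np.zeros_like(X)
    P[rows, cols] = 1.0
    return P
```

The Hungarian-type solver runs in polynomial time, so rounding the relaxed solution to a feasible permutation is cheap compared with solving the relaxation itself.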
We introduce a robust and automatic algorithm to simplify the structure and reduce the singularities of a hexahedral mesh. Our algorithm interleaves simplification operations to collapse sheets and chords of the base complex of the input mesh with a geometric optimization, which improves element quality. All our operations are guaranteed not to introduce elements with negative Jacobians, ensuring that our algorithm always produces valid hex-meshes, and not to increase the Hausdorff distance from the original shape more than a user-defined threshold, ensuring a faithful approximation of the input geometry. Our algorithm can improve meshes produced with any existing hexahedral meshing algorithm --- we demonstrate its effectiveness by processing a dataset of 194 hex-meshes created with octree-based, polycube-based, and field-aligned methods.
Bijective maps are commonly used in many computer graphics and scientific computing applications, including texture, displacement, and bump mapping. However, their computation is numerically challenging due to the global nature of the problem, which makes standard smooth optimization techniques prohibitively expensive. We propose to use a scaffold structure to reduce this challenging and global problem to a local injectivity condition. This construction allows us to benefit from the recent advancements in locally injective maps optimization to efficiently compute large scale bijective maps (both in 2D and 3D), sidestepping the need to explicitly detect and avoid collisions. Our algorithm is guaranteed to robustly compute a globally bijective map. To demonstrate the practical applicability, we use it to compute globally bijective single patch parametrizations, to pack multiple charts into a single UV domain, to remove self-intersections from existing models, and to deform 3D objects while preventing self-intersections. Our approach is simple to implement, efficient (two orders of magnitude faster than competing methods), and robust, as we demonstrate in a stress test on a parametrization dataset with over a hundred meshes.
One of the key properties of many surface reconstruction techniques is that they represent the volume in front of and behind the surface, e.g., using a variant of signed distance functions. This creates significant problems when reconstructing thin areas of an object since the backside interferes with the reconstruction of the front. We present a two-step technique that avoids this interference and thus imposes no constraints on object thickness. Our method first extracts an approximate surface crust and then iteratively refines the crust to yield the final surface mesh. To extract the crust, we use a novel observation-dependent kernel density estimation to robustly estimate the approximate surface location from the samples. Free space is similarly estimated from the samples' visibility information. In the following refinement, we determine the remaining error using a surface-based kernel interpolation that limits the samples' influence to nearby surface regions with similar orientation and iteratively move the surface towards its true location. We demonstrate our results on synthetic as well as real datasets reconstructed using multi-view stereo techniques or consumer depth sensors.
3D tensor field design is important in several graphics applications such as procedural noise, solid texturing, and geometry synthesis. Different fields can lead to different visual effects. Topological features of a tensor field, such as degenerate tensors, can cause artifacts in these applications. Existing 2D tensor field design systems cannot be used to handle the topology of a 3D tensor field. In this paper, we present, to our knowledge, the first 3D tensor field design system. At the core of our system is the ability to edit the topology of tensor fields. We demonstrate the power of our design system with applications in solid texturing and geometry synthesis.
We present a new optical design for see-through near-eye displays that is simple, compact, varifocal, and provides a wide field of view with clear peripheral vision and large eyebox. Key to this effort is a novel see-through rear-projection screen. We project an image to the see-through screen using an off-axis path, which is then relayed to the user's eyes through an on-axis partially-reflective magnifying surface. Converting the off-axis path to a compact on-axis imaging path simplifies the optical design. We establish fundamental trade-offs between the quantitative parameters of resolution, field of view, and the form-factor of our design. We demonstrate a wearable binocular near-eye display using off-the-shelf projection displays, custom-designed see-through spherical concave mirrors, and see-through screen designs using either custom holographic optical elements or polarization-selective diffusers.
We introduce an augmented reality near-eye display dubbed "Retinal 3D." Key features of the proposed display system are as follows: Focus cues are provided by generating the pupil-tracked light field that can be directly projected onto the retina. Generated focus cues are valid over a large depth range since laser beams are shaped for a large depth of field (DOF). Pupil-tracked light field generation significantly reduces the required information and computation load. It also provides a "dynamic eye-box," which can be a breakthrough that overcomes the drawbacks of retinal-projection-type displays. For implementation, we utilized a holographic optical element (HOE) as an image combiner, which allowed high transparency with a thin structure. Compared with current augmented reality displays, the proposed system shows competitive performance: a large field of view (FOV), high transparency, high contrast, high resolution, as well as focus cues in a large depth range. Two prototypes are presented along with experimental results and assessments. Analyses of the DOF of light rays and of the validity of focus cue generation are presented as well. The combination of pupil tracking and advanced near-eye display techniques opens new possibilities for future augmented reality.
Computational caustics and light steering displays offer a wide range of interesting applications, ranging from artworks and architectural installations to energy-efficient HDR projection. In this work we expand on this concept by encoding several target images into pairs of front and rear phase-distorting surfaces. Different target holograms can be decoded by mixing and matching different front and rear surfaces under specific geometric alignments. Our approach, which we call mix-and-match holography, is made possible by moving from a refractive caustic image formation process to a diffractive, holographic one. This provides the extra bandwidth required to multiplex several images into the paired surfaces.
We derive a detailed image formation model for the setting of holographic projection displays, as well as a multiplexing method based on a combination of phase retrieval methods and complex matrix factorization. We demonstrate several application scenarios in both simulation and physical prototypes.
A variety of applications such as virtual reality and immersive cinema require high image quality, low rendering latency, and consistent depth cues. 4D light field displays support focus accommodation, but are more costly to render than 2D images, resulting in higher latency.
The human visual system can resolve higher spatial frequencies in the fovea than in the periphery. This property has been harnessed by recent 2D foveated rendering methods to reduce computation cost while maintaining perceptual quality. Inspired by this, we present foveated 4D light fields by investigating their effects on 3D depth perception. Based on our psychophysical experiments and theoretical analysis on visual and display bandwidths, we formulate a content-adaptive importance model in the 4D ray space. We verify our method by building a prototype light field display that can render only 16%--30% of the rays without compromising perceptual quality.
State-of-the-art real-time face tracking systems still lack the ability to realistically portray subtle details of various aspects of the face, particularly the region surrounding the eyes. To improve this situation, we propose a technique to reconstruct the 3D shape and motion of eyelids in real time. By combining these results with the full facial expression and gaze direction, our system generates complete face tracking sequences with more detailed eye regions than existing solutions, in real time. To achieve this goal, we propose a generative eyelid model which decomposes eyelid variation into two low-dimensional linear spaces that efficiently represent the shape and motion of eyelids. Then, we modify a holistically-nested DNN model to jointly perform semantic eyelid edge detection and identification on images. Next, we match vertices of the eyelid model to 2D image edges, and employ polynomial curve fitting and a search scheme to handle incorrect and partial edge detections. Finally, we use the correspondences in a 3D-to-2D edge fitting scheme to reconstruct eyelid shape and pose. By integrating our fast fitting method into a face tracking system, the estimated eyelid results are seamlessly fused with the face and eyeball results in real time. Experiments show that our technique applies across different ethnicities, eyelid shapes, and eyelid motions, and is robust to changes in head pose, expression, and gaze direction.
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression blendshapes. The pose and expression dependent articulations are learned from 4D face sequences in the D3DFACS dataset along with additional 4D sequences. We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33,000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).
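Schematically, the articulated blendshape structure described above takes the standard form of a mean shape plus linear correctives, followed by linear blend skinning. The notation below is our paraphrase of this family of models, with shape parameters $\beta$, pose $\theta$, and expression $\psi$; symbols are illustrative rather than the paper's exact definitions:

```latex
% Template in the zero pose: mean shape plus shape, pose-corrective,
% and expression blendshape offsets
T_P(\beta, \theta, \psi) = \bar{T} + B_S(\beta) + B_P(\theta) + B_E(\psi)

% Final mesh: linear blend skinning W with shape-dependent joints J(\beta)
M(\beta, \theta, \psi) = W\big(T_P(\beta, \theta, \psi),\; J(\beta),\; \theta\big)
```

Keeping every term linear in its parameters is what makes the model low-dimensional, fast to fit, and compatible with existing graphics software.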
We present a fully automatic framework that digitizes a complete 3D head with hair from a single unconstrained image. Our system offers a practical and consumer-friendly end-to-end solution for avatar personalization in gaming and social VR applications. The reconstructed models include secondary components (eyes, teeth, tongue, and gums) and provide animation-friendly blendshapes and joint-based rigs. While the generated face is a high-quality textured mesh, we propose a versatile and efficient polygonal strips (polystrips) representation for the hair. Polystrips are suitable for an extremely wide range of hairstyles and textures and are compatible with existing game engines for real-time rendering. In addition to integrating state-of-the-art advances in facial shape modeling and appearance inference, we propose a novel single-view hair generation pipeline, based on 3D-model and texture retrieval, shape refinement, and polystrip patching optimization. The performance of our hairstyle retrieval is enhanced using a deep convolutional neural network for semantic hair attribute classification. Our generated models are visually comparable to state-of-the-art game characters designed by professional artists. For real-time settings, we demonstrate the flexibility of polystrips in handling hairstyle variations, as opposed to conventional strand-based representations. We further show the effectiveness of our approach on a large number of images taken in the wild, and how compelling avatars can be easily created by anyone.
We present a technique to automatically animate a still portrait, making it possible for the subject in the photo to come to life and express various emotions. We use a driving video (of a different subject) and develop means to transfer the expressiveness of the subject in the driving video to the target portrait. In contrast to previous work that requires an input video of the target face to reenact a facial performance, our technique uses only a single target image. We animate the target image through 2D warps that imitate the facial transformations in the driving video. As warps alone do not carry the full expressiveness of the face, we add fine-scale dynamic details which are commonly associated with facial expressions such as creases and wrinkles. Furthermore, we hallucinate regions that are hidden in the input target face, most notably in the inner mouth. Our technique gives rise to reactive profiles, where people in still images can automatically interact with their viewers. We demonstrate our technique operating on numerous still portraits from the internet.
We present a novel method for the combustion of botanical tree models. Tree models are represented as connected particles for the branching structure and a polygonal surface mesh for the combustion. Each particle stores biological and physical attributes that drive the kinetic behavior of a plant and the exothermic reaction of the combustion. Coupled with realistic physics for rods, the particles enable dynamic branch motions. We model material properties, such as moisture and charring behavior, and associate them with individual particles. The combustion is efficiently processed in the surface domain of the tree model on a polygonal mesh. A user can dynamically interact with the model by initiating fires and by inducing stress on branches. The flames realistically propagate through the tree model by consuming the available resources. Our method runs at interactive rates and supports multiple tree instances in parallel. We demonstrate the effectiveness of our approach through numerous examples and evaluate its plausibility against the combustion of real wood samples.
Imaginary winged creatures in computer animation applications are expected to perform a variety of motor skills in a physically realistic and controllable manner. Designing physics-based controllers for a flying creature is still very challenging, particularly when the dynamic model of the creature is high-dimensional, with many degrees of freedom. In this paper, we present a control method for flying creatures, which are aerodynamically simulated, interactively controllable, and equipped with a variety of motor skills such as soaring, gliding, hovering, and diving. Each motor skill is represented as a Deep Neural Network (DNN) and learned using Deep Q-Learning (DQL). Our control method is example-guided in the sense that it provides the user with direct control over the learning process by allowing the user to specify keyframes of motor skills. Our novel learning algorithm was inspired by the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to improve the convergence rate and the final quality of the control policy. The effectiveness of our Evolutionary DQL method is demonstrated with imaginary winged creatures flying in a physically simulated environment and their motor skills learned automatically from user-provided keyframes.
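The learning rule underlying Deep Q-Learning is the temporal-difference update on action values; in DQL the table below is replaced by a DNN trained toward the same target. This tabular sketch shows only that underlying update, not the paper's evolutionary DQL; parameter names are illustrative:

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update.

    Moves Q[s, a] toward the bootstrapped target
    r + gamma * max_a' Q[s_next, a'].
    """
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

A CMA-ES-inspired outer loop, as the abstract describes, would additionally perturb and select among candidate policies to speed convergence; that evolutionary layer is omitted here.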
Simulating the behavior of soap films and foams is a challenging task. A direct numerical simulation of films and foams via the Navier-Stokes equations is still computationally too expensive. We propose an alternative formulation inspired by geometric flow. Our model exploits the fact that, according to Plateau's laws, the steady state of a film is a union of constant mean curvature surfaces and minimal surfaces. Such surfaces are also well known as the steady state solutions of certain curvature flows. We show a link between the Navier-Stokes equations and a recent variant of mean curvature flow, called hyperbolic mean curvature flow, under the assumption of constant air pressure per enclosed region. Instead of using hyperbolic mean curvature flow as is, we propose to replace curvature by the gradient of the surface area functional. This formulation enables us to robustly handle non-manifold configurations; such junctions connecting multiple films are intractable with the traditional formulation using curvature. We also add explicit volume preservation to hyperbolic mean curvature flow, which in fact corresponds to the pressure term of the Navier-Stokes equations. Our method is simple, fast, robust, and consistent with Plateau's laws, which are all due to our reformulation of film dynamics as a geometric flow.
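The reformulation can be written schematically (our notation, not the paper's). Classical mean curvature flow moves a surface point $x$ with velocity $\dot{x} = -H\,n$, where $H$ is mean curvature and $n$ the normal; the hyperbolic variant is second order in time. Replacing the curvature term by the gradient of the surface area functional $A$, and enforcing each enclosed volume $V_k$ with a pressure-like multiplier $p_k$, gives:

```latex
% Hyperbolic mean curvature flow (second order in time):
\ddot{x} = -H\,n

% Variant described in the abstract: area-gradient descent with
% volume constraints per enclosed region k (schematic notation):
\ddot{x} = -\nabla_{x} A(x) + \sum_{k} p_k\, \nabla_{x} V_k(x)
```

On smooth surfaces the area gradient reduces to $H\,n$, so the two agree there; the area-gradient form remains well defined at non-manifold junctions where curvature is not, which is the robustness the abstract claims.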
Rapid urbanization and increasing traffic have caused severe social, economic, and environmental problems in metropolitan areas worldwide. Traffic reconstruction and visualization using existing traffic data can provide novel tools for vehicle navigation and routing, congestion analysis, and traffic management. While traditional data collection methods, such as in-road sensors, are becoming increasingly common, GPS devices are also becoming ubiquitous. In this paper, we address the problem of traffic reconstruction, visualization, and animation using mobile vehicle data (i.e. GPS traces). We first conduct city-scale traffic reconstruction using statistical learning on mobile vehicle data for traffic animation and visualization, and then dynamically complete missing data using metamodel-based simulation optimization in areas of insufficient data coverage. We evaluate our approach quantitatively and qualitatively, and demonstrate our results with 2D visualization of citywide traffic, as well as 2D and 3D animation of reconstructed traffic in virtual environments.
We present a system for adaptive synthesis of indoor scenes given an empty room and only a few object categories. Automatically suggesting indoor objects and proper layouts to convert an empty room into a 3D scene is challenging, since it requires interior design knowledge to balance factors such as space, path distance, illumination, and object relations, in order to ensure the functional plausibility of the synthesized scenes. We exploit a database of 2D floor plans to extract object relations and provide layout examples for scene synthesis. With the labeled human positions and directions in each plan, we detect the activity relations and compute the coexistence frequency of object pairs to construct activity-associated object relation graphs. Given the input room and user-specified object categories, our system first leverages the object relation graphs and the database floor plans to suggest additional object categories beyond the specified ones to make the resulting scenes functionally complete, and then uses similar plans as references to create the layout of the synthesized scenes. We show various synthesis results to demonstrate the practicality of our system, and validate its usability via a user study. We also compare our system with state-of-the-art furniture layout and activity-centric scene representation methods, in terms of functional plausibility and user friendliness.
Autonomous reconstruction of unknown scenes by a mobile robot inherently poses the question of how to balance exploration efficacy against reconstruction quality. We present a navigation-by-reconstruction approach to address this question, where the moving paths of the robot are planned to account for both global efficiency for fast exploration and local smoothness to obtain high-quality scans. The movement of an RGB-D camera, attached to the robot arm, is dictated by the desired reconstruction quality as well as by the movement of the robot itself. Our key idea is to harness a time-varying tensor field to guide robot movement, and then solve for 3D camera control under the constraint of the 2D robot moving path. The tensor field is updated in real time, conforming to the progressively reconstructed scene. We show that tensor fields are well suited for guiding autonomous scanning for two reasons: first, they contain sparse and controllable singularities that allow generating a locally smooth robot path, and second, their topological structure can be used for globally efficient path routing within a partially reconstructed scene. We have conducted numerous tests with a mobile robot, and demonstrate that our method leads to smooth exploration and high-quality reconstruction of unknown indoor scenes.
We present 3DLite, a novel approach to reconstruct 3D environments using consumer RGB-D sensors, taking a step towards directly utilizing captured 3D content in graphics applications, such as video games, VR, or AR. Rather than reconstructing an accurate one-to-one representation of the real world, our method computes a lightweight, low-polygonal geometric abstraction of the scanned geometry. We argue that for many graphics applications it is much more important to obtain high-quality surface textures rather than highly-detailed geometry. To this end, we compensate for motion blur, auto-exposure artifacts, and micro-misalignments in camera poses by warping and stitching image fragments from low-quality RGB input data to achieve high-resolution, sharp surface textures. In addition to the observed regions of a scene, we extrapolate the scene geometry, as well as the mapped surface textures, to obtain a complete 3D model of the environment. We show that a simple planar abstraction of the scene geometry is ideally suited for this completion task, enabling 3DLite to produce complete, lightweight, and visually compelling 3D scene models. We believe that these CAD-like reconstructions are an important step towards leveraging RGB-D scanning in actual content creation pipelines.
The creation of high-quality semantically parsed 3D models for dense metropolitan areas is a fundamental urban modeling problem. Although recent advances in acquisition techniques and processing algorithms have resulted in large-scale imagery or 3D polygonal reconstructions, such data-sources are typically noisy and incomplete, with no semantic structure. In this paper, we present an automatic data fusion technique that produces high-quality structured models of city blocks. From coarse polygonal meshes, street-level imagery, and GIS footprints, we formulate a binary integer program that globally balances sources of error to produce semantically parsed mass models with associated facade elements. We demonstrate our system on four city regions of varying complexity; our examples typically contain densely built urban blocks spanning hundreds of buildings. In our largest example, we produce a structured model of 37 city blocks spanning a total of 1,011 buildings at a scale and quality previously impossible to achieve automatically.
Normal mapping enhances the amount of visual detail of surfaces by using shading normals that deviate from the geometric normal. However, the resulting surface model is geometrically impossible and normal mapping is thus often considered a fundamentally flawed approach with unavoidable problems for Monte Carlo path tracing, such as asymmetry, back-facing normals, and energy loss arising from this incoherence. These problems are usually sidestepped in real-time renderers, but they cannot be fixed robustly in a path tracer: normal mapping breaks either the appearance (black fringes, energy loss) or the integrator (different forward and backward light transport); in practice, workarounds and tweaked normal maps are often required to hide artifacts.
We present microfacet-based normal mapping, an alternative way of faking geometric details without corrupting the robustness of Monte Carlo path tracing. It takes the same input data as classic normal mapping and works with any input BRDF. Our idea is to construct a geometrically valid microfacet surface made of two facets per shading point: the one given by the normal map at the shading point and an additional facet that compensates for it such that the average normal of the microsurface equals the geometric normal. We derive the resulting microfacet BRDF and show that it mimics geometric detail in a plausible way, although it does not replicate the appearance of classic normal mapping. However, our microfacet-based normal mapping model is well-defined, symmetric, and energy conserving, and thus yields identical results with any path tracing algorithm (forward, backward, or bidirectional).
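The two-facet construction can be sketched in a few lines (vector names and the specific tangent-facet choice are illustrative assumptions; the paper additionally derives the resulting BRDF and its sampling): given the shading normal wp from the normal map and the geometric normal wg, pick a second facet normal wt orthogonal to wp in the plane spanned by both, and solve for facet weights so the weighted average normal reproduces wg.

```python
import numpy as np

def two_facet_microsurface(wp, wg):
    """Return a tangent-facet normal wt and weights (ap, at) such that
    ap*wp + at*wt equals the geometric normal wg.
    Assumes wp is not parallel to wg (otherwise no second facet is needed)."""
    wp = wp / np.linalg.norm(wp)
    wg = wg / np.linalg.norm(wg)
    # Tangent facet: component of wg orthogonal to wp, normalized.
    wt = wg - np.dot(wg, wp) * wp
    wt = wt / np.linalg.norm(wt)
    # (wp, wt) is an orthonormal basis of the plane containing wg,
    # so wg decomposes exactly as (wg.wp)*wp + (wg.wt)*wt.
    ap, at = np.dot(wg, wp), np.dot(wg, wt)
    return wt, ap, at

wp = np.array([0.3, 0.0, 0.95])          # perturbed shading normal
wp = wp / np.linalg.norm(wp)
wg = np.array([0.0, 0.0, 1.0])           # geometric normal
wt, ap, at = two_facet_microsurface(wp, wg)
```

Both weights are positive whenever the shading normal lies in the upper hemisphere of the geometric normal, which is what makes the microsurface geometrically valid.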
We present a novel approach for on-site acquisition of surface reflectance for planar, spatially varying, isotropic samples in uncontrolled outdoor environments. Our method exploits the naturally occurring linear polarization of incident and reflected illumination for this purpose. By rotating a linear polarizing filter in front of a camera at three different orientations, we measure the polarization reflected off the sample and combine this information with multi-view analysis and inverse rendering in order to recover per-pixel, high resolution reflectance and surface normal maps. Specifically, we employ polarization imaging from two near orthogonal views close to the Brewster angle of incidence in order to maximize polarization cues for surface reflectance estimation. To the best of our knowledge, our method is the first to successfully extract a complete set of reflectance parameters with passive capture in completely uncontrolled outdoor settings. To this end, we analyze our approach under the general, but previously unstudied, case of incident partial linear polarization (due to the sky) in order to identify the strengths and weaknesses of the method under various outdoor conditions. We provide practical guidelines for on-site acquisition based on our analysis, and demonstrate high quality results with an entry-level DSLR as well as a mobile phone.
The surface of metal, glass and plastic objects is often characterized by microscopic scratches caused by manufacturing and/or wear. A closer look at such scratches reveals iridescent colors with a complex dependency on viewing and lighting conditions. The physics behind this phenomenon is well understood; it is caused by diffraction of the incident light by surface features on the order of the optical wavelength. Existing analytic models are able to reproduce spatially unresolved microstructure such as the iridescent appearance of compact disks and similar materials. Spatially resolved scratches, on the other hand, have proven elusive due to the highly complex wave-optical light transport simulations needed to account for their appearance. In this paper, we propose a wave-optical shading model based on non-paraxial scalar diffraction theory to render this class of effects. Our model expresses surface roughness as a collection of line segments. To shade a point on the surface, the individual diffraction patterns for contributing scratch segments are computed analytically and superimposed coherently. This provides natural transitions from localized glint-like iridescence to smooth BRDFs representing the superposition of many reflections at large viewing distances. We demonstrate that our model is capable of recreating the overall appearance as well as characteristic detail effects observed on real-world examples.
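The coherent superposition step can be illustrated with complex phasors (the values and function names are illustrative; in the actual model the per-scratch amplitudes and phases come from analytic scalar-diffraction integrals): intensity is the squared magnitude of the summed complex field, which generally differs from the incoherent sum of per-scratch intensities.

```python
import numpy as np

def coherent_intensity(amplitudes, phases):
    """Intensity of a coherent sum of per-scratch complex amplitudes."""
    field = np.sum(np.asarray(amplitudes) * np.exp(1j * np.asarray(phases)))
    return np.abs(field) ** 2

def incoherent_intensity(amplitudes):
    """Incoherent limit: phases average out and intensities simply add."""
    return np.sum(np.asarray(amplitudes) ** 2)

a = np.array([0.2, 0.5, 0.3])   # illustrative per-scratch amplitudes
# All scratches in phase: constructive interference (a glint).
I_coh = coherent_intensity(a, np.zeros(3))
I_inc = incoherent_intensity(a)
```

With random phases the coherent result fluctuates around the incoherent one, which is the mechanism behind the transition from glint-like iridescence to smooth BRDFs at large viewing distances.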
Physically-based hair and fur rendering is crucial for visual realism. One of the key effects is global illumination, involving light bouncing between different fibers. This is very time-consuming to simulate with methods like path tracing. Efficient approximate global illumination techniques such as dual scattering are in widespread use, but they are limited to human hair, and cannot handle color bleeding, transparency, or hair-object inter-reflection.
We present the first global illumination model, based on dipole diffusion for subsurface scattering, to approximate light bouncing between individual fur fibers. We model complex light and fur interactions as subsurface scattering, and use a simple neural network to convert from fur fibers' properties to scattering parameters. Our network is trained on only a single scene with different parameters, but applies to general scenes and produces visually accurate appearance, supporting color bleeding and further inter-reflections.
Streaming of 360° content is gaining attention as an immersive way to remotely experience live events. However, live capture is presently limited to 2D content due to the prohibitive computational cost associated with multi-camera rigs. In this work we present Vortex, a system that directly captures streaming 3D virtual reality content. Our approach does not suffer from spatial or temporal seams and natively handles phenomena that are challenging for existing systems, including refraction, reflection, transparency, and specular highlights. Vortex natively captures in the omni-directional stereo (ODS) format, which is widely supported by VR displays and streaming pipelines. We identify an important source of distortion inherent to the ODS format, and demonstrate a simple means of correcting it. We include a detailed analysis of the design space, including tradeoffs between noise, frame rate, resolution, and hardware complexity. Processing is minimal, enabling live transmission of immersive, 3D, 360° content. We construct a prototype and demonstrate capture of 360° scenes at up to 8192×4096 pixels at 5 fps, and establish the viability of operation up to 32 fps.
Computer-graphics engineers and vision scientists want to generate images that reproduce realistic depth-dependent blur. Current rendering algorithms take into account scene geometry, aperture size, and focal distance, and they produce photorealistic imagery, as with a high-quality camera. But to create immersive experiences, rendering algorithms should aim instead for perceptual realism. In so doing, they should take into account the significant optical aberrations of the human eye. We developed a method that, by incorporating some of those aberrations, yields displayed images that produce retinal images much closer to the ones that occur in natural viewing. In particular, we create displayed images taking the eye's chromatic aberration into account. This produces different chromatic effects in the retinal image for objects farther or nearer than the current focus. We call the method ChromaBlur. We conducted two experiments that illustrate the benefits of ChromaBlur. One showed that accommodation (eye focusing) is driven quite effectively when ChromaBlur is used and that accommodation is not driven at all when conventional methods are used. The second showed that perceived depth and realism are greater with imagery created by ChromaBlur than with imagery created conventionally. ChromaBlur can be coupled with focus-adjustable lenses and gaze tracking to reproduce the natural relationship between accommodation and blur in HMDs and other immersive devices. It may thereby minimize the adverse effects of vergence-accommodation conflicts.
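The wavelength-dependent defocus underlying this effect is commonly approximated with the chromatic-eye model of Thibos et al. (the formula and its constants below are that standard approximation, an assumption not taken from this paper); combined with the small-angle blur-circle relation it yields per-channel blur sizes.

```python
def chromatic_defocus(wavelength_nm):
    """Thibos et al. chromatic-eye approximation of ocular refraction
    (diopters, relative) as a function of wavelength in nanometers."""
    p, q, c = 1.7312, 633.46, 214.102
    return p - q / (wavelength_nm - c)

def blur_angle(pupil_m, wavelength_nm, focus_nm=589.0):
    """Small-angle blur-circle diameter (radians): pupil diameter times
    the chromatic defocus relative to the in-focus wavelength."""
    dD = chromatic_defocus(wavelength_nm) - chromatic_defocus(focus_nm)
    return pupil_m * abs(dD)

# Short wavelengths are refracted more strongly (more myopic), so with a
# mid-spectrum focus the blue channel is blurred more than the red one.
D_blue, D_green, D_red = (chromatic_defocus(w) for w in (450.0, 550.0, 650.0))
```

Rendering each color channel with its own blur size, rather than a single achromatic blur, is the kind of per-channel treatment the method exploits.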
Virtual reality applications prefer real walking over other locomotion methods, as it provides a highly immersive sense of presence. Mapping-based techniques are very effective for supporting real walking in small physical workspaces while exploring large virtual scenes. However, the existing methods for computing real walking maps suffer from poor quality due to distortion. In this paper, we present a novel divide-and-conquer method, called Smooth Assembly Mapping (SAM), to compute real walking maps with low isometric distortion for large-scale virtual scenes. First, the input virtual scene is decomposed into a set of smaller local patches. Then, a group of local patches is mapped together into a real workspace by minimizing a low isometric distortion energy with smoothness constraints between adjacent patches. All local patches are mapped and assembled one by one to obtain a complete map. Finally, a global optimization is adopted to further reduce the distortion throughout the entire map. Our method easily accommodates the teleportation technique by computing maps of individual regions and assembling them with teleporter conformity constraints. A large number of experiments, including formative user studies and comparisons, have shown that our method succeeds in generating high-quality real walking maps from large-scale virtual scenes to small real workspaces and is demonstrably superior to state-of-the-art methods.
To realize 3D spatial sound rendering with a two-channel headphone, one needs head-related transfer functions (HRTFs) tailored to a specific user. However, measuring HRTFs requires a tedious and expensive procedure. To address this, we propose a fully perceptual-based HRTF fitting method for individual users using machine learning techniques. The user only needs to answer pairwise comparisons of test signals presented by the system during calibration. This reduces the effort necessary for the user to obtain individualized HRTFs. Technically, we present a novel adaptive variational AutoEncoder with a convolutional neural network. In training, this AutoEncoder analyzes a publicly available HRTF dataset and identifies factors that depend on the individuality of users in a nonlinear space. In calibration, the AutoEncoder generates high-quality HRTFs fitted to a specific user by blending the factors. We validate the feasibility of our method through several quantitative experiments and a user study.
We present a mesh-based, interpolatory method for interactively creating artist-directed inbetweens from arbitrary sets of 2D drawing shapes without rigging. To enable artistic freedom of expression we remove prior restrictions on the range of possible changes between shapes; we support interpolation with extreme deformation and unrestricted topology change. To do this, we extend discrete variational interpolation by introducing a consistent multimesh structure over drawings, a Comesh Optimization algorithm that optimizes our multimesh for both intra- and inter-mesh quality, and a new shape-space energy that efficiently supports arbitrary changes and can prevent artwork overlap when desired. Our multimesh encodes specified correspondences that guide interpolation paths between shapes. With these correspondences, an efficient local-global minimization of our energy interpolates n-way between drawing shapes to create inbetweens. Our Comesh Optimization enables artifact-free minimization by building consistent meshes across drawings that improve both the quality of per-mesh energy discretization and inter-mesh mapping distortions, while guaranteeing a single, compatible triangulation. We implement our method in a test-bed interpolation system that allows interactive creation and editing of animations from sparse key drawings with arbitrary topology and shape change.
We present a highly efficient planar meshless shape deformation algorithm. Our method is based on an unconstrained minimization of isometric energies, and is guaranteed to produce C∞ locally injective maps by operating within a reduced dimensional subspace of harmonic maps. We extend the harmonic subspace of [Chen and Weber 2015] to support multiply-connected domains, and further provide a generalization of the bounded distortion theorem that appeared in that paper. Our harmonic map, as well as the gradient and the Hessian of our isometric energies, possess closed-form expressions. A key result is a simple and fast analytic modification of the Hessian of the energy such that it is positive definite, which is crucial for the successful operation of a Newton solver. The method is straightforward to implement and is specifically designed to harness the processing power of modern graphics hardware. Our modified Newton iterations are shown to be extremely effective, leading to fast convergence after a handful of iterations, while each iteration is fast due to a combination of factors, such as the smoothness and the low dimensionality of the subspace, the closed-form expressions for the differentials, and the avoidance of expensive strategies to ensure positive definiteness. The entire pipeline is carried out on the GPU, leading to deformations that are significantly faster to compute than the state-of-the-art.
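The paper's modification is analytic and specific to its isometric energies; for contrast, a generic (and slower) way to obtain the same positive-definiteness guarantee for an arbitrary symmetric Hessian is eigenvalue clamping, sketched below (illustrative, not the paper's method).

```python
import numpy as np

def project_spd(H, eps=1e-8):
    """Clamp the eigenvalues of a symmetric matrix to make it positive
    definite, so the Newton step -H^{-1} g is a descent direction."""
    w, V = np.linalg.eigh(H)
    return (V * np.maximum(w, eps)) @ V.T

H = np.array([[2.0, 0.0],
              [0.0, -1.0]])        # indefinite Hessian at a saddle
Hp = project_spd(H)

g = np.array([1.0, 1.0])          # gradient at the current iterate
step = -np.linalg.solve(Hp, g)    # guaranteed descent direction
```

Eigen-decomposing the Hessian at every iteration is exactly the kind of expensive strategy the paper's closed-form modification avoids.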
We propose a UV mapping algorithm that jointly optimizes for cuts and distortion, sidestepping heuristics for placing the cuts. The energy we minimize is a state-of-the-art geometric distortion measure, generalized to take seams into account. Our algorithm is designed to support an interactive workflow: it optimizes UV maps on the fly, while the user can interactively move vertices, cut mesh parts, join seams, separate overlapping regions, and control the placement of the parameterization patches in the UV space. Our UV maps are of high quality in terms of both geometric distortion and cut placement, and compare favorably to those designed with traditional modeling tools. The UV maps can be created in a fraction of the time required by existing methods, since our algorithm drastically alleviates the trial-and-error, iterative procedures that plague traditional UV mapping approaches.
A parameterization decouples the resolution of a signal on a surface from the resolution of the surface geometry. In practice, parameterized signals are conveniently and efficiently stored as texture images. Unfortunately, seams are inevitable when parameterizing most surfaces. Their visual artifacts are well known for color signals, but become even more egregious when geometry or displacement signals are used: cracks or gaps may appear in the surface. To make matters worse, parameterizations and their seams are frequently ignored during mesh processing. Carefully accounting for seams in one phase may be nullified by the next. The existing literature on seam elimination requires non-standard rendering algorithms or else overly restricts the parameterization and signal.
We present seam-aware mesh processing techniques. For a given fixed mesh, we analytically characterize the space of seam-free textures as the null space of a linear operator. Assuming seam-free textures, we describe topological and geometric conditions for seam-free edge-collapse operations. Our algorithms eliminate seam artifacts in parameterized signals and decimate a mesh---including its seams---while preserving its parameterization and seam-free appearance. This allows the artifact-free display of surface signals---color, normals, positions, displacements, linear blend skinning weights---with the standard GPU rendering pipeline. In particular, our techniques enable crack-free use of the tessellation stage of modern GPUs for dynamic level-of-detail. This decouples the shape signal from mesh resolution in a manner compatible with existing workflows.
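The null-space characterization can be sketched on a toy example (the names and the tiny constraint set are illustrative): each seam constraint equates texel values on the two sides of a seam, the constraints stack into a linear operator, and seam-free textures are exactly its null space.

```python
import numpy as np

def seam_constraint_matrix(n_texels, seam_pairs):
    """One row per seam constraint: texel i must equal texel j."""
    A = np.zeros((len(seam_pairs), n_texels))
    for r, (i, j) in enumerate(seam_pairs):
        A[r, i], A[r, j] = 1.0, -1.0
    return A

def null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) of {x : A x = 0}, via SVD."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

# Four texels; texels 1 and 2 lie on either side of a seam and must agree.
A = seam_constraint_matrix(4, [(1, 2)])
N = null_space(A)   # every column is a seam-free texture
```

Any texture expressed as a combination of the null-space basis columns satisfies all seam constraints by construction, which is what makes filtering and decimation in this subspace artifact-free.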
We present a novel, compact single-shot hyperspectral imaging method. It enables capturing hyperspectral images using a conventional DSLR camera equipped with just an ordinary refractive prism in front of the camera lens. Our computational imaging method reconstructs the full spectral information of a scene from dispersion over edges. Our setup requires no coded aperture mask, no slit, and no collimating optics, which are necessary for traditional hyperspectral imaging systems. It is thus very cost-effective, while still highly accurate. We tackle two main problems: First, since we do not rely on collimation, the sensor records a projection of the dispersion information, distorted by perspective. Second, available spectral cues are sparse, present only around object edges. We formulate an image formation model that can predict the perspective projection of dispersion, and a reconstruction method that can estimate the full spectral information of a scene from sparse dispersion information. Our results show that our method compares well with other state-of-the-art hyperspectral imaging systems, both in terms of spectral accuracy and spatial resolution, while being orders of magnitude cheaper than commercial imaging systems.
We present a novel hyperspectral image reconstruction algorithm, which overcomes the long-standing tradeoff between spectral accuracy and spatial resolution in existing compressive imaging approaches. Our method consists of two steps: First, we learn nonlinear spectral representations from real-world hyperspectral datasets; for this, we build a convolutional autoencoder which allows reconstructing its own input through its encoder and decoder networks. Second, we introduce a novel optimization method, which jointly regularizes the fidelity of the learned nonlinear spectral representations and the sparsity of gradients in the spatial domain, by means of our new fidelity prior. Our technique can be applied to any existing compressive imaging architecture, and has been thoroughly tested both in simulation, and by building a prototype hyperspectral imaging system. It outperforms the state-of-the-art methods from each architecture, both in terms of spectral accuracy and spatial resolution, while its computational complexity is reduced by two orders of magnitude with respect to sparse coding techniques. Moreover, we present two additional applications of our method: hyperspectral interpolation and demosaicing. Last, we have created a new high-resolution hyperspectral dataset containing sharper images of more spectral variety than existing ones, available through our project website.
Time-of-flight (ToF) imaging has become a widespread technique for depth estimation, allowing affordable off-the-shelf cameras to provide depth maps in real time. However, multipath interference (MPI) resulting from indirect illumination significantly degrades the captured depth. Most previous works have tried to solve this problem by means of complex hardware modifications or costly computations. In this work, we avoid these approaches and propose a new technique to correct errors in depth caused by MPI, which requires no camera modifications and takes just 10 milliseconds per frame. Our observations about the nature of MPI suggest that most of its information is available in image space; this allows us to formulate the depth imaging process as a spatially-varying convolution and use a convolutional neural network to correct MPI errors. Since the input and output data present similar structure, we base our network on an autoencoder, which we train in two stages. First, we use the encoder (convolution filters) to learn a suitable basis to represent MPI-corrupted depth images; then, we train the decoder (deconvolution filters) to correct depth from synthetic scenes, generated by using a physically-based, time-resolved renderer. This approach allows us to tackle a key problem in ToF, the lack of ground-truth data, by using a large-scale captured training set with MPI-corrupted depth to train the encoder, and a smaller synthetic training set with ground truth depth to train the decoder stage of the network. We demonstrate and validate our method on both synthetic and real complex scenarios, using an off-the-shelf ToF camera, and with only the captured, incorrect depth as input.
Computational photography encompasses a diversity of imaging techniques, but one of the core operations performed by many of them is to compute image differences. An intuitive approach to computing such differences is to capture several images sequentially and then process them jointly. In this paper, we introduce a snapshot difference imaging approach that is directly implemented in the sensor hardware of emerging time-of-flight cameras. With a variety of examples, we demonstrate that the proposed snapshot difference imaging technique is useful for direct-global illumination separation, for direct imaging of spatial and temporal image gradients, for direct depth edge imaging, and more.
The simulation of high viscoelasticity poses important computational challenges. One is the difficulty of robustly measuring strain and its derivatives in a medium without permanent structure. Another is the high stiffness of the governing differential equations. Solutions that tackle these challenges exist, but they are computationally slow. We propose a constraint-based model of viscoelasticity that enables efficient simulation of highly viscous and viscoelastic phenomena. Our model reformulates, in a constraint-based fashion, a constitutive model of viscoelasticity for polymeric fluids, which defines simple governing equations for a conformation tensor. The model can represent a diverse palette of materials, spanning elastoplastic, highly viscous, and inviscid liquid behaviors. In addition, we have designed a constrained dynamics solver that extends the position-based dynamics method to handle efficiently both position-based and velocity-based constraints. We show results that range from interactive simulation of viscoelastic effects to large-scale simulation of high viscosity with competitive performance.
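The conformation-tensor idea can be illustrated with the relaxation part of an Oldroyd-B-style constitutive law, dA/dt = -(A - I)/λ, integrated with implicit Euler (a deliberately simplified sketch; the paper's constraint-based reformulation also handles advection, stretching, and coupling to the dynamics solver).

```python
import numpy as np

def relax_conformation(A, dt, relaxation_time):
    """One implicit-Euler step of dA/dt = -(A - I)/lambda, driving the
    conformation tensor back to the identity (the polymer rest state).
    Implicit integration keeps the step stable even for stiff lambda."""
    I = np.eye(A.shape[0])
    alpha = dt / relaxation_time
    return (A + alpha * I) / (1.0 + alpha)

A = np.array([[2.0, 0.5],
              [0.5, 1.5]])   # stretched polymer configuration
for _ in range(100):
    A = relax_conformation(A, dt=0.1, relaxation_time=0.5)
```

Short relaxation times produce liquid-like behavior (stress forgets deformation quickly), while long ones retain elastic memory, which is the dial that spans the material palette described above.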
Recently, the Affine Particle-In-Cell (APIC) method was proposed by Jiang et al. [2015; 2017b] to improve the accuracy of the transfers in Particle-In-Cell (PIC) [Harlow 1964] techniques by augmenting each particle with a locally affine, rather than locally constant, description of the velocity. This reduced the dissipation of the original PIC without suffering from the noise present in the historic alternative, Fluid-Implicit-Particle (FLIP) [Brackbill and Ruppel 1986]. We present a generalization of APIC by augmenting each particle with a more general local function. By viewing the grid-to-particle transfer as a linear- and angular-momentum-conserving projection of the particle-wise local grid velocities onto a reduced basis, we greatly improve the energy and vorticity conservation over the original APIC. Furthermore, we show that the cost of the generalized projection is negligible over APIC when using a particular class of local polynomial functions. Lastly, we note that our method retains the filtering property of APIC and PIC and thus has similar robustness to noise.
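The projection view can be sketched as a weighted least-squares fit of a local affine velocity to nearby grid-node velocities (a simplification of the momentum-conserving projection described above; the node set, weights, and names are illustrative). A linear grid velocity field, e.g. a rigid rotation, is then reproduced exactly.

```python
import numpy as np

def fit_local_affine(xp, nodes, velocities, weights):
    """Weighted least-squares fit v(x) ~ vp + C @ (x - xp) to grid data,
    i.e. projection of the local grid velocities onto an affine basis."""
    d = nodes - xp                                  # offsets to the particle
    basis = np.hstack([np.ones((len(d), 1)), d])    # [1, x - xp]
    W = np.diag(weights)
    coeff, *_ = np.linalg.lstsq(W @ basis, W @ velocities, rcond=None)
    vp, C = coeff[0], coeff[1:].T                   # velocity and its gradient
    return vp, C

# Four grid nodes around a particle, sampled from a rigid rotation field.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
xp = np.array([0.5, 0.5])
C_true = np.array([[0.0, -1.0],
                   [1.0,  0.0]])                    # angular velocity matrix
vels = (nodes - xp) @ C_true.T
vp, C = fit_local_affine(xp, nodes, vels, np.ones(4))
```

Because the rotation field lies in the affine basis, the fit recovers it exactly; higher-order polynomial bases extend the same projection to capture more of the grid velocity, which is the source of the improved energy and vorticity conservation.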
We present an adaptive Generalized Interpolation Material Point (GIMP) method for simulating elastoplastic materials. Our approach allows adaptive refining and coarsening of different regions of the material, leading to an efficient MPM solver that concentrates most of the computational resources in specific regions of interest. We propose a C1-continuous adaptive basis function that satisfies the partition of unity property and remains non-negative throughout the computational domain. We develop a practical strategy for particle-grid transfers that leverages the recently introduced SPGrid data structure for storing sparse multi-layered grids. We demonstrate the robustness and efficiency of our method on the simulation of various elastic and plastic materials. We also compare key kernel components to uniform grid MPM solvers to highlight the performance benefits of our method.
We introduce a unified particle framework which integrates the phase-field method with multi-material simulation to allow modeling of both liquids and solids, as well as phase transitions between them. A simple elasto-plastic model is used to capture the behavior of various kinds of solids, including deformable bodies, granular materials, and cohesive soils. States of matter or phases, particularly liquids and solids, are modeled using the non-conservative Allen-Cahn equation. In contrast, materials---made of different substances---are advected by the conservative Cahn-Hilliard equation. The distributions of phases and materials are represented by a phase variable and a concentration variable, respectively, allowing us to represent commonly observed fluid-solid interactions. Our multi-phase, multi-material system is governed by a unified Helmholtz free energy density. This framework provides the first method in computer graphics capable of modeling a continuous interface between phases. It is versatile and can be readily used in many scenarios that are challenging to simulate. Examples are provided to demonstrate the capabilities and effectiveness of this approach.
We introduce a deep learning approach for grouping discrete patterns common in graphical designs. Our approach is based on a convolutional neural network architecture that learns a grouping measure defined over a pair of pattern elements. Motivated by perceptual grouping principles, the key feature of our network is the encoding of element shape, context, symmetries, and structural arrangements. These element properties are all jointly considered and appropriately weighted in our grouping measure. To better align our measure with human perceptions for grouping, we train our network on a large, human-annotated dataset of pattern groupings consisting of patterns at varying granularity levels, with rich element relations and varieties, and tempered with noise and other data imperfections. Experimental results demonstrate that our deep-learned measure leads to robust grouping results.
Assembly-based tools provide a powerful modeling paradigm for non-expert shape designers. However, choosing a component from a large shape repository and aligning it to a partial assembly can become a daunting task. In this paper, we describe novel neural network architectures for suggesting complementary components and their placement for an incomplete 3D part assembly. Unlike most existing techniques, our networks are trained on unlabeled data obtained from public online repositories, and do not rely on consistent part segmentations or labels. The absence of labels poses a challenge in indexing the database of parts for retrieval. We address it by jointly training embedding and retrieval networks, where the first indexes parts by mapping them to a low-dimensional feature space, and the second maps partial assemblies to appropriate complements. The combinatorial nature of part arrangements poses another challenge, since the retrieval network is not a function: several complements can be appropriate for the same input. Thus, instead of predicting a single output, we train our network to predict a probability distribution over the space of part embeddings. This allows our method to deal with ambiguities and naturally enables a UI that seamlessly integrates user preferences into the design process. We demonstrate that our method can be used to design complex shapes with minimal or no user input. To evaluate our approach, we develop a novel benchmark for component suggestion systems, demonstrating significant improvement over state-of-the-art techniques.
We introduce a method for learning a model for the mobility of parts in 3D objects. Our method not only allows us to understand the dynamic functionalities of one or more parts in a 3D object, but also to apply the mobility functions to static 3D models. Specifically, the learned part mobility model can predict mobilities for parts of a 3D object given in the form of a single static snapshot reflecting the spatial configuration of the object parts in 3D space, and transfer the mobility from relevant units in the training data. The training data consists of a set of mobility units of different motion types. Each unit is composed of a pair of 3D object parts (one moving and one reference part), along with usage examples consisting of a few snapshots capturing different motion states of the unit. Taking advantage of a linearity characteristic exhibited by most part motions in everyday objects, and utilizing a set of part-relation descriptors, we define a mapping from static snapshots to dynamic units. This mapping employs a motion-dependent snapshot-to-unit distance obtained via metric learning. We show that our learning scheme leads to accurate motion prediction from single static snapshots and allows proper motion transfer. We also demonstrate other applications such as motion-driven object detection and motion hierarchy construction.
Authoring virtual terrains presents a challenge and there is a strong need for authoring tools able to create realistic terrains with simple user inputs and a high level of user control. We propose an example-based authoring pipeline that uses a set of terrain synthesizers dedicated to specific tasks. Each terrain synthesizer is a Conditional Generative Adversarial Network trained on real-world terrains and their sketched counterparts. The training sets are built automatically so that the terrain synthesizers learn the generation from features that are easy to sketch. During the authoring process, the artist first creates a rough sketch of the main terrain features, such as rivers, valleys and ridges, and the algorithm automatically synthesizes a terrain corresponding to the sketch using the learned features of the training samples. Moreover, an erosion synthesizer can also generate terrain evolution by erosion at a very low computational cost. Our framework allows for easy terrain authoring and provides a high level of realism for a minimal sketch cost. We show various examples of terrain synthesis created by experienced as well as inexperienced users, who are able to design a vast variety of complex terrains in a very short time.
Monte-Carlo rendering algorithms traditionally have a high computational cost, because they rely on tracing up to billions of light paths through a scene to physically simulate light transport. Traditional path reusing amortizes the cost of path sampling over multiple pixels, but introduces visually unpleasant correlation artifacts and cannot handle scenes with specular light transport. We present gradient-domain path reusing, a novel unbiased Monte-Carlo rendering technique, which merges the concept of path reusing with the recently introduced idea of gradient-domain rendering. Since correlation is a key element in gradient sampling, it is a natural fit to be performed together with path reusing, and we show that the typical artifacts of path reusing are significantly reduced by exploiting the gradient domain. Further, by employing the tools for shifting paths that were designed in the context of gradient-domain rendering over the last years, we can generalize path reusing to support arbitrary scenes, including specular light transport. Our method is unbiased and is currently the fastest-converging unidirectional rendering technique, outperforming conventional and gradient-domain path tracing by up to almost an order of magnitude.
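The reconstruction step shared by gradient-domain methods combines a noisy primal image with sampled finite-difference gradients through a screened Poisson solve. A minimal 1D sketch (FFT-based with periodic boundaries; `alpha` and the test signal are illustrative choices, not from the paper):

```python
import numpy as np

def screened_poisson_1d(primal, grad, alpha=0.05):
    """Merge a noisy primal estimate with (typically less noisy) forward-difference
    gradient estimates by solving
        min_u ||D u - grad||^2 + alpha * ||u - primal||^2
    in the Fourier domain, where D is the periodic forward-difference operator."""
    n = len(primal)
    d = np.exp(2j * np.pi * np.arange(n) / n) - 1  # spectrum of forward difference
    num = alpha * np.fft.fft(primal) + np.conj(d) * np.fft.fft(grad)
    den = alpha + np.abs(d) ** 2
    return np.real(np.fft.ifft(num / den))

rng = np.random.default_rng(0)
n = 256
clean = np.sin(np.linspace(0, 4 * np.pi, n, endpoint=False))
noisy = clean + 0.3 * rng.standard_normal(n)       # stand-in for a noisy primal render
grad = np.roll(clean, -1) - clean                  # exact forward differences
merged = screened_poisson_1d(noisy, grad)
```

With exact gradients, the solve keeps only the low-frequency (near-DC) part of the primal noise, which is why gradient-domain estimators converge faster than their primal counterparts.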
We present a direct-to-indirect transport technique that enables accurate real-time rendering of indirect illumination in mostly static scenes of complexity on par with modern games while supporting fully dynamic lights, cameras and diffuse surface materials. Our key contribution is an algorithm for reconstructing the incident radiance field from a sparse set of local samples --- radiance probes --- by incorporating mutual visibility into the reconstruction filter. To compute global illumination, we factorize the direct-to-indirect transport operator into global and local parts, sample the global transport with sparse radiance probes in real time, and use the sampled radiance field as input to our precomputed local reconstruction operator to obtain indirect radiance. In contrast to previous methods aiming to encode the global direct-to-indirect transport operator, our precomputed data is local in the sense that it needs no long-range interactions between probes and receivers, and every receiver depends only on a small, constant number of nearby radiance probes, aiding compression, storage, and iterative workflows. While not as accurate, we demonstrate that our method can also be used for rendering indirect illumination on glossy surfaces, and approximating global illumination in scenes with large-scale dynamic geometry.
We present a technique for efficiently synthesizing images of atmospheric clouds using a combination of Monte Carlo integration and neural networks. The intricacies of Lorenz-Mie scattering and the high albedo of cloud-forming aerosols make rendering of clouds---e.g. the characteristic silver lining and the "whiteness" of the inner body---challenging for methods based solely on Monte Carlo integration or diffusion theory. We approach the problem differently. Instead of simulating all light transport during rendering, we pre-learn the spatial and directional distribution of radiant flux from tens of cloud exemplars. To render a new scene, we sample visible points of the cloud and, for each, extract a hierarchical 3D descriptor of the cloud geometry with respect to the shading location and the light source. The descriptor is input to a deep neural network that predicts the radiance function for each shading configuration. We make the key observation that progressively feeding the hierarchical descriptor into the network enhances the network's ability to learn faster and predict with higher accuracy while using fewer coefficients. We also employ a block design with residual connections to further improve performance. A GPU implementation of our method synthesizes images of clouds that are nearly indistinguishable from the reference solution within seconds to minutes. Our method thus represents a viable solution for applications such as cloud design and, thanks to its temporal stability, for high-quality production of animated content.
Rendering fabrics using micro-appearance models---fiber-level microgeometry coupled with a fiber scattering model---can take hours per frame. We present a fast, precomputation-based algorithm for rendering both single and multiple scattering in fabrics with repeating structure illuminated by directional and spherical Gaussian lights.
Precomputed light transport (PRT) is well established but challenging to apply directly to cloth. This paper shows how to decompose the problem and pick the right approximations to achieve very high accuracy, with significant performance gains over path tracing. We treat single and multiple scattering separately and approximate local multiple scattering using precomputed transfer functions represented in spherical harmonics. We handle shadowing between fibers with precomputed per-fiber-segment visibility functions, using two different representations to separately deal with low and high frequency spherical Gaussian lights.
Our algorithm is designed for GPU performance and high visual quality. Compared to existing PRT methods, it is more accurate. In tens of seconds on a commodity GPU, it renders high-quality supersampled images that take path tracing tens of minutes on a compute cluster.
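The runtime evaluation in precomputed radiance transfer reduces shading to a dot product of spherical-harmonic coefficient vectors. A minimal sketch of that classic PRT machinery (order-2 real SH with Monte Carlo projection; this is the textbook formulation, not the paper's per-fiber-segment visibility representation):

```python
import numpy as np

def sh_basis(d):
    """Real spherical harmonics up to band 2 (9 basis functions) for unit
    directions d of shape (N, 3), using the standard normalization constants."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ], axis=1)

def sh_project(f_vals, Y):
    """Monte Carlo projection onto the SH basis over uniform sphere samples."""
    return 4.0 * np.pi * np.mean(f_vals[:, None] * Y, axis=0)

rng = np.random.default_rng(1)
d = rng.standard_normal((200000, 3))           # uniform directions on the sphere
d /= np.linalg.norm(d, axis=1, keepdims=True)
Y = sh_basis(d)

light = np.maximum(d[:, 2], 0.0)      # a light function concentrated toward +z
transfer = np.maximum(d[:, 2], 0.0)   # clamped-cosine transfer, normal chosen = +z
shade = sh_project(light, Y) @ sh_project(transfer, Y)  # runtime: one 9-term dot product
```

With both functions chosen as the clamped cosine about +z, the dot product can be checked against the analytic integral of their product over the sphere, 2*pi/3.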
We propose an adaptive version of Lloyd's optimization method that distributes points based on Voronoi diagrams. Our inspiration is the Linde-Buzo-Gray algorithm in vector quantization, which dynamically splits Voronoi cells until a desired number of representative vectors is reached. We reformulate this algorithm by splitting and merging Voronoi cells based on their size, greyscale level, or variance of an underlying input image. The proposed method automatically adapts to various constraints and, in contrast to previous work, requires no good initial point distribution or prior knowledge about the final number of points. Compared to weighted Voronoi stippling, the convergence rate is much higher and the spectral and spatial properties are superior. Further, because points are created based on local operations, coherent stipple animations can be produced. Our method is also able to produce good quality point sets in other fields, such as remeshing of geometry, based on local geometric features such as curvature.
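A minimal sketch of the split-and-merge loop described above (discrete Voronoi regions on a pixel grid; the thresholds, jitter, and iteration count are illustrative choices, not the paper's):

```python
import numpy as np

def adaptive_stipple(density, iters=30, split_mass=60.0, merge_mass=5.0, seed=0):
    """Linde-Buzo-Gray-style adaptive stippling sketch: weighted Lloyd
    relaxation on a pixel grid, splitting Voronoi cells whose accumulated
    density exceeds split_mass and dropping cells below merge_mass."""
    rng = np.random.default_rng(seed)
    h, w = density.shape
    ys, xs = np.mgrid[0:h, 0:w]
    px = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # pixel centers (x, y)
    mass_px = density.ravel().astype(float)
    pts = rng.uniform([0, 0], [w, h], size=(4, 2))  # start from a few random points
    for _ in range(iters):
        d2 = ((px[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
        owner = d2.argmin(axis=1)                   # discrete Voronoi assignment
        new_pts = []
        for k in range(len(pts)):
            sel = owner == k
            m = mass_px[sel].sum()
            if m < merge_mass:                      # starved cell: merge it away
                continue
            c = (px[sel] * mass_px[sel][:, None]).sum(axis=0) / m
            new_pts.append(c)                       # Lloyd step: move to weighted centroid
            if m > split_mass:                      # overloaded cell: split off a sibling
                new_pts.append(c + rng.normal(0.0, 0.5, size=2))
        pts = np.array(new_pts)
    return pts

# Darker (denser) regions should attract more stipples.
density = np.full((32, 32), 0.1)
density[:, :16] = 1.0
pts = adaptive_stipple(density)
```

The point count is never specified: it emerges from the mass thresholds, which is the property that lets the method run without prior knowledge of the final number of points.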
We present an algorithm that enables casual 3D photography. Given a set of input photos captured with a hand-held cell phone or DSLR camera, our algorithm reconstructs a 3D photo, a central panoramic, textured, normal mapped, multi-layered geometric mesh representation. 3D photos can be stored compactly and are optimized for being rendered from viewpoints that are near the capture viewpoints. They can be rendered using a standard rasterization pipeline to produce perspective views with motion parallax. When viewed in VR, 3D photos provide geometrically consistent views for both eyes. Our geometric representation also allows interacting with the scene using 3D geometry-aware effects, such as adding new objects to the scene and artistic lighting effects.
Our 3D photo reconstruction algorithm starts with a standard structure from motion and multi-view stereo reconstruction of the scene. The dense stereo reconstruction is made robust to the imperfect capture conditions using a novel near envelope cost volume prior that discards erroneous near depth hypotheses. We propose a novel parallax-tolerant stitching algorithm that warps the depth maps into the central panorama and stitches two color-and-depth panoramas for the front and back scene surfaces. The two panoramas are fused into a single non-redundant, well-connected geometric mesh. We provide videos demonstrating users interactively viewing and manipulating our 3D photos.
We present a novel algorithm for view synthesis that utilizes a soft 3D reconstruction to improve quality, continuity and robustness. Our main contribution is the formulation of a soft 3D representation that preserves depth uncertainty through each stage of 3D reconstruction and rendering. We show that this representation is beneficial throughout the view synthesis pipeline. During view synthesis, it provides a soft model of scene geometry that ensures continuity across synthesized views and robustness to depth uncertainty. During 3D reconstruction, the same robust estimates of scene visibility can be applied iteratively to improve depth estimation around object edges. Our algorithm is based entirely on O(1) filters, making it conducive to acceleration, and it works with structured or unstructured sets of input views. We compare with recent classical and learning-based algorithms on plenoptic lightfields, wide baseline captures, and lightfield videos produced from camera arrays.
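The O(1) filters referred to above are filters whose per-pixel cost is independent of kernel size; the standard building block is the summed-area table, sketched here:

```python
import numpy as np

def summed_area_table(img):
    """Integral image with a zero border: S[i, j] = sum of img[:i, :j]."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def box_sum(S, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four table lookups -- O(1) regardless
    of window size, which is what makes box filtering constant-time."""
    return S[r1, c1] - S[r0, c1] - S[r1, c0] + S[r0, c0]

img = np.arange(100.0).reshape(10, 10)
S = summed_area_table(img)
```

After the single O(n) table build, any box filter (and, by combination, many edge-aware filters) can be evaluated at constant cost per pixel, which is what makes such pipelines amenable to GPU acceleration.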
Holograms display a 3D image in high resolution and allow viewers to focus freely as if looking through a virtual window, yet computer-generated holography (CGH) has not delivered the same visual quality, owing to the limitations of plane-wave illumination and a heavy computational cost. Light field displays have been popular due to their capability to provide continuous focus cues. However, light field displays must trade off between spatial and angular resolution, and do not model diffraction.
We present a light field-based CGH rendering pipeline allowing for reproduction of high-definition 3D scenes with continuous depth and support of intra-pupil view-dependent occlusion. Our rendering accurately accounts for diffraction and supports various types of reference illuminations for the hologram. We avoid under- and over-sampling and geometric clipping effects seen in previous work. We also demonstrate an implementation of light field rendering plus the Fresnel diffraction integral based CGH calculation which is orders of magnitude faster than the state of the art [Zhang et al. 2015], achieving interactive volumetric 3D graphics.
To verify our computational results, we build a see-through, near-eye, color CGH display prototype which enables co-modulation of both amplitude and phase. We show that our rendering accurately models the spherical illumination introduced by the eye piece and produces the desired 3D imagery at the designated depth. We also analyze aliasing, theoretical resolution limits, depth of field, and other design trade-offs for near-eye CGH.
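The FFT-based propagation underlying such CGH calculations can be sketched with the closely related angular-spectrum method, a standard scalar-diffraction solver (the grid size, pixel pitch, wavelength, and aperture below are illustrative, and this is not the paper's pipeline):

```python
import numpy as np

def angular_spectrum_propagate(u0, wavelength, dx, z):
    """Propagate a sampled complex field u0 by distance z using the
    angular-spectrum method: filter the field's 2D spectrum with the
    free-space transfer function exp(i * kz * z)."""
    n = u0.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                  # spatial frequencies (cycles/m)
    fxx, fyy = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)           # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(u0) * H)

# Example: a small square aperture illuminated by a unit plane wave.
n, dx, wl = 256, 10e-6, 633e-9
u0 = np.zeros((n, n), complex)
u0[n // 2 - 16:n // 2 + 16, n // 2 - 16:n // 2 + 16] = 1.0
u1 = angular_spectrum_propagate(u0, wl, dx, z=5e-3)
```

Because the transfer function has unit magnitude for all propagating frequencies, the total optical power of the sampled field is preserved, which makes a convenient sanity check for any FFT-based diffraction implementation.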
As head-mounted displays (HMDs) commonly present a single, fixed-focus display plane, a conflict can be created between the vergence and accommodation responses of the viewer. Multifocal HMDs have long been investigated as a potential solution in which multiple image planes span the viewer's accommodation range. Such displays require a scene decomposition algorithm to distribute the depiction of objects across image planes, and previous work has shown that simple decompositions can be achieved in real time. However, recent optimal decompositions further improve image quality, particularly with complex content. Such decompositions are more computationally involved and likely require better alignment of the image planes with the viewer's eyes, which are potential barriers to practical applications.
Our goal is to enable interactive optimal decomposition algorithms capable of driving a vergence- and accommodation-tracked multifocal testbed. Ultimately, such a testbed is necessary to establish the requirements for the practical use of multifocal displays, in terms of computational demand and hardware accuracy. To this end, we present an efficient algorithm for optimal decompositions, incorporating insights from vision science. Our method is amenable to GPU implementations and achieves a three-orders-of-magnitude speedup over previous work. We further show that eye tracking can be used for adequate plane alignment with efficient image-based deformations, adjusting for both eye rotation and head movement relative to the display. We also build the first binocular multifocal testbed with integrated eye tracking and accommodation measurement, paving the way to establish practical eye tracking and rendering requirements for this promising class of display. Finally, we report preliminary results from a pilot user study utilizing our testbed, investigating the accommodation response of users to dynamic stimuli presented under optimal decomposition.
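An optimal multifocal decomposition can be posed as a nonnegative least-squares problem over the per-plane images. A toy 1D sketch with projected gradient descent (the Gaussian defocus model, plane count, targets, and step sizes are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def blur_matrix(n, sigma):
    """Dense 1D Gaussian blur: a stand-in for the retinal defocus blur of
    one image plane under one accommodation state (illustrative model)."""
    i = np.arange(n)
    K = np.exp(-0.5 * ((i[:, None] - i[None, :]) / max(sigma, 1e-6)) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def decompose(targets, A, steps=400, lr=0.2):
    """Projected gradient descent for
        min_{x_k >= 0}  sum_f || sum_k A[f][k] @ x_k - targets[f] ||^2,
    where x_k is the image shown on plane k and A[f][k] models how plane k
    appears when the eye accommodates to state f."""
    F, K, n = len(targets), len(A[0]), len(targets[0])
    x = [np.full(n, 0.5) for _ in range(K)]
    for _ in range(steps):
        grads = [np.zeros(n) for _ in range(K)]
        for f in range(F):
            r = sum(A[f][k] @ x[k] for k in range(K)) - targets[f]
            for k in range(K):
                grads[k] += 2.0 * A[f][k].T @ r     # least-squares gradient
        x = [np.maximum(x[k] - lr * grads[k] / F, 0.0) for k in range(K)]  # project to >= 0
    return x

n = 32
sharp, soft = blur_matrix(n, 0.1), blur_matrix(n, 2.0)
A = [[sharp, soft], [soft, sharp]]              # plane k is in focus at state k
t_near = (np.arange(n) >= 16).astype(float)     # target when focused near
t_far = np.sin(np.linspace(0.0, np.pi, n))      # target when focused far
planes = decompose([t_near, t_far], A)
```

The GPU-friendliness noted in the abstract comes from the same structure: the objective is a sum of independent per-pixel-neighborhood terms whose gradients are blurs, i.e. highly parallel filtering operations.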
Wind-up toys are mechanical assemblies that perform intriguing motions driven by a simple spring motor. Due to the limited motor force and small body size, wind-up toys often employ higher pair joints with low-friction contacts and connector parts of nontrivial shapes to transfer motions. These unique characteristics make them hard to design and fabricate as compared to other automata. This paper presents a computational system to aid the design of wind-up toys, focusing on constructing a compact internal wind-up mechanism to realize user-requested part motions. Our key contributions include an analytical modeling of a wide variety of elemental mechanisms found in common wind-up toys, including their geometry and kinematics; a conceptual design of wind-up mechanisms by computing motion transfer trees that realize the requested part motions; an automatic construction of wind-up mechanisms by connecting multiple elemental mechanisms; and an optimization of the part and joint geometry with the objective of compacting the mechanism, reducing its weight, and avoiding collision. We use our system to design wind-up toys of various forms, fabricate a number of them using 3D printing, and demonstrate the functionality of our results.
We present an end-to-end solution for design and fabrication of soft pneumatic objects with desired deformations. Given a 3D object with its rest and deformed target shapes, our method automatically optimizes the chamber structure and material distribution inside the object volume so that the fabricated object can deform to all the target deformed poses with controlled air injection. To this end, our method models the object volume with a set of chambers separated by material shells. Each chamber has individual channels connected to the object surface and thus can be separately controlled with a pneumatic system, while the shell is comprised of base material with an embedded frame structure. A two-step algorithm is developed to compute the geometric layout of the chambers and frame structure as well as the material properties of the frame structure from the input. The design results can be fabricated with 3D printing and deformed by a controlled pneumatic system. We validate and demonstrate the efficacy of our method with soft pneumatic objects that have different shapes and deformation behaviors.
We present a method for designing and fabricating MetaSilicones---composite silicone rubbers that exhibit desired macroscopic mechanical properties. The underlying principle of our approach is to inject spherical inclusions of a liquid dopant material into a silicone matrix material. By varying the number, size, and locations of these inclusions as well as their material, a broad range of mechanical properties can be achieved. The technical core of our approach is formed by an optimization algorithm that, combining a simulation model based on extended finite elements (XFEM) and sensitivity analysis, computes inclusion distributions that lead to desired stiffness properties on the macroscopic level. We explore the design space of MetaSilicones in an extensive set of simulation experiments involving materials with optimized uni- and bi-directional stiffness, spatially-graded properties, as well as multi-material composites. We present validation through standard measurements on physical prototypes, which we fabricate on a modified filament-based 3D printer, thus combining the advantages of digital fabrication with the mechanical performance of silicone elastomers.
Color texture reproduction in 3D printing commonly ignores volumetric light transport (cross-talk) between surface points on a 3D print. Such light diffusion leads to significant blur of details and color bleeding, and is particularly severe for highly translucent resin-based print materials. Given their widely varying scattering properties, this cross-talk between surface points strongly depends on the internal structure of the volume surrounding each surface point. Existing scattering-aware methods use simplified models for light diffusion, and often accept the visual blur as an immutable property of the print medium. In contrast, our work counteracts heterogeneous scattering to obtain the impression of a crisp albedo texture on top of the 3D print, by optimizing for a fully volumetric material distribution that preserves the target appearance. Our method employs an efficient numerical optimizer on top of a general Monte-Carlo simulation of heterogeneous scattering, supported by a practical calibration procedure to obtain scattering parameters from a given set of printer materials. Despite the inherent translucency of the medium, we reproduce detailed surface textures on 3D prints. We evaluate our system using a commercial, five-tone 3D print process and compare against the printer's native color texturing mode, demonstrating that our method preserves high-frequency features well without having to compromise on color gamut.
Our goal is to 3D print wireless sensors, input widgets and objects that can communicate with smartphones and other Wi-Fi devices, without the need for batteries or electronics. To this end, we present a novel toolkit for wireless connectivity that can be integrated with 3D digital models and fabricated using commodity desktop 3D printers and commercially available plastic filament materials. Specifically, we introduce the first computational designs that 1) send data to commercial RF receivers including Wi-Fi, enabling 3D printed wireless sensors and input widgets, and 2) embed data within objects using magnetic fields and decode the data using magnetometers on commodity smartphones. To demonstrate the potential of our techniques, we design the first fully 3D printed wireless sensors including a weight scale, flow sensor and anemometer that can transmit sensor data. Furthermore, we 3D print eyeglass frames, armbands as well as artistic models with embedded magnetic data. Finally, we present various 3D printed application prototypes including buttons, smart sliders and physical knobs that wirelessly control music volume and lights as well as smart bottles that can sense liquid flow and send data to nearby RF devices, without batteries or electronics.
We present a new algorithm for real-time hand tracking on commodity depth-sensing devices. Our method does not require a user-specific calibration session, but rather learns the geometry as the user performs live in front of the camera, thus enabling seamless virtual interaction at the consumer level. The key novelty in our approach is an online optimization algorithm that jointly estimates pose and shape in each frame, and determines the uncertainty in such estimates. This knowledge allows the algorithm to integrate per-frame estimates over time, and build a personalized geometric model of the captured user. Our approach can easily be integrated in state-of-the-art continuous generative motion tracking software. We provide a detailed evaluation that shows how our approach achieves accurate motion tracking for real-time applications, while significantly simplifying the workflow of accurate hand performance capture. We also provide quantitative evaluation datasets at http://gfx.uvic.ca/datasets/handy
The state of the art in articulated hand tracking has been greatly advanced by hybrid methods that fit a generative hand model to depth data, leveraging both temporally and discriminatively predicted starting poses. In this paradigm, the generative model is used to define an energy function and a local iterative optimization is performed from these starting poses in order to find a "good local minimum" (i.e. a local minimum close to the true pose). Performing this optimization quickly is key to exploring more starting poses, performing more iterations and, crucially, exploiting high frame rates that ensure that temporally predicted starting poses are in the basin of convergence of a good local minimum. At the same time, a detailed and accurate generative model tends to deepen the good local minima and widen their basins of convergence. Recent work, however, has largely had to trade off such a detailed hand model with one that facilitates such rapid optimization. We present a new implicit model of hand geometry that mostly avoids this compromise and leverage it to build an ultra-fast hybrid hand tracking system. Specifically, we construct an articulated signed distance function that, for any pose, yields a closed form calculation of both the distance to the detailed surface geometry and the necessary derivatives to perform gradient based optimization. There is no need to introduce or update any explicit "correspondences", yielding a simple algorithm that maps well to parallel hardware such as GPUs. As a result, our system can run at extremely high frame rates (e.g. up to 1000fps). Furthermore, we demonstrate how to detect, segment and optimize for two strongly interacting hands, recovering complex interactions at extremely high frame rates. In the absence of publicly available datasets of sufficiently high frame rate, we leverage a multiview capture system to create a new 180fps dataset of one and two hands interacting together or with objects.
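The closed-form distance-plus-derivative evaluation that makes such an articulated SDF fast can be illustrated with a single capsule primitive, a common per-bone choice (this sketch is not the paper's exact geometry model):

```python
import numpy as np

def capsule_sdf(p, a, b, r):
    """Closed-form signed distance from point p to a capsule (the segment
    a-b swept by radius r) together with its analytic gradient with respect
    to p. Assumes p does not lie exactly on the segment. An articulated SDF
    can be assembled from one such primitive per posed bone."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)  # parameter of closest point on the segment
    d = p - (a + t * ab)                         # offset from the capsule spine
    n = np.linalg.norm(d)
    return n - r, d / n                          # gradient is the unit offset direction

a, b = np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
dist, grad = capsule_sdf(np.array([0.0, 2.0, 0.0]), a, b, r=0.5)
```

Both the distance and its gradient come out of the same few arithmetic operations with no correspondence search, which is what lets a gradient-based fit over all bones run at very high frame rates on parallel hardware.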
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low-resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural, activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.
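The additive blend-shape structure of models like MANO (a template plus shape and pose correctives) can be sketched as follows; this simplified evaluation omits the skinning step the full model also applies, and all arrays are random stand-ins rather than learned parameters:

```python
import numpy as np

def eval_blend_model(template, shape_dirs, pose_dirs, betas, pose_feat):
    """Evaluate a MANO-style additive model (sketch): template vertices plus
    shape blend shapes driven by identity coefficients `betas` plus pose
    blend shapes driven by a pose feature vector. Array shapes:
    template (V, 3), shape_dirs (V, 3, S), pose_dirs (V, 3, P)."""
    return (template
            + np.einsum('vds,s->vd', shape_dirs, betas)
            + np.einsum('vdp,p->vd', pose_dirs, pose_feat))

rng = np.random.default_rng(0)
template = rng.standard_normal((10, 3))        # stand-in rest-pose vertices
shape_dirs = rng.standard_normal((10, 3, 4))   # stand-in identity blend shapes
pose_dirs = rng.standard_normal((10, 3, 6))    # stand-in pose correctives
rest = eval_blend_model(template, shape_dirs, pose_dirs, np.zeros(4), np.zeros(6))
```

The linearity of this mapping is what makes the model low-dimensional, fast to fit, and compatible with standard graphics packages: the correctives are constant direction fields scaled by per-frame coefficients.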
We present Motion2Fusion, a state-of-the-art 360 performance capture system that enables *real-time* reconstruction of arbitrary non-rigid scenes. We provide three major contributions over prior work: 1) a new non-rigid fusion pipeline allowing for far more faithful reconstruction of high frequency geometric details, avoiding the over-smoothing and visual artifacts observed previously; 2) a high speed pipeline coupled with a machine learning technique for 3D correspondence field estimation, reducing tracking errors and artifacts that are attributed to fast motions; and 3) a backward and forward non-rigid alignment strategy that more robustly deals with topology changes but is still free from scene priors. Our novel performance capture system demonstrates real-time results with nearly a 3x speed-up over previous state-of-the-art work on the exact same GPU hardware. Extensive quantitative and qualitative comparisons show more precise geometric and texturing results with fewer artifacts due to fast motions or topology changes than prior art.