 Full Citation in the ACM Digital Library
Fundamental display characteristics are constantly being improved, especially resolution, dynamic range, and color reproduction. However, whereas high resolution and high-dynamic range displays have matured as a technology, it remains largely unclear how to extend the color gamut of a display without either sacrificing light throughput or making other tradeoffs. In this paper, we advocate for adaptive color display; with hardware implementations that allow for color primaries to be dynamically chosen, an optimal gamut and corresponding pixel states can be computed in a content-adaptive and user-centric manner. We build a flexible gamut projector and develop a perceptually-driven optimization framework that robustly factors a wide color gamut target image into a set of time-multiplexed primaries and corresponding pixel values. We demonstrate that adaptive primary selection has many benefits over fixed gamut selection and show that our algorithm for joint primary selection and gamut mapping performs better than existing methods. Finally, we evaluate the proposed computational display system extensively in simulation and, via photographs and user experiments, with a prototype adaptive color projector.
The visual system constantly adapts to different luminance levels when viewing natural scenes. The state of visual adaptation is the key parameter in many visual models. While the time-course of such adaptation is well understood, little is known about the spatial pooling that drives the adaptation signal. In this work we propose a new empirical model of local adaptation that predicts how the adaptation signal is integrated in the retina. The model is based on psychophysical measurements on a high dynamic range (HDR) display. We employ a novel approach to model discovery, in which the experimental stimuli are optimized to find the most predictive model. The model can be used to predict the steady state of adaptation, as well as conservative estimates of the visibility (detection) thresholds in complex images. We demonstrate the utility of the model in several applications, such as perceptual error bounds for physically based rendering, determining the backlight resolution for HDR displays, measuring the maximum visible dynamic range in natural scenes, simulation of afterimages, and gaze-dependent tone mapping.
We propose a color reproduction framework for creating specularly reflecting color images printed on a metallic substrate that change hue or chroma upon in-plane rotation by 90°. This framework is based on the anisotropic dot gain of line halftones when viewed under specular reflection. The proposed framework relies on a spectral prediction model specially conceived for predicting the color of non-rotated and of 90° in-plane rotated cross-halftones formed of superpositions of horizontal and vertical cyan, magenta and yellow line halftones. Desired non-rotated and rotated image colors are mapped onto the sub-gamut allowing for the desired hue or chroma shift and then, using a 6D correspondence table, converted to optimal cross-halftone ink surface coverages. The proposed recolorization and decolorization framework is especially effective for creating surprising effects such as image parts whose hues change, or gray regions that become colorful. It can be adapted to commercial printers capable of printing with cyan, magenta and yellow inks on substrates formed by an ink attracting polymer lying on top of a metallic film layer. Applications may include art, advertisement, exhibitions and document security.
In this paper, we propose a novel approach to simplify sketch drawings. The core problem is how to group sketchy strokes meaningfully, and this depends on how humans understand the sketches. Existing methods mainly rely on thresholding low-level geometric properties among the strokes, such as proximity, continuity and parallelism. However, it is not uncommon to have strokes with equal geometric properties but different semantics. Without semantic analysis, such methods cannot differentiate these semantically different cases. We point out that, due to the gestalt phenomenon of closure, the grouping of strokes is highly influenced by the interpretation of regions. On the other hand, the interpretation of regions is also influenced by the interpretation of strokes, since regions are formed and depicted by strokes. This is a chicken-or-the-egg dilemma, and we solve it with an iterative cyclic refinement approach. Once the formed stroke groups have stabilized, we simplify the sketchy strokes by replacing each stroke group with a smooth curve. We evaluate our method on a wide range of different sketch styles, and semantically meaningful simplification results can be obtained in all test cases.
Hand-drawn animation is a major art form and communication medium, but can be challenging to produce. We present a system to help people create frame-by-frame animations through manual sketches. We design our interface to be minimalistic: it contains only a canvas and a few controls. When users draw on the canvas, our system silently analyzes all past sketches and predicts what might be drawn in the future across spatial locations and temporal frames. The interface also offers suggestions to beautify existing drawings. Our system can reduce manual workload and improve output quality without compromising natural drawing flow and control: users can accept, ignore, or modify such predictions visualized on the canvas by simple gestures. Our key idea is to extend the local similarity method in [Xing et al. 2014], which handles only low-level spatial repetitions such as hatches within a single frame, to a global similarity that can capture high-level structures across multiple frames such as dynamic objects. We evaluate our system through a preliminary user study and confirm that it can enhance both users' objective performance and subjective satisfaction.
Many tasks in geometry processing and physical simulation benefit from multiresolution hierarchies. One important characteristic across a variety of applications is that coarser layers strictly encage finer layers, nesting one another. Existing techniques such as surface mesh decimation, voxelization, or contouring distance level sets do not provide sufficient control over the quality of the output surfaces while maintaining strict nesting. We propose a solution that enables use of application-specific decimation and quality metrics. The method constructs each next-coarsest level of the hierarchy, using a sequence of decimation, flow, and contact-aware optimization steps. From coarse to fine, each layer then fully encages the next while retaining a snug fit. The method is applicable to a wide variety of shapes of complex geometry and topology. We demonstrate the effectiveness of our nested cages not only for multigrid solvers, but also for conservative collision detection, domain discretization for elastic simulation, and cage-based geometric modeling.
Decomposing a complex shape into geometrically simple primitives is a fundamental problem in geometry processing. We are interested in a shape decomposition problem where the simple primitives sought are generalized cylinders, which are ubiquitous in both organic forms and man-made artifacts. We introduce a quantitative measure of cylindricity for a shape part and develop a cylindricity-driven optimization algorithm, with a global objective function, for generalized cylinder decomposition. As a measure of geometric simplicity and following the minimum description length principle, cylindricity is defined as the cost of representing a cylinder through skeletal and cross-section profile curves. Our decomposition algorithm progressively builds local to non-local cylinders, which form over-complete covers of the input shape. The over-completeness of the cylinder covers ensures a conservative buildup of the cylindrical parts, leaving the final decision on decomposition to global optimization. We solve the global optimization by finding an exact cover, which optimizes the global objective function. We demonstrate results of our optimal decomposition algorithm on numerous examples and compare with other alternatives.
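The final selection step described above, picking an exact cover from an over-complete set of candidate parts so that a global cost is minimized, can be illustrated with a toy sketch. This is not the paper's solver: the element ids, candidate names, and integer costs below are invented for illustration, and the brute-force backtracking search stands in for whatever optimization the authors actually use.

```python
def min_cost_exact_cover(universe, candidates):
    """Return (cost, sorted candidate names) of a cheapest exact cover, or None.

    universe   -- iterable of element ids (e.g. surface patches; hypothetical)
    candidates -- dict: name -> (frozenset of covered elements, cost)
    Every element must be covered exactly once.
    """
    best = None

    def search(remaining, chosen, cost):
        nonlocal best
        # Prune branches that cannot beat the best cover found so far.
        if best is not None and cost >= best[0]:
            return
        if not remaining:
            best = (cost, sorted(chosen))
            return
        pivot = min(remaining)  # the next element that still needs a part
        for name, (elems, c) in candidates.items():
            # A candidate is usable only if it fits entirely in what is left.
            if pivot in elems and elems <= remaining:
                search(remaining - elems, chosen + [name], cost + c)

    search(frozenset(universe), [], 0)
    return best


# Hypothetical over-complete candidate set over four surface patches.
candidates = {
    "A": (frozenset({0, 1}), 10),
    "B": (frozenset({2, 3}), 10),
    "C": (frozenset({0, 1, 2, 3}), 25),
    "D": (frozenset({1, 2}), 5),
    "E": (frozenset({0}), 4),
    "F": (frozenset({3}), 4),
}
result = min_cost_exact_cover(range(4), candidates)
```

In this toy instance the search rejects both the single large part `C` and the pair `A`, `B` in favor of the cheaper cover `D`, `E`, `F`.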
We study the design and optimization of polyhedral patterns, which are patterns of planar polygonal faces on freeform surfaces. Working with polyhedral patterns is desirable in architectural geometry and industrial design. However, the classical tiling patterns on the plane must take on various shapes in order to faithfully and feasibly approximate curved surfaces. We define and analyze the deformations these tiles must undergo to account for curvature, and discover the symmetries that remain invariant under such deformations. We propose a novel method that regularizes polyhedral patterns while maintaining these symmetries, yielding a plethora of aesthetic and feasible patterns.
3D geometric features constitute rich details of polygonal meshes. Their analysis and editing can lead to vivid appearance of shapes and better understanding of the underlying geometry for shape processing and analysis. Traditional mesh smoothing techniques mainly focus on noise filtering and they cannot distinguish different scales of features well, even mixing them up. We present an efficient method to process different scale geometric features based on a novel rolling-guidance normal filter. Given a 3D mesh, our method iteratively applies a joint bilateral filter to face normals at a specified scale, which empirically smooths small-scale geometric features while preserving large-scale features. Our method recovers the mesh from the filtered face normals by a modified Poisson-based gradient deformation that yields better surface quality than existing methods. We demonstrate the effectiveness and superiority of our method on a series of geometry processing tasks, including geometry texture removal and enhancement, coating transfer, mesh segmentation and level-of-detail meshing.
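A rough sketch of the filtering loop described above may help. It assumes the usual rolling-guidance scheme: the first pass uses a constant guidance signal (so it reduces to Gaussian smoothing), and each later pass filters the original normals using the previous output as guidance. The function names, the area weighting, and the all-pairs loop are illustrative assumptions, and the Poisson-based step that recovers vertex positions from the filtered normals is not covered.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n > 0 else tuple(v)


def joint_bilateral_normals(centroids, areas, normals, guidance, sigma_s, sigma_r):
    """One joint bilateral pass: average the *original* normals, weighting by
    spatial distance between face centroids and by distance between guidance normals."""
    out = []
    for ci, gi in zip(centroids, guidance):
        acc = [0.0, 0.0, 0.0]
        # All-pairs loop for brevity; a practical version would visit only nearby faces.
        for cj, aj, nj, gj in zip(centroids, areas, normals, guidance):
            ds2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
            dr2 = sum((a - b) ** 2 for a, b in zip(gi, gj))
            w = aj * math.exp(-ds2 / (2 * sigma_s ** 2)) * math.exp(-dr2 / (2 * sigma_r ** 2))
            for k in range(3):
                acc[k] += w * nj[k]
        out.append(normalize(acc))
    return out


def rolling_guidance_normals(centroids, areas, normals, sigma_s, sigma_r, iterations=3):
    """Iterate the joint bilateral pass (iterations >= 1), feeding each output
    back in as the next guidance signal."""
    guidance = [(0.0, 0.0, 0.0)] * len(normals)  # constant guidance: plain Gaussian blur
    for _ in range(iterations):
        guidance = joint_bilateral_normals(centroids, areas, normals, guidance, sigma_s, sigma_r)
    return guidance
```

Here `sigma_s` plays the role of the specified scale: features smaller than it are averaged away in the first pass, and later passes progressively restore larger structures.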
Delaunay meshes (DM) are a special type of triangle mesh where the local Delaunay condition holds everywhere. We present an efficient algorithm to convert an arbitrary manifold triangle mesh M into a Delaunay mesh. We show that the constructed DM has O(Kn) vertices, where n is the number of vertices in M and K is a model-dependent constant. We also develop a novel algorithm to simplify Delaunay meshes, allowing a smooth choice of detail levels. Our methods are conceptually simple, theoretically sound and easy to implement. The DM construction algorithm also scales well due to its O(nK log K) time complexity.
Delaunay meshes have many favorable geometric and numerical properties. For example, a DM has exactly the same geometry as the input mesh, and it can be encoded by any mesh data structure. Moreover, the empty geodesic circumcircle property implies that the commonly used cotangent Laplace-Beltrami operator has non-negative weights. Therefore, existing digital geometry processing algorithms can benefit from the numerical stability of DMs without any code changes. We observe that DMs can improve the accuracy of the heat method for computing geodesic distances. Also, popular parameterization techniques, such as discrete harmonic mapping, produce more stable results on the DMs than on the input meshes.
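The link between the local Delaunay condition and non-negative cotangent weights can be checked numerically. For an interior edge with opposite angles α and β, cot α + cot β = sin(α + β) / (sin α sin β), which is non-negative exactly when α + β ≤ π. The sketch below uses a planar pair of triangles for simplicity (the paper works with intrinsic, geodesic circumcircles on a mesh); the function names and the ½ scaling convention are choices made for this illustration.

```python
def cot_at(apex, p, q):
    """Cotangent of the angle at `apex` in the 2D triangle (apex, p, q)."""
    ux, uy = p[0] - apex[0], p[1] - apex[1]
    vx, vy = q[0] - apex[0], q[1] - apex[1]
    dot = ux * vx + uy * vy
    cross = abs(ux * vy - uy * vx)  # twice the triangle area
    return dot / cross


def cotan_weight(p, q, r, s):
    """Cotangent weight of edge (p, q) whose two opposite vertices are r and s."""
    return 0.5 * (cot_at(r, p, q) + cot_at(s, p, q))


# A thin "kite": edge (p, q) sees two obtuse opposite angles, so it is not Delaunay.
p, q = (0.0, 0.0), (1.0, 0.0)
r, s = (0.5, 0.1), (0.5, -0.1)
w_before = cotan_weight(p, q, r, s)  # negative weight
w_after = cotan_weight(r, s, p, q)   # weight of the flipped edge (r, s): positive
```

Flipping the offending edge restores a positive weight in this planar example, which is the behavior the empty-circumcircle property guarantees for every edge of a Delaunay mesh.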
Acquiring 3D geometry of an object is a tedious and time-consuming task, typically requiring scanning the surface from multiple viewpoints. In this work we focus on reconstructing complete geometry from a single scan acquired with a low-quality consumer-level scanning device. Our method uses a collection of example 3D shapes to build structural part-based priors that are necessary to complete the shape. In our representation, we associate a local coordinate system to each part and learn the distribution of positions and orientations of all the other parts from the database, which implicitly also defines positions of symmetry planes and symmetry axes. At the inference stage, this knowledge enables us to analyze incomplete point clouds with substantial occlusions, because observing only a few regions is still sufficient to infer the global structure. Once the parts and the symmetries are estimated, both data sources, symmetry and database, are fused to complete the point cloud. We evaluate our technique on a synthetic dataset containing 481 shapes, and on real scans acquired with a Kinect scanner. Our method demonstrates high accuracy for the estimated part structure and detected symmetries, enabling higher quality shape completions in comparison to alternative techniques.
In this paper, we present a consolidation method that is based on a new representation of 3D point sets. The key idea is to augment each surface point into a deep point by associating it with an inner point that resides on the meso-skeleton, which consists of a mixture of skeletal curves and sheets. The deep points representation is a result of a joint optimization applied to both ends of the deep points. The optimization objective is to fairly distribute the end points across the surface and the meso-skeleton, such that the deep point orientations agree with the surface normals. The optimization converges where the inner points form a coherent meso-skeleton, and the surface points are consolidated with the missing regions completed. The strength of this new representation stems from the fact that it is comprised of both local and non-local geometric information. We demonstrate the advantages of the deep points consolidation technique by employing it to consolidate and complete noisy point-sampled geometry with large missing parts.
Detailed scanning of indoor scenes is tedious for humans. We propose autonomous scene scanning by a robot to relieve humans from such a laborious task. In an autonomous setting, detailed scene acquisition is inevitably coupled with scene analysis at the required level of detail. We develop a framework for object-level scene reconstruction coupled with object-centric scene analysis. As a result, the autoscanning and reconstruction will be object-aware, guided by the object analysis. The analysis is, in turn, gradually improved with progressively increased object-wise data fidelity. In realizing such a framework, we drive the robot to execute an iterative analyze-and-validate algorithm which interleaves between object analysis and guided validations.
The object analysis incorporates online learning into a robust graph-cut based segmentation framework, achieving a global update of object-level segmentation based on the knowledge gained from robot-operated local validation. Based on the current analysis, the robot performs proactive validation over the scene with physical push and scan refinement, aiming at reducing the uncertainty of both object-level segmentation and object-wise reconstruction. We propose a joint entropy to measure such uncertainty based on segmentation confidence and reconstruction quality, and formulate the selection of validation actions as a maximum information gain problem. The output of our system is a reconstructed scene with both object extraction and object-wise geometry fidelity.
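The action-selection idea, choosing the validation action that maximizes information gain, can be sketched generically. The paper defines a joint entropy over segmentation confidence and reconstruction quality; the sketch below substitutes plain Shannon entropy over a discrete belief, and the action names and outcome probabilities are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_information_gain(prior, outcomes):
    """Prior entropy minus the expected posterior entropy.

    outcomes -- list of (probability of outcome, posterior belief after it)
    """
    expected_posterior = sum(w * entropy(post) for w, post in outcomes)
    return entropy(prior) - expected_posterior


def select_action(prior, actions):
    """Pick the action whose outcomes are expected to reduce uncertainty most."""
    return max(actions, key=lambda name: expected_information_gain(prior, actions[name]))


# Hypothetical belief: are two adjacent segments one object or two?
prior = [0.5, 0.5]
actions = {
    # A push fully disambiguates: either the segments move together or they do not.
    "push_A": [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])],
    # A rescan from the same side leaves the belief unchanged.
    "scan_B": [(1.0, [0.5, 0.5])],
}
best = select_action(prior, actions)
```

In this toy setup the push is preferred because it is expected to remove one full bit of uncertainty, while the rescan removes none.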
Various Structured Light (SL) methods are used to capture 3D range images, where a number of binary or continuous light patterns are sequentially projected onto a scene of interest, while a digital camera captures images of the illuminated scene. All existing SL methods require the projector and camera to be hardware or software synchronized, with one image captured per projected pattern. A 3D range image is computed from the captured images. The two synchronization methods have disadvantages, which limit the use of SL methods to niche industrial and low quality consumer applications. Unsynchronized Structured Light (USL) is a novel SL method which does not require synchronization of pattern projection and image capture. The light patterns are projected and the images are captured independently, at constant, but possibly different, frame rates. USL synthesizes new binary images as would be decoded from the images captured by a camera synchronized to the projector, reducing the subsequent computation to standard SL. USL works both with global and rolling shutter cameras. USL enables most burst-mode-capable cameras, such as modern smartphones, tablets, DSLRs, and point-and-shoots, to function as high quality 3D snapshot cameras. Beyond the software, which can run in the devices, a separate SL Flash, able to project the sequence of patterns cyclically, during the acquisition time, is needed to enable the functionality.
We present a novel method to generate 3D scenes that allow the same activities as real environments captured through noisy and incomplete 3D scans. As robust object detection and instance retrieval from low-quality depth data is challenging, our algorithm aims to model semantically-correct rather than geometrically-accurate object arrangements. Our core contribution is a new scene synthesis technique which, conditioned on a coarse geometric scene representation, models functionally similar scenes using prior knowledge learned from a scene database. The key insight underlying our scene synthesis approach is that many real-world environments are structured to facilitate specific human activities, such as sleeping or eating. We represent scene functionalities through virtual agents that associate object arrangements with the activities for which they are typically used. When modeling a scene, we first identify the activities supported by a scanned environment. We then determine semantically-plausible arrangements of virtual objects, retrieved from a shape database and constrained by the observed scene geometry. For a given 3D scan, our algorithm produces a variety of synthesized scenes which support the activities of the captured real environments. In a perceptual evaluation study, we demonstrate that our results are judged to be visually appealing and functionally comparable to manually designed scenes.
Biped controller design pursues two fundamental goals: simulated walking should look human-like, and it should be robust against perturbation while maintaining balance. Normal gait is a pattern of walking that humans normally adopt in undisturbed situations. It has previously been postulated that normal gait is more energy efficient than abnormal or impaired gaits. However, it is not clear whether normal gait is also superior to abnormal gait patterns with respect to other factors, such as stability. Understanding the correlation between gait and stability is an important aspect of biped controller design. We studied this issue in two sets of experiments with human participants and a simulated biped. The experiments evaluated the degree of resilience to external pushes for various gait patterns. We identified four gait factors that affect the balance-recovery capabilities of both human and simulated walking. We found that crouch gait is significantly more stable than normal gait against lateral push. Walking speed and the timing/magnitude of disturbance also affect gait stability. Our work provides a potential way to compare the performance of biped controllers by normalizing their output gaits, and to improve their performance by adjusting these decisive factors.
Motion-tracked real-time character control is important for games and VR, but current solutions are limited: retargeting is hard for non-human characters, with locomotion bound to the sensing volume; and pose mappings are ambiguous with difficult dynamic motion control. We robustly estimate wave properties (amplitude, frequency, and phase) for a set of interactively-defined gestures by mapping user motions to a low-dimensional independent representation. The mapping separates simultaneous or intersecting gestures, and extrapolates gesture variations from single training examples. For animations such as locomotion, wave properties map naturally to stride length, step frequency, and progression, and allow smooth transitions from standing, to walking, to running. Interpolating out-of-phase locomotions is hard, e.g., quadruped legs between walks and runs switch phase, so we introduce a new time-interpolation scheme to reduce artifacts. These improvements to real-time motion-tracked character control are important for common cyclic animations. We validate this in a user study, and show versatility to apply to part- and full-body motions across a variety of sensors.
We present a real-time facial tracking and animation system based on a Kinect sensor with video and audio input. Our method requires no user-specific training and is robust to occlusions, large head rotations, and background noise. Given the color, depth and speech audio frames captured from an actor, our system first reconstructs 3D facial expressions and 3D mouth shapes from color and depth input with a multi-linear model. Concurrently a speaker-independent DNN acoustic model is applied to extract phoneme state posterior probabilities (PSPP) from the audio frames. After that, a lip motion regressor refines the 3D mouth shape based on both PSPP and expression weights of the 3D mouth shapes, as well as their confidences. Finally, the refined 3D mouth shape is combined with other parts of the 3D face to generate the final result. The whole process is fully automatic and executed in real time.
The key component of our system is a data-driven regressor for modeling the correlation between speech data and mouth shapes. Based on a precaptured database of accurate 3D mouth shapes and associated speech audio from one speaker, the regressor jointly uses the input speech and visual features to refine the mouth shape of a new actor. We also present an improved DNN acoustic model. It not only preserves accuracy but also achieves real-time performance.
Our method efficiently fuses visual and acoustic information for 3D facial performance capture. It generates more accurate 3D mouth motions than other approaches that are based on audio or video input only. It also supports video or audio only input for real-time facial animation. We evaluate the performance of our system with speech and facial expressions captured from different actors. Results demonstrate the efficiency and robustness of our method.
We present a method for the real-time transfer of facial expressions from an actor in a source video to an actor in a target video, thus enabling the ad-hoc control of the facial expressions of the target actor. The novelty of our approach lies in the transfer and photorealistic re-rendering of facial deformations and detail into the target video in a way that the newly-synthesized expressions are virtually indistinguishable from a real video. To achieve this, we accurately capture the facial performances of the source and target subjects in real-time using a commodity RGB-D sensor. For each frame, we jointly fit a parametric model for identity, expression, and skin reflectance to the input color and depth data, and also reconstruct the scene lighting. For expression transfer, we compute the difference between the source and target expressions in parameter space, and modify the target parameters to match the source expressions. A major challenge is the convincing re-rendering of the synthesized target face into the corresponding video stream. This requires a careful consideration of the lighting and shading design, which both must correspond to the real-world environment. We demonstrate our method in a live setup, where we modify a video conference feed such that the facial expressions of a different person (e.g., translator) are matched in real-time.
Virtual characters contribute strongly to the entire visuals of 3D animated films. However, designing believable characters remains a challenging task. Artists rely on stylization to increase appeal or expressivity, exaggerating or softening specific features. In this paper we analyze two of the most influential factors that define how a character looks: shape and material. With the help of artists, we design a set of carefully crafted stimuli consisting of different stylization levels for both parameters, and analyze how different combinations affect the perceived realism, appeal, eeriness, and familiarity of the characters. We additionally investigate how this affects the perceived intensity of different facial expressions (sadness, anger, happiness, and surprise). Our experiments reveal that shape is the dominant factor when rating realism and expression intensity, while material is the key component for appeal. Furthermore, our results show that realism alone is a poor predictor for appeal, eeriness, or attractiveness.
Rendering photo-realistic animal fur is a long-standing problem in computer graphics. Considerable effort has been made on modeling the geometric complexity of fur, but the reflectance of fur fibers is not well understood. Fur has a distinct diffusive and saturated appearance that is not captured by either the Marschner hair model or the Kajiya-Kay model. In this paper, we develop a physically-accurate reflectance model for fur fibers. Based on anatomical literature and measurements, we develop a double cylinder model for the reflectance of a single fur fiber, where an outer cylinder represents the biological observation of a cortex covered by multiple cuticle layers, and an inner cylinder represents the scattering interior structure known as the medulla. Our key contribution is to model medulla scattering accurately; in contrast, for human hair, the medulla has minimal width and thus negligible contributions to the reflectance. Medulla scattering introduces additional reflection and transmission paths, as well as diffusive reflectance lobes. We validate our physical model with measurements on real fur fibers, and introduce the first database in computer graphics of reflectance profiles for nine fur samples. We show that our model achieves significantly better fits to the measured data than the Marschner hair reflectance model. For efficient rendering, we develop a method to precompute 2D medulla scattering profiles and analytically approximate our reflectance model with factored lobes. The accuracy of the approach is validated by comparing our rendering model to full 3D light transport simulations. Our model provides an enriched set of controls, where the parameters we fit can be directly used to render realistic fur, or serve as a starting point from which artists can manually tune parameters for desired appearances.
The bidirectional reflectance distribution function (BRDF) is critical for rendering, and accurate material representation requires data-driven reflectance models. However, isotropic BRDFs are 3D functions, and measuring the reflectance of a flat sample can require a million incident and outgoing direction pairs, making the use of measured BRDFs impractical. In this paper, we address the problem of reconstructing a measured BRDF from a limited number of samples. We present a novel mapping of the BRDF space, allowing for extraction of descriptive principal components from measured databases, such as the MERL BRDF database. We optimize for the best sampling directions, and explicitly provide the optimal set of incident and outgoing directions in the Rusinkiewicz parameterization for n = {1, 2, 5, 10, 20} samples. Based on the principal components, we describe a method for accurately reconstructing BRDF data from these limited sets of samples. We validate our results on the MERL BRDF database, including favorable comparisons to previous sets of industry-standard sampling directions, as well as with BRDF measurements of new flat material samples acquired with a gantry system. As an extension, we also demonstrate how this method can be used to find optimal sampling directions when imaging a sphere of a homogeneous material; in this case, only two images are often adequate for high accuracy.
Level-of-detail (LOD) rendering is a key optimization used by modern video game engines to achieve high-quality rendering with fast performance. These LOD systems require simplified shaders, but generating simplified shaders remains largely a manual optimization task for game developers. Prior efforts to automate this process have taken hours to generate simplified shader candidates, making them impractical for use in modern shader authoring workflows for complex scenes. We present an end-to-end system for automatically generating a LOD policy for an input shader. The system operates on shaders used in both forward and deferred rendering pipelines, requires no additional semantic information beyond input shader source code, and in only seconds to minutes generates LOD policies (consisting of simplified shader, the desired LOD distance set, and transition generation) with performance and quality characteristics comparable to custom hand-authored solutions. Our design contributes new shader simplification transforms such as approximate common subexpression elimination and movement of GPU logic to parameter bind-time processing on the CPU, and it uses a greedy search algorithm that employs extensive caching and upfront collection of input shader statistics to rapidly identify simplified shaders with desirable performance-quality trade-offs.
Hierarchical depth culling is an important optimization, which is present in all modern high performance graphics processors. We present a novel culling algorithm based on a layered depth representation, with a per-sample mask indicating which layer each sample belongs to. Our algorithm is feed forward in nature, in contrast to previous work, which relies on a delayed feedback loop. It is simple to implement and has fewer constraints than competing algorithms, which makes it easier to load-balance a hardware architecture. Compared to previous work our algorithm performs very well, and it will often reach over 90% of the efficiency of an optimal culling oracle. Furthermore, we can reduce bandwidth by up to 16% by compressing the hierarchical depth buffer.
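For readers unfamiliar with the baseline, the conservative test at the heart of hierarchical depth culling can be sketched in a few lines. This is the conventional single-layer variant with one maximum depth per tile and a "less than" depth test, not the layered, mask-based algorithm the abstract proposes; the class and method names are hypothetical.

```python
class HiZTile:
    """Conservative max-depth bound for one screen tile (smaller z is closer)."""

    def __init__(self, far=1.0):
        self.z_max = far  # farthest depth any sample in the tile can currently hold

    def is_occluded(self, tri_z_min):
        """True if even the nearest point of the triangle lies behind every sample."""
        return tri_z_min > self.z_max

    def update_full_coverage(self, tri_z_max):
        """Tighten the bound. Only valid when the triangle covers every sample
        in the tile, since all samples are then at depth tri_z_max or closer."""
        self.z_max = min(self.z_max, tri_z_max)


tile = HiZTile()
tile.update_full_coverage(0.4)     # an occluder fills the tile at depth <= 0.4
culled = tile.is_occluded(0.5)     # a triangle entirely behind it can be skipped
visible = tile.is_occluded(0.3)    # a closer triangle must still be rasterized
```

The restriction to fully covering triangles is what makes the simple scheme conservative but weak; partially covered tiles are where a layered representation with per-sample masks can retain more culling power.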
We present a novel approach to remesh a surface into an isotropic triangular or quad-dominant mesh using a unified local smoothing operator that optimizes both the edge orientations and vertex positions in the output mesh. Our algorithm produces meshes with high isotropy while naturally aligning and snapping edges to sharp features. The method is simple to implement and parallelize, and it can process a variety of input surface representations, such as point clouds, range scans and triangle meshes. Our full pipeline executes instantly (less than a second) on meshes with hundreds of thousands of faces, enabling new types of interactive workflows. Since our algorithm avoids any global optimization, and its key steps scale linearly with input size, we are able to process extremely large meshes and point clouds, with sizes exceeding several hundred million elements. To demonstrate the robustness and effectiveness of our method, we apply it to hundreds of models of varying complexity and provide our cross-platform reference implementation in the supplemental material.
Injective parameterizations of surface meshes are vital for many applications in Computer Graphics, Geometry Processing and related fields. Tutte's embedding, and its generalization to convex combination maps, are among the most popular approaches for computing parameterizations of surface meshes into the plane, as they guarantee injectivity, and their computation only requires solving a sparse linear system. However, they are only applicable to disk-type and toric surface meshes.
In this paper we suggest a generalization of Tutte's embedding to other surface topologies, and in particular the common, yet untreated case, of sphere-type surfaces. The basic idea is to enforce certain boundary conditions on the parameterization so as to achieve a Euclidean orbifold structure. The orbifold-Tutte embedding is a seamless, globally bijective parameterization that, similarly to the classic Tutte embedding, only requires solving a sparse linear system for its computation.
In case the cotangent weights are used, the orbifold-Tutte embedding globally minimizes the Dirichlet energy and is shown to approximate conformal and four-point quasiconformal mappings. As far as we are aware, this is the first fully-linear method that produces bijective approximations to conformal mappings.
Aside from parameterizations, the orbifold-Tutte embedding can be used to generate bijective inter-surface mappings with three or four landmarks and symmetric patterns on sphere-type surfaces.
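As background for the abstract above, the following is a minimal sketch of the classic disk-type Tutte embedding with uniform weights, not the orbifold construction the paper proposes. The function names, the six-vertex example mesh, and the Gauss-Seidel sweeps (standing in for a sparse linear solve) are all assumptions made for illustration.

```python
def tutte_embedding(triangles, boundary_positions, num_vertices, sweeps=200):
    """Classic Tutte embedding with uniform weights (illustrative sketch).

    Boundary vertices are pinned to a convex polygon; every interior vertex
    is placed at the average of its neighbours. The linear system is solved
    here by Gauss-Seidel sweeps instead of a sparse direct solver.
    """
    neighbors = [set() for _ in range(num_vertices)]
    for a, b, c in triangles:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))

    pos = [boundary_positions.get(v, (0.0, 0.0)) for v in range(num_vertices)]
    interior = [v for v in range(num_vertices) if v not in boundary_positions]

    for _ in range(sweeps):
        for v in interior:
            k = len(neighbors[v])
            pos[v] = (sum(pos[u][0] for u in neighbors[v]) / k,
                      sum(pos[u][1] for u in neighbors[v]) / k)
    return pos


def signed_area(p, q, r):
    """Signed area of triangle (p, q, r); positive means counter-clockwise."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))


# Hypothetical disk-type mesh: four boundary corners, two interior vertices.
TRIANGLES = [(0, 1, 4), (1, 5, 4), (1, 2, 5), (2, 3, 5), (3, 4, 5), (3, 0, 4)]
BOUNDARY = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}

positions = tutte_embedding(TRIANGLES, BOUNDARY, num_vertices=6)
# Injectivity check: no triangle is flipped in the plane.
no_flips = all(signed_area(positions[a], positions[b], positions[c]) > 0
               for a, b, c in TRIANGLES)
```

With a convex boundary and positive weights, every triangle keeps a positive signed area, which is the injectivity guarantee the abstract refers to for disk-type meshes.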
We propose an efficient algorithm for computing large-scale bounded distortion maps of triangular and tetrahedral meshes. Specifically, given an initial map, we compute a similar map whose differentials are orientation preserving and have bounded condition number.
Inspired by alternating optimization and Gauss-Newton approaches, we devise a first order method which combines the advantages of both. On the one hand, its iterations are as computationally efficient as those of alternating optimization. On the other hand, it enjoys preferable convergence properties, associated with Gauss-Newton like approaches.
We demonstrate the utility of the proposed approach in efficiently solving geometry processing problems, focusing on challenging large-scale problems.
Global surface parametrization often requires the use of cuts or charts due to non-trivial topology. In recent years a focus has been on so-called seamless parametrizations, where the transition functions across the cuts are rigid transformations with a rotation about some multiple of 90°. Of particular interest, e.g. for quadrilateral meshing, paneling, or texturing, are those instances where in addition the translational part of these transitions is integral (or more generally: quantized). We show that finding not even the optimal, but just an arbitrary valid quantization (one that does not imply parametric degeneracies), is a complex combinatorial problem. We present a novel method that allows us to solve it, i.e. to find valid as well as good quality quantizations. It is based on an original approach to quickly construct solutions to linear Diophantine equation systems, exploiting the specific geometric nature of the parametrization problem. We thereby largely outperform the state-of-the-art, sometimes by several orders of magnitude.
Spherical Fibonacci point sets yield nearly uniform point distributions on the unit sphere S² ⊂ ℝ³. The forward generation of these point sets has been widely researched and is easy to implement, such that they have been used in various applications.
Unfortunately, the lack of an efficient mapping from points on the unit sphere to their closest spherical Fibonacci point set neighbors rendered them impractical for a wide range of applications, especially in computer graphics. Therefore, we introduce an inverse mapping from points on the unit sphere which yields the nearest neighbor in an arbitrarily sized spherical Fibonacci point set in constant time, without requiring any precomputations or table lookups.
We show how to implement this inverse mapping on GPUs while addressing arising floating point precision problems. Further, we demonstrate the use of this mapping and its variants, and show how to apply it to fast unit vector quantization. Finally, we illustrate the means by which to modify this inverse mapping for texture mapping with smooth filter kernels and showcase its use in the field of procedural modeling.
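To make the forward generation mentioned above concrete, here is a minimal sketch under one common convention for spherical Fibonacci points (azimuth advanced by the golden ratio, heights equally spaced in z). The brute-force nearest-neighbor search is only an O(n) reference baseline; it is not the constant-time inverse mapping the paper introduces, and the function names are assumptions.

```python
import math

GOLDEN_RATIO = (1.0 + math.sqrt(5.0)) / 2.0


def spherical_fibonacci(i, n):
    """Return point i (0 <= i < n) of an n-point spherical Fibonacci set."""
    phi = 2.0 * math.pi * ((i / GOLDEN_RATIO) % 1.0)  # azimuth angle
    z = 1.0 - (2.0 * i + 1.0) / n                     # equally spaced heights
    r = math.sqrt(max(0.0, 1.0 - z * z))              # radius of the z-slice
    return (r * math.cos(phi), r * math.sin(phi), z)


def nearest_index_brute_force(p, n):
    """O(n) reference search: the index whose point has the largest dot
    product with the unit vector p (equivalently, the smallest angle)."""
    def dot_with_p(i):
        q = spherical_fibonacci(i, n)
        return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]
    return max(range(n), key=dot_with_p)


points = [spherical_fibonacci(i, 64) for i in range(64)]
```

A unit vector can then be quantized to a single integer index, which is the kind of use the abstract describes; replacing the linear scan with a constant-time lookup is the paper's contribution.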
When looking at videos of very similar actions with the naked eye, it is often difficult to notice subtle motion differences between them. In this paper we introduce video diffing, an algorithm that highlights the important differences between a pair of video recordings of similar actions. We overlay the edges of one video onto the frames of the second, and color the edges based on a measure of local dissimilarity between the videos. We measure dissimilarity by extracting spatiotemporal gradients from both videos and calculating how dissimilar histograms of these gradients are at varying spatial scales. We performed a user study with 54 people to compare the ease with which users could use our method to find differences. Users gave our method an average grade of 4.04 out of 5 for ease of use, compared to 3.48 and 2.08 for two baseline approaches. Anecdotal results also show that our overlays are useful in the specific use cases of professional golf instruction and analysis of animal locomotion simulations.
We introduce JumpCut, a new mask transfer and interpolation method for interactive video cutout. Given a source frame for which a foreground mask is already available, we compute an estimate of the foreground mask at another, typically non-successive, target frame. Observing that the background and foreground regions typically exhibit different motions, we leverage these differences by computing two separate nearest-neighbor fields (split-NNF) from the target to the source frame. These NNFs are then used to jointly predict a coherent labeling of the pixels in the target frame. The same split-NNF is also used to aid a novel edge classifier in detecting silhouette edges (S-edges) that separate the foreground from the background. A modified level set method is then applied to produce a clean mask, based on the pixel labels and the S-edges computed by the previous two steps. The resulting mask transfer method may also be used for coherently interpolating the foreground masks between two distant source frames. Our results demonstrate that the proposed method is significantly more accurate than the existing state-of-the-art on a wide variety of video sequences. Thus, it reduces the required amount of user effort, and provides a basis for an effective interactive video object cutout tool.
Extending image processing techniques to videos is a non-trivial task; applying processing independently to each video frame often leads to temporal inconsistencies, and explicitly encoding temporal consistency requires algorithmic changes. We describe a more general approach to temporal consistency. We propose a gradient-domain technique that is blind to the particular image processing algorithm. Our technique takes a series of processed frames that suffers from flickering and generates a temporally-consistent video sequence. The core of our solution is to infer the temporal regularity from the original unprocessed video, and use it as a temporal consistency guide to stabilize the processed sequence. We formally characterize the frequency properties of our technique, and demonstrate, in practice, its ability to stabilize a wide range of popular image processing techniques including enhancement and stylization of color and tone, intrinsic images, and depth estimation.
Short looping videos concisely capture the dynamism of natural scenes. Creating seamless loops usually involves maximizing spatiotemporal consistency and applying Poisson blending. We take an end-to-end view of the problem and present new techniques that jointly improve loop quality while also significantly reducing processing time. A key idea is to relax the consistency constraints to anticipate the subsequent blending, thereby enabling looping of low-frequency content like moving clouds and changing illumination. We also analyze the input video to remove an undesired bias toward short loops. The quality gains are demonstrated visually and confirmed quantitatively using a new gradient-domain consistency metric. We improve system performance by classifying potentially loopable pixels, masking the 2D graph cut, pruning graph-cut labels based on dominant periods, and optimizing on a coarse grid while retaining finer detail. Together these techniques reduce computation times from tens of minutes to nearly real-time.
Real-time high quality video tone mapping is needed for many applications, such as digital viewfinders in cameras, display algorithms which adapt to ambient light, in-camera processing, rendering engines for video games and video post-processing. We propose a viable solution for these applications by designing a video tone-mapping operator that controls the visibility of the noise, adapts to display and viewing environment, minimizes contrast distortions, preserves or enhances image details, and can be run in real-time on an incoming sequence without any preprocessing. To our knowledge, no existing solution offers all these features. Our novel contributions are: a fast procedure for computing local display-adaptive tone-curves which minimize contrast distortions, a fast method for detail enhancement free from ringing artifacts, and an integrated video tone-mapping solution combining all the above features.
Traditional fluid simulations require large computational resources even for an average-sized scene, with the main bottleneck being the very small time step size required to guarantee the stability of the solution. Despite considerable progress in parallel computing and efficient algorithms for pressure computation in recent years, real-time fluid simulations have been possible only under very restricted conditions. In this paper we propose a novel machine learning based approach that formulates physics-based fluid simulation as a regression problem, estimating the acceleration of every particle for each frame. We designed a feature vector, directly modelling individual forces and constraints from the Navier-Stokes equations, giving the method strong generalization properties to reliably predict positions and velocities of particles in a large time step setting on yet unseen test videos. We used a regression forest to approximate the behaviour of particles observed in the large training set of simulations obtained using a traditional solver. Our GPU implementation led to a speed-up of one to three orders of magnitude compared to the state-of-the-art position-based fluid solver and runs in real-time for systems with up to 2 million particles.
We present a real-time painting system that simulates the interactions among brush, paint, and canvas at the bristle level. The key challenge is how to model and simulate sub-pixel paint details, given the limited computational resource in each time step. To achieve this goal, we propose to define paint liquid in a hybrid fashion: the liquid close to the brush is modeled by particles, and the liquid away from the brush is modeled by a density field. Based on this representation, we develop a variety of techniques to ensure the performance and robustness of our simulator under large time steps, including brush and particle simulations in non-inertial frames, a fixed-point method for accelerating Jacobi iterations, and a new Eulerian-Lagrangian approach for simulating detailed liquid effects. The resulting system can realistically simulate not only the motions of brush bristles and paint liquid, but also the liquid transfer processes among different representations. We implement the whole system on GPU by CUDA. Our experiment shows that artists can use the system to draw realistic and vivid digital paintings, by applying the painting techniques that they are familiar with but not offered by many existing systems.
Multiple-fluid interaction is an interesting and common visual phenomenon. In this paper, we present an energy-based Lagrangian method that expands the capability of existing multiple-fluid methods to handle various phenomena, such as extraction and partial dissolution. Based on our user-adjusted Helmholtz free energy functions, the simulated fluid evolves from high-energy states to low-energy states, allowing flexible capture of various mixing and unmixing processes. We also extend the original Cahn-Hilliard equation to better simulate complex fluid-fluid interaction and rich visual phenomena such as motion-related mixing and position-based patterns. Our approach is easily integrated with existing state-of-the-art smoothed particle hydrodynamics (SPH) solvers and can be further implemented on top of the position based dynamics (PBD) method, improving the stability and incompressibility of the fluid during Lagrangian simulation under large time steps. Performance analysis shows that our method is at least 4 times faster than the state-of-the-art multiple-fluid method. Examples are provided to demonstrate the new capability and effectiveness of our approach.
We present a method to increase the apparent resolution of particle-based liquid simulations. Our method first outputs a dense, temporally coherent, regularized point set from a coarse particle-based liquid simulation. We then apply a surface-only Lagrangian wave simulation to this high-resolution point set. We develop novel methods for seeding and simulating waves over surface points, and use them to generate high-resolution details. We avoid error-prone surface mesh processing, and robustly propagate waves without the need for explicit connectivity information. Our seeding strategy combines a robust curvature evaluation with multiple bands of seeding oscillators, injects waves with arbitrarily fine-scale structures, and properly handles obstacle boundaries. We generate detailed fluid surfaces from coarse simulations as an independent post-process that can be applied to most particle-based fluid solvers.
Previous garment modeling techniques mainly focus on designing novel garments to dress up virtual characters. We study the modeling of real garments and develop a system that is intuitive to use even for novice users. Our system includes garment component detectors and design attribute classifiers learned from a manually labeled garment image database. At modeling time, we scan the garment with a Kinect and build a rough shape by KinectFusion from the raw RGBD sequence. The detectors and classifiers identify garment components (e.g. collar, sleeve, pockets, belt, and buttons) and their design attributes (e.g. falbala collar or lapel collar, hubble-bubble sleeve or straight sleeve) from the RGB images. Our system also contains a 3D deformable template database for garment components. Once the components and their designs are determined, we choose appropriate templates, stitch them together, and fit them to the initial garment mesh generated by KinectFusion. Experiments on various garment styles consistently generate high quality results.
We propose a novel system to reconstruct a high-quality hair depth map from a single portrait photo with minimal user input. We achieve this by combining depth cues such as occlusions, silhouettes, and shading, with a novel 3D helical structural prior for hair reconstruction. We fit a parametric morphable face model to the input photo and construct a base shape in the face, hair and body regions using occlusion and silhouette constraints. We then estimate the normals in the hair region via a Shape-from-Shading-based optimization that uses the lighting inferred from the face model and enforces an adaptive albedo prior that models the typical color and occlusion variations of hair. We introduce a 3D helical hair prior that captures the geometric structure of hair, and show that it can be robustly recovered from the input photo in an automatic manner. Our system combines the base shape, the normals estimated by Shape from Shading, and the 3D helical hair prior to reconstruct high-quality 3D hair models. Our single-image reconstruction closely matches the results of a state-of-the-art multi-view stereo applied on a multi-view stereo dataset. Our technique can reconstruct a wide variety of hairstyles ranging from short to long and from straight to messy, and we demonstrate the use of our 3D hair models for high-quality portrait relighting, novel view synthesis and 3D-printed portrait reliefs.
Current modeling packages have polished interfaces for editing polygonal meshes, where artists work individually on each mesh. A variety of recent cloud-based services have shown the benefits of editing documents in real-time collaboration with others. In this paper, we present a system for collaborative editing of low-polygonal and subdivision mesh models. We cast collaborative editing as a special instance of distributed version control. We support concurrent editing by robustly sharing and merging mesh version histories in real-time. We store and transmit mesh differences efficiently by encoding them as sequences of primitive editing operations. We enable collaboration by merging and detecting conflicts. We extend this model by letting artists adapt others' editing histories, retargeting sequences of editing operations to new parts of the mesh with potentially different topology. We tested our algorithms by editing meshes with up to a thousand edits, in collaborative editing sessions lasting a few hours, and by retargeting sequences of several hundred edits. We found the proposed system to be reliable and fast, and to scale well with mesh complexity. We demonstrate that our merge algorithm is more robust than prior work. We further validated the proposed collaborative workflow with a user study where MeshHisto was consistently preferred over other alternatives for collaborative workflows.
A shape grammar defines a procedural shape space containing a variety of models of the same class, e.g. buildings, trees, furniture, airplanes, bikes, etc. We present a framework that enables a user to interactively design a probability density function (pdf) over such a shape space and to sample models according to the designed pdf. First, we propose a user interface that enables a user to quickly provide preference scores for selected shapes and suggest sampling strategies to decide which models to present to the user to evaluate. Second, we propose a novel kernel function to encode the similarity between two procedural models. Third, we propose a framework to interpolate user preference scores by combining multiple techniques: function factorization, Gaussian process regression, autorelevance detection, and l1 regularization. Fourth, we modify the original grammars to generate models with a pdf proportional to the user preference scores. Finally, we provide evaluations of our user interface and framework parameters and a comparison to other exploratory modeling techniques using modeling tasks in five example shape spaces: furniture, low-rise buildings, skyscrapers, airplanes, and vegetation.
We introduce AniMesh, a system that supports interleaved modeling and animation creation and editing. AniMesh is suitable for rapid prototyping and easily accessible to non-experts. Source animations can be obtained from commodity motion capture devices or by adapting canned motion sequences. We propose skeleton abstraction and motion retargeting algorithms for finding correspondences and transferring motion between skeletons, or portions of skeletons, with varied topology. Motion can be copied-and-pasted between kinematic chains with different skeletal topologies, and entire model parts can be cut and reattached, while always retaining plausible, composite animations.
Photon mapping (PM) has been widely regarded as an efficient solution for light transport simulation, including challenging caustics paths and many-bounce indirect lighting. The efficiency of PM comes from reusing traced photons. However, the handling of photon gathering in existing PM algorithms is universally biased -- the expected value of their results does not necessarily agree with the true solution of the rendering equation. We present a novel photon gathering method to efficiently achieve unbiased rendering with photon mapping. Instead of aggregating the gathered photons into an estimated density as in classical photon mapping, we process each photon individually and connect the corresponding light sub-path with the eye sub-path that generates the gather point, creating an unbiased path sample. The Monte Carlo estimate for such a path sample is calculated by evaluating all relevant terms in a strict and unbiased way, leading to a self-contained unbiased sampling technique. We further develop a set of multiple importance sampling (MIS) weights that allow our method to be optimally combined with bidirectional path tracing (BDPT), resulting in an unbiased rendering algorithm that can efficiently handle a wide variety of light paths and that compares favorably with previous algorithms. Experiments demonstrate the efficacy and robustness of our method.
The simulation of light transport in the presence of multi-bounce glossy effects and motion is challenging because the integrand is high dimensional and areas of high-contribution tend to be narrow and hard to sample. We present a Markov Chain Monte Carlo (MCMC) rendering algorithm that extends Metropolis Light Transport by automatically and explicitly adapting to the local shape of the integrand, thereby increasing the acceptance rate. Our algorithm characterizes the local behavior of throughput in path space using its gradient as well as its Hessian. In particular, the Hessian is able to capture the strong anisotropy of the integrand. We obtain the derivatives using automatic differentiation, which makes our solution general and easy to extend to additional sampling dimensions such as time.
However, the resulting second order Taylor expansion is not a proper distribution and cannot be used directly for importance sampling. Instead, we use ideas from Hamiltonian Monte-Carlo and simulate the Hamiltonian dynamics in a flipped version of the Taylor expansion where gravity pulls particles towards the high-contribution region. Whereas such methods usually require numerical integration, we show that our quadratic landscape leads to a closed-form anisotropic Gaussian distribution for the final particle positions, and it results in a standard Metropolis-Hastings algorithm. Our method excels at rendering glossy-to-glossy reflections on small and highly curved surfaces. Furthermore, unlike previous work that derives sampling anisotropy with pen and paper and only considers specific effects such as specular BSDFs, we characterize the local shape of throughput through automatic differentiation. This makes our approach very general. In particular, our method is the first MCMC rendering algorithm that is able to resolve the anisotropy in the time dimension and render difficult moving caustics.
Instead of computing on a large number of virtual point lights (VPLs), scalable many-lights rendering methods effectively simulate various illumination effects using only hundreds or thousands of representative VPLs. However, gathering illumination from these representative VPLs, especially computing the visibility, is still a tedious and time-consuming task. In this paper, we propose a new matrix sampling-and-recovery scheme to efficiently gather illumination by sampling only a small number of visibilities between representative VPLs and surface points. Our approach is based on the observation that the lighting matrix used in many-lights rendering is low-rank, so that it is possible to sparsely sample a small number of entries, and then numerically complete the entire matrix. We propose a three-step algorithm to exploit this observation. First, we design a new VPL clustering algorithm to slice the rows and group the columns of the full lighting matrix into a number of reduced matrices, which are sampled and recovered individually. Second, we propose a novel prediction method that predicts visibility of matrix entries from sparsely and randomly sampled entries. Finally, we adapt the matrix separation technique to recover the entire reduced matrix and compute final shadings. Experimental results show that our method heavily reduces the required visibility sampling in the final gathering and achieves 3--7 times speedup compared with the state-of-the-art methods on test scenes.
We propose a novel algorithm for blue noise sampling inspired by the Smoothed Particle Hydrodynamics (SPH) method. SPH is a well-known method in fluid simulation -- it computes particle distributions to minimize the internal pressure variance. We found that this results in sample points (i.e., particles) with a high quality blue-noise spectrum. Inspired by this, we tailor the SPH method for blue noise sampling. Our method achieves fast sampling in general dimensions for both surfaces and volumes. By varying a single parameter our method can generate a variety of blue noise samples with different distribution properties, ranging from Lloyd's relaxation to Capacity Constrained Voronoi Tessellations (CCVT). Our method is fast and supports adaptive sampling and multi-class sampling. We have also performed experimental studies of the SPH kernel and its influence on the distribution properties of samples. We demonstrate with examples that our method can generate a variety of controllable blue noise sample patterns, suitable for applications such as image stippling and re-meshing.
We describe a novel technique for the fast production of large point sets with different spectral properties. In contrast to tile-based methods we use so-called AA Patterns: ornamental point sets obtained from quantization errors. These patterns have a discrete and structured number-theoretic nature, can be produced at very low costs, and possess an inherent structural indexing mechanism equivalent to those used in recursive tiling techniques. This allows us to generate, manipulate and store point sets very efficiently. The technique outperforms existing methods in speed, memory footprint, quality, and flexibility. This is demonstrated by a number of measurements and comparisons to existing point generation algorithms.
We pose the decompose-and-pack or DAP problem, which tightly combines shape decomposition and packing. While in general, DAP seeks to decompose an input shape into a small number of parts which can be efficiently packed, our focus is geared towards 3D printing. The goal is to optimally decompose-and-pack a 3D object into a printing volume to minimize support material, build time, and assembly cost. We present Dapper, a global optimization algorithm for the DAP problem which can be applied to both powder- and FDM-based 3D printing. The solution search is top-down and iterative. Starting with a coarse decomposition of the input shape into few initial parts, we progressively pack a pile in the printing volume, by iteratively docking parts, possibly while introducing cuts, onto the pile. Exploration of the search space is via a prioritized and bounded beam search, with breadth and depth pruning guided by local and global DAP objectives. A key feature of Dapper is that it works with pyramidal primitives, which are packing- and printing-friendly. Pyramidal shapes are also more general than boxes to reduce part counts, while still maintaining a suitable level of simplicity to facilitate DAP optimization. We demonstrate printing efficiency gains achieved by Dapper, compare to state-of-the-art alternatives, and show how fabrication criteria such as cut area and part size can be easily incorporated into our solution framework to produce more physically plausible fabrications.
As 3D printing technology starts to revolutionize our daily life and the manufacturing industries, a critical problem is about to emerge: how can we find an automatic way to divide a 3D model into multiple printable pieces, so as to save space, to reduce the printing time, or to make a large model printable by small printers? In this paper, we present a systematic study on the partitioning and packing of 3D models under the multi-phase level set framework. We first construct analysis tools to evaluate the qualities of a partitioning using six metrics: stress load, surface details, interface area, packed size, printability, and assembling. Based on this analysis, we then formulate level set methods to improve the qualities of the partitioning according to the metrics. These methods are integrated into an automatic system, which repetitively and locally optimizes the partitioning. Given the optimized partitioning result, we further provide a container structure modeling algorithm to facilitate the packing process of the printed pieces. Our experiment shows that the system can generate quality partitioning of various 3D models for space saving and fast production purposes.
This paper introduces a perceptual model for determining 3D printing orientations. Additive manufacturing methods involving low-cost 3D printers often require robust branching support structures to prevent material collapse at overhangs. Although the designed shape can successfully be made by adding supports, residual material remains at the contact points after the supports have been removed, resulting in unsightly surface artifacts. Moreover, fine surface details on the fabricated model can easily be damaged while removing supports. To prevent the visual impact of these artifacts, we present a method to find printing directions that avoid placing supports in perceptually significant regions. Our model for preference in 3D printing direction is formulated as a combination of metrics including area of support, visual saliency, preferred viewpoint and smoothness preservation. We develop a training-and-learning methodology to obtain a closed-form solution for our perceptual model and perform a large-scale study. We demonstrate the performance of this perceptual model on both natural and man-made objects.
We present an interactive design system that allows casual users to quickly create 3D-printable robotic creatures. Our approach automates the tedious parts of the design process while providing ample room for customization of morphology, proportions, gait and motion style. The technical core of our framework is an efficient optimization-based solution that generates stable motions for legged robots of arbitrary designs. An intuitive set of editing tools allows the user to interactively explore the space of feasible designs and to study the relationship between morphological features and the resulting motions. Fabrication blueprints are generated automatically such that the robot designs can be manufactured using 3D-printing and off-the-shelf servo motors. We demonstrate the effectiveness of our solution by designing six robotic creatures with a variety of morphological features: two, four or five legs, point or area feet, actuated spines and different proportions. We validate the feasibility of the designs generated with our system through physics simulations and physically-fabricated prototypes.
We propose IM6D, a novel real-time magnetic motion-tracking system using multiple identifiable, tiny, lightweight, wireless and occlusion-free markers. It provides reasonable accuracy and update rates and an appropriate working space for dexterous 3D interaction. Our system follows a novel electromagnetic induction principle to externally excite wireless LC coils and uses an externally located pickup coil array to track each of the LC coils with 5-DOF. We apply this principle to design a practical motion-tracking system using multiple markers with 6-DOF and to achieve reliable tracking with reasonable speed. We also solved the principle's inherent dead-angle problem. Based on this method, we simulated the configuration of parameters for designing a system with scalability for dexterous 3D motion. We implemented an actual system and applied a parallel computation structure to increase the tracking speed. We also built some examples to show how well our system works for actual situations.
We propose a novel three-dimensional motion sensing method using lasers. Recently, object motion information is being used in various applications, and the types of targets that can be sensed continue to diversify. Nevertheless, conventional motion sensing systems have low universality because they require some devices to be mounted on the target, such as accelerometers and gyro sensors, or because they are based on cameras, which limits the types of targets that can be detected. Our method solves this problem and enables noncontact, high-speed, deterministic measurement of the velocity of a moving target without any prior knowledge about the target shape and texture, and can be applied to any unconstrained, unspecified target. These distinctive features are achieved by using a system consisting of a laser range finder, a laser Doppler velocimeter, and a beam controller, in addition to a robust 3D motion calculation method. The motion of the target is recovered from fragmentary physical information, such as the distance and speed of the target at the laser irradiation points. From the acquired laser information, our method can provide a numerically stable solution based on the generalized weighted Tikhonov regularization. Using this technique and a prototype system that we developed, we also demonstrated a number of applications, including motion capture, video game control, and 3D shape integration with everyday objects.
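The abstract above does not spell out its regularized solve, so the following is only a generic sketch of weighted Tikhonov regularization under simplifying assumptions: diagonal measurement weights, an identity regularization operator, and a small dense Gaussian elimination. It minimizes sum_k w_k (A x - b)_k² + lam · ||x||²; the function names and the tiny example system are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def weighted_tikhonov(A, b, weights, lam):
    """Solve (A^T W A + lam * I) x = A^T W b with W = diag(weights)."""
    m, n = len(A), len(A[0])
    normal = [[sum(weights[k] * A[k][i] * A[k][j] for k in range(m))
               + (lam if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    rhs = [sum(weights[k] * A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve_linear(normal, rhs)


# One measurement, two unknowns: underdetermined without regularization.
x_reg = weighted_tikhonov(A=[[1.0, 1.0]], b=[2.0], weights=[1.0], lam=0.5)
```

The single equation x0 + x1 = 2 has infinitely many solutions; the regularization term selects a unique, stable one, which illustrates why such a term helps when only fragmentary measurements are available.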
We present RF-Capture, a system that captures the human figure -- i.e., a coarse skeleton -- through a wall. RF-Capture tracks the 3D positions of a person's limbs and body parts even when the person is fully occluded from its sensor, and does so without placing any markers on the subject's body. In designing RF-Capture, we built on recent advances in wireless research, which have shown that certain radio frequency (RF) signals can traverse walls and reflect off the human body, allowing for the detection of human motion through walls. In contrast to these past systems which abstract the entire human body as a single point and find the overall location of that point through walls, we show how we can reconstruct various human body parts and stitch them together to capture the human figure. We built a prototype of RF-Capture and tested it on 15 subjects. Our results show that the system can capture a representative human figure through walls and use it to distinguish between various users.
Transient images help to analyze light transport in scenes. Besides two spatial dimensions, they are resolved in time of flight. Cost-efficient approaches for their capture use amplitude modulated continuous wave lidar systems but typically take more than a minute of capture time. We propose new techniques for measurement and reconstruction of transient images, which drastically reduce this capture time. To this end, we pose the problem of reconstruction as a trigonometric moment problem. A vast body of mathematical literature provides powerful solutions to such problems. In particular, the maximum entropy spectral estimate and the Pisarenko estimate provide two closed-form solutions for reconstruction using continuous densities or sparse distributions, respectively. Both methods can separate m distinct returns using measurements at m modulation frequencies. For m = 3 our experiments with measured data confirm this. Our GPU-accelerated implementation can reconstruct more than 100000 frames of a transient image per second. Additionally, we propose modifications of the capture routine to achieve the required sinusoidal modulation without increasing the capture time. This allows us to capture up to 18.6 transient images per second, leading to transient video. An important byproduct is a method for removal of multipath interference in range imaging.
Wire wrapping is a traditional form of handmade jewelry that involves bending metal wire to create intricate shapes. The technique appeals to novices and casual crafters because of its low cost, accessibility and unique aesthetic. We present a computational design tool that addresses the two main challenges of creating 2D wire-wrapped jewelry: decomposing an input drawing into a set of wires, and bending the wires to give them shape. Our main contribution is an automatic wire decomposition algorithm that segments a drawing into a small number of wires based on aesthetic and fabrication principles. We formulate the task as a constrained graph labeling problem and present a stochastic optimization approach that produces good results for a variety of inputs.
Given a decomposition, our system generates a 3D-printed custom support structure, or jig, that helps users bend the wire into the appropriate shape. We validated our wire decomposition algorithm against existing wire-wrapped designs, and used our end-to-end system to create new jewelry from clipart drawings. We also evaluated our approach with novice users, who were able to create various pieces of jewelry in less than half an hour.
Building LEGO sculptures requires accounting for the target object's shape, colors, and stability. In particular, finding a good layout of LEGO bricks that prevents the sculpture from collapsing (due to its own weight) is usually challenging, and it becomes increasingly difficult as the target object becomes larger or more complex. We devise a force-based analysis for estimating physical stability of a given sculpture. Unlike previous techniques for Legolization, which typically use heuristic-based metrics for stability estimation, our force-based metric gives 1) an ordering in the strength so that we know which structure is more stable, and 2) a threshold for stability so that we know which one is stable enough. In addition, our stability analysis tells us the weak portion of the sculpture. Building atop our stability analysis, we present a layout refinement algorithm that iteratively improves the structure around the weak portion, allowing for automatic generation of a LEGO brick layout from a given 3D model, accounting for color information, required workload (in terms of the number of bricks) and physical stability. We demonstrate the success of our method with real LEGO sculptures built up from a wide variety of 3D models, and compare against previous methods.
Metallophones such as glockenspiels produce sounds in response to contact. Building these instruments is a complicated process, limiting their shapes to well-understood designs such as bars. We automatically optimize the shape of arbitrary 2D and 3D objects through deformation and perforation to produce sounds when struck which match user-supplied frequency and amplitude spectra. This optimization requires navigating a complex energy landscape, for which we develop Latin Complement Sampling to both speed up finding minima and provide probabilistic bounds on landscape exploration. Our method produces instruments which perform similarly to those that have been professionally-manufactured, while also expanding the scope of shape and sound that can be realized, e.g., single object chords. Furthermore, we can optimize sound spectra to create overtones and to dampen specific frequencies. Thus our technique allows even novices to design metallophones with unique sound and appearance.
We present an interactive tool for designing physical surfaces made from flexible interlocking quadrilateral elements of a single size and shape. With the element shape fixed, the design task becomes one of finding a discrete structure---i.e., element connectivity and binary orientations---that leads to a desired geometry. In order to address this challenging problem of combinatorial geometry, we propose a forward modeling tool that allows the user to interactively explore the space of feasible designs. Paralleling principles from conventional modeling software, our approach leverages a library of base shapes that can be instantiated, combined, and extended using two fundamental operations: merging and extrusion. In order to assist the user in building the designs, we furthermore propose a method to automatically generate assembly instructions. We demonstrate the versatility of our method by creating a diverse set of digital and physical examples that can serve as personalized lamps or decorative items.
Photos compress 3D visual data to 2D. However, it is still possible to infer depth information even without sophisticated object learning. We propose a solution based on the small-scale defocus blur inherent in optical lenses and tackle the estimation problem with a non-parametric matching scheme for natural images. It incorporates a matching prior with our newly constructed edgelet dataset using a non-local scheme, and includes semantic depth order cues for physically based inference. Several applications are enabled on natural images, including geometry based rendering and editing.
Structures and objects are often supposed to have idealized geometries such as straight lines or circles. Although not always visible to the naked eye, in reality, these objects deviate from their idealized models. Our goal is to reveal and visualize such subtle geometric deviations, which can contain useful, surprising information about our world. Our framework, termed Deviation Magnification, takes a still image as input, fits parametric models to objects of interest, computes the geometric deviations, and renders an output image in which the departures from ideal geometries are exaggerated. We demonstrate the correctness and usefulness of our method through quantitative evaluation on a synthetic dataset and by application to challenging natural images.
We present an algorithm for automatically detecting and visualizing small non-local variations between repeating structures in a single image. Our method can automatically correct these variations, thus producing an 'idealized' version of the image in which the resemblance between recurring structures is stronger. Alternatively, it can be used to magnify these variations, thus producing an exaggerated image that highlights variations which are difficult to spot in the input image. We formulate the estimation of deviations from perfect recurrence as a general optimization problem, and demonstrate it in the particular cases of geometric deformations and color variations.
Cloud image processing is often proposed as a solution to the limited computing power and battery life of mobile devices: it allows complex algorithms to run on powerful servers with virtually unlimited energy supply. Unfortunately, this overlooks the time and energy cost of uploading the input and downloading the output images. When transfer overhead is accounted for, processing images on a remote server becomes less attractive and many applications do not benefit from cloud offloading. We aim to change this in the case of image enhancements that preserve the overall content of an image. Our key insight is that, in this case, the server can compute and transmit a description of the transformation from input to output, which we call a transform recipe. At equivalent quality, our recipes are much more compact than JPEG images: this reduces the client's download. Furthermore, recipes can be computed from highly compressed inputs which significantly reduces the data uploaded to the server. The client reconstructs a high-fidelity approximation of the output by applying the recipe to its local high-quality input. We demonstrate our results on 168 images and 10 image processing applications, showing that our recipes form a compact representation for a diverse set of image filters. With an equivalent transmission budget, they provide higher-quality results than JPEG-compressed input/output images, with a gain of the order of 10 dB in many cases. We demonstrate the utility of recipes on a mobile phone by profiling the energy consumption and latency for both local and cloud computation: a transform recipe-based pipeline runs 2--4x faster and uses 2--7x less energy than local or naive cloud computation.
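The recipe idea can be sketched with a deliberately simplified model. The example below assumes a one-dimensional grayscale signal and a per-block gain and offset fitted by least squares; this model, the block size, and the names `fit_recipe` and `apply_recipe` are illustrative assumptions and are not taken from the paper, whose actual recipe representation is not described in the abstract.

```python
def fit_recipe(low_quality_input, server_output, block=4):
    """Server side: fit one (gain, offset) pair per block by least squares.

    The fit is made between the degraded upload and the processed output,
    so only the small list of pairs has to be sent back to the client.
    """
    recipe = []
    for start in range(0, len(low_quality_input), block):
        xs = low_quality_input[start:start + block]
        ys = server_output[start:start + block]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # Flat block: keep unit gain so client-side detail is preserved.
        gain = cov / var if var > 0 else 1.0
        recipe.append((gain, mean_y - gain * mean_x))
    return recipe


def apply_recipe(high_quality_input, recipe, block=4):
    """Client side: apply each block's gain and offset to the local input."""
    out = []
    for index, (gain, offset) in enumerate(recipe):
        for x in high_quality_input[index * block:(index + 1) * block]:
            out.append(gain * x + offset)
    return out
```

In this toy setting the download is two numbers per block instead of one value per pixel, and the reconstruction inherits its fine detail from the client's own high-quality input.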
The field of topology optimization seeks to optimize shapes under structural objectives, such as achieving the most rigid shape using a given quantity of material. Besides optimal shape design, these methods are increasingly popular as design tools, since they automatically produce structures having desirable physical properties, a task hard to perform by hand even for skilled designers. However, there is no simple way to control the appearance of the generated objects.
In this paper, we propose to optimize shapes for both their structural properties and their appearance, the latter being controlled by a user-provided pattern example. These two objectives are challenging to combine, as optimal structural properties fully define the shape, leaving no degrees of freedom for appearance. We propose a new formulation where appearance is optimized as an objective while structural properties serve as constraints. This produces shapes with sufficient rigidity while allowing enough freedom for the appearance of the final structure to resemble the input exemplar.
Our approach generates rigid shapes using a specified quantity of material while observing optional constraints such as voids, fills, attachment points, and external forces. The appearance is defined by examples, making our technique accessible to casual users. We demonstrate its use in the context of fabrication using a laser cutter to manufacture real objects from optimized shapes.
There is a growing expectation for high performance design in architecture which negotiates between the requirements of the client and the physical constraints of a building site. Clients for building projects often challenge architects to maximize view quality since it can significantly increase real estate value. To pursue this challenge, architects typically move through several design revision cycles to identify a set of design options which satisfy these view quality expectations in coordination with other goals of the project. However, reviewing a large quantity of design options within the practical time constraints is challenging due to the limitations of existing tools for view performance evaluation. These challenges include flexibility in the definition of view quality and the ability to handle the expensive computation involved in assessing both the view quality and the exploration of a large number of possible design options. To address these challenges, we propose a catalogue-based framework that enables the interactive exploration of conceptual building design options based on adjustable view preferences. We achieve this by integrating a flexible mechanism to combine different view measures with an indexing scheme for view computation that achieves high performance and precision. Furthermore, the combined view measures are then used to model the building design space as a high dimensional scalar function. The topological features of this function are then used as candidate building designs. Finally, we propose an interactive design catalogue for the exploration of potential building designs based on the given view preferences. We demonstrate the effectiveness of our approach through two use case scenarios to assess view potential and explore conceptual building designs on sites with high development likelihood in Manhattan, New York City.
We present AutoConnect, an automatic method that creates customized, 3D-printable connectors attaching two physical objects together. Users simply position and orient virtual models of the two objects that they want to connect and indicate some auxiliary information such as weight and dimensions. Then, AutoConnect creates several alternative designs that users can choose from for 3D printing. The design of the connector is created by combining two holders, one for each object. We categorize the holders into two types. The first type holds standard objects such as pipes and planes. We utilize a database of parameterized mechanical holders and optimize the holder shape based on the grip strength and material consumption. The second type holds free-form objects. These are procedurally generated shell-gripper designs created based on geometric analysis of the object. We illustrate the use of our method by demonstrating many examples of connectors and practical use cases.
Assigning textures and materials within 3D scenes is a tedious and labor-intensive task. In this paper, we present Magic Decorator, a system that automatically generates material suggestions for 3D indoor scenes. To achieve this goal, we introduce local material rules, which describe typical material patterns for a small group of objects or parts, and global aesthetic rules, which account for the harmony among the entire set of colors in a specific scene. Both rules are obtained from collections of indoor scene images. We cast the problem of material suggestion as a combinatorial optimization considering both local material and global aesthetic rules. We have tested our system on various complex indoor scenes. A user study indicates that our system can automatically and efficiently produce a series of visually plausible material suggestions which are comparable to those produced by artists.
Collections of images and 3D models hold many interesting aspects of our surroundings. Significant efforts have been devoted to organizing and exploring such data repositories. Most such efforts, however, process the two data modalities separately, and do not take full advantage of the complementary information that exists in different domains, which can help to solve difficult problems in one by exploiting the structure in the other. Beyond the obvious difference in data representations, a key difficulty in such joint analysis lies in the significant variability in the structure and inherent properties of the 2D and 3D data collections, which hinders cross-domain analysis and exploration. We introduce CrossLink, a system for joint image-3D model processing that uses the complementary strengths of each data modality to facilitate analysis and exploration. We first show how our system significantly improves the quality of text-based 3D model search by using side information coming from an image database. We then demonstrate how to consistently align the filtered 3D model collections, and then use them to re-sort image collections based on pose and shape attributes. We evaluate our framework both quantitatively and qualitatively on 20 object categories of 2D image and 3D model collections, and quantitatively demonstrate how a wide variety of tasks in each data modality can strongly benefit from the complementary information present in the other, paving the way to a richer 2D and 3D processing toolbox.
Both 3D models and 2D images contain a wealth of information about everyday objects in our environment. However, it is difficult to semantically link together these two media forms, even when they feature identical or very similar objects. We propose a joint embedding space populated by both 3D shapes and 2D images of objects, where the distances between embedded entities reflect similarity between the underlying objects. This joint embedding space facilitates comparison between entities of either form, and allows for cross-modality retrieval. We construct the embedding space using a 3D shape similarity measure, as 3D shapes are more pure and complete than their appearance in images, leading to more robust distance metrics. We then employ a Convolutional Neural Network (CNN) to "purify" images by muting distracting factors. The CNN is trained to map an image to a point in the embedding space, so that it is close to a point attributed to a 3D model of a similar object to the one depicted in the image. This purifying capability of the CNN is accomplished with the help of a large amount of training data consisting of images synthesized from 3D shapes. Our joint embedding allows cross-view image retrieval, image-based shape retrieval, as well as shape-based image retrieval. We evaluate our method on these retrieval tasks and show that it consistently outperforms state-of-the-art methods, and demonstrate the usability of a joint embedding in a number of additional applications.
Computing similarities or distances between 3D shapes is a crucial building block for numerous tasks, including shape retrieval, exploration and classification. Current state-of-the-art distance measures mostly consider the overall appearance of the shapes and are less sensitive to fine changes in shape structure or geometry. We present shape edit distance (SHED), which measures the amount of effort needed to transform one shape into the other, in terms of re-arranging the parts of one shape to match the parts of the other shape, as well as possibly adding and removing parts. The shape edit distance takes into account both the similarity of the overall shape structure and the similarity of individual parts of the shapes. We show that SHED compares favorably to state-of-the-art distance measures in a variety of applications and datasets, and is especially successful in scenarios where detecting fine details of the shapes is important, such as shape retrieval and exploration.
We present a deformation-driven approach to topology-varying 3D shape correspondence. In this paradigm, the best correspondence between two shapes is the one that results in a minimal-energy, possibly topology-varying, deformation that transforms one shape to conform to the other while respecting the correspondence. Our deformation model, called GeoTopo transform, allows both geometric and topological operations such as part split, duplication, and merging, leading to fine-grained and piecewise continuous correspondence results. The key ingredient of our correspondence scheme is a deformation energy that penalizes geometric distortion, encourages structure preservation, and simultaneously allows topology changes. This is accomplished by connecting shape parts using structural rods, which behave similarly to virtual springs but simultaneously allow the encoding of energies arising from geometric, structural, and topological shape variations. Driven by the combined deformation energy, an optimal shape correspondence is obtained via a pruned beam search. We demonstrate our deformation-driven correspondence scheme on extensive sets of man-made models with rich geometric and topological variation and compare the results to state-of-the-art approaches.
Projection mapping enables us to bring virtual worlds into shared physical spaces. In this paper, we present a novel, adaptable and real-time projection mapping system, which supports multiple projectors and high quality rendering of dynamic content on surfaces of complex geometrical shape. Our system allows for smooth blending across multiple projectors using a new optimization framework that simulates the diffuse direct light transport of the physical world to continuously adapt the color output of each projector pixel. We present a real-time solution to this optimization problem using off-the-shelf graphics hardware, depth cameras and projectors. Our approach enables us to move projectors, depth cameras or objects while maintaining the correct illumination, in real time, without the need for markers on the object. It also allows for projectors to be removed or dynamically added, and provides compelling results with only commodity hardware.
Cameras attached to small quadrotor aircraft are rapidly becoming a ubiquitous tool for cinematographers, enabling dynamic camera movements through 3D environments. Currently, professionals use these cameras by flying quadrotors manually, a process which requires much skill and dexterity. In this paper, we investigate the needs of quadrotor cinematographers, and build a tool to support video capture using quadrotor-based camera systems. We begin by conducting semi-structured interviews with professional photographers and videographers, from which we extract a set of design principles. We present a tool based on these principles for designing and autonomously executing quadrotor-based camera shots. Our tool enables users to: (1) specify shots visually using keyframes; (2) preview the resulting shots in a virtual environment; (3) precisely control the timing of shots using easing curves; and (4) capture the resulting shots in the real world with a single button click using commercially available quadrotors. We evaluate our tool in a user study with novice and expert cinematographers. We show that our tool makes it possible for novices and experts to design compelling and challenging shots, and capture them fully autonomously.
We present algorithms for extracting an image-space representation of object structure from video and using it to synthesize physically plausible animations of objects responding to new, previously unseen forces. Our representation of structure is derived from an image-space analysis of modal object deformation: projections of an object's resonant modes are recovered from the temporal spectra of optical flow in a video, and used as a basis for the image-space simulation of object dynamics. We describe how to extract this basis from video, and show that it can be used to create physically-plausible animations of objects without any knowledge of scene geometry or material properties.
Blackboard-style lecture videos are popular, but learning using existing video player interfaces can be challenging. Viewers cannot consume the lecture material at their own pace, and the content is also difficult to search or skim. For these reasons, some people prefer lecture notes to videos. To address these limitations, we present Visual Transcripts, a readable representation of lecture videos that combines visual information with transcript text. To generate a Visual Transcript, we first segment the visual content of a lecture into discrete visual entities that correspond to equations, figures, or lines of text. Then, we analyze the temporal correspondence between the transcript and visuals to determine how sentences relate to visual entities. Finally, we arrange the text and visuals in a linear layout based on these relationships. We compare our result with a standard video player, and a state-of-the-art interface designed specifically for blackboard-style lecture videos. User evaluation suggests that users prefer our interface for learning and that our interface is effective in helping them browse or search through lecture videos.
Multi-domain subspace simulation can efficiently and conveniently simulate the deformation of a large deformable body, by constraining the deformation of each domain into a different subspace. The key challenge in implementing this method is how to handle the coupling among multiple deformable domains, so that the overall effect is free of gap or locking issues. In this paper, we present a new domain decomposition framework that connects two disjoint domains through coupling elements. Under this framework, we present a unified simulation system that solves subspace deformations and rigid motions of all of the domains by a single linear solve. Since the coupling elements are part of the deformable body, their elastic properties are the same as the rest of the body and our system does not need stiffness parameter tuning. To quickly evaluate the reduced elastic forces and their Jacobian matrices caused by the coupling elements, we further develop two cubature optimization schemes using uniform and non-uniform cubature weights. Our experiment shows that the whole system can efficiently handle large and complex scenes, many of which cannot be easily simulated by previous techniques without limitations.
In this paper, we propose a full-featured and efficient subspace simulation method in the rotation-strain (RS) space for elastic objects. In sharp contrast to previous methods using the rotation-strain space, our method not only handles non-linear elastic materials and external forces but also correctly formulates the kinetic energy and the centrifugal and Coriolis forces, which significantly reduces dynamic artifacts. We show that many techniques used in Euclidean space methods, such as modal derivatives and polynomial and cubature approximation, can be adapted to our RS simulator. Carefully designed experiments show that the equation of motion in RS space has less non-linearity than its Euclidean counterpart, and as a consequence, our method offers lower dimension and computational complexity than state-of-the-art methods in the Euclidean space.
Model reduction has become popular for simulating elastic deformation in graphics applications. While these techniques enjoy orders-of-magnitude speedups at runtime, the efficiency of precomputing reduced subspaces remains largely overlooked. We present a complete precomputation pipeline as a faster alternative to classic linear and nonlinear modal analysis. We identify three bottlenecks in the traditional model reduction precomputation, namely modal matrix construction, cubature training, and training dataset generation, and accelerate each of them. Even with complex deformable models, our method achieves orders-of-magnitude speedups over the traditional precomputation steps, while retaining comparable runtime simulation quality.
We present a model-reduced variational Eulerian integrator for incompressible fluids, which combines the efficiency gains of dimension reduction, the qualitative robustness of coarse spatial and temporal resolutions of geometric integrators, and the simplicity of sub-grid accurate boundary conditions on regular grids to deal with arbitrarily-shaped domains. At the core of our contributions is a functional map approach to fluid simulation for which scalar- and vector-valued eigenfunctions of the Laplacian operator can be easily used as reduced bases. Using a variational integrator in time to preserve liveliness and a simple, yet accurate embedding of the fluid domain onto a Cartesian grid, our model-reduced fluid simulator can achieve realistic animations in significantly less computational time than full-scale non-dissipative methods but without the numerical viscosity from which current reduced methods suffer. We also demonstrate the versatility of our approach by showing how it easily extends to magnetohydrodynamics and turbulence modeling in 2D, 3D and curved domains.
Existing multigrid methods for cloth simulation are based on geometric multigrid. While good results have been reported, geometric methods are problematic for unstructured grids, widely varying material properties, and varying anisotropies, and they often have difficulty handling constraints arising from collisions. This paper applies the algebraic multigrid method known as smoothed aggregation to cloth simulation. This method is agnostic to the underlying tessellation, which can even vary over time, and it only requires the user to provide a fine-level mesh. To handle contact constraints efficiently, a prefiltered preconditioned conjugate gradient method is introduced. For highly efficient preconditioners, like the ones proposed here, prefiltering is essential, but, even for simple preconditioners, prefiltering provides significant benefits in the presence of many constraints. Numerical tests of the new approach on a range of examples confirm 6--8x speedups on a fully dressed character with 371k vertices, and even larger speedups on synthetic examples.
In this paper, we study the use of the Chebyshev semi-iterative approach in projective and position-based dynamics. Although projective dynamics is fundamentally nonlinear, its convergence behavior is similar to that of an iterative method solving a linear system. Because of that, we can estimate the "spectral radius" and use it in the Chebyshev approach to accelerate the convergence by at least one order of magnitude, when the global step is handled by the direct solver, the Jacobi solver, or even the Gauss-Seidel solver. Our experiment shows that the combination of the Chebyshev approach and the direct solver runs fastest on CPU, while the combination of the Chebyshev approach and the Jacobi solver outperforms any other combination on GPU, as it is highly compatible with parallel computing. Our experiment further shows position-based dynamics can be accelerated by the Chebyshev approach as well, although the effect is less obvious for tetrahedral meshes. The whole approach is simple, fast, effective, GPU-friendly, and has a small memory cost.
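To illustrate the Chebyshev semi-iterative idea in its classical linear setting, the sketch below accelerates a plain Jacobi solver on a small tridiagonal system. It is not the paper's projective dynamics solver: the function names, the test matrix, and the use of an exactly known spectral radius `rho` are assumptions made here for illustration (the abstract describes estimating that value instead).

```python
import math


def jacobi_step(A, b, x):
    """One Jacobi sweep: solve each equation for its diagonal unknown."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]


def plain_jacobi(A, b, iters):
    """Unaccelerated Jacobi iteration starting from zero."""
    x = [0.0] * len(b)
    for _ in range(iters):
        x = jacobi_step(A, b, x)
    return x


def chebyshev_jacobi(A, b, rho, iters):
    """Jacobi iteration with Chebyshev semi-iterative acceleration.

    rho is the spectral radius of the Jacobi iteration matrix. Each new
    iterate blends the Jacobi result with the iterate from two steps ago
    using the classical weight recurrence.
    """
    x_prev = [0.0] * len(b)
    x_curr = list(x_prev)
    omega = 1.0
    for k in range(iters):
        x_hat = jacobi_step(A, b, x_curr)
        if k == 0:
            omega = 1.0
        elif k == 1:
            omega = 2.0 / (2.0 - rho * rho)
        else:
            omega = 4.0 / (4.0 - rho * rho * omega)
        x_next = [omega * (h - p) + p for h, p in zip(x_hat, x_prev)]
        x_prev, x_curr = x_curr, x_next
    return x_curr


def laplacian_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, used here as a toy test system."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def laplacian_1d_jacobi_radius(n):
    """Known spectral radius of the Jacobi matrix for laplacian_1d(n)."""
    return math.cos(math.pi / (n + 1))
```

The acceleration costs only one extra stored vector and a few scalar operations per iteration, which matches the abstract's description of a simple method with a small memory cost.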
Level sets have been established as highly versatile implicit surface representations, with widespread use in graphics applications including modeling and dynamic simulation. Nevertheless, level sets are often presumed to be limited, compared to explicit meshes, in their ability to represent domains with thin topological features (e.g. narrow slits and gaps) or, even worse, material overlap. Geometries with such features may arise from modeling tools that tolerate occasional self-intersections, fracture modeling algorithms that create narrow or zero-width cuts by design, or as transient states in collision processing pipelines for deformable objects. Converting such models to level sets can alter their topology if thin features are not resolved by the grid size. We argue that this ostensible limitation is not an inherent defect of the implicit surface concept, but a collateral consequence of the standard Cartesian lattice used to store the level set values. We propose storing signed distance values on a regular hexahedral mesh which can have multiple collocated cubic elements and non-manifold bifurcation to accommodate non-trivial topology. We show how such non-manifold level sets can be systematically generated from convenient alternative geometric representations. Finally we demonstrate how this representation can facilitate fast and robust treatment of self-collision in simulations of volumetric elastic deformable bodies.
We present a learned model of human body shape and pose-dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex-based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. We quantitatively evaluate variants of SMPL using linear or dual-quaternion blend skinning and show that both are more accurate than a Blend-SCAPE model trained on the same data. We also extend SMPL to realistically model dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines and we make it available for research purposes.
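The two ingredients named in the abstract, pose-dependent blend shapes that are linear in the rotation matrix elements and linear blend skinning, can be sketched for a single vertex as follows. This is a simplified illustration rather than the released model: the function names are hypothetical, joint transforms are passed in directly as world-space rotations and translations, and no kinematic chain, shape blend shapes, or joint regressor is modeled.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def pose_feature(rotations):
    """Flatten (R_k - I) over all joints; all zeros in the rest pose."""
    return [R[i][j] - IDENTITY[i][j]
            for R in rotations for i in range(3) for j in range(3)]


def apply_pose_blend(v_rest, pose_dirs, feature):
    """Add a correction that is linear in the pose feature.

    pose_dirs has 3 rows (x, y, z), each with len(feature) coefficients.
    """
    return [v_rest[a] + sum(p * f for p, f in zip(pose_dirs[a], feature))
            for a in range(3)]


def skin_vertex(v, weights, rotations, translations):
    """Linear blend skinning: weighted sum of per-joint rigid transforms."""
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, rotations, translations):
        for a in range(3):
            out[a] += w * (sum(R[a][b] * v[b] for b in range(3)) + t[a])
    return out
```

Because the correction is a plain matrix-vector product and the skinning step is the standard weighted blend, both stages map directly onto operations that existing rendering engines already provide.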