We present a software simulation system, the LEap SURgical simulator (LESUR), which incorporates the Leap human-computer interface device developed by Ultraleap Inc. to train users, especially surgeon trainees, in dexterous hand and finger motion. The simulator offers two interaction modes: one for coarser dexterity training using a KUKA-style robotic arm (Leapulator), and a low-cost mode for surgical training that can be used with a da Vinci-like robotic surgical system. Existing surgical simulators, such as 3D Systems' Touch and Phantom devices, do not provide sufficient dexterity training for finger motion. In particular, laparoscopic and minimally invasive surgical systems demand skills in fine finger motion and dexterity. Our simulation system aims to develop superior expertise in trainees. We use a Leap Motion device to capture finger motion and a two-mode simulator to provide different levels of dexterity training.
Passive motion is often used in physical skill transfer. This study investigated the sense of agency during virtual drumming with visual, vibrotactile, and auditory feedback across three types of forearm motion: voluntary, passive, and imagined. Agency judgements showed that visual, vibrotactile, and auditory feedback contributed to the sense of agency by roughly 25%–45%, depending on the type of forearm motion. The motor command in voluntary motion contributed about 35%, while proprioception in passive motion contributed about 25%. These results suggest that feedback from multiple senses contributes to generating the sense of agency.
This poster describes the implementation of a performant 2D drawing application in the browser that renders Signed Distance Functions (SDFs) compiled from user input. SDFs are well suited for CAD applications because they admit elegant Boolean operations and allow for efficient, high-quality anti-aliasing. Because of their reliance on the GPU, SDFs have not traditionally lent themselves to graphical user input. By compiling shaders on the fly from user input, we are able to interact seamlessly with rendered SDFs in a CAD interface.
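The Boolean composability and cheap anti-aliasing that SDFs afford can be illustrated with scalar distance functions on the CPU; the function names and the smoothstep coverage mapping below are our own illustrative choices, not the poster's shader code.

```python
import math

def sd_circle(x, y, cx, cy, r):
    # Signed distance to a circle: negative inside, positive outside.
    return math.hypot(x - cx, y - cy) - r

def sd_union(a, b):        # min combines two shapes
    return min(a, b)

def sd_intersect(a, b):    # max keeps only the overlap
    return max(a, b)

def sd_subtract(a, b):     # carve shape b out of shape a
    return max(a, -b)

# Distance values also give cheap anti-aliasing: map the signed
# distance through a narrow smoothstep around zero to get coverage.
def coverage(d, aa=1.0):
    t = min(max(0.5 - d / aa, 0.0), 1.0)
    return t * t * (3 - 2 * t)
```

Because each operation is a one-line combination of distances, a compiler can emit a single shader expression for an arbitrary user-built tree of shapes.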
Synesthesia is a neurological phenomenon still largely unknown to and misunderstood by many. This project aims to gain insight into the perceptual experiences of those diagnosed with synesthesia, focusing specifically on flavour-to-vision synesthesia, which causes a person to see abstract shapes, colours, or textures whilst tasting the flavours of food and drinks. In this research, we have developed an original method in 3D design, creating an artefact as a collaborative practice-based research output combining synesthetes’ sketches and artistic interpretation. By exploring 3D sculpting techniques and treatments of printed materials, we produced a set of three physical artefacts to represent synesthetic perception (see Fig. 1).
In this paper, we present a simple and intuitive approach to transfer an existing FACS (Facial Action Coding System)-based facial rig and its data set onto a facial model with a different mesh topology, supporting the rigging and animation pipeline for visual-effects films, 3D animation, and video games. Facial rigs are custom built for each character and are time-consuming and complex to create; even minor tweaks to the model topology may force the rigging process to start from scratch. To maximize data reuse, our system can apply one set of data to a variety of human or creature facial rigs.
We present a new method for automatic font animation, in which the outlines of one or more glyphs are automatically segmented into sets of curves that can be efficiently controlled over time. The segmentation takes into account the directionality of similar segments and allows arbitrary levels of subdivision. We showcase physics-based effects driven by the motion sensors in mobile devices to achieve novel and engaging text experiences. Our method works on existing off-the-shelf fonts, enabling automatic text-based effects that previously required extensive manual work.
We present a novel method for organizing Bézier-bounded geometry based on affine similarity and visual saliency. For this computationally expensive (many-to-many) problem, we propose a highly parallel algorithm that leverages the programmable GPU pipeline, computing pairwise affine transformations to classify Bézier geometry into clusters in which the paths are affine transforms of each other. Using these clusters, we propose a method to organize paths into meaningful groups, even in the complex, unstructured geometry common in real-world vector art. We propose a function that quantifies the relative importance of these groups using attributes such as complexity and frequency of occurrence, yielding meaningful, reusable assets. Our method is both robust and performant, capable of processing thousands of paths within milliseconds.
We propose a novel paradigm for concurrent editing of vector geometry. Our method automatically establishes an editing context, which encompasses similar Bézier geometry within a graphic object (local repetition), as well as multiple instances of entire objects (global repetition). We start with given key geometry, efficiently analyze the entire vector graphics document to identify similar geometry, and then propagate modifications (both geometric and stylistic) from the key geometry to its variants. To accomplish this, we orchestrate the Procrustes algorithm in a one-to-many solve to determine all affine variants of the key geometry. This solve also computes a per-variant transformation matrix, which is used to propagate modifications. Our method is performant and efficient, identifying and concurrently modifying tens of thousands of objects in real time. In addition, it does not require any instrumentation from designers, and is thus applicable to all existing vector graphics documents (such as SVG and PDF files).
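The one-to-many Procrustes solve can be illustrated in a reduced setting: fitting a least-squares similarity transform (rotation, uniform scale, translation) between two point sequences, using complex arithmetic for a closed form. The full method fits a general affine transform per variant; this sketch, with our own illustrative function names, only shows the core idea of recovering a transform and checking the residual to accept or reject a candidate variant.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform z -> a*z + b mapping src
    points onto dst, with 2D points encoded as complex numbers.
    A Procrustes-style closed-form solve (the full method would fit
    a general affine transform instead)."""
    n = len(src)
    ps = sum(src) / n                  # centroids
    qs = sum(dst) / n
    num = sum((q - qs) * (p - ps).conjugate() for p, q in zip(src, dst))
    den = sum(abs(p - ps) ** 2 for p in src)
    a = num / den                      # encodes rotation + uniform scale
    b = qs - a * ps                    # translation
    return a, b

def residual(src, dst, a, b):
    """Max deviation after applying the fitted transform; a (near-)zero
    residual identifies dst as an affine variant of src."""
    return max(abs(a * p + b - q) for p, q in zip(src, dst))
```

A path that is a rotated, scaled copy of another is detected exactly when the residual vanishes, and the recovered `(a, b)` is the per-variant transform along which edits propagate.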
Children's National Hospital is a leading pediatric teaching hospital with medical students, residency programs, fellowships, and research initiatives. Children's National uses online e-learning in its training, including interactive courses with 3D animated virtual simulations.
Infants and young children can be hard to diagnose because they cannot easily tell doctors where it hurts or what they are feeling. Thorough diagnosis is important because one disease may often resemble another. Medical trainees at all levels require practice in forming a diagnosis through organized steps of information gathering: medical history, lab results, and physical exam findings.
The creation of educational videos of ill infants poses unique problems: children cannot self-consent, can decompensate rapidly from minute to minute, and video capture can put the infant at risk. For example, a videographer would need to be in close proximity to a small child, which could increase exposure risk or prevent the care team from giving timely care. In addition, children often present with only one or two findings, which expand with time. As we work to ensure diversity in education and learning, the ability to create cases with diverse patient populations is important. In a clinical setting, it is unethical to "allow" disease progression. Using 3D virtual simulation, the diagnostic process can be taught without risk to the patient.
Procedural terrain generation refers to generating terrain features, such as landscaping, rivers, or road networks, through algorithms, with minimal input required from the user. Generating terrain is often an important part of the game development process, but traditional manual methods are too time-consuming, especially for larger terrain maps, while procedural methods that generate terrain automatically often give the user little control over the output. We explore the use of conditional generative adversarial networks to create road maps, as well as the application of such road maps to level creation in game engines such as Unreal Engine 4.
In this research, we propose a digital fabrication method that rapidly prints reusable objects using a clay material.
Therefore, we developed a magnetic force control device and a magnetic thermoplastic material that deforms dynamically. When the user creates a shape model on the screen, the clay material instantly changes to the shape of the designed model. The user can confirm the shape in a short time, and the material can be reused. In this paper, we describe the design and implementation of magnetic materials and control devices, evaluation, and future work.
In this paper, we present the concept of Remote Empathetic Viewpoints, an approach that allows the viewer to access multiple remote viewpoints simultaneously, granting us the ability to explore and extend the concepts of Cubism in 3D and in animation. In Remote Empathetic Viewpoints, we utilize a single Primary View Camera, a small set of Control Lights, and a large number of secondary cameras, whose positions and directions are referenced by the Control Lights. Such a multi-camera system overcomes the “shower door” effect that arises from the multiple cameras used to obtain cubist rendering. By animating camera movement, we obtain temporal cubist art.
A powerful rigging pipeline is proposed to automatically rig raw scans of human models by 1) fitting a Rigged Parametric Body Model (RPBM) to the scan with a novel and effective energy formulation, 2) cutting the scan to break self-occlusions automatically, 3) inpainting the geometry and texture in UV parametric space to create a complete avatar, and 4) transferring the rigging information from the RPBM to the avatar. To further improve the fitting result, Semantic Deformation Components (SDCs) are generated and used to replace the original shape blendshapes of the RPBM in the fitting stage; they can also be used for body reshaping. Our method is highly effective and robust in rigging raw human scan models.
In this paper, I present a web-based experience named Wanderlust into (y)our past, which aims to mediate bonds between partners, family members, and loved ones. Due to the COVID-19 pandemic in 2020, couples and families are learning how to co-exist healthily in quarantined life, living together all the time in a confined space. Wanderlust into (y)our past takes them on virtual time travel to the places they have visited in the past and nudges them to share personal memories. Through these interpersonal dialogues about the past, the system encourages them to consider each other and to take collaborative, prosocial actions towards the new normal of the post-pandemic era. This paper describes the background, approach, an ongoing functional prototype, and future directions.
One of the problems in virtual reality is natural interaction with virtual objects. In this study, we focus on interaction with a virtual object outside the user’s field of view. The position of the virtual object is estimated without looking at it, using frequency change and/or sound pressure change as auditory stimuli.
We present “CrowbarLimbs”, a new text entry method for virtual reality (VR) with two deformable, extending virtual limbs, which relies on a crowbar-like metaphor. Text entry is the basis of many applications but remains challenging in VR environments, where a user’s body parts may quickly fatigue with previous selection-based methods [Speicher et al. 2018], such as the ray metaphor [Lee and Kim 2017] and “DrumStick” [Doronichev 2016]. By adding two deformable virtual limbs and placing the virtual keyboard at a user-preferred location, our method helps users keep their hands in a comfortable posture, reducing the physical fatigue of body parts such as the arms and shoulders. To the best of our knowledge, previous text entry papers have not discussed which metaphor is most suitable for reducing physical fatigue. We therefore introduce “CrowbarLimbs”, which offers reduced fatigue, good system usability, and text entry speed and accuracy comparable to previous methods.
In this paper, we explore how four interaction types (POV, cloth, visual effects, and background music) influence user experience in terms of interest, immersion, fun, and ease of use. To this end, we design an interactive virtual reality fashion show system with a pipeline approach and develop it using the Unity3D game engine and an HTC Vive HMD. A pilot study with 12 participants shows that all four interaction types can positively affect user experience.
A variety of operation methods linked to real hand movements have been proposed for smoothly operating an HMD controller in 3D space. Many controller recognition methods are based on outside-in tracking, so the controller cannot be operated when it moves into an area occluded from the sensor. In this paper, we propose an inside-out controller for HMDs using a smartphone that estimates its own position and angle via image recognition. The system transmits the controller’s 6DoF (six degrees of freedom) information to the HMD and performs raycasting based on it. The user can operate the controller wherever it is located, overcoming this disadvantage of existing HMD controllers.
We estimate depth maps from real-life monocular 360º VR videos using a deep neural network architecture trained on 3D point-based renderings of people (called depth “mannequins”) captured with a Multi36 camera setup and processed with a combined pipeline of AI instance segmentation, Structure from Motion, and Multi-View Stereo methods.
“reFrame” is an optically see-through Augmented Reality (AR) platform capable of displaying parallax-free images superimposed over physical objects and the scenery behind them. It uses head-tracking technology and off-axis perspective projection to simulate the motion-parallax perspective of a virtual 3D scene relative to a user's position in space. This perspective-corrected scene is then rendered on an optically see-through display, practically turning it into a parallax-free and scalable general-purpose Heads-Up Display (HUD). reFrame combines established and affordable technologies to offer an extremely accessible alternative to available mixed-reality systems, as well as a medium to explore the practical and creative possibilities of spatial augmented reality. It provides an opportunity to focus on subject-centered and attention-based embodied interaction paradigms that are less explored in other forms of Mixed Reality (MR). This paper is offered as a proof of concept and a starting point for further research and conversation.
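The off-axis projection at the heart of such head-tracked displays reduces to an asymmetric view frustum. The sketch below uses our own simplified setup (screen centred at the origin in the z = 0 plane, tracked eye at z > 0) and computes glFrustum-style extents by similar triangles; reFrame's actual implementation may differ.

```python
def off_axis_frustum(eye, half_w, half_h, near):
    """Asymmetric frustum extents (l, r, b, t) at the near plane for a
    physical screen of half-size (half_w, half_h) centred at the origin
    in the z = 0 plane, viewed by a tracked eye at `eye` (z > 0).
    Feed the result plus `near` to a glFrustum-style projection."""
    ex, ey, ez = eye
    s = near / ez                      # similar-triangles scale factor
    l = (-half_w - ex) * s
    r = ( half_w - ex) * s
    b = (-half_h - ey) * s
    t = ( half_h - ey) * s
    return l, r, b, t
```

When the eye is centred the frustum is symmetric; as the head moves, the extents shift so that the virtual scene stays registered to the physical screen, producing the motion-parallax effect.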
Due to the limited range of depth sensors, depth estimators for in-the-wild images have been trained on ordinal data. Without the need for retraining, we propose an approach to create seamless ordinal depth for 360° in-the-wild images. The 360° depth is applied to a VR display to improve depth perception in the virtual scene.
We propose a projection for omnidirectional images called TubeMap, which is efficient for omnidirectional view interpolation. This new projection preserves linearity in the captured scene and covers almost every direction in a single region with no discontinuities, by extending the projection planes of a cube map along the direction from one camera towards another. TubeMap makes it easy to construct correspondences between two omnidirectional images while eliminating distortion in interpolated views.
In this paper, we present a hybrid haptic feedback system comprising a stationary fan and a wireless controller with a built-in fan. To explore the system's potential to expand users’ perception of environmental wind, we investigate users’ perception of the wind-blowing area under different distances and handheld wind speeds. The results show that, compared with using the stationary fan alone, the perceived wind area increases by at least 59.9% with our hybrid haptic device.
Unlike playback of a video, where a user can only interact by altering the video’s playback, a recorded Virtual Reality (VR) experience played back in VR allows unprecedented avenues for interaction, including alteration of the content. Thus, we present Vuja De, the propensity to find novelty in familiar experiences. It enables re-experiencing VR recordings anew each time because of the multiple interaction outcomes made possible. Subsequently, we discuss comprehensive interaction scenarios demonstrating the utility of interactive recordings in realizing diverse possibilities.
In this work, we present a method to turn still life paintings with global illumination effects into dynamic paintings with moving lights. Our method specifically focuses on still life images containing (1) glass objects, which can have specular highlights with Fresnel effect and (2) fruits, which can have reflection and subsurface scattering. Our goal is to preserve the original look-and-feel of the still-life paintings while allowing the user to move the light source anywhere in the scene, causing the shadows, diffuse shading, as well as reflections and specular highlights to move according to the new light position. We have provided a proof of concept based on an original digital painting. This method can be used to turn any similar still life painting into a dynamic painting.
We present a color mapping method that responds to intensity and accounts for the interaction of light among objects in a three-dimensional scene. Previous methods that map color according to geometric specifications or a rendered result do not reproduce the interreflection of surrounding objects. Using a path tracer, the proposed method replaces the radiance with an arbitrary one-dimensional texture in each sampled light transport path. The texture appearing on the object is calculated as a weighted average of the modified estimates. The proposed method can generate color variation driven by physically based intensity, and hence can reproduce physical effects such as caustics and color bleeding while retaining the flexibility of rendering within a path tracing framework.
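The estimator modification can be sketched as follows: each sampled path contributes a weight (its throughput) and a scalar intensity, and the pixel color is the weighted average of 1-D texture lookups on those intensities. The two-color texture and the flat sample list below are our own illustrative simplifications of the paper's path-traced estimator.

```python
def colormap(t):
    # Illustrative 1-D texture: blend blue -> red as intensity t rises.
    t = min(max(t, 0.0), 1.0)
    return (t, 0.0, 1.0 - t)

def shade(path_samples):
    """Each sample is (weight, intensity): the path throughput and the
    scalar radiance estimate that the texture lookup replaces. The
    pixel color is the weighted average of the per-sample lookups,
    mirroring the modified estimator in simplified form."""
    total_w = sum(w for w, _ in path_samples)
    r = g = b = 0.0
    for w, intensity in path_samples:
        cr, cg, cb = colormap(intensity)
        r += w * cr; g += w * cg; b += w * cb
    return (r / total_w, g / total_w, b / total_w)
```

Because the lookup happens per sampled path, indirect effects such as caustics inherit the remapped colors automatically.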
We present a rendering system for 4D ultrasound data based on Monte Carlo path tracing, where a recurrent denoising autoencoder is trained on a large collection of images to produce noise-free images with a reduced number of samples per pixel. While the diagnostic value of photorealistic shading for 3D medical imaging has not been established definitively, the enhanced shape and depth perception allow for a more complete understanding of the data in a variety of scenarios. The dynamic nature of ultrasound data typically limits the global illumination effects that can be rendered interactively, but we demonstrate that AI-based denoising together with Monte Carlo path tracing can be used both for interactive workflows and for rendering an entire heartbeat sequence at high quality in about a minute, while also allowing for complex lighting environments. Specifically, our contribution is a model compatible with the NVIDIA OptiX interactive denoiser, trained on ultrasound-specific rendering presets and data.
In this study, we propose a method to produce a simple 3D aerial image display with a wide viewing angle. In recent years, research on aerial imagery has been actively evolving; however, creating a 3D aerial image with a wide viewing angle still requires complex, large-scale devices. In the present study, the visual field of a retro-transmissive optical system is widened by a symmetric mirror structure, which also enables multiple 3D images to be seamlessly coupled in midair to form a wide-viewing-angle 3D display. As a result, a viewing angle approximately three times that of the conventional system is realized using only simple optical systems.
In the computational design of metamaterials, complex patterns of meso-structures have seen increased interest because their flexibility can be controlled locally by adjusting the meso-structure parameters. Such structures offer several advantages: despite the simplicity of their fabrication, they can conform to sophisticated free-form surfaces. However, simulating such complex structures still carries a high computational cost. We propose an approach to reduce this cost by abstracting the meso-structures and encoding their elastic deformation behavior into a different set of material parameters. We can thus approximate the deformed pattern by simulating a simplified version of the pattern with the computed material parameters.
In this poster, we present a heterogeneous architecture for estimating 6D object pose from RGB images. First, we use a two-stream network to extract robust 3D-to-2D embedding feature correspondences; the segmentation stream processes the RGB information and spatial features individually. Then, we construct a fusion network to couple color and positional features and predict the locations of keypoints in the regression stream. The pose is then obtained by an efficient RANSAC-based PnP algorithm. Moreover, we design an end-to-end iterative pose refinement procedure that further improves the reliability of the pose estimate. Our method outperforms state-of-the-art approaches on two public datasets.
Although artists’ actions in photo retouching appear to be highly nonlinear in nature and very difficult to characterize analytically, we find that the net effect of interactively editing a mundane image to a desired appearance can be modeled, in most cases, by a parametric, monotonically non-decreasing global tone mapping function along the luminance axis and by a global affine transform in the chrominance plane. This allows us to greatly simplify the existing CNN methods for mimicking artists in photo retouching, and to design a new artful image regeneration network (AIRNet). The objective of AIRNet is to learn the image-dependent parameters of the luminance tone mapping function and the affine chrominance transform, rather than learning the end-to-end pixel-level mapping as in the standard practice of current CNN methods for image restoration and enhancement. The proposed approach reduces the complexity of the neural network by two orders of magnitude and, as a side benefit, also improves the robustness and the generalization capability at the inference stage.
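The two-part model can be sketched per pixel: a monotone global tone curve on luminance plus an affine map on the chrominance plane. The gamma curve and the YCbCr-style channel names below are illustrative stand-ins for the learned parametric family, not AIRNet's actual parameterization.

```python
def retouch(y, cb, cr, gamma, chroma_affine):
    """Global retouch model: a monotone tone curve on luminance y
    (a gamma curve here, standing in for the learned parametric
    family) and an affine transform (a, b, c, d, tx, ty) on the
    chrominance plane. Channel names follow a YCbCr-style split."""
    y2 = y ** gamma                    # monotone for y in [0, 1], gamma > 0
    a, b, c, d, tx, ty = chroma_affine
    cb2 = a * cb + b * cr + tx
    cr2 = c * cb + d * cr + ty
    return y2, cb2, cr2
```

A network that predicts only `gamma` (or a richer monotone curve) and the six affine coefficients per image is orders of magnitude smaller than one predicting every output pixel, which is the complexity reduction the abstract describes.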
We present a new spatially-varying dynamic range compression algorithm for high dynamic range (HDR) images based on bound-constrained optimization using soft constraints. Rather than explicitly attenuating gradients as in previous work, we minimize an objective function to instead compute a globally optimal manipulation of input pixel differences. Our framework provides simple yet effective preservation of visually important image properties, such as order statistics and global consistency, that requires little to no parameter tuning. Our results are free of haloing, washout, and other artifacts, while retaining detail across the image’s full range. The speed of our algorithm and flexibility of the constraint framework allows our method to be easily extended to video.
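The idea of manipulating pixel differences rather than pixels can be shown in one dimension, where reintegrating scaled differences is an exact running sum (in 2-D the differences are over-determined, which is why the paper solves a soft-constrained optimization instead). A minimal sketch, with illustrative names:

```python
def compress_1d(luminance, alpha):
    """1-D illustration of dynamic range compression via manipulated
    pixel differences: scale each difference by alpha < 1, then
    reintegrate with a running sum. The paper's 2-D formulation
    instead finds a globally optimal set of output differences under
    soft constraints."""
    out = [luminance[0]]
    for prev, cur in zip(luminance, luminance[1:]):
        out.append(out[-1] + alpha * (cur - prev))
    return out
```

Note that scaling differences by any positive `alpha` never reorders pixels, which is the order-statistics preservation property the abstract highlights.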
We propose a GAN (Generative Adversarial Network)-based drawing board that takes semantic input (via segmentation) and color-tone input (via strokes) from users and automatically generates paintings. Our approach is built on a novel, lightweight feature embedding that incorporates colorization effects into the painting generation process. Unlike existing GAN-based image generation models that take semantic input, our drawing board can edit local colors after generation. Our method samples color information from users’ strokes as extra input and feeds it into a GAN model for conditional generation, enabling the creation of pictures or paintings with semantic and color control in real time.
As a new method that can contribute to 3D shape analysis, we propose Incremental Contour Flow (ICF), which divides a 3D object into segments separated by boundary surfaces at narrow parts. ICF exploits a property of the distance transform: it produces local maxima at the center of swollen parts and bottleneck-like structures at narrow parts. The core of a segment is defined as a local-maximum layer, one whose value is higher than the surrounding layers in the layer structure obtained by converting the distance values to integers. Voxels in the layers surrounding a core are added to it incrementally, thereby grouping the 3D voxels into as many segments as there are cores. As shown in an example to be discussed, ICF can generate combined, soap-bubble-like boundary surfaces from a 3D object formed by merging three spheres. Since human organs tend to be distinguishable by their shape in medical 3D images, ICF can be useful in medical imaging applications.
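A 1-D analogue conveys the mechanism: integerized distance values form layers, strict local maxima become cores, and each core grows outward layer by layer until segments meet at a bottleneck. This is our own simplified illustration (plateau maxima and true 3-D voxel connectivity are omitted), not the paper's algorithm.

```python
def icf_segment_1d(dist):
    """1-D sketch of Incremental Contour Flow: integerize distance
    values into layers, take strict local maxima as cores, then grow
    each core outward from high layers down to low ones. A bottleneck
    (local minimum of distance) becomes the segment boundary."""
    layers = [int(d) for d in dist]
    n = len(layers)
    seg = [0] * n                      # 0 = not yet assigned
    label = 0
    for i in range(n):                 # strict local maxima become cores
        left = layers[i - 1] if i > 0 else -1
        right = layers[i + 1] if i < n - 1 else -1
        if layers[i] > left and layers[i] > right:
            label += 1
            seg[i] = label
    # Grow: sweep layer levels downward, attaching unassigned positions
    # to an already-labelled neighbour in an equal-or-higher layer.
    for level in range(max(layers), -1, -1):
        changed = True
        while changed:
            changed = False
            for i in range(n):
                if seg[i] == 0 and layers[i] == level:
                    for j in (i - 1, i + 1):
                        if 0 <= j < n and seg[j] != 0 and layers[j] >= level:
                            seg[i] = seg[j]
                            changed = True
                            break
    return seg
```

On a profile with two swollen parts joined by a narrow neck, the two cores grow until they split the neck, which is the 1-D counterpart of the soap-bubble-like boundary surface between merged spheres.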
Recently, numerous studies have utilized deep-learning-based approaches to detect anomalies in surveillance cameras. However, while several of these studies used motion features to detect abnormal situations, detection problems can arise due to the sparse information and irregular patterns in certain abnormal situations. We propose a means of preserving motion patterns in abnormal situations through a network called MA-Net, which solves representation problems caused by a loss of sparse information and irregular patterns. We show through experiments that the proposed method is superior to state-of-the-art methods.
We present a polarization-based method for synthesizing reflection-contaminated images, which provides an adequate, diverse, and authentic training dataset. Meanwhile, we enhance the neural network by introducing the reflection information as guidance and by using adaptive convolution kernel sizes to fuse multi-scale information. We demonstrate that the proposed approach achieves convincing improvements over the state of the art.
Out-of-focus points of light, when obscured by out-of-focus occluders, become “eclipsed” by a sharply-focused traveling occlusion edge which can move at a speed different from that of the occluder, and even in the opposite direction. The phenomenon can produce interesting visual effects in photos and motion pictures.
Establishing a robust measure for material similarity that correlates well with human perception is a long-standing problem. A recent work presented a deep learning model trained to produce a feature space that aligns with human perception, built from gathered human subjective measures. The resulting metric outperforms existing objective ones. In this work, we aim to understand whether this increased performance is a result of using human perceptual data or is due to the nature of the features learned by deep learning models. We train similar networks on objective measures (BRDF similarity or a classification task) and show that these networks can predict human judgements as well, suggesting that the non-linear features learned by convolutional networks might be a key to modeling material perception.
Much hidden information lies in sequential data and its ordering. We propose a model for learning visual representations by solving an order prediction task. We concatenate frame pairs instead of feature pairs; this concatenation makes it possible to apply a 3D CNN to extract features from the frame pairs. We also propose a new grouping, which achieved 80% accuracy on average. We modify the shuffled-video-clip order prediction task into shuffled-frame order prediction by selecting one frame from each clip at random, and then solve this task with our model.
We present a novel method for fast and accurate ellipse detection based on an efficient arc grouping strategy. We first extract edges from the input image and then obtain smooth arcs by recognizing sharp turns and inflexion points. To speed up ellipse generation, we group arcs by three intuitive yet efficient rules, followed by a validation step and a more discriminative clustering scheme to further improve accuracy. Our approach achieves promising results on both synthetic data and three real-world datasets.
Depth of Field (DoF) in games is usually achieved as a post-process effect by blurring pixels in the sharp rasterized image based on the defined focus plane. This paper describes a novel real-time DoF technique that uses ray tracing with image filtering to achieve more accurate partial occlusion semi-transparencies on edges of blurry foreground geometry. This hybrid rendering technique leverages ray tracing hardware acceleration as well as spatio-temporal reconstruction techniques to achieve interactive frame rates.
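The blur radius that both post-process and ray-traced DoF techniques start from is the thin-lens circle of confusion. The sketch below uses the standard formula with our own naming and unit conventions; the paper's hybrid ray-traced filtering is of course more involved than this scalar computation.

```python
def coc_diameter(focal_len, f_stop, focus_dist, obj_dist):
    """Thin-lens circle-of-confusion diameter on the sensor, with all
    distances in the same units. Objects on the focus plane map to a
    point; the diameter grows with defocus and with wider apertures
    (smaller f-stop numbers)."""
    aperture = focal_len / f_stop
    return aperture * (focal_len / (focus_dist - focal_len)) \
                    * abs(obj_dist - focus_dist) / obj_dist
```

A renderer converts this diameter into a per-pixel blur kernel radius; in the hybrid approach, rays are additionally traced behind blurry foreground edges where the rasterized image carries no background information.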
For a foreground object in motion, details of its background which would otherwise be hidden are uncovered through its inner blur. This paper presents a novel hybrid motion blur rendering technique combining post-process image filtering and hardware-accelerated ray tracing. In each frame, we advance rays recursively into the scene to retrieve background information for inner blur regions and apply a post-process filtering pass on the ray-traced background and rasterized colour before compositing them together. Our approach achieves more accurate partial occlusion semi-transparencies for moving objects while maintaining interactive frame rates.
To help understand sound propagation in a real space, we propose a mixed reality (MR) system for visualizing instantaneous sound intensity, measured at each measurement point by moving a handheld microphone array in synchronization with sound reproduction. By visualizing the measured instantaneous sound intensities with an MR animation, we can observe the temporal flow of sound energy in the real space and thereby understand the propagation of sound waves and acoustic properties such as sound reflection.
In this paper, we propose a measurement and visualization system for spatial impulse responses that uses a moving handheld microphone and Mixed Reality (MR). By extending an existing MR-based sound intensity visualization system, the proposed system visualizes spatial impulse responses estimated from the signal measured by the moving microphone. As the visualization of the sound field varies with time, it is effective for understanding the relationship between a complicated sound field, including reflected sounds, and the reflecting objects in a room.
In this research, we present Visum, an immersive audio/visual application that uses visual perception, via an interface, to create sound and visual objects. Visual evoked potentials (VEPs) are changes in brainwave electrical activity produced when a visual stimulus is presented to an observer. Here, VEPs are generated by observing optical illusions and are brought into the environment as three-dimensional sound and visual objects. The VEP objects can be played as an instrument via a user-controlled interface offering mixing and parameter control, creating evolving, dynamic sounds and visuals. Visum is created for artistic and educational purposes.