SIGGRAPH '23: ACM SIGGRAPH 2023 Posters

Full Citation in the ACM Digital Library

SESSION: Animation & Simulation

3D Character Motion Authoring From Temporally-Sparse Photos

This paper presents a neural network-based learning approach that enables seamless generation of 3D human motion in-between photos to accelerate the process of 3D character motion authoring. This new approach allows users to freely edit (replace, insert, or delete) input photos and specify the transition length to generate a kinematically coherent sequence of 3D human poses and shapes in-between the given photos. We demonstrate through qualitative and subjective evaluations that our approach is capable of generating high-fidelity, natural 3D pose and shape transitions.

Art Simulates Life: 3D Visualization Takes Pediatric Hospitalist Simulations to the Next Level

Children’s National Hospital is a leading pediatric teaching hospital with medical students, residency programs, fellowships, and research initiatives. We develop virtual 3D patients to improve medical training by simulating plausible, life-threatening scenarios in infants and children for Pediatric Hospitalists. Medical simulation is widely used to enhance the readiness of medical professionals [Davila and Price 2023; Motola et al. 2013]. Children’s National’s Division of Pediatric Hospital Medicine’s Resuscitation Simulation Team (HRST) leads training for attending Hospitalists responsible for the inpatient care of sick and injured children across our entire regional network. Our 3D patient simulations provide realistic depictions of nuanced physical findings that are essential for identifying uncommon yet life-threatening medical conditions in children. 3D models applicable for pediatric simulations are underrepresented compared to those used for adult patients.

Content-Preserving Motion Stylization using Variational Autoencoder

This work proposes a motion style transfer network that transfers motion style between different motion categories using variational autoencoders. The proposed network effectively transfers style among various motion categories and can create stylized motion unseen in the dataset. The network contains a content-conditioned module to preserve the characteristic of the content motion, which is important for real applications. We implement the network with variational autoencoders, which enable us to control the intensity of the style and mix different styles to enrich the motion diversity.

Improved Projective Dynamics Global Using Snapshots-based Reduced Bases

Learning Human-like Locomotion Based on Biological Actuation and Rewards

We propose a method for learning a human-like locomotion policy via deep reinforcement learning based on a human anatomical model, muscle actuation, and biologically inspired rewards, without any inherent control rules or reference motions. Our main ideas are to provide a dense reward based on metabolic energy consumption at every step during the initial stages of learning and then transition to a sparse reward as learning progresses, and to adjust the initial posture of the human model to facilitate the exploration of locomotion. Additionally, we compare and analyze differences in learning outcomes across settings other than the proposed one.
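
A minimal sketch of the dense-to-sparse reward schedule described above is given below. The blend fraction, energy scale, and success bonus are illustrative assumptions, not values from the poster.

```python
def locomotion_reward(step, total_steps, metabolic_energy, reached_goal,
                      dense_phase=0.3, energy_scale=1e-3, success_bonus=10.0):
    """Blend a dense metabolic-energy penalty with a sparse task reward.

    Early in training the dense term dominates, giving the agent a learning
    signal at every simulation step; as training progresses the sparse
    success reward takes over. All constants are illustrative assumptions.
    """
    progress = min(step / (dense_phase * total_steps), 1.0)
    dense = -energy_scale * metabolic_energy      # paid at every step
    sparse = success_bonus if reached_goal else 0.0
    return (1.0 - progress) * dense + progress * sparse
```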

Learning to Simulate Crowds with Crowds

Controlling agent behaviors with Reinforcement Learning is of continuing interest in multiple areas. One major focus is to simulate multi-agent crowds that avoid collisions while locomoting to their goals. Although avoiding collisions is important, it is also necessary to capture realistic anticipatory navigation behaviors. We introduce a novel methodology that includes: 1) an RL method for learning an optimal navigational policy, 2) position-based constraints for correcting policy navigational decisions, and 3) a crowd-sourcing framework for selecting policy control parameters. Based on optimally selected parameters, we train a multi-agent navigation policy, which we demonstrate on crowd benchmarks. We compare our method to existing works, and demonstrate that our approach achieves superior multi-agent behaviors.
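
The position-based constraint step (2) could, for instance, resolve residual overlaps after the policy proposes a move. The sketch below shows one such correction in the spirit of position-based dynamics; the agent radius and iteration count are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def resolve_agent_overlaps(positions, radius=0.3, iterations=4):
    """Push overlapping agent pairs apart symmetrically (PBD-style).

    positions: (N, 2) array of agent positions after the policy step.
    Returns corrected positions in which no two agents end up closer
    than 2 * radius (within the given number of iterations).
    """
    p = positions.astype(float).copy()
    min_dist = 2.0 * radius
    for _ in range(iterations):
        for i in range(len(p)):
            for j in range(i + 1, len(p)):
                d = p[j] - p[i]
                dist = np.linalg.norm(d)
                if 1e-8 < dist < min_dist:
                    corr = 0.5 * (min_dist - dist) * d / dist
                    p[i] -= corr   # split the correction between both agents
                    p[j] += corr
    return p
```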

SESSION: Art & Design

Computational Design of Nebuta-like Paper-on-Wire Artworks

"F O R M S" - Creating new visual perceptions of dance movement through machine learning

"FORMS" is a new digital art concept that combines the fields of dance movement and machine learning techniques, specifically human pose detection, to create a real-time and interactive visual experience. This project aims to explore the relationship between dance and visual art by creating a framework that generates abstract and literal visual models from the dancers’ movements. The main objective of this project is to enhance the perception of dance movement by providing a new layer of visual composition. The proposed framework provides different visual forms based on human pose detection, creating a novel and real-time visual expression of the dance movement. The human pose detection model used in this project is based on state-of-the-art deep learning techniques, which analyze the positions and movements of different parts of the human body in real-time. This model allows the framework to capture movements of the dancers and translate them into unique visual forms. The case study showcases the potential of "FORMS" by demonstrating how professional young dancers can use the framework to enrich their performance and create new visual perceptions of dance movement. This study contributes to the cultivation of body awareness, understanding of the dance movement and overall enrichment of the art experience. The use of machine learning techniques showcases the potential of technology to enhance and expand the boundaries of artistic expression. The "FORMS" project is a novel and interdisciplinary approach that bridges the fields of art and technology, providing a new way to experience and perceive the dance movement.

Image Printing on Stones, Wood, and More

Metro Re-illustrated: Incremental Generation of Stylized Paintings Using Neural Networks

Metro Re-illustrated is a project that explores incremental generation of stylized paintings of city metro maps using neural networks. It begins with an interactive system for labeling time-series data on city metro maps and generating reference images. These images are fed into a neural painter that incrementally generates oil painting-like strokes on virtual canvases. The generated paintings demonstrate blending and layering features of oil paintings while capturing the progressive nature of urban development.

Palette-Based Colorization for Vector Icons

The Talk: Speculative Conversation with Everyday Objects

Communication between humans and everyday objects is often implicit. In this paper, we create speculative scenarios where users can have a conversation, as explicit communication, with everyday objects that do not usually have conversational artificial intelligence (AI) installed, such as a book. We present a design fiction narrative and a conversational system for conversations with everyday objects. Our user test showed positive acceptance of conversation with everyday objects, and users tried to build human-like relationships with them.

SESSION: Augmented & Virtual Reality

Acute Stress Disorder Therapy by Virtual Reality: a Case Study of Ukrainian Refugees

AI-Assisted Avatar Fashion Show: Word-to-Clothing Texture Exploration and Motion Synthesis for Metaverse UGC

An investigation of changes in taste perception by varying polygon resolution of foods in virtual environments

In recent years, the metaverse has received considerable attention. We believe that as this technology develops, people will be able to dine in a virtual space while maintaining a sense of immersion. We therefore investigated whether the taste of food is influenced by the polygon count of CG models using VR/AR technology. We created CG models and overlaid them onto the actual food via an HMD. The subjects then consumed the food with the CG image overlaid and answered a questionnaire. Results showed that the higher the polygon count, the less hardness was perceived, and that a toon-like model was more likely to affect the taste.

BStick: Hand-held Haptic Controller for Virtual Reality Applications

This study proposes BStick-Mark2, a handheld virtual reality (VR) haptic controller that monitors and controls input from five fingers in real time with pressure sensors and linear motors. BStick-Mark2 is designed and fabricated so that users can use both hands along with a head-mounted display (HMD) while roaming freely, thanks to Bluetooth connectivity. When a user holds the haptic controller, the data from the five pressure sensors are transmitted to a microcontroller unit (MCU), which independently controls the movements of the five linear motors. The pressure and position data of the linear motors are sent through a Bluetooth module embedded in the controller to a computer connected to a VR display, where they drive interaction with virtual objects and virtual hand movements in the Unity game engine. BStick-Mark2 can withstand 22 N of force per finger, enough to sustain the pressing force of a male user's finger, and is compact enough for users to handle easily. It produces sensations of grabbing and control while the user interacts with VR content.

Down the Rabbit Hole: Experiencing Alice in Wonderland Syndrome through Virtual Reality

Alice in Wonderland Syndrome (AIWS) is a rare perceptual disorder affecting visual processing, the perception of one's body, and the experience of time. The condition can be congenital or result from various insults to the brain. There is growing interest in AIWS as a window into how different areas of the brain work together to construct reality. We developed a virtual reality (VR) simulation of this condition as a psychoeducational tool that lets students in the psychological and medical sciences, as well as caregivers, experience the perceptual distortions common in AIWS and reflect on the nature of perception.

Exploring Multiple-Display Interaction for Live XR Performance

Although VR concerts offer audiences a unique way to watch a performance, people still tend to attend live performances in person because the co-presence and shared experience are difficult to perceive in VR. To address this issue, we propose Actualities, a live XR performance system that integrates onsite and online concerts to create a seamless experience across multiple displays. Our system utilizes various sensors to detect signals from musical instruments and onsite audiences, digitalizing onsite performance elements into a virtual world. We project the visuals onto screens and live-stream the content for audiences to watch on various devices, and we also designed several interactive elements that let the audience interact with the public display. To evaluate the system, we conducted exploratory research to refine it and improve the cross-reality experience.

Mixed Reality Racing: Combining Real and Virtual Motorsport Racing

In this work, we propose a proof-of-concept system that combines real and virtual racing, allowing professional racing drivers and e-racers to compete in real time. The real race car is imported into a racing game, and the virtual game car is exported into the real world in AR. This allows e-racers to compete against professional racing drivers and gives the audience an exciting experience from the grandstand through AR. Our system was deployed during an actual racing event, and we gathered initial feedback from users. Both e-racers and the audience expressed excitement about our system, indicating strong potential for this new form of mixed reality racing.

Mixed Reality Visualization of Room Impulse Response Map using Room Geometry and Physical Model of Sound Propagation

In this paper, an MR visualization method based on sound field modeling is proposed. Using a small quantity of measurement data, the sound field was modeled using equivalent sources and room shapes acquired via SLAM. From the modeled sound field, the estimated room impulse responses at the target grid points were then animated to visualize the sound field using MR technology. Consequently, the animation of the sound field in MR clearly represented how sound propagates, including reflections.
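
For context, a standard equivalent-source formulation (the poster does not spell out its exact model) approximates the pressure at a point \(\mathbf{r}\) as a sum of free-field monopoles placed near the room boundaries recovered via SLAM:

\[
\hat{p}(\mathbf{r},\omega) \;=\; \sum_{k=1}^{K} q_k(\omega)\,
\frac{e^{-j\kappa\lVert \mathbf{r}-\mathbf{r}_k\rVert}}{4\pi\lVert \mathbf{r}-\mathbf{r}_k\rVert},
\qquad \kappa = \frac{\omega}{c},
\]

where the source strengths \(q_k(\omega)\) are fitted to the sparse measurements (e.g., by regularized least squares). Evaluating \(\hat{p}\) on the target grid and inverse-transforming over frequency yields the estimated room impulse responses that are then animated in MR.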

Redirected Walking in Overlapping Rooms

Walking in a virtual environment that is larger than the physical one can lead to collisions with physical boundaries. Multiple locomotion techniques, such as Redirected Walking (RDW) and Overlapping Architecture (OA), aim to overcome this limitation. Combining the two has yet to be investigated in large physical spaces with resets. In this work, a hybrid locomotion method was implemented that combines RDW and OA. A user study was conducted in which participants collected items in a virtual environment with multiple rooms. The study showed that the distance walked between resets increased substantially, demonstrating the clear advantages of combining OA and RDW.

Towards Realistic Virtual Try-on for E-commerce by Sewing Pattern Estimation

SESSION: Geometry & Modeling

Automatic Architectural Floorplan Reconstruction

High-resolution 3D Reconstruction with Neural Mesh Shading

Neural Shape Diameter Function for Efficient Mesh Segmentation

Virtual Manipulation of Cultural Assets: An Initial Case Study with Single-Joint Articulated Models

Virtual space can eliminate the physical constraints of real space. Three-dimensional (3D) digital twins enable us to experience manipulating cultural assets that are inaccessible in real space. However, most 3D models obtained using current reconstruction techniques are static; they cannot be manipulated dynamically. In this study, we reproduce a dynamic 3D model of a simple articulated object with a rotating joint from a static point cloud, using only a video of the motion and a manually added rotation axis as reference. The reconstructed cultural asset is then used for first-person experiences in an augmented reality environment.
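
A minimal sketch of the core operation, rotating the movable subset of the scanned point cloud about the manually specified axis, is shown below; the array names and the way joint angles are obtained from the video are illustrative assumptions, not the paper's code.

```python
import numpy as np

def rotate_about_axis(points, pivot, axis, angle):
    """Rotate an (N, 3) point set by `angle` radians about the line through
    `pivot` with direction `axis`, using Rodrigues' rotation formula."""
    k = np.asarray(axis, dtype=float)
    k /= np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    return (points - pivot) @ R.T + pivot

# Sweep the movable part through joint angles recovered from the video:
# for theta in joint_angles:
#     posed = rotate_about_axis(movable_points, axis_point, axis_dir, theta)
```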

SESSION: Images, Video, & Computer Vision

Camouflage via Coevolution of Predator and Prey

DAncing body, Speaking Hands (DASH): Sign Dance Generation System with Deep Learning

Deformable Neural Radiance Fields for Object Motion Blur Removal

In this paper, we present a novel approach to remove object motion blur in 3D scene renderings using deformable neural radiance fields. Our technique adapts the hyperspace representation to accommodate shape changes induced by object motion blur. Experiments on Blender-generated datasets demonstrate the effectiveness of our method in producing higher-quality images with reduced object motion blur artifacts.

Efficient 3D Reconstruction of NeRF using Camera Pose Interpolation and Photometric Bundle Adjustment

Improved Automatic Colorization by Optimal Pre-colorization

Automatic line-drawing colorization of anime characters is a challenging problem in computer graphics. The previous fully automatic colorization method suffers from limited accuracy and requires colorization artists to validate and correct errors. We propose to improve colorization accuracy by introducing a “pre-colorization” step into our production pipeline, which asks the user to manually colorize partial regions of the line drawings before running automatic colorization. The pre-colorized regions serve as clues for colorizing the other regions and improve colorization accuracy. We find an optimal region to pre-colorize to obtain the best automatic colorization performance.

Photo-Realistic Streamable Free-Viewpoint Video

We present a novel free-viewpoint video (FVV) framework for capturing, processing, and compressing volumetric content for immersive VR/AR experiences. Compared to previous FVV capture systems, we propose an easy-to-use multi-camera array consisting of time-synchronized mobile phones. To generate photo-realistic FVV results from sparse multi-camera input, we improve novel view synthesis by introducing a visual-hull-guided neural representation, called VH-NeRF. VH-NeRF combines the advantages of explicit models from traditional 3D reconstruction with the implicit representation of Neural Radiance Fields. Each dynamic entity’s VH-NeRF is learned under supervision from the reconstructed visual hull data and can be further edited for complex, large-scale dynamic scenes. Moreover, our FVV solution supports effective compression and transmission of multi-perspective videos, as well as real-time rendering on consumer-grade hardware. To the best of our knowledge, our work is the first solution for photo-realistic FVV captured by a sparse multi-camera array that allows real-time live streaming of large-scale dynamic scenes for immersive VR and AR applications on mobile devices.

Point Anywhere: Directed Object Estimation from Omnidirectional Images

One of the intuitive instruction methods in robot navigation is a pointing gesture. In this study, we propose a method using an omnidirectional camera to eliminate the user/object position constraint and the left/right constraint of the pointing arm. Although the accuracy of skeleton and object detection is low due to the high distortion of equirectangular images, the proposed method enables highly accurate estimation by repeatedly extracting regions of interest from the equirectangular image and projecting them onto perspective images. Furthermore, we found that training the likelihood of the target object in machine learning further improves the estimation accuracy.
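
The repeated reprojection from the equirectangular image to perspective views follows standard omnidirectional-camera geometry. The sketch below shows one way such a region-of-interest view can be extracted; the yaw, pitch, and field of view would be derived from the detected skeleton or candidate object region, which is not shown here.

```python
import numpy as np
import cv2

def equirect_to_perspective(equi, yaw, pitch, fov_deg, out_size=512):
    """Render an out_size x out_size perspective view from an equirectangular
    image, looking toward (yaw, pitch) in radians with the given field of view."""
    H, W = equi.shape[:2]
    f = np.tan(np.radians(fov_deg) / 2.0)
    xs = (np.arange(out_size) + 0.5) / out_size * 2.0 - 1.0
    u, v = np.meshgrid(xs, xs)
    dirs = np.stack([u * f, -v * f, np.ones_like(u)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate the camera rays by pitch (about x) and then yaw (about y).
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    R_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    R_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    dirs = dirs @ (R_yaw @ R_pitch).T
    # Map ray directions to equirectangular pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    map_x = ((lon / (2.0 * np.pi) + 0.5) * W).astype(np.float32)
    map_y = ((0.5 - lat / np.pi) * H).astype(np.float32)
    return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_WRAP)
```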

Robust Color Correction for Preserving Spatial Variations within Photographs

SegAnimeChara: Segmenting Anime Characters Generated by AI

This work introduces SegAnimeChara, a novel system for transforming AI-generated anime images into game characters while retaining their unique features. Using volume-based body pose segmentation, SegAnimeChara can efficiently segment body parts from generated images in a zero-shot manner, based on the OpenPose human skeleton. Furthermore, the system integrates a semantic segmentation pipeline based on the text prompts of the existing Text2Image workflow. The system preserves the game character’s unique outfit and reduces redundant duplicate text prompts for semantic segmentation.

Smart Scaling: A Hybrid Deep-Learning Approach to Content-Aware Image Retargeting

Updating Human Pose Estimation using Event-based Camera to Improve Its Accuracy

Utilizing LiDAR Data for 3D Sound Source Localization

This paper introduces a visualization system for 3D sound pressure distribution that uses a minimum variance distortionless response (MVDR) beamformer with Light Detection and Ranging (LiDAR) technology to estimate sound source locations. Using LiDAR to capture 3D data, the proposed system calculates the time-averaged output power of the MVDR beamformer at the virtual source position for each point in the point cloud data. The results are then superimposed onto the 3D data to estimate sound sources. The proposed system provides a more visually comprehensible display of the sound pressure distribution in 3D.
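
For reference, the time-averaged output power of a standard MVDR beamformer steered to a candidate (virtual) source position \(\mathbf{r}_s\) is

\[
P(\mathbf{r}_s) \;=\; \frac{1}{\mathbf{a}^{H}(\mathbf{r}_s)\,\mathbf{R}^{-1}\,\mathbf{a}(\mathbf{r}_s)},
\qquad
\mathbf{w}_{\mathrm{MVDR}} \;=\; \frac{\mathbf{R}^{-1}\,\mathbf{a}(\mathbf{r}_s)}{\mathbf{a}^{H}(\mathbf{r}_s)\,\mathbf{R}^{-1}\,\mathbf{a}(\mathbf{r}_s)},
\]

where \(\mathbf{R}\) is the spatial covariance matrix of the microphone signals and \(\mathbf{a}(\mathbf{r}_s)\) is the steering vector toward \(\mathbf{r}_s\). This is the textbook formulation; in the proposed system such a power value is evaluated with each LiDAR point taken as the virtual source position and superimposed onto the point cloud.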

SESSION: Interactive Techniques

A Proposal of Acquiring and Analyzing Method for Distributed Litter on the Street using Smartphone Users as Passive Mobility Sensors

With the increase in environmental protection activities, smartphone-enabled cleaning activities that deter street littering are gaining attention. We propose a method for analyzing litter-on-road images captured by a smartphone camera mounted on a bicycle, requiring no conscious effort from the user (Fig. 1). First, the user mounts the smartphone on a bicycle and starts the developed application, which captures video and extracts still images. The still images are then categorized using machine learning, and the type of trash is annotated in the images. Finally, to predict the distribution of trash, the influence of the surrounding environment, such as convenience stores and bars, is estimated using the machine learning model. This paper discusses the efficacy of our developed system for acquiring and analyzing litter data on the road. As a first effort, we verify the accuracy of tagging PET bottles, cans, food trays, and masks using a learning model generated with Detectron2.
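
The abstract names Detectron2 for tagging PET bottles, cans, food trays, and masks; a minimal inference sketch under that assumption might look as follows. The config choice, weight file, and class count are illustrative, not the authors' exact setup.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# Start from a standard COCO instance-segmentation config, then point it at a
# hypothetical model fine-tuned on the four litter classes.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "litter_model_final.pth"    # hypothetical fine-tuned weights
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4             # PET bottle, can, food tray, mask
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
frame = cv2.imread("street_frame.jpg")          # still image extracted from the ride
instances = predictor(frame)["instances"].to("cpu")
print(instances.pred_classes, instances.scores)
```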

Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production

ExudedVestibule: Enhancing Mid-air Haptics through Galvanic Vestibular Stimulation

This study presents a novel system that enhances air cannon tactile perception using synchronous galvanic vestibular stimulation (GVS). We conducted a user study with a within-subjects design to evaluate the enhancement effects of synchronous GVS on air cannon tactile sensations across multiple body locations. Results demonstrated significant improvements without affecting the magnitude of physical body sway, suggesting potential applications in virtual reality, particularly for augmenting existing air vortex ring haptics use cases.

sPellorama: An Immersive Prototyping Tool using Generative Panorama and Voice-to-Prompts

We propose sPellorama, an immersive tool that enables Virtual Reality (VR) content creators to quickly prototype scenes from verbal input. The system first converts voice input to text, then uses a text-guided panorama generation model to produce the described scene. The panorama is then applied to a Skybox in Unity. Previously generated panoramas are preserved in the photosphere and remain ready to be viewed. A pilot study shows that our tool can enhance the process of discussion and prototyping for VR content creators.

Tidd: Augmented Tabletop Interaction Supports Children with Autism to Train Daily Living Skills

Children with autism may have difficulties in learning daily living skills due to repetitive behavior, which poses a challenge to their independent living training. Previous studies have shown the potential of using interactive technology to help children with autism train daily living skills. In this poster, we present Tidd, an interactive device based on desktop augmented reality projection, designed to support children with autism in daily living skills training. The system combines storytelling with Applied Behavior Analysis (ABA) therapy to scaffold the training process. A pilot study was conducted on 13 children with autism in two autism rehabilitation centers. The results showed that Tidd helped children with autism learn bed-making and dressing skills while engaging in the training process.

SESSION: Rendering & Displays

Crossed half-silvered Mirror Array: Fabrication and Evaluation of a See-Through Capable DIY Crossed Mirror Array

Crossed mirror arrays (CMAs) have recently been employed in simple retinal projection augmented reality (AR) devices owing to their wide field of view and nonfocal nature. However, they remain inadequate for AR devices for everyday use owing to the limited visibility of the physical environment. This study aims to enhance the transmittance of the CMA by fabricating it with half-silvered acrylic mirrors. Further, we evaluated the transmittance and quality of the retinal display. The proposed CMA successfully achieved sufficient retinal projection and higher see-through capability, making it more suitable for use in AR devices than conventional CMAs.

Efficient Rendering of Glossy Materials by Interpolating Prefiltered Environment Maps based on Primary Normals

We propose a method to speed up the rendering of glossy surfaces, including anisotropic reflection, using prefiltering. The key idea is to prefilter an environment light map for multiple primary normals. Furthermore, we propose an interpolation method to smoothly connect the boundaries created by switching between the prefiltered environment maps. As a result, we are able to render objects with glossy surfaces more accurately, even in real time.
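
One plausible reading of the interpolation step (the poster does not give the exact weighting) is a normalized blend of maps prefiltered for a small set of primary normals \(\mathbf{n}_i\):

\[
L_o(\mathbf{x},\mathbf{n},\mathbf{v}) \;\approx\; \sum_{i} w_i(\mathbf{n})\,
\tilde{L}_i\!\big(\mathbf{r}(\mathbf{n},\mathbf{v})\big),
\qquad \sum_i w_i(\mathbf{n}) = 1,
\]

where \(\tilde{L}_i\) is the environment map prefiltered with the (possibly anisotropic) BRDF lobe for primary normal \(\mathbf{n}_i\), \(\mathbf{r}\) is the reflection-lookup direction, and the weights \(w_i\) fall off with the angular distance between \(\mathbf{n}\) and \(\mathbf{n}_i\), so that the result varies smoothly across the boundaries between neighboring primary normals.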

Fabrication of Edible lenticular lens

Lenticular lenses exhibit a color-changing effect that depends on the viewing angle and a vanishing effect in certain directions. In this study, we propose two fabrication methods for edible lenticular lenses: a mold-forming method and a knife-cutting method that uses a knife with the inverse structure of a lenticular lens, created with an SLA 3D printer. We also evaluate the properties of the end products. The IOR of the material is optimized using ray-tracing simulation.

Guided Training of NeRFs for Medical Volume Rendering

Neural Radiance Fields (NeRF) trained on pre-rendered photorealistic images represent complex medical data in a fraction of the size, while interactive applications synthesize novel views directly from the neural networks. We demonstrate a practical implementation of NeRFs for high resolution CT volume data, using differentiable rendering for training view selection.

Reverse Projection: Real-Time Local Space Texture Mapping

We present Reverse Projection, a novel projective texture mapping technique for painting a decal directly into the texture of a 3D object. Designed for games, the technique works in real time. Because the projection is computed in the texture's local space and is outward-looking, users on devices ranging from low-end Android phones to high-end gaming desktops can enjoy personalizing their assets. We believe the proposed pipeline is a step toward improving the speed and versatility of model painting.
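
A minimal texture-space sketch of the idea is shown below, assuming the world position of every texel has already been baked for the model; the array names and the blending rule are illustrative, not the paper's implementation. At runtime the same lookup could be evaluated per fragment in a shader; the NumPy version only makes the geometry explicit.

```python
import numpy as np

def bake_decal(texel_world_pos, decal_origin, decal_right, decal_up,
               decal_size, decal_rgba, base_texture):
    """Project a decal into a model's texture via decal-local coordinates.

    texel_world_pos : (H, W, 3) baked world position of each texel.
    decal_origin    : center of the decal rectangle in world space.
    decal_right/up  : unit vectors spanning the decal plane.
    decal_size      : (width, height) of the decal rectangle in world units.
    decal_rgba      : (h, w, 4) decal image with an alpha channel.
    base_texture    : (H, W, 3) texture to paint into.
    """
    rel = texel_world_pos - decal_origin
    u = rel @ decal_right / decal_size[0] + 0.5   # decal-local coordinates
    v = rel @ decal_up / decal_size[1] + 0.5
    inside = (u >= 0) & (u <= 1) & (v >= 0) & (v <= 1)

    dh, dw = decal_rgba.shape[:2]
    iu = np.clip((u * (dw - 1)).astype(int), 0, dw - 1)
    iv = np.clip(((1.0 - v) * (dh - 1)).astype(int), 0, dh - 1)
    src = decal_rgba[iv, iu].astype(np.float32)   # nearest-neighbour lookup

    alpha = (src[..., 3:4] / 255.0) * inside[..., None]
    out = base_texture.astype(np.float32)
    out = (1.0 - alpha) * out + alpha * src[..., :3]
    return out.astype(base_texture.dtype)
```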

The use of Containers in OpenGL, ML and HPC for Teaching and Research Support

We share our experience of using containers (Docker and Singularity) in computer graphics teaching and research support, as well as the use of the same containers in HPC. We use open-source, publicly available OpenISS sample containers for this purpose.

Toward Efficient Capture of Spatially Varying Material Properties

Improvements in the science and art of computer graphics rendering, particularly the shift in recent decades toward more physically driven models in both real-time and offline rendering, have motivated improvements in material models. However, real-world materials are often still significantly more complex in their observable light scattering than the shading models currently used to represent them in renderers. Representing these complexities at higher visible fidelity requires improved methods for material acquisition and representation, and one important area of continued study is the capture and representation of properties of spatially varying physical materials. We present developing efforts toward acquiring and representing those spatially varying properties, building on recent work on parameterization techniques to improve the efficiency of material acquisition.

VirtualVoxel: Real-Time Large Scale Scene Visualization and Modification

This short paper introduces VirtualVoxel, a novel rasterization-based voxel renderer that enables real-time rendering of voxel scenes at 4M³ resolution. VirtualVoxel combines the benefits of virtual texture and virtual geometry, allowing for efficient storage and rendering of texture and geometry information simultaneously.

Similar to the VirtualTexture approach, VirtualVoxel streams the appropriate level of detail into GPU memory, ensuring optimal performance. However, unlike previous graph-based voxel visualization methods, VirtualVoxel stores data linearly and employs the rasterization rendering pipeline, resulting in highly efficient modification of large scenes.

VirtualVoxel’s high-performance capabilities make it ideal for a wide range of applications, from video games to scientific visualization. It can render scenes at resolutions of up to 4M³ voxels in real time, providing artists and designers with a powerful new tool for creating stunning graphics.