SA '23: SIGGRAPH Asia 2023 Posters


3D Lighter: Learning to Generate Emissive Textures

We present a novel approach to generate emissive textures for luminous objects, using direct 3D supervision from a 3D model dataset. To this end, we construct Emissive Objaverse, a dataset based on the recently proposed Objaverse dataset, and propose 3D Lighter, a method using neural fields with generative latent optimization.
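The abstract does not include implementation details, but generative latent optimization commonly pairs one learnable latent code per training object with a shared decoder. A minimal PyTorch sketch of such a loop follows; the layer sizes, learning rate, and the per-point RGB emission output are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One learnable latent per training object, optimized jointly with a shared
# decoder that maps (3D point, latent) to an emission color. All sizes here
# are illustrative assumptions.
num_objects, latent_dim = 1000, 128
latents = nn.Parameter(torch.randn(num_objects, latent_dim) * 0.01)
decoder = nn.Sequential(
    nn.Linear(3 + latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),
)
opt = torch.optim.Adam([latents, *decoder.parameters()], lr=1e-3)

def train_step(obj_idx, xyz, target_emission):
    """One step on sampled surface points (xyz: (N, 3)) of a single object."""
    z = latents[obj_idx].expand(xyz.shape[0], -1)
    pred = decoder(torch.cat([xyz, z], dim=-1))
    loss = F.mse_loss(pred, target_emission)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```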

Aerial Display Method Using a Flying Screen with an IR Marker and Long Range Dynamic Projection Mapping

Our group previously proposed a method of aerial display that uses a flying screen suspended from a drone and dynamic projection mapping onto the screen. In this study, we propose two extensions of that work. First, we propose a large-screen structure with a specially designed LED marker. The proposed screen structure is suitable for suspension from a drone, and the LED marker's special structure allows the screen's center position to be estimated from the captured image. We demonstrated successful and stable projection of laser patterns onto the screen prototype while it was suspended from a flying drone; in this experiment, the distance between the screen and the projection system was approximately 16 m. Second, we propose a method that enables long-range dynamic projection mapping using a high-brightness projector. We also successfully demonstrated stable projection of a 2D character animation onto a manually moved screen approximately 16 m from the projection system.

AI-supported Nishijin-ori: connecting a text-to-image model to traditional Nishijin-ori textile production

An Examination of Text Shaking Correction Methods for AR Walking

One problem when walking with AR is reduced readability of displayed text: head shaking causes the displayed text to shake. Text can be displayed in either the screen coordinate system (SCS) or the world coordinate system (WCS), each effective at different viewing distances. We propose methods that correct text shaking by combining the SCS and WCS.
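The abstract does not specify the combination rule, so the following is only an illustrative sketch: blend a head-locked (SCS) target with a world-locked (WCS) anchor and low-pass filter the result. The blend weight, smoothing factor, and 1.5 m forward offset are hypothetical tuning values.

```python
import numpy as np

class TextStabilizer:
    """Illustrative only: mix a screen-locked target with a world-locked anchor,
    then exponentially smooth the result to suppress head-shake jitter."""

    def __init__(self, blend=0.5, smoothing=0.9, forward_offset=1.5):
        self.blend = blend                # 1.0 = purely SCS, 0.0 = purely WCS
        self.smoothing = smoothing        # higher = stronger low-pass filtering
        self.forward_offset = forward_offset
        self._pos = None

    def update(self, head_pos, head_forward, world_anchor):
        scs_target = np.asarray(head_pos) + self.forward_offset * np.asarray(head_forward)
        target = self.blend * scs_target + (1.0 - self.blend) * np.asarray(world_anchor)
        self._pos = target if self._pos is None else (
            self.smoothing * self._pos + (1.0 - self.smoothing) * target)
        return self._pos
```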

asmVR: VR-Based ASMR Experience with Multimodal Triggers for Mental Well-Being

Individuals throughout the world are besieged by anxiety and stress. Solutions to improve mental well-being vary, and the Autonomous Sensory Meridian Response (ASMR) is one method that has been shown to reduce stress. To that end, we introduce asmVR, a novel approach to enhancing the ASMR experience using multimodal triggers. Combining online and offline modes, asmVR enhances ASMR tingling, offering immersive VR environments and remote ASMRist embodiments. Initial user testing shows enhanced tingles and stress-relief potential, along with new possibilities for VR in psychological therapy.

Auditory VR Generative System for Non-Experts to Reproduce Human Memories Through Natural Language Interactions

We propose an automatic auditory VR generative system based on natural language input and attempt to apply it to VR exposure therapy, a promising treatment for Post-Traumatic Stress Disorder (PTSD). The system consists of a user interface built on a Large Language Model (LLM), an auditory event dataset with “subject” and “verb” metadata, and a spatial audio generator.

Augmentation of Medical Preparation for Children by Using Projective and Tangible Interface

This research aims to create an interactive experience that alleviates the anxiety of pediatric patients and fosters empathy within their families and medical communities. We developed a novel medical preparation system through the integration of projective and tangible interfaces. Through our work, children can intuitively understand their illnesses and medical procedures in an age-appropriate and engaging manner.

Avatars for Good Drinking: An Exploratory Study of The Effects of Avatar's Body Shape on Beverage Perception

The Proteus effect is the phenomenon in which the self-image evoked by an avatar influences the user's behavior. Previous studies have shown that the Proteus effect influences food-related behaviors, but it is not clear whether it influences perceptions of food. Therefore, in this study, we investigated whether the avatar’s body shape affects beverage perception when drinking a beverage in a virtual environment. Specifically, we measured the sense of body ownership toward the avatar, the taste of the beverage, the amount of beverage consumed, and the purchase intention for the beverage. The results showed that a gradual transition in body shape while drinking produced a significant improvement in the sense of body ownership, particularly in terms of feeling that the virtual body belonged to oneself. We also report that the larger body shape significantly increased purchase intention. Furthermore, we found that image congruence between the avatar and the cola induced better taste perception.

Closest Point Exterior Calculus

Conversation Echo: Communication in virtual environments that reflects conversation contents

In this study, we propose Conversation Echo, a system that reflects the topics of a conversation in a VR environment in real time. The method uses AI to convert speech into text, extract conversation topics, and generate panoramic images that form a VR environment, which changes dynamically in real time. It aims to realize an experience that sparks conversation topics and inspiration.

Crossing Narrative: Exploring the Possibilities of Crossing the Virtuality and Reality in Interactive Narrative Experiences

With the development of mixed reality technology, the convergence of virtual and real experiences has emerged as a trend in interactive narratives. In this paper, we introduce “Crossing Narrative”, an interactive narrative experience that seamlessly blends virtuality and reality by utilizing real-world views and bystanders. We discuss specific methods for designing cross-reality narrative experiences, focusing on three key aspects of cross-reality interaction: diegetic objects, transition effects, and bystander avatars. Our design methods aim to enhance the audience's social interactions and understanding of the immersive narrative.

Datamoshing with Optical Flow

Deep Albedo: A Spatially Aware Autoencoder Approach to Interactive Human Skin Rendering

Developing a Realistic VR Interface to Recreate a Full-body Immersive Fire Scene Experience

This paper describes a research project on the development of a VR fire training system. We aimed to create a multi-sensory experience that simulates a real-world fire scene. To achieve this, we developed a motion simulator to provide physical stimulation, haptic firefighting nozzles to provide a realistic sense of tool use, and a firefighting suit that can transmit the hot/cold sensations of VR content to the entire body. We evaluated firefighter and public satisfaction with the VR firefighting experience, and outlined future research challenges to improve usability in the field.

Efficient and Accurate Physically Based Rendering of Periodic Multilayer Structures with Iridescence

Exploring Embodiment and Usability of Autonomous Prosthetic Limbs through Virtual Reality

We propose the utilization of motion capture and immersive virtual reality to explore the embodiment and user perception associated with prosthetic limbs. We developed a virtual reality simulation in which a user controls an avatar with an amputated limb in first-person view using full-body motion capture, to experimentally investigate how the movement speed of an autonomous prosthetic limb affects its embodiment, usability, competence, warmth, and discomfort. In a within-subjects experiment, participants performed a reaching task using a virtual prosthetic lower arm that moved at six different speeds along a minimum jerk trajectory. Results showed that extremely fast and extremely slow movements equally reduce embodiment, usability, and competence, whereas movements at moderate speeds maximize embodiment and usability while reducing discomfort. Our findings provide insights for developing autonomous prosthetic limbs with higher user satisfaction that take embodiment, user perception, and usability into consideration.
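For reference, the minimum jerk trajectory used in the reaching task is commonly parameterized by the classic fifth-order polynomial. A short sketch follows; the start point, target, and duration are placeholders. Scaling the duration yields different movement speeds while keeping the same smooth velocity profile.

```python
import numpy as np

def minimum_jerk(start, end, t, duration):
    """Minimum-jerk position at time t for a point-to-point reach:
    x(t) = x0 + (x1 - x0) * (10*tau^3 - 15*tau^4 + 6*tau^5), tau = t / duration."""
    tau = np.clip(t / duration, 0.0, 1.0)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    start, end = np.asarray(start, float), np.asarray(end, float)
    return start + (end - start) * s

# Example: sample a 1-second reach toward a target at 60 Hz.
waypoints = [minimum_jerk([0.0, 0.0, 0.0], [0.3, 0.1, 0.2], t, 1.0)
             for t in np.linspace(0.0, 1.0, 60)]
```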

Expression Omnibus: Expandable Facial Expression Dataset via Embedding Analysis and Synthesis

Flying Over Tourist Attractions: A Novel Augmented Reality Tourism System Using Miniature Dioramas

This paper presents a novel augmented reality tourism system using miniature dioramas. It offers a unique, immersive experience simulating aerial exploration of tourist attractions. Our system employs advanced 3D scanning and tracking for user interaction with attraction features, along with informative voice guides. This fresh approach enhances engagement and learning in diverse contexts.

FoodMorph: Changing Food Appearance Towards Less Unhealthy Food Intake

Human dietary experiences are influenced by multiple senses. To promote healthy eating and reduce the consumption of unhealthy food, we developed FoodMorph, a virtual reality system that overlays visually simulated, inedible textures onto food. By presenting users with these textures, FoodMorph aims to diminish interest in and intake of unhealthy food while also assessing the dining enjoyment associated with common non-food textures. We found that concrete textures on food tend to produce the lowest enjoyment scores, which could potentially affect users' dietary habits.

Gaze and Graze: Illuminating Taiwanese Hand Puppet Character Display and Deconstructing Visual Engagement

Geometry Aware Texturing

In this work, we propose a novel approach to texture generation that makes use of recent advancements in latent diffusion models [Rombach et al. 2022], unlocked by ControlNet [Zhang and Agrawala 2023], which introduces control inputs to generation pipelines. We find that a special condition, in which the mesh geometry is encoded into UV space, can serve as a control input, producing textures that are geometrically and visually coherent and of high quality. Using this approach, we can generate a unique, text-guided look for existing meshes in a matter of seconds.
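As a rough illustration of the kind of pipeline involved (not the authors' code), the diffusers library exposes ControlNet conditioning as shown below. The checkpoint names are placeholders, and feeding a UV-space geometry rendering as the control image is an assumption for the sketch; the condition described above would require a ControlNet trained on that encoding.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Placeholder checkpoints: a depth-conditioned ControlNet stands in for a
# ControlNet trained on the UV-space geometry encoding described above.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# Hypothetical input: the mesh geometry rendered into UV space as an image.
control_image = Image.open("uv_geometry.png")
texture = pipe("weathered bronze statue, highly detailed",
               image=control_image, num_inference_steps=30).images[0]
texture.save("texture.png")
```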

Ignis: Eulerian Fluid Simulation and Rendering at VR Frame Rates

We present Ignis, a GPU Eulerian fluid solver capable of simulating and rendering at VR resolutions and refresh rates. Core to our approach are an approximate shadowing technique and an adaptive dithering technique, which allow us to cheaply render fully lit volumes at high resolutions and under a range of visibility conditions. We discuss the design decisions which enable our solver to run at interactive rates.

Interactive Relative Pose Estimation for 360° Indoor Panoramas through Wall-Wall Matching Selections

We present an interactive approach to estimating the relative camera pose of two panoramas shot in the same indoor environment. Compared to the trivial interactive baseline, which would require the user to precisely select 8 or more pairs of matching points by mouse clicks, our method only needs the user to select a pair of matching walls with two mouse clicks or keystrokes. Our method is based on the key observation that, in most cases, there exists at least one pair of roughly matched walls in the room layouts estimated by neural networks, and such a pair alone is sufficient to generate an accurate relative camera pose. Tested on a real-world indoor panorama dataset, our method outperforms current state-of-the-art automatic methods by large margins, justifying the small amount of additional human effort. Through user studies, we found that matched wall-wall pairs can be easily recognized and selected by humans in a relatively short time, indicating that such an interactive approach is practical.
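To make the key observation concrete, consider the simplified planar case (a sketch under assumptions, not the authors' solver): with gravity-aligned panoramas, a single matched wall segment, given as its two corner positions on the floor plane in each camera's coordinates, pins down the relative yaw and horizontal translation.

```python
import numpy as np

def relative_pose_from_wall(wall_a, wall_b):
    """Estimate the 2D rigid transform (yaw + floor-plane translation) mapping
    camera A's frame to camera B's frame from one matched wall segment.
    wall_a, wall_b: (2, 2) arrays holding the wall's two corner positions on
    the floor plane in camera A's and camera B's coordinates, respectively."""
    wall_a, wall_b = np.asarray(wall_a, float), np.asarray(wall_b, float)
    da = wall_a[1] - wall_a[0]                 # wall direction seen from A
    db = wall_b[1] - wall_b[0]                 # wall direction seen from B
    yaw = np.arctan2(db[1], db[0]) - np.arctan2(da[1], da[0])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    t = wall_b.mean(axis=0) - R @ wall_a.mean(axis=0)   # align wall midpoints
    return R, t
```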

Landmark Guided 4D Facial Expression Generation

In this paper, we propose a generative model that learns to synthesize 4D facial expressions from neutral landmarks. Existing works mainly focus on generating sequences guided by expression labels, speech, etc., but they are not robust to changes in identity. Our LM-4DGAN uses neutral landmarks to guide facial expression generation while adding an identity discriminator and a landmark autoencoder to a basic WGAN to achieve better identity robustness. Furthermore, we add a cross-attention mechanism to the existing displacement decoder, making it better suited to the given identity.

Learning to Generate Wire Sculpture Art from 3D Models

Meta Musicking: A Playground for Exploring Alternative Realities with Others in the XR Age

Multi-Stage Manufacturing for Preoperative Medical Models with Overhanging Components

Medical models play an instrumental role in replicating a patient’s unique anatomy, enhancing surgical outcomes, and fostering effective communication between doctors and patients. While 3D printing technology offers bespoke solutions for such models, producing designs with multiple overhanging tissues often requires advanced 3D printers – making it unattainable for conventional FDM fabrication methods. To bridge this gap, we present a cost-efficient, multi-stage, printing-molding hybrid manufacturing technique that incrementally solidifies complex medical models. Leveraging our adaptive optimization algorithm, we determine the optimal molding direction and the most streamlined manufacturing process with the fewest stages. As a testament to our method’s efficacy, we successfully printed a liver model featuring overhanging tumors.
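The poster does not spell out the optimization, but one ingredient of choosing a molding/printing direction can be illustrated by counting unsupported faces per candidate direction; the 45° support angle and the candidate set are assumptions for this sketch.

```python
import numpy as np

def overhang_count(face_normals, direction, support_angle_deg=45.0):
    """Count faces treated as overhanging: faces whose outward unit normal lies
    within support_angle_deg of 'straight down' for the candidate direction."""
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    cos_down = np.asarray(face_normals, float) @ (-d)
    return int(np.sum(cos_down > np.cos(np.radians(support_angle_deg))))

def best_direction(face_normals, candidates):
    """Pick the candidate molding/build direction with the fewest overhanging faces."""
    return min(candidates, key=lambda d: overhang_count(face_normals, d))
```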

OwnDiffusion: A Design Pipeline Using Design Generative AI to preserve Sense Of Ownership

Generative Artificial Intelligence (AI) has been a fast-growing technology, well known for generating high-quality design drawings and images in seconds with a simple text input. However, users often feel uncertain about whether generative art should be considered created by AI or by themselves. Losing the sense of ownership of the outcome might impact the learning process and confidence of novice designers and design learners who seek to benefit from using Generative Design Tools. In this context, we propose OwnDiffusion, a design pipeline that utilizes Generative AI to assist in the physical prototype ideation process for novice product designers and industrial design learners while preserving their sense of ownership. The pipeline incorporates a prompt weight assessing tool, allowing designers to fine-tune the AI’s input based on their sense of ownership. We envision this method as a solution for AI-assisted design, enabling designers to maintain confidence in their creativity and ownership of a design.

Quantifying display lag and its effects during Head-Mounted Display based Virtual Reality

Recognition-Independent Handwritten Text Alignment Using Lightweight Recurrent Neural Network

Legibility refers to the ease with which handwritten content can be read and understood accurately. However, existing approaches to handwriting beautification either rely on the result of handwriting recognition and accumulate errors from the recognition system or do not address the alignment problem and are difficult to generalize to other languages. This paper presents a novel approach to improve handwriting legibility by straightening the written content. It utilizes a recurrent neural network that operates without the need for recognition, supports connected writing, and accommodates various writing styles. The results obtained with this method demonstrate significant improvements in handwriting alignment. Moreover, a single neural network model can effectively cater to multiple languages within the same writing system.

Recovering Detailed Neural Implicit Surfaces from Blurry Images

In this poster, we present a method for recovering surface details from blurry images. We achieve this by transforming the input position features of the neural implicit surface and radiance field with a blur kernel and simulating the motion blur process through a weighted average of the different transformations. We can then obtain clearer surface reconstructions by discarding the blur kernel during the testing phase. Experimental results on the blurred DTU dataset demonstrate that our approach is robust to blurry inputs and effectively reconstructs surface details.
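A condensed sketch of the described blur simulation (the pose parameterization, kernel size, and the render_fn callable are assumptions): render under several kernel transformations and composite with learnable weights during training, then call the renderer once with the base pose at test time.

```python
import torch

def blur_composite(render_fn, base_pose, kernel_offsets, kernel_logits):
    """Training-time forward pass: weighted average of renders under the blur
    kernel's pose transformations. At test time, use render_fn(base_pose) alone."""
    renders = torch.stack([render_fn(base_pose + delta) for delta in kernel_offsets])
    weights = torch.softmax(kernel_logits, dim=0)          # learnable, sums to 1
    weights = weights.view(-1, *([1] * (renders.dim() - 1)))
    return (weights * renders).sum(dim=0)
```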

Room to Room Mapping: Seamlessly Connecting Different Rooms

In this study, we propose a projection mapping technique designed to connect rooms in disparate locations virtually, creating a continuous, immersive space.

Despite technologies related to remote communication becoming widespread in recent years, the physical presence of display screens often hinders the sense of realism and immersion. Therefore, this study aims to create a sense of continuity between a local room and a remote room, irrespective of display device constraints. We achieve this by utilizing a wide-area image projection technique in a room. First, we realize a projection mapping system that can project images over a wide area, including in limited indoor spaces. This system improves the sense of connectivity with remote locations by expressing a seamless illusion of continuity between the remote and local environments. We conduct user studies and show the effectiveness of the proposed method.

Rule-of-Thirds or Centered? A study in preference in photo composition

The Rule of Thirds is a well-known heuristic in photo composition; the professional photography community both uses and derides it. We report on an experiment testing the validity of the Rule of Thirds in the simplest case: composition of a single object. Our results show that our participants overwhelmingly preferred a centered object in the image to one positioned according to the Rule of Thirds. We speculate on why this is so and point to other research that addresses how we can take advantage of this “salient centeredness”.

SCOOT: Self-supervised Centric Open-set Object Tracking

We propose a novel and comprehensive general-purpose object tracking system named Self-supervised Centric Open-set Object Tracking or ‘SCOOT’. Our SCOOT encompasses a self-supervised appearance model, a fusion module for combining textual and visual features, and an object association algorithm based on reconstruction and observation. Through this system, we unlock new possibilities for enhancing the capability of open-set object tracking with the aid of language cues in real-world scenarios.

Somatic Music: Enhancing Musical Experiences through the Performer's Embodiment

Music serves as a tangible embodiment of the performer’s expression, bearing a distinct musicality that transcends auditory perception. This study delves into the distinctive musicality inherent to musicians through physical data analysis. It introduces a novel approach to the auditory experience by incorporating tactile stimulation via vibration and pressure to present an innovative channel for conveying musicality to the audience.

The primary aim is to enhance the realm of music appreciation by amplifying the scope of performers’ expressive capacities, thereby revolutionizing the conventional paradigm of music experiences.

Text-driven Tree Modeling on L-System

Text-driven methods have recently gained substantial attention in the realm of image and 3D model generation. A critical aspect of these methods is CLIP (Contrastive Language-Image Pre-training), which computes semantic similarities between input texts and resultant images. This paper introduces a text-driven approach to tree modeling, adopting an optimization technique with CLIP. Tree models are generated through L-System. We adopt genetic algorithms for optimization, determining fitness through CLIP. The efficacy of our method is demonstrated through various examples.
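A small sketch of the CLIP-based fitness evaluation used inside such a genetic algorithm: the prompt is a placeholder, and render_lsystem is a hypothetical helper that rasterizes an L-system string to an image.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompt = "a slender birch tree in autumn"                  # placeholder prompt
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

def fitness(genome) -> float:
    """Genetic-algorithm fitness: CLIP similarity between the rendered tree
    and the text prompt. render_lsystem is a hypothetical rasterizer."""
    image: Image.Image = render_lsystem(genome)            # hypothetical helper
    x = preprocess(image).unsqueeze(0).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(x)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    return float((img_feat @ text_feat.T).item())
```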

The Effect of Wearing Knee Supporters on the Applicable Gain of Redirected Walking

Redirected Walking (RDW) is a locomotion technique that enables users to explore extensive virtual environments while confined to a limited real-world environment by manipulating orientation and coordinates within the virtual environment. Previous research has shown that RDW can be made more effective by wearing a knee-tightening device. In this study, we examined the effects of two different knee supporters—band-type and soft-type—on the applicable gain of RDW. A significant difference in applicable gain between the two supporter conditions was confirmed. This result indicates that knee tightening affects the applicable gains of RDW and suggests that a more comfortable RDW experience may be possible with appropriately shaped supporters.

Towards a Psychophysically Plausible Simulation of Translucent Appearance

Understanding visual perception of materials is critical for informing image-based approaches to real-time rendering. This poster presents a new cue to translucency that can be efficiently modeled using graphical rendering.

Towards Efficient Local 3D Conditioning

Recently, Neural Implicit Representations (NIRs) have gained popularity for learning-based 3D shape representation. General representations, i.e. those that share a decoder across a family of geometries, have multiple advantages, such as the ability to generate previously unseen samples and to smoothly interpolate between training examples. These representations, however, impose a trade-off between reconstruction quality and the memory footprint stored per sample. Globally conditioned NIRs lack the quality to capture intricate shape details, while densely conditioned NIRs demand excessive memory resources. In this work, we suggest using a neural network to approximate a grid of latent codes while sharing the decoder across the entire category. Our model achieves significantly better reconstruction quality than globally conditioned methods while using less memory per sample to store a single geometry.
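A condensed PyTorch sketch of the idea (layer widths and the SDF output are assumptions): a small per-shape network predicts a local latent code at each query point, standing in for an explicit dense latent grid, while a single decoder shared across the category maps (point, code) to geometry. Only the small per-shape network needs to be stored per sample; the shared decoder is stored once for the whole category.

```python
import torch
import torch.nn as nn

class LatentField(nn.Module):
    """Per-shape MLP that maps a query point to a local latent code,
    replacing an explicit dense grid of latent codes."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, xyz):                      # (N, 3) -> (N, latent_dim)
        return self.net(xyz)

class SharedDecoder(nn.Module):
    """Decoder shared across the whole category: (point, local code) -> SDF value."""
    def __init__(self, latent_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz, code):
        return self.net(torch.cat([xyz, code], dim=-1))
```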

Usability Evaluation of VR Shopping System not Imitating Real Stores

Virtual reality (VR) shopping has been attracting attention for its potential to offer people a new purchasing experience. However, most previous studies on VR shopping have dealt with systems that imitate real stores. Therefore, we focused on VR shopping systems that do not imitate real stores (non-store-type systems) and evaluated their usability. First, we conducted an experiment to compare the shopping experience between a store-type system and a circularly displayed non-store-type system. The results revealed that the non-store-type system was superior in terms of usefulness and ease of mobility but inferior in terms of spatial recognition and likability. On the basis of these results, we then proposed a retractable non-store-type system and conducted an experiment to clarify its usability. The retractable non-store-type system was rated better than the store-type system in terms of visibility and ease of mobility. To create a better system, we also fixed and improved several problems with the retractable system that were pointed out by the experiment participants.

Vector Gradient Stroke Stylized Neural Network Painting

This study focuses on oil-painting brush-style transfer in deep convolutional network-based style painting models. We propose an SVG gradient vectorization process that preserves brush-stroke structures while avoiding the generation of a large number of paths. Most images in non-photorealistic painterly rendering are raster images, which suffer from blurriness and quality degradation when zoomed in, whereas vector graphics offer advantages such as scalability and detail preservation. However, existing SVG vectorization methods struggle with images containing gradient colors. The proposed method vectorizes each brush stroke, analyzes the positions of its main color tones, and incorporates gradient color control points. Finally, the vectorized brush results are stacked and merged. Experimental results demonstrate that this process preserves brush-stroke structure and reproduces gradient color effects in non-photorealistic style transfer, enhancing editing flexibility and printing quality for brush-style transfer.
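As a simplified illustration of the output representation (not the paper's exact pipeline), a single vectorized brush stroke can be written as an SVG path filled with a linear gradient between its two main color tones; the path data, colors, and canvas size below are placeholders.

```python
def brush_to_svg(path_d, color_start, color_end, grad_id):
    """Emit one brush stroke as an SVG path filled with a two-stop linear
    gradient; the actual method places additional gradient control points."""
    return f"""
    <defs>
      <linearGradient id="{grad_id}" x1="0%" y1="0%" x2="100%" y2="0%">
        <stop offset="0%" stop-color="{color_start}"/>
        <stop offset="100%" stop-color="{color_end}"/>
      </linearGradient>
    </defs>
    <path d="{path_d}" fill="url(#{grad_id})"/>"""

# Example: one stroke blending from a dark to a light tone of the brush.
body = brush_to_svg("M 10 80 Q 95 10 180 80 Q 95 150 10 80 Z",
                    "#6b3e1e", "#d9a066", "stroke0")
svg = f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="160">{body}</svg>'
```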

Visual Signatures of Music Mood

The majority of existing music visualization methods rely mostly on frequency, tempo, and volume, which are rendered in real time as animated images for the music being played. Visualizing music as static images is rarely addressed. In this paper, we propose visual signatures – static images generated using artificial intelligence to visualize the mood of a piece of music.