We propose an algorithm to interactively design freeform balancing toys that stably balance on a single point of contact. We achieve this by positioning the center of mass outside the model’s surface while deforming the external surface. Our approach relies on a simple energy function that is fast to evaluate and optimize, allowing an interactive design process. The results confirm the feasibility of creating stable balancing toys via standard 3D printing, expanding the possibilities for mechanical design.
Compositing a background image and a foreground image produced from a 3D object requires a projection function that ensures consistency in the scene. We modified the generalized projection of [Yoshimura and Saito 2017] to allow tilted-angle images and introduced a user interface, Manu-Grid, to estimate the parameters of the projection function corresponding to the drawing method used in a background image. A useful characteristic of the interface is that when a user manipulates any one of the vanishing direction, the vanishing line, or the reference point on the ground, the others remain pinned.
We introduce Minecraft to 3D, a novel pipeline that automatically converts any Minecraft world into a high-quality polygonal scene. A 3D convolutional network recognises Minecraft’s default objects, the block surface is resampled into a smooth height-map, and each recognised object is substituted with a high-quality 3D model chosen from an external library. Object locations, orientations, and tags are preserved, a separate water plane is exported for engine-level ocean rendering, and the final scene opens natively in modern 3D engines. The pipeline processes a one-square-kilometre world in under three minutes on a single consumer GPU, enabling educators, indie developers, and artists to move rapidly from voxel sketches to fully lit environments.
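To make the height-map resampling step concrete, here is a minimal Python sketch (array shapes, axis order, and the smoothing kernel are assumptions; the abstract does not specify the pipeline's internals): it takes the topmost solid block in each voxel column and blurs the stepped result into a smooth surface.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def voxels_to_heightmap(occupancy: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Convert a boolean voxel grid (x, z, y) into a smoothed height-map.

    occupancy[x, z, y] is True where a solid block exists; y is the vertical axis.
    """
    ny = occupancy.shape[2]
    # Index of the topmost solid block in each (x, z) column.
    top = ny - 1 - np.argmax(occupancy[:, :, ::-1], axis=2)
    top[~occupancy.any(axis=2)] = 0            # empty columns fall to ground level
    # Smooth the stepped voxel surface into a continuous height-map.
    return gaussian_filter(top.astype(np.float32), sigma=sigma)

# Example: a 64x64x32 world with a flat floor and a small hill.
world = np.zeros((64, 64, 32), dtype=bool)
world[:, :, :4] = True
world[20:30, 20:30, :10] = True
heightmap = voxels_to_heightmap(world)
print(heightmap.shape, heightmap.min(), heightmap.max())
```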
3D reassembly needs both tight alignment and a collision-free insertion path, but most methods enforce only the former. We recast the task as constrained packing, apply FFT correlation for coarse placement, and prune unreachable poses with a flood-fill path test on the correlation map. A final ICP stage then maximizes surface alignment (instead of minimizing contact as in classic packing). Assuming a known target envelope, this path-aware spectral pipeline yields high-fidelity, physically valid reconstructions, although accuracy still depends on the initial pose sampling.
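A minimal 2D sketch of the spectral placement and path test, assuming a translation-only search on binary occupancy grids (the actual method operates on 3D fragments and a richer pose space): FFT correlation counts fragment/obstacle overlaps per offset, and a flood fill from the domain boundary keeps only placements reachable from outside.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import label

def feasible_placements(envelope: np.ndarray, fragment: np.ndarray) -> np.ndarray:
    """Boolean map of collision-free placements of `fragment` (2D sketch).

    envelope: 1 = free space inside the target cavity, 0 = occupied/outside.
    fragment: 1 = material of the piece being inserted.
    """
    obstacles = 1.0 - envelope
    # Correlation of obstacles with the flipped fragment = overlap count per offset.
    overlap = fftconvolve(obstacles, fragment[::-1, ::-1], mode="valid")
    return overlap < 0.5          # zero overlap (up to FFT round-off) -> feasible

def reachable_placements(feasible: np.ndarray) -> np.ndarray:
    """Flood-fill pruning: keep only placements connected to the domain boundary,
    i.e. poses that can be reached by sliding the piece in from outside."""
    labels, _ = label(feasible)
    border = np.unique(np.concatenate([labels[0, :], labels[-1, :],
                                       labels[:, 0], labels[:, -1]]))
    border = border[border != 0]
    return np.isin(labels, border)

envelope = np.ones((64, 64)); envelope[20:40, 20:25] = 0    # an obstacle wall
fragment = np.ones((8, 8))
ok = reachable_placements(feasible_placements(envelope, fragment))
print(ok.sum(), "reachable, collision-free placements")
```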
With the advancement of conversational AI, research on bodily expressions, including gestures and facial expressions, has also progressed. However, many existing studies focus on photorealistic avatars, making their methods unsuitable for non-photorealistic characters, such as those found in anime. This study proposes methods for expressing emotions, including exaggerated expressions unique to non-photorealistic characters, by utilizing expression data extracted from comics and dialogue-specific semantic gestures. A user study demonstrated significant improvements across multiple aspects when compared to existing research.
We introduce a two-stage pipeline that gives artists fine-grained, input-level control of audio-driven 3D facial animation. Stage 1 learns a latent relative-motion prior from neutral/offset position maps, confining deformations to realistic shapes. Stage 2 projects an explainable phoneme–prosody vector into this space, so visemes and expressions are editable in feature space. Early experiments show preserved lip-sync and natural motion, narrowing the gap between fidelity and control.
We propose Dynamic Skinning (DS), an extension of rig skinning that exhibits the appearance of physical phenomena without the need for simulation. Our approach applies offsets from traditional skinning to produce these effects based on time-delayed, filtered joint motion. We showcase a number of effects, including 1) time-varying oscillation and 2) time delay across skeletal bones, to produce what we call delayed linear blend skinning (dLBS) directly within the skinning computation. Our approach is easy for artists to control with simple input parameters, and the method is compatible with standard rigged characters.
Ants exhibit unique abilities to self-assemble into animate, living structures. Such structures display properties of both fluid-like and solid-like deformable materials. Despite much progress in our understanding of ant aggregation dynamics, simulating such phenomena has been largely overlooked in real-time graphics and animation applications. We present a constraints-based approach for simulating the collective dynamics of ants. We demonstrate ant collective behaviors interactively with compelling physical realism.
This project explores the role of the designer in digital fabrication workflows as digitization leads to higher levels of design automation. As digital technologies are adopted to streamline design to manufacturing workflows, elements of the creative process can become standardized to improve production efficiency at the cost of designer autonomy and product customization. In order to ensure designers’ agency and increase product variation, the Carrara project presents a collaborative tool utilizing agent-based modeling (ABM) to represent designers, fabrication machines, and algorithms as active co-participants in the design process. This co-participatory workflow enables a generative, scalable product line that takes advantage of digital efficiencies while providing the designer with autonomy and control in the creative process.
"Digitizing Devotion" utilizes advanced oblique photography and AI to create immersive virtual reconstructions of sacred spaces, preserving traditional worship practices for the global diaspora while ensuring cultural continuity across generations and geographical boundaries.
This paper introduces Dust in Time, a tangible and embodied art installation that allows the audience to interact with physical hourglasses and virtual particles through embodied gestures and motions. We describe the design concept and technical details of this installation. Through this conceptual tangible interactive installation, we aim to explore how tangible and embodied interaction can be used to represent the concepts of time and its relationship with human beings and promote both explicit and implicit interaction.
The animated short film Sensual explores a novel workflow for hand-painted watercolor animation, blending traditional artistic methods with AI-based frame interpolation techniques. By combining compositing with the Real-Time Intermediate Flow Estimation (RIFE) image interpolation network, we significantly reduced production time while maintaining the unique hand-painted aesthetic.
Lutruwita/Tasmania's island conditions are often misperceived as isolated and unchanging. Building on Giada Peterle's concept of auto-cartography, this paper explores Tasmania's dynamic island identity through human-AI mapping. This is achieved through the creation of haunting horizons, an interactive installation powered by a customised generative AI model. By translating my island experience into a training dataset, this work positions human-AI auto-cartography as an embodied, affective process, enabling artists and participants to engage with maps and reflect on their relations to place in new ways.
In this study, we propose an experience inspired by the Anywhere Door concept, in which users transition between multiple life-sized projected virtual spaces by opening, closing, and passing through a physical door. We demonstrate that this approach not only enhances the entertainment value of the visual experience but also increases the sense of immersion in the destination virtual space.
This work presents DiversePuppetry, an immersive and asymmetric puppetry interaction system that integrates a virtual reality head-mounted display (VR-HMD), a mixed reality head-mounted display (MR-HMD), and a CAVE Automatic Virtual Environment (VR-CAVE). In this project, traditional Taiwanese Budaixi puppets were digitized and incorporated into diverse forms of immersive experiences. This study explores an interactive and immersive platform for puppetry through multiple modes of control. The findings highlight the potential of asymmetric immersive interaction, offering puppetry culture a novel way of creating a complete and immersive digital experience.
Creating interactive 3D scenes often requires technical expertise and significant time, limiting accessibility for non-experts. To address this, we present DreamCraft, a VR system enabling users to intuitively generate and edit interactive 3D environments from panoramas without professional skills. DreamCraft supports panorama generation, interactive object selection, panorama editing, and 3D reconstruction. By combining techniques like 3D Gaussian Splatting (3DGS), object segmentation, and 2D-to-3D conversion, it streamlines immersive scene creation. A user study confirmed its usability, ease of learning, and creative potential, positioning DreamCraft as a step toward accessible 3D content creation.
Enhanced Auditory Reality Simulation for Improved Mapping (EARSIM) is a stand-alone virtual reality (VR) application that procedurally configures a multi-sensory cue system to deliver adaptive auditory localization tasks. In a pilot study, twenty-one participants completed three 40-second sessions with progressively increasing sensory cues. Median localization accuracy decreased monotonically as the number of cues increased, suggesting that the dynamic cue system was effective in modulating task difficulty. These results validate EARSIM as a configurable platform and demonstrate its potential for future clinical applications in auditory rehabilitation.
Although real-time fluid simulation in virtual environments has been widely explored, existing systems often rely on virtual models and predefined parameters, limiting their ability to capture the complexity of physical water flow. To address this, we propose a marker-based VR system that simulates water surface dynamics by tracking real-world water flow using ArUco markers. The system analyzes floating marker trajectories to generate a FlowMap, which is applied to a virtual water surface in Unity for real-time flow simulation. A controllable circular pool with water-jet units was used to create varying flow conditions, and computer vision techniques converted the data into directional vector fields. The FlowMap is continuously updated and interpolated to reduce visual lag. We implemented the prototype in a VR environment and verified the accuracy of the generated flow patterns. Results demonstrate the potential of this sensor-driven approach for realistic water simulations in immersive VR.
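A simplified sketch of the marker-tracking and FlowMap update steps, using OpenCV's ArUco module (the grid resolution, decay factor, and export to Unity are assumptions not specified in the abstract):

```python
import cv2
import numpy as np

# Legacy aruco API from opencv-contrib-python; newer OpenCV versions expose
# cv2.aruco.ArucoDetector instead.
DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def marker_centers(frame_bgr):
    """Detect floating ArUco markers and return {marker_id: center (x, y) in pixels}."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, DICT)
    if ids is None:
        return {}
    return {int(i): c.reshape(4, 2).mean(axis=0) for i, c in zip(ids.flatten(), corners)}

def update_flowmap(flowmap, prev, curr, dt, frame_size, decay=0.95):
    """Splat per-marker velocities into a coarse 2-channel flow grid."""
    h, w = flowmap.shape[:2]
    flowmap *= decay                       # temporal smoothing to reduce visual lag
    for mid, p in curr.items():
        if mid not in prev:
            continue
        v = (p - prev[mid]) / dt           # pixels per second
        gx = int(p[0] / frame_size[0] * w)
        gy = int(p[1] / frame_size[1] * h)
        if 0 <= gx < w and 0 <= gy < h:
            flowmap[gy, gx] = v
    return flowmap

# Usage per camera frame (W, H are the frame dimensions):
#   curr = marker_centers(frame)
#   flowmap = update_flowmap(flowmap, prev, curr, dt, (W, H))
```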
This paper presents two novel teleportation methods for VR environments that address limitations of conventional parabola-based approaches when navigating varying heights. The SphereBackcast and Penetration methods utilize straight-line specification for intuitive movement to elevated locations. Experiments with 22 participants showed our methods significantly outperformed parabola-based teleportation for height differences above 2m, while maintaining comparable performance on flat terrain. NASA-TLX and SUS evaluations confirmed improved usability and reduced cognitive load, indicating these methods can be readily integrated into existing VR applications.
This poster introduces the INT-ACT project which aims to investigate the use of immersive XR environments for presenting the emotional, experiential and environmental dimensions of Intangible Cultural Heritage (ICH) associated with tangible cultural heritage sites. It also presents a mobile XR demonstrator, developed as part of INT-ACT, that focuses on the ICH related to a megalithic site.
Cheerleading stunts are group gymnastics performed by multiple people. As the skills involved become more challenging, it is necessary to devise better practice methods. Thus, in this paper, we propose a pretraining support system for cheerleading stunts using Virtual Reality (VR) technology. This system enables users to experience successfully performing a stunt in the virtual space by adopting the viewpoints of the cheerleaders performing various types of stunts. We evaluated our system through interviews with two experts. Our system has the potential to meaningfully augment the established training method of previsualization of stunts.
This paper introduces SugART, a Mixed Reality (MR) project that enables users to learn and recreate traditional sugar painting at home. By combining hand tracking, virtual guidance, and real-time feedback, our project supports creative expression and cultural education, thereby lowering barriers to participation in intangible cultural heritage through accessible and interactive digital experiences.
The Gesture Lives On is a real-time VR performance system that reimagines traditional Taiwanese glove puppetry through immersive, interactive means. Rooted in precise gestural movement, this art form faces decline amid shifting audience engagement. Using VR-based gesture recognition, a performer co-creates a digital duet with a virtual puppet. This hybrid space transforms tradition into contemporary expression, offering audiences a new way to experience puppetry within a shared virtual–physical environment.
This study investigates the effective range of the weight illusion induced by AR visual effects displayed on the arm. The results show that AR visual effects on the arm can create a “strong” impression and that using such visual effects can induce a weight illusion in which weights ranging from 100 g to 500 g are perceived as lighter when lifted with the arm augmented by these visual effects.
You Can Grow Here is an immersive VR experience developed for the CAVE2™ environment, aligning with the UN Sustainable Development Goal of Good Health and Well-Being. In response to the mental health challenges intensified by the COVID-19 pandemic, the project explores how interactive storytelling, ambient sound, and 3D typography can support emotional reflection and teach anxiety coping strategies. Built in Unity with custom assets from Blender and Maya, the experience differs from most clinical VR programs, allowing users to independently explore emotions, manage anxiety, and practice evidence-based calming techniques within a safe, narrative-driven space that builds emotional resilience.
In this study, we independently developed a knowledge system named uNEEDXR™, successfully realizing a full-color micro-OLED device with a brightness of 60,000 nits on a silicon-based backplane. This knowledge system was developed over nine years and innovatively integrates know-how and expertise across design, processes, manufacturing, equipment, and materials, overcoming the performance limitations of traditional micro-OLED architectures. The system enables high brightness, high pixel density, low power consumption, high contrast ratio, high color saturation, and tunable energy distribution (including viewing angle, wavelength, and bandwidth). Additionally, it meets customer requirements for reliability and lifespan. This technology provides a scalable production solution for near-eye display applications in augmented reality.
Direct-view 3D displays enable immersive experiences but often cause visual discomfort from relying on binocular disparity alone. Holographic displays offer an ultimate solution by reconstructing the full light wavefront, but face scalability and viewing freedom limitations due to the spatial-bandwidth product and high cost of fine-pitch phase-only SLMs. We present a system combining an amplitude-only display with ultra-high pixel count and dynamic optical steering for a fully 3D eye box. By axially translating a lens, we expand the eye box in depth beyond the capabilities of conventional pupil-steering methods. We further extend SGD-based hologram optimization to support dual light sources and an amplitude-only SLM, enabling stereoscopic delivery with suppressed crosstalk. Our prototype shows accurate depth cues, paving the way for scalable, high-quality holographic displays with expanded viewing freedom.
In this study, we propose a novel Maxwellian optics that achieves a wide field of view and evaluate its effectiveness through 2D and 3D simulations. The fundamental principle of the proposed system is based on Maxwellian view, a form of retinal projection that can mitigate the vergence–accommodation conflict. Conventional Maxwellian optics typically employ a pinhole to focus light on a single point within the pupil, enabling the projection of sharp images onto the retina with a wide field of view, independent of the eye's accommodation. However, a major limitation of such systems is the disappearance of the image when the eye rotates and the convergence point shifts outside the pupil. To address this issue, we propose a novel Maxwellian optical system that combines a spherical multi-pinhole (SMP) with a transmissive mirror device (TMD).
We propose a naked-eye stereoscopic display with an ultra-wide viewing zone by applying the display principle of general LCDs. By replacing the polarizer of an LCD with a reflective polarizer and arranging them three-dimensionally, this technology refracts light rays freely and enables an expansion of the viewing zone. In this study, we created a prototype and confirmed that the viewing zone expanded.
An infinity mirror is an optical novelty that uses facing mirrors – at least one of which is partially transparent to allow viewing – to present the appearance of an infinite tunnel of copies of a scene. One limitation of infinity mirrors is that alternate reflections of the scene are – by necessity – reflected, which means one cannot create, e.g., speed tunnel effects where lights chase into or out of the apparent tunnel. I present a prototype infinity mirror that uses light cells to overcome this limitation. These cells have a different appearance when viewed from the front and back, apparently breaking the symmetry between the primary and reflected versions of the scene. The cells are made from a 3D-printed baffle and diffuser and lit with off-the-shelf programmable LED strips, resulting in an overall inexpensive-to-produce design. In this poster I discuss the construction of my prototype infinity mirror, demonstrate some simple speed tunnel effects, and discuss the design trade-offs in my simple light-cell design.
We present a real-time algorithm for driving multispectral LED lights in a spherical lighting reproduction stage to achieve optimal color rendition for a dynamic lighting environment. Previous work has driven multispectral LED lights (ours include red, green, blue, white, and amber LEDs) by solving a nonnegative least squares (NNLS) problem for each light source; the solution ensures that each light appears to be the correct RGB color seen by the camera and also optimizes how closely the lights illuminate a color chart to appear as it should in the target lighting environment. We create a real-time version of this technique by pre-computing a lookup table of these NNLS solutions across the full range of input RGB values. Since the proper relative mix of LEDs depends on chrominance and not on luminance, our lookup table can be reduced to 2D saving both storage and computation. With this technique, we can drive several thousand multispectral LED lights at video frame rates with proper color matching and color rendition for a dynamic lighting environment.
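The lookup-table construction can be sketched as follows; the LED response matrix values are illustrative, and the color-chart rendition term of the full objective is omitted, so this shows only the basic RGB-matching NNLS with the 2D chrominance reduction described above.

```python
import numpy as np
from scipy.optimize import nnls

# Camera-observed RGB response of each LED channel (columns: R, G, B, W, A LEDs),
# measured by photographing each channel at full power. Values are illustrative.
A = np.array([[0.90, 0.05, 0.02, 0.33, 0.55],
              [0.10, 0.85, 0.08, 0.34, 0.40],
              [0.02, 0.10, 0.90, 0.33, 0.05]])

def build_lut(n=64):
    """Precompute NNLS LED weights over a 2D chromaticity grid (r, g), r + g + b = 1."""
    lut = np.zeros((n, n, A.shape[1]))
    for i, r in enumerate(np.linspace(0, 1, n)):
        for j, g in enumerate(np.linspace(0, 1, n)):
            b = 1.0 - r - g
            if b < 0:
                continue
            w, _ = nnls(A, np.array([r, g, b]))   # nonnegative LED drive weights
            lut[i, j] = w
    return lut

LUT = build_lut()

def led_weights(rgb):
    """Look up the LED mix for an arbitrary RGB value: chrominance indexes the
    2D table, luminance just scales the result."""
    s = rgb.sum()
    if s <= 0:
        return np.zeros(A.shape[1])
    r, g = rgb[0] / s, rgb[1] / s
    n = LUT.shape[0]
    i = min(int(r * (n - 1)), n - 1); j = min(int(g * (n - 1)), n - 1)
    return LUT[i, j] * s

print(led_weights(np.array([0.8, 0.6, 0.3])))
```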
In this study, we propose a smartphone-based wide field-of-view HMD that expands the display area using inexpensive mirrors and lenticular lenses. Lenticular lenses placed on both edges of the display convert these areas into multi-view displays. The expansion of the display area is achieved by observing the multi-view images via properly placed mirrors.
We propose a new automatic colorization method for anime line drawings using segment matching with a few reference images. To address the limitations of existing segment matching methods in handling large motion gaps or small regions, we introduce patch-based few-shot colorization and a color shuffling process to estimate candidate colors for subsequent segment matching. This addresses the nonlinear movements that are unique to anime and that optical flow estimation struggles with. We demonstrate that the proposed method improves accuracy compared to the state-of-the-art segment matching method.
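The core segment-matching idea can be sketched as below (a simplification: the few-shot colorization and color-shuffling stages that generate candidate colors are omitted, and the patch size and descriptor are assumptions):

```python
import numpy as np
from scipy import ndimage

def segment_patches(labels, image, size=16):
    """Extract a fixed-size grayscale patch around each segment centroid."""
    patches = {}
    h, w = image.shape[:2]
    seg_ids = np.unique(labels)[1:]                       # 0 is assumed background
    centroids = ndimage.center_of_mass(labels > 0, labels, seg_ids)
    for seg_id, (cy, cx) in zip(seg_ids, centroids):
        y0, x0 = int(cy) - size // 2, int(cx) - size // 2
        patch = np.zeros((size, size), dtype=np.float32)
        ys = slice(max(y0, 0), min(y0 + size, h))
        xs = slice(max(x0, 0), min(x0 + size, w))
        patch[:ys.stop - ys.start, :xs.stop - xs.start] = image[ys, xs]
        patches[int(seg_id)] = patch
    return patches

def match_colors(target_patches, ref_patches, ref_colors):
    """Assign each target segment the color of the reference segment whose
    patch is closest in sum-of-squared-differences."""
    colors = {}
    for tid, tp in target_patches.items():
        best = min(ref_patches,
                   key=lambda rid: float(((tp - ref_patches[rid]) ** 2).sum()))
        colors[tid] = ref_colors[best]
    return colors
```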
We evaluate the performance of four common learned models utilizing INR and VAE structures for compressing phase-only holograms in holographic displays. The evaluated models include a vanilla MLP, SIREN [Sitzmann et al. 2020], and FilmSIREN [Chan et al. 2021], with TAESD [Bohan 2023] as the representative VAE model. Our experiments reveal that a pretrained image VAE, TAESD, with 2.2M parameters struggles with phase-only hologram compression, indicating the need for task-specific adaptations. Among the INRs, SIREN with 4.9k parameters achieves 40% compression with high quality in the reconstructed 3D images (PSNR = 34.54 dB). These results emphasize the effectiveness of INRs and identify the limitations of pretrained image-compression VAEs for the hologram compression task.
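For reference, a compact SIREN of the kind evaluated here can be written in a few lines of PyTorch; the hidden width, coordinate normalization, and training schedule below are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    def __init__(self, w0=30.0):
        super().__init__()
        self.w0 = w0
    def forward(self, x):
        return torch.sin(self.w0 * x)

class SIREN(nn.Module):
    """Tiny SIREN mapping pixel coordinates (x, y) in [-1, 1]^2 to a phase value.
    The network weights themselves are the compressed representation."""
    def __init__(self, hidden=32, layers=3, w0=30.0):
        super().__init__()
        dims = [2] + [hidden] * layers + [1]
        mods = []
        for i, (din, dout) in enumerate(zip(dims[:-1], dims[1:])):
            lin = nn.Linear(din, dout)
            # SIREN initialization (Sitzmann et al. 2020).
            bound = 1.0 / din if i == 0 else (6.0 / din) ** 0.5 / w0
            nn.init.uniform_(lin.weight, -bound, bound)
            mods.append(lin)
            if i < len(dims) - 2:
                mods.append(Sine(w0))
        self.net = nn.Sequential(*mods)
    def forward(self, xy):
        return self.net(xy)

# Fitting loop sketch: overfit the INR to one phase-only hologram.
model = SIREN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
coords = torch.rand(4096, 2) * 2 - 1             # random pixel coordinates
phase = torch.rand(4096, 1) * 2 * torch.pi       # stand-in for ground-truth phase
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(coords), phase)
    loss.backward()
    opt.step()
```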
We introduce a pipeline for interpreting Ancient Egyptian hieroglyphic texts that combines OCR, transliteration, and translation. Designed for this low-resource setting, our system improves accessibility for learners and efficiency for researchers. We evaluate its performance on a new, diverse dataset reflective of real-world conditions.
In hand-drawn anime production, automatic colorization is used to boost productivity, where line drawings are automatically colored based on reference frames. However, the results sometimes include wrong color estimations, requiring artists to carefully inspect each region and correct colors—a time-consuming and labor-intensive task. To support this process, we propose a confidence estimation method that indicates the confidence level of colorization for each region of the image. Our method compares local patches in the colorized result and the reference frame.
This study contrasts two generative AI (GenAI) workflows (Figure 1) addressing visual and character consistency and introduces a filmmaker-oriented framework for AI-assisted production, grounded in two practice-based short films.
This paper presents a compact handheld holographic video camera system capable of capturing real-time, full-color complex hologram videos under natural lighting conditions. By integrating a geometric phase lens with a polarization image sensor, our system captures interference patterns without requiring specialized lighting or bulky equipment. We successfully apply conventional 2D video super-resolution techniques to the complex holograms, significantly enhancing both resolution and visibility while preserving digital refocusing capabilities. Our experimental results demonstrate that this approach satisfies three critical requirements for practical modern cameras: operation under incoherent lighting, robustness to mobile shooting conditions, and compact design. This work represents an advancement toward practical holographic media applications, particularly for broadcast content production in extended reality and mixed reality environments.
Rendering near-final quality previews requires a great number of samples per pixel. Recently, diffusion models have shown superior denoising capabilities, but they suffer from large variance, which is further amplified by the spatial and temporal inconsistencies they introduce. In our pipeline, we propose the use of multiple control features and forward projections to denoise 1-sample-per-pixel frames and extrapolate a high-quality frame, generating a consistent and controllable sequence of high-quality frames.
QRBTF generates artistic QR codes that maintain machine readability while enhancing visual appeal. Our method integrates diffusion models with ControlNet conditioning and adaptive brightness control. Experimental results demonstrate effective brightness contrast control in specific image regions and robust model migration capabilities. Key innovations include: (1) Ternary luminance quantization mapping QR modules to control signals; (2) Style-adaptive generation using LoRA embeddings; (3) Post-processing optimization. The system has generated over 600,000 codes via qrbtf.com, validating its utility in branding and digital marketing applications.
This work presents a pipeline to convert rasterized graphic design posters into multi-layered, editable digital assets. It decomposes the input poster into core elements, categorizes them, and converts them into semantically meaningful formats. A novel strategy using Z-index addresses layer ordering and overlap. The pipeline’s accuracy was evaluated by comparing over 24,000 original and reconstructed posters of multiple widely used sizes and aspect ratios in print & digital media. Layer semantic accuracy was assessed using the LLaVA-7B model, which showed high confidence scores across image, text, and shape layers. A user-centered evaluation with 20 participants resulted in high satisfaction ratings, confirming the pipeline’s ability to accurately reproduce poster designs with excellent fidelity, layout, and overall quality. This pipeline contributes a refined approach to reconstructing rasterized graphic design posters, advancing beyond existing methods.
Layout-aware text-to-image generation allows users to synthesize images by specifying object positions through text prompts and layouts. This has proven useful in a variety of creative fields such as advertising, UI design, and animation, where structured scene control is essential. In real-world workflows, however, certain regions are often intentionally left empty—for instance, for headlines in advertisements, buttons in interface prototypes, or subtitles and speech bubbles in animation frames. Existing models lack the ability to explicitly preserve such negative spaces, often resulting in unwanted content and complicating downstream editing. We introduce Space-Controllable Text-to-Image Generation, a task that treats reserved areas as first-class constraints. To address this, we propose SAWNA (Space-Aware Text-to-Image Generation), a training-free diffusion framework that injects nonreactive noise into user-defined masked regions, ensuring they remain empty throughout generation. Our method maintains semantic integrity and visual fidelity without retraining and integrates seamlessly into layout-sensitive workflows in design, advertising, and animation. Experiments demonstrate that SAWNA reliably enforces spatial constraints and improves the practical usability of generated content.
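One plausible reading of the nonreactive-noise mechanism, shown as a training-free sampling sketch with a placeholder denoiser (the real system wraps a text-to-image diffusion model, and the exact noise schedule may differ from the paper's):

```python
import numpy as np

def dummy_denoise_step(x, t):
    """Stand-in for one reverse-diffusion step of a real text-to-image model."""
    return x * 0.98 + np.random.randn(*x.shape) * 0.01

def sample_with_reserved_space(shape, mask, steps=50, seed=0):
    """Training-free masked sampling: after every denoising step, the
    user-specified region is overwritten with fixed 'nonreactive' noise so that
    no content can form there and the area stays empty in the final image."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    frozen = rng.standard_normal(shape)       # the nonreactive noise for the mask
    for t in reversed(range(steps)):
        x = dummy_denoise_step(x, t)
        x = np.where(mask, frozen, x)         # keep the reserved region inert
    return x

mask = np.zeros((64, 64, 4), dtype=bool)
mask[:16, :, :] = True                        # reserve a banner strip at the top
latent = sample_with_reserved_space((64, 64, 4), mask)
```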
While there are many techniques (e.g., QR codes) that convey information via visual patterns, many applications would benefit from having those codes be imperceptible to the human eye. We present a method for designing subtle code-conveying patterns that can be printed on transparent sticker paper, then applied to real-world surfaces. An image of a scene with an encoded sticker can be sent through our localization and decoding modules, where the sticker subsection is robustly localized and decoded. We jointly optimize the encoding, localization, and decoding modules end to end, taking into account both imperceptibility and accuracy. Notably, we also account for human error when placing stickers, as pixel-perfect alignment is not something that can be reliably expected. Our model encodes and decodes 100-bit secrets, which, with BCH error correction, means that a sticker could encode 56 data bits with 40 parity bits. Experimental results show that this method is robust to sticker placement errors while being easy to deploy in the real world.
Super-resolution (SR) is crucial for delivering high-quality content at lower bandwidths and supporting modern display demands in VR and AR. Unfortunately, state-of-the-art neural network SR methods remain computationally expensive. Our key insight is to leverage the limitations of the human visual system (HVS) to selectively allocate computational resources, such that perceptually important image regions, identified by our low-level perceptual model, are processed by more demanding SR methods, while less critical areas use simpler methods. This approach, inspired by content-aware foveated rendering [Tursun et al. 2019], optimizes efficiency without sacrificing perceived visual quality. User studies and quantitative results demonstrate that our method achieves a reduction in computational requirements with no perceptible quality loss. The technique is architecture-agnostic and well-suited for VR/AR, where focusing effort on foveal vision offers significant computational savings.
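A tile-based sketch of the allocation strategy, with a simple radial gaze rule standing in for the low-level perceptual model and bicubic interpolation standing in for the expensive neural SR (both are placeholders, not the paper's components):

```python
import numpy as np
import cv2

def heavy_sr(patch, scale):
    """Placeholder for an expensive neural SR model; bicubic stands in here."""
    return cv2.resize(patch, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)

def foveated_upscale(img, gaze_xy, scale=2, tile=64, fovea_radius_px=256):
    """Tile-wise SR allocation: tiles near the gaze point get the expensive
    upscaler, peripheral tiles get cheap bilinear interpolation."""
    h, w = img.shape[:2]
    out = np.zeros((h * scale, w * scale, 3), img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = img[y:y + tile, x:x + tile]
            cx, cy = x + patch.shape[1] / 2, y + patch.shape[0] / 2
            if np.hypot(cx - gaze_xy[0], cy - gaze_xy[1]) < fovea_radius_px:
                up = heavy_sr(patch, scale)          # perceptually important region
            else:
                up = cv2.resize(patch, None, fx=scale, fy=scale,
                                interpolation=cv2.INTER_LINEAR)
            out[y * scale:(y + patch.shape[0]) * scale,
                x * scale:(x + patch.shape[1]) * scale] = up
    return out

frame = np.zeros((512, 512, 3), np.uint8)
print(foveated_upscale(frame, gaze_xy=(256, 256)).shape)   # (1024, 1024, 3)
```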
Our training-free method enables photorealistic facade editing by combining hierarchical procedural structure control with diffusion models. Starting from a facade image, we reconstruct, edit, and guide generation to produce high-fidelity, photorealistic variations. The method ensures structural consistency and appearance preservation, demonstrating the power of symbolic modeling for controllable image synthesis.
Cloud gaming is increasingly popular. A challenge for cloud providers is to efficiently operate their datacenters, i.e., keep datacenter utilization high: a non-trivial task due to application variety. Cloud datacenter resources are also diverse, e.g., CPUs, GPUs, NPUs. We propose player-level isolation to address this challenge. We implemented such an isolation mechanism in Open 3D Engine (O3DE) with Capsule. Capsule allows multiple players to efficiently share one GPU. It is efficient because computation can be reused across players. Our evaluations show that Capsule can increase datacenter resource utilization by accommodating up to 2.25× more players, without degrading the player gaming experience. Capsule is also application agnostic. We ran four applications on Capsule-based O3DE with no application changes. Our experiences show that the Capsule design can be adopted by other game engines to increase datacenter utilization across cloud providers.
Distance management is a crucial component of immersive combat sports training. However, limited research has explored distance management in virtual reality (VR) combat training. Our preliminary study invited professional boxers to engage with a VR combat training system, aiming to evaluate the effects of encountered-type haptic feedback delivered via a tracking system. The results indicate that haptic feedback led to shorter punch distances and a lower movement ratio. However, no significant differences were observed in step count or the average distance to the opponent. These findings suggest that haptic feedback supports more efficient distance management, allowing users to move less while maintaining effective positioning.
Understanding the driver's cognitive state is critical in conditionally autonomous driving, particularly when responding to Take-Over Requests (TOR). However, existing approaches rely primarily on visual attention and are limited in capturing fundamental cognitive failures. This study proposes a quantifiable framework that identifies such failures through gaze entropy analysis and links the driver's gaze behavior to accident risk.
The rise of video streaming has shifted video consumption from traditional venues like theaters to mobile and social media platforms. However, promotional strategies have not kept pace—posters and trailers are still used on mobile devices without leveraging their unique capabilities. This paper presents a new approach to boosting viewing intent through interactive engagement. It introduces interactive posters and trailers that break the "Fourth Wall," allowing characters to communicate directly with users. A prototype enabling dialog-based interaction was tested with 33 participants in their 20s and 30s. Results showed that these interactive experiences significantly increased anticipation and intent to watch the film.
We propose PAAP (Performer-Aware Automatic Panning System), the first system to automatically track performer(s) and generate spatial audio panning data integrated with a Digital Audio Workstation (DAW). The system pipeline consists of three main stages: (1) visual cue analysis via performer tracking and monocular depth estimation, (2) spatial information prediction using a custom algorithm that produces DAW-compatible panning parameters, and (3) integration with an industry-standard DAW using embedded script processing. We tested and validated the system's technical feasibility and real-world applicability, including Open Sound Control (OSC)-based real-time processing. To our knowledge, this is the first complete study of automatic panning integrated with a DAW, and we anticipate that PAAP will streamline live and studio music production.
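The DAW-facing output stage might look like the following OSC sketch (the address namespace, port, and mapping from tracked position to azimuth/gain are hypothetical; real DAWs expose their own OSC parameter schemes):

```python
from pythonosc.udp_client import SimpleUDPClient   # pip install python-osc

client = SimpleUDPClient("127.0.0.1", 9000)  # address/port of the DAW's OSC listener

def send_pan(track_id: int, x_norm: float, depth_m: float):
    """Map a tracked performer position to panning parameters and send them via OSC.
    x_norm in [0, 1] is the horizontal image position; depth_m is estimated depth."""
    azimuth = (x_norm - 0.5) * 90.0              # degrees, -45 (left) .. +45 (right)
    gain = min(1.0 / max(depth_m, 1.0), 1.0)     # simple distance attenuation
    client.send_message(f"/track/{track_id}/azimuth", azimuth)   # hypothetical addresses
    client.send_message(f"/track/{track_id}/gain", gain)

send_pan(1, x_norm=0.72, depth_m=3.4)
```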
Decreased attention, distraction, and complex environments are major contributors to accidents in Level 2 autonomous driving. This study examines how spatial complexity and human factors affect accident risk using scenario-based simulations. We analyzed subjective factors (workload, situation awareness) and biometric data (eye tracking, HRV). Logistic regression identified age, workload, and situation awareness as significant predictors, with 74.2% accuracy (5-fold cross-validation). High spatial complexity increased cognitive load and visual scanning, elevating accident risk. These results support the need for integrated prediction strategies and adaptive driver support systems to enhance safety.
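For illustration, the 5-fold cross-validated logistic regression can be reproduced with scikit-learn as below; the synthetic features stand in for the study's measured predictors (age, workload, situation awareness) and are not the actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study's predictors and accident labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))                                   # age, workload, SA
y = (0.8 * X[:, 0] + 1.1 * X[:, 1] - 0.9 * X[:, 2]
     + rng.normal(0, 1, 120) > 0).astype(int)                   # accident / no accident

model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy") # 5-fold cross-validation
print(f"5-fold accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```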
Generating combined visual and auditory sensory experiences is critical for immersive content. We introduce SEE-2-SOUND, a training-free pipeline that turns an image, GIF, or video into 5.1 spatial audio. SEE-2-SOUND sequentially: (i) segments visual sound sources; (ii) estimates their 3-D positions from monocular depth; (iii) synthesises mono audio for every source; and (iv) renders the mix with room acoustics. Built entirely from off-the-shelf models, the method needs no fine-tuning and runs in zero-shot mode on real or generated media. We demonstrate compelling results for generating spatial audio from videos, images, dynamic images, and media generated by learned approaches. Project page: https://see2sound.github.io/.
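As an illustration of the final rendering stage, the sketch below pans a mono source into the five main 5.1 channels by constant-power pairwise panning over the nominal speaker azimuths; distance attenuation, the LFE channel, and room acoustics are omitted, and the layout angles are standard assumptions rather than the paper's exact renderer.

```python
import numpy as np

# Nominal azimuths (degrees) of the five main 5.1 speakers; LFE is handled separately.
SPEAKERS = {"L": -30.0, "C": 0.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}

def pan_gains(azimuth_deg: float) -> dict:
    """Constant-power pairwise panning of a mono source between the two
    speakers adjacent to its azimuth."""
    names = sorted(SPEAKERS, key=lambda n: SPEAKERS[n])
    angles = [SPEAKERS[n] for n in names]
    a = ((azimuth_deg + 180.0) % 360.0) - 180.0
    gains = {n: 0.0 for n in names}
    for i in range(len(names)):
        lo, hi = angles[i], angles[(i + 1) % len(names)]
        span = (hi - lo) % 360.0
        rel = (a - lo) % 360.0
        if rel <= span:
            t = rel / span                           # position between the two speakers
            gains[names[i]] = np.cos(t * np.pi / 2)
            gains[names[(i + 1) % len(names)]] = np.sin(t * np.pi / 2)
            break
    return gains

print(pan_gains(15.0))   # energy split between C and R
```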
Imagine if, during moments of heightened anxiety, you could once again feel the gentle, familiar touch of a loved one’s hand. Stroke Imprint is a knitted wearable that uses pressure sensing and SMA-based actuation to simulate stroking sensations that comfort young women experiencing anxiety. Paired with a digital interface, the glove allows users to record personalized tactile sensations. Through user interviews, design iterations, and user testing, the study demonstrates its potential as an anxiety-tracking, therapeutic wearable within a closed biofeedback loop.
"Play with Earth" introduces a novel project that addresses the preservation and innovation of intangible cultural heritage (ICH), with a focus on traditional mud toys from China’s Yellow River. Based on a comprehensive documentation of 15,686 photographs of mud toys and interviews with inheritors, our project achieved an interactive platform combining traditional craftsmanship with AI-assisted creativity.
Ray tracing is a widely used technique for modeling optical systems, involving sequential surface-by-surface computations which can be computationally intensive. We propose Ray2Ray, a novel method that leverages implicit neural representations to model optical systems with greater efficiency, eliminating the need for surface-by-surface computations via a single-pass, end-to-end model. Ray2Ray learns the mapping between rays emitted from a given source and their corresponding rays after passing through a given optical system in a physically accurate manner. We train Ray2Ray on nine off-the-shelf optical systems, achieving positional errors on the order of 1 μm and angular deviations on the order of 0.01 degrees in the estimated output rays. Our work highlights the potential of neural representations as a proxy optical raytracer.
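A minimal PyTorch sketch of the idea: a plain MLP maps a 6D input ray (origin, unit direction) to the corresponding 6D output ray and is fit to pairs produced by a reference ray tracer. The parameterization, network size, and training loop are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class Ray2RayMLP(nn.Module):
    """MLP surrogate for an optical system: maps an input ray, parameterized as
    (origin xyz, unit direction xyz), to the corresponding output ray."""
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        layers, d = [], 6
        for _ in range(depth):
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        layers += [nn.Linear(d, 6)]
        self.net = nn.Sequential(*layers)

    def forward(self, rays):                         # rays: (N, 6)
        out = self.net(rays)
        pos, dirn = out[:, :3], out[:, 3:]
        dirn = dirn / dirn.norm(dim=1, keepdim=True) # keep directions unit length
        return torch.cat([pos, dirn], dim=1)

# Training sketch: (input_rays, output_rays) pairs come from a reference ray tracer.
model = Ray2RayMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
input_rays = torch.randn(1024, 6)                    # placeholder data
output_rays = torch.randn(1024, 6)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(input_rays), output_rays)
    loss.backward()
    opt.step()
```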
We evaluate skin tone bias in a real-time rendering engine using 80 MetaHumans covering all 10 levels of the Monk Skin Tone (MST) scale. Two color pipelines are compared: MST-RS, which uses standard RGB reference swatches, and MST-CS, based on cheek-sampled RGB values from real photographs. We apply a patch-based metric, median RGB intensity, to the rendered faces. MST-RS exhibits a smooth, monotonic RGB decline from MST 1 to 10, while MST-CS reveals geometry-sensitive, non-linear variations and gamut compression in darker tones. These differences highlight potential rendering biases and support the need for tone-aware shader validation.
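The patch-based median-RGB metric amounts to a few lines of NumPy (patch size and sample coordinates below are hypothetical):

```python
import numpy as np

def patch_median_rgb(image: np.ndarray, center_xy, size: int = 32) -> np.ndarray:
    """Median R, G, B intensity of a square patch (e.g. sampled on the cheek)
    of a rendered frame; image is HxWx3 with values in [0, 255]."""
    x, y = center_xy
    half = size // 2
    patch = image[max(y - half, 0):y + half, max(x - half, 0):x + half]
    return np.median(patch.reshape(-1, 3), axis=0)

# Comparing the same cheek patch across MST-RS and MST-CS renders of one MetaHuman:
# delta = patch_median_rgb(render_rs, (420, 310)) - patch_median_rgb(render_cs, (420, 310))
```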
The growing popularity of 3D Gaussian Splatting has created the need to integrate traditional computer graphics techniques and assets in splatted environments. Since 3D Gaussian primitives encode lighting and geometry jointly as appearance, meshes are relit improperly when inserted directly in a mixture of 3D Gaussians and thus appear noticeably out of place. We introduce GBake, a specialized tool for baking reflection probes from Gaussian-splatted scenes that enables realistic reflection mapping of traditional 3D meshes in the Unity game engine.
Simulating and fabricating plasmonic nanostructures for specific colors is slow and costly. HyperParamBRDF exploits a hypernetwork to learn a parametric reflectance model from physical parameters. Trained on sparse FDTD data, it infers BRDFs in milliseconds, achieving a speedup of over 10⁷× with high fidelity and enabling real-time appearance exploration for complex simulated materials.
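A schematic of the hypernetwork idea in PyTorch: a small network maps physical nanostructure parameters to the weights of an even smaller BRDF network evaluated on direction pairs. Layer sizes and the reflectance parameterization are assumptions, not HyperParamBRDF's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperBRDF(nn.Module):
    """Hypernetwork sketch: physical parameters generate the weights of a small
    BRDF network that maps (incoming, outgoing) directions to RGB reflectance."""
    def __init__(self, n_params=4, hidden=32):
        super().__init__()
        self.hidden = hidden
        # Sizes of the target BRDF MLP: 6 -> hidden -> 3.
        self.n_w1, self.n_b1 = 6 * hidden, hidden
        self.n_w2, self.n_b2 = hidden * 3, 3
        total = self.n_w1 + self.n_b1 + self.n_w2 + self.n_b2
        self.hyper = nn.Sequential(nn.Linear(n_params, 128), nn.ReLU(),
                                   nn.Linear(128, total))

    def forward(self, phys_params, wi, wo):
        w = self.hyper(phys_params)                        # flat weight vector
        i = 0
        w1 = w[i:i + self.n_w1].view(self.hidden, 6); i += self.n_w1
        b1 = w[i:i + self.n_b1];                      i += self.n_b1
        w2 = w[i:i + self.n_w2].view(3, self.hidden); i += self.n_w2
        b2 = w[i:i + self.n_b2]
        x = torch.cat([wi, wo], dim=-1)                    # (N, 6)
        h = F.relu(F.linear(x, w1, b1))
        return F.softplus(F.linear(h, w2, b2))             # nonnegative reflectance

model = HyperBRDF()
rgb = model(torch.rand(4), torch.randn(16, 3), torch.randn(16, 3))
print(rgb.shape)   # (16, 3)
```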
We present a modular, scalable workflow for high-fidelity volume rendering of large-scale CFD simulations. Designed with visual effects (VFX) techniques in mind, our workflow transforms unstructured CFD data into cinematic-quality visuals using parallel voxelization and sparse volume export. By leveraging CyclesPhi renderer and OpenVDB, we deliver performance, scalability, and expressive visualization on HPC infrastructure. Results on two large CFD cases demonstrate significant speedups over traditional tools with support for interactive rendering of volumes.
What if data generation, manipulation, and training could all happen entirely on the GPU, without ever touching the RAM or the CPU? In this work, we present a novel pipeline based on Unreal Engine 5, which allows us to generate, render, and process graphics data entirely on the GPU. By keeping the data stored in GPU memory throughout all the steps, we bypass the traditional bottlenecks related to CPU-GPU transfers, significantly accelerating data manipulation and enabling fast training of deep learning algorithms. Traditional storage systems impose latency and capacity limitations, which become increasingly problematic as data volume increases. Our method demonstrates substantial performance improvements on multiple benchmarks, offering a new paradigm for integrating game engines with data-driven applications. More information on our project page: https://mmlab-cv.github.io/Infinity/
The polarization state of light is described in a local coordinate frame in which the oscillation of the electric and magnetic fields occurs. In physics, this frame is rotated according to the surface normal of the object. In this study, we investigate the effect of this frame rotation while evaluating multi-bounce Smith microfacet BSDFs. We show that the evaluation can be accelerated when the frame rotation does not significantly matter, and we experimentally demonstrate that this acceleration is practically feasible.
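The frame rotation in question can be illustrated with Stokes/Mueller algebra (sign conventions vary between references): rotating the reference frame by θ multiplies the Stokes vector by a rotator with 2θ terms, and the fast path simply skips these rotations when the angles are negligible.

```python
import numpy as np

def stokes_rotator(theta: float) -> np.ndarray:
    """Mueller matrix that rotates the reference frame of a Stokes vector by theta."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1.0]])

def eval_mueller(M, stokes_in, theta_in, theta_out, eps=1e-3):
    """Apply a Mueller matrix defined in its own local frame to a Stokes vector
    given in another frame; skip the frame rotations when the angles are negligible."""
    if abs(theta_in) < eps and abs(theta_out) < eps:
        return M @ stokes_in                        # fast path: no rotation needed
    return stokes_rotator(theta_out) @ M @ stokes_rotator(theta_in) @ stokes_in

M = np.diag([1.0, 0.9, 0.9, 0.8])                   # some polarizing interaction
s = np.array([1.0, 0.3, 0.1, 0.0])
print(eval_mueller(M, s, theta_in=0.0002, theta_out=0.0001))
```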