We propose an algorithm to interactively design freeform balancing toys that stably balance on a single point of contact. We achieve this by positioning the center of mass outside the model’s surface while deforming the external surface. Our approach relies on a simple energy function that is fast to evaluate and optimize, allowing an interactive design process. The results confirm the feasibility of creating stable balancing toys via standard 3D printing, expanding the possibilities for mechanical design.
Compositing a background image and a foreground image produced from a 3D object requires a projection function that ensures consistency in the scene. We modified the generalized projection of [Yoshimura and Saito 2017] to allow tilted-angle images and introduced a user interface, Manu-Grid, to estimate the parameters of the projection function corresponding to the drawing method used in a background image. A useful characteristic of the interface is that when a user manipulates any one of the vanishing direction, the vanishing line, or the reference point on the ground, the others remain pinned.
We introduce Minecraft to 3D, a novel pipeline that automatically converts any Minecraft world into a high-quality polygonal scene. A 3D convolutional network recognises Minecraft’s default objects, the block surface is resampled into a smooth height-map, and each recognised object is substituted with a high-quality 3D model chosen from an external library. Object locations, orientations, and tags are preserved, a separate water plane is exported for engine-level ocean rendering, and the final scene opens natively in modern 3D engines. The pipeline processes a one-square-kilometre world in under three minutes on a single consumer GPU, enabling educators, indie developers, and artists to move rapidly from voxel sketches to fully lit environments.
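To make the height-map resampling step concrete, here is a minimal Python sketch (array shapes, axis order, and the smoothing kernel are assumptions; the abstract does not specify the pipeline's internals): it takes the topmost solid block in each voxel column and blurs the stepped result into a smooth surface.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def voxels_to_heightmap(occupancy: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Convert a boolean voxel grid (x, z, y) into a smoothed height-map.

    occupancy[x, z, y] is True where a solid block exists; y is the vertical axis.
    """
    ny = occupancy.shape[2]
    # Index of the topmost solid block in each (x, z) column.
    top = ny - 1 - np.argmax(occupancy[:, :, ::-1], axis=2)
    top[~occupancy.any(axis=2)] = 0            # empty columns fall to ground level
    # Smooth the stepped voxel surface into a continuous height-map.
    return gaussian_filter(top.astype(np.float32), sigma=sigma)

# Example: a 64x64x32 world with a flat floor and a small hill.
world = np.zeros((64, 64, 32), dtype=bool)
world[:, :, :4] = True
world[20:30, 20:30, :10] = True
heightmap = voxels_to_heightmap(world)
print(heightmap.shape, heightmap.min(), heightmap.max())
```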
3D reassembly needs both tight alignment and a collision-free insertion path, but most methods enforce only the former. We recast the task as constrained packing, apply FFT correlation for coarse placement, and prune unreachable poses with a flood-fill path test on the correlation map. A final ICP stage then maximizes surface alignment (instead of minimizing contact as in classic packing). Assuming a known target envelope, this path-aware spectral pipeline yields high-fidelity, physically valid reconstructions, although accuracy still depends on the initial pose sampling.
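A minimal 2D sketch of the spectral placement and path test, assuming a translation-only search on binary occupancy grids (the actual method operates on 3D fragments and a richer pose space): FFT correlation counts fragment/obstacle overlaps per offset, and a flood fill from the domain boundary keeps only placements reachable from outside.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import label

def feasible_placements(envelope: np.ndarray, fragment: np.ndarray) -> np.ndarray:
    """Boolean map of collision-free placements of `fragment` (2D sketch).

    envelope: 1 = free space inside the target cavity, 0 = occupied/outside.
    fragment: 1 = material of the piece being inserted.
    """
    obstacles = 1.0 - envelope
    # Correlation of obstacles with the flipped fragment = overlap count per offset.
    overlap = fftconvolve(obstacles, fragment[::-1, ::-1], mode="valid")
    return overlap < 0.5          # zero overlap (up to FFT round-off) -> feasible

def reachable_placements(feasible: np.ndarray) -> np.ndarray:
    """Flood-fill pruning: keep only placements connected to the domain boundary,
    i.e. poses that can be reached by sliding the piece in from outside."""
    labels, _ = label(feasible)
    border = np.unique(np.concatenate([labels[0, :], labels[-1, :],
                                       labels[:, 0], labels[:, -1]]))
    border = border[border != 0]
    return np.isin(labels, border)

envelope = np.ones((64, 64)); envelope[20:40, 20:25] = 0    # an obstacle wall
fragment = np.ones((8, 8))
ok = reachable_placements(feasible_placements(envelope, fragment))
print(ok.sum(), "reachable, collision-free placements")
```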
With the advancement of conversational AI, research on bodily expressions, including gestures and facial expressions, has also progressed. However, many existing studies focus on photorealistic avatars, making their methods unsuitable for non-photorealistic characters, such as those found in anime. This study proposes methods for expressing emotions, including exaggerated expressions unique to non-photorealistic characters, by utilizing expression data extracted from comics and dialogue-specific semantic gestures. A user study demonstrated significant improvements across multiple aspects when compared to existing research.
We introduce a two-stage pipeline that gives artists fine-grained, input-level control of audio-driven 3D facial animation. Stage 1 learns a latent relative-motion prior from neutral/offset position maps, confining deformations to realistic shapes. Stage 2 projects an explainable phoneme–prosody vector into this space, so visemes and expressions are editable in feature space. Early experiments show preserved lip-sync and natural motion, narrowing the gap between fidelity and control.
We propose Dynamic Skinning (DS), an extension of rig skinning that exhibits the appearance of physical phenomena without the need for simulation. Our approach applies offsets from traditional skinning to produce these effects based on time-delayed, filtered joint motion. We showcase a number of effects, including 1) time-varying oscillation and 2) time delay across skeletal bones, to produce what we call delayed linear blend skinning (dLBS) directly within the skinning computation. Our approach is easy for artists to control with simple input parameters, and the method is compatible with standard rigged characters.
Ants exhibit unique abilities to self-assemble into animate, living structures. Such structures display properties of both fluid-like and solid-like deformable materials. Despite much progress in our understanding of ant aggregation dynamics, simulating such phenomena has been largely overlooked in real-time graphics and animation applications. We present a constraints-based approach for simulating the collective dynamics of ants. We demonstrate ant collective behaviors interactively with compelling physical realism.
This project explores the role of the designer in digital fabrication workflows as digitization leads to higher levels of design automation. As digital technologies are adopted to streamline design to manufacturing workflows, elements of the creative process can become standardized to improve production efficiency at the cost of designer autonomy and product customization. In order to ensure designers’ agency and increase product variation, the Carrara project presents a collaborative tool utilizing agent-based modeling (ABM) to represent designers, fabrication machines, and algorithms as active co-participants in the design process. This co-participatory workflow enables a generative, scalable product line that takes advantage of digital efficiencies while providing the designer with autonomy and control in the creative process.
"Digitizing Devotion" utilizes advanced oblique photography and AI to create immersive virtual reconstructions of sacred spaces, preserving traditional worship practices for the global diaspora while ensuring cultural continuity across generations and geographical boundaries.
This paper introduces Dust in Time, a tangible and embodied art installation that allows the audience to interact with physical hourglasses and virtual particles through embodied gestures and motions. We describe the design concept and technical details of this installation. Through this conceptual tangible interactive installation, we aim to explore how tangible and embodied interaction can be used to represent the concepts of time and its relationship with human beings and promote both explicit and implicit interaction.
The animated short film Sensual explores a novel workflow for hand-painted watercolor animation, blending traditional artistic methods with AI-based frame interpolation techniques. By combining compositing with the Real-Time Intermediate Flow Estimation (RIFE) image interpolation network, we significantly reduced production time while maintaining the unique hand-painted aesthetic.
Lutruwita/Tasmania's island conditions are often misperceived as isolated and unchanging. Building on Giada Peterle's concept of auto-cartography, this paper explores Tasmania's dynamic island identity through human-AI mapping. This is achieved through the creation of haunting horizons, an interactive installation powered by a customised generative AI model. By translating my island experience into a training dataset, this work positions human-AI auto-cartography as an embodied, affective process, enabling artists and participants to engage with maps and reflect on their relations to place in new ways.
In this study, we propose an experience inspired by the Anywhere Door concept, in which users transition between multiple life-sized projected virtual spaces by opening, closing, and passing through a physical door. We demonstrate that this approach not only enhances the entertainment value of the visual experience but also increases the sense of immersion in the destination virtual space.
This work presents DiversePuppetry, an immersive and asymmetric puppetry interaction system that integrates a virtual reality head-mounted display (VR-HMD), a mixed reality head-mounted display (MR-HMD), and a CAVE Automatic Virtual Environment (VR-CAVE). In this project, traditional Taiwanese Budaixi puppets were digitized and incorporated into diverse forms of immersive experiences. This study explores an interactive and immersive platform for puppetry through multiple modes of control. The findings highlight the potential of asymmetric immersive interaction, offering puppetry culture a novel way of creating a complete and immersive digital experience.
Creating interactive 3D scenes often requires technical expertise and significant time, limiting accessibility for non-experts. To address this, we present DreamCraft, a VR system enabling users to intuitively generate and edit interactive 3D environments from panoramas without professional skills. DreamCraft supports panorama generation, interactive object selection, panorama editing, and 3D reconstruction. By combining techniques like 3D Gaussian Splatting (3DGS), object segmentation, and 2D-to-3D conversion, it streamlines immersive scene creation. A user study confirmed its usability, ease of learning, and creative potential, positioning DreamCraft as a step toward accessible 3D content creation.
Enhanced Auditory Reality Simulation for Improved Mapping (EARSIM) is a stand-alone virtual reality (VR) application that procedurally configures a multi-sensory cue system to deliver adaptive auditory localization tasks. In a pilot study, twenty-one participants completed three 40-second sessions with progressively increasing sensory cues. Median localization accuracy decreased monotonically as the number of cues increased, suggesting that the dynamic cue system was effective in modulating task difficulty. These results validate EARSIM as a configurable platform and demonstrate its potential for future clinical applications in auditory rehabilitation.
Although real-time fluid simulation in virtual environments has been widely explored, existing systems often rely on virtual models and predefined parameters, limiting their ability to capture the complexity of physical water flow. To address this, we propose a marker-based VR system that simulates water surface dynamics by tracking real-world water flow using ArUco markers. The system analyzes floating marker trajectories to generate a FlowMap, which is applied to a virtual water surface in Unity for real-time flow simulation. A controllable circular pool with water-jet units was used to create varying flow conditions, and computer vision techniques converted the data into directional vector fields. The FlowMap is continuously updated and interpolated to reduce visual lag. We implemented the prototype in a VR environment and verified the accuracy of the generated flow patterns. Results demonstrate the potential of this sensor-driven approach for realistic water simulations in immersive VR.
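A simplified sketch of the marker-tracking and FlowMap update steps, using OpenCV's ArUco module (the grid resolution, decay factor, and export to Unity are assumptions not specified in the abstract):

```python
import cv2
import numpy as np

# Legacy aruco API from opencv-contrib-python; newer OpenCV versions expose
# cv2.aruco.ArucoDetector instead.
DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def marker_centers(frame_bgr):
    """Detect floating ArUco markers and return {marker_id: center (x, y) in pixels}."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, DICT)
    if ids is None:
        return {}
    return {int(i): c.reshape(4, 2).mean(axis=0) for i, c in zip(ids.flatten(), corners)}

def update_flowmap(flowmap, prev, curr, dt, frame_size, decay=0.95):
    """Splat per-marker velocities into a coarse 2-channel flow grid."""
    h, w = flowmap.shape[:2]
    flowmap *= decay                       # temporal smoothing to reduce visual lag
    for mid, p in curr.items():
        if mid not in prev:
            continue
        v = (p - prev[mid]) / dt           # pixels per second
        gx = int(p[0] / frame_size[0] * w)
        gy = int(p[1] / frame_size[1] * h)
        if 0 <= gx < w and 0 <= gy < h:
            flowmap[gy, gx] = v
    return flowmap

# Usage per camera frame (W, H are the frame dimensions):
#   curr = marker_centers(frame)
#   flowmap = update_flowmap(flowmap, prev, curr, dt, (W, H))
```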
This paper presents two novel teleportation methods for VR environments that address limitations of conventional parabola-based approaches when navigating varying heights. The SphereBackcast and Penetration methods utilize straight-line specification for intuitive movement to elevated locations. Experiments with 22 participants showed our methods significantly outperformed parabola-based teleportation for height differences above 2m, while maintaining comparable performance on flat terrain. NASA-TLX and SUS evaluations confirmed improved usability and reduced cognitive load, indicating these methods can be readily integrated into existing VR applications.
This poster introduces the INT-ACT project which aims to investigate the use of immersive XR environments for presenting the emotional, experiential and environmental dimensions of Intangible Cultural Heritage (ICH) associated with tangible cultural heritage sites. It also presents a mobile XR demonstrator, developed as part of INT-ACT, that focuses on the ICH related to a megalithic site.
Cheerleading stunts are group gymnastics performed by multiple people. As the skills involved become more challenging, it is necessary to devise better practice methods. Thus, in this paper, we propose a pretraining support system for cheerleading stunts using Virtual Reality (VR) technology. This system enables users to experience successfully performing a stunt in the virtual space by adopting the viewpoints of the cheerleaders performing various types of stunts. We evaluated our system through interviews with two experts. Our system has the potential to meaningfully augment the established training method of previsualization of stunts.
This paper introduces SugART, a Mixed Reality (MR) project that enables users to learn and recreate traditional sugar painting at home. By combining hand tracking, virtual guidance, and real-time feedback, our project supports creative expression and cultural education, thereby lowering barriers to participation in intangible cultural heritage through accessible and interactive digital experiences.
The Gesture Lives On is a real-time VR performance system that reimagines traditional Taiwanese glove puppetry through immersive, interactive means. Rooted in precise gestural movement, this art form faces decline amid shifting audience engagement. Using VR-based gesture recognition, a performer co-creates a digital duet with a virtual puppet. This hybrid space transforms tradition into contemporary expression, offering audiences a new way to experience puppetry within a shared virtual–physical environment.
This study investigates the effective range of the weight illusion induced by AR visual effects displayed on the arm. The results show that AR visual effects on the arm can create a “strong” impression and that using such visual effects can induce a weight illusion in which weights ranging from 100 g to 500 g are perceived as lighter when lifted with the arm augmented by these visual effects.
You Can Grow Here is an immersive VR experience developed for the CAVE2™ environment, aligning with the UN Sustainable Development Goal of Good Health and Well-Being. In response to the mental health challenges intensified by the COVID-19 pandemic, the project explores how interactive storytelling, ambient sound, and 3D typography can support emotional reflection and teach anxiety coping strategies. Built in Unity with custom assets from Blender and Maya, the experience differs from most clinical VR programs, allowing users to independently explore emotions, manage anxiety, and practice evidence-based calming techniques within a safe, narrative-driven space that builds emotional resilience.
In this study, we independently developed a knowledge system named uNEEDXR™, successfully realizing a full-color micro-OLED device with a brightness of 60,000 nits on a silicon-based backplane. This knowledge system was developed over nine years and innovatively integrates know-how and expertise across design, processes, manufacturing, equipment, and materials, overcoming the performance limitations of traditional micro-OLED architectures. The system enables high brightness, high pixel density, low power consumption, high contrast ratio, high color saturation, and tunable energy distribution (including viewing angle, wavelength, and bandwidth). Additionally, it meets customer requirements for reliability and lifespan. This technology provides a scalable production solution for near-eye display applications in augmented reality.
Direct-view 3D displays enable immersive experiences but often cause visual discomfort from relying on binocular disparity alone. Holographic displays offer an ultimate solution by reconstructing the full light wavefront, but face scalability and viewing freedom limitations due to the spatial-bandwidth product and high cost of fine-pitch phase-only SLMs. We present a system combining an amplitude-only display with ultra-high pixel count and dynamic optical steering for a fully 3D eye box. By axially translating a lens, we expand the eye box in depth beyond the capabilities of conventional pupil-steering methods. We further extend SGD-based hologram optimization to support dual light sources and an amplitude-only SLM, enabling stereoscopic delivery with suppressed crosstalk. Our prototype shows accurate depth cues, paving the way for scalable, high-quality holographic displays with expanded viewing freedom.
In this study, we propose a novel Maxwellian optics that achieves a wide field of view and evaluate its effectiveness through 2D and 3D simulations. The fundamental principle of the proposed system is based on Maxwellian view, a form of retinal projection that can mitigate the vergence–accommodation conflict. Conventional Maxwellian optics typically employ a pinhole to focus light on a single point within the pupil, enabling the projection of sharp images onto the retina with a wide field of view, independent of the eye's accommodation. However, a major limitation of such systems is the disappearance of the image when the eye rotates and the convergence point shifts outside the pupil. To address this issue, we propose a novel Maxwellian optical system that combines a spherical multi-pinhole (SMP) with a transmissive mirror device (TMD).
We propose a naked-eye stereoscopic display with an ultra-wide viewing zone by applying the display principle of general LCDs. By replacing the polarizer of an LCD with a reflective polarizer and arranging them three-dimensionally, this technology refracts light rays freely and enables an expansion of the viewing zone. In this study, we created a prototype and confirmed that the viewing zone expanded.
An infinity mirror is an optical novelty that uses facing mirrors – at least one of which is partially transparent to allow viewing – to present the appearance of an infinite tunnel of copies of a scene. One limitation of infinity mirrors is that alternate reflections of the scene are – by necessity – reflected, which means one cannot create, e.g., speed tunnel effects where lights chase into or out of the apparent tunnel. I present a prototype infinity mirror that uses light cells to overcome this limitation. These cells have a different appearance when viewed from the front and back, apparently breaking the symmetry between the primary and reflected versions of the scene. The cells are made from a 3D-printed baffle and diffuser and lit with off-the-shelf programmable LED strips, resulting in an overall inexpensive-to-produce design. In this poster I discuss the construction of my prototype infinity mirror, demonstrate some simple speed tunnel effects, and discuss the design trade-offs in my simple light-cell design.
We present a real-time algorithm for driving multispectral LED lights in a spherical lighting reproduction stage to achieve optimal color rendition for a dynamic lighting environment. Previous work has driven multispectral LED lights (ours include red, green, blue, white, and amber LEDs) by solving a nonnegative least squares (NNLS) problem for each light source; the solution ensures that each light appears to be the correct RGB color seen by the camera and also optimizes how closely the lights illuminate a color chart to appear as it should in the target lighting environment. We create a real-time version of this technique by pre-computing a lookup table of these NNLS solutions across the full range of input RGB values. Since the proper relative mix of LEDs depends on chrominance and not on luminance, our lookup table can be reduced to 2D saving both storage and computation. With this technique, we can drive several thousand multispectral LED lights at video frame rates with proper color matching and color rendition for a dynamic lighting environment.
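The lookup-table construction can be sketched as follows; the LED response matrix values are illustrative, and the color-chart rendition term of the full objective is omitted, so this shows only the basic RGB-matching NNLS with the 2D chrominance reduction described above.

```python
import numpy as np
from scipy.optimize import nnls

# Camera-observed RGB response of each LED channel (columns: R, G, B, W, A LEDs),
# measured by photographing each channel at full power. Values are illustrative.
A = np.array([[0.90, 0.05, 0.02, 0.33, 0.55],
              [0.10, 0.85, 0.08, 0.34, 0.40],
              [0.02, 0.10, 0.90, 0.33, 0.05]])

def build_lut(n=64):
    """Precompute NNLS LED weights over a 2D chromaticity grid (r, g), r + g + b = 1."""
    lut = np.zeros((n, n, A.shape[1]))
    for i, r in enumerate(np.linspace(0, 1, n)):
        for j, g in enumerate(np.linspace(0, 1, n)):
            b = 1.0 - r - g
            if b < 0:
                continue
            w, _ = nnls(A, np.array([r, g, b]))   # nonnegative LED drive weights
            lut[i, j] = w
    return lut

LUT = build_lut()

def led_weights(rgb):
    """Look up the LED mix for an arbitrary RGB value: chrominance indexes the
    2D table, luminance just scales the result."""
    s = rgb.sum()
    if s <= 0:
        return np.zeros(A.shape[1])
    r, g = rgb[0] / s, rgb[1] / s
    n = LUT.shape[0]
    i = min(int(r * (n - 1)), n - 1); j = min(int(g * (n - 1)), n - 1)
    return LUT[i, j] * s

print(led_weights(np.array([0.8, 0.6, 0.3])))
```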
In this study, we propose a smartphone-based wide field-of-view HMD that expands the display area using inexpensive mirrors and lenticular lenses. Lenticular lenses placed on both edges of the display convert these areas into multi-view displays. The expansion of the display area is achieved by observing the multi-view images via properly placed mirrors.
We propose a new automatic colorization method for anime line drawings using segment matching with a few reference images. To address the limitations of existing segment matching methods in handling large motion gaps or small regions, we introduce patch-based few-shot colorization and a color shuffling process to estimate candidate colors for subsequent segment matching. This addresses the nonlinear movements that are unique to anime and that optical flow estimation struggles with. We demonstrate that the proposed method improves accuracy compared to the state-of-the-art segment matching method.
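The core segment-matching idea can be sketched as below (a simplification: the few-shot colorization and color-shuffling stages that generate candidate colors are omitted, and the patch size and descriptor are assumptions):

```python
import numpy as np
from scipy import ndimage

def segment_patches(labels, image, size=16):
    """Extract a fixed-size grayscale patch around each segment centroid."""
    patches = {}
    h, w = image.shape[:2]
    seg_ids = np.unique(labels)[1:]                       # 0 is assumed background
    centroids = ndimage.center_of_mass(labels > 0, labels, seg_ids)
    for seg_id, (cy, cx) in zip(seg_ids, centroids):
        y0, x0 = int(cy) - size // 2, int(cx) - size // 2
        patch = np.zeros((size, size), dtype=np.float32)
        ys = slice(max(y0, 0), min(y0 + size, h))
        xs = slice(max(x0, 0), min(x0 + size, w))
        patch[:ys.stop - ys.start, :xs.stop - xs.start] = image[ys, xs]
        patches[int(seg_id)] = patch
    return patches

def match_colors(target_patches, ref_patches, ref_colors):
    """Assign each target segment the color of the reference segment whose
    patch is closest in sum-of-squared-differences."""
    colors = {}
    for tid, tp in target_patches.items():
        best = min(ref_patches,
                   key=lambda rid: float(((tp - ref_patches[rid]) ** 2).sum()))
        colors[tid] = ref_colors[best]
    return colors
```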
We evaluate the performance of four common learned models utilizing INR and VAE structures for compressing phase-only holograms in holographic displays. The evaluated models include a vanilla MLP, SIREN [Sitzmann et al. 2020], and FilmSIREN [Chan et al. 2021], with TAESD [Bohan 2023] as the representative VAE model. Our experiments reveal that a pretrained image VAE, TAESD, with 2.2M parameters struggles with phase-only hologram compression, indicating the need for task-specific adaptations. Among the INRs, SIREN with 4.9k parameters achieves 40% compression with high quality in the reconstructed 3D images (PSNR = 34.54 dB). These results emphasize the effectiveness of INRs and identify the limitations of pretrained image-compression VAEs for the hologram compression task.
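For reference, a compact SIREN of the kind evaluated here can be written in a few lines of PyTorch; the hidden width, coordinate normalization, and training schedule below are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    def __init__(self, w0=30.0):
        super().__init__()
        self.w0 = w0
    def forward(self, x):
        return torch.sin(self.w0 * x)

class SIREN(nn.Module):
    """Tiny SIREN mapping pixel coordinates (x, y) in [-1, 1]^2 to a phase value.
    The network weights themselves are the compressed representation."""
    def __init__(self, hidden=32, layers=3, w0=30.0):
        super().__init__()
        dims = [2] + [hidden] * layers + [1]
        mods = []
        for i, (din, dout) in enumerate(zip(dims[:-1], dims[1:])):
            lin = nn.Linear(din, dout)
            # SIREN initialization (Sitzmann et al. 2020).
            bound = 1.0 / din if i == 0 else (6.0 / din) ** 0.5 / w0
            nn.init.uniform_(lin.weight, -bound, bound)
            mods.append(lin)
            if i < len(dims) - 2:
                mods.append(Sine(w0))
        self.net = nn.Sequential(*mods)
    def forward(self, xy):
        return self.net(xy)

# Fitting loop sketch: overfit the INR to one phase-only hologram.
model = SIREN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
coords = torch.rand(4096, 2) * 2 - 1             # random pixel coordinates
phase = torch.rand(4096, 1) * 2 * torch.pi       # stand-in for ground-truth phase
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(coords), phase)
    loss.backward()
    opt.step()
```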
We introduce a pipeline for interpreting Ancient Egyptian hieroglyphic texts that combines OCR, transliteration, and translation. Designed for this low-resource setting, our system improves accessibility for learners and efficiency for researchers. We evaluate its performance on a new, diverse dataset reflective of real-world conditions.
In hand-drawn anime production, automatic colorization is used to boost productivity, where line drawings are automatically colored based on reference frames. However, the results sometimes include wrong color estimations, requiring artists to carefully inspect each region and correct colors—a time-consuming and labor-intensive task. To support this process, we propose a confidence estimation method that indicates the confidence level of colorization for each region of the image. Our method compares local patches in the colorized result and the reference frame.
This study contrasts two generative AI (GenAI) workflows (Figure 1) addressing visual and character consistency and introduces a filmmaker-oriented framework for AI-assisted production, grounded in two practice-based short films.
This paper presents a compact handheld holographic video camera system capable of capturing real-time, full-color complex hologram videos under natural lighting conditions. By integrating a geometric phase lens with a polarization image sensor, our system captures interference patterns without requiring specialized lighting or bulky equipment. We successfully apply conventional 2D video super-resolution techniques to the complex holograms, significantly enhancing both resolution and visibility while preserving digital refocusing capabilities. Our experimental results demonstrate that this approach satisfies three critical requirements for practical modern cameras: operation under incoherent lighting, robustness to mobile shooting conditions, and compact design. This work represents an advancement toward practical holographic media applications, particularly for broadcast content production in extended reality and mixed reality environments.
Rendering near-final quality previews requires a great number of samples per pixel. Recently, diffusion models have shown superior denoising capabilities, but they suffer from large variance, which is further amplified by the spatial and temporal inconsistencies they introduce. In our pipeline, we propose the use of multiple control features and forward projections to denoise 1-sample-per-pixel frames and extrapolate a high-quality frame, generating a consistent and controllable sequence of high-quality frames.
QRBTF generates artistic QR codes that maintain machine readability while enhancing visual appeal. Our method integrates diffusion models with ControlNet conditioning and adaptive brightness control. Experimental results demonstrate effective brightness contrast control in specific image regions and robust model migration capabilities. Key innovations include: (1) Ternary luminance quantization mapping QR modules to control signals; (2) Style-adaptive generation using LoRA embeddings; (3) Post-processing optimization. The system has generated over 600,000 codes via qrbtf.com, validating its utility in branding and digital marketing applications.
This work presents a pipeline to convert rasterized graphic design posters into multi-layered, editable digital assets. It decomposes the input poster into core elements, categorizes them, and converts them into semantically meaningful formats. A novel strategy using Z-index addresses layer ordering and overlap. The pipeline’s accuracy was evaluated by comparing over 24,000 original and reconstructed posters of multiple widely used sizes and aspect ratios in print & digital media. Layer semantic accuracy was assessed using the LLaVA-7B model, which showed high confidence scores across image, text, and shape layers. A user-centered evaluation with 20 participants resulted in high satisfaction ratings, confirming the pipeline’s ability to accurately reproduce poster designs with excellent fidelity, layout, and overall quality. This pipeline contributes a refined approach to reconstructing rasterized graphic design posters, advancing beyond existing methods.
Layout-aware text-to-image generation allows users to synthesize images by specifying object positions through text prompts and layouts. This has proven useful in a variety of creative fields such as advertising, UI design, and animation, where structured scene control is essential. In real-world workflows, however, certain regions are often intentionally left empty—for instance, for headlines in advertisements, buttons in interface prototypes, or subtitles and speech bubbles in animation frames. Existing models lack the ability to explicitly preserve such negative spaces, often resulting in unwanted content and complicating downstream editing. We introduce Space-Controllable Text-to-Image Generation, a task that treats reserved areas as first-class constraints. To address this, we propose SAWNA (Space-Aware Text-to-Image Generation), a training-free diffusion framework that injects nonreactive noise into user-defined masked regions, ensuring they remain empty throughout generation. Our method maintains semantic integrity and visual fidelity without retraining and integrates seamlessly into layout-sensitive workflows in design, advertising, and animation. Experiments demonstrate that SAWNA reliably enforces spatial constraints and improves the practical usability of generated content.
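One plausible reading of the nonreactive-noise mechanism, shown as a training-free sampling sketch with a placeholder denoiser (the real system wraps a text-to-image diffusion model, and the exact noise schedule may differ from the paper's):

```python
import numpy as np

def dummy_denoise_step(x, t):
    """Stand-in for one reverse-diffusion step of a real text-to-image model."""
    return x * 0.98 + np.random.randn(*x.shape) * 0.01

def sample_with_reserved_space(shape, mask, steps=50, seed=0):
    """Training-free masked sampling: after every denoising step, the
    user-specified region is overwritten with fixed 'nonreactive' noise so that
    no content can form there and the area stays empty in the final image."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    frozen = rng.standard_normal(shape)       # the nonreactive noise for the mask
    for t in reversed(range(steps)):
        x = dummy_denoise_step(x, t)
        x = np.where(mask, frozen, x)         # keep the reserved region inert
    return x

mask = np.zeros((64, 64, 4), dtype=bool)
mask[:16, :, :] = True                        # reserve a banner strip at the top
latent = sample_with_reserved_space((64, 64, 4), mask)
```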
While there are many techniques (e.g., QR codes) that convey information via visual patterns, many applications would benefit from having those codes be imperceptible to the human eye. We present a method for designing subtle code-conveying patterns that can be printed on transparent sticker paper, then applied to real-world surfaces. An image of a scene with an encoded sticker can be sent through our localization and decoding modules, where the sticker subsection is robustly localized and decoded. We jointly optimize the encoding, localization, and decoding modules end to end, taking into account both imperceptibility and accuracy. Notably, we also account for human error when placing stickers, as pixel-perfect alignment is not something that can be reliably expected. Our model encodes and decodes 100-bit secrets, which, with BCH error correction, means that a sticker could encode 56 data bits with 40 parity bits. Experimental results show that this method is robust to sticker placement errors while being easy to deploy in the real world.
Super-resolution (SR) is crucial for delivering high-quality content at lower bandwidths and supporting modern display demands in VR and AR. Unfortunately, state-of-the-art neural network SR methods remain computationally expensive. Our key insight is to leverage the limitations of the human visual system (HVS) to selectively allocate computational resources, such that perceptually important image regions, identified by our low-level perceptual model, are processed by more demanding SR methods, while less critical areas use simpler methods. This approach, inspired by content-aware foveated rendering [Tursun et al. 2019], optimizes efficiency without sacrificing perceived visual quality. User studies and quantitative results demonstrate that our method achieves a reduction in computational requirements with no perceptible quality loss. The technique is architecture-agnostic and well-suited for VR/AR, where focusing effort on foveal vision offers significant computational savings.
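A tile-based sketch of the allocation strategy, with a simple radial gaze rule standing in for the low-level perceptual model and bicubic interpolation standing in for the expensive neural SR (both are placeholders, not the paper's components):

```python
import numpy as np
import cv2

def heavy_sr(patch, scale):
    """Placeholder for an expensive neural SR model; bicubic stands in here."""
    return cv2.resize(patch, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)

def foveated_upscale(img, gaze_xy, scale=2, tile=64, fovea_radius_px=256):
    """Tile-wise SR allocation: tiles near the gaze point get the expensive
    upscaler, peripheral tiles get cheap bilinear interpolation."""
    h, w = img.shape[:2]
    out = np.zeros((h * scale, w * scale, 3), img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = img[y:y + tile, x:x + tile]
            cx, cy = x + patch.shape[1] / 2, y + patch.shape[0] / 2
            if np.hypot(cx - gaze_xy[0], cy - gaze_xy[1]) < fovea_radius_px:
                up = heavy_sr(patch, scale)          # perceptually important region
            else:
                up = cv2.resize(patch, None, fx=scale, fy=scale,
                                interpolation=cv2.INTER_LINEAR)
            out[y * scale:(y + patch.shape[0]) * scale,
                x * scale:(x + patch.shape[1]) * scale] = up
    return out

frame = np.zeros((512, 512, 3), np.uint8)
print(foveated_upscale(frame, gaze_xy=(256, 256)).shape)   # (1024, 1024, 3)
```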
Our training-free method enables photorealistic facade editing by combining hierarchical procedural structure control with diffusion models. Starting from a facade image, we reconstruct, edit, and guide generation to produce high-fidelity, photorealistic variations. The method ensures structural consistency and appearance preservation, demonstrating the power of symbolic modeling for controllable image synthesis.
Cloud gaming is increasingly popular. A challenge for cloud providers is to efficiently operate their datacenters, i.e., keep datacenter utilization high: a non-trivial task due to application variety. Cloud datacenter resources are also diverse, e.g., CPUs, GPUs, NPUs. We propose player-level isolation to address this challenge. We implemented such an isolation mechanism in Open 3D Engine (O3DE) with Capsule. Capsule allows multiple players to efficiently share one GPU. It is efficient because computation can be reused across players. Our evaluations show that Capsule can increase datacenter resource utilization by accommodating up to 2.25× more players, without degrading the player gaming experience. Capsule is also application agnostic. We ran four applications on Capsule-based O3DE with no application changes. Our experiences show that the Capsule design can be adopted by other game engines to increase datacenter utilization across cloud providers.
Distance management is a crucial component of immersive combat sports training. However, limited research has explored distance management in virtual reality (VR) combat training. Our preliminary study invited professional boxers to engage with a VR combat training system, aiming to evaluate the effects of encountered-type haptic feedback delivered via a tracking system. The results indicate that haptic feedback led to shorter punch distances and a lower movement ratio. However, no significant differences were observed in step count or the average distance to the opponent. These findings suggest that haptic feedback supports more efficient distance management, allowing users to move less while maintaining effective positioning.
Understanding the driver's cognitive state is critical in conditionally autonomous driving, particularly when responding to Take-Over Requests (TOR). However, existing approaches rely primarily on visual attention and are limited in capturing fundamental cognitive failures. This study proposes a quantifiable framework that identifies such failures through gaze entropy analysis and links the driver's gaze behavior to accident risk.
The rise of video streaming has shifted video consumption from traditional venues like theaters to mobile and social media platforms. However, promotional strategies have not kept pace—posters and trailers are still used on mobile devices without leveraging their unique capabilities. This paper presents a new approach to boosting viewing intent through interactive engagement. It introduces interactive posters and trailers that break the "Fourth Wall," allowing characters to communicate directly with users. A prototype enabling dialog-based interaction was tested with 33 participants in their 20s and 30s. Results showed that these interactive experiences significantly increased anticipation and intent to watch the film.
We propose PAAP (Performer-Aware Automatic Panning System), the first system to automatically track performer(s) and generate spatial audio panning data integrated with a Digital Audio Workstation (DAW). The system pipeline consists of three main stages: (1) visual cue analysis via performer tracking and monocular depth estimation, (2) spatial information prediction using a custom algorithm that produces DAW-compatible panning parameters, and (3) integration with an industry-standard DAW using embedded script processing. We tested and validated the system's technical feasibility and real-world applicability, including Open Sound Control (OSC)-based real-time processing. To our knowledge, this is the first complete study of automatic panning integrated with a DAW, and we anticipate that PAAP will streamline live and studio music production.
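The DAW-facing output stage might look like the following OSC sketch (the address namespace, port, and mapping from tracked position to azimuth/gain are hypothetical; real DAWs expose their own OSC parameter schemes):

```python
from pythonosc.udp_client import SimpleUDPClient   # pip install python-osc

client = SimpleUDPClient("127.0.0.1", 9000)  # address/port of the DAW's OSC listener

def send_pan(track_id: int, x_norm: float, depth_m: float):
    """Map a tracked performer position to panning parameters and send them via OSC.
    x_norm in [0, 1] is the horizontal image position; depth_m is estimated depth."""
    azimuth = (x_norm - 0.5) * 90.0              # degrees, -45 (left) .. +45 (right)
    gain = min(1.0 / max(depth_m, 1.0), 1.0)     # simple distance attenuation
    client.send_message(f"/track/{track_id}/azimuth", azimuth)   # hypothetical addresses
    client.send_message(f"/track/{track_id}/gain", gain)

send_pan(1, x_norm=0.72, depth_m=3.4)
```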
Decreased attention, distraction, and complex environments are major contributors to accidents in Level 2 autonomous driving. This study examines how spatial complexity and human factors affect accident risk using scenario-based simulations. We analyzed subjective factors (workload, situation awareness) and biometric data (eye tracking, HRV). Logistic regression identified age, workload, and situation awareness as significant predictors, with 74.2% accuracy (5-fold cross-validation). High spatial complexity increased cognitive load and visual scanning, elevating accident risk. These results support the need for integrated prediction strategies and adaptive driver support systems to enhance safety.
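For illustration, the 5-fold cross-validated logistic regression can be reproduced with scikit-learn as below; the synthetic features stand in for the study's measured predictors (age, workload, situation awareness) and are not the actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study's predictors and accident labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))                                   # age, workload, SA
y = (0.8 * X[:, 0] + 1.1 * X[:, 1] - 0.9 * X[:, 2]
     + rng.normal(0, 1, 120) > 0).astype(int)                   # accident / no accident

model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy") # 5-fold cross-validation
print(f"5-fold accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```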
Generating combined visual and auditory sensory experiences is critical for immersive content. We introduce SEE-2-SOUND, a training-free pipeline that turns an image, GIF, or video into 5.1 spatial audio. SEE-2-SOUND sequentially: (i) segments visual sound sources; (ii) estimates their 3-D positions from monocular depth; (iii) synthesises mono audio for every source; and (iv) renders the mix with room acoustics. Built entirely from off-the-shelf models, the method needs no fine-tuning and runs in zero-shot mode on real or generated media. We demonstrate compelling results for generating spatial audio from videos, images, dynamic images, and media generated by learned approaches. Project page: https://see2sound.github.io/.
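As an illustration of the final rendering stage, the sketch below pans a mono source into the five main 5.1 channels by constant-power pairwise panning over the nominal speaker azimuths; distance attenuation, the LFE channel, and room acoustics are omitted, and the layout angles are standard assumptions rather than the paper's exact renderer.

```python
import numpy as np

# Nominal azimuths (degrees) of the five main 5.1 speakers; LFE is handled separately.
SPEAKERS = {"L": -30.0, "C": 0.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}

def pan_gains(azimuth_deg: float) -> dict:
    """Constant-power pairwise panning of a mono source between the two
    speakers adjacent to its azimuth."""
    names = sorted(SPEAKERS, key=lambda n: SPEAKERS[n])
    angles = [SPEAKERS[n] for n in names]
    a = ((azimuth_deg + 180.0) % 360.0) - 180.0
    gains = {n: 0.0 for n in names}
    for i in range(len(names)):
        lo, hi = angles[i], angles[(i + 1) % len(names)]
        span = (hi - lo) % 360.0
        rel = (a - lo) % 360.0
        if rel <= span:
            t = rel / span                           # position between the two speakers
            gains[names[i]] = np.cos(t * np.pi / 2)
            gains[names[(i + 1) % len(names)]] = np.sin(t * np.pi / 2)
            break
    return gains

print(pan_gains(15.0))   # energy split between C and R
```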
Imagine if, during moments of heightened anxiety, you could once again feel the gentle, familiar touch of a loved one’s hand. Stroke Imprint is a knitted wearable that uses pressure sensing and SMA-based actuation to simulate stroking sensations that comfort young women experiencing anxiety. Paired with a digital interface, the glove allows users to record personalized tactile sensations. Through user interviews, design iterations, and user testing, the study demonstrates its potential as an anxiety-tracking, therapeutic wearable within a closed biofeedback loop.
"Play with Earth" introduces a novel project that addresses the preservation and innovation of intangible cultural heritage (ICH), with a focus on traditional mud toys from China’s Yellow River. Based on a comprehensive documentation of 15,686 photographs of mud toys and interviews with inheritors, our project achieved an interactive platform combining traditional craftsmanship with AI-assisted creativity.
Ray tracing is a widely used technique for modeling optical systems, involving sequential surface-by-surface computations which can be computationally intensive. We propose Ray2Ray, a novel method that leverages implicit neural representations to model optical systems with greater efficiency, eliminating the need for surface-by-surface computations via a single-pass, end-to-end model. Ray2Ray learns the mapping between rays emitted from a given source and their corresponding rays after passing through a given optical system in a physically accurate manner. We train Ray2Ray on nine off-the-shelf optical systems, achieving positional errors on the order of 1 μm and angular deviations on the order of 0.01 degrees in the estimated output rays. Our work highlights the potential of neural representations as a proxy optical raytracer.
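A minimal PyTorch sketch of the idea: a plain MLP maps a 6D input ray (origin, unit direction) to the corresponding 6D output ray and is fit to pairs produced by a reference ray tracer. The parameterization, network size, and training loop are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class Ray2RayMLP(nn.Module):
    """MLP surrogate for an optical system: maps an input ray, parameterized as
    (origin xyz, unit direction xyz), to the corresponding output ray."""
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        layers, d = [], 6
        for _ in range(depth):
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        layers += [nn.Linear(d, 6)]
        self.net = nn.Sequential(*layers)

    def forward(self, rays):                         # rays: (N, 6)
        out = self.net(rays)
        pos, dirn = out[:, :3], out[:, 3:]
        dirn = dirn / dirn.norm(dim=1, keepdim=True) # keep directions unit length
        return torch.cat([pos, dirn], dim=1)

# Training sketch: (input_rays, output_rays) pairs come from a reference ray tracer.
model = Ray2RayMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
input_rays = torch.randn(1024, 6)                    # placeholder data
output_rays = torch.randn(1024, 6)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(input_rays), output_rays)
    loss.backward()
    opt.step()
```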
We evaluate skin tone bias in a real-time rendering engine using 80 MetaHumans covering all 10 levels of the Monk Skin Tone (MST) scale. Two color pipelines are compared: MST-RS, which uses standard RGB reference swatches, and MST-CS, based on cheek-sampled RGB values from real photographs. We apply a patch-based metric, median RGB intensity, to the rendered faces. MST-RS exhibits a smooth, monotonic RGB decline from MST 1 to 10, while MST-CS reveals geometry-sensitive, non-linear variations and gamut compression in darker tones. These differences highlight potential rendering biases and support the need for tone-aware shader validation.
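The patch-based median-RGB metric amounts to a few lines of NumPy (patch size and sample coordinates below are hypothetical):

```python
import numpy as np

def patch_median_rgb(image: np.ndarray, center_xy, size: int = 32) -> np.ndarray:
    """Median R, G, B intensity of a square patch (e.g. sampled on the cheek)
    of a rendered frame; image is HxWx3 with values in [0, 255]."""
    x, y = center_xy
    half = size // 2
    patch = image[max(y - half, 0):y + half, max(x - half, 0):x + half]
    return np.median(patch.reshape(-1, 3), axis=0)

# Comparing the same cheek patch across MST-RS and MST-CS renders of one MetaHuman:
# delta = patch_median_rgb(render_rs, (420, 310)) - patch_median_rgb(render_cs, (420, 310))
```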
The growing popularity of 3D Gaussian Splatting has created the need to integrate traditional computer graphics techniques and assets in splatted environments. Since 3D Gaussian primitives encode lighting and geometry jointly as appearance, meshes are relit improperly when inserted directly in a mixture of 3D Gaussians and thus appear noticeably out of place. We introduce GBake, a specialized tool for baking reflection probes from Gaussian-splatted scenes that enables realistic reflection mapping of traditional 3D meshes in the Unity game engine.
Simulating and fabricating plasmonic nanostructures for specific colors is slow and costly. HyperParamBRDF exploits a hypernetwork to learn a parametric reflectance model from physical parameters. Trained on sparse FDTD data, it infers BRDFs in milliseconds, achieving a speedup of over 10⁷× with high fidelity and enabling real-time appearance exploration for complex simulated materials.
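A schematic of the hypernetwork idea in PyTorch: a small network maps physical nanostructure parameters to the weights of an even smaller BRDF network evaluated on direction pairs. Layer sizes and the reflectance parameterization are assumptions, not HyperParamBRDF's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperBRDF(nn.Module):
    """Hypernetwork sketch: physical parameters generate the weights of a small
    BRDF network that maps (incoming, outgoing) directions to RGB reflectance."""
    def __init__(self, n_params=4, hidden=32):
        super().__init__()
        self.hidden = hidden
        # Sizes of the target BRDF MLP: 6 -> hidden -> 3.
        self.n_w1, self.n_b1 = 6 * hidden, hidden
        self.n_w2, self.n_b2 = hidden * 3, 3
        total = self.n_w1 + self.n_b1 + self.n_w2 + self.n_b2
        self.hyper = nn.Sequential(nn.Linear(n_params, 128), nn.ReLU(),
                                   nn.Linear(128, total))

    def forward(self, phys_params, wi, wo):
        w = self.hyper(phys_params)                        # flat weight vector
        i = 0
        w1 = w[i:i + self.n_w1].view(self.hidden, 6); i += self.n_w1
        b1 = w[i:i + self.n_b1];                      i += self.n_b1
        w2 = w[i:i + self.n_w2].view(3, self.hidden); i += self.n_w2
        b2 = w[i:i + self.n_b2]
        x = torch.cat([wi, wo], dim=-1)                    # (N, 6)
        h = F.relu(F.linear(x, w1, b1))
        return F.softplus(F.linear(h, w2, b2))             # nonnegative reflectance

model = HyperBRDF()
rgb = model(torch.rand(4), torch.randn(16, 3), torch.randn(16, 3))
print(rgb.shape)   # (16, 3)
```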
We present a modular, scalable workflow for high-fidelity volume rendering of large-scale CFD simulations. Designed with visual effects (VFX) techniques in mind, our workflow transforms unstructured CFD data into cinematic-quality visuals using parallel voxelization and sparse volume export. By leveraging CyclesPhi renderer and OpenVDB, we deliver performance, scalability, and expressive visualization on HPC infrastructure. Results on two large CFD cases demonstrate significant speedups over traditional tools with support for interactive rendering of volumes.
What if data generation, manipulation, and training could all happen entirely on the GPU, without ever touching the RAM or the CPU? In this work, we present a novel pipeline based on Unreal Engine 5, which allows us to generate, render, and process graphics data entirely on the GPU. By keeping the data stored in GPU memory throughout all the steps, we bypass the traditional bottlenecks related to CPU-GPU transfers, significantly accelerating data manipulation and enabling fast training of deep learning algorithms. Traditional storage systems impose latency and capacity limitations, which become increasingly problematic as data volume increases. Our method demonstrates substantial performance improvements on multiple benchmarks, offering a new paradigm for integrating game engines with data-driven applications. More information on our project page: https://mmlab-cv.github.io/Infinity/
The polarization state of light is described in a local coordinate frame in which the oscillation of the electric and magnetic fields occurs. In physics, this frame is rotated according to the surface normal of the object. In this study, we investigate the effect of this frame rotation while evaluating multi-bounce Smith microfacet BSDFs. We show that the evaluation can be accelerated when the frame rotation does not significantly matter, and we experimentally demonstrate that this acceleration is practically feasible.
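The frame rotation in question can be illustrated with Stokes/Mueller algebra (sign conventions vary between references): rotating the reference frame by θ multiplies the Stokes vector by a rotator with 2θ terms, and the fast path simply skips these rotations when the angles are negligible.

```python
import numpy as np

def stokes_rotator(theta: float) -> np.ndarray:
    """Mueller matrix that rotates the reference frame of a Stokes vector by theta."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1.0]])

def eval_mueller(M, stokes_in, theta_in, theta_out, eps=1e-3):
    """Apply a Mueller matrix defined in its own local frame to a Stokes vector
    given in another frame; skip the frame rotations when the angles are negligible."""
    if abs(theta_in) < eps and abs(theta_out) < eps:
        return M @ stokes_in                        # fast path: no rotation needed
    return stokes_rotator(theta_out) @ M @ stokes_rotator(theta_in) @ stokes_in

M = np.diag([1.0, 0.9, 0.9, 0.8])                   # some polarizing interaction
s = np.array([1.0, 0.3, 0.1, 0.0])
print(eval_mueller(M, s, theta_in=0.0002, theta_out=0.0001))
```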