Uthana presents a real-time character animation system powered by generative AI. Our pipeline enables fully automated motion synthesis across arbitrary 3D characters using four novel components: one-click auto-retargeting, natural language-to-motion generation, interactive diffusion-based motion control, and AI-powered motion stitching. Retargeting is achieved through a learned joint mapping algorithm that adapts source skeletons to target rigs in sub-second time. Motion generation is guided by plain English prompts, producing expressive full-body animations in 6 seconds. Real-time motion control is enabled via a diffusion model optimized for 30 ms responsiveness in the browser using WebGPU. Finally, a motion stitching module computes transitions between clips using latent-space interpolation. These tools require no manual retargeting or keyframing, giving users an accessible approach to animating characters.
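To make the stitching step concrete, the sketch below blends two clips by interpolating their latent codes before decoding; the encode/decode interface and latent layout are hypothetical illustrations, not Uthana's actual API.

```python
import numpy as np

def stitch_transition(latent_a, latent_b, decode, n_frames=30):
    """Synthesize a transition by interpolating between two clips' latent codes.

    latent_a, latent_b: 1-D latent vectors for the outgoing/incoming clips
    decode: hypothetical function mapping a latent code to a single pose
    """
    poses = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        t = t * t * (3.0 - 2.0 * t)            # smoothstep easing for a gentle blend
        z = (1.0 - t) * latent_a + t * latent_b
        poses.append(decode(z))
    return np.stack(poses)
```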
Interfacing with artificial intelligence has evolved from the domain of computer science into an everyday consumer activity. From the advent of virtual assistants to the mass adoption of LLMs for both professional and recreational use, we have welcomed a new population of automated, synthetic personalities into society that is actively transforming how we live.
Advancements in artificial intelligence personalities are of particular utility to the digital creator class, whose livelihood depends largely on online interactions. Creators’ audiences have already crossed the threshold of virtual engagement; these users possess a level of comfort with digital personas that lays the groundwork for the effective AI augmentation of audience interactivity.
Beyond consumers’ growing familiarity with digital personalities, there is also a perceived need for additional automation of streaming content production. The creator economy is becoming increasingly lucrative and competitive, attracting swathes of new talent that may not have the production resources or faculties to create content on par with audiences’ expectations. Simultaneously, more experienced creators are seeking more efficient production practices. Here, we see an opportunity for artificial intelligence to serve a technical utility alongside its social and entertainment uses, acting as a virtual crew member for streamers.
Streamlabs envisions the following trajectory for digital creation: a world where AI automation greases the wheels of creativity, assisting creators behind the scenes while also becoming an integral part of the show.
This vision informs our work with NVIDIA and Inworld AI to create the Streamlabs Intelligent Streaming Agent: an AI-powered co-host, producer, and technical assistant for digital creators. After debuting the Agent at CES 2025, we have now prepared an updated technical demonstration to showcase the Agent's depth of functionality.
In this demonstration, we begin by customizing the Agent's avatar, showcasing the process of selecting its appearance, style, and personality traits. We examine the relationship between the user interface (UI) design and fine-tuning the Agent's personality. We then configure automations through the Streamlabs app, programming the following game-reactive and production assistant functions: comment reactions to the streamer getting ‘eliminated’ in a game; showing an image; unhiding a source; and switching scenes. Throughout, we emphasize flexibility and ease of use, given our focus on tool-building for streamers of all experience levels.
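For illustration, a rule set of this kind could be expressed roughly as follows; the event names and action helpers are hypothetical and are not the Streamlabs automation API.

```python
# Hypothetical sketch of game-reactive automation rules of the kind configured
# in the demo; event names and action helpers are illustrative only.

AUTOMATIONS = {
    "player_eliminated": [
        ("agent_comment", {"tone": "sympathetic"}),
        ("show_image", {"source": "elimination_overlay.png", "seconds": 5}),
        ("unhide_source", {"name": "ReplayCam"}),
        ("switch_scene", {"name": "Post-Round"}),
    ],
}

def handle_event(event_name, rules=AUTOMATIONS, dispatch=print):
    """Run every configured action for a detected game event."""
    for action, params in rules.get(event_name, []):
        dispatch(action, params)

handle_event("player_eliminated")
```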
Descendant advances digital conservation of cultural heritage through a Guangdong embroidery case study, establishing a workflow that digitizes endangered embroidery artifacts and related rituals via cross-platform integration of real-time tools: Blender, OpenXR-based VR, and WeChat. This integration bridges preservation and cultural storytelling, democratizing heritage revival for designers, enabling community memory-making, and establishing scalable digital-legacy frameworks. By reconfiguring ritual as computational material, Descendant demonstrates real-time systems’ potential to sustain cultural continuity.
DJESTHESIA uses tangible interaction to craft real-time audiovisual multimedia, blending sound, visuals, and gestures into a unified live performance. The project supports four interaction modes: I) Knob changes music, where standard DJing is performed. II) Music changes visuals, where changes to audio parameters made through the mixer have a direct impact on the visualizations representing the music (e.g., color palette). III) Gesture changes visuals, where gestures and body movements let the performer interact physically with the visual representation of the music (e.g., grab, release, throw). IV) Gesture changes music, where gestures convey information to audio composition software to alter aspects of the music being played (e.g., EQs). The aim of DJESTHESIA is to transform the DJ into both a performer and a performance.
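As a rough sketch of how such mappings might be wired, the snippet below translates audio-analysis bands into visual parameters (mode II) and recognized gestures into audio-engine messages (mode IV); all parameter names and value ranges are illustrative assumptions, not DJESTHESIA's implementation.

```python
# Illustrative mappings for two of the four interaction modes; the names and
# ranges below are assumptions, not the project's actual parameter set.

def audio_to_visuals(low_band, mid_band, high_band):
    """Mode II: derive visual parameters from mixer/audio analysis bands in [0, 1]."""
    return {
        "palette_warmth": min(1.0, low_band * 1.5),  # bass shifts the color palette
        "particle_speed": mid_band,
        "bloom": high_band,
    }

def gesture_to_music(gesture):
    """Mode IV: translate a recognized gesture into an audio-engine message."""
    table = {
        "raise_hand": ("eq_high_gain", +3.0),
        "lower_hand": ("eq_high_gain", -3.0),
        "push":       ("filter_cutoff", 0.25),
    }
    return table.get(gesture)
```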
This paper presents a real-time system combining AI image generation with laser installations for live audiovisual performance. Using StreamDiffusion integrated with TouchDesigner, we generate imagery and convert it to laser-traceable paths while working within graphics hardware and laser display constraints. The system includes feedback loops between AI generation, performer control, and physical laser output, creating an interactive visual instrument for live musical performance.
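The image-to-laser conversion can be sketched as contour extraction plus path simplification under a point budget, as below; this is an illustrative OpenCV-based approximation, not the actual TouchDesigner network or laser driver used in the system.

```python
import cv2
import numpy as np

def image_to_laser_paths(frame_rgb, max_points=1200, epsilon_px=2.0):
    """Convert a generated frame into simplified contour paths for a laser.

    An illustrative sketch: the real system works inside TouchDesigner and
    must respect the specific point budget and safety limits of its hardware.
    """
    gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 80, 160)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    h, w = gray.shape
    paths, budget = [], max_points
    for c in sorted(contours, key=cv2.contourArea, reverse=True):
        simplified = cv2.approxPolyDP(c, epsilon_px, False)
        pts = simplified.reshape(-1, 2).astype(np.float32)
        if len(pts) > budget:
            continue                              # skip contours that exceed the budget
        # Normalize to [-1, 1] laser coordinates (assumes a square output field).
        pts[:, 0] = pts[:, 0] / w * 2.0 - 1.0
        pts[:, 1] = 1.0 - pts[:, 1] / h * 2.0
        paths.append(pts)
        budget -= len(pts)
    return paths
```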
We present InfiniteStudio, the first 4D volumetric capture system that meets the visual fidelity requirements for professional-grade video production. Building upon innovations in 4D Gaussian Splatting, InfiniteStudio reduces production time while unlocking unprecedented creative freedom during post-production. The system harnesses 4D Gaussian Splatting to maintain production-level visual quality, support long-duration content, and achieve compressed data rates of 80–120 Mbps. InfiniteStudio revolutionizes film production by eliminating 20–30% of shooting time traditionally spent capturing multiple angles and reducing retakes by 5–10%. It empowers filmmakers with complete creative freedom for virtual camera movements, slow-motion effects, and advanced background replacement, paving the way for next-generation interactive media and immersive spatial storytelling.
We present Miegakure, a four-dimensional puzzle-platformer game and comprehensive exploration of a 4D interactive world. In the game, players navigate through four-dimensional space to solve spatial puzzles impossible in lower dimensions. We explain our chosen representation method for displaying 4D space on traditional 2D screens using dimensional analogy. We discuss the game’s technical implementation of (1) 4D meshes built from tetrahedra rather than triangles, which are sliced to produce 3D triangles for display; (2) procedural modeling of 4D objects and embedding of 3D objects in 4D by extruding them and modeling their insides; and (3) procedural texturing of meshes: 4D objects have 3D surfaces and hence 3D textures (or 4D textures if solid texturing is used).
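As a minimal illustration of the slicing step in (1), the sketch below intersects a single 4D tetrahedron with the hyperplane w = w0 and returns the 3D intersection polygon; the game's production code additionally handles vertex ordering, triangulation, and degenerate cases.

```python
import numpy as np

def slice_tetrahedron(verts4, w0=0.0):
    """Intersect one 4-D tetrahedron cell with the hyperplane w = w0.

    verts4: (4, 4) array of xyzw vertices. Returns the 3-D intersection
    polygon (0, 3 or 4 points); a 4-point result still needs ordering before
    it can be triangulated for display. Illustrative sketch only.
    """
    points = []
    for i in range(4):
        for j in range(i + 1, 4):
            a, b = verts4[i], verts4[j]
            da, db = a[3] - w0, b[3] - w0
            if da * db < 0:                       # this edge crosses the hyperplane
                t = da / (da - db)
                p = a + t * (b - a)
                points.append(p[:3])              # drop w: the point lies in the slice
    return np.array(points)
```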
We explore interactive painting on 3D Gaussian splat scenes and other surfaces using 3D Gaussian splat brushes, each containing a chunk of the realistic texture and geometry that makes captured representations so appealing. The suite of brush capabilities we propose enables 3D artists to capture and then remix real-world imagery and geometry with direct interactive control. We also present an ensemble of artistic brush parameters, resulting in a wide range of appearance options for the same brush. Our contribution is a judicious combination of algorithms, design features and creative affordances that together enable the first prototype implementation of interactive brush-based painting with 3D Gaussian splats. Code and data can be accessed from splatpainting.github.io.
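To illustrate the basic stamping idea, the sketch below copies a captured chunk of splats along a stroke with per-stamp jitter; the splat attributes shown (centers and colors only) and the jitter parameter are simplifying assumptions, not the prototype's full brush model, which also carries rotations, scales, and opacity.

```python
import numpy as np

def stamp_brush(brush_means, brush_colors, stroke_points, jitter=0.02, rng=None):
    """Stamp a captured chunk of Gaussian splats along a painted stroke.

    brush_means: (N, 3) splat centers of the captured brush chunk
    brush_colors: (N, 3) per-splat colors in [0, 1]
    stroke_points: (M, 3) sampled positions along the user's stroke
    """
    rng = rng or np.random.default_rng(0)
    out_means, out_colors = [], []
    for p in stroke_points:
        offset = rng.normal(scale=jitter, size=brush_means.shape)   # per-stamp variation
        out_means.append(brush_means + p + offset)
        out_colors.append(np.clip(brush_colors + rng.normal(scale=jitter, size=3), 0, 1))
    return np.concatenate(out_means), np.concatenate(out_colors)
```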
Live physical whole-puppet performances can be used to drive digital animation characters and creatures via puppix, a new capture system. The benefits of having a live puppet character interacting in the room with audience, actors, directors and other characters are demonstrated as well as some practical processes of capturing non-human physicalities.
Physical puppets allow directors and actors to work with non-human characters with the same flexibility, freedom and immediacy as human actors. Capturing these performances means non-human digital characters can work alongside and be directed like physical actors.
Unlike keyframe animation, capturing an in-the-moment live performance allows real-world weight, physicality and movement to transfer to digital twins. This also sidesteps the limitations of human-based motion capture systems and of most learning-model training data sets, whose movements originate from human physicality.
Motion capture as a whole has its origins in the technology of puppetry and animatronics. Performance armatures and rigs like the Dinosaur Input Device, Sil and Henson's Waldo operate as control systems for digital performances, with the director's focus on digital output screens.
puppix, a whole-puppet capture system, keeps the performance focus on the character in the room, not on the screen.
At present, reference puppets are being used on-set as placeholders for CG characters, providing lighting, position, interaction reference, whilst actors interact with the reference puppeteers’ performances.
puppix allows the full reference puppet performance to be motion captured. Secondary movements and whole-body physicality match the digital character and transfer to it for free. Director and performers stay focused in the room whilst creating digital animated performances. This is a tool like human-based motion capture, but for non-human characters and creatures.
Our Real Time Live! segment demonstrates the practicalities of operating a capture puppet on-set during a shoot, allowing interaction with presenter / director and audience, using a puppix capture puppet live-performed by puppeteers during the presentation.
Desmos 2D and 3D calculators are browser-based tools that enable real-time experimentation with graphics defined by math. All computation is purely client-side, which lowers the barrier for both creating and sharing. We employ a series of features that enable complicated computation to run natively and performantly in the browser, alongside a real-time rendering pipeline built specifically for mathematics. We combine these technological building blocks into a system that lets users build in real time with graphics defined by math itself, lowering the barrier to entry relative to more abstract languages and systems. This browser-first approach has enabled artists and engineers, from students to professionals, to make extraordinary creations using just math and a browser.
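As one classical example of math-driven rendering, the sketch below locates grid cells crossed by an implicit curve via sign changes; it illustrates the kind of computation such a pipeline performs and is not Desmos' actual plotting algorithm.

```python
import numpy as np

def implicit_cells(f, xlim=(-5, 5), ylim=(-5, 5), n=256):
    """Find grid cells crossed by the curve f(x, y) = 0 via sign changes.

    A classical sampling approach to plotting an implicit equation, shown
    only as an illustration of math-defined graphics.
    """
    xs = np.linspace(*xlim, n)
    ys = np.linspace(*ylim, n)
    X, Y = np.meshgrid(xs, ys)
    S = np.sign(f(X, Y))
    # A cell is crossed if signs differ across either diagonal.
    crossings = (S[:-1, :-1] * S[1:, 1:] <= 0) | (S[:-1, 1:] * S[1:, :-1] <= 0)
    return np.argwhere(crossings)                 # indices of cells to refine/draw

cells = implicit_cells(lambda x, y: x**2 + y**2 - 9)   # circle of radius 3
```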
NVIDIA’s RTX Mega Geometry is a technology that accelerates Bounding Volume Hierarchy (BVH) building, enabling path tracing of scenes with up to 100x more triangles. With our cluster-based renderer, we introduce several improvements to existing methods and APIs that enable:
• streaming of Continuous-LOD geometry [Karis 2021]
• dynamic tessellation using a new "slanted" pattern
• animated subdivision surfaces [Brainerd et al. 2016]
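The sketch below illustrates the continuous-LOD idea from the first bullet above: clusters are selected by projecting their geometric error to screen space and recursing to finer levels only where the error would still be visible. The data layout and field names are hypothetical assumptions, not NVIDIA's API; see [Karis 2021] for the underlying approach.

```python
import math

def select_lod_clusters(roots, camera_pos, fov_y, screen_height_px, max_error_px=1.0):
    """Pick which clusters of a continuous-LOD hierarchy to stream and draw.

    roots: list of dicts with 'center' (xyz), 'geometric_error' (world units),
    and 'children' (finer clusters). Hedged sketch only.
    """
    selected = []
    proj_scale = screen_height_px / (2.0 * math.tan(fov_y * 0.5))  # pixels per unit at distance 1

    def visit(cluster):
        dx, dy, dz = (cluster["center"][i] - camera_pos[i] for i in range(3))
        dist = max(1e-6, math.sqrt(dx * dx + dy * dy + dz * dz))
        error_px = cluster["geometric_error"] / dist * proj_scale
        if error_px <= max_error_px or not cluster["children"]:
            selected.append(cluster)              # coarse enough: draw this cluster
        else:
            for child in cluster["children"]:     # too coarse: recurse to finer LOD
                visit(child)

    for root in roots:
        visit(root)
    return selected
```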
This Real-Time Live! demonstration presents Machine-Guided Spatial Sensing, a novel framework for real-time, operator-in-the-loop environmental measurement using augmented reality (AR). The system integrates active learning with spatialized visual guidance to enable non-expert users to perform high-accuracy sampling of complex environmental fields—such as airflows. During the demonstration, an operator equipped with a head-mounted display and a handheld sensor is guided through the measurement process via dynamic AR overlays. A live data model continuously assimilates measurements, estimates the underlying physical field, and quantifies uncertainty. Based on this evolving model, the system generates spatial cues that direct the user to the most informative sampling locations, effectively transferring domain expertise to the algorithm and streamlining the measurement process. This demonstration highlights the system’s capacity to render previously invisible environmental phenomena directly into the user’s field of view, enabling intuitive exploration and high-fidelity reconstruction. The approach generalizes across sensor modalities, making it adaptable for applications in scientific research, environmental monitoring, and engineering diagnostics. By showcasing interactive spatial sensing in a live performance setting, this work illustrates the transformative potential of human-machine teaming in real-world data acquisition.
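A minimal sketch of the operator-in-the-loop acquisition step follows: a Gaussian-process surrogate is fit to the measurements collected so far, and the next sampling location is the candidate with maximum predictive uncertainty. The kernel choice and acquisition criterion are assumptions for illustration, not necessarily those used in the demonstrated system.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_sample_location(sampled_xyz, measured_values, candidate_xyz):
    """Pick the most informative next measurement point by maximum uncertainty.

    Fits a Gaussian-process surrogate to the measurements so far, predicts the
    field and its standard deviation at candidate locations, and returns the
    candidate with the largest predictive uncertainty.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
    gp.fit(sampled_xyz, measured_values)
    mean, std = gp.predict(candidate_xyz, return_std=True)
    return candidate_xyz[np.argmax(std)], mean, std

# Example: three airflow readings and a random cloud of candidate positions.
rng = np.random.default_rng(0)
sampled = rng.uniform(0, 1, size=(3, 3))
values = rng.uniform(0, 2, size=3)
candidates = rng.uniform(0, 1, size=(200, 3))
next_point, field_mean, field_std = next_sample_location(sampled, values, candidates)
```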