Prakash: Lighting-Aware Motion Capture for Dynamic Virtual Sets

This project demonstrates new methods of flexible scene capture (including motion, orientation, and incident illumination) that create a dynamic "virtual recording set." The system uses tracking tags that are imperceptible under attire, and inserted computer graphics elements can match the lighting on the presenter, making the technique ideal for real-time broadcast.

Enhanced Life

Techniques for accessible motion capture are still underdeveloped. Several mature techniques are available, but they require work areas dedicated to esoteric systems, hours of data post-processing, and/or high-speed cameras.

This system forgoes these expensive components, so it costs much less. Motion capture no longer requires specially designated spaces, special lighting, or huge investments.

Fields that would immediately benefit from accessible motion capture include rehabilitation clinics, biomedical research, and independent research in many other fields: physics, anthropology, and sociology, for example. Even veterinary clinics could use accessible motion-tracking systems to examine animal gaits and behaviors for diagnosis. Artists everywhere will benefit as well, because current motion-capture systems are too complex and expensive.

Goals

To support accessible motion capture that is easy yet powerful for authoring and enhancing visual effects in video production. The system is based on tracking technology that does not impose complicated setups, while providing interactive feedback and maintaining measurements accurate to the millimeter.

Innovations

This system delivers performance equivalent to the best existing optical motion-capture methods, plus:
  • It can record orientation and incident illumination at the marker tags. For the motion capture portion, it tracks the position of marker tags at a rate of 500 Hz, with 8-bit location precision, and with self-identifying tags. For the orientation, it strategically configures a set of modulated light transmitters and uses light modulation and demodulation techniques to estimate individual attributes at the locations of the receiving photosensors. Although these measurements are made at a sparse set of points in a scene, their richness allows extrapolation within a small range. These measured scene attributes can be used to factorize a captured video sequence and manipulate the video based on the resulting attributes. In addition, this factorization is accomplished at a very high speed (much faster than a typical camera could achieve), allowing manipulation of individual video frames at an intra-image level. All this is accomplished with strikingly simple hardware components.

  • Since each tag records its own location, there are no reacquisition issues in the case of occlusion. So the system can support an unlimited number of tags while maintaining the same fast capture rate.

  • In a virtual-set application, the flexibility of the tags becomes apparent. The system captures motion and lighting conditions in their actual setting, and the tags worn by an actor are easily hidden by theatrical wardrobe, so they are invisible in the video recording and do not interfere with the performance.

  • A key advantage of this approach is that it is based on components developed by the rapidly advancing fields of optical communication and solid-state lighting, which allows the system to capture photometric quantities without added software or hardware overhead. Marker-based techniques that use other physical media cannot capture photometric properties.

There is one disadvantage to this approach. Tags must be in the line of sight of the transmitters (at least those that label the space they occupy), so the system does not completely overcome the usual challenges: limited dynamic range when the ambient lighting is very strong, loss of communication due to occlusions (shadows), and multi-path distortions due to secondary scattering of light. This means that the system is not appropriate for every scenario, but it still allows much more freedom in tracking on location and in dynamic settings than current systems. These issues can be revisited for further refinement in the next generations of this technique.
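
As an illustration of how a tag could record its own location from modulated light, here is a minimal sketch. It is not the project's implementation. It assumes that a transmitter projects a sequence of 8 binary light patterns forming a Gray code across one axis, and that a tag's photosensor samples one brightness value per pattern. The Gray code, the threshold, and all function names are assumptions made for this example. Only the 8-bit precision and the 500 Hz rate come from the description above.

```python
BITS = 8  # 8-bit location precision, as stated above


def gray_encode(index):
    """Convert a position index to its Gray code (assumed pattern coding)."""
    return index ^ (index >> 1)


def gray_decode(code):
    """Recover the position index from a Gray code word."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


def pattern_bits(position, bits=BITS):
    """Light/dark sequence (MSB first) that a sensor at `position` would see."""
    code = gray_encode(position)
    return [(code >> (bits - 1 - k)) & 1 for k in range(bits)]


def decode_position(samples, threshold=0.5):
    """Decode one tag's location from its own brightness samples.

    Each tag needs only its own samples, so decoding does not depend on
    how many other tags are in the scene.
    """
    code = 0
    for sample in samples:
        code = (code << 1) | (1 if sample > threshold else 0)
    return gray_decode(code)


def pattern_slots_per_second(location_rate_hz, bits=BITS):
    """Pattern time slots needed per second for one axis (hypothetical helper)."""
    return location_rate_hz * bits
```

For example, a tag at position 137 would see the sequence returned by `pattern_bits(137)`, and `decode_position` applied to those samples would return 137 again. Under these assumptions, one axis at 500 Hz would need 500 × 8 = 4000 pattern slots per second. Because neighboring positions differ in exactly one Gray-code bit, a single misread bit at a pattern boundary shifts the decoded location by at most one step.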

Vision

The one predictable impact of opening exclusive techniques to a wider audience has always been the unpredictability of innovation. With the proliferation of digital video on the web, video authoring and animation are becoming an essential part of the online experience. In this YouTube-empowered world, virtual sets at home or school may become as routine as the HTML editors of the recent past.

Contact

Ramesh Raskar
Mitsubishi Electric Research Laboratories (MERL)
Raskar (at) merl.com

Contributors

Hideaki Nii
Bert de Decker
Yuki Hashimoto
Dylan Moore
Jay Summet
Yong Zhao
Jonathan Westhues
Paul Dietz
John Barnwell
Mitsubishi Electric Research Laboratories (MERL)

Masahiko Inami
University of Electro-Communications

Philippe Bekaert
Universiteit Hasselt