Augmented perception via cartoon rendering - Reflections on a real-time video-to-cartoon system
The Emerging Technologies section of this year's conference includes two independent projects that convert live video into cartoon-like imagery in real time. In one system (Real-time video abstraction, Northwestern University), you can see your own face looking back at you like a cartoon character, complete with dark silhouette outlines and flat-color "fill" over the various surface areas. When you turn your head, the cartoon character turns its head. Raise your right eyebrow and smile, and the character does the same. It really looks like you, but in a cartoonish way. It is a most peculiar sensation to come face-to-face with oneself as a cartoon avatar!
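The article does not describe the algorithms these systems use, but the two visual ingredients it mentions, flat-color fill and dark outlines, can be illustrated with a toy filter. The sketch below is a minimal illustration under stated assumptions, not the Northwestern or Tübingen method: it assumes a single grayscale frame stored as nested lists of 0-255 values, approximates "flat fill" by quantizing intensities into a few bands, and approximates "outlines" by painting black wherever a simple gradient exceeds a threshold. The function `cartoonize` and its parameters `levels` and `edge_threshold` are hypothetical names chosen for this example; a practical real-time system would likely use more robust smoothing and edge filters.

```python
def cartoonize(img, levels=4, edge_threshold=40):
    """Toy cartoon filter (hypothetical, for illustration only).

    img: grayscale frame as a list of rows of ints in 0..255.
    levels: number of flat intensity bands (fewer = more "cartooned").
    edge_threshold: gradient strength above which a pixel becomes outline.
    """
    h, w = len(img), len(img[0])
    step = 256 / levels  # width of each intensity band
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Flat fill: snap the pixel to the centre of its intensity band.
            band = int(img[y][x] // step)
            value = int(band * step + step / 2)
            # Central-difference gradient, clamping indices at the borders.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            # Outline: strong local contrast is painted as a dark line.
            if abs(gx) + abs(gy) > edge_threshold:
                value = 0
            row.append(value)
        out.append(row)
    return out


if __name__ == "__main__":
    # A frame with a sharp vertical boundary between black and white.
    frame = [[0, 0, 255, 255] for _ in range(4)]
    print(cartoonize(frame))  # each row becomes [32, 0, 0, 224]
```

Lowering `levels` flattens the image into fewer bands, which loosely mirrors the idea of a user-adjustable degree of cartooning; raising `edge_threshold` keeps only the strongest contours as outlines.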
One promising application area that project lead Bruce Gooch is considering is live video chat, in which users can control how much "cartooning" is applied to their face as it appears on other people's screens. With a "very cartooned" look, it is as if you are wearing a mask that protects your identity and real image, but still lets basic facial gestures be recognized. Think of this as a video equivalent of voice scrambling. This is great for kids on the 'net, who will naturally be attracted to such an entertaining mode of communication and would also gain some protection from potential "phishers" and others with bad intentions. For trusted communications, you could "tone down" the cartooning to a bare minimum: your "real face" is then recognizable, but the mild cartooning smooths away details such as scars and wrinkles. There definitely seems to be something for all ages in this kind of technology!
Another research project (Augmented painting, Tuebingen University) implements something similar and demonstrates its use as a technique for making seamless non-photorealistic augmented reality displays.
Even before the various engineering and commercialization hurdles are overcome, simply watching the world around you through such systems is a very engaging experience. The reason it is so engrossing has to do with how our brains work, in particular how we process visual information.
A cartoon rendition of a scene is a simplified representation from which much extraneous detail is absent. This is what makes comics such an effective method of visual communication: only the essential elements are present. "Reading" the visual information is fast and easy, as it involves only the main contour lines, which are the most important for visual recognition. This choice of "bare bones" elements seems to be fairly fundamental to how we perceive, since cartoons are easily interpreted across a wide range of cultures and age groups.
To build a coherent internal picture of the physical world around us, we continually simplify the visual information coming into our eyes, picking out the essential cues needed to reconstruct which objects are around us and where we are within the scene. We have a very sophisticated visual information processing pipeline inside our heads that does exactly this. At some point along this pipeline, our images are probably transformed into something not far from a cartoon-style representation. Indeed, neurological studies confirm the existence of contour detection "hardware" in the very early stages of our visual processing. We can think of this pipeline as starting from 2D images on our retinas and ending with a reconstructed 3D scene: a set of recognized objects laid out in a 3D "mental map". (This visual processing pipeline is in fact the inverse of the rendering pipeline of modern graphics cards, which starts with a 3D scene and "works backward", producing a 2D image from a particular perspective. For more exploration along these lines, see the introductory talk to Montreal ACM SIGGRAPH's 2002 event on 3D graphics on the web.)
The "cartoon video" system, then, actually reproduces some of the same processing that our own retinal images go through on their way to the higher levels of cognition in our brains. An equivalent program runs, if you will, on the grey-matter CPU of our visual cortex. When we receive images to which this simplification stage has already been applied, we sense this redundancy at some level. Unconsciously, we seem to realize that we don't need to work as hard to interpret the images. Our internal program goes into "sleep mode", and the visual-cortex CPU, as it were, is freed up to do other things. This may be why the images are so mesmerizing: they are "pre-digested", so we read them with less effort than normal "untreated" images.
Gooch's research paper does not actually present things in this manner, of course, but it does cite evidence of cognitive advantages (such as improved memory retention of images) and refers to the psychophysics literature. One can anticipate that this "visual predigestion" could free up the visual cortex to attend to other matters more effectively than usual, such as detecting subtle changes in a complicated visual scene, or reacting more quickly to certain changes in visual patterns. Some of the work usually done in our visual cortex is now being done in software, before the image even reaches our retina; by offloading this work from our brains, we effectively augment their overall capacity to perform. (The same benefit could apply to artificial vision systems that use this rendering as a preprocessing step, which is yet another application area.) If this can become a real-time system with negligible latency and be manufactured in a mobile or embedded form factor (there is no fundamental reason why this cannot be done), it could become a bona fide artificial extension of our visual processing abilities, providing a subtle form of technology-enhanced visual capability for both humans and machines.
Although much of this sounds very futuristic, this way of thinking about perception and the human mind is exactly what visual artists have been exploring for centuries, and it lies at the root of many of the great artistic movements. Various schools of thought within the fine arts are based on enquiries into the nature of perception. The works they produce can be considered artists' attempts to communicate some of their insights: that is, to put down on canvas the results of some of the intermediate stages of visual processing that they notice taking place within their own minds. Inevitably, this also reminds us that seeing is an ambiguous and highly non-trivial personal and cultural phenomenon. (This theme was originally taken up in the introductory talk to Montreal ACM SIGGRAPH's 2001 event on artificial vision.) Many artistic traditions anticipated concepts from artificial intelligence and reflected upon the manner in which we process and store visual information. If only my fine arts teacher had explained things to me in this way! Perhaps museums should consider adopting the language of computing and artificial intelligence to interpret artworks for today's young public.
From a broad historical perspective, then, we can see this technology not as something born purely of the digital age, but as an example of computing accelerating the pace of an introspective tradition that goes back a long way. Its roots draw on a rich and inspiring visual tradition, and its branches will likely extend far into a bright future.