Applications of Computer Vision to Computer Graphics
Vol.33 No.4 November 1999
|
|
|
Figure 3: Some real-time applications can be made faster using specialized hardware. A CMOS detector made by Mitsubishi Electric, called the artificial retina chip, can both detect the image and perform some image processing operations, as shown in (a). (b) shows the chip. The Nintendo GameBoy Camera (c) uses the artificial retina chip and allows the player's face to be inserted into a simple game. The retina chip is used for both detection and image enhancement.

Fast and Low-Cost Systems

The systems above typically require powerful workstations for real-time performance. A focus of our work at Mitsubishi Electric (in Cambridge, MA, U.S.A. and in Osaka, Japan) has been low-cost, real-time systems. We have built prototypes of vision-controlled computer games and televisions with gesture-based remote control [5].

The existing interfaces for these systems impose daunting speed and cost constraints on any computer vision algorithm designed to replace them. A game pad or a television remote control costs a few tens of dollars and responds in milliseconds. A vision-based interface covering the same functionality needs a camera, a digitizer and a computer, and it must acquire and analyze the image in little more time than it takes to press a button on a keypad. It may seem impossible to design a vision-based system that can compete in cost or speed.

We have made prototypes that address the speed and cost constraints by exploiting the restrictions that interactive applications impose on the visual interpretation. For example, at some moment in a computer game, it may be expected that the player is running in place. The task of the vision algorithm may then be simply to determine how fast the player is running, assuming that they are running, which is a relatively easy vision problem. Such application constraints allow one to use simple, fast algorithms and inexpensive hardware.
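As a sketch of how such an application constraint keeps the computation trivial, the "how fast is the player running" question can be reduced to measuring how much the image changes from frame to frame. This is not the actual Decathlete code; the function names, threshold and gain below are illustrative assumptions.

```python
import numpy as np

def motion_energy(prev_frame: np.ndarray, frame: np.ndarray, thresh: int = 20) -> float:
    """Fraction of pixels that changed noticeably between two grayscale frames."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(np.mean(diff > thresh))

def running_speed(energies, gain: float = 10.0) -> float:
    """Map recent motion energy to a speed parameter for the graphical
    character.  Assumes the player is running in place, so more
    frame-to-frame change means faster running (illustrative mapping)."""
    return gain * float(np.mean(energies))

# Toy demo: a shifted copy of a frame simulates a player moving in place.
rng = np.random.default_rng(0)
a = rng.integers(0, 255, (32, 32), dtype=np.uint8)
b = np.roll(a, 4, axis=1)      # shifted copy: lots of pixel change
still = motion_energy(a, a)    # identical frames: zero motion
moving = motion_energy(a, b)   # clearly nonzero motion
```

Because only a difference image and a mean are needed per frame, this kind of measurement runs easily on inexpensive hardware.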
We constructed a vision-based version of the Sega game Decathlete, illustrated in Figure 2. The player pantomimes the various events of the decathlon. Knowing which event is being played, simple computations can determine the timing and speed parameters needed to make the graphical character move in a similar way to the pantomiming player. This results in natural control of rather complex character actions. We demonstrated the game at COMDEX '96 in the U.S. and at CeBIT '97 in Germany. Novice users had fun right away, controlling the running, jumping or throwing of the computer character by acting out the motions themselves.

Specialized detection and processing hardware can also reduce costs. Low-cost CMOS sensors are finding many vision applications. We have designed a low-power, low-cost CMOS sensor with the additional feature of some on-chip, parallel image computations [11], named the Artificial Retina (by analogy with biological retinas, which also combine the functions of detection and processing). The chip's computations include edge detection, filtering, cropping and projection. Some of the computer game applications involve the computation of image moments, which can be calculated particularly quickly using the on-chip image projections [5]. Figure 3 shows a schematic diagram of the artificial retina chip, a photograph of it and a commercial product that uses the chip, the Nintendo GameBoy Camera.

We also made a gesture-based television remote control, again designing the system to make the vision task simple [6]. The only visual task required is the detection and tracking of an open hand, a relatively distinctive and easily tracked feature. When the television is turned off, a camera scans the room for the appearance of the open-hand gesture. When someone makes that gesture, the television set turns on. A hand icon appears in a graphical menu of television controls.
The hand on the screen tracks the viewer's hand, allowing the viewer to use his or her hand like a mouse, adjusting the television set controls of the graphical overlay (see Figure 4).
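The image-moment shortcut mentioned above works because the zeroth and first moments of an image, and hence the centroid of a bright object, can be recovered from the row and column projections alone, turning one 2D computation into two 1D ones. A minimal sketch, with NumPy standing in for the artificial retina chip's on-chip projection hardware:

```python
import numpy as np

def moments_from_projections(img: np.ndarray):
    """Total mass and centroid of an image, using only its row and
    column projections (the quantities the retina chip computes on-chip)."""
    col_proj = img.sum(axis=0)      # one value per column (sum over rows)
    row_proj = img.sum(axis=1)      # one value per row (sum over columns)
    m00 = col_proj.sum()            # zeroth moment: total mass
    m10 = (np.arange(img.shape[1]) * col_proj).sum()  # sum of x * I(x, y)
    m01 = (np.arange(img.shape[0]) * row_proj).sum()  # sum of y * I(x, y)
    return m00, m10 / m00, m01 / m00   # mass, centroid x, centroid y

img = np.zeros((8, 8))
img[2:4, 5:7] = 1.0                 # a small bright blob
m00, cx, cy = moments_from_projections(img)
# centroid lands at the blob's center: cx = 5.5, cy = 2.5
```

Since each projection is a single vector, the remaining work per frame is two short dot products, which is why the on-chip projections make moment-based tracking so cheap.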
Figure 4: Adjusting a television set by hand signals. To get the attention of the television, the viewer raises his hand, palm toward the camera. When that gesture is detected, the television turns on, and a graphical overlay appears. A yellow hand tracks the position of the viewer's hand and allows the viewer to use his hand like a mouse, selecting the desired menu item from the graphical interface. Normalized correlation with pre-stored hand templates performs the tracking.

Finally, Figure 5 shows 3D head tracking. The visual task of head tracking allows for a template-based approach, described in the caption. This could be used for a variety of interactive applications, such as animating a graphical avatar in a videoconferencing application, or adjusting a graphical display appropriately for the viewer's head position. In addition to the entertainment uses described above, vision interfaces have applications in safety. Such tracking may be used in automobiles to detect that a driver is drowsy or inattentive.

The Present and the Future

Computer analysis of images of people is an active research area. Specialized conferences, such as the International Conference on Automatic Face and Gesture Recognition and the Workshop on Perceptual User Interfaces (PUI), present the state of the art. Relevant papers also appear in the major computer vision conferences: Computer Vision and Pattern Recognition (CVPR) and the International Conference on Computer Vision (ICCV). Systems are now beginning to move beyond the research community and to become viable commercial products. The Me2Cam, due in the Fall of 1999 from Intel and Mattel, will allow children to pop, or become trapped by, bubbles on the computer screen, depending on their movements. As the field progresses and the sophistication and reliability of the vision algorithms increase, applications should proliferate. Interdisciplinary approaches, combining human studies as well as computer vision, will contribute.
If interface-builders can match the ease of use shown in Figure 1, the prediction of that photograph should come true in at least one aspect: vision-based interfaces should become ubiquitous.
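Both the hand tracker of Figure 4 and the head tracker of Figure 5 rest on the same core operation: score each candidate image patch against a stored template and take the best-scoring position. The sketch below uses zero-mean normalized cross-correlation, one standard form of normalized correlation; the systems' exact normalization is not given in the text, so treat the details as assumptions.

```python
import numpy as np

def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation: 1.0 for a perfect match,
    invariant to brightness offset and contrast scaling."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def best_match(image: np.ndarray, template: np.ndarray):
    """Slide the template over the image; return ((row, col), score)
    of the best-matching position."""
    th, tw = template.shape
    best, best_pos = -2.0, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            s = ncc(image[r:r + th, c:c + tw], template)
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos, best

# Toy demo: hide a brightness- and contrast-altered copy of the template
# in a dim image and recover its location.
rng = np.random.default_rng(1)
template = rng.random((5, 5))
image = rng.random((20, 20)) * 0.1
image[7:12, 3:8] = 2.0 * template + 0.5   # gain and offset changed
pos, score = best_match(image, template)
# NCC ignores the gain and offset, so the match is found at (7, 3)
```

A real tracker would search only a small window around the previous position, and, as in Figure 5, operate on subsampled images (e.g. 32x32) to reach frame rate.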
Figure 5: The scheme for head tracking shown here is based on template matching [1]. At initialization time, a frontal image of the subject is registered to a generic 3D model of the human head, and synthetically generated templates showing the appearance of the subject over a range of head poses are computed and stored. Subsequent head motion is determined by matching incoming images of the subject to the templates. The system achieves frame-rate performance by operating on subsampled images (32x32 resolution, as shown in the figure). The approach is robust to non-rigid motion of the face (eyes closing, mouth opening, etc.).

References
William T. Freeman is a Senior Research Scientist at MERL, a Mitsubishi Electric Research Lab in Cambridge, MA, where he studies Bayesian models of perception and interactive applications of computer vision. As part of his doctoral work at the Massachusetts Institute of Technology (1992), he developed "steerable filters," a class of oriented filters useful in image processing and computer vision. In 1997 he received the outstanding paper prize at the Conference on Computer Vision and Pattern Recognition for work on applying bilinear models to separate "style and content."

Paul A. Beardsley is a Research Scientist at MERL in Cambridge, MA, working in computer vision. His research interests include 3D reconstruction and the development of partial 3D representations for image-based rendering. He received a Ph.D. in computer vision from the University of Oxford in 1992. His postdoctoral work was on the recovery of 3D structure from "uncalibrated" image sequences, with a particular application to robot navigation.

Hiroshi Kage is a Researcher in the Business Promotion Project of Artificial Retinas, the System LSI Division of Mitsubishi Electric in Sagamihara, Japan. He has been engaged in developing various machine vision algorithms for artificial retina chips. He received his B.E. and M.E. degrees in information science from Kyoto University, Japan in 1988 and 1990, respectively.

Ken-ichi Tanaka is a Group Manager of the Business Promotion Project of Artificial Retinas, the System LSI Division of Mitsubishi Electric in Sagamihara, Japan. His research interests include electric propulsion systems for spacecraft and neural networks. He received his B.E. and M.E. degrees in aeronautical engineering from Kyoto University, Japan in 1979 and 1981, respectively.

Kazuo Kyuma is the Project Manager of the Business Promotion Project of Artificial Retinas, the System LSI Division of Mitsubishi Electric in Sagamihara, Japan. He is also a professor at the Graduate School of Science and Technology, Kobe University, Japan, and a lecturer at Osaka University and the Tokyo Institute of Technology. His research interests cover optoelectronics, advanced LSI systems and neurocomputing. He received B.S., M.S. and Ph.D. degrees in electronic engineering from the Tokyo Institute of Technology, Japan, in 1972, 1974 and 1977, respectively.

Craig D. Weissman is Director, Software Engineering at E.piphany, Inc. in San Mateo, CA, a provider of customer-centric analytic solutions that help businesses personalize their interactions with customers by connecting and analyzing data from inside and outside the enterprise. He graduated in 1995 from Harvard University with a B.A./M.S. in applied math and computer science.