
Non-Photorealistic Rendering

Vol.32 No.1 February 1999

“Tour Into the Picture” as a Non-Photorealistic Animation

Ken Anjyo
Hitachi, Ltd.

Figure 1: An example of Tour Into the Picture - (top) input image; (bottom) output image.

Figure 2: Pseudo-3D scene models for the same input image - (a) input image; (b) FOV = 54 deg.; (c) FOV = 150 deg.

Anyone who encounters a painting in a museum may wish to step into the 3D world it depicts. A natural and simple question for the readers of this journal then arises: Is it possible to do so using the computer? We suggested an answer at recent SIGGRAPH conferences [1, 2] with an image-based rendering technique called TIP (“Tour Into the Picture”). TIP provides a simple but powerful GUI (graphical user interface) that allows users to easily construct “visually 3D” animation from just one 2D image of the scene to be animated. As shown in Figure 1, TIP produces different views from a single image. This article describes the theory and practice of TIP, along with the motivation behind it and its philosophical background.

In general, 3D animation requires 3D scene geometry. However, if only one image of the scene is given, it is virtually impossible to reconstruct a 3D scene model. (Who knows what is behind a foreground object in the input image?) To learn what lies “behind the scenes” from a single image, we have to depend largely on our own interpretation of the image. Still, we usually have “depth perception” when looking at a painting of a realistic scene. This seems to come from the fact that such a painting is made with a perspective view. It should be noted, however, that the 3D world described in the painting does not have a unique 3D geometry: the 3D structure may differ from viewer to viewer.

Following is a brief look at the steps involved in “Tour Into the Picture.” Roughly speaking, after a single 2D image such as the one shown in Figure 2a is input, TIP provides a background scene model with, at most, five 3D polygons. Each foreground object appearing in the input image is represented as a texture-mapped image on a billboard, as shown in Figure 2b or 2c. (As a preprocess, we need additional 2D information: the background image and the mask information for the foreground objects. For more details, see [2].) The billboards and the simple background model constitute the pseudo-3D scene model in TIP. Note that, in TIP, the user specifies which parts of the image are foreground objects and which are the rest, i.e., the background. The specification depends on the user’s sense and aim in making an animation from the input image, and it does not matter what the image actually depicts. The pseudo-3D scene model is made after the user decides what type of perspective view is suitable for the input image to be modeled in 3D (one- or two-point perspective and their variants are available).
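The pseudo-3D scene model described above can be sketched as a small data structure. This is only an illustration, not the TIP implementation: the class names, the field names and the labels for the five background polygons (floor, ceiling, two side walls, rear wall) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]

# Assumed labels for the (at most) five background polygons.
BACKGROUND_PARTS = ("floor", "ceiling", "left_wall", "right_wall", "rear_wall")


@dataclass
class Billboard:
    """A foreground object: a texture-mapped quad standing in the scene."""
    name: str
    corners: List[Point3]   # four 3D corners of the quad
    texture_id: str         # hypothetical handle to the masked foreground pixels


@dataclass
class PseudoSceneModel:
    """Background polygons plus billboards, as one user interpreted the image."""
    background: Dict[str, List[Point3]] = field(default_factory=dict)
    billboards: List[Billboard] = field(default_factory=list)

    def add_background(self, part: str, corners: List[Point3]) -> None:
        if part not in BACKGROUND_PARTS:
            raise ValueError(f"unknown background part: {part}")
        if len(corners) != 4:
            raise ValueError("a background polygon needs four corners")
        self.background[part] = corners

    def polygon_count(self) -> int:
        # Every background part and every billboard is a single quad.
        return len(self.background) + len(self.billboards)
```

Because the background dictionary can hold fewer than five entries, the same structure also covers images in which, say, no ceiling or side wall is visible.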

In many cases one-point perspective is selected, as this is the most interesting. In this case, the user can also specify the FOV (field-of-view angle) as a parameter for designing the pseudo-3D scene model. As a result, we can get variations of the scene model just by choosing different FOVs, as shown in Figures 2b and 2c. Again, note that the FOV is selected by the user in the one-point case, whereas it is uniquely fixed in the two-point perspective case.
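One way to see why the FOV changes the scene model is the standard pinhole-camera relation between field of view and focal length. The sketch below assumes a pinhole camera and treats the FOV as the horizontal angle across the full image width; the function name and the 640-pixel width are illustrative choices, not details taken from TIP.

```python
import math


def focal_length_px(image_width_px: float, fov_deg: float) -> float:
    """Focal length in pixels of a pinhole camera with the given horizontal FOV."""
    if not 0.0 < fov_deg < 180.0:
        raise ValueError("FOV must lie strictly between 0 and 180 degrees")
    half_angle = math.radians(fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_angle)


# The two FOVs quoted for Figure 2, applied to a hypothetical 640-pixel-wide image.
for fov in (54.0, 150.0):
    print(f"FOV {fov:5.1f} deg -> focal length {focal_length_px(640, fov):.1f} px")
```

Under this model a wider FOV gives a shorter focal length, so the same image offsets are interpreted as a shallower scene relative to its width. That is one reading of why a single input image can yield visibly different pseudo-3D models.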

Another interesting feature in TIP lies in the process of making the background model. To make the background model in a one-point perspective case, the user must specify the position of the vanishing point and its neighborhood. In the top images of Figure 3, the point surrounded by the rectangle is the vanishing point, and the rectangle is the neighborhood of the vanishing point. Using the GUI in TIP, called the spidery mesh, users can move the vanishing point and its neighborhood as they like.
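A rough sketch of how a vanishing point and its rectangular neighborhood can partition an image follows. It assumes that rays from the vanishing point through the rectangle's corners split the rest of the image into four regions, with the rectangle itself as the fifth; the function name and region labels are hypothetical, and this is not the spidery mesh code itself.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom); y grows downward


def classify_pixel(px: float, py: float, vp: Tuple[float, float], inner: Rect) -> str:
    """Label a pixel by the background region it falls in."""
    left, top, right, bottom = inner
    vx, vy = vp
    if not (left < vx < right and top < vy < bottom):
        raise ValueError("vanishing point must lie strictly inside the rectangle")
    if left <= px <= right and top <= py <= bottom:
        return "rear_wall"

    # Follow the ray from the vanishing point toward the pixel and record the
    # ray parameter at which it would cross each edge line of the rectangle.
    dx, dy = px - vx, py - vy
    exits: Dict[str, float] = {}
    if dx > 0:
        exits["right_wall"] = (right - vx) / dx
    if dx < 0:
        exits["left_wall"] = (left - vx) / dx
    if dy > 0:
        exits["floor"] = (bottom - vy) / dy
    if dy < 0:
        exits["ceiling"] = (top - vy) / dy

    # The edge crossed first is the one through which the ray leaves the rectangle.
    return min(exits, key=exits.get)
```

For example, with the vanishing point at (50, 50) and the rectangle (40, 40, 60, 60), a pixel directly below the rectangle is labeled as floor and a pixel directly to its right as the right wall. Moving the vanishing point or resizing the rectangle changes these labels, which is the kind of adjustment the GUI lets the user make interactively.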

The top two images in Figure 3 show different user choices of the vanishing point and its neighborhood. As shown in the middle of Figure 3, the rectangle representing the neighborhood of the vanishing point corresponds to a 3D rectangle of the background model. The background image (made in the preprocessing) is then mapped onto the rectangles of the background model, so that the resulting 3D backgrounds, shown at the bottom of Figure 3, look alike regardless of the choice of the vanishing point and its neighborhood. In fact, it is hard to see the difference between the bottom images in Figure 3, because they are still images. However, different animations are obtained from the different background models. For example, with the background models in Figure 3, the height of the “sky” (or of the clouds in the sky) differs, so we can tell the background models apart when watching the animations. Of course, both animations feel real, even though different, user-specified pseudo-3D structures of the scene in the input image have been constructed!
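The dependence of the “sky” height on the vanishing point can be illustrated with a little pinhole-camera arithmetic. The sketch assumes a horizontal floor, a horizon at the height of the vanishing point, and a camera height normalized to 1; the function and its parameters are illustrative and are not taken from [2].

```python
from typing import Tuple


def background_dimensions(
    vp_y: float,
    rect_top: float,
    rect_bottom: float,
    focal_px: float,
    camera_height: float = 1.0,
) -> Tuple[float, float]:
    """Return (rear wall depth, ceiling height above the camera).

    Image y grows downward, so the rectangle's bottom edge lies below the
    vanishing point and its top edge lies above it.
    """
    below = rect_bottom - vp_y   # pixels from the horizon down to the floor edge
    above = vp_y - rect_top      # pixels from the horizon up to the ceiling edge
    if below <= 0 or above <= 0:
        raise ValueError("vanishing point must lie between the top and bottom edges")
    # A floor point seen `below` pixels under the horizon sits at this depth.
    depth = focal_px * camera_height / below
    # Back-project the top edge to that depth to get the ceiling height.
    ceiling = depth * above / focal_px
    return depth, ceiling
```

With the same rectangle (top 40, bottom 60) and a focal length of 100 pixels, a vanishing point at y = 50 gives a depth of 10 and a ceiling height of 1, while moving it to y = 55 gives a depth of 20 and a ceiling height of 3. The still images would look nearly the same, but a camera moving through the two models would see the sky at different heights.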

Referring back to Figure 1, TIP provides a non-photorealistic animation in the sense that the obtained animation still has a “painterly touch.” TIP generally gives a depth perception effect to a wide range of 2D images, such as photos, paintings and sketches. Therefore, if we use a painting or illustration as input, we get a “visually 3D” animation through TIP. It should then be noted that the pseudo-3D scene model used for making the animation is fake, and comes from the user’s imagination. In this sense the generated animation is non-photorealistic.

Ken-ichi Anjyo
Visualware Planning Department
Hitachi, Ltd.
4-6 Kanda-Surugadai Chiyoda
Tokyo, 101-8010 Japan

Tel: +81-3-5295-5410
Fax: +81-3-3258-5082

The copyright of articles and images printed remains with the author unless otherwise indicated.

Photoreality has been an ultimate goal of computer graphics rendering. In addition, the recent rapid progress of image-based rendering (IBR) and non-photorealistic rendering (NPR) has widened the meaning of reality as a goal of computer graphics technologies. If we say, for example, that photoreality is achieved by physics-based approaches, we could say that another reality is achieved by the user’s interpretation or imagination of the scene to be animated. Or, if we don’t want to use the word “reality” to describe the results achieved by IBR or NPR, we might say that a new “touch” of computer graphics has been created, which at least entertains and stimulates the user’s and viewer’s mind, regardless of whether the animations generated are physically correct or not. Let’s have fun with computer graphics!


  1.  Anjyo, K. and Y. Horry. “Theory and Practice of ‘Tour Into the Picture,’” SIGGRAPH 98 Course Notes #8, July 1998.
  2.  Horry, Y., K. Anjyo and K. Arai. “Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image,” In SIGGRAPH 97 conference proceedings, Computer Graphics, August 1997, pp. 225-232.

Ken-ichi Anjyo is a Research Scientist and Creative Director in the Visualware Planning Department of Hitachi, Ltd. He has served as a member of program committees for several international conferences on computer graphics, including the 1999 ACM Symposium on Interactive 3D Graphics. He is also active in industrial projects for making film and digital cel animation.