Applications of Computer Vision to Computer Graphics
Vol.33 No.4 November 1999
Image-Based Rendering: A New Interface Between Computer Vision and Computer Graphics
Leonard McMillan MIT
Steven Gortler Harvard University
What is IBR?
Image-based rendering (IBR) describes a set of techniques that allow three-dimensional graphical interaction with objects and scenes whose original specification began as images or photographs. In an IBR pipeline, processing is applied to a set of input photographs creating an intermediate data structure. Later, this data structure is used to create new images of the scene or object.
Computer vision provides tools to analyze images and create models representing shape and surface properties. Computer graphics provides tools to take models and create images. This suggests that image-based rendering can be achieved by combining computer vision and computer graphics techniques. However, is combining the best-known computer vision and graphics techniques the best we can do? Many computer vision algorithms are just not very robust, and high quality rendering can be computationally expensive.
To better explore the connection between computer graphics and computer vision we ask the following question. How concise a model should one try to create from the image data using computer vision analysis? At one extreme, one could ask for a smooth surface representation annotated with shading parameters. A less concise and unified representation is simply a set of depth images, where a depth (z) value is associated with each pixel of the original photographs. An even less concise representation associates with each image pixel a color value and a ray direction.
Concise representations are flexible. They permit straightforward manipulations of models, including modifying shape and material properties. In addition, they allow for measurements and other forms of geometric reasoning (e.g. collision and proximity detection). However, as we demand a more concise representation, the analysis becomes more difficult and the associated computer vision algorithms less reliable. Furthermore, more computation is required to synthesize images from these more concise representations.
This leads us to the following questions. Is the transformation of images to the most concise representation possible a requirement for generating new images? Is a three-dimensional model the best way of maintaining both the realism and integrity of the source images? Is a model merely a form of compression or, for that matter, lossy compression? Are we willing to tolerate such losses?
Image-based rendering approaches three-dimensional graphics problems by designing data structures that can be robustly computed from images and can subsequently be used to create high quality images at minimal computational cost. Thus, image-based rendering forces us to think about how to best use computer vision and computer graphics concepts and tools in conjunction with each other.
Figure 1: A wide variety of representations have been used for image-based rendering, including depth images like the one shown above. The image and depth information were acquired simultaneously using a laser-based 3D scanner.
How is IBR Different?
Many IBR methods render new images from some image-like, sampled representation instead of from a polygonal or polynomial geometric representation. IBR explores new rendering algorithms that exploit the regularity of these sampled representations. In particular, image-based rendering has developed many new algorithms that avoid much of the overhead found in polygonal rendering.
In the future, image-based rendering methods may challenge the triangle as the predominant modeling primitive. Triangles have always provided a powerful primitive for describing a vast range of different shapes. Their primary advantages are that they can be easily manipulated using linear methods, that they are convex in both 2D and 3D and that they (in general) remain triangles when projected. Another important feature of triangles is that they provide a very compact representation for regions: by manipulating only the three triangle vertices, it is possible to affect hundreds of displayed pixels.
However, all of these advantages evaporate as the average triangle’s size is reduced. This is unfortunate, since as we attempt to create increasingly realistic computer graphics models, we are also witnessing a systematic shrinking of the triangles used to represent them. It is not difficult to generate models where a typical triangle projects onto a single image pixel. At this extreme, the efficiency of a triangle-based representation, where three vertices are transformed and projected in order to produce a single output pixel, comes into question.
Images, on the other hand, represent an efficient sampling of a scene from a particular viewpoint. If we want to produce new images from viewpoints that are close to an image from an existing reference set, it stands to reason that the result will appear as a subtle remapping of the pixels from the reference. Thus, image-based rendering seeks to trade off the complexity of view-independent fidelity, as represented by the ideal triangle-based model, for high fidelity at a restricted range of views.
Because of their efficiency, image-based rendering algorithms can be used for synthetic models as well as photographic input. In particular, one can use the output of a high-end rendering engine (perhaps a ray tracer) as an input image for an IBR rendering technique. This provides a way to couple high-end "slow" rendering with interactive viewing.
Why IBR Now?
In addition to the trend towards smaller and smaller triangles, many other forces have contributed to the recent explosion of activity in the area of image-based rendering. Among these are:
The classic approach to three-dimensional computer graphics depends on models, and realistic models are not easy to come by. The creation of hand-built geometric models and realistic surface models is a time-consuming ordeal, and seldom do these models approach the complexity of real-world objects. In fact, it is still nearly as difficult to create three-dimensional models today as it was 30 years ago. Often, this lack of progress is blamed on the state of three-dimensional modeling tools. However, placing such onus on modeling tools is probably unjustified since the creative process does not share in the same exponential performance gains achieved by rendering architectures. Nevertheless, new technological developments do offer some promise in this area. In particular, recent work in the area of active-range sensors shows great promise for speeding up the process of model generation (see Brian Curless’ article in this issue on pp. 38-40). Still, these devices are expensive, fragile and limited in the scale of models that they can produce. Furthermore, they do not address the need for the accurate reflectance models that are required for photorealistic rendering. IBR, on the other hand, addresses both the geometric and reflectance aspects of modeling simultaneously, by relying on images as the underlying model (see Paul Debevec’s article on Image-Based Modeling and Lighting, pp. 46-50).
On the technological front, the recent explosion of multi-megapixel digital cameras has made it easier than ever to acquire high-quality precision imagery. The combination of digital repeatability with an image quality approaching that of 35mm film, coupled with the economic benefits of a mass market application, leads one to wonder how to best incorporate digital images into the generation of three-dimensional graphics.
Current trends in modern computer architecture also invite a reinvestigation of traditional approaches to computer graphics. Over the past five years, there has been a steady movement of computer graphics-specific acceleration hardware away from dedicated subsystems and closer to the central processing unit. This motion provides graphics hardware access to the vast amounts of memory capacity and bandwidth necessary for processing images. Examples of this trend include instruction-set extensions tailored specifically for graphics such as AMD’s 3DNow, Intel’s MMX and Sun’s Sparc VIS architecture extensions. This new partitioning also allows for the reorganization of rendering tasks beyond the classic graphics pipeline. For example, incremental transformations of regular data structures can replace classic transformations that use 4 by 4 matrix multiplies, and texture references can be made prior to rasterization. In short, moving functionality closer to the CPU allows for a more holistic treatment of graphics rendering and it provides for new data representations that are not dictated by specific interfaces.
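The remark about incremental transformations can be illustrated with a minimal sketch. It assumes a 3 by 3 homogeneous matrix applied to the pixels (u, v, 1) of a regular image grid, rather than the full 4 by 4 case, and the function names are hypothetical. Because neighboring pixels differ only by a unit step in u, each new result can be obtained from the previous one by adding the first column of the matrix.

```python
def transform_row_direct(m, v, width):
    """Transform every pixel (u, v, 1) of one scanline with a full 3x3 multiply."""
    return [
        tuple(m[r][0] * u + m[r][1] * v + m[r][2] for r in range(3))
        for u in range(width)
    ]


def transform_row_incremental(m, v, width):
    """Same result: one setup per scanline, then three additions per pixel."""
    # Homogeneous coordinates of the first pixel in the row (u = 0).
    x, y, w = (m[r][1] * v + m[r][2] for r in range(3))
    # Stepping u by one adds the first column of the matrix.
    dx, dy, dw = m[0][0], m[1][0], m[2][0]
    row = []
    for _ in range(width):
        row.append((x, y, w))
        x, y, w = x + dx, y + dy, w + dw
    return row
```

Both functions return the same homogeneous coordinates, up to floating-point rounding; the incremental version simply exploits the regularity of the sampled representation.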
It is these forces, as well as others, that have led to an explosion of research on IBR. While there is a wide range of different approaches, all of them share a common feature: the use of images as an underlying model for the generation of three-dimensional graphics.
Figure 2: Light fields and lumigraphs are some of the least concise IBR representations. Each image is treated as a collection of rays. New images are approximated from these collections by interpolating between nearby rays.
Figure 3: Panoramic images can make effective image-based representations even though they are inherently two-dimensional. The combination of changes in perspective and their wide field-of-view can provide a striking sensation of being immersed within a three dimensional environment.
Figure 4: Layered depth images overcome many problems associated with occlusions and exposures of depth-images, yet they are still sampled representations with equivalent computational advantages.
Image-based rendering encompasses a wide range of different methods, all of which use images as a significant component of a scene’s representation. The most common of these is texture mapping, where the apparent complexity of an object is increased by overlaying an image across its surface. In this case, the rough shape is represented by the geometric description, with subtle details finessed by the shading and other perceptual hints derived from the image. Texture mapping has enjoyed a preeminent role among computer graphics techniques over the past few years, primarily because of the additional visual fidelity that it provides for a very modest cost. Clever variants of the classical "label-like" texture mapping have also emerged. In particular, projective texturing allows for entire collections of primitives to easily share a single texture defined from a particular viewpoint. These new texture-mapping methods are very much in the spirit of modern image-based rendering.
To some extent, a single panoramic image can be used to represent an immersive scene, as in Apple’s QuickTime VR and IPIX images. The rendering of these representations only allows for rotations and modifications of the field of view at a single viewing position. This results in dynamic perspective changes, but excludes parallax. It is surprising how effective the combination of immersion and changing perspective can be in creating an illusion of three dimensions, when such representations are entirely two-dimensional. Nonetheless, the rendering of unique views from a single image source, which correspond to specified viewing directions and fields of view, qualifies these panoramic viewing methods as image-based techniques.
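As a rough illustration of panoramic viewing, the sketch below maps one pixel of a perspective view to a location in a panorama, given a viewing direction and a field of view. It assumes an equirectangular (latitude-longitude) panorama purely for simplicity; commercial systems may store panoramas in other projections. The function name, parameters and axis conventions are all hypothetical.

```python
import math


def view_pixel_to_panorama(px, py, view_w, view_h, fov_deg,
                           yaw_deg, pitch_deg, pano_w, pano_h):
    """Return panorama coordinates (u, v) seen through view pixel (px, py)."""
    # Focal length in pixels, derived from the horizontal field of view.
    f = (view_w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel centre: x right, y up, z forward.
    x = (px + 0.5) - view_w / 2.0
    y = view_h / 2.0 - (py + 0.5)
    z = f
    # Tilt the ray up by the pitch angle (rotation about the x axis).
    p = math.radians(pitch_deg)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    # Turn the ray right by the yaw angle (rotation about the y axis).
    t = math.radians(yaw_deg)
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)
    # Convert the direction to longitude and latitude, then to pixels.
    lon = math.atan2(x, z)
    lat = math.atan2(y, math.hypot(x, z))
    u = ((lon / (2.0 * math.pi) + 0.5) * pano_w) % pano_w
    v = (0.5 - lat / math.pi) * pano_h
    return u, v
```

Only the ray direction enters the lookup, never a viewing position, which is why this kind of rendering yields perspective changes but no parallax.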
Another computer graphics method that can be considered as an image-based rendering method is image morphing. In image morphing, the goal is to develop a series of intermediate images that represent a plausible transition between two or more reference images. All of the transformations involved in image morphing are computed entirely within the two-dimensional image domain. Thus, the synthesis of apparent camera motions or shape distortions depends entirely upon implicit constraints that rest in the creative mind of the artist or animator who specifies the desired morphing parameters. In the case where the morph is used to simulate only camera motions between reference images from a static scene, Seitz has developed a specially constrained variant of image morphing called "view morphing".
The three previous image-based rendering techniques have become well established and already enjoy substantial commercial success. Next, we will describe more recent developments and areas of ongoing research.
Recent Developments in IBR
Images augmented with depth or optical-flow information can be used as scene representations, using a technique called image reprojection [4, 12]. With the addition of depth information at each pixel, images can be reprojected to new viewing positions exhibiting parallax changes and occlusions. Unlike view morphing, this technique allows for the synthesis of a wide range of images (including those located off the lines connecting the reference camera centers). Consequently, renderings that use these methods behave similarly to models rendered using classical computer graphics methods.
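A minimal sketch of reprojecting a depth image follows. It assumes an idealized pinhole camera with focal length f and principal point (cx, cy), a new camera that is only translated by t (no rotation), nearest-pixel splatting and a z-buffer to resolve occlusions. These are simplifying assumptions made for illustration, not a description of any particular published algorithm, and the function name is hypothetical.

```python
import math


def reproject_depth_image(colors, depths, f, cx, cy, t):
    """Forward-warp a depth image to a camera translated by t = (tx, ty, tz)."""
    h, w = len(depths), len(depths[0])
    out = [[None] * w for _ in range(h)]        # None marks an unfilled hole
    zbuf = [[math.inf] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depths[v][u]
            # Back-project the pixel to a 3D point in the reference camera.
            x = (u - cx) * z / f
            y = (v - cy) * z / f
            # Express the point relative to the new camera position.
            xn, yn, zn = x - t[0], y - t[1], z - t[2]
            if zn <= 0:
                continue                        # behind the new camera
            # Project into the new image and round to the nearest pixel.
            un = round(f * xn / zn + cx)
            vn = round(f * yn / zn + cy)
            # Keep the sample only if it is the closest one seen so far.
            if 0 <= un < w and 0 <= vn < h and zn < zbuf[vn][un]:
                zbuf[vn][un] = zn
                out[vn][un] = colors[v][u]
    return out
```

Nearer samples shift farther than distant ones, which produces parallax. Output pixels that receive no sample stay `None`; these are regions that were not visible in the reference image.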
The notion of depth can be represented in more abstract forms. For example, the apparent trajectory of a point while a camera moves, called optical flow, or likewise, the correspondence of feature points between images can take the place of explicit depth values in the warping process. Methods that use these more abstract representations include the trilinear-tensor image reprojection and reconstruction from fundamental matrices. These methods can be applied in situations where the precise position and field of view of the camera are unknown.
One difficulty of this technique is the acquisition of the required depth information, or at least image feature correspondence, which usually depends on the application of computer vision techniques. The use of registered active-range-finding devices is becoming more commonplace and is offsetting this difficulty somewhat.
The pixels of a single image only represent the closest object along each ray. When reprojecting images, surfaces that were not visible can become exposed. One way to overcome this is to combine the color and depth information from several images into a single layered-depth image (LDI). An LDI combines in each pixel the multiple color and depth values representing the many surfaces seen along the ray. Another alternative is to use a multiple center of projection image (MCOP). Unlike a traditional image, where all of the rays originate from a single center of projection, pixels in an MCOP image emanate from different centers of projection. This allows an MCOP to "see around" objects.
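The layered-depth idea can be sketched as a small data structure in which each pixel holds a list of (depth, color) samples ordered front to back. This is only an illustrative sketch: the class and method names are hypothetical, and samples are assumed to have already been warped into the LDI camera's pixel grid. Near-duplicate depths are merged using an arbitrary tolerance.

```python
from collections import defaultdict


class LayeredDepthImage:
    """Each pixel stores every surface sample seen along its ray."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # (u, v) -> list of (depth, color), kept sorted front to back.
        self._layers = defaultdict(list)

    def add_sample(self, u, v, depth, color, tol=1e-3):
        """Insert a sample unless one at nearly the same depth already exists."""
        layers = self._layers[(u, v)]
        if any(abs(d - depth) <= tol for d, _ in layers):
            return
        layers.append((depth, color))
        layers.sort(key=lambda sample: sample[0])

    def samples(self, u, v):
        """All samples along the ray through (u, v), nearest first."""
        return list(self._layers.get((u, v), []))

    def front(self, u, v):
        """Color of the nearest surface, or None if the pixel is empty."""
        layers = self._layers.get((u, v))
        return layers[0][1] if layers else None
```

When such an image is reprojected, the samples behind the front layer are available to fill regions that become exposed.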
An alternative approach to image-based rendering is to consider a collection of images as merely a collection or database of rays with no associated structure. To reconstruct a new image, each desired pixel is colored by querying the database for that ray. Rays not in the database are filled in using some form of interpolation. Light field, Lumigraph and Concentric Mosaic methods fall within this class of methods.
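The query-and-interpolate step can be sketched for a simplified two-dimensional "flatland" case. Here each ray is assumed to be indexed by two continuous coordinates (s, u), such as its intersections with two parallel lines, and the database is a regular grid of stored ray values. This parameterization, the bilinear interpolation and the function name are assumptions made for illustration; practical systems index rays with more coordinates and may use other reconstruction filters.

```python
import math


def lookup_ray(samples, s, u):
    """Bilinearly interpolate a gridded ray database at continuous (s, u)."""
    n_s, n_u = len(samples), len(samples[0])
    # Clamp the query to the sampled range.
    s = min(max(s, 0.0), n_s - 1)
    u = min(max(u, 0.0), n_u - 1)
    # Indices of the four stored rays surrounding the query.
    s0 = max(0, min(int(math.floor(s)), n_s - 2))
    u0 = max(0, min(int(math.floor(u)), n_u - 2))
    s1 = min(s0 + 1, n_s - 1)
    u1 = min(u0 + 1, n_u - 1)
    fs, fu = s - s0, u - u0
    # Weighted blend of the four nearest stored rays.
    return ((1 - fs) * (1 - fu) * samples[s0][u0]
            + (1 - fs) * fu * samples[s0][u1]
            + fs * (1 - fu) * samples[s1][u0]
            + fs * fu * samples[s1][u1])
```

A query that coincides with a stored ray returns that ray's value exactly; any other query blends its neighbors, which is one source of the blurring and ghosting seen when the rays are sampled too sparsely.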
The advantage of these approaches is that they require minimal interpretation of the data. However, they rely on either a very dense set of images or sophisticated reconstruction algorithms in order to synthesize acceptable images. In theory, these methods are the most intriguing since they are closest to a purely image-based representation. In practice they suffer from focus and ghosting problems, limited fields of view and costly storage requirements. Further extensions to these ray database approaches are still needed to accommodate deep scenes, optimal sampling and rapid access to offline data.
There are still huge unexplored areas in the field of image-based rendering. Included among these is the incorporation of sparse and approximate correspondences as part of the underlying representation. For example, the processing of silhouettes to form approximate models has already been shown to significantly improve the reconstruction of light field models, even though these models are known to be inexact.
What to Expect
The jury is still out on IBR. In the future, it will be interesting to see if image-based rendering remains a distinct intellectual pursuit, or whether it is absorbed into the mainstream of either computer vision or computer graphics. It is doubtful that the problems addressed by image-based rendering will disappear anytime soon. We will undoubtedly continue to want to make new images that are similar to and consistent with a given image set. It is also likely that we will continue to redefine and revise representations in order to optimize the processes for deriving new information from them. Whether IBR ends up as a distinct field, part of computer vision, part of computer graphics or is simply forgotten, it is still interesting to make conjectures about the future impacts from these recent endeavors.
One likely outcome of the research efforts in image-based rendering is the development of new acquisition devices. We’ve already seen tremendous activity in this area. In particular, there has been an explosion of both mechanical and optical devices for capturing panoramic images. In the future, it is easy to imagine new devices that capture dynamic image sequences from a multitude of different points of view.
Another area where image-based rendering is likely to have a significant impact is in the development of new hardware rendering architectures. IBR approaches can potentially provide both higher quality and higher performance than traditional architectures. For the most part, the algorithms used in IBR are extremely simple, regular and, thus, well suited to hardware implementation. IBR techniques could either augment, or entirely replace, the traditional graphics pipeline.
However, the most significant impact that image-based rendering can have on the fields of computer vision and computer graphics is to enable new applications and problem domains. Among these applications are remote telepresence, telemedicine, 3D endoscopy, virtual television and virtual movie sets. It is difficult to even approach these problems using traditional computer graphics methods.
Image-based rendering techniques have been developing quickly in the last five years. The idea of creating renderings from photographs has ignited the imaginations of both computer graphics and computer vision practitioners. The concept of rendering new images directly from old ones is both compelling and intriguing. Furthermore, these new techniques suggest new and efficient rendering algorithms that have many advantages over traditional polygon rendering. This excitement has driven people to design new image capture devices, rethink the rendering pipeline and investigate new interactions between computer vision and computer graphics. To date, these endeavors have produced many stunning results. We are eagerly anticipating what will emerge from IBR in the next five years.
Leonard McMillan is an Assistant Professor in the Electrical Engineering and Computer Science Department at MIT. There he co-leads the Computer Graphics Group within the Laboratory of Computer Science. McMillan received his BSEE and MSEE from Georgia Institute of Technology and his Ph.D. from the University of North Carolina at Chapel Hill. He has also worked at Bell Laboratories and Sun Microsystems. His research interests include image-based rendering, three-dimensional display technologies, computer graphics hardware and the fusion of image processing, multimedia and computer graphics.
Steven (Shlomo) Gortler is an Assistant Professor of Computer Science at Harvard University. He completed his Ph.D. at Princeton University and spent two years with the Microsoft Graphics Research Group working on image-based rendering as well as other projects.