Image based modelling with VideoTrace
School of Computer Science
University of Adelaide, Australia
www.acvt.com.au/research/videotrace
February 5, 2008
Abstract
Image based modelling (IBM) combines aspects of computer vision, graphics and interface design. Practitioners in each field have approached IBM within the context of their own discipline, but recently systems have emerged that harness the strengths of each. In this article we discuss approaches to IBM, the development of our own system VideoTrace, and applications now and in the future.
Background
The idea of image based modelling is to take several snapshots, or a video, of something you want to model, and then to quickly and reliably create an accurate 3D model of it. This idea is hardly new, and in fact such systems have been used in areas such as photogrammetry and visual effects for decades. However, in the past systems have often fallen short of the ideals of speed, reliability or accuracy, meaning that the creation of photo-realistic models from photographs is a laborious process.
For over 50 years, photogrammetrists have used measurements taken from photographs to infer properties of the actual object or scene being photographed
(e.g. [6]). Typically, photogrammetry requires prior knowledge of some part of the scene (for example, the size of a calibration object), and/or the properties of the camera. Given this, the imaging process can be at least partially reversed, so that lines or points in the image can be transformed into their counterparts in the scene. Two or more views are required to recover 3D scene information. Historically this was done manually, and needed to be extremely precise. Modern examples of photogrammetry software include Photomodeler [2] and Facade [4]; although these are designed to assist the user, they still require a great deal of manual input to create a 3D model.
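To make the underlying geometry concrete, the sketch below shows the standard two-view triangulation step that this kind of reconstruction relies on: given the projection matrices of two calibrated views and a matched image point in each, the 3D point is recovered by linear least squares. This is a generic illustration in Python/NumPy, not the code of any particular photogrammetry package; the projection matrices P1 and P2 are assumed to already be known.

    import numpy as np

    def triangulate_point(P1, P2, x1, x2):
        """Triangulate one 3D point seen in two views by linear (DLT) least squares.

        P1, P2: 3x4 projection matrices of the two views (assumed known).
        x1, x2: (u, v) coordinates of the matched image point in each view.
        """
        A = np.array([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1],
        ])
        # The homogeneous 3D point is the right singular vector of A with the
        # smallest singular value; divide through for inhomogeneous coordinates.
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        return X[:3] / X[3]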
Image based modelling, also known as photo-modelling, has also become a feature of many 3D modelling packages such as Blender, Maya and ZBrush. In these packages, however, images are typically used either as textures or as a visual guide for the modeller. For example, in Blender [1], images of an object from the front, side and top can be superimposed on the corresponding 3D views of the object to be modelled. The modeller can then ensure that the model agrees with each of these images as it is created. The images themselves can be combined to create a texture for the object. This requires either a great deal of care when taking the images, or careful rectification of the images to generate “perfect” top, side and front views.
In computer vision, the prevalent approach to image based modelling has been an ambitious one: images in, model out, with complete automation. Some such systems were devised (e.g. [3]) but inevitably they could not be guaranteed to work for all imagery. This is mainly because they relied on the ability to extract enough features from each image to fully describe the object, and then to match corresponding features in different images. Commercial versions of these systems such as Voodoo [7] and PFTrack [5] allow some user interaction, for example the manual insertion and matching of features, so that a result can be obtained even when the automatic processes fail. However, manually inserting features and creating their trajectories is a time consuming task, and still does not lead to the complete 3D polygonal model that is required for most applications.
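For readers unfamiliar with the automatic matching these systems rely on, the following sketch shows one common way to detect and match features between two frames using OpenCV (SIFT descriptors with Lowe's ratio test). It is illustrative only, assumes an OpenCV build in which cv2.SIFT_create is available, and is not the pipeline used by any of the systems cited above.

    import cv2

    def match_features(img1, img2, ratio=0.75):
        """Detect SIFT features in two frames and keep matches passing Lowe's ratio test."""
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)
        matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
        good = [m for m, n in matches if m.distance < ratio * n.distance]
        # Return the matched pixel coordinates in each image.
        return ([kp1[m.queryIdx].pt for m in good],
                [kp2[m.trainIdx].pt for m in good])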
More recent systems have been designed to combine the strengths of these approaches: interactive methods use computer vision to do the heavy lifting, relieving the operator of much of the manual modelling. The image based modelling system used in-house at ILM for high-grade visual effects is a prime example, although no details of it are publicly available.
In this report we discuss the development of our own image based modelling system, called VideoTrace [8], that builds on recent developments in computer vision. From our perspective as computer vision researchers, we see computer vision coming increasingly near to the goal of models from images, and so we design simple user interactions that cater to the broad brush strokes of modelling that humans do well and yet computer vision finds extremely difficult. Meanwhile, the precise fitting of shapes to images, which is tedious for a person, is taken care of automatically.
Making a model
There are two main steps to building a model in VideoTrace:
1. Run a camera tracker [7] over the video sequence
2. Trace out the structure you want to model
Camera tracking technology has evolved to the point where a number of commercial systems are available and widely used within the visual effects industry. A by-product of these systems is the location of a number of points on the surface of the scene. Some camera trackers have included features to convert these points into fully-fledged polygonal models; however, these features are limited in what they can do, as they are not the focus of the software. In contrast, we believe this information can be used as the basis for a rapid 3D modelling application, by using it to interpret user interactions.
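To give a sense of what the modelling stage receives, here is a hypothetical sketch of a camera tracker's output: an estimated projection matrix per frame plus a sparse cloud of 3D surface points. The class and field names are purely illustrative, not VideoTrace's actual data structures.

    from dataclasses import dataclass
    from typing import List
    import numpy as np

    @dataclass
    class TrackedFrame:
        P: np.ndarray          # 3x4 projection matrix estimated for this video frame
        image: np.ndarray      # the frame's pixels, used for display and edge snapping

    @dataclass
    class Reconstruction:
        frames: List[TrackedFrame]
        points: np.ndarray     # Nx3 sparse 3D points lying on scene surfaces

        def project(self, frame_idx: int, X: np.ndarray) -> np.ndarray:
            """Project a 3D point into a given frame (homogeneous divide)."""
            x = self.frames[frame_idx].P @ np.append(X, 1.0)
            return x[:2] / x[2]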
In our system all modelling is done by drawing or painting on frames of video, effectively tracing out or painting over visible structure. This makes building a model (almost literally) child’s play.
2.1 Tracing out structure

Figure 1: Tracing out the boundary of a planar facet.
Three-dimensional structure is built by first tracing planar faces in the image, as in Figure 1. The facet in this example has seven straight edges, defined by points [1] to [2], [2] to [3] and so on, until the last edge connects [7] back to the starting point [1]. The edges are drawn by clicking the mouse on point [1] and then dragging to each subsequent point in turn. Curves are created later by refining edges, so for now draw a straight “base line” such as the one between points [3] and [4]; it can be refined afterwards. The boundary is closed by creating the last edge back to the starting point [1].

Figure 2: Check your tracing from another view, and refine it.
The 3D position and shape of the polygon you’ve just created are now estimated from the 3D positions of nearby points reconstructed by the camera tracker, while its boundary is snapped to local image edges. You can check the shape by navigating to another frame of the video (i.e. looking at the scene from another point of view). If the 3D shape is correct, the outline you’ve drawn will still be aligned with the edges of the shape (see Figure 2). If not, you can adjust the lines so that they agree with the new view. Next, you can turn a straight edge into a curve: click the start of the curve [1] and drag the cursor to the end of the edge [2] while sketching along the curved edge of the blue building block.
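A rough sketch of the kind of estimation involved is given below: gather the tracked 3D points whose projections fall inside the traced outline and fit a plane to them by total least squares. This is a simplified stand-in for VideoTrace's actual fitting (which also snaps the boundary to image edges); the function names and the use of matplotlib's Path for the point-in-polygon test are our own choices for the illustration.

    import numpy as np
    from matplotlib.path import Path

    def fit_plane(points3d):
        """Fit a plane to an Nx3 array of points by total least squares (SVD).

        Returns (centroid, unit_normal); the plane is {X : n . (X - c) = 0}.
        """
        c = points3d.mean(axis=0)
        _, _, Vt = np.linalg.svd(points3d - c)
        n = Vt[-1]                          # direction of least variance
        return c, n / np.linalg.norm(n)

    def estimate_facet_plane(points3d, projections2d, traced_outline):
        """Fit a plane to the tracked points whose image projections fall inside
        the user's traced outline (a crude stand-in for 'nearby' points)."""
        inside = Path(traced_outline).contains_points(projections2d)
        return fit_plane(points3d[inside])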

Figure 3: Extrude faces to create solid models.
Faces are turned into solid models by extruding them along a line in three dimensions, as in Figure 3. Change to a new view so you can see the front of the facade and the side of the model. With the extrude tool, click on the facade [1] and drag the cursor until the new face is aligned with the back of the model. Again, the model is automatically “snapped” to image edges, so you do not need to be pixel-perfect!
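Geometrically, extrusion duplicates the traced face, offsets the copy along the extrusion direction, and joins the two with quadrilateral side faces. The sketch below shows that construction in isolation; it is a generic illustration, not VideoTrace's implementation.

    import numpy as np

    def extrude(face_vertices, direction):
        """Extrude a planar 3D polygon along a vector, returning (vertices, faces).

        face_vertices: Nx3 array of the traced face's vertices, in boundary order.
        direction:     3-vector offset, e.g. the facet normal scaled by the extrusion depth.
        """
        front = np.asarray(face_vertices, dtype=float)
        back = front + np.asarray(direction, dtype=float)
        vertices = np.vstack([front, back])
        n = len(front)

        faces = [list(range(n)),                     # the original front face
                 list(range(2 * n - 1, n - 1, -1))]  # back face, reversed to flip its orientation
        for i in range(n):                           # quadrilateral side faces
            j = (i + 1) % n
            faces.append([i, j, n + j, n + i])
        return vertices, faces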

Figure 4: Create an initial facet representing the steps, and then refine the steps after checking their 3D shape.
Using the same process described earlier, trace the side of the steps with the pencil tool. You can simply trace a triangle to represent the entire staircase rather than modelling the individual steps, as shown in Figure 4(left).
Once the initial plane has been anchored in three dimensions (again, use other viewpoints to check it), it can be refined to better represent the steps, as in Figure 4(right). Using the refine tool, drag the mouse to form the line segment from point [1] to [2], and then to [3]. Repeat this process until the steps are closed at point [7].
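The refined outline is simply a zig-zag polyline of step corners in place of the original sloped edge. As a purely geometric illustration (not part of VideoTrace), the sketch below generates such a profile from the two 2D endpoints of the sloped edge and a step count.

    import numpy as np

    def staircase_profile(top, bottom, n_steps):
        """Replace a sloped 2D edge from `top` to `bottom` with the corners of n_steps steps."""
        top = np.asarray(top, dtype=float)
        bottom = np.asarray(bottom, dtype=float)
        tread = np.array([bottom[0] - top[0], 0.0]) / n_steps   # horizontal run of one step
        riser = np.array([0.0, bottom[1] - top[1]]) / n_steps   # vertical drop of one step
        corners = [top]
        for _ in range(n_steps):
            corners.append(corners[-1] + tread)   # step out along the tread...
            corners.append(corners[-1] + riser)   # ...then down the riser
        return corners                            # ends exactly at `bottom`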

Figure 5: Extrude the steps, and view the created model. The model can also be viewed and exported with a texture map derived from the video.
Using the extrude tool, sweep the steps by clicking on the shape [1] and dragging the reference point until the back-faces are aligned [2] (Figure 5(left)).
The model that has been created so far can be viewed either in wireframe (Figure 5(right)) or texture mapped. Also shown is the estimated camera path.
This model can be incrementally adjusted and added to until it meets the requirements of the user. After this, it can be exported using standard file formats such as VRML and 3DS so that it can be used in other 3D modelling packages.
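As an illustration of how simple such an export can be, the sketch below writes a mesh as a minimal, geometry-only VRML 2.0 IndexedFaceSet. It is a bare-bones example, not VideoTrace's exporter, which also handles texture mapping.

    def write_vrml(path, vertices, faces):
        """Write a mesh as a minimal, geometry-only VRML 2.0 IndexedFaceSet."""
        with open(path, "w") as f:
            f.write("#VRML V2.0 utf8\n")
            f.write("Shape {\n  geometry IndexedFaceSet {\n")
            f.write("    coord Coordinate {\n      point [\n")
            for x, y, z in vertices:
                f.write(f"        {x} {y} {z},\n")
            f.write("      ]\n    }\n")
            f.write("    coordIndex [\n")
            for face in faces:
                # Each face is a list of vertex indices, terminated by -1.
                f.write("      " + ", ".join(str(i) for i in face) + ", -1,\n")
            f.write("    ]\n  }\n}\n")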

Figure 6: Jeep model created in VideoTrace.
Creating such a model is only a few minutes’ work, even for someone with no experience in 3D modelling. We have used the system to create more complex models such as cars (e.g. Figure 6), buildings and furniture, and find that after becoming accustomed to the software these objects can also be modelled in approximately 10-20 minutes. More information is available in [8].
Applications and the future
For a number of reasons, we predict the use of image based modelling will grow:
- Ease of capturing high quality images or video of a subject. These days, cameras are ubiquitous as stand-alone items or in phones or PDAs.
- Advances in computer vision relieve the user from the details of fitting the model to the images, and from having to know anything about the camera.
- Increased appetite for 3D models for use in diverse application areas such as visualisation, effects, simulations, walkthroughs and digital archiving.
- Image based modelling packages are increasingly gaining acceptance in the technologically savvy realm of visual effects production (see ILM), where they are very useful for object removal or insertion in video.

However, a number of other application areas stand to benefit from such a package, including:
- Architecture firms are increasingly reliant on 3D models to create walkthroughs of existing buildings or proposed projects. Currently, models are created based on measurements and images taken on site. With image based modelling, models can be created from video alone.
- Virtual tourism and digital archiving. 3D models are increasingly being created of entire cities, or of heritage sites. These can then be used to take virtual tours or simply to record culturally important sites or objects for posterity.
- Games and Second Life. Video games are offering increasing levels of user customisation. A logical next step is to allow the gamer to capture their car, or house, or themself, on video and then upload it to the game as a 3D model.
- Medical visualisation and modelling. Many medical imaging methods do not directly capture 3D anatomical structure, which can be crucial in diagnosis and analysis. Image based modelling can convert video from a medical scan into 3D information.
- Engineering. 3D models are used in engineering for a variety of purposes from simulation to training to reverse engineering.
In future, we expect the range of applications and capabilities of image based modelling systems to broaden significantly. For example, we are currently investigating ways of combining high resolution image data with user interaction to accurately model fine surface detail in three dimensions. We also expect that the capture of material properties beyond a single texture is possible by reasoning about the lighting conditions in an image.
3.1 Summary
Image based modelling is of course not applicable in every situation, but we believe that it is currently under-utilised. This is partly because it lies at the intersection of computer vision, graphics and photogrammetry, and requires some input from each of these areas. However once convincing demonstrations of its potential are available we expect to see much more work in this exciting area.
About the Authors:

Anton van den Hengel
is the Director of the Australian Centre for Visual Technologies, a Director of PunchCard Visual Technologies Pty Ltd, and an Associate Professor in Computer Vision at the University of Adelaide, South Australia. Dr van den Hengel's primary research interests are in interactive 3D modelling from image sets and large-scale video surveillance.

Anthony Dick
is the Head of the Interactive 3D Modelling Program within the Australian Centre for Visual Technologies and a Lecturer at the University of Adelaide, South Australia. Dr Dick's research interests include interactive 3D modelling from image sets and Computer Vision for Computer Graphics.
References
[1] Blender Foundation. http://www.blender.org.
[2] Eos Systems. Photomodeler: a commercial photogrammetry product. http://www.photomodeler.com, 2005.
[3] M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch. Visual modeling with a hand-held camera. Int. Journal Computer Vision, 59(3):207–232, 2004.
[4] C.J. Taylor, P.E. Debevec, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In ACM SIGGRAPH, pages 11–20, 1996.
[5] The Pixel Farm. PFTrack: a commercial camera tracking and image based modelling product. http://www.thepixelfarm.co.uk.
[6] E. H. Thomson. A rational algebraic formulation of the problem of relative orientation. Photogrammetric Record, 14(3):152–159, 1959.
[7] Thorsten Thormählen and Hellward Broszio. Voodoo camera tracker. Free download at www.digilab.uni-hannover.de.
[8] A. van den Hengel, A. Dick, T. Thormählen, B. Ward, and P. Torr. VideoTrace: Rapid interactive scene modelling from video. In SIGGRAPH 2007, pages 86–90, 2007.