# Treatment of scattered data

The data from the rainfall stations are typical of observational data that are scattered at irregular locations in two or three dimensions (i.e., data with no notion of connectivity or topology). Figure 2 shows a straightforward discrete realization of such data: a scatter plot of the spatial distribution. Figure 3 illustrates the temporal distribution for a single station.

Figure 2. Spatial Distribution of Peruvian Rainfall Measurements on January 26, 1983.

Figure 3. Time History Plot of Rainfall Measurements at Chulucanas, Peru with January 26, 1983 Highlighted.

The simplest and quickest approach is to create a regular grid from the point data by nearest neighbor meshing: find the point nearest to each cell in the resultant grid and assign that point's value to the cell, as illustrated in Figure 4. Such a technique is valuable because it preserves the original data values and their distribution on the grid, even after a coordinate transformation may have been applied to the collection of points. Although it is computationally inexpensive, the results may not be well suited to qualitative display because the discrete spatial structure is preserved.

Figure 4. Nearest Neighbor Gridding.
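The nearest neighbor approach can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for Figure 4: the function name `nearest_neighbor_grid`, the `(x, y, value)` tuple layout, and the sample station values are all assumptions made for the example, and distances are taken as planar Euclidean.

```python
import math

def nearest_neighbor_grid(points, x0, y0, dx, dy, nx, ny):
    """Assign each grid cell the value of the closest scattered point.

    points: list of (x, y, value) tuples (hypothetical layout).
    (x0, y0) is the location of the first cell; dx, dy are the spacings.
    Returns a list of ny rows, each holding nx values.
    """
    grid = []
    for j in range(ny):
        yc = y0 + j * dy
        row = []
        for i in range(nx):
            xc = x0 + i * dx
            # Brute-force search over all points; a spatial index
            # would be preferable for large data sets.
            nearest = min(points, key=lambda p: math.hypot(p[0] - xc, p[1] - yc))
            row.append(nearest[2])
        grid.append(row)
    return grid

# Made-up sample data: three stations with rainfall values.
stations = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0)]
grid = nearest_neighbor_grid(stations, 0.0, 0.0, 0.4, 0.4, 3, 3)
```

Every value in `grid` is one of the original station values, which is the sense in which the method preserves the data; the blocky regions that result are the discrete structure noted above.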

Although such techniques preserve the fidelity of the data, they fail to impart qualitative information about the spatial characteristics of the measurements or the phenomena of which they represent discrete samples. Thus, the application of continuous realization techniques (e.g., surface deformation or contouring for two-dimensional data, volume rendering or surface extraction for three-dimensional data) is necessary. This requires an intermediate step: defining a topological relationship between the data locations to form a mesh structure. Conventional continuous realization techniques can then be applied to such a mesh. There is a long history of mathematical methods to create such meshes. Each method changes the data, and its artifacts must be understood because they will carry through to the actual visualization. This discussion is only meant as a very brief introduction to the topic. Nielson [8] summarizes many of the methods in use today and their relative advantages and disadvantages.

An alternate approach that preserves the original data values involves imposing an unstructured grid dependent on the distribution of the scattered points. In two dimensions, this would be a method for triangulating a set of scattered points in a plane [2]. This technique first requires the Voronoi tessellation of the plane, with a polygonal tile surrounding each of the scattered points. The tiles are such that every location within a particular tile is closer to the scattered point associated with that tile than to any other point in the set. A triangulation can then be constructed as the dual of the Voronoi tessellation (i.e., by connecting with a line every pair of points whose tiles share an edge). This is known as Delaunay triangulation and is illustrated in Figure 5 as applied to the rainfall stations.

Figure 5. Delaunay Triangulation of Rainfall Stations.
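A Delaunay triangulation can also be characterized without building the Voronoi tiles explicitly: a triangle belongs to it when no other point of the set lies inside the triangle's circumcircle. The sketch below uses that property in a brute-force form. It is an illustration only, not the method of [2]; the function names are hypothetical, the sample coordinates are made up, and the cost grows far too quickly for anything but a handful of points.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay_brute_force(points):
    """Return index triples whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue  # skip degenerate (collinear) triples
        ux, uy, r2 = circle
        # Empty-circumcircle test against every remaining point.
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

# Made-up sample: three outer points and one interior point.
sample = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
triangles = delaunay_brute_force(sample)
```

For this sample the large outer triangle is rejected because the interior point falls inside its circumcircle, leaving the three triangles that share the interior point. Sets with four or more cocircular points are not handled specially here and may yield overlapping triangles.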

For a relatively random distribution of a small number of points such as these rainfall data, the application of continuous realization techniques to the triangulated mesh does not yield useful qualitative results. Consider Figure 6, in which the mesh from Figure 5 is pseudo-colored by amount of rainfall. The rendering process applies bilinear interpolation to the value at each node to determine the color of each pixel in the image. Although the original data are preserved, the sparseness of the points results in a pseudo-colored distribution that is difficult to interpret.

Figure 6. Pseudo-Colored Rainfall Distribution from Delaunay Triangulation.
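To make the per-pixel interpolation concrete, the sketch below interpolates three node values across a single triangle using barycentric weights, which is one common way to do it. Whether the renderer behind Figure 6 works exactly this way is not stated in the text, so this is an assumption; the function name and the sample values are hypothetical.

```python
def barycentric_interpolate(p, triangle, values):
    """Linearly interpolate three node values at location p.

    triangle: three (x, y) vertices; values: the value at each vertex.
    The weights sum to 1 and are all non-negative when p lies inside.
    """
    (x1, y1), (x2, y2), (x3, y3) = triangle
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * values[0] + w2 * values[1] + w3 * values[2]

# Made-up triangle with rainfall values at its three nodes.
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rain = [10.0, 20.0, 30.0]
at_node = barycentric_interpolate((1.0, 0.0), tri, rain)
inside = barycentric_interpolate((0.25, 0.25), tri, rain)
```

The node values are reproduced exactly at the vertices, which is why the original data are preserved. With few, widely spaced stations, each pixel is influenced only by the three nodes of its enclosing triangle, so long thin triangles produce color gradients that are hard to read.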

A potentially more appropriate method, and certainly one that is more accurate than nearest neighbor meshing, uses weighted averaging as illustrated in Figure 7. For any given cell in a grid, the value assigned is the weighted average of the $n$ values in the original data distribution that are spatially nearest to that cell. A weighting factor, $w_i = f(d_i)$, where $d_i$ is the distance between the cell and the $i$th ($i = 1, \ldots, n$) point in the original distribution, is applied to each of the $n$ values. Figure 7 illustrates the case where $n = 3$. A common weight is $w = d^{-2}$. These are variants of Shepard's method [15]. For example, Renka [14] modified this approach with local adaptive surface fitting. Collectively, these methods are typically $O(n \log n)$ in cost. Using linear instead of weighted averaging would be intermediate in quality and computational expense.

Figure 7. Weighted Average Gridding.
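The weighted averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of [14] or [15]: the function names, the `(x, y, value)` tuple layout, and the sample data are hypothetical, distances are planar Euclidean, and the nearest points are found by a brute-force sort rather than a spatial index.

```python
import math

def idw_value(xc, yc, points, n=3, power=2.0):
    """Inverse-distance weighted average of the n points nearest (xc, yc).

    points: list of (x, y, value) tuples (hypothetical layout).
    power=2.0 gives the common weight w = d**-2.
    """
    by_distance = sorted(
        ((math.hypot(x - xc, y - yc), v) for x, y, v in points),
        key=lambda t: t[0],
    )
    nearest = by_distance[:n]
    # A cell that coincides with a data point takes that point's value,
    # which avoids dividing by zero.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [d ** -power for d, _ in nearest]
    total = sum(w * v for w, (_, v) in zip(weights, nearest))
    return total / sum(weights)

def idw_grid(points, x0, y0, dx, dy, nx, ny, n=3, power=2.0):
    """Grid the scattered points onto an nx-by-ny regular mesh."""
    return [
        [idw_value(x0 + i * dx, y0 + j * dy, points, n, power) for i in range(nx)]
        for j in range(ny)
    ]

# Made-up sample data: four stations with rainfall values.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0), (5.0, 5.0, 100.0)]
midway = idw_value(1.0, 0.0, stations, n=2)
grid = idw_grid(stations, 0.0, 0.0, 1.0, 1.0, 3, 3, n=3)
```

Because each cell is a convex combination of nearby values, the result is always bounded by the smallest and largest of the `n` values used; away from the stations the gridded field is therefore smoother than the original measurements.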

All methods in this class introduce aliasing or smoothing of the data to achieve a gridded structure. The form of the interpolation may also impart artifacts on the results, depending on the spatial variability of the original data relative to how closely the interpolant function is able to model that structure. Given a goal of qualitative visualization, such artifacts may be acceptable. Figure 8 shows a regular mesh with a spacing of 0.04 degree (of latitude and longitude), onto which the rainfall data for January 26, 1983 have been gridded using $d^{-2}$ weighting for $n = 5$. The mesh and the data locations have been similarly pseudo-colored. There is good, but not perfect, correspondence between the original data values and those of the interpolated grid, which is sufficient for qualitative realization. From this grid, isocontour lines of constant rainfall every 25 mm and a pseudo-colored image are created, as shown in Figures 9 and 10, respectively.

Figure 8. Pseudo-Colored Mesh from Weighted Averaging and Rainfall Stations.

Figure 9. Isocontour Lines of Rainfall from Weighted Average Gridding of Stations.

Figure 10. Pseudo-Colored Rainfall from Weighted Average Gridding of Stations.