IEEE Visualization 2000 Tutorial #3 Notes

The Convergence of Scientific Visualization
Methods with the World Wide Web

III. Incorporating Scientific Visualization into Multi-User and Internet-based Decision Support Tools

Theresa Marie Rhyne
Lockheed Martin Technical Services
US EPA Scientific Visualization Center
Email: rhyne.theresa@epa.gov

Introduction

There are many ways to consider scientific visualization. One interesting approach is to define it as computationally intense visual thinking. For centuries, scientists and engineers used illustration and drafting to visually depict their ideas. Collaborative discussion was facilitated by drawing on paper while postal mail allowed for sharing among remote sites. Later, telephone and television technology improved multi-user discussions and decision making. Computers, the Internet, and the World Wide Web (Web) changed our viewpoints on what is possible with regard to visualizing and sharing complex scientific data. New computer graphics algorithms and interactive techniques provide us with the ability to computationally generate images that look like hand-drawn and paintbrush scientific visualizations. Evolving computer graphics techniques are used to assist with multi-user interactions in three-dimensional (3D) virtual environments and to draw on huge "electronic" displays. With the Web, the general public is now able to search the Internet for information about scientific endeavors, and decision makers can evaluate streaming visualizations in customized "chat" rooms.

Reference: Theresa Marie Rhyne, "Scientific Visualization in the Next Millennium", Special Issue on Vision 2000 of IEEE Computer Graphics and Applications, IEEE Computer Society, Vol. 20, No. 1, (January - February 2000), 20 - 21.

Figure # III-1: Paintbrush Visualization Techniques developed by researchers at Brown University. See: "Visualizing Multivalued Data from 2D Incompressible Flows Using Concepts from Painting", R. Michael Kirby, H. Marmanis, and David H. Laidlaw, IEEE Visualization 1999 Conference Proceedings, October 1999. Image shown courtesy of David Laidlaw.

This section of the IEEE Visualization 2000 tutorial notes on "The Convergence of Scientific Visualization Methods with the World Wide Web" highlights methods for incorporating scientific visualization into multi-user and Internet-based decision support tools. There are seven major sections: 1) Joining Visualization with 3 Major Tasks of Insight; 2) Customizing Visualization Tools for Decision Making and Presentation; 3) Collaborative Visualization with Multicasting and Webcasting; 4) Streaming Visualizations over the Internet; 5) Scientific Visualization, Knowledge Discovery and Intelligent Searching; 6) Web & Intelligent Agent Assistance for Scientific Visualization; and 7) Web-based Scientific Visualization by the General Public.

III-I. Joining Visualization with the 3 Major Tasks of Insight

In scientific and other areas of investigation, tasks can be grouped into three arenas of insight: (a) analysis and exploration; (b) decision support; and (c) presentation. Below, we highlight each of these phases and use a visualization example to show how the illustration or animation might differ for each task. The visualization example is of sediment concentrations in Lake Erie resulting from a large storm. The same visualization toolkit was used to create animations for the three major tasks reviewed.

III-I. a) Analysis and Exploration

The analysis and exploration task is usually performed by the investigators on the project. Here, visualization techniques and tools are used to explore scientific data sets. Some of the visualization challenges faced in the analysis and exploration phase include determining appropriate data models to explore the problem under consideration; developing feature extraction algorithms for knowledge discovery of extremely large data sets; providing computational support for real time display and interactivity with the scientific information; and facilitating shared cyberspace explorations by multiple scientists located at geographically distant sites. These are complex challenges, currently under investigation by the scientific visualization community, that will continue into the immediate future.

Scientific Visualization tools that allow for interactivity and real time exploration of data on workstations or personal computers are optimally suited for the analysis and exploration tasks. The Flow Analysis Software Toolkit (FAST) (http://science.nas.nasa.gov/Software/FAST/), developed at NASA Ames, and Vis5D (http://www.ssec.wisc.edu/~billh/vis5d.html), developed at the University of Wisconsin at Madison, are examples of turnkey visualization tools built for these tasks. Modular Visualization Environments (MVEs) like the Application Visualization System (http://www.avs.com), Open Data Explorer (http://www.research.ibm.com/dx/), Iris Explorer (http://extweb.nag.com/Welcome_IEC.html), and the Visualization Toolkit (vtk) (http://www.kitware.com/vtk.html) also support interactive data visualization efforts. Many visualization tool builders are also applying Java and Web-based technology to create internetworked visualization tools for analysis and exploration.

Figure # III-2: Example of an Analysis and Exploration visualization of Sediment Concentrations in Lake Erie resulting from a large storm - a collaboration between researchers at the University of California at Santa Barbara, the United States Environmental Protection Agency's (US EPA) Large Lakes Research Station in Grosse Ile, Michigan, and the US EPA Scientific Visualization Center in Research Triangle Park, North Carolina. This visualization was created with the Flow Analysis Software Toolkit (FAST). See: (http://www.epa.gov/vislab/)

III-I. b) Decision Support

The decision support task frequently occurs after the analysis and exploration phase. With decision support, investigators are putting forward their results from the analysis and exploration phase to obtain viewpoints on how to proceed or make recommendations pertaining to their findings. The original exploration visualization is often modified and visualization software is customized for the decision support task.

Modular Visualization Environments (MVEs) like the Application Visualization System, Open Data Explorer, and Iris Explorer can be customized with user interfaces to effectively support visualization decision support tasks. Recently, the Web has become a powerful tool for videoconferencing and visual display during policy making activities. Multicasting on the Internet allows for computer-based videoconferencing for decision making activities. Static visualization images in JPEG or GIF format can be incorporated into Hypertext Markup Language (HTML) documents for viewing on the Web. Scientific animations in MPEG, QuickTime movie, or streaming media formats allow for Web viewing. Web 3D technologies, like the Virtual Reality Modeling Language (VRML) or the emerging Extensible 3D specification (X3D), facilitate the interactive examination of decision support visualizations. VRML is frequently incorporated as an export file format for visualizations created with MVEs or other visualization toolkits. This allows decision makers to share these visualizations from a VRML-enabled Web browser.

Figure # III-3: Example of a Decision Support visualization of the sediment concentrations in Lake Erie resulting from a large storm. Here the various components of the computational model output (wind velocity, sediment concentrations, erosion & deposition, and depth) are depicted as individual layers. This visualization was created with the Flow Analysis Software Toolkit (FAST).

III-I. c) Presentation

The presentation task is the stage where those removed from the exploration or decision making tasks have the opportunity to view the scientific results. This is a general public or scientific education phase. As a result, the visualization is again altered to provide clarity and insight to those removed from the scientific investigation. Web techniques are used to deliver this content to the general public. Streaming media technology can aid in the wide distribution of scientific animations. One challenge associated with the presentation phase is compressing streaming video effectively enough to allow timely delivery while preserving the scientific integrity of the visualization. Web3D technologies are also used to facilitate interactive viewing of 3D scientific visualizations. As Web-based education and computer games increase as home activities, there will be a need to design innovative user interfaces and intelligent agents to assist in the science education process.

High-end computer modeling techniques and polished video or "movie" production techniques are often used in this phase. Examples of high-end animation tools include SoftImage from Avid Technology Inc., (http://www.softimage.com/), and Maya from Alias/WaveFront (a Silicon Graphics Company), (http://www.aw.sgi.com/). These are the same type of animation systems used for computer graphics special effects in the motion picture industry. The example shown below was created with FAST. In many circumstances, it is also possible to create high quality "television broadcast" visualizations with turnkey visualization and MVE systems. Internetworked 3D graphics is opening up the possibility to deliver interactive 3D virtual experiences of scientific phenomena. It is also possible to imagine general users creating their own scientific visualizations from publicly accessible data sets.

Figure # III-4: Example of a Presentation visualization of the sediment concentrations in Lake Erie resulting from a large storm. Here arrows depict the wind direction. This image appears in the Federal High Performance & Communications: Toward a National Information Infrastructure (1994) publication. (Customized tube code for wind vector display written by Mark Bolstad.) This visualization was created with the Flow Analysis Software Toolkit (FAST).

III-I. d) The Role of Renaissance Teams

The development and usage of tools that support these three classes of visualization usually involves collaborative efforts among scientists, policy analysts, artists, programmers and other expert staff. This is often referred to as a Renaissance Team.

The Renaissance Team concept was first defined by Donna Cox and her colleagues with regard to projects at the National Center for Supercomputing Applications (NCSA). These notions also apply to scientific projects at other research centers (like NASA and the U.S. Geological Survey) and even to large scale movie projects in the entertainment industry (such as work done at Disney Feature Animation or Pixar). Collaborative computing attempts to support the Renaissance Team concept via the Internet, allowing for geographically dispersed scientific teams to share computational resources and visualizations. Figure #5 shows a working session between two scientific groups located at geographically remote sites. The application, CSpray, was developed at the University of California at Santa Cruz.

Reference: Donna Cox, "Renaissance Teams and Scientific Visualization: A Convergence of Art and Science", Collaboration in Computer Graphics Education Course #29, (ACM/SIGGRAPH, July 1988) pp. 81 - 104.

Figure # III-5: Example session from CSpray, a collaborative visualization tool developed at the University of California at Santa Cruz. Reference: Alex Pang and Craig Wittenbrink, Collaborative 3D Visualization with CSpray, IEEE Computer Graphics and Applications, Vol. 17, No. 2 (March-April 1997), 32 - 41. See: (http://www.cse.ucsc.edu/research/slvg/envis.html). Image shown courtesy of Alex Pang and Craig Wittenbrink.

III-II. Customizing Visualization Tools for Decision Making and Presentation

Although standard visualization software is effective in developing initial displays of physical and natural sciences data, some customization is usually required to support both decision making and presentation. Customizing software encompasses the development of user interfaces that support collaborative computing and easy access to integrated decision support tools. Some of these issues are highlighted below.

III-II. a) Spatial Context

There are several factors that influence the visual representation of physical and natural sciences data. These include: type of data, relationships among different components of a data set, placement of data in a spatial and temporal context and interpretation of the data.

Frequently, earth sciences data is geographically registered. As a result, a map of the geographic domain is a helpful visual aid to provide spatial context for the data. Advanced principles of cartography are often applied to develop more sophisticated projections for mapping coordinate systems. The International Cartographic Association's (ICA) Commission on Visualization, the GeoVRML Working Group of the Web3D Consortium, and the Association for Computing Machinery's Special Interest Group on Graphics (ACM - SIGGRAPH) are collaborating on methods to integrate Cartographic data and Geographic Information Systems with Scientific Visualization and Web-based Exploration tools. More information on this effort, entitled "The Carto Project", can be found at the following web site: (http://www.siggraph.org/~rhyne/carto).

Spatial context is also important for examining other types of physical and natural data sets. In the realm of computational chemistry, merging a molecular visualization with a traditional line drawing diagram of the molecule's structure establishes a base line for decision making. In examining air flow in and around buildings, developing a three-dimensional display of the building is helpful. The level of detail depicted in the three-dimensional characteristics of the building depends on the granularity of the computational model of air flow. If the computational model is attempting to examine general air flow patterns around a building, only a simple cubic representation may be required. However, if the computational model is examining particle tracing associated with air flow inside the building, a very detailed architectural rendering of the interior of the building might be desired. The challenge for the detailed architectural rendering approach might involve merging Computer Aided Design (CAD) systems with Scientific Visualization tools.

III-II. b) Simple Visual Cues

For centuries, the field of cartography has realized the importance of providing legends and symbol keys to assist with visual cues on geographic maps. Color bars and time clocks are also important visual cues for any scientific visualization or animation.

In the environmental sciences, air and water quality computational models often examine multiple pollutants for a given scenario. The data sets from these kinds of computational runs include multiple chemical species, examined across multiple atmospheric or water layers, for episodes lasting 100 or more time steps for a given geographic domain. Visualization tools that support labeling and titling functions along with time clocks and counters are helpful here.

III-II. c) User Interface Design - Converging with the Web

In developing visualization tools for scientists and policy analysts, it cannot be expected that all or even the majority of decision makers will learn visual programming. As a result, customized user interfaces to pre-established visual programming modules or scripts are often developed. Designing effective user interfaces also allows for access to data visualization by the general public. The emergence of the Web as a common information distribution medium has resulted in the inclusion of Java, VRML, streaming media and other Web techniques into scientific visualization tool development.

Figure # III-6: User Interface to a prototype Terrain Modeling Web Page. The user can obtain Digital Elevation Models (DEMs) for prescribed geographic regions with this system. Using Java and JavaScript, VRML files of the specified DEM are created. With a VRML Web browser, the user can then interactively explore the specified terrain model. See: Case Study #1 of these course notes and (http://www.epa.gov/gisvis/).

III-III. Collaborative Visualization with Multicasting and Webcasting

The Multicast Backbone (MBone) on the Internet provides scientists, decision makers and perhaps the general public with access to video-conferencing, audio and multiplayer capabilities from their appropriately configured desktop or mobile computers. These multi-media tools allow scientists and decision makers located at geographically remote sites to interact in real time and share visual information.

Reference: Michael R. Macedonia and Donald P. Brutzman, MBone Provides Audio and Video Across the Internet, Computer, IEEE Computer Society, Vol. 27, No. 4, (April 1994), 30 - 36.

MBone software tools are open source and in the public domain. An original suite of tools was developed for Unix workstations by researchers at Lawrence Berkeley National Laboratory. The Microsoft Bay Area Research Center has subsequently developed freeware MBone tools for Windows 95, 98 & NT platforms. Multicasting videoconferencing applications for Linux have also evolved. We note these respective web sites below:

Lawrence Berkeley National Laboratory - MBone Info.
(http://www-itg.lbl.gov/~clarsen/vconf/vconf-faq.html)

Microsoft Research - MBone Info for Windows 95, 98 & NT
(http://www.research.microsoft.com/research/BARC/mbone/).

Linux Multicast Information by Mike Esler
(http://www.cs.washington.edu/homes/esler/multicast/#apps)

Merit Network Inc. - MBone Software Archives
(http://www.merit.edu/~mbone/index/titles.html).

In addition to MBone tools, Webcasting videoconferencing is also increasing in popularity. Webcasting facilitates real time videoconferencing but may use either unicasting or multicasting networking technology. Unicasting is where a separate connection is set up between a server and each of its clients. Multicasting allows a server to transmit only a single stream of data, regardless of how many clients might request it. Efforts are also underway to establish networking protocols for access to the Internet from digital mobile phones, pagers, personal digital assistants, and other wireless terminals. The Wireless Application Protocol (WAP) is becoming a de facto standard for these purposes: (http://www.wapforum.org/who/index.html).
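
The difference between unicasting and multicasting can be illustrated with a little arithmetic on the sender's outbound bandwidth. The following is a minimal sketch only: the function name, the 128 kbps stream rate, and the audience sizes are hypothetical, and protocol overhead and network topology are ignored.

```python
def server_bandwidth_kbps(stream_kbps: float, clients: int, multicast: bool) -> float:
    """Outbound bandwidth (kbps) the sender needs to reach `clients` receivers.

    Simplifying assumptions: every client receives the same stream rate and
    packet/protocol overhead is ignored.
    """
    if stream_kbps < 0 or clients < 0:
        raise ValueError("stream rate and client count must be non-negative")
    if clients == 0:
        return 0.0
    # Unicast: one separate copy of the stream per client.
    # Multicast: a single stream, however many clients subscribe.
    return float(stream_kbps) if multicast else float(stream_kbps * clients)


if __name__ == "__main__":
    rate = 128  # hypothetical videoconference stream rate in kbps
    for audience in (1, 10, 100):
        uni = server_bandwidth_kbps(rate, audience, multicast=False)
        multi = server_bandwidth_kbps(rate, audience, multicast=True)
        print(f"{audience:>3} clients: unicast {uni:>8.0f} kbps, multicast {multi:>4.0f} kbps")
```

Under these assumptions the unicast load grows linearly with the audience while the multicast load stays constant, which is why multicasting is attractive for large collaborative sessions.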

Figure # III-7: Example of MBone session showing application tools nv (network video), vat (visual audio tool), wb (whiteboard), and sd (session directory). This session occurred at the Monterey Bay Aquarium Research Institute. Image courtesy of Don Brutzman at the Naval Postgraduate School.

III-IV. Streaming Visualizations over the Internet

The concept of streaming media is based on the notion that it is not necessary to completely download content from the Internet before it is played. Streaming audio (e.g. MP3) is likely one of the most widely known applications of streaming media across the Web. For scientific visualization purposes, there is a desire to stream animation sequences or video/sound presentations from a streaming media server to client machines. These course notes will not go into the details of encoding and hosting streaming media. An excellent resource on these topics is the forthcoming book from Addison-Wesley "e-Video: How to Produce Internet Video as Broadband Technologies Converge" by H.P. Alesso. Alesso's Video Software Laboratory Web site is also an excellent resource.

Video Software Laboratory - Encoding & Hosting Streaming Video
(http://www.mpeg2-video.com/)

Reference: H.P. Alesso, e-Video: How to Produce Internet Video as Broadband Technologies Converge, Addison-Wesley, (to be published).

III-IV. a) The 2 De Facto Standard Formats for Streaming Video

The two most common streaming video formats for the Web are RealMedia (.rm), from Real Networks Inc., and Windows Media (.asf), from Microsoft Corporation. Both require the installation of a plug-in to your Web browser in order to "play" the streaming media. Freeware versions of both RealMedia and Windows Media are available from each parent company's respective Web site. These software tools are available for Microsoft Windows and Apple Macintosh platforms. UNIX and Linux systems are currently not supported.

RealNetworks Inc. - Streaming Media Web Site
(http://www.realnetworks.com/)

Microsoft Corporation - Windows Media Technologies
(http://www.microsoft.com/windows/windowsmedia/en/features/roadmap.asp)

Many scientific visualization sequences are generated and stored in client-server formats like MPEG-1 (.mpg) and QuickTime (.mov). For this reason, software for converting MPEG-1 and QuickTime animations to streaming formats is available from Real Networks, Microsoft or other companies. MPEG-1 and QuickTime formats require the complete downloading of the file to a client Web browser before viewing the animation sequence. In many situations, decision makers or the general public may not have the computational power on their local machines to support these downloads. Streaming video does not require the complete download of content in order to begin viewing an animation sequence. This is especially helpful for lengthy video content.
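
A rough estimate of the wait before playback begins shows why this matters. The sketch below is illustrative only: the function names, the 20 MB file size, the 28 kbps stream rate, and the 5 second buffer are all assumed values, and real players and networks behave less ideally.

```python
def download_wait_s(file_mb: float, link_kbps: float) -> float:
    """Seconds before playback when the whole file must arrive first.

    Uses decimal units (1 MB = 8000 kilobits) and ignores protocol overhead.
    """
    if link_kbps <= 0:
        raise ValueError("link speed must be positive")
    return file_mb * 8000.0 / link_kbps


def streaming_wait_s(buffer_s: float, stream_kbps: float, link_kbps: float) -> float:
    """Seconds needed to fill an initial playback buffer of `buffer_s` seconds.

    Assumes the encoded stream rate does not exceed the link speed, so that
    playback can keep up once it has started.
    """
    if link_kbps <= 0:
        raise ValueError("link speed must be positive")
    if stream_kbps > link_kbps:
        raise ValueError("stream rate exceeds link speed; playback would stall")
    return buffer_s * stream_kbps / link_kbps


if __name__ == "__main__":
    link = 56.0  # nominal modem speed in kbps
    print(f"full download of a 20 MB file: {download_wait_s(20, link):.0f} s")
    print(f"streaming, 5 s buffer at 28 kbps: {streaming_wait_s(5, 28, link):.1f} s")
```

The trade-off, under these assumptions, is that the stream must be encoded at a rate the viewer's connection can sustain, which is where the compression challenge noted earlier arises.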

MPEG Web site
(http://drogo.cselt.stet.it/mpeg/)

Apple Computer, Inc. - QuickTime Web Site
(http://www.apple.com/quicktime/)

The National Center for Supercomputing Applications (NCSA) was one of the first scientific centers to provide a streaming media server for viewing their scientific visualization sequences. NCSA has developed a comprehensive streaming media and Web-casting library that is available at: (http://www.ncsa.uiuc.edu/MEDIA/vidlib/). Figure #8 shows an example streaming visualization from their library. The Binary Neutron Star Collision visualization was created by David Bock at NCSA with the simulation work performed by Doug Swesty, Alan Calder, and Ed Wang. A detailed discussion of the volume rendering and other visualization techniques that David Bock developed is referenced below. The Neutron Star animation sequence was then converted into streaming video formats and posted to the general public on the NCSA streaming media server.

Reference: David Bock's discussion on Neutron Star Visualization Efforts: (http://woodall.ncsa.uiuc.edu/dbock/Vis/NeutronStar/Summary.html)

Figure # III-8: Example image from a streaming video clip developed at the National Center for Supercomputing Applications (NCSA). This Binary Neutron Star Collision visualization image is shown courtesy of David Bock at NCSA. To view the streaming video example, you need to be on a Windows or Macintosh machine with either a RealMedia or Windows Media plug-in on your Web browser, see: (http://www.ncsa.uiuc.edu/MEDIA/vidlib/).

III-IV. b) Synchronized Multimedia Integration Language - SMIL

The Synchronized Multimedia Integration Language (SMIL) is a markup language for describing the temporal behavior, screen layout, and associated hyperlinks of a multimedia streaming presentation. SMIL is a specification based on the Extensible Markup Language (XML). XML is a technology for supporting richly structured documents over the Web. The XML specification was fostered by the World Wide Web Consortium and is emerging as a technology for next-generation Web development, see: (http://www.w3.org/TR/REC-xml). Multimedia servers need to be configured to deliver SMIL presentations. RealServer, from RealNetworks, supports SMIL files.

The SMIL Specification for the World Wide Web Consortium
(http://www.w3.org/TR/REC-smil/)

For scientific visualization purposes, SMIL allows for combining GIF images, hypermedia links, streaming text, streaming audio and streaming video into one integrated presentation. Figure #9 shows a screen capture from a multimedia composition on geographic visualization. The initial geographic visualization work is featured as Case Study #1 in these course notes and was conducted at the US EPA Scientific Visualization Center. Peter Alesso, of Video Software Laboratory, developed the SMIL code associated with this streaming "geographic visualization" composition. SMIL provides the capability to develop Web-based multimedia presentations of scientific visualizations for decision making and general public viewing.
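
To give a feel for what such a composition looks like, the following sketch uses Python's standard xml.etree.ElementTree module to generate a small SMIL 1.0 style document that plays a still image, a video clip and an audio track in parallel. It is not the SMIL code used for Figure #9; the function name, file names, region names and dimensions are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_smil(image_src: str, video_src: str, audio_src: str) -> str:
    """Return a minimal SMIL document as a string (illustrative layout only)."""
    smil = ET.Element("smil")

    # <head>/<layout>: screen layout with two side-by-side regions.
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    ET.SubElement(layout, "root-layout", width="480", height="240")
    ET.SubElement(layout, "region", id="still",
                  left="0", top="0", width="240", height="240")
    ET.SubElement(layout, "region", id="movie",
                  left="240", top="0", width="240", height="240")

    # <body>/<par>: temporal behavior; children of <par> play at the same time.
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "img", src=image_src, region="still", dur="30s")
    ET.SubElement(par, "video", src=video_src, region="movie")
    ET.SubElement(par, "audio", src=audio_src)

    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical media file names.
    print(build_smil("map.gif", "flyover.rm", "narration.rm"))
```

Replacing the par element with a seq element would play the same media items one after another rather than simultaneously.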

URL to streaming example: (http://www.mpeg2-video.com/rhyne/rhyne.ram)

Figure # III-9: Example multimedia streaming presentation created with the Synchronized Multimedia Integration Language (SMIL). This presentation on geographic visualization was adapted from the US EPA's GIS-VIS Web site at: (http://www.epa.gov/gisvis). The SMIL code was written by Peter Alesso of Video Software Laboratory, see: (http://www.mpeg2-video.com/).

III-IV. c) Web3D and Streaming 3D Objects over the Internet

The production of scientific and information visualizations involves transforming data into visual representations. The two most common approaches are: 1) conversion of mesh geometry and data directly into graphical primitives (e.g. points, lines, polygons) and 2) data sampling (e.g. volume rendering). Data model and management issues associated with these techniques are covered in earlier chapters of these course notes. Streaming video, like traditional video or movie production, does not allow for the interactive viewing and examination of the 3D characteristics of scientific visualization in real time. This means that the viewer is unable to explore closely the 3D graphics primitives or volume renderings to establish unique thoughts about visual patterns. There are various Web3D technologies that permit interactive viewing of 3D scientific visualization objects by decision makers and the general public.

III-IV. c-1) The Virtual Reality Modeling Language & Extensible 3D

The Virtual Reality Modeling Language (VRML) is an early open standard (1997 ISO Specification) for describing interactive 3D objects delivered across the Internet. Many visualization software tools provide export capabilities of their 3D objects to VRML. With an appropriate VRML plug-in installed on your Web browser, it is possible to interactively view 3D visualizations on the Web. Two limitations of VRML are: (a) the ASCII file format structure and (b) its client-server model for downloading objects. For 3D scientific visualization objects, the VRML ASCII format produces very large files that include both data and geometry elements. Dependency on the client-server model means that the VRML file must be completely downloaded to the client computer before viewing 3D objects. Together, these two limitations can easily produce the "World Wide Wait" for an internetworked 3D visualization. The next generation VRML standard, entitled Extensible 3D (X3D), is intended to address the desire for a binary file format and reliable streaming VRML capabilities. The Web3D Consortium, which guides the evolution of VRML and X3D, intends to release X3D in July 2000, at SIGGRAPH 2000.

Although VRML files can be large and thus produce the "World Wide Wait", scientists, decision makers and the general public can obtain insight in viewing 3D scientific visualizations in this format. Case Study #1 of these course notes describes the successful implementation of geographic visualizations created with VRML. Later, in this specific writeup, we describe a successful scientific policy analysis visualization created with VRML.
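
As a small illustration of VRML export, and of why the ASCII format grows quickly, the sketch below writes a grid of height values as a VRML97 ElevationGrid node. This is an assumed, simplified exporter written for these notes, not the code of any particular visualization tool; the function name, color and sample heights are hypothetical.

```python
def elevation_grid_vrml(heights, spacing=1.0):
    """Return VRML97 text for a terrain patch.

    `heights` is a list of rows (z direction), each row a list of height
    values along x. ElevationGrid expects the values row by row.
    """
    rows = len(heights)
    cols = len(heights[0]) if rows else 0
    if rows < 2 or cols < 2 or any(len(r) != cols for r in heights):
        raise ValueError("need a rectangular grid of at least 2 x 2 heights")
    values = " ".join(f"{h:g}" for row in heights for h in row)
    return (
        "#VRML V2.0 utf8\n"
        "Shape {\n"
        "  appearance Appearance { material Material { diffuseColor 0.4 0.6 0.3 } }\n"
        "  geometry ElevationGrid {\n"
        f"    xDimension {cols}\n"
        f"    zDimension {rows}\n"
        f"    xSpacing {spacing:g}\n"
        f"    zSpacing {spacing:g}\n"
        f"    height [ {values} ]\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    tiny = elevation_grid_vrml([[0, 1, 2], [1, 2, 3]])
    print(tiny)
    # Every sample is spelled out as text, so file size grows with grid size.
    big = elevation_grid_vrml([[0.5 * (i + j) for i in range(200)] for j in range(200)])
    print(f"2 x 3 grid: {len(tiny)} characters; 200 x 200 grid: {len(big)} characters")
```

Saving the returned text to a file with a .wrl extension should allow it to be opened in a VRML97 browser or plug-in.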

Reference Web3D sites for VRML & X3D:

The Web3D Consortium,
(http://www.web3d.org/)

VRML and X3D Specification Documents,
(http://www.web3d.org/fs_specifications.htm)

The Web3D/VRML Repository,
(http://www.web3d.org/vrml/vrml.htm)

Computer Associates's free VRML Browser - Cosmo Player,
(http://www.cosmosoftware.com/)

Reference: George S. Carson, Richard F. Puk and Rikk Carey, "Developing the VRML 97 International Standard", IEEE Computer Graphics and Applications, IEEE Computer Society, Vol. 19, No. 2 (March/April 1999), pp. 52 - 58.

URL to VRML example: (http://www.epa.gov/gisvis/arcview/okral4.wrl)

Figure # III-10: Snapshot of an example VRML visualization. The original geographic visualization was created with ArcView 3D Analyst and exported to the VRML97 format. This visualization was developed for the US EPA's Human Exposure in Urban Environments Project - Alan Huber, principal investigator. See: (http://www.epa.gov/gisvis/arcview/). To view actual VRML files, your Web browser needs a VRML plug-in.

III-IV. c-2) Metastreaming of 3D Objects over the Internet

Another promising technology for streaming 3D objects over the Internet is Metastream. Metastream is an open source file format that was co-developed by MetaCreations and Intel. Since the technology's development, MetaCreations has partnered with Computer Associates Inc. to create the MetaStream.com Corporation. Metastream objects stream across the Internet without a server and scale automatically to support the end-user's processing power. With the aid of smart compression technology, end-users with networking connection speeds ranging from a 56K modem to a T1 (or higher) line are able to examine content in real time. Metastream provides a free Web browser plug-in to view content from either Microsoft Windows or Apple Macintosh platforms. UNIX or Linux platforms are currently not supported. Unlike VRML or X3D, Metastream is not a modeling language. This means that a modeling tool or a visualization tool with Metastream export capabilities is required to create the 3D objects. MetaCreations' Carrara, Poser and Bryce provide export translators to the Metastream format, see: (http://www.metacreations.com/products). Since this is an emerging technology, not many scientific visualization examples that incorporate Metastream capabilities have yet been built. A novel implementation is the MetaMol Gallery of molecular modeling developed by Shawn A. Sapp at Colorado State University, (http://lamar.colostate.edu/%7Esasapp/metamol-gallery.html). Figure #11 shows a Buckminster Fullerene example.

The Metastream Web Site
(http://www.metastream.com)

Figure # III-11: Snapshot of a Metastream - streaming 3D object - molecular modeling visualization. This Buckminster Fullerene example was created by Shawn A. Sapp at Colorado State University, see: (http://lamar.colostate.edu/%7Esasapp/metamol-gallery.html). To view the Metastream example, you need to be on either a Windows or Macintosh system with the free Metastream plug-in installed with your Web browser.

Reference: Shawn A. Sapp's discussion on Buckminster Fullerene:
(http://lamar.colostate.edu/%7Esasapp/metamol-c60.html)

III-V. Scientific Visualization, Knowledge Discovery and Intelligent Searching

In recent years, there has been significant interest in applying visualization to assist with knowledge discovery and data mining techniques. Additionally, visualization researchers have examined the reverse process where intelligent feature extraction techniques are used to assist with visualizing enormous scientific data sets. We highlight some of these concepts in this section of our course notes.

III-V. a) Integrating Visualization into Data Mining Techniques

Data mining is a set of methodologies for the automated exploration of complex relationships in very large datasets. True data mining is discovery driven. This means that no a priori assumptions exist about the data. Data mining uses discovery approaches where pattern-matching and other algorithms are executed to determine key relationships in the data.

There are seven components associated with implementing a data mining activity: 1) Data Selection; 2) Data Preparation; 3) Feature Selection; 4) Model Building & Testing; 5) Results Analysis; 6) Stability Testing; and 7) Implementing Results.

  • Step #1 (Data Selection) involves determining the datasets or information for performing data mining tasks.

  • Step #2 (Data Preparation) focuses on cleaning and transforming data so that it is consistent and accurate.

  • Step #3 (Feature Selection) encompasses selecting variables in the data sets that are most correlated to the problem we want data mining to help us examine.

  • Step #4 (Model Building & Testing) includes using data mining algorithms to build and perform preliminary testing of our predictive modeling efforts.

  • Step #5 (Results Analysis) centers on analyzing the results from our preliminary testing efforts and jumping back to Step #4 to refine our predictive model.

  • Step #6 (Stability Testing) can also be called "Reality Checking". Since we have used Knowledge Discovery techniques, we want to verify that our results from Step #5 correspond to real world observations and previously known behaviors.

  • Step #7 (Implementing Results) is the final step of building a strategy for problem resolution based on the results from Steps 1 - 6.
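As a rough illustration, Steps #2 through #5 above can be sketched in a few lines of Python. Everything specific here is an assumption made for the example: the records, the field names (`speed`, `temp`, `emissions`), and the one-variable least-squares line that stands in for a real data mining algorithm. The residuals produced in Step #5 are the kind of output that a visualization would typically display.

```python
from statistics import mean

# Hypothetical records; field names and values are invented for illustration.
RAW = [
    {"speed": 20.0, "temp": 15.0, "emissions": 4.1},
    {"speed": 40.0, "temp": 18.0, "emissions": 7.9},
    {"speed": 60.0, "temp": 12.0, "emissions": 12.2},
    {"speed": 80.0, "temp": 16.0, "emissions": 15.8},
    {"speed": None, "temp": 14.0, "emissions": 9.0},  # incomplete record
    {"speed": 100.0, "temp": 17.0, "emissions": 20.1},
]


def prepare(records):
    """Step #2: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def select_feature(records, target):
    """Step #3: pick the variable most correlated with the target."""
    ys = [r[target] for r in records]
    candidates = [k for k in records[0] if k != target]
    return max(
        candidates,
        key=lambda k: abs(correlation([r[k] for r in records], ys)),
    )


def fit_line(xs, ys):
    """Step #4: least-squares fit of y = a * x + b (a stand-in model)."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx


def residuals(model, xs, ys):
    """Step #5: per-record errors, a natural input for a visual display."""
    a, b = model
    return [y - (a * x + b) for x, y in zip(xs, ys)]


clean = prepare(RAW)
feature = select_feature(clean, "emissions")
xs = [r[feature] for r in clean]
ys = [r["emissions"] for r in clean]
model = fit_line(xs, ys)
errors = residuals(model, xs, ys)
print(feature, model, max(abs(e) for e in errors))
```

Steps #1, #6 and #7 are left out because they depend on choosing data sources, comparing against real world observations, and acting on the results, none of which reduce to a short function.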

    The most immediate location for applying visualization techniques in this seven step process is during Step #5 - Results Analysis. SGI Inc. designed their MineSet data mining software to provide visualization output during this phase. More information on MineSet can be found at: (http://www.sgi.com/software/mineset/).

    Below, we show an example of how scientific visualization techniques might be useful in Step #4 - Model Building & Testing. The visualization example shown was created with Visible Decisions Inc.'s SeeIT software. For more information on SeeIT, see Visible Decisions' Web site at: (http://www.vdi.com), (note: Visible Decisions was recently acquired by Visual Insights). Here, SeeIT is used to examine results from a Mobile Emissions computational model. Applying visualization at Step #4 provides a visual way of examining preliminary tests of predictive modeling efforts. Using the Virtual Reality Modeling Language (VRML97), we were successful at converting this visualization to a Web-enabled visual display. For the specific visualization example shown, our environmental scientists were aware, a priori, of this data relationship in building their predictive model.

    Figure #III-12: Use of Visualization techniques to examine relationships between data elements in a database associated with a Mobile Emissions computational model. This visual display was created in a working session with Sue Kimbrough (Principal Scientist) of the Mobile Emissions Characterization Team at the US EPA in Research Triangle Park, North Carolina. The visualization was built with Visible Decisions Inc.'s SeeIT software at the US EPA Scientific Visualization Center.

    The continued integration of visualization techniques and data mining is an important topic of research in both the scientific and information visualization communities. As internetworked mobile computing and home appliances increase, it is reasonable to assume that visual searching and data mining techniques will be embedded into these devices.

    III-V. b) Applying Intelligence and Knowledge Discovery to Visualization

    Many visualization research centers have applied intelligent algorithms to assist with visualization activities. This is particularly true when dealing with extremely large data sets. In this section, we highlight pioneering work in "feature extraction" by researchers at NASA Ames and "data signatures" by researchers at PNNL.

    III-V. b-1) Intelligent Feature Extraction

    Researchers at NASA Ames have extensively explored feature extraction techniques for the computational fluid dynamics (CFD) domain. Feature extraction is designed to assist scientists who have already defined a description of the phenomena they wish to explore in a data set. Perhaps the scientists are looking for specific vortices or flow separation phenomena in a huge data set. Feature extraction algorithms search the entire data set for every occurrence of the desired phenomena and then produce 3D graphics primitives that are orders of magnitude smaller than the original data set. This allows for interactive viewing of the feature extracted graphics on client workstations.

    Reference: Steve Bryson, David Kenwright, Michael Cox, David Ellsworth, and Robert Haimes, "Visually Exploring Gigabyte Data Sets in Real Time", Communications of the ACM, Association for Computing Machinery, Vol. 42, No. 8, (August), 1999, pp. 82 - 90.
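To make the idea concrete, here is a minimal sketch of feature extraction on a small 2D velocity field. It is not the NASA Ames implementation: the vorticity threshold, the unit grid spacing, and the synthetic `rotating_patch` test field are all assumptions chosen for illustration. The function scans every interior cell and keeps only the cells where the vorticity magnitude exceeds the threshold, so the result is far smaller than the grid it came from.

```python
def vorticity(u, v, i, j):
    """Central-difference estimate of dv/dx - du/dy at interior cell (i, j).

    Arrays are indexed [row j][column i]; unit grid spacing is assumed.
    """
    dv_dx = (v[j][i + 1] - v[j][i - 1]) / 2.0
    du_dy = (u[j + 1][i] - u[j - 1][i]) / 2.0
    return dv_dx - du_dy


def extract_features(u, v, threshold):
    """Return (i, j, vorticity) for each interior cell above the threshold."""
    rows, cols = len(u), len(u[0])
    found = []
    for j in range(1, rows - 1):
        for i in range(1, cols - 1):
            w = vorticity(u, v, i, j)
            if abs(w) > threshold:
                found.append((i, j, w))
    return found


def rotating_patch(size, center, radius):
    """Synthetic test field: solid-body rotation inside a disc, zero outside."""
    cx, cy = center
    u = [[0.0] * size for _ in range(size)]
    v = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for i in range(size):
            if (i - cx) ** 2 + (j - cy) ** 2 <= radius ** 2:
                u[j][i] = float(-(j - cy))
                v[j][i] = float(i - cx)
    return u, v


u, v = rotating_patch(21, (10, 10), 3)
features = extract_features(u, v, threshold=1.0)
print(len(features), "feature cells out of", 21 * 21)
```

A production system would search 3D, time-varying data and emit graphics primitives such as vortex core lines rather than a list of cells, but the reduction step has the same shape.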

    III-V. b-2) Data Signatures for Large Data Sets

    In the information visualization arena, researchers at the Pacific Northwest National Laboratory (PNNL) use knowledge extraction and visual metaphors to identify patterns or unexpected occurrences of topics over time in large text document repositories. Since many interesting classes of information have no obvious physical representation, a key challenge is establishing a visual metaphor or mapping. In some cases, information visualization researchers have borrowed concepts from the physical sciences to create a visual metaphor. Figure #III-13, below, shows the use of a river metaphor to visually depict patterns or unexpected occurrences of topics over time. PNNL has begun to apply concepts gained from information visualization tasks to exploring scientific visualization problems. This has resulted in the concept of a "data signature" where the essence of a large dataset is captured in a small fraction of its original size. The data signature concept has been applied to climate modeling and data visualization problems.

    Reference: Pak Chung Wong, Harlan Foote, Ruby Leung, Dan Adams, and Jim Thomas, "Data Signature and Visualization of Scientific Data Sets", (under: Visualization Viewpoints: Theresa-Marie Rhyne and Lloyd Treinish, editors), IEEE Computer Graphics & Applications Magazine, IEEE Computer Society, Vol. 20, No. 2, (March - April 2000), 12 - 15.
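The general idea of capturing a large data set in a much smaller summary can be sketched as follows. This is only a stand-in, not the PNNL method described in the reference above: the choice of per-block mean, minimum, and maximum, and the block size, are assumptions made for this example.

```python
def signature(values, block):
    """Summarize each block of `block` values as (mean, minimum, maximum)."""
    if block <= 0:
        raise ValueError("block must be positive")
    summary = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        summary.append((sum(chunk) / len(chunk), min(chunk), max(chunk)))
    return summary


# A made-up series of 1000 samples reduced to 10 summary tuples.
series = [float(i % 50) for i in range(1000)]
sig = signature(series, block=100)
print(len(series), "values ->", len(sig), "signature entries")
```

Two data sets can then be compared or browsed through their signatures first, with the full data fetched only where the summaries look interesting.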

    Figure #III-13: Example information visualization from the ThemeRiver application developed by researchers at Pacific Northwest National Laboratory (PNNL). Similar knowledge extraction technology is currently being applied to the concept of "data signatures" at PNNL. It is interesting to note that, at PNNL, scientific visualization metaphors are applied to information visualization problems while information visualization solutions are being extended to the scientific visualization arena. Image shown courtesy of Sue Havre and Beth Hetzler.

    Reference: Susan Havre, Beth Hetzler, and Lucy Nowell, "ThemeRiver: In Search of Trends, Patterns, and Relationships", in IEEE InfoVis 1999 Hot Topics publication, (InfoVis '99), pp. 23 - 26.

    III-V. c) Expanding into Web Mining

    As the Web and electronic commerce have expanded, interest in applying knowledge discovery techniques and intelligent searching to "Web Mining" has increased. Commercial applications include using Web mining to obtain information about consumer profiles and potential markets of interest. With Java and the Extensible Markup Language (XML), Web mining techniques can also be used to explore Web-based scientific data repositories. In the next section of these course notes, we discuss intelligent agents that assist with searching and retrieving information on the Web. These intelligent agents can be configured to deliver 3D Web-enabled visualizations associated with Web Mining activities as well as to search for Web-based visual information.

    III-VI. Web & Intelligent Agent Assistance for Scientific Visualization

    A significant limitation of existing hypermedia Internet tools, like Netscape, is the inability to rapidly find and quickly recall information resources of interest on the Web. Infrequent (and general) users of distributed hypermedia systems can easily become overwhelmed by the large number of links to information resources and disoriented while navigating between the various remote file servers. Since 1995, a number of Web technology companies and research centers have begun to address this issue with intelligent agent technology.

    III-VI. a) Defining Intelligent Agents

    An agent is an automated program that examines the Internet on its operator's behalf, searching for specified information. Agents already in active use include "web crawlers" and "search engines". Using keyword-based searches, web crawlers automatically search through the Internet and index the information they find. "Metacrawlers" perform multiple searches, in parallel, across the Internet. Trainable Web agents based on neural-network software have also been built. Researchers at IBM Japan have developed the concept of an aglet. An aglet is a Java object that performs mobile agent tasks on the Internet.
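The crawl-and-index behavior described above can be sketched without any network access. In this minimal example the "Web" is a hypothetical in-memory dictionary named `PAGES` that maps a URL to its text and outgoing links; the URLs and text are invented. The crawler follows links breadth-first and builds an inverted index from keyword to pages, which a keyword search then intersects.

```python
from collections import deque

# Hypothetical stand-in for the Web: url -> (page text, outgoing links).
PAGES = {
    "/home": ("scientific visualization center", ["/ozone", "/vrml"]),
    "/ozone": ("ozone concentration visualization", ["/home"]),
    "/vrml": ("vrml models of air quality", ["/ozone", "/missing"]),
}


def crawl(start):
    """Follow links breadth-first from `start` and index every word found."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url not in PAGES:  # broken link: nothing to index
            continue
        text, links = PAGES[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, *keywords):
    """Return the pages that contain every keyword."""
    hits = [index.get(word, set()) for word in keywords]
    return set.intersection(*hits) if hits else set()


index = crawl("/home")
print(sorted(search(index, "visualization")))
print(sorted(search(index, "ozone", "visualization")))
```

A real crawler would fetch pages over HTTP, respect robot exclusion rules, and rank results; those parts are omitted here.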

    Reference Web Sites for Webcrawlers & Intelligent Agents:

    Robert Filman and Feniosky Pena-Mora, Seek, and Ye Shall Find - Arachnoid Tourist, at the IEEE Computer Society Web Site, (http://www.computer.org/internet/v2n4/w4arach.htm).

    WebCrawler, (http://webcrawler.com).

    Agent Technology Projects in the Stanford Digital Library, (http://www-diglib.stanford.edu/diglib/pub/agents.ht).

    IBM Corporation's Aglets Software Development Kit, (http://www.trl.ibm.co.jp/aglets/).

    III-VI. b) Surfing via Query by Pictorial Example

    In the digital libraries and database management domains, research efforts are underway to expand visual information retrieval (VIR) technology. VIR supports searching through image databases using the visual information contained in the image, such as color, texture, composition, and structure, rather than key words. This concept of content extraction gives a user the capability to retrieve visual information by asking a query like "Give me all pictures that look like this". The VIR system satisfies the query by comparing the content of the query picture with that of all target pictures in the database. This is called "Query by Pictorial Example", QBPE.
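One simple way to compare the content of a query picture with target pictures is by color alone. The sketch below is an assumption-laden illustration, not the method of any particular VIR system: images are plain lists of (r, g, b) pixels, colors are quantized into a coarse histogram, and similarity is measured by histogram intersection. The image names and pixel values are made up.

```python
def histogram(pixels, bins=2):
    """Normalized color histogram with `bins` levels per channel."""
    counts = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        ri = min(r // step, bins - 1)
        gi = min(g // step, bins - 1)
        bi = min(b // step, bins - 1)
        counts[ri * bins * bins + gi * bins + bi] += 1
    total = len(pixels)
    return [c / total for c in counts]


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(query_pixels, database):
    """Rank the image names in `database` by color similarity to the query."""
    qh = histogram(query_pixels)
    return sorted(
        database,
        key=lambda name: similarity(qh, histogram(database[name])),
        reverse=True,
    )


# Hypothetical four-pixel "images".
database = {
    "sky": [(10, 20, 240)] * 4,
    "forest": [(10, 200, 30)] * 4,
    "sunset": [(250, 100, 20)] * 3 + [(10, 20, 240)],
}
query = [(0, 0, 255)] * 3 + [(20, 30, 200)]
print(query_by_example(query, database))
```

Texture, composition, and structure would each need their own descriptors; color histograms are used here only because they fit in a few lines.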

    Reference Web Sites for Digital Library Research & QBPE:

    UC Berkeley Digital Library Project,
    (http://elib.cs.berkeley.edu)

    Webseek at Columbia University
    A Content-Based Image and Video Search and Catalog Tool for the Web,
    (http://www.ctr.columbia.edu/webseek/)

    III-VI. c) Automating Scientific Visualization

    In the scientific visualization arena, a number of research groups have begun to explore building intelligence into visualization software. This concept allows a researcher or policy analyst to prescribe a particular analysis task, such as comparing ozone concentrations with power plant emissions for a given air pollution computational model scenario. The software system then automatically creates an appropriate visualization. The users of these task-directed or rule-based visualization software systems specify their area of interest, describe the data parameters, and determine an analysis objective. The intelligent software tool then suggests and describes visual representations of the data that might include contour plots, isosurfaces, volume renderings, and animated vector representations.

    Reference: M.P. Baker, "The KNOWVIS Project: An Experiment in Automating Visualization", Decision Support 2001 Conference, Toronto, Ontario, September 1994.

    Rogowitz, Bernice E. and Lloyd A. Treinish, An Architecture for Perceptual Rule-Based Visualization, Proceedings IEEE Visualization '94 Conference, San Jose, California, IEEE Computer Society Press, 1994, 236 - 243.
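A rule-based suggestion step of the kind described above might look like the following sketch. The rules and the description keys (`kind`, `dims`, `time_varying`) are hypothetical and invented for this example; they are not taken from the KNOWVIS project or from the perceptual rule-based architecture cited above.

```python
# Each rule pairs a test on the data description with a suggested display.
RULES = [
    (lambda d: d["kind"] == "scalar" and d["dims"] == 2, "contour plot"),
    (lambda d: d["kind"] == "scalar" and d["dims"] == 3, "isosurface"),
    (lambda d: d["kind"] == "scalar" and d["dims"] == 3, "volume rendering"),
    (
        lambda d: d["kind"] == "vector" and d["time_varying"],
        "animated vector representation",
    ),
]


def suggest(description):
    """Return every representation whose rule matches the description."""
    return [name for matches, name in RULES if matches(description)]


# Example: a 3D scalar field such as modeled ozone concentration.
ozone = {"kind": "scalar", "dims": 3, "time_varying": False}
print(suggest(ozone))
```

Keeping the rules in a table separate from the matching code lets new guidance be added without touching the rest of the system.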

    The future development of intelligent agents and VIR databases that are incorporated into Internet browsing tools and scientific visualization software will aid in the building of comprehensive decision support systems. These task directed decision support systems will allow researchers and policy analysts to specify analysis requirements and perform data mining functions. The system will then automatically construct appropriate visualizations that are linked to information databases.

    Reference: Theresa Marie Rhyne, Scientific visualization and technology transfer: An EPA Case Study, (under Internet Kiosk: Ron Vetter, editor), Computer, Vol. 28, No.7, (July 1995), 94 - 96.

    III-VII. Web-based Scientific Visualization by the General Public

    Many scientific and government centers are realizing the importance of providing public access to scientific information, data and visualizations via the Web. As an example, the NASA Jet Propulsion Laboratory (JPL) provides easy access to publicly released images associated with its various Solar System exploration programs, (http://photojournal.jpl.nasa.gov/). Here, we step beyond the notion of providing visualizations. We anticipate that, in the future, decision makers and the general public will create internetworked streaming video and 3D content of scientific phenomena via their own Web-mining activities. Below, we highlight two efforts to create Web-enabled solutions. One is focused on policy analysis and the other on local community concerns.

    III-VII. a) Exploring Policy Analysis Visualization:

    In a recent project for the US EPA's Science Advisory Board, we explored the capabilities of developing 3D Web visualizations of policy information using VRML 97 technology. First, weighting values were entered into a spreadsheet. Next, using VR Charts, from AlterVue Systems Inc., an interactive 3D display was created from the spreadsheet data. VR Charts stores its 3D displays in VRML 97 format. This greatly facilitated Web-enabling tasks. We found VR Charts to be usable by general as well as technical users. More information on VR Charts can be found at: (http://www.vrcharts.com). Figure #III-14 shows the results of our efforts. The 3D Bar Chart depicts a conceptualization of planning objectives versus policy topics. Using hypertext technology (HTML), online Web pages of environmental policy items are easily linked to this 3D Web visualization. Since the interactive visualization is Web-enabled, it can be shared and accessed by all members of the US EPA's Science Advisory Board. We are presently exploring how to link similar 3D Web displays to metadata indexing and search engine technologies on the US EPA's internal and public access Web sites. It is our hope to use visualization techniques to aid in navigating through the many public policy repositories that encompass the US EPA's Environmental Information resources.

    Click Here to see VRML97 model of the image shown above. (your Web Browser needs a VRML plug-in to view this.)

    Figure #III-14: Example application of visualization techniques to environmental policy data. Objectives are plotted against Policy topics. This 3D Web (Virtual Reality Modeling Language - VRML97) visualization was created for Don Barnes and Vickie Richardson of the US EPA's Science Advisory Board. VR Charts, from AlterVue Systems, Inc., was used to create this visual display. These types of visualizations will be created by decision makers and the general public in the near future.

    Reference: Theresa-Marie Rhyne, "Two Stepping Information Technology with Visualization", Computer Graphics, a publication of Association for Computing Machinery's Special Interest Group on Graphics (ACM - SIGGRAPH), Feb. 2000, v. 34, no. 1, pp. 45 - 47.
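To show how spreadsheet-style weights can become a VRML97 bar chart, here is a minimal sketch that writes one `Box` per table cell. It does not reproduce the output of VR Charts; the bar width, the color, and the one-unit spacing between bars are assumptions for this example. Only standard VRML97 nodes (`Transform`, `Shape`, `Appearance`, `Material`, `Box`) are used.

```python
def bar(x, z, height, width=0.8):
    """Return a VRML97 Transform holding one box-shaped bar resting on y = 0."""
    return (
        "Transform {\n"
        f"  translation {x:g} {height / 2:g} {z:g}\n"
        "  children [\n"
        "    Shape {\n"
        "      appearance Appearance { material Material { diffuseColor 0.2 0.4 0.8 } }\n"
        f"      geometry Box {{ size {width:g} {height:g} {width:g} }}\n"
        "    }\n"
        "  ]\n"
        "}"
    )


def bar_chart(rows):
    """Build a VRML97 world from a table of weights (one list per row).

    Cells with a weight of zero or less are skipped because a Box needs
    positive dimensions.
    """
    parts = ["#VRML V2.0 utf8"]
    for z, row in enumerate(rows):
        for x, weight in enumerate(row):
            if weight > 0:
                parts.append(bar(x, z, weight))
    return "\n".join(parts) + "\n"


# Hypothetical weights: rows are objectives, columns are policy topics.
world = bar_chart([[1.0, 2.0], [0.0, 3.0]])
print(world)
```

Saving the returned string to a file with a `.wrl` extension should give a world that a VRML97 browser plug-in can load.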

    III-VII. b) Predicting Common Place Visualization:

    As internetworked computer games become a more common activity in our society, along with digital capabilities for general users to create their own content, it is not hard to imagine that private citizens will want to create their own scientific visualizations of local community concerns. Figure #III-15 was created with the Environmental Systems Research Institute's ArcView 3D Analyst desktop mapping product, (http://www.esri.com/software/arcview/extensions/3dext.html). The blue-green path depicts air pollution data associated with mobile (e.g. automobile) emissions that was collected with a Global Positioning System (GPS). Low values are dark green while high values are dark blue. This data was imported into ArcView's Table function and visualized with the 3D Analyst module.
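A dark green to dark blue color mapping like the one described above can be expressed as a linear interpolation between two RGB colors. The exact endpoint colors used in the figure are not stated, so the `DARK_GREEN` and `DARK_BLUE` values below are assumptions, as is the choice of a linear ramp.

```python
# Assumed endpoint colors as (red, green, blue) fractions in 0..1.
DARK_GREEN = (0.0, 0.4, 0.0)
DARK_BLUE = (0.0, 0.0, 0.55)


def ramp_color(value, low, high):
    """Map a data value to a color between dark green and dark blue.

    Values outside [low, high] are clamped to the nearest endpoint.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    t = min(1.0, max(0.0, (value - low) / (high - low)))
    return tuple(g + (b - g) * t for g, b in zip(DARK_GREEN, DARK_BLUE))


# Hypothetical emission readings along a GPS path.
readings = [0.0, 5.0, 10.0]
colors = [ramp_color(r, 0.0, 10.0) for r in readings]
print(colors)
```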

    The City of Raleigh, North Carolina (USA) provided us with digital data of specific local streets and buildings associated with the same regions where the GPS data was collected. In this image, the street and building information is geographically registered with the mobile emissions data. Using the 3D Analyst "Extrude" functions, we were able to add height to the buildings to create a "virtual community". Since we did not have accurate data for the building heights, we estimated these values. This visualization can easily be exported to VRML97 format, thus allowing for a Web-enabled interactive visual display. Digital geographic information for local communities is already becoming available via the Web and usage of GPS tools is increasing. So, by 2025, it is reasonable to expect young people to use similar visualization techniques for their school science projects and older private citizens to develop scientific visualizations for real time or virtual town hall meetings.
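The geometric idea behind extruding a building footprint can be sketched as follows. This is not the ArcView 3D Analyst function itself; it is a minimal illustration that assumes a simple polygon footprint listed counter-clockwise and a single estimated height per building.

```python
def extrude(footprint, height):
    """Turn a 2D footprint into a prism.

    `footprint` is a list of (x, y) corners in counter-clockwise order.
    Returns (vertices, faces), where faces index into vertices and are
    ordered so that their normals point outward.
    """
    n = len(footprint)
    if n < 3:
        raise ValueError("a footprint needs at least three corners")
    bottom = [(x, y, 0.0) for x, y in footprint]
    top = [(x, y, float(height)) for x, y in footprint]
    vertices = bottom + top
    faces = [
        list(range(n - 1, -1, -1)),  # floor, reversed to face downward
        list(range(n, 2 * n)),       # roof
    ]
    for i in range(n):
        j = (i + 1) % n
        faces.append([i, j, n + j, n + i])  # one wall per footprint edge
    return vertices, faces


# Hypothetical 10 x 20 building with an estimated height of 12.
vertices, faces = extrude([(0, 0), (10, 0), (10, 20), (0, 20)], 12)
print(len(vertices), "vertices,", len(faces), "faces")
```

The vertex and face lists map directly onto the `coord` and `coordIndex` fields of a VRML97 `IndexedFaceSet`, which is one way such a model could be written out for the Web.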

    Figure #III-15: Example common place visualization created with desktop geographic visualization tools. It is reasonable to expect that general users, in the future, will be able to build these types of visualizations. The local community data sets would be obtained from intelligent agents and data mining activities via the Web. This specific visualization was created for the US EPA's Human Exposure in Urban Environments Project - Alan Huber, principal investigator.

    Reference: Theresa-Marie Rhyne, "Commonplace Visualizations for the Future" (a sidebar writeup), in Special Issue on Vision 2000 of IEEE Computer Graphics & Application Magazine, IEEE Computer Society, Vol. 20, No. 1, (January - February 2000), page 21.

    III-VIII. Acknowledgments

    These notes and a number of the images are the result of many conversations and insights from my colleagues at the U.S. EPA Scientific Visualization Center .... so a warm thank you to Mark Bolstad (Manager), John Bailey, Ray Burton, Randall Hopper, Robert Lin, George McGregor, Todd Plessel, Richard Spencer, Michael Andrew Uhl, George Delic, Charles Foley, Matt Freeman, Dan Sullivan, and David Wong in the Lockheed Martin Visualization and Scientific Computing Group.

    I am also grateful to EPA's Visualization Technical Services Manager - Lynne Petterson and my collaborator on GIS - Visualization integration research, Thomas Fowler. Our work lives would not be as they are if it were not for the many scientists within and outside of EPA who have shared their data and sense of wonder with us. Special thanks to David Laidlaw, Donna Cox, Alex Pang, Craig Wittenbrink, Don Brutzman, Peter Alesso, David Bock, Shawn Sapp, Steve Bryson, Pak Wong, Susan Havre, Beth Hetzler, Polly Baker, and Bernice Rogowitz for their concepts and ideas cited in the references of these course notes.

    Finally, it is always a joy doing collaborative teaching with Mike Botts, Bill Hibbard, and Mike Bailey.

    ---------------------------------------------------------------------------