Collaborative Computing and Integrated Decision Support Tools for Scientific Visualization

Theresa Marie Rhyne
Lockheed Martin
U.S. EPA Scientific Visualization Center
U.S. EPA Environmental Research Center
86 T. W. Alexander Drive
Research Triangle Park, North Carolina
trhyne@vislab.epa.gov

Introduction

These notes explore the concepts of renaissance teams, collaborative computing, and integrated decision support tools. There are five sections: 1) The Three Classes of Visualization Tasks; 2) Customizing Software for Analysis & Decision Making (A First Step in Developing Collaborative Computing Tools); 3) Multi-Variant Physical & Natural Sciences Visualization; 4) Collaborative Computing and the Three Stages of Metacomputing; and 5) Looking on the Horizon - Integrated Decision Support Tools.

I. The Three Classes of Visualization Tasks

In dealing with scientific data sets, there are three classes of visualization tasks that are independent of data or technique. They include: a) analysis and exploration; b) decision support; and c) presentation. Each of these tasks usually involves collaborative efforts between research scientists, policy analysts, artists, programmers, and other expert staff.

I. a) Analysis and Exploration

The analysis and exploration tasks focus on examining scientific data. These data sets can include remotely sensed or monitoring site observations as well as large-scale computational output from supercomputers. For air quality and water quality modeling efforts, some data visualization tasks include the comparison of emission inventories (the data inputs to a model) with the resulting data output from executing the model. In exploring subsurface contamination, data associated with in-situ observations is frequently combined with the generation of a three-dimensional isosurface of the contaminated zone. Thus, visualization is used as an exploratory tool for examining data integrity and data validity issues.

Visualization is also used as a technique for calibrating the computational algorithms that are components of large computer models. Here, interactive visualization tools are helpful for gaining insight into the impacts of modifying algorithms.

See Figure #1: Example of an Analysis and Exploration visualization of Sediment Concentrations in Lake Erie resulting from a large storm - a collaboration between researchers at the University of California at Santa Barbara, the U.S. EPA Large Lakes Research Station in Grosse Ile, Michigan, and the U.S. EPA Scientific Visualization Center.

I. b) Decision Support

Visualization techniques also assist with policy setting and decision making processes. At the U.S. Environmental Protection Agency (U.S. EPA), visualization is used by the Office of Air Quality Planning and Standards as a visual display tool in the process associated with developing air quality standards, policies and procedures. The U.S. EPA Great Lakes National Program Office uses visualization as an inquiry and decision support tool for water quality and ecosystem analyses.

These activities require the customization of visualization software to support policy decision making efforts. This includes the creation of color legends and titling tools that are linked into the visual display. These tools are interactive and usable by policy analysts. Customized point and click user interfaces to visualization tools are also helpful.

Figure #2: Example of a Decision Support visualization of the sediment concentrations in Lake Erie resulting from a large storm. Here the various components of the computational model output (wind velocity, sediment concentrations, erosion & deposition, and depth) are depicted as individual layers.

I. c) Presentation

There is also a need to develop visualizations and animation sequences that educate the general public and inform high level decision makers about scientific concerns. These presentation visualizations frequently require the use of high end animation tools. The final product is often a polished production with voice over narration and background music soundtracks.

Figure #3: Example of a Presentation visualization of the Regional Acid Deposition Model (RADM). Here a "mountain plot" technique is used to emphasize the geographic locations of high SO2 deposition.

I. d) The Role of Renaissance Teams

The development and usage of tools that support these three classes of visualization usually involves collaborative efforts among scientists, policy analysts, artists, programmers and other expert staff. This is often defined as a Renaissance Team.

Reference: Cox, Donna, "Renaissance Teams and Scientific Visualization: A Convergence of Art and Science", Collaboration in Computer Graphics Education, Course #29 Notes, (ACM/Siggraph, July 1988), pp. 81 - 104.

At the U.S. EPA, the Renaissance Team approach has been applied to visualization toolkit development for collaborative computing. The composition of the team includes: a) environmental and computational scientists in an EPA research lab; b) policy analysts and computational scientists in an EPA program office; c) computational model builders; and d) visualization specialists. The goal has been to build tools that scientists and policy analysts can use for the daily examination and visual display of physical and natural data. Current efforts include the transfer of this technology beyond the Federal government to State environmental protection agencies.

The wide usage of visualization tools will also allow for collaborative teams that support multi-disciplinary research activities in the physical and natural sciences. The next section of these course notes will highlight efforts to customize visualization software for exploring multi-variant data sets.

II. Customizing Software for Analysis & Decision Making
(A First Step in Developing Collaborative Computing Tools)

Although standard visualization software can be effective in developing initial displays of scientific data, some customization is usually required to support both the analysis and decision making process. Customizing software encompasses the development of user interfaces that support collaborative computing and easy access to integrated decision support tools. Some of these issues are highlighted below.

Reference: Rhyne, Theresa, Mark Bolstad, Penny Rheingans, Lynne Petterson and Walter Shackelford, "Visualizing Environmental Data at the EPA", IEEE Computer Graphics and Applications, Vol. 13, No. 2, (March 1993), pp. 34 - 38.

II. a) Spatial Context

There are several factors that influence the visual representation of scientific data. These include: type of data, relationships among different components of a data set, placement of data in a spatial and temporal context and interpretation of the data.

Frequently, earth sciences data is geographically registered. As a result, a map of the geographic domain is a helpful visual aid to provide spatial context for the data. Advanced principles of cartography can also be applied to develop more sophisticated projections for mapping coordinate systems. At the U.S. EPA, we are currently exploring methods to integrate our Geographic Information Systems set of tools with Scientific Visualization software in order to create a comprehensive software environment for the visual display of geographically registered environmental data sets.

Spatial context is also important in examining other types of scientific data sets. In the realm of computational chemistry, merging a molecular visualization with a traditional line drawing diagram of the molecule's structure establishes a baseline for decision making. In examining air flow in and around buildings, developing a three-dimensional display of the building is helpful. The level of detail depicted in the three-dimensional characteristics of the building depends on the granularity of the computational model of air flow. If the computational model is attempting to examine general air flow patterns around a building, only a simple cubic representation may be required. However, if the computational model is examining particle tracing associated with air flow inside the building, a very detailed architectural rendering of the interior of the building might be desired. The challenge for the detailed architectural rendering approach might involve merging Computer Aided Design (CAD) systems with Scientific Visualization tools.

II. b) Simple Visual Cues

In air quality and water quality visualizations where concentration levels of pollutants and times of exposure are critical, visual cues that describe these changing activities are important. Color bars and legends are helpful for these purposes. At the U.S. EPA, we have often customized visualization software to support environmental researchers' and policy analysts' needs to depict several emission scenarios for developing air and water quality guidelines. Here, the ability to support multiple color maps and discrete color mapping functions in a single visualization/animation sequence becomes important.

Complex air quality and water quality computational models often examine multiple pollutants for a given scenario. Thus, the data sets from these kinds of computational runs include multiple chemical species examined across multiple atmospheric or water layers for episodes lasting over 100 or more time steps for a given geographic domain. Visualization tools that support labeling and titling functions are helpful here. Time clocks and counters are also effective visual cues for these animation sequences.

To support the analysis and decision making process, we often use discrete color maps (originally developed for printouts from computer plotting devices). Cool hues and colors (e.g. dark purple and blue) indicate low concentrations while warmer tones (e.g. orange and red) denote higher values. In some circumstances, yellow has been used to indicate a midrange value in the data where air quality or water quality standards could be exceeded.
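
To make the discrete mapping concrete, the Python sketch below bins concentration values into a small set of cool-to-warm colors. The bin edges and RGB values are invented for illustration and are not an actual EPA palette.

```python
# Hypothetical discrete color map for concentration data; bin edges
# (in ppm) and RGB values are illustrative, not an actual EPA palette.
BINS = [
    (0.04, (0.3, 0.0, 0.5)),  # dark purple: low concentration
    (0.08, (0.0, 0.0, 1.0)),  # blue
    (0.12, (1.0, 1.0, 0.0)),  # yellow: midrange, near the standard
    (0.16, (1.0, 0.5, 0.0)),  # orange
    (9.99, (1.0, 0.0, 0.0)),  # red: high concentration
]

def discrete_color(concentration):
    """Map a concentration value to its discrete color bin."""
    for upper_bound, rgb in BINS:
        if concentration <= upper_bound:
            return rgb
    return BINS[-1][1]  # clamp anything above the top bin

print(discrete_color(0.11))  # falls in the yellow, midrange bin
```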

II. c) User Interface Design - Distributed Networks

In developing visualization tools for scientists and policy analysts, it cannot be expected that all or even the majority of decision makers will learn how to program visually. As a result, customized and pre-established visual programming modules and networks need to be created that support the visual display of output from scientific models and data sets. As mentioned in the previous section, data output from air quality and water quality models can consist of multiple chemical species examined across multiple air or water layers for a given episode having 100 or more time steps. Designing effective user interfaces that allow decision makers to visually examine these types of data sets is one of the challenges we are presently dealing with at the U.S. EPA. We have used the widget and button tools of visual programming environments (often encompassed in visualization toolkit packages) to build user interfaces that are linked to pre-established visualization networks.
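
As a rough illustration of the idea, the following Python/Tkinter sketch wires simple widgets to a pre-established visualization network hidden behind a single function. The species names, layer count, and the run_network stub are all hypothetical, not drawn from any EPA toolkit.

```python
# Minimal sketch: widgets drive a saved visualization network hidden
# behind one function; all names here are hypothetical.
import tkinter as tk

def run_network(species, layer):
    # Stand-in for executing a pre-established visualization network.
    print(f"rendering {species} at model layer {layer}")

root = tk.Tk()
root.title("Air Quality Viewer")

species = tk.StringVar(value="O3")
tk.OptionMenu(root, species, "O3", "SO2", "NOx").pack()

layer = tk.IntVar(value=1)
tk.Scale(root, variable=layer, from_=1, to=15,
         orient="horizontal", label="Model layer").pack()

tk.Button(root, text="Visualize",
          command=lambda: run_network(species.get(), layer.get())).pack()

root.mainloop()
```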

EPA researchers frequently execute their computational models on the Cray C90 remotely located at the National Environmental Supercomputing Center (NESC) in Bay City, Michigan. The data frequently must be placed in mass storage for retrieval at a later point in time, after execution of the computational models. As a result, customized visualization modules and networks need to address distributed network processing and remote module execution. We have built visualization networks that transparently combine modules from a heterogeneous group of compute engines, storage systems, and workstations.
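
The sketch below illustrates one way such transparent remote module execution might be arranged, dispatching each module to the machine best suited to it. The host names, paths, and module commands are invented for this example and do not describe the EPA's actual configuration.

```python
# Illustrative remote-execution wrapper; host names, paths, and module
# commands are invented for this sketch.
import subprocess

REMOTE_HOSTS = {
    "read_model": "storage-host",    # near the mass storage archive
    "isosurface": "graphics-host",   # near the rendering hardware
}

def run_remote(module, args):
    """Run one visualization module on the host best suited to it."""
    host = REMOTE_HOSTS[module]
    command = ["ssh", host, f"/usr/local/vis/{module}"] + list(args)
    return subprocess.run(command, capture_output=True, text=True)

# Read data near the archive, then render near the graphics hardware.
run_remote("read_model", ["episode42.nc"])
run_remote("isosurface", ["episode42.nc", "--level", "0.12"])
```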

II. d) The Need for Collaborative Computing

Once visualization networks are built and user interfaces designed, there is a need to provide scientists with the capability to share visual information in real time. Often, researchers are located at sites geographically remote from one another; the real-time sharing of visual information then requires high-speed networking. Thus begins the journey of collaborative computing.

III. Multi-Variant Physical & Natural Sciences Visualization

An important initiative of the U.S. High Performance Computing and Communications Program involves Grand Challenge research efforts that attempt to examine the multi-variant concerns of physical and natural sciences problems. For the environmental and earth sciences, this encompasses the merger of air, water and subsurface data sets into single visualization presentations. These multi-variant projects involve collaborative efforts among physical and natural scientists located at research sites across the United States and abroad. (See Figure #2 for an example of a multi-variant visualization.)

Some of the system design issues associated with collaborative computing that supports the examination of multi-variant data types include: data format standards; data management; graphics-client software; and tracking & steering functions for collaborative efforts. Historically, air quality, water quality and subsurface computational models have been developed independently of each other. As a result, the data output format structures differ. Determining a common data output format is a part of the collaborative process. This includes determining the appropriate time step value for animation sequences. Some data sets might animate according to hourly time steps while others might change over daily or monthly time periods. Data storage, access and retrieval are important data management issues for effective exploration of multi-variant physical and natural sciences data.
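
One small piece of this process, aligning data sets that advance at different time steps to a common animation clock, might look like the following Python sketch. The resampling scheme (repeating each coarser value) is illustrative only; a real effort would choose it as part of the collaboration.

```python
# Sketch: bring hourly and daily data sets onto one hourly animation
# clock by repeating each coarser value; the scheme is illustrative only.
def resample_to_hours(values, step_hours):
    """Repeat each stored value so the series advances hourly."""
    hourly = []
    for v in values:
        hourly.extend([v] * step_hours)
    return hourly

air   = [float(h) for h in range(48)]       # already hourly, 48 steps
water = resample_to_hours([7.0, 9.5], 24)   # two daily values -> 48 frames

assert len(air) == len(water)  # both now advance frame by frame together
```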

There are many situations where simulation codes have already been executed but there remains a need for collaborative analysis of the computational output. This analysis function allows two or more scientists at remote locations to simultaneously view the computational output and pass control of the interactive analysis to each other, supporting questions and answers, mutual clarification, or expert-to-novice advice on interpreting the data in question. Data storage and retrieval mechanisms become important for these situations.
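
A minimal sketch of such a control-passing convention appears below. The session class and its methods are invented here to illustrate the single-driver, many-viewers idea; they are not drawn from any particular system.

```python
# Invented single-driver session: one participant steers at a time and
# explicitly passes control to another.
class SharedSession:
    def __init__(self, participants):
        self.participants = list(participants)
        self.controller = self.participants[0]  # one driver at a time

    def steer(self, user, command):
        if user != self.controller:
            raise PermissionError(f"{user} does not hold control")
        print(f"{user} steers: {command}")  # every viewer sees the result

    def pass_control(self, to_user):
        assert to_user in self.participants
        self.controller = to_user

session = SharedSession(["expert@lab1", "novice@lab2"])
session.steer("expert@lab1", "rotate isosurface 30 degrees")
session.pass_control("novice@lab2")
session.steer("novice@lab2", "advance one time step")
```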

The Tecate Visualization System, developed at the San Diego Supercomputer Center, is a software environment that supports exploratory visualization of data collected from networked data systems. A simple World Wide Web interface accesses and stores earth sciences data into a database management system. This visualization management system is intended to extend beyond the typical database management environment by storing information on how to visualize the data with the data itself. For collaborative computing, this approach will allow scientists to return to their data sets with a "record" of previous visualization efforts. This record is helpful when exploring multi-variant data sets which have not previously been combined.
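
The sketch below suggests what such a visualization "record" stored beside the data might contain, so that a later collaborator can replay or extend the view. The schema and file naming are invented here and are not Tecate's actual format.

```python
# Invented schema for a visualization "record" saved beside its data set,
# so a later collaborator can replay or extend the view.
import json

record = {
    "dataset": "lake_erie_sediment.nc",
    "visualization": {
        "technique": "isosurface",
        "variable": "sediment_concentration",
        "level": 0.12,
        "colormap": "discrete_cool_warm",
    },
}

with open("lake_erie_sediment.vis.json", "w") as f:
    json.dump(record, f, indent=2)
```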

Reference: Kochevar, Peter, "The Tecate Visualization System", (http://www.sdsc.edu/SDSC/Research/Visualization/Tecate/tecate.html), 1996.

IV. Collaborative Computing and the Three Stages of Metacomputing

Collaborative computing involves facilitating information discovery and scientific visualization activities between researchers located at various remote sites. It includes the use of visualization and information retrieval in a high speed networked environment. Computing resources become transparently available to researchers via the networked environment, and the result is a metacomputer. The metacomputer is a network of heterogeneous computational resources linked by software in such a way that they can be used as easily as a personal computer. For any one research project, a scientist might use a desktop workstation, a remote supercomputer, a mainframe supporting a mass storage archive of data, and a specialized high performance graphics workstation.

The metacomputer concept is still evolving and software tools to support these activities are beginning to emerge. Three stages of metacomputing are outlined in the following discussion.

IV. a) Metacomputing Stage 1 - Building Access Tools

The first stage of effective collaborative computing is primarily a software and hardware integration effort. It involves connecting all of the metacomputing resources with high-speed networks, implementing a distributed file system, coordinating researchers' access across the various computational elements, and creating seamless software access to the computing technology. Examples of software that support information discovery and its visual display in a metacomputing environment are NCSA's Mosaic and Netscape from Netscape Communications Corporation. These tools allow for browsing the World Wide Web (WWW). These browsing tools are hypertext windowing systems and are available for the X Window System, Apple Macintosh and Microsoft Windows environments. Using point and click methods, researchers are "linked" to information resources across the Internet. With appropriate graphics hardware and memory, it is possible to access, display and run animation files. With WWW browser tools, researchers are able to access and share information across heterogeneous computing platforms. (As an example, the U.S. EPA's WWW application is located at (http://www.epa.gov/).)

Another important component of these collaborations involves usage of the Multicast Backbone (MBone) on the Internet. The MBone provides scientists access to video-conferencing type capabilities from their appropriately configured desktop workstations. Public domain tools that support live audio, video, and whiteboard activities are currently available for these efforts. These multi-media tools allow scientists located at geographically remote sites to interact in real time and share visual information.

Reference: Macedonia, Michael R. and Donald P. Brutzman, "MBone Provides Audio and Video Across the Internet", Computer, IEEE Computer Society, Vol. 27, No. 4, (April 1994), pp. 30 - 36. Also see (http://taurus.cs.nps.navy.mil/pub/mosaic/mbone.html).

IV. b) Metacomputing Stage 2 - Computing in Concert

The second stage of collaborative computing focuses on spreading components of a single research application across several computers. This permits a center's heterogeneous collection of computers to work in concert on a single problem. Software that supports collaborative visualization by researchers at remote sites is just now emerging. One example of a prototype interactive scientific data analysis and visualization system was developed at NASA's Jet Propulsion Laboratory (JPL). The Linked Windows Interactive Data System (LinkWinds) is designed to support the two- and three-dimensional visualization of multi-variable earth and space sciences data sets. LinkWinds supports networked collaborative data analysis. The graphical user interface (GUI) is X-Windows based while the computer graphics rendering functions rely on the Silicon Graphics, Inc. (SGI) OpenGL specification. LinkWinds is designed to handle direct access to a variety of data formats, which allows for the merger and visual display of data sets from multiple computational sources and scientific disciplines. The networking functions of LinkWinds do not rely upon the X Window networking facilities. Instead, the implementation (based on MUSE) transmits only individual control values and button or menu selections. This reduces the sizable stream of commands that can result under the X Window networking facilities.
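
The following sketch illustrates the control-value idea in miniature: only the changed widget value is packed and transmitted, rather than the stream of drawing commands it triggers at the far end. The message format is invented here and is not LinkWinds' or MUSE's actual protocol.

```python
# Invented message format: ship one changed control value, length-prefixed,
# instead of the drawing commands it triggers at the far end.
import json
import struct

def control_message(widget_id, value):
    """Pack one slider or menu change into a few bytes."""
    payload = json.dumps({"id": widget_id, "value": value}).encode()
    return struct.pack("!I", len(payload)) + payload  # 4-byte length prefix

message = control_message("opacity_slider", 0.75)
print(len(message), "bytes")  # tens of bytes versus a stream of draw calls
```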

Reference: Jacobson, Allan S., Andrew L. Berkin, and Martin N. Orton, "LinkWinds: Interactive Scientific Data Analysis and Visualization", Communications of the ACM, Vol. 37, No. 4, (April 1994), pp. 43 - 52.

LinkWinds WWW site: (http://linkwinds.jpl.nasa.gov/)

IV. c) Metacomputing Stage 3 - Surfing on the Infrastructure

The third stage of this process will be a transparent national network that will dramatically increase computational and information resources available to explore a scientific research application. This stage is closely tied to the activities of the National Information Infrastructure.

The High Performance Computing and Communications Program (HPCC) in the United States is supporting research and development (R&D) in gigabit speed networks. This technology is designed to support researchers' requirements to continuously display on local workstations the output from model simulations running on remote high performance systems. These R&D efforts are examining satellite, broadcast, optical, and affordable local area networking designs. These networking technologies are intended to support the rapid movement of huge files of data, images and videos in a shared, collaborative computing environment which spans thousands of networks and encompasses millions of users.

Reference: High Performance Computing and Communications: Foundation for America's Information Future, (Supplement to the President's Fiscal Year 1996 Budget), A report by the Committee on Information and Communications, National Science and Technology Council, 1996. (http://www.hpcc.gov/blue96/index.html)

There are positive and negative technical and social impacts associated with surfing on this telecommunications infrastructure. Positive aspects include high-speed networked collaborations centered on real-time visualization and information discovery among geographically remote research or Renaissance Teams. There are also negative impacts, or roadblocks, associated with metacomputing. Network transmission difficulties and differences in desktop workstation architectures can cloud the actual visualization seen by two collaborating researchers who are simultaneously viewing and steering it. Setting up and learning to use the metacomputing infrastructure can be all-consuming, diverting attention from the basic education or scientific discovery process. These remain unresolved issues as we move into the realm of multimedia.

V. Looking on the Horizon -- Integrated Decision Support Tools

There are many unresolved computing challenges associated with scientific visualization. Here, we present two such issues for thought and consideration: a) the notion of GIS-VIS integration, and b) the use of the WWW and intelligent agents for assisting scientific visualization efforts.

V. a) GIS - VIS Integration

Geographic Information System (GIS) and Visualization methodologies and techniques are used to examine some types of scientific data. Interestingly, these two disciplines have developed, and have often been implemented, in parallel with each other. In many situations, Visualization is primarily associated with computational simulations and modeling efforts as well as data obtained from satellite and remote sensing systems. GIS environments have supported the collection, analysis, and display of large volumes of spatially referenced data pertaining to geographic, cultural, political, and statistical arenas.

Often the hardware configurations optimized to support GIS are not compatible with Visualization methodologies. Frequently, Visualization software is customized to encompass the standard cartographic and spatial display capabilities of GIS environments. Here, we propose the integration of these two disciplines to enhance decision making. This will give scientists and policy makers the capability to move between multiple information and visual displays of their data. The GIS capabilities will allow researchers to query data points and obtain precise locations and attributes. The Visualization functions will support the creation of 3-dimensional surfaces and animations of multiple data sets.

Three levels of GIS and Visualization integration can be defined: rudimentary, operational, and functional. The rudimentary approach uses the minimum amount of data sharing and exchange between these two technologies. The operational level attempts to provide consistency of the data while removing redundancies between the two technologies. The functional form attempts to provide transparent communication between these respective software environments. At this level, the user only needs to request information and the integrated system retrieves or generates the information depending upon the request.
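
A toy sketch of the functional level might look like the following, where a single request interface decides whether the GIS or the Visualization subsystem should answer. Both subsystem functions are stubs invented for illustration.

```python
# Stub "functional" integration: one request interface routes each query
# to the GIS or the Visualization subsystem; both stubs are invented.
def gis_query(point):
    return {"site": "monitoring station 17", "PCB": "4.2 ppb"}

def vis_render(variable):
    return f"3-D animated surface of {variable}"

def request(kind, spec):
    """The user asks for information; the system picks the tool."""
    if kind == "attributes":   # precise locations and attributes -> GIS
        return gis_query(spec)
    if kind == "surface":      # 3-D surfaces and animation -> VIS
        return vis_render(spec)
    raise ValueError(f"unknown request kind: {kind}")

print(request("attributes", (41.7, -83.3)))
print(request("surface", "sediment_concentration"))
```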

Software links that attempt to integrate GIS and Visualization toolkits are currently being established. IBM Data Explorer (visualization toolkit) modules that link to Environmental Systems Research Institute's (ESRI) Arc/Info (GIS) software are under development and testing. Systems designers at Advanced Visual Systems have also built Application Visualization System (AVS) visualization toolkit modules which support Arc/Info data types and formats. The US Army Construction Engineering Research Laboratory has expanded its public domain GIS system (GRASS) to support visualization capabilities (http://www.cecer.army.mil/grass/viz/VIZ.html). Information on the U.S. EPA efforts to examine GIS-Visualization integration can be found at (http://www.epa.gov/givis).

This type of integration will provide scientists and policy analysts with an improved set of information and visual display tools for data exploration and decision making as well as increasing the capabilities for collaborative computing.

References: Rhyne, Theresa Marie, William Ivey, Loey Knapp, Peter Kochevar, and Tom Mace, "Visualization and Geographic Information System Integration: What are the needs and the requirements, if any?", a Panel Statement in Proceedings of the IEEE Visualization '94 Conference, IEEE Computer Society Press, Washington, D.C., 1994, pp. 400 - 403.

Brown, William M., Mark Astley, Terry Baker, and Helena Mitasova, "GRASS as an Integrated GIS and Visualization System for Spatio-Temporal Modeling", Proceedings of AutoCarto 12, Charlotte, North Carolina, 1995. (http://softail.cecer.army.mil/~brown/papers/AC12.html)

V. b) WWW & Intelligent Agent Assistance for Visualization

A significant limitation of existing hypermedia Internet tools, like Mosaic and Netscape, is the inability to rapidly find and quickly recall information resources of interest on the WWW. Infrequent (and general) users of distributed hypermedia systems can easily become overwhelmed by the large number of links to information resources and disoriented while navigating between the various remote file servers. One potential solution to this dilemma is the incorporation of intelligent or remote agent capabilities into browser programs.

Reference: R. Vetter, C. Spell and C. Ward, "Mosaic and the World-Wide Web", Computer, IEEE Computer Society, Vol. 27, No. 10, (October 1994), pp. 49 - 57.

An agent is an automated program that examines the Internet on its operator's behalf, searching for specified information. A few such agents, called "web crawlers" or "search engines", already exist. Using keyword-based searches, web crawlers automatically search through the Internet and index the information they find. Efforts are also underway to develop "metacrawlers" that perform multiple searches, in parallel, across the Internet.
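
In miniature, keyword crawling and indexing amounts to the following Python sketch. Page fetching and link following are reduced to a stub dictionary of invented pages and URLs.

```python
# Toy crawl-and-index: page fetching and link following are reduced to an
# invented dictionary of documents.
from collections import defaultdict

pages = {
    "http://a.example/ozone": "ozone model output visualization",
    "http://b.example/lakes": "lake erie sediment visualization",
}

index = defaultdict(set)
for url, text in pages.items():   # "crawl": visit each discovered page
    for word in text.split():     # "index": record which pages mention it
        index[word].add(url)

print(sorted(index["visualization"]))  # keyword query -> matching URLs
```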

References:"WebCrawler", http://webcrawler.com, 1996.
"MetaCrawler Multi-Threaded Web Search Engine",by Erik Selberg and Oren Etzioni, 1996. (http://metacrawler.cs.washington.edu:8080/)

" AutoNomy" - Neural Autonomous Information Agents (http://www.camneuro.stjohns.co.uk/multimed/autonomy.html)

In the digital libraries and database management domains, research efforts are underway to expand visual information retrieval (VIR) technology. VIR supports searching through image databases using the visual information contained in the image, such as color, texture, composition, and structure, rather than key words. This concept of content extraction gives a user the capability to retrieve visual information by asking a query like "Give me all pictures that look like this". The VIR system satisfies the query by comparing the content of the query picture with that of all target pictures in the database. This is called "Query by Pictorial Example" (QBPE).
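
As a toy illustration of QBPE, the sketch below compares images by a coarse global intensity histogram. Real VIR systems use far richer color, texture, and structure features; the pixel data and file names here are invented.

```python
# Toy QBPE: compare images by a coarse intensity histogram; pixel data
# and file names are invented, and real VIR features are far richer.
def histogram(pixels, bins=4):
    """Normalized count of pixels per coarse intensity bin (0-255)."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return [h / len(pixels) for h in hist]

def distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

query   = histogram([10, 20, 200, 210, 220])   # the example picture
targets = {
    "sunset.img": histogram([200, 210, 215, 30, 15]),
    "forest.img": histogram([90, 100, 110, 95, 105]),
}
# "Give me all pictures that look like this": return the closest match.
print(min(targets, key=lambda name: distance(query, targets[name])))
```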

References: "UC Berkeley Digital Library Project", (http://elib.cs.berkeley.edu), 1996.

"IBM's Query by Image Content Project", 1996 (http://wwwqbic.almaden.ibm.com/~qbic/qbic.html)

Columbia University's "ADVENT" Project, 1996 (http://www.ctr.columbia.edu/advent/demos.html)

In the scientific visualization arena, a number of research groups have begun to explore building intelligence into visualization software. This concept allows a researcher or policy analyst to prescribe a particular analysis task, such as comparing ozone concentrations with power plant emissions for a given air pollution computational model scenario. The software system then automatically creates an appropriate visualization. The users of these task-directed or rule-based visualization software systems will specify their area of interest, describe the data parameters, and determine an analysis objective. The intelligent software tool will then suggest and describe visual representations of the data, which might include contour plots, isosurfaces, volume renderings, and animated vector representations.
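
A much-simplified sketch of such a rule base appears below. The rules and data descriptions are invented to illustrate mapping data characteristics to suggested representations; they are not taken from KNOWVIS or the IBM work cited next.

```python
# Invented rule base: match a description of the data to candidate
# visual representations; real systems use much richer rules.
RULES = [
    (lambda d: d["dims"] == 2 and d["kind"] == "scalar",
     "contour plot"),
    (lambda d: d["dims"] == 3 and d["kind"] == "scalar",
     "isosurface or volume rendering"),
    (lambda d: d["kind"] == "vector" and d["time_steps"] > 1,
     "animated vector representation"),
]

def suggest(description):
    """Return every representation whose rule matches the data."""
    return [name for rule, name in RULES if rule(description)]

ozone = {"dims": 3, "kind": "scalar", "time_steps": 120}
print(suggest(ozone))  # -> ['isosurface or volume rendering']
```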

References: M.P. Baker, "The KNOWVIS Project: An Experiment in Automating Visualization", Decision Support 2001 Conference Proceedings, 1995.

Rogowitz, Bernice E. and Lloyd A. Treinish, "An Architecture for Perceptual Rule-Based Visualization", Proceedings of the IEEE Visualization '93 Conference, San Jose, California, IEEE Computer Society Press, 1993, pp. 236 - 243.

Here, we propose the notion of integrating these three research efforts. We suggest building an intelligent WWW-based application that educates and assists novice and advanced users in the application of scientific visualization techniques. The future development of intelligent agents and VIR databases that are incorporated into Internet browsing tools and scientific visualization software will aid in the building of comprehensive decision support systems. These task-directed decision support systems will allow researchers and policy analysts to specify analysis requirements. The system will then automatically construct appropriate visualizations that are linked to information databases.

Reference: Rhyne, Theresa Marie, "Scientific Visualization and Technology Transfer: An EPA Case Study", (under Internet Kiosk: Ron Vetter, editor), Computer, Vol. 28, No. 7, (July 1995), pp. 94 - 96.

VI. Acknowledgments

These notes and a number of the images are the result of many conversations and insights from my colleagues at the U.S. EPA Scientific Visualization Center, so a warm thank you to Mark Bolstad, Tom Boomgaard, Al Bourgeois, Todd Plessel, Dan Santistevan, Mike Uhl, Jeff Wang, and Zhang Yanching in the Lockheed Martin Services Group.

I am also grateful to EPA's Work Assignment Manager for Visualization, Lynne Petterson, and to my collaborator on GIS-Visualization integration research, Thomas Fowler (Lockheed Martin - GIS expert). Our work lives would not be as they are if it were not for the many scientists within and outside of EPA who have shared their data and sense of wonder with us. Special thanks to Donna Cox, Peter Kochevar, Don Brutzman, Bud Jacobson, Ron Vetter, and Polly Baker for their concepts and ideas cited in the references of these course notes. Warm regards are also extended to Bill Brown, Loey Knapp, William Ivey, and Tom Mace who shared their insights on the GIS-VIS integration topic.

Section III of Siggraph 96 Course #16 Notes for Visualizing Scientific Data and Information