Management of Biological, Physical, and Chemical Data Within the U.S. GLOBEC Program

Robert C. Groman and Peter H. Wiebe
Woods Hole Oceanographic Institution
Woods Hole, MA 02543 USA

Contained in: Proceedings of the International Workshop on Oceanographic Biological and Chemical Data Management, May 20-23, 1996; NOAA Technical Report NESDIS 87, February 1997.

Keywords: Database management, GLOBEC Program, JGOFS software, Georges Bank Program, collaboratory

ABSTRACT

The primary goal of the U.S. GLOBEC (GLOBal ocean ECosystems dynamics) Georges Bank Project is to understand the population dynamics of the target species on Georges Bank in terms of their coupling to the physical environment and to their predators and prey. A key component of this project is the sharing of the field, retrospective, modeled, and derived data collected by the project's scientific investigators. We use the JGOFS (U.S. Joint Global Ocean Flux Study) data management software to serve these data and information to all investigators. The JGOFS system takes advantage of standard World Wide Web browsers (such as Netscape, Mosaic, and Internet Explorer) so that everyone can access our database to view, manipulate, and retrieve data stored within this distributed system.

INTRODUCTION

U.S. GLOBEC (GLOBal ocean ECosystems dynamics) is a research program organized by oceanographers and fisheries scientists to address the question of how global climate change may affect the abundance and production of animals in the sea. The U.S. GLOBEC Georges Bank Project, the first of several research modules within U.S. GLOBEC, is a large multi-national, multi-year, and multi-disciplinary oceanographic program involving over seventy scientific investigators from twenty-four organizations in the USA and Canada (Figure 1). The project's primary goal is to understand the population dynamics of the target species on the Bank, cod (Gadus morhua), haddock (Melanogrammus aeglefinus), and the copepods Calanus finmarchicus and Pseudocalanus spp., in terms of their coupling to the physical environment and to their predators and prey. The ultimate goal is to predict changes in these populations as the physical and biotic environment changes in response to year-to-year variability and possible longer-term climatic change.

To accomplish these goals, intensive field, laboratory, and retrospective studies were initiated in 1994, with major field efforts in 1995, 1997, and 1999. The effort is substantial, requiring broad-scale surveys of the entire Bank and process studies that focus both on the links between the target species and their physical environment and on fundamental aspects of these species' life histories (birth rates, growth rates, death rates, etc.).

Equally important are the ongoing modeling efforts, which seek to provide realistic predictions of the flow field and use the life history information to produce an integrated view of the dynamics of the populations.

ORIGIN OF PROJECT DATA SETS

Broad-scale cruises carry out CTD (Conductivity, Temperature, and Depth), zooplankton, larval fish, and acoustic surveys of Georges Bank and adjacent waters. Process cruises are designed to measure vital physiological rates of zooplankton and fish larvae, to determine the fine-scale vertical and horizontal distribution of zooplankton on Georges Bank, and to link these to the particular physical processes under study. Zooplankton and fish larvae are collected at stations in a variety of ways, including pumps, Bongo nets, and MOCNESS (Multiple Opening/Closing Net and Environmental Sensing System). They are also remotely sensed in situ with the VPR (Video Plankton Recorder), TAPS (Tracor Acoustic Profiling System, a multi-frequency acoustic system that is lowered with the CTD), and BIOMAPER (a towed acoustic system). Microzooplankton is sampled with GO-FLO bottles. Hydrography and zooplankton abundance, distribution, stage frequency distribution, and condition are measured at stations between drifter stations and along selected transects.

Moorings collect time-series data (hourly averages) from a variety of sensors, including vector-averaging current meters (VACMs) at two levels, 15 m and 45 m, and temperature sensors (VACMs, SeaCats, and Branker temperature probes) at 1, 5, 10, 15, 20, 25, 30, 35, 40, and 45 m. Salinity sensors (SeaBird SeaCats sampling at 2 Hz) are at 1, 10, 20, 30, and 40 m. During cruises to set or recover moorings, hydrographic sections are made with a CTD. In addition, satellite-tracked Lagrangian drifters are deployed monthly at strategic locations on broad-scale, process, or mooring cruises to provide an additional measure of current flow.

The data collected at sea are analyzed along with other available data, such as sea surface temperature images derived from full-resolution (1 km) AVHRR satellite data and spatially averaged pigment images derived from full-resolution (1 km) CZCS and SeaWiFS satellite data, for all of Georges Bank and the Gulf of Maine.

The basic measurements spawn a number of derived computational products which also must be accommodated (see Table 1).

FOSTERING MULTI-DISCIPLINARY DATA EXCHANGE AND SYNTHESIS

Our challenge is to develop the database, data analysis, and data visualization structures that will enable widely distributed, multi-disciplinary investigators to work with each other's data and to collaborate without leaving their home institutions (sometimes referred to as the collaboratory concept). An essential element of the project is the U.S. GLOBEC data policy, which fosters quick sharing of data and information among members of the program. Key elements of this policy (U.S. GLOBEC Data Policy, Report Number 10, February 1994) are:

DATA MANAGEMENT STRUCTURE

We selected the U.S. Joint Global Ocean Flux Study (JGOFS) data management software [Flierl et al., in press] to implement our data storage and retrieval strategy. The JGOFS software provides a distributed, flexible, extensible, and data-driven methodology for storing and serving data and information about the data (metadata).

Distributed Functionality

The JGOFS system uses the Hypertext Transfer Protocol (HTTP) to exchange data between servers and clients. This enables the JGOFS system to use any UNIX or PC/Windows-based computer as a server. The Georges Bank Program currently uses four data servers and one applications server. New data servers can be added easily; indeed, we could have one server per data contributor. Similarly, any networked computer running a Web browser (such as Netscape, Internet Explorer, or Mosaic) is a supported client and has access to our on-line data and information. One does not need to know where the data are stored to access them; rather, the system automatically generates the necessary hypertext links on the Web page each time data are requested. An additional benefit of this approach is that the data directory is maintained automatically. (See Data Driven, below.) Our home page is at http://globec.whoi.edu.
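To illustrate what location transparency means for the user, the following Python sketch merges the holdings of several servers into one alphabetized list of hypertext links, so a reader of the page never needs to know which host holds which object. It is not the JGOFS implementation: the registry, the host names, the object names, and the "/data/" URL path are all hypothetical.

```python
from html import escape
from urllib.parse import quote

# Hypothetical registry of which server holds which data objects.
# Host and object names are illustrative only, not the project's real layout.
SERVERS: dict[str, list[str]] = {
    "http://server-a.example.edu": ["bongo_1995", "ctd_1995"],
    "http://server-b.example.edu": ["mooring_temp"],
}


def object_url(server: str, name: str) -> str:
    """Build the URL a browser would follow to reach one data object.

    The '/data/' path segment is an assumption made for this sketch.
    """
    return f"{server}/data/{quote(name)}"


def catalog_html(servers: dict[str, list[str]]) -> str:
    """Render one merged HTML list of links, hiding which host holds what."""
    items = [
        (name, object_url(server, name))
        for server, names in servers.items()
        for name in names
    ]
    items.sort()  # order by object name, independent of the hosting server
    lines = ["<ul>"]
    for name, url in items:
        lines.append(f'  <li><a href="{escape(url)}">{escape(name)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(catalog_html(SERVERS))
```

Adding a data server in this sketch means adding one registry entry; the page a user sees is regenerated from the registry rather than edited by hand.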

Flexible Methods for Handling Data

The JGOFS system can handle any data format and data type. This is because it uses the data object concept, in which data and the method (i.e., software) needed to access them are linked into a "data object" (Figure 2). The networking and inter-operational software is common to every data object software module; only the code specific to accessing and reading the data needs to be written. In this way, we are able to serve ASCII and binary data with equal ease, and to serve image and video data using the same software. Three "standard" methods are distributed with the JGOFS system, and these can handle many situations without further programming. For example, the default, or def, method handles ASCII flat files of the form:

     leg  year  month  station  lat    lon     press    temp    sal
     1    81    6      3        38.28  -73.53  5.000    18.334  33.570
     1    81    6      3        38.28  -73.53  25.000   12.848  34.159
     1    81    6      3        38.28  -73.53  49.000   11.070  34.523
     1    81    6      3        38.28  -73.53  99.000   11.093  35.090
     1    81    6      3        38.28  -73.53  149.000  11.906  35.487
     1    81    6      3        38.28  -73.53  199.000  10.819  35.435
     1    81    6      3        38.28  -73.53  300.000  8.293   35.126
     1    81    6      3        38.28  -73.53  400.000  6.363   35.046
     1    81    6      3        38.28  -73.53  500.000  5.724   35.019

The values can be separated by blanks, commas, or tabs, and the columns do not need to line up. The def method can also read hierarchically structured data in which the slowest-varying parameters are listed first; in fact, this is the preferred approach for such data.
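The reading rules just described can be sketched in Python: a header line names the fields, values may be separated by any run of blanks, commas, or tabs, and columns need not align. The second function groups the flat rows by their leading, slowest-varying columns so that each station becomes one profile. This is an illustration of the idea only, not the def method's source code; the function names are hypothetical, and the real def method's syntax for hierarchical files is not reproduced here.

```python
import re

# Any run of blanks, tabs, or commas separates values (as described in the text).
DELIMS = re.compile(r"[ \t,]+")


def parse_flat(text: str) -> list[dict[str, str]]:
    """Parse a header line plus data rows into a list of records.

    Columns need not line up and blank lines are skipped. Values stay strings
    so the caller decides how to convert each field.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = DELIMS.split(lines[0])
    records = []
    for ln in lines[1:]:
        values = DELIMS.split(ln)
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(values)}: {ln!r}")
        records.append(dict(zip(header, values)))
    return records


def group_slowest_first(
    records: list[dict[str, str]], keys: list[str]
) -> dict[tuple[str, ...], list[dict[str, str]]]:
    """Nest records under the leading (slowest-varying) columns.

    Each key tuple (e.g. one station) maps to its rows minus the key fields.
    """
    groups: dict[tuple[str, ...], list[dict[str, str]]] = {}
    for rec in records:
        group_key = tuple(rec[k] for k in keys)
        rest = {f: v for f, v in rec.items() if f not in keys}
        groups.setdefault(group_key, []).append(rest)
    return groups


# Three rows from the example above, written with blanks, commas, and tabs.
SAMPLE = """
leg year month station lat lon press temp sal
1 81 6 3 38.28 -73.53 5.000 18.334 33.570
1,81,6,3,38.28,-73.53,25.000,12.848,34.159
1\t81\t6\t3\t38.28\t-73.53\t49.000\t11.070\t34.523
"""

if __name__ == "__main__":
    rows = parse_flat(SAMPLE)
    station_keys = ["leg", "year", "month", "station", "lat", "lon"]
    for key, profile in group_slowest_first(rows, station_keys).items():
        print(key, [r["press"] for r in profile])
```

Because runs of delimiters collapse into one, this sketch cannot represent an empty field between two commas; a production reader would need an explicit missing-value convention.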

Extensible Architecture

The JGOFS system is highly extensible, since one is free to use any programming or scripting language to help serve data and information. The JGOFS architecture readily accommodates new capabilities and features without compromising its design or implementation. We exploit this feature, for example, when serving our static and video drifter-track images as well as the AVHRR satellite images. In the latter case, an existing satellite image is made available and converted, as needed, to a format suitable for display by the browser (i.e., GIF), all within the context of the data system.
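The data object concept behind this extensibility can be sketched in Python as a registry of named methods: a data object pairs a stored resource with the name of the method that reads it, and one shared serving path dispatches to that method. Registering a new method extends what can be served without touching the serving code. This is an analogy under stated assumptions, not how JGOFS itself is written; the method names, the registry, and the serve function are all hypothetical.

```python
from typing import Callable, Iterator

# A hypothetical "method": turns a stored resource into rows of field -> value.
Method = Callable[[str], Iterator[dict[str, str]]]

METHODS: dict[str, Method] = {}


def method(name: str) -> Callable[[Method], Method]:
    """Register a reader under a method name; the serving code never changes."""
    def register(fn: Method) -> Method:
        METHODS[name] = fn
        return fn
    return register


@method("csv_text")
def read_csv_text(resource: str) -> Iterator[dict[str, str]]:
    # Hypothetical method: the resource is comma-separated text with a header line.
    lines = resource.strip().splitlines()
    header = lines[0].split(",")
    for line in lines[1:]:
        yield dict(zip(header, line.split(",")))


@method("key_value")
def read_key_value(resource: str) -> Iterator[dict[str, str]]:
    # Hypothetical method: "name=value" lines become a single row.
    row = {}
    for line in resource.strip().splitlines():
        key, _, value = line.partition("=")
        row[key.strip()] = value.strip()
    yield row


def serve(data_object: tuple[str, str]) -> list[dict[str, str]]:
    """Shared serving path: look up the object's method and run it.

    In a real server, networking and output formatting would live here,
    common to every method; this sketch only materializes the rows.
    """
    method_name, resource = data_object
    return list(METHODS[method_name](resource))


if __name__ == "__main__":
    print(serve(("csv_text", "press,temp\n5,18.3\n25,12.8")))
    print(serve(("key_value", "ship=example\nyear=1995")))
```

An image-conversion method like the one described for the AVHRR images would, in this sketch, simply be one more registered function.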

Data Driven

The system lists data objects based on what is currently available. These lists are generated each time a person requests one, so they are always up to date (Figure 3). Furthermore, we attempt to serve data in the format and form used by the scientific investigator who collected or generated them, so that users of the system are working with the same data, on the same computer, that the contributing investigator is using. Whenever the data are further processed or errors are removed, the most up-to-date data are automatically made available to others in the project. This is consistent with our policy of making data available whenever they are useful to others, even if they are not yet final.
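The "generated on every request" behavior can be sketched in a few lines of Python: instead of maintaining a catalog, the listing function scans the contributor's directory each time it is called, so an added or removed file shows up on the next request. The directory layout and the ".dat" suffix convention are hypothetical assumptions for this sketch, not the JGOFS naming scheme.

```python
import tempfile
from pathlib import Path


def list_data_objects(root: str, suffix: str = ".dat") -> list[str]:
    """Return the data objects present right now under root.

    Called on every request, so there is no separate catalog to keep in sync.
    The '.dat' suffix convention is a hypothetical assumption.
    """
    return sorted(p.stem for p in Path(root).glob(f"*{suffix}") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "ctd_1995.dat").write_text("placeholder")
        print(list_data_objects(d))
        Path(d, "bongo_1995.dat").write_text("placeholder")
        print(list_data_objects(d))  # the new object appears with no catalog update
```

The design trade-off is a directory scan per request in exchange for a listing that can never go stale.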

FUTURE VENTURES

Until recently, our focus has been on providing access to the data; we have not addressed the data analysis needs of our researchers. However, we are expanding our ability to offer analysis tools by adding display software and output format options. Currently, we offer basic x-y plotting (Figure 4), basic map plots, and a flat-file ASCII listing of the data. We recently added the ability to download a Matlab-formatted data file that can be loaded directly into Matlab with the standard load command. On UNIX machines, it is also possible to access data objects directly from within Matlab using the M-file command loadjg. Given a data object name, this command retrieves the data over the network from the appropriate JGOFS data server and creates a data vector for each field name in the object. We are also actively investigating other analysis options, such as a scientific visualization capability and links to other display and analysis systems such as LinkWinds.
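The last step that loadjg is described as performing, turning a data object's rows into one vector per field name, can be sketched in Python. This is not loadjg (which is a Matlab M-file) and it omits the network retrieval entirely; the function name to_vectors is hypothetical, and the sketch assumes every record carries the same fields.

```python
def to_vectors(records: list[dict[str, str]]) -> dict[str, list]:
    """Turn row records into one vector per field name.

    A column becomes floats when every value parses as a number; otherwise
    it is kept as strings. Assumes all records share the same fields.
    """
    if not records:
        return {}
    vectors: dict[str, list] = {}
    for field in records[0]:
        column = [rec[field] for rec in records]
        try:
            vectors[field] = [float(v) for v in column]
        except ValueError:
            vectors[field] = column  # non-numeric field, e.g. a ship or station name
    return vectors


if __name__ == "__main__":
    rows = [
        {"press": "5.000", "temp": "18.334"},
        {"press": "25.000", "temp": "12.848"},
    ]
    print(to_vectors(rows))
```

Column-oriented vectors match how Matlab users typically plot one field against another, which is presumably why the field-per-vector layout is convenient.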

Other tasks include training investigators with varying levels of computer skill so that collaboratory interactions can actually take place.

References

Flierl, Glenn R., Bishop, James K.B., Glover, David M., and Paranjpe, Satish. "A Data and Information System for JGOFS." In press. A different version of this paper, "JGOFS Data System Overview," is available on-line at http://lake.mit.edu/datasys/jgsys.html.

LinkWinds User's Guide: The Linked Windows Interactive Data System (Version 2.1), Allan S. Jacobson et al.

U.S. GLOBEC Data Policy, Report Number 10, February 1994. Available on-line from http://www.usglobec.berkeley.edu/usglobec/Reports/reports.home.html.

Acknowledgments

We would like to thank Glenn Flierl for his primary contributions in designing and implementing the JGOFS data management system, and Chris Hammond and Warren Sass for their help in expanding on the system's capabilities. This project is supported by the National Science Foundation Grants OCE-9313674 and OCE-9417423. U.S. GLOBEC Contribution No. 55 and WHOI Contribution No. 9228.