Management of Biological, Physical, and
Chemical Data Within the U.S. GLOBEC Program
Robert C. Groman and Peter H. Wiebe
Woods Hole Oceanographic Institution
Woods Hole, MA 02543 USA
Contained in: Proceedings of the International Workshop on
Oceanographic Biological and Chemical Data Management,
May 20-23, 1996; NOAA Technical Report NESDIS 87, February 1997.
Keywords: Database management, GLOBEC Program,
JGOFS software, Georges Bank Program,
collaboratory
ABSTRACT
The primary goal of the U.S. GLOBEC (GLOBal ocean ECosystems
dynamics) Georges Bank Project is to understand the population
dynamics of the target species on Georges Bank in terms of their
coupling to the physical environment and their predators and prey. A
key component of this project is the sharing of the field, retrospective,
modeled, and derived data collected by the project's scientific
investigators. We use the JGOFS (U.S. Joint Global Ocean Flux Study)
data management software to serve these data and information to all
investigators. The JGOFS system takes advantage of standard World
Wide Web browsers (such as Netscape, Mosaic, Internet Explorer, etc.)
so that everyone can access our database to look at, manipulate, and
retrieve data stored within this distributed system.
INTRODUCTION
U.S. GLOBEC (GLOBal ocean ECosystems dynamics) is a research
program organized by oceanographers and fisheries scientists to address
the question of how global climate change may affect the abundance and
production of animals in the sea. The U.S. GLOBEC Georges Bank
Project, the first of several research modules within U.S. GLOBEC, is a
large multi-national, multi-year, and multi-disciplinary oceanographic
program involving over seventy scientific investigators from twenty-four
organizations in the USA and Canada
(Figure 1). The project's primary
goal is to understand the population dynamics of the target species on
the Bank, cod (Gadus morhua) and haddock (Melanogrammus
aeglefinus), and the copepods, Calanus finmarchicus and
Pseudocalanus spp., in terms of their coupling to the physical
environment and their predators and prey. The ultimate goal is to predict
changes in these populations as the physical and biotic environment
change due to yearly variability and possible longer-term climatic
changes.
In order to accomplish these goals, intensive field, laboratory, and
retrospective studies were initiated in 1994, with major field efforts in
1995, 1997, and 1999. The effort is substantial, requiring broad-scale
surveys of the entire Bank, and process studies which focus both on the
links between the target species and their physical environment, and the
determination of fundamental aspects of these species' life history (birth
rates, growth rates, death rates, etc.).
Equally important are the ongoing modeling efforts, which seek to
provide realistic predictions of the flow field and which utilize the life
history information to produce an integrated view of the dynamics of the
populations.
ORIGIN OF PROJECT DATA SETS
Broad-scale cruises carry out CTD (Conductivity, Temperature, and Depth),
zooplankton, larval fish, and acoustic surveys of Georges Bank and
adjacent waters. Process cruises are designed to measure vital
physiological rates of zooplankton and fish larvae, to determine the
fine-scale vertical and horizontal distribution of zooplankton on Georges
Bank, and to link these to particular physical processes under study.
Zooplankton and fish larvae are collected at stations in a variety of ways,
including pump, Bongo net, and MOCNESS (Multiple Opening/Closing
Net and Environmental Sensing System). They are also remotely sensed
in situ with the VPR (Video Plankton Recorder), TAPS (Tracor Acoustic
Profiling System, a multi-frequency acoustic system that is lowered with
the CTD), and BIOMAPER (a towed acoustic system).
Microzooplankton is sampled with Go-Flo bottles. Hydrography and
zooplankton abundance, distribution, stage frequency distribution, and
condition are measured at stations between drifter stations and along
selected transects. Moorings collect time-series data (hourly averages)
from a variety of sensors including vector velocity (VACM) at two
heights - 15m and 45m, and temperature sensors (VACM, SeaCats, and
Branker temperature-probes) located at 1, 5, 10, 15, 20, 25, 30, 35, 40,
and 45m. Salinity sensors (SeaBird SeaCats sampling at 2 Hz) are at 1,
10, 20, 30, and 40m. During cruises to set or recover moorings,
hydrographic sections are made with a CTD. In addition, at strategic
locations, satellite tracked Lagrangian drifters are deployed monthly on
broad-scale, process, or mooring cruises to provide an additional
measure of current flow. The data collected at sea are analyzed along
with other available data such as satellite sea surface temperature images
derived from full (1 km) resolution AVHRR satellite data for all of
Georges Bank and the Gulf of Maine. Also analyzed are spatially
averaged CZCS pigment images derived from full (1 km) resolution
CZCS and SeaWiFS satellite data for all of Georges Bank and the Gulf
of Maine.
The basic measurements spawn a number of derived computational
products which also must be accommodated (see Table 1).
FOSTERING MULTI-DISCIPLINARY DATA
EXCHANGE AND SYNTHESIS
Our challenge is to develop the database, data analysis, and data
visualization structures which will enable widely distributed, multi-disciplinary
investigators to work with each other's data and to
collaborate with each other without the necessity of leaving their home
institution (sometimes referred to as the collaboratory concept). An
essential element of the project is the U.S. GLOBEC data policy which
fosters quick sharing of data and information with other members of the
program. Key elements of this policy (U.S. GLOBEC Data Policy,
Report Number 10, February 1994) are:
- Data must be made available to the scientific community on a
timely basis,
- Field data, retrospective data sets, and numerical experiments
must all be included in the database,
- As soon as data might be useful to other researchers, the data
should be released (note, the data need not be "final" for this to
occur),
- Model results that would be useful for the interpretation of
field data or comparison with later model studies should also be
included,
- Documentation of the measurement and analysis techniques
used to produce the data set must be submitted with the data to the
Data Management Office,
- Data sets consist of both the actual measurements and also
descriptive data, sometimes referred to as metadata.... U.S.
GLOBEC databases must include all relevant metadata,
- The primary responsibility of the Data Management Office is
to accept data from U.S. GLOBEC investigators, to verify the data
has been properly transmitted, to report on the status of data
submissions to the Program Manager and the Steering Committee,
and most importantly to facilitate the interdisciplinary exchange of
data.
DATA MANAGEMENT STRUCTURE
We selected the U.S. Joint Global Ocean Flux Study (JGOFS) data
management software [Flierl et al., in press] to implement our data
storage and retrieval strategy. The JGOFS software provides a
distributed, flexible, extensible, and data driven methodology to store
and serve data and information about the data (metadata).
Distributed Functionality
The JGOFS system takes advantage of the hypertext transfer
protocol (HTTP) to exchange data between servers and clients. This
enables the JGOFS system to use any UNIX or PC/Windows based
computer as a server. The Georges Bank Program currently uses four
data servers and one applications server. New data servers can be added
easily; indeed we could have a ratio of one server per data contributor.
Similarly, any networked computer system running a Web browser (such
as Netscape, Internet Explorer, or Mosaic) is a supported client and has
access to our on-line data and information. One does not need to know
where the data are stored to access them; rather, the system takes care of
automatically generating the necessary hypertext links on the Web page
each time data are requested. An additional benefit of this approach is
that the data directory is automatically maintained. (See Data Driven,
below.) To access our home page, use http://globec.whoi.edu.
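As an illustration of this client/server exchange, the minimal Python sketch below fetches a page of data or metadata from a server with an ordinary HTTP GET request. The object path used here is hypothetical; in practice the necessary links are generated automatically on the Web pages:

# Minimal sketch (not JGOFS code): retrieve a data or metadata page from a
# server over HTTP, the same protocol a Web browser client uses.
import urllib.request

SERVER = "http://globec.whoi.edu"        # project home page cited above
OBJECT_PATH = "/some-data-object"        # hypothetical path, for illustration

def fetch(url):
    """Return the body of an HTTP GET request as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(fetch(SERVER + OBJECT_PATH))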
Flexible Methods for Handling Data
The JGOFS system can handle any data format and data type necessary.
This is because it uses the data object concept where data and the
necessary method (i.e. software) to access it are linked into a "data
object"
(
Figure 2). The networking and inter-operational software are
common to every data object software module; only the code specific to
accessing and reading the data needs to be written. In this way, we are
able to serve both ASCII and binary data with equal ease, and serve
image and video data using the same software. There are three
"standard" methods distributed with the JGOFS system and these can be
used to handle many situations without further programming. For
example, the default or def method handles ASCII flat files of the form:
leg year month station lat lon press temp sal
1 81 6 3 38.28 -73.53 5.000 18.334 33.570
1 81 6 3 38.28 -73.53 25.000 12.848 34.159
1 81 6 3 38.28 -73.53 49.000 11.070 34.523
1 81 6 3 38.28 -73.53 99.000 11.093 35.090
1 81 6 3 38.28 -73.53 149.000 11.906 35.487
1 81 6 3 38.28 -73.53 199.000 10.819 35.435
1 81 6 3 38.28 -73.53 300.000 8.293 35.126
1 81 6 3 38.28 -73.53 400.000 6.363 35.046
1 81 6 3 38.28 -73.53 500.000 5.724 35.019
The data can be separated by blanks, commas, or tabs. The
columns do not need to line up. The def method can also be used to read
hierarchically structured data where the slowest varying parameters are
listed first. In fact, this is the preferred approach for handling such data.
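A reader for such flat files is straightforward to sketch. The following Python fragment illustrates what the def method must do (it is not the JGOFS code itself): it splits the header and data lines on blanks, commas, or tabs and returns one record per row:

# Illustrative reader for a delimited ASCII flat file like the one above.
import re

def read_flat_file(path):
    """Return a list of {field: value} dictionaries, one per data row."""
    delimiter = re.compile(r"[,\t ]+")
    with open(path) as f:
        fields = delimiter.split(f.readline().strip())
        return [dict(zip(fields, delimiter.split(line.strip())))
                for line in f if line.strip()]

# Example: rows = read_flat_file("hydrography.txt")
#          rows[0]["temp"]  ->  "18.334" for the first record above.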
Extensible Architecture
The JGOFS system is highly extensible since one is free to use any
programming language, or even a scripting language, to help serve data and
information. The JGOFS architecture readily allows for new capabilities
and new features without compromising its design or implementation.
This feature is exploited, for example, when serving our static and video
drifter track images as well as the AVHRR satellite images. In the latter
case, an existing satellite image is made available and converted, as
needed, to a format suitable for display by the browser (i.e., GIF image
format), all within the context of the data system.
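The sketch below illustrates the idea of on-demand conversion using the Pillow imaging library in Python; the file names are hypothetical, and the actual JGOFS data objects implement this differently:

# Convert an existing satellite image to a browser-friendly GIF only when it
# is requested, caching the result for later requests (illustrative only).
import os
from PIL import Image

def serve_as_gif(source_path, cache_path):
    """Return the path of a GIF version of the image, converting if needed."""
    if not os.path.exists(cache_path):
        Image.open(source_path).convert("P").save(cache_path, format="GIF")
    return cache_path

# Example (hypothetical file names):
# serve_as_gif("avhrr_sst_19950601.tif", "avhrr_sst_19950601.gif")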
Data Driven
The system lists the data objects that are currently available. These
lists are generated each time a person requests one and are therefore
always up to date (Figure 3). Furthermore, we
attempt to serve data in the format and form used by the scientific
investigator who collected or generated the data in the first place so that
users of the system are actually using the same data from the same
computer that the contributing investigator is using. Whenever the data
are further processed or errors are removed, the most up-to-date data are
automatically made available to the others in the project. This is
consistent with our policy of making data available as soon as they are
useful to others, even if the data are not yet final.
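The essence of this data-driven behavior can be sketched in a few lines of Python; the directory layout below is hypothetical and is only meant to show that the listing is rebuilt from what is actually on disk at request time:

# Rebuild the list of available data objects on every request, so the listing
# is always current without maintaining a separate catalog (illustrative).
import os

def list_data_objects(data_dir="/data/objects"):   # hypothetical directory
    """Return the names of the data objects currently on disk."""
    return sorted(name for name in os.listdir(data_dir)
                  if not name.startswith("."))

# A request handler would call list_data_objects() each time the page is
# requested and format the result as hypertext links.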
FUTURE VENTURES
Until recently our focus has been to provide access to the data; we have
not addressed the data analysis needs of our researchers. However, we
are expanding our ability to offer analysis tools to users of our system by
adding additional display software and output format options. Currently,
we offer basic x-y plotting
(Figure 4) and basic mapping plots and a
basic flat file listing of the data in ASCII. We recently added the ability
to download a Matlab-formatted data file which can be loaded directly
into Matlab using the standard load command. On Unix-based
machines, it is also possible to access data objects in the data system
directly from within Matlab using the M-file command loadjg. This
command, when given the data object name, accesses the data directly
from the appropriate JGOFS data server over the network and creates a
data vector for each field name in the object. We are also actively
investigating other analysis options such as providing a scientific
visualization capability and links to other display and analysis systems
such as LinkWinds.
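For readers working outside Matlab, the same workflow can be approximated in a few lines. The sketch below is a hypothetical Python analogue of loadjg (the URL is for illustration only) that fetches a flat ASCII data object over the network and turns each field into a numeric vector ready for an x-y plot:

# Hypothetical analogue of loadjg: fetch a whitespace-delimited ASCII data
# object over HTTP and return one numeric vector per field (illustrative).
import urllib.request
import numpy as np

def load_object(url):
    """Return a {field: vector} dictionary for a purely numeric flat file."""
    with urllib.request.urlopen(url) as response:
        lines = response.read().decode("utf-8").splitlines()
    fields = lines[0].split()
    table = np.array([line.split() for line in lines[1:] if line.strip()],
                     dtype=float)
    return dict(zip(fields, table.T))

# Example (hypothetical URL):
# data = load_object("http://globec.whoi.edu/data/hydro.txt")
# then plot data["press"] against data["temp"].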
Other tasks include training investigators with varying computer skills
so that the collaboratory interactions can actually take place.
References
Flierl, Glenn R., James K.B. Bishop, David M. Glover, and Satish
Paranjpe, "A Data and Information System for JGOFS," in press. A
different version of this paper, "JGOFS Data System Overview," is
available on-line at
http://globec.whoi.edu/globec-dir/doc/datasys/jgsys.html.
Jacobson, Allan S., et al., LinkWinds User's Guide: The Linked
Windows Interactive Data System (Version 2.1).
U.S. GLOBEC Data Policy, Report Number 10, February 1994.
Available on-line in Web and PDF formats.
Acknowledgments
We would like to thank Glenn Flierl for his primary contributions in
designing and implementing the JGOFS data management system, and
Chris Hammond and Warren Sass for their help in expanding the
system's capabilities. This project is supported by the National Science
Foundation Grants OCE-9313674 and OCE-9417423. U.S. GLOBEC
Contribution No. 55 and WHOI Contribution No. 9228.