Data Management in the
U.S. GLOBEC Program
SCOR/IGBP Meeting on Data Management for Marine Research Projects
Robert C. Groman
Woods Hole Oceanographic Institution
8 – 10 December 2003

U.S. GLOBEC: Goal
To understand population dynamics. Ultimately, we want to be able to predict changes in the distribution and abundance of key species that result from changes in the physical and biotic environment, such as those caused by climate change.

Three U.S. Programs
Georges Bank – field program started in 1995 with some cruises earlier; field program ended in 1999.
Northeast Pacific – field program started in 2000; also covers the Gulf of Alaska.
Southern Ocean – field program started in 2001 and ended in 2003.

Georges Bank Study Area

Northeast Pacific Study Area

Southern Ocean Study Area

Project Components
Field program: Georges Bank project completed 120 cruises with 360 days at sea.
Laboratory experiments
Retrospective studies
Analysis and synthesis

Georges Bank Data

Northeast Pacific/CGOA

Northeast Pacific/CCS

Northeast Pacific Summary

Southern Ocean

Southern Ocean Data in the Works

Data Policy
Dissemination of data to scientific investigators and others on a timely basis
Make available when useful (not necessarily only when finalized)
Serve data and information, such as reports, papers, and other program documentation

Data Characteristics and Distribution Approach
Data from many distributed researchers (more than 100 contributors)
Open access – read only by everyone
Restricted access supported, but rarely used
Quality control is contributors’ responsibility and on-going
Emphasis on access to data and information as early as possible
Data sets most useful when used with other data

Data Acknowledgement Policy
Any person making substantial use of a data set must communicate with the investigator(s) who acquired the data prior to publication and anticipate that the data collector(s) will be co-author(s) of published results. This extends to model results and to data organized for retrospective studies.
See on-line policy statement

Data can be viewed

Or plotted

Track plot of NBP0202

Or downloaded

Data Sources
Broad-scale cruises
Process cruises
Moorings
Drifters
Satellites
Modeling

Instruments
CTDs
Rosette
MOCNESS (3 flavors)
Bongo tows
Acoustic biomass measurements
Video Plankton Recorder
Drifters, MET packages, . . .

Sensors and Computed Parameters
Conductivity, temperature, pressure, fluorescence, transmittance, acoustics, light (PAR), video, wind speed/direction, AVHRR, . . .
Biomass, taxonomic composition/size distribution, species (counts, size, stage, status, rates, behavior), density, currents, stratification, heat flux, nutrients, turbulence, chlorophyll, . . .

Data Access
Using the JGOFS Data Management System developed by G. Flierl, J. Bishop, D. Glover, and S. Paranjpe
Distributed access via standard web browsers

Web Access
Hierarchical list of data objects
On-line list of data
Downloads as ASCII, Matlab files, or reorganized into single or multiple files
Simple X-Y plots
Created EasyKrig (kriging) and 3-D visualization applications

Distributed Data
Ten distributed data servers use the US JGOFS software
Uses the Web's HTTP protocol, so it integrates very well with standard web pages
Handles tabular data in ASCII, Matlab format, and user-supplied formats using methods.  It is object oriented and data driven.
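
The idea of per-format "methods" that all deliver the same tabular view can be sketched as below. This is only an illustration of the concept, not the US JGOFS software or its API: the registry `METHODS`, the reader functions, the `serve` helper, and the two sample formats are all hypothetical.

```python
import csv
import io

def read_tab_delimited(text):
    """Reader for a tab-delimited ASCII table with a header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def read_key_value(text):
    """Reader for a hypothetical user-supplied format:
    one 'name=value' pair per line, records separated by blank lines."""
    rows = []
    for block in text.strip().split("\n\n"):
        row = {}
        for line in block.splitlines():
            name, _, value = line.partition("=")
            row[name.strip()] = value.strip()
        rows.append(row)
    return rows

# Hypothetical registry: each format name maps to the reader that
# turns that format into a list of row dictionaries.
METHODS = {
    "tsv": read_tab_delimited,
    "keyvalue": read_key_value,
}

def serve(fmt, text):
    """Return rows in one common shape, whatever the stored format."""
    return METHODS[fmt](text)

if __name__ == "__main__":
    tsv = "station\ttemp\n1\t4.5\n2\t5.1\n"
    for row in serve("tsv", tsv):
        print(row)
```

The caller asks for rows and never sees the storage format, which is the sense in which such a design is data driven.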

Nostalgia
In the Olden Days ….
Reformatting and processing data was a common activity
Merging navigation with measured and computed results also took time
First data management system used 9-track tapes for data storage and ran in batch mode
Second system kept data on disk, with techniques to locate data within degree squares to improve performance
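
The degree-square idea in the last bullet can be sketched as follows. This is a minimal illustration, assuming one-degree squares keyed by their south-west corner; the layout of the original system is not described here, and the record fields (`lat`, `lon`, `id`) are hypothetical.

```python
import math
from collections import defaultdict

def degree_square(lat, lon):
    """South-west corner of the one-degree square containing the point."""
    return (math.floor(lat), math.floor(lon))

def build_index(records):
    """Group records by the degree square they fall in."""
    index = defaultdict(list)
    for rec in records:
        index[degree_square(rec["lat"], rec["lon"])].append(rec)
    return index

def query(index, lat_min, lat_max, lon_min, lon_max):
    """Visit only the squares that overlap the box, then filter exactly."""
    hits = []
    for la in range(math.floor(lat_min), math.floor(lat_max) + 1):
        for lo in range(math.floor(lon_min), math.floor(lon_max) + 1):
            for rec in index.get((la, lo), []):
                if (lat_min <= rec["lat"] <= lat_max
                        and lon_min <= rec["lon"] <= lon_max):
                    hits.append(rec)
    return hits

if __name__ == "__main__":
    # Made-up positions, for illustration only.
    records = [
        {"id": 1, "lat": 41.5, "lon": -67.5},
        {"id": 2, "lat": 44.2, "lon": -125.1},
    ]
    index = build_index(records)
    print(query(index, 40, 42, -69, -66))
```

A bounding-box search touches only the squares it overlaps instead of scanning every record, which is where the performance gain comes from.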

Meta-data
Data about data.
Document information about data elements or attributes (name, size, data type, etc), about records or data structures (length, fields, columns, etc), and about data (where it is located, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data.
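
As a rough sketch of how those three levels might be captured in a single record, assuming nothing about the program's actual meta-data schema: the class names, field names, and sample values below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttributeInfo:
    """Meta-data about one data element (column)."""
    name: str
    data_type: str
    units: str = ""

@dataclass
class DatasetMetadata:
    """Meta-data about the data set as a whole."""
    title: str
    owner: str
    location: str
    attributes: List[AttributeInfo] = field(default_factory=list)
    quality_notes: str = ""

    def summary(self) -> str:
        """One-line description listing columns with units where known."""
        cols = ", ".join(
            f"{a.name} [{a.units}]" if a.units else a.name
            for a in self.attributes
        )
        return f"{self.title} (owner: {self.owner}; columns: {cols})"

if __name__ == "__main__":
    # Illustrative values only; not a real data set.
    meta = DatasetMetadata(
        title="Example CTD casts",
        owner="A. Investigator",
        location="ctd/example.dat",
        attributes=[
            AttributeInfo("station", "int"),
            AttributeInfo("temp", "float", "degC"),
        ],
        quality_notes="preliminary, not yet calibrated",
    )
    print(meta.summary())
```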

Detailed Meta-data
Pros – required for full understanding of data within a database management system.  Required if others want to use the data
Cons – pain in the neck to prepare, maintain, and enter (Best to take advantage of tools)
Currently completing Global Change Master Directory’s DIF records

What’s Happening
Organizations creating systems to access their own meta-data and/or data.
Umbrella databases linking to other people’s meta-data and/or data (OBIS, GMBIS, …)
Linking to meta-data is more manageable than is linking to other people’s data.

Other Efforts
LabNet – consortium of marine organizations to make their data available (uses 4D Geobrowser “index cards”)
Ocean Data View – accesses WOCE, NGDC, and other data sets: CTD, bottle, XBT, …
OBIS – “portal” (aggregation server) for biological data (using Darwin Core 2 – OBIS)

Other Efforts, continued
ZOPE – object oriented application server
LAS – web-based, active-image based data interface for registered data.  Used by US JGOFS Program
uBio – (Universal Biological Indexer and Organizer) a networked information service for biological information resources based on the Taxonomic Name Server (TNS), a thesaurus; an index.

Other Efforts, continued
Hexacoral – biggest user in OBIS; uses DiGIR (D.G. Fautin, et al.)
DiGIR – Distributed Generic Information Retrieval. Uses an XML protocol to get the data and extends XML to do queries. Uses the PHP software package to execute the code. Supports 14 or 15 databases, e.g., SQL-based ones. Three options for JGOFS: export to a flat file, export to MySQL, or write our own Perl script to interface directly to DiGIR (ZooGene -> OBIS)

Other Efforts, continued
Oregon State University, Randy Keller and Paul Johnson, mapping specialist at HMRG
Steve Hankin, “An Implementation Plan for the Data and Communication Subsystem of the U.S. Integrated Ocean Observing System”
Margo Edwards at HIG and Dawn Wright at OSU

Other Efforts, continued
RIDGE, petrological data. Endeavor Observatory website, Lamont’s PetDB
SIO Ocean Exploration data portal, http://sioexplorer.ucsd.edu
University of Washington’s Endeavor GIS and Portal to Endeavor Data (PED)

Educational “Tools”
Virtual Research Vessel, University of Oregon and Oregon State University
REVEL, University of Washington
Dive and Discover, WHOI

Protocols
OpenDAP (was DODS) → http
DiGIR → uses XML, but too verbose for physical data. OBIS may use OpenDAP for physical data.
JGOFS → http

Other Projects and Protocols
Apologies for references I’ve missed.
There are many other efforts underway in all these areas.

In the Trenches
What temperature: Sea surface, air, at depth?
Units?
How collected?
How calibrated?
Data quality control is still labor-intensive, even though we can collect and store gigabytes of data daily

Future Data Management and Display Efforts
Enhance data search capabilities
Add more graphical display (visualization) options
Improve the interface between the data system and visualization/analysis tools
Consider other protocols, such as OpenDAP
