Serving U.S. GLOBEC Data Guidelines
Introduction
The fundamental objectives of U.S. GLOBEC are dependent upon the
cooperation of scientists from several disciplines. Physicists,
biologists, chemists, meteorologists, resource managers, and others
make use of data collected during U.S. GLOBEC field programs to
further our understanding of the interplay of physics, biology, and
chemistry. Our objectives require quantitative analysis of
interdisciplinary data sets and therefore data must be exchanged
between researchers. To extract the full scientific value, data
must be made available to the scientific community on a timely
basis.
Adding your data to the U.S. GLOBEC Data Management System can be
as easy as sending the data to the Data Management Office as a flat
file, spreadsheet or other computer readable form via e-mail
(dmo@whoi.edu) or via ftp (globec.whoi.edu, change directory to
/pub/incoming). In addition to the data, we also will need
information about who collected the data, who should be contacted
with questions (if different from the collector(s)), what
methodology was used to collect and process the data into its
current form, and what the field names mean, including their units.
Once we get the data we will prepare it for inclusion in the U.S.
GLOBEC data directory (http://globec.whoi.edu/jg/dir/globec/) and
ask you for your preferences on where the link to your data should
be placed in this directory.
However, since the U.S. GLOBEC Data Management System permits data
to be served from any computer capable of running this software
(currently Unix-based machines), it is also possible, and in many
ways preferable, for contributors to serve their data from their
own computer. The serving software is available from the data
server web site via ftp
(ftp://globec.whoi.edu/pub/software/JGOFS_GLOBEC/). The only other
requirement is that your computer have web-serving software
installed, such as Apache. (If your computer has a web site, then
it has this software already available.) While it is possible to
install the JGOFS/GLOBEC software yourself and populate it with
your data on your own, many have found it useful to contact the
Data Management Office for help and suggestions. If you wish, the
DMO will install the software on your machine and get things set up
for you. No special privileges are required to install the
software, although your Webmaster will need to define an entry
(called a ScriptAlias) in the httpd.conf web server configuration
file so your web server knows where to look for the software.
Data Serving Guidelines
The guidelines below describe how to organize and submit your data
to the U.S. GLOBEC data management system. They are based on the
recommendations of the U.S. GLOBEC Data Policy report and on
suggestions from Glenn Flierl, Bob Groman, Dicky Allison, and
others. The main strength of the system is that it can accommodate
almost any kind of data with a suitably written method (i.e.
computer program). However, when several people collect similar
data sets, following a few guidelines makes it easier for others to
retrieve the data for subsequent review and analysis. These
guidelines are as follows:
- Structure data hierarchically, with the slowest-changing
variable first (e.g. cruiseid, leg, station, then cast
number).
- Use variable/parameter names as defined in the program's
thesaurus. If your variable is not in the thesaurus, please
contact the Data Management Office (dmo@whoi.edu) so we can add it.
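As an illustration of the first guideline, the following minimal
Python sketch orders flat records so that the slowest-changing
variable comes first, and then nests them by level. It is not part
of the JGOFS/GLOBEC software; the record values and the helper names
(sort_hierarchically, nest) are assumptions made up for this example.

```python
from itertools import groupby
from operator import itemgetter

# Hierarchy levels, slowest-changing first (as in the guideline above).
HIERARCHY = ("cruiseid", "leg", "station", "cast")

# Hypothetical flat records; all values are invented for illustration.
records = [
    {"cruiseid": "CRUISE_B", "leg": 1, "station": 1, "cast": 1, "depth": 5.0},
    {"cruiseid": "CRUISE_A", "leg": 2, "station": 1, "cast": 1, "depth": 5.0},
    {"cruiseid": "CRUISE_A", "leg": 1, "station": 2, "cast": 1, "depth": 5.0},
    {"cruiseid": "CRUISE_A", "leg": 1, "station": 1, "cast": 2, "depth": 5.0},
    {"cruiseid": "CRUISE_A", "leg": 1, "station": 1, "cast": 1, "depth": 5.0},
]


def sort_hierarchically(rows, keys=HIERARCHY):
    """Return rows sorted by the hierarchy keys, slowest-changing first."""
    return sorted(rows, key=itemgetter(*keys))


def nest(rows, keys=HIERARCHY):
    """Nest rows into dicts, one dict level per hierarchy key.

    The innermost value is a list of the remaining (non-key) fields.
    """
    if not keys:
        return [dict(r) for r in rows]
    first, rest = keys[0], keys[1:]
    ordered = sorted(rows, key=itemgetter(first))
    return {
        value: nest(
            [{k: v for k, v in r.items() if k != first} for r in group], rest
        )
        for value, group in groupby(ordered, key=itemgetter(first))
    }


if __name__ == "__main__":
    for row in sort_hierarchically(records):
        print(row)
```

With the data nested this way, each combination of cruiseid, leg,
station, and cast number identifies one group of values.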
The submission and serving of data is an iterative process. The
data contributor and the Data Management Office work together to
serve the data in a way and at a time that is most useful to the
contributor and the scientific community as a whole. The data can be
easily re-submitted and re-served if later processing improves data
quality.
Adding Your Data
To add data to the U.S. GLOBEC data management system, take the
following steps:
- Decide where the data will reside. If you wish to 'serve' the
data from your own machine, you can download a copy of the
JGOFS/GLOBEC data management software from our ftp site at
ftp://globec.whoi.edu/pub/software/JGOFS_GLOBEC/. We will assist
you in its installation. At present, the server computer must be
running a UNIX-based operating system. If you do not wish to
'serve' your data from your own machine, the data may reside instead
on the U.S. GLOBEC server (globec.whoi.edu).
- Organize the data in a way that is useful for your needs.
Chances are that is how the data will be useful to others. If you
have your own data management system and want to use it, such as
Oracle or MySQL, by all means use it. We have methods that can
extract these data out of your system and serve them via the U.S.
GLOBEC data management system more or less automatically. If you
prefer to keep your data in a spreadsheet (e.g. an Excel
spreadsheet), that is fine too. In that case, we will extract the
data from the spreadsheet into ASCII and serve the data that way.
We can also reorganize the data into what is called a hierarchical
form as described above, to foster unique retrieval of values. This
reorganization can also be done automatically so you need not take
the time to reorganize your data.
- Please consult the U.S. GLOBEC thesaurus
(http://globec.whoi.edu/globec-dir/thesaurus.html) for the preferred
name of data values. It is much easier to compare results if people
use a common set of field names. Contact the DMO if there are field
names that you require that are not yet defined. We will work with
you to create new names that meet your needs.
- Create the information necessary in order to use these data.
This includes a description of the field names used, the data
collection and data processing methodology and steps taken to
produce these data, and an explanation of unusual values, including
missing data. Provide the Data Management Office with the names of
people contributing the data and the names of people who should be
contacted with questions, if different. Keep in mind the long-term
use of these data and please do not specify a graduate student as
the sole contact. The material can be provided in an e-mail to the
Data Management Office, either as plain text or in any common
word processing format, such as Microsoft Word or WordPerfect.
This information is often called metadata: the information not
already contained within the data file that makes the data truly
useful to others.
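Before sending the metadata, it may help to check that nothing in
the list above has been left out. The Python sketch below is one
possible way to do that. The layout of the metadata record, the item
names in REQUIRED, and the example names and field names are all
assumptions made for this illustration; the Data Management Office
does not prescribe this format.

```python
# Items the text above asks contributors to supply (names are hypothetical).
REQUIRED = ("contributors", "contacts", "methodology", "fields", "missing_values")


def missing_metadata(meta):
    """Return a list of problems found in a metadata dict (empty if none)."""
    problems = [f"missing item: {name}" for name in REQUIRED if not meta.get(name)]
    for name, info in meta.get("fields", {}).items():
        if not info.get("description"):
            problems.append(f"field {name!r} has no description")
        if "units" not in info:
            problems.append(f"field {name!r} has no units")
    return problems


# A made-up, complete record. People and field names are placeholders.
example = {
    "contributors": ["A. Scientist"],
    "contacts": ["B. Investigator"],  # not solely a graduate student
    "methodology": "Describe collection and processing steps here.",
    "fields": {
        "depth": {"description": "sample depth", "units": "meters"},
        "temp": {"description": "water temperature", "units": "degrees C"},
    },
    "missing_values": "Missing data are reported as 'nd'.",
}

# A made-up, incomplete record used to show what the check reports.
incomplete = {
    "contributors": ["A. Scientist"],
    "fields": {"depth": {"description": "sample depth"}},
}

if __name__ == "__main__":
    print(missing_metadata(example))
    for problem in missing_metadata(incomplete):
        print(problem)
```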
Accessing Data
Of course, a data management system would have very little use
unless one could extract data from it. The JGOFS/GLOBEC system
allows you to select data meeting specified criteria (e.g. all data
where depth is greater than 120 meters); project data (e.g. only
show me the latitude, longitude, date and sea surface temperature
values); join data with attributes in common; make a simple X-Y plot
to check trends; do a few rudimentary statistical counts; and,
perhaps most importantly, download the outcome of your selection and
projection operations as a simple text (ASCII) table, as a
Matlab-compatible binary file, and (soon) as a NetCDF file.
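For readers unfamiliar with the terms, the Python sketch below shows
what selection, projection, and joining mean, using small in-memory
tables. It does not use or reproduce the JGOFS/GLOBEC software; the
function names, field names, and data values are assumptions chosen
only to illustrate the operations described above.

```python
import csv
import io


def select(rows, predicate):
    """Selection: keep only the rows that meet the criterion."""
    return [r for r in rows if predicate(r)]


def project(rows, fields):
    """Projection: keep only the named fields of each row."""
    return [{f: r[f] for f in fields} for r in rows]


def join(left, right, on):
    """Join two tables on the attributes they have in common."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[k] for k in on), []).append(r)
    return [
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[k] for k in on), [])
    ]


def to_ascii_table(rows, fields):
    """Render rows as a tab-delimited text (ASCII) table with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up example tables.
casts = [
    {"station": 1, "depth": 50.0, "temp": 11.2},
    {"station": 1, "depth": 150.0, "temp": 6.4},
    {"station": 2, "depth": 200.0, "temp": 5.9},
]
stations = [
    {"station": 1, "lat": 41.5, "lon": -67.0},
    {"station": 2, "lat": 42.0, "lon": -66.5},
]

# All data where depth is greater than 120 meters, joined with positions,
# showing only latitude, longitude, depth, and temperature.
deep = select(casts, lambda r: r["depth"] > 120)
joined = join(deep, stations, on=("station",))
columns = ["lat", "lon", "depth", "temp"]
table = project(joined, columns)

if __name__ == "__main__":
    print(to_ascii_table(table, columns), end="")
```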
Last modified: September 27, 2006