Serving U.S. GLOBEC Data Guidelines

Introduction

The fundamental objectives of U.S. GLOBEC are dependent upon the cooperation of scientists from several disciplines. Physicists, biologists, chemists, meteorologists, resource managers, and others make use of data collected during U.S. GLOBEC field programs to further our understanding of the interplay of physics, biology, and chemistry. Our objectives require quantitative analysis of interdisciplinary data sets and therefore data must be exchanged between researchers. To extract the full scientific value, data must be made available to the scientific community on a timely basis.

Adding your data to the U.S. GLOBEC Data Management System can be as easy as sending the data to the Data Management Office as a flat file, spreadsheet or other computer readable form via e-mail (dmo@whoi.edu) or via ftp (globec.whoi.edu, change directory to /pub/incoming). In addition to the data, we also will need information about who collected the data, who should be contacted with questions (if different than the collector(s)), what methodology was used to collect and process the data into its current form, and what the field names mean, including their units. Once we get the data we will prepare it for inclusion in the U.S. GLOBEC data directory (http://globec.whoi.edu/jg/dir/globec/) and ask you for your preferences on where the link to your data should be placed in this directory.
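
For contributors who prefer to script the ftp transfer, the following Python sketch shows one way it might look. The host and directory come from the paragraph above. The anonymous login, the function names, and the file names are assumptions made for illustration; check with the Data Management Office before relying on them.

```python
import ftplib
import io

HOST = "globec.whoi.edu"    # host named in the text above
INCOMING = "/pub/incoming"  # directory named in the text above


def upload(ftp, remote_name, payload):
    """Store payload (bytes) as remote_name in the incoming directory.

    ftp is any object with cwd() and storbinary() methods, such as an
    ftplib.FTP connection that has already logged in.
    """
    ftp.cwd(INCOMING)
    ftp.storbinary("STOR " + remote_name, io.BytesIO(payload))


def send_file(local_path, remote_name):
    """Read a local file and upload it (hypothetical helper)."""
    with open(local_path, "rb") as handle:
        payload = handle.read()
    with ftplib.FTP(HOST) as ftp:
        # Assumption: the server accepts anonymous logins.
        ftp.login()
        upload(ftp, remote_name, payload)
```

A call such as send_file("casts.txt", "casts.txt") would transfer one file; the accompanying information about collectors, contacts, methodology, and field names still needs to be sent separately.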

However, because the U.S. GLOBEC Data Management System permits data to be served from any computer capable of running its software (currently Unix-based machines), it is also possible, and in many ways preferable, for contributors to serve their data from their own computers. The serving software is available from the data server web site via ftp (ftp://globec.whoi.edu/pub/software/JGOFS_GLOBEC/). The only other requirement is that your computer have web-serving software installed, such as Apache. (If your computer hosts a web site, then it already has this software.) While it is possible to install the JGOFS/GLOBEC software and populate it with your data on your own, many have found it useful to contact the Data Management Office (DMO) for help and suggestions. If you wish, the DMO will install the software on your machine and get things set up for you. No special privileges are required to install the software, although your Webmaster will need to define an entry (called a ScriptAlias) in the httpd.conf web server configuration file so your web server knows where to look for the software.

Data Serving Guidelines

Some guidelines on how to organize and submit your data to the U.S. GLOBEC data management system have been developed, based on the recommendations of the U.S. GLOBEC Data Policy report and suggestions from Glenn Flierl, Bob Groman, Dicky Allison, and others. The main strength of the system is that it can accommodate almost any kind of data with a suitably written method (i.e. computer program). However, when several people collect similar data sets, following some common guidelines makes it easier for others to retrieve the data for subsequent review and analysis.

These guidelines are as follows:

The submission and serving of data is an iterative process. The data contributor and the Data Management Office work together to serve the data in a way and at a time that is most useful to the contributor and the scientific community as a whole. The data can be easily re-submitted and re-served if later processing improves data quality.

Adding Your Data

To add data to the U.S. GLOBEC data management system, take the following steps:

  1. Decide where the data will reside. If you wish to 'serve' the data from your own machine, you can download a copy of the JGOFS/GLOBEC data management software from our web site at ftp://globec.whoi.edu/pub/software/JGOFS_GLOBEC/. We will assist you in its installation. At present, the server computer must be running a UNIX-based operating system. If you do not wish to 'serve' your data from your own machine, the data may reside instead on the U.S. GLOBEC server (globec.whoi.edu).
  2. Organize the data in a way that is useful for your needs. Chances are that is also how the data will be useful to others. If you have your own data management system, such as Oracle or MySQL, and want to use it, by all means do so. We have methods that can extract the data from your system and serve them via the U.S. GLOBEC data management system more or less automatically. If you prefer to keep your data in a spreadsheet (e.g. an Excel spreadsheet), that is fine too. In that case, we will extract the data from the spreadsheet into ASCII and serve the data that way. We can also reorganize the data into what is called a hierarchical form as described above, to foster unique retrieval of values. This reorganization can also be done automatically so you need not take the time to reorganize your data.
  3. Please consult the U.S. GLOBEC thesaurus (http://globec.whoi.edu/globec-dir/thesaurus.html) for the preferred name of data values. It is much easier to compare results if people use a common set of field names. Contact the DMO if there are field names that you require that are not yet defined. We will work with you to create new names that meet your needs.
  4. Assemble the information others will need in order to use these data. This includes a description of the field names used, the data collection and data processing methodology and steps taken to produce these data, and an explanation of unusual values, including missing data. Provide the Data Management Office with the names of the people contributing the data and the names of the people who should be contacted with questions, if different. Keep in mind the long-term use of these data and please do not specify a graduate student as the sole contact. The material can be provided in an e-mail to the Data Management Office, either as plain text or as a document from any common word processing program, such as Microsoft Word or WordPerfect. This information is often called metadata: the information not already contained within the data file that makes the data truly useful to others.
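
As an illustration of steps 2 through 4, the following Python sketch writes a small data set as a tab-delimited ASCII table and builds a plain-text description to accompany it. It is only one possible layout, not a format required by the Data Management Office. The field names, units, the "nd" missing-value marker, and the function names are all hypothetical; consult the thesaurus for the preferred field names.

```python
import csv
import io

# Hypothetical field names and meanings; the U.S. GLOBEC thesaurus
# lists the preferred names.
FIELDS = {
    "lat": "latitude, decimal degrees north",
    "lon": "longitude, decimal degrees east",
    "depth": "sample depth, meters",
    "temp": "water temperature, degrees Celsius",
}

# Assumed marker for missing values; document whichever one you use.
MISSING = "nd"


def to_flat_file(records):
    """Return the records as tab-delimited ASCII text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(list(FIELDS))
    for rec in records:
        row = []
        for name in FIELDS:
            value = rec.get(name)
            row.append(MISSING if value is None else value)
        writer.writerow(row)
    return out.getvalue()


def describe(collectors, contacts, methodology):
    """Return a plain-text metadata description to send with the data."""
    lines = [
        "Collected by: " + ", ".join(collectors),
        "Contact: " + ", ".join(contacts),
        "Methodology: " + methodology,
        "Missing values are written as: " + MISSING,
        "Fields:",
    ]
    lines += ["  {}: {}".format(name, meaning) for name, meaning in FIELDS.items()]
    return "\n".join(lines) + "\n"
```

Keeping the field descriptions in one place, as FIELDS does here, means the data file header and the metadata text cannot drift apart.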

Accessing Data

Of course, a data management system would have very little use unless one could extract data from it. The JGOFS/GLOBEC system allows you to select data meeting specified criteria (e.g. all data where depth is greater than 120 meters); project data (e.g. only show me the latitude, longitude, date and sea surface temperature values); join data with attributes in common; make a simple X-Y plot to check trends; do a few rudimentary statistical counts; and, perhaps most importantly, download the outcome of your selection and projection operations as a simple text (ASCII) table, as a Matlab-compatible binary file, and (soon) as a NetCDF file.
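
The selection, projection, and join operations mentioned above can be illustrated with a short Python sketch. This is not the JGOFS/GLOBEC software or its interface; it only shows what the three operations mean on a small in-memory table. The function names, field names, and sample values are invented for the example.

```python
def select(rows, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def project(rows, fields):
    """Projection: keep only the named fields of each row."""
    return [{name: row[name] for name in fields} for row in rows]


def join(left, right, key):
    """Join: combine rows from two tables that share the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def to_ascii_table(rows, fields):
    """Render rows as a tab-delimited ASCII table with a header line."""
    lines = ["\t".join(fields)]
    for row in rows:
        lines.append("\t".join(str(row[name]) for name in fields))
    return "\n".join(lines) + "\n"


# Invented sample data: station positions and bottle samples.
stations = [
    {"station": 1, "lat": 41.5, "lon": -67.2},
    {"station": 2, "lat": 41.9, "lon": -66.8},
]
samples = [
    {"station": 1, "depth": 50, "temp": 9.1},
    {"station": 1, "depth": 150, "temp": 6.4},
    {"station": 2, "depth": 200, "temp": 5.8},
]

# All data where depth is greater than 120 meters, showing only
# latitude, longitude, and temperature.
deep = select(join(samples, stations, "station"), lambda r: r["depth"] > 120)
table = to_ascii_table(project(deep, ["lat", "lon", "temp"]), ["lat", "lon", "temp"])
```

Here the join attaches each sample to its station position, the selection keeps the two samples deeper than 120 meters, and the projection drops every field except the three requested.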


Last modified: September 27, 2006