"User" documentation for defgb 2.3-diagnostics optional file

Note: this must be reformatted for on-line Web access.
31 May 1996 17:48:29 -0400
   Description of first .objects parameter for GLOBEC methods (31 May 96)
          (Describes methods generated from defgb v 2.3 and later)

        The first parameter (parameter 0) to a GLOBEC method will be a file
specification, optionally preceded by an at-sign, or optionally enclosed in
parentheses. 
        If neither at-sign nor parentheses are present, the parameter is the
name of the level 0 data file.
        If the at-sign is present, the parameter is assumed to be the name of
a file containing other file specifications, including that of the level 0
data file.  This file is called the indirect input file.
        If the parameter is enclosed in parentheses, it names a command whose
output will serve as the level 0 data file.  In this case, the file
specification may be followed by parameter strings.  Piping and/or i/o
redirection in the command is not allowed.
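
        The three forms of parameter 0 can be sketched in Python as follows.
This is an illustration only, not defgb's own code; the function name
classify_parameter0 and the returned tuple layout are hypothetical.

```python
def classify_parameter0(param):
    """Classify parameter 0 as described above (illustrative sketch).

    Returns a tuple (kind, specification, parameter_strings).
    """
    text = param.strip()
    if text.startswith("@"):
        # At-sign: the rest names the indirect input file.
        return ("indirect", text[1:].strip(), [])
    if text.startswith("(") and text.endswith(")"):
        # Parentheses: a command, optionally followed by parameter strings.
        words = text[1:-1].split()
        if not words:
            raise ValueError("empty command specification")
        return ("command", words[0], words[1:])
    # Neither: the name of the level 0 data file.
    return ("datafile", text, [])
```

The sketch does not check for piping or i/o redirection, which the text
above says is not allowed in a command.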

---
NOTE: Any input line which begins with *** is considered an error message
      by defgb.  The line will be passed to the error processing routines
      (see documentation of diagnostics optional file for details
      about controlling error messages) and defgb will exit with an error
      status.
---

                           The indirect input file

        The format of this file begins with optional comment lines, formatted
the same way as comment lines in the level 0 data file for this method.  The
data in the file consist of lines holding one file specification apiece, 
optionally preceded by a file descriptor.  The files named in the indirect
input file are called optional files.

	An ordinary file specification is the name of a file containing
information about a particular option of the method.
	The file specification may be preceded by a backslash character.
If so, it indicates that what follows is not actually a file name.
Instead, it is one data record formatted for that particular optional file.
(The purpose of this "escape mechanism" is to allow small amounts of
information to reside in the indirect file itself without requiring another
file.)
	The file specification may also be a keyword of the form
.objects_parameter_N, to indicate that the file specification will come
from parameter N of the .objects file that includes the definition of the
object being accessed.  N begins with 1, signifying the first parameter
after the name of the indirect input file.  The _N is optional.  If not
present, N is determined from the file descriptor or the position of
the line within the indirect file (see below).  This type of file
specification may be useful if, for example, it's desirable to associate the
same indirect input file with many different level 0 data files.

        The file descriptor, if present, is separated from the file it
describes by spaces, tabs, and/or equal signs. The file descriptor tells the
method what kind of file is being named.  The descriptors are: datafile,
transvar, inpwidths, removals, gmtparams, and diagnostics.  
        If no descriptor is present, the file is assumed to be the type
implied by its position within the indirect input file.  The first file
specified in the indirect input file is, if undescribed, the datafile.  The
second is the transvar file.  The third is the inpwidths file.  The fourth is
the removals file.  The fifth is the gmtparams file.  The sixth is the
diagnostics file.  It is possible to mix described and undescribed files.  It
is not possible to specify an undescribed file without specifying all
preceding files.
        If the file specification is the keyword .objects_parameter, the
"number" of the optional file is used to tell which parameter in the .objects
file corresponds to the optional file.  The datafile will be expected from
parameter 1 (first parameter after the indirect input file), the transvar file
from parameter 2, etc.  The order may be changed by specifying the parameter
number after the keyword (see above).
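
        The descriptor and position rules above can be sketched as follows.
This is a simplified illustration, not defgb's parser.  The function name
parse_indirect_lines and the "#" comment character (taken from the sample
files in this document) are assumptions, and the .objects_parameter keyword
is not handled.

```python
import re

# Positional order of undescribed files, as listed in the text above.
DESCRIPTORS = ["datafile", "transvar", "inpwidths",
               "removals", "gmtparams", "diagnostics"]
PREFIXES = ("pre-translation-", "post-translation-")


def parse_indirect_lines(lines, comment_char="#"):
    """Map each optional file type to its specification (sketch)."""
    files = {}
    position = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(comment_char):
            continue
        kind, prefix, spec = None, "", line
        # A descriptor is separated from its file by spaces, tabs, or "=".
        match = re.match(r"([^\s=]+)[\s=]+(.+)", line)
        if match:
            word = match.group(1)
            for candidate in PREFIXES:
                if word.startswith(candidate):
                    prefix, word = candidate, word[len(candidate):]
                    break
            if word in DESCRIPTORS:
                kind, spec = word, match.group(2)
            else:
                prefix = ""
        if kind is None:
            # Undescribed: the type is implied by the line's position.
            if position >= len(DESCRIPTORS):
                raise ValueError("too many undescribed files")
            kind = DESCRIPTORS[position]
        # A leading backslash marks an inline data record, not a file name.
        inline = spec.startswith("\\")
        if inline:
            spec = spec[1:].strip()
        files[kind] = {"spec": spec, "inline": inline, "prefix": prefix}
        position += 1
    return files
```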

        The datafile specifies the level 0 data file.  It may be enclosed in
parentheses to indicate that the level 0 data will come from a script or
executable program.  The transvar file specifies the file containing variable
name translations.  The inpwidths file specifies the field widths of
fixed-field-width input variables.  The removals file contains the names of
input variables that are to be logically removed from the dataset.  The
gmtparams file contains the names of input variables that are to be combined
to produce a gmt time.  The diagnostics file contains information about what
output to produce (besides the data), and the file to which the output goes.
See below for the details of contents and formatting of each optional file.

	File specifications must each include any necessary directory 
specifications.  That is, there is no assumption made about the relationship
between the directories holding the various files (eg, that they all come
from the directory of the first file named).

        Optional files may contain information involving variables (eg,
datafile, inpwidths and removals files).  The variable names in these files
may be either the names before transvar translation or after.  A file descriptor
may contain the prefix "pre-translation-" or the prefix "post-translation-" to
indicate which mode is in effect.  If neither prefix is supplied,
"post-translation-" is assumed (except for datafiles, where "pre-translation-"
is assumed).  It is an error if "pre-translation-" is supplied but no transvar
file is supplied in the indirect file, or if the prefixes are supplied for
optional files that do not contain variable names, or if the prefixes are
applied to the transvar descriptor.
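
        The prefix rules in the preceding paragraph can be sketched as
follows.  The function name translation_mode and its return values are
hypothetical.  The set of descriptors that accept a prefix is taken from the
per-file sections later in this document.

```python
# Descriptors documented as accepting the translation prefixes.
ACCEPTS_PREFIX = {"datafile", "inpwidths", "removals", "gmtparams"}
PREFIXES = {"pre-translation-": "pre", "post-translation-": "post"}


def translation_mode(descriptor, has_transvar):
    """Return (base descriptor, mode); mode is "pre", "post", or None."""
    for prefix, mode in PREFIXES.items():
        if descriptor.startswith(prefix):
            base = descriptor[len(prefix):]
            if base not in ACCEPTS_PREFIX:
                raise ValueError(base + " does not accept a prefix")
            if mode == "pre" and not has_transvar:
                raise ValueError("pre-translation- needs a transvar file")
            return base, mode
    if descriptor not in ACCEPTS_PREFIX:
        # transvar and diagnostics: the mode does not apply.
        return descriptor, None
    # No prefix: datafiles default to "pre", all others to "post".
    return descriptor, ("pre" if descriptor == "datafile" else "post")
```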

	It may be that a method must or must not have a particular
optional file.  Inner can be compiled with switches to indicate that a
particular file is not, in fact, optional.  See documentation for details.
The level 0 data file is not optional.

	Sample file:
# Test indirect file for met data.
# datafile = not required if data file on first line
# removals & gmtparams data short enough to be included in here via \
# variable names specified to gmtparams require translation.  HR variable
#   is not in transvar.dat, and therefore is the same pre- and post-
#   translation, so it doesn't care about the *-translation prefix.
# WJS Mar 96
/export/home/rpayne/emet259.dat
transvar=/export/home/rpayne/transvar.dat
pre-translation-gmtparams= \ hrname=HR; mnname=MN; gmtname=MN
removals = \ HR
inpwidths = /export/home/rpayne/widths.dat

---

                          The optional input files

	The format of the optional files may differ on a per-file basis.
However, there is a common format, used by several, that is described here.
Files in this common format have two types of records: comment and data.
Comment records that occur before the first data record are inserted in
the data comment stream.  Comment records that occur after the first data
record are ignored.  Comment records are specified the same way as comments
for the level 0 data file.
	Data records consist of one or more pairs of strings.  Pairs are
separated by semicolons.  Within a pair, the strings are separated by blanks,
tabs, and/or equal signs.
	The first string of a pair must be unique among the first
strings of all pairs.
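
	A common-format data record can be split into its pairs along these
lines.  This is a minimal sketch, not defgb's code; the function name
parse_common_record is hypothetical, and uniqueness is checked only within
the one record passed in.

```python
import re


def parse_common_record(record):
    """Split one common-format data record into {first: second} pairs."""
    pairs = {}
    for chunk in record.split(";"):          # pairs are separated by ";"
        chunk = chunk.strip()
        if not chunk:
            continue
        # Within a pair: blanks, tabs, and/or equal signs separate strings.
        parts = [p for p in re.split(r"[ \t=]+", chunk) if p]
        if len(parts) != 2:
            raise ValueError("expected a pair, got: " + chunk)
        first, second = parts
        if first in pairs:
            raise ValueError("first string is not unique: " + first)
        pairs[first] = second
    return pairs
```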

-------

datafile
	The format of the data file differs from method to method.  The
format for the default method, def, is described elsewhere.  The datafile
file descriptor accepts the optional pre-translation- and post-translation- 
prefixes.  The datafile is required to be present.

-------

transvar
	The presence of this file indicates that any variable names in
the dataset and also in the translation file are to be changed by the method
to the corresponding names in the translation file.
	There is no required correspondence between the variable names in the
dataset and the variable names in the translation file.  For example, you
may have a single translation file for all the variables found in all your
datasets.  On the other hand, you may translate only a few of the names in
a particular dataset.
	It is simplest if all the names, both before and after translation,
are unique.  It is not permitted for a single variable to be translated to
more than one synonym.  All the other combinations are allowed, but must
be used carefully.  For example, one dataset might have the variable LONG
and another the variable name LONGITUDE.  If you want the name of this
variable to appear as lon, you might well build one translation file, and in
it translate both LONG and LONGITUDE to lon.  This will work unless the
translation file is used on a dataset that has both the LONG and LONGITUDE
variables in it.  In this case, both will translate to lon, and the 2nd
occurrence will be flagged as a "Duplicate variable name" error.  Similarly,
it is possible but tricky to translate A to X and X to Y in the same
translation file.
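
	The LONG/LONGITUDE case can be illustrated with a small sketch.  It
assumes each dataset name is looked up once in the translation list, with no
chaining; how defgb handles the A to X and X to Y case is not specified
here.  The function name translate_names is hypothetical.

```python
def translate_names(variables, translations):
    """Apply a translation list to a dataset's variable names (sketch)."""
    seen = set()
    result = []
    for name in variables:
        # Names absent from the translation list are left unchanged.
        new_name = translations.get(name, name)
        if new_name in seen:
            raise ValueError("Duplicate variable name: " + new_name)
        seen.add(new_name)
        result.append(new_name)
    return result
```

With translations {"LONG": "lon", "LONGITUDE": "lon"}, a dataset holding
only LONG translates cleanly, while one holding both LONG and LONGITUDE
raises the duplicate-name error on the second occurrence.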

        The transvar optional file is in common format.  Its descriptor,
however, does NOT accept the pre-translation or post-translation prefix.
The first element of a pair is the "old" name, found in the datasets.
The second element is the "new" synonym name to which the method is to
translate the old name.

        Sample file:
# defmet translation list from RCG via Email 29 Dec 95.   WJS 23 Mar 96
# 25 Mar 96.  Play with HR, MN translations to get gmt stuff to work. WJS
YD 	=	yrday_gmt     
LAT	=	lat
LONG 	=	lon        
SMG 	=	speed_trim  
EDO_Z 	=	depth_w 
#	Will use defgb CREATE_GMT capability to make time_gmt variable
#	from HR & MN.  To store it over MN, will translate MN into it.
#	Will use defgb variable removal capability to logically remove HR,
#	so it doesn't need translation
MN	=	time_gmt

-------

inpwidths
        The presence of this file indicates that some of the input data is to
be read from fixed-width fields instead of via free-field input. Variables
found in this file are read from fixed-width fields.  Variables not in the
file are read with free-field input.  Variables at a particular level must be
either all free-field or all fixed-field.
        It is not possible to specify the starting column for a field.
The starting column is determined by the position of the variable within a
data record, and the sum of the widths of the preceding variables (if any).
The order of the variables is determined by the variable list for the level
being read.  Accordingly, the sum of the field widths of the variables present
at a level must equal the width of the data records of that level (before any
next-level file specification).
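
        The way starting columns follow from the widths can be sketched as
below.  The function names are hypothetical, and the sketch assumes a record
that carries no next-level file specification.

```python
def field_columns(variable_order, widths):
    """Return {name: (start, end)} with 0-based, end-exclusive columns."""
    columns = {}
    start = 0
    for name in variable_order:
        # Each field starts where the preceding fields end.
        columns[name] = (start, start + widths[name])
        start += widths[name]
    return columns


def split_record(record, variable_order, widths):
    """Cut one fixed-width data record into its fields."""
    total = sum(widths[name] for name in variable_order)
    if len(record) != total:
        raise ValueError("record width does not equal sum of field widths")
    columns = field_columns(variable_order, widths)
    return {name: record[a:b] for name, (a, b) in columns.items()}
```

For example, with widths lat=7, lon=8 and temp_air=5, the 20-character
record " 41.523-070.671 12.5" splits correctly even though latitude and a
negative longitude are not separated by a blank.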

	The inpwidths optional file is in common format.  Its descriptor
accepts the optional pre-translation- and post-translation- prefixes.
The first element of a pair is a variable name.  The second element is
the field width of that variable in the dataset.

	Sample file:
# latitude & longitude recorded to 3 decimal places
# Must read data in fixed field since latitude and longitude might
# not be separated if longitude goes negative
lat		=	7
lon		=	8
temp_air	=	5
-------

removals
	The default method allows data selection by providing a list of
variables to process.  The presence of a removals optional file allows
data selection by providing a list of variables to omit.  If a variable
is present in both a selection and omission list, the omission list takes
priority.

        The removals optional file has the same comment record rules as a
common format file.  Each data record, however, consists of a list of
variables separated by blanks, tabs, commas and/or semicolons.  The removals
descriptor accepts the optional pre-translation- and post-translation-
prefixes.

	Removed variables are available to calculations performed by
inner (such as gmt calculation).  They are not available to outer.

	Sample file:
# Remove HR variable, since its info will be in GMT variable
HR

-------

gmtparams
	The presence of this file indicates that GMT is to be calculated
from variables in the dataset containing hour and minute information.
The hour data is in 24-hour time format and the minute data is an integer
from 0 to 59.  The output GMT format is a 4 digit string HHMM.
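
	The HHMM result can be illustrated as follows.  This is a sketch of
the output format only, not inner's calculation; the function name make_gmt
is hypothetical, and the hour range 0 to 23 is an assumption about what
"24-hour time format" means here.

```python
def make_gmt(hour, minute):
    """Combine hour and minute values into a 4 digit HHMM string."""
    hour = int(hour)
    minute = int(minute)
    if not 0 <= hour <= 23:        # assumed range for 24-hour time
        raise ValueError("hour out of range")
    if not 0 <= minute <= 59:
        raise ValueError("minute out of range")
    # Zero-pad each part to 2 digits.
    return "{:02d}{:02d}".format(hour, minute)
```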
        To perform this calculation, inner needs to know the locations of the
hour and minute variables, and it needs to know where to store the result.
Unfortunately, inner cannot easily create variables, so the result must
be stored over some other data, destroying it.
        If you don't want to replace existing data, room for the gmt variable
must be made in the dataset by adding a dummy name at the level where you want
the gmt to appear (and, therefore, also adding the dummy name to the level 0
list).  
	If you DO replace existing data, you may well wish to change the
variable name of that data to reflect the fact that it will be gmt data
instead of what it originally was.  This is done via the transvar file.
	The gmt data may overlay its source hour or minute variables.

        The gmtparams optional file is in common format.  Its descriptor
accepts the optional pre-translation- and post-translation- prefixes. Its data
consists of 3 pairs.  The first element of each pair is a keyword and the
second element of each pair is a variable name.  The keywords are: hrname,
mnname, and gmtname.  hrname is the name of an existing variable containing
the 2 digit hour.  mnname is the name of an existing variable containing the 2
digit minute.  gmtname is the name of the existing variable to be replaced by
the calculated gmt data.

	Sample file:
#   Take hour information from variable HR and minute information from
# variable MN.  Store the calculated gmt time in the location where the 
# MN data came from, making the original MN data unavailable to users of 
# the method.  
#   Note that the resulting gmt data will be labelled MN, unless MN is
# translated to some other name (like time_gmt) in the transvar file
hrname=HR; mnname=MN; gmtname=MN

-------

diagnostics
        A method produces output of various types.  At a minimum, it generates
a stream of comments, a list of variables, and data from the data file
described above.  For a description of this output, see (?). If an error is
detected, a message is sent to the standard output device, stdout.  If the
method is compiled with certain switches, debugging information is generated
and written to the standard error device, stderr.  When being run
interactively, this output can be directed with standard shell output
redirection techniques.  When being run in an http environment, stdout output
appears on a browser screen, while stderr output typically appears in the
httpd server log file.
	A GLOBEC inner can produce other output.  For example, there can
be information related to the indirect and optional files, as well as
comments contained in these files.
	The diagnostics file allows control of output other than the
variable list and data.  It is possible to control both quantity and 
destination of output.
	It is possible to produce output before the diagnostics file has
been read.  In this case, output quantity and destination may be determined
by the compile-time values.  If these differ from the run-time values,
output can be split between the 2 places.
	Output may be "turned off" either by defining "quantities" to "none"
or "destinations" to "nowhere".  Therefore, it is not considered an error
if output is requested but there is no destination provided.

        The diagnostics optional file is in common format.  Its descriptor
does not accept the optional pre-translation- and post-translation- prefixes
because it does not contain variable names.  All of the parameters have
default values, determined at compile time.  Some may be specified on the
compilation line.  See the "switches" documentation.

	Its data pairs are of 5 types.

        In the first type, the first element is the name of an output sink and
the second element is a file specification.  The output sinks are: error_sink,
data_comment_sink, other_comment_sink, and debug_sink.  error_sink is the
destination for inner's error messages.  data_comment_sink is the destination
for comment information from data files.  other_comment_sink is the
destination for all other comments.  debug_sink is the destination for
inner's debugging output.
	Each sink accepts the optional prefix addl-.  Without this prefix,
the appropriate output stream is redirected to the specified file.  With
this prefix, the stream is sent to stdout (or stderr for trace output)
and also to the specified file.  If addl- is specified, the usual stdout
stream may not be redirected.
	Each sink or addl-sink also accepts the optional prefixes new-
or unique-.  By default, output sinks are opened in "append" mode (create
new file if one doesn't exist; add to end if file exists).  
Consecutive uses of the data object will create a log-like file, in which
it may be difficult to correlate data with individual uses of the sink.  
Simultaneous uses of the data object will result in intermixed entries in 
the sink.  The new- prefix will cause a new file to be opened at each use.
Information from a previous use of the object will be lost.  Simultaneous
uses of the data object will produce unknown results (to me, anyway).  The
unique- prefix will cause a time stamp of the form ".yymmmdd-hhmm-ss" to
be appended to the sink name.  The same stamp will be appended to any
unique- sink in a single use of the data object.  Since this name is only
unique to the second, simultaneous use is still possible.  unique- sinks
are opened in "append" mode.  Multiple sinks directed to the same file
must all have the same new- or unique- prefixes (or no prefixes must be
present).
        File specifications may be any file that may be written to, including
/dev/stdout, /dev/stderr, and /dev/null.  They must include any relevant
directory information.  The length of each file specification is limited to
approximately 40 characters (there is space for 40 character names for each
sink; space unused by one name may be used by another).  If 2 sinks are to be
sent to the same file, the file must be specified the same way in each pair.
For example, if you use an absolute directory specifier with one sink, do not
use a relative directory specifier with the other.
	Errors opening the error sink are reported to stdout, along with
the message(s) that were to be sent to the error sink.

	In the second type, the first element is the name of an output level
and the second element is an integer specifying the level.  The output level
names are error_level and trace_level.  error_level controls how much
information is included with each error message.  trace_level controls
how much subroutine call information is sent to the debug sink.
	The addl- prefix can be used with the output level names in
conjunction with the addl- prefix used with the output sinks.  When used, it
allows the level sent to the default sink to differ from the level sent to
the duplicate sink.
	The levels' values must be between 0 and 99, inclusive.  Level 0 
produces the least output; level 99 the most.  Level n information includes
all the information produced at lower levels.
	error_level =  0 produces no error messages
		      10 produces the error message
		      20 identifies the inner writing the error
		      25 identifies the remote connection (if possible) and
			the time
		      50 may produce information about the file being
			processed at the time of the error
		      60 if the file is a command, attempts to copy more
			information from the process in which the command
			is running
		      80 produces information about all files open at the time
			of the error
	trace_level =  0 produces no trace information
		      10 traces calls to ioopen
		      20 traces calls to the optional file routines
		      40 traces calls to the file-level routines (scanheader*)
		      50 traces calls to the comment-processing routines
		      70 traces calls to the record-level routines (ioreadrec)
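
	The cumulative nature of the levels ("level n information includes
all the information produced at lower levels") can be sketched for
error_level as follows.  The table is copied from the list above; the
function name error_items and the short descriptions are hypothetical.

```python
# Thresholds from the error_level list above, in increasing order.
ERROR_LEVEL_STEPS = [
    (10, "error message"),
    (20, "inner writing the error"),
    (25, "remote connection and time"),
    (50, "file being processed"),
    (60, "more information from the command's process"),
    (80, "all files open at the time of the error"),
]


def error_items(level):
    """List what an error report includes at the given error_level."""
    if not 0 <= level <= 99:
        raise ValueError("level must be between 0 and 99, inclusive")
    # A level includes everything produced at lower thresholds.
    return [text for threshold, text in ERROR_LEVEL_STEPS
            if level >= threshold]
```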
		      
        In the third type, the first element is the name of a source of
diagnostic lines and the second element is the number of lines from that
source that are to be inserted into the error stream.  The only name
currently implemented is maxscriptdiags.  A command being used as a data
source might fail and issue diagnostics referring to that failure.
maxscriptdiags refers to the maximum number of acceptable lines from such a
source.  These diagnostics are sent to the error sink.
	maxscriptdiags accepts an addl- prefix (see error_level, above).

        In the fourth type, the first element is the name of an output switch
and the second element is the value of that switch.  The output switches are:
inserted_msgs, inserted_msg_tags, and comment_source_tag.  The inserted_msgs
switch controls whether defgb generates comment lines identifying the indirect
and optional files in use.  The inserted_msg_tag switch controls whether the
inserted messages themselves are tagged with identification as to where they
came from.  The comment_source_tag switch controls whether each optional file
comment is tagged with identification as to where it came from.
        Switch values are TRUE or FALSE.  Synonyms for TRUE are true, T, t,
YES, yes, Y, y and 1. Synonyms for FALSE are false, F, f, NO, no, N, n, and 0.
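
        The switch values can be checked with a lookup such as the one
below.  This sketch accepts exactly the spellings listed above; whether
defgb accepts other mixed-case spellings (such as "True") is not stated
here, so they are rejected.  The function name parse_switch is hypothetical.

```python
TRUE_VALUES = {"TRUE", "true", "T", "t", "YES", "yes", "Y", "y", "1"}
FALSE_VALUES = {"FALSE", "false", "F", "f", "NO", "no", "N", "n", "0"}


def parse_switch(value):
    """Convert a switch value string to a Python bool."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError("not a valid switch value: " + value)
```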

        In the fifth type, the first element is the name of a data trace type
and the second element is an integer specifying the frequency of the data
trace.  The data trace types are print_iovalstr, print_iovalreal, and
print_*_lines, where * represents one of the optional file descriptors
(including "datafile"). The "diagnostics" descriptor may not be specified,
however, since by the time it is read, information from the diagnostics file
has already been processed.
	A frequency of 0 means no output.  Any other frequency means
print the first occurrence of the data trace type, and every 'frequency'th
occurrence thereafter.  For example, print_datafile_lines=10 means output
lines 1, 11, 21, etc from each input data file.  print_iovalstr=1 means output
the result of every iovalstr call.
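
	The frequency rule can be expressed as a one-line test.  The function
name should_trace is hypothetical; occurrences are counted from 1.

```python
def should_trace(occurrence, frequency):
    """Decide whether the given occurrence (1-based) is traced."""
    if frequency < 0:
        raise ValueError("frequency must not be negative")
    if frequency == 0:
        return False            # a frequency of 0 means no output
    # Trace the first occurrence and every frequency-th one after it.
    return (occurrence - 1) % frequency == 0
```

With a frequency of 10, occurrences 1, 11 and 21 are traced; with a
frequency of 1, every occurrence is traced.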
	The addl- prefix can be used with the data trace types in conjunction
with the addl- prefix used with the output sinks.  When used, it allows
the amount of trace information sent to the default sink to differ from 
the amount sent to the duplicate sink.
	All data trace output is written to the debug sink.