31 May 1996 17:48:29 -0400
Description of first .objects parameter for GLOBEC methods (31 May 96)
(Describes methods generated from defgb v 2.3 and later)

The first parameter (parameter 0) to a GLOBEC method will be a file specification, optionally preceded by an at-sign, or optionally enclosed in parentheses.

If neither at-sign nor parentheses are present, the parameter is the name of the level 0 data file.

If the at-sign is present, the parameter is assumed to be the name of a file containing other file specifications, including that of the level 0 data file. This file is called the indirect input file.

If enclosed in parentheses, the file is a command whose output will serve as the level 0 data file. In this case, the file specification may be followed by parameter strings. Piping and/or i/o redirection in the command is not allowed.

---
NOTE: Any input line which begins with *** is considered an error message by defgb. The line will be passed to the error processing routines (see documentation of diagnostics optional file for details about controlling error messages) and defgb will exit with an error status.
---
The indirect input file

This file begins with optional comment lines, formatted the same way as comment lines in the level 0 data file for this method.

The data in the file consist of lines holding one file specification apiece, optionally preceded by a file descriptor. The files named in the indirect input file are called optional files.

An ordinary file specification is the name of a file containing information about a particular option of the method.

The file specification may be preceded by a backslash character. If so, it indicates that what follows is not actually a file name. Instead, it is one data record formatted for that particular optional file. (The purpose of this "escape mechanism" is to allow small amounts of information to reside in the indirect file itself without requiring another file.)

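The forms described so far (plain, at-sign, and parenthesized parameter 0, plus the backslash escape inside the indirect file) can be sketched as follows. This is only an illustration in Python, not defgb code: the function names and returned labels are hypothetical, and it is an assumption that any parameter strings for a command sit inside the parentheses and that a pipe or redirection character is simply rejected.

```python
def classify_parameter0(param):
    """Classify parameter 0 of a method (hypothetical helper, not part of defgb)."""
    text = param.strip()
    if text.startswith("@"):
        # At-sign: the rest names the indirect input file.
        return ("indirect", text[1:].strip())
    if text.startswith("(") and text.endswith(")"):
        # Parentheses: a command whose output serves as the level 0 data.
        # Assumption: parameter strings follow the command inside the parentheses.
        command = text[1:-1].strip()
        if any(ch in command for ch in "|<>"):
            raise ValueError("piping and i/o redirection are not allowed")
        return ("command", command)
    # Neither form: the name of the level 0 data file itself.
    return ("datafile", text)


def classify_indirect_spec(spec):
    """Classify one file specification taken from an indirect input file."""
    text = spec.strip()
    if text.startswith("\\"):
        # Backslash escape: what follows is one data record, not a file name.
        return ("inline_record", text[1:].strip())
    return ("file", text)
```

For example, classify_parameter0("@met.ind") would report an indirect input file named met.ind (a made-up name), while classify_indirect_spec("\\ HR") would report the inline record HR.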
The file specification may also be a keyword of the form .objects_parameter_N, to indicate that the file specification will come from parameter N of the .objects file that includes the definition of the object being accessed. N begins with 1, signifying the first parameter after the name of the indirect input file. The _N is optional. If not present, N is determined from the file descriptor or the position of the line within the indirect file (see below). This type of file specification may be useful if, for example, it's desirable to associate the same indirect input file with many different level 0 data files.

The file descriptor, if present, is separated from the file it describes by spaces, tabs, and/or equal signs. The file descriptor tells the method what kind of file is being named. The descriptors are: datafile, transvar, inpwidths, removals, gmtparams, and diagnostics.

If no descriptor is present, the file is assumed to be the type implied by its position within the indirect input file. The first file specified in the indirect input file is, if undescribed, the datafile. The second is the transvar file. The third is the inpwidths file. The fourth is the removals file. The fifth is the gmtparams file. The sixth is the diagnostics file.

It is possible to mix described and undescribed files. It is not possible to specify an undescribed file without specifying all preceding files.

If the file specification is the keyword .objects_parameter, the "number" of the optional file is used to tell which parameter in the .objects file corresponds to the optional file. The datafile will be expected from parameter 1 (first parameter after the indirect input file), the transvar file from parameter 2, etc. The order may be changed by specifying the parameter number after the keyword (see above).

The datafile specifies the level 0 data file. It may be enclosed in parentheses to indicate that the level 0 data will come from a script or executable program.
The transvar file specifies the file containing variable name translations.

The inpwidths file specifies the field widths of fixed-field-width input variables.

The removals file contains the names of input variables that are to be logically removed from the dataset.

The gmtparams file contains the names of input variables that are to be combined to produce a gmt time.

The diagnostics file contains information about what output to produce (besides the data), and the file to which the output goes.

See below for the details of contents and formatting of each optional file.

File specifications must each include any necessary directory specifications. That is, there is no assumption made about the relationship between the directories holding the various files (e.g., that they all come from the directory of the first file named).

Optional files may contain information involving variables (e.g., datafile, inpwidths and removals files). The variable names in these files may be either the names before transvar translation or after. A file descriptor may contain the prefix "pre-translation-" or the prefix "post-translation-" to indicate which mode is in effect. If neither prefix is supplied, "post-translation-" is assumed (except for datafiles, where "pre-translation-" is assumed). It is an error if "pre-translation-" is supplied but no transvar file is supplied in the indirect file, or if the prefixes are supplied for optional files that do not contain variable names, or if the prefixes are applied to the transvar descriptor.

It may be that a method must or must not have a particular optional file. Inner can be compiled with switches to indicate that a particular file is not, in fact, optional. See documentation for details. The level 0 data file is not optional.

Sample file:

# Test indirect file for met data.
# datafile = not required if data file on first line
# removals & gmtparams data short enough to be included in here via \
# variable names specified to gmtparams require translation.  HR variable
# is not in transvar.dat, and therefore is the same pre- and post-
# translation, so it doesn't care about the *-translation prefix.
# WJS Mar 96
/export/home/rpayne/emet259.dat
transvar=/export/home/rpayne/transvar.dat
pre-translation-gmtparams= \ hrname=HR; mnname=MN; gmtname=MN
removals = \ HR
inpwidths = /export/home/rpayne/widths.dat

---
The optional input files

The format of the optional files may differ on a per-file basis. However there is a common format, used by several, that is described here.

Files in this common format have two types of records: comment and data. Comment records that occur before the first data record are inserted in the data comment stream. Comment records that occur after the first data record are ignored. Comment records are specified the same way as comments for the level 0 data file.

Data records consist of one or more pairs of strings. Pairs are separated by semicolons. Within a pair, the strings are separated by blanks, tabs, and/or equal signs. The first string of a pair must be unique among the first strings of all pairs.

-------
datafile

The format of the data file differs from method to method. The format for the default method, def, is described elsewhere.

The datafile file descriptor accepts the optional pre-translation- and post-translation- prefixes.

The datafile is required to be present.

-------
transvar

The presence of this file indicates that any variable names in the dataset and also in the translation file are to be changed by the method to the corresponding names in the translation file.

There is no required correspondence between the variable names in the dataset and the variable names in the translation file. For example, you may have a single translation file for all the variables found in all your datasets.
On the other hand, you may translate only a few of the names in a particular dataset.

It is simplest if all the names, both before and after translation, are unique. It is not permitted for a single variable to be translated to more than one synonym. All the other combinations are allowed, but must be used carefully. For example, one dataset might have the variable LONG and another the variable name LONGITUDE. If you want the name of this variable to appear as lon, you might well build one translation file, and in it translate both LONG and LONGITUDE to lon. This will work unless the translation file is used on a dataset that has both the LONG and LONGITUDE variables in it. In this case, both will translate to lon, and the 2nd occurrence will be flagged as a "Duplicate variable name" error. Similarly, it is possible but tricky to translate A to X and X to Y in the same translation file.

The transvar optional file is in common format. Its descriptor, however, does NOT accept the pre-translation- or post-translation- prefix. The first element of a pair is the "old" name, found in the datasets. The second element is the "new" synonym name to which the method is to translate the old name.

Sample file:

# defmet translation list from RCG via Email 29 Dec 95.  WJS 23 Mar 96
# 25 Mar 96.  Play with HR, MN translations to get gmt stuff to work.  WJS
YD = yrday_gmt
LAT = lat
LONG = lon
SMG = speed_trim
EDO_Z = depth_w
# Will use defgb CREATE_GMT capability to make time_gmt variable
# from HR & MN.  To store it over MN, will translate MN into it.
# Will use defgb variable removal capability to logically remove HR,
# so it doesn't need translation
MN = time_gmt

-------
inpwidths

The presence of this file indicates that some of the input data is to be read from fixed-width fields instead of via free-field input. Variables found in this file are read from fixed-width fields. Variables not in the file are read with free-field input.
Variables at a particular level must be either all free-field or all fixed-field.

It is not possible to specify the starting column for a field. The starting column is determined by the position of the variable within a data record, and the sum of the widths of the preceding variables (if any). The order of the variables is determined by the variable list for the level being read. Accordingly, the sum of the field widths of the variables present at a level must equal the width of the data records of that level (before any next-level file specification).

The inpwidths optional file is in common format. Its descriptor accepts the optional pre-translation- and post-translation- prefixes. The first element of a pair is a variable name. The second element is the field width of that variable in the dataset.

Sample file:

# latitude & longitude recorded to 3 decimal places
# Must read data in fixed field since latitude and longitude might
# not be separated if longitude goes negative
lat = 7
lon = 8
temp_air = 5

-------
removals

The default method allows data selection by providing a list of variables to process. The presence of a removals optional file allows data selection by providing a list of variables to omit. If a variable is present in both a selection and an omission list, the omission list takes priority.

The removals optional file has the same comment record rules as a common format file. Each data record, however, consists of a list of variables separated by blanks, tabs, commas and/or semicolons. The removals descriptor accepts the optional pre-translation- and post-translation- prefixes.

Removed variables are available to calculations performed by inner (such as gmt calculation). They are not available to outer.

Sample file:

# Remove HR variable, since its info will be in GMT variable
HR

-------
gmtparams

The presence of this file indicates that GMT is to be calculated from variables in the dataset containing hour and minute information.
The hour data is in 24-hour time format and the minute data is an integer from 0 to 59. The output GMT format is a 4 digit string HHMM.

To perform this calculation, inner needs to know the locations of the hour and minute variables, and it needs to know where to store the result. Unfortunately, inner cannot easily create variables, so the result must be stored over some other data, destroying it. If you don't want to replace existing data, room for the gmt variable must be made in the dataset by adding a dummy name at the level where you want the gmt to appear (and, therefore, also adding the dummy name to the level 0 list). If you DO replace existing data, you may well wish to change the variable name of that data to reflect the fact that it will be gmt data instead of what it originally was. This is done via the transvar file. The gmt data may overlay its source hour or minute variables.

The gmtparams optional file is in common format. Its descriptor accepts the optional pre-translation- and post-translation- prefixes. Its data consists of 3 pairs. The first element of each pair is a keyword and the second element of each pair is a variable name. The keywords are: hrname, mnname, and gmtname.

hrname is the name of an existing variable containing the 2 digit hour.
mnname is the name of an existing variable containing the 2 digit minute.
gmtname is the name of the existing variable to be replaced by the calculated gmt data.

Sample file:

# Take hour information from variable HR and minute information from
# variable MN.  Store the calculated gmt time in the location where the
# MN data came from, making the original MN data unavailable to users of
# the method.
# Note that the resulting gmt data will be labelled MN, unless MN is
# translated to some other name (like time_gmt) in the transvar file
hrname=HR; mnname=MN; gmtname=MN

-------
diagnostics

A method produces output of various types.
At a minimum, it generates a stream of comments, a list of variables, and data from the data file described above. For a description of this output, see (?).

If an error is detected, a message is sent to the standard output device, stdout. If the method is compiled with certain switches, debugging information is generated and written to the standard error device, stderr.

When being run interactively, this output can be directed with standard shell output redirection techniques. When being run in an http environment, stdout output appears on a browser screen, while stderr output typically appears in the httpd server log file.

A GLOBEC inner can produce other output. For example, there can be information related to the indirect and optional files, as well as comments contained in these files.

The diagnostics file allows control of output other than the variable list and data. It is possible to control both quantity and destination of output.

It is possible to produce output before the diagnostics file has been read. In this case, output quantity and destination may be determined by the compile-time values. If these differ from the run-time values, output can be split between the 2 places.

Output may be "turned off" either by defining "quantities" to "none" or "destinations" to "nowhere". Therefore, it is not considered an error if output is requested but there is no destination provided.

The diagnostics optional file is in common format. Its descriptor does not accept the optional pre-translation- and post-translation- prefixes because it does not contain variable names. All of the parameters have default values, determined at compile time. Some may be specified on the compilation line. See the "switches" documentation.

Its data pairs are of 5 types.

In the first type, the first element is the name of an output sink and the second element is a file specification. The output sinks are: error_sink, data_comment_sink, other_comment_sink, and debug_sink.
error_sink is the destination for inner's error messages.
data_comment_sink is the destination for comment information from data files.
other_comment_sink is the destination for all other comments.
debug_sink is the destination for inner's debugging output.

Each sink accepts the optional prefix addl-. Without this prefix, the appropriate output stream is redirected to the specified file. With this prefix, the stream is sent to stdout (or stderr for trace output) and also to the specified file. If addl- is specified, the usual stdout stream may not be redirected.

Each sink or addl- sink also accepts the optional prefixes new- or unique-. By default, output sinks are opened in "append" mode (create new file if one doesn't exist; add to end if file exists). Consecutive uses of the data object will create a log-like file, in which it may be difficult to correlate data with individual uses of the sink. Simultaneous uses of the data object will result in intermixed entries in the sink.

The new- prefix will cause a new file to be opened at each use. Information from a previous use of the object will be lost. Simultaneous uses of the data object will produce unknown results (to me, anyway).

The unique- prefix will cause a time stamp of the form ".yymmmdd-hhmm-ss" to be appended to the sink name. The same stamp will be appended to any unique- sink in a single use of the data object. Since this name is only unique to the second, simultaneous use is still possible. unique- sinks are opened in "append" mode.

Multiple sinks directed to the same file must all have the same new- or unique- prefixes (or no prefixes must be present).

File specifications may be any file that may be written to, including /dev/stdout, /dev/stderr, and /dev/null. They must include any relevant directory information.
The length of each file specification is limited to approximately 40 characters (there is space for 40 character names for each sink; space unused by one name may be used by another). If 2 sinks are to be sent to the same file, the file must be specified the same way in each pair. For example, if you use an absolute directory specifier with one sink, do not use a relative directory specifier with the other.

Errors opening the error sink are reported to stdout, along with the message(s) that were to be sent to the error sink.

In the second type, the first element is the name of an output level and the second element is an integer specifying the level. The output level names are error_level and trace_level.

error_level controls how much information is included with each error message.
trace_level controls how much subroutine call information is sent to the debug sink.

The addl- prefix can be used with the output level names in conjunction with the addl- prefix used with the output sinks. When used, it allows the level sent to the default sink to differ from the level sent to the duplicate sink.

The levels' values must be between 0 and 99, inclusive. Level 0 produces the least output; level 99 the most. Level n information includes all the information produced at lower levels.

error_level =
     0  produces no error messages
    10  produces the error message
    20  identifies the inner writing the error
    25  identifies the remote connection (if possible) and the time
    50  may produce information about the file being processed at the time of the error
    60  if the file is a command, attempts to copy more information from the process in which the command is running
    80  produces information about all files open at the time of the error

trace_level =
     0  produces no trace information
    10  traces calls to ioopen
    20  traces calls to the optional file routines
    40  traces calls to the file-level routines (scanheader*)
    50  traces calls to the comment-processing routines
    70  traces calls to the record-level routines (ioreadrec)

In the third type, the first element is the name of a source of diagnostic lines and the second element is the number of lines from that source that are to be inserted into the error stream. The only name currently implemented is maxscriptdiags. A command being used as a data source might fail and issue diagnostics referring to that failure. maxscriptdiags refers to the maximum number of acceptable lines from such a source. These diagnostics are sent to the error sink. maxscriptdiags accepts an addl- prefix (see error_level, above).

In the fourth type, the first element is the name of an output switch and the second element is the value of that switch. The output switches are: inserted_msgs, inserted_msg_tags, and comment_source_tag.

The inserted_msgs switch controls whether defgb generates comment lines identifying the indirect and optional files in use.
The inserted_msg_tag switch controls whether the inserted messages themselves are tagged with identification as to where they came from.
The comment_source_tag switch controls whether each optional file comment is tagged with identification as to where it came from.

Switch values are TRUE or FALSE. Synonyms for TRUE are true, T, t, YES, yes, Y, y and 1.
Synonyms for FALSE are false, F, f, NO, no, N, n, and 0.

In the fifth type, the first element is the name of a data trace type and the second element is an integer specifying the frequency of the data trace. The data trace types are print_iovalstr, print_iovalreal, and print_*_lines, where * represents one of the optional file descriptors (including "datafile"). The "diagnostics" descriptor may not be specified, however, since by the time it is read, information from the diagnostics file has already been processed.

A frequency of 0 means no output. Any other frequency means print the first occurrence of the data trace type, and every 'frequency'th occurrence thereafter. For example, print_datafile_lines=10 means output lines 1, 11, 21, etc. from each input data file. print_iovalstr=1 means output the result of every iovalstr call.

The addl- prefix can be used with the data trace types in conjunction with the addl- prefix used with the output sinks. When used, it allows the amount of trace information sent to the default sink to differ from the amount sent to the duplicate sink.

All data trace output is written to the debug sink.
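
To make the diagnostics record rules concrete, here is a minimal sketch in Python of how a common format data record, a switch value, and a data trace frequency might be interpreted. It is an illustration only, not the defgb implementation: the function names are hypothetical, and how defgb actually reacts to a malformed pair or an unknown switch value is an assumption.

```python
import re

# Switch synonyms exactly as listed in this document.
TRUE_WORDS = {"TRUE", "true", "T", "t", "YES", "yes", "Y", "y", "1"}
FALSE_WORDS = {"FALSE", "false", "F", "f", "NO", "no", "N", "n", "0"}


def parse_pairs(record):
    """Split one common format data record into {first string: second string}."""
    pairs = {}
    for chunk in record.split(";"):          # pairs are separated by semicolons
        # Within a pair, strings are separated by blanks, tabs, and/or equal signs.
        parts = [p for p in re.split(r"[ \t=]+", chunk.strip()) if p]
        if not parts:
            continue                          # tolerate an empty chunk (assumption)
        if len(parts) != 2:
            raise ValueError("not a pair: %r" % chunk)
        key, value = parts
        if key in pairs:
            raise ValueError("first string must be unique: %r" % key)
        pairs[key] = value
    return pairs


def parse_switch(value):
    """Map a switch value or one of its synonyms to a Python bool."""
    if value in TRUE_WORDS:
        return True
    if value in FALSE_WORDS:
        return False
    raise ValueError("unknown switch value: %r" % value)


def should_trace(occurrence, frequency):
    """True for the first occurrence and every frequency'th one thereafter.

    occurrence is counted from 1; a frequency of 0 means no output.
    """
    if frequency <= 0:
        return False
    return (occurrence - 1) % frequency == 0
```

With these helpers, the record "print_datafile_lines=10; inserted_msgs = YES" splits into two pairs, and a frequency of 10 selects occurrences 1, 11, 21, and so on, matching the example in the text.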