	Modular JDBC SQL Engine (ModSQL)
	(c) 2004 Chris Studholme


Introduction
------------

ModSQL is a project who's goal is to develop a modular SQL engine that can
be grafted onto any raw database (such as a large flat ASCII file).  ModSQL
is a JDBC driver that will parse SQL queries and execute them by calling
third-party database modules.  The database module API is designed such that
writing database modules is much easier than writing an entire RDBMS.  

Included in this package is the ModSQL engine, three database modules, and a
simple command line JDBC client.  The database modules are SampleDatabase,
used to demonstrate how to write database modules, TemporaryDatabase, for
in-memory storage of temporary tables, and AsciiDatabase, which can be used
to access various formats of flat ASCII file. 


Capabilities
------------

ModSQL works best with read-only databases; however, support for read-write
databases was recently added and works well for temporary tables.  The
following SQL constructs currently work:

  - most standard SELECT . FROM . WHERE . GROUP BY . HAVING . ORDER BY .
  - SELECT DISTINCT
  - operators AND, OR, AS, LIKE, IN, IS, BETWEEN, EXISTS
  - functions ABS, CEIL, FLOOR, GREATEST, LEAST, MOD, POWER, ROUND, SIGN, SQRT
  - functions LEFT, RIGHT, UPPER, LOWER, LENGTH, LPAD, RPAD, SUBSTR, NVL
  - aggregates COUNT, MIN, MAX, SUM, AVG, STDEV
  - JOIN...ON and JOIN...USING
  - subqueries in the SELECT, FROM, and WHERE sections
  - CREATE INDEX ON table [USING SORT or HASH] (column)
  - CREATE TABLE, INSERT, UPDATE, DELETE, and DROP TABLE


Licence
-------

This packages is provided under the terms of the GNU General Public Licence.
Please read the COPYING file for more details.


Disclaimer
----------

This software is considered beta quality.  It is relatively bug free when
used in an environment similar to the one it was designed and written in. 
This software is provided "AS IS" with no warranty or guarantee either 
written or implied.  It is being made available primarily to other developers
who may be interested in the ideas contained within.  


Contents
--------

  AsciiDatabase  -- database module for accessing flat ASCII files
  ModSQL         -- modular SQL engine (JDBC driver; needs database modules)
  SampleDatabase -- in-memory sample database module (for testing purposes)
  TemporaryDatabase -- in-memory writable temporary database
  doc            -- JavaDoc package documentation
  samples        -- sample config files and tables

  ChangeLog	 -- detailed log of changes
  COPYING        -- licence
  Makefile       -- top level make file
  NEWS		 -- what's new in this version
  README         -- this file
  TODO		 -- brief list of what should be done in the near & long term
  UPGRADE	 -- notes regarding upgrading from earlier versions
  modsql.c       -- convenient wrapper for modsql.class
  modsql.class   -- compiled modsql.java 
  modsql.java    -- command line JDBC user interface


Installation
------------

To use modsql.c, you need to edit the file to set the location of your java
runtime environment (if it is not /usr/bin/java), and then type "make modsql".
This code has been tested on Linux and Solaris with Java 1.2 and Java 1.3.1.  

If you just want to use the Java code, it should be ready to go "out of the
box".  The command:

  java -classpath . modsql --version

should yield something like:

  modsql version 0.40 20030402
    ModSQL.Driver version 0.40
  failed to load database module AsciiDatabase.Manager
    SampleDatabase.Manager version 0.40
    TemporaryDatabase.Manager version 0.40

Use:

  java -classpath . modsql -h

for a usage summary.  A sample query can be executed with:

  java -classpath . modsql -c "select * from antiques"

If modsql.c works for you (type `make` to compile it), you can use:

  ./modsql   (or just modsql if it is in your path)

instead of the java commands above.

Typing:

  make install

will install the class files in /usr/local/classes and the modsql binary in
/usr/local/bin.  You will probably need to set your classpath as follows:

  CLASSPATH=/usr/local/classes; export CLASSPATH

to run the installed classes.


Config File
-----------

The master config file for ModSQL is located at: ./ModSQL/ModSQL.conf.  This
config file is self documented.

The AsciiDatabase module has its own config file named asciidb.conf, which is
described below.  The location of this config file can be set in ModSQL.conf.


SampleDatabase
--------------

The SampleDatabase module is an in-memory database useful for testing the
SQL engine.  The following tables are currently defined:

  AntiqueOwners
  Antiques
  EmployeeAddressTable
  EmployeeStatisticsTable
  Orders

These sample tables are useful when working through James Hoffman's SQL 
tutorial available at:

  http://www.geocities.com/SiliconValley/Vista/2207/sql1.html


AsciiDatabase
-------------

To use the AsciiDatabase module you need to create a file named 
/tmp/modsql/asciidb.conf.  Here is an example:

  # /tmp/modsql/asciidb.conf

  # sample tables
  [homeusage]
  tabledef=/tmp/modsql/table/home.table
  datafile=/tmp/modsql/data/home.log
  [sysload]
  tabledef=/tmp/modsql/table/load.table
  datafile=/tmp/modsql/data/load.log.gz

To make this work, create the directories:

  /tmp/modsql
  /tmp/modsql/index

and copy the directories:

  ./samples/table -> /tmp/modsql/table
  ./samples/data -> /tmp/modsql/data

and the file:

  ./samples/asciidb.conf -> /tmp/modsql/asciidb.conf

If all goes well, you can do:

  java -classpath . modsql -c "select * from homeusage"

or

  java -classpath . modsql -c "select * from sysload"

and get results.  The format of asciidb.conf is fairly self explanitory. 
Note that AsciiDatabase will transparently read from gzip compressed files.
Also note that you can have multiple datafile's for each table defined.  Each
additional file is just appended to the previous from ModSQL's point of view.


Table Definition File
---------------------

Two sample table definition files are located in ./samples/table.  Here is
home.table:

  # table definition file

  recordsize=40

  [row]
  # each line is one column
  # srcType       srcLen  srcOfs  colTyp  colName options
  column=char     8       0       int     DATE
  column=char     4       9       int     TIME
  column=char     8       14      int     USAGE
  column=char     16      23      char    USER

Lines starting with a # character are comments.  A record can contain multiple
rows of data.  If recordsize is set to -1, records are terminated by a end-of-
line marker.  The only supported source type is 'char'.  Length is the width
of the field and offset is the offset from the start of the record or the last
'offset=' line.  To demonstrate the use of offset=, the following is the same
as the previous:

  column=char     8       0       int     DATE
  column=char     4       9       int     TIME
  offset=14
  column=char     8       0       int     USAGE
  column=char     16      9       char    USER

Supported column types include: char, int, long, single, double, and bool.
The bool type requires a case-insensitive value of "1", "yes", or "true" to be
true.  False otherwise.  

The options column can include a comma-seperated list of options.  Supported
options are "a", "d", "m", and "n", for add, divide, multiply, and null 
replacement.  Here are some examples:

  column=char     2       64      int     YEAR    a1900
  column=char     10      30      double  A4      d1000,n-0.001

The first column is a two digit year that has 1900 added after translation.
The second column is a floating point number with 3 implied decimals and
-1 (-0.001 after dividing) is to be interpreted as null.  Note that the order
of the options is important.  The last line could have been:

  column=char     10      30      double  A4      n-1,d1000

If you need to skip some fixed number of bytes at the beginning of the file, 
use an offset= option before the first [row] section.

If you need to select particular rows within a record for a given file, use:

  option=select,offset,text

where offset is some number offset from the beginning of the record, and text
is some text to match.  This line should come immediately after a [row] line.

If you need to skip particular rows within a record for a given file, use:

  option=skip,offset,text

where offset is some number offset from the beginning of the record, and text
is some text to match.  This line should come immediately after a [row] line.

I may have missed some options but hopefully this will get you started.


Creating Indices
----------------

The index directory (specified in ModSQL.conf) will hold non-native indices
created for databases.  Use the SQL command:

  CREATE INDEX ON table [USING SORT|HASH] (column);

to create an index on the specified column of the specified table.  Indices
are not updated automatically when the table data changes.  An attempt is 
made to recognize that an underlying table data file has changed (or the table
definition file has changed) and not use the index until the index is 
recreated.  You should probably not index files that change regularly (like
log files).  

Both sorted and hashed index files are supported.  There is very little
difference between them.  When being read the first time, they are simply
scanned in both cases.  During a table join, where multiple scans of the
index are made, efficient lookups will be done.  Lookups are only for an
exact match, and since the hash lookup is O(1) verses O(log n) for binary 
search, the hash table index method is a little faster.  The downside to a 
hash table index is that the column values are stored in an order that looks 
random as opposed to a sorted order.  In the future, better use may be made
of sorted indices.

Creating an index on multiple columns is not currently supported; however, you
can create multiple indices on the same table (different columns).  The SQL
engine will only use one index per table specified in the FROM section of your
query, but it will attempt to choose the best index to use.

Index files make relatively efficient use of disk space.  I have several 10's
of GB of data indexed with about 130MB of indices.  


Bugs
----

I'm sure there are plenty.  Please send comments, criticism, bug reports,
etc. to:

  Chris.Studholme@utoronto.ca


Home Page
---------

Located at:  

  http://modsql.sourceforge.net/


--
$Id: README,v 1.8 2004/01/04 03:01:20 cvs Exp $
