<!--
$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.24 2003/09/29 18:18:35 momjian Exp $
Genetic Optimizer
-->

 <chapter id="geqo">
  <docinfo>
   <author>
    <firstname>Martin</firstname>
    <surname>Utesch</surname>
    <affiliation>
     <orgname>
      University of Mining and Technology
     </orgname>
     <orgdiv>
      Institute of Automatic Control
     </orgdiv>
     <address>
      <city>
       Freiberg
      </city>
      <country>
       Germany
      </country>
     </address>
    </affiliation>
   </author>
   <date>1997-10-02</date>
  </docinfo>

  <title id="geqo-title">Genetic Query Optimizer</title>

  <para>
   <note>
    <title>Author</title>
    <para>
     Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
     for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
    </para>
   </note>
  </para>

  <sect1 id="geqo-intro">
   <title>Query Handling as a Complex Optimization Problem</title>

   <para>
    Among all relational operators the most difficult one to process
    and optimize is the <firstterm>join</firstterm>. The number of
    alternative plans to answer a query grows exponentially with the
    number of joins included in it. Further optimization effort is
    caused by the support of a variety of <firstterm>join
    methods</firstterm> (e.g., nested loop, hash join, merge join in
    <productname>PostgreSQL</productname>) to process individual joins
    and a diversity of <firstterm>indexes</firstterm> (e.g., R-tree,
    B-tree, hash in <productname>PostgreSQL</productname>) as access
    paths for relations.
   </para>

   <para>
    The current <productname>PostgreSQL</productname> optimizer
    implementation performs a <firstterm>near-exhaustive
    search</firstterm> over the space of alternative strategies. This
    algorithm, first introduced in the <quote>System R</quote>
    database, produces a near-optimal join order, but can take an
    enormous amount of time and memory space when the number of joins
    in the query grows large. This makes the ordinary
    <productname>PostgreSQL</productname> query optimizer
    inappropriate for database application domains that involve the
    need for extensive queries, such as artificial intelligence.
   </para>

   <para>
    The Institute of Automatic Control at the University of Mining and
    Technology, in Freiberg, Germany, encountered the described problems as its
    folks wanted to take the <productname>PostgreSQL</productname> DBMS as the backend for a decision
    support knowledge based system for the maintenance of an electrical
    power grid. The DBMS needed to handle large join queries for the
    inference machine of the knowledge based system.
   </para>

   <para>
    Performance difficulties in exploring the space of possible query
    plans created the demand for a new optimization technique to be developed.
   </para>

   <para>
    In the following we describe the implementation of a
    <firstterm>Genetic Algorithm</firstterm> to solve the join
    ordering problem in a manner that is efficient for queries
    involving large numbers of joins.
   </para>
  </sect1>

  <sect1 id="geqo-intro2">
   <title>Genetic Algorithms</title>

   <para>
    The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
    operates through
    determined, randomized search. The set of possible solutions for the
    optimization problem is considered as a
    <firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
    The degree of adaptation of an individual to its environment is specified
    by its <firstterm>fitness</firstterm>.
   </para>

   <para>
    The coordinates of an individual in the search space are represented
    by <firstterm>chromosomes</firstterm>, in essence a set of character
    strings. A <firstterm>gene</firstterm> is a
    subsection of a chromosome which encodes the value of a single parameter
    being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
    <firstterm>integer</firstterm>.
   </para>

   <para>
    Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
    <firstterm>mutation</firstterm>, and
    <firstterm>selection</firstterm> new generations of search points are found
    that show a higher average fitness than their ancestors.
   </para>

   <para>
    According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
    strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
    problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
    non-random (better than random). 
   </para>

   <figure id="geqo-diagram">
    <title>Structured Diagram of a Genetic Algorithm</title>

    <informaltable frame="none">
     <tgroup cols="2">
      <tbody>
       <row>
        <entry>P(t)</entry>
        <entry>generation of ancestors at a time t</entry>
       </row>

       <row>
        <entry>P''(t)</entry>
        <entry>generation of descendants at a time t</entry>
       </row>
      </tbody>
     </tgroup>
    </informaltable>

<literallayout class="monospaced">
+=========================================+
|>>>>>>>>>>>  Algorithm GA  <<<<<<<<<<<<<<|
+=========================================+
| INITIALIZE t := 0                       |
+=========================================+
| INITIALIZE P(t)                         |
+=========================================+
| evaluate FITNESS of P(t)                |
+=========================================+
| while not STOPPING CRITERION do         |
|   +-------------------------------------+
|   | P'(t)  := RECOMBINATION{P(t)}       |
|   +-------------------------------------+
|   | P''(t) := MUTATION{P'(t)}           |
|   +-------------------------------------+
|   | P(t+1) := SELECTION{P''(t) + P(t)}  |
|   +-------------------------------------+
|   | evaluate FITNESS of P''(t)          |
|   +-------------------------------------+
|   | t := t + 1                          |
+===+=====================================+
</literallayout>
   </figure>
  </sect1>

  <sect1 id="geqo-pg-intro">
   <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>

   <para>
    The <acronym>GEQO</acronym> module is intended for the solution of the query
    optimization problem similar to a traveling salesman problem (<acronym>TSP</acronym>).
    Possible query plans are encoded as integer strings. Each string
    represents the join order from one relation of the query to the next.
    E. g., the query tree
<literallayout class="monospaced">
   /\
  /\ 2
 /\ 3
4  1
</literallayout>
    is encoded by the integer string '4-1-3-2',
    which means, first join relation '4' and '1', then '3', and
    then '2', where 1, 2, 3, 4 are relation IDs within the
    <productname>PostgreSQL</productname> optimizer.
   </para>

   <para>
    Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
    algorithm.
   </para>

   <para>
    Specific characteristics of the <acronym>GEQO</acronym>
    implementation in <productname>PostgreSQL</productname>
    are:

    <itemizedlist spacing="compact" mark="bullet">
     <listitem>
      <para>
       Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
       individuals in a population, not whole-generational replacement)
       allows fast convergence towards improved query plans. This is
       essential for query handling with reasonable time;
      </para>
     </listitem>

     <listitem>
      <para>
       Usage of <firstterm>edge recombination crossover</firstterm>
       which is especially suited to keep edge losses low for the
       solution of the <acronym>TSP</acronym> by means of a
       <acronym>GA</acronym>;
      </para>
     </listitem>

     <listitem>
      <para>
       Mutation as genetic operator is deprecated so that no repair
       mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
      </para>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    The <acronym>GEQO</acronym> module allows
    the <productname>PostgreSQL</productname> query optimizer to
    support large join queries effectively through
    non-exhaustive search.
   </para>

  <sect2 id="geqo-future">
   <title>Future Implementation Tasks for
    <productname>PostgreSQL</> <acronym>GEQO</acronym></title>

     <para>
      Work is still needed to improve the genetic algorithm parameter
      settings.
      In file <filename>backend/optimizer/geqo/geqo_params.c</filename>, routines
      <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
      we have to find a compromise for the parameter settings
      to satisfy two competing demands:
      <itemizedlist spacing="compact">
       <listitem>
	<para>
	 Optimality of the query plan
	</para>
       </listitem>
       <listitem>
	<para>
	 Computing time
	</para>
       </listitem>
      </itemizedlist>
     </para>

   </sect2>
  </sect1>

 <sect1 id="geqo-biblio">
  <title>Further Readings</title>

  <para>
   The following resources contain additional information about
   genetic algorithms:

   <itemizedlist>
    <listitem>
     <para>
      <ulink url="http://surf.de.uu.net/encore/www/">The Hitch-Hiker's
      Guide to Evolutionary Computation</ulink> (FAQ for <ulink
      url="news://comp.ai.genetic">comp.ai.genetic</ulink>)
     </para>
    </listitem>
   
    <listitem>
     <para>
      <ulink url="http://www.red3d.com/cwr/evolve.html">Evolutionary
       Computation and its application to art and design</ulink> by
      Craig Reynolds
     </para>
    </listitem>

    <listitem>
     <para>
      <xref linkend="ELMA99">
     </para>
    </listitem>

    <listitem>
     <para>
      <xref linkend="FONG">
     </para>
    </listitem>
   </itemizedlist>
  </para>

 </sect1>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->
