#LyX 1.1 created this file. For more info see http://www.lyx.org/
\lyxformat 218
\textclass article
\begin_preamble

\usepackage[T1]{fontenc}
\usepackage{xspace}
\newcommand{\nach}{$\to$\xspace}
\newcommand{\hoch}{\texttt{$^\wedge$}}

\usepackage{html}

\newcommand{\doubledash}{-\hspace{0.1em}-}
\newcommand{\doubledashb}{-\/-}
\newcommand{\dlt}{{\footnotesize$\ll$}}
\newcommand{\dgt}{{\footnotesize$\gg$}}

\begin{htmlonly}

\renewenvironment{lyxcode}
  {\begin{list}{}{
    \setlength{\rightmargin}{\leftmargin}
    \raggedright
    \setlength{\itemsep}{0pt}
    \setlength{\parsep}{0pt}
    \ttfamily}%
   \item[] 
   \begin{ttfamily}}
   {\end{ttfamily}
    \end{list} }

\newenvironment{LyXParagraphIndent}[1]%
{\begin{quote}}
{\end{quote}}

\renewcommand{\LyX}{LyX}

\renewcommand{\doubledash}{\rawhtml &#45;&#45;\endrawhtml}
\renewcommand{\doubledashb}{\rawhtml &#45;&#45;\endrawhtml}
\renewcommand{\dlt}{«}
\renewcommand{\dgt}{»}

\renewcommand{\nach}{\rawhtml <i>to</i> \endrawhtml}
\renewcommand{\hoch}{\rawhtml &#94;\endrawhtml}

\end{htmlonly}
\end_preamble
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize 11
\spacing single 
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation skip
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default

\layout Title

Aspell Devel Docs
\layout Author

Copyright (c) 2002
\newline 
Kevin Atkinson
\newline 
kevina@gnu.org
\layout Standard


\begin_inset LatexCommand \tableofcontents{}

\end_inset 


\layout Section*

Notes
\layout Standard

This manual is designed for those who wish to develop Aspell.
 It is currently very sketchy.
 However, it should improve over time.
 The latest version of this document can be found at 
\begin_inset LatexCommand \url{http://savannah.gnu.org/download/aspell/manual/devel/devel.html}

\end_inset 

.
\layout Standard

The eventual goal is to convert this manual into Texinfo.
 However, since I do not have the time to learn Texinfo right now, I decided
 to use something I am already comfortable with.
 Once someone goes through the trouble of converting it into Texinfo I will
 maintain the Texinfo version.
\layout Section*

Copyright
\layout Standard

Copyright (c) 2002 Kevin Atkinson.
 Permission is granted to copy, distribute and/or modify this document under
 the terms of the GNU Free Documentation License, Version 1.1 or any later
 version published by the Free Software Foundation; with no Invariant Sections,
 no Front-Cover Texts, and no Back-Cover Texts.
 A copy of the license is included in the section entitled "GNU Free Documentati
on License".
\layout Section

Style Guidelines
\layout Standard

As far as coding styles go I am really not that picky.
 The important thing is to stay consistent.
 However, whatever you do, please do not indent with more than 4 characters;
 I find deeper indentation extremely difficult to read, as most of the code
 ends up on the right side of the window.
\layout Section

C++ Standard Library
\layout Standard

The C++ Standard library is not used directly except under very specific
 circumstances.
 The string class and the STL are used indirectly through wrapper classes,
 and all I/O is done using the standard C library with lightweight helper
 classes to make using C I/O a bit more C++-like.
\layout Standard

However, the new, new[], delete and delete[] operators are used to allocate
 memory when appropriate.
\layout Section

Templates
\layout Standard

Templates are used in Aspell when there is a clear advantage to doing so.
 Whenever you use templates, please use them carefully and try very hard
 not to create code bloat by generating a lot of unnecessary and duplicate
 code.
\layout Section

Error Handling
\layout Standard

Exceptions are not used in Aspell as I find them more trouble than they
 are worth.
 Instead, an alternate method of error handling based on the PosibErr class
 is used.
 PosibErr is a special error-handling device that makes sure that an
 error is properly handled.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

posib_err.hpp
\begin_inset Quotes erd
\end_inset 

.
 PosibErr is expected to be used as the return type of a function.
 It will automatically convert to the "normal" return type; however, if the
 normal return value is accessed while there is an "unhandled" error condition,
 it will abort.
 It will also abort if the object is destroyed with an "unhandled" error
 condition.
 This includes ignoring the return value of a function that returns an error
 condition.
 An error condition is handled by simply checking for the presence of an
 error, calling ignore, or taking ownership of the error.
\layout Standard

The PosibErr class is used extensively throughout Aspell.
 Please refer to the Aspell source for examples of using PosibErr until
 better documentation is written.
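\layout Standard

In the meantime, the following minimal sketch may help to illustrate the
 idea.
 It is not Aspell's PosibErr: the names PosibErrSketch, ErrSketch, has_err,
 ignore_err, release_err and parse_digit are assumptions made for this example.
 It shows the three ways of handling an error described above (checking
 for it, ignoring it, or taking ownership of it), and it aborts when an unhandled
 error is accessed or destroyed.
\layout LyX-Code

```cpp
// Minimal sketch of the PosibErr idea; NOT Aspell's actual class or API.
#include <cassert>
#include <cstdlib>
#include <string>

struct ErrSketch {
    std::string mesg;  // human readable error message
};

template <typename T>
class PosibErrSketch {
public:
    PosibErrSketch(const T & v) : val_(v), err_(0), handled_(true) {}
    PosibErrSketch(ErrSketch * e) : val_(), err_(e), handled_(false) {}
    // Copying moves responsibility for the error to the new object.
    PosibErrSketch(const PosibErrSketch & o)
        : val_(o.val_), err_(o.err_), handled_(o.handled_) {
        o.err_ = 0;
        o.handled_ = true;
    }
    PosibErrSketch & operator=(const PosibErrSketch &) = delete;
    ~PosibErrSketch() {
        if (err_ && !handled_) std::abort();  // the error was never handled
        delete err_;
    }
    // Checking for an error counts as handling it.
    bool has_err() const { handled_ = true; return err_ != 0; }
    // Explicitly ignoring the error also counts as handling it.
    void ignore_err() { handled_ = true; }
    // Taking ownership: the caller must delete the returned error.
    ErrSketch * release_err() {
        handled_ = true;
        ErrSketch * e = err_;
        err_ = 0;
        return e;
    }
    // Converts to the "normal" return type, aborting on an unhandled error.
    operator const T & () const {
        if (err_ && !handled_) std::abort();
        return val_;
    }
private:
    T val_;
    mutable ErrSketch * err_;
    mutable bool handled_;
};

// Hypothetical function using the sketch as its return type.
PosibErrSketch<int> parse_digit(char c) {
    if (c < '0' || c > '9') return new ErrSketch{"not a digit"};
    return c - '0';
}
```
\layout Standard

With this sketch, int d = parse_digit('7'); converts silently because there
 is no error, whereas calling parse_digit('x') and discarding the result without
 checking it would abort the program.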
\layout Section

Source Code Layout 
\layout Description

common/ Common code used by all parts of Aspell
\layout Description

lib/ Library code used only by the actual Aspell library
\layout Description

data/ Data files used by Aspell
\layout Description

modules/ Aspell modules which are eventually meant to be pluggable
\begin_deeper 
\layout Description

speller/ 
\begin_deeper 
\layout Description

default/ Main speller module.
\end_deeper 
\layout Description

filter/ 
\layout Description

tokenizer/
\end_deeper 
\layout Description

auto/ Scripts and data files to automatically generate code used by Aspell
\layout Description

interface/ Header files and such that external programs should use in
 order to use the Aspell library.
\begin_deeper 
\layout Description

cc/ The external 
\begin_inset Quotes eld
\end_inset 

C
\begin_inset Quotes erd
\end_inset 

 interface that programs should be using when they wish to use Aspell.
\end_deeper 
\layout Description

prog/ Actual programs based on the Aspell library.
 The main 
\begin_inset Quotes eld
\end_inset 

aspell
\begin_inset Quotes erd
\end_inset 

 utility is included here.
\layout Description

scripts/ Misc.
 scripts used by Aspell
\layout Description

manual/
\layout Description

examples/ Example programs demonstrating the use of the Aspell library
\layout Section

Strings
\layout Subsection

String
\layout Standard

The String class provides the same functionality as the C++ string, except
 that it has fewer constructors.
 It also inherits OStream so that you can write to it with the 
\begin_inset Quotes eld
\end_inset 

<
\latex latex 

\backslash 
/
\latex default 
<
\begin_inset Quotes erd
\end_inset 

 operator.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

string.hpp
\begin_inset Quotes erd
\end_inset 

.
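\layout Standard

The following sketch shows the general pattern of a string that is also
 an output stream.
 OStreamSketch and StringSketch are hypothetical stand-ins rather than Aspell's
 classes: the stream base class implements operator<< in terms of a single
 virtual write method, and the string implements write by appending.
\layout LyX-Code

```cpp
// Sketch of a string class that inherits an output-stream interface.
// OStreamSketch and StringSketch are hypothetical, NOT Aspell's classes.
#include <cassert>
#include <cstring>
#include <string>

class OStreamSketch {
public:
    virtual ~OStreamSketch() {}
    // The only thing a concrete stream has to provide.
    virtual void write(const char * data, std::size_t size) = 0;
    OStreamSketch & operator<<(const char * s) { write(s, std::strlen(s)); return *this; }
    OStreamSketch & operator<<(char c) { write(&c, 1); return *this; }
};

class StringSketch : public OStreamSketch {
public:
    StringSketch() {}
    StringSketch(const char * s) : data_(s) {}
    // Writing to the "stream" simply appends to the string.
    void write(const char * data, std::size_t size) override { data_.append(data, size); }
    const char * c_str() const { return data_.c_str(); }
    std::size_t size() const { return data_.size(); }
private:
    std::string data_;  // std::string is used here only to keep the sketch short
};
```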
\layout Subsection

ParmString
\layout Standard

ParmString is a special string class that is designed to be used as a parameter
 for a function that is expecting a string.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

parm_string.hpp
\begin_inset Quotes erd
\end_inset 

.
 It allows either a "const char *" or a "String" to be passed in.
 It automatically converts to a "const char *".
 The string can also be accessed via the "str" method.
 Usage example:
\layout LyX-Code

void foo(ParmString s1, ParmString s2) {
\newline 
   const char * str0 = s1;  // converts to a const char *
\newline 
   unsigned int size0 = s2.size();
\newline 
   if (s1 == s2 || s2 == "bar") {
\newline 
     ...
\newline 
   }
\newline 
}
\newline 
...
\newline 
String s1 = "...";
\newline 
const char * s2 = "...";
\newline 
foo(s1, s2);  // a String and a const char * can be mixed freely
\newline 
foo(s2, "bar");
\layout Standard

This class should be used when a string is being passed in as a parameter.
 It is faster than using 
\begin_inset Quotes eld
\end_inset 

const String &
\begin_inset Quotes erd
\end_inset 

 (as that will create an unnecessary temporary when a const char * is passed
 in), and is less annoying than using 
\begin_inset Quotes eld
\end_inset 

const char *
\begin_inset Quotes erd
\end_inset 

 (as it doesn't require the c_str() method to be used when a String is passed
 in).
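\layout Standard

A minimal sketch of how such a class can work is shown below.
 ParmStringSketch and total_length are hypothetical, and std::string stands
 in for Aspell's String.
 The class only stores a borrowed pointer, plus the length when it is already
 known, so passing either kind of string never copies the characters.
 Comparison operators are omitted for brevity.
\layout LyX-Code

```cpp
// Hypothetical sketch of a ParmString-like parameter class; NOT Aspell's.
#include <cassert>
#include <cstring>
#include <string>

class ParmStringSketch {
public:
    // Both constructors are implicit, so callers can pass either type.
    ParmStringSketch(const char * s) : str_(s), size_(-1) {}
    ParmStringSketch(const std::string & s)
        : str_(s.c_str()), size_(static_cast<long>(s.size())) {}
    operator const char * () const { return str_; }
    const char * str() const { return str_; }
    std::size_t size() const {
        if (size_ < 0) size_ = static_cast<long>(std::strlen(str_));  // computed lazily
        return static_cast<std::size_t>(size_);
    }
private:
    const char * str_;   // borrowed pointer; no copy of the characters is made
    mutable long size_;  // -1 until the length is known
};

// Accepts a std::string or a const char * without creating a temporary string.
std::size_t total_length(ParmStringSketch a, ParmStringSketch b) {
    return a.size() + b.size();
}
```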
\layout Subsection

CharVector
\layout Standard

A character vector is basically a Vector<char>, but it has a few additional
 methods for dealing with strings which Vector does not provide.
 Like String, it also inherits OStream so that you can write to it with
 the 
\begin_inset Quotes eld
\end_inset 

<
\latex latex 

\backslash 
/
\latex default 
<
\begin_inset Quotes erd
\end_inset 

 operator.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

char_vector.hpp
\begin_inset Quotes erd
\end_inset 

.
 Use it whenever you need a string that is guaranteed to be in a contiguous
 block of memory which you can write to.
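\layout Standard

As a hypothetical sketch of the contiguous-buffer property (CharVectorSketch,
 grow, shrink and append_int are illustrative names, not Aspell's CharVector
 API), a C function such as snprintf can write directly into space reserved
 at the end of the buffer:
\layout LyX-Code

```cpp
// Hypothetical sketch of a CharVector-like buffer; NOT Aspell's CharVector API.
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

class CharVectorSketch {
public:
    std::size_t size() const { return data_.size(); }
    char * data() { return data_.data(); }
    void append(const char * s) { data_.insert(data_.end(), s, s + std::strlen(s)); }
    // Grows the buffer by n bytes and returns a pointer to the new space.
    // The whole buffer stays one contiguous, writable block.
    char * grow(std::size_t n) {
        data_.resize(data_.size() + n);
        return data_.data() + data_.size() - n;
    }
    // Drops the last n bytes.
    void shrink(std::size_t n) { data_.resize(data_.size() - n); }
    // Copies the contents out (no terminating NUL is kept in the buffer).
    std::string str() const { return std::string(data_.begin(), data_.end()); }
private:
    std::vector<char> data_;
};

// Lets a C function format directly into the buffer.
void append_int(CharVectorSketch & cv, int n) {
    const std::size_t room = 16;                      // enough for a 32-bit int plus NUL
    char * p = cv.grow(room);
    int len = std::snprintf(p, room, "%d", n);        // writes digits and a NUL in place
    cv.shrink(room - static_cast<std::size_t>(len));  // keep only the digits
}
```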
\layout Section

Smart Pointers
\layout Standard

Smart pointers are used extensively in Aspell to simplify memory management
 tasks and to avoid memory leaks.
\layout Subsection

CopyPtr
\layout Standard

The CopyPtr class makes a deep copy of an object whenever it is copied.
 The CopyPtr class is defined in 
\begin_inset Quotes eld
\end_inset 

copy_ptr.hpp
\begin_inset Quotes erd
\end_inset 

.
 This header should be included wherever CopyPtr is used.
 The class of the object that CopyPtr points to does not need to be completely
 defined at this point.
 The implementation is defined in 
\begin_inset Quotes eld
\end_inset 

copy_ptr-t.hpp
\begin_inset Quotes erd
\end_inset 

.
 The implementation header file should be included at a point in your code
 where the class that CopyPtr points to is completely defined.
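\layout Standard

The single-file sketch below imitates this split.
 The class template CopyPtrSketch, the Widget and Holder classes, and the
 layout of the file are assumptions made for illustration, not Aspell's copy_ptr.hpp.
 The first part only declares the copy operations, so it can be used with
 the incomplete type Widget; the member definitions, which need the complete
 type for new and delete, come after Widget is defined, just as copy_ptr-t.hpp
 would be included there.
\layout LyX-Code

```cpp
// Single-file sketch of the CopyPtr header split; NOT Aspell's copy_ptr.hpp.
#include <cassert>

// ---- Part 1 (like copy_ptr.hpp): declarations only, T may be incomplete ----
template <typename T>
class CopyPtrSketch {
public:
    explicit CopyPtrSketch(T * p = 0) : ptr_(p) {}
    CopyPtrSketch(const CopyPtrSketch & other);
    CopyPtrSketch & operator=(const CopyPtrSketch & other);
    ~CopyPtrSketch();
    T * get() const { return ptr_; }
    T * operator->() const { return ptr_; }
private:
    T * ptr_;
};

class Widget;  // only declared so far

struct Holder {
    CopyPtrSketch<Widget> w;  // fine: no copy or delete is needed here yet
};

// ---- The pointed-to class becomes complete ----
class Widget {
public:
    explicit Widget(int v) : value(v) {}
    int value;
};

// ---- Part 2 (like copy_ptr-t.hpp): needs the complete type ----
template <typename T>
CopyPtrSketch<T>::CopyPtrSketch(const CopyPtrSketch & other)
    : ptr_(other.ptr_ ? new T(*other.ptr_) : 0) {}  // deep copy via T's copy constructor

template <typename T>
CopyPtrSketch<T> & CopyPtrSketch<T>::operator=(const CopyPtrSketch & other) {
    if (this != &other) {
        T * copy = other.ptr_ ? new T(*other.ptr_) : 0;
        delete ptr_;
        ptr_ = copy;
    }
    return *this;
}

template <typename T>
CopyPtrSketch<T>::~CopyPtrSketch() { delete ptr_; }
```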
\layout Subsection

ClonePtr
\layout Standard

ClonePtr is like CopyPtr except that the clone() method is used instead
 of the copy constructor to make copies of an object.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

clone_ptr.hpp
\begin_inset Quotes erd
\end_inset 

 and implemented in 
\begin_inset Quotes eld
\end_inset 

clone_ptr-t.hpp
\begin_inset Quotes erd
\end_inset 

.
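\layout Standard

The point of using clone() is that copies are made through a virtual function,
 so copying a pointer to a base class copies the full derived object instead
 of slicing it.
 The ClonePtrSketch template and the Shape and Circle classes below are hypothetical
 illustrations, not Aspell's clone_ptr.hpp.
\layout LyX-Code

```cpp
// Hypothetical sketch of a ClonePtr-like smart pointer; NOT Aspell's clone_ptr.hpp.
#include <cassert>
#include <string>

template <typename T>
class ClonePtrSketch {
public:
    explicit ClonePtrSketch(T * p = 0) : ptr_(p) {}
    // clone() copies the dynamic type, so copying through a base pointer does not slice.
    ClonePtrSketch(const ClonePtrSketch & o) : ptr_(o.ptr_ ? o.ptr_->clone() : 0) {}
    ClonePtrSketch & operator=(const ClonePtrSketch & o) {
        if (this != &o) {
            T * copy = o.ptr_ ? o.ptr_->clone() : 0;
            delete ptr_;
            ptr_ = copy;
        }
        return *this;
    }
    ~ClonePtrSketch() { delete ptr_; }
    T * operator->() const { return ptr_; }
    T * get() const { return ptr_; }
private:
    T * ptr_;
};

struct Shape {
    virtual ~Shape() {}
    virtual Shape * clone() const = 0;
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    Circle * clone() const override { return new Circle(*this); }
    std::string name() const override { return "circle"; }
};
```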
\layout Subsection

StackPtr
\layout Standard

A StackPtr is designed to be used whenever the only pointer to a new object
 allocated with 
\series bold 
new
\series default 
 is on the stack.
 It is similar to the standard C++ auto_ptr but the semantics are a bit
 different.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

stack_ptr.hpp
\begin_inset Quotes erd
\end_inset 

.
 Unlike CopyPtr or ClonePtr, it is both defined and implemented in this header
 file.
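\layout Standard

A minimal sketch of the idea follows.
 StackPtrSketch is hypothetical and does not claim to reproduce the exact
 semantics of stack_ptr.hpp; in this sketch copying is simply disabled, whereas
 std::auto_ptr transferred ownership on copy.
 The object is deleted when the pointer goes out of scope, unless release()
 is called to hand it to someone else.
\layout LyX-Code

```cpp
// Hypothetical sketch of a StackPtr-like owner; NOT Aspell's stack_ptr.hpp.
#include <cassert>

template <typename T>
class StackPtrSketch {
public:
    explicit StackPtrSketch(T * p = 0) : ptr_(p) {}
    ~StackPtrSketch() { delete ptr_; }  // frees the object on every exit path
    // In this sketch copying is disabled (std::auto_ptr transferred ownership instead).
    StackPtrSketch(const StackPtrSketch &) = delete;
    StackPtrSketch & operator=(const StackPtrSketch &) = delete;
    T * operator->() const { return ptr_; }
    T & operator*() const { return *ptr_; }
    T * get() const { return ptr_; }
    // Gives up ownership, e.g. when returning the object to a caller.
    T * release() { T * p = ptr_; ptr_ = 0; return p; }
private:
    T * ptr_;
};

// Counts live objects so the example can observe deletions.
struct Counted {
    static int live;
    Counted() { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// Builds an object, then either bails out or hands ownership to the caller.
Counted * make_counted(bool fail) {
    StackPtrSketch<Counted> p(new Counted);
    if (fail) return 0;   // p deletes the object here
    return p.release();   // the caller now owns it
}
```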
\layout Subsection

GenericCopyPtr
\layout Standard

GenericCopyPtr is a generalized version of CopyPtr and ClonePtr, and the two are based on it.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

generic_copy_ptr.hpp
\begin_inset Quotes erd
\end_inset 

 and implemented in 
\begin_inset Quotes eld
\end_inset 

generic_copy_ptr-t.hpp
\begin_inset Quotes erd
\end_inset 

.
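\layout Standard

One way to build such a generalization is with a policy class, sketched
 below.
 GenericCopyPtrSketch, CopyParms, CloneParms and the CopyPtrLike and ClonePtrLike
 aliases are hypothetical, and Aspell's GenericCopyPtr may be organized differently.
 The policy decides how an object is copied and deleted, so a CopyPtr-like
 and a ClonePtr-like pointer are just the same template with different policies.
\layout LyX-Code

```cpp
// Hypothetical policy-based sketch; Aspell's real GenericCopyPtr may differ.
#include <cassert>

template <typename T, typename Parms>
class GenericCopyPtrSketch {
public:
    explicit GenericCopyPtrSketch(T * p = 0) : ptr_(p) {}
    GenericCopyPtrSketch(const GenericCopyPtrSketch & o)
        : ptr_(o.ptr_ ? Parms::clone(o.ptr_) : 0) {}
    GenericCopyPtrSketch & operator=(const GenericCopyPtrSketch & o) {
        if (this != &o) {
            T * copy = o.ptr_ ? Parms::clone(o.ptr_) : 0;
            Parms::del(ptr_);
            ptr_ = copy;
        }
        return *this;
    }
    ~GenericCopyPtrSketch() { Parms::del(ptr_); }
    T * operator->() const { return ptr_; }
    T * get() const { return ptr_; }
private:
    T * ptr_;
};

// Copy policy: uses the copy constructor (CopyPtr-like behaviour).
template <typename T> struct CopyParms {
    static T * clone(const T * p) { return new T(*p); }
    static void del(T * p) { delete p; }
};

// Clone policy: uses a virtual clone() method (ClonePtr-like behaviour).
template <typename T> struct CloneParms {
    static T * clone(const T * p) { return p->clone(); }
    static void del(T * p) { delete p; }
};

template <typename T> using CopyPtrLike = GenericCopyPtrSketch<T, CopyParms<T> >;
template <typename T> using ClonePtrLike = GenericCopyPtrSketch<T, CloneParms<T> >;

struct Point { int x, y; };
```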
\layout Section

I/O
\layout Standard

Aspell does not use the C++ I/O classes and functions in any way, since they
 do not provide a way to get at the underlying file number and can often be
 slower than the highly tuned C I/O functions found in the standard C library.
 However, some lightweight wrapper classes are provided so that standard
 C I/O can be used in a more C++-like way.
\layout Subsection

IStream/OStream
\layout Standard

These two base classes mimic some of the functionality of the corresponding
 C++ classes.
 They are defined in 
\begin_inset Quotes eld
\end_inset 

istream.hpp
\begin_inset Quotes erd
\end_inset 

 and 
\begin_inset Quotes eld
\end_inset 

ostream.hpp
\begin_inset Quotes erd
\end_inset 

 respectively.
 However, they are based on standard C I/O and are not proper C++ streams.
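\layout Standard

A minimal sketch of such a wrapper follows.
 OStreamSketch and FileOStreamSketch are hypothetical names, not the classes
 in istream.hpp or ostream.hpp: the base class provides operator<< on top
 of a single virtual write method, and the file version simply forwards to
 fwrite while keeping the underlying FILE * accessible.
\layout LyX-Code

```cpp
// Hypothetical sketch of a light C++ wrapper over C stdio; NOT Aspell's ostream.hpp.
#include <cassert>
#include <cstdio>
#include <cstring>

class OStreamSketch {
public:
    virtual ~OStreamSketch() {}
    virtual void write(const char * data, std::size_t size) = 0;
    OStreamSketch & operator<<(const char * s) { write(s, std::strlen(s)); return *this; }
    OStreamSketch & operator<<(int n) {
        char buf[32];
        int len = std::snprintf(buf, sizeof(buf), "%d", n);  // formatting done by the C library
        write(buf, static_cast<std::size_t>(len));
        return *this;
    }
};

// Writes straight to a C FILE *; all buffering is left to the C library.
class FileOStreamSketch : public OStreamSketch {
public:
    explicit FileOStreamSketch(std::FILE * f) : file_(f) {}
    void write(const char * data, std::size_t size) override {
        std::fwrite(data, 1, size, file_);
    }
    std::FILE * file() const { return file_; }  // the C stream stays accessible
private:
    std::FILE * file_;
};
```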
\layout Subsection

FStream
\layout Standard

The FStream class is defined in 
\begin_inset Quotes eld
\end_inset 

fstream.hpp
\begin_inset Quotes erd
\end_inset 

.
\layout Subsection

Standard Streams
\layout Standard

The standard streams CIN, COUT and CERR are
 defined in 
\begin_inset Quotes eld
\end_inset 

iostream.hpp
\begin_inset Quotes erd
\end_inset 

.
\layout Section

Config Class
\layout Standard

The Config class is used to hold configuration information.
 It has a set of keys which it will accept.
 Inserting or even trying to look at a key that it does not know will produce
 an error.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

common/config.hpp
\begin_inset Quotes erd
\end_inset 

.
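\layout Standard

A minimal sketch of this behavior, assuming a simple string-to-string store,
 is shown below.
 ConfigSketch and its set and get methods are hypothetical; in Aspell itself
 such errors would most likely be reported through PosibErr rather than a
 bool.
\layout LyX-Code

```cpp
// Hypothetical sketch of a config that only accepts known keys; NOT Aspell's Config API.
#include <cassert>
#include <map>
#include <set>
#include <string>

class ConfigSketch {
public:
    explicit ConfigSketch(const std::set<std::string> & known) : known_(known) {}
    // Setting a key the config does not know about is an error (returns false).
    bool set(const std::string & key, const std::string & value) {
        if (!known_.count(key)) return false;
        values_[key] = value;
        return true;
    }
    // Even looking at an unknown key is an error.
    bool get(const std::string & key, std::string & value) const {
        if (!known_.count(key)) return false;
        std::map<std::string, std::string>::const_iterator i = values_.find(key);
        value = (i == values_.end()) ? std::string() : i->second;
        return true;
    }
private:
    std::set<std::string> known_;
    std::map<std::string, std::string> values_;
};
```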
\layout Section

Filter Interface
\layout Subsection

Overview
\layout Standard

In Aspell there are 5 types of filters:
\layout Enumerate


\series bold 
Decoders
\series default 
 which take input in some standard format such as iso8859-1 or UTF-8 and
 convert it into a string of FilterChars.
\layout Enumerate


\series bold 
Decoding filters
\series default 
 which manipulate a string of FilterChars by decoding the text in some
 way, such as converting SGML character entities into their Unicode values.
 
\layout Enumerate


\series bold 
True filters
\series default 
 which manipulate a string of FilterChars to make it more suitable for
 spell checking.
 These filters generally blank out text which should not be spell checked.
\layout Enumerate


\series bold 
Encoding filters
\series default 
 which manipulate a string of FilterChars by encoding the text in some
 way, such as converting certain Unicode characters to SGML character entities.
\layout Enumerate


\series bold 
Encoders
\series default 
 which take a string of FilterChars and convert it into a standard format
 such as iso8859-1 or UTF-8.
\layout Standard

Which types of filters are used depends on the situation:
\layout Enumerate

When 
\series bold 
decoding words
\series default 
 for spell checking:
\begin_deeper 
\layout Itemize

The 
\series bold 
decoder
\series default 
 to convert from a standard format
\layout Itemize

The 
\series bold 
decoding filter
\series default 
 to perform high level decoding if necessary
\layout Itemize

The 
\series bold 
encoder
\series default 
 to convert into an internal format used by the speller module
\end_deeper 
\layout Enumerate

When 
\series bold 
checking a document
\series default 
:
\begin_deeper 
\layout Itemize

The 
\series bold 
decoder
\series default 
 to convert from a standard format
\layout Itemize

The 
\series bold 
decoding filter
\series default 
 to perform high level decoding if necessary
\layout Itemize

A 
\series bold 
true filter
\series default 
 to filter out parts of the document which should not be spell checked
\layout Itemize

The 
\series bold 
encoder
\series default 
 to convert into an internal format used by the speller module
\end_deeper 
\layout Enumerate

When 
\series bold 
encoding words
\series default 
 such as those returned for suggestions:
\begin_deeper 
\layout Itemize

The 
\series bold 
decoder
\series default 
 to convert from the internal format used by the speller module
\layout Itemize

The 
\series bold 
encoding filter
\series default 
 to perform high level encodings if necessary
\layout Itemize

The 
\series bold 
encoder
\series default 
 to convert into a standard format
\end_deeper 
\layout Standard

A FilterChar is a struct defined in 
\begin_inset Quotes eld
\end_inset 

common/filter_char.hpp
\begin_inset Quotes erd
\end_inset 

 which contains two members: a character and a width.
 Its purpose is to keep track of the width of the character in the original
 format.
 This is important because when a misspelled word is found the exact location
 of the word needs to be returned to the application so that it can highlight
 it for the user.
 For example, if the filters translated this:
\layout LyX-Code

Mr.
 foo said &quot;I hate my namme&quot;.
\layout Standard

to this:
\layout LyX-Code

Mr.
 foo said "I hate my namme".
\layout Standard

without keeping track of the original width of the characters, the application
 will likely highlight 
\begin_inset Quotes eld
\end_inset 

e my 
\begin_inset Quotes erd
\end_inset 

 as the misspelling, because the spell checker will return 25 as the offset
 instead of 30.
 However, by keeping track of the width using FilterChar, the spell checker
 will know that the real position is 30, since each quote is really 6 characters
 wide.
 In particular, the text will be annotated something like the following:
\layout LyX-Code

1111111111111611111111111111161
\newline 
Mr.
 foo said "I hate my namme".
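\layout Standard

The bookkeeping can be sketched as follows.
 FilterCharSketch only mirrors the description above; its member names chr
 and width, and the toy decode_quot and original_offset functions, are assumptions
 for this example and not Aspell's code.
 Offsets here are 0-based, so the misspelling starts at index 24 in the decoded
 text and at 29 in the original, matching the 1-based positions 25 and 30
 used above.
\layout LyX-Code

```cpp
// Hypothetical sketch; the member names chr and width are assumptions.
#include <cassert>
#include <string>
#include <vector>

struct FilterCharSketch {
    unsigned int chr;    // the decoded character
    unsigned int width;  // how many characters it occupied in the original text
};

// A toy decoding step: turns "&quot;" into '"' and records its original width of 6.
std::vector<FilterCharSketch> decode_quot(const std::string & in) {
    std::vector<FilterCharSketch> out;
    for (std::size_t i = 0; i < in.size();) {
        if (in.compare(i, 6, "&quot;") == 0) {
            FilterCharSketch fc = {'"', 6};
            out.push_back(fc);
            i += 6;
        } else {
            FilterCharSketch fc = {static_cast<unsigned char>(in[i]), 1};
            out.push_back(fc);
            i += 1;
        }
    }
    return out;
}

// Maps an offset in the decoded text back to an offset in the original text.
std::size_t original_offset(const std::vector<FilterCharSketch> & text, std::size_t pos) {
    std::size_t off = 0;
    for (std::size_t i = 0; i < pos; ++i) off += text[i].width;
    return off;
}
```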
\layout Standard

The standard 
\series bold 
encoder
\series default 
 and 
\series bold 
decoder
\series default 
 filters are defined in 
\begin_inset Quotes eld
\end_inset 

common/convert.cpp
\begin_inset Quotes erd
\end_inset 

.
 There should generally not be any need to deal with them so they will not
 be discussed here.
 The other three filters, the 
\series bold 
encoding filter
\series default 
, the 
\series bold 
true filter
\series default 
, and the 
\series bold 
decoding filter
\series default 
, are all defined in exactly the same way: they inherit from the IndividualFilter
 class.
\layout Subsection

Adding a New Filter
\layout Standard

To add a new filter, create a new file in the modules/filter directory.
 The file should be a C++ file and end in 
\begin_inset Quotes eld
\end_inset 

.cpp
\begin_inset Quotes erd
\end_inset 

.
 The file should contain a new filter class inherited from IndividualFilter,
 a function to return a new filter, and an optional KeyInfo array for adding
 options to control the behavior of the filter.
 The file then needs to be added to Makefile.am so that the build system
 knows about the filter, and lib/new_filter.cpp must be modified so that
 Aspell knows about the filter.
\layout Subsection

IndividualFilter class
\layout Standard

All filters are required to inherit from the IndividualFilter class found
 in 
\begin_inset Quotes eld
\end_inset 

indiv_filter.hpp
\begin_inset Quotes erd
\end_inset 

.
 See that file for more details and the other filter modules for examples
 of how it is used.
\layout Subsection

Constructor Function
\layout Standard

After the class is created, a function must be created which will return
 a new filter allocated with 
\series bold 
new
\series default 
.
 The function must have the following prototype:
\layout LyX-Code

IndividualFilter * new_<<filter_name>>();
\layout Standard

Filters are defined in groups where each group contains an 
\series bold 
encoding filter
\series default 
, a 
\series bold 
true filter
\series default 
, and a 
\series bold 
decoding filter
\series default 
.
 Only one of them is required to be defined; however, each one that is defined
 needs its own constructor function.
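\layout Standard

A minimal sketch of one such filter and its constructor function might look
 like the following.
 It is only a guess at the general shape: IndividualFilterSketch stands in
 for the real IndividualFilter, whose actual interface is in indiv_filter.hpp,
 and the FilterChar member names, the HashCommentFilter class and the new_hash_comment_filter
 function are hypothetical.
 It is a true filter that blanks out everything from a # character to the
 end of the line, replacing characters with spaces while leaving each width
 untouched so that offsets into the original document stay correct.
\layout LyX-Code

```cpp
// Hypothetical sketch of a "true filter"; NOT Aspell's code.
// IndividualFilterSketch only stands in for IndividualFilter (see indiv_filter.hpp).
#include <cassert>

struct FilterChar {
    unsigned int chr;    // member names are assumptions
    unsigned int width;
};

class IndividualFilterSketch {
public:
    virtual ~IndividualFilterSketch() {}
    virtual void reset() = 0;  // forget any state carried between calls
    virtual void process(FilterChar * & start, FilterChar * & stop) = 0;
};

// Blanks out everything from '#' to the end of the line.
class HashCommentFilter : public IndividualFilterSketch {
public:
    HashCommentFilter() : in_comment_(false) {}
    void reset() override { in_comment_ = false; }
    void process(FilterChar * & start, FilterChar * & stop) override {
        for (FilterChar * c = start; c != stop; ++c) {
            if (c->chr == '#') in_comment_ = true;
            if (c->chr == 10) in_comment_ = false;  // 10 is the newline character
            if (in_comment_) c->chr = ' ';          // blank it out; the width is kept
        }
    }
private:
    bool in_comment_;  // a comment may continue into the next chunk of text
};

// Constructor function in the new_<<filter_name>> style; the name is illustrative.
IndividualFilterSketch * new_hash_comment_filter() {
    return new HashCommentFilter;
}
```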
\layout Subsection

Config Options
\layout Standard

A filter group may have any number of options associated with it as long
 as they all start with the filter name.
 See the TeX and SGML filters for examples of what to do, and 
\begin_inset Quotes eld
\end_inset 

config.hpp
\begin_inset Quotes erd
\end_inset 

 for the definition of the KeyInfo struct.
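\layout Standard

As an illustration only, an options array for a hypothetical filter named
 hashcomment might look like the following.
 KeyInfoSketch is a stand-in whose fields (name, type, default value and
 description) are assumptions; consult config.hpp for the real KeyInfo struct.
 The options_prefixed helper just checks the naming rule stated above.
\layout LyX-Code

```cpp
// Hypothetical stand-in for KeyInfo; the real struct in config.hpp may differ.
#include <cassert>
#include <cstring>

enum KeyTypeSketch { KeyBool, KeyString };

struct KeyInfoSketch {
    const char * name;  // option name, starting with the filter name
    KeyTypeSketch type;
    const char * def;   // default value
    const char * desc;  // short description
};

// Options for a hypothetical "hashcomment" filter.
static const KeyInfoSketch hashcomment_options[] = {
    {"hashcomment-char", KeyString, "#", "character that starts a comment"},
    {"hashcomment-strings", KeyBool, "false", "also check text inside quoted strings"},
};

// Checks that every option name starts with the filter name.
bool options_prefixed(const char * filter,
                      const KeyInfoSketch * begin, const KeyInfoSketch * end) {
    std::size_t n = std::strlen(filter);
    for (const KeyInfoSketch * k = begin; k != end; ++k)
        if (std::strncmp(k->name, filter, n) != 0) return false;
    return true;
}
```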
\layout Subsection

Makefile Modifications
\layout Standard

After the new file is created, simply add the file to the 
\begin_inset Quotes eld
\end_inset 

libaspell_filter_standard_la_SOURCES
\begin_inset Quotes erd
\end_inset 

 line in 
\begin_inset Quotes eld
\end_inset 

modules/filter/Makefile.am
\begin_inset Quotes erd
\end_inset 

 so that the build system knows about it.
\layout Subsection

New_filter Modifications
\layout Standard

Finally, modify 
\begin_inset Quotes eld
\end_inset 

lib/new_filter.cpp
\begin_inset Quotes erd
\end_inset 

 so that Aspell knows about the new filter.
 Follow the example there for the other filter modules.
 The filter_modules array should only be modified if your filter has
 config options.
\layout Section

Data Structures
\layout Standard

Whenever possible, you should try to use one of the data structures available.
 If the data structures do not provide enough functionality for your needs,
 you should consider enhancing them rather than writing something from scratch.
\layout Subsection

Vector
\layout Standard

The vector class is defined in 
\begin_inset Quotes eld
\end_inset 

vector.hpp
\begin_inset Quotes erd
\end_inset 

 and works the same way as the standard STL vector does except that it doesn't
 have as many constructors.
\layout Subsection

BasicList
\layout Standard

BasicList is a simple list structure which can either be implemented as
 a singly or doubly linked list.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

basic_list.hpp
\begin_inset Quotes erd
\end_inset 

.
\layout Subsection

StringMap
\layout Standard

StringMap is an associative array for strings.
 You should try to use it whenever possible to avoid code bloat.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

string_map.hpp
\begin_inset Quotes erd
\end_inset 

.
\layout Subsection

Hash Tables
\layout Standard

Several hash tables are provided when StringMap is not appropriate.
 These hash tables provide a hash_set, hash_multiset, hash_map and hash_multimap
 which are very similar to SGI STL's implementation with a few exceptions.
 They are defined in 
\begin_inset Quotes eld
\end_inset 

hash.hpp
\begin_inset Quotes erd
\end_inset 

.
\layout Subsection

BlockSList
\layout Standard

BlockSList provides a pool of nodes which can be used for singly linked
 lists.
 It is defined in 
\begin_inset Quotes eld
\end_inset 

block_slist.hpp
\begin_inset Quotes erd
\end_inset 

.
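\layout Standard

The idea behind such a node pool can be sketched as follows.
 NodePoolSketch and its methods are hypothetical and not the BlockSList interface:
 nodes are allocated in blocks and threaded onto a free list, so taking or
 returning a node is a constant-time pointer update, and memory is only released
 when the pool itself is destroyed.
\layout LyX-Code

```cpp
// Hypothetical sketch of a block-allocated node pool; NOT Aspell's BlockSList.
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class NodePoolSketch {
public:
    struct Node { Node * next; T data; };
    explicit NodePoolSketch(std::size_t block_size = 64) : block_size_(block_size), free_(0) {}
    ~NodePoolSketch() {
        for (std::size_t i = 0; i < blocks_.size(); ++i) delete[] blocks_[i];
    }
    NodePoolSketch(const NodePoolSketch &) = delete;
    NodePoolSketch & operator=(const NodePoolSketch &) = delete;
    // Takes a node from the free list, allocating a new block only when it is empty.
    Node * new_node() {
        if (!free_) add_block();
        Node * n = free_;
        free_ = n->next;
        n->next = 0;
        return n;
    }
    // Returns a node to the free list; its memory is reused, not freed.
    void remove_node(Node * n) { n->next = free_; free_ = n; }
    std::size_t blocks() const { return blocks_.size(); }
private:
    // Allocates block_size_ nodes at once and chains them onto the free list.
    void add_block() {
        Node * block = new Node[block_size_];
        for (std::size_t i = 0; i < block_size_; ++i)
            block[i].next = (i + 1 < block_size_) ? &block[i + 1] : free_;
        free_ = block;
        blocks_.push_back(block);
    }
    std::size_t block_size_;
    Node * free_;
    std::vector<Node *> blocks_;
};
```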
\layout Section

Mk-Src Script
\layout Standard

A good deal of interface code is automatically generated by the 
\begin_inset Quotes eld
\end_inset 

mk-src.pl
\begin_inset Quotes erd
\end_inset 

 Perl script.
 I am doing it this way to avoid having to write a lot of repetitive code
 for the C++ interface.
 This should also make adding interfaces for other languages a lot less tedious
 and will allow the interface to automatically take advantage of new Aspell
 functionality as it is made available.
 The 
\begin_inset Quotes eld
\end_inset 

mk-src.pl
\begin_inset Quotes erd
\end_inset 

 script uses 
\begin_inset Quotes eld
\end_inset 

mk-src.in
\begin_inset Quotes erd
\end_inset 

 as its input.
\layout Standard

((MKSRC))
\layout Standard

((FDL))
\the_end
