Okapi-Pack

Centre For Interactive Systems Research
City University
London EC1V 0BH


Appendix F: Indexing Databases.

1. Indexing an Okapi Runtime Format Database.

This is a two-stage process performed with the following two programs. Both programs can be found in <OKAPI_ROOT>/bin. The index parameters are read from the main database parameter file and the search_groups parameter file.

1.1. ix1

For each index specified in the <db_name>.search_groups file, ix1 reads each field named by the corresponding indexing parameter, splits it into "indexing units" as determined by the field type, and extracts or generates keys from these units according to the specified indexing regime, stemming function and GSL file. Periodically (how often depends on the amount of available memory), the accumulated keys are sorted and merged into a sequence of index term records (ITRs), which is written to a temporary runfile. When term extraction has finished, all the runfiles are merged into a single output run of ITRs, which is the input to the second-stage program ixf.
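The run-and-merge cycle described above is a form of external merge sort. The Python sketch below illustrates the idea only; it is not ix1's code. It assumes, purely for illustration, that an ITR is a (term, doc_id, frequency) tuple, that runfiles are tab-separated text, that a lowercase whitespace split stands in for indexing units, stemming and the GSL, and that a key-count budget (the hypothetical max_keys parameter) stands in for available memory.

```python
import heapq
import itertools
import os
import tempfile
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical ITR layout used only in this sketch: (term, doc_id, frequency).
ITR = Tuple[str, int, int]


def write_run(keys: Counter, tempdir: str) -> str:
    """Sort the in-memory keys and write them to a new runfile as ITRs."""
    fd, path = tempfile.mkstemp(prefix="ix1_sketch.", dir=tempdir)
    with os.fdopen(fd, "w") as f:
        for (term, doc_id), freq in sorted(keys.items()):
            f.write(f"{term}\t{doc_id}\t{freq}\n")
    return path


def read_run(path: str) -> Iterator[ITR]:
    """Stream the ITRs back from one runfile, in sorted order."""
    with open(path) as f:
        for line in f:
            term, doc_id, freq = line.rstrip("\n").split("\t")
            yield term, int(doc_id), int(freq)


def build_runs(docs: Iterable[Tuple[int, str]], max_keys: int, tempdir: str) -> List[str]:
    """Extract keys from each document, spilling a sorted run whenever the budget fills."""
    runs: List[str] = []
    keys: Counter = Counter()
    for doc_id, text in docs:
        for term in text.lower().split():  # stand-in for indexing units, stemming and GSL
            keys[(term, doc_id)] += 1
            if len(keys) >= max_keys:  # stand-in for "available memory"
                runs.append(write_run(keys, tempdir))
                keys = Counter()
    if keys:
        runs.append(write_run(keys, tempdir))
    return runs


def merge_runs(runs: List[str]) -> Iterator[ITR]:
    """k-way merge of the sorted runfiles; equal (term, doc_id) keys are combined."""
    merged = heapq.merge(*(read_run(p) for p in runs))
    for (term, doc_id), group in itertools.groupby(merged, key=lambda r: (r[0], r[1])):
        yield term, doc_id, sum(r[2] for r in group)
```

With max_keys=2, the documents (1, "okapi index okapi") and (2, "index files") spill three runs, and the final merge combines the two partial counts for "okapi" in document 1 into a single ITR with frequency 2.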

1.2. ixf

This program is responsible for "putting in the structure": it makes the primary and secondary dictionary files and the postings files. ixf can also make the document length file if required.
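This appendix does not describe the file formats ixf writes, so the following Python sketch only illustrates the general shape of the task under stated assumptions. It takes a stream of ITRs sorted by term (the hypothetical (term, doc_id, frequency) layout), appends each term's postings to a postings list, records one primary dictionary entry per term, and builds a sparse secondary dictionary holding the first term of every block of primary entries. It also sums frequencies per document as a stand-in for the document length file; Okapi's own definition of document length may differ.

```python
import bisect
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, Iterable, List, Optional, Tuple

ITR = Tuple[str, int, int]  # hypothetical (term, doc_id, frequency), sorted by term


@dataclass
class DictEntry:
    term: str
    doc_freq: int         # number of documents containing the term
    total_freq: int       # occurrences of the term across all documents
    postings_offset: int  # index of the term's first posting


def build_index(itrs: Iterable[ITR], block_size: int = 2):
    """Return (primary, secondary, postings, doclens) built from sorted ITRs."""
    primary: List[DictEntry] = []
    postings: List[Tuple[int, int]] = []  # (doc_id, frequency) pairs
    doclens: Dict[int, int] = {}          # doc_id -> summed term frequency
    for term, group in groupby(itrs, key=lambda r: r[0]):
        offset = len(postings)
        for _, doc_id, freq in group:
            postings.append((doc_id, freq))
            doclens[doc_id] = doclens.get(doc_id, 0) + freq
        term_postings = postings[offset:]
        primary.append(DictEntry(term, len(term_postings),
                                 sum(f for _, f in term_postings), offset))
    # Sparse secondary dictionary: first term of each block of primary entries.
    secondary = [(primary[i].term, i) for i in range(0, len(primary), block_size)]
    return primary, secondary, postings, doclens


def lookup(term: str, primary: List[DictEntry], secondary: List[Tuple[str, int]],
           block_size: int = 2) -> Optional[DictEntry]:
    """Binary-search the secondary dictionary, then scan one primary block."""
    firsts = [t for t, _ in secondary]
    i = bisect.bisect_right(firsts, term) - 1
    if i < 0:
        return None  # term sorts before every indexed term
    start = secondary[i][1]
    for entry in primary[start:start + block_size]:
        if entry.term == term:
            return entry
    return None
```

The two-level layout means a lookup only binary-searches the small secondary table and then scans a single block of the primary dictionary; block_size is a hypothetical tuning parameter of this sketch.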


2. Running the Programs.

It is possible to send the output of ix1 to files which may then be read by ixf. However, if disc space allows, the simplest way to run the programs is to pipe the output of ix1 into ixf and complete the process in one go.

For example:

    ix1 -c <BSS_PARMPATH> <db_name> <index_no> |
    ixf -c <BSS_PARMPATH> <db_name> <index_no>
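When the pipeline is driven from a script, both exit statuses need checking, because a shell pipeline normally reports only the status of its last command. The Python sketch below builds the two command lines from the switches documented in Section 3 and connects them with subprocess. It assumes that ix1 and ixf are on the PATH (for example via <OKAPI_ROOT>/bin) and that they report failure with a nonzero exit status, which this appendix does not state.

```python
import subprocess
from typing import List, Tuple


def build_commands(parm_path: str, db_name: str, index_no: int) -> Tuple[List[str], List[str]]:
    """Build the ix1 and ixf argument lists using only the -c switch and positionals."""
    # Table 3.3.1 gives 0..15 as the possible index numbers; the database
    # parameter file may define a narrower range.
    if not 0 <= index_no <= 15:
        raise ValueError(f"index number {index_no} is outside 0..15")
    args = ["-c", parm_path, db_name, str(index_no)]
    return ["ix1", *args], ["ixf", *args]


def run_pipeline(parm_path: str, db_name: str, index_no: int) -> None:
    """Equivalent of 'ix1 ... | ixf ...', but checking both exit statuses."""
    ix1_cmd, ixf_cmd = build_commands(parm_path, db_name, index_no)
    ix1 = subprocess.Popen(ix1_cmd, stdout=subprocess.PIPE)
    ixf = subprocess.Popen(ixf_cmd, stdin=ix1.stdout)
    ix1.stdout.close()  # so ix1 gets SIGPIPE if ixf exits early
    ixf_status = ixf.wait()
    ix1_status = ix1.wait()
    # Assumption: the programs signal failure with a nonzero exit status.
    if ix1_status != 0 or ixf_status != 0:
        raise RuntimeError(f"ix1 exited with {ix1_status}, ixf with {ixf_status}")
```

For example, run_pipeline(os.environ["BSS_PARMPATH"], "mydb", 0) would build index 0 of a hypothetical database named mydb, assuming BSS_PARMPATH is set in the environment.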

3. Options When Calling the Programs.

Both programs can be called with a variety of parameters. Calling either with the switch "-help" displays information about the parameters it accepts.

3.2. ix1

    maxfiles set to 1021
    Stage 1 indexing program ix1
    Revision dates: major Sep 24 1996, minor Dec 12 1997

    ix1 [-m[em] <mem>]   [-t <tempdir> <minfree> [-t <tempdir> <minfree>]]
    [-h[elp]]   [-s[tart] <start_rec>]   [-f[inish] <finish_rec>]   [-silent]
    [-d[ebug] <debug code>]   [-c <control_path>]   [-maxruns <num>]
    [-trial <trialnum>]   [-[no]index]   [-[no]doclens]   [-[no]merge]
    [-[no]deltmp]   [-[no]delfinal]   [-l[im] <limit mask>]
    <database>   <index number>  >   <output file>


    Table 3.2.1. ix1 switches
    Switch                    Description
    -m <mem>                  Amount of memory, in units of 1 MB (default 4).
    -s <start_rec>            First record to index (default: first record).
    -f <finish_rec>           Last record to index (default: last record).
    -t <tempdir> <minfree>    Temporary directory for pre-merge files, given as a full pathname with or without a trailing slash (default /tmp). Up to 10 may be allocated, but each must be on a separate file system. <minfree> is the amount of space in MB that must be left free (default 0). Temporary files are named ix1.<pid>.0000 to ix1.<pid>.9999.
    -c <control_path>         Directory for database parameter files (default BSS_PARMPATH).
    -maxruns <num>            Must satisfy 2 <= maxruns <= 1021 (default 1021).
    -trial <trialnum>         Process only every <trialnum>th record, discarding the output, and then estimate the amount of space needed for runfiles and intermediate merge files.
    -noindex                  Inhibit output of the index (default: index).
    -silent                   Inhibit some diagnostic output.
    -nomerge                  Skip the final merge and leave the temporary files (default: merge).
    -doclens / -nodoclens     Do / do not contribute to the document length file (default: nodoclens).
    -deltmp / -nodeltmp       Do / do not delete the temporary files after non-final merges (default: nodeltmp).
    -delfinal / -nodelfinal   Do / do not delete the temporary files after the final merge (default: nodelfinal).
    -l <limit mask>           Restrict indexing to documents that match the limit mask.
    <database>                The database parameter file name (NOT the path).
    <index number>            Must be within the range given in the database parameter file.

    The two positional parameters, <database> and <index number>, must appear last.
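The range check on -maxruns and the -deltmp switch suggest that ix1 may need several merge passes when it has more runfiles than it can merge at once. Assuming, as this sketch does, that maxruns is the maximum number of runs merged in a single pass, the Python function below counts the non-final passes a simple balanced scheme would need before the final merge. ix1's actual merge strategy is not documented here, so treat the figures as illustrative.

```python
def nonfinal_merge_passes(runs: int, maxruns: int = 1021) -> int:
    """Count merge passes needed before at most `maxruns` runs remain.

    Assumption (not documented for ix1): each pass merges groups of up to
    `maxruns` runfiles into one, and the final merge takes the remainder.
    """
    if not 2 <= maxruns <= 1021:
        raise ValueError("maxruns must satisfy 2 <= maxruns <= 1021")
    passes = 0
    while runs > maxruns:
        runs = -(-runs // maxruns)  # ceiling division: one output run per group
        passes += 1
    return passes
```

Under this assumption the default of 1021 lets up to 1021 runfiles go straight to the final merge, while 1022 runfiles would need one non-final pass; -maxruns 2 with 10 runfiles would need three.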


    Table 3.2.2. Default Parameter Values
    Parameter   Default value
    maxruns     1021
    start       first record
    finish      last record
    trial       no
    index       yes
    doclens     no
    merge       yes
    silent      no
    debug       no
    deltmp      no
    delfinal    no

ix1 is often run with its output piped into ixf, which puts in the structure (see Section 2, Running the Programs).

3.3. ixf

Final index production program ixf - version date Jan 15 1998

    ixf [-c <control directory>]   [-no_out]   [-diag]   [-stdout]
      [-report <reportlevel>]   <database>   <index number>


    Table 3.3.1. ixf switches
    Switch                    Description
    -c <control directory>    Directory for database parameter files (default BSS_PARMPATH).
    -no_out                   Generate no output.
    -diag                     Display diagnostics.
    -stdout                   Send output to stdout.
    -report <reportlevel>     Set the reporting level.
    <database>                The database parameter file name (NOT the path).
    <index number>            Must be within the range given in the database parameter file (0..15).





Last modified:   12th November 2001