Okapi-Pack

Centre For Interactive Systems Research
City University
London EC1V 0BH


Appendix C:   Database Parameter Files.

The database parameter files are stored in the directory specified by BSS_PARMPATH. As well as those for the two sample databases (cacm.sample and med.sample), there are some for a non-existent database, bigtext, which show how parameter files would be constructed for a multi-volume database.

C.1. The Main Database Parameter File: <db_name>

The contents of the file are, in order:

Entry
    Comment

name=<db_name>
    Database name.
lastbibvol=<no. of files comprising the bibfile - 1>
    Default 0.
bib_basename=<basename of bibfile>
bib_dir=<pathname of bibfile>
    Repeats (lastbibvol + 1) times.
bibsize=<in kilobytes>
    An overestimate of the available space on the corresponding <bibvol>. Repeats (lastbibvol + 1) times.
real_bibsize=<size of corresponding volume>
    In units of (bytes / <rec_mult>). Enter a dummy value; the correct value is filled in by the convert_runtime program. Repeats (lastbibvol + 1) times.
display_name=<info_display_name>
    The name displayed by the BSS "info databases" command.
explanation=<additional explanatory line of text>
nr=<number of records>
    Start with a dummy value; the correct value is filled in by convert_runtime.
nf=<number of fields per record>
f_abbrev=<field abbreviation>
    Repeats nf times. Conventionally these are two-character mnemonics; they are unused at present.
rec_mult=<storage unit for a record, a small power of 2>
    Default 4.
fixed=<length in bytes of database records' "fixed" field>
    Default 0.
db_type=<database_type>
    text or ai.
maxreclen=<max record length in bytes>
    Filled in by convert_runtime.
ni=<number of indexes>
    1 <= ni <= 9.
last_ixvol=<no. of postings file volumes - 1>
    Default 0.
ix_stem=<pathname of index files + prefix>
    All index files must be in the same directory.
ix_volsize=<size in MB>
    Space available (MB) for the index volume.
ix_type=<index_type>
    8 for <db_type>=text; 9 for others.

The entries last_ixvol, ix_stem, ix_volsize and ix_type repeat <ni> times.


The main parameter file for the sample database (med.sample) is:

      name=med.sample
      lastbibvol=0
      bib_basename=med.sample.bib
      bib_dir=/project/okapi/OkapiNet/bibfiles/
      bibsize=2047
      real_bibsize=930253
      display_name=med.sample
      explanation=Approx. 1000 records from the Medlars database.
      nr=1033
      nf=3
      f_abbrev=DN
      f_abbrev=MI
      f_abbrev=TX
      rec_mult=4
      fixed=0
      db_type=text
      maxreclen=2047
      ni=2
      last_ixvol=0
      ix_stem=/project/okapi/OkapiNet/bibfiles/med.sample
      ix_volsize=2047
      ix_type=8
      last_ixvol=0
      ix_stem=/project/okapi/OkapiNet/bibfiles/med.sample
      ix_volsize=2047
      ix_type=8
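
The layout described above can be checked mechanically. The following is a minimal Python sketch, not part of Okapi-Pack: it assumes the file consists solely of name=value lines (no comments or continuation lines), and the function names parse_main_params and check_repeat_counts are hypothetical. It collects repeated entries in order and compares their counts with lastbibvol, nf and ni.

```python
from collections import defaultdict

# The med.sample main parameter file from this appendix.
SAMPLE = """\
name=med.sample
lastbibvol=0
bib_basename=med.sample.bib
bib_dir=/project/okapi/OkapiNet/bibfiles/
bibsize=2047
real_bibsize=930253
display_name=med.sample
explanation=Approx. 1000 records from the Medlars database.
nr=1033
nf=3
f_abbrev=DN
f_abbrev=MI
f_abbrev=TX
rec_mult=4
fixed=0
db_type=text
maxreclen=2047
ni=2
last_ixvol=0
ix_stem=/project/okapi/OkapiNet/bibfiles/med.sample
ix_volsize=2047
ix_type=8
last_ixvol=0
ix_stem=/project/okapi/OkapiNet/bibfiles/med.sample
ix_volsize=2047
ix_type=8
"""


def parse_main_params(text):
    """Return {entry name: [values in file order]}; repeated entries accumulate."""
    params = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"not a name=value line: {line!r}")
        params[key].append(value)
    return dict(params)


def check_repeat_counts(params):
    """List entries whose repeat count disagrees with lastbibvol, nf or ni."""
    n_bibvols = int(params.get("lastbibvol", ["0"])[0]) + 1  # default 0
    nf = int(params["nf"][0])
    ni = int(params["ni"][0])
    expected = {
        "bib_dir": n_bibvols, "bibsize": n_bibvols, "real_bibsize": n_bibvols,
        "f_abbrev": nf,
        "last_ixvol": ni, "ix_stem": ni, "ix_volsize": ni, "ix_type": ni,
    }
    return [
        f"{key}: expected {want}, found {len(params.get(key, []))}"
        for key, want in expected.items()
        if len(params.get(key, [])) != want
    ]


params = parse_main_params(SAMPLE)
print(check_repeat_counts(params))  # an empty list means the counts agree
```

The sketch checks only the repeat counts; it does not verify that entries appear in the required order.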

C.2. The Field Types Parameter File.

Each database has a single field types parameter file called <db_name>.field_types. It contains one line for each field, consisting of:

      <field_no>   <field_type>

  <field_no> in the range 1 to <nf> inclusive.
  <field_type> taken from the following set of predefined types

Field Type    Comment
PERS
CORP
NAMES
TITLE
MAIN_TITLE
SUBTITLE
DEWEY
SH            Subject Heading
SH_SUBDIV
TEXT
PHRASE
LITERAL
LITERAL_NC    Lowercase
NUMBERS
YEAR
UDC
ANY

The field_types parameter file for the Medlars sample database (med.sample.field_types) is:

1 LITERAL_NC
2 TEXT
3 TEXT
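
As an illustration of the two constraints above (field numbers in the range 1 to <nf>, types drawn from the predefined set), here is a small Python sketch. It is not part of Okapi-Pack; the name parse_field_types is hypothetical, and it assumes the two columns are separated by whitespace.

```python
# Predefined field types listed in this section.
FIELD_TYPES = {
    "PERS", "CORP", "NAMES", "TITLE", "MAIN_TITLE", "SUBTITLE", "DEWEY",
    "SH", "SH_SUBDIV", "TEXT", "PHRASE", "LITERAL", "LITERAL_NC",
    "NUMBERS", "YEAR", "UDC", "ANY",
}


def parse_field_types(text, nf):
    """Return {field_no: field_type}, rejecting lines that break the rules."""
    result = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected <field_no> <field_type>")
        field_no, field_type = int(parts[0]), parts[1]
        if not 1 <= field_no <= nf:
            raise ValueError(f"line {lineno}: field_no {field_no} not in 1..{nf}")
        if field_type not in FIELD_TYPES:
            raise ValueError(f"line {lineno}: unknown field type {field_type!r}")
        result[field_no] = field_type
    return result


# med.sample has nf=3 in its main parameter file.
med_fields = parse_field_types("1 LITERAL_NC\n2 TEXT\n3 TEXT\n", nf=3)
print(med_fields)
```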

C.3. The Search Groups Parameter File.

There is one entry per index in the file, each consisting of seven fields:

<index_name> [|<index_name>]
    The name or mnemonic of the index.
<dummy>
    A power of 2. Unused but must be present.
<index_no>
    From 0 to <ni> - 1.
<term extraction regime>
    See below.
<stem function name>
    wstem | sstem | nostem
<GSL filename>
    GSL or stoplist filename (in <BSS_PARMPATH>).
<field_list>
    Zero-separated list of the fields to be indexed, terminated by -1.

The search_groups parameter file for the sample database (med.sample.search_groups) is:

kw 1 0 words3 sstem gsl.med 3   0   -1
dn 1 1 literal nostem gsl.empty 1   0   -1
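
To make the seven-field layout concrete, the following Python sketch splits one search_groups entry into its parts. It is an illustration only, not Okapi-Pack code: the SearchGroup class and parse_search_group function are hypothetical names, tokens are assumed to be whitespace-separated, and the field list is read as "every non-zero number before the closing -1", which is one reading of "zero-separated" that fits the sample entries.

```python
from dataclasses import dataclass


@dataclass
class SearchGroup:
    names: list      # index name plus any "|"-separated alternatives
    dummy: int       # unused, but must be present
    index_no: int    # 0 .. ni - 1
    regime: str      # term extraction regime, e.g. words3
    stem: str        # wstem, sstem or nostem
    gsl: str         # GSL or stoplist filename
    fields: list     # field numbers to be indexed


def parse_search_group(line):
    """Parse one entry such as 'kw 1 0 words3 sstem gsl.med 3 0 -1'."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[-1] != "-1":
        raise ValueError(f"malformed search group entry: {line!r}")
    field_tokens = [int(t) for t in tokens[6:-1]]
    # Assumption: 0 only separates field numbers, since fields start at 1.
    fields = [f for f in field_tokens if f != 0]
    return SearchGroup(
        names=tokens[0].split("|"),
        dummy=int(tokens[1]),
        index_no=int(tokens[2]),
        regime=tokens[3],
        stem=tokens[4],
        gsl=tokens[5],
        fields=fields,
    )


kw = parse_search_group("kw 1 0 words3 sstem gsl.med 3   0   -1")
dn = parse_search_group("dn 1 1 literal nostem gsl.empty 1   0   -1")
print(kw)
print(dn)
```

Read this way, the kw index is built from field 3 (the second TEXT field) and the dn index from field 1.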

The GSL file provides a list of stop words, phrases to be indexed as single units rather than as individual terms, and synonym groups. The structure of a GSL file is shown in Appendix D.

The term extraction regimes determine how index terms are extracted from the Okapi database. Available values for this field are:

words3
    All terms, stemmed by the specified stemming function, except entries in the GSL file, which are dealt with according to their GSL code (see Appendix D).
literal
literal_nc
    Same as literal but lowercase.
phrase
Dewey
Subject heading
name_phrase





Last modified:   12th November 2001