Following the final field, but preceding the record mark, there may be one or more additional temporary fields which will provide part of the contents of the "fixed" area at the beginning of the corresponding runtime record. The first of these fields, if present, contains a number representing the bitwise "or" of the limit criteria which the record satisfies; other such fields might contain accession/modification dates or other codable information, but nothing is defined at the time of writing.
Thus an exchange record is
<field contents> <field mark> [ <field contents> <field mark> ] <record mark>
Field contents may be anything that does not contain field or record mark characters. Usually they are plain ASCII text. Historically, certain characters in certain types of field (see Field Types) have special meanings, but in general they do not.
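The record structure above can be illustrated with a small reader. This is a minimal sketch, not part of Okapi: the function name split_exchange_records is hypothetical, and the mark characters are assumed to be the convert_runtime defaults (0x1E for the field mark, 0x1D for the record mark).

```python
FIELD_MARK = b"\x1e"   # assumed: default field terminator of convert_runtime (-fm)
RECORD_MARK = b"\x1d"  # assumed: default record terminator of convert_runtime (-rm)


def split_exchange_records(data):
    """Split exchange-format bytes into records, each a list of field contents.

    Hypothetical helper for illustration only.
    """
    records = []
    for raw in data.split(RECORD_MARK):
        if not raw:
            # Nothing between two record marks, or after the last one.
            continue
        if not raw.endswith(FIELD_MARK):
            raise ValueError("field contents not followed by a field mark")
        # Drop the final field mark, then split the remaining fields.
        records.append(raw[:-1].split(FIELD_MARK))
    return records
```

Any temporary fields placed before the record mark (such as the limit value) would simply appear as the last items of each list.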
The indexing application, "indexer", expects to find a text file containing the exchange format database, which it converts to an Okapi runtime database. If, however, you are not using "indexer", it is often not necessary to hold any data in exchange format. Having written a program or script to convert your raw data to exchange format, you can pipe its output directly into the program which converts exchange to runtime format.
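A conversion script of the kind described might look like the following sketch. It assumes a hypothetical raw format (one record per line, tab-separated fields) and the default mark characters 0x1E and 0x1D; the names write_exchange and main are illustrative and are not part of Okapi.

```python
import sys

FIELD_MARK = b"\x1e"   # assumed default field mark
RECORD_MARK = b"\x1d"  # assumed default record mark


def write_exchange(rows, out):
    """Write each row (a list of strings) to the binary stream as one exchange record."""
    count = 0
    for row in rows:
        for value in row:
            data = value.encode("ascii", errors="replace")
            if FIELD_MARK in data or RECORD_MARK in data:
                raise ValueError("field contains a mark character")
            out.write(data + FIELD_MARK)
        out.write(RECORD_MARK)
        count += 1
    return count


def main():
    # Hypothetical raw input: tab-separated fields, one record per line on stdin.
    rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
    write_exchange(rows, sys.stdout.buffer)
```

If main() is called at the end of a script saved under some name of your choosing, the script's standard output can be piped straight into convert_runtime, which reads exchange format from stdin.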
<fixed field> <field directory> <field> [ <field> ] [ <padding> ]
There are no field or record marks.
Fixed field | If the database has limiting facilities the first two bytes of the fixed field contain the record's limit mask as a 16-bit unsigned value. |
Field directory | For a non-text database this consists of a 16-bit unsigned field length for each data field. For a text database these directory fields are 24 bits long. Each one contains the length in bytes of the corresponding data field. |
Fields | May contain anything, or nothing. Normally only databases of type "text" contain newline characters. Interfaces to search programs would normally format to suit the required display. |
Padding | Ultimately, database records have to be addressed by their offset in a disk file or sequence of files. This addressing is limited to 31 (or possibly only 30) bits, which would limit the total size of a database to about two gigabytes or less. Hence records may be padded at the end, if necessary, so that their length is a multiple of a small power of two. Increasing this power by one doubles the maximum possible size of the database. The multiple is recorded in the database parameters as "rec_mult" (see Database Parameters). For example, if rec_mult is 4, the maximum size of the database will be eight gigabytes. Of course, if rec_mult is large compared to the mean record length, rather a lot of space will be wasted; the mean amount of wasted space per record is (rec_mult - 1)/2 bytes. Any character may be used for padding; the runtime conversion program actually inserts plus signs. |
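The layout and the padding arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the actual convert_runtime code: the byte order (big-endian here) is not given in the text, the fixed field is assumed to hold only the two-byte limit mask, and all function names are hypothetical.

```python
PAD_BYTE = b"+"  # the conversion program is said to pad with plus signs


def pack_runtime_record(fields, limit_mask=0, rec_mult=1, text_db=False):
    """Build fixed field + field directory + fields + padding as one byte string."""
    # ASSUMPTION: big-endian integers; the real byte order is not documented here.
    fixed = limit_mask.to_bytes(2, "big")        # 16-bit unsigned limit mask
    width = 3 if text_db else 2                  # 24-bit or 16-bit directory entries
    directory = b"".join(len(f).to_bytes(width, "big") for f in fields)
    body = fixed + directory + b"".join(fields)
    pad = -len(body) % rec_mult                  # bytes needed to reach a multiple
    return body + PAD_BYTE * pad


def max_database_bytes(rec_mult, address_bits=31):
    """Largest addressable database when record offsets are multiples of rec_mult."""
    return (2 ** address_bits) * rec_mult


def mean_padding(rec_mult):
    """Mean wasted bytes per record if record lengths are spread evenly."""
    return (rec_mult - 1) / 2
```

With rec_mult set to 4, max_database_bytes gives eight gigabytes, matching the example above, and mean_padding gives 1.5 bytes per record.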
A standard program, convert_runtime, converts exchange format to runtime format. It reads from stdin and writes a runtime bibfile (in <OKAPI_ROOT>/bibfiles), a directory file, and (in the case of text-type databases) a paragraph file. It also fills in certain information in the main database parameter file, which must exist and be writable before the program can run. The main database parameter file is found in <BSS_PARMPATH>.
"convert_runtime" is called by indexer with the following parameters:
convert_runtime -c <BSS_PARMPATH> <db_name> < <exchange format file>
e.g.
convert_runtime -c /okapi/databases med.sample < /okapi/datafiles/med.exch
convert_runtime
[-c <ctrl directory>]
[-a]
[-num <maxrecs>]
[-treclimits] [-fixedlimit <fixedlim>] [-halfcollection] [-version] [-help]
[-rm <record terminator character>] [-fm <field terminator character>]
[-phoney_fcno] [-skip <skipnum>] [-checkpoint <interval>] [-nopar]
<database name> < <input file>
Typing convert_runtime -help will list the above switches.
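If convert_runtime is driven from a script rather than from indexer, the call might be wrapped as in the sketch below. The helper names are hypothetical; only the -c, -a and -num switches listed above are used, and it is an assumption that the program returns a non-zero exit status on failure.

```python
import subprocess


def convert_runtime_argv(parm_path, db_name, append=False, max_recs=None):
    """Build the argument list; the database name is placed last."""
    argv = ["convert_runtime", "-c", parm_path]
    if append:
        argv.append("-a")
    if max_recs is not None:
        argv += ["-num", str(max_recs)]
    argv.append(db_name)
    return argv


def run_convert_runtime(parm_path, db_name, exchange_file, **options):
    """Run convert_runtime with the exchange file connected to its stdin."""
    with open(exchange_file, "rb") as src:
        # check=True raises CalledProcessError on a non-zero exit status (assumed
        # to signal failure; the exit codes are not documented here).
        return subprocess.run(
            convert_runtime_argv(parm_path, db_name, **options),
            stdin=src,
            check=True,
        )
```

For the earlier example, run_convert_runtime("/okapi/databases", "med.sample", "/okapi/datafiles/med.exch") would be the equivalent call.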
When running the program the database name (the main parameter file name) must come last; it will be read from:
in that order.
-a | causes database to be appended to an existing one of the same name. |
-num <maxrecs> | limits total database size to <maxrecs> records. |
-skip <num> | causes the first <num> input records to be skipped before processing starts. |
-checkpoint <num> | causes files to be flushed and stats displayed after every <num> output records. Default 5000, <num>=0 prevents checkpointing. |
-treclimits | inserts predefined doclength limit bits (see the code). |
-fixedlimit <fixedlim> | ORs <fixedlim> into the limits field of each record. |
-halfcollection | sets the '1' or '2' bit in the limits field according to whether the record number is odd or even. |
-rm <record terminator character> | sets the character to be used as the record terminator. Defaults to 0x1D. |
-fm <field terminator character> | sets the character to be used as the field terminator. Defaults to 0x1E. |
-phoney_fcno | puts a dummy entry in the second field of each paragraph record (use this arg if field 1 may have more than 1 'word' in it). |
-nopar | prevents the paragraph file being made (only applies to a text database). |
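The way the limit-related switches combine can be sketched as below. This is an interpretation, not the program's code: it assumes record numbers start at 1, that "the '1' or '2' bit" means the mask values 1 (odd records) and 2 (even records), and that the result must fit the 16-bit limit mask. The function name is hypothetical.

```python
def limits_for_record(record_number, fixed_limit=0, half_collection=False, existing=0):
    """Return the limit mask for one record as a 16-bit unsigned value."""
    # -fixedlimit: OR the given value into whatever limits the record already has.
    mask = existing | fixed_limit
    if half_collection:
        # -halfcollection: ASSUMED mapping of odd records to 1 and even records to 2.
        mask |= 1 if record_number % 2 == 1 else 2
    return mask & 0xFFFF
```

For example, limits_for_record(3, fixed_limit=4, half_collection=True) yields 5 under these assumptions.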