
Document Length in the Basic Model

Referring back to the basic weighting function (1), we may include document length as one component of the vector $\mathbf{x}$. However, document length has no obvious "natural" zero, since an actual document of zero length is a pathological case. Instead, we may use the average length of a document as the corresponding component of the reference vector $\mathbf{0}$. We would then expect a formula in which the document length component disappears for a document of average length, but not for other lengths. The weighting formula becomes:
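
$$w = \log \frac{P((\mathbf{x},d) \mid R)\, P((\mathbf{0},\Delta) \mid \bar{R})}{P((\mathbf{x},d) \mid \bar{R})\, P((\mathbf{0},\Delta) \mid R)}$$

where $d$ is document length, $\Delta$ is the average document length, and $\mathbf{x}$ represents all other information about the document. This may be decomposed into the sum of two components, $w = w_1 + w_2$, where

$$w_1 = \log \frac{P((\mathbf{x},d) \mid R)\, P((\mathbf{0},d) \mid \bar{R})}{P((\mathbf{x},d) \mid \bar{R})\, P((\mathbf{0},d) \mid R)}$$

$$w_2 = \log \frac{P((\mathbf{0},d) \mid R)\, P((\mathbf{0},\Delta) \mid \bar{R})}{P((\mathbf{0},d) \mid \bar{R})\, P((\mathbf{0},\Delta) \mid R)}$$
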
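The decomposition can be checked numerically on a small invented example. The Python sketch below is only an illustration under stated assumptions. It takes $\mathbf{x}$ to be a single term's within-document frequency (0, 1 or 2) and $d$ to be one of three length classes, with "avg" standing in for $\Delta$. The probability tables and the names `log_ratio`, `w`, `w1` and `w2` are hypothetical. The sketch confirms that $w = w_1 + w_2$ for every $(\mathbf{x}, d)$, that $w_2$ vanishes when $d = \Delta$, and that $w_1$ vanishes when $\mathbf{x} = \mathbf{0}$.

```python
import math

# Hypothetical toy model (assumption, not from the source): x is a term's
# within-document frequency in {0, 1, 2}; d is a length class, and AVG plays
# the role of the average length Delta. Probabilities are invented; each table sums to 1.
SHORT, AVG, LONG = "short", "avg", "long"

P_REL = {  # P((x, d) | R)
    (0, SHORT): 0.10, (1, SHORT): 0.08, (2, SHORT): 0.02,
    (0, AVG): 0.15, (1, AVG): 0.20, (2, AVG): 0.10,
    (0, LONG): 0.10, (1, LONG): 0.12, (2, LONG): 0.13,
}
P_NONREL = {  # P((x, d) | not R)
    (0, SHORT): 0.25, (1, SHORT): 0.05, (2, SHORT): 0.01,
    (0, AVG): 0.30, (1, AVG): 0.08, (2, AVG): 0.02,
    (0, LONG): 0.22, (1, LONG): 0.06, (2, LONG): 0.01,
}


def log_ratio(a, b):
    """log [P(a|R) P(b|not R)] / [P(a|not R) P(b|R)]: weight of state a against reference b."""
    return math.log(P_REL[a] * P_NONREL[b] / (P_NONREL[a] * P_REL[b]))


def w(x, d):
    # Full weight: (x, d) measured against the reference (0, Delta).
    return log_ratio((x, d), (0, AVG))


def w1(x, d):
    # First component: (x, d) against (0, d), i.e. length held fixed.
    return log_ratio((x, d), (0, d))


def w2(d):
    # Second component: depends only on d; (0, d) against (0, Delta).
    return log_ratio((0, d), (0, AVG))


for d in (SHORT, AVG, LONG):
    print(f"{d:>5}: w2 = {w2(d):+.3f}")
```

Because $w_2$ involves only the length $d$ and the zero vector, it can be tabulated once per length without reference to $\mathbf{x}$.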
These two components are discussed separately.

Steve Robertson
Mon May 13 18:33:21 BST 1996