Referring back to the basic weighting function (1), we may include document length as one component of the vector x. However, document length does not so obviously have a ``natural'' zero (an actual document of zero length is a pathological case). Instead, we may use the average document length as the corresponding component of the reference vector 0; we would then expect to get a formula in which the document length component disappears for a document of average length, but not for other lengths. The weighting formula then becomes:
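\[
w(\mathbf{x}, d) = \log \frac{P((\mathbf{x}, d) \mid R)\, P((\mathbf{0}, \Delta) \mid \bar{R})}{P((\mathbf{x}, d) \mid \bar{R})\, P((\mathbf{0}, \Delta) \mid R)}
\]
where $d$ is document length, $\Delta$ is the average document length, and $\mathbf{x}$ represents all other information about the document.
This may be decomposed into the sum of two components,
\[
w(\mathbf{x}, d) = w(\mathbf{x}, d)_1 + w(\mathbf{0}, d)_2 ,
\]
where
\[
w(\mathbf{x}, d)_1 = \log \frac{P((\mathbf{x}, d) \mid R)\, P((\mathbf{0}, d) \mid \bar{R})}{P((\mathbf{x}, d) \mid \bar{R})\, P((\mathbf{0}, d) \mid R)}
\quad \text{and} \quad
w(\mathbf{0}, d)_2 = \log \frac{P((\mathbf{0}, d) \mid R)\, P((\mathbf{0}, \Delta) \mid \bar{R})}{P((\mathbf{0}, d) \mid \bar{R})\, P((\mathbf{0}, \Delta) \mid R)} .
\]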
These two components are discussed separately.
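
As a check on this decomposition, the sum of the two components can be expanded directly. The sketch below assumes that the weighting function has the log-odds form written out in it, with $R$ and $\bar{R}$ denoting relevance and non-relevance and $\Delta$ standing for the average document length (the symbols are an assumed notation for this illustration). The factors involving $(\mathbf{0}, d)$ cancel in the sum, and the second component vanishes when $d = \Delta$:

```latex
\begin{align*}
w(\mathbf{x}, d)_1 + w(\mathbf{0}, d)_2
  &= \log \frac{P((\mathbf{x}, d) \mid R)\, P((\mathbf{0}, d) \mid \bar{R})}
               {P((\mathbf{x}, d) \mid \bar{R})\, P((\mathbf{0}, d) \mid R)}
   + \log \frac{P((\mathbf{0}, d) \mid R)\, P((\mathbf{0}, \Delta) \mid \bar{R})}
               {P((\mathbf{0}, d) \mid \bar{R})\, P((\mathbf{0}, \Delta) \mid R)} \\
  &= \log \frac{P((\mathbf{x}, d) \mid R)\, P((\mathbf{0}, \Delta) \mid \bar{R})}
               {P((\mathbf{x}, d) \mid \bar{R})\, P((\mathbf{0}, \Delta) \mid R)}
   = w(\mathbf{x}, d), \\[1ex]
w(\mathbf{0}, \Delta)_2
  &= \log \frac{P((\mathbf{0}, \Delta) \mid R)\, P((\mathbf{0}, \Delta) \mid \bar{R})}
               {P((\mathbf{0}, \Delta) \mid \bar{R})\, P((\mathbf{0}, \Delta) \mid R)}
   = \log 1 = 0 .
\end{align*}
```

Under these assumptions, a document of average length is weighted by the first component alone, $w(\mathbf{x}, \Delta) = w(\mathbf{x}, \Delta)_1$. The first component still depends on $d$ in general, since the probabilities of $\mathbf{x}$ are taken jointly with document length.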