Second Component

The second component of equation 11 is

Under the Verbosity hypothesis, the second part of this formula is zero. Making the usual term-independence or linked-dependence assumptions, the first part may be decomposed into a sum of components for each query term, thus:

Note that because we are using the zero-vector 0, there is a component for each query term, whether or not the term is in the document.

For almost all normal query terms (i.e. for any terms that are not actually detrimental to the query), we can assume that $p' > q'$ and $\lambda > \mu$. In this case, formula 12 can be shown to be monotonically decreasing in $d$, from a maximum as $d \to 0$, through zero when $d = \Delta$ (the average document length), to a minimum as $d \to \infty$. As indicated, there is one such factor for each of the $nq$ query terms.

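Once again, we can devise a much simpler function that approximates this behaviour, as follows:

$$\text{correction} = k_2 \times nq \times \frac{\Delta - d}{\Delta + d}$$

where $k_2$ is another unknown constant, $nq$ is the number of query terms, and $\Delta$ is the average document length.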
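As a quick check that a function of this kind has the required shape, assume the correction takes the form $k_2 \, nq \, (\Delta - d)/(\Delta + d)$ with $k_2 > 0$, where $\Delta$ denotes the average document length and $nq$ the number of query terms (this form and the name $f$ are assumptions made for the check):

```latex
\begin{align*}
f(d) &= k_2 \, nq \, \frac{\Delta - d}{\Delta + d}, \\[4pt]
\frac{\mathrm{d}f}{\mathrm{d}d}
     &= k_2 \, nq \, \frac{-(\Delta + d) - (\Delta - d)}{(\Delta + d)^2}
      = -\,k_2 \, nq \, \frac{2\Delta}{(\Delta + d)^2} < 0, \\[4pt]
\lim_{d \to 0} f(d) &= k_2 \, nq, \qquad
f(\Delta) = 0, \qquad
\lim_{d \to \infty} f(d) = -\,k_2 \, nq .
\end{align*}
```

Under these assumptions the function decreases monotonically from $k_2 \, nq$ through zero at $d = \Delta$ to $-k_2 \, nq$, which is the qualitative behaviour described for formula 12.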

Again, $k_2$ is not specified by the model and must (at present, at least) be discovered by trial and error. Values in the range 0 to 2 appear about right for the TREC databases, although performance is not sensitive to this correction; see the results in section 7.
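To give a feel for the size of the correction over the quoted range of the constant, here is a minimal Python sketch. It assumes the correction has the form k2 * nq * (avg - d) / (avg + d). The function name, parameter names, average length of 300 and query size of 3 are all hypothetical illustrations, not values taken from the TREC experiments.

```python
def length_correction(doc_len, avg_doc_len, num_query_terms, k2):
    """Assumed document-length correction: k2 * nq * (avg - d) / (avg + d).

    doc_len          -- length d of the document (same units as avg_doc_len)
    avg_doc_len      -- average document length in the collection
    num_query_terms  -- number of query terms, nq
    k2               -- tuning constant, found by trial and error
    """
    if doc_len < 0 or avg_doc_len <= 0:
        raise ValueError("doc_len must be >= 0 and avg_doc_len must be > 0")
    return k2 * num_query_terms * (avg_doc_len - doc_len) / (avg_doc_len + doc_len)


if __name__ == "__main__":
    avg = 300.0   # hypothetical average document length
    nq = 3        # hypothetical number of query terms
    lengths = (0, 150, 300, 600, 3000)
    for k2 in (0.0, 1.0, 2.0):
        # One row per k2: the correction shrinks as documents get longer.
        row = [round(length_correction(d, avg, nq, k2), 3) for d in lengths]
        print(f"k2={k2}: {row}")
```

With k2 = 0 the correction vanishes for every document. For positive k2 it is bounded between -k2 * nq and +k2 * nq, so in this sketch its absolute size never exceeds twice the number of query terms when k2 stays in the range 0 to 2.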

Steve Robertson
Mon May 13 18:33:21 BST 1996