The second component of equation 11 is
Under the Verbosity hypothesis, the second part of this formula is zero. Making the usual term-independence or linked-dependence assumptions, the first part may be decomposed into a sum of components for each query term, thus:
Note that because we are using the zero-vector 0, there is a component for each query term, whether or not the term is in the document.
For almost all normal query terms (i.e. for any terms that are not actually detrimental to the query), we can assume that $p' > q'$ and $\lambda > \mu$.
In this case, formula 12 can be shown to be monotonically decreasing with $d$, from a maximum as $d \to 0$, through zero when $d$ equals the average document length, and to a minimum as $d \to \infty$. As indicated, there is one such factor for each of the query terms.
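Once again, we can devise a very much simpler function which approximates to this behaviour, as follows:
$$\mathrm{correction} = k_2 \times n_q \, \frac{\Delta - d}{\Delta + d}$$
where $d$ is the document length, $\Delta$ is the average document length, $n_q$ is the number of query terms, and $k_2$ is another unknown constant.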
Again, the constant $k_2$ is not specified by the model, and must (at present, at least) be discovered by trial and error. Values in the range 0 to 2 appear about right for the TREC databases, although performance is not sensitive to this correction; see the results in section 7.
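As an illustration, the simple document-length correction can be sketched in a few lines of Python. This is a minimal sketch and not a definitive implementation: it assumes the correction takes the form k2 * nq * (avdl - dl) / (avdl + dl), and the function name, parameter names, and the default value of k2 are hypothetical choices made for this example.

```python
def length_correction(doc_len, avg_doc_len, n_query_terms, k2=1.0):
    """Approximate document-length correction (assumed form).

    Returns k2 * n_query_terms * (avg_doc_len - doc_len) / (avg_doc_len + doc_len).
    The value is positive for documents shorter than average, zero at the
    average length, and negative for longer documents.
    """
    if doc_len < 0:
        raise ValueError("doc_len must be non-negative")
    if avg_doc_len <= 0:
        raise ValueError("avg_doc_len must be positive")
    return k2 * n_query_terms * (avg_doc_len - doc_len) / (avg_doc_len + doc_len)


def corrected_score(term_weight_sum, doc_len, avg_doc_len, n_query_terms, k2=1.0):
    """Add the length correction to an already-computed sum of term weights.

    term_weight_sum is whatever per-term weighting the system uses; it is
    treated here as an opaque number.
    """
    return term_weight_sum + length_correction(
        doc_len, avg_doc_len, n_query_terms, k2
    )
```

Under this assumed form the correction is bounded between -k2 * nq and +k2 * nq, so k2 directly controls how strongly document length can shift the ranking relative to the term weights. Setting k2 = 0 switches the correction off.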