Consequences of the Verbosity Hypothesis

We assume without loss of generality that the two Poisson parameters for a given term, $\lambda$ and $\mu$, are appropriate for documents of average length. The Verbosity hypothesis then implies that although a longer document, say, contains more words, each individual word has the same probability of being the term in question. Thus the distribution of term frequencies in documents of length $d$ will be 2-Poisson with means $\lambda d/\Delta$ and $\mu d/\Delta$, where $\Delta$ is the average document length.
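The length scaling described above can be illustrated with a minimal sketch. It assumes the usual two-component mixture form of the 2-Poisson model. The names `lam`, `mu`, `avg_len`, and `p_elite` are hypothetical, chosen for this example. In particular, the mixing proportion `p_elite` is not defined in this section, and treating it as independent of document length is an assumption of the sketch.

```python
import math


def poisson_pmf(tf, mean):
    """Poisson probability of observing term frequency tf, for mean > 0.

    Computed in log space so that large tf values do not overflow.
    """
    return math.exp(-mean + tf * math.log(mean) - math.lgamma(tf + 1))


def two_poisson_pmf(tf, doc_len, avg_len, lam, mu, p_elite):
    """2-Poisson probability of tf in a document of length doc_len.

    lam and mu are the Poisson means for a document of average length
    (avg_len). Under the Verbosity hypothesis both means are scaled by
    doc_len / avg_len. p_elite is a hypothetical mixing proportion,
    assumed here not to depend on document length.
    """
    scale = doc_len / avg_len
    return (p_elite * poisson_pmf(tf, lam * scale)
            + (1.0 - p_elite) * poisson_pmf(tf, mu * scale))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the source.
    for doc_len in (50, 100, 200):
        probs = [two_poisson_pmf(tf, doc_len, 100, 4.0, 0.5, 0.3)
                 for tf in range(4)]
        print(doc_len, [round(p, 4) for p in probs])
```

Under these assumptions the expected term frequency grows linearly with document length, which matches the idea that every word position is equally likely to be the term regardless of how long the document is.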

We may also make various independence assumptions, such as between document length and relevance.



Steve Robertson
Mon May 13 18:33:21 BST 1996