
First Component

The first component of equation 11 is:

Expanding this on the basis of term independence assumptions, and also assuming that eliteness is independent of document length (on the basis of the Verbosity hypothesis), we can obtain a formula for the weight of a term t which occurs tf times, as follows:

 

Analysis of the behaviour of this function with varying tf and d is a little complex. The simple function used for the experiments (formula 10) exhibits some of the correct properties, but not all. In particular, formula 14 shows that increasing d exaggerates the S-shape mentioned in section 4.2; formula 10 does not have this property. It seems that there may be further scope for development of a rough model based on the behaviour of formula 14.
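The effect of d on the S-shape can be explored numerically. The sketch below is illustrative only and is not formula 14 as printed: it assumes a two-Poisson log-odds weight in which the eliteness probabilities do not depend on d, while both Poisson means grow linearly with d (one reading of the Verbosity hypothesis). The function names, the parameter names (p_rel, p_non, lam, mu) and all default values are hypothetical choices made for the example.

```python
import math


def two_poisson_weight(tf, d, p_rel=0.6, p_non=0.1, lam=2.0, mu=0.2):
    """Log-odds weight of a term occurring tf times in a document of length d.

    Assumed form (not taken from the paper's printed formula):
    p_rel, p_non -- probability of eliteness given relevance / non-relevance,
                    held independent of d
    lam, mu      -- Poisson rates per unit length for elite / non-elite
                    documents, with lam > mu
    """
    # Assumption: the Poisson means scale linearly with document length.
    lam_d, mu_d = lam * d, mu * d

    def mixture(p_elite, x):
        # Two-Poisson likelihood of x occurrences. The 1/x! factor is
        # dropped because it cancels in the ratios taken below.
        return (p_elite * lam_d ** x * math.exp(-lam_d)
                + (1.0 - p_elite) * mu_d ** x * math.exp(-mu_d))

    present = mixture(p_rel, tf) / mixture(p_non, tf)
    absent = mixture(p_rel, 0) / mixture(p_non, 0)
    return math.log(present / absent)


def steepest_step(d, max_tf=12):
    """Return the tf at which the weight gains most from one more occurrence.

    A value of 0 means the curve is steepest at the origin (no S-shape);
    larger values mean a longer flat start before the steep rise.
    """
    weights = [two_poisson_weight(tf, d) for tf in range(max_tf + 1)]
    steps = [b - a for a, b in zip(weights, weights[1:])]
    return steps.index(max(steps))


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 4.0):
        row = " ".join(f"{two_poisson_weight(tf, d):6.3f}" for tf in range(8))
        print(f"d={d:<4} steepest step after tf={steepest_step(d)}: {row}")
```

With these illustrative defaults, the steepest rise falls between tf = 1 and tf = 2 at d = 1, and between tf = 3 and tf = 4 at d = 4: the curve stays flat for longer and then climbs more sharply as d grows. By contrast, a function that simply saturates in tf (increments shrinking from the first occurrence onwards) has no such initial flat region at any document length.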



Steve Robertson
Mon May 13 18:33:21 BST 1996