
Query Term Frequency

 

The natural symmetry of the retrieval situation as between documents and queries suggests that we could treat within-query term frequency ($\mathit{qtf}$) in a similar fashion to within-document term frequency. This would suggest, by analogy with equation 8, a weighting function thus:

\[ w = \frac{\mathit{qtf}}{k_3 + \mathit{qtf}} \, w^{(1)} \]

where $k_3$ is another unknown constant.

In this case, experiments (section 7) suggest a large value of $k_3$ to be effective; indeed the limiting case, which is equivalent to

\[ w = \mathit{qtf} \cdot w^{(1)} \]

appears to be the most effective. This may perhaps suggest that an S-shaped function like equation 9 could be better still, though again none has been tried in the present experiments.
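
It may help to see why a large constant leads to a weight simply proportional to query term frequency. The following is a minimal sketch, not the code used in the experiments. It assumes the saturating multiplier has the form qtf / (k3 + qtf), and that a document score is a plain sum of term weights, so that multiplying every weight by the same constant k3 leaves the ranking unchanged. The function names and the name k3 are hypothetical, chosen for illustration only.

```python
def qtf_multiplier(qtf: float, k3: float) -> float:
    """Saturating query-term-frequency multiplier qtf / (k3 + qtf).

    Assumed form, by analogy with the within-document case.
    """
    if qtf < 0 or k3 < 0:
        raise ValueError("qtf and k3 must be non-negative")
    if qtf == 0:
        return 0.0  # a term absent from the query contributes nothing
    return qtf / (k3 + qtf)


def scaled_multiplier(qtf: float, k3: float) -> float:
    """The same multiplier rescaled by the constant k3.

    Under the assumption that scores are plain sums of term weights,
    this rescaling does not alter the ranking, and the result tends
    to qtf as k3 grows.
    """
    return k3 * qtf_multiplier(qtf, k3)


if __name__ == "__main__":
    # As k3 increases, the rescaled multiplier approaches qtf itself.
    for k3 in (1.0, 10.0, 1000.0, 1e6):
        row = [round(scaled_multiplier(q, k3), 4) for q in (1, 2, 5)]
        print(f"k3={k3:g}: {row}")
```

With a small k3 the multiplier flattens quickly as qtf rises; with a very large k3 the rescaled values are close to 1, 2 and 5, which is the linear limiting case.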

The experiments are based on combining one of these multipliers with the within-document term frequency and document length functions defined above. However, it should be pointed out that (a) the "natural symmetry" as between documents and queries, to which we appealed above, is open to question, and (b) even if we accept each model separately, it is not at all obvious that they can be combined (a properly constructed combined model would have fairly complex relations between query and document terms, query and document eliteness, and relevance). Both these matters are discussed further by Robertson [12]. In the meantime, the combination of either multiplier with the earlier functions must be regarded as not having a strong theoretical motivation.



Steve Robertson
Mon May 13 18:33:21 BST 1996