
Evaluation

In all cases the 1000 top-ranking documents for each topic were scored against the supplied relevance assessments using a standard evaluation program from the SMART project at Cornell University. (This was the official evaluation method used in TREC--2.) The evaluation program outputs a number of standard performance measures for each query, followed by a set of measures averaged over all the queries in the run. The measures used in the tables below are average precision (AveP); precision at 5, 30 and 100 documents (P5 etc.); R-precision (RP), i.e. precision after as many documents have been retrieved as there are known relevant documents for the query; and recall (Rcl), i.e. final recall after 1000 documents have been retrieved.
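
As an informal illustration of how these measures are computed from a single ranked list, the following Python sketch may be helpful. It is not the SMART evaluation program itself; the function and variable names, and the handling of edge cases (e.g. a topic with no known relevant documents), are assumptions made for the example.

    def evaluate_topic(ranking, relevant):
        """Rank-based measures for a single topic.

        ranking  -- document IDs in retrieval order (best first, up to 1000)
        relevant -- set of document IDs judged relevant for the topic
        Illustrative only; not the SMART program's conventions.
        """
        r = len(relevant)                # number of known relevant documents
        hits = 0                         # relevant documents retrieved so far
        precision_sum = 0.0              # numerator of average precision
        r_precision = 0.0
        prec_at = {5: 0.0, 30: 0.0, 100: 0.0}

        for i, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                precision_sum += hits / i    # precision at each relevant document
            if i in prec_at:
                prec_at[i] = hits / i        # P5, P30, P100
            if i == r:
                r_precision = hits / r       # precision after r documents

        return {
            "AveP": precision_sum / r if r else 0.0,
            "P5": prec_at[5],
            "P30": prec_at[30],
            "P100": prec_at[100],
            "RP": r_precision,
            "Rcl": hits / r if r else 0.0,   # recall after the full ranking
        }

The per-topic results are then averaged over all the queries in a run to give figures such as those reported in the tables below.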

In TREC, a distinction is made between ad-hoc (retrospective) experiments and routing (SDI or filtering) experiments. All the results reported here were obtained using the topics, documents and methods used for the ad-hoc experiments in TREC--2.


