Database and Queries

The experiments reported here involved searches of one of the TREC collections, described as disks 1 & 2 (TREC raw data has been distributed on three CD-ROMs). The collection contains about 743,000 documents. It was indexed by keyword stems, using a modified Porter stemming procedure [13], spelling normalisation designed to conflate British and American spellings, a moderate stoplist of about 250 words, a small cross-reference table and a "go" list. Of the 150 TREC-1 and TREC-2 topic statements, topics 101-150 were used. The mean length (number of unstopped tokens) of the queries derived from the title and concepts fields only was 30.3; for queries that additionally used the narrative and description fields, the mean length was 81.
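To make the query length measure concrete, the following is a minimal sketch of counting unstopped tokens and averaging over a set of queries. It is not the indexing code used in the experiments: the stoplist and spelling table are tiny hypothetical stand-ins for the real ones, the order of the steps is an assumption, and stemming, the cross-reference table and the "go" list are left out.

```python
import re

# Hypothetical miniature stoplist; the one described above had about 250 words.
STOPLIST = {"the", "of", "and", "a", "in", "to", "for", "on"}

# Hypothetical spelling table that conflates British and American forms.
SPELLING = {"colour": "color", "normalisation": "normalization"}


def unstopped_tokens(text):
    """Return the tokens of text that survive the stoplist."""
    # Lowercase and split on anything that is not a letter or digit.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Map variant spellings to a single form (assumed to precede stopping).
    tokens = [SPELLING.get(token, token) for token in tokens]
    # Drop stopwords; what remains are the "unstopped" tokens.
    return [token for token in tokens if token not in STOPLIST]


def mean_query_length(queries):
    """Mean number of unstopped tokens per query."""
    lengths = [len(unstopped_tokens(query)) for query in queries]
    return sum(lengths) / len(lengths)
```

For example, `unstopped_tokens("The colour of the sea")` keeps only `color` and `sea`, so that query has length 2 under this sketch.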

Steve Robertson
Mon May 13 18:33:21 BST 1996