

ON RETRIEVAL SYSTEM THEORY

People have been theorizing about information retrieval and retrieval systems since well before the period covered by this review. However, the ideas, theories and models developed during this period, whether or not they make explicit reference to the fact, have been very strongly influenced by both the practice of computer-based retrieval and the understanding and empirical knowledge deriving from evaluation experiments. Two substantial examples of this influence follow.


The whole system, not the parts

I referred above to the problems of changing and confused terminology in information retrieval. In part this arises because the boundaries between different parts of the process become less clear as we realise the possibilities offered to us by computers. Thus for example one reason for difficulty with the term `indexing', reinforced by the discussion on free text above, is that some operations can be carried out either at the indexing or at the search stage. Given this freedom, it is no longer clear what we might theoretically call `indexing', so the terminological confusion is not surprising. More importantly, in order to understand how a system might behave or perform, we need to consider the whole system; it would not make sense, in these circumstances, even to consider trying to construct or to evaluate the indexing stage alone.

This situation strongly suggests a holistic approach to modelling or theorizing about information retrieval. Although not all theoretical papers do so, there is certainly greater awareness of the role that parts play in the whole. To reinforce the terminological point, when Salton [63] or Sparck Jones [83] refer to `automatic indexing', they both in fact treat the indexing stage as part of the whole retrieval process, and indeed do not strongly distinguish the different parts.

However, holism has severe disadvantages. The reason one would like to construct and evaluate (say) the indexing stage alone is that it would simplify matters greatly. If it were possible to define clearly what the indexing stage is, and what its function is (in relation to the whole), and to measure how well it performs its function, then it would clearly be better to do that (and to build models and theories for that purpose), without being concerned with the other parts of the system. Unfortunately, we seem to be unable to make that separation.

A recent example of this argument that I have come across concerns the evaluation of stemming algorithms. We have some idea what stemming means, and that it contributes a little (not much) to system performance. The problem is, do we evaluate a stemmer by embedding it inside an entire retrieval system, and doing a conventional retrieval test, or do we try to assess it directly? The latter would be much simpler (and potentially much more powerful in a diagnostic sense), but it depends on devising criteria for stemming that we can relate to IR system performance without actually doing the experiment.
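
To make the alternative concrete, here is a minimal sketch of what a direct assessment might look like, with everything in it -- the crude suffix-stripping stemmer, the hand-labelled word groups, the error counts -- invented for illustration. One counts understemming errors (words in the same group given different stems) and overstemming errors (words in different groups given the same stem); relating such counts to IR system performance is precisely the unsolved part of the problem.

    from itertools import combinations

    def toy_stemmer(word):
        # Crude suffix-stripping stemmer, invented for illustration only.
        for suffix in ("ational", "ation", "ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    # Hand-labelled groups of words that *should* conflate to one stem.
    groups = [
        {"connect", "connected", "connecting", "connection"},
        {"relate", "related", "relating", "relation"},
    ]

    def conflation_errors(stem, groups):
        under = over = 0
        # Understemming: two words in the same group get different stems.
        for g in groups:
            for a, b in combinations(sorted(g), 2):
                if stem(a) != stem(b):
                    under += 1
        # Overstemming: words from different groups get the same stem.
        for g1, g2 in combinations(groups, 2):
            for a in g1:
                for b in g2:
                    if stem(a) == stem(b):
                        over += 1
        return under, over

    print(conflation_errors(toy_stemmer, groups))   # (8, 0) for this toy data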


The function of the system

It is possible to argue (indeed, I have done so on many occasions) that information retrieval systems have been around for at least two-and-a-half millennia. The justification for this argument is that all library classification schemes (as well as various more recent inventions such as card catalogues and printed indexes) are in fact information retrieval systems. I have no difficulty with this statement, but the designers of those systems might not see it that way.

In particular, the purpose or function of a classification system might perhaps have been expressed in terms of the proverb, `a place for everything and everything in its place'. The idea of `putting a query to' a classification scheme would seem, on the face of it, absurd.

However, as soon as the concept of an information retrieval system exists, and we begin to try to define what it is for, then it becomes clear that dealing with queries (requests for information), pointing them in the direction of appropriate documents, is precisely what a classification scheme is for. Indeed, a traditional library classification scheme (UDC) was among the four systems tested in Cranfield 1.

This idea has far-reaching implications for theorists of information retrieval, whether they see themselves as addressing library classification or any other possible component of a system. The evaluation experiments actually take a rather narrow and restricted view of the function of the system, which might be expressed in the following way:

to retrieve in response to a request documents (items) that will be judged by the requester (or end-user) to be relevant to the request (or underlying need, or anomalous state of knowledge).

In effect, the theorist now has the choice of accepting such a definition of function, or of conceptualising the function of the system in a different way. What he or she can no longer do is to ignore the question of function.

An example of the kind of discussion that follows from this observation is given by Robertson and Belkin [97], who address the relation between the question of whether relevance is binary or multi-valued and the design of ranking systems.


Probabilistic models

The prime example of the influence of the idea of evaluation on theory lies in probabilistic models, which were well represented in the Journal. Essentially a probabilistic model involves a proof that, given certain modelling assumptions, a particular procedure will give optimum performance (Robertson [92]). One of the major links in this proof is the Probability Ranking Principle (Robertson [95]). Miller's paper [68] stimulated some of the work on probabilistic models, and developments were reported in many papers (van Rijsbergen [91], Harper and van Rijsbergen [99], Croft and Harper [102], Radecki [105], Bookstein [110], Thompson [127] etc.).
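
To sketch the form such an argument takes (my notation; the standard binary-independence rendering rather than a formula quoted from any one of the papers above): the Probability Ranking Principle asserts that ranking documents d in decreasing order of the probability of relevance P(rel | d) gives optimal performance. If query terms are assumed to occur independently in the relevant and in the non-relevant documents, with probabilities p_t and q_t respectively, this ordering reduces to ranking by a sum of term weights over the terms shared by query and document:

    \mathrm{score}(d) = \sum_{t \in q \cap d} w_t , \qquad
    w_t = \log \frac{p_t (1 - q_t)}{q_t (1 - p_t)} .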

Relevance feedback, discussed at length above, fits very naturally in the probabilistic framework: documents judged relevant by the user can be taken as providing direct sample evidence concerning the various probabilities of interest in the models. Indeed, in this framework one could see user-provided examples of relevant documents as a more natural way to express a query than a verbal description.
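
To continue the sketch: if the user has identified R relevant documents, r of them containing term t, in a collection of N documents of which n contain t, the probabilities can be estimated directly from the feedback sample. The usual point estimates (with 0.5 added throughout to avoid zeros) give the weight

    w_t = \log \frac{(r + 0.5)(N - R - n + r + 0.5)}{(R - r + 0.5)(n - r + 0.5)} ,

which is the standard form in this line of work rather than a formula taken from a specific paper cited here; it shows how relevance judgements feed straight into the term weights.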

Probabilistic models do not necessarily take a holistic view of retrieval -- indeed, one of the problems of probabilistic models of searching is that they take the indexing as given -- but they nevertheless force the integration of some elements that were previously regarded as separate. For example, an associative retrieval technique might involve the assignment of weights to search terms and a match function which measures how similar any particular document is to the query. Some authors treat the two components, the weighting function and the matching function, as separate -- that is, they assume that a decision on a good weighting function is required, and also a decision on a good match function, but do not see any strong connection between the two. Probabilistic models of searching, however, require that the weighting function be regarded as a component of the match function.
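
A minimal sketch of this last point (terms, weights and documents all invented): the match score is simply the sum, over the terms shared by query and document, of the query-term weights, so the weighting function is a component of the match function rather than a separate design decision.

    def match(query_weights, doc_terms):
        # Probabilistic-style match: sum the query-term weights of the
        # terms the document actually contains. The weighting function
        # is a component of the match function, not a separate stage.
        return sum(w for t, w in query_weights.items() if t in doc_terms)

    # Hypothetical weights, e.g. from the estimates sketched above.
    query_weights = {"retrieval": 2.1, "probabilistic": 3.4, "model": 0.9}
    docs = {
        "d1": {"probabilistic", "model", "ranking"},
        "d2": {"retrieval", "evaluation"},
    }
    ranking = sorted(docs, key=lambda d: match(query_weights, docs[d]), reverse=True)
    print(ranking)   # ['d1', 'd2']: d1 scores 4.3, d2 scores 2.1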


Cognitive models

Although cognitive models are not well represented in the Journal, one of the early papers on the ASK model of Belkin and others appears here [107], linked to the seminal paper by Oddy [90]. Ingwersen considers manual searching from a cognitive point of view [108]. Daniels reviews cognitive models in IR [121].

Cognitive approaches to IR usually start from the user end. It is, of course, possible to consider authorship as a cognitive activity; however, the considerations above about the function of the system effectively dictate that the user should be central to the cognitive view (criteria of retrieval success, such as relevance, are assessed by the user, not by the author).

Although one can argue that the incorporation of a cognitive model of the user could potentially be of great benefit, the actual use of such models is fraught with difficulties. However, there have been some spinoffs from these concerns: in particular, the idea of using expert-system techniques and/or knowledge bases in the user interface, and concern with user information-seeking behaviour (whether or not s/he actually uses a formal IR system).

Given that much searching was and is undertaken by intermediaries on behalf of end-users, there is an obvious argument for trying to design an interface that has the expertise of the intermediary, and thus makes end-user searching easier. A number of attempts have been made in this direction; the two which are best represented in the Journal are Plexus (Vickery, Brooks, Robinson and Vickery [122]) and its successor TOME Searcher (Vickery and Vickery [145]). Both these systems might be described as knowledge-based: apart from the knowledge and skill of the intermediary, both try to include a knowledge base of a kind with which we have been familiar for over forty years, namely a thesaurus. (The re-emergence of thesauri as knowledge bases reminds one of the remark by M Jourdain in Le Bourgeois Gentilhomme, about having spoken prose for over forty years without realising it -- even the time period is right!) Although knowledge engineers from other fields might have difficulty in recognising a thesaurus as a knowledge base, nevertheless it is clear that the description is correct, and indeed would also apply to a more traditional library classification scheme. The big question, though, which is still unresolved, is how to design a computer system to make best use of such knowledge. It is not obvious that it must use traditional expert system techniques such as production rules (like Plexus or TOME); for example, Kim and Kim [131], Rada et al. [139], and Lee, Kim and Lee [146] all use thesaurus information in weighted search systems.
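
As a minimal sketch of what the weighted use of thesaurus information might look like (the toy thesaurus, the weights and the discount factor are all invented, and nothing here is taken from the systems cited): related terms are added to the query at a reduced weight, and the search then proceeds with the expanded weighted query.

    # Invented toy thesaurus: term -> related terms.
    thesaurus = {
        "cancer": ["neoplasm", "tumour"],
        "therapy": ["treatment"],
    }

    def expand(query_weights, thesaurus, discount=0.5):
        # Add thesaurus-related terms at a discounted weight,
        # keeping the larger weight if a term is already present.
        expanded = dict(query_weights)
        for term, w in query_weights.items():
            for rel in thesaurus.get(term, []):
                expanded[rel] = max(expanded.get(rel, 0.0), discount * w)
        return expanded

    print(expand({"cancer": 2.0, "therapy": 1.0}, thesaurus))
    # {'cancer': 2.0, 'therapy': 1.0, 'neoplasm': 1.0, 'tumour': 1.0, 'treatment': 0.5}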

Vickery reviews knowledge representation techniques in different areas [118].

The understanding of user information-seeking behaviour has become more and more central to IR system theory and design. A particularly good environment for such study is the library OPAC (online public access catalogue), since huge numbers of searches on OPACs take place daily, without the benefit (or interference) of intermediaries (Hancock-Beaulieu [125] [134] [144], Akeroyd [130]). User behaviour may be understood in a cognitive fashion (Ingwersen [108]), but Ellis [129] argues for a behavioural view which does not require cognitive interpretation. He also [141] distinguishes two paradigms operating in IR research: the physical and the cognitive.


Mathematical models

Apart from the probabilistic models discussed above, a number of mathematical models have been enlisted in the service of IR theory. No other class of models, however, seems able to make a direct connection between performance and design, in the way that the probabilistic approach does.

The Swets model, applying signal detection theory to IR systems, does indeed address the question of performance, though less obviously that of design. This model is well represented in the Journal (Brookes [52]; Heine [77] [88]; Bookstein [82]; Hutchinson [96]). Another related model is the Shannon model (Brookes [73]). More generally, however, mathematical models deal with the internals of systems (at least, these seem to be the aspects which are most amenable to mathematical representation).
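
In outline (my notation; a textbook rendering rather than a summary of any one of the papers above): the matching scores S of relevant and of non-relevant documents are treated as two overlapping distributions, commonly assumed normal with means \mu_1 > \mu_0 and a common standard deviation \sigma. A cut-off score t then fixes recall and fallout as the two tail areas

    \mathrm{recall}(t) = P(S > t \mid \mathrm{rel}) , \qquad
    \mathrm{fallout}(t) = P(S > t \mid \mathrm{nonrel}) ,

and a threshold-free measure of performance is the separability of the two distributions, e.g. E = (\mu_1 - \mu_0)/\sigma; sweeping t traces out the system's whole recall-fallout operating characteristic.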

Perhaps the best-known mathematical model is the vector-space model, which is the basis for Salton's SMART system. In fact the first mention of a vector-space model in the Journal is in a review by Vaswani of a report on `self-organising' files [28]. SMART is represented by a few papers (Salton [63]; Salton and Yang [80]; Wu and Salton [104]); Salton [100] reviews various mathematical models. The vector-space model essentially regards the operation of indexing as locating each document as a point in a multi-dimensional space (the axes of the space correspond to the indexing terms available -- thus the space may have thousands of dimensions). Queries are similarly associated with points in the same space. This model does not directly address the question of performance; it does, however, suggest various kinds of mechanisms: for example

(a) associative matching methods generally; more specifically, a match function based on a measure of distance in the space (see the sketch after this list);

(b) relevance feedback: moving the query nearer to the documents judged relevant;

(c) document clustering: bringing together documents which seem to be similar (close);

(d) document space modification: using relevance feedback to adjust the indexing of documents already in the system, by moving them closer to queries to which they have been judged relevant.
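
Here is the promised minimal sketch of mechanisms (a) and (b), in a toy three-term space; the vectors and the feedback constants are invented and are not taken from SMART. The match function is the cosine of the angle between query and document vectors, and feedback moves the query towards the centroid of the documents judged relevant (a Rocchio-style step).

    import math

    def cosine(u, v):
        # Mechanism (a): similarity as the cosine of the angle
        # between query and document vectors.
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def feedback(query, relevant_docs, alpha=1.0, beta=0.5):
        # Mechanism (b): move the query towards the centroid of the
        # documents judged relevant (constants invented for illustration).
        n = len(relevant_docs)
        centroid = [sum(d[i] for d in relevant_docs) / n for i in range(len(query))]
        return [alpha * q + beta * c for q, c in zip(query, centroid)]

    # Toy three-term space.
    docs = {"d1": [1, 0, 1], "d2": [0, 1, 1], "d3": [1, 1, 0]}
    query = [1, 0, 0]
    print(sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True))
    query = feedback(query, [docs["d1"]])
    print(query)   # [1.5, 0.0, 0.5]: the query has moved towards d1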

Thus the spatial view might be said to encourage, without actually providing any strong justification for, certain kinds of internal mechanism or operation. It is no accident that much work in the vector-space area is strongly empirically based: having suggested a mechanism, the model has nothing to say about whether it might be a good one or not, and leaves this question to experimental resolution.

Clustering in general, whether inspired by the vector space model or not, might be seen in the same light. There are in fact two kinds of clustering in IR: clustering of documents (based on the terms occurring in them), and clustering of terms (based on the documents they occur in). Both methods are represented in the Journal (see above). Similarly, one may use relevance information for the benefit of future searches, either by document space modification as suggested by the vector space model, or by assessing the values of different terms (Biru et al. [128]).
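
Both kinds can be sketched as operations on the same term-document matrix (the toy matrix below is invented for illustration): similarities between rows cluster terms by the documents they co-occur in, while similarities between columns cluster documents by the terms they share.

    # Toy term-document incidence matrix: rows = terms, columns = documents.
    terms = ["retrieval", "indexing", "cognition"]
    A = [
        [1, 1, 0],   # retrieval occurs in documents 0 and 1
        [1, 1, 0],   # indexing occurs in documents 0 and 1
        [0, 0, 1],   # cognition occurs in document 2 only
    ]

    def row_similarity(A, i, j):
        # Term-term similarity: number of documents the two terms share.
        return sum(a * b for a, b in zip(A[i], A[j]))

    def col_similarity(A, i, j):
        # Document-document similarity: number of terms the two documents share.
        return sum(row[i] * row[j] for row in A)

    print(terms[0], terms[1], row_similarity(A, 0, 1))   # share 2 documents
    print("docs 0 and 2:", col_similarity(A, 0, 2))      # share no terms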


