Last updated: December 1998
My main current interest is the definition of practical, rigorous methods for assessing the reliability and safety of software and similarly complex systems subject to design faults. The obstacles lie in system complexity, in the difficulty of reasoning about the effects of design faults, and in the complexity of the reasoning itself. My recent work addresses the current limits to the assessment of software dependability; the problems with intuitive judgement in this area; and the use of Bayesian methods for inference from test results, and of Bayesian belief networks in a more general context, as aids to more rigorous assessment.
Much of my work addresses software fault tolerance and design diversity as means of improving reliability and safety, and the problems of assessing the efficacy of diversity and the dependability of diverse systems.
I am also interested in software testing, both as a way of evaluating reliability and as a way of finding faults so as to improve it. In particular, I have published on the issues of inference from testing results, including the use of testability notions, and on the modelling of the software failure process, especially for the evaluation of iterative-execution software. Other papers are listed in my complete bibliography.
Decision makers often need to evaluate how reliable or how safe a new software product will be in operation. The way most of these evaluations are performed nowadays is unsatisfactory: there is no way to show that the evidence taken into account by the evaluator actually demonstrates that the conclusions are right. Quantitative measures, like probabilities of failure, which would be extremely useful if trustworthy, are often assigned by arbitrary and unscientific procedures. Progress can be sought by first studying the defects of the current methods, and then offering improvements where possible. The paper:
presented to a workshop convened by the European Space Agency, contains a terse summary of my position.
Extremely high reliability is required from software in many current applications, especially safety-critical ones. Even if we trust the developers to achieve these levels, it is often impossible to demonstrate that the requirement is satisfied. This is discussed in the paper:
which gives a general survey of the means available to an evaluator. The essential limit is that evaluators seek to make very strong predictions on the basis of scant evidence. A more general introduction to the problem is in:
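As a rough quantitative illustration of this limit, one can ask how much failure-free statistical testing would be needed to support a claim of ultra-high reliability. The sketch below uses the standard "classical" bound, and the numbers are illustrative, not taken from the papers:

```python
import math

def demands_needed(p_target: float, confidence: float) -> float:
    """Failure-free, independent test demands needed to claim, at the given
    confidence, that the probability of failure per demand is below p_target
    (classical bound: require (1 - p_target)**n <= 1 - confidence)."""
    return math.log(1.0 - confidence) / math.log(1.0 - p_target)

# A safety-critical target of 1e-9 failures per demand, at 99% confidence:
print(f"{demands_needed(1e-9, 0.99):.3g} failure-free demands")  # ~4.61e9
```

Several billion failure-free demands are needed for such a target, far beyond any realistic testing campaign; this is the sense in which the evidence available is "scant" relative to the claims sought.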
To judge whether a system is dependable enough for its intended use, assessors need to consider complex, diverse evidence: development process, test results, track record of the developers, etc. Many prescriptions in quality or safety standards are there to guarantee that this evidence is available and reliable. However, the assessors then have to integrate all this evidence into a final judgement of acceptability by relying essentially on their experience and expert judgement, usually without the aid of any defined, sound method. Research in psychology has shown that such trust in unaided intuitive reasoning is dangerous. I have surveyed this research, with examples of its relevance in the software assessment domain, and outlined the precautions that decision makers should take, in the technical report:
A short summary of the argument is in:
Help with these problems can be sought from formalisms by which the assessors can represent complex reasoning in an explicit, logically sound way, thus allowing better communication with other experts, auditing of the arguments, and precise calculation by software tools. One such formalism is that of Bayesian belief networks. The report:
contains a brief introduction to this formalism, arguments for its use and a simple, software-related example. A shorter article on this topic is:
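To give a flavour of the formalism, the sketch below evaluates a three-node belief network (process quality → product faulty → tests pass) by brute-force enumeration; the structure and all the probabilities are illustrative assumptions of mine, not taken from the report:

```python
from itertools import product

# Conditional probability tables for the three nodes:
P_quality = {"good": 0.7, "poor": 0.3}       # prior on process quality
P_faulty = {"good": 0.1, "poor": 0.6}        # P(product faulty | quality)
P_pass = {True: 0.5, False: 1.0}             # P(all tests pass | faulty?)

def joint(q: str, f: bool, t: bool) -> float:
    """Joint probability of one complete assignment to the three nodes."""
    pf = P_faulty[q] if f else 1.0 - P_faulty[q]
    pt = P_pass[f] if t else 1.0 - P_pass[f]
    return P_quality[q] * pf * pt

# Inference by enumeration: P(faulty | tests passed), summing out quality.
num = sum(joint(q, True, True) for q in P_quality)
den = sum(joint(q, f, True) for q, f in product(P_quality, [True, False]))
print(f"P(faulty | tests passed) = {num / den:.3f}")   # ~0.143
```

Real networks are larger and use smarter propagation algorithms, but the principle is the same: the assessor's assumptions are stated as explicit probability tables that can be audited and recomputed.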
Direct evidence of the reliability of a product can only be gained from operational testing. However, especially for highly reliable software, a classical interpretation of test results in terms of confidence levels is insufficient if one does not also consider the other evidence available. This process is best modelled in terms of Bayesian inference. Antonia Bertolino (at IEI-CNR in Pisa) and I have studied elementary applications of Bayesian inference to software reliability assessment, modelling various assumptions that an assessor may be able to make and showing their implications for the assessor's reliability predictions. The paper:
explains the Bayesian model for inference from software testing and shows various applications (available on-line).
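A minimal sketch of the kind of calculation involved (the conjugate Beta-Binomial model is standard; the prior and the numbers below are my own assumptions, not the paper's):

```python
from scipy.stats import beta

# Prior on the probability of failure per demand: Beta(a, b). Observing n
# independent failure-free demands gives the posterior Beta(a, b + n).
a, b = 1.0, 1.0          # assumed prior: uniform over [0, 1]
n = 10_000               # failure-free demands observed in operational testing
target = 1e-4            # required probability of failure per demand

posterior = beta(a, b + n)
print(f"P(p < {target:g} | evidence) = {posterior.cdf(target):.3f}")  # ~0.632
print(f"posterior mean of p: {posterior.mean():.2e}")                 # ~1e-4
```

The point the Bayesian treatment makes explicit is that the conclusion depends on the prior as well as on the test results: a different prior, representing different evidence about the development process, yields a different prediction from the same tests.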
To improve on the confidence that can be derived from testing, J. Voas and co-authors have proposed a form of reasoning using testability measures. We re-cast their reasoning in terms of Bayesian inference, and clarify its implications for decisions during software development, in:
Another paper,
shows that testability is more naturally modelled as just one characteristic of the prior probability distribution for the reliability of the software, and that considering just a lower bound on testability, as proposed by previous authors, rather than the whole distribution, gives insufficient, often misleading information for decisions about software acceptance.
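The core of the Bayesian re-casting can be sketched in a few lines; the prior and the testability bound below are illustrative assumptions, not the papers' figures:

```python
def p_correct(p0: float, h: float, n: int) -> float:
    """Posterior P(program correct | n failure-free tests), given a prior
    P(correct) = p0 and the testability assumption that a faulty program
    fails each test with probability at least h."""
    p_survive_if_faulty = (1.0 - h) ** n     # max P(no failures | faulty)
    return p0 / (p0 + (1.0 - p0) * p_survive_if_faulty)

# With 50/50 prior odds and testability bound h = 0.001, 10,000 clean tests
# leave little probability that a fault is hiding:
print(f"{p_correct(0.5, 0.001, 10_000):.6f}")   # ~0.999955
```

The weakness the second paper points out is visible here: the answer hinges on treating h as a hard lower bound, whereas what an assessor actually has is a whole distribution over possible failure rates.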
Assessment on the basis of testing depends on a simple model of the software as a black box subjected to a series of independent demands. The paper:
explains the different situation with iterative-execution software, and rigorously argues, on grounds of economy and trustworthiness of predictions, why testing for reliability evaluation should be based on long, independent series of executions. A series of more detailed models, representing different assumptions (both on the correlation among successive failures in control software and on the robustness of the controlled system against short failure bursts), is shown in:
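To see why the correlation among successive failures matters for the predictions, consider this toy comparison; the rates and the burst model are illustrative assumptions of mine, not the models in the papers:

```python
# Same long-run failure rate, two failure processes: independent failures at
# 1e-4 per iteration, versus failure bursts starting at 1e-5 per iteration
# with mean length 10 (hence the same overall rate of failed iterations).

def p_no_failure(rate_per_iteration: float, iterations: int) -> float:
    """P(no failure event in a mission of the given length)."""
    return (1.0 - rate_per_iteration) ** iterations

t = 100_000                                   # iterations in one mission
print(p_no_failure(1e-4, t))                  # independent model: ~4.5e-5
print(p_no_failure(1e-5, t))                  # burst-start model: ~0.37
# If the controlled system can also ride through short bursts, the gap
# between the two predictions grows even wider.
```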
One of the few other cases in which "clear-box" reliability modelling has been applied to software is that of fault-tolerant software configurations.
A short summary of these last two papers is in:
We are now engaged in two projects on software fault tolerance and design diversity, which are producing interesting new results, many of which are available here.
This survey discusses the relationship between the evaluation of reliability and safety and the design practices used to achieve them.
The choice of testing methods, and in particular the choice between testing with an operational input profile and testing with some other, systematic method, must be based on the ability of testing methods to improve the reliability of the software, not just to detect faults: if a program contains many faults with negligible impact on its failure probability, detecting them may be a waste of effort that yields no real reliability improvement. Some initial work along these lines, with R. Hamlet, P. Frankl and B. Littlewood, has produced interesting analyses of how the (probabilistic) characteristics of programs and of testing methods should affect decisions. A first report was presented at ICSE'97:
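A small numerical sketch of the underlying point (the fault-rate distribution is an illustrative assumption of mine, not taken from the paper):

```python
# Ten faults: one with a large failure rate, nine with tiny ones, so the
# fault count says little about the failure probability.
fault_rates = [1e-2] + [1e-6] * 9

def expected_failure_prob(n_operational_tests: int) -> float:
    """Expected failure probability per execution after removing every fault
    that revealed itself at least once in n operational test executions."""
    return sum(r * (1.0 - r) ** n_operational_tests for r in fault_rates)

print(f"before testing:    {expected_failure_prob(0):.2e}")     # ~1.00e-2
print(f"after 1,000 tests: {expected_failure_prob(1000):.2e}")  # ~9.4e-6
# Nine of the ten faults are probably still present, yet almost all of the
# unreliability is gone: detecting low-rate faults buys little reliability.
```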
While these papers study the effects of testing on a program with given defects, a further study takes the viewpoint of a decision maker for whom the defects are unknown, and the main concern may be to reduce the risk of releasing highly unreliable programs, rather than improving the average reliability of the programs released. Preliminary results seem to indicate greater advantages from operational testing, as a means of reducing risks, than are generally claimed, and will be presented at ISSRE'98:
Software fault tolerance, including design diversity, is a mature design practice in a few organisations. Elsewhere, it is seldom employed, and generally applicable schemes and toolsets to support its use are lacking. I have worked on software fault tolerance for about 16 years. Interestingly, while many papers on designing fault-tolerant software (with increasingly complex design schemes) have been published, their influence on industry has been minimal, and its simpler forms are often rediscovered by practitioners who are unaware of (and thus cannot take advantage of) previous work in the area.
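As an example of one of these simpler forms, here is a minimal sketch of a recovery block, the classic scheme introduced by Randell and colleagues; the example routines and acceptance test are hypothetical:

```python
from typing import Callable, Sequence

def recovery_block(x: float,
                   alternates: Sequence[Callable[[float], float]],
                   acceptable: Callable[[float, float], bool]) -> float:
    """Run each alternate in turn on the (unmodified) input; return the first
    result that passes the acceptance test. Backward recovery is trivial here
    because the alternates share no mutable state."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue                     # treat an exception as a failure
        if acceptable(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")

# Hypothetical use: two independently written square-root routines.
primary = lambda x: x ** 0.5
def alternate(x: float) -> float:        # Newton's method as the backup
    y = max(x, 1.0)
    for _ in range(60):
        y = 0.5 * (y + x / y)
    return y

accept = lambda x, y: abs(y * y - x) <= 1e-6 * max(1.0, x)
print(recovery_block(2.0, [primary, alternate], accept))   # 1.41421356...
```

The scheme's value, and its limits, both lie in the acceptance test: it must be simple enough to be trustworthy, yet strong enough to catch the failures that matter.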
Previously, my colleagues and I studied different structuring schemes for software-implemented, application-level fault tolerance, meant to contain the complexity that would arise from the ad-hoc addition of redundancy without a general structuring scheme. One such scheme is intended for large systems including both permanent data objects and message-passing process sets. A technical report and further references are available.
An older project in which I was involved concerned the structuring of adaptive fault tolerance in real-time systems. A conceptual description is in:
Another paper is a survey and discussion of the practical use of conversations (proposed by Randell and co-authors in 1975 for coordinating backward recovery of concurrent processes), a general structuring scheme of which many variations have been proposed in academic research, but with few actual applications. A study of supporting a limited form of conversations using standard Ada together with a pre-processor for special constructs inserted in the source code is in:
I am also including pointers to an old but innovative paper on the design and evaluation of voting algorithms for redundant and diverse systems.
The literature on fault-tolerant design contains many different designs for voting (or "decision", "adjudication") on the results of redundant and possibly diverse replicas of a functional module, but very few attempts to analyse which designs would be best under which circumstances. The paper
introduced a simple probabilistic statement of the problem, the criteria for comparing different algorithms, and the definition of an "optimal" adjudicator against which any non-optimal designs can be compared.
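A toy instance of this style of comparison (the model below, with its independence and distinct-wrong-answer assumptions, is mine, not the paper's):

```python
from math import comb

def majority_failure(q: float) -> float:
    """P(a 2-out-of-3 majority voter delivers no correct result): at least
    two of the three versions fail on the same demand. Failed versions are
    assumed to give distinct wrong answers, and failures to be independent."""
    return sum(comb(3, k) * q**k * (1.0 - q) ** (3 - k) for k in (2, 3))

q = 1e-3                                     # failure probability per version
print(f"single version: {q:.1e}")
print(f"voted triple:   {majority_failure(q):.1e}")   # ~3.0e-6
# The apparent gain rests entirely on the independence assumption; with
# positively correlated failures the true figure can be much worse.
```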
A second paper extends the discussion to the problem of deciding, in a redundant, reconfigurable system, when a processing element that has exhibited transient failures should be considered permanently faulty and excluded from the system.
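One simple heuristic of this kind can be sketched as a count that grows on failures and decays on successes; the scheme and its parameters below are illustrative assumptions rather than the paper's mechanism:

```python
def should_exclude(outcomes, decay: float = 0.9, threshold: float = 3.0) -> bool:
    """outcomes: per-execution booleans, True = failure. A score grows by 1 on
    each failure and decays geometrically on success; crossing the threshold
    suggests the failures are too frequent to keep treating as transient."""
    score = 0.0
    for failed in outcomes:
        score = score + 1.0 if failed else score * decay
        if score >= threshold:
            return True
    return False

print(should_exclude([True] + [False] * 50 + [True]))       # False: isolated
print(should_exclude([True, False] * 3 + [True, True]))     # True: clustered
```

Tuning the decay and threshold trades the risk of discarding a healthy unit against the risk of keeping a failing one, which is exactly the decision problem the paper analyses.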