
Fault-Tolerant Design of Computer Systems
An Introductory Course

A Five Day Short Course at City University, London

Companies rely increasingly on computer systems for the very survival of their business. Computer applications are becoming ever more complex, yet they are often built from unreliable hardware or software components.

Fault tolerance - design for surviving component failures - is becoming a necessity for a growing number of companies, far beyond its traditional application areas such as aerospace and telecommunications.

This course - organised as five one-day lectures that can be taken individually - addresses the needs of:

This course has been developed by the Centre for Software Reliability with funding from the Engineering and Physical Sciences Research Council (Grant Number 00711ENG95) as part of their individual MSc Modules Programme.

The Centre for Software Reliability is a Registered Provider for the IEE Continuing Professional Development (CPD) Scheme.


Introduction and Background

Computer failures can have crippling effects on an organisation's ability to function. Any company, not just software-related businesses, can go bankrupt as a result of computer failure. In the next two or three years, the "Millennium bug" alone will generate many critical errors. And yet, increasingly, business-critical computing systems are being assembled from off-the-shelf components never designed for high reliability, availability or safety.

This course offers a unique opportunity for engineering managers and software designers to learn about fault tolerance: how systems survive failure. It is about keeping systems in service despite the failure of some of their parts, that is, without uncontrolled disruption of service. This is not rocket science; if you know the basic principles, you can apply them to everyday design and purchasing decisions.

Participants will learn the basic concepts needed to decide on the form and extent of redundancy to employ when designing or procuring computer systems. These concepts have been developed by researchers throughout the history of computing, but their application has been mostly limited to safety-critical and other high-risk, high-budget applications. By contrast, this course will consider the range of techniques available to organisations with different dependability requirements and budgets for fault tolerance. We will cover the integration of automatic and manual procedures, and will specifically address software-caused and operator-caused failures. The course will thus satisfy the needs of companies that have to choose among market offerings of fault-tolerant commercial products, or that need to build a fault-tolerant system out of non-fault-tolerant products.

Course Objectives

At the end of this course you should:

Detailed contents

The standard timetable below includes ample time for class discussions and group problem sessions. The presentation of the material will emphasise examples in practical contexts.

Day 1 Fundamentals of design for dependability and fault tolerance

Day 2 Methods for error detection, confinement and recovery

Day 3 Recovery, modular redundancy and fault tolerance in distributed systems

Day 4 Fault tolerance against software and design faults, and against operator error

Day 5 Commercial fault-tolerant systems; decisions in design, procurement and deployment of fault-tolerant systems

Each day starts at 9:30 and ends at 16:30.

Course Teaching

The course is prepared and taught by the Centre for Software Reliability (CSR) at City University, which is recognised internationally as a centre of excellence in software reliability and measurement. The course leader is Prof Lorenzo Strigini, who has 18 years' experience of research on fault tolerance in hardware and software, as well as of consulting and teaching industrial courses.

About CSR (Centre for Software Reliability at City University)

To arrange a delivery of the course, to be informed of the next delivery, to obtain additional information on course contents, to discuss a tailored version of the course, or to be put on a mailing list for future information, contact the course leader, Prof Lorenzo Strigini.