On a scheme for backward recovery in complex systems including both client processes and data servers

By Lorenzo Strigini*, Felicita Di Giandomenico**, Alexander Romanovsky***

* Centre for Software Reliability, City University
** Istituto di Elaborazione della Informazione, Pisa, Italy
*** Dept. of Computing Science, University of Newcastle-upon-Tyne, U.K.

CSR Technical Report. October 1996. Last revised: March 1997.

We discuss a design scheme for co-ordinated backward recovery in complex systems. This report concludes a series of papers on this topic. We give the rationale of our proposed approach, a complete and reasoned specification of the mechanisms, a comparison with other related research, and pointers to other papers that describe proof-of-concept examples of the use and implementation of this scheme.

We consider backward error recovery for complex software systems in which different subsystems may belong to essentially different application areas, such as databases and process control. Examples of such systems are found in modern telecommunication, transportation, manufacturing and military applications. Such heterogeneous subsystems are naturally built according to different design "models": the "object-action" model, in which the long-term state of the computation is encapsulated in data objects and active processes invoke operations on these objects, and the "process-conversation" model, in which the state is contained in the processes, which communicate via messages.

Each of these two models of computation is best served by a different backward error recovery scheme. For the object-action model, atomic transactions are now the accepted model of backward recovery. For the process-conversation model, a recovery scheme based on planned conversations has been widely studied.

We have shown how checkpointing and roll-back can be co-ordinated between two sets of such heterogeneous subsystems, namely sets of message-passing processes organised in conversations and data servers offering atomic transactions. Assuming that each of the two kinds of subsystem already has functioning mechanisms for backward error recovery, we have described the additional provisions needed for co-ordination between them. Our additions are based on rather general models of both transactions and conversations, so they could be adapted to most specific instances of either scheme. Our solution involves altering the virtual machine on which the programs run, together with programming conventions that seem rather natural and can be enforced automatically.
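The basic idea of tying the fate of server-side transactions to the outcome of a conversation can be illustrated with a toy model. The sketch below is not the scheme specified in the report; it is a minimal illustration under simplifying assumptions (a single in-memory data server, one transaction per conversation attempt, no concurrency control, no nesting, no server failures). The names `DataServer`, `Process` and `run_conversation` are hypothetical. Participants checkpoint their local state on entry; their updates to the server stay tentative inside a transaction; if the acceptance test passes the transaction commits, otherwise it aborts and every participant is rolled back.

```python
import copy
from typing import Callable, Dict, List


class DataServer:
    """Hypothetical toy data server: writes made inside a transaction stay
    tentative until commit; abort discards them."""

    def __init__(self) -> None:
        self.committed: Dict[str, int] = {}
        self.pending: Dict[int, Dict[str, int]] = {}
        self._next_tid = 0

    def begin(self) -> int:
        self._next_tid += 1
        self.pending[self._next_tid] = {}
        return self._next_tid

    def read(self, tid: int, key: str) -> int:
        # A transaction sees its own tentative writes before committed data.
        return self.pending[tid].get(key, self.committed.get(key, 0))

    def write(self, tid: int, key: str, value: int) -> None:
        self.pending[tid][key] = value

    def commit(self, tid: int) -> None:
        self.committed.update(self.pending.pop(tid))

    def abort(self, tid: int) -> None:
        self.pending.pop(tid)


class Process:
    """Hypothetical client process holding local (non-server) state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, int] = {}


def run_conversation(processes: List[Process], server: DataServer,
                     body: Callable[[List[Process], int], None],
                     acceptance_test: Callable[[List[Process], int], bool]) -> bool:
    """Run one conversation attempt whose transaction commits or aborts with it."""
    # Recovery point: checkpoint every participant's local state on entry.
    checkpoints = [copy.deepcopy(p.state) for p in processes]
    tid = server.begin()
    try:
        body(processes, tid)
        ok = acceptance_test(processes, tid)
    except Exception:
        ok = False  # an exception inside the conversation counts as a failure
    if ok:
        server.commit(tid)  # server updates become permanent only now
        return True
    # Co-ordinated roll-back: discard server updates and restore the processes.
    server.abort(tid)
    for p, saved in zip(processes, checkpoints):
        p.state = saved
    return False
```

Because the server never makes the updates of a failed attempt permanent, rolling the processes back to their checkpoints cannot leave them inconsistent with the server's committed state. A real design must also address issues this toy ignores, such as concurrent conversations sharing a server, nested conversations and transactions, and failures of the server itself.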


Related papers

The ideas presented here were first outlined in:
  • L. Strigini, F. Di Giandomenico, "Flexible schemes for application-level fault tolerance", Proceedings IEEE 10th Symposium on Reliable Distributed Systems, Pisa, September-October 1991, pp. 86-95.

A shorter version of the discussion and specification parts of this report, with an example of use of our method, is published in:

  • L. Strigini, F. Di Giandomenico, and A. Romanovsky, "Coordinated Backward Recovery between Client Processes and Data Servers", to appear in IEE Proceedings on Software Engineering, Vol. 1, No. 2, April 1997.

Material from that paper is reused here with permission from the IEE.

The following older report is superseded by the present report, but may be of interest for its description of how a simple version of our proposed scheme can be implemented in the Ada language:

  • L. Strigini, A. Romanovsky and F. Di Giandomenico, "Recovery in heterogeneous systems", Technical Report 133, PDCS-2 ESPRIT Basic Research project, 1994, also available on line at URL http://www.newcastle.research.ec.org/pdcs/trs/index.html#133

