We have created a new language to describe architectures of software
systems and their behaviour, called Xcd (pronounced "exceed"). Here's
the Xcd home
This language allows to specify architectural designs in a textual form, as if you were programming. It also allows you to check these designs for errors. There's a tool that translates the designs into another language that is easier to check automatically (through an external tool called Spin - a model checker).
A recent article describing the Xcd language has been published at the CBSE'14 conference:
Here are a number of research projects related to the Xcd language and system
Note: By "research" we don't mean there will be no programming - on the contrary...
There's a tool called Spin, that takes a model and checks it automatically.
Developed at Bell Labs, received the very prestigious ACM System Software award in 2002, its developer now heads a research lab at NASA. https://spinroot.com/spin/whatispin.html
Spin's written in C. Plain, good, old C. No classes/generic code - simple... (he, he, he).
We'd like to modify Spin itself so that it better supports our Xcd language.
More info (+ binaries and source code) on Spin can be found here: https://spinroot.com/spin/whatispin.html
/* receive a message consisting of three data values from channel chan1 and assign these into d1, d2, d3. */ chan1 ? d1, d2, d3;
You can now also do:
/* only receive a three valued message and assign the first value to d1 if the second value in the message is equal to 3 and the third value is equal to the current value of d4 */ chan1 ? d1, 3, eval(d4);
/* only receive a three valued message into d1, d2, d3, if its values satisfy the predicate after the : character, where $i is the i-th message component */ chan1 ? d1, d2, d3 : ($1 * $2 == $3 + d4);Note: Spin has different types of message reception.
// https://spinroot.com/spin/Man/receive.html c ? m; /* receive the first message in the buffer if it matches m */ c ?? m; /* receive any matching m non-deterministically */ c ? < m >; /* receive the first message in the buffer if it matches m but don't remove it from the channel buffer */ c ?? < m >; /* receive any matching m non-deterministically but don't remove it from the channel buffer */ // https://spinroot.com/spin/Man/poll.html c ? [ m ]; /* poll channel to see if you could receive the first message in the buffer if it matches m */ c ?? [ m ]; /* poll channel to see if you could receive any matching m non-deterministically */
/* select(i, min, max); */ i = min; do /* Non-deterministic loop, can choose any value between min and max */ :: i < max -> ++i :: break od
We'd like a select(i, min, max, predicate) - assign i a value between
min/max that satisfies predicate; block if no such value exists. The latter part is important (and difficult) - the statement shouldn't be executable at all if there is no value within the range for which predicate is true.
And of course, if the predicate is true for some value then it shouldn't block, so the following "implementation" is wrong:
/* select(i, min, max, pred); */ i = min; do /* Non-deterministic loop, can choose any value between min and max */ :: i < max -> ++i :: pred(i) -> break /* Bug: if pred(i) is true, one may choose to advance i nevertheless and get stuck in values of i for which pred(i) isn't true. We could store the last value of i that satisfies pred(i) */ odAnother attempt - but is it correct?
/* select(i, min, max, pred); */ i = min; last_i=min-1; do /* Non-deterministic loop, can choose any value between min and max */ :: pred(i) -> last_i = i ; /* Store the last value of i that satisfies pred(i) */ if :: i < max -> ++i :: break fi :: else -> if :: i < max -> ++i :: (i == max) && (last_i != min) -> i = last_i ; break fi od
We can then modify the Xcd translator to take advantage of these features (especially the generalized message reception, which should help decrease the state space considerably).
This would also help demonstrate the usefulness of these features by comparing how the two versions of the translator behave memory/time-wise.
Languages are defined by their grammar and their semantics.
The grammar says that "a $ 3 ;" is incorrect Java. The semantics
describe what exactly a grammatically correct piece of code actually
means (what does it mean to write something like "a = 4;"? It's
context dependent - it's an error if you haven't defined "a" to be a
Imagine that I give you the grammar of the Greek language. Using it you can identify which Greek sentences are correct and which are wrong but without knowing the semantics you cannot say what the correct sentences actually mean - it's all Greek to you! Well, the same happens with computers.
We have an informal definition of the Xcd language semantics,
described in English and pseudo-code in Mert Ozkaya's PhD thesis.
Mert has also written the tool that translates Xcd designs into the
Spin tool's language so that they can be verified. This tool is
supposed to have followed the English language Xcd semantics but it's
quite possible that some errors have been done (or indeed that the
original semantics were not always correct - we're only humans...).
Consider this - how do you know that your C++ compiler correctly implements the C++ standard (i.e., the C++ semantics as expressed in English by the standard committee)? They've tried to interpret what the committee had written in English and codify that into their compiler but it's not an easy task - bugs have been identified on many occassions.
There's a way to make this task easier by using a recently developed framework called the K Framework. There you use a specific language to describe the semantics of a language and there are tools to produce the compilers for you automatically (like writing a UML statechart and then getting running code from it directly). They have used that framework to produce executable formal semantics for the C language (a very interesting but maybe a bit difficult to understand paper). A much easier but slightly outdated primer into the K Framework is available, showing how the framework works for a simple calculator language - should be much easier to follow (and I'm more than glad to help with it). There's an up-to-date (as far as I know) set of youtube tutorial videos.
So what we want is for you to try to use the K Framework to express
the Xcd semantics with it.
Alternatively, you could express the semantics of FSP with the K Framework, to get our own version of LTSA!
A paper describing the translation of FSP into another formal language (Lotos) may be helpful in understanding how FSP is supposed to work when unsure (mainly a note to myself).
You need to be able to think abstractly and be very, very methodical - giving super attention to detail.
Issue: The code that Spin has internally to be able to simulate models is quite similar to the C code it produces to verify models. The verification code can also execute fragments of C code that can be included in Spin's models (as they're getting compiled along with everything else), while the simulator cannot execute them.
We'd like a version of Spin that enables the produced C program to also simulate the model - one small step towards merging the two facets of Spin.
(or use hobbes to execute the checker code directly inside Spin...)
Spin, is not just a tool - it has so many configuration options that it'd be more right to call it a family of tools.
These options are passed through compiler directives: -DSUPERTRACE -DN=32, etc.
The code is then compiled according to these directives.
This will allow the compiler to compile the *whole* code, not just the one that has been selected by the directives.
This will allow us to look for bugs in Spin, some of which could be masked now. I'm already developing some crazy C pre-processor macros to facilitate this task (one must have fun from time to time...;-).
The whole code? Well, not exactly... You see, Spin contains code inside strings (in files called pangenX.[ch]), that it uses to produce another executable called pan (Promela ANalyzer) given some model - it's the pan executable that actually checks the model. Obviously, this code is never seen by the compiler when Spin is compiled - only seen when compiling pan. In itself that's not too bad, as one can refactor it as we've just described for the Spin code and check for bugs. But some of this code is quite likely similar to the code that Spin itself uses - Spin can simulate a model (i.e., execute one of its execution paths), while pan model-checks it (i.e., executes all its execution paths).
So it'd be really, really nice if we could find a way to refactor this code so that common parts are only written once and so forth. This may require using C++, especially lambda functions and/or template meta-programming. It might also require some crafty C pre-processor macros and shell scripts to turn the same code into a string version (for pan) and a normal version (for Spin).
What's in it for you? You can demonstrate that you can understand and modify a complex piece of software and do so in a systematic manner. Employers with brains would appreciate that, cause that's the majority of what professional programmers do - few write programs from scratch.
Note: The following project ideas may depend on the semantics project above as the current Xcd tool may not be able to treat the cases explored - so I consider them risky and would rather have someone complete the Xcd semantics project first.
Here is a paper discussing how to represent the patterns in the Gang of Four book using aspect-orientation. https://www.cs.ubc.ca/labs/spl/papers/2002/oopsla02-patterns.html
The code from that is here: https://hannemann.pbworks.com/w/page/16577895/Design%20Patterns
Along with a PhD student here at City we're developing a language to describe architectures. All related publications, the tool, and some case studies are here: https://www.staff.city.ac.uk/c.kloukinas/Xcd/index.html
The goal is to take the patterns described in POSA (vol 1) and describe them in Xcd.
POSA books: https://www.dre.vanderbilt.edu/~schmidt/POSA/
Similar to the patterns project - the goal here is to describe some algorithms, like (from C++): swap, sort, find, etc. Also map-reduce (don't think that the Xcd language can support it now but it's good to try).
There are two ways to describe a protocol: a decentralized one, where you give the behaviour of each participant, and a centralized one where you describe at each state which participant can move and what the next state would be.
The Xcd language supports the former.
We'd like to extend it so as to be able to describe a property for such a protocol, which would be expressed as a centralized behaviour.
Concerns language design, and compiler changes.
The current version of the Xcd language does not have any notion of time (like FSP).
But in many systems time needs to be modelled.
So we'd need to extend the syntax to include clocks and expressions on them, and then extend the compiler that translates Xcd models into the language of a model-checker called Spin.
More info (+ binaries and source code) on Spin can be found here: https://spinroot.com/spin/whatispin.html
Note - Spin itself does not support time.
There are two groups that have tried to extend it so that it does:
We'd preferrably like to use the first extension, as it can support more than time.
Looking at the syntax extensions that these two have done for Spin, would be a good way to start on deciding how to extend the syntax of Xcd.
You don't need to understand the underlying theory about model-checking time (well, not all of it).
Like previous projects, it involves working on the syntax of the language and on its compiler.
There's a paper describing how the authors specified the distributed file system of Google (GFS), Microsoft (Niobe), and another one (Chain):
Roxana Geambasu, Andrew Birrell, and John MacCormick. "Experiences with Formal Specification of Fault-Tolerant Storage Systems." In Proceedings of the 38th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN-DCCS), Anchorage, Alaska, June 2008.
The authors were kind enough to give me the specifications.
It would be very interesting to translate these into Xcd, to see if we can make them shorter, etc. by reusing protocols.