Project ideas

Christos Kloukinas

Room:: A205
Email:: c.kloukinas -at- city...

General interests

Software architecture description languages (ADLs) like Xcd (developed here at City), formal methods (like FSP), modelling, model checking (e.g., with the Spin model checker), compilers, security, safety, programming.

Topics

It's just a matter of semantics... Introduction
Languages are defined by their grammar and their semantics. The grammar says that "a $ 3 ;" is incorrect Java. The semantics describe what exactly a grammatically correct piece of code actually means. For example, what does it mean to write something like "a = 4;"? It's context dependent - it's an error if you haven't defined "a" to be some number (int/long/float/etc.) variable.
Imagine that I give you the grammar of the Greek language. Using it you can identify which Greek sentences are correct and which are wrong but without knowing the semantics you cannot say what the correct sentences actually mean - it's all Greek to you! Well, the same happens with computers.
Now it's getting trickier - how do you know that your C++ compiler correctly implements the C++ standard (i.e., the C++ semantics as expressed in English by the standard committee)? The C++ compiler developers have tried to interpret what the committee had written in English and codify that into their compiler but it's not an easy task - bugs have been identified on many occassions.
There's a way to make this task easier by using a recently developed framework called the K Framework. There you use a specific formal (not just English) language to describe the semantics of a language and there are tools to produce the compilers for you automatically (like writing a UML statechart and then getting running code from it directly).
This framework has been used to produce executable formal semantics for the C language (!!!!), published at POPL'12 (a very interesting but maybe a bit difficult to understand paper).
A much easier but slightly outdated primer into the K Framework is available, showing how the framework works for a simple calculator language - should be much easier to follow (and I'm more than glad to help with it). Figure 4, on p. 19 shows the whole input for defining the grammar of a simple calculator and its semantics - have a quick look at it, to see how simple it is.
There's a (hopefully up-to-date...) set of tutorial videos.
The tools of the K Framework are available from the K Framework's Tools Github page.
1. You could express the semantics of FSP with the K Framework, to get our own version of LTSA!
  A paper describing the translation of FSP into another formal language (Lotos) may be helpful in understanding how FSP is supposed to work when unsure (mainly a note to myself).
  You need to be able to think abstractly and be very, very methodical - giving super attention to detail.
  Initial steps:
  1. Go through the K primer
  2. Go through the FSP notation
  3. Go through the (short) FSP semantics
  4. (quickly for the 1st time) Go through the (long) FSP semantics
  5. Then install K
  6. And take the K tutorial to get a better handle on K.
2. You can try to use the K Framework to express the Xcd language semantics with it (much harder than doing so for FSP, since XCD is a more complicated language and its semantics have not been defined formally, unlike FSP's).
  (what's Xcd? a language to describe software architectures)
  We have an informal definition of the Xcd language semantics, described in English and pseudo-code in Mert Ozkaya's PhD thesis. Mert has also written the tool that translates Xcd designs into the Spin tool's language so that they can be verified. This tool is supposed to have followed the English language Xcd semantics but it's quite possible that some errors have been done (or indeed that the original semantics were not always correct - we're only humans...).
Extending the XCD translator tool <-------- !!!!
When you write a Java program, you compile to see if there are any grammar/type errors, and, if not, you run it with different input values to test that it's correct.
When you produce a UML diagram, how do you know that it's correct?
You don't... :-(
We have created a new language to describe architectures of software systems and their behaviour, called Xcd (pronounced "exceed"). Here's the Xcd home page.
This language allows to specify architectural designs in a textual form, as if you were programming. It also allows you to check these designs for errors. There's a tool that translates the designs into another language that is easier to check automatically (through an external tool called Spin - a model checker, see What is Spin).
An article describing the Xcd language has been published at the CBSE'14 conference:
- Xcd article
- Xcd conference presentation slides
Spin unspun (hard) Here we go all-guns-blazing in refactoring and C++ Template Metaprogramming...
Issue:
Spin, is not just a tool - it has so many configuration options that it'd be more right to call it a family of tools.
When compiling its verifier, pan (Protocol ANalyser), it can use a number of options: http://spinroot.com/spin/Man/Pan.html#D1, e.g., -DBITSTATE, -DBFS, -DNOREDUCE, etc.
The code is then compiled selectively according to these directives - whether it'll be exhaustive or bitstate exploration, depth-first or breadth-first, with or no partial-order reduction, etc.
In a perfect world, I'd like to refactor its code entirely, so that instead of selecting code based on these directives I can use C++ Template Metaprogramming to do the selection.
As a first step, it'd be nice if we could:
1. Pass some directives to the compiler, e.g.: g++ -x c++ -DBITSTATE -DNOREDUCE pan.c -o pan
2. Use the directives to define some global enum variables:
```
enum class search_type { exchaustive, bitstate };
#ifndef BITSTATE
 search_type search_type_val = search_type::exhaustive;
#else
 search_type search_type_val = search_type::bitstate;
#fi
```
  (same for the other options).
3. Then use template metaprogramming to select code that is optimised for specific values of these global enum variables:
```
/* Your job! Set of classes that when instantiated at the end will
   print the list of directives used at compilation */
```
What's needed:
1. Identify all these directives (a smart grep will do it)
2. Group directives into groups, e.g., SEARCH_MODE = { Depth_First, Breadth_First }; and respective global variables that are set to the appropriate mode.
3. Replace all #if/endif selective compilation directives with if/switch checks that check the global variables.
This will allow the compiler to compile the *whole* code, not just the one that has been selected by the directives.
This will allow us to look for bugs in Spin, some of which could be masked now. I'm already developing some crazy C pre-processor macros to facilitate this task (one must have fun from time to time...;-).
The whole code? Well, not exactly... You see, Spin contains code inside strings (in files called pangenX.[ch]), that it uses to produce another executable called pan (Promela ANalyzer) given some model - it's the pan executable that actually checks the model. Obviously, this code is never seen by the compiler when Spin is compiled - only seen when compiling pan. In itself that's not too bad, as one can refactor it as we've just described for the Spin code and check for bugs. But some of this code is quite likely similar to the code that Spin itself uses - Spin can simulate a model (i.e., execute one of its execution paths), while pan model-checks it (i.e., executes all its execution paths).
So it'd be really, really nice if we could find a way to refactor this code so that common parts are only written once and so forth. This may require using C++, especially lambda functions and/or template meta-programming. It might also require some crafty C pre-processor macros and shell scripts to turn the same code into a string version (for pan) and a normal version (for Spin).
What's in it for you? You can demonstrate that you can understand and modify a complex piece of software and do so in a systematic manner. Employers with brains would appreciate that, cause that's the majority of what professional programmers do - few write programs from scratch.
Spin - two-in-one (challenging) Spin can simulate a model and also produce a C program that can verify the model. If there are errors during verification, one can replay them using Spin.
Issue: The code that Spin has internally to be able to simulate models is quite similar to the C code it produces to verify models. The verification code can also execute fragments of C code that can be included in Spin's models (as they're getting compiled along with everything else), while the simulator cannot execute them.
We'd like a version of Spin that enables the produced C program to also simulate the model - one small step towards merging the two facets of Spin.
Even better if the produced C program can support all of Spin's parameters (so one can use it as a full replacement for Spin when simulating a model).
(or use hobbes to execute the checker code directly inside Spin...)
C++ algorithms depend on a small set of functions that a class should support if we were to use its objects with these algos (e.g., copy constructor, test for equality, etc.)
1. Can we identify minimum component interfaces for some distributed algorithms (e.g., the asynchronous ones described by Lynch (https://dl.acm.org/doi/book/10.5555/2821576)?
2. Can we specify some of these algorithms using the Xcd language (see above for more on it)?

List of OLD project ideas

Spin (suggested)
There's a tool called Spin, that takes a model and checks it automatically.
Developed at Bell Labs, received the very prestigious ACM System Software award in 2002, its developer now heads a research lab at NASA. https://spinroot.com/spin/whatispin.html
Spin's written in C. Plain, good, old C. No classes/generic code - simple... (he, he, he).
We'd like to modify Spin itself so that it better supports our Xcd language.
More info (+ binaries and source code) on Spin can be found here: https://spinroot.com/spin/whatispin.html
1. Spin now supports channels and message receptions like this:
```
/* receive a message consisting of three data values from channel
   chan1 and assign these into d1, d2, d3. */
chan1 ? d1, d2, d3;
```
  You can now also do:
```
/* only receive a three valued message and assign the first value to
   d1 if the second value in the message is equal to 3 and the third
   value is equal to the current value of d4 */
chan1 ? d1, 3, eval(d4);
```
  We need:
```
/* only receive a three valued message into d1, d2, d3, if its values
   satisfy the predicate after the : character, where $i is the i-th
   message component */
chan1 ? d1, d2, d3 : ($1 * $2 == $3 + d4);
```
  Note: Spin has different types of message reception.
```
// https://spinroot.com/spin/Man/receive.html
c ?    m; /* receive the first message in the buffer if it matches m */
c ??   m; /* receive any matching m non-deterministically */
c ?  < m >; /* receive the first message in the buffer if it matches m but don't remove it from the channel buffer */
c ?? < m >; /* receive any matching m non-deterministically but don't remove it from the channel buffer */
// https://spinroot.com/spin/Man/poll.html
c ?  [ m ]; /* poll channel to see if you could receive the first message in the buffer if it matches m */
c ?? [ m ]; /* poll channel to see if you could receive any matching m non-deterministically */ 
```
2. Also Spin has a non-deterministic select(i, min, max), which assigns i a value between min and max when min ≤ max or min otherwise (it never blocks!).
  It is equivalent to the following Promela code:
```
/* select(i, min, max); */
i = min;
do /* Non-deterministic loop, can choose any value between min and max */
:: i < max -> ++i
:: break
od
```
  We'd like a select(i, min, max, predicate) - assign i a value between min/max that satisfies predicate; block if no such value exists. The latter part is important (and difficult) - the statement shouldn't be executable at all if there is no value within the range for which predicate is true.
  And of course, if the predicate is true for some value then it shouldn't block, so the following "implementation" is wrong:
```
/* select(i, min, max, pred); */
i = min;
do /* Non-deterministic loop, can choose any value between min and max */
:: i < max -> ++i
:: pred(i) -> break /* Bug: if pred(i) is true, one may choose to
advance i nevertheless and get stuck in values of i for which pred(i)
isn't true. We could store the last value of i that satisfies pred(i) */
od
```
  Another attempt - but is it correct?
```
/* select(i, min, max, pred); */
i = min; last_i=min-1;
do /* Non-deterministic loop, can choose any value between min and max */
:: pred(i) -> last_i = i ; /* Store the last value of i that satisfies pred(i) */
              if 
              :: i < max -> ++i
              :: break
              fi
:: else -> if 
           :: i < max -> ++i
	   :: (i == max) && (last_i != min) -> i = last_i ; break
           fi
od
```
We can then modify the Xcd translator to take advantage of these features (especially the generalized message reception, which should help decrease the state space considerably).
This would also help demonstrate the usefulness of these features by comparing how the two versions of the translator behave memory/time-wise.

Note: The following project ideas may depend on the semantics project above as the current Xcd tool may not be able to treat the cases explored - so I consider them risky and would rather have someone complete the Xcd semantics project first.
1. patterns (actually may depend on the semantics project above as the current Xcd tool may not be able to treat these examples...)
  Here is a paper discussing how to represent the patterns in the Gang of Four book using aspect-orientation. https://www.cs.ubc.ca/labs/spl/papers/2002/oopsla02-patterns.html
  The code from that was here: https://hannemann.pbworks.com/w/page/16577895/Design%20Patterns It's still accessible through the Internet archive https://web.archive.org/web/20090826161450/http://hannemann.pbworks.com/Design+Patterns
  Along with a PhD student here at City we're developing a language to describe architectures. All related publications, the tool, and some case studies are here: https://www.staff.city.ac.uk/c.kloukinas/Xcd/index.html
  The goal is to take the patterns described in POSA (vol 1) and describe them in Xcd.
  POSA books: https://www.dre.vanderbilt.edu/~schmidt/POSA/
2. algorithms (actually may depend on the semantics project above as the current Xcd tool may not be able to treat these examples...)
  Similar to the patterns project - the goal here is to describe some algorithms, like (from C++): swap, sort, find, etc. Also map-reduce (don't think that the Xcd language can support it now but it's good to try).
3. (DIFFICULT++) properties (actually may depend on the semantics project above as the current Xcd tool may not be able to treat these examples...)
  There are two ways to describe a protocol: a decentralized one, where you give the behaviour of each participant, and a centralized one where you describe at each state which participant can move and what the next state would be.
  The Xcd language supports the former.
  We'd like to extend it so as to be able to describe a property for such a protocol, which would be expressed as a centralized behaviour.
  Concerns language design, and compiler changes.
4. (DIFFICULT) time (actually may depend on the semantics project above as the current Xcd tool may not be able to treat these examples...)
  
  The current version of the Xcd language does not have any notion of time (like FSP).
  But in many systems time needs to be modelled.
  So we'd need to extend the syntax to include clocks and expressions on them, and then extend the compiler that translates Xcd models into the language of a model-checker called Spin.
  More info (+ binaries and source code) on Spin can be found here: https://spinroot.com/spin/whatispin.html
  Note - Spin itself does not support time.
  There are two groups that have tried to extend it so that it does:
  - Extending model checkers for hybrid system verification: the case study of SPIN https://onlinelibrary.wiley.com/doi/10.1002/stvr.1505/full (can do a lot more than time)
  - Building extensible specifications and implementations of Promela with AbleP https://melt.cs.umn.edu/pubs/mali11spin/mali11spin.pdf
  We'd preferrably like to use the first extension, as it can support more than time.
  Looking at the syntax extensions that these two have done for Spin, would be a good way to start on deciding how to extend the syntax of Xcd.
  You don't need to understand the underlying theory about model-checking time (well, not all of it).
  Like previous projects, it involves working on the syntax of the language and on its compiler.