Projects - Code Breaking
-----------------------

Note: In the text of the second task 4th should be 5th and 5th should
be 4th.

Frequency tables are given in the files

Freq_space - for English with spaces

Freq_no_space - for English where the spaces between words have been
                closed up.

Programmes
----------

To run these programmes copy them into your file space.  To compile
them in Linux (E208, E217, A412, A413) open up a terminal window and use

g77 -o Prog_Name Prog_Name.f

where Prog_Name is replaced by the appropriate programme name. This
compiles the programme. You can then run it by just typing

Prog_Name

It is also possible to run these on the Windows PCs in other labs,
but a bit more work needs to be done. These programmes are run using
Fortran77, but the compilers on the Windows service only recognises
Fortran95. They are not too different. If you change the suffix of the
programme to .f95 and load the programmes into Plato IDE then click on
'File Options ...' and type /FIXED_FORMAT in the box 'Local Compiler
Options' then some of the programmes (e.g. Caesar.f) will run. Others
need a bit of modification as the continuation line convention changed.
See Dr Bowtell's Fortran95 Notes available at 

http://www.staff.city.ac.uk/~ra933/X2NumMaths/FORTRAN95_Notes.pdf

Caesar.f - for the Caesar cypher
--------

Copy the text you want to encypher or decipher into the file Caesar.txt
and run the programme by typing

Caesar

You will be prompted for the shift you want to use and whether you want
the output in UPPER or lower case letters. Obviously you can use all 25
non-trivial shifts and see which one you can read - but try and use some
deductive work to get your answer in fewer goes.

MonoAlph.f - for a monoalphabetic cypher
----------

The key for this code is stored in the file MonoAlph.key. This file
consists of a single line of 26 letters (upper or lower case). A or a
will be replaced by the first letter, B or b by the second and so on. I
suggest you start with a key ABCDEF...XYZ and replace each guessed letter
in lower case. So if you think C is encrypted by B change your key file
to AeCDE...XYZ and run you programme again. The programme assumes it is
working on a plain text, so when decoding use the option (1) encoding,
and use the option for mixed upper and lower case. You shouldn't need
to do a frequency analysis for Code_2. If you need to do a frequency
analysis of a text then use the programme Analysis.f which reads from
a file Analysis.in.

HomoDecode.f - for decoding a monoalphabetic cypher with homophones.
------------

This reads the text to be decoded from the file Homophones.code. It
reads a guess for the key from the file HomoGuess.key. The format of
this file is a file with 27 lines. The first line contains letters and
characters that you think correspond to A, the second to B, and so on up
to 26th for Z and 27 for spaces. If you have no guesses for a given letter
leave the corresponding line blank. The output on running the programme
consists of alternate lines of the enciphered text and the (partial)
decipherment. Unknown letters are replaced by X in the decipherment.
Frequency analysis of pairs is very useful here. For example, spaces are
common, but double spaces are not. So if you see some set of relatively
common letters that are never seen together then they may be spaces!

VigCount.f and Vienere.f
------------------------

VigCount.f takes some code from the file Vigenere.code and looks for
repeated 3, 4 , 5 and six letter repeats. It then writes them out with
the difference in positions, and the position of the first and second
occurrences. It is the difference in positions that matters most, as the
highest common factor of these should give you the length of the key word
(plus a few random matches to throw you off).

Once you have the key word length, you have to split the text up
appropriately and use frequency analysis to find the letters of the
keyword. The guess at the keyword can be entered into Vigenere.f which
reads its source text from the file Vigenere.txt. This will decode your
text based on this guess.

PlayfairBreak.f and PlayfairCuess.f
-----------------------------------

In PlayfairBreak you are given a frequency count of digraphs. Use this to
make some guesses as to suitable substitutions. Don't forget that if th is
encoded as, say, JK then ht will be encoded as KJ, so you would expect JK
to be common and KJ scarce. The guesses are entered in the form of pairs

DE st

where DE is the pair in the code and st is what you think it is in the
plain text. Don't forget to enter other guesses/deductions such as

ED ts

and if you think the grid looks like

..E.T
.....
..S.D
.....
.....

you can also enter

TS ed
ST de

The last line of your guesses should be 

ZZ ZZ

I suggest you have a list of your guesses in a file and just copy and
paste them to the programme after each modification.

Once you have established some sense of what part of the key looks like
transfer to PlayfairGuess.f. In this programme you put your partial guess
into the file Playfair.key in the from

..E.T
..AFG
..SID
.....
.....

You must have a 5x5 grid, use capital letters and full stops for unknown
entries. Every time you change the file just hit return and the guess
should be updated with you attempts at decoding in lower case letters
with "." for one of a pair of letters that cannot be deduced. Original
code in uppercase letters.

RotMono2.f
----------

This programme is for helping decipher a text that has had a rotation
type cipher, where the shift of the ith letter is by a+N*i (mod 26),
followed my a monoalpbetic substitution cipher.

This programme reads the cipher from the file Cipher.7.ABC. They then
split the text up into 26 columns and look for the most frequent letters
in each column. If you have a very large sample and/or are lucky these
will correspond to e being enciphered. From this and other information
you should try and guess what the value of N is. For example if you have 
identified a letter you think corresponds to e in one column, then the 
frequency of the same letter in other columns puts constraints on 
possible values of N. Applying this to several columns will give you an 
idea what N is.

You can assume that a=-N (why?), and so the first column is unaffected
by the a+N*i shift. This may tell you what e is enciphered as in the
monoalphabetics substitution cipher. If you assume the most frequent
letter in each of the other columns originally came from e, then
your knowledge of N will give you guesses for other elements of the
monoalphabetic key. If is even or 13 this may be a bit restrictive and
you may need to look for some more information to help. Put your guesses
for the letters corresponding to e into the file RotMono2.key (same as
for the monoalphabetic substitution cipher) in the order of the columns,
so in the original example the content of this files would be

YKVBLDFSKQBAHESRVDFGRTMWIO

Then use RotMono2.f. If N is even then the two halves of the key should 
be the same. Where there are differences look more closely at the 
distribution in both corresponding columns to get a better guess.

If every thing was perfect and N was odd then you would have a clear
text. However, things are often not. For example by using the most
frequent letter you may get (as is the case here) repeated letters. Here
K is repeated, but if N is odd they can't both be right. RotMono2 will
print out its first attempt. Then you have to try and sort out your
errors. Look at the partial decipherment, spot letters that you think are
wrong, enter the letter's position (remember - each line is 80 characters
long) and what it should be and it will work out the correction to the
monoalphabetic key. The latest version of the monoalphabetic key is given
numerically just above the latest decipherment. This will not look like
your key file.

If N is even you may get a lot of blanks. Scan through your text and with
luck you should spot something that you can work on. Once you get a few
corrections you will find things becoming clearer quite quickly. (A few
of you have even N, and I have successfully deciphered them).

Other uses: If the key file has all the letters in alphabetic order, then
this like a rotation with N=1. Try different values of N in the programme
and the one that gives the most uneven distribution of the letters is 
probably that for the rotation step (with the adjustment of 1 due to the 
key files having an rotation of 1). You can use this for ciphers 7 and 8.