Evolutionary Algorithm For Decryption of Monoalphabetic Homophonic Substitution Ciphers Encoded As Constraint Satisfaction Problems
David Oranchak
NTU School of Engineering and Applied Science
Roanoke, VA 24018
[email protected]
ABSTRACT
A homophonic substitution cipher maps each plaintext letter of a message to one or more ciphertext symbols [4]. Monoalphabetic homophonic ciphers do not allow ciphertext symbols to map to more than one plaintext letter. Homophonic ciphers conceal language statistics in the enciphered messages, making statistical-based attacks more difficult. We present a dictionary-based attack using a genetic algorithm that encodes solutions as plaintext word placements subjected to constraints imposed by the cipher symbols. We test the technique using a famous cipher (with a known solution) created by the Zodiac serial killer. We present several successful decryption attempts using dictionary sizes of up to 1,600 words.
Program Track: Real-World Applications
Categories and Subject Descriptors: E.3 Data Encryption: Code breaking
General Terms: Algorithms, Experimentation
Keywords: Evolutionary computing, genetic algorithms, Zodiac killer, Zodiac murder ciphers, codebreaking, cryptography, constraint satisfaction, homophonic substitution

Figure 1: Left: Solved 408-character homophonic substitution cipher sent by the Zodiac serial killer to three San Francisco newspapers. Right: Unsolved 340-character cipher sent by the killer.
Copyright is held by the author/owner(s).
GECCO’08, July 12–16, 2008, Atlanta, Georgia, USA.
ACM 978-1-60558-131-6/08/07.

1. INTRODUCTION
Simple substitution ciphers encrypt plaintext messages using symbols which map to individual plaintext letters. Monoalphabetic ciphers use the same mappings from plaintext to ciphertext throughout the encrypted message. Monoalphabetic substitution ciphers are often easy to decipher with frequency analysis because the simple mappings preserve letter frequencies of the plaintext message. Homophonic ciphers hide letter frequencies of plaintext messages. Each letter of enciphered plaintext is mapped to one or more ciphertext units, called homophones, which flattens the distribution of ciphertext symbols. The Zodiac killer is a famous serial killer who operated in California in the late 1960s [2]. In 1969, the killer sent three letters to area newspapers. In each letter, the killer took credit for recent shootings, and included a part of the 408-symbol three-part cipher (Figure 1 [2]). A high school teacher and his wife soon decoded the cipher by hand [2]: [I like killing people because it is so much fun. It is more fun than killing wild game in the forrest because man is the most dangeroue animal of all to kill something gives me the most thrilling experence. It is even better than getting your rocks off with a girl. The best part is thae when I die I will be reborn in paradice and all the I have killed will become my slaves. I will not give you my name because you will try to sloi down or stop my collecting of slaves for my afterlife ebeorietemethhpiti.] The killer mailed a second cipher to a San Francisco newspaper (Figure 1 [2]). No satisfactory solution to this cipher has yet been found. We use the 408-symbol cipher as a test case for our technique, which we hope can be used to attack the 340-symbol cipher.
Many effective decryption techniques for simple ciphers have been studied, such as statistical analysis [3][1], evolutionary computing [8][6], and dictionary-based attacks [5][7]. Our experiments combine the strengths of evolutionary search and constraint-imposed dictionary-based attacks.

2. APPROACH
The 408-character cipher has a keyspace size of 26^54. To reduce the space, we attacked a 52-character region of ciphertext that decodes over 90% of the entire message. The targeted section decodes to the following: killing wild game in the forrest because man is the most danger. This section is only 12.7% of the cipher, but decodes 369 characters (90.4%) of the plaintext. We also limit word placements to sets of unique words having a minimum length
Table 1: Results of experimental runs for different word pool sizes. Fe is the evolved solution’s multiobjective fitness, and Fs is the multiobjective fitness of the known correct solution.

#Words  Correctness  Fe         Fs         Generations
500     1.0          58, 880    58, 880    874
859     1.0          82, 1074   82, 1074   668
1000    1.0          85, 1099   85, 1099   807
1395    1.0          104, 1270  104, 1270  1750
1600    0.9          110, 1202  120, 984   3222
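The constraint that the Approach section places on word placements (a ciphertext symbol may decode to only one plaintext letter, while a letter may have several homophones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names place_word, decode, and TOY_CIPHER are hypothetical, and the toy ciphertext is invented for illustration rather than taken from the Zodiac cipher.

```python
from typing import Dict, Optional

# Hypothetical toy ciphertext for the plaintext "killingwild".
# Letter "i" has homophones B and C; letter "l" has homophones D and E.
TOY_CIPHER = "ABDECFGHBDI"


def place_word(cipher: str, key: Dict[str, str], word: str,
               pos: int) -> Optional[Dict[str, str]]:
    """Try to place `word` at offset `pos` of `cipher`.

    Returns an extended symbol-to-letter key, or None if the placement
    runs off the ciphertext or forces a symbol to decode to two letters.
    """
    if pos < 0 or pos + len(word) > len(cipher):
        return None
    new_key = dict(key)  # leave the caller's key untouched
    for symbol, letter in zip(cipher[pos:pos + len(word)], word):
        if new_key.get(symbol, letter) != letter:
            return None  # symbol already decodes to a different letter
        new_key[symbol] = letter
    return new_key


def decode(cipher: str, key: Dict[str, str], unknown: str = "_") -> str:
    """Decode `cipher` with a partial key, marking unmapped symbols."""
    return "".join(key.get(symbol, unknown) for symbol in cipher)


if __name__ == "__main__":
    key = place_word(TOY_CIPHER, {}, "killing", 0)
    print(decode(TOY_CIPHER, key))                  # partial decoding
    print(place_word(TOY_CIPHER, key, "wand", 7))   # conflicts on symbol B
    print(decode(TOY_CIPHER, place_word(TOY_CIPHER, key, "wild", 7)))
```

In a search of the kind described here, a check like this would presumably reject or penalize candidate word placements that contradict symbols already fixed by other words; how the paper's genetic algorithm actually handles conflicts is not stated in this section.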
4. CONCLUSIONS
We encoded a homophonic substitution cipher attack as an evolutionary search of a combinatorial space of dictionary word placements subjected to constraints imposed by