Lecture 12: Public-Key Cryptography and the RSA Algorithm
Lecture Notes on Computer and Network Security
by Avi Kak
April 22, 2015
11:13pm
© 2015 Avinash Kak, Purdue University
Goals:
To review public-key cryptography
To demonstrate that confidentiality and sender-authentication can be
achieved simultaneously with public-key cryptography
To review the RSA algorithm for public-key cryptography
To present the proof of the RSA algorithm
To go over the computational issues related to RSA
To discuss the vulnerabilities of RSA
Python implementations for generating primes and for factorizing medium sized numbers
Lecture 12
Public-key cryptography is also known as asymmetric-key cryptography, to distinguish it from the symmetric-key cryptography
we have studied thus far.
public key that the server would associate with your login ID so that you can make a password-free connection with the server. The public key held by a server is commonly referred to as the server's host key. When a client, such as your laptop, wants to make a connection with an SSH server, it sends a connection request to port 22 of the server machine and the server makes its host key available automatically. On the other hand, in the SSL/TLS protocol, an HTTPS web server makes its public key available through a certificate of the sort you'll see in the next lecture.]
As we will see, this solves one of the most vexing problems associated with
$C = E(PU_B, E(PR_A, M))$
where E() stands for encryption. The processing steps undertaken by B to recover M from C are
$M = D(PU_A, D(PR_B, C))$
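The two nested operations can be tried out with toy RSA keys. The following is only a sketch under stated assumptions: the key for A reuses the small values n = 41567, e = 17, d = 26633 from this lecture, the key for B (primes 223 and 227) is made up for the illustration, and the function names `send` and `receive` are hypothetical. A's modulus is deliberately smaller than B's so that the signed value fits inside B's modulus.

```python
# Toy RSA key pairs (far too small for real use).
N_A, E_A, D_A = 41567, 17, 26633      # party A: p = 197, q = 211
N_B, E_B = 223 * 227, 17              # party B: illustrative primes 223 and 227
D_B = pow(E_B, -1, 222 * 226)         # B's private exponent (needs Python 3.8+)

def send(M):
    """A's side: C = E(PU_B, E(PR_A, M))."""
    signed = pow(M, D_A, N_A)         # E(PR_A, M) provides authentication
    return pow(signed, E_B, N_B)      # E(PU_B, .) provides confidentiality

def receive(C):
    """B's side: M = D(PU_A, D(PR_B, C))."""
    signed = pow(C, D_B, N_B)         # D(PR_B, C)
    return pow(signed, E_A, N_A)      # D(PU_A, .)

M = 1234
C = send(M)
print(M, C, receive(C))
```

Only B can undo the outer layer, and only A could have produced the inner one, which is why both properties hold at once.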
The sender A encrypting his/her message with his/her own private key $PR_A$ provides authentication. This step constitutes A putting his/her digital signature on the message. Instead of applying the private key to the entire message, a sender may also sign a message by applying his/her private key to just a small block of data that is derived from the message to be sent.
[DID YOU KNOW that you are required to digitally sign the software for your app before you can market it through the official Android application store Google Play? And did you know that Apple's App
Of course, the price paid for achieving confidentiality and authentication at the same time is that now the message must be processed four times in all for encryption/decryption. The message goes through two encryptions at the sender's place and two decryptions at the receiver's place. Each of these four steps separately involves the computationally complex public-key algorithm.
[Figure: Three uses of public-key cryptography between Party A and Party B. In the first panel, A encrypts the message with B's public key $PU_B$ and B decrypts with B's private key $PR_B$. In the second panel, A encrypts with A's private key $PR_A$ and B decrypts with A's public key $PU_A$. In the third panel, A encrypts with $PR_A$ and then with $PU_B$, and B decrypts with $PR_B$ and then with $PU_A$.]
The RSA algorithm, named after Ron Rivest, Adi Shamir, and Leonard Adleman, is based on a property of positive integers that we describe below.
The result shown above, which follows directly from Euler's theorem, requires that $M$ and $n$ be coprime. However, as shown in Section 12.2.3, when $n$ is a product of two primes $p$ and $q$, this result applies to all $M$, $0 \le M < n$. In what follows, let's now see how this property can be used for message encryption and decryption.
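The claim that the property holds for every $M$ in $0 \le M < n$ can be checked by brute force on a small modulus. This sketch assumes that the property in question is the reduction of exponents modulo $\phi(n)$, of which $M^{k \phi(n) + 1} \equiv M \pmod{n}$ is the special case RSA relies on. The primes 197 and 211 are the toy values used later in this lecture, and the function name is hypothetical.

```python
p, q = 197, 211
n = p * q
phi = (p - 1) * (q - 1)

def holds_for_all_messages(n, phi, k):
    """Check M^(k*phi + 1) == M (mod n) for every M with 0 <= M < n."""
    return all(pow(M, k * phi + 1, n) == M for M in range(n))

# n = p*q with distinct primes: holds even for M that share a factor with n.
print(holds_for_all_messages(n, phi, k=1))

# Counterexample when n is NOT a product of distinct primes: n = 9, phi(9) = 6.
print(holds_for_all_messages(9, 6, k=1))
```

The second call fails for $M = 3$, which illustrates why the two-distinct-primes structure of the modulus matters.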
$\phi(n)$: the totient of $n$
Now suppose we are given an integer $M$, $0 \le M < n$, that represents our message. We can then transform $M$ into another integer $C$ that will represent our ciphertext by the following modular exponentiation:
$C = M^e \bmod n$
$C^d \bmod n$
since
$(M^e)^d \pmod{n} = M^{ed \pmod{\phi(n)}} \equiv M \pmod{n}$
$(M^e)^d \equiv M^{ed} \equiv M \pmod{n}$
We want this guarantee because $C = M^e \bmod n$ is the encrypted form of the message integer $M$ and decryption is carried out by $C^d \bmod n$.
$n = p \times q \qquad (1)$
The above factorization is needed because the proof of the algorithm, presented in the next subsection, depends on the following
two properties of primes and coprimes:
$\{a \equiv b \pmod{p} \text{ and } a \equiv b \pmod{q}\} \Leftrightarrow \{a \equiv b \pmod{pq}\} \qquad (2)$
This equivalence follows from the fact that $a \equiv b \pmod{p}$ implies $a - b = k_1 p$ for some integer $k_1$. But since we also have $a \equiv b \pmod{q}$, implying $a - b = k_2 q$, it must be the case that $k_1 = k_3 \times q$ for some $k_3$. Therefore, we can write $a - b = k_3 \times p \times q$, which establishes the equivalence. (Note that this argument breaks down if $p$ and $q$ have common factors other than 1.) [We will use this property in the next subsection to arrive at Equation (11) from the partial results in Equations (9) and (10).]
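The equivalence, and the way it breaks down when the two moduli share a factor, can be checked numerically. This is a small brute-force sketch; the function name and the search limit are choices made here for illustration.

```python
def equivalence_holds(p, q, limit=200):
    """Check {a = b mod p and a = b mod q} <=> {a = b mod pq} for 0 <= a, b < limit."""
    for a in range(limit):
        for b in range(limit):
            both = (a - b) % p == 0 and (a - b) % q == 0
            joint = (a - b) % (p * q) == 0
            if both != joint:
                return False
    return True

print(equivalence_holds(5, 7))   # coprime moduli
print(equivalence_holds(4, 6))   # moduli with the common factor 2
```

With moduli 4 and 6, a difference of 12 is divisible by both but not by 24, so the second check reports a failure.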
$\phi(n) = \phi(p) \times \phi(q) = (p - 1) \times (q - 1) \qquad (3)$
See Section 11.3 of Lecture 11 for a proof of this. [We will use
this property to go from Equation (5) to Equation (6) in the next subsection.]
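$e \times d \equiv 1 \pmod{\phi(n)} \qquad (4)$
$e \times d - 1 \equiv 0 \pmod{\phi(n)}$, that is, $e \times d - 1 = k \times \phi(n) \qquad (5)$
$\phi(p) \mid (e \times d - 1)$
and
$\phi(q) \mid (e \times d - 1) \qquad (6)$
$e \times d - 1 = k_1 \phi(p) = k_1 (p - 1) \qquad (7)$
$M^{ed} \bmod p = M^{ed - 1 + 1} \bmod p = M^{k_1 (p - 1)} \times M \bmod p \qquad (8)$
$M^{p-1} \equiv 1 \pmod{p}$
$M^{k_1 (p-1)} \equiv 1 \pmod{p}$
$M^{ed} \bmod p = M \bmod p \qquad (9)$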
From the second assertion in Equation (6), we can draw an identical conclusion regarding the other factor $q$ of the modulus $n$:
$M^{ed} \bmod q = M \bmod q \qquad (10)$
$M^{ed} \bmod n = M \bmod n \qquad (11)$
Public key = [e, n]
You first decide upon the size of the modulus integer $n$. Let's say that your implementation of RSA requires a modulus of size B bits.
You do the same thing for selecting $q$. You start with a randomly generated number of size B/2 bits, and so on.
For greater security, instead of incrementing by 2 when the Miller-Rabin test fails, you generate a new random number.
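The selection procedure can be sketched as follows. This is a minimal illustration, not the lecture's own implementation: it sets the two highest bits and the lowest bit of a random candidate (the bit settings this lecture prescribes for its toy example) and draws a fresh candidate whenever the primality test fails. The names `is_probable_prime` and `generate_prime` are hypothetical, and Python's `random` module is used only for brevity; a real key generator would need a cryptographically secure source.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin test with randomly chosen bases."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    k, q = 0, n - 1
    while q % 2 == 0:              # write n - 1 as 2^k * q with q odd
        q //= 2
        k += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, q, n)
        if x in (1, n - 1):
            continue
        for _ in range(k - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False           # a is a witness that n is composite
    return True

def generate_prime(bits):
    """Return a probable prime with the two highest bits and the lowest bit set."""
    while True:
        candidate = random.getrandbits(bits)
        candidate |= (0b11 << (bits - 2)) | 1
        if is_probable_prime(candidate):
            return candidate

p, q = generate_prime(16), generate_prime(16)
print(p, q, (p * q).bit_length())
```

Because both primes have their two highest bits set, the product of two B/2-bit primes always spans the full B bits.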
$p \bmod e \neq 1$
$q \bmod e \neq 1$
$d = e^{-1} \bmod \phi(n)$
Note that the main source of security in RSA is keeping $p$ and $q$ secret and therefore also keeping $\phi(n)$ secret. It is important to realize that knowing either will reveal the other. That is, if you know the factors $p$ and $q$, you can calculate $\phi(n)$ by multiplying $p - 1$ with $q - 1$. And if you know $\phi(n)$ and $n$, you can calculate the factors $p$ and $q$ readily.
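To make the second direction concrete: since $\phi(n) = (p-1)(q-1) = n - (p + q) + 1$, knowing $n$ and $\phi(n)$ gives the sum $p + q$, and together with the product $p \times q = n$ the factors are the roots of a quadratic. The sketch below works this out; the function name is hypothetical and the numbers are the toy values $p = 197$, $q = 211$ used in this lecture.

```python
from math import isqrt

def factors_from_totient(n, phi):
    """Recover (p, q) from n = p*q and phi = (p-1)*(q-1)."""
    s = n - phi + 1                 # s = p + q
    disc = s * s - 4 * n            # (p + q)^2 - 4pq = (p - q)^2
    root = isqrt(disc)
    if root * root != disc:
        raise ValueError("phi is not the totient of a product of two primes")
    return (s - root) // 2, (s + root) // 2

print(factors_from_totient(41567, 41160))
```

For the toy modulus, $p + q = 408$ and $(p - q)^2 = 196$, so the roots are 197 and 211.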
For the sake of illustrating how you'd use RSA as a block cipher, let's try to design a 16-bit RSA cipher for block encryption of disk files. A 16-bit RSA cipher means that our modulus will span 16 bits. [Again, in the context of RSA, an N-bit cipher means that the modulus is of size N bits and NOT that the block size is N bits. This is contrary to the not-so-uncommon usage of the phrase "N-bit block cipher" to mean a cipher that encrypts N-bit blocks at a time as a plaintext source is scanned for encryption.]
With the modulus size set to 16 bits, we are faced with the important question of what to use for the size of bit blocks for
conversion into ciphertext as we scan a disk file. Since our message integer M must be smaller than the modulus n, obviously
our block size cannot equal the modulus size. This requires that
we use a smaller block size, say 8 bits, and use some sort of a
padding scheme to fill up the rest of the 8 bits. As it turns out,
padding is an extremely important part of RSA ciphers. In addition to the need for padding as explained here, padding is also
needed to make the cipher resistant to certain vulnerabilities that
are described in Section 12.7 of this lecture.
In the rest of the discussion in this section, we will assume for our
toy example that our modulus will span 16 bits, but the block
size will be smaller than 16 bits, say, only 8 bits. We will further
assume that, as a disk file is scanned 8 bits at a time, each such
bit block is padded on the left with zeros to make it 16 bits wide.
We will refer to this padded bit block as our message integer $M$.
So the issue now is how to find a prime suitable for our 8-bit representation. Following the prescription given in Section 12.3.1, we could fire up a random number generator, set its first two bits and the last bit, and then test the resulting number for its primality with the Miller-Rabin algorithm presented in Lecture 11. But we don't need to go to all that trouble for our toy example. Let's use the simpler approach described below.
bits of p :   11 - - - - - 1
bits of q :   11 - - - - - 1
set only the first bit. Now it is theoretically possible for the smallest values for $p$ and $q$ to be not much greater than $2^7$. So the product $p \times q$ could get to be as small as $2^{14}$, which obviously does not span the full 16-bit range desired for $n$. When you set the first two bits, now the smallest values for $p$ and $q$ will be lower-bounded by $2^7 + 2^6$. So the product $p \times q$ will be lower-bounded by $2^{14} + 2 \times 2^{13} + 2^{12}$, which itself is lower-bounded by $2 \times 2^{14} = 2^{15}$, which corresponds to the full 16-bit span. With regard to the setting of the last bit of $p$ and $q$, that is to ensure that $p$ and $q$ will be odd.]
So the question reduces to whether there exist two primes (hopefully different) whose decimal values exceed 193 but are less than 255. If you carry out a Google search with a string like "first 1000 primes", you will discover that there exist many candidates for such primes. Let's select the following two:
$p = 197$
$q = 211$
which gives us for the modulus $n = 197 \times 211 = 41567$. The bit patterns for the chosen $p$, $q$, and modulus $n$ are:
bits of p :   0xc5    =   1100 0101
bits of q :   0xd3    =   1101 0011
bits of n :   0xa25f  =   1010 0010 0101 1111
gcd( 17, 41160 )      |
= gcd( 41160, 17 )    |   residue 17  =  0 x 41160 + 1 x 17
= gcd( 17, 3 )        |   residue 3   =  1 x 41160 - 2421 x 17
= gcd( 3, 2 )         |   residue 2   =  -5 x 3 + 1 x 17
                      |               =  -5x(1 x 41160 - 2421 x 17) + 1 x 17
                      |               =  12106 x 17 - 5 x 41160
= gcd( 2, 1 )         |   residue 1   =  1x3 - 1 x 2
                      |               =  1x(41160 - 2421x17) - 1x(12106x17 - 5x41160)
                      |               =  6 x 41160 - 14527 x 17
                      |               =  6 x 41160 + 26633 x 17
where the last equality for the residue 1 uses the fact that the additive inverse of 14527 modulo 41160 is 26633. [If you don't like working out the multiplicative inverse by hand as shown above, you can use the Python script FindMI.py presented in Section 5.7 of Lecture 5. Another option would be to use the multiplicative_inverse() method of the BitVector class.]
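The hand calculation can be cross-checked with a few lines of Python. The following is a generic extended-Euclid sketch, not the FindMI.py script or the BitVector method mentioned above; the function name `mult_inverse` is a hypothetical one chosen for this illustration.

```python
def mult_inverse(a, m):
    """Return x with (a * x) % m == 1, or None when gcd(a, m) != 1."""
    old_r, r = a % m, m            # remainders of Euclid's algorithm
    old_x, x = 1, 0                # invariant: old_r == a * old_x (mod m)
    while r:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
    if old_r != 1:
        return None                # a and m are not coprime
    return old_x % m               # fold a negative coefficient into [0, m)

d = mult_inverse(17, 41160)
print(d, (17 * d) % 41160)
```

The final `% m` plays the same role as replacing $-14527$ by its additive inverse 26633 in the last line of the hand calculation.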
Our 16-bit block cipher based on RSA therefore has the following
numbers for n, e, and d:
$n = 41567$
$e = 17$
$d = 26633$
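With these three numbers the toy cipher can be exercised end to end. The sketch below assumes the scheme described earlier in this section: the input is scanned 8 bits at a time and each byte, padded on the left with zeros to 16 bits, serves as the message integer $M$. The function names are hypothetical, and the zero padding is for illustration only.

```python
n, e, d = 41567, 17, 26633

def encrypt(plaintext):
    """Encrypt a bytes object one byte at a time; returns a list of 16-bit integers."""
    return [pow(M, e, n) for M in plaintext]      # each byte is a value 0..255 < n

def decrypt(cipher_blocks):
    """Invert encrypt(): each ciphertext integer yields one plaintext byte."""
    return bytes(pow(C, d, n) for C in cipher_blocks)

blocks = encrypt(b"hello")
print([format(C, "04x") for C in blocks])
print(decrypt(blocks))
```

Note that equal plaintext bytes always produce equal ciphertext blocks here, which is one reason why a real padding scheme has to inject randomness.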
CRT).
Since the party doing the decryption knows the prime factors p and q of the modulus n, we can first carry out the easier
exponentiations:
$V_p = C^d \bmod p$
$V_q = C^d \bmod q$
$X_p = q \times (q^{-1} \bmod p)$
$X_q = p \times (p^{-1} \bmod q)$
$C^d \bmod n = (V_p X_p + V_q X_q) \bmod n$
Further speedup can be obtained by using Fermat's Little Theorem (presented in Section 11.2 of Lecture 11) that says that if $a$ and $p$ are coprimes then $a^{p-1} \equiv 1 \pmod{p}$.
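Writing $d = u \times (p - 1) + v$ with $v = d \bmod (p - 1)$, we get
$V_p = C^{u \times (p-1) + v} \bmod p = C^v \bmod p$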
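The CRT and FLT shortcuts can be combined in a few lines. This sketch uses the toy key of this lecture ($p = 197$, $q = 211$, $d = 26633$); the function name `decrypt_crt` is hypothetical, and `pow(x, -1, m)` for the modular inverse needs Python 3.8 or later.

```python
p, q = 197, 211
n, d = p * q, 26633

# CRT coefficients, computed once per key.
X_p = q * pow(q, -1, p)
X_q = p * pow(p, -1, q)

def decrypt_crt(C):
    """Compute C^d mod n from two half-size exponentiations."""
    V_p = pow(C, d % (p - 1), p)      # FLT: the exponent can be reduced mod (p - 1)
    V_q = pow(C, d % (q - 1), q)      # and likewise mod (q - 1)
    return (V_p * X_p + V_q * X_q) % n

C = pow(65, 17, n)                    # encrypt the message integer 65 with e = 17
print(decrypt_crt(C), pow(C, d, n))
```

Both printed values agree, and the shortcut only ever exponentiates with numbers about half the size of the modulus.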
When you use FLT in conjunction with CRT, you can calculate $C^d \pmod{n}$ in roughly a quarter of the time it takes otherwise. [First note, as stated earlier in Section 12.3.1, that both $p$ and $q$ are roughly half the size, in bits, of the modulus $n$. Since $V_p = C^d \pmod{p} = C^{d \bmod (p-1)} \pmod{p}$, and since $d$ is of the same size as $n$ while $d \bmod (p-1)$ is of the same size as $p$ (which is itself half the size of $n$), it should take no more than half the number of multiplications to calculate $V_p$ compared to the number of multiplications needed for calculating $C^d \pmod{n}$ directly. The same would be true for calculating $V_q$. As a result, the total number of multiplications required for both $V_p$ and $V_q$ would be the same as in the direct calculation of $C^d \pmod{n}$. Note, however, that the intermediate results in the modular exponentiation needed for $V_p$ would never exceed $p$ (and, likewise, would never exceed $q$ for $V_q$). Since integer multiplication takes time that is proportional to the square of the size of the bit fields involved, each multiplication involved in the calculation of $V_p$ and $V_q$ would take only one-quarter of the time it takes
$B = b_k b_{k-1} b_{k-2} \ldots b_0 \quad \text{(binary)}$
$B = \sum_{b_i \neq 0} 2^i$
$A^B = A^{\sum_{b_i \neq 0} 2^i} = \prod_{b_i \neq 0} A^{2^i}$
We could say that this form of $A^B$ halves the difficulty of computing $A^B$ because, assuming all the bits of $B$ are set, the largest value of $2^i$ will be roughly half the largest value of $B$.
$A^B \bmod n = \left( \prod_{b_i \neq 0} \left[ A^{2^i} \bmod n \right] \right) \bmod n$
Note that as we go from one bit position to the next higher bit
position, we square the previously computed power of A.
$A^{2^0}, \; A^{2^1}, \; A^{2^2}, \; A^{2^3}, \ldots$
As opposed to calculating each term from scratch, we can calculate each by squaring the previous value. We may express this idea in the following manner:
$A, \; A_{previous}^2, \; A_{previous}^2, \; A_{previous}^2, \ldots$
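The repeated-squaring idea translates into a short loop over the bits of the exponent. This is a sketch for illustration; the name `mod_exp` is hypothetical, and in practice Python's built-in three-argument `pow()` does the same job.

```python
def mod_exp(A, B, n):
    """Compute A^B mod n by scanning the bits of B from least significant upward."""
    result = 1 % n
    power = A % n                     # holds A^(2^i) mod n for the current bit i
    while B > 0:
        if B & 1:                     # bit b_i is set: include A^(2^i) in the product
            result = (result * power) % n
        power = (power * power) % n   # square the previous power for the next bit
        B >>= 1
    return result

print(mod_exp(7, 9633196, 9633197) == pow(7, 9633196, 9633197))
```

The number of multiplications grows with the number of bits in the exponent, not with the value of the exponent, and no intermediate result ever exceeds $n^2$.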
To see the dramatic speedup you get with modular exponentiation, try the following terminal session with Python
[ece404.12.d]$ => script
Script started on Mon 20 Feb 2012 10:23:32 PM EST
[ece404.12.d]$ => python
>>>
>>> print pow(7, 9633196, 9633197)
117649
>>>
>>>
>>>
>>> print (7 ** 9633196) % 9633197
117649
>>>
be able to decrypt it with its RSA private key. The client sends the encrypted session key to the server and,
The solution to this problem with RSA lies in somehow creating a secret session key without putting it on the wire. Naturally, your first reaction to this thought would be: "But that is impossible!" You are likely to add: "How can two sides share a secret without either mentioning it to the other?"
00 || BT || PS || 00 || D
<------------------ k bytes ------------------>
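The layout can be made concrete with a short sketch. Note the assumptions, which come from the PKCS#1 v1.5 specification rather than from the text above: for public-key encryption the block type BT is the byte 02, the padding string PS consists of at least 8 random nonzero bytes, and D is the data being protected. The function name `build_encryption_block` is hypothetical.

```python
import secrets

def build_encryption_block(data, k):
    """Return the k-byte block 00 || 02 || PS || 00 || D for the given data bytes."""
    if len(data) > k - 11:            # 3 framing bytes plus at least 8 bytes of PS
        raise ValueError("data too long for a k-byte block")
    ps_len = k - 3 - len(data)
    # PS: random bytes, none of them zero, so the 00 separator is unambiguous.
    ps = bytes(secrets.randbelow(255) + 1 for _ in range(ps_len))
    return b"\x00" + b"\x02" + ps + b"\x00" + data

eb = build_encryption_block(b"secret", 128)   # k = 128 bytes for a 1024-bit modulus
print(len(eb), eb[:2].hex(), eb[-7:])
```

The leading 00 byte guarantees that the block, read as an integer, is smaller than the modulus.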
With that brief introduction to how an encryption block is constructed in PKCS#1v1.5, let's get back to the subject of CCA.
The fact that RSA could be vulnerable to such attacks was first
discovered by George Davida in 1982.
PKCS#1v1.5.
Please review Section 10.8 of Lecture 10 to appreciate the significance of Low Entropy in the title of this section. [As explained there,
the entropy of a random number generator is at its highest if all numbers are equally likely to be produced
within the range of numbers that the output is designed for. For example, if a CSPRNG can produce 512-bit
random numbers with equal probability, its entropy is at its maximum and it equals 512 bits. However, should
the probabilities associated with the output random numbers be nonuniform, the entropy will be less than 512.
The greater the nonuniformity of this probability distribution, the smaller the entropy. The entropy is zero for
deterministic output.
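The three cases mentioned here (uniform, nonuniform, and deterministic output) can be illustrated with the Shannon entropy formula $H = -\sum_i p_i \log_2 p_i$. The sketch below uses a source with only a handful of outcomes so that the distributions can be written out; the function name is hypothetical.

```python
from math import log2

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

print(entropy_bits([1 / 256] * 256))        # uniform over all 8-bit outputs
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))   # nonuniform over four outputs
print(entropy_bits([1.0]))                  # deterministic output
```

A uniform source over $2^k$ outputs reaches the maximum of $k$ bits, and any skew in the probabilities pulls the entropy below that maximum.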
moduli.
To see why that is the case, let's say that $p$ is the common factor of the two moduli $N_1$ and $N_2$. Let's denote the other factor in $N_1$ by $q_1$ and in $N_2$ by $q_2$. You already know from Lecture 5 that Euclid's recursion makes the calculation of the GCD of any two numbers extremely fast. [Using Euclid's algorithm, the GCD of two 1024-bit integers on a routine desktop can be computed in just a few microseconds using the GNU Multiple Precision (GMP) library. More theoretically speaking, the computational complexity of Euclid's GCD algorithm is $O(n^2)$
In a truly landmark investigation by Nadia Heninger, Zakir Durumeric, Eric Wustrow, and J. Alex Halderman that was presented at the 2012 USENIX Security Symposium, the authors reported harvesting over 5 million TLS/SSL certificates and around 4 million RSA-based SSH host keys through scans that lasted no more than a couple of days. As you can see, these authors were able to harvest a very large number of RSA moduli in a rather short time. Subsequently they set out to find the factors that any of the moduli shared with any of the other moduli. [While the GCD of a pair of numbers can be computed very fast on a run-of-the-mill machine, it would still take a very long time to do pairwise computation for all the numbers in a set that contains a few million numbers. For further speedup, Heninger et al. used a method proposed by Daniel Bernstein. In this method, you start with calculating the product of all the moduli, multiplying two moduli at a time in what's called a product tree, and then reduce the product with respect to the pairwise products of the squares of the moduli in what's known as the remainder tree. This approach, applied to over 11 million RSA moduli from the TLS/SSL and SSH datasets, yielded the $p$ factors in under 6 hours on a multicore PC class machine with 32 GB of RAM.]
The title of the publication by Heninger et al. is "Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices."
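The arithmetic behind the batch computation can be sketched without the trees. This is a deliberately simplified illustration, not the implementation used by Heninger et al.: it forms the product of all moduli directly, reduces it modulo the square of each modulus, and takes one GCD per modulus. The moduli are small made-up numbers and the function name is hypothetical; `math.prod` needs Python 3.8 or later.

```python
from math import gcd, prod

def shared_factors(moduli):
    """For each modulus N, return gcd(N, product of all the other moduli)."""
    product = prod(moduli)
    result = []
    for N in moduli:
        remainder = product % (N * N)          # equals N * ((product / N) mod N)
        result.append(gcd(N, remainder // N))  # > 1 exactly when N shares a factor
    return result

# The first two toy moduli share the prime 197; the third shares nothing.
moduli = [197 * 211, 197 * 223, 227 * 229]
print(shared_factors(moduli))
```

Once the shared prime is known, dividing it out of each affected modulus immediately gives the other prime, and with it the private key.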
While the prescription stated above is followed for the most part by most computers of the sort we use every day, that's not necessarily the case for a large number of what are known as headless communication devices in the internet. By headless devices we mean routers, firewalls, server management cards, etc. As observed by Heninger et al., a very large number of such headless devices use software entropy sources for the random bytes they need as candidates for the prime numbers, and the most commonly used software entropy source is /dev/urandom, which supplies pseudorandom bytes through non-blocking reads.
attacker would try to figure out the totient $\phi(n)$ of the modulus $n$. But as stated earlier, knowing $\phi(n)$ is equivalent to knowing the factors $p$ and $q$. If an attacker can somehow figure out $\phi(n)$, the attacker will be able to set up the equation $(p - 1)(q - 1) = \phi(n)$, that, along with the equation $p \times q = n$, will allow the attacker to determine the values for $p$ and $q$.
Because of their importance in public-key cryptography, a number that is a product of two (not necessarily distinct) primes is
known as a semiprime. Such numbers are also called biprimes,
pq-numbers, and 2-almost primes. Currently the largest
known semiprime is
$(2^{30,402,457} - 1)^2$
This number has over 18 million digits. This is the square of the
largest known prime number.
Over the years, various mathematical techniques have been developed for solving the integer factorization problem involving
large numbers. A detailed presentation of integer factorization
is beyond the scope of this lecture. We will now briefly mention
some of the more prominent methods, the goal here being
merely to make the reader familiar with the existence
of the methods. For a full understanding of the mentioned
methods, the reader must look up other sources where the methods are discussed in much greater detail [Be aware that while the methods listed below can factorize large numbers, for very large numbers of the sort used these days in RSA cryptography, you have to custom design the algorithms for each attack. Customization generally consists of making various conjectures about the modulo properties of the factors and using the conjectures to speed up the search for the factors.]:
$n = [(a + b)/2]^2 - [(a - b)/2]^2$
This method works fast if $n$ has a factor close to its square root. In general, its complexity is $O(n)$. Fermat's method can be sped up by using trial division for candidate factors up to $\sqrt{n}$.
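A direct way to use the identity is to search for an integer $x \geq \sqrt{n}$ such that $x^2 - n$ is a perfect square $y^2$; the factors are then $x - y$ and $x + y$. The following sketch does exactly that, without the trial-division speedup; the function name is hypothetical.

```python
from math import isqrt

def fermat_factor(n):
    """Return (a, b) with a * b == n and a <= b, for a positive integer n."""
    if n % 2 == 0:
        return 2, n // 2              # the difference-of-squares search needs odd n
    x = isqrt(n)
    if x * x < n:
        x += 1                        # start at the ceiling of sqrt(n)
    while True:
        y_squared = x * x - n
        y = isqrt(y_squared)
        if y * y == y_squared:        # found n = x^2 - y^2 = (x - y)(x + y)
            return x - y, x + y
        x += 1

print(fermat_factor(41567))
```

For the toy modulus 41567 the very first candidate works: $204^2 - 41567 = 49 = 7^2$, which gives the factors 197 and 211. When the two factors are far apart, the search takes correspondingly longer.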
Sieve Based Methods: A sieve is a process of successively crossing out entries in a table of numbers according to a set of rules so that only some remain as candidates for whatever one is looking for. The oldest known sieve is the sieve of Eratosthenes for generating prime numbers. In order to find all the prime integers up to a number, you first write down
In the code shown at the end of this section, the simple procedure laid out above is called pollard_rho_simple(); its implementation is shown in lines (D1) through (D15) of the code. We start the calculation by choosing random numbers for $a$ and $b$, and computing $\gcd(a - b, n)$. Assuming
The above mentioned ever increasing number of gcd calculations for each iteration of the algorithm is avoided by what is the heart of the Pollard-$\rho$ algorithm. The candidate numbers are generated pseudorandomly using a function $f$ that maps a set to itself through the equivalence of the remainders modulo $n$. Let's express the sequence of numbers generated through such a function by $x_{i+1} = f(x_i) \bmod n$. Again assuming the yet unknown factor $d$ of $n$, suppose we discover a pair of indices $i$ and $j$, $i < j$, for this sequence such that $x_i \equiv x_j \pmod{d}$, then obviously $f(x_i) \equiv f(x_j) \pmod{d}$. This implies that each element of the sequence after $j$ will be congruent to each corresponding element of the sequence after $i$ modulo the unknown $d$.
Some parts of the implementation of the overall integer factorization algorithm shown below should already be familiar to you. The calculation of gcd in lines (B1) through (B4) is from Section 5.4.5 of Lecture 5. The Miller-Rabin based primality testing code in lines (C1) through (C22) is from Section 11.5.5 of Lecture 11.
#!/usr/bin/env python

##  Factorize.py
##  Author: Avi Kak
##  Date: February 26, 2011
##  Modified: February 25, 2012

import random
import sys

def gcd(a,b):                                                        #(B1)
    while b:                                                         #(B2)
        a, b = b, a%b                                                #(B3)
    return a                                                         #(B4)

def test_integer_for_prime(p):                                       #(C1)
    probes = [2,3,5,7,11,13,17]                                      #(C2)
    for a in probes:                                                 #(C3)
        if a == p: return 1                                          #(C4)
    if any([p % a == 0 for a in probes]): return 0                   #(C5)
    k, q = 0, p-1                                                    #(C6)
    while not q&1:                                                   #(C7)
        q >>= 1                                                      #(C8)
        k += 1                                                       #(C9)
    for a in probes:                                                 #(C10)
        a_raised_to_q = pow(a, q, p)                                 #(C11)
        if a_raised_to_q == 1 or a_raised_to_q == p-1: continue      #(C12)
        a_raised_to_jq = a_raised_to_q                               #(C13)
        primeflag = 0                                                #(C14)
        for j in range(k-1):                                         #(C15)
            a_raised_to_jq = pow(a_raised_to_jq, 2, p)               #(C16)
            if a_raised_to_jq == p-1:                                #(C17)
                primeflag = 1                                        #(C18)
                break                                                #(C19)
        if not primeflag: return 0                                   #(C20)
    probability_of_prime = 1 - 1.0/(4 ** len(probes))                #(C21)
    return probability_of_prime                                      #(C22)

def pollard_rho_simple(p):                                           #(D1)
    probes = [2,3,5,7,11,13,17]                                      #(D2)
    for a in probes:                                                 #(D3)
        if p%a == 0: return a                                        #(D4)
    d = 1                                                            #(D5)
    a = random.randint(2,p)                                          #(D6)
    random_num = []                                                  #(D7)
    random_num.append( a )                                           #(D8)
    while d==1:                                                      #(D9)
        b = random.randint(2,p)                                      #(D10)
        for a in random_num[:]:                                      #(D11)
            d = gcd( a-b, p )                                        #(D12)
            if d > 1: break                                          #(D13)
        random_num.append(b)                                         #(D14)
    return d                                                         #(D15)

def pollard_rho_strong(p):                                           #(E1)
    probes = [2,3,5,7,11,13,17]                                      #(E2)
    for a in probes:                                                 #(E3)
        if p%a == 0: return a                                        #(E4)
    d = 1                                                            #(E5)
    a = random.randint(2,p)                                          #(E6)
    c = random.randint(2,p)                                          #(E7)
    b = a                                                            #(E8)
    while d==1:                                                      #(E9)
        a = (a * a + c) % p                                          #(E10)
        b = (b * b + c) % p                                          #(E11)
        b = (b * b + c) % p                                          #(E12)
        d = gcd( a-b, p)                                             #(E13)
        if d > 1: break                                              #(E14)
    return d                                                         #(E15)

def factorize(n):                                                    #(F1)
    prime_factors = []                                               #(F2)
    factors = [n]                                                    #(F3)
    while len(factors) != 0:                                         #(F4)
        p = factors.pop()                                            #(F5)
        if test_integer_for_prime(p):                                #(F6)
            prime_factors.append(p)                                  #(F7)
            #print("Prime factors (intermediate result): ", prime_factors)   #(F8)
            continue                                                 #(F9)
#        d = pollard_rho_simple(p)                                   #(F10)
        d = pollard_rho_strong(p)                                    #(F11)
        if d == p:                                                   #(F12)
            factors.append(d)                                        #(F13)
        else:                                                        #(F14)
            factors.append(d)                                        #(F15)
            factors.append(p//d)      # integer division keeps the cofactor an int  #(F16)
    return prime_factors                                             #(F17)

if __name__ == '__main__':
    if len( sys.argv ) != 2:
        sys.exit( "Call syntax:  Factorize  number" )
    p = int( sys.argv[1] )
    factors = factorize(p)
    print("\nFactors of ", p, ":")
    for num in sorted(set(factors)):
        print("    ", num, "^", factors.count(num))
th
64
Factorize.py 18446744073709551617
RSA-XXX
where XXX stands for the number of bits needed for a binary representation of the number to be factored in the round of
challenges starting with RSA 576.
Name: RSA-576
Prize: $10000
Digits: 174
Digit Sum: 785
188198812920607963838697239461650439807163563379
417382700763356422988859715234665485319060606504
743045317388011303396716199692321205734031879550
656996221305168759307650257059
Name: RSA-896
Prize: $75000 (retracted)
Digits: 270
Digit Sum: 1222
41202343698665954385553136533257594817981169984
43279828454556264338764455652484261980988704231
61841879261420247188869492560931776375033421130
98239748515094490910691026986103186270411488086
69705649029036536588674337317208131041051908642
54793282601391257624033946373269391
Name: RSA-1024
Prize: $100000 (retracted)
Digits: 309
Digit Sum: 1369
135066410865995223349603216278805969938881475605
667027524485143851526510604859533833940287150571
909441798207282164471551373680419703964191743046
496589274256239341020864383202110372958725762358
509643110564073501508187510676594629205563685529
475213500852879416377328533906109750544334999811
150056977236890927563
Name: RSA-1536
Prize: $150000 (retracted)
Digits: 463
Digit Sum: 2153
184769970321174147430683562020016440301854933866
341017147178577491065169671116124985933768430543
574458561606154457179405222971773252466096064694
607124962372044202226975675668737842756238950876
467844093328515749657884341508847552829818672645
133986336493190808467199043187438128336350279547
028265329780293491615581188104984490831954500984
839377522725705257859194499387007369575568843693
381277961308923039256969525326162082367649031603
6551371447913932347169566988069
Name: RSA-2048
Prize: $200000 (retracted)
Digits: 617
Digit Sum: 2738
2519590847565789349402718324004839857142928212620
4032027777137836043662020707595556264018525880784
4069182906412495150821892985591491761845028084891
2007284499268739280728777673597141834727026189637
5014971824691165077613379859095700097330459748808
4284017974291006424586918171951187461215151726546
3228221686998754918242243363725908514186546204357
6798423387184774447920739934236584823824281198163
8150106748104516603773060562016196762561338441436
0383390441495263443219011465754445417842402092461
6515723350778707749817125772467962926386356373289
9121548314381678998850404453640235273819513786365
64391212010397122822120720357
The size of the key in the RSA algorithm typically refers to the size of the modulus integer in bits. In that sense, the phrase "key size" in the context of RSA is a bit of a misnomer. As you now know, the actual keys in RSA are the public key [n, e] and the private key [n, d]. In addition to depending on the size of the modulus, the key sizes obviously depend on the values chosen for $e$ and $d$.
201181538908178518569358459456544005330977672
121582110702985339908050754212664722269478671
818708715560809784221316449003773512418972397
715186575579269079705255036377155404327546356
26323200716344058408361871194193919999
There are 359 decimal digits in this very large integer. [It is trivial
to generate arbitrarily large integers in Python since the language places no limits on
the size of the integer. I generated the above number by simply setting a variable to a
random 256 character hex string by a statement like
num = 0x7fafdbff7fe0f9ff7.... 256 hex characters ...... ff7fffda5f
and then just calling print num.]
Doubling the size of the key will, in general, increase the time required for public key operations (as needed for encryption or signature verification) by a factor of four and increase the time taken by private key operations (decryption and signing) by a factor of 8. The reason public key operations are not as affected as the private key operations when you double the size of the key is that the public key exponent $e$ does not have to change as the key size increases. On the other hand, the private key exponent $d$ changes in direct proportion to the size of the modulus. The key generation time goes up by a factor of 16 as the size of the key (meaning the size of the modulus) is doubled. But key generation is a relatively infrequent operation. (Ref.: http://www.rsa.com/rsalabs)
The public and the private keys are stored in particular formats
specified by various protocols. For the public key, in addition to
storing the encryption exponent and the modulus, the key may
also include information such as the time period of validity,
the name of the algorithm used for key generation, etc. For
the private key, in addition to storing the decryption exponent
and the modulus, the key may include additional information
along the same lines as for the public key, and, additionally, the
corresponding public key also. Typically, the formats call for the
keys to be stored using Base64 encoding so that they can be
displayed using printable characters. (See Lecture 2 on Base64
encoding.) To see such keys, you could, for example, experiment with the following command:
ssh-keygen -t rsa
The public and the private keys returned by this call, when stored appropriately, will allow your laptop to establish SSH connections
rBDLAoGACEEjZnRkxKogIobZcmLZF1rJEUnpaezuXp5dWjh1CBUqjjfxGKeSR7VH
WCqx21GvA5ipwZp0HuCaWvWNQ/tdx14fTG4aES2/uurZBsOumzJZPJIC25shJLa+
TOCKIDY3afvDdVSktxwzLnCybM0WQZVTGX1k6sttR0HOswshX4A=
-----END RSA PRIVATE KEY-----
And here is an example of the public key that goes with the above
private key
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5amriY96HQS8Y/nKc8zu3zOylvp
On3vzMmWwrtyDy+aBvns4UC1RXoaD9rDKqNNMCBAQwWDsYwCAFsrBzbxRQONHeP
X8lRWgM87MseWGlu6WPzWGiJMclTAO9CTknplG9wlNzLQBj3dP1M895iLF6jvJ7
GR+V3CRU6UUbMmRvgPcsfv6ec9RRPm/B8ftUuQICL0jt4tKdPG45PBJUylHs71F
uE9FJNp01hrj1EMFObNTcsy9zuis0YPyzArTYSOUsGglleExAQYi7iLh17pAa+y
6fZrGLsptgqryuftN9Q4NqPuTiFjlqRowCDU7sSxKDgU7bzhshyVx3+pzXO4D2Q
== kak@pixie
] Shown below is a Python script that extracts the public exponent and the modulus stored in an SSH RSA public key. In line with the note in blue, the script first separates the three fields in the key by splitting it on white space. It then applies Base64 decoding to the middle field since that's where the public exponent and the modulus are stored. Subsequently, it scans the stream of decoded bytes for the <length,value> records under the assumption that the length of the value is always placed in the first four bytes of each record.
bytes hold the public exponent; and so on for extracting the modulus integer.
#!/usr/bin/env python

##  extract_sshpubkey_params.py
##  Author: Avi Kak
##  Date: February 11, 2013

import sys
import base64
import BitVector

if len(sys.argv) != 2:
    sys.stderr.write("Usage: %s  <public key file>\n" % sys.argv[0])
    sys.exit(1)

keydata = base64.b64decode(open(sys.argv[1]).read().split(None)[1])
bv = BitVector.BitVector( rawbytes = keydata )
parts = []
while bv.length() > 0:
    bv_length = int(bv[:32])                 # read 4 bytes for length of data
    data_bv = bv[32:32+bv_length*8]          # read the data
    parts.append(data_bv)
    bv.shift_left(32+bv_length*8)            # shift the starting BV and
    bv = bv[0:-32-bv_length*8]               #   and truncate its length
public_exponent = int(parts[1])
modulus = int(parts[2])
35
28992239265965680130833686108835390387986295644147105350109222053494471862488069515097328563379
83891022841669525585184878497657164390613162380624769814604174911672498450880421371197440983388
47257142771415372626026723527808024668042801683207069068148652181723508612356368518824921733281
43920627731421841448660007107587358412377023141585968920645470981284870961025863780564707807073
26000355974893593324676938927020360090167303189496460600023756410428250646775191158351910891625
48335568714591065003819759709855208965198762621002125196213207135126179267804883812905682728422
31250173298006999624238138047631459357691872217
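The same <length,value> scan can also be sketched with only the Python standard library, which may be convenient when the BitVector module is not installed. This is an alternative illustration, not the script above: the function names are hypothetical, and the key it parses is a small synthetic one built on the spot (exponent 35 and the toy modulus 41567), not the key shown in this section.

```python
import base64
import struct

def parse_ssh_rsa_public_key(line):
    """Return (public_exponent, modulus) from an 'ssh-rsa <base64> <comment>' line."""
    blob = base64.b64decode(line.split()[1])      # middle field holds the key data
    fields = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack(">I", blob[offset:offset + 4])   # 4-byte length
        offset += 4
        fields.append(blob[offset:offset + length])                # the value itself
        offset += length
    algorithm, exponent, modulus = fields
    return int.from_bytes(exponent, "big"), int.from_bytes(modulus, "big")

def length_prefixed(data):
    """Helper for the demo: prepend the 4-byte big-endian length to data."""
    return struct.pack(">I", len(data)) + data

# Build a synthetic key line to exercise the parser.
demo_blob = (length_prefixed(b"ssh-rsa")
             + length_prefixed((35).to_bytes(1, "big"))
             + length_prefixed((41567).to_bytes(3, "big")))
demo_line = "ssh-rsa " + base64.b64encode(demo_blob).decode() + " user@host"
print(parse_ssh_rsa_public_key(demo_line))
```

The modulus is stored with a leading zero byte here because its most significant bit is set, which keeps the value from being read as a negative number.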
The SSL/TLS public and private keys, as also the SSH RSA
private keys, are, on the other hand, stored using a more elaborate
procedure: The key information is first encoded using Abstract
Syntax Notation (ASN) according to the ASN.1 standard and the
resulting data structure DER-encoded into a byte stream. (DER
standards for Distinguished Encoding Rules its a part of
the ASN.1 standard.) Finally, the byte stream thus generated is
turned into a printable representation by Base64 encoding. [The
ASN.1 standard, along with one of its transfer encodings such as DER, accomplishes the same thing for complex
82
Lecture 12
data structures in a binary format that the XML standard does in a textual format. You can certainly convert
XML representations into binary formats, but the resulting encoding will, in general, be much longer than
those produced by ASN.1. Let's say you wish to represent all of your assets in a manner that would be directly
readable by different computing platforms and different programming languages. A record of your assets is
likely to consist of the names of the financial institutions and the value of the assets held by them, a listing of
your fixed assets, such as real estate properties and their worth, etc. In general, such data will require a tree
representation in which the various nodes may stand for the names of the financial institutions or the names
of the assets and the leaf nodes would consist of asset values. The values for some of the nodes
may be in the form of ordered lists, unordered lists (sets), key-value pairs, etc. ASN.1 creates compact byte
level representations for such structures that are portable across platforms and languages. Just to give you a
small taste of the flexibility of ASN.1 representation, it places no constraints on the size of any of the symbolic
entities or any of the numerical values. And to also give you a taste of the secret to the sauce, when ASN.1 is
used with BER (Basic Encoding Rules) encoding, each node of the tree is represented by three blocks of bytes:
(1) Identification block of an unlimited number of bytes; (2) Length block of an unlimited number of bytes; and
(3) Value block of an unlimited number of bytes. The important thing to note here is there are no constraints
on how many bytes are taken up by each of the three blocks. How does ASN.1 accomplish that? It's all done
by using high-end bytes to carry information about bytes further downstream. For example, if the length is to
be represented by a single byte, then the value of length must be less than 128. However, if the value of length
is 128 or greater, then the most significant bit of the first byte must be set to 1 and the remaining bits of that byte must tell
us how many of the following bytes are being used for storing the length information. Similar rules are used
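The rule for the length block described in the note above can be sketched in a few lines of Python 3. This is a minimal illustration under stated assumptions, not code from the lecture: it handles only the definite-length forms (a single byte for lengths below 128, and a count byte followed by the big-endian length otherwise) and ignores the indefinite-length form that BER also allows. The function names encode_length and decode_length are hypothetical.

```python
def encode_length(length):
    """Return the BER/DER definite-form length block for a non-negative length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length < 128:
        # Short form: one byte, most significant bit is 0.
        return bytes([length])
    # Long form: the first byte has its top bit set and its low 7 bits
    # count the bytes that follow; those bytes hold the length, big-endian.
    payload = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(payload)]) + payload

def decode_length(data):
    """Return (length, number_of_bytes_consumed) for a definite-form length block."""
    first = data[0]
    if first < 0x80:
        return first, 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite-length form is not handled in this sketch")
    return int.from_bytes(data[1:1 + count], "big"), 1 + count
```

For example, a length of 300 does not fit in the short form, so it is emitted as a count byte followed by two length bytes, and decode_length() reports that three bytes were consumed.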
To generate the private and public keys for the SSL/TLS protocol
you can use the OpenSSL library in the following manner:
openssl genrsa -out myprivate.pem 1024
openssl rsa -in myprivate.pem -pubout > mypublic.pem
where the first command creates a private key for a 1024 bit
modulus and the second then gives you the corresponding public
key. The private key will be deposited in the file myprivate.pem
and the public key in the file mypublic.pem.
If you want to see the modulus and the public exponent used in
the public key, you can execute
openssl rsa -pubin -inform PEM -text -noout < mypublic.pem
The rest of the fields are used in the modular exponentiation that
is carried out for decryption.
12.12: IN SUMMARY . . .
Assuming that you are using the best possible random number
generators to create candidates for the primes that are needed
and that you also use a recent version of the RSA scheme that
is resistant to the chosen ciphertext attacks, the security of RSA
encryption depends critically on the difficulty of factoring large
integers.
These days you are unlikely to use a key whose length (or, to speak
more precisely, a modulus whose size) is shorter than 1024 bits for RSA. Some
people recommend 2048 or even 4096 bit keys. The following
table vividly illustrates how the key sizes compare for symmetric-key cryptography and RSA-based public-key cryptography for the
same level of cryptographic security [Values taken from NIST Special Publication
800-57, Recommendation for Key Management, Part 1, by Elaine Barker et al.]:
Symmetric-key algorithm    Symmetric key size (bits)    Comparable RSA modulus size (bits)
2-Key 3DES                 112                          1024
3-Key 3DES                 168                          2048
AES-128                    128                          3072
AES-192                    192                          7680
AES-256                    256                          15360
As you'd expect, the computational overhead of RSA encryption/decryption goes up as the size of the modulus integer increases.
This makes RSA inappropriate for encryption/decryption of actual message content for high data-rate communication links.
However, RSA is ideal for the exchange of secret keys that can
subsequently be used for the more traditional (and much faster)
symmetric-key encryption and decryption of the message content.
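The idea of using RSA only to transport a secret key can be illustrated with a toy sketch in Python 3. Everything specific in it is an assumption made for illustration: the primes 61 and 53, the public exponent 17, and the function names wrap_key and unwrap_key do not come from the lecture, the modulus is far too small to be secure, and no padding scheme is applied. The call pow(e, -1, phi) requires Python 3.8 or later.

```python
import secrets

# Toy RSA parameters (assumed for illustration; hopelessly small for real use).
p, q = 61, 53
n = p * q                      # modulus
phi = (p - 1) * (q - 1)        # totient of the modulus
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (Python 3.8+)

def wrap_key(session_key, e, n):
    """Sender side: encrypt the symmetric session key with the public key."""
    return pow(session_key, e, n)

def unwrap_key(wrapped, d, n):
    """Receiver side: recover the session key with the private key."""
    return pow(wrapped, d, n)

# The session key must be an integer smaller than the modulus.
session_key = secrets.randbelow(n - 2) + 2
wrapped = wrap_key(session_key, e, n)
recovered = unwrap_key(wrapped, d, n)
```

Once both parties hold the same session key, the bulk of the message content would be protected with a symmetric cipher, which is where the speed advantage mentioned above comes from.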
3. From the public key, we know the modulus n and the encryption
integer e. If a bad guy could figure out the totient of the modulus,
would that amount to breaking the code?
M^e (mod n)
Since the enemy will know your public key, he will know that
what your business partner has sent you is C = M^e where e is the
public exponent that the enemy would know about. Assuming
for the sake of convenience that e = 3, why can't the enemy
decrypt the confidential message intended for you by just taking
the cube-root of C?
7. Programming Assignment:
To better understand the point made in Section 12.3.2 that a
small value, such as 3, for the encryption integer e is cryptographically unsafe, assume that a party A has sent the same
message M = 10 to three different recipients using the following
three public keys:
[29, 3]
[37, 3]
[41, 3]
In each public key, the first integer is the modulus n and the
second the encryption integer e. Now use the Chinese Remainder
Theorem of Section 11.7 in Lecture 11 to show how you can
reconstruct M^3, which in this case would be 1000, from the three
ciphertext values corresponding to the three public keys. [HINT:
If you are using Python, the ciphertext value in each case is returned by the built-in 3-argument function
pow(). For example, pow(M, 3, 29) will return the ciphertext integer C_1 for the first public key shown above.
For each public key, we have C_i = M^3 mod n_i where the three moduli are denoted n_1 = 29, n_2 = 37, and
n_3 = 41. Now to solve the problem, you can reason as follows: Since n_1, n_2, and n_3 are pairwise co-prime,
CRT allows us to reconstruct M^3 modulo N = n_1 n_2 n_3. This will require that you find N_i = N/n_i for
i = 1, 2, 3. And then you would need to find the multiplicative inverse of each N_i modulo its corresponding n_i.
Let N_i^{inv} denote this multiplicative inverse. You can use the Python multiplicative-inverse calculator shown
in Section 5.7 of Lecture 5 to calculate the N_i^{inv} values. Then, by CRT, you should be able to recover M^3 by
M^3 = (C_1 N_1 N_1^{inv} + C_2 N_2 N_2^{inv} + C_3 N_3 N_3^{inv}) mod N.]
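The reconstruction outlined in the hint for Problem 7 can be sketched in Python 3 as follows. This is one possible sketch under stated assumptions rather than a reference solution: the built-in call pow(x, -1, m), available in Python 3.8 and later, is used as a stand-in for the multiplicative-inverse script of Lecture 5, and the function names crt_reconstruct and integer_cube_root are hypothetical.

```python
def crt_reconstruct(residues, moduli):
    """Combine residues C_i modulo pairwise co-prime n_i into a value modulo N."""
    N = 1
    for n_i in moduli:
        N *= n_i
    total = 0
    for c_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        N_i_inv = pow(N_i, -1, n_i)     # stand-in for the Lecture 5 script
        total += c_i * N_i * N_i_inv
    return total % N

def integer_cube_root(x):
    """Largest integer r with r**3 <= x, found by binary search."""
    lo, hi = 0, x
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

M = 10
moduli = [29, 37, 41]
ciphertexts = [pow(M, 3, n_i) for n_i in moduli]   # C_i = M^3 mod n_i
M_cubed = crt_reconstruct(ciphertexts, moduli)
recovered_M = integer_cube_root(M_cubed)
```

The approach works here because M^3 = 1000 is smaller than N = 29 x 37 x 41 = 43993, so the value recovered modulo N is M^3 itself and an ordinary integer cube root yields M.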
8. Programming Assignment:
Using the PrimeGenerator class shown below and the multiplicative-inverse finding script presented earlier in Section 5.7 of
Lecture 5, write a Python script that would constitute a complete
implementation of a 64-bit RSA algorithm. The call syntax for
constructing an instance of the PrimeGenerator class and then
invoking findPrime() on the instance is shown at the end of
the script below in its main().
#!/usr/bin/env python

##  PrimeGenerator.py
##  Author: Avi Kak
##  Date: February 18, 2011

import random

class PrimeGenerator( object ):

    def __init__( self, **kwargs ):
        # Both keyword arguments are optional; a missing one defaults to None
        self.bits = kwargs.pop('bits', None)
        self.debug = kwargs.pop('debug', None)

    def set_initial_candidate(self):
        candidate = random.getrandbits( self.bits )
        if candidate & 1 == 0: candidate += 1      # make the candidate odd
        candidate |= (1 << self.bits-1)             # set the most significant bit
        candidate |= (2 << self.bits-3)             # set the second most significant bit
        self.candidate = candidate

    def set_probes(self):
        self.probes = [2,3,5,7,11,13,17]            # small primes used as probes
9. Programming Assignment: