Prime Factorisation: A New Approach
Rahulkrishnan C (B080118EC)
1. INTRODUCTION
Given an algorithm for integer factorization, one can factor any integer down to its constituent prime
factors by repeated application of this algorithm. The basic method of prime factorisation is known as
Fermat's method; all other methods are modifications of this elegant basic technique. The difficulty of
factorising large semi-prime numbers is the key to secure communication. Prime factorisation also
finds applications in the Fast Fourier Transform. Many cryptographic protocols are based on the
difficulty of factoring large composite integers or a related problem, the RSA problem (in
cryptography, RSA, which stands for Rivest, Shamir and Adleman, who first publicly described it, is
an algorithm for public-key cryptography). An algorithm which efficiently factors an arbitrary integer
would render RSA-based public-key cryptography insecure. To date, no algorithm has been published
that factors large composite numbers in polynomial time.
2. PRIME NUMBERS AND ASSOCIATED CONCEPTS
A natural number is called a prime number (or a prime) if it is bigger than one and has no
divisors other than 1 and itself. For example, 5 is prime, since no number except 1 and 5
divides it. On the other hand, 6 is not a prime (it is composite), since 6 = 2 × 3. The property
of being prime is called primality. There is no known useful formula that yields all of the
prime numbers and no composites. However, the distribution of primes, that is to say, the
statistical behaviour of primes in the large can be modelled. The first result in that direction
is the prime number theorem which says that the probability that a given, randomly chosen
number n is prime is inversely proportional to its number of digits, or the logarithm of n.
Therefore, the density of prime numbers within natural numbers is 0, but in a sense, primes
occur more often than squares of integers.
23244 = 2 · 2 · 3 · 13 · 149
      = 2² · 3 · 13 · 149
As in this example, the same prime factor may occur multiple times. A composite number n
can be factored into finitely many prime factors p₁, p₂, ..., p_t; this is called the prime
factorization of n. The fundamental theorem of arithmetic can be rephrased so as to say that
any factorization into primes is identical except for the order of the factors. So, although
there are many prime factorization algorithms to do this in practice for larger numbers, they
all have to yield the same result.
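As a concrete sketch of this, the simplest of those algorithms, trial division, is shown below. The function name is illustrative, and any other correct factorisation algorithm must return the same multiset of primes.

```python
def prime_factors(n):
    """Factor n (> 1) into primes by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:       # divide out d as many times as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                   # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factors(23244))     # [2, 2, 3, 13, 149], matching the example above
```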
Most early Greeks did not even consider 1 to be a number, so they clearly did not consider it a
prime. In the 19th century, however, many mathematicians did consider the number 1 a
prime. For example, Derrick Norman Lehmer's list of primes up to 10,006,721, reprinted as
late as 1956, started with 1 as its first prime. Henri Lebesgue is said to be the last
professional mathematician to call 1 prime. Although a large body of mathematical work is
also valid when calling 1 a prime, the above fundamental theorem of arithmetic does not
hold as stated. For example, the number 15 can be factored as 3 · 5 or 1 · 3 · 5. If 1 were
admitted as a prime, these two presentations would be considered different factorizations of
15 into prime numbers, so the statement of that theorem would have to be modified.
3. COMPOSITE NUMBERS
A composite number is a positive integer which has a positive divisor other than one or itself. In
other words a composite number is any positive integer greater than one that is not a prime
number. So, if n > 0 is an integer and there are integers 1 < a, b < n such that n = a × b, then n is
composite. By definition, every integer greater than one is either a prime number or a composite
number. The number one is a unit – it is neither prime nor composite. For example, the integer 14
is a composite number because it can be factored as 2 × 7. In contrast, the integers 2 and 3 are not
composite numbers because each of them can only be divided by one and itself.
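The definition above can be tested directly. The sketch below (illustrative name, not optimised) searches for a divisor a with 1 < a ≤ √n, which must exist whenever n = a × b with 1 < a, b < n.

```python
def is_composite(n):
    """n > 1 is composite exactly when some a with 1 < a <= sqrt(n) divides it,
    since n = a * b with 1 < a, b < n forces min(a, b) <= sqrt(n)."""
    a = 2
    while a * a <= n:
        if n % a == 0:
            return True         # n = a * (n // a)
        a += 1
    return False

print(is_composite(14), is_composite(2), is_composite(3))   # True False False
```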
One way to classify composite numbers is by counting the number of prime factors. A
composite number with two prime factors is a semiprime or 2-almost prime (the factors need
not be distinct; hence squares of primes are included). A composite number with three
distinct prime factors is a sphenic number. In some applications, it is necessary to
differentiate between composite numbers with an odd number of distinct prime factors and
those with an even number of distinct prime factors.
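As a rough illustration of this classification, the following sketch (illustrative names, trial division as in the earlier example) labels a number as a semiprime or a sphenic number from its prime factors.

```python
def classify_composite(n):
    """Classify n by its prime factors: semiprime, sphenic, or otherwise."""
    f = []
    d, m = 2, n
    while d * d <= m:                 # trial division
        while m % d == 0:
            f.append(d)
            m //= d
        d += 1
    if m > 1:
        f.append(m)
    if len(f) == 2:
        return "semiprime"            # e.g. 15 = 3 * 5, 49 = 7 * 7
    if len(f) == 3 and len(set(f)) == 3:
        return "sphenic"              # e.g. 30 = 2 * 3 * 5
    return "prime" if len(f) == 1 else "other composite"

print(classify_composite(15), classify_composite(30), classify_composite(14))
# semiprime sphenic semiprime
```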
4. THE PRIME NUMBER THEOREM
In number theory, the prime number theorem (PNT) describes the asymptotic distribution of the
prime numbers. The prime number theorem gives a general description of how the primes are
distributed amongst the positive integers. Informally speaking, the prime number theorem states
that if a random integer is selected near to some large integer N, the probability that the selected
integer is prime is about 1/ln(N), where ln(N) denotes the natural logarithm of N. For example, near
N = 1,000, about one in seven numbers is prime, whereas near N = 10,000,000,000, about one in
23 numbers is prime. In other words, the average gap between consecutive prime numbers near N
is roughly ln(N).
Let π(x) be the prime-counting function that gives the number of primes less than or equal to x,
for any real number x. For example, π(10) = 4 because there are four prime numbers (2, 3, 5 and
7) less than or equal to 10. The prime number theorem then states that the limit of the quotient of
the two functions π(x) and x / ln(x) as x approaches infinity is 1, which is expressed by the
formula

lim (x → ∞)  π(x) / (x / ln x) = 1.
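A quick numerical check of the theorem, as a minimal sketch using a basic sieve of Eratosthenes (prime_pi is an illustrative name, not a library function):

```python
from math import log

def prime_pi(x):
    """pi(x): count of primes <= x, via a basic sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)

# The ratio pi(x) / (x / ln x) slowly tends to 1 as x grows.
for x in (10, 1_000, 10_000_000):
    print(x, prime_pi(x), round(prime_pi(x) / (x / log(x)), 3))
```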
5. INTEGER FACTORISATION ALGORITHMS
When the numbers are very large, no efficient integer factorization algorithm is known. Not
all numbers of a given length are equally hard to factor. The hardest instances of these
problems (for currently known techniques) are semiprimes, the product of two prime
numbers. When they are both large, randomly chosen, and about the same size (but not too
close, e.g. to avoid efficient factorization by Fermat's factorization method), even the fastest
prime factorization algorithms on the fastest computers can take enough time to make the
search impractical. The execution time and memory requirements become impractically high when a
large number is subjected to any of the existing algorithms.
A general-purpose factoring algorithm's running time depends solely on the size of the
integer to be factored. This is the type of algorithm used to factor RSA numbers. Most
general-purpose factoring algorithms are based on the congruence of squares method.
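The congruence-of-squares idea can be illustrated with a brute-force search. This is a toy sketch only; practical algorithms such as the quadratic sieve and GNFS construct the congruence from many smooth relations instead of searching exhaustively.

```python
from math import gcd, isqrt

def congruence_of_squares(n):
    """Find x, y with x^2 ≡ y^2 (mod n) and x ≢ ±y (mod n);
    then gcd(x - y, n) and gcd(x + y, n) are non-trivial factors of n."""
    for x in range(isqrt(n) + 1, n):
        r = (x * x) % n
        y = isqrt(r)
        if y * y == r and (x - y) % n != 0 and (x + y) % n != 0:
            return gcd(x - y, n), gcd(x + y, n)
    return None

# 9^2 = 81 ≡ 4 = 2^2 (mod 77), so gcd(9 - 2, 77) = 7 and gcd(9 + 2, 77) = 11
print(congruence_of_squares(77))    # (7, 11)
```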
In number theory, the general number field sieve (GNFS) is the most efficient
classical algorithm known for factoring integers larger than 100 digits. It is a
generalization of the special number field sieve: while the latter can only factor
numbers of a certain special form, the general number field sieve can factor any
number apart from prime powers (which are trivial to factor by taking roots). When
the term number field sieve (NFS) is used without qualification, it refers to the
general number field sieve. The principle of the number field sieve (both special and
general) can be understood as an improvement to the simpler quadratic sieve. When
using such algorithms to factor a large number n, it is necessary to search for smooth
numbers (i.e. numbers with small prime factors) of order n^(1/2). The size of these values
is exponential in the size of n. The general number field sieve, on the
other hand, manages to search for smooth numbers that are subexponential in the size
of n. Since these numbers are smaller, they are more likely to be smooth than the
numbers inspected in previous algorithms. This is the key to the efficiency of the
number field sieve. In order to achieve this speed-up, the number field sieve has to
perform computations and factorizations in number fields. This results in many rather
complicated aspects of the algorithm, as compared to the simpler rational sieve.
Shor's algorithm, named after mathematician Peter Shor, is a quantum algorithm (an
algorithm which runs on a quantum computer) for integer factorization formulated in 1994.
Informally it solves the following problem: Given an integer N, find its prime factors. On a
quantum computer, to factor an integer N, Shor's algorithm runs in polynomial time (the time
taken is polynomial in log N, which is the size of the input).[1] Specifically, it takes time O((log
N)³), demonstrating that the integer factorization problem can be efficiently solved on a quantum
computer.
Given a quantum computer with a sufficient number of qubits, Shor's algorithm can be used to
break public-key cryptography schemes such as the widely used RSA scheme. RSA is based on
the assumption that factoring large numbers is computationally infeasible. So far as is known, this
assumption is valid for classical (non-quantum) computers; no classical algorithm is known that
can factor in polynomial time. However, Shor's algorithm shows that factoring is efficient on a
quantum computer, so an appropriately large quantum computer can break RSA. It was also a
powerful motivator for the design and construction of quantum computers and for the study of
new quantum computer algorithms. It has also facilitated research on new cryptosystems that are
secure from quantum computers, collectively called post-quantum cryptography.
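The classical part of Shor's algorithm can be sketched as follows. The order-finding step, which a quantum computer performs efficiently, is simulated here by brute force, so this routine is not polynomial time; names are illustrative.

```python
from math import gcd

def order(a, N):
    """Brute-force the multiplicative order of a modulo N.
    (This is the step a quantum computer performs efficiently in Shor's algorithm.)"""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def shor_classical_part(N, a):
    """Classical post-processing: from the order r of a mod N, derive a
    non-trivial factor of N when r is even and a^(r/2) != -1 (mod N)."""
    if gcd(a, N) != 1:
        return gcd(a, N)              # a already shares a factor with N
    r = order(a, N)
    if r % 2:
        return None                   # odd order: pick another a
    y = pow(a, r // 2, N)
    if y == N - 1:
        return None                   # a^(r/2) ≡ -1 (mod N): pick another a
    return gcd(y - 1, N)

print(shor_classical_part(15, 7))     # order of 7 mod 15 is 4, giving the factor 3
```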
6. DEVELOPING A FACTORISING ALGORITHM
In this section we try to develop a method for factorisation of numbers by combining some of the
existing methods. Then we attempt to optimise the method by reducing the number of calculations
required.
6.1.1. Assume N′ is not a perfect square. Check for division by 2, then by 5, and then by 3. Hence we
can write the number as

N = 2^(n2) · 5^(n5) · 3^(n3) × N′
6.1.2. Now we have a number N′ which ends in 1, 3, 7 or 9 and which is not a multiple of 2, 3 or 5.
6.1.3. Write N′ as a product of two factors, N′ = x × y, and define

a = (x + y)/2,   b = (x − y)/2

6.1.4. Then

a² − b² = ((x + y)/2)² − ((x − y)/2)² = x × y = N′
6.1.5. We know the value of N′; if we can find a value of 'a' such that

b² = a² − N′

is a perfect square, then N′ is factored. Since N′ is not a perfect square, the smallest candidate is

a = ⌊√N′⌋ + 1
This equation forms the lower limit of 'a'. To find the upper limit, recall that the factors 2, 3 and 5
have already been eliminated, so the smallest possible prime factor of N′ is 7. Taking the factor pair
x = N′/7 and y = 7 gives the largest value of a = (x + y)/2:

a = (N′/7 + 7)/2

From these we can specify the range of 'a' by defining the lower and upper limits as follows:

lower limit of a = ⌊√N′⌋ + 1
upper limit of a = (N′/7 + 7)/2

a ∈ [ ⌊√N′⌋ + 1, (N′/7 + 7)/2 ]
Every integer not divisible by 2 or 3 (in particular, every remaining candidate factor of N′) is of the
form 6n ± 1:

n    6n − 1    6n + 1
1      5         7
2     11        13
3     17        19
4     23        25
Another useful property is that a perfect square x², with x not a multiple of 10, can only be congruent
to one of the following residues modulo 20:

x² mod 20 ∈ {1, 4, 5, 9, 16}
With the help of this property we can narrow down the search for 'a'. The last three digits of N′ can
have the possibilities shown below:

last three digits of N′ ∈ (0-9) × (0-9) × {1, 3, 7, 9}
number of possibilities = 10 × 10 × 4 = 400
6.5.2. Each table cell contains two elements; the first represents the last three digits of N′.
6.5.3. A notation of (0-9)(odd)1 means the last three digits of N′ are of that form, e.g. N′ = 791, 251, 3611.
6.5.4. Therefore, in each of the cells the first row corresponds to the hundreds, tens and units digit
possibilities of N′.
6.5.5. The second row in each cell corresponds to the possible end digits of the 'a' values which, when
the N′ value of the given cell is subtracted from their squares, give rise to a perfect square.
6.5.7. Similarly, the term (odd)0 corresponds to values ending in 0 preceded by an odd digit, e.g. 230,
510, 10, 30, etc.
6.5.8. Thus, with the help of the look-up table given below, we can calculate the factors of N′ by
considering the values of 'a' which have the highest probability of producing a perfect square when
N′ is subtracted from a².
last three digits of N′ :  (0-9)(odd)7       (0,3,4,7,8)(even)7    (1,2,5,6,9)(even)7
possible end digits of a:  (0-9)9, (0-9)1    (odd)6, (even)4       (even)6, (odd)4
After identifying the set of values required to be checked within the range, substitute them in the
equation

b² = a² − N′

till a perfect square is achieved. This will give rise to a factorisation as shown below:

N′ = (a + b) × (a − b)
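A minimal sketch of the whole procedure of this section is given below. The names are illustrative, and the look-up-table filtering of the end digits of 'a' is omitted for brevity, so every candidate in the range is tried.

```python
from math import isqrt

def factor_fermat_style(N):
    """Strip factors of 2, 5 and 3, then search for 'a' with a^2 - N' a perfect
    square, giving N' = (a + b)(a - b), as in section 6."""
    factors = []
    for p in (2, 5, 3):                   # step 6.1.1: divide out 2, 5, 3
        while N % p == 0:
            factors.append(p)
            N //= p
    if N == 1:
        return factors
    r = isqrt(N)
    if r * r == N:                        # the method assumes N' is not a perfect square
        return factors + [r, r]
    a = r + 1                             # lower limit of 'a'
    a_max = (N // 7 + 7) // 2 + 1         # upper limit (smallest remaining factor is 7)
    while a <= a_max:
        b2 = a * a - N
        b = isqrt(b2)
        if b * b == b2:                   # a^2 - N' is a perfect square
            # (a - b) and (a + b) may themselves be composite;
            # repeat the procedure on them to obtain primes.
            return factors + [a - b, a + b]
        a += 1
    return factors + [N]                  # no split found: N' is prime

print(factor_fermat_style(23244))         # -> [2, 2, 3, 13, 149]
```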
7. BIG O NOTATION
In mathematics, big O notation is used to describe the limiting behavior of a function when the
argument tends towards a particular value or infinity, usually in terms of simpler functions. In
computer science, big O notation is used to classify algorithms by how they respond (e.g. in their
processing time or working space requirements) to changes in input size.
Big O notation characterizes functions according to their growth rates: different functions with the
same growth rate may be represented using the same O notation. A description of a function in
terms of big O notation usually only provides an upper bound on the growth rate of the function.
Associated with big O notation are several related notations, using the symbols o, Ω, ω, and Θ, to
describe other kinds of bounds on asymptotic growth rates.
Big O notation is also used in many other fields to provide similar estimates.
In typical usage, the formal definition of O notation is not used directly; rather, the O notation for
a function f(x) is derived by the following simplification rules:
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others
omitted.
If f(x) is a product of several factors, any constants (terms in the product that do not
depend on x) are omitted.
For example, let f(x) = 6x⁴ − 2x³ + 5, and suppose we wish to simplify this function, using O
notation, to describe its growth rate as x approaches infinity. This function is the sum of three
terms: 6x⁴, −2x³, and 5. Of these three terms, the one with the highest growth rate is the one with
the largest exponent as a function of x, namely 6x⁴. Now one may apply the second rule: 6x⁴ is a
product of 6 and x⁴ in which the first factor does not depend on x. Omitting this factor results in
the simplified form x⁴. Thus, we say that f(x) is a "big O" of x⁴, or mathematically we can write
f(x) = O(x⁴). One may confirm this calculation using the formal definition: let f(x) = 6x⁴ − 2x³ + 5
and g(x) = x⁴. Applying the formal definition from above, the statement that f(x) = O(x⁴) is
equivalent to its expansion,
|f(x)| ≤ M |g(x)|

for some suitable choice of x₀ and M and for all x > x₀. To prove this, let x₀ = 1 and M = 13. Then,
for all x > x₀:

|6x⁴ − 2x³ + 5| ≤ 6x⁴ + 2x³ + 5 ≤ 6x⁴ + 2x⁴ + 5x⁴ = 13x⁴

so

|6x⁴ − 2x³ + 5| ≤ 13 x⁴ = 13 |g(x)|.
8. TIME COMPLEXITY
In computer science, the time complexity of an algorithm quantifies the amount of time taken by an
algorithm to run as a function of the size of the input to the problem. The time complexity of an
algorithm is commonly expressed using big O notation, which suppresses multiplicative constants and
lower order terms. When expressed this way, the time complexity is said to be described
asymptotically, i.e., as the input size goes to infinity. For example, if the time required by an
algorithm on all inputs of size n is at most 5n³ + 3n, the asymptotic time complexity is O(n³).
Time complexity is commonly estimated by counting the number of elementary operations performed
by the algorithm, where an elementary operation takes a fixed amount of time to perform. Thus the
amount of time taken and the number of elementary operations performed by the algorithm differ by
at most a constant factor.
Since an algorithm may take a different amount of time even on inputs of the same size, the most
commonly used measure of time complexity, the worst-case time complexity of an algorithm, denoted
as T(n), is the maximum amount of time taken on any input of size n. Time complexities are classified
by the nature of the function T(n). For instance, an algorithm with T(n) = O(n) is called a linear time
algorithm, and an algorithm with T(n) = O(2ⁿ) is said to be an exponential time algorithm.
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is
bounded by a value that does not depend on the size of the input. For example, accessing any
single element in an array takes constant time as only one operation has to be performed to
locate it. However, finding the minimal value in an unordered array is not a constant time
operation as a scan over each element in the array is needed in order to determine the minimal
value. Hence it is a linear time operation, taking O(n) time. If the number of elements is
known in advance and does not change, however, such an algorithm can still be said to run in
constant time.
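A minimal illustration of the contrast between constant and linear time (illustrative functions):

```python
def first_element(values):
    """Constant time, O(1): a single indexing operation regardless of len(values)."""
    return values[0]

def minimum(values):
    """Linear time, O(n): every element must be examined once."""
    smallest = values[0]
    for v in values[1:]:
        if v < smallest:
            smallest = v
    return smallest
```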
Despite the name "constant time", the running time does not have to be independent of the
problem size, but an upper bound for the running time has to be bounded independently of the
problem size. For example, the task "exchange the values of a and b if necessary so that a≤b"
is called constant time even though the time may depend on whether or not it is already true
that a ≤ b. However, there is some constant t such that the time required is always at most t.
An algorithm is said to take logarithmic time if T(n) = O(log n). Due to the use of the binary
numeral system by computers, the logarithm is frequently base 2 (that is, log_2 n, sometimes
written lg n). However, by the change-of-base equation for logarithms, log_a n and log_b n differ
only by a constant multiplier, which in big-O notation is discarded; thus O(log n) is the
standard notation for logarithmic time algorithms regardless of the base of the logarithm.
Algorithms taking logarithmic time are commonly found in operations on binary trees or
when using binary search.
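A standard binary search, as a sketch of a logarithmic-time algorithm: the search interval is halved on every step, so at most O(log n) comparisons are made.

```python
def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or -1 if it is absent."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1          # discard the lower half
        else:
            hi = mid - 1          # discard the upper half
    return -1

print(binary_search([2, 3, 5, 7, 11, 13], 11))   # -> 4
```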
8.6 POLYLOGARITHMIC TIME
An algorithm is said to run in polylogarithmic time if T(n) = O((log n)^k), for some constant k.
For example, matrix chain ordering can be solved in polylogarithmic time on a Parallel
Random Access Machine.
The quicksort sorting algorithm on n integers performs at most An² operations for
some constant A. Thus it runs in time O(n²) and is a polynomial time algorithm.
All the basic arithmetic operations (addition, subtraction, multiplication, division, and
comparison) can be done in polynomial time.
Maximum matchings in graphs can be found in polynomial time.
9.1 The best published asymptotic running time is for the general number field sieve (GNFS)
algorithm, which, for a b-bit number n, is:

O( exp( ((64/9) · b)^(1/3) · (log b)^(2/3) ) )
9.2 For an ordinary computer, GNFS is the best published algorithm for large n (more than about
100 digits). For a quantum computer, however, Peter Shor discovered an algorithm in 1994
that solves it in polynomial time. This will have significant implications for cryptography if
a large quantum computer is ever built. Shor's algorithm takes only O(b³) time and O(b)
space on b-bit number inputs. In 2001, the first 7-qubit quantum computer became the first
to run Shor's algorithm. It factored the number 15.
9.3 The first very large distributed factorisation was RSA-129, a challenge number described in
the Scientific American article of 1977 which first popularised the RSA cryptosystem. It was
factorised between September 1993 and April 1994, using MPQS, with relations contributed
by about 600 people from all over the Internet, and the final stages of the calculation
performed on a MasPar supercomputer at Bell Labs.
9.4 Between January and August 1999, RSA-155, a challenge number prepared by the RSA
company, was factorised using GNFS with relations again contributed by a large group, and
the final stages of the calculation performed in just over nine days on the Cray C916
supercomputer at the SARA Amsterdam Academic Computer Center.
9.5 In January 2002, Franke et al. announced the factorisation of a 158-digit cofactor of 2⁹⁵³ + 1,
using a couple of months on about 25 PCs at the University of Bonn, with the final stages
done using a cluster of six Pentium-III PCs.
9.6 In April 2003, the same team factored RSA-160 using about a hundred CPUs at BSI, with
the final stages of the calculation done using 25 processors of an SGI Origin supercomputer.
9.7 The 174-digit RSA-576 was factored by Franke, Kleinjung and members of the NFSNET
collaboration in December 2003, using resources at BSI and the University of Bonn; soon
afterwards, Aoki, Kida, Shimoyama, Sonoda and Ueda announced that they had factored a
164-digit cofactor of 2¹⁸²⁶ + 1.
9.8 A 176-digit cofactor of 11²⁸¹ + 1 was factored by Aoki, Kida, Shimoyama and Ueda between
February and May 2005 using machines at NTT and Rikkyo University in Japan.
9.9 The RSA-200 challenge number was factored by Franke, Kleinjung et al. between December
2003 and May 2005, using a cluster of 80 Opteron processors at BSI in Germany; the
announcement was made on 9 May 2005.[2] They later (November 2005) factored the slightly
smaller RSA-640 challenge number.
9.10 On December 12, 2009, a team including researchers from the CWI, the EPFL,
INRIA and NTT in addition to the authors of the previous record factored RSA-768, a 232-
digit semiprime. They used the equivalent of almost 2000 years of computing on a single
core 2.2 GHz AMD Opteron.
10. APPLICATIONS OF FACTORISATION
10.1 CRYPTOGRAPHY
Cryptography is the practice and study of techniques for secure communication in the
presence of third parties (called adversaries). More generally, it is about constructing and
analyzing protocols that overcome the influence of adversaries and which are related to
various aspects in information security such as data confidentiality, data integrity, and
authentication. Modern cryptography intersects the disciplines of mathematics, computer
science, and electrical engineering. Applications of cryptography include ATM cards,
computer passwords, and electronic commerce.
Cryptology prior to the modern age was almost synonymous with encryption, the conversion
of information from a readable state to apparent nonsense. The sender retained the ability to
decrypt the information and therefore avoid unwanted persons being able to read it. Since
World War I and the advent of the computer, the methods used to carry out cryptology have
become increasingly complex and its application more widespread.
10.2 FAST FOURIER TRANSFORM
The prime-factor algorithm (PFA), also called the Good–Thomas algorithm (1958/1963),
is a fast Fourier transform (FFT) algorithm that re-expresses the discrete Fourier transform
(DFT) of a size N = N1N2 as a two-dimensional N1×N2 DFT, but only for the case where N1
and N2 are relatively prime. These smaller transforms of size N1 and N2 can then be evaluated
by applying PFA recursively or by using some other FFT algorithm.
PFA should not be confused with the mixed-radix generalization of the popular Cooley–
Tukey algorithm, which also subdivides a DFT of size N = N1N2 into smaller transforms of
size N1 and N2. The latter algorithm can use any factors (not necessarily relatively prime), but
it has the disadvantage that it also requires extra multiplications by roots of unity called
twiddle factors, in addition to the smaller transforms. On the other hand, PFA has the
disadvantages that it only works for relatively prime factors (e.g. it is useless for power-of-
two sizes) and that it requires a more complicated re-indexing of the data based on the
Chinese remainder theorem (CRT).
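The index mappings described above can be verified numerically. The sketch below is illustrative only; it relies on NumPy's FFT for the row and column transforms and on Python 3.8+ for modular inverses via pow, and checks the Good–Thomas re-indexing against a direct DFT.

```python
import numpy as np
from math import gcd

def pfa_dft(x, N1, N2):
    """Length-N DFT (N = N1*N2, gcd(N1, N2) = 1) computed as an N1 x N2
    two-dimensional DFT with no twiddle factors (Good-Thomas mapping)."""
    N = N1 * N2
    assert len(x) == N and gcd(N1, N2) == 1
    x = np.asarray(x, dtype=complex)
    # Input re-indexing: n = (n1*N2 + n2*N1) mod N
    n1, n2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")
    grid = x[(n1 * N2 + n2 * N1) % N]                 # shape (N1, N2)
    # Row and column DFTs (these could themselves be done by PFA recursively)
    K = np.fft.fft(np.fft.fft(grid, axis=0), axis=1)
    # Output re-indexing by the Chinese remainder theorem:
    # k ≡ k1 (mod N1) and k ≡ k2 (mod N2)
    k1, k2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")
    k = (k1 * N2 * pow(N2, -1, N1) + k2 * N1 * pow(N1, -1, N2)) % N
    X = np.empty(N, dtype=complex)
    X[k] = K
    return X

x = np.random.rand(15)                                # N = 3 * 5, relatively prime
print(np.allclose(pfa_dft(x, 3, 5), np.fft.fft(x)))   # True
```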
11. REFERENCES
11.1 Thomas, L. H. (1963). "Using a computer to solve problems in physics". Applications of
Digital Computers. Boston: Ginn.
11.2 Chan, S. C.; Ho, K. L. (1991). "On indexing the prime-factor fast Fourier transform
algorithm". IEEE Transactions on Circuits and Systems.
11.3 Cormen, T. H., et al. (2001). Introduction to Algorithms, second edition.
11.4 Knuth, D. (1976). "Big Omicron and big Omega and big Theta". ACM SIGACT News,
Volume 8, Issue 2.
11.5 Crandall, R.; Pomerance, C. (2005). Prime Numbers: A Computational Perspective,
second edition. New York: Springer-Verlag.
11.6 Hardy, G. H.; Wright, E. M.; Wiles, A., et al. (2008). An Introduction to the Theory of
Numbers, sixth edition. Oxford University Press.