Base 64 Report

The document discusses base64 encoding and its implementations. It begins with an introduction to base64 encoding and its use for securing communications over the internet. It then discusses the history of encryption and some common encryption techniques. It provides an introduction to base64 encoding and describes how it is used to encode binary data for storage and transfer over media designed for text. It discusses two implementations of base64 - PEM and MIME, describing their history and technical details.

Uploaded by

Periyadan Aswin
Copyright
© Attribution Non-Commercial (BY-NC)

1. INTRODUCTION
We live in a world of electronics and computers, and the internet and email are now the
most widely used communication media. What is the purpose of communication without
security? Communication over the internet needs security protocols, and different encoding
schemes are used worldwide for security.

In this seminar, an encoding scheme called Base64 and several of its
implementations, including PEM, are presented. Base64 is a generic term for any number of
similar encoding schemes that encode binary data by treating it numerically and translating it
into a base-64 representation. Base64 encoding schemes are commonly used when binary data
needs to be stored on, or transferred over, media that are designed
to deal with textual data, such as email. This
ensures that the data remains intact, without modification, during transport. Base64 is used
in a number of applications, including email via MIME and storing complex data
in XML. Besides being the default encoding for files sent
as attachments by Multipurpose Internet Mail Extensions (MIME), it is also
used in a number of other places.

PEM, UTF-7, OpenPGP and MIME are implementations of Base64; some of them, such as
PEM and OpenPGP, combine Base64 with different encryption schemes. PEM was the first
implementation to use Base64 for securing email.

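The advantage of Base64 is that it is a simple algorithm and can therefore be easily
implemented. Base64 is an encoding rather than encryption: on its own it keeps data intact
in transit, and schemes such as PEM and OpenPGP add encryption for security.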
2. ENCRYPTION

Encryption refers to algorithmic schemes that encode plain text into non-readable
form or cipher text, providing privacy. The receiver of the encrypted text uses a "key" to
decrypt the message, returning it to its original plain text form. The key is the trigger
mechanism for the algorithm. A cipher (or cypher) is a pair of
algorithms that create the encryption and the reversing decryption. The detailed operation of a
cipher is controlled both by the algorithm and in each instance by a key. This is a secret
parameter (ideally known only to the communicants) for a specific message exchange
context. Keys are important, as ciphers without variable keys can be trivially broken with
only the knowledge of the cipher used and are therefore useless (or even counter-productive)
for most purposes. Historically, ciphers were often used directly for encryption or decryption
without additional procedures such as authentication or integrity checks.

2.1. HISTORY OF ENCRYPTION

The earliest forms of secret writing required little more than local pen and paper
analogs, as most people could not read. More literacy, or literate opponents, required actual
cryptography. The main classical cipher types are transposition ciphers, which rearrange the
order of letters in a message (e.g., 'hello world' becomes 'ehlol owrdl' in a trivially simple
rearrangement scheme), and substitution ciphers, which systematically replace letters or
groups of letters with other letters or groups of letters (e.g., 'fly at once' becomes 'gmz bu
podf' by replacing each letter with the one following it in the Latin alphabet). Simple versions
of either offered little confidentiality from enterprising opponents, and still do. An early
substitution cipher was the Caesar cipher, in which each letter in the plaintext was replaced
by a letter some fixed number of positions further down the alphabet. It was named after
Julius Caesar who is reported to have used it, with a shift of 3, to communicate with his
generals during his military campaigns, just like Excess-3 code in Boolean algebra. There is
record of several early Hebrew ciphers as well. The earliest known use of cryptography is
some carved cipher text on stone in Egypt (ca 1900 BC), but this may have been done for the
amusement of literate observers. The next oldest is bakery recipes from Mesopotamia.
Until the advent of the Internet, encryption was rarely used by the public, but was
largely a military tool. The development of digital computers and electronics after WWII
made possible much more complex ciphers. Furthermore, computers allowed for the
encryption of any kind of data representable in any binary format, unlike classical ciphers
which only encrypted written language texts; this was new and significant. Today, with
online marketing, banking, healthcare and other services, even the average householder is
aware of encryption. Today the process of hiding information is collectively denoted by the
term cryptography. The term is derived from the Greek: 'kryptos' means hidden or secret, and
'graphein' means to write.

Modern cryptography intersects the disciplines of mathematics, computer science, and
engineering. Applications of cryptography include ATM cards, computer passwords, and
electronic commerce.
3. BASE 64: AN INTRODUCTION

Base64 is a generic term for any number of similar encoding schemes that encode
binary data by treating it numerically and translating it into a base 64 representation. The
Base64 term originates from a specific MIME content transfer encoding. Base64 encoding
schemes are commonly used when there is a need to encode binary data that needs be stored
and transferred over media that are designed to deal with textual data. This is to ensure that
the data remains intact without modification during transport. For this reason, Base64 encoding is
commonly used in email systems. The email systems that were developed back in the time
of the ARPANET were designed to support only letters (A-Z, a-z), numbers (0-9) and some
limited punctuation marks. So, in order to transfer files that can contain more than
letters and digits (e.g., a picture.jpg file), Base64 encoding is used.

Since its introduction, Base64 encoding has gained popularity very quickly.
Besides being the default encoding for files sent as
attachments by Multipurpose Internet Mail Extensions (MIME), it is
used in a number of other places. Base64 is commonly used in applications
including email via MIME, storing complex data in XML, and web servers
implementing HTTP Basic authentication.
4. HISTORY AND IMPLEMENTATIONS OF BASE64
4.1 PEM (PRIVACY ENHANCED MAIL)
Privacy Enhanced Mail (PEM) is an early IETF proposal for securing email using
public key cryptography. Although PEM became an IETF proposed standard it was never
widely deployed or used.
The first known standardized use of the encoding now called MIME Base64 was in
the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM
defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary
sequence of octets to a format that can be expressed in short lines of 6-bit characters, as
required by transfer protocols such as SMTP.
The current version of PEM (specified in RFC 1421) uses a 64-character alphabet
consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–
9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The
original specification, RFC 989, additionally used the "*" symbol to delimit encoded but
unencrypted data within the output stream.
To convert data to PEM printable encoding, the first byte is placed in the most
significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least
significant eight bits. If there are fewer than three bytes left to encode (or in total), the
remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant
first, as indices into the string:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
, and the indicated character is output.
The process is repeated on the remaining data until fewer than four octets remain. If
three octets remain, they are processed normally. If fewer than three octets (24 bits) are
remaining to encode, the input data is right-padded with zero bits to form an integral multiple
of six bits.
After encoding the non-padded data, if two octets of the 24-bit buffer are padded-
zeros, two "=" characters are appended to the output; if one octet of the 24-bit buffer is filled
with padded-zeros, one "=" character is appended. This signals the decoder that the zero bits
added due to padding should be excluded from the reconstructed data. This also guarantees
that the encoded output length is a multiple of 4 bytes.
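The buffer procedure described above can be sketched in a few lines of Python. This is only an illustration of the description in the text, not code taken from RFC 1421; the function name printable_encode is made up for this example.

```python
# The 64-character alphabet quoted in the text above.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def printable_encode(data: bytes) -> str:
    """Encode bytes through a 24-bit buffer (hypothetical helper name)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 octets of zero padding
        # The first byte lands in the most significant eight bits of the buffer.
        buffer = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Read the buffer six bits at a time, most significant first.
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Characters that hold only padding bits are replaced by "=".
            chars[4 - missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)


print(printable_encode(b"Hello World!"))  # SGVsbG8gV29ybGQh
print(printable_encode(b"blue"))          # Ymx1ZQ==
```

Because every group of up to three input bytes always produces four output characters, the result length is a multiple of 4, as stated above.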
PEM requires that all encoded lines consist of exactly 64 printable characters, with the
exception of the last line, which may contain fewer printable characters. Lines are delimited
by white space characters according to local (platform-specific) conventions.
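As a small illustration of the line rule, the following sketch splits an already encoded string into 64-character lines. The helper name wrap_pem_lines is an assumption made for this example, and the choice of line delimiter is left to the caller.

```python
def wrap_pem_lines(encoded: str, width: int = 64) -> list[str]:
    """Split an encoded string into lines of `width` characters; the last may be shorter."""
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]


lines = wrap_pem_lines("A" * 150)
print([len(line) for line in lines])  # [64, 64, 22]
```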

4.2 MIME
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends
the format of e-mail to support:
1. Text in character sets other than ASCII
2. Non-text attachments
3. Message bodies with multiple parts
4. Header information in non-ASCII character sets
MIME's use, however, has grown beyond describing the content of e-mail to describing
content type in general, including for the web (see Internet media type). Virtually all
human-written Internet e-mail and a fairly large proportion of automated e-mail is transmitted via
SMTP in MIME format. Internet e-mail is so closely associated with the SMTP and MIME
standards that it is sometimes called SMTP/MIME e-mail.
The content types defined by MIME standards are also of importance outside of e-
mail, such as in communication protocols like HTTP for the World Wide Web. HTTP
requires that data be transmitted in the context of e-mail-like messages, although the data
most often is not actually e-mail.
The MIME specification lists base64 as one of
two binary-to-text encoding schemes (the other being quoted-printable). MIME's Base64
encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character
alphabet and encoding mechanism as PEM, and uses the "=" symbol for output padding in the
same way, as described in RFC 1521.
MIME does not specify a fixed length for Base64-encoded lines, but it does specify a
maximum line length of 76 characters. Additionally it specifies that any extra-alphabetic
characters must be ignored by a compliant decoder, although most implementations use a
CR/LF newline pair to delimit encoded lines. Thus, the actual length of MIME-compliant
Base64-encoded binary data is usually about 137% of the original data length, though for
very short messages the overhead can be a lot higher because of the overhead of the headers.
Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original
data size + 814 bytes (for headers). In other words, you can approximate the size of the
decoded data with this formula: bytes = (string_length (encoded_string) - 814) / 1.37
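The 1.37 factor can be checked with a short calculation: four output characters for every three input bytes, plus a CR/LF pair after every 76 characters. The sketch below assumes every line is terminated by CR/LF; the 814-byte header allowance is taken from the text as given, and both function names are made up for this example.

```python
import math


def mime_body_length(n_bytes: int, line_len: int = 76) -> int:
    """Length of the Base64 body for n_bytes of input, with CR/LF after each line."""
    chars = 4 * math.ceil(n_bytes / 3)   # every 3 input bytes become 4 characters
    lines = math.ceil(chars / line_len)  # lines of at most 76 characters
    return chars + 2 * lines             # two bytes (CR and LF) per line


def estimated_original_size(encoded_length: int) -> float:
    """The approximation from the text; 814 is the header allowance it quotes."""
    return (encoded_length - 814) / 1.37


ratio = mime_body_length(1_000_000) / 1_000_000
print(round(ratio, 2))  # 1.37
```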

4.3 UTF 7
UTF-7 (7-bit Unicode Transformation Format) is a variable-length character encoding
that was proposed for representing Unicode text using a stream of ASCII characters, for
example for use in Internet E-mail messages. UTF-7 was first proposed as an experimental
protocol in RFC 1642, A Mail-Safe Transformation Format of Unicode.
Provided certain rules are followed during encoding, UTF-7 can be sent in e-mail
without using an underlying MIME transfer encoding, but still must be explicitly identified as
the text character set. In addition, if used within e-mail headers such as "Subject:", UTF-7
must be contained in MIME encoded words identifying the character set. Since encoded
words force use of either quoted-printable or base64, UTF-7 was designed to avoid using the
= sign as an escape character to avoid double escaping when it is combined with quoted-
printable (or its variant, the RFC 2047
Some characters can be represented directly as single ASCII bytes. The first group is
known as "direct characters" and contains all 62 alphanumeric characters and 9 symbols: '
( ) , - . / : ?. The direct characters are considered very safe to include literally. The other main
group, known as "optional direct characters", contains all other printable characters in the
range U+0020–U+007E except ~ \ + and space. Using the optional direct characters reduces
size and enhances human readability but also increases the chance of breakage by things like
badly designed mail gateways and may require extra escaping when used in encoded words
for header fields. Space, tab, carriage return and line feed may also be represented directly as
single ASCII bytes. However, if the encoded text is to be used in e-mail, care is needed to
ensure that these characters are used in ways that do not require further content transfer
encoding to be suitable for e-mail. The plus sign (+) may be encoded as +-.
To encode in UTF-7, an encoder must first decide which characters to represent
directly in ASCII form, which plus signs have to be escaped as +-, and which characters to
place in blocks of Unicode characters. A simple encoder may encode all characters it considers
safe for direct encoding directly. However, the cost of coming out of a Unicode block to represent a single
character and then going directly back in is 3 to 3⅔ bytes, which is more than the 2⅔ bytes
needed to represent such a character as part of a Unicode sequence. Each Unicode sequence
must be encoded using the following procedure, then surrounded by the appropriate
delimiters.

We will use the £† (U+00A3 U+2020) character sequence as an example


1. Express the character’s Unicode numbers (UTF-16) in Binary:
0x00A3 → 0000 0000 1010 0011
0x2020 → 0010 0000 0010 0000
2. Concatenate the binary sequences
0000 0000 1010 0011 and 0010 0000 0010 0000 → 0000 0000 1010 0011 0010 0000
0010 0000
3. Regroup the binary into groups of six bits, starting from the left:
0000 0000 1010 0011 0010 0000 0010 0000 → 000000 001010 001100 100000 001000
00
4. If the last group has less than six bits, add trailing zeros:
000000 001010 001100 100000 001000 00 → 000000 001010 001100 100000 001000
000000
5. Replace each group of six bits with a respective Base64 code:
000000 001010 001100 100000 001000 000000 → AKMgIA
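The five steps above can be reproduced with Python's standard library. This is a minimal sketch of the block body only (no delimiters), and utf7_block is a name invented for this example; it relies on the fact that Base64 over the big-endian UTF-16 bytes performs the same regrouping and zero filling, after which the "=" padding is dropped.

```python
import base64


def utf7_block(text: str) -> str:
    """Base64 body of one UTF-7 Unicode block, without the surrounding delimiters."""
    raw = text.encode("utf-16-be")                   # 16-bit numbers, concatenated
    encoded = base64.b64encode(raw).decode("ascii")  # six-bit groups, zero-filled
    return encoded.rstrip("=")                       # UTF-7 blocks carry no "=" padding


print(utf7_block("\u00a3\u2020"))  # AKMgIA
```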
For decoding, the encoded data must first be separated into plain ASCII text chunks
(including plus signs followed by a dash) and nonempty Unicode blocks, as mentioned in the
description section. Once this is done, each Unicode block must be decoded with the
following procedure (using the result of the encoding example above):
1. Express each Base64 code as the bit sequence it represents:
AKMgIA → 000000 001010 001100 100000 001000 000000
2. Regroup the binary into groups of sixteen bits, starting from the left:
000000 001010 001100 100000 001000 000000 → 0000000010100011
0010000000100000 0000
3. If there is an incomplete group at the end, discard it (If the incomplete group contains
more than four bits or contains any ones, the code is invalid):
0000000010100011 0010000000100000
4. Each group of 16 bits is a character's Unicode (UTF-16) number and can be expressed in
other forms:
0000 0000 1010 0011 ≡ 0x00A3 ≡ 163 (decimal)
0010 0000 0010 0000 ≡ 0x2020 ≡ 8224 (decimal)
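The decoding procedure can be sketched the same way. The function name decode_utf7_block is hypothetical, and the sketch handles only the body of a single Unicode block, not the splitting of a message into ASCII chunks and blocks.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_utf7_block(block: str) -> str:
    """Decode the Base64 body of one UTF-7 Unicode block (hypothetical helper)."""
    # Each Base64 code becomes the six bits it represents.
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in block)
    # Regroup into 16-bit units; check the incomplete tail, then discard it.
    usable = len(bits) - len(bits) % 16
    tail = bits[usable:]
    if len(tail) > 4 or "1" in tail:
        raise ValueError("invalid UTF-7 block")
    # Each 16-bit group is one UTF-16 code unit.
    units = [int(bits[i:i + 16], 2) for i in range(0, usable, 16)]
    raw = b"".join(unit.to_bytes(2, "big") for unit in units)
    return raw.decode("utf-16-be")


print([hex(ord(ch)) for ch in decode_utf7_block("AKMgIA")])  # ['0xa3', '0x2020']
```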

4.4 OPEN PGP


OpenPGP is a non-proprietary protocol for encrypting email using public key
cryptography. It is based on PGP as originally developed by Phil Zimmermann. The
OpenPGP protocol defines standard formats for encrypted messages, signatures, and
certificates for exchanging public keys. Beginning in 1997, the OpenPGP Working Group
was formed in the Internet Engineering Task Force (IETF) to define this standard. Over the
past decade, PGP, and later OpenPGP, has become the standard for nearly all of the world's
encrypted email.
PGP encryption uses a serial combination of hashing, data compression, symmetric-
key cryptography, and, finally, public-key cryptography; each step uses one of several
supported algorithms. Each public key is bound to a user name and/or an e-mail address. PGP
supports message authentication and integrity checking. The latter is used to detect whether a
message has been altered since it was completed (the message integrity property), and the
former to determine whether it was actually sent by the person/entity claimed to be the sender
(a digital signature). In PGP, these are used by default in conjunction with encryption, but can
be applied to plaintext as well. The sender uses PGP to create a digital signature for the
message with either the RSA or DSA signature algorithms. To do so, PGP computes a hash
(also called a message digest) from the plaintext, and then creates the digital signature from
that hash using the sender's private keys.
Both when encrypting messages and when verifying signatures, it is critical that the
public key used to send messages to someone or some entity actually does 'belong' to the
intended recipient. Simply downloading a public key from somewhere is not overwhelming
assurance of that association; deliberate (or accidental) impersonation is possible. PGP has,
from its first versions, always included provisions for distributing a user's public keys in an
'identity certificate' which is so constructed cryptographically that any tampering (or
accidental garble) is readily detectable. But merely making a certificate which is impossible
to modify without being detected effectively is also insufficient. It can prevent corruption
only after the certificate has been created, not before. Users must also ensure by some means
that the public key in a certificate actually does belong to the person/entity claiming it. From
its first release, PGP products have included an internal certificate 'vetting scheme' to assist
with this; a trust model which has been called a web of trust.
Because of PGP encryption's importance worldwide (it is thought to be the most
widely chosen quality cryptographic system), many wanted to write their own software that
would interoperate with PGP. Zimmermann became convinced that an open standard for PGP
encryption was critical for them and for the cryptographic community as a whole. In July
1997, PGP Inc. proposed to the IETF that there be a standard called OpenPGP. They gave the
IETF permission to use the name OpenPGP to describe this new standard as well as any
program that supported the standard. The IETF accepted the proposal and started the
OpenPGP Working Group. The Free Software Foundation has developed its own OpenPGP-
compliant program called GNU Privacy Guard (abbreviated GnuPG or GPG). GnuPG is
freely available together with all source code under the GNU General Public License (GPL)
and is maintained separately from several Graphical User Interfaces (GUIs) that interact with
the GnuPG library for encryption, decryption and signing functions. While originally used
primarily for encrypting the contents of e-mail messages and attachments from a desktop
client, PGP products have been diversified since 2002 into a set of encryption applications
which can be managed by an optional central policy server. PGP encryption applications
include e-mail and attachments, digital signatures, laptop full disk encryption, file and folder
security, protection for IM sessions, batch file transfer encryption, and protection for files and
folders stored on network servers. There is also a Wordpress plugin available, called wp-
enigform-authentication, that takes advantage of the session management features of Open
PGP.
5. BASE 64
Base64 is a different way of interpreting bits of data in order to transmit that data over
a text-only medium, such as the body of an e-mail. In the 8-bit extended ASCII character set,
there are 256 characters. However, only a fraction of these
characters are actually printable and readable when you are looking at them onscreen, or
sending them in an e-mail. We need a way to convert unreadable characters into readable
characters, do something with them (e.g. send them in an e-mail), and convert them back to
their original format.

How do you convert unreadable, nonprintable characters into readable, printable
characters? There are many ways to do this, but the way we are covering now is
base64 encoding. The 256 characters in the 8-bit character set are numbered 0 through 255.
For the tech savvy, this is the same as 2^8 values, 8 binary placeholders, or a byte. So for any such
character, you simply need one byte to represent it. As far as a computer is concerned,
there is no difference between an ASCII character and a number between 0 and 255 (which
is a string of 8 binary placeholders), only how it is interpreted. Because we are now detached
from ASCII characters, you can also apply these same techniques to binary data, for example,
a picture or an executable file. All you are doing is interpreting data one byte at a time.
The problem with representing data one byte at a time in a readable manner is that
there are not 256 readable characters in the ASCII character set, so we cannot print a
character for each of the 256 combinations that a byte can offer. So we need to take a
different approach to looking at the bits in a byte. What if, instead of looking at a whole
byte, we looked at half of a byte, or 4 bits (also known as a nibble), at a time? This would be
entirely possible because 2^4 is equal to 16, and there are certainly sixteen readable characters
that we could use to represent each variation of a nibble. This is hexadecimal (hex) encoding.
The problem with using hex, is that since you are using one ASCII character (which
is, remember, one byte long in storage space) to represent every four bits, anything you
translate into hex will be exactly twice as big as the original data. This might not seem like a
problem for a small message, but imagine you are trying to send an image or executable. The
original size of perhaps a megabyte or more is now doubled. Sending this over email or a
slow Internet connection will take twice as long.
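The doubling is easy to see with Python's built-in hex conversion. The sample string is only an illustration.

```python
data = b"Hello World!"
hex_text = data.hex()            # two hex characters per byte, one per nibble

print(hex_text)                  # 48656c6c6f20576f726c6421
print(len(data), len(hex_text))  # 12 24
```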

5.1 BASE64 AS AN ALTERNATIVE

We now know that using 16 different characters to represent each half byte is a viable
option, but not our ideal option because it is only half as space efficient as a byte. So how
else can we dice bytes up to get our goal: readable characters for any value of 0 to 255?
Instead of looking at one byte at a time, and trying to chop that byte up, take several bytes
and see what we can do with them.

Table 5.1
As you can easily see, using three bytes, we have a total of 24 bits. How else can we
chop 24 bits up? If instead of 3 bytes of 8 bits each we use 4 "clumps" of 6 bits each, what
are we left with? Each clump can take 2^6 = 64 values. So now instead of needing 3 instances of
a character that can represent any of 256 different combinations, we now need just 4
instances of a character that can represent any of 64 different combinations. The same bits as
in the above table fit into the table below.
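The regrouping can be illustrated on any three bytes. The sample value b"abc" below is an arbitrary choice for this sketch and is not taken from the tables.

```python
data = b"abc"                                          # any three bytes will do
bits = "".join(format(byte, "08b") for byte in data)  # 24 bits in total

octets = [bits[i:i + 8] for i in range(0, 24, 8)]     # 3 groups, 256 values each
sextets = [bits[i:i + 6] for i in range(0, 24, 6)]    # 4 groups, 64 values each

print(octets)   # ['01100001', '01100010', '01100011']
print(sextets)  # ['011000', '010110', '001001', '100011']
```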

Table 5.2

5.2 BASE 64 ALPHABETS


The particular choice of character set selected for the 64 characters required for the
base varies between implementations. The general rule is to choose a set of 64 characters that
is both part of a subset common to most encodings, and also printable. This combination
leaves the data unlikely to be modified in transit through information systems, such as email,
that were traditionally not 8-bit clean. For example, MIME's Base64 implementation uses
uppercase A-Z (26 characters), lowercase a-z (26 characters), 0-9 (10 characters), '+' (1
character) and '/' (1 character). 26 + 26 + 10 + 1 + 1 = 64, just the number we need. As you
can surmise, base64 is still less space efficient than using a full byte, but instead of hex's double
space usage, base64 uses only one and a third times as much space. In other words, for every 3
bytes, you must have 4 base64 characters. All of the characters listed above are easily
readable. Other variations, usually derived from Base64, share this property but differ in the
symbols chosen for the last two values.

Table 5.3
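As an illustration, the MIME alphabet described in this section can be built from the standard library's character constants, and the value-to-character mapping printed as a small table. The layout of eight entries per row is an arbitrary choice for this sketch.

```python
import string

# MIME ordering: A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Print the value-to-character table, eight entries per row.
for row in range(0, 64, 8):
    print("  ".join(f"{value:2d}={ALPHABET[value]}" for value in range(row, row + 8)))

print(len(ALPHABET))  # 64
```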
6. ENCODING INTO BASE 64
6.1 BASE 64 ENCODING ALGORITHM

The Base64 encoding process is to:

1. Divide the input bytes stream into blocks of 3 bytes.


2. Divide 24 bits of each 3-byte block into 4 groups of 6 bits.
3. Map each group of 6 bits to 1 printable character, based on the 6-bit value using the
Base64 character set map.
4. If the last 3-byte block has only 1 byte of input data, pad it with 2 zero bytes (\x00\x00).
After encoding it as a normal block, replace the last 2 characters with 2 equal signs
(==), so the decoding process knows 2 bytes of zero were padded.
5. If the last 3-byte block has only 2 bytes of input data, pad it with 1 zero byte (\x00). After
encoding it as a normal block, replace the last character with 1 equal sign (=), so
the decoding process knows 1 byte of zero was padded.
6. Carriage return (\r) and new line (\n) characters may be inserted into the output character
stream to break it into lines. They are ignored by the decoding process.
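The six steps can be put together in a short Python sketch. It is written for readability rather than speed (it works on strings of bits), the function name base64_encode is invented for this example, and the default line length of 76 is the MIME limit mentioned earlier, assumed here for illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data: bytes, line_len: int = 76) -> str:
    """Follow steps 1 to 6 above; line_len=76 is an assumed default."""
    chars = []
    for start in range(0, len(data), 3):                   # step 1: blocks of 3 bytes
        block = data[start:start + 3]
        pad = 3 - len(block)
        block += b"\x00" * pad                             # steps 4-5: pad with zero bytes
        bits = "".join(format(byte, "08b") for byte in block)
        groups = [bits[i:i + 6] for i in range(0, 24, 6)]  # step 2: 4 groups of 6 bits
        quad = [ALPHABET[int(group, 2)] for group in groups]  # step 3: map to characters
        if pad:
            quad[-pad:] = "=" * pad                        # steps 4-5: replace with "="
        chars.extend(quad)
    text = "".join(chars)
    lines = [text[i:i + line_len] for i in range(0, len(text), line_len)]
    return "\r\n".join(lines)                              # step 6: CR LF between lines


print(base64_encode(b"Hello World!"))  # SGVsbG8gV29ybGQh
```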

6.2 EXAMPLE

Let's start with something simple, a text-to-base64 conversion. We will convert the
string "Hello World!" to a base64 representation. We will start by getting the ASCII and
binary byte values for each letter.
Table 6.1

For base64, we will be using three bytes at a time. Each ASCII character is one byte, so we
will be working with "Hel", "lo[space]", "Wor", and "ld!" separately. Let's start with the first
three characters:

1. Convert the characters to binary.
"Hel" is 01001000 01100101 01101100 in binary. (Notice that there are 24 bits.)

2. Convert the 24 bits from three 8-bit groups to four 6-bit groups.
01001000 01100101 01101100 becomes 010010 000110 010101 101100.

3. Convert each of the four 6-bit groups into decimal.

010010 = 18

000110 = 6

010101 = 21

101100 = 44

4. Use each of the four decimals to look up the base64 character code.

18 = 'S'

6 = 'G'

21 = 'V'

44 = 's'

5. You now have your first three ASCII characters ("Hel") encoded as base64
("SGVs").
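The lookup just carried out by hand can be repeated in Python for any group of three ASCII characters. This is a sketch for checking the arithmetic; encode_triplet is a made-up helper name.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_triplet(three: str) -> tuple[list[int], str]:
    """Return the four 6-bit values and the base64 text for three ASCII characters."""
    bits = "".join(format(ord(ch), "08b") for ch in three)     # 24 bits
    values = [int(bits[i:i + 6], 2) for i in range(0, 24, 6)]  # four decimal values
    return values, "".join(ALPHABET[value] for value in values)


print(encode_triplet("Hel"))  # ([18, 6, 21, 44], 'SGVs')
```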

Follow these steps for the next 9 ASCII characters and you get the following results:

"Hel" = SGVs

"lo[space]" = bG8g

"Wor" = V29y

"ld!" = bGQh

The phrase "Hello World!" has been converted to "SGVsbG8gV29ybGQh". The
original phrase has exactly 12 ASCII characters, and is represented by 16 base64 characters,
exactly one and one third times the length of the original text. So what happens, you might ask, if you
don't have exact sets of three bytes? What if you had remainder bytes left over? For example,
what if the data you had was "Hello" (5 ASCII characters, 5 bytes)? What if it was "blue" (4
ASCII characters, 4 bytes)? In those cases, you have groups of fewer than three letters: "Hello"
groups into "Hel" "lo", and "blue" groups into "blu" "e". To handle these cases, we add one
more readable character to our base64 character list. This character is not in the lookup
table because it is reserved for the two cases where you have one or two remainder bytes
after grouping. We use the "=" character. Let's start with "Hello".

Table 6.2
Follow the same exact steps for the first three characters as above. Your first three ASCII
characters "Hel" are the same base64 as before "SGVs". For the remaining 2 characters,
follow these steps:

1. Convert the characters into binary.

"lo" is 01101100 01101111 in binary.

2. Starting from the left, separate the bytes into 6-bit chunks as best as possible.

01101100 01101111 becomes 011011 000110 1111.

As you can see, we still need two more bits for the last group, plus a whole other six
bits for the full four base64 characters. What we need is something looking like 011011
000110 1111xx xxxxxx. We can convert 011011 and 000110 to decimal just fine.

011011 = 27

000110 = 6

1111xx = what?

xxxxxx = what?

To resolve this problem, we fill the last two bits of 1111xx with 0's, so 111100 = 60.
We now have:

011011 = 27

000110 = 6

111100 = 60

xxxxxx = what?

Our base64 characters so far are "bG8". Since we are missing one complete base64
character, we add one of our special "=" characters to the end to signify that we are missing
one byte. Our complete converted base64 string is now "bG8=". So the word "Hello"
translates to "SGVsbG8=". We do the same thing for the word "blue", which is missing 2
bytes.
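The two-byte remainder case can be sketched as follows. The helper name encode_two_bytes is hypothetical, and it handles only a final group of exactly two bytes.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_two_bytes(pair: bytes) -> str:
    """Encode a final group of exactly two bytes as three characters plus one "="."""
    bits = "".join(format(byte, "08b") for byte in pair)       # 16 bits
    bits += "00"                                               # fill up to 18 bits
    values = [int(bits[i:i + 6], 2) for i in range(0, 18, 6)]  # three 6-bit values
    return "".join(ALPHABET[value] for value in values) + "="


print(encode_two_bytes(b"lo"))           # bG8=
print("SGVs" + encode_two_bytes(b"lo"))  # SGVsbG8=
```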

Table 6.3

The first three characters should be easy by now to convert. "blu" is 01100010
01101100 01110101. Translate that to 6 bit groups and you get 011000 100110 110001
110101. These convert to "Ymx1" in base64. Now you have one remaining character, "e".
We do the exact same thing as last time. "e" in binary is 01100101. When you split that into
four 6 bit groups, you get the following:

011001 = 25

01xxxx = what?

xxxxxx = what?

xxxxxx = what?

Fill the second group with 0's to be able to look it up. 011001 010000 xxxxxx xxxxxx
becomes "ZQ". Because you are missing two complete bytes, add two of our special
characters to the end. So the letter "e" in ASCII becomes "ZQ==". The word "blue" becomes
"Ymx1ZQ==". Note: we said before that base64 encoding is one and one third times as large as the
byte representation. In the cases where you are missing a byte or two, it is actually slightly more than
this. The actual size ranges from exactly one and one third times the input to that plus at most
two and two thirds characters.
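The sizes quoted in this section can be checked against Python's standard base64 module. The encoded_length helper below is an illustration written for this check, not part of the module.

```python
import base64
import math


def encoded_length(n_bytes: int) -> int:
    """Number of base64 characters, including "=", for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


for word in (b"Hello World!", b"Hello", b"blue"):
    encoded = base64.b64encode(word).decode("ascii")
    print(word.decode("ascii"), encoded, len(word), encoded_length(len(word)))
```

For the three sample words this gives lengths of 16, 8 and 8 characters for 12, 5 and 4 input bytes.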
7. BASE64 DECODING
We will now tackle translating from base64 characters back into normal bytes. We
will use the same mapping of values (0 through 63) to base64 characters (A-Z, a-z, 0-9, '+',
and '/'). The reverse process is relatively simple now that we know how to perform the
forward operation. Let’s start with the base64 string "YmFzZTY0IGlzIGZ1biEh". Right now,
that makes no sense. We begin the same way, by looking up the value for each base64
character.
Table 7.1

It is very important to remember that when you are encoding, you use 8 bits for each
character, and when you are decoding you use 6 bits for each character! Once again, we start
by chopping it into smaller pieces and work on each piece. When we are decoding a base64
string into normal bytes, we use 4 characters at a time instead of the 3 we used when
encoding. So our base64 string is broken up from "YmFzZTY0IGlzIGZ1biEh" into "YmFz",
"ZTY0", "IGlz", "IGZ1", and "biEh". Instead of using a number to look up a base64
character, we are now using a base64 character to look up a number. Let's start with our first
group, "YmFz".

1. Convert the base64 characters to binary. (Remember to use 6 bit binary!)

"YmFz" is 011000 100110 000101 110011 in binary.

2. Convert the 24 bits from four 6 bit groups to three 8 bit groups.

011000 100110 000101 110011 becomes 01100010 01100001 01110011.

3. Convert each of the three 8 bit groups into decimal.


01100010 = 98

01100001 = 97

01110011 = 115

4. Use each of the three decimals to look up the ASCII character for that value.

98 = 'b'

97 = 'a'

115 = 's'

You now have your first four base64 characters ("YmFz") decoded as ASCII
("bas").

Follow these steps for the next 16 base64 characters and you get the following results:

"ZTY0" = "e64"

"IGlz" = " is"

"IGZ1" = " fu"

"biEh" = "n!!"

The encoded base64 string "YmFzZTY0IGlzIGZ1biEh" has been decoded to "base64
is fun!!".
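
The four decoding steps can be written out almost literally in Python. The sketch below is only an illustration of the procedure just described; the helper name decode_quartet is our own, and it assumes a quartet with no "=" padding.

```python
# Standard base64 alphabet: the position of a character is its value (0-63).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def decode_quartet(quartet: str) -> bytes:
    """Decode four unpadded base64 characters into three bytes."""
    # Step 1: look up each character's value and write it as 6 bits.
    six_bit = [format(ALPHABET.index(c), "06b") for c in quartet]
    # Step 2: regroup the 24 bits into three 8 bit groups.
    bits = "".join(six_bit)
    eight_bit = [bits[i:i + 8] for i in range(0, 24, 8)]
    # Steps 3 and 4: turn each 8 bit group into a number, then into a byte.
    return bytes(int(group, 2) for group in eight_bit)


text = "YmFzZTY0IGlzIGZ1biEh"
quartets = [text[i:i + 4] for i in range(0, len(text), 4)]
decoded = b"".join(decode_quartet(q) for q in quartets).decode("ascii")
print(decoded)  # base64 is fun!!
```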

We know how to encode bytes when we don't have exact groups of three to work
with. But how do you decode base64 that has our special symbol, "="? It is very similar; you
just have to remember the rules that caused us to use the "=". One thing before we get started:
base64 encoded text will always be in groups of 4 base64 characters; if the number of base64
characters is not divisible by 4 with no remainder, then you have corrupted data. Let's try
decoding a base64 string that contains the "=" symbol. Our string this time will be
"Li4ub3IgbWF5YmUgbm90Lg==". The first thing we do is divide this up into groups of four
characters. "Li4ub3IgbWF5YmUgbm90Lg==" becomes "Li4u", "b3Ig", "bWF5", "YmUg",
"bm90", and "Lg==". The first five quartets are decoded in the exact same manner. We just
need to learn what to do for the last quartet, "Lg==". Remember what the "="s mean: one "="
means that we were missing one whole byte when we encoded the data, two "="s means that
we were missing two whole bytes when we encoded the data. We begin in the same way as
before.

1. Begin by converting the base64 characters to their base64 values.

'L' = 11

'g' = 32

'=' = nothing

'=' = nothing

2. Convert the values to binary.

11 = 001011

32 = 100000

nothing = xxxxxx (just to call it something)

nothing = xxxxxx (just to call it something)

3. Convert the four 6 bit groups into three 8 bit groups.

001011 100000 xxxxxx xxxxxx becomes 00101110 0000xxxx xxxxxxxx.

Because we had two "="s at the end, we know that we were missing two
complete bytes in the original data. Remember where we had to add zeros when we encoded
into base64? Those are the zeros you see in the second 8 bit group ("0000xxxx"). Because
each of these 8 bit groups represents one byte from the original data, and we know that we
are missing two whole bytes, we discard the last two 8 bit groups, "0000xxxx" and
"xxxxxxxx". So the only data we now need to worry about is the first byte, 00101110. We
convert this value to decimal.

00101110 = 46

We convert the 46 to ASCII and we get the character '.' and add this to the other data that we
have decoded.

"Li4u" = "..."

"b3Ig" = "or "

"bWF5" = "may"

"YmUg" = "be "

"bm90" = "not"

"Lg==" = "."

Our final decoded string is "...or maybe not."
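
The handling of "=" can be sketched as follows. This is a minimal illustration under our own assumptions: the function name decode_base64 is hypothetical, padding is only counted at the end of a quartet, and no check is made that the discarded filler bits are really zeros.

```python
# Standard base64 alphabet: the position of a character is its value (0-63).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def decode_base64(text: str) -> bytes:
    """Decode padded base64 text, four characters at a time."""
    if len(text) % 4 != 0:
        raise ValueError("base64 length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quartet = text[i:i + 4]
        # Each trailing '=' stands for one byte missing from the original.
        missing = len(quartet) - len(quartet.rstrip("="))
        if missing > 2:
            raise ValueError("a quartet can have at most two '=' characters")
        # Build the 24 bit value, treating '=' as six zero bits.
        bits = 0
        for c in quartet:
            bits = (bits << 6) | (0 if c == "=" else ALPHABET.index(c))
        # Keep only the bytes that were really present when encoding.
        out += bits.to_bytes(3, "big")[:3 - missing]
    return bytes(out)


print(decode_base64("Li4ub3IgbWF5YmUgbm90Lg==").decode("ascii"))
# ...or maybe not.
```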

8. APPLICATIONS
8.1 URL APPLICATIONS

Base64 encoding can be helpful when fairly lengthy identifying information is used in
an HTTP environment. For example, a database persistence framework for Java objects might
use Base64 encoding to encode a relatively large unique id (generally 128-bit) into a string
for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications
need to encode binary data in a way that is convenient for inclusion in URLs, including in
hidden web form fields. Base64 is a convenient encoding for this: it is compact, and it is also
relatively unreadable, which helps when trying to obscure the nature of the data
from a casual human observer.

Using standard Base64 in a URL requires encoding the '+' and '/' characters into special
percent-encoded hexadecimal sequences ('+' = '%2B' and '/' = '%2F'), which makes the string
unnecessarily long.

For this reason, a modified Base64 for URL variant exists, where no padding '=' is
used, and the '+' and '/' characters of standard Base64 are respectively replaced by '-' and
'_'. URL encoders/decoders are then no longer necessary and have no impact on the
length of the encoded value, leaving the same encoded form intact for use in relational
databases, web forms, and object identifiers in general.
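
Python's standard base64 module offers this URL-safe alphabet through urlsafe_b64encode and urlsafe_b64decode. The sketch below shows one way the unpadded variant might be used; the helper names to_url_token and from_url_token are our own, and the padding is stripped and restored by hand because the library functions themselves keep the "=".

```python
import base64


def to_url_token(data: bytes) -> str:
    """Encode with '-' and '_' instead of '+' and '/', without '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def from_url_token(token: str) -> bytes:
    """Restore the stripped '=' padding, then decode."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


raw = b"\xfb\xff"
print(base64.b64encode(raw).decode("ascii"))  # +/8=
print(to_url_token(raw))                      # -_8
print(from_url_token("-_8") == raw)           # True
```

With this variant a 128-bit (16 byte) id becomes a 22 character token that can be placed in a URL without percent-encoding.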

8.2 FOR PRIVACY PROTECTION SYSTEMS

Base64 encoding is commonly used by proxy web sites or anonymous browsing websites to
encode the website URL, i.e. these sites hide the names of the sites we are visiting and
protect our privacy. These sites are commonly used by people all around the world to bypass
country restrictions.

These systems use base64 encoding to obscure the page URL, thereby providing a degree of
privacy to the users. Figure 8.1 shows such a site, which uses base64 encoding on the
address.
Figure 8.1: A privacy protection website using base64 encoding.

Figure 8.2: The base64 encoded URL of www.google.com.
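
As a rough illustration of the encoding step such a site might apply to an address, the sketch below encodes www.google.com with Python's standard base64 module. The exact URL layout used by any particular proxy site is not given in this report, so only the encoding itself is shown. Note that anyone who sees the encoded text can reverse it just as easily.

```python
import base64

url = "www.google.com"

# Encode the address so that it is not readable at a glance.
encoded = base64.b64encode(url.encode("ascii")).decode("ascii")
print(encoded)  # d3d3Lmdvb2dsZS5jb20=

# Decoding needs no key: the original address comes straight back.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # www.google.com
```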

8.3 PROGRAM IDENTIFIERS

There are other variants that use '_-' or '._' when the Base64 variant string must be
used within valid identifiers for programs.

8.4 XML

XML identifiers and name tokens are encoded using two variants:

1. '.-' for use in XML name tokens (Nmtoken), or even

2. '_:' for use in more restricted XML identifiers (Name).

8.5 REGULAR EXPRESSIONS

Another variant called modified Base64 for regexps uses '!-' instead of '*-' to replace
the standard Base64 '+/', because both '+' and '*' may be reserved for regular expressions
(note that '[]' used in the IRCu variant above would not work in that context).

8.6 OTHER APPLICATIONS

Base64 can be used in a variety of contexts:

1. Evolution and Thunderbird use Base64 to obfuscate e-mail passwords.
2. Base64 can be used to transmit and store text that might otherwise cause delimiter
collision.
3. Base64 is often used as a quick but insecure shortcut to obscure secrets without
incurring the overhead of cryptographic key management.
4. Base64 is used to store a password hash computed with crypt in /etc/passwd.
5. Spammers use Base64 to evade basic anti-spamming tools, which often do not decode
Base64 and therefore cannot detect keywords in encoded messages.
6. Base64 is used to encode character strings in LDIF files.
7. Base64 is often used to embed binary data in an XML file, using a syntax similar to
<data encoding="base64">…</data> e.g. favicons in Firefox's bookmarks.html.
8. Base64 is used to encode binary files such as images within scripts, to avoid
depending on external files.
9. The data URI scheme can use Base64 to represent file contents. For instance,
background images can be specified in a CSS stylesheet file as data: URIs, instead of
being supplied in separate image files.
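
The data URI case from the list above can be sketched as follows. The helper name make_data_uri and the example content are our own assumptions; the general shape data:<type>;base64,<data> is the form used by data: URIs.

```python
import base64


def make_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data: URI of the given media type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


print(make_data_uri(b"blue", "text/plain"))
# data:text/plain;base64,Ymx1ZQ==
```

For an image, the bytes would be read from the image file and the media type would be something like image/png.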
9. ADVANTAGES
1. Base64 encoding provides a successful method of transmitting binary data over
the internet through media designed for text.
2. It provides security to e-mails, attachments etc.
3. Base64 encoding can be easily implemented using Java programs.
4. Base64 provides privacy while browsing the internet by using URL encoding.
5. Base64 is the most common algorithm used over the internet, web applications etc. for
the transfer of data providing security.
10. CONCLUSION

Base64 is a different way of interpreting bits of data in order to transmit that data over
a text-only medium, such as the body of an e-mail. From this seminar we can
understand that base64 is the most economical, easy and secure method of creating
encoded data that can be easily transmitted over the internet. Base64 encodes the
data into a different format that can't be understood by a normal user. Base64 offers
several advantages and applications: it provides security to e-mails, attachments
etc., it can be easily implemented using Java programs, and it provides
privacy while browsing the internet by using URL encoding. There are different
implementations of base64 encoding. These include
UTF-7, Privacy Enhanced Mail, OpenPGP and MIME.
