
Internet Technologies in Depth The Technique of Spam Recognition Based On Header Investigating

The document discusses email technologies and a technique for recognizing spam based on analyzing email headers. It provides background on email architecture and protocols, describes how email is delivered, and the format of email messages. The proposed technique examines email headers to detect spam without additional tools or processing.


Internet Technologies in Depth.

The Technique of
Spam Recognition Based on Header Investigating
Dr. Abzetdin Adamov
Chief Information Officer / Head of Computer Engineering Department,
Qafqaz University, Baku, Azerbaijan
[email protected]

Abstract – E-mail is one of the most effective business and personal communication tools. The popularity, openness and wide availability of this Internet service make it attractive for advertising products and services by sending unsolicited e-mails (spam). The goal of this paper is to offer a comprehensive and usable technique for recognizing spam that helps to detect junk e-mail and to protect users from it, as well as from fraudulent e-mail threats and viruses. While widespread methods are complex and expensive, the proposed technique is based on header investigation and requires neither additional tools nor heavy processing.

Keywords - Internet technologies, e-mail architecture, spam, spam recognition

I. INTERNET MESSAGE AS COMMUNICATION TOOL AND SPAM

The asynchronous nature of e-mail provides convenience and a more effective use of time for the participants in a communication. In contrast to immediate means of communication such as the telephone, email is a deferred type of communication. Instead of reacting immediately, recipients have the comfort of reading, interpreting and reacting to received information later, or of doing nothing if no action is required [1].

Because of these and other advantages, the popularity of email as a means of business and personal communication has risen steadily over the last decade. Figure 1 shows the rising popularity of email communication over recent years, together with a prediction for the future.

Fig. 1. Growth of email use by year

At the same time, because of the concept and architecture of the Internet message, whose most important features are simplicity, openness, compatibility and standardization, this service is vulnerable to security threats. The most prevalent of them is spam, or unsolicited e-mail. Spam continues to evolve, becoming more complex and harder to stop. Today, spam is not just unwanted e-mail; it also brings security problems, viruses, phishing and other malware. Fortunately, in recent years computer professionals and businesses have intensified the fight against spam and spammers. And yet it is still a serious problem, which costs businesses tens of billions and whose cost continues to rise from year to year [2]. There are several popular methods for spam detection and prevention, such as content-based email filtering, DNS-based blackhole lists (DNSBL), greylisting and spamtraps [3]. However, most of them require advanced knowledge, special software, or a lot of processing time. In contrast, the technique proposed in this paper does not require any software or special experience. It makes it possible to examine e-mail using any e-mail client software, or even webmail (Gmail, Yahoo Mail). Since the technique is based on investigating email headers, it is necessary first to review the e-mail architecture, the message format and the meaning of e-mail headers.

II. EMAIL GENERAL STRUCTURE

Like other Internet services, the email system is based on Internet standards and some dedicated protocols. Different email servers implement many different email protocols; however, some of them are common to all email servers and email clients:
1. Basic email format standard [7] (RFC 5322)
2. Multipurpose Internet Mail Extensions (MIME) standard
3. Simple Mail Transfer Protocol (SMTP)
4. Post Office Protocol (POP3) or Internet Message Access Protocol (IMAP)
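As a small illustration of the first item in the list above, the following Python sketch parses a message in the basic Internet message format using the standard email package. The addresses, subject and body are hypothetical placeholders, not taken from this paper, and split_message is a helper name introduced here. The point is only the layout: header lines of the form "name: value", an empty line, then the body.

```python
from email import message_from_string

# Hypothetical message; all addresses and texts are placeholders.
RAW_MESSAGE = (
    "From: Alice <alice@a.example>\n"
    "To: smith@b.example\n"
    "Subject: Meeting\n"
    "Date: Sun, 17 Oct 2010 08:00:00 -0400\n"
    "\n"
    "See you at ten.\n"
)


def split_message(raw):
    """Return (list of (header name, value) pairs, body text)."""
    msg = message_from_string(raw)
    return msg.items(), msg.get_payload()


headers, body = split_message(RAW_MESSAGE)
for name, value in headers:
    print(f"{name}: {value}")
print("---")  # marks the empty line that separates header and body
print(body)
```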


III. EMAIL PHYSICAL ARCHITECTURE AND PROTOCOLS

The general email architecture consists of two core components and the protocols that enable the transfer of electronic text messages between them. The first component is the Email Agent (or email client), which allows users to receive, read, create and send email. The second component is the Mail Server (or Message Transfer Agent), which is responsible for delivering a message from the source to the destination. As mentioned above, the email system has two key protocols. The SMTP protocol governs the transfer of a message from the source mail server to the destination mail server. The POP3 (or IMAP) protocol defines the retrieval of a message from the destination (receiver) mail server by the client's email agent. The software applications developed in accordance with these protocols are known as the SMTP and POP3 Internet services, and together they actually form the Mail Server itself. The interaction of the email components and the protocols that enable it is shown in Figure 2.

Fig. 2. General architecture of email system and protocols

IV. HOW EMAIL DELIVERY WORKS

Email delivery is the whole process of transferring a message from the source to the destination. Figure 3 shows this process in detail. Let us go through the process step by step:
1. Using an email agent, the sender submits an email addressed to [email protected].
2. The SMTP service of the mail server that received the sender's message resolves the email domain "b.com". To do so, the mail server uses the DNS service (see DNS resolving at [4]) to ask the name server (NS) of b.com for the MX record. The MX record specifies the mail server designated to receive all email for the domain name b.com. In our example the name of this mail server is mail.b.com.
3. The email is routed to the receiver's mail server, mail.b.com.
4. The SMTP service of mail.b.com places the email into the recipient's mailbox "smith" in the mail store.
5. The recipient checks for email for user [email protected] using the POP3 service through his email agent. To gain access to the mailbox, the user has to pass the authentication process of the POP3 service.
6. If the authentication module accepts the user, the email is downloaded to the user's email agent.

Fig. 3. Detailed structure of email delivery

V. THE INTERNET MESSAGE (EMAIL) FORMAT

The first Internet message standard was described in [5] in 1977. It was revised by [6] in 1982, which remained in use for almost twenty years. The newest email standard, described in [7], was published in 2008.

According to the latest standard, an Internet message (or email) consists of an envelope and content (for further information see [8]). This is illustrated in Figure 4 "a". The envelope, which is part of the SMTP protocol, can be viewed as the container of the message; it carries information about who the message originated from (the sender) and to whom it is destined (the recipient or list of recipients). The sender's information is needed so that an error message can be sent back if delivery fails. The envelope is a temporary container created by the source mail server just before it passes the message to the destination mail server, as shown in Figure 4 "b". By the time a message has been delivered to a recipient's mailbox, the envelope no longer exists.

Fig. 4. Email format and envelope concept

There is no inherent relationship between the recipients' addresses in the envelope and the addresses in the header section (such as To, Cc, Bcc); however, according to [8] (RFC 5321), the appropriate header fields can be used to form the list of recipients. This can be pictured as postal mail that carries the destination address on the envelope and may also carry an address at the top of the letter inside the envelope, which plays no role in delivery. That is why a recipient sometimes receives a message even though he cannot find his own address in the recipients' list.

VI. EMAIL HEADER INVESTIGATING AND SPAM RECOGNITION

The content of an email includes header fields and a message body. The purpose of the header fields is to provide the receiver's email agent with descriptive information about the message, such as sender, receiver, date, subject, etc. The header block contains several text lines, each of which has the syntax "header title: value" (see Figure 4 "a"). The body, separated from the header fields by an empty line, contains the textual information the sender is sending to the recipient. The primary header fields specified by [7] (RFC 5322) are shown in Table I.

TABLE I
INTERNET MESSAGE HEADER FIELDS

Header          Description
From:           The name and email address of the message originator
Date:           The local date and time when the message was written or sent
Message-ID:     Machine-readable unique identifier generated by the mail server; designed to prevent multiple delivery and to serve as a reference in In-Reply-To
In-Reply-To:    Used for reply messages only; contains the Message-ID of the original message(s), creating a relational tree of messages
To:             Email address(es) of the primary recipient(s)
Cc:             Email address(es) of the secondary recipient(s). Generally used to indicate recipients who have no immediate relation to the matter but should be informed
Bcc:            Same as Cc, but hidden from recipients. SMTP removes this header field before delivering the message
Subject:        Human-readable textual summary of the message
Content-Type:   MIME type of the message content, designed to let the email agent display the message properly
Received:       Contains information about all mail servers that were involved in the message delivery
References:     Like In-Reply-To, uses Message-ID(s), but designed to identify a thread of correspondence
Keywords:       Keywords specified by the sender
Reply-To:       Email address that should be used when the recipient replies to the message
Return-Path:    Indicates the email address of the message's sender. The value of this header has to be the same as the "From" address of the SMTP envelope
Delivered-To:   The email address of the recipient
Sender:         Actual sender of the message (generally, the address listed in From is used)

The level of importance of each header field in message formation is different. For example, any Internet message must include the From: and Date: fields, and should include Message-ID: and, for replies, In-Reply-To:. The rest of the fields are optional or are managed automatically by mail servers. One of the most important headers, Received:, deserves a more detailed review. This header significantly simplifies the fight against spam and spammers. When we receive unsolicited bulk email, our email agent normally shows only the standard To:, From:, Subject: and Date: headers, as for any other email. At the same time, the From: address may appear to be from someone we know well, or from some organization whose name we respect or trust. In reality these spoofed messages do not originate from the address that appears in the From: header. To see the real address the message was sent from, it is necessary to check the Received: field, which tells us the route the message took when it was sent to us.

Now we will try to understand how to find the original source of a suspicious email by investigating the email header. To do so, we first need to be able to see the full email header. Generally, all email client programs (and webmail services such as Gmail, Yahoo, etc.) have a function to display the full header of any message in the inbox. The header of a message I received recently is shown in Figure 5.

1. Delivered-To: [email protected]
2. Return-Path: <SRS0=M78ycc=RT=p3slh174.shr.phx3.secureserver.net=[email protected]>
3. Received: ……………………
4. Received: by 10.220.162.197 with SMTP id w5cs344529vcx;
   Sun, 17 Oct 2010 05:24:20 -0700 (PDT)
5. Received: from bosmailscan05.eigbox.net ([10.20.15.5])
   by bosmailout03.eigbox.net with esmtp (Exim) id 1P7SHj-0007rH-Qy
   for [email protected]; Sun, 17 Oct 2010 08:24:19 -0400
6. Received: from p3slh174.shr.phx3.secureserver.net (localhost.localdomain [127.0.0.1])
   by p3slh174.shr.phx3.secureserver.net (8.12.11.20060308/8.12.11) with ESMTP id o9HCOF7n030063
   for <[email protected]>; Sun, 17 Oct 2010 05:24:15 -0700
7. Received: (from lindaadleen2@localhost)
   by p3slh174.shr.phx3.secureserver.net (8.12.11.20060308/8.12.11/Submit) id o9HCOEvK030054;
   Sun, 17 Oct 2010 05:24:14 -0700
   Date: Sun, 17 Oct 2010 05:24:14 -0700
8. Message-Id: <[email protected].secureserver.net>
9. To: [email protected]
10. Subject: xxxxxxxxxxxxxxxxx!!!!!
11. From: [email protected]

Fig. 5. Email header investigation to find the original source of spoofed message
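The fields listed in Table I can also be inspected programmatically. The Python sketch below is only an illustration under stated assumptions: the sample header is hypothetical (its host names and addresses are placeholders, not the message of Fig. 5), and the helper names received_route and address_mismatches are introduced here. It relies on the fact that each mail server adds its Received: field at the top of the header, so reading these fields from bottom to top gives the route starting at the origin. It also compares From: with Return-Path: and To: with Delivered-To:, following Table I, where Return-Path: carries the sender's address and Delivered-To: the recipient's. A mismatch is a hint, not proof, since mailing lists and forwarding can legitimately change these fields.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical header block; all hosts and addresses are placeholders.
SAMPLE = (
    "Delivered-To: smith@b.example\n"
    "Return-Path: <bulk@sender.example>\n"
    "Received: by mx.b.example with SMTP id abc123;\n"
    "\tSun, 17 Oct 2010 05:24:20 -0700\n"
    "Received: from relay.sender.example by out.sender.example\n"
    "\twith esmtp id def456; Sun, 17 Oct 2010 08:24:19 -0400\n"
    "Received: (from user1@localhost) by relay.sender.example\n"
    "\tid ghi789; Sun, 17 Oct 2010 05:24:14 -0700\n"
    "To: friend@c.example\n"
    "From: Trusted Bank <support@bank.example>\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)


def received_route(msg):
    """Received: values ordered from the origin to the final server."""
    hops = msg.get_all("Received", [])
    # Servers prepend their field, so the last one listed is the earliest.
    # Folded lines are joined by collapsing runs of whitespace.
    return [" ".join(hop.split()) for hop in reversed(hops)]


def address_mismatches(msg):
    """Pairs of fields whose addresses differ (a hint, not proof, of spam).

    Only the first address in each field is compared.
    """
    found = []
    for left, right in (("From", "Return-Path"), ("To", "Delivered-To")):
        a = parseaddr(msg.get(left, ""))[1].lower()
        b = parseaddr(msg.get(right, ""))[1].lower()
        if a and b and a != b:
            found.append((left, right))
    return found


msg = message_from_string(SAMPLE)
for number, hop in enumerate(received_route(msg), start=1):
    print(number, hop)
print("Mismatched fields:", address_mismatches(msg))
```

In this sample the first hop printed is the one that mentions user1@localhost, that is, the bottom Received: field, and both pairs of fields are reported as mismatched.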
The header has been slightly modified by removing most of its eleven Received: fields. The Received: headers appear in reverse order. So, the first Received: header from the bottom (see line 7) presents the original source of the message. The part "from lindaadleen2@localhost" shows information about the computer the message was sent from. Probably, the spammer uses an SMTP service installed on his own computer to send bulk mail. The next line shows the name of the first mail server involved in the delivery process, "p3slh174.shr.phx3.secureserver.net", the exact date and time of receipt, and the unique id assigned to the message by the server. The id is unique for the particular mail server and can be used for tracking the message. The two headers To: (see line 9), which indicates to whom the message is sent, and Delivered-To: (see line 1), which indicates by whom it is received, are supposed to be the same. Furthermore, the other two headers, From: (see line 11) and Return-Path: (see line 2), are also supposed to be the same. The fact that they are not the same testifies to the spam nature of the message.

CONCLUSION

The increasing popularity of e-mail-based communication without significant change in its architecture makes this tool vulnerable to many styles of attack. In order to enhance the reliability of email, it is crucial to be able to verify the addresses in the From: and To: headers. The verification method based on header investigation makes it possible to distinguish wanted email from spam (junk, bulk, unsolicited email) with a quite high level of accuracy.

REFERENCES

[1] Thierry Van de Velde, Value-Added Services for Next Generation Networks, Auerbach Publications, 2008.
[2] Ferris Research: Cost of Spam, http://www.ferris.com/research-library/industry-statistics/
[3] Shawn Hernan; James R. Cutler; David Harris (1997-11-25). "I-005c: E-Mail Spamming countermeasures: Detection and prevention of E-Mail spamming". Computer Incident Advisory Capability Information Bulletins. United States Department of Energy. Retrieved 2007-01-06.
[4] Abzetdin Adamov, Neglected point of Internet performance. How to choose the right DNS service, http://aadamov.wordpress.com/
[5] RFC 733, Standard for the format of ARPA network text messages, 21 November 1977, http://www.ietf.org/rfc/rfc0733.txt
[6] RFC 822, Standard for the format of ARPA internet text messages, 13 August 1982, http://www.ietf.org/rfc/rfc0822.txt
[7] RFC 5322, Internet Message Format, October 2008, http://tools.ietf.org/html/rfc5322
[8] RFC 5321, Simple Mail Transfer Protocol, October 2008, http://tools.ietf.org/html/rfc5321
