Monitoring SIP Traffic Using
Support Vector Machines
Mohamed Nassar, Radu State, Olivier Festor
(nassar, state, festor)@loria.fr
MADYNES Team
INRIA, Nancy Grand Est
17 September 2008
Outline
• Introduction to SIP
• Threats
• Monitoring system
• Experiments
• Future work and conclusion
SIP
• SIP (Session Initiation Protocol, RFC 3261) is text-based, like HTTP
• Request + response = transaction
• URI = sip:user@host:port;parameters
[Figure: call flow between a SIP hard phone ([email protected]) and a soft phone ([email protected]), signaling on port 5060: INVITE (SDP, U-Law), 100 Trying, 180 Ringing, 200 OK (SDP, A-Law), ACK; RTP media (A-Law) between ports 10502 and 34154; then BYE and 200 OK]
SIP Trapezoid
[Figure: the SIP trapezoid; the proxy servers resolve the destination via a DNS server (IP address of the SIP service at berlin.org) and consult a database server (where Alice is registered); Bob's INVITE sip:[email protected] is routed proxy to proxy and forwarded as INVITE sip:[email protected] towards Alice]

Example INVITE request:
INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP loria.nancy.org:5060;branch=z9hG4bKfw19b
Max-Forwards: 70
To: Alice <sip:[email protected]>
From: Bob <sip:[email protected]>;tag=76341
Call-ID: [email protected]
CSeq: 1 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
<SDP body not shown>
Threats in the VoIP domain
• Misrepresentation: displaying a number different than the originating one
• Unwanted calls for telemarketing and advertising
• Discovering the identity of the target to obtain personal information
• Brute-force cracking of voice-mail and register-account passwords
• Discovering the users' extensions in a VoIP domain
• Messages not compliant to protocol specifications
• Attacks resulting in resource exhaustion
• Attacks resulting in premature session tear-down or service abuse
DoS
• Flooding attacks target the signaling plane elements (e.g. proxy, gateway) with the objective of taking them down or limiting their quality, reliability and availability
[Figure: example flood of 100 INVITE/second using invalid destination domains]

Strategy: Destination
• Legitimate SIP messages: a valid URI in the target domain
• Malformed SIP messages: a non-existent URI in the target domain
• Invalid SIP messages: a URI with an invalid domain or IP address
• Spoofed SIP messages: an invalid URI in another domain
• CPU-based attacks targeting the authentication process: a valid URI in another domain
SPIT or SPam over Internet Telephony
• Like SPAM (cost-free) but more annoying (phone ringing all day, interruption of work)
• Expected to become a severe issue with the large deployment of
VoIP services
• SPIT transactions are technically correct
• We don’t know the content until the phone rings
• We need to be reachable
• SPAM filtering solutions are not directly applicable
• Current approaches: multi-level grey lists, Turing tests, trust management, VoIP SEAL from NEC, VoIP SPAM detector from University of North Texas
Monitoring Approach
[Figure: the SIP flow feeds a queue; when the queue is full, the processor computes a vector of features, the classifier labels it, and the event correlator/decider raises alarms; couples (vector, class Id) are fed back to update the learning. A border effect at the transitions between normal and attack periods causes false positives/alarms.]
Monitoring System
[Figure: the SIP flow is analysed over two windows. A short-term window analyser extracts a feature vector that a classifier labels; a recovery algorithm starts/stops flood detection. A long-term window analyser extracts a feature vector whose classification events go to the event correlator/decider, which raises alarms. Learning updates both classifiers with couples (vector, class Id), and the classifiers are adjusted with the selected features.]
• Short-term/long-term monitoring (a sketch of the windowing loop follows this list)
• Count-related/chronological windows
• Different classification and anomaly detection techniques
• Learning-updating/testing
• Defense against manipulation attacks (poisoning)
• Feature selection and extraction
• Event correlation
• Prevention
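As an illustration of the count-based windowing described above (not the authors' implementation), a minimal Python sketch; the extract_features helper, the pre-trained classifier clf and the alarm callback are assumed placeholders:

# Minimal sketch of the count-based monitoring loop (illustration only).
# Assumptions: parsed SIP messages arrive on an iterator, extract_features()
# maps a window of messages to the 38-feature vector, and clf is a
# pre-trained classifier (e.g. an SVM, see the next slide).
from collections import deque

WINDOW_SIZE = 30  # short-term window of 30 messages, as in the experiments

def monitor(sip_messages, clf, extract_features, on_alarm):
    queue = deque()
    for msg in sip_messages:
        queue.append(msg)
        if len(queue) == WINDOW_SIZE:          # queue is full
            vector = extract_features(queue)   # compute the feature vector
            label = clf.predict([vector])[0]   # e.g. 0 = normal, 1 = attack
            if label != 0:
                on_alarm(vector, label)        # hand over to correlator/decider
            queue.clear()                      # start the next window

A chronological window would instead flush the queue after a fixed time interval rather than after 30 messages.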
Why SVM?
[Figure: input data are mapped by a kernel function (radial basis, linear, polynomial, sigmoid) into a space where a separating hyperplane is found]
• Known to process high-dimensional data
• Classification, regression and exploration of data
• High performance in many domains (bioinformatics, pattern recognition) and in network-based intrusion detection as well
• Unsupervised learning
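For concreteness, a minimal training sketch using scikit-learn's SVC, which wraps LibSVM (the tool mentioned in the annex); the feature files, labels and parameter values are placeholders, not the experimental setup:

# Illustration only: train and evaluate an RBF-kernel SVM on labelled
# feature vectors (one row per monitoring window, one column per feature).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.load("features.npy")   # placeholder: windows x 38 features
y = np.load("labels.npy")     # placeholder: 0 = normal, 1 = attack

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)        # scale features before SVM training
clf = SVC(kernel="rbf", C=1.0, gamma="scale") # radial basis kernel; placeholder C, gamma
clf.fit(scaler.transform(X_train), y_train)

print("test accuracy:", clf.score(scaler.transform(X_test), y_test))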
Feature Selection
• We have 38 features characterizing the SIP traffic
• Distributed over 5 groups:
  1. General statistics
  2. Call-ID based statistics
  3. Dialog final state distribution
  4. Request distribution
  5. Response distribution
• We take into account inbound and outbound messages
• Other features can be investigated as well
• Features must be characterized by a small extraction complexity
• Our feature extraction tool is written in Java using the Jain SIP parser
[Figure: an example message exchange (INVITE with SDP, 100, OPTIONS, 200 OK with SDP, ACK, 200 OK) annotated with inter-request, inter-response and inter-SDP arrival times; example features: average inter-request arrival, average inter-response arrival, average inter-SDP arrival, number of requests / total number of messages, number of responses / total number of messages, number of SDP / total number of messages, number of messages having the same Call-ID]
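To make the feature groups concrete, a rough sketch (not the authors' Java/Jain SIP tool) of how the group 1 "general statistics" listed in the annex could be computed over one window; the SipMessage record is a simplifying assumption:

# Illustration only: a few of the "general statistics" features (group 1)
# computed over one monitoring window. Messages are assumed to carry
# is_request, has_sdp and a timestamp in seconds.
from dataclasses import dataclass

@dataclass
class SipMessage:
    is_request: bool
    has_sdp: bool
    timestamp: float

def avg_inter_arrival(times):
    times = sorted(times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0

def group1_features(window):
    total = len(window)
    requests = [m for m in window if m.is_request]
    responses = [m for m in window if not m.is_request]
    with_sdp = [m for m in window if m.has_sdp]
    return {
        "Duration":    max(m.timestamp for m in window) - min(m.timestamp for m in window),
        "NbReq":       len(requests) / total,
        "NbResp":      len(responses) / total,
        "NbSdp":       len(with_sdp) / total,
        "AvInterReq":  avg_inter_arrival([m.timestamp for m in requests]),
        "AvInterResp": avg_inter_arrival([m.timestamp for m in responses]),
        "AvInterSdp":  avg_inter_arrival([m.timestamp for m in with_sdp]),
    }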
Traces and testbed
[Figure: traces were collected from a real-world VoIP service provider]
VoIP specific bots
• Available from www.loria.fr/~nassar
[Figure: a malicious user uploads exploit code to a web server (with dynamic DNS) and sends commands through an IRC manager to an IRC server/channel; the VoIP bots (agents) retrieve the exploit over HTTP and launch DoS and SPIT attacks over SIP/RTP against victims such as Asterisk, Cisco, Linksys, Thomson and Grandstream devices]
Experiments
Trace          SIP pkts   Duration (min)
Normal         57960      8.6
DoS            6076       3.1
KIF            2305       50.9
Unknown        7033       83.7
• Classification time < 1 s
Normal Data Coherence Test
[Figure: classification results comparing slices of Day 1 traffic against each other and against Day 2 traffic]
Monitoring Window Size
• The overall trace is about 8.6 minutes long and the message arrival rate is about 147 msg/s
Feature selection
Feature Selection
• A greater number of features doesn't mean higher accuracy
• Feature selection increases the accuracy and the performance of the system
• Selected features are highly dependent on the underlying traffic and the attacks to be detected
• A preliminary approach combines F-score ranking and SVM (a sketch of the F-score follows)
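For illustration, a sketch of the F-score ranking commonly used with LibSVM-based feature selection; the arrays X and y are placeholders (one row per monitoring window, one label per window):

# Illustration only: F-score of each feature for a two-class problem.
# X is a (windows x features) array, y holds labels 0 (normal) / 1 (attack).
import numpy as np

def f_scores(X, y):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_all, mean_pos, mean_neg = X.mean(0), pos.mean(0), neg.mean(0)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    denominator = pos.var(0, ddof=1) + neg.var(0, ddof=1)  # sample variances
    return numerator / denominator

# Rank features by decreasing F-score, e.g. keep the top 8:
# ranking = np.argsort(f_scores(X, y))[::-1][:8]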
Flooding Detection
• Background traffic ~ 147 msg/s
• Window = 30 messages
[Figure: classifier output (Attack/Normal) over time; the attack period is flagged]
Selected Features for Flooding / Short-Term Monitoring
Number  Name (ranked by F-score)
11      NbReceivers
14      NbCALLSET
20      NbInv
4       NbSdp
2       NbReq
3       NbResp
13      NbNOTACALL
12      AvMsg
SPIT Detection
• Background traffic ~ 147 msg/s
• Window = 30 messages
• False positives = 0 %
[Figure: classifier output (Attack/Normal) over time; the attack period is flagged]
Selected Features for SPIT / Long-Term Monitoring
Number  Name (ranked by F-score)
16      NbRejected
4       NbSdp
20      NbInv
23      NbAck
36      Nb4xx
34      Nb2xx
7       AvInterSdp
35      Nb3xx
13      NbNOTACALL
Event Correlation
Predicate: SPIT intensity
• 10 distributed positives in a 2-minute period: Low (stealthy)
• Multiple series of 5 successive positives: Medium
• Multiple series of 10 successive positives: High
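A toy sketch of such correlation rules (illustration only, simplified to a single run of successive positives rather than multiple series; the 2-minute horizon and the thresholds come from the table above):

# Illustration only: map a stream of (timestamp, is_positive) classifier
# outputs to a SPIT intensity level.
from collections import deque

def spit_intensity(events):
    """events: iterable of (timestamp_seconds, is_positive)."""
    recent = deque()          # positives seen in the last 2 minutes
    run = longest_run = 0     # current / longest run of successive positives
    for ts, positive in events:
        run = run + 1 if positive else 0
        longest_run = max(longest_run, run)
        if positive:
            recent.append(ts)
        while recent and ts - recent[0] > 120:   # keep a 2-minute horizon
            recent.popleft()
        if longest_run >= 10:
            return "High"
        if longest_run >= 5:
            return "Medium"
        if len(recent) >= 10:
            return "Low (stealthy)"
    return "None"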
Conclusion and Future Work
• An online monitoring methodology based on SVM learning is proposed
• Offline experiments show real-time performance and high detection accuracy
• Anomaly detection and unsupervised learning approaches are future work
• Studying traces of other VoIP attacks
• Further investigation of the feature set and the selection algorithms
• Extending the event correlation framework in order to reveal attack strategies and support attacker plan recognition
Annex
Features
Group 1 - General Statistics
1  Duration     Total time of the slice
2  NbReq        # of requests / Total # of messages
3  NbResp       # of responses / Total # of messages
4  NbSdp        # of messages carrying SDP / Total # of messages
5  AvInterReq   Average inter-arrival of requests
6  AvInterResp  Average inter-arrival of responses
7  AvInterSdp   Average inter-arrival of messages carrying SDP bodies
Features
Group 2 - Call-ID based statistics
8   NbSess       # of different Call-IDs
9   AvDuration   Average duration of a Call-ID
10  NbSenders    # of different senders / Total # of Call-IDs
11  NbReceivers  # of different receivers / Total # of Call-IDs
12  AvMsg        Average # of messages per Call-ID
Features
Group 3 - Dialogs' Final State Distribution
13  NbNOTACALL   # of NOTACALL / Total # of Call-IDs
14  NbCALLSET    # of CALLSET / Total # of Call-IDs
15  NbCANCELED   # of CANCELED / Total # of Call-IDs
16  NbREJECTED   # of REJECTED / Total # of Call-IDs
17  NbINCALL     # of INCALL / Total # of Call-IDs
18  NbCOMPLETED  # of COMPLETED / Total # of Call-IDs
19  NbRESIDUE    # of RESIDUE / Total # of Call-IDs
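The dialog final states are not defined on these slides; as a hedged illustration only, one plausible way to assign a final state to the messages sharing a Call-ID (the semantics below are guesses consistent with the state names; RESIDUE, presumably for messages fitting no dialog, is omitted):

# Illustration only: tentative final state of a dialog (all messages sharing
# a Call-ID) inside a traffic slice. The rules are assumptions, not the
# authors' definitions.
def dialog_final_state(messages):
    """messages: chronological list of dicts such as
    {'method': 'INVITE', 'status': None} or {'method': None, 'status': 486}."""
    methods = {m['method'] for m in messages if m['method']}
    statuses = [m['status'] for m in messages if m['status']]
    if 'INVITE' not in methods:
        return 'NOTACALL'                     # e.g. OPTIONS/REGISTER dialog
    if 'CANCEL' in methods:
        return 'CANCELED'
    if any(s >= 400 for s in statuses):
        return 'REJECTED'                     # call refused with 4xx/5xx/6xx
    if any(200 <= s < 300 for s in statuses):
        return 'COMPLETED' if 'BYE' in methods else 'INCALL'
    return 'CALLSET'                          # INVITE sent, setup not finished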
Features
Group 4 - Request Distribution
20  NbInv  # of INVITE / Total # of requests
21  NbReg  # of REGISTER / Total # of requests
22  NbBye  # of BYE / Total # of requests
23  NbAck  # of ACK / Total # of requests
24  NbCan  # of CANCEL / Total # of requests
25  NbOpt  # of OPTIONS / Total # of requests
26  NbRef  # of REFER / Total # of requests
27  NbSub  # of SUBSCRIBE / Total # of requests
28  NbNot  # of NOTIFY / Total # of requests
29  NbMes  # of MESSAGE / Total # of requests
30  NbInf  # of INFO / Total # of requests
31  NbPra  # of PRACK / Total # of requests
32  NbUpd  # of UPDATE / Total # of requests
Features
Group 5 - Response Distribution
33  Nb1xx  # of Informational responses / Total # of responses
34  Nb2xx  # of Success responses / Total # of responses
35  Nb3xx  # of Redirection responses / Total # of responses
36  Nb4xx  # of Client error responses / Total # of responses
37  Nb5xx  # of Server error responses / Total # of responses
38  Nb6xx  # of Global error responses / Total # of responses
Phreaking by social engineering scheme
[Figure: Trudy calls Bob over the IP network: "I am a technician doing a test, please transfer me to that operator by dialing 9 0 # and hang up". Bob has a contract to make phone calls towards the PSTN, so the transfer through the SIP/PSTN gateway lets Trudy reach the PSTN network on Bob's account]
Machine Learning
• Pros
– Better accuracy, small false alarm rate
– Compact representation
– Detecting Novelty
• Cons
– Embedding of network data in metric spaces
– Difficulty of getting labels
– Vulnerable to malicious noise
– Huge data volumes
Traces
• Call Setup is a small fraction of the signaling traffic
• Some empty messages are used as ping or keep-alive for device management
• Some messages throw parsing exceptions
Traces
• OPTIONS and REGISTER messages are the most numerous
• MESSAGE, PRACK and UPDATE are absent
• The number of NOTIFY is constant over time (messages automatically generated at a fixed rate)
• #INVITE/#BYE = 2.15 (not every INVITE results in a BYE, e.g. callee is busy, retransmission, re-INVITE)
• #INVITE/#ACK = 0.92 (some INVITEs are acknowledged twice)
Traces
• The 2xx family is the most numerous (responses to REGISTER and OPTIONS messages)
• #INVITE/#1xx = 0.59 (probably a 100 Trying and a 180 Ringing for each INVITE)
Traces
• Average inter-request arrival = average inter-response arrival = 20 ms
• The average inter-arrival of messages carrying SDP bodies is inversely proportional to the number of INVITE, BYE, ACK and 1xx messages (which are only used in call setup)
• The average inter-arrival of requests carrying SDP is about 3 s in quiet hours and 0.5 s in rush hours, which reveals high call-setup traffic
LibSVM