Intrusion Detection
[Figure: plot with axes labeled -log(False negative) and False negative]
Approaches to Intrusion Detection
• Signature-based
– What is bad, is known
– What is not bad, is good
• Anomaly detection
– What is usual, is good
– What is unusual, is bad
• Specification-based detection
– What is correct, is good
– What is incorrect, is bad
Signature-based IDS
• An attack fits an attack pattern
• E.g.,
– Attack CVE-2017-9841 arrives in an HTTP request: “GET /vendor/phpunit/phpunit/src/Util/PHP/[Link]”
– How?
• System reads web server logs
• If the string “GET /vendor/phpunit/phpunit/src/Util/PHP/[Link]” is found, then an alert is raised,
– e.g., an email is sent
– A dashboard indicates a signature match/attack
• Pros:
– Obvious: If you know the signature, you should detect the attack with that signature
– Quick to deploy: When a new attack is detected, often the fastest defense to deploy is one
that uses the signature
– Low false positive (usually)
• Cons:
– Never ending list of signatures
• Need an automated way to build and deploy signatures
– Unable to defend against unknown attacks
– Unable to defend against variations of the same attack
– Pattern matching systems are complicated and can be attacked
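The log-scanning step described above can be sketched in a few lines. This is only an illustration: the function name `scan_log_lines` and the sample log format are assumptions, and because the final path component is elided in the example request, the sketch matches only the path prefix.

```python
# Hypothetical signature table. The last path component is elided in the
# example above, so only the request prefix is used as the signature here.
SIGNATURES = {
    "CVE-2017-9841": "GET /vendor/phpunit/phpunit/src/Util/PHP/",
}


def scan_log_lines(lines):
    """Return (line_number, signature_name) for each log line containing a signature."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern in line:  # plain substring match, no regular expressions
                alerts.append((number, name))
    return alerts
```

Each returned pair would then drive the alert (an email or a dashboard entry). A plain substring match like this also shows the weakness listed under Cons: a small variation of the request no longer matches.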
Specification-based IDS
• Specification of software describes how the
software behaves
• If the IDS knows how it should behave, it will
also know when it is behaving incorrectly
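As a rough sketch of the idea, a specification can be written as the set of behaviors the software is allowed to perform, and anything outside that set is flagged. The operations and path prefixes below are hypothetical and describe an imaginary web server, not any real product.

```python
# Hypothetical specification: the only (operation, path prefix) pairs
# that the monitored web server is expected to perform.
ALLOWED = {
    ("read", "/var/www/"),
    ("write", "/var/log/httpd/"),
}


def violates_specification(operation, path):
    """Return True if the observed behavior is not covered by the specification."""
    return not any(
        operation == allowed_op and path.startswith(prefix)
        for allowed_op, prefix in ALLOWED
    )
```

Under this sketch, reading `/var/www/index.html` is correct behavior, while reading `/etc/shadow` is flagged even if no signature for that attack exists.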
Agent-based Data Collection
1. Software agent collects data about the turbine
– In the event of a network failure, buffer data and upload when connectivity is restored
2. Data is sent through the Internet
3. Data received by your ingestion systems
4. Data saved in your DB
5. Data processed by your state-of-the-art AI
6. Power Brokers Enable Clean Energy
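The buffering behavior of the agent (hold data during a network failure, upload once connectivity is restored) might look like the sketch below. The class name `BufferingAgent`, the `send` callable, and the convention that `send` raises `ConnectionError` when the network is down are all assumptions made for illustration.

```python
from collections import deque


class BufferingAgent:
    """Collects readings and uploads them, buffering while the network is down."""

    def __init__(self, send, max_buffered=10_000):
        self._send = send  # assumed to raise ConnectionError on network failure
        # Bounded buffer: when full, the oldest readings are dropped.
        self._buffer = deque(maxlen=max_buffered)

    @property
    def pending(self):
        return len(self._buffer)

    def report(self, reading):
        """Queue a new reading and try to upload everything that is pending."""
        self._buffer.append(reading)
        return self.flush()

    def flush(self):
        """Upload buffered readings in order; stop at the first failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return False  # still offline, keep the remaining readings
            self._buffer.popleft()
        return True
```

The bounded buffer is a design choice: without a limit, a long outage could exhaust the agent's memory on the monitored device.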
Attack of Agent-based Data Collection
1. Attacker has reverse engineered your agent and sends huge amounts of data
2. Data is sent through the Internet
3. Data ingestion is overwhelmed
4. Data storage is expensive
• Example: Windows locks out an account after too many failed login attempts
– Appropriate threshold may depend on non-obvious factors
• Typing skill of users
• If keyboards are US keyboards, and most users are French, typing errors are very common
– Dvorak vs. non-Dvorak within the US
– Stealth attack
• Suppose the threshold is 100 failed attempts per day
• Attacker tries a random password 99 times a day for each account
• Maybe make the threshold smaller, but then risk spurious lock-outs
• Difficulties
– Thresholds are difficult to set in general
– Distinguishing stealth attacks from random errors
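The threshold trade-off above can be made concrete with a small sketch. The function name and the input format (one account name per failed attempt during a day) are assumptions; the numbers 100 and 99 come from the stealth attack example.

```python
from collections import Counter


def locked_out_accounts(failed_logins, threshold):
    """Return the accounts whose failed attempts in one day reach the threshold.

    failed_logins: iterable of account names, one entry per failed attempt.
    """
    counts = Counter(failed_logins)
    return {account for account, n in counts.items() if n >= threshold}
```

With a threshold of 100, an attacker making 99 guesses against an account stays undetected. Lowering the threshold to 5 catches the attacker, but also locks out a legitimate user who mistyped a password 6 times.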
Threshold metric detailed example: detecting scanning
[Figure: time-series plot (x-axis 0 to 1800, y-axis 0 to 160) with annotations "determined", "Low threshold", and "Attack starts"]
My Experience
• I use thresholds frequently
• Usually, careful detection is not required. Instead, detection is
obvious
• E.g.,
– My system can handle 1M new users/day.
– Usually, we have fewer than 100 new users/day
– Set a threshold of 10,000 new users/day from a single IP address
Markov Model
• Another statistics-based method, but one that uses more powerful
statistical techniques
• Assumption: Past state affects current transition
• Anomalies are based upon sequences of events, and not
on the occurrence of a single event
• Problem: need to train system to establish valid
sequences
– Use known training data that is not anomalous
– The more training data, the better the model
– Training data should cover all possible normal uses of system
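One simple way to train such a model, assuming the training data is a list of non-anomalous sessions (each a list of state names), is to count observed transitions and normalize the counts. The function name and input format are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_transition_probabilities(sessions):
    """Estimate P(next state | current state) from non-anomalous sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every consecutive pair of states in the session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1

    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {nxt: n / total for nxt, n in followers.items()}
    return model
```

A transition never seen in training gets no entry at all, which is why the training data should cover all normal uses: a detector built on this model would otherwise treat any unseen normal transition as anomalous.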
Markov Model of a Web Surfing
[Figure: Markov chain with states New session (null), Home page, Gallery, Comments page, and Site directory page; edges are labeled with transition probabilities (0.3, 0.4, 0.699, 0.999, 0.599, 0.001, 1)]
The numbers are the probability that a user takes a particular action that leads to the next state
Markov Assumption: the probability of jumping from one state to the next only depends on the current state
User’s sequence of events: Home Page; Gallery; Comments; Gallery; Site directory; Home page; Site directory
Start page | Next page | Score = -log10(p) | S = max(0, S + score - 1) | S ≥ threshold 3?
Home page | Gallery | -log10(0.3) = 0.5 | max(0, 0 + 0.5 - 1) = 0 | No
Gallery | Comments | -log10(0.699) = 0.2 | max(0, 0 + 0.2 - 1) = 0 | No
Comments | Gallery | -log10(0.4) = 0.4 | max(0, 0 + 0.4 - 1) = 0 | No
Gallery | Site directory | -log10(0.001) = 3 | max(0, 0 + 3 - 1) = 2 | No
Site directory | Home page | -log10(1) = 0 | max(0, 2 + 0 - 1) = 1 | No
Home page | Site directory | -log10(0.001) = 3 | max(0, 1 + 3 - 1) = 3 | Yes
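The running score in the table can be reproduced with a short sketch. It assumes base-10 logarithms, a subtraction of 1 per step, and an alert once S reaches the threshold of 3. The `TRANSITIONS` dictionary holds only the six probabilities that appear in the table, and the function name is an assumption.

```python
import math

# Transition probabilities taken from the table (not the full model).
TRANSITIONS = {
    ("Home page", "Gallery"): 0.3,
    ("Gallery", "Comments"): 0.699,
    ("Comments", "Gallery"): 0.4,
    ("Gallery", "Site directory"): 0.001,
    ("Site directory", "Home page"): 1.0,
    ("Home page", "Site directory"): 0.001,
}


def score_session(pages, threshold=3.0, drift=1.0):
    """Return one (current, next, score, S, alert) row per transition."""
    s = 0.0
    rows = []
    for current, nxt in zip(pages, pages[1:]):
        score = -math.log10(TRANSITIONS[(current, nxt)])
        # Unlikely transitions raise S; the drift term lets it decay again.
        s = max(0.0, s + score - drift)
        # Small tolerance so floating-point rounding cannot hide S == threshold.
        rows.append((current, nxt, score, s, s >= threshold - 1e-9))
    return rows
```

Called with the user's sequence (Home page, Gallery, Comments, Gallery, Site directory, Home page, Site directory), the alert flag is set only on the last transition. A single rare transition is not enough; two rare transitions close together are.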
Comparison and Contrast
• Signature-based: a signature (e.g., a string that appears in a log or an HTTP
request) is used to detect the attack
– Pros: easy, fast to deploy, low false alarm rate (often used for prevention)
– Cons: cannot detect unknown attacks
• Specification-based: what the software is allowed to do is precisely
known. If it does something else, it is labeled as an attack
– Pros: can detect many attacks, since only specifically known
behaviors are permitted. Low false alarm rate
– Cons: in practice, it is difficult for the IDS to know what behaviors are possible
and allowed
• Anomaly detection: detects unusual events
– Pros: Can detect unknown attacks
– Cons: high false alarm probability, since unusual events are not necessarily
attacks
IDS Architecture
• Basically, an audit system
– Agent is like a logger; it gathers data for analysis
– Analyzes data obtained from the agents according to its internal rules
– Takes some action
• May simply notify security officer
• May activate response mechanism
• Other checks
– Check that patches are up to date
– Check that virus detection is running (e.g., McAfee virus checking)
– Other rules
• Challenge: there can be so many agents that the machine's performance is degraded
Remote Agent (as opposed to host-based)
• Agent runs on an external machine
• Agent accesses data on monitored machine
– SNMP (Simple Network Management Protocol)
• Allows one to retrieve various types of information about the remote machine
• Agent runs small scripts on monitored machine
– E.g., ssh to a machine, run a few lines of bash code, retrieve and process the
results
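The ssh approach can be sketched with the standard `subprocess` module. The helper names are assumptions, and the sketch presumes key-based authentication is already set up; `BatchMode=yes` is an OpenSSH option that makes ssh fail rather than prompt for a password.

```python
import subprocess


def build_ssh_command(host, script):
    """Build the ssh invocation that runs a short shell script on the host."""
    return ["ssh", "-o", "BatchMode=yes", host, script]


def run_remote_check(host, script, timeout=30):
    """Run the script on the monitored machine and return its output lines."""
    result = subprocess.run(
        build_ssh_command(host, script),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text
        timeout=timeout,      # do not hang forever on an unreachable host
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.splitlines()
```

For example, `run_remote_check("monitored.example.com", "df -P /")` (a hypothetical host name) would return the disk usage lines for the agent to process.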
• Host-based Agent
– Can collect a wide range of data
– Able to monitor and control the impact on the host
– Requires software to be installed
– No ports need to be opened
– No change to firewall or VLAN
• Remote Agent
– Collects only data that is accessible from a remote API
– Difficult to control the impact the measurement has on the machine
– No software installed
– Ports must be opened; firewall and VLANs must be adjusted
– Passwords stored on the measurement machine
Pseudo/Almost Host-Based
• Take a snapshot of the disk
– Easy in the cloud
• Process data on the disk
– Virus detection
– Installed programs
– System configuration
• Cons
– Cannot detect what is running
Network-Based Agents
• Detects network-oriented attacks by
examining network traffic only
• E.g., see data collected by Wireshark
Host-based IDS
• The agent must be installed on every host, which is difficult
Network-based IDS
• The IDS only needs to be installed in LAN switches or routers
– Not too difficult in a datacenter
– Simple in the cloud
Switch-based monitoring
• Traffic between hosts must pass through a switch
• A monitor can be placed at each switch
• Must be integrated into the switch
• Many monitors are required
• Difficult to change monitors (since each switch would have to be changed)
• Monitors all traffic
Router-based monitoring
• Traffic from/to the internet must pass through a router
• A monitor can be placed in front of the router
• Does not need to be integrated into the router
• Only a single device is needed
• Does not capture all traffic
• An integrated monitor can monitor traffic between subnets
NetFlow
• Standard for monitoring flows
– as opposed to deep packet inspection, which is covered soon
• A flow
– Defined by the end host IPs and the end host ports and protocol
– E.g., when you download a web page, each image is an individual flow
– A Skype call is a flow
– Streaming a movie is a flow
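The flow definition above (end host IPs, end host ports, and protocol) can be illustrated by grouping packets on that 5-tuple. This is a simplified sketch: the `Packet` record and its field names are assumptions, and real NetFlow records carry more fields than the two counters kept here.

```python
from collections import defaultdict, namedtuple

# Hypothetical, simplified packet record.
Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port protocol size")


def aggregate_flows(packets):
    """Group packets into flows keyed by the 5-tuple; count packets and bytes."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)
```

Note that the key is directional in this sketch, so a request and its reply are counted as two separate flows.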
Wireshark example
• Monitoring IP addresses and TCP ports is most popular and easiest
– Can track flows (NetFlow)
• Monitoring anything else is referred to as deep packet inspection
– Monitor HTTP requests: which web page is requested (recall the Markov chain model of web
surfing)
– Monitor the application: classify the application as web (http), email, bit-torrent, skype, etc.
– Monitor email attachments for malware or phishing
• Requires reconstructing a message from many packets. Requires memory and computational resources
Flow vs. packet level monitoring
Flow level monitoring
• Detect TCP-SYN attacks
– Too many TCP-SYNs and not enough TCP-SYN-ACKs
– Too many network error messages
• Detect scanning
– Too many connections
– Too many network error messages
• Detect machines running backdoors
• Abnormal connections
– Many machines connecting
– from odd places
– at odd times of day
– E.g., Shamoon
• Attacked 30,000 machines in a Saudi oil company
• Infected machines communicated with a particular command-and-control server
• Easy to detect this behavior
• Detect machines in botnet
– Track communication: as server, connect to unusual machines
• Monitor surfing of unauthorized web sites
• Detect access control breach
– Detect that a host is connecting to a machine that it does not have access to
• Access control policy is not applied to machines, but to people/users
Packet level monitoring
• Detect applications
– Detect bit-torrent, Skype
• Skype is encrypted, but a signature is possible, including that the host connects to a Skype server
• Detect ARP-based attacks
• Detect machines in botnet
– Detect IRC commands/patterns
• Malware detection
– Check signature (hash or regular expression) on packet or message payload
• Phishing detection
– Detect links in email
• Computationally expensive
– Packets cross an enterprise switch at many Gb/s or even Tb/s
– Tracking messages requires specialized
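The flow-level TCP-SYN check ("too many TCP-SYNs and not enough TCP-SYN-ACKs") reduces to comparing two counters over a measurement interval. The function name and both default thresholds below are hypothetical and would need tuning for a real network.

```python
def syn_attack_suspected(syn_count, syn_ack_count, min_syns=1000, max_answered_ratio=0.5):
    """Flag an interval with many SYNs of which only a small fraction were answered."""
    if syn_count < min_syns:
        return False  # too little traffic to draw any conclusion
    return syn_ack_count / syn_count < max_answered_ratio
```

The `min_syns` guard avoids alerts on quiet links, where a handful of unanswered SYNs is more likely a down server than an attack.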
Summary
• Host-based vs Network-based
• Switch-based vs router-based
• Flow-based vs deep packet inspection
LAN-based IPS/IDS
• Switch-based
• ASICs
– high throughput
– Deep packet inspection
– Message reconstruction
– Specialized regular expression checking
(checking for patterns)
• Check for .*45.*27.*, where .* is any string
• Difficult. Once a 0x45 is detected, some state
is saved. This state needs to be saved for a
long time (e.g., until the end of the message)
• A packet can partially match many patterns
at the same time (causing the pattern
matcher to be overwhelmed)
• Centralized director-notifier (LANsight)
• Identity-based
– User logs in, so the user is known
– Network-based access control based on
user
• Which user can access which systems
– Identity-based IPS/IDS
• A profile (i.e., counts and thresholds) for
each user
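The state-keeping difficulty in the pattern check above can be seen in a small software sketch. It reads the pattern as "a byte 0x45 followed, anywhere later in the message, by a byte 0x27", which is an interpretation of the slide, and the class name is an assumption. The matcher must remember that 0x45 was seen until the message ends, even when the message spans many packets.

```python
class StreamingMatcher:
    """Match 'byte 0x45 ... later byte 0x27' over a message split into packets."""

    def __init__(self):
        self.seen_first = False  # state that must survive between packets
        self.matched = False

    def feed(self, payload: bytes) -> bool:
        """Process one packet payload; return True once the pattern has matched."""
        if self.matched:
            return True
        for byte in payload:
            if not self.seen_first:
                if byte == 0x45:
                    self.seen_first = True
            elif byte == 0x27:
                self.matched = True
                break
        return self.matched
```

One such object is needed per message and per pattern, which hints at why a switch handling many concurrent messages and patterns needs specialized hardware for this.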