TorScan: Attacks on Anonymity in Tor
Introduction
Anonymity clearly was not a concern when the Internet Protocol was designed.
Hence it comes as no surprise that internet communications are traceable. Today, the consequences of linking your traffic profile to your persona vary: they
range from ISPs selling your aggregated web browsing history to marketers in
democratic countries to being imprisoned for criticizing the government online in
countries with repressive regimes. For many people, the first approach to hiding
their identity is a public proxy server. This, however, is no panacea: the owner
of the proxy can be forced to reveal any logs it may have stored or, even worse,
the server may turn out to be a honeypot run by the organization you are trying to
hide from.
A better solution is to forward traffic through a chain of network nodes,
so-called relays. This idea was developed by David Chaum in 1981 in his seminal paper [3] on mix networks. A mix is a fundamental building block in an
anonymity network that hides the input/output relationship of the messages
that pass through it. A number of implementations of mix networks have been
built, notably Mixmaster and Mixminion for making email communication untraceable; moreover mix networks are used in a number of e-voting protocols. To
guarantee good anonymity, a mix network usually delays messages and inserts
chaff. This however is not acceptable for low-latency communications.
In 1996, Goldschlag, Reed and Syverson [5] presented Onion Routing, a design limiting traffic analysis on low-latency communication that was inspired
by Chaum's mix networks. Tor is the refined successor of the original Onion
Routing Project. The Tor network is a low-latency anonymity network which, at
the time of writing, comprised 2500-3000 routers, with an estimated number of
daily users (unique IPs) exceeding 400,000. Tor tries hard to achieve low traffic
latency to provide a good user experience, thus sacrificing some anonymity for
performance. To keep latency low and network throughput high, Tor relays do
not delay incoming messages and do not use padding.
One way to undermine the anonymity of a Tor user is to reveal the pair of
the corresponding entry and exit node; this is supposed to be hard. Once the
correspondence between the entry and exit nodes is known, the anonymity of the
observed connection is reduced to the case of two known sequentially connected
proxies, or to the case of a single proxy if the attacker controls the exit node.
Though this will not allow us to immediately determine the actual originator of
the connection, it is already a significant information leak because triplets of
guard nodes can serve as unique user identifiers within the Tor network, and also
because knowing the entry node tells the attacker where to target next. Namely,
other attacks may be launched to compromise the entry node, or the entry node's
operator/ISP could be presented with legal demands to reveal the network logs.
Given that the exit node is known, the probability of correctly guessing the entry
node is 1/n, where n is the number of guards in the Tor network. For an adversary
with less visibility than a global passive adversary, and a fully connected network,
increasing this probability is far from straightforward. In reality, however, not all entry
and exit nodes are connected via three-hop paths (the default for Tor) at
a given point in time. This observation can become the basis of several novel
attacks on Tor, as will be shown in this paper. The main contributions of this
paper are:
(i) We present two ways to reveal the connectivity of nodes in the Tor network:
one using canonical connections which are a part of the Tor specification; the
other is a more generic technique, namely a timing attack on the connection
establishment between two relays.
(ii) We present novel attacks which are based on the connectivity scanning approach. The first attack allows an attacker to identify the guard node which was used in
a circuit carrying a long-lived connection such as an SSH session or a large
file download. The second attack, which we have chosen to call the differential
scan attack, uses recurrent connections to reveal all guard nodes of a user.
(iii) We give some guidance on countermeasures that can be implemented to
make the Tor network more resilient to leakage of topology information.
The rest of the paper is organized as follows: in the next section, we summarize
aspects of the Tor specification which are relevant for the connectivity scanning
techniques and for the description of our attacks. Thereafter we give a short
overview of previous attacks on Tor. We describe our techniques for revealing
the connectivity of Tor relays in Section 3. In Section 4.1, we describe our attack
on long-lived streams. The differential scan attack is described in Section 4.2.
An analysis of the attacks is performed in Section 5. We discuss the potential
countermeasures in Section 6 and conclude in Section 7.
Background
the long-lived stream. In other words, a circuit is not destroyed as long as at least one
stream is attached to it. In a similar way, a TLS connection between two Tor
relays is not closed as long as it carries at least one circuit. A TLS connection without
circuits between two Tor routers lives for three minutes. There is one exception
to the rule: a circuit which has never carried a stream (a clean circuit) lives for
1 hour.
When a pair of Tor routers or a Tor router and a client have several circuits
between them, they try to tunnel them over a single TLS connection. In Figure
1 communication between two Tor routers is shown. The routers use a single
TLS connection (which is also called Onion Routing connection) which carries a
number of circuits, two in this picture (which may belong to different end users).
Multiple streams of one user may be multiplexed over a single circuit.
[Figure 1: routers R1 and R2 joined by a single TLS connection that carries two circuits, each multiplexing several streams.]
2.1 Related papers
attacker to have the global view of the network which is needed by a number of
passive traffic analysis attacks. In addition, the attacks presented in this paper
are orthogonal to the previous attacks and thus can be used to improve some
existing attacks, making them more practical by reducing the traffic costs or the
number of monitored nodes (e.g. Murdoch's attack [8]). Finally, the attacks
presented here do not rely on the details of a particular user application or
protocol.
Consider an attacker who wants to link the exit and the guard node of a circuit
and thus decrease the anonymity of the user. Given the Tor network connectivity information, she can determine possible 3-hop paths from the exit node to
the set of guard nodes and eliminate those which are impossible, thus already
decreasing the claimed anonymity of the Tor network to some extent. However, the
decrease of anonymity depends on the connectivity of the exit router as well as
on the connectivity of its adjacent routers. Even for low-bandwidth routers,
connectivity at a given point in time can be as high as 120-300. For routers
among the 10% fastest, the connectivity may be higher than 1500.
Thus, exploiting Tor topology at just one point in time may not be sufficient.
A much more efficient way would be to observe Tor connectivity changes over
time. Indeed, an application that requires a persistent connection will force the
routers in the circuit to maintain a connection between them for at least the application's lifetime. An attacker who wants to trace such a communication
needs to observe the exit node for a while and eliminate routers to which it loses
connections. On the other hand, if the user's application drops a connection, an
attacker may observe a new defect in the topology and link this defect with
the user's application (note that if the attacker controls the exit node, she can
cause the connection to drop). In this way, we come to a simple but powerful
idea: observation of local Tor network connectivity dynamics gives us a way to
decrease the anonymity provided by Tor; more specifically, to trace long-lived
(or persistent) connections and to reveal short-lived connections.
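The static elimination step described above (keeping only those guards that can be reached from the exit node through an existing middle-node connection) can be sketched as follows. This is a minimal illustration, not the code used in the paper: the function name `candidate_guards` and the `links` mapping (relay name to the set of relays it currently has a TLS connection to) are hypothetical and assume the connectivity has already been obtained by scanning.

```python
from typing import Dict, Set


def candidate_guards(links: Dict[str, Set[str]], exit_node: str,
                     guards: Set[str]) -> Set[str]:
    """Return guards reachable from exit_node via exit -> middle -> guard links.

    `links` is a hypothetical snapshot of the scanned connectivity: it maps a
    relay name to the set of relays it currently has a TLS connection to.
    """
    candidates: Set[str] = set()
    for middle in links.get(exit_node, set()):
        # A guard stays a candidate only if some middle node links it to the exit.
        candidates |= links.get(middle, set()) & guards
    # The exit node itself cannot be the guard of its own circuit.
    candidates.discard(exit_node)
    return candidates
```

With connectivity in the hundreds per relay, such a single snapshot typically leaves a large candidate set, which is why the text turns to observing changes over time.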
3.1
We will now show how an attacker can scan a Tor relay to find out what TLS
connections are established between it and other relays. To explain how this
works, we first have to delve into details of the Tor specification. In order to
prevent an attacker from forcing a relay to open a new TLS connection for each
extend request, a Tor relay uses an existing connection (if any) corresponding
to the fingerprint specified in the extend request, no matter what IP address was
indicated. This could potentially allow a malicious party to perform a man-in-the-middle attack. For two relays R1 and R2, the attacker would send an extend
request with a forged IP address X to R1 before other circuits (and hence a
connection) are established between R1 and R2. If the machine at IP address
X were then to connect to R2 and forward all of the traffic it received from
R1 to R2 and vice versa, it could perform a byte-counting attack. To prevent
this from happening, Tor uses a countermeasure called canonical connections.
Briefly, a connection to a router is canonical if the destination IP address of
this connection corresponds to the one in the consensus. If a Tor relay gets an
extend request with a fingerprint, it should use an existing canonical connection
corresponding to this fingerprint.
We noticed that canonical connections give an attacker a convenient way to
determine how routers in the Tor network are connected to each other. When
sending a RELAY_EXTEND cell, the circuit originator specifies both the identity
fingerprint and the IP address of the router to which the circuit should be extended.
Assume that the attacker wants to figure out whether a router A is connected
to a router B. In order to do this, the attacker forges a Tor RELAY_EXTEND cell
with the fingerprint of router B and [Link] with an unreachable port (port
1 for example) and sends it to router A. When the cell is received, the reaction
of router A depends on whether it has a connection to router B:
If A has a canonical connection to B (it should be noted that if a connection
exists it is almost always canonical), router A ignores the IP address from the
forged RELAY_EXTEND cell, uses the already established TLS connection,
extends the circuit and sends back a RELAY_EXTENDED cell.
If A does not have a connection to B, it tries to make a new TLS
connection using the address from the received cell. The connection attempt is refused, which causes router A to send a DESTROY cell to the
attacker.
By inspecting the cell she receives back from router A, the attacker can determine
whether router A is connected to router B. Evidently, the attacker can probe
router A for a connection with any router contained in the consensus (see footnote 3).
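The decision rule of the canonical-connection probe can be sketched as follows. This is only an illustration of the logic described above, not our scanning client: the `Reply` type, `scan_router` and the `send_forged_extend` callable are hypothetical names, and the callable is assumed to send one forged extend request to router A for a given fingerprint and report which cell came back.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class Reply(Enum):
    """Cells the prober may receive in response to a forged extend request."""
    EXTENDED = "RELAY_EXTENDED"  # router A reused an existing connection
    DESTROY = "DESTROY"          # router A tried the bogus address and failed


def is_connected(reply: Reply) -> bool:
    """A circuit extension that succeeds implies an existing connection."""
    return reply is Reply.EXTENDED


def scan_router(fingerprints: Iterable[str],
                send_forged_extend: Callable[[str], Reply]) -> Dict[str, bool]:
    """Probe router A once per fingerprint and record the inferred connectivity.

    `send_forged_extend` is a hypothetical transport function supplied by the
    caller; it is not part of any real Tor library.
    """
    return {fp: is_connected(send_forged_extend(fp)) for fp in fingerprints}
```

Separating the decision rule from the transport keeps the sketch testable with a stub in place of a real Tor connection.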
3.2
Footnote 3: By coincidence, this scanning technique can be used not only to scan the connectivity
of a Tor router, but also to scan for open ports on random IP addresses from a relay
that has an all-reject exit policy.
[Figure 2: message sequence between R1 and R2. Labels in the figure: TCP: SYN; TCP: SYN|ACK; TCP: ACK; TLS: ClientHello; TLS: ClientCertificate, ClientKeyExchange, ChangeCipherSpec, Finished; Tor: VERSIONS; Tor: NETINFO; Tor: CREATE; Tor: CREATED.]
Fig. 2. Tor circuit setup. The last two steps are always performed. Steps marked with
dashed lines are performed only when there is no TLS connection between R1 and R2.
circuit creation until the CREATE cell is received by R1. Approximately 6.5 round
trips are required for the TLS connection setup alone, and another round trip for
the v2 handshake. By sending multiple RELAY_EXTEND requests and comparing
the time the first one takes with the time subsequent ones take, we can
determine whether a relay is connected to another relay. This has been confirmed
with experiments. The disadvantage of this method is that network jitter, as
well as cell forwarding delays at the scanned relay, can add significant amounts
of noise, which makes the method less reliable. Moreover, in contrast to the
method described in the previous subsection, this method actually establishes
TLS connections to all routers that are scanned and does not just prolong the lifetimes
of the connections that are already open.
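The comparison of the first extend request against subsequent ones can be sketched as a simple threshold test. This is a minimal illustration under stated assumptions: the function name and the `threshold_s` parameter are hypothetical, the median of the later measurements is our choice of baseline, and a suitable threshold would have to be calibrated against the round-trip time to the scanned relay.

```python
from statistics import median
from typing import Sequence


def was_already_connected(first_s: float, later_s: Sequence[float],
                          threshold_s: float) -> bool:
    """Guess whether the scanned relay already had a TLS connection to the target.

    first_s     -- time taken by the first extend request, in seconds
    later_s     -- times taken by subsequent extend requests, in seconds
    threshold_s -- assumed minimum extra delay caused by a fresh TLS setup
    """
    if not later_s:
        raise ValueError("need at least one later measurement as a baseline")
    # Later requests reuse the connection, so their median serves as baseline.
    baseline = median(later_s)
    # A first request that is not noticeably slower suggests that no new
    # TLS connection had to be set up.
    return (first_s - baseline) < threshold_s
```

Using the median rather than the mean makes the baseline less sensitive to single outliers caused by jitter, although it does not remove the noise problem mentioned above.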
4
4.1
Tor is used by many people to establish long-lived SSH sessions, download very
large files (sometimes using file-sharing applications, even though this is frowned
upon) and to communicate over instant messaging networks. The latter usage
of Tor is particularly important for countries with repressive regimes such as
China, Iran, or Syria: people are regularly sent to prison or worse for statements critical of their government. The use cases described above imply long-lived TCP-streams, which necessarily create long-lived TLS-connections between
the Tor routers used to carry the stream. Thus, we show how an attacker
knowing the exit node of a long-lived TCP-stream can link it with the guard
node using our scanning techniques (see footnote 4).
One-Hop Attack In this attack, we assume that the attacker controls one or
more very fast exit routers which see a significant fraction of the traffic exiting
the Tor network; thus she gets access to pseudonyms of the users (e.g. cookies,
logins). This is not an unrealistic scenario; some organizations have control over
sizable portions of the total exit traffic: according to the consensus current at the
time of writing this paper, 7.2% of total exit capacity was provided by the Chaos
Computer Club, 5.9% by [Link] and 5.4% by Formless Networking LLC.
The attacker wants to connect the pseudonyms with the guard triplets of
the users that pass through her exit relays. Assume that one of the attacker's
nodes, E (see Figure 3), is selected as the exit node of a circuit. By looking at
the traffic pattern, the attacker will be able to infer that the connection to the
exit node is likely to be long-lived. The attacker then starts the attack:
1. The attacker starts scanning the middle node M for connectivity using either
of the techniques described in the previous section. The set of connected
nodes necessarily includes the guard node G in question and makes up its
initial anonymity set.
2. Next, the attacker continues with the connectivity scanning of the middle
node for several hours or even days in the hope that the majority of the nodes of
the initial anonymity set will disconnect (nodes with dashed lines in Figure 3).
3. The attack stops when the anonymity set of the guard node is considerably
reduced or when the user closes the long-lived TCP-stream.
When the attack is finished, the user's guard node will be contained in
the resulting anonymity set (node G and another node with the solid line in
Figure 3) along with some number of other connections that can be considered
as noise. The attacker may also infer extra information from the speed of the
connection, which will indicate whether the middle or the guard node is the
bottleneck for the traffic of the long-lived circuit; this helps her to further shrink
the set of candidates for the guard node, since it allows her to discard very active
routers from the list of candidate guard nodes.
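The shrinking of the anonymity set over repeated scans of the middle node amounts to intersecting the scan results. The following is a minimal sketch, assuming each scan is available as a set of relay fingerprints; the function name is hypothetical.

```python
from typing import Iterable, Set


def persistent_anonymity_set(scans: Iterable[Set[str]]) -> Set[str]:
    """Return the relays that appear in every scan of the middle node.

    The first scan gives the initial anonymity set; each later scan removes
    the relays that have disconnected in the meantime.
    """
    iterator = iter(scans)
    try:
        remaining = set(next(iterator))
    except StopIteration:
        return set()  # no scans, hence no candidates
    for scan in iterator:
        remaining &= scan
    return remaining
```

For instance, three scans `{"G", "a", "b", "c"}`, `{"G", "a", "c"}` and `{"G", "c", "d"}` leave the candidates `{"G", "c"}`: relay `d` is ignored because it was not present in the initial set.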
Footnote 4: One important note is that in the current Tor protocol, connections between
two routers which last more than 7 days are marked as bad for new circuits, and
no new circuits can be added to such connections. However, persistent circuits inside
these connections are not closed and will continue running. At the same time, we
cannot see these persistent OR connections anymore using our probing techniques
after 7 days have elapsed.
Two-Hop Attack This attack does not require the attacker to control
any relays in the Tor network and can be performed by a server (or an attacker
close to the server) that tries to reveal the guard nodes of pseudonymous users
connecting to the server. The attack starts with connectivity scanning of the
exit node (similar to the one-hop attack) in order to reduce the anonymity set of
the middle node. After the set has been narrowed down sufficiently, the candidate
middle nodes are scanned, resulting in the anonymity set of the guard node. The
attack might be successful if either the middle or the guard node is low-bandwidth,
which the attacker might infer from the connection latency. We also
assume that the exit node is medium- or low-bandwidth. The difficulty in the two-hop
attack comes from the fact that many middle nodes reachable from the exit node
would come from a set of active routers with many connections. This will result
in hundreds of candidate guard nodes even after several days of scanning. This
effect is due to immortal connections formed between active routers,
which we will describe in Section 5. In spite of its simplicity, the described
attack is quite powerful since:
(i) it does not require control over any relays in the Tor network. The attacker
merely probes relays (probing could be also done from a distributed set of
addresses);
(ii) it is cheap in terms of bandwidth: in order to scan one router, the aggregated
amount of traffic that needs to be sent and received is less than 5 MBytes
(for the current size of 3000 routers in the Tor network);
(iii) it is fast: the average time of scanning one router is 20 seconds and scanning
of different routers can be easily parallelized.
Experimental results In order to estimate how efficient the attacks can be in
the wild, we used Python to implement a rudimentary Tor client which provides
basic functionality. The client can establish a TLS connection to an arbitrary
Tor router, complete the Diffie-Hellman key establishment protocol and send and
receive Tor relay cells. In other words, the client is able to create and extend
arbitrarily chosen circuits. Using canonical connectivity scanning, our client is
able to check a Tor router for connectivity with 99% of other routers in the Tor
network in less than 30 seconds.
In order to check the correctness of the proposed canonical connectivity scanning, we scanned two routers under our control, omicron and layercake, for five
days from February 11th until February 16th, 2012. During the experiment the
routers had bandwidth weights in the range [500-1500] for omicron and in
the range [15000-55000] for layercake, which means that the latter was in the
top 10% of the fastest and thus most frequently chosen routers. Both relays had
Guard flags and did not have Exit flags. Since the routers were operated by us,
we could gather real-time statistics directly from them using the Tor control port. We then compared the results from the canonical connectivity scan
and from the control port. Figure 5 shows the number of persistently connected
Tor routers over time, i.e. those routers which were connected to our routers at
the start of the experiment and never disconnected during the experiment. The
close match of the results in Figure 5 demonstrates that canonical connection scanning provides reliable results. The slight difference in the results
is explained by the difference in scanning frequency: for canonical connection
scanning, samples cannot be taken more often than every three minutes (i.e. the
lifetime of an idle Tor TLS connection); the data from the routers' control port,
however, was fetched every ten seconds. According to Figure 5, for the router
with bandwidth weight 1500 (omicron), the number of persistently connected
routers decayed from 303 to 20 in just 12 hours. This matches our prediction from Section 5.1. It then took 4 days for another 18 routers to disconnect.
Among the remaining two connections, there was one which we had established
ourselves and which we were trying to identify. The decay rate of persistent connections
of the high-bandwidth router (layercake) looks similar: the number of persistent
connections drops sharply from 1116 to 300 in 12 hours and then decays slowly.
We tested canonical connection scanning against several Tor routers not under our control. The result for one such router with bandwidth weight in the range
[2040-2190] is shown in Figure 6. We observed very similar behaviour: a big
chunk of connections drops quickly, and then the number decays slowly. After two days of
scanning, we found 12 persistent connections.
[Fig. 5: number of connections over time (11 Feb 14:18 to 15 Feb 14:18) for the 1500-bw and 36000-bw routers, control-port measurements and canonical-connections probing.]
[Fig. 6: number of connections over time (13 Feb 09:23 to 15 Feb 09:34) for the 2170-bw router, canonical-connections probing.]
4.2
Attack description Consider a user who periodically checks some web server,
or a web service that instructs the user's browser to periodically re-establish
streams. Google Mail, for instance, builds a series of short-lived (around 2 minutes)
TCP sessions. Another example is news web sites with auto-refreshing content.
In this section, we describe an attack on this kind of recurrent connection. The
aim of the attacker is to find at least one of the guard nodes of a pseudonymous
user (identified by a cookie or a login credential) who uses such a service for
several days. Note that this attack does not require a single long-lived circuit
or session. It just requires that a Tor client is connected to the Tor network for
a non-negligible amount of time within the span of a month (as long as the guards
are still valid).
Similar to Section 4.1, in this attack the attacker has control over a significant
fraction of the exit capacity of the Tor network. Assume that a user visits a web
server S (see Figure 4) that causes recurrent connections to occur. Ten minutes
after the first connection, his initial circuit should expire and the user's Tor client
will try to build a new circuit. Given a sufficient number of exit nodes controlled
by the attacker, the circuit will include one of the attacker's exit nodes E. Once
the exit node receives incoming traffic destined to the web server, the
following sequence of steps is executed:
1. The exit node E, observing the stream to the web server, determines the
middle node M of the circuit that caused the stream to be established and
transmits it to the attacker.
2. The attacker probes the connectivity of M and remembers the list of routers
connected to it (nodes connected to M with both dashed and solid lines in
Figure 4).
3. E sends a DESTROY cell (see footnote 5) down the circuit, which leads to the circuit termination. The circuit termination may lead to the termination of the connection
between the middle node and the user's guard node with some probability,
which can be estimated using expressions from Section 5.2.
4. The attacker waits for three minutes and starts the scan of M again.
5. The attacker computes the difference between the sets obtained via the first
and the second scans, i.e. she determines connections which were present in
the first list but absent in the second (node G and another node with a dashed
line). We say that we have a differential with nodes G and M if G is in the
difference.
6. The attacker then repeats steps 1-3 each time one of her exit nodes is chosen
for the recurrent connection.
7. Once the attacker has performed the above steps often enough, and given
that the circuit closure frequently caused the connection's closure, she
can derive the user's three guard nodes: the probability of having the guard
node in the difference should converge to 1/3.
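The set difference of step 5 and the frequency counting implied by step 7 can be sketched as follows. This is a minimal illustration rather than our proof-of-concept implementation: the function names are hypothetical, and each trial is assumed to be available as a pair of fingerprint sets (the scan of M before and after the circuit was torn down).

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def differential(before: Set[str], after: Set[str]) -> Set[str]:
    """Connections present in the first scan but absent in the second."""
    return before - after


def rank_guard_candidates(trials: Iterable[Tuple[Set[str], Set[str]]],
                          guards: Set[str]) -> List[Tuple[str, int]]:
    """Count how often each guard relay appears in a differential.

    `guards` is the set of relays carrying the Guard flag; other relays in
    the difference cannot be the user's guard and are ignored.
    """
    counts: Counter = Counter()
    for before, after in trials:
        counts.update(differential(before, after) & guards)
    # Most frequently observed candidates first.
    return counts.most_common()
```

Relays that disconnect for unrelated reasons show up in individual differentials as noise, so only the accumulated counts over many trials are informative.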
Footnote 5: If the attacker wants to be more stealthy, she can just wait until the circuit expires
by itself.
This attack may be further enhanced by scanning the full network at regular and
frequent intervals. Then, if the connection to the malicious exit arrives shortly
after the full network scan, the attacker will have additional differential connectivity information with which to filter the noise. Our experiments have shown that
the full network scan can be done in 3 minutes using 20 hosts (using the Amazon
EC2 service, a day of full network scans with 3 minutes between scans costs
around 80 USD).
A similar but less stealthy approach can be used to track any user's connection. Assume that a user connecting to a server chose one of the attacker's exit
nodes. This allows the attacker to incorporate a small piece of code in each HTML
document requested by the user, which artificially creates recurrent connections.
Specifically, the user can be redirected to an arbitrary address and port. Note
that in the current Tor network, the aggregated exit bandwidth differs between
ports; thus, by choosing the appropriate port range, the attacker can increase
the probability that her exit node is chosen: at the time of the experiment, total
exit capacity was approximately 5 × 10^6 Kbytes/s, while the bandwidth capacity of
scarce ports was about 1.2 × 10^6 Kbytes/s.
Experimental results We have implemented a proof-of-concept version of our
differential scanning technique and have tested it using sets of paths generated
by a modified version of the Tor client; this client does not create any circuits
but simply outputs randomly generated paths with user-specified constraints.
These paths are then used to build circuits through the control port of the Tor
daemon. After a circuit has been built, a scan is conducted, then the circuit
is torn down, the program waits for 200 seconds and scans again. To perform
experiments more quickly, we have implemented this in a parallelized manner
on Amazon's EC2 platform so that many (non-interfering) experiments can be
conducted in parallel. As a first experiment, we used only one guard node with
a capacity of 36500 and allowed for middle nodes with a capacity of 1600 or lower in
the consensus. For 150 paths, 125 successful differential scans were performed.
In these, the guard node we had selected appeared at position 1 in the list of most
frequently occurring guard relays in the difference sets, having been counted 58
times.
 1. C37B234FAD013453B90375EB55864FEBC876104A: 58 (PPrivCom052) bw=36500
 2. CA1CF70F4E6AF9172E6E743AC5F1E918FFE2B476: 35 (spfTOR3) bw=29800
 3. 0B7ED44C67DBE50313F0B32BD335D093D0474CE8: 33 (bauruine2) bw=117000
 4. 847B1F850344D7876491A54892F904934E4EB85D: 31 (tor26) bw=20
 5. DB8C6D8E0D51A42BDDA81A9B8A735B41B2CF95D1: 30 (rainbowwarrior) bw=81300
 6. 173B220F9F32F39086D5661274A47485EDA26131: 29 (TorExitProgressbar9) bw=650
 7. 1603DFE9FC373ECDA39046FADB5A76B87A4BA36B: 27 (StickItToTheMan) bw=46800
 8. 1F52D692FA2C21B23FAD4D711A7BF17BAE2673DF: 26 (alice) bw=7170
 9. 47916CAB5878C810E7EF71A316D37FC823CC7F52: 26 (CCN) bw=53100
10. 95A0D58710EA9B61DAD3A01CAD3BE77DACA76BEF: 25 (OccupyMyPants) bw=30300
This shows that differential probing works in practice: there is a drastic reduction in the anonymity set of the guard nodes, even for high-capacity guard
nodes. Below is the concrete data of one of the experiments, in which we had
chosen guards of capacity 300, 412, and 501, constrained the capacity of the
middle nodes to 30,000 and scanned different middle nodes in 134 trials:
 1. A58E0F05C1939725D7247BA60BA3135DB88209BC: 43 (jefOlewkia), bw = 501
 2. D3378ABA009078158DB59E8B36B8EBB88B309BA7: 40 (torn0t), bw = 412
 3. 2629979FD21BF3B522E818B73F6F8D0B5D8A5CF0: 40 (tapir), bw = 300
 4. A9C039A5FD02FCA06303DCFAABE25C5912C63B26: 29 (chaoscomputerclub5), bw = 173000
 5. FA486415B86D28CD047D10F76768E4E88A182F71: 28 (ZhangPoland1), bw = 56400
 6. 131B60B9AFE6AEA60042132D648798534ABEA07E: 28 (wagtail), bw = 24400
 7. 4536ED68D9DB4B2FF532AD43A632AAF600B798CC: 27 (Unnamed), bw = 116
 8. 1D8625690AB9729FB2040D8194EC0D6789A4D092: 25 (TOR1CINIPAC), bw = 43900
 9. FC35DE87F6E4022693323275F6B8EE5F72FD21B5: 24 (Unzane), bw = 3160
10. CA1CF70F4E6AF9172E6E743AC5F1E918FFE2B476: 23 (spfTOR3), bw = 28700
5.1 Long-lived connections
In Section 4.1, one could notice that after a relatively short period of scanning,
a substantial number of routers which were connected at the beginning of
the scan had disconnected. The decay rate of the number of persistently connected
routers becomes very low afterwards. In other words, when the number of connections
drops to some value, the reduction rate of the anonymity set of the guard node
an attacker is trying to identify becomes negligible. This value can be considered
as a threshold for this attack, which we try to estimate in this section. There are
two main reasons why a connection between a pair of routers may last very
long and thus increase the anonymity set:
1. this pair of routers is a part of a long-lived circuit similar to the one the
attacker tries to identify, i.e. a circuit which is used for an application which
requires a long-lived TCP-stream;
2. the circuit creation rate over this connection is high and there is always at
least one circuit inside this connection which prevents it from closing. Such
immortal connections form if the product of bandwidths of the two routers
exceeds a certain threshold as will be shown below.
First, we estimate the number of connections of the first type. Figures 7 and
8 show circuit duration distributions over a connection between two high-bandwidth routers (layercake with bandwidth weight of 35300 and bouazizi with
bandwidth weight of 69700 on 13 Feb 2012). Figures 9 and 10 show circuit
duration distributions over a connection between a high-bandwidth router and
a non-high-bandwidth router (layercake and omicron, the latter with bandwidth 491
on 13 Feb 2012). Lifetimes of circuits have two clear peaks at around 10 and 60
minutes due to properties of the Tor protocol: the renewal time of dirty circuits
and the lifetime of clean circuits which have never been marked as dirty.
According to the measurement, circuits with a lifetime longer than 2 hours constitute less than 1.5% of the total number of circuits. From this we can assume
that the majority of long-lived connections in Tor are of the second type, i.e.
formed by a high circuit creation rate over these connections. Another observation is that the anonymity set of persistent streams is small compared to the
anonymity set of non-persistent streams.
[Figs. 7-10: circuit duration distributions (vertical axis: Circuits, %; horizontal axis ticks from 0 to 6000, up to 6600 in the last panel).]
circuit per ten seconds gathered during two days on one of our active routers.
We observed that:
(i) Circuits arrive according to a non-homogeneous Poisson process.
(ii) Assuming that the client circuit arrival rate is proportional to the bandwidth of the guard
routers, we estimate the average circuit arrival rate R in the whole Tor
network to be about 900 circuits per second (not at peak times). In the
expressions below one can also use the value of the circuit arrival rate for the
specific time of day instead of the average value.
(iii) The average circuit duration $t_{avg}$ is about 200 seconds, which varies
only slightly for routers with different bandwidth weights.
Fig. 11. Circuit arrival rate for an active high bandwidth router (horizontal axis: Time, GMT, from 25 Jan 14:16 to 27 Jan 06:17).
[Residue of a further plot with vertical axis Probability, % and horizontal ticks at 25000 and 50000; caption not recoverable.]
We now estimate the probability that a pair of routers A and B is connected with an almost immortal connection. Note that a TLS-connection between
Tor relays is closed only if no circuits were carried over this connection for three
minutes. In other words, for a connection to stay open, the time between arrivals
of two consecutive circuits should not exceed the average circuit duration plus 3
minutes. Denote by t the time of the attack. Then during this time, $t \cdot R \cdot p_{a,b}$
new circuits will arrive. Here $p_{a,b}$ is the probability that routers A and B form
an edge in a new circuit:
$$p_{a,b} = 2 \, \frac{bw_a \, bw_b}{bw_{total}} \left( \frac{1}{bw_{guards}} + \frac{1}{bw_{exit}} \right),$$
where $bw_{guards}$ is the total bandwidth of guard nodes, $bw_{exit}$ is the total bandwidth of exit nodes, $bw_{total}$ is the total bandwidth of the whole Tor network, and $bw_a$
and $bw_b$ are the bandwidths of routers A and B respectively. Taking into account
that circuits arrive according to a Poisson process, the probability of having
an immortal connection can be computed using the following expression:
$$P_{immortal}(A, B) = \left(1 - e^{-R (t_{avg} + t_{idle}) p_{a,b}}\right)^{t R p_{a,b}},$$
where $t_{idle} = 180$ seconds. A connection between A and B almost never closes if
$P_{immortal}(A, B)$ is close to 1. Using this expression we find that immortal connections are formed between routers of bandwidth > 17,500 (or routers with a
product of bandwidths above 300 million). By bandwidth we mean not the advertised bandwidth but the actual figures from the Consensus, which are computed from the Tor authorities' bandwidth measurements and used in the Tor code to choose routers
for circuits. Given the bandwidth of a router, an attacker can estimate the
number of immortal connections that it has and decide whether it is worthwhile
to perform the attack.
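The two expressions above can be evaluated numerically. The following Python sketch is only an illustration under stated assumptions: the total bandwidth is the figure quoted in footnote 11, while the guard and exit totals (`BW_GUARDS`, `BW_EXIT`) and the 10-hour attack time are hypothetical values chosen for the example, and the function names are ours.

```python
import math

def edge_probability(bw_a, bw_b, bw_total, bw_guards, bw_exit):
    """Approximate probability p_ab that relays A and B form an edge in a new circuit."""
    return 2.0 * (bw_a * bw_b / bw_total) * (1.0 / bw_guards + 1.0 / bw_exit)

def immortal_probability(p_ab, attack_time, rate=900.0, t_avg=200.0, t_idle=180.0):
    """P_immortal: probability that the A-B connection stays open for attack_time seconds."""
    lam = rate * p_ab                                   # circuit arrival rate on the A-B link
    p_gap_short = 1.0 - math.exp(-lam * (t_avg + t_idle))
    expected_circuits = attack_time * lam               # t * R * p_ab
    return p_gap_short ** expected_circuits

BW_TOTAL = 9_458_556    # total capacity quoted in footnote 11
BW_GUARDS = 4_000_000   # hypothetical guard total, for illustration only
BW_EXIT = 3_000_000     # hypothetical exit total, for illustration only
TEN_HOURS = 10 * 3600   # hypothetical attack time in seconds

p_high = edge_probability(17_500, 17_500, BW_TOTAL, BW_GUARDS, BW_EXIT)
p_low = edge_probability(1_000, 1_000, BW_TOTAL, BW_GUARDS, BW_EXIT)
high = immortal_probability(p_high, TEN_HOURS)
low = immortal_probability(p_low, TEN_HOURS)
print(f"two relays of bandwidth 17,500: P_immortal = {high:.4f}")
print(f"two relays of bandwidth 1,000:  P_immortal = {low:.2e}")
```

With these assumed totals, a pair of relays at the 17,500 threshold yields a value close to 1, while a pair of low-bandwidth relays yields a value close to 0. The exact numbers depend on the assumed guard and exit totals.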
Figure 14 shows the complementary cumulative bandwidth distribution of Tor
relays along with the share (i.e. the percentage of the total number of Tor relays) of
persistent connections for each bandwidth (see footnote 10). Note that the bandwidth distribution of
Tor relays changed only slightly during February and March 2012. For example,
if an attacker decides to scan a Tor relay with a bandwidth weight of 5000, she
can expect this relay to have immortal connections with about 1% of Tor relays. Given 3000
Tor relays, this yields an anonymity set of 30 relays. This kind of prediction
corresponds well with the experimental results obtained in Section 4.1. If bw <
1300, the attack should give a unique solution (see footnote 11). Note that although only a few
routers have a large percentage of immortal connections, these routers are high-bandwidth and are selected more frequently.
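The arithmetic behind this prediction can be sketched in a few lines of Python. The sketch uses the bandwidth-product threshold of 300 million mentioned above as a simple cut-off; the peer bandwidth list is synthetic and the function names are hypothetical, so this illustrates the estimate rather than the measurement procedure.

```python
def immortal_peer_share(target_bw, peer_bws, product_threshold=300e6):
    """Fraction of peers whose bandwidth product with the target exceeds the threshold."""
    if not peer_bws:
        return 0.0
    immortal = sum(1 for bw in peer_bws if target_bw * bw > product_threshold)
    return immortal / len(peer_bws)

def anonymity_set_size(share, n_relays):
    """Expected number of relays that remain candidates after the scan."""
    return round(share * n_relays)

# Synthetic network for illustration: 90 low-bandwidth and 10 high-bandwidth peers.
synthetic_peers = [500] * 90 + [70_000] * 10

share_5000 = immortal_peer_share(5_000, synthetic_peers)
share_1000 = immortal_peer_share(1_000, synthetic_peers)
print(share_5000, share_1000)

# The example from the text: a 1% share among 3000 relays.
print(anonymity_set_size(0.01, 3000))
```

In this toy network, a relay with bandwidth 5000 exceeds the threshold only with the 70,000-bandwidth peers, while a relay with bandwidth 1000 exceeds it with none.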
9. This expression for $p_{a,b}$ is an approximation since it does not take into account
all peculiarities of the Tor path selection algorithm; in particular, the expression
ignores weights which are assigned to a relay based on its position in the circuit and
its flags. We compared our approximation with the precise calculation and found
that the simpler approximation is sufficient for our purposes and makes the analysis
easier to understand.
10. Note that the bandwidth distribution can be approximated by a Pareto distribution
with minimal value $x_m = 350$ and exponent 0.85.
11. On 11 February 2012, 17:00, there were 2388 nodes out of 2897 with bandwidth
less than 1300. Their aggregated capacity was 371,159 out of the 9,458,556 total capacity
of the whole Tor network.
In order to give a first-order approximation of how long we should wait until
a persistent connection is detectable among other non-immortal connections,
we collected connection duration statistics from Tor routers operated by us for
7 days (see footnote 12). Figure 13 shows the connection duration distribution for two pairs of
routers: medium-to-medium bandwidth (lower curve, in green) and medium-to-high
bandwidth (in red). For medium-to-high, only 5% of connections have a duration
of more than three hours. For medium-to-medium bandwidth routers,
only 5% of connections have a duration of more than
1 hour. In ten hours, 99% of all non-immortal connections should disconnect in
both cases. Thus, we expect that if a persistent connection under observation has
a duration of more than 10 hours, the probability of its successful identification
depends mostly on the number of immortal connections.
[Fig. 13: connection duration distribution; vertical axis "Connections, %".]
[Fig. 14: "BW distribution" and "Percentage of immortal connections" against bandwidth; vertical axis "% of routers".]
5.2
In this section, we explore the limits of the differential scan attack. Assume that
an attacker tries to reveal a guard node $g$ by observing circuits $\{c_1, ..., c_k\}$, which
leads to scanning a set of middle nodes $M = \{m_{c_1}, m_{c_2}, ..., m_{c_k}\}$. Let $T$ denote
the set of all Tor relays and $|T| = n$. Then we define $d : M \times T \to \{0, 1\}$ in the
following way:
$$d(m_{c_i}, r) = \begin{cases} 1 & \text{if we observed a differential between Tor relays } m_{c_i} \text{ and } r \text{ for circuit } c_i, \\ 0 & \text{otherwise.} \end{cases}$$
The success of the attack depends on: (1) $Signal = \sum_{i=1}^{k} d(m_{c_i}, g)$, i.e. the number
of differentials with guard node $g$, and (2) $Noise_{r_j} = \sum_{i=1}^{k} d(m_{c_i}, r_j)$, the number
of differentials with some other Tor relay $r_j$, $j = 1, ..., n$. We then use the signal-to-noise ratio
$$SNR = \frac{Signal}{\max_j \{Noise_{r_j}\}}$$
as a measure of the success of the attack.
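As a minimal sketch of how these quantities could be tallied, the Python below counts differentials per relay and computes the ratio. The input format (one set of relay names per scanned circuit) and all names are assumptions made for the example, not a description of the tooling used in the experiments.

```python
from collections import Counter

def differential_counts(observations):
    """Count, per relay r, the circuits c_i for which d(m_ci, r) = 1.

    observations: one collection per scanned circuit, holding the relays that
    showed a differential with that circuit's middle node (assumed input format).
    """
    counts = Counter()
    for relays in observations:
        counts.update(set(relays))  # each relay counts at most once per circuit
    return counts

def signal_to_noise(counts, guard):
    """SNR = Signal / max_j Noise_rj, with the maximum taken over relays other than the guard."""
    signal = counts.get(guard, 0)
    noise = max((c for relay, c in counts.items() if relay != guard), default=0)
    if noise == 0:
        return float("inf") if signal > 0 else 0.0
    return signal / noise

# Toy data: the guard "g" shows a differential in three of four circuits.
observations = [{"g", "r1"}, {"g"}, {"g", "r2"}, {"r1"}]
counts = differential_counts(observations)
best_candidate = counts.most_common(1)[0][0]
print(best_candidate, signal_to_noise(counts, "g"))
```

An attacker does not know $g$ in advance, so in practice the relay with the highest count is the natural candidate. The ratio then indicates how clearly that candidate stands out from the rest.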
12. The logs we obtained were stored on computers with full-disk encryption behind the
firewall of our academic institution.
We first estimate the Signal and $\mathrm{Prob}[d(m_{c_i}, g) = 1]$. Denote by $t_0$ the time
when $c_i$ was destroyed. $d(m_{c_i}, g) = 1$ iff the connection which carried $c_i$ closes
3 minutes after $c_i$ is destroyed. This happens if no new circuit with duration $t$
arrives during $[t_0 - t; t_0]$ and no circuits arrive during $[t_0; t_0 + t_{idle}]$. Let $f(t)$ be
the probability density function of the circuit duration. Then, given that
circuits arrive according to a Poisson process, we have:
$$\mathrm{Prob}[d(m_{c_i}, g) = 1] = e^{-R p_{a,b} \int_0^{\infty} t f(t)\,dt} \cdot e^{-R p_{a,b} t_{idle}} = e^{-R p_{a,b} (t_{avg} + t_{idle})}, \tag{1}$$
where $R$ is the current circuit arrival rate of the whole Tor network, and $p_{a,b}$ is
the probability that routers A and B form an edge in a circuit (see Section 5.1).
To estimate the Noise and $\mathrm{Prob}[d(m_{c_i}, r) = 1]$ for some Tor relay $r \neq g$ we
use the following approach: $d(m_{c_i}, r) = 1$ if: (a) at the time of the first scan,
there is a connection between $m_{c_i}$ and $r$; (b) there is no connection at the time
of the second scan. We find the probability of the second event using (1). We
compute the probability of the first event as the ratio of the average gap between
connections and the average duration of the connection. Note that the average
delay between two circuits, given that it is larger than $t_{avg} + t_{idle}$, is computed
as:
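$$I = \frac{\int_{t_{avg}+t_{idle}}^{\infty} t \, \lambda_{a,b} e^{-\lambda_{a,b} t}\,dt}{e^{-\lambda_{a,b}(t_{avg}+t_{idle})}} = t_{avg} + t_{idle} + \frac{1}{\lambda_{a,b}}, \tag{2}$$
where $\lambda_{a,b} = R \cdot p_{a,b}$. Thus, the probability that there is a connection between
A and B at an arbitrary point of time is:
$$1 - e^{-\lambda_{a,b}(t_{avg}+t_{idle})} \, \frac{I - (t_{avg}+t_{idle})}{I} = 1 - \frac{e^{-\lambda_{a,b}(t_{avg}+t_{idle})}}{\lambda_{a,b}(t_{avg}+t_{idle}) + 1}.$$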
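Thus, multiplying the probabilities of events (a) and (b):
$$\mathrm{Prob}[d(m, r) = 1] = \left(1 - \frac{e^{-\lambda_{a,b}(t_{avg}+t_{idle})}}{\lambda_{a,b}(t_{avg}+t_{idle}) + 1}\right) e^{-\lambda_{a,b}(t_{avg}+t_{idle})}.$$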
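The signal and noise probabilities can be compared numerically with a short Python sketch. It takes the per-link arrival rate $R \cdot p_{a,b}$ as the input `lam`, and it assumes that events (a) and (b) are independent so that their probabilities multiply. That product form and the function names are assumptions of this sketch.

```python
import math

T_AVG = 200.0   # average circuit duration in seconds (from the measurements above)
T_IDLE = 180.0  # idle timeout after which a connection is closed

def prob_signal(lam, t_avg=T_AVG, t_idle=T_IDLE):
    """Expression (1): no circuit arrives during the last circuit's lifetime plus idle time."""
    return math.exp(-lam * (t_avg + t_idle))

def mean_long_gap(lam, t_avg=T_AVG, t_idle=T_IDLE):
    """Expression (2): mean gap between circuits, given the gap exceeds t_avg + t_idle."""
    return t_avg + t_idle + 1.0 / lam

def prob_connected(lam, t_avg=T_AVG, t_idle=T_IDLE):
    """Probability that a connection exists at an arbitrary point of time."""
    window = t_avg + t_idle
    return 1.0 - math.exp(-lam * window) / (lam * window + 1.0)

def prob_noise(lam, t_avg=T_AVG, t_idle=T_IDLE):
    """Connected at the first scan and closed by the second (independence assumed)."""
    return prob_connected(lam, t_avg, t_idle) * prob_signal(lam, t_avg, t_idle)

# Hypothetical per-link arrival rates (circuits per second) for a slow and a busy link.
for lam in (1e-5, 1e-2):
    print(f"lam={lam:g}: signal={prob_signal(lam):.4f}, noise={prob_noise(lam):.4f}")
```

For a very small arrival rate the signal probability is close to 1 and the noise probability close to 0, which matches the qualitative behaviour expected for low-bandwidth relays.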
To demonstrate how the above expressions work, we used the set of 125 middle nodes from the experiment described in Section 4.2 with bandwidth weights
less than or equal to 1600. Figure 15 shows: (a) the Signal of the guard node against
its bandwidth; (b) the Noise of a Tor relay against its bandwidth. As can be
seen from the figure, for low-bandwidth nodes the signal is close to its maximum
value. This happens because, for this type of node, the probability that the connection between it and a middle node carries just one circuit is very high. The low
circuit arrival rate of a low-bandwidth relay also implies a low value of noise,
since the probability of having a connection between it and a middle node is low.
In this paper, we have shown two ways to extract topology information of the
Tor network. One way to determine the real connectivity of Tor relays is to
[Fig. 15: Signal and Noise (number of differentials) against bandwidth weight.]
number of Tor routers grew, those attacks became too expensive in terms of
required bandwidth and time. This is because, for those attacks to be successful,
exhaustive probing of each link in the Tor network was required. Given a way
to determine the real connectivity of the Tor network, these attacks can become
practical again since the number of links to be probed is significantly reduced.
Conclusion
All prior research on Tor assumed opacity of the Tor network topology, meaning
that the attacker had to assume a fully connected graph. In practice, the real
degree of a node in this graph is substantially smaller than its maximum at any
given point in time. For the first time, we have shown methods to determine the
real connectivity of relays in the Tor network and the dynamics of the topology
of the whole Tor network. Based on this, we described several novel attacks that
use this information to deanonymize the entry points of the users into the Tor
network.
References
1. Back, A., Möller, U., and Stiglic, A. Traffic analysis attacks and trade-offs in
anonymity providing systems. In Proceedings of the 4th International Workshop on
Information Hiding (London, UK, 2001), IHW '01, Springer-Verlag, pp. 245-257.
2. Bissias, G. D., Liberatore, M., Jensen, D., and Levine, B. N. Privacy vulnerabilities in encrypted HTTP streams. In Proceedings of Privacy Enhancing
Technologies Workshop (PET 2005) (2005), pp. 1-11.
3. Chaum, D. Untraceable electronic mail, return addresses, and digital pseudonyms.
Communications of the ACM 24, 2 (1981), 84-88.
4. Danezis, G. The traffic analysis of continuous-time mixes. In Proceedings of
Privacy Enhancing Technologies Workshop (PET 2004), LNCS (2004), pp. 35-50.
5. Goldschlag, D. M., Reed, M. G., and Syverson, P. F. Hiding routing information. In Information Hiding 1996 (1996), R. J. Anderson, Ed., vol. 1174 of
Lecture Notes in Computer Science, Springer, pp. 137-150.
6. Levine, B. N., Reiter, M. K., Wang, C., and Wright, M. Timing attacks in
low-latency mix systems. In Proceedings of Financial Crypto 2004 (2004), vol. 3110
of LNCS, Springer, pp. 251-265.
7. Manils, P., Chaabane, A., le Blond, S., Kaafar, M., Castelluccia,
C., Legout, A., and Dabbous, W. Compromising Tor anonymity exploiting
P2P information leakage. Technical Report 00471556, INRIA, April 2010.
8. Murdoch, S. J., and Danezis, G. Low-cost traffic analysis of Tor. In Proceedings of the 2005 IEEE Symposium on Security and Privacy. IEEE CS (2005),
pp. 183-195.
9. Panchenko, A., Niessen, L., and Zinnen, A. Website fingerprinting in onion
routing based anonymization networks. ACM, pp. 1-10.
10. Serjantov, A., and Sewell, P. Passive attack analysis for connection-based
anonymity systems. In Proceedings of European Symposium on Research in
Computer Security (ESORICS) (2003), pp. 116-131.
11. Wang, X., Chen, S., and Jajodia, S. Network flow watermarking attack on
low-latency anonymous communication systems. In Proceedings of the 2007 IEEE
Symposium on Security and Privacy (Washington, DC, USA, 2007), SP '07, IEEE
Computer Society, pp. 116-130.
12. Wang, X., and Reeves, D. S. Robust correlation of encrypted attack traffic
through stepping stones by manipulation of interpacket delays. In Proceedings of
the 10th ACM Conference on Computer and Communications Security (New York,
NY, USA, 2003), CCS '03, ACM, pp. 20-29.
13. Yu, W., Fu, X., Graham, S., Xuan, D., and Zhao, W. DSSS-based flow marking
technique for invisible traceback. In Proceedings of the 2007 IEEE Symposium
on Security and Privacy (Washington, DC, USA, 2007), SP '07, IEEE Computer
Society, pp. 18-32.
14. Zhu, Y., Fu, X., Graham, B., Bettati, R., and Zhao, W. On flow correlation attacks and countermeasures in mix networks. In Proceedings of Privacy
Enhancing Technologies Workshop (2004), pp. 26-28.