Csc523: Analysis of The P2P Bittorrent Protocol: Abram Hindle 0020755 April 16, 2004
1.1 What is BitTorrent

BitTorrent is a P2P protocol meant for distributing files. The purpose behind BitTorrent is to reduce the bandwidth load on the peer (the seeder) that initially shares the file. Peers who download from the seeding peer join the network and share their blocks of the file with other clients. Each file is split into blocks so that a seeder can distribute blocks among the downloading peers, and those peers can then download blocks from each other. Thus a downloader effectively becomes an uploader. As more peers join, they connect to the downloading peers and trade file blocks with them.

To promote sharing of bandwidth, a Tit For Tat algorithm is implemented on each peer. This means that a peer must send data to another peer if it expects the other peer to send data back. Thus, to download successfully from a BitTorrent network, one has to allocate some upstream bandwidth to the network or otherwise suffer very slow transfers.

BitTorrent is interesting as it is currently used by many users to distribute large files. Originally used to distribute legal, high quality bootleg recordings of live concerts, BitTorrent is now very popular with those who trade television, movies, arcade games, comic books and music illegally. BitTorrent is also used heavily in the Linux community to distribute files such as CD images. The popularity of BitTorrent is likely due to the control the seed has over the network, as well as the ability for a seed to distribute a file's whole size once. A seeder can post a torrent file on their webserver and say "If you want this file jump on to this torrent". A user downloads the torrent and BitTorrent downloads the file. The focus of effort and the lack of "always-on" P2P sharing software make it especially useful in small communities like those based on message boards. There are many communities which trade television shows and comics through semi-private webboards using BitTorrent. Due to BitTorrent's tit for tat bandwidth sharing poli-

Seeder - A peer who has all the blocks in a torrent.

Choked - A connection is choked if no file data is passed through it. Control data may flow but the transmission of actual blocks will not.

Interest - Indicates whether a peer has blocks which other peers want.

Snubbed - A peer acting poorly (not uploading, or sending bad control messages); usually disconnected or ignored.

3 Literature Review

I read various papers on issues relating to P2P. Some related to BitTorrent, while others related slightly to analysis using techniques such as Chernoff bounds or the iterated prisoner's dilemma. Here are reviews of some of them.

"Analyzing peer-to-peer traffic across large networks" by Sen and Wang [10] discussed how P2P traffic actually looked on a large network. They analyzed the network traffic of an ISP (probably an AT&T owned ISP) and reported the results. They took statistics about packet sources and destinations and reasoned that many of the P2P networks available today can be taken down by the removal of a few important nodes. They were also able to notice the difference between P2P users on networks that used supernodes and those on networks which didn't. They observed the phenomenon of less than 10% of the hosts being responsible for more than 90% of the traffic and content on the file based P2P networks.

Some of my research went into how agents act. "Emotional Pathfinding" [11] by Donaldson, Park and Lin described agents who prioritize goals using emotions. Emotions were abstracted away to the idea of dominant priorities that can increase or decrease in importance and override other goals. This is to avoid certain erratic and fast acting behaviors which can be detrimental to the agent or the system. This paper provided a good basis for what a computer scientist means by the term "emotional". It's almost like smoothing the transitions between priorities. "A Framework

"Scalable Byzantine Agreement" [6] by Lewis and Saia discussed an algorithm to solve the Byzantine Agreement problem using a randomized algorithm. It uses randomness and
what strange. Wouldn't it be better to download the first block from the network, as it would be guaranteed to be there? A potential issue with using one static block is that no one on the network may want that block, so you would have to rely on the optimistic parts of the tit for tat algorithm. By requesting a totally random block you soon get a block which is likely to be rare among other peers, so that you can participate in the network effectively.

4.1.2 Rarest Block

Rarest blocks are requested by the peers. By rarest we mean the block that the fewest peers have. By requesting rarest blocks, peers try to keep all the blocks available to the network. This reduces the reliance on one peer to host one block and provides redundancy, as the weakest parts are backed up first.

I will demonstrate how, using the rarest block first algorithm, we will still get all the blocks in the network. Let there be [...] peers. Let there be a file of [...] chunks. For each chunk assume at least 1 peer has that chunk. [...] indicates how many peers have chunk m. [...] become the rarest. Then the peer will get that block, and if [...] then everyone has downloaded all the necessary blocks.

Given [...] peers and 1 seeder, how many times does the seeder have to upload to send the whole file of [...] parts to the peers, assuming each peer drops off once they have all the pieces? Our upper bound is naively [...] parts uploaded.

Given or Assumed:

Let's assume each peer is connected to all other peers.

[...] - Chance a peer is not chosen in a round, [...]

What is the expected number of peers that chose any given peer? [...] how many peers chose peer [...].

We want to know how many rounds pass before a peer has all the blocks. Based on the coupon collector problem [7], if we are offered one random block per turn, [...] where [...] is a constant. This would only be true if the distribution of the block being offered was uniform per turn.

Let's skip to turn [...], where the seeder has uploaded all the blocks to the network. Assuming a uniform distribution, [...] is a r.v. which indicates how many blocks peer [...] has. [...]. Thus each peer has [...] blocks left.

Suppose we assume a system where everyone uploads randomly to everyone else. This is basically the coupon collector problem. Each round we get a block randomly from someone. Let's assume that our current collection of [...] blocks is unique to us; thus the likelihood of getting one of the initial [...] blocks back is initially zero and slowly starts to grow to be bounded by [...]. So we'll ig-

So the expected number of turns until everyone has their blocks should be [...].

Let's compare our model's results with the real world. These results are from the empirical section 6.2; see there for any constraints on the experimental results. See figure 1 for the data and figure 2 for the graph of the data. The comparison between the experimental results and the model isn't quite valid, for 2 reasons: in the experiment the peers drop out when they are finished, and the network had shared but limited bandwidth.

A game is Pareto efficient if there is no way to make someone better off without making someone else worse off [1]. In BitTorrent this is used to spur peers to look for better peers, or at least to be fair and communicate with many peers.
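The rarest-first rule of section 4.1.2 can be illustrated with a minimal sketch. It is not taken from the BitTorrent piece picker: the function name pick_rarest, its arguments, and the random tie-breaking are assumptions made only for illustration. Each peer is represented by the set of block indices it holds.

```python
import random
from collections import Counter

def pick_rarest(have, peers_have, rng=random):
    """Return the index of a missing block that the fewest peers hold.

    have       -- set of block indices this peer already holds
    peers_have -- list of sets, one per connected peer (hypothetical layout)
    Ties are broken at random; returns None if no peer offers a new block.
    """
    counts = Counter()
    for blocks in peers_have:
        for block in blocks:
            if block not in have:
                counts[block] += 1
    if not counts:
        return None
    fewest = min(counts.values())
    candidates = sorted(b for b, c in counts.items() if c == fewest)
    return rng.choice(candidates)
```

Counting only the blocks we still lack keeps the choice focused on useful requests, and the weakest blocks in the neighbourhood are asked for first.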
file chunks  experimental  n=4 log(10)  n=4 log(2)  n=4 log(exp(1))  n=5 log(10)  n=5 log(2)  n=5 log(exp(1))
       1.00         17.86         0.81        0.38             0.57         0.84        0.48             0.64
       2.00         22.05         2.53        3.75             3.22         2.65        4.17             3.50
       4.00         30.74         6.86       13.51            10.59         7.23       14.74            11.44
       8.00         48.17        17.34       39.02            29.50        18.32       42.28            31.76
      16.00         81.60        41.90      102.04            75.64        44.34      110.16            81.27
      32.00        154.20        98.25      252.08           184.55       104.10      271.52           198.02
      64.00        303.65       225.40      600.16           435.64       239.03      645.43           467.02
     128.00        639.06       508.60     1392.31          1004.35       539.71     1495.67          1076.00
     256.00       1309.33      1132.79     3168.63          2274.88      1202.72     3400.94          2435.91

Figure 1. Experimental Results Versus Model Results: File Chunks Versus Turns/Time
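The model columns in Figure 1 rest on coupon collector reasoning. As a rough illustration, the sketch below simulates the textbook coupon collector setting (one uniformly random block offered per turn) and compares it with the exact expectation m * H_m. This is an assumption-laden sketch, not necessarily the formula behind the model columns: the function names, the trial count and the seed are all hypothetical.

```python
import random

def turns_to_collect(num_blocks, rng):
    """Count turns until every block has been offered at least once,
    drawing one block uniformly at random per turn."""
    seen = set()
    turns = 0
    while len(seen) < num_blocks:
        seen.add(rng.randrange(num_blocks))
        turns += 1
    return turns

def expected_turns(num_blocks):
    """Exact coupon collector expectation: m times the m-th harmonic number."""
    return num_blocks * sum(1.0 / k for k in range(1, num_blocks + 1))

def simulate(num_blocks, trials=2000, seed=0):
    """Average number of turns over several independent simulated runs."""
    rng = random.Random(seed)
    total = sum(turns_to_collect(num_blocks, rng) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    for m in (1, 2, 4, 8, 16, 32):
        print(m, round(simulate(m), 2), round(expected_turns(m), 2))
```

The simulated averages should track m * H_m closely, which only holds under the uniform-offer assumption stated in the text.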
Figure 2. Experimental Results Versus Model Results: File Chunks Versus Turns/Time
    In computer science terms, seeking Pareto efficiency is a local optimization algorithm in which pairs of counterparties see if they can improve their lot together, and such algorithms tend to lead to global optima. Specifically, if two peers are both getting poor reciprocation for some of the upload they are providing, they can often start uploading to each other instead and both get a better download rate than they had before. [4]

Effectively, BitTorrent is designed to promote the sharing of bandwidth in order to improve transfer rates between peers.

4.2.2 Tit For Tat

Tit For Tat is a strategy for uploading and downloading between peers. There are pessimistic and optimistic tit for tat algorithms.

Tit For Tat is a strategy used in game theory problems such as the prisoner's dilemma, where you take the strategy of your opponent. If they cooperate, you cooperate next turn. If they don't cooperate, you don't cooperate.

How would a random strategy fare against tit for tat? By fare we mean minimize the time it takes to download while minimizing the amount of data uploaded. A strategy that doesn't upload much but downloads a lot is a good strategy.

Let's create a small game.

Let's define the tit for tat strategy so that for the first round we always upload. Given a round [...] where [...] and [...]

4.3 Related Problems

4.3.1 Byzantine Generals Problem

The Byzantine Generals Problem is related to BitTorrent more so as a warning against sabotage on the BitTorrent network. Sabotage could come from anyone, from copyright holders to Internet vigilantes to hackers.

How does BitTorrent defend against colluding peers that seek to subvert the network? One area in BitTorrent where Byzantine agreement could be used is in detecting whether a peer or a group of peers is lying about their upload / download statistics to the tracker. If everyone voted and agreed on what one client uploaded, that might work out quite well, as suggested by Lewis and Saia [6].

In BitTorrent, if a peer detects invalid data from another peer, such as damaged data structures or improper field lengths, it automatically disconnects that peer. If a peer sends invalid data to another peer, this will be noticed as the SHA1 hash of that chunk will not match.

5 Hashing

SHA1 hashes are used by BitTorrent on chunks of the file. The size of the chunks ranges from 64KB to 1024KB. The chunks are always of size $2^k$ bytes, where $k$ is greater than or equal to 16.

SHA1 hashes are 160-bit, thus naively the likelihood of any two strings having a matching checksum is $2^{-160}$. As
                        B doesn't upload to A   B uploads to A
A doesn't upload to B   (0,0)                   (1,0)
A uploads to B          (0,1)                   (1,1)
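The small game from section 4.2.2 can be played out with the payoffs in the table above, reading each pair as (blocks A receives, blocks B receives). The sketch below is only an illustration under that reading: the strategy functions, the play helper and the returned dictionary keys are hypothetical names, and the random strategy's upload probability is a free parameter.

```python
import random

# PAYOFF[(a_uploads, b_uploads)] = (blocks A receives, blocks B receives)
PAYOFF = {
    (False, False): (0, 0),
    (False, True):  (1, 0),
    (True,  False): (0, 1),
    (True,  True):  (1, 1),
}

def tit_for_tat(opponent_history):
    """Upload in the first round, then copy the opponent's previous move."""
    return True if not opponent_history else opponent_history[-1]

def never_upload(opponent_history):
    """A pure freeloader: never uploads."""
    return False

def make_random(p, rng):
    """Build a strategy that uploads with probability p each round."""
    def strategy(opponent_history):
        return rng.random() < p
    return strategy

def play(strategy_a, strategy_b, rounds):
    """Play the game and tally what each side downloaded and uploaded."""
    moves_a, moves_b = [], []
    got_a = got_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        pay_a, pay_b = PAYOFF[(a, b)]
        got_a += pay_a
        got_b += pay_b
    return {"a_downloaded": got_a, "b_downloaded": got_b,
            "a_uploaded": sum(moves_a), "b_uploaded": sum(moves_b)}

if __name__ == "__main__":
    rng = random.Random(0)
    print(play(tit_for_tat, make_random(0.5, rng), 100))
    print(play(tit_for_tat, never_upload, 100))
```

Against a peer that never uploads, tit for tat gives away a single block in the first round and nothing afterwards, which is the behaviour the optimistic variants are meant to soften.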
[Figure: plot with series "Total Uploaded" and "UL Turns".]

- create a list called 'preferred' of connections that are not [...]
- reverse sort 'preferred' by the upload rate (so the largest uploading connections appear first)
- cut the tail off the list to reduce it to max uploads in size
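The three steps listed above can be sketched as a small function. Since the filter condition in the first step did not survive in the text, it is passed in here as a caller-supplied predicate; the names preferred_connections, eligible and upload_rate are hypothetical and are not the identifiers used in the BitTorrent choker.

```python
def preferred_connections(connections, max_uploads, eligible, upload_rate):
    """Return up to max_uploads eligible connections, fastest uploaders first.

    eligible    -- predicate standing in for the (unrecovered) filter condition
    upload_rate -- function returning a connection's measured upload rate
    """
    # Step 1: keep only the connections that pass the filter.
    preferred = [c for c in connections if eligible(c)]
    # Step 2: reverse sort so the largest uploading connections appear first.
    preferred.sort(key=upload_rate, reverse=True)
    # Step 3: cut the tail off so at most max_uploads entries remain.
    del preferred[max_uploads:]
    return preferred

if __name__ == "__main__":
    conns = [("a", 10), ("b", 50), ("c", 30), ("d", 40)]
    print(preferred_connections(conns, 2,
                                eligible=lambda c: c[0] != "d",
                                upload_rate=lambda c: c[1]))
```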
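from random import randrange

def connection_made(self, connection, p = None):
    # Pick a random insertion point unless one is given; the negative
    # values are clamped to 0 below, i.e. the front of the list.
    if p is None:
        p = randrange(-2, len(self.connections) + 1)
    self.connections.insert(max(p, 0), connection)
    # Re-evaluate which connections are choked.
    self._rechoke()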
Figure 4. Upload and download rates of peers from a seed. In this case, the seed uploaded almost 4 times the filesize in bytes to 4 peers. (Series: UL Rate 1; UL Rate and DL Rate 2 to 5.)
Figure 6. File Size Vs Turns/Upload Ratio. Given files of size [...] to [...] megabytes in size, we see how much the seeder uploads to 4 other peers on a shared network; this is the ratio of turns or uploaded megabytes to the original file size. (Series: UL Ratio To Size, UL Ratio Turns To Size.)

7 Conclusions

The models generated here are too naive to be really useful when modeling BitTorrent. With more complex models it would be harder to prove facts about the system. Although, as shown in section ??, even a naive model can successfully model the constraints of the real world. Perhaps a simple model is best as it leaves room for experimental error.

BitTorrent uses concepts from game theory and economics to promote fairness. The use of optimistic strategies enables connections to attempt to renegotiate and counterbalance the downward spiraling effects of the tit for tat algorithm.

Optimistic techniques used by BitTorrent seem to be useful strategies to promote file downloading. Bram Cohen, the author of BitTorrent, has suggested that BitTorrent was never meant to promote 1 to 1 upload download ratios; it was created to reduce the load of sharing large files.

In relation to randomized algorithms and the analysis of such systems, BitTorrent is quite interesting albeit quite complex. The major parts of BitTorrent related to randomized algorithms are piece picking, the upload / download strategy (tit for tat), and hashing. The game theory aspects of BitTorrent relate closely to economics and statistics, which are quite related to probabilistic analysis.

8 Future Work

There needs to be further investigation into how a group of peers can collude to create unfair network conditions, such as downloading more than they upload, or even simply disrupting the network enough to disable the distribution of a file. As well, there needs to be research into how to detect, prevent and protect against this collusion.

An interesting problem to investigate is whether or not multiple peers on the same computer can download faster than one peer on one computer. Judging by the tit for tat algorithm, I have reason to believe that this might be true, since there is a tendency to be optimistic and grant a client a bandwidth reprieve to see if they will share more. Will this compounded optimism work in favor of the host with multiple peers versus the host with a single peer? From my personal study of this phenomenon I found that the peers on the same host will share amongst themselves very rapidly.

Given that the tit for tat choking algorithms use a 30 second window, is there a way to use this timing information to improve the upload download ratio in one's favor?

Often BitTorrent is used across the Internet, over many networks. BitTorrent tests should probably be done across a peer to peer network with explicit paths rather than a shared medium network like 10Mbit ethernet. This should simulate being on different networks and should avoid the problem of limiting 5 peers to a 1 MBit pipe.

9 What Did I Actually Do

- Comment and extract algorithms from BitTorrent. Attempt to understand parts of the program from its source, specifically the piecepicker and choker.
- Attempt to read a lot of papers in the area.
- Attempt to run empirical tests on BitTorrent using a network of computers and a modified BitTorrent client. Generate a testing framework. Generate the code necessary to collect, retrieve and analyze the data.
- Attempt to prove facts about naive models of BitTorrent. Attempt to verify those models against experimental data.
- Write this report.

References

[1] Pareto efficiency. 2004. http://en.wikipedia.org/wiki/Pareto efficiency.

[2] A. Adya, W. Bolosky, M. Castro, R. Chaiken, G. Cermak, J. Douceur, J. Howell, J. Lorch, M. Theimer, and R. Wattenhofer. Farsite: Federated, available, and reliable storage for an incompletely trusted environment, 2002.

[3] A. L. C. Bazzan and R. H. Bordini. A framework for the simulation of agents with emotions. In Proceedings of the fifth international conference on Autonomous agents, pages 292–299. ACM Press, 2001.

[4] B. Cohen. Incentives build robustness in bittorrent. May 2003.

[5] L. Lamport, R. Shostak, and M. Pease. The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, 1982.

[6] C. S. Lewis and J. Saia. Scalable byzantine agreement, 2004.

[7] M. Mitzenmacher and E. Upfal. Probabilistic Analysis and Randomized Algorithms: A First Course. Brown University, 2003.

[8] L. Mui, M. Mohtashemi, and A. Halberstadt. Notions of reputation in multi-agents systems: a review. In Proceedings of the first international joint conference on Autonomous agents and multiagent systems, pages 280–287. ACM Press, 2002.
10 Appendix
[Nine plots of upload and download rates (series: UL Rate 1; UL Rate and DL Rate 2 to 5).]