Rarest First and Choke Algorithms Are Enough

Arnaud Legout
I.N.R.I.A., Sophia Antipolis, France
[email protected]

G. Urvoy-Keller and P. Michiardi
Institut Eurecom, Sophia Antipolis, France
{Guillaume.Urvoy,Pietro.Michiardi}@eurecom.fr

ABSTRACT
The performance of peer-to-peer file replication comes from its piece and peer selection strategies. Two such strategies have been introduced by the BitTorrent protocol: the rarest first and choke algorithms. Whereas it is commonly admitted that BitTorrent performs well, recent studies have proposed the replacement of the rarest first and choke algorithms in order to improve efficiency and fairness. In this paper, we use results from real experiments to advocate that the replacement of the rarest first and choke algorithms cannot be justified in the context of peer-to-peer file replication in the Internet.

We instrumented a BitTorrent client and ran experiments on real torrents with different characteristics. Our experimental evaluation is peer oriented, instead of tracker oriented, which allows us to get detailed information on all exchanged messages and protocol events. We go beyond the mere observation of the good efficiency of both algorithms. We show that the rarest first algorithm guarantees close to ideal diversity of the pieces among peers. In particular, on our experiments, replacing the rarest first algorithm with source or network coding solutions cannot be justified. We also show that the choke algorithm in its latest version fosters reciprocation and is robust to free riders. In particular, the choke algorithm is fair and its replacement with a bit level tit-for-tat solution is not appropriate. Finally, we identify new areas of improvements for efficient peer-to-peer file replication protocols.

Categories and Subject Descriptors: C.2.2 [Computer-Communication Networks]: Network Protocols; C.2.4 [Computer-Communication Networks]: Distributed Systems
General Terms: Measurement, Algorithms, Performance
Keywords: BitTorrent, choke algorithm, rarest first algorithm, peer-to-peer

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMC’06, October 25–27, 2006, Rio de Janeiro, Brazil.
Copyright 2006 ACM 1-59593-561-4/06/0010 ...$5.00.

1. INTRODUCTION
In a few years, peer-to-peer file sharing has become the most popular application in the Internet [16, 17]. Efficient content localization and replication are the main reasons for this success. Whereas content localization has attracted considerable research interest in the last years [7, 12, 23, 25], content replication has started to be the subject of active research only recently. As an example, the most popular peer-to-peer file sharing networks [1] eDonkey2K, FastTrack, Gnutella, Overnet focus on content localization. The only widely used [16, 17, 20] peer-to-peer file sharing application focusing on content replication is BitTorrent [8].

Yang et al. [26] studied the problem of efficient content replication in a peer-to-peer network. They showed that the capacity of the network to serve content grows exponentially with time in the case of a flash crowd, and that a key improvement on peer-to-peer file replication is to split the content into several pieces. Qiu et al. [22] proposed a refined model of BitTorrent and showed its high efficiency. In summary, these studies show that a peer-to-peer architecture for file replication is a major improvement compared to a client server architecture, whose capacity of service does not scale with the number of peers.

However, both studies assume global knowledge, which is not realistic. Indeed, they assume that each peer knows all the other peers. As a consequence, the results obtained with this assumption can be considered as the optimal case. In real implementations, there is no global knowledge. The challenge is then to design a peer-to-peer protocol that achieves a level of efficiency close to the one achieved in the case of global knowledge.

Piece and peer selection strategies are the two keys of efficient peer-to-peer content replication. Indeed, in a peer-to-peer system, the content is split into several pieces, and each peer acts as a client and a server. Therefore, each peer can receive and give any piece to any other peer. An efficient piece selection strategy should guarantee that each peer can always find an interesting piece from any other peer. The rationale is to offer the largest choice of peers to the peer selection strategy. An efficient peer selection strategy should maximize the capacity of service of the system. In particular, it should employ selection criteria based, e.g., on upload and download capacity, and should not be biased by the lack of available pieces in some peers.

The rarest first algorithm is a piece selection strategy that consists of selecting the rarest pieces first. This simple strategy used by BitTorrent performs better than random piece selection strategies [5, 9]. However, Gkantsidis et al. [11] argued based on simulations that the rarest first algorithm may lead to the scarcity of some pieces of content and proposed a solution based on network coding. Whereas this solution is elegant and has raised a lot of interest, it leads to several complex deployment issues such as security and computational cost. Other solutions based on source coding
[18] have also been proposed to solve the claimed deficiencies of the rarest first algorithm.

The choke algorithm is the peer selection strategy of BitTorrent. This strategy is based on the reciprocation of upload and download speeds. Several studies [5, 10, 13, 15] discussed the fairness issues of the choke algorithm. In particular, they argued that the choke algorithm is unfair and favors free riders, i.e., peers that do not contribute. Solutions based on a bit level tit-for-tat have been proposed to address the choke algorithm’s fairness problem.

In this paper, we perform an experimental evaluation of the piece and peer selection strategies as implemented in BitTorrent. Specifically, we have instrumented a client and run extensive experiments on several torrents with different characteristics in order to evaluate the properties of the rarest first and choke algorithms. While we have not examined all possible cases, we argue that we have covered a representative set of today’s real torrents.

Our main conclusions on real torrents are the following.

• The rarest first algorithm guarantees a high diversity of the pieces. In particular, it prevents the reappearance of rare pieces and of the last pieces problem.
• We have found that torrents in a startup phase can have low piece diversity. The duration of this phase depends only on the upload capacity of the source of the content. In particular, the rarest first algorithm is not responsible for the low piece diversity during this phase.
• The fairness achieved with a bit level tit-for-tat strategy is not appropriate in the context of peer-to-peer file replication. We have proposed two new fairness criteria in this context.
• The choke algorithm is fair, fosters reciprocation, and is robust to free riders in its latest version.

Our contribution is to go beyond the mere confirmation of the good performance of BitTorrent. We provide new insights into the role of peer and piece selection for efficient peer-to-peer file replication. We show for the first time that on real torrents, the efficiency of the rarest first and choke algorithms does not justify their replacement by more complex solutions. Also, we identify, based on our observations, new areas of improvement: the replication of the first pieces and the speed of delivery of the first copy of the content. Finally, we propose two new fairness criteria in the context of peer-to-peer file replication and we present for the first time results on the new version of the choke algorithm that fixes fundamental fairness issues.

Our findings significantly differ from previous work [5, 10, 11, 13, 15, 18]. There are three main reasons for this divergence. First, we target peer-to-peer file replication in the Internet. As a consequence, the peers are well connected without severe network bottlenecks. The problems identified in the literature with the rarest first algorithm are in the context of networks with connectivity problems or low capacity bottlenecks. Second, we evaluate for the first time the new version of the choke algorithm. The evaluation of the choke algorithm in the literature was performed on the old version. We show that the new version solves the problems identified on the old one. Finally, we perform an experimental evaluation on real torrents. Simulating peer-to-peer protocols is hard and requires many simplifications. In particular, all the simulations of BitTorrent we are aware of consider that each peer only knows a few other peers, i.e., each peer has a small peer set [5, 11]. In the case of real torrents, the peer set size is much larger. The consequence is that BitTorrent builds a random graph, connecting the peers, that has a larger diameter in simulations than in real torrents. However, the diameter has a fundamental impact on the efficiency of the rarest first algorithm.

In this study, we show that in the specific context considered, i.e., Internet peer-to-peer file replication, the rarest first and choke algorithms are good enough. Even if we cannot extend our conclusions to other peer-to-peer contexts, we believe this paper sheds new light on a system that uses a large fraction of the Internet bandwidth.

The rest of the paper is organized as follows. We present the terminology used throughout this paper in section 2.1. Then, we give a short overview of the BitTorrent protocol in section 2.2 and a description of the rarest first and choke algorithms in section 2.3. We present our experimental methodology in section 3, and our detailed results in section 4. Related work is discussed in section 5. We conclude the paper with a discussion of the results in section 6.

2. BACKGROUND
We introduce in this section the terminology used throughout this paper. Then, we give an overview of the BitTorrent protocol, and we present the rarest first and choke algorithms.

2.1 Terminology
The terminology used in the peer-to-peer community and in particular in the BitTorrent community is not standardized. For the sake of clarity, we define in this section the terms used throughout this paper.

• Pieces and Blocks: Files transferred using BitTorrent are split in pieces, and each piece is split in blocks. Blocks are the transmission unit on the network, but the protocol only accounts for transferred pieces. In particular, partially received pieces cannot be served by a peer, only complete pieces can.
• Interested and Choked: We say that peer A is interested in peer B when peer B has pieces that peer A does not have. Conversely, peer A is not interested in peer B when peer B only has a subset of the pieces of peer A. We say that peer A chokes peer B when peer A decides not to send data to peer B. Conversely, peer A unchokes peer B when peer A decides to send data to peer B.
• Peer Set: Each peer maintains a list of other peers it knows about. We call this list the peer set. The notion of peer set is also known as neighbor set.
• Local and Remote Peers: We call local peer the peer with the instrumented BitTorrent client, and remote peers the peers that are in the peer set of the local peer.
• Active Peer Set: A peer can only send data to a subset of its peer set. We call this subset the active peer set. The choke algorithm (described in section 2.3.2) determines the peers being part of the active peer set, i.e., which remote peers will be choked and unchoked. Only peers that are unchoked by the local peer and interested in the local peer are part of the active peer set.
• Leecher and Seed: A peer has two states: the leecher state, when it is downloading content but does not yet have all the pieces; the seed state, when the peer has all the pieces of the content. For short, we say that a peer is a leecher when it is in leecher state and a seed when it is in seed state.
• Initial Seed: The initial seed is the peer that is the first source of the content.
• Rarest First Algorithm: The rarest first algorithm is the piece selection strategy used in BitTorrent. We give a detailed description of this algorithm in section 2.3.1. The rarest first algorithm is also called the local rarest first algorithm.
• Choke Algorithm: The choke algorithm is the peer selection strategy used in BitTorrent. We give a detailed description of this algorithm in section 2.3.2. The choke algorithm is also called the tit-for-tat algorithm, or tit-for-tat like algorithm.
• Rare and Available Pieces: We call the pieces only present on the initial seed rare pieces, and we call the pieces already served at least once by the initial seed available pieces.
• Rarest Pieces and Rarest Pieces Set: The rarest pieces are the pieces that have the least number of copies in the peer set. In the case the least replicated piece in the peer set has m copies, then all the pieces with m copies form the rarest pieces set. The rarest pieces can be rare pieces or available pieces, depending on the number of copies of the rarest pieces.

2.2 BitTorrent Overview
BitTorrent is a P2P application that capitalizes on the bandwidth of peers to efficiently replicate contents on a large set of peers. A specificity of BitTorrent is the notion of torrent, which defines a session of transfer of a single content to a set of peers. Torrents are independent. In particular, participating in a torrent does not bring any benefit for the participation in another torrent. A torrent is alive as long as there is at least one copy of each piece in the torrent. Peers involved in a torrent cooperate to replicate the file among each other using swarming techniques [24]. In particular, the file is split in pieces of typically 256 kB, and each piece is split in blocks of 16 kB. Other piece sizes are possible.

A user joins an existing torrent by downloading a .torrent file, usually from a Web server, which contains meta-information on the file to be downloaded, e.g., the piece size and the SHA-1 hash values of each piece, and the IP address of the so-called tracker of the torrent. The tracker is the only centralized component of BitTorrent, but it is not involved in the actual distribution of the file. It keeps track of the peers currently involved in the torrent and collects statistics on the torrent.

When joining a torrent, a new peer asks the tracker for a list of IP addresses of peers to build its initial peer set. This list typically consists of 50 peers chosen at random in the list of peers currently involved in the torrent. The initial peer set will be augmented by peers connecting directly to this new peer. Such peers are aware of the new peer by receiving its IP address from the tracker. Each peer reports its state to the tracker every 30 minutes in steady-state regime, or when disconnecting from the torrent, indicating each time the amount of bytes it has uploaded and downloaded since it joined the torrent. A torrent can thus be viewed as a collection of interconnected peer sets. If ever the peer set size of a peer falls below a predefined threshold, typically 20 peers, this peer will contact the tracker again to obtain a new list of IP addresses of peers. By default, the maximum peer set size is 80. Moreover, a peer should not exceed a threshold of 40 initiated connections among the 80 at each time. As a consequence, the 40 remaining connections should be initiated by remote peers. This policy guarantees a good interconnection among the peer sets in the torrent.

Each peer knows the distribution of the pieces for each peer in its peer set. The consistency of this information is guaranteed by the exchange of messages [3]. The exchange of pieces among peers is governed by two core algorithms: the rarest first and the choke algorithms. These algorithms are further detailed in section 2.3.

2.3 BitTorrent Piece and Peer Selection Strategies
We focus here on the two core algorithms of BitTorrent: the rarest first and choke algorithms. We do not give all the details of these algorithms, but explain the main ideas behind them.

2.3.1 Rarest First Algorithm
The rarest first algorithm works as follows. Each peer maintains a list of the number of copies of each piece in its peer set. It uses this information to define a rarest pieces set. Let m be the number of copies of the rarest piece, then the index of each piece with m copies in the peer set is added to the rarest pieces set. The rarest pieces set of a peer is updated each time a copy of a piece is added to or removed from its peer set. Each peer selects the next piece to download at random in its rarest pieces set.

The behavior of the rarest first algorithm can be modified by three additional policies. First, if a peer has downloaded strictly fewer than 4 pieces, it chooses randomly the next piece to be requested. This is called the random first policy. Once it has downloaded at least 4 pieces, it switches to the rarest first algorithm. The aim of the random first policy is to permit a peer to download its first pieces faster than with the rarest first policy, as it is important to have some pieces to reciprocate for the choke algorithm. Indeed, a piece chosen at random is likely to be more replicated than the rarest pieces, thus its download time will be on average shorter.

Second, BitTorrent also applies a strict priority policy, which is at the block level. When at least one block of a piece has been requested, the other blocks of the same piece are requested with the highest priority. The aim of the strict priority policy is to complete the download of a piece as fast as possible. As only complete pieces can be sent, it is important to minimize the number of partially received pieces.

Finally, the last policy is the end game mode [8]. This mode starts once a peer has requested all blocks, i.e., all blocks have either been already received or requested. While in this mode, the peer requests all blocks not yet received from all the peers in its peer set that have the corresponding blocks. Each time a block is received, it cancels the request for the received block to all the peers in its peer set that have the corresponding pending request. As a peer has a small buffer of pending requests, all blocks are effectively requested close to the end of the download. Therefore, the end game mode is used at the very end of the download, thus it has little impact on the overall performance.
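
The piece selection rule of section 2.3.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mainline client's code: the function names (`copy_counts`, `rarest_pieces_set`, `select_next_piece`) and the representation of the peer set as a dictionary of piece-index sets are hypothetical. The sketch also assumes that candidates are restricted to pieces the local peer lacks and that at least one remote peer holds, which the description implies but does not state. The strict priority policy and the end game mode are left out.

```python
import random

# Number of pieces downloaded before switching from random first
# to rarest first (value given in the text).
RANDOM_FIRST_THRESHOLD = 4


def copy_counts(peer_pieces, num_pieces):
    """Count the copies of each piece held by the remote peers in the peer set.

    peer_pieces maps a peer identifier to the set of piece indices it holds
    (hypothetical representation chosen for this sketch).
    """
    counts = [0] * num_pieces
    for pieces in peer_pieces.values():
        for index in pieces:
            counts[index] += 1
    return counts


def rarest_pieces_set(counts, candidates):
    """Return the candidate pieces that have the minimum number of copies m."""
    if not candidates:
        return set()
    m = min(counts[i] for i in candidates)
    return {i for i in candidates if counts[i] == m}


def select_next_piece(local_pieces, peer_pieces, num_pieces, rng=random):
    """Pick the next piece to request, or None if nothing is downloadable."""
    counts = copy_counts(peer_pieces, num_pieces)
    # Assumption: only pieces we lack and that some remote peer holds qualify.
    candidates = [i for i in range(num_pieces)
                  if i not in local_pieces and counts[i] > 0]
    if not candidates:
        return None
    if len(local_pieces) < RANDOM_FIRST_THRESHOLD:
        # Random first policy: fewer than 4 pieces downloaded so far.
        return rng.choice(candidates)
    # Rarest first: uniform random choice within the rarest pieces set.
    return rng.choice(sorted(rarest_pieces_set(counts, candidates)))


# Example: piece 5 has one copy and piece 4 has two, so piece 5 is selected.
peers = {"A": {0, 1, 2, 3, 4, 5}, "B": {0, 1, 2, 3, 4}, "C": {0, 1, 2, 3}}
next_piece = select_next_piece({0, 1, 2, 3}, peers, num_pieces=6)  # -> 5
```

Recomputing the counts on every call keeps the sketch short. A real client would more likely update them incrementally, since the text says the rarest pieces set is updated each time a copy of a piece is added to or removed from the peer set.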
2.3.2 Choke Algorithm
The choke algorithm was introduced to guarantee a reasonable level of upload and download reciprocation. As a consequence, free riders, i.e., peers that never upload, should be penalized. For the sake of clarity, we describe without loss of generality the choke algorithm from the point of view of the local peer. In this section, interested always means interested in the local peer, and choked always means choked by the local peer.

The choke algorithm differs in leecher and seed states. We describe first the choke algorithm in leecher state. At most 4 remote peers can be unchoked and interested at the same time. Peers are unchoked using the following policy.

1. Every 10 seconds, the interested remote peers are ordered according to their download rate to the local peer and the 3 fastest peers are unchoked.
2. Every 30 seconds, one additional interested remote peer is unchoked at random. We call this random unchoke the optimistic unchoke.

In the following, we call the three peers unchoked in step 1 the regular unchoked (RU) peers, and the peer unchoked in step 2 the optimistic unchoked (OU) peer. The optimistic unchoke peer selection has two purposes. It allows the local peer to evaluate the download capacity of new peers in the peer set, and it allows the local peer to bootstrap new peers that do not have any piece to share by giving them their first piece.

We describe now the choke algorithm in seed state. In previous versions of the BitTorrent protocol, the choke algorithm was the same in leecher state and in seed state except that in seed state the ordering performed in step 1 was based on upload rates from the local peer. With this algorithm, peers with a high download rate are favored independently of their contribution to the torrent.

Starting with version 4.0.0, the mainline client [2] introduced an entirely new algorithm in seed state. We are not aware of any documentation on this new algorithm, nor of any implementation of it apart from the mainline client. We describe this new algorithm in seed state in the following. At most 4 remote peers can be unchoked and interested at the same time. Peers are unchoked using the following policy.

1. Every 10 seconds, the unchoked and interested remote peers are ordered according to the time they were last unchoked, most recently unchoked peers first.
2. For two consecutive periods of 10 seconds, the 3 first peers are kept unchoked and an additional 4th peer that is choked and interested is selected at random and unchoked.
3. For the third period of 10 seconds, the 4 first peers are kept unchoked.

In the following, we call the three or four peers that are kept unchoked according to the time they were last unchoked the seed kept unchoked (SKU) peers, and the unchoked peer selected at random the seed random unchoked (SRU) peer. With this new algorithm, peers are no longer unchoked according to their upload rate from the local peer, but according to the time of their last unchoke. As a consequence, the peers in the active peer set are changed regularly, each new SRU peer taking an unchoke slot off the oldest SKU peer.

We show in section 4.2.1 why the new choke algorithm in seed state is fundamental to the fairness of the choke algorithm.

3. EXPERIMENTAL METHODOLOGY
In order to evaluate experimentally the rarest first and choke algorithms on real torrents, we have instrumented a BitTorrent client and connected this client to live torrents with different characteristics. The experiments were performed one at a time in order to avoid a possible bias due to overlapping experiments. We have instrumented a single client and we make no assumption on the other clients connected to the same torrent. As we only considered real torrents, we captured a large variety of client configuration, connectivity, and behavior. In the following, we give details on how we conducted the experiments.

3.1 Choice of the Monitored BitTorrent Client
Several BitTorrent clients are available. The first BitTorrent client has been developed by Bram Cohen, the inventor of the protocol. This client is open source and is called mainline [2]. As there is no well maintained and official specification of the BitTorrent protocol, the mainline client is considered as reference for the BitTorrent protocol. It should be noted that, up to now, each improvement of Bram Cohen to the BitTorrent protocol has been replicated to the most popular other clients.

The other clients differ from the mainline client by a more sophisticated interface with a nice look and feel, realtime statistics, many configuration options, experimental extensions to the protocol, etc.

Since our goal is to evaluate the basic BitTorrent protocol, we have decided to restrict ourselves to the mainline client. This client is very popular as it is the second most downloaded BitTorrent client at SourceForge with more than 52 million downloads. We instrumented the version 4.0.2 of the mainline client released at the end of May 2005¹. This version of the instrumented mainline client implements the new choke algorithm in seed state (see section 2.3.2).

¹ The latest stable branch of development is 4.20.x. In this branch, there is no new functionality to the core protocol, but a new tracker-less functionality and some improvements to the client. As the evaluation of the tracker functionality was outside the scope of this study we focused on version 4.0.2.

3.2 Choice of the Torrents
The aim of this work is to understand how the rarest first and choke algorithms behave on real torrents. It is not intended to provide an exhaustive study on the characteristics of today’s torrents. For this reason, we have selected torrents based on: their proportion of seeds to leechers, the absolute number of seeds and leechers, and the content size.

The torrents monitored in this study were found on popular sites². We considered copyrighted and free contents, which are TV shows, movies, cartoons, music albums, live concert recordings, and software. Each experiment lasted for 8 hours in order to make sure that each client became a seed and to have a representative trace in seed state. We performed all the experiments between June 2005 and May 2006.

² www.legaltorrents.com, bt.etree.org, fedora.redhat.com, www.mininova.org, isohunt.com.

We give the characteristics of each torrent in Table 1. The number of seeds and leechers is given at the beginning of the experiment. Therefore, these numbers can be very different at the end of the experiment. We see that there is a large variety of torrents: torrents with few seeds and few leechers, torrents with few seeds and a large number of leechers, torrents with a large number of seeds and few leechers, and torrents with a large number of seeds and leechers. We discuss in section 3.5.2 the limitations in the choice of the torrents considered.

Table 1: Torrent characteristics. Column 1 (ID): torrent ID, column 2 (# of S): number of seeds at the beginning of the experiment, column 3 (# of L): number of leechers at the beginning of the experiment, column 4 (Ratio S/L): ratio (number of seeds)/(number of leechers), column 5 (Max. PS): maximum peer set size in leecher state, column 6 (Size): size of the content in MB.

ID   # of S   # of L   Ratio S/L   Max. PS   Size
1    0        66       0           60        700
2    1        2        0.5         3         580
3    1        29       0.034       34        350
4    1        40       0.025       75        800
5    1        50       0.02        60        1419
6    1        130      0.0078      80        820
7    1        713      0.0014      80        700
8    1        861      0.0012      80        3000
9    1        1055     0.00095     80        2000
10   1        1207     0.00083     80        348
11   1        1411     0.00071     80        710
12   3        612      0.0049      80        1413
13   9        30       0.3         35        350
14   20       126      0.16        80        184
15   30       230      0.13        80        820
16   50       18       2.8         40        600
17   102      342      0.3         80        200
18   115      19       6           55        430
19   160      5        32          17        6
20   177      4657     0.038       80        2000
21   462      180      2.6         80        2600
22   514      1703     0.3         80        349
23   1197     4151     0.29        80        349
24   3697     7341     0.5         80        349
25   11641    5418     2.1         80        350
26   12612    7052     1.8         80        140

3.3 Experimental Setup
We performed a complete instrumentation of the mainline client. The instrumentation consists of: a log of each BitTorrent message sent or received with the detailed content of the message, a log of each state change in the choke algorithm, a log of the rate estimation used by the choke algorithm, and a log of important events (end game mode, seed state).

As monitored client, we use the mainline client with all the default parameters for all our experimentations. It is outside of the scope of this study to evaluate the impact of each BitTorrent parameter. The main default parameters for the monitored client are: the maximum upload rate (default to 20 kB/s), the minimum number of peers in the peer set before requesting more peers from the tracker (default to 20), the maximum number of connections the local peer can initiate (default to 40), the maximum number of peers in the peer set (default to 80), the number of peers in the active peer set including the optimistic unchoke (default to 4), the block size (default to 2^14 bytes), the number of pieces downloaded before switching from random to rarest first piece selection (default to 4).

We did all our experimentations on a machine connected to a high speed backbone. However, the upload capacity is limited by default by the client to 20 kB/s. There is no limit to the download capacity. We obtained effective maximum download speed ranging from 20 kB/s up to 1500 kB/s depending on the experiments. We ran between 1 and 3 experiments on the 26 different torrents given in Table 1 and performed a detailed analysis of each of these traces. The results given in this paper are for a single run for each torrent. Multiple runs on some torrents were used in a calibration phase as explained in section 3.5.1.

Finally, whereas we have control over the monitored mainline client, we do not control any other client in a torrent. In particular, all peers in the peer set of the local peer are real live peers.

3.4 Peer Identification
In our experiments, we uniquely identify a peer by its IP address and peer ID. The peer ID, which is 20 bytes, is a string composed of the client ID and a randomly generated string. This random string is regenerated each time the client is restarted. The client ID is a string composed of the client name and version number, e.g., M4-0-2 for the mainline client in version 4.0.2. We are aware of around 20 different BitTorrent clients, each client existing in several different versions. When in a given experiment, we see several peer IDs corresponding to the same IP address³, we compare the client ID of the different peer IDs. In the case the client ID is the same for all the peer IDs on the same IP address, we deem that this is the same peer. We cannot rely on the peer ID comparison, as the random string is regenerated each time a client crashes or restarts. The pair (IP, client ID) does not guarantee that each peer can be uniquely identified, because several peers beyond a NAT can use the same client in the same version. However, considering the large number of client IDs (it is common in our experiments to observe 15 different client IDs), the probability to have several different clients beyond a NAT with the same client ID is reasonably low for our purposes. Moreover, unlike what was reported by Bhagwan et al. [4] for the Overnet file sharing network, we did not see any problem of peer identification due to NATs. In fact, BitTorrent has an option, activated by default, to prevent accepting multiple concurrent incoming connections from the same IP address. The idea is to prevent peers from increasing their share of the torrent by opening multiple clients from the same machine. Therefore, even if we found in our traces different peers with the same IP address at different moments in time, two different peers with the same IP address cannot be connected to the local peer during overlapping periods.

³ Between 0% and 26% of the IP addresses, depending on the experiments, are associated in our traces to more than one peer ID. The mean is around 9%.

3.5 Limitations and Interpretation of the Results
In this section we discuss the two main limitations of this work, namely the single client instrumentation and the limited set of monitored torrents. We also discuss why, despite these limitations, we believe our conclusions hold for a broader range of scenarios than the ones presented.

3.5.1 Single Client Instrumentation
We have chosen for this study to focus on the behavior of a single client in a real torrent. Whereas it may be argued that a larger number of instrumented peers would have given a better understanding of the torrents, we made the
decision to be as unobtrusive as possible. Increasing the number of instrumented clients would have required us either to control those clients ourselves, or to ask some peers to use our instrumented client. In both cases, the choice of the instrumented peer set would have been biased, and the behavior of the torrent impacted. Instead, our decision was to understand how a new peer (our instrumented peer) joining a real torrent behaves.

Moreover, monitoring a single client does not adversely impact the generality of our findings for the following reasons. First, a torrent is a random graph of interconnected peers. For this reason, with a large peer set of 80, each peer should have a view of the torrent as representative as any other peer. Even if each peer will see variations due to the random choice of the population in its peer set, the big picture will remain the same. Second, in order to make sure that there is no unforeseen bias due to the single client instrumentation, we have monitored several torrents with three different peers, each peer with a different IP address. These experiments were performed during a calibration phase, and are not presented here due to space limitation. Whereas the download speed of the peers may significantly vary, e.g., due to very fast seeds that may or may not be present in the peer set of a monitored client, we did not observe any other significant difference among the clients that may challenge the generality of our findings.

3.5.2 Limited Torrent Set

We have considered for this study 26 different torrents. Whereas it is a large number of torrents, it is not large enough to be exhaustive or to be representative of all the torrents that can be found in the Internet. However, our intent is to evaluate the behavior of the rarest first and choke algorithms in a variety of situations. The choice of the torrents considered in this study was targeted to provide a challenging environment to the rarest first and choke algorithms. For instance, torrents with no seed (torrent 1) or with only one seed and a large number of leechers (e.g., torrents 7–11) were specifically chosen to evaluate how the rarest first algorithm behaves in the context of piece scarcity. Torrents with a large number of peers were selected to evaluate how the choke algorithm behaves when the torrent is large enough to favor free riders.

We have around half of the presented torrents with no or few seeds, as this is a challenging situation for a peer-to-peer protocol. However, it can be argued that the largest presented torrent with a single seed has a small number of leechers (1441 leechers at the beginning of the experiment for torrent 11). Indeed, the target of a peer-to-peer protocol is to distribute content to millions of peers. But a peer-to-peer protocol capitalizes on the bandwidth of each peer. Thus, it is not possible to scale to millions of peers without a significant proportion of seeds. If we take the same proportion of seeds and leechers as the one of torrent 11, only 710 seeds are enough to scale to one million peers. Also, a torrent with a ratio (number of seeds)/(number of leechers) lower than 10^-3 is enough to stress a piece selection strategy based on a local view of only 80 peers.

Finally, in such an experimental study it is not possible to reproduce an experiment, and thus to gain statistical information, because each experiment depends on the behavior of peers, the number of seeds and leechers in the torrent, and the subset of peers randomly returned by the tracker. However, studying the dynamics of the protocol is as important as studying its statistical properties. As we considered torrents with different characteristics and observed a consistent behavior on these torrents, we believe our findings to be representative of the behavior of the rarest first and choke algorithms.

4. EXPERIMENTAL RESULTS

We present in this section the results of our experiments. In a first part, we discuss the results with a focus on the rarest first algorithm. Then, in a second part, we discuss the results with a focus on the choke algorithm.

4.1 Rarest First Algorithm

The aim of a piece selection strategy is to guarantee that each peer is always interested in any other peer. The rationale is that each time the peer selection strategy unchokes a peer, this peer must be interested in the unchoking peer. This way, the peer selection strategy can reach the optimal system capacity (but designing such an optimal peer selection strategy is a hard task). Therefore, the piece selection strategy is fundamental to reach good system capacity.

However, the efficiency of the piece selection strategy cannot be measured in terms of system capacity, because the system capacity is the result of both the piece and peer selection strategies. A good way to evaluate the efficiency of the piece selection strategy is to measure the entropy of the torrent, i.e., the distribution of pieces among peers.

There is no simple way to directly measure the entropy of a torrent. For this reason, we characterize the entropy with the peer availability. We define the peer availability of peer x according to peer y as the ratio of the time peer y is interested (see section 2.1) in peer x over the time peer x is in the peer set of peer y. If peer x is always available for peer y, then the peer availability is equal to one. In the following, we characterize the entropy of a torrent with the availability of the peers in this torrent. For the sake of clarity, we will simply refer to the notion of entropy.

We say that there is ideal entropy in a torrent when each leecher (footnote 4) is always interested in any other leecher. We do not claim that ideal entropy can always be achieved, but it should be the objective of any efficient piece selection strategy.

Footnote 4: Only the case of leechers is relevant for the entropy characterization, as seeds are always interesting for leechers and never interested in leechers.

We evaluated the rarest first algorithm on a representative set of real torrents. We showed that the rarest first algorithm achieves a close to ideal entropy, and that its replacement by more complex solutions cannot be justified. Then, we evaluated the dynamics of the rarest first algorithm to understand the reasons for this good entropy. Finally, we focused on a specific problem called the last pieces problem, which is presented [11, 18] as a major weakness of the rarest first strategy. We showed that the last pieces problem is overestimated. In contrast, we identified a first blocks problem, which is a major area of improvement for BitTorrent.

4.1.1 Entropy Characterization

The major finding of this section is that the rarest first algorithm achieves a close to ideal entropy for real torrents. We recall that ideal entropy is achieved when each leecher is always interested in any other leecher. As we do not have global knowledge of the torrent, we characterize the entropy from the point of view of the local peer with two ratios. For each remote peer we compute:
• the ratio a/b where a is the time the local peer in leecher state is interested in this remote peer and b is the time this remote peer spent in the peer set when the local peer is in leecher state;

• the ratio c/d where c is the time this remote peer is interested in the local peer in leecher state and d is the time this remote peer spent in the peer set when the local peer is in leecher state.

In the case of ideal entropy the above ratios should be one. Fig. 1 gives a characterization of the entropy for the torrents considered in this study.

[Figure 1 plot, two panels. Top panel: "Interest of the Local Peer in the Remote Peers", y-axis: Ratio a/b (0 to 1). Bottom panel: "Interest of the Remote Peers in the Local Peer", y-axis: Ratio c/d (0 to 1). x-axis: Torrent ID (0 to 25).]

Figure 1: Entropy characterization. Top graph: For each remote leecher peer for a given torrent, a dot represents the ratio a/b where a is the time the local peer in leecher state is interested in this remote peer and b is the time this remote peer spent in the peer set when the local peer is in leecher state. Bottom graph: For each remote leecher peer for a given torrent, a dot represents the ratio c/d where c is the time this remote peer is interested in the local peer in leecher state and d is the time this remote peer spent in the peer set when the local peer is in leecher state. For both graphs: Each vertical solid line represents the 20th percentile (bottom of the line), the median (identified with a circle), and the 80th percentile (top of the line) of the ratios for a given torrent.

For most of our torrents, we see in Fig. 1 that the ratios are close to 1, thus a close to ideal entropy. For the top graph, 70% of the torrents have the 20th percentile close to one, and 80% have the median close to one. For the bottom graph, 70% of the torrents have a 20th percentile close to one, and 90% of the torrents have the median close to one. We discuss below the case of the torrents with low entropy.

First, we discuss why the local peer is often not interested in the remote peers for torrents 1, 2, 4, 5, 6, 7, 8, and 9 (see Fig. 1, top graph). These torrents have low entropy because they are in a startup phase. This means that the initial seed has not yet served all the pieces of the content. We recall that the pieces only present on the initial seed are the rare pieces, and that the pieces already served at least once by the initial seed are the available pieces (see section 2.1). The reason for the low observed entropy is that during a torrent startup, available pieces are replicated with an exponential capacity of service [26], but rare pieces are served by the initial seed at a constant rate. Thus, available pieces are replicated faster than rare pieces. This leads to two problems. First, the probability of having peers in a peer set with the same subset of pieces is higher during the torrent startup than when there is no rare piece in the torrent. Second, when there is no rare piece, a peer with all the available pieces becomes a seed. But, when there are rare pieces, a peer with all the available pieces remains a leecher because it does not have the rare pieces. However, these leechers cannot be interested in any other peer as they have all the available pieces at this point of time, but they stay in the peer set of the local peer. This explains the low ratio for these leechers in Fig. 1. In conclusion, the low entropy we observed is not due to a deficiency of the rarest first algorithm, but to the startup phase of the torrent whose duration depends only on the upload capacity of the initial seed. We discuss this point further in section 4.1.2.1.

Now, we discuss why the remote peers are often not interested in the local peer for torrents 2, 4, 10, 18, 19, 21, and 26 (see Fig. 1, bottom graph). No dot is displayed for torrent 19 because, due to the small number of leechers in this torrent, the local peer in leecher state had no leecher in its peer set. Five torrents have a 20th percentile close to 0. The percentile for four of these torrents is computed on a small number of ratios: 3, 8, 12, and 15 for torrents 2, 18, 21 and 26 respectively. Therefore, the 20th percentile is not representative as it is not computed on a set large enough. Additionally, the reason for the low 20th percentile is peers with a ratio of 0. We identified two reasons for a ratio of 0. First, some peers join the peer set with almost all pieces. They are therefore unlikely to be interested in the local peer. Second, some peers with no or few pieces never sent an interested message to the local peer. This can be explained by a client behavior changed with a plugin or an option activation. The super seeding option [3] available in several BitTorrent clients has this effect. In conclusion, the low entropy of some peers is either a measurement artifact due to modified or misbehaving clients, or the result of the inability of the rarest first algorithm to reach ideal entropy in some extreme cases.

We have seen that peers that join the torrent with almost all pieces may not be interested in the local peer. In this scenario, the rarest first algorithm does not guarantee ideal entropy. However, we argue that this case does not justify the replacement of the rarest first algorithm for two reasons. First, this case appears rarely and does not significantly impact the overall entropy of the torrent. Second, the peers with low entropy are peers that join the peer set with only a few missing pieces. In the case of torrent startup, it is not clear whether a solution based, for instance, on source or network coding would have proposed interesting pieces to such peers. Indeed, when content is split into k pieces, there is no solution based on coding that can reconstruct the content in less than k pieces. For this reason, when the initial seed has not yet sent at least one copy of each piece, there is no way to reconstruct the content, so no way to have interesting pieces for all the peers.

An important question is how rarest first compares with network coding in the presented scenarios. As there is no client based on network coding that is as popular as BitTorrent, it is not possible to evaluate both solutions on the same torrents. However, based on the theoretical network coding results, we discuss the respective merits of rarest first and network coding in section 4.1.4.

For the computation of the ratios on Fig. 1, we did not consider peers that spent less than 10 seconds in the peer
set. Our motivation was to evaluate the entropy of pieces in a torrent. However, due to several misbehaving clients, there is a permanent noise created by peers that join and leave the peer set frequently. Such peers typically stay less than a few seconds in the peer set, and they do not take part in any active upload or download. Therefore, these misbehaving peers adversely bias our entropy characterization. Filtering all peers that stay less than 10 seconds removes the bias.

In summary, we have seen that the rarest first algorithm enforces a close to ideal entropy for the presented torrents. We have identified torrents with low entropy and shown that the rarest first algorithm is not responsible for this low entropy. We have also identified rare cases where the rarest first algorithm does not perform optimally, but we have explained that these cases do not justify a replacement with a more complex solution. In the following, we evaluate how the rarest first piece selection strategy achieves high entropy.

4.1.2 Rarest First Algorithm Dynamics

We classify a torrent in two states: the transient state and the steady state (footnote 5). In transient state, there is only one seed in the torrent. In particular, there are some pieces that are rare, i.e., present only at the seed. This state corresponds to the beginning of the torrent, when the initial seed has not yet uploaded all the pieces of the content. All torrents with low entropy (Fig. 1, top graph) are in transient state. A good piece replication algorithm should minimize the time spent in the transient state because low entropy may adversely impact the service capacity of a torrent by biasing the peer selection strategy. In steady state, there is no rare piece, and the piece replication strategy should prevent the torrent from entering a transient state again. All torrents with high entropy are in steady state.

[Figure 2 plot. Panel title: "Replication of Pieces in the Peer Set, LS"; x-axis: Time (s), x 10^4 (0 to 3); y-axis: Number of Copies (0 to 80); curves: Max, Mean, Min.]

Figure 2: Evolution of the number of copies of pieces in the peer set with time for torrent 8 in leecher state. Legend: The dotted line represents the number of copies of the most replicated piece in the peer set at each instant. The solid line represents the mean number of copies over all the pieces in the peer set at each instant. The dashed line represents the number of copies of the least replicated piece in the peer set at each instant.

[Figure 3 plot. Panel title: "Number of Rarest Pieces, LS"; x-axis: Time (s), x 10^4 (0 to 3); y-axis: Num. rarest (0 to 300).]

In the following, we evaluate how the rarest first algorithm performs in transient and steady state. We show that the low entropy of torrents experienced in transient state is due to the limited upload capacity of the initial seed, and that the rarest first algorithm minimizes the time spent in this state. We also show that the rarest first algorithm is efficient at keeping a torrent in steady state, thus guaranteeing a high entropy.
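To make the piece selection under study concrete, the following minimal sketch picks the next piece to request from a local view of the peer set. It is an illustration only, not the mainline client code: the names `pick_piece` and `RANDOM_FIRST_PIECES` are hypothetical, each remote peer is assumed to be described by the set of piece indices it holds, and the random tie-break inside the rarest pieces set is an assumption of the sketch. The switch from random to rarest first after 4 pieces follows the default parameter quoted earlier for the monitored client.

```python
import random
from collections import Counter

RANDOM_FIRST_PIECES = 4  # default switch point quoted for the monitored client


def pick_piece(have, peer_pieces, rng=random):
    """Return the index of the next piece to request, or None.

    have        -- set of piece indices the local peer already holds
    peer_pieces -- dict: peer id -> set of piece indices that peer holds
    """
    # Count the copies of every piece in the local peer set.
    copies = Counter()
    for pieces in peer_pieces.values():
        copies.update(pieces)

    # Only pieces we miss and that at least one peer can serve are candidates.
    candidates = [p for p in copies if p not in have]
    if not candidates:
        return None

    # Random first policy: before enough pieces are held, pick any candidate.
    if len(have) < RANDOM_FIRST_PIECES:
        return rng.choice(sorted(candidates))

    # Rarest first: keep the pieces with the fewest copies (the rarest
    # pieces set) and break ties at random (an assumption of this sketch).
    fewest = min(copies[p] for p in candidates)
    rarest_set = sorted(p for p in candidates if copies[p] == fewest)
    return rng.choice(rarest_set)
```

The selection only uses the copies visible in the local peer set, which is why the text characterizes replication from the point of view of the local peer rather than globally.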

Footnote 5: Our definition of transient and steady state differs from the one given by Yang et al. [26].

4.1.2.1 Transient State.

In order to understand the dynamics of the rarest first algorithm in transient state, we focus on torrent 8. This torrent consisted of 1 seed and 861 leechers at the beginning of the experiment. The file distributed in this torrent is split in 863 pieces. We ran this experiment for 58991 seconds, but in the following we only discuss the results for the first 29959 seconds when the local peer is in leecher state.

Torrent 8 is in transient state for most of the experiment. As we do not have global knowledge of the torrent, we do not have a direct observation of the transient state. However, there are several pieces of evidence of this state. Indeed, Fig. 2 shows that there are missing pieces during the experiment in the local peer set, as the minimum curve (dashed line) is at zero. Moreover, we probed the tracker to get statistics on the number of seeds and leechers during this experiment. We found that this torrent had only one seed for the duration of the experiment.

Figure 3: Evolution of the number of rarest pieces in the peer set for torrent 8 in leecher state. The rarest pieces set is formed by the pieces that are equally the rarest, i.e., the pieces that have the least number of copies in the peer set.

We see in Fig. 1, top graph, that torrent 8 has low entropy. This low entropy is due to the limited upload capacity of the initial seed. Indeed, when a torrent is in transient state, available pieces are replicated with an exponential capacity of service [26], but rare pieces are served by the initial seed at a constant rate. This is confirmed by Fig. 3 that shows the number of rarest pieces, i.e., the set size of the pieces that are equally rarest. We see that the number of rarest pieces decreases linearly with time. As the size of each piece in this torrent is 4 MB, a rapid calculation shows that the rarest pieces are duplicated in the peer set at a constant rate close to 36 kB/s. We do not have a direct proof that this rate is the one of the initial seed, because we do not have global knowledge of the torrent. However, the torrent is in its startup phase and most of the pieces are only available on the initial seed. Indeed, Fig. 2 shows that there are missing pieces in the peer set, thus the rarest pieces presented in Fig. 3 are missing pieces in the peer set. Therefore, only the initial seed can serve the missing pieces shown in Fig. 3. In conclusion, the upload capacity of the initial seed is the
bottleneck for the replication of the rare pieces, and the time spent in transient state only depends on the upload capacity of the initial seed.

The rarest first algorithm attempts to minimize the time spent in transient state and replicates available pieces fast. Indeed, leechers download the rare pieces first. As the rare pieces are only present on the initial seed, the upload capacity of the initial seed will be fully utilized and no or few duplicate rare pieces will be served by the initial seed. Once served by the initial seed, a rare piece becomes available and is served in the torrent with an increasing capacity of service. As rare pieces are served at a constant rate, most of the capacity of service of the torrent is used to replicate the available pieces on leechers. Indeed, Fig. 2 shows that once a piece is served by the initial seed, the rarest first algorithm will start to replicate it fast as shown by the continuous increase in the mean number of copies over all the peers, and by the number of copies of the most replicated piece (dotted line) that is always close to the maximum peer set size of 80.

[Figure 4 plot. Panel title: "Replication of Pieces in the Peer Set"; x-axis: Time (s), x 10^4 (0 to 3); y-axis: Number of Copies (0 to 80); curves: Max, Mean, Min.]

Figure 4: Evolution of the number of copies of pieces in the peer set with time for torrent 7. Legend: The dotted line represents the number of copies of the most replicated piece in the peer set at each instant. The solid line represents the mean number of copies over all the pieces in the peer set at each instant. The dashed line represents the number of copies of the least replicated piece in the peer set at each instant.

[Figure 5 plot. Panel title: "Size of the Peer Set"; x-axis: Time (s), x 10^4; y-axis: Peer set size.]

In summary, the low entropy observed for some torrents is due to the transient phase. The duration of this phase cannot be shorter than the time for the initial seed to send one copy of each piece, which is constrained by the upload capacity of the initial seed. Thus, the time spent in this phase cannot be shortened further by the piece replication strategy. The rarest first algorithm minimizes the time spent in transient state. Once a piece is served by the initial seed, the rarest first algorithm replicates it fast. Therefore, a replacement of the rarest first algorithm by another algorithm cannot be justified based on the real torrents we have monitored in transient state.
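The lower bound stated in the summary above can be made concrete with a small calculation. The sketch below is only illustrative: the function name is hypothetical, and it plugs in the figures quoted for torrent 8 (863 pieces of 4 MB, and a replication rate of the rarest pieces close to 36 kB/s, which the text attributes to the initial seed only indirectly). It assumes 1 MB = 1024 kB and that the initial seed never sends a duplicate piece.

```python
def min_transient_duration(num_pieces, piece_size_kb, seed_upload_kbps):
    """Lower bound, in seconds, on the time the initial seed needs to
    upload one copy of every piece, assuming no duplicate is sent."""
    if seed_upload_kbps <= 0:
        raise ValueError("upload rate must be positive")
    return num_pieces * piece_size_kb / seed_upload_kbps


# Figures quoted for torrent 8; 1 MB is taken as 1024 kB (an assumption).
bound = min_transient_duration(863, 4 * 1024, 36)
print(f"at least {bound / 3600:.1f} hours in transient state")
```

Under these assumptions the bound is on the order of a day, which would be consistent with torrent 8 remaining in transient state for most of the monitored period; the bound depends only on the seed upload rate, not on the piece selection strategy.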
4.1.2.2 Steady State.

In order to understand the dynamics of the rarest first algorithm in steady state, we focus on torrent 7. This torrent consisted of 1 seed and 713 leechers at the beginning of the experiment. We have seen in Fig. 1 that torrent 7 has a high entropy. Fig. 4 shows that the least replicated piece (min curve) always has more than 1 copy in the peer set. Thus, torrent 7 is in steady state.
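The reading of the replication curves used here can be sketched as follows. Given the number of copies of each piece in the local peer set at one instant, the code computes the three plotted values (max, mean, min), the size of the rarest pieces set, and whether some piece is missing from the whole peer set. The function name and the returned fields are hypothetical, and the result is a local indication only, since the local peer has no global knowledge of the torrent.

```python
from statistics import mean


def replication_snapshot(copies_per_piece):
    """Summarize piece replication in the local peer set at one instant.

    copies_per_piece -- list where entry i is the number of peers in the
                        peer set that hold piece i (0 if nobody holds it)
    """
    if not copies_per_piece:
        raise ValueError("the content must have at least one piece")
    least = min(copies_per_piece)
    return {
        "max": max(copies_per_piece),
        "mean": mean(copies_per_piece),
        "min": least,
        # Rarest pieces set: all pieces that share the minimum count.
        "num_rarest": sum(1 for c in copies_per_piece if c == least),
        # A piece missing from the whole peer set suggests a transient
        # state (local heuristic only, not a global observation).
        "missing_pieces": least == 0,
    }
```

Sampling such a snapshot over time gives curves of the kind shown in the replication figures: a minimum stuck at zero points to missing pieces, while a minimum that stays above zero matches the steady state reading.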
In the following, we present the dynamics of the rarest first algorithm in steady state, and explain how this algorithm prevents the torrent from returning to transient state. Fig. 4 shows that the mean number of copies remains well bounded over time by the number of copies of the most and least replicated pieces. The variations observed in the number of copies are explained by the variation of the peer set size, see Fig. 5. The decrease in the number of copies 9051 seconds after the beginning of the experiment corresponds to the local peer switching to seed state. Indeed, when a leecher becomes a seed, it closes its connections to all the seeds.

Figure 5: Evolution of the peer set size for torrent 7.

The rarest first algorithm does a very good job at increasing the number of copies of the rarest pieces. Fig. 4 shows that the number of copies of the least replicated piece (min curve) closely follows the mean, but does not significantly get closer. However, we see in Fig. 6 that the number of rarest pieces, i.e., the set size of the pieces that are equally rarest, follows a sawtooth behavior. Each peer joining or leaving the peer set can alter the set of rarest pieces. But, as soon as a new set of pieces becomes rarest, the rarest first algorithm quickly duplicates them as shown by a consistent drop in the number of rarest pieces in Fig. 6. Finally, we never observed in any of our torrents a steady state followed by a transient state.

In summary, the rarest first algorithm in steady state ensures a good replication of the pieces in real torrents. It also replicates the rarest pieces fast in order to prevent the reappearance of a transient state. We conclude that on real torrents in steady state, the rarest first algorithm is enough to guarantee a high entropy.

4.1.3 Last Pieces Problem

Footnote 6: This problem is usually referred to as the last piece (singular) problem. However, there is no reason why this problem affects only a single piece.

We say that there is a last pieces (footnote 6) problem when the download speed suffers a significant slowdown for the last pieces. This problem is due to some pieces replicated on few overloaded peers, i.e., peers that receive more requests than they can serve. This problem is detected by a peer only at the end of the content download. Indeed, a peer always seeks fast peers to download from. Thus, it is likely that if some pieces are available on only few overloaded peers, these peers
will be chosen only at the end of the content download when there are no other pieces to download.

[Figure 6 plot. Panel title: "Number of Rarest Pieces"; x-axis: Time (s), x 10^4 (0 to 3); y-axis: Num. rarest (0 to 40).]

Figure 6: Evolution of the number of rarest pieces in the peer set for torrent 7. The rarest pieces set is formed by the pieces that are equally the rarest, i.e., the pieces that have the least number of copies in the peer set.

Due to space limitation, we just give our main conclusions. For a detailed discussion on the last pieces problem, the interested reader may refer to [19].

We never observed a last pieces problem on torrents in steady state. However, we observed this problem on a few torrents in transient state. We found that this problem is inherent to the transient state of the torrent, and is not due to the rarest first algorithm. Moreover, the rarest first algorithm is efficient at mitigating this problem by replicating rare pieces fast once they become available.

It is important to study the piece interarrival time, because partially received pieces cannot be retransmitted by a BitTorrent client, only complete pieces can. However, pieces are split into blocks, which are the BitTorrent unit of data transfer. For this reason, we have also evaluated the block interarrival time. We identified a first blocks problem. This first blocks problem results in a slow startup of the torrent, which is an area of improvement for BitTorrent.

In conclusion, the last pieces problem is overstated, but the first blocks problem is underestimated and offers a possibility of performance improvement.

4.1.4 Discussion on Rarest First and Network Coding

We have seen that rarest first is an efficient piece selection strategy on the presented torrents. We have also shown that the claimed deficiencies of rarest first cannot be identified in our experiments, or are the result of a misunderstanding of the reason for piece scarcity in torrents in transient state.

However, this paper is not a case against solutions based on source or network coding. Network coding enables a piece selection strategy that is close to optimal in all cases, which is not the case of rarest first. Indeed, in specific contexts like a small outdegree constraint, or poor network connectivity between clusters of peers, rarest first will perform poorly. In this study, we show that on real torrents in the Internet, which have a large peer set of 80 and do not suffer from connectivity problems, rarest first performs very well.

In fact, rarest first is close to a solution based on network coding in the presented torrents. We consider two cases to make the comparison: the steady and transient states. In steady state, we have seen in section 4.1.2.2 that the entropy of the presented torrents is close to one with rarest first. An entropy close to one means that each peer is interested in each other peer in its peer set most of the time. As this is close to the target of an ideal piece selection strategy, we see that in steady state, the possibility of improvement for any piece selection strategy is not significant compared to rarest first. For this reason, we argue that a replacement of rarest first cannot be justified in the studied context. In transient state, a solution based on network coding will enable the initial seed to send one entire copy of the content faster than in the case of rarest first, which may suffer from duplicate pieces. The problem with rarest first is that the number of duplicate pieces depends on the peer selection strategy. Indeed, if the initial seed always uploads the initial pieces to the same set of peers, and these peers are all in the same peer set, then they will have the same view of the rarest pieces, and they will download from the initial seed an entire copy of the content without any duplicate pieces. But other peer selection policies may increase the ratio of duplicate pieces before a first copy of the content is sent. There is no such problem with network coding. However, simple policies can be implemented to guarantee that the ratio of duplicate pieces remains low for the initial seed, e.g., the new choke algorithm in seed state or the super seeding mode [3]. In this case, the benefit of network coding compared to rarest first will not be significant at the scale of the content download.

Network coding appears as a solution more general than rarest first, as it works optimally in all cases. However, we argue in favor of the simplicity of rarest first. Network coding raises several implementation issues and is CPU intensive. Rarest first is simple, easy to implement, and already widely used. We have seen that in a context of peer-to-peer content replication with a large peer set and a good network connectivity, rarest first is a simple and very efficient solution. It is in this context that we argue that a replacement of rarest first cannot be justified.

4.2 Choke Algorithm

The choke algorithm is a peer selection strategy. It should guarantee fairness and maximize the system capacity. In this section, we focus on the fairness issue, as the claimed deficiencies of the choke algorithm are related to its fairness properties. Whereas the evaluation and optimization of the system capacity is an important issue, the choke algorithm is indisputably an efficient peer selection strategy that is used by millions of people. A detailed evaluation of the system capacity reached with the choke algorithm is an interesting area of future research.

4.2.1 Fairness Issue

Several recent studies [5, 10, 13, 15] challenge the fairness properties of the choke algorithm because it does not implement a bit level tit-for-tat, but a coarse approximation based on short term download estimations. Moreover, it is believed that a fair peer selection strategy must enforce a byte level reciprocation. For instance, a peer A refuses to upload data to a peer B if the amount of bytes uploaded by A to B minus the amount of bytes downloaded by A from B is higher than a given threshold [5, 10, 15]. The rationale behind this notion of fairness is that free riders should be penalized, and reciprocation should be enforced. We call this notion of fairness tit-for-tat fairness.

We argue in the following that tit-for-tat fairness is not
appropriate in the context of peer-to-peer file replication. Contribution to the Amount of Uploaded Bytes, LS
A peer-to-peer session consists of seeds, leechers, and free riders, i.e., leechers that
never upload data. We consider the free riders as a subset of the leechers. With
tit-for-tat fairness, when there is more capacity of service in the torrent than demand
for this capacity, the excess capacity will be lost even if slow leechers or free riders
could benefit from it. Excess capacity is not rare as it is a fundamental property of
peer-to-peer applications. Indeed, there are two important characteristics of
peer-to-peer applications that tit-for-tat fairness does not take into account. First,
leechers can have an asymmetrical network connectivity, the upload capacity being lower
than the download capacity. In the case of tit-for-tat fairness, a leecher will never be
able to use its full download capacity even if there is excess capacity in the
peer-to-peer session. Second, a seed cannot evaluate the reciprocation of a leecher,
because a seed does not need any piece. As a consequence, there is no way for a seed to
enforce tit-for-tat fairness. But seeds can represent an important part of a
peer-to-peer session, see Table 1. For this reason, it is fundamental to have a notion of
fairness that takes into account seeds.

[Figure 7 plot not reproduced. Top panel: Contribution to the Amount of Uploaded Bytes,
LS. Bottom panel: Contribution to the Amount of Downloaded Bytes, L. x-axis: Torrent ID;
y-axis: Ratio.]
Figure 7: Fairness characterization of the choke algorithm in leecher state for each
torrent. Top graph: Amount of bytes uploaded from the local peer to remote peers. We
created 6 sets of 5 remote peers each, the first set (in black) contains the 5 remote
peers that receive the most bytes from the local peer. Each next set contains the next 5
remote peers. The sets representation goes from black for the set containing the 5 best
remote downloaders, to white for the set containing the 25 to 30 best downloaders. Bottom
graph: Amount of bytes downloaded from remote peers to the local peer. The same set
construction is kept. Thus, this graph shows how much each set of downloaders, as defined
in the top graph, uploaded to the local peer.

In the following, we present two fairness criteria that take into account the
characteristics of leechers and seeds and the notion of excess capacity:

• Any leecher i with an upload speed Ui should get a lower download speed than any other
  leecher j with an upload speed Uj > Ui.

• A seed should give the same service time to each leecher.
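The two criteria above can be written as simple checks over measured rates. The sketch
below is only an illustration: the function names, the (upload speed, download speed)
tuple layout, and the tolerance parameter are assumptions made for this example and are
not taken from the paper or from any BitTorrent implementation.

```python
from itertools import combinations
from typing import Sequence, Tuple


def leecher_criterion_holds(leechers: Sequence[Tuple[float, float]]) -> bool:
    """Check the leecher criterion on (upload_speed, download_speed) pairs.

    Whenever one leecher uploads strictly slower than another, it must also
    download strictly slower. Leechers with equal upload speeds are not
    constrained relative to each other.
    """
    for (u_i, d_i), (u_j, d_j) in combinations(leechers, 2):
        if u_i < u_j and not d_i < d_j:
            return False
        if u_j < u_i and not d_j < d_i:
            return False
    return True


def seed_criterion_holds(service_times: Sequence[float],
                         tolerance: float = 0.0) -> bool:
    """Check the seed criterion: every leecher gets the same service time.

    `tolerance` (a hypothetical parameter) is the largest accepted gap between
    the best-served and the worst-served leecher.
    """
    if not service_times:
        return True
    return max(service_times) - min(service_times) <= tolerance
```

Note that a free rider (upload speed 0) may still receive data from the excess capacity
under the first check, as long as it downloads more slowly than every contributing
leecher.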
With these two simple criteria, leechers are allowed to use the excess capacity, but not
at the expense of leechers with a higher level of contribution. Reciprocation is fostered
and free riders are penalized. Seeds do not make a distinction between contributing
leechers and free riders. However, free riders cannot compromise the stability of the
system because the more contributing leechers there are, the less the free riders receive
from the seeds.
Tit-for-tat fairness can be extended to evenly distribute the capacity of seeds to peers
in a torrent. With this extension, tit-for-tat fairness will verify our two fairness
criteria. However, in the context of peers with asymmetric capacity, finding a threshold
that maximizes the capacity of the system is a hard task that is not yet solved in the
context of a distributed system. Moreover, using a default threshold may lead to a high
inefficiency of the system. We will see in the following that the choke algorithm
verifies our two fairness criteria with a simple distributed algorithm that does not
require the complex computation of a threshold.
To summarize the above discussion, tit-for-tat fairness is not appropriate in the context
of peer-to-peer file replication protocols like BitTorrent. For this reason, we proposed
two new criteria of fairness, one for leechers and one for seeds. It is beyond the scope
of this study to perform a detailed discussion of the fairness issues for peer-to-peer
protocols. Our intent is to give a good intuition on how a peer-to-peer protocol should
behave in order to achieve a reasonable level of fairness.
In the following, we show on real torrents that the choke algorithm in leecher state
fosters reciprocation, and that the new choke algorithm in seed state gives the same
service time to each leecher. We conclude that the choke algorithm is fair according to
our two new fairness criteria.

4.2.2 Leecher State
The choke algorithm in leecher state fosters reciprocation. We see in Fig. 7 that peers
that receive the most from the local peer (top graph) are also peers from which the local
peer downloaded the most (bottom graph). Indeed, the same color in the top and bottom
graphs represents the same set of peers. All seeds are removed from the data used for the
bottom graph, as it is not possible to reciprocate data to seeds. This way, a ratio of 1
in the bottom graph represents the total amount of bytes downloaded from leechers.
Two torrents present a different characteristic. The local peer for torrent 19 does not
upload any byte in leecher state because, due to the small number of leechers in this
torrent, the local peer in leecher state had no leecher in its peer set. Torrent 5, which
is in transient state, has a low level of reciprocation. This is explained by a single
leecher that gave the local peer half of the pieces, but received few pieces from the
local peer. The reason is that this remote leecher was almost never interested in the
local peer. This problem is due to the low entropy of the torrent in transient state.
Because the choke algorithm takes its decisions based on the current download rate of the
remote peers, it does not achieve a perfect reciprocation of the amount of bytes
downloaded and uploaded. However, Fig. 7 shows that the peers from which the local peer
downloads the most are also the peers that receive the most uploaded bytes. Thus there is
a strong correlation between the amount of bytes uploaded and the amount of bytes
downloaded.
The above results show that with a simple distributed algorithm and without any stringent
reciprocation requirements, unlike tit-for-tat fairness, one can achieve a good
reciprocation. More importantly, the choke algorithm in leecher state allows leechers to
benefit from the excess capacity. It is important to understand why the choke algorithm
achieves this good reciprocation. One reason is the way the active peer set is built. In
the following, we focus on how the local peer selects the remote peers to upload blocks
to.
The choke algorithm in leecher state selects a small subset of peers to upload blocks to.
We see in Fig. 7, top graph, that the 5 peers that receive the most data from the local
peer (in black) represent a large part of the total amount of uploaded bytes. At first
sight, this behavior is expected from the choke algorithm because a local peer selects
the three fastest downloading peers to upload to, see section 2.3.2. However, there is no
guarantee that these three peers will continue to send data to the local peer. In the
case they stop sending data to the local peer, the local peer will also stop
reciprocating to them.
We focus on torrent 7 in order to understand how this subset of peers is selected. Fig. 8
(top graph) shows that most of the peers are unchoked only a few times and few peers are
unchoked frequently. The optimistic unchoke gives each peer a chance to be unchoked a few
times, whereas the regular unchoke is used to frequently unchoke the peers that send the
fastest to the local peer. The optimistic unchoke acts as a peer discovery mechanism. The
peers that are not unchoked at all are either initial seeds, or peers that do not stay in
the peer set long enough to be optimistically unchoked.
We see in Fig. 8 (top graph) that there is no correlation between the number of times a
peer is unchoked and how long a peer is interested in the local peer. However, we see
that the number of unchokes for the peers that are unchoked only a few times increases
slightly with the interested time duration. This is because the optimistic unchoke picks
at random a peer to be optimistically unchoked. Thus the longer a peer is interested in
the local peer, the more likely it is to be optimistically unchoked.

[Figure 8 plots not reproduced. Top panel: Correlation Unchoke and Interested Duration,
LS. Bottom panel: Correlation Unchoke and Interested Duration, SS. x-axis: Interested
Time (s); y-axis: # of Unchokes.]
Figure 8: Correlation between the number of unchokes and the interested time for each
remote peer for torrent 7. For each remote peer, a dot represents the correlation between
the number of times this remote peer is unchoked by the local peer and the time this
remote peer is interested in the local peer. Top graph: Correlation when the local peer
is in leecher state. Bottom graph: Correlation when the local peer is in seed state.

[Figure 9 plot not reproduced. Panel: Contribution of Peers to the Amount of Uploaded
Bytes, SS. x-axis: Torrent ID; y-axis: Upload Contribution.]
Figure 9: Fairness characterization of the choke algorithm in seed state for each
torrent. Legend: We created 6 sets of 5 remote peers each, the first set (in black)
contains the 5 remote peers that receive the most bytes from the local peer. Each next
set contains the next 5 remote peers. The set representation goes from black for the set
containing the 5 best remote downloaders, to white for the set containing the 25 to 30
best downloaders.

Fig. 7 shows that for four torrents in transient state, torrents 4, 5, 6 and 8, the
amount of bytes uploaded to the 30 best remote peers is lower than for the other
torrents. Torrents in transient state have low entropy. Therefore, the peers are no
longer selected based only on their reciprocation level, but also on the pieces
available. For this reason, a larger set of peers receives pieces from the local peer.
Thus, a lower fraction of the bytes is uploaded to the best remote peers.
In summary, we have seen that the choke algorithm fosters reciprocation. One important
reason is that each peer elects a small subset of peers to upload data to. This stability
improves the level of reciprocation. We have seen that this stability is not due to a
lack of interest. Our guess is that the choke algorithm leads to an equilibrium in the
peer selection. The exploration of this equilibrium is fundamental to the understanding
of the choke algorithm efficiency. It is beyond the scope of this study to do this
analysis, but it is an important area of future research.

4.2.3 Seed State
The new choke algorithm in seed state gives the same service time to each remote peer. We
see in Fig. 9 that each peer receives roughly the same amount of bytes from the local
peer. The differences among the peers are due to the time remote peers are interested in
the local peer. The longer a remote peer is interested in the local peer, the more times
this remote peer is unchoked. This is confirmed by Fig. 8 (bottom graph), which shows a
strong correlation between the time a peer is interested in the local peer and the number
of times the local peer unchokes it. For torrents 6 and 15 the five best downloaders
receive most of the bytes, because for both torrents there were fewer than 10 remote
peers that received bytes from the local peer.
This new version of the choke algorithm in seed state is the only one to give the same
service time to each leecher. This has three fundamental benefits compared to the old
version. First, as each leecher receives a small and equivalent service time from the
seeds, the entropy of the pieces
is improved. In contrast, with the old choke algorithm, a few fast leechers can receive
most of the pieces, which decreases the diversity of the pieces. Second, free riders
cannot receive more than contributing leechers. In contrast, with the old choke
algorithm, a fast free rider can monopolize a seed. Third, the resilience in transient
phase is improved. Indeed, the initial seed does not favor any leecher. Thus, if a
leecher leaves the peer set, it will only remove a small subset of the pieces from the
torrent. In contrast, with the old choke algorithm, the initial seed can send most of the
pieces to a single leecher. If this leecher leaves the torrent, that will adversely
impact the torrent and increase the time in transient state.
In summary, the new choke algorithm in seed state gives the same service time to each
leecher. This new algorithm is a significant improvement over the old one. In particular,
whereas the old choke algorithm can be unfair and sensitive to free riders, the new choke
algorithm is fair and robust to free riders.

5. RELATED WORK
Whereas BitTorrent can be considered as one of the most successful peer-to-peer
protocols, there are few studies on it.
Several analytical studies of BitTorrent-like protocols exist [6, 22, 26]. Whereas they
provide a good insight into the behavior of such protocols, the assumption of global
knowledge limits the scope of their conclusions. Biersack et al. [6] propose an analysis
of three content distribution models: a linear chain, a tree, and a forest of trees. They
discuss the impact of the number of chunks (what we call pieces) and of the number of
simultaneous uploads (what we call the active peer set) for each model. They show that
the number of chunks should be large and that the number of simultaneous uploads should
be between 3 and 5. Yang et al. [26] study the service capacity of BitTorrent-like
protocols. They show that the service capacity increases exponentially at the beginning
of the torrent and then scales well with the number of peers. They also present traces
obtained from a tracker. Such traces are very different from ours, as they do not allow
one to study the dynamics of a peer. Both studies presented in [6] and [26] are
orthogonal to ours as they do not consider the dynamics induced by the choke and rarest
first algorithms.
Qiu and Srikant [22] extend the initial work presented in [26] by providing an analytical
solution to a fluid model of BitTorrent. Their results show the high efficiency in terms
of system capacity utilization of BitTorrent, both in a steady state and in a transient
regime. Furthermore, the authors concentrate on a game-theoretical analysis of the choke
and rarest first algorithms. However, a major limitation of this analytical model is the
assumption of global knowledge of all peers to make the peer selection. Indeed, in a real
system, each peer has only a limited view of the other peers, which is defined by its
peer set. As a consequence, a peer cannot find the best suited peers to send data to
among all the peers in the torrent (global optimization assumption), but only within its
own peer set (local and distributed optimization). Also, the authors do not evaluate the
rarest first algorithm, but assume a uniform distribution of pieces. Our study is
complementary, as it provides an experimental evaluation of algorithms with limited
knowledge. In particular, we show that the efficiency on real torrents is close to the
one predicted by the models.
Felber et al. [9] compare different peer and piece selection strategies in static
scenarios using simulations. Bharambe et al. [5] present a simulation-based study of
BitTorrent using a discrete-event simulator that supports up to 5000 peers. The authors
concentrate on the evaluation of the BitTorrent performance by looking at the upload
capacity of the nodes and at the fairness defined in terms of the volume of data served
by each node. They varied various parameters of the simulation such as the peer set and
active peer set size. They provide important insights into the behavior of BitTorrent.
However, they do not evaluate a peer set larger than 15 peers, whereas the real
implementation of BitTorrent has a default value of 80 peers. This restriction may have
an important impact on the behavior of the protocol as the piece selection strategy is
impacted by the peer set size. The validation of a simulator is always hard to perform,
and the simulator restrictions may bias the results. Our study provides real-world
results that can be used to validate simulated scenarios. Moreover, our study is
different because we do not modify the default parameters of BitTorrent, but we observed
its default behavior on a large variety of real torrents. Finally, we provide new
insights into the rarest first piece selection and on the choke algorithm peer selection.
In particular, we argue that the choke algorithm in its latest version is fair.
Pouwelse et al. [21] study the file popularity, file availability, download performance,
content lifetime and pollution level on a popular BitTorrent tracker site. This work is
orthogonal to ours as they do not study the core algorithms of BitTorrent, but rather
focus on the contents distributed using BitTorrent and on the user behavior. The work
that is the most closely related to our study was done by Izal et al. [14]. In this
paper, the authors provide seminal insights into BitTorrent based on data collected from
a tracker log for a single yet popular torrent, even if a sketch of a local vision from a
local peer perspective is presented. Their results provide information on peer behavior,
and show a correlation between uploaded and downloaded amount of data. Our work differs
from [14] in that we provide a thorough measurement-based analysis of the rarest first
and choke algorithms. We also study a large variety of torrents, which allows us not to
be biased toward a particular type of torrent. Moreover, without pretending to answer all
possible questions that arise from a simple yet powerful protocol such as BitTorrent, we
provide new insights into the rarest first and choke algorithms.

6. DISCUSSION
In this paper we go beyond the common wisdom that BitTorrent performs well. We have
performed a detailed experimental evaluation of the rarest first and choke algorithms on
real torrents with varying characteristics in terms of number of leechers, number of
seeds, and content sizes. Whereas we do not pretend to have reached completeness, our
evaluation gives a reasonable understanding of the behavior of both algorithms on a large
variety of real cases.
Our main results are the following.

• The rarest first algorithm guarantees a close to ideal entropy on the presented
  torrents. In particular, it prevents the reappearance of rare pieces and of the last
  pieces problem.

• We have found that torrents in a startup phase can have low entropy. The duration of
  this phase depends only on the upload capacity of the source of the content. In
  particular, the rarest first algorithm is not responsible for the low entropy during
  this phase.
• The fairness achieved with a bit-level tit-for-tat strategy is not appropriate in the
  context of peer-to-peer file replication. We have proposed two new fairness criteria in
  this context.

• The choke algorithm is fair, fosters reciprocation, and is robust to free riders in its
  latest version.

Our main contribution is to show that on real torrents the rarest first and choke
algorithms are enough to have an efficient and viable file replication protocol in the
Internet. In particular, we discussed the benefits of the new choke algorithm in seed
state. This new algorithm outperforms the old one and should replace it. We also
identified two new areas of improvement: the downloading speed of the first blocks, and
the duration of the transient phase.
The rarest first algorithm is simple. It does not require global knowledge or important
computational resources. Yet, it guarantees a peer availability, for the peer selection,
close to the ideal one. We do not see any striking argument in favor of a more complex
solution in the evaluated context.
We do not claim that the choke algorithm is optimal. The understanding of its equilibrium
is an area of future research. However, it achieves a reasonable level of efficiency, and
most importantly it guarantees a viable system by fostering reciprocation, preventing
free riders from attacking the stability of the system, and using the excess capacity.
Solutions based on a bit-level tit-for-tat are not appropriate.
Our conclusions only hold in the context we explored, i.e., peer-to-peer file replication
in the Internet. There are many different contexts where peer-to-peer file replication
can be used: small files, small groups of peers, dynamic groups in ad-hoc networks, peers
with partial connectivity, etc. All these contexts are beyond the scope of this paper,
but are interesting areas for future research.
We also identified two areas of improvement. The time to deliver the first blocks of data
should be reduced. In the case of large contents, this delivery time will marginally
increase the overall download time. But, in the case of small contents, the penalty is
significant. Also, the duration of the transient phase should be minimized as the low
entropy may result in a performance penalty. The way to solve these problems is beyond
the scope of this study, but is an interesting area of future research.
We believe that this work sheds a new light on two new algorithms that enrich previous
content distribution techniques in the Internet. BitTorrent is the only existing
peer-to-peer replication protocol that exploits these two promising algorithms in order
to improve system capacity utilization. We deem that the understanding of these two
algorithms is of fundamental importance for the design of future peer-to-peer content
distribution applications.

Acknowledgment
We would like to thank the anonymous reviewers, and also Chadi Barakat, Ernst W.
Biersack, Walid Dabbous, Katia Obraczka, Thierry Turletti for their valuable comments.

7. REFERENCES
[1] http://www.slyck.com.
[2] http://www.bittorrent.com/.
[3] Bittorrent protocol specification v1.0.
    http://wiki.theory.org/BitTorrentSpecification, June 2005.
[4] R. Bhagwan, S. Savage, and G. Voelker. Understanding availability. In International
    Workshop on Peer-to-Peer Systems, Berkeley, CA, USA, February 2003.
[5] A. R. Bharambe, C. Herley, and V. N. Padmanabhan. Analysing and improving bittorrent
    performance. In Proc. IEEE Infocom'2006, Barcelona, Spain, April 2006.
[6] E. W. Biersack, P. Rodriguez, and P. Felber. Performance analysis of peer-to-peer
    networks for file distribution. In Proc. Fifth International Workshop on Quality of
    Future Internet Services (QofIS'04), Barcelona, Spain, September 2004.
[7] Y. Chawathe, S. Ratnasamy, L. Breslau, and S. Shenker. Making gnutella-like p2p
    systems scalable. In Proc. ACM SIGCOMM'03, Karlsruhe, Germany, August 25-29 2003.
[8] B. Cohen. Incentives build robustness in bittorrent. In Proc. First Workshop on
    Economics of Peer-to-Peer Systems, Berkeley, USA, June 2003.
[9] P. Felber and E. W. Biersack. Self-scaling networks for content distribution. In
    Proc. International Workshop on Self-* Properties in Complex Information Systems,
    Bertinoro, Italy, May-June 2004.
[10] P. Ganesan and M. Seshadri. On cooperative content distribution and the price of
    barter. In IEEE ICDCS'05, Columbus, Ohio, USA, June 2005.
[11] C. Gkantsidis and P. Rodriguez. Network coding for large scale content
    distribution. In Proc. IEEE Infocom'2005, Miami, USA, March 2005.
[12] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, and I. Stoica. The
    impact of dht routing geometry on resilience and proximity. In Proc. ACM SIGCOMM'03,
    Karlsruhe, Germany, August 25-29 2003.
[13] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, and X. Zhang. Measurements, analysis,
    and modeling of bittorrent-like systems. In Proc. ACM IMC'2005, Berkeley, CA, USA,
    October 2005.
[14] M. Izal, G. Urvoy-Keller, E. W. Biersack, P. Felber, A. A. Hamra, and
    L. Garcés-Erice. Dissecting bittorrent: Five months in a torrent's lifetime. In Proc.
    PAM'04, Antibes Juan-les-Pins, France, April 2004.
[15] S. Jun and M. Ahamad. Incentives in bittorrent induce free riding. In Proc.
    SIGCOMM'05 Workshops, Philadelphia, PA, USA, August 2005.
[16] T. Karagiannis, A. Broido, N. Brownlee, and K. C. Claffy. Is p2p dying or just
    hiding? In Proc. IEEE Globecom'04, Dallas, Texas, USA, Nov. 29-Dec. 3 2004.
[17] T. Karagiannis, A. Broido, M. Faloutsos, and K. C. Claffy. Transport layer
    identification of p2p traffic. In Proc. ACM IMC'04, Taormina, Sicily, Italy, October
    2004.
[18] D. Kostić, R. Braud, C. Killian, E. Vandekieft, J. W. Anderson, A. C. Snoeren, and
    A. Vahdat. Maintaining high bandwidth under dynamic network conditions. In Proc.
    USENIX'05, Anaheim, CA, USA, April 2005.
[19] A. Legout, G. Urvoy-Keller, and P. Michiardi. Rarest first and choke algorithms are
    enough. Technical Report (inria-00001111, version 3 - 6 September 2006), INRIA,
    Sophia Antipolis, September 2006.
[20] A. Parker. The true picture of peer-to-peer filesharing.
    http://www.cachelogic.com/, July 2004.
[21] J. A. Pouwelse, P. Garbacki, D. H. J. Epema, and H. J. Sips. The bittorrent p2p
    file-sharing system: Measurements and analysis. In Proc. 4th International Workshop
    on Peer-to-Peer Systems (IPTPS'05), Ithaca, New York, USA, February 2005.
[22] D. Qiu and R. Srikant. Modeling and performance analysis of bittorrent-like
    peer-to-peer networks. In Proc. ACM SIGCOMM'04, Portland, Oregon, USA,
    Aug. 30–Sept. 3 2004.
[23] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable
    content-addressable network. In Proc. ACM SIGCOMM'01, San Diego, California, USA,
    August 27-31 2001.
[24] P. Rodriguez and E. W. Biersack. Dynamic parallel-access to replicated content in
    the internet. IEEE/ACM Transactions on Networking, 10(4), August 2002.
[25] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A
    scalable peer-to-peer lookup service for internet applications. In Proc. ACM
    SIGCOMM'01, San Diego, California, USA, August 27-31 2001.
[26] X. Yang and G. de Veciana. Service capacity in peer-to-peer networks. In Proc. IEEE
    Infocom'04, pages 1–11, Hong Kong, China, March 2004.
