Rarest First and Choke Algorithms Are Enough
active upload or download. Therefore, these misbehaving peers adversely bias our entropy characterization. Filtering all peers that stay less than 10 seconds removes the bias.

In summary, we have seen that the rarest first algorithm enforces a close to ideal entropy for the presented torrents. We have identified torrents with low entropy and shown that the rarest first algorithm is not responsible for this low entropy. We have also identified rare cases where the rarest first algorithm does not perform optimally, but we have explained that these cases do not justify a replacement with a more complex solution. In the following, we evaluate how the rarest first piece selection strategy achieves high entropy.

Figure 2: Evolution of the number of copies of pieces in the peer set with time for torrent 8 in leecher state. Legend: The dotted line represents the number of copies of the most replicated piece in the peer set at each instant. The solid line represents the mean number of copies over all the pieces in the peer set at each instant. The dashed line represents the number of copies of the least replicated piece in the peer set at each instant. [Plot: number of copies (Max, Mean, Min) versus time (s, x 10^4).]

4.1.2 Rarest First Algorithm Dynamics
We classify a torrent in two states: the transient state and the steady state⁵. In transient state, there is only one seed in the torrent. In particular, there are some pieces that are rare, i.e., present only at the seed. This state corresponds to the beginning of the torrent, when the initial seed has not yet uploaded all the pieces of the content. All torrents with low entropy (Fig. 1, top graph) are in transient state. A good piece replication algorithm should minimize the time spent in the transient state because low entropy may adversely impact the service capacity of a torrent by biasing the peer selection strategy. In steady state, there is no rare piece, and the piece replication strategy should prevent the torrent from entering a transient state again. All torrents with high entropy are in steady state.

Figure 3: Evolution of the number of rarest pieces in the peer set for torrent 8 in leecher state. The rarest pieces set is formed by the pieces that are equally the rarest, i.e., the pieces that have the least number of copies in the peer set. [Plot: "Number of Rarest Pieces, LS"; y-axis: Num. rarest.]

4.1.2.1 Transient State.
In order to understand the dynamics of the rarest first algorithm in transient state, we focus on torrent 8. This torrent consisted of 1 seed and 861 leechers at the beginning
of the experiment. The file distributed in this torrent is split in 863 pieces. We ran this experiment for 58991 seconds, but in the following we only discuss the results for the first 29959 seconds, when the local peer is in leecher state.

Torrent 8 is in transient state for most of the experiment. As we do not have global knowledge of the torrent, we do not have a direct observation of the transient state. However, there are several pieces of evidence of this state. Indeed, Fig. 2 shows that there are missing pieces during the experiment in the local peer set, as the minimum curve (dashed line) is at zero. Moreover, we probed the tracker to get statistics on the number of seeds and leechers during this experiment. We found that this torrent had only one seed for the duration of the experiment.

We see in Fig. 1, top graph, that torrent 8 has low entropy. This low entropy is due to the limited upload capacity of the initial seed. Indeed, when a torrent is in transient state, available pieces are replicated with an exponential capacity of service [26], but rare pieces are served by the initial seed at a constant rate. This is confirmed by Fig. 3, which shows the number of rarest pieces, i.e., the set size of the pieces that are equally rarest. We see that the number of rarest pieces decreases linearly with time. As the size of each piece in this torrent is 4 MB, a rapid calculation shows that the rarest pieces are duplicated in the peer set at a constant rate close to 36 kB/s. We do not have a direct proof that this rate is the one of the initial seed, because we do not have global knowledge of the torrent. However, the torrent is in its startup phase and most of the pieces are only available on the initial seed. Indeed, Fig. 2 shows that there are missing pieces in the peer set, thus the rarest pieces presented in Fig. 3 are missing pieces in the peer set. Therefore, only the initial seed can serve the missing pieces shown in Fig. 3. In conclusion, the upload capacity of the initial seed is the bottleneck for the replication of the rare pieces, and the time spent in transient state only depends on the upload capacity of the initial seed.

⁵ Our definition of transient and steady state differs from the one given by Yang et al. [26].
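To make the order of magnitude behind the constant-rate argument concrete, the following minimal Python sketch relates the 36 kB/s rate and the 4 MB piece size quoted above to a number of pieces. It is only an illustration: the function and constant names are ours, and the conversion 1 MB = 1000 kB is an assumption that the text does not specify.

```python
# Illustrative sketch; names and the MB-to-kB conversion are assumptions.
PIECE_SIZE_KB = 4 * 1000    # 4 MB per piece, ASSUMING 1 MB = 1000 kB
RARE_RATE_KBPS = 36         # replication rate of the rarest pieces (kB/s), from the text
LEECHER_WINDOW_S = 29959    # duration of the leecher-state observation (s), from the text


def seconds_per_piece(piece_size_kb: float, rate_kbps: float) -> float:
    """Time needed to transfer one full piece at a constant rate."""
    return piece_size_kb / rate_kbps


def pieces_served(window_s: float, piece_size_kb: float, rate_kbps: float) -> float:
    """Number of pieces a constant-rate source can inject in window_s seconds."""
    return window_s * rate_kbps / piece_size_kb


if __name__ == "__main__":
    per_piece = seconds_per_piece(PIECE_SIZE_KB, RARE_RATE_KBPS)
    served = pieces_served(LEECHER_WINDOW_S, PIECE_SIZE_KB, RARE_RATE_KBPS)
    print(f"{per_piece:.1f} s per piece, {served:.1f} pieces in the window")
```

Under these assumptions, one new piece enters the peer set about every 111 seconds, i.e., roughly 270 pieces over the 29959 seconds spent in leecher state, far fewer than the 863 pieces of the content.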
The rarest first algorithm attempts to minimize the time spent in transient state and quickly replicates available pieces. Indeed, leechers download the rare pieces first. As the rare pieces are only present on the initial seed, the upload capacity of the initial seed will be fully utilized and no or few duplicate rare pieces will be served by the initial seed. Once served by the initial seed, a rare piece becomes available and is served in the torrent with an increasing capacity of service. As rare pieces are served at a constant rate, most of the capacity of service of the torrent is used to replicate the available pieces on leechers. Indeed, Fig. 2 shows that once a piece is served by the initial seed, the rarest first algorithm starts to replicate it quickly, as shown by the continuous increase in the mean number of copies over all the pieces, and by the number of copies of the most replicated piece (dotted line) that is always close to the maximum peer set size of 80.

Figure 4: Evolution of the number of copies of pieces in the peer set with time for torrent 7. Legend: The dotted line represents the number of copies of the most replicated piece in the peer set at each instant. The solid line represents the mean number of copies over all the pieces in the peer set at each instant. The dashed line represents the number of copies of the least replicated piece in the peer set at each instant. [Plot: "Replication of Pieces in the Peer Set"; number of copies (Max, Mean, Min) versus time (s, x 10^4).]

In summary, the low entropy observed for some torrents is due to the transient phase. The duration of this phase cannot be shorter than the time for the initial seed to send one copy of each piece, which is constrained by the upload capacity of the initial seed. Thus, the time spent in this phase cannot be shortened further by the piece replication strategy. The rarest first algorithm minimizes the time spent in transient state. Once a piece is served by the initial seed, the rarest first algorithm replicates it quickly. Therefore, a replacement of the rarest first algorithm by another algorithm cannot be justified based on the real torrents we have monitored in transient state.
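The piece selection discussed in this section can be sketched in a few lines of Python. This is a simplified illustration, not the implementation monitored in the experiments: the function names are ours, ties among equally rare pieces are broken uniformly at random (an assumption), and other policies of real clients are left out. The helper `piece_copies` computes the per-piece counts from which the Max, Mean, and Min curves are derived, and `rarest_pieces` returns the rarest pieces set.

```python
import random


def piece_copies(peer_haves, num_pieces):
    """Count, for each piece index, how many peers of the peer set hold it."""
    copies = [0] * num_pieces
    for have in peer_haves:          # one set of piece indices per remote peer
        for piece in have:
            copies[piece] += 1
    return copies


def rarest_pieces(copies):
    """Pieces that are equally the rarest, i.e., with the least number of copies."""
    least = min(copies)
    return [piece for piece, count in enumerate(copies) if count == least]


def pick_rarest_first(local_have, peer_haves, num_pieces, rng=random):
    """Pick the next piece to request, or None if nothing useful is available.

    Only pieces that the local peer misses and that at least one peer of
    the peer set can serve are candidates. Ties are broken at random
    (an assumption of this sketch).
    """
    copies = piece_copies(peer_haves, num_pieces)
    candidates = [p for p in range(num_pieces)
                  if p not in local_have and copies[p] > 0]
    if not candidates:
        return None
    least = min(copies[p] for p in candidates)
    return rng.choice([p for p in candidates if copies[p] == least])


if __name__ == "__main__":
    peer_haves = [{0, 1, 2}, {0, 1}, {0, 3}]   # hypothetical peer set
    copies = piece_copies(peer_haves, 4)
    print(copies, rarest_pieces(copies))
    print(pick_rarest_first({0}, peer_haves, 4))
```

In transient state, the pieces held only by the initial seed have the fewest copies in the peer set, so a selection of this kind directs requests for them to the seed first.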
4.1.2.2 Steady State.
In order to understand the dynamics of the rarest first algorithm in steady state, we focus on torrent 7. This torrent consisted of 1 seed and 713 leechers at the beginning of the experiment. We have seen in Fig. 1 that torrent 7 has a high entropy. Fig. 4 shows that the least replicated piece (min curve) always has more than 1 copy in the peer set. Thus, torrent 7 is in steady state.

In the following, we present the dynamics of the rarest first algorithm in steady state, and explain how this algorithm prevents the torrent from returning to transient state. Fig. 4 shows that the mean number of copies remains well bounded over time by the number of copies of the most and least replicated pieces. The variations observed in the number of copies are explained by the variation of the peer set size, see Fig. 5. The decrease in the number of copies 9051 seconds after the beginning of the experiment corresponds to the local peer switching to seed state. Indeed, when a leecher becomes a seed, it closes its connections to all the seeds.

Figure 5: Evolution of the peer set size for torrent 7. [Plot: "Size of the Peer Set"; peer set size versus time (s, x 10^4).]

The rarest first algorithm does a very good job at increasing the number of copies of the rarest pieces. Fig. 4 shows that the number of copies of the least replicated piece (min curve) closely follows the mean, but does not significantly get closer. However, we see in Fig. 6 that the number of rarest pieces, i.e., the set size of the pieces that are equally rarest, follows a sawtooth behavior. Each peer joining or leaving the peer set can alter the set of rarest pieces. But, as soon as a new set of pieces becomes rarest, the rarest first algorithm quickly duplicates them, as shown by a consistent drop in the number of rarest pieces in Fig. 6. Finally, we never observed in any of our torrents a steady state followed by a transient state.

In summary, the rarest first algorithm in steady state ensures a good replication of the pieces in real torrents. It also quickly replicates the rarest pieces in order to prevent the reappearance of a transient state. We conclude that on real torrents in steady state, the rarest first algorithm is enough to guarantee a high entropy.

4.1.3 Last Pieces Problem
We say that there is a last pieces⁶ problem when the download speed suffers a significant slowdown for the last pieces. This problem is due to some pieces replicated on few overloaded peers, i.e., peers that receive more requests than they can serve. This problem is detected by a peer only at the end of the content download. Indeed, a peer always seeks fast peers to download from. Thus, it is likely that if some pieces are available on only few overloaded peers, these peers

⁶ This problem is usually referenced as the last piece (singular) problem. However, there is no reason why this problem affects only a single piece.
[Plot: "Number of Rarest Pieces"; y-axis: Num. rarest.]

make the comparison: the steady and transient states. In steady state, we have seen in section 4.1.2.2 that the entropy of the presented torrents is close to one with rarest first. An entropy close to one means that each peer is interested in each other peer in its peer set most of the time. As this is close to the target of an ideal piece selection strategy, we see
tit-for-tat fairness, when there is more capacity of service in the torrent than request for this capacity, the excess capacity will be lost even if slow leechers or free riders could benefit from it. Excess capacity is not rare as it is a fundamental property of peer-to-peer applications. Indeed, there are two important characteristics of peer-to-peer applications that tit-for-tat fairness does not take into account. First, leechers can have an asymmetrical network connectivity, the upload capacity being lower than the download capacity. In the case of tit-for-tat fairness, a leecher will never be able to use its full download capacity even if there is excess capacity in the peer-to-peer session. Second, a seed cannot evaluate the reciprocation of a leecher, because a seed does not need any piece. As a consequence, there is no way for a seed to enforce tit-for-tat fairness. But, seeds can represent an important part of a peer-to-peer session, see Table 1. For this reason, it is fundamental to have a notion of fairness that takes into account seeds.

Figure 7: Fairness characterization of the choke algorithm in leecher state for each torrent. Top graph: Amount of bytes uploaded from the local peer to remote peers. We created 6 sets of 5 remote peers each, the first set (in black) contains the 5 remote peers that receive the most bytes from the local peer. Each next set contains the next 5 remote peers. The sets representation goes from black for the set containing the 5 best remote downloaders, to white for the set containing the 25 to 30 best downloaders. Bottom graph: Amount of bytes downloaded from remote peers to the local peer. The same set construction is kept. Thus, this graph shows how much each set of downloaders, as defined in the top graph, uploaded to the local peer. [Plot: bottom panel "Contribution to the Amount of Downloaded Bytes, L"; ratio versus torrent ID.]

In the following, we present two fairness criteria that take into account the characteristics of leechers and seeds and the notion of excess capacity:

• Any leecher i with an upload speed U_i should get a lower download speed than any other leecher j with an upload speed U_j > U_i.

• A seed should give the same service time to each leecher.
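The two criteria above can be expressed as simple predicates, as in the following minimal Python sketch. The function names, the data layout (pairs of upload and download speeds, a list of service times), and the `tolerance` parameter are assumptions introduced for illustration; they are not part of any measurement tool.

```python
from itertools import combinations


def leechers_are_fair(leechers):
    """Leecher criterion: U_j > U_i implies that i gets a lower download speed than j.

    leechers is a list of (upload_speed, download_speed) pairs.
    Pairs of leechers with equal upload speeds are not constrained.
    """
    for (u_i, d_i), (u_j, d_j) in combinations(leechers, 2):
        if u_i < u_j and not d_i < d_j:
            return False
        if u_j < u_i and not d_j < d_i:
            return False
    return True


def seed_is_fair(service_times, tolerance=0.0):
    """Seed criterion: every leecher gets the same service time.

    tolerance (an assumption of this sketch) allows small measurement
    differences between leechers, in the same unit as the service times.
    """
    if not service_times:
        return True
    return max(service_times) - min(service_times) <= tolerance


if __name__ == "__main__":
    print(leechers_are_fair([(10, 50), (20, 80), (5, 30)]))
    print(seed_is_fair([30.0, 30.0, 29.0], tolerance=2.0))
```

Note that the leecher criterion only orders download speeds; it does not require the download speed to equal the upload speed, which is what leaves room for leechers to use excess capacity.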
With these two simple criteria, leechers are allowed to use the excess capacity, but not at the expense of leechers with a higher level of contribution. Reciprocation is fostered and free riders are penalized. Seeds do not make a distinction between contributing leechers and free riders. However, free riders cannot compromise the stability of the system because the more contributing leechers there are, the less the free riders receive from the seeds.

Tit-for-tat fairness can be extended to evenly distribute the capacity of seeds to peers in a torrent. With this extension, tit-for-tat fairness will verify our two fairness criteria. However, in the context of peers with asymmetric capacity, finding a threshold that maximizes the capacity of the system is a hard task that is not yet solved in the context of a distributed system. Moreover, using a default threshold may lead to a high inefficiency of the system. We will see in the following that the choke algorithm verifies our two fairness criteria with a simple distributed algorithm that does not require the complex computation of a threshold.

To summarize the above discussion, tit-for-tat fairness is not appropriate in the context of peer-to-peer file replication protocols like BitTorrent. For this reason, we proposed two new criteria of fairness, one for leechers and one for seeds. It is beyond the scope of this study to perform a detailed discussion of the fairness issues for peer-to-peer protocols. Our intent is to give a good intuition on how a peer-to-peer protocol should behave in order to achieve a reasonable level of fairness.

In the following, we show on real torrents that the choke algorithm in leecher state fosters reciprocation, and that the new choke algorithm in seed state gives the same service time to each leecher. We conclude that the choke algorithm is fair according to our two new fairness criteria.

4.2.2 Leecher State
The choke algorithm in leecher state fosters reciprocation. We see in Fig. 7 that peers that receive the most from the local peer (top graph) are also peers from which the local peer downloaded the most (bottom graph). Indeed, the same color in the top and bottom graphs represents the same set of peers. All seeds are removed from the data used for the bottom graph, as it is not possible to reciprocate data to seeds. This way, a ratio of 1 in the bottom graph represents the total amount of bytes downloaded from leechers.

Two torrents present a different characteristic. The local peer for torrent 19 does not upload any byte in leecher state because, due to the small number of leechers in this torrent, the local peer in leecher state had no leecher in its peer set. Torrent 5, which is in transient state, has a low level of reciprocation. This is explained by a single leecher that gave half of the pieces to the local peer, but received few pieces from the local peer. The reason is that this remote leecher was almost never interested in the local peer. This problem is due to the low entropy of the torrent in transient state.

Because the choke algorithm takes its decisions based on the current download rate of the remote peers, it does not achieve a perfect reciprocation of the amount of bytes downloaded and uploaded. However, Fig. 7 shows that the peers from which the local peer downloads the most are also the peers that receive the most uploaded bytes. Thus there is a strong correlation between the amount of bytes uploaded and the amount of bytes downloaded.

The above results show that with a simple distributed algorithm and without any stringent reciprocation require-
Figure 8: Correlation between the number of unchokes and the interested time for each remote peer for torrent 7. For each remote peer, a dot represents the correlation between the number of times this remote peer is unchoked by the local peer and the time this remote peer is interested in the local peer. Top graph: Correlation when the local peer is in leecher state. Bottom graph: Correlation when the local peer is in seed state. [Plot: panels "Correlation Unchoke and Interested Duration, LS" and "Correlation Unchoke and Interested Duration, SS"; # of unchokes versus interested time (s).]

Figure 9: Fairness characterization of the choke algorithm in seed state for each torrent. Legend: We created 6 sets of 5 remote peers each, the first set (in black) contains the 5 remote peers that receive the most bytes from the local peer. Each next set contains the next 5 remote peers. The set representation goes from black for the set containing the 5 best remote downloaders, to white for the set containing the 25 to 30 best downloaders. [Plot: "Contribution of Peers to the Amount of Uploaded Bytes, SS"; upload contribution versus torrent ID.]
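The set construction described in the captions of Fig. 7 and Fig. 9 can be sketched as follows. This is a hypothetical reconstruction for illustration only: the function names and the dictionary layout are ours, and normalizing each set by the total over all remote peers is an assumption based on the statement that a ratio of 1 represents the total amount of bytes.

```python
def rank_into_sets(uploaded, set_size=5, num_sets=6):
    """Group remote peers into sets, best downloaders first.

    uploaded maps a peer identifier to the bytes it received from the
    local peer. The first set holds the set_size peers that received the
    most bytes, the next set the following set_size peers, and so on.
    """
    ranked = sorted(uploaded, key=uploaded.get, reverse=True)
    return [ranked[k * set_size:(k + 1) * set_size] for k in range(num_sets)]


def set_ratios(sets, amounts):
    """Fraction of the total of amounts that is attributable to each set.

    amounts maps a peer identifier to a byte count. Passing the uploaded
    bytes gives a top-graph style ratio; passing the bytes downloaded
    from the same peers keeps the same set construction, as in the
    bottom graph of Fig. 7.
    """
    total = sum(amounts.values())
    if total == 0:
        return [0.0] * len(sets)
    return [sum(amounts.get(peer, 0) for peer in group) / total
            for group in sets]


if __name__ == "__main__":
    uploaded = {"a": 50, "b": 30, "c": 20}      # hypothetical byte counts
    downloaded = {"a": 10, "b": 70, "c": 20}
    sets = rank_into_sets(uploaded, set_size=1, num_sets=2)
    print(sets, set_ratios(sets, uploaded), set_ratios(sets, downloaded))
```

For the bottom graph of Fig. 7, seeds would first have to be removed from the downloaded amounts, as stated in the text; the sketch leaves this filtering to the caller.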