NWU-EECS-09-22
Pitfalls for Testbed Evaluations of Internet Systems
David Choffnes and Fabián E. Bustamante
Department of Electrical Engineering & Computer Science, Northwestern University
{drchoffnes,fabianb}@eecs.northwestern.edu
Internet paths and estimating cross-network traffic costs, to name a few. While several research efforts have successfully extracted detailed topologies from public vantage points, it is well known that these are incomplete Internet views.
To explore a lower bound for missing topology information, we used AS-level paths and AS relationships inferred from traceroute data gathered from hundreds of thousands of users located at the edge of the network [2]. As Fig. 1 shows, the number of vantage points available from edge systems far outnumbers those from public views, particularly for lower tiers of the Internet hierarchy where most of the ASes reside.

Our dataset includes probes from nearly 1 million source IP addresses to more than 84 million unique destination IP addresses, all of which represent active users of the BitTorrent P2P system. By comparison, the BitProbes study [6] used a few hundred sources from the PlanetLab testbed to measure P2P hosts comprising 500,000 destination IPs.

Figure 1: Number of vantage point ASes corresponding to inferred network tiers.
Missing links. Besides providing a large number of vantage points, our dataset also reveals links invisible to the public view. In total, our approach identified 20% additional links missing from the public view, the vast majority of which were located below Tier-1 ASes in the Internet hierarchy. Not surprisingly, the number of missing links discovered increases with the tier at which those links are located. Thus, when evaluating the interaction between network topologies and Internet systems at the edge (often located in lower Internet tiers), testbed-based topologies are less likely to include many relevant links.

                 Tier-1    Customer-Provider    Peering
Missing links    3.14%     12.86%               40.99%

Table 1: Percent of links missing from public views, but found from edge systems, for major categories of AS relationships.

In addition to the locations of links in the Internet hierarchy, it is useful to understand what kinds of AS relationships are included in these missing links. Table 1 categorizes links into Tier-1, customer-provider or peering links and shows the missing links as a fraction of existing links in the public view, for each category. Note that there is a large number of additional peering links (41%) and, more surprisingly, a significant fraction of new customer-provider links (13%).

Figure 2: CDF of the portion of each host's traffic volume that could not be mapped to a path based on both public views and traceroutes between a subset of P2P users.

From these results it is clear that our community may need to revisit analyses based on public-view topologies (e.g., when looking at traffic cost or Internet resilience).
Impact of missing paths. To better understand how this missing information affects studies of Internet-scale systems, we investigate the impact of missing links using three weeks of connection data from P2P users. In particular, we would like to know how much of these users' traffic volumes can be mapped to AS-level paths – an essential step for evaluating P2P traffic costs and locality.

We begin by determining the volume of data transferred over each connection for each host; we then map each connection to a source/destination AS pair using the Team Cymru service [18]. We use the set of paths from public views and P2P traceroutes [2] and, finally, for each host we determine the portion of its traffic volume that could not be mapped to any AS path in our dataset.
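To make this mapping step concrete, the sketch below shows one plausible implementation in Python. The IP-to-ASN table (a stand-in for the Team Cymru lookup service [18]), the set of known AS-level paths and the per-connection volumes are hypothetical toy data, not our actual pipeline.

    from collections import defaultdict

    # Hypothetical stand-in for the Team Cymru IP-to-ASN service [18];
    # the real pipeline resolves each IP via their lookup service.
    IP_TO_ASN = {"10.0.0.1": 100, "10.0.0.2": 200, "10.0.0.3": 300}

    # Source/destination AS pairs for which the dataset (public views
    # plus P2P traceroutes [2]) contains at least one AS-level path.
    KNOWN_AS_PATHS = {(100, 200)}

    # Per-connection transfer volumes in bytes: (src_ip, dst_ip, volume).
    connections = [
        ("10.0.0.1", "10.0.0.2", 5_000_000),  # mappable to an AS path
        ("10.0.0.1", "10.0.0.3", 1_000_000),  # no known AS path
    ]

    total = defaultdict(int)
    unmapped = defaultdict(int)
    for src_ip, dst_ip, volume in connections:
        as_pair = (IP_TO_ASN[src_ip], IP_TO_ASN[dst_ip])
        total[src_ip] += volume
        if as_pair not in KNOWN_AS_PATHS:
            unmapped[src_ip] += volume

    for host, volume in total.items():
        print(f"{host}: {unmapped[host] / volume:.1%} of traffic volume unmapped")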
Figure 2 uses a cumulative distribution function (CDF) to plot these unmapped traffic volumes using only BGP data (labeled BGP) and the entire dataset (labeled All). The figure shows that when using All path information, we cannot locate complete path information for 84% of hosts; fortunately, the median portion of traffic for which we cannot locate an AS path is only 6.7%. Of the hosts in our dataset, 16% use connections for which we have path information for only half of their traffic volumes, and 3% use connections for which we have no path information at all. When using only BGP information, the median volume of unaccounted traffic is nearly 90%.

One implication of this result is that any Internet-wide study from a testbed environment cannot accurately characterize path properties for traffic volumes from the majority of P2P users. Even though the additional links discovered by Chen et al. [2] do not cover every path carrying P2P traffic, a partial view of these paths based on traceroutes to randomly selected peers allows us to map nearly all of the flows.
4. GENERALIZING MEASUREMENTS

While the previous section showed that large portions of the network are invisible to current testbeds, in this section we show how properties of topology links measured from testbed vantage points do not extend to those measured from the edge of the network. We begin by focusing on estimating distances between Internet hosts, which is essential to a variety of network performance optimizations including server selection, central leader election and connection biasing. We close the section by examining Internet-wide bandwidth capacities as measured by BitTorrent throughput from users at the edge of the network.

Figure 3: CDFs of latencies from different measurement platforms (semilog scale). Our measurement study exclusively between peers in Vuze (labeled P2P) exhibits double the median latency of Ledlie et al.'s "in the wild" study (labeled PL-to-P2P).
There is a large body of research addressing the issue of how to measure, calculate and encode Internet distances in terms of round-trip latencies [5, 10, 16, 17]. Generally, these solutions rely on methods to predict latencies between arbitrary hosts without requiring the N^2 measurements that would provide ground-truth information. Previous work has identified the following key properties that impact network positioning performance: the structure of the latency space, the rate of triangle-inequality violations (TIVs) in this latency space, and last-mile delays. We now show how these key properties are significantly different when measured exclusively from edge systems compared to those measured from testbed environments.

We base our results on 2 billion latency samples gathered from edge systems during June 10–25, 2008 (for more details about this dataset and our methodology, see [4]). Unlike studies that use PlanetLab hosts to measure latencies or infer them based on latencies between DNS servers, this dataset consists exclusively of directly measured latencies between edge systems. It is also an order of magnitude larger than the set used by Agarwal et al. [1] to evaluate server selection in gaming systems.
Latencies. To begin, Figure 3 compares the average latencies seen by hosts using the Ono plugin (labeled P2P) to those seen from three related projects: the RON testbed (MIT), PlanetLab (PL) and Ledlie et al.'s study (PL-to-P2P). The graph shows that latencies from edge systems are generally much larger than those from MIT King [5] and PlanetLab (PL). In fact, the median latency in our dataset is twice as large as that reported by the study of Ledlie et al. [7], which used PlanetLab nodes to probe Vuze P2P users (PL-to-P2P). We found that P2P traffic itself did not significantly inflate our latencies: when our measurement hosts were not transferring data, their latencies were smaller than those in the complete dataset, but the difference in median latencies was less than 10%.

Triangle-Inequality Violations. TIVs in the Internet delay space occur when the latency between hosts A and B is larger than the sum of the latencies from A to C and from C to B (A ≠ B ≠ C). This is caused by factors such as network topology and routing policies (see, for example, [8, 17]). Wang et al. [19] demonstrate that TIVs can significantly reduce the accuracy of network positioning systems.

We performed a TIV analysis on our dataset and found that over 13% of the triangles had TIVs (affecting over 99.5% of the source/destination pairs). Lumezanu et al. [8] study the dynamics of TIVs and demonstrate that using minimum RTTs, as done in this study, is likely to underestimate the rate of TIVs. Thus our results can be considered a lower bound for TIVs in a large-scale P2P environment.

Compared to TIV rates reported in an analysis of datasets from Tang and Crovella [17], TIV rates in the P2P environment we studied are between 100% and 400% higher, and the number of source/destination pairs experiencing TIVs in our dataset (nearly 100%) is significantly greater than the 83% reported by Ledlie et al. [7]. These patterns for TIVs and their severity hint at the challenges in accounting for TIVs in coordinate systems.
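The following sketch illustrates the kind of computation such an analysis involves. The RTT matrix is invented for illustration, and the brute-force counting is our assumption of a straightforward approach, not the actual analysis code.

    from itertools import combinations

    # Symmetric matrix of minimum observed RTTs (ms) among four hosts;
    # the values are illustrative only.
    rtt = [
        [ 0, 50, 10, 45],
        [50,  0, 15, 60],
        [10, 15,  0, 30],
        [45, 60, 30,  0],
    ]

    n = len(rtt)
    violating_triangles = 0
    affected_pairs = set()

    for a, b, c in combinations(range(n), 3):
        # Each rotation tests one side of the triangle as the "long" edge.
        sides = ((a, b, c), (b, c, a), (c, a, b))
        if any(rtt[x][y] > rtt[x][z] + rtt[z][y] for x, y, z in sides):
            violating_triangles += 1
        for x, y, z in sides:
            # The pair (x, y) is affected when its direct latency exceeds
            # the detour through z.
            if rtt[x][y] > rtt[x][z] + rtt[z][y]:
                affected_pairs.add((min(x, y), max(x, y)))

    total_triangles = n * (n - 1) * (n - 2) // 6
    total_pairs = n * (n - 1) // 2
    print(f"TIV triangles: {violating_triangles}/{total_triangles}")
    print(f"affected pairs: {len(affected_pairs)}/{total_pairs}")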
Last-mile effects. It is well known that last-mile links often have poorer quality than the well-provisioned links in transit networks. The problem is particularly acute in typical network edge settings. However, most of today's network positioning systems either ignore this effect or account for it naively.

We analyze last-mile effects by dividing the traceroute-based IP-level path between hosts into quartiles and determining the portion of the end-to-end latency contained in each quartile. If latency were evenly distributed among the IP hops along a path, each quartile would contain 25% of the end-to-end latency. In contrast, the first quartile (which is very likely to contain the entire first mile) accounts for a disproportionately large fraction of the total end-to-end latency: looking at median values, the first quartile alone captures 80% of the end-to-end latency, while the middle two quartiles each account for only 8%. Also note that the first quartile (and a significant fraction of the last quartile) has a large number of values close to and larger than 1. This demonstrates the variance in latencies along these first and last miles, where measurements to individual hops along the path can yield latencies that are close to or larger than the total end-to-end latency (as measured by probes to the last hop). In fact, more than 10% of the first-quartile samples have a ratio greater than 1. While Vivaldi uses "height" to account for (first- and) last-mile links [5], this analysis suggests that a single parameter is insufficient given the large and variable latencies in a large-scale P2P environment.
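One plausible way to carry out this quartile analysis for a single path is sketched below. The per-hop cumulative RTTs are invented, and attributing to each quartile the latency accumulated between its boundary hops is our assumption of the procedure; note how hop-measurement noise lets a quartile's share fall below zero or approach (and exceed) 1.

    # Cumulative RTTs (ms) from the source to each hop of one traceroute,
    # ending at the destination; the values are illustrative. The noisy
    # middle hop (70 ms) exceeds the end-to-end RTT, which is how ratios
    # close to or above 1 arise in practice.
    hop_rtts = [38, 40, 41, 70, 43, 44, 46, 48]

    end_to_end = hop_rtts[-1]
    n = len(hop_rtts)

    prev = 0.0
    for quartile in range(1, 5):
        # Index of the last hop that falls within this quarter of the path.
        last = (quartile * n) // 4 - 1
        share = (hop_rtts[last] - prev) / end_to_end
        prev = hop_rtts[last]
        print(f"quartile {quartile}: {share:+.2f} of end-to-end latency")

With these inputs the first quartile alone accounts for roughly 0.83 of the end-to-end latency, echoing the skew reported above.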
Figure 4: CDF of transfer rates for all users, where the median is only 50 to 100 KB/s. This suggests that the BitTorrent system is dominated by mid-to-low-capacity hosts.

Bandwidth capacities. Bandwidth capacities are an important factor in the design of distributed systems, from making encoding decisions in video streaming to informing peer selection in P2P systems. While there are many proposed techniques for estimating capacities, these techniques are not amenable to widespread studies due to limitations on measurement traffic volumes and the need for compliant endpoints. Further, previous work has cast doubt on their accuracy [14].

We take an alternative approach to estimating capacities based on passively monitoring BitTorrent throughput. BitTorrent generally attempts to maximize bandwidth utilization to minimize time to completion for downloading files, so we expect observed throughputs to be proportional to a host's bandwidth capacity. This approach avoids the issues of requiring compliant endpoints and generating measurement traffic; however, this environment can be affected by ISP interference (e.g., traffic shaping) and user-specified limits on the maximum throughput consumed by a P2P application. While accounting for ISP interference is the topic of ongoing work, we have the necessary data to account for user-specified limits and filter out these cases.

After this filtering step, we use the maximum upstream and downstream transfer rates seen by each host during a three-week period in April 2009. As such, our results are a lower bound for each host's bandwidth capacity.
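A minimal sketch of this estimation step, assuming per-host rate samples and a hypothetical set of hosts flagged by the user-limit filter:

    # Observed per-host upstream transfer-rate samples (KB/s) over the
    # measurement period; the data are illustrative.
    upstream_samples = {
        "host_a": [12, 54, 31, 48],
        "host_b": [110, 95, 230, 180],
        "host_c": [25, 25, 25, 25],  # suspiciously flat: likely user-capped
    }

    # Hosts excluded because a user-specified throughput limit was
    # detected (a hypothetical output of the filtering step).
    user_capped = {"host_c"}

    capacity_lower_bound = {
        host: max(samples)
        for host, samples in upstream_samples.items()
        if host not in user_capped
    }
    print(capacity_lower_bound)  # {'host_a': 54, 'host_b': 230}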
Figure 4 depicts a CDF of the maximum upstream and downstream throughput seen for each host in our study. First, we note the rarity of step-like functions in the CDFs, which would occur if BitTorrent were, as commonly believed, most often saturating the full bandwidth capacity. Thus, while BitTorrent attempts to saturate each user's downstream bandwidth capacity, in practice it does not always do so.

We also find that the median upstream rate is 54 KB/s while the median downstream rate is 102 KB/s. Interestingly, this indicates that although asymmetric bandwidth allocation – often with about an order of magnitude larger downstream rates – is common in the Internet, the transfer rates achieved by P2P systems are indeed limited by the peers' upstream capacities.

It is important to note that these CDFs do not imply that the ratio of upstream to downstream capacities is nearly one for most hosts – some of the above samples contain only upstream transfer rates. For those hosts where we can measure both upstream and downstream throughputs, we find that the median ratio is 0.32 and the 90th-percentile ratio is 0.77. This is in line with the asymmetric bandwidth allocations typical of the DSL and cable Internet technologies used by the majority of our vantage points.

We now compare these lower-bound estimates of capacities with those measured from PlanetLab in 2006 as reported by Isdal et al. [6]. We expect that bandwidth capacities have increased since then, so BitTorrent-based capacities should be larger. The authors report that 70% of hosts have an upload capacity between 350 Kbps and 1 Mbps; surprisingly, we find that only 45% of hosts in our study achieve such transfer rates. In fact, 40% of hosts in our study achieve less than 350 Kbps maximum upstream rates. This suggests that even if the testbed-based bandwidth capacity measurements were accurate, they are insufficient for predicting achieved transfer rates in a P2P system. Although Isdal et al. were unable to directly measure or estimate downstream rates at the edge of the network, Fig. 4 shows that downstream rates closely track upstream rates until after the 30th percentile, where downstream rates significantly exceed upstream ones.

Finally, we analyze the maximum throughput achieved by hosts grouped by country in Fig. 5. We find that hosts in Germany, Romania and Sweden achieve the highest transfer rates while those in India, the Philippines and Brazil achieve the lowest. This is in line with results from independent bandwidth tests from Speedtest.net, indicating that maximum transfer rates measured from P2P users, when grouped by location, are in fact predictive of bandwidth capacity rankings.

Figure 5: Per-country throughput CDFs, showing that Germany, Romania and Sweden have the highest average capacities while India, the Philippines and Brazil have the lowest.

5. INFERRING SYSTEM PERFORMANCE

While measurements from the edge of the network help us better understand network topologies, delay behavior and bandwidth capacity distributions, they are also essential to designing, evaluating and optimizing distributed systems that run in this environment. We now show how more accurate views of the edge of the network affect system performance when compared to evaluations conducted from testbed environments.
Network positioning. We begin with network positioning systems and determine how the latency space measured in the previous section affects accuracy for a variety of positioning systems including GNP [10], Vivaldi [5], Meridian [20] and CRP [16].

The Vivaldi and CRP systems are implemented in our measurement platform, so their values represent true, "in the wild" performance. For evaluating GNP performance, we use the authors' simulation implementation; the results are based on three runs of the simulation, each using a randomly chosen set of 15 landmarks, 464 targets and an 8-dimensional coordinate space. We also simulate Meridian using settings proportional to those in the original evaluation, with 379 randomly selected Meridian nodes, 100 target nodes, 16 nodes per ring and 9 rings per node. Our results are based on four simulation runs, each of which performs 25,000 latency queries.
We begin our analysis by evaluating the accuracy of GNP and of the Vuze Vivaldi implementations in terms of errors in predicted latency. Meridian and CRP are omitted here because they do not provide quantitative latency predictions. Figure 6(a) presents the cumulative distribution function (CDF) of errors on a semilog scale, where each point represents the absolute value of the average error from one measurement host. We find that GNP has lower measurement error (median is 59.8 ms) than the original Vivaldi implementation (labeled V1, median error is ≈150 ms), partially due to GNP's use of fixed, dedicated landmarks. Somewhat surprisingly, Ledlie et al.'s Vivaldi implementation (labeled V2) has slightly larger errors in latency (median error is ≈165 ms) than GNP and V1; however, as we show next, its relative error is in fact smaller.

Relative error – the difference between the predicted and measured latency, normalized by the measured latency – is a better measure of accuracy for network positioning systems. To compute relative errors, we first calculate the absolute value of the relative error between Vivaldi's estimated latency and the ping latency for each sample, then find the average of these errors for each client running our software. Fig. 6(b) plots a CDF of these values; each point represents the average relative error for a particular client. For Vivaldi V1, the median relative error per node is approximately 74%, whereas the same for V2 is 55% – both significantly higher than the 26% median relative error reported in studies based on PlanetLab nodes [7]. Interestingly, the median error for Vivaldi V2 is approximately the same as for GNP, indicating that decentralized coordinates do not significantly hurt relative performance. Finally, because Meridian and CRP do not predict distances, Fig. 6(b) plots the relative error for the closest peers they found. Meridian finds the closest peer approximately 20% of the time while CRP can locate the closest peer more than 70% of the time.
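In code, the per-client computation might look like the sketch below; the sample values are invented, and normalizing by the measured ping latency is one common convention that the text leaves implicit.

    # Per-client samples of (estimated_latency_ms, measured_ping_ms);
    # the values are illustrative.
    samples = {
        "client_1": [(100, 80), (50, 65), (200, 210)],
        "client_2": [(30, 90), (400, 250)],
    }

    for client, pairs in sorted(samples.items()):
        # Absolute relative error per sample, averaged per client.
        errors = [abs(est - ping) / ping for est, ping in pairs]
        avg = sum(errors) / len(errors)
        print(f"{client}: average relative error {avg:.0%}")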
Figure 6: (a) Absolute value of errors between estimated and measured latencies, in milliseconds (left); (b) absolute value of relative errors between estimated and measured latencies (right).

Network costs in P2P file sharing. The large traffic volumes generated by P2P file-sharing systems have drawn a great deal of publicity as network providers attempt to reduce their costs by blocking, shaping or otherwise interfering with P2P connections. Given the popularity of these systems, a number of research efforts have investigated this issue by designing systems to reduce cross-network traffic [3, 21] and evaluating the potential for P2P traffic locality [12].

Most previous work in this area relies on limited deployments and/or simulation results to estimate the network costs of P2P systems. We now show how measurements from a large-scale, live deployment – combined with more complete AS topology information – provide a different view of the costs incurred by these systems.

A number of studies estimate the costs of P2P traffic as proportional to the number of AS hops along paths to different hosts. In this context, traffic is considered "no-cost" (also referred to as "local") if it stays entirely in the same AS. We refine this metric to include all paths for which no hop contains a customer-to-provider relationship (or vice versa); i.e., our definition of "no-cost" includes traffic that remains in the origin AS or traverses only peering and sibling links.
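A sketch of this refined classification, assuming each AS-level path comes annotated with a per-hop relationship label (the label names here are hypothetical):

    def is_no_cost(hop_relationships):
        """Return True when no hop is a customer-provider link in either
        direction, i.e. the traffic stays in the origin AS (no inter-AS
        hops) or crosses only peering and sibling links."""
        return all(rel in ("peer", "sibling") for rel in hop_relationships)

    print(is_no_cost([]))                   # True: intra-AS traffic
    print(is_no_cost(["peer", "sibling"]))  # True: no-cost transit
    print(is_no_cost(["peer", "c2p"]))      # False: crosses a customer-provider link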
Figure 7: (a) CDF of the portion of "no-cost" traffic generated per host (left); (b) estimated Internet-wide costs incurred by BitTorrent traffic (right). The vast majority of hosts generate at least some no-cost traffic, while the majority of traffic volumes are no-cost for 12% of hosts. Further, our results show that P2P traffic has a net effect of generating significant revenue for provider ISPs.

Figure 7(a) presents a CDF of the portion of each P2P user's traffic that is "no-cost." Our results from 130,000 source IPs and 12 million destination IPs indicate that the vast majority of hosts naturally generate at least some no-cost traffic. This result contradicts those of Piatek et al. [12], who use inferred testbed-based results and a single deployed vantage point to question the effectiveness of reducing ISP costs in P2P systems. In fact, we find that the majority of traffic volumes are no-cost for a significant fraction (12%) of hosts.

It is important to note that these results may be biased by the fact that the measured hosts are using Ono to preferentially establish no-cost peer connections. We do not believe that this effect strongly affects our results: while Ono has been installed nearly 800,000 times, its users are still a small minority (less than half of one percent) of the total number of BitTorrent users worldwide. Thus, these results represent neither an upper nor a lower bound for the portion of no-cost traffic that P2P systems can successfully use.
Finally, Fig. 7(b) plots the average cost per byte for each user, based on the net costs of P2P traffic according to the traffic volumes per path and the AS relationships along each path. Specifically, the cost of a path is the sum of the costs of its AS hops, where a hop from customer to provider is assigned a cost of 1, a hop from provider to customer a cost of -1, and all other hops (sibling and peer AS hops) a cost of zero. We then determine, for each host, the portion of all traffic volume generated by each of its connections and multiply this by the cost of the connection's path. Each point in Fig. 7(b) represents the sum of these values for one host.
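To make the metric concrete, the sketch below computes this per-host average weighted cost under the hop-cost assignment just described; the relationship labels and traffic volumes are hypothetical.

    # Cost of one AS hop: +1 customer-to-provider, -1 provider-to-customer,
    # 0 for peering and sibling hops.
    HOP_COST = {"c2p": 1, "p2c": -1, "peer": 0, "sibling": 0}

    def path_cost(hop_relationships):
        return sum(HOP_COST[rel] for rel in hop_relationships)

    # One host's connections: (AS-hop relationships along the path, volume).
    connections = [
        (["p2c", "peer"], 6_000_000),  # path cost -1
        (["c2p"], 2_000_000),          # path cost +1
        ([], 2_000_000),               # intra-AS: path cost 0
    ]

    total_volume = sum(volume for _, volume in connections)
    weighted_cost = sum(
        (volume / total_volume) * path_cost(rels) for rels, volume in connections
    )
    print(f"average weighted cost: {weighted_cost:+.2f}")  # -0.40

A negative value corresponds to traffic whose net effect is revenue for provider ISPs, matching the interpretation of Fig. 7(b).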
As the figure shows, the vast majority of hosts generate flows whose net effect is to generate revenue (i.e., negative costs) for ISPs. While this result is in agreement with commonly held notions that P2P traffic has generated revenue for ISPs, we believe that we are the first to attempt to quantify this effect. We leave a study of which ISPs benefit from this (and by how much) to future work.

6. CONCLUSION

This article discussed potential issues with extending results from limited platforms to Internet-wide perspectives. In particular, we showed that testbed-based views of Internet paths are limited, that the properties of these paths do not extend to the edge of the network, and that these inaccuracies have a significant impact on inferred system-wide performance for services running at the edge. These results make a strong case for research into new evaluation strategies for Internet-scale systems, both through edge-system traces (such as those available via our CloudScope project) and through new evaluation platforms.

7. REFERENCES

[1] Agarwal, S., and Lorch, J. R. Matchmaking for online games and other latency-sensitive P2P systems. In Proc. of ACM SIGCOMM (2009).
[2] Chen, K., Choffnes, D., Potharaju, R., Chen, Y., Bustamante, F., and Zhao, Y. Where the sidewalk ends: Extending the Internet AS graph using traceroutes from P2P users. In Proc. of ACM CoNEXT (2009).
[3] Choffnes, D. R., and Bustamante, F. E. Taming the torrent: A practical approach to reducing cross-ISP traffic in peer-to-peer systems. In Proc. of ACM SIGCOMM (2008).
[4] Choffnes, D. R., Sanchez, M. A., and Bustamante, F. E. Network positioning from the edge: An empirical study of the effectiveness of network positioning in P2P systems. In Proc. of IEEE INFOCOM Mini Conference (2010).
[5] Dabek, F., Cox, R., Kaashoek, F., and Morris, R. Vivaldi: A decentralized network coordinate system. In Proc. of ACM SIGCOMM (2004).
[6] Isdal, T., Piatek, M., Krishnamurthy, A., and Anderson, T. Leveraging BitTorrent for end host measurements. In Proc. of PAM (2007).
[7] Ledlie, J., Gardner, P., and Seltzer, M. Network coordinates in the wild. In Proc. of USENIX NSDI (Apr. 2007).
[8] Lumezanu, C., Baden, R., Spring, N., and Bhattacharjee, B. Triangle inequality variations in the Internet. In Proc. of IMC (2009).
[9] Madhyastha, H. V., Isdal, T., Piatek, M., Dixon, C., Anderson, T., Krishnamurthy, A., and Venkataramani, A. iPlane: An information plane for distributed systems. In Proc. of USENIX OSDI (2006).
[10] Ng, T. S. E., and Zhang, H. Predicting Internet network distance with coordinates-based approaches. In Proc. of IEEE INFOCOM (2002).
[11] Oliveira, R., Pei, D., Willinger, W., Zhang, B., and Zhang, L. Quantifying the completeness of the observed Internet AS-level structure. Tech. Rep. 080026, UCLA, September 2008.
[12] Piatek, M., Madhyastha, H. V., John, J. P., Krishnamurthy, A., and Anderson, T. Pitfalls for ISP-friendly P2P design. In Proc. of HotNets (2009).
[13] PlanetLab. http://www.planet-lab.org/.
[14] Prasad, R. S., Murray, M., Dovrolis, C., and Claffy, K. Bandwidth estimation: Metrics, measurement techniques, and tools. IEEE Network 17 (2003), 27–35.
[15] RouteViews. http://www.routeviews.org/.
[16] Su, A.-J., Choffnes, D., Bustamante, F. E., and Kuzmanovic, A. Relative network positioning via CDN redirections. In Proc. of ICDCS (2008).
[17] Tang, L., and Crovella, M. Virtual landmarks for the Internet. In Proc. of IMC (2003).
[18] Team Cymru. The Team Cymru IP to ASN lookup page. http://www.cymru.com/BGP/asnlookup.html.
[19] Wang, G., Zhang, B., and Ng, T. S. E. Towards network triangle inequality violation aware distributed systems. In Proc. of IMC (2007).
[20] Wong, B., Slivkins, A., and Sirer, E. Meridian: A lightweight network location service without virtual coordinates. In Proc. of ACM SIGCOMM (2005).
[21] Xie, H., Yang, R., Krishnamurthy, A., Liu, Y., and Silberschatz, A. P4P: Provider portal for (P2P) applications. In Proc. of ACM SIGCOMM (2008).