21 p2p

The document discusses peer-to-peer (P2P) file sharing techniques including centralized indexing, flooding searches, and structured overlays. It describes several popular P2P systems like Napster, Gnutella, and KaZaA and compares their approaches to publishing, searching, and fetching files.

Peer-to-Peer
15-441

Scaling Problem
• Millions of clients ⇒ server and network meltdown

[Diagram: a P2P system, in which clients exchange data directly instead of overloading a central server.]

Why P2P?
• Harness lots of spare capacity
  – 1 Big Fast Server: 1 Gbit/s, $10k/month++
  – 2,000 cable modems: 1 Gbit/s, $ ??
  – 1M end-hosts: Uh, wow.
• Leverage the resources of client machines (peers)
  – Computation, storage, bandwidth

Build self-managing systems / Deal with huge scale
• Same techniques attractive for both companies / servers / p2p
  – E.g., Akamaiʼs 14,000 nodes
  – Googleʼs 100,000+ nodes
Outline
• p2p file sharing techniques
  – Downloading: Whole-file vs. chunks
  – Searching
    • Centralized index (Napster, etc.)
    • Flooding (Gnutella, etc.)
    • Smarter flooding (KaZaA, …)
    • Routing (Freenet, etc.)
• Uses of p2p - what works well, what doesnʼt?
  – servers vs. arbitrary nodes
  – Hard state (backups!) vs. soft state (caches)
• Challenges
  – Fairness, freeloading, security, …

P2P file-sharing
• Quickly grown in popularity
  – Dozens or hundreds of file sharing applications
  – 35 million American adults use P2P networks -- 29% of all Internet users in the US!
  – Audio/video transfer now dominates traffic on the Internet

Whatʼs out there?

              Central      Flood      Super-node flood               Route
Whole File    Napster      Gnutella                                  Freenet
Chunk Based   BitTorrent              KaZaA (bytes, not chunks),     DHTs
                                      eDonkey 2000

Searching
[Diagram: a publisher stores Key="title", Value=MP3 data… on the network; a client issues Lookup("title"), which travels across nodes N1 through N6 over the Internet.]
Searching 2
• Needles vs. Haystacks
  – Searching for top 40, or an obscure punk track from 1981 that nobodyʼs heard of?
• Search expressiveness
  – Whole word? Regular expressions? File names? Attributes? Whole-text search?
    • (e.g., p2p gnutella or p2p google?)

Framework
• Common Primitives:
  – Join: how do I begin participating?
  – Publish: how do I advertise my file?
  – Search: how do I find a file?
  – Fetch: how do I retrieve a file?
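The four common primitives can be expressed as a minimal abstract interface. This is a sketch for illustration only; the class and method names below are assumptions, not from any real system on these slides.

```python
from abc import ABC, abstractmethod

class Peer(ABC):
    """Hypothetical interface for the four common P2P primitives."""

    @abstractmethod
    def join(self, bootstrap_addr):
        """Begin participating, starting from a known node."""

    @abstractmethod
    def publish(self, filename, data):
        """Advertise a file so other peers can find it."""

    @abstractmethod
    def search(self, filename):
        """Return addresses of peers that store the file."""

    @abstractmethod
    def fetch(self, addr, filename):
        """Retrieve the file directly from a peer."""
```

Each system covered next (Napster, Gnutella, KaZaA, BitTorrent, Freenet, DHTs) is essentially a different implementation of these four methods.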

Next Topic...
• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent
• Unstructured Overlay Routing
  – Freenet
• Structured Overlay Routing
  – Distributed Hash Tables

Napster: History
• 1999: Shawn Fanning launches Napster
• Peaked at 1.5 million simultaneous users
• Jul 2001: Napster shuts down
Napster: Overview
• Centralized Database:
  – Join: on startup, client contacts central server
  – Publish: reports list of files to central server
  – Search: query the server => return someone that stores the requested file
  – Fetch: get the file directly from peer

Napster: Publish
[Diagram: a peer at 123.2.21.23 announces “I have X, Y, and Z!” and sends insert(X, 123.2.21.23), … to the central server.]

Napster: Search
[Diagram: client asks the server “Where is file A?”; the server replies search(A) --> 123.2.0.18, and the client then fetches the file directly from 123.2.0.18.]

Napster: Discussion
• Pros:
  – Simple
  – Search scope is O(1)
  – Controllable (pro or con?)
• Cons:
  – Server maintains O(N) state
  – Server does all processing
  – Single point of failure
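The centralized-database design above can be sketched in a few lines. This is a hypothetical illustration (real Napster spoke a binary protocol, not Python calls); note how the server holds O(N) state while each search is a single round trip.

```python
class CentralIndex:
    """Napster-style central index sketch: the server stores only
    metadata; the file transfer itself happens peer-to-peer."""

    def __init__(self):
        self.index = {}  # filename -> set of peer addresses (O(N) state)

    def publish(self, peer_addr, filenames):
        # Join/Publish: a peer reports its file list to the server.
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, filename):
        # One round trip to the server: O(1) search scope.
        return self.index.get(filename, set())

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
print(server.search("X"))  # {'123.2.21.23'}
```

The single `CentralIndex` object is also the single point of failure the discussion slide warns about.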
Next Topic...
• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent
• Unstructured Overlay Routing
  – Freenet
• Structured Overlay Routing
  – Distributed Hash Tables

Gnutella: History
• In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
• Soon many other clients: Bearshare, Morpheus, LimeWire, etc.
• In 2001, many protocol enhancements including “ultrapeers”

Gnutella: Overview
• Query Flooding:
  – Join: on startup, client contacts a few other nodes; these become its “neighbors”
  – Publish: no need
  – Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
    • TTL limits propagation
  – Fetch: get the file directly from peer

Gnutella: Search
[Diagram: a node floods the query “Where is file A?” to its neighbors; nodes holding file A (“I have file A.”) send a Reply back along the query path.]
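The TTL-limited flooding described above can be sketched as a recursive walk over a neighbor graph. The dict-based node representation is an assumption for illustration; real Gnutella forwards query messages asynchronously over TCP connections.

```python
def flood_search(node, filename, ttl, visited=None):
    """Gnutella-style flooding sketch: ask neighbors, who ask their
    neighbors, decrementing a TTL to limit propagation. A node is a
    dict with 'addr', 'files' (a set), and 'neighbors' (a list)."""
    if visited is None:
        visited = set()
    if node["addr"] in visited:
        return None                  # already asked this node
    visited.add(node["addr"])
    if filename in node["files"]:
        return node["addr"]          # found: reply flows back to sender
    if ttl == 0:
        return None                  # TTL expired: stop flooding
    for nbr in node["neighbors"]:
        hit = flood_search(nbr, filename, ttl - 1, visited)
        if hit is not None:
            return hit
    return None

# Tiny hypothetical topology: A -> B -> C, with C holding the file.
c = {"addr": "C", "files": {"fileA"}, "neighbors": []}
b = {"addr": "B", "files": set(), "neighbors": [c]}
a = {"addr": "A", "files": set(), "neighbors": [b]}
print(flood_search(a, "fileA", ttl=2))  # C
print(flood_search(a, "fileA", ttl=1))  # None: TTL expires one hop short
```

The second call shows the scalability trade-off from the discussion slide: a TTL that is too small misses files that exist, so the query may have to be re-issued with a larger TTL.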
Gnutella: Discussion
• Pros:
  – Fully de-centralized
  – Search cost distributed
  – Processing @ each node permits powerful search semantics
• Cons:
  – Search scope is O(N)
  – Search time is O(???)
  – Nodes leave often, network unstable
• TTL-limited search works well for haystacks.
  – For scalability, does NOT search every node. May have to re-issue query later

KaZaA: History
• In 2001, KaZaA created by Dutch company Kazaa BV
• Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.
• Eventually protocol changed so other clients could no longer talk to it
• Most popular file sharing network today with >10 million users (number varies)

KaZaA: Overview
• “Smart” Query Flooding:
  – Join: on startup, client contacts a “supernode” ... may at some point become one itself
  – Publish: send list of files to supernode
  – Search: send query to supernode; supernodes flood query amongst themselves.
  – Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers

KaZaA: Network Design
[Diagram: ordinary nodes attach to “Super Nodes”, which form a flooding overlay among themselves.]
KaZaA: File Insert
[Diagram: a peer at 123.2.21.23 announces “I have X!” and sends insert(X, 123.2.21.23), … to its supernode.]

KaZaA: File Search
[Diagram: a client asks its supernode “Where is file A?”; supernodes flood the query among themselves, and replies come back as search(A) --> 123.2.22.50 and search(A) --> 123.2.0.18.]

KaZaA: Fetching
• More than one node may have requested file...
• How to tell?
  – Must be able to distinguish identical files
  – Not necessarily same filename
  – Same filename not necessarily same file...
• Use Hash of file
  – KaZaA uses UUHash: fast, but not secure
  – Alternatives: MD5, SHA-1
• How to fetch?
  – Get bytes [0..1000] from A, [1001...2000] from B
  – Alternative: Erasure Codes

KaZaA: Discussion
• Pros:
  – Tries to take into account node heterogeneity:
    • Bandwidth
    • Host Computational Resources
    • Host Availability (?)
  – Rumored to take into account network locality
• Cons:
  – Mechanisms easy to circumvent
  – Still no real guarantees on search scope or search time
• Similar behavior to gnutella, but better.
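The two ideas above, identifying files by content hash rather than filename, and fetching disjoint byte ranges from multiple peers, can be sketched as follows. `plan_ranges` and `verify` are hypothetical helpers; SHA-1 is used because the slide lists it as a secure alternative to UUHash.

```python
import hashlib

def plan_ranges(file_size, n_peers, chunk=1000):
    """Assign fixed-size byte ranges round-robin to peers, e.g.
    bytes [0..999] from peer 0, [1000..1999] from peer 1, etc."""
    ranges = [(start, min(start + chunk, file_size) - 1)
              for start in range(0, file_size, chunk)]
    return {i: ranges[i::n_peers] for i in range(n_peers)}

def verify(data, expected_sha1):
    """Identify identical files by content hash, not filename:
    same filename is not necessarily the same file."""
    return hashlib.sha1(data).hexdigest() == expected_sha1

# Two peers splitting a 2000-byte file:
print(plan_ranges(2000, 2))  # {0: [(0, 999)], 1: [(1000, 1999)]}
```

After reassembling the ranges, the downloader checks the whole-file hash; a mismatch means some peer served a different (or corrupted) file under the same name.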
Stability and Superpeers
• Why superpeers?
  – Query consolidation
    • Many connected nodes may have only a few files
    • Propagating a query to a sub-node would take more b/w than answering it yourself
  – Caching effect
    • Requires network stability
• Superpeer selection is time-based
  – How long youʼve been on is a good predictor of how long youʼll be around.

BitTorrent: History
• In 2002, B. Cohen debuted BitTorrent
• Key Motivation:
  – Popularity exhibits temporal locality (Flash Crowds)
  – E.g., Slashdot effect, CNN on 9/11, new movie/game release
• Focused on Efficient Fetching, not Searching:
  – Distribute the same file to all peers
  – Single publisher, multiple downloaders
• Has some “real” publishers:
  – Blizzard Entertainment using it to distribute the beta of their new game

BitTorrent: Overview
• Swarming:
  – Join: contact centralized “tracker” server, get a list of peers.
  – Publish: Run a tracker server.
  – Search: Out-of-band. E.g., use Google to find a tracker for the file you want.
  – Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
• Big differences from Napster:
  – Chunk based downloading (sound familiar? :)
  – “few large files” focus
  – Anti-freeloading mechanisms

BitTorrent: Publish/Join
[Diagram: peers contact the Tracker to join the swarm and learn about each other.]
BitTorrent: Fetch
[Diagram: peers in the swarm exchange chunks of the file directly with one another.]

BitTorrent: Sharing Strategy
• Employ “Tit-for-tat” sharing strategy
  – A is downloading from some other people
    • A will let the fastest N of those download from him
  – Be optimistic: occasionally let freeloaders download
    • Otherwise no one would ever start!
    • Also allows you to discover better peers to download from when they reciprocate
• Goal: Pareto Efficiency
  – Game Theory: “No change can make anyone better off without making others worse off”
  – Does it work? (donʼt know!)
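The tit-for-tat policy above might look like the sketch below: unchoke the fastest N uploaders, plus a random "optimistic" slot. The function, parameters, and rate numbers are illustrative assumptions; the real BitTorrent choking algorithm re-evaluates periodically and has more machinery than shown.

```python
import random

def choose_unchoked(upload_rates, n_fastest=4, n_optimistic=1, rng=random):
    """Tit-for-tat sketch: unchoke the n_fastest peers currently
    uploading to us at the highest rate, plus a few random
    'optimistic unchokes' so newcomers (and freeloaders) sometimes
    get data and can start reciprocating."""
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = ranked[:n_fastest]
    rest = ranked[n_fastest:]
    unchoked += rng.sample(rest, min(n_optimistic, len(rest)))
    return unchoked

# Hypothetical upload rates (KB/s) observed from five peers:
rates = {"p1": 50, "p2": 10, "p3": 40, "p4": 5, "p5": 0}
chosen = choose_unchoked(rates, n_fastest=2, n_optimistic=1)
# Always contains the two fastest (p1, p3) plus one random other peer.
```

The optimistic slot is what lets `p5`, who has uploaded nothing yet, occasionally receive data, which is exactly the bootstrapping problem the slide points out.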

BitTorrent: Summary
• Pros:
  – Works reasonably well in practice
  – Gives peers incentive to share resources; avoids freeloaders
• Cons:
  – Pareto Efficiency relatively weak condition
  – Central tracker server needed to bootstrap swarm
  – (Tracker is a design choice, not a requirement. Could easily combine with other approaches.)

Next Topic...
• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent
• Unstructured Overlay Routing
  – Freenet
• Structured Overlay Routing
  – Distributed Hash Tables (DHT)
Distributed Hash Tables
• Academic answer to p2p
• Goals
  – Guaranteed lookup success
  – Provable bounds on search time
  – Provable scalability
• Makes some things harder
  – Fuzzy queries / full-text search / etc.
• Read-write, not read-only
• Hot Topic in networking since introduction in ~2000/2001

DHT: Chord Summary
• Routing table size?
  – Log N fingers
• Routing time?
  – Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
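The halving-distance argument can be checked with a small simulation: greedy jumps by the largest power-of-two "finger" never need more than log2(N) hops in an N-id space. This sketches only the distance argument, not Chord's actual finger-table construction or maintenance.

```python
def chord_hops(id_space, start, target):
    """Greedy Chord-style lookup sketch: each hop covers the largest
    power-of-two finger that does not overshoot the target, so the
    remaining clockwise distance at least halves every hop."""
    dist = (target - start) % id_space
    hops = 0
    while dist > 0:
        dist -= 1 << (dist.bit_length() - 1)  # take the biggest finger
        hops += 1
    return hops

# Worst case in a 2^10 id space is exactly log2(N) = 10 hops:
print(chord_hops(1024, 0, 1023))  # 10
```

Since each hop clears the top bit of the remaining distance, the hop count equals the number of set bits in the distance, which is at most log2(N).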

DHT: Discussion
• Pros:
  – Guaranteed Lookup
  – O(log N) per node state and search scope
• Cons:
  – No one uses them? (only one file sharing app)
  – Supporting non-exact match search is hard

When are p2p / DHTs useful?
• Caching and “soft-state” data
  – Works well! BitTorrent, KaZaA, etc., all use peers as caches for hot data
• Finding read-only data
  – Limited flooding finds hay
  – DHTs find needles
• BUT
A Peer-to-peer Google?
• Complex intersection queries (“the” + “who”)
  – Billions of hits for each term alone
• Sophisticated ranking
  – Must compare many results before returning a subset to user
• Very, very hard for a DHT / p2p system
  – Need high inter-node bandwidth
  – (This is exactly what Google does - massive clusters)

Writable, persistent p2p
• Do you trust your data to 100,000 monkeys?
• Node availability hurts
  – Ex: Store 5 copies of data on different nodes
  – When someone goes away, you must replicate the data they held
  – Hard drives are *huge*, but cable modem upload bandwidth is tiny - perhaps 10 Gbytes/day
  – Takes many days to upload contents of 200GB hard drive. Very expensive leave/replication situation!
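The back-of-the-envelope above works out as follows, using the slide's own numbers (a 200GB drive, roughly 10 GB/day of cable-modem uplink); the helper function is purely illustrative.

```python
def reupload_days(disk_gb, uplink_gb_per_day):
    """Days needed to re-replicate a departed node's data over a
    slow residential uplink (numbers from the slide)."""
    return disk_gb / uplink_gb_per_day

print(reupload_days(200, 10))  # 20.0
```

Twenty days per departure, multiplied by frequent node churn, is why writable, persistent p2p storage is so much harder than soft-state caching.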

P2P: Summary
• Many different styles; remember pros and cons of each
  – centralized, flooding, swarming, unstructured and structured routing
• Lessons learned:
  – Single points of failure are very bad
  – Flooding messages to everyone is bad
  – Underlying network topology is important
  – Not all nodes are equal
  – Need incentives to discourage freeloading
  – Privacy and security are important
  – Structure can provide theoretical bounds and guarantees

Extra Slides
KaZaA: Usage Patterns
• KaZaA is more than one workload!
  – Many files < 10MB (e.g., Audio Files)
  – Many files > 100MB (e.g., Movies)
from Gummadi et al., SOSP 2003

KaZaA: Usage Patterns (2)
• KaZaA is not Zipf!
  – FileSharing: “Request-once”
  – Web: “Request-repeatedly”
from Gummadi et al., SOSP 2003

KaZaA: Usage Patterns (3)
• What we saw:
  – A few big files consume most of the bandwidth
  – Many files are fetched once per client but still very popular
• Solution?
  – Caching!
from Gummadi et al., SOSP 2003

Freenet: History
• In 1999, I. Clarke started the Freenet project
• Basic Idea:
  – Employ Internet-like routing on the overlay network to publish and locate files
• Additional goals:
  – Provide anonymity and security
  – Make censorship difficult
Freenet: Overview
• Routed Queries:
  – Join: on startup, client contacts a few other nodes it knows about; gets a unique node id
  – Publish: route file contents toward the file id. File is stored at node with id closest to file id
  – Search: route query for file id toward the closest node id
  – Fetch: when query reaches a node containing file id, it returns the file to the sender

Freenet: Routing Tables
• Each routing table entry holds: id, next_hop, file
  – id – file identifier (e.g., hash of file)
  – next_hop – another node that stores the file id
  – file – file identified by id being stored on the local node
• Forwarding of query for file id
  – If file id stored locally, then stop
    • Forward data back to upstream requestor
  – If not, search for the “closest” id in the table, and forward the message to the corresponding next_hop
  – If data is not found, failure is reported back
    • Requestor then tries next closest match in routing table
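The forwarding rule above (try the closest table entry first, back off to the next-closest on failure) can be sketched as a recursive routine. The dict-based node and table layout is an assumption for illustration; real Freenet also caches files along the return path, which is omitted here.

```python
def freenet_route(node, key, visited=None):
    """Freenet-style routed query sketch. A node is a dict with
    'name', 'files' (id -> data stored locally), and 'table', a
    list of (id, neighbor_node) routing entries."""
    if visited is None:
        visited = set()
    visited.add(node["name"])
    if key in node["files"]:
        return node["files"][key]      # found: data flows back upstream
    # Try table entries in order of closeness to the requested key.
    for entry_id, nbr in sorted(node["table"], key=lambda e: abs(e[0] - key)):
        if nbr["name"] not in visited:
            found = freenet_route(nbr, key, visited)
            if found is not None:
                return found           # success on this branch
    return None                        # report failure upstream

# Hypothetical three-node chain; n5 stores the file with id 10.
n5 = {"name": "n5", "files": {10: "f10"}, "table": []}
n2 = {"name": "n2", "files": {}, "table": [(10, n5)]}
n1 = {"name": "n1", "files": {}, "table": [(9, n2)]}
print(freenet_route(n1, 10))  # f10
```

Because publications for similar ids follow the same greedy rule, close file ids end up clustered on the same nodes, which is the routing property the next slide describes.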

Freenet: Routing
[Diagram: query(10) is routed hop by hop among nodes n1 through n6, each holding a small table of (id, next_hop, file) entries; failed branches are reported back and the requestor tries its next-closest entry, until the query reaches n5, which stores f10 and returns the file.]

Freenet: Routing Properties
• “Close” file ids tend to be stored on the same node
  – Why? Publications of similar file ids route toward the same place
• Network tends to be a “small world”
  – Small number of nodes have large number of neighbors (i.e., ~ “six-degrees of separation”)
• Consequence:
  – Most queries only traverse a small number of hops to find the file
Freenet: Anonymity & Security
• Anonymity
  – Randomly modify source of packet as it traverses the network
  – Can use “mix-nets” or onion-routing
• Security & Censorship resistance
  – No constraints on how to choose ids for files => easy to have two files collide, creating “denial of service” (censorship)
  – Solution: have an id type that requires a private key signature that is verified when updating the file
  – Cache file on the reverse path of queries/publications => attempt to “replace” file with bogus data will just cause the file to be replicated more!

Freenet: Discussion
• Pros:
  – Intelligent routing makes queries relatively short
  – Search scope small (only nodes along search path involved); no flooding
  – Anonymity properties may give you “plausible deniability”
• Cons:
  – Still no provable guarantees!
  – Anonymity features make it hard to measure, debug
