
Structure

The document summarizes the structure of the World Wide Web. It discusses how: 1) The Web is an application that runs on top of the Internet, using hyperlinks to connect documents. 2) The first web browser was created by Tim Berners-Lee in 1989-1991 at CERN, allowing documents to be publicly accessible and accessed using a browser. 3) The Web has a network structure where nodes are documents connected by links, and can be modeled as a directed graph.

Uploaded by

anjanamenon

Structure of the World Wide Web
From “Networks, Crowds and Markets”
Chapter 13

Eyal Feder
Nov, 14
What Is the Web?
Not really
• The Web != Internet
• Neither of them is made of cats

• The World Wide Web is an application of the Internet

• https://www.youtube.com/watch?v=lskpNmUl8yQ
Information networks Vs. social networks
• The basic units connected (nodes) are pieces of information
• The edges symbolize some kind of connection between them
• Share a lot of the ideas mentioned in earlier sessions
Back to the web
• Created by Tim Berners-Lee
• A research project in 1989-1991 at CERN
• An application of the Internet
• Two basic features:
• Make documents on your computer publicly accessible
• Easily access these documents using a browser
The first browser
Some are still there
The web as a network
• The nodes are documents (pages)
• The edges are links (figure 13.2)

• How do links work? Hypertext


Hypertext
(The coolest thing about the web)
Different ways to manage information
• Alphabetically
• Hierarchy (like folders)
• Classification systems

• All of these have one thing in common

Linearrrr
Earlier non-linear connections
• Academic references
• (also in legal decisions and patents)

• Relevant to the web?


Earlier non-linear connections
• Cross-referenced encyclopedia (figure 13.4)
Memex
• Vannevar Bush, 1945 article: “As We May Think”
• Our memory is not linear.
• Hypothetical model – the Memex
• Inspired the idea of hypertext
Introducing: Hypertext
• The ultimate reason text is blue.
• Term coined by Ted Nelson in the 1960s; brought to the Web by Berners-Lee
• The way web pages are connected
• An associative way to organize information
Changes in the web over time
Static pages >> Query pages
• In the early days – static pages of content
• Today?
• More and more transactional actions, which create query pages
Importance of static pages
• “The Backbone of the Internet”
• Reliable over time
• Include most links
• Navigational vs. transactional
• Our focus when thinking about structure
Time for math!
(…just a little bit, sorry)
The web as a directed graph
• The best mathematical approximation – a graph
• Why directed?
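One way to picture the directed-graph model is as a mapping from each page to the pages it links to. The sketch below is only an illustration: the page names and the helper functions `out_links` and `in_links` are hypothetical, not taken from the slides. It shows why direction matters: a link from one page to another does not imply a link back.

```python
# Hypothetical toy web: each key is a page, each value lists the pages it links to.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post"],
    "post": [],
}

def out_links(graph, page):
    """Pages that `page` links to."""
    return list(graph.get(page, []))

def in_links(graph, page):
    """Pages that link to `page`."""
    return [src for src, targets in graph.items() if page in targets]

# "blog" links to "post", but "post" does not link back to "blog".
print(out_links(web, "blog"))  # ['post']
print(out_links(web, "post"))  # []
print(in_links(web, "blog"))   # ['home']
```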
What is a path in a directed graph?
• “A path from a node A to a node B in a directed graph is a sequence of nodes, beginning with A and ending with B, with the property that each consecutive pair of nodes in the sequence is connected by an edge pointing in the forward direction”
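The definition of a path can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary from each node to the list of nodes it points to; the function name `is_path` and the toy graph are hypothetical.

```python
def is_path(graph, nodes):
    """True if every consecutive pair in `nodes` is joined by a forward edge."""
    if not nodes:
        return False
    return all(b in graph.get(a, ()) for a, b in zip(nodes, nodes[1:]))

# Hypothetical toy graph: A -> B -> C -> A, and D -> A.
web = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}

print(is_path(web, ["A", "B", "C"]))  # True: A -> B and B -> C both exist
print(is_path(web, ["A", "D"]))       # False: the edge runs D -> A, not A -> D
```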
What is Strong Connectivity in a directed graph?
• “A directed graph is strongly connected if there is a path from every node to every other node”
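Applied literally, the definition says: start from each node in turn and see whether every node can be reached. The sketch below does exactly that. It is a brute-force illustration with hypothetical names (`reach`, `is_strongly_connected`), not an efficient algorithm.

```python
def reach(graph, start):
    """Set of nodes reachable from `start` by following edges forward."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_strongly_connected(graph):
    """True if every node has a path to every other node."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    return all(reach(graph, node) == nodes for node in nodes)

cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
with_dead_end = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}

print(is_strongly_connected(cycle))          # True
print(is_strongly_connected(with_dead_end))  # False: nothing links to D
```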
The Concept of Reachability
• Since strong connectivity does not describe all of the connections in a graph, we need another concept – reachability
• Reachability describes the nodes that are reachable from a certain node, or the nodes from which that node can be reached
• How do we check this?
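A common way to check reachability is breadth-first search: follow links outward from a node until no new nodes turn up. Running the same search over reversed edges gives the nodes that can reach it. This is a minimal sketch; the function names `reachable_from` and `can_reach` and the toy graph are assumptions made for illustration.

```python
from collections import deque

def reachable_from(graph, start):
    """All nodes reachable from `start` by following links forward (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def can_reach(graph, target):
    """All nodes with a path to `target`: the same BFS over reversed edges."""
    reversed_graph = {}
    for src, targets in graph.items():
        for dst in targets:
            reversed_graph.setdefault(dst, []).append(src)
    return reachable_from(reversed_graph, target)

# Hypothetical toy graph: D -> A -> B <-> C
web = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["A"]}

print(sorted(reachable_from(web, "A")))  # ['A', 'B', 'C']
print(sorted(can_reach(web, "A")))       # ['A', 'D']
```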
Strongly connected components
• Parts of a graph that have strong connectivity
• In other words – a group of nodes in which each node is reachable from all other nodes.
• Formal:
We say that a strongly connected component (SCC) in a directed graph is a subset of the nodes such that: (i) every node in the subset has a path to every other; and (ii) the subset is not part of some larger set with the property that every node can reach every other.
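The formal definition suggests a direct, if slow, way to find SCCs: the component containing a node is the set of nodes it can reach that can also reach it back. The sketch below follows that idea. The names and the toy graph are hypothetical, and linear-time algorithms such as Tarjan's or Kosaraju's would normally be used on large graphs.

```python
def reach(graph, start):
    """Set of nodes reachable from `start` by following edges forward."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strongly_connected_components(graph):
    """Split the nodes into SCCs via forward and backward reachability."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    reversed_graph = {}
    for src, targets in graph.items():
        for dst in targets:
            reversed_graph.setdefault(dst, []).append(src)
    components, assigned = [], set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        # Nodes that `node` can reach AND that can reach `node` back.
        scc = reach(graph, node) & reach(reversed_graph, node)
        components.append(scc)
        assigned |= scc
    return components

# Hypothetical toy graph: F -> {A, B, C cycle} -> {D, E cycle}
web = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D"], "F": ["A"]}

for scc in strongly_connected_components(web):
    print(sorted(scc))
# ['A', 'B', 'C']
# ['D', 'E']
# ['F']
```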
How does all that help us understand the web?
• We can map reachability
• Using the super-graph
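The super-graph idea can be sketched as follows: collapse every SCC into a single super-node and keep only the edges that cross between different components. In this illustration the components are supplied by hand rather than computed, and the function name `super_graph` and the toy graph are hypothetical.

```python
def super_graph(graph, components):
    """Collapse each component into one super-node; keep edges between components."""
    label = {node: i for i, comp in enumerate(components) for node in comp}
    edges = set()
    for src, targets in graph.items():
        for dst in targets:
            if label[src] != label[dst]:
                edges.add((label[src], label[dst]))
    return edges

# Hypothetical toy graph and its SCCs (listed by hand for this example).
web = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D"], "F": ["A"]}
components = [{"A", "B", "C"}, {"D", "E"}, {"F"}]

print(sorted(super_graph(web, components)))  # [(0, 1), (2, 0)]
```

Reading the result: super-node 2 (F) points to super-node 0 (A, B, C), which points to super-node 1 (D, E). Reachability between any two pages can then be read off the much smaller super-graph.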
The Bow Tie Structure
History
• Short reminder – the Web is not the Internet!
• The bow-tie map was created in 1999 by Andrei Broder and his colleagues
• Used data from AltaVista, one of the largest search engines at the time
• Afterwards – re-evaluated many times
The bow tie structure
Why a giant component?
• Counter-intuitive, huh?
• Let’s think probability
Different kinds of nodes
• In the SCC
• In the “inbound” part
• In the “outbound” part
• Tendrils
• Disconnected nodes
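The node types above can be told apart with the same reachability searches. The sketch below is a simplified illustration, assuming we already know one node inside the giant SCC (`core_node`). All names are hypothetical, and everything weakly attached to the bow tie that is not in the SCC, the inbound part, or the outbound part is lumped together as "tendrils".

```python
def reach(graph, start):
    """Set of nodes reachable from `start` by following the given edges."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def bow_tie(graph, core_node):
    """Classify nodes relative to the SCC that contains `core_node`."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    reversed_graph, undirected = {}, {}
    for src, targets in graph.items():
        for dst in targets:
            reversed_graph.setdefault(dst, []).append(src)
            undirected.setdefault(src, []).append(dst)
            undirected.setdefault(dst, []).append(src)
    forward = reach(graph, core_node)            # what the core can reach
    backward = reach(reversed_graph, core_node)  # what can reach the core
    scc = forward & backward
    attached = reach(undirected, core_node)      # connected if direction is ignored
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "TENDRILS": attached - forward - backward,
        "DISCONNECTED": nodes - attached,
    }

# Hypothetical toy web.
web = {
    "in1": ["a", "t1"],  # reaches the core; also feeds a tendril
    "a": ["b"],
    "b": ["a", "out1"],  # a <-> b form the core SCC
    "out1": [],
    "x": ["y"],          # an island with no ties to the bow tie
    "y": [],
}

for part, members in bow_tie(web, "a").items():
    print(part, sorted(members))
```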
Limitations
• The bow-tie structure is a “mile high” view
• It does not reveal the role of specific nodes (sites)
Web 2.0
What is web 2.0?
• A concept made popular by Tim O’Reilly in 2004
• Basically – the web’s move towards a “Prosumer” crowd
• Three main characteristics:
(i) the growth of Web authoring styles that enabled many people to collectively create and maintain shared content;
(ii) the movement of people’s personal on-line data (including e-mail, calendars, photos, and videos) from their own computers to services offered and hosted by large companies;
(iii) the growth of linking styles that emphasize on-line connections between people, not just between documents.
Different implications of web 2.0
• “Software that gets better as more people use it”
• “The wisdom of the crowds”
• “The Long Tail”
A little bit more about the structure of the web
From: Albert, R., Jeong, H., & Barabási, A.-L. (1999). Diameter of the World-Wide Web. Nature, 401, 130-131.
About the research
• Trying to map reachability on the web
• Their main finding – the probability that a node has k links (inbound or outbound) follows a power law
• Meaning – the web is a small-world graph, of the kind typically found in biological and social networks
• This was further supported by the short-path research
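A power law here means that the fraction of nodes with k links, P(k), falls off roughly in proportion to k raised to some negative exponent, so a few pages have very many links while most have few. The sketch below only shows how the empirical P(k) could be tabulated from a link graph. The function name `degree_distribution` and the tiny toy graph are hypothetical, and a graph this small cannot demonstrate a power law by itself.

```python
from collections import Counter

def degree_distribution(graph, direction="out"):
    """Fraction of nodes with each degree k, for out-links or in-links."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    degree = {node: 0 for node in nodes}
    for src, targets in graph.items():
        for dst in targets:
            degree[src if direction == "out" else dst] += 1
    counts = Counter(degree.values())
    return {k: counts[k] / len(nodes) for k in sorted(counts)}

# Hypothetical toy web with one hub page.
web = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": []}

print(degree_distribution(web, "out"))  # {0: 0.25, 1: 0.5, 3: 0.25}
print(degree_distribution(web, "in"))   # {1: 0.75, 2: 0.25}
```

On real crawl data, plotting these fractions against k on log-log axes is the usual first check: a power law shows up as an approximately straight line.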
