
MIW Chapter 2

The document discusses the basic technologies that underlie the World Wide Web. It describes the three main components that make web resources readily available: uniform resource identifiers (URIs) for locating resources, protocols like HTTP for accessing resources over the web, and hypertext for easy navigation between resources using HTML. It also discusses web documents, resource identifiers like URLs and URNs, protocols, log files, search engines, and the role of standard languages like SGML and HTML in developing the initial infrastructure of the web.

Uploaded by

Aniket Shetye
Copyright
© Attribution Non-Commercial (BY-NC)

Basic WWW Technologies

2.1 Web Documents
2.2 Resource Identifiers: URI, URL, and URN
2.3 Protocols
2.4 Log Files
2.5 Search Engines

Modeling the Internet and the Web

School of Information and Computer Science University of California, Irvine

What Is the World Wide Web?


The World Wide Web (Web) is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:
1. A uniform naming scheme for locating resources on the Web (e.g., URIs).
2. Protocols for access to named resources over the Web (e.g., HTTP).
3. Hypertext for easy navigation among resources (e.g., HTML).

Internet vs. Web


Internet:
  A more general term
  Includes the physical aspects of the underlying networks and mechanisms such as email, FTP, and HTTP
Web:
  Associated with information stored on the Internet
  Can also refer to a broader class of networks, e.g., the web of English literature
Both the Internet and the Web are networks

Essential Components of WWW


Resources:
  Conceptual mappings to concrete or abstract entities, which do not change in the short term
  Example: the ICS website (web pages and other kinds of files)

Resource identifiers (hyperlinks):
  Strings of characters that represent generalized addresses and may contain instructions for accessing the identified resource
  Example: http://www.ics.uci.edu identifies the ICS homepage

Transfer protocols:
  Conventions that regulate the communication between a browser (web user agent) and a server


Standard Generalized Markup Language (SGML)


Based on GML (Generalized Markup Language), developed by IBM in the 1960s
An international standard (ISO 8879:1986) that defines how descriptive markup should be embedded in a document
Gave birth to the Extensible Markup Language (XML), a W3C Recommendation in 1998


SGML Components
SGML documents have three parts:
  Declaration: specifies which characters and delimiters may appear in the application
  DTD/style sheet: defines the syntax of markup constructs
  Document instance: the actual text (with the tags) of the document

More information can be found at: http://www.W3.Org/markup/SGML



DTD Example One


<!ELEMENT UL - - (LI)+>
ELEMENT is a keyword that introduces a new element type, here the unordered list (UL)
The two hyphens indicate that both the start tag <UL> and the end tag </UL> for this element type are required
The content model (LI)+ requires one or more list item (LI) elements between the two tags


DTD Example Two


<!ELEMENT IMG - O EMPTY>
The element type being declared is IMG
The hyphen and the following "O" indicate that the end tag can be omitted
Together with the content model "EMPTY", this is strengthened to the rule that the end tag must be omitted (no closing tag)
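
The two element declarations from the DTD examples can be read mechanically. The following is a minimal sketch, not a full SGML parser: it assumes a declaration has exactly the shape shown on the slides (element name, start-tag flag, end-tag flag, content model), and the function name `parse_element_decl` and the dictionary keys are illustrative choices, not part of any standard.

```python
import re

# Assumed shape: <!ELEMENT NAME start-flag end-flag content-model>
# "-" means the tag is required, "O" means it may be omitted.
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>\w+)\s+(?P<start>[-O])\s+(?P<end>[-O])\s+(?P<content>.+?)\s*>"
)


def parse_element_decl(decl):
    """Return the tag-minimization rules stated by one element declaration."""
    match = ELEMENT_DECL.fullmatch(decl.strip())
    if match is None:
        raise ValueError(f"not a simple element declaration: {decl!r}")
    content = match.group("content")
    end_may_be_omitted = match.group("end") == "O"
    return {
        "name": match.group("name"),
        "start_tag_required": match.group("start") == "-",
        "end_tag_required": not end_may_be_omitted,
        # An omissible end tag plus EMPTY content means no end tag at all.
        "end_tag_forbidden": end_may_be_omitted and content == "EMPTY",
        "content_model": content,
    }


ul_rules = parse_element_decl("<!ELEMENT UL - - (LI)+>")
img_rules = parse_element_decl("<!ELEMENT IMG - O EMPTY>")
```

For UL both tags come back as required and the content model is `(LI)+`; for IMG the end tag is reported as forbidden, matching the rule stated on the second slide.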

HTML Background
HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA.
The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML.
HTML standards are organized by the W3C: http://www.w3.org/MarkUp/

HTML Functionalities
HTML gives authors the means to:
Publish online documents with headings, text, tables, lists, photos, etc.
Include spreadsheets, video clips, sound clips, and other applications directly in their documents
Link information via hypertext links, at the click of a button
Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.


HTML Versions
HTML 4.01 is a revision of the HTML 4.0 Recommendation.
HTML 4.01 Specification: http://www.w3.org/TR/1999/REC-html401-19991224/html40.txt
HTML 4.0 was first released as a W3C Recommendation on 18 December 1997.
HTML 3.2 was W3C's first Recommendation for HTML and represented the consensus on HTML features for 1996.
HTML 2.0 (RFC 1866) was developed by the IETF's HTML Working Group, which set the standard for core HTML features based upon current practice in 1994.


Sample Webpage


Sample Webpage HTML Structure


<HTML>
<HEAD>
<TITLE>The title of the webpage</TITLE>
</HEAD>
<BODY>
<P>Body of the webpage
</BODY>
</HTML>

HTML Structure
An HTML document is divided into a head section (here, between <HEAD> and </HEAD>) and a body (here, between <BODY> and </BODY>)
The title of the document appears in the head (along with other information about the document)
The content of the document appears in the body
The body in this example contains just one paragraph, marked up with <P>
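
The head/body split described above can be observed with Python's standard `html.parser` module. This is a minimal sketch under stated assumptions: the class name `HeadBodySplitter` and the helper `split_head_body` are illustrative names, and the input is the sample page from the previous slide. It only tracks the title and the visible body text.

```python
from html.parser import HTMLParser

# The sample page from the slides, reproduced as a string.
SAMPLE = """<HTML>
<HEAD>
<TITLE>The title of the webpage</TITLE>
</HEAD>
<BODY>
<P>Body of the webpage
</BODY>
</HTML>"""


class HeadBodySplitter(HTMLParser):
    """Collect the title from the head and the text from the body."""

    def __init__(self):
        super().__init__()
        self.section = None      # "head", "body", or None
        self.in_title = False
        self.title = ""
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in ("head", "body"):
            self.section = None
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif self.section == "body" and data.strip():
            self.body_text.append(data.strip())


def split_head_body(html):
    parser = HeadBodySplitter()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), " ".join(parser.body_text)


title, body = split_head_body(SAMPLE)
```

Note that the `<P>` element in the sample has no end tag; the parser still delivers its text as part of the body.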


HTML Hyperlink
<a href="relations/alumni">alumni</a>
A link is a connection from one Web resource to another
It has two ends, called anchors, and a direction
It starts at the "source" anchor and points to the "destination" anchor, which may be any Web resource (e.g., an image, a video clip, a sound bite, a program, an HTML document)
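
The destination of a link such as the one above is usually written relative to the page that contains it. The sketch below shows one way to recover the destination anchors of a page using the standard library. The source URL `http://www.ics.uci.edu/` is an assumption chosen for illustration (the slide does not say which page contains the link), and `AnchorCollector` and `destinations` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the destination of every <a href=...> found in a page."""

    def __init__(self, source_url):
        super().__init__()
        self.source_url = source_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # Resolve the (possibly relative) href against the source page.
                self.links.append(urljoin(self.source_url, href))


def destinations(source_url, html):
    collector = AnchorCollector(source_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Assumed source page; the anchor markup is the one shown on the slide.
links = destinations("http://www.ics.uci.edu/", '<a href="relations/alumni">alumni</a>')
```

Each (source URL, destination URL) pair is one directed link, which is the view of the Web as a directed graph that the link definition implies.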

Resource Identifiers
URI: Uniform Resource Identifiers
URL: Uniform Resource Locators
URN: Uniform Resource Names


Introduction to URIs
Every resource available on the Web has an address that may be encoded by a URI
URIs typically consist of three pieces:
  The naming scheme of the mechanism used to access the resource (e.g., HTTP, FTP)
  The name of the machine hosting the resource
  The name of the resource itself, given as a path

URI Example
http://www.w3.org/TR
There is a document available via the HTTP protocol
It resides on the machines hosting www.w3.org
It is accessible via the path "/TR"
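
The three pieces of the example URI can be extracted with `urllib.parse.urlsplit` from the Python standard library. This is a small sketch; the function name `uri_pieces` and the dictionary keys are illustrative.

```python
from urllib.parse import urlsplit


def uri_pieces(uri):
    """Split a URI into the three pieces named on the slide."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,   # mechanism used to access the resource
        "host": parts.hostname,   # machine hosting the resource
        "path": parts.path,       # name of the resource on that machine
    }


pieces = uri_pieces("http://www.w3.org/TR")
```

Here `pieces` holds the scheme `http`, the host `www.w3.org`, and the path `/TR`.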


Protocols
Describe how messages are encoded and exchanged
Different layering architectures:
  ISO OSI 7-layer architecture
  TCP/IP 4-layer architecture


ISO OSI Layering Architecture


ISO's Design Principles

A layer should be created where a different level of abstraction is needed
Each layer should perform a well-defined function
The layer boundaries should be chosen to minimize information flow across the interfaces
The number of layers should be large enough that distinct functions need not be thrown together in the same layer, and small enough that the architecture does not become unwieldy

TCP/IP Layering Architecture


TCP/IP Layering Architecture


A simplified model that provides an end-to-end reliable connection
The network layer:
  Hosts drop packets into this layer, which routes them toward the destination
  Its only promise is best effort ("try my best")
The transport layer:
  Reliable byte-oriented stream

Hypertext Transfer Protocol (HTTP)


An application-layer protocol used to carry WWW traffic between a browser and a server
Runs over TCP, a connection-oriented transport-layer protocol supported by the Internet
HTTP communication is established via a TCP connection to server port 80 (the default HTTP port)
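
To make the relationship between HTTP and TCP concrete, the sketch below builds the text of a GET request, parses the first line of a response, and shows how the request could be sent over a TCP connection to port 80. All function names (`build_get_request`, `parse_status_line`, `fetch`) are illustrative, the `Connection: close` header is a simplifying choice, and `fetch` is defined but not called here because it needs network access.

```python
import socket

DEFAULT_HTTP_PORT = 80  # server port named on the slide


def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the TCP connection afterwards
        "",
        "",                   # blank line ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Extract (version, status code, reason) from the start of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


def fetch(host, path="/", port=DEFAULT_HTTP_PORT):
    """Send the request over a TCP connection and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


request = build_get_request("www.w3.org", "/TR")
status = parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
```

The request is plain text carried by the byte stream that TCP provides; the same bytes could be typed by hand into any TCP client connected to port 80.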


GET Method in HTTP


Domain Name System


DNS (Domain Name System): maps domain names to IP addresses
IPv4:
  Initially deployed on January 1, 1983, and still the most commonly used version
  32-bit address, written as four decimal numbers separated by dots, ranging from 0.0.0.0 to 255.255.255.255
IPv6:
  Revision of IPv4 with 128-bit addresses
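
The dotted notation is just a readable form of a 32-bit number: each of the four decimal fields is one byte. The sketch below converts by hand and cross-checks against Python's standard `ipaddress` module. The function name `dotted_to_int` is illustrative, and 192.0.2.1 is an address from a range reserved for documentation, used here only as sample input.

```python
import ipaddress


def dotted_to_int(address):
    """Convert an IPv4 address in dotted notation to its 32-bit integer value."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift in one byte at a time
    return value


lowest = dotted_to_int("0.0.0.0")
highest = dotted_to_int("255.255.255.255")
sample = dotted_to_int("192.0.2.1")

# Address widths as reported by the standard library.
ipv4_bits = ipaddress.IPv4Address("192.0.2.1").max_prefixlen
ipv6_bits = ipaddress.IPv6Address("::1").max_prefixlen
```

The highest IPv4 address corresponds to 2**32 - 1, which is why the address space holds about 4.3 billion addresses, while IPv6 addresses are 128 bits wide.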

Top Level Domains (TLD)


Top-level domain names include .com, .edu, .gov, and the ISO 3166 country codes
There are three types of top-level domains:
  Generic domains, created for use by the Internet public
  Country code domains, created to be used by individual countries
  The .arpa domain (Address and Routing Parameter Area), designated to be used exclusively for Internet-infrastructure purposes

Registrars
Domain names ending with .aero, .biz, .com, .coop, .info, .museum, .name, .net, .org, or .pro can be registered through many different companies (known as "registrars") that compete with one another
InterNIC: http://internic.net
Registrars Directory: http://www.internic.net/regist.html

Server Log Files


Server Transfer Log: transactions between a browser and server are logged
  IP address and the time of the request
  Method of the request (GET, HEAD, POST)
  Status code, a response from the server
  Size in bytes of the transaction
Referrer Log: where the request originated
Agent Log: browser software making the request (e.g., a spider)
Error Log: requests that resulted in errors (e.g., 404)
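
The transfer-log fields listed above can be pulled out of a log line with a regular expression. The slides do not name a log format, so this sketch assumes the widely used Common Log Format; the sample line, the address 192.0.2.7, and the names `CLF` and `parse_transfer_line` are all made up for illustration.

```python
import re

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path protocol" status size
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_transfer_line(line):
    """Return the fields of one transfer-log line as a dictionary."""
    match = CLF.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical log line, not taken from a real server.
SAMPLE_LINE = '192.0.2.7 - - [10/Oct/2002:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_transfer_line(SAMPLE_LINE)
```

Referrer and agent information would need additional fields (as in the "combined" variant of this format), which this sketch does not attempt to handle.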

Server Log Analysis


Most and least visited web pages
Entry and exit pages
Referrals from other sites or search engines
Searched keywords
Number of clicks or page views a page received
Error reports, such as broken links
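
A few of the analyses listed above reduce to counting. The sketch below works on already-parsed log records represented as dictionaries with `path` and `status` keys; that record shape, the sample data, and the function name `summarize` are assumptions made for illustration, and treating status codes below 400 as page views is a simplification.

```python
from collections import Counter


def summarize(records):
    """Compute page views, most/least visited pages, and broken links."""
    views = Counter(r["path"] for r in records if r["status"] < 400)
    broken = Counter(r["path"] for r in records if r["status"] == 404)
    ranked = views.most_common()  # sorted from most to least viewed
    return {
        "page_views": dict(views),
        "most_visited": ranked[0][0] if ranked else None,
        "least_visited": ranked[-1][0] if ranked else None,
        "broken_links": sorted(broken),
    }


# Hypothetical records standing in for parsed transfer-log lines.
RECORDS = [
    {"path": "/index.html", "status": 200},
    {"path": "/index.html", "status": 200},
    {"path": "/index.html", "status": 200},
    {"path": "/about.html", "status": 200},
    {"path": "/about.html", "status": 200},
    {"path": "/contact.html", "status": 200},
    {"path": "/missing.html", "status": 404},
]
summary = summarize(RECORDS)
```

Entry and exit pages, referrals, and search keywords would require grouping requests into visits and reading the referrer log, which goes beyond this counting sketch.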

Server Log Analysis


Search Engines
According to a Pew Internet Project report (2002), search engines are the most popular way to locate information online
About 33 million U.S. Internet users query search engines on a typical day
More than 80% have used search engines
Search engines are measured by coverage and recency