Web Content Mining: by Saumya Aggarwal (0232083107 - IT) Richa Sharma (0732082707 - CSE)

Uploaded by

Richa Mayank
Copyright
© Attribution Non-Commercial (BY-NC)
WEB CONTENT MINING

By
Saumya Aggarwal (0232083107 - IT)
Richa Sharma (0732082707 - CSE)
WHAT IS WEB MINING?
• Web mining is the application of data mining techniques to extract knowledge from web data, including web documents, hyperlinks between documents, usage logs of web sites, etc.
Data Mining Views
• Process-centric
• Data-centric

WHY WEB MINING?
• The amount of information on the Web is huge and diverse.
• Much of the information on the Web is redundant: the same piece of information, or variants of it, may appear on many pages.
• A Web page typically contains a mixture of many kinds of information, e.g., main content, advertisements, navigation panels, copyright notices, etc.
• The Web is dynamic: information on the Web changes constantly, so keeping up with and monitoring those changes are important issues.
• Above all, the Web is a virtual society. It is not only about data, information, and services, but also about interactions among people, organizations, and automated systems.
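The redundancy point above can be made concrete. One common way to flag near-duplicate pages (an illustrative technique, not one these slides prescribe) is to split each page into overlapping k-word "shingles" and compare the two sets with Jaccard similarity. This is a minimal sketch assuming the page text has already been extracted; the function names, k = 3, and the 0.5 threshold are hypothetical choices.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; treated as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc1: str, doc2: str, k: int = 3, threshold: float = 0.5) -> bool:
    """Hypothetical helper: flag two pages whose shingle sets overlap enough."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold


page_a = "Web mining applies data mining techniques to web data"
page_b = "Web mining applies data mining techniques to web data and logs"
print(is_near_duplicate(page_a, page_b))
```

Here `page_b` only appends two words to `page_a`, so 7 of the 9 distinct shingles are shared and the pair is flagged. Comparing every pair of pages this way is quadratic, so at web scale such comparisons are usually paired with hashing-based shortcuts.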
TAXONOMY IN WEB MINING
• Web mining is a very broad term that has been classified into three major streams:
  o Web Content Mining: the process of extracting useful information from the web.
  o Web Structure Mining: the process of discovering useful knowledge from the structure and hyperlinks of the web.
  o Web Usage Mining: the process of discovering interesting usage patterns from the web.

WEB CONTENT MINING
• Web content mining is the process of extracting useful information from the contents of web documents.
• It includes the following, applied to web page contents:
  o Mining
  o Extraction of data
  o Integration of knowledge
• The content data may consist of text, images, audio, video, or structured records such as lists and tables.
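Structured records such as tables are often the easiest content to mine, because the page markup already delimits rows and cells. This is a minimal sketch using only Python's standard `html.parser` module. It assumes well-formed HTML with explicit closing tags and no nested tables. The class name `TableExtractor` is hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> on a page as a list of rows (lists of cell strings)."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._row: list[str] | None = None   # row currently being read, if any
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments and collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else on the page is ignored.
        if self._cell is not None:
            self._cell.append(data)


parser = TableExtractor()
parser.feed("<table><tr><th>Engine</th><th>Type</th></tr>"
            "<tr><td>Google</td><td> Web  search </td></tr></table>")
parser.close()
print(parser.tables)
```

Real pages are frequently malformed (unclosed `<td>` or `<tr>` tags, nested layout tables), so a production extractor would need a more forgiving HTML parser. The sketch only shows the idea of mapping markup to records.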
WEB STRUCTURE MINING
• Web structure mining is the process of discovering structure information from the web.
• Web graph: web pages are the nodes, and the hyperlinks between them are the edges.
• Categories (based on the kind of structure information): hyperlinks and document structure.
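The web graph described above can be represented directly as an adjacency structure built from hyperlinks. This minimal sketch assumes the links have already been extracted as `(source, target)` URL pairs. It computes each page's in-degree (links pointing to it) and out-degree (links leaving it), which are simple hyperlink-based structure measures. The page names and function names are hypothetical.

```python
from collections import defaultdict


def build_web_graph(links):
    """links: iterable of (source_url, target_url) hyperlinks -> {page: set of targets}."""
    graph = defaultdict(set)
    for src, dst in links:
        graph[src].add(dst)           # duplicate links between two pages count once
        graph.setdefault(dst, set())  # pages with no out-links are still nodes
    return dict(graph)


def degrees(graph):
    """Return (in_degree, out_degree) dictionaries for every page in the graph."""
    out_deg = {page: len(targets) for page, targets in graph.items()}
    in_deg = {page: 0 for page in graph}
    for targets in graph.values():
        for dst in targets:
            in_deg[dst] += 1
    return in_deg, out_deg


# Hypothetical four-link site.
links = [("a.html", "b.html"), ("a.html", "c.html"),
         ("b.html", "c.html"), ("c.html", "a.html")]
in_deg, out_deg = degrees(build_web_graph(links))
print(max(in_deg, key=in_deg.get))  # the most-linked-to page
```

In-degree is only a crude signal of how "important" a page is. Richer link-analysis methods build on this same adjacency structure.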


WEB USAGE MINING
• Web usage mining discovers interesting usage patterns from web usage data.
• Its goal is to understand and better serve the needs of web-based applications.
• Usage data captures the identity or origin of web users and their browsing behaviour at a web site.
• Classification based on the kind of usage data:
  o Web server logs
  o Application server logs
  o Application-level logs
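Web server logs are the most common starting point for usage mining. This is a minimal sketch, assuming the server writes the widely used Common Log Format (`host ident user [time] "request" status size`). It parses each line and counts successful page requests and distinct visiting hosts. The sample lines use documentation IP addresses and hypothetical paths.

```python
import re
from collections import Counter

# Common Log Format, e.g.
# 192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = CLF.match(line)
    return m.groupdict() if m else None


def page_hits(lines):
    """Count successful (2xx) requests per requested path."""
    hits = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["status"].startswith("2"):
            hits[rec["path"]] += 1
    return hits


def distinct_hosts(lines):
    """Number of different client hosts seen in the parsable lines."""
    return len({rec["host"] for rec in map(parse_line, lines) if rec})


log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:56:02 -0700] "GET /products.html HTTP/1.0" 200 1024',
    '192.0.2.7 - - [10/Oct/2000:14:01:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:14:01:15 -0700] "GET /missing.html HTTP/1.0" 404 209',
]
print(page_hits(log).most_common(), distinct_hosts(log))
```

A host address is only a rough proxy for a user, since proxies and shared machines merge many people into one address. Identifying individual users and sessions is usually the harder part of usage mining.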

SEARCH ENGINE
• A search engine is a software program that searches for sites based on the words that you designate as search terms.
• Search engines look through their own databases of information to find what you are looking for.
• “Search engine” is the popular term for an information retrieval (IR) system.
HOW DOES A SEARCH ENGINE WORK?
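The "own database" that a search engine looks through is typically organized as an inverted index, which maps each word to the pages that contain it. A query can then be answered without rescanning every page. This minimal sketch assumes the pages are already fetched and stored as plain text. It treats a multi-word query as a Boolean AND of its terms. The page ids and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens; a deliberately simple stand-in for real text analysis."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """docs: {doc_id: text} -> inverted index {term: set of doc_ids containing it}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


pages = {
    "p1": "Web content mining extracts information from page contents",
    "p2": "Web usage mining analyses server logs",
    "p3": "Search engines index web pages",
}
index = build_index(pages)
print(search(index, "web mining"))
```

Matching is exact at the word level here, so "page" and "pages" are different terms. Real engines add steps such as stemming, ranking, and crawling to keep the index fresh.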
WHAT NEXT?
• Search engines play an important role in accessing content over the internet: they fetch the pages requested by the user.
• An in-depth (comparative) study of today's major search engines:
  o Google
  o Yahoo
  o MSN
• A study of all the information retrieval models that have been developed so far.
• The need for better search engines will only increase.
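As one concrete example of the information retrieval models such a study would cover, the classic vector space model represents each document and the query as TF-IDF weighted term vectors and ranks documents by cosine similarity. This is a minimal sketch of that model and not any particular engine's ranking method. It uses raw term counts for TF and the unsmoothed `log(N / df)` for IDF. The document ids and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """docs: {doc_id: text} -> ({doc_id: {term: tf * idf}}, {term: idf})."""
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for tf in tfs.values() for term in tf)  # document frequency
    idf = {term: math.log(n / count) for term, count in df.items()}
    vecs = {doc_id: {t: c * idf[t] for t, c in tf.items()} for doc_id, tf in tfs.items()}
    return vecs, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(docs, query):
    """Return [(doc_id, score), ...] sorted from most to least similar to the query."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scores = {doc_id: cosine(q, vec) for doc_id, vec in vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {"p1": "google search engine", "p2": "yahoo search engine", "p3": "web usage logs"}
print(rank(docs, "google"))
```

With this IDF, a term that appears in every document gets weight 0, so it cannot separate documents. Other IR models (Boolean, probabilistic, and so on) make different trade-offs, which is what a comparative study would examine.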


THANK YOU
