Web Usage Mining Negative-Association: S.vignesh
NEGATIVE-ASSOCIATION
s.vignesh
1hk07cs073
HKBKCE
Web Mining
Web Mining is the use of data mining techniques to automatically
discover and extract information from web documents and services.
Discovering useful information from the World Wide Web and its usage
patterns.
My Definition: Using data mining techniques to make the web more useful
and more profitable (for some) and to increase the efficiency of our
interaction with the web.
Web Mining
Data Mining Techniques
Association rules
Sequential patterns
Classification
Clustering
Outlier discovery
Classification
People with age less than 40 and salary > 40k trade on-line
Clustering
Users A and B access similar URLs
Outlier Detection
User A spends more than twice the average amount of time
surfing on the Web
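The clustering and outlier examples above can be sketched in a few lines. This is only an illustration: the Jaccard measure for comparing URL sets, the function names, and the sample data are assumptions, not something the slides specify. Only the "more than twice the average" rule comes from the slide.

```python
def url_similarity(urls_a, urls_b):
    """Jaccard similarity between two users' sets of visited URLs (assumed measure)."""
    union = urls_a | urls_b
    return len(urls_a & urls_b) / len(union) if union else 0.0


def find_outliers(minutes_online, factor=2.0):
    """Return users whose surfing time exceeds `factor` times the average."""
    average = sum(minutes_online.values()) / len(minutes_online)
    return [user for user, t in minutes_online.items() if t > factor * average]


# Hypothetical sample data
user_a_urls = {"/news", "/sports", "/stocks"}
user_b_urls = {"/sports", "/stocks", "/weather"}
print(url_similarity(user_a_urls, user_b_urls))

minutes = {"A": 50, "B": 10, "C": 12, "D": 8}
print(find_outliers(minutes))
```

Users with a high similarity score would be candidates for the same cluster, while users returned by `find_outliers` match the "User A" case on the slide.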
Web Mining
Network Management
Performance management
Fault management
User Profiling
Important for improving customization
Provide users with pages, advertisements of interest
Example profiles: on-line trader, on-line shopper
Engage Technologies
Tracks web traffic to create anonymous user profiles of Web surfers
Has profiles for more than 35 million anonymous users
Internet Advertising
Ads are a major source of revenue for Web
portals (e.g., Yahoo, Lycos) and E-commerce
sites
Scheme 2:
Automate association between ads and users
Use ad click information to cluster users (each user is associated
with a set of ads that he/she clicked on)
For each cluster, find ads that occur most frequently in the cluster
and these become the ads for the set of users in the cluster
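Scheme 2 can be sketched as follows. The slides do not say which clustering algorithm is used, so this sketch assumes a simple greedy grouping by Jaccard similarity of clicked-ad sets; the threshold, function names, and click data are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of clicked ads."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_users(clicks, threshold=0.5):
    """Greedy clustering (assumed method): join the first cluster whose
    first member is similar enough, otherwise start a new cluster."""
    clusters = []
    for user, ads in clicks.items():
        for cluster in clusters:
            if jaccard(ads, clicks[cluster[0]]) >= threshold:
                cluster.append(user)
                break
        else:
            clusters.append([user])
    return clusters


def top_ads(cluster, clicks, k=2):
    """Ads that occur most frequently among the users of one cluster."""
    counts = Counter(ad for user in cluster for ad in clicks[user])
    return [ad for ad, _ in counts.most_common(k)]


# Hypothetical click data: user -> set of ads clicked
clicks = {
    "u1": {"a1", "a2"},
    "u2": {"a1", "a2", "a3"},
    "u3": {"a4", "a5"},
    "u4": {"a4"},
}
for cluster in cluster_users(clicks):
    print(cluster, top_ads(cluster, clicks))
```

Any standard clustering method could replace `cluster_users`; the second step, counting ads per cluster, is the part the slide describes directly.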
Internet Advertising
Use collaborative filtering (e.g. Likeminds, Firefly)
Each user Ui has a rating for a subset of ads (based
on click information, time spent, items bought etc.)
Rij - rating of user Ui for ad Aj
Problem: Compute user Ui’s rating for an unrated ad Aj
[Figure: user-by-ad rating matrix with columns A1, A2, A3]
Internet Advertising
Key Idea: User Ui’s rating for ad Aj is set to Rkj, where Uk is
the user whose rating of ads is most similar to Ui’s
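The key idea above amounts to a nearest-neighbor lookup, sketched below. The slides do not define "most similar", so this sketch assumes similarity is the negated mean absolute difference over ads both users have rated. The function names and rating values are hypothetical.

```python
def similarity(ratings_i, ratings_k):
    """Higher is more similar: negated mean absolute rating difference
    over co-rated ads (an assumed measure, not taken from the slides)."""
    common = set(ratings_i) & set(ratings_k)
    if not common:
        return float("-inf")
    return -sum(abs(ratings_i[a] - ratings_k[a]) for a in common) / len(common)


def predict_rating(ratings, user, ad):
    """Set R_ij to R_kj, where U_k is the most similar user who rated A_j."""
    candidates = [k for k in ratings if k != user and ad in ratings[k]]
    if not candidates:
        return None  # nobody has rated this ad
    best = max(candidates, key=lambda k: similarity(ratings[user], ratings[k]))
    return ratings[best][ad]


# Hypothetical ratings: user -> {ad: rating}
ratings = {
    "U1": {"A1": 5, "A2": 1},
    "U2": {"A1": 5, "A2": 2, "A3": 4},
    "U3": {"A1": 1, "A2": 5, "A3": 1},
}
print(predict_rating(ratings, "U1", "A3"))
```

Here U2 agrees with U1 far more closely than U3 does on the co-rated ads, so U1 inherits U2's rating for A3.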
Problems with Web Search Today
Today’s search engines are plagued by
problems:
the abundance problem (99% of info of no interest to
99% of people)
limited coverage of the Web (internet sources
hidden behind search interfaces)
Largest crawlers cover < 18% of all web pages
limited query interface based on keyword-oriented
search
limited customization to individual users
Web is highly dynamic
Many pages are added, removed, and updated every day
Very high dimensionality
Improve Search By Adding Structure to the Web
Use Web directories (or topic hierarchies)
Provide a hierarchical classification of documents (e.g., Yahoo!)
[Figure: Yahoo home page]
Why is Traffic Management Important?
While annual bandwidth demand is increasing ten-fold
on average, annual bandwidth supply is rising only by
a factor of three
Akamai, Inktomi
Traffic Management
[Figure: service provider network diagram with labels: request, router, server, congested link, congested server]
Traffic Management
Need to mine network and Web traffic to determine
What content to replicate?
Which servers should store replicas?
Which server should a user request be routed to?
What path to use to route packets?
Need to analyze alarm and traffic data to carry out root cause analysis of
faults
Bayesian classifiers can be used to predict the root cause given a set of
alarms
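As a rough illustration of the Bayesian-classifier idea above, the following is a minimal naive Bayes sketch that treats each alarm type as a present/absent feature. The alarm names, root causes, training incidents, and the use of Laplace smoothing are all assumptions made for the example, not details from the slides.

```python
import math
from collections import Counter, defaultdict


def train(incidents):
    """incidents: list of (set_of_alarms, root_cause) pairs."""
    cause_counts = Counter(cause for _, cause in incidents)
    alarm_counts = defaultdict(Counter)
    for alarms, cause in incidents:
        for alarm in alarms:
            alarm_counts[cause][alarm] += 1
    return cause_counts, alarm_counts


def predict_root_cause(model, alarms, alarm_types):
    """Return the root cause with the highest posterior log-score."""
    cause_counts, alarm_counts = model
    total = sum(cause_counts.values())
    best_cause, best_score = None, float("-inf")
    for cause, n in cause_counts.items():
        score = math.log(n / total)  # prior P(cause)
        for alarm in alarm_types:
            # P(alarm present | cause) with Laplace smoothing (assumed)
            p = (alarm_counts[cause][alarm] + 1) / (n + 2)
            score += math.log(p if alarm in alarms else 1 - p)
        if score > best_score:
            best_cause, best_score = cause, score
    return best_cause


# Hypothetical historical incidents
ALARM_TYPES = ["link_down", "high_latency", "high_cpu"]
history = [
    ({"link_down", "high_latency"}, "fiber_cut"),
    ({"link_down"}, "fiber_cut"),
    ({"high_cpu", "high_latency"}, "server_overload"),
    ({"high_cpu"}, "server_overload"),
]
model = train(history)
print(predict_root_cause(model, {"link_down", "high_latency"}, ALARM_TYPES))
```

In practice the training set would come from logged alarms paired with diagnosed faults; the sketch only shows how a set of alarms maps to the most probable cause.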
Web Mining Issues
Size
Grows at about 1 million pages a day
Google indexes 9 billion documents
Number of web sites
Netcraft survey says 72 million sites
(http://news.netcraft.com/archives/web_server_survey.html)