
Data Warehousing & Mining: Unit - V

This document discusses topics related to data mining from multimedia and web sources. It covers multimedia databases, mining multimedia data, web data mining including web content, structure and usage mining. It also discusses metadata, data visualization, and applications of data mining. Limitations of web mining discussed include privacy invasion and potential misuse of personal data.

Uploaded by

Sunil Kr Pandey
Copyright
© Attribution Non-Commercial (BY-NC)

Data Warehousing & Mining

UNIT – V

Prof. S. K. Pandey, I.T.S, Ghaziabad


Syllabus of Unit - V
• Multimedia Data-Mining: Multimedia-Databases
• Mining Multimedia Data
• Data-Mining and the World Wide Web
• Web Data-Mining
• Mining and Meta-Data
• Data Visualization & Overall Perspective
• Data Visualization
• Applications of Data-Mining



Multimedia Data-Mining: Multimedia-Databases
• Multimedia databases include video, images, audio and text media. They can be stored in extended object-relational or object-oriented databases, or simply on a file system. Multimedia data is characterized by its high dimensionality, which makes data mining even more challenging. Data mining from multimedia repositories may require computer vision, computer graphics, image interpretation, and natural language processing methodologies.
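
To make the dimensionality point concrete, the following minimal sketch (an illustration, not part of the original slides) treats a small synthetic RGB image as one flat feature vector and then summarizes it with a coarse color histogram, one common way to obtain a much shorter descriptor. The image size, the bin count and the function names are assumptions chosen for the example.

```python
import random


def flatten_image(pixels):
    """Turn a list of (r, g, b) pixels into one flat feature vector."""
    return [channel for pixel in pixels for channel in pixel]


def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per coarse (r, g, b) bin; yields bins_per_channel**3 counts."""
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
    return hist


# Hypothetical 64x64 image filled with random colors (a stand-in for real data).
random.seed(0)
WIDTH, HEIGHT = 64, 64
image = [
    (random.randrange(256), random.randrange(256), random.randrange(256))
    for _ in range(WIDTH * HEIGHT)
]

raw_vector = flatten_image(image)   # one dimension per pixel channel
histogram = color_histogram(image)  # one dimension per color bin

print(len(raw_vector))  # 64 * 64 * 3 raw dimensions
print(len(histogram))   # 4 ** 3 histogram dimensions
```

Even this small image yields thousands of raw dimensions, which is why multimedia mining usually works on extracted features rather than on raw pixels or samples.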



Data-Mining and the World Wide Web
• World Wide Web: The World Wide Web is the most heterogeneous and dynamic repository available. A very large number of authors and publishers are continuously contributing to its growth and metamorphosis, and a massive number of users are accessing its resources daily. Data on the World Wide Web is organized in interconnected documents. These documents can be text, audio, video, raw data, and even applications.
• Conceptually, the World Wide Web comprises three major components: the content of the Web, which encompasses the documents available; the structure of the Web, which covers the hyperlinks and the relationships between documents; and the usage of the Web, which describes how and when the resources are accessed. A fourth dimension can be added relating to the dynamic nature or evolution of the documents.
• Data mining on the World Wide Web, or web mining, tries to address all these issues and is often divided into web content mining, web structure mining and web usage mining.
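
As an illustration of the usage component (how and when resources are accessed), the sketch below counts page requests and groups them by hour of day from a few made-up access records. The record layout (user, timestamp, path) and all sample values are assumptions for this example; real server logs would need to be parsed into this shape first.

```python
from collections import Counter
from datetime import datetime

# Hypothetical access records: (user id, ISO timestamp, requested path).
ACCESS_LOG = [
    ("u1", "2024-01-05T09:15:00", "/index.html"),
    ("u2", "2024-01-05T09:40:00", "/products.html"),
    ("u1", "2024-01-05T10:05:00", "/products.html"),
    ("u3", "2024-01-05T21:30:00", "/videos/demo.mp4"),
    ("u2", "2024-01-05T21:45:00", "/products.html"),
]


def page_popularity(log):
    """Count how often each resource was requested (the 'how')."""
    return Counter(path for _, _, path in log)


def requests_per_hour(log):
    """Count requests per hour of day (the 'when')."""
    return Counter(datetime.fromisoformat(ts).hour for _, ts, _ in log)


popular = page_popularity(ACCESS_LOG)
by_hour = requests_per_hour(ACCESS_LOG)

print(popular.most_common(1))
print(sorted(by_hour.items()))
```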
Web Data Mining
• Web data mining is a technique for crawling through various web resources to collect required information. It enables an individual or a company to promote business, understand marketing dynamics, track new promotions circulating on the Internet, etc.
• There is a growing trend among companies, organizations and individuals alike to gather information through web data mining and use it in their best interest.



Web Data-Mining
• Web mining is the application of data mining techniques to discover patterns from the Web. According to the analysis target, web mining can be divided into three types:
1. Web Usage Mining
Web usage mining is the process of finding out what users are looking for on the Internet. Some users might be looking at only textual data, whereas others might be interested in multimedia data. Web usage mining also helps find patterns for a particular group of people, or for Internet users in a particular region.
2. Web Structure Mining
Web structure mining is the process of using graph theory to analyze the node and connection structure of a web site. According to the type of web structural data, web structure mining can be divided into two kinds:
i. Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects a web page to a different location.
ii. Mining the document structure: analysis of the tree-like structure of pages to describe HTML or XML tag usage.
3. Web Content Mining
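
The two kinds of web structure mining described above can be sketched with Python's standard `html.parser` module. The example below extracts hyperlinks from a tiny in-memory site to build a link graph, counts incoming links per page (a very simple graph measure), and tallies HTML tag usage. The three pages, the class name `StructureParser` and the helper functions are hypothetical and exist only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects outgoing links and tag counts from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.tag_counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical three-page site, kept in memory so no network access is needed.
PAGES = {
    "home.html": '<html><body><h1>Home</h1><a href="about.html">About</a>'
                 '<a href="news.html">News</a></body></html>',
    "about.html": '<html><body><p>About us</p><a href="home.html">Home</a></body></html>',
    "news.html": '<html><body><p>News</p><p>More</p><a href="home.html">Home</a>'
                 '<a href="about.html">About</a></body></html>',
}


def analyze_site(pages):
    """Return (link graph, overall tag usage) for a dict of url -> HTML."""
    link_graph = {}
    tag_usage = Counter()
    for url, html in pages.items():
        parser = StructureParser()
        parser.feed(html)
        parser.close()
        link_graph[url] = parser.links
        tag_usage.update(parser.tag_counts)
    return link_graph, tag_usage


def in_degree(link_graph):
    """Count incoming hyperlinks per page."""
    counts = Counter({url: 0 for url in link_graph})
    for targets in link_graph.values():
        for target in targets:
            counts[target] += 1
    return counts


link_graph, tag_usage = analyze_site(PAGES)
incoming = in_degree(link_graph)

print(dict(incoming))
print(tag_usage.most_common(3))
```

In-degree is only the simplest measure one could compute on the link graph; it is used here because it needs nothing beyond the standard library.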



Advantages of Web Mining
• Web mining has many advantages, which make this technology attractive to corporations and government agencies.
• This technology has enabled e-commerce businesses to do personalized marketing, which eventually results in higher trade volumes.
• Government agencies are using this technology to classify threats and fight against terrorism.
• The predictive capability of mining applications can benefit society by identifying criminal activities.
• Companies can establish better customer relationships by giving customers exactly what they need. Companies can understand the needs of the customer better and react to them faster. They can find, attract and retain customers, and they can save on production costs by using the acquired insight into customer requirements.
• They can increase profitability through target pricing based on the profiles created. They can even identify a customer who might defect to a competitor; the company can then try to retain that customer with promotional offers, thus reducing the risk of losing a customer.

Limitations of Web Mining
• Web mining technology itself does not create issues, but when it is used on data of a personal nature it might cause concerns.
• The most criticized ethical issue involving web mining is the invasion of privacy.
• Privacy is considered lost when information concerning an individual is obtained, used, or disseminated, especially if this occurs without their knowledge or consent. The obtained data is analyzed and clustered to form profiles; the data is made anonymous before clustering so that no individual can be linked directly to a profile. But usually the group profiles are used as if they were personal profiles.
• Thus these applications de-individualize users by judging them by their mouse clicks. De-individualization can be defined as a tendency to judge and treat people on the basis of group characteristics instead of their own individual characteristics and merits.
• Another important concern is that companies collecting data for a specific purpose might use it for a totally different purpose, which violates the user's interests. The growing trend of selling personal data as a commodity encourages website owners to trade personal data obtained from their sites. This trend has increased the amount of data being captured and traded, increasing the likelihood of one's privacy being invaded.
• The companies that buy the data are obliged to make it anonymous, and these companies are considered the authors of any specific release of mining patterns. They are legally responsible for the contents of the release; any inaccuracies in the release can result in serious lawsuits, but there is no law preventing them from trading the data.
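
The step of making data anonymous before forming group profiles can be sketched as follows. This is only an illustration under stated assumptions: user identifiers are replaced with salted hashes (which is pseudonymization and does not by itself guarantee anonymity), and grouping by a given region field stands in for a real clustering step. The salt, the click records and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict

SALT = b"example-salt"  # hypothetical secret value, for illustration only

# Hypothetical click records: (user identifier, region, content category).
CLICKS = [
    ("alice@example.com", "north", "sports"),
    ("bob@example.com", "north", "sports"),
    ("alice@example.com", "north", "news"),
    ("carol@example.com", "south", "music"),
]


def pseudonymize(user_id):
    """Replace an identifier with a salted hash so it is not stored in clear text."""
    return hashlib.sha256(SALT + user_id.encode("utf-8")).hexdigest()[:12]


def anonymize_records(clicks):
    """Swap every user identifier for its pseudonym before any analysis."""
    return [(pseudonymize(user), region, category) for user, region, category in clicks]


def group_profiles(records):
    """Aggregate category counts per group; no per-user data is kept."""
    profiles = defaultdict(lambda: defaultdict(int))
    for _, region, category in records:
        profiles[region][category] += 1
    return profiles


anonymized = anonymize_records(CLICKS)
profiles = group_profiles(anonymized)

print({region: dict(counts) for region, counts in profiles.items()})
```

The resulting profile describes a group, not a person, which is exactly why treating it as a personal profile leads to the de-individualization problem described above.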



Mining and Meta-Data

• The primary rationale for data warehousing is to provide businesses with analytics results from data mining, OLAP, Score-carding and reporting. The cost of obtaining front-end analytics is lowered if there is consistent data quality all along the pipeline from data source to analytical reporting.
• Metadata is about controlling the quality of data entering the data stream.
• Batch processes can be run to address data degradation or changes to data policy. Metadata policies are enhanced by using metadata repositories.
• The first step in the process of realigning the data warehousing policies was the examination of the metadata policies and the derivation of a unified view that can work for all stakeholders. As the company was embarking on a new Score-carding initiative, it became feasible to bring the departments together and propose a new enterprise-wide metadata policy.
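
One way to picture metadata controlling the quality of data entering the data stream is a batch check driven by field-level rules. The sketch below is a minimal illustration, not a description of any particular warehouse product: the rule format, the field names and the sample records are all assumptions.

```python
# Hypothetical metadata: one rule set per field of an incoming record.
METADATA = {
    "customer_id": {"type": int, "required": True},
    "region": {"type": str, "required": True, "allowed": {"north", "south"}},
    "order_total": {"type": float, "required": True, "min": 0.0},
}


def validate(record, metadata):
    """Return a list of rule violations for one record (empty if clean)."""
    problems = []
    for field, rule in metadata.items():
        if record.get(field) is None:
            if rule.get("required"):
                problems.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: value not allowed")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{field}: below minimum")
    return problems


def run_batch(records, metadata):
    """Split a batch into accepted records and rejected (record, problems) pairs."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(record, metadata)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


BATCH = [
    {"customer_id": 1, "region": "north", "order_total": 25.0},
    {"customer_id": 2, "region": "east", "order_total": 10.0},
    {"customer_id": 3, "region": "south", "order_total": -5.0},
    {"region": "south", "order_total": 7.5},
]

accepted, rejected = run_batch(BATCH, METADATA)
print(len(accepted), "accepted;", len(rejected), "rejected")
for record, problems in rejected:
    print(record, problems)
```

Keeping the rules in one shared structure, rather than scattered through loading scripts, mirrors the idea of a metadata repository: a policy change is made in one place and applies to every batch run.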
Data Visualization & Overall Perspective
• Data visualization is the study of the visual representation of data, meaning "information which has been abstracted in some schematic form, including attributes or variables for the units of information".
• According to Friedman (2008) the "main goal of data visualization is to
communicate information clearly and effectively through graphical means. It doesn’t
mean that data visualization needs to look boring to be functional or extremely
sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and
functionality need to go hand in hand, providing insights into a rather sparse and
complex data set by communicating its key-aspects in a more intuitive way. Yet
designers often fail to achieve a balance between design and function, creating
gorgeous data visualizations which fail to serve their main purpose — to
communicate information".
• Data visualization is closely related to information graphics, information visualization, scientific visualization and statistical graphics. In the new millennium data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united the field of scientific and information visualization.
Applications of Data-Mining

• Performing basket analysis
– Identifies which items customers tend to purchase together. This knowledge can improve stocking, store layout strategies, and promotions.
• Sales forecasting
– Examining time-based patterns helps retailers make stocking decisions. If a customer purchases an item today, when are they likely to purchase a complementary item?
• Database marketing
– Retailers can develop profiles of customers with certain behaviors, for example, those who purchase designer-label clothing or those who attend sales. This information can be used to focus cost-effective promotions.
• Merchandise planning and allocation
– When retailers add new stores, they can improve merchandise planning and allocation by examining patterns in stores with similar demographic characteristics. Retailers can also use data mining to determine the ideal layout for a specific store.
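
The basket analysis item above can be illustrated with a small sketch that measures how often pairs of items appear in the same transaction (support) and how often one item is bought given another (confidence). The transactions and function names are made up for the example, and this brute-force pair count is a simplification rather than a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    pair_counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in pair_counts.items()}


def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [basket for basket in transactions if antecedent in basket]
    if not with_antecedent:
        return 0.0
    return sum(consequent in basket for basket in with_antecedent) / len(with_antecedent)


support = pair_support(TRANSACTIONS)
bread_to_butter = confidence(TRANSACTIONS, "bread", "butter")

# Pairs are stored as alphabetically sorted tuples, e.g. ("bread", "butter").
for pair, value in sorted(support.items(), key=lambda item: -item[1])[:3]:
    print(pair, round(value, 2))
print("confidence bread -> butter:", bread_to_butter)
```

Pairs with high support and confidence are the candidates a retailer might place near each other or promote together.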
