TMK DWDM Unit 7 Advance Topics

The document covers advanced topics in data warehousing and data mining, focusing on web mining, spatial data mining, temporal mining, text mining, and multimedia mining. It explains the processes, techniques, applications, and challenges associated with each type of mining, highlighting their significance in various fields such as healthcare, finance, and e-commerce. The content is prepared for a Computer Engineering course at Government Engineering College, Rajkot.

Uploaded by

vebifa5267

Government Engineering College, Rajkot

Computer Engineering Department


B.E. 6th SEMESTER Artificial Intelligence and Data Science
Subject Name: Data Warehousing and Data Mining (3164305)

Unit 8

Advanced Topics
Introduction to Web Mining, Spatial Data Mining, Temporal Mining, Text Mining, and Multimedia Mining.

Prepared by
Prof. T. M. Kodinariya
Department of Computer Engineering
Government Engineering College - Rajkot
What is Web Mining?
• Web mining is the process of extracting useful information and patterns
from web data, including web pages, links, server logs, and user behavior.
• It combines data mining techniques with web technologies to analyze
structured, semi-structured, and unstructured data on the internet.
• Web data includes:
– web documents
– hyperlinks between documents
– usage logs of web sites

Data Mining vs Web Mining

● Data Mining: The process of identifying significant patterns in data that lead to better outcomes.

● Web Mining: The application of data mining to the web, that is, extracting web documents and discovering patterns in them.

Types of Web Mining
• Web content mining
• Web structure mining
• Web usage mining
Web Content Mining
• Web content mining extracts useful data, information, and knowledge from the content of web pages.
• It scans and mines the text, images, and groups of web pages that match the content of a query, as a search engine does when it lists results.
• Web content mining is related to, but different from, data mining and text mining.
• Web data are mainly semi-structured or unstructured, whereas data mining deals primarily with structured data and text mining focuses on unstructured text.
• Example: Search engines like Google analyze webpage content to rank results.
• Techniques Used:
– Natural Language Processing (NLP) for text analysis.
– Image and video recognition.
– Sentiment analysis to understand user opinions.
Web Structure Mining
• Web structure mining focuses on analyzing the web structure and the relationships
between web pages.
• This includes analyzing links between pages, identifying communities of pages, and
detecting patterns in website design.
• Web structure mining techniques are used to improve search engine results, identify
authoritative pages, and detect web spam.
• Example: Google’s PageRank algorithm analyzes link structures to determine webpage
importance.
• Techniques Used:
– Graph theory to analyze link relationships.
– Network analysis to find influential pages.
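To make the link-analysis idea concrete, here is a minimal sketch of a PageRank-style power iteration. It is an illustration only, not Google's production algorithm: the four-page site, the `pagerank` function name, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share (random jump).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page site (assumed data, for illustration only).
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "blog": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph "home" ends up with the highest score because three pages link to it, while "blog" gets only the baseline share because nothing links to it.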

Web Usage Mining
• It analyzes user behavior on websites, such as clicks, time spent, and navigation patterns.
• It mines Weblog records to discover user access patterns of Web pages.
• Analyzing and exploring regularities in Weblog records can identify potential customers
for electronic commerce, enhance the quality and delivery of Internet information
services to the end user, and improve Web server system performance.
• A Web server usually registers a Weblog entry for every access of a Web page. Each entry includes the URL requested, the IP address from which the request originated, and a timestamp.
• Example: E-commerce sites like Amazon track user browsing habits to recommend
products.
• Techniques Used:
– Log file analysis (server logs, cookies).
– Clickstream analysis to track user navigation.
– Machine learning for behavior prediction.
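The log and clickstream analysis described above can be sketched as follows. The log format here is a simplified, hypothetical one (`<ip> <ISO timestamp> <url>`), not a real server format, and the sample entries and function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical simplified log lines: "<ip> <ISO timestamp> <url>"
LOG_LINES = [
    "10.0.0.1 2024-01-05T10:00:00 /home",
    "10.0.0.2 2024-01-05T10:00:05 /home",
    "10.0.0.1 2024-01-05T10:00:30 /products",
    "10.0.0.1 2024-01-05T10:01:10 /cart",
    "10.0.0.2 2024-01-05T10:02:00 /products",
]


def parse_entry(line):
    """Split one log line into (ip, timestamp, url)."""
    ip, stamp, url = line.split()
    return ip, datetime.fromisoformat(stamp), url


def mine_usage(lines):
    """Return page hit counts, per-visitor paths, and page-to-page transitions."""
    entries = sorted((parse_entry(l) for l in lines), key=lambda e: e[1])
    page_hits = Counter(url for _, _, url in entries)
    # Clickstream: ordered list of pages per IP (IP stands in for a visitor here).
    paths = defaultdict(list)
    for ip, _, url in entries:
        paths[ip].append(url)
    # Count how often one page is followed directly by another.
    transitions = Counter()
    for path in paths.values():
        transitions.update(zip(path, path[1:]))
    return page_hits, dict(paths), transitions


page_hits, paths, transitions = mine_usage(LOG_LINES)
print(page_hits.most_common(2))
print(paths["10.0.0.1"])
print(transitions.most_common(1))
```

Treating an IP address as one visitor is a simplification; real usage mining typically also uses cookies or session identifiers, as the techniques list notes.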
Text Mining


• Text mining is a subfield of data mining that involves extracting useful information from
unstructured text data.
• It analyzes text data such as documents, social media posts, and customer reviews to extract insights that help organizations make data-driven decisions.
• Text mining techniques include natural language processing (NLP), sentiment analysis, topic modeling, and text classification.
• Text mining can be used as a preprocessing step for data mining or as a standalone
process for specific tasks.

Key steps in Text Mining
• Text Preprocessing
– Tokenization: Splitting text into individual words or phrases.
– Stopword Removal: Removing common words (e.g., "the," "and," "is") that carry little meaning on their own.
– Stemming/Lemmatization: Reducing words to their root forms (e.g., "running" →
"run").
– Normalization: Converting text into a standard format (e.g., lowercasing, removing
punctuation).
• Feature Extraction
– Bag of Words (BoW): Representing text as word frequency counts.
– TF-IDF (Term Frequency-Inverse Document Frequency): Weighting words that are frequent in a document but rare across the collection, to identify the words that matter most.
– Word Embeddings: Using vector representations like Word2Vec or BERT for semantic
analysis.
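The preprocessing and feature extraction steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the stopword list is a tiny hand-picked one, stemming/lemmatization and word embeddings are left out, the TF-IDF variant is the plain `tf * log(N / df)` form, and the three sample reviews are invented.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "and", "is", "a", "of", "to", "in"}  # tiny illustrative list


def preprocess(text):
    """Normalize (lowercase, drop punctuation), tokenize, and remove stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(tokens):
    """Bag of Words: word -> frequency count."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical product reviews (assumed data).
docs = [
    "The battery life is great and the screen is bright.",
    "The battery died in a week.",
    "Great screen, great price!",
]
tokens = [preprocess(d) for d in docs]
weights = tf_idf(tokens)
print(tokens[1])
print(bag_of_words(tokens[2]))
print(sorted(weights[1].items(), key=lambda kv: -kv[1]))
```

In the second review, "died" and "week" receive a higher weight than "battery", because "battery" also appears in another document and is therefore less distinctive.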

• Text Analysis Techniques
– Named Entity Recognition (NER): Identifying proper names, places, and dates.
– Sentiment Analysis: Determining the emotional tone of text (e.g., positive,
negative, neutral).
– Topic Modeling: Identifying themes in a collection of documents (e.g., using
LDA – Latent Dirichlet Allocation).
– Text Classification: Categorizing text into predefined groups (e.g., spam
detection).
• Visualization & Interpretation
– Word Clouds: Displaying frequently used words in a dataset.
– Clustering: Grouping similar text documents together.
– Network Graphs: Mapping relationships between words and topics.
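As one example of the text classification technique listed above, the following is a minimal sketch of spam detection with a multinomial Naive Bayes classifier and Laplace smoothing. Naive Bayes is our choice for illustration (the slides do not name a specific classifier), and the four training messages and the labels "spam" and "ham" are invented.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_docs:
        class_docs[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages (assumed data).
training = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
]
model = train_naive_bayes(training)
print(classify("claim your free prize", model))
print(classify("monday project meeting", model))
```

A realistic system would first apply the preprocessing steps (tokenization, stopword removal, normalization) and train on far more data than this.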


Applications of Text Mining


• Customer Feedback Analysis (e.g., analyzing reviews and social
media sentiment).
• Spam Detection (e.g., filtering unwanted emails).
• Fraud Detection (e.g., analyzing financial documents for
anomalies).
• Healthcare & Biomedical Research (e.g., extracting insights from
medical records).
• Legal & Compliance Monitoring (e.g., analyzing contracts for
risks).
Spatial Data Mining

• Spatial data mining is a specialized subfield of data mining that deals with extracting
knowledge from spatial data.
• Spatial data refers to data that is associated with a particular location or geography.
Examples of spatial data include maps, satellite images, GPS data, and other geospatial
information.
• Spatial data mining involves analyzing and discovering patterns, relationships, and
trends in this data to gain insights and make informed decisions.
• The use of spatial data mining has become increasingly important in various fields,
such as logistics, environmental science, urban planning, transportation, and public
health.
• By analyzing spatial data, researchers and data mining professionals can identify correlations between locations and predict where future events are likely to occur.

Applications of Spatial Data Mining
• Geographical Information Systems (GIS): Urban planning, land-use
analysis.
• Environmental Science: Predicting climate change, tracking deforestation.
• Public Health: Disease outbreak detection (e.g., COVID-19 hotspot
identification).
• Disaster Management: Identifying flood-prone areas, earthquake risk
analysis.
• Location-Based Services: Recommender systems for restaurants, hotels,
etc.

Spatial Data Mining Techniques
• Spatial Clustering: Groups similar spatial objects together (e.g., detecting hotspots of
crime in a city).
• Spatial Classification: Assigns labels to spatial objects (e.g., land cover classification in
remote sensing).
• Spatial Association Rule Mining: Finds relationships between spatial features (e.g.,
"areas with high rainfall often have dense vegetation").
• Spatial Outlier Detection: Identifies unusual patterns in spatial data (e.g., an unusually
high crime rate in a low-risk neighborhood).
• Spatial Prediction: Uses historical data to predict future spatial patterns (e.g.,
predicting urban expansion).
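A very simple way to illustrate hotspot detection is to bin points into a grid and keep the dense cells. The sketch below is a grid-based density count, which is a simplification rather than a full spatial clustering algorithm such as DBSCAN; the coordinates, cell size, and threshold are assumptions chosen for the example.

```python
from collections import Counter


def grid_hotspots(points, cell_size, min_count):
    """Bin (x, y) points into square cells; return cells with >= min_count points."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical incident locations on an arbitrary 10 x 10 city grid (assumed data).
incidents = [
    (1.2, 3.4), (1.5, 3.1), (1.9, 3.8), (1.1, 3.9),  # dense group
    (7.0, 2.0), (4.5, 8.2), (9.3, 9.1),              # scattered points
]
hotspots = grid_hotspots(incidents, cell_size=1.0, min_count=3)
print(hotspots)  # {(1, 3): 4}
```

The result depends on the grid: a cluster that straddles a cell boundary can be split across cells, which is one reason density-based clustering methods are often preferred in practice.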

Challenges in Spatial Data Mining
• Data Complexity: Handling multidimensional and heterogeneous data.
• Scalability: Processing large spatial datasets efficiently.
• Uncertainty and Noise: Errors in spatial measurements and missing data.
• Computational Cost: High resource demand for large-scale spatial analysis.

Temporal Mining

• Temporal data mining is the process of discovering meaningful patterns, relationships, and trends in data that change over time, i.e., temporal data.
• Unlike traditional data mining, which deals with static datasets, temporal mining
considers the sequential and time-dependent nature of the data.
• Temporal data refers to data that is time-dependent, meaning that it represents
observations collected at specific points or intervals over time. Each data point is
associated with a timestamp, making time an essential component of the data.
• A temporal data item is valid only for a prescribed period of time.
• It becomes invalid or obsolete once that period has passed.
• Temporal data mining is concerned with analyzing temporal data and discovering temporal patterns and regularities in sets of temporal information.

• It also allows the possibility of computer-driven, automatic exploration of the data.
• The main tasks in temporal mining are:
– Data characterization and comparison
– Clustering analysis
– Classification
– Association rules
– Pattern analysis
– Prediction and trend analysis
Applications of Temporal Data Mining
• Finance: Stock market prediction, fraud detection in banking.
• Healthcare: Disease progression analysis, patient monitoring.
• Weather Forecasting: Identifying climate patterns and predicting storms.
• Retail and Marketing: Customer behavior analysis, seasonal sales forecasting.
• Cybersecurity: Detecting anomalies in network traffic over time.
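Two of the ideas above, trend analysis and anomaly detection over time, can be sketched with a moving average and a rolling z-score. This is a minimal illustration, not a production detector: the traffic values, the window sizes, and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def moving_average(series, window):
    """Smooth the series: mean of each run of `window` consecutive values."""
    return [
        mean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def find_anomalies(series, window, threshold=3.0):
    """Flag indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window : i]
        mu, sigma = mean(history), stdev(history)
        # z-score of the current value relative to recent history
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical requests-per-minute readings (assumed data).
traffic = [100, 102, 98, 101, 99, 103, 100, 250, 101, 99]
print(moving_average(traffic, 3))
print(find_anomalies(traffic, window=5))  # [7]
```

Note that once the spike enters the history window it inflates the standard deviation, so the readings right after it are not flagged; robust statistics such as the median are a common way to reduce that effect.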
Challenges in Temporal Data Mining
• Handling Large-Scale Data: Temporal datasets grow rapidly over time.
• Data Uncertainty: Missing, noisy, or imprecise time-stamped data.
• Complexity of Patterns: Temporal relationships can be intricate and multi-layered.
• Computational Efficiency: Processing and analyzing long time-series data efficiently.

| Spatial Data Mining | Temporal Data Mining |
|---|---|
| Spatial data mining refers to the extraction of knowledge, spatial relationships, and interesting patterns that are not explicitly stored in a spatial database. | Temporal data mining refers to the extraction of knowledge about the occurrence of events, such as whether they follow random, cyclic, or seasonal variation. |
| It needs space. | It needs time. |
| Primarily, it deals with spatial data such as locations and geo-referenced data. | Primarily, it deals with implicit and explicit temporal content from a huge set of data. |
| It involves characteristic rules, discriminant rules, evaluation rules, and association rules. | It targets mining new patterns and unknown knowledge that take the temporal aspects of data into account. |
| Examples: Finding hotspots, unusual locations. | Example: An ordinary association rule might be "Any person who buys a motorcycle also buys a helmet." With the temporal aspect, the rule becomes "Any person who buys a motorcycle also buys a helmet after that." |
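The difference between the plain and the temporal version of the motorcycle/helmet rule can be checked on data by computing the rule's confidence with and without the ordering constraint. The purchase histories, customer IDs, and the `rule_confidence` helper below are all hypothetical, made up for illustration.

```python
# Each customer's purchases as (day, item) pairs; hypothetical data.
histories = {
    "c1": [(1, "motorcycle"), (3, "helmet")],
    "c2": [(2, "helmet"), (5, "motorcycle")],
    "c3": [(1, "motorcycle"), (1, "gloves"), (4, "helmet")],
    "c4": [(6, "motorcycle")],
}


def rule_confidence(histories, antecedent, consequent, ordered):
    """Confidence of 'antecedent => consequent'.

    With ordered=True the consequent must be bought strictly after the
    first purchase of the antecedent (the temporal version of the rule).
    """
    with_antecedent = 0
    with_rule = 0
    for purchases in histories.values():
        a_days = [day for day, item in purchases if item == antecedent]
        c_days = [day for day, item in purchases if item == consequent]
        if not a_days:
            continue
        with_antecedent += 1
        if ordered:
            if any(day > min(a_days) for day in c_days):
                with_rule += 1
        elif c_days:
            with_rule += 1
    return with_rule / with_antecedent if with_antecedent else 0.0


plain = rule_confidence(histories, "motorcycle", "helmet", ordered=False)
temporal = rule_confidence(histories, "motorcycle", "helmet", ordered=True)
print(plain)     # 0.75
print(temporal)  # 0.5
```

Customer c2 bought the helmet before the motorcycle, so that history supports the plain rule but not the temporal one, which is why the temporal confidence is lower.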
Multimedia Mining

• Multimedia mining is the process of finding useful information from multimedia data
sets, such as images, videos, audio, and text.
• Multimedia data mining is an interdisciplinary field that integrates image processing
and understanding, computer vision, data mining, and pattern recognition.
• Unlike traditional data mining, which primarily deals with structured numerical data,
multimedia mining involves analyzing unstructured and semi-structured data.
Types of Multimedia Data
• Image Data – Photos, medical scans (X-rays, MRIs), satellite images.
• Video Data – Surveillance footage, movies, user-generated content.
• Audio Data – Speech recordings, music, sonar signals.
• Text Data – Web pages, social media posts, documents.
• Graphics – 3D models, computer-generated designs.

Applications of Multimedia Data Mining
• Healthcare – Detecting diseases from medical images (e.g., AI-based cancer diagnosis).
• Surveillance & Security – Face recognition, crime pattern detection in CCTV footage.
• Social Media Analysis – Understanding trends using image/video content (e.g.,
Instagram hashtags).
• Entertainment & Media – Recommending music, movies (e.g., Netflix, Spotify).
• E-commerce – Product image recognition for visual search (e.g., Google Lens).
• Remote Sensing – Analyzing satellite images for climate change monitoring.
Challenges in Multimedia Data Mining
• Large Data Volumes – Processing high-quality images and videos requires massive storage and computing power.
• Data Complexity – Multimedia data is unstructured and needs advanced algorithms.
• Noise & Redundancy – Background noise in audio, irrelevant objects in images.
• Privacy & Security – Risks of unauthorized access to multimedia data.
Example: Multimedia Mining in Healthcare
• Scenario
– A hospital wants to use AI to detect lung diseases from chest X-rays.
• How Multimedia Mining is Used
– Image Processing: AI scans the X-rays for abnormalities.
– Pattern Recognition: Compares new X-rays with past cases of lung disease.
– Classification: Determines whether a patient has pneumonia, tuberculosis, or is healthy.
– Predictive Analysis: Alerts doctors about high-risk patients.
• Outcome
– Doctors receive AI-assisted diagnoses, which can lead to faster and more accurate treatment.
