1-What Is Text Mining - IBM

The document discusses text mining, which is the process of analyzing unstructured text data to identify meaningful patterns and insights. It defines text mining and describes how it can transform unstructured text into structured data through natural language processing techniques. Finally, it outlines several common applications of text mining, including customer service, risk management, maintenance, and healthcare.

Uploaded by

Nagendra Kumar

Text Mining

Learn about text mining, which is the practice of analyzing vast collections of textual
materials to capture key concepts, trends and hidden relationships.

What is text mining?

Text mining, also known as text data mining, is the process of transforming unstructured
text into a structured format to identify meaningful patterns and new insights. By applying
machine learning techniques, such as Naïve Bayes, Support Vector Machines (SVM),
and deep learning algorithms, companies are able to explore and discover hidden
relationships within their unstructured data.

Text is one of the most common data types within databases. Depending on the
database, this data can be organized as:

Structured data: This data is standardized into a tabular format with numerous rows
and columns, making it easier to store and process for analysis and machine learning
algorithms. Structured data can include inputs such as names, addresses, and phone
numbers.

Unstructured data: This data does not have a predefined data format. It can include
text from sources like social media or product reviews, or rich media formats like
video and audio files.

Semi-structured data: As the name suggests, this data is a blend of structured and
unstructured data formats. While it has some organization, it doesn’t have enough
structure to meet the requirements of a relational database. Examples of
semi-structured data include XML, JSON and HTML files.
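
The three categories above can be illustrated with a minimal Python sketch. The records, the field names, and the `extract_phone` helper (with its NNN-NNNN phone pattern) are all invented for illustration and do not come from the article.

```python
import json
import re

# Structured: fixed columns, one value per field (like a table row).
structured_row = {"name": "Mary", "city": "Austin", "phone": "555-0100"}

# Semi-structured: JSON has keys, but fields can be nested or missing.
semi_structured = json.loads(
    '{"user": {"name": "Mary"}, "tags": ["review"], "text": "Great phone."}'
)

# Unstructured: free text with no predefined fields.
unstructured = "Mary from Austin said the phone arrived quickly. Call 555-0100."


def extract_phone(text):
    """Return the first NNN-NNNN pattern in the text, or None (hypothetical format)."""
    match = re.search(r"\b\d{3}-\d{4}\b", text)
    return match.group(0) if match else None


# Turning free text into a structured row is the core move of text mining.
row_from_text = {
    "phone": extract_phone(unstructured),
    "word_count": len(unstructured.split()),
}
print(row_from_text)
```

Even this toy case shows why unstructured text needs an extraction step before it can sit in rows and columns next to the structured record.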

Since an estimated 80% of the world's data resides in an unstructured format, text mining
is an extremely valuable practice within organizations. Text mining tools and natural
language processing (NLP) techniques, like information extraction, allow us to transform
unstructured documents into a structured format to enable analysis and the generation of
high-quality insights. This, in turn, improves the decision-making of organizations,
leading to better business outcomes.

Text mining vs. text analytics

The terms text mining and text analytics are largely synonymous in conversation, but they
can carry a more nuanced meaning. Text mining and text analysis identify textual patterns
and trends within unstructured data through the use of machine learning, statistics, and
linguistics. Once the data has been transformed into a more structured format through
text mining and text analysis, text analytics can draw more quantitative insights from it.
Data visualization techniques can then be used to communicate findings to wider
audiences.

Text mining techniques

The process of text mining comprises several activities that enable you to deduce
information from unstructured text data. Before you can apply different text mining
techniques, you must start with text preprocessing, which is the practice of cleaning and
transforming text data into a usable format. This practice is a core aspect of natural
language processing (NLP) and it usually involves the use of techniques such as language
identification, tokenization, part-of-speech tagging, chunking, and syntax parsing to format
data appropriately for analysis. When text preprocessing is complete, you can apply text
mining algorithms to derive insights from the data. Some of these common text mining
techniques include:

Information retrieval
Information retrieval (IR) returns relevant information or documents based on a pre-
defined set of queries or phrases. IR systems utilize algorithms to track user behaviors
and identify relevant data. Information retrieval is commonly used in library catalogue
systems and popular search engines, like Google. Some common IR sub-tasks include:

Tokenization: This is the process of breaking long-form text into sentences and
words called “tokens”. These are then used in models, like bag-of-words, for text
clustering and document matching tasks.

Stemming: This refers to the process of separating the prefixes and suffixes from
words to derive the root word form and meaning. This technique improves
information retrieval by reducing the size of indexing files.
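
The two sub-tasks above can be sketched in a few lines of Python. This is a toy illustration, not a production approach: the suffix list in `SUFFIXES` is a made-up stand-in for a real stemming algorithm, and the `rank` function and sample documents are assumptions introduced here to show how tokens feed a bag-of-words match.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "ed", "es", "s")  # toy suffix list, not a real stemmer


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text):
    """Map each stemmed token to its count in the text."""
    return Counter(stem(t) for t in tokenize(text))


def rank(query, documents):
    """Order document indices by how many query-term occurrences each contains."""
    query_terms = set(bag_of_words(query))
    scores = []
    for i, doc in enumerate(documents):
        bag = bag_of_words(doc)
        scores.append((sum(bag[t] for t in query_terms), i))
    ordered = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [i for score, i in ordered if score > 0]


docs = [
    "The library catalogue lists printed books.",
    "Search engines index web pages.",
    "Indexing books helps readers search the catalogue.",
]
print(rank("searching indexed books", docs))  # prints [2, 1, 0]
```

Because "searching", "indexed" and "books" are reduced to "search", "index" and "book", the query matches documents that use different word forms, and the index needs fewer distinct entries.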

Natural language processing (NLP)
Natural language processing, which evolved from computational linguistics, uses methods
from various disciplines, such as computer science, artificial intelligence, linguistics, and
data science, to enable computers to understand human language in both written and
verbal forms. By analyzing sentence structure and grammar, NLP sub-tasks allow
computers to “read”. Common sub-tasks include:

Summarization: This technique provides a synopsis of long pieces of text to create
a concise, coherent summary of a document’s main points.

Part-of-Speech (PoS) tagging: This technique assigns a tag to every token in a
document based on its part of speech (noun, verb, adjective, etc.). This step enables
semantic analysis on unstructured text.

Text categorization: This task, which is also known as text classification, is
responsible for analyzing text documents and classifying them based on predefined
topics or categories. This sub-task is particularly helpful when categorizing synonyms
and abbreviations.

Sentiment analysis: This task detects positive or negative sentiment from internal or
external data sources, allowing you to track changes in customer attitudes over time.
It is commonly used to provide information about perceptions of brands, products,
and services. These insights can propel businesses to connect with customers and
improve processes and user experiences.
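
Text categorization and sentiment analysis can be sketched together as a small Naïve Bayes classifier, one of the techniques named earlier. The sketch below is a minimal illustration under stated assumptions: the four training sentences and their labels are made up, and the `NaiveBayes` class is a hand-written multinomial model with add-one smoothing rather than any particular library's implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add one log likelihood per token.
            score = math.log(self.label_counts[label] / total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples, for illustration only.
train_texts = [
    "great product, fast delivery",
    "love the friendly support",
    "terrible service and slow delivery",
    "broken product, awful support",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great friendly service"))  # prints positive
print(model.predict("awful slow product"))      # prints negative
```

The same structure works for any predefined set of categories: swap the sentiment labels for topic labels and the classifier becomes a topic categorizer.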

Information extraction
Information extraction (IE) surfaces the relevant pieces of data when searching various
documents. It also focuses on extracting structured information from free text and storing
these entities, attributes, and relationship information in a database. Common information
extraction sub-tasks include:

Feature selection, or attribute selection, is the process of selecting the important
features (dimensions) that contribute the most to the output of a predictive analytics
model.

Feature extraction is the process of deriving a smaller set of informative features
from the raw data to improve the accuracy of a classification task. This is
particularly important for dimensionality reduction.

Named-entity recognition (NER), also known as entity identification or entity
extraction, aims to find and categorize specific entities in text, such as names or
locations. For example, NER identifies “California” as a location and “Mary” as a
person’s name.
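
As a rough illustration of the NER sub-task, the sketch below looks names up in a small dictionary (a "gazetteer"). This is a deliberately simple stand-in: real NER systems typically rely on trained statistical models, and the `GAZETTEER` entries, labels, and `find_entities` function are hypothetical names introduced for this example.

```python
import re

# Hypothetical gazetteer; a real system would use a trained model or far larger lists.
GAZETTEER = {
    "california": "LOCATION",
    "new york": "LOCATION",
    "mary": "PERSON",
}


def find_entities(text):
    """Return (surface form, label, start offset) for each gazetteer match."""
    entities = []
    for name, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append((match.group(0), label, match.start()))
    # Sort by position so entities appear in reading order.
    return sorted(entities, key=lambda e: e[2])


rows = find_entities("Mary moved from California to New York.")
for surface, label, start in rows:
    print(surface, label, start)
```

Each tuple is already a structured record (entity, type, position), which is the form that could then be stored in a database table.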

Data mining
Data mining is the process of identifying patterns and extracting useful insights from big
data sets. This practice evaluates both structured and unstructured data to identify new
information, and it is commonly utilized to analyze consumer behaviors within marketing
and sales. Text mining is essentially a sub-field of data mining as it focuses on bringing
structure to unstructured data and analyzing it to generate novel insights. The techniques
mentioned above are forms of data mining but fall under the scope of textual data
analysis.

Text mining applications

Text analytics software has impacted the way that many industries work, allowing them to
improve product user experiences as well as make faster and better business decisions.
Some use cases include:

Customer service: There are various ways in which companies solicit feedback from
their customers. When combined with text analytics tools, feedback systems, such
as chatbots, customer surveys, NPS (net-promoter scores), online reviews, support
tickets, and social media profiles, enable companies to improve their customer
experience with speed. Text mining and sentiment analysis can provide a mechanism
for companies to prioritize key pain points for their customers, allowing businesses to
respond to urgent issues in real-time and increase customer satisfaction. Learn how
Verizon is using text analytics in customer service.

Risk management: Text mining also has applications in risk management, where it
can provide insights around industry trends and financial markets by monitoring shifts
in sentiment and by extracting information from analyst reports and whitepapers. This
is particularly valuable to banking institutions as this data provides more confidence
when considering business investments across various sectors. Learn how CIBC and
EquBot are using text analytics for risk mitigation.

Maintenance: Text mining provides a rich and complete picture of the operation and
functionality of products and machinery. Over time, text mining automates decision
making by revealing patterns that correlate with problems and preventive and
reactive maintenance procedures. Text analytics helps maintenance professionals
unearth the root cause of challenges and failures faster. Learn how Korean Airlines is
using text analytics for maintenance.

Healthcare: Text mining techniques have been increasingly valuable to researchers
in the biomedical field, particularly for clustering information. Manual investigation of
medical research can be costly and time-consuming; text mining provides an
automation method for extracting valuable information from medical literature.

Spam filtering: Spam frequently serves as an entry point for hackers to infect
computer systems with malware. Text mining can provide a method to filter and
exclude these e-mails from inboxes, improving the overall user experience and
minimizing the risk of cyber-attacks to end users.

Text mining and IBM Watson

Find trends with IBM Watson Discovery so your business can make better decisions
informed by data. Text analytics digs through your data in real time to reveal hidden
patterns, trends and relationships between different pieces of content. Use text analytics to
gain insights into customer and user behavior, analyze trends in social media and e-
commerce, find the root causes of problems and more. There is untapped business value
in your hidden insights. Get started with IBM Watson Discovery today.

Allow your data scientists to excel by equipping them with a powerful data mining toolkit.
IBM’s Watson Natural Language Understanding can help your teams learn how to analyze
text to reveal structure and meaning. Your teams can extract metadata from content such
as concepts, entities, keywords, categories, sentiment, emotion, relations and semantic
roles using natural language understanding. Get started with IBM Watson Natural
Language Understanding today.
