Case Study For Text Analytics
Situation:
Vinod works at an AI company. His client, a librarian, needs to analyze an online book and obtain data based on the features of the words and their characteristics. The features of a word can be its sentiment, part of speech, and so on. By validating the sentences, the client needs to obtain each word's features through natural language processing.
Solution:
Since the book has more than 300 pages, it is necessary to organize the word data and provide a suitable approach for the model. To categorize the text using an NLP approach, we can apply statistical techniques for analyzing sentiment and other aspects of the text. These techniques can be implemented with two kinds of machine learning: supervised and unsupervised learning.
Supervised learning: In this approach we obtain labelled text according to the user's needs by training on labelled datasets. Using various algorithms, data is extracted based on the tagged text. NLP provides several features for processing the text documents. Tokenization is one such feature: the text of the book is split into pieces so the machine can process it easily. Another feature is identifying and assigning parts of speech. The next feature is named entity recognition, which extracts entities from the text such as places, people and titles. The next method is sentiment analysis, which determines whether the text is positive, negative or neutral and extracts text by category. The final method is classification and categorization of the data, which produces accurate results quickly.
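As a rough illustration, the following sketch (assuming the NLTK library and a single made-up sentence, neither of which is specified in the case) shows how tokenization, part-of-speech tagging, named entity recognition and sentiment scoring could be applied to a sentence from the book:

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# One-time download of the NLTK resources this example relies on.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")
nltk.download("vader_lexicon")

# Hypothetical sentence standing in for a line from the book.
sentence = "Vinod recommended a wonderful mystery novel set in Chennai."

tokens = nltk.word_tokenize(sentence)        # tokenization
pos_tags = nltk.pos_tag(tokens)              # part-of-speech tagging
entities = nltk.ne_chunk(pos_tags)           # named entity recognition
sentiment = SentimentIntensityAnalyzer().polarity_scores(sentence)  # sentiment

print(tokens)
print(pos_tags)
print(entities)
print(sentiment)

Each call corresponds to one of the supervised features listed above; the labelled output of these steps is what a text classifier would then be trained on.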
Unsupervised learning: In this method the data is grouped without a labelled dataset. Since the words are in hierarchical form, clustering can be used to group the documents, which are then sorted; another method is latent semantic indexing, which identifies and searches for words and phrases that frequently occur together. The data can also be explored with another unsupervised technique called matrix factorization. The text data is combined into a matrix, which is decomposed into two smaller matrices; the hidden features learned in this process are called latent factors, and they let us identify similarities in the data. Using these matrix-based approaches we can analyze the data by its syntax and recall the sentences of the text later.
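A minimal sketch of these three techniques, assuming scikit-learn and a few made-up passages standing in for documents from the book:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD, NMF

# Hypothetical passages standing in for documents from the book.
documents = [
    "The detective searched the old library for clues.",
    "A librarian catalogued every rare book in the archive.",
    "The storm delayed the train to the coastal town.",
    "Heavy rain flooded the streets near the harbour.",
]

# Build a document-term matrix of TF-IDF weights.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Clustering: group similar documents together.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", kmeans.labels_)

# Latent semantic indexing: truncated SVD over the TF-IDF matrix.
lsi = TruncatedSVD(n_components=2, random_state=0)
print("LSI document vectors:\n", lsi.fit_transform(X))

# Matrix factorization: NMF splits X into two smaller matrices
# whose entries are the latent factors.
nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)   # document-by-factor matrix
H = nmf.components_        # factor-by-term matrix
print("Latent factors per document:\n", W)

Here clustering groups similar passages, TruncatedSVD plays the role of latent semantic indexing over the TF-IDF matrix, and NMF factorizes that matrix into two smaller matrices of latent factors.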
To obtain exact information about a sentence in the book, NLP classifies the information as follows: semantic information gives the correct meaning of a word as it is used in the sentence, syntactic information describes how the sentence is structured and read, and context covers the setting of the sentence as a whole. Based on these criteria we need to choose a good machine learning model to build the application. A hybrid machine learning model system works through the following levels: the initial, low-level process runs the text from the book and converts it into structured data by tuning the model; the mid-level text functions extract the content of the text; and the final, high-level text functions determine and apply sentiment to the text using the NLP approach and summarize the case, generating the data of the texts.
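A minimal sketch of such a three-level pipeline, again assuming NLTK; the function names low_level, mid_level and high_level are hypothetical and only mirror the stages described above:

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("vader_lexicon")

def low_level(raw_text):
    # Low level: run the raw text and convert it into structured data.
    sentences = nltk.sent_tokenize(raw_text)
    return [{"sentence": s, "tokens": nltk.word_tokenize(s)} for s in sentences]

def mid_level(records):
    # Mid level: extract content features, here part-of-speech tags.
    for record in records:
        record["pos"] = nltk.pos_tag(record["tokens"])
    return records

def high_level(records):
    # High level: apply sentiment to each sentence and summarize overall.
    analyzer = SentimentIntensityAnalyzer()
    for record in records:
        record["sentiment"] = analyzer.polarity_scores(record["sentence"])["compound"]
    overall = sum(r["sentiment"] for r in records) / len(records)
    return records, overall

records, overall = high_level(mid_level(low_level(
    "The opening chapter was delightful. The ending felt rushed and dull."
)))
print(records)
print("Overall sentiment:", overall)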
Results:
By carefully choosing the machine learning approach and the NLP model, we can generate the data from the book along with its features.