Building An AI Model Capable of Judging User Sentiments

Uploaded by

Ravishek Singh

To build an AI model capable of judging user sentiment, the approach can be broken
down into the following key steps:


1. Problem Definition
• Define the scope: Is the task a general classification of sentiment as positive, negative, or
neutral, or is it more fine-grained (e.g., recognizing anger, joy, sadness)?
• Choose the modality for the sentiment analysis: text, audio, or video.
2. Data Collection
• Data sources:
o User Data: Sources of text include user reviews, social media comments, chat rooms,
and questionnaire responses.
o Domain-Specific Data: Establish whether the model must work in a specific field
(for instance, healthcare, finance, or customer care) and, if so, obtain sentiment
data from that domain.
o Labelled Data: A dataset in which every piece of text is assigned a sentiment
label (positive, negative, or neutral). Labelled datasets such as IMDb, Roboto
reviews, and Twitter sentiment datasets are available and can be used as a
starting point for your own data.
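As an illustration of what labelled data can look like in practice, the sketch below parses a small CSV of (text, label) pairs and counts the labels. The sample rows, the column names `text` and `label`, and the helper names are all assumptions made for this example, not part of any particular dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; a real project would read a file gathered in this step.
SAMPLE_CSV = """text,label
"Great battery life, totally worth it",positive
"Stopped working after two days",negative
"It arrived on Tuesday",neutral
"Love the screen",positive
"""

def load_labelled(csv_text):
    """Parse CSV text into a list of (text, label) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["text"], row["label"]) for row in reader]

def label_counts(examples):
    """Count how many examples carry each sentiment label."""
    return Counter(label for _, label in examples)

examples = load_labelled(SAMPLE_CSV)
print(label_counts(examples))
```

Counting labels this early also gives a first look at how balanced the dataset is.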
3. Data Preprocessing
• Noise removal: Eliminate noise such as special characters, stop words, punctuation,
HTML tags, etc.
• Tokenization: Segment each sentence into the words or subwords it contains.
• Normalization: Convert the text to lowercase and lemmatize or stem the words so they are
in their base form (e.g., changing "running" to "run").
• Handling class imbalance: Check whether the dataset is imbalanced (e.g., more positive
than negative reviews) and correct it if the imbalance is significant.
• Encoding: Convert the text into a numerical representation using one of these techniques:
o Bag of Words
o TF-IDF
o Word Embedding
o Sentence Embedding or Contextual Embedding
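The preprocessing and encoding steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the stop-word list is a tiny made-up sample, `naive_stem` is a crude hypothetical stand-in for a real stemmer or lemmatizer, and the TF-IDF weight is the textbook form tf × log(N / df), whereas library implementations often use smoothed variants.

```python
import html
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of", "was"}

def clean(text):
    """Noise removal: strip HTML tags, decode entities, drop non-letters."""
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = html.unescape(text)                  # "&amp;" becomes "&"
    return re.sub(r"[^A-Za-z\s]", " ", text)    # drop punctuation, digits, symbols

def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Collapse a doubled final consonant: "runn" becomes "run".
            if suffix != "s" and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token

def preprocess(text):
    """Clean, lowercase, tokenize on whitespace, drop stop words, stem."""
    tokens = clean(text).lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def tfidf(docs):
    """Encode tokenized documents as {term: weight} dicts using TF-IDF."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

raw = ["<p>Running late &amp; the food was cold!</p>",
       "The food was great, loved it."]
docs = [preprocess(text) for text in raw]
vectors = tfidf(docs)
print(docs)
print(vectors)
```

Note that a term appearing in every document (here "food") receives a weight of zero, which is exactly how TF-IDF plays down words that do not help tell documents apart.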

4. Choosing a Model
• Depending on the complexity of the problem and the dataset, choose from:
o Traditional models: Logistic Regression (LR), Naive Bayes (NB), Support Vector
Machines (SVM)
o Deep Learning models: Long Short-Term Memory (LSTM), Gated Recurrent
Unit (GRU), Convolutional Neural Networks (CNN) for text classification
o Transformer models: For a better understanding of context, use BERT, RoBERTa,
or Generative Pre-Trained Transformer (GPT) models
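To make the "traditional models" option concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch for illustration. The class name, the toy training documents, and the choice to ignore words unseen in training are assumptions of this example; a real project would more likely use an established library implementation.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of sentiment labels
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)      # log prior
            for t in tokens:
                if t in self.vocab:                    # skip words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy, already-tokenized training data (hypothetical).
train_docs = [["love", "great", "film"], ["great", "acting"],
              ["terrible", "plot"], ["boring", "terrible", "film"]]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great", "film"]))
print(model.predict(["terrible", "plot"]))
```

A simple baseline like this is cheap to train and gives a reference score that deep learning or transformer models should beat before their extra cost is justified.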
5. Train and Validate
• Splitting the dataset: Once the dataset is ready, divide it into training, validation,
and test sets (e.g., 70% training, 15% validation, 15% testing).
• Training: Train the selected model on the training data.
• Evaluation: Measure model performance with metrics such as accuracy, precision, recall,
and F1-score, and inspect the confusion matrix.
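The split and the evaluation metrics can be sketched as follows. This is an illustrative standard-library version, not a prescribed method: the function names, the fixed random seed, and the toy data are assumptions. The split is stratified (done per label) so that each subset keeps roughly the same label proportions as the full dataset.

```python
import random
from collections import Counter, defaultdict

def stratified_split(examples, train=0.70, val=0.15, seed=0):
    """Split (text, label) pairs into train/val/test, preserving label ratios."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    rng = random.Random(seed)
    train_set, val_set, test_set = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = int(len(group) * train)
        n_val = int(len(group) * val)
        train_set.extend(group[:n_train])
        val_set.extend(group[n_train:n_train + n_val])
        test_set.extend(group[n_train + n_val:])    # remainder goes to test
    return train_set, val_set, test_set

def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from two parallel label lists."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics

# Hypothetical data: 20 positive and 20 negative examples.
data = ([(f"good {i}", "positive") for i in range(20)]
        + [(f"bad {i}", "negative") for i in range(20)])
train_set, val_set, test_set = stratified_split(data)
print(len(train_set), len(val_set), len(test_set))

y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
print(per_class_metrics(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```

Reporting precision, recall, and F1 per class matters most when the classes are imbalanced, because overall accuracy can look good while the minority class is predicted poorly.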
6. Hyperparameter Tuning
• Adjust hyperparameters such as learning rate, batch size, number of epochs, and optimizer
type to improve performance on the validation set.
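One simple way to do this tuning is an exhaustive grid search, sketched below. The `grid_search` helper and the `fake_train_and_score` function are hypothetical: the latter stands in for "train the model with these settings and return a validation score" and is built so that the best combination is known in advance.

```python
import itertools

def grid_search(param_grid, train_and_score):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)    # expected to return a validation score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for real training; peaks at learning_rate=0.01, batch_size=32.
def fake_train_and_score(learning_rate, batch_size):
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 32) / 1000

grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [16, 32, 64]}
best_params, best_score = grid_search(grid, fake_train_and_score)
print(best_params, best_score)
```

Grid search grows quickly with the number of hyperparameters, so for larger search spaces random search or a dedicated tuning library is usually the more practical choice.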
7. Model Deployment
• After the model has been validated against the test data, it can be put into practical
use.
• For example, when an end user submits inputs such as customer reviews or social media
comments, the model processes them and returns sentiment predictions.
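A deployed model usually sits behind some request handler. The sketch below shows one possible shape for that handler using JSON in and JSON out. The request format (a `texts` list), the function names, and the keyword-based `keyword_predict` placeholder are all assumptions for illustration; in practice `predict` would call the trained model.

```python
import json

def handle_request(body, predict):
    """Turn a JSON request body into a JSON response with one label per text."""
    try:
        texts = json.loads(body)["texts"]
    except (ValueError, KeyError, TypeError):
        texts = None
    if not isinstance(texts, list):
        return json.dumps({"error": "expected a JSON object with a 'texts' list"})
    return json.dumps({"predictions": [
        {"text": text, "sentiment": predict(text)} for text in texts
    ]})

# Hypothetical placeholder; a real deployment would call the trained model here.
def keyword_predict(text):
    lowered = text.lower()
    if "great" in lowered or "love" in lowered:
        return "positive"
    if "bad" in lowered or "broken" in lowered:
        return "negative"
    return "neutral"

request = '{"texts": ["Great phone", "Screen arrived broken"]}'
print(handle_request(request, keyword_predict))
```

Keeping the handler separate from the model makes it straightforward to reuse the same function for batch jobs and for an online service.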
8. Post-Deployment Monitoring
• Continue to monitor the model's predictions and performance on the real-world data it
is used on.
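One lightweight monitoring check is to compare the distribution of predicted labels in production against a baseline, for example the distribution seen on the validation set. The sketch below uses total variation distance for that comparison. The function names and the 0.2 alert threshold are assumptions chosen for illustration, and a shift in predictions is only a signal to investigate, not proof that the model has degraded.

```python
from collections import Counter

def label_distribution(labels):
    """Map each label to its share of the list (empty input gives {})."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}

def total_variation(p, q):
    """Half the L1 distance between two label distributions (0 means identical)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

def drifted(baseline_labels, live_labels, threshold=0.2):
    """True when live predictions differ from the baseline by more than threshold."""
    distance = total_variation(label_distribution(baseline_labels),
                               label_distribution(live_labels))
    return distance > threshold

baseline = ["positive"] * 6 + ["negative"] * 4    # e.g. validation-set predictions
live = ["positive"] * 2 + ["negative"] * 8        # e.g. last week's predictions
print(drifted(baseline, live))
print(drifted(baseline, baseline))
```

When labelled production samples are available, tracking the evaluation metrics on them over time is a more direct measure than prediction drift alone.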
Inputs Needed from the User:
1. Dataset: Textual data from user interactions, reviews, feedback, or any other
domain-specific source.
2. Sentiment Labels: How the user wishes to categorize sentiment (e.g., two-way as
positive or negative, or three-way as anger, joy, sadness).
3. Domain Information: Whether the model is intended for a specific domain, such as
healthcare, finance, or customer support.
4. Performance Metrics: Which qualities the user prioritizes, such as accuracy,
interpretability, or prediction speed.
5. Deployment Preferences: Details of how the user intends to use the model, such as
batch or online prediction, and where it would be deployed (e.g., cloud, web, or
mobile).
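These five inputs can be captured in a small specification object so they are agreed on before any modeling starts. The sketch below is one hypothetical way to do that; the class name, field names, default values, and allowed options are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SentimentProjectSpec:
    dataset_path: str                     # 1. where the text data lives
    labels: list                          # 2. sentiment categories to predict
    domain: str = "general"               # 3. e.g. "healthcare", "finance"
    priority_metrics: list = field(default_factory=lambda: ["f1"])   # 4.
    deployment_mode: str = "batch"        # 5. "batch" or "online"
    deployment_target: str = "cloud"      # 5. "cloud", "web", or "mobile"

    def validate(self):
        """Return a list of problems; an empty list means the spec looks usable."""
        problems = []
        if len(set(self.labels)) < 2:
            problems.append("need at least two distinct sentiment labels")
        if self.deployment_mode not in ("batch", "online"):
            problems.append("deployment_mode must be 'batch' or 'online'")
        if self.deployment_target not in ("cloud", "web", "mobile"):
            problems.append("deployment_target must be 'cloud', 'web', or 'mobile'")
        return problems

spec = SentimentProjectSpec("reviews.csv", ["positive", "negative"],
                            domain="customer support")
print(spec.validate())
```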
