NLP Q2 21SAL54 Scheme
PART A
PART B
1 a NLP applications:
• Email platforms, such as Gmail, Outlook, etc., use NLP extensively to
provide a range of product features, such as spam classification, priority
inbox, calendar event extraction, auto-complete, etc.
• Voice-based assistants, such as Apple Siri, Google Assistant, Microsoft
Cortana, and Amazon Alexa rely on a range of NLP techniques to
interact with the user, understand user commands, and respond
accordingly.
• Modern search engines, such as Google and Bing, which are the
cornerstone of today’s internet, use NLP heavily for various subtasks,
such as query understanding, query expansion, question answering,
information retrieval, and ranking and grouping of the results, to name
a few.
• Machine translation services, such as Google Translate, Bing Microsoft
Translator, and Amazon Translate are increasingly used in today’s
world to solve a wide range of scenarios and business use cases. These
services are direct applications of NLP.
1 b • Text Classification: Text classification, also known as text
categorization, is the process of assigning predefined categories or
labels to text documents based on their content. Machine learning
algorithms, particularly natural language processing (NLP) techniques,
are commonly used for text classification.
• Information Extraction: Information extraction (IE) is the process of
automatically extracting structured information from unstructured or
semi-structured text. This involves identifying specific entities (such as
names, dates, and locations) and their relationships from a given text.
• Information Retrieval: Information retrieval (IR) is the process of
retrieving relevant
information from a large dataset, typically a collection of documents, in
response to user queries or search terms. Search engines are a common
example of information retrieval systems, where users input keywords,
and the system returns a list of relevant documents.
• Language Modelling: This is the task of predicting what the next word
in a sentence will be based on the history of previous words. The goal
of this task is to learn the probability of a sequence of words appearing
in a given language. Language modeling is useful for building solutions
for a wide variety of problems, such as speech recognition, optical
character recognition, handwriting recognition, machine translation,
and spelling correction.
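The language modelling task described above can be illustrated with a toy bigram model that estimates the probability of the next word from counts in a tiny, made-up corpus (a minimal sketch; real language models are trained on far larger data):

```python
from collections import Counter

# Toy bigram language model: P(next word | previous word) estimated
# from bigram and unigram counts in a tiny illustrative corpus.
corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])  # every word that has a successor

def p_next(prev, word):
    # relative frequency: count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("the", "cat"))  # 2 of the 3 occurrences of "the" are followed by "cat"
```

Here "the" is followed by "cat" in 2 of its 3 occurrences, so the model assigns P(cat | the) = 2/3.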
Naive Bayes
Naive Bayes is a classic algorithm for classification tasks that mainly relies
on Bayes’ theorem (as is evident from the name). Using Bayes’
theorem, it calculates the probability of observing a class label given the set of
features for the input data. A characteristic of this algorithm is that it assumes
each feature is independent of all other features. For the news classification
example mentioned earlier in this chapter, one way to represent the text
numerically is by using the count of domain-specific words, such as sport-
specific or politics-specific words, present in the text. We assume that these
word counts are not correlated to one another. If the assumption holds, we can
use Naive Bayes to classify news articles. While this is a strong assumption to
make in many cases, Naive Bayes is commonly used as a starting algorithm for
text classification. This is primarily because it is simple to understand and very
fast to train and run.
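The word-count idea described above can be sketched as a minimal multinomial Naive Bayes classifier in pure Python. The toy corpus and class labels below are invented for illustration; a real system would use a library implementation and a much larger training set:

```python
import math
from collections import Counter

# Tiny labeled corpus (illustrative only).
docs = [("sports", "goal match team win"), ("sports", "team score match"),
        ("politics", "vote election party"), ("politics", "party vote debate")]

class_counts = Counter(label for label, _ in docs)
word_counts = {c: Counter() for c in class_counts}
for label, text in docs:
    word_counts[label].update(text.split())
vocab = {w for _, text in docs for w in text.split()}

def predict(text):
    scores = {}
    for c in class_counts:
        # log prior, then add log likelihoods with add-one smoothing;
        # summing logs per word encodes the independence assumption.
        score = math.log(class_counts[c] / len(docs))
        total = sum(word_counts[c].values())
        for w in text.split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("team match win"))  # → "sports" on this toy data
```

Note how each word contributes an independent log-probability term; this is exactly the "features are independent given the class" assumption the text describes.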
3 a Stemming:
Definition: Stemming involves removing suffixes from words to obtain their
root form. The goal is to reduce words to a common base or stem.
Example: Consider the word "running." The stem of this word, obtained
through stemming, would be "run." Similarly, "jumps" would be stemmed to
"jump."
Output (e.g., NLTK's PorterStemmer applied to ['running', 'jumps',
'happily', 'better']): ['run', 'jump', 'happili', 'better']
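Stemming is usually done with a library such as NLTK's PorterStemmer; as a self-contained sketch, the toy suffix-stripping stemmer below shows the general idea. It is not the Porter algorithm, so its output differs on some words (it yields "happi" where Porter yields "happili"):

```python
def naive_stem(word):
    # Strip one common suffix (checked longest-first); a real stemmer
    # like Porter applies many ordered rules with extra conditions.
    for suffix in ("ing", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undouble a trailing consonant left by stripping, e.g. "runn" -> "run".
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

print([naive_stem(w) for w in ["running", "jumps", "happily", "better"]])
# → ['run', 'jump', 'happi', 'better']
```

As the "happi" case shows, stems need not be valid dictionary words; this is the key contrast with lemmatization below.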
Lemmatization:
Definition: Lemmatization, like stemming, aims to reduce words to their base
form. However, lemmatization considers the word's meaning and context,
ensuring that the resulting lemma is a valid word.
Example: Consider the word "better." The lemma of this word, obtained
through lemmatization, would also be "better" (with the default noun POS;
tagged as an adjective, it maps to "good"). Similarly, "running" would be
lemmatized to "run" when tagged as a verb; without a POS tag it is left
unchanged.
Output (e.g., NLTK's WordNetLemmatizer with no POS tags on the same
words): ['running', 'jump', 'happily', 'better']
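A real lemmatizer consults a full dictionary such as WordNet; the sketch below uses a tiny hand-built lexicon (an assumption made purely for illustration) to show why the part-of-speech tag matters: without a verb tag, "running" falls through the lookup unchanged, matching the output above.

```python
# Toy lemma lexicon keyed by (word, POS); a real lemmatizer uses a
# full dictionary plus morphological rules.
LEXICON = {
    ("running", "v"): "run",
    ("jumps", "v"): "jump",
    ("better", "a"): "good",  # comparative adjective maps to its base form
}

def lemmatize(word, pos="n"):
    # Unknown (word, POS) pairs are returned unchanged, so every
    # output is either a dictionary lemma or the original word.
    return LEXICON.get((word, pos), word)

print([lemmatize(w) for w in ["running", "jumps", "happily", "better"]])
# → ['running', 'jumps', 'happily', 'better'] (no POS tags given)
print(lemmatize("running", "v"))  # → 'run'
```

Unlike a stemmer, the lemmatizer only ever returns valid words, at the cost of needing a dictionary and POS information.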
b • Overfitting on small datasets
• Few-shot learning and synthetic data generation
• Domain adaptation
• Cost
• Interpretable models
• Common sense and world knowledge
• On-device deployment
Explain at least four of the above.