
ACADEMIC YEAR 2024-2025
IFETCE R-2023
Unit I – INTRODUCTION
1.1 NLP: Overview:
• The meaning of NLP is Natural Language Processing (NLP), which is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics.
• With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
• Natural language processing (NLP) is a field of computer science and a subfield of artificial intelligence that aims to make computers understand human language.

1.1.2 NLP Techniques:
• Text Processing and Preprocessing in NLP
• Syntax and Parsing in NLP
• Semantic Analysis
• Information Extraction
• Text Classification in NLP
• Language Generation
• Speech Processing
• Question Answering
• Dialogue Systems
• Sentiment and Emotion Analysis in NLP

1.1.3 Working of Natural Language Processing (NLP):
• Working in natural language processing (NLP) typically involves using computational techniques to analyze and understand human language. This can include tasks such as language understanding, language generation, and language interaction.

1.1.4 Applications of Natural Language Processing (NLP):
• Spam Filters
• Algorithmic Trading
• Question Answering
• Summarizing Information

1.1.5 Future Scope:
• Bots: Chatbots assist clients to get to the point quickly by answering inquiries and referring them to relevant resources and products at any time of day or night.
• Supporting Invisible UI: Almost every interaction we have with machines involves human communication, both spoken and written.
• Smarter Search: NLP's future also includes improved search, something we have been discussing at Expert System for a long time.

1.2 Approaches in NLP:
https://www.geeksforgeeks.org/rule-based-approach-in-nlp/
• In the context of Natural Language Processing (NLP), "approaches" refer to the different methodologies or techniques used to tackle various tasks related to understanding and processing human language.
• There are three types of NLP approaches (a small rule-based sketch is given after this list):
• Rule-based Approach – based on linguistic rules and patterns
• Machine Learning Approach – based on statistical analysis
• Neural Network Approach – based on various artificial, recurrent, and convolutional neural network algorithms
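As a minimal illustration of the rule-based approach, the sketch below uses only the Python standard library: hand-written regular-expression rules pull dates and email addresses out of raw text. The patterns and the sample sentence are illustrative assumptions, not part of the syllabus material.

    import re

    # Hand-crafted patterns play the role of linguistic rules.
    DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
    EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

    def extract_with_rules(text):
        """Apply the fixed rules to the text and collect every match."""
        return {
            "dates": DATE_PATTERN.findall(text),
            "emails": EMAIL_PATTERN.findall(text),
        }

    sample = "Mail student@example.com before 12/05/2025 to register."
    print(extract_with_rules(sample))
    # {'dates': ['12/05/2025'], 'emails': ['student@example.com']}

A machine learning or neural approach would instead learn such patterns from annotated examples rather than relying on hand-written rules.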
1.3 Data Acquisition:
https://www.linkedin.com/pulse/data-acquisition-natural-language-processing-nlp-vivekanandan
• Data acquisition is the process of gathering and collecting data for use in natural language processing (NLP) tasks. The quality and quantity of the data are critical to the success of any NLP model.
• There are a number of different ways to acquire data for NLP tasks. Some common methods include (a short scraping sketch follows this list):
• Crawling and scraping the web
• Using social media data
• Customer reviews
• Using public datasets
• Generating synthetic data
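A small sketch of the first method, crawling and scraping the web, assuming the third-party requests and beautifulsoup4 packages are installed; the URL is only a placeholder, and any real crawl must respect the target site's terms of use and robots.txt.

    import requests
    from bs4 import BeautifulSoup

    def fetch_page_text(url):
        """Download one page and strip the HTML markup, keeping visible text."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        return soup.get_text(separator=" ", strip=True)

    corpus = [fetch_page_text("https://example.com/")]   # placeholder URL
    print(corpus[0][:80])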

1.4 Text extraction: Unicode Normalization
• Text extraction in NLP refers to the process of identifying and extracting relevant information or structured data from unstructured textual data. This is particularly useful for tasks such as information retrieval, information extraction, and summarization.
Techniques involved in text extraction (a small keyword-extraction sketch follows this list):
• Entity Extraction
• Keyword Extraction
• Phrase Extraction
• Information Extraction
• Template Filling
• Text Summarization
• Feature Extraction
• Document Classification
Unicode Normalization:
http://www.unicode.org/reports/tr15/
• Unicode normalization makes processing more uniform.
• A Unicode normalization standard defines how to decompose a character into its basic parts.
• Unicode Normalization Forms are formally defined normalizations of Unicode strings which make it possible to determine whether any two Unicode strings are equivalent to each other. Depending on the particular Unicode Normalization Form, that equivalence can be either a canonical equivalence or a compatibility equivalence.
• The four Unicode Normalization Forms are NFC, NFD, NFKC, and NFKD; a short example follows.
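A short example with Python's built-in unicodedata module, showing canonical composition (NFC) and compatibility normalization (NFKC); the sample strings are illustrative.

    import unicodedata

    # Two visually identical strings: precomposed "é" vs "e" + combining accent.
    s1 = "caf\u00e9"            # café (precomposed)
    s2 = "cafe\u0301"           # café (decomposed)
    print(s1 == s2)                                         # False
    print(unicodedata.normalize("NFC", s1) ==
          unicodedata.normalize("NFC", s2))                 # True

    # Compatibility normalization folds the "ﬁ" ligature into plain "fi".
    print(unicodedata.normalize("NFKC", "\ufb01le"))        # file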
1.4.3 Spell Corrections:
https://www.naukri.com/code360/library/spelling-correction-in-nlp
• One way to deal with spelling errors in NLP is by using techniques such as spell checking, phonetic matching, and incorporating language models that handle out-of-vocabulary words effectively.
1.4.3.1 Several techniques are commonly used to handle spelling errors (a small spell-checking sketch follows this list):
• Spell Checking
• Phonetic Matching
• Language Models
• Rule-Based Approaches
• User Feedback
• Domain-Specific Customization
• Pre-processing
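A very small spell-checking sketch using only the standard library's difflib: each token is looked up in a known vocabulary and, if absent, replaced by its closest match. The toy vocabulary and cutoff value are illustrative assumptions; production systems combine edit distance with phonetic matching and language-model context.

    from difflib import get_close_matches

    VOCABULARY = {"natural", "language", "processing", "speech", "text", "model"}

    def correct_word(word):
        """Return the word itself if known, else the closest vocabulary entry."""
        if word in VOCABULARY:
            return word
        matches = get_close_matches(word, VOCABULARY, n=1, cutoff=0.6)
        return matches[0] if matches else word

    print([correct_word(w) for w in ["naturel", "langage", "processing"]])
    # e.g. ['natural', 'language', 'processing']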
1.5 Text preprocessing:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
• Text preprocessing is an essential step in natural language processing (NLP) that involves cleaning and transforming unstructured text data to prepare it for analysis.
• It includes tokenization, stemming, lemmatization, stop-word removal, and part-of-speech tagging.
• Text preprocessing prepares the text data for model building and is the very first step of NLP projects. Some of the preprocessing steps are (a compact cleaning sketch follows this list):
• Removing punctuation marks like . , ! $ ( ) * % @
• Removing URLs
• Removing stop words
• Lower casing
• Tokenization
• Stemming
• Lemmatization
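A compact sketch of these cleaning steps using only the Python standard library; the small stop-word set and the sample string are illustrative assumptions, and stemming and lemmatization are shown separately under 1.5.1.

    import re
    import string

    STOPWORDS = {"the", "a", "an", "is", "are", "and", "to", "of", "in"}

    def preprocess(text):
        """Lower-case, strip URLs and punctuation, tokenize, drop stop words."""
        text = text.lower()                                    # lower casing
        text = re.sub(r"https?://\S+", " ", text)              # removing URLs
        text = text.translate(str.maketrans("", "", string.punctuation))
        tokens = text.split()                                  # tokenization
        return [t for t in tokens if t not in STOPWORDS]       # stop-word removal

    raw = "Check https://example.com! NLP is the FIRST step of an NLP project."
    print(preprocess(raw))
    # ['check', 'nlp', 'first', 'step', 'nlp', 'project']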

Preliminaries:
https://www.slideshare.net/slideshow/lecture-2-preliminaries-understanding-and-preprocessing-data/54905946
• In Natural Language Processing (NLP), the preliminaries in preprocessing refer to the initial steps taken to prepare raw text data before it can be used for more advanced linguistic analysis or modeling tasks.
• These preliminary steps are crucial as they help clean and transform the text into a format that is more suitable for the specific NLP task at hand.
• Here are some common preliminaries in preprocessing:
• Text Cleaning
• Tokenization
• Stopword Removal
• Normalization
• Handling Noise
• Handling Rare Words
• Sentence Segmentation
• Part-of-Speech Tagging (POS tagging)
• Feature Extraction
• These preliminaries are essential because they lay the groundwork for more advanced NLP tasks such as sentiment analysis, named entity recognition, machine translation, and more.

1.5.1 Frequent steps:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
Steps in NLP (a short stemming and lemmatization sketch follows this list):
• Tokenization
• Stemming
• Lemmatization
• Part-of-speech (POS) tagging
• Named entity recognition
• Chunking
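A brief sketch contrasting two of these steps, stemming and lemmatization, assuming the third-party NLTK package is installed and its WordNet data has been downloaded with nltk.download("wordnet"); the word list is illustrative.

    from nltk.stem import PorterStemmer, WordNetLemmatizer
    # Requires: pip install nltk   and   nltk.download("wordnet")

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "running", "better"]:
        print(word,
              stemmer.stem(word),                    # crude suffix stripping
              lemmatizer.lemmatize(word, pos="v"))   # dictionary-based base form
    # e.g. "studies" -> "studi" (stem) vs "study" (lemma)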
1.6 Feature engineering:
https://www.geeksforgeeks.org/feature-extraction-techniques-nlp/
• Feature engineering is the process of transforming raw data into features that are suitable for machine learning models.
• In other words, it is the process of selecting, extracting, and transforming the most relevant features from the available data to build more accurate and efficient machine learning models.
Processes Involved in Feature Engineering:
• Feature Creation
• Feature Transformation
• Feature Extraction
• Feature Selection
• Feature Scaling
Techniques Used in Feature Engineering (a small text-encoding sketch follows this list):
• One-Hot Encoding
• Binning
• Scaling
• Feature Split
• Text Data Preprocessing
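As a sketch of the first technique applied to text, a one-hot style bag-of-words encoding with scikit-learn (assumed installed); the two example sentences are illustrative.

    from sklearn.feature_extraction.text import CountVectorizer

    corpus = ["NLP makes computers understand language",
              "Feature engineering transforms raw text"]

    # binary=True gives one-hot style 0/1 indicators instead of raw counts.
    vectorizer = CountVectorizer(binary=True)
    X = vectorizer.fit_transform(corpus)

    print(vectorizer.get_feature_names_out())   # vocabulary discovered from the corpus
    print(X.toarray())                          # one row of 0/1 features per sentence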
Feature Engineering Tools:
There are several tools available for feature engineering; some of the popular ones are:
• Featuretools
• TPOT
• DataRobot
• Alteryx
• H2O.ai

1.6.1 Machine Learning Pipeline in NLP:
https://www.geeksforgeeks.org/natural-language-processing-nlp
• Feature engineering in the context of Machine Learning (ML) and Deep Learning (DL) pipelines in Natural Language Processing (NLP) refers to the process of creating meaningful and relevant features from raw text data that can be used as input to machine learning or deep learning models.
• This is crucial because raw text data, being unstructured, needs to be transformed into a structured format that can effectively capture the underlying patterns and relationships in the data.
• Here is how feature engineering fits into ML and DL pipelines in NLP (a condensed pipeline sketch follows this subsection):
Machine Learning Pipeline in NLP:
• Text Preprocessing
• Feature Extraction
• Feature Selection/Engineering
• Model Training and Evaluation
Deep Learning Pipeline in NLP:
• Text Preprocessing
• Feature Representation
• Model Architecture
• Training and Optimization
• Fine-tuning
• Evaluation
Integration of Feature Engineering in ML and DL Pipelines:
• Pipeline Design: The design of ML and DL pipelines in NLP often involves integrating various stages of text preprocessing, feature extraction, model training, and evaluation.
• Iterative Process: Feature engineering is often an iterative process where different features and representations are experimented with to find the most effective ones for the task at hand.
• Domain Knowledge: Incorporating domain knowledge and task-specific requirements into feature engineering enhances the relevance and effectiveness of the features extracted.
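A condensed sketch of such an ML pipeline with scikit-learn (assumed installed): TF-IDF features are extracted from raw text and fed to a linear classifier; the toy training data is illustrative.

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts  = ["great product", "terrible service", "really great", "awful, terrible"]
    labels = [1, 0, 1, 0]                                # 1 = positive, 0 = negative

    pipeline = Pipeline([
        ("features", TfidfVectorizer(lowercase=True)),   # preprocessing + feature extraction
        ("model", LogisticRegression()),                 # model training
    ])
    pipeline.fit(texts, labels)
    print(pipeline.predict(["great service"]))           # prediction on new text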
1.6.2 Modelling:
• In Natural Language Processing (NLP), modeling refers to the process of building computational models that can understand, generate, or analyze human language.
• These models are designed to process textual data in a way that enables them to perform specific tasks or solve particular problems (a minimal modelling example is given after the lists below).
Tasks in NLP Modeling:
• Text Classification
• Named Entity Recognition (NER)
• Machine Translation
• Text Generation
• Question Answering
• Sentiment Analysis
Steps in NLP Modeling:
• Data Preparation
• Model Selection
• Training
• Evaluation
• Deployment and Fine-tuning
Challenges in NLP Modeling:
• Ambiguity and Variability
• Data Sparsity
• Interpretable Representations
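A minimal sentiment-analysis modelling example using the Hugging Face transformers library; this assumes the package is installed and that the default pretrained model can be downloaded on first use.

    from transformers import pipeline

    # Loads a default pretrained sentiment model on the first call (internet needed).
    classifier = pipeline("sentiment-analysis")
    print(classifier("NLP models are becoming remarkably capable."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]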
1.6.3 Evaluation:
• Evaluation metrics are quantitative measures used to assess the performance and effectiveness of Natural Language Processing (NLP) systems.
• These metrics help evaluate how well a particular NLP system performs its intended task, such as machine translation, sentiment analysis, or named entity recognition.
Importance of evaluation metrics:
• Accuracy of NLP Results
• Comparative Analysis
• Improvement and Optimization
• Task-Specific Expertise
• Quality Assurance
Key areas covered under the umbrella of evaluation metrics (a short metric-computation sketch follows this subsection):
• Precision and Recall
• F1 Score
• Accuracy
• Perplexity
• Task-Specific Metrics
Applications of Evaluation Metrics:
• Model Development and Selection
• Algorithm Fine-tuning and Optimization
• Benchmarking and Research Comparisons
• Quality Assurance and User Satisfaction
• Performance Monitoring and Error Analysis
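A short sketch computing several of these metrics with scikit-learn (assumed installed) on illustrative gold and predicted labels.

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    y_true = [1, 0, 1, 1, 0, 1]        # gold labels (illustrative)
    y_pred = [1, 0, 0, 1, 0, 1]        # system predictions

    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1 score :", f1_score(y_true, y_pred))
    # Perplexity, by contrast, is used for language models: it is the
    # exponentiated average negative log-likelihood of the test text.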
1.6.4 Post Modelling Phases:
• Post-modeling phases in Natural Language Processing (NLP) involve activities that occur after the model has been trained and evaluated.
• The key post-modeling phases in NLP are (a small model-persistence sketch follows this list):
• Model Evaluation and Validation
• Hyperparameter Tuning
• Model Deployment
• Performance Monitoring and Maintenance
• Iterative Improvement and Feedback Loop
• Ethical Considerations and Bias Mitigation
• Documentation and Knowledge Sharing
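As one concrete post-modelling step (model deployment), a small persistence sketch using joblib, which ships with scikit-learn; the tiny pipeline and file name are illustrative assumptions.

    import joblib
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    pipeline = Pipeline([("features", TfidfVectorizer()), ("model", LogisticRegression())])
    pipeline.fit(["great product", "terrible service"], [1, 0])

    joblib.dump(pipeline, "nlp_pipeline.joblib")      # save at deployment time
    restored = joblib.load("nlp_pipeline.joblib")     # reload in the serving application
    print(restored.predict(["great support"]))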
