
ACADEMIC YEAR 2024-2025
IFETCE R-2023
Unit I – INTRODUCTION
1.1 NLP: Overview:
• The meaning of NLP is Natural Language Processing (NLP), which is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics.
• With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
• Natural language processing (NLP) is a field of computer science and a subfield of artificial intelligence that aims to make computers understand human language.

1.1.2 NLP Techniques:
• Text Processing and Preprocessing in NLP
• Syntax and Parsing in NLP
• Semantic Analysis
• Information Extraction
• Text Classification in NLP
• Language Generation
• Speech Processing
• Question Answering
• Dialogue Systems
• Sentiment and Emotion Analysis in NLP

1.1.3 Working of Natural Language Processing (NLP):
• Working in natural language processing (NLP) typically involves using computational techniques to analyze and understand human language. This can include tasks such as language understanding, language generation, and language interaction.

1.1.4 Applications of Natural Language Processing (NLP):
• Spam Filters
• Algorithmic Trading
• Question Answering
• Summarizing Information

1.1.5 Future Scope:
• Bots: Chatbots assist clients to get to the point quickly by answering inquiries and referring them to relevant resources and products at any time of day or night.
• Supporting Invisible UI: Almost every interaction we have with machines involves human communication, both spoken and written.
• Smarter Search: NLP's future also includes improved search, something we have been discussing at Expert System for a long time.

1.2 Approaches in NLP:
https://www.geeksforgeeks.org/rule-based-approach-in-nlp/
• In the context of Natural Language Processing (NLP), "approaches" refer to the different methodologies or techniques used to tackle various tasks related to understanding and processing human language.
• There are three types of NLP approaches (a small rule-based sketch is given after this list):
• Rule-based Approach – based on linguistic rules and patterns
• Machine Learning Approach – based on statistical analysis
• Neural Network Approach – based on various artificial, recurrent, and convolutional neural network algorithms
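As a minimal illustration of the rule-based approach, the sketch below uses only the Python standard library: hand-written regular-expression rules pull dates and email addresses out of raw text. The patterns and the sample sentence are illustrative assumptions, not part of the syllabus material.

    import re

    # Hand-crafted patterns play the role of linguistic rules.
    DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
    EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

    def extract_with_rules(text):
        """Apply the fixed rules to the text and collect every match."""
        return {
            "dates": DATE_PATTERN.findall(text),
            "emails": EMAIL_PATTERN.findall(text),
        }

    sample = "Mail student@example.com before 12/05/2025 to register."
    print(extract_with_rules(sample))
    # {'dates': ['12/05/2025'], 'emails': ['student@example.com']}

A machine learning or neural approach would instead learn such patterns from annotated examples rather than relying on hand-written rules.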
1.3 Data Acquisition:
https://www.linkedin.com/pulse/data-acquisition-natural-language-processing-nlp-vivekanandan
• Data acquisition is the process of gathering and collecting data for use in natural language processing (NLP) tasks. The quality and quantity of the data are critical to the success of any NLP model.
• There are a number of different ways to acquire data for NLP tasks. Some common methods include (a short scraping sketch follows this list):
• Crawling and scraping the web
• Using social media data
• Customer reviews
• Using public datasets
• Generating synthetic data
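A small sketch of the first method, crawling and scraping the web, assuming the third-party requests and beautifulsoup4 packages are installed; the URL is only a placeholder, and any real crawl must respect the target site's terms of use and robots.txt.

    import requests
    from bs4 import BeautifulSoup

    def fetch_page_text(url):
        """Download one page and strip the HTML markup, keeping visible text."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        return soup.get_text(separator=" ", strip=True)

    corpus = [fetch_page_text("https://example.com/")]   # placeholder URL
    print(corpus[0][:80])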

1.4 Text extraction: Unicode Normalization
• Text extraction in NLP refers to the process of identifying and extracting relevant information or structured data from unstructured textual data. This is particularly useful for tasks such as information retrieval, information extraction, and summarization.
Techniques involved in text extraction (a small keyword-extraction sketch follows this list):
• Entity Extraction
• Keyword Extraction
• Phrase Extraction
• Information Extraction
• Template Filling
• Text Summarization
• Feature Extraction
• Document Classification
Unicode Normalization:
http://www.unicode.org/reports/tr15/
• Unicode normalization makes processing more uniform.
• A Unicode normalization standard defines how to decompose a character into its basic parts.
• Unicode Normalization Forms are formally defined normalizations of Unicode strings which make it possible to determine whether any two Unicode strings are equivalent to each other. Depending on the particular Unicode Normalization Form, that equivalence can be either a canonical equivalence or a compatibility equivalence.
• The four Unicode Normalization Forms are NFC, NFD, NFKC, and NFKD; a short example follows.
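A short example with Python's built-in unicodedata module, showing canonical composition (NFC) and compatibility normalization (NFKC); the sample strings are illustrative.

    import unicodedata

    # Two visually identical strings: precomposed "é" vs "e" + combining accent.
    s1 = "caf\u00e9"            # café (precomposed)
    s2 = "cafe\u0301"           # café (decomposed)
    print(s1 == s2)                                         # False
    print(unicodedata.normalize("NFC", s1) ==
          unicodedata.normalize("NFC", s2))                 # True

    # Compatibility normalization folds the "ﬁ" ligature into plain "fi".
    print(unicodedata.normalize("NFKC", "\ufb01le"))        # file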
1.4.3 Spell Corrections:
https://www.naukri.com/code360/library/spelling-correction-in-nlp
• One way to deal with spelling errors in NLP is by using techniques such as spell checking, phonetic matching, and incorporating language models that handle out-of-vocabulary words effectively.
1.4.3.1 Several techniques are commonly used to handle spelling errors (a small spell-checking sketch follows this list):
• Spell Checking
• Phonetic Matching
• Language Models
• Rule-Based Approaches
• User Feedback
• Domain-Specific Customization
• Pre-processing
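A very small spell-checking sketch using only the standard library's difflib: each token is looked up in a known vocabulary and, if absent, replaced by its closest match. The toy vocabulary and cutoff value are illustrative assumptions; production systems combine edit distance with phonetic matching and language-model context.

    from difflib import get_close_matches

    VOCABULARY = {"natural", "language", "processing", "speech", "text", "model"}

    def correct_word(word):
        """Return the word itself if known, else the closest vocabulary entry."""
        if word in VOCABULARY:
            return word
        matches = get_close_matches(word, VOCABULARY, n=1, cutoff=0.6)
        return matches[0] if matches else word

    print([correct_word(w) for w in ["naturel", "langage", "processing"]])
    # e.g. ['natural', 'language', 'processing']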
1.5 Text preprocessing:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
• Text preprocessing is an essential step in natural language processing (NLP) that involves cleaning and transforming unstructured text data to prepare it for analysis.
• It includes tokenization, stemming, lemmatization, stop-word removal, and part-of-speech tagging.
• Text preprocessing prepares the text data for model building and is the very first step of NLP projects. Some of the preprocessing steps are (a compact cleaning sketch follows this list):
• Removing punctuation marks like . , ! $ ( ) * % @
• Removing URLs
• Removing stop words
• Lower casing
• Tokenization
• Stemming
• Lemmatization
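A compact sketch of these cleaning steps using only the Python standard library; the small stop-word set and the sample string are illustrative assumptions, and stemming and lemmatization are shown separately under 1.5.1.

    import re
    import string

    STOPWORDS = {"the", "a", "an", "is", "are", "and", "to", "of", "in"}

    def preprocess(text):
        """Lower-case, strip URLs and punctuation, tokenize, drop stop words."""
        text = text.lower()                                    # lower casing
        text = re.sub(r"https?://\S+", " ", text)              # removing URLs
        text = text.translate(str.maketrans("", "", string.punctuation))
        tokens = text.split()                                  # tokenization
        return [t for t in tokens if t not in STOPWORDS]       # stop-word removal

    raw = "Check https://example.com! NLP is the FIRST step of an NLP project."
    print(preprocess(raw))
    # ['check', 'nlp', 'first', 'step', 'nlp', 'project']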

Preliminaries:
https://www.slideshare.net/slideshow/lecture-2-preliminaries-understanding-and-preprocessing-data/54905946
• In Natural Language Processing (NLP), the preliminaries in preprocessing refer to the initial steps taken to prepare raw text data before it can be used for more advanced linguistic analysis or modeling tasks.
• These preliminary steps are crucial as they help clean and transform the text into a format that is more suitable for the specific NLP task at hand.
• Here are some common preliminaries in preprocessing:
• Text Cleaning
• Tokenization
• Stopword Removal
• Normalization
• Handling Noise
• Handling Rare Words
• Sentence Segmentation
• Part-of-Speech Tagging (POS tagging)
• Feature Extraction
• These preliminaries are essential because they lay the groundwork for more advanced NLP tasks such as sentiment analysis, named entity recognition, machine translation, and more.

1.5.1 Frequent steps:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
Steps in NLP (a short stemming and lemmatization sketch follows this list):
• Tokenization
• Stemming
• Lemmatization
• Part-of-speech (POS) tagging
• Named entity recognition
• Chunking
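A brief sketch contrasting two of these steps, stemming and lemmatization, assuming the third-party NLTK package is installed and its WordNet data has been downloaded with nltk.download("wordnet"); the word list is illustrative.

    from nltk.stem import PorterStemmer, WordNetLemmatizer
    # Requires: pip install nltk   and   nltk.download("wordnet")

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "running", "better"]:
        print(word,
              stemmer.stem(word),                    # crude suffix stripping
              lemmatizer.lemmatize(word, pos="v"))   # dictionary-based base form
    # e.g. "studies" -> "studi" (stem) vs "study" (lemma)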
1.6 Feature engineering:
https://www.geeksforgeeks.org/feature-extraction-techniques-nlp/
• Feature engineering is the process of transforming raw data into features that are suitable for machine learning models.
• In other words, it is the process of selecting, extracting, and transforming the most relevant features from the available data to build more accurate and efficient machine learning models.
Processes Involved in Feature Engineering:
• Feature Creation
• Feature Transformation
• Feature Extraction
• Feature Selection
• Feature Scaling
Techniques Used in Feature Engineering (a small text-encoding sketch follows this list):
• One-Hot Encoding
• Binning
• Scaling
• Feature Split
• Text Data Preprocessing
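As a sketch of the first technique applied to text, a one-hot style bag-of-words encoding with scikit-learn (assumed installed); the two example sentences are illustrative.

    from sklearn.feature_extraction.text import CountVectorizer

    corpus = ["NLP makes computers understand language",
              "Feature engineering transforms raw text"]

    # binary=True gives one-hot style 0/1 indicators instead of raw counts.
    vectorizer = CountVectorizer(binary=True)
    X = vectorizer.fit_transform(corpus)

    print(vectorizer.get_feature_names_out())   # vocabulary discovered from the corpus
    print(X.toarray())                          # one row of 0/1 features per sentence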
Feature Engineering Tools:
There are several tools available for feature engineering; some of the popular ones are:
• Featuretools
• TPOT
• DataRobot
• Alteryx
• H2O.ai

1.6.1 Machine Learning Pipeline in NLP:
https://www.geeksforgeeks.org/natural-language-processing-nlp
• Feature engineering in the context of Machine Learning (ML) and Deep Learning (DL) pipelines in Natural Language Processing (NLP) refers to the process of creating meaningful and relevant features from raw text data that can be used as input to machine learning or deep learning models.
• This is crucial because raw text data, being unstructured, needs to be transformed into a structured format that can effectively capture the underlying patterns and relationships in the data.
• Here is how feature engineering fits into ML and DL pipelines in NLP (a condensed pipeline sketch follows this subsection):
Machine Learning Pipeline in NLP:
• Text Preprocessing
• Feature Extraction
• Feature Selection/Engineering
• Model Training and Evaluation
Deep Learning Pipeline in NLP:
• Text Preprocessing
• Feature Representation
• Model Architecture
• Training and Optimization
• Fine-tuning
• Evaluation
Integration of Feature Engineering in ML and DL Pipelines:
• Pipeline Design: The design of ML and DL pipelines in NLP often involves integrating various stages of text preprocessing, feature extraction, model training, and evaluation.
• Iterative Process: Feature engineering is often an iterative process where different features and representations are experimented with to find the most effective ones for the task at hand.
• Domain Knowledge: Incorporating domain knowledge and task-specific requirements into feature engineering enhances the relevance and effectiveness of the features extracted.
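A condensed sketch of such an ML pipeline with scikit-learn (assumed installed): TF-IDF features are extracted from raw text and fed to a linear classifier; the toy training data is illustrative.

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts  = ["great product", "terrible service", "really great", "awful, terrible"]
    labels = [1, 0, 1, 0]                                # 1 = positive, 0 = negative

    pipeline = Pipeline([
        ("features", TfidfVectorizer(lowercase=True)),   # preprocessing + feature extraction
        ("model", LogisticRegression()),                 # model training
    ])
    pipeline.fit(texts, labels)
    print(pipeline.predict(["great service"]))           # prediction on new text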
1.6.2 Modelling:
• In Natural Language Processing (NLP), modeling refers to the process of building computational models that can understand, generate, or analyze human language.
• These models are designed to process textual data in a way that enables them to perform specific tasks or solve particular problems (a minimal modelling example is given after the lists below).
Tasks in NLP Modeling:
• Text Classification
• Named Entity Recognition (NER)
• Machine Translation
• Text Generation
• Question Answering
• Sentiment Analysis
Steps in NLP Modeling:
• Data Preparation
• Model Selection
• Training
• Evaluation
• Deployment and Fine-tuning
Challenges in NLP Modeling:
• Ambiguity and Variability
• Data Sparsity
• Interpretable Representations
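A minimal sentiment-analysis modelling example using the Hugging Face transformers library; this assumes the package is installed and that the default pretrained model can be downloaded on first use.

    from transformers import pipeline

    # Loads a default pretrained sentiment model on the first call (internet needed).
    classifier = pipeline("sentiment-analysis")
    print(classifier("NLP models are becoming remarkably capable."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]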
1.6.3 Evaluation:
• Evaluation metrics are quantitative measures used to assess the performance and effectiveness of Natural Language Processing (NLP) systems.
• These metrics help evaluate how well a particular NLP system performs its intended task, such as machine translation, sentiment analysis, or named entity recognition.
Importance of evaluation metrics:
• Accuracy of NLP Results
• Comparative Analysis
• Improvement and Optimization
• Task-Specific Expertise
• Quality Assurance
Key areas covered under the umbrella of evaluation metrics (a short metric-computation sketch follows this subsection):
• Precision and Recall
• F1 Score
• Accuracy
• Perplexity
• Task-Specific Metrics
Applications of Evaluation Metrics:
• Model Development and Selection
• Algorithm Fine-tuning and Optimization
• Benchmarking and Research Comparisons
• Quality Assurance and User Satisfaction
• Performance Monitoring and Error Analysis
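A short sketch computing several of these metrics with scikit-learn (assumed installed) on illustrative gold and predicted labels.

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    y_true = [1, 0, 1, 1, 0, 1]        # gold labels (illustrative)
    y_pred = [1, 0, 0, 1, 0, 1]        # system predictions

    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1 score :", f1_score(y_true, y_pred))
    # Perplexity, by contrast, is used for language models: it is the
    # exponentiated average negative log-likelihood of the test text.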
1.6.4 Post Modelling Phases:
• Post-modeling phases in Natural Language Processing (NLP) involve activities that occur after the model has been trained and evaluated.
• The key post-modeling phases in NLP are (a small model-persistence sketch follows this list):
• Model Evaluation and Validation
• Hyperparameter Tuning
• Model Deployment
• Performance Monitoring and Maintenance
• Iterative Improvement and Feedback Loop
• Ethical Considerations and Bias Mitigation
• Documentation and Knowledge Sharing
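As one concrete post-modelling step (model deployment), a small persistence sketch using joblib, which ships with scikit-learn; the tiny pipeline and file name are illustrative assumptions.

    import joblib
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    pipeline = Pipeline([("features", TfidfVectorizer()), ("model", LogisticRegression())])
    pipeline.fit(["great product", "terrible service"], [1, 0])

    joblib.dump(pipeline, "nlp_pipeline.joblib")      # save at deployment time
    restored = joblib.load("nlp_pipeline.joblib")     # reload in the serving application
    print(restored.predict(["great support"]))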
