NLP-Unit-1-part1
NLP
• Natural Language Processing (NLP) is a branch of artificial
intelligence that deals with the interaction between computers
and human languages. It involves developing algorithms and
models that allow machines to process, analyze, and produce
text or speech in a way that is meaningful to humans.
Some real-world examples of NLP
Voice Assistants
Machine Translation
Chatbots
Sentiment Analysis
Text Summarization
Spell Check and Grammar Correction
Speech Recognition
Text-Based Search Engines
Autocorrect and Autocomplete
Applications of NLP
• 1. Question Answering
• 2. Spam Detection
• 3. Sentiment Analysis
• 4. Machine Translation
• 5. Spelling Correction
• 6. Speech Recognition
• Speech recognition is used for converting spoken words into
text. It is used in applications, such as mobile, home
automation, video recovery, dictating to Microsoft Word, voice
biometrics, voice user interface, and so on.
• 7. Chatbot
Components of NLP
• There are two components of NLP: Natural Language
Understanding (NLU) and Natural Language Generation
(NLG).
• Natural Language Understanding (NLU) involves
transforming human language into a machine-readable
format.
• It helps the machine to understand and analyze human
language by extracting information such as keywords,
emotions, relations, and semantics from large volumes of text.
• Natural Language Generation (NLG) acts as a translator that
converts computerized data into a natural language
representation.
• It mainly involves text planning, sentence planning, and
text realization.
LEVELS OF NLP
LEXICAL LEVEL
• This phase scans the input text as a stream of characters and converts
it into meaningful lexemes.
• Real-World Analogy:
• The syntax is still correct, but the semantics are odd or illogical because mats
do not "sit."
DISCOURSE LEVEL
• Imagine you say, "Can you pass the salt?" Pragmatics is about
recognizing that this is not just a question about your ability to pass
the salt, but a polite request for someone to pass it.
NATURAL LANGUAGE PROCESSING WITH PYTHON'S NLTK PACKAGE
• Now we will look at the kinds of text preprocessing tasks you can
do with NLTK so that you’ll be ready to apply them in future
projects.
NLP pipeline
• Step 1: Sentence Segmentation
• Step 2: Word Tokenization
• Step 3: Stemming
• Step 4: Lemmatization
• Step 5: Identifying Stop Words
• Step 6: Dependency Parsing
• Step 7: POS Tags
• Step 8: Named Entity Recognition (NER)
• Step 9: Chunking
Sentence Segmentation
• Sentence segmentation is the first step in building the NLP pipeline. It breaks a
paragraph into separate sentences.
• Example: Consider the following paragraph -
• Input Text:
"Natural Language Processing is fascinating. It has many applications in AI."
• Segmented Output:
• Sentence 1: "Natural Language Processing is fascinating."
• Sentence 2: "It has many applications in AI."
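The segmentation above can be approximated in a few lines of plain Python. This is only a minimal sketch: the helper name split_sentences is hypothetical, and the rule (split after '.', '!' or '?' followed by whitespace) is a simplification that will wrongly split abbreviations such as "Dr. Smith". With NLTK you would normally call nltk.tokenize.sent_tokenize instead, which needs its tokenizer data downloaded first.

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' when it is followed by whitespace."""
    # The lookbehind keeps the punctuation mark attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

paragraph = "Natural Language Processing is fascinating. It has many applications in AI."
sentences = split_sentences(paragraph)
for i, sentence in enumerate(sentences, start=1):
    print(f"Sentence {i}: {sentence}")
```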
WORD TOKENIZATION
• Word tokenization breaks a sentence into separate words or
tokens.
• Input Sentence:
"Natural Language Processing is fascinating!"
• Tokenized Output:
['Natural', 'Language', 'Processing', 'is', 'fascinating', '!']
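A rough way to reproduce this output without any library is a single regular expression. The function name tokenize_words is an assumption made for this sketch, and the pattern is much simpler than NLTK's nltk.tokenize.word_tokenize (for example, it splits contractions like "don't" into three tokens).

```python
import re

def tokenize_words(sentence):
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize_words("Natural Language Processing is fascinating!")
print(tokens)
```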
STEMMING
• Stemming is the process of reducing a word to its
root or base form by removing suffixes.
• Example of Stemming
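To illustrate the idea of suffix removal, here is a deliberately naive stemmer. It is not the Porter algorithm that NLTK provides as nltk.stem.PorterStemmer; the suffix list and the name naive_stem are assumptions chosen only to show the principle, and real stemmers apply many more rules.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order; illustrative only

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["connected", "connecting", "connects", "connection"]:
    print(w, "->", naive_stem(w))
```

The first three words reduce to "connect", while "connection" is left unchanged because "ion" is not in the toy suffix list.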
LEMMATIZATION
• Like stemming, lemmatizing reduces words to their core meaning, but
it will give you a complete English word that makes sense on its own
instead of just a fragment of a word like 'discoveri'.
IDENTIFYING STOP WORDS
• Stop words are words that you want to ignore, so you filter them out of
your text when you’re processing it. Very common words like 'in', 'is',
and 'an' are often used as stop words since they don’t add a lot of
meaning to a text in and of themselves.
• Note: NLTK's stop word list must be downloaded first with nltk.download("stopwords").
• Example
• Input Sentence:
"She is going to the market to buy fruits."
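The filtering step for this sentence can be sketched as follows. The stop list here is a tiny hand-picked set assumed for illustration; with NLTK you would typically load the much longer list from nltk.corpus.stopwords.words("english"). The helper name remove_stop_words is also hypothetical.

```python
import re

# Tiny hand-picked stop list for illustration; NLTK's English list is much longer.
STOP_WORDS = {"she", "is", "to", "the", "in", "an", "a", "of", "and"}

def remove_stop_words(sentence):
    tokens = re.findall(r"\w+", sentence)  # words only, punctuation dropped
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words("She is going to the market to buy fruits.")
print(filtered)  # ['going', 'market', 'buy', 'fruits']
```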
POS TAGS
• POS stands for part of speech; the parts of speech include noun, verb,
adverb, and adjective. A POS tag indicates how a word functions, both in
meaning and grammatically, within a sentence. A word can have
one or more parts of speech depending on the context in which it is
used.
• Example: "Google something on the Internet." Here "Google" functions as a verb, although it is usually a proper noun.
NAMED ENTITY RECOGNITION (NER)
• NER Output:
Barack Obama → Person (PER)
Hawaii → Location (LOC)
August 4, 1961 → Date (DATE)
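An output of this shape can be imitated with a lookup table plus a date pattern. This is only a toy sketch: the input sentence is an assumption (the original input is not shown here), the gazetteer is hand-written, and real NER systems, such as the one NLTK exposes through nltk.ne_chunk, rely on trained models rather than fixed lists.

```python
import re

# Hand-written lookup table (gazetteer); a real NER system learns this from data.
GAZETTEER = {"Barack Obama": "PER", "Hawaii": "LOC"}
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")

def tag_entities(text):
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)
        if start != -1:
            found.append((start, name, label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Sort by position so entities come out in reading order.
    return [(entity, label) for _, entity, label in sorted(found)]

# Assumed example sentence, chosen to match the entities listed above.
sentence = "Barack Obama was born in Hawaii on August 4, 1961."
entities = tag_entities(sentence)
for entity, label in entities:
    print(f"{entity} -> {label}")
```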
CHUNKING
• Tools like Microsoft Word, Grammarly, and Google Docs use NLP.
• LanguageTool
INFORMATION EXTRACTION
• Definition: A process of automatically extracting structured
information from unstructured data (e.g., text).
• Legal Field: Extracting case laws and related details from legal
documents.
QUESTION ANSWERING
• Find the Answer: The system searches for the answer in:
Text Documents, Websites (Google, Wikipedia), Databases
• Example: If the topic is "Cricket", you ask: "Who won the 2011 Cricket World
Cup?" Answer: "India."
• 2. Answer Extraction
• Example:
Passage: "Isaac Newton discovered gravity in 1687."
Extracted Answer: "Isaac Newton"
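For a passage like this one, answer extraction can be sketched with simple pattern matching. The question "Who discovered gravity?" and the function extract_who_answer are assumptions made for illustration; the sketch only handles questions of the form "Who <predicate>?" where the passage repeats the predicate word for word, which real QA systems do not require.

```python
import re

def extract_who_answer(question, passage):
    """Answer 'Who <predicate>?' by finding '<answer> <predicate>' in the passage."""
    match = re.fullmatch(r"Who (.+)\?", question.strip())
    if match is None:
        return None
    predicate = match.group(1)
    # The answer is assumed to be the text right before the predicate.
    found = re.search(rf"(.+?) {re.escape(predicate)}", passage)
    return found.group(1) if found else None

passage = "Isaac Newton discovered gravity in 1687."
answer = extract_who_answer("Who discovered gravity?", passage)
print(answer)  # Isaac Newton
```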
DIAGRAM: WORKFLOW OF A QA SYSTEM
• Output:
Answer: "Paris"
WORD SEGMENTATION
• Word Segmentation in NLP (Natural Language Processing) is the
process of breaking down a continuous sequence of text into
individual words. It is especially important for languages that do not
use spaces between words, like Chinese, Japanese, or Thai.
• Statistical Methods
Use probabilities to determine the most likely word splits.
Example: In "ilikesheep" → Is it "I like sheep" or "I like shee p"?
The system calculates which one is more likely.
• English Example:
Input: "itiseasytosegment"
Output: "it / is / easy / to / segment"
• Chinese Example:
Input: 我喜欢学习
Output: 我 / 喜欢 / 学习 → "I / like / studying"
• Thai Example:
Input: ฉันชอบแมว
Output: ฉัน / ชอบ / แมว → "I / like / cats"
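The statistical approach can be sketched with a small dynamic program that picks the split with the highest total probability. The word probabilities below are invented for illustration (they do not come from a real corpus), and the function name segment is hypothetical. The entries "shee" and "p" are included only so that the "I like shee p" alternative is a possible split that the scoring then rejects.

```python
import math

# Toy unigram probabilities, invented for illustration (not from a real corpus).
WORD_PROBS = {
    "i": 0.10, "it": 0.08, "is": 0.08, "to": 0.08, "like": 0.05,
    "easy": 0.03, "sheep": 0.02, "segment": 0.01,
    "shee": 0.00001, "p": 0.00001,
}

def segment(text, max_word_len=10):
    """Return the most probable split of text into known words, or None."""
    n = len(text)
    # best[i] = (log-probability, words) for the best split of text[:i]
    best = [(0.0, [])] + [(-math.inf, None)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            prev_score, prev_words = best[start]
            if word in WORD_PROBS and prev_words is not None:
                score = prev_score + math.log(WORD_PROBS[word])
                if score > best[end][0]:
                    best[end] = (score, prev_words + [word])
    return best[n][1]

print(" / ".join(segment("itiseasytosegment")))  # it / is / easy / to / segment
print(" / ".join(segment("ilikesheep")))         # i / like / sheep
```

The same idea applies to Chinese or Thai text, provided the probability table contains words of that language.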