Natural Language Processing

Natural language processing (NLP) is a field that develops methods for computers to understand human language. The document outlines key NLP tasks like word segmentation, part-of-speech tagging, syntactic analysis, and semantic analysis. It also discusses applications such as information retrieval, information extraction, question answering, and machine translation that utilize NLP. The document provides examples and explanations of how each task is approached in NLP research and applications.

Uploaded by

Abhishek Saini
Copyright
© All Rights Reserved

Natural Language Processing
By Abhishek Saini
Lecture Outline
• What is Natural Language Processing?

• Fundamental tasks in NLP

• Some applications of NLP


What is Natural Language Processing?
• A field of computer science, artificial intelligence, and computational
linguistics.
• Its goal is to get computers to perform useful tasks involving human languages:
− Human-machine communication
− Improving human-human communication (e.g., machine translation)
− Extracting information from texts
Why is NLP interesting?
• Language is involved in many human activities − reading, writing,
speaking, listening
• Voice can be used as a user interface in many applications − remote
controls, virtual assistants like Siri, ...
• NLP is used to extract insights from massive amounts of textual data −
e.g., hypotheses from medical and health reports
Fundamental Tasks in NLP
• Word Segmentation
• Part-of-speech (POS) tagging
• Syntactic Analysis
• Semantic Analysis
Word Segmentation
• In some languages, there is no space between words, or a word may
consist of several smaller syllables.
• In such languages, word segmentation is the first step of an NLP system.
• Word tokenization (also called word segmentation) is the problem
of dividing a string of written language into its component words. In
English and many other languages that use some form of the Latin alphabet,
the space is a good approximation of a word divider.
• ['Its', 'history', 'can', 'be', 'traced', 'back', 'nearly', '5,000', 'years', 'to',
'archeological', 'discoveries', 'in', 'the', 'Middle', 'East', '.']
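The token list above can be reproduced with a small rule-based tokenizer. This is a minimal sketch, not a production tokenizer: the `TOKEN_PATTERN` regular expression and the `tokenize` helper are assumptions made for illustration, and the input sentence is reconstructed by joining the tokens shown on the slide.

```python
import re

# Hypothetical pattern: numbers with internal separators (5,000), then words
# (optionally with an apostrophe part), then any single punctuation mark.
TOKEN_PATTERN = re.compile(r"\d+(?:[.,]\d+)+|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


sentence = ("Its history can be traced back nearly 5,000 years "
            "to archeological discoveries in the Middle East.")
tokens = tokenize(sentence)
print(tokens)
```

Splitting on spaces alone would leave the final period attached to "East", which is why punctuation gets its own alternative in the pattern.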
Word Segmentation
1. Text Lemmatization
The process of returning the base or dictionary form of a word, known
as the lemma, so that only inflectional variation is removed.
E.g., worse → bad (lemma)
2. Text Stemming
The process of reducing inflected (or sometimes derived) words to their
root form.
E.g., meeting → meet (stem)
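The difference between the two can be sketched in a few lines. Both functions below are toy illustrations under stated assumptions: `SUFFIXES` and `LEMMA_TABLE` are hypothetical and far from complete, and real systems use much richer rules (for stemming) and dictionaries with morphological analysis (for lemmatization).

```python
# Hypothetical suffix list for a crude suffix-stripping stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lookup table of irregular forms for a dictionary lemmatizer.
LEMMA_TABLE = {"worse": "bad", "better": "good", "went": "go", "mice": "mouse"}


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the table; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


print(stem("Meeting"))     # suffix stripping handles regular inflection
print(stem("worse"))       # but cannot relate "worse" to "bad"
print(lemmatize("Worse"))  # the dictionary lookup can
```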
POS (Part-of-Speech) Tagging
• Each word in a sentence can be classified into a class, such as verb,
adjective, noun, etc.
• POS tagging is the process of labeling each word in a sentence with a
particular part of speech, based on:
− Its definition
− Its context in the sentence
Sequence Labeling
• Many NLP problems can be viewed as sequence labeling
• Each token in a sequence is assigned a label.
• Labels of tokens are dependent on the labels of other tokens in the
sequence, particularly their neighbors.
• John saw the saw and decided to take it to the table.
John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD to/TO take/VB it/PRP to/IN the/DT table/NN
Sequence Labeling as Classification
• Classify each token independently
• Use information about the surrounding tokens as features (sliding
window).
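A sliding-window feature extractor might look like the sketch below. The function name `window_features`, the feature names, and the `<PAD>` marker are assumptions chosen for illustration. The resulting dictionaries would be fed to any ordinary classifier, one token at a time.

```python
def window_features(tokens, index, size=1):
    """Describe the token at `index` using itself and its neighbors."""
    features = {
        "word": tokens[index].lower(),
        "is_capitalized": tokens[index][0].isupper(),
    }
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        pos = index + offset
        # Positions outside the sentence get a padding marker.
        neighbor = tokens[pos].lower() if 0 <= pos < len(tokens) else "<PAD>"
        features[f"word[{offset:+d}]"] = neighbor
    return features


tokens = "John saw the saw and decided to take it to the table .".split()
print(window_features(tokens, 1))  # the verb "saw": preceded by "john"
print(window_features(tokens, 3))  # the noun "saw": preceded by "the"
```

The two occurrences of "saw" share the same `word` feature but differ in their neighbors, which is the information a classifier needs to tell the verb from the noun.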
Probabilistic Sequence Models
• Model the probabilities of (token sequence, tag sequence) pairs from
an annotated data set.
• Exploit dependencies between the labels of neighboring tokens
• Typical sequence models
1. Hidden Markov Models (HMMs)
2. Conditional Random Fields (CRFs)
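For an HMM, the most probable tag sequence is usually found with the Viterbi algorithm. The sketch below is a minimal version under stated assumptions: every probability in `START`, `TRANS`, and `EMIT` is invented for illustration (in practice they are estimated from an annotated corpus), and unseen events fall back to a small `floor` value instead of proper smoothing.

```python
TAGS = ["NNP", "VBD", "DT", "NN"]

# Invented toy parameters, not estimated from any real data set.
START = {"NNP": 0.6, "DT": 0.3, "NN": 0.05, "VBD": 0.05}
TRANS = {
    "NNP": {"VBD": 0.7, "NN": 0.1, "DT": 0.1, "NNP": 0.1},
    "VBD": {"DT": 0.6, "NN": 0.2, "NNP": 0.1, "VBD": 0.1},
    "DT": {"NN": 0.8, "NNP": 0.1, "VBD": 0.05, "DT": 0.05},
    "NN": {"VBD": 0.4, "NN": 0.2, "DT": 0.2, "NNP": 0.2},
}
EMIT = {
    "NNP": {"john": 0.9},
    "VBD": {"saw": 0.8},
    "DT": {"the": 0.9},
    "NN": {"saw": 0.3},
}


def viterbi(tokens, tags, start, trans, emit, floor=1e-6):
    """Return the most probable tag sequence for `tokens`."""
    # best[t][tag] = probability of the best path ending in `tag` at step t.
    best = [{tag: start.get(tag, floor) * emit[tag].get(tokens[0], floor)
             for tag in tags}]
    back = []  # back[t][tag] = best previous tag for `tag` at step t + 1
    for token in tokens[1:]:
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] * trans[p].get(tag, floor))
            scores[tag] = (best[-1][prev] * trans[prev].get(tag, floor)
                           * emit[tag].get(token, floor))
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(viterbi("john saw the saw".split(), TAGS, START, TRANS, EMIT))
```

With these toy numbers the first "saw" is tagged as a verb and the second as a noun, because the transition from DT to NN outweighs the higher emission probability of "saw" as a verb.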
Syntactical Analysis
• The task of recognizing a sentence and assigning a syntactic structure to it

• The purpose of this phase is to determine the grammatical structure of
the text, that is, how its words group into phrases.

• Syntax analysis checks the text for well-formedness against the rules
of a formal grammar.
Syntactical Analysis
• Ambiguity problem: one sentence may have many possible parse
trees
• Vietnamese language processing (VNLP) still lacks accurate syntax
parsers (in my understanding)
− Accuracy of about 78-84%
Approaches to Syntactical Analysis
• Top-down parsing
• Bottom-up parsing
• Dynamic programming methods
− CYK algorithm
− Earley algorithm
− Chart parsing
• Probabilistic Context-Free Grammars (PCFGs)
− Assign probabilities to derivations
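As an illustration of the dynamic programming approach, here is a minimal sketch of a CYK recognizer. It assumes a grammar in Chomsky normal form, and the tiny grammar in `BINARY` and `LEXICAL` is hypothetical, written only to cover one toy sentence. The function only decides whether a sentence is grammatical; it does not build parse trees or use probabilities.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
# Binary rules: (left child, right child) -> possible parents.
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("ADJ", "N"): {"NP"},
}
# Lexical rules: word -> possible categories.
LEXICAL = {
    "maharani": {"NP"},
    "serves": {"V"},
    "vegetarian": {"ADJ"},
    "food": {"N"},
}


def cyk_recognize(words, start="S"):
    """Return True if the grammar derives `words` from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICAL.get(word.lower(), ()))
    for span in range(2, n + 1):          # length of the span
        for i in range(n - span + 1):     # start of the span
            j = i + span - 1              # end of the span
            for k in range(i, j):         # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_recognize("Maharani serves vegetarian food".split()))
print(cyk_recognize("serves Maharani food vegetarian".split()))
```

Each cell is filled once from smaller cells, so the recognizer runs in time cubic in the sentence length rather than enumerating every possible tree.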
Semantic Analysis
• Two levels
• Lexical semantics
− Representing the meaning of words
− Word sense disambiguation (e.g., the word "bank")
• Compositional semantics
− How words combine to form a larger meaning.
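One classic baseline for word sense disambiguation is a simplified Lesk approach: pick the sense whose dictionary gloss shares the most words with the context. The sketch below assumes a hypothetical two-sense inventory for "bank" with glosses written for this example, and a small hand-picked stopword list. Real systems draw glosses from a lexical resource.

```python
import re

# Hypothetical sense inventory; the glosses are written for this example.
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits and lends money",
        "river": "the sloping land beside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "on"}


def content_words(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def simplified_lesk(word, context):
    """Return the sense whose gloss overlaps most with the context."""
    context_set = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_set & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "She sat on the bank of the river"))
print(simplified_lesk("bank", "He went to the bank to deposit money"))
```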
Meaning Representations
• First-order predicate calculus
• E.g., Maharani serves vegetarian food. => Serves(Maharani,
VegetarianFood)
• E.g., I only have five dollars and I don’t have a lot of time. =>
Have(Speaker, FiveDollars) ∧ ¬Have(Speaker, LotOfTime)
Syntax-driven Semantic Analysis
Some Applications
• Information Retrieval
• Information Extraction
• Question Answering
• Machine Translation
Information Retrieval
• Query: “list of good sushi restaurants in kyoto?”
Architecture of an ad hoc IR system
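The ranking step of an ad hoc IR system can be sketched with TF-IDF scoring. Everything below is an assumption made for illustration: the three documents in `DOCS` are invented, the smoothed IDF formula is one common variant among several, and a real system would use an inverted index rather than scanning every document.

```python
import math
import re
from collections import Counter

# Invented toy collection.
DOCS = {
    "d1": "Top sushi restaurants in Kyoto for visitors",
    "d2": "A guide to ramen shops in Tokyo",
    "d3": "Kyoto temples and gardens walking tour",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, docs):
    """Return document ids sorted by TF-IDF score, best first."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)

    def idf(term):
        # Smoothed inverse document frequency (one common variant).
        df = sum(1 for c in counts.values() if term in c)
        return math.log((n_docs + 1) / (df + 1)) + 1

    query_terms = set(tokenize(query))
    scores = {
        doc_id: sum(c[term] * idf(term) for term in query_terms)
        for doc_id, c in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank("list of good sushi restaurants in kyoto?", DOCS))
```

The first document matches the rare terms "sushi" and "restaurants", so it ranks above documents that share only a common word with the query.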
Information Extraction
• To extract from unstructured text the information that is pre-specified or
pre-defined in templates
− Fill a number of slots/attributes
• Example: use the template [PERSON, go, LOCATION, TIME] to extract
information about where an individual goes.
− “President Obama went to Hanoi yesterday.”
− [PERSON = “President Obama”, go, LOCATION = “Hanoi”, TIME = “yesterday”]
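Slot filling for this template can be sketched with a single regular expression. The `PATTERN` below is a hypothetical, deliberately narrow rule: it treats runs of capitalized words as names and knows only a few verb forms and time words. Practical systems rely on named-entity recognition and parsing instead.

```python
import re

# Hypothetical pattern for the [PERSON, go, LOCATION, TIME] template.
PATTERN = re.compile(
    r"(?P<PERSON>(?:[A-Z][a-z]+\s)+)"              # capitalized words
    r"(?:went|goes|travell?ed)\s+to\s+"            # a form of "go"
    r"(?P<LOCATION>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"  # capitalized place name
    r"(?:\s+(?P<TIME>yesterday|today|tomorrow))?"   # optional time word
)


def fill_template(sentence):
    """Return the filled slots, or None if the sentence does not match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "PERSON": match.group("PERSON").strip(),
        "LOCATION": match.group("LOCATION"),
        "TIME": match.group("TIME"),  # None when no time word is present
    }


print(fill_template("President Obama went to Hanoi yesterday."))
```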
Question Answering
• A system that automatically returns answers to a user’s question by
retrieving information from a collection of documents.
• Differences from an information retrieval system:
− A QA system’s goal is to return an exact answer instead of documents
related to the user’s question.
  Q: Who invented the Internet?
  A: Robert E. Kahn and Vint Cerf.
− A QA system requires more complicated semantic analysis
Machine Translation
• The use of computers to automate some or all of the process of
translating from one language to another.
• Fully automatic machine translation is one of the most challenging
and active topics in NLP.
• Recent advances in deep learning have driven the trend toward Neural
Machine Translation.
Thanks
End of Session!
