
Natural Language Processing & Major Tasks: Trần Thanh Phước

The document provides an overview of Natural Language Processing (NLP), including its definition, applications, challenges, and research directions. It covers various NLP tasks such as language modeling, text categorization, and machine translation, along with tools like NLTK and Under The Sea for practical implementation. Additionally, it discusses the challenges faced in NLP, including ambiguity and the need for quality training data.

NATURAL LANGUAGE PROCESSING & MAJOR TASKS

INTRODUCTION

Trần Thanh Phước


OUTLINE
1. Introduction to Natural Language Processing (NLP)
2. NLP’s applications
3. Practice with NLTK

4. Challenges in NLP
5. Research Directions in NLP
6. Practice with Under The Sea
Trần Thanh Phước
1. Introduction to NLP

❑ Natural languages:
❑ Languages spoken by people (English, Vietnamese, …), as opposed to artificial languages (C++, Java, …)
❑ Natural language processing:
❑ Applications that process natural language
❑ Computational linguistics:
❑ Doing linguistics on computers
❑ More on the linguistic side than NLP, but closely related
1. Introduction to NLP

❑ Do you understand?
❑ Sdfsdf;sldjfsdf
❑ Iuerpnlc;lwke;rkef
❑ Klskdfsjkdpwoierpmcfs;df
❑ Sdjkf ;lkjewkewo s;dlmf;sdfm;
1. Introduction to NLP

❑ Computers lack knowledge
❑ A computer sees English text the same way you saw the text on the previous slide.
❑ People have no trouble understanding language because they have:
❑ Common-sense knowledge
❑ Reasoning capacity
❑ Experience
❑ Computers have
❑ No common-sense knowledge
❑ No reasoning capacity
❑ Unless we teach them!


1. Introduction to NLP

❑ What is Natural Language Processing?
❑ Automatically processing natural language
❑ Computers using natural language as input and/or output


1. Introduction to NLP

❑ NLP is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages.
❑ Natural-language-generation systems convert information from computer databases into normal-sounding human language.
❑ Natural-language-understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.
2. NLP’s applications

❑ Language model
❑ Morphological analysis
❑ Part-of-speech tagging
❑ Syntactic parsing
❑ Word sense disambiguation
❑ Semantic representation
❑ Collocation/multi-word expression extraction
❑ Anaphora resolution
❑ Prepositional phrase attachment
❑ WordNet
❑ Text categorization
❑ Information extraction
❑ Information retrieval
❑ Machine translation
❑ Named entity recognition
❑ Text generation
❑ Question answering
❑ Sentiment analysis & opinion mining
2. NLP’s applications

❑ Language model:
◻ Estimates the probability of a sequence of words (a sentence) in a language.
P(“I like it”) = ?
P(“I lay it”) = ?
❑ Large language model:
◻ Large language models (LLMs) are very large deep learning models that are pre-trained on vast amounts of data.
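One way to make the two probabilities above concrete is a bigram model with add-one (Laplace) smoothing. The sketch below is only an illustration: the four-sentence training corpus is an assumption invented here, and unseen words such as "lay" are handled crudely (they get a small smoothed probability without being added to the vocabulary).

```python
from collections import Counter
import math

# Toy training corpus (an assumption for illustration, not from the slides).
corpus = [
    "i like it",
    "i like tea",
    "you like it",
    "i saw it",
]

unigrams = Counter()  # how often each word appears as a history (previous word)
bigrams = Counter()   # how often each adjacent word pair appears
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens[:-1])
    bigrams.update(zip(tokens, tokens[1:]))

# Words that can be predicted, including the end-of-sentence marker.
vocab = {w for s in corpus for w in s.split()} | {"</s>"}

def sentence_probability(sentence):
    """P(sentence) under a bigram model with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_p = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
        log_p += math.log(p)
    return math.exp(log_p)

p_like = sentence_probability("I like it")
p_lay = sentence_probability("I lay it")
print(p_like, p_lay)
```

On this toy corpus the model assigns "I like it" a higher probability than "I lay it", because the pairs "i like" and "like it" were observed while "i lay" and "lay it" were not.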
2. NLP’s applications

❑ Morphological analysis
❑ Analyzes the internal structure of a word.
❑ For example:
❑ going -> go[V] + ing
❑ dogs -> dog[N] + s
❑ computerization -> ?
❑ preprocessing -> ?
❑ Word segmentation in Vietnamese: sinh_viên (“student”), sinh_đẻ (“to give birth”)
❑ Học sinh học sinh học (“Students study biology”) => ?
❑ Hợp tác xã có thể hiểu là sự hợp tác xã hội để mang lại lợi ích cho nhau (“A cooperative can be understood as social cooperation for mutual benefit”) => ?
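The Vietnamese segmentation question can be explored with forward longest matching, a simple dictionary-based baseline. This is a minimal sketch, not the method used by any particular toolkit; the four-entry lexicon is an assumption made up for the example.

```python
# Toy lexicon (an assumption); real segmenters use large dictionaries or trained models.
LEXICON = {"học sinh", "sinh học", "học", "sinh"}
MAX_WORD_LEN = 2  # longest lexicon entry, measured in syllables

def segment(sentence):
    """Greedy forward longest matching over whitespace-separated syllables."""
    syllables = sentence.lower().split()
    words, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, then fall back to shorter ones.
        for size in range(min(MAX_WORD_LEN, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + size])
            if candidate in LEXICON or size == 1:
                words.append(candidate.replace(" ", "_"))
                i += size
                break
    return words

greedy = segment("Học sinh học sinh học")
print(greedy)
```

The greedy strategy returns học_sinh học_sinh học, whereas the intended reading is học_sinh học sinh_học ("students study biology"). The mismatch shows why segmentation needs context and not only a dictionary.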
2. NLP’s applications

❑ Part-of-speech (POS) tagging
❑ INPUT:
❑ Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
❑ OUTPUT:
❑ Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ,/, as/P their/POSS CEO/N Alan/N Mulally/N announced/V first/ADJ quarter/N results/N ./.
❑ “Tôi là một sinh_viên” (“I am a student”) -> “Tôi/PRO là/V một/Det sinh_viên/N”
2. NLP’s applications

❑ Examples:
❑ I fish a fish => ?
❑ I can can a can => ?
❑ Học sinh học sinh học (“Students study biology”) => ?
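These examples are hard because the same word form takes different tags in different positions. A most-frequent-tag baseline makes the problem visible. The sketch below assumes a tiny hand-tagged corpus invented for illustration, and uses the tag MD for the modal "can", which is not part of the tag set shown on the slides.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus (an assumption for illustration).
tagged_sentences = [
    [("I", "PRO"), ("can", "MD"), ("swim", "V")],
    [("you", "PRO"), ("can", "MD"), ("go", "V")],
    [("open", "V"), ("a", "Det"), ("can", "N")],
    [("I", "PRO"), ("fish", "V")],
    [("a", "Det"), ("fish", "N")],
    [("a", "Det"), ("big", "ADJ"), ("fish", "N")],
]

# Count how often each word form carries each tag.
tag_counts = defaultdict(Counter)
for sentence in tagged_sentences:
    for word, tag in sentence:
        tag_counts[word.lower()][tag] += 1

def baseline_tag(words):
    """Give every word its most frequent training tag; unknown words default to N."""
    result = []
    for word in words:
        counts = tag_counts.get(word.lower())
        result.append((word, counts.most_common(1)[0][0] if counts else "N"))
    return result

tagged = baseline_tag("I can can a can".split())
print(tagged)
```

The baseline labels all three occurrences of "can" as MD, although the intended analysis is PRO MD V Det N. A tagger has to look at the surrounding words to get this right.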


2. NLP’s applications

❑ Syntactic parsing
❑ To determine grammatical structure of a sentence
2. NLP’s applications

❑ Examples:
❑ tôi có người bạn tốt => 我有一个好朋友 => i have good friend

❑ Tôi có người bạn tốt => 我有一个好朋友 => i have a good friend

❑ Tôi có bạn gái => 我有一个女朋友 => i have a girlfriend

❑ Nhân viên ngân hàng này rất xinh đẹp => 这位银行职员很漂亮。
❑ => This bank employee is very beautiful
2. NLP’s applications

❑ Word Sense Disambiguation
❑ The task of automatically selecting the correct sense of a word given its context
❑ For example:
❑ Go to the bank to deposit money.
❑ Go along the bank.
❑ In Vietnamese, does “ba” mean “three”, “father”, or the “ba” in “ba hoa” (“boastful”)?
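A classic baseline for this task is the simplified Lesk idea: pick the sense whose dictionary gloss shares the most words with the context. The sketch below is a toy version under stated assumptions: the two glosses and the stopword list are written for this example and are not taken from WordNet or any real dictionary.

```python
import re

# Hypothetical glosses written for this sketch (not taken from WordNet).
SENSES = {
    "bank/finance": "an institution where people deposit or borrow money",
    "bank/river": "the land along the side of a river",
}
STOPWORDS = {"a", "an", "the", "of", "to", "or", "where"}

def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def simplified_lesk(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

print(simplified_lesk("Go to the bank to deposit money."))
print(simplified_lesk("Go along the bank."))
```

With these glosses the first sentence selects the financial sense (overlap on "deposit" and "money") and the second selects the river sense (overlap on "along"). The second decision rests on a single word, which illustrates how fragile pure overlap counting is.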
2. NLP’s applications

❑ Meaning representation:
❑ For example:
“Ay Caramba is near ICSI”
Represented in first-order logic
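The formula from the original slide is not recoverable here. One plausible first-order rendering of the sentence, offered only as an assumption, treats "near" as a binary predicate over two constants:

```latex
\[
\mathit{Near}(\mathit{AyCaramba},\ \mathit{ICSI})
\]
```

Here AyCaramba and ICSI are constant symbols naming the two places, and Near is a predicate symbol. Other formalizations are possible, for example relating the locations of the two entities instead of the entities themselves.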
2. NLP’s applications

❑ Collocation extraction:
❑ In corpus linguistics, a collocation is a sequence of words or terms that co-occur more often than would be expected by chance.
Strong tea -> not powerful tea
Powerful computer -> not strong computer
❑ Collocation extraction is the task of extracting collocations automatically from a corpus, using computational linguistics.
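"More often than expected by chance" can be measured with pointwise mutual information (PMI), which compares the observed frequency of a word pair with the frequency expected if the two words were independent. The sketch below assumes a six-sentence toy corpus invented for illustration; PMI is one common association measure, not necessarily the one the lecture has in mind.

```python
from collections import Counter
import math

# Toy corpus (an assumption for illustration).
sentences = [
    "she drinks strong tea every morning",
    "he likes strong tea",
    "they bought a powerful computer",
    "a powerful computer runs the model",
    "she likes the tea",
    "he bought the computer",
]

unigrams, bigrams = Counter(), Counter()
for sentence in sentences:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

n_tokens = sum(unigrams.values())
n_bigrams = sum(bigrams.values())

def pmi(w1, w2):
    """log2 of P(w1, w2) / (P(w1) * P(w2)); -inf for pairs never observed."""
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    p_pair = bigrams[(w1, w2)] / n_bigrams
    p_w1 = unigrams[w1] / n_tokens
    p_w2 = unigrams[w2] / n_tokens
    return math.log2(p_pair / (p_w1 * p_w2))

print(pmi("strong", "tea"), pmi("the", "tea"), pmi("powerful", "tea"))
```

On this corpus "strong tea" scores higher than "the tea", and "powerful tea" never occurs at all, which matches the contrast on the slide. Real collocation extraction needs a much larger corpus and usually a frequency threshold, since PMI overrates rare pairs.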
2. NLP’s applications

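❑ Collocation extraction (word order in Vietnamese fixed expressions):
❑ Xin chào hai anh em (“Hello, you two brothers”) => Xin chào hai em anh?
❑ Xin chào hai vợ chồng (“Hello, husband and wife”) => Xin chào hai chồng vợ?
❑ Xin chào hai mẹ con (“Hello, mother and child”) => Xin chào hai con mẹ?
❑ Trăm năm hạnh phúc (“A hundred years of happiness”) => Ngàn năm hạnh phúc (“A thousand years of happiness”)?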


2. NLP’s applications

❑ Anaphora resolution:
❑ The problem of resolving what a pronoun or a noun phrase refers to
❑ For example:
“The dog entered my room. It scared me”
Find the connection between “dog” and “it”, not between “room” and “it”
❑ Chiều hôm qua Lan gặp Phụng trên trường. Cô ấy trông tươi tắn hẳn ra (“Yesterday afternoon Lan met Phụng at school. She looked much more cheerful.”)
2. NLP’s applications

❑ Prepositional phrase attachment
❑ Prepositional phrase attachment is a common cause of structural ambiguity in natural language
❑ For example:

“I saw the man in the park with a telescope”
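The amount of ambiguity in this sentence can be counted with a CKY chart parser. The sketch below assumes a small hand-written grammar in Chomsky normal form (the rules and the lexicon are made up for this example) and counts the distinct parse trees without building them.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption for illustration).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "park": ["N"], "telescope": ["N"],
    "in": ["P"], "with": ["P"],
}

def count_parses(words):
    """CKY: chart[(start, end, symbol)] = number of trees for that span."""
    n = len(words)
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for symbol in LEXICON[word]:
            chart[(i, i + 1, symbol)] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, "S")]

one_pp = count_parses("I saw the man with a telescope".split())
two_pps = count_parses("I saw the man in the park with a telescope".split())
print(one_pp, two_pps)
```

Under this grammar the sentence with one prepositional phrase has 2 parses and the sentence with two has 5. Each additional prepositional phrase multiplies the number of attachment choices, which is why this kind of ambiguity grows quickly.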


2. NLP’s applications

❑ WordNet:
❑ Measure the semantic distance between words
❑ Animal, Cat, Dog
❑ Cat, banana
❑ Check semantic constraints in a sentence
❑ the table eats the chicken
❑ the dog eats the chicken
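Both uses can be sketched with a small is-a hierarchy. The taxonomy below is a hand-made assumption that only imitates the shape of WordNet's hypernym links; it is not real WordNet data and does not use the NLTK WordNet interface.

```python
# Toy hypernym ("is-a") links: child -> parent (an assumption for illustration).
HYPERNYM = {
    "cat": "feline", "feline": "carnivore",
    "dog": "canine", "canine": "carnivore",
    "carnivore": "animal", "animal": "organism",
    "banana": "fruit", "fruit": "plant", "plant": "organism",
    "organism": "entity",
    "table": "furniture", "furniture": "artifact", "artifact": "entity",
}

def ancestors(word):
    """The word followed by all of its hypernyms, up to the root."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def path_distance(a, b):
    """Number of links on the path between a and b through their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    return None

def is_a(word, category):
    return category in ancestors(word)

print(path_distance("cat", "dog"), path_distance("cat", "banana"))
print(is_a("dog", "animal"), is_a("table", "animal"))
```

In this toy hierarchy "cat" is 4 links from "dog" but 7 links from "banana". A rule such as "the subject of eat must be an animal" accepts "the dog eats the chicken" and rejects "the table eats the chicken".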


2. NLP’s applications

❑ Text categorization:
❑ Classify documents by topic, language, or author. Related uses include spam filtering, information retrieval (relevant / not relevant), and sentiment classification (positive / negative)
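A multinomial Naive Bayes classifier is a common first approach to tasks like spam filtering. The sketch below is a minimal version with add-one smoothing; the four training messages and the labels "spam" and "ham" are assumptions made up for the example.

```python
from collections import Counter, defaultdict
import math

# Toy labeled messages (an assumption for illustration).
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("project meeting schedule", "ham"),
]

class_docs = Counter()               # number of documents per class
word_counts = defaultdict(Counter)   # word frequencies per class
for text, label in train:
    class_docs[label] += 1
    word_counts[label].update(text.split())
vocab = {word for counts in word_counts.values() for word in counts}

def classify(text):
    """Return the class with the highest log prior plus smoothed log likelihood."""
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / len(train))
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("free money"), classify("meeting schedule"))
```

With this training set "free money" is labeled spam and "meeting schedule" is labeled ham. The same structure applies to topic or sentiment classification once the labels and training texts are changed.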
2. NLP’s applications

❑ Information extraction:
❑ Motivation:
❑ Complex searches (“Find me all the jobs in advertising paying at least $50,000 in Boston”)
❑ Statistical queries (“Does the number of jobs in accounting increase over the years?”)
❑ Goal: map a document collection to a structured database
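The mapping from text to structured records can be sketched with a regular expression. Everything specific below is an assumption for illustration: the three job ads, their fixed phrasing, and the field names. Real job postings vary far more and usually need trained extractors.

```python
import re

# Hypothetical job ads with a deliberately regular phrasing.
ADS = [
    "Advertising manager in Boston, paying $62,000 a year",
    "Accounting clerk in Chicago, paying $41,500 a year",
    "Advertising assistant in Boston, paying $38,000 a year",
]

PATTERN = re.compile(
    r"(?P<title>.+?) in (?P<city>[A-Z][a-z]+), paying \$(?P<salary>[\d,]+)"
)

def extract(ad):
    """Turn one ad into a record with title, city, and integer salary."""
    match = PATTERN.match(ad)
    if match is None:
        return None
    return {
        "title": match["title"],
        "city": match["city"],
        "salary": int(match["salary"].replace(",", "")),
    }

records = [r for r in map(extract, ADS) if r is not None]

# The complex search from the slide, run against the structured records.
matches = [
    r for r in records
    if "advertising" in r["title"].lower()
    and r["city"] == "Boston"
    and r["salary"] >= 50000
]
print(matches)
```

Once the ads are records, the query "advertising jobs in Boston paying at least $50,000" becomes a simple filter, which is the point of extracting structure first.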
2. NLP’s applications

❑ Machine translation:
2. NLP’s applications

❑ Text summarization:
❑ Automatic summarization is the creation of a shortened version of a text by a computer program.
❑ The product of this procedure still contains the most important points of the original text.
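A simple extractive approach scores each sentence by how frequent its words are in the whole text and keeps the top-scoring sentences. The sketch below assumes a three-sentence sample text and a tiny stopword list, both invented for the example; it illustrates the idea and is not a production summarizer.

```python
import re
from collections import Counter

STOPWORDS = {"the", "was", "from", "a", "of"}  # tiny list, an assumption

def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Keep the n sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w for s in sentences for w in content_words(s))
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in content_words(s)),
        reverse=True,
    )
    keep = set(ranked[:n_sentences])
    return [s for s in sentences if s in keep]  # preserve original order

text = (
    "NLP systems process human language. "
    "Many NLP systems learn from large text corpora. "
    "The weather was nice yesterday."
)
summary = summarize(text)
print(summary)
```

The off-topic weather sentence scores lowest because none of its words recur. Note that raw frequency sums favor long sentences, so practical systems normalize by length or use other features.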
2. NLP’s applications

❑ Named Entity Recognition:
❑ INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
❑ OUTPUT: Profits soared at [Company Boeing Co.], easily topping forecasts on [Location Wall Street], as their CEO [Person Alan Mulally] announced first quarter results.
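The simplest way to produce this output is a gazetteer lookup, that is, matching the text against a list of known names. The sketch below hard-codes the three names from the example, which is an assumption made for illustration. Real NER systems use trained sequence models because no list covers every name.

```python
# Gazetteer of known entity names (an assumption for illustration).
GAZETTEER = {
    "Boeing Co.": "Company",
    "Wall Street": "Location",
    "Alan Mulally": "Person",
}

def tag_entities(text):
    """Wrap every known name in [Label name] brackets, longest names first."""
    for name in sorted(GAZETTEER, key=len, reverse=True):
        text = text.replace(name, f"[{GAZETTEER[name]} {name}]")
    return text

sentence = (
    "Profits soared at Boeing Co., easily topping forecasts on Wall Street, "
    "as their CEO Alan Mulally announced first quarter results."
)
tagged_sentence = tag_entities(sentence)
print(tagged_sentence)
```

This reproduces the bracketed output above, but it cannot handle unseen names or names that are ambiguous between types, which is where statistical taggers are needed.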
2. NLP’s applications

❑ Dialog system (conversational agent):
❑ User: I need a flight from Boston to Washington, arriving by 10 pm.
❑ System: What day are you flying on?
❑ User: Tomorrow
❑ System: Returns a list of flights
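The exchange above follows a frame-based (slot-filling) pattern: the system keeps asking until every required slot has a value. The sketch below is a minimal illustration; the slot names, the questions, and the regular expressions are assumptions invented here, and the arrival-time constraint is ignored for brevity.

```python
import re

# Required slots and the question asked when a slot is missing (assumptions).
SLOT_QUESTIONS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "date": "What day are you flying on?",
}

def update_slots(slots, utterance):
    """Fill any slots that can be recognized in the user's utterance."""
    route = re.search(r"from (\w+) to (\w+)", utterance)
    if route:
        slots["origin"], slots["destination"] = route.groups()
    day = re.search(r"\b(today|tomorrow)\b", utterance, re.IGNORECASE)
    if day:
        slots["date"] = day.group(1).lower()
    return slots

def next_action(slots):
    """Ask for the first missing slot, or search once the frame is complete."""
    for slot, question in SLOT_QUESTIONS.items():
        if slot not in slots:
            return question
    return (
        f"Searching flights from {slots['origin']} to "
        f"{slots['destination']} for {slots['date']}."
    )

slots = {}
update_slots(slots, "I need a flight from Boston to Washington, arriving by 10 pm.")
first_reply = next_action(slots)
update_slots(slots, "Tomorrow")
second_reply = next_action(slots)
print(first_reply)
print(second_reply)
```

After the first user turn only the date is missing, so the system asks for it. After the second turn the frame is complete and a flight search could be issued.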


2. NLP’s applications

❑ Information retrieval:
❑ General model:
❑ A huge collection of texts
❑ A query
❑ Task: find documents that are relevant to the given query
❑ Examples: Google, Bing, …
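A classic way to rank documents against a query is TF-IDF scoring: a document scores highly when it contains query terms often (TF) and those terms are rare in the collection (IDF). The sketch below assumes a three-document collection invented for illustration and uses the plain log(N/df) form of IDF; web search engines combine many more signals.

```python
import math
from collections import Counter

# Toy document collection (an assumption for illustration).
docs = [
    "natural language processing with python",
    "cooking pasta with tomato sauce",
    "machine translation is a natural language task",
]
term_freqs = [Counter(doc.split()) for doc in docs]

def idf(term):
    """Inverse document frequency; 0 for terms that occur nowhere."""
    df = sum(1 for tf in term_freqs if term in tf)
    return math.log(len(docs) / df) if df else 0.0

def rank(query):
    """Document indices sorted by descending TF-IDF score for the query."""
    terms = query.lower().split()
    scores = [sum(tf[t] * idf(t) for t in terms) for tf in term_freqs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

ranking = rank("machine translation")
print([docs[i] for i in ranking])
```

For the query "machine translation" the third document is ranked first, because it is the only one containing either term.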


2. NLP’s applications

❑ Natural language generation:
❑ The task of generating natural language from a machine representation such as a knowledge base or a logical form.
❑ Text generation:
❑ Generate natural sentences
❑ For summarization
❑ For paraphrasing (rewording a sentence)
2. NLP’s applications

❑ Sentiment analysis and Opinion mining:


3. Practice with NLTK

❑ NLTK is a leading platform for building Python programs to work with human language data.
❑ Used for: classification, tokenization, stemming, tagging, parsing, semantic reasoning, …
❑ https://colab.research.google.com/drive/1EElgDjP8PBam_C9-S7cCvzkg4iZOt_fN#scrollTo=kZVazLWZ4dAu
4. Challenges in NLP

❑ Ambiguity
❑ Slang and teen code
❑ Misspelling and Mispronunciation
❑ Training Data
4. Challenges in NLP

❑ Ambiguity:
❑ Natural language is often ambiguous, and the ambiguity can occur at the word, sentence, or meaning level.
❑ Examples in English:
❑ Lexical ambiguity: “Bat”
❑ Syntactic ambiguity: “I saw the man with a telescope”
❑ Semantic ambiguity: “He gave her a ring”
❑ Pragmatic ambiguity: “Can you open the window?”
❑ Referential ambiguity: “Alice told Jane that she would win the prize”

4. Challenges in NLP

❑ Slang and teen code:



4. Challenges in NLP

❑ Misspelling and mispronunciation:
❑ Natural language text often contains misspelled words or very short texts, which creates significant challenges for text analysis. Recognizing the writer's intention from misspelled text is hard for models.
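A standard building block for handling misspellings is edit (Levenshtein) distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below pairs it with a nearest-word lookup; the four-word vocabulary is an assumption made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                      # delete char_a
                cur[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b)  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]

# Tiny vocabulary of correctly spelled words (an assumption for illustration).
VOCABULARY = ["deposit", "money", "bank", "window"]

def correct(word):
    """Replace a word with the closest vocabulary entry."""
    return min(VOCABULARY, key=lambda v: edit_distance(word, v))

print(edit_distance("deposite", "deposit"), correct("deposite"), correct("mony"))
```

Here "deposite" is one edit away from "deposit" and is corrected accordingly. Distance alone cannot recover intent when several words are equally close, so practical spell checkers also use word frequency and context.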
4. Challenges in NLP

❑ Training data:
❑ For machine-learning-based NLP, obtaining suitable training data is challenging because the available text is often irrelevant to the task or varies widely across applications.
❑ “I go to the bank” => “Tôi đi đến ngân hàng” (“bank” as a financial institution)
❑ => “Tôi đi đến bờ sông” (“bank” as a riverbank)


5. Research directions in NLP

❑ Corpus construction for NLP tasks


❑ Neural machine translation for low-resource languages
❑ Deep Learning-based Contextual Text Generation for Conversational Text
❑ Deep Learning-based Contextual Word Embedding for Text Generation
❑ Pre-trained Deep Learning Model based Text Generation
❑ Text Generation with Deep Transfer Learning
❑ Sentiment Classification in Social Media with Deep Contextual Embedding
❑ Deep Learning-based sentiment Classification in Conversational Text
❑ Speech processing
❑ Comparative study of deep learning models for NLP
❑ Retrieval Augmented Generation with LLMs
5. Research directions in NLP

❑ Corpus construction for NLP tasks:
❑ Bilingual and multilingual corpora: for machine translation and cross-language tasks
❑ https://www.clarin.eu/resource-families/parallel-corpora
❑ Corpus for question answering: SQuAD (https://rajpurkar.github.io/SQuAD-explorer/)
❑ Corpus for sentiment analysis
❑ Etc.
❑ https://sites.google.com/view/phuoc-tran/writing?authuser=0
5. Research directions in NLP

❑ Machine translation for low-resource languages:
❑ ACM Transactions on Asian and Low-Resource Language Information Processing (https://dl.acm.org/journal/TALLIP)
5. Research directions in NLP

❑ Deep Learning-based Contextual Text Generation for Conversational Text


❑ Deep Learning-based Contextual Word Embedding for Text Generation
❑ Pre-trained Deep Learning Model based Text Generation
❑ Text Generation with Deep Transfer Learning
5. Research directions in NLP

❑ Sentiment Classification in Social Media with Deep Contextual Embedding


❑ Deep Learning-based sentiment Classification in Conversational Text
5. Research directions in NLP

❑ Speech Processing:
❑ Speech recognition
❑ Speech to text
❑ Text to speech
5. Research directions in NLP
❑ Comparative study of deep learning models for NLP
5. Research directions in NLP

❑ Retrieval Augmented Generation with LLMs:


❑ Hallucination in LLMs
6. Practice with Under The Sea

❑ Underthesea is a suite of open source Python modules, datasets and tutorials supporting research and development in Vietnamese Natural Language Processing.
❑ Link: https://underthesea.readthedocs.io/en/latest/index.html
❑ https://colab.research.google.com/drive/13vPlzfijoodIQ7jcutEwMJmAY2fDgckq#scrollTo=Wy8wGPaWQAAu
Q&A
