Introduction_AdvNLP
1) Fundamentals of NLP
[Figure: Speech Technology, Text Technology, Language Technology, Knowledge Technology]
Levels of Language Processing
Phonetics and Phonology: the study of linguistic sounds
Morphology: the study of the meaningful components of words
Syntax: the study of the structural relationships between words
Semantics: the study of meaning
Pragmatics: the study of how language is used to accomplish goals
Discourse: the study of linguistic units larger than a single utterance
Major Tasks in NLP
• Speech recognition
• Text-to-speech
• Speech segmentation
• Optical character recognition
• Truecasing
• Morphological segmentation
• Stemming
• Part-of-speech tagging
• Word segmentation
• Sentence breaking
• Parsing
• Language modeling
• Word sense disambiguation
• Named entity recognition
• Question answering
• Information extraction (IE)
• Information retrieval (IR)
• Query expansion
• Natural language search
• Automatic summarization
• Natural language generation
• Text simplification
• Text-proofing
• Topic segmentation and recognition
• Coreference resolution
• Relationship extraction
• Sentiment analysis
• Automated essay scoring
• Natural language understanding
• Discourse analysis
• Machine translation
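To make a few of the listed tasks concrete, here is a minimal sketch of sentence breaking, word segmentation, and stemming using only Python's standard `re` module. The function names (`split_sentences`, `tokenize`, `naive_stem`) and the suffix list are illustrative assumptions, not a standard API; the stemmer is a toy suffix stripper for English, not the Porter algorithm, and real systems would normally use a dedicated toolkit.

```python
import re

def split_sentences(text):
    # Sentence breaking: split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Word segmentation: runs of word characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)

def naive_stem(word):
    # Stemming (toy version): strip one common English suffix,
    # but only if at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

text = "The parser failed. It was parsing long sentences!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
stems = [[naive_stem(t.lower()) for t in sent] for sent in tokens]
```

Note that the toy stemmer produces non-words such as "pars" and "sentenc". That is typical of stemming in general, and it is one reason morphologically rich languages often need proper morphological segmentation instead.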
NLP from Different Perspectives
Engineering:
How to build a system?
How to select a suitable approach/tool/data source?
How to combine different approaches/tools/data sources?
How to optimize performance with respect to quality and resource requirements?
Science:
Why does an approach/tool/data source work or fail?
Why does approach/tool/data source A work better than B?
Approaches to NLP
Rule Based (Hand-Crafted Rules)
Develop rules to process different types of natural language data, based on known facts, rules, and exception cases.
Machine Learning
Capture patterns from examples (a corpus, annotated or otherwise) and apply them to new instances.
Supervised: learns by comparing its output with the expected output.
Unsupervised: "blind" learning. Creates knowledge by association rather than from predefined outputs.
Semi-Supervised: starts with a seed of labeled data and learns iteratively, using both supervised and unsupervised learning.
Deep Generative Learning: advanced unsupervised learning that uses self-generated data closely resembling the original data (uses a variational autoencoder and/or a GAN).
Reinforcement: a machine learning training method based on rewarding desired behaviors and punishing undesired ones.
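The contrast between the rule-based and the supervised approach can be sketched on a toy sentiment task. Everything below is an illustrative assumption: the word lists, the four training sentences, and the function names are hypothetical, and the learner is a plain multinomial Naive Bayes with add-one smoothing chosen for brevity, not a recommended method.

```python
import math
from collections import Counter, defaultdict

# Rule-based: a hypothetical hand-crafted lexicon
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}

def rule_based_sentiment(text):
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"

# Supervised: learn word statistics from labeled examples
def train_naive_bayes(examples):
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict_naive_bayes(model, text):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled corpus (made up for illustration)
TRAIN = [
    ("the service was good", "pos"),
    ("great food and great staff", "pos"),
    ("the room was bad", "neg"),
    ("terrible food and poor service", "neg"),
]
model = train_naive_bayes(TRAIN)
```

With the rule-based function, behavior changes only when someone edits the lexicon. With the supervised model, behavior changes when the labeled data changes, which is the practical trade-off between the two approaches.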
Assignments
Fundamentals of NLP
• Prepare a presentation on the following topics:
– Introduction (What is NLP?, approaches to NLP, tasks of NLP, foundations of NLP, history of NLP, NLP-related technologies and disciplines, etc.)
– Linguistics-related issues in NLP
– Classical/statistical machine learning (what it is; supervised, semi-supervised, unsupervised, reinforcement, etc.) as applied to NLP
– Deep learning/generative learning as applied to NLP (What are neural networks and DL? Types/architectures of DL, activation functions, normalization, optimization, hyper-parameters, etc.)
State-of-the-art research on one of the NLP topics (see slide 6)
• Presentation and report
• Can be on the potential research topic of your project and on one human language (an Ethiopian language)
• You need to review:
– The current research issues
– State-of-the-art approaches, methods, techniques, results, and research gaps
– Related works
Experiment on the development of an NLP system for a local language
• You need to do the following:
– Formulate your research problem
– Prepare/acquire data
– Select learning algorithms/techniques (preferably DNN/GL), methods, and approaches
– Follow the NLP development steps
– Use NLP development ecosystems
– Analyze experimental results
– Present what you have done
– Prepare a report or a publishable manuscript
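For the "analyze experimental results" step, one common starting point is to compare system predictions against gold labels using precision, recall, and F1. The sketch below assumes a simple classification setting; the function name `precision_recall_f1` and the label lists are made up for illustration, and other tasks (for example machine translation or speech recognition) need different metrics.

```python
def precision_recall_f1(gold, predicted, positive):
    # Count true positives, false positives, and false negatives
    # for the class named by `positive`.
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold labels and system output for five test items
gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]
p, r, f = precision_recall_f1(gold, pred, "pos")
```

Reporting these scores on held-out test data, rather than on the training data, gives a fairer picture of how the system will behave on new text.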