CS 410/510: Natural Language Processing
We rely on natural languages to store knowledge, communicate, and reason. Much of our collective knowledge exists as text, in books, papers, and articles. A central goal of artificial intelligence (AI) is to build computer systems that can understand and emulate human communication and reasoning from this textual data. This field, known as Natural Language Processing (NLP), matters in many domains because of its wide range of applications. Recent advances in AI, including large language models such as ChatGPT, GPT-4, and Gemini and the transformer-based deep learning architectures behind them, grew directly out of NLP research.
This course will cover several levels of text analysis and understanding: word- and phrase-level analysis (document retrieval and text classification), syntactic analysis (grammars and parsing), semantic analysis (word and sentence meaning), and discourse analysis (pronoun resolution and text structure). Students will learn to apply these techniques to a range of NLP problems, including part-of-speech tagging, parsing, language modeling, sentiment analysis, information extraction, question answering, machine translation, and text generation. While fundamental techniques will be introduced, the emphasis is on machine learning methods, particularly deep learning and pre-trained language models, which have achieved exceptional performance in recent years and have become the primary tools for solving NLP problems.
Thien Huu Nguyen, [email protected]
Daniel Jurafsky and James H. Martin, Speech and Language Processing (SLP), 3rd edition draft, 2024 (available online).
Tom Mitchell, Machine Learning, 1997.
Kevin Murphy, Machine Learning: A Probabilistic Perspective, 2012.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, 2009 (available online).
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, 2016 (available online).
This course covers key challenges in Natural Language Processing (NLP), including text classification, part-of-speech tagging, parsing, information extraction, language modeling, question answering, and text generation. It emphasizes fundamental methods to address these challenges, with a primary focus on machine learning techniques such as word embeddings, deep learning, sequential labeling, supervised learning, semi-supervised learning, sequence-to-sequence models, and pre-trained language models.
Upon successful completion of the course, students will be able to:
Upon successful completion of the course, students will have acquired the following skills:
Dates | Topics | Resources |
---|---|---|
Apr 1, 3 | NLP introduction (Slides), Text classification (Slides) | SLP 4 |
Apr 8, 10 | Word embeddings (Slides), Deep learning (Slides) | SLP 6 |
Apr 15, 17 | Sequential Labeling, HMM, MEMM, CRF, Viterbi, and RNN (Slides) | SLP 8, 9 |
Apr 22, 24 | Syntax and Constituency Parsing (Slides), Dependency Parsing (Slides) | SLP 17, 18 |
Apr 29, May 1 | Information Extraction (Slides) | SLP 19 |
May 6, 8 | Continued from previous week | |
May 13, 15 | Semi-supervised learning, distant supervision, Review (Slides), and Midterm | |
May 20, 22 | Language Modeling, Transformers, Pre-trained Language Models (Slides) | SLP 3, 7, 10 |
May 27, 29 | Continued from previous week (no class on May 27, Memorial Day) | |
June 3, 5 | Continued from previous week; Tuning and In-context Learning with LLMs | |
Assignment 1 (written): Link (posted on April 9), due April 17 at 11:59 pm.
Assignment 2 (programming): Link (posted on April 22), due May 3 at 11:59 pm.
Assignment 3 (programming): Link (posted on May 12), due May 27 at 11:59 pm.
Final project proposal due: May 9 at 11:59 pm.
Final report and code due: finals week (June 11 at 11:59 pm).
Helpful links
This course will be taught in person. Please use Piazza and Canvas for communication and discussion.
Grading will be based on the following criteria:
Percentage | Component |
---|---|
40 | Written and programming assignments |
30 | Midterm exam |
30 | Final project |
Letter | Plus | Base | Minus |
---|---|---|---|
A | A+ >= 97.00 | A 93.34-96.99 | A- 90.00-93.33 |
B | B+ 86.67-89.99 | B 83.34-86.66 | B- 80.00-83.33 |
C | C+ 76.67-79.99 | C 73.34-76.66 | C- 70.00-73.33 |
D | D+ 66.67-69.99 | D 63.34-66.66 | D- 60.00-63.33 |
F | | F 0.00-59.99 | |
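The final grade is a weighted average of the three components (40/30/30) followed by a lookup in the letter-grade ranges. The Python sketch below illustrates that arithmetic under stated assumptions: each component score is on a 0-100 scale, the course percentage is rounded to two decimals before the lookup, and the names `WEIGHTS`, `CUTOFFS`, `course_percentage`, and `letter_grade` are hypothetical. It is not the official calculation used in Canvas, which may weight individual assignments or round differently.

```python
from typing import Dict

# Component weights from the grading table (40% + 30% + 30% = 100%).
WEIGHTS: Dict[str, float] = {
    "assignments": 0.40,    # written and programming assignments
    "midterm": 0.30,        # midterm exam
    "final_project": 0.30,  # final project
}

# Lower bound of each letter grade from the grading scale, highest first.
# Anything below 60.00 is an F.
CUTOFFS = [
    (97.00, "A+"), (93.34, "A"), (90.00, "A-"),
    (86.67, "B+"), (83.34, "B"), (80.00, "B-"),
    (76.67, "C+"), (73.34, "C"), (70.00, "C-"),
    (66.67, "D+"), (63.34, "D"), (60.00, "D-"),
]


def course_percentage(scores: Dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage: float) -> str:
    """Map a course percentage to a letter grade.

    Hypothetical assumption: the percentage is rounded to two decimals first,
    so it lines up with the two-decimal ranges in the grading scale.
    """
    pct = round(percentage, 2)
    for lower, letter in CUTOFFS:
        if pct >= lower:
            return letter
    return "F"


if __name__ == "__main__":
    example = {"assignments": 90, "midterm": 85, "final_project": 92}
    pct = course_percentage(example)
    print(f"{pct:.2f} -> {letter_grade(pct)}")  # prints "89.10 -> B+"
```

For example, scores of 90 (assignments), 85 (midterm), and 92 (final project) give 0.40 * 90 + 0.30 * 85 + 0.30 * 92 = 89.1, which falls in the B+ range. Checking the cutoffs from highest to lowest means each percentage matches exactly one letter without having to store the upper end of every range.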