Deep Learning and its Application in Natural Language Processing (DL&NLP)
DA345

Suggested reading materials

by
Soumitra Samanta

February 24, 2025
Contents
1 Motivation, Course overview, Syllabus, Prerequisites and Resources
  1.1 Class schedule
  1.2 Teaching Assistant (TA)
  1.3 Prerequisite(s)
  1.4 Course url
  1.5 Credit: 4 (four), approximately 60 credit hours
  1.6 Tentative syllabus
  1.7 Related books
  1.8 Evaluation
  1.9 Assignments
  1.10 Project
  1.11 Academic ethics
  1.12 DL & NLP related tools
  1.13 NLP datasets repository
  1.14 DL & NLP related top tier conference
  1.15 DL & NLP related top journals
  1.16 For recent updates on ML you can follow the arXiv
  1.17 Suggested reading
2 Introduction to Artificial neural network
  2.1 Suggested reading

3 Perceptron learning algorithm
  3.1 Suggested reading
  3.2 Assignment

4 Introduction to different activation functions
  4.1 Suggested reading
5 Introduction to loss function and gradients
  5.1 Suggested reading

6 Introduction to backpropagation
  6.1 Suggested reading

7 Introduction to parameter initialisation and update rules
  7.1 Suggested reading
  7.2 Assignment-2

8 Convolutional Networks-1
  8.1 Suggested reading

9 Convolutional Networks-2
  9.1 Suggested reading

10 Convolutional Networks-3
  10.1 Suggested reading
  10.2 Assignment-3

11 Introduction to NLP, language model: N-gram
  11.1 Suggested reading
  11.2 Homework

12 Word embeddings: vector semantics, neural word embedding
  12.1 Suggested reading
  12.2 Homework
Lecture 1
Motivation, Course overview, Syllabus, Prerequisites and Resources
1.1 Class schedule:
• Tuesday: 10:30 AM - 12:00 PM (IH402)

• Wednesday: 12:00 PM - 1:30 PM (IH402)

• Friday: 12:00 PM - 1:30 PM (IH402)
1.2 Teaching Assistant (TA):

We have a TA in this course:

• TA: Suvajit Patra (2nd yr. PhD student) (IH413)

• Email: [email protected]
1.3 Prerequisite(s)
• Introduction to Machine Learning
1.5 Credit: 4 (four), approximately 60 credit hours
1.6 Tentative syllabus
Here is the tentative syllabus:
• Artificial neural network (ANN): Modelling single neuron activity, different types of activation functions (sigmoid, tanh, ReLU, ELU, etc.), how to connect multiple neurons to form a network, Multi-layer perceptron

• Optimization: Backpropagation, different loss functions, gradient descent, stochastic gradient descent and different update rules (AdaGrad, RMSProp, Adam, etc.) for network parameters, regularization, dropout, batch normalisation, etc.

• Deep learning toolbox: Explore a deep learning toolbox like PyTorch (my personal choice)/TensorFlow and their autograd functionalities (see the short sketch after this list)

• Convolutional neural network (CNN): Concept of kernel and convolution, some pooling operations (max, average, etc.), some standard CNN architectures like LeNet, AlexNet, VggNet, ResNet, etc., and the concept of transfer learning
• Neural language model:

  – Introduction to NLP

  – Text preprocessing: tokenisation, stop words, stemming, lemmatisation, etc.

  – Vector representations of text: Bag of Words, TF-IDF, word embeddings, Word2Vec, GloVe, etc.

  – Sequence modelling: Recurrent neural network (RNN), Self-Attention network, etc.

  – Transformers: Attention, BERT and its different variants, Encoder-Decoder models

  – Large language model (LLM): GPT and its different variants, pre-trained language models, transfer learning

  – Applications: text classification, sentiment analysis, Named Entity Recognition (NER), machine translation, text summarization, text generation, etc.
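To preview the autograd functionality mentioned in the toolbox bullet above, here is a tiny PyTorch sketch; the single-neuron example and its numbers are illustrative, not from the slides:

    import torch

    # Autograd demo: compute d(loss)/dw for a single sigmoid neuron.
    w = torch.tensor([1.0, -2.0], requires_grad=True)  # trainable weights
    x = torch.tensor([0.5, 3.0])                       # one input example
    y = torch.tensor(1.0)                              # its target label

    y_hat = torch.sigmoid(w @ x)   # forward pass
    loss = (y_hat - y) ** 2        # squared error
    loss.backward()                # backward pass: autograd does the chain rule
    print(w.grad)                  # d(loss)/dw, computed automatically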
1.7 Related books
We will follow multiple books for different topics. Here are some suggested books we will follow in this course:
• Charu C. Aggarwal. Neural Networks and Deep Learning: A Textbook, Springer Cham, 2nd edition, 2023.

• Simon Haykin. Neural Networks and Learning Machines, Pearson, 3rd edition, 2009.

• Eugene Charniak. Introduction to Deep Learning, MIT Press, 2018.

• Michael Nielsen. Neural Networks and Deep Learning, online.

• Ovidiu Calin. Deep Learning Architectures: A Mathematical Approach, Springer Cham, 1st edition, 2020.

• Dan Jurafsky and James H. Martin. Speech and Language Processing, draft, 3rd edition, 2024. [online]

• Delip Rao and Brian McMahan. Natural Language Processing with PyTorch, O'Reilly Media, Inc., 2019.

• Lewis Tunstall, Leandro von Werra, and Thomas Wolf. Natural Language Processing with Transformers, O'Reilly Media, Inc., 2022. [online, code only]

• Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing. [online]
1.8 Evaluation:
The approximate weightage of the different components in the evaluation is as follows:

  Midterm Exam: 10%
  Final Exam: 40%
  Assignments and Class tests/Quizzes: 20%
  Project: 25%
  Class attendance: 5%
1.9 Assignments:
There will be some programming assignments. For the programming assignments, we will use the Python programming language. The assignment submission deadlines are strict, and we will consider 11:59 PM as our day end.
1.10 Project:
• Can be done in a group (max two students)

• Be careful about your project partner!

• If your partner is auditing the course, then you may be in trouble!

• Define your own project

• Submit a one-page project proposal within a fixed time (the first four weeks)

• Finish the work within the timeline

• Report submission

• Submission deadline: seven days before the final exam date; this deadline is strict, and you can adjust your assignment buffer days here

• We will consider 11:59 PM as our day end
1.11 Academic ethics:
We will follow some academic ethics:
• Your grade should reflect your own work.

• Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is strictly forbidden, and will result in an automatic grade of zero for the entire assignment or exam in which the copying or paraphrasing was done.
1.12 DL & NLP related tools
• Machine Learning in Python - https://scikit-learn.org/stable/

• ML on GPU - https://rapids.ai/

• PyTorch - https://pytorch.org/

• Natural Language Toolkit - https://www.nltk.org/

• NLP for Indian languages - https://github.com/AI4Bharat/indicnlp_catalog

• Bangla NLP - https://github.com/sagorbrur/bnlp

• ···
1.13 NLP datasets repository

You can find some datasets to evaluate your NLP models here:
• https://github.com/niderhoff/nlp-datasets

• https://github.com/sebastianruder/NLP-progress

• https://www.nltk.org/nltk_data/

• https://universaldependencies.org/

• Movie subtitles: https://opus.nlpl.eu/OpenSubtitles-v2018.php

• I am not sure whether the data can be downloaded, but you can try these sources for your application:

  – Related to Bengali literature: https://nltr.itewb.gov.in/
  – https://nltr.itewb.gov.in/downloads.php
  – https://rabindra-rachanabali.nltr.org/node/1
  – https://nazrul-rachanabali.nltr.org/
  – https://bankim-rachanabali.nltr.org/
  – https://sarat-rachanabali.nltr.org/
  – https://advaitaashrama.org/cw/content.php
1.14 DL & NLP related top tier conference

• International Conference on Machine Learning (ICML) - https://icml.cc/
• Neural Information Processing Systems (NeurIPS) - https://neurips.cc/

• International Conference on Learning Representations (ICLR) - https://iclr.cc/

• Association for the Advancement of Artificial Intelligence (AAAI) - https://www.aaai.org/

• Computer Vision Foundation (CVF) - https://openaccess.thecvf.com/menu

• Association for Computational Linguistics (ACL) [every year] - papers: https://aclanthology.org/venues/acl/

• Empirical Methods in Natural Language Processing (EMNLP) [every year] - papers: https://aclanthology.org/venues/emnlp/

• North American Chapter of the Association for Computational Linguistics (NAACL) [every year] - papers: https://aclanthology.org/venues/naacl/

• European Chapter of the Association for Computational Linguistics (EACL) [every year] - papers: https://aclanthology.org/venues/eacl/

• International Conference on Computational Linguistics (COLING) [alternate years (even)] - papers: https://aclanthology.org/venues/coling/

• ···
1.15 DL & NLP related top journals
• Journal of Computational Linguistics (JCL) - https://direct.mit.edu/coli/
• Transactions of the Association for Computational Linguistics (TACL) - https://transacl.org/index.php/tacl/index
• Information Retrieval Journal - https://link.springer.com/journal/10791
• ···
1.16 For recent updates on ML you can follow the arXiv
You can go to the Computer Science (CS) section on arXiv, where you will find different branches of CS (like ML, CL, AI, IR, etc.).
• ML - https://arxiv.org/list/cs.LG/recent

• CL - https://arxiv.org/list/cs.CL/recent

• AI - https://arxiv.org/list/cs.AI/recent

• IR - https://arxiv.org/list/cs.IR/recent

• ···
1.17 Suggested reading
Please go through the class slides.
Lecture 2
Introduction to Artificial neural network
2.1 Suggested reading
Please go through Chapter 1, up to Section 1.2.1.6, of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!) or Chapter 1 of Simon Haykin's book [7] (you can find it in our library, or you may find it online here, but I am not sure!).
Lecture 3
Perceptron learning algorithm
3.1 Suggested reading
The perceptron learning algorithm was proposed by Frank Rosenblatt in 1958 [12]. An online version of the original paper can be found here. He also wrote a detailed technical report [13] on the perceptron.

You can find the perceptron learning algorithm in any Machine Learning or Pattern Recognition book. Here are some references:

You can find the algorithm in Shalev-Shwartz et al.'s book [14], Chapter 9, Section 9.1.2 (Perceptron for Half-spaces); for the convergence proof, please go through Theorem 9.1 in [14].

In Mohri et al.'s book [10], see Chapter 8, Section 8.3.1 (Perceptron algorithm); for the convergence proof, see Theorem 8.8 in [10].

For a brief history and a picture of the original perceptron setup, see Bishop's book [3], Chapter 4, Section 4.1.7.
Perceptron Algorithm:

Input: N training examples (x_i, y_i) with labels y_i ∈ {−1, +1}, an initial weight vector w_0, and the number of epochs T.
Output: Final weight vector w.
1: w ← w_0
2: for t ← 1 to T do
3:     for i ← 1 to N do
4:         if y_i (w^T x_i) < 0 then
5:             w ← w + y_i x_i
6:         end if
7:     end for
8: end for
9: return w

3.2 Assignment

Implement the perceptron learning algorithm for the two-class synthetic data we discussed in the class, with the following settings:

• Consider a two-class classification problem and generate the dataset (100 points uniformly from each class) using the script from here: https://xlms.rkmvu.ac.in/pluginfile.php/4571/mod_assign/introattachment/0/gui_inputs.py?forcedownload=1
• Implement the perceptron learning algorithm discussed in the class with the following three initialisations (a minimal implementation sketch is given after the submission details below):

  – Randomly
  – With help from your dataset
  – With zeros

• Plot the results (your linear separators) with the data points for the above three cases.
Submission deadline: 15-01-2025 (11:59 PM)
Submission file format: your_ID_full_name_perceptron_2d_data_version_no.ipynb
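To make the pseudocode in Section 3.1 concrete, here is a minimal NumPy sketch of the perceptron loop. The synthetic blobs below are a stand-in for the data produced by gui_inputs.py, and the update test uses <= 0 (rather than < 0) so that the all-zeros initialisation also triggers updates:

    import numpy as np

    def perceptron(X, y, w0, T):
        """X: (N, d) inputs, y: labels in {-1, +1}, w0: (d,) initial weights."""
        w = w0.astype(float).copy()
        for _ in range(T):                 # epochs
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:     # misclassified (or on the boundary)
                    w = w + yi * xi        # perceptron update
        return w

    # Stand-in data: two linearly separable 2D blobs, 100 points each.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.uniform(0, 1, (100, 2)), rng.uniform(2, 3, (100, 2))])
    X = np.hstack([X, np.ones((200, 1))])  # constant-1 column for the bias
    y = np.array([-1] * 100 + [+1] * 100)

    w = perceptron(X, y, w0=np.zeros(3), T=10)
    print("learned separator:", w)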
Lecture 4
Introduction to different activation functions
4.1 Suggested reading
Please go through the class slides.
Please go through Chapter 2 of Ovidiu Calin's book [4] (you can find it in our library). Also, you can go through Chapter 4, Section 4.4 of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!).
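As a quick reference for the activation functions named in the syllabus (sigmoid, tanh, ReLU, ELU), here is a small NumPy sketch; the ELU default α = 1.0 is the usual convention, assumed here:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

    def tanh(x):
        return np.tanh(x)                 # squashes to (-1, 1)

    def relu(x):
        return np.maximum(0.0, x)         # zero for negative inputs

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth ReLU variant

    x = np.linspace(-3.0, 3.0, 7)
    for f in (sigmoid, tanh, relu, elu):
        print(f.__name__, np.round(f(x), 3))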
Lecture 5
Introduction to loss function and gradients
5.1 Suggested reading
Please go through Chapter 2 of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!) or Chapter 5 of Simon J.D. Prince's book [11] (you may find it online here, but I am not sure!).
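Alongside the reading, a small sketch of two representative losses and their gradients with respect to the prediction; the choice of squared error and softmax cross-entropy as examples is mine, not the books':

    import numpy as np

    def mse(y_hat, y):
        loss = 0.5 * (y_hat - y) ** 2
        grad = y_hat - y                 # d(loss)/d(y_hat)
        return loss, grad

    def softmax_cross_entropy(scores, target):
        """scores: (K,) raw logits; target: index of the true class."""
        p = np.exp(scores - scores.max())
        p = p / p.sum()                  # softmax probabilities
        loss = -np.log(p[target])
        grad = p.copy()
        grad[target] -= 1.0              # d(loss)/d(scores) = p - one_hot(target)
        return loss, grad

    print(mse(0.8, 1.0))
    print(softmax_cross_entropy(np.array([2.0, 0.5, -1.0]), target=0))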
Lecture 6
Introduction to backpropagation
6.1 Suggested reading
Please go through Chapter 2, Section 2.4 of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!).
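To complement the reading, here is a hand-derived backward pass for a tiny two-layer network (sigmoid hidden layer, squared error loss); the shapes and values are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)              # input
    y = 1.0                             # target
    W1 = rng.normal(size=(4, 3))        # first-layer weights
    w2 = rng.normal(size=4)             # second-layer weights

    # Forward pass
    z = W1 @ x
    h = 1.0 / (1.0 + np.exp(-z))        # sigmoid hidden activations
    y_hat = w2 @ h
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass (chain rule, layer by layer)
    d_yhat = y_hat - y                  # dL/dy_hat
    d_w2 = d_yhat * h                   # dL/dw2
    d_h = d_yhat * w2                   # dL/dh
    d_z = d_h * h * (1.0 - h)           # through the sigmoid derivative
    d_W1 = np.outer(d_z, x)             # dL/dW1
    print(d_w2, d_W1, sep="\n")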
Lecture 7
Introduction to parameter initialisation and update rules
7.1 Suggested reading
Please go through the class slides.

Please go through Chapter 2, Section 2.7 and Chapter 4, Section 4.5 of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!).
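As a companion to the update rules discussed in the class, here is a short sketch of plain SGD and Adam minimising f(w) = ||w||^2; the hyper-parameter values are common defaults, assumed here:

    import numpy as np

    def sgd_step(w, grad, lr=0.1):
        return w - lr * grad                    # vanilla gradient step

    def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * grad            # first-moment estimate
        v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
        m_hat = m / (1 - b1 ** t)               # bias corrections
        v_hat = v / (1 - b2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

    w = np.array([1.0, -2.0])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, 201):
        grad = 2.0 * w                          # gradient of ||w||^2
        w, m, v = adam_step(w, grad, m, v, t)
    print(w)                                    # approaches the minimum at 0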
Project proposal submission deadline: 22-02-2025 (11:59 PM)
7.2 Assignment-2
Implement a simple two-layer neural network to classify handwritten digits in the MNIST dataset with the following settings:

• Please follow the notebook and fill in the blanks (TODO) in first_nn_exc.py

• Consider the different initialisation strategies discussed in the class.

• Implement the different update rules discussed in the class.

• Search for the optimum hyper-parameters (learning rate, number of hidden layers) through a grid search.

Submission deadline: 31-01-2025 (11:59 PM)
Submission file format: your_ID_full_name_2lr_net_mnist_data_version_no.ipynb
Lecture 8
Convolutional Networks-1
8.1 Suggested reading
Please go through the class slides.
Please go through Chapter 2, Section 2.7 and Chapter 4, Section 4.5 of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!) OR you can check Chapter 9 of Ian Goodfellow et al.'s book [6]. Chapter 9 is freely downloadable from here.
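To make the kernel-and-convolution idea concrete, here is a naive "valid" 2D convolution in NumPy (strictly, cross-correlation, as implemented in most deep learning libraries); the example filter is illustrative:

    import numpy as np

    def conv2d(image, kernel):
        """Naive 'valid' 2D cross-correlation of a single-channel image."""
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # dot product of the kernel with one image patch
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)
    kernel = np.array([[1.0, -1.0]])      # horizontal difference filter
    print(conv2d(image, kernel))          # constant -1s on this ramp image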
Lecture 9
Convolutional Networks-2
9.1 Suggested reading
Please go through the class slides.
Please go through Chapter 2, Section 2.7 and Chapter 4, Section 4.5 of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!) OR you can check Chapter 9 of Ian Goodfellow et al.'s book [6]. Chapter 9 is freely downloadable from here. You can also check Christopher M. Bishop and Hugh Bishop's book [2]; Chapter 10 is freely readable online from here.
Lecture 10
Convolutional Networks-3
10.1 Suggested reading
Please go through the class slides.
Please go through Chapter 2, Section 2.7 and Chapter 4, Section 4.5 of Charu Aggarwal's book [1] (you can find it in our library, or you may find it online here, but I am not sure!) OR you can check Chapter 9 of Ian Goodfellow et al.'s book [6]. Chapter 9 is freely downloadable from here. You can also check Christopher M. Bishop and Hugh Bishop's book [2]; Chapter 10 is freely readable online from here.

For backpropagation, you can check this.
10.2 Assignment-3
Implement a CNN to classify different objects in image data. The exact problem will be given by the TA, and it will be evaluated on the spot.

Submission deadline: 21-02-2025 (11:59 PM)
Submission file format: your_ID_full_name_cnn_version_no.ipynb
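As a starting point for this kind of assignment, here is a minimal PyTorch CNN sketch; the layer sizes, the 32x32 RGB input, and the 10-class output are assumptions, not the TA's specification:

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                  # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                  # 16x16 -> 8x8
            )
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))  # logits, one per class

    model = SmallCNN()
    out = model(torch.randn(4, 3, 32, 32))        # a dummy batch of 4 images
    print(out.shape)                              # torch.Size([4, 10])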
Lecture 11
Introduction to NLP, language model: N-gram
11.1 Suggested reading
Please go through the class slides.

For the N-gram language model, you can go through Jurafsky and Martin's book [8], Chapter 3 (N-gram Language Models) [online]. For further interest, you can look into the papers referred to in Chapter 3.
11.2 Homework
Implement an N-gram model on the toy corpus discussed in the class.
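To get started, here is a tiny bigram (N = 2) counting sketch with maximum-likelihood estimates; the three-sentence corpus below is a stand-in for the toy corpus from the class:

    from collections import Counter, defaultdict

    corpus = ["<s> I am Sam </s>", "<s> Sam I am </s>", "<s> I like NLP </s>"]

    bigram_counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for w1, w2 in zip(tokens, tokens[1:]):
            bigram_counts[w1][w2] += 1    # count(w1 followed by w2)

    def p(w2, w1):
        """Maximum-likelihood estimate of P(w2 | w1)."""
        total = sum(bigram_counts[w1].values())
        return bigram_counts[w1][w2] / total if total else 0.0

    print(p("am", "I"))   # 2/3: "I" is followed by "am" twice and "like" once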
For the Bag-of-Words model, you can go through Jacob Eisenstein's book [5], Chapter 4 (Linguistic applications of classification) [online]. For further interest, I encourage you to go through the paper by Pang et al. titled Thumbs up?: sentiment classification using machine learning techniques and the paper by Zellig S. Harris titled Distributional Structure.
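For a hands-on feel of the Bag-of-Words representation, a quick sketch with scikit-learn (listed in Section 1.12); the two example sentences are illustrative:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the movie was great", "the movie was terrible"]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)         # document-term count matrix
    print(vectorizer.get_feature_names_out())  # the learned vocabulary
    print(X.toarray())                         # one count vector per document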
Lecture 12
Word embeddings: vector semantics, neural word embedding
12.1 Suggested reading
First go through word representation in vectorised form in Jurafsky and Martin's book [8], Chapter 6 (Vector Semantics and Embeddings) [online]. For word2vec, please go through the original paper titled Efficient estimation of word representations in vector space [9]. A good documentation of the word2vec parameters is the paper titled word2vec Parameter Learning Explained. An online demo: https://ronxin.github.io/wevi/. A word2vec test demo notebook is here: ss_word2vec_demo.ipynb
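To complement the demo notebook, here is a one-step skip-gram forward pass (softmax version) in NumPy; the vocabulary size, dimensions, and word indices are toy assumptions:

    import numpy as np

    V, d = 5, 3                         # vocabulary size, embedding dimension
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(V, d))    # input (centre) embeddings
    W_out = rng.normal(scale=0.1, size=(V, d))   # output (context) embeddings

    centre, context = 2, 4              # indices of a (centre, context) pair
    h = W_in[centre]                    # hidden layer = centre word embedding
    scores = W_out @ h                  # one score per vocabulary word
    p = np.exp(scores) / np.exp(scores).sum()    # softmax over the vocabulary
    loss = -np.log(p[context])          # cross-entropy for the true context word
    print(loss)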
12.2 Homework
Derive the gradient of cross-entropy loss with respect to all the parameters
in the word2vec model discussed in the class.
Bibliography
[1] Charu C. Aggarwal. Neural Networks and Deep Learning: A Textbook. Springer Cham, 2nd edition, 2023.
[2] Christopher Bishop and Hugh Bishop. Deep Learning: Foundations and Concepts. Springer Cham, 1st edition, 2023.
[3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 1st edition, 2006.
[4] Ovidiu Calin. Deep Learning Architectures: A Mathematical Approach. Springer Cham, 1st edition, 2020.
[5] Jacob Eisenstein. Introduction to Natural Language Processing. MIT Press, 1st edition, 2019.
[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 1st edition, 2016.
[7] Simon Haykin. Neural Networks and Learning Machines. Pearson, 3rd edition, 2009.
[8] Dan Jurafsky and James H. Martin. Speech and Language Processing. Draft, 3rd edition, 2023.
[9] Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.

[10] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2nd edition, 2018.
[11] Simon J.D. Prince. Understanding Deep Learning. MIT Press, 1st edi-
tion, 2023.
[12] Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386-408, 1958.
[13] Frank Rosenblatt. Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Technical report, Cornell Aeronautical Laboratory, 1961.
[14] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 1st edition, 2014.