Deep Learning for Natural Language
GDG Bloomington-Normal
July 18th, 2023
🏛🔠
Okay, but what's the deal with ChatGPT?
ChatGPT is an example of a large language model (LLM): a type of deep learning model with hundreds of millions or billions of parameters, trained on very large bodies of text. Large language models currently represent the state of the art in NLP.
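As an illustration (not from the slides): a minimal sketch of generating text with a small, freely available language model via the Hugging Face transformers library. The model choice ("gpt2", only 124M parameters) and sampling settings are assumptions for demonstration, but the same idea scales up to the much larger models behind ChatGPT.

```python
# Minimal sketch: text generation with a small pretrained language model.
# Assumes the `transformers` library is installed (pip install transformers).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Deep learning for natural language is",
    max_new_tokens=30,        # how many tokens to generate
    do_sample=True,           # sample instead of greedy decoding
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```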
We are primarily concerned with NLP as it pertains to the fields of data science and AI, where it refers to teaching computers to process - and perhaps even "understand" - text written in ordinary language, and to perform associated tasks.
Though the term processing usually refers specifically to altering and preparing data, in the domain of AI, NLP is often used more generally to refer to any language problem, including those that apply machine learning (ML) to language, since these still require processing the text data beforehand.
🔡🛠💡
A Brief History of NLP (according to Wikipedia)
• Symbolic (1950s-1970s)
• Statistical / ML (1980s-2000s)
• Neural (2000s-present)
Whereas traditional software development is deterministic and requires the coding of specific
logic, machine learning models can learn from training data and infer relationships or make
predictions based upon patterns in a given data set, without being given explicit instructions.
Much of the mathematical backing for machine learning techniques has existed for quite some
time; it is only fairly recent advances in computing power, scale, and availability that have
made these techniques computationally practical, giving rise to the field of ML.
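A minimal sketch of that "learning from data" idea, using scikit-learn (a library choice assumed here, not named in the slides): the model infers the underlying relationship from examples rather than from coded logic.

```python
# Minimal sketch: a model infers a pattern from data, with no explicit rules.
# Assumes scikit-learn is installed (pip install scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data following the (unstated) pattern y = 2x + 1.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])

model = LinearRegression()
model.fit(X, y)                       # "learn" the relationship from examples

print(model.predict([[10]]))          # ~[21.0], inferred without coded logic
print(model.coef_, model.intercept_)  # ~[2.0] and ~1.0
```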
🤖🎓
Types of Machine Learning
Supervised: makes predictions from a dataset and data labels (associated categorical or numeric values). In NLP, it can be used to classify documents based upon their content, or to predict the next character in generative text applications.
Unsupervised: uses statistical techniques to uncover patterns in a dataset. In NLP, major applications are topic modeling and embeddings: representations of language in a vector space that capture its statistical properties.
Reinforcement: teaches an agent a behavior by optimizing against a target objective with a reward function. It is an important aspect of some large language models (LLMs) as part of their training, via Reinforcement Learning from Human Feedback (RLHF).
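A minimal supervised-NLP sketch of the document classification case above; the library (scikit-learn) and the toy documents and labels are assumptions for illustration.

```python
# Minimal sketch: supervised document classification by content.
# Assumes scikit-learn; the toy documents/labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "the team won the game in overtime",
    "stocks fell sharply after the earnings report",
    "the striker scored a late goal",
    "the market rallied on rate cut hopes",
]
labels = ["sports", "finance", "sports", "finance"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)  # learn word-class associations from labeled examples

print(clf.predict(["goal scored in the final minute"]))  # -> ['sports']
```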
What is Deep Learning?
• Deep Learning is a specialized type of ML that takes
inspiration from the structure of the human brain
• Error is measured on the training data via a loss function
• Backpropagation ("backprop") applies changes to weights in the network as determined from gradients (the direction of greatest decrease of error)
[Diagram: training data flows through the network, producing loss and gradients that update the weights]
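A minimal PyTorch sketch of the loss-gradient-update cycle just described (the tiny one-weight model, data, and learning rate are assumptions for illustration):

```python
# Minimal sketch of one training step: forward pass -> loss -> backprop -> update.
# Assumes PyTorch is installed (pip install torch).
import torch

# A single learnable weight; requires_grad tells autograd to track gradients.
w = torch.tensor(1.0, requires_grad=True)
x, target = torch.tensor(3.0), torch.tensor(9.0)  # true relation: y = 3x

pred = w * x                    # forward pass
loss = (pred - target) ** 2     # measurement of error (squared loss)
loss.backward()                 # backprop: compute dLoss/dw

with torch.no_grad():
    w -= 0.01 * w.grad          # step against the gradient (steepest decrease)
    w.grad.zero_()

print(w)  # nudged from 1.0 toward the true value 3.0
```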
• Deep learning differs from other machine learning in that neural networks are trained with batches (subsets of fixed size) of the training data, as in the sketch below
• Once all the data has gone through the network once, this is referred to as an epoch
[Diagram: Epoch 1 = Batch 1 → Batch 2 → Batch 3 → Batch 4]
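A minimal sketch of batches and epochs in PyTorch; the dataset size and batch size are assumptions chosen so that 16 examples in batches of 4 give the four batches per epoch pictured above.

```python
# Minimal sketch: iterating over a dataset in fixed-size batches, per epoch.
# Assumes PyTorch is installed.
import torch
from torch.utils.data import DataLoader, TensorDataset

# 16 toy examples, batch size 4 -> 4 batches per pass through the data.
data = TensorDataset(torch.randn(16, 3), torch.randn(16, 1))
loader = DataLoader(data, batch_size=4, shuffle=True)

for epoch in range(2):                  # one full pass = one epoch
    for batch_idx, (features, targets) in enumerate(loader, start=1):
        # a real training step (forward/loss/backward) would go here
        print(f"Epoch {epoch + 1}, Batch {batch_idx}: {features.shape}")
```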
(Python) Deep Learning Frameworks
TensorFlow
• Google product
• Graph-based computation, GPU training
• Additional deployment options (TensorFlow Lite, TF.js)
• Easy to use with the integration of Keras into TF 2.x
PyTorch
• Facebook product
• Graph-based computation, GPU training
• PyTorch Mobile for embedded; no web deployment (ONNX?)
• Object-oriented development focus (ML engineering); Lightning is the equivalent of Keras
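A minimal sketch of the Keras-in-TF-2.x workflow mentioned above; the layer sizes and input shape are arbitrary assumptions.

```python
# Minimal sketch: defining and compiling a model with Keras inside TensorFlow 2.x.
# Assumes TensorFlow is installed (pip install tensorflow).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classifier head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, epochs=5, batch_size=32) would then train it.
```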
Language Models
🔠🧪💻
Sequence-to-Sequence Models
• Sequence-to-sequence
(Seq2Seq) neural networks take
a sequence as input and return a
sequence as output
• Applications in language
(generative models, translation,
text-to-speech, summarization),
time series, audio / video
(captioning, transcription)
[Diagram: many-to-one and many-to-many sequence architectures]
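A hedged skeleton of a sequence-to-sequence (encoder-decoder) network in PyTorch; the vocabulary size, dimensions, and GRU-based design are assumptions for illustration, not a specific published architecture.

```python
# Minimal sketch: a sequence-to-sequence (encoder-decoder) skeleton in PyTorch.
# Vocabulary size and dimensions below are arbitrary assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=100, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        # The encoder compresses the input sequence into a final hidden state...
        _, state = self.encoder(self.embed(src))
        # ...which initializes the decoder that emits the output sequence.
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)  # per-step scores over the vocabulary

model = Seq2Seq()
src = torch.randint(0, 100, (2, 7))  # batch of 2 input sequences, length 7
tgt = torch.randint(0, 100, (2, 5))  # corresponding output sequences, length 5
print(model(src, tgt).shape)         # torch.Size([2, 5, 100]) -- many-to-many
```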
Long Short-Term Memory Networks (LSTMs)
• A special type of RNN designed to capture long-term dependencies
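A brief sketch of calling an LSTM layer in PyTorch (the sizes are assumptions): alongside the hidden state h, the LSTM maintains a cell state c, which is what lets it carry information across long spans of a sequence.

```python
# Minimal sketch: an LSTM layer maintains both a hidden state (h) and a
# cell state (c); the cell state carries the long-term information.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(2, 50, 8)     # batch of 2 sequences, 50 steps, 8 features
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([2, 50, 16]) -- hidden state at every step
print(h_n.shape)     # torch.Size([1, 2, 16]) -- final hidden state
print(c_n.shape)     # torch.Size([1, 2, 16]) -- final (long-term) cell state
```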
GPT-1
https://research.aimultiple.com/gpt/
To the Moon?
GPT-2 (2019): 1.5B parameters
GPT-3 (2020): 175B parameters
https://research.aimultiple.com/gpt/
Evaluating Large Language Models
Massive Multitask Language Understanding (MMLU) Performance
As LLMs have become more sophisticated and begun excelling at "few-shot" and "zero-shot" learning tasks, general evaluation has become more challenging.
As such, a series of benchmarks has arisen which are closer to the knowledge and reasoning tasks that would be given to a human.
Some of these benchmarks are composites encompassing existing benchmarks as a suite (e.g. HELM).
Source: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu
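A hedged sketch of how a multiple-choice benchmark like MMLU is typically scored; the `ask_model` function is a purely hypothetical stand-in for any real LLM call, and the sample item is invented.

```python
# Minimal sketch: scoring a model on multiple-choice benchmark questions.
# `ask_model` is a hypothetical stand-in for a real LLM API call.
questions = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "6"], "answer": "B"},
    # ... a real benchmark like MMLU has thousands of such items
]

def ask_model(question, choices):
    """Hypothetical: returns the model's chosen letter, e.g. 'B'."""
    return "B"  # placeholder response

correct = sum(
    ask_model(q["question"], q["choices"]) == q["answer"] for q in questions
)
print(f"Accuracy: {correct / len(questions):.1%}")  # the number plotted above
```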
Chinchilla: Bigger is better?
The Chinchilla model was presented by DeepMind in March 2022 and followed the development of the earlier Gopher model. Its key finding: for a fixed compute budget, model size and training data should be scaled in roughly equal proportion; the 70B-parameter Chinchilla outperformed the much larger 280B-parameter Gopher.
https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI
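A back-of-the-envelope sketch of Chinchilla's compute-optimal heuristic of roughly 20 training tokens per model parameter (per Hoffmann et al., 2022); the exact ratio is a rule of thumb, not an exact law.

```python
# Back-of-the-envelope: Chinchilla's heuristic of ~20 training tokens
# per model parameter for compute-optimal training.
TOKENS_PER_PARAM = 20  # rule of thumb from the Chinchilla paper

for params in (1.5e9, 70e9, 175e9):   # GPT-2, Chinchilla, GPT-3 sizes
    tokens = params * TOKENS_PER_PARAM
    print(f"{params / 1e9:>6.1f}B params -> ~{tokens / 1e12:.1f}T tokens")

# By this yardstick GPT-3 (trained on ~0.3T tokens) was under-trained for
# its size -- Chinchilla's point that bigger isn't automatically better.
```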
🤯
🙃
🙄
LLMs: Player Pianos or beginnings of AGI?
The Shameless Plug
NLP4Free 🔠⚡🤖🧠😃
https://mylesharrison.com/nlp4free/
A Free Natural Language Processing (NLP) microcourse, from basics to deep learning
Let's keep learning together!
Feel free to connect with me and
continue the conversation:
www.mylesharrison.com
linkedin.com/in/mylesharrison/
calendly.com/mylesmharrison
Thanks for
listening!
😀