Chapter 1
NLP vs. LLMs vs. Generative AI
• NLP – Definition: Field focused on processing human language. Examples: tokenization, sentiment analysis, text summarization.
• LLMs – Definition: Advanced models within NLP using deep learning. Examples: GPT-3, BERT, T5.
• Generative AI – Definition: AI that generates new content. Examples: GPT-3, DALL-E, VQ-VAE.
LLM Terms
• GPT-4 (~1.2 trillion parameters): A powerful tool for specific tasks requiring fast processing and large-scale data analysis.
• Human Brain (~80 billion neurons): Unmatched in creativity, emotional intelligence, and adaptive learning.
• Foundation LLM (Large Language Model): AI models trained on massive datasets to understand and generate human-like text.
• Transformer architecture: A neural network architecture that uses self-attention mechanisms to process sequences of data, foundational for
many LLMs.
• Attention Mechanism: A technique within transformers where each word in a sentence is weighted by its relevance to other words, allowing the model to focus on the most relevant context.
• Parameters: Numeric values within the model that determine its learning capability (e.g., GPT-3 has 175 billion parameters).
• Token: Smallest unit of text (words, characters, or subwords) processed by the model.
• Context Window: The amount of text the model can consider at once, typically defined by the number of tokens. For example, Microsoft Copilot has a context window of 4096 tokens when working with OpenAI GPT-4o, which allows Copilot to handle a larger amount of text compared to GPT-3's 2048-token context window.
• Pre-Training: The initial phase where a model is trained on a vast dataset to learn general language patterns.
• Fine-tuning: Adapting a pre-trained model to a specific task or dataset for improved performance.
• Prompt Engineering: Crafting specific input queries to elicit desired outputs from the model.
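To make the Token and Context Window terms concrete, here is a minimal sketch using the Hugging Face Transformers library (the same library used in Tutorial 1 below); the choice of the GPT-2 tokenizer and the sample sentence are illustrative assumptions, not part of the original material.

from transformers import AutoTokenizer

# Load a tokenizer (GPT-2 chosen only as an example; other models' tokenizers work similarly)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large Language Models process text as tokens."
tokens = tokenizer.tokenize(text)       # subword strings the tokenizer produces
token_ids = tokenizer.encode(text)      # numeric IDs the model actually sees

print("Tokens:", tokens)
print("Token IDs:", token_ids)
print("Token count:", len(token_ids))
# The prompt (plus the expected output) must fit inside the model's context window,
# e.g. 2048 tokens for GPT-3 as mentioned above.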
Imagine a Transformer as a highly specialized calculator for text, capable of processing language efficiently and
accurately. In contrast, the human brain is like a vast, adaptive network, capable of creativity, emotions, and
holistic understanding.
Transformers are powerful tools that help AI understand and generate human language by looking at all parts of a sentence
together and figuring out how they relate to each other.
LLM Challenges: Hallucination and Biased Output
Hallucination (confidently presenting inaccurate or nonsensical information) - Example
• Input Prompt: "Tell me about the moon landing in 1969."
• Hallucinated Response: "In 1969, astronauts from the fictional
country of Zogonia landed on the moon. They discovered a hidden
civilization of moon people who communicated using light signals.
The Zogonian astronauts brought back samples of moon cheese,
which became a popular delicacy on Earth."
• Clearly, this response is completely fabricated and not based on any
factual information. It's important for AI users to verify the
information from trusted sources, especially when it comes to
historical events or scientific facts.
Biased answer
• Ethical Considerations: LLMs may inadvertently produce biased or harmful outputs, reflecting biases in their training data, e.g., generating gender-stereotypical job suggestions like "Women are better suited for nursing than engineering."
• "Which is better, Android or iOS?"
• Biased Response: "iOS is definitely superior to Android in every way. iPhones have better build
quality, more consistent updates, and overall, it's the only platform worth using. Android phones
are just too fragmented and laggy."
• This response is biased because it expresses a strong preference for one operating system over
the other without acknowledging any potential strengths of the Android platform. It's important
for AI to remain neutral and present balanced information, allowing users to form their own
opinions.
• LLMs also face other challenges, including data privacy and security concerns, bias and fairness issues, and high computational resource requirements.
• Data Collection and Filtering:
• Large Text Data: The process begins with gathering a vast amount of unstructured text data from various sources such as books, articles, and web content.
• Quality Filtering: This data is then filtered to ensure high quality. Typically, 1-3% of the original tokens (words or characters) are filtered out to improve the training dataset's quality. A minimal sketch of such a filter follows this item.
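As a rough illustration of what "quality filtering" can mean in practice, the sketch below applies two simple heuristics (minimum length and a maximum share of non-alphabetic characters); real LLM pipelines use far more elaborate filters, so treat this purely as an assumption-laden toy example.

def keep_document(text, min_chars=200, max_non_alpha_ratio=0.3):
    # Heuristic 1: drop very short fragments (menus, captions, boilerplate)
    if len(text) < min_chars:
        return False
    # Heuristic 2: drop text dominated by symbols, markup, or numbers
    non_alpha = sum(1 for c in text if not (c.isalpha() or c.isspace()))
    return non_alpha / len(text) <= max_non_alpha_ratio

raw_documents = [
    "Click here!!! $$$ >>> 1234567890 <<<",
    "Transformers are a neural network architecture built around self-attention. "
    "They process all tokens in a sequence in parallel, which makes training on "
    "large text corpora efficient and enables very large language models.",
]
filtered = [d for d in raw_documents if keep_document(d)]
print(f"kept {len(filtered)} of {len(raw_documents)} documents")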
• Embedding and Transformation:
• Embedding Layer + Transformer: The filtered text data is passed through an embedding layer and a transformer within the Large Language Model (LLM). This step converts the text into numerical representations (embeddings) that capture the semantic meaning of the words.
• Token Representation Example:
• The image provides an example of how tokens are represented (a runnable sketch of this mapping follows this item):
• Token String: Words like 'The' and 'teacher'.
• Token ID: Numerical IDs assigned to each token (e.g., 37 for 'The' and 3145 for 'teacher').
• Embedding / Vector Representation: Each token is converted into a dense vector that captures its semantic meaning (e.g., [-0.0513, -0.0584, 0.0230, ...] for 'The').
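The following minimal sketch reproduces this token → ID → vector pipeline with PyTorch and the Hugging Face Transformers library; the choice of bert-base-uncased is an illustrative assumption, so the printed IDs and vector values will differ from the exact numbers shown in the image.

import torch
from transformers import AutoTokenizer, AutoModel

# bert-base-uncased is an illustrative choice; any encoder model exposes embeddings similarly
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "The teacher explains transformers."
inputs = tokenizer(sentence, return_tensors="pt")

# Token strings and their numeric IDs
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
print(inputs["input_ids"][0].tolist())

# Embedding layer: maps each token ID to a dense vector (768 dimensions for BERT-base)
with torch.no_grad():
    embeddings = model.get_input_embeddings()(inputs["input_ids"])
print(embeddings.shape)       # e.g. torch.Size([1, 7, 768])
print(embeddings[0, 1, :3])   # first few values of one token's vector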
• Model Types:
• The image highlights three types of models used in LLMs (a short loading sketch follows this list):
• Encoder Only Models: Focus on understanding the input data.
• Encoder Decoder Models: Use both encoders and decoders to understand and generate data.
• Decoder Only Models: Focus on generating data from given input.
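As an illustration of these three families, the sketch below loads one well-known representative of each with the Hugging Face Transformers Auto classes; the specific model names (BERT, T5, GPT-2) are common examples chosen here, not the only options.

from transformers import AutoModel, AutoModelForSeq2SeqLM, AutoModelForCausalLM

# Encoder-only: good at understanding input (classification, embeddings)
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# Encoder-decoder: reads input with an encoder, generates output with a decoder (translation, summarization)
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Decoder-only: generates text token by token from a prompt (GPT-style models)
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

for name, m in [("encoder-only", encoder_only),
                ("encoder-decoder", encoder_decoder),
                ("decoder-only", decoder_only)]:
    print(name, type(m).__name__, sum(p.numel() for p in m.parameters()), "parameters")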
• Hardware:
• The training process involves the use of GPUs (Graphics Processing Units), which accelerate the large-scale matrix operations that training requires.
• Self-Attention Mechanism: The encoder applies the self-attention mechanism to the input vectors. This step helps the model to weigh the importance of each word in the context of the entire input sequence. It allows the model to focus on relevant parts of the input when forming an understanding.
• Query, Key, and Value Vectors: Each word's embedding is transformed into three vectors: query (Q), key (K), and value (V). The attention scores are calculated as the dot product of the query and key vectors, which are then scaled and passed through a softmax function to obtain attention weights. These weights are used to compute a weighted sum of the value vectors (see the scaled dot-product sketch after this item).
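Here is a compact sketch of that scaled dot-product attention, written in plain NumPy so the arithmetic is explicit; the tiny dimensions and random matrices are illustrative assumptions rather than values from a real model.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # similarity of each query with every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax -> attention weights
    return weights @ V, weights                                # weighted sum of the value vectors

# Toy example: 4 tokens, dimension 8 (random numbers stand in for learned projections)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # each row sums to 1: how much each token attends to the others
print(output.shape)       # (4, 8): one context-aware vector per token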
• Feed-Forward Neural Network: After self-attention, the output goes through a feed-forward neural network (FFN). This step helps in learning more complex patterns and relationships in the data. The FFN is applied to each position separately and identically.
• Add & Norm: The outputs of the self-attention and FFN sub-layers are each added to their inputs (residual connections) and then normalized. This helps in stabilizing the training process and preserving the learned information.
• Stacking Layers: The encoder consists of multiple layers, each containing the self-attention mechanism, feed-forward network, and normalization steps. Stacking these layers helps in learning hierarchical representations and capturing complex dependencies (a minimal encoder-layer sketch follows below).
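To tie these pieces together, here is a minimal encoder layer sketch in PyTorch (self-attention, FFN, residual connections, layer normalization, and stacking); it mirrors the structure described above but is a simplified illustration under assumed hyperparameters, not a production Transformer implementation.

import torch
import torch.nn as nn

class MiniEncoderLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # self-attention: Q, K, V all come from x
        x = self.norm1(x + attn_out)       # Add & Norm (residual connection + layer norm)
        x = self.norm2(x + self.ffn(x))    # position-wise FFN, then Add & Norm again
        return x

# Stack several layers to form a small encoder and run a dummy batch through it
encoder = nn.Sequential(*[MiniEncoderLayer() for _ in range(4)])
dummy = torch.randn(2, 10, 64)             # batch of 2 sequences, 10 tokens, d_model=64
print(encoder(dummy).shape)                # torch.Size([2, 10, 64])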
FFN vs CNN vs RNN in LLMs
• Feed-Forward Networks (FFNs) – Usage in LLMs: integral part of Transformer layers. Primary application: non-linear transformations in Transformers.
• Convolutional Neural Networks (CNNs) – Usage in LLMs: rarely used. Primary application: image processing, local dependency capture in text.
• Recurrent Neural Networks (RNNs) – Usage in LLMs: not used in modern LLMs, replaced by Transformers. Primary application: sequential data processing, temporal dependencies.
Transformers use a mechanism called self-attention that allows them to process all tokens in the input sequence (i.e., the context) simultaneously. RNNs, by contrast, handle long-range dependencies poorly and cannot process tokens in parallel.
Transformer Architecture
Why are RNNs replaced while FFNs are kept in Transformers?
• The absence of RNNs and the presence of Feed Forward Networks (FFNs) in
Transformers are key design choices that contribute to the efficiency and
effectiveness of the model.
Decoder
Tutorial 1 (06.01.2025):
Using the Hugging Face Transformers library, create a Python script to perform sentiment analysis on a given set of movie reviews: analyze the sentiment of each review and print the results with confidence scores.
Hugging Face Transformers
• Library: Hugging Face Transformers is a library that provides pre-trained models and tools for various NLP tasks. It includes implementations of many state-of-the-art models like BERT, GPT-2, RoBERTa, and more.
• Model Hub: Hugging Face also hosts a model hub where you can find and share pre-trained models.
• pipeline is a high-level API provided by the Hugging Face Transformers library. It simplifies the use of various NLP models for specific tasks, such as sentiment analysis, text generation, question answering, and more.

pip install transformers

from transformers import pipeline

# Load pre-trained sentiment-analysis pipeline
classifier = pipeline('sentiment-analysis')

# Sample text for sentiment analysis
texts = [
    "I love this product! It's absolutely amazing.",
    "I'm very disappointed with the service.",
    "The movie was okay, not great but not terrible either."
]

# Perform sentiment analysis on the sample text
results = classifier(texts)

# Print each text with its predicted label and confidence score
for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} (score: {result['score']:.4f})")
Lab Exercise
• vocab = {'I': 0, 'am': 1, 'a': 2, 'student': 3, '<pad>': 4}
• sentence = ['I', 'am', 'a', 'student']
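One possible direction for this exercise, shown as a hedged sketch: it assumes the task is to convert the sentence into token IDs with the given vocab, pad it to a fixed length, and look the IDs up in an embedding layer (the max_len of 6 and the embedding size of 4 are arbitrary choices, not requirements from the original exercise).

import torch
import torch.nn as nn

vocab = {'I': 0, 'am': 1, 'a': 2, 'student': 3, '<pad>': 4}
sentence = ['I', 'am', 'a', 'student']

# Map tokens to IDs and pad the sequence to a fixed length with the <pad> token
max_len = 6
token_ids = [vocab[w] for w in sentence]
token_ids += [vocab['<pad>']] * (max_len - len(token_ids))
print(token_ids)              # [0, 1, 2, 3, 4, 4]

# Look the IDs up in a small (randomly initialised) embedding layer
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4, padding_idx=vocab['<pad>'])
vectors = embedding(torch.tensor(token_ids))
print(vectors.shape)          # torch.Size([6, 4])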