
Chapter 1

Natural Language Processing (NLP)


• NLP is a field of artificial intelligence that focuses on the interaction between computers and humans
through natural language. The goal is to enable computers to understand, interpret, and generate human
language.
• Key Components:
• Tokenization: Breaking text into smaller units like words or sentences.
• Part-of-Speech Tagging: Identifying the grammatical parts of speech in a text.
• Named Entity Recognition (NER): Detecting and classifying entities like names, dates, and locations.
• Sentiment Analysis: Determining the sentiment or emotion behind a text.
• Machine Translation: Translating text from one language to another.
• Text Summarization: Condensing long texts into shorter summaries.
• Speech Recognition: Converting spoken language into text.
• Applications:
• Chatbots and virtual assistants
• Text analysis and sentiment analysis
• Language translation
• Information retrieval
• Speech-to-text systems

Sentiment Analysis in NLP vs. in LLMs
• Sentiment Analysis in NLP (Traditional NLP Approaches): Rule-Based Methods
• Sentiment Analysis in LLMs (Large Language Models): Contextual Understanding and Sentiment Analysis Models
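
To make the key NLP components listed above concrete, here is a minimal, hedged sketch using spaCy (an assumed library choice, not one named in these slides) that runs tokenization, part-of-speech tagging, and named entity recognition on one sentence; sentiment analysis, translation, and summarization typically need additional models, such as the Hugging Face pipeline used later in these slides.

# Hedged sketch: three of the classic NLP components on one sentence.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple opened a new store in Chennai on Monday, and customers loved it.")

# Tokenization: break the text into word-level tokens.
print("Tokens:", [token.text for token in doc])

# Part-of-Speech tagging: the grammatical category of each token.
print("POS tags:", [(token.text, token.pos_) for token in doc])

# Named Entity Recognition: organisations, locations, dates, etc.
print("Entities:", [(ent.text, ent.label_) for ent in doc.ents])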
Overview of LLM Concepts
• LLM Jargons and Concepts

• Generative AI refers to artificial intelligence systems that create new content, such as text, images, or music, by learning patterns from existing data.

• Large Language Model (LLM) refers to AI models trained on vast amounts of text data to understand and generate human-like language.
Enhanced Capabilities of LLM over NLP
1.Contextual Understanding: LLMs can comprehend and generate text that is
contextually accurate and coherent, making interactions more natural and effective.
2.Versatility: They excel in a wide range of tasks including text generation,
summarization, translation, and sentiment analysis.
3.Scalability: LLMs handle vast amounts of data and complex patterns, leading to more
robust and accurate models.
4.Adaptability: They can be fine-tuned for specific tasks, improving performance on
domain-specific applications.
Comparison

Aspect       | NLP                                                  | LLMs                                                       | Generative AI
Definition   | Field focused on processing human language           | Advanced models within NLP using deep learning             | AI that generates new content
Scope        | Broad, includes various tasks and techniques          | Subset of NLP, focuses on deep learning models             | Broad, includes text, images, music, and more
Techniques   | Statistical methods, ML, deep learning                | Transformers, deep learning                                | GANs, VAEs, transformers
Applications | Sentiment analysis, NER, translation                  | Text generation, question answering, language translation | Image generation, text generation, music creation
Examples     | Tokenization, sentiment analysis, text summarization  | GPT-3, BERT, T5                                            | GPT-3, DALL-E, VQ-VAE
LLM Terms
• GPT-4 (reportedly ~1.2 trillion parameters): a powerful tool for specific tasks requiring fast processing and large-scale data analysis.
• Human brain (~80 billion neurons): unmatched in creativity, emotional intelligence, and adaptive learning.

• Foundation LLM (Large Language Model): AI models trained on massive datasets to understand and generate human-like text.

• Transformer architecture: A neural network architecture that uses self-attention mechanisms to process sequences of data, foundational for

many LLMs.

• Attention Mechanism: A technique within transformers where each word in a sentence is weighted by its relevance to other words,

allowing the model to capture context more effectively.

• Parameters: Numeric values within the model that determine its learning capability (e.g., GPT-3 has 175 billion parameters !!!).

• Token: Smallest unit of text (words, characters, or subwords) processed by the model.
• Context Window: The amount of text the model can consider at once, typically defined by the number of tokens. For example, Microsoft Copilot has a context window of 4096 tokens when working with OpenAI GPT-4o, which allows Copilot to handle a larger amount of text than GPT-3's 2048-token window. (A small tokenizer sketch follows this list.)

• Pre-Training: The initial phase where a model is trained on a vast dataset to learn general language patterns.

• Fine-tuning: Adapting a pre-trained model to a specific task or dataset for improved performance.

• Prompt Engineering: Crafting specific input queries to elicit desired outputs from the model.
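
As a hedged illustration of the Token and Context Window terms above, the sketch below uses the Hugging Face GPT-2 tokenizer (an assumption; the slides do not prescribe a tokenizer) to show how a sentence is split into subword tokens, and that the token count is what a context window actually limits.

# Hedged sketch: counting tokens with a subword tokenizer.
# Assumes: pip install transformers (GPT-2 tokenizer chosen only for illustration).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large Language Models process text as tokens, not words."

tokens = tokenizer.tokenize(text)    # subword strings
token_ids = tokenizer.encode(text)   # integer IDs the model actually sees

print("Tokens:   ", tokens)
print("Token IDs:", token_ids)
print("Token count:", len(token_ids))

# A context window of, say, 2048 tokens means the model can attend to at most
# 2048 such IDs (prompt plus generated text) at once; longer inputs are truncated.
CONTEXT_WINDOW = 2048  # hypothetical limit used only for this check
print("Fits in window:", len(token_ids) <= CONTEXT_WINDOW)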

• Traditional Word Weighting (e.g., TF-IDF) in NLP: Assigns a static weight to each word based on how often it appears in a document relative to the rest of the corpus, regardless of the surrounding words.

• Attention Mechanism in LLMs: Assigns dynamic weights based on the context of the word in relation to other words in the sentence.

Prompt engineering is a technique aimed at optimizing how queries are presented to the AI model to elicit better responses, rather than training users.
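
A minimal sketch of the contrast above, assuming scikit-learn and NumPy are available (neither is named in the slides): TF-IDF gives each word one fixed weight per document, while a toy self-attention computation (with random placeholder embeddings) re-weights the same word differently depending on the other words around it.

# Hedged sketch: static TF-IDF weights vs. context-dependent attention weights.
# Assumes: pip install scikit-learn numpy
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF: one static weight per word per document.
docs = ["the climate is very cool", "the movie was very cool"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs).toarray()
for word, col in vec.vocabulary_.items():
    print(f"{word:8s} doc0={tfidf[0, col]:.3f} doc1={tfidf[1, col]:.3f}")

# Toy self-attention: weights depend on the surrounding words.
rng = np.random.default_rng(0)
tokens = ["the", "climate", "is", "very", "cool"]
d = 8
X = rng.normal(size=(len(tokens), d))                # pretend embeddings (random, illustrative)
Q, K = X @ rng.normal(size=(d, d)), X @ rng.normal(size=(d, d))

scores = Q @ K.T / np.sqrt(d)                        # relevance of every token to every other token
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax

print("Attention weights for 'cool' over the sentence:")
print(dict(zip(tokens, weights[tokens.index("cool")].round(3))))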
Fine-tuning in LLMs vs. Optimization in ML
• Fine-tuning is a form of optimization focused on adapting a pre-trained model to a specific task, whereas optimization in machine learning encompasses a variety of techniques aimed at improving model performance by adjusting both hyperparameters and parameters.
• Fine-tuning a pre-trained BERT model for a specific NLP task is an example of optimization: you adjust the model's weights by continuing training on task-specific data, which optimizes the model for that particular task.

About Parameters
In a simple neural network with one input layer, one hidden layer, and one output layer:
• If the input layer has 3 neurons, the hidden layer has 4 neurons, and the output layer has 1 neuron, the number of parameters would be:
• Weights: (3 inputs × 4 hidden neurons) + (4 hidden neurons × 1 output neuron) = 12 + 4 = 16
• Biases: 4 biases for the hidden layer + 1 bias for the output layer = 5
• Total Parameters: (3 × 4) + (4 × 1) + 4 + 1 = 21
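
The same 3-4-1 count can be verified in code. A minimal sketch, assuming PyTorch is available (the slides do not specify a framework):

# Hedged sketch: counting parameters of a 3-4-1 feed-forward network.
# Assumes: pip install torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 4),   # 3*4 weights + 4 biases = 16 parameters
    nn.ReLU(),
    nn.Linear(4, 1),   # 4*1 weights + 1 bias  = 5 parameters
)

total = sum(p.numel() for p in model.parameters())
print("Total parameters:", total)  # prints 21, matching the hand calculation above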
FFN vs CNN vs RNN in LLMs

Network Type                         | Usage in LLMs                                      | Primary Application
Feed-Forward Networks (FFNs)         | Integral part of Transformer layers                | Non-linear transformations in Transformers
Convolutional Neural Networks (CNNs) | Rarely used in LLMs                                | Image processing, local dependency capture in text
Recurrent Neural Networks (RNNs)     | Not used in modern LLMs, replaced by Transformers  | Sequential data processing, temporal dependencies

Aspect                  | RNNs                                | Transformers
Processing              | Sequential (one step at a time)     | Parallel (entire sequence at once)
Long-Range Dependencies | Struggles with very long sequences  | Efficiently captures long-range dependencies
Training Speed          | Slower due to sequential nature     | Faster due to parallelization
Complexity              | Simpler architecture                | More complex due to self-attention
Resource Usage          | Less computationally intensive      | Requires more computational resources
Applications            | Time series, smaller sequences      | NLP tasks, large datasets, long sequences
Qns?
• What comes next after generative AI?
• Can autonomous AI be used for real-time issues when a spaceship is carrying astronauts?
• What would be the output of sentiment analysis from traditional NLP versus sentiment analysis from an LLM for the statement "the climate is very cool"?
• What are GANs, VAEs, and transformers?
• Does NLP assign a weight to each word in a sentence?
• How does TF-IDF differ from the attention mechanism in LLMs?
• Is the self-attention mechanism a part of the transformer?
• What is a context window in an LLM?
• Is prompt engineering meant to train the user, or to train the software so that models understand the query better?
• How many parameters do GPT-3 and GPT-4 have?
• Approximately how many parameters, at most, can a human brain handle?
• Is GPT-4 better than the human brain?
• Are the transformer and the brain similar?
• Are neurons and parameters similar?

Imagine a Transformer as a highly specialized calculator for text, capable of processing language efficiently and
accurately. In contrast, the human brain is like a vast, adaptive network, capable of creativity, emotions, and
holistic understanding.
Transformers are powerful tools that help AI understand and generate human language by looking at all parts of a sentence
together and figuring out how they relate to each other.

Overview of LLM Concepts


Overview of Training of LLMs from Scratch
➢Data Collection and Preparation: Gather diverse text data
(e.g., books, articles, web content) to ensure the model
learns from varied language patterns.
➢Tokenization: Break down text into smaller units (tokens)
like words, subwords, or characters for the model to process
efficiently.
➢Embedding Representation: Convert tokens into dense
numerical vectors (embeddings) that capture their semantic
meaning in a high-dimensional space.
➢Self-Supervised Learning: Train the model on tasks like
predicting the next token or filling blanks, leveraging context
without needing labeled data.
➢Optimization through Backpropagation: Update billions of
parameters iteratively to minimize errors and improve the
model’s ability to predict accurately.
➢Foundation for Generalization: Produce a versatile model
capable of understanding and generating human-like text,
ready for fine-tuning on specific tasks.
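
As a hedged illustration of the tokenization and self-supervised next-token objective outlined above, the snippet below loads a small pre-trained causal language model (GPT-2, chosen only because it is small and public) and computes the next-token prediction loss that backpropagation would minimize during training.

# Hedged sketch: the self-supervised next-token objective on one sentence.
# Assumes: pip install transformers torch (GPT-2 used only as a small example model).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Large language models learn by predicting the next token."
inputs = tokenizer(text, return_tensors="pt")

# With labels equal to the inputs, the model is scored on predicting each token
# from the tokens before it; this cross-entropy loss is what training minimizes.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print("Tokens:", tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
print("Next-token prediction loss:", float(outputs.loss))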
Major Issues with LLM Models

Hallucination
Biased output
Hallucinations (inaccurate or nonsensical information stated confidently): Example
• Input Prompt: "Tell me about the moon landing in 1969."
• Hallucinated Response: "In 1969, astronauts from the fictional
country of Zogonia landed on the moon. They discovered a hidden
civilization of moon people who communicated using light signals.
The Zogonian astronauts brought back samples of moon cheese,
which became a popular delicacy on Earth."
• Clearly, this response is completely fabricated and not based on any
factual information. It's important for AI users to verify the
information from trusted sources, especially when it comes to
historical events or scientific facts.
Biased answer
• Ethical Considerations: LLMs may inadvertently produce biased or harmful outputs, reflecting biases in their training data, e.g., generating gender-stereotypical job suggestions like "Women are better suited for nursing than engineering."
• "Which is better, Android or iOS?"
• Biased Response: "iOS is definitely superior to Android in every way. iPhones have better build
quality, more consistent updates, and overall, it's the only platform worth using. Android phones
are just too fragmented and laggy."
• This response is biased because it expresses a strong preference for one operating system over
the other without acknowledging any potential strengths of the Android platform. It's important
for AI to remain neutral and present balanced information, allowing users to form their own
opinions.

• LLMs also face other challenges, including data privacy and security concerns, bias and fairness issues, and high computational resource requirements.
• Data Collection and Filtering:
  • Large Text Data: The process begins with gathering a vast amount of unstructured text data from various sources such as books, articles, and web content.
  • Quality Filtering: This data is then filtered to ensure high quality. Typically, 1-3% of the original tokens (words or characters) are filtered out to improve the training dataset's quality.
• Embedding and Transformation:
  • Embedding Layer + Transformer: The filtered text data is passed through an embedding layer and a transformer within the Large Language Model (LLM). This step converts the text into numerical representations (embeddings) that capture the semantic meaning of the words.
• Token Representation Example:
  • Token String: Words like 'The' and 'teacher'.
  • Token ID: Numerical IDs assigned to each token (e.g., 37 for 'The' and 3145 for 'teacher').
  • Embedding / Vector Representation: Each token is converted into a dense vector that captures its semantic meaning (e.g., [-0.0513, -0.0584, 0.0230, ...] for 'The').
• Model Types: Three types of models are used in LLMs:
  • Encoder Only Models: Focus on understanding the input data.
  • Encoder Decoder Models: Use both encoders and decoders to understand and generate data.
  • Decoder Only Models: Focus on generating data from given input.
• Hardware: The training process involves the use of GPUs (Graphics Processing Units).
• Self-Attention Mechanism: The encoder applies the self-attention mechanism to the input vectors. This step helps the model weigh the importance of each word in the context of the entire input sequence and allows it to focus on relevant parts of the input when forming an understanding.
• Query, Key, and Value Vectors: Each word's embedding is transformed into three vectors: query (Q), key (K), and value (V). The attention scores are calculated as the dot product of the query and key vectors, which are then scaled and passed through a softmax function to obtain attention weights. These weights are used to compute a weighted sum of the value vectors. (A minimal sketch of this computation follows this list.)
• Feed-Forward Neural Network: After self-attention, the output goes through a feed-forward neural network (FFN). This step helps in learning more complex patterns and relationships in the data. The FFN is applied to each position separately and identically.
• Add & Norm: The outputs of the self-attention and FFN layers are added to their inputs (residual connections) and then normalized. This helps stabilize the training process and preserve the learned information.
• Stacking Layers: The encoder consists of multiple layers, each containing the self-attention mechanism, feed-forward network, and normalization steps. Stacking these layers helps in learning hierarchical representations and capturing complex dependencies.
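
A minimal sketch, assuming NumPy, of the query/key/value attention computation referenced above; the embeddings and projection matrices here are random placeholders rather than learned weights, so only the mechanics are meaningful.

# Hedged sketch: scaled dot-product self-attention for a 4-token sequence.
# Assumes: pip install numpy
import numpy as np

rng = np.random.default_rng(42)
seq_len, d_model = 4, 8                     # 4 tokens, 8-dimensional embeddings

X = rng.normal(size=(seq_len, d_model))     # token embeddings (random, for illustration)
W_q = rng.normal(size=(d_model, d_model))   # learned projection matrices in a real model
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v         # query, key, value vectors per token

scores = Q @ K.T / np.sqrt(d_model)         # scaled dot products: relevance of token j to token i

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

weights = softmax(scores)                   # attention weights, each row sums to 1
output = weights @ V                        # weighted sum of value vectors

print("Attention weights:\n", weights.round(3))
print("Output shape:", output.shape)        # (4, 8): one context-aware vector per token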
RNN, LSTM, GRU and Transformers

Transformers use a mechanism called self-attention that allows them to process all tokens in the input sequence (i.e., the full context) simultaneously. RNNs, by contrast, handle long-range dependencies poorly and offer no parallelism, because they process tokens one step at a time.
Transformer Architecture
Why are RNNs replaced but FFNs retained in Transformers?
• The absence of RNNs and the presence of Feed-Forward Networks (FFNs) in Transformers are key design choices that contribute to the efficiency and effectiveness of the model.
Decoder
Tutorial 1 (06.01.2025):

Using the Hugging Face Transformers library, create a Python script to perform sentiment analysis on a given set of movie reviews, analyze the sentiment of each review, and print the results with confidence scores.

Hugging Face Transformers
• Library: Hugging Face Transformers is a library that provides pre-trained models and tools for various NLP tasks. It includes implementations of many state-of-the-art models like BERT, GPT-2, RoBERTa, and more.
• Model Hub: Hugging Face also hosts a model hub where you can find and share pre-trained models.
• pipeline is a high-level API provided by the Hugging Face Transformers library. It simplifies the use of various NLP models for specific tasks, such as sentiment analysis, text generation, question answering, and more.

pip install transformers

from transformers import pipeline

# Load the pre-trained sentiment-analysis pipeline
classifier = pipeline('sentiment-analysis')

# Sample text for sentiment analysis
texts = [
    "I love this product! It's absolutely amazing.",
    "I'm very disappointed with the service.",
    "The movie was okay, not great but not terrible either."
]

# Perform sentiment analysis on the sample text
results = classifier(texts)

# Print each review with its predicted label and confidence score
for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} (confidence: {result['score']:.4f})")
Lab Exercise
• vocab = {'I': 0, 'am': 1, 'a': 2, 'student': 3, '<pad>': 4}
• sentence = ['I', 'am', 'a', 'student']

• How Embedding Layer Works:


• Initialization: The embedding layer is initialized with random weights. Each
index corresponds to a unique row in the embedding matrix.
• Lookup: When a token index is passed to the embedding layer, it retrieves
the corresponding dense vector (row) from the embedding matrix.
• Learning: During training, the weights (dense vectors) in the embedding
matrix are adjusted so that similar words have similar vectors.
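
A minimal sketch of the lookup described above, assuming PyTorch (the framework is suggested by the tensor printouts later in the lab but not stated); the embedding dimension of 8 matches the printed vectors, and the weights are randomly initialized, so the exact numbers will differ from the slide.

# Hedged sketch: embedding lookup for the lab's vocabulary.
# Assumes: pip install torch
import torch
import torch.nn as nn

vocab = {'I': 0, 'am': 1, 'a': 2, 'student': 3, '<pad>': 4}
sentence = ['I', 'am', 'a', 'student']

# Initialization: random weight matrix of shape (vocab_size, embedding_dim).
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

# Lookup: token indices select rows of the embedding matrix.
token_indices = torch.tensor([[vocab[w] for w in sentence]])  # shape (1, 4)
dense_vectors = embedding(token_indices)                      # shape (1, 4, 8)

print("Token indices:", token_indices)
print("Dense vectors:", dense_vectors)

# Learning: during training, these rows are updated by backpropagation so that
# similar words end up with similar vectors.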
Embedding Layer

Input Sentence: "I am a student"

1. Tokenization: The sentence "I am a student" is tokenized into four tokens: ["I", "am", "a", "student"].
2. Token Indices: Each token is converted into an index using the vocabulary: [0, 1, 2, 3].
3. Embedding Layer: The embedding layer converts each token index into a dense vector. Since we have four tokens, we get four dense vectors.

Dense Vectors (each row is the embedding of one token):
+-----+--------------------------+
|  0  | [ 0.3126, -0.3404, ...]  |  <- Dense vector for "I"       (index 0)
|  1  | [ 0.3983, -0.6919, ...]  |  <- Dense vector for "am"      (index 1)
|  2  | [ 0.0448, -1.3975, ...]  |  <- Dense vector for "a"       (index 2)
|  3  | [-0.4251, -0.8090, ...]  |  <- Dense vector for "student" (index 3)
+-----+--------------------------+
Final Embedding
• Input Sentence: "I am a student"
• Token Indices: [0, 1, 2, 3]
• Dense Vectors: Embeddings obtained from the embedding layer.
• Positional Encoding: A matrix with sine and cosine functions added to the embeddings to encode positional information.
• Final Embeddings: Dense vectors with positional information added, allowing the model to understand the order of the tokens.

Token indices: tensor([[0, 1, 2, 3]])


Dense vectors: tensor([[[ 0.4030, -0.3771, -1.2724, 0.2956, 1.4098, 0.3282, -0.3339, 0.1200],
[ 0.5299, -1.2559, -0.3435, 0.2764, 1.1120, 0.4046, -0.5038, 0.3093],
[-0.5219, 0.8253, 0.0578, -1.1450, -0.7489, -0.8933, 1.0720, -0.1573],
[-0.6408, -0.2061, 0.7078, -0.5652, -0.0746, -0.5749, 1.1648, -0.1233]]])
Positional Encoding: tensor([[ 0.0000, 1.0000, 0.0000, 1.0000, 0.0000, 1.0000, 0.0000, 1.0000],
[ 0.8415, 0.5403, 0.0084, 0.5403, 0.0084, 0.5403, 0.0084, 0.5403],
[ 0.9093, -0.4161, 0.0168, 0.4161, 0.0168, 0.4161, 0.0168, 0.4161],
[ 0.1411, -0.9900, 0.0252, 0.2904, 0.0252, 0.2904, 0.0252, 0.2904]])
Final Embeddings with Positional Encoding: tensor([[ 0.4030, 0.6229, -1.2724, 1.2956, 1.4098, 1.3282, -0.3339, 1.1200],
[ 1.3714, -0.7156, -0.3351, 0.8167, 1.1204, 0.9449, -0.4954, 0.8496],
[ 0.3874, 0.4092, 0.0746, -0.7289, -0.7321, -0.4772, 1.0888, 0.2588],
[-0.4997, -1.1961, 0.7330, -0.2748, -0.0494, -0.2845, 1.1900, 0.1671]])
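
A hedged sketch of how the positional encoding and final embeddings above could be produced, assuming PyTorch and the standard sinusoidal formulation from the original Transformer paper; the embedding weights are random and the exact positional-encoding convention may differ from the one used in the lab, so the printed values will not match the slide exactly.

# Hedged sketch: sinusoidal positional encoding added to token embeddings.
# Assumes: pip install torch
import math
import torch
import torch.nn as nn

vocab = {'I': 0, 'am': 1, 'a': 2, 'student': 3, '<pad>': 4}
d_model, seq_len = 8, 4

token_indices = torch.tensor([[0, 1, 2, 3]])          # "I am a student"
dense_vectors = nn.Embedding(len(vocab), d_model)(token_indices)

# Sinusoidal positional encoding: sine on even dimensions, cosine on odd ones.
pe = torch.zeros(seq_len, d_model)
position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
pe[:, 0::2] = torch.sin(position * div_term)
pe[:, 1::2] = torch.cos(position * div_term)

final_embeddings = dense_vectors + pe                 # broadcast over the batch dimension

print("Token indices:", token_indices)
print("Positional Encoding:\n", pe)
print("Final Embeddings with Positional Encoding:\n", final_embeddings)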
Transformer Architecture
