
Before we get started, feel free to drop in the chat:

1. Who are you? (Introduce yourself, feel free to drop your LinkedIn)

2. Where are you based and what is your current role?

3. One question you have about NLP, data science, or AI
LLMs for me

Myles Harrison,
AI Consultant & Trainer

Introduction to LLMs & Generative Text
January 6th, 2025

llmsfor.me
Agenda

01 Welcome & Course Overview

02 Introduction to Large Language Models

03 Generative Text Models

04 Conclusion
Manifesto

Knowledge is only valuable if it is useful.

The best way to learn is by doing.

Learning is a non-linear process.

Learning is not a journey, it is guided exploration.

Teaching and learning are complementary.
Course Overview
The course will run 6 weeks, from Monday, January 6th to Monday, February 10th, 2025.

Course sessions are held 7-10 PM EST on Monday evenings.

COURSE SESSIONS: Mondays 7-10 PM EST
Content & Delivery

The course will span 6 live sessions of 3 hours each, covering the topics in the curriculum shown on the right.

Course sessions will be held online through Google Meet.

Slides will be provided in PDF format and code in Jupyter notebooks, which can either be run locally or through Google Colab.

1. Introduction to LLMs and Generative Text
2. Fine-tuning LLMs, PEFT and Quantization
3. GPT and the OpenAI Ecosystem
4. Developing Large Language Model Applications Locally
5. Multimodal LLMs and Frameworks
6. Case Study in LLMs and Generative AI

Introduction to LLMs & Generative Text

In Part 1, we will introduce the field of Large Language Models (LLMs) and generative text models.

We will also get started working with LLMs, taking our first steps with the Hugging Face library.
Fine-tuning LLMs, PEFT and Quantization

In this continuation from Part 1, we will look at fine-tuning LLMs with our own datasets to modify their behavior.

We will also look at approaches for making this more computationally tractable, collectively known as Parameter-Efficient Fine-Tuning (PEFT) techniques, as well as model quantization.
GPT and the OpenAI Ecosystem

In the third session of the course, we will get introduced to the ecosystem of models created by OpenAI as well as the OpenAI platform.

We'll make calls to the OpenAI API programmatically in Python as a precursor to building an LLM application backed by one of the GPT-series of models.
Developing Large Language Model Applications Locally

In the fourth session of the course, we'll look at frameworks for working with LLMs locally.

This will build upon our work using Hugging Face, and we will also look at the Ollama framework and associated tooling.
Multimodal LLMs and Frameworks

In the penultimate session of the program, we will dive into multimodal models by looking at image generation with models such as Stable Diffusion and Flux.

We will also look at frameworks for running image generation models locally, such as ComfyUI.
Case Study in LLMs and Generative AI

In the final session of the program, we will bring together all we've learned to build a simple MVP LLM application for a case study.

We'll also review everything we've covered and look at potential next steps on your learning journey into GenAI as the course concludes.
Reminder: Pricing & Payment

This course is offered on a Pay-What-You-Can (PWYC) basis.

You may pay any amount for the course (including $0), based on what you are able to comfortably afford and what you feel the course is worth.

I would appreciate your support in developing the course and future content.

You may pay at any time during the course or after the course concludes.

nlpfromscratch.com/pwyc
Required Tooling
In the first part of the course, we'll be working exclusively in Google Colab, which will make the technical requirements much easier.

For sessions which require local LLM development or other software and tools, you will be notified well in advance, to allow you time to install what is required and familiarize yourself with it before each weekly session.
Intro to LLMs
What the Heck is an LLM?
A large language model (LLM) is a type of machine learning model.

More specifically, LLMs are a kind of neural network or deep learning model: a type of model based upon imitating the structure of neurons in the brain.

The "large" in large language models refers both to the size of the models - most modern LLMs are composed of hundreds of millions, billions, or now even trillions (!) of parameters - and to the data they are trained upon, which is typically very large bodies of text (trillions of words).

Large language models currently represent the state of the art in natural language processing (NLP) applications, and the vast majority are based upon the transformer architecture.
The Transformer Architecture
● The groundbreaking paper "Attention is All You Need" from Google researchers (Vaswani et al., 2017) introduced the Transformer architecture
● Its original application was machine translation, but it is now general purpose and applied to a myriad of other tasks
● Represents the state of the art for LLMs and is also applied in domains outside of language (e.g. image generation) - virtually all new models are based on this architecture
● Popularized by OpenAI and the Generative Pretrained Transformer (GPT) series of models
Types of Transformers (not Decepticons)

Encoder Only (autoencoding models)
TASKS:
● Classification
● Named entity recognition
● Extractive QA
● Masked language modeling

Decoder Only (autoregressive models)
TASKS:
● Text generation (causal language modeling)
● Generative QA

Encoder-Decoder (seq2seq models)
TASKS:
● Translation
● Summarization

Credit: Abby Morgan


Language Modeling Tasks - Two Examples

Masked Language Modeling (MLM):
The rain in [MASK] falls mainly in the plain.
The rain in Spain falls mainly in the plain.

Causal Language Modeling (CLM):
The rain in Spain ? ? ? ? ? ?
The rain in Spain falls ? ? ? ? ?
The rain in Spain falls gracefully ? ? ?
The rain in Spain falls gracefully from ? ?
The rain in Spain falls gracefully from the ?
The rain in Spain falls gracefully from the sky.
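These two tasks map directly onto Hugging Face pipelines. A minimal sketch, assuming bert-base-uncased for MLM and gpt2 for CLM (both are illustrative model choices, not prescribed by the course):

from transformers import pipeline

# Masked language modeling: predict the hidden [MASK] token
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The rain in [MASK] falls mainly in the plain.")[0]["token_str"])

# Causal language modeling: predict the next token(s), one at a time
generator = pipeline("text-generation", model="gpt2")
print(generator("The rain in Spain falls", max_new_tokens=10)[0]["generated_text"])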
Foundation Models

Encoder Only: BERT
● Bi-directional stacked encoders
● Trained using masked token and next sentence prediction
● Highly generalizable by adding heads for different tasks
● "Foundation of foundation"
● Google Research, October 2018

Decoder Only: GPT
● Stacked decoders
● Generative text model
● Innovation and improved performance with RLHF
● Size follows Moore's Law, proprietary after GPT-2
● OpenAI, June 2018

Encoder-Decoder: T5
● Encoder and decoder
● Text-To-Text Transfer Transformer
● Multiple different tasks in training objectives
● Text as input, text as output
● Google Research, June 2020
Use Cases for Generative Text
Code autocompletion and AI-assisted coding: Microsoft's GitHub Copilot was launched in June 2022. Initially, more than ¼ of developers' code files on average were generated by GitHub Copilot, and today with widespread adoption this is close to nearly half (~46%); it has been used by over 1M developers. In October 2023, Copilot surpassed $100M in annually recurring revenue.

Writing assistants for creativity and copywriting: AI writing assistants have arisen for improved productivity and content creation in marketing, sales, creative, and numerous other areas. For example, Google has made this a part of their core offerings with their announcement of Duet AI, and Canva has introduced Magic Write based upon OpenAI's offerings.

Entertainment and social: Training generative language models on specific datasets has made it possible to give them "personality". Character.ai, created by developers who previously worked on Google's LaMDA model, offers chatbots based upon fictional characters and famous individuals. It is #2 on Andreessen Horowitz's list of the top 50 most popular GenAI web products (Sept 2023).
GPT - The Household Name of LLMs

GPT-1: 115M parameters - Toronto Book Corpus (~800M words)
GPT-2: 1.5B parameters - WebText (8M docs, 40GB)
GPT-3: 175B parameters - CommonCrawl, Books 1+2, WebText, Wikipedia (~45 TB?)
GPT-4: 1.7T parameters?

Hugging Face
Hugging Face is a software company founded in 2016 and based in New York City. As of August 2023, the company is in Series 'D' funding with a valuation of $4.5B and backing from companies such as Salesforce, Google, Amazon, IBM, Nvidia, AMD, and Intel.

While this name refers to the company, it also refers to the software and platform they develop for working with large language models and data in natural language processing and other domains.

The datasets library allows working with data hosted on the platform, and the transformers library is used for working with models of this type. There are also other libraries for working with specialized types of models (e.g. diffusers for diffusion models) and for data processing and model optimization.
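As a brief sketch of how these libraries fit together (the imdb dataset and the default sentiment-analysis model are illustrative choices, not part of the course materials):

from datasets import load_dataset
from transformers import pipeline

# Load a dataset hosted on the Hugging Face Hub
dataset = load_dataset("imdb", split="test[:5]")

# Load a model via the transformers library and run it over the data
classifier = pipeline("sentiment-analysis")
for example in dataset:
    print(classifier(example["text"][:512])[0])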
Creating a Hugging Face account
Generative Text Models
Generating Text with a Model
When generating text, the model assigns probabilities to all possible tokens based on its
understanding of the entire context. It then selects the next token in the output based on
these probabilities.
There are different parameters we can specify when generating text from a model to vary the
outputs thereof.

[Diagram: INPUT ("The rain in Spain falls mainly in the…") → MODEL → PROBABILITIES (plain, meadow, fields, mountains) → OUTPUT ("plain.")]
Generating Text in Hugging Face 🤗
[Diagram: INPUT ("The rain in Spain falls mainly in the…") → TOKENIZER → TOKEN IDS & ATTENTION MASK ([1, 15, 22, 104, …], [1, 0, 1, 1, 1, …]) → MODEL → OUTPUT TOKEN IDS ([22, 105, 52, …]) → TOKENIZER → OUTPUT ("plain of the Canary Islands, but")]
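A minimal sketch of this tokenizer → model → tokenizer flow in code, assuming gpt2 as an illustrative model:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenizer converts the input text into token IDs and an attention mask
inputs = tokenizer("The rain in Spain falls mainly in the", return_tensors="pt")

# Model generates output token IDs; the tokenizer decodes them back into text
output_ids = model.generate(**inputs, max_new_tokens=10,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))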
Greedy Search vs. Beam Search
● Greedy search is the simplest decoding strategy: it chooses the token with the highest probability at each step. However, this may not always lead to the most coherent outputs, since it prioritizes the most probable token at each step without considering the overall context.
● Beam search, on the other hand, keeps track of a fixed number (the beam width) of the most probable candidate sequences at each step, and chooses the combination of multiple tokens with the highest overall probability over the beam width.
● In general, beam search tends to work well for tasks such as translation or summarization, where the output length is predictable, but less so in open-ended generation, where its results can be repetitive or predictable. (See the code sketch after the diagram below.)
GREEDY SEARCH
In greedy search, the most probable next token is always selected at each point in the predicted sequence. For "The rain in Spain falls mainly in the…", 'plain' (0.60) is more probable than 'meadow' (0.40), and from 'plain' the most probable continuation is 'which' (0.55) over 'of' (0.45), so greedy search produces 'plain which'.

BEAM SEARCH
Here, for a beam width of 2, the candidate 'meadow grasses' has probability 0.4 x 0.9 = 0.36, which is greater than 'plain which' at 0.6 x 0.55 = 0.33, so those tokens are used. The probability over the beam width is greater, even though the first token, 'meadow', has a lower probability than 'plain'.

[Diagram: "The rain in Spain falls mainly in the…" branches to meadow (0.40) and plain (0.60); meadow branches to grasses (0.90) and flowers (0.10); plain branches to of (0.45) and which (0.55)]
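A minimal sketch of greedy vs. beam search with the transformers generate() API (gpt2 is an illustrative model choice; the beam width of 2 mirrors the diagram above):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The rain in Spain falls mainly in the", return_tensors="pt")

# Greedy decoding: always take the single most probable next token
greedy_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False,
                            pad_token_id=tokenizer.eos_token_id)

# Beam search: keep the 2 most probable candidate sequences at each step
beam_ids = model.generate(**inputs, max_new_tokens=10, num_beams=2, do_sample=False,
                          pad_token_id=tokenizer.eos_token_id)

print("Greedy:", tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print("Beam:  ", tokenizer.decode(beam_ids[0], skip_special_tokens=True))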
Temperature

● When generating text, the temperature determines the variability of the output generated by the model

● A higher temperature value leads to more diverse and varied outputs, whereas a lower value results in more focused and deterministic results

● Setting a temperature value of 0 will result in 100% deterministic outputs (the same output for a given input)

● Setting the temperature value too high will give the model too much freedom and can result in random or nonsensical outputs (gibberish)

● Lower temperatures are more appropriate when performing tasks that have a "correct" answer (e.g. Q&A or summarization)
More technically speaking

● The probability distribution of next tokens for a given input is modeled by the softmax function with temperature T:

    P(token_i) = exp(z_i / T) / Σ_j exp(z_j / T)

  where z_i is the model's score (logit) for token i, and T, the temperature, can be any number from 0 to infinity
● Therefore, as T approaches infinity, all tokens in the vocabulary become equally likely
● "Reasonable" values for temperature will therefore vary by the dataset the model was trained on and the associated distribution of probabilities, vocabulary size, etc.
● In practice, T is never set to exactly zero, but rather to some very small number

[Bar charts: next-token probability distributions over plain, meadow, fields, and mountains for "The rain in Spain falls mainly in the…" at low vs. high temperature]
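To make the effect of T concrete, here is a small numeric sketch (the logit values for the four tokens are made up purely for illustration):

import numpy as np

def softmax_with_temperature(logits, T):
    # Scale logits by 1/T before applying the softmax
    scaled = np.array(logits) / T
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

# Hypothetical logits for the tokens: plain, meadow, fields, mountains
logits = [4.0, 2.5, 2.0, 1.0]

print(softmax_with_temperature(logits, T=0.5))    # sharper: probability piles onto 'plain'
print(softmax_with_temperature(logits, T=1.0))    # the standard softmax
print(softmax_with_temperature(logits, T=100.0))  # near-uniform: all tokens roughly equally likely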
Top-k and Top-p (Nucleus) Sampling
● Both top-k and top-p sampling are methods to introduce variety into text outputs and make them less deterministic for a given input

● In top-k sampling, instead of selecting from all possible tokens, only the top k most probable tokens by rank are considered

● In top-p, or nucleus, sampling, only the most probable tokens whose collective probability is greater than or equal to a specified threshold, p, are considered

● For both methods, the total probability mass is redistributed amongst the new set of possible tokens
The rain in Spain falls mainly in the…

token       probability   cumulative   rank
plain       0.5           0.5          1
meadow      0.15          0.65         2
field       0.1           0.75         3
mountains   0.05          0.8          4
afternoon   0.05          0.85         5
sunshine    0.025         0.875        6
cities      0.025         0.9          7
morning     0.05          0.95         8
evening     0.025         0.975        9
farms       0.025         1            10

Top-k with k = 5 keeps the five highest-ranked tokens (plain through afternoon); top-p with p = 0.8 keeps tokens until the cumulative probability reaches 0.8 (plain through mountains).
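A minimal sketch of both sampling strategies with the transformers generation API (gpt2 is an illustrative model; the k and p values mirror the example above but are otherwise arbitrary):

from transformers import pipeline, set_seed

set_seed(42)
generator = pipeline("text-generation", model="gpt2")
prompt = "The rain in Spain falls mainly in the"

# Top-k sampling: sample only from the 5 most probable tokens at each step
print(generator(prompt, do_sample=True, top_k=5,
                max_new_tokens=10)[0]["generated_text"])

# Top-p (nucleus) sampling: sample from the smallest set of tokens whose
# cumulative probability is at least 0.8 (top_k=0 disables the top-k filter)
print(generator(prompt, do_sample=True, top_p=0.8, top_k=0,
                max_new_tokens=10)[0]["generated_text"])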
Finding a balance: Temperature and sampling
Low temperature, low top-p: Consider a narrow range of high-probability tokens. This combination results in highly focused and predictable output.

High temperature, low top-p: Consider a narrow range of high-probability tokens with near equal likelihood. The high temperature may still introduce some randomness in the output.

Low temperature, high top-p: Consider a wider range of tokens but only select the most probable ones, resulting in less varied output.

High temperature, high top-p: Consider a wide range of tokens with increased likelihood of selecting any individual token. Can result in highly varied but less coherent output.
Message Roles

SYSTEM: Sets the behavior of the assistant - how it should behave at the conversation level (optional)

USER: Provides requests or input to which the assistant will respond (i.e. the prompts)

ASSISTANT: Responses from the model. Can be used to include conversation history when it is important (optional)
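As a minimal sketch of how these roles appear in an OpenAI-style chat request (the model name and message contents are illustrative assumptions, not from the slides):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {"role": "system", "content": "You are a concise assistant that answers in one sentence."},
    {"role": "user", "content": "Where does the rain in Spain mainly fall?"},
    {"role": "assistant", "content": "Mainly in the plain."},           # prior turn (conversation history)
    {"role": "user", "content": "And where does it fall gracefully?"},  # the new request
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)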
Training a chat LLM - data format

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

conversation = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Phi-3's chat template wraps each turn in <|user|>/<|assistant|> ... <|end|> tags
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(tokenizer.apply_chat_template(conversation, tokenize=False))

Output (abridged):

<|user|>Hello, how are you?<|end|>
<|assistant|>I'm doing great. How can I help you today?<|end|>
…
<|endoftext|>
End of Part 1
LLMsfor.me
PWYC Microcourse in LLMs and Generative AI
January 2025

Part 1 - Introduction to LLMs & Generative Text
Monday, January 6th, 2025

llmsfor.me
