Quick Start Guide to Large Language Models
Strategies and Best Practices for using ChatGPT and Other LLMs

Sinan Ozdemir

Addison-Wesley
Contents at a Glance

Preface
Part I: Introduction to Large Language Models
1. Overview of Large Language Models
2. Launching an Application with Proprietary Models
3. Prompt Engineering with GPT3
4. Fine-Tuning GPT3 with Custom Examples
Part II: Getting the most out of LLMs
5. Advanced Prompt Engineering Techniques
6. Building a Recommendation Engine
7. Combining Transformers
8. Fine-Tuning Open-Source LLMs
9. Deploying Custom LLMs to the Cloud
Table of Contents

Preface
Part I: Introduction to Large Language Models
Chapter 1: Overview of Large Language Models
What Are Large Language Models (LLMs)?
Popular Modern LLMs
Domain-Specific LLMs
Applications of LLMs
Chapter 2: Launching an Application with Proprietary Models
Introduction
The Task
Solution Overview
The Components
Putting It All Together
The Cost of Closed-Source
Summary
Chapter 3: Prompt Engineering with GPT3
Introduction
Prompt Engineering
Working with Prompts Across Models
Building a Q/A bot with ChatGPT
Summary
Chapter 4: Fine-Tuning GPT3 with Custom Examples
Overview of Transfer Learning & Fine-tuning
Overview of GPT3 Fine-tuning API
Using Fine-tuned GPT3 Models to Get Better Results
Part II: Getting the most out of LLMs
Chapter 5: Advanced Prompt Engineering Techniques
Input/Output Validation
Chain of Thought Prompting
Prompt Chaining Workflows
Preventing against Prompt Injection Attacks
Building a bot that can execute code on our behalf
Chapter 6: Building a Recommendation Engine
Overview of Siamese BERT Architectures
Fine-Tuning BERT for Classifying + Tagging Items
Fine-Tuning Siamese BERT for Recommendations
Chapter 7: Combining Transformers
Overview of Vision Transformer
Building an Image Captioning System with GPT-J
Chapter 8: Fine-Tuning Open-Source LLMs
Overview of T5
Building Translation/Summarization Pipelines with T5
Chapter 9: Deploying Custom LLMs to the Cloud
Overview of Cloud Deployment
Best Practices for Cloud Deployment
Preface

The advancement of Large Language Models (LLMs) has revolutionized the field of Natural Language Processing in recent years. Models like BERT, T5, and ChatGPT have demonstrated unprecedented performance on a wide range of NLP tasks, from text classification to machine translation. Despite their impressive performance, the use of LLMs remains challenging for many practitioners. The sheer size of these models, combined with the lack of understanding of their inner workings, has made it difficult for practitioners to effectively use and optimize these models for their specific needs.

This practical guide to the use of LLMs in NLP provides an overview of the
key concepts and techniques used in LLMs and explains how these models
work and how they can be used for various NLP tasks. The book also covers
advanced topics, such as fine-tuning, alignment, and information retrieval
while providing practical tips and tricks for training and optimizing LLMs for
specific NLP tasks.

This work addresses a wide range of topics in the field of Large Language
Models, including the basics of LLMs, launching an application with
proprietary models, fine-tuning GPT3 with custom examples, prompt
engineering, building a recommendation engine, combining Transformers,
and deploying custom LLMs to the cloud. It offers an in-depth look at the
various concepts, techniques, and tools used in the field of Large Language
Models.
Topics covered:

Coding with Large Language Models (LLMs)

Overview of using proprietary models

OpenAI, Embeddings, GPT3, and ChatGPT

Vector databases and building a neural/semantic information retrieval system

Fine-tuning GPT3 with custom examples

Prompt engineering with GPT3 and ChatGPT

Advanced prompt engineering techniques

Building a recommendation engine

Combining Transformers

Deploying custom LLMs to the cloud


Part I: Introduction to Large Language Models

1. Overview of Large Language Models

Ever since an advanced artificial intelligence (AI) deep learning model called
the Transformer was introduced by a team at Google Brain in 2017, it has
become the standard for tackling various natural language processing (NLP)
tasks in academia and industry. It is likely that you have interacted with a
Transformer model today without even realizing it, as Google uses BERT to
enhance its search engine by better understanding users’ search queries. The
GPT family of models from OpenAI has also received attention for its ability to generate human-like text and images.

Figure 1.1 A brief history of modern NLP highlights the use of deep learning to tackle language modeling, advancements in large-scale semantic token embeddings (Word2vec), sequence-to-sequence models with attention (something we will see in more depth later in this chapter), and finally the Transformer in 2017.

These Transformers now power applications such as GitHub's Copilot (developed by OpenAI in collaboration with Microsoft), which can convert comments and snippets of code into fully functioning source code that can even call upon other LLMs (like in Listing 1.1) to perform NLP tasks.

Listing 1.1 Using the Copilot LLM to get an output from Facebook's BART LLM

from transformers import pipeline

def classify_text(email):
    """
    Use Facebook's BART model to classify an email into "spam" or "not spam"

    Args:
        email (str): The email to classify
    Returns:
        str: The classification of the email
    """
    # COPILOT START. EVERYTHING BEFORE THIS COMMENT WAS INPUT TO COPILOT
    classifier = pipeline(
        'zero-shot-classification', model='facebook/bart-large-mnli')
    labels = ['spam', 'not spam']
    hypothesis_template = 'This email is {}.'

    results = classifier(
        email, labels, hypothesis_template=hypothesis_template)

    return results['labels'][0]
    # COPILOT END

In this listing, I gave Copilot only a Python function definition and the comments I wrote, and it wrote all of the code to make the function do what I described. No cherry-picking here, just a fully working Python function that I can call like this:

classify_text('hi I am spam') # spam

It appears we are surrounded by LLMs, but just what are they doing under the
hood? Let’s find out!

What Are Large Language Models (LLMs)?

Large language models (LLMs) are AI models that are usually (but not
necessarily) derived from the Transformer architecture and are designed to
understand and generate human language, code, and much more. These
models are trained on vast amounts of text data, allowing them to capture the
complexities and nuances of human language. LLMs can perform a wide
range of language tasks, from simple text classification to text generation,
with high accuracy, fluency, and style.

In the healthcare industry, LLMs are being used for electronic medical record
(EMR) processing, clinical trial matching, and drug discovery. In finance,
LLMs are being utilized for fraud detection, sentiment analysis of financial
news, and even trading strategies. LLMs are also used for customer service
automation via chatbots and virtual assistants. With their versatility and high performance, Transformer-based LLMs are becoming an increasingly valuable asset in a variety of industries and applications.

Note

I will use the term understand a fair amount in this text. I am usually referring to “Natural Language Understanding” (NLU), which is a research branch of NLP that focuses on developing algorithms and models that can accurately interpret human language. As we will see, NLU models excel at tasks such as classification, sentiment analysis, and named entity recognition. However, it is important to note that while these models can perform complex language tasks, they do not possess true understanding in the way humans do.

The success of LLMs and Transformers is due to the combination of several ideas. Most of these ideas had been around for years but were also being actively researched around the same time. Mechanisms such as attention, transfer learning, and scaling up neural networks, which provide the scaffolding for Transformers, were seeing breakthroughs right around the same time. Figure 1.1 outlines some of the biggest advancements in NLP in the last few decades, all leading up to the invention of the Transformer.

The Transformer architecture itself is quite impressive. It can be highly parallelized and scaled in ways that previous state-of-the-art NLP models could not be, allowing it to handle much larger datasets and longer training times than its predecessors. The Transformer uses a special kind of attention calculation called self-attention to allow each word in a sequence to “attend to” (look to for context) all other words in the sequence, enabling it to capture long-range dependencies and contextual relationships between words. Of course, no architecture is perfect. Transformers are still limited to an input context window, which represents the maximum length of text they can process at any given moment.

Since the advent of the Transformer in 2017, the ecosystem around using and deploying Transformers has only exploded. The aptly named “Transformers” library and its supporting packages have made it accessible for practitioners to use, train, and share models, greatly accelerating adoption; it is now used by thousands of organizations and counting. Popular LLM repositories like Hugging Face have popped up, providing access to powerful open-source models to the masses. In short, using and productionizing a Transformer has never been easier.

That’s where this book comes in.

My goal is to guide you on how to use, train, and optimize all kinds of LLMs
for practical applications while giving you just enough insight into the inner
workings of the model to know how to make optimal decisions about model
choice, data format, fine-tuning parameters, and so much more.

My aim is to make using Transformers accessible for software developers, data scientists, analysts, and hobbyists alike. To do that, we should start on a level playing field and learn a bit more about LLMs.

Definition of LLMs

To back up only slightly, we should first talk about the specific NLP task that LLMs and Transformers are being used to solve, which provides the foundation layer for their ability to solve a multitude of tasks. Language modeling is a subfield of NLP that involves the creation of statistical/deep learning models for predicting the likelihood of a sequence of tokens in a specified vocabulary (a limited and known set of tokens). There are generally two kinds of language modeling tasks out there: autoencoding tasks and autoregressive tasks (Figure 1.2).

Note

The term token refers to the smallest unit of semantic meaning, created by breaking down a sentence or piece of text into smaller units; tokens are the basic inputs for an LLM. Tokens can be words but can also be “sub-words,” as we will see in more depth throughout this book. Some readers may be familiar with the term “n-gram,” which refers to a sequence of n consecutive tokens.

Autoregressive language models are trained to predict the next token in a sentence, based only on the previous tokens in the phrase. These models correspond to the decoder part of the transformer model, and a mask is applied to the full sentence so that the attention heads can only see the tokens that came before. Autoregressive models are ideal for text generation and a good example of this type of model is GPT.

Autoencoding language models are trained to reconstruct the original sentence from a corrupted version of the input. These models correspond to the encoder part of the transformer model and have access to the full input without any mask. Autoencoding models create a bidirectional representation of the whole sentence. They can be fine-tuned for a variety of tasks such as text generation, but their main application is sentence classification or token classification. A typical example of this type of model is BERT.
Figure 1.2 Both the autoencoding and autoregressive language modeling tasks involve filling in a missing token, but only the autoencoding task allows for context to be seen on both sides of the missing token.

To summarize, Large Language Models (LLMs) are language models that are either autoregressive, autoencoding, or a combination of the two. Modern LLMs are usually based on the Transformer architecture (which is what we will use), but they can be based on other architectures. The defining features of LLMs are their large size and large training datasets, which enable them to perform complex language tasks, such as text generation and classification, with high accuracy and with little to no fine-tuning.
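
To make the distinction concrete, here is a minimal sketch (my own illustration, not the book's code) that exercises both kinds of models through Hugging Face pipelines; the model checkpoints and prompts are arbitrary assumptions:

from transformers import pipeline

# Autoregressive: predict the next tokens given only the previous ones (GPT-style)
generator = pipeline('text-generation', model='gpt2')
print(generator('Large language models are', max_new_tokens=10))

# Autoencoding: reconstruct a masked token using context on both sides (BERT-style)
fill_mask = pipeline('fill-mask', model='bert-base-uncased')
print(fill_mask('Large language [MASK] are trained on vast amounts of text.'))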

Table 1.1 shows the disk size, memory usage, number of parameters, and
approximate size of the pre-training data for several popular large language
models (LLMs). Note that these sizes are approximate and may vary
depending on the specific implementation and hardware used.

Table 1.1 Comparison of Popular Large Language Models (LLMs)

But size isn't everything. Let's look at some of the key characteristics of LLMs and then dive into how LLMs learn to read and write.

Key Characteristics of LLMs

The original Transformer architecture, as devised in 2017, was a sequence-to-sequence model, which means it had two main components:

An encoder, which is tasked with taking in raw text, splitting it up into its core components (more on this later), converting them into vectors (similar to the Word2vec process), and using attention to understand the context of the text

A decoder, which excels at generating text by using a modified type of attention to predict the next best token

As shown in Figure 1.3, the Transformer has many other subcomponents that we won't get into, which promote faster training, generalizability, and better performance. Today's LLMs are for the most part variants of the original Transformer. Models like BERT and GPT dissect the Transformer into only an encoder and decoder (respectively) in order to build models that excel in understanding and generating (also respectively).
Figure 1.3 The original Transformer has two main components: an encoder
which is great at understanding text, and a decoder which is great at
generating text. Putting them together makes the entire model a “sequence to
sequence” model.

In general, LLMs can be categorized into three main buckets:

Autoregressive models, such as GPT, which predict the next token in a sentence based on the previous tokens. They are effective at generating coherent free-text following a given context.

Autoencoding models, such as BERT, which build a bidirectional representation of a sentence by masking some of the input tokens and trying to predict them from the remaining ones. They are adept at capturing contextual relationships between tokens quickly and at scale, which makes them great candidates for text classification tasks, for example.

Combinations of autoregressive and autoencoding, like T5, which can use the encoder and decoder to be more versatile and flexible in generating text. It has been shown that these combination models can generate more diverse and creative text in different contexts compared to pure decoder-based autoregressive models due to their ability to capture additional context using the encoder.
Figure 1.4 A breakdown of the key characteristics of LLMs based on how
they are derived from the original Transformer architecture.

Figure 1.4 shows the breakdown of the key characteristics of LLMs based on
these three buckets.

More Context Please

No matter how an LLM is constructed and which parts of the Transformer it uses, all LLMs care about context (Figure 1.5). The goal is to understand each token as it relates to the other tokens in the input text. Beginning with the popularity of Word2vec around 2013, NLP practitioners and researchers were always curious about the best ways of combining semantic meaning (basically word definitions) and context (from the surrounding tokens) to create the most meaningful token embeddings possible. The Transformer relies on the attention calculation to make this combination a reality.

Figure 1.5 LLMs are great at understanding context. The word “Python” can
have different meanings depending on the context. We could be talking about
a snake, or a pretty cool coding language.
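
As a quick, hedged illustration of this context sensitivity (my own sketch, not the book's code; the model checkpoint and sentences are arbitrary assumptions), we can compare the contextual embeddings a small BERT model produces for the word "Python" in the two senses from Figure 1.5:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

def python_vector(sentence):
    # Return the contextual embedding of the token "python" in a sentence.
    # Assumes "python" is a single token in this model's vocabulary.
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        hidden_states = model(**inputs).last_hidden_state[0]
    python_index = inputs.input_ids[0].tolist().index(
        tokenizer.convert_tokens_to_ids('python'))
    return hidden_states[python_index]

snake = python_vector('The python slithered through the tall grass.')
code = python_vector('I wrote this script in Python last night.')
more_code = python_vector('Python is my favorite programming language.')

cos = torch.nn.CosineSimilarity(dim=0)
print(cos(code, more_code))  # higher: both uses refer to the coding language
print(cos(code, snake))      # lower: different senses of the same word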

Choosing what kind of Transformer derivation you want isn’t enough. Just
choosing the encoder doesn’t mean your Transformer is magically good at
understanding text. Let’s take a look at how these LLMs actually learn to
read and write.

How LLMs Work


How an LLM is pre-trained and fine-tuned makes all the difference between a model that performs alright and one that is state of the art and highly accurate. We'll need to take a quick look into how LLMs are pre-trained to understand what they are good at, what they are bad at, and whether or not we would need to update them with our own custom data.

Pre-training

Every LLM on the market has been pre-trained on a large corpus of text data
and on specific language modeling related tasks. During pre-training, the
LLM tries to learn and understand general language and relationships
between words. Every LLM is trained on different corpora and on different
tasks.

BERT, for example, was originally pre-trained on two publicly available text
corpora (Figure 1.6):

English Wikipedia - a collection of articles from the English version of Wikipedia, a free online encyclopedia. It contains a range of topics and writing styles, making it a diverse and representative sample of English language text.

• At the time, 2.5 billion words.

The BookCorpus - a large collection of fiction and non-fiction books. It was created by scraping book text from the web and includes a range of genres, from romance and mystery to science fiction and history. The books in the corpus were selected to have a minimum length of 2000 words and to be written in English by authors with verified identities.

• 800M words.

and on two specific language modeling tasks (Figure 1.7):

The Masked Language Modeling (MLM) task (AKA the autoencoding task)
—this helps BERT recognize token interactions within a single sentence.

The Next Sentence Prediction Task—this helps BERT understand how tokens
interact with each other between sentences.
Figure 1.6 BERT was originally pre-trained on English Wikipedia and the
BookCorpus. More modern LLMs are trained on datasets thousands of times
larger.

Pre-training on these corpora allowed BERT (mainly via the self-attention mechanism) to learn a rich set of language features and contextual relationships. The use of large, diverse corpora like these has become a common practice in NLP research, as it has been shown to improve the performance of models on downstream tasks.
Note

The pre-training process for an LLM can evolve over time as researchers find better ways of training LLMs and phase out methods that don't help as much. For example, within a year of the original Google BERT release that used the Next Sentence Prediction (NSP) pre-training task, a BERT variant called RoBERTa (yes, most of these LLM names will be fun) by Facebook AI was shown to not require the NSP task to match and even beat the original BERT model's performance in several areas.

Depending on which LLM you decide to use, it will likely be pre-trained differently from the rest. This is what sets LLMs apart from each other. Some LLMs, including OpenAI's GPT family of models, are trained on proprietary data sources in order to give their parent companies an edge over their competitors.

We will not revisit the idea of pre-training often in this book because it's not exactly the "quick" part of a "quick start guide," but it is worth knowing how these models were pre-trained, because this pre-training is what lets us apply something called transfer learning to achieve the state-of-the-art results we want, which is a big deal!
Figure 1.7 BERT was pre-trained on two tasks: the autoencoding language
modeling task (referred to as the “masked language modeling” task) to teach
it individual word embeddings and the “next sentence prediction” task to help
it learn to embed entire sequences of text.

Transfer Learning

Transfer learning is a technique used in machine learning to leverage the knowledge gained from one task to improve performance on another related task. Transfer learning for LLMs involves taking an LLM that has been pre-trained on one corpus of text data and then fine-tuning it for a specific "downstream" task, such as text classification or text generation, by updating the model's parameters with task-specific data.

The idea behind transfer learning is that the pre-trained model has already
learned a lot of information about the language and relationships between
words, and this information can be used as a starting point to improve
performance on a new task. Transfer learning allows LLMs to be fine-tuned
for specific tasks with much smaller amounts of task-specific data than it
would require if the model were trained from scratch. This greatly reduces
the amount of time and resources required to train LLMs. Figure 1.8 provides
a visual representation of this relationship.

Fine-tuning

Once an LLM has been pre-trained, it can be fine-tuned for specific tasks.
Fine-tuning involves training the LLM on a smaller, task-specific dataset to
adjust its parameters for the specific task at hand. This allows the LLM to
leverage its pre-trained knowledge of the language to improve its accuracy
for the specific task. Fine-tuning has been shown to drastically improve
performance on domain-specific and task-specific tasks and lets LLMs adapt
quickly to a wide variety of NLP applications.

Figure 1.8 The general transfer learning loop involves pre-training a model on a generic dataset on some generic self-supervised task and then fine-tuning the model on a task-specific dataset.

Figure 1.9 shows the basic fine-tuning loop that we will use for our models in later chapters. Whether they are open source or closed source, the loop is more or less the same:

1. We define the model we want to fine-tune as well as any fine-tuning parameters (e.g., learning rate)

2. We aggregate some training data (the format and other characteristics depend on the model we are updating)

3. We compute losses (a measure of error) and gradients (information about how to change the model to minimize error)

4. We update the model through backpropagation – a mechanism to update model parameters to minimize errors

If some of that went over your head, not to worry: we will rely on pre-built
tools from Hugging Face’s Transformers package (Figure 1.9) and OpenAI’s
Fine-tuning API to abstract away a lot of this so we can really focus on our
data and our models.
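
To make those four steps concrete, here is a minimal sketch of the loop using Hugging Face's Trainer. The checkpoint, dataset, and hyperparameters are illustrative assumptions of mine, not an example from this book:

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# 1. Define the model we want to fine-tune and the fine-tuning parameters
checkpoint = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
args = TrainingArguments(output_dir='finetuned-model', learning_rate=2e-5,
                         num_train_epochs=1, per_device_train_batch_size=8)

# 2. Aggregate some training data (a small slice of a public sentiment dataset)
dataset = load_dataset('imdb', split='train[:1000]')
dataset = dataset.map(lambda batch: tokenizer(batch['text'], truncation=True),
                      batched=True)

# 3 & 4. The Trainer computes the losses and gradients and updates the model
# through backpropagation on our behalf
trainer = Trainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()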

Note

You will not need a Hugging Face account or key to follow along and use any of this code apart from very specific advanced exercises where I will call it out.

Attention

The original paper that introduced the Transformer was titled "Attention Is All You Need." Attention is a mechanism used in deep learning
models (not just Transformers) that assigns different weights to different
parts of the input, allowing the model to prioritize and emphasize the most
important information while performing tasks like translation or
summarization. Essentially, attention allows a model to “focus” on different
parts of the input dynamically, leading to improved performance and more
accurate results. Before the popularization of attention, most neural networks
processed all inputs equally and the models relied on a fixed representation of
the input to make predictions. Modern LLMs that rely on attention can
dynamically focus on different parts of input sequences, allowing them to
weigh the importance of each part in making predictions.
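
As a hedged illustration (my own sketch, not the book's code; the model checkpoint and sentence are arbitrary assumptions), we can ask a pre-trained BERT model to return its attention weights and inspect how strongly one token attends to every other token:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased', output_attentions=True)

inputs = tokenizer('The dog chased the ball because it was bored.', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer, each shaped
# (batch, num_heads, seq_len, seq_len): a weight for every pair of tokens
tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
last_layer = outputs.attentions[-1][0]   # (num_heads, seq_len, seq_len)
avg_attention = last_layer.mean(dim=0)   # average the weights across heads

it_index = tokens.index('it')
for token, weight in zip(tokens, avg_attention[it_index]):
    print(f'{token:>10}  {weight.item():.3f}')  # how much "it" attends to each token
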
Figure 1.9 The Transformers package from Hugging Face provides a neat
and clean interface for training and fine-tuning LLMs.

To recap, LLMs are pre-trained on large corpora and sometimes fine-tuned on smaller datasets for specific tasks. Recall that one of the factors behind the Transformer's effectiveness as a language model is that it is highly parallelizable, allowing for faster training and efficient processing of text. What really sets the Transformer apart from other deep learning architectures is its ability to capture long-range dependencies and relationships between tokens using attention. In other words, attention is a crucial component of Transformer-based LLMs, and it enables them to effectively retain information between training loops and tasks (i.e. transfer learning), while being able to process lengthy swaths of text with ease.

Attention is often credited as the component most responsible for helping LLMs learn (or at least recognize) internal world models and human-identifiable rules. A Stanford study in 2019 showed that certain attention calculations in BERT corresponded to linguistic notions of syntax and grammar rules. For example, the researchers noticed that BERT was able to identify direct objects of verbs, determiners of nouns, and objects of prepositions with remarkably high accuracy from only its pre-training. These relationships are presented visually in Figure 1.10.

There is research that explores what other kinds of "rules" LLMs are able to learn simply by pre-training and fine-tuning. One example is a series of experiments led by researchers at Harvard that explored an LLM's ability to learn a set of rules for a synthetic task like the game of Othello (Figure 1.11). They found evidence that an LLM was able to understand the rules of the game simply by training on historical move data.
Figure 1.10 Research has probed into LLMs to uncover that they seem to be
recognizing grammatical rules even when they were never explicitly told
these rules.
Figure 1.11 LLMs may be able to learn all kinds of things about the world,
whether it be the rules and strategy of a game or the rules of human language.

For any LLM to learn any kind of rule, however, it has to convert what we
perceive as text into something machine readable. This is done through a
process called embedding.

Embeddings

Embeddings are the mathematical representations of words, phrases, or tokens in a large-dimensional space. In NLP, embeddings are used to represent the words, phrases, or tokens in a way that captures their semantic meaning and relationships with other words. There are several types of embeddings, including position embeddings, which encode the position of a token in a sentence, and token embeddings, which encode the semantic meaning of a token (Figure 1.12).

Figure 1.12 An example of how BERT uses three layers of embedding for a
given piece of text. Once the text is tokenized, each token is given an
embedding and then the values are added up, so each token ends up with an
initial embedding before any attention is calculated. We won’t focus too
much on the individual layers of LLM embeddings in this text unless they
serve a more practical purpose but it is good to know about some of these
parts and how they look under the hood!

LLMs learn different embeddings for tokens based on their pre-training and
can further update these embeddings during fine-tuning.
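
The following minimal sketch (my own illustration, not the book's code; the sentence is an arbitrary assumption) shows the layered embeddings from Figure 1.12 in practice: BERT's initial representation of each token is the sum of a token embedding, a segment (token type) embedding, and a position embedding, computed before any attention is applied:

import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer('I love my pet python!', return_tensors='pt')
input_ids = inputs.input_ids
seq_len = input_ids.shape[1]

embeddings = model.embeddings
token_emb = embeddings.word_embeddings(input_ids)
segment_emb = embeddings.token_type_embeddings(torch.zeros_like(input_ids))
position_emb = embeddings.position_embeddings(torch.arange(seq_len).unsqueeze(0))

# Each token's initial embedding before any attention is calculated
initial = embeddings.LayerNorm(token_emb + segment_emb + position_emb)
print(initial.shape)  # (1, number_of_tokens, 768) for bert-base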

Tokenization
Tokenization, as mentioned previously, involves breaking text down into its smallest unit of understanding - tokens. These tokens are the pieces of information that are embedded with semantic meaning and act as inputs to the attention calculations, which leads to ... well, the LLM actually learning and working. Tokens make up an LLM's static vocabulary and don't always represent entire words. Tokens can represent punctuation, individual characters, or even a sub-word if a word is not known to the LLM. Nearly all LLMs also have special tokens that have specific meaning to the model. For example, the BERT model has a few special tokens, including the [CLS] token, which BERT automatically injects as the first token of every input and which is meant to represent an encoded semantic meaning for the entire input sequence.

Readers may be familiar with techniques like stop words removal, stemming,
and truncation which are used in traditional NLP. These techniques are not
used nor are they necessary for LLMs. LLMs are designed to handle the
inherent complexity and variability of human language, including the usage
of stop words like “the” and “an” and variations in word forms like tenses
and misspellings. Altering the input text to an LLM using these techniques
could potentially harm the performance of the model by reducing the
contextual information and altering the original meaning of the text.

Tokenization can also involve several preprocessing steps like casing, which
refers to the capitalization of the tokens. There are two types of casing:
uncased and cased. In uncased tokenization, all the tokens are lowercased and
usually accents from letters are stripped, while in cased tokenization, the
capitalization of the tokens is preserved. The choice of casing can impact the
performance of the model, as capitalization can provide important
information about the meaning of a token. An example of this can be found in
Figure 1.13.

Note

It is worth mentioning that even the concept of casing carries some bias, depending on the model. To uncase a text - lowercasing and stripping of accents - is a pretty Western-style preprocessing step. I myself speak Turkish and know that the umlaut (e.g. the Ö in my last name) matters and can actually help the LLM understand the word being said. Any language model that has not been sufficiently trained on diverse corpora may have trouble parsing and utilizing these bits of context.

Figure 1.13 The choice of uncased versus cased tokenization depends on the task. Simple tasks like text classification usually prefer uncased tokenization, while tasks that derive meaning from case, like Named Entity Recognition, prefer a cased tokenization.
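
As a small, hedged demonstration of this difference (my own sketch; the tokenizer checkpoints and sample text are assumptions, not the book's example):

from transformers import AutoTokenizer

uncased = AutoTokenizer.from_pretrained('bert-base-uncased')
cased = AutoTokenizer.from_pretrained('bert-base-cased')

text = 'My name is Sinan Özdemir'
print(uncased.tokenize(text))  # lowercased tokens, accent stripped from the Ö
print(cased.tokenize(text))    # capitalization and the Ö are preserved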

Figure 1.14 shows an example of tokenization, and in particular, an example of how LLMs tend to handle Out of Vocabulary (OOV) phrases. OOV phrases are simply phrases/words that the LLM doesn't recognize as a token and has to split up into smaller sub-words. For example, my name (Sinan) is not a token in most LLMs (story of my life), so in BERT, the tokenization scheme will split my name up into two tokens (assuming uncased tokenization):

sin - the first part of my name

##an - a special sub-word token that is different from the word “an” and is
used only as a means to split up unknown words

Figure 1.14 Any LLM has to deal with words it has never seen before. How an LLM tokenizes text can matter if we care about the token limit of an LLM.

Some LLMs limit the number of tokens we can input at any one time, so how an LLM tokenizes text can matter if we are trying to be mindful of this limit.
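
A minimal, hedged sketch of both points (my own illustration; the exact sub-word split depends on the tokenizer's vocabulary):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# An out-of-vocabulary name gets split into sub-word tokens
print(tokenizer.tokenize('Sinan'))  # e.g. something like ['sin', '##an']

# Counting tokens against the model's input limit
text = 'Sinan loves large language models.'
token_count = len(tokenizer(text).input_ids)  # includes special tokens like [CLS] and [SEP]
print(token_count, 'of', tokenizer.model_max_length, 'tokens used')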

So far, we have talked a lot about language modeling - predicting missing/next tokens in a phrase - but modern LLMs can also borrow from other fields of AI to make their models more performant and, more importantly, more aligned - meaning that the AI is performing in accordance with a human's expectation. Put another way, an aligned LLM has an objective that matches a human's objective.

Beyond Language Modeling—Alignment + RLHF

Alignment in language models refers to how well the model can respond to
input prompts that match the user’s expectations. Standard language models
predict the next word based on the preceding context, but this can limit their
usefulness for specific instructions or prompts. Researchers are coming up
with scalable and performant ways of aligning language models to a user’s
intent. One such broad method of aligning language models is through the
incorporation of reinforcement learning (RL) into the training loop.

RL with Human Feedback (RLHF) is a popular method of aligning pre-trained LLMs that uses human feedback to enhance their performance. It allows the LLM to learn from feedback on its own outputs from a relatively