Principles of Generative AI
A Technical Introduction
Generative artificial intelligence (GenAI) tools are an emerging class of artificial
intelligence algorithms capable of producing novel content, in varied formats such as text,
audio, video, pictures, and code, based on user prompts. Recent advances in machine
learning (ML), massive datasets, and substantial increases in computing power have propelled
such tools to human-level performance on academic and professional benchmarks1,
comparable to the ninetieth percentile on the SAT and the bar exam.
This rapid progress has led many2 to believe that the metamorphosis of these technologies
from research-grade demos to accessible and easy-to-use production-grade goods and
services carries the potential to supercharge business processes and operations while enabling
entirely new deliverables heretofore rendered infeasible by economic or technological factors. It
took OpenAI’s ChatGPT, a conversational web app based on a generative (multimodal)
language model, about five days to reach one million users3 (compared to 2.5 months for
Instagram). On the business side, the Economist reports that the number of jobs mentioning AI-
related skills quadrupled from 2022 to 2023. This enthusiasm has not gone unmet by investors.
Generative AI startups reportedly raised 600% more capital in 2022 than in 20204.
What are these new-era AI technologies? How do they function? What principles do they
operate on? What makes them different from conventional, already much-hyped machine learning
(ML) models? For what tasks is this class of technology most impactful? What future advances
might one look forward to? These are the questions this report attempts to shed some light on.
The report will also tease out how this understanding foundationally informs the best uses (and
misuses) of GenAI in applied contexts.
A word of disclaimer: this gradient of topics also means that, while the initial sections deal with
the factual, if somewhat simplified, nuts-and-bolts workings of such models, the later sections
venture into extrapolations and speculations that are hopefully reasonable but that only time can
attest to, as necessitated by the developing nature of this technology and its current phase in the
technology adoption cycle.
While generative AI models come in many different shapes, utilizing varied statistical and
computational techniques to target various modalities, ranging from code and text to audio and
video, this report focuses almost exclusively on large language models (LLMs) capable of
generating novel text from textual prompts. This choice is partly due to the substantial lead
LLMs have in driving the overall usage of generative AI models5 and partly due to the centrality
of language in formulating and addressing commonplace information-processing tasks. That
said, image- and code-based GenAI models have already witnessed successful commercial
product deployment, for example, by Adobe for creating visual content and by GitHub as a
programming assistance tool.
At its core, a language model implements a simple functionality: predicting the next word (or
token) given a context window specifying the preceding words. More precisely, given a context
window, a language model outputs a probability distribution over all possible words in its
vocabulary, indicating the probability with which each possible word follows the given list of
words. Upon sampling6 a guess of the next word from this distribution, the language model
incrementally repeats this ostensibly primitive step to produce a more extensive body of text.
Figure 4: A probabilistic model predicting the next word coupled with sampling can produce
larger bodies of text.
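To make this loop concrete, the following is a minimal sketch in Python of the generate-by-sampling procedure; the toy vocabulary and hand-set probabilities are invented stand-ins for what an actual language model would compute with a neural network.

import random

# Toy stand-in for a language model: given the context (a list of words),
# return a probability distribution over a tiny vocabulary. A real LLM
# computes these probabilities with a large neural network.
def next_word_distribution(context):
    if context and context[-1] == "the":
        return {"cat": 0.6, "dog": 0.3, "sat": 0.1}
    return {"the": 0.6, "on": 0.2, "mat": 0.2}

def generate(prompt, num_words):
    context = prompt.split()
    for _ in range(num_words):
        dist = next_word_distribution(context)
        words, probs = zip(*dist.items())
        context.append(random.choices(words, weights=probs, k=1)[0])  # sampling step
        # Greedy decoding would instead pick max(dist, key=dist.get).
    return " ".join(context)

print(generate("the", 6))  # fresh runs can yield different completions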
Two aspects of this procedure deserve emphasis.
1. Completions are random. The predicted completion, given a context window, is not
deterministic. Sampling the next word in each step from the output distribution introduces
enough randomness that the predicted completions can be meaningfully different on every
fresh run. This stochasticity is why ChatGPT, for instance, can offer varied answers for the
same prompt across successive runs. Replacing the sampling step with (greedily) choosing
the most likely immediate word is known to degrade the quality of the produced text. The
randomness in responses is also desirable from a user perspective, yielding varied
responses. From the deployer's perspective, it optionally allows gathering user feedback
regarding the quality of seemingly plausible responses. This choice also partly contributes
to hallucination in language models.
2. Initial prompt matters. Language models are conditional probabilistic models: they
produce a completion conditioned on the initial set of words. In this way, the initial context
window, termed the prompt, matters crucially to the produced completion. One hallmark of
modern language models is that they keep track of the initial prompt even when
generating large bodies of text, unlike the earlier generation of models, thus producing
more coherent responses. Artful and cleverly crafted prompts can significantly improve
the quality and utility of the synthesized text. Prompt engineering7, for example, the
practice of encouraging the language model to solve a problem by decomposing it into
intermediate subproblems, is known to improve performance on logical reasoning tasks
(see the sketch following this list).
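As an illustration of the prompt structure alluded to in the second point, the hypothetical snippet below builds a prompt that demonstrates step-by-step decomposition before posing a new question; the wording and the example problems are invented purely for illustration.

# A hypothetical prompt that nudges a model to decompose a problem into
# intermediate steps before answering. The exact phrasing is illustrative,
# not a prescribed recipe.
prompt = (
    "Q: A store sells pens at $2 each. Alice buys 3 pens and pays with a $10 bill. "
    "How much change does she get?\n"
    "A: Let's solve this step by step.\n"
    "Step 1: Cost of the pens = 3 * $2 = $6.\n"
    "Step 2: Change = $10 - $6 = $4.\n"
    "The answer is $4.\n\n"
    "Q: A train travels 60 miles per hour for 2.5 hours. How far does it go?\n"
    "A: Let's solve this step by step.\n"
)
# Feeding `prompt` to a language model encourages it to continue in the same
# step-by-step style before stating the final answer.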
Although we describe the text generation procedure above, many questions still need to be
addressed: How do language models function internally? How are the output probabilities for
the next word determined? What goes into creating (and indeed using) a language model? How
are language models different from more traditional predictive models if all they do is predict the
next token?
We address these questions indirectly in the present section by taking a tour of the significant
developments in machine learning and artificial intelligence that have occurred over the last
decade and fueled the creation of modern large language models.
We start with the most well-understood subset of machine learning techniques: supervised
learning. The central objective in supervised learning is to produce a prediction rule that predicts
well on unseen data, given enough labeled examples. For example, consider predicting house
prices from the square footage in a given zip code. Instead of creating a hand-crafted prediction
rule, the machine learning methodology advocates for choosing a prediction rule from an
expressive but non-exhaustive class of rules, such as linear predictors, that provides the best fit
on an existing collection of size-price examples. The statistically well-substantiated leap of faith
here is that we expect (or at least hope) that a parsimonious prediction rule that predicts well on
collected data, for which we know the correct answers, continues to maintain its predictive edge
on unseen data, where answers or prices are unknown. Such a predictive methodology benefits
from an abundance of labeled examples, hoping that a prediction rule learned from more
examples is more robust in that its superior predictive performance on seen data is less
ascribable to chance alone. Another example of a supervised learning task is to separate spam
from non-spam mail, given the text in email messages. Again, having more examples of spam
and non-spam emails is helpful to a supervised learning algorithm.
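Returning to the house-price example, the sketch below fits a linear prediction rule to a handful of invented size-price pairs with NumPy and then applies it to an unseen size; the numbers are made up purely for illustration.

import numpy as np

# Made-up (square footage, price) examples for a single zip code.
sizes = np.array([850, 1100, 1400, 1750, 2100], dtype=float)
prices = np.array([190_000, 240_000, 300_000, 365_000, 430_000], dtype=float)

# Choose the best-fitting rule from the class of linear predictors
# (least squares fit: price ~ slope * size + intercept).
slope, intercept = np.polyfit(sizes, prices, deg=1)

# The leap of faith: apply the learned rule to an unseen house.
unseen_size = 1600.0
predicted_price = slope * unseen_size + intercept
print(f"Predicted price for {unseen_size:.0f} sq ft: ${predicted_price:,.0f}")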
Two features of this supervised learning methodology deserve particular attention.
1. Fidelity of Seen Data vs. Unseen Data. Classical supervised learning operates on the
assumption that seen data must be representative of unseen data in a particular sense,
namely that any fixed example is equally likely to be in the seen or unseen bucket. In the
absence of temporal effects, this is reasonable for house prices. More generally,
supervised learning requires a well-curated dataset that is closely aligned with the
prediction task at hand. But, as we will see, language models are trained on vast corpora
of somewhat ruthlessly collected texts from the internet. Yet, completing a random partial
sentence from the internet is presumably not what businesses using language models
care about.
2. Model Agnosticism. Supervised learning algorithms realize the adage that all models
are wrong, but some are useful. For example, when building the price predictor above, a
data scientist does not believe that the genuine relationship between prices and area is
linear or well-specified. Similarly, when using neural networks to predict the next word in
language models, we don't believe that this is how Shakespeare must have employed a
neural network to compose his texts.
Although useful for panel or tabular data, pre-deep-learning-era supervised algorithms struggled
to predict well when presented with visual or auditory inputs. While the promise of machine
learning is predicated on the automation of learning, in practice these supervised learning
algorithms required carefully crafted representations of the input data in which operations like
additions and multiplications (for example, in linear regression) were semantically relevant.
Decades of painstaking research in signal processing and computer vision had resulted in
domain-specific hand-crafted representations, each useful for a specific modality (images, audio,
or video). The predictive performance of ML algorithms was limited by how good such
representations were.
The revolution in deep learning was to automate the process of representation learning itself.
Deep learning uses neural networks with multiple layers, each layer incrementally converting
the data into a more manageable form, all to make better predictions. This form of automated
hierarchical representation learning heralded a decade of tremendous progress in image and
speech recognition and machine translation, starting with the breakthrough work of Krizhevsky,
Sutskever, and Hinton8 in 2012 on the ImageNet challenge. Taking advantage of GPUs (hardware
for shared-memory parallel computation) and the availability of a large public dataset, this seminal
work slashed the error rate for image recognition by a substantial multiple.
later realized using similar deep neural network architectures in speech recognition and other
machine learning domains. In this sense, the advances deep learning enabled were (relatively)
domain agnostic.
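As a schematic of this layered structure, the sketch below (assuming PyTorch is available) stacks a few fully connected layers that map raw 32x32 color images to class scores; real image models of the kind discussed above use convolutional layers, so this is only meant to illustrate the idea of successive transformations of the data.

import torch
import torch.nn as nn

# Each layer converts its input into a representation that makes the final
# prediction easier; lower layers see raw pixels, the top layer emits scores.
model = nn.Sequential(
    nn.Flatten(),                     # raw pixels: 3 * 32 * 32 = 3072 numbers
    nn.Linear(3072, 512), nn.ReLU(),  # lower layers: generic features
    nn.Linear(512, 128), nn.ReLU(),   # middle layers: more abstract features
    nn.Linear(128, 10),               # top layer: scores for 10 classes
)

scores = model(torch.randn(1, 3, 32, 32))  # one random image -> 10 class scores
print(scores.shape)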
Although deep neural networks are data-hungry in that they require a substantially large dataset
to start predicting well, they also successfully realize a long-promised advantage of neural
networks: the representations they learn transfer across tasks. This factor is crucial to the
practice of modern-day machine learning. In the process of hierarchically learning
representations, deep nets learn task- (or label-) agnostic features of the dataset in the lower
layers, while higher layers closer to the output account for task-specific representations. This
permits us to (a) train a deep net to separate images of cats and dogs on a large dataset and (b)
subsequently build a performant shallow (even linear) neural net that uses the lower layers of the
former to craft useful representations to classify images of zebras and giraffes. Step A is often
called pre-training, and step B is referred to as supervised fine-tuning. This manner of amortizing
the learning across tasks that are not individually data-rich is central to language models.
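A minimal sketch of this pre-train-then-fine-tune recipe, again assuming PyTorch: the hypothetical backbone stands in for the lower layers learned during pre-training (its weights are left untrained here for brevity), and only a small task-specific head is trained during supervised fine-tuning.

import torch.nn as nn

# Hypothetical backbone standing in for the lower layers learned during
# pre-training (step a); weights are shown untrained for brevity.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False  # freeze the pretrained representations

# Supervised fine-tuning (step b): only this shallow, task-specific head
# (zebras vs. giraffes) is trained on the small labeled dataset.
head = nn.Linear(512, 2)
model = nn.Sequential(backbone, head)

trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))  # only the head's parameters remain trainable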
While the progress of deep learning in vision and speech was made possible by the availability
of large crowd-labeled datasets (with tens of millions of annotated images), such large high-
quality datasets were absent in the textual domain, despite a plethora of unlabelled data in the
form of books, Wikipedia articles, and articles on the internet. Could a machine learning
algorithm make use of this cheap, unlabelled data instead?
In computational linguistics, the distributional hypothesis codifies an appealing and intuitive idea
that similar words occur in similar contexts. In 2013, inspired by this observation, Mikolov et al9
trained a neural network, termed Word2Vec, to predict randomly selected words in a text corpus
given each word's neighboring words. Note that this step doesn't require any human
annotators. They observed that the 300-dimensional vector representations the neural net
learned for words had excellent linear algebraic properties that transparently reflected the
underlying semantics. For example, one obtained Queen when queried for the word with the
vector closest to King - Man + Woman. Thus, each vector dimension captured some abstract
semantic degree of freedom. These representations were also valuable for natural classification
tasks with limited data, such as sentiment classification, given a small number of examples.
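The analogy above can be reproduced with publicly released Word2Vec vectors, for instance via the gensim library (assuming it is installed and the pretrained vectors can be downloaded):

import gensim.downloader as api

# Pretrained 300-dimensional Word2Vec vectors (a sizeable one-time download).
vectors = api.load("word2vec-google-news-300")

# Nearest word to the vector King - Man + Woman.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected output resembles [('queen', 0.71)]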
The approach of creating auxiliary labeling tasks for free from unlabelled data to learn
semantically relevant representations is called contrastive learning and has proved helpful in
other domains, too. For example, given a set of unlabelled images, a classifier trained to
recognize random crops from the same image as a positive match and those from distinct
images as a negative match (pre-training step) learns representations useful for supervised fine-
tuning on genuine classification tasks downstream.
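To make the idea of matching positive pairs concrete, below is a simplified sketch of one common contrastive objective (in the spirit of the InfoNCE loss used by methods such as SimCLR), assuming PyTorch; it treats the i-th entries of two batches as embeddings of two crops of the same image, and all other pairings as negatives. It is a sketch of the general technique, not the exact procedure any particular paper prescribes.

import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    # z1[i] and z2[i] are embeddings of two random crops of the same image
    # (a positive pair); crops of different images act as negatives.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature      # pairwise similarities
    targets = torch.arange(z1.shape[0])   # the i-th crop should match the i-th crop
    return F.cross_entropy(logits, targets)

# Usage sketch: embeddings of two crops per image from some encoder network.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(contrastive_loss(z1, z2))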
While word embeddings serve as proof that textual semantic regularities can be assessed
without labeled data, substantive language processing tasks need an algorithmic
implementation of the concept of memory to capture relationships between words that are
positionally far apart. For example, a common motif in stories is that the next act derives from
some event that occurred a while ago.
The first generation of neural networks to capture this notion of memory were Recurrent
Neural Networks (RNNs), which sequentially process a piece of text one word at a time while
updating an internal state to maintain continuity, a proxy for memory. Unfortunately, optimizing
such recurrent neural nets to find one that best fits a given dataset proved extraordinarily error-
prone and challenging.
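A bare-bones sketch of the recurrent idea, with randomly initialized placeholder weights (training them well is precisely the difficult part noted above):

import numpy as np

# Read one word vector at a time and fold it into a running hidden state
# ("memory"). The weights here are random placeholders.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(64, 300)) * 0.01   # input-to-state weights
W_h = rng.normal(size=(64, 64)) * 0.01    # state-to-state weights

def rnn_step(h, x):
    return np.tanh(W_x @ x + W_h @ h)     # new state mixes the new word and the old state

h = np.zeros(64)
for x in [rng.normal(size=300) for _ in range(5)]:   # stand-ins for word vectors
    h = rnn_step(h, x)
# `h` now summarizes the whole sequence read so far.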
In 2017, Vaswani et al10 introduced a different neural network architecture, termed the transformer,
that could efficiently capture long-range relations between tokens compactly (non-sequentially)
by processing the entire surrounding context window at once while remaining amenable to
gradient-based optimization. The introduction of transformers spurred a line of research on
language models, culminating in models with ever more parameters trained on ever larger
datasets. For example, GPT2 (Generative Pre-trained
Transformer 2), released in 2019, is a 1.5 billion parameter model trained on 40 GB of data,
while GPT3, released in 2020, is a 175 billion parameter model trained on 570 GB of text data.
While larger models resulted in better performance, the open-market cost for training these
enormous models was estimated to be tens of millions of dollars.
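At the heart of the transformer is an attention operation in which every position attends to every other position in the context window at once. The sketch below shows the scaled dot-product form of this operation in NumPy; learned query/key/value projections, multiple heads, and positional information are omitted, so this is a stripped-down illustration rather than a full transformer layer.

import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: every position looks at every other
    # position in the context window at once, rather than sequentially.
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over positions
    return weights @ V                                    # blend of all positions

# 6 tokens with 300-dimensional representations (in a real transformer, queries,
# keys, and values would be learned linear maps of the token embeddings).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(6, 300))
print(attention(Q, K, V).shape)  # (6, 300): one updated vector per token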
The general paradigm brought about by contrastive learning was first to learn a large model on
auxiliary tasks created using an unlabelled dataset (the pre-training step) and subsequently to
use these learned representations in a downstream supervised learning task given a few task-
specific labeled examples (the supervised fine-tuning step). While broadly useful and practical,
supervised fine-tuning requires replicas of the baseline pre-trained model for each downstream
task; further, the large size of language models makes running even a few steps of gradient-
based iterative optimization for supervised learning prohibitive except on computationally
expensive hardware setups.
The paper11 describing the architecture of the GPT3 model presents a far cheaper and more
convenient way of repurposing pre-trained language models for specific downstream tasks,
namely, by specifying a few labeled examples in the prompt before asking for a label or
response for unseen data. This mode of inference, termed in-context learning, does not require
computationally expensive adjustments to the weights or parameters of an LLM and instead
treats the entire downstream supervised task as a prompt for the language model to complete.
This makes LLMs very attractive for end-users, who no longer have to create copies of the large
model to customize, nor do they have to run a sophisticated optimization procedure to adjust
parameters; each downstream task, in effect, becomes a conversation. While fine-tuning may
still result in additional performance gains over in-context learning for some tasks in exchange
for a massive increase in computational load, a crucial advance of GPT3 is that it substantially
narrows this gap, democratizing the use (although not the training) of LLMs.
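As an illustration, the hypothetical prompt below repurposes a pretrained language model as a sentiment classifier purely through in-context examples; the reviews are invented, and no model parameters are updated.

# A few-shot prompt for sentiment classification: the "training set" is pasted
# into the prompt itself, and the model is asked to complete the label for the
# unseen example.
prompt = (
    "Review: The food was cold and the service was slow. Sentiment: negative\n"
    "Review: Absolutely loved the ambiance and the dessert. Sentiment: positive\n"
    "Review: The plot dragged, but the acting was superb. Sentiment: "
)
# Sending `prompt` to a pretrained LLM and reading its next few tokens serves
# as the downstream classifier.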
While GPT3-like models happen to be good at conversation-centered tasks, they are not
explicitly trained or incentivized to follow instructions. OpenAI's InstructGPT model12 aligns the
model, post pre-training, to follow users' instructions by fine-tuning it to mimic labeled
demonstrations of the desired behavior (via supervised learning) and to prefer highly ranked
responses to prompts, as collected using human feedback (via reinforcement learning).
Given the success of language models, there has been increased interest in the possibility of
recreating the magic of LLMs in other domains. Such models, generically termed foundation
models, attempt to amortize the cost of limited-data downstream tasks by pre-training on large
corpora of broadly related tasks or unlabelled datasets. For example, one might be able to
repurpose the LLM paradigm to train a generalist robot or decision-making agent that learns
from supply chain operations across all industries.
Conclusion
This report contextualizes large language models within the more extensive machine learning
and artificial intelligence landscape by tracing the origins of the principal ideas that fuel today's
large language models. By bringing out their essential characteristics and their differences from
traditional modes of machine learning, we hope that a user of such models can be better
informed of the underlying tradeoffs they induce, e.g., the performance-resource
tradeoff between fine-tuning and in-context learning.
Endnotes
1 See the first table on OpenAI's announcement for an overview of GPT4's performance on other academic,
professional, and programming exams. The quoted ninetieth percentile performance on the bar exam was assessed
by Katz et al, but others have raised concerns.
2 See quotes by industry and research leaders here.
3 See initial consumer adoption statistics for ChatGPT here and here.
4 See this reporting for investments in GenAI.
5 See current and projected user bases for GenAI here.
6 When producing text, rather than sampling the next word incrementally, a more systematic search operation
termed Beam Search, coined by Raj Reddy at CMU, often yields better results.
7 Structuring initial text to elicit useful outputs from a GenAI model is called prompt engineering.
8 See the full Krizhevsky, Sutskever, Hinton paper here.
9 See the Word2Vec paper here.
10 See the paper that introduced Transformers here.
11 See the GPT3 paper here.
12 See the InstructGPT paper here.