Artificial Intelligence, Machine Learning, and
Deep Learning
IIT ROPAR Minor In AI
21 March, 2025
Contents
1 Introduction to AI, Machine Learning, and Deep Learning
1.1 AI: Mimicking Human Intelligence
1.2 Machine Learning: Learning from Data without Explicit Coding
1.3 Deep Learning: Inspired by Human Brain, Uses Neural Networks
2 Phases of AI: Rule-based, Predictive, Generative, Agentic
2.1 Rule-based AI (1950s-1990s)
2.2 Predictive AI (1990s-2010s)
2.3 Generative AI (2010s-Present)
2.4 Agentic AI (Emerging)
3 Deep Learning Basics
3.1 Inspiration from Human Brain Neurons
3.2 Perceptrons and Multi-layer Neural Networks
3.3 Convolutional Neural Networks (CNN) for Image Processing
3.4 Recurrent Neural Networks (RNN) for Sequential Data
4 Transformers and Attention Mechanism
4.1 Google’s “Attention is All You Need” Paper (2017)
4.2 Self-attention and Parallel Processing Capabilities
5 Large Language Models (LLMs)
5.1 Training Process: Pre-training, Post-training, Reinforcement Learning
5.1.1 Pre-training
5.1.2 Post-training (Fine-tuning)
5.1.3 Reinforcement Learning from Human Feedback (RLHF)
5.2 Applications and Limitations
5.2.1 Applications
5.2.2 Limitations
6 Prompt Engineering
6.1 Types: Zero-shot, Few-shot, Chain of Thought
6.1.1 Zero-shot Learning
6.1.2 Few-shot Learning
6.1.3 Chain of Thought (CoT)
6.2 Components: Instruction, Context, Input Data, Output Indicator
7 Future Developments
7.1 Agentic AI and Autonomous Agents
7.2 Debates on AI Capabilities and Potential Risks
7.2.1 Alignment and Safety
7.2.2 Scaling Laws and Emergent Abilities
8 Evaluation of LLMs
8.1 Code-based, Human Evaluation, LLM as Judge
8.1.1 Code-based Evaluation
8.1.2 Human Evaluation
8.1.3 LLM as Judge
8.2 Concepts like Distillation and Mixture of Experts
8.2.1 Knowledge Distillation
8.2.2 Mixture of Experts (MoE)
1 Introduction to AI, Machine Learning, and Deep Learning
1.1 AI: Mimicking Human Intelligence
Artificial Intelligence (AI) refers to systems designed to perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.
Historical Context
The term “Artificial Intelligence” was coined by John McCarthy in 1956
at the Dartmouth Conference, which is considered the founding event
of AI as a field. Early AI systems were predominantly rule-based and
focused on symbolic reasoning.
Case Study: IBM’s Deep Blue
The 1997 chess match between IBM’s Deep Blue and world champion Garry Kasparov represented an early milestone in AI. Deep Blue used a combination of brute-force computation and sophisticated evaluation functions to defeat Kasparov, demonstrating how machines could outperform humans in specific domains through approaches different from human cognition.
AI System = {Task Performance, Learning Capability, Adaptability, Reasoning Mechanisms} (1)
1.2 Machine Learning: Learning from Data without Explicit Coding
Machine Learning (ML) is a subset of AI that focuses on building systems
that can learn from and make decisions based on data, without being explicitly
programmed for specific tasks.
f : X → Y (2)
Where X represents input data and Y represents output predictions. The
function f is learned from training data rather than being explicitly defined.
Key ML Paradigms:
• Supervised Learning: Training on labeled data
• Unsupervised Learning: Finding patterns in unlabeled data
• Reinforcement Learning: Learning through interaction with an environment
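As a minimal sketch of the supervised paradigm, a model learns the mapping f : X → Y from labeled examples. The toy data and the choice of scikit-learn below are illustrative assumptions, not part of the original text:

from sklearn.linear_model import LogisticRegression

# Toy labeled data (hypothetical): features are [hours studied, hours slept],
# labels are 1 = passed, 0 = failed.
X = [[2, 9], [1, 5], [5, 7], [8, 8], [3, 4], [9, 6]]
Y = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X, Y)                   # f : X -> Y is learned from the data

print(model.predict([[6, 7]]))    # apply the learned f to an unseen input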
Case Study: Netflix Recommendation System
Netflix employs machine learning algorithms to analyze user viewing history,
ratings, and preferences to recommend content. This system processes billions
of data points to learn patterns that predict which shows a user might enjoy,
demonstrating how ML can create personalized experiences at scale. The recommendation system combines collaborative filtering (comparing user behavior
with similar users) and content-based methods (analyzing show attributes).
1.3 Deep Learning: Inspired by Human Brain, Uses Neural Networks
Deep Learning is a subset of machine learning that uses neural networks with multiple layers (hence “deep”) to progressively extract higher-level features from raw input.
y = σ(w · x + b) (3)
Where σ is an activation function, w represents weights, x is the input, and
b is a bias term.
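Equation (3) maps directly to a few lines of NumPy; the weight, input, and bias values below are arbitrary illustrations:

import numpy as np

def sigmoid(z):
    # a common choice for the activation function sigma
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3, 0.8])    # weights (illustrative)
x = np.array([1.0, 2.0, 0.5])     # input
b = 0.1                           # bias term

y = sigmoid(np.dot(w, x) + b)     # y = sigma(w . x + b), Equation (3)
print(y)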
Case Study: AlphaFold
DeepMind’s AlphaFold represents a breakthrough application of deep learning in protein structure prediction. Prior to AlphaFold, determining protein
structures was an enormously time-consuming laboratory process. AlphaFold
uses deep neural networks trained on known protein structures to predict the
three-dimensional structure of proteins from their amino acid sequences with
unprecedented accuracy, revolutionizing molecular biology and drug discovery.
2 Phases of AI: Rule-based, Predictive, Generative, Agentic
2.1 Rule-based AI (1950s-1990s)
Rule-based AI systems operate using explicitly programmed rules in the form
of if-then statements.
Listing 1: Example of Rule-Based AI in Prolog
% Facts
parent(john, mary).
parent(john, tom).
parent(mary, ann).

% Rules
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
Case Study: MYCIN
MYCIN was an early expert system developed at Stanford University in
the 1970s to diagnose infectious blood diseases and recommend antibiotics. It
contained approximately 600 rules that encoded the knowledge of infectious
disease experts. When tested, MYCIN performed at a level comparable to
specialists, demonstrating how explicit rules could capture expert knowledge.
2.2 Predictive AI (1990s-2010s)
Predictive AI uses statistical methods and machine learning to make predictions
based on patterns in data.
Case Study: Credit Scoring Models
Financial institutions use predictive AI to assess credit risk. These systems
analyze factors such as payment history, debt levels, and income to predict
the likelihood of loan repayment. Modern credit scoring systems use ensemble
methods combining multiple models (decision trees, logistic regression, etc.)
to improve prediction accuracy. These models have transformed lending by
enabling more objective, data-driven decisions.
2.3 Generative AI (2010s-Present)
Generative AI creates new content (text, images, audio, etc.) that resembles
human-created content.
P(x_t | x_{<t}) = softmax(W · h_t + b) (4)
Where x_t is the next token to be generated, x_{<t} represents the previous tokens, and h_t is the hidden state.
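A toy NumPy rendering of Equation (4): project the hidden state to vocabulary logits, apply softmax, and sample the next token. Dimensions and values are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 5, 4

W = rng.normal(size=(vocab_size, hidden_dim))   # output projection
b = np.zeros(vocab_size)
h_t = rng.normal(size=hidden_dim)               # hidden state summarizing x_<t

logits = W @ h_t + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax gives P(x_t | x_<t)

next_token = rng.choice(vocab_size, p=probs)    # sample the next token
print(probs, next_token)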
Case Study: DALL-E
OpenAI’s DALL-E demonstrates the capabilities of generative AI in visual
domains. Given a text prompt like “an astronaut riding a horse in a photorealistic style,” DALL-E can generate original images that integrate these concepts.
This demonstrates how generative models can combine concepts in creative ways
never explicitly shown during training, exhibiting a form of artificial creativity.
2.4 Agentic AI (Emerging)
Agentic AI systems can operate autonomously, make decisions, and take actions
to achieve specified goals.
Agentic AI Framework
1. Perception: Understanding the environment
2. Planning: Determining action sequences
3. Execution: Implementing planned actions
4. Learning: Improving from experiences
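The four stages above can be sketched as a control loop. Everything in this skeleton (function names, the stopping convention) is a hypothetical illustration, not any real framework’s API:

def run_agent(goal, perceive, plan, act, max_steps=10):
    """Hypothetical perceive-plan-act-learn loop for an agentic system.

    The caller supplies perceive() -> observation, plan(goal, obs, memory)
    -> action or None, and act(action) -> result; only the control flow
    is fixed here.
    """
    memory = []                           # 4. Learning: accumulate experience
    for _ in range(max_steps):
        obs = perceive()                  # 1. Perception
        action = plan(goal, obs, memory)  # 2. Planning
        if action is None:                # planner signals the goal is reached
            break
        result = act(action)              # 3. Execution
        memory.append((action, result))
    return memory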
Case Study: AutoGPT
AutoGPT represents an early example of agentic AI application. It combines
large language models with the ability to use tools (web search, file operations,
etc.) and maintain a memory of past actions. Given a high-level objective
like “research the market for electric vehicles and write a report,” AutoGPT
can break this down into sub-tasks, execute them sequentially, and produce the
desired output with minimal human intervention, demonstrating autonomous
goal-directed behavior.
3 Deep Learning Basics
3.1 Inspiration from Human Brain Neurons
Artificial neural networks draw inspiration from the structure and function of
biological neurons in the human brain.
Biological Neuron → Artificial Neuron (5)
Dendrites → Input Weights (6)
Cell Body → Summation & Activation (7)
Axon → Output (8)
While artificial neurons are vast simplifications of biological neurons, they
capture the essential computational elements: receiving weighted inputs, integrating them, and producing an output if the integrated signal exceeds a
threshold.
3.2 Perceptrons and Multi-layer Neural Networks
The perceptron is the fundamental building block of neural networks.
z = Σ_{i=1}^{n} w_i x_i + b (9)
a = σ(z) (10)
Where w_i are weights, x_i are inputs, b is the bias, and σ is an activation function.
Multi-layer networks stack these units to create more complex architectures:
z^{[1]} = W^{[1]} x + b^{[1]} (11)
a^{[1]} = σ(z^{[1]}) (12)
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} (13)
a^{[2]} = σ(z^{[2]}) (14)
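Equations (11)-(14) as a NumPy forward pass, with layer sizes chosen arbitrarily for illustration:

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))     # activation function

rng = np.random.default_rng(0)
x = rng.normal(size=3)                  # input vector

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # layer 1 parameters
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # layer 2 parameters

z1 = W1 @ x + b1      # Equation (11)
a1 = sigma(z1)        # Equation (12)
z2 = W2 @ a1 + b2     # Equation (13)
a2 = sigma(z2)        # Equation (14)
print(a2)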
Case Study: XOR Problem
The XOR problem (exclusive OR) illustrates why multi-layer networks are
necessary. A single perceptron cannot solve the XOR problem because it’s
not linearly separable. However, a neural network with at least one hidden
layer can learn this function. This simple example demonstrates how adding
layers enables networks to represent increasingly complex functions and decision
boundaries.
3.3 Convolutional Neural Networks (CNN) for Image Processing
CNNs apply convolutional operations to extract spatial features from images.
(f ∗ g)(x, y) = Σ_m Σ_n f(m, n) g(x − m, y − n) (15)
Key Components:
• Convolutional layers: Extract features using learnable filters
• Pooling layers: Reduce dimensionality while preserving important information
• Fully connected layers: Final classification based on extracted features
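A naive NumPy version of the convolution in Equation (15), restricted to the valid output region (deep learning libraries typically implement the closely related cross-correlation; the kernel flip below follows the equation as written):

import numpy as np

def conv2d(f, g):
    """Naive 2D convolution of image f with kernel g (valid region only)."""
    kh, kw = g.shape
    g_flipped = g[::-1, ::-1]           # the kernel flip implied by g(x-m, y-n)
    out = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(f[y:y + kh, x:x + kw] * g_flipped)
    return out

image = np.arange(16.0).reshape(4, 4)   # toy "image"
kernel = np.array([[1.0, -1.0]])        # toy horizontal-difference filter
print(conv2d(image, kernel))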
Case Study: ResNet
Residual Networks (ResNet) addressed the problem of training very deep
CNNs by introducing skip connections that allow gradients to flow more easily
through the network. This innovation enabled the creation of networks with
over 100 layers that could be effectively trained. ResNet dramatically improved
image classification performance on the ImageNet dataset and became a foundational architecture for many computer vision applications.
3.4 Recurrent Neural Networks (RNN) for Sequential Data
RNNs process sequential data by maintaining a hidden state that captures information from previous timesteps.
h_t = σ(W_{xh} x_t + W_{hh} h_{t−1} + b_h) (16)
y_t = σ(W_{hy} h_t + b_y) (17)
LSTM (Long Short-Term Memory) networks address the vanishing gradient problem in traditional RNNs:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f) (18)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i) (19)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) (20)
C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t (21)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o) (22)
h_t = o_t ∗ tanh(C_t) (23)
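Equations (16)-(17) as a minimal NumPy recurrence over a toy sequence; shapes, values, and the choice of tanh for σ are illustrative:

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2

W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
W_hy = rng.normal(size=(output_dim, hidden_dim))
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)

h = np.zeros(hidden_dim)                     # initial hidden state
for x_t in rng.normal(size=(5, input_dim)):  # a 5-step input sequence
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)     # Equation (16)
    y_t = np.tanh(W_hy @ h + b_y)                # Equation (17)
print(y_t)                                   # output at the final timestep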
Case Study: Neural Machine Translation
Google’s Neural Machine Translation (GNMT) system demonstrated the
power of RNNs in sequence-to-sequence learning. Prior to the transformer
architecture, GNMT used bidirectional LSTMs with attention mechanisms to
translate between languages. The system showed significant improvements over
phrase-based statistical methods, especially for grammatically complex language
pairs like English-Japanese, by capturing long-range dependencies and context.
4 Transformers and Attention Mechanism
4.1 Google’s “Attention is All You Need” Paper (2017)
The landmark paper by Vaswani et al. introduced the transformer architecture,
which revolutionized natural language processing by eliminating recurrence and
convolutions in favor of attention mechanisms.
Transformer Architecture
Key innovations:
• Self-attention mechanism
• Positional encoding
• Multi-head attention
• Feed-forward networks in each layer
4.2 Self-attention and Parallel Processing Capabilities
Self-attention computes relationships between all positions in a sequence:
Attention(Q, K, V) = softmax(QK^T / √d_k) V (24)
Where Q (queries), K (keys), and V (values) are derived from the input
sequence.
Multi-head attention computes attention multiple times in parallel:
MultiHead(Q, K, V) = Concat(head_1, . . . , head_h) W^O (25)
Where each head performs attention with different linear projections.
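Equation (24) in a few lines of NumPy (a single head with toy shapes; real implementations add masking and learned projections for Q, K, and V):

import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, Equation (24)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)   # (4, 8): one output per sequence position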
Case Study: BERT
Google’s Bidirectional Encoder Representations from Transformers (BERT)
demonstrated the power of transformer architectures for language understanding. BERT is pre-trained using masked language modeling and next sentence prediction objectives on a large corpus of text. When fine-tuned on specific tasks, BERT achieved state-of-the-art results on a wide range of natural language understanding benchmarks. Its bidirectional attention mechanism allows
it to consider context from both directions, improving performance on tasks like
question answering and sentiment analysis.
Parallel Processing Advantage:
Unlike RNNs, which process sequences element by element, transformers
process entire sequences in parallel:
RNN Complexity = O(n) sequential operations (26)
Transformer Complexity = O(1) sequential operations (27)
This parallelization enables efficient training on modern GPU hardware, allowing for much larger models.
5 Large Language Models (LLMs)
5.1 Training Process: Pre-training, Post-training, Reinforcement Learning
5.1.1 Pre-training
During pre-training, models learn general language patterns from vast amounts
of text.
Pre-training Scale
Modern LLMs are trained on:
• Hundreds of billions to trillions of tokens
• Diverse sources: books, websites, code, research papers
• Months of computation on thousands of GPUs
5.1.2 Post-training (Fine-tuning)
After pre-training, models are adapted for specific capabilities:
• Supervised Fine-tuning (SFT): Using human-created demonstrations
• Instruction Tuning: Teaching models to follow user instructions
5.1.3 Reinforcement Learning from Human Feedback (RLHF)
RLHF aligns model outputs with human preferences:
L_RLHF = E_{x∼D} [ r_ϕ(x, y) − β log( p_θ(y|x) / p_ref(y|x) ) ] (28)
Where r_ϕ is a learned reward model based on human preferences, and the second term is a KL-divergence penalty to prevent excessive deviation from the reference model.
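The quantity inside the expectation in Equation (28) is easy to compute per sample; the reward and log-probabilities below are made-up illustrative numbers:

def rlhf_objective(reward, logp_policy, logp_ref, beta=0.1):
    """Reward minus the KL-style penalty, per Equation (28)."""
    # beta * log(p_theta(y|x) / p_ref(y|x)) = beta * (log-prob difference)
    return reward - beta * (logp_policy - logp_ref)

# Hypothetical values: reward-model score for a response, and the response's
# log-probability under the current policy and the frozen reference model.
print(rlhf_objective(reward=0.8, logp_policy=-12.3, logp_ref=-12.9))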
Case Study: ChatGPT
OpenAI’s ChatGPT illustrates the full LLM training pipeline. Starting with
a GPT architecture pre-trained on a diverse text corpus, it underwent instruction tuning to follow user directions and RLHF to align with human preferences.
This process transformed a general text prediction model into an assistant that
could respond helpfully to user queries, follow instructions, and generate more
useful, safe, and truthful responses. Its capabilities and limitations demonstrate
both the potential and challenges of current LLM technology.
5.2 Applications and Limitations
5.2.1 Applications
• Content Generation: Writing, summarization, translation
• Code Assistance: Generating, explaining, and debugging code
• Conversational AI: Customer service, digital assistants
• Information Extraction: Analyzing documents, reports
5.2.2 Limitations
Hallucination: LLMs can generate plausible-sounding but factually incorrect
information.
Example of Hallucination
When asked about obscure topics, LLMs may confidently generate fictional information, such as inventing non-existent research papers or creating false historical events.
Knowledge Cutoff: LLMs cannot know about events after their training
data ends.
Knowledge Access = { Comprehensive for t < t_cutoff; None for t > t_cutoff } (29)
Context Length: LLMs have a finite window of text they can process at
once.
Maximum Context = n tokens (30)
Where n has increased from about 2,048 in early models to 128,000+ in
recent architectures.
Case Study: Mitigating Limitations in Claude
Anthropic’s Claude demonstrates approaches to addressing LLM limitations.
To reduce hallucinations, Claude was trained using constitutional AI methods
that encourage the model to express uncertainty rather than confabulate when
asked about topics outside its knowledge base. To overcome context limitations,
Claude implements techniques for efficient context compression and retrieval, allowing it to process longer documents while maintaining coherent understanding.
6 Prompt Engineering
6.1 Types: Zero-shot, Few-shot, Chain of Thought
6.1.1 Zero-shot Learning
The model performs tasks without specific examples:
Listing 2: Zero-shot Prompt
Classify the following text as either positive or negative:

"The service at this restaurant was terrible and the food was cold."
6.1.2 Few-shot Learning
Providing examples helps the model understand the desired pattern:
Listing 3: Few-shot Prompt
Classify reviews as positive or negative:

Review: "Amazing food and excellent service!"
Sentiment: Positive

Review: "Waited an hour and the food was bland."
Sentiment: Negative

Review: "The ambiance was nice but overpriced for what you get."
Sentiment:
6.1.3 Chain of Thought (CoT)
Encouraging step-by-step reasoning improves performance on complex tasks:
Listing 4: Chain of Thought Prompt
Question: If a store has 10 apples, sells 3 to customer A and 4 to customer B, and then buys 5 more, how many apples does it have now?

Let's think through this step by step:
1. The store starts with 10 apples.
2. It sells 3 apples to customer A, leaving 10 - 3 = 7 apples.
3. It sells 4 apples to customer B, leaving 7 - 4 = 3 apples.
4. It buys 5 more apples, giving it 3 + 5 = 8 apples total.

Therefore, the store has 8 apples now.
Case Study: GSM8K Math Problems
Research on the GSM8K benchmark (grade school math problems) demonstrates the dramatic improvement in performance achieved through chain-of-thought prompting. Without CoT, even large language models struggle with
multi-step reasoning problems. With CoT prompting, performance improved by
20-40 percentage points across various model sizes, highlighting how the right
prompting strategy can unlock capabilities already present in the model.
6.2 Components: Instruction, Context, Input Data, Output Indicator
Effective prompts typically include:
Prompt Components
1. Instruction: Clear directions about the task
2. Context: Background information or constraints
3. Input Data: The specific content to process
4. Output Indicator: Format or style specifications
Example of a Structured Prompt:
Listing 5: Structured Prompt Components
# INSTRUCTION
Summarize the following medical research abstract in simple terms that a patient can understand.

# CONTEXT
This is for a patient education website. The audience has no medical background.

# INPUT DATA
[Research abstract text here]

# OUTPUT INDICATOR
Your summary should be 3-5 short paragraphs. Include a one-sentence "Key Takeaway" at the end.
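The four components lend themselves to a small template helper; this builder and its argument names are hypothetical conveniences, not any established API:

def build_prompt(instruction, context, input_data, output_indicator):
    """Assemble a structured prompt from its four components."""
    return "\n\n".join([
        f"# INSTRUCTION\n{instruction}",
        f"# CONTEXT\n{context}",
        f"# INPUT DATA\n{input_data}",
        f"# OUTPUT INDICATOR\n{output_indicator}",
    ])

print(build_prompt(
    instruction="Summarize the abstract in simple terms.",
    context="Patient education website; audience has no medical background.",
    input_data="[Research abstract text here]",
    output_indicator="3-5 short paragraphs plus a one-sentence key takeaway.",
))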
Case Study: Legal Document Analysis
Law firms use structured prompts to extract specific information from contracts. By providing clear instructions (e.g., “Identify all payment terms and obligations”), relevant context (e.g., “This is for a procurement contract review”), specific input data (the contract text), and output indicators (e.g., “Format as a table with clause references”), they achieve consistent, structured
outputs that can be directly incorporated into legal workflows, demonstrating
how well-crafted prompts can turn LLMs into specialized information extraction
tools.
7 Future Developments
7.1 Agentic AI and Autonomous Agents
Agentic AI systems combine LLMs with:
• Planning: Breaking down complex goals into subtasks
• Memory: Maintaining information across interactions
• Tool Use: Leveraging external capabilities (APIs, databases, etc.)
• Self-Improvement: Learning from successes and failures
Agent Architecture = {Perception Module, Memory System, Planning Engine, Action Execution, Learning Mechanism} (31)
Case Study: BabyAGI
BabyAGI demonstrates simple but powerful agentic capabilities. Given a
high-level task like “Research investment opportunities in renewable energy,”
it autonomously creates subtasks, executes them in a reasonable order, utilizes
tools like web search and document analysis, and compiles findings into a coherent output. While limited compared to human researchers, its ability to work
autonomously toward complex goals illustrates the direction of agent-based AI
systems.
7.2 Debates on AI Capabilities and Potential Risks
7.2.1 Alignment and Safety
As AI systems become more capable, ensuring they act in accordance with
human values becomes increasingly important:
Alignment Gap = AI Capability − Alignment Level (32)
7.2.2 Scaling Laws and Emergent Abilities
Research suggests that capabilities may emerge non-linearly as models scale:
Performance ≈ C · (Compute)^α · (Data)^β · (Parameters)^γ (33)
Case Study: Frontier Model Research
Research by organizations like Anthropic on frontier models has revealed
surprising emergent capabilities. As models scaled beyond certain thresholds,
they suddenly demonstrated abilities not observed in smaller versions, such as
multi-step reasoning, code generation, and creative problem-solving. These discontinuous improvements suggest that further scaling may unlock additional capabilities that are difficult to predict in advance, highlighting both the potential and uncertainty in continued AI advancement.
8 Evaluation of LLMs
8.1 Code-based, Human Evaluation, LLM as Judge
8.1.1 Code-based Evaluation
Automated metrics provide objective but limited assessment:
• BLEU, ROUGE: Lexical overlap with reference texts
• Perplexity: Probability assigned to correct tokens
• Task-specific Metrics: Accuracy, F1 score, etc.
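As a concrete example of an automated metric, perplexity follows directly from the per-token log-probabilities the model assigns to the correct tokens (the values below are made up):

import numpy as np

def perplexity(token_log_probs):
    """Perplexity = exp(-mean log-probability of the correct tokens)."""
    return float(np.exp(-np.mean(token_log_probs)))

# Hypothetical log-probabilities for the five tokens of a reference text
log_probs = np.log([0.40, 0.25, 0.60, 0.10, 0.33])
print(perplexity(log_probs))   # lower is better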
8.1.2 Human Evaluation
Human judgments capture nuanced quality aspects:
Human Evaluation = {Helpfulness, Accuracy, Safety, Quality, Bias} (34)
8.1.3 LLM as Judge
Using stronger models to evaluate outputs:
Listing 6: LLM-as-Judge Prompt Template
Rate the quality of the following response to the given query:

Query: [User query]
Response: [Model response]

Score from 1-10 on:
- Relevance to query
- Factual accuracy
- Completeness
- Clarity
- Helpfulness

Provide justification for each score.
Case Study: MMLU Benchmark
The Massive Multitask Language Understanding (MMLU) benchmark evaluates models across 57 subjects ranging from elementary mathematics to professional medicine. This comprehensive evaluation reveals both strengths and weaknesses in model capabilities across different domains of knowledge. Recent models achieve human expert-level performance in some categories while still struggling in others, providing a nuanced picture of progress and the remaining challenges in language model development.
8.2 Concepts like Distillation and Mixture of Experts
8.2.1 Knowledge Distillation
Transferring knowledge from larger to smaller models:
L_distill = α L_task + (1 − α) L_KD (35)
Where L_KD measures the divergence between the student and teacher model outputs.
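A sketch of Equation (35), using cross-entropy on the true label as L_task and the KL divergence between teacher and student output distributions as L_KD; α and the logits are illustrative choices:

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label, alpha=0.5):
    """L_distill = alpha * L_task + (1 - alpha) * L_KD, Equation (35)."""
    p_s, p_t = softmax(student_logits), softmax(teacher_logits)
    l_task = -np.log(p_s[true_label])        # cross-entropy on the true label
    l_kd = np.sum(p_t * np.log(p_t / p_s))   # KL(teacher || student)
    return alpha * l_task + (1 - alpha) * l_kd

print(distillation_loss(np.array([2.0, 0.5, -1.0]),
                        np.array([1.5, 1.0, -0.5]), true_label=0))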
8.2.2 Mixture of Experts (MoE)
Combining specialized sub-networks:
y = Σ_{i=1}^{n} g(x, i) · f_i(x) (36)
Where g(x, i) is a gating function determining how much expert f_i contributes to the output.
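Equation (36) with a softmax gate over toy scalar "experts"; in a real MoE layer each expert is a learned sub-network and the gate keeps only the top-k experts (the functions and gating weights here are stand-ins):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy "experts": stand-ins for learned sub-networks f_i.
experts = [lambda x: 2 * x, lambda x: x ** 2, lambda x: x - 1]

rng = np.random.default_rng(0)
w_gate = rng.normal(size=len(experts))    # gating parameters (illustrative)

def moe(x):
    g = softmax(w_gate * x)               # g(x, i): input-dependent gate weights
    # A sparse MoE (e.g., Switch Transformer) would zero all but the top-k g[i].
    return sum(g[i] * f(x) for i, f in enumerate(experts))   # Equation (36)

print(moe(1.5))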
Case Study: Google’s Switch Transformer
Google’s Switch Transformer demonstrated the efficiency gains possible with
MoE architectures. By using a sparse mixture of experts approach where only
a subset of experts process each input token, the model achieved performance
comparable to dense models with significantly less computation during inference.
This approach enables larger effective model sizes while maintaining reasonable
training and deployment costs, potentially offering a more efficient scaling path
than simply increasing dense model parameters.
Benefits of MoE:
• Computational efficiency through sparse activation
• Specialization of different components for different subtasks
• Capacity scaling without proportional computation increase
Parameters in MoE ≫ Parameters used per forward pass (37)
Effective Capacity ≈ Experts × Parameters per Expert (38)
Computation ≈ Active Experts × Parameters per Expert (39)