0% found this document useful (0 votes)

22 views

LLMs Overview and OpenAI API Ver 1-8 - Final NLP Day-UM6P-Nov 2023

Uploaded by

aqilkaoutar

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views

LLMs Overview and OpenAI API Ver 1-8 - Final NLP Day-UM6P-Nov 2023

Uploaded by

aqilkaoutar

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 45

Large Language Models: Overview &

OpenAI API

El Habib NFAOUI, Ph.D.

Full Professor
Department of Computer Science,
Faculty of Sciences Dhar El Mahraz,
Sidi Mohamed Ben Abdellah University, Fez
[email protected]
Outline
• LLMs:
• Language Modeling Objective, Architectures, Decoding strategy,
• Types of LLMs, In-context learning
• Fine-tuning
• LLM use cases, LLM-powered applications
• Generative AI life cycle project
• OpenAI API
• Chat Completions API endpoint
• Fine-tuning API endpoint
• Embeddings API endpoint

2
1. Definition of LLMs
Prompt Completion
(input) LLM (Output) The original Transformer
model described in
“Attention is All You Need”
(Vaswani et al., 2017)
“ Typically, large language models (LLMs) refer to
Transformer language models that contain
hundreds of billions (or more) of parameters*,
which are trained on massive text data [1], such as
GPT-3 [2], PaLM [3], Galactica [4], and LLaMA [5]. Model parameters size:
LLMs exhibit strong capacities to understand GPT-3 (2020):175B
natural language and solve complex tasks PaLM (2022): 540B
(via text generation).” (Wayne Xin Zhao et al., 2023)
Pre-train Data Scale:
GPT-3 (2020): 300B tokens
Prompt (NLP spell-checking task) PaLM (2022): 780B tokens
Completion
Proofread the following text and rewrite the
corrected version: GPT-3.5
"He did not did its homework."

(NLP translation task)

Translate the following text into Arabic and

French:
"The poor and middle class work to money. GPT-3.5 3
The rich have money to work for them." 3
Timeline of existing large language models (larger than 10B)

(Wayne Xin Zhao et al., 2023)

4
Statistics of large language models (larger than 10B) (Wayne Xin Zhao et al., 2023)

Scaling Laws for LLMs:

▪ Chinchilla scaling law
(Google DeepMind team,
Hoffmann et al. 2022)
▪ OpenAI team (Kaplan et al.,
2020)

5
Statistics of large language models (larger than 10B) (Wayne Xin Zhao et al., 2023)

6
2. Major development stages of Language Modeling Approach
• Statistical language models:
o Based on statistical learning methods (Markov assumption)

• Neural language models:

o Characterize the probability of word sequences by neural networks, e.g., recurrent neural networks (RNNs).
o Word2vec (shallow NN)

• Pre-trained language models:

o ELMo (BiLSTM)
o BERT (based on Transformer architecture)

• Large language models* (term coined to large-sized pre-trained language models)

o Scaling pre-trained language models (scaling model/data sizes) often leads to an improved model capacity on
downstream tasks (following the scaling law).

✓ LLMs show surprising abilities (called emergent abilities) in solving a series of complex tasks (typicaly, in-context learning
(present in GPT-3 and not observed in small-scale language models (e.g., BERT, GPT-2), instruction following, and step-by-
step reasoning (a.k.a. Chain-of-thought).
* Note that an LLM is not necessarily more capable than a small Pre-trained language models, and emergent abilities may not
occur in some LLMs.
7
3. Text Generation:
General working flow of an LLM predicting the next word (autoregressive model )

The way these models actually work is that after each token is produced, that token is added to the sequence of inputs, and
that new sequence becomes the input to the model in its next step. This is an idea called “auto-regression”.

Alammar, J (2019). The Illustrated GPT-2 [Blog post].

https://jalammar.github.io/illustrated-gpt2/

8
3. Text Generation:
General working flow of an LLM predicting the next word (autoregressive model )

Output at step 1 Output at step 2 Output at step 3 Output at step 4

Proba
Word in the subset of AI
vocab
models
subset 0.075 of 0.082 AI 0.080 models 0.091
class 0.060 related 0.070 techniques 0.070 algorithms 0.075
kind 0.045 different 0.002 ….. ….. and 0.015
set 0.030 and 0.063 book 0.003 …. ….
models 0.015 text 0.009 and 0.002
coherent 0.011 …… …….. …… …….
politics 0.005
…. ….
LLM

Input at Generative Artificial Generative Artificial Generative Artificial Generative Artificial

each step Intelligence (AI) Intelligence (AI) Intelligence (AI) Intelligence (AI) refers
refers to a refers to a subset refers to a subset of to a subset of AI

General working flow of an LLM predicting the next word (e.g., select the
token (word) with the highest probability) (auto-regressive models).
9
The LLM is focused on generating the next token given the sequence of tokens. The model does this in a loop appending the predicted token to the
input sequence. Then, it can generate text by predicting one word at a time. LLMs are an example of Generative AI.
3. Text Generation

A transformer model is a
neural
Input
LLM

The model first generates logits for each possible output token. Those logits then are passed to a softmax function to
generate probabilities for each possible output, giving a probability distribution over the vocabulary. Here is the softmax
equation for calculating the actual probability of a token:

𝑒 𝑙𝑜𝑔𝑖𝑡𝑘
𝑃 𝑡𝑜𝑘𝑒𝑛𝑘 𝑡𝑜𝑘𝑒𝑛 𝑐𝑜𝑛𝑡𝑒𝑥𝑡 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 𝑙𝑜𝑔𝑖𝑡𝑘 =
σ𝑗 𝑒 𝑙𝑜𝑔𝑖𝑡𝑗

Where:
- 𝑃 𝑡𝑜𝑘𝑒𝑛𝑘 𝑡𝑜𝑘𝑒𝑛 𝑐𝑜𝑛𝑡𝑒𝑥𝑡 is the probability of 𝑡𝑜𝑘𝑒𝑛𝑘 given the context from previous tokens (𝑡𝑜𝑘𝑒𝑛1 to 𝑡𝑜𝑘𝑒𝑛𝑘−1 )
- 𝑙𝑜𝑔𝑖𝑡𝑘 is the output of the neural network

10
4. Decoding Strategy
How to select the output tokens (a.k.a. decoding strategy)?

✓ Greedy search:
A basic decoding method that predicts the most likely token at each step based on the previously generated
tokens, formally modeled as:

A transformer model is a
neural
11
Input
LLM
4. Decoding Strategy
Narrow
✓ Random sampling (sampling based methods): distribution of Broader distribution
probabilities of probabilities
• Temperature sampling:
To modulate the randomness of sampling, a practical method is to
adjust the temperature coefficient of the softmax function for
computing the probability of the j-th token over the vocabulary:

where lj′ is the logits of each word and t is the temperature

coefficient.

Temperature is a parameter that you can access in LLMs which

essentially guides the model on how random their behaviour is, cohere.ai
which means that the temperature influences the model's
creativity.
12
4. Decoding Strategy
• Top-k sampling:
top-k sampling directly truncates the tokens with lower probability and only samples from the tokens with
the top k highest probabilities.

K=3 =>

cohere.com

- top-k sampling does not consider the overall possibility distribution, a constant value of k may not be
suitable for different contexts. 13
4. Decoding Strategy
• Top-p sampling
top-p sampling (a.k.a., nucleus sampling) is proposed by sampling from the smallest set having a cumulative
probability above (or equal to) p.

cohere.com

Generally, it is recommended to alter top-p or temperature but not both.

14
4.1 Practical Settings (LLM Parameters: options to control the outputs of text generation)
LLM Parameters : temperature, top_p, …
(options to control the outputs of text generation)

Prompt Completion
(input) (Output)
LLM

• While the model decides what is the most probable output, there are key parameters to consider tuning (tweaking) to
influence those probabilities for getting the best outputs of your LLM projects.

Category Example: OpenAI API parameter

name
(try experimenting on the
playground)

Let the model know when - max_tokens=1728

to stop: - stop=["."]
number of tokens, stop
words

Predictability vs. creativity - temperature=0.19

(decoding strategies): - top_p=1
temperature, top_p

Control the repetition - frequency_penalty=0.26

degree of generation: - presence_penalty=0.29
repetition penalty - logit_bias
15
5. OpenAI Text generation models
• OpenAI's text generation models (often called generative pre-trained transformers or large language
models) have been trained to understand natural language, code, and images. The models provide text
outputs in response to their inputs.

18
5.1 OpenAI Chat Completions API endpoint
API REQUEST (Creates a model response for the given prompt (chat conversation):
POST https://api.openai.com/v1/chat/completions

Application
API response: OpenAI Chat
A chat completion object, or a streamed sequence of chat completion
chunk objects if the request is streamed.
Completions API

Typically, a conversation is formatted as

follows:
- system message
- user messages
- assistant messages

An example API call looks as follows: (try experimenting on the playground)

Chat Completions response format

An example Chat Completions API response looks as

Example of Chat Completions Request:
follows:
from openai import OpenAI
client = OpenAI() {
"choices": [
response = client.chat.completions.create( {
model="gpt-3.5-turbo", "finish_reason": "stop",
messages=[
{
"index": 0,
"role": "system", "message": {
"content": "You are a helpful assistant. You "role": "assistant",
will be provided with a general knowledge "content": "The Los Angeles Dodgers won the
question and your task is to provide a World Series in 2020. "
concise answer." }
},
{
}
"role": "user", ],
"content": "Who won the world series in 2020?" "created": 1677664795,
} "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
], "model": "gpt-3.5-turbo-0613",
temperature=1, "object": "chat.completion",
max_tokens=256,
top_p=1,
"usage": {
frequency_penalty=0, "completion_tokens": 13,
presence_penalty=0 "prompt_tokens": 28,
) "total_tokens": 41
} 20
}
6. Pre-training LLMs (Unsupervised pre-training Task):
Causal language modeling task (autoregressive models)
• Most LLMs are developed based on the decoder-only architecture, such as GPT-3, BLOOM, Gopher, and OPT.
• During the pre-training stage, LLMs are trained using the language modeling objective on a large-scale
corpus.
• Language Modeling task (i.e., the conventional LM) is the most commonly used objective to pre-train
decoder-only LLMs (e.g., GPT-3).
Given a sequence of tokens:

The LM task aims to autoregressively predict the target tokens based on the preceding tokens
in a sequence. A general training objective is to maximize the following likelihood:

✓ A large language model, after pretraining (using a language modeling task), is able to provide a global
understanding of the language it is trained on.
✓ Large language models can perform hundreds of NLP tasks they were not trained for. 21
LLMs can perform
hundreds of NLP tasks
they were not trained
for.

Example: Brand
Monitoring, keep a
close eye on your
brand’s reputation

In this example, the

LLM performs 3 tasks
at once:
✓ Sentiment Analysis
✓ Emotion
recognition
✓ Entity extraction

22
Ratios of various data sources in the pre-training data for existing LLMs

(Wayne Xin Zhao et al., 2023)

23
7. Types of Large Language Models
There are three types of LLMs:
▪ Base LLMs (pre-trained model checkpoints)
Model checkpoints obtained right after pre-training.
✓ A base LLM acquires the general abilities for solving
various tasks.
✓ It lacks the consideration of human values or
preferences.

▪ Fine-tuned LLMs (adapting pre-trained LLMs: instruction or

alignment fine-tuned model checkpoints, - also called chat
models)
- Example: representative alignment criteria (i.e., helpful,
honest, and harmless)

▪ Specialized LLMs (adapted model checkpoints for some

specific task or domain: Healthcare, Finance, Legal, etc.).

24
8. LLMs utilization:
How to work with large language models?

25
8.1. Prompt, completion, and prompt engineering
Prompt Completion
LLM
Prompts involve instructions and context passed to a A completion refers to the text that is generated
language model to achieve a desired task. and returned as a result of the provided
The space or memory that is available to the prompt is prompt/input.
called the context window.

Prompt (NLP spell-checking task) Completion

Proofread the following text and rewrite the corrected version: GPT-3.5
"He did not did its homework."

(NLP translation task)

Translate the following text into Arabic and French:

"The poor and middle class work to money. The rich GPT-3.5
have money to work for them."
26

Prompt engineering:

Prompt engineering is a useful skill for AI engineers and researchers to improve and efficiently use language models.
OpenAI Guide: https://platform.openai.com/docs/guides/prompt-engineering
26
8.2. Different kinds of prompts

✓ Large language models can be prompted to produce

output in a few ways :
• Zero-shot prompt
• One-shot prompt
• Few-shot prompt
• CoT prompt
• Planning

✓ Fine-tuning LLM for a specific task (supervised fine-tuning)

In-context learning (zero-shot prompt, one-shot prompt, few-shot

prompt):
A typical prompting method which formulates the task description
and/or demonstrations in the form of natural language text.

Zero-shot, one-shot and few-shot, contrasted with traditional fine-tuning 27

(Brown et al., 2020: GPT-3 original paper)
Zero-shot prompt

Prompt Completion

LLM
Classify the sentiment of
the following tweet as Sentiment: positive
positive, neutral, or
negative.
Tweet: I loved the new
Samsung Galaxy S23 Ultra

28
Zero-shot prompt
- Example: Zero-shot prompt given to GPT-3.5-turbo
Completion
Prompt

Determine whether each item in the following list of emotions is

conveyed in the text below, which
is delimited with triple backticks. GPT-3.5-turbo
Give your answer as a list with labels and 0 or 1 for each label.
List of emotions: Anger, Anticipation, Disgust, Fear, Joy, Love,
Optimism, Pessimism, Sadness, Surprise, Trust, neutral
Text : ``` I am filled with jealous rage, I am feeling quite sad,
sorry for myself but I will snap out of it soon.```

- OpenAI playground (GPT-3.5-turbo)

29
One-shot prompt

Prompt Completion

LLM
Classify the sentiment of
Task Sentiment: Negative
the following tweet as
description
positive, neutral, or
negative.
Tweet: I loved the new
One
Samsung Galaxy S23 Ultra
example
Sentiment: Positive

Tweet: The design of this

laptop is bad.
Sentiment:

30
Few-shot prompt

Prompt
Classify the sentiment of the following tweet as
positive, neutral, or negative.
Tweet: I loved the new Samsung Galaxy S23 Ultra Completion
Sentiment: Positive

Tweet: The design of this laptop is bad.

Sentiment: Negative
LLM Sentiment: Positive

Tweet: The camera of this phone is good.

Sentiment:

Typically, 10 to 100 shots for GPT-3 (Brown et al., 2020: GPT-3 original paper)

31
Chain of thought (CoT) prompting

Chain-of-thought prompting enables large language models to tackle complex arithmetic, commonsense, and
symbolic reasoning tasks. Chain-of-thought reasoning processes are highlighted. (Wei, Jason et al., 2023)

✓ The core concept behind CoT is that by presenting the LLM few-shot examples that include reasoning, it will
subsequently incorporate the reasoning process into its responses when addressing prompts.
32
Chain of thought (CoT) prompting: Zero-shot CoT
Examples with reasoning in the prompt = adding "Let's consider step by step" across the task.

(Takeshi Kojima et al., 2023) 34

Fine-tuning LLM for a specific task (supervised fine-tuning)

• Fine-tuning LLMs can make them better for specific applications.

• Fine-tuning involves updating the weights of a pre-trained model by training on a supervised
dataset specific to the desired task.
• Once a model has been fine-tuned, you won't need to provide as many examples in the prompt
(few-shot).

36
8.3 OpenAI Fine-tuning API endpoint

Fine-tuning workflow using OpenAI Fine-tuning

endpoint :
• Prepare training and validation data
(formatted as a JSONL) document).
• Upload training and validation data files (use
Files endpoint)
• Fine-tune the selected model.
• Analyse the results.

Once a model finishes the fine-tuning process, it

is available to be used in production right away.

Code to start a fine-tuning job

using the OpenAI SDK

37
9. LLM use cases, tasks, real-world applications
Large language models have numerous applications in various fields, including but not limited to:

• Language translation: LLMs can be used to translate text from one language to another.
• Question answering: LLMs can be used to answer questions based on a given context.
• Text summarization: Large language models can be used to generate summaries of text documents.
• Content creation (generation): LLMs can be used to generate content for various purposes, such as
marketing and advertising.
• Code generation
• Sentiment analysis: LLMs can be used to analyze the sentiment of text
• Chatbots
• Summarization, Essay writing
• Entity extraction
• Etc.,

✓ None of these capabilities are explicitly programmed in—they all emerge as a result of training
using language modeling task.
✓ A large language model, after pretraining (using a language modeling task), is able to provide a
global understanding of the language it is trained on.
✓ Large language models can perform hundreds of NLP tasks they were not trained for.
38
LLM-powered applications
Retrieval Augmented Generation (RAG)
✓ Some of the limitations of large language models are:
▪ The internal knowledge held by a model cuts off at the moment of pretraining,
▪ Hallucination
▪ Struggling with complex math

✓ Retrieval Augmented Generation (RAG) allows an LLM to augment its knowledge at

inference time by retrieving relevant information from external data sources.
✓ One of the earliest implementations of RAG (Lewis et al., 2020)

DeepLearning.AI (http://deeplearning.ai/ ) 39
ChatGPT (as an LLM chat model) has potentially changed how humans access information, which has been
implemented in the release of New Bing.

Bing chat is an example of a search-based LLM workflow*

* https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f 40
High Level Overview of integrating Enterprise Knowledge with LLM*

* Sachin Kulkarni, Generative AI with Enterprise Data

(https://medium.com/@Sachin.Kulkarni.NL/generative-ai-with-enterprise-data-3c81a8bffaf2) 41
10. Application of LLMs to embeddings
• LLMs can be used to provide embeddings for ML algorithms. Embedding models return a vector representation of a given
input that can be easily consumed by machine learning models and algorithms.

Embedding (numerical
representation of text
Text embeddings measure the relatedness of text strings.
Input text
LLM useful for other systems) Embeddings are commonly used for:
• Text Search
• Clustering (where text strings are grouped by similarity)
• Recommendations (where items with related text strings are
recommended)
• Classification, Topic clustering, Anomaly detection
• Preparing data to be fed into a machine learning model

42
11. OpenAI Embeddings API endpoint
▪ OpenAI has trained several embedding models with different dimensions and different capabilities.
OpenAI recommends using text-embedding-ada-002 model for creating text embeddings for nearly all use cases.

43
Application of LLMs to embeddings
Example of using OpenAI Embedding API in semantic search, question-answering, threat detection ...

Integration of OpenAI's Large Language Models with Pinecone

Source: https://docs.pinecone.io/docs/openai
44
The scope of an Industry 4.0 artificial intelligence (I4.0 AI) specialist (Denis Rothman, 2022)

Foundation models, although designed with an innovative architecture, are built on top of the history of AI. As a result, an artificial intelligence
specialist’s range of skills is stretching!
46
13. Generative AI project lifecycle
• The overall life cycle of a generative AI project involving LLMs

DeepLearning.AI (http://deeplearning.ai/ )

• Stage 1: Define the scope as accurately and narrowly as you can

• Stage 2: Decide whether to train your own model from scratch or work with an existing base model
• Stage 3: Adapt and align model (highly iterative)
• Stage 4: Application integration

47
Challenges and some future directions
• Transformer complexity:

With a sequence of length L:

=> Efficiency has become an important issue when training and making inference with long inputs.
=> Reduce the time complexity (originally to be quadratic costs) incurred by the standard self-
attention mechanism

• Emergent abilities: when and how they are obtained by LLMs are not yet clear.

48
Challenges and some future directions

• Catastrophic forgetting (challenge for neural networks => has a negative impact on LLMs)
• Task specialization: fine-tuning a LLM according to some specific tasks => can affect the general ability
of LLMs
• Alignment tuning with human values => alignment tax

• Existing prompting approaches:

• Involve considerable human efforts in the design of prompts => automatic generation of effective
prompts !
• Lack flexible task formatting methods for complex tasks requiring logic rules (e.g., numerical
computation)

• RLHF heavily relies on high-quality human feedback data from professional labelers
=> Difficult implementation in practice

49
References
● [1] M. Shanahan, Talking about large language models, CoRR, vol. abs/2212.03551, 2022.
● [2] T. B. Brown et al., Language models are few-shot learners,” in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020,
● [3] (A. Chowdhery et al., “Palm: Scaling language modeling with pathways,” CoRR, vol. abs/2204.02311, 2022.
● [4] R. Taylor et al., “Galactica: A large language model for science,” CoRR, vol. abs/2211.09085, 2022.
● [5] H. Touvron et al., “Scaling instruction-finetuned language models,” CoRR, vol. abs/2210.11416, 2022.
● Alto, V. 2023. Modern Generative AI with ChatGPT and OpenAI Models: Leverage the capabilities of OpenAI's LLM for productivity and innovation with GPT3 and GPT4. Packt Publishing.
● Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E. and Brynjolfsson, E., 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
● Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., et al., 2020. Language Models are Few-Shot Learners. ArXiv. /abs/2005.14165
● Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
● Dong, L., Xu, S. and Xu, B., 2018, April. Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5884-5888). IEEE.
● Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv
preprint arXiv:2010.11929.
● Glaese, A., McAleese, N., Trębacz, M., Aslanides, J., Firoiu, V., Ewalds, T., Rauh, M., Weidinger, L., Chadwick, M., Thacker, P. and Campbell-Gillingham, L., 2022. Improving alignment of dialogue agents via targeted human judgements. arXiv
preprint arXiv:2209.14375.
● Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S. and Shah, M., 2022. Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), pp.1-41.
● Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.T., Rocktäschel, T. and Riedel, S., 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information
Processing Systems, 33, pp.9459-9474.
● Li, W., Luo, H., Lin, Z., Zhang, C., Lu, Z. and Ye, D., 2023. A survey on transformers in reinforcement learning. arXiv preprint arXiv:2301.03044.
● Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C.L., Ma, J. and Fergus, R., 2021. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the
National Academy of Sciences, 118(15), p.e2016239118.
● Rothman, D. and Gulli, A., 2022. Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3. Packt Publishing Ltd.
● Schwaller, P., Laino, T., Gaudin, T., Bolgar, P., Hunter, C.A., Bekas, C. and Lee, A.A., 2019. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science, 5(9), pp.1572-1583.
● M. Shanahan, 2022. “Talking about large language models,” CoRR, vol. abs/2212.03551, 2022.
● A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, et al., 2022. “Palm: Scaling language modeling with pathways,” CoRR, vol. abs/2204.02311, 2022.
● H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, et al., 2023. “Llama: Open and efficient foundation language models,” CoRR, 2023.
● Tunstall, L., Von Werra, L. and Wolf, T., 2022. Natural language processing with transformers. " O'Reilly Media, Inc."
● R. Taylor, M. Kardas, G. Cucurull, T. Scialom, et al., 2022. “Galactica: A large language model for science,” CoRR, vol. abs/2211.09085, 2022.
● Takeshi Kojima et al., 2023. Large Language Models are Zero-Shot Reasoners. https://arxiv.org/pdf/2205.11916.pdf
● Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z. and Du, Y., 2023. A survey of large language models. arXiv preprint arXiv:2303.18223.
● Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017. Attention is all you need. Advances in neural information processing systems, 30.
● Wei, Jason et al., 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. https://arxiv.org/abs/2201.11903

Webgraphy:

• DeepLearning.AI (http://deeplearning.ai/ )

• Huggingface (https://huggingface.co/docs/transformers/index)
50

(EARLY RELEASE) Quick Start Guide To Large Language Models Strategies and Best Practices For Using ChatGPT and Other LLMs (Sinan Ozdemir) (Z-Library)
100% (13)
(EARLY RELEASE) Quick Start Guide To Large Language Models Strategies and Best Practices For Using ChatGPT and Other LLMs (Sinan Ozdemir) (Z-Library)
132 pages
Generative AI Usecases - A Comprehensive Guide - Dummies
100% (1)
Generative AI Usecases - A Comprehensive Guide - Dummies
19 pages
Sub Netting Exercises Solutions
No ratings yet
Sub Netting Exercises Solutions
4 pages
MLIR Tutorial
No ratings yet
MLIR Tutorial
78 pages
7469_Magicoder_Empowering_Code
No ratings yet
7469_Magicoder_Empowering_Code
26 pages
1. LLMs for Me - Introduction LLMs & Generative Text
No ratings yet
1. LLMs for Me - Introduction LLMs & Generative Text
38 pages
Code Generation With LLMs
No ratings yet
Code Generation With LLMs
59 pages
Code Llama: Open Foundation Models For Code
No ratings yet
Code Llama: Open Foundation Models For Code
48 pages
Instant download [EARLY RELEASE] Quick Start Guide to Large Language Models: Strategies and Best Practices for using ChatGPT and Other LLMs Sinan Ozdemir pdf all chapter
100% (4)
Instant download [EARLY RELEASE] Quick Start Guide to Large Language Models: Strategies and Best Practices for using ChatGPT and Other LLMs Sinan Ozdemir pdf all chapter
66 pages
Instant ebooks textbook [EARLY RELEASE] Quick Start Guide to Large Language Models: Strategies and Best Practices for using ChatGPT and Other LLMs Sinan Ozdemir download all chapters
100% (3)
Instant ebooks textbook [EARLY RELEASE] Quick Start Guide to Large Language Models: Strategies and Best Practices for using ChatGPT and Other LLMs Sinan Ozdemir download all chapters
50 pages
LLM From Scratch
No ratings yet
LLM From Scratch
27 pages
LLaMA Open and Efficient Foundation Language Models
No ratings yet
LLaMA Open and Efficient Foundation Language Models
27 pages
Go-Tuning: Improving Zero-Shot Learning Abilities of Smaller Language Models
No ratings yet
Go-Tuning: Improving Zero-Shot Learning Abilities of Smaller Language Models
9 pages
(2) Basic AI & ML Concepts Explained _ LinkedIn
No ratings yet
(2) Basic AI & ML Concepts Explained _ LinkedIn
10 pages
4-HC24.PrimisAI.Hans_Bouwmeester.v4
No ratings yet
4-HC24.PrimisAI.Hans_Bouwmeester.v4
29 pages
2024.eacl-long.30
No ratings yet
2024.eacl-long.30
34 pages
Large Language Models Johns Hopkins University
No ratings yet
Large Language Models Johns Hopkins University
54 pages
Generative AI and LLMS
No ratings yet
Generative AI and LLMS
34 pages
Lecture 15 - Foundation Models - CLIP and GPT
No ratings yet
Lecture 15 - Foundation Models - CLIP and GPT
45 pages
[10 December 2024, NeurIPS] Tutorial on Language Modeling
No ratings yet
[10 December 2024, NeurIPS] Tutorial on Language Modeling
255 pages
Generative AI Roadmap
No ratings yet
Generative AI Roadmap
36 pages
Retro Model Deep-Mind
No ratings yet
Retro Model Deep-Mind
43 pages
Train 400x faster Static Embedding Models with Sentence Transformers
No ratings yet
Train 400x faster Static Embedding Models with Sentence Transformers
47 pages
Building LLaMA 3 From Scratch With Python
No ratings yet
Building LLaMA 3 From Scratch With Python
34 pages
2023.ijcnlp Short.10
No ratings yet
2023.ijcnlp Short.10
10 pages
LLM Quantization Aware Training
No ratings yet
LLM Quantization Aware Training
15 pages
Ai 1
No ratings yet
Ai 1
22 pages
19 20-gpt-3 Prompts
No ratings yet
19 20-gpt-3 Prompts
68 pages
2023.acl Industry.34
No ratings yet
2023.acl Industry.34
14 pages
Sinan Ozdemir Quick Start Guide To Large Language Models Strategies
No ratings yet
Sinan Ozdemir Quick Start Guide To Large Language Models Strategies
285 pages
Demystifying LLMs
No ratings yet
Demystifying LLMs
53 pages
1-s2.0-S2949719124000682-main
No ratings yet
1-s2.0-S2949719124000682-main
13 pages
Few shottrainingLLMsforproject Specificcode Summarization
No ratings yet
Few shottrainingLLMsforproject Specificcode Summarization
5 pages
Generative Ai Terminology
100% (1)
Generative Ai Terminology
26 pages
List Items One by One: A New Data Source and Learning Paradigm For Multimodal Llms
No ratings yet
List Items One by One: A New Data Source and Learning Paradigm For Multimodal Llms
21 pages
The Generative AI List of Lists
No ratings yet
The Generative AI List of Lists
17 pages
Lecture 04 - Pre-trained Language Models (PLMs)
No ratings yet
Lecture 04 - Pre-trained Language Models (PLMs)
36 pages
Jay Alammar - Visualizing Machine Learning One Concept at A Time.
No ratings yet
Jay Alammar - Visualizing Machine Learning One Concept at A Time.
15 pages
Open Sees Workshop
100% (1)
Open Sees Workshop
305 pages
Slides
No ratings yet
Slides
137 pages
What Language Model Architecture and Pretraining Objective Work Best For Zero-Shot Generalization
No ratings yet
What Language Model Architecture and Pretraining Objective Work Best For Zero-Shot Generalization
26 pages
Llm Oakland2024
No ratings yet
Llm Oakland2024
19 pages
Repository Level Prompt Generation For LLMs
No ratings yet
Repository Level Prompt Generation For LLMs
21 pages
The Novice LLM Training Guide
No ratings yet
The Novice LLM Training Guide
13 pages
Dokumen - Pub Quick Start Guide To Large Language Models Strategies and Best Practices For Using Chatgpt and Other Llms 9780138199425
No ratings yet
Dokumen - Pub Quick Start Guide To Large Language Models Strategies and Best Practices For Using Chatgpt and Other Llms 9780138199425
325 pages
Sujets PFE NYUAD
No ratings yet
Sujets PFE NYUAD
3 pages
Question Bank's
No ratings yet
Question Bank's
89 pages
Slides For 'Large Language Model: From Theory To Implementations', Chapter 1
No ratings yet
Slides For 'Large Language Model: From Theory To Implementations', Chapter 1
40 pages
AI Trends of May 2023 You Need To Know by Gonzalo Recio Medium
No ratings yet
AI Trends of May 2023 You Need To Know by Gonzalo Recio Medium
1 page
pythonppt-230111072927-1c7002a5
No ratings yet
pythonppt-230111072927-1c7002a5
20 pages
Intro to Large Language Models
No ratings yet
Intro to Large Language Models
45 pages
Research Paper Llama
No ratings yet
Research Paper Llama
27 pages
Java Notes Unit1&2
No ratings yet
Java Notes Unit1&2
62 pages
GEN AI
No ratings yet
GEN AI
17 pages
Harvard CS197 Lecture 4 Notes
No ratings yet
Harvard CS197 Lecture 4 Notes
15 pages
LangChain
No ratings yet
LangChain
7 pages
Download Full [EARLY RELEASE] Quick Start Guide to Large Language Models: Strategies and Best Practices for using ChatGPT and Other LLMs Sinan Ozdemir PDF All Chapters
No ratings yet
Download Full [EARLY RELEASE] Quick Start Guide to Large Language Models: Strategies and Best Practices for using ChatGPT and Other LLMs Sinan Ozdemir PDF All Chapters
50 pages
HW LM
No ratings yet
HW LM
36 pages
UER: An Open-Source Toolkit For Pre-Training Models
No ratings yet
UER: An Open-Source Toolkit For Pre-Training Models
6 pages
PWP Model Answer Summer 2022
100% (9)
PWP Model Answer Summer 2022
23 pages
Python Performance Engineering: Strategies and Patterns for Optimized Code
From Everand
Python Performance Engineering: Strategies and Patterns for Optimized Code
Aarav Joshi
No ratings yet
Paint Characterisation
100% (1)
Paint Characterisation
81 pages
The Impact of The Open' Workspace On Human Collaboration
No ratings yet
The Impact of The Open' Workspace On Human Collaboration
8 pages
2014 NYC Building Code BC1905 Concrete Quality, Mixing and Placing
No ratings yet
2014 NYC Building Code BC1905 Concrete Quality, Mixing and Placing
12 pages
AISC - Box Girder Design
No ratings yet
AISC - Box Girder Design
4 pages
PIG Interview Qusetions
No ratings yet
PIG Interview Qusetions
15 pages
Tutorial Chapter 4
No ratings yet
Tutorial Chapter 4
3 pages
Learn French For Beginners Including French Grammar French Short Stories and 1000 French Phrases PDF Eia DR Notes
No ratings yet
Learn French For Beginners Including French Grammar French Short Stories and 1000 French Phrases PDF Eia DR Notes
396 pages
Fat Soluble Vitamins - HPLC
No ratings yet
Fat Soluble Vitamins - HPLC
6 pages
ACI - Mix - Design-Very Good Step by Step
No ratings yet
ACI - Mix - Design-Very Good Step by Step
8 pages
Cusp
No ratings yet
Cusp
3 pages
Asterisk Free PBX Configuration
No ratings yet
Asterisk Free PBX Configuration
3 pages
1c-Parts of Speech
No ratings yet
1c-Parts of Speech
7 pages
2080 rm001 - en e
No ratings yet
2080 rm001 - en e
722 pages
844g90vta SX PDF
No ratings yet
844g90vta SX PDF
3 pages
Corrosion Mic &mitigation
No ratings yet
Corrosion Mic &mitigation
6 pages
Business Analytics 2nd Edition Evans Test Bank - Quickly Download For The Best Reading Experience
No ratings yet
Business Analytics 2nd Edition Evans Test Bank - Quickly Download For The Best Reading Experience
45 pages
En 13476-1 (E) (2007) PDF
100% (1)
En 13476-1 (E) (2007) PDF
29 pages
Mechanical: Operations
No ratings yet
Mechanical: Operations
222 pages
Amp PDF
No ratings yet
Amp PDF
48 pages
System Development Life Cycle: Learning Objectives
No ratings yet
System Development Life Cycle: Learning Objectives
34 pages
On The Fuel Spray Applications of Multi-Phase Eulerian CFD Techni
No ratings yet
On The Fuel Spray Applications of Multi-Phase Eulerian CFD Techni
104 pages
Tugas Termodinamika Teknik Ii
No ratings yet
Tugas Termodinamika Teknik Ii
4 pages
PDCCH Resource Allocation & Aggregation.
No ratings yet
PDCCH Resource Allocation & Aggregation.
1 page
(Civil Engineering Hydraulics) Robert H. J. Sellin B.SC., Ph.D. (Auth.) - Flow in Channels-Macmillan Education UK (1969) PDF
No ratings yet
(Civil Engineering Hydraulics) Robert H. J. Sellin B.SC., Ph.D. (Auth.) - Flow in Channels-Macmillan Education UK (1969) PDF
159 pages
Second Test Solutions
No ratings yet
Second Test Solutions
12 pages
h2s Removal Absorbents BR
No ratings yet
h2s Removal Absorbents BR
16 pages
Features in Python: 1. Easy To Code
No ratings yet
Features in Python: 1. Easy To Code
4 pages
Paper-IV at HTML
No ratings yet
Paper-IV at HTML
5 pages