
Large Language Models: Overview & OpenAI API
NLP Day, UM6P, November 2023

El Habib NFAOUI, Ph.D.


Full Professor
Department of Computer Science,
Faculty of Sciences Dhar El Mahraz,
Sidi Mohamed Ben Abdellah University, Fez
[email protected]
Outline
• LLMs:
• Language modeling objective, architectures, decoding strategies
• Types of LLMs, in-context learning
• Fine-tuning
• LLM use cases, LLM-powered applications
• Generative AI project life cycle
• OpenAI API:
• Chat Completions API endpoint
• Fine-tuning API endpoint
• Embeddings API endpoint

2
1. Definition of LLMs

[Diagram: Prompt (input) → LLM → Completion (output). The original Transformer model is described in "Attention Is All You Need" (Vaswani et al., 2017).]

"Typically, large language models (LLMs) refer to Transformer language models that contain hundreds of billions (or more) of parameters*, which are trained on massive text data [1], such as GPT-3 [2], PaLM [3], Galactica [4], and LLaMA [5]. LLMs exhibit strong capacities to understand natural language and solve complex tasks (via text generation)." (Wayne Xin Zhao et al., 2023)

Model parameter sizes:
- GPT-3 (2020): 175B
- PaLM (2022): 540B

Pre-training data scale:
- GPT-3 (2020): 300B tokens
- PaLM (2022): 780B tokens

Example prompts submitted to GPT-3.5:

Prompt (NLP spell-checking task):
Proofread the following text and rewrite the corrected version:
"He did not did its homework."

Prompt (NLP translation task):
Translate the following text into Arabic and French:
"The poor and middle class work for money. The rich have money to work for them."
Timeline of existing large language models (larger than 10B)

(Wayne Xin Zhao et al., 2023)


4
Statistics of large language models (larger than 10B) (Wayne Xin Zhao et al., 2023)

Scaling Laws for LLMs:
▪ Chinchilla scaling law (Google DeepMind team, Hoffmann et al., 2022)
▪ OpenAI team (Kaplan et al., 2020)
5
Statistics of large language models (larger than 10B) (Wayne Xin Zhao et al., 2023)

6
2. Major development stages of Language Modeling Approach
• Statistical language models:
o Based on statistical learning methods (Markov assumption)

• Neural language models:


o Characterize the probability of word sequences by neural networks, e.g., recurrent neural networks (RNNs).
o Word2vec (shallow NN)

• Pre-trained language models:


o ELMo (BiLSTM)
o BERT (based on Transformer architecture)

• Large language models* (a term coined for large-sized pre-trained language models)

o Scaling pre-trained language models (scaling model/data sizes) often leads to improved model capacity on downstream tasks (following the scaling laws).

✓ LLMs show surprising abilities (called emergent abilities) in solving a series of complex tasks: typically, in-context learning (present in GPT-3 but not observed in small-scale language models such as BERT and GPT-2), instruction following, and step-by-step reasoning (a.k.a. chain-of-thought).

* Note that an LLM is not necessarily more capable than a smaller pre-trained language model, and emergent abilities may not occur in some LLMs.
7
3. Text Generation:
General workflow of an LLM predicting the next word (autoregressive model)

The way these models actually work is that after each token is produced, that token is added to the sequence of inputs, and
that new sequence becomes the input to the model in its next step. This is an idea called “auto-regression”.

Alammar, J (2019). The Illustrated GPT-2 [Blog post].


https://jalammar.github.io/illustrated-gpt2/

8
3. Text Generation:
General workflow of an LLM predicting the next word (autoregressive model)

Example: completing the sentence "Generative Artificial Intelligence (AI) refers to a ..." one token at a time. At each step, the LLM assigns a probability to every word in the vocabulary, and the most probable token is selected:

Step | Input at each step                                    | Candidate next tokens (probability)                                                                             | Output
1    | Generative Artificial Intelligence (AI) refers to a   | subset (0.075), class (0.060), kind (0.045), set (0.030), models (0.015), coherent (0.011), politics (0.005)    | subset
2    | ... refers to a subset                                 | of (0.082), related (0.070), and (0.063), text (0.009), different (0.002)                                       | of
3    | ... refers to a subset of                              | AI (0.080), techniques (0.070), book (0.003), and (0.002)                                                       | AI
4    | ... refers to a subset of AI                           | models (0.091), algorithms (0.075), and (0.015)                                                                 | models
9
The LLM is focused on generating the next token given the sequence of tokens so far. The model does this in a loop, appending the predicted token to the input sequence; in this way it generates text one token at a time. LLMs are an example of generative AI.
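A minimal sketch of this auto-regressive loop, assuming greedy selection of the most probable token; `toy_distribution` is a hypothetical stand-in for a real LLM forward pass:

# Minimal sketch of auto-regressive (greedy) text generation.
def generate(prompt, next_token_distribution, max_new_tokens=8, stop_token="<eos>"):
    text = prompt
    for _ in range(max_new_tokens):
        probs = next_token_distribution(text)      # probability for every candidate next token
        token = max(probs, key=probs.get)          # greedy: pick the most likely token
        if token == stop_token:
            break
        text += " " + token                        # append the prediction and feed it back in
    return text

def toy_distribution(text):
    # hypothetical stand-in for a model: a lookup keyed on the last word of the text
    table = {"a": {"neural": 0.6, "simple": 0.3},
             "neural": {"network": 0.9, "model": 0.1},
             "network": {"<eos>": 0.8, "that": 0.2}}
    return table.get(text.split()[-1], {"<eos>": 1.0})

print(generate("A transformer model is a", toy_distribution))
# -> "A transformer model is a neural network"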
3. Text Generation

[Figure: given the input "A transformer model is a", the LLM produces a score (logit) for every candidate next token, e.g., "neural".]

The model first generates logits for each possible output token. Those logits are then passed to a softmax function to produce probabilities for each possible output, giving a probability distribution over the vocabulary. The softmax equation for the probability of a token is:

P(token_k | context) = softmax(logit_k) = exp(logit_k) / Σ_j exp(logit_j)

Where:
- P(token_k | context) is the probability of token_k given the context from the previous tokens (token_1 to token_k-1)
- logit_k is the raw output of the neural network for token_k
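As a quick illustration, here is the softmax computed directly from raw logits in plain Python (toy logit values, not taken from any particular model):

import math

logits = {"neural": 2.1, "model": 1.3, "banana": -0.5}        # toy logits for three candidate tokens

m = max(logits.values())                                       # subtract the max logit for numerical stability
exps = {tok: math.exp(l - m) for tok, l in logits.items()}
total = sum(exps.values())
probs = {tok: e / total for tok, e in exps.items()}            # probabilities sum to 1

print(probs)   # roughly {'neural': 0.66, 'model': 0.29, 'banana': 0.05}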

10
4. Decoding Strategy
How to select the output tokens (a.k.a. decoding strategy)?

✓ Greedy search:
A basic decoding method that predicts the most likely token at each step, conditioned on the previously generated tokens; formally, at step t the selected token is x_t = argmax_x P(x | x_<t).

[Figure: for the input "A transformer model is a", greedy search selects the highest-probability next token, e.g., "neural".]
4. Decoding Strategy
✓ Random sampling (sampling-based methods):
• Temperature sampling:
To modulate the randomness of sampling, a practical method is to adjust the temperature coefficient of the softmax function when computing the probability of the j-th token over the vocabulary:

p_j = exp(l_j / t) / Σ_j' exp(l_j' / t)

where l_j is the logit of the j-th word (the sum in the denominator runs over all words j' in the vocabulary) and t is the temperature coefficient. A low temperature yields a narrow (peaked) distribution of probabilities, while a higher temperature yields a broader (flatter) one. (Figure: cohere.ai)

Temperature is a parameter that you can access in LLMs which essentially guides how random the model's behaviour is, which means that the temperature influences the model's creativity.
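A minimal sketch of temperature sampling over toy logits (assumed values): a lower t makes the choice nearly greedy, a higher t makes it more random.

import math, random

def temperature_sample(logits, t=1.0):
    scaled = {tok: l / t for tok, l in logits.items()}          # divide logits by the temperature
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # sample one token according to the temperature-adjusted probabilities
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

logits = {"neural": 2.1, "model": 1.3, "banana": -0.5}
print(temperature_sample(logits, t=0.2))   # almost always "neural" (nearly greedy)
print(temperature_sample(logits, t=2.0))   # flatter distribution: other tokens appear more often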
12
4. Decoding Strategy
• Top-k sampling:
Top-k sampling directly truncates the tokens with lower probability and samples only from the tokens with the k highest probabilities. For example, k = 3 keeps only the three most probable tokens. (Figure: cohere.com)

- Top-k sampling does not consider the overall probability distribution, so a constant value of k may not be suitable for different contexts.
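A minimal sketch of top-k filtering followed by sampling, reusing the candidate probabilities from the step-2 example earlier (only their relative values matter, since the survivors are renormalised):

import random

def top_k_sample(probs, k=3):
    # keep only the k most probable tokens
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    # renormalise and sample among the survivors
    total = sum(weights)
    return random.choices(tokens, weights=[w / total for w in weights], k=1)[0]

probs = {"of": 0.082, "related": 0.070, "and": 0.063, "text": 0.009, "different": 0.002}
print(top_k_sample(probs, k=3))   # sampled from {"of", "related", "and"} only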
4. Decoding Strategy
• Top-p sampling:
Top-p sampling (a.k.a. nucleus sampling) samples from the smallest set of tokens whose cumulative probability is above (or equal to) p.

cohere.com

Generally, it is recommended to alter top-p or temperature but not both.
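A minimal sketch of top-p (nucleus) filtering with a toy distribution (assumed values): tokens are kept in order of probability until their cumulative mass reaches p, then one is sampled from that nucleus.

import random

def top_p_sample(probs, p=0.9):
    # sort tokens by probability and keep the smallest set whose cumulative mass reaches p
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    tokens, weights = zip(*kept)
    total = sum(weights)
    return random.choices(tokens, weights=[w / total for w in weights], k=1)[0]

probs = {"of": 0.363, "related": 0.310, "and": 0.279, "text": 0.040, "different": 0.008}
print(top_p_sample(probs, p=0.9))   # "text" and "different" fall outside the nucleus here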

14
4.1 Practical Settings (LLM parameters: options to control the outputs of text generation)
LLM parameters: temperature, top_p, ... (options to control the outputs of text generation)

[Diagram: Prompt (input) → LLM → Completion (output)]

• While the model decides what the most probable output is, there are key parameters you can tune (tweak) to influence those probabilities and get the best outputs for your LLM projects.

Category                                                                  | Example OpenAI API parameters (try experimenting on the playground)
Let the model know when to stop: number of tokens, stop words             | max_tokens=1728; stop=["."]
Predictability vs. creativity (decoding strategies): temperature, top_p   | temperature=0.19; top_p=1
Control the repetition degree of the generation: repetition penalty       | frequency_penalty=0.26; presence_penalty=0.29; logit_bias
15
5. OpenAI Text generation models
• OpenAI's text generation models (often called generative pre-trained transformers or large language
models) have been trained to understand natural language, code, and images. The models provide text
outputs in response to their inputs.

18
5.1 OpenAI Chat Completions API endpoint
API request (creates a model response for the given prompt / chat conversation):
POST https://api.openai.com/v1/chat/completions

[Diagram: Application ↔ OpenAI Chat Completions API]

API response: a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

Typically, a conversation is formatted as follows:
- system message
- user messages
- assistant messages

Example of a Chat Completions request (try experimenting on the playground):

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. You will be provided with a general knowledge question and your task is to provide a concise answer."
        },
        {
            "role": "user",
            "content": "Who won the world series in 2020?"
        }
    ],
    temperature=1,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

Chat Completions response format. An example Chat Completions API response looks as follows:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      }
    }
  ],
  "created": 1677664795,
  "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 13,
    "prompt_tokens": 28,
    "total_tokens": 41
  }
}
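Once the response object above is returned by the OpenAI Python SDK (v1.x), the assistant's reply can be read from the first choice:

print(response.choices[0].message.content)
# "The Los Angeles Dodgers won the World Series in 2020."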
6. Pre-training LLMs (Unsupervised pre-training Task):
Causal language modeling task (autoregressive models)
• Most LLMs are developed based on the decoder-only architecture, such as GPT-3, BLOOM, Gopher, and OPT.
• During the pre-training stage, LLMs are trained using the language modeling objective on a large-scale
corpus.
• Language Modeling task (i.e., the conventional LM) is the most commonly used objective to pre-train
decoder-only LLMs (e.g., GPT-3).
Given a sequence of tokens x = (x_1, x_2, ..., x_n), the LM task aims to autoregressively predict the target tokens based on the preceding tokens in the sequence. A general training objective is to maximize the following likelihood:

L_LM(x) = Σ_{i=1}^{n} log P(x_i | x_<i)
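As a toy illustration of this objective, the log-likelihood of a short sequence can be computed from per-position next-token probabilities (made-up values, not produced by a real model):

import math

# p[i] = probability the model assigns to the true token x_i given x_1 .. x_{i-1} (toy values)
next_token_probs = [0.20, 0.45, 0.60, 0.90]

log_likelihood = sum(math.log(p) for p in next_token_probs)    # quantity the training objective maximises
print(log_likelihood)                                          # ≈ -3.02
print(-log_likelihood / len(next_token_probs))                 # average negative log-likelihood (the training loss)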

✓ A large language model, after pretraining (using a language modeling task), is able to provide a global
understanding of the language it is trained on.
✓ Large language models can perform hundreds of NLP tasks they were not trained for. 21
LLMs can perform hundreds of NLP tasks they were not trained for.

Example: brand monitoring, i.e., keeping a close eye on your brand's reputation.

In this example, the LLM performs 3 tasks at once:
✓ Sentiment analysis
✓ Emotion recognition
✓ Entity extraction

22
Ratios of various data sources in the pre-training data for existing LLMs

(Wayne Xin Zhao et al., 2023)

23
7. Types of Large Language Models
There are three types of LLMs:
▪ Base LLMs (pre-trained model checkpoints)
Model checkpoints obtained right after pre-training.
✓ A base LLM acquires the general abilities for solving
various tasks.
✓ It lacks the consideration of human values or
preferences.

▪ Fine-tuned LLMs (adapted pre-trained LLMs: instruction- or alignment-fine-tuned model checkpoints, also called chat models)
- Example: representative alignment criteria (i.e., helpful, honest, and harmless)

▪ Specialized LLMs (model checkpoints adapted for a specific task or domain: healthcare, finance, legal, etc.)

24
8. LLMs utilization:
How to work with large language models?

25
8.1. Prompt, completion, and prompt engineering
Prompt (input) → LLM → Completion (output)

Prompts involve instructions and context passed to a language model to achieve a desired task. The space or memory that is available to the prompt is called the context window.

A completion refers to the text that is generated and returned as a result of the provided prompt/input.

Prompt (NLP spell-checking task):
Proofread the following text and rewrite the corrected version:
"He did not did its homework."
(Completion generated by GPT-3.5)

Prompt (NLP translation task):
Translate the following text into Arabic and French:
"The poor and middle class work for money. The rich have money to work for them."
(Completion generated by GPT-3.5)
26

Prompt engineering:

Prompt engineering is a useful skill for AI engineers and researchers to improve and efficiently use language models.
OpenAI Guide: https://platform.openai.com/docs/guides/prompt-engineering
26
8.2. Different kinds of prompts

✓ Large language models can be prompted to produce


output in a few ways :
• Zero-shot prompt
• One-shot prompt
• Few-shot prompt
• CoT prompt
• Planning

✓ Fine-tuning LLM for a specific task (supervised fine-tuning)

In-context learning (zero-shot prompt, one-shot prompt, few-shot


prompt):
A typical prompting method which formulates the task description
and/or demonstrations in the form of natural language text.

Zero-shot, one-shot, and few-shot prompting, contrasted with traditional fine-tuning (Brown et al., 2020: GPT-3 original paper)
Zero-shot prompt

Prompt:
Classify the sentiment of the following tweet as positive, neutral, or negative.
Tweet: I loved the new Samsung Galaxy S23 Ultra

Completion (LLM):
Sentiment: positive
28
Zero-shot prompt
- Example: Zero-shot prompt given to GPT-3.5-turbo
Prompt:
Determine whether each item in the following list of emotions is conveyed in the text below, which is delimited with triple backticks. Give your answer as a list with labels and 0 or 1 for each label.
List of emotions: Anger, Anticipation, Disgust, Fear, Joy, Love, Optimism, Pessimism, Sadness, Surprise, Trust, neutral
Text: ```I am filled with jealous rage, I am feeling quite sad, sorry for myself but I will snap out of it soon.```

Completion: (GPT-3.5-turbo)

- OpenAI playground (GPT-3.5-turbo)

29
One-shot prompt

Prompt (task description + one example):
Classify the sentiment of the following tweet as positive, neutral, or negative.
Tweet: I loved the new Samsung Galaxy S23 Ultra
Sentiment: Positive

Tweet: The design of this laptop is bad.
Sentiment:

Completion (LLM):
Sentiment: Negative
30
Few-shot prompt

Prompt:
Classify the sentiment of the following tweet as positive, neutral, or negative.
Tweet: I loved the new Samsung Galaxy S23 Ultra
Sentiment: Positive

Tweet: The design of this laptop is bad.
Sentiment: Negative

Tweet: The camera of this phone is good.
Sentiment:

Completion (LLM):
Sentiment: Positive

Typically, 10 to 100 shots for GPT-3 (Brown et al., 2020: GPT-3 original paper)
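A sketch of sending this few-shot prompt through the Chat Completions API, assuming the OpenAI Python SDK v1.x; the model name and parameter values are illustrative:

from openai import OpenAI
client = OpenAI()

few_shot_prompt = """Classify the sentiment of the following tweet as positive, neutral, or negative.
Tweet: I loved the new Samsung Galaxy S23 Ultra
Sentiment: Positive

Tweet: The design of this laptop is bad.
Sentiment: Negative

Tweet: The camera of this phone is good.
Sentiment:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,          # keep the classification output as deterministic as possible
    max_tokens=5,
)
print(response.choices[0].message.content)   # expected completion: "Positive"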

31
Chain of thought (CoT) prompting

Chain-of-thought prompting enables large language models to tackle complex arithmetic, commonsense, and
symbolic reasoning tasks. Chain-of-thought reasoning processes are highlighted. (Wei, Jason et al., 2023)

✓ The core concept behind CoT is that by presenting the LLM with few-shot examples that include reasoning, it will subsequently incorporate the reasoning process into its responses when addressing prompts.
32
Chain of thought (CoT) prompting: Zero-shot CoT
Instead of providing examples with reasoning in the prompt, simply append the phrase "Let's think step by step" to the task.

(Takeshi Kojima et al., 2023)
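A minimal sketch of zero-shot CoT with the OpenAI Python SDK (v1.x); the question and model name are illustrative:

from openai import OpenAI
client = OpenAI()

question = ("A juggler has 16 balls. Half of the balls are golf balls, "
            "and half of the golf balls are blue. How many blue golf balls are there?")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    # append the zero-shot CoT trigger phrase after the task
    messages=[{"role": "user", "content": question + "\nLet's think step by step."}],
)
print(response.choices[0].message.content)   # the model spells out intermediate steps before the answer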


Fine-tuning LLM for a specific task (supervised fine-tuning)

• Fine-tuning LLMs can make them better for specific applications.


• Fine-tuning involves updating the weights of a pre-trained model by training on a supervised
dataset specific to the desired task.
• Once a model has been fine-tuned, you won't need to provide as many examples in the prompt
(few-shot).

36
8.3 OpenAI Fine-tuning API endpoint

Fine-tuning workflow using the OpenAI Fine-tuning endpoint:
• Prepare training and validation data (formatted as a JSONL document).
• Upload the training and validation data files (use the Files endpoint).
• Fine-tune the selected model.
• Analyse the results.

Once a model finishes the fine-tuning process, it is available to be used in production right away.

Code to start a fine-tuning job using the OpenAI SDK:
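A sketch of this workflow with the OpenAI Python SDK (v1.x); the file name and base model are placeholders:

from openai import OpenAI
client = OpenAI()

# 1. Upload the prepared JSONL training data via the Files endpoint
training_file = client.files.create(
    file=open("train_data.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start the fine-tuning job on the selected base model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# 3. Check the job status; once it reports "succeeded", the fine-tuned model id can be used
print(client.fine_tuning.jobs.retrieve(job.id).status)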

37
9. LLM use cases, tasks, real-world applications
Large language models have numerous applications in various fields, including but not limited to:

• Language translation: LLMs can be used to translate text from one language to another.
• Question answering: LLMs can be used to answer questions based on a given context.
• Text summarization: Large language models can be used to generate summaries of text documents.
• Content creation (generation): LLMs can be used to generate content for various purposes, such as
marketing and advertising.
• Code generation
• Sentiment analysis: LLMs can be used to analyze the sentiment of text
• Chatbots
• Summarization, Essay writing
• Entity extraction
• Etc.

✓ None of these capabilities is explicitly programmed in; they all emerge as a result of training with the language modeling task.
✓ A large language model, after pretraining (using a language modeling task), is able to provide a
global understanding of the language it is trained on.
✓ Large language models can perform hundreds of NLP tasks they were not trained for.
38
LLM-powered applications
Retrieval Augmented Generation (RAG)
✓ Some of the limitations of large language models are:
▪ The internal knowledge held by a model cuts off at the moment of pretraining,
▪ Hallucination
▪ Struggling with complex math

✓ Retrieval Augmented Generation (RAG) allows an LLM to augment its knowledge at


inference time by retrieving relevant information from external data sources.
✓ One of the earliest implementations of RAG (Lewis et al., 2020)
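A minimal RAG sketch under assumptions: documents are embedded with the OpenAI Embeddings endpoint, the most similar chunk is retrieved by cosine similarity, and it is injected into the prompt (no vector database; the documents, question, and model names are illustrative):

from openai import OpenAI
client = OpenAI()

documents = ["Policy A: refunds are accepted within 30 days.",
             "Policy B: shipping is free above 50 USD."]        # toy external knowledge base

def embed(text):
    return client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

question = "Can I return an item after two weeks?"
doc_vectors = [embed(d) for d in documents]
q_vector = embed(question)
best_doc = max(zip(documents, doc_vectors), key=lambda dv: cosine(q_vector, dv[1]))[0]   # retrieval step

# augment the prompt with the retrieved context before generation
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)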

DeepLearning.AI (http://deeplearning.ai/ ) 39
ChatGPT (as an LLM-based chat model) has potentially changed the way humans access information, as reflected in the release of the New Bing.

Bing chat is an example of a search-based LLM workflow*

* https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f 40
High Level Overview of integrating Enterprise Knowledge with LLM*

* Sachin Kulkarni, Generative AI with Enterprise Data


(https://medium.com/@Sachin.Kulkarni.NL/generative-ai-with-enterprise-data-3c81a8bffaf2) 41
10. Application of LLMs to embeddings
• LLMs can be used to provide embeddings for ML algorithms. Embedding models return a vector representation of a given
input that can be easily consumed by machine learning models and algorithms.

[Diagram: Input text → LLM → Embedding (a numerical representation of the text, useful for other systems)]

Text embeddings measure the relatedness of text strings. Embeddings are commonly used for:
• Text search
• Clustering (where text strings are grouped by similarity)
• Recommendations (where items with related text strings are recommended)
• Classification, topic clustering, anomaly detection
• Preparing data to be fed into a machine learning model

42
11. OpenAI Embeddings API endpoint
▪ OpenAI has trained several embedding models with different dimensions and different capabilities.
OpenAI recommends using text-embedding-ada-002 model for creating text embeddings for nearly all use cases.
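A sketch of creating an embedding with this model via the Embeddings endpoint (OpenAI Python SDK v1.x; the input text is illustrative):

from openai import OpenAI
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Large language models can provide vector representations of text.",
)

vector = response.data[0].embedding      # a list of floats (1536 dimensions for this model)
print(len(vector), vector[:5])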

43
Application of LLMs to embeddings
Example of using OpenAI Embedding API in semantic search, question-answering, threat detection ...

Integration of OpenAI's Large Language Models with Pinecone

Source: https://docs.pinecone.io/docs/openai
44
The scope of an Industry 4.0 artificial intelligence (I4.0 AI) specialist (Denis Rothman, 2022)

Foundation models, although designed with an innovative architecture, are built on top of the history of AI. As a result, an artificial intelligence
specialist’s range of skills is stretching!
46
13. Generative AI project lifecycle
• The overall life cycle of a generative AI project involving LLMs

DeepLearning.AI (http://deeplearning.ai/ )

• Stage 1: Define the scope as accurately and narrowly as you can


• Stage 2: Decide whether to train your own model from scratch or work with an existing base model
• Stage 3: Adapt and align model (highly iterative)
• Stage 4: Application integration

47
Challenges and some future directions
• Transformer complexity:

With a sequence of length L, the standard self-attention mechanism has quadratic cost, O(L^2), in the sequence length.

=> Efficiency has become an important issue when training and making inference with long inputs.
=> Reduce the time complexity (originally quadratic) incurred by the standard self-attention mechanism.

• Emergent abilities: when and how they are obtained by LLMs are not yet clear.

48
Challenges and some future directions

• Catastrophic forgetting (a long-standing challenge for neural networks => also has a negative impact on LLMs)
• Task specialization: fine-tuning an LLM for some specific tasks => can affect the general ability of LLMs
• Alignment tuning with human values => alignment tax

• Existing prompting approaches:


• Involve considerable human efforts in the design of prompts => automatic generation of effective
prompts !
• Lack flexible task formatting methods for complex tasks requiring logic rules (e.g., numerical
computation)

• RLHF heavily relies on high-quality human feedback data from professional labelers
=> Difficult implementation in practice

49
References
● [1] M. Shanahan, Talking about large language models, CoRR, vol. abs/2212.03551, 2022.
● [2] T. B. Brown et al., "Language models are few-shot learners," in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020.
● [3] A. Chowdhery et al., "PaLM: Scaling language modeling with pathways," CoRR, vol. abs/2204.02311, 2022.
● [4] R. Taylor et al., "Galactica: A large language model for science," CoRR, vol. abs/2211.09085, 2022.
● [5] H. Touvron et al., "LLaMA: Open and efficient foundation language models," CoRR, 2023.
● Alto, V. 2023. Modern Generative AI with ChatGPT and OpenAI Models: Leverage the capabilities of OpenAI's LLM for productivity and innovation with GPT3 and GPT4. Packt Publishing.
● Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E. and Brynjolfsson, E., 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
● Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., et al., 2020. Language Models are Few-Shot Learners. ArXiv. /abs/2005.14165
● Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
● Dong, L., Xu, S. and Xu, B., 2018, April. Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5884-5888). IEEE.
● Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv
preprint arXiv:2010.11929.
● Glaese, A., McAleese, N., Trębacz, M., Aslanides, J., Firoiu, V., Ewalds, T., Rauh, M., Weidinger, L., Chadwick, M., Thacker, P. and Campbell-Gillingham, L., 2022. Improving alignment of dialogue agents via targeted human judgements. arXiv
preprint arXiv:2209.14375.
● Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S. and Shah, M., 2022. Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), pp.1-41.
● Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.T., Rocktäschel, T. and Riedel, S., 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information
Processing Systems, 33, pp.9459-9474.
● Li, W., Luo, H., Lin, Z., Zhang, C., Lu, Z. and Ye, D., 2023. A survey on transformers in reinforcement learning. arXiv preprint arXiv:2301.03044.
● Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C.L., Ma, J. and Fergus, R., 2021. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the
National Academy of Sciences, 118(15), p.e2016239118.
● Rothman, D. and Gulli, A., 2022. Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3. Packt Publishing Ltd.
● Schwaller, P., Laino, T., Gaudin, T., Bolgar, P., Hunter, C.A., Bekas, C. and Lee, A.A., 2019. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science, 5(9), pp.1572-1583.
● M. Shanahan, 2022. “Talking about large language models,” CoRR, vol. abs/2212.03551, 2022.
● A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, et al., 2022. “Palm: Scaling language modeling with pathways,” CoRR, vol. abs/2204.02311, 2022.
● H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, et al., 2023. “Llama: Open and efficient foundation language models,” CoRR, 2023.
● Tunstall, L., Von Werra, L. and Wolf, T., 2022. Natural language processing with transformers. " O'Reilly Media, Inc."
● R. Taylor, M. Kardas, G. Cucurull, T. Scialom, et al., 2022. “Galactica: A large language model for science,” CoRR, vol. abs/2211.09085, 2022.
● Takeshi Kojima et al., 2023. Large Language Models are Zero-Shot Reasoners. https://arxiv.org/pdf/2205.11916.pdf
● Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z. and Du, Y., 2023. A survey of large language models. arXiv preprint arXiv:2303.18223.
● Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017. Attention is all you need. Advances in neural information processing systems, 30.
● Wei, Jason et al., 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. https://arxiv.org/abs/2201.11903

Webgraphy:

• DeepLearning.AI (http://deeplearning.ai/ )

• Huggingface (https://huggingface.co/docs/transformers/index)
50
