LLMs Overview and OpenAI API Ver 1-8 - Final NLP Day-UM6P-Nov 2023
OpenAI API
1. Definition of LLMs
Prompt (input) → LLM → Completion (output). The original Transformer model was described in "Attention is All You Need" (Vaswani et al., 2017).
"Typically, large language models (LLMs) refer to Transformer language models that contain hundreds of billions (or more) of parameters*, which are trained on massive text data [1], such as GPT-3 [2], PaLM [3], Galactica [4], and LLaMA [5]. LLMs exhibit strong capacities to understand natural language and solve complex tasks (via text generation)." (Wayne Xin Zhao et al., 2023)
Model parameter sizes: GPT-3 (2020): 175B; PaLM (2022): 540B.
Pre-training data scale: GPT-3 (2020): 300B tokens; PaLM (2022): 780B tokens.
Example prompt (NLP spell-checking task), given to GPT-3.5:
Proofread the following text and rewrite the corrected version:
"He did not did its homework."
The completion is the corrected sentence returned by the model.
Statistics of large language models (larger than 10B) (Wayne Xin Zhao et al., 2023)
2. Major development stages of the language modeling approach
• Statistical language models:
o Based on statistical learning methods (Markov assumption)
✓ LLMs show surprising abilities (called emergent abilities) in solving a series of complex tasks: typically, in-context learning (present in GPT-3 but not observed in small-scale language models such as BERT or GPT-2), instruction following, and step-by-step reasoning (a.k.a. chain-of-thought).
* Note that an LLM is not necessarily more capable than a small pre-trained language model, and emergent abilities may not occur in some LLMs.
3. Text Generation:
General working flow of an LLM predicting the next word (auto-regressive model): at each step, the model selects an output token, e.g., the token (word) with the highest probability.
The way these models actually work is that after each token is produced, that token is added to the sequence of inputs, and that new sequence becomes the input to the model in its next step. This idea is called "auto-regression". The LLM is focused on generating the next token given the sequence of tokens; it does this in a loop, appending the predicted token to the input sequence, and can therefore generate text by predicting one word at a time. LLMs are an example of generative AI.
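As an illustration, here is a minimal, hedged sketch of that loop in Python. The predict_next_token_probs function and vocab list are hypothetical stand-ins for a real model and tokenizer; they are not part of any specific library.

```python
# Minimal sketch of autoregressive (greedy) generation.
# `predict_next_token_probs` and `vocab` are hypothetical stand-ins for a real model/tokenizer.
def generate(prompt_tokens, predict_next_token_probs, vocab, max_new_tokens=20, eos_token="<eos>"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = predict_next_token_probs(tokens)   # one probability per vocabulary entry
        best_index = max(range(len(probs)), key=probs.__getitem__)
        next_token = vocab[best_index]             # greedy choice: highest-probability token
        if next_token == eos_token:                # stop when the model emits an end-of-sequence token
            break
        tokens.append(next_token)                  # auto-regression: feed the new token back in
    return tokens
```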
3. Text Generation
Example: input "A transformer model is a" → LLM → predicted next token: "neural".
The model first generates logits for each possible output token. Those logits are then passed to a softmax function to generate probabilities for each possible output, giving a probability distribution over the vocabulary. Here is the softmax equation for calculating the actual probability of a token:
P(token_k | context) = softmax(logit_k) = e^{logit_k} / \sum_j e^{logit_j}
Where:
- P(token_k | context) is the probability of token_k given the context formed by the previous tokens (token_1 to token_{k-1})
- logit_k is the output of the neural network for token_k
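To make the equation concrete, here is a small, self-contained sketch using NumPy (the toy logit values are illustrative, not taken from the slides):

```python
import numpy as np

def softmax(logits):
    """Turn raw logits into a probability distribution over the vocabulary."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return exp / exp.sum()

# Toy logits for a 4-token vocabulary (illustrative values)
probs = softmax([2.0, 1.0, 0.1, -1.0])
print(probs, probs.sum())                 # the probabilities sum to 1
```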
4. Decoding Strategy
How to select the output tokens (a.k.a. decoding strategy)?
✓ Greedy search:
A basic decoding method that predicts the most likely token at each step based on the previously generated tokens, formally modeled as:
x_i = \arg\max_x P(x | x_{<i})
Example: input "A transformer model is a" → LLM → "neural" (the most probable next token).
4. Decoding Strategy
✓ Random sampling (sampling-based methods): instead of always choosing the most likely token, sample the next token from the probability distribution (which may be narrow or broad).
• Temperature sampling:
To modulate the randomness of sampling, a practical method is to adjust the temperature coefficient of the softmax function for computing the probability of the j-th token over the vocabulary:
p_j = e^{logit_j / t} / \sum_{j'} e^{logit_{j'} / t}
where t is the temperature: lower values make the distribution narrower (less random), higher values make it broader (more random).
• Top-k sampling: sample only from the k most probable tokens (e.g., k = 3). (Source: cohere.com)
- Top-k sampling does not consider the overall probability distribution, and a constant value of k may not be suitable for different contexts.
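A minimal sketch combining temperature and top-k sampling (NumPy; the logit values and parameter choices are illustrative):

```python
import numpy as np

def sample_temperature_top_k(logits, temperature=1.0, k=3):
    """Sample a token index from the k most likely tokens, after temperature scaling."""
    scaled = np.asarray(logits, dtype=float) / temperature   # t < 1 sharpens, t > 1 flattens
    top_k_idx = np.argsort(scaled)[-k:]                      # keep only the k highest logits
    top_k_logits = scaled[top_k_idx]
    probs = np.exp(top_k_logits - top_k_logits.max())
    probs /= probs.sum()                                     # renormalize over the k tokens
    return int(np.random.choice(top_k_idx, p=probs))

# Illustrative logits for a 4-token vocabulary
print(sample_temperature_top_k([2.0, 1.0, 0.1, -1.0], temperature=0.7, k=3))
```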
4. Decoding Strategy
• Top-p sampling
Top-p sampling (a.k.a. nucleus sampling) samples from the smallest set of tokens whose cumulative probability is above (or equal to) p.
cohere.com
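A minimal sketch of top-p (nucleus) sampling (NumPy; illustrative values):

```python
import numpy as np

def sample_top_p(logits, p=0.9):
    """Sample from the smallest set of tokens whose cumulative probability reaches p."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                   # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # size of the smallest nucleus reaching p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(np.random.choice(nucleus, p=nucleus_probs))

print(sample_top_p([2.0, 1.0, 0.1, -1.0], p=0.9))
```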
4.1 Practical Settings (LLM Parameters: options to control the outputs of text generation)
LLM parameters (temperature, top_p, …) are options that control the outputs of text generation.
Prompt (input) → LLM → Completion (output).
• While the model decides which output is most probable, there are key parameters worth tuning (tweaking) to influence those probabilities and get the best outputs for your LLM projects.
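As an illustration, here is a hedged sketch of passing these parameters to the OpenAI Chat Completions API with the openai Python package (v1-style client). The model name and prompt are assumptions for the example, and OPENAI_API_KEY must be set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Try the same prompt at several temperatures to see how randomness changes.
for temperature in (0.0, 0.7, 1.5):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",                      # assumed model name
        messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
        temperature=temperature,                    # higher -> more random completions
        top_p=1.0,                                  # nucleus-sampling threshold
        max_tokens=20,
    )
    print(temperature, response.choices[0].message.content)
```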
5.1 OpenAI Chat Completions API endpoint
API REQUEST (creates a model response for the given prompt, i.e., a chat conversation):
POST https://api.openai.com/v1/chat/completions
The application sends the request to the OpenAI Chat Completions API.
API response: a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
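A hedged sketch of calling this endpoint directly over HTTP with the requests package (the model name and messages are assumptions for the example):

```python
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-3.5-turbo",   # assumed model name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Proofread and correct: 'He did not did its homework.'"},
        ],
    },
)
chat_completion = response.json()   # the chat completion object described above
print(chat_completion["choices"][0]["message"]["content"])
```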
The language modeling (LM) task aims to autoregressively predict the target tokens based on the preceding tokens in a sequence. A general training objective is to maximize the following likelihood:
L(x) = \sum_i \log P(x_i | x_{<i})
✓ A large language model, after pre-training (using a language modeling task), is able to provide a global understanding of the language it is trained on.
✓ Large language models can perform hundreds of NLP tasks they were not trained for.
Example: brand monitoring, keeping a close eye on your brand's reputation.
Ratios of various data sources in the pre-training data for existing LLMs
7. Types of Large Language Models
There are three types of LLMs:
▪ Base LLMs (pre-trained model checkpoints)
Model checkpoints obtained right after pre-training.
✓ A base LLM acquires general abilities for solving various tasks.
✓ It lacks consideration of human values or preferences.
8. LLM utilization:
How to work with large language models?
8.1. Prompt, completion, and prompt engineering
Prompt (input) → LLM → Completion (output).
Prompts involve instructions and context passed to a language model to achieve a desired task. The space or memory that is available to the prompt is called the context window.
A completion refers to the text that is generated and returned as a result of the provided prompt/input.
Example (GPT-3.5):
Prompt: Proofread the following text and rewrite the corrected version: "He did not did its homework."
Prompt engineering:
Prompt engineering is a useful skill for AI engineers and researchers to improve and efficiently use language models.
OpenAI Guide: https://platform.openai.com/docs/guides/prompt-engineering
8.2. Different kinds of prompts
Prompt:
Classify the sentiment of the following tweet as positive, neutral, or negative.
Tweet: I loved the new Samsung Galaxy S23 Ultra
Completion:
Sentiment: positive
Zero-shot prompt
- Example: a zero-shot prompt and its completion, given to GPT-3.5-turbo.
One-shot prompt
Prompt:
Task description: Classify the sentiment of the following tweet as positive, neutral, or negative.
One example: Tweet: I loved the new Samsung Galaxy S23 Ultra / Sentiment: Positive
(followed by the new tweet to classify)
Completion:
Sentiment: Negative
Few-shot prompt
Prompt:
Classify the sentiment of the following tweet as positive, neutral, or negative.
Tweet: I loved the new Samsung Galaxy S23 Ultra
Completion:
Sentiment: Positive
Typically, 10 to 100 shots for GPT-3 (Brown et al., 2020: GPT-3 original paper)
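As an illustration, a hedged sketch of a few-shot sentiment prompt expressed as chat messages for the Chat Completions API (the extra example tweets are illustrative placeholders, not taken from the slides):

```python
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

few_shot_messages = [
    {"role": "system",
     "content": "Classify the sentiment of the following tweet as positive, neutral, or negative."},
    # Labeled examples (the second tweet is an illustrative placeholder)
    {"role": "user", "content": "Tweet: I loved the new Samsung Galaxy S23 Ultra"},
    {"role": "assistant", "content": "Sentiment: Positive"},
    {"role": "user", "content": "Tweet: The battery life is disappointing."},
    {"role": "assistant", "content": "Sentiment: Negative"},
    # The new tweet to classify (illustrative placeholder)
    {"role": "user", "content": "Tweet: Just picked up my phone from the store."},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=few_shot_messages)
print(response.choices[0].message.content)  # e.g. "Sentiment: Neutral" (model-dependent)
```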
Chain of thought (CoT) prompting
Chain-of-thought prompting enables large language models to tackle complex arithmetic, commonsense, and symbolic reasoning tasks (Wei, Jason et al., 2023). (In the original figure, the chain-of-thought reasoning processes are highlighted.)
✓ The core concept behind CoT is that by presenting the LLM with few-shot examples that include reasoning, it will subsequently incorporate the reasoning process into its responses when addressing prompts.
Chain of thought (CoT) prompting: Zero-shot CoT
Instead of providing examples with reasoning in the prompt, zero-shot CoT simply adds "Let's think step by step" to the task (Takeshi Kojima et al., 2023).
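A hedged sketch of zero-shot CoT with the Chat Completions API (the task text and model name are illustrative assumptions):

```python
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

task = "If a train travels 60 km in 45 minutes, what is its average speed in km/h?"  # illustrative task
zero_shot_cot_prompt = task + "\n\nLet's think step by step."  # the zero-shot CoT trigger phrase

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": zero_shot_cot_prompt}],
)
print(response.choices[0].message.content)  # the model's step-by-step reasoning and final answer
```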
8.3 OpenAI Fine-tuning API endpoint
9. LLM use cases, tasks, real-world applications
Large language models have numerous applications in various fields, including but not limited to:
• Language translation: LLMs can be used to translate text from one language to another.
• Question answering: LLMs can be used to answer questions based on a given context.
• Text summarization: Large language models can be used to generate summaries of text documents.
• Content creation (generation): LLMs can be used to generate content for various purposes, such as
marketing and advertising.
• Code generation
• Sentiment analysis: LLMs can be used to analyze the sentiment of text
• Chatbots
• Summarization, Essay writing
• Entity extraction
• Etc.
✓ None of these capabilities are explicitly programmed in; they all emerge as a result of training using a language modeling task.
✓ A large language model, after pre-training (using a language modeling task), is able to provide a global understanding of the language it is trained on.
✓ Large language models can perform hundreds of NLP tasks they were not trained for.
LLM-powered applications
Retrieval Augmented Generation (RAG)
✓ Some of the limitations of large language models are:
▪ The internal knowledge held by a model cuts off at the moment of pretraining,
▪ Hallucination
▪ Struggling with complex math
DeepLearning.AI (http://deeplearning.ai/)
ChatGPT (as an LLM chat model) has potentially changed how humans access information, and has been integrated into the release of the new Bing.
* https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f
High Level Overview of integrating Enterprise Knowledge with LLM*
Input text → LLM → Embedding (a numerical representation of the text, useful for other systems).
Text embeddings measure the relatedness of text strings. Embeddings are commonly used for:
• Text search
• Clustering (where text strings are grouped by similarity)
• Recommendations (where items with related text strings are recommended)
• Classification, topic clustering, anomaly detection
• Preparing data to be fed into a machine learning model
11. OpenAI Embeddings API endpoint
▪ OpenAI has trained several embedding models with different dimensions and different capabilities.
OpenAI recommends the text-embedding-ada-002 model for creating text embeddings for nearly all use cases.
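A hedged sketch of creating an embedding with the Embeddings API (openai Python package, v1-style client; the input sentence is illustrative and OPENAI_API_KEY must be set):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input=["Text embeddings measure the relatedness of text strings."],
)
vector = response.data[0].embedding   # a list of floats (1536 dimensions for ada-002)
print(len(vector))
```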
Application of LLMs to embeddings
Example of using OpenAI Embedding API in semantic search, question-answering, threat detection ...
Source: https://docs.pinecone.io/docs/openai
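To illustrate the semantic-search use case, here is a minimal sketch that ranks documents by cosine similarity of their embeddings (the embeddings themselves would come from the Embeddings API above; the function names are illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_embedding, doc_embeddings, documents, top_n=3):
    """Return the top_n documents whose embeddings are most similar to the query embedding."""
    scores = [cosine_similarity(query_embedding, emb) for emb in doc_embeddings]
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_n]
```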
The scope of an Industry 4.0 artificial intelligence (I4.0 AI) specialist (Denis Rothman, 2022)
Foundation models, although designed with an innovative architecture, are built on top of the history of AI. As a result, an artificial intelligence
specialist’s range of skills is stretching!
13. Generative AI project lifecycle
• The overall life cycle of a generative AI project involving LLMs
DeepLearning.AI (http://deeplearning.ai/ )
Challenges and some future directions
• Transformer complexity:
=> Efficiency has become an important issue when training, and making inference with, long inputs.
=> Reduce the time complexity (quadratic in the input length) incurred by the standard self-attention mechanism.
• Emergent abilities: when and how they are obtained by LLMs is not yet clear.
Challenges and some future directions
• Catastrophic forgetting (a long-standing challenge for neural networks that also has a negative impact on LLMs)
• Task specialization: fine-tuning an LLM for specific tasks can affect the general abilities of the LLM
• Alignment tuning with human values => alignment tax
• RLHF heavily relies on high-quality human feedback data from professional labelers => difficult to implement in practice
References
● [1] M. Shanahan, Talking about large language models, CoRR, vol. abs/2212.03551, 2022.
● [2] T. B. Brown et al., "Language models are few-shot learners," in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020 (NeurIPS 2020), December 6-12, 2020.
● [3] A. Chowdhery et al., "Palm: Scaling language modeling with pathways," CoRR, vol. abs/2204.02311, 2022.
● [4] R. Taylor et al., "Galactica: A large language model for science," CoRR, vol. abs/2211.09085, 2022.
● [5] H. Touvron et al., "LLaMA: Open and efficient foundation language models," CoRR, vol. abs/2302.13971, 2023.
● Alto, V. 2023. Modern Generative AI with ChatGPT and OpenAI Models: Leverage the capabilities of OpenAI's LLM for productivity and innovation with GPT3 and GPT4. Packt Publishing.
● Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E. and Brynjolfsson, E., 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
● Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., et al., 2020. Language Models are Few-Shot Learners. ArXiv. /abs/2005.14165
● Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
● Dong, L., Xu, S. and Xu, B., 2018, April. Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5884-5888). IEEE.
● Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv
preprint arXiv:2010.11929.
● Glaese, A., McAleese, N., Trębacz, M., Aslanides, J., Firoiu, V., Ewalds, T., Rauh, M., Weidinger, L., Chadwick, M., Thacker, P. and Campbell-Gillingham, L., 2022. Improving alignment of dialogue agents via targeted human judgements. arXiv
preprint arXiv:2209.14375.
● Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S. and Shah, M., 2022. Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), pp.1-41.
● Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.T., Rocktäschel, T. and Riedel, S., 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information
Processing Systems, 33, pp.9459-9474.
● Li, W., Luo, H., Lin, Z., Zhang, C., Lu, Z. and Ye, D., 2023. A survey on transformers in reinforcement learning. arXiv preprint arXiv:2301.03044.
● Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C.L., Ma, J. and Fergus, R., 2021. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the
National Academy of Sciences, 118(15), p.e2016239118.
● Rothman, D. and Gulli, A., 2022. Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3. Packt Publishing Ltd.
● Schwaller, P., Laino, T., Gaudin, T., Bolgar, P., Hunter, C.A., Bekas, C. and Lee, A.A., 2019. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science, 5(9), pp.1572-1583.
● M. Shanahan, 2022. “Talking about large language models,” CoRR, vol. abs/2212.03551, 2022.
● A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, et al., 2022. “Palm: Scaling language modeling with pathways,” CoRR, vol. abs/2204.02311, 2022.
● H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, et al., 2023. “Llama: Open and efficient foundation language models,” CoRR, 2023.
● Tunstall, L., Von Werra, L. and Wolf, T., 2022. Natural language processing with transformers. O'Reilly Media, Inc.
● R. Taylor, M. Kardas, G. Cucurull, T. Scialom, et al., 2022. “Galactica: A large language model for science,” CoRR, vol. abs/2211.09085, 2022.
● Takeshi Kojima et al., 2023. Large Language Models are Zero-Shot Reasoners. https://arxiv.org/pdf/2205.11916.pdf
● Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z. and Du, Y., 2023. A survey of large language models. arXiv preprint arXiv:2303.18223.
● Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017. Attention is all you need. Advances in neural information processing systems, 30.
● Wei, Jason et al., 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. https://arxiv.org/abs/2201.11903
Webography:
• DeepLearning.AI (http://deeplearning.ai/ )
• Huggingface (https://huggingface.co/docs/transformers/index)