| Notebook | Description | Article | Colab |
|---|---|---|---|
| Fine-tune Llama 2 in Google Colab | Step-by-step guide to fine-tune your first Llama 2 model. | [Article](https://mlabonne.github.io/blog/posts/Fine_Tune_Your_Own_Llama_2_Model_in_a_Colab_Notebook.html) | <a href="https://colab.research.google.com/drive/1PEQyJO1-f6j0S_XJ8DV50NkpzasXkrzd?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| Fine-tune LLMs with Axolotl | End-to-end guide to the state-of-the-art tool for fine-tuning. | [Article](https://mlabonne.github.io/blog/posts/A_Beginners_Guide_to_LLM_Finetuning.html) | W.I.P. |
| Fine-tune a Mistral-7b model with DPO | Boost the performance of supervised fine-tuned models with DPO. | [Tweet](https://twitter.com/maximelabonne/status/1729936514107290022) | <a href="https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| 1. Introduction to Weight Quantization | Large language model optimization using 8-bit quantization. | [Article](https://mlabonne.github.io/blog/posts/Introduction_to_Weight_Quantization.html) | <a href="https://colab.research.google.com/drive/1DPr4mUQ92Cc-xf4GgAaB6dFcFnWIvqYi?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| 2. 4-bit LLM Quantization using GPTQ | Quantize your own open-source LLMs to run them on consumer hardware. | [Article](https://mlabonne.github.io/blog/4bit_quantization/) | <a href="https://colab.research.google.com/drive/1lSvVDaRgqQp_mWK_jC9gydz6_-y6Aq4A?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| 3. Quantize Llama 2 models with GGUF and llama.cpp | Quantize Llama 2 models with llama.cpp and upload GGUF versions to the HF Hub. | [Article](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html) | <a href="https://colab.research.google.com/drive/1pL8k7m04mgE5jo2NrjGi8atB0j_37aDD?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| 4. ExLlamaV2: The Fastest Library to Run LLMs | Quantize and run EXL2 models and upload them to the HF Hub. | [Article](https://mlabonne.github.io/blog/posts/ExLlamaV2_The_Fastest_Library_to_Run%C2%A0LLMs.html) | <a href="https://colab.research.google.com/drive/1yrq4XBlxiA0fALtMoT2dwiACVc77PHou?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| Decoding Strategies in Large Language Models | A guide to text generation from beam search to nucleus sampling (see the sketch below this table). | [Article](https://mlabonne.github.io/blog/posts/2022-06-07-Decoding_strategies.html) | <a href="https://colab.research.google.com/drive/19CJlOS5lI29g-B3dziNn93Enez1yiHk2?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| Visualizing GPT-2's Loss Landscape | 3D plot of the loss landscape based on weight perturbations. | [Tweet](https://twitter.com/maximelabonne/status/1667618081844219904) | <a href="https://colab.research.google.com/drive/1Fu1jikJzFxnSPzR_V2JJyDVWWJNXssaL?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
| Improve ChatGPT with Knowledge Graphs | Augment ChatGPT's answers with knowledge graphs. | [Article](https://mlabonne.github.io/blog/posts/Article_Improve_ChatGPT_with_Knowledge_Graphs.html) | <a href="https://colab.research.google.com/drive/1mwhOSw9Y9bgEaIFKT4CLi0n18pXRM4cj?usp=sharing"><img src="images/colab.svg" alt="Open In Colab"></a> |
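As a quick taste of the decoding-strategies notebook, here is a minimal sketch comparing greedy decoding and nucleus (top-p) sampling with the `transformers` `generate` API. The model (`gpt2`), prompt, and parameter values are illustrative assumptions, not necessarily what the notebook uses:

```python
# Minimal sketch: greedy decoding vs. nucleus (top-p) sampling.
# Assumes `transformers` and `torch` are installed; gpt2 is a stand-in model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("I have a dream", return_tensors="pt")

# Greedy decoding: always pick the single most likely next token (deterministic).
greedy = model.generate(**inputs, max_new_tokens=40, do_sample=False)

# Nucleus sampling: sample from the smallest token set whose cumulative
# probability exceeds top_p, trading determinism for diversity.
sampled = model.generate(**inputs, max_new_tokens=40,
                         do_sample=True, top_p=0.9, top_k=0)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```

Running the sampled variant several times produces different continuations, while the greedy output never changes; that contrast is the starting point of the article above.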
## 🧩 LLM Fundamentals
After supervised fine-tuning, RLHF is a step used to align the LLM's answers with human expectations.
* **Preference datasets**: These datasets typically contain several answers with some kind of ranking, which makes them more difficult to produce than instruction datasets.
* [**Proximal Policy Optimization**](https://arxiv.org/abs/1707.06347): This algorithm leverages a reward model that predicts whether a given text is highly ranked by humans. This prediction is then used to optimize the SFT model with a penalty based on KL divergence.
* [**Direct Preference Optimization**](https://arxiv.org/abs/2305.18290): DPO simplifies the process by reframing it as a classification problem. It uses a reference model instead of a reward model (no training needed) and only requires one hyperparameter, making it more stable and efficient (see the sketch below).
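To make the classification framing concrete, here is a minimal sketch of the DPO objective in PyTorch. It assumes sequence-level log-probabilities for the chosen and rejected answers have already been computed under both the trained policy and the frozen reference model; `beta` is the single hyperparameter mentioned above:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the DPO objective (https://arxiv.org/abs/2305.18290)."""
    # How much more (or less) likely the policy makes each answer vs. the
    # frozen reference model, per preference pair.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Binary classification on preference pairs: increase the margin between
    # chosen and rejected log-ratios, scaled by the single hyperparameter beta.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

In practice, libraries such as TRL wrap this logic in a trainer, but the core objective reduces to this pairwise log-sigmoid loss, with no separate reward model to train.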
📚 **References**:
* [An Introduction to Training LLMs using RLHF](https://wandb.ai/ayush-thakur/Intro-RLAIF/reports/An-Introduction-to-Training-LLMs-Using-Reinforcement-Learning-From-Human-Feedback-RLHF---VmlldzozMzYyNjcy) by Ayush Thakur: Explains why RLHF is desirable to reduce bias and increase performance in LLMs.