Unsloth Deep Dive: How Does It Achieve a 2-5x Speedup in AI Model Fine-Tuning?


I. Introduction

In today's era of rapid AI development, language models are being applied ever more widely. However, applying a pre-trained language model to a concrete task usually requires fine-tuning. Unsloth, a fine-tuning framework for pre-trained models, has attracted broad attention from developers thanks to its strong performance and distinctive technical approach. This article takes a close look at Unsloth to help readers get a complete picture of this powerful tool.

II. What Is Unsloth?

Unsloth is a framework purpose-built for model fine-tuning. It targets the most common pain points of fine-tuning: slow training and high GPU memory (VRAM) consumption. Through a series of innovative techniques and optimizations, Unsloth significantly improves fine-tuning efficiency, allowing developers to obtain better models in less time.

III. Key Advantages

  1. Fast training. When fine-tuning mainstream models such as Llama 3, Qwen2, and Mistral, Unsloth delivers a striking speedup: training runs 2 to 5 times faster than conventional fine-tuning approaches. Developers finish training sooner and see results earlier, which noticeably shortens the development cycle; on large text datasets, for example, the model converges quickly and training time drops accordingly.

  2. Low VRAM usage. GPU memory is a key constraint in fine-tuning, especially on resource-limited devices. Unsloth addresses this well, cutting VRAM consumption by roughly 70%, so fine-tuning remains feasible even on hardware with limited memory, such as mid- and low-end GPUs. This opens the door for many more developers to work across different hardware environments without worrying about resource limits; a small snippet for checking peak VRAM on your own runs follows this list.
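
As a quick way to sanity-check the memory numbers on your own hardware, you can read PyTorch's built-in CUDA memory statistics before and after a training run. This is a minimal sketch (it assumes a single CUDA device at index 0) and is not part of Unsloth itself.

import torch

# Peak GPU memory statistics for the current process (assumes one CUDA device at index 0).
gpu = torch.cuda.get_device_properties(0)
total_gb = round(gpu.total_memory / 1024 / 1024 / 1024, 3)
peak_gb = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu.name}. Total memory = {total_gb} GB.")
print(f"Peak reserved memory so far = {peak_gb} GB.")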


IV. Technical Features

  1. Broad hardware compatibility
    Unsloth supports a wide range of hardware setups, covering NVIDIA GPUs from the Tesla T4 up to the H100, and it is extending support to AMD and Intel GPUs as well. This is a major convenience for developers on different hardware: whichever GPU you have, you can try Unsloth for fine-tuning. Such broad compatibility lets Unsloth bring its advantages to many platforms and gives developers more choices.

  2. Optimized memory usage
    Unsloth uses pioneering techniques such as intelligent weight upcasting, which reduces how often weights need to be upcast during QLoRA and thereby trims memory usage, making better use of hardware resources and improving training efficiency. It also takes advantage of BFloat16 where the hardware supports it, improving the stability of 16-bit training and further accelerating QLoRA fine-tuning. This fine-grained management of memory and compute is what lets Unsloth perform well on large models and datasets; a short dtype-selection sketch follows this list.
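
To make the BFloat16 point concrete, the snippet below chooses a training dtype based on what the GPU supports, using the is_bfloat16_supported helper that Unsloth exports (the same helper appears in the training arguments later in this walkthrough). It is an illustrative sketch, not Unsloth's internal logic.

import torch
from unsloth import is_bfloat16_supported

# Ampere-class GPUs (A100, RTX 30xx) and newer support BFloat16;
# older cards such as the Tesla T4 or V100 fall back to Float16.
dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16
print(f"bfloat16 supported: {is_bfloat16_supported()}, training dtype: {dtype}")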

V. Hands-On with Unsloth

1. Installing Unsloth

Installing Unsloth is relatively simple. For a CUDA 12.1 / PyTorch 2.3.0 environment you can install it with: pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git". The exact command varies with your environment and requirements, so it is best to follow the official documentation. In a Colab-style notebook, the following variant is used instead:

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"


2. Mirror setup

Due to network restrictions, resources on Hugging Face may be unreachable; in that case you can use a domestic mirror site such as https://hf-mirror.com.
1) Install the dependency

!pip install -U huggingface_hub


2) Set the environment variable

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
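
With the endpoint set, downloads made through the huggingface_hub library go via the mirror. As an optional, hypothetical example, you could pre-download the model used later in this article so it is already in the local cache:

from huggingface_hub import snapshot_download

# Optional: pre-fetch the model repository into the local Hugging Face cache.
# Assumes HF_ENDPOINT was set (as above) before this call runs.
snapshot_download(repo_id = "unsloth/Llama-3.2-3B-Instruct")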

3. Loading the model

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)


4. LoRA configuration

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
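
After wrapping the model with LoRA adapters, it can be reassuring to confirm how small the trainable parameter count actually is. The line below assumes the object returned by get_peft_model exposes PEFT's standard print_trainable_parameters helper; this is an illustrative check, not part of the original walkthrough.

# Print trainable (LoRA) parameters versus the total parameter count.
# Assumes the wrapped model exposes PEFT's standard helper.
model.print_trainable_parameters()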

5. Preparing the dataset

We use Maxime Labonne's FineTome-100k dataset, which is in ShareGPT style:
https://huggingface.co/datasets/mlabonne/FineTome-100k

The ShareGPT ("from", "value") fields need to be converted into the ("role", "content") format. First, attach the Llama 3.1 chat template to the tokenizer and define the formatting function:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("mlabonne/FineTome-100k", split = "train")


We now use standardize_sharegpt to convert the ShareGPT-style dataset into Hugging Face's generic conversational format, turning records like

{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}

to

{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}

from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)

Spot-check the conversation format of the record at index 5:

dataset[5]["conversations"]

Output:

[{'content': 'How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?',
  'role': 'user'},
 {'content': 'Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.',
  'role': 'assistant'}]

Inspect the same record after the chat template has been applied:

dataset[5]["text"]

Output:

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'

6. Training the model

Configure the training arguments:

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)


We use Unsloth's train_on_responses_only helper so that the loss is computed only on the assistant's responses, ignoring the user's inputs:

from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)


Inspect the input_ids after the masking has been applied:

tokenizer.decode(trainer.train_dataset[5]["input_ids"])

Output:

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'

Inspect the labels after the masking has been applied:

space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

Output:

'                                                                \n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'

We can see that the system and user prompts have been successfully masked out; only the assistant response contributes to the loss.

Start the training run:

trainer_stats = trainer.train()

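To put concrete numbers on the speed claims for your own hardware, you can inspect the statistics returned by trainer.train(). This is a small sketch; the field names follow the standard Hugging Face TrainOutput convention.

# trainer_stats is the TrainOutput returned by trainer.train();
# its metrics dict uses the standard Hugging Face Trainer keys.
runtime_s = trainer_stats.metrics["train_runtime"]
print(f"Training took {runtime_s:.1f} seconds ({runtime_s / 60:.2f} minutes).")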

7. Inference

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True,
                         temperature = 1.5, min_p = 0.1)
tokenizer.batch_decode(outputs)

Output:

['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nContinue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThe next two terms would be 13 and 21.\n\nFibonacci Sequence: 1, 1, 2, 3, 5, 8, 13, 21.<|eot_id|>']

8. Saving the fine-tuned model

model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
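
Besides saving just the LoRA adapter, Unsloth's example notebooks also demonstrate exporting merged 16-bit weights or GGUF files. The sketch below mirrors those examples; the exact method names and options should be verified against the Unsloth version you have installed.

if False:
    # Merge the LoRA adapter into the base weights and save in 16-bit precision.
    model.save_pretrained_merged("merged_model", tokenizer, save_method = "merged_16bit")
if False:
    # Export a GGUF file (e.g. for llama.cpp or Ollama), quantized as q4_k_m.
    model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")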

9. Loading the fine-tuned model for inference

if False: # change to True to reload the saved LoRA adapter in a fresh session
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The inference output is as follows:

The Eiffel Tower is a famous tall structure located in Paris, the capital of France. It was built for the 1889 World's Fair and stands at a height of 324 meters (1,063 feet) high. The Eiffel Tower has become a symbol of Paris and is often referred to as the Iron Lady. Its construction was designed by Gustave Eiffel, a French engineer, and it was intended to be a temporary structure. However, it has remained standing for over a century and has become an iconic landmark in the city.<|eot_id|>

VI. Applying Unsloth in Real Projects

Unsloth's efficiency and flexibility give it broad applicability across many domains.

For natural language processing tasks such as text classification, sentiment analysis, and machine translation, Unsloth helps developers quickly fine-tune pre-trained models to fit different datasets and task requirements. The reduced training time and VRAM footprint make it easier to iterate on experiments and improve model quality.

In dialogue system development, Unsloth lets developers train personalized conversational models quickly. Fine-tuning on large-scale dialogue data helps a model understand user input better and produce more natural, accurate replies, which matters for building intelligent customer-service agents and chatbots.

In content generation, such as article writing and storytelling, Unsloth is equally useful: developers can fine-tune a language model so that it generates high-quality text from a given topic or prompt.

VII. Summary and Outlook

As a powerful fine-tuning framework for pre-trained models, Unsloth gives developers an efficient and convenient fine-tuning solution. Its fast training, low VRAM usage, and broad hardware compatibility make it a significant tool in the AI landscape. Used well, it makes it noticeably easier to bring pre-trained models into real projects and push AI applications forward.

Unsloth is, of course, still evolving. We can look forward to further innovations and breakthroughs from the project, and we hope more developers will try it and explore what it makes possible.

VIII. Resources

If you want to learn more about Unsloth, the following resources are a good starting point:
GitHub repository: https://github.com/unslothai/unsloth
Official site: the Unsloth official documentation


