Phi-3.5 Is Here: Multi-Image Understanding, Better Chinese Support, and Function Calling

Latest recommended article published on 2025-07-20 22:09:34


License: CC 4.0 BY-SA


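01. Introduction

Following the Phi-3 family of small language models released in April, Microsoft has now released and open-sourced three models at once in its "small but mighty" Phi-3.5 series.

Each of the three models has its own focus: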

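Mini: Phi-3.5-mini-instruct (3.8B)

Phi-3.5 mini has 3.8 billion parameters and is built on the Phi-3 datasets (synthetic data and filtered public websites), with a focus on high-quality, reasoning-dense data. It belongs to the Phi-3 model family and supports a 128K-token context length. The model went through a rigorous enhancement process that combines supervised fine-tuning (SFT), proximal policy optimization (PPO), and direct preference optimization (DPO) to ensure precise instruction following and robust safety measures. Phi-3.5 mini has improved in Chinese scenarios, but because of its small size it still makes a fair number of factual errors; retrieval-augmented generation (RAG) can effectively reduce them.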
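One way to apply the RAG idea mentioned above is to retrieve a few relevant passages and place them in the system turn ahead of the question. The sketch below is a toy illustration under stated assumptions: the keyword-overlap scorer, the `retrieve` and `build_rag_prompt` helpers, and the sample passages are all hypothetical, and a real setup would more likely use an embedding-based retriever. The chat tags follow the `<|system|>` / `<|user|>` / `<|assistant|>` format used in the inference example later in this article.

```python
def score(query: str, passage: str) -> int:
    """Toy relevance score: number of lowercase words shared by query and passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(query: str, passages: list[str], top_k: int = 2) -> str:
    """Put the retrieved passages into the system turn, then ask the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, top_k))
    system = (
        "Answer using only the reference passages below. "
        "If they do not contain the answer, say so.\n" + context
    )
    return f"<|system|>\n{system}<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>\n"


# Hand-written sample passages, for illustration only.
passages = [
    "Changsha is the capital of Hunan Province in China.",
    "Phi-3.5-mini has 3.8 billion parameters.",
    "Ollama can run GGUF models locally.",
]
prompt = build_rag_prompt("What is the capital of Hunan Province?", passages, top_k=1)
print(prompt)
```

The resulting string can be passed to the text-generation pipeline in place of a hand-written prompt.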

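MoE: Phi-3.5-MoE-instruct (16x3.8B)

Phi-3.5-MoE-instruct is a mixture-of-experts (MoE) model with 16x3.8B parameters, of which 6.6B are active when 2 experts are used. It uses a tokenizer with a vocabulary size of 32,064. Its reasoning ability is greatly improved, especially in math and logic, and it is also well suited to function-calling scenarios.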
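To illustrate why only a fraction of the parameters is active for each token, here is a minimal sketch of generic top-2 expert gating. It is an illustration under assumptions, not the router that Phi-3.5-MoE actually uses: the `top2_gate` function and the random router scores are hypothetical.

```python
import math
import random


def top2_gate(router_logits: list[float]) -> dict[int, float]:
    """Pick the two highest-scoring experts and softmax-normalise their weights."""
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(router_logits[i] for i in top2)
    exps = {i: math.exp(router_logits[i] - m) for i in top2}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
num_experts = 16
# Random scores stand in for the output of a learned router layer.
logits = [random.gauss(0.0, 1.0) for _ in range(num_experts)]
weights = top2_gate(logits)
print(weights)
print(f"experts used for this token: {len(weights)} of {num_experts}")
```

Only the two selected experts run for the token, and their outputs are combined using the normalised weights.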

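Multimodal: Phi-3.5-vision-instruct (4.2B)

Phi-3.5-vision-instruct, the multimodal version, has 4.2B parameters and supports a 128K-token context length. It consists mainly of an image encoder and the Phi-3 Mini language model. This release of Phi-3.5-vision-instruct supports multi-image understanding and performs well in the following scenarios:

General image understanding
Optical character recognition (OCR)
Chart and table understanding
Multi-image comparison
Summarization of multiple images or video clips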

The ModelScope community has also published Phi-3.5-mini-instruct-GGUF, which makes the model easier to run with tools such as Ollama, llama.cpp, and LM Studio.

Model links:

Phi-3.5-mini-instruct:

https://modelscope.cn/models/LLM-Research/Phi-3.5-mini-instruct

Phi-3.5-MoE-instruct:

https://modelscope.cn/models/LLM-Research/Phi-3.5-MoE-instruct

Phi-3.5-vision-instruct:

https://modelscope.cn/models/LLM-Research/Phi-3.5-vision-instruct

Phi-3.5-mini-instruct-GGUF:

https://modelscope.cn/models/LLM-Research/Phi-3.5-mini-instruct-GGUF

Cookbook link:

https://github.com/microsoft/Phi-3CookBook

02. Model inference

Phi-3.5-mini-instruct

The small Phi-3.5-mini-instruct model has better support for Chinese.

```python
import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "LLM-Research/Phi-3.5-mini-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("LLM-Research/Phi-3.5-mini-instruct")

# System turn: "You are my AI assistant; help me answer questions in Chinese."
# User turn: "Do you know Changsha?"
messages = "<|system|>\n 你是我的人工智能助手，协助我用中文解答问题.\n<|end|><|user|>\n 你知道长沙吗？ \n<|end|><|assistant|>"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# do_sample=False selects greedy decoding, so the output is deterministic.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
```
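The example above writes the chat tags by hand. As a sketch of how such a prompt could be assembled from a list of messages, the hypothetical helper below (`format_phi3_prompt`, not part of any library) renders each turn as `<|role|>`, the content, then `<|end|>`, and finishes with an open `<|assistant|>` turn. The exact whitespace is an assumption; when a tokenizer is available, `tokenizer.apply_chat_template`, as used in the vision example in this article, is the safer route.

```python
def format_phi3_prompt(messages: list[dict[str, str]]) -> str:
    """Render chat messages with Phi-3.5 style tags and open an assistant turn."""
    allowed = {"system", "user", "assistant"}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in allowed:
            raise ValueError(f"unsupported role: {role}")
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    # The trailing tag asks the model to generate the assistant's reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    # "You are my AI assistant; help me answer questions in Chinese."
    {"role": "system", "content": "你是我的人工智能助手，协助我用中文解答问题。"},
    # "Do you know Changsha?"
    {"role": "user", "content": "你知道长沙吗？"},
]
prompt = format_phi3_prompt(messages)
print(prompt)
```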

Phi-3.5-vision-instruct

The multimodal Phi-3.5-vision-instruct model supports multi-image understanding.

```python
from PIL import Image
import requests
from transformers import AutoModelForCausalLM
from transformers import AutoProcessor
from modelscope import snapshot_download

model_id = snapshot_download("LLM-Research/Phi-3.5-vision-instruct")

# Note: set _attn_implementation='eager' if you don't have flash_attn installed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    trust_remote_code=True,
    torch_dtype="auto",
    _attn_implementation='flash_attention_2',
)

# For best performance, use num_crops=4 for multi-frame, num_crops=16 for single-frame.
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    num_crops=4,
)

images = []
placeholder = ""

# Note: if you run out of memory, consider reducing the number of frames in this example.
for i in range(1, 20):
    url = f"https://modelscope.oss-cn-beijing.aliyuncs.com/resource/Introduction-to-Microsoft-Azure-Cloud-{i}-2048.webp"
    images.append(Image.open(requests.get(url, stream=True).raw))
    placeholder += f"<|image_{i}|>\n"

messages = [
    {"role": "user", "content": placeholder + "Summarize the deck of slides."},
]

prompt = processor.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = processor(prompt, images, return_tensors="pt").to("cuda:0")

generation_args = {
    "max_new_tokens": 1000,
    "temperature": 0.0,
    "do_sample": False,
}

generate_ids = model.generate(
    **inputs,
    eos_token_id=processor.tokenizer.eos_token_id,
    **generation_args,
)

# Remove the input tokens so that only the newly generated text is decoded.
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = processor.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]

print(response)
```
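The multi-image prompt in the example above is simply one numbered `<|image_i|>` placeholder per image, each on its own line, followed by the instruction. Note that `range(1, 20)` loads 19 slides, numbered 1 to 19. The hypothetical helper below (`build_image_prompt` is not part of the model's API) factors out that construction so the placeholder count always matches the number of images.

```python
def build_image_prompt(num_images: int, instruction: str) -> str:
    """Return one numbered image placeholder per line, followed by the instruction."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Placeholders are 1-based, matching the loop in the example above.
    placeholder = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return placeholder + instruction


content = build_image_prompt(3, "Summarize the deck of slides.")
print(content)
```

With a list of loaded images, `build_image_prompt(len(images), ...)` would produce the `content` string for the user message.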

Phi-3.5-MoE-instruct

Phi-3.5-MoE-instruct has stronger reasoning ability. This article demonstrates it in an agent scenario.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from modelscope import snapshot_download

model_dir = snapshot_download("LLM-Research/Phi-3.5-MoE-instruct")
torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# do_sample=False selects greedy decoding, so the output is deterministic.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}
```

Set the system message:

````python
sys_msg = """You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Blog: This tool helps you describe a certain knowledge point and content, and finally write it into Twitter or Facebook style content
- Translate: This is a tool that helps you translate into any language, using plain language as required

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "Build Multi Agents with MOE models" you must use the Blog tool like so:

```json
{
    "tool_name": "Blog",
    "input": "Build Multi Agents with MOE models"
}
```

Or to translate the question "can you introduce yourself in Chinese" you must respond:

```json
{
    "tool_name": "Translate",
    "input": "can you introduce yourself in Chinese"
}
```

Remember just output the final result, output in JSON format containing `"agentid"`, `"tool_name"`, `"input"` and `"output"` key-value pairs:

```json
[
{
    "agentid": "step1",
    "tool_name": "Blog",
    "input": "Build Multi Agents with MOE models",
    "output": "........."
},
{
    "agentid": "step2",
    "tool_name": "Translate",
    "input": "can you introduce yourself in Chinese",
    "output": "........."
},
{
    "agentid": "final",
    "tool_name": "Result",
    "output": "........."
}
]
```

The user's question is as follows.
"""
````

```python
def instruction_format(sys_message: str, query: str):
    # Note: do not append "</s>" to the end of the prompt.
    return f'<|system|> {sys_message} <|end|>\n<|user|> {query} <|end|>\n<|assistant|>'
```

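```python
import os

# Allocator setting from the original example. PyTorch reads this variable when the
# CUDA allocator is first initialised, so to be sure it takes effect, set it before
# the model is loaded.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

query = 'Write something about Generative AI with MOE , translate it to Chinese'
input_prompt = instruction_format(sys_msg, query)

output = pipe(input_prompt, **generation_args)
print(output[0]['generated_text'])
```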
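Since the system message asks the model to answer with a JSON list of steps, the generated text usually has to be parsed before it can be used. The sketch below shows one way to do that, assuming the reply may be wrapped in a Markdown `json` fence and may carry extra text around the JSON. The `extract_json` helper and the sample reply are hypothetical; the sample is hand-written and is not real model output.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the listing readable


def extract_json(text: str):
    """Parse the first JSON array or object in text, unwrapping a fenced block if present."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    fenced = re.search(pattern, text, flags=re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Keep only the span from the first opening bracket to the last closing bracket.
    starts = [i for i in (candidate.find("["), candidate.find("{")) if i != -1]
    start = min(starts, default=-1)
    end = max(candidate.rfind("]"), candidate.rfind("}"))
    if start == -1 or end < start:
        raise ValueError("no JSON found in model output")
    return json.loads(candidate[start:end + 1])


# Hypothetical reply, written by hand for illustration only.
sample_steps = [
    {"agentid": "step1", "tool_name": "Blog", "input": "Generative AI with MOE", "output": "..."},
    {"agentid": "step2", "tool_name": "Translate", "input": "...", "output": "..."},
    {"agentid": "final", "tool_name": "Result", "output": "..."},
]
sample_output = FENCE + "json\n" + json.dumps(sample_steps, indent=2) + "\n" + FENCE

steps = extract_json(sample_output)
for step in steps:
    print(step["agentid"], step["tool_name"])
```

In the agent example, `extract_json(output[0]['generated_text'])` would be the corresponding call. A small model can still emit malformed JSON, so catching `ValueError` (which `json.JSONDecodeError` subclasses) around it is advisable.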

[Figure: GPU memory usage]

Phi-3.5-mini-instruct-GGUF

Run Phi-3.5-mini-instruct-GGUF locally with Ollama

On Linux

Download the GGUF model:

```shell
modelscope download --model=LLM-Research/Phi-3.5-mini-instruct-GGUF --local_dir . Phi-3.5-mini-instruct-Q5_K_M.gguf
```

Linux users can install Ollama from the ModelScope mirror (recommended):

```shell
modelscope download --model=modelscope/ollama-linux --local_dir ./ollama-linux
cd ollama-linux
sudo chmod 777 ./ollama-modelscope-install.sh
./ollama-modelscope-install.sh
```

Start the Ollama service

```shell
ollama serve
```

Create the Modelfile

Copy the model path and create a meta file named "Modelfile" with the following content:

```
FROM /mnt/workspace/Phi-3.5-mini-instruct-Q5_K_M.gguf
TEMPLATE """
{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>"""
```

Create a custom model

Use the ollama create command to create a custom model:

```shell
ollama create myphi3_5 --file ./Modelfile
```

Run the model

```shell
ollama run myphi3_5
```
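Besides the interactive CLI, a running Ollama server can also be called over HTTP. The sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint using only the standard library. It assumes the server started by `ollama serve` is listening on the default address `http://localhost:11434` and that the custom model is named `myphi3_5` as above; the helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt: str, model: str = "myphi3_5",
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not touch the network. "你知道长沙吗？" = "Do you know Changsha?"
request = build_generate_request("你知道长沙吗？")
print(request.full_url)
# With the server running, the reply could be fetched with:
# print(generate("你知道长沙吗？"))
```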

[Figure: GPU memory usage]

03. Model fine-tuning

We use ms-swift to fine-tune the LLM Phi-3.5-mini-instruct and the VLM Phi-3.5-vision-instruct. swift is the official ModelScope framework for fine-tuning and inference of large language models and multimodal large models.

ms-swift repository: https://github.com/modelscope/ms-swift

Environment setup

```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

# Optional: accelerate inference for phi3_5-mini-instruct.
pip install vllm
```

LLM fine-tuning

Here we use alpaca-zh and alpaca-en as example datasets to show a runnable demo.

You can find both datasets on ModelScope:

alpaca-zh:

https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-zh

alpaca-en:

https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-en

For custom datasets, see: https://swift.readthedocs.io/zh-cn/latest/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.html

Fine-tuning script:

```shell
# GPU memory usage: 4 * 11GB
# This script samples 20,000 examples from each of the alpaca-zh and alpaca-en datasets
# See the documentation for the meaning of the other hyperparameters
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type phi3_5-mini-instruct \
  --model_id_or_path LLM-Research/Phi-3.5-mini-instruct \
  --sft_type lora \
  --learning_rate 1e-4 \
  --output_dir output \
  --dataset alpaca-zh#20000 alpaca-en#20000 \
  --lora_target_modules ALL \
  --deepspeed default-zero2
```
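The script uses LoRA (`--sft_type lora`), which trains two small low-rank matrices next to each adapted linear layer while the original weight stays frozen. The arithmetic below is a sketch with illustrative numbers only: the layer size (3072 x 3072) and the rank of 8 are assumptions chosen for the example, not values read from the model configuration or from swift's defaults.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen d_out x d_in weight."""
    return rank * d_in + d_out * rank


# Illustrative values, not taken from the model config or swift defaults.
d_in, d_out, rank = 3072, 3072, 8
full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full weight: {full:,} parameters")
print(f"LoRA update: {lora:,} trainable parameters ({lora / full:.2%} of the full weight)")
```

Under these assumed sizes, the trainable update is a small fraction of the layer, which is why LoRA needs far less optimizer state than full fine-tuning.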

Inference script after fine-tuning:

```shell
# Inference
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/phi3_5-mini-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true

# Merge the LoRA weights and use vLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/phi3_5-mini-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true \
    --infer_backend vllm
```

VLM fine-tuning

Here we use coco-en-mini as the example dataset to show a runnable demo. The task of this dataset is to describe the content of an image.

You can find the dataset on ModelScope: https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary

The custom dataset format is as follows (single image, multiple images, and no image):

```json
{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "images": []}
```
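Before training on a custom JSONL file, it can help to sanity-check each record. The sketch below is a hypothetical validator, not part of ms-swift. It assumes, based only on the three examples above, that `query` and `response` are strings, that the number of `<image>` tags in `query` equals the number of entries in `images`, and that each `history` turn is a `[query, response]` pair.

```python
import json


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    for key in ("query", "response"):
        if not isinstance(record.get(key), str):
            problems.append(f"missing or non-string field: {key}")
    query = record.get("query") if isinstance(record.get("query"), str) else ""
    images = record.get("images", [])
    # Assumption drawn from the examples above: one <image> tag per image path.
    tags = query.count("<image>")
    if tags != len(images):
        problems.append(f"{tags} <image> tag(s) but {len(images)} image path(s)")
    for turn in record.get("history", []):
        if len(turn) != 2:
            problems.append(f"history turn is not a [query, response] pair: {turn}")
    return problems


lines = [
    '{"query": "<image>55555", "response": "66666", "images": ["image_path"]}',
    # Deliberately inconsistent: one tag but no image path.
    '{"query": "eeeee<image>eeeee", "response": "fffff", "history": [], "images": []}',
]
for number, line in enumerate(lines, start=1):
    print(number, validate_record(json.loads(line)) or "ok")
```

Running the same loop over the lines of `train.jsonl` would flag records whose tags and image paths disagree before a training run is started.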

Fine-tuning script:

```shell
# GPU memory usage: 4 * 12GB
# By default, lora_target_modules is set to all linear layers of the LLM and the projector
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type phi3_5-vision-instruct \
  --model_id_or_path LLM-Research/Phi-3.5-vision-instruct \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2
```

To use a custom dataset, specify it as follows:

```shell
  --dataset train.jsonl \
  --val_dataset val.jsonl \
```

[Figure: GPU memory usage]

[Figure: training loss curve. Due to time constraints, training ran for only 450 steps.]

The inference script after fine-tuning is as follows:

```shell
# Inference
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/phi3_5-vision-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true

# Merge the LoRA weights and run inference
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/phi3_5-vision-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true \
    --safe_serialization false
```

[Figure: example of inference by the fine-tuned model on the validation set]
