Fine-tuning a FunASR model on an RTX 3090 server

Latest recommended article published on 2025-06-20 11:53:30

丹宇码农

License: CC 4.0 BY-SA

Category: AI. Tags: AI, ASR, model fine-tuning

Article link: https://blog.csdn.net/happyweb/article/details/139207333

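This article records how I fine-tuned a real-time (streaming) ASR model on RTX 3090 cards, covering data preparation, fine-tuning, and validation of the fine-tuned model.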

I. Reference documentation:

https://github.com/alibaba-damo-academy/FunASR/blob/main/examples/industrial_data_pretraining/paraformer_streaming/README_zh.md

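II. Data preparation
Data format: https://github.com/alibaba-damo-academy/FunASR/tree/main/data/list
The wav files are 16 kHz, 16-bit, mono audio, each between 6 and 10 seconds long.
Write a UTF-8 encoded CSV file whose first column is the audio file path and whose second column is the matching transcript. The first row is the header (Audio:FILE,Text:LABEL).
For a concrete example, see speech_asr_aishell_testsets.csv in the dataset linked below.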
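
The audio requirements above can be checked before any training time is spent. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` and its default duration limits are my own illustration, not part of FunASR, and the `wave` module only reads uncompressed PCM wav files.

```python
import wave


def check_wav(path, min_sec=6.0, max_sec=10.0):
    """Return a list of problems found in one wav file (an empty list means OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != 16000:
            problems.append(f"sample rate is {rate} Hz, expected 16000")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits, expected 16")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        duration = wav.getnframes() / rate
        if not min_sec <= duration <= max_sec:
            problems.append(f"duration is {duration:.2f} s, expected {min_sec}-{max_sec} s")
    return problems
```

Calling `check_wav("BAC009S0764W0121.wav")` on a compliant file should return an empty list; any returned strings describe what needs to be fixed (for example by resampling).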

train_wav.scp and train_text.txt are used to generate train.jsonl
val_wav.scp and val_text.txt are used to generate val.jsonl
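
Because each .jsonl entry is built by joining a wav path and a transcript on their shared utterance ID, it can help to confirm that both files contain the same IDs first. The sketch below is my own helper (the names `read_kv` and `compare_keys` are hypothetical); it does not produce the .jsonl itself, which is left to the FunASR tooling described in the reference README.

```python
def read_kv(path):
    """Read a 'key value' file into a dict, splitting each line on the first space."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            key, _, value = line.partition(" ")
            table[key] = value.strip()
    return table


def compare_keys(scp_path, text_path):
    """Return (IDs only in the scp file, IDs only in the text file), both sorted."""
    wavs, texts = read_kv(scp_path), read_kv(text_path)
    return sorted(wavs.keys() - texts.keys()), sorted(texts.keys() - wavs.keys())
```

For a consistent pair of files, `compare_keys("train_wav.scp", "train_text.txt")` returns two empty lists.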

Dataset: https://www.modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary

The source of data_process.py is as follows:

import csv
import os
import shutil  # used to copy the wav files


def process_data(data_type, max_num, csv_file, output_path, wav_parent_path=''):
    """Turn a dataset CSV into <data_type>_wav.scp and <data_type>_text.txt.

    data_type       which split to process: "train", "val" or "test"
    max_num         maximum number of rows to process; 0 means no limit
    csv_file        path of the CSV file
    output_path     output directory for the .scp, text and wav files
    wav_parent_path parent directory where the wav files will finally live on
                    the fine-tuning machine, e.g. /home/data/
    """
    if data_type not in ("train", "val", "test"):
        print("data_type is invalid; valid values are train, val and test")
        return

    # The wav paths in the CSV are relative to the directory containing the CSV
    data_path = os.path.dirname(csv_file)
    os.makedirs(output_path, exist_ok=True)

    wav_path = os.path.join(output_path, f"{data_type}_wav.scp")
    text_path = os.path.join(output_path, f"{data_type}_text.txt")

    # Open the CSV as UTF-8; the with block closes all three files on exit
    with open(csv_file, mode='r', encoding='utf-8', newline='') as csvfile, \
            open(wav_path, 'w', encoding='utf-8') as wav_file, \
            open(text_path, 'w', encoding='utf-8') as text_file:
        reader = csv.reader(csvfile)
        headers = next(reader)  # header row (Audio:FILE,Text:LABEL)
        print(headers)

        for index, row in enumerate(reader, start=1):
            audio_file = row[0]
            text = row[1].replace(" ", "")  # drop the spaces inside the transcript

            # The utterance ID is the file name without its extension
            utt_id, _ = os.path.splitext(os.path.basename(audio_file))

            # Copy the wav file to the output directory, keeping its sub-directory
            src_path = os.path.join(data_path, audio_file)
            dest_path = os.path.join(output_path, audio_file)
            os.makedirs(os.path.dirname(dest_path), exist_ok=True)
            shutil.copyfile(src_path, dest_path)

            # Write the final wav location as a Linux-style path, because the
            # fine-tuning itself runs on Linux
            linux_path = f"{wav_parent_path.rstrip('/')}/{audio_file}".replace("\\", "/")

            wav_file.write(f"{utt_id} {linux_path}\n")
            text_file.write(f"{utt_id} {text}\n")
            if max_num > 0 and index >= max_num:
                break


if __name__ == "__main__":
    process_data("train", 20000, "./data/speech_asr_aishell_trainsets.csv", "./data/list/", '/home/data/asr_finetune')
    process_data("val", 10000, "./data/speech_asr_aishell_devsets.csv", "./data/list/", '/home/data/asr_finetune')
    # process_data("test", 5000, "./data/speech_asr_aishell_testsets.csv", "./data/list/", '/home/data/')

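Run python data_process.py to process the data. It generates train_wav.scp, train_text.txt, val_wav.scp and val_text.txt (test_wav.scp and test_text.txt are generated only if the test line in the script is uncommented).
It also creates the wav directories speech_asr_aishell_trainsets and speech_asr_aishell_devsets, which hold the wav audio files.
Copy the generated speech_asr_aishell_trainsets and speech_asr_aishell_devsets directories to /home/data/asr_finetune/.

The test set is optional, because it is not used in fine-tuning.

Use absolute paths for the wav files in the .scp files; this is less error-prone.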
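
A quick way to confirm that a .scp file really contains only absolute paths is sketched below. The helper name `relative_entries` is my own; it uses `posixpath` so that the check follows Linux path rules even if the data was prepared on Windows.

```python
import posixpath


def relative_entries(scp_path):
    """Return the (key, path) pairs in a .scp file whose path is not an absolute Linux path."""
    bad = []
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            key, _, wav_path = line.strip().partition(" ")
            # Blank lines give an empty key and are skipped
            if key and not posixpath.isabs(wav_path):
                bad.append((key, wav_path))
    return bad
```

An empty result from `relative_entries("train_wav.scp")` means every entry starts with `/`; it does not check that the files exist on the fine-tuning server.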

III. Fine-tuning
# Assumes a conda environment named funasr already exists
conda activate funasr
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -U funasr
pip install -U modelscope

pip install chardet

# Download the model
cd /home/data/model
git clone https://www.modelscope.cn/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online.git

# Download the source code
cd /home/data/asr_finetune
git clone https://github.com/alibaba-damo-academy/FunASR.git

Put the data under /home/data/asr_finetune/FunASR/data/list.

cd /home/data/asr_finetune/FunASR/examples/industrial_data_pretraining/paraformer_streaming
Edit finetune.sh:
model_name_or_model_dir="/home/data/model/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online"
Change ++dataset_conf.batch_size from the original 20000 to 2000; otherwise a server with two 3090 cards reports a CUDA out-of-memory error. After the change, each of the two cards uses about 6.8 GB of GPU memory.
Set max_epoch to the number of epochs you actually want.

Make sure the model exists and that the model path and all data paths are correct.

bash finetune.sh
# Log file: ./outputs/log.txt
If you remove the trailing &> ${log_file} from the last line of finetune.sh, the output is printed to the screen instead.

ls ./outputs/
The directory contains files such as config.yaml, model.pt, model.pt.ep1 and model.pt.ep1.2000.
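
When several checkpoints accumulate, it is convenient to list them in training order before picking one to validate. The sketch below parses the file names shown above; the function name `sort_checkpoints` is my own, and it assumes (my reading of the names, not something the FunASR documentation confirms here) that model.pt.epN.S is saved at step S within epoch N and model.pt.epN at the end of epoch N.

```python
import re

# Matches model.pt.ep<epoch> and model.pt.ep<epoch>.<step>
CKPT_PATTERN = re.compile(r"^model\.pt\.ep(\d+)(?:\.(\d+))?$")


def sort_checkpoints(names):
    """Return the checkpoint names sorted by (epoch, step); other files are ignored."""
    parsed = []
    for name in names:
        match = CKPT_PATTERN.match(name)
        if match is None:
            continue
        epoch = int(match.group(1))
        # Assumption: a name without a step is the end-of-epoch checkpoint
        step = int(match.group(2)) if match.group(2) else float("inf")
        parsed.append((epoch, step, name))
    return [name for _, _, name in sorted(parsed)]
```

For example, `sort_checkpoints(os.listdir("./outputs"))` (after `import os`) lists the epoch and step checkpoints and skips config.yaml and the plain model.pt.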

The fine-tuned model files are generated in: /home/data/asr_finetune/FunASR/examples/industrial_data_pretraining/paraformer_streaming/outputs
cd /home/data/asr_finetune/FunASR/examples/industrial_data_pretraining/paraformer_streaming/outputs

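IV. Validating the fine-tuned model

(1) With configuration.json
Assume the trained model is in ./model_dir. If a configuration.json was generated in that directory, you only need to pass the model path in place of the model name in the usual inference commands.

For example:
(1.1) Inference from the shell
python -m funasr.bin.inference ++model="./model_dir" ++input="${input}" ++output_dir="${output_dir}"

(1.2) Inference from Python
from funasr import AutoModel
wav_file = "BAC009S0764W0121.wav"  # path of the wav file to recognize
model = AutoModel(model="./model_dir")
res = model.generate(input=wav_file)
print(res)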

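(2) Without configuration.json
(2.1) Method 1: reuse the configuration files of the original model
# Make a copy of the original model, then replace its model.pt with the newly fine-tuned one; all other configuration files stay as they were
cp -r /home/data/model/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/ /home/data/model/test_finetune_asr_model/
cp model.pt /home/data/model/test_finetune_asr_model/

Alternatively, copy one of the other checkpoint files:
cp model.pt.ep3.2000 /home/data/model/test_finetune_asr_model/model.pt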
# Run inference to verify the fine-tuned model
cd /home/data/asr_finetune/
python asr_infere.py
The source of asr_infere.py is as follows:

# pip install soundfile
# pip install -U funasr
# #pip install -U modelscope

import soundfile
from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="/home/data/model/test_finetune_asr_model")

wav_file = "BAC009S0764W0121.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms at 16 kHz

cache = {}  # streaming state carried from one chunk to the next
# Number of chunks, rounded up so the last partial chunk is included
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1  # tell the model to flush on the last chunk
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
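
The chunk arithmetic in the script is easy to get wrong, so here it is in isolation. At 16 kHz, 960 samples last 60 ms, so chunk_size[1] = 10 gives a stride of 9600 samples, i.e. 600 ms. The sketch below (my own helper `chunk_bounds`, independent of FunASR) returns the sample range of every chunk, with the last one shortened to fit.

```python
import math

SAMPLE_RATE = 16000     # Hz, as required in the data preparation section
SAMPLES_PER_UNIT = 960  # 960 samples = 60 ms at 16 kHz


def chunk_bounds(num_samples, chunk_size=(0, 10, 5)):
    """Return the (start, end) sample indices of each streaming chunk."""
    stride = chunk_size[1] * SAMPLES_PER_UNIT
    total = math.ceil(num_samples / stride)
    return [(i * stride, min((i + 1) * stride, num_samples)) for i in range(total)]
```

A 20000-sample clip (1.25 s) yields three chunks: two full 600 ms chunks and a final 800-sample remainder.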

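(2.2) Method 2: if the model path has no configuration.json, specify the configuration file path and the model path manually
python -m funasr.bin.inference \
--config-path "${local_path}" \
--config-name "${config}" \
++init_param="${init_param}" \
++tokenizer_conf.token_list="${tokens}" \
++frontend_conf.cmvn_file="${cmvn_file}" \
++input="${input}" \
++output_dir="${output_dir}" \
++device="${device}"
Parameters

config-path: the directory containing the config.yaml saved during the experiment; it can be found in the experiment output directory.
config-name: the configuration file name, usually config.yaml. Both YAML and JSON are supported, e.g. config.json.
init_param: the model parameters to test, usually model.pt; you can choose any specific checkpoint file.
tokenizer_conf.token_list: path of the vocabulary file. It is normally set in config.yaml and does not need to be given again; specify it here only when the path in config.yaml is wrong.
frontend_conf.cmvn_file: the CMVN file used when extracting fbank features from the wav. It is normally set in config.yaml and does not need to be given again; specify it here only when the path in config.yaml is wrong.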